Raphael Olivier
PhD candidate working on robust speech representations and trustworthy speech processing. Email: raphael.franck.olivier@gmail.com or rolivier@cs.cmu.edu

I am on the job market this year (Summer 2023)! Reach out at either of the addresses above.

About

I am a third-year Ph.D. student in the Language Technologies Institute at Carnegie Mellon University, Pittsburgh, PA, working under the supervision of Prof. Bhiksha Raj. My research interests include Automatic Speech Recognition, self-supervised learning, adversarial robustness, and secure and trustworthy AI.

I graduated from École Polytechnique in Paris, where I double-majored in math and computer science. I joined the Master's in Language Technologies program at CMU in 2017.

Detailed Research interests

AI security

The Deep Learning models that have revolutionized the fields of Computer Vision, Speech and Language Processing can behave strangely outside their training domain. Agents with malicious intent can actively create such out-of-domain contexts in order to exploit deployed Artificial Intelligence (AI) systems: force self-driving cars to confuse signs and crash, military drones to mistake hospitals for military bases, personal assistants or smartphones to reveal private user information, etc.

In my thesis I work on formalizing, evaluating and mitigating these threats, such as adversarial perturbations, data poisoning, privacy attacks, etc. My long-term research goal is to contribute to the safe development of Artificial Intelligence, so that society can benefit and not suffer from it.

I also believe that security is a fascinating and central aspect of AI from a theoretical perspective. People who create these models want to replicate aspects of human intelligence, and security threats arise precisely when models behave differently from humans (in a way that attackers can control). I would argue that making models safer is equivalent to making them better.

Speech Recognition

Some aspects of AI security are common to all models, but others are specific to certain applications and architectures. When investigating the latter, I focus on speech processing applications, and in particular Automatic Speech Recognition. I am a member of the Machine Learning and Signal Processing research group at CMU. I have also completed two internships with the Alexa Hybrid Science team in Pittsburgh, where I investigated attacks against, and defenses for, Amazon Alexa's speech-to-text models.

Other interests

In my free time I read classic novels, watch films, brew coffee, play tennis and bridge (I’m looking for bridge partners in the Pittsburgh area).

I like photography too. Go check http://www.raphaelolivier.com. Sadly, that site belongs to a very talented namesake, but I get to let people I meet believe it's mine.

News

Publications

There is more than one kind of robustness: Fooling Whisper with adversarial examples
Raphael Olivier, Bhiksha Raj
InterSpeech, Dublin, August 2023
Abstract

Whisper is a recent Automatic Speech Recognition (ASR) model displaying impressive robustness to both out-of-distribution inputs and random noise. In this work, we show that this robustness does not carry over to adversarial noise. We generate very small input perturbations with Signal-to-Noise Ratios of up to 45dB, with which we can degrade Whisper performance dramatically, or even transcribe a target sentence of our choice. We also show that by fooling the Whisper language detector we can very easily degrade the performance of multilingual models. These vulnerabilities of a widely popular open-source model have practical security implications and emphasize the need for adversarially robust ASR.
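
For context on the distortion numbers quoted above, here is a minimal sketch (illustrative only, not code from the paper) of how an additive perturbation can be rescaled to a chosen Signal-to-Noise Ratio such as 45dB; the function name and values are made up for the example.

```python
import numpy as np

def scale_to_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Rescale `noise` so that 10 * log10(P_signal / P_noise) equals snr_db."""
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    target_p_noise = p_signal / (10 ** (snr_db / 10))
    return noise * np.sqrt(target_p_noise / (p_noise + 1e-12))

rng = np.random.default_rng(0)
x = rng.standard_normal(16000)                                 # stand-in for 1s of 16kHz audio
delta = scale_to_snr(x, rng.standard_normal(16000), snr_db=45.0)
print(10 * np.log10(np.mean(x ** 2) / np.mean(delta ** 2)))    # ~45.0
```

At 45dB the perturbation carries roughly 30,000 times less power than the signal, which is why such noise is typically very hard to hear.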

Bibtex

@misc{Olivier22TI, title = {There is more than one kind of robustness: Fooling Whisper with adversarial examples}, author = {Olivier, Raphael and Raj, Bhiksha}, publisher = {arXiv}, year = {2022}, copyright = {arXiv.org perpetual, non-exclusive license}}

Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models
Raphael Olivier, Hadi Abdullah, Bhiksha Raj
arXiv preprint, 2022
Abstract

A targeted adversarial attack produces audio samples that can force an Automatic Speech Recognition (ASR) system to output attacker-chosen text. To exploit ASR models in real-world, black-box settings, an adversary can leverage the transferability property, i.e. the fact that an adversarial sample produced for a proxy ASR can also fool a different remote ASR. However, recent work has shown that transferability against large ASR models is very difficult to achieve. In this work, we show that modern ASR architectures, specifically those based on Self-Supervised Learning, are in fact vulnerable to transferable attacks. We successfully demonstrate this phenomenon by evaluating state-of-the-art self-supervised ASR models like Wav2Vec2, HuBERT, Data2Vec and WavLM. We show that with low-level additive noise achieving a 30dB Signal-to-Noise Ratio, we can achieve targeted transferability with up to 80% accuracy. Next, we 1) use an ablation study to show that Self-Supervised Learning is the main cause of this phenomenon, and 2) provide an explanation for it. Through this we show that modern ASR architectures are uniquely vulnerable to adversarial security threats.
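
As a rough illustration of the transferability protocol described above (a sketch, not the paper's code), here is how a targeted success rate could be tallied; the model callables and sample collections below are placeholders.

```python
from typing import Callable, List, Sequence

def targeted_transfer_rate(adv_audio: Sequence, target_text: str,
                           remote_models: List[Callable[[object], str]]) -> float:
    """Fraction of (remote model, adversarial sample) pairs that output the attacker's text."""
    hits = sum(model(x) == target_text
               for model in remote_models
               for x in adv_audio)
    return hits / (len(remote_models) * len(adv_audio))
```

In the setting described above, the adversarial samples would be crafted on a proxy model, and the remote models would be self-supervised ASR systems such as those listed in the abstract.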

Bibtex

@misc{Olivier22WW, title = {Watch What You Pretrain For: Targeted, Transferable Adversarial Examples on Self-Supervised Speech Recognition models}, author = {Olivier, Raphael and Abdullah, Hadi and Raj, Bhiksha}, publisher = {arXiv}, year = {2022}, copyright = {arXiv.org perpetual, non-exclusive license}}

How many perturbations break this model? Evaluating robustness beyond adversarial accuracy
Raphael Olivier, Bhiksha Raj
arXiv preprint, 2022
Abstract

Robustness to adversarial attack is typically evaluated with adversarial accuracy. This metric quantifies the number of points for which, given a threat model, successful adversarial perturbations cannot be found. While essential, this metric does not capture all aspects of robustness, and in particular leaves out the question of how many perturbations can be found for each point. In this work we introduce an alternative approach, adversarial sparsity, which quantifies how difficult it is to find a successful perturbation given both an input point and a constraint on the direction of the perturbation. This constraint may be angular (L2 perturbations) or based on the number of pixels (Linf perturbations). We show that sparsity provides valuable insight into neural networks in multiple ways: analyzing the sparsity of existing robust models illustrates important differences between them that accuracy analysis does not, and suggests approaches for improving their robustness. When applied to broken defenses that are effective against weak attacks but not strong ones, sparsity can discriminate between totally ineffective and partially effective defenses. Finally, with sparsity we can measure increases in robustness that do not affect accuracy: we show, for example, that data augmentation can by itself increase adversarial robustness, without adversarial training.
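
To make the direction-constrained idea concrete, here is a toy numpy sketch (an illustration under simplifying assumptions, not the paper's definition or algorithm): sample random L2-constrained directions around a point and measure the fraction along which a trivial classifier's decision can be flipped.

```python
import numpy as np

def toy_classifier(x: np.ndarray) -> int:
    return int(x.sum() > 0)

def directional_flip_rate(x: np.ndarray, eps: float, n_dirs: int = 1000, seed: int = 0) -> float:
    """Fraction of random eps-norm directions along which the toy decision flips."""
    rng = np.random.default_rng(seed)
    label = toy_classifier(x)
    flips = 0
    for _ in range(n_dirs):
        d = rng.standard_normal(x.shape)
        d = eps * d / np.linalg.norm(d)            # L2-constrained perturbation direction
        flips += toy_classifier(x + d) != label
    return flips / n_dirs

print(directional_flip_rate(np.array([0.1, -0.05]), eps=0.3))
```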

Bibtex

@misc{Olivier22HM, doi = {10.48550/ARXIV.2207.04129}, url = {https://arxiv.org/abs/2207.04129}, author = {Olivier, Raphael and Raj, Bhiksha}, title = {How many perturbations break this model? Evaluating robustness beyond adversarial accuracy}, publisher = {arXiv}, year = {2022}, copyright = {arXiv.org perpetual, non-exclusive license}}

Recent improvements of ASR models in the face of adversarial attacks
Raphael Olivier, Bhiksha Raj
InterSpeech, Incheon, September 2022
Abstract

Like many other tasks involving neural networks, speech recognition models are vulnerable to adversarial attacks. However, recent research has pointed out differences between attacks and defenses on ASR models compared to image models. Improving the robustness of ASR models requires a paradigm shift from evaluating attacks on one or a few models to a systematic approach to evaluation. We lay the ground for such research by evaluating a representative set of adversarial attacks on various architectures: targeted and untargeted, optimization-based and speech processing-based, white-box and black-box attacks. Our results show that the relative strengths of different attack algorithms vary considerably when changing the model architecture, and that the results of some attacks are not to be blindly trusted. They also indicate that training choices such as self-supervised pretraining can significantly impact robustness by enabling transferable perturbations. We release our source code as a package that should help future researchers evaluate their attacks and defenses.
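
A schematic of the kind of systematic evaluation grid advocated above (a hedged sketch, not the released package's API): every attack is run against every model and the resulting Word Error Rate is tabulated. All names here are placeholders.

```python
from typing import Callable, Dict, Iterable, Tuple

def evaluate_grid(models: Dict[str, object],
                  attacks: Dict[str, Callable],
                  dataset: Iterable[Tuple[object, str]],
                  wer_fn: Callable) -> Dict[Tuple[str, str], float]:
    """Return the WER of every (model, attack) pair on adversarially perturbed inputs."""
    results = {}
    for m_name, model in models.items():
        for a_name, attack in attacks.items():
            refs, hyps = [], []
            for audio, transcript in dataset:
                adv_audio = attack(model, audio, transcript)   # craft the perturbation
                hyps.append(model.transcribe(adv_audio))
                refs.append(transcript)
            results[(m_name, a_name)] = wer_fn(refs, hyps)
    return results
```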

Bibtex

@inproceedings{Olivier22RI, title = {Recent improvements of ASR models in the face of adversarial attacks}, author = {Olivier, Raphael and Raj, Bhiksha}, booktitle = {InterSpeech 2022}, month = sep, year = {2022}, address = {Incheon, South Korea}, publisher = {ISCA}}

Sequential Randomized Smoothing for Adversarially Robust Speech Recognition
Raphael Olivier, Bhiksha Raj
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), Punta Cana, November 2021
Abstract

While Automatic Speech Recognition has been shown to be vulnerable to adversarial attacks, defenses against these attacks are still lagging. Existing, naive defenses can be partially broken with an adaptive attack. In classification tasks, the Randomized Smoothing paradigm has been shown to be effective at defending models. However, it is difficult to apply this paradigm to ASR tasks, due to their complexity and the sequential nature of their outputs. Our paper overcomes some of these challenges by leveraging speech-specific tools like enhancement and ROVER voting to design an ASR model that is robust to perturbations. We apply adaptive versions of state-of-the-art attacks, such as the Imperceptible ASR attack, to our model, and show that our strongest defense is robust to all attacks that use inaudible noise, and can only be broken with very high distortion.
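
The core smoothing step can be pictured with a bare-bones sketch (illustrative only; the full pipeline described above also uses speech enhancement and ROVER voting over sequential outputs): decode several Gaussian-noised copies of the input and keep the most frequent transcript. The `asr_model` object and the `sigma` value are assumptions for the example.

```python
from collections import Counter
import numpy as np

def smoothed_transcribe(asr_model, audio: np.ndarray, sigma: float = 0.01,
                        n_samples: int = 8, seed: int = 0) -> str:
    """Majority-vote transcript over Gaussian-noised copies of the input."""
    rng = np.random.default_rng(seed)
    transcripts = [
        asr_model.transcribe(audio + sigma * rng.standard_normal(audio.shape))
        for _ in range(n_samples)
    ]
    return Counter(transcripts).most_common(1)[0][0]
```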

Bibtex

@inproceedings{Olivier21SR, title = {Sequential Randomized Smoothing for Adversarially Robust Speech Recognition}, author = {Olivier, Raphael and Raj, Bhiksha}, booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing}, month = nov, year = {2021}, address = {Punta Cana, Dominican Republic}, publisher = {Association for Computational Linguistics}}

High-Frequency Adversarial Defense for Speech and Audio
Raphael Olivier, Muhammad Shah, Bhiksha Raj
2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, June 2021
Abstract

Recent work suggests that adversarial examples are enabled by high-frequency components in the dataset. In the speech domain, where spectrograms are used extensively, masking those components seems like a sound direction for defenses against attacks. We explore a smoothing approach based on additive noise that masks high frequencies in priority. We show that this approach is much more robust than the naive noise filtering approach, and a promising research direction. We successfully apply our defense to a Librispeech speaker identification task and to the UrbanSound8K audio classification dataset.
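
As a toy illustration of noise concentrated in high frequencies (a sketch under my own assumptions, not the paper's exact masking scheme), white noise can be shaped with a crude FFT mask before being added to the input:

```python
import numpy as np

def high_freq_noise(length: int, sr: int, cutoff_hz: float = 4000.0,
                    sigma: float = 0.01, seed: int = 0) -> np.ndarray:
    """Gaussian noise whose energy lies mostly above `cutoff_hz`."""
    rng = np.random.default_rng(seed)
    noise = sigma * rng.standard_normal(length)
    spectrum = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(length, d=1.0 / sr)
    spectrum[freqs < cutoff_hz] *= 0.1            # strongly attenuate low frequencies
    return np.fft.irfft(spectrum, n=length)

x = np.zeros(16000)                               # stand-in for 1s of 16kHz audio
defended_input = x + high_freq_noise(len(x), sr=16000)
```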

Bibtex

@inproceedings{Olivier21HF, author={Olivier, R. and Raj, B. and Shah, M.}, booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, title={High-Frequency Adversarial Defense for Speech and Audio}, year={2021}, volume={}, number={}, pages={2995-2999}, doi={10.1109/ICASSP39728.2021.9414525}}

Towards Adversarial Robustness Via Compact Feature Representations
Muhammad Shah, Raphael Olivier, Bhiksha Raj
2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, June 2021
Abstract

Deep Neural Networks (DNNs), while providing state-of-the-art performance in a wide variety of tasks, have been shown to be vulnerable to adversarial attacks. Recent studies have posited that this vulnerability arises because DNNs operate over a grossly overspecified input space with very sparse human supervision, due to which they tend to learn spurious features that humans would ignore. These spurious features provide an attack vector for the adversary, because perturbing them would not alter a human's decision but may alter the model's prediction. In this paper we explore the hypothesis that reducing the size of the model's feature representation while maintaining its generalizability would discard spurious features while retaining perceptually relevant ones. We find that after the size of the feature representation has been reduced, the models exhibit increased adversarial robustness while suffering only a minimal loss in accuracy. In addition to being more robust, models with compact feature representations have the benefit of being more resource efficient.
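
A toy PyTorch sketch of what a compact feature representation can look like (my own illustration, not the paper's architectures): the penultimate features are squeezed through a narrow bottleneck before classification.

```python
import torch.nn as nn

def classifier_with_bottleneck(in_dim: int, bottleneck_dim: int, n_classes: int) -> nn.Module:
    """Simple MLP whose penultimate (feature) layer is deliberately narrow."""
    return nn.Sequential(
        nn.Linear(in_dim, 512), nn.ReLU(),
        nn.Linear(512, bottleneck_dim), nn.ReLU(),   # compact feature representation
        nn.Linear(bottleneck_dim, n_classes),
    )

model = classifier_with_bottleneck(in_dim=784, bottleneck_dim=16, n_classes=10)
```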

Bibtex

@inproceedings{Shah21TA, author={Shah, Muhammad A. and Olivier, Raphael and Raj, Bhiksha}, booktitle={IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, title={Towards Adversarial Robustness Via Compact Feature Representations}, year={2021}, volume={}, number={}, pages={3845-3849}, doi={10.1109/ICASSP39728.2021.9414696}}

Exploiting Non-Linear Redundancy for Neural Model Compression
Muhammad Shah, Raphael Olivier, Bhiksha Raj
2020 25th International Conference on Pattern Recognition (ICPR), Milan, January 2021
Abstract

Deploying deep learning models comprising non-linear combinations of millions, even billions, of parameters is challenging given the memory, power and compute constraints of the real world. This situation has led to research into model compression techniques, most of which rely on suboptimal heuristics and do not consider the parameter redundancies due to linear dependence between neuron activations in overparametrized networks. In this paper, we propose a novel model compression approach based on the exploitation of linear dependence, which compresses networks by eliminating entire neurons and redistributing their activations over other neurons in a manner that is provably lossless while training. We combine this approach with an annealing algorithm that may be applied during training, or even on a trained model, and demonstrate, using popular datasets, that our method results in a reduction of up to 99% in overall network size with small loss in performance. Furthermore, we provide theoretical results showing that in overparametrized, locally linear (ReLU) neural networks where redundant features exist, and with correct hyperparameter selection, our method is indeed able to capture and suppress those dependencies.
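
The redistribution idea can be sketched in a few lines of numpy (a simplified, single-layer illustration under my own assumptions, not the paper's annealing algorithm): if one neuron's activations are approximately a linear combination of the others', the neuron can be dropped and its outgoing weights folded into theirs.

```python
import numpy as np

def fold_dependent_neuron(acts: np.ndarray, w_next: np.ndarray, idx: int) -> np.ndarray:
    """acts: (n_samples, n_neurons) activations; w_next: (n_neurons, out_dim) next-layer weights.
    Returns next-layer weights with neuron `idx` removed and its contribution redistributed."""
    others = np.delete(acts, idx, axis=1)
    # Least-squares coefficients expressing neuron `idx` in terms of the remaining neurons.
    coeffs, *_ = np.linalg.lstsq(others, acts[:, idx], rcond=None)
    w_reduced = np.delete(w_next, idx, axis=0)
    return w_reduced + np.outer(coeffs, w_next[idx])   # redistribute the removed neuron's weights
```

If the dependence is exact, multiplying the reduced activations by the returned weights reproduces the original layer output; otherwise the error is bounded by the least-squares residual.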

Bibtex

@inproceedings{Shah21EN, author={Shah, Muhammad A. and Olivier, Raphael and Raj, Bhiksha}, booktitle={25th International Conference on Pattern Recognition (ICPR)}, title={Exploiting Non-Linear Redundancy for Neural Model Compression}, year={2021}, volume={}, number={}, pages={9928-9935}, doi={10.1109/ICPR48806.2021.9413178}}

Optimal Strategies For Comparing Covariates To Solve Matching Problems
Muhammad Shah, Raphael Olivier, Bhiksha Raj
2020 25th International Conference on Pattern Recognition (ICPR), Milan, January 2021
Abstract

Many machine learning tasks can be posed as matching problems in which we are given a “probe” entry that we expect to match some of the entries in our “gallery”. The general solution to these problems is to retrieve matching entries based on statistical dependencies between the probe and the gallery data that are learned using complex models. Often, however, there are other common covariates to the probe and gallery data which might be easily inferred and may explain some of the statistical dependencies between the two. In this paper we present a probabilistic framework to derive optimal matching strategies based only on covariate features for three broad tasks, namely N-way classification, pairwise verification and ranking. We use canonical metrics to determine the maximum performance that can be expected if only covariate features are used, and determine the marginal gain of using complex models. We find that covariate matching achieves an EER within 10% of a CNN in the verification task, and a MAP within 22% of a DNN-based model in the ranking task.
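
For readers unfamiliar with the verification metric quoted above, here is a small sketch of the Equal Error Rate, the operating point where false acceptance and false rejection rates coincide (a standard approximation written for this page, not code from the paper):

```python
import numpy as np

def equal_error_rate(scores: np.ndarray, labels: np.ndarray) -> float:
    """scores: higher means more likely a match; labels: 1 for genuine pairs, 0 for impostors."""
    eer = 1.0
    for t in np.unique(scores):
        far = np.mean(scores[labels == 0] >= t)    # false acceptance rate at threshold t
        frr = np.mean(scores[labels == 1] < t)     # false rejection rate at threshold t
        eer = min(eer, max(far, frr))              # approximate crossing point
    return eer

print(equal_error_rate(np.array([0.9, 0.8, 0.4, 0.3]), np.array([1, 1, 0, 0])))  # 0.0
```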

Bibtex

@inproceedings{Shah20OS, author={Shah, Muhammad A. and Olivier, Raphael and Raj, Bhiksha}, booktitle={25th International Conference on Pattern Recognition (ICPR)}, title={Optimal Strategies For Comparing Covariates To Solve Matching Problems}, year={2021}, volume={}, number={}, pages={10622-10628}, doi={10.1109/ICPR48806.2021.9412932}}

Transfer Learning by Learning Projections from Target to Source
Antoine Cornuejols, Pierre-Alexandre Murena, Raphael Olivier
Advances in Intelligent Data Analysis XVIII (IDA), Konstanz, April 2020
Abstract

Using transfer learning to help solve a new classification task where labeled data is scarce is becoming popular. Numerous experiments with deep neural networks, where the representation learned on a source task is transferred to learn a target neural network, have shown the benefits of the approach. This paper, similarly, deals with hypothesis transfer learning. However, it presents a new approach where, instead of transferring a representation, the source hypothesis is kept and a translation from the target domain to the source domain is learned instead. In a way, a change of representation is learned. We show how this method performs very well on a time series classification task where the space of time series changes between source and target.
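
A minimal sketch of the “keep the source hypothesis, learn a projection” idea (illustrative PyTorch under my own assumptions; the paper itself uses boosting of weak projectors rather than gradient descent): a frozen source classifier is reused on target data through a learned linear map into the source input space.

```python
import torch
import torch.nn as nn

def fit_projection(source_model: nn.Module, x_tgt: torch.Tensor, y_tgt: torch.Tensor,
                   src_dim: int, epochs: int = 200, lr: float = 1e-2) -> nn.Module:
    """Learn a linear map from the target space to the source space, keeping the source model fixed."""
    proj = nn.Linear(x_tgt.shape[1], src_dim)
    opt = torch.optim.Adam(proj.parameters(), lr=lr)
    source_model.requires_grad_(False)             # the source hypothesis is kept as-is
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(source_model(proj(x_tgt)), y_tgt)
        loss.backward()
        opt.step()
    return proj
```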

Bibtex

@inproceedings{Cornuejols20TL, author = {Cornu{\'e}jols, Antoine and Murena, Pierre-Alexandre and Olivier, Rapha{\"e}l}, editor = {Berthold, Michael R. and Feelders, Ad and Krempl, Georg}, title = {Transfer Learning by Learning Projections from Target to Source}, booktitle = {Advances in Intelligent Data Analysis XVIII}, year = {2020}, publisher = {Springer International Publishing}, address = {Cham}, pages = {119--131}, isbn = {978-3-030-44584-3}}

Retrieval-Based Neural Code Generation
Shirley Anugrah Hayati*, Raphael Olivier*, Pravalika Avvaru*, Pengcheng Yin, Anthony Tomasic, Graham Neubig
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), Brussels, October 2018

Bibtex

@inproceedings{Hayati18RB, title = {Retrieval-Based Neural Code Generation}, author = {Hayati, Shirley Anugrah and Olivier, Raphael and Avvaru, Pravalika and Yin, Pengcheng and Tomasic, Anthony and Neubig, Graham}, booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing}, month = oct, year = {2018}, address = {Brussels, Belgium}, publisher = {Association for Computational Linguistics}, url = {https://aclanthology.org/D18-1111}, doi = {10.18653/v1/D18-1111}, pages = {925--930}}

Education

PhD in Language Technologies
Carnegie Mellon University · Pittsburgh, PA, USA
2019 - Present
My advisor is Prof. Bhiksha Raj.
Master's in Language Technologies
Carnegie Mellon University · Pittsburgh, PA, USA
2017 - 2019
Courses: Machine Learning, Deep Learning, NLP, Machine Translation, Reinforcement Learning, Multimodal
Engineering degree
École polytechnique · Paris, France
2014 - 2017
École polytechnique is the top French “Grande École”. Admission to my program is based on a national, math-heavy competitive entrance exam. I majored in math and computer science.
Classes préparatoires
Lycée Pasteur · Paris, France
2012 - 2014
Two-year intensive preparation in math, physics and computer science for the “Grande École” entrance exams.

Work Experience

Applied Scientist Intern
Amazon Alexa · Pittsburgh, PA, USA
June 2021 - August 2021
I worked on evaluating the threat of backdoor poisoning for Speech Recognition models.
Applied Scientist Intern
Amazon Alexa · Pittsburgh, PA, USA
June 2020 - August 2020
I worked on data privacy and membership inference attacks in the context of Speech Recognition, and their relationship to adversarial attacks.
Research intern
AgroParisTech · Paris, France
April 2017 - August 2017
During this research internship in the Learning and Information Knowledge laboratory of AgroParisTech, I worked on transfer learning for time series using boosting of weak projectors, mentored by Prof. Antoine Cornuejols.
Intern
DataScienTest · Paris, France and Tel-Aviv, Israel
June 2016 - August 2016
DataScienTest is a startup that offers online data science training. I joined it at its very beginning and contributed to content creation (machine learning exercises, solutions, and correction algorithms) and backend development.

Teaching and talks

Invited Speaker
November 2022
I gave a research talk on two of our recent papers.
Invited Speaker
I gave a research talk on our recent paper.
Guest Lecturer
Introduction to Deep Learning · Pittsburgh, PA, USA
March 2022
I gave the lecture on Transformers and Graph Neural Networks during this edition of the course. Here is the video.
Teaching Assistant
Introduction to Deep Learning · Pittsburgh, PA, USA
September 2018 - May 2019
This was the primary Deep Learning course offered by Carnegie Mellon University, with over 250 enrolled students. I was a Teaching Assistant for two semesters. My responsibilities included office hours, homework creation and grading, student project mentorship, teaching recitations, and substitute lecturing. Here are some videos.