Yossi Adi

Orcid: 0000-0003-2237-3898

According to our database, Yossi Adi authored at least 98 papers between 2015 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Scaling Speech Technology to 1,000+ Languages.
J. Mach. Learn. Res., 2024

Enhancing TTS Stability in Hebrew using Discrete Semantic Units.
CoRR, 2024

A Suite for Acoustic Language Model Evaluation.
CoRR, 2024

LAST: Language Model Aware Speech Tokenization.
CoRR, 2024

Latent Watermarking of Audio Generative Models.
CoRR, 2024

Audio Enhancement from Multiple Crowdsourced Recordings: A Simple and Effective Baseline.
CoRR, 2024

Discrete Flow Matching.
CoRR, 2024

Audio Conditioning for Music Generation via Discrete Bottleneck Features.
CoRR, 2024

A Language Modeling Approach to Diacritic-Free Hebrew TTS.
CoRR, 2024

HebDB: a Weakly Supervised Dataset for Hebrew Speech Processing.
CoRR, 2024

Improving Visual Commonsense in Language Models via Multiple Image Generation.
CoRR, 2024

NAST: Noise Aware Speech Tokenization for Speech Language Models.
CoRR, 2024

Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation.
CoRR, 2024

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units.
CoRR, 2024

The Larger the Better? Improved LLM Code-Generation via Budget Reallocation.
CoRR, 2024

Transformers are Multi-State RNNs.
CoRR, 2024

An Independence-promoting Loss for Music Generation with Language Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Masked Audio Generation using a Single Non-Autoregressive Transformer.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Transformers are Multi-State RNNs.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Layer Collaboration in the Forward-Forward Algorithm.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
High Fidelity Neural Audio Compression.
Trans. Mach. Learn. Res., 2023

Generative Spoken Dialogue Language Modeling.
Trans. Assoc. Comput. Linguistics, 2023

Low-Resource Self-Supervised Learning with SSL-Enhanced TTS.
CoRR, 2023

Code Llama: Open Foundation Models for Code.
CoRR, 2023

AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation.
CoRR, 2023

From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Textually Pretrained Speech Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Simple and Controllable Music Generation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling.
Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

AudioGen: Textually Guided Audio Generation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Analysing Discrete Self Supervised Speech Representation For Spoken Language Modeling.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

I Hear Your True Colors: Image Guided Audio Generation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

AERO: Audio Super Resolution in the Spectral Domain.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

A Holistic Cascade System, Benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training?
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Generative Spoken Language Model based on continuous word-sized audio tokens.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Regeneration.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Differentiable Model Compression via Pseudo Quantization Noise.
Trans. Mach. Learn. Res., 2022

RemixIT: Continual Self-Training of Speech Enhancement Models via Bootstrapped Remixing.
IEEE J. Sel. Top. Signal Process., 2022

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement.
CoRR, 2022

Speaking Style Conversion With Discrete Self-Supervised Units.
CoRR, 2022

Audio Language Modeling using Perceptually-Guided Discrete Representations.
CoRR, 2022

On The Robustness of Self-Supervised Representations for Spoken Language Modeling.
CoRR, 2022

textless-lib: a Library for Textless Spoken Language Processing.
CoRR, 2022

STOP: A Dataset for Spoken Task Oriented Semantic Parsing.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

On the Importance of Gradient Norm in PAC-Bayesian Bounds.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Textless Speech-to-Speech Translation on Real Data.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Deep Audio Waveform Prior.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Probing phoneme, language and speaker information in unsupervised speech representations.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Unsupervised Symbolic Music Segmentation using Ensemble Temporal Prediction Errors.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Learning Discrete Structured Variational Auto-Encoder using Natural Evolution Strategies.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Continual Self-Training With Bootstrapped Remixing For Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Textless Speech Emotion Conversion using Discrete & Decomposed Representations.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Direct Speech-to-Speech Translation With Discrete Units.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Text-Free Prosody-Aware Generative Spoken Language Modeling.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
SAGRNN: Self-Attentive Gated RNN For Binaural Speaker Separation With Interaural Cue Preservation.
IEEE Signal Process. Lett., 2021

Textless Speech Emotion Conversion using Decomposed and Discrete Representations.
CoRR, 2021

Direct speech-to-speech translation with discrete units.
CoRR, 2021

Online Self-Attentive Gated RNNs for Real-Time Speaker Separation.
CoRR, 2021

Generative Spoken Language Modeling from Raw Audio.
CoRR, 2021

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

High Fidelity Speech Regeneration with Application to Speech Enhancement.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Single Channel Voice Separation for Unknown Number of Speakers Under Reverberant and Noisy Settings.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2021

Fairness in the Eyes of the Data: Certifying Machine-Learning Models.
Proceedings of the AIES '21: AAAI/ACM Conference on AI, Ethics, and Society, 2021

2020
On the generalization of Bayesian deep nets for multi-class classification.
CoRR, 2020

Minimal Modifications of Deep Neural Networks using Verification.
Proceedings of the LPAR 2020: 23rd International Conference on Logic for Programming, Artificial Intelligence and Reasoning, 2020

Unsupervised Cross-Domain Singing Voice Conversion.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Hide and Speak: Towards Deep Neural Networks for Speech Steganography.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Real Time Speech Enhancement in the Waveform Domain.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Voice Separation with an Unknown Number of Multiple Speakers.
Proceedings of the 37th International Conference on Machine Learning, 2020

Phoneme Boundary Detection Using Learnable Segmental Features.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019
Hide and Speak: Deep Neural Networks for Speech Steganography.
CoRR, 2019

To Reverse the Gradient or Not: an Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018
Fooling End-to-end Speaker Verification by Adversarial Examples.
CoRR, 2018

Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring.
Proceedings of the 27th USENIX Security Symposium, 2018

Out-of-Distribution Detection using Multiple Semantic Label Representations.
Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Fooling End-To-End Speaker Verification With Adversarial Examples.
Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017
Analysis of sentence embedding models using prediction tasks in natural language processing.
IBM J. Res. Dev., 2017

Learning Similarity Function for Pronunciation Variations.
CoRR, 2017

Houdini: Fooling Deep Structured Prediction Models.
CoRR, 2017

Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples.
Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Automatic Measurement of Pre-Aspiration.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Learning Similarity Functions for Pronunciation Variations.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks.
Proceedings of the 5th International Conference on Learning Representations, 2017

Sequence segmentation using joint RNN and structured prediction models.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

2016
StructED: Risk Minimization in Structured Prediction.
J. Mach. Learn. Res., 2016

Automatic measurement of vowel duration via structured prediction.
CoRR, 2016

Automatic Measurement of Voice Onset Time and Prevoicing Using Recurrent Neural Networks.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015
Vowel duration measurement using deep neural networks.
Proceedings of the 25th IEEE International Workshop on Machine Learning for Signal Processing, 2015

