Yossi Adi

Orcid: 0000-0003-2237-3898

According to our database, Yossi Adi authored at least 98 papers between 2015 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

Scaling Speech Technology to 1,000+ Languages.

J. Mach. Learn. Res., 2024

Enhancing TTS Stability in Hebrew using Discrete Semantic Units.

Ella Zeldes

Or Tal

Yossi Adi

CoRR, 2024

A Suite for Acoustic Language Model Evaluation.

Gallil Maimon

Amit Roth

Yossi Adi

CoRR, 2024

LAST: Language Model Aware Speech Tokenization.

Arnon Turetzky

Yossi Adi

CoRR, 2024

Latent Watermarking of Audio Generative Models.

CoRR, 2024

Audio Enhancement from Multiple Crowdsourced Recordings: A Simple and Effective Baseline.

Shiran Aziz

Yossi Adi

Shmuel Peleg

CoRR, 2024

Discrete Flow Matching.

CoRR, 2024

Audio Conditioning for Music Generation via Discrete Bottleneck Features.

CoRR, 2024

A Language Modeling Approach to Diacritic-Free Hebrew TTS.

Amit Roth

Arnon Turetzky

Yossi Adi

CoRR, 2024

HebDB: a Weakly Supervised Dataset for Hebrew Speech Processing.

CoRR, 2024

Improving Visual Commonsense in Language Models via Multiple Image Generation.

CoRR, 2024

NAST: Noise Aware Speech Tokenization for Speech Language Models.

Shoval Messica

Yossi Adi

CoRR, 2024

Joint Audio and Symbolic Conditioning for Temporally Controlled Text-to-Music Generation.

CoRR, 2024

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units.

CoRR, 2024

The Larger the Better? Improved LLM Code-Generation via Budget Reallocation.

CoRR, 2024

Transformers are Multi-State RNNs.

CoRR, 2024

An Independence-promoting Loss for Music Generation with Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Masked Audio Generation using a Single Non-Autoregressive Transformer.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Transformers are Multi-State RNNs.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Layer Collaboration in the Forward-Forward Algorithm.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

High Fidelity Neural Audio Compression.

Trans. Mach. Learn. Res., 2023

Generative Spoken Dialogue Language Modeling.

Trans. Assoc. Comput. Linguistics, 2023

Low-Resource Self-Supervised Learning with SSL-Enhanced TTS.

CoRR, 2023

Code Llama: Open Foundation Models for Code.

CoRR, 2023

AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation.

CoRR, 2023

From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Textually Pretrained Speech Language Models.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Simple and Controllable Music Generation.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Augmentation Invariant Discrete Representation for Generative Spoken Language Modeling.

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Expresso: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

AudioGen: Textually Guided Audio Generation.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

Analysing Discrete Self Supervised Speech Representation For Spoken Language Modeling.

Amitay Sicherman

Yossi Adi

Proceedings of the IEEE International Conference on Acoustics, 2023

I Hear Your True Colors: Image Guided Audio Generation.

Roy Sheffer

Yossi Adi

Proceedings of the IEEE International Conference on Acoustics, 2023

AERO: Audio Super Resolution in the Spectral Domain.

Moshe Mandel

Or Tal

Yossi Adi

Proceedings of the IEEE International Conference on Acoustics, 2023

A Holistic Cascade System, Benchmark, and Human Evaluation Protocol for Expressive Speech-to-Speech Translation.

Proceedings of the IEEE International Conference on Acoustics, 2023

Do Coarser Units Benefit Cluster Prediction-Based Speech Pre-Training?

Proceedings of the IEEE International Conference on Acoustics, 2023

Speaking Style Conversion in the Waveform Domain Using Discrete Self-Supervised Units.

Gallil Maimon

Yossi Adi

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Generative Spoken Language Model based on continuous word-sized audio tokens.

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Regeneration.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022

Differentiable Model Compression via Pseudo Quantization Noise.

Alexandre Défossez

Yossi Adi

Gabriel Synnaeve

Trans. Mach. Learn. Res., 2022

RemixIT: Continual Self-Training of Speech Enhancement Models via Bootstrapped Remixing.

IEEE J. Sel. Top. Signal Process., 2022

ReVISE: Self-Supervised Speech Resynthesis with Visual Input for Universal and Generalized Speech Enhancement.

CoRR, 2022

Speaking Style Conversion With Discrete Self-Supervised Units.

Gallil Maimon

Yossi Adi

CoRR, 2022

Audio Language Modeling using Perceptually-Guided Discrete Representations.

CoRR, 2022

On The Robustness of Self-Supervised Representations for Spoken Language Modeling.

CoRR, 2022

textless-lib: a Library for Textless Spoken Language Processing.

CoRR, 2022

Stop: A Dataset for Spoken Task Oriented Semantic Parsing.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

On the Importance of Gradient Norm in PAC-Bayesian Bounds.

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Textless Speech-to-Speech Translation on Real Data.

Ann Lee

Hongyu Gong

Paul-Ambroise Duquenne

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Deep Audio Waveform Prior.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Systematic Comparison of Phonetic Aware Techniques for Speech Enhancement.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Probing phoneme, language and speaker information in unsupervised speech representations.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Unsupervised Symbolic Music Segmentation using Ensemble Temporal Prediction Errors.

Shahaf Bassan

Yossi Adi

Jeffrey S. Rosenschein

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Learning Discrete Structured Variational Auto-Encoder using Natural Evolution Strategies.

Proceedings of the Tenth International Conference on Learning Representations, 2022

Continual Self-Training With Bootstrapped Remixing For Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2022

Textless Speech Emotion Conversion using Discrete & Decomposed Representations.

Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Direct Speech-to-Speech Translation With Discrete Units.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Text-Free Prosody-Aware Generative Spoken Language Modeling.

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021

SAGRNN: Self-Attentive Gated RNN For Binaural Speaker Separation With Interaural Cue Preservation.

IEEE Signal Process. Lett., 2021

Textless Speech Emotion Conversion using Decomposed and Discrete Representations.

CoRR, 2021

Direct speech-to-speech translation with discrete units.

CoRR, 2021

Online Self-Attentive Gated RNNs for Real-Time Speaker Separation.

CoRR, 2021

Generative Spoken Language Modeling from Raw Audio.

CoRR, 2021

Speech Resynthesis from Discrete Disentangled Self-Supervised Representations.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

High Fidelity Speech Regeneration with Application to Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2021

Single Channel Voice Separation for Unknown Number of Speakers Under Reverberant and Noisy Settings.

Proceedings of the IEEE International Conference on Acoustics, 2021

fairseq S^2: A Scalable and Integrable Speech Synthesis Toolkit.

Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2021

Fairness in the Eyes of the Data: Certifying Machine-Learning Models.

Proceedings of the AIES '21: AAAI/ACM Conference on AI, 2021

2020

On the generalization of bayesian deep nets for multi-class classification.

CoRR, 2020

Minimal Modifications of Deep Neural Networks using Verification.

Proceedings of the LPAR 2020: 23rd International Conference on Logic for Programming, 2020

Unsupervised Cross-Domain Singing Voice Conversion.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Self-Supervised Contrastive Learning for Unsupervised Phoneme Segmentation.

Felix Kreuk

Joseph Keshet

Yossi Adi

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Hide and Speak: Towards Deep Neural Networks for Speech Steganography.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Real Time Speech Enhancement in the Waveform Domain.

Alexandre Défossez

Gabriel Synnaeve

Yossi Adi

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Voice Separation with an Unknown Number of Multiple Speakers.

Eliya Nachmani

Yossi Adi

Lior Wolf

Proceedings of the 37th International Conference on Machine Learning, 2020

Phoneme Boundary Detection Using Learnable Segmental Features.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Hide and Speak: Deep Neural Networks for Speech Steganography.

CoRR, 2019

To Reverse the Gradient or Not: an Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Fooling End-to-end Speaker Verification by Adversarial Examples.

CoRR, 2018

Turning Your Weakness Into a Strength: Watermarking Deep Neural Networks by Backdooring.

Proceedings of the 27th USENIX Security Symposium, 2018

Out-of-Distribution Detection using Multiple Semantic Label Representations.

Gabi Shalev

Yossi Adi

Joseph Keshet

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Fooling End-To-End Speaker Verification With Adversarial Examples.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Analysis of sentence embedding models using prediction tasks in natural language processing.

IBM J. Res. Dev., 2017

Learning Similarity Function for Pronunciation Variations.

Einat Naaman

Yossi Adi

Joseph Keshet

CoRR, 2017

Houdini: Fooling Deep Structured Prediction Models.

CoRR, 2017

Houdini: Fooling Deep Structured Visual and Speech Recognition Models with Adversarial Examples.

Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017

Automatic Measurement of Pre-Aspiration.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Learning Similarity Functions for Pronunciation Variations.

Einat Naaman

Yossi Adi

Joseph Keshet

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Fine-grained Analysis of Sentence Embeddings Using Auxiliary Prediction Tasks.

Proceedings of the 5th International Conference on Learning Representations, 2017

Sequence segmentation using joint RNN and structured prediction models.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

StructED: Risk Minimization in Structured Prediction.

Yossi Adi

Joseph Keshet

J. Mach. Learn. Res., 2016

Automatic measurement of vowel duration via structured prediction.

CoRR, 2016

Automatic Measurement of Voice Onset Time and Prevoicing Using Recurrent Neural Networks.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

2015

Vowel duration measurement using deep neural networks.

Yossi Adi

Joseph Keshet

Matthew Goldrick

Proceedings of the 25th IEEE International Workshop on Machine Learning for Signal Processing, 2015
