Nima Mesgarani

Orcid: 0000-0002-2987-759X

According to our database, Nima Mesgarani authored at least 90 papers between 2004 and 2024.

Bibliography

2024
StyleTTS-ZS: Efficient High-Quality Zero-Shot Text-to-Speech Synthesis with Distilled Time-Varying Style Diffusion.
CoRR, 2024

Just ASR + LLM? A Study on Speech Large Language Models' Ability to Identify and Understand Speaker in Spoken Dialogue.
CoRR, 2024

Style-Talker: Finetuning Audio Language Model and Style-Based Text-to-Speech Model for Fast Spoken Dialogue Generation.
CoRR, 2024

DeepSpeech models show Human-like Performance and Processing of Cochlear Implant Inputs.
CoRR, 2024

Speech Slytherin: Examining the Performance and Efficiency of Mamba for Speech Separation, Recognition, and Synthesis.
CoRR, 2024

SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model.
CoRR, 2024

Dual-path Mamba: Short and Long-term Bidirectional Selective Structured State Space Models for Speech Separation.
CoRR, 2024

Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience.
CoRR, 2024

Contextual Feature Extraction Hierarchies Converge in Large Language Models and the Brain.
CoRR, 2024

Exploring Self-supervised Contrastive Learning of Spatial Sound Event Representation.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
naplib-python: Neural acoustic data processing and analysis tools in python.
Softw. Impacts, September, 2023

Deep neural networks effectively model neural adaptation to changing background noise and suggest nonlinear noise filtering methods in auditory cortex.
NeuroImage, 2023

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform.
CoRR, 2023

SLMGAN: Exploiting Speech Language Model Representations for Unsupervised Zero-Shot Voice Conversion in GANs.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

StyleTTS 2: Towards Human-Level Text-to-Speech through Style Diffusion and Adversarial Training with Large Speech Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Phoneme-Level BERT for Enhanced Prosody of Text-To-Speech with Grapheme Predictions.
Proceedings of the IEEE International Conference on Acoustics, 2023

Online Binaural Speech Separation Of Moving Speakers With A Wavesplit Network.
Proceedings of the IEEE International Conference on Acoustics, 2023

Improved Decoding of Attentional Selection in Multi-Talker Environments with Self-Supervised Learned Speech Representation.
Proceedings of the 45th Annual International Conference of the IEEE Engineering in Medicine & Biology Society, 2023

2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis.
CoRR, 2022

StyleTTS-VC: One-Shot Voice Conversion by Knowledge Transfer From Style-Based TTS Models.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

2021
Group Communication With Context Codec for Lightweight Source Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Neural representation of linguistic feature hierarchy reflects second-language proficiency.
NeuroImage, 2021

Functional characterization of human Heschl's gyrus in response to natural speech.
NeuroImage, 2021

Distortion-Controlled Training for end-to-end Reverberant Speech Separation with Auxiliary Autoencoding Loss.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Understanding Adaptive, Multiscale Temporal Integration In Deep Speech Recognition Systems.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Implicit Filter-and-Sum Network for End-to-End Multi-Channel Speech Separation.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Empirical Analysis of Generalized Iterative Speech Separation Networks.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

StarGANv2-VC: A Diverse, Unsupervised, Non-Parallel Framework for Natural-Sounding Voice Conversion.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Binaural Speech Separation of Moving Speakers With Preserved Spatial Cues.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Continuous Speech Separation Using Speaker Inventory for Long Recording.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Ultra-Lightweight Speech Separation Via Group Communication.
Proceedings of the IEEE International Conference on Acoustics, 2021

Rethinking The Separation Layers In Speech Separation Networks.
Proceedings of the IEEE International Conference on Acoustics, 2021

Speaker and Direction Inferred Dual-Channel Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Brain-informed speech separation (BISS) for enhancement of target speaker in multitalker speech perception.
NeuroImage, 2020

Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording.
CoRR, 2020

Group Communication with Context Codec for Ultra-Lightweight Source Separation.
CoRR, 2020

Implicit Filter-and-sum Network for Multi-channel Speech Separation.
CoRR, 2020

Ultra-Lightweight Speech Separation via Group Communication.
CoRR, 2020

Separating Varying Numbers of Sources with Auxiliary Autoencoding Loss.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-end Microphone Permutation and Number Invariant Multi-channel Speech Separation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Real-Time Binaural Speech Separation with Preserved Spatial Cues.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Online Deep Attractor Network for Real-time Single-channel Speech Separation.
Proceedings of the IEEE International Conference on Acoustics, 2019

Augmented Time-frequency Mask Estimation in Cluster-based Source Separation Algorithms.
Proceedings of the IEEE International Conference on Acoustics, 2019

FaSNet: Low-Latency Adaptive Beamforming for Multi-Microphone Audio Processing.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018
Speaker-Independent Speech Separation With Deep Attractor Network.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

TasNet: Surpassing Ideal Time-Frequency Masking for Speech Separation.
CoRR, 2018

Speech Processing in the Human Brain Meets Deep Learning.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Music Source Activity Detection and Separation Using Deep Attractor Network.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Real-time Single-channel Dereverberation and Separation with Time-domain Audio Separation Network.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Lip2Audspec: Speech Reconstruction from Silent Lip Movements Video.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

TasNet: Time-Domain Audio Separation Network for Real-Time, Single-Channel Speech Separation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017
Understanding the Representation and Computation of Multilayer Perceptrons: A Case Study in Speech Recognition.
Proceedings of the 34th International Conference on Machine Learning, 2017

Deep clustering and conventional networks for music separation: Stronger together.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

NAPLib: An open source toolbox for real-time and offline Neural Acoustic Processing.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Deep attractor network for single-microphone speaker separation.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Neural decoding of attentional selection in multi-speaker environments without access to separated sources.
Proceedings of the 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2017

2016
On the Role of Nonlinear Transformations in Deep Neural Network Acoustic Models.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Adaptation of Neural Networks Constrained by Prior Statistics of Node Co-Activations.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Synaptic depression in deep neural networks for speech processing.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Designing a hands-on brain computer interface laboratory course.
Proceedings of the 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2016

Analyzing distributional learning of phonemic categories in unsupervised deep neural networks.
Proceedings of the 38th Annual Meeting of the Cognitive Science Society, 2016

2015
Keynote addresses: Reverse engineering the neural mechanisms involved in robust speech processing.
Proceedings of the 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2015

Speech reconstruction from human auditory cortex with deep neural networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Exploring how deep neural networks form phonemic categories.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

2014
Stimulus Reconstruction from Cortical Responses.
Encyclopedia of Computational Neuroscience, 2014

Principal components of auditory spectro-temporal receptive fields.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013
Developing a speaker identification system for the DARPA RATS project.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
Automatic intelligibility assessment of pathologic speech in head and neck cancer based on auditory-inspired spectro-temporal modulations.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Acoustic and Data-driven Features for Robust Speech Activity Detection.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Developing a Speech Activity Detection System for the DARPA RATS Program.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Speech and speaker separation in human auditory cortex.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012


2011
Performance monitoring for robustness in automatic recognition of speech.
Proceedings of the 2011 Symposium on Machine Learning in Speech and Language Processing, 2011

Adaptive Stream Fusion in Multistream Recognition of Speech.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Speech processing with a cortical representation of audio.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
Data-Driven and Feedback Based Spectro-Temporal Features for Speech Recognition.
IEEE Signal Process. Lett., 2010

A computational model of rapid task-related plasticity of auditory cortical receptive fields.
J. Comput. Neurosci., 2010

The use of spike-based representations for hardware audition systems.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2010), May 30, 2010

A phoneme recognition framework based on auditory spectro-temporal receptive fields.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A multistream multiresolution framework for phoneme recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Nonlinear filtering of spectrotemporal modulations in speech enhancement.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
Discriminant spectrotemporal features for phoneme recognition.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

2007
Denoising in the Domain of Spectrotemporal Modulations.
EURASIP J. Audio Speech Music. Process., 2007

Representation of Phonemes in Primary Auditory Cortex: How the Brain Analyzes Speech.
Proceedings of the IEEE International Conference on Acoustics, 2007

2006
Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations.
IEEE Trans. Speech Audio Process., 2006

Discriminating speech and non-speech with regularized least squares.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

2005
Speech Enhancement Based on Filtering the Spectrotemporal Modulations.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004
Speech discrimination based on multiscale spectro-temporal modulations.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

