Anurag Kumar

Affiliations:

Facebook Research, Facebook Reality Labs, Redmond, WA USA
Indian Institute of Technology, Kanpur, India (former)

According to our database, Anurag Kumar authored at least 74 papers between 2012 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2024

Neural-Network-Based Direction-of-Arrival Estimation for Reverberant Speech - The Importance of Energetic, Temporal, and Spatial Information.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Improved direction of arrival estimations with a wearable microphone array for dynamic environments by reliability weighting.

CoRR, 2024

High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching.

CoRR, 2024

AV-CrossNet: an Audiovisual Complex Spectral Mapping Network for Speech Separation By Leveraging Narrow- and Cross-Band Modeling.

Vahid Ahmadi Kalkhorani

CoRR, 2024

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement.

CoRR, 2024

Cross-Talk Reduction.

Zhong-Qiu Wang

Anurag Kumar

Shinji Watanabe

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Ambisonics Networks - The Effect of Radial Functions Regularization.

Proceedings of the IEEE International Conference on Acoustics, 2024

A Closer Look at Wav2vec2 Embeddings for On-Device Single-Channel Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2024

Audiovisual Speaker Separation with Full- and Sub-Band Modeling in the Time-Frequency Domain.

Vahid Ahmadi Kalkhorani

Proceedings of the IEEE International Conference on Acoustics, 2024

Spherical World-Locking for Audio-Visual Localization in Egocentric Videos.

Heeseung Yun

Ruohan Gao

Ishwarya Ananthabhotla

Proceedings of the Computer Vision - ECCV 2024, 2024

Real Acoustic Fields: An Audio-Visual Room Acoustics Dataset and Benchmark.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch.

CoRR, 2023

Rethinking Complex-Valued Deep Neural Networks for Monaural Speech Enhancement.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Spatialization Quality Metric for Binaural Speech.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Time-domain Transformer-based Audiovisual Speaker Separation.

Vahid Ahmadi Kalkhorani

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2023

Paaploss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2023

LA-VOCE: LOW-SNR Audio-Visual Speech Enhancement Using Neural Vocoders.

Proceedings of the IEEE International Conference on Acoustics, 2023

Nord: Non-Matching Reference Based Relative Depth Estimation from Binaural Speech.

Proceedings of the IEEE International Conference on Acoustics, 2023

Torchaudio-Squim: Reference-Less Speech Quality and Intelligibility Measures in Torchaudio.

Proceedings of the IEEE International Conference on Acoustics, 2023

Leveraging Heteroscedastic Uncertainty in Learning Complex Spectral Mapping for Single-Channel Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2023

TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for Pytorch.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

RemixIT: Continual Self-Training of Speech Enhancement Models via Bootstrapped Remixing.

IEEE J. Sel. Top. Signal Process., 2022

Direction Of Arrival Estimation For Reverberant Speech Based On Neural Networks And The Direct-Path Dominance Test.

Proceedings of the 17th International Workshop on Acoustic Signal Enhancement, 2022

Improving Speech Enhancement through Fine-Grained Speech Characteristics.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SAQAM: Spatial Audio Quality Assessment Metric.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Speech Quality Assessment through MOS using Non-Matching References.

Pranay Manocha

Anurag Kumar

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Time-domain Ad-hoc Array Speech Enhancement Using a Triple-path Network.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Continual Self-Training With Bootstrapped Remixing For Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2022

Conformer-Based Self-Supervised Learning For Non-Speech Audio Tasks.

Proceedings of the IEEE International Conference on Acoustics, 2022

Multichannel Speech Enhancement Without Beamforming.

Proceedings of the IEEE International Conference on Acoustics, 2022

TPARN: Triple-Path Attentive Recurrent Network for Time-Domain Multichannel Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2022

The Impact of Removing Head Movements on Audio-Visual Speech Enhancement.

Zhiqi Kang

Mostafa Sadeghi

Radu Horaud

Xavier Alameda-Pineda

Jacob Donley

Anurag Kumar

Proceedings of the IEEE International Conference on Acoustics, 2022

Audio Signal Processing for Telepresence Based on Wearable Array in Noisy and Dynamic Scenes.

Proceedings of the IEEE International Conference on Acoustics, 2022

Ego4D: Around the World in 3,000 Hours of Egocentric Video.

Santhosh Kumar Ramakrishnan

Christoph Feichtenhofer

Giovanni Maria Farinella

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021

SAGRNN: Self-Attentive Gated RNN For Binaural Speaker Separation With Interaural Cue Preservation.

IEEE Signal Process. Lett., 2021

NICE-Beam: Neural Integrated Covariance Estimators for Time-Varying Beamformers.

CoRR, 2021

TADRN: Triple-Attentive Dual-Recurrent Network for Ad-hoc Array Multichannel Speech Enhancement.

CoRR, 2021

Ego4D: Around the World in 3,000 Hours of Egocentric Video.

Santhosh Kumar Ramakrishnan

Christoph Feichtenhofer

Giovanni Maria Farinella

CoRR, 2021

Online Self-Attentive Gated RNNs for Real-Time Speaker Separation.

CoRR, 2021

DPLM: A Deep Perceptual Spatial-Audio Localization Metric.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

NORESQA: A Framework for Speech Quality Assessment using Non-Matching References.

Pranay Manocha

Buye Xu

Anurag Kumar

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Do Sound Event Representations Generalize to Other Audio Tasks? A Case Study in Audio Transfer Learning.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multi-Channel Speech Enhancement Using Graph Neural Networks.

Panagiotis Tzirakis

Anurag Kumar

Jacob Donley

Proceedings of the IEEE International Conference on Acoustics, 2021

Incorporating Real-World Noisy Speech in Neural-Network-Based Speech Enhancement Systems.

Yangyang Xia

Buye Xu

Anurag Kumar

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

A Sequential Self Teaching Approach for Improving Generalization in Sound Event Recognition.

Anurag Kumar

Vamsi K. Ithapu

Proceedings of the 37th International Conference on Machine Learning, 2020

SeCoST: Sequential Co-Supervision for Large Scale Weakly Labeled Audio Event Detection.

Anurag Kumar

Vamsi Krishna Ithapu

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

SeCoST: Sequential Co-Supervision for Weakly Labeled Audio Event Detection.

Anurag Kumar

Vamsi Krishna Ithapu

CoRR, 2019

Learning Sound Events from Webly Labeled Data.

Anurag Kumar

Ankit Shah

Alexander G. Hauptmann

Bhiksha Raj

Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

2018

A Closer Look at Weak Label Learning for Audio Events.

Ankit Shah

Anurag Kumar

Alexander G. Hauptmann

Bhiksha Raj

CoRR, 2018

NELS - Never-Ending Learner of Sounds.

CoRR, 2018

Classifier Risk Estimation Under Limited Labeling Resources.

Anurag Kumar

Bhiksha Raj

Proceedings of the Advances in Knowledge Discovery and Data Mining, 2018

Content-Based Representations of Audio Using Siamese Neural Networks.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Knowledge Transfer from Weakly Labeled Audio Using Convolutional Neural Network for Sound Events and Scenes.

Anurag Kumar

Maksim Khadkevich

Christian Fügen

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Framework for Evaluation of Sound Event Detection in Web Videos.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Deep CNN Framework for Audio Event Recognition using Weakly Labeled Web Data.

Anurag Kumar

Bhiksha Raj

CoRR, 2017

Audio Content Based Geotagging in Multimedia.

Anurag Kumar

Benjamin Elizalde

Bhiksha Raj

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Audio event and scene recognition: A unified approach using strongly and weakly labeled data.

Bhiksha Raj

Anurag Kumar

Proceedings of the 2017 International Joint Conference on Neural Networks, 2017

Discovering sound concepts and acoustic relations in text.

Anurag Kumar

Bhiksha Raj

Ndapandula Nakashole

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

An approach for self-training audio event detectors using web data.

Proceedings of the 25th European Signal Processing Conference, 2017

2016

An Approach for Self-Training Audio Event Detectors Using Web Data.

CoRR, 2016

Features and Kernels for Audio Event Recognition.

Anurag Kumar

Bhiksha Raj

CoRR, 2016

Audio Event Detection using Weakly Labeled Data.

Anurag Kumar

Bhiksha Raj

Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Speech Enhancement in Multiple-Noise Conditions Using Deep Neural Networks.

Anurag Kumar

Dinei A. F. Florêncio

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Weakly supervised scalable audio content analysis.

Anurag Kumar

Bhiksha Raj

Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Experiments on the DCASE Challenge 2016: Acoustic Scene Classification and Sound Event Detection in Real Life Recording.

Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2016

2015

Unsupervised Fusion Weight Learning in Multiple Classifier Systems.

Anurag Kumar

Bhiksha Raj

CoRR, 2015

CMU Informedia@TRECVID 2015: MED/SIN/LNK/SED.

Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

A novel ranking method for multiple classifier systems.

Anurag Kumar

Bhiksha Raj

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014

Informedia @ TRECVID 2014.

Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014

Monaural speaker segregation using group delay spectral matrix factorization.

Karan Nathwani

Anurag Kumar

Rajesh M. Hegde

Proceedings of the Twentieth National Conference on Communications, 2014

Detecting sound objects in audio recordings.

Anurag Kumar

Rita Singh

Bhiksha Raj

Proceedings of the 22nd European Signal Processing Conference, 2014

2013

Event detection in short duration audio using Gaussian Mixture Model and Random Forest Classifier.

Proceedings of the 21st European Signal Processing Conference, 2013

2012

Audio event detection from acoustic unit occurrence patterns.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
