Zeyu Jin

Wenjiao Zai

J. Supercomput., January, 2025

2024

DMDSpeech: Distilled Diffusion Model Surpassing The Teacher in Zero-shot Speech Synthesis via Direct Metric Optimization.

Yinghao Aaron Li

Rithesh Kumar

CoRR, 2024

Code Drift: Towards Idempotent Neural Audio Codecs.

CoRR, 2024

Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation.

Ke Chen

Taylor Berg-Kirkpatrick

Shlomo Dubnov

Chandra Kiran Reddy Evuru

CoRR, 2024

VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap.

Sreyan Ghosh

CoRR, 2024

SpeechCraft: A Fine-Grained Expressive Speech Dataset with Natural Language Description.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

A Closer Look at the Limitations of Instruction Tuning.

Sreyan Ghosh

Chandra Kiran Reddy Evuru

Proceedings of the Forty-first International Conference on Machine Learning, 2024

GR0: Self-Supervised Global Representation Learning for Zero-Shot Voice Conversion.

Proceedings of the IEEE International Conference on Acoustics, 2024

MaskMark: Robust Neural Watermarking for Real and Synthetic Speech.

Proceedings of the IEEE International Conference on Acoustics, 2024

MDX-GAN: Enhancing Perceptual Quality in Multi-Class Source Separation Via Adversarial Training.

Ke Chen

Proceedings of the IEEE International Conference on Acoustics, 2024

SoulSkipper: A Voice-Controlled Emotional Adaptive Game to Complement Therapy for Social Anxiety Disorder.

Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, 2024

2023

HoloSinger: Semantics and Music Driven Motion Generation with Octahedral Holographic Projection.

Proceedings of the 31st ACM International Conference on Multimedia, 2023

White Box Search Over Audio Synthesizer Parameters.

Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023

Efficient Spoken Language Recognition via Multilabel Classification.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

2022

High-order Numerical Homogenization for Dissipative Ordinary Differential Equations.

Ruo Li

Multiscale Model. Simul., March, 2022

Stochastic Augmented Projected Gradient Methods for the Large-Scale Precoding Matrix Indicator Selection Problem.

IEEE Trans. Wirel. Commun., 2022

HEAR 2021: Holistic Evaluation of Audio Representations.

CoRR, 2022

Record Once, Post Everywhere: Automatic Shortening of Audio Stories for Social Media.

Bryan Wang

Gautham J. Mysore

Proceedings of the 35th Annual ACM Symposium on User Interface Software and Technology, 2022

AI Carpet: Automatic Generation of Aesthetic Carpet Pattern.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Audio Similarity is Unreliable as a Proxy for Audio Quality.

Pranay Manocha

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Controllable Speech Representation Learning Via Voice Conversion and AIC Loss.

Proceedings of the IEEE International Conference on Acoustics, 2022

SQAPP: No-Reference Speech Quality Assessment Via Pairwise Preference.

Pranay Manocha

Proceedings of the IEEE International Conference on Acoustics, 2022

Music Enhancement via Image Translation and Vocoding.

Nikhil Kandpal

Oriol Nieto

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet.

CoRR, 2021

HiFi-GAN-2: Studio-Quality Speech Enhancement via Generative Adversarial Networks Conditioned on Acoustic Features.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

HEAR: Holistic Evaluation of Audio Representations.

Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2021

Controllable deep melody generation via hierarchical music structure representation.

Proceedings of the 22nd International Society for Music Information Retrieval Conference, 2021

Compare Machine Learning Models in Text Classification Using Steam User Reviews.

Proceedings of the ICSED 2021: 3rd International Conference on Software Engineering and Development, Xiamen, China, November 19, 2021

Bandwidth Extension is All You Need.

Proceedings of the IEEE International Conference on Acoustics, 2021

Context-Aware Prosody Correction for Text-Based Speech Editing.

Proceedings of the IEEE International Conference on Acoustics, 2021

CDPAM: Contrastive Learning for Perceptual Audio Similarity.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Pose2Pose: pose selection and transfer for 2D character animation.

Nora S. Willett

Hijung Valentina Shin

Wilmot Li

Proceedings of the IUI '20: 25th International Conference on Intelligent User Interfaces, 2020

Metric learning vs classification for disentangled music representation learning.

Proceedings of the 21st International Society for Music Information Retrieval Conference, 2020

HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Controllable Neural Prosody Synthesis.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Acoustic Matching By Embedding Impulse Responses.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

F0-Consistent Many-To-Many Non-Parallel Voice Conversion Via Conditional Autoencoder.

Kaizhi Qian

Mark Hasegawa-Johnson

Gautham J. Mysore

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Disentangled Multidimensional Metric Learning for Music Similarity.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Music Creation by Example.

Emma Frid

Celso Gomes

Proceedings of the CHI '20: CHI Conference on Human Factors in Computing Systems, 2020

2019

Text-based editing of talking-head video.

ACM Trans. Graph., 2019

Perceptually-motivated Environment-specific Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, 2019

Learning Bandwidth Expansion Using Perceptually-motivated Loss.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

Speech Synthesis for Text-Based Editing of Audio Narration.

PhD thesis, 2018

FFTNet: A Real-Time Speaker-Dependent Neural Vocoder.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

VoCo: text-based insertion and replacement in audio narration.

ACM Trans. Graph., 2017

2016

Cute: A concatenative method for voice conversion using exemplar-based unit selection.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Mallo: a distributed synchronized musical instrument designed for internet performance.

Proceedings of the 15th International Conference on New Interfaces for Musical Expression, 2015

2014

AudioQuilt: 2D Arrangements of Audio Samples using Metric Learning and Kernelized Sorting.

Proceedings of the 14th International Conference on New Interfaces for Musical Expression, 2014

2013

Formal Semantics for Music Notation Control Flow.
