Yuki Mitsufuji

Trans. Int. Soc. Music. Inf. Retr., January, 2024

The Sound Demixing Challenge 2023 - Music Demixing Track.

Trans. Int. Soc. Music. Inf. Retr., January, 2024

HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes.

Trans. Mach. Learn. Res., 2024

Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis.

CoRR, 2024

SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation.

CoRR, 2024

TraSCE: Trajectory Steering for Concept Erasure.

CoRR, 2024

Blind Inverse Problem Solving Made Easy by Text-to-Image Latent Diffusion.

CoRR, 2024

Classifier-Free Guidance inside the Attraction Basin May Cause Memorization.

CoRR, 2024

Music Foundation Model as Generic Booster for Music Downstream Tasks.

CoRR, 2024

OpenMU: Your Swiss Army Knife for Music Understanding.

CoRR, 2024

Mitigating Embedding Collapse in Diffusion Models for Categorical Data.

CoRR, 2024

G2D2: Gradient-guided Discrete Diffusion for image inverse problem solving.

CoRR, 2024

Distillation of Discrete Diffusion through Dimensional Correlations.

CoRR, 2024

Jump Your Steps: Optimizing Sampling Schedule of Discrete Diffusion Models.

CoRR, 2024

GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models.

Muhammad Jehanzeb Mirza

CoRR, 2024

VRVQ: Variable Bitrate Residual Vector Quantization for Audio Compression.

Kyogu Lee

CoRR, 2024

Human-Feedback Efficient Reinforcement Learning for Online Diffusion Model Finetuning.

CoRR, 2024

Bellman Diffusion: Generative Modeling as Learning a Linear Operator in the Distribution Space.

Yangming Li

Chieh-Hsin Lai

Carola-Bibiane Schönlieb

Stefano Ermon

CoRR, 2024

Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models.

Muhammad Jehanzeb Mirza

CoRR, 2024

A Survey on Diffusion Models for Inverse Problems.

Alexandros G. Dimakis

Mauricio Delbracio

CoRR, 2024

A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation.

CoRR, 2024

LOCKEY: A Novel Approach to Model Authentication and Deepfake Tracking.

CoRR, 2024

Latent Diffusion Bridges for Unsupervised Musical Audio Timbre Transfer.

Giorgio Fabbro

CoRR, 2024

DisMix: Disentangling Mixtures of Musical Instruments for Source-level Pitch and Timbre Manipulation.

CoRR, 2024

GRAFX: An Open-Source Library for Audio Processing Graphs in PyTorch.

Sungho Lee

CoRR, 2024

SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond.

CoRR, 2024

Improving Unsupervised Clean-to-Rendered Guitar Tone Transformation Using GANs and Integrated Unaligned Clean Data.

Yu-Hua Chen

Woosung Choi

CoRR, 2024

ComperDial: Commonsense Persona-grounded Dialogue Dataset and Benchmark.

CoRR, 2024

SilentCipher: Deep Audio Watermarking.

CoRR, 2024

MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training.

CoRR, 2024

Searching For Music Mixing Graphs: A Pruning Approach.

Sungho Lee

CoRR, 2024

SoundCTM: Uniting Score-based and Consistency Models for Text-to-Sound Generation.

CoRR, 2024

Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning.

CoRR, 2024

Discriminator-Guided Cooperative Diffusion for Joint Audio and Video Generation.

CoRR, 2024

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation.

CoRR, 2024

Understanding Multimodal Contrastive Learning Through Pointwise Mutual Information.

CoRR, 2024

Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation.

CoRR, 2024

MR-MT3: Memory Retaining Multi-Track Music Transcription to Mitigate Instrument Leakage.

CoRR, 2024

GenWarp: Single Image to Novel Views with Semantic-Preserving Generative Warping.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher.

Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024

SpecMaskGIT: Masked Generative Modeling of Audio Spectrogram for Efficient Audio Synthesis and Beyond.

Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

Towards Assessing Data Replication in Music Generation With Music Similarity Metrics on Raw Audio.

Proceedings of the 25th International Society for Music Information Retrieval Conference, 2024

MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models.

Simon Dixon

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

SAN: Inducing Metrizability of GAN with Discriminative Normalized Linear Layer.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Manifold Preserving Guided Diffusion.

Proceedings of the Twelfth International Conference on Learning Representations, 2024

Zero- and Few-Shot Sound Event Localization and Detection.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

VRDMG: Vocal Restoration via Diffusion Posterior Sampling with Multiple Guidance.

Carlos Hernandez-Olivan

Koichi Saito

Naoki Murata

Chieh-Hsin Lai

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Enhancing Semantic Communication with Deep Generative Models: An Overview.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Timbre-Trap: A Low-Resource Framework for Instrument-Agnostic Music Transcription.

Frank Cwitkowitz

Kin Wai Cheuk

Woosung Choi

Keisuke Toyama

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

BigVSAN: Enhancing GAN-Based Neural Vocoders with Slicing Adversarial Network.

Takashi Shibuya

Yuhta Takida

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

On the Language Encoder of Contrastive Cross-modal Models.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

DiffuCOMET: Contextual Commonsense Knowledge Diffusion.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023

STARSS23: Sony-TAu Realistic Spatial Soundscapes 2023.

Aapo Hakala

Shusuke Takahashi

Dataset, March, 2023

STARSS23: Sony-TAu Realistic Spatial Soundscapes 2023.

Aapo Hakala

Shusuke Takahashi

Alexander L. Stempkovskiy

Dataset, March, 2023

Towards reporting bias in visual-language datasets: bimodal augmentation by decoupling object-attribute association.

CoRR, 2023

Enhancing Semantic Communication with Deep Generative Models - An ICASSP Special Session Overview.

CoRR, 2023

The Sound Demixing Challenge 2023 - Cinematic Demixing Track.

Tatiana Habruseva

Mikhail Sukhovei

CoRR, 2023

On the Equivalence of Consistency-Type Models: Consistency Models, Consistent Diffusion Models, and Fokker-Planck Regularization.

CoRR, 2023

The Whole Is Greater than the Sum of Its Parts: Improving DNN-based Music Source Separation.

CoRR, 2023

Diffusion-based Signal Refiner for Speech Separation.

CoRR, 2023

Cross-modal Face- and Voice-style Transfer.

CoRR, 2023

Adversarially Slicing Generative Networks: Discriminator Slices Feature for One-Dimensional Optimal Transport.

CoRR, 2023

Extending Audio Masked Autoencoders toward Audio Restoration.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events.

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Automatic Piano Transcription With Hierarchical Frequency-Time Transformer.

Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023

Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

GibbsDDRM: A Partially Collapsed Gibbs Sampler for Solving Blind Inverse Problems with Denoising Diffusion Restoration.

Proceedings of the International Conference on Machine Learning, 2023

FP-Diffusion: Improving Score-based Diffusion Models by Enforcing the Underlying Score Fokker-Planck Equation.

Proceedings of the International Conference on Machine Learning, 2023

CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos.

Taylor Berg-Kirkpatrick

Proceedings of the Eleventh International Conference on Learning Representations, 2023

An Attention-Based Approach to Hierarchical Multi-Label Music Instrument Classification.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Hierarchical Diffusion Models for Singing Voice Neural Vocoder.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Unsupervised Vocal Dereverberation with Diffusion-Based Generative Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects.

Junghyun Koo

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

DiffRoll: Diffusion-Based Generative Music Transcription with Unsupervised Pretraining Capability.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

STARSS22: Sony-TAu Realistic Spatial Soundscapes 2022 dataset.

Sharath Adavanne

Yuichiro Koyama

Shusuke Takahashi

Tuomas Virtanen

Dataset, May, 2022

STARSS22: Sony-TAu Realistic Spatial Soundscapes 2022 dataset.

Adavanne Politis

Dataset, March, 2022

Preventing oversmoothing in VAE via generalized variance parameterization.

Neurocomputing, 2022

A Versatile Diffusion-based Generative Refiner for Speech Enhancement.

CoRR, 2022

Robust One-Shot Singing Voice Conversion.

CoRR, 2022

Regularizing Score-based Models with Score Fokker-Planck Equations.

CoRR, 2022

Removing Distortion Effects in Music Using Deep Neural Networks.

Johannes Imort

Giorgio Fabbro

Yuichiro Koyama

CoRR, 2022

Automatic music mixing with deep learning and out-of-domain data.

Proceedings of the 23rd International Society for Music Information Retrieval Conference, 2022

Distortion Audio Effects: Learning How to Recover the Clean Signal.

Johannes Imort

Giorgio Fabbro

Yuichiro Koyama

Proceedings of the 23rd International Society for Music Information Retrieval Conference, 2022

SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization.

Proceedings of the International Conference on Machine Learning, 2022

Amicable Examples for Informed Source Separation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Multi-ACCDOA: Localizing And Detecting Overlapping Sounds From The Same Class With Auxiliary Duplicating Permutation Invariant Training.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Spatial Mixup: Directional Loudness Modification as Data Augmentation for Sound Event Localization and Detection.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Music Source Separation With Deep Equilibrium Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Automatic DJ Transitions with Differentiable Audio Effects and Generative Adversarial Networks.

Bo-Yu Chen

Wei-Han Hsu

Yi-Hsuan Yang

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

ComFact: A Benchmark for Linking Contextual Commonsense Knowledge.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

STARSS22: A Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events.

Proceedings of the 7th Workshop on Detection and Classification of Acoustic Scenes and Events 2022, 2022

2021

Multichannel Blind Source Separation Based on Evanescent-Region-Aware Non-Negative Tensor Factorization in Spherical Harmonic Domain.

IEEE ACM Trans. Audio Speech Lang. Process., 2021

Source Mixing and Separation Robust Audio Steganography.

CoRR, 2021

Music Demixing Challenge at ISMIR 2021.

CoRR, 2021

Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection.

CoRR, 2021

Training Speech Enhancement Systems with Noisy Speech Datasets.

CoRR, 2021

Preventing Posterior Collapse Induced by Oversmoothing in Gaussian VAE.

CoRR, 2021

Hierarchical disentangled representation learning for singing voice conversion.

Proceedings of the International Joint Conference on Neural Networks, 2021

Adversarial Attacks on Audio Source Separation.

Shota Inoue

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

All For One And One For All: Improving Music Separation By Bridging Networks.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Densely Connected Multi-Dilated Convolutional Networks for Dense Prediction Tasks.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Psychophysiological Effect of Immersive Spatial Audio Experience Enhanced Using Sound Field Synthesis.

Proceedings of the 9th International Conference on Affective Computing and Intelligent Interaction, 2021

2020

Multichannel Non-Negative Matrix Factorization Using Banded Spatial Covariance Matrices in Wavenumber Domain.

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Spherical-Harmonic-Domain Feedforward Active Noise Control Using Sparse Decomposition of Reference Signals from Distributed Sensor Arrays.

Yu Maeno

Prasanga N. Samarasinghe

Naoki Murata

Thushara D. Abhayapala

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Densely connected multidilated convolutional networks for dense prediction tasks.

CoRR, 2020

D3Net: Densely connected multidilated DenseNet for music source separation.

CoRR, 2020

Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3net.

CoRR, 2020

Improving Voice Separation by Incorporating End-To-End Speech Recognition.

Sudarsanam Parthasaarathy

Sakya Basak

Sriram Ganapathy

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Array-Geometry-Aware Spatial Active Noise Control Based on Direction-of-Arrival Weighting.

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019

Open-Unmix - A Reference Implementation for Music Source Separation.

J. Open Source Softw., 2019

Closing the Training/Inference Gap for Deep Attractor Networks.

CoRR, 2019

Recursive Speech Separation for Unknown Number of Speakers.

Sudarsanam Parthasaarathy

Nabarun Goswami

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Global and Local Mode-domain Adaptive Algorithms for Spatial Active Noise Control Using Higher-order Sources.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

2018

Improving DNN-based Music Source Separation using Phase Features.

CoRR, 2018

MMDenseLSTM: An Efficient Combination of Convolutional and Recurrent Neural Networks for Audio Source Separation.

Nabarun Goswami

Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018

Mode-Domain Spatial Active Noise Control Using Multiple Circular Arrays.

Yu Maeno

Prasanga N. Samarasinghe

Thushara D. Abhayapala

Proceedings of the 16th International Workshop on Acoustic Signal Enhancement, 2018

PhaseNet: Discretized Phase Modeling with Deep Neural Networks for Audio Source Separation.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Mode Domain Spatial Active Noise Control Using Sparse Signal Representation.

Yu Maeno

Thushara D. Abhayapala

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

2017

Multi-Scale Multi-Band DenseNets for Audio Source Separation.

Proceedings of the 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017

Improving music source separation based on deep neural networks through data augmentation and network blending.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Supervised monaural source separation based on autoencoders.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

2016

Multichannel blind source separation based on non-negative tensor factorization in wavenumber domain.

Shoichi Koyama

Hiroshi Saruwatari

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

2015

Deep neural network based instrument extraction from music.

Franck Giron

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

NMF-based blind source separation using a linear predictive coding error clustering criterion.

Xin Guo

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

2014

On the use of a spatial cue as prior information for stereo sound source separation based on spatially weighted non-negative tensor factorization.

Axel Roebel

EURASIP J. Adv. Signal Process., 2014

Online Non-negative Tensor Deconvolution for Source Detection in 3DTV Audio.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

2013

Sound source separation based on non-negative tensor factorization incorporating spatial cue as prior knowledge.
