Zhiyao Duan

Orcid: 0000-0002-8334-9974

Affiliations:
  • University of Rochester, USA
  • Northwestern University, USA (former)


According to our database, Zhiyao Duan authored at least 125 papers between 2007 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Measure by Measure: Measure-Based Automatic Music Composition with Modern Staff Notation.
Trans. Int. Soc. Music. Inf. Retr., January, 2024

Cacophony: An Improved Contrastive Audio-Text Model.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

MusicHiFi: Fast High-Fidelity Stereo Vocoding.
IEEE Signal Process. Lett., 2024

SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.
CoRR, 2024

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition.
CoRR, 2024

A Multi-Stream Fusion Approach with One-Class Learning for Audio-Visual Deepfake Detection.
CoRR, 2024

GTR-Voice: Articulatory Phonetics Informed Controllable Expressive Speech Synthesis.
CoRR, 2024

CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.
CoRR, 2024

SVDD Challenge 2024: A Singing Voice Deepfake Detection Challenge Evaluation Plan.
CoRR, 2024

Scoring Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription.
CoRR, 2024

Toward Fully Self-Supervised Multi-Pitch Estimation.
CoRR, 2024

Cacophony: An Improved Contrastive Audio-Text Model.
CoRR, 2024

Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech.
Proceedings of the IEEE International Conference on Acoustics, 2024

SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription.
Proceedings of the IEEE International Conference on Acoustics, 2024

SingFake: Singing Voice Deepfake Detection.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
Editorial for TISMIR Special Collection: Cultural Diversity in MIR Research.
Trans. Int. Soc. Music. Inf. Retr., January, 2023

EDMSound: Spectrogram Based Diffusion Models for Efficient and High-Quality Audio Synthesis.
CoRR, 2023

Mitigating Cross-Database Differences for Learning Unified HRTF Representation.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

Harmonic Analysis With Neural Semi-CRF.
Proceedings of the 24th International Society for Music Information Retrieval Conference, 2023

Phase perturbation improves channel robustness for speech spoofing countermeasures.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Transcription Free Filler Word Detection with Neural Semi-CRFs.
Proceedings of the IEEE International Conference on Acoustics, 2023

HRTF Field: Unifying Measured HRTF Magnitude Representation with Neural Fields.
Proceedings of the IEEE International Conference on Acoustics, 2023

SingNet: a real-time Singing Voice beat and Downbeat Tracking System.
Proceedings of the IEEE International Conference on Acoustics, 2023

SAMO: Speaker Attractor Multi-Center One-Class Learning For Voice Anti-Spoofing.
Proceedings of the IEEE International Conference on Acoustics, 2023

2022
Speech Driven Talking Face Generation From a Single Image and an Emotion Condition.
IEEE Trans. Multim., 2022

Draw and Listen! A Sketch-Based System for Music Inpainting.
Trans. Int. Soc. Music. Inf. Retr., 2022

Music Source Separation With Generative Flow.
IEEE Signal Process. Lett., 2022

ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Rhythm.
CoRR, 2022

Predicting Global Head-Related Transfer Functions From Scanned Head Geometry Using Deep Learning and Compact Representations.
CoRR, 2022

A Data-Driven Methodology for Considering Feasibility and Pairwise Likelihood in Deep Learning Based Guitar Tablature Transcription Systems.
CoRR, 2022

A New Fusion Strategy for Spoofing Aware Speaker Verification.
CoRR, 2022

A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification.
Proceedings of the Odyssey 2022: The Speaker and Language Recognition Workshop, 28 June, 2022

DyViSE: Dynamic Vision-Guided Speaker Embedding for Audio-Visual Speaker Diarization.
Proceedings of the 24th IEEE International Workshop on Multimedia Signal Processing, 2022

Rethinking Audio-Visual Synchronization for Active Speaker Detection.
Proceedings of the 32nd IEEE International Workshop on Machine Learning for Signal Processing, 2022

Singing beat tracking with Self-supervised front-end and linear transformers.
Proceedings of the 23rd International Society for Music Information Retrieval Conference, 2022

A Study of The Robustness of Raw Waveform Based Speaker Embeddings Under Mismatched Conditions.
Proceedings of the IEEE International Conference on Acoustics, 2022

Progressive Teacher-Student Training Framework for Music Tagging.
Proceedings of the IEEE International Conference on Acoustics, 2022

A Novel 1D State Space for Efficient Music Rhythmic Analysis.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Audiovisual Singing Voice Separation.
Trans. Int. Soc. Music. Inf. Retr., 2021

One-Class Learning Towards Synthetic Voice Spoofing Detection.
IEEE Signal Process. Lett., 2021

Learning Sparse Analytic Filters for Piano Transcription.
CoRR, 2021

UR Channel-Robust Synthetic Speech Detection System for ASVspoof 2021.
CoRR, 2021

Skipping the Frame-Level: Event-Based Piano Transcription With Neural Semi-CRFs.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

CollageNet: Fusing arbitrary melody and accompaniment into a coherent song.
Proceedings of the 22nd International Society for Music Information Retrieval Conference, 2021

BeatNet: CRNN and Particle Filtering for Online Joint Beat, Downbeat and Meter Tracking.
Proceedings of the 22nd International Society for Music Information Retrieval Conference, 2021

Y-Vector: Multiscale Waveform Encoder for Speaker Embedding.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

An Empirical Study on Channel Effects for Synthetic Voice Spoofing Countermeasure Systems.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Don't Look Back: An Online Beat Tracking Method Using RNN and Enhanced Particle Filtering.
Proceedings of the IEEE International Conference on Acoustics, 2021

2020
Noise-Resilient Training Method for Face Landmark Generation From Speech.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Speaker Attractor Network: Generalizing Speech Separation to Unseen Numbers of Sources.
IEEE Signal Process. Lett., 2020

Do not look back: an online beat tracking method using RNN and enhanced particle filtering.
CoRR, 2020

One-class learning towards generalized voice spoofing detection.
CoRR, 2020

Raw-x-vector: Multi-scale Time Domain Speaker Embedding Network.
CoRR, 2020

Themes Inferred Audio-visual Correspondence Learning.
CoRR, 2020

When Counterpoint Meets Chinese Folk Melodies.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

BachDuet: A Deep Learning System for Human-Machine Counterpoint Improvisation.
Proceedings of the 20th International Conference on New Interfaces for Musical Expression, 2020

End-To-End Generation of Talking Faces from Noisy Speech.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Vroom!: A Search Engine for Sounds by Vocal Imitation Queries.
Proceedings of the CHIIR '20: Conference on Human Information Interaction and Retrieval, 2020

RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Creating a Multitrack Classical Music Performance Dataset for Multimodal Music Analysis: Challenges, Insights, and Applications.
IEEE Trans. Multim., 2019

Online Audio-Visual Source Association for Chamber Music Performances.
Trans. Int. Soc. Music. Inf. Retr., 2019

Siamese Style Convolutional Neural Networks for Sound Search by Vocal Imitation.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Audio-Visual Deep Clustering for Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

Audiovisual Analysis of Music Performances: Overview of an Emerging Field.
IEEE Signal Process. Mag., 2019

Automatic Music Transcription: An Overview.
IEEE Signal Process. Mag., 2019

Adversarial Training for Speech Super-Resolution.
IEEE J. Sel. Top. Signal Process., 2019

Spoofing Speaker Verification Systems with Deep Multi-speaker Text-to-speech Synthesis.
CoRR, 2019

Sound Search by Text Description or Vocal Imitation?
CoRR, 2019

Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss.
CoRR, 2019

Audio-Visual Event Localization in the Wild.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Sound to Visual: Hierarchical Cross-Modal Talking Face Generation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019

Hierarchical Cross-Modal Talking Face Generation With Dynamic Pixel-Wise Loss.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Listen and Look: Audio-Visual Matching Assisted Speech Source Separation.
IEEE Signal Process. Lett., 2018

Front-end speech enhancement for commercial speaker verification systems.
Speech Commun., 2018

Part-invariant Model for Music Generation and Harmonization.
Proceedings of the 19th International Society for Music Information Retrieval Conference, 2018

Skeleton Plays Piano: Online Generation of Pianist Body Movements from MIDI Performance.
Proceedings of the 19th International Society for Music Information Retrieval Conference, 2018

Joint Speaker Diarization and Recognition Using Convolutional and Recurrent Neural Networks.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Visualization and Interpretation of Siamese Style Convolutional Neural Networks for Sound Search by Vocal Imitation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Score-Aligned Polyphonic Microtiming Estimation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Multi-Scale Recurrent Neural Network for Sound Event Detection.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Unsupervised Learning Approach to Feature Analysis for Automatic Speech Emotion Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Generating Talking Face Landmarks from Speech.
Proceedings of the Latent Variable Analysis and Signal Separation, 2018

Audio-Visual Event Localization in Unconstrained Videos.
Proceedings of the Computer Vision - ECCV 2018, 2018

Lip Movements Generation at a Glance.
Proceedings of the Computer Vision - ECCV 2018, 2018

Vocal Imitation Set: a dataset of vocally imitated sound events using the AudioSet ontology.
Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2018

2017
Piano Transcription With Convolutional Sparse Lateral Inhibition.
IEEE Signal Process. Lett., 2017

Enhanced multiclass SVM with thresholding fusion for speech-based emotion classification.
Int. J. Speech Technol., 2017

IMINET: Convolutional semi-siamese networks for sound search by vocal imitation.
Proceedings of the 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017

Metric learning based data augmentation for environmental sound classification.
Proceedings of the 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2017

Deep Cross-Modal Audio-Visual Generation.
Proceedings of the Thematic Workshops of ACM Multimedia 2017, Mountain View, CA, USA, October 23, 2017

Video-Based Vibrato Detection and Analysis for Polyphonic String Music.
Proceedings of the 18th International Society for Music Information Retrieval Conference, 2017

A Metric for Music Notation Transcription Accuracy.
Proceedings of the 18th International Society for Music Information Retrieval Conference, 2017

Deep ranking: Triplet MatchNet for music metric learning.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

See and listen: Score-informed association of sound tracks to players in chamber music performance videos.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Visually informed multi-pitch analysis of string ensembles.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016
An Approach to Score Following for Piano Performances With the Sustained Effect.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Context-Dependent Piano Music Transcription With Convolutional Sparse Coding.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Creating A Musical Performance Dataset for Multimodal Music Analysis: Challenges, Insights, and Applications.
CoRR, 2016

Transcribing Human Piano Performances into Music Notation.
Proceedings of the 17th International Society for Music Information Retrieval Conference, 2016

WISE: Web-based Interactive Speech Emotion Classification.
Proceedings of the 4th Workshop on Sentiment Analysis where AI meets Psychology (SAAIP 2016) co-located with 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), 2016

IMISOUND: An unsupervised system for sound query by vocal imitation.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Emotion classification: How does an automated system compare to Naive human coders?
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Rotational reset strategy for online semi-supervised NMF-based speech enhancement for long recordings.
Proceedings of the 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2015

Retrieving sounds by vocal imitation recognition.
Proceedings of the 25th IEEE International Workshop on Machine Learning for Signal Processing, 2015

Piano music transcription with fast convolutional sparse coding.
Proceedings of the 25th IEEE International Workshop on Machine Learning for Signal Processing, 2015

Score Following for Piano Performances with Sustain-Pedal Effects.
Proceedings of the 16th International Society for Music Information Retrieval Conference, 2015

Piano music transcription modeling note temporal evolution.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014
Combining rhythm-based and pitch-based methods for background and melody separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Multi-pitch Streaming of Harmonic Sound Mixtures.
IEEE ACM Trans. Audio Speech Lang. Process., 2014

Note-level Music Transcription by Maximum Likelihood Sampling.
Proceedings of the 15th International Society for Music Information Retrieval Conference, 2014

A novel cepstral representation for timbre modeling of sound sources in polyphonic mixtures.
Proceedings of the IEEE International Conference on Acoustics, 2014

2012
Speech Enhancement by Online Non-negative Spectrogram Decomposition in Non-stationary Noise Environments.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Online PLCA for Real-Time Semi-supervised Source Separation.
Proceedings of the Latent Variable Analysis and Signal Separation, 2012

2011
Soundprism: An Online System for Score-Informed Source Separation of Music Audio.
IEEE J. Sel. Top. Signal Process., 2011

Aligning Semi-Improvised Music Audio with Its Lead Sheet.
Proceedings of the 12th International Society for Music Information Retrieval Conference, 2011

A state space model for online polyphonic audio-score alignment.
Proceedings of the IEEE International Conference on Acoustics, 2011

2010
Multiple Fundamental Frequency Estimation by Modeling Spectral Peaks and Non-Peak Regions.
IEEE Trans. Speech Audio Process., 2010

Song-level multi-pitch tracking by heavily constrained clustering.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
Harmonically Informed Multi-Pitch Tracking.
Proceedings of the 10th International Society for Music Information Retrieval Conference, 2009

2008
Unsupervised Single-Channel Music Source Separation by Average Harmonic Structure Modeling.
IEEE Trans. Speech Audio Process., 2008

Collective Annotation of Music from Multiple Semantic Categories.
Proceedings of the ISMIR 2008, 2008

Audio tonality mode classification without tonic annotations.
Proceedings of the 2008 IEEE International Conference on Multimedia and Expo, 2008

2007
Multi-Pitch Estimation Based on Partial Event and Support Transfer.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Excitation signal Extraction for Guitar tones.
Proceedings of the 2007 International Computer Music Conference, 2007

