Yan Song

Orcid: 0000-0002-5668-9068

Affiliations:
  • University of Science and Technology of China, National Engineering Laboratory for Speech and Language Information Processing, Hefei, China


According to our database, Yan Song authored at least 115 papers between 2005 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection.
CoRR, 2024

Meta Representation Learning Method for Robust Speaker Verification in Unseen Domains.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
Hierarchical Audio-Visual Information Fusion with Multi-label Joint Decoding for MER 2023.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Robust Prototype Learning for Anomalous Sound Detection.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Fine-tuning Audio Spectrogram Transformer with Task-aware Adapters for Sound Event Detection.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Joint Generative-Contrastive Representation Learning for Anomalous Sound Detection.
Proceedings of the IEEE International Conference on Acoustics, 2023

AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer.
Proceedings of the IEEE International Conference on Acoustics, 2023

Stargan-vc Based Cross-Domain Data Augmentation for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2023

An Effective Anomalous Sound Detection Method Based on Representation Learning with Simulated Anomalies.
Proceedings of the IEEE International Conference on Acoustics, 2023

Deepfake Algorithm Recognition System with Augmented Data for ADD 2023 Challenge.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

Convolutional Recurrent Neural Network and Multitask Learning for Manipulation Region Location.
Proceedings of the Workshop on Deepfake Audio Detection and Analysis co-located with the 32nd International Joint Conference on Artificial Intelligence (IJCAI 2023), 2023

2022
Cross-Lingual Self-training to Learn Multilingual Representation for Low-Resource Speech Recognition.
Circuits Syst. Signal Process., 2022

Class-Aware Distribution Alignment based Unsupervised Domain Adaptation for Speaker Verification.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Frontend Attributes Disentanglement for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Domain Robust Deep Embedding Learning for Speaker Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Self-Supervised Representation Learning for Unsupervised Anomalous Sound Detection Under Domain Shift.
Proceedings of the IEEE International Conference on Acoustics, 2022

2021
Multi-Granularity Sequence Alignment Mapping for Encoder-Decoder Based End-to-End ASR.
IEEE ACM Trans. Audio Speech Lang. Process., 2021

Variance Normalised Features for Language and Dialect Discrimination.
Circuits Syst. Signal Process., 2021

XLST: Cross-lingual Self-training to Learn Multilingual Representation for Low Resource Speech Recognition.
CoRR, 2021

An Effective Mutual Mean Teaching Based Domain Adaptation Method for Sound Event Detection.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Weight Moving Average Based Alternate Decoupled Learning Algorithm for Long-Tailed Language Identification.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

An Improved Mean Teacher Based Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection.
Proceedings of the IEEE International Conference on Acoustics, 2021

An Effective Deep Embedding Learning Method Based on Dense-Residual Networks for Speaker Verification.
Proceedings of the IEEE International Conference on Acoustics, 2021

A Novel Fault Diagnosis Method Based on Topological Data Analysis.
Proceedings of the CAA Symposium on Fault Detection, 2021

2020
Segment boundary detection directed attention for online end-to-end speech recognition.
EURASIP J. Audio Speech Music. Process., 2020

Time-Frequency Feature Fusion for Noise Robust Audio Event Classification.
Circuits Syst. Signal Process., 2020

Effective Exploitation of Posterior Information for Attention-Based Speech Recognition.
IEEE Access, 2020

An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Semi-Supervised End-to-End ASR via Teacher-Student Learning with Conditional Posterior Distribution.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

An Effective Speaker Recognition Method Based on Joint Identification and Verification Supervisions.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Task-Aware Mean Teacher Method for Large Scale Weakly Labeled Semi-Supervised Sound Event Detection.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

An Online Speaker-aware Speech Separation Approach Based on Time-domain Representation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019
Listening and Grouping: An Online Autoregressive Approach for Monaural Speech Separation.
IEEE ACM Trans. Audio Speech Lang. Process., 2019

An Effective Deep Embedding Learning Architecture for Speaker Verification.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

A Region Based Attention Method for Weakly Supervised Sound Event Detection and Classification.
Proceedings of the IEEE International Conference on Acoustics, 2019

Topic Detection in Conversational Telephone Speech Using CNN with Multi-stream Inputs.
Proceedings of the IEEE International Conference on Acoustics, 2019

Knowledge Distillation from Multilingual and Monolingual Teachers for End-to-End Multilingual Speech Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Speaker to Emotion: Domain Adaptation for Speech Emotion Recognition with Residual Adapters.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Learning Adaptive Downsampling Encoding for Online End-to-End Speech Recognition.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Triplet-Center Loss Based Deep Embedding Learning Method for Speaker Verification.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

2018
LID-Senones and Their Statistics for Language Identification.
IEEE ACM Trans. Audio Speech Lang. Process., 2018

A Conditional Generative Model for Speech Enhancement.
Circuits Syst. Signal Process., 2018

Improved Supervised Locality Preserving Projection for I-vector Based Speaker Verification.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Acoustic Modeling with Densely Connected Residual Network for Multichannel Speech Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Early Detection of Continuous and Partial Audio Events Using CNN.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

An Attention Pooling Based Representation Learning Method for Speech Emotion Recognition.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

An Improved Deep Embedding Learning Method for Short Duration Speaker Verification.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Source-Aware Context Network for Single-Channel Multi-Speaker Speech Separation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Capsule based Approach for Polyphonic Sound Event Detection.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2018

2017
End-to-End Language Identification Using High-Order Utterance Representation with Bilinear Pooling.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Fisher vector based CNN architecture for image classification.
Proceedings of the 2017 IEEE International Conference on Image Processing, 2017

Tibetan-Mandarin bilingual speech recognition based on end-to-end framework.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

Topic classification based on distributed document representation and latent topic information.
Proceedings of the 2017 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2017

2016
A new variance-based approach for discriminative feature extraction in machine hearing classification using spectrogram features.
Digit. Signal Process., 2016

Improved i-Vector Representation for Speaker Diarization.
Circuits Syst. Signal Process., 2016

Image classification with CNN-based Fisher vector coding.
Proceedings of the 2016 Visual Communications and Image Processing, 2016

Improvements on Deep Bottleneck Network based I-Vector Representation for Spoken Language Identification.
Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

LID-senone Extraction via Deep Neural Networks for End-to-End Language Identification.
Proceedings of the Odyssey 2016: The Speaker and Language Recognition Workshop, 2016

Robust Sound Event Detection in Continuous Audio Environments.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Compact convolutional neural network transfer learning for small-scale image classification.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015
Robust Sound Event Classification Using Deep Neural Networks.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Reconstruction of Phonated Speech from Whispers Using Formant-Derived Plausible Pitch Modulation.
ACM Trans. Access. Comput., 2015

Mouth State Detection From Low-Frequency Ultrasonic Reflection.
Circuits Syst. Signal Process., 2015

Deep Bottleneck Feature for Image Classification.
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

Deep bottleneck network based i-vector representation for language identification.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Low frequency ultrasonic voice activity detection using convolutional neural networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Robust sound event recognition using convolutional neural networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Improved language identification using deep bottleneck network.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

2014
Tone confusion in spoken and whispered Mandarin Chinese.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Reconstruction of pitch for whisper-to-speech conversion of Chinese.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Performance evaluation of deep bottleneck features for spoken language identification.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Task-aware deep bottleneck features for spoken language identification.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A spectral based visual matching method for image classification.
Proceedings of the International Conference on Audio, 2014

2013
Reconstruction of continuous voiced speech from whispers.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Joint spectral distribution modeling using restricted boltzmann machines for voice conversion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Exemplar based language recognition method for short-duration speech segments.
Proceedings of the IEEE International Conference on Acoustics, 2013

Phoneme variation based synthesized speech discrimination for speaker verification.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
Intra-conversation intra-speaker variability compensation for speaker clustering.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

Exemplar-Based Sparse Representation for Language Recognition on I-Vectors.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

2011
Spatial pooling for transformation invariant image representation.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Effective image representation based on bi-layer visual codebook.
Proceedings of the First Asian Conference on Pattern Recognition, 2011

2010
The description of iFlyTek Speech Lab system for NIST2009 Language Recognition Evaluation.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Multiple instance learning using visual phrases for object classification.
Proceedings of the 2010 IEEE International Conference on Multimedia and Expo, 2010

2009
Unified Video Annotation via Multigraph Learning.
IEEE Trans. Circuits Syst. Video Technol., 2009

Semi-supervised kernel density estimation for video annotation.
Comput. Vis. Image Underst., 2009

Concept-Dependent Image Annotation via Existence-Based Multiple-Instance Learning.
Proceedings of the IEEE International Conference on Systems, 2009

Image Fusion Quality Metrics by Directional Projection.
Proceedings of the IEEE International Conference on Systems, 2009

Concept representation based video indexing.
Proceedings of the 32nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2009

An automatic language identification method based on subspace analysis.
Proceedings of the 2009 IEEE International Conference on Multimedia and Expo, 2009

2008
Video Annotation Based on Kernel Linear Neighborhood Propagation.
IEEE Trans. Multim., 2008

Optimizing Training Set Construction for Video Semantic Classification.
EURASIP J. Adv. Signal Process., 2008

A Sample and Feature Selection Scheme for GMM-SVM Based Language Recognition.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

The Adaptation Schemes In PR-SVM Based Language Recognition.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

2007
Interactive Video Annotation by Multi-Concept Multi-Modality Active Learning.
Int. J. Semantic Comput., 2007

Multi-Concept Multi-Modality Active Learning for Interactive Video Annotation.
Proceedings of the First IEEE International Conference on Semantic Computing (ICSC 2007), 2007

Kernel-Based Linear Neighborhood Propagation for Semantic Video Annotation.
Proceedings of the Advances in Knowledge Discovery and Data Mining, 2007

An Efficient Automatic Video Shot Size Annotation Scheme.
Proceedings of the Advances in Multimedia Modeling, 2007

Video annotation by graph-based learning with neighborhood similarity.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

Optimizing multi-graph learning: towards a unified video annotation scheme.
Proceedings of the 15th International Conference on Multimedia 2007, 2007

Multi-Graph Semi-Supervised Learning for Video Semantic Feature Extraction.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Lazy Learning Based Efficient Video Annotation.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

Transductive Inference with Hierarchical Clustering for Video Annotation.
Proceedings of the 2007 IEEE International Conference on Multimedia and Expo, 2007

An Interactive Video Annotation Framework with Multiple Modalities.
Proceedings of the IEEE International Conference on Acoustics, 2007

2006
Automatic video annotation by semi-supervised learning with kernel density estimation.
Proceedings of the 14th ACM International Conference on Multimedia, 2006

To construct optimal training set for video annotation.
Proceedings of the 14th ACM International Conference on Multimedia, 2006

Efficient semantic annotation method for indexing large personal video database.
Proceedings of the 8th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2006

Two-layer Distance Scheme in Matching Engine for Query by Humming System.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

Automatic video annotation based on co-adaptation and label correction.
Proceedings of the International Symposium on Circuits and Systems (ISCAS 2006), 2006

Enhanced Semi-Supervised Learning for Automatic Video Annotation.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Video Annotation by Active Learning and Semi-Supervised Ensembling.
Proceedings of the 2006 IEEE International Conference on Multimedia and Expo, 2006

Semi-Supervised Kernel Regression.
Proceedings of the 6th IEEE International Conference on Data Mining (ICDM 2006), 2006

An Automatic Video Semantic Annotation Scheme Based on Combination of Complementary Predictors.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Video Annotation by Active Learning and Cluster Tuning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2006

2005
Semi-automatic video annotation based on active learning with multiple complementary predictors.
Proceedings of the 7th ACM SIGMM International Workshop on Multimedia Information Retrieval, 2005
