Shinji Watanabe
Orcid: 0000-0002-5970-8631Affiliations:
- Carnegie Mellon University, Pittsburgh, PA, USA
- Johns Hopkins University, Baltimore, MD, USA (former)
- Mitsubishi Electric Research Laboratories, Cambridge, MA, USA (2012 - 2017)
- NTT Communication Science Laboratories, Kyoto, Japan (2001 - 2011)
- Waseda University, Tokyo, Japan (PhD 2006)
According to our database1,
Shinji Watanabe
authored at least 598 papers
between 2002 and 2024.
Collaborative distances:
Collaborative distances:
Timeline
Legend:
Book In proceedings Article PhD thesis Dataset OtherLinks
Online presence:
-
on scopus.com
-
on linkedin.com
-
on clsp.jhu.edu
-
on orcid.org
On csauthors.net:
Bibliography
2024
Module-Based End-to-End Distant Speech Processing: A case study of far-field automatic speech recognition [Special Issue On Model-Based and Data-Driven Audio Signal Processing].
IEEE Signal Process. Mag., November, 2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
IEEE ACM Trans. Audio Speech Lang. Process., 2024
CoRR, 2024
Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition.
CoRR, 2024
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks.
CoRR, 2024
VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning.
CoRR, 2024
ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech.
CoRR, 2024
Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens.
CoRR, 2024
CoRR, 2024
Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels.
CoRR, 2024
CoRR, 2024
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition.
CoRR, 2024
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition.
CoRR, 2024
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data.
CoRR, 2024
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization.
CoRR, 2024
Beyond Silence: Bias Analysis through Loss and Asymmetric Approach in Audio Anti-Spoofing.
CoRR, 2024
Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss.
CoRR, 2024
Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement.
CoRR, 2024
Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting.
CoRR, 2024
CoRR, 2024
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model.
CoRR, 2024
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding.
CoRR, 2024
CoRR, 2024
VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation.
CoRR, 2024
ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets.
CoRR, 2024
CoRR, 2024
EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation.
CoRR, 2024
URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement.
CoRR, 2024
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement.
CoRR, 2024
4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders.
CoRR, 2024
TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages.
CoRR, 2024
CoRR, 2024
Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?
CoRR, 2024
CoRR, 2024
ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models.
CoRR, 2024
SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics.
CoRR, 2024
OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer.
CoRR, 2024
AugSumm: towards generalizable speech summarization using synthetic labels from large language model.
CoRR, 2024
UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024
Visual Speech Recognition for Languages with Limited Labeled Data Using Automatic Labels from Whisper.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Improving Audio Captioning Models with Fine-Grained Audio Features, Text Embedding Supervision, and LLM Mix-Up Augmentation.
Proceedings of the IEEE International Conference on Acoustics, 2024
Contextualized Automatic Speech Recognition With Attention-Based Bias Phrase Boosted Beam Search.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks.
Proceedings of the IEEE International Conference on Acoustics, 2024
Hubertopic: Enhancing Semantic Representation of Hubert Through Self-Supervision Utilizing Topic Model.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens.
Proceedings of the IEEE International Conference on Acoustics, 2024
AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization.
Proceedings of the IEEE International Conference on Acoustics, 2024
Dynamic-Superb: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark For Speech.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
One Model to Rule Them All ? Towards End-to-End Joint Speaker Diarization and Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the IEEE International Conference on Acoustics, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing.
J. Open Source Softw., November, 2023
Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310).
Dataset, October, 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors.
IEEE ACM Trans. Audio Speech Lang. Process., 2023
IEEE ACM Trans. Audio Speech Lang. Process., 2023
A dilemma of ground truth in noisy speech separation and an approach to lessen the impact of imperfect training data.
Comput. Speech Lang., 2023
TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch.
CoRR, 2023
Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond.
CoRR, 2023
EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios.
CoRR, 2023
UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network.
CoRR, 2023
CoRR, 2023
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech.
CoRR, 2023
Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation.
CoRR, 2023
Visual Speech Recognition for Low-resource Languages with Automatic Labels From Whisper Model.
CoRR, 2023
The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.
CoRR, 2023
The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios.
CoRR, 2023
CoRR, 2023
CoRR, 2023
Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling.
CoRR, 2023
Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge.
CoRR, 2023
CoRR, 2023
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023
Proceedings of the Text, Speech, and Dialogue - 26th International Conference, 2023
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, 2023
UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Proceedings of the 20th International Conference on Spoken Language Translation, 2023
Proceedings of the 20th International Conference on Spoken Language Translation, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023
Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023
Proceedings of the International Conference on Machine Learning, 2023
Proceedings of the Eleventh International Conference on Learning Representations, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
The Multimodal Information Based Speech Processing (Misp) 2022 Challenge: Audio-Visual Diarization And Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
FNeural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated full- and sub-band Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2023
TF-GRIDNET: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
I3D: Transformer Architectures with Input-Dependent Dynamic Depth for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023
Structured Pruning of Self-Supervised Pre-Trained Models for Speech Recognition and Understanding.
Proceedings of the IEEE International Conference on Acoustics, 2023
Align, Write, Re-Order: Explainable End-to-End Speech Translation via Operation Sequence Generation.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Fully Unsupervised Topic Clustering of Unlabelled Spoken Audio Using Self-Supervised Representation Learning and Topic Model.
Proceedings of the IEEE International Conference on Acoustics, 2023
Articulatory Representation Learning via Joint Factor Analysis and Neural Matrix Factorization.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
The Pipeline System of ASR and NLU with MLM-based data Augmentation Toward Stop Low-Resource Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2023
Multi-Channel Speaker Extraction with Adversarial Training: The Wavlab Submission to The Clarity ICASSP 2023 Grand Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Units.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the IEEE International Conference on Acoustics, 2023
A Study on the Integration of Pipeline and E2E SLU Systems for Spoken Semantic Parsing Toward Stop Quality Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2023
Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History.
Proceedings of the IEEE International Conference on Acoustics, 2023
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Domain Adaptation by Data Distribution Matching Via Submodularity For Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Findings of the 2023 ML-Superb Challenge: Pre-Training And Evaluation Over More Languages And Beyond.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Espnet-Summ: Introducing a Novel Large Dataset, Toolkit, and a Cross-Corpora Evaluation of Speech Summarization Systems.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, And Extraction.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Summarize While Translating: Universal Model With Parallel Decoding for Summarization and Translation.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for Pytorch.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023
Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
IEEE ACM Trans. Audio Speech Lang. Process., 2022
IEEE ACM Trans. Audio Speech Lang. Process., 2022
IEEE Signal Process. Lett., 2022
IEEE J. Sel. Top. Signal Process., 2022
Editorial Editorial of Special Issue on Self-Supervised Learning for Speech and Audio Processing.
IEEE J. Sel. Top. Signal Process., 2022
Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition.
Comput. Speech Lang., 2022
An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer.
Comput. Speech Lang., 2022
Train from scratch: Single-stage joint training of speech separation and recognition.
Comput. Speech Lang., 2022
Comput. Speech Lang., 2022
Comput. Speech Lang., 2022
Comput. Speech Lang., 2022
CoRR, 2022
CoRR, 2022
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Superb @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022
Proceedings of the 19th International Conference on Spoken Language Translation, 2022
Proceedings of the 19th International Conference on Spoken Language Translation, 2022
Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Minimum latency training of sequence transducers for streaming end-to-end speech recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022
Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding.
Proceedings of the International Conference on Machine Learning, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Run-and-Back Stitch Search: Novel Block Synchronous Decoding For Streaming Encoder-Decoder ASR.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Non-Autoregressive End-To-End Automatic Speech Recognition Incorporating Downstream Natural Language Processing.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
An Exploration of Hubert with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPNET-Se Submission to the L3DAS22 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models.
Proceedings of the IEEE International Conference on Acoustics, 2022
The First Multimodal Information Based Speech Processing (Misp) Challenge: Data, Tasks, Baselines And Results.
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
Proceedings of the IEEE International Conference on Acoustics, 2022
BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022
Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022
2021
IEEE Signal Process. Lett., 2021
Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem.
CoRR, 2021
JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification.
CoRR, 2021
Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring.
CoRR, 2021
CoRR, 2021
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10, 000 Hours of Transcribed Audio.
CoRR, 2021
INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing.
CoRR, 2021
The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap.
CoRR, 2021
Online End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers.
CoRR, 2021
Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021
Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2021
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
Proceedings of the 18th International Conference on Spoken Language Translation, 2021
Proceedings of the 18th International Conference on Spoken Language Translation, 2021
Auxiliary Loss Function for Target Speech Extraction and Recognition with Weak Supervision Based on Speaker Characteristics.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Data Augmentation Methods for End-to-End Speech Recognition on Distant-Talk Scenarios.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
SPGISpeech: 5, 000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10, 000 Hours of Transcribed Audio.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021
End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend.
Proceedings of the IEEE International Conference on Acoustics, 2021
Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.
Proceedings of the IEEE International Conference on Acoustics, 2021
Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation.
Proceedings of the IEEE International Conference on Acoustics, 2021
End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2021
Training Noisy Single-Channel Speech Separation with Noisy Oracle Sources: A Large Gap and a Small Step.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Gaussian Kernelized Self-Attention for Long Sequence Data and its Application to CTC-Based Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Proceedings of the IEEE International Conference on Acoustics, 2021
Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021
Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Conferencingspeech Challenge: Towards Far-Field Multi-Channel Speech Enhancement for Video Conferencing.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
A Study of Transducer Based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021
2020
Automated Development of DNN Based Spoken Language Systems Using Evolutionary Algorithms.
Proceedings of the Deep Neural Evolution - Deep Learning with Evolutionary Computation, 2020
IEEE ACM Trans. Audio Speech Lang. Process., 2020
IEEE ACM Trans. Audio Speech Lang. Process., 2020
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans.
CoRR, 2020
Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording.
CoRR, 2020
Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation.
CoRR, 2020
CoRR, 2020
CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings.
CoRR, 2020
End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification.
CoRR, 2020
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020
End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020
End-to-End Automatic Speech Recognition Integrated with CTC-Based Voice Activity Detection.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Semi-Supervised Speaker Adaptation for End-to-End Speech Synthesis with Pretrained Models.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Espnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020
Conformer-Based Sound Event Detection with Semi-Supervised Learning and Data Augmentation.
Proceedings of 5th the Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020
The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2020
2019
Evolution-Strategy-Based Automation of System Development for High-Performance Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2019
Speech Processing for Digital Home Assistants: Combining signal processing with deep-learning techniques.
IEEE Signal Process. Mag., 2019
Introduction to the Issue on Far-Field Speech Processing in the Era of Deep Learning: Speech Enhancement, Separation, and Recognition.
IEEE J. Sel. Top. Signal Process., 2019
IEEE J. Sel. Top. Signal Process., 2019
Listen and Fill in the Missing Letters: Non-Autoregressive Transformer for Speech Recognition.
CoRR, 2019
Dry, Focus, and Transcribe: End-to-End Integration of Dereverberation, Beamforming, and ASR.
CoRR, 2019
Generalized Weighted-Prediction-Error Dereverberation with Varying Source Priors For Reverberant Speech Recognition.
Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019
Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019
Analysis of Robustness of Deep Single-Channel Speech Separation Using Corpora Constructed From Multiple Domains.
Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019
ESPnet How2 Speech Translation System for IWSLT 2019: Pre-training, Knowledge Distillation, and Going Deeper.
Proceedings of the 16th International Conference on Spoken Language Translation, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Study of the Performance of Automatic Speech Recognition Systems in Speakers with Parkinson's Disease.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019
Proceedings of the International Joint Conference on Neural Networks, 2019
Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019
Improving End-to-end Speech Recognition with Pronunciation-assisted Sub-word Modeling.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
The Phasebook: Building Complex Masks via Discrete Representations for Source Separation.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Acoustic Modeling for Distant Multi-talker Speech Recognition with Single- and Multi-channel Branches.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
Proceedings of the IEEE International Conference on Acoustics, 2019
CNN-based Multichannel End-to-End Speech Recognition for Everyday Home Environments<sup>*</sup>.
Proceedings of the 27th European Signal Processing Conference, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019
2018
CoRR, 2018
Vectorization of hypotheses and speech for faster beam search in encoder decoder-based speech recognition.
CoRR, 2018
CoRR, 2018
CoRR, 2018
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018
Proceedings of the 15th International Conference on Spoken Language Translation, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Building State-of-the-art Distant Speech Recognition Using the CHiME-4 Challenge with a Setup of Speech Enhancement Baseline.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
The Fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, Task and Baselines.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018
2017
IEEE ACM Trans. Audio Speech Lang. Process., 2017
IEEE J. Sel. Top. Signal Process., 2017
Unified Architecture for Multichannel End-to-End Speech Recognition With Neural Beamforming.
IEEE J. Sel. Top. Signal Process., 2017
Prior-based Binary Masking and Discriminative Methods for Reverberant and Noisy Speech Recognition Using Distant Stereo Microphones.
J. Inf. Process., 2017
An analysis of environment, microphone and data simulation mismatches in robust speech recognition.
Comput. Speech Lang., 2017
Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend.
Comput. Speech Lang., 2017
The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes.
Comput. Speech Lang., 2017
Comput. Speech Lang., 2017
Does speech enhancement work with end-to-end ASR objectives?: Experimental analysis of multichannel end-to-end ASR.
Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017
Coupled Initialization of Multi-Channel Non-Negative Matrix Factorization Based on Spatial and Spectral Information.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Semi-Supervised Learning of a Pronunciation Dictionary from Disjoint Phonemic Transcripts and Text.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
Proceedings of the 34th International Conference on Machine Learning, 2017
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Deep long short-term memory adaptive beamforming networks for multichannel robust speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic Sound Event Detection.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017
Language independent end-to-end architecture for joint language identification and speech recognition.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017
Multi-level language modeling and decoding for open vocabulary end-to-end speech recognition.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017
Discriminative Beamforming with Phase-Aware Neural Networks for Speech Enhancement and Recognition.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning., 2017
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning., 2017
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning., 2017
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning., 2017
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning., 2017
Deep Recurrent Networks for Separation and Recognition of Single-Channel Speech in Nonstationary Background Audio.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning., 2017
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning., 2017
2016
Automated structure discovery and parameter tuning of neural network language model based on evolution strategy.
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016
Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016
Data Selection by Sequence Summarizing Neural Network in Mismatch Condition Training.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Context-Sensitive and Role-Dependent Spoken Language Understanding Using Bidirectional and Attention LSTMs.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016
Proceedings of the IEEE International Conference on Multimedia and Expo, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Minimum word error training of long short-term memory recurrent neural network language models for speech recognition.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016
Proceedings of the 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2016
Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2016
Beamforming networks using spatial covariance features for far-field speech recognition.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016
2015
Effectiveness of dereverberation, feature transformation, discriminative training methods, and system combination approach for various reverberant environments.
EURASIP J. Adv. Signal Process., 2015
Uncertainty training and decoding methods of deep neural networks based on stochastic representation of enhanced features.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Efficient learning for spoken language understanding tasks with word embedding based pre-training.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Robust speech processing using observation uncertainty and uncertainty propagation: session and paper overview.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015
Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR.
Proceedings of the Latent Variable Analysis and Signal Separation, 2015
Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015
Feature-space structural MAPLR with regression tree-based multiple transformation matrices for DNN.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015
Cambridge University Press, ISBN: 9781107295360, 2015
2014
J. Signal Process. Syst., 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Sequential maximum mutual information linear discriminant analysis for speech recognition.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014
Deep recurrent de-noising auto-encoder and blind de-reverberation for reverberated speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Proceedings of the IEEE International Conference on Acoustics, 2014
Ensemble integration of calibrated speaker localization and statistical speech detection in domestic environments.
Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2014
Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014
2013
Feature Enhancement With Joint Use of Consecutive Corrupted and Noise Feature Vectors With Discriminative Region Weighting.
IEEE Trans. Speech Audio Process., 2013
Speech Commun., 2013
Prior-shared feature and model space speaker adaptation by consistently employing map estimation.
Speech Commun., 2013
Training data selection with user's physical characteristics data for acceleration-based activity modeling.
Pers. Ubiquitous Comput., 2013
Cluster-based dynamic variance adaptation for interconnecting speech enhancement pre-processor and speech recognizer.
Comput. Speech Lang., 2013
Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds.
Comput. Speech Lang., 2013
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013
Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data.
Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2013
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013
Proceedings of the Sixth International Joint Conference on Natural Language Processing, 2013
Proceedings of the IEEE International Conference on Acoustics, 2013
The second 'chime' speech separation and recognition challenge: Datasets, tasks and baselines.
Proceedings of the IEEE International Conference on Acoustics, 2013
Effectiveness of discriminative training and feature transformation for reverberated and noisy speech.
Proceedings of the IEEE International Conference on Acoustics, 2013
The second 'CHiME' speech separation and recognition challenge: An overview of challenge systems and outcomes.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013
2012
IEEE Trans. Speech Audio Process., 2012
Structural Classification Methods Based on Weighted Finite-State Transducers for Automatic Speech Recognition.
IEEE Trans. Speech Audio Process., 2012
Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera.
IEEE Trans. Speech Audio Process., 2012
Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection.
Speech Commun., 2012
Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012
Bag Of ARCS: New representation of speech segment features based on finite state machines.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Fully Bayesian inference of multi-mixture Gaussian model and its evaluation using speaker clustering.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Basis vector orthogonalization for an improved kernel gradient matching pursuit method.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Noise suppression with unsupervised joint speaker adaptation and noise mixture model estimation.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
Handling uncertain observations in unsupervised topic-mixture language model adaptation.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012
2011
Bayesian linear regression for Hidden Markov Model based on optimizing variational bounds.
Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011
Proceedings of the 15th IEEE International Symposium on Wearable Computers (ISWC 2011), 2011
Model Adaptation for Automatic Speech Recognition Based on Multiple Time Scale Evolution.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011
Proceedings of the IJCAI 2011, 2011
Proceedings of the IEEE International Conference on Acoustics, 2011
High accurate model-integration-based voice conversion using dynamic features and model structure optimization.
Proceedings of the IEEE International Conference on Acoustics, 2011
Proceedings of the IEEE International Conference on Acoustics, 2011
Non-stationary noise estimation method based on bias-residual component decomposition for robust speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2011
Variance Compensation for Recognition of Reverberant Speech with Dereverberation Preprocessing.
Proceedings of the Robust Speech Recognition of Uncertain or Missing Data, 2011
2010
Predictor-Corrector Adaptation by Using Time Evolution System With Macroscopic Time Scale.
IEEE Trans. Speech Audio Process., 2010
A Sequential Pattern Classifier Based on Hidden Markov Kernel Machine and Its Application to Phoneme Classification.
IEEE J. Sel. Top. Signal Process., 2010
Online Unsupervised Classification With Model Comparison in the Variational Bayes Framework for Voice Activity Detection.
IEEE J. Sel. Top. Signal Process., 2010
Application of topic tracking model to language model adaptation and meeting analysis.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010
Real-time meeting recognition and understanding using distant microphones and omni-directional camera.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010
Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Probabilistic integration of joint density model and speaker model for voice conversion.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
A regularized discriminative training method of acoustic models derived by minimum relative entropy discrimination.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Improvements of search error risk minimization in viterbi beam search for speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Voice activity detection using frame-wise model re-estimation method based on Gaussian pruning with weight normalization.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010
Proceedings of the IEEE International Conference on Acoustics, 2010
A discriminative model for continuous speech recognition based on Weighted Finite State Transducers.
Proceedings of the IEEE International Conference on Acoustics, 2010
Discriminative training based on an integrated view of MPE and MMI in margin and error space.
Proceedings of the IEEE International Conference on Acoustics, 2010
Proceedings of the IEEE International Conference on Acoustics, 2010
Using online model comparison in the Variational Bayes framework for online unsupervised Voice Activity Detection.
Proceedings of the IEEE International Conference on Acoustics, 2010
Proceedings of the IEEE International Conference on Acoustics, 2010
2009
Static and Dynamic Variance Compensation for Recognition of Reverberant Speech With Dereverberation Preprocessing.
IEEE Trans. Speech Audio Process., 2009
Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Stereo-input speech recognition using sparseness-based time-frequency masking in a reverberant environment.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009
Proceedings of the IJCAI 2009, 2009
On-line adaptation and Bayesian detection of environmental changes based on a macroscopic time evolution system.
Proceedings of the IEEE International Conference on Acoustics, 2009
A unified view for discriminative objective functions based on negative exponential of difference measure between strings.
Proceedings of the IEEE International Conference on Acoustics, 2009
2008
A unified interpretation of adaptation approaches based on a macroscopic time evolution system and indirect/direct adaptation approaches.
Proceedings of the IEEE International Conference on Acoustics, 2008
Combined static and dynamic variance adaptation for efficient interconnection of speech enhancement pre-processor with speech recognizer.
Proceedings of the IEEE International Conference on Acoustics, 2008
2007
Proceedings of the IEEE International Conference on Acoustics, 2007
2006
Automatic determination of acoustic model topology using variational Bayesian estimation and clustering for large vocabulary continuous speech recognition.
IEEE Trans. Speech Audio Process., 2006
Speech Recognition Based on Student's t-Distribution Derived from Total Bayesian Framework.
IEICE Trans. Inf. Syst., 2006
IEEE Comput. Intell. Mag., 2006
Acoustic Model Adaptation Based on Coarse/Fine Training of Transfer Vectors Using Directional Statistics.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
2005
IEICE Trans. Inf. Syst., 2005
Effects of Bayesian predictive classification using variational Bayesian posteriors for sparse training data in speech recognition.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005
2004
IEEE Trans. Speech Audio Process., 2004
Acoustic model adaptation based on coarse/fine training of transfer vectors and its application to a speaker adaptation task.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004
Automatic determination of acoustic model topology using variational Bayesian estimation and clustering.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004
2003
Application of variational Bayesian estimation and clustering to acoustic model adaptation.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003
2002
Proceedings of the Advances in Neural Information Processing Systems 15 [Neural Information Processing Systems, 2002
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002