Shinji Watanabe

CoRR, 2024

SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR.

CoRR, 2024

Fusion of Discrete Representations and Self-Augmented Representations for Multilingual Automatic Speech Recognition.

CoRR, 2024

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks.

Fabian Ritter Gutierrez

CoRR, 2024

Findings of the IWSLT 2024 Evaluation Campaign.

CoRR, 2024

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning.

CoRR, 2024

Improving Multilingual ASR in the Wild Using Simple N-best Re-ranking.

CoRR, 2024

SpoofCeleb: Speech Deepfake Detection and SASV In The Wild.

CoRR, 2024

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech.

CoRR, 2024

Hypothesis Clustering and Merging: Novel MultiTalker Speech Recognition with Speaker Tokens.

CoRR, 2024

Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models.

CoRR, 2024

Preference Alignment Improves Language Model-Based TTS.

CoRR, 2024

Robust Audiovisual Speech Recognition Models with Mixture-of-Experts.

CoRR, 2024

Task Arithmetic for Language Expansion in Speech Translation.

CoRR, 2024

Speaker-IPL: Unsupervised Learning of Speaker Characteristics with i-Vector based Pseudo-Labels.

Tatiana Likhomanenko

Barry-John Theobald

CoRR, 2024

Exploring Prediction Targets in Masked Pre-Training for Speech Foundation Models.

Li-Wei Chen

Takuya Higuchi

He Bai

CoRR, 2024

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition.

CoRR, 2024

ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration.

CoRR, 2024

Text-To-Speech Synthesis In The Wild.

CoRR, 2024

Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition.

CoRR, 2024

CMU's IWSLT 2024 Simultaneous Speech Translation System.

CoRR, 2024

SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data.

CoRR, 2024

The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization.

CoRR, 2024

Multi-Convformer: Extending Conformer with Multiple Convolution Kernels.

CoRR, 2024

Beyond Silence: Bias Analysis through Loss and Asymmetric Approach in Audio Anti-Spoofing.

CoRR, 2024

Contextualized End-to-end Automatic Speech Recognition with Intermediate Biasing Loss.

CoRR, 2024

Decoder-only Architecture for Streaming End-to-end Speech Recognition.

CoRR, 2024

Diffusion-based Generative Modeling with Discriminative Guidance for Streamable Speech Enhancement.

CoRR, 2024

Rapid Language Adaptation for Multilingual E2E Speech Recognition Using Encoder Prompting.

CoRR, 2024

Finding Task-specific Subnetworks in Multi-task Spoken Language Understanding Model.

CoRR, 2024

MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model.

CoRR, 2024

DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding.

CoRR, 2024

On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models.

CoRR, 2024

VISinger2+: End-to-End Singing Voice Synthesis Augmented by Self-Supervised Learning Representation.

CoRR, 2024

ML-SUPERB 2.0: Benchmarking Multilingual Speech Models Across Modeling Constraints, Languages, and Datasets.

Vanya Bannihatti Kumar

CoRR, 2024

Self-Supervised Speech Representations are More Phonetic than Semantic.

CoRR, 2024

Neural Blind Source Separation and Diarization for Distant Speech Recognition.

Yoshiaki Bando

Tomohiko Nakamura

CoRR, 2024

The Interspeech 2024 Challenge on Speech Processing Using Discrete Units.

CoRR, 2024

EARS: An Anechoic Fullband Speech Dataset Benchmarked for Speech Enhancement and Dereverberation.

CoRR, 2024

To what extent can ASV systems naturally defend against spoofing attacks?

CoRR, 2024

URGENT Challenge: Universality, Robustness, and Generalizability For Speech Enhancement.

CoRR, 2024

Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement.

CoRR, 2024

4D ASR: Joint Beam Search Integrating CTC, Attention, Transducer, and Mask Predict Decoders.

CoRR, 2024

Contextualized Automatic Speech Recognition with Dynamic Vocabulary.

CoRR, 2024

Wav2Gloss: Generating Interlinear Glossed Text from Speech.

Taiqi He

Kwanghee Choi

Lindia Tjuatja

Nathaniel R. Robinson

CoRR, 2024

TMT: Tri-Modal Translation between Speech, Image, and Text by Processing Different Modalities as Different Languages.

CoRR, 2024

Evaluating and Improving Continual Learning in Spoken Language Understanding.

CoRR, 2024

Can you Remove the Downstream Model for Speaker Recognition with Self-Supervised Speech Features?

Barry-John Theobald

CoRR, 2024

SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition.

CoRR, 2024

Singing Voice Data Scaling-up: An Introduction to ACE-Opencpop and KiSing-v2.

CoRR, 2024

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models.

CoRR, 2024

SpeechBERTScore: Reference-Aware Automatic Evaluation of Speech Generation Leveraging NLP Evaluation Metrics.

CoRR, 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer.

CoRR, 2024

Improving Design of Input Condition Invariant Speech Enhancement.

CoRR, 2024

Improving ASR Contextual Biasing with Guided Attention.

CoRR, 2024

AugSumm: towards generalizable speech summarization using synthetic labels from large language model.

CoRR, 2024

UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions.

Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm.

Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Cross-Talk Reduction.

Zhong-Qiu Wang

Anurag Kumar

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

Visual Speech Recognition for Languages with Limited Labeled Data Using Automatic Labels from Whisper.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Cross-Modal Multi-Tasking for Speech-to-Text Translation via Hard Parameter Sharing.

Xuankai Chang

Antonios Anastasopoulos

Yuya Fujita

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Improving Audio Captioning Models with Fine-Grained Audio Features, Text Embedding Supervision, and LLM Mix-Up Augmentation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Contextualized Automatic Speech Recognition With Attention-Based Bias Phrase Boosted Beam Search.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Generative Context-Aware Fine-Tuning of Self-Supervised Speech Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Joint Optimization of Streaming and Non-Streaming Automatic Speech Recognition with Multi-Decoder and Knowledge Distillation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

PhISANet: Phonetically Informed Speech Animation Network.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

VoxtLM: Unified Decoder-Only Models for Consolidating Speech Recognition, Synthesis and Speech, Text Continuation Tasks.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

HuBERTopic: Enhancing Semantic Representation of HuBERT Through Self-Supervision Utilizing Topic Model.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Boosting Unknown-Number Speaker Separation with Transformer Decoder-Based Attractor.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

VoxMM: Rich Transcription of Conversations in the Wild.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Towards Practical and Efficient Image-to-Speech Captioning with Vision-Language Pre-Training and Multi-Modal Tokens.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

AugSumm: Towards Generalizable Speech Summarization Using Synthetic Labels from Large Language Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora.

Shammur Absar Chowdhury

Ahmed Ali

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization.

Amir Hussein

Antonios Anastasopoulos

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Dynamic-SUPERB: Towards a Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Less Peaky and More Accurate CTC Forced Alignment by Label Priors.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Phoneme-Aware Encoding for Prefix-Tree-Based Contextual ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

One Model to Rule Them All? Towards End-to-End Joint Speaker Diarization and Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Understanding Probe Behaviors Through Variational Bounds of Mutual Information.

Kwanghee Choi

Jee-Weon Jung

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Summary on the Multimodal Information-Based Speech Processing (MISP) 2023 Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Train Long and Test Long: Leveraging Full Document Contexts in Speech Processing.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Semi-Autoregressive Streaming ASR with Label Context.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

FastAdaSP: Multitask-Adapted Efficient Inference for Large Speech Language Model.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

Towards Robust Speech Representation Learning for Thousands of Languages.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Wav2Gloss: Generating Interlinear Glossed Text from Speech.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

On the Evaluation of Speech Foundation Models for Spoken Language Understanding.

Proceedings of the Findings of the Association for Computational Linguistics, 2024

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing.

J. Open Source Softw., November, 2023

Software Design and User Interface of ESPnet-SE++: Speech Enhancement for Robust Speech Processing (espnet-v.202310).

Dataset, October, 2023

STFT-Domain Neural Speech Enhancement With Very Low Algorithmic Latency.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Improving Speech Enhancement Performance by Leveraging Contextual Broad Phonetic Class Information.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

Online Neural Diarization of Unlimited Numbers of Speakers Using Global and Local Attractors.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

LegoNN: Building Modular Encoder-Decoder Models.

IEEE ACM Trans. Audio Speech Lang. Process., 2023

A dilemma of ground truth in noisy speech separation and an approach to lessen the impact of imperfect training data.

Comput. Speech Lang., 2023

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch.

CoRR, 2023

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond.

CoRR, 2023

EFFUSE: Efficient Self-Supervised Feature Fusion for E2E ASR in Multilingual and Low Resource Scenarios.

CoRR, 2023

UniverSLU: Universal Spoken Language Understanding for Diverse Classification and Sequence Generation Tasks with a Single Network.

CoRR, 2023

UniAudio: An Audio Foundation Model Toward Universal Audio Generation.

CoRR, 2023

AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models.

CoRR, 2023

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech.

CoRR, 2023

Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation.

CoRR, 2023

Visual Speech Recognition for Low-resource Languages with Automatic Labels From Whisper Model.

CoRR, 2023

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction.

CoRR, 2023

The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios.

CoRR, 2023

Exploration on HuBERT with Multiple Resolutions.

CoRR, 2023

Improving Cascaded Unsupervised Speech Translation with Denoising Back-translation.

CoRR, 2023

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head.

CoRR, 2023

Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling.

CoRR, 2023

Multi-Channel Target Speaker Extraction with Refinement: The WavLab Submission to the Second Clarity Enhancement Challenge.

CoRR, 2023

Unsupervised Data Selection for TTS: Using Arabic Broadcast News as a Case Study.

CoRR, 2023

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation.

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

Multilingual TTS Accent Impressions for Accented ASR.

Georgios Karakasidis

Nathaniel R. Robinson

Proceedings of the Text, Speech, and Dialogue - 26th International Conference, 2023

SigMoreFun Submission to the SIGMORPHON Shared Task on Interlinear Glossing.

Taiqi He

Lindia Tjuatja

Nathaniel R. Robinson

Proceedings of the 20th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, 2023

UNSSOR: Unsupervised Neural Speech Separation by Leveraging Over-determined Training Mixtures.

Zhong-Qiu Wang

Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

CMU's IWSLT 2023 Simultaneous Speech Translation System.

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Findings of the IWSLT 2023 Evaluation Campaign.

Sweta Agrawal

Antonios Anastasopoulos

Alexandra Chronopoulou

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

Deep Speech Synthesis from MRI-Based Articulatory Representations.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Bayes Risk Transducer: Transducer with Controllable Alignment Prediction.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A New Benchmark of Aphasia Speech Recognition and Detection Based on E-Branchformer and Multi-task Learning.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Time-synchronous one-pass Beam Search for Parallel Online and Offline Transducers with Dynamic Block Training.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Exploration on HuBERT with Multiple Resolution.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Incremental Blockwise Beam Search for Simultaneous Speech Translation with Controllable Quality-Latency Tradeoff.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

DPHuBERT: Joint Distillation and Pruning of Self-Supervised Speech Models.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Tensor decomposition for minimization of E2E SLU model toward on-device processing.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Reducing Barriers to Self-Supervised Learning: HuBERT Pre-training with Academic Compute.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

BASS: Block-wise Adaptation for Speech Summarization.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Learning to Speak from Text: Zero-Shot Multilingual Text-to-Speech with Unsupervised Text Pretraining.

Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations.

Proceedings of the International Conference on Machine Learning, 2023

Bayes Risk CTC: Controllable CTC Alignment in Sequence-to-Sequence Tasks.

Proceedings of the Eleventh International Conference on Learning Representations, 2023

TAPLoss: A Temporal Acoustic Parameter Loss for Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Towards Zero-Shot Code-Switched Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Multi-Blank Transducers for Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Wav2Seq: Pre-Training Speech-to-Text Encoder-Decoder Models Using Pseudo Languages.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Speaker-Independent Acoustic-to-Articulatory Speech Inversion.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Neural Speech Enhancement with Very Low Algorithmic Latency and Complexity via Integrated Full- and Sub-Band Modeling.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

TF-GridNet: Making Time-Frequency Domain Models Great Again for Monaural Speaker Separation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Context-Aware Fine-Tuning of Self-Supervised Speech Models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Enhancing Speech-To-Speech Translation with Multiple TTS Targets.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Bridging Speech and Textual Pre-Trained Models With Unsupervised ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

I3D: Transformer Architectures with Input-Dependent Dynamic Depth for Speech Recognition.

Yifan Peng

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Structured Pruning of Self-Supervised Pre-Trained Models for Speech Recognition and Understanding.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Align, Write, Re-Order: Explainable End-to-End Speech Translation via Operation Sequence Generation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

SpeechLMScore: Evaluating Speech Generation Using Speech Language Model.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Fully Unsupervised Topic Clustering of Unlabelled Spoken Audio Using Self-Supervised Representation Learning and Topic Model.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Articulatory Representation Learning via Joint Factor Analysis and Neural Matrix Factorization.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

E-Branchformer-Based E2E SLU Toward STOP On-Device Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Speech Summarization of Long Spoken Document: Improving Memory Efficiency of Speech/Text Encoders.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

In Search of Strong Embedding Extractors for Speaker Diarisation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

FindAdaptNet: Find and Insert Adapters by Learned Layer Importance.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

BECTRA: Transducer-Based End-to-End ASR with BERT-Enhanced Encoder.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

InterMPL: Momentum Pseudo-Labeling with Intermediate CTC Loss.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

EURO: ESPnet Unsupervised ASR Open-Source Toolkit.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Streaming Joint Speech Recognition and Disfluency Detection.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

The Pipeline System of ASR and NLU with MLM-Based Data Augmentation Toward STOP Low-Resource Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Multi-Channel Speaker Extraction with Adversarial Training: The WavLab Submission to the Clarity ICASSP 2023 Grand Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Improving Massively Multilingual ASR with Auxiliary CTC Objectives.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

A Unified One-Shot Prosody and Speaker Conversion System with Self-Supervised Discrete Speech Units.

Li-Wei Chen

Alexander Rudnicky

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Avoid Overthinking in Self-Supervised Models for Speech Recognition.

Dan Berrebbi

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

A Study on the Integration of Pipeline and E2E SLU Systems for Spoken Semantic Parsing Toward STOP Quality Challenge.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

CTC Alignments Improve Autoregressive Translation.

Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Toward Universal Speech Enhancement For Diverse Input Conditions.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Segment-Level Vectorized Beam Search Based on Partially Autoregressive Inference.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Domain Adaptation by Data Distribution Matching Via Submodularity For Speech Recognition.

Yusuke Shinohara

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

ESPnet-SUMM: Introducing a Novel Large Dataset, Toolkit, and a Cross-Corpora Evaluation of Speech Summarization Systems.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

A Single Speech Enhancement Model Unifying Dereverberation, Denoising, Speaker Counting, Separation, And Extraction.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

YODAS: YouTube-Oriented Dataset for Audio and Speech.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Summarize While Translating: Universal Model With Parallel Decoding for Summarization and Translation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for PyTorch.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

LV-CTC: Non-Autoregressive ASR With CTC and Latent Variable Models.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Joint Prediction and Denoising for Large-Scale Multilingual Self-Supervised Learning.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Synthetic Data Augmentation for ASR with Domain Filtering.

Proceedings of the Asia Pacific Signal and Information Processing Association Annual Summit and Conference, 2023

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2023

SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

A Vector Quantized Approach for Text to Speech Synthesis on Real-World Spontaneous Speech.

Li-Wei Chen

Alexander Rudnicky

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

End-to-End Dereverberation, Beamforming, and Speech Recognition in a Cocktail Party.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Encoder-Decoder Based Attractors for End-to-End Neural Diarization.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Improving Frame-Online Neural Speech Enhancement With Overlapped-Frame Prediction.

Zhong-Qiu Wang

IEEE Signal Process. Lett., 2022

Self-Supervised Speech Representation Learning: A Review.

[BibT_eX]

[DOI]

IEEE J. Sel. Top. Signal Process., 2022

Editorial Editorial of Special Issue on Self-Supervised Learning for Speech and Audio Processing.

[BibT_eX]

[DOI]

IEEE J. Sel. Top. Signal Process., 2022

Deep learning based multi-source localization with source splitting and its effectiveness in multi-talker speech recognition.

[BibT_eX]

[DOI]

Comput. Speech Lang., 2022

An investigation of neural uncertainty estimation for target speaker extraction equipped RNN transducer.

[BibT_eX]

[DOI]

Comput. Speech Lang., 2022

Train from scratch: Single-stage joint training of speech separation and recognition.

[BibT_eX]

[DOI]

Comput. Speech Lang., 2022

A review of speaker diarization: Recent advances with deep learning.

[BibT_eX]

[DOI]

Tae Jin Park

Naoyuki Kanda

Dimitrios Dimitriadis

Kyu Jeong Han

Shrikanth Narayanan

Comput. Speech Lang., 2022

Arabic speech recognition by end-to-end, modular systems and human.

[BibT_eX]

[DOI]

Amir Hussein

Ahmed Ali

Comput. Speech Lang., 2022

Joint speaker diarization and speech recognition based on region proposal networks.

[BibT_eX]

[DOI]

Zili Huang

Desh Raj

Comput. Speech Lang., 2022

Large-scale learning of generalised representations for speaker recognition.

[BibT_eX]

[DOI]

CoRR, 2022

Online Neural Diarization of Unlimited Numbers of Speakers.

[BibT_eX]

[DOI]

CoRR, 2022

Muskits: an End-to-End Music Processing Toolkit for Singing Voice Synthesis.

[BibT_eX]

[DOI]

CoRR, 2022

HEAR 2021: Holistic Evaluation of Audio Representations.

[BibT_eX]

[DOI]

CoRR, 2022

End-to-End Multi-Speaker ASR with Independent Vector Analysis.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

A Study on the Integration of Pre-Trained SSL, ASR, LM and SLU Models for Spoken Language Understanding.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

On Compressing Sequences for Self-Supervised Speech Models.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

E-Branchformer: Branchformer with Enhanced Merging for Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

SUPERB @ SLT 2022: Challenge on Generalization and Efficiency of Self-Supervised Speech Representation Learning.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Phone Inventories and Recognition for Every Language.

[BibT_eX]

[DOI]

Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

CMU's IWSLT 2022 Dialect Speech Translation System.

[BibT_eX]

[DOI]

Proceedings of the 19th International Conference on Spoken Language Translation, 2022

Findings of the IWSLT 2022 Evaluation Campaign.

[BibT_eX]

[DOI]

Proceedings of the 19th International Conference on Spoken Language Translation, 2022

Audio-Visual Wake Word Spotting in MISP2021 Challenge: Dataset Release and Deep Analysis.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Online Continual Learning of End-to-End Speech Recognition Models.

[BibT_eX]

[DOI]

Muqiao Yang

Ian R. Lane

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving Speech Enhancement through Fine-Grained Speech Characteristics.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Deep Speech Synthesis from Articulatory Representations.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Residual Language Model for End-to-end Speech Recognition.

[BibT_eX]

[DOI]

Chaitanya Prasad Narisetty

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Updating Only Encoders Prevents Catastrophic Forgetting of End-to-End ASR Models.

[BibT_eX]

[DOI]

Yuki Takashima

Shota Horiguchi

Yohei Kawaguchi

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Automatic Speech Recognition with Re-blocking Processing Based on Integrated Voice Activity Detection.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Minimum latency training of sequence transducers for streaming end-to-end speech recognition.

[BibT_eX]

[DOI]

Yusuke Shinohara

Nathaniel Romney Robinson

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Muskits: an End-to-end Music Processing Toolkit for Singing Voice Synthesis.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

When Is TTS Augmentation Through a Pivot Language Useful?

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

ASR2K: Speech Recognition for Around 2000 Languages without Audio.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Memory-Efficient Training of RNN-Transducer with Sampled Softmax.

[BibT_eX]

[DOI]

Lukas Lee

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Better Intermediates Improve CTC Inference.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

TriniTTS: Pitch-controllable End-to-end TTS without External Aligner.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Combining Spectral and Self-Supervised Features for Low Resource Speech Recognition and Translation.

[BibT_eX]

[DOI]

Dan Berrebbi

Jiatong Shi

Osbel López-Francisco

Jonathan D. Amith

Vincent Quenneville-Bélair

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Two-Pass Low Latency End-to-End Spoken Language Understanding.

[BibT_eX]

[DOI]

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding.

[BibT_eX]

[DOI]

Proceedings of the International Conference on Machine Learning, 2022

Torchaudio: Building Blocks for Audio and Speech Processing.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Run-and-Back Stitch Search: Novel Block Synchronous Decoding For Streaming Encoder-Decoder ASR.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Non-Autoregressive End-To-End Automatic Speech Recognition Incorporating Downstream Natural Language Processing.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Joint Speech Recognition and Audio Captioning.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Sequence Transduction with Graph-Based Supervision.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

An Exploration of HuBERT with Large Number of Cluster Units and Model Assessment Using Bayesian Information Criterion.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Conditional Diffusion Probabilistic Model for Speech Enhancement.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Towards Low-Distortion Multi-Channel Speech Enhancement: The ESPnet-SE Submission to the L3DAS22 Challenge.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Integrating Multiple ASR Systems into NLP Backend with Attention Fusion.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

S3PRL-VC: Open-Source Voice Conversion Framework with Self-Supervised Speech Representations.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Investigating Self-Supervised Learning for Speech Enhancement and Separation.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Multi-Channel End-To-End Neural Diarization with Distributed Microphones.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Improving Non-Autoregressive End-to-End Speech Recognition with Pre-Trained Acoustic and Language Models.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines And Results.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

ESPnet-SLU: Advancing Spoken Language Understanding Through ESPnet.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

BERT Meets CTC: New Formulation of End-to-End Speech Recognition with Pre-trained Masked Language Model.

[BibT_eX]

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models.

[BibT_eX]

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities.

[BibT_eX]

[DOI]

Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble.

[BibT_eX]

[DOI]

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021

Non-Autoregressive Transformer for Speech Recognition.

[BibT_eX]

[DOI]

IEEE Signal Process. Lett., 2021

Far-Field Automatic Speech Recognition.

[BibT_eX]

[DOI]

Proc. IEEE, 2021

Discretization and Re-synthesis: an alternative method to solve the Cocktail Party Problem.

[BibT_eX]

[DOI]

CoRR, 2021

JTubeSpeech: corpus of Japanese speech collected from YouTube for speech recognition and speaker verification.

[BibT_eX]

[DOI]

CoRR, 2021

TorchAudio: Building Blocks for Audio and Speech Processing.

[BibT_eX]

[DOI]

CoRR, 2021

ESPnet2-TTS: Extending the Edge of TTS Research.

[BibT_eX]

[DOI]

CoRR, 2021

Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring.

[BibT_eX]

[DOI]

CoRR, 2021

Encoder-Decoder Based Attractor Calculation for End-to-End Neural Diarization.

[BibT_eX]

[DOI]

CoRR, 2021

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio.

[BibT_eX]

[DOI]

CoRR, 2021

INTERSPEECH 2021 ConferencingSpeech Challenge: Towards Far-field Multi-Channel Speech Enhancement for Video Conferencing.

[BibT_eX]

[DOI]

CoRR, 2021

The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap.

[BibT_eX]

[DOI]

CoRR, 2021

Online End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers.

[BibT_eX]

[DOI]

CoRR, 2021

Closing the Gap Between Time-Domain Multi-Channel Speech Enhancement on Real and Simulation Conditions.

[BibT_eX]

[DOI]

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

Online End-To-End Neural Diarization with Speaker-Tracing Buffer.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Streaming Transformer ASR With Blockwise Synchronous Beam Search.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

End-to-End Speaker Diarization Conditioned on Speech Activity and Overlap Detection.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

DOVER-Lap: A Method for Combining Overlap-Aware Diarization Outputs.

[BibT_eX]

[DOI]

Desh Raj

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Integration of Speech Separation, Diarization, and Recognition for Multi-Speaker Meetings: System Description, Comparison, and Analysis.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Dual-Path RNN for Long Recording Speech Separation.

[BibT_eX]

[DOI]

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

ESPnet-SE: End-To-End Speech Enhancement and Separation Toolkit Designed for ASR Integration.

[BibT_eX]

[DOI]

Chenda Li

Jing Shi

Wangyou Zhang

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

HEAR: Holistic Evaluation of Audio Representations.

[BibT_eX]

[DOI]

Proceedings of the NeurIPS 2021 Competitions and Demonstrations Track, 2021

End-to-end ASR to jointly predict transcriptions and linguistic annotations.

[BibT_eX]

[DOI]

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Source and Target Bidirectional Knowledge Distillation for End-to-end Speech Translation.

[BibT_eX]

[DOI]

Tatsuya Kawahara

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks.

[BibT_eX]

[DOI]

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Self-Guided Curriculum Learning for Neural Machine Translation.

[BibT_eX]

[DOI]

Proceedings of the 18th International Conference on Spoken Language Translation, 2021

ESPnet-ST IWSLT 2021 Offline Speech Translation System.

[BibT_eX]

[DOI]

Proceedings of the 18th International Conference on Spoken Language Translation, 2021

Auxiliary Loss Function for Target Speech Extraction and Recognition with Weak Supervision Based on Speaker Characteristics.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

SUPERB: Speech Processing Universal PERformance Benchmark.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Differentiable Allophone Graphs for Language-Universal Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Online Streaming End-to-End Neural Diarization Handling Overlapping Speech and Flexible Numbers of Speakers.

[BibT_eX]

[DOI]

Kenji Nagamatsu

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Streaming End-to-End ASR Based on Blockwise Non-Autoregressive Models.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Data Augmentation Methods for End-to-End Speech Recognition on Distant-Talk Scenarios.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Semi-Supervised Training with Pseudo-Labeling for End-To-End Neural Diarization.

[BibT_eX]

[DOI]

Kenji Nagamatsu

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Leveraging Pre-Trained Language Model for Speech Sentiment Analysis.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Speech Representation Learning Combining Conformer CPC with Deep Cluster for the ZeroSpeech Challenge 2021.

[BibT_eX]

[DOI]

Alexander I. Rudnicky

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Speaker Verification-Based Evaluation of Single-Channel Speech Separation.

[BibT_eX]

[DOI]

Matthew Maciejewski

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Layer Pruning on Demand with Intermediate CTC.

[BibT_eX]

[DOI]

Jingu Kang

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Acoustic Event Detection with Classifier Chains.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multi-Mode Transformer Transducer with Stochastic Future Context.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Continuous Speech Separation Using Speaker Inventory for Long Recording.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Toward Streaming ASR with Non-Autoregressive Insertion-Based Model.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

GigaSpeech: An Evolving, Multi-Domain ASR Corpus with 10,000 Hours of Transcribed Audio.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding.

[BibT_eX]

[DOI]

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

End-to-End Dereverberation, Beamforming, and Speech Recognition with Improved Numerical Stability and Advanced Frontend.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Improving RNN Transducer with Target Speaker Extraction and Neural Uncertainty Estimation.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

End-To-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Training Noisy Single-Channel Speech Separation with Noisy Oracle Sources: A Large Gap and a Small Step.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Dual-Path Modeling for Long Recording Speech Separation in Meetings.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Intermediate Loss Regularization for CTC-Based Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Gaussian Kernelized Self-Attention for Long Sequence Data and its Application to CTC-Based Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Orthros: Non-Autoregressive End-to-End Speech Translation with Dual-Decoder.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

End-To-End Speaker Diarization as Post-Processing.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Improved Mask-CTC for Non-Autoregressive End-to-End ASR.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Recent Developments on ESPnet Toolkit Boosted By Conformer.

[BibT_eX]

[DOI]

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

EAT: Enhanced ASR-TTS for Self-Supervised Speech Recognition.

[BibT_eX]

[DOI]

Lukás Burget

Jan Honza Cernocký

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Leveraging End-to-End ASR for Endangered Language Documentation: An Empirical Study on Yolóxochitl Mixtec.

[BibT_eX]

[DOI]

Jiatong Shi

Jonathan D. Amith

Rey Castillo García

Esteban Guadalupe Sierra

Chaitanya Prasad Narisetty

Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

Leveraging State-of-the-art ASR Techniques to Audio Captioning.

[BibT_eX]

[DOI]

Proceedings of the 6th Workshop on Detection and Classification of Acoustic Scenes and Events 2021 (DCASE 2021), 2021

Cross-Lingual Transfer for Speech Processing Using Acoustic Language Similarity.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

ConferencingSpeech Challenge: Towards Far-Field Multi-Channel Speech Enhancement for Video Conferencing.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Attention-Based Multi-Hypothesis Fusion for Speech Summarization.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

On Prosody Modeling for ASR+TTS Based Voice Conversion.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Towards Neural Diarization for Unlimited Numbers of Speakers Using Global and Local Attractors.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

An Exploration of Self-Supervised Pretrained Representations for End-to-End Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

A Study of Transducer Based End-to-End ASR with ESPnet: Architecture, Auxiliary Loss and Decoding Strategies.

[BibT_eX]

[DOI]

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Understanding the Tradeoffs in Client-side Privacy for Downstream Speech Tasks.

[BibT_eX]

[DOI]

Louis-Philippe Morency

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

A Study on Speech Enhancement Based on Diffusion Probabilistic Model.

[BibT_eX]

[DOI]

Yen-Ju Lu

Yu Tsao

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2020

Automated Development of DNN Based Spoken Language Systems Using Evolutionary Algorithms.

[BibT_eX]

[DOI]

Takahiro Shinozaki

Proceedings of the Deep Neural Evolution - Deep Learning with Evolutionary Computation, 2020

Improving End-to-End Single-Channel Multi-Talker Speech Recognition.

[BibT_eX]

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2020

Multi-Stream End-to-End Speech Recognition.

[BibT_eX]

[DOI]

IEEE ACM Trans. Audio Speech Lang. Process., 2020

The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans.

[BibT_eX]

[DOI]

Wangyou Zhang

CoRR, 2020

Continuous Speech Separation Using Speaker Inventory for Long Multi-talker Recording.

[BibT_eX]

[DOI]

CoRR, 2020

Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation.

[BibT_eX]

[DOI]

CoRR, 2020

Augmentation adversarial training for unsupervised speaker recognition.

[BibT_eX]

[DOI]

CoRR, 2020

Streaming Transformer ASR with Blockwise Synchronous Inference.

[BibT_eX]

[DOI]

CoRR, 2020

The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge.

[BibT_eX]

[DOI]

Ashish Arora

Desh Raj

CoRR, 2020

Online End-to-End Neural Diarization with Speaker-Tracing Buffer.

[BibT_eX]

[DOI]

CoRR, 2020

Neural Speaker Diarization with Speaker-Wise Chain Rule.

[BibT_eX]

[DOI]

CoRR, 2020

DiscreTalk: Text-to-Speech as a Machine Translation Problem.

[BibT_eX]

[DOI]

Tomoki Hayashi

CoRR, 2020

CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings.

[BibT_eX]

[DOI]

CoRR, 2020

End-to-End Neural Diarization: Reformulating Speaker Diarization as Simple Multi-label Classification.

[BibT_eX]

[DOI]

CoRR, 2020

Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals.

[BibT_eX]

[DOI]

Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming.

[BibT_eX]

[DOI]

Wangyou Zhang

Xuankai Chang

Yanmin Qian

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors.

[BibT_eX]

[DOI]

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict.

[BibT_eX]

[DOI]

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Insertion-Based Modeling for End-to-End Automatic Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Learning Speaker Embedding from Text-to-Speech.

[BibT_eX]

[DOI]

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End ASR with Adaptive Span Self-Attention.

[BibT_eX]

[DOI]

Xuankai Chang

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Speaker-Conditional Chain Model for Speech Separation and Extraction.

[BibT_eX]

[DOI]

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Automatic Speech Recognition Integrated with CTC-Based Voice Activity Detection.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Far-Field Location Guided Target Speech Extraction Using End-to-End Speech Recognition Objectives.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Weakly-Supervised Sound Event Detection with Self-Attention.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

A Practical Two-Stage Training Strategy for Multi-Stream End-to-End Speech Recognition.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Semi-Supervised Speaker Adaptation for End-to-End Speech Synthesis with Pretrained Models.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Speaker Diarization with Region Proposal Network.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Attention-Based ASR with Lightweight and Dynamic Convolutions.

[BibT_eX]

[DOI]

Yuya Fujita

Motoi Omachi

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

End-To-End Multi-Speaker Speech Recognition With Transformer.

[BibT_eX]

[DOI]

Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Conformer-Based Sound Event Detection with Semi-Supervised Learning and Data Augmentation.

Proceedings of the 5th Workshop on Detection and Classification of Acoustic Scenes and Events 2020 (DCASE 2020), 2020

The Sequence-to-Sequence Baseline for the Voice Conversion Challenge 2020: Cascading ASR and TTS.

Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

ESPnet-ST: All-in-One Speech Translation Toolkit.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2020

2019

Evolution-Strategy-Based Automation of System Development for High-Performance Speech Recognition.

IEEE/ACM Trans. Audio Speech Lang. Process., 2019

Speech Processing for Digital Home Assistants: Combining signal processing with deep-learning techniques.

IEEE Signal Process. Mag., 2019

Introduction to the Issue on Far-Field Speech Processing in the Era of Deep Learning: Speech Enhancement, Separation, and Recognition.

IEEE J. Sel. Top. Signal Process., 2019

Phasebook and Friends: Leveraging Discrete Representations for Source Separation.

IEEE J. Sel. Top. Signal Process., 2019

Listen and Fill in the Missing Letters: Non-Autoregressive Transformer for Speech Recognition.

CoRR, 2019

Towards Online End-to-end Transformer Automatic Speech Recognition.

CoRR, 2019

Self-supervised Sequence-to-sequence ASR using Unpaired Speech and Text.

Lukáš Burget

Jan Černocký

CoRR, 2019

Dry, Focus, and Transcribe: End-to-End Integration of Dereverberation, Beamforming, and ASR.

CoRR, 2019

Generalized Weighted-Prediction-Error Dereverberation with Varying Source Priors For Reverberant Speech Recognition.

Toru Taniguchi

Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019

Speech Enhancement Using End-to-End Speech Recognition Objectives.

Xiaofei Wang

Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019

Analysis of Robustness of Deep Single-Channel Speech Separation Using Corpora Constructed From Multiple Domains.

Matthew Maciejewski

Gregory Sell

Yusuke Fujita

Proceedings of the 2019 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2019

Massively Multilingual Adversarial Speech Recognition.

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

ESPnet How2 Speech Translation System for IWSLT 2019: Pre-training, Knowledge Distillation, and Going Deeper.

Shun Kiyono

Jun Suzuki

Proceedings of the 16th International Conference on Spoken Language Translation, 2019

Pretraining by Backtranslation for End-to-End ASR in Low-Resource Settings.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End Multilingual Multi-Speaker Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Vectorized Beam Search for CTC-Attention-Based Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Study of the Performance of Automatic Speech Recognition Systems in Speakers with Parkinson's Disease.

Laureano Moro-Velázquez

Mark A. Hasegawa-Johnson

Odette Scharenborg

Heejin Kim

Najim Dehak

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Improving Transformer-Based End-to-End Speech Recognition with Connectionist Temporal Classification and Language Model Integration.

Shigeki Karita

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Analysis of Multilingual Sequence-to-Sequence Speech Recognition Systems.

Martin Karafiát

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Pre-Trained Text Embeddings for Enhanced Text-to-Speech Synthesis.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Speaker Recognition Benchmark Using the CHiME-5 Corpus.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End Neural Speaker Diarization with Permutation-Free Objectives.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

End-to-End SpeakerBeam for Single Channel Target Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Semi-Supervised Sequence-to-Sequence ASR Using Unpaired Speech and Text.

Lukáš Burget

Jan Černocký

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Weakly-Supervised Deep Recurrent Neural Networks for Basic Dance Step Generation.

Proceedings of the International Joint Conference on Neural Networks, 2019

Using ASR Methods for OCR.

Proceedings of the 2019 International Conference on Document Analysis and Recognition, 2019

Improving End-to-end Speech Recognition with Pronunciation-assisted Sub-word Modeling.

Hainan Xu

Shuoyang Ding

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Stream Attention-based Multi-array End-to-end Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

The Phasebook: Building Complex Masks via Discrete Representations for Source Separation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Acoustic Modeling for Overlapping Speech Recognition: JHU CHiME-5 Challenge System.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Joint Acoustic and Class Inference for Weakly Supervised Sound Event Detection.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Semi-supervised End-to-end Speech Recognition Using Text-to-speech and Autoencoders.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Acoustic Modeling for Distant Multi-talker Speech Recognition with Single- and Multi-channel Branches.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Transfer Learning of Language-independent End-to-end ASR with Language Model Fusion.

Tatsuya Kawahara

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Cycle-consistency Training for End-to-end Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Language Model Integration Based on Memory Control for Sequence to Sequence Speech Recognition.

Jesús Villalba

Najim Dehak

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

End-to-end Monaural Multi-speaker ASR System without Pretraining.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

Promising Accurate Prefix Boosting for Sequence-to-sequence ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

CNN-based Multichannel End-to-End Speech Recognition for Everyday Home Environments.

Proceedings of the 27th European Signal Processing Conference, 2019

Espresso: A Fast End-to-End Neural Speech Recognition Toolkit.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Transformer ASR with Contextual Block Processing.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

A Comparative Study on Transformer vs RNN in Speech Applications.

Ryuichi Yamamoto

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Multilingual End-to-End Speech Translation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

End-to-End Neural Speaker Diarization with Self-Attention.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Low Resource Multi-modal Data Augmentation for End-to-end ASR.

CoRR, 2018

Multi-encoder multi-resolution framework for end-to-end speech recognition.

Ruizhi Li

Xiaofei Wang

Sri Harish Reddy Mallidi

Hynek Hermansky

CoRR, 2018

Vectorization of hypotheses and speech for faster beam search in encoder decoder-based speech recognition.

Hiroshi Seki

CoRR, 2018

CNN-based MultiChannel End-to-End Speech Recognition for everyday home environments.

CoRR, 2018

Building Corpora for Single-Channel Speech Separation Across Multiple Domains.

Matthew Maciejewski

Gregory Sell

CoRR, 2018

Low-Resource Contextual Topic Identification on Speech.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

End-to-end Speech Recognition With Word-Based RNN Language Models.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Back-Translation-Style Data Augmentation for end-to-end ASR.

Kazuya Takeda

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Multilingual Sequence-to-Sequence Speech Recognition: Architecture, Transfer Learning, and Language Modeling.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

The JHU/KyotoU Speech Translation System for IWSLT 2018.

Xuan Zhang

Zhiqi Wang

Proceedings of the 15th International Conference on Spoken Language Translation, 2018

ESPnet: End-to-End Speech Processing Toolkit.

Jahn Heymann

Nanxin Chen

Tsubasa Ochiai

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Student-Teacher Learning for BLSTM Mask-based Speech Enhancement.

Szu-Jui Chen

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multi-Modal Data Augmentation for End-to-end ASR.

Shuoyang Ding

Peter Sibbern Frederiksen

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Semi-Supervised End-to-End Speech Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multi-Head Decoder for End-to-End Speech Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Effectiveness of Single-Channel BLSTM Enhancement for Language Identification.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Auxiliary Feature Based Adaptation of End-to-end ASR Systems.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Building State-of-the-art Distant Speech Recognition Using the CHiME-4 Challenge with a Setup of Speech Enhancement Baseline.

Szu-Jui Chen

Hainan Xu

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

The Fifth 'CHiME' Speech Separation and Recognition Challenge: Dataset, Task and Baselines.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

End-to-End Multi-Speaker Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

An End-to-End Language-Tracking Speech Recognizer for Mixed-Language Speech.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

Speaker Adaptation for Multichannel End-to-End Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, 2018

A Purely End-to-End System for Multi-speaker Speech Recognition.

Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, 2018

2017

Duration-Controlled LSTM for Polyphonic Sound Event Detection.

IEEE/ACM Trans. Audio Speech Lang. Process., 2017

Hybrid CTC/Attention Architecture for End-to-End Speech Recognition.

IEEE J. Sel. Top. Signal Process., 2017

Unified Architecture for Multichannel End-to-End Speech Recognition With Neural Beamforming.

IEEE J. Sel. Top. Signal Process., 2017

Prior-based Binary Masking and Discriminative Methods for Reverberant and Noisy Speech Recognition Using Distant Stereo Microphones.

J. Inf. Process., 2017

An analysis of environment, microphone and data simulation mismatches in robust speech recognition.

Comput. Speech Lang., 2017

Multi-microphone speech recognition integrating beamforming, robust feature extraction, and advanced DNN/RNN backend.

Comput. Speech Lang., 2017

The third 'CHiME' speech separation and recognition challenge: Analysis and outcomes.

Comput. Speech Lang., 2017

Multi-microphone speech recognition in everyday environments.

Comput. Speech Lang., 2017

Does speech enhancement work with end-to-end ASR objectives?: Experimental analysis of multichannel end-to-end ASR.

Tsubasa Ochiai

Shigeru Katagiri

Proceedings of the 27th IEEE International Workshop on Machine Learning for Signal Processing, 2017

Coupled Initialization of Multi-Channel Non-Negative Matrix Factorization Based on Spatial and Spectral Information.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Semi-Supervised Learning of a Pronunciation Dictionary from Disjoint Phonemic Transcripts and Text.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Advances in Joint CTC-Attention Based End-to-End Speech Recognition with a Deep CNN Encoder and RNN-LM.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Multichannel End-to-end Speech Recognition.

Proceedings of the 34th International Conference on Machine Learning, 2017

Student-teacher network learning with enhanced features.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Deep long short-term memory adaptive beamforming networks for multichannel robust speech recognition.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Joint CTC-attention based end-to-end speech recognition using multi-task learning.

Suyoun Kim

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

BLSTM-HMM hybrid system combined with sound activity detection network for polyphonic Sound Event Detection.

Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Language independent end-to-end architecture for joint language identification and speech recognition.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Composite embedding systems for ZeroSpeech2017 Track1.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Multi-level language modeling and decoding for open vocabulary end-to-end speech recognition.

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Joint CTC/attention decoding for end-to-end speech recognition.

Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics, 2017

Discriminative Beamforming with Phase-Aware Neural Networks for Speech Enhancement and Recognition.

New Era for Robust Speech Recognition, Exploiting Deep Learning, Springer, 2017

Toolkits for Robust Speech Processing.

New Era for Robust Speech Recognition, Exploiting Deep Learning, Springer, 2017

Preliminaries.

New Era for Robust Speech Recognition, Exploiting Deep Learning, Springer, 2017

Training Data Augmentation and Data Selection.

New Era for Robust Speech Recognition, Exploiting Deep Learning, Springer, 2017

Novel Deep Architectures in Speech Processing.

New Era for Robust Speech Recognition, Exploiting Deep Learning, Springer, 2017

Deep Recurrent Networks for Separation and Recognition of Single-Channel Speech in Nonstationary Background Audio.

New Era for Robust Speech Recognition, Exploiting Deep Learning, Springer, 2017

The CHiME Challenges: Robust Speech Recognition in Everyday Environments.

New Era for Robust Speech Recognition, Exploiting Deep Learning, Springer, 2017

2016

Automated structure discovery and parameter tuning of neural network language model based on evolution strategy.

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Dialog state tracking with attention-based sequence-to-sequence learning.

Proceedings of the 2016 IEEE Spoken Language Technology Workshop, 2016

Data Selection by Sequence Summarizing Neural Network in Mismatch Condition Training.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Single-Channel Multi-Speaker Separation Using Deep Clustering.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Context-Sensitive and Role-Dependent Spoken Language Understanding Using Bidirectional and Attention LSTMs.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Driver confusion status detection using recurrent neural networks.

Proceedings of the IEEE International Conference on Multimedia and Expo, 2016

Deep beamforming networks for multi-channel speech recognition.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Deep unfolding for multichannel source separation.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Sequence summarizing neural network for speaker adaptation.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Minimum word error training of long short-term memory recurrent neural network language models for speech recognition.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

Deep clustering: Discriminative embeddings for segmentation and separation.

Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing, 2016

High-accuracy user identification using EEG biometrics.

Proceedings of the 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2016

Bidirectional LSTM-HMM Hybrid System for Polyphonic Sound Event Detection.

Proceedings of the Workshop on Detection and Classification of Acoustic Scenes and Events, 2016

Beamforming networks using spatial covariance features for far-field speech recognition.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2016

2015

Effectiveness of dereverberation, feature transformation, discriminative training methods, and system combination approach for various reverberant environments.

Tomohiro Narita

EURASIP J. Adv. Signal Process., 2015

Uncertainty training and decoding methods of deep neural networks based on stochastic representation of enhanced features.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Efficient learning for spoken language understanding tasks with word embedding based pre-training.

Yi Luan

Bret Harsham

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Speech enhancement and recognition using multi-task learning of long short-term memory recurrent neural networks.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Robust speech processing using observation uncertainty and uncertainty propagation: session and paper overview.

Dorothea Kolossa

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Uncertainty propagation through deep neural networks.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Discriminative method for recurrent neural network language models.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Structure discovery of deep neural network based on evolutionary algorithms.

Takahiro Shinozaki

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Phase-sensitive and recognition-boosted speech separation using deep recurrent neural networks.

Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR.

Proceedings of the International Conference on Latent Variable Analysis and Signal Separation, 2015

Automation of system building for state-of-the-art large vocabulary speech recognition using evolution strategy.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Robust speech recognition in unknown reverberant and noisy conditions.

Sri Harish Reddy Mallidi

Hynek Hermansky

Stavros Tsakalidis

Richard M. Schwartz

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

The MERL/SRI system for the 3rd CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Feature-space structural MAPLR with regression tree-based multiple transformation matrices for DNN.

Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

Bayesian Speech and Language Processing.

Jen-Tzung Chien

Cambridge University Press, ISBN: 9781107295360, 2015

2014

Structural Bayesian Linear Regression for Hidden Markov Models.

Biing-Hwang Fred Juang

J. Signal Process. Syst., 2014

Discriminative NMF and its application to single-channel source separation.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Cost-level integration of statistical and rule-based dialog managers.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Sequential maximum mutual information linear discriminant analysis for speech recognition.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Deep recurrent de-noising auto-encoder and blind de-reverberation for reverberated speech recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Recurrent deep neural networks for robust speech recognition.

Chao Weng

Dong Yu

Biing-Hwang Fred Juang

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Black box optimization for automatic speech recognition.

Jonathan Le Roux

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Log-linear dialog manager.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2014

Ensemble integration of calibrated speaker localization and statistical speech detection in domestic environments.

Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2014

Sequence discriminative training for low-rank deep neural networks.

Proceedings of the 2014 IEEE Global Conference on Signal and Information Processing, 2014

2013

Feature Enhancement With Joint Use of Consecutive Corrupted and Noise Feature Vectors With Discriminative Region Weighting.

IEEE Trans. Speech Audio Process., 2013

Influence relation estimation based on lexical entrainment in conversation.

Speech Commun., 2013

Prior-shared feature and model space speaker adaptation by consistently employing map estimation.

Speech Commun., 2013

Training data selection with user's physical characteristics data for acceleration-based activity modeling.

Takuya Maekawa

Pers. Ubiquitous Comput., 2013

Cluster-based dynamic variance adaptation for interconnecting speech enhancement pre-processor and speech recognizer.

Comput. Speech Lang., 2013

Speech recognition in living rooms: Integrated speech enhancement and recognition system based on spatial, spectral and temporal modeling of sounds.

Comput. Speech Lang., 2013

Ensemble learning for speech enhancement.

Jonathan Le Roux

Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2013

Blocked Gibbs sampling based multi-scale mixture model for speaker clustering on noisy data.

Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2013

Discriminative training of acoustic models for system combination.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Statistical Dialogue Management using Intention Dependency Graph.

Proceedings of the Sixth International Joint Conference on Natural Language Processing, 2013

Stereo-based feature enhancement using dictionary learning.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

The second 'CHiME' speech separation and recognition challenge: Datasets, tasks and baselines.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

Effectiveness of discriminative training and feature transformation for reverberated and noisy speech.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2013

The second 'CHiME' speech separation and recognition challenge: An overview of challenge systems and outcomes.

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

A generalized discriminative training framework for system combination.

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

Statistical Voice Conversion Based on Noisy Channel Model.

IEEE Trans. Speech Audio Process., 2012

Structural Classification Methods Based on Weighted Finite-State Transducers for Automatic Speech Recognition.

IEEE Trans. Speech Audio Process., 2012

Low-Latency Real-Time Meeting Recognition and Understanding Using Distant Microphones and Omni-Directional Camera.

IEEE Trans. Speech Audio Process., 2012

Frame-wise model re-estimation method based on Gaussian pruning with weight normalization for noise robust voice activity detection.

Speech Commun., 2012

Fully Bayesian speaker clustering based on hierarchically structured utterance-oriented Dirichlet process mixture model.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Bag Of ARCS: New representation of speech segment features based on finite state machines.

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Fully Bayesian inference of multi-mixture Gaussian model and its evaluation using speaker clustering.

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

MFCC enhancement using joint corrupted and noise feature space for highly non-stationary noise environments.

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Effect of dialog acts on word use in polylogue.

Roland Roller

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Basis vector orthogonalization for an improved kernel gradient matching pursuit method.

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Decoding network optimization using minimum transition error training.

Yotaro Kubo

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Noise suppression with unsupervised joint speaker adaptation and noise mixture model estimation.

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Discriminative feature transforms using differenced maximum mutual information.

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

Handling uncertain observations in unsupervised topic-mixture language model adaptation.

Ekapol Chuangsuwanich

Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, 2012

2011

Topic tracking language model for speech recognition.

Comput. Speech Lang., 2011

Bayesian linear regression for Hidden Markov Model based on optimizing variational bounds.

Biing-Hwang Juang

Proceedings of the 2011 IEEE International Workshop on Machine Learning for Signal Processing, 2011

Unsupervised Activity Recognition with User's Physical Characteristics Data.

Takuya Maekawa

Proceedings of the 15th IEEE International Symposium on Wearable Computers (ISWC 2011), 2011

Model Adaptation for Automatic Speech Recognition Based on Multiple Time Scale Evolution.

Biing-Hwang Juang

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Speaker Clustering Based on Utterance-Oriented Dirichlet Process Mixture Model.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Learning Influences from Word Use in Polylogue.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A Robust Estimation Method of Noise Mixture Model for Noise Suppression.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Fashion Coordinates Recommender System Using Photographs from Fashion Magazines.

Hiroshi Sawada

Proceedings of the IJCAI 2011, 2011

Gibbs sampling based Multi-scale Mixture Model for speaker clustering.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

High accurate model-integration-based voice conversion using dynamic features and model structure optimization.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Subspace pursuit method for kernel-log-linear models.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Non-stationary noise estimation method based on bias-residual component decomposition for robust speech recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2011

Variance Compensation for Recognition of Reverberant Speech with Dereverberation Preprocessing.

Proceedings of the Robust Speech Recognition of Uncertain or Missing Data, 2011

2010

Predictor-Corrector Adaptation by Using Time Evolution System With Macroscopic Time Scale.

IEEE Trans. Speech Audio Process., 2010

A Sequential Pattern Classifier Based on Hidden Markov Kernel Machine and Its Application to Phoneme Classification.

IEEE J. Sel. Top. Signal Process., 2010

Online Unsupervised Classification With Model Comparison in the Variational Bayes Framework for Voice Activity Detection.

IEEE J. Sel. Top. Signal Process., 2010

Application of topic tracking model to language model adaptation and meeting analysis.

Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Real-time meeting recognition and understanding using distant microphones and omni-directional camera.

Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Large vocabulary continuous speech recognition using WFST-based linear classifier for structured data.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Probabilistic integration of joint density model and speaker model for voice conversion.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

A regularized discriminative training method of acoustic models derived by minimum relative entropy discrimination.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Improvements of search error risk minimization in viterbi beam search for speech recognition.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Voice activity detection using frame-wise model re-estimation method based on Gaussian pruning with weight normalization.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Minimum Error Classification with geometric margin control.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

A discriminative model for continuous speech recognition based on Weighted Finite State Transducers.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Discriminative training based on an integrated view of MPE and MMI in margin and error space.

Erik McDermott

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Search error risk minimization in Viterbi beam search for speech recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Using online model comparison in the Variational Bayes framework for online unsupervised Voice Activity Detection.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

Fast similarity search on a large speech data set with neighborhood graph indexing.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2010

2009

Static and Dynamic Variance Compensation for Recognition of Reverberant Speech With Dereverberation Preprocessing.

IEEE Trans. Speech Audio Process., 2009

Margin-space integration of MPE loss via differencing of MMI functionals for generalized error-weighted discriminative training.

Erik McDermott

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Stereo-input speech recognition using sparseness-based time-frequency masking in a reverberant environment.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Topic Tracking Model for Analyzing Consumer Purchase Behavior.

Proceedings of the IJCAI 2009, 2009

On-line adaptation and Bayesian detection of environmental changes based on a macroscopic time evolution system.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

A unified view for discriminative objective functions based on negative exponential of difference measure between strings.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2009

2008

A unified interpretation of adaptation approaches based on a macroscopic time evolution system and indirect/direct adaptation approaches.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

Combined static and dynamic variance adaptation for efficient interconnection of speech enhancement pre-processor with speech recognizer.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008

2007

Incremental Adaptation Based on a Macroscopic Time Evolution System.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2007

2006

Automatic determination of acoustic model topology using variational Bayesian estimation and clustering for large vocabulary continuous speech recognition.

Atsushi Sako

IEEE Trans. Speech Audio Process., 2006

Speech Recognition Based on Student's t-Distribution Derived from Total Bayesian Framework.
