Ralf Schlüter

Sakriani Sakti

CoRR, 2024

Dynamic Encoder Size Based on Data-Driven Layer-wise Pruning for Speech Recognition.

CoRR, 2024

On the Effect of Purely Synthetic Training Data for Different Automatic Speech Recognition Architectures.

Nick Rossenbach

Benedikt Hilmes

CoRR, 2024

Investigating the Effect of Label Topology and Training Criterion on ASR Performance and Alignment Quality.

CoRR, 2024

Chunked Attention-Based Encoder-Decoder Model for Streaming Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2024

On the Relation Between Internal Language Model and Sequence Discriminative Training for Neural Transducers.

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Mixture Encoder Supporting Continuous Speech Separation for Meeting Recognition.

CoRR, 2023

Comparative Analysis of the wav2vec 2.0 Feature Extractor.

Peter Vieting

CoRR, 2023

Improving and Analyzing Neural Speaker Embeddings for ASR.

CoRR, 2023

Competitive and Resource Efficient Factored Hybrid HMM Systems are Simpler Than You Think.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Mixture Encoder for Joint Speech Separation and Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

RASR2: The RWTH ASR Toolkit for Generic Sequence-to-sequence Speech Recognition.

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Enhancing and Adversarial: Improve ASR with Speaker Labels.

Proceedings of the IEEE International Conference on Acoustics, 2023

Lattice-Free Sequence Discriminative Training for Phoneme-Based Neural Transducers.

Proceedings of the IEEE International Conference on Acoustics, 2023

Efficient Utilization of Large Pre-Trained Models for Low Resource ASR.

Proceedings of the IEEE International Conference on Acoustics, 2023

Investigating The Effect of Language Models in Sequence Discriminative Training For Neural Transducers.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition.

Nick Rossenbach

Benedikt Hilmes

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

End-To-End Training of a Neural HMM with Label and Transition Probabilities.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

Efficient Use of Large Pre-Trained Models for Low Resource ASR.

CoRR, 2022

Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech.

CoRR, 2022

Monotonic Segmental Attention for Automatic Speech Recognition.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

HMM vs. CTC for Automatic Speech Recognition: Comparison Based on Full-Sum Training from Scratch.

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Discrete Steps towards Approximate Computing.

Proceedings of the 23rd International Symposium on Quality Electronic Design, 2022

Efficient Training of Neural Transducer for Speech Recognition.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Improving the Training Recipe for a Robust Conformer-based Hybrid Model.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Self-Normalized Importance Sampling for Neural Language Modeling.

Zijian Yang

Yingbo Gao

Jintao Jiang

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Automatic Learning of Subword Dependent Model Scales.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

On Language Model Integration for RNN Transducer Based Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2022

Conformer-Based Hybrid ASR System For Switchboard Dataset.

Proceedings of the IEEE International Conference on Acoustics, 2022

Efficient Sequence Training of Attention Models Using Approximative Recombination.

Proceedings of the IEEE International Conference on Acoustics, 2022

Improving Factored Hybrid HMM Acoustic Modeling without State Tying.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Prediction of Listener Perception of Argumentative Speech in a Crowdsourced Dataset Using (Psycho-)Linguistic and Fluency Features.

CoRR, 2021

Why does CTC result in peaky behavior?

Albert Zeyer

CoRR, 2021

The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech.

CoRR, 2021

Feature Replacement and Combination for Hybrid ASR Systems.

CoRR, 2021

Towards Consistent Hybrid HMM Acoustic Modeling.

CoRR, 2021

A study of latent monotonic attention variants.

Albert Zeyer

CoRR, 2021

Tight Integrated End-to-End Training for Cascaded Speech Translation.

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Acoustic Data-Driven Subword Modeling for End-to-End Speech Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Equivalence of Segmental and Neural Transducer Modeling: A Proof of Concept.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Librispeech Transducer Model with Internal Language Model Prior Correction.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Investigating Methods to Improve Language Model Integration for Attention-Based Encoder-Decoder ASR Models.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

The Impact of ASR on the Automatic Analysis of Linguistic Complexity and Sophistication in Spontaneous L2 Speech.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

On Sampling-Based Training Criteria for Neural Language Modeling.

Yingbo Gao

David Thulke

Khoa Viet Tran

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Phoneme Based Neural Transducer for Large Vocabulary Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2021

On Architectures and Training for Raw Waveform Feature Extraction in ASR.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Investigations on Phoneme-Based End-To-End Speech Recognition.

CoRR, 2020

Robust Beam Search for Encoder-Decoder Attention Based Speech Recognition Without Length Bias.

Wei Zhou

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

A New Training Pipeline for an Improved Neural Transducer.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Context-Dependent Acoustic Modeling Without Explicit Phone Clustering.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Early Stage LM Integration Using Local and Global Log-Linear Combination.

Wilfried Michel

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Investigation of Large-Margin Softmax in Neural Language Modeling.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

LVCSR with Transformer Language Models.

Eugen Beck

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Full-Sum Decoding for Hybrid HMM Based Speech Recognition Using LSTM Language Model.

Wei Zhou

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Layer-Normalized LSTM for Hybrid-HMM and End-to-End ASR.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Frame-Level MMI as a Sequence Discriminative Training Criterion for LVCSR.

Wilfried Michel

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

How Much Self-Attention Do We Need? Trading Attention for Feed-Forward Layers.

Kazuki Irie

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Comprehensive Study of Residual CNNs for Acoustic Modeling in ASR.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Exploring a Zero-Order Direct HMM Based on Latent Attention for Automatic Speech Recognition.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Training of reduced-rank linear transformations for multi-layer polynomial acoustic features for speech recognition.

Speech Commun., 2019

Upper and Lower Tight Error Bounds for Feature Omission with an Extension to Context Reduction.

Eugen Beck

IEEE Trans. Pattern Anal. Mach. Intell., 2019

LSTM Language Models for LVCSR in First-Pass Decoding and Lattice-Rescoring.

CoRR, 2019

RWTH ASR Systems for LibriSpeech: Hybrid vs Attention - w/o Data Augmentation.

CoRR, 2019

On Using SpecAugment for End-to-End Speech Translation.

Proceedings of the 16th International Conference on Spoken Language Translation, 2019

Survey Talk: Modeling in Automatic Speech Recognition: Beyond Hidden Markov Models.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Rescoring Keyword Search Confidence Estimates with Graph-Based Re-Ranking Using Acoustic Word Embeddings.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Comparison of Lattice-Free and Lattice-Based Sequence Discriminative Training Criteria for LVCSR.

Wilfried Michel

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

An Analysis of Local Monotonic Attention Variants.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

RWTH ASR Systems for LibriSpeech: Hybrid vs Attention.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Cumulative Adaptation for BLSTM Acoustic Models.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Language Modeling with Deep Transformers.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Investigation into Joint Optimization of Single Channel Speech Enhancement and Acoustic Modeling for Robust ASR.

Tobias Menne

Proceedings of the IEEE International Conference on Acoustics, 2019

On Using 2D Sequence-to-sequence Models for Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2019

A Comparison of Transformer and LSTM Encoder Decoder Models for ASR.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Training Language Models for Long-Span Cross-Sentence Evaluation.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Speaker Adapted Beamforming for Multi-Channel Automatic Speech Recognition.

Tobias Menne

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Improved Training of End-to-end Attention Models for Speech Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Investigation on LSTM Recurrent N-gram Language Models for Speech Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Comparison of BLSTM-Layer-Specific Affine Transformations for Speaker Adaptation.

Markus Kitza

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Investigation on Estimation of Sentence Probability by Combining Forward, Backward and Bi-directional LSTM-RNNs.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Segmental Encoder-Decoder Models for Large Vocabulary Automatic Speech Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Acoustic Modeling of Speech Waveform Based on Multi-Resolution, Neural Network Signal Processing.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Prediction of LSTM-RNN Full Context States as a Subtask for N-Gram Feedforward Language Models.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Sequence Modeling and Alignment for LVCSR-Systems.

Proceedings of the 13th ITG Symposium on Speech Communication, 2018

2017

Inverted Alignments for End-to-End Automatic Speech Recognition.

IEEE J. Sel. Top. Signal Process., 2017

The 2016 RWTH Keyword Search System for Low-Resource Languages.

Proceedings of the Speech and Computer - 19th International Conference, 2017

CTC in the Context of Generalized Full-Sum HMM Training.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Parallel Neural Network Features for Improved Tandem Acoustic Modeling.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Faster sequence training.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

A comprehensive study of deep bidirectional LSTM RNNs for acoustic modeling in speech recognition.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Noisy objective functions based on the f-divergence.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Investigations on byte-level convolutional neural networks for language modeling in low resource speech recognition.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

RETURNN: The RWTH extensible training framework for universal recurrent neural networks.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

2016

Automatic Speech Recognition Based on Neural Networks.

Proceedings of the Speech and Computer - 18th International Conference, 2016

The RWTH Aachen LVCSR system for IWSLT-2016 German Skype conversation recognition task.

Proceedings of the 13th International Conference on Spoken Language Translation, 2016

Towards Online-Recognition with Deep Bidirectional LSTM Acoustic Models.

Albert Zeyer

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

LSTM, GRU, Highway and a Bit of Attention: An Empirical Overview for Language Modeling in Speech Recognition.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Investigation on log-linear interpolation of multi-domain neural network language model.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Robust Online Multi-Channel Speech Recognition.

Proceedings of the 12th ITG Symposium on Speech Communication, 2016

2015

From Feedforward to Recurrent LSTM Neural Networks for Language Modeling.

IEEE ACM Trans. Audio Speech Lang. Process., 2015

Improvements in RWTH LVCSR evaluation systems for Polish, Portuguese, English, Urdu, and Arabic.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Bag-of-words input for long history representation in neural network-based language models for speech recognition.

Kazuki Irie

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Multilingual features based keyword search for very low-resource languages.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Convolutional neural networks for acoustic modeling of raw time signal in LVCSR.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Error bounds for context reduction and feature omission.

Eugen Beck

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Investigations on sequence training of neural networks.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Sequence-discriminative training of recurrent neural networks.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Integrating Gaussian mixtures into deep neural networks: Softmax layer with hidden variables.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Investigation of mixture splitting concept for training linear bottlenecks of deep neural network acoustic models.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Improved strategies for a zero OOV rate LVCSR system.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Unsupervised adaptation of a denoising autoencoder by Bayesian Feature Enhancement for reverberant ASR under mismatch conditions.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Speaker adaptive joint training of Gaussian mixture models and bottleneck features.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Multilingual representations for low resource speech recognition and keyword search.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Acoustic modeling with deep neural networks using raw time signal for LVCSR.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Data augmentation, feature combination, and multilingual neural networks to improve ASR and KWS performance for low-resource languages.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Lattice decoding and rescoring with long-span neural network language models.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

rwthlm - the RWTH Aachen University neural network language modeling toolkit.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

RWTH LVCSR systems for Quaero and EU-Bridge: German, Polish, Spanish and Portuguese.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Word pair approximation for more efficient decoding with high-order language models.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Open-Lexicon Language Modeling Combining Word and Character Levels.

Proceedings of the 14th International Conference on Frontiers in Handwriting Recognition, 2014

Mean-normalized stochastic gradient for large-scale deep learning.

Proceedings of the IEEE International Conference on Acoustics, 2014

RASR/NN: The RWTH neural network toolkit for speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2014

The RWTH English lecture recognition system.

Proceedings of the IEEE International Conference on Acoustics, 2014

Multilingual MRASTA features for low-resource keyword search and speech recognition systems.

Proceedings of the IEEE International Conference on Acoustics, 2014

A family of discriminative training criteria based on the F-divergence for deep neural networks.

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

Lexical Prefix Tree and WFST: A Comparison of Two Dynamic Search Concepts for LVCSR.

David Rybach

IEEE Trans. Speech Audio Process., 2013

Investigations on an EM-Style Optimization Algorithm for Discriminative Training of HMMs.

Georg Heigold

IEEE ACM Trans. Audio Speech Lang. Process., 2013

The RWTH Aachen German and English LVCSR systems for IWSLT-2013.

Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign@IWSLT 2013, 2013

Novel tight classification error bounds under mismatch conditions based on f-Divergence.

Proceedings of the 2013 IEEE Information Theory Workshop, 2013

Multilingual hierarchical MRASTA features for ASR.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Training log-linear acoustic models in higher-order polynomial feature space for speech recognition.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Feature-rich sub-lexical language models using a maximum entropy approach for German LVCSR.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Relative error bounds for statistical classifiers based on the f-divergence.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Morpheme level hierarchical Pitman-Yor class-based language models for LVCSR of morphologically rich languages.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Improving LVCSR with hidden conditional random fields for grapheme-to-phoneme conversion.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Development of the RWTH transcription system for Slovenian.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

A critical evaluation of stochastic algorithms for convex optimization.

Proceedings of the IEEE International Conference on Acoustics, 2013

Deep hierarchical bottleneck MRASTA features for LVCSR.

Proceedings of the IEEE International Conference on Acoustics, 2013

Investigation on cross- and multilingual MLP features under matched and mismatched acoustical conditions.

Proceedings of the IEEE International Conference on Acoustics, 2013

Comparison of feedforward and recurrent neural network language models.

Proceedings of the IEEE International Conference on Acoustics, 2013

Feature combination and stacking of recurrent and non-recurrent neural networks for LVCSR.

Proceedings of the IEEE International Conference on Acoustics, 2013

Advanced search space pruning with acoustic look-ahead for WFST based LVCSR.

Proceedings of the IEEE International Conference on Acoustics, 2013

System combination and score normalization for spoken term detection.

Proceedings of the IEEE International Conference on Acoustics, 2013

Open vocabulary handwriting recognition using combined word-level and character-level language models.

Proceedings of the IEEE International Conference on Acoustics, 2013

A high-performance Cantonese keyword search system.

Proceedings of the IEEE International Conference on Acoustics, 2013

Efficient nearly error-less LVCSR decoding based on incremental forward and backward passes.

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

WFST Enabled Solutions to ASR Problems: Beyond HMM Decoding.

IEEE Trans. Speech Audio Process., 2012

Discriminative Training for Automatic Speech Recognition: Modeling, Criteria, Optimization, Implementation, and Performance.

IEEE Signal Process. Mag., 2012

Does the Cost Function Matter in Bayes Decision Rule?

Markus Nußbaum-Thom

IEEE Trans. Pattern Anal. Mach. Intell., 2012

Phase difference of filter-stable part-tones as acoustic feature.

Friedhelm R. Drepper

Proceedings of the IEEE Statistical Signal Processing Workshop, 2012

Accelerated Batch Learning of Convex Log-linear Models for LVCSR.

Simon Wiesler

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Context-Dependent MLPs for LVCSR: TANDEM, Hybrid or Both?

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Non-stationary signal processing and its application in speech recognition.

Friedhelm R. Drepper

Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2012

Simultaneous Discriminative Training and Mixture Splitting of HMMs for Speech Recognition.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

LSTM Neural Networks for Language Modeling.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Hierarchical hybrid language models for open vocabulary continuous speech recognition using WFST.

Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2012

Investigation of Maximum Entropy Hybrid Language Models for Open Vocabulary German and Polish LVCSR.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Posterior-Scaled MPE: Novel Discriminative Training Criteria.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Search Space Pruning Based on Anticipated Path Recombination in LVCSR.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Morpheme Level Feature-based Language Models for German LVCSR.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Comparison and combination of different CRBE based MLP features for LVCSR.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Silence is golden: Modeling non-speech events in WFST-based dynamic network decoders.

David Rybach

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Extended search space pruning in LVCSR.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Joining advantages of word-conditioned and token-passing decoding.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Investigations on the use of morpheme level features in Language Models for Arabic LVCSR.

Amr El-Desoky Mousa

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Basis vector orthogonalization for an improved kernel gradient matching pursuit method.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

On the Relationship Between Bayes Risk and Word Error Rate in ASR.

Markus Nußbaum-Thom

IEEE Trans. Speech Audio Process., 2011

Equivalence of Generative and Log-Linear Models.

IEEE Trans. Speech Audio Process., 2011

Speech recognition for machine translation in Quaero.

Proceedings of the 2011 International Workshop on Spoken Language Translation, 2011

A Study on Speaker Normalized MLP Features in LVCSR.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Log-Linear Optimization of Second-Order Polynomial Features with Subsequent Dimension Reduction for Speech Recognition.

Muhammad Ali Tahir

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

On the Estimation of Discount Parameters for Language Model Smoothing.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Hybrid Language Models Using Mixed Types of Sub-Lexical Units for Open Vocabulary German LVCSR.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Improved Acoustic Feature Combination for LVCSR by Neural Networks.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Compound Word Recombination for German LVCSR.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Acoustic Look-Ahead for More Efficient Decoding in LVCSR.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Morpheme Based Factored Language Models for German LVCSR.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Feature selection for log-linear acoustic models.

Proceedings of the IEEE International Conference on Acoustics, 2011

Non-stationary feature extraction for automatic speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2011

The RWTH 2010 Quaero ASR evaluation system for English, French, and German.

Proceedings of the IEEE International Conference on Acoustics, 2011

Using morpheme and syllable based sub-words for Polish LVCSR.

Proceedings of the IEEE International Conference on Acoustics, 2011

A comparative analysis of dynamic network decoding.

David Rybach

Proceedings of the IEEE International Conference on Acoustics, 2011

Exploiting sparseness of backing-off language models for efficient look-ahead in LVCSR.

Proceedings of the IEEE International Conference on Acoustics, 2011

Subspace pursuit method for kernel-log-linear models.

Proceedings of the IEEE International Conference on Acoustics, 2011

A convergence analysis of log-linear training and its application to speech recognition.

Simon Wiesler

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Discriminative splitting of Gaussian/log-linear mixture HMMs for speech recognition.

Muhammad Ali Tahir

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Cross-lingual portability of Chinese and English neural network features for French and German LVCSR.

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

Margin-Based Discriminative Training for String Recognition.

IEEE J. Sel. Top. Signal Process., 2010

Sub-lexical language models for German LVCSR.

Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Evaluation of automatic transcription systems for the judicial domain.

Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

A Hybrid Morphologically Decomposed Factored Language Models for Arabic LVCSR.

Amr El-Desoky

Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2010

A discriminative splitting criterion for phonetic decision trees.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

On the relation of Bayes risk, word error, and word posteriors in ASR.

Markus Nußbaum-Thom

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Revisiting VTLN using linear transformation on conventional MFCC.

Doddipatla Rama Sanand

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Hierarchical bottle neck features for LVCSR.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Parallel lexical-tree based LVCSR on multi-core processors.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

The RWTH 2009 Quaero ASR evaluation system for English and German.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Time conditioned search in automatic speech recognition reconsidered.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Discriminative adaptation for log-linear acoustic models.

Jonas Lööf

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Discriminative HMMs, log-linear models, and CRFs: What is the difference?

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

The RWTH Aachen University open source speech recognition system.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Development of the GALE 2008 Mandarin LVCSR system.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Parallel fast likelihood computation for LVCSR using mixture decomposition.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Bayes risk approximations using time overlap with an application to system combination.

Björn Hoffmeister

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Log-linear model combination with word-dependent scaling factors.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Investigations on convex optimization using log-linear HMMs for digit string recognition.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Investigating the use of morphological decomposition and diacritization for improving Arabic LVCSR.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Automatic Transcription of Courtroom Recordings in the JUMAS project.

Proceedings of the 2nd International Conference on ICT Solutions for Justice, 2009

Audio segmentation for speech recognition using segment features.

Proceedings of the IEEE International Conference on Acoustics, 2009

Modified MPE/MMI in a transducer-based framework.

Georg Heigold

Proceedings of the IEEE International Conference on Acoustics, 2009

Investigations on features for log-linear acoustic models in continuous speech recognition.

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

Generalized likelihood ratio discriminant analysis.

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

2008

Development of the SRI/Nightingale Arabic ASR system.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Recent improvements of the RWTH GALE Mandarin LVCSR system.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

iCNC and iROVER: the limits of improving system combination with classification?

Björn Hoffmeister

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

On the equivalence of Gaussian and log-linear HMMs.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Modified MMI/MPE: a direct evaluation of the margin in speech recognition.

Proceedings of the Machine Learning, 2008

A GIS-like training algorithm for log-linear models with hidden variables.

Proceedings of the IEEE International Conference on Acoustics, 2008

2007

Using multiple acoustic feature sets for speech recognition.

Speech Commun., 2007

iROVER: Improving System Combination with Classification.

Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2007

Hierarchical neural networks feature extraction for LVCSR system.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Efficient estimation of speaker-specific projecting feature transforms.

Jonas Lööf

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

The RWTH 2007 TC-STAR evaluation system for European English and Spanish.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

On the equivalence of Gaussian HMM and Gaussian HMM-like hidden conditional random fields.

Georg Heigold

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

An improved method for unsupervised training of LVCSR systems.

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Gammatone Features and Feature Combination for Large Vocabulary Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2007

Cross-Site and Intra-Site ASR System Combination: Comparisons on Lattice and 1-Best Methods.

Proceedings of the IEEE International Conference on Acoustics, 2007

Advances in Arabic broadcast news transcription at RWTH.

Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

Development of the 2007 RWTH Mandarin LVCSR system.

Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006

Feature combination using linear discriminant analysis and its pitfalls.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

The 2006 RWTH parliamentary speeches transcription system.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Frame based system combination and a comparison with weighted ROVER and CNC.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

2005

Bayes risk minimization using metric loss functions.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Investigations on error minimizing training criteria for discriminative training in automatic speech recognition.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Articulatory motivated acoustic features for speech recognition.

Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Acoustic Feature Combination for Robust Speech Recognition.

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Cross Domain Automatic Transcription on the TC-STAR EPPS Corpus.

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004

Discriminative training with tied covariance matrices.

Wolfgang Macherey

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

2003

Extraction methods of voicing feature for robust speech recognition.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

2002

Robust speech recognition using a voiced-unvoiced feature.

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

2001

Confidence measures for large vocabulary continuous speech recognition.

IEEE Trans. Speech Audio Process., 2001

Model-based MCE bound to the true Bayes' error.

IEEE Signal Process. Lett., 2001

Comparison of discriminative training criteria and optimization methods for speech recognition.

Speech Commun., 2001

Vocal tract normalization equals linear transformation in cepstral space.

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Explicit word error minimization using word hypothesis posterior probabilities.

Proceedings of the IEEE International Conference on Acoustics, 2001

Using phase spectrum information for improved speech recognition performance.

Proceedings of the IEEE International Conference on Acoustics, 2001

Computing Mel-frequency cepstral coefficients on the power spectrum.

Proceedings of the IEEE International Conference on Acoustics, 2001

2000

Investigations on discriminative training criteria.

PhD thesis, 2000

The RWTH Large Vocabulary Speech Recognition System for Spontaneous Speech.

Proceedings of the KONVENS 2000 / Sprachkommunikation, 2000

Speech recognition using context conditional word posterior probabilities.

Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Using posterior word probabilities for improved speech recognition.

Proceedings of the IEEE International Conference on Acoustics, 2000

Recent improvements of the RWTH large vocabulary speech recognition system on spontaneous speech.

Proceedings of the IEEE International Conference on Acoustics, 2000

1999

A combined maximum mutual information and maximum likelihood approach for mixture density splitting.

Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Discriminative Training of Gaussian Mixtures for Image Object Recognition.

Jörg Dahmen

Proceedings of the Mustererkennung 1999, 1999

1998

Using word probabilities as confidence measures.

Klaus Macherey

Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

Comparison of discriminative training criteria.
