Zoltán Tüske

CoRR, 2023

Speech Translation with Style: AppTek's Submissions to the IWSLT Subtitling and Formality Tracks in 2023.

Parnia Bahar

Patrick Wilken

Javier Iranzo-Sánchez

Mattia Di Gangi

Evgeny Matusov

Proceedings of the 20th International Conference on Spoken Language Translation, 2023

2022

Improving End-to-end Models for Set Prediction in Spoken Language Understanding.

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

On the Limit of English Conversational Speech Recognition.

George Saon

Brian Kingsbury

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Improving Customization of Neural Transducers by Mitigating Acoustic Mismatch of Synthesized Audio.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Integrating Dialog History into End-to-End Spoken Language Understanding Systems.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

4-Bit Quantization of LSTM-Based Speech Recognition Models.

Swagath Venkataramani

Kailash Gopalakrishnan

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Reducing Exposure Bias in Training Recurrent Neural Network Transducers.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Advancing RNN Transducer Technology for Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2021

End-to-End Spoken Language Understanding Using Transformer Networks and Self-Supervised Pre-Trained Features.

Edmilson da Silva Morais

Proceedings of the IEEE International Conference on Acoustics, 2021

RNN Transducer Models for Spoken Language Understanding.

Proceedings of the IEEE International Conference on Acoustics, 2021

2020

Discriminative feature modeling for statistical speech recognition.

PhD thesis, 2020

Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard-300.

CoRR, 2020

Single Headed Attention Based Sequence-to-Sequence Model for State-of-the-Art Results on Switchboard.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

End-to-End Spoken Language Understanding Without Full Transcripts.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Alignment-Length Synchronous Decoding for RNN Transducer.

George Saon

Kartik Audhkhasi

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Advancing Sequence-to-Sequence Based Speech Recognition.

Kartik Audhkhasi

George Saon

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Detection and Recovery of OOVs for Improved English Broadcast News Captioning.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Challenging the Boundaries of Speech Recognition: The MALACH Corpus.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Forget a Bit to Learn Better: Soft Forgetting for CTC-Based Automatic Speech Recognition.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

English Broadcast News Speech Recognition by Humans and Machines.

Alice Kaiser-Schatzlein

Bern Samko

Proceedings of the IEEE International Conference on Acoustics, 2019

Sequence Noise Injected Training for End-to-end Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2019

Simplified LSTMs for Speech Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Semi-Supervised Training and Data Augmentation for Adaptation of Automatic Broadcast News Captioning Systems.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks.

Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, 2018

Investigation on LSTM Recurrent N-gram Language Models for Speech Recognition.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Acoustic Modeling of Speech Waveform Based on Multi-Resolution, Neural Network Signal Processing.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

The 2016 RWTH Keyword Search System for Low-Resource Languages.

Proceedings of the Speech and Computer - 19th International Conference, 2017

Parallel Neural Network Features for Improved Tandem Acoustic Modeling.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

Automatic Speech Recognition Based on Neural Networks.

Proceedings of the Speech and Computer - 18th International Conference, 2016

The RWTH Aachen LVCSR system for IWSLT-2016 German Skype conversation recognition task.

Proceedings of the 13th International Conference on Spoken Language Translation, 2016

LSTM, GRU, Highway and a Bit of Attention: An Empirical Overview for Language Modeling in Speech Recognition.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Investigation on log-linear interpolation of multi-domain neural network language model.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Improvements in RWTH LVCSR evaluation systems for Polish, Portuguese, English, Urdu, and Arabic.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Multilingual features based keyword search for very low-resource languages.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Convolutional neural networks for acoustic modeling of raw time signal in LVCSR.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Integrating Gaussian mixtures into deep neural networks: Softmax layer with hidden variables.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Speaker adaptive joint training of Gaussian mixture models and bottleneck features.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Multilingual representations for low resource speech recognition and keyword search.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Acoustic modeling with deep neural networks using raw time signal for LVCSR.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Data augmentation, feature combination, and multilingual neural networks to improve ASR and KWS performance for low-resource languages.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Lattice decoding and rescoring with long-span neural network language models.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

RWTH LVCSR systems for Quaero and EU-Bridge: German, Polish, Spanish and Portuguese.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

The RWTH English lecture recognition system.

Proceedings of the IEEE International Conference on Acoustics, 2014

Multilingual MRASTA features for low-resource keyword search and speech recognition systems.

Proceedings of the IEEE International Conference on Acoustics, 2014

2013

The RWTH Aachen German and English LVCSR systems for IWSLT-2013.

Proceedings of the 10th International Workshop on Spoken Language Translation: Evaluation Campaign@IWSLT 2013, 2013

Multilingual hierarchical MRASTA features for ASR.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Development of the RWTH transcription system for Slovenian.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Deep hierarchical bottleneck MRASTA features for LVCSR.

Proceedings of the IEEE International Conference on Acoustics, 2013

Investigation on cross- and multilingual MLP features under matched and mismatched acoustical conditions.

Proceedings of the IEEE International Conference on Acoustics, 2013

2012

Phase difference of filter-stable part-tones as acoustic feature.

Friedhelm R. Drepper

Proceedings of the IEEE Statistical Signal Processing Workshop, 2012

Context-Dependent MLPs for LVCSR: TANDEM, Hybrid or Both?

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Non-stationary signal processing and its application in speech recognition.

Friedhelm R. Drepper

Proceedings of the ISCA Workshop on Statistical And Perceptual Audition, 2012

Posterior-Scaled MPE: Novel Discriminative Training Criteria.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Comparison and combination of different CRBE based MLP features for LVCSR.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

A Study on Speaker Normalized MLP Features in LVCSR.

Christian Plahl