Florian Metze

Orcid: 0000-0002-6663-8600

  • Carnegie Mellon University, Pittsburgh, USA

According to our database, Florian Metze authored at least 256 papers between 1996 and 2024.

Audio-Journey: Open Domain Latent Diffusion Based Text-To-Audio Generation.
Proceedings of the IEEE International Conference on Acoustics, 2024

LegoNN: Building Modular Encoder-Decoder Models.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

CTC Alignments Improve Autoregressive Translation.
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, 2023

Error-aware Quantization through Noise Tempering.
CoRR, 2022

SQuAT: Sharpness- and Quantization-Aware Training for BERT.
CoRR, 2022

Robustness of Neural Architectures for Audio Event Detection.
CoRR, 2022

Masked Autoencoders that Listen.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Phone Inventories and Recognition for Every Language.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

ASR2K: Speech Recognition for Around 2000 Languages without Audio.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

End-to-End Speech Summarization Using Restricted Self-Attention.
Proceedings of the IEEE International Conference on Acoustics, 2022

On Adversarial Robustness Of Large-Scale Audio Visual Learning.
Proceedings of the IEEE International Conference on Acoustics, 2022

Normalized Contrastive Learning for Text-Video Retrieval.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Token-level Sequence Labeling for Spoken Language Understanding using Compositional End-to-End Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Self-supervised object detection from audio-visual correspondence.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Zero-shot Learning for Grapheme to Phoneme Conversion with Language Ensemble.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

Speech Summarization using Restricted Self-Attention.
CoRR, 2021

Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Multilingual Multimodal Pre-training for Zero-Shot Cross-Lingual Transfer of Vision-Language Models.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Searchable Hidden Intermediates for End-to-End Models of Decomposable Sequence Tasks.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Differentiable Allophone Graphs for Language-Universal Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multimodal Speech Summarization Through Semantic Concept Learning.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Hierarchical Phone Recognition with Compositional Phonetics.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Rethinking End-to-End Evaluation of Decomposable Tasks: A Case Study on Spoken Language Understanding.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Support-set bottlenecks for video-text representation learning.
Proceedings of the 9th International Conference on Learning Representations, 2021

Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Audio-Visual Event Recognition Through the Lens of Adversary.
Proceedings of the IEEE International Conference on Acoustics, 2021

Multilingual Phonetic Dataset for Low Resource Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2021

Phone Distribution Estimation for Low Resource Languages.
Proceedings of the IEEE International Conference on Acoustics, 2021

VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

NoiseQA: Challenge Set Evaluation for User-Centric Question Answering.
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

How2Sign: A Large-Scale Multimodal Dataset for Continuous American Sign Language.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

Machine Listening for Heart Status Monitoring: Introducing and Benchmarking HSS - The Heart Sounds Shenzhen Corpus.
IEEE J. Biomed. Health Informatics, 2020

Speech Technology for Unwritten Languages.
IEEE ACM Trans. Audio Speech Lang. Process., 2020

Grounded Sequence to Sequence Transduction.
IEEE J. Sel. Top. Signal Process., 2020

Transfer learning for multimodal dialog.
Comput. Speech Lang., 2020

Multimodal Speech Recognition with Unstructured Audio Masking.
CoRR, 2020

Revisiting Factorizing Aggregated Posterior in Learning Disentangled Representations.
CoRR, 2020

How2Sign: A Large-scale Multimodal Dataset for Continuous American Sign Language.
CoRR, 2020

On Dimensional Linguistic Properties of the Word Embedding Space.
Proceedings of the 5th Workshop on Representation Learning for NLP, 2020

AlloVera: A Multilingual Allophone Database.
Proceedings of The 12th Language Resources and Evaluation Conference, 2020

Towards Context-Aware End-to-End Code-Switching Speech Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Contextual RNN-T for Open Domain ASR.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Looking Enhances Listening: Recovering Missing Speech Using Images.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

ASR Error Correction and Domain Adaptation Using Machine Translation.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Universal Phone Recognition with a Multilingual Allophone System.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Fine-Grained Grounding for Multimodal Speech Recognition.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

On Long-Tailed Phenomena in Neural Machine Translation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

Gun Source and Muzzle Head Detection.
Proceedings of the Imaging and Multimedia Analytics in a Web and Mobile World 2020, 2020

Towards Zero-Shot Learning for Automatic Phonemic Transcription.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

Automatic word count estimation from daylong child-centered recordings in various language environments using language-independent syllabification of speech.
Speech Commun., 2019

Joint embeddings with multimodal cues for video-text retrieval.
Int. J. Multim. Inf. Retr., 2019

Enforcing Encoder-Decoder Modularity in Sequence-to-Sequence Models.
CoRR, 2019

On Compositionality in Neural Machine Translation.
CoRR, 2019

Adversarial Music: Real World Audio Adversary Against Wake-word Detection System.
CoRR, 2019

Analyzing Utility of Visual Context in Multimodal Speech Recognition Under Noisy Conditions.
CoRR, 2019

Grounding Object Detections With Transcriptions.
CoRR, 2019

The ARIEL-CMU Systems for LoReHLT18.
CoRR, 2019

Effective Dimensionality Reduction for Word Embeddings.
Proceedings of the 4th Workshop on Representation Learning for NLP, 2019

Adversarial Music: Real world Audio Adversary against Wake-word Detection System.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Acoustic-to-Word Models with Conversational Context Information.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

MediaEval 2019: Eyes and Ears Together.
Working Notes Proceedings of the MediaEval 2019 Workshop, 2019

Multitask Learning For Different Subword Segmentations In Neural Machine Translation.
Proceedings of the 16th International Conference on Spoken Language Translation, 2019

CMU's Machine Translation System for IWSLT 2019.
Proceedings of the 16th International Conference on Spoken Language Translation, 2019

Survey Talk: Multimodal Processing of Speech and Language.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

SANTLR: Speech Annotation Toolkit for Low Resource Languages.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Multilingual Speech Recognition with Corpus Relatedness Sampling.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Cross-Attention End-to-End ASR for Two-Party Conversations.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

On Leveraging the Visual Modality for Neural Machine Translation.
Proceedings of the 12th International Conference on Natural Language Generation, 2019

Learned in Speech Recognition: Contextual Acoustic Word Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2019

Learning from Multiview Correlations in Open-domain Videos.
Proceedings of the IEEE International Conference on Acoustics, 2019

Phoneme Level Language Models for Sequence Based Low Resource ASR.
Proceedings of the IEEE International Conference on Acoustics, 2019

Multimodal Grounding for Sequence-to-sequence Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2019

Connectionist Temporal Localization for Sound Event Detection with Sequential Labeling.
Proceedings of the IEEE International Conference on Acoustics, 2019

A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling.
Proceedings of the IEEE International Conference on Acoustics, 2019

Multimodal Abstractive Summarization for How2 Videos.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

How2: A Large-scale Dataset for Multimodal Language Understanding.
CoRR, 2018

Activity Recognition on a Large Scale in Short Videos - Moments in Time Dataset.
CoRR, 2018

Hierarchical Multi Task Learning With CTC.
CoRR, 2018

Hierarchical Multitask Learning With CTC.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Acoustic-to-Word Recognition with Sequence-to-Sequence Models.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Dialog-Context Aware end-to-end Speech Recognition.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Domain Robust Feature Extraction for Rapid Low Resource ASR Development.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval.
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018

Eyes and Ears Together: New Task for Multimodal Spoken Content Analysis.
Working Notes Proceedings of the MediaEval 2018 Workshop, 2018

Annotating High-Level Structures of Short Stories and Personal Anecdotes.
Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Subword and Crossword Units for CTC Acoustic Models.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

The ACLEW DiViMe: An Easy-to-use Diarization Tool.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Linguistic Unit Discovery from Multi-Modal Inputs in Unwritten Languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Enhancement and Analysis of Conversational Speech: JSALT 2017.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

End-to-end Multimodal Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Light-Weight Multimodal Framework for Improved Environmental Audio Tagging.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Sequence-Based Multi-Lingual Low Resource Speech Recognition.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Multiple Instance Deep Learning for Weakly Supervised Audio Event Detection.
CoRR, 2017

A Comparison of deep learning methods for environmental sound.
CoRR, 2017

Proceedings of the 2017 TREC Video Retrieval Evaluation, 2017

Comparison of Decoding Strategies for CTC Acoustic Models.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A first attempt at polyphonic sound event detection using connectionist temporal classification.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

A comparison of Deep Learning methods for environmental sound detection.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Visual features for context-aware speech recognition.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Toolkits for Robust Speech Processing.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

End-to-End Architectures for Speech Recognition.
Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

Robust end-to-end deep audiovisual speech recognition.
CoRR, 2016

The effects of automatic speech recognition quality on human transcription latency.
Proceedings of the 13th Web for All Conference, 2016

Recurrent Support Vector Machines for Audio-Based Multimedia Event Detection.
Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016

Open-Domain Audio-Visual Speech Recognition: A Deep Learning Approach.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Virtual Machines and Containers as a Platform for Experimentation.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Manipulating Word Lattices to Incorporate Human Corrections.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Experiences with Shared Resources for Research and Education in Speech and Language Processing.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Audio-based multimedia event detection using deep recurrent neural networks.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

An empirical exploration of CTC acoustic models.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Training Deep Neural Networks for Reverberation Robust Speech Recognition.
Proceedings of the 12th ITG Symposium on Speech Communication, 2016

Speaker Adaptive Training of Deep Neural Network Acoustic Models Using I-Vectors.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Query by Example Search on Speech at Mediaeval 2015.
Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

On speaker adaptation of long short-term memory recurrent neural networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Distance-aware DNNs for robust speech recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

The speech recognition virtual kitchen turns one.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Using keyword spotting to help humans correct captioning faster.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Regularizing DNN acoustic models with Gaussian stochastic neurons.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Semi-supervised training in low-resource ASR and KWS.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

QUESST2014: Evaluating Query-by-Example Speech Search in a zero-resource setting with real-life queries.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Language independent search in MediaEval's Spoken Web Search task.
Comput. Speech Lang., 2014

Enabling the Rapid Development and Adoption of Speech-User Interfaces.
Computer, 2014

Query-by-example spoken term detection evaluation on low-resource languages.
Proceedings of the 4th Workshop on Spoken Language Technologies for Under-resourced Languages, 2014

EM-based phoneme confusion matrix generation for low-resource spoken term detection.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

A keyword search system using open source software.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Improvements to speaker adaptive training of deep neural networks.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

A methodology for using crowdsourced data to measure uncertainty in natural speech.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Query by Example Search on Speech at Mediaeval 2014.
Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

Multilingual deep bottle neck features: a study on language selection and training techniques.
Proceedings of the 11th International Workshop on Spoken Language Translation: Papers, 2014

Word-based probabilistic phonetic retrieval for low-resource spoken term detection.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

An in-depth comparison of keyword specific thresholding and sum-to-one score normalization.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

The speech recognition virtual kitchen: launch party.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Towards speaker adaptive training of deep neural network acoustic models.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Distributed learning of multilingual DNN feature extractors using GPUs.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Improving language-universal feature extraction with deep maxout and convolutional neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Neural network language models for low resource languages.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Query-by-example spoken term detection on multilingual unconstrained speech.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Improved audio features for large-scale multimedia event detection.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Semi-automatic audio semantic concept discovery for multimedia retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2014

Exploring audio semantic concepts for event-based video retrieval.
Proceedings of the IEEE International Conference on Acoustics, 2014

Optimization of Neural Network Language Models for keyword search.
Proceedings of the IEEE International Conference on Acoustics, 2014

Augmenting Translation Models with Simulated Acoustic Confusions for Improved Spoken Language Translation.
Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014

Semantics for Large-Scale Multimedia: New Challenges for NLP.
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

Beyond audio and video retrieval: topic-oriented multimedia summarization.
Int. J. Multim. Inf. Retr., 2013

The Spoken Web Search Task.
Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, 2013

Robust audio-codebooks for large-scale event detection in consumer videos.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

The speech recognition virtual kitchen.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Formalizing expert knowledge for developing accurate speech recognizers.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Multi-layer mutually reinforced random walk with hidden parameters for improved multi-party meeting summarization.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Prosody-Based Unsupervised Speech Summarization with Two-Layer Mutually Reinforced Random Walk.
Proceedings of the Sixth International Joint Conference on Natural Language Processing, 2013

Identification and modeling of word fragments in spontaneous speech.
Proceedings of the IEEE International Conference on Acoustics, 2013

Learning discriminative basis coefficients for eigenspace MLLR unsupervised adaptation.
Proceedings of the IEEE International Conference on Acoustics, 2013

Subspace mixture model for low-resource speech recognition in cross-lingual settings.
Proceedings of the IEEE International Conference on Acoustics, 2013

The spoken web search task at MediaEval 2012.
Proceedings of the IEEE International Conference on Acoustics, 2013

Extracting deep bottleneck features using stacked auto-encoders.
Proceedings of the IEEE International Conference on Acoustics, 2013

Neighbour selection and adaptation for rapid speaker-dependent ASR.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Deep maxout networks for low-resource speech recognition.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Models of tone for tonal and non-tonal languages.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

DNN acoustic modeling with modular multi-lingual feature extraction networks.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Using web text to improve keyword spotting in speech.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Subword Modeling for Automatic Speech Recognition: Past, Present, and Emerging Approaches.
IEEE Signal Process. Mag., 2012

Integration of language identification into a recognition system for spoken conversations containing code-Switches.
Proceedings of the Third Workshop on Spoken Language Technologies for Under-resourced Languages, 2012

Multilingual bottle-neck features and its application for under-resourced languages.
Proceedings of the Third Workshop on Spoken Language Technologies for Under-resourced Languages, 2012

Active learning for accent adaptation in Automatic Speech Recognition.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Two-layer mutually reinforced random walk for improved multi-party meeting summarization.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Intra-Speaker Topic Modeling for Improved Multi-Party Meeting Summarization with Integrated Random Walk.
Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2012

AMVA'12: ACM international workshop on audio and multimedia methods for large-scale video analysis.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Semi-supervised learning for speech recognition in the context of accent adaptation.
Proceedings of the 2012 Symposium on Machine Learning in Speech and Language Processing, 2012

Beyond audio and video retrieval: towards multimedia summarization.
Proceedings of the International Conference on Multimedia Retrieval, 2012

The Spoken Web Search Task.
Working Notes Proceedings of the MediaEval 2012 Workshop, 2012

Initialization Schemes for Multilayer Perceptron Training and their Impact on ASR Performance using Multilingual Data.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

On Speaker-Independent Personality Perception and Prediction from Speech.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Enhanced Polyphone Decision Tree Adaptation for Accented Speech Recognition.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

The Speech Recognition Virtual Kitchen: An Initial Prototype.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Event-based Video Retrieval Using Audio.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Integrating Intra-Speaker Topic Modeling and Temporal-Based Inter-Speaker Topic Modeling in Random Walk for Improved Multi-Party Meeting Summarization.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Generating Natural Language Summaries for Multimedia.
INLG 2012 - Proceedings of the Seventh International Natural Language Generation Conference, 30 May 2012, 2012

The Spoken Web Search Task at MediaEval 2011.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Articulatory features for expressive speech synthesis.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Anger recognition in speech using acoustic and linguistic cues.
Speech Commun., 2011

Informedia@TRECVID 2011: Surveillance Event Detection.
Proceedings of the 2011 TREC Video Retrieval Evaluation, 2011

Spoken Web Search.
Working Notes Proceedings of the MediaEval 2011 Workshop, 2011

Modeling Speaker Personality Using Voice.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Analysis of Dialectal Influence in Pan-Arabic ASR.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A Review of Personality in Voice-Based Man Machine Interaction.
Proceedings of the Human-Computer Interaction. Interaction Techniques and Environments, 2011

Salient Features for Anger Recognition in German and English IVR Portals.
Proceedings of the Spoken Dialogue Systems Technology and Design, 2011

Informedia @ TRECVID2010.
Proceedings of the TRECVID 2010 workshop participants notebook papers, 2010

Automatically assessing acoustic manifestations of personality in speech.
Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Automatically Assessing Personality from Speech.
Proceedings of the 4th IEEE International Conference on Semantic Computing (ICSC 2010), 2010

Multimedia content with a speech track: ACM multimedia 2010 workshop on searching spontaneous conversational speech.
Proceedings of the 18th International Conference on Multimedia 2010, 2010

Analysis of gender normalization using MLP and VTLN features.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

The 2010 CMU GALE speech-to-text system.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Emotion recognition using imperfect speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Improvements to generalized discriminative feature transformation for speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Late fusion of individual engines for improved recognition of negative emotion in speech - learning vs. democratic vote.
Proceedings of the IEEE International Conference on Acoustics, 2010

Getting closer: tailored human-computer speech dialog.
Univers. Access Inf. Soc., 2009

Fusion of Acoustic and Linguistic Features for Emotion Detection.
Proceedings of the 3rd IEEE International Conference on Semantic Computing (ICSC 2009), 2009

Usability-Evaluation multimodaler Schnittstellen: Ist das Ganze die Summe seiner Teile?
Proceedings of the Mensch & Computer 2009: Grenzenlos frei!?, 2009

Benutzerstudien zur Bewertung multimodaler, interaktiver Anzeigetafeln in unterschiedlichen Entwicklungsstufen.
Workshop-Proceedings der Tagung Mensch & Computer 2009, 2009

Digital Signage mit Interaktiven Displays.
Workshop-Proceedings der Tagung Mensch & Computer 2009, 2009

Predicting the quality of multimodal systems based on judgments of single modalities.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Influence of training on direct and indirect measures for the evaluation of multimodal systems.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Emotion classification in children's speech using fusion of acoustic and linguistic features.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Detecting real life anger.
Proceedings of the IEEE International Conference on Acoustics, 2009

Usability Evaluation of Multimodal Interfaces: Is the Whole the Sum of Its Parts?
Proceedings of the Human-Computer Interaction. Novel Interaction Methods and Techniques, 2009

Reliable Evaluation of Multimodal Dialogue Systems.
Proceedings of the Human-Computer Interaction. Novel Interaction Methods and Techniques, 2009

User perception of multi-modal interfaces for mobile applications.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Detecting trends in social bookmarking systems using a probabilistic generative model and smoothing.
Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

Tailoring Taxonomies for Efficient Text Categorization and Expert Finding.
Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and International Conference on Intelligent Agent Technology, 2008

Discriminative speaker adaptation using articulatory features.
Speech Commun., 2007

An intelligent knowledge sharing system for web communities.
Proceedings of the IEEE International Conference on Systems, 2007

The "Spree" Expert Finding System.
Proceedings of the First IEEE International Conference on Semantic Computing (ICSC 2007), 2007

On using Articulatory Features for Discriminative Speaker Adaptation.
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2007

Comparison of Four Approaches to Age and Gender Recognition for Telephone Applications.
Proceedings of the IEEE International Conference on Acoustics, 2007

Spotting using Durational Entropy.
Proceedings of the IEEE International Conference on Acoustics, 2007

Articulatory features for "meeting" speech recognition.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Articulatory features for conversational speech recognition.
PhD thesis, 2005

Automatically Transcribing Meetings using Distant Microphones.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Issues in meeting transcription - the ISL meeting transcription system.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

The 2003 ISL rich transcription system for conversational telephony speech.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech Recognition Toolkit.
Proceedings of the Pattern Recognition, 26th DAGM Symposium, August 30, 2004

Integrating multilingual articulatory features into speech recognition.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

The NESPOLE! voIP multilingual corpora in tourism and medical domains.
Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Multilingual articulatory features.
Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

Compensating for hyperarticulation by modeling articulatory properties.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

A flexible stream architecture for ASR using articulatory features.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Efficient language model lookahead through polymorphic linguistic context assignment.
Proceedings of the IEEE International Conference on Acoustics, 2002

A Multi-Perspective Evaluation of the NESPOLE! Speech-to-Speech Translation System.
Proceedings of the Workshop on Speech-to-Speech Translation: Algorithms and Systems@ACL 2002, 2002

Advances in meeting recognition.
Proceedings of the First International Conference on Human Language Technology Research, 2001

Speech recognition over netmeeting connections.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

The NESPOLE! voIP dialogue database.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Advances in automatic meeting record creation and access.
Proceedings of the IEEE International Conference on Acoustics, 2001

The ISL evaluation system for Verbmobil-II.
Proceedings of the IEEE International Conference on Acoustics, 2001

Speaker compensation with sine-log all-pass transforms.
Proceedings of the IEEE International Conference on Acoustics, 2001

Generalized radial basis function networks for classification and novelty detection: self-organization of optimal Bayesian decision.
Neural Networks, 2000

Das View4You-System: End-to-End Evaluation.
Proceedings of the KONVENS 2000 / Sprachkommunikation, 2000

Confidence measure based language identification.
Proceedings of the IEEE International Conference on Acoustics, 2000

Indeterminateness in Qualitative and Quantitative Reasoning.
Proceedings of the Seventh International Workshop on Database and Expert Systems Applications, 1996
