Florian Metze

Amit K. Roy-Chowdhury

Int. J. Multim. Inf. Retr., 2019

Enforcing Encoder-Decoder Modularity in Sequence-to-Sequence Models.

CoRR, 2019

On Compositionality in Neural Machine Translation.

Vikas Raunak

Vaibhav Kumar

CoRR, 2019

Adversarial Music: Real World Audio Adversary Against Wake-word Detection System.

CoRR, 2019

Analyzing Utility of Visual Context in Multimodal Speech Recognition Under Noisy Conditions.

Tejas Srinivasan

CoRR, 2019

Grounding Object Detections With Transcriptions.

CoRR, 2019

The ARIEL-CMU Systems for LoReHLT18.

CoRR, 2019

OPERA: Operations-oriented Probabilistic Extraction, Reasoning, and Analysis.

Proceedings of the 2019 Text Analysis Conference, 2019

Effective Dimensionality Reduction for Word Embeddings.

Vikas Raunak

Vivek Gupta

Proceedings of the 4th Workshop on Representation Learning for NLP, 2019

Adversarial Music: Real world Audio Adversary against Wake-word Detection System.

Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Acoustic-to-Word Models with Conversational Context Information.

Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

MediaEval 2019: Eyes and Ears Together.

Working Notes Proceedings of the MediaEval 2019 Workshop, 2019

Multitask Learning For Different Subword Segmentations In Neural Machine Translation.

Tejas Srinivasan

Proceedings of the 16th International Conference on Spoken Language Translation, 2019

CMU's Machine Translation System for IWSLT 2019.

Tejas Srinivasan

Proceedings of the 16th International Conference on Spoken Language Translation, 2019

Survey Talk: Multimodal Processing of Speech and Language.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

SANTLR: Speech Annotation Toolkit for Low Resource Languages.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Multilingual Speech Recognition with Corpus Relatedness Sampling.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Cross-Attention End-to-End ASR for Two-Party Conversations.

Siddharth Dalmia

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

On Leveraging the Visual Modality for Neural Machine Translation.

Proceedings of the 12th International Conference on Natural Language Generation, 2019

Learned in Speech Recognition: Contextual Acoustic Word Embeddings.

Shruti Palaskar

Vikas Raunak

Proceedings of the IEEE International Conference on Acoustics, 2019

Learning from Multiview Correlations in Open-domain Videos.

Proceedings of the IEEE International Conference on Acoustics, 2019

Phoneme Level Language Models for Sequence Based Low Resource ASR.

Proceedings of the IEEE International Conference on Acoustics, 2019

Multimodal Grounding for Sequence-to-sequence Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2019

Connectionist Temporal Localization for Sound Event Detection with Sequential Labeling.

Proceedings of the IEEE International Conference on Acoustics, 2019

A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling.

Proceedings of the IEEE International Conference on Acoustics, 2019

Multimodal Abstractive Summarization for How2 Videos.

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion.

Siddharth Dalmia

Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018

How2: A Large-scale Dataset for Multimodal Language Understanding.

CoRR, 2018

Activity Recognition on a Large Scale in Short Videos - Moments in Time Dataset.

CoRR, 2018

Hierarchical Multi Task Learning With CTC.

CoRR, 2018

OPERA: Operations-oriented Probabilistic Extraction, Reasoning, and Analysis.

Eduard H. Hovy

Taylor Berg-Kirkpatrick

Jaime G. Carbonell

Hans Chalupsky

Anatole Gershman

Hector Zhengzhong Liu

Proceedings of the 2018 Text Analysis Conference, 2018

Hierarchical Multitask Learning With CTC.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Acoustic-to-Word Recognition with Sequence-to-Sequence Models.

Shruti Palaskar

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Dialog-Context Aware end-to-end Speech Recognition.

Niluthpol Chowdhury Mithun

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Domain Robust Feature Extraction for Rapid Low Resource ASR Development.

Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Learning Joint Embedding with Multimodal Cues for Cross-Modal Video-Text Retrieval.

Amit K. Roy-Chowdhury

Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018

Eyes and Ears Together: New Task for Multimodal Spoken Content Analysis.

Working Notes Proceedings of the MediaEval 2018 Workshop, 2018

Annotating High-Level Structures of Short Stories and Personal Anecdotes.

Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Subword and Crossword Units for CTC Acoustic Models.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Comparing the Max and Noisy-Or Pooling Functions in Multiple Instance Learning for Weakly Supervised Sequence Learning Tasks.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multiple Instance Deep Learning for Weakly Supervised Small-Footprint Audio Event Detection.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

The ACLEW DiViMe: An Easy-to-use Diarization Tool.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Linguistic Unit Discovery from Multi-Modal Inputs in Unwritten Languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop.

Odette Scharenborg

Laurent Besacier

Alan W. Black

Mark Hasegawa-Johnson

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Enhancement and Analysis of Conversational Speech: JSALT 2017.

Mahesh Krishnamoorthy

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

End-to-end Multimodal Speech Recognition.

Shruti Palaskar

Niluthpol Chowdhury Mithun

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Light-Weight Multimodal Framework for Improved Environmental Audio Tagging.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Sequence-Based Multi-Lingual Low Resource Speech Recognition.

Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

2017

Multiple Instance Deep Learning for Weakly Supervised Audio Event Detection.

CoRR, 2017

A Comparison of deep learning methods for environmental sound.

CoRR, 2017

CMU-UCR-BOSCH @ TRECVID 2017: VIDEO TO TEXT RETRIEVAL.

Juncheng B. Li

Amit K. Roy-Chowdhury

Samarjit Das

Proceedings of the 2017 TREC Video Retrieval Evaluation, 2017

Comparison of Decoding Strategies for CTC Acoustic Models.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A Transfer Learning Based Feature Extractor for Polyphonic Sound Event Detection Using Connectionist Temporal Classification.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

A first attempt at polyphonic sound event detection using connectionist temporal classification.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

A comparison of Deep Learning methods for environmental sound detection.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Visual features for context-aware speech recognition.

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Toolkits for Robust Speech Processing.

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

Preliminaries.

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

End-to-End Architectures for Speech Recognition.

Proceedings of the New Era for Robust Speech Recognition, Exploiting Deep Learning, 2017

2016

Robust end-to-end deep audiovisual speech recognition.

Fernando De la Torre

CoRR, 2016

The effects of automatic speech recognition quality on human transcription latency.

Proceedings of the 13th Web for All Conference, 2016

Recurrent Support Vector Machines for Audio-Based Multimedia Event Detection.

Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016

Open-Domain Audio-Visual Speech Recognition: A Deep Learning Approach.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Virtual Machines and Containers as a Platform for Experimentation.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Manipulating Word Lattices to Incorporate Human Corrections.

Yashesh Gaur

Jeffrey P. Bigham

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Experiences with Shared Resources for Research and Education in Speech and Language Processing.

Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Audio-based multimedia event detection using deep recurrent neural networks.

Leonardo Neves

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

An empirical exploration of CTC acoustic models.

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Training Deep Neural Networks for Reverberation Robust Speech Recognition.

Proceedings of the 12th ITG Symposium on Speech Communication, 2016

2015

Speaker Adaptive Training of Deep Neural Network Acoustic Models Using I-Vectors.

IEEE ACM Trans. Audio Speech Lang. Process., 2015

CMU Informedia@TRECVID 2015: MED/SIN/LNK/SED.

Proceedings of the 2015 TREC Video Retrieval Evaluation, 2015

Query by Example Search on Speech at Mediaeval 2015.

Igor Szöke

Working Notes Proceedings of the MediaEval 2015 Workshop, 2015

On speaker adaptation of long short-term memory recurrent neural networks.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Distance-aware DNNs for robust speech recognition.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

The speech recognition virtual kitchen turns one.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Using keyword spotting to help humans correct captioning faster.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Regularizing DNN acoustic models with Gaussian stochastic neurons.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Semi-supervised training in low-resource ASR and KWS.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

QUESST2014: Evaluating Query-by-Example Speech Search in a zero-resource setting with real-life queries.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding.

Mohammad Gowayyed

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Language independent search in MediaEval's Spoken Web Search task.

Comput. Speech Lang., 2014

Enabling the Rapid Development and Adoption of Speech-User Interfaces.

Anuj Kumar

Matthew Kam

Computer, 2014

Informedia @ TRECVID 2014.

Proceedings of the 2014 TREC Video Retrieval Evaluation, 2014

Query-by-example spoken term detection evaluation on low-resource languages.

Proceedings of the 4th Workshop on Spoken Language Technologies for Under-resourced Languages, 2014

EM-based phoneme confusion matrix generation for low-resource spoken term detection.

Di Xu

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

A keyword search system using open source software.

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Improvements to speaker adaptive training of deep neural networks.

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

A methodology for using crowdsourced data to measure uncertainty in natural speech.

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Query by Example Search on Speech at Mediaeval 2014.

Igor Szöke

Andi Buzo

Working Notes Proceedings of the MediaEval 2014 Workshop, 2014

Multilingual deep bottle neck features: a study on language selection and training techniques.

Proceedings of the 11th International Workshop on Spoken Language Translation: Papers, 2014

Word-based probabilistic phonetic retrieval for low-resource spoken term detection.

Di Xu

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

An in-depth comparison of keyword specific thresholding and sum-to-one score normalization.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

The speech recognition virtual kitchen: launch party.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Towards speaker adaptive training of deep neural network acoustic models.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Distributed learning of multilingual DNN feature extractors using GPUs.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Improving language-universal feature extraction with deep maxout and convolutional neural networks.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Neural network language models for low resource languages.

Ankur Gandhe

Ian R. Lane

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Query-by-example spoken term detection on multilingual unconstrained speech.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Improved audio features for large-scale multimedia event detection.

Yipei Wang

Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Semi-automatic audio semantic concept discovery for multimedia retrieval.

Yipei Wang

Proceedings of the IEEE International Conference on Acoustics, 2014

Exploring audio semantic concepts for event-based video retrieval.

Yipei Wang

Proceedings of the IEEE International Conference on Acoustics, 2014

Optimization of Neural Network Language Models for keyword search.

Proceedings of the IEEE International Conference on Acoustics, 2014

Augmenting Translation Models with Simulated Acoustic Confusions for Improved Spoken Language Translation.

Yulia Tsvetkov

Chris Dyer

Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, 2014

Semantics for Large-Scale Multimedia: New Challenges for NLP.

Koichi Shinoda

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014

2013

Beyond audio and video retrieval: topic-oriented multimedia summarization.

Duo Ding

Ehsan Younessian

Int. J. Multim. Inf. Retr., 2013

Informedia@TRECVID 2013.

Proceedings of the 2013 TREC Video Retrieval Evaluation, 2013

The Spoken Web Search Task.

Proceedings of the MediaEval 2013 Multimedia Benchmark Workshop, 2013

Robust audio-codebooks for large-scale event detection in consumer videos.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Improving low-resource CD-DNN-HMM using dropout and multilingual DNN training.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

The speech recognition virtual kitchen.

Eric Fosler-Lussier

Rebecca Bates

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Formalizing expert knowledge for developing accurate speech recognizers.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Multi-layer mutually reinforced random walk with hidden parameters for improved multi-party meeting summarization.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Prosody-Based Unsupervised Speech Summarization with Two-Layer Mutually Reinforced Random Walk.

Sujay Kumar Jauhar

Proceedings of the Sixth International Joint Conference on Natural Language Processing, 2013

Identification and modeling of word fragments in spontaneous speech.

Yulia Tsvetkov

Zaid Sheikh

Proceedings of the IEEE International Conference on Acoustics, 2013

Learning discriminative basis coefficients for eigenspace MLLR unsupervised adaptation.

Proceedings of the IEEE International Conference on Acoustics, 2013

Subspace mixture model for low-resource speech recognition in cross-lingual settings.

Proceedings of the IEEE International Conference on Acoustics, 2013

The spoken web search task at MediaEval 2012.

Proceedings of the IEEE International Conference on Acoustics, 2013

A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition.

Proceedings of the IEEE International Conference on Acoustics, 2013

Extracting deep bottleneck features using stacked auto-encoders.

Proceedings of the IEEE International Conference on Acoustics, 2013

Neighbour selection and adaptation for rapid speaker-dependent ASR.

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Deep maxout networks for low-resource speech recognition.

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Models of tone for tonal and non-tonal languages.

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

DNN acoustic modeling with modular multi-lingual feature extraction networks.

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Using web text to improve keyword spotting in speech.

Ankur Gandhe

Long Qin

Alexander I. Rudnicky

Ian R. Lane

Matthias Eck

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

Subword Modeling for Automatic Speech Recognition: Past, Present, and Emerging Approaches.

Karen Livescu

Eric Fosler-Lussier

IEEE Signal Process. Mag., 2012

Informedia @TRECVID 2012.

Proceedings of the 2012 TREC Video Retrieval Evaluation, 2012

Integration of language identification into a recognition system for spoken conversations containing code-Switches.

Proceedings of the Third Workshop on Spoken Language Technologies for Under-resourced Languages, 2012

Multilingual bottle-neck features and its application for under-resourced languages.

Ngoc Thang Vu

Proceedings of the Third Workshop on Spoken Language Technologies for Under-resourced Languages, 2012

Active learning for accent adaptation in Automatic Speech Recognition.

Udhyakumar Nallasamy

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Two-layer mutually reinforced random walk for improved multi-party meeting summarization.

Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Intra-Speaker Topic Modeling for Improved Multi-Party Meeting Summarization with Integrated Random Walk.

Proceedings of the Human Language Technologies: Conference of the North American Chapter of the Association of Computational Linguistics, 2012

AMVA'12: ACM international workshop on audio and multimedia methods for large-scale video analysis.

Gerald Friedland

Daniel P. W. Ellis

Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Semi-supervised learning for speech recognition in the context of accent adaptation.

Udhyakumar Nallasamy

Proceedings of the 2012 Symposium on Machine Learning in Speech and Language Processing, 2012

Beyond audio and video retrieval: towards multimedia summarization.

Proceedings of the International Conference on Multimedia Retrieval, 2012

The Spoken Web Search Task.

Charl Johannes van Heerden

Etienne Barnard

Marelie H. Davel

Guillaume Gravier

Nitendra Rajput

Working Notes Proceedings of the MediaEval 2012 Workshop, 2012

Initialization Schemes for Multilayer Perceptron Training and their Impact on ASR Performance using Multilingual Data.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

On Speaker-Independent Personality Perception and Prediction from Speech.

Alessandro Vinciarelli

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Enhanced Polyphone Decision Tree Adaptation for Accented Speech Recognition.

Udhyakumar Nallasamy

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

The Speech Recognition Virtual Kitchen: An Initial Prototype.

Eric Fosler-Lussier

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Event-based Video Retrieval Using Audio.

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Integrating Intra-Speaker Topic Modeling and Temporal-Based Inter-Speaker Topic Modeling in Random Walk for Improved Multi-Party Meeting Summarization.

Charl Johannes van Heerden

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Generating Natural Language Summaries for Multimedia.

INLG 2012 - Proceedings of the Seventh International Natural Language Generation Conference, 30 May 2012

The Spoken Web Search Task at MediaEval 2011.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Articulatory features for expressive speech synthesis.

Alan W. Black

H. Timothy Bunnell

Ying Dou

Prasanna Kumar Muthukumar

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Anger recognition in speech using acoustic and linguistic cues.

Speech Commun., 2011

Informedia@TRECVID 2011: Surveillance Event Detection.

Proceedings of the 2011 TREC Video Retrieval Evaluation, 2011

Spoken Web Search.

Nitendra Rajput

Working Notes Proceedings of the MediaEval 2011 Workshop, 2011

Modeling Speaker Personality Using Voice.

Sebastian Möller

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Analysis of Dialectal Influence in Pan-Arabic ASR.

Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A Review of Personality in Voice-Based Man Machine Interaction.

Alan W. Black

Proceedings of the Human-Computer Interaction. Interaction Techniques and Environments, 2011

Salient Features for Anger Recognition in German and English IVR Portals.

Alexander Schmitt

Proceedings of the Spoken Dialogue Systems Technology and Design, 2011

2010

Informedia @ TRECVID2010.

Proceedings of the TRECVID 2010 workshop participants notebook papers, 2010

Automatically assessing acoustic manifestations of personality in speech.

Sebastian Möller

Proceedings of the 2010 IEEE Spoken Language Technology Workshop, 2010

Automatically Assessing Personality from Speech.

Sebastian Möller

Proceedings of the 4th IEEE International Conference on Semantic Computing (ICSC 2010), 2010

Multimedia content with a speech track: ACM multimedia 2010 workshop on searching spontaneous conversational speech.

Proceedings of the 18th International Conference on Multimedia 2010, 2010

Analysis of gender normalization using MLP and VTLN features.

Thomas Schaaf

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

The 2010 CMU GALE speech-to-text system.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Emotion recognition using imperfect speech recognition.

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Improvements to generalized discriminative feature transformation for speech recognition.

Roger Hsiao

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Late fusion of individual engines for improved recognition of negative emotion in speech - learning vs. democratic vote.

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

Getting closer: tailored human-computer speech dialog.

Univers. Access Inf. Soc., 2009

Fusion of Acoustic and Linguistic Features for Emotion Detection.

Michael Wagner

Proceedings of the 3rd IEEE International Conference on Semantic Computing (ICSC 2009), 2009

Usability-Evaluation multimodaler Schnittstellen: Ist das Ganze die Summe seiner Teile?

Ina Wechsung

Klaus-Peter Engelbrecht

Proceedings of the Mensch & Computer 2009: Grenzenlos frei!?, 2009

Benutzerstudien zur Bewertung multimodaler, interaktiver Anzeigetafeln in unterschiedlichen Entwicklungsstufen.

Proceedings of the Workshop-Proceedings der Tagung Mensch & Computer 2009, 2009

Digital Signage mit Interaktiven Displays.

Roman Englert

Proceedings of the Workshop-Proceedings der Tagung Mensch & Computer 2009, 2009

Predicting the quality of multimodal systems based on judgments of single modalities.

Ina Wechsung

Klaus-Peter Engelbrecht

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Influence of training on direct and indirect measures for the evaluation of multimodal systems.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Emotion classification in children's speech using fusion of acoustic and linguistic features.

Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Detecting real life anger.

Proceedings of the IEEE International Conference on Acoustics, 2009

Usability Evaluation of Multimodal Interfaces: Is the Whole the Sum of Its Parts?

Ina Wechsung

Klaus-Peter Engelbrecht

Proceedings of the Human-Computer Interaction. Novel Interaction Methods and Techniques, 2009

Reliable Evaluation of Multimodal Dialogue Systems.

Proceedings of the Human-Computer Interaction. Novel Interaction Methods and Techniques, 2009

2008

User perception of multi-modal interfaces for mobile applications.

Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Detecting trends in social bookmarking systems using a probabilistic generative model and smoothing.

Proceedings of the 19th International Conference on Pattern Recognition (ICPR 2008), 2008

Tailoring Taxonomies for Efficient Text Categorization and Expert Finding.

Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and International Conference on Intelligent Agent Technology, 2008

2007

Discriminative speaker adaptation using articulatory features.

Speech Commun., 2007

An intelligent knowledge sharing system for web communities.

Proceedings of the IEEE International Conference on Systems, 2007

The "Spree" Expert Finding System.

Christian Bauckhage

Tansu Alpcan

Proceedings of the First IEEE International Conference on Semantic Computing (ICSC 2007), 2007

On using Articulatory Features for Discriminative Speaker Adaptation.

Proceedings of the Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, 2007

Comparison of Four Approaches to Age and Gender Recognition for Telephone Applications.

Proceedings of the IEEE International Conference on Acoustics, 2007

Spotting using Durational Entropy.

Jitendra Ajmera

Proceedings of the IEEE International Conference on Acoustics, 2007

2006

Articulatory features for "meeting" speech recognition.

Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

2005

Articulatory features for conversational speech recognition.

PhD thesis, 2005

The "FAME" Interactive Space.

Proceedings of the Machine Learning for Multimodal Interaction, 2005

Automatically Transcribing Meetings using Distant Microphones.

Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

2004

Issues in meeting transcription - the ISL meeting transcription system.

Proceedings of the 8th International Conference on Spoken Language Processing, 2004

The 2003 ISL rich transcription system for conversational telephony speech.

Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

Large Vocabulary Audio-Visual Speech Recognition Using the Janus Speech Recognition Toolkit.

Proceedings of the Pattern Recognition, 26th DAGM Symposium, August 30, 2004

2003

Integrating multilingual articulatory features into speech recognition.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

The NESPOLE! voIP multilingual corpora in tourism and medical domains.

Proceedings of the 8th European Conference on Speech Communication and Technology, EUROSPEECH 2003, 2003

Multilingual articulatory features.

Proceedings of the 2003 IEEE International Conference on Acoustics, 2003

2002

Compensating for hyperarticulation by modeling articulatory properties.

Hagen Soltau

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

A flexible stream architecture for ASR using articulatory features.

Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

Efficient language model lookahead through polymorphic linguistic context assignment.

Proceedings of the IEEE International Conference on Acoustics, 2002

A Multi-Perspective Evaluation of the NESPOLE! Speech-to-Speech Translation System.

Proceedings of the Workshop on Speech-to-Speech Translation: Algorithms and Systems@ACL 2002, 2002

2001

Advances in meeting recognition.

Proceedings of the First International Conference on Human Language Technology Research, 2001

Speech recognition over netmeeting connections.

John W. McDonough

Hagen Soltau

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

The nespole! voIP dialogue database.

Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Advances in automatic meeting record creation and access.

Proceedings of the IEEE International Conference on Acoustics, 2001

The ISL evaluation system for Verbmobil-II.

Proceedings of the IEEE International Conference on Acoustics, 2001

Speaker compensation with sine-log all-pass transforms.

Proceedings of the IEEE International Conference on Acoustics, 2001

2000

Generalized radial basis function networks for classification and novelty detection: self-organization of optimal Bayesian decision.

Neural Networks, 2000

Das View4You-System: End-to-End Evaluation.

Thomas Kemp

Proceedings of the KONVENS 2000 / Sprachkommunikation, 2000

Confidence measure based language identification.

Proceedings of the IEEE International Conference on Acoustics, 2000

1996

Indeterminateness in Qualitative and Quantitative Reasoning.

Daniel Zboril