Aren Jansen

According to our database, Aren Jansen authored at least 84 papers between 2006 and 2024.

Long-Form Speech Generation with Spoken Language Models.
CoRR, 2024

MELODI: Exploring Memory Compression for Long Contexts.
CoRR, 2024

A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation.
Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

V2Meow: Meowing to the Visual Beat via Video-to-Music Generation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

V2Meow: Meowing to the Visual Beat via Music Generation.
CoRR, 2023

MusicLM: Generating Music From Text.
CoRR, 2023

MAQA: A Multimodal QA Benchmark for Negation.
CoRR, 2023

Dataset Balancing Can Hurt Model Performance.
Proceedings of the IEEE International Conference on Acoustics, 2023

A machine-learning based objective measure for ALS disease severity.
npj Digit. Medicine, 2022

BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition.
IEEE J. Sel. Top. Signal Process., 2022

MuLan: A Joint Embedding of Music Audio and Natural Language.
Proceedings of the 23rd International Society for Music Information Retrieval Conference, 2022

Text-Driven Separation of Arbitrary Sounds.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Universal Paralinguistic Speech Representations Using Self-Supervised Conformers.
Proceedings of the IEEE International Conference on Acoustics, 2022

Sparse, Efficient, and Semantic Mixture Invariant Training: Taming In-the-Wild Unsupervised Sound Separation.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

Self-Supervised Learning from Automatically Separated Sound Scenes.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

Attention Bottlenecks for Multimodal Fusion.
Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds.
Proceedings of the 9th International Conference on Learning Representations, 2021

The Benefit of Temporally-Strong Labels in Audio Event Classification.
Proceedings of the IEEE International Conference on Acoustics, 2021

Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking.
IEEE Signal Process. Lett., 2020

Addressing Missing Labels in Large-scale Sound Event Recognition using a Teacher-student Framework with Loss Masking.
CoRR, 2020

Semantically Meaningful Attributes from Co-Listen Embeddings for Playlist Exploration and Expansion.
Proceedings of the 21st International Society for Music Information Retrieval Conference, 2020

Towards Learning a Universal Non-Semantic Representation of Speech.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Improving Universal Sound Separation Using Sound Classification.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Coincidence, Categorization, and Consolidation: Learning to Recognize Sounds with Minimal Supervision.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Large-Scale Weakly-Supervised Content Embeddings for Music Recommendation and Tagging.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Unsupervised Learning of Semantic Audio Representations.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Evaluating Low-Level Speech Features Against Human Perceptual Data.
Trans. Assoc. Comput. Linguistics, 2017

Scalable out-of-sample extension of graph embeddings using deep neural networks.
Pattern Recognit. Lett., 2017

A segmental framework for fully-unsupervised large-vocabulary speech recognition.
Comput. Speech Lang., 2017

Large-scale audio event discovery in one million YouTube videos.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

CNN architectures for large-scale audio classification.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Audio Set: An ontology and human-labeled dataset for audio events.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

The Zero Resource Speech Challenge 2015: Proposed Approaches and Results.
Proceedings of the SLTU-2016, 2016

Context-dependent point process models for keyword search and detection-based ASR.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

A Framework for Evaluating Speech Representations.
Proceedings of the 38th Annual Meeting of the Cognitive Science Society, 2016

A Test Collection for Spoken Gujarati Queries.
Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015

Using Zero-Resource Spoken Term Discovery for Ranked Retrieval.
Proceedings of the NAACL HLT 2015, The 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Denver, Colorado, USA, May 31, 2015

The zero resource speech challenge 2015.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

A comparison of neural network methods for unsupervised representation learning on the zero resource speech challenge.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

An evaluation of graph clustering methods for unsupervised term discovery.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Fully unsupervised small-vocabulary speech recognition using a segmental Bayesian model.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Content-based recommender systems for spoken documents.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Segmental acoustic indexing for zero resource keyword search.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Unsupervised neural network based feature extraction using weak top-down constraints.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A keyword search system using open source software.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Unsupervised lexical clustering of speech segments using fixed-dimensional acoustic embeddings.
Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Bridging the gap between speech technology and natural language processing: an evaluation toolbox for term discovery systems.
Proceedings of the Ninth International Conference on Language Resources and Evaluation, 2014

Low-resource open vocabulary keyword search using point process models.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Featherweight phonetic keyword search for conversational speech.
Proceedings of the IEEE International Conference on Acoustics, 2014

Unsupervised idiolect discovery for speaker recognition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Intrinsic Spectral Analysis.
IEEE Trans. Signal Process., 2013

Evaluating speech features with the minimal-pair ABX task: analysis of the classical MFC/PLP pipeline.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Semi-supervised manifold learning approaches for spoken term verification.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Text-to-speech inspired duration modeling for improved whole-word acoustic models.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Zero resource graph-based confidence estimation for open vocabulary spoken term detection.
Proceedings of the IEEE International Conference on Acoustics, 2013

Weak top-down constraints for unsupervised acoustic model training.
Proceedings of the IEEE International Conference on Acoustics, 2013

Frequency offset correction in speech without detecting pitch.
Proceedings of the IEEE International Conference on Acoustics, 2013

The FIRE 2013 Question Answering for the Spoken Web Task.
Proceedings of the 5th 2013 Forum on Information Retrieval Evaluation, 2013

Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

The JHU-HLTCOE Spoken Web Search System for MediaEval 2012.
Working Notes Proceedings of the MediaEval 2012 Workshop, 2012

Data-driven Posterior Features for Low Resource Speech Recognition Applications.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Exploiting Discriminative Point Process Models for Spoken Term Detection.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

MAP Estimation of Whole-Word Acoustic Models with Dictionary Priors.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Inverting the Point Process Model for Fast Phonetic Keyword Search.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Intrinsic Spectral Analysis for Zero and High Resource Speech Recognition.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Indexing Raw Acoustic Features for Scalable Zero Resource Search.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Event Selection from Phone Posteriorgrams Using Matched Filters.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Towards Unsupervised Training of Speaker Independent Acoustic Models.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Rapid Evaluation of Speech Representations for Spoken Term Discovery.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Speech recognition with segmental conditional random fields: A summary of the JHU CLSP 2010 Summer Workshop.
Proceedings of the IEEE International Conference on Acoustics, 2011

Whole word discriminative point process models.
Proceedings of the IEEE International Conference on Acoustics, 2011

Estimating document frequencies in a speech corpus.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Efficient spoken term discovery using randomized algorithms.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

Towards spoken term discovery at scale with zero resources.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Detection-based speech recognition with sparse point process models.
Proceedings of the IEEE International Conference on Acoustics, 2010

NLP on Spoken Documents Without ASR.
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, 2010

Point Process Models for Spotting Keywords in Continuous Speech.
IEEE Trans. Speech Audio Process., 2009

Point process models for event-based speech recognition.
Speech Commun., 2009

Robust keyword spotting with rapidly adapting point process models.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

A hierarchical point process model for speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2008

Semi-supervised learning of speech sounds.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Intrinsic Fourier Analysis on the Manifold of Speech Sounds.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006
