Piotr Zelasko

IEEE ACM Trans. Audio Speech Lang. Process., 2024

Slowness Regularized Contrastive Predictive Coding for Acoustic Unit Discovery.

IEEE ACM Trans. Audio Speech Lang. Process., 2024

NeKo: Toward Post Recognition Generative Correction Large Language Models with Task-Oriented Experts.

CoRR, 2024

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning.

CoRR, 2024

EMMeTT: Efficient Multimodal Machine Translation Training.

CoRR, 2024

Chain-of-Thought Prompting for Speech Translation.

CoRR, 2024

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition.

CoRR, 2024

BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5.

CoRR, 2024

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data.

CoRR, 2024

2023

Delay-Penalized Transducer for Low-Latency Streaming ASR.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Fast and Parallel Decoding for Transducer.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Why Aren't We NER Yet? Artifacts of ASR Errors in Named Entity Recognition in Spontaneous Speech Transcripts.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

Unsupervised Speech Segmentation and Variable Rate Representation Learning Using Segmental Contrastive Predictive Coding.

IEEE ACM Trans. Audio Speech Lang. Process., 2022

Discovering phonetic inventories with crosslingual automatic speech recognition.

Siyuan Feng

Ali Abavisani

Mark Hasegawa-Johnson

Comput. Speech Lang., 2022

Defense against Adversarial Attacks on Hybrid Speech Recognition using Joint Adversarial Fine-tuning with Denoiser.

CoRR, 2022

Vsameter: Evaluation of a New Open-Source Tool to Measure Vowel Space Area and Related Metrics.

Tianyu Cao

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Defense against Adversarial Attacks on Hybrid Speech Recognition System using Adversarial Fine-tuning with Denoiser.

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Non-contrastive self-supervised learning of utterance-level speech representations.

Jaejin Cho

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

2021

Study of Pre-Processing Defenses Against Adversarial Attacks on State-of-the-Art Speaker Recognition Systems.

Sonal Joshi

IEEE Trans. Inf. Forensics Secur., 2021

What Helps Transformers Recognize Conversational Structure? Importance of Context, Punctuation, and Labels in Dialog Act Recognition.

Trans. Assoc. Comput. Linguistics, 2021

Non-Autoregressive Transformer for Speech Recognition.

IEEE Signal Process. Lett., 2021

Lhotse: a speech data representation library for the modern deep learning ecosystem.

CoRR, 2021

Adversarial Attacks and Defenses for Speech Recognition Systems.

CoRR, 2021

Adversarial Attacks and Defenses for Speaker Identification Systems.

Sonal Joshi

CoRR, 2021

Representation Learning to Classify and Detect Adversarial Attacks Against Speaker and Speech Recognition Systems.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Spine2Net: SpineNet with Res2Net and Time-Squeeze-and-Excitation Blocks for Speaker Recognition.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Earnings-21: A Practical Benchmark for ASR in the Wild.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Automatic Detection and Assessment of Alzheimer Disease Using Speech and Language Technologies in Low-Resource Scenarios.

Jaejin Cho

Sonal Joshi

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Deep Feature CycleGANs: Speaker Identity Preserving Non-Parallel Microphone-Telephone Domain Adaptation for Speaker Verification.

Saurabh Kataria

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Unsupervised Acoustic Unit Discovery by Leveraging a Language-Independent Subword Discriminative Feature Representation.

Siyuan Feng

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Align-Denoise: Single-Pass Non-Autoregressive Speech Recognition.

Nanxin Chen

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Segmental Contrastive Predictive Coding for Unsupervised Word Segmentation.

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

CopyPaste: An Augmentation Method for Speech Emotion Recognition.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

How Phonotactics Affect Multilingual and Zero-Shot ASR Performance.

Siyuan Feng

Ali Abavisani

Mark Hasegawa-Johnson

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Improving Reconstruction Loss Based Speaker Embedding in Unsupervised and Semi-Supervised Scenarios.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Focus on the Present: A Regularization Method for the ASR Source-Target Attention Layer.

Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

Beyond Isolated Utterances: Conversational Emotion Recognition.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Joint Prediction of Truecasing and Punctuation for Conversational Speech in Low-Resource Scenarios.

Agnieszka Mikolajczyk

Piotr Pezik

Aswin Shanmugam Subramanian

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

The JHU Multi-Microphone Multi-Speaker ASR System for the CHiME-6 Challenge.

Ashish Arora

Desh Raj

CoRR, 2020

That Sounds Familiar: An Analysis of Phonetic Representations Transfer Across Languages.

Mark Hasegawa-Johnson

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Learning Speaker Embedding from Text-to-Speech.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Self-Expressing Autoencoders for Unsupervised Spoken Term Discovery.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Punctuation Prediction in Spontaneous Conversations: Can We Mitigate ASR Errors with Retrofitted Word Embeddings?

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

WER we are and WER we think we are.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

2019

Towards Better Understanding of Spontaneous Conversations: Overcoming Automatic Speech Recognition Errors With Intent Recognition.

CoRR, 2019

Avaya Conversational Intelligence: A Real-Time System for Spoken Language Understanding in Human-Human Call Center Conversations.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Hierarchical Transformers for Long Document Classification.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

An Application for Building a Polish Telephone Speech Corpus.

Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Expanding Abbreviations in a Strongly Inflected Language: Are Morphosyntactic Tags Sufficient?

Proceedings of the Eleventh International Conference on Language Resources and Evaluation, 2018

Punctuation Prediction Model for Conversational Speech.

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

LSTM Network for Inflected Abbreviation Expansion.

CoRR, 2017

Audio Replay Attack Detection Using High-Frequency Features.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

2016

AGH corpus of Polish speech.

Lang. Resour. Evaluation, 2016

Structure of pauses in speech in the context of speaker verification and classification of speech type.

EURASIP J. Audio Speech Music. Process., 2016

2015

SARMATA 2.0 automatic Polish language speech recognition system.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Linguistically motivated tied-state triphones for Polish speech recognition.

Proceedings of the 2nd IEEE International Conference on Cybernetics, 2015

2014

HMM-based Breath and Filled Pauses Elimination in ASR.
