Milos Cernak

Orcid: 0000-0002-5569-9491

According to our database, Milos Cernak authored at least 81 papers between 2004 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.







Semi-intrusive audio evaluation: Casting non-intrusive assessment as a multi-modal text prediction task.
CoRR, 2024

OpenACE: An Open Benchmark for Evaluating Audio Coding Performance.
CoRR, 2024

Diffusion-based Speech Enhancement with Schrödinger Bridge and Symmetric Noise Schedule.
CoRR, 2024

On Real-Time Multi-Stage Speech Enhancement Systems.
Proceedings of the IEEE International Conference on Acoustics, 2024

Multi-Channel Mosra: Mean Opinion Score and Room Acoustics Estimation Using Simulated Data and A Teacher Model.
Proceedings of the IEEE International Conference on Acoustics, 2024

Cluster-based pruning techniques for audio data.
CoRR, 2023

Demo Abstract: In-Ear-Voice - Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms.
Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation, 2023

In-Ear-Voice: Towards Milli-Watt Audio Enhancement With Bone-Conduction Microphones for In-Ear Sensing Platforms.
Proceedings of the 8th ACM/IEEE Conference on Internet of Things Design and Implementation, 2023

ALO-VC: Any-to-any Low-latency One-shot Voice Conversion.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Speaker Embeddings as Individuality Proxy for Voice Stress Detection.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Personalized Task Load Prediction in Speech Communication.
Proceedings of the IEEE International Conference on Acoustics, 2023

Efficient Speech Quality Assessment Using Self-Supervised Framewise Embeddings.
Proceedings of the IEEE International Conference on Acoustics, 2023

BC-VAD: A Robust Bone Conduction Voice Activity Detection.
CoRR, 2022

Fast accuracy estimation of deep learning based multi-class musical source separation.
Proceedings of the 2022 Northern Lights Deep Learning Workshop, 2022

Application for Real-time Personalized Speaker Extraction.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MOSRA: Joint Mean Opinion Score and Room Acoustics Speech Quality Assessment.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

PEAF: Learnable Power Efficient Analog Acoustic Features for Audio Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

SERAB: A Multi-Lingual Benchmark for Speech Emotion Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Power efficient analog features for audio recognition.
CoRR, 2021

A Universal Deep Room Acoustics Estimator.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2021

Non-Intrusive Speech Quality Assessment with Transfer Learning and Subject-Specific Scaling.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping.
Proceedings of the HEAR: Holistic Evaluation of Audio Representations, 2021

Word-Level Embeddings for Cross-Task Transfer Learning in Speech Processing.
Proceedings of the 29th European Signal Processing Conference, 2021

AC-VC: Non-Parallel Low Latency Phonetic Posteriorgrams Based Voice Conversion.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Joint Blind Room Acoustic Characterization From Speech And Music Signals Using Convolutional Recurrent Neural Networks.
CoRR, 2020

Deep Speech Inpainting of Time-Frequency Masks.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Spiking Neural Networks Trained With Backpropagation for Low Power Neuromorphic Implementation of Voice Activity Detection.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Bin Encoding Training of a Spiking Neural Network Based Voice Activity Detection.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

FastVC: Fast Voice Conversion with non-parallel data.
Proceedings of the Joint Workshop for the Blizzard Challenge and Voice Conversion Challenge 2020, 2020

Voice Presentation Attack Detection Using Convolutional Neural Networks.
Handbook of Biometric Anti-Spoofing, 2019

Speech-VGG: A deep feature extractor for speech processing.
CoRR, 2019

End-to-End Accented Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Open-Vocabulary Keyword Spotting with Audio and Text Embeddings.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Evaluating Audiovisual Source Separation in the Context of Video Conferencing.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Phone-Attribute Posteriors to Evaluate the Speech of Cochlear Implant Users.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Cognitive Speech Coding: Examining the Impact of Cognitive Speech Processing on Speech Compression.
IEEE Signal Process. Mag., 2018

SoftwareX, 2018

NeuroSpeech: An open-source software for Parkinson's speech analysis.
Digit. Signal Process., 2018

Phonological Posteriors and GRU Recurrent Units to Assess Speech Impairments of Patients with Parkinson's Disease.
Proceedings of the Text, Speech, and Dialogue - 21st International Conference, 2018

Phonological i-Vectors to Detect Parkinson's Disease.
Proceedings of the Text, Speech, and Dialogue - 21st International Conference, 2018

Nasal Speech Sounds Detection Using Connectionist Temporal Classification.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Perceptual Information Loss due to Impaired Speech Production.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

Characterisation of voice quality of Parkinson's disease using differential phonological posterior features.
Comput. Speech Lang., 2017

Speech vocoding for laboratory phonology.
Comput. Speech Lang., 2017

Bob Speaks Kaldi.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Multi-view representation learning via GCCA for multimodal analysis of Parkinson's disease.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

On the impact of non-modal phonation on phonological features.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

On structured sparsity of phonological posteriors for linguistic parsing.
Speech Commun., 2016

An Analysis of Rhythmic Staccato-Vocalization Based on Frequency Demodulation for Laughter Detection in Conversational Meetings.
CoRR, 2016

Investigating Spectral Amplitude Modulation Phase Hierarchy Features in Speech Synthesis.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

HMM-Based Non-Native Accent Assessment Using Posterior Features.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

PhonVoc: A Phonetic and Phonological Vocoding Toolkit.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Sound Pattern Matching for Automatic Prosodic Event Detection.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Phonetic and Phonological Posterior Search Space Hashing Exploiting Class-Specific Sparsity Structures.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Modeling unvoiced sounds in statistical parametric speech synthesis with a continuous vocoder.
Proceedings of the 24th European Signal Processing Conference, 2016

Incremental Syllable-Context Phonetic Vocoding.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Residual-Based Excitation with Continuous F0 Modeling in HMM-Based Speech Synthesis.
Proceedings of the Statistical Language and Speech Processing, 2015

Automatic accentedness evaluation of non-native speech using phonetic and sub-phonetic posterior probabilities.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Neuromorphic based oscillatory device for incremental syllable boundary detection.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

An empirical model of emphatic word detection.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

On compressibility of neural network phonological features for low bit rate speech coding.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Phonological vocoding using artificial neural networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Development of bilingual ASR system for MediaParl corpus.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Stress and accent transmission in HMM-based syllable-context very low bit rate speech coding.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

A Simple Continuous Pitch Estimation Algorithm.
IEEE Signal Process. Lett., 2013

Syllable-based pitch encoding for low bit rate speech coding with recognition/synthesis architecture.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

On the (UN)importance of the contextual factors in HMM-based speech synthesis and coding.
Proceedings of the IEEE International Conference on Acoustics, 2013

Automatic Staging of Audio with Emotions.
Proceedings of the 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction, 2013

Reading companion: the technical and social design of an automated reading tutor.
Proceedings of the Third Workshop on Child, Computer and Interaction, 2012

Robust triphone mapping for acoustic modeling.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Rule-Based Triphone Mapping for Acoustic Modeling in Automatic Speech Recognition.
Proceedings of the Text, Speech and Dialogue - 14th International Conference, 2011

Effective Triphone Mapping for Acoustic Modeling in Speech Recognition.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A Comparison of Decision Tree Classifiers for Automatic Diagnosis of Speech Recognition Errors.
Comput. Informatics, 2010

Diagnostics for Debugging Speech Recognition Systems.
Proceedings of the Text, Speech and Dialogue, 13th International Conference, 2010

Unit Selection Speech Synthesis in Noise.
Proceedings of the 2006 IEEE International Conference on Acoustics Speech and Signal Processing, 2006

Diagnostics of speech recognition using classification phoneme diagnostic trees.
Proceedings of the Second IASTED International Conference on Computational Intelligence, 2006

TTSBOX: a MATLAB toolbox for teaching text-to-speech synthesis.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Slovak Speech Database for Experiments and Application Building in Unit-Selection Speech Synthesis.
Proceedings of the Text, Speech and Dialogue, 7th International Conference, 2004
