Tamás Grósz

Orcid: 0000-0001-7918-9579

According to our database, Tamás Grósz authored at least 56 papers between 2013 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.








Principled Comparisons for End-to-End Speech Recognition: Attention vs Hybrid at the 1000-Hour Scale.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

From Raw Speech to Fixed Representations: A Comprehensive Evaluation of Speech Embedding Techniques.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Comparison and analysis of new curriculum criteria for end-to-end ASR.
Speech Commun., 2024

Multimodal Humor Detection and Social Perception Prediction.
Proceedings of the 5th on Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, 2024

Investigating the Clusters Discovered By Pre-Trained AV-HuBERT.
Proceedings of the IEEE International Conference on Acoustics, 2024

Collecting Linguistic Resources for Assessing Children's Pronunciation of Nordic Languages.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Lahjoita puhetta: a large-scale corpus of spoken Finnish with some benchmarks.
Lang. Resour. Evaluation, September, 2023

Developing an AI-Assisted Low-Resource Spoken Language Learning App for Children.
IEEE Access, 2023

A pronunciation Scoring System Embedded into Children's Foreign Language Learning Games with Experimental Verification of Learning Benefits.
Proceedings of the 9th Workshop on Speech and Language Technology in Education, 2023

Multi-task wav2vec2 Serving as a Pronunciation Training System for Children.
Proceedings of the 9th Workshop on Speech and Language Technology in Education, 2023

CaptainA - A mobile app for practising Finnish pronunciation.
Proceedings of the 24th Nordic Conference on Computational Linguistics, 2023

Discovering Relevant Sub-spaces of BERT, Wav2Vec 2.0, ELECTRA and ViT Embeddings for Humor and Mimicked Emotion Recognition with Integrated Gradients.
Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023

Advancing Audio Emotion and Intent Recognition with Large Pre-Trained Models and Bayesian Inference.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Investigating wav2vec2 context representations and the effects of fine-tuning, a case-study of a Finnish model.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Topic Identification for Spontaneous Speech: Enriching Audio Features with Embedded Linguistic Information.
Proceedings of the 31st European Signal Processing Conference, 2023

End-to-end Ensemble-based Feature Selection for Paralinguistics Tasks.
CoRR, 2022

Lahjoita puhetta - a large-scale corpus of spoken Finnish with some benchmarks.
CoRR, 2022

Wav2vec2-based Paralinguistic Systems to Recognise Vocalised Emotions and Stuttering.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Comparison and Analysis of New Curriculum Criteria for End-to-End ASR.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

wav2vec2-based Speech Rating System for Children with Speech Sound Disorder.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Tracing Signs of Urbanity in the Finnish Fiction Film of the 1950s: Toward a Multimodal Analysis of Audiovisual Data.
Proceedings of the 6th Digital Humanities in the Nordic and Baltic Countries Conference (DHNB 2022), 2022

LSTM-XL: Attention Enhanced Long-Term Memory for LSTM Cells.
Proceedings of the Text, Speech, and Dialogue - 24th International Conference, 2021

Social Signal Detection by Probabilistic Sampling DNN Training.
IEEE Trans. Affect. Comput., 2020

Aalto's End-to-End DNN systems for the INTERSPEECH 2020 Computational Paralinguistics Challenge.
CoRR, 2020

Deep learning in static, metric-based bug prediction.
Array, 2020

Visual Interpretation of DNN-based Acoustic Models using Deep Autoencoders.
Proceedings of the 3rd Workshop on Machine Learning Methods in Visualisation for Big Data, 2020

Data Augmentation Using Prosody and False Starts to Recognize Non-Native Children's Speech.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Automatic segmentation of hyperreflective foci in OCT images.
Comput. Methods Programs Biomed., 2019

Ultrasound-Based Silent Speech Interface Built on a Continuous Vocoder.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Autoencoder-Based Articulatory-to-Acoustic Mapping for Ultrasound Silent Speech Interfaces.
Proceedings of the International Joint Conference on Neural Networks, 2019

A Reconstruction-Free Projection Selection Procedure for Binary Tomography Using Convolutional Neural Networks.
Proceedings of the Image Analysis and Recognition - 16th International Conference, 2019

Using Deep Rectifier Neural Nets and Probabilistic Sampling for Topical Unit Classification.
Proceedings of the Cognitive Infocommunications, Theory and Applications, 2019

Training Methods for Deep Neural Network-Based Acoustic Models in Speech Recognition.
PhD thesis, 2018

Efficient visual code localization with neural networks.
Pattern Anal. Appl., 2018

Multi-Task Learning of Speech Recognition and Speech Synthesis Parameters for Ultrasound-based Silent Speech Interfaces.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

General Utterance-Level Feature Extraction for Classifying Crying Sounds, Atypical & Self-Assessed Affect and Heart Beats.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Automatic Detection and Characterization of Biomarkers in OCT Images.
Proceedings of the Image Analysis and Recognition - 15th International Conference, 2018

F0 Estimation for DNN-Based Ultrasound Silent Speech Interfaces.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

A Comparative Evaluation of GMM-Free State Tying Methods for ASR.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Training Context-Dependent DNN Acoustic Models Using Probabilistic Sampling.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

DNN-Based Feature Extraction and Classifier Combination for Child-Directed Speech, Cold and Snoring Identification.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

DNN-Based Ultrasound-to-Speech Conversion for a Silent Speech Interface.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Detecting Mild Cognitive Impairment from Spontaneous Speech by Correlation-Based Phonetic Feature Selection.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

GMM-Free Flat Start Sequence-Discriminative DNN Training.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Estimating the Sincerity of Apologies in Speech by DNN Rank Learning and Prosodic Analysis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Determining Native Language and Deception Using Phonetic Features and Classifier Combination.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Topical unit classification using deep neural nets and probabilistic sampling.
Proceedings of the 7th IEEE International Conference on Cognitive Infocommunications, 2016

Assessing the degree of nativeness and parkinson's condition using Gaussian processes and deep rectifier neural networks.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Building context-dependent DNN acoustic models using Kullback-Leibler divergence-based state tying.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Document Classification with Deep Rectifier Neural Networks and Probabilistic Sampling.
Proceedings of the Text, Speech and Dialogue - 17th International Conference, 2014

Robust Multi-Band ASR Using Deep Neural Nets and Spectro-temporal Features.
Proceedings of the Speech and Computer - 16th International Conference, 2014

A Sequence Training Method for Deep Rectifier Neural Networks in Speech Recognition.
Proceedings of the Speech and Computer - 16th International Conference, 2014

QR code localization using deep neural networks.
Proceedings of the IEEE International Workshop on Machine Learning for Signal Processing, 2014

Detecting the intensity of cognitive and physical load using AdaBoost and deep rectifier neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Localization of Visual Codes in the DCT Domain Using Deep Rectifier Neural Networks.
Proceedings of the International Workshop on Artificial Neural Networks and Intelligent Information Processing, 2014

A Comparison of Deep Neural Network Training Methods for Large Vocabulary Speech Recognition.
Proceedings of the Text, Speech, and Dialogue - 16th International Conference, 2013
