Martti Vainio

Orcid: 0000-0003-2570-0196

  • University of Helsinki, Finland

According to our database, Martti Vainio authored at least 72 papers between 1996 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.







Sound symbolism in manual and vocal responses: phoneme-response interactions associated with grasping as well as vertical and size dimensions of keypresses.
Cogn. Process., August, 2024

High-Pitched Sound is Open and Low-Pitched Sound is Closed: Representing the Spatial Meaning of Pitch Height.
Cogn. Sci., August, 2024

Investigating the Utility of Surprisal from Large Language Models for Speech Synthesis Prosody.
Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

Analyzing second language proficiency using wavelet-based prominence estimates.
J. Phonetics, 2020

Prosodic Prominence and Boundaries in Sequence-to-Sequence Speech Synthesis.
CoRR, 2020

Towards transformational creation of novel songs.
Connect. Sci., 2019

The Sound of Grasp Affordances: Influence of Grasp-Related Size of Categorized Objects on Vocalization.
Cogn. Sci., 2019

Predicting Prosodic Prominence from Text with Pre-trained Contextualized Word Representations.
Proceedings of the 22nd Nordic Conference on Computational Linguistics, NoDaLiDa 2019, Turku, Finland, September 30, 2019

Comparative Analysis of Prosodic Characteristics Using WaveNet Embeddings.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Prosodic Representations of Prominence Classification Neural Networks and Autoencoders Using Bottleneck Features.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Hierarchical representation and estimation of prosody using continuous wavelet transform.
Comput. Speech Lang., 2017

Comparing Languages Using Hierarchical Prosodic Analysis.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Phase perception of the glottal excitation and its relevance in statistical parametric speech synthesis.
Speech Commun., 2016

Congruency Effect Between Articulation and Grasping in Native English Speakers.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Digitala: An Augmented Test and Review Process Prototype for High-Stakes Spoken Foreign Language Examination.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Rapid and automatic speech-specific learning mechanism in human neocortex.
NeuroImage, 2015

Hierarchical Representation of Prosody for Statistical Speech Synthesis.
CoRR, 2015

Action planning and congruency effect between articulation and grasping.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Phase perception of the glottal excitation of vocoded speech.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Different parts of the same elephant: A roadmap to disentangle and connect different perspectives on prosodic prominence.
Proceedings of the 18th International Congress of Phonetic Sciences, 2015

Pitch, perceived duration and auditory biases: Comparison among languages.
Proceedings of the 18th International Congress of Phonetic Sciences, 2015

Prosodic and syntactic segmentation of spontaneous speech: A preliminary study.
Proceedings of the 18th International Congress of Phonetic Sciences, 2015

Emergent consonantal quantity contrast and context-dependence of gestural phasing.
J. Phonetics, 2014

Synthesis and perception of breathy, normal, and Lombard speech in the presence of noise.
Comput. Speech Lang., 2014

An adaptive post-filtering method producing an artificial Lombard-like effect for intelligibility enhancement of narrowband telephone speech.
Comput. Speech Lang., 2014

Phonetics and Machine Learning: Hierarchical Modelling of Prosody in Statistical Speech Synthesis.
Proceedings of the Statistical Language and Speech Processing, 2014

Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Voice source modelling using deep neural networks for statistical parametric speech synthesis.
Proceedings of the 22nd European Signal Processing Conference, 2014

Wavelets for intonation modeling in HMM speech synthesis.
Proceedings of the Eighth ISCA Tutorial and Research Workshop on Speech Synthesis, 2013

Acoustic and visual phonetic features in the McGurk effect - an audiovisual speech illusion.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Lombard modified text-to-speech synthesis for improved intelligibility: submission for the Hurricane Challenge 2013.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Analysis and synthesis of shouted speech.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Language background affects the strength of the pitch bias in a duration discrimination task.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Comparing glottal-flow-excited statistical parametric speech synthesis methods.
Proceedings of the IEEE International Conference on Acoustics, 2013

How far are vowel formants from computed vocal tract resonances?
CoRR, 2012

Effect of noise type and level on focus related fundamental frequency changes.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Wideband Parametric Speech Synthesis Using Warped Linear Prediction.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Utilization of the Lombard effect in post-filtering for intelligibility enhancement of telephone speech.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Improved formant frequency estimation from high-pitched vowels by downgrading the contribution of the glottal source with weighted linear prediction.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Intonational speaker verification: A study on parameters and performance under noisy conditions.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

On measuring the intelligibility of synthetic speech in noise - Do we need a realistic noise environment?
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Comparison of post-filtering methods for intelligibility enhancement of telephone speech.
Proceedings of the 20th European Signal Processing Conference, 2012

The GlottHMM Entry for Blizzard Challenge 2012: Hybrid Approach.
Proceedings of the Blizzard Challenge 2012, Portland, OR, USA, September 14, 2012

HMM-Based Speech Synthesis Utilizing Glottal Inverse Filtering.
IEEE Trans. Speech Audio Process., 2011

Analysis of HMM-Based Lombard Speech Synthesis.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

Relative Timing of Bilabial Gesture in Finnish.
Proceedings of the 17th International Congress of Phonetic Sciences, 2011

Estimates for the Measurement and Articulatory Error in MRI Data from Sustained Vowel Production.
Proceedings of the 17th International Congress of Phonetic Sciences, 2011

Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2011

The GlottHMM Speech Synthesis Entry for Blizzard Challenge 2011: Utilizing Source Unit Selection in HMM-Based Speech Synthesis for Improved Excitation Generation.
Proceedings of the Blizzard Challenge 2011, Turin, Italy, September 2, 2011

Recording Speech Sound and Articulation in MRI.
Proceedings of the BIODEVICES 2011, 2011

Comparison of formant enhancement methods for HMM-based speech synthesis.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Laryngeal voice quality in the expression of focus.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

The GlottHMM Speech Synthesis Entry for Blizzard Challenge 2010.
Proceedings of the Blizzard Challenge 2010, Kansai Science City, Japan, September 25, 2010

New method for delexicalization and its application to prosodic tagging for text-to-speech synthesis.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Resources for speech research: present and future infrastructure needs.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

Evaluation of an Artificial Speech Bandwidth Extension Method in Three Languages.
IEEE Trans. Speech Audio Process., 2008

Deep Syntactic Analysis and Rule Based Accentuation in Text-to-Speech Synthesis.
Proceedings of the Text, Speech and Dialogue, 11th International Conference, 2008

HMM-based Finnish text-to-speech system utilizing glottal inverse filtering.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Laryngeal voice quality changes in expression of prominence in continuous speech.
Proceedings of the Fifth International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, 2007

Tonal features, intensity, and word order in the perception of prominence.
J. Phonetics, 2006

Word order and tonal shape in the production of focus in short Finnish utterances.
Proceedings of the Ninth International Conference on Spoken Language Processing, 2006

Three-dimensional modelling of speech corpora: added value through visualisation.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001

Object-oriented Access to the Estonian Phonetic Database.
Proceedings of the Second International Conference on Language Resources and Evaluation, 2000

Measuring the importance of morphological information for Finnish speech synthesis.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Reduced impedance mismatch in speech database access.
Proceedings of the Sixth International Conference on Spoken Language Processing, 2000

Towards a high quality Finnish talking head.
Proceedings of the Third IEEE Workshop on Multimedia Signal Processing, 1999

Relational vs. object-oriented models for representing speech: a comparison using ANDOSL data.
Proceedings of the Sixth European Conference on Speech Communication and Technology, 1999

Modeling the microprosody of pitch and loudness for speech synthesis with neural networks.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Forming generic models of speech for uniform database access.
Proceedings of the 5th International Conference on Spoken Language Processing, Incorporating The 7th Australian International Speech Science and Technology Conference, Sydney Convention Centre, Sydney, Australia, 30th November, 1998

Speech synthesis using warped linear prediction and neural networks.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

Pitch, loudness, and segmental duration correlates: towards a model for the phonetic aspects of Finnish prosody.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

A multilingual phonetic representation and analysis system for different speech databases.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996
