Philip N. Garner

Orcid: 0000-0002-0814-1348

According to our database, Philip N. Garner authored at least 115 papers between 1993 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.







Bayesian Parameter-Efficient Fine-Tuning for Overcoming Catastrophic Forgetting.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity.
CoRR, 2024

A Bayesian Interpretation of Adaptive Low-Rank Adaptation.
CoRR, 2024

Exploring neural oscillations during speech perception via surrogate gradient spiking neural networks.
CoRR, 2024

An investigation into the adaptability of a diffusion-based TTS model.
CoRR, 2023

Diffusion Transformer for Adaptive Text-to-Speech.
Proceedings of the 12th ISCA Speech Synthesis Workshop, 2023

Can ChatGPT Detect Intent? Evaluating Large Language Models for Spoken Language Understanding.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Vulnerability of Automatic Identity Recognition to Audio-Visual Deepfakes.
Proceedings of the IEEE International Joint Conference on Biometrics, 2023

The Interpreter Understands Your Meaning: End-to-end Spoken Language Understanding Aided by Speech Translation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

The Idiap Speech Synthesis System for the Blizzard Challenge 2023.
Proceedings of the 18th Blizzard Challenge Workshop, Grenoble, France, August 29, 2023, 2023

Investigating a neural all pass warp in modern TTS applications.
Speech Commun., 2022

Surrogate Gradient Spiking Neural Networks as Encoders for Large Vocabulary Continuous Speech Recognition.
CoRR, 2022

Conversational Speech Recognition Needs Data? Experiments with Austrian German.
Proceedings of the Thirteenth Language Resources and Evaluation Conference, 2022

Low-Level Physiological Implications of End-to-End Learning for Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Bayesian Recurrent Units and the Forward-Backward Algorithm.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

A Bayesian Approach to Recurrence in Neural Networks.
IEEE Trans. Pattern Anal. Mach. Intell., 2021

Improving Emotional TTS with an Emotion Intensity Input from Unsupervised Extraction.
Proceedings of the 11th ISCA Speech Synthesis Workshop, 2021

Modeling Dialectal Variation for Swiss German Automatic Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

A Bayesian Interpretation of the Light Gated Recurrent Unit.
Proceedings of the IEEE International Conference on Acoustics, 2021

Learning to Translate Low-Resourced Swiss German Dialectal Speech into Standard German Text.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

An Evaluation Benchmark for Automatic Speech Recognition of German-English Code-Switching.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

A $t$-Distribution Based Operator for Enhancing Out of Distribution Robustness of Neural Network Classifiers.
IEEE Signal Process. Lett., 2020

Neural VTLN for Speaker Adaptation in TTS.
Proceedings of the 10th ISCA Speech Synthesis Workshop, 2019

Unbiased Semi-Supervised LF-MMI Training Using Dropout.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Self-Attention for Speech Emotion Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

An Investigation of Multilingual ASR Using End-to-end LF-MMI.
Proceedings of the IEEE International Conference on Acoustics, 2019

Empirical Evaluation and Combination of Punctuation Prediction Models Applied to Broadcast News.
Proceedings of the IEEE International Conference on Acoustics, 2019

An End-to-end Network to Synthesize Intonation Using a Generalized Command Response Model.
Proceedings of the IEEE International Conference on Acoustics, 2019

Cross-lingual adaptation of a CTC-based multilingual acoustic model.
Speech Commun., 2018

Intonation modelling using a muscle model and perceptually weighted matching pursuit.
Speech Commun., 2018

A Variational Prosody Model for the decomposition and synthesis of speech prosody.
CoRR, 2018

Context-Aware Attention Mechanism for Speech Emotion Recognition.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Fast Language Adaptation Using Phonological Information.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

A Neural Model to Predict Parameters for a Generalized Command Response Model of Intonation.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model.
CoRR, 2017

An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Investigating Spectral Amplitude Modulation Phase Hierarchy Features in Speech Synthesis.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Emphasis recreation for TTS using intonation atoms.
Proceedings of the 9th ISCA Speech Synthesis Workshop, 2016

Design of a Speech Corpus for Research on Cross-Lingual Prosody Transfer.
Proceedings of the Speech and Computer - 18th International Conference, 2016

An Agonist-Antagonist Pitch Production Model.
Proceedings of the Speech and Computer - 18th International Conference, 2016

Investigating Cross-lingual Multi-level Adaptive Networks: The Importance of the Correlation of Source and Target Languages.
Proceedings of the 13th International Conference on Spoken Language Translation, 2016

Probabilistic Amplitude Demodulation Features in Speech Synthesis for Improving Prosody.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

The SIWIS Database: A Multilingual Speech Database with Acted Emphasis.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

PhonVoc: A Phonetic and Phonological Vocoding Toolkit.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Sound Pattern Matching for Automatic Prosodic Event Detection.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Modeling unvoiced sounds in statistical parametric speech synthesis with a continuous vocoder.
Proceedings of the 24th European Signal Processing Conference, 2016

Incremental Syllable-Context Phonetic Vocoding.
IEEE ACM Trans. Audio Speech Lang. Process., 2015

Ad hoc microphone array calibration: Euclidean distance matrix completion algorithm and theoretical guarantees.
Signal Process., 2015

Spatial Sound Localization via Multipath Euclidean Distance Matrix Recovery.
IEEE J. Sel. Top. Signal Process., 2015

Exploiting foreign resources for DNN-based ASR.
EURASIP J. Audio Speech Music. Process., 2015

DNN-Based Speech Synthesis: Importance of Input Features and Training Data.
Proceedings of the Speech and Computer - 17th International Conference, 2015

Weighted correlation based atom decomposition intonation modelling.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Robust microphone placement for source localization from noisy distance measurements.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Atom decomposition-based intonation modelling.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Phonological vocoding using artificial neural networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Using out-of-language data to improve an under-resourced speech recognizer.
Speech Commun., 2014

Enhanced diffuse field model for ad hoc microphone array calibration.
Signal Process., 2014

Combining Vocal Tract Length Normalization With Hierarchical Linear Transformations.
IEEE J. Sel. Top. Signal Process., 2014

Swiss French Regional Accent Identification.
Proceedings of the Odyssey 2014: The Speaker and Language Recognition Workshop, 2014

Automatic speech recognition and translation of a Swiss German dialect: Walliserdeutsch.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Stress and accent transmission in HMM-based syllable-context very low bit rate speech coding.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

ROCKIT: Roadmap for Conversational Interaction Technologies.
Proceedings of the 2014 Workshop on Roadmapping the Future of Multimodal Interaction Research including Business Opportunities and Challenges, 2014

Ad-hoc microphone array calibration from partial distance measurements.
Proceedings of the 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2014

Applying Multi- and Cross-Lingual Stochastic Phone Space Transformations to Non-Native Speech Recognition.
IEEE Trans. Speech Audio Process., 2013

A Simple Continuous Pitch Estimation Algorithm.
IEEE Signal Process. Lett., 2013

Crosslingual tandem-SGMM: exploiting out-of-language data for acoustic model and feature level adaptation.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Syllable-based pitch encoding for low bit rate speech coding with recognition/synthesis architecture.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Euclidean distance matrix completion for ad-hoc microphone array calibration.
Proceedings of the 18th International Conference on Digital Signal Processing, 2013

Accent adaptation using Subspace Gaussian Mixture Models.
Proceedings of the IEEE International Conference on Acoustics, 2013

On the (UN)importance of the contextual factors in HMM-based speech synthesis and coding.
Proceedings of the IEEE International Conference on Acoustics, 2013

Evaluating intra- and crosslingual adaptation for non-native speech recognition in a bilingual environment.
Proceedings of the IEEE 4th International Conference on Cognitive Infocommunications, 2013

Impact of deep MLP architecture on different acoustic modeling techniques for under-resourced speech recognition.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Vocal Tract Length Normalization for Statistical Parametric Speech Synthesis.
IEEE Trans. Speech Audio Process., 2012

Transcribing Meetings With the AMIDA Systems.
IEEE Trans. Speech Audio Process., 2012

Boosting under-resourced speech recognizers by exploiting out-of-language data - case study on Afrikaans.
Proceedings of the Third Workshop on Spoken Language Technologies for Under-resourced Languages, 2012

MediaParl: Bilingual mixed language accented speech database.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Combining cepstral normalization and cochlear implant-like speech processing for microphone array-based speech recognition.
Proceedings of the 2012 IEEE Spoken Language Technology Workshop (SLT), 2012

Comparing different acoustic modeling techniques for multilingual boosting.
Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Microphone array beampattern characterization for hands-free speech applications.
Proceedings of the IEEE 7th Sensor Array and Multichannel Signal Processing Workshop, 2012

Combining vocal tract length normalization with hierarchial linear transformations.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Using KL-divergence and multilingual information to improve ASR for under-resourced languages.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Bayesian approaches to uncertainty in speech processing.
PhD thesis, 2011

Cepstral normalisation and the signal to noise ratio spectrum in automatic speech recognition.
Speech Commun., 2011

A Just-in-Time Document Retrieval System for Dialogues or Monologues.
Proceedings of the SIGDIAL 2011 Conference, 2011

Improving Non-Native ASR Through Stochastic Multilingual Phoneme Space Transformations.
Proceedings of the 12th Annual Conference of the International Speech Communication Association, 2011

A Speech-based Just-in-Time Retrieval System using Semantic Search.
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference, 19-24 June, 2011, Portland, Oregon, USA, 2011

Speaker adaptation and the evaluation of speaker similarity in the EMIME speech-to-speech translation project.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

Implementation of VTLN for statistical speech synthesis.
Proceedings of the Seventh ISCA Tutorial and Research Workshop on Speech Synthesis, 2010

English spoken term detection in multilingual recordings.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Hands free audio analysis from home entertainment.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

The AMIDA 2009 meeting transcription system.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Tracter: a lightweight dataflow framework.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Sparse component analysis for speech recognition in multi-speaker environment.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

VTLN adaptation for statistical speech synthesis.
Proceedings of the IEEE International Conference on Acoustics, 2010

Automatic temporal alignment of AV data with confidence estimation.
Proceedings of the IEEE International Conference on Acoustics, 2010

Beamforming With a Maximum Negentropy Criterion.
IEEE Trans. Speech Audio Process., 2009

Real-time ASR from meetings.
Proceedings of the 10th Annual Conference of the International Speech Communication Association, 2009

SNR features for automatic speech recognition.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

Maximum kurtosis beamforming with the generalized sidelobe canceller.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Silence models in weighted finite-state transducers.
Proceedings of the 9th Annual Conference of the International Speech Communication Association, 2008

Filter bank design based on minimization of individual aliasing terms for minimum mutual information subband adaptive beamforming.
Proceedings of the IEEE International Conference on Acoustics, 2008

A differential spectral voice activity detector.
Proceedings of the 2004 IEEE International Conference on Acoustics, 2004

SpokenContent representation in MPEG-7.
IEEE Trans. Circuits Syst. Video Technol., 2001

Representation and linking mechanisms for audio in MPEG-7.
Signal Process. Image Commun., 2000

Spoken content metadata and MPEG-7.
Proceedings of the ACM Multimedia 2000 Workshops, Los Angeles, CA, USA, October 30, 2000

On the robust incorporation of formant features into hidden Markov models for automatic speech recognition.
Proceedings of the 1998 IEEE International Conference on Acoustics, 1998

On topic identification and dialogue move recognition.
Comput. Speech Lang., 1997

Using formant frequencies in speech recognition.
Proceedings of the Fifth European Conference on Speech Communication and Technology, 1997

A keyword selection strategy for dialogue move recognition and multi-class topic identification.
Proceedings of the 1997 IEEE International Conference on Acoustics, 1997

A theory of word frequencies and its application to dialogue move recognition.
Proceedings of the 4th International Conference on Spoken Language Processing, 1996

Source position estimation using radial basis functions.
Proceedings of the 13th International Conference on Pattern Recognition, 1996

Towards Sonar Based Perception and Modelling for Unmanned Untethered Underwater Vehicles.
Proceedings of the 1993 IEEE International Conference on Robotics and Automation, 1993
