Hasim Sak

  • Google, Inc., USA
  • Bogazici University, Department of Computer Engineering, Istanbul, Turkey (PhD 2011)

According to our database, Hasim Sak authored at least 54 papers between 2005 and 2024.

Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition.
CoRR, 2024

Monte Carlo Self-Training for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2024

Cross-Training: A Semi-Supervised Training Scheme for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection.
Proceedings of the IEEE International Conference on Acoustics, 2022

Contrastive Siamese Network for Semi-Supervised Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2022

Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection.
CoRR, 2021

Reducing Streaming ASR Model Delay with Self Alignment.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Transformer Transducer: One Model Unifying Streaming and Non-streaming Speech Recognition.
CoRR, 2020

Multilingual Speech Recognition with Self-Attention Structured Parameterization.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-To-End Multi-Talker Overlapping Speech Recognition.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Adversarial Training for Multilingual Acoustic Modeling.
CoRR, 2019

Large-Scale Visual Speech Recognition.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Monotonic Recurrent Neural Network Transducer and Decoding Strategies.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

A Density Ratio Approach to Language Model Fusion in End-to-End Automatic Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

Large-Scale Visual Speech Recognition.
CoRR, 2018

Speech Recognition for Medical Conversations.
Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence Mapping.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Acoustic Modeling for Google Home.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Multi-accent speech recognition with hierarchical grapheme based models.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Reducing the computational complexity for whole word models.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Flat start training of CD-CTC-SMBR LSTM RNN acoustic models.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Personalized speech recognition on mobile devices.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Fast and accurate recurrent neural network acoustic models for speech recognition.
Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Context dependent phone models for LSTM RNN acoustic modelling.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Learning acoustic frame labeling for speech recognition with recurrent neural networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Grapheme-to-phoneme conversion using Long Short-Term Memory recurrent neural networks.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Acoustic modelling with CD-CTC-SMBR LSTM RNNs.
Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition.
CoRR, 2014

Sequence discriminative distributed training of long short-term memory recurrent neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Long short-term memory recurrent neural network architectures for large scale acoustic modeling.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Automatic language identification using long short-term memory recurrent neural networks.
Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Written-domain language modeling for automatic speech recognition.
Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Language model verbalization for automatic speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Mixture of mixture n-gram language models.
Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

Morpholexical and Discriminative Language Models for Turkish Automatic Speech Recognition.
IEEE Trans. Speech Audio Process., 2012

Integrating morphology into automatic speech recognition: Morpholexical and discriminative language models for Turkish (Biçimbilimin otomatik konuşma tanımaya bütünleştirilmesi: Türkçe için biçimsözlüksel ve ayırıcı dil modelleri)
PhD thesis, 2011

Resources for Turkish morphological processing.
Lang. Resour. Evaluation, 2011

Automatic fingersign-to-speech translation system.
J. Multimodal User Interfaces, 2011

Discriminative reranking of ASR hypotheses with morpholexical and N-best-list features.
Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

On-the-fly lattice rescoring for real-time automatic speech recognition.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Morphology-based and sub-word language modeling for Turkish speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

Turkish Broadcast News Transcription and Retrieval.
IEEE Trans. Speech Audio Process., 2009

Integrating morphology into automatic speech recognition.
Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

A Stochastic Finite-State Morphological Parser for Turkish.
Proceedings of the ACL 2009, 2009

Turkish Language Resources: Morphological Parser, Morphological Disambiguator and Web Corpus.
Proceedings of the Advances in Natural Language Processing, 2008

Language modeling for automatic Turkish broadcast news transcription.
Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Morphological Disambiguation of Turkish Text with Perceptron Algorithm.
Proceedings of the Computational Linguistics and Intelligent Text Processing, 2007

Generation of synthetic speech from Turkish text.
Proceedings of the 13th European Signal Processing Conference, 2005
