Hasim Sak

Affiliations:

Google, Inc., USA
Bogazici University, Department of Computer Engineering, Istanbul, Turkey (PhD 2011)

According to our database, Hasim Sak authored at least 54 papers between 2005 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Bibliography

2024

Clustering and Mining Accented Speech for Inclusive and Fair Speech Recognition.

,

,

,

Anshuman Tripathi

,

,

CoRR, 2024

Monte Carlo Self-Training for Speech Recognition.

Anshuman Tripathi

,

,

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2024

2023

Cross-Training: A Semi-Supervised Training Scheme for Speech Recognition.

,

Anshuman Tripathi

,

,

,

,

Rohit Prabhavalkar

,

Proceedings of the IEEE International Conference on Acoustics, 2023

2022

Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection.

,

,

,

Anshuman Tripathi

,

,

Ignacio López-Moreno

,

Proceedings of the IEEE International Conference on Acoustics, 2022

Contrastive Siamese Network for Semi-Supervised Speech Recognition.

,

,

Anshuman Tripathi

,

,

,

Proceedings of the IEEE International Conference on Acoustics, 2022

2021

Turn-to-Diarize: Online Speaker Diarization Constrained by Transformer Transducer Speaker Turn Detection.

,

,

,

Anshuman Tripathi

,

Ignacio López-Moreno

,

CoRR, 2021

Reducing Streaming ASR Model Delay with Self Alignment.

,

,

Anshuman Tripathi

,

,

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

2020

Transformer Transducer: One Model Unifying Streaming and Non-streaming Speech Recognition.

Anshuman Tripathi

,

,

,

,

CoRR, 2020

Multilingual Speech Recognition with Self-Attention Structured Parameterization.

,

,

Anshuman Tripathi

,

Bhuvana Ramabhadran

,

,

,

,

,

,

,

Pedro J. Moreno

,

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss.

,

,

,

Anshuman Tripathi

,

,

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

End-To-End Multi-Talker Overlapping Speech Recognition.

Anshuman Tripathi

,

,

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Adversarial Training for Multilingual Acoustic Modeling.

,

,

CoRR, 2019

Large-Scale Visual Speech Recognition.

Brendan Shillingford

,

Yannis M. Assael

,

Matthew W. Hoffman

,

,

,

,

,

,

,

Lorrayne Bennett

,

,

,

,

,

Andrew W. Senior

,

Nando de Freitas

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Monotonic Recurrent Neural Network Transducer and Decoding Strategies.

Anshuman Tripathi

,

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

A Density Ratio Approach to Language Model Fusion in End-to-End Automatic Speech Recognition.

,

,

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2019

2018

Large-Scale Visual Speech Recognition.

Brendan Shillingford

,

Yannis M. Assael

,

Matthew W. Hoffman

,

,

,

,

,

,

,

Lorrayne Bennett

,

,

,

,

Andrew W. Senior

,

Nando de Freitas

CoRR, 2018

Speech Recognition for Medical Conversations.

Chung-Cheng Chiu

,

Anshuman Tripathi

,

,

,

,

Diana Jaunzeikare

,

,

,

,

,

Justin Tansuwan

,

,

,

Proceedings of the 19th Annual Conference of the International Speech Communication Association, 2018

2017

Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition.

,

,

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Recurrent Neural Aligner: An Encoder-Decoder Neural Network Model for Sequence to Sequence Mapping.

,

,

,

Françoise Beaufays

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Acoustic Modeling for Google Home.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017

Multi-accent speech recognition with hierarchical grapheme based models.

,

Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Reducing the computational complexity for whole word models.

,

,

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Exploring architectures, data and units for streaming end-to-end speech recognition with RNN-transducer.

,

,

Rohit Prabhavalkar

Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

2016

Flat start training of CD-CTC-SMBR LSTM RNN acoustic models.

,

Andrew W. Senior

,

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Personalized speech recognition on mobile devices.

,

Rohit Prabhavalkar

,

,

Montse Gonzalez Arenas

,

,

,

,

,

Alexander Gruenstein

,

Françoise Beaufays

,

Carolina Parada

Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

2015

Fast and accurate recurrent neural network acoustic models for speech recognition.

,

Andrew W. Senior

,

,

Françoise Beaufays

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis.

,

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Context dependent phone models for LSTM RNN acoustic modelling.

Andrew W. Senior

,

,

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Learning acoustic frame labeling for speech recognition with recurrent neural networks.

,

Andrew W. Senior

,

,

,

,

Françoise Beaufays

,

Johan Schalkwyk

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks.

Tara N. Sainath

,

,

Andrew W. Senior

,

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Grapheme-to-phoneme conversion using Long Short-Term Memory recurrent neural networks.

,

,

,

Françoise Beaufays

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Acoustic modelling with CD-CTC-SMBR LSTM RNNS.

Andrew W. Senior

,

,

Felix de Chaumont Quitry

,

Tara N. Sainath

,

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Long Short-Term Memory Based Recurrent Neural Network Architectures for Large Vocabulary Speech Recognition.

,

Andrew W. Senior

,

Françoise Beaufays

CoRR, 2014

Sequence discriminative distributed training of long short-term memory recurrent neural networks.

,

,

,

Andrew W. Senior

,

,

,

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Long short-term memory recurrent neural network architectures for large scale acoustic modeling.

,

Andrew W. Senior

,

Françoise Beaufays

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

Automatic language identification using long short-term memory recurrent neural networks.

Javier Gonzalez-Dominguez

,

Ignacio López-Moreno

,

,

Joaquin Gonzalez-Rodriguez

,

Pedro J. Moreno

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013

Written-domain language modeling for automatic speech recognition.

,

,

Françoise Beaufays

,

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Language model verbalization for automatic speech recognition.

,

Françoise Beaufays

,

Kaisuke Nakajima

,

Proceedings of the IEEE International Conference on Acoustics, 2013

Mixture of mixture n-gram language models.

,

,

Kaisuke Nakajima

,

Françoise Beaufays

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

Morpholexical and Discriminative Language Models for Turkish Automatic Speech Recognition.

,

,

IEEE Trans. Speech Audio Process., 2012

Semi-supervised discriminative language modeling for Turkish ASR.

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Integrating morphology into automatic speech recognition: Morpholexical and discriminative language models for Turkish (Biçimbilimin otomatik konuşma tanımaya bütünleştirilmesi: Türkçe için biçimsözlüksel ve ayırıcı dil modelleri)

PhD thesis, 2011

Resources for Turkish morphological processing.

,

,

Lang. Resour. Evaluation, 2011

Automatic fingersign-to-speech translation system.

,

,

,

Ahmet Alp Kindiroglu

,

,

Alexander L. Ronzhin

,

,

,

,

,

,

,

Murat Saraçlar

,

J. Multimodal User Interfaces, 2011

Discriminative reranking of ASR hypotheses with morpholexical and N-best-list features.

,

,

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011

2010

On-the-fly lattice rescoring for real-time automatic speech recognition.

,

,

Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Morphology-based and sub-word language modeling for Turkish speech recognition.

,

,

Proceedings of the IEEE International Conference on Acoustics, 2010

2009

Turkish Broadcast News Transcription and Retrieval.

,

,

,

,

IEEE Trans. Speech Audio Process., 2009

Integrating morphology into automatic speech recognition.

,

,

Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, 2009

A Stochastic Finite-State Morphological Parser for Turkish.

,

,

Proceedings of the ACL 2009, 2009

2008

Turkish Language Resources: Morphological Parser, Morphological Disambiguator and Web Corpus.

,

,

Proceedings of the Advances in Natural Language Processing, 2008

2007

Language modeling for automatic turkish broadcast news transcription.

,

,

Proceedings of the 8th Annual Conference of the International Speech Communication Association, 2007

Morphological Disambiguation of Turkish Text with Perceptron Algorithm.

,

,

Proceedings of the Computational Linguistics and Intelligent Text Processing, 2007

2005

Generation of synthetic speech from Turkish text.

,

,

Proceedings of the 13th European Signal Processing Conference, 2005
