Boris Ginsburg

According to our database, Boris Ginsburg authored at least 100 papers between 2002 and 2024.

Bibliography

2024
Anticipating Future with Large Language Model for Simultaneous Machine Translation.
CoRR, 2024

VoiceTextBlender: Augmenting Large Language Models with Speech Capabilities via Single-Stage Joint Speech-Text Supervised Fine-Tuning.
CoRR, 2024

Three-in-One: Fast and Accurate Transducer for Hybrid-Autoregressive ASR.
CoRR, 2024

nGPT: Normalized Transformer with Representation Learning on the Hypersphere.
CoRR, 2024

Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data.
CoRR, 2024

EMMeTT: Efficient Multimodal Machine Translation Training.
CoRR, 2024

META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR.
CoRR, 2024

Chain-of-Thought Prompting for Speech Translation.
CoRR, 2024

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition.
CoRR, 2024

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens.
CoRR, 2024

Longer is (Not Necessarily) Stronger: Punctuated Long-Sequence Training for Enhanced Speech Recognition and Translation.
CoRR, 2024

Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR.
CoRR, 2024

NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks.
CoRR, 2024

Genetic Instruct: Scaling up Synthetic Generation of Coding Instructions for Large Language Models.
CoRR, 2024

Romanization Encoding For Multilingual ASR.
CoRR, 2024

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations.
CoRR, 2024

BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5.
CoRR, 2024

Less is More: Accurate Speech Recognition & Translation without Web-Scale Data.
CoRR, 2024

DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment.
CoRR, 2024

Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment.
CoRR, 2024

Instruction Data Generation and Unsupervised Adaptation for Speech Language Models.
CoRR, 2024

Nemotron-4 340B Technical Report.
CoRR, 2024

Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter.
CoRR, 2024

Label-Looping: Highly Efficient Decoding for Transducers.
CoRR, 2024

RULER: What's the Real Context Size of Your Long-Context Language Models?
CoRR, 2024

SelfVC: Voice Conversion With Iterative Refinement using Self Transformations.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Transducers with Pronunciation-Aware Embeddings for Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Stateful Conformer with Cache-Based Inference for Streaming Automatic Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Investigating End-to-End ASR Architectures for Long Form Audio Transcription.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

SALM: Speech-Augmented Language Model with in-Context Learning for Speech Recognition and Translation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

Multilingual Audio-Visual Speech Recognition with Hybrid CTC/RNN-T Fast Conformer.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

A Chat about Boring Problems: Studying GPT-Based Text Normalization.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

2023
The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System.
CoRR, 2023

Property-Aware Multi-Speaker Data Simulation: A Probabilistic Modelling Technique for Synthetic Data Generation.
CoRR, 2023

Towards training Bilingual and Code-Switched Speech Recognition models from Monolingual data sources.
CoRR, 2023

Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition.
CoRR, 2023

Flexible Multichannel Speech Enhancement for Noise-Robust Frontend.
Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2023

NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2023.
Proceedings of the 20th International Conference on Spoken Language Translation, 2023

NeMo Forced Aligner and its application to word alignment for subtitle generation.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

A Compact End-to-End Model with Local and Global Context for Spoken Language Identification.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Leveraging Pretrained ASR Encoders for Effective and Efficient End-to-End Speech Intent Classification and Slot Filling.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Adapter-Based Extension of Multi-Speaker Text-To-Speech Model for New Speakers.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Confidence-based Ensembles of End-to-End Speech Recognition Models.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Efficient Sequence Transduction by Jointly Predicting Tokens and Durations.
Proceedings of the International Conference on Machine Learning, 2023

BigVGAN: A Universal Neural Vocoder with Large-Scale Training.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Conformer-Based Target-Speaker Automatic Speech Recognition For Single-Channel Audio.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Multi-Blank Transducers for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Powerful and Extensible WFST Framework for RNN-Transducer Losses.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

ACE-VC: Adaptive and Controllable Voice Conversion Using Explicitly Disentangled Self-Supervised Speech Representations.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Vani: Very-Lightweight Accent-Controllable TTS for Native And Non-Native Speakers With Identity Preservation.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

Fast Conformer With Linearly Scalable Attention For Efficient Speech Recognition.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

LibriSpeech-PC: Benchmark for Evaluation of Punctuation and Capitalization Capabilities of End-to-End ASR Models.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
AmberNet: A Compact End-to-End Model for Spoken Language Identification.
CoRR, 2022

Damage Control During Domain Adaptation for Transducer Based Automatic Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-to-End Automatic Speech Recognition.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

NeMo Open Source Speaker Diarization System.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Multi-scale Speaker Diarization with Dynamic Scale Weighting.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

CTC Variations Through New WFST Topologies.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Shallow Fusion of Weighted Finite-State Transducer and Language Model for Text Normalization.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Thutmose Tagger: Single-pass neural model for Inverse Text Normalization.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Mixer-TTS: Non-Autoregressive, Fast and Compact Text-to-Speech Model Conditioned on Language Model Embeddings.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

TitaNet: Neural Model for Speaker Representation with 1D Depth-Wise Separable Convolutions and Global Context.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2022

2021
OpenChem: A Deep Learning Toolkit for Computational Chemistry and Drug Design.
J. Chem. Inf. Model., 2021

Adapting TTS models For New Speakers using Transfer Learning.
CoRR, 2021

A Unified Transformer-based Framework for Duplex Text Normalization.
CoRR, 2021

CarneliNet: Neural Mixture Model for Automatic Speech Recognition.
CoRR, 2021

SGD-QA: Fast Schema-Guided Dialogue State Tracking for Unseen Services.
CoRR, 2021

TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction.
CoRR, 2021

NeMo Toolbox for Speech Dataset Construction.
CoRR, 2021

A Toolbox for Construction and Analysis of Speech Datasets.
Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

NeMo Inverse Text Normalization: From Development to Production.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

NeMo (Inverse) Text Normalization: From Development to Production.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

SPGISpeech: 5,000 Hours of Transcribed Financial Audio for Fully Formatted End-to-End Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

TalkNet: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Hi-Fi Multi-Speaker English TTS Dataset.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Cross-Language Transfer Learning and Domain Adaptation for End-to-End Automatic Speech Recognition.
Proceedings of the 2021 IEEE International Conference on Multimedia and Expo, 2021

MarbleNet: Deep 1D Time-Channel Separable Convolutional Neural Network for Voice Activity Detection.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2021

2020
On regularization of gradient descent, layer imbalance and flat minima.
CoRR, 2020

MatchboxNet: 1D Time-Channel Separable Convolutional Neural Network Architecture for Speech Commands Recognition.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

Correction of Automatic Speech Recognition with Transformer Sequence-To-Sequence Model.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019
NeMo: a toolkit for building AI applications using Neural Modules.
CoRR, 2019

Stochastic Gradient Methods with Layer-wise Adaptive Moments for Training of Deep Networks.
CoRR, 2019

Jasper: An End-to-End Convolutional Neural Acoustic Model.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2018
Training Neural Speech Recognition Systems with Synthetic Speech Augmentation.
CoRR, 2018

OpenSeq2Seq: extensible toolkit for distributed and mixed precision training of sequence-to-sequence models.
CoRR, 2018

Computational mammography using deep neural networks.
Comput. methods Biomech. Biomed. Eng. Imaging Vis., 2018

Mixed Precision Training.
Proceedings of the 6th International Conference on Learning Representations, 2018

Spatially Parallel Convolutions.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification.
CoRR, 2017

Scaling SGD Batch Size to 32K for ImageNet Training.
CoRR, 2017

Training Deep AutoEncoders for Collaborative Filtering.
CoRR, 2017

On Improving the Numerical Stability of Winograd Convolutions.
Proceedings of the 5th International Conference on Learning Representations, 2017

Factorization tricks for LSTM networks.
Proceedings of the 5th International Conference on Learning Representations, 2017

2016
SEBOOST - Boosting Stochastic Learning Using Subspace Optimization Techniques.
Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, 2016

2002
The ForSpec Temporal Logic: A New Temporal Property-Specification Language.
Proceedings of the Tools and Algorithms for the Construction and Analysis of Systems, 2002