Yangyang Shi

ORCID: 0000-0001-5297-4155

According to our database, Yangyang Shi authored at least 93 papers between 2011 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

Self-Calibration Method of Displacement Sensor in AMB-Rotor System Based on Magnetic Bearing Current Control.

IEEE Trans. Ind. Electron., May, 2024

Agent-as-a-Judge: Evaluate Agents with Agents.

Changsheng Zhao, Dylan R. Ashley, Dmitrii Khizbullin, Raghuraman Krishnamoorthi, Jürgen Schmidhuber

CoRR, 2024

High Fidelity Text-Guided Music Generation and Editing via Single-Stage Flow Matching.

Sidd Srinivasan

CoRR, 2024

Speech ReaLLM - Real-time Streaming Speech Recognition with Multimodal LLMs by Teaching the Flow of Time.

CoRR, 2024

Basis Selection: Low-Rank Decomposition of Pretrained Large Language Models for Target Applications.

Changsheng Zhao

CoRR, 2024

Not All Weights Are Created Equal: Enhancing Energy Efficiency in On-Device Streaming Speech Recognition.

Changsheng Zhao

CoRR, 2024

FADI-AEC: Fast Score Based Diffusion Model Guided by Far-end Signal for Acoustic Echo Cancellation.

CoRR, 2024

StegoType: Surface Typing from Egocentric Cameras.

Mark Richardson, Bradford J. Snow

Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 2024

Foleygen: Visually-Guided Audio Generation.

Proceedings of the 34th IEEE International Workshop on Machine Learning for Signal Processing, 2024

Characterizing the Histology Spatial Intersections Between Tumor-Infiltrating Lymphocytes and Tumors for Survival Prediction of Cancers Via Graph Contrastive Learning.

Proceedings of the Machine Learning in Medical Imaging - 15th International Workshop, 2024

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases.

Changsheng Zhao, Forrest N. Iandola, Raghuraman Krishnamoorthi

Proceedings of the Forty-first International Conference on Machine Learning, 2024

Folding Attention: Memory and Power Optimization for On-Device Transformer-Based Streaming Speech Recognition.

Forrest N. Iandola

Proceedings of the IEEE International Conference on Acoustics, 2024

Stack-and-Delay: A New Codebook Pattern for Music Generation.

Forrest N. Iandola

Proceedings of the IEEE International Conference on Acoustics, 2024

On the Open Prompt Challenge in Conditional Audio Generation.

Sidd Srinivasan, Forrest N. Iandola, Changsheng Zhao

Proceedings of the IEEE International Conference on Acoustics, 2024

In-Context Prompt Editing for Conditional Audio Generation.

Sidd Srinivasan, Forrest N. Iandola

Proceedings of the IEEE International Conference on Acoustics, 2024

Scheduled Execution-Based Binary Indirect Call Targets Refinement.

Proceedings of the Computer Security - ESORICS 2024, 2024

Scaling Parameter-Constrained Language Models with Quality Data.

Matteo Paltenghi, Changsheng Zhao, Rastislav Rabatin

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: EMNLP 2024, 2024

Target-Aware Language Modeling via Granular Data Sampling.

Changsheng Zhao, Rastislav Rabatin

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Tumor Micro-Environment Interactions Guided Graph Learning for Survival Analysis of Human Cancers from Whole-Slide Pathological Images.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

LLM-QAT: Data-Free Quantization Aware Training for Large Language Models.

Changsheng Zhao, Raghuraman Krishnamoorthi

Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023

Model Reference Adaptive Compensation and Robust Controller for Magnetic Bearing Systems With Strong Persistent Disturbances.

IEEE Trans. Ind. Electron., November, 2023

Characterizing the Survival-Associated Interactions Between Tumor-Infiltrating Lymphocytes and Tumors From Pathological Images and Multi-Omics Data.

IEEE Trans. Medical Imaging, October, 2023

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch.

CoRR, 2023

Enhance audio generation controllability through representation similarity regularization.

Forrest N. Iandola

CoRR, 2023

Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition.

Forrest N. Iandola

CoRR, 2023

DISGO: Automatic End-to-End Evaluation for Scene Text OCR.

Ankit Ramchandani, Praveen Krishnan

CoRR, 2023

Biased Self-supervised Learning for ASR.

Florian L. Kreyssig, Abdel-rahman Mohamed, Philip C. Woodland

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Multi-Head State Space Model for Speech Recognition.

Yassir Fathullah, Mark J. F. Gales

Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

SCA: Streaming Cross-Attention Alignment For Echo Cancellation.

Kaustubh Kalgaonkar, Sriram Srinivasan

Proceedings of the IEEE International Conference on Acoustics, 2023

Improving fast-slow Encoder based Transducer with Streaming Deliberation.

Michael L. Seltzer

Proceedings of the IEEE International Conference on Acoustics, 2023

TorchAudio 2.1: Advancing Speech Recognition, Self-Supervised Learning, and Audio Processing Components for Pytorch.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

Towards Zero-Shot Multilingual Transfer for Code-Switched Responses.

Changsheng Zhao, Biing-Hwang Juang

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Binary and Ternary Natural Language Generation.

Raghuraman Krishnamoorthi

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Revisiting Sample Size Determination in Natural Language Understanding.

Muhammad Hassan Rashid, Changsheng Zhao

Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022

Position Extraction of Ultralow-Speed Gimbal Servo System With Linear Hall Sensors.

IEEE Trans. Ind. Electron., 2022

Synergistic Digital Twin and Holographic Augmented-Reality-Guided Percutaneous Puncture of Respiratory Liver Tumor.

IEEE Trans. Hum. Mach. Syst., 2022

LiCo-Net: Linearized Convolution Network for Hardware-efficient Keyword Spotting.

Raghuraman Krishnamoorthi

CoRR, 2022

SCA: Streaming Cross-attention Alignment for Echo Cancellation.

Kaustubh Kalgaonkar, Sriram Srinivasan

CoRR, 2022

Learning a Dual-Mode Speech Recognition Model via Self-Pruning.

Raghuraman Krishnamoorthi

Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Streaming parallel transducer beam search with fast slow cascaded encoders.

Michael L. Seltzer

Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

Streaming Transformer Transducer based Speech Recognition Using Non-Causal Convolution.

Proceedings of the IEEE International Conference on Acoustics, 2022

Gadgets Splicing: Dynamic Binary Transformation for Precise Rewriting.

Proceedings of the IEEE/ACM International Symposium on Code Generation and Optimization, 2022

2021

TorchAudio: Building Blocks for Audio and Speech Processing.

CoRR, 2021

Transferring Voice Knowledge for Acoustic Event Detection: An Empirical Study.

CoRR, 2021

Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios.

Christian Fuegen, Michael L. Seltzer

CoRR, 2021

A multiple-relaxation-time collision model by Hermite expansion.

CoRR, 2021

Versatile multi-constrained planning for thermal ablation of large liver tumors.

Michael Weinmann

Comput. Medical Imaging Graph., 2021

Streaming Attention-Based Models with Augmented Memory for End-To-End Speech Recognition.

Michael L. Seltzer

Proceedings of the IEEE Spoken Language Technology Workshop, 2021

Internal Motion Estimation during Free-Breathing via External/Internal Correlation Model.

Proceedings of the IEEE International Conference on Real-time Computing and Robotics, 2021

Transformer-Based Acoustic Modeling for Streaming Speech Synthesis.

Christian Fuegen

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Dynamic Encoder Transducer: A Flexible Solution for Trading Off Accuracy for Latency.

Rohit Prabhavalkar, Christian Fuegen, Michael L. Seltzer

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Dissecting User-Perceived Latency of On-Device E2E Speech Recognition.

Rohit Prabhavalkar, Christian Fuegen, Michael L. Seltzer

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Collaborative Training of Acoustic Encoders for Speech Recognition.

Ganesh Venkatesh, Michael L. Seltzer

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Flexi-Transducer: Optimizing Latency, Accuracy and Compute for Multi-Domain On-Device Scenarios.

Christian Fuegen, Michael L. Seltzer

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion.

Christian Fuegen, Michael L. Seltzer

Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Transformer in Action: A Comparative Study of Transformer-Based Acoustic Models for Large Scale Speech Recognition Applications.

Proceedings of the IEEE International Conference on Acoustics, 2021

Emformer: Efficient Memory Transformer Based Acoustic Model for Low Latency Streaming Speech Recognition.

Proceedings of the IEEE International Conference on Acoustics, 2021

On Lattice-Free Boosted MMI Training of HMM and CTC-Based Full-Context ASR Models.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

2020

Emformer: Efficient Memory Transformer Based Acoustic Model For Low Latency Streaming Speech Recognition.

Michael L. Seltzer

CoRR, 2020

Incorporating Android Code Smells into Java Static Code Metrics for Security Risk Prediction of Android Applications.

Proceedings of the 20th IEEE International Conference on Software Quality, 2020

Functional code clone detection with syntax and semantics fusion learning.

Proceedings of the ISSTA '20: 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2020

Streaming Transformer-Based Acoustic Models Using Self-Attention with Augmented Memory.

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Weak-Attention Suppression for Transformer Based Speech Recognition.

Christian Fuegen, Michael L. Seltzer

Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Mining Effective Negative Training Samples for Keyword Spotting.

Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

2019

Region Proposal Network Based Small-Footprint Keyword Spotting.

IEEE Signal Process. Lett., 2019

Knowledge Distillation for Recurrent Neural Network Language Modeling with Trust Regularization.

Proceedings of the IEEE International Conference on Acoustics, 2019

End-to-end Speech Recognition Using a High Rank LSTM-CTC Based Model.

Proceedings of the IEEE International Conference on Acoustics, 2019

2018

A review of "linear programming computation" by Ping-Qi Pan.

Eur. J. Oper. Res., 2018

Robust Control for a Magnetically Suspended Control Moment Gyro with Strong Gyroscopic Effects.

Proceedings of the IECON 2018, 2018

2017

Ultra-lightweight Block Cipher Algorithm (PFP) Based on Feistel Structure.

Computer Science (计算机科学), 2017

2016

Deep LSTM based Feature Mapping for Query Classification.

Proceedings of the NAACL HLT 2016, 2016

Recurrent Support Vector Machines For Slot Tagging In Spoken Language Understanding.

Proceedings of the NAACL HLT 2016, 2016

2015

Integrating meta-information into recurrent neural network language models.

Martha A. Larson, Catholijn M. Jonker, Patrick Wambacq

Speech Commun., 2015

Recurrent neural network language model adaptation with curriculum learning.

Martha A. Larson, Catholijn M. Jonker

Comput. Speech Lang., 2015

RNN-based labeled data generation for spoken language understanding.

Proceedings of the 16th Annual Conference of the International Speech Communication Association, 2015

Contextual spoken language understanding using recurrent neural networks.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

A factorization network based method for multi-lingual domain classification.

Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Semi-supervised slot tagging in spoken language understanding using recurrent transductive support vector machines.

Proceedings of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding, 2015

2014

Spoken language understanding using long short-term memory neural networks.

Proceedings of the 2014 IEEE Spoken Language Technology Workshop, 2014

Cluster based Chinese abbreviation modeling.

Proceedings of the 15th Annual Conference of the International Speech Communication Association, 2014

2013

Classifying the socio-situational settings of transcripts of spoken discourses.

Catholijn M. Jonker

Speech Commun., 2013

K-Component Adaptive Recurrent Neural Network Language Models.

Martha A. Larson, Catholijn M. Jonker

Proceedings of the Text, Speech, and Dialogue - 16th International Conference, 2013

Recurrent neural networks for language understanding.

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Exploiting the succeeding words in recurrent neural network language models.

Martha A. Larson, Catholijn M. Jonker

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

Speed up of recurrent neural network language models with sentence independent subsampling stochastic gradient descent.

Martha A. Larson

Proceedings of the 14th Annual Conference of the International Speech Communication Association, 2013

K-component recurrent neural network language models using curriculum learning.

Martha A. Larson, Catholijn M. Jonker

Proceedings of the 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 2013

2012

Adaptive Language Modeling with a Set of Domain Dependent Models.

Catholijn M. Jonker

Proceedings of the Text, Speech and Dialogue - 15th International Conference, 2012

TUD at MediaEval 2012 genre tagging task: Multi-modality video categorization with one-vs-all classifiers.

Martha A. Larson

Proceedings of the Working Notes Proceedings of the MediaEval 2012 Workshop, 2012

MediaEval 2012 Tagging Task: Prediction based on One Best List and Confusion Networks.

Martha A. Larson, Catholijn M. Jonker

Proceedings of the Working Notes Proceedings of the MediaEval 2012 Workshop, 2012

Towards Recurrent Neural Networks Language Models with Linguistic and Contextual Features.

Catholijn M. Jonker

Proceedings of the 13th Annual Conference of the International Speech Communication Association, 2012

Dynamic Bayesian socio-situational setting classification.

Catholijn M. Jonker

Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

2011

Combining Topic Specific Language Models.

Catholijn M. Jonker

Proceedings of the Text, Speech and Dialogue - 14th International Conference, 2011

Socio-situational setting classification based on language use.

Catholijn M. Jonker

Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, 2011
