Zhijian Ou

Orcid: 0000-0002-9018-5074

  • Tsinghua University, China

According to our database, Zhijian Ou authored at least 90 papers between 2001 and 2024.

Energy-Based Models with Applications to Speech and Language Processing.
Found. Trends Signal Process., 2024

An Empirical Study of Retrieval Augmented Generation with Chain-of-Thought.
CoRR, 2024

Low-Resourced Speech Recognition for Iu Mien Language via Weakly-Supervised Phoneme-based Multilingual Pre-training.
CoRR, 2024

CUSIDE-T: Chunking, Simulating Future and Decoding for Transducer based Streaming ASR.
CoRR, 2024

Whistle: Data-Efficient Multilingual and Crosslingual Speech Recognition via Weakly Phonetic Supervision.
CoRR, 2024

The 2nd FutureDial Challenge: Dialog Systems with Retrieval Augmented Generation (FutureDial-RAG).
CoRR, 2024

UniPCM: Universal Pre-trained Conversation Model with Task-aware Automatic Prompt.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Variational Latent-State GPT for Semi-Supervised Task-Oriented Dialog Systems.
IEEE ACM Trans. Audio Speech Lang. Process., 2023

Persistently Trained, Diffusion-assisted Energy-based Models.
CoRR, 2023

Callee: Recovering Call Graphs for Binaries with Transfer and Contrastive Learning.
Proceedings of the 44th IEEE Symposium on Security and Privacy, 2023

Exploring Energy-based Language Models with Different Architectures and Training Methods for Speech Recognition.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Knowledge-Retrieval Task-Oriented Dialog Systems with Semi-Supervision.
Proceedings of the 24th Annual Conference of the International Speech Communication Association, 2023

Prompt Pool Based Class-Incremental Continual Learning for Dialog State Tracking.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

A Generative User Simulator with GPT-based Architecture and Goal State Tracking for Reinforced Multi-Domain Dialog Systems.
CoRR, 2022

Jointly Reinforced User Simulator and Task-oriented Dialog System with Simplified Generative Architecture.
CoRR, 2022

Information Extraction and Human-Robot Dialogue towards Real-life Tasks: A Baseline Study with the MobileCS Dataset.
CoRR, 2022

A Challenge on Semi-Supervised and Reinforced Task-Oriented Dialog Systems.
CoRR, 2022

Revisiting Markovian Generative Architectures for Efficient Task-Oriented Dialog Systems.
CoRR, 2022

Exploiting Single-Channel Speech for Multi-Channel End-to-End Speech Recognition: A Comparative Study.
CoRR, 2022

Building Markovian Generative Architectures Over Pretrained LM Backbones for Efficient Task-Oriented Dialog Systems.
Proceedings of the IEEE Spoken Language Technology Workshop, 2022

Advancing Semi-Supervised Task Oriented Dialog Systems by JSA Learning of Discrete Latent Variable Models.
Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, 2022

Exploiting Single-Channel Speech for Multi-Channel End-to-End Speech Recognition: A Comparative Study.
Proceedings of the 13th International Symposium on Chinese Spoken Language Processing, 2022

An Empirical Study of Language Model Integration for Transducer based Speech Recognition.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

CUSIDE: Chunking, Simulating Future Context and Decoding for Streaming ASR.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

iCallee: Recovering Call Graphs for Binaries.
CoRR, 2021

Advancing CTC-CRF Based End-to-End Speech Recognition with Wordpieces and Conformers.
CoRR, 2021

Efficient Neural Architecture Search for End-to-End Speech Recognition Via Straight-Through Gradients.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

The SLT 2021 Children Speech Recognition Challenge: Open Datasets, Rules and Baselines.
Proceedings of the IEEE Spoken Language Technology Workshop, 2021

An Empirical Comparison of Joint-Training and Pre-Training for Domain-Agnostic Semi-Supervised Learning Via Energy-Based Models.
Proceedings of the 2021 IEEE 31st International Workshop on Machine Learning for Signal Processing (MLSP), 2021

Deformable TDNN with Adaptive Receptive Fields for Speech Recognition.
Proceedings of the 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30, 2021

Multilingual and Crosslingual Speech Recognition Using Phonological-Vector Based Phone Embeddings.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2021

Semi-Supervised Seq2seq Joint-Stochastic-Approximation Autoencoders With Applications to Semantic Parsing.
IEEE Signal Process. Lett., 2020

An empirical study of domain-agnostic semi-supervised learning via energy-based models: joint-training and pre-training.
CoRR, 2020

A Probabilistic End-To-End Task-Oriented Dialog Model with Latent Belief States towards Semi-Supervised Learning.
CoRR, 2020

Joint Stochastic Approximation and Its Application to Learning Discrete Latent Variable Models.
Proceedings of the Thirty-Sixth Conference on Uncertainty in Artificial Intelligence, 2020

Improved Learning of Word Embeddings with Word Definitions and Semantic Injection.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

CAT: A CTC-CRF Based ASR Toolkit Bridging the Hybrid and the End-to-End Approaches Towards Data Efficiency and Low Latency.
Proceedings of the 21st Annual Conference of the International Speech Communication Association, 2020

Upgrading CRFs to JRFs and its Benefits to Sequence Modeling and Labeling.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Integrating Discrete and Neural Features Via Mixed-Feature Trans-Dimensional Random Field Language Models.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

A Probabilistic End-To-End Task-Oriented Dialog Model with Latent Belief States towards Semi-Supervised Learning.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Paraphrase Augmented Task-Oriented Dialog Generation.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

Task-Oriented Dialog Systems That Consider Multiple Appropriate Responses under the Same Context.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

CAT: CRF-based ASR Toolkit.
CoRR, 2019

CRF-based Single-stage Acoustic Modeling with CTC Topology.
Proceedings of the IEEE International Conference on Acoustics, 2019

Neural CRF Transducers for Sequence Labeling.
Proceedings of the IEEE International Conference on Acoustics, 2019

Learning Trans-Dimensional Random Fields with Applications to Language Modeling.
IEEE Trans. Pattern Anal. Mach. Intell., 2018

Elastic CRFs for Open-ontology Slot Filling.
CoRR, 2018

A Review of Learning with Deep Generative Models from perspective of graphical modeling.
CoRR, 2018

Learning Neural Random Fields with Inclusive Auxiliary Generators.
CoRR, 2018

Learning Sparse Structured Ensembles with SG-MCMC and Network Pruning.
CoRR, 2018

Improved Training Of Neural Trans-Dimensional Random field Language Models with Dynamic Noise-Contrastive Estimation.
Proceedings of the 2018 IEEE Spoken Language Technology Workshop, 2018

Learning Sparse Structured Ensembles with stochastic Gradient MCMC Sampling and Network Pruning.
Proceedings of the 28th IEEE International Workshop on Machine Learning for Signal Processing, 2018

Hybrid CTC-Attention based End-to-End Speech Recognition using Subword Units.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Angular Softmax Loss for End-to-end Speaker Verification.
Proceedings of the 11th International Symposium on Chinese Spoken Language Processing, 2018

Learning Neural Trans-Dimensional Random Field Language Models with Noise-Contrastive Estimation.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Tracking of Enriched Dialog States for Flexible Conversational Information Access.
Proceedings of the 2018 IEEE International Conference on Acoustics, 2018

Joint Bayesian Gaussian Discriminant Analysis for speaker verification.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Language modeling with neural trans-dimensional random fields.
Proceedings of the 2017 IEEE Automatic Speech Recognition and Understanding Workshop, 2017

Scalable Discovery of Audio Fingerprint Motifs in Broadcast Streams With Determinantal Point Process Based Motif Clustering.
IEEE ACM Trans. Audio Speech Lang. Process., 2016

Joint Stochastic Approximation learning of Helmholtz Machines.
CoRR, 2016

Improving and Scaling Trans-dimensional Random Field Language Models.
CoRR, 2016

Block-wise map inference for determinantal point processes with application to change-point detection.
Proceedings of the IEEE Statistical Signal Processing Workshop, 2016

Use of particle filtering and MCMC for inference in Probabilistic Acoustic Tube model.
Proceedings of the IEEE Statistical Signal Processing Workshop, 2016

Block-Wise MAP Inference for Determinantal Point Processes with Application to Change-Point Detection.
CoRR, 2015

Incorporating AM-FM effect in voiced speech for probabilistic acoustic tube model.
Proceedings of the 2015 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2015

Trans-dimensional Random Fields for Language Modeling.
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, 2015

Low-complexity video encoder for smart eyes based on underdetermined blind signal separation.
CoRR, 2014

Joint-character-POC N-gram language modeling for Chinese speech recognition.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014

Improvement of Probabilistic Acoustic Tube model for speech decomposition.
Proceedings of the IEEE International Conference on Acoustics, 2014

Probabilistic acoustic tube: a probabilistic generative model of speech for speech analysis/synthesis.
Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, 2012

CRF-based confidence measures of recognized candidates for lattice-based audio indexing.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Combining eigenvoice speaker modeling and VTS-based environment compensation for robust speech recognition.
Proceedings of the 2012 IEEE International Conference on Acoustics, 2012

Combining HMM-based melody extraction and NMF-based soft masking for separating voice and accompaniment from monaural audio.
Proceedings of the IEEE International Conference on Acoustics, 2011

Topic-weak-correlated Latent Dirichlet allocation.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

A study of large vocabulary speech recognition decoding using finite-state graphs.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

Spoken English assessment system for non-native speakers using acoustic and prosodic features.
Proceedings of the 11th Annual Conference of the International Speech Communication Association, 2010

Variational nonparametric Bayesian Hidden Markov Model.
Proceedings of the IEEE International Conference on Acoustics, 2010

Caption-aided speech detection in videos.
Proceedings of the IEEE International Conference on Acoustics, 2008

Closely Coupled Array Processing and Model-Based Compensation for Microphone Array Speech Recognition.
IEEE Trans. Speech Audio Process., 2007

Switching Auxiliary Chains for Speech Recognition.
IEEE Signal Process. Lett., 2007

Latent Correlation Analysis of HMM Parameters for Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2007

Generalized Time-Series Active Search With Kullback-Leibler Distance for Audio Fingerprinting.
IEEE Signal Process. Lett., 2006

Partial-tied-mixture Auxiliary Chain Models for Speech Recognition Based on Dynamic Bayesian Networks.
Proceedings of the IEEE International Conference on Systems, 2006

Switching Auxiliary Chains for Speech Recognition based on Dynamic Bayesian Networks.
Proceedings of the 18th International Conference on Pattern Recognition (ICPR 2006), 2006

Discriminative speaker adaptation with eigenvoices.
Proceedings of the 9th European Conference on Speech Communication and Technology, 2005

Closely Coupled Array Processing and Model-Based Compensation for Microphone Array Speech Recognition.
Proceedings of the 2005 IEEE International Conference on Acoustics, 2005

Discriminative combination of multiple linear predictions for speech recognition.
Proceedings of the 8th International Conference on Spoken Language Processing, 2004

A combined model of statics-dynamics of speech optimized using maximum mutual information.
Proceedings of the 7th International Conference on Spoken Language Processing, ICSLP2002, 2002

A new combined model of statics-dynamics of speech.
Proceedings of the IEEE International Conference on Acoustics, 2002

A new DP-like speaker clustering algorithm.
Proceedings of the EUROSPEECH 2001 Scandinavia, 2001
