Cong Liu

ORCID: 0009-0003-0328-423X

Affiliations:
  • IFLYTEK Research, Hefei, China
  • Microsoft Research Asia, Beijing, China (former)
  • University of Science and Technology of China, Hefei, China (former)


According to our database, Cong Liu authored at least 63 papers between 2006 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2025
VGTS: Visually Guided Text Spotting for novel categories in historical manuscripts.
Expert Syst. Appl., 2025

2024
Multi-Dimensional Medical Image Fusion With Complex Sparse Representation.
IEEE Trans. Biomed. Eng., September, 2024

Weakly supervised scene text generation for low-resource languages.
Expert Syst. Appl., March, 2024

Dynamic facial expression recognition with pseudo-label guided multi-modal pre-training.
IET Comput. Vis., February, 2024

Syntax-Augmented Hierarchical Interactive Encoder for Zero-Shot Cross-Lingual Information Extraction.
IEEE ACM Trans. Audio Speech Lang. Process., 2024

SEMv2: Table separation line detection based on instance segmentation.
Pattern Recognit., 2024

NDOrder: Exploring a novel decoding order for scene text recognition.
Expert Syst. Appl., 2024

EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion.
CoRR, 2024

Cross-modulated Attention Transformer for RGBT Tracking.
CoRR, 2024

Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation.
CoRR, 2024

1DFormer: A Transformer Architecture Learning 1D Landmark Representations for Facial Landmark Tracking.
Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, 2024

ICDAR 2024 Competition on Recognition of Chemical Structures.
Proceedings of the Document Analysis and Recognition - ICDAR 2024 - 18th International Conference, Athens, Greece, August 30, 2024

NAMER: Non-autoregressive Modeling for Handwritten Mathematical Expression Recognition.
Proceedings of the Computer Vision - ECCV 2024, 2024

Image as a Language: Revisiting Scene Text Recognition via Balanced, Unified and Synchronized Vision-Language Reasoning Network.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Generative Input: Towards Next-Generation Input Methods Paradigm.
CoRR, 2023

1DFormer: Learning 1D Landmark Representations via Transformer for Facial Landmark Tracking.
CoRR, 2023

Untying the Reversal Curse via Bidirectional Language Model Editing.
CoRR, 2023

Exploring Part-Informed Visual-Language Learning for Person Re-Identification.
CoRR, 2023

MADNet: Maximizing Addressee Deduction Expectation for Multi-Party Conversation Generation.
CoRR, 2023

SHINE: Syntax-augmented Hierarchical Interactive Encoder for Zero-shot Cross-lingual Information Extraction.
CoRR, 2023

OTS: A One-shot Learning Approach for Text Spotting in Historical Manuscripts.
CoRR, 2023

SEMv2: Table Separation Line Detection Based on Conditional Convolution.
CoRR, 2023

X-Adv: Physical Adversarial Object Attacks against X-ray Prohibited Item Detection.
Proceedings of the 32nd USENIX Security Symposium, 2023

Handwritten Chemical Structure Image to Structure-Specific Markup Using Random Conditional Guided Decoder.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

End-to-End Multilingual Text Recognition Based on Byte Modeling.
Proceedings of the Image and Graphics - 12th International Conference, 2023

A Multimodal Text Block Segmentation Framework for Photo Translation.
Proceedings of the Image and Graphics - 12th International Conference, 2023

Speech4Mesh: Speech-Assisted Monocular 3D Facial Reconstruction for Speech-Driven 3D Facial Animation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Self-Supervised Audio-Visual Speech Representations Learning by Multimodal Self-Distillation.
Proceedings of the IEEE International Conference on Acoustics, 2023

The Multimodal Information Based Speech Processing (Misp) 2022 Challenge: Audio-Visual Diarization And Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2023

Summary on the Multimodal Information Based Speech Processing (MISP) 2022 Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2023

MADNet: Maximizing Addressee Deduction Expectation for Multi-Party Conversation Generation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Bi-LRFusion: Bi-Directional LiDAR-Radar Fusion for 3D Dynamic Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

GIFT: Graph-Induced Fine-Tuning for Multi-Party Conversation Understanding.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

HRDoc: Dataset and Baseline Method toward Hierarchical Reconstruction of Document Structures.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
RCIT: An RSVP-Based Concealed Information Test Framework Using EEG Signals.
IEEE Trans. Cogn. Dev. Syst., 2022

AFA: adversarial frequency alignment for domain generalized lung nodule detection.
Neural Comput. Appl., 2022

InterHT: Knowledge Graph Embeddings by Interaction between Head and Tail Entities.
CoRR, 2022

Few-shot X-ray Prohibited Item Detection: A Benchmark and Weak-feature Enhancement Network.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

The First Multimodal Information Based Speech Processing (Misp) Challenge: Data, Tasks, Baselines And Results.
Proceedings of the IEEE International Conference on Acoustics, 2022

Wider & Closer: Mixture of Short-channel Distillers for Zero-shot Cross-lingual Named Entity Recognition.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

2021
Generative domain adaptation for chest X-ray image analysis.
IET Image Process., 2021

A Deep Analysis of Speech Separation Guided Diarization Under Realistic Conditions.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2021

2018
Fast and Robust Detection of Anatomical Landmarks Using Cascaded 3D Convolutional Networks Guided by Linear Square Regression.
Proceedings of the Biometric Recognition - 13th Chinese Conference, 2018

2017
Nonrecurrent Neural Structure for Long-Term Dependence.
IEEE ACM Trans. Audio Speech Lang. Process., 2017

2016
Joint training of DNNs by incorporating an explicit dereverberation structure for distant speech recognition.
EURASIP J. Adv. Signal Process., 2016

2015
Feedforward Sequential Memory Networks: A New Structure to Learn Long-term Dependency.
CoRR, 2015

Multi-task deep neural network acoustic models with model adaptation using discriminative speaker identity for whisper recognition.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Improving Deep Neural Network Based Speech Enhancement in Low SNR Environments.
Proceedings of the Latent Variable Analysis and Signal Separation, 2015

A unified speaker-dependent speech separation and enhancement system based on deep neural networks.
Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing, 2015

2013
A cluster-based multiple deep neural networks method for large vocabulary continuous speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

Incoherent training of deep neural networks to de-correlate bottleneck features for speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2013

2012
Investigation of deep neural networks (DNN) for large vocabulary continuous speech recognition: Why DNN surpasses GMMS in acoustic modeling.
Proceedings of the 8th International Symposium on Chinese Spoken Language Processing, 2012

2011
Trust Region-Based Optimization for Maximum Mutual Information Estimation of HMMs in Speech Recognition.
IEEE ACM Trans. Audio Speech Lang. Process., 2011

2010
Phonetic clustering based confidence measure for embedded speech recognition.
Proceedings of the 7th International Symposium on Chinese Spoken Language Processing, 2010

A bounded trust region optimization for discriminative training of HMMS in speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2010

2009
A trust region based optimization for maximum mutual information estimation of HMMS in speech recognition.
Proceedings of the IEEE International Conference on Acoustics, 2009

2008
A Constrained Line Search Optimization Method for Discriminative Training of HMMs.
IEEE Trans. Speech Audio Process., 2008

Exploiting Non-Target Region Information for Confidence Measure Based on Bayesian Information Criterion.
Proceedings of the 6th International Symposium on Chinese Spoken Language Processing, 2008

2007
A Constrained Line Search Optimization for Discriminative Training in Speech Recognition.
Proceedings of the IEEE International Conference on Acoustics, 2007

A constrained line search approach to general discriminative HMM training.
Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, 2007

2006
A Comparative Study on Confidence Measure in Mandarin Command Word Recognition.
Proceedings of the 5th International Symposium on Chinese Spoken Language Processing, 2006

