2025
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning.
CoRR, January, 2025

2024
Asymmetric Deformable Spatio-temporal Framework for Infrared Object Tracking.
ACM Trans. Multim. Comput. Commun. Appl., October, 2024

SYRER: Synergistic Relational Reasoning for RGB-D Cross-Modal Re-Identification.
IEEE Trans. Multim., 2024

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding.
CoRR, 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.
CoRR, 2024

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.
CoRR, 2024

TextSquare: Scaling up Text-Centric Visual Instruction Tuning.
CoRR, 2024

Harmonizing Visual Text Comprehension and Generation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Prompt-Enhanced Software Vulnerability Detection Using ChatGPT.
Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, 2024

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding.
CoRR, 2023

Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration.
CoRR, 2023

RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring.
Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023

Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

TaCo: Textual Attribute Recognition via Contrastive Learning.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

The Devil Is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-training.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training.
CoRR, 2022

GMN: Generative Multi-modal Network for Practical Document Information Extraction.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Query-driven Generative Network for Document Information Extraction in the Wild.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Knowledge Mining with Scene Text for Fine-Grained Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Neural Collaborative Graph Machines for Table Structure Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Perceiving Stroke-Semantic Context: Hierarchical Contrastive Learning for Robust Scene Text Recognition.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
Show, Read and Reason: Table Structure Recognition with Flexible Context Aggregator.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

RecycleNet: An Overlapped Text Instance Recovery Approach.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

2020
Person Attribute Recognition by Sequence Contextual Relation Learning.
IEEE Trans. Circuits Syst. Video Technol., 2020

PuzzleNet: Scene Text Detection by Segment Context Graph Learning.
CoRR, 2020

Accurate Structured-Text Spotting for Arithmetical Exercise Correction.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Independent metric learning with aligned multi-part features for video-based person re-identification.
Multim. Tools Appl., 2019

Deep feature representation and multiple metric ensembles for person re-identification in security surveillance system.
Multim. Tools Appl., 2019

Local region partition for person re-identification.
Multim. Tools Appl., 2019

2018
Video-Based Person Re-Identification With Accumulative Motion Context.
IEEE Trans. Circuits Syst. Video Technol., 2018

Sequence-based Person Attribute Recognition with Joint CTC-Attention Model.
CoRR, 2018

Video-Based Person Re-identification with Adaptive Multi-part Features Learning.
Proceedings of the Advances in Multimedia Information Processing - PCM 2018, 2018

Multi-View Image Generation from a Single-View.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

2017
End-to-End Comparative Attention Networks for Person Re-Identification.
IEEE Trans. Image Process., 2017

Multi-View Image Generation from a Single-View.
CoRR, 2017

Neural Person Search Machines.
Proceedings of the IEEE International Conference on Computer Vision, 2017

2016
Robust Face Recognition with Deep Multi-View Representation Learning.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

2015
Kernelized Relaxed Margin Components Analysis for Person Re-identification.
IEEE Signal Process. Lett., 2015

2014
Non-linear metric learning with multiple features for person re-identification.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014