2025
OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning.
CoRR, January, 2025
2024
Asymmetric Deformable Spatio-temporal Framework for Infrared Object Tracking.
ACM Trans. Multim. Comput. Commun. Appl., October, 2024
SYRER: Synergistic Relational Reasoning for RGB-D Cross-Modal Re-Identification.
IEEE Trans. Multim., 2024
A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding.
CoRR, 2024
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.
CoRR, 2024
MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.
CoRR, 2024
TextSquare: Scaling up Text-Centric Visual Instruction Tuning.
CoRR, 2024
Harmonizing Visual Text Comprehension and Generation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Prompt-Enhanced Software Vulnerability Detection Using ChatGPT.
Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, 2024
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding.
CoRR, 2023
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration.
CoRR, 2023
RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring.
Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023
Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
TaCo: Textual Attribute Recognition via Contrastive Learning.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
The Devil Is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-training.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023
2022
The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training.
CoRR, 2022
GMN: Generative Multi-modal Network for Practical Document Information Extraction.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
Query-driven Generative Network for Document Information Extraction in the Wild.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022
Knowledge Mining with Scene Text for Fine-Grained Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Neural Collaborative Graph Machines for Table Structure Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022
Perceiving Stroke-Semantic Context: Hierarchical Contrastive Learning for Robust Scene Text Recognition.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022
2021
Show, Read and Reason: Table Structure Recognition with Flexible Context Aggregator.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
RecycleNet: An Overlapped Text Instance Recovery Approach.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
2020
Person Attribute Recognition by Sequence Contextual Relation Learning.
IEEE Trans. Circuits Syst. Video Technol., 2020
PuzzleNet: Scene Text Detection by Segment Context Graph Learning.
CoRR, 2020
Accurate Structured-Text Spotting for Arithmetical Exercise Correction.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2019
Independent metric learning with aligned multi-part features for video-based person re-identification.
Multim. Tools Appl., 2019
Deep feature representation and multiple metric ensembles for person re-identification in security surveillance system.
Multim. Tools Appl., 2019
Local region partition for person re-identification.
Multim. Tools Appl., 2019
2018
Video-Based Person Re-Identification With Accumulative Motion Context.
IEEE Trans. Circuits Syst. Video Technol., 2018
Sequence-based Person Attribute Recognition with Joint CTC-Attention Model.
CoRR, 2018
Video-Based Person Re-identification with Adaptive Multi-part Features Learning.
Proceedings of the Advances in Multimedia Information Processing - PCM 2018, 2018
Multi-View Image Generation from a Single-View.
Proceedings of the 2018 ACM Multimedia Conference, 2018
2017
End-to-End Comparative Attention Networks for Person Re-Identification.
IEEE Trans. Image Process., 2017
Multi-View Image Generation from a Single-View.
CoRR, 2017
Neural Person Search Machines.
Proceedings of the IEEE International Conference on Computer Vision, 2017
2016
Robust Face Recognition with Deep Multi-View Representation Learning.
Proceedings of the 2016 ACM Conference on Multimedia, 2016
2015
Kernelized Relaxed Margin Components Analysis for Person Re-identification.
IEEE Signal Process. Lett., 2015
2014
Non-linear metric learning with multiple features for person re-identification.
Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014