2025

OCRBench v2: An Improved Benchmark for Evaluating Large Multimodal Models on Visual Text Localization and Reasoning.

Ling Fu

Biao Yang

CoRR, January, 2025

2024

Asymmetric Deformable Spatio-temporal Framework for Infrared Object Tracking.

ACM Trans. Multim. Comput. Commun. Appl., October, 2024

SYRER: Synergistic Relational Reasoning for RGB-D Cross-Modal Re-Identification.

IEEE Trans. Multim., 2024

A Bounding Box is Worth One Token: Interleaving Layout and Text in a Large Language Model for Document Understanding.

CoRR, 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.

CoRR, 2024

MTVQA: Benchmarking Multilingual Text-Centric Visual Question Answering.

Mohamad Fitri Faiz Bin Mahmood

CoRR, 2024

TextSquare: Scaling up Text-Centric Visual Instruction Tuning.

CoRR, 2024

Harmonizing Visual Text Comprehension and Generation.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy.

Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Prompt-Enhanced Software Vulnerability Detection Using ChatGPT.

Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings, 2024

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Grab What You Need: Rethinking Complex Table Structure Recognition with Flexible Components Deliberation.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding.

CoRR, 2023

Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration.

CoRR, 2023

RefBERT: A Two-Stage Pre-trained Framework for Automatic Rename Refactoring.

Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023

Locate Then Generate: Bridging Vision and Language with Bounding Box for Scene-Text VQA.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

TaCo: Textual Attribute Recognition via Contrastive Learning.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

The Devil Is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-training.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

The Devil is in the Frequency: Geminated Gestalt Autoencoder for Self-Supervised Visual Pre-Training.

CoRR, 2022

GMN: Generative Multi-modal Network for Practical Document Information Extraction.

Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

Query-driven Generative Network for Document Information Extraction in the Wild.

Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Knowledge Mining with Scene Text for Fine-Grained Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Neural Collaborative Graph Machines for Table Structure Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

NomMer: Nominate Synergistic Context in Vision Transformer for Visual Recognition.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Perceiving Stroke-Semantic Context: Hierarchical Contrastive Learning for Robust Scene Text Recognition.

Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021

Show, Read and Reason: Table Structure Recognition with Flexible Context Aggregator.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

RecycleNet: An Overlapped Text Instance Recovery Approach.

Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

2020

Person Attribute Recognition by Sequence Contextual Relation Learning.

IEEE Trans. Circuits Syst. Video Technol., 2020

PuzzleNet: Scene Text Detection by Segment Context Graph Learning.

CoRR, 2020

Accurate Structured-Text Spotting for Arithmetical Exercise Correction.

Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019

Independent metric learning with aligned multi-part features for video-based person re-identification.

Multim. Tools Appl., 2019

Deep feature representation and multiple metric ensembles for person re-identification in security surveillance system.

Multim. Tools Appl., 2019

Local region partition for person re-identification.

Multim. Tools Appl., 2019

2018

Video-Based Person Re-Identification With Accumulative Motion Context.

IEEE Trans. Circuits Syst. Video Technol., 2018

Sequence-based Person Attribute Recognition with Joint CTC-Attention Model.

CoRR, 2018

Video-Based Person Re-identification with Adaptive Multi-part Features Learning.

Proceedings of the Advances in Multimedia Information Processing - PCM 2018, 2018

Multi-View Image Generation from a Single-View.

Proceedings of the 2018 ACM Multimedia Conference, 2018

2017

End-to-End Comparative Attention Networks for Person Re-Identification.

IEEE Trans. Image Process., 2017

Multi-View Image Generation from a Single-View.

CoRR, 2017

Neural Person Search Machines.

Proceedings of the IEEE International Conference on Computer Vision, 2017

2016

Robust Face Recognition with Deep Multi-View Representation Learning.

Proceedings of the 2016 ACM Conference on Multimedia, 2016

2015

Kernelized Relaxed Margin Components Analysis for Person Re-identification.

Hao Liu

Meibin Qi

Jianguo Jiang

IEEE Signal Process. Lett., 2015

2014

Non-linear metric learning with multiple features for person re-identification.

Jianguo Jiang

Hao Liu

Meibin Qi

Proceedings of the IEEE China Summit & International Conference on Signal and Information Processing, 2014