Conghui He

ORCID: 0000-0001-8697-695X

According to our database, Conghui He authored at least 96 papers between 2014 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

Exploring the user guidance for more accurate building segmentation from high-resolution remote sensing images.

Int. J. Appl. Earth Obs. Geoinformation, February, 2024

DropQueries: A Simple Way to Discover Comprehensive Segment Representations.

IEEE Trans. Multim., 2024

Weakly Supervised 3-D Building Reconstruction From Monocular Remote Sensing Images.

IEEE Trans. Geosci. Remote. Sens., 2024

Accelerating Diffusion Transformers with Dual Feature Caching.

CoRR, 2024

Where am I? Cross-View Geo-localization with Natural Language Descriptions.

CoRR, 2024

GeoX: Geometric Problem Solving Through Unified Formalized Vision-Language Pre-training.

CoRR, 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.

CoRR, 2024

OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations.

CoRR, 2024

Chimera: Improving Generalist Model with Domain-Specific Experts.

CoRR, 2024

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling.

CoRR, 2024

OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation.

CoRR, 2024

Can LLMs be Good Graph Judger for Knowledge Graph Construction?

CoRR, 2024

Document Parsing Unveiled: Techniques, Challenges, and Prospects for Structured Information Extraction.

Qintong Zhang

Victor Shea-Jay Huang

CoRR, 2024

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models.

CoRR, 2024

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction.

CoRR, 2024

DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception.

CoRR, 2024

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models.

CoRR, 2024

Multi-Agent Collaborative Data Selection for Efficient LLM Pretraining.

CoRR, 2024

Utilize the Flow before Stepping into the Same River Twice: Certainty Represented Knowledge Flow for Refusal-Aware Instruction Tuning.

CoRR, 2024

Gradual Learning: Optimizing Fine-Tuning with Partially Mastered Knowledge in Large Language Models.

CoRR, 2024

MinerU: An Open-Source Solution for Precise Document Content Extraction.

CoRR, 2024

Harnessing Diversity for Important Data Selection in Pretraining Large Language Models.

CoRR, 2024

CDM: A Reliable Metric for Fair and Accurate Formula Recognition Evaluation.

CoRR, 2024

UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios.

CoRR, 2024

CrossViewDiff: A Cross-View Diffusion Model for Satellite-to-Street View Synthesis.

CoRR, 2024

Fine-Grained Building Function Recognition from Street-View Images via Geometry-Aware Semi-Supervised Learning.

CoRR, 2024

SkyDiffusion: Street-to-Satellite Image Synthesis with Diffusion Models and BEV Paradigm.

CoRR, 2024

Synth-Empathy: Towards High-Quality Synthetic Empathy Data.

CoRR, 2024

SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models.

CoRR, 2024

OpenDataLab: Empowering General Artificial Intelligence with Open Datasets.

CoRR, 2024

Navigating the Data Trading Crossroads: An Interdisciplinary Survey.

CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.

CoRR, 2024

KeyVideoLLM: Towards Large-scale Video Keyframe Selection.

CoRR, 2024

DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models.

CoRR, 2024

OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text.

CoRR, 2024

DSDL: Data Set Description Language for Bridging Modalities and Tasks in AI Data.

CoRR, 2024

A Survey of Multimodal Large Language Model from A Data-centric Perspective.

CoRR, 2024

FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models.

CoRR, 2024

How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites.

CoRR, 2024

UniMERNet: A Universal Network for Real-World Mathematical Expression Recognition.

CoRR, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.

CoRR, 2024

H2RSVLM: Towards Helpful and Honest Remote Sensing Large Vision Language Model.

CoRR, 2024

InternLM2 Technical Report.

et al.

CoRR, 2024

ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training.

CoRR, 2024

LOCR: Location-Guided Transformer for Optical Character Recognition.

CoRR, 2024

WanJuan-CC: A Safe and High-Quality Open-sourced English Webtext Dataset.

CoRR, 2024

SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation.

CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.

CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.

CoRR, 2024

SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models.

Proceedings of the Forty-first International Conference on Machine Learning, 2024

LOCR: Location-Guided Transformer for Optical Character Recognition.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

LongWanjuan: Towards Systematic Measurement for Long Text Quality.

Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-Training.

Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Cross-View Image Geo-Localization with Panorama-BEV Co-retrieval Network.

Proceedings of the Computer Vision - ECCV 2024, 2024

MMBench: Is Your Multi-modal Model an All-Around Player?

Proceedings of the Computer Vision - ECCV 2024, 2024

Parrot Captions Teach CLIP to Spot Text.

Proceedings of the Computer Vision - ECCV 2024, 2024

ShareGPT4V: Improving Large Multi-modal Models with Better Captions.

Proceedings of the Computer Vision - ECCV 2024, 2024

SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

3D Building Reconstruction from Monocular Remote Sensing Images with Multi-level Supervisions.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ProtLLM: An Interleaved Protein-Language LLM with Protein-as-Word Pre-Training.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Benchmarking Chinese Commonsense Reasoning of LLMs: From Chinese-Specifics to Reasoning-Memorization Correlations.

Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

VIGC: Visual Instruction Generation and Correction.

Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023

Beyond Hallucinations: Enhancing LVLMs through Hallucination-Aware Direct Preference Optimization.

CoRR, 2023

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.

CoRR, 2023

MiChao-HuaFen 1.0: A Specialized Pre-trained Corpus Dataset for Domain-specific Large Models.

CoRR, 2023

MLLM-DataEngine: An Iterative Refinement Approach for MLLM.

CoRR, 2023

WanJuan: A Comprehensive Multimodal Dataset for Advancing English and Chinese Large Models.

CoRR, 2023

LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model.

CoRR, 2023

V3Det: Vast Vocabulary Visual Detection Dataset.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

OmniCity: Omnipotent City Understanding with Multi-Level and Multi-View Images.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Think Twice before Driving: Towards Scalable Decoders for End-to-End Autonomous Driving.

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

SEPT: Towards Scalable and Efficient Visual Pre-training.

Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022

Unified Interactive Image Matting.

CoRR, 2022

PersFormer: 3D Lane Detection via Perspective Transformer and the OpenLane Benchmark.

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

INTERN: A New Learning Paradigm Towards General Vision.

CoRR, 2021

Influence Selection for Active Learning.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

3D Building Reconstruction from Monocular Remote Sensing Images.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Joint Semantic-geometric Learning for Polygonal Building Segmentation.

Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020

FLAVA: Find, Localize, Adjust and Verify to Annotate LiDAR-based Point Clouds.

Proceedings of the UIST '20 Adjunct: The 33rd Annual ACM Symposium on User Interface Software and Technology, 2020

2019

Optimizing Finite Volume Method Solvers on Nvidia GPUs.

IEEE Trans. Parallel Distributed Syst., 2019

Semantic Segmentation-Based Building Footprint Extraction Using Very High-Resolution Satellite Images and Multi-Source GIS Data.

Remote. Sens., 2019

A Real-Time Tree Crown Detection Approach for Large-Scale Remote Sensing Images on FPGAs.

Remote. Sens., 2019

Finding Mutual X at WeChat-Scale Social Network in Ten Minutes.

Proceedings of the 2019 IEEE International Conference on Big Data (IEEE BigData), 2019

2018

Simulating the Wenchuan earthquake with accurate surface topography on Sunway TaihuLight.

Proceedings of the International Conference for High Performance Computing, 2018

Semantic Segmentation Based Building Extraction Method Using Multi-Source GIS Map Datasets and Satellite Imagery.

Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2018

swCaffe: A Parallel Framework for Accelerating Deep Learning Applications on Sunway TaihuLight.

Proceedings of the IEEE International Conference on Cluster Computing, 2018

2017

A Fully-Pipelined Hardware Design for Gaussian Mixture Models.

IEEE Trans. Computers, 2017

18.9-Pflops nonlinear earthquake simulation on Sunway TaihuLight: enabling depiction of 18-Hz and 8-meter scenarios.

Proceedings of the International Conference for High Performance Computing, 2017

An FPGA-based tree crown detection approach for remote sensing images.

Proceedings of the International Conference on Field Programmable Technology, 2017

Exploring the potential of reconfigurable platforms for order book update.

Proceedings of the 27th International Conference on Field Programmable Logic and Applications, 2017

Accelerating Financial Market Server through Hybrid List Design (Abstract Only).

Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017

A Nanosecond-Level Hybrid Table Design for Financial Market Data Generators.

Proceedings of the 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017

2016

A time-space domain stereo finite difference method for 3D scalar wave propagation.

Comput. Geosci., 2016

Refactoring and optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight supercomputer.

Proceedings of the International Conference for High Performance Computing, 2016

2014

Global-Scale Associations of Vegetation Phenology with Rainfall and Temperature at a High Spatio-Temporal Resolution.

Remote. Sens., 2014
