Yuhang Cao

Orcid: 0009-0008-3627-590X

According to our database, Yuhang Cao authored at least 36 papers between 2017 and 2024.

Collaborative distances:

Dijkstra number of four.
Erdős number of four.

Bibliography

2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions.

CoRR, 2024

MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models.

CoRR, 2024

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction.

CoRR, 2024

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree.

CoRR, 2024

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate.

CoRR, 2024

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way.

CoRR, 2024

SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack.

CoRR, 2024

A General-Purpose Device for Interaction with LLMs.

CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.

CoRR, 2024

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results.

CoRR, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.

CoRR, 2024

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models.

CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.

CoRR, 2024

Ximalaya ASDR System for ICASSP 2024 in-Car Multi-Channel (ICMC) ASR Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2024

Diacorrect: Error Correction Back-End for Speaker Diarization.

Proceedings of the IEEE International Conference on Acoustics, 2024

MDCRA: A Reconfigurable Accelerator Framework for Multiple Dataflow Lanes.

Proceedings of the 35th IEEE International Conference on Application-specific Systems, 2024

2023

InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.

CoRR, 2023

Exploring the Power of Cross-Contextual Large Language Model in Mimic Emotion Prediction.

Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023

Multimodal Cross-Lingual Features and Weight Fusion for Cross-Cultural Humor Detection.

Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023

V3Det: Vast Vocabulary Visual Detection Dataset.

Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

A Dynamic Partial Reconfigurable CGRA Framework for Multi-Kernel Applications.

Proceedings of the International Conference on Field Programmable Technology, 2023

E²-ACE: An Energy-Efficient Reconfigurable Crypto-Accelerator with Agile End-to-End Toolchain.

Proceedings of the International Conference on Field Programmable Technology, 2023

PP-MET: A Real-World Personalized Prompt Based Meeting Transcription System.

Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022

MINI: Mining Implicit Novel Instances for Few-Shot Object Detection.

CoRR, 2022

The USTC-Ximalaya System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription (M2met) Challenge.

Proceedings of the IEEE International Conference on Acoustics, 2022

TRAM: An Open-Source Template-based Reconfigurable Architecture Modeling Framework.

Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

2021

WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection.

CoRR, 2021

Few-Shot Object Detection via Association and DIscrimination.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Seesaw Loss for Long-Tailed Instance Segmentation.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020

Feature Pyramid Grids.

Christoph Feichtenhofer

CoRR, 2020

Side-Aware Boundary Localization for More Precise Object Detection.

Proceedings of the Computer Vision - ECCV 2020, 2020

Prime Sample Attention in Object Detection.

Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019

Speaker Direction-of-Arrival Estimation Based on Orthogonal Dipoles.

Circuits Syst. Signal Process., 2019

MMDetection: Open MMLab Detection Toolbox and Benchmark.

CoRR, 2019

Investigation of Cost Function for Supervised Monaural Speech Separation.

Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2017

Speaker Direction-of-Arrival Estimation Based on Frequency-Independent Beampattern.

Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017
