Yuhang Cao

Orcid: 0009-0008-3627-590X

According to our database1, Yuhang Cao authored at least 35 papers between 2017 and 2024.

Collaborative distances:
  • Dijkstra number2 of four.
  • Erdős number3 of four.

Timeline

Legend:

Book 
In proceedings 
Article 
PhD thesis 
Dataset
Other 

Links

On csauthors.net:

Bibliography

2024
MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models.
CoRR, 2024

PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction.
CoRR, 2024

SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree.
CoRR, 2024

Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate.
CoRR, 2024

BroadWay: Boost Your Text-to-Video Generation Model in a Training-free Way.
CoRR, 2024

SCA: Highly Efficient Semantic-Consistent Unrestricted Adversarial Attack.
CoRR, 2024

A General-Purpose Device for Interaction with LLMs.
CoRR, 2024

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output.
CoRR, 2024

V3Det Challenge 2024 on Vast Vocabulary and Open Vocabulary Object Detection: Methods and Results.
CoRR, 2024

InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD.
CoRR, 2024

DualFocus: Integrating Macro and Micro Perspectives in Multi-modal Large Language Models.
CoRR, 2024

InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model.
CoRR, 2024

Ximalaya ASDR System for ICASSP 2024 in-Car Multi-Channel (ICMC) ASR Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2024

Diacorrect: Error Correction Back-End for Speaker Diarization.
Proceedings of the IEEE International Conference on Acoustics, 2024

MDCRA: A Reconfigurable Accelerator Framework for Multiple Dataflow Lanes.
Proceedings of the 35th IEEE International Conference on Application-specific Systems, 2024

2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition.
CoRR, 2023

Exploring the Power of Cross-Contextual Large Language Model in Mimic Emotion Prediction.
Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023

Multimodal Cross-Lingual Features and Weight Fusion for Cross-Cultural Humor Detection.
Proceedings of the 4th on Multimodal Sentiment Analysis Challenge and Workshop: Mimicked Emotions, 2023

V3Det: Vast Vocabulary Visual Detection Dataset.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

A Dynamic Partial Reconfigurable CGRA Framework for Multi-Kernel Applications.
Proceedings of the International Conference on Field Programmable Technology, 2023

E<sup>2</sup>-ACE: An Energy-Efficient Reconfigurable Crypto-Accelerator with Agile End-to-End Toolchain.
Proceedings of the International Conference on Field Programmable Technology, 2023

PP-MET: A Real-World Personalized Prompt Based Meeting Transcription System.
Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop, 2023

2022
MINI: Mining Implicit Novel Instances for Few-Shot Object Detection.
CoRR, 2022

The USTC-Ximalaya System for the ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription (M2met) Challenge.
Proceedings of the IEEE International Conference on Acoustics, 2022

TRAM: An Open-Source Template-based Reconfigurable Architecture Modeling Framework.
Proceedings of the 32nd International Conference on Field-Programmable Logic and Applications, 2022

2021
WSSOD: A New Pipeline for Weakly- and Semi-Supervised Object Detection.
CoRR, 2021

Few-Shot Object Detection via Association and DIscrimination.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Seesaw Loss for Long-Tailed Instance Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Feature Pyramid Grids.
CoRR, 2020

Side-Aware Boundary Localization for More Precise Object Detection.
Proceedings of the Computer Vision - ECCV 2020, 2020

Prime Sample Attention in Object Detection.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Speaker Direction-of-Arrival Estimation Based on Orthogonal Dipoles.
Circuits Syst. Signal Process., 2019

MMDetection: Open MMLab Detection Toolbox and Benchmark.
CoRR, 2019

Investigation of Cost Function for Supervised Monaural Speech Separation.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

2017
Speaker Direction-of-Arrival Estimation Based on Frequency-Independent Beampattern.
Proceedings of the 18th Annual Conference of the International Speech Communication Association, 2017


  Loading...