Siteng Huang

Orcid: 0000-0002-9735-1186

According to our database, Siteng Huang authored at least 27 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
QUART-Online: Latency-Free Large Multimodal Language Model for Quadruped Robot Learning.
CoRR, 2024

Score and Distribution Matching Policy: Advanced Accelerated Visuomotor Policies via Matched Distillation.
CoRR, 2024

CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction.
CoRR, 2024

Rethinking Token Reduction in MLLMs: Towards a Unified Paradigm for Training-Free Acceleration.
CoRR, 2024

Accelerating Diffusion Transformers with Token-wise Feature Caching.
CoRR, 2024

Focus-Consistent Multi-Level Aggregation for Compositional Zero-Shot Learning.
CoRR, 2024

M²IST: Multi-Modal Interactive Side-Tuning for Memory-efficient Referring Expression Comprehension.
CoRR, 2024

Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference.
CoRR, 2024

Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference.
CoRR, 2024

ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

DARA: Domain- and Relation-Aware Adapters Make Parameter-Efficient Tuning for Visual Grounding.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

VGDIFFZERO: Text-To-Image Diffusion Models Can Be Zero-Shot Visual Grounders.
Proceedings of the IEEE International Conference on Acoustics, 2024

PiTe: Pixel-Temporal Alignment for Large Video-Language Model.
Proceedings of the Computer Vision - ECCV 2024, 2024

QUAR-VLA: Vision-Language-Action Model for Quadruped Robots.
Proceedings of the Computer Vision - ECCV 2024, 2024

Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Prompt-Based Distribution Alignment for Unsupervised Domain Adaptation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Troika: Multi-Path Cross-Modal Traction for Compositional Zero-Shot Learning.
CoRR, 2023

Reference-Limited Compositional Zero-Shot Learning.
Proceedings of the 2023 ACM International Conference on Multimedia Retrieval, 2023

VoP: Text-Video Co-Operative Prompt Tuning for Cross-Modal Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Domain Generalized Few-Shot Image Classification via Meta Regularization Network.
Proceedings of the IEEE International Conference on Acoustics, 2022

Tree Structure-Aware Few-Shot Image Classification via Hierarchical Aggregation.
Proceedings of the Computer Vision - ECCV 2022, 2022

2021
HINFShot: A Challenge Dataset for Few-Shot Node Classification in Heterogeneous Information Network.
Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021

Pareto Self-Supervised Training for Few-Shot Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Attributes-Guided and Pure-Visual Attention Alignment for Few-Shot Recognition.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2019
DSANet: Dual Self-Attention Network for Multivariate Time Series Forecasting.
Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

