Jinfa Huang

ORCID: 0000-0002-0081-4106

According to our database¹, Jinfa Huang authored at least 24 papers between 1987 and 2025.

Collaborative distances:
  • Dijkstra number² of four.
  • Erdős number³ of four.

Bibliography

2025
A multimodal multidomain multilingual medical foundation model for zero shot clinical diagnosis.
npj Digit. Medicine, 2025

Evolver: Chain-of-Evolution Prompting to Boost Large Multimodal Models for Hateful Meme Detection.
Proceedings of the 31st International Conference on Computational Linguistics, 2025

2024
Identity-Preserving Text-to-Video Generation by Frequency Decomposition.
CoRR, 2024

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension.
CoRR, 2024

Autoregressive Models in Vision: A Survey.
CoRR, 2024

A Survey of Camouflaged Object Detection and Beyond.
CoRR, 2024

MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval.
CoRR, 2024

MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators.
CoRR, 2024

LLMBind: A Unified Modality-Task Integration Framework.
CoRR, 2024

ChronoMagic-Bench: A Benchmark for Metamorphic Evaluation of Text-to-Time-lapse Video Generation.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024

Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context Inference.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Weakly-Supervised 3D Spatial Reasoning for Text-Based Visual Question Answering.
IEEE Trans. Image Process., 2023

GPT-4V(ision) as A Social Media Analysis Engine.
CoRR, 2023

Improving Scene Graph Generation with Superpixel-Based Interaction Learning.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Toward 3D Spatial Reasoning for Human-like Text-based Visual Question Answering.
CoRR, 2022

Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

2020
Guoym at SemEval-2020 Task 8: Ensemble-based Classification of Visuo-Lingual Metaphor in Memes.
Proceedings of the Fourteenth Workshop on Semantic Evaluation, 2020

LDNN: Linguistic Knowledge Injectable Deep Neural Network for Group Cohesiveness Understanding.
Proceedings of the ICMI '20: International Conference on Multimodal Interaction, 2020

1987
A Chinese Mandarin speech output system.
Proceedings of the European Conference on Speech Technology, 1987
