Tao Jin

Orcid: 0000-0003-3564-1628

Affiliations:
  • Zhejiang University, Hangzhou, China


According to our database, Tao Jin authored at least 42 papers between 2019 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Multi-Granularity Relational Attention Network for Audio-Visual Question Answering.
IEEE Trans. Circuits Syst. Video Technol., August, 2024

GTADT: Gated tone-sensitive acne grading via augmented domain transfer.
Multim. Tools Appl., 2024

Speech Watermarking with Discrete Intermediate Representations.
CoRR, 2024

A Wander Through the Multimodal Landscape: Efficient Transfer Learning via Low-rank Sequence Multimodal Adapter.
CoRR, 2024

Classifier-guided Gradient Modulation for Enhanced Multimodal Learning.
CoRR, 2024

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup.
CoRR, 2024

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration.
CoRR, 2024

SyncTalklip: Highly Synchronized Lip-Readable Speaker Generation with Multi-Task Learning.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Low-rank Prompt Interaction for Continual Vision-Language Retrieval.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Boosting Speech Recognition Robustness to Modality-Distortion with Contrast-Augmented Prompts.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Calibrating Prompt from History for Continual Vision-Language Retrieval and Grounding.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

EAGER: Two-Stream Generative Recommender with Behavior-Semantic Collaboration.
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

Non-confusing Generation of Customized Concepts in Diffusion Models.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

AudioVSR: Enhancing Video Speech Recognition with Audio Data.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

MPOD123: One Image to 3D Content Generation Using Mask-Enhanced Progressive Outline-to-Detail Optimization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Uni-Dubbing: Zero-Shot Speech Synthesis from Visual Articulation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Rethinking the Multimodal Correlation of Multimodal Sequential Learning via Generalizable Attentional Results Alignment.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Multimodal Prompt Learning with Missing Modalities for Sentiment Analysis and Emotion Recognition.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Electromagnetic Imaging Boosted Visual Object Recognition Under Difficult Visual Conditions.
IEEE Trans. Geosci. Remote. Sens., 2023

TransFace: Unit-Based Audio-Visual Speech Synthesizer for Talking Head Translation.
CoRR, 2023

Chat-3D v2: Bridging 3D Scene and Large Language Models with Object Identifiers.
CoRR, 2023

Extending Multi-modal Contrastive Representations.
CoRR, 2023

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition.
CoRR, 2023

Rethinking Missing Modality Learning from a Decoding Perspective.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Exploring Group Video Captioning with Efficient Relational Approximation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

MixSpeech: Cross-Modality Self-Learning with Audio-Visual Stream Mixup for Visual Speech Translation and Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Weakly-Supervised Spoken Video Grounding via Semantic Interaction Learning.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Semantic-conditioned Dual Adaptation for Cross-domain Query-based Visual Segmentation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

TAVT: Towards Transferable Audio-Visual Text Generation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Contrastive Token-Wise Meta-Learning for Unseen Performer Visual Temporal-Aligned Translation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

OpenSR: Open-Modality Speech Recognition via Maintaining Multi-Modality Alignment.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Interaction augmented transformer with decoupled decoding for video captioning.
Neurocomputing, 2022

MC-SLT: Towards Low-Resource Signer-Adaptive Sign Language Translation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Prior Knowledge and Memory Enriched Transformer for Sign Language Translation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

2021
Generalizable Multi-linear Attention Network.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Contrastive Disentangled Meta-Learning for Signer-Independent Sign Language Translation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

2020
SBAT: Video Captioning with Sparse Boundary-Aware Transformer.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Dual Low-Rank Multimodal Fusion.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

2019
Recurrent convolutional video captioning with global and local attention.
Neurocomputing, 2019

Low-Rank HOCA: Efficient High-Order Cross-Modal Attention for Video Captioning.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

