Yiyi Zhou

Orcid: 0000-0002-5110-4526

According to our database, Yiyi Zhou authored at least 68 papers between 2015 and 2025.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


Bibliography

2025
M3ixup: A multi-modal data augmentation approach for image captioning.
Pattern Recognit., 2025

2024
Towards Language-Guided Visual Recognition via Dynamic Convolutions.
Int. J. Comput. Vis., January, 2024

A Survivor in the Era of Large-Scale Pretraining: An Empirical Study of One-Stage Referring Expression Comprehension.
IEEE Trans. Multim., 2024

Deep hybrid transformer network for robust modulation classification in wireless communications.
Knowl. Based Syst., 2024

Fit and Prune: Fast and Training-free Visual Token Pruning for Multi-modal Large Language Models.
CoRR, 2024

Routing Experts: Learning to Route Dynamic Experts in Multi-modal Large Language Models.
CoRR, 2024

Image Captioning via Dynamic Path Customization.
CoRR, 2024

Not All Attention is Needed: Parameter and Computation Efficient Transfer Learning for Multi-modal Large Language Models.
CoRR, 2024

Feast Your Eyes: Mixture-of-Resolution Adaptation for Multimodal Large Language Models.
CoRR, 2024

Deep Instruction Tuning for Segment Anything Model.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

QueryMatch: A Query-based Contrastive Learning Framework for Weakly Supervised Visual Grounding.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting.
Proceedings of the International Joint Conference on Neural Networks, 2024

Fast Text-to-3D-Aware Face Generation and Manipulation via Direct Cross-modal Mapping and Geometric Regularization.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

Towards Omni-supervised Referring Expression Segmentation.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2024

MMAPS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Towards Efficient Diffusion-Based Image Editing with Instant Attention Masks.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Towards local visual modeling for image captioning.
Pattern Recognit., June, 2023

A Real-Time Global Inference Network for One-Stage Referring Expression Comprehension.
IEEE Trans. Neural Networks Learn. Syst., 2023

Knowing What it is: Semantic-Enhanced Dual Attention Transformer.
IEEE Trans. Multim., 2023

Multi-Branch Distance-Sensitive Self-Attention Network for Image Captioning.
IEEE Trans. Multim., 2023

NICE: Improving Panoptic Narrative Detection and Segmentation with Cascading Collaborative Learning.
CoRR, 2023

M3PS: End-to-End Multi-Grained Multi-Modal Attribute-Aware Product Summarization in E-commerce.
CoRR, 2023

Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer.
CoRR, 2023

Approximated Prompt Tuning for Vision-Language Pre-trained Models.
CoRR, 2023

Adapting Pre-trained Language Models to Vision-Language Tasks via Dynamic Visual Prompting.
CoRR, 2023

Towards End-to-end Semi-supervised Learning for One-stage Object Detection.
CoRR, 2023

Towards Efficient Visual Adaption via Structural Re-parameterization.
CoRR, 2023

HSM-QA: Question Answering System Based on Hierarchical Semantic Matching.
IEEE Access, 2023

Semantic-Guided Selective Representation for Image Captioning.
IEEE Access, 2023

Parameter and Computation Efficient Transfer Learning for Vision-Language Pre-trained Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PixelFace+: Towards Controllable Face Generation and Manipulation with Text Descriptions and Segmentation Masks.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

A three-circle triangle model of Bearing-Only Passive Locating of the UAVs.
Proceedings of the 8th International Conference on Information Systems Engineering, 2023

RefTeacher: A Strong Baseline for Semi-Supervised Referring Expression Comprehension.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

RefCLIP: A Universal Teacher for Weakly Supervised Referring Expression Comprehension.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Towards Real-Time Panoptic Narrative Grounding by an End-to-End Grounding Network.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Knowledge-Driven Generative Adversarial Network for Text-to-Image Synthesis.
IEEE Trans. Multim., 2022

Towards Lightweight Transformer Via Group-Wise Transformation for Vision-and-Language Tasks.
IEEE Trans. Image Process., 2022

Knowing What to Learn: A Metric-Oriented Focal Mechanism for Image Captioning.
IEEE Trans. Image Process., 2022

Plenty is Plague: Fine-Grained Learning for Visual Question Answering.
IEEE Trans. Pattern Anal. Mach. Intell., 2022

CycleTrans: Learning Neutral yet Discriminative Features for Visible-Infrared Person Re-Identification.
CoRR, 2022

What Goes beyond Multi-modal Fusion in One-stage Referring Expression Comprehension: An Empirical Study.
CoRR, 2022

PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation.
CoRR, 2022

What Hinders Perceptual Quality of PSNR-oriented Methods?
CoRR, 2022

Make Sharpness-Aware Minimization Stronger: A Sparsified Perturbation Approach.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Towards Open-Ended Text-to-Face Generation, Combination and Manipulation.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Learning Dynamic Prior Knowledge for Text-to-Face Pixel Synthesis.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

SeqTR: A Simple Yet Universal Network for Visual Grounding.
Proceedings of the Computer Vision - ECCV 2022, 2022

PixelFolder: An Efficient Progressive Pixel Synthesis Network for Image Generation.
Proceedings of the Computer Vision - ECCV 2022, 2022

DIFNet: Boosting Visual Information Flow for Image Captioning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Active Teacher for Semi-Supervised Object Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Uncovering Media Bias via Social Network Learning.
ACM Trans. Intell. Syst. Technol., 2021

Towards Language-guided Visual Recognition via Dynamic Convolutions.
CoRR, 2021

TRAR: Routing the Attention Spans in Transformer for Visual Question Answering.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

RSTNet: Captioning With Adaptive Attention on Visual and Non-Visual Words.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Consumer Search and Automobile Dealer Colocation.
Manag. Sci., 2020

K-armed Bandit based Multi-Modal Network Architecture Search for Visual Question Answering.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Cascade Grouped Attention Network for Referring Expression Segmentation.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Attacking Image Captioning Towards Accuracy-Preserving Target Words Removal.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Multi-Task Collaborative Network for Joint Referring Expression Comprehension and Segmentation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Social Media Based Topic Modeling for Smart Campus: A Deep Topical Correlation Analysis Method.
IEEE Access, 2019

Towards Cross-modality Topic Modelling via Deep Topical Correlation Analysis.
Proceedings of the IEEE International Conference on Acoustics, 2019

Dynamic Capsule Attention for Visual Question Answering.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Free VQA Models from Knowledge Inertia by Pairwise Inconformity Learning.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2017
Bayesian Estimation of a Dynamic Model of Two-Sided Markets: Application to the U.S. Video Game Industry.
Manag. Sci., 2017

More Than An Answer: Neural Pivot Network for Visual Question Answering.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

2016
Survey of visual sentiment prediction for social media analysis.
Frontiers Comput. Sci., 2016

2015
Design of Personalized News Comments Recommendation System.
Proceedings of the Data Science - Second International Conference, 2015

