Zhi-Qi Cheng

ORCID: 0000-0002-1720-2085

According to our database, Zhi-Qi Cheng authored at least 64 papers between 2016 and 2024.

Bibliography

2024
DPDEdit: Detail-Preserved Diffusion Models for Multimodal Fashion Image Editing.
CoRR, 2024

FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing.
CoRR, 2024

Combo: Co-speech holistic 3D human motion generation and efficient customizable adaptation in harmony.
CoRR, 2024

SHIELD: LLM-Driven Schema Induction for Predictive Analytics in EV Battery Supply Chain Disruptions.
CoRR, 2024

Prioritize Alignment in Dataset Distillation.
CoRR, 2024

Robust Adaptation of Foundation Models with Black-Box Visual Prompting.
CoRR, 2024

MetaDesigner: Advancing Artistic Typography through AI-Driven, User-Centric, and Multilingual WordArt Synthesis.
CoRR, 2024

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions.
CoRR, 2024

Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning.
CoRR, 2024

MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion.
CoRR, 2024

MM-TTS: A Unified Framework for Multimodal, Prompt-Induced Emotional Text-to-Speech Synthesis.
CoRR, 2024

LEAF: Unveiling Two Sides of the Same Coin in Semi-supervised Facial Expression Recognition.
CoRR, 2024

IVAC-P2L: Leveraging Irregular Repetition Priors for Improving Video Action Counting.
CoRR, 2024

DyRoNet: Dynamic Routing and Low-Rank Adapters for Autonomous Driving Streaming Perception.
CoRR, 2024

WordArt Designer API: User-Driven Artistic Typography Synthesis with Large Language Models on ModelScope.
CoRR, 2024

MIPS at SemEval-2024 Task 3: Multimodal Emotion-Cause Pair Extraction in Conversations with Multimodal Language Models.
Proceedings of the 18th International Workshop on Semantic Evaluation, 2024

SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition.
Proceedings of the 2nd International Workshop on Multimodal and Responsible Affective Computing, 2024

DCPT: Darkness Clue-Prompted Tracking in Nighttime UAVs.
Proceedings of the IEEE International Conference on Robotics and Automation, 2024

FaceChain-ImagineID: Freely Crafting High-Fidelity Diverse Talking Faces from Disentangled Audio.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MotionEditor: Editing Video Motion via Content-Aware Diffusion.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

ProS: Prompting-to-Simulate Generalized Knowledge for Universal Cross-Domain Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

BlockGCN: Redefine Topology Awareness for Skeleton-Based Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Music2P: A Multi-Modal AI-Driven Tool for Simplifying Album Cover Design.
Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024

2023
Tracking with Human-Intent Reasoning.
CoRR, 2023

Towards Calibrated Robust Fine-Tuning of Vision-Language Models.
CoRR, 2023

WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models.
CoRR, 2023

Refined Temporal Pyramidal Compression-and-Amplification Transformer for 3D Human Pose Estimation.
CoRR, 2023

Overcoming Topology Agnosticism: Enhancing Skeleton-Based Action Recognition through Redefined Skeletal Topology Awareness.
CoRR, 2023

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules.
CoRR, 2023

Improving Anomaly Segmentation with Multi-Granularity Cross-Domain Alignment.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Debunking Free Fusion Myth: Online Multi-view Anomaly Detection with Disentangled Product-of-Experts Modeling.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

PoSynDA: Multi-Hypothesis Pose Synthesis Domain Adaptation for Robust 3D Human Pose Estimation.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

KeyPosS: Plug-and-Play Facial Landmark Detection through GPS-Inspired True-Range Multilateration.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

DAMO-StreamNet: Optimizing Streaming Perception in Autonomous Driving.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

HDFormer: High-order Directed Transformer for 3D Human Pose Estimation.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

Implicit Temporal Modeling with Learnable Alignment for Video Recognition.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

ChartReader: A Unified Framework for Chart Derendering and Comprehension without Heuristic Rules.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

LongShortNet: Exploring Temporal and Semantic Features Fusion in Streaming Perception.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

ProContEXT: Exploring Progressive Context Transformer for Tracking.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2023

WordArt Designer: User-Driven Artistic Typography Synthesis using Large Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023), 2023

2022
Hypergraph Transformer for Skeleton-based Action Recognition.
CoRR, 2022

CrossNet: Boosting Crowd Counting with Localization.
Proceedings of the 30th ACM International Conference on Multimedia (MM '22), Lisboa, Portugal, October 10, 2022

Real-time Semantic Segmentation with Parallel Multiple Views Feature Augmentation.
Proceedings of the 30th ACM International Conference on Multimedia (MM '22), Lisboa, Portugal, October 10, 2022

GSRFormer: Grounded Situation Recognition Transformer with Alternate Semantic Attention Refinement.
Proceedings of the 30th ACM International Conference on Multimedia (MM '22), Lisboa, Portugal, October 10, 2022

Rethinking Spatial Invariance of Convolutional Networks for Object Counting.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
DB-LSTM: Densely-connected Bi-directional LSTM for human action recognition.
Neurocomputing, 2021

Subspace Representation Learning for Few-shot Image Classification.
CoRR, 2021

2020
Generating Person Images with Appearance-aware Pose Stylizer.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

Stacked Pooling for Boosting Scale Invariance of Crowd Counting.
Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020

2019
Improving the Learning of Multi-column Convolutional Neural Network for Crowd Counting.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Learning Spatial Awareness to Improve Crowd Counting.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

2018
Personalized clothing recommendation combining user social circle and fashion style consistency.
Multimedia Tools and Applications, 2018

Perceiving Physical Equation by Observing Visual Scenarios.
CoRR, 2018

Stacked Pooling: Improving Crowd Counting by Boosting Scale Invariance.
CoRR, 2018

Video2Shop: Exactly Matching Clothes in Videos to Online Shopping Images.
CoRR, 2018

Multi-View Image Generation from a Single-View.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

GNAS: A Greedy Neural Architecture Search Method for Multi-Attribute Learning.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

Learning to Transfer: Generalizable Attribute Learning with Multitask Neural Model Search.
Proceedings of the 2018 ACM Multimedia Conference on Multimedia Conference, 2018

2017
Video eCommerce++: Toward Large Scale Online Video Advertising.
IEEE Transactions on Multimedia, 2017

Multi-View Image Generation from a Single-View.
CoRR, 2017

VIREO @ TRECVID 2017: Video-to-Text, Ad-hoc Video Search, and Video hyperlinking.
Proceedings of the 2017 TREC Video Retrieval Evaluation, 2017

On the Selection of Anchors and Targets for Video Hyperlinking.
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, 2017

Video2Shop: Exact Matching Clothes in Videos to Online Shopping Images.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Video eCommerce: Towards Online Video Advertising.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016
