Gang Yu

Orcid: 0000-0001-5570-2710

Affiliations:
  • StepFun
  • Tencent, Shanghai, China (2019 - 2024)
  • Megvii, Beijing, China (2014 - 2019)
  • Nanyang Technological University, Singapore (PhD 2014)
  • Shanghai Jiao Tong University, China (former)


According to our database, Gang Yu authored at least 121 papers between 2009 and 2024.

Bibliography

2024
Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning.
IEEE Trans. Pattern Anal. Mach. Intell., November, 2024

Enhancing quality of pose-varied face restoration with local weak feature sensing and GAN prior.
Neural Comput. Appl., January, 2024

Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE.
CoRR, 2024

Lightweight Model Pre-training via Language Guided Knowledge Distillation.
CoRR, 2024

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers.
CoRR, 2024

MeshXL: Neural Coordinate Field for Generative 3D Foundation Models.
CoRR, 2024

Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation.
CoRR, 2024

Generative Motion Stylization within Canonical Motion Space.
CoRR, 2024

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment.
CoRR, 2024

MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies.
CoRR, 2024

Disentangled Pre-training for Image Matting.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024

Generative Motion Stylization of Cross-structure Characters within Canonical Motion Space.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

TapMo: Shape-aware Motion Generation of Skeleton-free Characters.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

MotionChain: Conversational Motion Controllers via Multimodal Prompts.
Proceedings of the Computer Vision - ECCV 2024, 2024

Paint3D: Paint Anything 3D With Lighting-Less Texture Diffusion Models.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

PM-INR: Prior-Rich Multi-Modal Implicit Large-Scale Scene Neural Representation.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
UAV-Served Energy Harvesting-Enabled M2M Networks for Green Industry - A Perspective of Energy Efficient Resource Management Scheme.
IEEE Trans. Green Commun. Netw., December, 2023

DCNet: Large-Scale Point Cloud Semantic Segmentation With Discriminative and Efficient Feature Aggregation.
IEEE Trans. Circuits Syst. Video Technol., August, 2023

AppAgent: Multimodal Agents as Smartphone Users.
CoRR, 2023

M3DBench: Let's Instruct Large Models with Multi-modal 3D Prompts.
CoRR, 2023

FaceStudio: Put Your Face Everywhere in Seconds.
CoRR, 2023

ShapeGPT: 3D Shape Generation with A Unified Multi-modal Language Model.
CoRR, 2023

ChartLlama: A Multimodal LLM for Chart Understanding and Generation.
CoRR, 2023

VQ-NeRF: Vector Quantization Enhances Implicit Neural Representations.
CoRR, 2023

Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering.
CoRR, 2023

Vote2Cap-DETR++: Decoupling Localization and Describing for End-to-End 3D Dense Captioning.
CoRR, 2023

IT3D: Improved Text-to-3D Generation with Explicit View Synthesis.
CoRR, 2023

StableLLaVA: Enhanced Visual Instruction Tuning with Synthesized Image-Dialogue Data.
CoRR, 2023

Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation.
CoRR, 2023

StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation.
CoRR, 2023

A Large-Scale Outdoor Multi-modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction.
CoRR, 2023

End-to-End 3D Dense Captioning with Vote2Cap-DETR.
CoRR, 2023

Michelangelo: Conditional 3D Shape Generation based on Shape-Image-Text Aligned Latent Representation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

MotionGPT: Human Motion as a Foreign Language.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

PDF: Point Diffusion Implicit Function for Large-scale Scene Neural Representation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Capturing the Motion of Every Joint: 3D Human Pose and Shape Estimation with Independent Tokens.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

SeaFormer: Squeeze-enhanced Axial Transformer for Mobile Semantic Segmentation.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Robust Geometry-Preserving Depth Estimation Using Differentiable Rendering.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

A Large-Scale Outdoor Multi-modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Metric3D: Towards Zero-shot Metric 3D Prediction from A Single Image.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

STAR Loss: Reducing Semantic Ambiguity in Facial Landmark Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

End-to-End 3D Dense Captioning with Vote2Cap-DETR.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Executing your Commands via Motion Diffusion in Latent Space.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Stochastic Game for Resource Management in Cellular Zero-Touch Deterministic Industrial M2M Networks.
IEEE Wirel. Commun. Lett., 2022

Sample-Centric Feature Generation for Semi-Supervised Few-Shot Learning.
IEEE Trans. Image Process., 2022

Executing your Commands via Motion Diffusion in Latent Space.
CoRR, 2022

Learning Variational Motion Prior for Video-based Motion Capture.
CoRR, 2022

Hierarchical Normalization for Robust Monocular Depth Estimation.
CoRR, 2022

Resource allocation for UAV-aided energy harvesting-powered D2D communications: A reinforcement learning-based scheme.
Ad Hoc Networks, 2022

Hierarchical Normalization for Robust Monocular Depth Estimation.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Coordinates Are NOT Lonely - Codebook Prior Helps Implicit Neural 3D representations.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

D&D: Learning Human Dynamics from Dynamic Camera.
Proceedings of the Computer Vision - ECCV 2022, 2022


TopFormer: Token Pyramid Transformer for Mobile Semantic Segmentation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022


2021
Generative Adversarial LSTM Networks Learning for Resource Allocation in UAV-Served M2M Communications.
IEEE Wirel. Commun. Lett., 2021

Human pose estimation and its application to action recognition: A survey.
J. Vis. Commun. Image Represent., 2021

BiSeNet V2: Bilateral Network with Guided Aggregation for Real-Time Semantic Segmentation.
Int. J. Comput. Vis., 2021

Sketch Me A Video.
CoRR, 2021

Fine-grained Identity Preserving Landmark Synthesis for Face Reenactment.
CoRR, 2021

Shuffle Transformer with Feature Alignment for Video Face Parsing.
CoRR, 2021

Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer.
CoRR, 2021

Fast and Accurate Single-Image Depth Estimation on Mobile Devices, Mobile AI 2021 Challenge: Report.
CoRR, 2021

Object-aware Long-short-range Spatial Alignment for Few-Shot Fine-Grained Image Classification.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Attribute-specific Control Units in StyleGAN for Fine-grained Image Manipulation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

HSEGAN: Hair Synthesis and Editing Using Structure-Adaptive Normalization on Generative Adversarial Network.
Proceedings of the 2021 IEEE International Conference on Image Processing, 2021

A Simple Baseline for Fast and Accurate Depth Estimation on Mobile Devices.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

2020
AnchorFace: An Anchor-based Facial Landmark Detector Across Large Poses.
CoRR, 2020

Context Prior for Scene Segmentation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

State-Aware Tracker for Real-Time Video Object Segmentation.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

SiamFC++: Towards Robust and Accurate Visual Tracking with Target Estimation Guidelines.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
Real-Time Semantic Segmentation via Multiply Spatial Fusion Network.
CoRR, 2019

Double Anchor R-CNN for Human Detection in a Crowd.
CoRR, 2019

Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection.
CoRR, 2019

Shape Robust Text Detection with Progressive Scale Expansion Network.
CoRR, 2019

ThunderNet: Towards Real-time Generic Object Detection.
CoRR, 2019

WIDER Face and Pedestrian Challenge 2018: Methods and Results.
CoRR, 2019

Rethinking on Multi-Stage Networks for Human Pose Estimation.
CoRR, 2019

Learnable Tree Filter for Structure-preserving Feature Transform.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019

Efficient and Accurate Arbitrary-Shaped Text Detection With Pixel Aggregation Network.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

ThunderNet: Towards Real-Time Generic Object Detection on Mobile Devices.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Objects365: A Large-Scale, High-Quality Dataset for Object Detection.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Shape Robust Text Detection With Progressive Scale Expansion Network.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

TACNet: Transition-Aware Context Network for Spatio-Temporal Action Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

An End-To-End Network for Panoptic Segmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Modeling Local Geometric Structure of 3D Point Clouds Using Geo-CNN.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Scene Text Detection with Supervised Pyramid Context Network.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

Attention-Based Multi-Context Guiding for Few-Shot Semantic Segmentation.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
CrowdHuman: A Benchmark for Detecting Human in a Crowd.
CoRR, 2018

SFace: An Efficient Network for Face Detection in Large Scale Variations.
CoRR, 2018

DetNet: A Backbone network for Object Detection.
CoRR, 2018

Selecting Informative Frames for Action Recognition with Partial Observations.
Proceedings of the 2018 IEEE International Conference on Image Processing, 2018

BiSeNet: Bilateral Segmentation Network for Real-Time Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2018, 2018

DetNet: Design Backbone for Object Detection.
Proceedings of the Computer Vision - ECCV 2018, 2018

Associating Inter-image Salient Instances for Weakly Supervised Semantic Segmentation.
Proceedings of the Computer Vision - ECCV 2018, 2018

Learning a Discriminative Feature Network for Semantic Segmentation.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

MegDet: A Large Mini-Batch Object Detector.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Cascaded Pyramid Network for Multi-Person Pose Estimation.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

R-FCN++: Towards Accurate Region-Based Fully Convolutional Networks for Object Detection.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
SOT for MOT.
CoRR, 2017

Light-Head R-CNN: In Defense of Two-Stage Object Detector.
CoRR, 2017

Face Attention Network: An Effective Face Detector for the Occluded Faces.
CoRR, 2017

Large Kernel Matters - Improve Semantic Segmentation by Global Convolutional Network.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2015
Propagative Hough Voting for Human Activity Detection and Recognition.
IEEE Trans. Circuits Syst. Video Technol., 2015

Fast action proposals for human action detection and search.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015

2014
Scalable forest hashing for fast similarity search.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2014

Discriminative Orderlet Mining for Real-Time Recognition of Human-Object Interaction.
Proceedings of the Computer Vision - ACCV 2014, 2014

2013
Action Search by Example Using Randomized Visual Vocabularies.
IEEE Trans. Image Process., 2013

2012
Predicting human activities using spatio-temporal structure of interest points.
Proceedings of the 20th ACM Multimedia Conference, MM '12, Nara, Japan, October 29, 2012

Propagative Hough Voting for Human Activity Recognition.
Proceedings of the Computer Vision - ECCV 2012, 2012

Randomized Spatial Partition for Scene Recognition.
Proceedings of the Computer Vision - ECCV 2012, 2012

2011
Fast Action Detection via Discriminative Random Forest Voting and Top-K Subvolume Search.
IEEE Trans. Multim., 2011

Robust object tracking with occlusion handle.
Neural Comput. Appl., 2011

Real-time human action search using random forest based hough voting.
Proceedings of the 19th International Conference on Multimedia 2011, Scottsdale, AZ, USA, November 28, 2011

Unsupervised random forest indexing for fast action search.
Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, 2011

2009
Robust Incremental Subspace Learning for Object Tracking.
Proceedings of the Neural Information Processing, 16th International Conference, 2009

Illumination Invariant Object Tracking with Incremental Subspace Learning.
Proceedings of the Fifth International Conference on Image and Graphics, 2009

