Yanghao Li

Orcid: 0000-0002-5274-1367

According to our database, Yanghao Li authored at least 66 papers between 2014 and 2024.

Bibliography

2024
Improve Vision Language Model Chain-of-thought Reasoning.
CoRR, 2024

Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs.
CoRR, 2024

MM-Ego: Towards Building Egocentric Multimodal LLMs.
CoRR, 2024

EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing.
CoRR, 2024

MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning.
CoRR, 2024

Byzantine-Robust and Communication-Efficient Distributed Learning via Compressed Momentum Filtering.
CoRR, 2024

Apple Intelligence Foundation Language Models.
CoRR, 2024

Idempotence and Perceptual Image Compression.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

R-MAE: Regions Meet Masked Autoencoders.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Bandwidth-Efficient Inference for Neural Image Compression.
Proceedings of the IEEE International Conference on Acoustics, 2024

2023
Bandwidth-efficient Inference for Neural Image Compression.
CoRR, 2023

Conditional Perceptual Quality Preserving Image Compression.
CoRR, 2023

Evaluating Strong Idempotence of Image Codec.
CoRR, 2023

Idempotent Learned Image Compression with Right-Inverse.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

MAViL: Masked Audio-Video Learners.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles.
Proceedings of the International Conference on Machine Learning, 2023

Diffusion Models as Masked Autoencoders.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Your Camera Improves Your Point Cloud Compression.
Proceedings of the IEEE International Conference on Acoustics, 2023

Where is my Wallet? Modeling Object Proposal Sets for Egocentric Visual Query Localization.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Scaling Language-Image Pre-Training via Masking.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Bit Allocation using Optimization.
CoRR, 2022

Negative Frames Matter in Egocentric Visual Query 2D Localization.
CoRR, 2022

Masked Autoencoders As Spatiotemporal Learners.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Rate Control for Learned Video Compression.
Proceedings of the IEEE International Conference on Acoustics, 2022

Exploring Plain Vision Transformer Backbones for Object Detection.
Proceedings of the Computer Vision - ECCV 2022, 2022

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Reversible Vision Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

MViTv2: Improved Multiscale Vision Transformers for Classification and Detection.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Masked Autoencoders Are Scalable Vision Learners.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022


2021
Improved Multiscale Vision Transformers for Classification and Detection.
CoRR, 2021

Benchmarking Detection Transfer Learning with Vision Transformers.
CoRR, 2021

Ego4D: Around the World in 3,000 Hours of Egocentric Video.
CoRR, 2021

PyTorchVideo: A Deep Learning Library for Video Understanding.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

Multiscale Vision Transformers.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Learning Model-Blind Temporal Denoisers without Ground Truths.
Proceedings of the IEEE International Conference on Acoustics, 2021

Decision Tree Based Inter Partition Termination For AV1 Encoding.
Proceedings of the IEEE International Conference on Acoustics, 2021

Ego-Exo: Transferring Visual Representations From Third-Person to First-Person Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
A Benchmark Dataset and Comparison Study for Multi-modal Human Action Analytics.
ACM Trans. Multim. Comput. Commun. Appl., 2020

Modality Compensation Network: Cross-Modal Adaptation for Action Recognition.
IEEE Trans. Image Process., 2020

Ego-Topo: Environment Affordances From Egocentric Video.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Multi-Modality Multi-Task Recurrent Neural Network for Online Action Detection.
IEEE Trans. Circuits Syst. Video Technol., 2019

SimpleDet: A Simple and Versatile Distributed Framework for Object Detection and Instance Recognition.
J. Mach. Learn. Res., 2019

Scale-Aware Trident Networks for Object Detection.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Temporal Bilinear Networks for Video Action Recognition.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Adaptive Batch Normalization for practical domain adaptation.
Pattern Recognit., 2018

Click versus Share: A Feature-driven Study of Micro-Video Popularity and Virality in Social Media.
Proceedings of the 2018 SIAM International Conference on Data Mining, 2018

Rethinking Fusion Baselines for Multi-modal Human Action Recognition.
Proceedings of the Advances in Multimedia Information Processing - PCM 2018, 2018

A Deep Convolutional Network Based Supervised Coarse-to-Fine Algorithm for Optical Flow Measurement.
Proceedings of the 20th IEEE International Workshop on Multimedia Signal Processing, 2018

2017
PKU-MMD: A Large Scale Benchmark for Continuous Multi-Modal Human Action Understanding.
CoRR, 2017

Characterizing the Click and Share Dynamics of Micro-Videos in Social Media.
Proceedings of the Posters and Demos Proceedings of the Conference of the ACM Special Interest Group on Data Communication, 2017

PKU-MMD: A Large Scale Benchmark for Skeleton-Based Human Action Understanding.
Proceedings of the Workshop on Visual Analysis in Smart and Connected Communities, 2017

Demystifying Neural Style Transfer.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

Revisiting Batch Normalization For Practical Domain Adaptation.
Proceedings of the 5th International Conference on Learning Representations, 2017

Deep joint discriminative learning for vehicle re-identification and retrieval.
Proceedings of the 2017 IEEE International Conference on Image Processing, 2017

Factorized Bilinear Models for Image Recognition.
Proceedings of the IEEE International Conference on Computer Vision, 2017

Online action detection and forecast via Multitask deep Recurrent Neural Networks.
Proceedings of the 2017 IEEE International Conference on Acoustics, 2017

Temporal Perceptive Network for Skeleton-Based Action Recognition.
Proceedings of the British Machine Vision Conference 2017, 2017

2016
Joint sub-band based neighbor embedding for image super-resolution.
Proceedings of the 2016 IEEE International Conference on Acoustics, 2016

Online Human Action Detection Using Joint Classification-Regression Recurrent Neural Networks.
Proceedings of the Computer Vision - ECCV 2016, 2016

Co-Occurrence Feature Learning for Skeleton Based Action Recognition Using Regularized Deep LSTM Networks.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
Multi-pose face hallucination via neighbor embedding for facial components.
Proceedings of the 2015 IEEE International Conference on Image Processing, 2015

Neighborhood regression for edge-preserving image super-resolution.
Proceedings of the 2015 IEEE International Conference on Acoustics, 2015

Face hallucination based on neighbor embedding via illumination adaptation.
Proceedings of the Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2015

2014
Image transformation using limited reference with application to photo-sketch synthesis.
Proceedings of the 2014 IEEE Visual Communications and Image Processing Conference, 2014

