Junwei Liang

Orcid: 0000-0003-2219-5569

Affiliations:
  • Hong Kong University of Science and Technology Guangzhou (HKUST-GZ), Guangzhou, China
  • Carnegie Mellon University, Pittsburgh, PA, USA (PhD 2021)
  • Tencent Youtu Lab, Shenzhen, China


According to our database, Junwei Liang authored at least 60 papers between 2014 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Open-Vocabulary 3D Semantic Segmentation with Text-to-Image Diffusion Models.
CoRR, 2024

Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps.
CoRR, 2024

Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation.
CoRR, 2024

Vision-Language Models Meet Meteorology: Developing Models for Extreme Weather Events Detection with Heatmaps.
CoRR, 2024

Contrastive Imitation Learning for Language-guided Multi-Task Robotic Manipulation.
CoRR, 2024

Improving Gloss-free Sign Language Translation by Reducing Representation Density.
CoRR, 2024

FinTextQA: A Dataset for Long-form Financial Question Answering.
CoRR, 2024

VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting.
CoRR, 2024

Adversarially Masked Video Consistency for Unsupervised Domain Adaptation.
CoRR, 2024

Prioritized Semantic Learning for Zero-shot Instance Navigation.
CoRR, 2024

ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition.
CoRR, 2024

An Examination of the Compositionality of Large Generative Vision-Language Models.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024

FinTextQA: A Dataset for Long-form Financial Question Answering.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
GeoDeformer: Geometric Deformable Transformer for Action Recognition.
CoRR, 2023

AdaFocus: Towards End-to-end Weakly Supervised Learning for Long-Video Action Understanding.
CoRR, 2023

PostRainBench: A comprehensive benchmark and a new model for precipitation forecasting.
CoRR, 2023

PatchMixer: A Patch-Mixing Architecture for Long-Term Time Series Forecasting.
CoRR, 2023

TFNet: Exploiting Temporal Cues for Fast and Accurate LiDAR Semantic Segmentation.
CoRR, 2023

Spatial-Temporal Alignment Network for Action Recognition.
CoRR, 2023

STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

2022
Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Multi-dataset Training of Transformers for Robust Action Recognition.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Transformer-based System for Action Spotting in Soccer Videos.
Proceedings of the MMSports@MM 2022: Proceedings of the 5th International ACM Workshop on Multimedia Content Analysis in Sports, 2022


Stargazer: A Transformer-based Driver Action Detection System for Intelligent Transportation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022

2021
MSNet: A Multilevel Instance Segmentation Network for Natural Disaster Damage Assessment in Aerial Videos.
Proceedings of the IEEE Winter Conference on Applications of Computer Vision, 2021

Weakly Supervised 3D Semantic Segmentation Using Cross-Image Consensus and Inter-Voxel Affinity Relations.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

2020
Spatial-Temporal Alignment Network for Action Recognition and Detection.
CoRR, 2020

Joint Analysis and Prediction of Human Actions and Paths in Video.
CoRR, 2020

SimAug: Learning Robust Representations from 3D Simulation for Pedestrian Trajectory Prediction in Unseen Cameras.
CoRR, 2020

Argus: Efficient Activity Detection System for Extended Video Analysis.
Proceedings of the IEEE Winter Applications of Computer Vision Workshops, 2020

SimAug: Learning Robust Representations from Simulation for Trajectory Prediction.
Proceedings of the Computer Vision - ECCV 2020, 2020

The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Focal Visual-Text Attention for Memex Question Answering.
IEEE Trans. Pattern Anal. Mach. Intell., 2019

Technical Report of the DAISY System - Shooter Localization, Models, Interface, and Beyond.
CoRR, 2019

Minding the Gaps in a Video Action Analysis Pipeline.
Proceedings of the IEEE Winter Applications of Computer Vision Workshops, 2019

MMVG-INF-Etrol@TRECVID 2019: Activities in Extended Video.
Proceedings of the 2019 TREC Video Retrieval Evaluation, 2019

Shooter Localization Using Social Media Videos.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Peeking Into the Future: Predicting Future Person Activities and Locations in Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

Shooter Localization Using Videos in the Wild.
Proceedings of the 2019 International Conference on Content-Based Multimedia Indexing, 2019

2018
Multimodal Co-Training for Selecting Good Examples from Webly Labeled Video.
CoRR, 2018

Informedia @ TRECVID 2018: Ad-hoc Video Search, Video to Text Description, Activities in Extended video.
Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018

Multimodal Filtering of Social Media for Temporal Monitoring and Event Analysis.
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018

Focal Visual-Text Attention for Visual Question Answering.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

2017
MemexQA: Visual Memex Question Answering.
CoRR, 2017

Informedia @ TRECVID 2017.
Proceedings of the 2017 TREC Video Retrieval Evaluation, 2017

Leveraging Multi-modal Prior Knowledge for Large-scale Concept Learning in Noisy Web Data.
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, 2017

Temporal localization of audio events for conflict monitoring in social media.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Synchronization for multi-perspective videos in the wild.
Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, 2017

Webly-Supervised Learning of Multimodal Video Detectors.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

An Event Reconstruction Tool for Conflict Monitoring Using Social Media.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017

2016
Exploiting Multi-modal Curriculum in Noisy Web Data for Large-scale Concept Learning.
CoRR, 2016

Informedia @ TRECVID 2016.
Proceedings of the 2016 TREC Video Retrieval Evaluation, 2016

Video Description Generation using Audio and Visual Cues.
Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval, 2016

Generating Natural Video Descriptions via Multimodal Processing.
Proceedings of the 17th Annual Conference of the International Speech Communication Association, 2016

Learning to Detect Concepts from Webly-Labeled Video Data.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

2015
Semantic Concept Annotation For User Generated Videos Using Soundtracks.
Proceedings of the 5th ACM on International Conference on Multimedia Retrieval, 2015

Detecting semantic concepts in consumer videos using audio.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

RUC-Tencent at ImageCLEF 2015: Concept Detection, Localization and Sentence Generation.
Proceedings of the Working Notes of CLEF 2015, 2015

2014
Semantic Concept Annotation of Consumer Videos at Frame-Level Using Audio.
Proceedings of the Advances in Multimedia Information Processing - PCM 2014, 2014

