Shizhe Chen

ORCID: 0000-0002-7313-9703

According to our database, Shizhe Chen authored at least 83 papers between 2014 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.


Bibliography

2024
SOD-diffusion: Salient Object Detection via Diffusion-Based Image Generators.
Comput. Graph. Forum, October, 2024

Towards Generalizable Vision-Language Robotic Manipulation: A Benchmark and LLM-guided 3D Policy.
CoRR, 2024

Conan-embedding: General Text Embedding with More and Better Negative Samples.
CoRR, 2024

ViViDex: Learning Vision-based Dexterous Manipulation from Human Videos.
CoRR, 2024

Think-Program-reCtify: 3D Situated Reasoning with Large Language Models.
CoRR, 2024

SUGAR: Pre-training 3D Visual Representations for Robotics.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
A Seawater Salinity Sensor Based on Optimized Long Period Fiber Grating in the Dispersion Turning Point.
Sensors, 2023

Translating Text Synopses to Video Storyboards.
CoRR, 2023

TeViS: Translating Text Synopses to Video Storyboards.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Robust Visual Sim-to-Real Transfer for Robotic Manipulation.
IROS, 2023

Object Goal Navigation with Recursive Implicit Maps.
IROS, 2023

Explore and Tell: Embodied Visual Captioning in 3D Environments.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

gSDF: Geometry-Driven Signed Distance Functions for 3D Hand-Object Reconstruction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation.
Proceedings of the Conference on Robot Learning, 2023

InfoMetIC: An Informative Metric for Reference-free Image Caption Evaluation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
Enhancing Neural Machine Translation With Dual-Side Multimodal Awareness.
IEEE Trans. Multim., 2022

Language Conditioned Spatial Relation Reasoning for 3D Object Grounding.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Few-Shot Action Recognition with Hierarchical Matching and Contrastive Learning.
Computer Vision - ECCV 2022, 2022

Learning from Unlabeled 3D Environments for Vision-and-Language Navigation.
Computer Vision - ECCV 2022, 2022

VRDFormer: End-to-End Video Visual Relation Detection with Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Instruction-driven history-aware policies for robotic manipulations.
Proceedings of the Conference on Robot Learning, 2022

2021
Development of Capacitive Rain Gauge for Marine Environment.
J. Sensors, 2021

Team RUC_AIM3 Technical Report at ActivityNet 2021: Entities Object Localization.
CoRR, 2021

WenLan: Bridging Vision and Language by Large-Scale Multi-Modal Pre-Training.
CoRR, 2021

A Continuous Space Location Model and a Particle Swarm Optimization-Based Heuristic Algorithm for Maximizing the Allocation of Ocean-Moored Buoys.
IEEE Access, 2021

History Aware Multimodal Transformer for Vision-and-Language Navigation.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Question-controlled Text-aware Image Captioning.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20-24, 2021

Product-oriented Machine Translation with Cross-modal Cross-lingual Pre-training.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20-24, 2021

MMPT'21: International Joint Workshop on Multi-Modal Pre-Training for Multimedia Understanding.
Proceedings of the ICMR '21: International Conference on Multimedia Retrieval, 2021

Airbert: In-domain Pretraining for Vision-and-Language Navigation.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Elaborative Rehearsal for Zero-shot Action Recognition.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Sketch, Ground, and Refine: Top-Down Dense Video Captioning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Towards Diverse Paragraph Captioning for Untrimmed Videos.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
The End-of-End-to-End: A Video Understanding Pentathlon Challenge (2020).
CoRR, 2020

2nd Place Solution to ECCV 2020 VIPriors Object Detection Challenge.
CoRR, 2020

Team RUC_AIM3 Technical Report at Activitynet 2020 Task 2: Exploring Sequential Events Detection for Dense Video Captioning.
CoRR, 2020

YouMakeup VQA Challenge: Towards Fine-grained Action Understanding in Domain-Specific Videos.
CoRR, 2020

RUC_AIM3 at TRECVID 2020: Ad-hoc Video Search & Video to Text Description.
Proceedings of the 2020 TREC Video Retrieval Evaluation, 2020

ICECAP: Information Concentrated Entity-aware Image Captioning.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020

Skeleton-Based Interactive Graph Network For Human Object Interaction Detection.
Proceedings of the IEEE International Conference on Multimedia and Expo, 2020

Fine-Grained Video-Text Retrieval With Hierarchical Graph Reasoning.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

Say As You Wish: Fine-Grained Control of Image Caption Generation With Abstract Scene Graphs.
Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020

2019
Generating Video Descriptions With Latent Topic Guidance.
IEEE Trans. Multim., 2019

Integrating Temporal and Spatial Attentions for VATEX Video Captioning Challenge 2019.
CoRR, 2019

Activitynet 2019 Task 3: Exploring Contexts for Dense Captioning Events in Videos.
CoRR, 2019

RUC_AIM3 at TRECVID 2019: Video to Text.
Proceedings of the 2019 TREC Video Retrieval Evaluation, 2019

Visual Relation Detection with Multi-Level Attention.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Relation Understanding in Videos.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Adversarial Domain Adaption for Multi-Cultural Dimensional Emotion Recognition in Dyadic Interactions.
Proceedings of the 9th International on Audio/Visual Emotion Challenge and Workshop, 2019

Neural Storyboard Artist: Visualizing Stories with Coherent Image Sequences.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Speech Emotion Recognition in Dyadic Dialogues with Attentive Interaction Modeling.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

From Words to Sentences: A Progressive Learning Approach for Zero-resource Machine Translation with Visual Pivots.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Cross-culture Multimodal Emotion Recognition with Adversarial Learning.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2019

YouMakeup: A Large-Scale Domain-Specific Multimodal Dataset for Fine-Grained Semantic Comprehension.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Semi-supervised Multimodal Emotion Recognition with Improved Wasserstein GANs.
Proceedings of the 2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2019

Unsupervised Bilingual Lexicon Induction from Mono-Lingual Multimodal Data.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
RUC+CMU: System Report for Dense Captioning Events in Videos.
CoRR, 2018

Informedia @ TRECVID 2018: Ad-hoc Video Search, Video to Text Description, Activities in Extended video.
Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018

Multimodal Dimensional and Continuous Emotion Recognition in Dyadic Video Interactions.
Advances in Multimedia Information Processing - PCM 2018, 2018

iMakeup: Makeup Instructional Video Dataset for Fine-Grained Dense Video Captioning.
Advances in Multimedia Information Processing - PCM 2018, 2018

Multi-modal Multi-cultural Dimensional Continues Emotion Recognition in Dyadic Interactions.
Proceedings of the 2018 on Audio/Visual Emotion Challenge and Workshop, 2018

Class-aware Self-Attention for Audio Event Recognition.
Proceedings of the 2018 ACM on International Conference on Multimedia Retrieval, 2018

RUC at MediaEval 2018: Visual and Textual Features Exploration for Predicting Media Memorability.
Working Notes Proceedings of the MediaEval 2018 Workshop, 2018

2017
Informedia @ TRECVID 2017.
Proceedings of the 2017 TREC Video Retrieval Evaluation, 2017

Knowing Yourself: Improving Video Caption via In-depth Recap.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Multimodal Multi-task Learning for Dimensional and Continuous Emotion Recognition.
Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, Mountain View, CA, USA, October 23, 2017

Video Captioning with Guidance of Multimodal Latent Topics.
Proceedings of the 2017 ACM on Multimedia Conference, 2017

Generating Video Descriptions with Topic Guidance.
Proceedings of the 2017 ACM on International Conference on Multimedia Retrieval, 2017

RUC at MediaEval 2017: Predicting Media Interestingness Task.
Working Notes Proceedings of the MediaEval 2017 Workshop co-located with the Conference and Labs of the Evaluation Forum (CLEF 2017), 2017

Emotion recognition with multimodal features and temporal models.
Proceedings of the 19th ACM International Conference on Multimodal Interaction, 2017

Facial Action Units Detection with Multi-Features and -AUs Fusion.
Proceedings of the 12th IEEE International Conference on Automatic Face & Gesture Recognition, 2017

2016
Describing Videos using Multi-modal Fusion.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

Multi-modal Conditional Attention Fusion for Dimensional Emotion Prediction.
Proceedings of the 2016 ACM Conference on Multimedia Conference, 2016

RUC at MediaEval 2016 Emotional Impact of Movies Task: Fusion of Multimodal Features.
Working Notes Proceedings of the MediaEval 2016 Workshop, 2016

RUC at MediaEval 2016: Predicting Media Interestingness Task.
Working Notes Proceedings of the MediaEval 2016 Workshop, 2016

Video emotion recognition in the wild based on fusion of multimodal features.
Proceedings of the 18th ACM International Conference on Multimodal Interaction, 2016

Emotion Recognition in Videos via Fusing Multimodal Features.
Proceedings of the Pattern Recognition - 7th Chinese Conference, 2016

2015
Speech Emotion Recognition Based on Acoustic Features.
Computer Science (计算机科学), 2015

Multi-modal Dimensional Emotion Recognition using Recurrent Neural Networks.
Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, 2015

Speech emotion recognition with acoustic and lexical features.
Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing, 2015

2014
Speech emotion classification using acoustic features.
Proceedings of the 9th International Symposium on Chinese Spoken Language Processing, 2014
