Shentong Mo

ORCID: 0000-0003-3308-9585

According to our database, Shentong Mo authored at least 63 papers between 2018 and 2024.

Bibliography

2024
Context Autoencoder for Self-supervised Representation Learning.
Int. J. Comput. Vis., January, 2024

BSTG-Trans: A Bayesian Spatial-Temporal Graph Transformer for Long-Term Pose Forecasting.
IEEE Trans. Multim., 2024

Aligning Audio-Visual Joint Representations with an Agentic Workflow.
CoRR, 2024

Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning.
CoRR, 2024

Rethinking Positive Pairs in Contrastive Learning.
CoRR, 2024

Multi-scale Multi-instance Visual Sound Localization and Segmentation.
CoRR, 2024

MultiMed: Massively Multimodal and Multitask Medical Understanding.
CoRR, 2024

IoT-LM: Large Multisensory Language Models for the Internet of Things.
CoRR, 2024

Semantic Grouping Network for Audio Source Separation.
CoRR, 2024

Efficient 3D Shape Generation via Diffusion Mamba with Bidirectional SSMs.
CoRR, 2024

DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture.
CoRR, 2024

Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation.
CoRR, 2024

Unified Video-Language Pre-training with Synchronized Audio.
CoRR, 2024

A Large-scale Medical Visual Task Adaptation Benchmark.
CoRR, 2024

DailyMAE: Towards Pretraining Masked Autoencoders in One Day.
CoRR, 2024

Text-to-Audio Generation Synchronized with Videos.
CoRR, 2024

LSPT: Long-term Spatial Prompt Tuning for Visual Representation Learning.
CoRR, 2024

We Choose to Go to Space: Agent-driven Human and Multi-Robot Collaboration in Microgravity.
CoRR, 2024

Tree of Uncertain Thoughts Reasoning for Large Language Models.
Proceedings of the IEEE International Conference on Acoustics, 2024

Audio-Synchronized Visual Animation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation.
Proceedings of the Computer Vision - ECCV 2024, 2024

Audio-Visual Generalized Zero-Shot Learning the Easy Way.
Proceedings of the Computer Vision - ECCV 2024, 2024

Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions Through Masked Modeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

2023
High-Modality Multimodal Transformer: Quantifying Modality & Interaction Heterogeneity for High-Modality Representation Learning.
Trans. Mach. Learn. Res., 2023

Beyond Accuracy: Statistical Measures and Benchmark for Evaluation of Representation from Self-Supervised Learning.
CoRR, 2023

MultiIoT: Towards Large-scale Multisensory Learning for the Internet of Things.
CoRR, 2023

Exploring Data Augmentations on Self-/Semi-/Fully- Supervised Pre-trained Models.
CoRR, 2023

Masked Momentum Contrastive Learning for Zero-shot Semantic Understanding.
CoRR, 2023

DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation.
CoRR, 2023

DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment.
CoRR, 2023

AV-SAM: Segment Anything Model Meets Audio-Visual Localization and Segmentation.
CoRR, 2023

CAVL: Learning Contrastive and Adaptive Representations of Vision and Language.
CoRR, 2023

Variantional autoencoder with decremental information bottleneck for disentanglement.
CoRR, 2023

Multi-level Contrastive Learning for Self-Supervised Vision Transformers.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

Representation Disentanglement in Generative Models with Contrastive Learning.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

DiT-3D: Exploring Plain Diffusion Transformers for 3D Shape Generation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Weakly-Supervised Audio-Visual Segmentation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

DiffComplete: Diffusion-based Generative 3D Shape Completion.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

A Unified Audio-Visual Learning Framework for Localization, Separation, and Recognition.
Proceedings of the International Conference on Machine Learning, 2023

Audio-Visual Class-Incremental Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Class-Incremental Grouping Network for Continual Audio-Visual Learning.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Audio-Visual Grouping Network for Sound Localization from Mixtures.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Variational Autoencoders with Decremental Information Bottleneck for Disentanglement.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

2022
Object-wise Masked Autoencoders for Fast Pre-training.
CoRR, 2022

Multi-Scale Self-Contrastive Learning with Hard Negative Mining for Weakly-Supervised Query-based Video Grounding.
CoRR, 2022

HighMMT: Towards Modality and Task Generalization for High-Modality Representation Learning.
CoRR, 2022

Multi-modal Grouping Network for Weakly-Supervised Audio-Visual Video Parsing.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

A Closer Look at Weakly-Supervised Audio-Visual Source Localization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

Localizing Visual Sounds the Easy Way.
Proceedings of the Computer Vision - ECCV 2022, 2022

Unitail: Detecting, Reading, and Matching in Retail Scene.
Proceedings of the Computer Vision - ECCV 2022, 2022

Rethinking Prototypical Contrastive Learning through Alignment, Uniformity and Correlation.
Proceedings of the 33rd British Machine Vision Conference 2022, 2022

2021
Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types.
CoRR, 2021

Learning by Examples Based on Multi-level Optimization.
CoRR, 2021

An Empirical Study of Uncertainty Gap for Disentangling Factors.
Proceedings of the Trustworthy AI'21: Proceedings of the 1st International Workshop on Trustworthy AI for Multimedia Computing, 2021

OsGG-Net: One-step Graph Generation Network for Unbiased Head Pose Estimation.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021

EVA-GCN: Head Pose Estimation Based on Graph Convolutional Networks.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Long-Term Head Pose Forecasting Conditioned on the Gaze-Guiding Prior.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2021

Point3D: tracking actions as moving points with 3D CNNs.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

Siamese Prototypical Contrastive Learning.
Proceedings of the 32nd British Machine Vision Conference 2021, 2021

2020
Towards Improving Spatiotemporal Action Recognition in Videos.
CoRR, 2020

Automatic Speech Verification Spoofing Detection.
CoRR, 2020

2018
SERS spectrum of RHB solution measured on different patterns.
Dataset, November, 2018

