Jie Lei

Affiliations:

Meta AI, Seattle, WA, USA
University of North Carolina at Chapel Hill, Department of Computer Science, NC, USA (PhD 2022)

According to our database, Jie Lei authored at least 22 papers between 2017 and 2023.

Collaborative distances:

Dijkstra number of four.
Erdős number of three.

Links

Online presence:

on jayleicn.github.io
on scholar.google.com

Bibliography

2023

PERCEIVER-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention.

Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

Vision Transformers are Parameter-Efficient Audio-Visual Learners.

[...], Gedas Bertasius

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

VindLU: A Recipe for Effective Video-and-Language Pretraining.

[...], David J. Crandall, [...], Gedas Bertasius

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Revealing Single Frame Bias for Video-and-Language Learning.

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022

LoopITR: Combining Dual and Cross Encoder Architectures for Image-Text Retrieval.

CoRR, 2022

Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners.

Zhenhailong Wang, [...]

Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

EclipSE: Efficient Long-Range Video Retrieval Using Sight and Sound.

[...], Gedas Bertasius

Proceedings of the Computer Vision - ECCV 2022, 2022

2021

QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries.

CoRR, 2021

VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning.

CoRR, 2021

VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation.

[...], William Yang Wang, [...]

Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks 1, 2021

Detecting Moments and Highlights in Videos via Natural Language Queries.

Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

DeCEMBERT: Learning from Noisy Instructional Videos via Dense Captions and Entropy Minimization.

Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Unifying Vision-and-Language Tasks via Text Generation.

Proceedings of the 38th International Conference on Machine Learning, 2021

Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models.

Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Less Is More: ClipBERT for Video-and-Language Learning via Sparse Sampling.

Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

mTVR: Multilingual Moment Retrieval in Videos.

Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020

What is More Likely to Happen Next? Video-and-Language Future Event Prediction.

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval.

Proceedings of the Computer Vision - ECCV 2020, 2020

TVQA+: Spatio-Temporal Grounding for Video Question Answering.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning.

Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2018

TVQA: Localized, Compositional Video Question Answering.

Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31, 2018

2017

Weakly Supervised Image Classification with Coarse and Fine Labels.

Proceedings of the 14th Conference on Computer and Robot Vision, 2017