Youngjae Yu

ORCID: 0000-0002-5867-0782

According to our database, Youngjae Yu authored at least 64 papers between 2016 and 2024.

Bibliography

2024
CLARA: Classifying and Disambiguating User Commands for Reliable Interactive Robotic Agents.
IEEE Robotics Autom. Lett., February, 2024

Towards Visual Text Design Transfer Across Languages.
CoRR, 2024

C²: Scalable Auto-Feedback for LLM-based Chart Generation.
CoRR, 2024

CANVAS: Commonsense-Aware Navigation System for Intuitive Human-Robot Interaction.
CoRR, 2024

DEEPTalk: Dynamic Emotion Embedding for Probabilistic Speech-Driven 3D Face Animation.
CoRR, 2024

Layout-and-Retouch: A Dual-stage Framework for Improving Diversity in Personalized Image Generation.
CoRR, 2024

Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation.
CoRR, 2024

Do LLMs Have Distinct and Consistent Personality? TRAIT: Personality Testset designed for LLMs with Psychometrics.
CoRR, 2024

i-SRT: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective Judgment.
CoRR, 2024

Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models.
CoRR, 2024

Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset.
CoRR, 2024

SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

How to Train Your Fact Verifier: Knowledge Transfer with Multimodal Open Models.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Cactus: Towards Psychological Counseling Conversations using Cognitive Behavioral Theory.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Selective Vision is the Challenge for Visual Reasoning: A Benchmark for Visual Argument Understanding.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Language Models as Compilers: Simulating Pseudocode Execution Improves Algorithmic Reasoning in Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

ActionSwitch: Class-Agnostic Detection of Simultaneous Actions in Streaming Videos.
Proceedings of the Computer Vision - ECCV 2024, 2024

Aligning Large Language Models by On-Policy Self-Judgment.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Pearl: A Review-driven Persona-Knowledge Grounded Conversational Recommendation Dataset.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Can Large Language Models be Good Emotional Supporter? Mitigating Preference Bias on Emotional Support Conversation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024

2023
Long Story Short: a Summarize-then-Search Method for Long Video Question Answering.
CoRR, 2023

Multimodal C4: An Open, Billion-scale Corpus of Images Interleaved with Text.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Localized Symbolic Knowledge Distillation for Visual Commonsense Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Zero-shot Active Visual Search (ZAVIS): Intelligent Object Search for Robotic Assistants.
Proceedings of the IEEE International Conference on Robotics and Automation, 2023

Champagne: Learning Real-world Conversation from Large-Scale Web Videos.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

VLIS: Unimodal Language Models Guide Multimodal Language Generation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Dialogue Chain-of-Thought Distillation for Commonsense-aware Conversational Agents.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Fusing Pre-Trained Language Models with Multimodal Prompts through Reinforcement Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023

Long Story Short: a Summarize-then-Search Method for Prompt-Based Long Video Question Answering.
Proceedings of the 34th British Machine Vision Conference 2023, 2023

Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization.
CoRR, 2022

Learning Joint Representation of Human Motion and Language.
CoRR, 2022

Active Visual Search in the Wild.
CoRR, 2022

Multimodal Knowledge Alignment with Reinforcement Learning.
CoRR, 2022

Connecting the Dots between Audio and Text without Parallel Data through Visual Knowledge Transfer.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

NeuroLogic A*esque Decoding: Constrained Text Generation with Lookahead Heuristics.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

ProsocialDialog: A Prosocial Backbone for Conversational Agents.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

MERLOT RESERVE: Neural Script Knowledge through Vision and Language and Sound.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

2021
Cycled Compositional Learning between Images and Text.
CoRR, 2021

Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning.
CoRR, 2021

MERLOT: Multimodal Neural Script Knowledge Models.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Self-Supervised Learning of Compressed Video Representations.
Proceedings of the 9th International Conference on Learning Representations, 2021

Parameter Efficient Multimodal Transformers for Video Representation Learning.
Proceedings of the 9th International Conference on Learning Representations, 2021

Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021

Transitional Adaptation of Pretrained Models for Visual Storytelling.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Dual Compositional Learning in Interactive Image Retrieval.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021

2020
CurlingNet: Compositional Learning between Images and Text for Fashion IQ Data.
CoRR, 2020

Character Grounding and Re-identification in Story of Videos and Text Descriptions.
Proceedings of the Computer Vision - ECCV 2020, 2020

Augmenting Data for Sarcasm Detection with Unlabeled Conversation Context.
Proceedings of the Second Workshop on Figurative Language Processing, 2020

2019
Video Question Answering with Spatio-Temporal Reasoning.
Int. J. Comput. Vis., 2019

2018
A Joint Sequence Fusion Model for Video Question Answering and Retrieval.
Proceedings of the Computer Vision - ECCV 2018, 2018

A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

A Deep Ranking Model for Spatio-Temporal Highlight Detection From a 360° Video.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset.
CoRR, 2017

TimesVector: a vectorized clustering approach to the analysis of time series transcriptome data from multiple phenotypes.
Bioinform., 2017

End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

Supervising Neural Attention Models for Video Captioning by Human Gaze Data.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017

2016
Video Captioning and Retrieval Models with Semantic Attention.
CoRR, 2016

