2025
ReSpec: Relevance and Specificity Grounded Online Filtering for Learning on Video-Text Data Streams.
CoRR, April, 2025
How to Move Your Dragon: Text-to-Motion Synthesis for Large-Vocabulary Objects.
CoRR, March, 2025
Distilling Reinforcement Learning Algorithms for In-Context Model-Based Planning.
CoRR, February, 2025
When Meta-Learning Meets Online and Continual Learning: A Survey.
IEEE Trans. Pattern Anal. Mach. Intell., January, 2025
Is a Peeled Apple Still Red? Evaluating LLMs' Ability for Conceptual Combination with Property Type.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
Behavior-SD: Behaviorally Aware Spoken Dialogue Generation with Large Language Models.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
Meta-Continual Learning of Neural Fields.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
ViSAGe: Video-to-Spatial Audio Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
2024
See It All: Contextualized Late Aggregation for 3D Dense Captioning.
CoRR, 2024
Sample Selection via Contrastive Fragmentation for Noisy Label Regression.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
FedAvP: Augment Local Data via Shared Policy in Federated Learning.
Proceedings of the Advances in Neural Information Processing Systems 37: Annual Conference on Neural Information Processing Systems 2024, 2024
Learning to Continually Learn with the Bayesian Principle.
Proceedings of the Forty-first International Conference on Machine Learning, 2024
Compositional Conservatism: A Transductive Approach in Offline Reinforcement Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
DynamicER: Resolving Emerging Mentions to Dynamic Entities for RAG.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Spherical World-Locking for Audio-Visual Localization in Egocentric Videos.
Proceedings of the Computer Vision - ECCV 2024, 2024
Bi-directional Contextual Attention for 3D Dense Captioning.
Proceedings of the Computer Vision - ECCV 2024, 2024
ESR-NeRF: Emissive Source Reconstruction Using LDR Multi-View Images.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Who Wrote this Code? Watermarking for Code Generation.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
GrowOVER: How Can LLMs Adapt to Growing Real-World Knowledge?
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024
TimeChara: Evaluating Point-in-Time Character Hallucination of Role-Playing Large Language Models.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
See It All: Contextualized Late Aggregation for 3D Dense Captioning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
2023
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application.
CoRR, 2023
Recasting Continual Learning as Sequence Modeling.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Benchmark of Machine Learning Force Fields for Semiconductor Simulations: Datasets, Metrics, and Comparative Analysis.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Federated Learning via Meta-Variational Dropout.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
Dense 2D-3D Indoor Prediction with Sound via Aligned Cross-Modal Distillation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
EP2P-Loc: End-to-End 3D Point to 2D Pixel Localization for Large-Scale Visual Localization.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
mRedditSum: A Multimodal Abstractive Summarization Dataset of Reddit Threads with Images.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Can Language Models Laugh at YouTube Short-form Videos?
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023
Fusing Pre-Trained Language Models with Multimodal Prompts through Reinforcement Learning.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Recursion of Thought: A Divide-and-Conquer Approach to Multi-Context Reasoning with Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
KoSBI: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Applications.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: Industry Track, 2023
SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created through Human-Machine Collaboration.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
MPCHAT: Towards Multimodal Persona-Grounded Conversation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
2022
Guest Editorial Introduction to the Special Section on Video and Language.
IEEE Trans. Circuits Syst. Video Technol., 2022
SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization.
CoRR, 2022
Panoramic Vision Transformer for Saliency Detection in 360° Videos.
CoRR, 2022
LAVOLUTION: Measurement of Non-target Structural Displacement Calibrated by Structured Light.
CoRR, 2022
Multimodal Knowledge Alignment with Reinforcement Learning.
CoRR, 2022
Constrained GPI for Zero-Shot Transfer in Reinforcement Learning.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022
Lipschitz-constrained Unsupervised Skill Discovery.
Proceedings of the Tenth International Conference on Learning Representations, 2022
Neural Variational Dropout Processes.
Proceedings of the Tenth International Conference on Learning Representations, 2022
ProsocialDialog: A Prosocial Backbone for Conversational Agents.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Panoramic Vision Transformer for Saliency Detection in 360° Videos.
Proceedings of the Computer Vision - ECCV 2022, 2022
On Convergence of Lookahead in Smooth Games.
Proceedings of the International Conference on Artificial Intelligence and Statistics, 2022
2021
On the Virality of Animated GIFs on Tumblr.
CoRR, 2021
Cycled Compositional Learning between Images and Text.
CoRR, 2021
Automatic Curation of Large-Scale Datasets for Audio-Visual Representation Learning.
CoRR, 2021
Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
How Robust are Fact Checking Systems on Colloquial Claims?
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021
Unsupervised Representation Learning via Neural Activation Coding.
Proceedings of the 38th International Conference on Machine Learning, 2021
Unsupervised Skill Discovery with Bottleneck Option Learning.
Proceedings of the 38th International Conference on Machine Learning, 2021
Self-Supervised Learning of Compressed Video Representations.
Proceedings of the 9th International Conference on Learning Representations, 2021
SEDONA: Search for Decoupled Neural Networks toward Greedy Block-wise Learning.
Proceedings of the 9th International Conference on Learning Representations, 2021
Parameter Efficient Multimodal Transformers for Video Representation Learning.
Proceedings of the 9th International Conference on Learning Representations, 2021
Drop-Bottleneck: Learning Discrete Compressed Representation for Noise-Robust Exploration.
Proceedings of the 9th International Conference on Learning Representations, 2021
Contextual Label Transformation For Scene Graph Generation.
Proceedings of the 2021 IEEE International Conference on Image Processing, 2021
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Viewpoint-Agnostic Change Captioning with Cycle Consistency.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Continual Learning on Noisy Data Streams via Self-Purified Replay.
Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision, 2021
Perspective-taking and Pragmatics for Generating Empathetic Responses Focused on Emotion Causes.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
Transitional Adaptation of Pretrained Models for Visual Storytelling.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
StyleMix: Separating Content and Style for Enhanced Data Augmentation.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021
IB-GAN: Disentangled Representation Learning with Information Bottleneck Generative Adversarial Networks.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
Dual Compositional Learning in Interactive Image Retrieval.
Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, 2021
2020
LID 2020: The Learning from Imperfect Data Challenge Results.
CoRR, 2020
Public Self-consciousness for Endowing Dialogue Agents with Consistent Persona.
CoRR, 2020
CurlingNet: Compositional Learning between Images and Text for Fashion IQ Data.
CoRR, 2020
A Neural Dirichlet Process Mixture Model for Task-Free Continual Learning.
Proceedings of the 8th International Conference on Learning Representations, 2020
Sequential Latent Knowledge Selection for Knowledge-Grounded Dialogue.
Proceedings of the 8th International Conference on Learning Representations, 2020
Will I Sound Like Me? Improving Persona Consistency in Dialogues through Pragmatic Self-Consciousness.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020
Character Grounding and Re-identification in Story of Videos and Text Descriptions.
Proceedings of the Computer Vision - ECCV 2020, 2020
Model-Agnostic Boundary-Adversarial Sampling for Test-Time Generalization in Few-Shot Learning.
Proceedings of the Computer Vision - ECCV 2020, 2020
Imbalanced Continual Learning with Partitioning Reservoir Sampling.
Proceedings of the Computer Vision - ECCV 2020, 2020
Rethinking Class Activation Mapping for Weakly Supervised Object Localization.
Proceedings of the Computer Vision - ECCV 2020, 2020
Augmenting Data for Sarcasm Detection with Unlabeled Conversation Context.
Proceedings of the Second Workshop on Figurative Language Processing, 2020
2019
Towards Personalized Image Captioning via Multimodal Memory Networks.
IEEE Trans. Pattern Anal. Mach. Intell., 2019
Video Question Answering with Spatio-Temporal Reasoning.
Int. J. Comput. Vis., 2019
POL360: A Universal Mobile VR Motion Controller using Polarized Light.
Proceedings of the 25th ACM Symposium on Virtual Reality Software and Technology, 2019
Self-Routing Capsule Networks.
Proceedings of the Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, 2019
AudioCaps: Generating Captions for Audios in The Wild.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019
Abstractive Summarization of Reddit Posts with Multi-level Memory Networks.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019
Variational Laplace Autoencoders.
Proceedings of the 36th International Conference on Machine Learning, 2019
Curiosity-Bottleneck: Exploration By Distilling Task-Specific Novelty.
Proceedings of the 36th International Conference on Machine Learning, 2019
Discovery of Natural Language Concepts in Individual Units of CNNs.
Proceedings of the 7th International Conference on Learning Representations, 2019
Harmonizing Maximum Likelihood with GANs for Multimodal Conditional Generation.
Proceedings of the 7th International Conference on Learning Representations, 2019
Automating System Configuration of Distributed Machine Learning.
Proceedings of the 39th IEEE International Conference on Distributed Computing Systems, 2019
Better to Follow, Follow to Be Better: Towards Precise Supervision of Feature Super-Resolution for Small Object Detection.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019
Multi-Task Self-Supervised Object Detection via Recycling of Bounding Box Annotations.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019
2018
Retrieval of Sentence Sequences for an Image Stream via Coherence Recurrent Convolutional Networks.
IEEE Trans. Pattern Anal. Mach. Intell., 2018
A Hierarchical Latent Structure for Variational Conversation Modeling.
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2018
Video Prediction with Appearance and Motion Conditions.
Proceedings of the 35th International Conference on Machine Learning, 2018
Memorization Precedes Generation: Learning Unsupervised GANs with Memory Networks.
Proceedings of the 6th International Conference on Learning Representations, 2018
A Joint Sequence Fusion Model for Video Question Answering and Retrieval.
Proceedings of the Computer Vision - ECCV 2018, 2018
Improving Occlusion and Hard Negative Handling for Single-Stage Pedestrian Detectors.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
A Memory Network Approach for Story-Based Temporal Summarization of 360° Videos.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018
A Deep Ranking Model for Spatio-Temporal Highlight Detection From a 360° Video.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018
2017
Encoding Video and Label Priors for Multi-label Video Classification on YouTube-8M dataset.
CoRR, 2017
SplitNet: Learning to Semantically Split Deep Networks for Parameter Reduction and Model Parallelization.
Proceedings of the 34th International Conference on Machine Learning, 2017
A Read-Write Memory Network for Movie Story Understanding.
Proceedings of the IEEE International Conference on Computer Vision, 2017
End-to-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
Supervising Neural Attention Models for Video Captioning by Human Gaze Data.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
Attend to You: Personalized Image Captioning with Context Sequence Memory Networks.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering.
Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017
Detection and Recognition of Text Embedded in Online Images via Neural Context Models.
Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, 2017
2016
Video Captioning and Retrieval Models with Semantic Attention.
CoRR, 2016
A calibration method for optical see-through head-mounted displays with a depth camera.
Proceedings of the 2016 IEEE Virtual Reality, 2016
Taxonomy-Regularized Semantic Deep Convolutional Neural Networks.
Proceedings of the Computer Vision - ECCV 2016, 2016
2015
Poseidon: A System Architecture for Efficient GPU-based Deep Learning on Multiple Machines.
CoRR, 2015
Expressing an Image Stream with a Sequence of Natural Sentences.
Proceedings of the Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, 2015
Dynamic Topic Modeling for Monitoring Market Competition from Online Text and Image Data.
Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015
Discovering Collective Narratives of Theme Parks from Large Collections of Visitors' Photo Streams.
Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015
Storyline Representation of Egocentric Videos with an Application to Story-Based Search.
Proceedings of the 2015 IEEE International Conference on Computer Vision, 2015
Joint photo stream and blog post summarization and exploration.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015
Ranking and retrieval of image sequences from multiple paragraph queries.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015
Big/little deep neural network for ultra low power inference.
Proceedings of the 2015 International Conference on Hardware/Software Codesign and System Synthesis, 2015
2014
QuMinS: Fast and scalable querying, mining and summarizing multi-modal databases.
Inf. Sci., 2014
Visualizing brand associations from web community photos.
Proceedings of the Seventh ACM International Conference on Web Search and Data Mining, 2014
Reconstructing Storyline Graphs for Image Recommendation from Web Community Photos.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014
Joint Summarization of Large-Scale Collections of Web Images and Videos for Storyline Reconstruction.
Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014
2013
Time-sensitive web image ranking and retrieval via dynamic multi-task regression.
Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, 2013
Discovering Pictorial Brand Associations from Large-Scale Online Image Data.
Proceedings of the 2013 IEEE International Conference on Computer Vision Workshops, 2013
Jointly Aligning and Segmenting Multiple Web Photo Streams for the Inference of Collective Photo Storylines.
Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, 2013
2012
Web image prediction using multivariate point processes.
Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2012
On multiple foreground cosegmentation.
Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition, 2012
2011
Distributed cosegmentation via submodular optimization on anisotropic diffusion.
Proceedings of the IEEE International Conference on Computer Vision, 2011
A Guideline for an Outpatient Guidance System for Use in General Hospitals.
Proceedings of the Design, User Experience, and Usability. Theory, Methods, Tools and Practice, 2011
CheMO: mixed object instruments and interactions for tangible chemistry experiments.
Proceedings of the International Conference on Human Factors in Computing Systems, 2011
2010
QMAS: Querying, Mining and Summarization of Multi-modal Databases.
Proceedings of the ICDM 2010, 2010
Modeling and Analysis of Dynamic Behaviors of Web Image Collections.
Proceedings of the Computer Vision - ECCV 2010, 2010
2009
Unsupervised Detection of Regions of Interest Using Iterative Link Analysis.
Proceedings of the Advances in Neural Information Processing Systems 22: 23rd Annual Conference on Neural Information Processing Systems 2009. Proceedings of a meeting held 7-10 December 2009, 2009
Context-aware communication support system with pictographic cards.
Proceedings of the 11th Conference on Human-Computer Interaction with Mobile Devices and Services, 2009
An OWL-Based Knowledge Model for Combined-Process-and-Location Aware Service.
Proceedings of the Human Interface and the Management of Information. Information and Interaction, 2009
Process and Location-Aware Information Service System for the Disabled and the Elderly.
Proceedings of the Universal Access in Human-Computer Interaction. Applications and Services, 2009
Object Recognition with 3D Models.
Proceedings of the British Machine Vision Conference, 2009
2008
Segmentation of Salient Regions in Outdoor Scenes Using Imagery and 3-D Data.
Proceedings of the 9th IEEE Workshop on Applications of Computer Vision (WACV 2008), 2008
Unsupervised modeling and recognition of object categories with combination of visual contents and geometric similarity links.
Proceedings of the 1st ACM SIGMM International Conference on Multimedia Information Retrieval, 2008
Unsupervised modeling of object categories using link analysis techniques.
Proceedings of the 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2008), 2008
2007
Navigation Behavior Selection Using Generalized Stochastic Petri Nets for a Service Robot.
IEEE Trans. Syst. Man Cybern. Part C, 2007
Development of the multi-functional indoor service robot PSR systems.
Auton. Robots, 2007
2006
Tripodal Schematic Control Architecture for Integration of Multi-Functional Indoor Service Robots.
IEEE Trans. Ind. Electron., 2006
2005
Intellectual property management on MPEG-4 video for hand-held device and mobile video streaming service.
IEEE Trans. Consumer Electron., 2005
Experimental research of navigation behavior selection using generalized stochastic Petri nets (GSPN) for a tour-guide robot.
Proceedings of the 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, 2005
A Selection Framework of Multiple Navigation Primitives Using Generalized Stochastic Petri Nets.
Proceedings of the 2005 IEEE International Conference on Robotics and Automation, 2005
2004
An efficient methodology for multimedia digital rights management on mobile handset.
IEEE Trans. Consumer Electron., 2004
The autonomous tour-guide robot Jinny.
Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems, Sendai, Japan, September 28, 2004
Implementation of Multi-functional Service Robots using Tripodal Schematic Control Architecture.
Proceedings of the 2004 IEEE International Conference on Robotics and Automation, 2004
Integrated Navigation System for Indoor Service Robots in Large-scale Environments.
Proceedings of the 2004 IEEE International Conference on Robotics and Automation, 2004
Design of a Middleware and HIML (Human Interaction Markup Language) for Context Aware Services in a Ubiquitous Computing Environment.
Proceedings of the Embedded and Ubiquitous Computing, 2004
An Effective Adaptation of Encryption on MPEG-4 Video Streams for Digital Rights Management in a Ubiquitous Computing Environment.
Proceedings of the Embedded and Ubiquitous Computing, 2004
2003
Tripodal schematic design of the control architecture for the service robot PSR.
Proceedings of the 2003 IEEE International Conference on Robotics and Automation, 2003