2025
Explore the Reasoning Capability of LLMs in the Chess Testbed.
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies, 2025
ToolGen: Unified Tool Retrieval and Calling via Generation.
Proceedings of the Thirteenth International Conference on Learning Representations, 2025
2024
G-VOILA: Gaze-Facilitated Information Querying in Daily Scenarios.
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., May, 2024
Voila-A: Aligning Vision-Language Models with User's Gaze Attention.
Proceedings of the Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, 2024
Exploring Diffusion Time-steps for Unsupervised Representation Learning.
Proceedings of the Twelfth International Conference on Learning Representations, 2024
AssistGUI: Task-Oriented PC Graphical User Interface Automation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
HORIZON: High-Resolution Semantically Controlled Panorama Synthesis.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024
2023
ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation.
CoRR, 2023
GroundNLQ @ Ego4D Natural Language Queries Challenge 2023.
CoRR, 2023
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn.
CoRR, 2023
TaskMatrix.AI: Completing Tasks by Connecting Foundation Models with Millions of APIs.
CoRR, 2023
EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023
MIST: Multi-modal Iterative Spatial-Temporal Transformer for Long-form Video Question Answering.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
KU-DMIS-MSRA at RadSum23: Pre-trained Vision-Language Model for Radiology Report Summarization.
Proceedings of the 22nd Workshop on Biomedical Natural Language Processing and BioNLP Shared Tasks, 2023
CONE: An Efficient COarse-to-fiNE Alignment Framework for Long Video Temporal Grounding.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023
2022
CLIP4Clip: An empirical study of CLIP for end to end video clip retrieval and captioning.
Neurocomputing, 2022
Multimodal graph neural network for video procedural captioning.
Neurocomputing, 2022
An Efficient COarse-to-fiNE Alignment Framework @ Ego4D Natural Language Queries Challenge 2022.
CoRR, 2022
HORIZON: A High-Resolution Panorama Synthesis Framework.
CoRR, 2022
Learning Temporal Video Procedure Segmentation from an Automatically Collected Large Dataset.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022
Trace Controlled Text to Image Generation.
Proceedings of the Computer Vision - ECCV 2022, 2022
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion.
Proceedings of the Computer Vision - ECCV 2022, 2022
2021
ScaleVLAD: Improving Multimodal Sentiment Analysis via Multi-Scale Fusion of Locally Descriptors.
CoRR, 2021
GODIVA: Generating Open-DomaIn Videos from nAtural Descriptions.
CoRR, 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval.
CoRR, 2021
XGPT: Cross-modal Generative Pre-Training for Image Captioning.
Proceedings of the Natural Language Processing and Chinese Computing, 2021
Learning from Inside: Self-driven Siamese Sampling and Reasoning for Video Question Answering.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021
Hybrid Reasoning Network for Video-based Commonsense Captioning.
Proceedings of the MM '21: ACM Multimedia Conference, Virtual Event, China, October 20, 2021
Control Image Captioning Spatially and Temporally.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
Hashing based Efficient Inference for Image-Text Matching.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021
GEM: A General Evaluation Benchmark for Multimodal Tasks.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021
Hierarchical Context-aware Network for Dense Video Event Captioning.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021
2020
Tag and Correct: Question aware Open Information Extraction with Two-stage Decoding.
CoRR, 2020
A Benchmark for Structured Procedural Knowledge Extraction from Cooking Videos.
CoRR, 2020
XGPT: Cross-modal Generative Pre-Training for Image Captioning.
CoRR, 2020
UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation.
CoRR, 2020
Learning Semantic Concepts and Temporal Alignment for Narrated Video Procedural Captioning.
Proceedings of the MM '20: The 28th ACM International Conference on Multimedia, 2020
GRACE: Gradient Harmonized and Cascaded Labeling for Aspect-based Sentiment Analysis.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020
Segment-Then-Rank: Non-Factoid Question Answering on Instructional Videos.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
Functionality Discovery and Prediction of Physical Objects.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020
2019
Microsoft Concept Graph: Mining Semantic Concepts for Short Text Understanding.
Data Intell., 2019
Knowledge Aware Semantic Concept Expansion for Image-Text Matching.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019
Dense Procedure Captioning in Narrated Instructional Videos.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019
2018
Concept and Attention-Based CNN for Question Retrieval in Multi-View Learning.
ACM Trans. Intell. Syst. Technol., 2018
R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering.
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018
2017
Can Machines Intelligently Propose Novel and Reasonable Scientific Hypotheses?
Proceedings of the 26th International Conference on World Wide Web Companion, 2017
Concept Embedded Convolutional Semantic Model for Question Retrieval.
Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, 2017
Sequence Modeling with Hierarchical Deep Generative Models with Dual Memory.
Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017
2016
Learning to Extract Conditional Knowledge for Question Answering using Dialogue.
Proceedings of the 25th ACM International Conference on Information and Knowledge Management, 2016
2015
Acronym Disambiguation Using Word Embedding.
Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015
2013
Learning open-domain comparable entity graphs from user search queries.
Proceedings of the 22nd ACM International Conference on Information and Knowledge Management, 2013
2012
Endless and Scalable Knowledge Table Extraction from Semi-structured Websites.
Proceedings of the 12th IEEE International Conference on Data Mining Workshops, 2012
2011
Sparse hidden-dynamics conditional random fields for user intent understanding.
Proceedings of the 20th International Conference on World Wide Web, 2011
Cross Domain Random Walk for Query Intent Pattern Mining from Search Engine Log.
Proceedings of the 11th IEEE International Conference on Data Mining, 2011
Extract knowledge from semi-structured websites for search task simplification.
Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011
Multi-view random walk framework for search task discovery from click-through log.
Proceedings of the 20th ACM Conference on Information and Knowledge Management, 2011
Collaborative Users' Brand Preference Mining across Multiple Domains from Implicit Feedbacks.
Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011
2010
Quantum Path Integral Inspired Query Sequence Suggestion for User Search Task Simplification.
Proceedings of the ICDMW 2010, 2010
2009
Search result re-ranking based on gap between search queries and social tags.
Proceedings of the 18th International Conference on World Wide Web, 2009
ExSearch: a novel vertical search engine for online barter business.
Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009
2007
A novel clustering-based RSS aggregator.
Proceedings of the 16th International Conference on World Wide Web, 2007
2005
Improving web search results using affinity graph.
SIGIR 2005: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2005
A Similarity Reinforcement Algorithm for Heterogeneous Web Pages.
Proceedings of the Web Technologies Research and Development - APWeb 2005, 7th Asia-Pacific Web Conference, Shanghai, China, March 29, 2005