Jiuxiang Gu

According to our database, Jiuxiang Gu authored at least 83 papers between 2015 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
LoRA-Contextualizing Adaptation of Large Multimodal Models for Long Document Understanding.
CoRR, 2024

Personalization of Large Language Models: A Survey.
CoRR, 2024

A Survey of Small Language Models.
CoRR, 2024

VipAct: Visual-Perception Enhancement via Specialized VLM Agent Collaboration and Tool-use.
CoRR, 2024

ImageFolder: Autoregressive Image Generation with Folded Tokens.
CoRR, 2024

A Multi-LLM Debiasing Framework.
CoRR, 2024

MMR: Evaluating Reading Ability of Large Multimodal Models.
CoRR, 2024

Fast John Ellipsoid Computation with Differential Privacy Optimization.
CoRR, 2024

CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models.
CoRR, 2024

LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models.
CoRR, 2024

Differential Privacy of Cross-Attention with Provable Guarantee.
CoRR, 2024

Differential Privacy Mechanisms in Neural Tangent Kernel Regression.
CoRR, 2024

Toward Infinite-Long Prefix in Transformer.
CoRR, 2024

ARTIST: Improving the Generation of Text-rich Images by Disentanglement.
CoRR, 2024

Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation.
CoRR, 2024

DocSynthv2: A Practical Autoregressive Modeling for Document Generation.
CoRR, 2024

Unraveling the Smoothness Properties of Diffusion Models: A Gaussian Mixture Perspective.
CoRR, 2024

Tensor Attention Training: Provably Efficient Learning of Higher-order Transformers.
CoRR, 2024

Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers.
CoRR, 2024

Exploring the Frontiers of Softmax: Provable Optimization, Applications in Diffusion Model, and Beyond.
CoRR, 2024

Fourier Circuits in Neural Networks: Unlocking the Potential of Large Language Models in Mathematical Reasoning and Modular Arithmetic.
CoRR, 2024

Self-Cleaning: Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Category-Aware Active Domain Adaptation.
Proceedings of the Forty-first International Conference on Machine Learning, 2024

LRM: Large Reconstruction Model for Single Image to 3D.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ADOPD: A Large-Scale Document Page Decomposition Dataset.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

SOHES: Self-supervised Open-world Hierarchical Entity Segmentation.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

TextLap: Customizing Language Models for Text-to-Layout Planning.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Advancing Vision-Language Models with Adapter Ensemble Strategies.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Customization Assistant for Text-to-image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

TRINS: Towards Multimodal Language Models that Can Read.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

DocScript: Document-level Script Event Prediction.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Open World Entity Segmentation.
IEEE Trans. Pattern Anal. Mach. Intell., July, 2023

Improving a Named Entity Recognizer Trained on Noisy Data with a Few Clean Instances.
CoRR, 2023

Reflection-Tuning: Data Recycling Improves LLM Instruction-Tuning.
CoRR, 2023

LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding.
CoRR, 2023

AIMS: All-Inclusive Multi-Level Segmentation.
CoRR, 2023

LayerDoc: Layer-wise Extraction of Spatial Hierarchical Structure in Visually-Rich Documents.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023

AIMS: All-Inclusive Multi-Level Segmentation for Anything.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

High Quality Entity Segmentation.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning the Visualness of Text Using Large Vision-Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

A Critical Analysis of Document Out-of-Distribution Detection.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

DocEdit: Language-Guided Document Editing.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
Fine-Grained Entity Segmentation.
CoRR, 2022

Unified Pretraining Framework for Document Understanding.
CoRR, 2022

FedKC: Federated Knowledge Composition for Multilingual Natural Language Understanding.
Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022

Delving into Out-of-Distribution Detection with Vision-Language Representations.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

DocTime: A Document-level Temporal Dependency Graph Parser.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

DocLayoutTTS: Dataset and Baselines for Layout-informed Document-level Neural Speech Synthesis.
Proceedings of the 23rd Annual Conference of the International Speech Communication Association, 2022

MGDoc: Pre-training with Multi-granular Hierarchy for Document Image Understanding.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Meta Spatio-Temporal Debiasing for Video Scene Graph Generation.
Proceedings of the Computer Vision - ECCV 2022, 2022

Improving the Reliability for Confidence Estimation.
Proceedings of the Computer Vision - ECCV 2022, 2022

CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation.
Proceedings of the Computer Vision - ECCV 2022, 2022

Towards Language-Free Training for Text-to-Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

EI-CLIP: Entity-aware Interventional Contrastive Learning for E-commerce Cross-modal Retrieval.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

User-Entity Differential Privacy in Learning Natural Language Models.
Proceedings of the IEEE International Conference on Big Data, 2022

Learning Adaptive Axis Attentions in Fine-tuning: Beyond Fixed Sparse Attention Patterns.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

TiGAN: Text-Based Interactive Image Generation and Manipulation.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

UNISON: Unpaired Cross-Lingual Image Captioning.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
CaSP: Class-agnostic Semi-Supervised Pretraining for Detection and Segmentation.
CoRR, 2021

LAFITE: Towards Language-Free Training for Text-to-Image Generation.
CoRR, 2021

UniDoc: Unified Pretraining Framework for Document Understanding.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Towards Interpreting and Mitigating Shortcut Learning Behavior of NLU models.
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021

Exploiting Semantic Embedding and Visual Feature for Facial Action Unit Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

Multi-Scale Aligned Distillation for Low-Resolution Detection.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

SelfDoc: Self-Supervised Document Representation Learning.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021

2020
Video captioning with boundary-aware hierarchical language decoding and joint video prediction.
Neurocomputing, 2020

Unsupervised Cross-lingual Image Captioning.
CoRR, 2020

Self-Supervised Relationship Probing.
Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, 2020

Finding It at Another Side: A Viewpoint-Adapted Matching Encoder for Change Captioning.
Proceedings of the Computer Vision - ECCV 2020, 2020

2019
Resilient Load Restoration in Microgrids Considering Mobile Energy Storage Fleets: A Deep Reinforcement Learning Approach.
CoRR, 2019

Watch It Twice: Video Captioning with a Refocused Video Encoder.
Proceedings of the 27th ACM International Conference on Multimedia, 2019

Unpaired Image Captioning via Scene Graph Alignments.
Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision, 2019

Scene Graph Generation With External Knowledge and Image Reconstruction.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019

2018
Recent advances in convolutional neural networks.
Pattern Recognit., 2018

NTU ROSE Lab at TRECVID 2018: Ad-hoc Video Search and Video to Text.
Proceedings of the 2018 TREC Video Retrieval Evaluation, 2018

Unpaired Image Captioning by Language Pivoting.
Proceedings of the Computer Vision - ECCV 2018, 2018

Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval With Generative Models.
Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition, 2018

Stack-Captioning: Coarse-to-Fine Learning for Image Captioning.
Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, 2018

2017
An Empirical Study of Language CNN for Image Captioning.
Proceedings of the IEEE International Conference on Computer Vision, 2017

2016
Recurrent Highway Networks with Language CNN for Image Captioning.
CoRR, 2016

2015
Recent Advances in Convolutional Neural Networks.
CoRR, 2015

