Haiyang Xu

Orcid: 0009-0005-1998-1827

According to our database, Haiyang Xu authored at least 83 papers between 1998 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
UniQRNet: Unifying Referring Expression Grounding and Segmentation with QRNet.
ACM Trans. Multim. Comput. Commun. Appl., August, 2024

Feature Mixture on Pre-Trained Model for Few-Shot Learning.
IEEE Trans. Image Process., 2024

SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing.
CoRR, 2024

mPLUG-DocOwl2: High-resolution Compressing for OCR-free Multi-page Document Understanding.
CoRR, 2024

MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model.
CoRR, 2024

mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models.
CoRR, 2024

Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration.
CoRR, 2024

TinyChart: Efficient Chart Understanding with Visual Token Merging and Program-of-Thoughts Learning.
CoRR, 2024

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding.
CoRR, 2024

Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection.
CoRR, 2024

Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception.
CoRR, 2024

Sparse Mixture of Experts Language Models Excel in Knowledge Distillation.
Proceedings of the Natural Language Processing and Chinese Computing, 2024

Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

mPLUG-PaperOwl: Scientific Diagram Analysis with the Multimodal Large Language Model.
Proceedings of the 32nd ACM International Conference on Multimedia, MM 2024, Melbourne, VIC, Australia, 28 October 2024, 2024

Halflife: An Adaptive Flowlet-based Load Balancer with Fading Timeout in Data Center Networks.
Proceedings of the Nineteenth European Conference on Computer Systems, 2024

TinyChart: Efficient Chart Understanding with Program-of-Thoughts Learning and Visual Token Merging.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

MIBench: Evaluating Multimodal Large Language Models over Multiple Images.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Bayesian Diffusion Models for 3D Shape Reconstruction.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

OmniControlNet: Dual-stage Integration for Conditional Image Generation.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024

Unifying Latent and Lexicon Representations for Effective Video-Text Retrieval.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

TiMix: Text-Aware Image Mixing for Effective Vision-Language Pre-training.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
Learning Video-Text Aligned Representations for Video Captioning.
ACM Trans. Multim. Comput. Commun. Appl., 2023

Achieving Human Parity on Visual Question Answering.
ACM Trans. Inf. Syst., 2023

Analysis and evaluation of hemiplegic gait based on wearable sensor network.
Inf. Fusion, 2023

mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration.
CoRR, 2023

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model.
CoRR, 2023

ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models.
CoRR, 2023

Evaluation and Analysis of Hallucination in Large Vision-Language Models.
CoRR, 2023

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding.
CoRR, 2023

Vision Transformer with Attention Map Hallucination and FFN Compaction.
CoRR, 2023

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Dataset for Pre-training and Benchmarks.
CoRR, 2023

Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation.
CoRR, 2023

Transforming Visual Scene Graphs to Image Captions.
CoRR, 2023

mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality.
CoRR, 2023

ChatPLUG: Open-Domain Generative Dialogue System with Internet-Augmented Instruction Tuning for Digital Human.
CoRR, 2023

Adaptively Clustering Neighbor Elements for Image Captioning.
CoRR, 2023

mPLUG-Octopus: The Versatile Assistant Empowered by A Modularized End-to-End Multimodal LLM.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Learning Semantics-Grounded Vocabulary Representation for Video-Text Retrieval.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

COPA: Efficient Vision-Language Pre-training through Collaborative Object- and Patch-Text Alignment.
Proceedings of the 31st ACM International Conference on Multimedia, 2023

Curriculum Multi-Level Learning for Imbalanced Live-Stream Recommendation.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video.
Proceedings of the International Conference on Machine Learning, 2023

HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

Learning Trajectory-Word Alignments for Video-Language Tasks.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization.
Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023

UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Towards Adaptive Prefix Tuning for Parameter-Efficient Language Model Fine-tuning.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 2023

Transforming Visual Scene Graphs to Image Captions.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Vision Language Pre-training by Contrastive Learning with Cross-Modal Similarity Regulation.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
A Bi-Objective Learn-and-Deploy Scheduling Method for Bursty and Stochastic Requests on Heterogeneous Cloud Servers.
IEEE Trans. Parallel Distributed Syst., 2022

Real-time numerical system convertor via two-dimensional WS2-based memristive device.
Frontiers Comput. Neurosci., 2022

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections.
CoRR, 2022

Image Captioning In the Transformer Age.
CoRR, 2022

mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

TRIPS: Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

Control-based Bidding for Mobile Livestreaming Ads with Exposure Guarantee.
Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022

2021
Achieving Human Parity on Visual Question Answering.
CoRR, 2021

Grid-VLP: Revisiting Grid Features for Vision-Language Pre-training.
CoRR, 2021

SemVLP: Vision-Language Pre-training by Aligning Semantics at Multiple Levels.
CoRR, 2021

We Know What You Want: An Advertising Strategy Recommender System for Online Advertising.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Multi-objective Dynamic Auction Mechanism for Online Advertising.
Proceedings of the IEEE International Performance, 2021

A Two-phase Constrained Multi-Objective Evolutionary Algorithm Based on the Constrained Decomposition Approach.
Proceedings of the 7th IEEE International Conference on Cloud Computing and Intelligent Systems, 2021

E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Storyline extraction from news articles with dynamic dependency.
Intell. Data Anal., 2020

Adversarial Multi-Binary Neural Network for Multi-class Classification.
CoRR, 2020

Selective Attention Encoders by Syntactic Graph Convolutional Networks for Document Summarization.
Proceedings of the 2020 IEEE International Conference on Acoustics, 2020

Learning to Infer User Hidden States for Online Sequential Advertising.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

A Deep Prediction Network for Understanding Advertiser Intent and Satisfaction.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

Neural Topic Modeling with Bidirectional Adversarial Training.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
DELTA: A DEep learning based Language Technology plAtform.
CoRR, 2019

Learning Alignment for Multimodal Emotion Recognition from Speech.
Proceedings of the 20th Annual Conference of the International Speech Communication Association, 2019

Learning Syntactic and Dynamic Selective Encoding for Document Summarization.
Proceedings of the International Joint Conference on Neural Networks, 2019

NVSRN: A Neural Variational Scaling Reasoning Network for Initiative Response Generation.
Proceedings of the 2019 IEEE International Conference on Data Mining, 2019

2016
Unsupervised Storyline Extraction from News Articles.
Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, 2016

2015
An Unsupervised Bayesian Modelling Approach for Storyline Detection on News Articles.
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 2015

2014
A Formal Transformation Approach for Embedded Software Modeling.
J. Softw., 2014

2007
Probabilistic Logic Operator based on Zero-order N/T-Norms Complete Clusters.
Proceedings of the 8th ACIS International Conference on Software Engineering, 2007

1998
Stochastic volatility in interest rates and nonlinearity in velocity.
Int. J. Syst. Sci., 1998
