Weizhu Chen

According to our database, Weizhu Chen authored at least 158 papers between 2007 and 2024.

Bibliography

2024
MTL-LoRA: Low-Rank Adaptation for Multi-Task Learning.
CoRR, 2024

GRIN: GRadient-INformed MoE.
CoRR, 2024

Arena Learning: Build Data Flywheel for LLMs Post-training via Simulated Chatbot Arena.
CoRR, 2024

Samba: Simple Hybrid State Space Models for Efficient Unlimited Context Language Modeling.
CoRR, 2024

Self-Augmented Preference Optimization: Off-Policy Paradigms for Language Model Alignment.
CoRR, 2024

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone.
CoRR, 2024

Rho-1: Not All Tokens Are What You Need.
CoRR, 2024

A Note on LoRA.
CoRR, 2024

Exploring the Mystery of Influential Data for Mathematical Reasoning.
CoRR, 2024

Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning.
CoRR, 2024

Multi-LoRA Composition for Image Generation.
CoRR, 2024

SciAgent: Tool-augmented Language Models for Scientific Reasoning.
CoRR, 2024

Relative Preference Optimization: Enhancing LLM Alignment through Contrasting Responses across Identical and Diverse Prompts.
CoRR, 2024

AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators.
Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track, 2024

Language Models can be Deductive Solvers.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2024, 2024

Supervised Knowledge Makes Large Language Models Better In-context Learners.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

LoftQ: LoRA-Fine-Tuning-aware Quantization for Large Language Models.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Automatic Instruction Evolving for Large Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Can LLMs Learn From Mistakes? An Empirical Study on Reasoning Tasks.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

Competition-Level Problems are Effective LLM Evaluators.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Language Models can be Logical Solvers.
CoRR, 2023

Learning From Mistakes Makes LLM Better Reasoner.
CoRR, 2023

Sparse Backpropagation for MoE Training.
CoRR, 2023

Enhancing Large Language Models in Coding Through Multi-Perspective Self-Consistency.
CoRR, 2023

Deep Reinforcement Learning from Hierarchical Weak Preference Feedback.
CoRR, 2023

GRILL: Grounded Vision-language Pre-training via Aligning Text and Image Regions.
CoRR, 2023

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation.
CoRR, 2023

Check Your Facts and Try Again: Improving Large Language Models with External Knowledge and Automated Feedback.
CoRR, 2023

AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Patch Diffusion: Faster and More Data-Efficient Training of Diffusion Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

In-Context Learning Unlocked for Diffusion Models.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Meet in the Middle: A New Pre-training Paradigm.
Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, 2023

Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models.
Proceedings of the International Conference on Machine Learning, 2023

HyperTuning: Toward Adapting Large Language Models without Back-propagation.
Proceedings of the International Conference on Machine Learning, 2023

Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise.
Proceedings of the International Conference on Machine Learning, 2023

Less is More: Task-aware Layer-wise Distillation for Language Model Compression.
Proceedings of the International Conference on Machine Learning, 2023

LoSparse: Structured Compression of Large Language Models based on Low-Rank and Sparse Approximation.
Proceedings of the International Conference on Machine Learning, 2023

Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

Diffusion-GAN: Training GANs with Diffusion.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

CodeT: Code Generation with Generated Tests.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Skill-Based Few-Shot Selection for In-Context Learning.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Joint Generator-Ranker Learning for Natural Language Generation.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Code Execution with Pre-trained Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

Making Language Models Better Reasoners with Step-Aware Verifier.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

DSEE: Dually Sparsity-embedded Efficient Tuning of Pre-trained Language Models.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

2022
GENIE: Large Scale Pre-training for Text Generation with Diffusion Model.
CoRR, 2022

Generation-Augmented Query Expansion For Code Retrieval.
CoRR, 2022

GENIUS: Sketch-based Language Model Pre-training via Extreme and Selective Masking for Text Generation and Augmentation.
CoRR, 2022

SimANS: Simple Ambiguous Negatives Sampling for Dense Text Retrieval.
CoRR, 2022

On the Advance of Making Language Models Better Reasoners.
CoRR, 2022

A Self-Paced Mixed Distillation Method for Non-Autoregressive Generation.
CoRR, 2022

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer.
CoRR, 2022

Input-Tuning: Adapting Unfamiliar Inputs to Frozen Pretrained Models.
CoRR, 2022

Truncated Diffusion Probabilistic Models.
CoRR, 2022

Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs.
CoRR, 2022

CodeRetriever: Unimodal and Bimodal Contrastive Learning.
CoRR, 2022

Virtual information core optimization for collaborative filtering recommendation based on clustering and evolutionary algorithms.
Appl. Soft Comput., 2022

MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

ALLSH: Active Learning Guided by Local Sensitivity and Hardness.
Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2022, 2022

OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

CERT: Continual Pre-training on Sketches for Library-oriented Code Generation.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance.
Proceedings of the International Conference on Machine Learning, 2022

Adversarial Retriever-Ranker for Dense Text Retrieval.
Proceedings of the Tenth International Conference on Learning Representations, 2022

TAPEX: Table Pre-training via Learning a Neural SQL Executor.
Proceedings of the Tenth International Conference on Learning Representations, 2022

No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models.
Proceedings of the Tenth International Conference on Learning Representations, 2022

LoRA: Low-Rank Adaptation of Large Language Models.
Proceedings of the Tenth International Conference on Learning Representations, 2022

Reasoning Like Program Executors.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

CodeRetriever: A Large Scale Contrastive Pre-Training Method for Code Search.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Soft-Labeled Contrastive Pre-Training for Function-Level Code Representation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, 2022

Scalable Learning to Optimize: A Learned Optimizer Can Train Big Models.
Proceedings of the Computer Vision - ECCV 2022, 2022

Controllable Natural Language Generation with Contrastive Prefixes.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

Finding the Dominant Winning Ticket in Pre-Trained Language Models.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, 2022

DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

What Makes Good In-Context Examples for GPT-3?
Proceedings of Deep Learning Inside Out: The 3rd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, 2022

XLM-K: Improving Cross-Lingual Language Model Pre-training with Multilingual Knowledge.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

2021
LoRA: Low-Rank Adaptation of Large Language Models.
CoRR, 2021

HiddenCut: Simple Data Augmentation for Natural Language Understanding with Better Generalization.
CoRR, 2021

Adversarial Training as Stackelberg Game: An Unrolled Optimization Approach.
CoRR, 2021

Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer.
Proceedings of the Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, 2021

Contextual Bandit Applications in a Customer Support Bot.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Poolingformer: Long Document Modeling with Pooling Attention.
Proceedings of the 38th International Conference on Machine Learning, 2021

BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining.
Proceedings of the 38th International Conference on Machine Learning, 2021

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding.
Proceedings of the 9th International Conference on Learning Representations, 2021

MixKD: Towards Efficient Distillation of Large-scale Language Models.
Proceedings of the 9th International Conference on Learning Representations, 2021

DeBERTa: Decoding-enhanced BERT with Disentangled Attention.
Proceedings of the 9th International Conference on Learning Representations, 2021

Adversarial Regularization as Stackelberg Game: An Unrolled Optimization Approach.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

ARCH: Efficient Adversarial Regularized Training with Caching.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Token-wise Curriculum Learning for Neural Machine Translation.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, 2021

Finetuning Pretrained Transformers into RNNs.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Few-Shot Named Entity Recognition: An Empirical Baseline Study.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Memory-Efficient Differentiable Transformer Architecture Search.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

Reader-Guided Passage Reranking for Open-Domain Question Answering.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

Generation-Augmented Retrieval for Open-Domain Question Answering.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

GLGE: A New General Language Generation Evaluation Benchmark.
Proceedings of the Findings of the Association for Computational Linguistics: ACL/IJCNLP 2021, 2021

Super Tickets in Pre-Trained Language Models: From Model Compression to Improving Generalization.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

HiddenCut: Simple Data Augmentation for Natural Language Understanding with Better Generalizability.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

UnitedQA: A Hybrid Approach for Open Domain Question Answering.
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, 2021

2020
Few-Shot Named Entity Recognition: A Comprehensive Study.
CoRR, 2020

Improving Self-supervised Pre-training via a Fully-Explored Masked Language Model.
CoRR, 2020

A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation.
CoRR, 2020

Example-Based Named Entity Recognition.
CoRR, 2020

Adversarial Training for Large Neural Language Models.
CoRR, 2020

Conditional Self-Attention for Query-based Summarization.
CoRR, 2020

On the Variance of the Adaptive Learning Rate and Beyond.
Proceedings of the 8th International Conference on Learning Representations, 2020

Exploiting Structured Knowledge in Text via Graph-Guided Representation Learning.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

Understanding the Difficulty of Training Transformers.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020

The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, 2020

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization.
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020

2019
DSCOVR: Randomized Primal-Dual Block Coordinate Algorithms for Asynchronous Distributed Optimization.
J. Mach. Learn. Res., 2019

X-SQL: reinforce schema representation with context.
CoRR, 2019

A Hybrid Neural Network Model for Commonsense Reasoning.
CoRR, 2019

Lessons from Real-World Reinforcement Learning in a Customer Support Bot.
CoRR, 2019

Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding.
CoRR, 2019

Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Open-domain Question Answering.
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2019

Parameter-free Sentence Embedding via Orthogonal Basis.
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, 2019

Multi-Task Deep Neural Networks for Natural Language Understanding.
Proceedings of the 57th Conference of the Association for Computational Linguistics, 2019

2018
Zero-training Sentence Embedding via Orthogonal Basis.
CoRR, 2018

IncSQL: Training Incremental Text-to-SQL Parsers with Non-Deterministic Oracles.
CoRR, 2018

Learning to Attend On Essential Terms: An Enhanced Retriever-Reader Model for Scientific Question Answering.
CoRR, 2018

FusionNet: Fusing via Fully-aware Attention with Application to Machine Comprehension.
Proceedings of the 6th International Conference on Learning Representations, 2018

2017
Limited-memory Common-directions Method for Distributed Optimization and its Application on Empirical Risk Minimization.
Proceedings of the 2017 SIAM International Conference on Data Mining, 2017

2016
ReasoNet: Learning to Stop Reading in Machine Comprehension.
Proceedings of the Workshop on Cognitive Computation: Integrating neural and symbolic approaches 2016 co-located with the 30th Annual Conference on Neural Information Processing Systems (NIPS 2016), 2016

2014
Large-scale L-BFGS using MapReduce.
Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, 2014

Transfer Understanding from Head Queries to Tail Queries.
Proceedings of the 23rd ACM International Conference on Information and Knowledge Management, 2014

2012
Personalized click model through collaborative filtering.
Proceedings of the Fifth International Conference on Web Search and Web Data Mining, 2012

A noise-aware click model for web search.
Proceedings of the Fifth International Conference on Web Search and Web Data Mining, 2012

Beyond ten blue links: enabling user click modeling in federated web search.
Proceedings of the Fifth International Conference on Web Search and Web Data Mining, 2012

2011
Characterizing search intent diversity into click models.
Proceedings of the 20th International Conference on World Wide Web, 2011

Action prediction and identification from mining temporal user behaviors.
Proceedings of the Fourth International Conference on Web Search and Web Data Mining, 2011

User-click modeling for understanding and predicting search-behavior.
Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2011

Short Text Conceptualization Using a Probabilistic Knowledgebase.
Proceedings of the IJCAI 2011, 2011

Characterizing Inverse Time Dependency in Multi-class Learning.
Proceedings of the 11th IEEE International Conference on Data Mining, 2011

A Whole Page Click Model to Better Interpret Search Engine Click Data.
Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011

2010
Co-optimization of multiple relevance metrics in web search.
Proceedings of the 19th International Conference on World Wide Web, 2010

A novel click model and its applications to online advertising.
Proceedings of the Third International Conference on Web Search and Web Data Mining, 2010

Incorporating post-click behaviors into a click model.
Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2010

Learning click models via probit bayesian inference.
Proceedings of the 19th ACM Conference on Information and Knowledge Management, 2010

Explore click models for search ranking.
Proceedings of the 19th ACM Conference on Information and Knowledge Management, 2010

2009
Inverse Time Dependency in Convex Regularized Learning.
Proceedings of the ICDM 2009, 2009

P-packSVM: Parallel Primal grAdient desCent Kernel SVM.
Proceedings of the ICDM 2009, 2009

A general magnitude-preserving boosting algorithm for search ranking.
Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009

To divide and conquer search ranking by learning query difficulty.
Proceedings of the 18th ACM Conference on Information and Knowledge Management, 2009

2008
Web query translation via web log mining.
Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2008

Mining Translations of Web Queries from Web Click-through Data.
Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence, 2008

2007
Document Transformation for Multi-label Feature Selection in Text Categorization.
Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007), 2007

Mining Web Query Hierarchies from Clickthrough Data.
Proceedings of the Twenty-Second AAAI Conference on Artificial Intelligence, 2007
