Shi Han

ORCID: 0000-0002-0360-6089

According to our database, Shi Han authored at least 118 papers between 2004 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
FXAM: A unified and fast interpretable model for predictive analytics.
Expert Syst. Appl., 2024

SpreadsheetLLM: Encoding Spreadsheets for Large Language Models.
CoRR, 2024

Scalable Differentiable Causal Discovery in the Presence of Latent Confounders with Skeleton Posterior (Extended Version).
CoRR, 2024

Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities.
CoRR, 2024

CONLINE: Complex Code Generation and Refinement with Online Searching and Correctness Testing.
CoRR, 2024

Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study.
Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 2024

Source Free Graph Unsupervised Domain Adaptation.
Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 2024

Professional Network Matters: Connections Empower Person-Job Fit.
Proceedings of the 17th ACM International Conference on Web Search and Data Mining, 2024

Scalable Differentiable Causal Discovery in the Presence of Latent Confounders with Skeleton Posterior.
Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2024

TAROT: A Hierarchical Framework with Multitask co-pretraining on Semi-Structured Data Towards Effective Person-Job fit.
Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2024

PromptIntern: Saving Inference Costs by Internalizing Recurrent Prompt during Large Language Model Fine-tuning.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, 2024

CoCoST: Automatic Complex Code Generation with Online Searching and Correctness Testing.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Encoding Spreadsheets for Large Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

KET-QA: A Dataset for Knowledge Enhanced Table Question Answering.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, 2024

Text-to-Image Generation for Abstract Concepts.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

Text2Analysis: A Benchmark of Table Question Answering with Advanced Data Analysis and Unclear Queries.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
CoCoAST: Representing Source Code via Hierarchical Splitting and Reconstruction of Abstract Syntax Trees.
Empir. Softw. Eng., November, 2023

XInsight: eXplainable Data Analysis Through The Lens of Causality.
Proc. ACM Manag. Data, 2023

SoTaNa: The Open-Source Software Development Assistant.
CoRR, 2023

Leveraging LLMs for KPIs Retrieval from Hybrid Long-Document: A Comprehensive Framework and Dataset.
CoRR, 2023

Evaluating and Enhancing Structural Understanding Capabilities of Large Language Models on Tables via Input Designs.
CoRR, 2023

Demonstration of InsightPilot: An LLM-Empowered Automated Data Exploration System.
CoRR, 2023

Robust Mid-Pass Filtering Graph Convolutional Networks.
Proceedings of the ACM Web Conference 2023, 2023

Homophily-oriented Heterogeneous Graph Rewiring.
Proceedings of the ACM Web Conference 2023, 2023

DIGMN: Dynamic Intent Guided Meta Network for Differentiated User Engagement Forecasting in Online Professional Social Platforms.
Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 2023

Revisiting Code Search in a Two-Stage Paradigm.
Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 2023

MM-GNN: Mix-Moment Graph Neural Network towards Modeling Neighborhood Feature Distribution.
Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 2023

ML4C: Seeing Causality Through Latent Vicinity.
Proceedings of the 2023 SIAM International Conference on Data Mining, 2023

Auto-Validate by-History: Auto-Program Data Quality Constraints to Validate Recurring Data Pipelines.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

GetPt: Graph-enhanced General Table Pre-training with Alternate Attention Network.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

On Manipulating Signals of User-Item Graph: A Jacobi Polynomial-based Graph Collaborative Filtering.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

Towards Efficient Fine-Tuning of Pre-trained Code Models: An Experimental Study and Beyond.
Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023

Causal-Based Supervision of Attention in Graph Neural Network: A Better and Simpler Choice towards Powerful Attention.
Proceedings of the Thirty-Second International Joint Conference on Artificial Intelligence, 2023

CoCoSoDa: Effective Contrastive Learning for Code Search.
Proceedings of the 45th IEEE/ACM International Conference on Software Engineering, 2023

Out-of-Distribution Detection based on In-Distribution Data Patterns Memorization with Modern Hopfield Energy.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

CASR: Generating Complex Sequences with Autoregressive Self-Boost Refinement.
Proceedings of the Eleventh International Conference on Learning Representations, 2023

InsightPilot: An LLM-Empowered Automated Data Exploration System.
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023

Hadamard Adapter: An Extreme Parameter-Efficient Adapter Tuning Method for Pre-trained Language Models.
Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023

AnaMeta: A Table Understanding Dataset of Field Metadata Knowledge Shared by Multi-dimensional Data Analysis Tasks.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

HermEs: Interactive Spreadsheet Formula Prediction via Hierarchical Formulet Expansion.
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2023

Unveiling the Black Box of PLMs with Semantic Anchors: Towards Interpretable Neural Semantic Parsing.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

SheetPT: Spreadsheet Pre-training Based on Hierarchical Attention Network.
Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence, 2023

2022
A large-scale empirical study of commit message generation: models, datasets and evaluation.
Empir. Softw. Eng., 2022

LUNA: Language Understanding with Number Augmentations on Transformers via Number Plugins and Pre-training.
CoRR, 2022

Reflection of Thought: Inversely Eliciting Numerical Reasoning in Language Models via Solving Linear Systems.
CoRR, 2022

Guiding the PLMs with Semantic Anchors as Intermediate Supervision: Towards Interpretable Semantic Parsing.
CoRR, 2022

Make Heterophily Graphs Better Fit GNN: A Graph Rewiring Approach.
CoRR, 2022

Inferring Tabular Analysis Metadata by Infusing Distribution and Knowledge Information.
CoRR, 2022

Long Code for Code Search.
CoRR, 2022

ASTA: Learning Analytical Semantics over Tables for Intelligent Data Analysis and Visualization.
CoRR, 2022

TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data.
CoRR, 2022

Enhancing Semantic Code Search with Multimodal Contrastive Learning and Soft Data Augmentation.
CoRR, 2022

ECMG: Exemplar-based Commit Message Generation.
CoRR, 2022

Table Pre-training: A Survey on Model Architectures, Pretraining Objectives, and Downstream Tasks.
CoRR, 2022

GBK-GNN: Gated Bi-Kernel Graph Neural Networks for Modeling Both Homophily and Heterophily.
Proceedings of the WWW '22: The ACM Web Conference 2022, Virtual Event, Lyon, France, April 25, 2022

Understanding and Improvement of Adversarial Training for Network Embedding from an Optimization Perspective.
Proceedings of the WSDM '22: The Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event / Tempe, AZ, USA, February 21, 2022

Neuron with Steady Response Leads to Better Generalization.
Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, 2022

LibDB: An Effective and Efficient Framework for Detecting Third-Party Libraries in Binaries.
Proceedings of the 19th IEEE/ACM International Conference on Mining Software Repositories, 2022

TrajGAT: A Graph-based Long-term Dependency Modeling Approach for Trajectory Similarity Computation.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

pureGAM: Learning an Inherently Pure Additive Model.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

ML4S: Learning Causal Skeleton from Vicinal Graphs.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

Table Pre-training: A Survey on Model Architectures, Pre-training Objectives, and Downstream Tasks.
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence, 2022

On the Evaluation of Neural Code Summarization.
Proceedings of the 44th IEEE/ACM International Conference on Software Engineering, 2022

TaCube: Pre-computing Data Cubes for Answering Numerical-Reasoning Questions over Tabular Data.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Towards Robust Numerical Question Answering: Diagnosing Numerical Capabilities of NLP Systems.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

RACE: Retrieval-augmented Commit Message Generation.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

FormLM: Recommending Creation Ideas for Online Forms by Modelling Semantic and Structural Information.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

PLOG: Table-to-Logic Pretraining for Logical Table-to-Text Generation.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Learning Rate Perturbation: A Generic Plugin of Learning Rate Schedule towards Flatter Local Minima.
Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022

Accelerating Code Search with Deep Hashing and Code Classification.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

FORTAP: Using Formulas for Numerical-Reasoning-Aware Table Pretraining.
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022

2021
Historical Spectrum Based Fault Localization.
IEEE Trans. Software Eng., 2021

Source Free Unsupervised Graph Domain Adaptation.
CoRR, 2021

A Unified and Fast Interpretable Model for Predictive Analytics.
CoRR, 2021

GBK-GNN: Gated Bi-Kernel Graph Neural Networks for Modeling Both Homophily and Heterophily.
CoRR, 2021

FORTAP: Using Formulae for Numerical-Reasoning-Aware Table Pretraining.
CoRR, 2021

Neural Code Summarization: How Far Are We?
CoRR, 2021

Is a Single Model Enough? MuCoS: A Multi-Model Ensemble Learning for Semantic Code Search.
CoRR, 2021

CoCoSum: Contextual Code Summarization with Multi-Relational Graph Neural Network.
CoRR, 2021

Understanding and Improvement of Adversarial Training for Network Embedding from an Optimization Perspective.
CoRR, 2021

MetaInsight: Automatic Discovery of Structured Knowledge for Exploratory Data Analysis.
Proceedings of the SIGMOD '21: International Conference on Management of Data, 2021

Table2Charts: Recommending Charts by Learning Shared Table Representations.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

TUTA: Tree-based Transformers for Generally Structured Table Pre-training.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

TabularNet: A Neural Network Architecture for Understanding Semantic Structures of Tabular Data.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

Semantic table structure identification in spreadsheets.
Proceedings of the ISSTA '21: 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2021

On the Evaluation of Commit Message Generation Models: An Experimental Study.
Proceedings of the IEEE International Conference on Software Maintenance and Evolution, 2021

CAST: Enhancing Code Summarization with Hierarchical Splitting and Reconstruction of Abstract Syntax Trees.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

Neuron Campaign for Initialization Guided by Information Bottleneck Theory.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

Is a Single Model Enough? MuCoS: A Multi-Model Ensemble Learning Approach for Semantic Code Search.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

2020
Structure-aware Pre-training for Table Understanding with Tree-based Transformers.
CoRR, 2020

Table2Charts: Learning Shared Representations for Recommending Charts on Multi-dimensional Data.
CoRR, 2020

Learning Formatting Style Transfer and Structure Extraction for Spreadsheet Tables with a Hybrid Neural Network Architecture.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

Neural Formatting for Spreadsheet Tables.
Proceedings of the CIKM '20: The 29th ACM International Conference on Information and Knowledge Management, 2020

Reliable and Efficient Anytime Skeleton Learning.
Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, 2020

2019
QuickInsights: Quick and Automatic Discovery of Insights from Multi-Dimensional Data.
Proceedings of the 2019 International Conference on Management of Data, 2019

TableSense: Spreadsheet Table Detection with Convolutional Neural Networks.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Systematically Ensuring the Confidence of Real-Time Home Automation IoT Systems.
ACM Trans. Cyber Phys. Syst., 2018

Automated refactoring of nested-IF formulae in spreadsheets.
Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, 2018

Expandable group identification in spreadsheets.
Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018

2017
Extracting Top-K Insights from Multi-dimensional Data.
Proceedings of the 2017 ACM International Conference on Management of Data, 2017

Precise condition synthesis for program repair.
Proceedings of the 39th International Conference on Software Engineering, 2017

2016
Systematically Debugging IoT Control System Correctness for Building Automation.
Proceedings of the 3rd ACM International Conference on Systems for Energy-Efficient Built Environments, 2016

2015
Uncovering JavaScript Performance Code Smells Relevant to Type Mutations.
Proceedings of the Programming Languages and Systems - 13th Asian Symposium, 2015

2014
Comprehending performance from real-world execution traces: a device-driver case.
Proceedings of the Architectural Support for Programming Languages and Operating Systems, 2014

2013
Software Analytics in Practice.
IEEE Softw., 2013

Context-sensitive delta inference for identifying workload-dependent performance bottlenecks.
Proceedings of the International Symposium on Software Testing and Analysis, 2013

2012
Performance debugging in the large via mining millions of stack traces.
Proceedings of the 34th International Conference on Software Engineering, 2012

Teaching and Training for Software Analytics.
Proceedings of the 25th IEEE Conference on Software Engineering Education and Training, 2012

2009
A Unified Framework for Recognizing Handwritten Chemical Expressions.
Proceedings of the 10th International Conference on Document Analysis and Recognition, 2009

2007
Systematic Multi-Path HMM Topology Design for Online Handwriting Recognition of East Asian Characters.
Proceedings of the 9th International Conference on Document Analysis and Recognition (ICDAR 2007), 2007

2006
Automatic Real-Time Barcode Localization in Complex Scenes.
Proceedings of the International Conference on Image Processing, 2006

Super-Resolution of 3D Face.
Proceedings of the Computer Vision, 2006

Hallucinating 3D Faces.
Proceedings of the Computer Vision, 2006

2005
3D Face Recognition using Mapped Depth Images.
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005

2004
Sphere-Spin-Image: A Viewpoint-Invariant Surface Representation for 3D Face Recognition.
Proceedings of the Computational Science, 2004

Human Face Orientation Detection Using Power Spectrum Based Measurements.
Proceedings of the Sixth IEEE International Conference on Automatic Face and Gesture Recognition (FGR 2004), 2004

