Jian Wu

ORCID: 0000-0003-0173-4463

Affiliations:
  • Old Dominion University, Department of Computer Science, Norfolk, VA, USA
  • Pennsylvania State University, College of Information Sciences and Technology, University Park, PA, USA (former, PhD 2011)


According to our database, Jian Wu authored at least 89 papers between 2012 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
Building datasets to support information extraction and structure parsing from electronic theses and dissertations.
Int. J. Digit. Libr., June, 2024

[Re] Network Deconvolution.
CoRR, 2024

Uncertainty Quantification in Table Structure Recognition.
CoRR, 2024

Can citations tell us about a paper's reproducibility? A case study of machine learning papers.
CoRR, 2024

Can Large Language Models Discern Evidence for Scientific Hypotheses? Case Studies in the Social Sciences.
Proceedings of the 2024 Joint International Conference on Computational Linguistics, 2024

SHORT: Can citations tell us about a paper's reproducibility? A case study of machine learning papers.
Proceedings of the 2nd ACM Conference on Reproducibility and Replicability, 2024

ETDPC: A Multimodality Framework for Classifying Pages in Electronic Theses and Dissertations.
Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, 2024

2023
DeepPatent2: A Large-Scale Benchmarking Corpus for Technical Drawing Understanding.
CoRR, 2023

MSVEC: A Multidomain Testing Dataset for Scientific Claim Verification.
Proceedings of the Twenty-fourth International Symposium on Theory, 2023

Maximizing Equitable Reach and Accessibility of ETDs.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2023

MetaEnhance: Metadata Quality Improvement for Electronic Theses and Dissertations of University Libraries.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2023

Who can Submit an Excellent Review for this Manuscript in the Next 30 Days? - Peer Reviewing in the Age of Overload.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2023

Can machine learning algorithms predict publication outcomes? A case study of COVID-19 preprints.
Proceedings of the IEEE International Conference on Data Mining, 2023

A Study on Reproducibility and Replicability of Table Structure Recognition Methods.
Proceedings of the Document Analysis and Recognition - ICDAR 2023, 2023

It's Not Just GitHub: Identifying Data and Software Sources Included in Publications.
Proceedings of the Linking Theory and Practice of Digital Libraries: 27th International Conference on Theory and Practice of Digital Libraries, 2023

ClaimDistiller: Scientific Claim Extraction with Supervised Contrastive Learning.
Proceedings of Joint Workshop of the 4th Extraction and Evaluation of Knowledge Entities from Scientific Documents (EEKE2023) and the 3rd AI + Informetrics (AII2023) co-located with the JCDL 2023, 2023

ACL-Fig: A Dataset for Scientific Figure Classification.
Proceedings of the Workshop on Scientific Document Understanding co-located with 37th AAAI Conference on Artificial Intelligence (AAAI 2023), 2023

2022
ArithFusion: An Arithmetic Deep Model for Temporal Remote Sensing Image Fusion.
Remote. Sens., December, 2022

SciEv: Finding Scientific Evidence Papers for Scientific News.
CoRR, 2022

A Study of Computational Reproducibility using URLs Linking to Open Access Datasets and Software.
Proceedings of the Companion of The Web Conference 2022, Virtual Event / Lyon, France, April 25, 2022

DeepPatent: Large scale patent drawing recognition and retrieval.
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2022

Online Deep Learning from Doubly-Streaming Data.
Proceedings of the MM '22: The 30th ACM International Conference on Multimedia, Lisboa, Portugal, October 10, 2022

Visual descriptor extraction from patent figure captions: a case study of data efficiency between BiLSTM and transformer.
Proceedings of the JCDL '22: The ACM/IEEE Joint Conference on Digital Libraries in 2022, Cologne, Germany, June 20, 2022

Design Considerations for a Sustainable Scholarly Big Data Service.
Proceedings of the 14th Annual Meeting of the Forum for Information Retrieval Evaluation, 2022

Theory entity extraction for social and behavioral sciences papers using distant supervision.
Proceedings of the 22nd ACM Symposium on Document Engineering, 2022

Scholarly big data quality assessment: a case study of document linking and conflation with S2ORC.
Proceedings of the 22nd ACM Symposium on Document Engineering, 2022

Applications of data analysis on scholarly long documents.
Proceedings of the IEEE International Conference on Big Data, 2022

A Synthetic Prediction Market for Estimating Confidence in Published Work.
Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, 2022

Segmenting Technical Drawing Figures in US Patents.
Proceedings of the Workshop on Scientific Document Understanding co-located with 36th AAAI Conference on Artificial Intelligence, 2022

2021
Three Benchmark Datasets for Scholarly Article Layout Analysis.
Dataset, May, 2021

SampannaKahu/ScanBank: v0.2.
Dataset, April, 2021

Scholar2vec: Vector Representation of Scholars for Lifetime Collaborator Prediction.
ACM Trans. Knowl. Discov. Data, 2021

Extractive Research Slide Generation Using Windowed Labeling Ranking.
CoRR, 2021

Predicting the Reproducibility of Social and Behavioral Science Papers Using Supervised Learning Models.
CoRR, 2021

Extraction and Evaluation of Statistical Information from Social and Behavioral Science Papers.
Proceedings of the Companion of The Web Conference 2021, 2021

What Were People Searching For? A Query Log Analysis of An Academic Search Engine.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2021

ScanBank: A Benchmark Dataset for Figure Extraction from Scanned Electronic Theses and Dissertations.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2021

Automatic Metadata Extraction Incorporating Visual Features from Scanned Electronic Theses and Dissertations.
Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 2021

ChartReader: Automatic Parsing of Bar-Plots.
Proceedings of the 22nd IEEE International Conference on Information Reuse and Integration for Data Science, 2021

Document Domain Randomization for Deep Learning Document Layout Extraction.
Proceedings of the 16th International Conference on Document Analysis and Recognition, 2021

Ranked List Fusion and Re-ranking with Pre-trained Transformers for ARQMath Lab.
Proceedings of the Working Notes of CLEF 2021 - Conference and Labs of the Evaluation Forum, Bucharest, Romania, September 21st - to, 2021

Building A Large Collection of Multi-domain Electronic Theses and Dissertations.
Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), 2021

Building an Accessible, Usable, Scalable, and Sustainable Service for Scholarly Big Data.
Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), 2021

Understanding and Predicting Retractions of Published Work.
Proceedings of the Workshop on Scientific Document Understanding co-located with 35th AAAI Conference on Artificial Intelligence, 2021

Recognizing Figure Labels in Patents.
Proceedings of the Workshop on Scientific Document Understanding co-located with 35th AAAI Conference on Artificial Intelligence, 2021

2020
Large Scale Subject Category Classification of Scholarly Papers With Deep Attentive Neural Networks.
Frontiers Res. Metrics Anal., 2020

A Comparative Study of Sequence Tagging Methods for Domain Knowledge Entity Recognition in Biomedical Papers.
JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 2020

Inter-subdiscipline Analysis Based on Mathematical Statements.
JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 2020

Analyzing the Effect of Reading Patterns using Eye Tracking Measures.
JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 2020

A Heuristic Baseline Method for Metadata Extraction from Scanned Electronic Theses and Dissertations.
JCDL '20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, 2020

Keyphrase Extraction in Scholarly Digital Library Search Engines.
Proceedings of the Web Services - ICWS 2020, 2020

Acknowledgement Entity Recognition in CORD-19 Papers.
Proceedings of the First Workshop on Scholarly Document Processing, 2020

Accelerating Substructure Similarity Search for Formula Retrieval.
Proceedings of the Advances in Information Retrieval, 2020

COVIDSeer: Extending the CORD-19 Dataset.
Proceedings of the DocEng '20: ACM Symposium on Document Engineering 2020, Virtual Event, CA, USA, September 29, 2020

PSU at CLEF-2020 ARQMath Track: Unsupervised Re-ranking using Pretraining.
Proceedings of the Working Notes of CLEF 2020, 2020

Modeling Updates of Scholarly Webpages Using Archived Data.
Proceedings of the 2020 IEEE International Conference on Big Data (IEEE BigData 2020), 2020

2019
Query Auto Completion for Math Formula Search.
CoRR, 2019

Sec-Lib: Protecting Scholarly Digital Libraries From Infected Papers Using Active Machine Learning Framework.
IEEE Access, 2019

Automatic Slide Generation for Scientific Papers.
Proceedings of the Third International Workshop on Capturing Scientific Knowledge co-located with the 10th International Conference on Knowledge Capture (K-CAP 2019), 2019

Searching for Evidence of Scientific News in Scholarly Big Data.
Proceedings of the 10th International Conference on Knowledge Capture, 2019

Tangent-CFT: An Embedding Model for Mathematical Formulas.
Proceedings of the 2019 ACM SIGIR International Conference on Theory of Information Retrieval, 2019

Learned Neural Iterative Decoding for Lossy Image Compression Systems.
Proceedings of the Data Compression Conference, 2019

CiteSeerX: 20 years of service to scholarly big data.
Proceedings of the Conference on Artificial Intelligence for Data Discovery and Reuse, 2019

Cleaning Noisy and Heterogeneous Metadata for Record Linking across Scholarly Big Datasets.
Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence, 2019

2018
Learned Iterative Decoding for Lossy Image Compression Systems.
CoRR, 2018

CiteSeerX-2018: A Cleansed Multidisciplinary Scholarly Big Dataset.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

2017
Scholarly Digital Libraries as a Platform for Malware Distribution.
Proceedings of the A Systems Approach to Cyber Security, 2017

A Supervised Learning Approach To Entity Matching Between Scholarly Big Datasets.
Proceedings of the Knowledge Capture Conference, 2017

HESDK: A Hybrid Approach to Extracting Scientific Domain Knowledge Entities.
Proceedings of the 2017 ACM/IEEE Joint Conference on Digital Libraries, 2017

Compiling Keyphrase Candidates for Scientific Literature Based on Wikipedia.
Joint Proceedings of the 1st Workshop on Temporal Dynamics in Digital Libraries (TDDL 2017), 2017

2016
CiteSeerX data: semanticizing scholarly papers.
Proceedings of the International Workshop on Semantic Big Data, 2016

Information Extraction for Scholarly Digital Libraries.
Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, 2016

Document Type Classification in Online Digital Libraries.
Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016

2015
CiteSeerX: AI in a Digital Library Search Engine.
AI Mag., 2015

Big Scholarly Data in CiteSeerX: Information Extraction from the Web.
Proceedings of the 24th International Conference on World Wide Web Companion, 2015

Online Learning of Deep Hybrid Architectures for Semi-supervised Categorization.
Proceedings of the Machine Learning and Knowledge Discovery in Databases, 2015

PDFMEF: A Multi-Entity Knowledge Extraction Framework for Scholarly Documents and Semantic Search.
Proceedings of the 8th International Conference on Knowledge Capture, 2015

2014
Towards building a scholarly big data platform: Challenges, lessons and opportunities.
Proceedings of the IEEE/ACM Joint Conference on Digital Libraries, 2014

A Web Service for Scholarly Big Data Information Extraction.
Proceedings of the 2014 IEEE International Conference on Web Services, 2014

Scholarly big data information extraction and integration in the CiteSeerX digital library.
Workshops Proceedings of the 30th International Conference on Data Engineering Workshops, 2014

Migrating a Digital Library to a Private Cloud.
Proceedings of the 2014 IEEE International Conference on Cloud Engineering, 2014

Utility-Based Control Feedback in a Digital Library Search Engine: Cases in CiteSeerX.
Proceedings of the 9th International Workshop on Feedback Computing, 2014

CiteSeerX: A Scholarly Big Dataset.
Proceedings of the Advances in Information Retrieval, 2014

SimSeerX: a similar document search engine.
Proceedings of the ACM Symposium on Document Engineering 2014, 2014

The impact of user corrections on a crawl-based digital library: A CiteSeerX perspective.
Proceedings of the 10th IEEE International Conference on Collaborative Computing: Networking, 2014

CiteSeerX: AI in a Digital Library Search Engine.
Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, 2014

2012
Specialized Research Datasets in the CiteSeerX Digital Library.
D Lib Mag., 2012

Web crawler middleware for search engine digital libraries: a case study for citeseerX.
Proceedings of the Twelfth International Workshop on Web Information and Data Management, 2012

The evolution of a crawling strategy for an academic document search engine: whitelists and blacklists.
Proceedings of the Web Science 2012, 2012

