Chen Li

Orcid: 0000-0001-8015-6870

Affiliations:
  • University of California, Irvine, CA, USA


According to our database, Chen Li authored at least 160 papers between 1998 and 2024.

Bibliography

2024
Texera: A System for Collaborative and Interactive Data Analytics Using Workflows.
Proc. VLDB Endow., July, 2024

mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding.
CoRR, 2024

Demonstration of Udon: Line-by-line Debugging of User-Defined Functions in Data Workflows.
Proceedings of the Companion of the 2024 International Conference on Management of Data, 2024

scMulan: A Multitask Generative Pre-Trained Language Model for Single-Cell Analysis.
Proceedings of the Research in Computational Molecular Biology, 2024

Data Science Tasks Implemented with Scripts versus GUI-Based Workflows: The Good, the Bad, and the Ugly.
Proceedings of the 40th International Conference on Data Engineering, ICDE 2024, 2024

A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024

Towards Better Utilization of Multi-Reference Training Data for Chinese Grammatical Error Correction.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

Towards Demonstration-Aware Large Language Models for Machine Translation.
Proceedings of the Findings of the Association for Computational Linguistics, 2024

2023
Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control.
Proc. ACM Manag. Data, December, 2023

Tempura: a general cost-based optimizer framework for incremental data processing (Journal Version).
VLDB J., November, 2023

Building a Collaborative Data Analytics System: Opportunities and Challenges.
Proc. VLDB Endow., 2023

Demo of QueryBooster: Supporting Middleware-based SQL Query Rewriting as a Service.
Proc. VLDB Endow., 2023

QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting.
Proc. VLDB Endow., 2023

Veer: Verifying Equivalence of Workflow Versions in Iterative Data Analytics.
CoRR, 2023

Raven: Accelerating Execution of Iterative Data Analytics by Reusing Results of Previous Equivalent Versions.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2023

Improving Seq2Seq Grammatical Error Correction via Decoding Interventions.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023

Maliva: Using Machine Learning to Rewrite Visualization Queries Under Time Constraints.
Proceedings of the 26th International Conference on Extending Database Technology, 2023

NaSGEC: a Multi-Domain Chinese Grammatical Error Correction Dataset from Native Speaker Texts.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023

2022
Optimizing Machine Learning Inference Queries with Correlative Proxy Models.
Proc. VLDB Endow., 2022

Demonstration of Accelerating Machine Learning Inference Queries with Correlative Proxy Models.
Proc. VLDB Endow., 2022

Fries: Fast and Consistent Runtime Reconfiguration in Dataflow Systems with Transactional Guarantees.
Proc. VLDB Endow., 2022

Demonstration of Collaborative and Interactive Workflow-Based Data Analytics in Texera.
Proc. VLDB Endow., 2022

Fries: Fast and Consistent Runtime Reconfiguration in Dataflow Systems with Transactional Guarantees (Extended Version).
CoRR, 2022

Reshape: Adaptive Result-aware Skew Handling for Exploratory Analysis on Big Data.
CoRR, 2022

Mining Error Templates for Grammatical Error Correction.
CoRR, 2022

JEDI: These aren't the JSON documents you're looking for... (Extended Version*).
CoRR, 2022

JEDI: These aren't the JSON documents you're looking for?
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022

GSViz: progressive visualization of geospatial influences in social networks.
Proceedings of the 30th International Conference on Advances in Geographic Information Systems, 2022

SynGEC: Syntax-Enhanced Grammatical Error Correction with a Tailored GEC-Oriented Parser.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022

Rainbow: A Rendering-Aware Index for High-Quality Spatial Scatterplots with Result-Size Budgets.
Proceedings of the 22nd Eurographics Symposium on Parallel Graphics and Visualization, 2022

Demo of VisBooster: Accelerating Tableau Live Mode Queries Up to 100 Times Faster.
Proceedings of the Workshops of the EDBT/ICDT 2022 Joint Conference, 2022

Distributed Dynamic Economic Optimal Scheduling Method for Microgrid Based on Deep Learning.
Proceedings of the CAIBDA 2022, 2022

Public Opinions toward COVID-19 Vaccine Mandates: A Machine Learning-based Analysis of U.S. Tweets.
Proceedings of the AMIA 2022, 2022

2021
Why do people oppose mask wearing? A comprehensive analysis of U.S. tweets during the COVID-19 pandemic.
J. Am. Medical Informatics Assoc., 2021

Entity Relation Extraction as Dependency Parsing in Visually Rich Documents.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021

2020
Robust and efficient memory management in Apache AsterixDB.
Softw. Pract. Exp., 2020

Tempura: A General Cost-Based Optimizer Framework for Incremental Data Processing.
Proc. VLDB Endow., 2020

Demonstration of Interactive Runtime Debugging of Distributed Dataflows in Texera.
Proc. VLDB Endow., 2020

Amber: A Debuggable Dataflow System Based on the Actor Model.
Proc. VLDB Endow., 2020

Similarity query support in big data management systems.
Inf. Syst., 2020

Tempura: A General Cost Based Optimizer Framework for Incremental Data Processing (Extended Version).
CoRR, 2020

Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental Computing.
Proceedings of the 2020 International Conference on Management of Data, 2020

Marviq: Quality-Aware Geospatial Visualization of Range-Selection Queries Using Materialization.
Proceedings of the 2020 International Conference on Management of Data, 2020

Chunk-based Chinese Spelling Check with Global Optimization.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020

2019
Inves: Incremental Partitioning-Based Verification for Graph Similarity Search.
Proceedings of the Advances in Database Technology, 2019

Synergy of Database Techniques and Machine Learning Models for String Similarity Search and Join.
Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

2018
ZigZag: Supporting Similarity Queries on Vector Space Models.
Proceedings of the 2018 International Conference on Management of Data, 2018

End-to-End Machine Learning with Apache AsterixDB.
Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning, 2018

Enhancing Big Data with Semantics: The AsterixDB Approach (Poster).
Proceedings of the 12th IEEE International Conference on Semantic Computing, 2018

Supporting Similarity Queries in Apache AsterixDB.
Proceedings of the 21st International Conference on Extending Database Technology, 2018

Visually Analyzing A Billion Tweets: An Application for Collaborative Visual Analytics on Large High-Resolution Display.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

Heatflip: Temporal-Spatial Sampling for Progressive Heat Maps on Social Media Data.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018

2017
Erratum to: Special issue on best papers of VLDB 2015.
VLDB J., 2017

Special issue on best papers of VLDB 2015.
VLDB J., 2017

A Demonstration of TextDB: Declarative and Scalable Text Analytics on Large Data Sets.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

A Comparative Study of Log-Structured Merge-Tree-Based Spatial Indexes for Big Data.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

Caching Geospatial Objects in Web Browsers.
Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2017

Drum: A rhythmic approach to interactive analytics on large data.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017

2016
Negative Factor: Improving Regular-Expression Matching in Strings.
ACM Trans. Database Syst., 2016

Hobbes3: Dynamic generation of variable-length signatures for efficient approximate subsequence mappings.
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

Towards interactive analytics and visualization on one billion tweets.
Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS 2016, Burlingame, California, USA, October 31, 2016

2015
Boosting the Quality of Approximate String Matching by Synonyms.
ACM Trans. Database Syst., 2015

Front Matter.
Proc. VLDB Endow., 2015

LSM-Based Storage and Indexing: An Old Idea with Timely Benefits.
Proceedings of the Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, 2015

RILCA: Collecting and Analyzing User-Behavior Information in Instant Search Using Relational DBMS.
Proceedings of the Real-Time Business Intelligence and Analytics, 2015

2014
Efficient hosted interpreters on the JVM.
ACM Trans. Archit. Code Optim., 2014

Storage Management in AsterixDB.
Proc. VLDB Endow., 2014

AsterixDB: A Scalable, Open Source BDMS.
Proc. VLDB Endow., 2014

Improving read mapping using additional prefix grams.
BMC Bioinform., 2014

Efficient instant-fuzzy search with proximity ranking.
Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, 2014

Mux-Kmeans: multiplex kmeans for clustering large-scale data set.
Proceedings of the ScienceCloud'14, 2014

2013
Supporting Search-As-You-Type Using SQL in Databases.
IEEE Trans. Knowl. Data Eng., 2013

Improving regular-expression matching on strings using negative factors.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

String similarity measures and joins with synonyms.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013

Efficient interpreter optimizations for the JVM.
Proceedings of the 2013 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, 2013

Efficient direct search on compressed genomic data.
Proceedings of the 29th IEEE International Conference on Data Engineering, 2013

Record Linkage: A 10-Year Retrospective.
Proceedings of the Database Systems for Advanced Applications, 2013

2012
ASTERIX: An Open Source System for "Big Data" Management and Analysis.
Proc. VLDB Endow., 2012

Speeding Up Chemical Searches Using the Inverted Index: The Convergence of Chemoinformatics and Text Search Methods.
J. Chem. Inf. Model., 2012

SKIF-P: a point-based indexing and ranking of web documents for spatial-keyword search.
GeoInformatica, 2012

Big data platforms: what's next?
XRDS, 2012

Analysis of Instant Search Query Logs.
Proceedings of the 15th International Workshop on the Web and Databases 2012, 2012

Supporting efficient top-k queries in type-ahead search.
Proceedings of the 35th International ACM SIGIR conference on research and development in Information Retrieval, 2012

Inside "Big Data management": ogres, onions, or parfaits?
Proceedings of the 15th International Conference on Extending Database Technology, 2012

2011
Efficient fuzzy full-text type-ahead search.
VLDB J., 2011

Supporting BioMedical Information Retrieval: The BioTracer Approach.
Trans. Large Scale Data Knowl. Centered Syst., 2011

ASTERIX: towards a scalable, semistructured data platform for evolving-world models.
Distributed Parallel Databases, 2011

Location-Based Instant Search.
Proceedings of the Scientific and Statistical Database Management, 2011

CHIME: An Efficient Error-Tolerant Chinese Pinyin Input Method.
Proceedings of the IJCAI 2011, 2011

Answering approximate string queries on large data sets using external memory.
Proceedings of the 27th International Conference on Data Engineering, 2011

The Flamingo Software Package on Approximate String Queries.
Proceedings of the Database Systems for Advanced Applications, 2011

2010
Seaform: Search-As-You-Type in Forms.
Proc. VLDB Endow., 2010

Search-As-You-Type: Opportunities and Challenges.
IEEE Data Eng. Bull., 2010

Interactive and fuzzy search: a dynamic way to explore MEDLINE.
Bioinform., 2010

Efficient parallel set-similarity joins using MapReduce.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010

Efficient fuzzy type-ahead search in TASTIER.
Proceedings of the 26th International Conference on Data Engineering, 2010

Supporting location-based approximate-keyword queries.
Proceedings of the 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, 2010

Hybrid Indexing and Seamless Ranking of Spatial and Textual Features of Web Documents.
Proceedings of the Database and Expert Systems Applications, 21st International Conference, 2010

Fuzzy Keyword Search on Spatial Data.
Proceedings of the Database Systems for Advanced Applications, 2010

2009
Rewriting Queries using Views.
Proceedings of the Encyclopedia of Database Systems, 2009

Efficient Approximate Search on String Collections.
Proc. VLDB Endow., 2009

SAIL: Structure-aware indexing for effective and progressive top-k keyword search over XML documents.
Inf. Sci., 2009

Human genomes as email attachments.
Bioinform., 2009

Efficient interactive fuzzy keyword search.
Proceedings of the 18th International Conference on World Wide Web, 2009

Efficient top-k algorithms for fuzzy search in string collections.
Proceedings of the First International Workshop on Keyword Search on Structured Data, 2009

Efficient type-ahead search on relational data: a TASTIER approach.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2009

Best-Effort Top-k Query Processing Under Budgetary Constraints.
Proceedings of the 25th International Conference on Data Engineering, 2009

Space-Constrained Gram-Based Indexing for Efficient Approximate String Search.
Proceedings of the 25th International Conference on Data Engineering, 2009

2008
SEPIA: estimating selectivities of approximate string predicates in large Databases.
VLDB J., 2008

Adaptive-sampling algorithms for answering aggregation queries on Web sites.
Data Knowl. Eng., 2008

Cost-based variable-length-gram selection for string collections to support approximate queries efficiently.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2008

Data exchange: query answering for incomplete data sources.
Proceedings of the 3rd International ICST Conference on Scalable Information Systems, 2008

Quality-Aware Retrieval of Data Objects from Autonomous Sources for Web-Based Repositories.
Proceedings of the 24th International Conference on Data Engineering, 2008

Efficient Merging and Filtering Algorithms for Approximate String Searches.
Proceedings of the 24th International Conference on Data Engineering, 2008

Data exchange in the presence of arithmetic comparisons.
Proceedings of the EDBT 2008, 2008

Supporting Keyword Queries on Structured Databases with Limited Search Interfaces.
Proceedings of the Database Systems for Advanced Applications, 2008

2007
Report on the First International VLDB Workshop on Clean Databases (CleanDB 2006).
SIGMOD Rec., 2007

Using views to generate efficient evaluation plans for queries.
J. Comput. Syst. Sci., 2007

Communication-Efficient Query Answering with Quality Guarantees in Client-Server Applications.
Proceedings of the Tenth International Workshop on the Web and Databases, 2007

VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams.
Proceedings of the 33rd International Conference on Very Large Data Bases, 2007

Processing Spatial-Keyword (SK) Queries in Geographic Information Retrieval (GIR) Systems.
Proceedings of the 19th International Conference on Scientific and Statistical Database Management, 2007

Protecting Individual Information Against Inference Attacks in Data Publishing.
Proceedings of the Advances in Databases: Concepts, 2007

2006
Supporting Efficient Record Linkage for Large Data Sets Using Mapping Techniques.
World Wide Web, 2006

Answering queries using materialized views with minimum size.
VLDB J., 2006

Achieving Communication Efficiency through Push-Pull Partitioning of Semantic Spaces to Disseminate Dynamic Information.
IEEE Trans. Knowl. Data Eng., 2006

Rewriting queries using views in the presence of arithmetic comparisons.
Theor. Comput. Sci., 2006

Relaxing Join and Selection Queries.
Proceedings of the 32nd International Conference on Very Large Data Bases, 2006

Supporting Approximate Similarity Queries with Quality Guarantees in P2P Systems.
Proceedings of the 13th International Conference on Management of Data, 2006

2005
Selectivity Estimation for Fuzzy String Predicates in Large Data Sets.
Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005

Indexing Mixed Types for Approximate Retrieval.
Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005

XGuard: A System for Publishing XML Documents without Information Leakage in the Presence of Data Inference.
Proceedings of the 21st International Conference on Data Engineering, 2005

Quality-driven approximate methods for integrating GIS data.
Proceedings of the 13th ACM International Workshop on Geographic Information Systems, 2005

Answering aggregation queries on hierarchical web sites using adaptive sampling.
Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management, Bremen, Germany, October 31, 2005

2004
Secure XML Publishing without Information Leakage in the Presence of Data Inference.
(e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, Toronto, Canada, August 31, 2004

RACCOON: A Peer-Based System for Data Integration and Sharing.
Proceedings of the 20th International Conference on Data Engineering, 2004

NNH: Improving Performance of Nearest-Neighbor Searches Using Histograms.
Proceedings of the Advances in Database Technology, 2004

On Containment of Conjunctive Queries with Arithmetic Comparisons.
Proceedings of the Advances in Database Technology, 2004

2003
Computing complete answers to queries in the presence of limited access patterns.
VLDB J., 2003

Using Constraints to Describe Source Contents in Data Integration Systems.
IEEE Intell. Syst., 2003

Schema-guided wrapper maintenance for web-data extraction.
Proceedings of the Fifth ACM CIKM International Workshop on Web Information and Data Management (WIDM 2003), 2003

Materializing views with minimal size to answer queries.
Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 2003

Describing and Utilizing Constraints to Answer Queries in Data-Integration Systems.
Proceedings of IJCAI-03 Workshop on Information Integration on the Web (IIWeb-03), 2003

Efficient Record Linkage in Large Data Sets.
Proceedings of the Eighth International Conference on Database Systems for Advanced Applications (DASFAA '03), 2003

A Supervised Visual Wrapper Generator for Web-Data Extraction.
Proceedings of the 27th International Computer Software and Applications Conference (COMPSAC 2003): Design and Assessment of Trustworthy Software-Based Systems, 2003

2002
Clustering for Approximate Similarity Search in High-Dimensional Spaces.
IEEE Trans. Knowl. Data Eng., 2002

Executing SQL over encrypted data in the database-service-provider model.
Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, 2002

Answering Queries Using Views with Arithmetic Comparisons.
Proceedings of the Twenty-first ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 2002

2001
Query processing and optimization in information-integration systems.
PhD thesis, 2001

Answering queries with useful bindings.
ACM Trans. Database Syst., 2001

Generating Efficient Plans for Queries Using Views.
Proceedings of the 2001 ACM SIGMOD international conference on Management of data, 2001

Data Placement for Multi-user Interactive DTV.
Proceedings of the 2001 IEEE International Conference on Multimedia and Expo, 2001

On Answering Queries in the Presence of Limited Access Patterns.
Proceedings of the Database Theory, 2001

Minimizing View Sets without Losing Query-Answering Power.
Proceedings of the Database Theory, 2001

2000
Answering Queries with Database Restrictions.
Proceedings of the Abstraction, 2000

Query Planning with Limited Source Capabilities.
Proceedings of the 16th International Conference on Data Engineering, San Diego, California, USA, February 28, 2000

1999
Computing Capabilities of Mediators.
Proceedings of the SIGMOD 1999, 1999

Optimizing Large Join Queries in Mediation Systems.
Proceedings of the Database Theory, 1999

1998
Capability Based Mediation in TSIMMIS.
Proceedings of the SIGMOD 1998, 1998

2D BubbleUp: Managing Parallel Disks for Media Servers.
Proceedings of the 5th International Conference of Foundations of Data Organization (FODO'98), 1998

