2024
Pasta: A Cost-Based Optimizer for Generating Pipelining Schedules for Dataflow DAGs.
Proc. ACM Manag. Data, December, 2024
IcedTea: Efficient and Responsive Time-Travel Debugging in Dataflow Systems.
Proc. VLDB Endow., November, 2024
Wording Matters: The Effect of Linguistic Characteristics and Political Ideology on Resharing of COVID-19 Vaccine Tweets.
ACM Trans. Comput. Hum. Interact., August, 2024
Texera: A System for Collaborative and Interactive Data Analytics Using Workflows.
Proc. VLDB Endow., July, 2024
DISC: Plug-and-Play Decoding Intervention with Similarity of Characters for Chinese Spelling Check.
CoRR, 2024
mPLUG-DocOwl 1.5: Unified Structure Learning for OCR-free Document Understanding.
CoRR, 2024
Demonstration of Udon: Line-by-line Debugging of User-Defined Functions in Data Workflows.
Proceedings of the Companion of the 2024 International Conference on Management of Data, 2024
scMulan: A Multitask Generative Pre-Trained Language Model for Single-Cell Analysis.
Proceedings of the Research in Computational Molecular Biology, 2024
Data Science Tasks Implemented with Scripts versus GUI-Based Workflows: The Good, the Bad, and the Ugly.
Proceedings of the 40th International Conference on Data Engineering, ICDE 2024, 2024
A Simple yet Effective Training-free Prompt-free Approach to Chinese Spelling Correction Based on Large Language Models.
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024
Towards Better Utilization of Multi-Reference Training Data for Chinese Grammatical Error Correction.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
Towards Demonstration-Aware Large Language Models for Machine Translation.
Proceedings of the Findings of the Association for Computational Linguistics, 2024
2023
Udon: Efficient Debugging of User-Defined Functions in Big Data Systems with Line-by-Line Control.
Proc. ACM Manag. Data, December, 2023
Tempura: a general cost-based optimizer framework for incremental data processing (Journal Version).
VLDB J., November, 2023
Building a Collaborative Data Analytics System: Opportunities and Challenges.
Proc. VLDB Endow., 2023
Demo of QueryBooster: Supporting Middleware-based SQL Query Rewriting as a Service.
Proc. VLDB Endow., 2023
QueryBooster: Improving SQL Performance Using Middleware Services for Human-Centered Query Rewriting.
Proc. VLDB Endow., 2023
Veer: Verifying Equivalence of Workflow Versions in Iterative Data Analytics.
CoRR, 2023
Raven: Accelerating Execution of Iterative Data Analytics by Reusing Results of Previous Equivalent Versions.
Proceedings of the Workshop on Human-In-the-Loop Data Analytics, 2023
Improving Seq2Seq Grammatical Error Correction via Decoding Interventions.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, 2023
Maliva: Using Machine Learning to Rewrite Visualization Queries Under Time Constraints.
Proceedings of the 26th International Conference on Extending Database Technology, 2023
NaSGEC: a Multi-Domain Chinese Grammatical Error Correction Dataset from Native Speaker Texts.
Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, 2023
2022
Optimizing Machine Learning Inference Queries with Correlative Proxy Models.
Proc. VLDB Endow., 2022
Demonstration of Accelerating Machine Learning Inference Queries with Correlative Proxy Models.
Proc. VLDB Endow., 2022
Fries: Fast and Consistent Runtime Reconfiguration in Dataflow Systems with Transactional Guarantees.
Proc. VLDB Endow., 2022
Demonstration of Collaborative and Interactive Workflow-Based Data Analytics in Texera.
Proc. VLDB Endow., 2022
Fries: Fast and Consistent Runtime Reconfiguration in Dataflow Systems with Transactional Guarantees (Extended Version).
CoRR, 2022
Reshape: Adaptive Result-aware Skew Handling for Exploratory Analysis on Big Data.
CoRR, 2022
Mining Error Templates for Grammatical Error Correction.
CoRR, 2022
JEDI: These aren't the JSON documents you're looking for... (Extended Version*).
CoRR, 2022
JEDI: These aren't the JSON documents you're looking for?
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022
MuCGEC: a Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction.
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022
GSViz: progressive visualization of geospatial influences in social networks.
Proceedings of the 30th International Conference on Advances in Geographic Information Systems, 2022
SynGEC: Syntax-Enhanced Grammatical Error Correction with a Tailored GEC-Oriented Parser.
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, 2022
Rainbow: A Rendering-Aware Index for High-Quality Spatial Scatterplots with Result-Size Budgets.
Proceedings of the 22nd Eurographics Symposium on Parallel Graphics and Visualization, 2022
Demo of VisBooster: Accelerating Tableau Live Mode Queries Up to 100 Times Faster.
Proceedings of the Workshops of the EDBT/ICDT 2022 Joint Conference, 2022
Distributed Dynamic Economic Optimal Scheduling Method for Microgrid Based on Deep Learning.
Proceedings of the CAIBDA 2022, 2022
Public Opinions toward COVID-19 Vaccine Mandates: A Machine Learning-based Analysis of U.S. Tweets.
Proceedings of the AMIA 2022, 2022
2021
Why do people oppose mask wearing? A comprehensive analysis of U.S. tweets during the COVID-19 pandemic.
J. Am. Medical Informatics Assoc., 2021
Entity Relation Extraction as Dependency Parsing in Visually Rich Documents.
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021
2020
Robust and efficient memory management in Apache AsterixDB.
Softw. Pract. Exp., 2020
Tempura: A General Cost-Based Optimizer Framework for Incremental Data Processing.
Proc. VLDB Endow., 2020
Demonstration of Interactive Runtime Debugging of Distributed Dataflows in Texera.
Proc. VLDB Endow., 2020
Amber: A Debuggable Dataflow System Based on the Actor Model.
Proc. VLDB Endow., 2020
Similarity query support in big data management systems.
Inf. Syst., 2020
Tempura: A General Cost Based Optimizer Framework for Incremental Data Processing (Extended Version).
CoRR, 2020
Grosbeak: A Data Warehouse Supporting Resource-Aware Incremental Computing.
Proceedings of the 2020 International Conference on Management of Data, 2020
Marviq: Quality-Aware Geospatial Visualization of Range-Selection Queries Using Materialization.
Proceedings of the 2020 International Conference on Management of Data, 2020
Chunk-based Chinese Spelling Check with Global Optimization.
Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, 2020
2019
Inves: Incremental Partitioning-Based Verification for Graph Similarity Search.
Proceedings of the Advances in Database Technology, 2019
Synergy of Database Techniques and Machine Learning Models for String Similarity Search and Join.
Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019
2018
ZigZag: Supporting Similarity Queries on Vector Space Models.
Proceedings of the 2018 International Conference on Management of Data, 2018
End-to-End Machine Learning with Apache AsterixDB.
Proceedings of the Second Workshop on Data Management for End-To-End Machine Learning, 2018
Enhancing Big Data with Semantics: The AsterixDB Approach (Poster).
Proceedings of the 12th IEEE International Conference on Semantic Computing, 2018
Supporting Similarity Queries in Apache AsterixDB.
Proceedings of the 21st International Conference on Extending Database Technology, 2018
Visually Analyzing A Billion Tweets: An Application for Collaborative Visual Analytics on Large High-Resolution Display.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018
Heatflip: Temporal-Spatial Sampling for Progressive Heat Maps on Social Media Data.
Proceedings of the IEEE International Conference on Big Data (IEEE BigData 2018), 2018
2017
Erratum to: Special issue on best papers of VLDB 2015.
VLDB J., 2017
Special issue on best papers of VLDB 2015.
VLDB J., 2017
A Demonstration of TextDB: Declarative and Scalable Text Analytics on Large Data Sets.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017
A Comparative Study of Log-Structured Merge-Tree-Based Spatial Indexes for Big Data.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017
Caching Geospatial Objects in Web Browsers.
Proceedings of the 25th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2017
Drum: A rhythmic approach to interactive analytics on large data.
Proceedings of the 2017 IEEE International Conference on Big Data (IEEE BigData 2017), 2017
2016
Negative Factor: Improving Regular-Expression Matching in Strings.
ACM Trans. Database Syst., 2016
Hobbes3: Dynamic generation of variable-length signatures for efficient approximate subsequence mappings.
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016
Towards interactive analytics and visualization on one billion tweets.
Proceedings of the 24th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, GIS 2016, Burlingame, California, USA, October 31, 2016
2015
Boosting the Quality of Approximate String Matching by Synonyms.
ACM Trans. Database Syst., 2015
LSM-Based Storage and Indexing: An Old Idea with Timely Benefits.
Proceedings of the Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data, 2015
RILCA: Collecting and Analyzing User-Behavior Information in Instant Search Using Relational DBMS.
Proceedings of the Real-Time Business Intelligence and Analytics, 2015
2014
Efficient hosted interpreters on the JVM.
ACM Trans. Archit. Code Optim., 2014
Storage Management in AsterixDB.
Proc. VLDB Endow., 2014
AsterixDB: A Scalable, Open Source BDMS.
Proc. VLDB Endow., 2014
Improving read mapping using additional prefix grams.
BMC Bioinform., 2014
Efficient instant-fuzzy search with proximity ranking.
Proceedings of the IEEE 30th International Conference on Data Engineering, Chicago, 2014
Mux-Kmeans: multiplex kmeans for clustering large-scale data set.
Proceedings of the ScienceCloud'14, 2014
2013
Supporting Search-As-You-Type Using SQL in Databases.
IEEE Trans. Knowl. Data Eng., 2013
Improving regular-expression matching on strings using negative factors.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013
String similarity measures and joins with synonyms.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2013
Efficient interpreter optimizations for the JVM.
Proceedings of the 2013 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, 2013
Efficient direct search on compressed genomic data.
Proceedings of the 29th IEEE International Conference on Data Engineering, 2013
Record Linkage: A 10-Year Retrospective.
Proceedings of the Database Systems for Advanced Applications, 2013
2012
ASTERIX: An Open Source System for "Big Data" Management and Analysis.
Proc. VLDB Endow., 2012
Speeding Up Chemical Searches Using the Inverted Index: The Convergence of Chemoinformatics and Text Search Methods.
J. Chem. Inf. Model., 2012
SKIF-P: a point-based indexing and ranking of web documents for spatial-keyword search.
GeoInformatica, 2012
Big data platforms: what's next?
XRDS, 2012
Analysis of Instant Search Query Logs.
Proceedings of the 15th International Workshop on the Web and Databases 2012, 2012
Supporting efficient top-k queries in type-ahead search.
Proceedings of the 35th International ACM SIGIR conference on research and development in Information Retrieval, 2012
Inside "Big Data management": ogres, onions, or parfaits?
Proceedings of the 15th International Conference on Extending Database Technology, 2012
2011
Efficient fuzzy full-text type-ahead search.
VLDB J., 2011
Supporting BioMedical Information Retrieval: The BioTracer Approach.
Trans. Large Scale Data Knowl. Centered Syst., 2011
ASTERIX: towards a scalable, semistructured data platform for evolving-world models.
Distributed Parallel Databases, 2011
Location-Based Instant Search.
Proceedings of the Scientific and Statistical Database Management, 2011
CHIME: An Efficient Error-Tolerant Chinese Pinyin Input Method.
Proceedings of the IJCAI 2011, 2011
Answering approximate string queries on large data sets using external memory.
Proceedings of the 27th International Conference on Data Engineering, 2011
The Flamingo Software Package on Approximate String Queries.
Proceedings of the Database Systems for Advanced Applications, 2011
2010
Seaform: Search-As-You-Type in Forms.
Proc. VLDB Endow., 2010
Search-As-You-Type: Opportunities and Challenges.
IEEE Data Eng. Bull., 2010
Interactive and fuzzy search: a dynamic way to explore MEDLINE.
Bioinform., 2010
Efficient parallel set-similarity joins using MapReduce.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2010
Efficient fuzzy type-ahead search in TASTIER.
Proceedings of the 26th International Conference on Data Engineering, 2010
Supporting location-based approximate-keyword queries.
Proceedings of the 18th ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems, 2010
Hybrid Indexing and Seamless Ranking of Spatial and Textual Features of Web Documents.
Proceedings of the Database and Expert Systems Applications, 21st International Conference, 2010
Fuzzy Keyword Search on Spatial Data.
Proceedings of the Database Systems for Advanced Applications, 2010
2009
Rewriting Queries using Views.
Proceedings of the Encyclopedia of Database Systems, 2009
Efficient Approximate Search on String Collections.
Proc. VLDB Endow., 2009
SAIL: Structure-aware indexing for effective and progressive top-k keyword search over XML documents.
Inf. Sci., 2009
Human genomes as email attachments.
Bioinform., 2009
Efficient interactive fuzzy keyword search.
Proceedings of the 18th International Conference on World Wide Web, 2009
Efficient top-k algorithms for fuzzy search in string collections.
Proceedings of the First International Workshop on Keyword Search on Structured Data, 2009
Efficient type-ahead search on relational data: a TASTIER approach.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2009
Best-Effort Top-k Query Processing Under Budgetary Constraints.
Proceedings of the 25th International Conference on Data Engineering, 2009
Space-Constrained Gram-Based Indexing for Efficient Approximate String Search.
Proceedings of the 25th International Conference on Data Engineering, 2009
2008
SEPIA: estimating selectivities of approximate string predicates in large Databases.
VLDB J., 2008
Adaptive-sampling algorithms for answering aggregation queries on Web sites.
Data Knowl. Eng., 2008
Cost-based variable-length-gram selection for string collections to support approximate queries efficiently.
Proceedings of the ACM SIGMOD International Conference on Management of Data, 2008
Data exchange: query answering for incomplete data sources.
Proceedings of the 3rd International ICST Conference on Scalable Information Systems, 2008
Quality-Aware Retrieval of Data Objects from Autonomous Sources for Web-Based Repositories.
Proceedings of the 24th International Conference on Data Engineering, 2008
Efficient Merging and Filtering Algorithms for Approximate String Searches.
Proceedings of the 24th International Conference on Data Engineering, 2008
Data exchange in the presence of arithmetic comparisons.
Proceedings of the EDBT 2008, 2008
Supporting Keyword Queries on Structured Databases with Limited Search Interfaces.
Proceedings of the Database Systems for Advanced Applications, 2008
2007
Report on the First International VLDB Workshop on Clean Databases (CleanDB 2006).
SIGMOD Rec., 2007
Using views to generate efficient evaluation plans for queries.
J. Comput. Syst. Sci., 2007
Communication-Efficient Query Answering with Quality Guarantees in Client-Server Applications.
Proceedings of the Tenth International Workshop on the Web and Databases, 2007
VGRAM: Improving Performance of Approximate Queries on String Collections Using Variable-Length Grams.
Proceedings of the 33rd International Conference on Very Large Data Bases, 2007
Processing Spatial-Keyword (SK) Queries in Geographic Information Retrieval (GIR) Systems.
Proceedings of the 19th International Conference on Scientific and Statistical Database Management, 2007
Protecting Individual Information Against Inference Attacks in Data Publishing.
Proceedings of the Advances in Databases: Concepts, 2007
2006
Supporting Efficient Record Linkage for Large Data Sets Using Mapping Techniques.
World Wide Web, 2006
Answering queries using materialized views with minimum size.
VLDB J., 2006
Achieving Communication Efficiency through Push-Pull Partitioning of Semantic Spaces to Disseminate Dynamic Information.
IEEE Trans. Knowl. Data Eng., 2006
Rewriting queries using views in the presence of arithmetic comparisons.
Theor. Comput. Sci., 2006
Relaxing Join and Selection Queries.
Proceedings of the 32nd International Conference on Very Large Data Bases, 2006
Supporting Approximate Similarity Queries with Quality Guarantees in P2P Systems.
Proceedings of the 13th International Conference on Management of Data, 2006
2005
Selectivity Estimation for Fuzzy String Predicates in Large Data Sets.
Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005
Indexing Mixed Types for Approximate Retrieval.
Proceedings of the 31st International Conference on Very Large Data Bases, Trondheim, Norway, August 30, 2005
XGuard: A System for Publishing XML Documents without Information Leakage in the Presence of Data Inference.
Proceedings of the 21st International Conference on Data Engineering, 2005
Quality-driven approximate methods for integrating GIS data.
Proceedings of the 13th ACM International Workshop on Geographic Information Systems, 2005
Answering aggregation queries on hierarchical web sites using adaptive sampling.
Proceedings of the 2005 ACM CIKM International Conference on Information and Knowledge Management, Bremen, Germany, October 31, 2005
2004
Secure XML Publishing without Information Leakage in the Presence of Data Inference.
(e)Proceedings of the Thirtieth International Conference on Very Large Data Bases, VLDB 2004, Toronto, Canada, August 31, 2004
RACCOON: A Peer-Based System for Data Integration and Sharing.
Proceedings of the 20th International Conference on Data Engineering, 2004
NNH: Improving Performance of Nearest-Neighbor Searches Using Histograms.
Proceedings of the Advances in Database Technology, 2004
On Containment of Conjunctive Queries with Arithmetic Comparisons.
Proceedings of the Advances in Database Technology, 2004
2003
Computing complete answers to queries in the presence of limited access patterns.
VLDB J., 2003
Using Constraints to Describe Source Contents in Data Integration Systems.
IEEE Intell. Syst., 2003
Schema-guided wrapper maintenance for web-data extraction.
Proceedings of the Fifth ACM CIKM International Workshop on Web Information and Data Management (WIDM 2003), 2003
Materializing views with minimal size to answer queries.
Proceedings of the Twenty-Second ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 2003
Describing and Utilizing Constraints to Answer Queries in Data-Integration Systems.
Proceedings of IJCAI-03 Workshop on Information Integration on the Web (IIWeb-03), 2003
Efficient Record Linkage in Large Data Sets.
Proceedings of the Eighth International Conference on Database Systems for Advanced Applications (DASFAA '03), 2003
A Supervised Visual Wrapper Generator for Web-Data Extraction.
Proceedings of the 27th International Computer Software and Applications Conference (COMPSAC 2003): Design and Assessment of Trustworthy Software-Based Systems, 2003
2002
Clustering for Approximate Similarity Search in High-Dimensional Spaces.
IEEE Trans. Knowl. Data Eng., 2002
Executing SQL over encrypted data in the database-service-provider model.
Proceedings of the 2002 ACM SIGMOD International Conference on Management of Data, 2002
Answering Queries Using Views with Arithmetic Comparisons.
Proceedings of the Twenty-first ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 2002
2001
Query processing and optimization in information-integration systems.
PhD thesis, 2001
Answering queries with useful bindings.
ACM Trans. Database Syst., 2001
Generating Efficient Plans for Queries Using Views.
Proceedings of the 2001 ACM SIGMOD international conference on Management of data, 2001
Data Placement for Multi-user Interactive DTV.
Proceedings of the 2001 IEEE International Conference on Multimedia and Expo, 2001
On Answering Queries in the Presence of Limited Access Patterns.
Proceedings of the Database Theory, 2001
Minimizing View Sets without Losing Query-Answering Power.
Proceedings of the Database Theory, 2001
2000
Answering Queries with Database Restrictions.
Proceedings of the Abstraction, 2000
Query Planning with Limited Source Capabilities.
Proceedings of the 16th International Conference on Data Engineering, San Diego, California, USA, February 28, 2000
1999
Computing Capabilities of Mediators.
Proceedings of the SIGMOD 1999, 1999
Optimizing Large Join Queries in Mediation Systems.
Proceedings of the Database Theory, 1999
1998
Capability Based Mediation in TSIMMIS.
Proceedings of the SIGMOD 1998, 1998
2D BubbleUp: Managing Parallel Disks for Media Servers.
Proceedings of the 5th International Conference of Foundations of Data Organization (FODO'98), 1998