Jin Wang

Orcid: 0000-0002-3172-6133

Affiliations:
  • Megagon Labs
  • University of California, Los Angeles, Computer Science Department, CA, USA (former)
  • Tsinghua University, TNList, RIIT, Center for High-speed Railway Technology, Beijing, China (former)


According to our database, Jin Wang authored at least 63 papers between 2013 and 2024.

Collaborative distances:
  • Dijkstra number of four.
  • Erdős number of four.

Bibliography

2024
PORCA: Root Cause Analysis with Partially Observed Data.
CoRR, 2024

A Blueprint Architecture of Compound AI Systems for Enterprise.
CoRR, 2024

LLM-assisted Labeling Function Generation for Semantic Type Detection.
Proceedings of Workshops at the 50th International Conference on Very Large Data Bases, 2024

Boosting the Adversarial Robustness of Graph Neural Networks: An OOD Perspective.
Proceedings of the Twelfth International Conference on Learning Representations, 2024

Fairness-Aware Data Preparation for Entity Matching.
Proceedings of the 40th IEEE International Conference on Data Engineering, 2024

Deep Dirichlet Process Mixture Model for Non-parametric Trajectory Clustering.
Proceedings of the 40th IEEE International Conference on Data Engineering, 2024

Demonstration of a Multi-agent Framework for Text to SQL Applications with Large Language Models.
Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, 2024

2023
Watchog: A Light-weight Contrastive Learning based Framework for Column Annotation.
Proc. ACM Manag. Data, December, 2023

Efficient EMD-Based Similarity Search via Batch Pruning and Incremental Computation.
IEEE Trans. Knowl. Data Eng., 2023

Semantics-aware Dataset Discovery from Data Lakes with Contextualized Column-based Representation Learning.
Proc. VLDB Endow., 2023

Table Discovery in Data Lakes: State-of-the-art and Future Directions.
Proceedings of the Companion of the 2023 International Conference on Management of Data, 2023

Causal Discovery from Temporal Data.
Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2023

Sudowoodo: Contrastive Self-supervised Learning for Multi-purpose Data Integration and Preparation.
Proceedings of the 39th IEEE International Conference on Data Engineering, 2023

Spatiotemporal Activity Modeling via Hierarchical Cross-Modal Embedding: Extended Abstract.
Proceedings of the 39th IEEE International Conference on Data Engineering, 2023

2022
Spatiotemporal Activity Modeling via Hierarchical Cross-Modal Embedding.
IEEE Trans. Knowl. Data Eng., 2022

Optimizing Parallel Recursive Datalog Evaluation on Multicore Machines.
Proceedings of the SIGMOD '22: International Conference on Management of Data, Philadelphia, PA, USA, June 12, 2022

Machop: an end-to-end generalized entity matching framework.
Proceedings of aiDM '22: the Fifth International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, 2022

MSDR: Multi-Step Dependency Relation Networks for Spatial Temporal Forecasting.
Proceedings of the KDD '22: The 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, August 14, 2022

Highly Efficient String Similarity Search and Join over Compressed Indexes.
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

Efficient EMD-based Similarity Search via Batch Pruning and Incremental Computation (Extended Abstract).
Proceedings of the 38th IEEE International Conference on Data Engineering, 2022

Minun: evaluating counterfactual explanations for entity matching.
Proceedings of DEEM '22: the Sixth Workshop on Data Management for End-To-End Machine Learning, Philadelphia, 2022

Demonstration of LogicLib: An Expressive Multi-Language Interface over Scalable Datalog System.
Proceedings of the 31st ACM International Conference on Information & Knowledge Management, 2022

2021
Clustering Enhanced Error-tolerant Top-k Spatio-textual Search.
World Wide Web, 2021

Formal semantics and high performance in declarative machine learning using Datalog.
VLDB J., 2021

Updatable Learned Index with Precise Positions.
Proc. VLDB Endow., 2021

Deep Entity Matching: Challenges and Opportunities.
ACM J. Data Inf. Qual., 2021

Developing Big-Data Application as Queries: an Aggregate-Based approach.
IEEE Data Eng. Bull., 2021

PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation.
CoRR, 2021

A Graph-based Approach for Trajectory Similarity Computation in Spatial Networks.
Proceedings of the KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2021

KDDLog: Performance and Scalability in Knowledge Discovery by Declarative Queries with Aggregates.
Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

Revisiting Data Prefetching for Database Systems with Machine Learning Techniques.
Proceedings of the 37th IEEE International Conference on Data Engineering, 2021

Machamp: A Generalized Entity Matching Benchmark.
Proceedings of the CIKM '21: The 30th ACM International Conference on Information and Knowledge Management, Virtual Event, Queensland, Australia, November 1, 2021

2020
Power, Performance and Scalability for Big Data Query Languages: The Machine Learning Challenge.
PhD thesis, 2020

A Transformation-Based Framework for KNN Set Similarity Search.
IEEE Trans. Knowl. Data Eng., 2020

Boosting approximate dictionary-based entity extraction with synonyms.
Inf. Sci., 2020

RASQL: A Powerful Language and its System for Big Data Applications.
Proceedings of the 2020 International Conference on Management of Data, 2020

Discovering Subsequence Patterns for Next POI Recommendation.
Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, 2020

A Transformation-based Framework for KNN Set Similarity Search (Extended Abstract).
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

Fast Error-tolerant Location-aware Query Autocompletion.
Proceedings of the 36th IEEE International Conference on Data Engineering, 2020

2019
Large-Scale Frequent Episode Mining from Complex Event Sequences with Hierarchies.
ACM Trans. Intell. Syst. Technol., 2019

Monotonic Properties of Completed Aggregates in Recursive Queries.
CoRR, 2019

BigData Applications from Graph Analytics to Machine Learning by Aggregates in Recursion.
Proceedings of the 35th International Conference on Logic Programming (Technical Communications), 2019

Improving Distributed Similarity Join in Metric Space with Error-bounded Sampling.
CoRR, 2019

Hierarchical Inter-Attention Network for Document Classification with Multi-Task Learning.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

Learn Smart with Less: Building Better Online Decision Trees with Fewer Training Examples.
Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, 2019

A Hierarchical Framework for Top-k Location-Aware Error-Tolerant Keyword Search.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

Scalable Metric Similarity Join Using MapReduce.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

MF-Join: Efficient Fuzzy String Similarity Join with Multi-level Filtering.
Proceedings of the 35th IEEE International Conference on Data Engineering, 2019

An Efficient Sliding Window Approach for Approximate Entity Extraction with Synonyms.
Proceedings of the Advances in Database Technology, 2019

Distributed Query Engine for Multiple-Query Optimization over Data Stream.
Proceedings of the Database Systems for Advanced Applications, 2019

Synergy of Database Techniques and Machine Learning Models for String Similarity Search and Join.
Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019

2018
Mining Precise-Positioning Episode Rules from Event Sequences.
IEEE Trans. Knowl. Data Eng., 2018

Beyond Polarity: Interpretable Financial Sentiment Analysis with Hierarchical Query-driven Attention.
Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, 2018

Modeling Patient Visit Using Electronic Medical Records for Cost Profile Estimation.
Proceedings of the Database Systems for Advanced Applications, 2018

2017
A unified framework for string similarity search with edit-distance constraint.
VLDB J., 2017

Combining Knowledge with Deep Convolutional Neural Networks for Short Text Classification.
Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, 2017

An Efficient Framework for Exact Set Similarity Search Using Tree Structure Indexes.
Proceedings of the 33rd IEEE International Conference on Data Engineering, 2017

2016
Ranking support for matched patterns over complex event streams: The CEPR system.
Proceedings of the 32nd IEEE International Conference on Data Engineering, 2016

2015
Two birds with one stone: An efficient hierarchical framework for top-k and threshold-based string similarity search.
Proceedings of the 31st IEEE International Conference on Data Engineering, 2015

A Cost-aware Buffer Management Policy for Flash-based Storage Devices.
Proceedings of the Database Systems for Advanced Applications, 2015

2014
TL: A High Performance Buffer Replacement Strategy for Read-Write Splitting Web Applications.
Proceedings of the Web Technologies and Applications - 16th Asia-Pacific Web Conference, 2014

2013
A New Plug-in System Supporting Very Large Digital Library.
Proceedings of the Digital Libraries: Social Media and Community Networks, 2013

pLSM: A Highly Efficient LSM-Tree Index Supporting Real-Time Big Data Analysis.
Proceedings of the 37th Annual IEEE Computer Software and Applications Conference, 2013

