2024
DNABERT-S: Learning Species-Aware DNA Embedding with Genome Foundation Models.
CoRR, 2024

2021
LSHvec: a vector representation of DNA sequences using locality sensitive hashing and FastText word embeddings.
Proceedings of BCB '21: 12th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, 2021

2020
Comparison and Benchmark of Graph Clustering Algorithms.
CoRR, 2020

2019
SpaRC: scalable sequence clustering using Apache Spark.
Bioinform., 2019

2017
A case study of tuning MapReduce for efficient Bioinformatics in the cloud.
Parallel Comput., 2017

2015
Performance evaluation and tuning of BioPig for genomic analysis.
Proceedings of the 2015 International Workshop on Data-Intensive Scalable Computing Systems, 2015