DNABERT-S: Learning Species-Aware DNA Embedding with Genome Foundation Models.
CoRR, 2024
LSHvec: a vector representation of DNA sequences using locality sensitive hashing and FastText word embeddings.
Proceedings of the 12th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics (BCB '21), 2021
Comparison and Benchmark of Graph Clustering Algorithms.
CoRR, 2020
SpaRC: scalable sequence clustering using Apache Spark.
Bioinformatics, 2019
A case study of tuning MapReduce for efficient Bioinformatics in the cloud.
Parallel Computing, 2017
Performance evaluation and tuning of BioPig for genomic analysis.
Proceedings of the 2015 International Workshop on Data-Intensive Scalable Computing Systems, 2015