The feature that I'm most excited about in sourmash 3.3.0 is the ability to directly use compressed SBT search databases.
Previously, if you wanted to search (say) 100,000 genomes from GenBank, you'd have to download a multi-gigabyte .tar.gz file and then uncompress it to ~20 GB before searching it. The time and disk space this required were major barriers for teaching and use.
In v3.3.0, Luiz Irber fixed this by, first, releasing the niffler Rust library with Pierre Marijon, to read and write compressed files; second, replacing our old khmer Bloom filter nodegraph with a Rust implementation (sourmash PR #799); and, third, adding direct zip file storage (sourmash #648).
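To make the third point a bit more concrete, here is a minimal sketch of why zip storage helps: a zip archive keeps a central index, so a single member can be decompressed on demand without unpacking everything to disk first. This is plain Python standard library code, not sourmash's implementation; the member names (`index.json`, `nodes/node-0`, `nodes/node-1`) and the helper functions are hypothetical stand-ins.

```python
import io
import zipfile


def build_demo_archive() -> bytes:
    """Create a tiny in-memory zip standing in for a compressed database.

    The member names and contents are hypothetical, not sourmash's layout.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("index.json", '{"nodes": ["node-0", "node-1"]}')
        archive.writestr("nodes/node-0", b"\x00" * 4096)
        archive.writestr("nodes/node-1", b"\x01" * 4096)
    return buffer.getvalue()


def read_member(archive_bytes: bytes, name: str) -> bytes:
    """Decompress a single member on demand; nothing is written to disk."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.read(name)


demo = build_demo_archive()
node = read_member(demo, "nodes/node-1")
print(len(demo), "bytes compressed;", len(node), "bytes in the requested member")
```

Only the requested member is ever decompressed, which is the property that lets a search touch just the parts of a database it needs while the rest stays compressed.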
So, as of the latest release, you can do the following:
    # install sourmash v3.3.0
    conda create -y -n sourmash-demo \
        -c conda-forge -c bioconda sourmash=3.3.0

    # activate environment
    conda activate sourmash-demo

    # download the 25k GTDB release89 guide database (~1.4 GB)
    curl -L https://osf.io/5mb9k/download > gtdb-release89-k31.sbt.zip

    # grab
from Planet SciPy