This Saturday and Monday I had my first experiences presenting at virtual conferences: on Saturday at Remote Pizza Python (brilliant line-up!) and on Monday at BudapestBI (note: this post predates the Monday talk, I’ll update it tomorrow after I’ve spoken).
My slides for Remote Pizza Python are here: “Flying Pandas – Modin, Dask & Vaex”. I covered the following in a 10-minute talk:
- Modin – a new academic project that defines a new algebra for dataframes (not just Pandas) and provides automated column and row parallelisation with no code changes
- Dask – great for processing blocked (partitioned) Pandas DataFrames in parallel on one or more machines (it can also parallelise across multiple cores on a single machine with in-RAM data, which I didn’t cover)
- Vaex – a new Pandas-like DataFrame library with a subset of the operations and a better string implementation, so you can fit more strings into RAM than with Pandas
- I recommended sticking with Pandas if your data fits in RAM, trying Modin if the data is in RAM and you want parallelisation, or using Dask if you have a bigger-than-RAM scenario, with Vaex being great for an experiment
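
The recommendation in the last bullet can be read as a small decision rule. The sketch below is only an illustration of that rule, not code from the talk: the function name `recommend_tool`, its parameters and the `headroom` multiplier are all hypothetical, and "fits in RAM" is reduced to a simple size comparison that real workloads will not always respect.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def recommend_tool(data_bytes, ram_bytes, want_parallelism=False, headroom=1.0):
    """Return "pandas", "modin" or "dask" following the rule of thumb above.

    All names here are hypothetical. `headroom` is an assumed multiplier for
    the extra memory that intermediate copies may need; 1.0 means none.
    """
    if data_bytes < 0 or ram_bytes <= 0 or headroom <= 0:
        raise ValueError("sizes and headroom must be positive")

    fits_in_ram = data_bytes * headroom <= ram_bytes
    if not fits_in_ram:
        # Bigger-than-RAM scenario: Dask works on blocks of the data.
        return "dask"
    if want_parallelism:
        # In-RAM data where parallelisation is wanted: try Modin.
        return "modin"
    # In-RAM data and plain Pandas is good enough: stay with Pandas.
    return "pandas"


if __name__ == "__main__":
    print(recommend_tool(2 * GIB, 16 * GIB))
    print(recommend_tool(2 * GIB, 16 * GIB, want_parallelism=True))
    print(recommend_tool(64 * GIB, 16 * GIB))
```

Vaex is deliberately left out of the return values, since the talk positions it as something to experiment with rather than a default choice. In practice, trying Modin usually starts with swapping the import for `import modin.pandas as pd`, and Dask's Pandas-like interface lives in `dask.dataframe`.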
The reaction was very positive and on the internal Discord chat we had some great Q&A about the use of Numba, Dask, Modin and other tools.
Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high-energy Springer Spaniel and is a consumer of fine coffees.
The post “Flying Pandas” and “Making Pandas Fly” – virtual talks this weekend on faster data processing with Pandas, Modin, Dask and Vaex appeared first on Entrepreneurial Geekiness.