Monday, April 27, 2020

Ian Ozsvald: “Flying Pandas” and “Making Pandas Fly” – virtual talks this weekend on faster data processing with Pandas, Modin, Dask and Vaex

This Saturday and Monday bring my first experience of presenting at virtual conferences. On Saturday I spoke at Remote Pizza Python (brilliant line-up!) and on Monday I’m speaking at BudapestBI (note: this post predates that talk, I’ll update it tomorrow after I’ve spoken).

My slides for Remote Pizza Python are here: “Flying Pandas – Modin, Dask & Vaex”. I cover the following in a 10-minute talk:

  • Modin – a new academic project that defines a new algebra for dataframes (not just Pandas) and provides automated column and row parallelisation options with no code changes
  • Dask – great for processing blocked Pandas DataFrames in parallel on one or more machines (it can also parallelise across cores on a single machine with in-RAM data, which I didn’t cover)
  • Vaex – a new Pandas-like DataFrame with a subset of operations and a better string implementation, so you fit more strings into RAM than with Pandas
  • I recommended sticking with Pandas if your data fits in RAM, trying Modin if your data is already in RAM, or using Dask if you have a bigger-than-RAM scenario, with Vaex being a great candidate for an experiment
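To make the "blocked" idea in the Dask bullet concrete, here is a minimal sketch in plain Python of computing a column mean block by block and then combining the partial results. It is not Dask's API and not code from the talk: the function names (`split_into_blocks`, `partial_sum_count`, `blocked_mean`) are hypothetical, a list stands in for a DataFrame column, and threads are used only to keep the example portable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def split_into_blocks(values: Sequence[float], block_size: int) -> List[Sequence[float]]:
    """Cut one long column into consecutive blocks of at most block_size rows."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def partial_sum_count(block: Sequence[float]) -> Tuple[float, int]:
    """Per-block work: return (sum, row count) so partial results can be merged."""
    return sum(block), len(block)


def blocked_mean(values: Sequence[float], block_size: int, workers: int = 4) -> float:
    """Mean of a column computed one block at a time, then combined."""
    blocks = split_into_blocks(values, block_size)
    if not blocks:
        raise ValueError("cannot take the mean of an empty column")
    # Each block is handled independently; a real tool would run these on
    # separate cores or machines rather than threads (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_count, blocks))
    # Combine step: a mean cannot be averaged from per-block means when blocks
    # differ in size, so we merge sums and counts instead.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


column = list(range(1, 101))  # stand-in for one DataFrame column
result = blocked_mean(column, block_size=30)
print(result)
```

The point of the sketch is the shape of the computation: only one block needs to be in memory per worker at a time, which is what makes the bigger-than-RAM scenario workable, provided each operation can be expressed as per-block partials plus a combine step.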

The reaction was very positive and on the internal Discord chat we had some great Q&A about the use of Numba, Dask, Modin and other tools.


Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high-energy Springer Spaniel and is a consumer of fine coffees.

The post “Flying Pandas” and “Making Pandas Fly” – virtual talks this weekend on faster data processing with Pandas, Modin, Dask and Vaex appeared first on Entrepreneurial Geekiness.



