Saturday, November 16, 2019

Ian Ozsvald: “Higher Performance Python” at PyDataCambridge 2019

I had the pleasure of speaking at the first PyDataCambridge conference (2019). This is the second PyData conference in the UK after PyDataLondon (which colleagues and I co-founded 6 years back). I’m super proud to see PyData spread to 6 regional meetups and now 2 UK conferences.

I spoke on Higher Performance Python, with a focus on making Pandas operations go faster and an eye on the upcoming Second Edition of our High Performance Python (O’Reilly) book. The talk covers:

  • Using line_profiler to evaluate sklearn’s LinearRegression vs NumPy’s lstsq (spoiler: lstsq is much faster, but only because sklearn is much safer; the slow-down comes entirely from safety code in sklearn that helps keep your overall productivity higher)
  • Using Pandas for line-by-line iteration (slow) vs apply (faster) and apply with raw=True to expose NumPy arrays (fastest)
  • Using Numba to JIT compile lstsq using apply with raw=True for a huge speed-up
  • Using Dask to parallelise the Numba solution for further speed-ups
  • Advice on being a “highly performant data scientist”
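As a rough illustration of the Pandas bullet above, here is a minimal sketch of the three calling styles applied to a least-squares fit with NumPy's lstsq. The data, the `ols_slope` helper and the row width are assumptions made up for this example, not the code from the talk, and the Numba and Dask steps are left out.

```python
import numpy as np
import pandas as pd


def ols_slope(y):
    """Fit y = m*x + c by least squares for x = 0..len(y)-1 and return m.

    Hypothetical helper for illustration only.
    """
    y = np.asarray(y, dtype=float)
    x = np.arange(len(y), dtype=float)
    A = np.vstack([x, np.ones(len(y))]).T  # design matrix with columns [x, 1]
    m, c = np.linalg.lstsq(A, y, rcond=None)[0]
    return m


# Made-up data: each row is a short series of 14 observations.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1_000, 14)))

# 1. Row-by-row iteration: a new Series is built for every row.
slopes_iter = pd.Series(
    [ols_slope(row) for _, row in df.iterrows()], index=df.index
)

# 2. apply along rows: no explicit loop, but each row is still a Series.
slopes_apply = df.apply(ols_slope, axis=1)

# 3. apply with raw=True: each row arrives as a plain NumPy array.
slopes_raw = df.apply(ols_slope, axis=1, raw=True)
```

All three variants compute the same slopes; only the per-row overhead differs. Timing each one with %timeit or line_profiler on your own data is the way to see the gap, since it depends on row count and row width.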

The last point is important: going “compiler happy” and writing highly efficient code may well slow down your team and your overall velocity. Among other things, I recommended profiling first, perhaps introducing Dask & Numba only with the team’s consent, and looking at tools like Bulwark to add tests to DataFrames so that you are not derailed by strange data bugs.
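To make the idea of adding tests to DataFrames concrete, here is a small sketch written in plain Pandas. It does not use Bulwark's API; the `check_no_nans` and `check_in_range` helpers and the sample columns are hypothetical names invented for this example.

```python
import pandas as pd


def check_no_nans(df, columns=None):
    """Raise AssertionError if a checked column contains NaN, else return df."""
    subset = df if columns is None else df[columns]
    bad = subset.columns[subset.isna().any()].tolist()
    assert not bad, f"NaN values found in columns: {bad}"
    return df


def check_in_range(df, column, low, high):
    """Raise AssertionError if any value in column lies outside [low, high]."""
    ok = df[column].between(low, high)
    assert ok.all(), f"{(~ok).sum()} values of {column!r} outside [{low}, {high}]"
    return df


# Made-up data for illustration.
df = pd.DataFrame({"hours": [1.5, 2.0, 3.25], "score": [0.2, 0.9, 0.5]})

# Each check returns the DataFrame unchanged, so checks chain with pipe.
checked = df.pipe(check_no_nans).pipe(check_in_range, "score", 0.0, 1.0)
```

Running checks like these at the start of a pipeline makes bad data fail loudly and early, rather than surfacing later as a confusing numerical result.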

Right now Micha and I are busily working to complete the second edition of our book. All going well, it’ll be in for Christmas, with a publication date around April 2020.

 


Ian is a Chief Interim Data Scientist via his Mor Consulting. Sign up for Data Science tutorials in London and to hear about his data science thoughts and jobs. He lives in London, is walked by his high-energy Springer Spaniel and is a consumer of fine coffees.

The post “Higher Performance Python” at PyDataCambridge 2019 appeared first on Entrepreneurial Geekiness.


