Monday, August 17, 2020

PSF GSoC students blogs: Parallelizing Wheel Downloads

And now it's clear as this promise
That we're making
Two progress bars into one

Hello there! It has been raining a lot lately and my wisdom tooth has decided to start growing today, causing me a mild fever. To whoever reading this, I hope it wouldn't happen to you.

Download Parallelization

I've been working on pip's download parallelization for quite a while now. As distribution download in pip was modeled as a lazily evaluated iterable of chunks, parallelizing such procedure is as simple as submitting routines that write files to disk to a worker pool.

Or at least that is what I thought.

Progress Reporting UI

pip is currently using customly defined progress reporting classes, which was not designed to working with multithreading code. Firstly, I want to try using these instead of defining separate UI for multithreaded progresses. As they use system signals for termination, one must the progress bars has to be running the main thread. Or sort of.

Since the progress bars are designed as iterators, I realized that we can call next on them. So quickly, I throw in some queues and locks, and prototyped the first working implementation of progress synchronization.

Performance Issues

Welp, I only said that it works, but I didn't mention the performance, which is terrible. I am pretty sure that the slow down is with the synchronization, since the map_multithread call doesn't seem to trigger anything that may introduce any sort of blocking.

This seems like a lot of fun, and I hope I'll get better tomorrow to continue playing with it!



from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...