Tuesday, August 25, 2020

Python⇒Speed: Estimating and modeling memory requirements for data processing

Whether it’s a data processing pipeline or a scientific computation, you will often want to figure out how much memory your process is going to need:

  • If you’re running out of memory, it’s good to know whether you just need to upgrade your laptop from 8GB to 16GB RAM, or whether your process wants 200GB RAM and it’s time to do some optimization.
  • If you’re running a parallelized computation, you will want to know how much memory each individual task takes, so you know how many tasks to run in parallel.
  • If you’re scaling up to multiple runs, you’ll want to estimate the costs, whether hardware or cloud resources.

In the first case above, you can’t actually measure peak memory usage because your process is running out of memory. And in the remaining cases, you might be running with different inputs at different times, resulting in different memory requirements.

What you really need, then, is a model of how much memory your program will need for different input sizes. Let’s see how you can do that.
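As a rough illustration of what such a model might look like, here is a minimal sketch, not the article's own method. It assumes memory grows roughly linearly with input size, uses a hypothetical `process()` function as a stand-in for your real workload, and measures peak usage with the standard library's `tracemalloc`, which only sees allocations made through Python's memory allocator.

```python
import tracemalloc


def process(n):
    """Hypothetical workload (an assumption): memory grows roughly linearly with n."""
    data = list(range(n))
    return sum(data)


def peak_bytes(func, n):
    """Run func(n) and return the peak memory (in bytes) traced by tracemalloc."""
    tracemalloc.start()
    try:
        func(n)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Measure peak memory at a few small input sizes that are cheap to run.
sizes = [100_000, 200_000, 400_000, 800_000]
peaks = [peak_bytes(process, n) for n in sizes]
slope, intercept = fit_line(sizes, peaks)


def predict_bytes(n):
    """Extrapolate peak memory for an input of size n from the fitted line."""
    return slope * n + intercept


print(f"about {slope:.1f} bytes per item")
print(f"predicted peak for 10,000,000 items: {predict_bytes(10_000_000) / 1e6:.0f} MB")
```

The idea is to measure on inputs small enough to run comfortably, then extrapolate to the sizes you cannot test directly. If your program's memory use is not linear in the input size, the fitted line will mislead you, so it is worth checking that the measured points actually fall close to it.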

Read more...

from Planet Python

