Monday, June 28, 2021

Python⇒Speed: Measuring the memory usage of a Pandas DataFrame

How much memory are your Pandas DataFrame or Series using? Pandas provides an API for measuring this, but a variety of implementation details mean the results can be confusing or misleading.
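For a DataFrame, the same `memory_usage()` method returns one number per column plus one for the index. The sketch below uses a small hypothetical DataFrame (the column names `n` and `s` are made up for illustration). It sets `dtype=object` explicitly on the string column, on the assumption that you want the classic object-dtype behavior, because newer pandas versions may infer a dedicated string dtype instead.

```python
import pandas as pd

# Hypothetical example: one integer column and one column of Python str objects.
# dtype=object is set explicitly so the strings are stored as Python objects.
df = pd.DataFrame({
    "n": range(1_000),
    "s": pd.Series(["x" * 50 for _ in range(1_000)], dtype=object),
})

# Shallow: for the object column this counts only the array of pointers.
shallow = df.memory_usage()
# Deep: additionally adds the size of every Python object in object columns.
deep = df.memory_usage(deep=True)

print(shallow)
print(deep)
print("total (deep):", deep.sum())
```

The numeric column reports the same size either way; only the object column changes when `deep=True` is passed.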

Consider the following example:

>>> import pandas as pd
>>> series = pd.Series(["abcdefhjiklmnopqrstuvwxyz" * 10
...                     for i in range(1_000_000)])
>>> series.memory_usage()
8000128
>>> series.memory_usage(deep=True)
307000128

Which is correct: is the memory usage 8MB or 300MB? Neither!

In this special case, it’s actually 67MB, at least with the default Python interpreter. This is partially because I cheated, and often 300MB will actually be closer to the truth.
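One thing worth checking yourself is whether the Series elements are actually distinct objects. This is offered as an assumption about what can skew the `deep=True` number, not as the article's own explanation. The sketch rebuilds a smaller version of the example (10,000 rows, `dtype=object` set explicitly). In CPython, the repeated-string expression may be constant-folded, so every element could reference the same `str` object, while `deep=True` still adds that object's size once per row.

```python
import sys

import pandas as pd

# Smaller version of the example above; dtype=object keeps Python str objects.
series = pd.Series(
    ["abcdefhjiklmnopqrstuvwxyz" * 10 for i in range(10_000)], dtype=object
)

shallow = series.memory_usage()        # index + one pointer per row
deep = series.memory_usage(deep=True)  # also adds each element's size, per row

# Deduplicate by object identity to see how many distinct strings exist.
distinct = {id(s): s for s in series}
# Size of the distinct string objects, each counted only once.
unique_bytes = sum(sys.getsizeof(s) for s in distinct.values())

print(shallow, deep, len(distinct), unique_bytes)
```

If `len(distinct)` is much smaller than the number of rows, the deep figure counts the same object many times, which is one way it can overstate real usage.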

What’s going on? Let’s find out!


from Planet Python

