How much memory is your Pandas DataFrame or Series using? Pandas provides an API for measuring this, but a variety of implementation details mean the results can be confusing or misleading.
Consider the following example:
>>> import pandas as pd
>>> series = pd.Series(["abcdefhjiklmnopqrstuvwxyz" * 10
...                     for i in range(1_000_000)])
>>> series.memory_usage()
8000128
>>> series.memory_usage(deep=True)
307000128
Which is correct: is the memory usage 8MB or 300MB? Neither!
In this particular case it’s actually 67MB, at least with the default Python interpreter (CPython). That is partly because I cheated; in many real-world cases 300MB will be closer to the truth.
What’s going on? Let’s find out!
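One effect that can make a per-element total overshoot is object sharing. This may not be the only factor in the example above. The sketch below uses only the standard library and assumes a "deep" measurement simply adds up the size of every element, as the jump from 8MB to 300MB suggests. The names `text`, `values`, `naive_bytes` and `unique_bytes` are illustrative and do not come from Pandas.

```python
import sys

# One string object, built once at runtime (250 characters).
text = "".join(["abcdefhjiklmnopqrstuvwxyz"] * 10)

# A list of 100,000 references that all point at that same object.
n = 100_000
values = [text for _ in range(n)]

# Naive "deep" estimate: add up the size of every element, which counts
# the shared string once per reference.
naive_bytes = sum(sys.getsizeof(v) for v in values)

# Deduplicated estimate: count each distinct object only once, keyed by id().
seen = {}
for v in values:
    seen[id(v)] = sys.getsizeof(v)
unique_bytes = sum(seen.values())

print(f"naive:  {naive_bytes:,} bytes")
print(f"unique: {unique_bytes:,} bytes")
```

The naive total comes out exactly `n` times larger than the deduplicated one, because the list holds only one real string. When every element is a distinct object, the two totals agree, which is consistent with the larger figure often being the more realistic one.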