Wednesday, June 10, 2020

Real Python: SettingWithCopyWarning in Pandas: Views vs Copies

NumPy and Pandas are comprehensive, efficient, and flexible Python tools for data manipulation. An important concept for proficient users of these two libraries is how data is referenced through shallow copies (views) and deep copies (or just copies). Pandas sometimes issues a SettingWithCopyWarning to warn you of a potentially inappropriate use of views and copies.
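
As a quick illustration of the distinction, here is a minimal NumPy sketch. The array values and the names arr, arr_view, and arr_copy are made up for this example. It relies only on the fact that basic slicing returns a view while .copy() allocates new data:

```python
import numpy as np

arr = np.array([1, 2, 4, 8, 16])

arr_view = arr[1:4]          # basic slicing returns a view onto arr's data
arr_copy = arr[1:4].copy()   # .copy() allocates independent data

arr_view[0] = 100            # writes through to arr
arr_copy[1] = 200            # leaves arr untouched

print(arr.tolist())                       # [1, 100, 4, 8, 16]
print(np.shares_memory(arr, arr_view))    # True
print(np.shares_memory(arr, arr_copy))    # False
```

Modifying the view changes arr, while modifying the copy does not.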

In this article, you’ll learn:

  • What views and copies are in NumPy and Pandas
  • How to properly work with views and copies in NumPy and Pandas
  • Why the SettingWithCopyWarning happens in Pandas
  • How to avoid getting a SettingWithCopyWarning in Pandas

You’ll first see a short explanation of what the SettingWithCopyWarning is and how to avoid it. You might find this enough for your needs, but you can also dig a bit deeper into the details of NumPy and Pandas to learn more about copies and views.

Prerequisites

To follow the examples in this article, you’ll need Python 3.7 or 3.8, as well as the libraries NumPy and Pandas. This article is written for NumPy version 1.18.1 and Pandas version 1.0.3. You can install them with pip:

$ python -m pip install -U "numpy==1.18.*" "pandas==1.0.*"

If you prefer Anaconda or Miniconda distributions, you can use the conda package management system. To learn more about this approach, check out Setting Up Python for Machine Learning on Windows. For now, it’ll be enough to install NumPy and Pandas in your environment:

$ conda install "numpy=1.18.*" "pandas=1.0.*"

Now that you have NumPy and Pandas installed, you can import them and check their versions:

>>> import numpy as np
>>> import pandas as pd

>>> np.__version__
'1.18.1'
>>> pd.__version__
'1.0.3'

That’s it. You have all the prerequisites for this article. Your versions might vary slightly, but the information below will still apply.

Note: This article requires you to have some prior Pandas knowledge. You’ll also need some knowledge of NumPy for the later sections.

Now you’re ready to start learning about views, copies, and the SettingWithCopyWarning!

Example of a SettingWithCopyWarning

If you work with Pandas, chances are that you’ve already seen a SettingWithCopyWarning in action. It can be annoying and sometimes hard to understand. However, it’s issued for a reason.

The first thing you should know about the SettingWithCopyWarning is that it’s not an error. It’s a warning. It warns you that you’ve probably done something that’s going to result in unwanted behavior in your code.

Let’s see an example. You’ll start by creating a Pandas DataFrame:

>>> data = {"x": 2**np.arange(5),
...         "y": 3**np.arange(5),
...         "z": np.array([45, 98, 24, 11, 64])}

>>> index = ["a", "b", "c", "d", "e"]

>>> df = pd.DataFrame(data=data, index=index)
>>> df
    x   y   z
a   1   1  45
b   2   3  98
c   4   9  24
d   8  27  11
e  16  81  64

This example creates a dictionary referenced by the variable data that contains:

  • The keys "x", "y", and "z", which will be the column labels of the DataFrame
  • Three NumPy arrays that hold the data of the DataFrame

You create the first two arrays with the routine numpy.arange() and the last one with numpy.array(). To learn more about arange(), check out NumPy arange(): How to Use np.arange().
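
To make the first two columns concrete, here is a small sketch of how those expressions evaluate. It assumes only that NumPy applies ** elementwise over an array. The name exponents is introduced here for illustration:

```python
import numpy as np

exponents = np.arange(5)   # integers 0 through 4
x = 2 ** exponents         # elementwise powers of 2
y = 3 ** exponents         # elementwise powers of 3

print(exponents.tolist())  # [0, 1, 2, 3, 4]
print(x.tolist())          # [1, 2, 4, 8, 16]
print(y.tolist())          # [1, 3, 9, 27, 81]
```

These values match the x and y columns shown in the DataFrame output above.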

The list attached to the variable index contains the strings "a", "b", "c", "d", and "e", which will be the row labels for the DataFrame.

Finally, you initialize the DataFrame df that contains the information from data and index.
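
As a hedged sketch of where this is heading, one common trigger for the warning is chained assignment on a filtered DataFrame. The mask condition and the value 0 below are assumptions chosen for illustration, and the exact warning class depends on your Pandas version, so the sketch records whatever warnings are raised instead of assuming a specific one:

```python
import warnings

import numpy as np
import pandas as pd

data = {"x": 2**np.arange(5),
        "y": 3**np.arange(5),
        "z": np.array([45, 98, 24, 11, 64])}
df = pd.DataFrame(data=data, index=["a", "b", "c", "d", "e"])

mask = df["z"] < 50  # hypothetical filter: rows a, c, and d

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Chained assignment: df[mask] builds a temporary copy, and the
    # assignment lands on that copy instead of on df.
    df[mask]["z"] = 0

print(df["z"].tolist())  # [45, 98, 24, 11, 64], so df is unchanged
print([w.category.__name__ for w in caught])  # varies by Pandas version

# A single .loc call selects and assigns in one step on df itself.
df.loc[mask, "z"] = 0
print(df["z"].tolist())  # [0, 98, 0, 0, 64]
```

The chained form silently leaves df untouched, which is the kind of unwanted behavior the warning is meant to flag.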

Read the full article at https://realpython.com/pandas-settingwithcopywarning/


from Planet Python
