Tuesday, July 7, 2020

Erik Marsja: Adding New Columns to a Dataframe in Pandas (with Examples)

The post Adding New Columns to a Dataframe in Pandas (with Examples) appeared first on Erik Marsja.

In this Pandas tutorial, we are going to learn all there is about adding new columns to a dataframe. Here, we are going to use the same three methods that we used to add empty columns to a Pandas dataframe. Specifically, when adding columns to the dataframe we are going to use the following 3 methods:

  1. Simply assigning new data to the dataframe
  2. The assign() method to add new columns
  3. The insert() method to add new columns

Outline

The outline of the tutorial is as follows: a brief introduction, and a quick overview of how to add new columns to a Pandas dataframe (all three methods). Following the overview of the three methods, we create some fake data, and then we use the three methods to add columns to the created dataframe.

Introduction

There are many things that we may want to do after we have created, or loaded, our dataframe in Pandas. For instance, we may go on and do some data manipulation tasks such as manipulating the columns of the dataframe. Now, if we are reading most of the data from one data source but some data from another we need to know how to add columns to a dataframe.

Adding a column to a Pandas dataframe is easy. Furthermore, as you surely have noticed, there are a few ways to carry out this task. Of course, this can create some confusion for beginners. As a beginner, you might see several different ways to add a column to a dataframe and ask yourself: which one should I use?

How to Add New Columns to a Dataframe in Pandas in 3 Ways

As previously mentioned, this tutorial is going to go through 3 different methods we can use when adding columns to the dataframe. First, we are going to use the method you may be familiar with if you know Python but have not worked with Pandas that much yet. Namely, we are going to use simple assigning:

1. Adding a New Column by Assigning New Data:

Here’s how to add a list, for example, to an existing dataframe in Pandas: df['NewCol'] = [1, 3, 4, 5, 6]. In the next example, we are going to use the assign() method:

2. Adding New Columns Using the assign() Method:

Here’s how to add new columns by using the assign() method: df = df.assign(NewCol1=[1, 2, 3, 4, 5], NewCol2=[.1, .2, .3, .5, -3]). After this, we will see an example of adding new columns using the insert() method:

3. Adding New Columns Using the insert() Method:

Here’s how new columns can be added with the insert() method: df.insert(4, 'NewCol', [1, 2, 3, 4, 5]). Note that insert() takes the position, the new column name, and the values, in that order. In the next section, before we go through the examples, we are going to create some example data to play around with.

Pandas dataframe from a dictionary

In most cases, we are going to read our data from an external data source. Here, however, we are going to create a Pandas dataframe from a dictionary:

import pandas as pd

gender = ['M', 'F', 'F', 'M']
cond = ['Silent', 'Silent', 
        'Noise', 'Noise']
age = [19, 21, 20, 22]
rt = [631.2, 601.3, 
     721.3, 722.4]

data = {'Gender':gender,
       'Condition':cond,
       'age':age,
       'RT':rt}

# Creating the Dataframe from dict:
df = pd.DataFrame(data)

In the code chunk above, we first imported Pandas and created 4 Python lists. Second, we created a dictionary with the column names we later want in our dataframe as keys and the 4 lists as values. Finally, we used the dataframe constructor to create a dataframe from our dictionary. If you need to learn more about importing data to a Pandas dataframe, see the tutorials on that topic.

Example 1: Adding New Columns to a dataframe by Assigning Data

In the first example, we are going to add new columns to the dataframe by assigning new data. For example, if we have a single value and a list, containing new data, that we need to add to an existing dataframe, we can just assign each of them as follows:

df['NewCol1'] = 'A'
df['NewCol2'] = [1, 2, 3, 4]

display(df)

In the code above, we first added a string ('A') by assigning it to a new column. To explain, the new column was created using the brackets ([]). Note, assigning a single value, as we did, will fill the entire newly added column with that value. Second, we added another column in the same way, but this time we assigned a list ([1, 2, 3, 4]). Finally, when adding columns using this method we set the new column names using Python strings.

The dataframe with the new, added, columns

Now, it’s important to know that each list we assign to a new column needs to be of the exact same length as the existing columns in the Pandas dataframe. For example, the example dataframe we are working with has 4 rows.

If we try to assign a list with only 3 values, it won’t work: Pandas raises a ValueError because the length of the values does not match the length of the index.
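To make the length requirement concrete, here is a minimal sketch of what happens when the assigned list is too short. The smaller dataframe and the column name TooShort are made up for illustration, and the exact wording of the error message may differ between Pandas versions:

```python
import pandas as pd

# A cut-down version of the example dataframe (4 rows)
df = pd.DataFrame({'Gender': ['M', 'F', 'F', 'M'],
                   'RT': [631.2, 601.3, 721.3, 722.4]})

error_message = None
try:
    # Only 3 values for a dataframe with 4 rows
    df['TooShort'] = [1, 2, 3]
except ValueError as err:
    error_message = str(err)
    print(error_message)

# The failed assignment leaves the dataframe unchanged
column_was_added = 'TooShort' in df.columns
print(column_was_added)
```

Catching the error like this is mostly useful for demonstration. In practice, it is usually better to check len(df) against the length of the new data before assigning.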

Example 2: Adding New Columns to a dataframe with the assign() method

In the second example, we are adding new columns to the Pandas dataframe with the assign() method:

df.assign(NewCol1='A',
         NewCol2=[1, 2, 3, 4])

In the second example, we assigned two new columns to our dataframe by adding two arguments to the assign() method. The names of these two arguments become the new column names. Furthermore, the new columns contain the same string and list that we used in the previous example. This way the resulting columns are exactly the same as in the first example. Note, however, that assign() returns a new dataframe and leaves the original unchanged, so we need to store the result (e.g., df = df.assign(...)) if we want to keep the new columns. Importantly, if we use the same names as already existing columns in the dataframe, the old columns will be overwritten. Again, when adding new columns, the data you want to add needs to be of the exact same length as the number of rows of the Pandas dataframe.
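The following sketch illustrates two properties of assign() that are easy to miss: it returns a new dataframe rather than changing the existing one, and reusing an existing column name overwrites that column in the returned dataframe. The variable names new_df and overwritten, and the replacement age values, are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Gender': ['M', 'F', 'F', 'M'],
                   'age': [19, 21, 20, 22]})

# assign() returns a new dataframe; df itself is not modified
new_df = df.assign(NewCol1='A', NewCol2=[1, 2, 3, 4])

original_columns = list(df.columns)
new_columns = list(new_df.columns)
print(original_columns)
print(new_columns)

# Using an existing column name replaces that column in the result
overwritten = df.assign(age=[29, 31, 30, 32])
print(overwritten)
```

Because nothing is changed in place, assign() fits well in method chains, but the result has to be stored somewhere if it should be kept.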

Example 3: Adding New Columns to dataframe in Pandas with the insert() method

In the third example, we are going to add new columns to the dataframe using the insert() method:

df.insert(4, 'NewCol1', 'Bre')
df.insert(5, 'NewCol2', [1, 2, 3, 4])

display(df)

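To explain the code above: we added two new columns using 3 arguments of the insert() method. First, we used the loc argument to “tell” Pandas where we want our new column to be located in the dataframe. In our case, we add them to the last position in the dataframe (the dataframe we created has 4 columns, so positions 4 and 5 put the new columns at the end). Second, we used the column argument (takes a string for the new column name). Lastly, we used the value argument to add the actual data: a string ('Bre'), which fills the entire column, and the same list as in the previous examples. Note, insert() changes the dataframe in place, so this example assumes we start from the dataframe as it was first created, without the columns added in Example 1. Here is the resulting dataframe: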

Two new columns added to the dataframe in Pandas using the insert method.

As you may have noticed, when working with the insert() method we need to know how many columns there are in the dataframe, because loc has to be between 0 and the number of columns. Also, by default it is not possible to insert a column with a name that already exists in the dataframe. If we don’t know the number of columns, we can use len(df.columns). Here is the same example as above using the length of the columns instead:

df.insert(len(df.columns), 'NewCol1', 'Bre')
df.insert(len(df.columns), 'NewCol2', [1, 2, 3, 4])
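As a self-contained sketch of the len(df.columns) approach, the code below rebuilds the example dataframe and appends the two columns at the end. It also shows that insert() modifies the dataframe in place and returns None. The variable names result and columns_after are only used for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Gender': ['M', 'F', 'F', 'M'],
                   'Condition': ['Silent', 'Silent', 'Noise', 'Noise'],
                   'age': [19, 21, 20, 22],
                   'RT': [631.2, 601.3, 721.3, 722.4]})

# insert() works in place, so there is nothing useful to store
result = df.insert(len(df.columns), 'NewCol1', 'Bre')

# len(df.columns) is now 5, so this column also lands at the end
df.insert(len(df.columns), 'NewCol2', [1, 2, 3, 4])

columns_after = list(df.columns)
print(result)
print(columns_after)
```

Since len(df.columns) is evaluated again on each call, the same line keeps working no matter how many columns have been added before it.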

Note, we can insert columns wherever we want in the dataframe by changing the loc argument. If the dataframe already contains columns with the same names (as ours does after running the code above), we also need to set allow_duplicates to True. Otherwise, Pandas will raise a ValueError. For example, the following adding column example will work:

df.insert(1, 'NewCol1', 'Bre', allow_duplicates=True)
df.insert(3, 'NewCol2', [1, 2, 3, 4], allow_duplicates=True)
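To see what allow_duplicates actually controls, here is a small sketch on a cut-down dataframe. It first tries to insert a column whose name already exists (which fails by default) and then repeats the call with allow_duplicates=True. The variable duplicate_error is only there to capture the message, whose wording may vary between Pandas versions:

```python
import pandas as pd

df = pd.DataFrame({'Gender': ['M', 'F', 'F', 'M'],
                   'age': [19, 21, 20, 22]})

# First insert: the name is new, so this works
df.insert(len(df.columns), 'NewCol1', 'Bre')

duplicate_error = None
try:
    # Same name again: not allowed by default
    df.insert(1, 'NewCol1', 'Bre')
except ValueError as err:
    duplicate_error = str(err)
    print(duplicate_error)

# Explicitly allowing duplicates makes the same call succeed
df.insert(1, 'NewCol1', 'Bre', allow_duplicates=True)

columns_after = list(df.columns)
print(columns_after)
```

Keep in mind that duplicate column names make later selections ambiguous: df['NewCol1'] then returns a dataframe with both columns rather than a single series, so this option is best used sparingly.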

Now, if we have a lot of columns there are, of course, alternatives that may be more feasible than the ones we have covered here. For instance, if we want to add columns from another dataframe we can use join, concat, or merge.
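As a brief sketch of that alternative, the code below adds a column from a second dataframe using pd.concat() and the join() method. The second dataframe (extra) and its Accuracy column are invented for this example, and both dataframes are assumed to share the same default index:

```python
import pandas as pd

df = pd.DataFrame({'Gender': ['M', 'F', 'F', 'M'],
                   'RT': [631.2, 601.3, 721.3, 722.4]})

# Hypothetical second dataframe holding the columns we want to add
extra = pd.DataFrame({'Accuracy': [0.9, 0.8, 0.95, 0.7]})

# Place the dataframes side by side (rows are matched on the index)
combined = pd.concat([df, extra], axis=1)

# join() also matches rows on the index
joined = df.join(extra)

print(combined)
print(joined.equals(combined))
```

Both approaches align rows by index, not by position, so if the two dataframes have different indexes the result will contain missing values where the indexes do not match.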

Conclusion

In this post, we learned how to add new columns to a dataframe in Pandas. Specifically, we used 3 different methods. First, we added columns by simply assigning a string and a list. This method is very similar to when we assign values to Python variables. Second, we used the assign() method and added new columns in the Pandas dataframe. Finally, we had a look at the insert() method and used this method to add new columns in the dataframe. In conclusion, the best method to add columns is the assign() method. Of course, if we read data from other sources and want to merge two dataframes, only getting the new columns from one dataframe, we should use other methods (e.g., concat or merge).

Hope you enjoyed this Pandas tutorial and please leave a comment below. Especially, if there is something you want to be covered on the blog or something that should be added to this blog post. Finally, please share the post if you learned something new!




from Planet Python
