Tuesday, February 2, 2021

Stack Abuse: How to Rename Pandas DataFrame Column in Python

Introduction

Pandas is a Python library for data analysis and manipulation. Almost all operations in pandas revolve around DataFrames.

A DataFrame is an abstract representation of a two-dimensional table which can contain all sorts of data. DataFrames also let us give every column a name, which is why columns are often referred to as attributes or fields when using DataFrames.

In this article we'll see how we can rename an already existing DataFrame's columns.

There are two options for manipulating the column names of a DataFrame:

  1. Renaming the columns of an existing DataFrame
  2. Assigning custom column names while creating a new DataFrame

Let's take a look at both of the methods.

Renaming Columns of an Existing DataFrame

We have a sample DataFrame below:

import pandas as pd
data = {'Name':['John', 'Doe', 'Paul'], 
        'age':[22, 31, 15]} 
df = pd.DataFrame(data)

The DataFrame df looks like this:

[Figure: the original pandas DataFrame]

To rename the columns of this DataFrame, we can use the rename() method which takes:

  1. A dictionary as the columns argument, mapping the original column names to the new column names as key-value pairs
  2. A boolean value as the inplace argument, which, if set to True, applies the changes to the original DataFrame instead of returning a new one

Let us change the column names in our DataFrame from Name, age to First Name, Age.

df.rename(columns = {'Name' : 'First Name', 'age' : 'Age'}, inplace = True)

Now, our df contains:

[Figure: the pandas DataFrame with renamed columns]
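
If you would rather keep the original DataFrame untouched, rename() can also be called without inplace. The following is a minimal sketch of that variation, reusing the sample data from above; the variable name renamed is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Doe', 'Paul'], 'age': [22, 31, 15]})

# Without inplace=True, rename() leaves df as-is and returns a renamed copy
renamed = df.rename(columns={'Name': 'First Name', 'age': 'Age'})

print(list(df.columns))       # ['Name', 'age']
print(list(renamed.columns))  # ['First Name', 'Age']
```

Working on the returned copy makes it easier to chain further operations and to compare the result with the original.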

Assign Column Names While Creating a DataFrame

Now we will discuss how to assign column names while creating a DataFrame.

This is particularly helpful when you are creating a DataFrame from a CSV file and want to ignore the header column names and assign your own.

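By passing a list to the names argument, we can override the already existing header column with our own. The list should have a name for every column in the data. If it has fewer names than there are columns, pandas does not necessarily raise an exception; it may instead use the leading columns as the index, which is rarely what you want.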

Note that if we want to rename only a few columns, it is better to use the rename method on the DataFrame after creating it.
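
As a rough illustration of that point, the mapping passed to rename() only needs to list the columns you want to change. The sketch below reuses the sample data from earlier in the article and renames just one column.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Doe', 'Paul'], 'age': [22, 31, 15]})

# Only the keys present in the mapping are renamed; 'Name' is left unchanged
df = df.rename(columns={'age': 'Age'})

print(list(df.columns))  # ['Name', 'Age']
```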

We will be creating a DataFrame using out.csv, which has the following contents:

Name, age
John, 22
Doe, 31
Paul, 15

Note that the first line in the file is the header line and contains the column names. Pandas, by default, assigns the column names to the DataFrame from the first line.

Hence, we will tell pandas that the first line is a header row (header = 0) and pass our own column names as a list to the names argument, which then replaces the names found in the file. Passing header = None instead would make pandas treat the header line as a row of data:

columns = ['First Name', 'Age']
df = pd.read_csv('out.csv', header = 0, names = columns)
df

This results in:

[Figure: the new DataFrame with the changed column names]
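
To try this without a file on disk, the same text can be read from an in-memory buffer. This is only a sketch: it assumes header=0, so that the existing header line is replaced by the supplied names, and it adds skipinitialspace=True (not used in the article) to drop the space after each comma. The csv_text variable is a stand-in for out.csv.

```python
import io

import pandas as pd

# In-memory stand-in for out.csv, with the same contents as shown above
csv_text = "Name, age\nJohn, 22\nDoe, 31\nPaul, 15\n"

columns = ['First Name', 'Age']
df = pd.read_csv(
    io.StringIO(csv_text),
    header=0,               # the first line is a header row that names= replaces
    names=columns,
    skipinitialspace=True,  # ignore the space that follows each comma
)

print(list(df.columns))  # ['First Name', 'Age']
print(len(df))           # 3
```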

Another way of doing this is by specifying the column names in the plain old DataFrame() constructor.

The one difference is that the parameter that takes the list of column names is now called columns instead of names:

import numpy as np

new_columns = ['First Name', 'Age']
# Note: a NumPy array has a single dtype, so the mixed values below are all stored as strings
data = np.array([["Nicholas", 23],["Scott", 32],["David", 25]])

df = pd.DataFrame(data, columns = new_columns)

This results in a different DataFrame:

[Figure: the DataFrame created with the constructor and custom column names]
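
One thing to keep in mind is that a NumPy array holding both text and numbers stores every value as a string. If the numeric column should stay numeric, one option (a sketch, not the only way) is to pass a plain list of rows to the constructor. The rows variable is an illustrative name.

```python
import pandas as pd

new_columns = ['First Name', 'Age']
rows = [['Nicholas', 23], ['Scott', 32], ['David', 25]]

# A list of rows lets pandas infer a type per column, so Age stays an integer column
df = pd.DataFrame(rows, columns=new_columns)

print(list(df.columns))  # ['First Name', 'Age']
print(df['Age'].sum())   # 80
```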

Conclusion

In this article, we've quickly gone over how to name and rename columns in DataFrames, either by assigning names while constructing the DataFrame instance or by renaming them after the fact with the rename() method.



from Planet Python