Introduction
Pandas is a Python library for data analysis and manipulation. Almost all operations in pandas revolve around DataFrames.
A DataFrame is an abstract representation of a two-dimensional table that can contain all sorts of data. DataFrames also let us name every column, which is why columns are often referred to as attributes or fields.
In this article we'll see how we can rename the columns of an already existing DataFrame.
There are two options for manipulating the column names of a DataFrame:
- Renaming the columns of an existing DataFrame
- Assigning custom column names while creating a new DataFrame
Let's take a look at both of the methods.
Renaming Columns of an Existing DataFrame
We have a sample DataFrame below:
import pandas as pd
data = {'Name':['John', 'Doe', 'Paul'],
'age':[22, 31, 15]}
df = pd.DataFrame(data)
The DataFrame df looks like this:
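For reference, here is a minimal self-contained sketch that builds the sample and prints it. The commented output is what pandas should display for this data.

```python
import pandas as pd

data = {'Name': ['John', 'Doe', 'Paul'],
        'age': [22, 31, 15]}
df = pd.DataFrame(data)

print(df)
# Expected output:
#    Name  age
# 0  John   22
# 1   Doe   31
# 2  Paul   15
```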
To rename the columns of this DataFrame, we can use the rename() method, which takes:
- A dictionary as the columns argument, containing the mapping of original column names to new column names as key-value pairs
- A boolean value as the inplace argument, which, if set to True, makes the changes on the original DataFrame instead of returning a new one
Let us change the column names in our DataFrame from Name, age to First Name, Age.
df.rename(columns = {'Name' : 'First Name', 'age' : 'Age'}, inplace = True)
Now, our df contains:
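A quick way to check the result is to inspect df.columns. The sketch below repeats the rename on the sample data and also shows the form without inplace, which leaves the original untouched and returns a renamed copy. The variable names renamed and columns_before are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Doe', 'Paul'],
                   'age': [22, 31, 15]})
mapping = {'Name': 'First Name', 'age': 'Age'}

# Without inplace=True, rename() returns a new DataFrame and leaves df as it was
renamed = df.rename(columns=mapping)
columns_before = list(df.columns)
print(columns_before)         # ['Name', 'age']
print(list(renamed.columns))  # ['First Name', 'Age']

# With inplace=True, df itself is modified
df.rename(columns=mapping, inplace=True)
print(list(df.columns))       # ['First Name', 'Age']
```

The non-inplace form is convenient when the original column names are still needed elsewhere.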
Assign Column Names While Creating a DataFrame
Now we will discuss how to assign column names while creating a DataFrame.
This is particularly helpful when you are creating a DataFrame from a CSV file and want to ignore the header column names and assign your own.
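By passing a list to the names argument of read_csv(), we can override the existing header with our own column names. The list should contain a name for every column in the data. If the lengths differ, pandas will not necessarily raise an exception and may instead misalign the data, for example by treating leading columns as the index.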
Note that if we want to rename only a few columns, it is better to use the rename() method on the DataFrame after creating it.
We will be creating a DataFrame using out.csv, which has the following contents:
Name, age
John, 22
Doe, 31
Paul, 15
Note that the first line in the file is the header line and contains the column names. Pandas, by default, assigns the column names to the DataFrame from the first line.
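Hence, we pass header=0 so that the first line is treated as a header and replaced, and specify the new column names in a list that is passed to the names argument. With header=None, the header line would instead be read as a row of data.
columns = ['First Name', 'Age']
# header=0 marks the first line as the header so that names can replace it
df = pd.read_csv('out.csv', header=0, names=columns)
df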
This results in:
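The effect of the header argument can be checked with a small sketch. It reads the same contents from an in-memory string instead of out.csv (an assumption made so the example is self-contained) and adds skipinitialspace=True, which is not part of the original call, to strip the space after each comma.

```python
import io
import pandas as pd

# In-memory stand-in for out.csv, with the same contents as the file
csv_text = "Name, age\nJohn, 22\nDoe, 31\nPaul, 15\n"
columns = ['First Name', 'Age']

# header=0: the first line is a header, and names replaces it
df = pd.read_csv(io.StringIO(csv_text), header=0, names=columns,
                 skipinitialspace=True)
print(list(df.columns))  # ['First Name', 'Age']
print(len(df))           # 3

# header=None: the old header line is kept as the first data row
df_with_header_row = pd.read_csv(io.StringIO(csv_text), header=None,
                                 names=columns, skipinitialspace=True)
print(len(df_with_header_row))  # 4
```

The extra row in the second frame is the old header line, which also prevents the Age column from being parsed as numbers.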
Another way of doing this is by specifying the column names in the plain old DataFrame() constructor.
The one difference is that the parameter that takes the list of column names is now called columns instead of names:
new_columns = ['First Name', 'Age']
# A plain list of rows keeps the ages as integers; np.array() would convert them to strings
data = [["Nicholas", 23], ["Scott", 32], ["David", 25]]
df = pd.DataFrame(data, columns=new_columns)
This results in a different DataFrame:
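One thing to watch with the constructor is the type of the input. A NumPy array holds a single dtype, so mixing names and numbers in np.array() stores everything as strings, while a list of lists keeps each value's type. The sketch below compares the two. The variable names rows, df_from_lists and df_from_array are only illustrative.

```python
import numpy as np
import pandas as pd

new_columns = ['First Name', 'Age']
rows = [["Nicholas", 23], ["Scott", 32], ["David", 25]]

# A list of lists keeps each value's type, so Age holds integers
df_from_lists = pd.DataFrame(rows, columns=new_columns)
print(df_from_lists['Age'].tolist())  # [23, 32, 25]

# np.array() needs a single dtype, so the ages become strings such as '23'
df_from_array = pd.DataFrame(np.array(rows), columns=new_columns)
print(df_from_array['Age'].tolist())
```

Both frames get the same column names. Only the type of the stored values differs.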
Conclusion
In this article we've quickly gone over how we can name and rename columns in DataFrames, either by assigning names while constructing the DataFrame instance or by renaming them after the fact with the rename() method.
from Planet Python