Saturday, August 29, 2020

Erik Marsja: How to Make Column Index in Pandas Dataframe – with Examples

The post How to Make Column Index in Pandas Dataframe – with Examples appeared first on Erik Marsja.

In this short Pandas tutorial, you will learn how to make column index in a dataframe. Standarly, when creating a dataframe, whether from a dictionary, or by reading a file (e.g., reading a CSV file, opening an Excel file) an index column is created. For this reason, we need to either set the specific column we want to be index when creating the file or, simply, making one of the columns index later (e.g., after we’ve read a CSV file). Namely, if we want a specific column to be the index in our dataframe. 

Outline of this Tutorial

For this purpose, the standard functionality of Pandas, this tutorial will teach you how to set which column you want to make index. First of all, you are going to learn the general syntax of the set_index() method and get the answer to the question “How do I create an index in Pandas??” followed by the general syntax of the set_index() method. In the following two sections, you will learn how to make a specific column index when reading data from 1) a CSV file, and 2) an Excel file. 

Following this, the next two sections will get into more details on how to use set_index() to make 1) a column in the dataframe index, and 2) how to create multiple indexes (MultiIndex) from two columns. First off all, you need to know the column names (or get column names of the Pandas dataframe) to set a column index.

Make a Column index: Syntax of the set_index() method

Here’s how to make a specific column index in Pandas dataframe:

your_df.set_index('Column_to_make_index')

In the code example above, ‘Column_to_make_index’ is, of course, the name of the column you want to make index. As often, when working with Pandas, this will note make the column you chose permanently the dataframes index (to make it permanent you should use the inplace parameter).

Column made as index in Pandas dataframe

One neat thing to remember is that set_index() can take multiple columns as the first argument. Here’s how to make multiple columns index in the dataframe:

your_df.set_index(['Col1', 'Col2'])

As you may have understood now, Pandas set_index()method can take a string, list, series, or dataframe to make index of your dataframe. Have a look at the documentation for more information.

As you will learn later, index column can be set while creating, or importing, data too. This, among other things, is what you actually will learn later in this post.

How do I create an index in pandas??

To create an index, from a column, in Pandas dataframe you use the set_index() method. For example, if you want the column “Year” to be index you type df.set_index(“Year”). Now, the set_index()method will return the modified dataframe as a result. Therefore, you should use the inplace parameter to make the change permanent. In the next two sections, you will learn how to make a column index while importing data.

<<<<

Make a Column Index when Reading a CSV file with Pandas

To make a specific column index when reading a csv file, we use the index_col parameter. First, let’s have a look if we just import data from the .csv file with Pandas read_csv method:

import pandas as pd

csv_url = https://raw.githubusercontent.com/marsja/jupyter/master/SimData/FifthDayData.csv'
df = pd.read_csv(csv_url)
Dataframe without setting column as index Pandas

Evidently, the index column was created when importing the data from the file. In this situation, if we want to make a specific column in the CSV file a column, we can change the code above to this:

# Make column index while reading CSV
df = pd.read_csv('https://raw.githubusercontent.com/marsja/jupyter/master/SimData/FifthDayData.csv',
                index_col=0)
df.head()

As a result, we set the “ID” column as index. Here’s what first five rows in the new dataframe, in which we set the index to a column, looks like:

The column

As you can see, making the “ID” column index was pretty simple. In the next section, you will get an example of how to import an Excel file and setting a column index.  Note, it’s also possible to use an integer, as input, to the parameter. For example, instead of using “ID”, in the example above, using “0” would give us the same result (without “”, of course). Next step, might be to rename a column in the dataframe (or multiple columns).

How to Make a Column Index when Reading an Excel file with Pandas

Here’s how to read an Excel (.xlsx) file using Pandas and making a column, of your choice, index:

xlsx_url = 'https://github.com/marsja/jupyter/blob/master/SimData/example_concat.xlsx?raw=true'

# Make column index while reading xlsx
df = pd.read_excel(xlsx_url, csv_url
                  index_col='ID')
df.head()

Again, we used the index_col parameter. As a result, this is how the first 5 rows look like:

Here is the dataframe in which we made

Note, if you want to play around with the Excel file you can download it from here. Additionally, you can, now, add an empty column to the dataframe. In the next section, we will have a look at how to make a column index in an existing dataframe. As an illustration, we will first create a dataframe from a Python dictionary. 

How to Make a Column Index in an Existing Dataframe

In this section, we will have a look at how to make a column index when we already have a dataframe. First, however, we will create a dataframe from a Python dictionary:

import pandas as pd

data = {'ID':[1, 2, 3, 4, 5],
       'Day':['Sun', 'Mon', 'Tue', 
              'Sun', 'Sat'],
       'Pandas':['GroupBy', 'Histogram',
                'Correlation Matrix', 'Rename Column',
                'Slicing']}

# Create Dataframe from dict
df = pd.DataFrame(data)

In the example above, we first created a dictionary. Each key in the dictionary will become a column. Furthermore, data (or cells) will be the values in the dictionary (i.e., in lists). Here’s how the resulting Pandas dataframe looks like:

Pandas dataframe where we are going to make the

If you want to, you can add a column to the Pandas dataframe. That is, a column that you want to set as index (see next section).

Set a Column Index

Here’s how to use Pandas dataframe set_index() method to make a column index:

# make column index
df.set_index('ID', inplace=True)

As previously mentioned, and as you can see in the code example above, we used the inplace parameter and set it to True. Remember, we will have to do this to make the change (e.g., setting the index) permanent. Here’s how the resulting dataframe looks like:

Here is the dataframe after we set the

Set Multiple Columns Index

In this example, we are going to make two columns as index in the Pandas dataframe. As you may have understood already, this can be done by merely adding a list of column names to the set_index() method:

df.set_index(['ID', 'Pandas'], inplace=True)

All in all, setting two, or more, columns index will end up with a MultiIndex dataframe. Have a look at the image below to see an example where we printed the indexes we made above.

indexes (columns)

It is worth noting, however, that the code example above will not work if you already ran the previous example. That is, this will only work if we did not set “ID” as index before trying to set both “ID” and “Pandas” as indexes.

That was it, now you know how to set a column as index. There are, of course, other things you can do now. For instance, you can also use reset_index() to revert what you have done in this post (i.e., undo making a column index). Furthermore, you can go on and rename columns in the dataframe, calculate descriptive statistics, creating dummy variables with the get_dummies() method, to name a few things. 

Conclusion

In this Pandas tutorial, you have learned how to make a column index in the dataframe. First, you had a look at the syntax of set_index(). Later on, you learned how to make a column, in a data file, index when reading 1) a .csv file, and 2) an Excel file. Finally, in the two last sections, you learned how to 1) make a single column index, and 2) how to make multiple (here two) columns index, in an existing dataframe. 

Hope you learned something valuable! If you did, please leave a comment and share the post (press the buttons below) on social media. Finally, you can also leave a comment if there’s something that is not working in the post, or if you want me to cover something in a future post.

The post How to Make Column Index in Pandas Dataframe – with Examples appeared first on Erik Marsja.



from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...