Wednesday, February 12, 2020

Erik Marsja: How to Plot a Histogram with Pandas in 3 Simple Steps

The post How to Plot a Histogram with Pandas in 3 Simple Steps appeared first on Erik Marsja.

In this post, we are going to learn how to plot histograms with Pandas in Python. Specifically, we are going to learn 3 simple steps to make a histogram with Pandas. Now, plotting a histogram is a good way to explore the distribution of our data.

Prerequisites

First of all, we need to have Python 3.x and Pandas installed to be able to create a histogram with Pandas. Both are included in scientific Python distributions such as Anaconda or ActivePython. Alternatively, Pandas can be installed, like many Python packages, using pip: pip install pandas. Note that Pandas relies on Matplotlib for plotting, so Matplotlib needs to be installed as well (pip install matplotlib).

How to Make a Pandas Histogram

Before we learn how to make a histogram in Pandas step by step, here’s how we generally create a histogram using Pandas: pandas.DataFrame.hist(). That is, we call the method on a dataframe object: df.hist(column='DV'). Note that DV is the column with the dependent variable we want to plot.
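To make the general call concrete, here is a minimal, self-contained sketch. The data is simulated and the dataframe name df_example is hypothetical; only the column name DV follows the text above. The Agg backend is an assumption that lets the sketch run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed

import numpy as np
import pandas as pd

# Hypothetical data: 200 simulated values in a column named 'DV'
rng = np.random.default_rng(seed=0)
df_example = pd.DataFrame({"DV": rng.normal(loc=0.5, scale=0.1, size=200)})

# hist() draws the histogram and returns the Matplotlib Axes it drew on
axes = df_example.hist(column="DV")

# Pandas uses the column name as the title of the plot
first_ax = np.ravel(axes)[0]
print(first_ax.get_title())
```

In a Jupyter notebook the figure is displayed automatically; in a script, the figure can be written to disk with first_ax.figure.savefig().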

How to Plot a Histogram with Pandas in 3 Simple Steps:

Now we are ready to make a histogram using Pandas. It is straightforward, and we will go through each step here.

1. Import Pandas

In the first step, we import pandas as pd. That is, write the following at the top of the Python script or Jupyter notebook:

import pandas as pd

2. Import Data

In the second step, we are going to load data from an Excel file using the Pandas read_excel() function. Here’s how to read the data:

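from pathlib import Path

# Build the path to the Excel file inside the Simdata folder
datafile = Path('Simdata', 'play_data.xlsx')

# Read the Excel file, using the first column as the index
df = pd.read_excel(datafile, index_col=0)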

Note that in the example above the Excel file (i.e., the data) is located in a subfolder named Simdata, relative to the working directory (typically the folder containing the Python script or Jupyter notebook). It’s also possible to read CSV files into Pandas dataframes, using read_csv(), if we store our data in that file type.
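As a rough sketch of the CSV route, the example below reads a small, made-up CSV with read_csv(). The content and the name df_csv are hypothetical; an in-memory text buffer stands in for a real file so that the sketch is self-contained.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = "ID,RT,Gender\n1,0.52,Male\n2,0.61,Female\n3,0.47,Female\n"

# read_csv() accepts a file path or any file-like object;
# index_col=0 uses the first column (ID) as the index
df_csv = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(df_csv.head())
```

With a real file, the buffer would be replaced by a path, for example one built with Path() as in the Excel example.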

3. Create the Histogram using Pandas hist()

In the third, and final step, we are going to create a histogram with Pandas. Specifically, we are going to use df.hist() to do this.

df.hist(column='RT')
[Figure: histogram of the RT column]

As can be seen in the code chunk above, we have also selected one column (i.e., with column='RT') to create the histogram from. Note that if we hadn’t selected the RT column, the Pandas hist() method would have created histograms of all numeric variables in the dataframe:

[Figure: histograms of all numeric variables in the dataframe]
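This behavior can be checked with a small simulated dataframe. In the sketch below, the data and the names df_sim and Accuracy are hypothetical; only RT and Gender follow the example data in this post.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed

import numpy as np
import pandas as pd

# Hypothetical data: two numeric columns and one text column
rng = np.random.default_rng(seed=1)
df_sim = pd.DataFrame({
    "RT": rng.normal(0.5, 0.1, 100),
    "Accuracy": rng.uniform(0.6, 1.0, 100),
    "Gender": rng.choice(["Female", "Male"], 100),
})

# Without the column argument, hist() plots every numeric column
axes = df_sim.hist()

# Each subplot is titled with its column name; the text column is skipped
titles = sorted(ax.get_title() for ax in np.ravel(axes) if ax.get_title())
print(titles)
```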

How to Plot Histograms by Group using Pandas:

Now, the Pandas hist() method has more parameters than column that we can work with. In this section, we are going to create a histogram by gender. We will accomplish this by using the by argument:

df.hist(by='Gender',
       column='RT')
[Figure: histograms of RT, one per gender]

In the histogram above, we grouped by gender. It is also possible to group the dataframe, before plotting, using Pandas groupby().
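One possible way to use groupby() here is sketched below: loop over the groups and draw each group's histogram on the same axes. This is an illustration on simulated data, not necessarily the approach the author had in mind; the name df_sim is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 50 simulated response times per gender
rng = np.random.default_rng(seed=2)
df_sim = pd.DataFrame({
    "RT": rng.normal(0.5, 0.1, 100),
    "Gender": np.repeat(["Female", "Male"], 50),
})

fig, ax = plt.subplots()

# groupby() yields (group name, sub-dataframe) pairs; overlay one histogram per group
for gender, group in df_sim.groupby("Gender"):
    group["RT"].plot.hist(ax=ax, alpha=0.5, label=gender)

ax.set_xlabel("RT")
ax.legend()
```

Unlike the by argument, which draws one subplot per group, this overlays the groups in a single plot, which can make the distributions easier to compare.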

How to Change the Size of a Pandas Histogram:

In this section, we will use the figsize argument to change the size of the histogram:

df.hist(by='Gender',
       column='RT',
       figsize=[10, 6])
[Figure: histograms of RT by gender, drawn at a larger figure size]

There are, of course, other arguments that can be used when plotting histograms using Pandas. Make sure you check the documentation when needed. For instance, the ticks can be rotated using the xrot argument.
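As a brief sketch of some of these other arguments, the example below combines bins, xrot, and figsize in one call. The data and the name df_sim are hypothetical, and the specific values are picked only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no display is needed

import numpy as np
import pandas as pd

# Hypothetical data: 200 simulated response times
rng = np.random.default_rng(seed=3)
df_sim = pd.DataFrame({"RT": rng.normal(0.5, 0.1, 200)})

# bins sets the number of bars, xrot rotates the x-axis tick labels (in degrees),
# and figsize sets the figure width and height in inches
axes = df_sim.hist(column="RT", bins=20, xrot=45, figsize=(8, 4))

ax = np.ravel(axes)[0]
width, height = ax.figure.get_size_inches()
print(width, height)
```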

Here’s a Jupyter notebook with all the code examples.

Conclusion: How to Create a Histogram with Pandas in Python

In this short post, we learned 3 simple steps to plot a histogram with Pandas. Furthermore, we learned how to create histograms by a group and how to change the size of a Pandas histogram.




from Planet Python