Saturday, February 29, 2020

Erik Marsja: How to Convert a Python Dictionary to a Pandas DataFrame

The post How to Convert a Python Dictionary to a Pandas DataFrame appeared first on Erik Marsja.

In this brief Python Pandas tutorial, we will go through the steps of creating a dataframe from a dictionary. Specifically, we will learn how to convert a dictionary to a Pandas dataframe in 3 simple steps. First, however, we will have a quick look at the syntax. After that, we will go through the steps and a few extras: custom indexes, selecting columns, changing the orientation, and saving the dataframe as a CSV file. At the end, there’s a YouTube video and a link to the Jupyter Notebook containing all the example code from this post.

Data Import in Python with Pandas

Now, most of the time we will use Pandas read_csv or read_excel to import data for our statistical analysis in Python. Of course, sometimes we may use read_stata, read_spss, read_sas, and so on. If we need to import data from other file types, refer to the following posts on how to read csv files with Pandas, how to read excel files with Pandas, and how to read Stata, read SPSS files, and read SAS files with Python and Pandas.


However, there are cases when we may only have a few rows of data or some basic calculations that need to be done. If this is the case, we may want to know how to easily convert a Python dictionary to a Pandas dataframe.

Basic Syntax for Creating a Dataframe from a Dictionary

If we want to convert a Python Dictionary to a Pandas dataframe here’s the simple syntax:

import pandas as pd

data = {'key1': values, 'key2': values, 'key3': values, ..., 'keyN': values}
df = pd.DataFrame(data)

When we use the above template we will create a dataframe from a dictionary. Now, before we go on with the steps on how to convert a dictionary to a dataframe we are going to answer some questions:

What is a Python Dictionary?

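A dictionary in Python is a collection of key:value pairs. Unlike data types that hold a single value per element, such as lists, each element of a dictionary pairs a key with a value, and values are looked up by their key. Since Python 3.7, dictionaries also preserve the order in which the keys were inserted.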
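To make the key:value idea concrete, here is a minimal sketch. The dictionary name (shares) is made up for illustration; the values are taken from the example data used later in this post.

```python
# A small dictionary mapping a language (key) to its share (value).
# The name "shares" is only an illustration.
shares = {'Python': 29.88, 'Java': 19.05}

# Look up a value by its key.
print(shares['Python'])

# Add a new key:value pair; on Python 3.7+ the insertion order is kept.
shares['Javascript'] = 8.17
print(list(shares.keys()))
```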

What is a DataFrame?

The next question is what a dataframe is. A DataFrame is a 2-dimensional labeled data structure whose columns can hold different types (e.g., numerical, categorical, date). In many ways, it is a lot like a spreadsheet or an SQL table.
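As a small sketch of what "columns of potentially different types" means, we can build a tiny dataframe and inspect the type of each column. The two rows below are a subset of the example data used later in this post.

```python
import pandas as pd

# Three columns holding text, floats, and integers respectively.
df = pd.DataFrame({'Language': ['Python', 'Java'],
                   'Share': [29.88, 19.05],
                   'Rank': [1, 2]})

# Each column has its own dtype.
print(df.dtypes)

# Cells are addressed by row label and column name, much like a spreadsheet.
print(df.loc[0, 'Language'])
```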

Create a Dataframe from a Dictionary

In general, we can create a dataframe from a range of different objects, such as dictionaries, lists, and NumPy arrays. Here, we will just use the default constructor, pd.DataFrame.
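To illustrate that the constructor accepts more than dictionaries of lists, here is a hedged sketch with two other common inputs: a list of dictionaries (one per row) and a list of lists with explicit column names. The variable names are made up for illustration.

```python
import pandas as pd

# From a list of dictionaries: each dictionary is one row.
rows = [{'Language': 'Python', 'Share': 29.88},
        {'Language': 'Java', 'Share': 19.05}]
df_rows = pd.DataFrame(rows)

# From a list of lists: the column names are given explicitly.
df_lists = pd.DataFrame([['Python', 29.88], ['Java', 19.05]],
                        columns=['Language', 'Share'])

print(df_rows)
print(df_lists)
```

Both calls should give a dataframe with the same two columns and the same two rows.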

Figure: DataFrame Constructor

3 Steps to Convert a Dictionary to a Dataframe

Now, we are ready to go through how to convert a dictionary to a Pandas dataframe step by step. In the first example of how to build a dataframe from a dictionary, we will use some data on the popularity of programming languages.

1. Add, or gather, data to the Dictionary

In the first step, we need to get our data into a Python dictionary. This may be done by scraping data from the web or by simply typing the numbers into a dictionary, as in the example below.

Suppose we collect the rank, share, and trend for the top 5 most popular programming languages. These are the values we will put into the dictionary in the next step.

2. Create the Python Dictionary

In the second step, we will create our Python dictionary from the data we gathered in the first step. That is, before converting the dictionary to a dataframe we need to create it:

data = {'Rank':[1, 2, 3, 4, 5],
       'Language': ['Python', 'Java',
                   'Javascript',
                   'C#', 'PHP'],
       'Share':[29.88, 19.05, 8.17,
               7.3, 6.15],
       'Trend':[4.1, -1.8, 0.1, -0.1, -1.0]}

print(data)

3. Convert the Dictionary to a Pandas Dataframe

Finally, we are ready to take our Python dictionary and convert it into a Pandas dataframe. This is easily done: we just call pd.DataFrame and pass the dictionary as the only argument:

df = pd.DataFrame(data)

display(df)

Note that when we created the Python dictionary, we added the values as lists. If the lists had different lengths, we would not be able to create a dataframe from the dictionary. This would lead to a ValueError (e.g., “ValueError: arrays must all be the same length”; the exact wording depends on the Pandas version).
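The length mismatch can be reproduced with a short sketch. The dictionary below (bad_data) is made up for illustration. The second part shows one possible workaround that is not part of the original steps: wrapping each list in a pd.Series, so that shorter columns are padded with NaN instead of raising an error.

```python
import pandas as pd

# 'Share' deliberately has one value fewer than 'Language'.
bad_data = {'Language': ['Python', 'Java', 'Javascript'],
            'Share': [29.88, 19.05]}

try:
    pd.DataFrame(bad_data)
    error_raised = False
except ValueError as err:
    error_raised = True
    print(f'ValueError: {err}')

# Possible workaround (an assumption about what we want, not the post's method):
# Series are aligned on their index, so the missing value becomes NaN.
padded = pd.DataFrame({key: pd.Series(values)
                       for key, values in bad_data.items()})
print(padded)
```

Whether padding with NaN is appropriate depends on the data; often a length mismatch signals a mistake in the data collection that is better fixed at the source.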

Now that we have our dataframe, we may want to get the column names from the Pandas dataframe.
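A minimal sketch of getting the column names, using the same dictionary as in step 2, could look like this:

```python
import pandas as pd

data = {'Rank': [1, 2, 3, 4, 5],
        'Language': ['Python', 'Java', 'Javascript', 'C#', 'PHP'],
        'Share': [29.88, 19.05, 8.17, 7.3, 6.15],
        'Trend': [4.1, -1.8, 0.1, -0.1, -1.0]}

df = pd.DataFrame(data)

# df.columns is an Index object; list() turns it into a plain Python list.
column_names = list(df.columns)
print(column_names)
```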

Pandas Dataframe from Dictionary Example 2

In the second example of how to create a Pandas dataframe from a dictionary, we are going to work with Python’s OrderedDict.

from collections import OrderedDict

data= OrderedDict([('Trend', [4.1, -1.8, 0.1, 
                              -0.1, -1.0]),
                   ('Rank',[1, 2, 3, 4, 5]),
                   ('Language', ['Python', 'Java',
                                 'Javascript',
                                 'C#', 'PHP']),
                   ('Share', [29.88, 19.05, 8.17,
                              7.3, 6.15])])

display(data)

Now, to create a dataframe from the ordered dictionary (i.e. OrderedDict) we just use the pd.DataFrame constructor again:

df = pd.DataFrame(data)

Note that this dataframe, created from the OrderedDict, contains the same data as the previous one, but its columns follow the order of the OrderedDict: Trend, Rank, Language, Share.
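One way to check how a dataframe built from an OrderedDict compares with one built from a plain dictionary is sketched below. It uses a shortened, two-column version of the data and assumes Python 3.7+ with a recent Pandas, where the key order determines the column order.

```python
from collections import OrderedDict

import pandas as pd

plain = {'Rank': [1, 2], 'Language': ['Python', 'Java']}
ordered = OrderedDict([('Language', ['Python', 'Java']),
                       ('Rank', [1, 2])])

df_plain = pd.DataFrame(plain)
df_ordered = pd.DataFrame(ordered)

# The column order follows the key order of each dictionary.
print(list(df_plain.columns))
print(list(df_ordered.columns))

# After putting the columns in the same order, the two dataframes can be compared.
same_data = df_ordered[list(df_plain.columns)].equals(df_plain)
print(same_data)
```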

Create a DataFrame from a Dictionary Example 3: Custom Indexes

Now, in the third example of creating a DataFrame from a Python dictionary, we will use the index argument to create custom indexes for the dataframe.

from collections import OrderedDict

data = OrderedDict([('Trend', [4.1, -1.8, 0.1, 
                              -0.1, -1.0]),
                   ('Rank',[1, 2, 3, 4, 5]),
                   ('Language', ['Python', 'Java',
                                 'Javascript',
                                 'C#', 'PHP']),
                   ('Share', [29.88, 19.05, 8.17,
                              7.3, 6.15])])

df = pd.DataFrame(data, index = ['A', 'B',
                                'C', 'D',
                                'E'])

display(df)
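Once a dataframe has custom indexes, the labels can be used to select rows and cells. The sketch below uses a shortened version of the data with three rows; note that the index list needs one label per row.

```python
import pandas as pd

data = {'Language': ['Python', 'Java', 'Javascript'],
        'Share': [29.88, 19.05, 8.17]}

# One label per row: three rows, three labels.
df = pd.DataFrame(data, index=['A', 'B', 'C'])

# Select one row by its label; the result is a Series.
row_b = df.loc['B']
print(row_b['Language'])

# Select a single cell by row label and column name.
share_c = df.loc['C', 'Share']
print(share_c)
```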

Note that we can, of course, also use the columns argument when creating a dataframe from a dictionary; the next example shows how.

Create a DataFrame from a Dictionary Example 4: Skip Data

In the fourth example, we are going to create a dataframe from a dictionary and skip some columns. This is easily done using the columns argument. This argument takes a list, and the elements of the list are the columns that will be kept:

from collections import OrderedDict

data = OrderedDict([('Trend', [4.1, -1.8, 0.1, 
                              -0.1, -1.0]),
                   ('Rank',[1, 2, 3, 4, 5]),
                   ('Language', ['Python', 'Java',
                                 'Javascript',
                                 'C#', 'PHP']),
                   ('Share', [29.88, 19.05, 8.17,
                              7.3, 6.15])])
df = pd.DataFrame(data, index = ['A', 'B',
                                'C', 'D',
                                'E'],
                 columns=['Language', 'Share'])

display(df)
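Two details of the columns argument are worth a quick sketch: the list also sets the column order, and a name that is not a key in the dictionary gives a column filled with NaN rather than an error. The example uses a shortened version of the data, and the variable names are made up for illustration.

```python
import pandas as pd

data = {'Rank': [1, 2],
        'Language': ['Python', 'Java'],
        'Share': [29.88, 19.05]}

# Only the listed keys become columns, in the listed order.
df_selected = pd.DataFrame(data, columns=['Share', 'Language'])
print(list(df_selected.columns))

# 'Trend' is not a key in this dictionary, so the column is created but empty (NaN).
df_missing = pd.DataFrame(data, columns=['Language', 'Trend'])
print(df_missing)
```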

Create DataFrame from Dictionary Example 5: Changing the Orientation

In the fifth example, we are going to make a dataframe from a dictionary and change the orientation. That is, in this example, the dictionary keys become the rows (the index) rather than the columns. Note, however, that here we use the from_dict class method, with orient='index', to make a dataframe from a dictionary:

df = pd.DataFrame.from_dict(data, orient='index')

df.head()

As the output shows, the dataframe we have created has the column names 0 to 4. If we want to, we can name the columns using the columns argument:

df = pd.DataFrame.from_dict(data, orient='index',
                            columns=['A', 'B', 'C',
                                    'D', 'F'])

df.head()
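To see what the changed orientation does, here is a small sketch with a shortened version of the data. It also builds the transpose of the default dataframe (with .T) for comparison; the comparison is an addition for illustration and not part of the steps above.

```python
import pandas as pd

data = {'Rank': [1, 2, 3],
        'Language': ['Python', 'Java', 'Javascript']}

# With orient='index', each key becomes a row label.
df_index = pd.DataFrame.from_dict(data, orient='index')
print(df_index.shape)
print(list(df_index.index))
print(df_index.loc['Language', 0])

# For comparison: the default orientation, then transposed.
df_transposed = pd.DataFrame(data).T
print(df_transposed.shape)
```

Both dataframes end up with one row per dictionary key and one column per list element.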

YouTube Video: Convert a Dictionary to a Pandas Dataframe

Now, if you prefer to watch and listen to someone explain how to make a dataframe from a Python dictionary, there is a YouTube video that goes through the steps as well as most of the other parts of this tutorial.

Bonus: Save the DataFrame as a CSV

Finally, and as a bonus, we will learn how to save the dataframe we have created from a Python dictionary to a CSV file:

df.to_csv('top5_prog_lang.csv')

As we can see, saving data as CSV with Pandas is quite simple. It is, of course, also possible to write the dataframe as an Excel (.xlsx) file with Pandas. Finally, here’s the Jupyter Notebook for the code examples from this post.
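As a hedged sketch of the CSV step, the example below writes a small dataframe and reads it back. By default, to_csv also writes the row labels as the first column; passing index=False leaves them out. The temporary folder is only used here so that the sketch does not leave any files behind.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'Language': ['Python', 'Java'],
                   'Share': [29.88, 19.05]})

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'top5_prog_lang.csv')

    # index=False: do not write the row labels (0, 1, ...) to the file.
    df.to_csv(csv_path, index=False)

    # Read the file back to check what was saved.
    df_back = pd.read_csv(csv_path)

print(list(df_back.columns))
print(df_back.shape)
```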

Summary

Most of the time, we import data to Pandas dataframes from CSV or Excel files, or from SQL databases. Moreover, we may also read data from Stata, SPSS, and SAS files. However, there are times when we have the data in a basic list or, as we’ve learned in this post, a dictionary. In those cases, we can use Pandas to create a dataframe directly from the Python dictionary.




from Planet Python