Wednesday, April 22, 2020

Real Python: The Pandas DataFrame: Make Working With Data Delightful

The Pandas DataFrame is a structure that contains two-dimensional data and its corresponding labels. DataFrames are widely used in data science, machine learning, scientific computing, and many other data-intensive fields.

DataFrames are similar to SQL tables or the spreadsheets that you work with in Excel or Calc. In many cases, DataFrames are faster, easier to use, and more powerful than tables or spreadsheets because they’re an integral part of the Python and NumPy ecosystems.

In this tutorial, you’ll learn:

  • What a Pandas DataFrame is and how to create one
  • How to access, modify, add, sort, filter, and delete data
  • How to handle missing values
  • How to work with time-series data
  • How to quickly visualize data

It’s time to get started with Pandas DataFrames!

Free Bonus: 5 Thoughts On Python Mastery, a free course for Python developers that shows you the roadmap and the mindset you'll need to take your Python skills to the next level.

Introducing the Pandas DataFrame

Pandas DataFrames are data structures that contain:

  • Data organized in two dimensions, rows and columns
  • Labels that correspond to the rows and columns

You can start working with DataFrames by importing Pandas:

>>>
>>> import pandas as pd

Now that you have Pandas imported, you can work with DataFrames.

Imagine you’re using Pandas to analyze data about job candidates for a position developing web applications with Python. Say you’re interested in the candidates’ names, cities, ages, and scores on a Python programming test, or py-score:

name city age py-score
101 Xavier Mexico City 41 88.0
102 Ann Toronto 28 79.0
103 Jana Prague 33 81.0
104 Yi Shanghai 34 80.0
105 Robin Manchester 38 68.0
106 Amal Cairo 31 61.0
107 Nori Osaka 37 84.0

In this table, the first row contains the column labels (name, city, age, and py-score). The first column holds the row labels (101, 102, and so on). All other cells are filled with the data values.

Now you have everything you need to create a Pandas DataFrame.

There are several ways to create a Pandas DataFrame. In most cases, you’ll use the DataFrame constructor and provide the data, labels, and other information. You can pass the data as a two-dimensional list, tuple, or NumPy array. You can also pass it as a dictionary or Pandas Series instance, or as one of several other data types not covered in this tutorial.

For this example, assume you’re using a dictionary to pass the data:

>>>
>>> data = {
...     'name': ['Xavier', 'Ann', 'Jana', 'Yi', 'Robin', 'Amal', 'Nori'],
...     'city': ['Mexico City', 'Toronto', 'Prague', 'Shanghai',
...              'Manchester', 'Cairo', 'Osaka'],
...     'age': [41, 28, 33, 34, 38, 31, 37],
...     'py-score': [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0]
... }

>>> row_labels = [101, 102, 103, 104, 105, 106, 107]

data is a Python variable that refers to the dictionary that holds your candidate data. It also contains the labels of the columns:

  • 'name'
  • 'city'
  • 'age'
  • 'py-score'

Finally, row_labels refers to a list that contains the labels of the rows, which are numbers ranging from 101 to 107.

Now you’re ready to create a Pandas DataFrame:

>>>
>>> df = pd.DataFrame(data=data, index=row_labels)
>>> df
       name         city  age  py-score
101  Xavier  Mexico City   41      88.0
102     Ann      Toronto   28      79.0
103    Jana       Prague   33      81.0
104      Yi     Shanghai   34      80.0
105   Robin   Manchester   38      68.0
106    Amal        Cairo   31      61.0
107    Nori        Osaka   37      84.0

That’s it! df is a variable that holds the reference to your Pandas DataFrame. This Pandas DataFrame looks just like the candidate table above and has the following features:

Read the full article at https://realpython.com/pandas-dataframe/ »


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]



from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...