The Pandas DataFrame is a structure that contains two-dimensional data and its corresponding labels. DataFrames are widely used in data science, machine learning, scientific computing, and many other data-intensive fields.
DataFrames are similar to SQL tables or the spreadsheets that you work with in Excel or Calc. In many cases, DataFrames are faster, easier to use, and more powerful than tables or spreadsheets because they’re an integral part of the Python and NumPy ecosystems.
In this tutorial, you’ll learn:
- What a Pandas DataFrame is and how to create one
- How to access, modify, add, sort, filter, and delete data
- How to handle missing values
- How to work with time-series data
- How to quickly visualize data
It’s time to get started with Pandas DataFrames!
Introducing the Pandas DataFrame
Pandas DataFrames are data structures that contain:
- Data organized in two dimensions, rows and columns
- Labels that correspond to the rows and columns
You can start working with DataFrames by importing Pandas:
>>> import pandas as pd
Now that you have Pandas imported, you can work with DataFrames.
Imagine you’re using Pandas to analyze data about job candidates for a position developing web applications with Python. Say you’re interested in the candidates’ names, cities, ages, and scores on a Python programming test, or py-score:
|     | name | city | age | py-score |
|---|---|---|---|---|
| 101 | Xavier | Mexico City | 41 | 88.0 |
| 102 | Ann | Toronto | 28 | 79.0 |
| 103 | Jana | Prague | 33 | 81.0 |
| 104 | Yi | Shanghai | 34 | 80.0 |
| 105 | Robin | Manchester | 38 | 68.0 |
| 106 | Amal | Cairo | 31 | 61.0 |
| 107 | Nori | Osaka | 37 | 84.0 |
In this table, the first row contains the column labels (name, city, age, and py-score). The first column holds the row labels (101, 102, and so on). All other cells are filled with the data values.
Now you have everything you need to create a Pandas DataFrame.
There are several ways to create a Pandas DataFrame. In most cases, you’ll use the DataFrame constructor and provide the data, labels, and other information. You can pass the data as a two-dimensional list, tuple, or NumPy array. You can also pass it as a dictionary or Pandas Series instance, or as one of several other data types not covered in this tutorial.
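As a rough illustration of the input types mentioned above, the sketch below builds small DataFrames from a two-dimensional list, a NumPy array, and a dictionary of Series. The variable names (`rows`, `scores`, `df_from_list`, and so on) and the three-row subset of the candidate data are chosen here only for illustration.

```python
import numpy as np
import pandas as pd

index = [101, 102, 103]

# Two-dimensional list: each inner list is one row.
rows = [["Xavier", 41], ["Ann", 28], ["Jana", 33]]
df_from_list = pd.DataFrame(rows, columns=["name", "age"], index=index)

# Two-dimensional NumPy array: mixing ints and floats makes the
# whole array float64, so the ages are stored as floats here.
scores = np.array([[41, 88.0], [28, 79.0], [33, 81.0]])
df_from_array = pd.DataFrame(scores, columns=["age", "py-score"], index=index)

# Dictionary of Series: each Series becomes one column and
# its index supplies the row labels.
ages = pd.Series([41, 28, 33], index=index)
df_from_series = pd.DataFrame({"age": ages})

print(df_from_list)
print(df_from_array)
print(df_from_series)
```

With a list or array, the column labels have to be supplied separately through `columns`, whereas a dictionary carries them as its keys.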
For this example, assume you’re using a dictionary to pass the data:
>>> data = {
...     'name': ['Xavier', 'Ann', 'Jana', 'Yi', 'Robin', 'Amal', 'Nori'],
...     'city': ['Mexico City', 'Toronto', 'Prague', 'Shanghai',
...              'Manchester', 'Cairo', 'Osaka'],
...     'age': [41, 28, 33, 34, 38, 31, 37],
...     'py-score': [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0]
... }
>>> row_labels = [101, 102, 103, 104, 105, 106, 107]
data is a Python variable that refers to the dictionary that holds your candidate data. It also contains the labels of the columns:
- 'name'
- 'city'
- 'age'
- 'py-score'
Finally, row_labels refers to a list that contains the labels of the rows, which are numbers ranging from 101 to 107.
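Before building the DataFrame, it can help to confirm that the dictionary and the row labels line up. The following is a minimal sketch using plain Python only; `column_labels` and `lengths_match` are names introduced here for illustration.

```python
data = {
    'name': ['Xavier', 'Ann', 'Jana', 'Yi', 'Robin', 'Amal', 'Nori'],
    'city': ['Mexico City', 'Toronto', 'Prague', 'Shanghai',
             'Manchester', 'Cairo', 'Osaka'],
    'age': [41, 28, 33, 34, 38, 31, 37],
    'py-score': [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0]
}
row_labels = [101, 102, 103, 104, 105, 106, 107]

# The dictionary keys, in insertion order, are the future column labels.
column_labels = list(data)

# Every column needs exactly one value per row label.
lengths_match = all(len(values) == len(row_labels) for values in data.values())

print(column_labels)
print(lengths_match)
```

If one of the lists were shorter or longer than `row_labels`, the columns and the index would not line up, so a check like this catches the problem early.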
Now you’re ready to create a Pandas DataFrame:
>>> df = pd.DataFrame(data=data, index=row_labels)
>>> df
       name         city  age  py-score
101  Xavier  Mexico City   41      88.0
102     Ann      Toronto   28      79.0
103    Jana       Prague   33      81.0
104      Yi     Shanghai   34      80.0
105   Robin   Manchester   38      68.0
106    Amal        Cairo   31      61.0
107    Nori        Osaka   37      84.0
That’s it! df is a variable that holds the reference to your Pandas DataFrame. This Pandas DataFrame looks just like the candidate table above and has the following features:
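The row labels, column labels, and data values described for the candidate table can be inspected directly on the DataFrame. Here is a minimal sketch that rebuilds the same DataFrame so the block runs on its own. The choice of `.index`, `.columns`, `.shape`, and `.loc` is one reasonable way to look at these pieces, not the only one.

```python
import pandas as pd

data = {
    'name': ['Xavier', 'Ann', 'Jana', 'Yi', 'Robin', 'Amal', 'Nori'],
    'city': ['Mexico City', 'Toronto', 'Prague', 'Shanghai',
             'Manchester', 'Cairo', 'Osaka'],
    'age': [41, 28, 33, 34, 38, 31, 37],
    'py-score': [88.0, 79.0, 81.0, 80.0, 68.0, 61.0, 84.0]
}
row_labels = [101, 102, 103, 104, 105, 106, 107]
df = pd.DataFrame(data=data, index=row_labels)

print(df.index)             # row labels, 101 through 107
print(df.columns)           # column labels taken from the dictionary keys
print(df.shape)             # (number of rows, number of columns)
print(df.loc[103, 'city'])  # a single data value, looked up by its labels
```

For this data, `df.shape` is `(7, 4)` and `df.loc[103, 'city']` returns `'Prague'`, matching the row for candidate 103 in the table.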
Read the full article at https://realpython.com/pandas-dataframe/ »
from Planet Python