Wednesday, March 25, 2020

Stack Abuse: Reading and Writing CSV Files in Python with Pandas

There are many ways of reading and writing CSV files in Python. There are a few different methods, for example, you can use Python's built in open() function to read the CSV (Comma Separated Values) files or you can use Python's dedicated csv module to read and write CSV files. Depending on your use-case, you can also use Python's Pandas library to read and write CSV files.

In this article, you will see how to use Python's Pandas library to read and write CSV files. However, before that let's briefly see what a CSV file is.

What is a CSV File?

A CSV file is nothing more than a simple text file. However, it is the most common, simple, and easiest method to store tabular data. This particular format arranges tables by following a specific structure divided into rows and columns. It is these rows and columns that contain your data. A new line terminates each row to start the next row. Similarly, a comma, also known as the delimiter, separates columns within each row.

Take the following table as an example:

| City         | State        | Capital | Population    |
| ------------ | ------------ | ------- | ------------- |
| Philadelphia | Pennsylvania | No      | 1.581 Million |
| Sacramento   | California   | Yes     | 0.5 Million   |
| New York     | New York     | No      | 8.623 Million |
| Austin       | Texas        | Yes     | 0.95 Million  |
| Miami        | Florida      | No      | 0.463 Million |

Now, the above table will look as follows if we represent it in CSV format:

City, State, Capital, Population
Philadelphia, Pennsylvania, No, 1.581 Million
Sacramento, California, Yes, 0.5 Million
New York, New York, No, 8.623 Million
Austin, Texas, Yes, 0.95 Million
Miami, Florida, No, 0.463 Million

As you can see, a comma separates all the values in columns within each row. However, you can use other symbols such as a semicolon (;) as a separator as well. Every row of the table becomes a new line of the CSV file. The core purpose of the CSV format is to help you present the tabular data compactly and concisely.

Now that you have understood what a CSV file is, it is time to study how Pandas' read_csv() method is used to read CSV files in Python.

Reading and Writing CSV Files using Pandas

Pandas is a very powerful and popular framework for data analysis and manipulation. One of the most striking features of Pandas is its ability to read and write various types of files including CSV and Excel. You can effectively and easily manipulate CSV files in Pandas using functions like read_csv() and to_csv().

Installing Pandas

We have to install Panda before using the framework. One of the easiest methods to install Pandas is to install Anaconda. It is a cross-platform Python Distribution for tasks like Python computing and data analysis. Once you install Anaconda, you will have access to Pandas and other libraries such as SciPy and NumPy without doing anything else.

Besides, you can also install Pandas via pip installer, as shown here:

$ pip install pandas

Reading CSV Files with read_csv()

Once you go through the installation, you can use the read_csv() function to read a CSV file. We will try to read the "titanic.csv" file, which can be downloaded from this link.

First we have to import the Pandas library:

import pandas as pd

Now, we use the following line of code to actually read and parse the file:

titanic_data = pd.read_csv('titanic.csv')

When we execute this code, it will read the CSV file "titanic.csv" from the current directory. You can see from the script above that to read a CSV file, you have to pass the file path to the read_csv() method of the Pandas library. The read_csv() method then returns a Pandas DataFrame that contains the data of the CSV file.

You can display the first five rows of the CSV file via the head() method of the Pandas DataFrame, as shown below:

titanic_data.head()

Output:

Titanic Dataset Header

You can also read a CSV file from an online resource by simply passing the URL of the resource to the read_csv() method. Let's read the "titanic.csv" file located on a remote GitHub repository.

import pandas as pd

titanic_data = pd.read_csv(r'https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv')

By default, the read_csv() method treats the values in the first row of a CSV file as column headers. However, you can pass your custom header names while reading a file via the read_csv() method:

import pandas as pd

col_names = ['Id',
             'Survived',
             'Passenger Class',
             'Full Name',
             'Gender',
             'Age',
             'SibSp',
             'Parch',
             'Ticket Number',
             'Price', 'Cabin',
             'Station']

titanic_data = pd.read_csv(r'E:\Datasets\titanic.csv', names=col_names, header=None)

To read a CSV file with custom headers, you need to pass a list of custom column names to the names attribute of the read_csv() method. If you want to overwrite the default header names, you can pass the header=None keyword argument.

Let's now see the header names of the "titanic.csv" file that we read in the last example.

titanic_data.head()

Output:

Titanic Dataset Custom Header

Here you can see your custom headers in the output from the DataFrame.

Writing CSV Files with to_csv()

The process of creating or writing a CSV file through Pandas can be a little more complicated than reading CSV, but it's still relatively simple. We use the to_csv() function to perform this task. However, you have to create a Pandas DataFrame first, followed by writing that DataFrame to the CSV file.

Column names can also be specified via the keyword argument columns, as well as a different delimiter via the sep argument. Again, the default delimiter is a comma, ','.

Here is a simple example showing how to export a DataFrame to a CSV file via to_csv():

import pandas as pd

city = pd.DataFrame([['Sacramento', 'California'], ['Miami', 'Florida']], columns=['City', 'State'])
city.to_csv('city.csv')

In the above example, we have created a DataFrame named city. Subsequently, we have written that DataFrame to a file named "city.csv" using the to_csv() function. The resulting CSV file should have the following contents:

City,State
Sacramento,California
Miami,Florida

The CSV file contain our custom headers, followed by the 2 rows of data contained in the DataFrame we created.

Conclusion

The article shows how to read and write CSV files using Python's Pandas library. To read a CSV file, the read_csv() method of the Pandas library is used. You can also pass custom header names while reading CSV files via the names attribute of the read_csv() method. Finally, to write a CSV file using Pandas, you first have to create a Pandas DataFrame object and then call to_csv method on the DataFrame.



from Planet Python
via read more

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production. from Planet Python via read...