Daily Python: ListenData: 15 ways to read CSV file with pandas

This tutorial explains how to read a CSV file in python using read_csv function of pandas package. Without use of read_csv function, it is not straightforward to import CSV file with python object-oriented programming. Pandas is an awesome powerful python package for data manipulation and supports various functions to load and import data from various formats. Here we are covering how to deal with common issues in importing CSV file.

Table of Contents

Install and Load Pandas Package

Make sure you have pandas package already installed on your system. If you set up python using Anaconda, it comes with pandas package so you don't need to install it again. Otherwise you can install it by using command pip install pandas. Next step is to load the package by running the following command. pd is an alias of pandas package. We will use it instead of full name "pandas".

import pandas as pd

Create Sample Data for Import

The program below creates a sample pandas dataframe which can be used further for demonstration.


dt = {'ID': [11, 12, 13, 14, 15],
            'first_name': ['David', 'Jamie', 'Steve', 'Stevart', 'John'],
            'company': ['Aon', 'TCS', 'Google', 'RBS', '.'],
            'salary': [74, 76, 96, 71, 78]}
mydt = pd.DataFrame(dt, columns = ['ID', 'first_name', 'company', 'salary'])

The sample data looks like below -


  ID first_name company  salary
0  11      David     Aon      74
1  12      Jamie     TCS      76
2  13      Steve  Google      96
3  14    Stevart     RBS      71
4  15       John       .      78

Save data as CSV in the working directory

Check working directory before you save your datafile.


import os
os.getcwd()

Incase you want to change the working directory, you can specify it in under os.chdir( ) function. Single backslash does not work in Python so use 2 backslashes while specifying file location.


os.chdir("C:\\Users\\DELL\\Documents\\")

The following command tells python to write data in CSV format in your working directory.


mydt.to_csv('workingfile.csv', index=False)

Example 1 : Read CSV file with header row

It's the basic syntax of read_csv() function. You just need to mention the filename. It assumes you have column names in first row of your CSV file.


mydata = pd.read_csv("workingfile.csv")

It stores the data the way It should be as we have headers in the first row of our datafile. It is important to highlight that header=0 is the default value. Hence we don't need to mention the header= parameter. It means header starts from first row as indexing in python starts from 0. The above code is equivalent to this line of code. pd.read_csv("workingfile.csv", header=0)

Inspect data after importing


mydata.shape
mydata.columns
mydata.dtypes

It returns 5 number of rows and 4 number of columns. Column Names are ['ID', 'first_name', 'company', 'salary']

See the column types of data we imported. first_name and company are character variables. Remaining variables are numeric ones.


ID             int64
first_name    object
company       object
salary         int64

Example 2 : Read CSV file with header in second row

Suppose you have column or variable names in second row. To read this kind of CSV file, you can submit the following command.

mydata = pd.read_csv("workingfile.csv", header = 1)

header=1 tells python to pick header from second row. It's setting second row as header. It's not a realistic example. I just used it for illustration so that you get an idea how to solve it. To make it practical, you can add random values in first row in CSV file and then import it again.


   11    David     Aon  74
0  12    Jamie     TCS  76
1  13    Steve  Google  96
2  14  Stevart     RBS  71
3  15     John       .  78

Define your own column names instead of header row from CSV file


mydata0 = pd.read_csv("workingfile.csv", skiprows=1, names=['CustID', 'Name', 'Companies', 'Income'])

skiprows = 1 means we are ignoring first row and names= option is used to assign variable names manually.


   CustID     Name Companies  Income
0      11    David       Aon      74
1      12    Jamie       TCS      76
2      13    Steve    Google      96
3      14  Stevart       RBS      71
4      15     John         .      78

READ MORE »

from Planet Python
via read more

Daily Python

Tuesday, July 30, 2019

ListenData: 15 ways to read CSV file with pandas

Example 1 : Read CSV file with header row

Example 2 : Read CSV file with header in second row

No comments:

Post a Comment

TestDriven.io: Working with Static and Media Files in Django

Search This Blog