This tutorial explains how to read a CSV file in Python using the read_csv function of the pandas package. Without read_csv, importing a CSV file into a table-like structure in Python takes noticeably more manual work. Pandas is a powerful Python package for data manipulation, and it provides functions to load and import data from many formats. Here we cover how to deal with common issues when importing a CSV file.
Install and Load Pandas Package
Make sure the pandas package is already installed on your system. If you set up Python using Anaconda, it comes with pandas, so you don't need to install it again. Otherwise, you can install it with the command pip install pandas. The next step is to load the package by running the following command. pd is an alias of the pandas package; we will use it instead of the full name "pandas".
import pandas as pd
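If you are not sure whether pandas is available in your environment, a quick check like the one below can help before running anything else. This is a minimal sketch built on the standard library's importlib.util.find_spec; the helper name is_package_installed is our own and is not part of pandas.

```python
import importlib.util


def is_package_installed(name: str) -> bool:
    """Return True if a top-level package can be found in this environment."""
    # find_spec returns None when no module with this name can be located
    return importlib.util.find_spec(name) is not None


if is_package_installed("pandas"):
    import pandas as pd
    # The version is handy when behaviour differs between pandas releases
    print("pandas", pd.__version__)
else:
    print("pandas not found; install it with: pip install pandas")
```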
Create Sample Data for Import
The program below creates a sample pandas DataFrame that we will use for the demonstrations.
dt = {'ID': [11, 12, 13, 14, 15],
'first_name': ['David', 'Jamie', 'Steve', 'Stevart', 'John'],
'company': ['Aon', 'TCS', 'Google', 'RBS', '.'],
'salary': [74, 76, 96, 71, 78]}
mydt = pd.DataFrame(dt, columns = ['ID', 'first_name', 'company', 'salary'])
The sample data looks like this:
ID first_name company salary
0 11 David Aon 74
1 12 Jamie TCS 76
2 13 Steve Google 96
3 14 Stevart RBS 71
4 15 John . 78
Save data as CSV in the working directory
Check the working directory before you save your data file. If you want to change the working directory, specify the new path in the os.chdir() function. A single backslash does not work inside a regular Python string because it begins an escape sequence (for example, \U starts a Unicode escape), so use two backslashes when specifying the file location.
import os
os.getcwd()
os.chdir("C:\\Users\\DELL\\Documents\\")
The following command tells Python to write the data in CSV format to your working directory. index=False stops pandas from writing the row index as an extra column.
mydt.to_csv('workingfile.csv', index=False)
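Doubling backslashes is not the only option. The sketch below shows that a raw string (r"...") produces the same path, and that you can pass a full path to to_csv() instead of changing the working directory. The C:\Users\DELL\Documents location is just the example path from above, and a temporary folder stands in for your own folder so the snippet runs anywhere.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Raw strings keep backslashes literally, so no doubling is needed
raw_path = r"C:\Users\DELL\Documents"
escaped_path = "C:\\Users\\DELL\\Documents"
print(raw_path == escaped_path)  # True

# Rebuild the sample data so this snippet runs on its own
mydt = pd.DataFrame({'ID': [11, 12, 13, 14, 15],
                     'first_name': ['David', 'Jamie', 'Steve', 'Stevart', 'John'],
                     'company': ['Aon', 'TCS', 'Google', 'RBS', '.'],
                     'salary': [74, 76, 96, 71, 78]})

# Write to an explicit path instead of relying on os.chdir()
out_dir = Path(tempfile.mkdtemp())  # stand-in for your own folder
out_file = out_dir / "workingfile.csv"
mydt.to_csv(out_file, index=False)
print(out_file.exists())  # True
```

Note that a raw string cannot end with a single backslash (r"C:\Users\DELL\Documents\" is a syntax error), which is why the trailing backslash is dropped here.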
Example 1 : Read CSV file with header row
This is the basic syntax of the read_csv() function. You just need to mention the file name. It assumes the column names are in the first row of your CSV file.
mydata = pd.read_csv("workingfile.csv")
The data is stored the way it should be, because our data file has its headers in the first row. It is important to highlight that we don't need to mention the header= parameter: the default, header='infer', behaves exactly like header=0 when no names= are passed. header=0 means the header is the first row, as indexing in Python starts from 0. The above code is therefore equivalent to this line of code:
pd.read_csv("workingfile.csv", header=0)
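To check the equivalence yourself without touching the disk, the sketch below reads an in-memory copy of workingfile.csv (via io.StringIO, an assumption made so the snippet is self-contained) once with the default header and once with header=0, then compares the two DataFrames.

```python
import io

import pandas as pd

# In-memory copy of workingfile.csv so the snippet runs without the file
csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

default_read = pd.read_csv(io.StringIO(csv_text))            # header left at its default
explicit_read = pd.read_csv(io.StringIO(csv_text), header=0)  # header stated explicitly

print(default_read.equals(explicit_read))  # True
print(list(default_read.columns))  # ['ID', 'first_name', 'company', 'salary']
```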
Inspect data after importing
mydata.shape
mydata.columns
mydata.dtypes
mydata.shape shows that the data has 5 rows and 4 columns. The column names are
['ID', 'first_name', 'company', 'salary']
Next, see the column types of the data we imported. first_name and company are character variables, stored with the object dtype; the remaining variables are numeric.
ID int64
first_name object
company object
salary int64
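The same checks can be run in one self-contained script. This sketch again uses an in-memory copy of the file, and adds select_dtypes() to list the numeric columns programmatically rather than reading the dtype listing by eye. String columns show up as object in the pandas versions this tutorial targets; newer pandas releases may report a dedicated string dtype instead, so the sketch avoids relying on that name.

```python
import io

import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""
mydata = pd.read_csv(io.StringIO(csv_text))

print(mydata.shape)  # (5, 4)
print(mydata.columns.tolist())  # ['ID', 'first_name', 'company', 'salary']
print(mydata.dtypes)

# Pick out the numeric columns directly
numeric_cols = mydata.select_dtypes(include="number").columns.tolist()
print(numeric_cols)  # ['ID', 'salary']
```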
Example 2 : Read CSV file with header in second row
Suppose you have column or variable names in the second row. To read this kind of CSV file, you can submit the following command.
mydata = pd.read_csv("workingfile.csv", header = 1)
header=1 tells Python to pick the header from the second row: it sets the second row as the header and discards the rows above it. It's not a realistic example here, because our file has its real header in the first row, so the first data row (11, David, Aon, 74) becomes the column names. I just used it for illustration so that you get an idea of how to solve it. To make it practical, you can add random values in the first row of the CSV file and then import it again.
11 David Aon 74
0 12 Jamie TCS 76
1 13 Steve Google 96
2 14 Stevart RBS 71
3 15 John . 78
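Here is a sketch of the more practical case described above: a hypothetical file with a junk first row and the real column names in the second row. It also shows that skiprows=1 reaches the same result, since skipping the first line leaves the real header on top for the default header handling to find.

```python
import io

import pandas as pd

# Hypothetical file: a junk first row, real column names in the second row
csv_text = """random,values,in,row
ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""

# header=1 uses the second line as column names and drops the line above it
via_header = pd.read_csv(io.StringIO(csv_text), header=1)
# skiprows=1 drops the first line, then the default header handling applies
via_skiprows = pd.read_csv(io.StringIO(csv_text), skiprows=1)

print(via_header.columns.tolist())  # ['ID', 'first_name', 'company', 'salary']
print(via_header.shape)  # (5, 4)
print(via_header.equals(via_skiprows))  # True
```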
Define your own column names instead of header row from CSV file
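skiprows=1 means we ignore the first row, which holds the original column names, and the names= option assigns the variable names manually. Without skiprows=1, the old header row would be read in as the first data row.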
mydata0 = pd.read_csv("workingfile.csv", skiprows=1, names=['CustID', 'Name', 'Companies', 'Income'])
CustID Name Companies Income
0 11 David Aon 74
1 12 Jamie TCS 76
2 13 Steve Google 96
3 14 Stevart RBS 71
4 15 John . 78
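As an alternative to skiprows=1, the pandas documentation notes that passing header=0 together with names= replaces the existing header. The sketch below, again on an in-memory copy of the file, compares both approaches and shows what happens when the header row is not skipped at all.

```python
import io

import pandas as pd

csv_text = """ID,first_name,company,salary
11,David,Aon,74
12,Jamie,TCS,76
13,Steve,Google,96
14,Stevart,RBS,71
15,John,.,78
"""
new_names = ['CustID', 'Name', 'Companies', 'Income']

# Approach used above: skip the header line, then supply names
via_skiprows = pd.read_csv(io.StringIO(csv_text), skiprows=1, names=new_names)
# Alternative: header=0 marks the first line as a header to be replaced by names
via_header = pd.read_csv(io.StringIO(csv_text), header=0, names=new_names)
print(via_skiprows.equals(via_header))  # True

# Without skiprows or header=0, the old header line is read as a data row
no_skip = pd.read_csv(io.StringIO(csv_text), names=new_names)
print(len(no_skip))  # 6
print(no_skip.iloc[0].tolist())  # ['ID', 'first_name', 'company', 'salary']
```

In the last case every column also ends up holding text, because the old header strings are mixed in with the numbers.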
from Planet Python