Sunday, October 13, 2019

Erik Marsja: How to Read SAS Files in Python with Pandas

The post How to Read SAS Files in Python with Pandas appeared first on Erik Marsja.

In this post, we are going to learn how to read SAS (.sas7bdat) files in Python.

As previously described (in the read .sav files in Python post), Python is a general-purpose language that can also be used for data analysis and data visualization.

One potential downside, however, is that Python is not really user-friendly for data storage. As a result, our data are often stored using Excel, SPSS, SAS, or similar software. See, for instance, the posts about reading .sav and .xlsx files in Python.

Can I Open a SAS File in Python?

Now we may want to answer the question of how to open a SAS file in Python. In Python, there are two useful packages, Pyreadstat and Pandas, that enable us to open SAS files. If we are working with Pandas, the read_sas method will load a .sas7bdat file into a Pandas dataframe. Note that Pyreadstat, which depends on Pandas, will also create a Pandas dataframe from a .sas7bdat file.

How to Open a SAS file in Python

In this section, we are going to learn how to load a SAS file in Python using the Python package Pyreadstat. Of course, before we use Pyreadstat we need to make sure we have it installed.

How to install Pyreadstat:

Pyreadstat can be installed either using pip or conda:

  1. Install Pyreadstat using pip:
    Open up a terminal, or Windows PowerShell, and type pip install pyreadstat
  2. Install using Conda:
    Open up a terminal, or Windows PowerShell, and type conda install -c conda-forge pyreadstat
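After installing, it can be handy to confirm that Python can actually find the packages. The following is a minimal sketch using only the standard library; the helper name is_installed is our own and not part of Pyreadstat or Pandas.

```python
from importlib import util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can locate a top-level module."""
    return util.find_spec(module_name) is not None


# Report on the two packages used in this post.
for name in ("pandas", "pyreadstat"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If one of them is reported as missing, we go back to the pip or conda command above.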

How to Load a .sas7bdat File in Python Using Pyreadstat

In this section, we are going to use pyreadstat to import data into a Pandas dataframe. First, we import pyreadstat:

import pyreadstat

Now we are ready to import SAS files using the method read_sas7bdat (download airline.sas7bdat). Note that when we pass only a file name, as in the example here, Pyreadstat will look for the file in Python's working directory.

df, meta = pyreadstat.read_sas7bdat('airline.sas7bdat')
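If the file is not found, the working directory is usually the culprit. Here is a small sketch, assuming the file should sit directly in the working directory, that checks this before reading. The helpers resolve_in_cwd and load_airline are hypothetical names introduced for illustration.

```python
from pathlib import Path


def resolve_in_cwd(filename: str) -> Path:
    """Return the absolute path of a file in the current working directory."""
    path = Path.cwd() / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; move the file there or pass a full path"
        )
    return path


def load_airline(filename: str = "airline.sas7bdat"):
    """Read the SAS file with Pyreadstat after checking that it exists."""
    # Imported here so the path helper also works where pyreadstat is missing.
    import pyreadstat

    return pyreadstat.read_sas7bdat(str(resolve_in_cwd(filename)))
```

Calling load_airline() gives us the same df and meta as before, but with a clearer error message when the file is somewhere else.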

In the code chunk above we create two variables: df and meta. The variable “meta” holds the metadata of the file, and, as can be seen when using type, the variable “df” is a Pandas dataframe:

type(df)

Thus, we can use all methods available for Pandas dataframe objects. In the next line of code, we are going to print the first 5 rows of the dataframe using the Pandas head method.

df.head()
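The second variable, meta, is easy to overlook. As a rough sketch, we can collect a few of its fields into a dictionary. This assumes the metadata object returned by Pyreadstat exposes number_rows, number_columns, column_names, and column_labels; the function describe_meta is our own helper.

```python
def describe_meta(meta) -> dict:
    """Summarize the metadata object returned next to the dataframe."""
    return {
        "rows": meta.number_rows,
        "columns": meta.number_columns,
        "names": list(meta.column_names),
        # Pair each column name with its descriptive label, if any.
        "labels": dict(zip(meta.column_names, meta.column_labels)),
    }
```

With the variables from above, describe_meta(meta) gives a quick overview of the file without printing the whole dataframe.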


How to Read a SAS file with Python Using Pandas

In this section, we are going to load the same .sas7bdat file into a Pandas dataframe, but by using the Pandas read_sas method instead. This has the advantage that we can load the SAS file from a URL.

Before we continue, we need to import Pandas:

import pandas as pd

Now that we have done that, we can read the .sas7bdat file into a Pandas dataframe using the read_sas method. In the read SAS example here, we are importing the same data file as in the previous example.

Here, we print the last 5 rows of the dataframe using the Pandas tail method.

url = 'http://www.principlesofeconometrics.com/sas/airline.sas7bdat'

df = pd.read_sas(url)
df.tail()
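If we want to use the remote file with Pyreadstat as well, one option is to download it once and then read the local copy. This is only a sketch, assuming Pyreadstat needs a local file path; download_once and load_remote_airline are hypothetical helper names.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def download_once(url: str, dest: str) -> Path:
    """Download url to dest unless a local copy already exists."""
    target = Path(dest)
    if not target.exists():
        with urlopen(url) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
    return target


def load_remote_airline():
    """Fetch the example file and read the local copy with Pyreadstat."""
    import pyreadstat  # imported here so download_once works without it

    url = "http://www.principlesofeconometrics.com/sas/airline.sas7bdat"
    local = download_once(url, "airline.sas7bdat")
    return pyreadstat.read_sas7bdat(str(local))
```

Because an existing local copy is reused, running the script again does not download the file a second time.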

How to Read a SAS File and Specific Columns

Note that read_sas7bdat (Pyreadstat) has the argument “usecols”. By using this argument, we can select which columns we want to load from the SAS file into the dataframe:

cols = ['YEAR', 'Y', 'W']
df, meta = pyreadstat.read_sas7bdat('airline.sas7bdat', usecols=cols)
df.head()
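When we are not sure which columns a file contains, we can first read only the metadata and compare. The sketch below assumes the metadataonly argument of read_sas7bdat and the column_names field of the metadata object; split_columns and read_selected are our own helper names.

```python
def split_columns(requested, available):
    """Split requested names into those present in the file and those missing."""
    available_set = set(available)
    present = [col for col in requested if col in available_set]
    missing = [col for col in requested if col not in available_set]
    return present, missing


def read_selected(path, requested):
    """Read only the requested columns, failing early on unknown names."""
    import pyreadstat  # imported here so split_columns works without it

    # Read the metadata only, to get the column names without loading the data.
    _, meta = pyreadstat.read_sas7bdat(path, metadataonly=True)
    present, missing = split_columns(requested, meta.column_names)
    if missing:
        raise KeyError(f"columns not in {path}: {missing}")
    return pyreadstat.read_sas7bdat(path, usecols=present)
```

For example, read_selected('airline.sas7bdat', ['YEAR', 'Y', 'W']) would return the same df and meta as the code above.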

How to Save a SAS file to CSV

In this section of the Pandas SAS tutorial, we are going to export the .sas7bdat file to a .csv file. This is easily done; we just have to use the to_csv method of the dataframe object we created earlier:

df.to_csv('data_from_sas.csv', index=False)

Remember to put the right path, as the first argument, when using to_csv to save a .sas7bdat file as CSV.
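A minimal sketch of building that path with pathlib, so that the CSV file gets the same name as the SAS file and the output folder exists before we write to it. The helper names csv_target and export_to_csv, and the folder name in the usage note, are made up for this example.

```python
from pathlib import Path


def csv_target(sas_path, out_dir="."):
    """Return out_dir/<name>.csv for a given .sas7bdat path."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    return out / f"{Path(sas_path).stem}.csv"


def export_to_csv(df, sas_path, out_dir="."):
    """Write a dataframe next to its source name and return the CSV path."""
    target = csv_target(sas_path, out_dir)
    df.to_csv(target, index=False)
    return target
```

With the dataframe from earlier, export_to_csv(df, 'airline.sas7bdat', 'exports') would write exports/airline.csv.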

Summary: Read SAS Files using Python

Now we have learned how to read SAS files in Python and how to save them as CSV. It was quite simple, and both methods, in fact, rely on Pandas and give us a Pandas dataframe.



