In this tutorial, you will learn how to work with Python's os module.
Table of Contents
- Introduction
- Basic Functions
- List Files / Folders in Current Working Directory
- Change Working Directory
- Create Single and Nested Directory Structure
- Remove Single and Nested Directory Structure Recursively
- Example with Data Processing
- Conclusion
Introduction
Python is one of the most frequently used languages in recent times for tasks such as data processing, data analysis, and website building. Many of these tasks depend on the operating system. Python lets developers use such OS-dependent functionality through the standard-library os module. This module abstracts away platform differences and provides functions to navigate, create, delete, and modify files and folders. In this tutorial, you will learn how to import the module, use its basic functions, and apply it in a sample Python project that merges data files.
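To see that abstraction at work, here is a small sketch (not part of the original tutorial) that builds a path with os.path.join() instead of hard-coding a separator. The folder names are simply the ones used later in this tutorial.

```python
import os

# os.name is 'nt' on Windows and 'posix' on Linux and macOS
print(os.name)

# os.sep is the character the current platform uses between path components
print(repr(os.sep))

# os.path.join() inserts the right separator, so the same code runs on every platform
state_dir = os.path.join('Population_Data', 'New York')
print(state_dir)
```

On Windows this prints Population_Data\New York; on Linux and macOS it prints Population_Data/New York.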
Some Basic Functions
Let's explore the module with some example code.
Import the library:
import os
Let's get the list of names (functions, constants, and classes) that this module exposes.
print(dir(os))
Output:
['DirEntry', 'F_OK', 'MutableMapping', 'O_APPEND', 'O_BINARY', 'O_CREAT', 'O_EXCL', 'O_NOINHERIT', 'O_RANDOM', 'O_RDONLY', 'O_RDWR', 'O_SEQUENTIAL', 'O_SHORT_LIVED', 'O_TEMPORARY', 'O_TEXT', 'O_TRUNC', 'O_WRONLY', 'P_DETACH', 'P_NOWAIT', 'P_NOWAITO', 'P_OVERLAY', 'P_WAIT', 'PathLike', 'R_OK', 'SEEK_CUR', 'SEEK_END', 'SEEK_SET', 'TMP_MAX', 'W_OK', 'X_OK', '_Environ', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_execvpe', '_exists', '_exit', '_fspath', '_get_exports_list', '_putenv', '_unsetenv', '_wrap_close', 'abc', 'abort', 'access', 'altsep', 'chdir', 'chmod', 'close', 'closerange', 'cpu_count', 'curdir', 'defpath', 'device_encoding', 'devnull', 'dup', 'dup2', 'environ', 'errno', 'error', 'execl', 'execle', 'execlp', 'execlpe', 'execv', 'execve', 'execvp', 'execvpe', 'extsep', 'fdopen', 'fsdecode', 'fsencode', 'fspath', 'fstat', 'fsync', 'ftruncate', 'get_exec_path', 'get_handle_inheritable', 'get_inheritable', 'get_terminal_size', 'getcwd', 'getcwdb', 'getenv', 'getlogin', 'getpid', 'getppid', 'isatty', 'kill', 'linesep', 'link', 'listdir', 'lseek', 'lstat', 'makedirs', 'mkdir', 'name', 'open', 'pardir', 'path', 'pathsep', 'pipe', 'popen', 'putenv', 'read', 'readlink', 'remove', 'removedirs', 'rename', 'renames', 'replace', 'rmdir', 'scandir', 'sep', 'set_handle_inheritable', 'set_inheritable', 'spawnl', 'spawnle', 'spawnv', 'spawnve', 'st', 'startfile', 'stat', 'stat_float_times', 'stat_result', 'statvfs_result', 'strerror', 'supports_bytes_environ', 'supports_dir_fd', 'supports_effective_ids', 'supports_fd', 'supports_follow_symlinks', 'symlink', 'sys', 'system', 'terminal_size', 'times', 'times_result', 'truncate', 'umask', 'uname_result', 'unlink', 'urandom', 'utime', 'waitpid', 'walk', 'write']
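The raw dir() output mixes functions, classes, constants, and private helpers, and its exact contents vary by operating system and Python version (the list above comes from Windows). A quick sketch for narrowing it down to the public names you can actually call:

```python
import os

# keep public names only, and only those that are callable (functions and classes)
public_callables = [
    name for name in dir(os)
    if not name.startswith('_') and callable(getattr(os, name))
]

print(len(public_callables))
print(public_callables[:10])
```

Constants such as os.sep and submodules such as os.path drop out, because they are not callable.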
Now, using the getcwd() function, we can retrieve the path of the current working directory.
print(os.getcwd())
Output:
C:\Users\hpandya\OneDrive\work\StackAbuse\os_python\os_python\Project
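os.getcwd() returns the path as a plain string, so it combines naturally with the os.path helpers. A small sketch that splits it into the parent folder and the name of the current folder:

```python
import os

cwd = os.getcwd()

# split the path into its parent folder and its last component
parent, name = os.path.split(cwd)
print('Parent folder:', parent)
print('Current folder:', name)

# os.path.dirname() and os.path.basename() return the same two pieces
assert parent == os.path.dirname(cwd)
assert name == os.path.basename(cwd)
```

For the path shown above, this would report Project as the current folder.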
List Folders and Files
Let's list the folders and files in the current directory using listdir():
print(os.listdir())
Output:
['Data', 'Population_Data', 'README.md', 'tutorial.py', 'tutorial_v2.py']
As you can see, there are two folders, Data and Population_Data, and three files: the Markdown file README.md and two Python files, tutorial.py and tutorial_v2.py.
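os.listdir() returns bare names, so it does not tell you which entries are folders. Here is a sketch that uses os.scandir() to split them (split_entries is a hypothetical helper name). Its DirEntry objects cache file-type information, which usually avoids an extra system call per entry:

```python
import os

def split_entries(path='.'):
    """Return (folders, files) found directly in path, each sorted alphabetically."""
    folders, files = [], []
    # os.scandir() yields DirEntry objects that know whether they are folders or files
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_dir():
                folders.append(entry.name)
            elif entry.is_file():
                files.append(entry.name)
    return sorted(folders), sorted(files)

folders, files = split_entries()
print('Folders:', folders)
print('Files:', files)
```

Run in the project folder above, this would separate Data and Population_Data from the three files.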
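To get the entire tree structure of the project folder, let's write a function that uses os.walk() to iterate over all the files in each folder of the current directory. For every folder it visits, os.walk() yields a tuple of the folder's path, the names of its subfolders, and the names of its files.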
# function to list files in each folder of the current working directory
def list_files(startpath):
    for root, dirs, files in os.walk(startpath):
        # skip the .git folder: editing dirs in place stops os.walk() from descending into it
        dirs[:] = [d for d in dirs if d != '.git']
        # depth of the current folder relative to startpath
        level = root.replace(startpath, '').count(os.sep)
        indent = ' ' * 4 * level
        print('{}{}/'.format(indent, os.path.basename(root)))
        subindent = ' ' * 4 * (level + 1)
        for f in files:
            print('{}{}'.format(subindent, f))
Call this function with the path of the current working directory, which we obtained earlier with os.getcwd():
startpath = os.getcwd()
list_files(startpath)
Output:
Project/
    README.md
    tutorial.py
    tutorial_v2.py
    Data/
        uscitiesv1.4.csv
    Population_Data/
        Alabama/
            Alabama_population.csv
        Alaska/
            Alaska_population.csv
        Arizona/
            Arizona_population.csv
        Arkansas/
            Arkansas_population.csv
        California/
            California_population.csv
        Colorado/
            Colorado_population.csv
        Connecticut/
            Connecticut_population.csv
        Delaware/
            Delaware_population.csv
...
Note: The output has been truncated for brevity.
As seen from the output, folder names end with a / and the contents of each folder are indented four spaces further to the right than the folder itself. The Data folder has one CSV file named uscitiesv1.4.csv, which contains population data for each city in the United States. The Population_Data folder has one subfolder per state, each containing a separate CSV file with that state's population data, extracted from uscitiesv1.4.csv.
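Printing is convenient for a quick look, but a version that returns the lines is easier to reuse and test. Here is a sketch of such a variant (tree_lines and its skip parameter are hypothetical). It computes the depth with os.path.relpath() instead of string replacement, and sorts folders and files so that the order does not depend on the file system:

```python
import os

def tree_lines(startpath, skip=('.git',)):
    """Return the folder tree under startpath as a list of indented lines."""
    startpath = os.path.abspath(startpath)
    lines = []
    for root, dirs, files in os.walk(startpath):
        # prune skipped folders in place so os.walk() never descends into them;
        # sorting in place also makes the traversal order predictable
        dirs[:] = sorted(d for d in dirs if d not in skip)
        # depth relative to startpath: '.' is the top folder, 'a' -> 1, 'a/b' -> 2
        rel = os.path.relpath(root, startpath)
        level = 0 if rel == os.curdir else rel.count(os.sep) + 1
        lines.append(' ' * 4 * level + os.path.basename(root) + '/')
        for f in sorted(files):
            lines.append(' ' * 4 * (level + 1) + f)
    return lines
```

print('\n'.join(tree_lines(os.getcwd()))) then prints a tree in the same layout as list_files().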
Change Working Directory
Let's change the working directory to the folder that holds the data for the state of New York:
os.chdir('Population_Data/New York')
Now let's run list_files() again, this time on the new working directory.
list_files(os.getcwd())
Output:
New York/
    New York_population.csv
As you can see, we are now in the New York folder under the Population_Data folder.
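Because os.chdir() changes the directory for the whole process, it is easy to lose track of where a script currently is. A minimal sketch of a context manager that switches into a folder and always switches back (working_directory is a hypothetical name; Python 3.11 and later ship a similar contextlib.chdir()):

```python
import os
from contextlib import contextmanager

@contextmanager
def working_directory(path):
    """Temporarily change the current working directory to path."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield os.getcwd()
    finally:
        # restore the original directory even if the block raised an exception
        os.chdir(previous)
```

For example, with working_directory(os.path.join('Population_Data', 'New York')): runs the indented statements inside that folder and returns to the project folder afterwards.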
Create Single and Nested Directory Structure
Now, let's create a new directory called testdir in this directory.
os.mkdir('testdir')
list_files(os.getcwd())
Output:
New York/
    New York_population.csv
    testdir/
As you can see, it creates the new directory in the current working directory.
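Running os.mkdir('testdir') a second time raises FileExistsError, because the directory is already there. Here is a small sketch that tolerates that case (ensure_dir is a hypothetical helper name):

```python
import os

def ensure_dir(path):
    """Create one directory; return True if it was created, False if path already existed."""
    try:
        os.mkdir(path)
        return True
    except FileExistsError:
        # raised when anything (a folder or a file) already exists at path
        return False
```

Note that a file with the same name also triggers FileExistsError, so a False result does not by itself guarantee that path is a folder.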
Next, let's try to create a nested directory two levels deep:
os.mkdir('level1dir/level2dir')
Output:
Traceback (most recent call last):
  File "<ipython-input-12-ac5055572301>", line 1, in <module>
    os.mkdir('level1dir/level2dir')
FileNotFoundError: [WinError 3] The system cannot find the path specified: 'level1dir/level2dir'
We get an error, specifically a FileNotFoundError. You might wonder why we get a FileNotFoundError when we are trying to create a directory.
The reason is that os.mkdir() creates only the last directory in the path, so it needs the parent directory, level1dir, to exist already in order to create level2dir inside it. Since level1dir does not exist in the first place, it raises a FileNotFoundError.
For cases like this, use the os.makedirs() function instead, which creates all of the missing intermediate directories recursively:
os.makedirs('level1dir/level2dir')
Check the current directory tree:
list_files(os.getcwd())
Output:
New York/
    New York_population.csv
    level1dir/
        level2dir/
    testdir/
As we can see, the New York folder now has two subdirectories, testdir and level1dir, and level1dir has a directory underneath it called level2dir.
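Like mkdir(), os.makedirs() raises FileExistsError when the target directory already exists, but it accepts an exist_ok flag that turns the call into a no-op in that case. A minimal sketch, run in a temporary folder so it cleans up after itself:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    nested = os.path.join(base, 'level1dir', 'level2dir')

    # the first call creates both levels
    os.makedirs(nested)

    # a plain second call fails because level2dir is already there
    try:
        os.makedirs(nested)
        raised = False
    except FileExistsError:
        raised = True

    # exist_ok=True accepts an existing directory without raising
    os.makedirs(nested, exist_ok=True)
    created = os.path.isdir(nested)

    print(raised, created)  # True True
```

This makes exist_ok=True a convenient choice for setup code that may run more than once.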
Remove Single and Multiple Directories Recursively
The os module also has functions to modify or remove directories, which I'll show here.
Now, let's remove the directories we just created, starting with testdir, using rmdir():
os.rmdir('testdir')
Check the current directory tree to verify that the directory no longer exists:
list_files(os.getcwd())
Output:
New York/
    New York_population.csv
    level1dir/
        level2dir/
As you can see, testdir has been deleted.
Now let's try to delete the nested directory structure of level1dir and level2dir:
os.rmdir('level1dir')
Output:
OSError                                   Traceback (most recent call last)
<ipython-input-14-690e535bcf2c> in <module>()
----> 1 os.rmdir('level1dir')
OSError: [WinError 145] The directory is not empty: 'level1dir'
As seen, this raises an OSError, and rightly so: it says the level1dir directory is not empty. That is correct, because level2dir is underneath it.
Like the Unix rmdir command, the os.rmdir() function cannot remove a non-empty directory.
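If you really do want to delete a folder together with everything inside it, rmdir() has to be called on each folder only after that folder has been emptied. A minimal sketch of that idea using only os (remove_tree is a hypothetical name, and the sketch assumes the tree contains no symbolic links to folders); in practice, the standard library's shutil.rmtree() does this job:

```python
import os

def remove_tree(path):
    """Delete path and everything under it, using only os functions."""
    # topdown=False yields the deepest folders first, so every folder is
    # already empty by the time os.rmdir() is called on it
    for root, dirs, files in os.walk(path, topdown=False):
        for name in files:
            os.remove(os.path.join(root, name))
        for name in dirs:
            os.rmdir(os.path.join(root, name))
    # finally remove the (now empty) top folder itself
    os.rmdir(path)
```
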
The counterpart of makedirs() is removedirs(). It removes the leaf directory first and then works its way up the path, removing each parent directory as long as that parent is empty:
os.removedirs('level1dir/level2dir')
Let's see the directory tree structure again:
list_files(os.getcwd())
Output:
New York/
    New York_population.csv
This brings the directory back to its original state.
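To see that removedirs() stops climbing as soon as it reaches a parent that is not empty, here is a small sketch run inside a temporary folder (notes.txt is a made-up file placed there only to keep level1dir non-empty):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # build base/level1dir/level2dir and put a file directly in level1dir
    os.makedirs(os.path.join(base, 'level1dir', 'level2dir'))
    with open(os.path.join(base, 'level1dir', 'notes.txt'), 'w') as fh:
        fh.write('keep me')

    # removes level2dir, then stops at level1dir because it is not empty
    os.removedirs(os.path.join(base, 'level1dir', 'level2dir'))

    level2_exists = os.path.isdir(os.path.join(base, 'level1dir', 'level2dir'))
    level1_exists = os.path.isdir(os.path.join(base, 'level1dir'))
    print(level2_exists, level1_exists)  # False True
```

In the tutorial, level1dir contained nothing but level2dir, so both levels were removed.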
Example with Data Processing
So far we have explored how to view, create, and remove a nested directory structure. Now let's see an example of how the os module helps in data processing.
For that, let's go one level up in the directory structure:
os.chdir('../')
Let's view the directory tree structure again:
list_files(os.getcwd())
Output:
Population_Data/
    Alabama/
        Alabama_population.csv
    Alaska/
        Alaska_population.csv
    Arizona/
        Arizona_population.csv
    Arkansas/
        Arkansas_population.csv
    California/
        California_population.csv
    Colorado/
        Colorado_population.csv
    Connecticut/
        Connecticut_population.csv
    Delaware/
        Delaware_population.csv
...
Note: The output has been truncated for brevity.
Let's merge the data from all of the states by iterating over each state's directory and combining the CSV files into a single dataframe.
import os
import pandas as pd
# create a list to hold the data from each state
list_states = []
# walk over every state folder and read each CSV file it contains
for root, dirs, files in os.walk(os.getcwd()):
    for name in files:
        if name.endswith('.csv'):
            list_states.append(pd.read_csv(os.path.join(root, name), index_col=None))
# merge all of the dataframes into a single dataframe using the pandas library;
# ignore_index=True gives the merged rows a fresh 0..n-1 index
merge_data = pd.concat(list_states, ignore_index=True, sort=False)
Thanks in part to the os module, we were able to create merge_data, a dataframe containing population data from every state.
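The loop above does not record which state each row came from. Because every state's folder is named after the state, os.path.basename(root) can supply that label. Here is a standard-library-only sketch using the csv module instead of pandas (merge_state_csvs and the state_folder column name are hypothetical, and the sketch assumes every CSV under the folder shares the same header):

```python
import csv
import os

def merge_state_csvs(population_dir):
    """Read every CSV under population_dir and tag each row with its folder name."""
    rows = []
    for root, dirs, files in os.walk(population_dir):
        dirs.sort()  # visit the state folders in alphabetical order
        for name in sorted(files):
            if not name.endswith('.csv'):
                continue
            state = os.path.basename(root)  # e.g. 'Alabama' for Population_Data/Alabama
            with open(os.path.join(root, name), newline='') as fh:
                for row in csv.DictReader(fh):
                    row['state_folder'] = state
                    rows.append(row)
    return rows
```

With pandas, the same label could be attached to each dataframe before calling pd.concat().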
Conclusion
In this article, we briefly explored some of the capabilities of Python's built-in os module, and saw a short example of how the module can be used in data science and analytics. The os module has a lot more to offer, and depending on their needs, developers can build much more complex logic on top of it.
from Planet Python