Introduction
String manipulation in Python is done through a set of built-in methods. Because strings are immutable, these methods return new strings rather than modifying the original. In this guide, we'll look at methods for stripping whitespace (known as trimming in other languages) from strings in Python.
Trim Methods - strip()
In Python, the stripping methods can remove leading and trailing whitespace as well as specific characters. Leading and trailing whitespace includes spaces, tabs (\t), carriage returns (\r), newlines (\n), and other lesser-known whitespace characters.
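To make the default whitespace set concrete, here is a minimal sketch. The sample string raw is made up for illustration, and repr() is used only to make the invisible characters visible:

```python
# Hypothetical sample: a mix of whitespace characters on both ends.
# \x0b is a vertical tab and \x0c is a form feed.
raw = "\t \r\n Stack Abuse \x0b\x0c\n"

# With no argument, strip() removes every kind of leading/trailing whitespace.
stripped = raw.strip()

print(repr(raw))
print(repr(stripped))  # 'Stack Abuse'
```

Note that the space between the two words is kept: only the ends of the string are affected.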
There are three ways in which the spaces or specific characters can be stripped from strings:
strip(chars) - The vanilla strip method strips whitespace or the given characters from both the left and right sides of the string.
lstrip(chars) - The 'l' in the method's name stands for left; this method strips whitespace or the given characters from the left side of the string.
rstrip(chars) - The 'r' in the method's name stands for right, and you guessed it: it strips whitespace or the given characters from the right side of the string.
If specific characters are to be stripped from the string, they need to be passed as an argument to the method, say input.rstrip("abc"). This argument is optional; by default, the methods strip whitespace, as that's the most common usage.
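As a rough sketch of how the optional argument changes the behavior, consider the snippet below. The variable names and the sample string are hypothetical and chosen only for this illustration:

```python
# Hypothetical sample: spaces outside, then the letters a/b/c around the text.
text = "  abcStack Abusecba  "

# No argument: only whitespace is removed.
no_arg = text.strip()            # 'abcStack Abusecba'

# With an argument, only the listed characters are removed. Whitespace is
# no longer stripped unless it is listed too, so nothing happens here:
# the outermost characters are spaces, not a, b or c.
chars_only = text.strip("abc")   # '  abcStack Abusecba  '

# Listing the space alongside the letters strips both.
both = text.strip("abc ")        # 'Stack Abuse'

# Strings are immutable: every call returned a new string.
print(repr(text))                # '  abcStack Abusecba  '
```

The takeaway is that passing an argument replaces the default whitespace set rather than extending it.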
Trimming/Stripping Whitespace from Strings
Now that we're clear on what these methods can do, let's dive into some examples. We have an input and an output string. The input variable holds a string with both leading and trailing spaces, while the output string is a template that we can use to highlight these spaces:
# trim.py
input = " Stack Abuse "
output = "|{}|"
# Remove leading spaces or spaces to the left
print("lstrip() Output:", output.format(input.lstrip()))
# Remove trailing spaces or spaces to the right
print("rstrip() Output:", output.format(input.rstrip()))
# Remove both trailing and leading spaces
print(" strip() Output:", output.format(input.strip()))
Once we strip() the input and place the result between the pipes (|), any remaining whitespace will be very noticeable.
Running this code results in:
$ python trim.py
lstrip() Output: |Stack Abuse |
rstrip() Output: | Stack Abuse|
strip() Output: |Stack Abuse|
Trimming/Stripping Special Characters from Strings
Besides whitespace, it's common to need to remove specific leading and trailing special characters. Let's pass a character argument to the stripping methods:
# trim_chars.py
input = " ~~ Stack Abuse ~~ "
output = "|{}|"
# Remove leading tildes and spaces (left side)
print("lstrip() Output:", output.format(input.lstrip("~ ")))
# Remove trailing tildes and spaces (right side)
print("rstrip() Output:", output.format(input.rstrip("~ ")))
# Remove both leading and trailing tildes and spaces
print(" strip() Output:", output.format(input.strip("~ ")))
We've passed the tilde as well as a space ("~ ") as the argument to the stripping methods, removing any occurrence of either of them from the left, right, and both sides of the string. It's worth noting that the order of these characters doesn't matter and that the methods don't perform pattern matching: the argument is treated as a set of characters, and characters keep being removed from each end until one that isn't in the set is reached.
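The set-like behavior of the argument can be checked with a short sketch. The file name "report.txt" and the variable names are hypothetical, used only to illustrate a common pitfall:

```python
text = " ~~ Stack Abuse ~~ "

# The order of the characters in the argument makes no difference.
same = text.strip("~ ") == text.strip(" ~")   # True

# Stripping one character at a time is not equivalent: the first call
# stops immediately because the outermost characters are spaces.
two_passes = text.strip("~").strip(" ")       # '~~ Stack Abuse ~~'

# The argument is a set of characters, not a suffix to match.
# Here the final 't' of "report" is removed as well.
name = "report.txt".rstrip(".txt")            # 'repor'

print(same, repr(two_passes), repr(name))
```

If the goal is to drop an exact prefix or suffix rather than a set of characters, the stripping methods are the wrong tool for the job.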
Running this code results in:
$ python trim_chars.py
lstrip() Output: |Stack Abuse ~~ |
rstrip() Output: | ~~ Stack Abuse|
strip() Output: |Stack Abuse|
Using strip() on a Pandas Series
We can also apply the strip() methods to a Pandas Series. Leading and trailing whitespace and characters can be stripped from the individual cells of the series. One thing to note is that the string methods are accessed through the series' str accessor, so the series needs to hold string values before performing the strip() operation.
Note: If you are new to Pandas, read our Beginner's Guide to Pandas to learn more about the library and how to set it up. Once Pandas is installed on your system, you can follow along with this code example!
Consider the following script:
# strip_series.py
import pandas as pd
s = pd.Series(['1. Cell1. ~', '2. Cell2!\n'])
print("Before strip():\n", s)
print("\nAfter strip():\n", s.str.strip('.\n!~ '))
Here, we're creating a series with cells containing special characters and trailing whitespace. Via series.str, we can call a string method on each element of the Series. Since these elements are strings, we can run any string method.
That being said, we can easily perform a strip() on each element of the series:
$ python strip_series.py
Before strip():
0 1. Cell1. ~
1 2. Cell2!\n
dtype: object
After strip():
0 1. Cell1
1 2. Cell2
dtype: object
Conclusion
Stripping (or trimming) characters from a string can be quite helpful when cleaning datasets, multi-line text files, or even API responses. These basic but powerful Python methods also work on Pandas Series.
from Planet Python