Monday, January 25, 2021

Python Pool: How to Remove Punctuation From a String, List, and File in Python

While working on Python projects, we often need to remove punctuation marks from text so that the data is cleaner and easier to process. Keeping this in mind, Python Pool brings you an in-depth article on removing punctuation marks from a string, list, and file in Python.

The whole article will be divided into three parts. In the first part, we will look at the elimination of punctuation from a string. After that, we will move on to the List, and subsequently, we will see how to remove Punctuation from a file in Python. Accordingly, without wasting any time, let’s directly jump into the tutorial.

What is a Punctuation Mark?

According to Google, any one of the marks (such as a period, comma, or question mark) used to divide a piece of writing into sentences, clauses, etc., is known as a punctuation mark. Broadly speaking, there are 14 punctuation marks in English grammar: the period (full stop), question mark, exclamation point/mark, comma, semicolon, colon, dash, hyphen, parentheses, brackets, braces, apostrophe, quotation marks, and ellipsis. In this article, we will see how to remove these punctuation marks from our text using Python.

Removing Punctuation Marks from a String in Python

Moving to the first part of our article, we will discuss the possible ways to remove punctuation from a string in Python. While digging into and researching this topic, I came across 5 ways to remove punctuation from a string. I will try my best to explain each one through examples and a step-by-step walkthrough so that you get a clear idea. Hopefully, you will not need to look at other websites or video tutorials after reading this article.

Ways to Remove Punctuation Marks from a String in Python

5 ways to Remove Punctuation from a string in Python:

  1. Using Loops and Punctuation marks string
  2. Using the Regex
  3. By using the translate() method
  4. Using the join() method 
  5. By using Generator Expression

Let’s start our journey with the above five ways to remove punctuation from a String in Python.

Using a for Loop and Punctuation String

This program removes all punctuation from a string. We check each character of the string using a for loop. Sometimes we want to split a sentence into a list of words. In such situations, we may first want to clean up the string and remove all punctuation marks. Here’s a good illustration of how it’s done.

Let’s see the working through an example:

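# Raw string, so the backslash is kept as a literal character
punctuations = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''

inp_str = input("Enter a string: ")

# Build the result one character at a time, skipping punctuation
no_punc = ""
for char in inp_str:
    if char not in punctuations:
        no_punc = no_punc + char

print("Punctuation Free String: ", no_punc)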

Output:

Enter a string: Hi I am Karan from @python.pool
Punctuation Free String:  Hi I am Karan from pythonpool

Explanation

The above method is a simple brute-force way to remove punctuation from a string in Python. We check each character against a string that contains the punctuation marks, and we build a new string without them.

In this program, we first define a string named ‘punctuations’ that consists of all the punctuation marks we want to remove. After that, we take the input from the user and store it in ‘inp_str’. Then we iterate over the provided string using a for loop.
In every iteration, we check whether the character is a punctuation mark using the membership test (not in). We start with an empty string and append (concatenate) the character to it if it is not a punctuation mark. Finally, we display the cleaned-up string.
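
The same loop can also be wrapped in a reusable function. The following is only a sketch: the function name strip_punctuation is made up for illustration, and it assumes that the ready-made string.punctuation constant (ASCII punctuation only) is an acceptable replacement for the hand-written string above. It collects the kept characters in a list and joins them once at the end, which tends to be cheaper than repeated concatenation on long inputs.

```python
import string


def strip_punctuation(text):
    """Return text without any character found in string.punctuation."""
    kept = []
    for char in text:
        # Membership test, exactly as in the loop above
        if char not in string.punctuation:
            kept.append(char)
    # Join the kept characters once at the end
    return "".join(kept)


result = strip_punctuation("Hi I am Karan from @python.pool")
print(result)  # Hi I am Karan from pythonpool
```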

Using the Regex to Remove Punctuation from String in Python

Python gives us the re module to work with all sorts of regular expressions. If you don’t know what a regular expression is, let me tell you: a regular expression is a sequence of characters that specifies a search pattern. Normally, these patterns are used by string-searching algorithms for “find” or “find and replace” operations on strings, or for input validation. It’s a technique developed in theoretical computer science and formal language theory.

Note: We need to import the re module to work with regular expressions.

The re module comes with the sub() (substitute) function, and we will use that function to remove punctuation from a string in Python.

Syntax of re.sub

re.sub(pattern, replacement, original_string)
  • pattern: The punctuation marks(pattern) we want to replace.
  • replacement: Pattern replacement string (mostly empty string).
  • original_string: The original string from which we need to remove punctuations(pattern).
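
Before applying it to punctuation, here is a small sketch of how re.sub behaves in general. The sample strings are invented for illustration. The optional count argument, which the syntax above does not show, limits how many matches are replaced.

```python
import re

# Replace every digit with '#'
masked = re.sub(r"\d", "#", "Room 101, Floor 7")
print(masked)  # Room ###, Floor #

# An empty replacement deletes the matches; count=1 stops after the first one
first_only = re.sub(r"[,.!]", "", "Hi, there. Bye!", count=1)
print(first_only)  # Hi there. Bye!
```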

Let’s see the working through an example:

Example to Remove Punctuation from a String in Python Using Regex

import re

my_string = "Python P$#@!*oo()&l,. is ##th$e$ Bes.t pl*ace to Le@arn P)(*y&tho.n"

op_string = re.sub(r'[^\w\s]','',my_string)

print('String with Punctuation: ', my_string)
print('String without Punctuation: ', op_string)

Output:

String with Punctuation:  Python P$#@!*oo()&l,. is ##th$e$ Bes.t pl*ace to Le@arn P)(*y&tho.n
String without Punctuation:  Python Pool is the Best place to Learn Python

Explanation

In the above example, we import the re module because the function we use, re.sub, lives there. Then we have our input string with punctuation in it, stored in the variable my_string. Subsequently, with re.sub, we remove all the punctuation. In the parameters of ‘re.sub’ you might be wondering what r'[^\w\s]' is. It is a character class that matches any single character that is neither a word character (\w: letters, digits, and the underscore) nor whitespace (\s). Replacing every such match with an empty string leaves only letters, digits, underscores, and whitespace. Note that this pattern does not remove the underscore.

I prefer using regular expressions, though, as they are easy to maintain and also easier to understand (if someone else is reading your code).
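
To see what the pattern keeps and what it removes, the sketch below runs it on a made-up sample string that contains an underscore. The second pattern is an assumption about what you may want instead: it builds a character class from string.punctuation with re.escape, so that every ASCII punctuation mark, including the underscore, is removed.

```python
import re
import string

text = "snake_case, kebab-case & 100% done!"

# [^\w\s] treats "_" as a word character, so the underscore survives
kept_underscore = re.sub(r"[^\w\s]", "", text)
print(kept_underscore)

# Assumption: we want every ASCII punctuation mark gone, "_" included.
# re.escape makes the characters safe to place inside [...]
ascii_punct = re.compile("[" + re.escape(string.punctuation) + "]")
no_underscore = ascii_punct.sub("", text)
print(no_underscore)
```

The first call keeps "snake_case" intact, while the second turns it into "snakecase". Which behaviour is right depends on your data.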

By using the translate() method to Remove Punctuation From a String in Python

The string translate() method is typically the fastest way to remove punctuation from a string in Python. translate() is a built-in method of str objects, so it needs no import. We import the string module only to get string.punctuation, a ready-made string of ASCII punctuation characters.

If you don’t know what the translate() method does, let me explain it to you. The translate() method returns a copy of the string in which particular characters are replaced, or removed, according to a dictionary or mapping table.
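
As a quick sketch of what such a mapping table looks like, the snippet below builds two tables with str.maketrans and applies them. The sample strings and replacement characters are invented for illustration.

```python
# Dict form: keys are single characters, values are replacement strings,
# or None to delete the character
table = str.maketrans({"a": "4", "e": "3", "!": None})
leet = "Great idea!".translate(table)
print(leet)  # Gr34t id34

# Three-argument form: map "a"->"x" and "b"->"y", and delete every "?"
table2 = str.maketrans("ab", "xy", "?")
swapped = "a bad cab?".translate(table2)
print(swapped)  # x yxd cxy
```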

Let’s see the working through an example:

Example To Remove Punctuation From A String In Python Using Translate Function

import string

my_string = "H*!i I a&m K@ar$an F)(&rom Python P$#@!*oo()&l,"

op_string = my_string.translate(str.maketrans('', '', string.punctuation))

print('String with Punctuation: ', my_string)
print('String without Punctuation: ', op_string)

Output:

String with Punctuation:  H*!i I a&m K@ar$an F)(&rom Python P$#@!*oo()&l,
String without Punctuation:  Hi I am Karan From Python Pool

Explanation

In the above example, we first import the string module, which provides the string.punctuation constant. After that, we initialize our string, which contains a lot of punctuation marks. In the next step, we remove all punctuation from it using the translate() method. This method returns a copy of the string in which the characters listed in a translation table are substituted or deleted.

To make this work, we pass string.punctuation as the third argument of str.maketrans(), which builds a table that maps each of those characters to None (meaning "delete"), and we hand that table to translate(). The string.punctuation constant, which is part of the “string” module, gives us a string of all ASCII punctuation marks.
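
The following sketch simply inspects the pieces involved, using an invented sample string. Keep in mind that string.punctuation covers ASCII punctuation only, so characters such as curly quotes are not in it.

```python
import string

print(string.punctuation)       # !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
print(len(string.punctuation))  # 32

# The third argument lists the characters to delete
table = str.maketrans("", "", string.punctuation)

# The table maps each code point to None, which means "delete"
print(table[ord("!")])          # None

cleaned = "Wait... what?!".translate(table)
print(cleaned)                  # Wait what
```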

Using the join() Method to Remove Punctuation from String in Python

We can also use the join() method to remove punctuation from the string. If you don’t know about the join() method, let me briefly explain it to you. The join() method gives a flexible way to build strings out of iterable objects. It joins each element of an iterable (for example, a list, string, or tuple) using a string separator (the string on which join() is called) and returns the concatenated string.

The syntax of the join() method is:

string.join(iterable)

The join() method takes an iterable as the parameter.
Let’s see through an example how we can remove punctuation from a string in python using the join() method.

import string

st = "This , is a sam^ple string f#^ro@m P#@ytho&n P#o#o*~l"

exclude = set(string.punctuation)
st = ''.join(ch for ch in st if ch not in exclude)
print(st)

Output:

This  is a sample string from Python Pool

Explanation:

In the given example, we first import the string module. This module provides several predefined sets of characters. In our case, we need all the punctuation characters, so we create a set from string.punctuation, which keeps each membership test fast. Next, we use the join() method with a generator expression to combine, in one line, all the characters that are not punctuation marks.

The join() method can be used as a one-liner to build a string from any iterable of strings. In this case, we used it on the characters of the sample string.
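
To make the role of the separator clearer, here is a small sketch with made-up inputs. The separator is the string that join() is called on, and an empty separator glues the pieces together directly.

```python
import string

words = ["Python", "Pool"]
spaced = " ".join(words)
dashed = "-".join(words)
print(spaced)  # Python Pool
print(dashed)  # Python-Pool

# An empty separator plus a filter gives the punctuation-free string
punct = set(string.punctuation)
cleaned = "".join(ch for ch in "sam^ple, str#ing!" if ch not in punct)
print(cleaned)  # sample string
```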

By Using Generator Expression

The last but not the least method to remove punctuation from a string in Python is by using a generator. Generators are a simple way of creating iterators. A generator function returns an object (an iterator) that we can iterate over, one value at a time.

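def remove_punc_generator(text):
    # Raw string, so the backslash is kept as a literal character
    punc = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    for ele in text:
        if ele in punc:
            # Drop every occurrence of this punctuation mark
            text = text.replace(ele, "")
    # Yield the cleaned string as the generator's only value
    yield text


sample = "This is, a list! For% #Pythonpool"

sample = remove_punc_generator(sample)

print(next(sample))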

Output:

This is a list For Pythonpool

Explanation:

There are multiple ways of creating a generator. Two of them are by using yield statements and generator expressions (a comprehension written in parentheses). In the given example, we’ve used yield to create a generator object for our string.

Firstly, we create a function that accepts a string, removes the punctuation, and then yields the cleaned string in the final statement. Because the function contains a yield statement, calling it returns a generator object instead of running the body immediately. In our code’s last statement, we use next(sample) to run the generator and get the cleaned string from it.
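
The generator above yields only one value, the whole cleaned string. As a variation, the sketch below yields the kept characters one at a time, which shows the lazy behaviour of generators more directly. The function name iter_without_punctuation is made up for illustration, and string.punctuation is assumed to be an acceptable punctuation set.

```python
import string


def iter_without_punctuation(text):
    """Yield the characters of text one at a time, skipping punctuation."""
    for char in text:
        if char not in string.punctuation:
            yield char


sample = "This is, a list! For% #Pythonpool"

chars = iter_without_punctuation(sample)
first = next(chars)    # Runs the generator only up to the first yield
rest = "".join(chars)  # Consumes whatever is left
print(first)  # T
print(rest)   # his is a list For Pythonpool

# The same filter written as a generator expression
one_liner = "".join(c for c in sample if c not in string.punctuation)
print(one_liner)  # This is a list For Pythonpool
```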

Removing Punctuation From a List in Python

We have talked about a lot of methods to remove punctuation from a string in Python. But the string is not the only data type in Python. We have lists too. The list is one of the most popular built-in data types, so it is essential for us to talk about how to remove punctuation from lists in Python.

If you don’t know what a list is, let me briefly explain it to you: the list is one of the most flexible data types available in Python. A list can be written as comma-separated values (items) between square brackets. An important thing about a list is that its items need not be of the same type.

Without wasting any time, let’s jump directly to the example:

Example to Remove Punctuation From a List in Python

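lis = ["Th@!is", "i#s", "*&a", "list!", "For%", "#Pyt#$hon.?^pool"]

def remove_punc(string):
    # Raw string keeps the backslash literal; the space is left out,
    # so words inside an item stay separated
    punc = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    for ele in string:
        if ele in punc:
            # Drop every occurrence of this punctuation mark
            string = string.replace(ele, "")
    return string

# Apply remove_punc() to every item of the list
lis = [remove_punc(i) for i in lis]
print(lis)  # cleaned list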

Output:

['This', 'is', 'a', 'list', 'For', 'Pythonpool']

Explanation:

Lists are one of the most used data types in Python. There are multiple ways of iterating through a list. In the above example, we use a list comprehension to loop through all the elements of the list.

We start with a sample list consisting of multiple strings. Then we create a custom function that accepts a string as a parameter and removes all of its punctuation. The removal is done by replacing each punctuation mark with an empty string. After that, we use a list comprehension to apply remove_punc() to each of the list elements. Finally, print() is used to check the list.
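
Since the items of a list need not all be strings, a cleaning function may have to skip the other types. The following is a sketch under that assumption: the helper name clean_items is hypothetical, and it swaps the replace() loop for the translate() approach from earlier in the article.

```python
import string

# Build the deletion table once and reuse it for every item
_TABLE = str.maketrans("", "", string.punctuation)


def clean_items(items):
    """Strip ASCII punctuation from string items; leave other types as-is."""
    return [
        item.translate(_TABLE) if isinstance(item, str) else item
        for item in items
    ]


mixed = ["Th@!is", "i#s", 42, "list!", None]
cleaned_list = clean_items(mixed)
print(cleaned_list)  # ['This', 'is', 42, 'list', None]
```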

How to Remove Punctuation From a File in Python

While doing some projects and some mathematical tasks, it becomes necessary to have a clean text file to work with, one that has no punctuation marks in it, so that we can perform calculations on it easily.

[Figure: original text file with punctuation]

filename = input("Enter filename: ")


def remove_punc(string):
    # Raw string keeps the backslash literal; the space is left out so that words stay separated
    punc = r'''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    for ele in string:  
        if ele in punc:  
            string = string.replace(ele, "") 
    return string


try:
    with open(filename,'r',encoding="utf-8") as f:
        data = f.read()
    with open(filename,"w+",encoding="utf-8") as f:
        f.write(remove_punc(data))
    print("Removed punctuations from the file", filename)
except FileNotFoundError:
    print("File not found")

Output:

[Figure: clean text file after removing punctuation using Python]

Explanation:

Reading and writing files is an integral part of Python code, and every coder must know how to do it. We use the built-in open() function to read and write files.

Firstly, we ask the user to enter a filename and store it in a variable. Next, we create a custom function to remove all punctuation characters from a string. Then we read the file using an open() statement. To handle the FileNotFoundError exception, we use a try-except block to inform the end user that the filename is invalid. Finally, we use remove_punc() to remove all the punctuation characters and rewrite the same file using open() in write mode. Note that this overwrites the original file.
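
If you would rather keep the original file untouched, one option is to write the cleaned text to a second file and to process the input line by line, so that large files are not loaded into memory at once. The sketch below assumes exactly that. The function name strip_file_punctuation and the file names are hypothetical, and the demo runs inside a temporary directory so that it does not touch any real files.

```python
import string
import tempfile
from pathlib import Path


def strip_file_punctuation(src, dst):
    """Copy src to dst, removing ASCII punctuation line by line."""
    table = str.maketrans("", "", string.punctuation)
    with open(src, "r", encoding="utf-8") as fin, \
         open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            # Newlines are not punctuation, so the line structure is kept
            fout.write(line.translate(table))


# Demo on throwaway files
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "input.txt"
    dst = Path(tmp) / "output.txt"
    src.write_text("Hello, world!\nIt's 5 o'clock.\n", encoding="utf-8")
    strip_file_punctuation(src, dst)
    cleaned = dst.read_text(encoding="utf-8")

print(cleaned)
```

Here the input lines "Hello, world!" and "It's 5 o'clock." come out as "Hello world" and "Its 5 oclock", each still on its own line.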

Application

This has applications in data preprocessing in the data science domain and also in day-to-day programming.

Conclusion

To summarize, in this post, you have learned various methods to remove punctuation marks from a string, list, and file in Python.

However, if you have any doubts or questions, do let me know in the comment section below. I will try to help you as soon as possible.

Happy Pythoning!

The post How to Remove Punctuation From a String, List, and File in Python appeared first on Python Pool.



