Friday, December 31, 2021

Stack Abuse: Count Number of Word Occurrences in List Python

Introduction

Counting how often each word appears in a Python list is a relatively common task, especially when creating distribution data for histograms.

Say we have a list ['b', 'b', 'a']: we have two occurrences of "b" and one of "a". This guide will show you four different ways to count the number of word occurrences in a Python list:

  • Using Pandas and Numpy
  • Using the count() Function
  • Using the collections Module's Counter
  • Using a Loop and a Counter Variable

In practice, you'll use Pandas/Numpy, the count() method or a Counter, as they're pretty convenient to use.

Using Pandas and Numpy

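The shortest and easiest way to get value counts in an easily manipulable format (a Pandas Series) is via Numpy and Pandas. We can wrap the list in a Numpy array and pass it to Pandas' top-level value_counts() function. Series and DataFrame objects also have their own value_counts() method, and in pandas 2.1 and later, where the top-level function is deprecated, pd.Series(words).value_counts() is the equivalent call: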

import numpy as np
import pandas as pd

words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']

pd.value_counts(np.array(words))

This results in a Series that contains:

hello      3
goodbye    1
bye        1
howdy      1
hi         1
dtype: int64

You can access its values attribute to get the counts themselves, or its index attribute to get the words themselves:

df = pd.value_counts(np.array(words))

print('Index:', df.index)
print('Values:', df.values)

This results in:

Index: Index(['hello', 'goodbye', 'bye', 'howdy', 'hi'], dtype='object')
Values: [3 1 1 1 1]

Using the count() Function

The "standard" way (no external libraries) to count the occurrences of a word in a list is the list's count() method.

count() is a built-in list method that takes an element as its only argument and returns the number of times that element appears in the list.

The complexity of count() is O(n), where n is the number of elements in the list, since it scans the whole list on every call.

The code below uses count() to get the number of occurrences for a word in a list:

words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']

print(f'"hello" appears {words.count("hello")} time(s)')
print(f'"howdy" appears {words.count("howdy")} time(s)')

This gives us:

"hello" appears 3 time(s)
"howdy" appears 1 time(s)

The count() method offers us an easy way to get the number of word occurrences in a list for each individual word.
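Because count() answers one word at a time, getting counts for every distinct word means calling it once per unique word. A minimal sketch, assuming you want a plain dict as the result; the helper name count_all is hypothetical and not part of any library:

```python
def count_all(words):
    """Map each distinct word to its count by calling list.count() once per unique word.

    dict.fromkeys() removes duplicates while keeping first-seen order.
    Each count() call scans the whole list, so this is O(n * k) for k unique words.
    """
    return {word: words.count(word) for word in dict.fromkeys(words)}


words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']
counts = count_all(words)
print(counts)  # {'hello': 3, 'goodbye': 1, 'howdy': 1, 'hi': 1, 'bye': 1}
```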

Using the collections Module's Counter

The Counter class can be used to, well, count instances of other objects. By passing a list into its constructor, we create a Counter: a dictionary subclass that maps each element of the list to the number of times it occurs.

From there, to get a single word's count, you can just use the word as a key:

from collections import Counter

words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']

word_counts = Counter(words)

print(f'"hello" appears {word_counts["hello"]} time(s)')
print(f'"howdy" appears {word_counts["howdy"]} time(s)')

This results in:

"hello" appears 3 time(s)
"howdy" appears 1 time(s)

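Since Counter is a dict subclass with a few extras, two behaviors are worth knowing: looking up a word that never appears returns 0 instead of raising KeyError, and most_common(n) returns the n most frequent (word, count) pairs, with ties kept in the order the words were first seen. A short sketch; the word 'hola' is just an arbitrary example of a missing key:

```python
from collections import Counter

words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']
word_counts = Counter(words)

# A missing key reports a count of 0 rather than raising KeyError
# (and the lookup does not add the key to the Counter)
print(word_counts['hola'])         # 0

# The two most frequent words, highest count first
print(word_counts.most_common(2))  # [('hello', 3), ('goodbye', 1)]
```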
Using a Loop and a Counter Variable

Finally, a brute-force approach that loops through every word in the list, increments a counter by one whenever the word matches, and returns the total will also work!

Of course, this approach gets slower as the list grows, and because the loop runs as Python code, it is typically slower than the built-in count(). It's just conceptually easy to understand and implement.

The code below uses this approach in the count_occurrence() function:

def count_occurrence(words, word_to_count):
    count = 0
    for word in words:
        if word == word_to_count:
            # update counter variable
            count = count + 1
    return count


words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']
print(f'"hello" appears {count_occurrence(words, "hello")} time(s)')
print(f'"howdy" appears {count_occurrence(words, "howdy")} time(s)')

If you run this code you should see this output:

"hello" appears 3 time(s)
"howdy" appears 1 time(s)

Nice and easy!
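The same loop can be written more compactly by passing a generator expression to sum(). Each comparison yields True or False, and since bool is a subclass of int, each True counts as 1. This is only a more concise spelling of the brute-force idea; it still checks every element:

```python
def count_occurrence(words, word_to_count):
    # Each comparison yields True (1) or False (0); sum() adds them up
    return sum(word == word_to_count for word in words)


words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']
print(count_occurrence(words, "hello"))  # 3
print(count_occurrence(words, "howdy"))  # 1
```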

Most Efficient Solution?

Naturally - you'll be searching for the most efficient solution if you're dealing with large corpora of words. Let's benchmark all of these to see how they perform.

The task can be broken down into finding occurrences for all words or a single word, and we'll be doing benchmarks for both, starting with all words:

import numpy as np
import pandas as pd
import collections

def pdNumpy(words):
    def _pdNumpy():
        return pd.value_counts(np.array(words))
    return _pdNumpy

def countFunction(words):
    def _countFunction():
        counts = []
        for word in words:
            counts.append(words.count(word))
        return counts
    return _countFunction

def counterObject(words):
    def _counterObject():
        return collections.Counter(words)
    return _counterObject
    
import timeit

words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']

print("Time to execute:\n")
print("Pandas/Numpy: %ss" % timeit.Timer(pdNumpy(words)).timeit(1000))
print("count(): %ss" % timeit.Timer(countFunction(words)).timeit(1000))
print("Counter: %ss" % timeit.Timer(counterObject(words)).timeit(1000))

Which results in:

Time to execute:

Pandas/Numpy: 0.33886080000047514s
count(): 0.0009540999999444466s
Counter: 0.0019409999995332328s

On this small list, the count() method is extremely fast compared to the other variants; however, it doesn't give us the labels associated with the counts like the other two do.

If you need the labels - the Counter outperforms the inefficient process of wrapping the list in a Numpy array and then counting.

On the other hand, you can make use of DataFrame's methods for sorting or other manipulation that you can't do otherwise. Counter has some unique methods as well.

Finally, you can use a Counter to do the counting and turn the result into a DataFrame as well, to leverage the speed of Counter and the versatility of DataFrames:

df = pd.DataFrame.from_dict(Counter(words), orient='index', columns=['count'])

If you don't need the labels - count() is the way to go.
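These timings come from a 7-word list, so they say little about the large corpora mentioned above. Calling count() once per word rescans the whole list each time (O(n²) overall), while Counter makes a single pass (O(n)), so the ranking can flip as the list grows. The sketch below lets you check this on your own machine; the synthetic vocabulary, list sizes, and seed are arbitrary assumptions, and no particular timings are implied:

```python
import random
import timeit
from collections import Counter


def make_words(n, vocab_size=50, seed=0):
    """Build a reproducible synthetic list of n words from a made-up vocabulary."""
    rng = random.Random(seed)
    vocab = [f'word{i}' for i in range(vocab_size)]
    return rng.choices(vocab, k=n)


def count_per_word(words):
    # Mirrors countFunction above: one full scan of the list per element
    return [words.count(word) for word in words]


def counter_once(words):
    # A single pass over the list
    return Counter(words)


if __name__ == '__main__':
    for n in (100, 1_000, 3_000):
        words = make_words(n)
        t_count = timeit.timeit(lambda: count_per_word(words), number=3)
        t_counter = timeit.timeit(lambda: counter_once(words), number=3)
        print(f'n={n:>5}: count() per word {t_count:.4f}s, Counter {t_counter:.4f}s')
```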

Alternatively, if you're looking for a single word:

import collections

def countFunction(words, word_to_search):
    def _countFunction():
        return words.count(word_to_search)
    return _countFunction

def counterObject(words, word_to_search):
    def _counterObject():
        return collections.Counter(words)[word_to_search]
    return _counterObject

def bruteForce(words, word_to_search):
    def _bruteForce():
        count = 0
        for word in words:
            if word == word_to_search:
                # update counter variable
                count = count + 1
        return count
    return _bruteForce
    
import timeit

words = ['hello', 'goodbye', 'howdy', 'hello', 'hello', 'hi', 'bye']

print("Time to execute:\n")
print("count(): %ss" % timeit.Timer(countFunction(words, 'hello')).timeit(1000))
print("Counter: %ss" % timeit.Timer(counterObject(words, 'hello')).timeit(1000))
print("Brute Force: %ss" % timeit.Timer(bruteForce(words, 'hello')).timeit(1000))

Which results in:

Time to execute:

count(): 0.0001573999998072395s
Counter: 0.0019498999999996158s
Brute Force: 0.0005682000000888365s

The brute force search and count() methods outperform the Counter, mainly because the Counter inherently counts all words instead of one.

Conclusion

In this guide, we explored several ways to count word occurrences in a Python list, assessed the efficiency of each solution, and weighed when each is more suitable.



from Planet Python

Thursday, December 30, 2021

Codementor: Geospatial Data in Python - Interactive Visualization

A step by step tutorial to visualize geospatial data on interactive maps using Python and Folium.

from Planet Python

Zero to Mastery: Python Monthly Newsletter 💻🐍 December 2021

25th issue of the Python Monthly Newsletter! Read by 20,000+ Python developers every month. This monthly Python newsletter covers the latest Python news so that you stay up-to-date with the industry and keep your skills sharp.

from Planet Python

Data-Centric Approach vs Model-Centric Approach in Machine Learning

Code and data are the foundations of the AI system. Both of these components play an important role in the development of a robust model, but which one should you focus on more? In this article, we’ll go through the data-centric vs model-centric approaches and see which one is better; we would also talk about […]

The post Data-Centric Approach vs Model-Centric Approach in Machine Learning appeared first on neptune.ai.



from Planet SciPy

Python for Beginners: Factors Of A Number In Python

You might have heard about multiples and factors of a number. If you are reading this blog, you are most likely looking to write a program that finds the factors of a number. In this article, we will discuss and implement a program to find the factors of a number in Python.

What Are Factors Of A Number?

A number N is said to be a factor of another number M if N divides M completely. In other words, if we are given two numbers N and M, and dividing M by N leaves no remainder, then N is called a factor of M. You can also see that any factor of a positive number is always less than or equal to the number itself.

For instance, 5 is a factor of 20 as 20 divided by 5 gives 4 as output with no remainder. 

How To Find Factors Of A Number In Python?

To find the factors of a number M, we can divide M by numbers from 1 to M. While dividing M, if a number N leaves no remainder, we will say that N is a factor of M. For this purpose, we can use a for loop in python as follows.

factors = set()
M = 120  # number whose factors we need to find
for N in range(1, M + 1):
    if M % N == 0:  # remainder is zero
        factors.add(N)
print("Factors of {} are {}".format(M, factors))

Output:

Factors of 120 are {1, 2, 3, 4, 5, 6, 8, 40, 10, 12, 120, 15, 20, 24, 60, 30}

In the above example, we have declared a set named factors to store the factors of the number M. If any number leaves remainder 0 when it divides M, we add the number to the set. After the execution of the for loop, we get a set of all the factors of the number M.

As we know, the only factor of a number M greater than M/2 is M itself. So, we can skip dividing M by numbers greater than M/2 to find the factors of M more efficiently as follows.

factors = set()
M = 120  # number whose factors we need to find
factors.add(M)  # a number is a factor of itself
for N in range(1, M // 2 + 1):
    if M % N == 0:  # remainder is zero
        factors.add(N)
print("Factors of {} are {}".format(M, factors))

Output:

Factors of 120 are {1, 2, 3, 4, 5, 6, 8, 40, 10, 12, 15, 20, 120, 24, 60, 30}

We know that factors of a number occur in pairs. For example, the factors of a number M can be grouped into the pairs (1, M), (2, M/2), (3, M/3), (4, M/4), and so on up to (√M, √M), keeping only the pairs whose first member divides M. So, instead of using a for loop to check for factors up to M/2, we only need to check up to √M. Whenever we find a factor, we also store its partner in the set containing all the factors. In this way, we can find the factors of a number in Python more efficiently.
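To make the pairing argument concrete, here is a small sketch (the function name factor_pairs is hypothetical) that lists each pair (N, M // N) found while N only goes up to √M. For M = 120, √M is about 10.95, so checking N from 1 to 10 is enough:

```python
import math


def factor_pairs(m):
    """Return the (N, M // N) pairs found by checking N from 1 up to isqrt(M)."""
    pairs = []
    for n in range(1, math.isqrt(m) + 1):  # math.isqrt requires Python 3.8+
        if m % n == 0:
            pairs.append((n, m // n))
    return pairs


print(factor_pairs(120))
# [(1, 120), (2, 60), (3, 40), (4, 30), (5, 24), (6, 20), (8, 15), (10, 12)]
```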

factors = set()
M = 120  # number whose factors we need to find
factors.add(M)  # a number is a factor of itself
for N in range(1, M):
    if N * N > M:
        break
    if M % N == 0:  # remainder is zero
        factors.add(N)
        factors.add(M // N)
print("Factors of {} are {}".format(M, factors))

Output:

Factors of 120 are {1, 2, 3, 4, 5, 6, 40, 8, 10, 12, 15, 20, 120, 24, 60, 30}
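Because a set has no guaranteed order, the outputs above list the factors in an arbitrary-looking order. If you want them sorted, one option (a sketch, not the article's code; the name get_factors is hypothetical) is to wrap the √M approach in a function and sort the set before returning it. Using a set also means that the middle factor of a perfect square, such as 6 for 36, is stored only once:

```python
import math


def get_factors(m):
    """Return the positive factors of a positive integer m in ascending order."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    factors = set()
    for n in range(1, math.isqrt(m) + 1):
        if m % n == 0:
            factors.add(n)       # the smaller factor of the pair
            factors.add(m // n)  # its partner, which is >= sqrt(m)
    return sorted(factors)


print(get_factors(120))  # [1, 2, 3, 4, 5, 6, 8, 10, 12, 15, 20, 24, 30, 40, 60, 120]
print(get_factors(36))   # [1, 2, 3, 4, 6, 9, 12, 18, 36]
```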

Conclusion

In this article, we have discussed three programs to find factors of a number in python. To learn more about numbers in python, you can read this article on decimal numbers in python. You might also like this article on complex numbers in python.

The post Factors Of A Number In Python appeared first on PythonForBeginners.com.



from Planet Python

Michał Bultrowicz: Make and entr for code validation during editing

For a while now, I’ve been wondering how to combine entr (which automatically runs commands on file changes) with the way I set up project validation (both for CI/CD and for local developer usage) with Makefiles. The best thing I've come up with so far is the validate_continously target in my Makefile.



from Planet Python

Stack Abuse: Parsing XML with BeautifulSoup in Python

Introduction

Extensible Markup Language (XML) is a markup language that's popular because of the way it structures data. It is used for data transmission (representing serialized objects) and in configuration files.

Despite JSON's rising popularity, you can still find XML in Android development's manifest file, Java/Maven build tools and SOAP APIs on the web. Parsing XML is therefore still a common task a developer would have to do.

In Python, we can read and parse XML by leveraging two libraries: BeautifulSoup and LXML.

In this guide, we’ll take a look at extracting and parsing data from XML files with BeautifulSoup and LXML, and store the results using Pandas.

Setting up LXML and BeautifulSoup

We first need to install both libraries. Let's create a new folder in our workspace, set up a virtual environment, and install the libraries:

$ mkdir xml_parsing_tutorial
$ cd xml_parsing_tutorial
$ python3 -m venv env # Create a virtual environment for this project
$ . env/bin/activate # Activate the virtual environment
$ pip install lxml beautifulsoup4 # Install both Python packages

Now that we have everything set up, let's do some parsing!

Parsing XML with lxml and BeautifulSoup

Parsing always depends on the underlying file and the structure it uses so there's no single silver bullet for all files. BeautifulSoup parses them automatically, but the underlying elements are task-dependent.

Thus, it's best to learn parsing with a hands-on approach. Save the following XML into a file in your working directory - teachers.xml:

<?xml version="1.0" encoding="UTF-8"?>
<teachers>
    <teacher>
        <name>Sam Davies</name>
        <age>35</age>
        <subject>Maths</subject>
    </teacher>
    <teacher>
        <name>Cassie Stone</name>
        <age>24</age>
        <subject>Science</subject>
    </teacher>
    <teacher>
        <name>Derek Brandon</name>
        <age>32</age>
        <subject>History</subject>
    </teacher>
</teachers>

The <teachers> tag indicates the root of the XML document, and each <teacher> tag is a child, or sub-element, of <teachers> holding information about a single person. The <name>, <age>, and <subject> tags are children of the <teacher> tag and grandchildren of the <teachers> tag.

The first line, <?xml version="1.0" encoding="UTF-8"?>, in the sample document above is called an XML prolog. Including an XML prolog is completely optional, but if it is present, it must come at the very beginning of the XML file.

The XML prolog shown above indicates the version of XML used and the type of character encoding. In this case, the characters in the XML document are encoded in UTF-8.

Now that we understand the structure of the XML file - we can parse it. Create a new file called teachers.py in your working directory, and import the BeautifulSoup library:

from bs4 import BeautifulSoup

Note: As you may have noticed, we didn’t import lxml! BeautifulSoup uses LXML behind the scenes when we ask for its 'xml' parser, so importing it separately isn't necessary. However, LXML isn't installed as part of BeautifulSoup, which is why we installed it explicitly.

Now let’s read the contents of the XML file we created and store it in a variable called soup so we can begin parsing:

with open('teachers.xml', 'r') as f:
    file = f.read()

# 'xml' is the parser used. For html files, which BeautifulSoup is typically used for, it would be 'html.parser'.
soup = BeautifulSoup(file, 'xml')

The soup variable now has the parsed contents of our XML file. We can use this variable and the methods attached to it to retrieve the XML information with Python code.

Let’s say we want to view only the names of the teachers from the XML document. We can get that information with a few lines of code:

names = soup.find_all('name')
for name in names:
    print(name.text)

Running python teachers.py would give us:

Sam Davies
Cassie Stone 
Derek Brandon

The find_all() method returns a list of all the tags whose name matches the argument passed into it. As shown in the code above, soup.find_all('name') returns all the <name> tags in the XML file. We then iterate over these tags and print their text property, which contains the tags' values.
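If you want to cross-check the result without third-party packages, Python's standard-library xml.etree.ElementTree module can extract the same fields. This is a swapped-in alternative shown only for comparison, not the BeautifulSoup approach this guide uses; the teachers.xml content is inlined as bytes so the sketch runs on its own:

```python
import xml.etree.ElementTree as ET

# The teachers.xml content, inlined as bytes so the encoding declaration is honored
TEACHERS_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<teachers>
    <teacher><name>Sam Davies</name><age>35</age><subject>Maths</subject></teacher>
    <teacher><name>Cassie Stone</name><age>24</age><subject>Science</subject></teacher>
    <teacher><name>Derek Brandon</name><age>32</age><subject>History</subject></teacher>
</teachers>"""

root = ET.fromstring(TEACHERS_XML)  # root is the <teachers> element

# findall('teacher') returns the direct <teacher> children of the root;
# findtext() returns the text of the first matching child tag
teachers = [
    {
        'name': teacher.findtext('name'),
        'age': int(teacher.findtext('age')),
        'subject': teacher.findtext('subject'),
    }
    for teacher in root.findall('teacher')
]

for t in teachers:
    print(t['name'])
```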

Display Parsed Data in a Table

Let's take things one step further: we'll parse all the contents of the XML file and display them in a tabular format.

Let's rewrite the teachers.py file with:

from bs4 import BeautifulSoup

# Opens and reads the xml file we saved earlier
with open('teachers.xml', 'r') as f:
    file = f.read()

# Initializing soup variable
soup = BeautifulSoup(file, 'xml')

# Storing <name> tags and elements in names variable
names = soup.find_all('name')

# Storing <age> tags and elements in 'ages' variable
ages = soup.find_all('age')

# Storing <subject> tags and elements in 'subjects' variable
subjects = soup.find_all('subject')

# Displaying data in tabular format
print('-'.center(35, '-'))
print('|' + 'Name'.center(15) + '|' + ' Age ' + '|' + 'Subject'.center(11) + '|')
for i in range(0, len(names)):
    print('-'.center(35, '-'))
    print(
        f'|{names[i].text.center(15)}|{ages[i].text.center(5)}|{subjects[i].text.center(11)}|')
print('-'.center(35, '-'))

The output of the code above would look like this:

-----------------------------------
|      Name     | Age |  Subject  |
-----------------------------------
|   Sam Davies  |  35 |   Maths   |
-----------------------------------
|  Cassie Stone |  24 |  Science  |
-----------------------------------
| Derek Brandon |  32 |  History  |
-----------------------------------

Congrats! You just parsed your first XML file with BeautifulSoup and LXML! Now that you're more comfortable with the theory and the process, let's try a more real-world example.

We've formatted the data as a table as a precursor to storing it in a versatile data structure. Namely - in the upcoming mini-project, we'll store the data in a Pandas DataFrame.

If you aren't already familiar with DataFrames - read our Python with Pandas: Guide to DataFrames!

Parsing an RSS Feed and Storing the Data to a CSV

In this section, we'll parse an RSS feed of The New York Times News, and store that data in a CSV file.

RSS is short for Really Simple Syndication. An RSS feed is a file that contains a summary of updates from a website and is written in XML. In this case, the RSS feed of The New York Times contains a summary of daily news updates on their website. This summary contains links to news releases, links to article images, descriptions of news items, and more. Website owners also provide RSS feeds as a courtesy, letting people get their data without scraping the website.

Here's a snapshot of an RSS feed from The New York Times:

Image showcasing an XML document containing news updates relating to the U.S on The New York Times News website.

You can gain access to different New York Times RSS feeds of different continents, countries, regions, topics and other criteria via this link.

It's important to see and understand the structure of the data before you can begin parsing it. The data we would like to extract from the RSS feed about each news article is:

  • Globally Unique Identifier (GUID)
  • Title
  • Publication Date
  • Description

Now that we're familiar with the structure and have clear goals, let's kick off our program! We'll need the requests library and the pandas library to retrieve the data and easily convert it to a CSV file.

If you haven't worked with requests before, read our Guide to Python's requests Module!

With requests, we can make HTTP requests to websites and read their responses. In this case, we can use it to retrieve The New York Times' RSS feed (in XML) so BeautifulSoup can parse it. With pandas, we will be able to format the parsed data in a table, and finally store the table's contents in a CSV file.

In the same working directory, install requests and pandas (your virtual environment should still be active):

$ pip install requests pandas

In a new file, nyt_rss_feed.py, let's import our libraries:

import requests
from bs4 import BeautifulSoup
import pandas as pd

Then, let's make an HTTP request to The New York Times' server to get their RSS feed and retrieve its contents:

url = 'https://rss.nytimes.com/services/xml/rss/nyt/US.xml'
xml_data = requests.get(url).content 

With the code above, we send the HTTP request and store the response body in the xml_data variable. The response's content attribute holds that body as raw bytes, which BeautifulSoup can parse directly.

Now, create the following function to parse the XML data into a table in Pandas, with the help of BeautifulSoup:

def parse_xml(xml_data):
    # Initializing soup variable
    soup = BeautifulSoup(xml_data, 'xml')

    # Iterating through item tags and extracting elements
    all_items = soup.find_all('item')
    items_length = len(all_items)

    # Collecting rows in a list; DataFrame.append() was removed in pandas 2.0
    rows = []
    for index, item in enumerate(all_items):
        guid = item.find('guid').text
        title = item.find('title').text
        pub_date = item.find('pubDate').text
        description = item.find('description').text

        # Adding extracted elements as a row of the table
        rows.append({
            'guid': guid,
            'title': title,
            'pubDate': pub_date,
            'description': description
        })
        print(f'Appending row {index + 1} of {items_length}')

    # Creating the table with one column per extracted field
    df = pd.DataFrame(rows, columns=['guid', 'title', 'pubDate', 'description'])
    return df

The function above parses XML data from an HTTP request with BeautifulSoup, storing its contents in a soup variable. The Pandas DataFrame with rows and columns for the data we would like to parse is referenced via the df variable.

We then find all the <item> tags in the document. By iterating through them, we are able to extract each item's children: <guid>, <title>, <pubDate>, and <description>. Note how we use the find() method to get only the first matching child tag rather than a list. The values of each item's child tags are then added as a row of the Pandas table.
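One caveat: BeautifulSoup's find() returns None when no matching child tag exists, so if an <item> ever lacked, say, a <description>, calling .text on the result would raise AttributeError. This guide doesn't establish whether the feed ever omits a tag; the sketch below just shows a defensive pattern you could use inside parse_xml(). The helper name text_or_default and the empty-string default are hypothetical choices:

```python
def text_or_default(tag, default=''):
    """Return tag.text, or `default` when find() found nothing and returned None."""
    return tag.text if tag is not None else default


# Usage inside the loop of parse_xml(), for example:
#     guid = text_or_default(item.find('guid'))
#     description = text_or_default(item.find('description'), 'n/a')
```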

Now, at the end of the file after the function, add these two lines of code to call the function and create a CSV file:

df = parse_xml(xml_data)
df.to_csv('news.csv')

Run python nyt_rss_feed.py to create a new CSV file in your present working directory:

Appending row 1 of 24
Appending row 2 of 24
...
Appending row 24 of 24

The contents of the CSV file would look like this:

Image showing CSV output of parsed RSS feed for The New York Times

Note: Downloading data may take a bit depending on your internet connection and the RSS feed. Parsing data may take a bit depending on your CPU and memory resources as well. The feed we've used is fairly small so it should process quickly. Please be patient if you don't see results immediately.

Congrats, you've successfully parsed an RSS feed from The New York Times News and converted it to a CSV file!

Conclusion

In this guide, we learned how we can set up BeautifulSoup and LXML to parse XML files. We first got practice by parsing a simple XML file with teacher data, and then we parsed The New York Times's RSS feed, converting their data to a CSV file.

You can use these techniques to parse other XML you may encounter, and convert them into different formats that you need!



from Planet Python

Kushal Das: Johnnycanencrypt 0.6.0 released

A few days ago I released 0.6.0 of Johnnycanencrypt. It is a Python module written in Rust for OpenPGP using the amazing sequoia-pgp library. It allows you to access/use Yubikeys (without gpg-agent) directly from your code.

This release took almost a year. Most of the work was done earlier, but I was not in a state to make a release.

Major changes

  • We can now sign and decrypt using both Curve25519 and RSA keys on the smartcard (we support only Yubikeys)
  • Changing of the password of the secret keys
  • Updating expiry date of the subkeys
  • Adding new user ID(s)
  • Revoking user ID(s)

I also released a new version of Tumpa, which uses this module, along with an updated package for Debian 11.



from Planet Python

Wednesday, December 29, 2021

PyCharm: Early Access PyCharm Podcast: PyCharm and the 2021.3 release

In this episode, Nafiul talks to Andrey Vlasovskikh, PyCharm’s team lead, and Aleksei Kniazev, responsible for PyCharm’s web frameworks support, about the 2021.3 PyCharm release. They talked about the new FastAPI support as well as about the new “Endpoints” tool window present in PyCharm Pro.

But that’s not all, the team also discussed the new Jupyter Support, a long-awaited feature, and why JetBrains has also released another IDE, called DataSpell, focused on the data science workflow.

Finally, the gang discussed the future of development and remote development; especially how JetBrains plans to make sure that code completion and refactoring are maintained even with a headless version of PyCharm.

Learn more about the new features in PyCharm 2021.3 and download PyCharm now!

https://youtu.be/iBzsUPoPeII


from Planet Python

Model Deployment Challenges: 6 Lessons From 6 ML Engineers

Deploying machine learning models is hard! If you don’t believe me, ask any ML engineer or data team that has been asked to put their models into production. To further back up this claim, Algorithmia’s “2021 State of Enterprise ML” reports that the time required for organizations to deploy a machine learning model is increasing, […]

The post Model Deployment Challenges: 6 Lessons From 6 ML Engineers appeared first on neptune.ai.



from Planet SciPy

PyCharm: PyCharm 2021.3.1 Is Out!

The first minor release of PyCharm 2021.3 contains multiple bug fixes:

  • We fixed a bug that prevented Django from running inside WSL. [PY-51679]
  • We fixed a bug where PyCharm suggested importing abstract collections from collections. [PY-46344]
  • We fixed a bug that displayed outputs in white text on white background. [IDEA-283193]
  • We fixed a bug that caused tool windows to move into a separate window when resized. [IDEA-274904]
  • We fixed a bug that displayed incorrect results in DateTime columns when using the database tools with ClickHouse. [DBE-7770]

Download PyCharm 2021.3.1

For the full list of issues addressed in PyCharm 2021.3.1, please see the release notes.
Found a bug? Please report it using our bug tracker.



from Planet Python

Python for Beginners: String Slicing in Python

Strings are one of the most used data structures in Python. We process all the text data using strings in Python. In this article, we will look at ways to extract a sub-string from a given string using slicing. We will also look at some examples of string slicing to understand it in a better way.

What is String Slicing?

String slicing is the process of taking out a part of a string. The characters in the extracted part may be continuous or they may be present at regular intervals in the original string. 

You can understand string slicing using the following analogy. 

Suppose that you are slicing a loaf of bread into pieces. All the pieces, whatever their thickness, constitute slices of the bread. Similarly, we can create slices from a string. The only difference in this analogy is that when slicing bread, the original loaf is destroyed once the slices are formed. In contrast, when slicing a string, the original string remains unchanged even after new slices are created.

Let us take an example. Suppose that we have a string “Pythonforbeginners”.

Different slices of the string can be as follows:

  1. “Python”: Here, we have taken the first six characters of the string. You can take any number of continuous characters from the original string starting from any index and the characters will constitute a slice of the string.
  2. “nigeb”: Here, we have taken some characters in reverse order. You can take any number of continuous characters from the original string starting from any index in reverse order and the characters will constitute a slice of the string.
  3. “Ptof”: Here, we have taken some characters that are present at an interval of 1 position. You can take any number of characters at a fixed interval from the original string starting from any index and the characters will constitute a slice of the string.
  4. “sngr”: Here, we have taken some characters that are present at an interval of 2 positions in reverse order. You can take any number of characters at a fixed interval from the original string starting from any index in reverse order and the characters will constitute a slice of the string.

In Python, there are two ways to create slices of a string: using indices, and using the built-in slice() function. Let us discuss both methods one by one.

Slicing using String Indices

We can create a slice from a string using indices of the characters. The syntax for string slicing using indices is string_name [ start_index:end_index:step_size ]. Here, 

  • start_index is the index of the character in the string at which slicing of the string starts.
  • end_index is the index of the character in the string at which the slice of the string is terminated. Here, the end_index is exclusive and the character at end_index will not be included in the sliced string.
  • step_size is used to determine the indices in the original string that will be included in the sliced string.
    • A step_size of 1 means that the slice will be created from continuous characters starting from the start_index and ending at the end_index-1 of the original string.
    • A step_size of 2 means that we will create a slice using alternate characters of the original string, starting from the start_index and stopping before the end_index.
    • A step_size of 3 means that we will select every third character, leaving 2 positions between consecutive selected characters, starting from the start_index and stopping before the end_index.
  • A negative step_size makes slicing proceed in reverse order. In that case, the start_index must be greater than the end_index for the slice to be non-empty, and the slice stops just after the end_index.
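The step_size rules above are easiest to see on a string whose characters follow their positions. A quick sketch, with the string 'abcdefghij' chosen purely for illustration:

```python
s = 'abcdefghij'  # index 0 is 'a', index 9 is 'j'

print(s[0:10:1])        # abcdefghij  every character, indices 0 to 9
print(s[0:10:2])        # acegi       every 2nd character: indices 0, 2, 4, 6, 8
print(s[0:10:3])        # adgj        every 3rd character: indices 0, 3, 6, 9
print(s[9:1:-2])        # jhfd        reverse order: indices 9, 7, 5, 3 (stops before 1)
print(repr(s[2:8:-1]))  # ''          negative step with start < end selects nothing
```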

We can understand the working of the above syntax using the following example.

myString = "Pythonforbeginners"
mySlice = myString[0:6:1]
print("Slice of string '{}' starting at index {}, ending at index {} and step size {} is '{}'".format(myString, 0, 5, 1, mySlice))
mySlice = myString[13:8:-1]
print("Slice of string '{}' starting at index {}, ending at index {} and step size {} is '{}'".format(myString, 13, 9, -1, mySlice))
mySlice = myString[0:8:2]
print("Slice of string '{}' starting at index {}, ending at index {} and step size {} is '{}'".format(myString, 0, 6, 2, mySlice))
mySlice = myString[17:7:-3]
print("Slice of string '{}' starting at index {}, ending at index {} and step size {} is '{}'".format(myString, 17, 8, -3, mySlice))

Output:

Slice of string 'Pythonforbeginners' starting at index 0, ending at index 5 and step size 1 is 'Python'
Slice of string 'Pythonforbeginners' starting at index 13, ending at index 9 and step size -1 is 'nigeb'
Slice of string 'Pythonforbeginners' starting at index 0, ending at index 6 and step size 2 is 'Ptof'
Slice of string 'Pythonforbeginners' starting at index 17, ending at index 8 and step size -3 is 'sngr'

An alternate syntax for string slicing is that we specify only start_index and end_index as in string_name [ start_index:end_index]. Here, the step_size is taken as 1 and the characters are selected consecutively from start_index to end_index-1 as follows.

myString = "Pythonforbeginners"
mySlice = myString[0:6]
print("Slice of string '{}' starting at index {}, ending at index {} is '{}'".format(myString, 0, 5, mySlice))

Output:

Slice of string 'Pythonforbeginners' starting at index 0, ending at index 5 is 'Python'

We can also choose not to specify the start_index and the end_index. In such cases, with a positive step_size, the default value of start_index is 0 and the default value of end_index is the length of the string. You can observe these variations in the following example.

myString = "Pythonforbeginners"
mySlice = myString[:6]
print("Slice of string '{}' starting at index {}, ending at index {} is '{}'".format(myString, 0, 5, mySlice))
mySlice = myString[13:]
print("Slice of string '{}' starting at index {}, ending at last index is '{}'".format(myString, 13, mySlice))

Output:

Slice of string 'Pythonforbeginners' starting at index 0, ending at index 5 is 'Python'
Slice of string 'Pythonforbeginners' starting at index 13, ending at last index is 'nners'
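The defaults described above apply when the step_size is positive. With a negative step_size, Python flips them: an omitted start_index means the last character, and an omitted end_index lets the slice run through the first character. A short sketch using the same string:

```python
myString = "Pythonforbeginners"

# Omitting both indices with step -1 walks the whole string backwards
print(myString[::-1])     # srennigebrofnohtyP
# Omitting start_index with a negative step starts at the last character
print(myString[:13:-1])   # sren
# Omitting end_index with a negative step runs down to index 0
print(myString[5::-1])    # nohtyP
```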

Slicing using built-in function

Instead of using the indices of the characters directly, we can use the built-in slice() function. The slice() function takes the start_index, end_index and step_size as input and creates a slice object. The slice object is then used as the index of the original string, which returns the corresponding slice, as follows.

myString = "Pythonforbeginners"
slice_obj = slice(0, 6, 1)
mySlice = myString[slice_obj]
print("Slice of string '{}' starting at index {}, ending at index {} and step size {} is '{}'".format(myString, 0, 5, 1, mySlice))
slice_obj = slice(13, 8, -1)
mySlice = myString[slice_obj]
print("Slice of string '{}' starting at index {}, ending at index {} and step size {} is '{}'".format(myString, 13, 9, -1, mySlice))
slice_obj = slice(0, 8, 2)
mySlice = myString[slice_obj]
print("Slice of string '{}' starting at index {}, ending at index {} and step size {} is '{}'".format(myString, 0, 6, 2, mySlice))
slice_obj = slice(17, 7, -3)
mySlice = myString[slice_obj]
print("Slice of string '{}' starting at index {}, ending at index {} and step size {} is '{}'".format(myString, 17, 8, -3, mySlice))

Output:

Slice of string 'Pythonforbeginners' starting at index 0, ending at index 5 and step size 1 is 'Python'
Slice of string 'Pythonforbeginners' starting at index 13, ending at index 9 and step size -1 is 'nigeb'
Slice of string 'Pythonforbeginners' starting at index 0, ending at index 6 and step size 2 is 'Ptof'
Slice of string 'Pythonforbeginners' starting at index 17, ending at index 8 and step size -3 is 'sngr'
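A slice object can also report the concrete indices it will use for a sequence of a given length through its indices() method, which is handy for checking how out-of-range values are clamped. For example, slice(18, 7, -3) is clamped to start at 17, the last valid index of the 18-character string:

```python
myString = "Pythonforbeginners"

slice_obj = slice(18, 7, -3)
# indices() returns (start, stop, step) adjusted to the given length
bounds = slice_obj.indices(len(myString))
print(bounds)                 # (17, 7, -3)
# The positions that the slice actually visits
print(list(range(*bounds)))   # [17, 14, 11, 8]
print(myString[slice_obj])    # sngr

# Omitted values are filled in the same way
print(slice(12).indices(len(myString)))   # (0, 12, 1)
```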

As you can see, the slice object works the same way as slicing a string with the indices of the characters. Like the index syntax, slice() can also be called with only a start and end index, or with only an end index, as the following examples show.

myString = "Pythonforbeginners"
# specify only start and end index
slice_obj = slice(5, 16)
mySlice = myString[slice_obj]
print("Slice of string '{}' starting at index {}, ending at index {} is '{}'".format(myString, 5, 15, mySlice))
# specify only end index
slice_obj = slice(12)
mySlice = myString[slice_obj]
print("Slice of string '{}' starting at index {}, ending at index {} is '{}'".format(myString, 0, 11, mySlice))

Output:

Slice of string 'Pythonforbeginners' starting at index 5, ending at index 15 is 'nforbeginne'
Slice of string 'Pythonforbeginners' starting at index 0, ending at index 11 is 'Pythonforbeg'

Conclusion

In this article, we have discussed string slicing in Python. We have also looked at different ways to create slices from a given string. To learn more about strings in Python, you can read this article on string concatenation.

The post String Slicing in Python appeared first on PythonForBeginners.com.



from Planet Python

ItsMyCode: [Solved] Python can’t Multiply Sequence by non-int of type ‘float’

ItsMyCode |

The TypeError: can’t multiply sequence by non-int of type ‘float’ occurs if we use the multiplication operator between a string (or another sequence, such as a list) and a float value.

In this tutorial, we will learn what exactly the TypeError: can’t multiply sequence by non-int of type ‘float’ error means and how to resolve it in your program, with examples.

TypeError: can’t multiply sequence by non-int of type ‘float’

Python is one of the best programming languages because of its features and simplicity. One such feature is that we can multiply strings by integers.

Multiplying string with an integer

Let’s take an example to demonstrate multiplying a string by an integer.

Many other popular programming languages do not let you multiply strings and integers. In Python, however, multiplying a string by an integer n repeats the string n times, as shown below.

text = "ItsMyCode"
n = 3
print(text*n)

Output

ItsMyCodeItsMyCodeItsMyCode

Here the string “ItsMyCode” is multiplied by three, so it is repeated three times in the output.
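The error message talks about a “sequence” because the same repetition works for other sequence types, such as lists and tuples, not only for strings. A quick sketch:

```python
# The * operator repeats any sequence when the other operand is an int
print([0] * 3)           # [0, 0, 0]
print((1, 2) * 2)        # (1, 2, 1, 2)
print(3 * "ab")          # ababab (the integer can come first)
# Zero or a negative count gives an empty sequence
print(repr("ab" * 0))    # ''
print(repr("ab" * -2))   # ''
```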

If we try to multiply the string by a non-integer, say a floating-point value, the Python interpreter will throw TypeError: can’t multiply sequence by non-int of type ‘float’.

Multiplying string with a floating-point value

You cannot multiply a string by a non-integer value in Python. Hence, if we multiply a string by a floating-point value, we get the error can’t multiply sequence by non-int of type ‘float’.

Let’s take an example to demonstrate multiplying a string by a floating-point value.

text = "ItsMyCode"

# floating-point value
n = 3.0
print(text*n)

Output

Traceback (most recent call last):
  File "C:\Personal\IJS\Code\program.py", line 3, in <module>
    print(text*n)
TypeError: can't multiply sequence by non-int of type 'float'

Even though the value used here is equal to the integer 3, the Python interpreter still treats it as a floating-point value and raises a TypeError.

The simplest way to resolve this issue is to convert the floating-point value to an integer with int() before multiplying it with the string.
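A minimal sketch of that conversion follows. Note that int() truncates, so int(3.7) silently becomes 3; the hypothetical repeat_text() helper below therefore only converts floats that hold a whole number and raises a ValueError otherwise. Whether to truncate, round, or reject fractional values is a design choice for your program.

```python
def repeat_text(text, n):
    # Hypothetical helper: repeat text n times, accepting floats like 3.0
    if isinstance(n, float):
        if not n.is_integer():
            raise ValueError(f"cannot repeat a string {n} times")
        n = int(n)  # 3.0 -> 3
    return text * n


print(repeat_text("ItsMyCode", 3.0))   # ItsMyCodeItsMyCodeItsMyCode
```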

Solution for TypeError: can’t multiply sequence by non-int of type ‘float’

Now we know that TypeError: can’t multiply sequence by non-int of type float is caused by multiplying a string with a floating-point number. Let us see how we can resolve this error with an example.

We usually get this error when we receive input from the user, because input() always returns a string. Suppose we have to give users a discount based on their total order value.

In the program below, we accept the order value as a string, and we apply a fixed discount of 5% to the total order value.

When we multiply the order_value in string format with a discount value in float, we get an error “can’t multiply sequence by non-int of type float”.

order_value = input("Enter the order value ")
discount = 0.05

total_cost = order_value - (order_value * discount)
print(round(total_cost, 2))

Output

Enter the order value 200
Traceback (most recent call last):
  File "C:\Personal\IJS\Code\main.py", line 4, in <module>
    total_cost = order_value - (order_value * discount)
TypeError: can't multiply sequence by non-int of type 'float'

The best way to resolve this error is to convert the user input string to a floating-point number using the float() function.

This allows us to multiply the order_value and discount because both are floating-point numbers.

order_value = float(input("Enter the order value "))
discount = 0.05

total_cost = order_value - (order_value * discount)
print(round(total_cost, 2))

Output

Enter the order value 200
190.0
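float() itself raises a ValueError when the text is not a number, for example if the user types “two hundred”. One way to handle that is sketched below with a hypothetical discounted_total() helper, which takes the text as an argument instead of calling input() directly so that it is easy to test. In a real program you would pass it the result of input().

```python
def discounted_total(order_value_text, discount=0.05):
    # Convert the text to a float; float() ignores surrounding whitespace
    try:
        order_value = float(order_value_text)
    except ValueError:
        return None  # not a number, so the caller can ask again
    total_cost = order_value - (order_value * discount)
    return round(total_cost, 2)


print(discounted_total("200"))           # 190.0
print(discounted_total("two hundred"))   # None
```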

Conclusion

We cannot multiply strings by floating-point numbers. Multiplying a string by an integer creates a sequence in which the string is repeated a whole number of times.

The same is not possible with a floating-point number, since a string cannot be repeated a fractional number of times.

To solve TypeError: can’t multiply sequence by non-int of type ‘float’, ensure that you either multiply a string by an integer or convert the string values into floating-point numbers before performing any calculations.

The post [Solved] Python can’t Multiply Sequence by non-int of type ‘float’ appeared first on ItsMyCode.



from Planet Python

ItsMyCode: TypeError: ‘list’ object is not callable

ItsMyCode |

The most common scenarios in which Python throws TypeError: ‘list’ object is not callable are when you have used “list” as a variable name or when you try to index the elements of a list using parentheses instead of square brackets.

In this tutorial, we will learn what the ‘list’ object is not callable error means and how to resolve this TypeError in your program, with examples.

Python TypeError: ‘list’ object is not callable

There are two main scenarios where you get a ‘list’ object is not callable error in Python. Let us take a look at both scenarios with examples.

Scenario 1 – Using the built-in name list as a variable name

The most common mistake developers make is using Python built-in names, such as built-in functions and types, as variable names.

What is a built-in name?

In Python, a built-in name is a name to which the Python interpreter has already assigned a predefined value. The value can be either a function or a class object.

The Python interpreter has 70+ functions and types built into it that are always available.
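You can inspect these names yourself through the standard builtins module. The sketch below checks whether a name is built in before you reuse it; the candidate names are just examples. Note that dir(builtins) also includes exception classes and constants, so it is longer than the list of functions and types alone.

```python
import builtins

builtin_names = set(dir(builtins))

# Example names you might be thinking of using as variables
for candidate in ["list", "fruit_list", "str", "car_list"]:
    if candidate in builtin_names:
        print(f"{candidate!r} would shadow a built-in name")
    else:
        print(f"{candidate!r} is safe to use")
```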

In Python, list is a built-in type (a class object that you call like a function), and it is not recommended to use built-in names as variable names. Keywords such as for or class cannot be used as variable names at all.

Python will not stop you from using a built-in name as a variable name, but if you do so, that name will refer to your variable instead of the built-in function or type.

Let us take a look at a simple example to demonstrate the same.

fruit = "Apple"
list = list(fruit)
print(list)

car="Ford"
car_list=list(car)
print(car_list)

Output

['A', 'p', 'p', 'l', 'e']
Traceback (most recent call last):
  File "c:\Personal\IJS\Code\main.py", line 6, in <module>
    car_list=list(car)
TypeError: 'list' object is not callable

In the above example, we declare a fruit variable, convert it into a list, and store the result in a new variable called “list”.

Since we have used “list” as a variable name here, the name no longer refers to the built-in list() type; it now refers to an ordinary list object.

We then declare a new variable called “car”, and when we try to convert it into a list by calling list(car), we get the TypeError: ‘list’ object is not callable error message.

The reason for the TypeError is straightforward: list is no longer the built-in type, because we reassigned the built-in name list in the script. This means you can no longer use the predefined list value, which is the class object representing Python lists.
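If the name has already been shadowed at module level or in an interactive session, deleting the variable with del makes Python look the name up among the built-ins again. A short sketch follows; renaming the variable, as in the solution below, remains the cleaner fix.

```python
fruit = "Apple"
list = list(fruit)   # shadows the built-in list type
print(list)          # ['A', 'p', 'p', 'l', 'e']

# Remove the module-level variable; lookups fall back to the built-in list
del list

car = "Ford"
car_list = list(car)
print(car_list)      # ['F', 'o', 'r', 'd']
```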

Solution for using the built-in name list as a variable name

If you are getting the object is not callable error for this reason, you are using a built-in name as a variable in your code, and the fix is to rename that variable.

fruit = "Apple"
fruit_list = list(fruit)
print(fruit_list)

car="Ford"
car_list=list(car)
print(car_list)

Output

['A', 'p', 'p', 'l', 'e']
['F', 'o', 'r', 'd']

In the code above, the fix is simple: we renamed the variable “list” to “fruit_list”, which fixes the ‘list’ object is not callable error.

Scenario 2 – Indexing list using parentheses ()

Another common cause of this error is attempting to index a list using parentheses () instead of square brackets []. The elements of a list are accessed by putting the index number inside square brackets.

Let us take a look at a simple example to reproduce this scenario.

my_list = [1, 2, 3, 4, 5, 6]
first_element= my_list(0)
print(" The first element in the list is", first_element)

Output

Traceback (most recent call last):
  File "c:\Personal\IJS\Code\tempCodeRunnerFile.py", line 2, in <module>
    first_element= my_list(0)
TypeError: 'list' object is not callable

In the above program, we have a list of numbers called “my_list”, and we try to access its first element using parentheses, first_element= my_list(0), which is wrong. Python treats my_list(0) as a function call, so the interpreter raises TypeError: ‘list’ object is not callable.
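You can confirm this with the built-in callable() function, which reports whether an object can be called with parentheses. The sketch below triggers the error on purpose and captures its message:

```python
my_list = [1, 2, 3, 4, 5, 6]

# A list is not callable, so my_list(0) cannot work
print(callable(my_list))   # False
print(callable(len))       # True

try:
    my_list(0)
except TypeError as error:
    message = str(error)
print(message)             # 'list' object is not callable

# Square brackets index the list instead
print(my_list[0])          # 1
```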

Solution for Indexing list using parentheses ()

The correct way to index an element of a list is with square brackets. We can solve the ‘list’ object is not callable error by replacing the parentheses () with square brackets [], as shown below.

my_list = [1, 2, 3, 4, 5, 6]
first_element= my_list[0]
print(" The first element in the list is", first_element)

Output

 The first element in the list is 1

Conclusion

The TypeError: ‘list’ object is not callable error is raised in two scenarios:

  1. If you try to access elements of the list using parenthesis instead of square brackets
  2. If you try to use built-in names such as list as a variable name 

This is a common mistake when indexing the elements of a list or when using built-in names as variable names. PEP 8, the official Python style guide, includes many recommendations on naming variables properly, which can help beginners.

The post TypeError: ‘list’ object is not callable appeared first on ItsMyCode.



from Planet Python

Tuesday, December 28, 2021

Andre Roberge: New milestone for friendly: version 0.5

Friendly (previously at 0.4.41) and friendly-traceback (previously at 0.4.111) are now at version 0.5. The joint documentation for both projects has not yet been updated. In addition to the many new cases that friendly/friendly-traceback can help with, which include close to 400 test cases, I am very excited to report three important new features:

  1. Getting help when a traceback is generated before friendly is imported
  2. Not having to set non-default configurations each time friendly is used
  3. The addition of two new languages.

1. Getting help after the fact


Let's start with the first.  Previously, if one wanted help from friendly/friendly-traceback, it had either to be used to run a program, via something like "python -m friendly user_program.py", or it had to be imported and installed (either implicitly or explicitly) before any other code was executed. This still works as before and is the best way to use friendly.

Now, it can be imported *after* a traceback has been generated, and can provide its usual help when using:
  • IPython in a terminal
  • Jupyter notebooks, and Jupyter lab
  • Mu
  • Programs run with cPython using "python -i user_program.py"
  • Code entered in a cPython terminal, with the caveat that it only works in a limited fashion for some SyntaxErrors but almost never for run time errors.
    • The same applies when using PyPy, except that using languages other than English may yield some undesirable results.
  • Code saved in files and run from IDLE  (Python 3.10 and possibly later versions of Python 3.9) -- but excluding SyntaxErrors
  • Code entered in IDLE's shell - but excluding SyntaxErrors.
Before explaining the origin of the (different) limitations when using cPython's interactive interpreter or IDLE, let me show the results using IPython, both for SyntaxErrors and run time errors, starting with a very unlikely example of a SyntaxError.


Of course, we can ask for more details.

Instead of a SyntaxError, let's see an example of a run time error.

Again, it just works. :-)

Moving on to SyntaxErrors with the cPython interpreter. Let's use the same example as above, with Python 3.10.1:


This works. However, let's have a more detailed look at the information available:


Python does not store the content of the code entered in the interpreter; for SyntaxErrors, it does include the very last line of code, where the error was located. This is not enough when a statement spans multiple lines; in some of those cases, if the error message is precise enough, friendly might still be able to guess the cause of the error.



By contrast, friendly does store the entire code entered in its interpreter.


Let's have a look at a run time error with cPython.

Notice how the traceback contains no information about the code in the file(s) named "<stdin>".
Let's see what information we can get from friendly.


If you use friendly, you would never see the log message (1) as it is something that is enabled by default on my computer. Note that, in spite of not having access to the exact code that produced the exception, in this case friendly is still able to provide some help. This information is similar to what is available with Python 3.10+; however, you can use friendly with Python 3.6 and still get the same information!
Of course, it is better still if you use friendly from the start:





Let's now have a look at IDLE. Recently, IDLE has added support for custom exception hooks. Instead of requiring the use of its own console when using IDLE, friendly can make use of this new capability of IDLE to provide help with run time errors, but not with SyntaxErrors, as those are handled in a peculiar way by IDLE.



For this type of error, trying to use friendly after the fact yields very little useful information.

For SyntaxErrors, the situation is even worse: IDLE does not make any information available.

1.a) A possible improvement to fix the problem with cPython

If you look at the tracebacks from IDLE for runtime errors, you will see "files" with names like "<pyshell#4>": each code block entered by the user is saved in such a "file", each file having a different name. IDLE works around a limitation of Python's linecache module to store the content of these files so that they can be retrieved and analyzed. By contrast, cPython shows the code entered by a user as belonging to files that all have the same name, "<stdin>", whose content can never be retrieved.

For code executed using exec("code"), the content is shown to belong to a file named "<string>" whose content is also not available. If cPython were to store the code in files whose names included a different integer each time, like IDLE does, then the code could be retrieved by programs like friendly, which could then provide additional help. This was suggested on Python-ideas for code run using exec, but got no traction, even though related questions are often asked on Stack Overflow.
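The idea of giving each code block its own numbered pseudo-filename can be sketched with the standard linecache and traceback modules. This is not IDLE's or friendly's actual code: run_numbered_snippet() is hypothetical, and writing directly into linecache.cache relies on an undocumented internal layout, (size, mtime, lines, fullname), that could change between Python versions. Storing None as the mtime keeps linecache.checkcache() from discarding the entry, so the traceback can show the offending source line.

```python
import linecache
import traceback


def run_numbered_snippet(source, number):
    # Hypothetical helper: each snippet gets a unique pseudo-filename,
    # in the spirit of IDLE's "<pyshell#4>" names
    filename = f"<snippet#{number}>"
    # Register the source in linecache's internal cache (implementation detail)
    linecache.cache[filename] = (len(source), None, source.splitlines(True), filename)
    try:
        exec(compile(source, filename, "exec"), {})
    except Exception:
        # The formatted traceback can now quote lines from the snippet
        return traceback.format_exc()
    return None


report = run_numbered_snippet("x = 1\ny = x / 0\n", 1)
print(report)
```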

Moving on ...

2. Friendly saves settings

Previously, each time that friendly was used, it started with default values for the preferred language (French and English only in previous versions), color scheme (light or dark, depending on the background of the terminal or other application), formatter type, etc.

Now, friendly saves the specified values, which are then used by default when a new session starts. The language choice is a global setting that carries over to all environments. For other settings, friendly (at least on Windows) can determine if it is running in a PowerShell terminal or an old-fashioned cmd, in a Visual Studio Code terminal, in a PyCharm terminal, with IPython, or in a Jupyter notebook, etc. Here's an example of adjusting the background color so that the information provided by friendly blends in better.

Friendly only includes two different color schemes: one that is designed to work with a white (or similar) background and another with a black (or similar) background. Anyone working with terminals (or notebooks) with background colors that do not work well with either of the two existing color schemes is welcome to provide different color schemes to be added to friendly.

So far, I have only tested this with Windows. Mac and Linux users are encouraged to try it out and see if their different environments are detected correctly, so that friendly can work well in all the environments they use it in.


3. New languages

In addition to English and French, friendly is available in Spanish (approximately 99% of the translation is done, as I keep adding new information) and, partially, in Italian (about 10% has been translated).

Conclusion

There is much more I could write about new but smaller additions to friendly since version 0.4.  However, this blog post is already too long and this will have to wait until later - perhaps after I update the existing documentation.



from Planet Python

Hynek Schlawack: import attrs

An attempt at catharsis.



from Planet Python

2021: A Year in Review

Partnerships and Collaborations

from Planet SciPy

Python for Beginners: Check For Harshad Number In Python

Numbers have many special properties, and numbers with certain properties are given special names. One such special number is the Harshad number, or Niven number. In this article, we will discuss a program to check if a given number is a Harshad number or not.

What is a Harshad Number?

A number is said to be a Harshad number, or Niven number, if it is divisible by the sum of its digits.

For example, let us consider the number 4320. The sum of its digits is 4 + 3 + 2 + 0 = 9, and 4320 is divisible by 9 (4320 / 9 = 480). Hence, 4320 is a Harshad number. On the other hand, the sum of the digits of 4321 is 10, and 4321 is not divisible by 10. Hence, it is not a Harshad number.

Program To Check For A Harshad Number in Python

To check whether a number is a Harshad number or not, we will first calculate the sum of digits of the given number. After that, we will divide the number by the sum of digits to see if it is completely divisible or not. If the number is divisible by the sum of its digits, it is a Harshad number. Otherwise, it is not.

To write the complete program, we will first write a function to calculate the sum of digits of the given number. For this, we will keep dividing the number by 10 until the number becomes 0. Each time we divide the number by 10, we get the rightmost digit as the remainder. We can use this remainder to find the sum of digits by adding all the remainders till the number becomes 0.

The following function for calculating the sum of digits of a number in python accepts a number and returns the sum of digits of the number.

def calculate_sum_of_digits(N):
    sumOfDigits = 0
    while N > 0:
        digit = N % 10
        sumOfDigits = sumOfDigits + digit
        N = N // 10
    return sumOfDigits


input_number = 4320
output = calculate_sum_of_digits(input_number)
print("Sum of digits of {} is {}.".format(input_number, output))

Output:

Sum of digits of 4320 is 9.
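An alternative sketch, assuming N is a non-negative integer, converts the number to a string and adds up its digit characters. It is shorter, while the arithmetic version above avoids building a string; sum_of_digits_via_str() is a hypothetical name.

```python
def sum_of_digits_via_str(N):
    # str(4320) is "4320"; int() turns each character back into a digit
    return sum(int(digit) for digit in str(N))


print(sum_of_digits_via_str(4320))   # 9
```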

After finding the sum of digits, we will divide the number by the sum of digits. If the remainder of the division is zero, the number is a Harshad number, or Niven number. Otherwise, it is not a Harshad number.

def calculate_sum_of_digits(N):
    sumOfDigits = 0
    while N > 0:
        digit = N % 10
        sumOfDigits = sumOfDigits + digit
        N = N // 10
    return sumOfDigits


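def check_for_harshad_number(N):
    # Harshad numbers are positive integers; this also avoids dividing by a digit sum of 0
    if N <= 0:
        return False
    sumOfDigits = calculate_sum_of_digits(N)
    # The number is a Harshad number if the remainder is zero
    return N % sumOfDigits == 0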


input_number = 4320
output = check_for_harshad_number(input_number)
print("{} is a Harshad Number:{}".format(input_number, output))
input_number = 4321
output = check_for_harshad_number(input_number)
print("{} is a Harshad Number:{}".format(input_number, output))

Output:

4320 is a Harshad Number:True
4321 is a Harshad Number:False
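The same check can be used to list the Harshad numbers in a range. The sketch below is self-contained, so it repeats the digit-sum function (using divmod() to split off the rightmost digit); is_harshad() is a hypothetical name, and it returns False for numbers below 1, for which the digit sum would be 0 and the division would fail.

```python
def calculate_sum_of_digits(N):
    sumOfDigits = 0
    while N > 0:
        N, digit = divmod(N, 10)   # quotient and rightmost digit
        sumOfDigits += digit
    return sumOfDigits


def is_harshad(N):
    # Only positive integers qualify; this also avoids dividing by zero
    if N < 1:
        return False
    return N % calculate_sum_of_digits(N) == 0


harshad_numbers = [n for n in range(1, 51) if is_harshad(n)]
print(harshad_numbers)
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 18, 20, 21, 24, 27, 30, 36, 40, 42, 45, 48, 50]
```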

Conclusion

In this article, we have discussed what a Harshad number or Niven number is. We have also implemented a program in Python to check if a given number is a Harshad number or not. To learn more about numbers in Python, you can read this article on decimal numbers in Python. You might also like this article on complex numbers in Python.

The post Check For Harshad Number In Python appeared first on PythonForBeginners.com.



from Planet Python

Stack Abuse: Validate Email Addresses in Python with email-validator

Introduction

Whether you are creating a registration form for your website or you just need to delete all invalid email addresses from your mailing list, you can't avoid validating email addresses.

You need to validate whether an email address is real by checking whether it meets the required form and can receive email messages. That must be done efficiently and safely.

That is where email-validator comes in. It is an easy-to-use, yet robust, Python library used to validate email addresses.

In this guide, we'll go over the basics of this library, discover when and why you could use it, as well as when not to. We'll go over these with practical examples that will help you understand how to use email-validator.

What is email-validator?

As we've previously stated, email-validator is a robust Python library that validates email addresses. It performs two types of validation - syntax validation and deliverability validation. That is important because the email address must meet the required form and have a resolvable domain name at the same time to be considered valid.

Syntax validation ensures that a string representation of an email address has the form of an email address (a local part, an @ sign, and a domain name), such as example@stackabuse.com.
Deliverability validation ensures that the syntactically correct email address has a domain name (the string after the @ sign, here stackabuse.com) that can be resolved.

In simple terms, it checks that the validated email address could actually receive email messages.

On top of that, email-validator offers a small bonus: if the email address is valid, it can return its normalized form, so that we can store it in a database consistently. On the other hand, if an email address is invalid, email-validator will give us a clear and human-readable error message to help us understand why the passed email address is not valid.

In its simplest form, normalizing an email address means lowercasing its domain (the sequence after the @ sign), because domain names are case-insensitive.
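For plain ASCII addresses, that simplest case can be sketched in a few lines. This only illustrates the idea and is not how email-validator implements normalization; lowercase_domain() is a hypothetical helper that leaves the part before the @ sign untouched, because the local part of an address may be case-sensitive.

```python
def lowercase_domain(address):
    # Split at the last "@": everything after it is the domain
    local_part, at_sign, domain = address.rpartition("@")
    if not at_sign:
        raise ValueError("address has no @ sign")
    return local_part + "@" + domain.lower()


print(lowercase_domain("example@STACKabuse.com"))   # example@stackabuse.com
```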

In more complex cases of normalization, where the domain part includes some Unicode characters, normalization covers a variety of conversions between Unicode and ASCII characters. The problem lies in the fact that different Unicode strings can look and mean the same to the end-user, so the normalization should ensure that those strings will be recorded in the same way because they actually represent the same domain.
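The sketch below shows the kind of problem described above using only Python's standard library; email-validator's own normalization may work differently. The same visible "é" can be stored as one code point or as "e" plus a combining accent, and Unicode normalization (NFC here) maps both to one form. Internationalized domain names also have an ASCII (Punycode) form; Python's built-in "idna" codec implements the older IDNA 2003 rules.

```python
import unicodedata

composed = "caf\u00e9"      # "é" as a single code point
decomposed = "cafe\u0301"   # "e" followed by a combining acute accent

# Both render as "café", but the strings are different
print(composed == decomposed)   # False
# After NFC normalization they compare equal
print(unicodedata.normalize("NFC", composed) == unicodedata.normalize("NFC", decomposed))   # True

# The ASCII form of an internationalized domain name
print("b\u00fccher.example".encode("idna"))   # b'xn--bcher-kva.example'
```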

It is important to mention that this library is not designed to work with an email address that doesn't meet the form of example@domainname.com.

For example, it won't properly validate the To: line in an email message (for example, To: Example Name <example@domainname.com>).

email-validator vs RegEx for Email Validation

We usually use some kind of Regular Expression (RegEx) to validate the correct form of email addresses, and it is a great choice if you only need to make sure that an email address meets the required form. It is a well-known technique, easy to write and maintain, and doesn't consume too much computing power to execute.

If you'd like to read more about validating email addresses with RegEx - read our Python: Validate Email Address with Regular Expressions!
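For comparison, a deliberately simple regular expression is sketched below. The pattern is an assumption chosen for illustration (one @ sign, no whitespace, and a dot in the domain), not a complete email grammar, and it cannot tell whether the domain actually exists.

```python
import re

# Illustrative pattern only: something@something.something without spaces
SIMPLE_EMAIL = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_email(address):
    # fullmatch() requires the whole string to match the pattern
    return SIMPLE_EMAIL.fullmatch(address) is not None


print(looks_like_email("example@stackabuse.com"))    # True
print(looks_like_email("examplestackabuse.com"))     # False
# Passes the syntax check even though the domain may not exist
print(looks_like_email("example@ssstackabuse.com"))  # True
```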

On the other hand, email address validation can sometimes be a lot more complex. A string may meet the specified form of an email address but still not be a proper email address, because its domain doesn't resolve.

For instance, example@ssstackabuse.com meets the specified form of an email address, but isn't valid because the domain name (ssstackabuse.com) doesn't exist. Therefore, it doesn't resolve, and the example email address can't receive email messages.

On the other hand, example@stackabuse.com meets both requirements for a valid email address: it has the desired form, and its domain name resolves. Therefore, it can be considered a valid email address.

In that case, email-validator provides a better solution: it performs both syntax and deliverability validation with one simple function call, so there is no need to check the domain separately yourself. It would be impossible to code both of those verifications using just regular expressions, since checking whether a domain resolves requires a DNS lookup.

Note: It's impossible to guarantee that an email will be received without actually sending one and observing the result. You can, however, check whether the address could receive email at all.

These points make a strong case in favor of email-validator over regular expressions: it is easy to use and performs checks that a regular expression alone cannot.

How to Install email-validator?

The email-validator library is available on PyPI, so the installation is pretty straightforward via pip or pip3:

$ pip install email-validator
$ pip3 install email-validator

And now you have the email-validator ready to use in a Python script.

How to Validate an Email Address with email-validator?

The core of the email-validator library is its validate_email() function. It takes a string representation of an email address as the argument and performs validation on that address. If the passed email address is valid, validate_email() returns an object containing a normalized form of the passed email address; in the case of an invalid email address, it raises an EmailNotValidError with a clear and human-readable error message that helps us understand why the passed email address is not valid.

EmailNotValidError is actually a base class, used to detect that an error occurred during the validation process; it is not used to represent and describe specific errors.

For that purpose, the EmailNotValidError class has two subclasses describing the actual errors that occurred. The first one is EmailSyntaxError, which is raised when syntax validation fails, meaning that the passed email doesn't meet the required form of an email address. The second one is EmailUndeliverableError, which is raised when deliverability validation fails, meaning that the domain name of the passed email address doesn't exist.

Now we can finally take a look at how to use the validate_email() function. Of course, the first step is to import it into our script, and then we are ready to use it:

from email_validator import validate_email

testEmail = "example@stackabuse.com"

emailObject = validate_email(testEmail)
print(emailObject.email)

Since the passed testEmail is a valid email address, the previous code will output the normalized form of the email address stored in the testEmail variable:

example@stackabuse.com

Note: In the previous example, the output is the same as the original address from testEmail because it was already normalized. If you pass an unnormalized email address to validate_email(), the returned email address will be normalized, as expected.

If we change the original testEmail to "example@STACKabuse.com", the previous code will still have the same output, because the domain is lowercased during normalization:

example@stackabuse.com

On the other hand, if we pass an invalid email address to validate_email(), the previous code raises an exception with the corresponding error message. The following example of testEmail will pass the syntax validation but fail the deliverability validation, because the domain ssstackabuse.com doesn't exist:

testEmail = "example@ssstackabuse.com"

In this case, the previous code raises an exception with a long traceback, which includes:

>> ...
>> raise EmailUndeliverableError("The domain name %s does not exist." % domain_i18n)
email_validator.EmailUndeliverableError: The domain name ssstackabuse.com does not exist.

Based on this message, we can conclude that the passed email is invalid because its domain name does not exist. Corresponding messages are also raised for syntactically invalid emails, so we can easily tell when the passed email address doesn't have the required form of an email address.

We can also extract a more user-friendly, human-readable error message automatically. To get just the error message from the previous traceback, we need to rewrite the previous code as follows:

from email_validator import validate_email, EmailNotValidError

testEmail = "example@ssstackabuse.com"

try:
    # Validating the `testEmail`
    emailObject = validate_email(testEmail)

    # If the `testEmail` is valid
    # it is updated with its normalized form
    testEmail = emailObject.email
    print(testEmail)
except EmailNotValidError as errorMsg:
    # If `testEmail` is not valid
    # we print a human readable error message
    print(str(errorMsg))

This code outputs just the error message extracted from the previous traceback:

The domain name ssstackabuse.com does not exist.

Note: We've taken advantage of the EmailNotValidError class. The validation runs inside the try block, so if it fails, the error is caught in the except block. There is no need to catch EmailSyntaxError or EmailUndeliverableError individually, because both of them are subclasses of EmailNotValidError, and the type of error can easily be determined from the printed message.

validate_email() - Optional Arguments

The validate_email() method requires only one argument, the string representation of the email address that needs to be validated, but it also accepts a few optional keyword arguments:

  • allow_smtputf8 - the default value is True. If set to False, validate_email() rejects internationalized addresses that would require the SMTPUTF8 extension, i.e. addresses with non-ASCII characters in the local part (before the @ sign).
  • check_deliverability - the default value is True. If set to False, no deliverability validation (DNS lookup of the domain) is performed.
  • allow_empty_local - the default value is False. If set to True, an empty local part is allowed (i.e. @stackabuse.com will be considered a valid email address).

The ValidatedEmail Object

You've probably noticed that we've been accessing the normalized form of an email address through emailObject.email. That is because the validate_email() method returns a ValidatedEmail object (stored in the emailObject variable in the previous examples) when a valid email address is passed as the argument.

The ValidatedEmail object contains multiple attributes that describe different parts of the normalized email address. The email attribute holds the normalized form of the validated email address, and we access it using dot notation: emailObject.email.

Generally, we can access any attribute of the ValidatedEmail object by using variableName.attributeName (where variableName is the variable used to store the ValidatedEmail object).

For example, let's say that we've validated the address example@sTaCkABUSE.cOm with the validate_email() method. The resulting ValidatedEmail object will contain some interesting and useful attributes, listed below as attribute name (example value) - description:

  • email (example@stackabuse.com) - the normalized form of the email address.
  • ascii_email (example@stackabuse.com) - an ASCII-only form of the email attribute. If the local_part contains any internationalized characters, this attribute is set to None.
  • local_part (example) - the string before the @ sign in the normalized form of the email address.
  • ascii_local_part (example) - if there are no internationalized characters, the ASCII-only form of the local_part attribute. Otherwise, it is set to None.
  • domain (stackabuse.com) - the string after the @ sign in the normalized form of the email address. If it contains non-ASCII characters, sending mail to it requires either a mail relay that supports SMTPUTF8 or the ascii_domain form instead.
  • ascii_domain (stackabuse.com) - the ASCII-only form of the domain attribute.
  • smtputf8 (False) - a boolean that is True when delivering to this address requires the SMTPUTF8 extension, because the local part contains non-ASCII characters. If allow_smtputf8=False is passed to the validate_email() method, it is always False, because such addresses are rejected instead.

Note: The ASCII form of an internationalized domain (used in ascii_domain and ascii_email) is generated using Punycode, an encoding that transforms a Unicode string into an ASCII string for use with Internationalized Domain Names in Applications (IDNA).
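To get a feel for what this encoding produces, Python's built-in "idna" and "punycode" codecs can be used as a stand-in. This sketch relies on the standard library only; email-validator itself uses the third-party idna package, which implements a newer version of the IDNA rules, so edge cases may differ.

```python
# Standard-library stand-in for the IDNA conversion of a domain name
domain = "bücher.de"

ascii_form = domain.encode("idna")
print(ascii_form)                   # b'xn--bcher-kva.de'
print(ascii_form.decode("idna"))    # bücher.de

# The raw Punycode step for a single label, without the "xn--" prefix
print("bücher".encode("punycode"))  # b'bcher-kva'
```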

Conclusion

All in all, email-validator is a great tool for validating email addresses in Python.

In this guide, we've covered all the important aspects of using this library, so that you have a comprehensive view of it. You should be able to understand when and how to use the email-validator, as well as when to pick some alternative tool.



from Planet Python

TestDriven.io: Working with Static and Media Files in Django

This article looks at how to work with static and media files in a Django project, locally and in production.