Tuesday, December 28, 2021

Stack Abuse: Validate Email Addresses in Python with email-validator

Introduction

Whether you are building a registration form for your website or cleaning invalid email addresses out of a mailing list, you will need to validate email addresses.

You need to check whether an email address is real by verifying that it meets the required form and can receive email messages. That must be done efficiently and safely.

That is where email-validator comes in. It is an easy-to-use yet robust Python library for validating email addresses.

In this guide, we'll go over the basics of this library, discover when and why you could use it, as well as when not to. We'll go over these with practical examples that will help you understand how to use email-validator.

What is email-validator?

As we've previously stated, email-validator is a robust Python library that validates email addresses. It performs two types of validation - syntax validation and deliverability validation. That is important because the email address must meet the required form and have a resolvable domain name at the same time to be considered valid.

Syntax validation ensures that a string representation of an email address is of the form local-part@domain, such as example@stackabuse.com.
Deliverability validation ensures that the syntactically correct email address has the domain name (the string after the @ sign - stackabuse.com) that can be resolved.

In simple terms, it ensures that the validated email address can send and receive email messages.

On top of that, email-validator has a small bonus for us: if the email address is valid, it can return the normalized form of the address, so that we can store it in a database consistently. If an email address is invalid, email-validator gives us a clear, human-readable error message that explains why the passed email address is not valid.

In its simplest form, the normalization of an email address implies lowercasing the domain of an email address (the sequence after the @ sign), because it is case-insensitive.

In more complex cases of normalization, where the domain part includes some Unicode characters, normalization covers a variety of conversions between Unicode and ASCII characters. The problem lies in the fact that different Unicode strings can look and mean the same to the end-user, so the normalization should ensure that those strings will be recorded in the same way because they actually represent the same domain.
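
Both ideas can be sketched with the standard library alone. The helper name normalize_address below is hypothetical, and the code only approximates the kind of work a validator does; it is not email-validator's actual implementation. It lowercases the domain and applies Unicode NFC normalization so that two visually identical domains compare as equal:

```python
import unicodedata


def normalize_address(address: str) -> str:
    """Rough illustration only: lowercase the domain and apply NFC to it."""
    # Split at the last @ so the local part is left untouched.
    local_part, _, domain = address.rpartition("@")
    # NFC composes sequences such as "u" + combining diaeresis into "ü".
    domain = unicodedata.normalize("NFC", domain).lower()
    return f"{local_part}@{domain}"


composed = "example@b\u00fccher.example"     # "ü" stored as one code point
decomposed = "example@bu\u0308cher.example"  # "u" followed by U+0308

print(composed == decomposed)                                        # False
print(normalize_address(composed) == normalize_address(decomposed))  # True
print(normalize_address("example@STACKabuse.com"))  # example@stackabuse.com
```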

It is important to mention that this library is not designed to work with an email address that doesn't meet the form of example@domainname.com.

For example, it won't properly validate the To: line in an email message (for example, To: Example Name <example@domainname.com>).
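
One possible workaround, assuming you only have such a header value at hand, is to extract the bare address first with the standard library's email.utils.parseaddr() and then validate the extracted string. This is a sketch of that pre-processing step, not something email-validator does for you:

```python
from email.utils import parseaddr

# The value of a To: header, without the "To: " prefix itself.
header_value = "Example Name <example@domainname.com>"

# parseaddr() splits the value into a display name and a bare address.
display_name, address = parseaddr(header_value)

print(display_name)  # Example Name
print(address)       # example@domainname.com
```

The resulting address string is in the plain form that the library expects.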

email-validator vs RegEx for Email Validation

We usually use some kind of Regular Expression (RegEx) to validate the correct form of email addresses and it is a great choice if you only need to make sure that some email address meets the required form. It is a well-known technique, easy to write and maintain, and doesn't consume too much computing power to execute.

If you'd like to read more about validating email addresses with RegEx - read our Python: Validate Email Address with Regular Expressions!

On the other hand, email address validation sometimes can be a lot more complex. A string containing an email address may meet the specified form of an email address, but still cannot be considered a proper email address, because the domain doesn't resolve.

For instance, example@ssstackabuse.com meets the specified form of an email address, but isn't valid because the domain name (ssstackabuse.com) doesn't exist, therefore doesn't resolve and the example email address can't send and receive email messages.

On the other hand, example@stackabuse.com meets both requirements for a valid email address. It meets the desired form and the domain name resolves. Therefore, it can be considered a valid email address.

In that case, the email-validator provides a superior solution - it performs both syntax and deliverability validation with one simple function call, so there is no need to bother with making sure that the email address can actually send and receive emails. It would be impossible to code both of those verifications using just Regular Expressions.
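
To make the limitation concrete, here is a minimal sketch with a deliberately simplified pattern. The pattern is an assumption made for illustration; it is not taken from any library and is not a complete email grammar. It accepts both addresses discussed above, because a pattern can only inspect the shape of the string:

```python
import re

# Hypothetical, simplified pattern: something@something.tld
SIMPLE_EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

for candidate in ("example@stackabuse.com", "example@ssstackabuse.com"):
    looks_valid = SIMPLE_EMAIL_PATTERN.fullmatch(candidate) is not None
    # Both print True: the pattern cannot tell whether the domain resolves.
    print(candidate, looks_valid)
```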

Note: It's factually impossible to guarantee whether an email will be received, or not, without sending an email and observing the result. You can, however, check if it could receive an email as a categorical possibility.

Those two things make a strong case in favor of email-validator over Regular Expressions. It is easier to use and covers checks that a Regular Expression alone cannot perform.

How to Install email-validator?

The email-validator library is available on PyPI, so the installation is pretty straightforward via pip or pip3:

$ pip install email-validator
$ pip3 install email-validator

And now you have the email-validator ready to use in a Python script.

Validate Email Address with email-validator

The core of the email-validator library is its validate_email() method. It takes a string representation of an email address as the argument and performs validation on that address. If the passed email address is valid, validate_email() returns an object containing a normalized form of the address. If the address is invalid, it raises an EmailNotValidError with a clear, human-readable error message that helps us understand why the passed email address is not valid.

EmailNotValidError is the base class for validation errors. Catching it tells us that an error occurred during validation, while its subclasses represent and describe the actual errors.

EmailNotValidError has two subclasses for that purpose. The first one is EmailSyntaxError, which is raised when syntax validation fails, meaning that the passed email doesn't meet the required form of an email address. The second one is EmailUndeliverableError, which is raised when deliverability validation fails, meaning that the domain name of the passed email address doesn't exist.

Now we can finally take a look at how to use the validate_email() method. Of course, the first step is to import it to our script, and then we are ready to use it:

from email_validator import validate_email

testEmail = "example@stackabuse.com"

emailObject = validate_email(testEmail)
print(emailObject.email)

Since the passed testEmail is a valid email address, the previous code will output the normalized form of the email address stored in the testEmail variable:

example@stackabuse.com

Note: In the previous example, the output is the same as the original address from testEmail because it was already normalized. If you pass an unnormalized form of an email to the validate_email() method, the returned email address will be normalized, as expected.

If we change the original testEmail to "example@STACKabuse.com", the previous code will still have the same output, because it's normalized:

example@stackabuse.com

On the other hand, if we pass the invalid email address to the validate_email() method, the previous code will prompt us with the corresponding error message. The following example of testEmail will pass the syntax validation, but fail the deliverability validation because the domain ssstackabuse.com doesn't exist:

testEmail = "example@ssstackabuse.com"

In this case, the previous code will produce a long traceback that ends with:

>> ...
>> raise EmailUndeliverableError("The domain name %s does not exist." % domain_i18n)
email_validator.EmailUndeliverableError: The domain name ssstackabuse.com does not exist.

Based on this message, we can conclude that the passed email is invalid because its domain name does not exist. A corresponding message is also shown for syntactically invalid emails, so we can easily tell that the passed email address doesn't meet the required form of an email address.

You could extract a more user-friendly and human-readable error message from this as well, automatically. To extract just the error message from the previous prompt, we need to rewrite the previous code as follows:

from email_validator import validate_email, EmailNotValidError

testEmail = "example@ssstackabuse.com"

try:
    # Validating the `testEmail`
    emailObject = validate_email(testEmail)

    # If the `testEmail` is valid
    # it is updated with its normalized form
    testEmail = emailObject.email
    print(testEmail)
except EmailNotValidError as errorMsg:
    # If `testEmail` is not valid
    # we print a human readable error message
    print(str(errorMsg))

This code will output just a simple error message extracted from the previous prompt:

The domain name ssstackabuse.com does not exist.

Note: We've taken advantage of the EmailNotValidError class. We've tried to execute the email validation in the try block and ensured that the error will be caught in the except block in case of failing the validation. There is no need to catch EmailSyntaxError or EmailUndeliverableError individually, because both of them are subclasses of the caught EmailNotValidError class, and the type of error can be easily determined by the printed error message.

validate_email() - Optional Arguments

The validate_email() method requires only one argument - the string representation of the email address that needs to be validated - but it also accepts a few optional keyword arguments:

  • allow_smtputf8 - the default value is True. If set to False, validate_email() rejects internationalized addresses that would require the SMTPUTF8 extension, that is, addresses whose local part contains non-ASCII characters.
  • check_deliverability - the default value is True. If set to False, no deliverability validation is performed.
  • allow_empty_local - the default value is False. If set to True, an empty local part of an email address is allowed (i.e. @stackabuse.com will be considered a valid email address).

The ValidatedEmail Object

You've probably noticed that we've been accessing the normalized form of an email address by emailObject.email. That is because the validate_email() method returns the ValidatedEmail object (in previous examples, it was stored in the emailObject variable) when a valid email address is passed as the argument.

The ValidatedEmail object contains multiple attributes which describe different parts of the normalized email address. The email attribute contains the normalized form of the validated email address, therefore, we need to access it using the . notation - emailObject.email.

Generally, we can access any attribute of the ValidatedEmail object by using variableName.attributeName (where variableName is the variable used to store the ValidatedEmail object).

For example, let's say that we've validated the example@sTaCkABUSE.cOm with the validate_email() method. The resulting ValidatedEmail object will contain some interesting and useful attributes as described in the following table:

Attribute Name | Example Value | Description
email | example@stackabuse.com | Normalized form of an email address.
ascii_email | example@stackabuse.com | ASCII only form of the email attribute. If the local_part contains any kind of internationalized characters, this attribute will be set to None.
local_part | example | The string before the @ sign in the normalized form of the email address.
ascii_local_part | example | If there are no internationalized characters, this attribute is set to the ASCII only form of the local_part attribute. Otherwise, it is set to None.
domain | stackabuse.com | The string after the @ sign in the normalized form of the email address. If it contains non-ASCII characters, either SMTPUTF8 support is needed to use it as is, or the ascii_domain attribute should be used instead.
ascii_domain | stackabuse.com | ASCII only form of the domain attribute.
smtputf8 | False | A boolean value that indicates whether the address requires SMTPUTF8 support, which is the case when the local part contains non-ASCII characters. If the allow_smtputf8=False argument is passed to the validate_email() method, this attribute is always False.

Note: ASCII variants of mentioned attributes are generated using the Punycode encoding syntax. It is an encoding syntax used to transform a Unicode string into an ASCII string for use with Internationalized Domain Names in Applications (IDNA).
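
To get a feel for what Punycode output looks like, Python's built-in idna codec can be used. This is only an illustration with a made-up domain; the built-in codec is not necessarily the same IDNA implementation that email-validator relies on internally:

```python
unicode_domain = "bücher.example"

# Encode the Unicode domain into its ASCII-compatible (Punycode) form.
ascii_domain = unicode_domain.encode("idna").decode("ascii")
print(ascii_domain)  # xn--bcher-kva.example

# Decoding restores the Unicode form.
print(ascii_domain.encode("ascii").decode("idna"))  # bücher.example
```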

Conclusion

All in all, the email-validator is a great tool for validating email addresses in Python.

In this guide, we've covered all the important aspects of using this library, so that you have a comprehensive view of it. You should be able to understand when and how to use the email-validator, as well as when to pick some alternative tool.


