In this tutorial, we’ll examine the different ways you can extract a date from a .txt file using Python programming. Python is a versatile language—as you’ll discover—and there are many solutions for this problem.
First, we’ll look at using regular expression patterns to search text files for dates that fit a predefined format. We’ll learn about using the re module and creating our own regular expression searches.
We’ll also examine datetime objects and use them to convert strings into date and time values that Python can work with. Lastly, we’ll see how the datefinder module simplifies searching a text file for dates that don’t follow a fixed format, like those we might find in natural language content.
Extract a Date from a .txt File using Regular Expression
Dates are written in many different formats. Sometimes people write month/day/year. Other dates might include the time of day or the day of the week (Thursday July 8, 2021 8:00PM).
How dates are formatted is a factor to consider before we go about extracting them from text files.
For instance, if a date follows the month/day/year format, we can find it using a regular expression pattern. With regular expressions, or regex for short, we can search text for any string that matches a predefined pattern.
The beauty of regular expressions is that we can use special characters to create powerful search patterns. For instance, we can craft a pattern that will find all the formatted dates in the following body of text.
minutes.txt
10/14/2021 – Meeting with the client.
07/01/2021 – Discussed marketing strategies.
12/23/2021 – Interviewed a new team lead.
01/28/2018 – Changed domain providers.
06/11/2017 – Discussed moving to a new office.
Example: Finding formatted dates with regex
import re
# open the text file and read the data; "with" closes the file for us
with open("minutes.txt", 'r') as file:
    text = file.read()
# match a regex pattern for formatted dates
matches = re.findall(r'(\d+/\d+/\d+)', text)
print(matches)
Output
['10/14/2021', '07/01/2021', '12/23/2021', '01/28/2018', '06/11/2017']
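The regex pattern here uses special characters to define the strings we want to extract from the text file. The special sequence \d matches any single digit, and + means “one or more of the preceding item,” so \d+ matches a run of digits. The full pattern \d+/\d+/\d+ therefore finds three runs of digits separated by forward slashes.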
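Because \d+ accepts any number of digits, the pattern above will also match strings that only look like dates, such as a version number. Below is a minimal sketch of a tighter pattern. It uses exact digit counts with {n} and \b word boundaries. The sample sentence is made up for illustration.

```python
import re

# hypothetical sample text: one real date and one version-like string
sample = "Upgraded to version 1/2/3 on 10/14/2021."

# \d+ means "one or more digits", so any digits/digits/digits run matches
loose = re.findall(r'\d+/\d+/\d+', sample)

# \d{2} and \d{4} require exact digit counts; \b anchors at word boundaries
strict = re.findall(r'\b\d{2}/\d{2}/\d{4}\b', sample)

print(loose)   # ['1/2/3', '10/14/2021']
print(strict)  # ['10/14/2021']
```

The trade-off is that the strict version will now skip dates written without leading zeros, such as 7/1/2021, so pick the pattern that fits your files.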
We can also use regex to find dates that are formatted in different ways. By altering our regex pattern, we can find dates that use either a forward slash (/) or a dash (-) as the separator.
This works because regex supports character classes. A set of characters inside square brackets matches any one of them, so [/-] accepts either a forward slash or a dash at that position. The pattern below also swaps + for exact counts: \d{2} matches exactly two digits and \d{4} exactly four.
apple2.txt
The first Apple II was sold on 07-10-1977. The last of the Apple II
models were discontinued on 10/15/1994.
Example: Matching dates with a regex pattern
import re
# open the text file and extract its content
with open("apple2.txt", 'r') as f:
    content = f.read()
# a regular expression pattern to match dates (a raw string keeps \d intact)
pattern = r"\d{2}[/-]\d{2}[/-]\d{4}"
# find all the strings that match the pattern
dates = re.findall(pattern, content)
for date in dates:
    print(date)
Output
07-10-1977
10/15/1994
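One subtlety of the character class is that each [/-] is checked on its own. As a result, the pattern also accepts a date with mixed separators, such as 12/25-2021. If that matters, one option is to capture the first separator in a group and require the same character again with the backreference \1. Here is a minimal sketch of that approach. The sample text is hypothetical.

```python
import re

# hypothetical text containing one date with mismatched separators
content = "Sold on 07-10-1977, discontinued on 10/15/1994, typo 12/25-2021."

# each [/-] is independent, so mixed separators slip through
loose = re.findall(r"\d{2}[/-]\d{2}[/-]\d{4}", content)

# group 1 captures the first separator; \1 demands the same character again
consistent = re.compile(r"\d{2}([/-])\d{2}\1\d{4}")
# finditer is used because findall would return only the captured group
strict = [m.group(0) for m in consistent.finditer(content)]

print(loose)   # ['07-10-1977', '10/15/1994', '12/25-2021']
print(strict)  # ['07-10-1977', '10/15/1994']
```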
Examining the full extent of regex’s potential is beyond the scope of this tutorial. Try experimenting with some of the following special characters to learn more about using regular expression patterns to extract a date—or other information—from a .txt file.
Special Characters in Regex
- \s – Any whitespace character (space, tab, or newline)
- \S – Any character except whitespace
- \d – Any decimal digit (0 to 9 in ASCII text)
- \D – Any character except a digit
- \w – Any single word character: a letter, digit, or underscore ([a-zA-Z0-9_] in ASCII text)
- \W – Any non-word character
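To see what each of these matches, here is a small sketch that runs every class from the list against the same made-up string. A + is appended to each class so that every match is a run of one or more characters.

```python
import re

# hypothetical sample line: a date, a time, and some words
sample = "Due 07/04 at 9am_ET"

# each pattern is a special character from the list, plus "+"
classes = [r"\s+", r"\S+", r"\d+", r"\D+", r"\w+", r"\W+"]

results = {}
for pattern in classes:
    results[pattern] = re.findall(pattern, sample)
    print(f"{pattern:5} {results[pattern]}")
```

Note that \w+ returns 9am_ET as a single match, because the underscore counts as a word character. The slash, however, splits 07/04 into two word matches.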
Extract a Datetime Object from a .txt File
In Python we can use the datetime module for manipulating dates and working with time. The datetime module is part of Python’s standard library, so there’s no need to install it.
By using datetime objects, we have more control over string data read from text files. For example, we can use a datetime object to get the current date and time from our computer’s clock.
import datetime
now = datetime.datetime.now()
print(now)
Output
2021-07-04 20:15:49.185380
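Here is one sketch of that extra control. The date strings matched from minutes.txt earlier can be turned into datetime objects with strptime(). Once converted, they sort chronologically, can be reformatted with strftime(), and can be subtracted. Plain strings cannot do any of this reliably: sorting them as text would put 01/28/2018 before 06/11/2017. The list below simply reuses the output of the first example.

```python
from datetime import datetime

# the strings returned by re.findall() in the minutes.txt example
matches = ['10/14/2021', '07/01/2021', '12/23/2021', '01/28/2018', '06/11/2017']

# convert each month/day/year string into a datetime object
dates = [datetime.strptime(m, "%m/%d/%Y") for m in matches]

# datetime objects sort by actual date, not by the characters in the string
for d in sorted(dates):
    print(d.strftime("%B %d, %Y"))

# subtracting two datetimes gives a timedelta
span = max(dates) - min(dates)
print(span.days, "days between the first and last entries")
```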
In the following example, we’ll extract a date from a company .txt file that mentions a scheduled meeting. Our employer needs us to scan a group of such documents for dates. Later, we plan to add the information we gather to a SQLite database.
We’ll begin by defining a regex pattern that will match our date format. Once a match is found, we’ll use it to create a datetime object from the string data.
schedule.txt
The project begins next month. Denise has scheduled a meeting in the conference room at the Embassy Suites on 10-7-2021.
Example: Creating datetime objects from file data
import re
from datetime import datetime
# open the data file and read its contents
with open("schedule.txt", 'r') as file:
    text = file.read()
# search for the first day-month-year date in the text
match = re.search(r'\d+-\d+-\d{4}', text)
if match:
    # create a new datetime object from the regex match (day-month-year order)
    date = datetime.strptime(match.group(), '%d-%m-%Y').date()
    print(f"The date of the meeting is on {date}.")
else:
    print("No date found.")
Output
The date of the meeting is on 2021-07-10.
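The format string '%d-%m-%Y' tells strptime() to read the text as day-month-year. That is why 10-7-2021 became July 10 rather than October 7. The regex only finds the string, so the format code has to encode which convention your files use. A quick sketch compares both readings of the same string. It also shows the ValueError that strptime() raises when a string cannot fit the format.

```python
from datetime import datetime

raw = "10-7-2021"

# %d = day of the month, %m = month number, %Y = four-digit year
day_first = datetime.strptime(raw, "%d-%m-%Y").date()
month_first = datetime.strptime(raw, "%m-%d-%Y").date()

print(day_first)    # 2021-07-10
print(month_first)  # 2021-10-07

# 31 is not a valid month, so this month-first parse fails
parsed = None
try:
    parsed = datetime.strptime("31-12-2021", "%m-%d-%Y")
except ValueError as err:
    print("Could not parse:", err)
```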
Extracting Dates from a Text File with the Datefinder Module
The Python datefinder module can locate dates in a body of text. Using the find_dates() function, it’s possible to search text data for many different types of dates. find_dates() returns a generator that yields each date it finds as a datetime object, which is why the examples below wrap it in list().
Unlike the re and datetime modules we’ve used so far, datefinder is a third-party package that doesn’t come with Python. The easiest way to install it is to use pip from the command prompt.
pip install datefinder
With datefinder installed, we’re ready to open files and extract data. For this example, we’ll use a text document that introduces a fictitious company project. Using datefinder, we’ll extract each date from the .txt file and print its datetime object counterpart.
Feel free to save the file locally and follow along.
project_timeline.txt
PROJECT PEPPER
All team members must read the project summary by
January 4th, 2021.
The first meeting of PROJECT PEPPER begins on 01/15/2021
at 9:00am. Please find the time to read the following links by then.
created on 08-12-2021 at 05:00 PM
This project file has dates in many formats. Dates are written using dashes and forward slashes. What’s worse, the month January is written out. How can we find all these dates with Python?
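Before reaching for datefinder, it helps to see what the job looks like with only re and datetime: one regex and one strptime() format for every date style in the file. The sketch below makes several assumptions. Numeric dates are treated as month-first, month names are assumed to be in English, and times are ignored. It is an illustration of the manual approach, not a replacement for datefinder.

```python
import re
from datetime import datetime

# the file's text, inlined so the sketch runs on its own
content = """PROJECT PEPPER
All team members must read the project summary by
January 4th, 2021.
The first meeting of PROJECT PEPPER begins on 01/15/2021
at 9:00am. Please find the time to read the following links by then.
created on 08-12-2021 at 05:00 PM"""

# one (regex, strptime format) pair per date style we expect;
# numeric dates are assumed to be month-first, as in this file
formats = [
    (r"[A-Z][a-z]+ \d{1,2}(?:st|nd|rd|th)?, \d{4}", "%B %d, %Y"),
    (r"\d{2}/\d{2}/\d{4}", "%m/%d/%Y"),
    (r"\d{2}-\d{2}-\d{4}", "%m-%d-%Y"),
]

found = []
for pattern, fmt in formats:
    for match in re.finditer(pattern, content):
        # drop ordinal suffixes like "th" so strptime can read the day
        text = re.sub(r"(\d)(st|nd|rd|th)", r"\1", match.group())
        found.append(datetime.strptime(text, fmt).date())

for date in sorted(found):
    print(date)
```

Every new date style means another pattern and format pair. Ambiguous cases like 08-12-2021 still need a human decision about day or month order, and that bookkeeping is what datefinder handles for us.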
Example: Using datefinder to extract dates from file data
import datefinder
# open the project timeline and read its contents
with open("project_timeline.txt", 'r') as file:
    content = file.read()
# datefinder will find the dates for us; list() collects the generator's results
matches = list(datefinder.find_dates(content))
if matches:
    for date in matches:
        print(date)
else:
    print("Found no dates.")
Output
2021-01-04 00:00:00
2021-01-15 09:00:00
2021-08-12 17:00:00
As you can see from the output, datefinder is able to find a variety of date formats in the text. Not only is the package capable of recognizing the names of months, but it also recognizes the time of day if it’s included in the text.
In another example, we’ll use the datefinder package to extract a date from a .txt file that includes the dates for a popular singer’s upcoming tour.
tour_dates.txt
Sunday July 25, 2021 at 07:00 PM Inglewood, CA
Monday July 26, 2021 at 7 PM Inglewood, CA
09/30/2021 7:30PM Foxborough, MA
Example: Extract tour dates and times from a .txt file with datefinder
import datefinder
# open the tour dates file and read its contents
with open("tour_dates.txt", 'r') as file:
    content = file.read()
# datefinder will find the dates for us
matches = list(datefinder.find_dates(content))
if matches:
    print("TOUR DATES AND TIMES")
    print("--------------------")
    for date in matches:
        # use an f-string to print the date and time parts separately
        print(f"{date.date()} {date.time()}")
else:
    print("Found no dates.")
Output
TOUR DATES AND TIMES
--------------------
2021-07-25 19:00:00
2021-07-26 19:00:00
2021-09-30 19:30:00
As you can see from the examples, datefinder can find many different types of dates and times. This is useful if the dates you’re looking for don’t have a certain format, as will often be the case in natural language data.
Summary
In this post, we’ve covered several ways to extract a date or time from a .txt file. We’ve seen the power of regular expressions to find matches in string data, and we’ve seen how to convert that data into a Python datetime object.
Finally, if the dates in your text files don’t have a specified format—as will be the case in most files with natural language content—try the datefinder module. With this Python package, it’s possible to extract dates and times from a text file that aren’t conveniently formatted ahead of time.
Related Posts
If you enjoyed this tutorial and are eager to learn more about Python—and we sincerely hope you are—follow these links for more great guides from Python for Beginners.
- How to use Python concatenation to join strings
- Using Python try catch to mitigate errors and prevent crashes
The post How to Extract a Date from a .txt File in Python appeared first on PythonForBeginners.com.