Introduction
Resources on the Web are located under some kind of web address (even if they're not accessible), oftentimes referred to as a URL (Uniform Resource Locator). These resources are, most of the time, manipulated by an end-user (retrieved, updated, deleted, etc.) using the HTTP protocol through the respective HTTP Methods.
In this guide, we'll be taking a look at how to leverage the urllib3 library, which allows us to send HTTP Requests through Python, programmatically.
Note: This guide uses Python 3.x; current urllib3 releases no longer support Python 2.
What is HTTP?
HTTP (HyperText Transfer Protocol) is a data transfer protocol typically used for transmitting hypermedia documents, such as HTML, though it can also carry JSON, XML or similar formats. It operates in the Application Layer of the OSI Model, alongside other protocols such as FTP (File Transfer Protocol) and SMTP (Simple Mail Transfer Protocol).
HTTP is the backbone of the World Wide Web as we know it today, and its main task is to enable a communication channel between web browsers and web servers through a lifecycle of HTTP Requests and HTTP Responses, the fundamental communication components of HTTP.
It's based on the client-server model where a client requests a resource, and the server responds with the resource - or a lack thereof.
A typical HTTP Request may look something like:
GET /tag/java/ HTTP/1.1
Host: stackabuse.com
Accept: */*
User-Agent: Mozilla/5.0 (platform; rv:geckoversion) Gecko/geckotrail Firefox/firefoxversion
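To make the layout of that message concrete, here is a minimal sketch that assembles the same kind of request as raw bytes: a request line, one "Name: value" line per header, each ending in CRLF, and an empty line that closes the header section. build_request is a hypothetical helper; urllib3 does this for you, so you'd rarely write it by hand:

```python
def build_request(method: str, path: str, headers: dict) -> bytes:
    """Assemble a raw HTTP/1.1 request message (without a body)."""
    # Request line: method, target path and protocol version
    lines = [f"{method} {path} HTTP/1.1"]
    # One "Name: value" line per header
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # HTTP uses CRLF line endings; an empty line ends the header section
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


raw = build_request(
    "GET",
    "/tag/java/",
    {"Host": "stackabuse.com", "Accept": "*/*"},
)
print(raw.decode("ascii"))
```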
If the server finds the resource, the HTTP Response's header will contain data on how the request/response cycle fared:
HTTP/1.1 200 OK
Date: Thu, 22 Jul 2021 18:16:38 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
...
And the response body will contain the actual resource - which in this case is an HTML page:
<!DOCTYPE html>
<html lang="en">
<head>
<meta name="twitter:title" content="Stack Abuse"/>
<meta name="twitter:description" content="Learn Python, Java, JavaScript/Node, Machine Learning, and Web Development through articles, code examples, and tutorials for developers of all skill levels."/>
<meta name="twitter:url" content="https://stackabuse.com"/>
<meta name="twitter:site" content="@StackAbuse"/>
<meta name="next-head-count" content="16"/>
</head>
...
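The head of a response follows the same line-based layout as the request: a status line (protocol version, status code, reason phrase), then the headers. As a minimal sketch, here's a hypothetical parse_response_head() helper that splits the head shown above into its parts. It's deliberately simplified (real header names are case-insensitive and may repeat), and with urllib3 you'd read the already-parsed values from response.status and response.headers instead:

```python
def parse_response_head(raw: str):
    """Split a raw HTTP response head into (version, status, reason, headers)."""
    status_line, *header_lines = raw.split("\r\n")
    # e.g. "HTTP/1.1 200 OK"; the reason phrase itself may contain spaces
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        if not line:
            break  # an empty line marks the end of the headers
        # Split on the first colon only: values such as dates contain colons too
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, int(status), reason, headers


head = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Thu, 22 Jul 2021 18:16:38 GMT\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "Connection: keep-alive\r\n"
    "\r\n"
)
version, status, reason, headers = parse_response_head(head)
print(status, reason, headers["Content-Type"])
```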
The urllib3 Module
The urllib3 module is a third-party HTTP client library for Python; despite its name, it isn't part of the standard library's urllib package. It supports file uploads with multipart encoding, gzip, connection pooling and thread safety. It doesn't ship with Python itself, but it's frequently already installed as a dependency of other packages. If that's not the case for you, it can easily be installed with:
$ pip install urllib3
You can check your version of urllib3 by accessing the __version__ attribute of the module:
import urllib3

# This tutorial is done with urllib3 version 1.25.8
print(urllib3.__version__)
Alternatively, you can use the Requests module, which is built on top of urllib3. It offers a more intuitive, human-centered API for the same kinds of HTTP requests. If you'd like to read more about it, read our Guide to the Requests Module in Python.
HTTP Status Codes
Whenever an HTTP request is sent - the response, other than the requested resource (if available and accessible), also contains an HTTP Status Code, signifying how the operation went. It is paramount that you know what the status code you got means, or at least what it broadly implies.
Is there a problem? If so, is it due to the request, the server or me?
There are five different groups of response codes:
- Informational codes (between 100 and 199)
- Successful codes (between 200 and 299) - 200 is the most common one
- Redirect codes (between 300 and 399)
- Client error codes (between 400 and 499) - 404 is the most common one
- Server error codes (between 500 and 599) - 500 is the most common one
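Since the class of a status code is determined by its first digit, a client can bucket any code with integer division. The sketch below pairs that with the standard library's http.HTTPStatus enum, which knows the reason phrase for registered codes; describe_status is a hypothetical helper name:

```python
from http import HTTPStatus

# Status-code classes, keyed by the first digit of the code
CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirect",
    4: "Client error",
    5: "Server error",
}


def describe_status(code: int) -> str:
    """Return a label such as '404 Not Found (Client error)'."""
    group = CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # HTTPStatus only knows registered codes
        phrase = "Unregistered status"
    return f"{code} {phrase} ({group})"


for code in (200, 404, 500):
    print(describe_status(code))
```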
To send requests using urllib3, we use an instance of the PoolManager class, which takes care of the actual requests for us (covered shortly).
All responses to these requests are packed into an HTTPResponse instance, which, naturally, contains the status of that response:
import urllib3
http = urllib3.PoolManager()
response = http.request("GET", "http://www.stackabuse.com")
print(response.status) # Prints 200
You can use these statuses to alter the logic of the code: if the result is 200 OK, not much probably needs to be done further. However, if the result is a 405 Method Not Allowed response, the server doesn't support the HTTP method you used for that resource, so your request was probably badly constructed.
And if a website responds with a 418 I'm a teapot status code, albeit rare, it's letting you know that you can't brew coffee with a teapot. In practice, this typically means that the server doesn't want to respond to the request, and never will. If it were a temporary halt for certain requests, a 503 Service Unavailable status code would be much more fitting.
Note: The 418 I'm a teapot status code is a real but playful status code, added as an April Fools' joke.
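Because a 503 signals a temporary condition while a 418 doesn't, a client can react differently to each. As a sketch, urllib3's Retry class can be told to retry only on specific statuses via status_forcelist; the values below (3 attempts, a 0.5 backoff factor) are arbitrary choices for illustration. No request is sent here: Retry.is_retry() just reports whether a method/status pair would be retried:

```python
import urllib3
from urllib3.util.retry import Retry

# Retry up to 3 times, but only when the server answers 503 Service Unavailable;
# backoff_factor spaces the attempts out with a growing delay
retries = Retry(total=3, status_forcelist=[503], backoff_factor=0.5)

# Every request sent through this manager uses the retry policy above
http = urllib3.PoolManager(retries=retries)

print(retries.is_retry("GET", 503))  # temporary condition: worth another try
print(retries.is_retry("GET", 418))  # retrying a teapot won't help
```

Once the retries are exhausted, urllib3 raises a MaxRetryError by default; passing raise_on_status=False to Retry returns the last response instead.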
The Pool Manager
A Connection Pool is a cache of connections that can be reused in future requests, which improves performance when the same kind of operation is executed numerous times. Similarly, when sending multiple requests to the same host, a Connection Pool lets existing connections be reused instead of being re-established each time.
urllib3 keeps track of requests and their connections through the ConnectionPool and HTTPConnection classes. Since making these by hand leads to a lot of boilerplate code, we can delegate the entirety of the logic to the PoolManager, which automatically creates connections and adds them to the pool. The num_pools argument sets how many connection pools (one per scheme, host and port) the manager keeps cached:
import urllib3
http = urllib3.PoolManager(num_pools=3)
response1 = http.request("GET", "http://www.stackabuse.com")
response2 = http.request("GET", "http://www.google.com")
Through the PoolManager, we can send a request(), passing in the HTTP Verb and the address we're sending a request to. Different verbs signify different intents: whether you want to GET some content, POST it to a server, PATCH an existing resource or DELETE one.
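To see what num_pools controls without sending any requests, the sketch below uses PoolManager.connection_from_url(), which returns the pool (creating it if needed) that a request to that URL would go through. Pools are keyed by scheme, host and port, and num_pools=1 is chosen only to make the eviction of the least recently used pool visible; the default of 10 suits most scripts:

```python
import urllib3

# Cache at most one per-host pool, so that eviction becomes visible
http = urllib3.PoolManager(num_pools=1)

# Connections are opened lazily, so no network traffic happens here
pool_a = http.connection_from_url("http://www.stackabuse.com/tag/java/")
pool_b = http.connection_from_url("http://www.stackabuse.com/about/")
print(pool_a is pool_b)  # same scheme, host and port -> same pool

# A second host pushes the stackabuse.com pool out of the cache
http.connection_from_url("http://www.google.com/")
print(len(http.pools))
```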
How to Send HTTP Requests in Python with urllib3
Finally, let's take a look at how to send different request types via urllib3, and how to interpret the data that's returned.
Send HTTP GET Request
An HTTP GET request is used when a client requests to retrieve data from a server, without modifying it in any way, shape or form.
To send an HTTP GET request in Python, we use the request() method of the PoolManager instance, passing in the appropriate HTTP Verb and the resource we're sending a request for:
import urllib3
http = urllib3.PoolManager()
response = http.request("GET", "http://jsonplaceholder.typicode.com/posts/")
print(response.data.decode("utf-8"))
Here, we sent a GET request to {JSON} Placeholder, a website that serves dummy JSON data in the response's body. It's typically used for testing HTTP Requests against a stubbed API.
The HTTPResponse instance, namely our response object, holds the body of the response. It can be accessed through the data property, which holds raw bytes. Since we'll want to work with a str rather than bytes anyway, we decode() the body from UTF-8 into a string so we can coherently parse the data.
If you'd like to read more, read our guide to Converting Bytes to Strings in Python.
Finally, we print the response's body:
[
{
"userId": 1,
"id": 1,
"title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
"body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
},
{
"userId": 1,
"id": 2,
"title": "qui est esse",
"body": "est rerum tempore vitae\nsequi sint nihil reprehenderit dolor beatae ea dolores neque\nfugiat blanditiis voluptate porro vel nihil molestiae ut reiciendis\nqui aperiam non debitis possimus qui neque nisi nulla"
},
...
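Decoding as UTF-8 works here, but a server may declare a different encoding in the charset parameter of its Content-Type header. A slightly more robust sketch, using only the standard library, reads that parameter and falls back to UTF-8; decode_body is a hypothetical helper name, and the Latin-1 sample stands in for a real response:

```python
import json
from email.message import Message


def decode_body(data: bytes, content_type: str) -> str:
    """Decode a response body using the charset declared in Content-Type."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Fall back to UTF-8 when the header declares no charset
    charset = msg.get_content_charset("utf-8")
    return data.decode(charset)


raw = '[{"id": 1, "title": "café"}]'.encode("latin-1")
text = decode_body(raw, "application/json; charset=ISO-8859-1")
posts = json.loads(text)  # parse the JSON text into Python objects
print(posts[0]["title"])
```

With urllib3, you'd call it as decode_body(response.data, response.headers.get("Content-Type", "")).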
Send HTTP GET Request with Parameters
We rarely send requests without any parameters. Path variables and request parameters are very common and allow for dynamic linking structures and organizing resources. For instance, we may want to search for a specific comment on a certain post through an API: http://random.com/posts/get?id=1&commentId=1.
Naturally, urllib3 allows us to add parameters to GET requests, via the fields argument. It accepts a dictionary of the parameter names and their values:
import urllib3
http = urllib3.PoolManager()
response = http.request("GET",
"http://jsonplaceholder.typicode.com/posts/",
fields={"id": "1"})
print(response.data.decode("utf-8"))
This will return only one object, with an id of 1:
[
{
"userId": 1,
"id": 1,
"title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
"body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
}
]
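For GET requests, urllib3 encodes fields into the URL's query string rather than the request body, using the standard library's urllib.parse.urlencode(). Calling it directly is a handy way to preview the final URL; the random.com address is the placeholder from above:

```python
from urllib.parse import urlencode

base_url = "http://random.com/posts/get"
params = {"id": 1, "commentId": 1}

# urlencode() joins "key=value" pairs with "&" and percent-escapes
# anything that isn't safe inside a URL
url = f"{base_url}?{urlencode(params)}"
print(url)

# Spaces become "+" and "&" inside a value is escaped, so it can't split the query
print(urlencode({"q": "urllib3 & requests"}))
```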
HTTP POST Request
An HTTP POST request is used for sending data from the client side to the server side. Its most common uses are file uploads and form submissions, but it can be used to send any data to a server, with a payload:
import urllib3
http = urllib3.PoolManager()
response = http.request("POST", "http://jsonplaceholder.typicode.com/posts", fields={"title": "Created Post", "body": "Lorem ipsum", "userId": 5})
print(response.data.decode("utf-8"))
Even though we're communicating with the same web address, because we're sending a POST request, the fields argument will now specify the data that'll be sent to the server, not retrieved.
We've sent form fields (encoded as multipart/form-data, urllib3's default for fields in a POST request) describing an object with a title, body and userId. The {JSON} Placeholder service also stubs the functionality to add entities, so it returns a response letting us know if we've been able to "add" it to the database, along with the id of the "created" post:
{
"id": 101
}
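Note that fields sends form data, not JSON. If an API expects a JSON body instead, one option is to serialize the payload yourself and set the Content-Type header. The sketch below only builds both encodings so you can compare them offline; post_json is a hypothetical helper name, and userId is passed as a string to keep the form encoding simple. Newer urllib3 releases (2.x) also accept a json= argument to request(), but body= plus an explicit header works across versions:

```python
import json

import urllib3

payload = {"title": "Created Post", "body": "Lorem ipsum", "userId": "5"}

# What fields=payload turns into for a POST: a multipart/form-data body plus
# a Content-Type header carrying the randomly generated boundary
form_body, form_content_type = urllib3.encode_multipart_formdata(payload)
print(form_content_type)

# The JSON alternative: serialize the payload and label it as JSON
json_body = json.dumps(payload).encode("utf-8")


def post_json(http, url, data):
    """Hypothetical helper: POST `data` as a JSON body (performs network I/O)."""
    return http.request(
        "POST",
        url,
        body=json.dumps(data).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
```

Usage would look like post_json(urllib3.PoolManager(), "http://jsonplaceholder.typicode.com/posts", payload).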
HTTP DELETE Request
Finally, to send HTTP DELETE requests, we simply modify the verb to "DELETE" and target a specific post via its id in the URL path. Let's delete the posts with ids 1 through 4 (range(1, 5) excludes its upper bound):
import urllib3

http = urllib3.PoolManager()
for i in range(1, 5):
    # Each post is addressed by its id in the path, e.g. /posts/1
    response = http.request("DELETE", f"http://jsonplaceholder.typicode.com/posts/{i}")
    print(response.data.decode("utf-8"))
An empty JSON object is returned for each post, as the resources are deleted:
{}
{}
{}
{}
When creating a REST API - you'll probably want to give some status code and message to let the user know that a resource has been deleted successfully.
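When checking whether a deletion went through, the status code is usually more reliable than the body. As a sketch, assuming the API follows common HTTP semantics, a successful DELETE is typically answered with 200 OK, 202 Accepted (accepted but not carried out yet) or 204 No Content (done, with no body); delete_succeeded is a hypothetical helper name:

```python
# Statuses commonly used to acknowledge a DELETE:
# 200 OK (body describes the outcome), 202 Accepted (not carried out yet),
# 204 No Content (done, nothing to return)
DELETE_SUCCESS_STATUSES = {200, 202, 204}


def delete_succeeded(status: int) -> bool:
    """Return True if a DELETE response's status signals success."""
    return status in DELETE_SUCCESS_STATUSES


for status in (200, 204, 404):
    print(status, delete_succeeded(status))
```

Inside the loop above, you could then check if not delete_succeeded(response.status): print("Could not delete post", i).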
Send HTTP PATCH Requests
While we can use POST requests to update resources, it's considered good practice to keep POST requests for only creating resources. Instead, we can fire a PATCH request to update an existing resource.
Let's get the first post and then update it with a new title and body:
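import json
import urllib3

data = {"title": "Updated title", "body": "Updated body"}
http = urllib3.PoolManager()
# Fetch the post first, to compare it with the updated version
response = http.request("GET", "http://jsonplaceholder.typicode.com/posts/1")
print(response.data.decode("utf-8"))
# Send only the changed fields, as a JSON body the API can parse
response = http.request("PATCH", "https://jsonplaceholder.typicode.com/posts/1",
                        body=json.dumps(data),
                        headers={"Content-Type": "application/json"})
print(response.data.decode("utf-8"))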
This should result in:
{
"userId": 1,
"id": 1,
"title": "sunt aut facere repellat provident occaecati excepturi optio reprehenderit",
"body": "quia et suscipit\nsuscipit recusandae consequuntur expedita et cum\nreprehenderit molestiae ut ut quas totam\nnostrum rerum est autem sunt rem eveniet architecto"
}
{
"userId": 1,
"id": 1,
"title": "Updated title",
"body": "Updated body"
}
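PATCH carries only the fields that change, and the server keeps everything else, which is why the second response above still has its original userId and id. A PUT, by contrast, is meant to replace the whole resource. The function below is a simplified, hypothetical model of that server-side merge, similar in spirit to JSON Merge Patch (RFC 7386) but without its rule that a null value removes a field:

```python
def apply_patch(resource: dict, changes: dict) -> dict:
    """Return a copy of `resource` with the fields in `changes` overwritten."""
    updated = dict(resource)  # shallow copy, so the original stays untouched
    updated.update(changes)
    return updated


post = {"userId": 1, "id": 1, "title": "Original title", "body": "Original body"}
patched = apply_patch(post, {"title": "Updated title", "body": "Updated body"})
print(patched)
print(post["title"])  # the original dict is unchanged
```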
Send Secure HTTPS Requests in Python with urllib3
The urllib3 module also provides client-side SSL verification for secure HTTP connections. We can achieve this with the help of another module, called certifi, which provides the standard Mozilla certificate bundle.
Its installation is pretty straightforward via pip:
$ pip install certifi
certifi.where() returns the path to the installed bundle of Certificate Authority (CA) certificates. A CA is an entity that issues digital certificates, and urllib3 trusts a server only if its certificate chains back to one of the CAs in this bundle:
import urllib3
import certifi
http = urllib3.PoolManager(ca_certs=certifi.where())
response = http.request("GET", "https://httpbin.org/get")
print(response.status)
Now, we can send a secure request to the server.
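To make the verification settings explicit, you can also pass cert_reqs="CERT_REQUIRED" alongside ca_certs. Recent urllib3 versions already require certificate verification by default, so this mostly documents intent. The sketch below inspects the pool a request to httpbin.org would use without opening a connection; if verification does fail on a real request, it typically surfaces as an exception (a MaxRetryError wrapping an SSLError) rather than a response:

```python
import certifi
import urllib3

# Require a valid certificate chain, checked against certifi's CA bundle
http = urllib3.PoolManager(cert_reqs="CERT_REQUIRED", ca_certs=certifi.where())

# certifi.where() is simply the filesystem path of a PEM bundle
print(certifi.where())

# The pool a request to this host would use; no connection is opened yet
pool = http.connection_from_url("https://httpbin.org/get")
print(type(pool).__name__, pool.cert_reqs)
```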
Uploading Files with urllib3
Using urllib3, we can also upload files to a server. To upload files, we encode the data as multipart/form-data and pass in the file name as well as its contents as a (file_name, file_data) tuple.
To read the contents of a file, we can use the read() method of Python's file objects:
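import json
import urllib3

# Read the file's contents into a string
with open("file_name.txt") as f:
    file_data = f.read()

http = urllib3.PoolManager()
# Send the request; the tuple is (file name, file contents)
resp = http.request(
    "POST",
    "https://httpbin.org/post",
    fields={
        "file": ("file_name.txt", file_data),
    },
)
# httpbin echoes uploaded files back under the "files" key of its JSON response
print(json.loads(resp.data.decode("utf-8"))["files"])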
For the purpose of the example, let's create a file named file_name.txt
and add some content:
Some file data
And some more
Now, when we run the script, it should print out:
{'file': 'Some file data\nAnd some more'}
When we send files to this endpoint, the JSON in the response's data contains a "files" key, which we access by decoding the body, parsing it with json.loads() and indexing the result with ["files"]. The json module turns the response text into a Python dictionary, which makes the uploaded contents easy to pick out and display.
You can also supply a third argument to the tuple, which specifies the MIME type of the uploaded file:
# ... previous code
fields={
"file": ("file_name.txt", file_data, "text/plain"),
}
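If you'd rather not hard-code the MIME type, the standard library's mimetypes module can guess it from the file extension; when the tuple has only two items, urllib3 itself guesses the type from the file name in a similar way. For binary files such as images or PDFs, open the file in "rb" mode so the bytes are sent unchanged. The sketch below builds the multipart body offline with urllib3.encode_multipart_formdata(), the same encoding fields= uses for a POST, so you can inspect exactly what would be sent; the inline file_data stands in for the contents of file_name.txt:

```python
import mimetypes

import urllib3

file_name = "file_name.txt"
# Stand-in for the file's contents, e.g. open(file_name, "rb").read()
file_data = b"Some file data\nAnd some more"

# Guess the MIME type from the extension; fall back to a generic binary type
mime_type = mimetypes.guess_type(file_name)[0] or "application/octet-stream"

# Encode the field the same way fields={...} would, to inspect the result
body, content_type = urllib3.encode_multipart_formdata(
    {"file": (file_name, file_data, mime_type)}
)
print(content_type)
print(body.decode("utf-8"))
```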
Conclusion
In this guide, we've taken a look at how to send HTTP Requests using urllib3, a powerful Python module for handling HTTP requests and responses.
We've also taken a look at what HTTP is, what status codes to expect and how to interpret them, as well as how to upload files and send secure requests with certifi.
from Planet Python