Web crawlers are still a hot topic. More and more data is on the web; unfortunately, it is often not in the form we need. Today we look at requests, the elegant HTTP library for Python that gives us a good starting point for interacting with web applications.
This post is part of my journey to learn Python. You can find the other parts of this series here. The code for this post is in my PythonFriday repository on GitHub.
Install requests
We can install the newest version of requests with this command:
```
pip install -U requests
```
There are a lot of special edge cases and tweaks to make requests work the way certain web applications expect. Should you run into some unique requirement, take a look at the extensive official documentation. You will most likely find a code example that handles your edge case.
To explore requests, you need a web application that can respond to the HTTP requests you send. Kenneth Reitz not only created requests, but he also made the demo application httpbin.org. You can use the online version or run the Docker image locally while you test requests. The code for this Flask app is on GitHub and it is a great help when you want to figure out how the different requests get handled by Flask.
GET requests
We can create a GET request with the get() method that will call the web application on our behalf. The request results in a response object, which has the content of the URL in the .text property:
```
>>> import requests
>>> r = requests.get('https://jgraber.ch')
>>> r.text
'<!DOCTYPE html>\r\n<html lang='...
```
We can ask our response object for the status code to check if it was successful:
```
>>> r.status_code
200
>>> r.status_code == requests.codes.ok
True
```
Query strings
Encoding parameters in the URL for query strings is often a pain. With requests we can create a dictionary with the values we need and pass it to the get() method:
```
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.get('https://httpbin.org/get', params=payload)
>>> r.url
'https://httpbin.org/get?key1=value1&key2=value2'
```
The application at httpbin.org returns JSON, which we can parse with the json() method:
```
>>> r.json()
{
  'args': {
    'key1': 'value1',
    'key2': 'value2'
  },
  'headers': {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate',
    'Host': 'httpbin.org',
    'User-Agent': 'python-requests/2.27.1',
    'X-Amzn-Trace-Id': 'Root=1-63068b8d-12d628e31969095f64f294be'
  },
  'origin': '213.142.178.228',
  'url': 'https://httpbin.org/get?key1=value1&key2=value2'
}
```
HTTP headers
If we need to send special headers, we can create a dictionary and pass it to the get() method in a similar way as we passed the query strings:
```
>>> headers = {'user-agent': 'my-app/0.0.1'}
>>> r = requests.get('https://httpbin.org/get', headers=headers)
>>> r.json()
{
  'args': {},
  'headers': {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate',
    'Host': 'httpbin.org',
    'User-Agent': 'my-app/0.0.1',
    'X-Amzn-Trace-Id': 'Root=1-63068c4d-069d88c374a297341d678835'
  },
  'origin': '213.142.178.228',
  'url': 'https://httpbin.org/get'
}
```
If we compare this output with the one above, we can see that the “User-Agent” header uses our supplied value.
We can access the headers of the response object to figure out what data we got back:
```
>>> r = requests.get('https://httpbin.org/get')
>>> r.headers
{
  'Date': 'Wed, 24 Aug 2022 20:53:40 GMT',
  'Content-Type': 'application/json',
  'Content-Length': '309',
  'Connection': 'keep-alive',
  'Server': 'gunicorn/19.9.0',
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Credentials': 'true'
}
>>> r.headers['Server']
'gunicorn/19.9.0'
>>> r.headers['Content-Type']
'application/json'
```
Post form data
To send data, we can create a dictionary for the values and pass it as the data argument to the post() method:
```
>>> r = requests.post('https://httpbin.org/post', data={'key': 'value'})
>>> r.json()
{
  'args': {},
  'data': '',
  'files': {},
  'form': {'key': 'value'},
  'headers': {
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate',
    'Content-Length': '9',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Host': 'httpbin.org',
    'User-Agent': 'python-requests/2.27.1',
    'X-Amzn-Trace-Id': 'Root=1-63068d0a-7104245d043d455c717e2d99'
  },
  'json': None,
  'origin': '213.142.178.228',
  'url': 'https://httpbin.org/post'
}
```
Follow redirects
Requests follows redirects automatically. If we call my website over HTTP, the server sends a redirect to the HTTPS version. We may miss that redirect because we get the status code of the final response. However, the history property contains the redirect responses:
```
>>> r = requests.get('http://jgraber.ch')
>>> r.status_code == requests.codes.ok
True
>>> r.is_redirect
False
>>> r.status_code
200
>>> r.history
[<Response [301]>]
```
Other HTTP methods
While get() and post() are the most used HTTP methods, requests supports the full range of verbs.
The head() method is great to see if a URL exists without downloading the content:
```
>>> r = requests.head('https://httpbin.org/get')
>>> r.text
''
>>> r.status_code
200
>>> r = requests.head('https://httpbin.org/get5')
>>> r.status_code
404
```
The options() method is useful when you need to find out what methods are supported for a specific URL:
```
>>> r = requests.options('https://httpbin.org/get')
>>> r.text
''
>>> r.headers
{
  'Date': 'Wed, 24 Aug 2022 20:55:07 GMT',
  'Content-Type': 'text/html; charset=utf-8',
  'Content-Length': '0',
  'Connection': 'keep-alive',
  'Server': 'gunicorn/19.9.0',
  'Allow': 'OPTIONS, GET, HEAD',
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Credentials': 'true',
  'Access-Control-Allow-Methods': 'GET, POST, PUT, DELETE, PATCH, OPTIONS',
  'Access-Control-Max-Age': '3600'
}
```
If your application supports the PUT and DELETE methods, you can use put() or delete() to modify data:
```
r = requests.put('https://httpbin.org/put', data={'key': 'value'})
r = requests.delete('https://httpbin.org/delete')
```
Next
With requests we can interact with web applications over HTTP(S). Whatever you need to do, if it is part of HTTP, you will find a way to do it with requests. The interesting part of web crawling starts with the data we get back. Next week we find out how we can clean up that HTML and extract what we need.