Python Friday #137: HTTP With Requests

Web crawlers are still a hot topic. More and more data is available on the web; unfortunately, it is often not in the form we need. Today we look at requests, the elegant HTTP library for Python that gives us a good starting point for interacting with web applications.

This post is part of my journey to learn Python. You can find the other parts of this series here. You can find the code for this post in my PythonFriday repository on GitHub.


Install requests

We can install the newest version of requests with this command:
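    pip install requests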

There are many special edge cases and tweaks to make requests behave the way certain web applications expect. Should you run into such unique requirements, take a look at the extensive official documentation. You will most likely find a code example that handles your edge case.

To explore requests, you need a web application that can respond to the HTTP requests you send. Kenneth Reitz not only created requests, he also made the demo application httpbin.org. You can use the online version or run the Docker image locally while you test requests. The code for this Flask app is on GitHub and it is a great help when you want to figure out how the different requests get handled by Flask.


GET requests

We can create a GET request with the get() method that will call the web application on our behalf. The request results in a response object, which has the content of the URL in the .text property:
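    import requests

    # fetch the page and print the raw body
    response = requests.get('https://httpbin.org/get')
    print(response.text)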

We can ask our response object for the status code to check if it was successful:
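    print(response.status_code)  # 200 for a successful request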


Query strings

Encoding parameters in the URL for query strings is often a pain. With requests we can create a dictionary with the values we need and pass it to the get() method (the parameter names in this example are made up):
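    # any key-value pairs work here; 'page' and 'filter' are just examples
    payload = {'page': 2, 'filter': 'python'}
    response = requests.get('https://httpbin.org/get', params=payload)
    print(response.url)
    # https://httpbin.org/get?page=2&filter=python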

The application at httpbin.org returns JSON, which we can parse with the json() method:
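    data = response.json()
    # httpbin.org echoes the query string back in the 'args' field,
    # e.g.: {'filter': 'python', 'page': '2'}
    print(data['args'])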


HTTP headers

If we need to send special headers, we can create a dictionary and pass it to the get() method, just as we passed the query string parameters (the User-Agent value in this example is made up):
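    # a made-up User-Agent value for demonstration
    headers = {'User-Agent': 'python-friday-demo/1.0'}
    response = requests.get('https://httpbin.org/headers', headers=headers)
    print(response.json())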

If we compare this output with the one above, we can see that the “User-Agent” header uses our supplied value.

We can access the headers of the response object to figure out what data we got back:
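    # response.headers behaves like a case-insensitive dictionary
    print(response.headers['Content-Type'])
    # application/json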


Post form data

To send data, we can create a dictionary for the values and pass it as the data parameter to the post() method (the form values in this example are made up):
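    payload = {'name': 'Python Friday', 'number': 137}
    response = requests.post('https://httpbin.org/post', data=payload)
    # httpbin.org echoes the submitted values back in the 'form' field
    print(response.json()['form'])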


Follow redirects

Requests follows redirects automatically. If we call my web site over HTTP, the server sends a redirect to the HTTPS version. We may miss that redirect because we only get the status code of the final response. However, there is a history property that contains the redirect responses. The redirect endpoint of httpbin.org lets us reproduce this behaviour:
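    # httpbin.org/redirect/1 issues one redirect before the final response
    response = requests.get('https://httpbin.org/redirect/1')
    print(response.status_code)  # 200, the status of the final response
    print(response.history)      # [<Response [302]>]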


Other HTTP methods

While get() and post() cover the most used HTTP methods, requests supports the full range.

The head() method is great for checking whether a URL exists without downloading the content:
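    response = requests.head('https://httpbin.org/get')
    print(response.status_code)  # 200
    print(response.text)         # empty: a HEAD request returns no body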

The options() method is useful when you need to find out what methods are supported for a specific URL:
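    response = requests.options('https://httpbin.org/get')
    # the Allow header lists the supported methods, e.g. GET, HEAD, OPTIONS
    print(response.headers['Allow'])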

If your application supports the PUT and DELETE commands, you can use the put() or delete() methods to modify data:
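    response = requests.put('https://httpbin.org/put', data={'name': 'new value'})
    print(response.status_code)  # 200

    response = requests.delete('https://httpbin.org/delete')
    print(response.status_code)  # 200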


Next

With requests we can interact with web applications over HTTP(S). Whatever you need to do, if it is part of HTTP, you will find a way to do it with requests. The interesting part of web crawling starts with the data we get back. Next week we will find out how to clean up that HTML and extract what we need.
