1. Background

Python is an excellent scripting language, and Requests is its best-known HTTP library. Requests' one drawback is that it only supports synchronous requests; if you want to make asynchronous requests, you need to use aiohttp or httpx.

2. requests

Before we get into the aiohttp library, let's look at the most popular HTTP libraries for Python 3:

  • requests
  • httpx
  • aiohttp

Requests is an easy-to-use synchronous HTTP library, well suited to beginners who are just getting started with Python.

HTTPX is a latecomer, supporting both synchronous and asynchronous syntax, while aiohttp supports only asynchronous requests.

In terms of asynchronous request efficiency, the gap between httpx and aiohttp is not obvious.

If you are interested, you can write two simple demos to compare the asynchronous performance of httpx and aiohttp yourself; httpx is outside the scope of this article.

2.1. Installation

requests is not part of the Python 3 standard library; install it with pip.

pip3 install requests

2.2. Use of GET and POST

GET request

Take requesting the JSON data of a website's message list as an example.

>>> import requests
>>> response = requests.get('https://www.chancel.me/messages?ownType=2&ownID=0',timeout=10)
>>> response.status_code
200
>>> response.json()
{'data': {'hasNext': False, 'hasPrev': False, 'items': [{'create_time': 'Mon, 14 Jan 2019 17:34:42 GMT', 'id': 8, 'm_author': '浮光', 'm_content': '<blockquote><p>理解得越多就越痛苦。知道得越多就越撕裂。但他有着同痛苦相对称的清澈,与绝望相 均衡的坚韧。</p>\n<p>-- 勒内.夏尔</p>\n</blockquote>\n', 'm_email': 'ycs1026@vip.qq.com', 'm_environ': [], 'm_gravatar': 'https://www.gravatar.com/avatar/d45071b54cf3339bd3d16bb35f750f35?d=https%3A%2F%2Fwww.chancel.ltd%2Fstatic%2Fimg%2Fgravatar.jpg&s=64', 'm_own_id': 0, 'm_parent_id': None, 'm_site_url': None, 'm_type': 2, 'sub_messages': [], 't_message_type': {'id': 2, 'm_type': 'book'}}], 'page': 1, 'pages': 1, 'perPage': 100, 'total': 1}, 'message': '留言获取成功', 'success': True}
>>> response.json()['message']
'留言获取成功'

GET requests are relatively simple to use, and it is easy to wrap requests' JSON responses in your own helpers.
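
As a minimal sketch of such a wrapper (the helper name and simplified error handling are my own; the endpoint and parameters follow the example above):

import requests

def get_messages(own_type: int, own_id: int, timeout: int = 10) -> dict:
    # Fetch the message list and return the parsed JSON payload
    response = requests.get(
        'https://www.chancel.me/messages',
        params={'ownType': own_type, 'ownID': own_id},
        timeout=timeout,
    )
    response.raise_for_status()  # raise on non-2xx status codes
    return response.json()

# messages = get_messages(own_type=2, own_id=0)
# print(messages['message'])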

POST request

Taking switching the blog theme as an example, I manually put the cookie assigned on the first visit to the site into the Cookie field of the headers.

>>> headers = {'Cookie': 'IDTAG=3e56264d-8fa8-11eb-9636-0050560000a0'}
>>> request_data = {"theme":{"appClass":"mdui-color-white","bodyClass":"mdui-theme-layout-dark","containerClass":"","footerClass":""}}
>>> response = requests.post('https://www.chancel.me/idtag',headers=headers,json=request_data)
>>> response.status_code
200
>>> response.text
'{"data":{},"message":"存储成功","success":true}\n'

POST requests usually need to submit data; for JSON data (Content-Type: application/json), use the json parameter.

For form submissions, use the data parameter instead, as follows.

>>> response = requests.post('https://www.chancel.me/idtag',headers=headers,data=request_data)

2.3. Session usage

In practice, requests often come in sequences that carry cookies, and manually maintaining the cookies for every POST submission is a chore.

For example, when repeatedly requesting to change a blog's theme, each request must carry the cookies from the visit to the blog; you can add the cookie information to the headers by hand, as in the code above.

Doing this every time is obviously inconvenient; we can use the session object that requests provides to solve the problem.

Take switching blog themes as an example.

>>> requests_session = requests.session()
>>> requests_session.get('https://www.chancel.me')
<Response [200]>
>>> requests_session.cookies
<RequestsCookieJar[Cookie(version=0, name='IDTAG', value='2155a6c4-8fa9-11eb-afcb-0050560000a0', port=None, port_specified=False, domain='www.chancel.ltd', domain_specified=False, domain_initial_dot=False, path='/', path_specified=True, secure=True, expires=1648460213, discard=False, comment=None, comment_url=None, rest={'HttpOnly': None}, rfc2109=False)]>
>>> response = requests_session.post('https://www.chancel.me/idtag',json=request_data)
>>> response.status_code
200
>>> response.text
'{"data":{},"message":"存储成功","success":true}\n'

The session stores all the cookies for the current sequence of requests and tracks their changes, which makes it a very useful wrapper for continuous requests in complex scenarios.

It is also very convenient for logging in to a website: as soon as the login request completes, the session object automatically saves the resulting cookies for subsequent requests, as in the sketch below.
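
For instance, a hypothetical login flow (the /login endpoint and form field names below are invented for illustration):

import requests

session = requests.session()
# Endpoint and field names are hypothetical -- substitute your site's own
session.post('https://example.com/login',
             data={'username': 'alice', 'password': 'secret'})
# The session now carries any cookies set by the login response,
# so subsequent requests are authenticated automatically
profile = session.get('https://example.com/profile')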

We can also serialize the session’s cookies property locally and use it again when we start the next session, avoiding the problem of repeated logins to get cookies.

Write the cookies to a local file.

>>> import json
>>> with open('/tmp/chancel.ltd-cookies', 'w') as f:
...     f.write(json.dumps(requests_session.cookies.get_dict()))
...

View information about locally saved cookies.

# chancel @ home-ubuntu in ~ [17:42:22]
$ cat /tmp/chancel.ltd-cookies
{"IDTAG": "2155a6c4-8fa9-11eb-afcb-0050560000a0"}%

Read the cookies back from the local file.

import json

with open('/tmp/chancel.ltd-cookies','r') as f:
    requests_session.cookies.update(json.loads(f.read()))

The usage above covers most request scenarios. For more advanced features of requests (e.g. file uploads, skipping SSL certificate verification, web proxies, etc.), refer to the official documentation linked below; a quick sketch of these options follows.

Requests: HTTP for Humans™
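
All of these are plain keyword arguments on the request functions. A quick sketch (the upload endpoint, file path, and proxy address here are placeholders):

import requests

# File upload (multipart/form-data)
with open('/tmp/avatar.png', 'rb') as f:
    requests.post('https://example.com/upload', files={'file': f})

# Skip SSL certificate verification (only for testing)
requests.get('https://self-signed.example.com', verify=False)

# Route the request through an HTTP proxy
requests.get('https://www.chancel.me',
             proxies={'https': 'http://127.0.0.1:8080'})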

3. aiohttp

aiohttp is a very good asynchronous HTTP library, but it is somewhat harder to get started with than requests.

If you have programming experience, you are probably no stranger to asynchrony, but let's briefly state the difference between synchronous and asynchronous requests:

  • Synchronous request: After initiating a request, the next code is not executed until the request returns or an exception is thrown.
  • Asynchronous request: After initiating a request, execution continues with the next statement without waiting for the response.

Given how asynchronous execution works, we can queue up a large number of requests and observe the difference between the two modes; a minimal sketch of the idea follows.
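
This sketch uses asyncio.sleep to stand in for one second of network I/O (the function names are my own); three such "requests" finish in about one second when run concurrently, instead of three seconds sequentially:

import asyncio
import time

async def fake_request(n: int):
    await asyncio.sleep(1)  # stands in for one second of network I/O
    print('request %d done' % n)

async def main():
    # Launch all three "requests" concurrently instead of one after another
    await asyncio.gather(*(fake_request(n) for n in range(3)))

start_time = time.time()
asyncio.run(main())
print('total: %.1f seconds' % (time.time() - start_time))  # ~1.0, not ~3.0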

3.1. Installation

Installing aiohttp can also be done using pip.

pip3 install aiohttp

3.2. Use of GET and POST

GET request

Let's take the example from the official website and modify it a bit, again fetching the JSON data of the blog's message list. The GET request is driven by the standard asyncio library's event-loop methods.

import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://www.chancel.me/messages?ownType=2&ownID=0') as response:
            print('Response status -> %d' % response.status)

            response_json = await response.json()
            print('Response data -> %s' % response_json)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

# output
Response status -> 200
Response data -> {'data': {'hasNext': False, 'hasPrev': False, 'items': [{'create_time': 'Mon, 14 Jan 2019 17:34:42 GMT', 'id': 8, 'm_author': '浮光', 'm_content': '<blockquote><p>理解得越多就越痛苦。知道得越多就越撕裂。但他有着同痛苦相对称的清澈,与绝望相均衡的坚韧。</p>\n<p>-- 勒内.夏尔</p>\n</blockquote>\n', 'm_email': 'ycs1026@vip.qq.com', 'm_environ': [], 'm_gravatar': 'https://cdn.v2ex.com/gravatar/d45071b54cf3339bd3d16bb35f750f35?d=https%3A%2F%2Fwww.chancel.ltd%2Fstatic%2Fimg%2Fgravatar.jpg&s=64', 'm_own_id': 0, 'm_parent_id': None, 'm_site_url': None, 'm_type': 2, 'sub_messages': [], 't_message_type': {'id': 2, 'm_type': 'book'}}], 'page': 1, 'pages': 1, 'perPage': 100, 'total': 1}, 'message': '留言获取成功', 'success': True}

POST request

Still take switching blog themes as an example.

import aiohttp
import asyncio


headers = {'Cookie': 'IDTAG=f4fad1e1-911d-11eb-bbcb-0050560000a0'}
request_data = {"theme":{"appClass":"mdui-color-white","bodyClass":"mdui-theme-layout-dark","containerClass":"","footerClass":""}}

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.post('https://www.chancel.me/idtag',headers=headers,json=request_data) as response:
            print('Response status -> %d' % response.status)

            response_json = await response.json()
            print('Response data -> %s' % response_json)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

# output
Response status -> 200
Response data -> {'data': {}, 'message': '存储成功', 'success': True}

Ordinary aiohttp requests are not complicated; if you are unsure about the async/await syntax, refer to Coroutines and Tasks - docs.python.org.

3.3. Use of Session

Cookie handling in aiohttp is more complex. The cookie jar supports saving to a file, but the format is binary; it can also be exported to a JSON file in the style of requests' session (in some cases this requires the unsafe flag). For details, refer to Cookie Jar - docs.aiohttp.org; a JSON-export sketch follows the example below.

import aiohttp
import asyncio


request_data = {"theme":{"appClass":"mdui-color-white","bodyClass":"mdui-theme-layout-dark","containerClass":"","footerClass":""}}

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('https://www.chancel.me'):
            async with session.post('https://www.chancel.me/idtag',json=request_data) as response:

                session.cookie_jar.save('www.chancel.ltd.cookies')
                # session.cookie_jar.load(file_path) reads the cookies back; in most cases you don't need to inspect their contents

                print('Response status -> %d' % response.status)

                response_json = await response.json()
                print('Response data -> %s' % response_json)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())


4. Speed comparison

The biggest difference between aiohttp and requests is asynchronous requests. Take 10 blog-theme-switch requests as an example: the code below takes about 15 seconds to run, and you can see from the output that the requests execute sequentially.

import requests
import time

index_url = 'https://www.chancel.me'
post_url = 'https://www.chancel.me/idtag'
post_data = {"theme": {"appClass": "mdui-color-white", "bodyClass": "mdui-theme-layout-dark", "containerClass": "", "footerClass": ""}}


def requests_post(count: int):
    requests_session = requests.session()
    requests_session.get(index_url)
    response = requests_session.post(post_url, json=post_data)
    if response.ok:
        print('POST request #%d switched the theme, response data -> %s' % (count, response.json()))

if __name__ == '__main__':
    try_count = 10
    post_count = 1
    start_time = time.time()
    while post_count < try_count + 1:
        requests_post(count=post_count)
        post_count += 1
    stopwatch = time.time() - start_time
    print('requests: %d POST requests took %d seconds' % (try_count, stopwatch))

# output
POST request #1 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #2 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
...
POST request #7 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #8 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #9 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #10 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
requests: 10 POST requests took 15 seconds

With the same request logic, aiohttp takes only about 1 second, and you can see that the requests complete out of order.

import aiohttp
import asyncio
import time

index_url = 'https://www.chancel.me'
post_url = 'https://www.chancel.me/idtag'
post_data = {"theme": {"appClass": "mdui-color-white", "bodyClass": "mdui-theme-layout-dark", "containerClass": "", "footerClass": ""}}


async def aiohttp_post(count: int):
    async with aiohttp.ClientSession() as session:
        async with session.get(index_url):
            async with session.post(post_url, json=post_data) as response:
                response_json = await response.json()
                print('POST request #%d switched the theme, response data -> %s' % (count, response_json))


if __name__ == '__main__':
    try_count = 10
    post_count = 1
    start_time = time.time()
    tasks = []
    while post_count < try_count + 1:
        tasks.append(aiohttp_post(count=post_count))
        post_count += 1
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait(tasks))
    stopwatch = time.time() - start_time
    print('aiohttp: %d POST requests took %d seconds' % (try_count, stopwatch))

# output
POST request #2 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #4 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #10 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #6 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #1 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #8 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #9 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #7 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #5 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #3 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
aiohttp: 10 POST requests took 1 second

5. aiohttp concurrency control and timeout settings

5.1. Concurrency limitation

Building on the example above, the code to limit how many connections aiohttp uses at once while switching blog themes is as follows.

import aiohttp
import asyncio
import time

index_url = 'https://www.chancel.me'
post_url = 'https://www.chancel.me/idtag'
post_data = {"theme": {"appClass": "mdui-color-white", "bodyClass": "mdui-theme-layout-dark", "containerClass": "", "footerClass": ""}}

connector = aiohttp.TCPConnector(limit=1)


async def aiohttp_post(session: aiohttp.ClientSession, count: int):
    async with session.get(index_url):
        async with session.post(post_url, json=post_data) as response:
            response_json = await response.json()
            print('POST request #%d switched the theme, response data -> %s' % (count, response_json))


if __name__ == '__main__':
    try_count = 10
    post_count = 1
    start_time = time.time()
    tasks = []
    client_session = aiohttp.ClientSession(connector=connector)
    while post_count < try_count + 1:
        tasks.append(aiohttp_post(session=client_session, count=post_count))
        post_count += 1
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait(tasks))
    stopwatch = time.time() - start_time
    print('aiohttp: %d POST requests took %d seconds' % (try_count, stopwatch))

# output
POST request #3 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #7 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #10 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #4 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #8 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #6 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #5 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #1 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #9 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
POST request #2 switched the theme, response data -> {'data': {}, 'message': '存储成功', 'success': True}
aiohttp: 10 POST requests took 5 seconds

This restricts the client to a single connection at a time, yet it is still much faster than requests. That does not mean asynchronous requests are always faster than synchronous ones; it depends on the scenario. Another way to cap concurrency is sketched below.
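
Instead of (or on top of) the connector limit, a common alternative is an asyncio.Semaphore around the request itself; a minimal sketch:

import aiohttp
import asyncio

async def limited_get(session: aiohttp.ClientSession,
                      semaphore: asyncio.Semaphore, url: str):
    async with semaphore:  # at most N tasks pass this point at once
        async with session.get(url) as response:
            return response.status

async def main():
    semaphore = asyncio.Semaphore(2)  # allow at most 2 requests in flight
    async with aiohttp.ClientSession() as session:
        statuses = await asyncio.gather(
            *(limited_get(session, semaphore, 'https://www.chancel.me')
              for _ in range(10)))
        print(statuses)

asyncio.run(main())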

5.2. Timeout limits

Setting a timeout limit is very simple in requests, e.g.

import requests

response = requests.get('https://www.chancel.me',timeout=10)

Is it just as simple in aiohttp? Let's try it.

import aiohttp
import asyncio
import time


async def aiohttp_post(session: aiohttp.ClientSession, count: int):
    async with session.get('https://www.chancel.me',timeout=10) as response:
        print('Request #%d returned status code %d' % (count, response.status))


if __name__ == '__main__':
    try_count = 100
    post_count = 1
    start_time = time.time()
    tasks = []
    connector = aiohttp.TCPConnector(limit=1)
    client_session = aiohttp.ClientSession(connector=connector)
    while post_count < try_count + 1:
        tasks.append(aiohttp_post(session=client_session, count=post_count))
        post_count += 1
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait(tasks))
    stopwatch = time.time() - start_time
    print('%d requests to the home page took %d seconds in total' % (try_count, stopwatch))

# output
Request #55 returned status code 200
Request #56 returned status code 200
Request #57 returned status code 200
Request #58 returned status code 200
Request #59 returned status code 200
Request #60 returned status code 200
Request #61 returned status code 200
Request #62 returned status code 200
Request #32 returned status code 200
Task exception was never retrieved
future: <Task finished name='Task-68' coro=<aiohttp_post() done, defined at /mnt/sda/Codes/dev/test_code/demo.py:8> exception=TimeoutError()>
Traceback (most recent call last):
  File "/mnt/sda/Codes/dev/test_code/demo.py", line 9, in aiohttp_post
    async with session.get('https://www.chancel.me',timeout=10) as response:
  File "/mnt/sda/Codes/dev/test_code/.venv/lib/python3.9/site-packages/aiohttp/client.py", line 1117, in __aenter__
    self._resp = await self._coro
  File "/mnt/sda/Codes/dev/test_code/.venv/lib/python3.9/site-packages/aiohttp/client.py", line 619, in _request
    break
  File "/mnt/sda/Codes/dev/test_code/.venv/lib/python3.9/site-packages/aiohttp/helpers.py", line 656, in __exit__
    raise asyncio.TimeoutError from None
asyncio.exceptions.TimeoutError
Task exception was never retrieved
100 requests to the home page took 10 seconds in total

If you try this, you will find that whether you request the blog home page 100 times or 10,000 times, everything returns within 10 seconds.

The reason the timeout exceptions are thrown is that the timeout in session.get('https://www.chancel.me',timeout=10) is the total time budget for the request, including the time spent waiting for a free connection from the pool; with the connector limited to one connection, the queued requests all hit the 10-second limit together, which looks very awkward.

Following the official example, you can instead configure a ClientTimeout object on the session, separating the total budget from the socket-connect timeout. Mechanically it is the same as the code above, only it reads much less awkwardly.

import aiohttp
import asyncio
import time


async def aiohttp_post(session: aiohttp.ClientSession, count: int):
    async with session.get('https://www.chancel.me') as response:
        print('Request #%d returned status code %d' % (count, response.status))


if __name__ == '__main__':
    try_count = 100
    post_count = 1
    start_time = time.time()
    tasks = []
    connector = aiohttp.TCPConnector(limit=1)
    timeout = aiohttp.ClientTimeout(total=60 * 5, connect=None, sock_connect=10, sock_read=None)
    client_session = aiohttp.ClientSession(connector=connector,timeout=timeout)
    while post_count < try_count + 1:
        tasks.append(aiohttp_post(session=client_session, count=post_count))
        post_count += 1
    loop = asyncio.get_event_loop()
    loop.run_until_complete(asyncio.wait(tasks))
    stopwatch = time.time() - start_time
    print('%d requests to the home page took %d seconds in total' % (try_count, stopwatch))

# output
Request #26 returned status code 200
Request #88 returned status code 200
Request #27 returned status code 200
...
Request #25 returned status code 200
Request #87 returned status code 200
100 requests to the home page took 17 seconds in total

The full 100 requests took about 17 seconds, similar to requests' result. The ClientTimeout object has four properties:

  • total: the timeout for the entire operation (in seconds, type float).
  • connect: the time allowed for acquiring a connection from the pool, including establishing a new connection or waiting for a free one (in seconds, type float).
  • sock_connect: the timeout for connecting to the peer server, i.e. the traditional connect timeout (in seconds, type float).
  • sock_read: the timeout for reading a chunk of data from the peer, i.e. reading the returned data (in seconds, type float).
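
As far as I know, a ClientTimeout object can also be passed per request via the timeout argument, overriding the session default for that single call; a sketch:

import aiohttp
import asyncio

async def main():
    # Session-wide default: 5 minutes total, 10 seconds to open the socket
    timeout = aiohttp.ClientTimeout(total=60 * 5, sock_connect=10)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        # Per-request override: this call gets a stricter 15-second total budget
        async with session.get('https://www.chancel.me',
                               timeout=aiohttp.ClientTimeout(total=15)) as response:
            print(response.status)

asyncio.run(main())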

6. Finally

In practice, much of aiohttp's usage differs from requests, and asynchronous calls reflect a completely different mindset from the traditional multi-threaded approach in many APIs. When in doubt, the best approach is to read the documentation.