Using asyncio for concurrent HTTP requests in Python (asyncio.ensure_future and asyncio.gather)
Python offers several packages for fetching data over HTTP, such as "requests" and "aiohttp". In this article we will build a practical example that shows how to run 100 requests concurrently as asyncio tasks and then save the results to a file as JSON, using asyncio.ensure_future and asyncio.gather.
Why would I want to use asyncio.ensure_future and asyncio.gather?
If your requests do not have to be executed sequentially, this asynchronous approach can dramatically increase performance.
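Before getting to real HTTP calls, here is a minimal sketch of the pattern itself, with `asyncio.sleep` standing in for a slow network request (a hypothetical `fake_request` helper, not part of the article's code): every coroutine is scheduled as a task with `asyncio.ensure_future`, and `asyncio.gather` waits for all of them and returns their results in order.

```python
import asyncio
import time

async def fake_request(i):
    # Simulate a slow I/O call (e.g. an HTTP request) with a 0.2 s sleep
    await asyncio.sleep(0.2)
    return i

async def main():
    # Schedule every coroutine as a task so they all run concurrently
    tasks = [asyncio.ensure_future(fake_request(i)) for i in range(10)]
    # gather waits until every task has finished; results keep the task order
    return await asyncio.gather(*tasks)

start = time.time()
results = asyncio.run(main())
elapsed = time.time() - start
print(results)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(elapsed)  # roughly 0.2 s, not 10 * 0.2 s
```

Because the ten sleeps overlap, the whole run takes about as long as a single one, which is exactly the effect we want from the 100 HTTP requests below.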
This is not an article where I explain every single step in detail, but I will add code comments as a guide.
- This article assumes that you know how to install Python packages and how to set up a virtual environment (optional)
- The solution below does not include error handling, to keep it simple and focused on the scope of this task.
- The requests go to a dummy API, "https://meowfacts.herokuapp.com", which does not require an API key and can be used immediately.
- The benchmarking results are based on the very slow internet connection I have in France, where I am on holiday at the moment.
```python
import aiohttp
import asyncio
import time
import json

# Get the time at which the script starts the run
start_time = time.time()

# Utils functions
def saveToFile(content):
    saveFile = open("cat_facts.json", "w")
    json.dump(content, saveFile, indent=4)
    saveFile.close()

async def getCatFacts(session, url):
    async with session.get(url) as resp:
        cat_fact = await resp.json()
        return cat_fact['data']

# Main code to execute
async def main():
    # Initiate session
    async with aiohttp.ClientSession() as session:
        # Prepare the list of requests
        tasks = []
        for i in range(0, 100):
            url = "https://meowfacts.herokuapp.com"
            tasks.append(asyncio.ensure_future(getCatFacts(session, url)))
        result = await asyncio.gather(*tasks)
        # Save request to json
        saveToFile(result)
        # Print the results one by one to the console
        for facts in result:
            print(facts)

asyncio.run(main())

# Print the time required for the requests to complete
print("--- %s seconds ---" % (time.time() - start_time))
```
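As noted above, error handling was left out on purpose. If you do want the batch to survive individual failures, one hedged option is `asyncio.gather(..., return_exceptions=True)`, which collects raised exceptions as results instead of aborting the whole run on the first one. The sketch below uses a hypothetical `flaky` coroutine in place of `getCatFacts` so it runs without any network access:

```python
import asyncio

async def flaky(i):
    # Hypothetical stand-in for getCatFacts: every third "request" fails
    if i % 3 == 0:
        raise RuntimeError(f"request {i} failed")
    await asyncio.sleep(0.01)
    return i

async def main():
    tasks = [asyncio.ensure_future(flaky(i)) for i in range(10)]
    # return_exceptions=True makes gather return exceptions as values
    # instead of propagating the first failure and cancelling the rest
    results = await asyncio.gather(*tasks, return_exceptions=True)
    ok = [r for r in results if not isinstance(r, Exception)]
    failed = [r for r in results if isinstance(r, Exception)]
    return ok, failed

ok, failed = asyncio.run(main())
print(len(ok), len(failed))  # 6 4
```

You would then decide per result whether to retry, log, or skip, rather than losing the 99 successful responses because of one failed one.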
Requests completion time
```
...
When well treated, a cat can live twenty or more years but the average life span of a domestic cat is 14 years.
Statistics indicate that animal lovers in recent years have shown a preference for cats over dogs!
You check your cats pulse on the inside of the back thigh, where the leg joins to the body. Normal for cats: 110-170 beats per minute.
--- 2.289151906967163 seconds ---
```
vs the synchronous approach
```
Retractable claws are a physical phenomenon that sets cats apart from the rest of the animal kingdom. In the cat family, only cheetahs cannot retract their claws.
The worlds largest cat measured 48.5 inches long. https://www.youtube.com/watch?v=gc5M0aGc_EI
--- 54.436748027801514 seconds ---
```
A saving of over 52 seconds is a substantial performance gain.
Synchronous approach code
```python
import requests
import time

start_time = time.time()

for _i in range(0, 100):
    url = "https://meowfacts.herokuapp.com"
    resp = requests.get(url)
    data = resp.json()
    print(data['data'])

print("--- %s seconds ---" % (time.time() - start_time))
```
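One closing caveat about the asynchronous version: it fires all 100 requests at the same instant, which a real API may rate-limit or refuse. A common middle ground is to cap how many requests are in flight at once with `asyncio.Semaphore`. This is a sketch of the idea, again with `asyncio.sleep` standing in for `session.get(url)` so it runs offline; `MAX_CONCURRENT` and `fetch` are illustrative names, not part of the article's code:

```python
import asyncio

MAX_CONCURRENT = 10  # assumed limit; tune it to the API you are calling

async def fetch(sem, i):
    # The semaphore lets at most MAX_CONCURRENT coroutines pass this point,
    # so no more than that many (simulated) requests run at a time
    async with sem:
        await asyncio.sleep(0.01)  # stands in for session.get(url)
        return i

async def main():
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    tasks = [asyncio.ensure_future(fetch(sem, i)) for i in range(100)]
    return await asyncio.gather(*tasks)

results = asyncio.run(main())
print(len(results))  # 100
```

You still get most of the concurrency win, while staying polite to the server on the other end.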