Async Python. Deep Dive

February 2, 2020 (4y ago)

You may have encountered explanations that feel overly simplistic: they stress the importance of async programming without going into the mechanics, or they only explore high-level APIs such as the asyncio module in the standard library. You already understand the significance and the use cases, so let's cut to the chase and see how it actually works.

Pausing

So, what's the deal with async functions in Python? We all know they pause execution until some operation resolves, right? But how does this mechanism truly operate? Let's take a step back and check out a foundational concept:

Iterators

An iterator is an object that traverses a container and gives access to its elements one at a time. It keeps track of the traversal state, which lets it pause after producing an item. To continue, you call next() on it (in Python, the built-in next() function calls the iterator's __next__ method), and the iterator resumes and retrieves the next element in the sequence. This repeats until the iterator reaches the end of the container. Producing values only on demand like this is commonly referred to as lazy evaluation.
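To make the pause and resume idea concrete, here is a minimal sketch using a plain list and the built-in iter() and next() functions. The variable names are only for illustration.

```python
# A list is a container; iter() asks it for an iterator over its elements.
numbers = [10, 20, 30]
iterator = iter(numbers)

first = next(iterator)   # the iterator advances one step, then pauses
second = next(iterator)  # it resumes exactly where it left off

# Nothing else has been consumed yet: the rest is produced only on demand.
remaining = list(iterator)

print(first, second, remaining)
```

Between the two next() calls the iterator does nothing at all; it simply remembers its position until it is asked again.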

The key concepts here are pausing, resuming, and awaiting the next element. These notions become particularly meaningful when related to asynchronous functions. Pausing and resuming execution matches the asynchronous way of handling tasks, which lets the program switch between operations without blocking the entire execution flow. The term "await" emphasizes the waiting aspect: the program can await the completion of a specific asynchronous task before proceeding. We call this task a Future; more on this later.

How Do I Create An Iterator

To define your own iterator, you have to adhere to the iterator protocol, meaning you have to define the __iter__ and __next__ methods.

__iter__

The purpose of the __iter__ method is to return the iterator object itself. When this method is implemented in a class, it allows instances of that class to be used in a for loop or any other iteration context. In other words, it enables the iteration over the elements of the associated object.

__next__

The __next__ method provides the next element in the sequence when iterating over the associated object. For example, in a loop like for x in my_object, Python first calls my_object.__iter__() to obtain the iterator, then calls that iterator's __next__() on each iteration to fetch the subsequent element. When the iteration is complete, the method raises a StopIteration exception, signaling that there are no more elements to be retrieved.
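Roughly what a for loop does with these two methods can be sketched by hand. The helper name manual_for is hypothetical and exists only for this illustration; the real loop machinery lives inside the interpreter.

```python
def manual_for(iterable):
    """Collect items the way a for loop would fetch them (illustrative helper)."""
    collected = []
    iterator = iter(iterable)          # calls iterable.__iter__()
    while True:
        try:
            item = next(iterator)      # calls iterator.__next__()
        except StopIteration:
            break                      # a for loop ends silently at this point
        collected.append(item)
    return collected


result = manual_for("abc")
print(result)
```

The StopIteration exception never reaches your code in a normal for loop; the loop catches it and stops.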

Example

This object enables you to read a file line by line, without loading the entire file into memory.

class EfficientFile:
    def __init__(self, file_path: str):
        self.file_path = file_path

    def __iter__(self) -> 'EfficientFile':
        # Open the file lazily, only when iteration actually starts.
        self.file = open(self.file_path, 'r')
        return self

    def __next__(self) -> str:
        line = self.file.readline()
        if not line:
            # readline() returns an empty string only at end of file.
            self.file.close()
            raise StopIteration
        return line.strip()

file = EfficientFile(file_path='pyproject.toml')

for line in file:
    print(line)

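An alternative way to read all the lines, using the built-in next() function. Note that iter() has to be called first, because the file is only opened inside __iter__:

iterator = iter(file)  # calls file.__iter__(), which opens the file
try:
    while True:
        line = next(iterator)  # or iterator.__next__()
        print(line)
except StopIteration:
    pass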

In this example, the __iter__ method opens the file, and the __next__ method reads one line at a time until the end of the file is reached. Only one line is held in memory at a time, because each part of the file is loaded upon request; the request, in this case, is the call to next(). This keeps the approach memory-efficient, even for large files.

Async Iterators

They extend the concept of iterators to asynchronous operations. Async iterators implement two key methods:

__aiter__

Similar to its synchronous counterpart, this method returns the async iterator object itself, making it compatible with async for loops instead of traditional for loops.

__anext__

__anext__ is the async version of __next__: it provides the next value only upon request, once the operation it is waiting on has completed. It must return an awaitable, which is why it is usually written with async def. When the async iterator runs out of items, it raises a StopAsyncIteration exception.
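Here is a minimal sketch of an async iterator, assuming asyncio as the event loop. The Countdown class, its delay parameter, and the collect helper are made up for this example; asyncio.sleep stands in for real I/O.

```python
import asyncio


class Countdown:
    """Hypothetical async iterator that produces start, start - 1, ..., 1."""

    def __init__(self, start: int, delay: float = 0.01):
        self.current = start
        self.delay = delay

    def __aiter__(self) -> "Countdown":
        return self

    async def __anext__(self) -> int:
        if self.current <= 0:
            raise StopAsyncIteration
        # Simulated wait; other tasks are free to run during this pause.
        await asyncio.sleep(self.delay)
        value = self.current
        self.current -= 1
        return value


async def collect(start: int) -> list:
    values = []
    async for value in Countdown(start):
        values.append(value)
    return values


countdown_values = asyncio.run(collect(3))
print(countdown_values)
```

The shape mirrors the synchronous EfficientFile example: the same two-method protocol, with async for doing the awaiting on each step.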

The invocation of __anext__ involves the use of the async and await keywords. Speaking of which, the await keyword can only be used with an awaitable object. So...

What Exactly Is An awaitable

An awaitable is an object that can be used with the await keyword within an async function. It typically represents an operation that may not be immediately completed, such as an asynchronous task or a Future object.

How Do I Define An awaitable

Define an object that implements the __await__() dunder method, which must return an iterator; a generator object is the usual choice.
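As a rough sketch, a custom awaitable could look like the following. DelayedValue is a hypothetical class invented for this example, and it leans on asyncio.sleep to do the actual waiting rather than talking to the event loop itself.

```python
import asyncio


class DelayedValue:
    """Hypothetical awaitable that resolves to `value` after `delay` seconds."""

    def __init__(self, value, delay: float = 0.01):
        self.value = value
        self.delay = delay

    def __await__(self):
        # Delegate the waiting to the iterator behind asyncio.sleep().
        yield from asyncio.sleep(self.delay).__await__()
        # The generator's return value becomes the result of `await`.
        return self.value


async def main():
    return await DelayedValue(42)


awaited_result = asyncio.run(main())
print(awaited_result)
```

Because __await__ contains a yield from, calling it returns a generator, which is what the await keyword drives under the hood.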

What Are Generators?

Generators are functions that leverage the yield keyword. Their primary purpose is to provide a concise and efficient means of creating iterators. The inclusion of yield within generator functions eliminates the need for manual implementation of the iterator protocol.

When the 'yield' statement is reached, the current state of the generator function is preserved and control is handed back to the caller. At this point the generator is suspended, with all its local variables and its execution point stored.

When the generator is advanced again, whether through a for loop or the next() function, it resumes execution from where it was paused by the last 'yield'. This resumption allows the generator to produce the next value in the sequence. The process repeats until the generator function runs to its end or encounters a 'return' statement.
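The suspend and resume behavior is easy to observe by recording when the generator body actually runs. In this sketch, the countdown generator and the events list are illustrative names only.

```python
events = []


def countdown(start: int):
    """Illustrative generator that logs to `events` whenever its body runs."""
    while start > 0:
        events.append(f"about to yield {start}")
        yield start
        start -= 1
    events.append("generator finished")


gen = countdown(2)       # nothing runs yet: the body has not started
events.append("created")

first = next(gen)        # runs the body up to the first yield
events.append("got first")
second = next(gen)       # resumes right after the first yield
rest = list(gen)         # runs to completion; no values are left

for event in events:
    print(event)
```

Note that "created" is logged before anything inside the function: calling a generator function only builds the generator object, and the body starts on the first next().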

Now, this behavior is also associated with coroutines. A coroutine is essentially an awaitable object designed to handle asynchronous operations. The Coroutine type accepts three type parameters: the yield type, the send type, and the return type, just like the Generator type used below.

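from typing import Generator

YieldType = int  # placeholder: whatever type the generator yields


def my_generator() -> Generator[YieldType, None, None]:
    yield 1  # some value
    yield 2  # another value


gen = my_generator()
for value in gen:
    print(value)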

  • The first None represents the type of the value that can be sent into the generator using the send() method (this is optional and not commonly used).
  • The second None represents the type of the value that the generator returns. In most cases, generators don't return a value explicitly, so it's None.
  • If the generator doesn't yield any values, you can replace YieldType with None.
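For completeness, here is a sketch of a generator that uses all three type parameters. running_total is a hypothetical example: it yields ints, accepts ints through send(), and returns a str when it finishes.

```python
from typing import Generator


def running_total() -> Generator[int, int, str]:
    """Hypothetical generator: yields the running total of the numbers sent in."""
    total = 0
    while total < 10:
        received = yield total    # yield type: int, send type: int
        total += received
    return f"stopped at {total}"  # return type: str


gen = running_total()
first = next(gen)       # prime the generator: run up to the first yield
second = gen.send(4)    # resume with received = 4
try:
    gen.send(6)         # total reaches 10, so the generator returns
except StopIteration as stop:
    final = stop.value  # the return value travels on StopIteration

print(first, second, final)
```

The return value riding on StopIteration is the same mechanism that lets await hand a result back to the caller.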

I've referred to Future, so what's a Future? It's an object that is both iterable and awaitable. You might also have encountered AsyncIterable: this is an object that defines an __aiter__ method, which returns an AsyncIterator. Lastly, there's AsyncGenerator, which is just like Generator except that it inherits from AsyncIterator instead of Iterator.
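A small sketch may help tie these together, assuming asyncio as the event loop. The ticker async generator and the 0.01 second delays are invented for the example; the Future is created with loop.create_future() and resolved a moment later by a scheduled callback.

```python
import asyncio
from typing import AsyncGenerator


async def ticker(count: int) -> AsyncGenerator[int, None]:
    """Hypothetical async generator: `async def` combined with `yield`."""
    for i in range(count):
        await asyncio.sleep(0.01)
        yield i


async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()                      # no result yet
    loop.call_later(0.01, future.set_result, "done")   # resolve it shortly
    result = await future                              # paused until set_result runs
    ticks = [i async for i in ticker(3)]               # consume the async generator
    return result, ticks


future_result, ticks = asyncio.run(main())
print(future_result, ticks)
```

Awaiting the Future suspends main() until something else sets its result, which is the "pause until an operation resolves" behavior from the start of this post.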

Read Again

Seems confusing, doesn't it?