Introduction to Asynchronous Programming

Introduction to Asyncio

The Paradigm Shift in Concurrent Programming

Traditionally, concurrent programming has been achieved by utilizing multiple threads. However, as those who have experience with manual threading can attest, writing thread-safe code is incredibly challenging. Furthermore, running multi-threaded programs on single-core processors often results in negligible performance gains, or even degradation, due to the overhead of context switching.

This is why asynchronous programming, which handles concurrency within a single thread, has recently gained significant traction. It has become essential for modern, large-scale applications to efficiently manage tasks such as parallel processing, network communication, and database integration.

Non-blocking I/O and the Evolution of Python

When developing applications like web servers, you will find that the time spent waiting for database or API responses far exceeds the time spent on actual CPU computation. Asynchronous programming prevents the CPU from idling during these I/O-bound wait times, allowing it to handle other tasks instead. This concept is commonly referred to as non-blocking.

While this model is native to languages like JavaScript, it was once a foreign concept to Python, which is fundamentally synchronous by design. However, with the introduction of the asyncio module in Python 3.4 and the official adoption of the async/await syntax in Python 3.5, asynchronous programming is now a built-in standard in the Python ecosystem, requiring no external libraries.

A Simple Test Before We Begin

Before we dive in, let's compare how a synchronous function and an asynchronous function behave when called.

def do_sync():
  return 'sync'

do_sync()
'sync'

async def do_async():
  return 'async'

do_async()
<coroutine object do_async at 0x790685db2fb0>

From the test above, we can see that an async function cannot be called the same way as a sync function: calling it directly only returns a coroutine object. We need to use the await keyword.

await do_async()
'async'

Now the string ‘async’ is returned correctly. The behavior mirrors JavaScript, where calling an async function gives you a Promise object that must be awaited to obtain its value.
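
Note that top-level await works here only because these snippets are run in an interactive environment such as IPython or Jupyter. In a regular .py script you would drive the coroutine with asyncio.run() instead; a minimal sketch:

import asyncio

async def do_async():
  return 'async'

# asyncio.run() creates an event loop, runs the coroutine to completion,
# and then closes the loop, so no top-level await is needed.
print(asyncio.run(do_async()))  # prints: async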

Synchronous Function Example

import time

def find_users_sync(n):
  for i in range(1, n+1):
    print(f'{i}th user is being searched out of {n} people...')
    time.sleep(1) # intentional 1 second delay
  print(f'Total {n} people have been searched!')
def process_sync():
  start = time.time()
  find_users_sync(3)
  find_users_sync(2)
  find_users_sync(1)
  end = time.time()
  print(f' >>> total time taken for sync process: {end-start}')
process_sync()
1th user is being searched out of 3 people...
2th user is being searched out of 3 people...
3th user is being searched out of 3 people...
Total 3 people have been searched!
1th user is being searched out of 2 people...
2th user is being searched out of 2 people...
Total 2 people have been searched!
1th user is being searched out of 1 people...
Total 1 people have been searched!
 >>> total time taken for sync process: 6.00173020362854

The above shows how a single-threaded, synchronous web server would behave: the second function can only start once the first one has completely finished.

Asynchronous Function Example

import asyncio

async def find_users_async(n):
  for i in range(1, n+1):
    print(f'{i}th user is being searched out of {n} people...')
    await asyncio.sleep(1) # non-blocking 1 second delay
  print(f'Total {n} people have been searched!')
async def process_async():
  start = time.time()
  await asyncio.gather(  # gather() lets the event loop schedule the three coroutines and run them concurrently
      find_users_async(3),
      find_users_async(2),
      find_users_async(1),
  )
  end = time.time()
  print(f' >>> total time taken for async process: {end-start}')
process_async()
<coroutine object process_async at 0x790684cea980>

Again, do not call this without the await keyword; otherwise you only get a coroutine object back, not the result.

await process_async()
1th user is being searched out of 3 people...
1th user is being searched out of 2 people...
1th user is being searched out of 1 people...
2th user is being searched out of 3 people...
2th user is being searched out of 2 people...
Total 1 people have been searched!
3th user is being searched out of 3 people...
Total 2 people have been searched!
Total 3 people have been searched!
 >>> total time taken for async process: 3.004930019378662

By switching to an asynchronous approach, the execution time dropped from 6 seconds to 3 seconds: the total is now bounded by the longest single task rather than the sum of all tasks, because the I/O-bound waits overlap instead of running back to back.
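
The speedup comes from await asyncio.sleep(1) being non-blocking, not from gather() alone: if the workers blocked with time.sleep() instead, the event loop could never switch between them and the total would be back to roughly 6 seconds. A minimal sketch to verify this (the worker and helper names below are mine, not from the example above):

import asyncio, time

async def blocking_worker(n):
  time.sleep(n)           # blocks the entire event loop for n seconds
  return n

async def non_blocking_worker(n):
  await asyncio.sleep(n)  # suspends this coroutine so others can run
  return n

async def timed(coros):
  start = time.time()
  await asyncio.gather(*coros)
  return time.time() - start

print(asyncio.run(timed([blocking_worker(3), blocking_worker(2), blocking_worker(1)])))              # ~6 seconds
print(asyncio.run(timed([non_blocking_worker(3), non_blocking_worker(2), non_blocking_worker(1)])))  # ~3 seconds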

Why This Matters for LLMOps Engineers

Since LLM API calls and database lookups are typical I/O-bound tasks with significant latency, asyncio provides a non-blocking structure that prevents resource waste: the CPU can handle other work while waiting for network responses. This enables concurrent handling of multiple API calls in RAG pipelines or multi-agent environments, drastically reducing end-to-end latency.
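
As a rough sketch of that pattern (call_llm below is a hypothetical stand-in that simulates one second of network latency with asyncio.sleep, not a real client):

import asyncio, time

async def call_llm(prompt):
  # Hypothetical stand-in for a real LLM API request; asyncio.sleep(1)
  # simulates network latency without blocking the event loop.
  await asyncio.sleep(1)
  return f'response to: {prompt}'

async def answer_all(prompts):
  # Dispatch every request at once; wall time is roughly the slowest single call.
  return await asyncio.gather(*(call_llm(p) for p in prompts))

start = time.time()
results = asyncio.run(answer_all(['q1', 'q2', 'q3']))
print(results)
print(f'elapsed: {time.time() - start:.2f}s')  # about 1 second, not 3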

Furthermore, mastering asyncio is essential for building high-performance serving layers (e.g., using FastAPI) by efficiently managing large-scale concurrent connections on a single thread. Ultimately, the ability to ensure high throughput and reduce operational costs with limited resources is a core competitive advantage for any LLMOps engineer.
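
A minimal sketch of what such a serving layer can look like, assuming FastAPI is installed; fetch_answer is a hypothetical helper standing in for an awaited LLM, API, or database call:

import asyncio
from fastapi import FastAPI

app = FastAPI()

async def fetch_answer(query: str) -> str:
  # Hypothetical helper: placeholder for a real awaited I/O call.
  await asyncio.sleep(1)
  return f'answer for {query}'

@app.get('/ask')
async def ask(q: str):
  # While this request awaits I/O, the same thread can serve other requests.
  return {'answer': await fetch_answer(q)}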

This post is licensed under CC BY 4.0 by the author.