
High Performance Asynchronous Programming Framework

based on Linux Asynchronous I/O and Coroutine

Background of the Study

When talking about the performance of I/O (Input/Output) operations, such as file transfers and network communication...

There are two drawbacks in the traditional approach:

  1. It uses synchronous I/O.
    • Application processing cannot continue until the I/O operation has completed.
  2. It uses multithreading.
    • The cost of thread switching is very high.

Both lead to poor throughput in high-concurrency systems.

Modern Solutions

We can improve the situation in two steps:

  1. Replace synchronous I/O with asynchronous I/O.
  2. Use coroutines instead of one thread per task.

Asynchronous I/O vs Synchronous I/O

  1. A synchronous I/O operation blocks the caller until the I/O request has completed.
  2. An asynchronous I/O operation runs in the background and does not block the caller.

Asynchronous I/O improves performance because I/O operations and application processing can overlap.

[Figure: synchronous vs. asynchronous I/O]
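
As a rough illustration of the overlap described above, the sketch below uses Python's standard `asyncio` module, with `asyncio.sleep` standing in for a real I/O request. The function names and the 0.1-second delay are assumptions made for this example. It is not code from the framework.

```python
import asyncio
import time

NAMES = ("a", "b", "c")


async def fake_io(name, delay):
    """Stand-in for an I/O request: waits without blocking the thread."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def run_sequential(delay):
    # Synchronous style: each request finishes before the next one starts.
    return [await fake_io(name, delay) for name in NAMES]


async def run_overlapped(delay):
    # Asynchronous style: all three requests are in flight at the same time.
    return list(await asyncio.gather(*(fake_io(name, delay) for name in NAMES)))


def timed(coro_fn, delay=0.1):
    """Run one of the two variants and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_fn(delay))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    for fn in (run_sequential, run_overlapped):
        results, elapsed = timed(fn)
        print(f"{fn.__name__}: {results} in {elapsed:.2f}s")
```

Both variants return the same results. The overlapped one should take roughly the time of a single request, because the three waits happen concurrently.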

io_uring: Asynchronous I/O API for Linux

io_uring is an asynchronous I/O API for Linux (introduced in kernel 5.1) with very low overhead.

  1. The application submits I/O requests and then continues with other tasks.
  2. Once an I/O request has completed, the application reads the result from a completion queue in memory shared with the kernel.

[Figure: io_uring]
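
The submit-then-collect flow above can be pictured with a toy in-memory model. This is only a sketch: `ToyRing` and its methods are hypothetical names invented for this example, and the real io_uring interface is a C system-call API, not this class.

```python
from collections import deque


class ToyRing:
    """Toy model of io_uring's two queues; NOT the real kernel interface."""

    def __init__(self):
        self.submission_queue = deque()   # requests written by the application
        self.completion_queue = deque()   # results written by the "kernel"

    def submit(self, user_data, operation):
        # The application enqueues a request and returns immediately.
        self.submission_queue.append((user_data, operation))

    def kernel_process(self):
        # Stand-in for the kernel: run each request and post its result.
        while self.submission_queue:
            user_data, operation = self.submission_queue.popleft()
            self.completion_queue.append((user_data, operation()))

    def reap(self):
        # The application collects whatever results are available in memory.
        done = list(self.completion_queue)
        self.completion_queue.clear()
        return done


def demo():
    ring = ToyRing()
    ring.submit("req-1", lambda: len("hello"))
    ring.submit("req-2", lambda: 2 + 3)
    before = ring.reap()        # nothing has completed yet
    ring.kernel_process()       # in reality this happens in the background
    return before, ring.reap()
```

The `user_data` tag is how the application matches a completion to the request that produced it. The real API carries a similar per-request value.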

Coroutine: suspendable and resumable Function

Coroutines are functions whose execution can be suspended and resumed later.

  1. A coroutine keeps its local state while suspended, so it continues exactly where it left off.
  2. Advantage over threads: lower overhead for creation and switching.
  3. Disadvantage compared with threads: coroutines running on a single thread are concurrent but not parallel, so on their own they cannot use multiple CPU cores.

[Figure: coroutine]
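
A minimal sketch of suspension and resumption, using Python generators as coroutines. The names `countdown_task` and `run_interleaved` are made up for this illustration. Each `yield` is a suspension point, and `next()` resumes the function where it stopped.

```python
def countdown_task(name, steps):
    """Generator-based coroutine: each `yield` is a suspension point."""
    for remaining in range(steps, 0, -1):
        # Suspend here and hand a status message back to the caller.
        yield f"{name}: {remaining} step(s) left"
    return f"{name}: finished"


def run_interleaved(tasks):
    """Resume each coroutine in turn on one thread until all have finished."""
    log = []
    active = list(tasks)
    while active:
        for task in list(active):
            try:
                log.append(next(task))      # resume until the next yield
            except StopIteration as stop:
                log.append(stop.value)      # the coroutine returned
                active.remove(task)
    return log


if __name__ == "__main__":
    for line in run_interleaved([countdown_task("A", 2), countdown_task("B", 1)]):
        print(line)
```

The two tasks make progress alternately on a single thread, which is the concurrency-without-parallelism trade-off noted in the list above.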

Research Objectives

The goal is to design and implement a high-performance asynchronous programming framework, similar in role to OpenMPI but with distinct features, built on Linux io_uring and coroutines.

A new kind of Programming Framework like OpenMPI

It makes building high-throughput systems easier!
And it can work together with OpenMPI for even higher performance!

The Framework vs OpenMPI

Commonalities with OpenMPI:

  1. Both aim to enhance system performance.
  2. Both apply to systems where multiple tasks are executed concurrently.

Differences from OpenMPI:

  1. It's designed for I/O-intensive tasks while OpenMPI is designed for compute-intensive tasks.
  2. It typically works on a single computer while OpenMPI is primarily used across multiple computers or compute nodes.

Literature Review

Some relevant literature:

Architectural Design

Task = Coroutine

Components: IoUring, Waiting Queue, Executor, Task

  1. Create a task.
  2. The task submits an I/O request to IoUring.
  3. Push the task into the Waiting Queue.
  4. The I/O request completes.
  5. Pop the task from the Waiting Queue.
  6. The Executor resumes and executes the task.
  7. If there is no further I/O, the task finishes.
  8. If the task reaches its next I/O request, suspend the task.
  9. Repeat.
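
The loop above can be sketched in a few dozen lines. This is a simplified model under stated assumptions, not the actual Kuro or Emma implementation: `FakeIoUring`, `Executor`, and `copy_task` are hypothetical names, tasks are Python generators that yield their I/O requests as callables, the fake ring completes every request as soon as it is polled, and the waiting queue is a dictionary keyed by task id so a completion can find its task.

```python
from collections import deque


class FakeIoUring:
    """Hypothetical stand-in for io_uring: completes requests when polled."""

    def __init__(self):
        self._in_flight = deque()

    def submit(self, task_id, request):
        self._in_flight.append((task_id, request))

    def poll_completions(self):
        # Pretend every in-flight request has completed and return the results.
        completed = [(task_id, request()) for task_id, request in self._in_flight]
        self._in_flight.clear()
        return completed


class Executor:
    def __init__(self):
        self.ring = FakeIoUring()
        self.waiting = {}       # waiting queue: task id -> suspended task
        self.results = {}       # task id -> return value of a finished task
        self._next_id = 0

    def spawn(self, task):
        # Create a task, then run it up to its first I/O request.
        task_id = self._next_id
        self._next_id += 1
        self._advance(task_id, task, None)
        return task_id

    def _advance(self, task_id, task, io_result):
        try:
            request = task.send(io_result)          # resume and execute
        except StopIteration as stop:
            self.results[task_id] = stop.value      # no further I/O: finish
            return
        self.ring.submit(task_id, request)          # submit the I/O request
        self.waiting[task_id] = task                # suspend: push to the queue

    def run(self):
        while self.waiting:                                           # repeat
            for task_id, io_result in self.ring.poll_completions():   # complete
                task = self.waiting.pop(task_id)                      # pop
                self._advance(task_id, task, io_result)
        return self.results


def copy_task(text):
    data = yield (lambda: text.upper())     # first I/O request, e.g. a "read"
    written = yield (lambda: len(data))     # second I/O request, e.g. a "write"
    return f"{data}:{written}"


def demo():
    executor = Executor()
    executor.spawn(copy_task("ab"))
    executor.spawn(copy_task("xyz"))
    return executor.run()
```

Calling `demo()` drives both tasks through two rounds of submit, complete, pop, and resume before they finish. A real executor would block on the kernel's completion queue instead of completing requests inline.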

Into the Code

Without the framework...

The code is very complex and ugly!

Just imagine the field of HPC without OpenMPI.

But with the framework...

The code is simple and elegant!

That's why we say it makes building high-throughput systems easier!
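
The contrast can be hinted at with a generic example. The sketch below is not the framework's code: it uses Python's standard `asyncio` module, and all function names and the fake read and write steps are assumptions made for illustration. The first function chains two steps through callbacks. The second expresses the same two steps as a coroutine.

```python
import asyncio


def read_then_write_callbacks(loop, text, on_done):
    """Callback style: each step must name the function that handles its result."""

    def on_read(data):
        loop.call_soon(on_write, data, len(data))   # schedule the fake "write"

    def on_write(data, written):
        on_done(f"{data}:{written}")

    loop.call_soon(on_read, text.upper())           # schedule the fake "read"


async def fake_read(text):
    await asyncio.sleep(0)      # yield to the event loop, as a real await would
    return text.upper()


async def fake_write(data):
    await asyncio.sleep(0)
    return len(data)


async def read_then_write_coroutine(text):
    """Coroutine style: the same two steps read top to bottom."""
    data = await fake_read(text)
    written = await fake_write(data)
    return f"{data}:{written}"


async def main():
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    read_then_write_callbacks(loop, "ab", future.set_result)
    from_callbacks = await future
    from_coroutine = await read_then_write_coroutine("ab")
    return from_callbacks, from_coroutine


if __name__ == "__main__":
    print(asyncio.run(main()))
```

With only two steps the callback version is still manageable. The gap widens as error handling, loops, and more I/O steps are added, because the control flow is split across more and more handler functions.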

Results: Frameworks in C++ (Kuro) and in Rust (Emma)


Benchmark

  1. Comparison subjects:
    • Tokio: the most widely known asynchronous runtime for the Rust programming language
    • async-std: another asynchronous runtime for the Rust programming language
    • Go: goroutines in the Go programming language
    • Sync: the traditional solution using synchronous I/O and multithreading
  2. Testing methodology:
    • Implement an HTTP server with each framework.
    • Load-test each server with the ApacheBench (ab) tool.
    • Compare the test results.
  3. Performance metric: server throughput
  4. Testing environment:
    • CPU: Intel(R) Xeon(R) Gold 5117 CPU @ 2.00GHz, 56 cores
    • Memory: 32 GB, 2666 MT/s x 16
    • NIC: Intel Corporation Ethernet Connection X722 for 10GbE SFP+ (rev 09)

[Figure: benchmark]

Benchmark Results

  1. The X-axis represents the number of HTTP requests, and the Y-axis represents server throughput.
  2. C is the concurrency level of the network requests, and L is the number of processing cores used by the server.
  3. Emma and Kuro achieve the highest throughput, while the Go implementation has the lowest.
  4. Kuro improves throughput by up to 78% compared with the synchronous I/O, multithreaded baseline (Sync).

[Figure: test-result]
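
For readers who want to reproduce the arithmetic behind a figure such as "up to 78%", the sketch below shows one common way to compute throughput and relative improvement. The request counts and durations are invented for illustration only and are not measurements from this benchmark. The function names are likewise assumptions.

```python
def throughput(requests_completed, elapsed_seconds):
    """Requests served per second."""
    return requests_completed / elapsed_seconds


def improvement_percent(candidate, baseline):
    """Relative throughput gain of `candidate` over `baseline`, in percent."""
    return (candidate - baseline) * 100 / baseline


if __name__ == "__main__":
    # Hypothetical numbers, chosen only so the example yields a round result.
    baseline = throughput(10_000, 10.0)     # 1000 requests/s
    candidate = throughput(17_800, 10.0)    # 1780 requests/s
    print(f"{improvement_percent(candidate, baseline):.0f}% improvement")
```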

Q & A

Thank you!