by Arpit Kumar
23 Jul, 2023
9 minute read
Understanding Virtual Threads in Java (JDK21)

Project Loom, an OpenJDK initiative, introduces virtual threads to Java: lightweight, efficient alternatives to traditional threads. Virtual threads simplify concurrency, maintain backward compatibility, and excel at I/O-bound tasks, offering high scalability and resource efficiency. They implement M:N scheduling and provide a seamless approach to thread management in Java applications.


Before we start with virtual threads, let's first understand what threads are and how they help with the execution of a program.

A short explanation of Threads

A thread is essentially a path of execution within a program, allowing multiple tasks to be processed concurrently. Each thread represents an independent flow of control that can perform operations independently of other threads. Think of threads as individual workers in a large factory, each assigned specific tasks that can be executed simultaneously.

Process -> Multiple Threads -> Request Processing

In the Java Virtual Machine, threads can have different priorities, ranging from the lowest to the highest. Threads with higher priority are given preference in execution over threads with lower priority. Setting thread priorities is useful when you want to prioritize certain tasks over others to ensure critical operations receive more attention from the system.

Additionally, threads can be marked as daemon threads, which means they run in the background and do not prevent the JVM from shutting down once all non-daemon threads have completed their tasks. Daemon threads are typically used for auxiliary operations that support the main functionality of an application, such as garbage collection.
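To make this concrete, here is a minimal sketch of the java.lang.Thread APIs described above; the tasks are invented for the example:

```java
public class ThreadBasics {
    public static void main(String[] args) throws InterruptedException {
        // An ordinary worker thread with an explicit priority.
        Thread worker = new Thread(() -> System.out.println("handling a task"));
        worker.setPriority(Thread.MAX_PRIORITY); // scheduling hint, MIN_PRIORITY..MAX_PRIORITY

        // A daemon thread: it will not keep the JVM alive on its own.
        Thread housekeeping = new Thread(() -> System.out.println("background cleanup"));
        housekeeping.setDaemon(true); // must be set before start()

        worker.start();
        housekeeping.start();
        worker.join(); // wait for the non-daemon worker to finish
    }
}
```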

Main Thread and JVM Startup

When a Java Virtual Machine starts up, it usually begins with a single non-daemon thread. This thread is known as the “main thread” and is responsible for executing the main method in a designated class. From this point, the main thread can create additional threads to handle various tasks concurrently.

Thread Execution and Termination

Thread execution continues until one of two conditions is met:

  1. The exit method of the Runtime class has been called, and the security manager allows the exit operation to proceed. This typically results in the JVM shutting down, terminating all threads, including daemon threads.
  2. All non-daemon threads have completed their tasks, either by returning from the run method (the entry point of a thread’s execution logic) or by encountering an unhandled exception that propagates beyond the run method.

What resources each thread consumes on initialization

The amount of resources each thread consumes on initialization depends on the operating system, the programming language, and the underlying hardware architecture. However, there are some common resources that are allocated for each thread during initialization:

  • Thread Stack: One of the primary resources allocated to a thread is its stack. The stack is used to store local variables, function call frames, and other thread-specific data. The size of the stack is typically fixed or configurable and can vary based on the platform. For example, in Java, the default thread stack size is often around 1 MB, but it can be adjusted using JVM options; a per-thread stack size can also be requested in code, as shown in the sketch after this list.

  (java -Xss1m, or java -XX:ThreadStackSize=1024 where the value is in kilobytes)

  • Thread Control Block (TCB): The operating system maintains a data structure called the Thread Control Block (TCB) for each thread. This data structure contains information about the thread’s state, including its register values, program counter, and other essential details that the operating system needs to manage the thread.
  • Program Counter (PC) and Registers: Each thread has its own program counter and register set. These store the thread's current execution state, including the instruction pointer and the values of variables it is working on.
  • Thread-Specific Data: Some additional data may be allocated for the thread, such as thread-specific variables or data structures.
  • Thread Identifier (TID): The operating system assigns a unique identifier to each thread. This identifier is used to distinguish between different threads in the system.
  • Memory for Thread Management: Some memory is needed for the internal management of the thread by the operating system.
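As an illustration of the stack-size point, the Thread constructor that takes a stackSize argument lets you request a per-thread stack in bytes; note that the JDK documents this value as highly platform dependent, so treat it as a hint:

```java
public class StackSizeDemo {
    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println("running with a custom stack");

        // Thread(ThreadGroup, Runnable, String, long stackSize) requests a
        // per-thread stack size in bytes; the JVM may ignore or round it.
        Thread small = new Thread(null, task, "small-stack", 256 * 1024);
        small.start();
        small.join();
    }
}
```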

It’s important to note that the memory allocated for each thread, particularly the stack size, is a finite resource. Allocating a large number of threads with significant stack sizes can quickly exhaust the available memory, leading to resource contention and potential performance issues.

As a developer, it’s essential to be mindful of the number of threads being used in an application and to consider the stack size and other resource requirements of each thread. Some programming languages and platforms provide ways to configure or limit the stack size to control memory usage. Proper thread management, such as using thread pools, can also help optimize resource utilization and improve the overall efficiency of multi-threaded applications.

Balancing Concurrency and Resource Management

Developers have been employing different strategies to handle multiple client requests simultaneously. One popular approach is the “Thread-per-Request” style of programming, where each incoming request is assigned its own dedicated thread to handle the task. While this approach has been widely used for nearly three decades and offers some benefits, it also comes with inherent challenges related to thread management and resource consumption.

Thread-per-Request Approach

In a thread-per-request model, when a new request arrives, the application creates a new thread to process that specific request. This way, each request is handled independently and concurrently with others, giving the impression of seamless responsiveness to users. The convenience of this model lies in its simplicity: each thread processes a single request, and developers can write request-handling code sequentially, mirroring the flow of each individual task.
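A minimal sketch of this style, assuming a bare-bones socket server and a hypothetical handle method:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class ThreadPerRequestServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(8080)) {
            while (true) {
                Socket connection = server.accept();
                // One dedicated thread per incoming request.
                new Thread(() -> handle(connection)).start();
            }
        }
    }

    static void handle(Socket connection) {
        // Hypothetical handling; a real server would read the request,
        // process it, and write a response before closing the socket.
        try (connection) {
            System.out.println("handling " + connection.getRemoteSocketAddress());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```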

Thread Pooling and Resource Management

However, as the number of incoming requests increases, the thread-per-request approach can quickly exhaust system resources. Creating and managing a large number of threads incurs a substantial memory overhead, as each thread requires its own stack and thread-specific data structures. Additionally, frequent thread creation and destruction add management overhead, impacting the application’s overall performance.

To address these concerns, developers have employed techniques like thread pooling, where a fixed number of threads are created at the start of the application and remain active throughout its lifecycle. Instead of creating a new thread for each incoming request, the thread pool reuses idle threads to handle new requests. This approach significantly reduces the overhead associated with thread creation and allows for better resource utilization.
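A minimal sketch of thread pooling with the standard ExecutorService API; the pool size and request count are arbitrary for the example:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PooledServer {
    public static void main(String[] args) {
        // A fixed pool created once at startup; idle threads are reused
        // instead of creating a new thread for every request.
        ExecutorService pool = Executors.newFixedThreadPool(200);
        for (int i = 0; i < 1_000; i++) {
            int requestId = i;
            pool.submit(() -> System.out.println("request " + requestId
                    + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown(); // stop accepting tasks; queued work still completes
    }
}
```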

Lower Concurrency and Scalability

One limitation of the thread-per-request model is its potential to limit the overall concurrency and scalability of the application. Since each request is tied to its own thread, the system’s ability to handle a large number of concurrent requests may be constrained by the available CPU cores and memory.

Asynchronous Programming Approach

Another paradigm developers often turn to is asynchronous programming: a powerful model that allows tasks to be executed concurrently without relying on the traditional thread-per-request approach. Asynchronous programming enables better resource utilization and responsiveness, making it an effective strategy for enhancing application scalability. It is, essentially, a thread-sharing style.

Understanding Asynchronous Programming

In contrast to the synchronous programming style, where tasks are executed sequentially, asynchronous programming decouples the execution of tasks from their invocation. Instead of waiting for each task to complete before moving on to the next one, asynchronous tasks are initiated and executed independently. The application can continue processing other tasks while waiting for the asynchronous tasks to finish.

At the core of asynchronous programming are non-blocking operations. When a task involves I/O, instead of blocking a thread while waiting for the result, asynchronous programming allows the application to move on to other tasks. Once the I/O operation completes, a callback function is invoked to handle the result.

This style returns the thread to the pool to perform other tasks instead of leaving it waiting on I/O, which results in better resource utilization. In asynchronous programming, each stage of a request may be fulfilled by a different thread.
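A minimal sketch of this thread-sharing style using CompletableFuture; fetchFromRemote is a hypothetical stand-in for a blocking I/O call:

```java
import java.util.concurrent.CompletableFuture;

public class AsyncPipeline {
    public static void main(String[] args) {
        // The I/O call is started asynchronously; the calling thread is free
        // to do other work instead of blocking on the result.
        CompletableFuture
                .supplyAsync(AsyncPipeline::fetchFromRemote) // runs on a pool thread
                .thenApply(body -> "processed: " + body)     // may run on a different thread
                .thenAccept(System.out::println)
                .join(); // only for the demo, so main waits for the pipeline
    }

    static String fetchFromRemote() {
        // Hypothetical stand-in for a blocking remote call.
        return "payload";
    }
}
```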

Benefits of Asynchronous Programming for Scalability

  • Efficient Resource Utilization: Asynchronous programming reduces the need for a large number of threads, as tasks can be executed concurrently without blocking the application. This approach optimizes resource utilization and memory consumption, enabling the application to handle a larger number of concurrent requests.
  • Enhanced Responsiveness: By eliminating blocking operations, asynchronous programming enhances application responsiveness. The application can quickly respond to new requests and continue processing other tasks while waiting for asynchronous operations to complete. This improved responsiveness leads to a better user experience.
  • Parallelism and Throughput: By executing multiple tasks concurrently, asynchronous programming can achieve parallelism and increase overall throughput. This is particularly advantageous for applications that deal with high volumes of I/O-bound operations, such as web servers or network applications.

Challenges and Considerations

While asynchronous programming offers significant benefits, it also introduces some challenges that developers must be mindful of:

  • Complexity: Asynchronous code can be more complex and harder to reason about compared to synchronous code. Careful design and error handling are crucial to prevent issues like callback hell and race conditions.
  • Callback Management: Managing callbacks and coordinating the flow of asynchronous tasks can become challenging, particularly in complex scenarios. Libraries and frameworks that support asynchronous programming can help alleviate some of these challenges.
  • Error Handling: Error handling can be more intricate in asynchronous code, as exceptions may not propagate through traditional call stacks. Proper handling and logging of errors are essential for debugging and maintaining application stability.

Asynchronous programming is a powerful approach to improve application scalability, responsiveness, and resource utilization. By embracing non-blocking operations and efficiently managing asynchronous tasks, developers can create highly scalable applications that deliver exceptional performance even under heavy workloads. While it introduces new challenges, the benefits of asynchronous programming make it a compelling choice for modern, high-performance applications.

I think we are now in a good position, with enough context, to understand the what and why of virtual threads, especially their implementation.

What are Project Loom and virtual threads

Project Loom is an ongoing project by the OpenJDK community to introduce a lightweight concurrency construct to Java. The goal of Project Loom is to make it easier to write high-performance, concurrent applications by providing a more efficient and flexible way to manage threads.

Project Loom introduces a new type of thread called a virtual thread. Virtual threads are lightweight and efficient, and they can be created and destroyed much more cheaply than OS threads. This makes them ideal for applications that need to run a large number of concurrent tasks. Project Loom aims to preserve the thread-per-request style with virtual threads.

As per JEP-444, the team kept certain goals in mind while implementing virtual threads:

  • Enable server applications written in the simple thread-per-request style to scale with near-optimal hardware utilization.
  • Enable existing code that uses the java.lang.Thread API to adopt virtual threads with minimal change.
  • Enable easy troubleshooting, debugging, and profiling of virtual threads with existing JDK tools.

A virtual thread is an instance of java.lang.Thread that is not tied to a particular OS thread. A platform thread, by contrast, is an instance of java.lang.Thread implemented in the traditional way, as a thin wrapper around an OS thread.
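Here is a minimal sketch showing both kinds of thread created through the same java.lang.Thread API (the thread names are arbitrary):

```java
public class VirtualVsPlatform {
    public static void main(String[] args) throws InterruptedException {
        // A platform thread: a thin wrapper around an OS thread.
        Thread platform = Thread.ofPlatform().name("platform-1")
                .start(() -> System.out.println(Thread.currentThread()));

        // A virtual thread: scheduled by the JVM, not tied to one OS thread.
        Thread virtual = Thread.ofVirtual().name("virtual-1")
                .start(() -> System.out.println(Thread.currentThread()));

        platform.join();
        virtual.join();
    }
}
```

Printing Thread.currentThread() makes the difference visible: the platform thread prints as an ordinary Thread, while the virtual thread prints with a VirtualThread prefix along with the carrier thread it happens to be mounted on.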

Virtual threads preserve the thread-per-request approach: a virtual thread runs for the entire duration of the request, but consumes an OS thread only while it is actually performing computation.

This approach offers scalability similar to the asynchronous style but with the advantage of a seamless programming model. When code running on a virtual thread encounters a blocking I/O operation in the java.* APIs, the runtime performs a non-blocking OS call instead and suspends the virtual thread until the operation completes.

Virtual threads are lightweight and practically unlimited in quantity. This efficient hardware utilization enables high concurrency and, consequently, superior throughput, all while maintaining harmony with the multi-threaded design of the Java Platform and its associated tooling.

As virtual threads are very lightweight, they do not need to be pooled the way platform threads usually are. In execution style they are comparable to Go's goroutines.
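Since pooling is unnecessary, the idiomatic pattern is one fresh virtual thread per task, which the JDK exposes directly as an executor. A minimal sketch (the task count and sleep are arbitrary):

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class VirtualThreadStyle {
    public static void main(String[] args) {
        // No pooling: a new virtual thread is created for every task.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    Thread.sleep(Duration.ofSeconds(1)); // blocking call; only parks the virtual thread
                    return null;
                });
            }
        } // close() waits for all submitted tasks to finish
    }
}
```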

The number of virtual threads can be extremely high compared to the number of OS threads, as multiple virtual threads share a single OS thread.

Virtual threads employ M:N scheduling, where a large number (M) of virtual threads is scheduled to run on a smaller number (N) of OS threads.
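A small sketch that makes the M:N multiplexing visible; with 100 virtual threads the output typically shows only a handful of distinct carrier (ForkJoinPool worker) threads:

```java
public class CarrierThreadsDemo {
    public static void main(String[] args) throws InterruptedException {
        // Many virtual threads (M) multiplexed over few carrier OS threads (N).
        // Each line looks like: VirtualThread[#23,vt-7]/runnable@ForkJoinPool-1-worker-2
        Thread[] threads = new Thread[100];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = Thread.ofVirtual().name("vt-" + i)
                    .start(() -> System.out.println(Thread.currentThread()));
        }
        for (Thread t : threads) t.join();
    }
}
```

Per JEP-444, the default scheduler is a ForkJoinPool whose parallelism defaults to the number of available processors; it can be tuned with the jdk.virtualThreadScheduler.parallelism system property.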

It needs to be understood that virtual threads (or platform threads) help speed up execution only when tasks are I/O-bound. If a task is compute-bound, increasing the number of virtual or platform threads will not help, as execution is limited by the number of CPUs available to do the computation.

A virtual thread or platform thread cannot make any individual task faster; that is always limited by CPU processing speed. Concurrency can only increase throughput, not reduce latency.

So virtual threads simplify the programming model with their thread-per-request style, while their non-blocking, platform-thread-sharing design delivers the benefits of the asynchronous mode for I/O-bound requests. Because they are implemented within java.lang.Thread itself, most libraries and applications written for platform threads can adopt them with little change. This is like having your cake and eating it too.

For further details, read JEP-444, which also covers debugging, exception handling, and more.

I also wrote about another big feature that became production-ready in JDK 21: generational ZGC. Read it here.
