Java 19 was released yesterday, bringing a feature that Java developers have long been waiting for: virtual threads. Long before Java had this feature, Golang's goroutines were popular and were a big hit in the field of concurrent programming. With the rapid development and promotion of Golang, coroutines seem to have become a must-have feature for any modern language.
Java 19's virtual threads fill this gap. In this article, we introduce virtual threads and compare them with Golang's goroutines to give you a taste of what Java 19 offers.
Java thread model
Java threads vs. virtual threads
Our familiar Java threads map one-to-one onto system kernel threads, and the kernel's thread scheduler is responsible for scheduling them. To increase application performance we add more and more Java threads, and the system then spends a lot of resources on thread context switching while scheduling them.
For decades we have relied on this multithreaded model to solve concurrent programming problems in Java. To increase throughput we have to keep increasing the number of threads, but OS threads are expensive and the number available is limited. Even with thread pools squeezing the most out of each thread, threads often become the performance bottleneck before CPU, network, or memory resources are exhausted, so the hardware never reaches its full potential.
To solve this problem, Java 19 introduces virtual threads. In Java 19, the threads we have used so far are called platform threads, and they still correspond one-to-one with kernel threads. A large number (M) of virtual threads run on a smaller number (N) of platform threads, which in turn map one-to-one onto OS threads (M:N scheduling). The JVM schedules multiple virtual threads onto a given platform thread, and at any moment only one virtual thread executes on a platform thread.
Create Java virtual threads
New thread-related APIs
- Thread.ofVirtual() and Thread.ofPlatform() are new APIs for creating virtual and platform threads.
- Use Thread.startVirtualThread(Runnable) to create a virtual thread and start it in one call.
- Use Thread.isVirtual() to determine whether a thread is virtual.
- Use Thread.join to wait for a virtual thread to finish and Thread.sleep to make a virtual thread sleep.
- Use Executors.newVirtualThreadPerTaskExecutor() to create an ExecutorService that starts a new virtual thread for each task.
- Existing code that uses thread pools and ExecutorService can be interchanged with and migrated to virtual threads.
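To make the list above concrete, here is a minimal sketch (the class name and printed messages are mine, not from the original article) that exercises these APIs:

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CreateVirtualThreads {
    public static void main(String[] args) throws Exception {
        Runnable task = () -> System.out.println("running on " + Thread.currentThread());

        // Build and start a virtual thread with the new builder API.
        Thread vt = Thread.ofVirtual().name("my-virtual-thread").start(task);
        System.out.println(vt.isVirtual()); // true

        // Build and start a platform thread with the matching builder.
        Thread pt = Thread.ofPlatform().name("my-platform-thread").start(task);
        System.out.println(pt.isVirtual()); // false

        // Shortcut: create and start a virtual thread in one call.
        Thread vt2 = Thread.startVirtualThread(task);

        // Wait for the threads to finish.
        vt.join();
        pt.join();
        vt2.join();

        // An ExecutorService that starts a new virtual thread for each task.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            executor.submit(() -> {
                Thread.sleep(Duration.ofMillis(100)); // virtual threads can sleep cheaply
                return "done";
            });
        } // close() waits for submitted tasks to complete
    }
}
```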
Because virtual threads are a preview feature in Java 19, the code in this article needs to be compiled and run as follows.
- Compile the program with javac --release 19 --enable-preview Main.java and run it with java --enable-preview Main.
- Or run the program directly with java --source 19 --enable-preview Main.java.
Performance of platform threads vs. virtual threads
Since virtual threads are meant to solve the shortcomings of platform threads, let's directly compare the performance of platform threads and virtual threads.
The test is simple: execute 10,000 tasks of one second of sleep in parallel and compare the total execution time and the number of system threads used.
To monitor the number of system threads used for the test, write the following code.
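The monitoring listing did not survive in this extract. A sketch along the lines the next sentence describes might look like the following, assuming the count is read from the JVM's ThreadMXBean (which in JDK 19 reports only platform threads, not virtual threads):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ThreadMonitor {
    // Print the current number of platform threads once per second.
    public static void start() {
        ThreadMXBean threadBean = ManagementFactory.getThreadMXBean();
        ScheduledExecutorService monitor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "thread-monitor");
            t.setDaemon(true); // do not keep the JVM alive just for monitoring
            return t;
        });
        monitor.scheduleAtFixedRate(
                () -> System.out.println("platform threads: " + threadBean.getThreadCount()),
                0, 1, TimeUnit.SECONDS);
    }
}
```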
A scheduled thread pool fetches and prints the number of system threads every second, making it easy to observe how many threads are in use.
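The benchmark listing is also missing from the extract. A sketch with the shape described in the text, 10,000 tasks that each sleep for one second, submitted to a supplied ExecutorService, with the elapsed time printed at the end, could look like this (the runTasks helper and the ThreadMonitor class from the previous sketch are my own names):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThroughputTest {
    static void runTasks(ExecutorService executor) {
        Instant start = Instant.now();
        try (executor) { // close() waits for all submitted tasks to finish
            for (int i = 0; i < 10_000; i++) {
                executor.submit(() -> {
                    Thread.sleep(1000); // simulate one second of blocking work
                    return null;
                });
            }
        }
        System.out.println("elapsed: " + Duration.between(start, Instant.now()).toMillis() + " ms");
    }

    public static void main(String[] args) {
        ThreadMonitor.start(); // hypothetical monitor from the previous sketch
        // Swap in the executor under test:
        runTasks(Executors.newCachedThreadPool());
        // runTasks(Executors.newFixedThreadPool(200));
        // runTasks(Executors.newVirtualThreadPerTaskExecutor());
    }
}
```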
First we use Executors.newCachedThreadPool() to execute the 10,000 tasks. Because the maximum number of threads in newCachedThreadPool is Integer.MAX_VALUE, in theory at least a few thousand system threads will be created.
The output (redundant lines omitted) shows that at most 3914 system threads are created; the program then throws an exception while trying to create more threads and terminates. Trying to improve system performance with a huge number of system threads is not realistic, because threads are expensive and resources are limited.
Now we use a thread pool with a fixed size of 200 to avoid requesting too many system threads.
With the fixed-size thread pool there is no failure from creating too many system threads: the tasks run normally, a maximum of 207 system threads are created, and the total time is 50436 ms.
Let’s take a look at the results of using virtual threads.
The only difference from the fixed-size version is that Executors.newFixedThreadPool(200) is replaced by Executors.newVirtualThreadPerTaskExecutor().
The output shows a total execution time of 1582 ms and a maximum of 15 system threads used. The conclusion is clear: in this test, virtual threads are much faster than platform threads and consume far fewer system threads.
If we replace the task in this test with one that performs one second of computation (e.g., sorting a huge array) instead of sleeping for one second, there is no significant gain even if we increase the number of virtual or platform threads far beyond the number of processor cores. Virtual threads are not faster threads; they have no advantage over platform threads in how fast they run code. They exist to provide higher throughput, not higher speed (lower latency).
The use of virtual threads can significantly increase program throughput if your application has the following two characteristics.
- The program has a large number of concurrent tasks.
- The workload is IO-intensive rather than CPU-bound.
Virtual threads can help increase the throughput of server-side applications because such applications have a large number of concurrent tasks, and these tasks usually have a large number of IO waits.
Virtual Threads vs. Goroutine
Usage Comparison
Go goroutine vs. Java virtual threads
Define a say() method whose body repeatedly sleeps for 100 ms and then prints an index, and execute this method in a goroutine.
Go implementation.
Java implementation.
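Neither the Go nor the Java listing survived in this extract. A minimal Java sketch matching the description (the loop count of five and the argument strings are my assumptions) might be:

```java
public class SayExample {
    static void say(String s) throws InterruptedException {
        for (int i = 0; i < 5; i++) {
            Thread.sleep(100); // sleep 100 ms per iteration
            System.out.println(s + " " + i);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Where Go would write `go say("world")`, Java starts a virtual thread.
        Thread vt = Thread.startVirtualThread(() -> {
            try {
                say("world");
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        say("hello"); // keep the main thread busy as well
        vt.join();
    }
}
```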
You can see that the way coroutines are written in the two languages is very similar. In general, Java virtual threads are a little more verbose to write, while Go can create a goroutine with a single keyword.
Go Channel vs. Java Blocking Queue
In Go programming, goroutines work well with channels. As an example, we use goroutines to calculate the sum of the elements of an array.
Go implementation.
Java implementation.
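The Java listing is not reproduced here either; a sketch under the assumptions the next sentence spells out (an array plus index ranges in place of slices, a LinkedBlockingQueue in place of the channel) could be:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SumExample {
    // Sum the elements in [from, to) and put the partial result on the queue,
    // much like sending the result over a Go channel.
    static void sum(int[] a, int from, int to, BlockingQueue<Integer> queue) {
        int s = 0;
        for (int i = from; i < to; i++) {
            s += a[i];
        }
        queue.add(s);
    }

    public static void main(String[] args) throws InterruptedException {
        int[] a = {7, 2, 8, -9, 4, 0}; // illustrative values
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();

        // Two virtual threads, each summing half of the array.
        Thread.startVirtualThread(() -> sum(a, 0, a.length / 2, queue));
        Thread.startVirtualThread(() -> sum(a, a.length / 2, a.length, queue));

        // take() blocks until a partial sum is available, like receiving from a channel.
        int x = queue.take();
        int y = queue.take();
        System.out.println(x + " " + y + " " + (x + y));
    }
}
```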
Since there are no slices in Java, an array and indexes are used instead; and since there is no channel in Java, a BlockingQueue, which behaves like a pipe, is used in its place.
Comparison of Goroutine implementation principles
Go G-M-P model
The Go language uses a two-level thread model in which goroutines map M:N onto system kernel threads, in line with Java virtual threads. A goroutine is ultimately still handed to an OS thread for execution, but it needs an intermediary to provide context: this is the G-M-P model.
- G: goroutine, similar to a process control block; holds the stack, state, id, function, etc. A G can only be dispatched if it is bound to a P.
- M: machine, a system thread; it binds to a valid P and is then dispatched.
- P: logical processor; holds run queues of Gs. To a G, P is the CPU core; to an M, P is the context.
- sched: the scheduler; holds the GRQ (global run queue), the idle M queue, the idle P queue, locks, and other information.
queues
The Go scheduler has two different run queues.
- GRQ, the global run queue, holds Gs that have not yet been assigned to a P (before Go 1.1 there was only the GRQ, but LRQs were added because locking the global queue caused performance problems and lock waits).
- LRQ, the local run queue; each P has an LRQ that manages the Gs assigned to that P. When the LRQ has no runnable G, the P fetches Gs from the GRQ.
hand off mechanism
When a G performs a blocking operation, G-M-P schedules an idle M to execute the other Gs in the blocked M's LRQ, so that the blocked M does not hold up the execution of those Gs.
- G1 runs on M1, and P's LRQ holds three other Gs.
- G1 makes a synchronous call and blocks M1.
- The scheduler detaches M1 from P; M1 now carries only the blocked G1 and has no P.
- P is bound to an idle M2, and M2 picks other Gs from the LRQ to run.
- When G1 finishes the blocking operation, it moves back to an LRQ, and M1 is placed in the idle queue as a spare.
work stealing mechanism
To maximize hardware utilization, G-M-P uses a work stealing mechanism so that a free M can execute other waiting Gs.
- There are two P’s, P1 and P2.
- If P1’s Gs are all executed and LRQ is empty, P1 starts task stealing.
- In the first case, P1 gets G from GRQ.
- In the second case, P1 gets no G from the GRQ, so it steals Gs from P2's LRQ.
The hand off mechanism keeps a blocked M from stalling runnable Gs, and work stealing keeps an M from sitting idle.
Java Virtual Thread Scheduling
The JDK relies on the operating system's thread scheduler to schedule platform threads, which are implemented on top of OS threads. For virtual threads, the JDK has its own scheduler: instead of assigning virtual threads to system threads directly, it assigns virtual threads to platform threads (this is the M:N scheduling of virtual threads mentioned earlier). The platform threads are then scheduled by the operating system as usual.
The JDK's virtual thread scheduler is a work-stealing ForkJoinPool that operates in FIFO mode. The scheduler's parallelism is the number of platform threads available for scheduling virtual threads; it defaults to the number of available CPU cores and can be adjusted with the system property jdk.virtualThreadScheduler.parallelism. Note that this ForkJoinPool is different from ForkJoinPool.commonPool(), which is used to implement parallel streams and operates in LIFO mode.
ForkJoinPool and ExecutorService work differently. A typical ExecutorService has a single waiting queue that stores its tasks, and its threads take and process tasks from that queue. A ForkJoinPool, by contrast, has a waiting queue per thread: when a task run by a thread spawns another task, the new task is added to that thread's own queue, which is what happens when a parallel stream splits a large task into two smaller ones.
To prevent thread starvation, ForkJoinPool also implements a pattern called task stealing: when a thread's own queue runs out of tasks, the hungry thread can steal tasks from another thread's waiting queue. This is similar to the work stealing mechanism in the Go G-M-P model.
Virtual Thread Execution
Normally, a virtual thread is unmounted from its platform thread when it executes I/O or another blocking operation in the JDK, such as BlockingQueue.take(). When the blocking operation is ready to complete (e.g., bytes have arrived on a socket), the scheduler mounts the virtual thread on a platform thread again to resume execution.
The majority of blocking operations in the JDK unmount the virtual thread from its platform thread, freeing the platform thread for other work. However, a few blocking operations do not unmount the virtual thread and therefore block the platform thread as well, because of limitations at the operating-system level (e.g., many file system operations) or at the JDK level (e.g., Object.wait()). When such operations block a platform thread, the scheduler compensates by temporarily adding platform threads, so the number of platform threads in the scheduler's ForkJoinPool may temporarily exceed the number of available CPU cores. The maximum number of platform threads available to the scheduler can be adjusted with the system property jdk.virtualThreadScheduler.maxPoolSize. This compensation mechanism is similar to the hand off mechanism in the Go G-M-P model.
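For example, both knobs can be set on the command line when launching a program (the values here are arbitrary):

```
java --enable-preview \
     -Djdk.virtualThreadScheduler.parallelism=8 \
     -Djdk.virtualThreadScheduler.maxPoolSize=256 \
     Main
```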
In the following two cases, a virtual thread is pinned to the platform thread running it and cannot be unmounted during blocking operations.
- When executing code inside a synchronized block or method.
- When executing a native method or a foreign function.
Pinning does not affect the correctness of the program, but it can hurt concurrency and throughput. If a virtual thread performs a blocking operation such as I/O or BlockingQueue.take() while it is pinned, its platform thread is blocked for the duration of the operation (if the virtual thread were not pinned, it would simply be unmounted from the platform thread).
How virtual threads are unmounted
We create five unstarted virtual threads via a Stream; each is tasked with printing the current thread, sleeping for 10 milliseconds, and then printing the thread again. We then start these virtual threads and call join() so that the console shows all of the output.
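The listing is missing from the extract; a sketch that matches the description (the class name is mine) could be:

```java
import java.util.List;
import java.util.stream.IntStream;

public class UnmountDemo {
    public static void main(String[] args) throws InterruptedException {
        // Five unstarted virtual threads that print their carrier, sleep, then print again.
        List<Thread> threads = IntStream.range(0, 5)
                .mapToObj(i -> Thread.ofVirtual().unstarted(() -> {
                    System.out.println(Thread.currentThread());
                    try {
                        Thread.sleep(10);
                    } catch (InterruptedException e) {
                        throw new RuntimeException(e);
                    }
                    System.out.println(Thread.currentThread());
                }))
                .toList();
        threads.forEach(Thread::start);
        for (Thread t : threads) {
            t.join(); // wait so the console shows all of the output
        }
    }
}
```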
From the console output, we can see that VirtualThread[#21] first runs on thread 1 of the ForkJoinPool, and continues on thread 4 when it returns from sleep.
Why does the virtual thread jump from one platform thread to another after sleep?
If we read the source code of the sleep method, we find that it has been rewritten in Java 19, and the rewritten method adds a virtual-thread-specific branch.
Digging deeper into the code, we find that the method actually called when a virtual thread sleeps is Continuation.yield.
This means that Continuation.yield moves the stack of the current virtual thread from the platform thread's stack into the Java heap, and then copies the stack of another ready virtual thread from the heap onto the current platform thread's stack to continue execution. Blocking operations such as IO or BlockingQueue.take() trigger a virtual thread switch in the same way sleep does. A virtual thread switch is still a relatively expensive operation, but it is much lighter than a platform thread context switch.
Other
Virtual Threads and Asynchronous Programming
Reactive programming addresses the problem of platform threads blocking while they wait for responses from other systems. Instead of blocking for a response, an asynchronous API notifies you of the result via a callback; when the response arrives, the JVM allocates another thread from the thread pool to process it. As a result, handling a single asynchronous request may involve multiple threads.
Asynchronous programming can reduce the response latency of the system, but the number of platform threads is still limited by the hardware, so system throughput still hits a bottleneck. Another problem is that the steps of an asynchronous program run on different threads, which makes the code difficult to debug and analyze.
Virtual threads improve code quality (they make code easier to write, debug, and analyze) with only small syntactic changes, while retaining the advantage of reactive programming: a dramatic increase in system throughput.
Don’t pool virtual threads
Because virtual threads are very lightweight and each virtual thread is intended to run only a single task for its lifetime, there is no need to pool virtual threads.
ThreadLocal under virtual threads
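The listing did not survive in this extract; a minimal sketch of the behaviour described in the paragraph below might be:

```java
public class ThreadLocalExample {
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    public static void main(String[] args) throws InterruptedException {
        CONTEXT.set("set by the platform thread");

        Thread vt = Thread.startVirtualThread(() -> {
            // The virtual thread does not see the platform thread's value.
            System.out.println("virtual thread before set: " + CONTEXT.get()); // null
            CONTEXT.set("set by the virtual thread");
            System.out.println("virtual thread after set:  " + CONTEXT.get());
        });
        vt.join();

        // The platform thread still sees only its own value.
        System.out.println("platform thread: " + CONTEXT.get());
    }
}
```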
Virtual threads support ThreadLocal in the same way platform threads do: the platform thread cannot see values set by the virtual thread, and the virtual thread cannot see values set by the platform thread, so the platform thread that carries a virtual thread remains transparent to it. However, since millions of virtual threads can be created, think twice before using ThreadLocal in a virtual thread: a million virtual threads means a million thread-local copies of the data they reference, and that many objects can put a heavy burden on memory.
Replacing Synchronized with ReentrantLock
Because synchronized pins the virtual thread to its platform thread, blocking operations inside a synchronized block do not unmount the virtual thread, which hurts the program's throughput; use ReentrantLock instead of synchronized.
Before:
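The original listing is not reproduced here; the hypothetical sketch below guards a blocking HTTP call with synchronized, which keeps the virtual thread pinned to its carrier for the whole call:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class BeforeExample {
    private final HttpClient client = HttpClient.newHttpClient();

    // synchronized pins the virtual thread: the blocking send() below
    // also blocks the carrier platform thread for the whole request.
    public synchronized String fetch(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```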
After:
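The same logic with ReentrantLock, still a sketch under the same assumptions, lets the virtual thread unmount while it blocks, freeing the platform thread:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.locks.ReentrantLock;

public class AfterExample {
    private final HttpClient client = HttpClient.newHttpClient();
    private final ReentrantLock lock = new ReentrantLock();

    // With ReentrantLock the virtual thread can unmount from its carrier
    // while it waits for the lock or blocks in send().
    public String fetch(String url) throws Exception {
        lock.lock();
        try {
            HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } finally {
            lock.unlock();
        }
    }
}
```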
How to migrate
- Directly replace the thread pool with a per-task virtual thread executor. If your project uses CompletableFuture, you can also replace the thread pool that executes asynchronous tasks with Executors.newVirtualThreadPerTaskExecutor().
- Eliminate pooling. Virtual threads are very lightweight and do not need to be pooled.
- Change synchronized to ReentrantLock to reduce the chance of virtual threads being pinned to platform threads.
Summary
This article described the Java thread model, how to use Java virtual threads, how they work, and the scenarios they suit, and compared them with the popular Goroutine, finding similarities between the two implementations; hopefully this helps you understand Java virtual threads. Java 19 virtual threads are a preview feature and are likely to become a standard feature in Java 21, which is worth looking forward to.