The vehicle of long standing for updating the Java language and the JVM has been JDK Enhancement Proposal (JEP) documents, which are prepared using an established template and posted on the OpenJDK website. As of early September 2021, they reach up to JEP 417, which is a proposal for a vector API. While JEPs represent individual proposals, they are frequently adopted as groups of related enhancements that form what the Java team refers to as projects. These projects are named rather randomly, sometimes after things (Loom, where threads are turned into cloth) or places (Valhalla, the fabled hall of Norse mythology) or the technology itself (Lambda).
Because the Java Platform Group and the OpenJDK contributors have so many projects going on simultaneously—showing how remarkably fast the language and the JVM are evolving—it can be difficult to track what the project names refer to. This article discusses the most important active projects and touches briefly on some older projects whose nicknames are still referred to in talks and articles.
I’ll start with the big three projects (Loom, Valhalla, and Panama) and then look at Amber and a few others.
Project Loom
Project Loom has gained considerable attention from the Java community because it promises to deliver lightweight threads. Today, concurrency in Java is delivered via heavyweight threads, which are, for all intents and purposes, wrappers around operating-system threads. These threads have worked well over the years, but they suffer from several limitations. They are expensive in memory (the default stack size is typically 1 MB per thread). A typical system can support only several thousand of them at once. Most importantly, their execution is scheduled by the operating system, not by the JVM. Traditionally, the way to reduce some of these costs has been to use a thread pool. In a pool, a fixed number of threads are created in advance and then borrowed individually for specific tasks, after which they are returned to the pool for reuse.
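To make the thread-pool approach concrete, here is a minimal sketch using the standard java.util.concurrent executors. A fixed set of platform threads is created once and then reused across many short tasks. The class and method names (ThreadPoolSketch, runOnPool) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ThreadPoolSketch {

    // Runs `taskCount` small tasks on a pool of `poolSize` platform threads
    // and returns the sum of their results.
    static long runOnPool(int poolSize, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                final int n = i;
                // Each task borrows an idle pooled thread, runs, and hands it back.
                results.add(pool.submit(() -> n * 2));
            }
            long sum = 0;
            for (Future<Integer> f : results) {
                sum += f.get(); // blocks until that task has finished
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a pooled task failed", e.getCause());
        } finally {
            pool.shutdown(); // the pool's threads exit once queued work is done
        }
    }

    public static void main(String[] args) {
        System.out.println(runOnPool(4, 100)); // 9900
    }
}
```

In this design, 100 tasks share only four OS threads. That sharing is the cost saving pools provide. The drawback is that a task that blocks still occupies one of those few threads while it waits.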
Project Loom aims to deliver a lighter version of threads, called virtual threads. In the planned implementation, a virtual thread is programmed just as a thread normally would be, but you specify at thread creation that it’s virtual. The JVM multiplexes many virtual threads onto a smaller number of operating-system threads. This is similar in concept to the green threads of Java’s early releases and to fibers in other languages.
How does this work? You create a virtual thread and assign it a task, and the JVM schedules this virtual thread on one of the OS threads. Because the JVM knows when your task blocks or waits, it can optimize the scheduling. It moves your virtual thread (that is, the task) off the OS thread when it’s idle or waiting and moves another virtual thread onto that OS thread. When implemented correctly, this allows many lightweight threads to share a single OS thread.
The benefit is that the JVM, rather than the OS, schedules your task. This difference enables application-aware magic to occur behind the curtains. Suppose you’re using a thread to do blocking I/O. The JVM will convert the I/O request into an asynchronous request and run your virtual thread long enough to make the I/O call. Then, while you’re waiting for the response, the JVM will move your virtual thread off the OS thread and allow another virtual thread to run. As soon as the I/O response is received, your task is swapped back in—and you get the data. To you, this swapping is entirely invisible. Your code experiences the threaded task just as it would if it were running on a single, conventional OS thread.
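The following sketch assumes JDK 21 or later, where virtual threads were finalized (JEP 444). It will not compile on the Java 17 release this article anticipates. It starts thousands of virtual threads that each block in Thread.sleep; with platform threads, that would mean thousands of OS threads. The class and method names are hypothetical.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class VirtualThreadSketch {

    // Starts `count` virtual threads that each block briefly, waits for all
    // of them, and returns how many completed.
    static int runBlockingTasks(int count) {
        AtomicInteger completed = new AtomicInteger();
        // Creates one new virtual thread per submitted task (JDK 21+).
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < count; i++) {
                executor.submit(() -> {
                    try {
                        // A blocking call: while the virtual thread sleeps, the JVM
                        // unmounts it and its carrier OS thread can run other work.
                        Thread.sleep(Duration.ofMillis(10));
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    completed.incrementAndGet();
                });
            }
        } // close() waits for every submitted task to finish
        return completed.get();
    }

    public static void main(String[] args) {
        Thread vt = Thread.ofVirtual().unstarted(() -> { });
        System.out.println("virtual? " + vt.isVirtual()); // virtual? true
        System.out.println(runBlockingTasks(10_000) + " tasks completed");
    }
}
```

The task code is ordinary blocking code, and no callbacks are involved. Compare a fixed pool of four platform threads running the same 10,000 sleeps: it would need roughly 10,000 × 10 ms / 4, or about 25 seconds, because each sleeping task holds an OS thread.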
A historical note: The goal of circumventing OS threads so that the JVM can manage thread scheduling long predates Project Loom. In the early 2000s, BEA Systems, which sold the WebLogic Java EE server, was working on what it called “bare metal Java.” (Oracle later acquired BEA in 2008.) This was an attempt to run BEA’s JRockit JVM on hardware without an OS layer. In this experimental project, bare metal Java used a minimal software shim to handle file system interactions and networking. Everything else was handled in the JVM. In this way, the JVM could control the scheduling of all threads and, in theory, provide better performance.
Project Valhalla
Project Valhalla aims to improve the performance of access to data items. Currently in Java, data items exist in two forms:
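◉ Primitives (such as chars, integers, and floats). When declared as local variables, these are created on the stack, that is, in the memory allocated inside the execution context of the executing thread. They are accessed directly. (A primitive field of an object is stored inline within that object on the heap.)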
◉ Objects, which are created on the heap and which are accessed indirectly via a pointer (in Java terms, a reference). There are a few exceptions. When the JIT compiler’s escape analysis proves that an object never escapes a method, it can avoid allocating that object on the heap.
Dereferencing a pointer (that is, getting the value at the address the pointer points to) can be an expensive operation. The expense is particularly steep if the pointer or the object it points to is not in the processor’s cache. This causes a cache miss, which must be resolved by a comparatively slow read from main memory.
The problem is compounded in arrays of objects, where stepping through the array means repeatedly dereferencing pointer after pointer. An example of the problem is shown in Figure 1, which is an image used in Oracle’s presentations on Project Valhalla:
Figure 1. Stepping through an array of objects means dereferencing many pointers.
Valhalla seeks to solve this problem by introducing value types, a new form of data type that is programmed like objects but accessed like primitives. Specifically, value types are immutable data aggregates that contain only data and have no identity. Because of this, the value types in Figure 1 can be stored as a single array, with a single header for the entire array and direct access to the individual fields. Such a layout would look like Figure 2.
Figure 2. The layout in memory of data items from Figure 1 after Valhalla
Notice the single header and the absence of internal pointers. Access is now direct and faster, and the possibility of a cache miss when accessing an element is greatly reduced. Given the increase in the use of Java for big data applications where literally millions of values might be on the heap at any given moment, Valhalla has the potential to deliver substantial performance benefits.
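This sketch assumes a standard JDK release, which does not include value types, so it cannot show Valhalla syntax directly. Instead, it contrasts today’s array of objects with a hand-flattened primitive array that mimics the layout in Figure 2. The objects are records, which are still ordinary identity objects. All names are hypothetical.

```java
public class LayoutSketch {

    // Today a record is an ordinary identity object. Each array slot holds a
    // reference, and each Point lives separately on the heap with its own header.
    record Point(double x, double y) {}

    static double sumX(Point[] points) {
        double sum = 0;
        for (Point p : points) {
            sum += p.x(); // one pointer dereference per element
        }
        return sum;
    }

    // Hand-flattened alternative: x and y values are stored inline in one
    // primitive array [x0, y0, x1, y1, ...] that has a single array header.
    static double sumXFlat(double[] xy) {
        double sum = 0;
        for (int i = 0; i < xy.length; i += 2) {
            sum += xy[i]; // contiguous access, no references to follow
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 4;
        Point[] points = new Point[n];
        double[] flat = new double[2 * n];
        for (int i = 0; i < n; i++) {
            points[i] = new Point(i, -i);
            flat[2 * i] = i;
            flat[2 * i + 1] = -i;
        }
        System.out.println(sumX(points) + " " + sumXFlat(flat)); // 6.0 6.0
    }
}
```

The flattened version gives up the readable Point abstraction to get the dense layout. Value types aim to provide that layout while keeping code written in the style of sumX.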
A second phase of the project, which will be rolled out after the value types, is a refinement to generics to include these new types.
Valhalla has taken a long time to go from proposal to delivery. This is due to the extent of the changes entailed in creating a new kind of data type: It requires changes in the compiler, the class file structure, and the JVM. Reflecting the size of the challenge, the Java team has prototyped six different solutions. The goal was to see what problems ensued from each trial implementation and how each prototype broke either existing code or the implicit covenants of the Java language.
As the project has progressed, terminology has changed: Value types were renamed inline classes and then renamed again to primitive classes. The name might change again, so for the purposes of clarity, I’ve stayed with the term used in most of the talks and presentations on Valhalla—just be aware that the terminology is evolving.
Project Panama
Project Panama simplifies the process of connecting Java programs to non-Java components. In particular, Panama aims to enable straightforward communication between Java applications and C-based libraries. In other languages, this kind of technology is often referred to as a foreign-function interface, or FFI. Java’s existing equivalent is the Java Native Interface, or JNI, which can be difficult to use.
Panama comprises several subprojects, which are beginning to ship in incubator status in the recent releases of Java. These include:
◉ The Foreign-Memory Access API (JEP 393 and JEP 412) formalizes how Java programs can access foreign memory without using the existing JNI tools. Foreign memory is memory outside the Java heap, which the garbage collector does not manage. Presently, the most common ways of doing this are the ByteBuffer API (available in Java since version 1.4) and the soon-to-be-deprecated sun.misc.Unsafe class. This API has been incubated multiple times, most recently in Java 16 (JEP 393). It is slated to appear in Java 17 as part of the incubating Foreign Function & Memory API (JEP 412).
◉ The Foreign Linker API (JEP 389) links to C libraries and accesses C library symbol information.
◉ The Vector API (JEP 338) facilitates the use of vector math. Vectors in this sense are the values placed in the special wide registers of the Intel x64 and ARM AArch64 architectures. On these registers, a single instruction can perform the same calculation on multiple values simultaneously. The registers range from 64 bits to 512 bits in length. If you’ve programmed with Intel’s Streaming SIMD Extensions (SSE) and Advanced Vector Extensions (AVX), you’ve used vectors. This API, which will be released for another round of incubation, is a developer API. That is, it’s distinct from the JIT compiler’s existing auto-vectorization, which uses vector extensions automatically where available.
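For comparison with the Foreign-Memory Access API, here is a minimal sketch of the existing ByteBuffer route: ByteBuffer.allocateDirect reserves memory outside the Java heap. This approach has limits that help explain Panama’s motivation. A buffer’s capacity is an int, so one buffer tops out just under 2 GB. In addition, the native memory is released only when the buffer object itself is garbage-collected. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class OffHeapSketch {

    // Allocates `count` ints in native (off-heap) memory via a direct ByteBuffer,
    // fills them with 0..count-1, and reads them back as a sum.
    static long fillAndSum(int count) {
        ByteBuffer buf = ByteBuffer.allocateDirect(count * Integer.BYTES)
                                   .order(ByteOrder.nativeOrder()); // same byte order C code would use
        for (int i = 0; i < count; i++) {
            buf.putInt(i * Integer.BYTES, i); // absolute put: byte offset, no position bookkeeping
        }
        long sum = 0;
        for (int i = 0; i < count; i++) {
            sum += buf.getInt(i * Integer.BYTES);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(fillAndSum(10)); // 45
    }
}
```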
Project Amber
The three projects discussed so far are the ones that most often come up in presentations and conversations on the future direction of Java and the JVM. If you have difficulty keeping the names straight, here’s how I do it. The primary material for a loom is threads. Valhalla is a huge Norse meeting hall where dead warriors congregate, and they most likely got there by way of pointy things (hence, pointers). And Panama sits between the Caribbean Sea and the Pacific Ocean, just as the FFI sits between the JVM and non-Java languages.
As you might expect, there are other projects ongoing, especially those gathered up in Project Amber (for which I have no catchy mnemonic aid).
Project Amber’s stated goal is to “explore and incubate smaller, productivity-oriented Java language features that have been accepted as candidate JEPs.” Amber has been the springboard for several recent language improvements. The first was var for local variables in Java 10 (strictly speaking, var is a reserved type name rather than a keyword).
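Here is a minimal sketch of local-variable type inference with var (Java 10 or later). The inferred types are still fixed at compile time; var only removes the need to write them out. The names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class VarSketch {

    // Returns the number of distinct words, recording each word's positions.
    static int distinctWords(String text) {
        // Inferred as HashMap<String, List<Integer>>. The type is still static;
        // it is just not written out twice.
        var positions = new HashMap<String, List<Integer>>();
        var words = text.trim().split("\\s+"); // inferred as String[]
        for (var i = 0; i < words.length; i++) { // inferred as int
            positions.computeIfAbsent(words[i], w -> new ArrayList<>()).add(i);
        }
        return positions.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctWords("to be or not to be")); // 4
    }
}
```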
More recently, Amber has been responsible for text blocks, which were previewed in Java 13 and finalized in Java 15; records in Java 16; and pattern matching in instanceof comparisons, which was finalized in Java 16.
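The sketch below assumes Java 16 or later. It combines the three features just listed: a record, a text block, and pattern matching with instanceof. The names are hypothetical.

```java
public class AmberFeaturesSketch {

    // Record: the compiler generates the constructor, accessors,
    // equals, hashCode, and toString.
    record Point(int x, int y) {}

    // Text block: a multiline string literal. Incidental indentation is
    // stripped, and each line ends with \n.
    static final String GREETING = """
            Hello,
            Amber
            """;

    // Pattern matching for instanceof: test the type and bind a variable in one step.
    static String describe(Object o) {
        if (o instanceof Point p) {
            return "point at " + p.x() + "," + p.y();
        }
        if (o instanceof String s && !s.isEmpty()) {
            return "string of length " + s.length();
        }
        return "something else";
    }

    public static void main(String[] args) {
        System.out.print(GREETING);                                  // Hello, / Amber on two lines
        System.out.println(describe(new Point(1, 2)));               // point at 1,2
        System.out.println(new Point(1, 2).equals(new Point(1, 2))); // true
    }
}
```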
Several Amber subprojects are still in progress.
◉ Sealed classes, which have been previewed in the last few Java releases and are scheduled to be finalized in Java 17. Sealed classes (and interfaces) can limit which other classes or interfaces can extend or implement them.
◉ Pattern matching in switches is a feature that will be previewed in Java 17 (JEP 406). While this capability is conceptually straightforward, the myriad possible syntax options make its implementation complex. The JEP presents a detailed examination of the options and the difficulties they pose, especially with regard to maintaining rigorous compatibility with existing switch statements.
◉ Pattern matching for records and arrays, which, like pattern matching in switches, is an extension of pattern matching with instanceof. This language feature lets a single pattern test the types of a record’s fields or an array’s entries and bind them to variables. A preview of this feature is projected for Java 18.
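Sealed types were finalized in Java 17. Pattern matching for switch and record patterns were finalized later, in JDK 21 (JEPs 441 and 440), and their syntax may differ from the earlier previews. The sketch assumes JDK 21 or later. It shows how a sealed hierarchy lets the compiler check a pattern switch for exhaustiveness. The names are hypothetical.

```java
public class SealedSwitchSketch {

    // Sealed interface: only the listed records may implement it.
    sealed interface Shape permits Circle, Rectangle {}
    record Circle(double radius) implements Shape {}
    record Rectangle(double width, double height) implements Shape {}

    // Pattern matching for switch combined with record patterns. Because Shape
    // is sealed, the compiler knows these two cases are exhaustive, so no
    // default branch is needed.
    static double area(Shape shape) {
        return switch (shape) {
            case Circle(double r) -> Math.PI * r * r;
            case Rectangle(double w, double h) -> w * h;
        };
    }

    // Type patterns with a `when` guard on an arbitrary Object.
    static String classify(Object o) {
        return switch (o) {
            case Integer i when i > 0 -> "positive int";
            case Integer i -> "non-positive int";
            case String s -> "string " + s;
            default -> "other";
        };
    }

    public static void main(String[] args) {
        System.out.println(area(new Rectangle(3, 4))); // 12.0
        System.out.println(classify(-5));              // non-positive int
    }
}
```

If a third record were added to the permits clause without a matching case, area would no longer compile. That compile-time check is the practical payoff of combining sealed types with pattern switches.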
Amber could have been multiple separate projects. Due to this multiplicity of subprojects, it’s hard to know specifically what is being referred to by that name, and so it’s best to refer to the individual features, rather than the project name.
Completed projects
Several well-known Java projects have been completed and their features have been delivered. Nonetheless, the projects are still occasionally referred to by their project names. Here’s a quick guide to the most important ones.
Project Coin was a grab bag of new small features in Java 7, which included try-with-resources, strings in switch statements, multi-catch of exceptions, and a handful of other advances.
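Here is a short sketch of the three Project Coin features named above. The features have been available since Java 7, and the sketch compiles on any current JDK. The method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class CoinSketch {

    // Strings in switch: case labels may be String constants.
    static int dayNumber(String day) {
        switch (day) {
            case "MON":
                return 1;
            case "TUE":
                return 2;
            default:
                return -1;
        }
    }

    // try-with-resources: the reader is closed automatically when the block
    // exits, whether normally or by an exception.
    static String firstLine(String text) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Multi-catch: one handler for two unrelated exception types.
    static int fieldOrZero(String csv, int index) {
        try {
            return Integer.parseInt(csv.split(",")[index].trim());
        } catch (ArrayIndexOutOfBoundsException | NumberFormatException e) {
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(dayNumber("TUE"));          // 2
        System.out.println(firstLine("alpha\nbeta"));  // alpha
        System.out.println(fieldOrZero("1, 2, 3", 1)); // 2
        System.out.println(fieldOrZero("1, x", 1));    // 0
    }
}
```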
Project Graal aimed to write a Java compiler in Java itself; the resulting Graal compiler has been used both as a just-in-time and as an ahead-of-time compiler. This project was successful enough that it spawned the much larger GraalVM project.
Project Jigsaw delivered modules in Java 9 and modularized the JDK.
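One visible result of the modularized JDK is that every class now belongs to a module, which can be inspected at run time with java.lang.Module (Java 9 or later). Here is a minimal sketch with hypothetical names. The module reported for the sketch’s own class assumes it is run from the class path rather than as part of a named module.

```java
public class ModuleSketch {

    // Returns the name of the module that contains the given class,
    // or "(unnamed)" for classes loaded from the class path.
    static String moduleOf(Class<?> type) {
        Module m = type.getModule();
        return m.isNamed() ? m.getName() : "(unnamed)";
    }

    public static void main(String[] args) {
        System.out.println(moduleOf(String.class));              // java.base
        System.out.println(moduleOf(java.sql.Connection.class)); // java.sql
        System.out.println(moduleOf(ModuleSketch.class));        // (unnamed) on the class path
    }
}
```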
Project Skara examined version-control options for the Java codebase and led eventually to the entire JDK source code moving onto Git.
Source: oracle.com