Friday, May 6, 2022

How the JVM Locates, Loads, and Runs Libraries

Class loaders are the key to understanding how the JVM executes programs.

Classes are the building blocks of Java’s type system, but they also serve another fundamental purpose: a class is a compilation unit, the smallest piece of code that can be individually loaded and run in a JVM process.

The class-loading mechanism has been in place since the very beginning of Java, back in JDK 1.0, and it contributed immensely to Java’s popularity as a cross-platform solution. Compiled Java code, in the form of class files and packaged JAR files, can be loaded into a running JVM process on any of the many supported operating systems.

It’s this ability that has allowed developers to easily distribute compiled binaries of libraries. Because it is so much easier to distribute JAR files than source code or platform-dependent binaries, this ability has made Java popular, particularly in open source projects.

In this article, I explain the Java class-loading mechanism in detail. I also explain how classes are found on the classpath and how they are loaded into memory and initialized for use.

The Mechanics of Loading Classes into the JVM

Imagine you have a simple Java program such as the one below:

public class A {
   public static void main(String[] args) {
      B b = new B();
      int i = b.inc(0);
      System.out.println(i);
   }
}

When you compile this piece of code and run it, the JVM correctly determines the entry point into the program and starts running the main method of class A. However, the JVM doesn’t load all imported classes or even referred-to classes eagerly—that is, right away. In particular, this means that only when the JVM encounters the bytecode instructions for the new B() statement will it try to locate and load class B.

Besides calling a constructor of a class, there are other ways to initiate the process of loading a class, such as accessing a static member of the class or accessing it through the Reflection API.
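The on-demand behavior can be observed with a small experiment. The following is a minimal sketch, not part of the original example: the class names LazyLoadingDemo and Helper are hypothetical, and Helper stands in for class B. It makes the class initializer visible; loading itself can be watched separately by running the JVM with the -verbose:class option.

```java
import java.util.ArrayList;
import java.util.List;

class LazyLoadingDemo {
    // Shared log so the order of events can be inspected afterwards.
    static final List<String> EVENTS = new ArrayList<>();

    // Hypothetical stand-in for class B from the example above.
    static class Helper {
        static {
            // Runs once, on the first active use of Helper.
            EVENTS.add("Helper initialized");
        }

        static int inc(int i) {
            return i + 1;
        }
    }

    static List<String> run() {
        EVENTS.add("main started");
        int i = Helper.inc(0); // first use of Helper triggers its initialization
        EVENTS.add("result " + i);
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The "Helper initialized" entry should appear only after "main started", which shows that merely referring to Helper in the source of LazyLoadingDemo was not enough to initialize it.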

To actually load a class, the JVM uses ClassLoader objects. Every already loaded class contains a reference to its class loader, and that class loader is used to load all the classes referenced from that class. In the preceding example, this means that loading class B can be approximately translated into the following Java statement: A.class.getClassLoader().loadClass("B").
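That approximate translation can be tried out directly. In the sketch below, LoaderLookupDemo is a hypothetical class name and the nested class B is a stand-in for the B of the earlier example; the code asks the loader of the enclosing class to load B by its binary name.

```java
class LoaderLookupDemo {
    // Hypothetical stand-in for class B.
    static class B {
        int inc(int i) {
            return i + 1;
        }
    }

    // Loads B by name through the loader that defined this class.
    static Class<?> loadB() {
        ClassLoader loader = LoaderLookupDemo.class.getClassLoader();
        // Binary name of a nested class: outer name, '$', nested name.
        String binaryName = LoaderLookupDemo.class.getName() + "$B";
        try {
            return loader.loadClass(binaryName);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> loaded = loadB();
        // Same Class object as the one the JVM resolves for B implicitly.
        System.out.println(loaded == B.class);
        // B shares the class loader of the class that refers to it.
        System.out.println(loaded.getClassLoader() == LoaderLookupDemo.class.getClassLoader());
    }
}
```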

Here comes a paradox: every class loader is itself an object of the java.lang.ClassLoader type that developers can use to locate and load classes by name. If you’re confused by this chicken-and-egg problem and wonder how the first class loader that loads all the JDK classes (for example, java.lang.String) is created, you’re thinking along the right lines.

Indeed, the primordial class loader, called the bootstrap class loader, comes from the core of the JVM and is written in native platform-dependent code. It loads the classes necessary for the JVM itself, such as those of the java.lang package, classes for Java primitives, and so forth. Application classes are loaded using the regular, user-defined class loaders written in Java—so, if needed, the developer can influence the processing of these loaders.
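Because the bootstrap class loader is not a Java object, HotSpot-based JVMs represent it as null when you ask a core class for its loader. The sketch below (BootstrapLoaderDemo is a hypothetical name) compares a core class with an application class.

```java
class BootstrapLoaderDemo {
    // Returns the defining loader of a class; null stands for the bootstrap loader.
    static ClassLoader loaderOf(Class<?> type) {
        return type.getClassLoader();
    }

    public static void main(String[] args) {
        // Core class: defined by the native bootstrap loader.
        System.out.println("String: " + loaderOf(String.class));
        // Application class: defined by a regular ClassLoader object.
        System.out.println("This class: " + loaderOf(BootstrapLoaderDemo.class));
    }
}
```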

The Class-Loader Hierarchy

The class loaders in the JVM are organized into a tree hierarchy, in which every class loader has a parent. Prior to locating and loading a class, a good practice for a class loader is to check whether its parent can load, or has already loaded, the required class.

This helps avoid doing double work and loading classes repeatedly. As a rule, the classes of the parent class loader are visible to the children but are not visible otherwise. This structure, which is based on delegation and visibility of the classes, allows for separation of the responsibilities of the class loaders in the hierarchy and makes the class loaders responsible for loading classes from a specific location only.
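The delegation rule can be written down in a few lines. The following is an illustrative sketch only: ParentFirstLoader and loadOrNull are hypothetical names, and the logic mirrors in simplified form what java.lang.ClassLoader already does by default, so there is normally no need to write it yourself.

```java
import java.util.Objects;

class ParentFirstLoader extends ClassLoader {
    ParentFirstLoader(ClassLoader parent) {
        // This sketch requires a real parent object to delegate to.
        super(Objects.requireNonNull(parent, "parent"));
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // 1. Reuse the class if this loader has already loaded it.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // 2. Ask the parent first.
                    c = getParent().loadClass(name);
                } catch (ClassNotFoundException e) {
                    // 3. Only then look in this loader's own location.
                    //    The inherited findClass always throws; a real loader overrides it.
                    c = findClass(name);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    // Convenience helper for experiments: null instead of an exception.
    Class<?> loadOrNull(String name) {
        try {
            return loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        ParentFirstLoader loader = new ParentFirstLoader(ClassLoader.getSystemClassLoader());
        // Delegation means the child hands back the very class its ancestors loaded.
        System.out.println(loader.loadOrNull("java.lang.String") == String.class);
    }
}
```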

Let’s look at this hierarchy of class loaders in a Java application and explore what classes they typically load. At the root of the Java class-loader hierarchy is the bootstrap class loader. It loads the system classes required to run the JVM itself. You can expect all the classes that were provided with the JDK distribution to be loaded by this class loader. (A developer can expand the set of classes that the bootstrap class loader will be able to load by using the -Xbootclasspath JVM option.)

Note that even though the library might be put on the boot classpath, it won’t be automatically loaded and initialized. Classes are loaded into the JVM only on demand, so even though classes might be available for the bootstrap class loader, the application needs to access them to trigger their actual loading. (A curious aspect of this loading process is that you can override JDK classes if your JAR file is prepended to the boot classpath. While this is almost always a poor idea, it does open a door to potentially more-powerful tools.)

A sort of child of the bootstrap class loader is the extension class loader, which loads the classes from the extension directories (explained in a moment). These classes may be used to specify machine-specific configuration such as locales, security providers, and such. The locations of the extension directories are specified via the java.ext.dirs system property, which on my machine is set to the following:

/Users/shelajev/Library/Java/Extensions:/Library/Java/JavaVirtualMachines/jdk1.8.0_40.jdk/Contents/Home/jre/lib/ext:/Library/Java/Extensions:/Network/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java

By changing the value of this property, you can change which additional libraries are loaded into the JVM process.

Next comes the system class loader, which loads the application classes and the classes available on the classpath. Users can specify the classpath using the -cp (or -classpath) option.

Both the extension class loader and the system class loader are of the URLClassLoader type and behave in the same way: delegating to the parent first, and only then finding and resolving the required classes themselves, if need dictates.
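To see the hierarchy on your own machine, you can walk the parent links starting from the system class loader. This is a small sketch with a hypothetical class name, LoaderChainDemo; the concrete loader classes it prints depend on the JDK version and vendor, so no particular output is assumed here.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChainDemo {
    // Collects a loader and all of its ancestors, stopping at the bootstrap loader (null).
    static List<ClassLoader> chainFrom(ClassLoader start) {
        List<ClassLoader> chain = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            chain.add(cl);
        }
        return chain;
    }

    public static void main(String[] args) {
        for (ClassLoader cl : chainFrom(ClassLoader.getSystemClassLoader())) {
            System.out.println(cl.getClass().getName());
        }
        // The bootstrap loader has no Java object, so it is printed by hand.
        System.out.println("(bootstrap class loader)");
    }
}
```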

The class-loader hierarchy of web applications is a bit more complicated. Because multiple applications can be deployed simultaneously to an application server, they need to be able to distinguish their classes from each other. So, every web application uses its own class loader, which is responsible for loading its libraries.

Such isolation ensures that different web applications deployed to a single server can have different versions of the same library without conflicts. This arrangement works because the web application class loader tries to locate the classes packaged in the application’s WAR file first, rather than first delegating the search to the parent class loader.
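A local-first lookup of this kind can be sketched as follows. This is an assumption-laden illustration, not the implementation of any particular application server: ChildFirstLoader and loadOrNull are hypothetical names, and real servers add more rules about which packages must always come from the parent. The sketch only exempts names starting with "java.".

```java
import java.net.URL;
import java.net.URLClassLoader;

class ChildFirstLoader extends URLClassLoader {
    ChildFirstLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            // Core classes must always come from the parent chain.
            if (c == null && !name.startsWith("java.")) {
                try {
                    // Look in this loader's own URLs first.
                    c = findClass(name);
                } catch (ClassNotFoundException e) {
                    // Not packaged locally; fall back to normal delegation below.
                }
            }
            if (c == null) {
                // Standard parent-first lookup as the fallback.
                c = super.loadClass(name, false);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    // Convenience helper for experiments: null instead of an exception.
    Class<?> loadOrNull(String name) {
        try {
            return loadClass(name);
        } catch (ClassNotFoundException e) {
            return null;
        }
    }
}
```

With an empty URL array the loader has nothing of its own to offer, so every request falls through to the parent; pointing it at a directory or JAR file makes the local copies win.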

Finding the Right Class

In general, if multiple classes with the same fully qualified name are available to the JVM, the conflict resolution strategy is simple and straightforward: the first appropriate class wins. The URLClassLoader, which most of the class loaders extend from, will traverse the directories in the order they are given on the classpath and load the first class it finds that has the requested class name.

The same goes for JAR files. They are scanned in the order in which they appear on the classpath, not according to their names. If the first JAR file contains an entry for the required class, the class will be loaded. If not, the classpath scan will continue and reach the second JAR file.

Naturally, if the class isn’t found anywhere on the classpath, a ClassNotFoundException will be thrown.
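A failed lookup is easy to provoke and handle explicitly. In this sketch, MissingClassDemo and the class name com.example.DoesNotExist are hypothetical; the second argument of Class.forName is false so that a successful lookup does not also initialize the class.

```java
class MissingClassDemo {
    // Reports whether the loader of this class can find a class with the given name.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, MissingClassDemo.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            // Thrown when no loader in the delegation chain can locate the class.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isLoadable("java.lang.String"));
        System.out.println(isLoadable("com.example.DoesNotExist"));
    }
}
```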

Usually, relying on the order of directories in the classpath is a fragile practice, so instead the developer can add the classes to -Xbootclasspath to ensure that they will be loaded first. There’s nothing in particular wrong with this approach, but maintaining a project that relies on a polluted boot classpath requires work. Intuition about where the classes are loaded from will be broken, and everyone will be confused.

A better practice is to resolve the confusion at its root and figure out why there are multiple classes with the same name on the classpath. Maybe upgrading some dependency version, cleaning the caches, or running a clean build will be enough to get rid of the duplicates.
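One way to start that investigation is to ask the class loader for every location that provides a given class file. The sketch below uses ClassLoader.getResources for this; DuplicateClassFinder is a hypothetical name, and the simple name-to-path conversion assumes a top-level class. More than one printed URL for the same class suggests duplicates on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

class DuplicateClassFinder {
    // Lists every location visible to the system class loader that contains the class file.
    static List<URL> locations(String className) {
        // Class files are found as resources named like java/lang/String.class.
        String resource = className.replace('.', '/') + ".class";
        try {
            return Collections.list(ClassLoader.getSystemClassLoader().getResources(resource));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "java.lang.String";
        for (URL url : locations(name)) {
            System.out.println(url);
        }
    }
}
```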

Resolution, Linking, and Verification

After a class is located and its initial in-memory representation created in the JVM process, it is verified, prepared, resolved, and initialized.

◉ Verification makes sure that the class is not corrupted and is structurally correct: its runtime constant pool is valid, the types of variables are correct, and the variables are initialized prior to being accessed. Verification can be turned off by supplying the -noverify option. If the JVM process does not run potentially malicious code, strict verification might not be required. Turning off the verification can speed up the startup of the JVM. Another benefit is that some classes, especially those generated on the fly by various tools, can be valid and safe for the JVM but unable to pass the strict verification process. In order to use such tools, the developer should disable this verification, which is often acceptable to do in a development environment.

◉ Preparation of a class involves initializing its static fields to the default values for their respective types. (After preparation, fields of type int contain 0, references are null, and so forth.)

◉ Resolution of a class means checking that the symbolic references in the runtime constant pool actually point to valid classes of the required types. The resolution of a symbolic reference triggers loading of the referenced class. According to the JVM specification, this resolution process can be performed lazily, so it is deferred until the class is used.

◉ Initialization expects a prepared and verified class. It runs the class’s initializer. During initialization, the static fields are initialized to whatever values are specified in the code. The static initializer method that combines the code from all the static initialization blocks is also run. The initialization process should be run only once for every loaded class, so it is synchronized, especially because the initialization of the class can trigger the initialization of other classes and should be performed with care to avoid deadlocks.
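Some of these phases can be observed from plain Java code. The following sketch uses hypothetical names (InitPhasesDemo, Config) and shows three things: loading a class through Class.forName with initialize set to false does not run its initializer, a static field read before its assignment still holds the default value from preparation, and the initializer runs only once.

```java
class InitPhasesDemo {
    // Records each run of Config's static initializer.
    static final StringBuilder LOG = new StringBuilder();

    static class Config {
        // Static initializers run in textual order, so this call happens before
        // 'port' is assigned and sees the default value set during preparation.
        static int seenBeforeAssignment = readPort();
        static int port = 8080;

        static {
            LOG.append("Config initialized;");
        }

        static int readPort() {
            return port;
        }
    }

    // Loads Config by name without initializing it; true if the log was left untouched.
    static boolean loadWithoutInitializing() {
        String binaryName = InitPhasesDemo.class.getName() + "$Config";
        int before = LOG.length();
        try {
            // initialize=false: locate and load the class, but do not run its initializer.
            Class.forName(binaryName, false, InitPhasesDemo.class.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        return LOG.length() == before;
    }

    public static void main(String[] args) {
        System.out.println("untouched by forName: " + loadWithoutInitializing());
        // First access to a static field triggers initialization.
        System.out.println("seen before assignment: " + Config.seenBeforeAssignment);
        System.out.println("port: " + Config.port);
        // Later accesses do not run the initializer again.
        System.out.println("log: " + LOG);
    }
}
```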

Other Considerations About Class Loaders

The class-loading model is the central piece of the dynamic operations of the Java platform. Not only does it allow for dynamic location and linking of classes at runtime, but it also provides an interface for various tools to hook into the application.

In addition, many security features rely on the class-loader hierarchy for permission checks. For example, the famous method sun.misc.Unsafe.getUnsafe() successfully returns an instance of the Unsafe class only if it is called from a class that was loaded by the bootstrap class loader. Because only system classes are loaded by this loader, every library that uses the Unsafe API must rely on the Reflection API to read the reference from a private field.
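The reflective workaround typically looks like the sketch below. It assumes the private static field is named theUnsafe, which is the name used in OpenJDK but is an implementation detail that may differ or disappear on other JVMs and future releases; UnsafeLookupDemo is a hypothetical class name. The class is looked up by name so that the sketch compiles without a direct dependency on the internal API, and it returns null instead of failing when the lookup is not possible.

```java
import java.lang.reflect.Field;

class UnsafeLookupDemo {
    // Tries to read the Unsafe singleton reflectively; returns null if that is not possible.
    static Object lookupUnsafe() {
        try {
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            // Assumption: the singleton is held in a private static field called "theUnsafe".
            Field field = unsafeClass.getDeclaredField("theUnsafe");
            field.setAccessible(true);
            return field.get(null); // static field, so no receiver object is needed
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Class or field missing, or access denied by the runtime.
            return null;
        }
    }

    public static void main(String[] args) {
        Object unsafe = lookupUnsafe();
        System.out.println(unsafe == null ? "Unsafe not available" : unsafe.getClass().getName());
    }
}
```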

Source: oracle.com
