The Java Virtual Machine (JVM for short) is platform-dependent software that executes programs compiled from languages like Java. Languages such as Scala and Kotlin also run on the JVM and are often called JVM languages for this reason. Source files in these languages are usually identified by file extensions such as .java and .scala. Compiling them produces .class files, a binary representation of your source code that contains the information needed for execution. Each class file begins with the magic number 0xCAFEBABE, which identifies the format.
This is how a class file is represented as per the Java Virtual Machine Specification:
ClassFile {
    u4 magic;
    u2 minor_version;
    u2 major_version;
    u2 constant_pool_count;
    cp_info constant_pool[constant_pool_count-1];
    u2 access_flags;
    u2 this_class;
    u2 super_class;
    u2 interfaces_count;
    u2 interfaces[interfaces_count];
    u2 fields_count;
    field_info fields[fields_count];
    u2 methods_count;
    method_info methods[methods_count];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
Note: Sizes are written as ux, where x is the number of bytes the value occupies. For example, u2 is a value that takes up 2 bytes (16 bits), and u4 takes 4 bytes (32 bits). You can use javap to generate a readable representation of a class file.
javac Main.java
javap -c -v Main
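To make the layout above concrete, here is a minimal sketch that reads only the first four fields of the ClassFile structure. The class name ClassHeaderReader and its Header holder are made up for illustration and are not part of the JDK; the only library calls used are from java.io.DataInputStream, which reads big-endian values just like the class file format stores them.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not a JDK class): reads the fixed-size fields
// at the very start of a class file.
public class ClassHeaderReader {

    // Plain holder for the first four ClassFile fields.
    public static final class Header {
        public final int magic;
        public final int minorVersion;
        public final int majorVersion;
        public final int constantPoolCount;

        Header(int magic, int minorVersion, int majorVersion, int constantPoolCount) {
            this.magic = magic;
            this.minorVersion = minorVersion;
            this.majorVersion = majorVersion;
            this.constantPoolCount = constantPoolCount;
        }
    }

    public static Header read(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int magic = data.readInt();                 // u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IOException("Not a class file: bad magic number");
        }
        int minor = data.readUnsignedShort();       // u2 minor_version
        int major = data.readUnsignedShort();       // u2 major_version
        int cpCount = data.readUnsignedShort();     // u2 constant_pool_count
        return new Header(magic, minor, major, cpCount);
    }

    public static void main(String[] args) throws IOException {
        // Hand-written 10-byte prefix used purely for illustration;
        // a real run would pass a FileInputStream for Main.class instead.
        byte[] prefix = {
            (byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, // magic
            0, 0,                                               // minor_version
            0, 52,                                              // major_version
            0, 15                                               // constant_pool_count
        };
        Header h = read(new ByteArrayInputStream(prefix));
        System.out.printf("magic=0x%08X version=%d.%d constant_pool_count=%d%n",
                h.magic, h.majorVersion, h.minorVersion, h.constantPoolCount);
    }
}
```

The sketch stops before the constant pool because every entry after constant_pool_count has a variable size that depends on its tag.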
Constant Pool
The constant pool of a class is a table, a sort of key-value store indexed by number, containing entries for things like String constants as well as references to all classes and methods that the class refers to. The type of each entry is indicated by a single tag byte, often called the "constant pool tag". Tag values originally fell in the range [1, 18]; Java 9 added tags 19 and 20 for modules and packages, and a few values in the range are unassigned.
Consider the following snippet:
// Main.java
class Foo {
    public void bar() {
    }
}

public class Main {
    public static void main(String[] args) {
        Foo f = new Foo();
        f.bar();
        String lang = "java";
    }
}
The constant "java" is stored in the constant pool as:
#11 = Utf8 java
You can generalize the format as:
#index = type value
You will also find information on classes and methods used within this class in its constant pool:
// Main.class
#6 = Utf8 ()V
#7 = Class #8 // Foo
#8 = Utf8 Foo
#9 = Methodref #7.#3 // Foo."<init>":()V
#10 = Methodref #7.#11 // Foo.bar:()V
#11 = NameAndType #12:#6 // bar:()V
#12 = Utf8 bar
Class references (indicated by the Class type) consist of a single index pointing at a Utf8 entry that holds the name of the referenced class. Method references (Methodref entries) are more complex and take the form <Class>.<NameAndType>. The NameAndType entry in turn points at two Utf8 entries: the name of the method and its descriptor.
Any entry that references another entry will contain an index pointing to that other entry. For example, at index 7 is this entry: #7 = Class #8 // Foo. This entry refers to a class whose name is contained in index 8. The entry in index 8 is a Utf8 entry with the name of the class, Foo.
Every index used by an entry must be a valid index into the same constant pool; entries cannot point into the constant pool of another class.
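The way entries chain together can be sketched with a small in-memory model. Everything below (the ConstantPoolSketch class, its Entry type, and the fromListing helper) is hypothetical and exists only to illustrate index resolution; it uses the entries from the listing above and leaves out #9 because the entry it points to (#3) is not shown there.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a constant pool: not how the JVM stores it, just enough
// to follow indices from one entry to another.
public class ConstantPoolSketch {

    // An entry is either a Utf8 string or a reference holding up to two indices.
    static final class Entry {
        final String tag;
        final String text;   // only used by Utf8 entries
        final int first;     // first referenced index (0 if unused)
        final int second;    // second referenced index (0 if unused)

        Entry(String tag, String text, int first, int second) {
            this.tag = tag;
            this.text = text;
            this.first = first;
            this.second = second;
        }
    }

    private final Map<Integer, Entry> entries = new HashMap<>();

    public void putUtf8(int index, String text) {
        entries.put(index, new Entry("Utf8", text, 0, 0));
    }

    public void putRef(int index, String tag, int first, int second) {
        entries.put(index, new Entry(tag, null, first, second));
    }

    // Follows indices until only Utf8 text is left.
    public String resolve(int index) {
        Entry e = entries.get(index);
        if (e == null) {
            throw new IllegalArgumentException("No entry at #" + index);
        }
        switch (e.tag) {
            case "Utf8":
                return e.text;
            case "Class":
                return resolve(e.first);
            case "NameAndType":
                return resolve(e.first) + ":" + resolve(e.second);
            case "Methodref":
                return resolve(e.first) + "." + resolve(e.second);
            default:
                throw new IllegalArgumentException("Unsupported tag " + e.tag);
        }
    }

    // Builds the subset of Main.class entries shown in the listing.
    public static ConstantPoolSketch fromListing() {
        ConstantPoolSketch pool = new ConstantPoolSketch();
        pool.putUtf8(6, "()V");
        pool.putRef(7, "Class", 8, 0);
        pool.putUtf8(8, "Foo");
        pool.putRef(10, "Methodref", 7, 11);
        pool.putRef(11, "NameAndType", 12, 6);
        pool.putUtf8(12, "bar");
        return pool;
    }

    public static void main(String[] args) {
        // Resolves #10 -> #7.#11 -> Foo.bar:()V, matching the javap comment.
        System.out.println(fromListing().resolve(10));
    }
}
```

Resolving #10 visits #7, #8, #11, #12 and #6 in turn, which is the same walk javap does to produce its comments.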
Introduction to bytecode representation
The readable representation of the bytecode for the main method in the above example obtained via javap is:
0: new #7 // class Foo
3: dup
4: invokespecial #9 // Method Foo."<init>":()V
7: astore_1
8: aload_1
9: invokevirtual #10 // Method Foo.bar:()V
12: ldc #13 // String java
14: astore_2
15: return
The comments you see here are clarifications inserted by javap and do not appear in the constant pool.
Each line of a method’s representation describes a single bytecode instruction in the following format:
offset: instruction arg1, arg2
You may have noticed that the instruction offsets shown here are discontinuous. The first instruction is at 0, while the second one starts at 3. This is because an instruction's operands are embedded in the bytecode directly after its opcode, and the number of operand bytes depends on the instruction. For example, the invokespecial instruction requires one 2-byte operand. Similarly, the new instruction at the start takes a 2-byte operand that occupies offsets 1 and 2, which is why 3 is the next available offset for an instruction.
Note: Bytecode is represented as a byte array and its offsets are not the same as constant pool indices.
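The offset arithmetic can be reproduced with a short sketch that walks the byte array for main. The OffsetWalker class is hypothetical, and its operand-size table covers only the nine opcodes that appear in the listing; a real disassembler needs the full table from the JVM specification, including variable-length instructions such as tableswitch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: computes the offsets at which instructions start.
public class OffsetWalker {

    // Operand bytes that follow each opcode used in the example.
    static int operandBytes(int opcode) {
        switch (opcode) {
            case 0xBB: // new
            case 0xB7: // invokespecial
            case 0xB6: // invokevirtual
                return 2;
            case 0x12: // ldc
                return 1;
            case 0x59: // dup
            case 0x4C: // astore_1
            case 0x4D: // astore_2
            case 0x2B: // aload_1
            case 0xB1: // return
                return 0;
            default:
                throw new IllegalArgumentException(
                        "Opcode not covered by this sketch: 0x" + Integer.toHexString(opcode));
        }
    }

    public static List<Integer> instructionOffsets(byte[] code) {
        List<Integer> offsets = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            offsets.add(pc);
            int opcode = code[pc] & 0xFF;      // bytes are signed in Java
            pc += 1 + operandBytes(opcode);    // skip the opcode and its operands
        }
        return offsets;
    }

    // The 16 bytes of main from the listing, written out by hand.
    public static byte[] mainBytecode() {
        return new byte[] {
            (byte) 0xBB, 0x00, 0x07,   // 0: new #7
            0x59,                      // 3: dup
            (byte) 0xB7, 0x00, 0x09,   // 4: invokespecial #9
            0x4C,                      // 7: astore_1
            0x2B,                      // 8: aload_1
            (byte) 0xB6, 0x00, 0x0A,   // 9: invokevirtual #10
            0x12, 0x0D,                // 12: ldc #13
            0x4D,                      // 14: astore_2
            (byte) 0xB1                // 15: return
        };
    }

    public static void main(String[] args) {
        // Prints [0, 3, 4, 7, 8, 9, 12, 14, 15]
        System.out.println(instructionOffsets(mainBytecode()));
    }
}
```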
Method invocation
The JVM uses instructions such as invokevirtual, invokespecial, and invokestatic to invoke methods, depending on the kind of method. For example, constructors are invoked via invokespecial, static methods via invokestatic, and ordinary instance methods via invokevirtual. Instructions such as invokeinterface and invokedynamic fall outside this blog's scope.
Let’s take a closer look at the invokevirtual instruction in the listing for main:
9: invokevirtual #10 // Method Foo.bar:()V
In the example above, invokevirtual is at offset 9. It takes one 2-byte operand, located at offsets 10 and 11. The operand is interpreted as the index of a Methodref entry in the class's constant pool. Here the index is 10, meaning the entry at index 10 of the constant pool. javap has helpfully included the resolved value of that entry as a comment: Method Foo.bar:()V. This gives the JVM all the information it needs to invoke the specified method, Foo.bar(). Before the call, the object reference the method is invoked on (here loaded by aload_1) and any arguments are pushed onto the operand stack using instructions from the *const and *load families.
Note: We write *load because it is really a whole family of instructions. Depending on the prefix, an instruction loads an integer (iload), a float (fload), or an object reference (aload), among others, from a local variable onto the operand stack. The same principle applies to the *const family, which covers only integer and floating-point types plus null as a special case of a constant value (aconst_null).
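How the two operand bytes at offsets 10 and 11 become the constant pool index 10 can be shown in a few lines. OperandDecoder and readU2 are names invented for this sketch; the byte values are the hand-assembled bytecode of main, assuming the standard opcode encodings.

```java
// Illustrative only: combines two operand bytes into an unsigned 16-bit index.
public class OperandDecoder {

    // Reads a big-endian u2 starting at the given offset.
    public static int readU2(byte[] code, int offset) {
        int high = code[offset] & 0xFF;     // mask because Java bytes are signed
        int low = code[offset + 1] & 0xFF;
        return (high << 8) | low;
    }

    // Bytecode of main from the listing, written out by hand.
    public static byte[] mainBytecode() {
        return new byte[] {
            (byte) 0xBB, 0x00, 0x07, 0x59, (byte) 0xB7, 0x00, 0x09, 0x4C,
            0x2B, (byte) 0xB6, 0x00, 0x0A, 0x12, 0x0D, 0x4D, (byte) 0xB1
        };
    }

    public static void main(String[] args) {
        byte[] code = mainBytecode();
        // invokevirtual sits at offset 9, so its operand starts at offset 10.
        System.out.println("#" + readU2(code, 10)); // prints #10
    }
}
```

The masking with 0xFF matters for indices above 127, where a raw Java byte would otherwise be sign-extended to a negative value.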
Control flow
if conditions, loops, and unconditional jumps are important parts of control flow. Let’s take a look at how the JVM executes each of these.
Pre-requisites: Local array and stack
Every method invocation gets a small space allocated to it on the Java call stack, called a frame. A frame stores the method's local variables, its operand stack, and a reference to the constant pool of the method's containing class.
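As a rough mental model, a frame can be pictured as an array of local variable slots plus a stack. The FrameSketch class below is a toy written for this post, not how any real JVM lays out its frames; it leaves out the constant pool reference and mimics only the effect of ldc, astore and aload on the two structures.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy frame: a local variable array and an operand stack.
public class FrameSketch {
    public final Object[] locals;
    // ArrayDeque rejects null, which is fine here because the sketch
    // never pushes a null reference.
    public final Deque<Object> operandStack = new ArrayDeque<>();

    public FrameSketch(int maxLocals) {
        this.locals = new Object[maxLocals];
    }

    // Mimics ldc: pushes a constant onto the operand stack.
    public void push(Object value) {
        operandStack.push(value);
    }

    // Mimics astore_<n>: pops the top of the stack into a local slot.
    public void store(int slot) {
        locals[slot] = operandStack.pop();
    }

    // Mimics aload_<n>: pushes the content of a local slot onto the stack.
    public void load(int slot) {
        operandStack.push(locals[slot]);
    }

    public static void main(String[] args) {
        FrameSketch frame = new FrameSketch(3); // slots: args, f, lang
        frame.push("java");   // 12: ldc #13
        frame.store(2);       // 14: astore_2
        System.out.println(frame.locals[2] + ", stack size " + frame.operandStack.size());
    }
}
```

After the two steps in main, the string sits in local slot 2 and the operand stack is empty again, which is the state main is in just before its return instruction.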
Source: javacodegeeks.com