Monday, August 14, 2023

Inside the JVM: Arrays and how they differ from other objects

Arrays are unique objects inside the JVM, and understanding their structure makes for better coding.


The simplest way of classifying Java data items is to divide them into primitives and objects. Primitives, as most Java developers know, comprise boolean, byte, char, the remaining integer types (short, int, and long), and the floating-point types (float and double). Inside the JVM, these primitives are instantiated in a raw form. The declaration of an int creates a 32-bit signed integer field for the JVM to work with. Primitives declared as local variables live in the stack frame that is constructed for every method invocation, either in its local variables or on its operand stack. (The notable exception is primitive fields: instance fields are stored inside their object on the heap, and static fields are stored with the class rather than in any frame.)

In contrast to the simply allocated primitives, objects are entities that surround the data item with methods and sometimes with additional supporting fields. For example, a String object contains an array (which I’ll discuss shortly) that holds the contents of the string and supplementary fields that are used by a variety of methods defined for String. Objects are created on the heap. That is, they are allocated from free memory.

All objects, except arrays, have a constructor. If the source code does not define a constructor for a new class, a no-parameter constructor is created for it by the Java compiler. This constructor calls the no-parameter constructor of the superclass. Most often, that is the constructor in the Object class, which simply returns; that is, it does nothing.

The nature of arrays


Arrays are objects. However, inside the JVM, arrays are notably different from all other objects. The first major difference is that arrays are created directly by the JVM, not by a constructor that runs as the result of an implicit or explicit call by the developer. When the Java compiler encounters an array creation expression, such as new int[3], it emits a specific bytecode that tells the JVM to create an array: NEWARRAY for arrays of primitives, ANEWARRAY for arrays of object references, and MULTIANEWARRAY when more than one dimension is given a size. The compiler also specifies the kind of data items the array will hold (either primitives or objects) and, for MULTIANEWARRAY, how many dimensions to allocate.

The JVM next creates an array of the appropriate size and type and wraps it up as an object. That is, all the methods that arrays inherit from Object, such as toString(), are available to them. The elements of the last dimension of a newly created array are initialized to the default value for the data type (zero for the numeric types, false for boolean, and null for object references).
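As a small illustration of the two points above, the following sketch prints the default contents of a few freshly created arrays and shows that an array can be treated as a plain Object. The class name ArrayDefaults and the helper freshInts are made up for this example.

```java
import java.util.Arrays;

class ArrayDefaults {
    // Hypothetical helper: returns a newly allocated int array.
    // The JVM sets every element to 0 before handing it back.
    static int[] freshInts(int n) {
        return new int[n];
    }

    public static void main(String[] args) {
        int[] numbers = freshInts(3);
        String[] names = new String[2];
        boolean[] flags = new boolean[2];

        System.out.println(Arrays.toString(numbers)); // [0, 0, 0]
        System.out.println(Arrays.toString(names));   // [null, null]
        System.out.println(Arrays.toString(flags));   // [false, false]

        // Arrays are objects: they can be assigned to Object
        // and they inherit its methods.
        Object asObject = numbers;
        System.out.println(asObject.getClass().getName());      // [I
        System.out.println(numbers.getClass().getSuperclass()); // class java.lang.Object
    }
}
```

The name [I is the JVM's internal name for the int[] class; no source file anywhere declares it.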

Initializing arrays


As mentioned previously, arrays lack constructors. No default constructor is created by the Java compiler, and no constructor can be specified by the developer. One implication of this is that arrays must be initialized explicitly to their desired values. This is typically done through a for loop or directly at the time of the array declaration, as follows:

int[] importantYears = new int[] {800, 1066, 1492,};

(Note that the comma after the last value is accepted in Java and won’t cause an error.) Java does not allow initialization of selected elements using the previous syntax. You must initialize a specific element individually.
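The other initialization routes mentioned above might look like the following sketch. The class name ArrayInit and the helper squares are hypothetical; Arrays.fill is a standard library method that sets every element to one value.

```java
import java.util.Arrays;

class ArrayInit {
    // Hypothetical helper: fills a new array with 0, 1, 4, 9, ... using a for loop.
    static int[] squares(int n) {
        int[] result = new int[n];
        for (int i = 0; i < result.length; i++) {
            result[i] = i * i;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(squares(4))); // [0, 1, 4, 9]

        // Arrays.fill assigns the same value to every element.
        int[] filled = new int[3];
        Arrays.fill(filled, 7);
        System.out.println(Arrays.toString(filled)); // [7, 7, 7]

        // Assigning one element leaves the others at their default of 0.
        int[] sparse = new int[4];
        sparse[2] = 1492;
        System.out.println(Arrays.toString(sparse)); // [0, 0, 1492, 0]
    }
}
```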

Another curiosity of Java arrays is that they can have a size of zero.

int[] unimportantYears = new int[0];

This code will not result in an error message. This surprising feature is used primarily by code generators, which might create an array and then discover there are no values to place in it. In this example, unimportantYears is not null; instead, it’s an empty array. In the same manner, a zero-length string is not null, but rather it’s a valid object.
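One practical consequence is that code can return an empty array instead of null and callers need no special case. The sketch below assumes a made-up helper, evens, that may find nothing to return.

```java
import java.util.Arrays;

class EmptyArrays {
    // Hypothetical helper: returns the even values in the input, possibly none.
    static int[] evens(int[] values) {
        return Arrays.stream(values).filter(v -> v % 2 == 0).toArray();
    }

    public static void main(String[] args) {
        int[] none = evens(new int[] {1, 3, 5});

        System.out.println(none != null); // true
        System.out.println(none.length);  // 0

        // Looping over an empty array is safe; the body simply never runs.
        for (int value : none) {
            System.out.println(value);
        }
    }
}
```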

Multidimensional arrays


While single-dimension arrays have their quirks, multidimensional arrays contain a lot more curious magic. Here is an example of a three-dimensional array, whose three indexes represent the x, y, and z dimensions of the values for a financial transaction.

int[][][] points = new int[2][3][4]; // a point or 1% in interest

When the compiler encounters this code, it emits a dedicated bytecode, MULTIANEWARRAY, which creates an array with dimensions that are each set to the specified size. This array is implemented as an array of arrays. That is, the first two dimensions contain only pointers to other arrays. So, for example, when you access the data item at points[1][2][0], the 1 selects not a series of values but a pointer to an array of three pointers. The 2 selects one of those pointers, which leads to yet another array: an array of four ints, from which the 0 finally selects a value. Put another way, points is an array of two pointers to arrays of three pointers to arrays of four ints. Figure 1 shows this design pictorially.

Figure 1. A three-dimensional array as it’s created inside the JVM

If you think of this design as a tree, you’ll note that only leaf arrays contain actual values. This is somewhat counterintuitive. Two-dimensional arrays are often thought of as tables. (Strictly speaking, there is no tabular analogy in the JVM’s representation of a two-dimensional array; it’s not a rows-and-columns construct.)
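The tree structure can be observed from ordinary Java code, because each intermediate array is an object in its own right. The following is a minimal sketch; the class name ArrayOfArrays and the helper makePoints are invented for the example.

```java
class ArrayOfArrays {
    // Hypothetical helper: builds the 2 x 3 x 4 array used in the text.
    static int[][][] makePoints() {
        return new int[2][3][4];
    }

    public static void main(String[] args) {
        int[][][] points = makePoints();

        int[][] plane = points[1]; // an array of three references
        int[] leaf = plane[2];     // a leaf array of four ints

        // Writing through the leaf is visible through points,
        // because both names reach the same array object.
        leaf[0] = 42;
        System.out.println(points[1][2][0]); // 42

        // A leaf can be swapped for an array of a different length,
        // which would be impossible in a true rows-and-columns table.
        points[0][0] = new int[] {1, 2};
        System.out.println(points[0][0].length); // 2
        System.out.println(points[0][1].length); // 4
    }
}
```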

This design has important performance implications. The first is that to access an individual element in this example array, the JVM must dereference three pointers to get to the integer. For accessing individual values intermittently, that process represents little overhead. However, for multidimensional arrays where you frequently update all the values at once, such as via nested for loops, the repeated dereferencing of pointers can incur significant overhead.

One way to reduce this overhead is to consider unfolding the arrays into a single-dimension array. For example, replace the 2 x 3 x 4 array with a single array of 24 elements and then map the three coordinates yourself to the intended element in the array. Then, updating all the values in the array can be done quickly with greatly reduced overhead. As with all things, performance should be measured carefully to make sure the trade-off is worthwhile.
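One possible coordinate mapping is sketched below, assuming the same 2 x 3 x 4 shape as points. The class FlatPoints, the constants X, Y, and Z, and the method index are illustrative names, and the row-major formula shown is only one of several workable layouts.

```java
class FlatPoints {
    // Assumed dimension sizes, matching new int[2][3][4].
    static final int X = 2;
    static final int Y = 3;
    static final int Z = 4;

    // Maps (x, y, z) to a position in a single array of X * Y * Z elements.
    static int index(int x, int y, int z) {
        return (x * Y + y) * Z + z;
    }

    public static void main(String[] args) {
        int[] flat = new int[X * Y * Z];

        flat[index(1, 2, 0)] = 42;          // stands in for points[1][2][0] = 42
        System.out.println(index(1, 2, 0)); // 20
        System.out.println(index(1, 2, 3)); // 23, the last element

        // Updating every value is a single loop over one array,
        // with no intermediate arrays to follow.
        for (int i = 0; i < flat.length; i++) {
            flat[i] += 1;
        }
        System.out.println(flat[index(1, 2, 0)]); // 43
    }
}
```

The cost of this design is that the JVM no longer checks each coordinate separately: an out-of-range y can silently land on a valid element belonging to a different x, so the mapping method is a sensible place for any validation you need.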

Array size and the concept of arrays of arrays


Many Java collections have a method called size(), which returns an integer stating the number of elements in the collection. Arrays have no such method. There are several reasons for this, but the principal one is that arrays are simple Object instances—they are not collections. The Object class has no size() method, so arrays don’t either.

Arrays, however, have a property called length, which can be queried to get the number of elements in the specified array. In a single-dimension array, such as the first example in this article, the following expression evaluates to 3.

importantYears.length

With multidimensional arrays, the same query gives a perhaps unexpected result. Using the previous points array, points.length is equal to 2, rather than the value of 24 that you might expect. The reason is that points is considered only as an array of two elements (which happen to be pointers to other arrays). If you want to get the size of all the dimensions, you need to write the following:

System.out.printf("\n Length of points: %d", points.length);
System.out.printf("\n Length of points[0]: %d", points[0].length);
System.out.printf("\n Length of points[0][0]: %d", points[0][0].length);

The code above prints the following:

Length of points: 2
Length of points[0]: 3
Length of points[0][0]: 4

As you can see, what you have is truly three levels of arrays, working together to create the equivalent of a three-dimensional array. So, to get the size of each dimension, you need to specify exactly which level you want. (It’s somewhat counterintuitive that points[0].length reports the size of the second dimension rather than the first.)
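If what you want is the total of 24, it has to be computed by walking the levels. A minimal sketch follows, using a made-up class ElementCount and method count. It adds up the leaf lengths rather than multiplying the three sizes, so it also gives the right answer when the leaf arrays differ in length.

```java
class ElementCount {
    // Hypothetical helper: counts the ints stored in all leaf arrays.
    static int count(int[][][] array) {
        int total = 0;
        for (int[][] plane : array) {
            for (int[] row : plane) {
                total += row.length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[][][] points = new int[2][3][4];
        System.out.println(count(points)); // 24
    }
}
```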

Here’s an interesting question: What would happen in a multidimensional array if one of the dimensions were declared with a size of 0? For example,

int[][][][] strangePoints = new int[3][4][0][2];

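In this declaration, all dimensions after the zero-size dimension are ignored: no arrays are allocated for them. So, the result of this declaration is a two-level structure of 3 x 4 references, each pointing to an empty array, and it holds no ints at all. The declared type is unchanged; strangePoints is still a four-dimensional array type. This makes sense because a zero-size dimension would contain no pointers, so it’d be unable to point to subsequent layers.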
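The shape that results from a zero-size dimension can be checked directly by printing the length at each level. This is a small sketch; the class name ZeroDimension and the helper make are illustrative.

```java
class ZeroDimension {
    // Hypothetical helper: builds the array from the text.
    static int[][][][] make() {
        return new int[3][4][0][2];
    }

    public static void main(String[] args) {
        int[][][][] strangePoints = make();

        System.out.println(strangePoints.length);       // 3
        System.out.println(strangePoints[0].length);    // 4
        System.out.println(strangePoints[0][0].length); // 0

        // There is no strangePoints[0][0][0]: the third level is empty,
        // so indexing into it would throw ArrayIndexOutOfBoundsException.
    }
}
```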

Back inside the JVM


Eagle-eyed readers of my earlier statement about length being a property rather than a method might wonder how a direct subclass of Object would have a field called length to begin with, as Object has no such field. The answer is that there is a little magic going on inside the Java compiler. When the compiler detects a reference to the length of an array, it emits a special bytecode, ARRAYLENGTH, which obtains the length of the array and places it on the operand stack. The effect resembles a method call that returns a value, but all method calls in the JVM require one of a small set of invocation bytecodes, and they are implemented via the creation of a new frame with stack allocation and several other operations. None of that happens with this special bytecode.
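Reflection offers one way to see that length is not an ordinary field. In the sketch below, the class LengthProbe and the method lengthOf are invented names; getFields, getDeclaredConstructors, and java.lang.reflect.Array.getLength are standard library calls.

```java
import java.lang.reflect.Array;

class LengthProbe {
    // For this method, javap -c should show aload_0, arraylength, ireturn:
    // no field access and no method invocation.
    static int lengthOf(int[] array) {
        return array.length;
    }

    public static void main(String[] args) {
        int[] years = {800, 1066, 1492};
        System.out.println(lengthOf(years)); // 3

        // The array class reports no public fields, so length is not among them.
        System.out.println(int[].class.getFields().length); // 0

        // It reports no constructors either.
        System.out.println(int[].class.getDeclaredConstructors().length); // 0

        // Reflective code has to ask for the length through a dedicated API.
        System.out.println(Array.getLength(years)); // 3
    }
}
```

To inspect the bytecode yourself, compile the class and run javap -c LengthProbe.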

No other Java objects have a corresponding bytecode for determining their size. This is just one of the many aspects that make arrays entirely unique entities inside the JVM. So, now when you code arrays, you’ll know there’s magic going on, and you’ll understand how to use the magic to get the behavior you’re looking for.

Source: oracle.com
