Monday, March 6, 2023

Pseudorandom numbers in Java, Part 2: Randomness with Java 17


Java 17 introduced significant changes to the platform’s pseudorandom number generation capabilities. The first article in this series, “Pseudorandom numbers in Java, Part 1: The Background,” offers a discussion of pseudorandom number generators (PRNGs) and how they work. This article discusses the changes made to PRNGs in Java 17.

Pseudorandom Numbers, Java 17, Oracle Java Tutorial and Materials, Core Java, Java Career, Java Skills, Java Jobs, Java Certification, Java Prep, Java Prepararation

Random number generation was perhaps not given quite enough love in early releases of Java. This inspired a Wikipedia discussion of poor-quality PRNGs that noted

As an illustration, consider the widely used programming language Java. Up until 2020, Java still relied on a linear congruential generator (LCG) for its PRNG, which is of low quality…

Developers have been able to choose between several random generators in Java. The best-known is java.util.Random; another well-known interface, java.lang.Math.random(), simply delegates to java.util.Random.nextDouble().

Java 17 introduced a new interface, java.util.random.RandomGenerator, to consolidate the implementations of existing and new random number generators. The new code is in the java.util.random package, which is also new in Java 17. RandomGenerator has the same methods as java.util.Random, including both the legacy methods and the Java 8 methods that generate Java streams of randoms of various types. For example, for double values, the new interface contains the methods described in Listing 1.

Listing 1. Methods from RandomGenerator with double and DoubleStream values

public default double nextDouble();
    public default double nextDouble(double min);
    public default double nextDouble(double min, double max);
    public default DoubleStream doubles();
    public default DoubleStream doubles(double min, double max);
    public default DoubleStream doubles(long count);
    public default DoubleStream doubles(long count, double min,
        double max);
    public default double nextGaussian();
    public default double nextGaussian(double min, double max);
    public default double nextExponential();

There are similar methods for int and long numeric types. Raw bytes and booleans do not get the same treatment; there is only void nextBytes(byte[]) and boolean nextBoolean().

The legacy classes java.util.Random and java.util.SecureRandom were backfitted to implement this interface, so if you switch the declaration to the new interface, the legacy implementations become interchangeable with the new generators. The legacy classes’ algorithms were not improved, however, to avoid breaking Java’s ever-important backwards compatibility. In fact, given the same seed, the java.util.Random class in both Java 11 and Java 17 will output exactly the same sequence.

In the following example, Random5 is a modified version of the random histogram program shown in Part 1 of this series, which starts with a fixed seed. It should (and does) produce the same stream with both implementations. The withjava command is a script of my own that I use to run Random5 on a particular version of Java.

$ withjava 11 java -Xms4G -Xmx4G Random5.java
Using JAVA_HOME=/usr/local/jdk-11 java -Xms4G -Xmx4G Random5.java in /home/ian/git/Randomness/src/main/java/com/darwinsys/random/15vs17
First 5 random ints are:
1553932502 -2090749135 -287790814 -355989640 -716867186
Generating 100000000 randoms using Random
Generating randoms took 6640 mSec
$ withjava 17 java -Xms4G -Xmx4G Random5.java
Using JAVA_HOME=/usr/local/jdk-17 java -Xms4G -Xmx4G Random5.java in /home/ian/git/Randomness/src/main/java/com/darwinsys/random/15vs17
First 5 random ints are:
1553932502 -2090749135 -287790814 -355989640 -716867186
Generating 100000000 randoms using Random
Generating randoms took 7448 mSec
$

In my tests, over multiple runs, the Java 17 implementation was consistently 10% to 12% slower than the Java 11 version, while generating exactly the same outputs.

SecureRandom has always been about generating better random numbers for use in security-related software. The implementation is intended to meet the requirements specified in section 4.9.1 of the US National Institute of Standards and Technology (NIST) FIPS PUB 140-2, “Security Requirements for Cryptographic Modules,” and the Internet Engineering Task Force (IETF) RFC 4086, “Randomness Requirements for Security.”

Java has two other legacy RandomGenerator implementations: ThreadLocalRandom (from Java 1.7) and SplittableRandom (from Java 1.8). These are listed in the Java 17 RandomGenerator documentation. Of these, ThreadLocalRandom is meant for use in multithreaded designs, and the SplittableRandom class is designed for use in fork/join algorithms where each split should have its own independent PRNG. These two do not meet the security requirements mentioned above; SecureRandom should be used when security is needed.

It turns out that most of the methods of the RandomGenerator interface have default methods coded in terms of nextLong() (the only abstract method in the interface); a complete (but not very usable) implementation of this interface is shown in Listing 2.

Listing 2. A minimal implementation of Java 17’s RandomGenerator

public class CrappyRandom implements RandomGenerator {

    // The attacker has a hard time guessing the
    // exact nanosecond at which this is initialized.
    private long x0 = (21 * System.nanoTime()) & 0xffff;
    private long a = (long) Math.pow(2,7) + 1;
    private long c = 1024;   // The increment
    private long mod = (long) Math.pow(2,32);  // The modulus
    private long lastVal = x0;

    public void setKnuthBadParams() {
        // As per Knuth , to get 7 6 9 0, 7 6 9 0, ...
        mod = 10;
        lastVal = x0 = a = c = 7;
    }

    @Override
    /**
     * This is a crude implementation just to get going.
     * Tests will reveal its quality (or lack thereof).
     */
    public long nextLong() {
        return lastVal = (a * lastVal + c) % mod;
    }
}

Important note: Please do not use this sample code for anything! It produces terribly bad random numbers. It is just here to show that if you can write a decent algorithm, the interface will massage its results to provide all the other types needed to complete the interface.

To avoid the poor behavior of LCG algorithms used on their own, Java 17 introduces a new class of random number algorithms called LXM. The name stands for “LCG + XOR + Mix,” representing the three stages: an LCG, an XOR subgenerator, and a mixer to combine them. The first two stages are based on the best-available LCG values and an exclusive-or (XOR) function; this combination has been shown to provide the best results.

To avoid creating dozens of new public classes for this family of algorithms, Java 17 introduces the new RandomGeneratorFactory class, which allows you to create an implementation of a particular algorithm, as follows:

String algorithm = "L64X1024MixRandom";
RandomGenerator genx = RandomGeneratorFactory.of(algorithm).create();
genx.doubles(1000).doSomethingWithValues();

These algorithms are implemented by nonpublic classes with the same names in the private jdk.random package. For example, the snippet above assigns genx to an instance of jdk.random.L64X1024MixRandom. The list of new algorithms is

  • L32X64MixRandom
  • L32X64StarStarRandom
  • L64X128MixRandom
  • L64X128StarStarRandom
  • L64X256MixRandom
  • L64X1024MixRandom
  • L128X128MixRandom
  • L128X256MixRandom
  • L128X1024MixRandom
  • Xoshiro256PlusPlus
  • Xoroshiro128PlusPlus

The RandomGenerator interface includes a getDefault() default method, which, as expected, returns a usable implementation of the interface without specifying too much about its behavior. In Java 17 on my system, it returns a jdk.random.L32X64MixRandom, but that is not documented and should not be counted on. However, note that getDefault() will be a decent choice unless you need one of the others.

RandomGeneratorFactory also has an all() method that returns a stream of all the providers. Running my simple ShowAllGenerators program displays the following:

Group Name                          Period
      LXM L128X1024MixRandom          Infinity
      LXM L128X128MixRandom         1.15792e+77
      LXM L128X256MixRandom         3.94020e+115
      LXM L32X64MixRandom           7.92282e+28
      LXM L64X1024MixRandom           Infinity
      LXM L64X128MixRandom          6.27710e+57
      LXM L64X128StarStarRandom     6.27710e+57
      LXM L64X256MixRandom          2.13599e+96
   Legacy Random                    2.81475e+14
   Legacy SecureRandom                 0.00000
   Legacy SplittableRandom          1.84467e+19
  Xoshiro Xoshiro256PlusPlus        1.15792e+77
Xoroshiro Xoroshiro128PlusPlus      3.40282e+38

The few algorithms that show Infinity for their period in the output aren’t truly infinite, of course, but they are too large to display in a double. (The RandomGeneratorFactory.period() method actually returns a BigInteger, but I displayed them as double to show the scale instead of a vast run of digits.)

If you know which algorithm you want, you can use the of method, as in the following:

RandomGenerator random = RandomGenerator.of("L128X128MixRandom");

Choosing the best algorithm


The following is excerpted verbatim from the documentation for the java.util.random package, which explains the trade-offs clearly and concisely. So there’s no reason for me to paraphrase it.

There are three groups of random number generator algorithm [sic] provided in Java: the Legacy group, the LXM group, and the Xoroshiro/Xoshiro group.

The legacy group includes random number generators that existed before JDK 17: Random, ThreadLocalRandom, SplittableRandom, and SecureRandom. Random (LCG) is the weakest of the available algorithms, and it is recommended that users migrate to newer algorithms. If an application requires a random number generator algorithm that is cryptographically secure, then it should continue to use an instance of the class SecureRandom.

The algorithms in the LXM group are similar to each other. The parameters of each algorithm can be found in the algorithm name. The number after “L” indicates the number of state bits for the LCG subgenerator, and the number after “X” indicates the number of state bits for the XBG subgenerator. “Mix” indicates that the algorithm uses an 8-operation bit-mixing function; “StarStar” indicates use of a 3-operation bit-scrambler.

The algorithms in the Xoroshiro/Xoshiro group are more traditional algorithms (see David Blackman and Sebastiano Vigna, “Scrambled Linear Pseudorandom Number Generators,” ACM Transactions on Mathematical Software, 2021); the number in the name indicates the number of state bits.

For applications (such as physical simulation, machine learning, and games) that do not require a cryptographically secure algorithm, this package provides multiple implementations of interface RandomGenerator that provide trade-offs among speed, space, period, accidental correlation, and equidistribution properties.

For applications with no special requirements, L64X128MixRandom has a good balance among speed, space, and period, and is suitable for both single-threaded and multi-threaded applications when used properly (a separate instance for each thread).

If the application uses only a single thread, then Xoroshiro128PlusPlus is even smaller and faster, and certainly has a sufficiently long period.

For an application running in a 32-bit hardware environment and using only one thread or a small number of threads, L32X64MixRandom may be a good choice.

For an application that uses many threads that are allocated in one batch at the start of the computation, either a “jumpable” generator such as Xoroshiro128PlusPlus or Xoshiro256PlusPlus may be used, or a “splittable” generator such as L64X128MixRandom or L64X256MixRandom may be used.

For an application that creates many threads dynamically, perhaps through the use of spliterators, a “splittable” generator such as L64X128MixRandom or L64X256MixRandom is recommended. If the number of generators created dynamically may be very large (millions or more), then using generators such as L128X128MixRandom or L128X256MixRandom, which use a 128-bit parameter rather than a 64-bit parameter for their LCG subgenerator, will make it much less likely that two instances use the same state cycle.

For an application that uses tuples of consecutively generated values, it may be desirable to use a generator that is k-equidistributed such that k is at least as large as the length of the tuples being generated. The generator L64X256MixRandom is provably 4-equidistributed, and L64X1024MixRandom is provably 16-equidistributed.

For applications that generate large permutations, it may be best to use a generator whose period is much larger than the total number of possible permutations; otherwise it will be impossible to generate some of the intended permutations. For example, if the goal is to shuffle a deck of 52 cards, the number of possible permutations is 52! (52 factorial), which is larger than 2225 (but smaller than 2226), so it may be best to use a generator whose period is at least 2256, such as L64X256MixRandom or L64X1024MixRandom or L128X256MixRandom or L128X1024MixRandom. (It is of course also necessary to provide sufficiently many seed bits when the generator is initialized, or else it will still be impossible to generate some of the intended permutations.)

There are other interesting methods in the factory class; running javap (or looking at the documentation) reveals them all.

$ javap java.util.random.RandomGeneratorFactory
Compiled from "RandomGeneratorFactory.java"
public final class RandomGeneratorFactory<T extends RandomGenerator> {
  static <T extends RandomGenerator> T of(String, Class<T>) throws IllegalArgumentException;
  static <T extends RandomGenerator> RandomGeneratorFactory<T> factoryOf(String, Class<T>) throws IllegalArgumentException;
  public static <T extends RandomGenerator> RandomGeneratorFactory<T> of(String);
  public static RandomGeneratorFactory<RandomGenerator> getDefault();
  public static java.util.stream.Stream<RandomGeneratorFactory<RandomGenerator>> all();
  public String name();
  public String group();
  public int stateBits();
  public int equidistribution();
  public java.math.BigInteger period();
  public boolean isStatistical();
  public boolean isStochastic();
  public boolean isHardware();
  public boolean isArbitrarilyJumpable();
  public boolean isJumpable();
  public boolean isLeapable();
  public boolean isSplittable();
  public boolean isStreamable();
  public boolean isDeprecated();
  public T create();
  public T create(long);
  public T create(byte[]);
}
$

The informational methods (such as isHardware() and isJumpable()) are in the RandomGeneratorFactory but not in the RandomGenerator instances themselves. The full list of informational methods is shown in Table 1.

Table 1. Informational methods in RandomGeneratorFactory

Name Meaning 
isStatistical() Returns true if the generated values are computed using an algorithm that is statistically deterministic.
isStochastic()  Returns true if the generated values are computed using some external or entropic input(s).
isHardware()  Returns true if the random number generator (RNG) is implemented as a hardware random number generator (HRNG). 
isJumpable()  Returns true if the RNG can jump ahead in the cycle of random data. 
isArbitrarilyJumpable()  Returns true if the RNG can jump arbitrarily far into the cycle of data. 
isLeapable()  Returns true if the RNG can jump to a very distant point in the cycle. 
isSplittable()  Returns true if the RNG can be cloned to make a second generator with the same properties but positioned further in the cycle. 
isStreamable();  Returns true if the RNG can output into a Java stream. 
isDeprecated()  Returns true if the implementation of RandomGenerator has been marked for deprecation. 

Suppose you wanted to determine which RandomGenerator instances are streamable, that is, which can emit random values as a stream.

// From ShowStreamableGenerators
RandomGeneratorFactory.all()
    .filter(rgFactory -> rgFactory.isStreamable())
    .map(rgFactory-> rgFactory.name())
    .sorted()
    .forEach(System.out::println);

As you might expect, all the new PRNGs are streamable, and all the legacy ones are not.

Testing randomness


Two of the main test suites for RNGs are TestU01 and PractRand.

TestU01, by Pierre L’Ecuyer and Richard Simard at the University of Montreal, includes a 200-page user manual that is highly recommended for those who are mathematically oriented; it’s useful even if you don’t plan to use the test suite. The authors note the following, in passing:

There is no universal test or battery of tests that can guarantee, when passed, that a given generator is fully reliable for all kinds of simulations. Passing many tests improves one’s confidence in the RNG, although it never proves that the RNG is foolproof. In fact, no RNG can pass every conceivable statistical test. One could say that a bad RNG is one that fails simple tests, and a good RNG is one that fails only very complicated tests that are extremely hard to find or impractical to run.

The LXM algorithms have been tested against both TestU01 and PractRand, and they passed. The work for Java 17 was done under JEP 356, Enhanced pseudo-random number generators, which contains more information on the choices made, the tests performed and passed, and so on.

Performance


If you need only a few random numbers, performance isn’t a big deal. But if you need millions or billions, speed matters. To measure the relative performance of each generator, I wrote a simple timing loop called TimeAllGenerators. I generated 10 million randoms with each generator to get measurable values.

dalai-LAPTOP-random $ java TimeAllGenerators.java | sort -k5n
Generating 10000000 randoms took 55 mSec using SplittableRandom
Generating 10000000 randoms took 56 mSec using Xoroshiro128PlusPlus
Generating 10000000 randoms took 57 mSec using L64X128MixRandom
Generating 10000000 randoms took 57 mSec using L64X128StarStarRandom
Generating 10000000 randoms took 58 mSec using L32X64MixRandom
Generating 10000000 randoms took 59 mSec using Xoshiro256PlusPlus
Generating 10000000 randoms took 63 mSec using L64X1024MixRandom
Generating 10000000 randoms took 65 mSec using L64X256MixRandom
Generating 10000000 randoms took 76 mSec using L128X128MixRandom
Generating 10000000 randoms took 77 mSec using L128X256MixRandom
Generating 10000000 randoms took 89 mSec using L128X1024MixRandom
Generating 10000000 randoms took 207 mSec using Random
Generating 10000000 randoms took 2571 mSec using SecureRandom

As you can see, the legacy implementation java.util.Random takes around four times as long as the more modern algorithms and gives less-reliable randomness. SecureRandom is 10 times slower but gives good randomness for security-based applications. If you need to generate a lot of random numbers, take this into account. If you need only a small number of randoms, any of the generators will do.

Migrating to the new APIs


Should you change to a newer API? Migrate if good random performance and reliability is important. If you are just using one or two random values, it may not be worthwhile. Otherwise, yes; you should migrate.

The first step in migrating would be to change

Random random = new Random();

to

RandomGenerator random = new Random();

Then, at least you are depending on the interface. A better option would be to change to the following:

RandomGenerator random = RandomGenerator.default();

Then, you would be using one of the newer implementations, which is chosen for you.

If you want to ask for a particular implementation, based on the performance figures I generated and on the notes on choosing a generator given previously, use the following:

RandomGenerator random = RandomGenerator.of("L64X1024MixRandom");

Finally, consider moving the code to a streams approach, since you now have a stream-capable random generator. Note that migrating iterative code to streams is a topic for another article. For that, I’ll refer you to Raoul-Gabriel Urma’s article, “Processing data with Java SE 8 streams, Part 1.”

In summary, Java 17 provides a new framework for building multiple implementations of PRNGs that provide a standard interface and whose implementations may more readily be exchanged. While the legacy implementations are not changed and still use LCGs, the new ones use the best-known LCG parameters and combine these with an XOR operation to be more random.

My final advice: For general use, if you want good randomness, use RandomGenerator.default() in new code, and consider migrating to that while maintaining legacy code. For security work, continue to use SecureRandom.

Source: oracle.com

Related Posts

0 comments:

Post a Comment