Wednesday, January 4, 2023

Quiz yourself: The three-argument overload of the Stream API’s reduce method


There are three reduce overloads; you should know what they do.

Imagine you have the following Person record:

record Person(String name, Integer experience) {}

Your colleague wrote the following code to calculate the total experience of all the people in the stream:

public static Integer calculateTotalExperience(Stream<Person> stream) {
  return stream.reduce(Integer.valueOf(0),
    (sum, p) -> sum += p.experience, // line n1
    (v1, v2) -> v1 * v2);            // line n2
}

To test the code, your colleague used the following test case, which produced a total experience of 15 years:

Person p1 = new Person("P1", 3);
Person p2 = new Person("P2", 3);
Person p3 = new Person("P3", 4);
Person p4 = new Person("P4", 5);
List<Person> list = List.of(p1, p2, p3, p4);
Integer totalAge = calculateTotalExperience(list.stream());

Which statement is correct? Choose one.

A. Line n1 contains an error.

B. Line n2 contains an error.

C. Both lines n1 and n2 contain errors.

D. The code is properly constructed for calculating the total experience.

Answer. This question investigates the three-argument overload of the reduce method in the Stream API.

In the Stream API, the reduce method creates a single result by taking the elements of the stream one at a time and updating an intermediate result. When all the stream data has been used, that intermediate result is considered final.

There are three reduce overloads, and it’s helpful to discuss all of them, since they introduce the key concepts sequentially.

The first overload takes a single argument that’s a BinaryOperator. That operator combines pairs of items of the stream data type into one item of the same type. Then the next stream item is combined with that intermediate result, and this is done repeatedly until all the stream data has been used. If the stream is empty, there can’t be a result in the normal way. Because of that, this overload returns an Optional that either contains the result of a nonempty stream or is itself empty to indicate no result.
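As a minimal sketch of the one-argument overload described above (the class name OneArgReduceSketch and the sample values are made up for illustration), the following shows the Optional result for a nonempty stream and for an empty one:

```java
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical class name, used only for this illustration.
class OneArgReduceSketch {

    // One-argument reduce: the BinaryOperator combines two Integers into one.
    // The result is wrapped in an Optional because the stream may be empty.
    static Optional<Integer> total(Stream<Integer> values) {
        return values.reduce((a, b) -> a + b);
    }

    public static void main(String[] args) {
        System.out.println(total(Stream.of(3, 3, 4, 5))); // Optional[15]
        System.out.println(total(Stream.empty()));        // Optional.empty
    }
}
```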

The two-argument overload of reduce also takes a value of the result type. This is called the identity value and must have a couple of properties. First, it represents the result value if the stream is empty. Second, it should be possible to incorporate this value into the binary operator’s calculations any number of times without changing the final result. So, for simple addition, this identity value would be zero. For multiplication, it would be one. Because the identity value is provided, this overload does not need to return an Optional, and instead it returns a value of the stream type under all normal circumstances.
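The role of the identity value can be sketched as follows, using zero for addition and one for multiplication. The class and method names are illustrative assumptions, not part of the original quiz:

```java
import java.util.stream.Stream;

// Hypothetical class name, used only for this illustration.
class IdentityReduceSketch {

    // Zero is the identity for addition: adding it never changes the result.
    static Integer sum(Stream<Integer> values) {
        return values.reduce(0, (a, b) -> a + b);
    }

    // One is the identity for multiplication.
    static Integer product(Stream<Integer> values) {
        return values.reduce(1, (a, b) -> a * b);
    }

    public static void main(String[] args) {
        System.out.println(sum(Stream.of(3, 3, 4, 5)));     // 15
        System.out.println(product(Stream.of(3, 3, 4, 5))); // 180
        // An empty stream yields the identity value, so no Optional is needed.
        System.out.println(sum(Stream.empty()));            // 0
    }
}
```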

The third overload, which takes three arguments, is the topic of this question. This overload is used when the result is not of the same type as the stream data. In other APIs, an equivalent method might go by another name, perhaps involving the word fold or aggregate.

This three-argument reduction takes an identity value of the result type, rather than the stream type. It also takes a BiFunction that combines a value of the result type with a value of the stream type and produces a new value of the result type. These two arguments are enough for a sequential stream, but they fall short when the stream runs in parallel.

In parallel mode, each of the separate threads that work on the reduction produces a partial result derived from just some of the stream’s data. To get to a final result, these partial results must be combined. This is the purpose of the third argument, which is a BinaryOperator of the result type. The signature of this method is as follows:

<U> U reduce(U identity,

   BiFunction<U, ? super T, U> accumulator,

   BinaryOperator<U> combiner);

In the general case of a stream running in sequential mode, there won’t be multiple partial results across multiple threads. Consequently, the combiner operation won’t be needed in a stream running in sequential mode. This fact turns out to be important to answering this question.
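To make the three roles concrete, here is a small sketch that reduces a Stream<Person> to an Integer and counts how often the combiner runs. The class name, the counter, and the use of the experience() accessor are assumptions made for this illustration. The count printed for a sequential stream is an implementation detail and is not guaranteed by the API, so no expected value is shown for it.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical class name, used only for this illustration.
class ThreeArgReduceSketch {

    record Person(String name, Integer experience) {}

    // Counts combiner invocations, purely for observation.
    static final AtomicInteger combinerCalls = new AtomicInteger();

    static Integer totalExperience(Stream<Person> people) {
        return people.reduce(
            0,                                 // identity, of result type U (Integer)
            (sum, p) -> sum + p.experience(),  // accumulator: (U, T) -> U
            (left, right) -> {                 // combiner: (U, U) -> U
                combinerCalls.incrementAndGet();
                return left + right;
            });
    }

    public static void main(String[] args) {
        List<Person> list = List.of(
            new Person("P1", 3), new Person("P2", 3),
            new Person("P3", 4), new Person("P4", 5));

        System.out.println(totalExperience(list.stream())); // 15
        // Typically zero for a sequential stream, but that is not a documented guarantee.
        System.out.println("combiner calls: " + combinerCalls.get());
    }
}
```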

In the code presented here, identity is an Integer containing zero. This is the correct value for the identity value of an addition operation.

The accumulator operation on line n1 is provided by the following lambda:

(sum, p) -> sum += p.experience

This code uses the += assignment operator to add the current stream item’s experience field to the sum so far. This might look suspect, for two reasons.

◉ First, the sum is an Integer object, and that type is immutable. However, in Java, method and lambda formal parameters can be reassigned unless they are declared final, and the expression actually reassigns the parameter sum to refer to an Integer object holding the new value rather than modifying the existing object. So this concern is unfounded.

◉ The second potential concern is that the lambda must implement a BiFunction that returns an Integer object, but this lambda lacks an obvious value to return. However, in Java, an assignment expression has a value. So the value of the expression sum += p.experience is the value assigned to sum. That’s the correct value, so this lambda is correct both syntactically and semantically. Therefore, there’s no error on line n1, and options A and C are both incorrect.
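Both points can be checked in isolation with a short sketch. The class name and the ADD constant are hypothetical, and plain Integer values stand in for the Person record:

```java
import java.util.function.BiFunction;

// Hypothetical class name, used only for this illustration.
class CompoundAssignmentSketch {

    // The lambda parameter sum is reassigned inside the lambda, and the value
    // of the compound assignment expression becomes the lambda's return value.
    static final BiFunction<Integer, Integer, Integer> ADD = (sum, x) -> sum += x;

    public static void main(String[] args) {
        Integer start = 10;
        Integer result = ADD.apply(start, 5);

        System.out.println(result); // 15
        // The caller's variable is untouched: only the lambda's own parameter was reassigned.
        System.out.println(start);  // 10
    }
}
```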

Next, consider the combiner provided on line n2. This has the job of adding up the intermediate sums that might be created in separate threads if the stream were executed in parallel mode. However, it multiplies the partial results instead of adding them, so that’s clearly a logical error. This tells you that option B is correct and, consequently, that option D is incorrect. Even though the code produces the right answer for the sequential test case, it is not correctly written.
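One way the method could be repaired is sketched below. It is an illustration rather than the article’s own fix: the class name is hypothetical, the record is nested so the example is self-contained, and the accumulator uses the experience() accessor and plain addition in place of the += form.

```java
import java.util.List;
import java.util.stream.Stream;

// Hypothetical class name, used only for this illustration.
class FixedTotalExperience {

    record Person(String name, Integer experience) {}

    static Integer calculateTotalExperience(Stream<Person> stream) {
        return stream.reduce(0,
            (sum, p) -> sum + p.experience(), // accumulator adds one person's experience
            Integer::sum);                    // combiner adds two partial sums
    }

    public static void main(String[] args) {
        List<Person> list = List.of(
            new Person("P1", 3), new Person("P2", 3),
            new Person("P3", 4), new Person("P4", 5));

        System.out.println(calculateTotalExperience(list.stream()));         // 15
        System.out.println(calculateTotalExperience(list.parallelStream())); // 15
    }
}
```

Because addition is associative and zero is its identity, the result is the same whether or not the combiner is invoked.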

As a side note, the Java documentation for the Collector interface describes a situation similar to the one in this quiz:

A sequential implementation of a reduction using a collector would create a single result container using the supplier function and invoke the accumulator function once for each input element. A parallel implementation would partition the input, create a result container for each partition, accumulate the contents of each partition into a sub-result for that partition, and then use the combiner function to merge the subresults into a combined result.

There doesn’t seem to be an equivalent statement for the reduce operation, but clearly the expectation is that the combiner will typically not be invoked when a stream runs sequentially. This also explains how the code generated the correct result when your colleague ran the test.

Of course, although there’s no obvious reason why it would be useful, nothing appears to guarantee that the combiner is never invoked in sequential mode. So a developer must not assume that the combiner will be unused. If you simply change the stream in this example to parallel mode, you should expect to get incorrect results.
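The following sketch lets you try this. It runs the same reduction with the multiplying combiner from the quiz and with an adding combiner, on sequential and parallel streams. The class name and the total helper, which takes the combiner as a parameter, are assumptions made for this illustration. The values printed for the multiplying combiner depend on whether and how the runtime splits the work, so no expected output is shown for them.

```java
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.stream.Stream;

// Hypothetical class name, used only for this illustration.
class ParallelCombinerSketch {

    record Person(String name, Integer experience) {}

    // Hypothetical helper: same accumulator every time, caller chooses the combiner.
    static Integer total(Stream<Person> stream, BinaryOperator<Integer> combiner) {
        return stream.reduce(0, (sum, p) -> sum + p.experience(), combiner);
    }

    public static void main(String[] args) {
        List<Person> list = List.of(
            new Person("P1", 3), new Person("P2", 3),
            new Person("P3", 4), new Person("P4", 5));

        BinaryOperator<Integer> multiply = (v1, v2) -> v1 * v2; // the combiner from line n2

        // Results with the multiplying combiner are not specified; compare them yourself.
        System.out.println("multiply, sequential: " + total(list.stream(), multiply));
        System.out.println("multiply, parallel:   " + total(list.parallelStream(), multiply));

        // An adding combiner gives the same total in both modes.
        System.out.println("sum, sequential: " + total(list.stream(), Integer::sum));         // 15
        System.out.println("sum, parallel:   " + total(list.parallelStream(), Integer::sum)); // 15
    }
}
```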

Conclusion. The correct answer is option B.

Source: oracle.com
