
Saturday, October 14, 2023

Introducing the Visual Recognition spec for Java machine learning


The VisRec API, JSR 381, addresses common pain points for machine learning in Java.


There are not many machine learning (ML) coding options for Java developers, and the ML libraries currently available have several issues. Many are very complex and designed for data scientists, or they are Java wrappers around C/C++ libraries and don’t feel like Java tools.

Of course, one native Java library is the increasingly popular Tribuo. But that’s not all: For image recognition tasks, there’s JSR 381, the Visual Recognition (VisRec) Specification. This JSR was released in early 2022 and was designed to address the following common pain points for ML in Java:

  • Many different incompatible data formats
  • Many different ML algorithms
  • Many confusing configuration parameters
  • Lack of a clear task-oriented interface that hides implementation details, such as Collections.sort()
  • Lack of simple and portable integration into existing Java applications and devices

The authors of the VisRec specification explained that “there are wide business implications for machine learning capabilities in all applications across many types of devices. VisRec is an important subset of ML. Right now, the primary language for ML is Python. We feel Java needs to play a major role in ML, starting with VisRec.”

The specification authors added that typically, the API will be used in conjunction with an ML engine, package, or set of libraries that “would execute on a server or set of servers (as most ML applications) and callable from either the server side or remotely from a distributed client (JavaFX, web, command-line).”

Common applications of visual ML include pattern recognition and classifications. Imagine training and then processing a series of images and videos.

  • How many dogs are in those still images? How many cats? How many lemurs? For videos, how many unique dogs were captured?
  • Is there a vehicle in a video, and if so, what type? Is it speeding? Did it go through a stop sign without stopping? Was it speeding through a neighborhood while people were attempting to legally cross the street? Could the registration plate be read?
  • Did a security camera show that a shopper at a self-checkout skipped scanning an item?
  • Was the product label correctly affixed to a product in an automated assembly facility?

Two of the JSR 381 coauthors, Frank Greco and Zoran Sevarac, had an online chat with Mala Gupta about the VisRec API and its goals. Greco is a senior consultant at Google and chair of the New York Java user group, JavaSIG. Sevarac is an associate professor at the University of Belgrade and a Java and AI deep-learning expert. Mala Gupta is a developer advocate at JetBrains and is a frequent Java Magazine author. All three are Java Champions.

Here are a few highlights from their conversation.

Gupta: What is the most common myth you hear from Java developers about ML in Java applications?

Greco: It’s that you must be a data scientist to do ML. Of course, there will be some experts, but the bulk of the people using ML are not going to be data scientists. The core of ML is recognizing patterns, building models based on that data, and making predictions. Python is the default when it comes to ML, but we wanted to create something for Java, too.

Gupta: What exactly is JSR 381?

Greco: From a high level, it’s a Java-friendly standard API. It’s not only for visual recognition; it’s also a generic ML API. It includes the usual high-level abstractions. It uses Java paradigms. It’s more readable. The bottom line: It was designed for Java developers.

Gupta: What was your primary goal when developing JSR 381?

Sevarac: The main goal was to make visual recognition and ML easy to use by nonexperts. Java developers should be able to use this API intuitively and use their Java skills to the maximum extent, so they don’t have to learn new things. Or, if they do need to learn new things, they can learn them along the way through coding. We were also aware that it was not possible for us to implement all potential use cases in all possible learning algorithms. Therefore, the important point is to stay open, so any existing ML libraries out there can implement this API without much difficulty. We have created a reusable design, which can be applied to other domains, too.

Gupta: Several Java ML libraries already exist. Why develop JSR 381?

Sevarac: There are many libraries with different sets of APIs, but none of them works well with the others. Each library tries to reinvent the wheel. Our idea was to create a standard API collection that would address the most typical use cases and that would be extendable, so any existing ML library could easily implement it.

Also, existing libraries support different specific sets of algorithms and there are no common abstractions, or they are not compatible with each other. When it comes to visual recognition, there are so many Java imaging libraries that use different images and classes, and some of them use native dependencies, which creates portability issues. Most of these libraries do not look like Java and are very complex for the average Java developer, not only because the API is not Java flavored but also because most application developers don’t have a background in data science or know how ML algorithms work and how to configure them. These are all issues we are trying to address with JSR 381.

Gupta: What are some of the common barriers for ML in Java that you tried to address with JSR 381?

Sevarac: We tried to create abstractions where you can specify, with generic parameters, what type of data you’re going to use. There are many different ML algorithms, and we tried to create abstractions for the most common ones. Existing and new libraries should consider implementing some of these abstractions.

Confusing configuration parameters are one of the biggest challenges for every ML algorithm. We tried to keep those to a minimum and provide reasonable defaults—or at least some starting points people can use to create models. Learning these parameters will come with experience.

One of the most important things we addressed is providing a clear, task-oriented interface. It is very simple and straightforward.

Also, we felt it was important to hide implementation details. Developers don’t necessarily want to know details about how an image is being stored in memory and the most efficient way calculations are performed on that image. That might be important for the people who are implementing the algorithms, but end users generally don’t need those details.

Finally, JSR 381 provides simple and portable integration into existing Java applications and devices because the reference implementation is pure Java. It does not have native dependencies. That makes it very portable. You get all the benefits of the Java ecosystem.
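
To make that task-oriented idea concrete, here is a minimal sketch of the kind of interface Sevarac describes. The interface and method names below are invented for illustration only; they are not the actual JSR 381 classes.

// Illustrative sketch only; these names are hypothetical, not the real VisRec API.
import java.io.File;
import java.util.Map;

interface ImageClassifier {
    // Classify an image and return labels with confidence scores,
    // hiding how the image is stored and how the model is executed.
    Map<String, Float> classify(File image);
}

// A caller works purely in terms of the task:
// Map<String, Float> results = classifier.classify(new File("dog.jpg"));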

Gupta: How would you sum up JSR 381 and its impact on the Java community?

Sevarac: We believe we created something that can evolve into a friendly Java API for developers. We could not solve all problems for everyone, but I think this is a good starting point. I believe that with feedback from the community, we can move this forward.

Source: oracle.com

Wednesday, May 10, 2023

Quiz yourself: Nested lambdas and Java thunks



Given the following three functional interfaces

interface Woo { public void wow(); }
interface Moo { public String mow(); }
interface Boo { public int bow(); }

and a class fragment

static void doIt(Woo w) {
  System.out.print("W");
  w.wow();
}
static String doIt(Moo m) {
  System.out.print("M");
  return m.mow();
}
static int doIt(Boo b) {
  System.out.print("B");
  return b.bow();
}
public static void main(String[] args) {
  doIt(() -> { doIt(() -> "done"); });
  doIt(() -> { doIt(() -> 1); });
}

What is the result? Choose one.

A. MMBB is printed.
B. WWWW is printed.
C. WMWB is printed.
D. MWBW is printed.
E. Compilation fails in the main method because the code is ambiguous.

Answer. This question investigates how lambdas take on a type that is compatible with their code and the context in which they are declared.

In this question, you are presented with several lambdas, including nested lambdas. That’s certainly a recipe for hard-to-read code, but the rules don’t change from the simple case. Look at these lambdas in turn and notice which interface each might be compatible with.

The two lambdas that are nested inside the others are () -> "done" and () -> 1.

It is clear that the first of these is compatible with the interface Moo because the method it defines takes no arguments and returns a String. Further, that first lambda is not compatible with either of the other interfaces because you can’t cast or promote a String to an int.

The second lambda is compatible with Boo because it declares a method that takes no arguments and returns an int. Again, casting or promotions can’t reconcile that lambda with either of the other two interface types.

Next, look at the outer lambdas: () -> { doIt(() -> "done"); } and () -> { doIt(() -> 1); }.

You know that the return type of the first enclosed lambda is String and that of the second is int. However, notice that these enclosing lambdas are block lambdas; that is, they include curly braces and must, therefore, define entire method bodies. Those bodies do not include return statements, so in both cases the enclosing lambda invokes the enclosed lambda and ignores the returned value. The enclosing lambdas, therefore, have a void return type; as such, both are instances of the Woo interface.

At this point, you should have an instinct that since you have lambdas implementing all three interfaces, the output should contain all three letters: M, B, and W. If that instinct is correct, you can conclude that options A and B are looking unlikely. But that’s just a gut feeling, and while you might let that guide you if you’re running short of time in the exam, let’s trace the execution to determine what actually happens.

Think about the order in which the enclosing and enclosed lambdas’ bodies actually execute. In a Java method call, the arguments to a method invocation are always evaluated before the method is actually called. However, the value of a lambda is an object that contains the method the lambda describes; Java does not execute that method when constructing the object. That means that when a lambda is passed as an argument, the method it represents has not yet been executed at the moment the receiving method is invoked.

From this, you can tell that the very first lambda to be executed will be a Woo, which will print W. That one then delegates to the String-producing Moo and prints M. The process then repeats with a W from the second line’s enclosing lambda, followed by a B from the enclosed lambda that’s a Boo type. That results in the output WMWB.

Of course, the code compiles and runs, since all the lambdas validly and unambiguously satisfy one or other of the functional interfaces. Therefore, option E is incorrect. From the previous paragraph, you should conclude that the output is WMWB. Therefore, the correct answer is C and options A, B, and D are incorrect.

As a side note, you can use this technique to create the effect of lazy execution. When an exception is logged, for example, it is often quite computationally intensive to traverse all the frames in a stack, collect data, and concatenate the data into a log message—and very often, that log message is never used because the filtering level abandons it.

Java’s logging APIs allow passing a Supplier<String> to the log methods, so that if the message will not be used, it need never be evaluated. This idea is a design pattern that has a curious name; the pattern is commonly called a thunk. (It’s like the cartoonish past tense of to think, as in, “I had a thought, and the thought that I thunk was …” It’s very silly; don’t ask us why it exists, because we don’t know!)
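
For example, java.util.logging accepts a Supplier<String>, so the message below is built only if FINE-level logging is actually enabled; the lambda is the thunk. This is a minimal sketch, not code from the quiz.

import java.util.logging.Level;
import java.util.logging.Logger;

public class LazyLogDemo {
    private static final Logger LOG = Logger.getLogger(LazyLogDemo.class.getName());

    static String expensiveMessage() {
        // Imagine stack walking and heavy string concatenation here.
        return "details: " + System.nanoTime();
    }

    public static void main(String[] args) {
        // The Supplier<String> lambda is evaluated only if FINE is loggable.
        LOG.log(Level.FINE, () -> expensiveMessage());
    }
}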

This technique is sometimes implemented by a language (for example, Scala) such that the programmer simply writes a block of code that is wrapped in a thunk before being passed into an as-yet-unexecuted method. This is typically described as passing parameters by name.

It’s also interesting to consider what would have happened if the code had included explicit return statements.

public static void main(String[] args) {
  doIt(() -> { return doIt(() -> "done"); });
  doIt(() -> { return doIt(() -> 1); });
}

In this form, all four lambdas would return a value. The two in the first line return String and are, therefore, instances of Moo. The second two return int and are instances of Boo. The modified code would then print MMBB.

Conclusion. The correct answer is option C.

Source: oracle.com

Wednesday, April 12, 2023

Quiz yourself: Using the stream methods dropWhile and takeWhile in Java



Given the following method fragment

Collection<Integer> c = Set.of(1,2,3,4,5,6,7); // line 1
var r = c.stream()
  .dropWhile(e -> e < 5)
  .takeWhile(e -> e < 7)
  .count(); // line 2
System.out.println(r);

Which statement is correct? Choose one.

A. Line 1 will cause an error.

B. Replacing the second statement with the following code would produce the same output:

var r = c.stream()
   .takeWhile(e -> e < 7)
   .dropWhile(e -> e < 5)
   .count(); // line 2

C. Replacing the second statement with the following code would produce the same output:

var r = c.stream()
  .filter(e -> !(e < 5))
  .filter(e -> e < 7)
  .count(); // line 2

D. The code may print 0.

E. The code may print 7.

F. The code always prints 2.

Answer. This question explores some foundational concepts for collections and streams and, in particular, the Stream API’s dropWhile and takeWhile methods. These methods were added in Java 9 and are legitimate territory for the Java 11 and Java 17 exams.

Consider the behavior of the dropWhile and takeWhile methods. Both take a single argument of type Predicate and both methods are intermediate operations that return a new stream.

The dropWhile method, when applied to an ordered stream, starts at the beginning of the stream and tests, in order, data items of the stream it’s reading from. If elements pass the test, they are dropped. So far, this behavior seems similar to that of the filter method, except that it removes items that pass the test rather than keeping them. However, as soon as an item fails the test, dropWhile becomes a simple pass-through, and all subsequent items continue down the stream without being tested. The documentation describes this as “dropping the longest prefix of elements.”

Therefore, if an ordered stream contains the following (with the parade of items progressing towards the right, so that 1 is the head of the stream)

10 9 8 7 6 5 4 3 2 1

and you apply dropWhile(e -> e < 5), the output of the dropWhile will be a stream such as the following

10 9 8 7 6 5

If you apply the same dropWhile to this sequence

10 9 8 7 6 5 4 3 2 1 10 9 8 7 6 5 4 3 2 1

the result will be a stream such as the following

10 9 8 7 6 5 4 3 2 1 10 9 8 7 6 5

Notice that the result still contains the second occurrences of 1 through 4 (and if more of these occurred later, they too would be present in the result).

The behavior of the takeWhile operation has parallels to this, except that the resulting stream ends as soon as any item in the stream fails the test specified in the predicate. So, if you start with this stream

10 9 8 7 6 5 4 3 2 1 10 9 8 7 6 5 4 3 2 1

and apply takeWhile(e -> e < 7) to it, the result is a stream like the following

6 5 4 3 2 1

Therefore, if you chain the dropWhile(e -> e < 5) and takeWhile(e -> e < 7) operations, and apply them to an ordered stream that looks like the following

7 6 5 4 3 2 1

the dropWhile(e -> e < 5) would yield

7 6 5

and then the takeWhile(e -> e < 7) would operate on that stream and produce

6 5

If you then count the elements, you’d get the value 2.
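
To see that deterministic result yourself, you could run the same pipeline against an ordered source such as a List; this fragment assumes java.util.List is imported. With a Set, as the rest of this explanation shows, no such guarantee exists.

var r = List.of(1, 2, 3, 4, 5, 6, 7).stream() // an ordered stream
    .dropWhile(e -> e < 5)   // drops 1, 2, 3, 4
    .takeWhile(e -> e < 7)   // takes 5 and 6, stops at 7
    .count();
System.out.println(r);       // always prints 2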

This is all very nice, except for two problems. First, the order of items drawn from most Set objects is not guaranteed and typically does not match the order in which the items were added to the set. The second problem is that you don’t have an ordered stream, and as mentioned, the discussion above applies only to ordered streams.

Consider why these two things are true. A set, from a math perspective, has the task of rejecting duplicate elements, but it does not have the task of maintaining user-defined order. (Note that some of Java’s Set implementations do maintain an order, for example, the TreeSet. But that order still isn’t the order in which items were added; it’s an ordering derived from the elements themselves.) The documentation for the Set interface describes the effect of Set.of and Set.copyOf with the following caveat:

The iteration order of set elements is unspecified and is subject to change.

Shortly, you’ll see why this is critical and contradicts the discussion above.

Some stream objects are considered to be ordered and others are not. If you have a source of data that has a meaningful (user-controllable) order, such as the elements of a List or the computations of Stream.iterate (which calculates the next element of a stream from the predecessor), the stream starts out ordered.

Meanwhile, the documentation for dropWhile states

If this stream is unordered, and some (but not all) elements of this stream match the given predicate, then the behavior of this operation is nondeterministic; it is free to drop any subset of matching elements (which includes the empty set).

The documentation for takeWhile has an equivalent statement.

In this quiz question, you have an unordered stream; therefore, you have no basis for making reliable predictions about what it will do. (If you try this code, it will behave consistently, but “It always works for me!” is not a sound way to create production quality code. You must be able to guarantee that it will still work when implementations change, and for that, you must take seriously the limitations called out in the documentation.)

If you don’t even know that this code will behave consistently, you cannot claim to have other code that will behave the same way, nor can you assert that it will always do anything in particular. For this reason, options B, C, and F are all incorrect.

This leaves you to consider whether line 1 compiles, and if so, what the code might possibly output.

Option A claims the code will not compile. There are three reasons why that claim might seem tempting; consider each in turn.

◉ Can you put primitives into the argument for Set.of? Yes, you can because they will be autoboxed to Integer type. Further, this is consistent with the variable type (Collection<Integer>) to which the result is assigned.

◉ Can the result of Set.of be assigned to a variable of type Collection<Integer>? Yes, because Set is a subinterface of Collection, and the generic types are both Integer.

◉ There would be an exception at runtime if you call Set.of with duplicate values in the argument list. In this case, the arguments are all distinct, so there’s no problem.

From this you know that there are no errors, and option A is also incorrect.

Option D is correct. The documentation for takeWhile working on an unordered stream states “…it is free to take any subset of matching elements (which includes the empty set).” Therefore, it clearly is permissible for the operation to result in zero surviving elements, which would produce a count of 0.

If you ignore the restrictions of dropWhile and takeWhile and simply consider that you don’t know the iteration order of the elements taken from Set.of, you can still see how unexpected results might be achieved. Imagine that the stream elements from the set arrive in the following order:

6 5 4 3 2 1 7

In this case, the dropWhile(e -> e < 5) part will be false on the first element, and none of the elements will be dropped. After that, the takeWhile(e -> e < 7) part will also be false on the first element (which is still 7). Therefore, zero elements will proceed downstream to the count() operation, and therefore 0 would be printed. Because 7 can’t be printed, you can see that option E is incorrect.

Conclusion. The correct answer is option D.

Source: oracle.com

Monday, March 7, 2022

Curly Braces #1: Java and a project monorepo

In his debut Java Magazine column, Eric Bruno explores the benefits of keeping all your project elements in a single Git repository.

[Welcome to “Curly Braces,” Eric Bruno’s new Java Magazine column. Just as braces (used, for example, in if, while, and for statements) are critical elements that surround your code, the focus of this column will be on topics that surround Java development. Some of the topics will be familiar while others will be novel—and the goal will be to help you think more deeply about how to build Java applications. —Ed.]


I recently explored a fairly new concept in code repository organization that some large companies have adopted, called the monorepo. This is a subtle shift in how to manage projects in systems such as Git. But from what I’ve seen, some people have strong feelings about monorepos one way or the other. As a Java developer, I believe there are some tangible benefits to using a monorepo.

First, I assume nearly everyone agrees that IDEs make it easy to build and test a multicomponent application. Whether you’re building a series of microservices, a set of libraries, or an application with distributed components, it’s straightforward to use NetBeans, IntelliJ IDEA, or Eclipse to import them, build dependencies, deploy, and run the result. As for external dependencies, tools such as Maven and Gradle handle them well.

It’s straightforward and common to have a single script to build a project and all its dependent projects, pull down external dependencies, and then deploy and even run the application.

By contrast, managing multiple Git repositories is, for me, a tedious process. Why can’t I have an experience similar to an IDE across source code repositories? Well, I can and you can, and that’s the reason for the monorepo movement.

What is a monorepo, and why should you care?

Overall, I feel a monorepo helps to overcome some of the nagging polyrepo issues that bother me. The act of cloning multiple repos, configuring permissions, dealing with pushes across separate Git repos and directories, forgetting to push to one repo when I’ve updated code across more than one…phew. That is tedious and exhausting.

With the monorepo, you ideally place all of your code—every application, every microservice, every library, and so on—into a single repository. Only one.

Developers then pull down the entire bundle and operate on that one repo going forward.

Even as developers work on different applications, they’re working from the same Git repository, which means all pull requests, all branches and merges, tags, and so on take place against that one repo.

This has the advantage that you clone one repository for your entire organization’s codebase, and that’s it. That means no more tedium related to multiple repos, as described above. It also has other benefits, such as the following:

◉ Avoiding silos: Because they pull down the source for all internal applications and libraries, all developers have the means to make code changes and, indeed, they should be expected to. This removes silos, where only certain developers are permitted to maintain the code.

◉ Fewer pull requests: If you change a library, its interface, one or more applications that use that library, or the related documentation, you need only one pull request and merge with the monorepo compared to multiple requests if the elements lived in separate repos. This is sometimes referred to as an atomic commit.

◉ Transparency: I once worked for a company whose main application consisted of dozens and dozens of individual Java projects. Depending on what you were trying to do, you needed to choose combinations of these projects. (In a way, these were a simple form of microservices.) Occasionally I needed to discover which additional Git repo I needed to clone to make things work, and that wasn’t always easy. With a monorepo, all of the code is in your local tree.

◉ Code awareness: Due to transparency, there’s reduced risk of code duplication.

◉ Improved structure: I’ve seen different approaches to structuring a monorepo, but a common one is to create subdirectories for the different types of codebases, such as apps, libs, or docs. Each application resides within its own subdirectory under apps, each library resides under libs, and so on. This approach also helps because documentation is kept with the code in the repo.

◉ Continuous integration/continuous delivery (CI/CD): GitHub Actions helps resolve many tooling issues with monorepos. Specifically, GitHub supports code owners with scoped workflows for project management and permissions. Atlassian also provides tips for monorepos and GitLab does as well. Other tools are available to support monorepos. However, you can get by very well with just Maven or Gradle.

Monorepo and Maven for Java projects

A monorepo advantage specific to Java projects is improved dependency management. Since you’re likely to have all your organization’s applications and libraries locally, any changes to a dependency in an application (other than the one you’re focused on) will be built locally and tests will run locally as well. This process will highlight potential conflicts and the need for regression testing earlier during development before binaries are rolled out to production.

Here’s how the monorepo concept affects Maven projects. Of course, Maven isn’t aware of your repo structure, but using a monorepo does affect how you organize your Java projects; therefore, Maven is involved.

For instance, it’s common to structure a monorepo (and the development directory structure) as follows:

<monorepo-name>
|--apps
|  |--app1
|  |--app2
|--libs
|  |--librarybase
|  |--library1
|  |--library2
|--docs

However, you can structure your monorepo any way you wish, with directories named frontend, backend, mobile, web, microservices, devops, and so on.

To support the monorepo hierarchy, I use Maven modules. For instance, at the root of the project, I define a pom.xml file with modules for apps and libs. Listing 1 is a partial listing that shows root-level modules.

Listing 1.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.ericbruno</groupId>
    <artifactId>root</artifactId>
    <version>${revision}</version>
    <packaging>pom</packaging>

    <properties>
        <revision>1.0-SNAPSHOT</revision>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>15</maven.compiler.source>
        <maven.compiler.target>15</maven.compiler.target>
    </properties>

    <modules>
        <module>libs</module>
        <module>apps</module>
    </modules>

Within each of the subdirectories, such as libs and apps, there are pom.xml files that define the set of library and application modules, respectively. As shown in Listing 2, the pom.xml file for the set of library modules is straightforward and goes within the libs subdirectory.

Listing 2.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.ericbruno</groupId>
        <artifactId>root</artifactId>
        <version>${revision}</version>
    </parent>
    <groupId>com.ericbruno.libs</groupId>
    <artifactId>libs</artifactId>
    <packaging>pom</packaging>

    <modules>
        <module>LibraryBase</module>
        <module>Library1</module>
        <module>Library2</module>
    </modules>
</project>

The pom.xml file for applications is more involved. To avoid consuming local disk space, and to avoid long compile times, you might decide to keep only a subset of your organization’s applications locally. This is not recommended, because you lose some of the benefits of a monorepo, but due to resource constraints, you may have no choice. In such cases, you can omit application subdirectories as you see fit. However, to avoid build errors in your Maven scripts, you can use Maven build profiles, which contain <activation> property sections with <file> and <exists> properties, as shown in Listing 3.

Listing 3.

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.ericbruno</groupId>
        <artifactId>root</artifactId>
        <version>${revision}</version>
    </parent>
    <groupId>com.ericbruno.apps</groupId>
    <artifactId>apps</artifactId>
    <packaging>pom</packaging>

    <profiles>
        <profile>
            <id>App1</id>
            <activation>
                <file>
                    <exists>App1/pom.xml</exists>
                </file>
            </activation>
            <modules>
                <module>App1</module>
            </modules>
        </profile>
        <profile>
            <id>App2</id>
            <activation>
                <file>
                    <exists>App2/pom.xml</exists>
                </file>
            </activation>
            <modules>
                <module>App2</module>
            </modules>
        </profile>
    </profiles>
</project>

The sample monorepo in Listing 3 contains only two applications: App1 and App2. There are Maven build profiles defined for each, which causes the existence of each application’s separate pom.xml file to be checked before the profile is activated. In summary, only the applications that exist on your local file system will be built, no Maven errors will occur, and you don’t need to change the Maven scripts. This works well with Git’s concept of sparse checkouts, as explained on the GitHub blog.
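
For reference, a sparse checkout of such a monorepo might look like the following; the repository URL is a placeholder, and the cone-mode commands require Git 2.25 or later.

# clone without materializing files, then enable cone-mode sparse checkout
git clone --no-checkout https://github.com/example-org/example-monorepo.git
cd example-monorepo
git sparse-checkout init --cone
# check out only the app you work on plus the shared libraries
git sparse-checkout set apps/App1 libs
git checkout main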

Note: Alternatively, you can drive Maven profile activation by checking for the lack of a file using the <missing> property, which can be combined with <exists>.
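
A profile keyed on a missing file might look like this; the profile id is hypothetical and shown only to illustrate the <missing> element.

<profile>
    <id>App1Absent</id>
    <activation>
        <file>
            <!-- activate only when App1 has NOT been checked out locally -->
            <missing>App1/pom.xml</missing>
        </file>
    </activation>
</profile>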

In the sample monorepo, available in my GitHub repository here, I created two libraries, both of which extend a base library using Java interfaces and Maven modules and profiles. For example, whereas LibraryBase is a standalone Maven Java project, Library1 depends on it. As you can see, LibraryBase is denoted as a Maven dependency in the pom.xml for Library1.

...
<dependency>
    <groupId>com.ericbruno</groupId>
    <artifactId>LibraryBase</artifactId>
    <version>1.0-SNAPSHOT</version>
    <type>jar</type>
</dependency>
...

You can open the root monorepo pom.xml file as a Maven Java project within NetBeans, and all the modules will be listed in a hierarchy (see Figure 1). You can build the entire set of libraries and applications from this root project. You can also double-click a module—such as App1 in this example—and that project will load separately so you can edit its code.

Figure 1. A monorepo root Maven project with an individual module loaded in NetBeans

Oh, a final tip: Remember to rename your master branch to main, because that’s a more inclusive word.

Some monorepo challenges to overcome


There are two sides to every story, and a monorepo has some challenges as well as benefits.

As discussed earlier, there could potentially be a lot of code to pull down and keep in sync. This can be burdensome, wasteful, and time-consuming.

Fortunately, there’s an easy way to handle this, as I’ve shown, by using build profiles. Other challenges, as written about by Matt Klein, include potential effects on application deployment, the potential of tight coupling between components, concerns over the scalability of Git tools, the inability to easily search through a large local codebase, and some others.

In my experience, these perceived challenges can be overcome, tools can and are being modified to handle growing codebases, and the benefits of a monorepo, as explained by Kenneth Gunnerud, outweigh the drawbacks.

Source: oracle.com

Monday, December 27, 2021

11 great Java tricks from dev.java

The dev.java website is a huge resource for all Java developers. Here are 11 tricks collected from the site.


Trick #1: Compact record constructors

You know how to use records to model data carriers and how to verify incoming data during construction, but do you know that you don’t have to list the canonical constructor’s parameters?

A record’s canonical constructor has one argument per component but in its compact form, you don’t have to list all the arguments. You can’t assign fields—that happens in compiler-generated code after yours—but you can reassign the parameters, which leads to the same result.

public record Range(int start, int end) {

    public Range {
        if (end <= start) {
            throw new IllegalArgumentException();
        }
    }
}

Trick #2: Serializing records

You know that every object can be serialized using black magic, but do you know that no such deviance is needed for records? The guaranteed presence of constructor parameters and accessors makes serialization work with the object model—and makes it easy for you to create reliable serializable records.
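
As a quick illustration, the record below round-trips through Java serialization with nothing more than implements Serializable; deserialization goes through the canonical constructor, so any validation you write there still runs. This is a minimal sketch, not code from the original article.

import java.io.*;

public class RecordSerDemo {
    record Point(int x, int y) implements Serializable {}

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        var bytes = new ByteArrayOutputStream();
        try (var out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Point(3, 4));
        }
        try (var in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            // Reconstructed via the canonical constructor, not field-by-field magic.
            Point p = (Point) in.readObject();
            System.out.println(p); // Point[x=3, y=4]
        }
    }
}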

Trick #3: jpackage versus modules

You know jpackage, but wait: Do you really know jpackage?

jpackage is a command-line tool that takes an entire Java app as input and produces a fully self-contained application image, meaning it includes your code, dependencies, and a Java runtime. jpackage creates the runtime for your app with jlink, which you can fully configure through jpackage, or you can pass it the path to a runtime image that you already created.

Other configuration options for jpackage include the ability to specify application metadata such as icons and licenses, installation options, and launchers as well as JVM and program options.

jpackage produces output in platform-specific formats such as deb and rpm for Linux or exe and msi for Windows.

Now that you know jpackage, do you know that it can do all of that for modular as well as nonmodular applications?

# generate an application image for a modular application:
jpackage --type app-image -n name -p modulePath -m moduleName/className

# for a nonmodular application:
jpackage --type app-image -i inputDir -n name --main-class className --main-jar myJar.jar

Trick #4: Cross-OS runtime images

Speaking of jlink, do you know that you can use it to create runtime images across operating systems?

Say your build server runs Linux and you need a Windows runtime image. You simply need to download and unpack a Windows JDK of the same version as the Linux one that runs jlink, and then add the Windows jmods folder to the Linux jlink executable’s module path.

# download JDK for Windows and unpack into jdk-win
# create the image with the jlink binary from the system's JDK
# (in this example, Linux)
$ jlink \
  --module-path jdk-win/jmods:mods \
  --add-modules com.example.main \
  --output app-image

Trick #5: Labeled breaks and continues

You know how to use the break statement to get out of an inner loop, but do you know that you can give the break a label to break out of an appropriately labeled outer loop as well? You can do likewise with continue, which skips the rest of the current iteration of the innermost loop, because if you pass it a label, it will skip the iteration of the labeled loop instead.

However, just because you can doesn’t mean you should. Use this trick with care.

class ContinueWithLabelDemo {

    public static void main(String[] args) {
        String searchMe = "Look for a substring in me";
        String substring = "sub";
        boolean foundIt = false;

        int max = searchMe.length() - substring.length();

    test:
        for (int i = 0; i <= max; i++) {
            int n = substring.length();
            int j = i;
            int k = 0;
            while (n-- != 0) {
                if (searchMe.charAt(j++) != substring.charAt(k++)) {
                    continue test;
                }
            }
            foundIt = true;
            break test;
        }

        System.out.println(foundIt ? "Found it" : "Didn't find it");
    }
}

Trick #6: Boolean expressions in pattern matching

You know pattern matching, but do you know that you can use the variable that pattern matching introduces in the same boolean expression?

For example, if you check whether the instance object is of type String with object instanceof String s, you can start using s straight away, such as to check whether s is nonempty with && !s.isEmpty(). This works in if statements in Java 16 and in switch as a preview in Java 17.

Object object = // ...

if (object instanceof String s && !s.isEmpty())
    System.out.println("Non-empty string");
else
    System.out.println("No string or empty.");

Trick #7: Generic wildcards and subtyping

You know generics and that a List<Integer> does not extend a List<Number>, as shown in Figure 1.

Figure 1. Generic inheritance

But do you know that if you add wildcards, you can create a type hierarchy, as shown in Figure 2? A List<? extends Integer> actually does extend a List<? extends Number>. And the other way around, a List<? super Number> extends a List<? super Integer>.

Figure 2. Generic inheritance with wildcards
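
A few assignments make this concrete; this is a minimal sketch, and the commented-out line is the one that would not compile.

import java.util.ArrayList;
import java.util.List;

class WildcardDemo {
    public static void main(String[] args) {
        List<? extends Integer> ints = new ArrayList<Integer>();
        List<? extends Number> nums = ints;           // compiles: upper-bounded wildcards vary with the element type
        List<? super Number> superNums = new ArrayList<Object>();
        List<? super Integer> superInts = superNums;  // compiles: lower-bounded wildcards vary the other way
        // List<Number> direct = new ArrayList<Integer>(); // does not compile
        System.out.println(nums.isEmpty() + " " + superInts.isEmpty());
    }
}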

Trick #8: Creating and chaining predicates


You know how to write lambdas to create predicates, but do you know that the interface offers many methods to create and combine them? Call the instance methods and, or, or negate for boolean formulas.

You have no predicates yet? No problem! The static factory method not is useful for inverting a method reference. And if you pass some object to isEqual, you create a predicate that checks instances for equality with that object.

Predicate<String> isEqualToDuke = Predicate.isEqual("Duke");
Predicate<Collection<String>> isEmpty = Collection::isEmpty;
Predicate<Collection<String>> isNotEmpty = Predicate.not(isEmpty);
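
Chaining works the same way; this minimal fragment assumes java.util.function.Predicate is imported.

Predicate<String> nonEmpty = Predicate.not(String::isEmpty);
Predicate<String> startsWithD = s -> s.startsWith("D");
Predicate<String> dukeLike = nonEmpty.and(startsWithD).or(Predicate.isEqual("James"));

System.out.println(dukeLike.test("Duke"));   // true
System.out.println(dukeLike.test("James"));  // true
System.out.println(dukeLike.test(""));       // false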

Trick #9: Creating and chaining comparators


If you think that was cool, hold your breath for comparators: You know how to implement them, but do you know there are even more factory and combination methods?

To compare long, double, float, and other values, use a method reference to their static compare method.

Comparator<Integer> comparator = Integer::compare;

If you want to compare objects by one of their attributes, pass a function extracting it to the static method Comparator.comparing. To first sort by one and then another attribute, create both comparators, and then chain them with the instance method thenComparing.

Comparator<User> byFirstName = Comparator.comparing(User::getFirstName);
Comparator<User> byLastName = Comparator.comparing(User::getLastName);
Comparator<User> byName = byFirstName.thenComparing(byLastName);

Need a Comparator instance that uses a Comparable object’s compareTo method? The static factory method naturalOrder is there for you. If that’s the wrong way around, just call reversed on it.

Ah, and what to do about the pesky null? Worry not: Pass a Comparator to nullsFirst or nullsLast to create a Comparator that does what you need.

Comparator<Integer> natural = Comparator.naturalOrder();
Comparator<Integer> naturalNullsLast = Comparator.nullsLast(natural);

Trick #10: Executing source files as scripts


You know that you can use the java command to launch a single source file without having to manually compile the file first. But do you know that you can use this capability to write full-on scripts in Java in three simple steps?

First, add a shebang line to the source file that points at your java executable, followed by --source and the Java version the code was written for.

#!/path/to/your/bin/java --source 16

public class HelloJava {
    public static void main(String[] args) {
        System.out.println("Hello " + args[0]);
    }
}

Second, rename the file so it doesn’t end with .java. This is a great opportunity to give it a good command-line style of name.

Third, make the file executable with chmod +x. There you go: scripts in Java!

Trick #11: Loading jshell with all imports


You know how to launch jshell for some quick experiments but do you know that you don’t have to import everything manually? Simply launch jshell with the option JAVASE and all Java SE packages are imported, so you can get right to work.

jshell JAVASE

Source: oracle.com

Monday, December 20, 2021

Perform textual sentiment analysis in Java using a deep learning model

Positive? Negative? Neutral? Use the Stanford CoreNLP suite to analyze customer product reviews.


Sentiment analysis is a text classification task focused on identifying whether a piece of text is positive, negative, or neutral. For example, you might be interested in analyzing the sentiment of customer feedback on a certain product or in detecting the sentiment on a certain topic trending in social media.


This article illustrates how such tasks can be implemented in Java using the sentiment tool integrated into Stanford CoreNLP, an open source library for natural language processing.

The Stanford CoreNLP sentiment classifier

To perform sentiment analysis, you need a sentiment classifier, which is a tool that can identify sentiment information based on predictions learned from the training data set.

In Stanford CoreNLP, the sentiment classifier is built on top of a recursive neural network (RNN) deep learning model that is trained on the Stanford Sentiment Treebank (SST), a well-known data set for sentiment analysis.

In a nutshell, the SST data set represents a corpus with sentiment labels for every syntactically possible phrase derivable from thousands of sentences, thus allowing for the capture of the compositional effects of sentiment in text. In simple terms, this allows the model to identify the sentiment based on how words compose the meaning of phrases rather than just by evaluating words in isolation.

To better understand the structure of the SST data set, you can examine the data set files downloadable from the Stanford CoreNLP sentiment analysis page.

In Java code, the Stanford CoreNLP sentiment classifier is used as follows.

To start, you build up a text processing pipeline by adding the annotators required to perform sentiment analysis, such as tokenize, ssplit, parse, and sentiment. In terms of Stanford CoreNLP, an annotator is an interface that operates on annotation objects, where the latter represent a span of text in a document. For example, the ssplit annotator is required to split a sequence of tokens into sentences.

The point is that Stanford CoreNLP calculates sentiment on a per-sentence basis. So, the process of dividing text into sentences is always followed by applying the sentiment annotator.

Once the text has been broken into sentences, the parse annotator performs syntactic dependency parsing, generating a dependency representation for each sentence. Then, the sentiment annotator processes these dependency representations, checking them against the underlying model to build a binarized tree with sentiment labels (annotations) for each sentence.

In simple terms, the nodes of the tree are determined by the tokens of the input sentence and contain the annotations indicating the predicted class out of five sentiment classes from very negative to very positive for all the phrases derivable from the sentence. Based on these predictions, the sentiment annotator calculates the sentiment of the entire sentence.

Setting up Stanford CoreNLP

Before you can start using Stanford CoreNLP, you need to do the following setup:

◉ To run Stanford CoreNLP, you need Java 1.8 or later.

◉ Download the Stanford CoreNLP package and unzip the package in a local folder on your machine.

◉ Add the distribution directory to your CLASSPATH as follows:

export CLASSPATH=$CLASSPATH:/path/to/stanford-corenlp-4.3.0/*:

After completing the steps above, you are ready to create a Java program that runs a Stanford CoreNLP pipeline to process text.

In the following example, you implement a simple Java program that runs a Stanford CoreNLP pipeline for sentiment analysis on text containing several sentences.

To start, implement a class that provides a method to initialize the pipeline and a method that will use this pipeline to split a submitted text into sentences and then to classify the sentiment of each sentence. Here is what the implementation of this class might look like.

//nlpPipeline.java
import java.util.Properties;

import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.neural.rnn.RNNCoreAnnotations;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations;
import edu.stanford.nlp.sentiment.SentimentCoreAnnotations.SentimentAnnotatedTree;
import edu.stanford.nlp.trees.Tree;
import edu.stanford.nlp.util.CoreMap;

public class nlpPipeline {
    static StanfordCoreNLP pipeline;

    public static void init() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit, parse, sentiment");
        pipeline = new StanfordCoreNLP(props);
    }

    public static void estimatingSentiment(String text) {
        int sentimentInt;
        String sentimentName;
        Annotation annotation = pipeline.process(text);
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
            Tree tree = sentence.get(SentimentAnnotatedTree.class);
            sentimentInt = RNNCoreAnnotations.getPredictedClass(tree);
            sentimentName = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
            System.out.println(sentimentName + "\t" + sentimentInt + "\t" + sentence);
        }
    }
}

The init() method initializes the sentiment tool in the Stanford CoreNLP pipeline being created, and it also initializes the tokenizer, dependency parser, and sentence splitter needed to use this sentiment tool. To initialize the pipeline, pass a Properties object with the corresponding list of annotators to the StanfordCoreNLP() constructor. This creates a customized pipeline that is ready to perform sentiment analysis on text.

In the estimatingSentiment() method of the nlpPipeline class, invoke the process() method of the pipeline object created previously, passing in text for processing. The process() method returns an annotation object that stores the analyses of the submitted text.

Next, iterate over the annotation object getting a sentence-level CoreMap object on each iteration. For each of these objects, obtain a Tree object containing the sentiment annotations used to determine the sentiment of the underlying sentence.

Pass the Tree object to the getPredictedClass() method of the RNNCoreAnnotations class to extract the number code of the predicted sentiment for the corresponding sentence. Then, obtain the name of the predicted sentiment and print the results.

To test the functionality above, implement a class with the main() method that invokes the init() method and then invokes the estimatingSentiment() method of the nlpPipeline class, passing sample text to the latter.

In the following implementation, the sample text is hardcoded in the program for simplicity. The sample sentences were designed to cover the entire spectrum of sentiment scores available with Stanford CoreNLP: very positive, positive, neutral, negative, and very negative.

//SentenceSentiment.java
public class SentenceSentiment {
    public static void main(String[] args) {
        String text = "This is an excellent book. I enjoy reading it. I can read on Sundays. Today is only Tuesday. Can't wait for next Sunday. The working week is unbearably long. It's awful.";

        nlpPipeline.init();
        nlpPipeline.estimatingSentiment(text);
    }
}

Now, compile the nlpPipeline and SentenceSentiment classes and then run SentenceSentiment:

$ javac nlpPipeline.java
$ javac SentenceSentiment.java
$ java SentenceSentiment

Here is what the output should look like.

Very positive  4     This is an excellent book.
Positive       3     I enjoy reading it.
Neutral        2     I can read on Sundays.
Neutral        2     Today is only Tuesday.
Neutral        2     Can't wait for next Sunday.
Negative       1     The working week is unbearably long.
Very negative  0     It's awful.

The first column in the output above contains the name of the sentiment class predicted for a sentence. The second column contains the corresponding number code of the predicted class. The third column contains the sentence.

Analyzing online customer reviews

As you learned from the previous example, Stanford CoreNLP can return a sentiment for a sentence. There are many use cases, however, where there is a need to analyze the sentiment of many pieces of text, each of which may contain more than a single sentence. For example, you might want to analyze the sentiment of tweets or customer reviews from an ecommerce website.

To calculate the sentiment of a multisentence text sample with Stanford CoreNLP, you might use several different techniques.

When dealing with a tweet, you might analyze the sentiment of each sentence in the tweet and if there are some sentences that are either positive or negative you could rank the entire tweet respectively, ignoring the sentences with the neutral sentiment. If all (or almost all) the sentences in a tweet are neutral, then the tweet could be ranked neutral.
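
One possible way to implement that ranking, reusing the same pipeline and annotation classes shown earlier, is sketched below. This helper is illustrative only and is not part of the article’s downloadable code.

// nlpPipeline.java (sketch)
public static String overallSentiment(String text) {
    int score = 0;
    Annotation annotation = pipeline.process(text);
    for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
        Tree tree = sentence.get(SentimentAnnotatedTree.class);
        int sentimentInt = RNNCoreAnnotations.getPredictedClass(tree); // 0 (very negative) to 4 (very positive)
        if (sentimentInt > 2) score++;        // positive or very positive sentence
        else if (sentimentInt < 2) score--;   // negative or very negative sentence
    }
    return score > 0 ? "Positive" : score < 0 ? "Negative" : "Neutral";
}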

Sometimes, however, you don’t even have to analyze each sentence to estimate the sentiment of an entire text. For example, when analyzing customer reviews, you can rely on their titles, which often consist of a single sentence.

To work through the following example, you’ll need a set of customer reviews. You can use the reviews found in the NlpBookReviews.csv file accompanying this article. The file contains a set of actual reviews downloaded from an Amazon web page with the help of Amazon Review Export, a Google Chrome browser extension that allows you to download a product’s reviews with their titles and ratings to a comma-separated values (CSV) file. (You can use that tool to explore a different set of reviews for analysis.)

There is another requirement. Because the Java language lacks any native support for the efficient handling of CSV files, you’ll need a third-party library such as Opencsv, an easy-to-use CSV parser library for Java. You can download the Opencsv JAR and its dependencies. In particular, you will need to download the Apache Commons Lang library. Include them in the CLASSPATH as follows:

export CLASSPATH=$CLASSPATH:/path/to/opencsv/*:

Then, add the following method to the nlpPipeline class created in the previous section:

//nlpPipeline.java
    ...
    public static String findSentiment(String text) {
        int sentimentInt = 2;
        String sentimentName = "NULL";
        if (text != null && text.length() > 0) {
            Annotation annotation = pipeline.process(text);
            CoreMap sentence = annotation
                    .get(CoreAnnotations.SentencesAnnotation.class).get(0);
            Tree tree = sentence.get(SentimentAnnotatedTree.class);
            sentimentInt = RNNCoreAnnotations.getPredictedClass(tree);
            sentimentName = sentence.get(SentimentCoreAnnotations.SentimentClass.class);
        }
        return sentimentName;
    }

As you might notice, the code above is similar to the code in the estimatingSentiment() method defined in the previous section. The only significant difference is that this time you don’t iterate over the sentences in input text. Instead, you get only the first sentence, since in most cases a review’s title consists of a single sentence.

Create a ReviewsSentiment.java file with the main method that will read the reviews from a CSV file and pass them to the newly created findSentiment() for processing, as follows:

import com.opencsv.CSVReader;
import com.opencsv.CSVParser;
import com.opencsv.CSVReaderBuilder;
import com.opencsv.exceptions.CsvException;

import java.io.FileReader;
import java.io.IOException;

public class ReviewsSentiment {
    public static void main(String[] args) throws IOException, CsvException {
        nlpPipeline.init();
        String fileName = "NlpBookReviews.csv";
        try (CSVReader reader = new CSVReaderBuilder(new FileReader(fileName)).withSkipLines(1).build()) {
            String[] row;
            while ((row = reader.readNext()) != null) {
                System.out.println("Review: " + row[1] + "\t" + " Amazon rating: " + row[4] + "\t" + " Sentiment: " + nlpPipeline.findSentiment(row[1]));
            }
        }
    }
}

You’ll need to recompile the nlpPipeline class and compile the newly created ReviewsSentiment class. After that, you can run ReviewsSentiment as follows:

$ javac nlpPipeline.java
$ javac ReviewsSentiment.java
$ java ReviewsSentiment

The output should look as follows:

Review: Old version of python useless            Amazon rating: 1    Sentiment: Negative
Review: Excellent introduction to NLP and spaCy  Amazon rating: 5    Sentiment: Positive
Review: Could not get spaCy on MacBook           Amazon rating: 1    Sentiment: Negative
Review: Good introduction to SPACY for beginner. Amazon rating: 4    Sentiment: Positive

Source: oracle.com

Monday, December 13, 2021

Arm resources show how to optimize your Java applications for AArch64

Choosing the best runtime switches can greatly affect the performance of your workload.

Java developers don’t need to worry about the deployment hardware as long as a first-class JVM is available, and that’s certainly the case for Arm processors. Whether you’re writing the code or running the code, it’s business as usual: At the end of the day, it’s plain old Java bytecode.

Of course, many Java developers and systems administrators want to know more, and there are several excellent resources, especially two posts from Arm.

Java performance on Ampere A1 Compute

Arguably the most important post for the deployment of processor-intensive applications is a November 2021 post from Arm senior performance engineer Shiyou (Alan) Huang. His post, “Improving Java performance on OCI Ampere A1 Compute instances,” begins with the following:

Oracle Cloud Infrastructure (OCI) has recently launched the Ampere A1 Compute family of Arm Neoverse N1-based VMs and bare-metal instances. These A1 instances use Ampere Altra CPUs that were designed specifically to deliver performance, scalability, and security for cloud applications. The A1 Flex VM family supports an unmatched number of VM shapes that can be configured with 1-80 cores and 1-512GB of RAM (up to 64GB per core). Ampere A1 compute is also offered in bare-metal configurations with up to 2-sockets and 160-cores per instance.

In this blog, we investigate the performance of Java using SPECjbb2015 on OCI A1 instances. We tuned SPECjbb2015 for best performance by referring to the configurations used by the online SPECjbb submissions. Those Java options may not apply to all Java workloads due to the very large heap size and other unrealistic options. The goal here is to see the best scores we can achieve on A1 using SPECjbb. We compared the performance results of SPECjbb2015 over different versions of OpenJDKs to identify a list of patches that improve the performance. As SPECjbb is a latency-sensitive workload, we also presented the impact of Arm LSE (Large System Extensions) on the performance in this blog.

Huang’s paper presents two metrics to evaluate the performance of a JVM: max-jOPS for throughput and critical-jOPS for critical throughput under service-level agreements (SLAs).

One of Huang’s charts, reproduced below as Figure 1, shows the SPECjbb scores from OpenJDK 11 to 16 using Arm’s tuned configurations.

Figure 1. SPECjbb scores from OpenJDK 11 to 16 using Arm’s tuned configurations

Java performance on Neoverse N1 systems


Huang wrote another relevant blog post, “Improving Java performance on Neoverse N1 systems,” in July 2021.

In this post, Huang writes the following:

…we investigate Java performance using an enterprise benchmark on Arm’s Neoverse N1 based CPU, which has demonstrated huge cost/power/performance improvements in server CPUs. We built OpenJDK from the source code with different compiling flags to test the impact of LSE (Large System Extensions), the new architecture extension introduced in ARMv8.1. The Java enterprise benchmark we tested provides two throughput-based metrics: maximum throughput and throughput under service level agreement (SLA). We tuned performance by tweaking an initial set of Java runtime flags and augmented system parameters using an automated optimization tool.

Huang summarizes the tests as follows: “As we can see, tuning is critical to the performance of Java applications. The throughput is improved by 40% and the SLA score for more than 80% after the tuning configurations are applied” (as shown in Figure 2).

Figure 2. Tuning the JVM’s runtime flags can greatly impact performance.

Large System Extensions


A key section of both of Huang’s blog posts is his discussion of Arm’s Large System Extensions (LSE), which first appeared in the Arm 8.1 architecture. He describes LSE as “a set of atomic operations such as compare-and-swap (CAS), atomic load and increment (LDADD). Most of the new operations have load-acquire or store-release semantics which are low-cost atomic operations comparing to the legacy way to implement atomic operations through pairs of load/store exclusives.”

Huang adds that in his experiments, “We see that LSE benefits the workloads that are highly concurrent and use heavy synchronizations,” as seen in Figure 3.

Figure 3. The impact of Arm’s LSE
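
To get a feel for the kind of Java code that benefits, here is a minimal sketch (my own example, not taken from Huang's posts): the java.util.concurrent.atomic classes perform exactly the compare-and-swap and atomic-add operations described above, and on AArch64 hardware with LSE support the JIT compiler can emit single-instruction CAS and LDADD forms for them, depending on the JDK build and CPU.

import java.util.concurrent.atomic.AtomicLong;

public class AtomicCounterDemo {

    private static final AtomicLong counter = new AtomicLong();

    public static void main(String[] args) throws InterruptedException {
        // getAndAdd() is an atomic read-modify-write; with LSE this can map
        // to a single LDADD instruction instead of a load/store-exclusive loop.
        Runnable task = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                counter.getAndAdd(1);
            }
        };

        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // compareAndSet() is the CAS operation mentioned in the quote above.
        counter.compareAndSet(2_000_000L, 0L);
        System.out.println("Counter after CAS: " + counter.get());
    }
}

Highly concurrent code like this, with many threads contending on the same atomic variables and locks, is where Huang's measurements show LSE paying off.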

Source: oracle.com

Wednesday, December 8, 2021

Microservice, monolith, microlith


A proposal to overcome the limitations of both monolith and microservices applications


As a training consultant, I often deal with very practical questions about microservices: What are they? What is so special about microservices? What are some of the best-justified and beneficial use cases for microservices?

Often, these questions are answered in quite a partial manner, with answers greatly depending on one’s past experiences and personal preferences. Answers range from “everything should be a microservice” to “one should avoid microservices like the plague,” with various degrees of cautionary approaches in between.

Despite the availability of multiple answers, I've found they generally lack scientific precision, representing points of view rather than hard facts. Indeed, many recommendations are essentially personal testimonies describing the success or failure of specific microservice implementations.

This article seeks to present something entirely different from such anecdotal evidence. I’ll explore some hard facts from an ocean of perspectives and points of view about the nature and applicability of microservices.

Defining microservices

Let's start with a definition of what a microservice actually is. Oh, wait: there is no universally recognized definition. There are many competing definitions that share a number of similarities, so instead here are the commonly recognized characteristics that define a microservice.

◉ Microservices are characterized as micro or small in size. This implies a small deployment footprint, making it easier to test, deploy, maintain, and scale a microservice application. Smaller application size is aimed at a shorter and thus cheaper production cycle and more flexible scalability.

◉ Microservices are described as loosely coupled, suggesting that each such application ought to be a self-contained unit of business logic that is not dependent on other applications. Loose coupling also means that a microservice application should be capable of being independently versioned and deployed.

◉ Each microservice should be developed and owned by a small team using a technology stack of its choice. This approach promotes a tight development focus on a relatively small subset of business functions, resulting in a more precise and capable implementation of the business logic. (See Figure 1.)

Figure 1. Microservice application architecture

Defining monoliths


A microservice does not exist only to satisfy its own requirements but is a part of an extensive collection of services. Together these services meet the business requirements of an organization that owns these services.

Consider large enterprisewide business applications, which are often described as monoliths. Unfortunately, much like a microservice, the term monolith is not strictly defined, so I’ll have to resort to describing the characteristics again.

◉ A monolith is characterized as a large application that implements many different business functions across the enterprise. The business logic that many microservices would provide can be implemented by a single monolith application, but developing such a large application takes longer. The monolith will likely be harder to maintain than a group of microservices, and it is less flexible when it comes to scalability options.

◉ Components within a monolith application could be tightly coupled, suggesting that internally a monolith application would have many dependencies between its parts. Of course, this does not necessarily have to be the case, because the number of dependencies depends greatly on specific design and architecture choices. Still, it is certainly more likely that internal dependencies will exist, simply because creating such dependencies is less of a hurdle for an application developer when all the code belongs to the same application anyway.

◉ Monolith development is a collective effort of many programmers and designers, making the development cycle longer, but it may promote a consistent design approach across many different business functions. Using a common technology stack across the enterprise can simplify maintenance and development. Unlike the polyglot approach promoted by microservice advocates, however, monolith development does not allow the flexibility to make design and architecture decisions that best fit a specific subset of business functions. (See Figure 2.)

Figure 2. Monolith application architecture

Common misconceptions


Here are some common misconceptions regarding microservices and monolith architectures.

Service granularity. The term service granularity describes the distribution of business functions and features across a number of services. For example, consider a business function that needs to create a description of a scientific experiment and to record measurements for this experiment. This business function can be implemented as a single service operation that handles a single large business object combining all the properties of an experiment and all associated measurements. Such an implementation approach is usually described as a coarse-grained service design.

An alternative approach is known as a fine-grained service design. The exact same business function can be implemented as a number of different service operations, separately handling smaller data units such as an experiment or a measurement. Notice that the difference is in the number of service operations it takes to represent a given amount of business functionality. In other words, both approaches implement the same logic but expose it as a different number of services.
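
As a purely hypothetical sketch (the record and interface names below are invented for this article), the experiment-and-measurement function could be exposed either way:

record Measurement(String quantity, double value) { }

record ExperimentDescription(String title, String protocol) { }

record ExperimentWithMeasurements(ExperimentDescription description,
                                  java.util.List<Measurement> measurements) { }

// Coarse-grained design: one operation handles a single large business object
// that combines the experiment description and all of its measurements.
interface CoarseGrainedExperimentService {
    void recordExperiment(ExperimentWithMeasurements experiment);
}

// Fine-grained design: the same business function is split into several
// operations, each handling a smaller data unit.
interface FineGrainedExperimentService {
    String createExperiment(ExperimentDescription description);
    void addMeasurement(String experimentId, Measurement measurement);
}

Both interfaces could sit in front of exactly the same implementation.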

The problem is that the granularity of the service is often confused with the concept of a microservice: Basically, a fine-grained service design is not necessarily implemented as a microservice, while a coarse-grained service is not necessarily synonymous with a monolith implementation.

The key to understanding why these are not synonymous concepts is linked to one of the most fundamental properties of a service, which is that a service invoker should not be able to tell anything about the service implementation. Therefore, it makes no difference to the service consumer exactly how a service is implemented behind the scenes. Whether it’s a monolith or not, service consumers should not be able to tell the difference anyway.

To resolve this confusion, I propose using the phrase implementation granularity instead of service granularity, describing implementations as either fine-grained or coarse-grained. This focuses on the actual complexity and size of the application behind a service interface. In these terms, a microservices approach is based on a fine-grained implementation design, which still allows developers to deliver services of any granularity they find convenient.

The issue has to do with the implication that microservices must be implemented as small-sized applications, which could be described as a fine-grained implementation design.

Remember that small size and loose coupling are considered beneficial microservices characteristics because of the shorter production cycle, flexible scalability, and independent versioning and deployment. However, these benefits should not be taken for granted.

Data fragmentation. One unintended consequence of a fine-grained application implementation is data fragmentation. A loosely coupled design implies that each microservice application has its own data storage that contains information owned by this specific application.

Another implication of the loosely coupled design is that, to maintain a high degree of separation between microservice applications, different applications should not use distributed transactions or two-phase commits to synchronize their data. This approach introduces the problem of data fragmentation and a potential lack of consistency.

Consider the case when a given microservice application needs information owned by another microservice application. What if the solution is simply to allow one application to invoke another to obtain or synchronize required pieces of information? This could work, but what if a given service experiences performance problems or an outage? This would inevitably have a cascading effect on any other dependent services, leading to larger outages and overall performance degradation. Such an approach may work for a small number of applications, but the larger the set of such applications, the greater the risks to their performance and availability.

Thus, consider another solution that addresses the data consistency and fragmentation implications. For example, what if each microservice application caches information that it needs from other applications?

Caching should provide some degree of autonomy for each application, contributing to its ability to be isolated and self-sustained. However, caching also means that applications must be designed with the understanding that the latest data state may not always be available. Different distributed caching and data-streaming solutions can be used to automate the replication of information. Finally, each application has to provide data state tracking and undo behaviors instead of relying on distributed transaction coordination.
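
As a rough sketch of this caching idea (the class, method, and field names are hypothetical and not tied to any particular caching library), a microservice might keep a local read-only replica of data owned by another application and explicitly tolerate some staleness:

import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// A hypothetical read-only replica of customer names owned by another service.
// Lookups never call the owning service directly, so an outage there cannot
// cascade into this application; the price is possibly missing or stale data.
class CustomerNameReplica {

    private record Entry(String name, Instant refreshedAt) { }

    private final Map<String, Entry> replica = new ConcurrentHashMap<>();
    private final Duration maxStaleness = Duration.ofMinutes(15);

    // Called by whatever replication mechanism is in place
    // (events, data streaming, periodic polling, and so on).
    void upsert(String customerId, String name) {
        replica.put(customerId, new Entry(name, Instant.now()));
    }

    // Callers must be designed to cope with missing or stale data.
    Optional<String> lookup(String customerId) {
        Entry entry = replica.get(customerId);
        if (entry == null || entry.refreshedAt().isBefore(Instant.now().minus(maxStaleness))) {
            return Optional.empty();
        }
        return Optional.of(entry.name());
    }
}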

Inevitably, these issues lead to design complications, making each microservice application not as simple as it appears at first glance.

Furthermore, each development team that works on a particular microservice cannot really remain in a state of perfect isolation but has to maintain data dependencies with other applications.

In other words, microservices architecture does not appear to actually deliver on the promise of completely solving the dependency issues experienced by monoliths. Instead, data consistency and integrity management are shifted from being an internal concern of a single monolith application to a shared responsibility among many microservice development teams.

Versioning. Another problem arises from the promise of independent versioning for each microservice application. In more complex service interaction scenarios, functional dependencies have to be considered along with the data dependencies.

Imagine a service whose internal implementation has been modified. Such a modification may not cause any changes to the service interface or to the shape and format of its data. Many developers would not consider such an implementation change to have consequences that require a new version of the service, and thus they would not notify the developers of dependent applications of the change.

However, such modifications may affect the semantics of how the service interprets its data, leading to a discrepancy between microservice applications.

For example, consider the implications of a change in the interpretation of a specific value. Suppose an application that records measurements treats an inch as the default unit of measure, and other applications rely on that default. An internal change may make the centimeter the implied default instead. This could have a knock-on effect on other microservice applications, which, all things considered, could even be dangerous. Yet chances are that the developers of those other systems would be none the wiser about the change.
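
A hedged sketch of that scenario (all types here are invented for illustration) shows why the change is invisible at the interface level:

// Version 1 of this service treated the raw value as inches; version 2
// silently switches the implicit unit to centimeters. The method signature
// and the data shape are unchanged, so nothing forces a new service version.
record LengthMeasurement(double value) { }

class MeasurementRecorder {
    // The unit of measure is not part of the contract; it exists only as an
    // assumption inside the implementation.
    LengthMeasurement capture(double rawValue) {
        return new LengthMeasurement(rawValue);
    }
}

class DependentReport {
    // This consumer still assumes inches, so after the change it silently
    // produces wrong numbers even though everything still compiles and runs.
    double toMillimeters(LengthMeasurement m) {
        return m.value() * 25.4;
    }
}

One mitigation is to make the unit an explicit part of the data contract, so a change of unit becomes a visible, versionable interface change rather than a hidden implementation detail.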

This shows that in any nontrivial application interaction scenario, microservices characteristics should not be automatically assumed as purely beneficial.

Monoliths and microservices face similar problems


Businesses face the exact same functional and data integration problems regardless of the choice of architecture. Because both monoliths and microservices must address those problems anyway, the real question is understanding the benefits and drawbacks of each approach.

◉ A monolith offers data and functional consistency as an integral part of its centralized design and the unified development approach at the cost of scalability and flexibility.

◉ Microservices offer a significant degree of development autonomy yet shift the responsibility to resolve data and functional consistency problems to many different independent development teams, which could be a very precarious coordination task.

Perhaps a balanced approach aiming to embrace benefits and mitigate drawbacks of both microservices and monolith architectures could be the way forward. In my opinion, the most critical factor is the idea of the implementation granularity, as discussed earlier.

REST services are by far the most common way of exposing microservice applications, and most use case examples for REST services focus on each service representing a single business entity. This approach results in extremely fine-grained application implementations.

Consider the increase in the number of dependencies between such applications because of the need to maintain data cohesion across so many independently managed business entities.

However, strictly speaking, the microservices architecture does not require such a fine level of implementation granularity. In fact, microservices are usually described as focused on a single business capability, which is not necessarily the same as a single business entity, because a number of business entities can be used to support a single business capability.

Typically, such entities form data groups that exhibit very close ties and a significant number of dependencies. Using these data groupings as a guiding principle to decide on the implementation granularity of applications should result in a smaller number of microservice applications that are better isolated from each other. Each such application would not truly be micro compared to the one-application-per-entity structure, but the application would not be a single monolith that incorporates the entirety of the enterprise functions.

This approach should reduce the need to synchronize information across applications and, in fact, may have a positive effect on the overall system performance and reliability.

Business capability. What constitutes a single business capability? It obviously sounds like a set of commonly used business functions, but that is still a rather vague definition. It’s worth considering the way business functions use data as a grouping principle. For example, data could be grouped as a set of data objects produced in a context of a specific business process and having common ownership.

Common ownership implies that there is a specific business unit that is responsible for a number of business entities. It also implies that business decisions within this unit define the semantic context for these entities and any changes that may affect their data structure, and, most importantly, that the unit defines the set of business functions responsible for creating, updating, and deleting this data.

Other business units may wish to read the same information, but they mostly act as consumers of this data rather than producers. Thus, each application would be responsible for its own subset of business entities and would be capable of performing all required transactions locally, without a need for distributed transaction coordination or two-phase commit operations.
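
A minimal sketch of that principle (using plain JDBC and table names invented for this example) is a capability-scoped application that updates the entities it owns inside a single local transaction:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

// A hypothetical "experiment capability" application that owns both the
// experiment and measurement entities, so it can update them in one local
// transaction with no distributed coordination.
class ExperimentCapability {

    private final DataSource dataSource;

    ExperimentCapability(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    void addMeasurement(long experimentId, double value) throws SQLException {
        try (Connection con = dataSource.getConnection()) {
            con.setAutoCommit(false);
            try (PreparedStatement insert = con.prepareStatement(
                         "INSERT INTO measurement (experiment_id, value) VALUES (?, ?)");
                 PreparedStatement touch = con.prepareStatement(
                         "UPDATE experiment SET last_updated = CURRENT_TIMESTAMP WHERE id = ?")) {
                insert.setLong(1, experimentId);
                insert.setDouble(2, value);
                insert.executeUpdate();
                touch.setLong(1, experimentId);
                touch.executeUpdate();
                con.commit();   // both owned entities change atomically, locally
            } catch (SQLException e) {
                con.rollback();
                throw e;
            }
        }
    }
}

Because the experiment and its measurements live in the same application and the same database, an ordinary local commit is enough; no two-phase commit or cross-service coordination is involved.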

Of course, data replication across applications would still be required for data caching purposes to improve the individual application autonomy. However, the overhead of maintaining a set of read-only data replicas is significantly smaller than the overhead of maintaining multidirectional data synchronization. Also, there would be a need to perform fewer data replications because of the overall reduction in the number of applications.

Data ownership. In large enterprises, the question of data ownership could be difficult to resolve and would require some investment into both data and business process analysis. Understanding a larger context of information helps to determine where data originates as well as the possible consumer of this data. This analysis picture has to represent a much broader landscape than that of an individual microservice application.

Practically speaking, in addition to a number of development teams dedicated to the production of specific applications, an extra group of designers and analysts has to be established to produce and maintain an integrated enterprise data model, assist in scoping individual applications, and reconcile any discrepancies between all other development teams.

As you can see, this approach proposes to borrow some monolith application design characteristics but use them differently, not aiming to produce a single enterprisewide application but rather support the integration of many applications, each focused on implementing their specific business capabilities.

Furthermore, this approach aims at ensuring that service application boundaries are well-defined and maintained, and it suggests criteria to establish such boundaries based on data ownership principles. There are even some interesting APIs, such as GraphQL, that can support these integration efforts.

Nomenclature. There is one more problem to resolve: What should we call this architecture? Should it still be called microservices, even though some business capabilities may own a relatively large number of entities, and thus some applications may turn out not to actually be that small?

I want to suggest calling such an approach a microlith architecture to indicate the hybrid nature of a strategy that attempts to combine the benefits of both microservices and monolith architectures. The word microlith actually refers to a small stone tool, such as a prehistoric arrowhead or a needle made of stone. I like the sense of practicality projected by this term. (See Figure 3.)

Figure 3. Microlith application architecture

I’m sure many would agree that the best design and architecture decisions are based on practical cost-benefit analysis rather than blindly following abstract principles.

Source: oracle.com