Monday, November 29, 2021

Add a distributed database to your Java application with i.o.cluster

The interference open cluster is an open-source framework that allows you to build distributed applications with their own persistent storage.

The interference open cluster, or i.o.cluster, is a simple Java framework that lets you add a distributed database and a complex event processing service to your Java application. Its interface is similar to the Java Persistence API (JPA), and it uses annotations for structure mapping and data operations.

The name comes from the Interference Project; in this context, interference is a fundamental physical phenomenon in which two waves superpose to form a resultant wave of greater, lower, or the same amplitude.

The i.o.cluster project is an open source, pure Java framework. The basic unit of the i.o.cluster service is a node, which can be a standalone running service or a service running within a Java application.

Each i.o.cluster node has its own persistent storage and can be considered and used as a local database. It operates with plain old Java objects (POJOs). In addition, it uses base JPA annotations (such as @Entity, @Table, @Column, @Transient, @Index, and @GeneratedValue) for object mapping directly to persistent storage. It provides the following basic features:

◉ It supports transactions and SQL queries with READ COMMITTED isolation level.

◉ It uses persistent indices for fast access to data and to increase the performance of SQL joins.

◉ It allows flexible management of data in memory for the stable operation of a node at any ratio of storage size to available memory. Depending on the problem being solved, you can choose to allocate all data directly in memory with a sufficient heap size or use access to huge storage resources with a minimum heap size in the Java application.

Each of the nodes includes the following mechanisms; Figure 1 shows how they fit together.

◉ Core algorithms for structured persistent storage, indices, custom serialization, heap management, and local and distributed sync processes

◉ SQL and complex event processing (CEP) engine

◉ Event transport to exchange messages between nodes as well as between a node and a client application

Figure 1. The parts of an i.o.cluster node

Nodes can be joined into a cluster, and you can insert data and run SQL queries from any node included in the cluster. SQL queries are scaled horizontally, and you can run transparent cluster-level transactions. Within a cluster, you can use both CEP and simple streaming SQL, and the i.o.cluster nodes do not require any additional coordinators.

Getting started with i.o.cluster


To get started with i.o.cluster, download the source code for the current release (which is 2021.1 as of late September 2021) from GitHub; then build the source code using Maven and install the JAR file into your local Maven repository. Then add the following dependency to your project’s pom.xml file. Note that the minimum requirements for the build are JDK 1.8 and Maven 3.

<dependencies>
    ...
    <dependency>
        <groupId>su.interference</groupId>
        <artifactId>interference</artifactId>
        <version>2021.1</version>
    </dependency>
    ...
</dependencies>

Before starting the application, create a text file with a name such as iocluster.properties that has properties similar to the example given in the source code’s config directory and in accordance with the description provided in the i.o.cluster documentation. Place the file in your project’s config directory. In the launch command line, specify the following parameter:

-Dsu.interference.config=iocluster.properties

Once this is done, launch the service anywhere in your code, as follows:

Instance instance = Instance.getInstance();
Session session = Session.getSession();
instance.startupInstance(session);

The i.o.cluster service implements a simple data management model based on several standard JPA-like methods of the Session object, which you obtain by calling the static method Session.getSession().

◉ persist() places an object of an @Entity-annotated class into storage.
◉ find() finds and returns an object from storage using a unique identifier.
◉ execute() executes a SQL query and returns a ResultSet object that contains a poll() method to retrieve the query execution results.
◉ commit() commits a transaction.
◉ rollback() rolls back a transaction.
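
The way commit() and rollback() interact with the READ COMMITTED isolation level can be illustrated with a small in-memory model. The sketch below is not the i.o.cluster API: ToyStore and its nested ToySession are hypothetical names, the storage is a plain map, and the code is single-threaded. It shows only the general idea that a session's writes stay private until commit() and are discarded by rollback().

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model; this is NOT the i.o.cluster Session class.
class ToyStore {
    // Committed rows, visible to every session.
    private final Map<Long, String> committed = new HashMap<>();

    ToySession openSession() {
        return new ToySession();
    }

    class ToySession {
        // Uncommitted writes, visible only to this session.
        private final Map<Long, String> pending = new HashMap<>();

        // Stage a write; other sessions cannot see it yet.
        void persist(long id, String value) {
            pending.put(id, value);
        }

        // A session sees its own pending writes first, then committed data.
        String find(long id) {
            String own = pending.get(id);
            return own != null ? own : committed.get(id);
        }

        // Publish all staged writes to every session.
        void commit() {
            committed.putAll(pending);
            pending.clear();
        }

        // Discard all staged writes.
        void rollback() {
            pending.clear();
        }
    }
}
```

In this model, if one session persists a row and a second session calls find() with the same identifier, the second session gets null until the first session commits.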

More details about configuring your application for the launch of the i.o.cluster service are provided in the documentation attached to the source code and also on the developers’ website along with detailed examples.

By the way, the i.o.cluster software includes a remote client that provides the ability to remotely connect to any of the cluster nodes using internal event transport and then execute standard JPA-like methods to persist, find, execute, commit, and roll back transactions. Figure 2 shows how the remote client works.

Figure 2. The remote client in the i.o.cluster service

The distributed persistent model


The i.o.cluster system is decentralized, meaning that the cluster does not use any coordination nodes. Instead, each node follows a set of formal rules of behavior that guarantees the integrity and availability of data within a certain interaction framework.

Within the framework of these rules, all nodes of the i.o.cluster are equivalent. There is no separation in the system between controller and replica nodes; changes to user tables can be made from any node. All changes are replicated to all nodes, regardless of which node initiated the change. In a transaction, running commit in a local user session automatically ensures that the changed data is visible on all nodes in the cluster.

To include a node in the cluster, specify the full list of cluster nodes in the cluster.nodes parameter of the configuration file. The minimum number of cluster nodes is 2, and the maximum is 64.

After the configuration is complete, you may start all configured nodes in any order. All nodes will use specific messages (events) for internode data consistency and horizontal-scaling queries.

These are the formal rules.

◉ All nodes in the cluster are equivalent.
◉ Changes made on any node are mapped to the other nodes immediately.
◉ If replication is not possible, a persistent change queue is created for the node. This might occur if a node is temporarily unavailable or if a connection is broken.
◉ The owner of any data frame is the node on which the frame has been allocated.
◉ The system generates unique identifiers for entities (using the @DistributedId annotation) so that each identifier is unique within the cluster.
◉ The system does not use any additional checks for uniqueness.
◉ Data inserts are performed in the local storage structure, after which the changes are replicated to the other nodes.
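
Two of these rules, local insert followed by replication and the change queue for unreachable nodes, can be sketched with a toy model. ToyNode and all of its members are hypothetical names and do not come from the i.o.cluster code base. The queue here lives in memory, whereas the rule above describes a persistent queue, and availability is a simple flag rather than a real network check.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical toy model of the replication rules; NOT i.o.cluster code.
class ToyNode {
    final List<String> storage = new ArrayList<>();
    boolean available = true;
    private final List<ToyNode> peers = new ArrayList<>();
    // queues.get(i) holds the changes not yet delivered to peers.get(i).
    private final List<Deque<String>> queues = new ArrayList<>();

    void addPeer(ToyNode peer) {
        peers.add(peer);
        queues.add(new ArrayDeque<String>());
    }

    // Insert into local storage first, then replicate to every peer.
    void insert(String row) {
        storage.add(row);
        for (int i = 0; i < peers.size(); i++) {
            queues.get(i).addLast(row);
            flush(i);
        }
    }

    // Deliver queued changes in their original order while the peer is reachable.
    private void flush(int i) {
        ToyNode peer = peers.get(i);
        Deque<String> queue = queues.get(i);
        while (peer.available && !queue.isEmpty()) {
            peer.storage.add(queue.removeFirst());
        }
    }

    // Call when connectivity is restored to drain every queue.
    void retryPending() {
        for (int i = 0; i < peers.size(); i++) {
            flush(i);
        }
    }

    // Number of changes still waiting for the given peer.
    int pendingFor(int peerIndex) {
        return queues.get(peerIndex).size();
    }
}
```

If a peer is marked unavailable while two rows are inserted, both rows wait in the queue for that peer and are delivered in their original order once the peer is available again and retryPending() is called.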

By the way, all SQL queries called on any of the cluster nodes will be automatically distributed among all cluster nodes for parallel processing, if the volume of tasks indicates this will be advantageous.

If a node becomes unavailable during the processing of a request (for example, if the network fails or the service stops), the tasks assigned to that node are automatically rescheduled to another available node.
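
The article does not describe the scheduling algorithm, so the following is only a sketch of the general idea under an assumed round-robin policy. ToyScheduler and its methods are hypothetical names. Tasks are first spread over all nodes, and when one node fails, only the tasks it owned are moved to the surviving nodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the round-robin policy is an assumption, not i.o.cluster's algorithm.
class ToyScheduler {
    // Initial plan: spread tasks over the nodes round-robin.
    static Map<String, String> assign(List<String> tasks, List<String> nodes) {
        Map<String, String> plan = new LinkedHashMap<>();
        for (int i = 0; i < tasks.size(); i++) {
            plan.put(tasks.get(i), nodes.get(i % nodes.size()));
        }
        return plan;
    }

    // Moves every task owned by the failed node to the surviving nodes, round-robin.
    // Tasks on healthy nodes keep their owner.
    static Map<String, String> reschedule(Map<String, String> plan, String failed, List<String> nodes) {
        List<String> survivors = new ArrayList<>(nodes);
        survivors.remove(failed);
        if (survivors.isEmpty()) {
            throw new IllegalStateException("no available node");
        }
        Map<String, String> next = new LinkedHashMap<>();
        int moved = 0;
        for (Map.Entry<String, String> entry : plan.entrySet()) {
            String owner = entry.getValue();
            if (owner.equals(failed)) {
                owner = survivors.get(moved % survivors.size());
                moved++;
            }
            next.put(entry.getKey(), owner);
        }
        return next;
    }
}
```

With six tasks on three nodes n1, n2, and n3, a failure of n2 moves its two tasks to n1 and n3 and leaves the other four tasks where they were.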

Complex event processing


The i.o.cluster service supports CEP using the SELECT STREAM clause in a SQL statement. A SELECT STREAM query can use any of the following three CEP modes:

◉ Events are processed as they arrive, without any aggregations.
◉ Events are aggregated by column value with the use of any group functions (tumbling window).
◉ Events are aggregated by a window of a certain size for every new record (sliding window).
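
The difference between the two aggregating modes can be sketched in plain Java. The code below is not how i.o.cluster implements them: WindowSketch is a hypothetical class, and it assumes that a tumbling window emits its result when the grouping value changes, while a sliding window emits a result for every new record. Events for the tumbling case are modeled as {key, value} pairs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch of the two aggregating modes; NOT i.o.cluster code.
class WindowSketch {
    // Tumbling: one sum per consecutive run of events with the same key.
    // Each event is {key, value}; a result {key, sum} is emitted when the key
    // changes, and once more at the end of this finite demo input.
    static List<int[]> tumblingSum(List<int[]> events) {
        List<int[]> out = new ArrayList<>();
        boolean open = false;
        int key = 0;
        int sum = 0;
        for (int[] event : events) {
            if (open && event[0] != key) {
                out.add(new int[] {key, sum});
                sum = 0;
            }
            key = event[0];
            sum += event[1];
            open = true;
        }
        if (open) {
            out.add(new int[] {key, sum});
        }
        return out;
    }

    // Sliding: for every new value, the sum of the most recent 'size' values.
    static List<Integer> slidingSum(List<Integer> values, int size) {
        List<Integer> out = new ArrayList<>();
        Deque<Integer> window = new ArrayDeque<>();
        int sum = 0;
        for (int value : values) {
            window.addLast(value);
            sum += value;
            if (window.size() > size) {
                sum -= window.removeFirst();
            }
            out.add(sum);
        }
        return out;
    }
}
```

For the values 1, 2, 3, 4 and a window size of 2, slidingSum produces 1, 3, 5, 7: one result per record, each covering at most the last two values. The tumbling version produces far fewer results, one per group.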

The basic difference between a streaming query and a standard query is that with SELECT STREAM, the execute() method returns a StreamQueue object. The request is executed asynchronously until the StreamQueue.stop() method is called or until the application terminates.

With CEP, the StreamQueue.poll() method returns all records previously inserted into the table that match the WHERE condition (if one has been specified) and then continues to return newly added records. Each call to StreamQueue.poll() returns the next record after the last polled position within the session. This means that if the SQL request is stopped and called again within the same session, retrieval continues from the last fixed position in the table. In a new session, however, retrieval starts from the beginning of the table.
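
The position rule can be modeled in a few lines. ToyTable, ToySession, and ToyStream below are hypothetical stand-ins, not the i.o.cluster Session and StreamQueue classes. The sketch ignores WHERE conditions, and it assumes poll() returns null when no further record is available, which the article does not specify. The key point is that the position belongs to the session, not to the stream.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the polling rule; NOT the i.o.cluster StreamQueue.
class ToyTable {
    // Append-only list standing in for a table.
    private final List<String> rows = new ArrayList<>();

    void insert(String row) {
        rows.add(row);
    }

    ToySession openSession() {
        return new ToySession();
    }

    class ToySession {
        // Last polled position; it belongs to the session, so it survives
        // stopping one stream and starting another.
        private int position = 0;

        ToyStream selectStream() {
            return new ToyStream(this);
        }
    }

    class ToyStream {
        private final ToySession session;

        ToyStream(ToySession session) {
            this.session = session;
        }

        // Returns the next row after the session's position, or null if there is none yet.
        String poll() {
            if (session.position >= rows.size()) {
                return null;
            }
            String row = rows.get(session.position);
            session.position++;
            return row;
        }
    }
}
```

A second stream opened in the same session continues after the last row polled by the first stream, while a stream opened in a new session starts again from the first row.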

Note that unlike a normal SQL query, a streaming request does not support transactions and always returns rows as they are actually inserted, regardless of whether the commit() method is used in the session that inserts the data. This could result in dirty reads, that is, reads of uncommitted rows from another transaction.

All this means that when your code uses the SELECT STREAM clause, you can retrieve records in exactly the same order in which they were added. In general, at the cluster level, the session.persist() operation can be considered as publishing a persistent event. Based on i.o.cluster’s distribution rules, this event is sent to all the nodes.

Source: oracle.com
