SDP Release 23.4


Stackable Data Platform (SDP) Release 23.4 is now publicly available!


This release focuses on cluster operations, the status field, and default/custom affinities.

Cluster Operation

The first part of cluster operations has been implemented in all Stackable operators where it is relevant. It supports pausing cluster reconciliation (no changes are applied to the Kubernetes resources) and stopping the cluster completely (all replicas of StatefulSets, Deployments or DaemonSets are set to zero, which deletes all Pods belonging to that cluster while leaving the data intact).
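As a rough sketch of what this looks like in practice, both switches live under `spec.clusterOperation` in a product's custom resource (the resource kind and name below are placeholders; check the release notes for the exact schema in your operator):

```yaml
apiVersion: zookeeper.stackable.tech/v1alpha1
kind: ZookeeperCluster
metadata:
  name: simple-zookeeper
spec:
  clusterOperation:
    # Pause reconciliation: the operator observes the resource
    # but applies no further changes to the cluster.
    reconciliationPaused: true
    # Stop the cluster: scale all workloads to zero replicas
    # while keeping persistent data intact.
    stopped: false
```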

Status Field

Operators of the Stackable Data Platform create, manage and delete Kubernetes resources. To make it easy to query the health of a product, Stackable operators now use several predefined condition types that capture different aspects of a product’s availability.
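These conditions follow the standard Kubernetes condition convention and appear in the resource's `status` field. A minimal sketch of what such a status might look like (the set of condition types shown is illustrative; consult the release notes for the definitive list):

```yaml
status:
  conditions:
    # Whether the product is up and serving requests.
    - type: Available
      status: "True"
    # Whether reconciliation has been paused via clusterOperation.
    - type: ReconciliationPaused
      status: "False"
```

This means health can be checked with standard tooling, e.g. `kubectl get ... -o jsonpath='{.status.conditions}'`.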

Default / Custom Affinities

In Kubernetes there are different ways to influence how Pods are assigned to Nodes. In some cases it makes sense to co-locate services that communicate heavily with each other, such as HBase regionservers with HDFS datanodes. In other cases it makes sense to distribute the Pods among as many Nodes as possible. There may also be additional requirements, e.g. placing important services in different racks or data centers. This release implements default affinities that should suffice for many scenarios out of the box, while also allowing custom affinity rules at the role and/or role-group level.
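A hedged sketch of how a custom rule might be declared at the role level, here spreading HDFS datanodes across Nodes (the role name, labels and exact nesting are assumptions for illustration; the affinity body itself is standard Kubernetes `podAntiAffinity` syntax):

```yaml
spec:
  dataNodes:
    config:
      affinity:
        # Require that no two datanode Pods of this cluster
        # land on the same Node.
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app.kubernetes.io/name: hdfs
                  app.kubernetes.io/component: datanode
              topologyKey: kubernetes.io/hostname
```

Default affinities apply when no such override is given, so most clusters need no affinity configuration at all.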

New product-specific features

We have also added new product-specific features such as:

  • support for loading Airflow DAGs using git-sync.
  • support for running the Secret operator in unprivileged mode.
  • support for Kerberos keytab provisioning with the Secret Operator.
  • completion of Logging framework rollout across all Operators.
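To give a flavour of the first item, DAGs can be pulled into Airflow from a Git repository via git-sync. The following is only a sketch: the repository URL is a placeholder and the field names should be checked against the Airflow operator documentation:

```yaml
spec:
  clusterConfig:
    dagsGitSync:
      # Repository containing the DAG definitions (placeholder URL).
      - repo: https://github.com/example/airflow-dags
        branch: main
        # Subfolder within the repository that holds the DAGs.
        gitFolder: "dags"
```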

Learning Stackable

This release contains the following new demos and documentation:

  • a condensed form of the data-lakehouse-iceberg-trino-spark demo showcasing the integration of Trino and Iceberg to store and modify data.
  • an integration of JupyterHub, PySpark and Apache Hadoop to run an anomaly detection notebook.
  • concept pages for cluster operations and pod placements.

Further details on our release and how to upgrade can be found in our release notes as well as in the change logs of the individual operators (e.g. for Airflow).
