SDP Release 22.09

The Stackable Data Platform (SDP) Release 22.09 is now available! This short article will show you how to install it and what you can expect from it in terms of new features and other improvements. For more details you can visit the release notes page or dive into individual operator changelogs (e.g. for Airflow).

Installation

The easiest way to install the release – indeed, the recommended way to manage all your SDP releases going forward – is with the stackablectl tool. Once the tool is installed you can list, install and remove releases using the release command, for example:

stackablectl release uninstall 22.06
stackablectl release install 22.09

Highlights

The focus in this release is on two major features: OpenShift compatibility and security.

OpenShift compatibility

OpenShift is compatible with Kubernetes – Kubernetes is a central component of the OpenShift platform – but the converse is not true: operators and applications that run in a non-OpenShift Kubernetes environment must be adapted before they can run within OpenShift. OpenShift offers advantages in areas such as scalability, multi-cluster management, and security & compliance.

OpenShift also offers certified operators via its embedded OperatorHub. As a first step towards certification of our own SDP operators we are implementing stricter security rules: chiefly, product Pods must run with a dedicated ServiceAccount and SecurityContextConstraints. This work is iterative in nature and the same security features will be rolled out to further operators over time (a sketch of what these resources can look like is shown at the end of this section). With this release we have made continued progress towards OpenShift compatibility, and the following operators can now be previewed on OpenShift:

  • Apache Airflow
  • Apache HBase
  • Apache HDFS
  • Apache Spark on Kubernetes
  • Apache ZooKeeper

Further improvements are expected in future releases, but no stability or compatibility guarantees are currently made for OpenShift clusters.
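
To make the security requirement above more concrete, here is a minimal, hypothetical sketch of a dedicated ServiceAccount together with a SecurityContextConstraints object bound to it. All names and individual rules are illustrative assumptions, not the exact resources created by the SDP operators:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: stackable-products              # hypothetical name
  namespace: stackable
---
apiVersion: security.openshift.io/v1
kind: SecurityContextConstraints
metadata:
  name: stackable-products-scc          # hypothetical name
allowPrivilegedContainer: false
runAsUser:
  type: MustRunAsRange                  # Pods must stay within the namespace UID range
seLinuxContext:
  type: MustRunAs
fsGroup:
  type: MustRunAs
supplementalGroups:
  type: RunAsAny
volumes:                                # only the volume types the products actually need
  - configMap
  - emptyDir
  - persistentVolumeClaim
  - secret
users:
  - system:serviceaccount:stackable:stackable-products   # bind the SCC to the ServiceAccount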

Security – TLS

Internal and external transport security using TLS has been implemented for the Apache Kafka, Trino and ZooKeeper operators.

Kafka and ZooKeeper

For Kafka and ZooKeeper, internal (broker or quorum) and client communication is encrypted by default via TLS, using certificates created by the Secret operator (though these can be overridden with secrets you provide yourself). Internal communication is also authenticated via TLS by default; this can be enabled for client-server communication as well by referencing an AuthenticationClass in the Kafka/ZooKeeper cluster definition, as sketched below.
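
To illustrate the AuthenticationClass reference, here is a hedged sketch of a TLS-based AuthenticationClass. The resource name and the SecretClass are placeholders, and the exact place where it is referenced in the Kafka or ZooKeeper cluster definition should be taken from the operator documentation:

apiVersion: authentication.stackable.tech/v1alpha1
kind: AuthenticationClass
metadata:
  name: kafka-client-tls                         # hypothetical name
spec:
  provider:
    tls:
      clientCertSecretClass: kafka-client-certs  # placeholder SecretClass issuing client certificates

The Kafka or ZooKeeper cluster definition then references this class by name to require authenticated client connections.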

Trino

For Trino, internal TLS (for encrypted and authenticated communication between Trino coordinators and workers) must be explicitly configured as it adds a performance overhead of which the user should be aware.
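
As a rough, hypothetical sketch of what opting in could look like (the field names below are assumptions and should be checked against the Trino operator documentation):

apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: simple-trino                 # hypothetical name
spec:
  version: "396"                     # illustrative Trino version
  config:
    internalTls:                     # assumed field: enable coordinator/worker TLS
      secretClass: tls               # SecretClass used to issue the internal certificates
  # coordinators, workers and catalogs omitted for brevity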

Security – LDAP

The Airflow, Apache NiFi and Apache Superset operators can now use a central LDAP server to manage your user identities in one place: simply specify an AuthenticationClass, which will then be used to authenticate users. Authorization with LDAP is also available in a limited form; further iterations will follow in subsequent releases.
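
As a hedged sketch, an LDAP AuthenticationClass might look like the following; the hostname, search settings and credentials SecretClass are placeholders, and field details should be checked against the documentation:

apiVersion: authentication.stackable.tech/v1alpha1
kind: AuthenticationClass
metadata:
  name: openldap                                    # hypothetical name
spec:
  provider:
    ldap:
      hostname: openldap.default.svc.cluster.local  # placeholder LDAP server
      port: 389
      searchBase: ou=users,dc=example,dc=org        # where user entries are looked up
      bindCredentials:
        secretClass: openldap-bind-credentials      # placeholder SecretClass with bind user and password

The same AuthenticationClass can then be referenced from the Airflow, NiFi or Superset cluster definition.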

Demos with stackablectl

stackablectl now supports the deployment of ready-made demos, combining the rollout of operator stacks with the demonstration of a particular use-case for that stack. Check them out here:

The last two in this list combine an initial batch load with near-real-time updates from currently streamed data. Watch out for more demos in the next release!
