The Stackable Data Platform (SDP) Release 22.09 is now available! This short article will show you how to install it and what you can expect from it in terms of new features and other improvements. For more details you can visit the release notes page or dive into individual operator changelogs (e.g. for Airflow).
Installation
The easiest way to install the release – indeed, the recommended way to manage all your SDP releases going forward – is with the stackablectl tool. Once the tool is installed, you can list, install and remove releases using the release command, for example:
stackablectl release uninstall 22.06
stackablectl release install 22.09
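Beyond install and uninstall, the release subcommand can also show what is available before you commit to an upgrade. A quick sketch (subcommand names are assumptions based on current stackablectl usage – run stackablectl release --help to confirm on your version):

```shell
# List all releases known to stackablectl, then inspect the
# product versions bundled in a specific release.
stackablectl release list
stackablectl release describe 22.09
```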
Highlights
The focus in this release is on two major features: OpenShift compatibility and security.
OpenShift compatibility
OpenShift is compatible with Kubernetes – Kubernetes is a central component of its distributed platform – though the converse is not true: operators and applications that run in a non-OpenShift Kubernetes environment must be adapted before they can run within OpenShift. OpenShift offers advantages in areas such as scalability, multi-cluster management, and security & compliance.
OpenShift also offers certified operators via its embedded OperatorHub. As a first step towards certification of our own SDP operators we are implementing stricter security rules. This work is iterative in nature, but similar security features – chiefly that product Pods must run with a custom ServiceAccount and SecurityContextConstraint – will be added to other operators over time. With this release we have made continued progress towards OpenShift compatibility, and the following operators can now be previewed on OpenShift:
- Apache Airflow
- Apache HBase
- Apache HDFS
- Apache Spark on Kubernetes
- Apache ZooKeeper
Further improvements are expected in future releases, but no stability or compatibility guarantees are currently made for OpenShift clusters.
Security – TLS
Internal and external transport security using TLS has been implemented for the Apache Kafka, Trino and ZooKeeper operators.
Kafka and ZooKeeper
For Kafka and ZooKeeper, the internal (broker or quorum) and client communication are encrypted by default via TLS, using certificates created by the Secret operator (though these can be overridden by secrets you provide yourself). Internal communication is also authenticated via TLS by default; this can be enabled for client-server communication as well by providing a reference to an AuthenticationClass in the Kafka/ZooKeeper cluster definition.
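As a sketch, a TLS AuthenticationClass might look like the following. The resource name and the SecretClass it references are illustrative assumptions – consult the Kafka/ZooKeeper operator documentation for your release for the authoritative schema:

```yaml
# Illustrative only: an AuthenticationClass using the TLS provider,
# which a Kafka/ZooKeeper cluster definition can reference by name
# to require authenticated client connections.
apiVersion: authentication.stackable.tech/v1alpha1
kind: AuthenticationClass
metadata:
  name: kafka-client-tls               # hypothetical name
spec:
  provider:
    tls:
      # SecretClass from which client certificates are issued and
      # against which they are validated (hypothetical name).
      clientCertSecretClass: kafka-client-auth-secret
```

The cluster definition then references this class by name; see the operator documentation for the exact field in which the reference belongs.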
Trino
For Trino, internal TLS (for encrypted and authenticated communication between Trino coordinators and workers) must be explicitly configured as it adds a performance overhead of which the user should be aware.
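A minimal sketch of opting in, assuming the internal TLS settings live under the cluster's config section and reference a SecretClass served by the Secret operator – the exact field names may differ, so check the Trino operator documentation for your release:

```yaml
# Illustrative only: enabling internal TLS between Trino
# coordinators and workers. Field names are assumptions.
apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCluster
metadata:
  name: simple-trino                   # hypothetical name
spec:
  config:
    internalTls:
      # Certificates issued by the Secret operator's SecretClass.
      secretClass: tls
```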
Security – LDAP
The Airflow, Apache NiFi and Apache Superset operators can now use a central LDAP server to manage your user identities in one place: simply specify an AuthenticationClass, which will then be used to authenticate users. Authorization with LDAP is also available in a limited form; further iterations will follow in subsequent releases.
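A sketch of such an AuthenticationClass, with an LDAP provider; the hostname, port and search base below are placeholder values for an in-cluster OpenLDAP, not a prescription:

```yaml
# Illustrative only: an AuthenticationClass pointing at a central
# LDAP server, referenced by name from e.g. a Superset cluster
# definition. All connection values are placeholders.
apiVersion: authentication.stackable.tech/v1alpha1
kind: AuthenticationClass
metadata:
  name: ldap
spec:
  provider:
    ldap:
      hostname: openldap.default.svc.cluster.local
      port: 1389
      searchBase: ou=users,dc=example,dc=org
```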
Demos with stackablectl
stackablectl now supports the deployment of ready-made demos, combining the rollout of operator stacks with the demonstration of a particular use-case for that stack. Check them out here:
- Taxi data analysis with S3, Trino and Superset
- Earthquake data visualization with S3, Kafka, Apache Druid and Superset
- Water level visualization, also with S3, Kafka, Druid and Superset
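Running a demo is a two-step sketch along these lines (the demo identifier shown is an assumption – listing the demos first shows the exact names your stackablectl version supports):

```shell
# List the available demos, then install one end to end:
# stackablectl deploys the required operators, the stack and
# the demo workload together.
stackablectl demo list
stackablectl demo install trino-taxi-data
```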
The last two in this list combine an initial batch load with near-real-time updates from current streamed data. Watch out for more demos in the next release!