
Stackable Data Platform (SDP) Release 24.3 – celebrating new security features


Stackable Data Platform (SDP) Release 24.3 is now publicly available! This time the focus is on new security-enhancing features and improvements.

Security-first: new product features

Security is the focus of developments in our latest version of SDP, in which we have added several new features for platform-wide authentication and authorization.

For authorization, we continue to rely on the Open Policy Agent (OPA) to create and enforce rule-based policies. To support this, we have implemented and integrated dedicated extensions for the products in the Stackable Data Platform. We would especially like to highlight our contribution to Trino, whose current versions now officially support access control using the Open Policy Agent. For our distributed file system HDFS, the OPA integration brings policy-based authorization for the first time, a feature that HDFS users and administrators have long wished for.
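To make the idea of rule-based policy decisions more concrete, here is a minimal Python sketch of how a client could ask an OPA server for a decision over OPA's REST Data API (`POST /v1/data/<path>` with an `input` document). The server URL, the policy path `example/allow` and the shape of the input are assumptions made for illustration only. The Trino and HDFS integrations build their own requests and inputs, which are not shown here.

```python
import json
import urllib.request
from typing import Any, Dict


def build_opa_request(base_url: str, policy_path: str,
                      input_doc: Dict[str, Any]) -> urllib.request.Request:
    """Build a POST request for OPA's Data API.

    base_url and policy_path are hypothetical values supplied by the caller.
    """
    url = f"{base_url.rstrip('/')}/v1/data/{policy_path.strip('/')}"
    # OPA expects the document to evaluate under the "input" key.
    body = json.dumps({"input": input_doc}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_allowed(response_body: bytes) -> bool:
    """Interpret an OPA Data API response.

    OPA answers with {"result": <value>}. If the rule is undefined, the
    "result" key is missing, which this sketch treats as a denial.
    """
    return json.loads(response_body).get("result") is True


if __name__ == "__main__":
    # Hypothetical input: who wants to do what on which resource.
    request = build_opa_request(
        "http://localhost:8181",
        "example/allow",
        {"user": "alice", "action": "read", "resource": "/data/sales"},
    )
    print(request.get_method(), request.full_url)
    # Sending the request needs a running OPA server, for example:
    # with urllib.request.urlopen(request) as response:
    #     print(is_allowed(response.read()))
```

Treating a missing `result` as a denial is a deliberately conservative choice in this sketch: a policy that does not match should never grant access by accident.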

The widely used open-source IAM application Keycloak can be used to implement uniform identity and access management across all products. This is made possible by the new User Information Fetcher component, which allows the Open Policy Agent to enforce policies based on user and group information from Keycloak.

For authentication, we have extended support for Kerberos and OpenID Connect (OIDC). Following Apache HDFS, Kerberos is now also available for Apache HBase and Apache Hive, complemented by examples of running Apache Spark applications in a Kerberos-enabled environment. Last but not least, we are introducing OIDC-based single sign-on for our user interfaces, starting with Apache Superset and Trino.

Finally, we are starting to build the first product binaries from source code (initially for Apache Hadoop and Apache HBase) instead of repackaging the official release binaries. This will give us more control over the features and security aspects of these products in the future.

Security-second: vulnerability management

For the first time, we are publishing Software Bills of Materials (SBOMs) in CycloneDX format for both operators and product images. This project was partially funded by the Sovereign Tech Fund. Our SBOMs are published as signed in-toto attestations in our OCI registry. To help users get started with these SBOMs, we have created a detailed tutorial. We have also made an SBOM browser publicly available that enables automatic downloading and analysis of the CycloneDX JSON files.
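As a rough illustration of what analyzing a CycloneDX JSON file can look like, the following Python sketch lists the components recorded in an SBOM. It relies only on the standard CycloneDX fields `bomFormat` and `components` (with `name` and `version`). The inline sample document and its component names are invented for this example and are not taken from a real Stackable SBOM. Downloading and verifying the signed attestation is out of scope here.

```python
import json
from typing import List, Tuple

# Hypothetical, heavily shortened CycloneDX document used only for this sketch.
SAMPLE_SBOM = """
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "components": [
    {"type": "library", "name": "example-lib", "version": "1.2.3",
     "purl": "pkg:maven/org.example/example-lib@1.2.3"},
    {"type": "application", "name": "example-app", "version": "0.1.0"}
  ]
}
"""


def list_components(sbom_text: str) -> List[Tuple[str, str]]:
    """Return (name, version) pairs for all top-level components."""
    doc = json.loads(sbom_text)
    if doc.get("bomFormat") != "CycloneDX":
        raise ValueError("not a CycloneDX document")
    # "components" is optional in CycloneDX, so fall back to an empty list.
    return [
        (component.get("name", "?"), component.get("version", "?"))
        for component in doc.get("components", [])
    ]


if __name__ == "__main__":
    for name, version in list_components(SAMPLE_SBOM):
        print(f"{name} {version}")
```

A list like this is a typical starting point for matching components against vulnerability databases.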

Transparency is also an aspect of security, which is why we are continuously working on improving our documentation. We generate the documentation for our custom resource definitions automatically and make it freely available at https://crds.stackable.tech.

New Product-Specific Features

Release 24.3 also brings further new features to our platform components, operators and products:

  • Storage
    • Introducing rack awareness support for HDFS deployments, bringing SDP closer to feature parity with bare metal HDFS deployments.
    • Introducing a new topology provider bundled with the HDFS image, linking Kubernetes labels to a cluster topology.
  • Command Line Tooling
    • Revamped stackablectl command line tool now capable of enumerating endpoints provided by the listener operator.
    • Parallelized operator installation, significantly speeding up the installation of SDP on fresh Kubernetes clusters.
  • Custom Labels for Helm Charts
    • Helm users can now assign custom labels to stacklets, facilitating improved component management with third-party tools.
  • Noteworthy Bugfixes:
    • Apache Airflow Operator: Now supports using git-sync with the KubernetesExecutor.
    • Apache Hadoop Operator:
      • Inclusion of Kerberos principals in the discovery ConfigMap.
      • Environment variables can now be overridden with the role group’s envOverrides property.
    • Apache Spark Operator:
      • Dynamic provisioning of applications without having to modify classpath settings.
      • Updated RBAC permissions allowing deletion of ConfigMaps during application cleanup.
    • Trino Operator: Addition of HDFS configuration files to the hive.config.resources property when connecting to an HDFS cluster.
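The Storage items above mention a topology provider that links Kubernetes labels to a cluster topology. The following Python sketch illustrates the general idea of deriving an HDFS-style rack path from node labels. It is not the code of the bundled topology provider: the function name `rack_path` and the way the label keys are passed in are assumptions for this example. `topology.kubernetes.io/zone` is a well-known Kubernetes label and `/default-rack` is the rack HDFS uses when no topology is known.

```python
from typing import List, Mapping

# Rack that HDFS assigns when no topology information is available.
DEFAULT_RACK = "/default-rack"


def rack_path(node_labels: Mapping[str, str], label_keys: List[str]) -> str:
    """Build a rack path such as "/zone-a/rack-1" from Kubernetes labels.

    label_keys lists the labels (outermost level first) that make up the
    topology. If any of them is missing, fall back to the default rack.
    """
    parts = []
    for key in label_keys:
        value = node_labels.get(key)
        if not value:
            return DEFAULT_RACK
        parts.append(value)
    return "/" + "/".join(parts) if parts else DEFAULT_RACK


if __name__ == "__main__":
    # Hypothetical node labels for illustration.
    labels = {"topology.kubernetes.io/zone": "eu-central-1a"}
    print(rack_path(labels, ["topology.kubernetes.io/zone"]))
```

With a mapping like this, HDFS can spread block replicas across different zones or racks instead of treating all nodes as equally close.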

New Product Versions

The following new product versions are now supported:

| Product | New version(s) | What’s new? |
|---|---|---|
| Airflow | 2.7.3, 2.8.1 | Introducing Airflow Object Storage and Listener hooks for Datasets, plus various bug fixes. |
| Druid | 28.0.1 | SQL compliance and engine enhancements, ingestion improvements, concurrent data handling. |
| HBase | (2.4.17) | No version change. |
| HDFS | (3.2.4, 3.3.6) | No version change. |
| Kafka | 3.5.2, 3.6.1 | Bug fix releases. |
| NiFi | 1.25.0 | Improvements and bugfixes. Over 270 issues fixed since version 1.23.2. Adds new components for Slack and Zendesk integration, among others. |
| OpenPolicyAgent | 0.61.0 | Performance improvements, bugfixes and security fixes for third-party libraries. Tooling to help prepare existing policies for the upcoming OPA 1.0 release, which will include a new version of the Rego language. |
| Spark | 3.4.2, 3.5.1 | Releases containing maintenance, security and correctness fixes. |
| Superset | 2.1.3, 3.0.3, 3.1.0 | Latest patch release for the Superset 2.x lineage. Apache Superset 3.1 includes various smaller new features and optimizations, e.g. waterfall chart visualization, ECharts bubble chart, improved data set selectors, automatic formatting of SQL queries, and country map visualization improvements. |
| Trino | 442 | Lots of improvements and optimizations since release 428. Most notably, we would like to highlight support for access control with the Open Policy Agent, which we ourselves contributed (see above) in release 438 (#19532). Also, starting from release 440, Open Policy Agent access control supports row filtering and column masking. |
| ZooKeeper | 3.8.4, 3.9.2 | Security and bug fixes. |

More Info

Further details on our release and how to upgrade can be found in our release notes as well as in the change logs of the individual operators:

Airflow, Druid, HBase, HDFS, Kafka, NiFi, OpenPolicyAgent, Spark, Superset, Trino, ZooKeeper
