Stackable Data Platform (SDP) Release 23.11 – Enhancing Operational Excellence

Stackable Data Platform (SDP) Release 23.11 is now publicly available! This time the focus is on various improvements that optimize operational efficiency.

PodDisruptionBudgets, graceful shutdowns, and signed product images collectively contribute to operational resilience. By minimizing planned downtime, managing controlled shutdowns, and fortifying security, your data platform achieves a level of continuity that is essential in today’s dynamic data landscape.

Here’s a closer look at key features that contribute to operational optimization:

Continuous availability

SDP 23.11 introduces Kubernetes-backed PodDisruptionBudgets. A PodDisruptionBudget limits how many pods of a role may be unavailable at the same time during planned downtime, such as node drains or upgrades. Together with graceful shutdowns, this ensures that critical roles, including HDFS namenodes and Trino workers, are taken down in a controlled manner without compromising service availability, so your operations maintain continuity and reliability.
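To make the budget idea concrete, here is a minimal sketch of how a maxUnavailable-style budget decides whether another pod may be evicted voluntarily. The class and function names are hypothetical, and the budget of one unavailable pod is an assumption chosen for illustration, not a statement about the values the operators actually deploy.

```python
from dataclasses import dataclass


@dataclass
class RoleStatus:
    expected_pods: int  # replicas the role is supposed to run
    healthy_pods: int   # pods that are currently ready


def disruptions_allowed(status: RoleStatus, max_unavailable: int) -> int:
    """Return how many pods may be evicted voluntarily right now."""
    # The budget demands that at least this many pods stay healthy.
    desired_healthy = max(status.expected_pods - max_unavailable, 0)
    # Only the surplus above that floor may be disrupted.
    return max(status.healthy_pods - desired_healthy, 0)


# Three workers, all healthy, budget of one unavailable pod (assumed value).
print(disruptions_allowed(RoleStatus(3, 3), max_unavailable=1))  # 1
# One worker is already down: further evictions have to wait.
print(disruptions_allowed(RoleStatus(3, 2), max_unavailable=1))  # 0
```

With such a budget in place, a node drain proceeds one pod at a time: each eviction is only admitted once the previously evicted pod is healthy again.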

Additional Trust Layer

SDP 23.11 introduces the signing of all Stackable product images, complementing the signing of operators introduced in release 23.7 to provide an added layer of security. This not only fortifies your data ecosystem but also significantly mitigates operational risks. By ensuring the authenticity and integrity of your product images, we bolster your business against potential threats and instill confidence in the reliability of your data operations.
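As a rough illustration of what checking a signature could look like, the sketch below assembles a verification command, assuming the images are signed with Sigstore's cosign in keyless mode (this post does not state the signing tool). The image reference and the identity pattern are hypothetical placeholders, and the OIDC issuer shown is the one used by GitHub Actions, which is likewise an assumption. The helper only builds the command; it does not run it.

```python
import shlex


def cosign_verify_command(image: str, identity_regexp: str, oidc_issuer: str) -> list[str]:
    """Build (but do not execute) a keyless `cosign verify` invocation."""
    return [
        "cosign", "verify",
        "--certificate-identity-regexp", identity_regexp,
        "--certificate-oidc-issuer", oidc_issuer,
        image,
    ]


command = cosign_verify_command(
    image="registry.example.com/stackable/trino:428",        # hypothetical image reference
    identity_regexp="^https://github.com/example-org/.+",    # hypothetical signer identity
    oidc_issuer="https://token.actions.githubusercontent.com",  # assumed issuer
)
# Print a shell-safe version that can be copied into a terminal.
print(shlex.join(command))
```

Building the argument list separately from running it keeps the example free of side effects; passing the list to subprocess.run would execute the check on a machine where cosign is installed.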

New Product-Specific Features

We have also added new product-specific features such as:

Airflow: KubernetesExecutor-run jobs for better resource management without the need for a queue component
HBase: Hadoop native compression for better performance
HBase: inclusion of operator tools for administration, analysis, and cluster debugging
HDFS: support for FUSE to allow HDFS to be mounted as a standard file system using the mount command
Hive: updated PostgreSQL driver to support SCRAM authentication, so that newer PostgreSQL versions no longer need MD5 password encryption to be enabled explicitly
Spark: all product images now contain PySpark, thus harmonizing the images
Trino: support for the new OPA authorizer in preparation for upstream integration
Vector: upgrade to 0.33.0 for security and bug fixes
All Java-based products: overridable Java security settings. For JVM-based products (i.e. Druid, HBase, HDFS, Hive, Kafka, NiFi, Spark, Trino and ZooKeeper) it is now possible to provide custom security settings that override the default values. This allows controlling settings such as the DNS lookup cache lifetime.
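To illustrate the kind of override the last item refers to, the sketch below renders two standard JVM security properties that control DNS caching, networkaddress.cache.ttl and networkaddress.cache.negative.ttl, in java.security key=value form. The helper name and the values are illustrative assumptions; how the overrides are handed to a particular Stackable product is described in the operator documentation and not shown here.

```python
def render_security_properties(overrides: dict[str, str]) -> str:
    """Render overrides as java.security-style key=value lines."""
    # Sort keys so the output is stable and easy to diff.
    lines = [f"{key}={value}" for key, value in sorted(overrides.items())]
    return "\n".join(lines) + "\n"


# Illustrative values: cache successful lookups for 30 seconds,
# do not cache failed lookups at all.
dns_overrides = {
    "networkaddress.cache.ttl": "30",
    "networkaddress.cache.negative.ttl": "0",
}

print(render_security_properties(dns_overrides), end="")
```

Short cache lifetimes like these are commonly chosen in Kubernetes environments, where pod IP addresses change when pods are rescheduled.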

New Product Versions

The following new product versions are now supported:

Product	New version(s)	What’s new?
Airflow	2.6.3, 2.7.2	Bug fix releases.
Druid	27.0.0	Focuses on stability and scaling improvements, introducing Smart Segment Loading for managing data files as the database scales, improved schema auto-discovery, and a new feature for querying from deep storage.
HBase	2.4.17	Latest patch release in the HBase 2.4.x line.
HDFS	3.2.4, 3.3.6	Many major and minor improvements; please see the official change log.
Kafka	2.8.2, 3.4.1, 3.5.1	Bug fixes and better handling of offset synchronization at startup and during task commits for Mirror Maker 2 (MM2), reducing unnecessary RPC calls, and managing frequent rebalances in MM2.
NiFi	1.23.2	Corrected repository corruption related to handling empty FlowFiles.
OpenPolicyAgent	0.57.0	Updated Rego syntax to allow general references in rule heads, plus a mix of new features and bug fixes.
Spark	3.4.1, 3.5.0	Adds new PySpark and SQL functionality such as the SQL IDENTIFIER clause, named argument support, HyperLogLog approximate aggregations, and Python user-defined table functions, while streamlining distributed training with DeepSpeed and introducing watermark propagation and dropDuplicatesWithinWatermark operations in Structured Streaming.
Superset	2.1.1, 3.0.1	Latest patch release for the Superset 2.x lineage. Apache Superset 3.0 improves three key areas: enhancing the developer experience by simplifying maintenance and testing, improving codebase maintainability by removing outdated features and reducing complexity, and streamlining the product through diligent review and refactoring for better efficiency and performance. Get an overview here.
Trino	428	Focuses on reducing memory usage for queries involving GROUP BY clauses and simplifying writer count configuration. It also introduces enhancements for various connectors like Delta Lake, Hive, Hudi, and Iceberg, such as reducing the number of read requests for scanning small Parquet files and introducing the parquet.small-file-threshold configuration property.
ZooKeeper	3.8.3	Security and bug fixes.

One More Thing…

As a final note, SDP 23.11 introduces an early, experimental preview of Stackable Cockpit, a browser-based management tool that interacts with the Stackable Data Platform to display, for example, deployed stacklets and their status.

We would appreciate your comments and feedback on it!

Learning Stackable

Further details on our release and how to upgrade can be found in our release notes as well as in the change logs of the individual operators:

Airflow, Druid, HBase, HDFS, Kafka, NiFi, OpenPolicyAgent, Spark, Superset, Trino, ZooKeeper