What Hadoop users need to know about our platform


Why does Stackable make the ideal choice for your modern data platform?

Hadoop was first created in 2005, and as this adolescent technology rapidly approaches adulthood we find ourselves wondering what’s next on its life journey. Many folks have sounded the death knell for Hadoop over the years, but in 2022 we still find it running at the heart of enterprises both big and small. Hadoop is here; the question is whether it’s here to stay and, if not, what comes next. Stackable has built the Stackable Data Platform (SDP) with this question in mind, and it suits both those who want to remain with Hadoop and those looking for an off-ramp to other data technologies.

Historical Hadoop

Hadoop offered enterprises the ability to create their own data platform by choosing the components they wanted to use based on their requirements. Hortonworks and Cloudera shared most of the success, building their own distributions composed mostly of open source software with proprietary, licensable features on top. Looking back on these Hadoop distributions now, it’s easy to view them as relics of the past, with dedicated enterprise consoles and a monolithic software deployment model. This is a far cry from the world we now live in, one of fast-paced CI/CD, GitOps, containers and cloud, which leaves the extant Hadoop distributions looking tired and difficult to manage compared to modern alternatives. Add to that the paywalls erected around these ostensibly free products and you find Hadoop existing increasingly as a technology island in a data ocean.

Stackable Data Platform offers an alternative to the commercial Hadoop distributions. One very important point to note, however, is this: SDP is not a Hadoop distribution. While Stackable does ship Hadoop components including HDFS and HBase, you can just as readily use SDP to build your data platform with absolutely no Hadoop in it at all. Unlike a Hadoop distribution, SDP doesn’t base itself around the core Hadoop technologies, instead shipping a set of data components based on Kubernetes that allow you to define your data infrastructure as code. This moves firmly away from the enterprise management console approach of Ambari and Cloudera Manager and towards a model that fits much better with modern DevOps and CI/CD methodologies. In short, SDP offers a way to build a data platform tailored to suit your needs that can include Hadoop but doesn’t have to.

This puts Stackable Data Platform in the unique position of providing a platform that is compatible with existing installations while also offering alternatives to shape the modernisation or replacement of existing Hadoop clusters. Whether you want to maintain your Hadoop presence for the next 5 years or you’re looking for an exit strategy, SDP provides a viable and supported choice. What you choose to do depends largely on where you are as a Hadoop user, so I’ll describe a few places that you might find yourself in and explain how Stackable can help you move forward.

You still love Hadoop

You have invested a lot of time and effort into your Hadoop platform. It runs critical workloads and you have millions of lines of code and petabytes of data to maintain. Right now the friction of moving to a new platform is too high, but you need a mid- to long-term strategy to modernise your data platform. The Hadoop version you’re running is getting stale, and the choices for upgrade are limited, difficult and expensive. If you stay where you are, however, you are stuck on an end-of-life product with no maintenance or security updates.

Stackable Data Platform offers you a way to revitalise your Hadoop infrastructure by upgrading to the latest version of Hadoop. You can keep using the same technologies to minimise the changes required to your applications while modernising the way your code is deployed. Defining infrastructure as code makes it easier to deploy test and development environments and keep them consistent with your production platform. You can start to integrate your Hadoop data infrastructure better with other IT functions in your organisation, removing the operational silos created by proprietary management tools. SDP is lightweight and lower cost compared to existing Hadoop distributions, and offers in-place upgrades as well as sidecar migration.
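
To make the infrastructure-as-code point a little more concrete, here is a minimal Python sketch. It is not Stackable’s API: real SDP deployments are described declaratively for Kubernetes, and the field names, the placeholder values and the helpers `derive_environment` and `drift` are all hypothetical. The idea it illustrates is that a test environment is derived from the production definition through a small set of explicit overrides, so everything you did not deliberately change stays identical.

```python
import copy
import json

# Hypothetical, simplified cluster definition. The field names and values
# are placeholders for illustration and are not the Stackable schema.
PRODUCTION = {
    "name": "hdfs-prod",
    "hadoop_version": "3.3.4",
    "data_nodes": 12,
    "config": {"dfs.replication": 3},
}


def derive_environment(base, overrides):
    """Return a deep copy of `base` with top-level `overrides` applied."""
    env = copy.deepcopy(base)
    env.update(overrides)
    return env


def drift(base, env):
    """List the top-level keys whose values differ between two definitions."""
    return sorted(k for k in base.keys() | env.keys() if base.get(k) != env.get(k))


# The test environment only changes its name and size; version and
# configuration are inherited from production unchanged.
TEST = derive_environment(PRODUCTION, {"name": "hdfs-test", "data_nodes": 3})

if __name__ == "__main__":
    print(json.dumps(TEST, indent=2))
    print("differs from production in:", drift(PRODUCTION, TEST))
```

Because both definitions live in version control, a check like `drift` could run in a CI pipeline to flag any unintended difference between environments before it reaches production.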

You’ve fallen out of love with Hadoop

You feel that Hadoop has had its day and you want to move on with your life. The paradigms that originally sold you on Hadoop no longer bear much weight: you find the platform inflexible and costly to manage, and you can’t attract talent with the necessary skills. You’re left with a data technology silo and you want to migrate to more modern (and dare I say popular) technologies supported more broadly by your IT organisation. You want to move off your existing Hadoop clusters, as sticking with your current vendor means a costly maintenance contract with a company that isn’t invested in helping you off-board from its product.

Where Stackable Data Platform excels here is in supporting Hadoop but offering much more besides. Stackable understands Hadoop inside and out but is not devoted to keeping you on this technology, meaning they are well suited to helping you design and build your modern data platform and migrate away from your existing one. You don’t have to lose everything you’ve built; maybe you want to keep the best parts of your existing platform and run them in the cloud or on premises with compute and storage decoupled. Choosing your ideal data technology stack is at the heart of Stackable Data Platform.

You like Hadoop but want an open relationship

You’re a Hadoop user but one size does not fit all in the Big Data world and most organisations adopt multiple data technologies. Hadoop is not central to your data strategy but you want to make use of it alongside all of the other data sources at your disposal. You’re excited about building a Data Mesh architecture to tie together all your disparate data sources and want to do this in a consistent and well managed way.

Stackable’s expertise in big data and Hadoop means they can guide you on your journey across the whole data landscape. Stackable Data Platform fully supports your Data Mesh architectures, offering tools like Trino for running federated queries across multiple data catalogues and built-in support for defining data sources through infrastructure as code. You can deploy SDP with your data sources pre-configured, enabling short-lived and ephemeral deployments and saving costs compared with always-on, always-deployed clusters.


Wherever you find yourself on your big data journey, Stackable has something for you. Get your data platform back on course with a supported, maintainable and modern Big Data distribution and gain better control with full support for infrastructure as code and CI/CD pipelines. Stackable deploys the best open source Big Data software with the flexibility and power of Kubernetes. The Stackable Kubernetes operators are open source and always will be, ensuring that you can use SDP however you wish and wherever you wish. You can deploy on premises and in any cloud you like, not just with hyperscale cloud vendors, and build a truly open, hybrid Big Data platform with absolutely no vendor lock-in.
