Jim Halfpenny, Author at Stackable

Stackable and Trino Part 3: Migrating Hive Tables Using CTAS

Posted by Jim Halfpenny on 3 February 2023 | Featured

Using CREATE TABLE AS SELECT (CTAS) SQL statements is a well-established method of copying and transforming structured data. Trino’s inherent ability to manipulate data from many different sources, sinks and formats makes this a particularly effective way to move data … Read More

Tags: Stackable

Stackable and Trino Part 2: Setting up a Hive migration sandbox

Posted by Jim Halfpenny on 10 January 2023 | Featured

In Part 1 of this blog series I looked at how Stackable and Trino fit in with the wider Apache Hadoop SQL ecosystem and especially with Apache Hive to provide a means of smoothing the migration away from Hive. Now … Read More

Tags: Stackable

Stackable and Trino Part 1: A Rosetta Stone for Apache Hive

Posted by Jim Halfpenny on 12 December 2022 | Featured

Apache Hive is a common feature in many Hadoop deployments and it’s not unusual to find Hadoop clusters where the primary use case boils down to SQL queries on structured data. Hive has strong roots in the Hadoop ecosystem and … Read More

Tags: Stackable

The Stackable Docathon: building a data pipeline

Posted by Jim Halfpenny on 30 May 2022 | Featured

Last month we ran our first ever Documentation-Hackathon – or “Docathon” – at Stackable. The result is a guide showing how to build a simple data pipeline which can be found here: As anyone who has been involved in software … Read More

Tags: Stackable

What Hadoop users need to know about our platform

Posted by Jim Halfpenny on 20 April 2022 | Featured

Why does Stackable make the ideal choice for your modern data platform? Hadoop was first created in 2005, and as this adolescent technology rapidly approaches adulthood we find ourselves wondering what’s next on its life journey. Many folks have sounded … Read More

Tags: Stackable

A Brief History of Open Source Big Data Distributions

Posted by Jim Halfpenny on 16 September 2021 | Featured

This blog post is based on a lecture at Berlin Buzzwords by Lars Francke and Sönke Liebau on June 15th, 2021. You can find the full version of the lecture on YouTube. If large amounts of data are to be stored, … Read More

Tags: Stackable

Building a New Big Data Distribution Based on Kubernetes – With a Twist!

Posted by Jim Halfpenny on 19 August 2021 | Featured

This blog post is based on the presentation to Berlin Buzzwords by Lars Francke and Sönke Liebau on 2021-06-15. You can watch the full version of the talk on YouTube. A brief history of open source big data distributions If … Read More

Tags: Stackable
