Handling data governance during a platform migration means explicitly mapping, translating, and validating every policy, access rule, and lineage dependency before you cut over, not after. Governance controls that worked in your old platform rarely transfer automatically; they need to be rebuilt or re-expressed in terms the new platform understands. The questions below walk through the most common failure points and how to address them systematically.
Many of the teams we work with run into exactly this when moving to a Kubernetes-native data platform: governance continuity is frequently the last thing planned and the first thing that breaks. By the end of this article, you’ll know how the Stackable Data Platform (SDP) addresses each of these challenges.
What breaks in data governance when you migrate platforms?
When you migrate platforms, the most common governance failures are access control gaps, broken data lineage, orphaned audit logs, and policy mismatches between environments. These failures happen because governance in most platforms is tightly coupled to the platform’s internal mechanisms – role definitions, metadata schemas, and audit hooks that simply do not exist in the target system until you explicitly recreate them.
Access control is usually the first to break. Role-based permissions defined in one system rarely map one-to-one to another. A role called “data_analyst” in your legacy warehouse may carry implicit privileges that are not documented anywhere and only surface when someone gets an unexpected denial – or unexpected access – in the new environment.
Data lineage is the second casualty. Lineage graphs are built from metadata emitted by specific tools. When you change tools, that metadata stops flowing. Unless you have a lineage layer that is independent of the underlying execution engine, you end up with a gap in the graph that makes compliance audits painful.
Audit logs are often overlooked entirely. Most organizations assume they can export historical logs before decommissioning the old platform. In practice, log formats differ, retention periods conflict with compliance requirements, and the new platform’s audit schema may not accept the old format without transformation.
How do you map existing governance policies before migrating?
Mapping existing governance policies before a migration requires extracting every policy definition, access rule, and data classification from the current platform in a format that is independent of that platform’s implementation. The goal is a governance inventory that describes what is controlled, not how the current system controls it.
Start with access control. Export all role definitions, group memberships, and object-level permissions. For each permission, document the business justification, not just the technical rule. This matters because the new platform may model the same control differently, and without the business intent, you cannot make good translation decisions.
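One way to surface undocumented, implicit privileges during the export is to flatten the role hierarchy and compare what each role is granted directly with what it holds in effect. The following is a minimal sketch, assuming the export can be reduced to two plain dictionaries; the names `role_grants` and `role_parents`, and the sample roles and tables, are hypothetical and do not correspond to any particular platform's export format.

```python
def effective_permissions(role_grants, role_parents):
    """Return {role: set of (object, action)} with inherited grants included."""
    resolved = {}

    def resolve(role, path):
        if role in resolved:
            return resolved[role]
        if role in path:
            raise ValueError(f"cycle in role hierarchy at {role!r}")
        perms = set(role_grants.get(role, ()))
        for parent in role_parents.get(role, ()):
            perms |= resolve(parent, path | {role})
        resolved[role] = perms
        return perms

    for role in set(role_grants) | set(role_parents):
        resolve(role, frozenset())
    return resolved


def implicit_permissions(role_grants, role_parents):
    """Return only the permissions a role holds through inheritance."""
    implicit = {}
    for role, perms in effective_permissions(role_grants, role_parents).items():
        inherited = perms - set(role_grants.get(role, ()))
        if inherited:
            implicit[role] = inherited
    return implicit


# Hypothetical export: data_analyst inherits from reader.
role_grants = {
    "reader": {("sales.orders", "SELECT")},
    "data_analyst": {("sales.summary", "SELECT")},
}
role_parents = {"data_analyst": ["reader"]}

for role, perms in implicit_permissions(role_grants, role_parents).items():
    print(role, sorted(perms))
```

Each permission this reports is a candidate for a documented business justification: either it is intended and should be granted explicitly on the new platform, or it is an accident of the old hierarchy and should not be carried over.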
Next, catalog your data classifications. Which datasets are tagged as sensitive, regulated, or confidential? These classifications need to survive the migration and be re-applied in the new platform’s metadata layer from day one, not retrofitted later.
Then map your policy dependencies. Some policies only make sense in combination – a masking rule that applies only when a user lacks a specific role, for example. Document these dependencies explicitly. A flat export of individual rules will miss them.
Finally, identify which policies are enforced by the platform versus which are enforced upstream by your data pipeline or downstream by your BI layer. A platform migration only carries over the policies that the platform itself enforces. The others need their own migration plans.
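The inventory described above can be kept as structured records rather than a flat rule export. The sketch below is one possible shape, not a prescribed schema: the `PolicyRecord` fields, the enforcement labels (`"platform"`, `"pipeline"`, `"bi"`), and the sample policies are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRecord:
    policy_id: str
    controls: str        # what is controlled, in platform-neutral terms
    justification: str   # the business intent behind the rule
    classification: str  # e.g. "sensitive", "regulated", "confidential"
    enforced_by: str     # "platform", "pipeline", or "bi"
    depends_on: tuple = ()  # ids of policies this one only makes sense with


def unresolved_dependencies(inventory):
    """Map each policy id to the dependencies missing from the inventory."""
    known = {record.policy_id for record in inventory}
    gaps = {}
    for record in inventory:
        missing = sorted(set(record.depends_on) - known)
        if missing:
            gaps[record.policy_id] = missing
    return gaps


def split_by_enforcement(inventory):
    """Group policy ids by the layer that enforces them."""
    groups = {}
    for record in inventory:
        groups.setdefault(record.enforced_by, []).append(record.policy_id)
    return groups


# Hypothetical inventory entries.
inventory = [
    PolicyRecord("mask-email", "Customer email is masked for most users",
                 "Privacy requirement", "regulated", "platform",
                 depends_on=("role-pii-reader",)),
    PolicyRecord("role-pii-reader", "Role that may see unmasked PII",
                 "Support team needs contact details", "regulated", "platform"),
    PolicyRecord("bi-row-filter", "Dashboards show only the viewer's region",
                 "Regional data separation", "confidential", "bi",
                 depends_on=("region-tags",)),
]

print(unresolved_dependencies(inventory))
print(split_by_enforcement(inventory))
```

Anything listed under a layer other than the platform needs its own migration plan, and any unresolved dependency points to a rule that a flat export would have silently dropped.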
How can policies as code preserve governance across environments?
Policies as code preserve governance across environments by expressing access rules, data classifications, and compliance controls as version-controlled, executable definitions that are not tied to any single platform’s UI or internal configuration format. When your policies are code, they travel with your infrastructure and can be applied consistently whether you are running on-premises, in the cloud, or in a hybrid setup.
The practical mechanism is declarative policy definitions stored in a repository alongside your infrastructure definitions. Tools like Open Policy Agent (OPA) allow you to write authorization logic that can be evaluated independently of the data platform itself. This means the same policy file that governs access in your staging environment governs access in production, and the same file that worked before the migration works after it, provided the new platform passes the policy engine the same input attributes (such as user, action, and resource) that the old one did.
Policies as code also make migration testing tractable. You can run your policy suite against the new platform before cutover and compare the authorization outcomes against the old platform. Discrepancies become visible as test failures rather than production incidents.
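A comparison like this can be as simple as replaying the same access requests against both environments and listing every request where the decisions differ. The sketch below assumes each platform can be wrapped in a function that takes a request and returns allow or deny; `AccessRequest`, `diff_decisions`, and the in-memory stand-in `decider_from_grants` are hypothetical names. In practice the two deciders would query the respective platforms or policy engines.

```python
from typing import Callable, Iterable, List, NamedTuple, Tuple


class AccessRequest(NamedTuple):
    user: str
    action: str
    resource: str


Decider = Callable[[AccessRequest], bool]


def diff_decisions(
    requests: Iterable[AccessRequest],
    old_decide: Decider,
    new_decide: Decider,
) -> List[Tuple[AccessRequest, bool, bool]]:
    """Return (request, old_decision, new_decision) for every disagreement."""
    mismatches = []
    for request in requests:
        old, new = old_decide(request), new_decide(request)
        if old != new:
            mismatches.append((request, old, new))
    return mismatches


def decider_from_grants(grants: Iterable[AccessRequest]) -> Decider:
    """Stand-in decider for illustration: allow exactly the listed requests."""
    allowed = set(grants)
    return lambda request: request in allowed


# Hypothetical data: bob's grant was lost in translation.
alice = AccessRequest("alice", "read", "sales.orders")
bob = AccessRequest("bob", "read", "sales.orders")
carol = AccessRequest("carol", "read", "sales.orders")

old_platform = decider_from_grants([alice, bob])
new_platform = decider_from_grants([alice])

for request, old, new in diff_decisions([alice, bob, carol], old_platform, new_platform):
    print(f"{request}: old={old} new={new}")
```

Run as part of a test suite, an empty result means the two environments agree on every sampled request; it says nothing about requests that were not sampled, so the request list should be drawn from real access patterns.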
There is a real discipline cost here. Policies as code only work if the team treats policy files with the same rigor as application code – code review, testing, and change history. If policies drift back into being managed through a UI, you lose the reproducibility benefit quickly.
What’s the difference between governance in proprietary versus open-source platforms?
The key difference is where governance is implemented and who controls it. In proprietary platforms, governance features are built into the product and managed through vendor-controlled interfaces. In open-source platforms, governance is typically assembled from composable components that you integrate and configure yourself, giving you more control but also more responsibility.
Proprietary platforms often provide governance as a bundled, integrated feature set – access control, lineage, and auditing are part of the product. This makes initial setup faster, but it also means your governance model is constrained by what the vendor has chosen to support. When you migrate away, you often discover that governance configurations are stored in proprietary formats with no clean export path.
Open-source platforms approach governance differently. Access control might be handled by Apache Ranger or OPA, lineage by OpenLineage, and auditing by a separate log aggregation layer. Each component is independently configurable and replaceable. This composability is what enables data sovereignty – you are not dependent on a single vendor’s roadmap or pricing decisions to maintain compliant operations.
The trade-off is integration work. In a proprietary platform, the governance components talk to each other because the vendor built them to. In an open-source stack, you are responsible for ensuring that your lineage collector is actually receiving events from your query engine, and that your access control layer is enforced at every entry point. This is not a reason to avoid open-source governance – it is a reason to plan it carefully and treat it as infrastructure, not an afterthought.
How do you maintain data lineage continuity during a migration?
Maintaining data lineage continuity during a migration requires running lineage collection in both the old and new environments simultaneously during the transition period, using a lineage standard that both platforms can emit. The OpenLineage specification is the most practical choice for this because it is supported by a growing number of query engines and orchestration tools.
The transition period is where most lineage gaps occur. If you decommission the old platform before the new one is fully emitting lineage events, you get a break in the graph. For compliance purposes, that break can be as problematic as having no lineage at all, because auditors need to trace data flows across the full history of a dataset.
A few concrete steps help here. First, deploy your lineage backend – whether that is Marquez, DataHub, or another OpenLineage-compatible catalog – before you start the migration. Get it collecting from the current environment so you have a baseline. Second, configure lineage emission on the new platform and verify that events are flowing before you migrate any production workloads. Third, document the migration cutover as an explicit lineage event so the graph reflects the transition rather than showing a gap.
For pipelines that span both environments during the transition – where a job on the old platform feeds a table that a job on the new platform reads – you need to ensure that both jobs are emitting lineage to the same backend with compatible dataset identifiers. Mismatched dataset naming conventions between platforms are a common and avoidable source of lineage breaks.
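OpenLineage identifies a dataset by a namespace and a name, so both platforms need to agree on both parts. A minimal sketch of a shared normalization step follows, assuming a team-defined convention of lowercase, dot-separated names under one logical namespace; the function names, the platform labels, and the `"analytics"` namespace are hypothetical, and lowercasing is only safe where identifiers are case-insensitive.

```python
def canonical_dataset(platform, raw_name, namespace_map):
    """Map a platform-specific dataset name to a shared (namespace, name)."""
    namespace = namespace_map[platform]
    # Treat "/" and "." as separators, drop quoting, and fold case.
    parts = [p.strip().strip('"').lower() for p in raw_name.replace("/", ".").split(".")]
    return namespace, ".".join(p for p in parts if p)


def unmatched_datasets(old_ids, new_ids):
    """Return identifiers seen on only one side: (old_only, new_only)."""
    old_ids, new_ids = set(old_ids), set(new_ids)
    return sorted(old_ids - new_ids), sorted(new_ids - old_ids)


# Hypothetical convention: both platforms report into one logical namespace.
namespace_map = {"legacy_warehouse": "analytics", "new_platform": "analytics"}

old_id = canonical_dataset("legacy_warehouse", 'SALES."PUBLIC".ORDERS', namespace_map)
new_id = canonical_dataset("new_platform", "sales/public/orders", namespace_map)
print(old_id == new_id)
```

Comparing the normalized identifiers from both environments with `unmatched_datasets` before cutover shows which datasets would appear as disconnected nodes in the lineage graph.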
When should governance validation happen in a migration timeline?
Governance validation should happen at three points in a migration timeline: before migration begins (baseline validation), immediately after the new environment is configured but before data moves (pre-cutover validation), and after cutover with production traffic running (post-cutover validation). Treating governance validation as a single end-of-project activity is the most reliable way to discover problems after they have already caused compliance exposure.
Baseline validation establishes what governance actually looks like in the current environment, which is often different from what documentation says. Run access control audits, verify lineage completeness, and confirm that audit logging is working. This gives you a target state to validate against.
Pre-cutover validation checks that the new platform enforces the same access rules and that lineage collection is active. This is where policies as code pay off – you can run automated tests against the new environment before any production data arrives. Specifically, test that users who should be denied access are denied, not just that users who should have access can reach data.
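The emphasis on denials can be built into the test harness by recording the expected decision for each case and reporting unexpected allows separately from unexpected denials. This is a sketch under stated assumptions: `check_expectations` is a hypothetical helper, requests are plain `(user, action, resource)` tuples, and `new_platform_decide` is a stand-in for whatever call actually asks the new environment for a decision.

```python
def check_expectations(cases, decide):
    """Run (request, expected_allow) cases; return (false_allows, false_denies)."""
    false_allows, false_denies = [], []
    for request, expected_allow in cases:
        actual = decide(request)
        if actual and not expected_allow:
            false_allows.append(request)   # access granted that should be denied
        elif expected_allow and not actual:
            false_denies.append(request)   # access denied that should be granted
    return false_allows, false_denies


def new_platform_decide(request):
    """Stand-in for the new platform, with an overly broad 'public.' rule."""
    user, action, resource = request
    grants = {("alice", "read", "finance.payroll")}
    return request in grants or resource.startswith("public.")


cases = [
    (("alice", "read", "finance.payroll"), True),
    (("bob", "read", "finance.payroll"), False),
    (("bob", "write", "public.reference"), False),
]

false_allows, false_denies = check_expectations(cases, new_platform_decide)
print("unexpected allows:", false_allows)
print("unexpected denials:", false_denies)
```

Unexpected denials tend to be reported by users within hours; unexpected allows usually are not reported at all, which is why they deserve their own list and a hard failure in the pre-cutover suite.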
Post-cutover validation confirms that governance holds under real production conditions. Access patterns in production often differ from what test cases cover. Monitor access logs for anomalies in the first weeks after cutover and run a lineage completeness check against a sample of your most critical datasets.
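A lineage completeness check of this kind can be reduced to a set comparison: which critical datasets have had at least one lineage event recorded since cutover. The sketch below assumes lineage events have already been pulled from the lineage backend as `(dataset_name, emitted_at)` pairs; `lineage_coverage`, the dataset names, and the dates are hypothetical.

```python
from datetime import datetime, timezone


def lineage_coverage(critical_datasets, events, cutover):
    """Return (coverage_ratio, missing) for events emitted at or after cutover."""
    critical = set(critical_datasets)
    seen = {dataset for dataset, emitted_at in events if emitted_at >= cutover}
    missing = sorted(critical - seen)
    if not critical:
        return 1.0, missing
    return (len(critical) - len(missing)) / len(critical), missing


# Hypothetical sample data.
cutover = datetime(2024, 1, 15, tzinfo=timezone.utc)
events = [
    ("sales.orders", datetime(2024, 1, 16, tzinfo=timezone.utc)),
    ("sales.customers", datetime(2024, 1, 10, tzinfo=timezone.utc)),  # pre-cutover only
    ("finance.ledger", datetime(2024, 1, 17, tzinfo=timezone.utc)),
]

coverage, missing = lineage_coverage(["sales.orders", "sales.customers"], events, cutover)
print(f"coverage={coverage:.0%} missing={missing}")
```

A dataset that appears in `missing` has either not been written or read since cutover, or is being processed by a job that is not emitting lineage; the second case is the one to investigate.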
One thing worth stating directly: governance validation requires dedicated time in the migration plan. It is not something that happens automatically alongside technical testing. If your migration timeline does not include explicit governance validation milestones, they will be skipped when the schedule gets tight.
How Stackable helps with data governance during migration
The SDP is built around the premise that governance should be expressed as configuration, not managed through a UI that locks you into a single vendor’s model. This makes it a practical fit for organizations that need to migrate without losing governance continuity.
- Policies as code via OPA integration: The SDP supports Open Policy Agent for authorization across the platform. Access rules are defined declaratively, version-controlled, and applied consistently across environments – on-premises, cloud, or hybrid.
- OpenLineage-compatible lineage collection: Tools in the SDP emit lineage events using open standards, so your lineage graph continues to build through a migration rather than starting from scratch on the new platform.
- Kubernetes-native configuration management: All platform components, including access control and audit settings, are defined as Kubernetes custom resources. This means governance configuration is reproducible, diffable, and deployable through standard GitOps workflows.
- No vendor lock-in on governance metadata: Because the SDP uses open standards throughout, your governance metadata – policies, lineage, audit logs – is not trapped in a proprietary format. You own it and can move it.
- Data sovereignty by design: The SDP is 100% open source and cloud-agnostic, which means you can run the full governance stack in your own infrastructure without routing sensitive metadata through a third-party cloud service.
If you are planning a platform migration and want to understand how the SDP handles governance in your specific environment, reach out to the Stackable team to discuss your requirements.