Who We Are

About Stripe

Stripe is a financial infrastructure platform for businesses. Millions of companies — from the world's largest enterprises to the most ambitious startups — use Stripe to accept payments, grow their revenue, and accelerate new business opportunities. Our mission is to increase the GDP of the internet, and we have a staggering amount of work ahead. That means you have an unprecedented opportunity to put the global economy within everyone's reach while doing the most important work of your career.

About the Team

The Datalake team builds and maintains Stripe's foundational data access and governance infrastructure — the paved path for safe, fast, and compliant access to Stripe's critical big data assets. We serve developers, data engineers, analysts, ML and AI teams, security teams, and business users across the company. The team is in the middle of a significant architectural transition as Stripe grows. We are making Stripe's data lake a first-class citizen of the modern data ecosystem to support our growing scale and diverse workloads.

What Makes This Role Compelling

<ul>
<li>Foundational infrastructure with broad reach: The Datalake team's systems sit in the critical path of nearly every data workload at Stripe. Decisions affect petabytes of data, hundreds of production pipelines, and every engineering team that builds on Stripe's data lake.</li>
<li>Active, high-stakes, OSS-aligned architectural transformation: You will lead a multi-year migration to modern, open-source solutions like the Apache Iceberg REST Catalog.
This is a technically deep project involving critical architectural choices at each step, from API design and compute engine integration to authorization models, where your opinions and technical influence will directly shape how the platform engages with the broader data infrastructure ecosystem.</li>
<li>Storage platform ownership with room to define the approach: The team owns the object storage abstraction layer — access control, IAM policy design, lifecycle management, and compliance architecture — but the how is still being written. You'll shape how hundreds of engineering teams interact with petabytes of data, and the decisions you make will stick.</li>
<li>At Stripe, you'll have the scale of a large company and the agency to influence technical strategy and the roadmap.</li>
</ul>

Responsibilities

<ul>
<li>Architect the unified Iceberg platform: Lead the technical design of a metastore service as it becomes the single source of truth for Iceberg table management across all compute engines — Spark, Trino, Flink, and PyIceberg. Define the API contracts, authorization model, per-table credential vending, and integration patterns that every data pipeline at the company will depend on.</li>
<li>Own the metastore migration strategy: Drive the sequencing, backward compatibility story, rollback approach, and cross-team coordination for migrating all remaining Hive Metastore-backed workloads to the new platform. This means coordinating with dozens of consuming teams while keeping production data infrastructure operational at all times.</li>
<li>Shape the object storage abstraction: Define the storage abstraction layer — including bucket provisioning, access control policy design, and the developer-facing client libraries that make object storage ergonomic and secure by default.
The goal is an abstraction layer that consuming teams can adopt without needing to become storage infrastructure experts themselves.</li>
<li>Lead compliance architecture: Partner with security and compliance teams to translate regulatory requirements into durable preventative technical controls — audit logging, access review infrastructure, data segregation, and lifecycle enforcement — built into the platform rather than bolted on.</li>
<li>Drive cost and efficiency at petabyte scale: Identify systemic inefficiencies in storage layout, snapshot retention, and data lifecycle, and design automated, self-service tooling that scales without ongoing manual intervention from the team.</li>
<li>Set the technical bar: Own critical design reviews, establish standards for reliability, security, and developer experience, and mentor senior engineers through high-stakes architectural decisions. Provide the technical judgment that keeps the platform moving fast without accumulating structural debt.</li>
</ul>

Who You Are

Minimum requirements

<ul>
<li>10+ years of professional software engineering experience.</li>
<li>Demonstrated track record of designing, building, and operating large-scale distributed storage or data infrastructure systems.</li>
<li>Deep experience with object storage (S3, Azure Blob, or equivalent) — including IAM, access control policy design, lifecycle management, and operational practices at petabyte scale.</li>
<li>Proven ability to lead complex, multi-quarter infrastructure projects end-to-end, including cross-team dependency management and coordinating migrations across many consuming teams.</li>
<li>Strong background in authorization and access control design for distributed data systems.</li>
</ul>

Preferred requirements

<ul>
<li>Deep expertise in Apache Iceberg — table format internals, the REST Catalog specification, snapshot lifecycle management, compaction, and compute engine integration (Spark, Trino, Flink, PyIceberg).</li>
<li>Background in
compliance-sensitive infrastructure — SOX, ICFR, or equivalent regulatory frameworks — with an understanding of how audit and access review requirements translate into preventative technical controls.</li>
<li>Experience safely executing large-scale data migrations with a strong instinct for sequencing, blast radius reduction, rollback, and data integrity validation.</li>
<li>A strong developer experience sensibility: the ability to build abstractions that are ergonomic, well-documented, and actively reduce toil for the engineering teams that depend on your platform.</li>
</ul>

Staff Engineer, Datalake Platform

Job Description