
Posted 1 week ago

Docker

Staff Software Engineer Infrastructure

Compensation

Technology stipend equivalent to $100 USD net/month; 16 weeks of paid parental leave (after 6 months); training stipend for conferences/courses/classes; equity (growing startup).

  • Equity; all employees have a share in the success of the company
  • Freedom & flexibility; fit your work around your life
  • Designated quarterly Whaleness Days plus an end-of-year Whaleness break
  • Home office setup, so you are comfortable while you work
  • 16 weeks of paid parental leave (after 6 months of employment)
  • Technology stipend equivalent to $100 USD net/month
  • PTO plan that encourages you to take time to do the things you enjoy
  • Training stipend for conferences, courses and classes
  • Docker Swag
  • Medical benefits, retirement and holidays vary by country
  • Remote-first culture, with offices in Seattle and Paris

Tech stack

Go, Linux, Kubernetes, Amazon EKS, Terraform, Argo CD, Grafana Cloud, Grafana, Prometheus, OpenTelemetry, Envoy Gateway
Ingress, CNI, service mesh, CI/CD, GitHub Actions, canaries, progressive delivery, migrations, adoption programs

Location

Remote

Work setup

full-time
Senior
Remote (remote-first culture). On-call rotation after onboarding and shadowing.
Docker considers visa sponsorship on a case-by-case basis based on business needs.

Role details

  • Take ambiguous infrastructure problems and turn them into proposals the org can rally around, then drive them through RFCs and architecture reviews across teams.
  • Design self-service capabilities and platform APIs (primarily in Go) for onboarding, provisioning, deployment, observability defaults, and day-2 operations, with contracts and docs teams actually use.
  • Set delivery standards using Terraform, GitOps with Argo CD, progressive rollout, and good testing, including building the continuous-deployment flow we're missing today.
  • Evolve the multi-tenant EKS foundations toward better reliability, security, scale, and cost: Envoy Gateway ingress, traffic routing, and the multi-region, cross-account connectivity we need.
  • Improve SLOs, alerting, and incident follow-up on Grafana Cloud so production gets safer and less dependent on heroics.
  • Help shape AI-assisted and agentic workflows to cut operational toil while staying safe, auditable, and human-reviewed. Early targets include:
      • Alert enrichment and incident context-gathering to assemble relevant signals, history, and runbooks.
      • Runbook-assisted diagnosis and remediation recommendations with a human in the loop on anything that changes production.
      • Onboarding and readiness assistants that answer questions experts answer today.
  • Join the on-call rotation after onboarding and shadowing; improve the health of on-call with better alerts, stronger runbooks, less toil, and blameless postmortems aimed at prevention.
  • 8+ years of professional, hands-on, full-time software engineering experience in backend, infrastructure, or platform engineering.
  • Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience.
  • Strong software engineering in Go or a similar language: design, testing, debugging, review, long-term maintainability.
  • Track record designing, shipping, and operating cloud services or infrastructure platforms in production.
  • Deep expertise in at least one of: Kubernetes, networking, cloud platforms, reliability engineering, or developer platforms, plus solid Linux, networking, and production-ops fundamentals.
  • Experience setting technical direction and leading work that needs cross-team alignment.
  • Clear written and verbal communication in a remote environment (RFCs, design docs, incident writeups).

Application

Please mention the word COMFORT and tag RMTk0LjE2My4xNjEuMjIw when applying to show you read the job post completely (#RMTk0LjE2My4xNjEuMjIw).

external

Company context

Container image and application development tools (Docker Desktop, Docker Hub, Docker Scout) and infrastructure platform for building and delivering software.
Developer tools
Remote-first, with offices in Seattle and Paris
growing start-up

Description

Docker has been one of the most loved brands in developer tooling, trusted by more than 20 million monthly users and over 20 billion container image pulls. As a globally distributed, remote-first team, Docker builds the tools that define how software gets built and delivered. The Staff Software Engineer Infrastructure role will help move expert-driven infrastructure and operational workflows into self-service systems with clear ownership, safe defaults, strong guardrails, and measurable adoption.

Responsibilities (Staff-level, measured by leverage):

  • Take ambiguous infrastructure problems and turn them into proposals the org can rally around; drive them through RFCs and architecture reviews.
  • Design self-service capabilities and platform APIs (primarily in Go) for onboarding, provisioning, deployment, observability defaults, and day-2 operations; provide contracts and docs.
  • Set delivery standards using Terraform, GitOps with Argo CD, progressive rollout, and strong testing, including building a continuous-deployment flow.
  • Evolve multi-tenant EKS foundations for reliability, security, scale, and cost, including Envoy Gateway ingress, traffic routing, and multi-region, cross-account connectivity.
  • Improve SLOs, alerting, and incident follow-up on Grafana Cloud.

AI-assisted operations:

  • Shape AI-assisted and agentic workflows to cut operational toil while staying safe, auditable, and human-reviewed.
  • Early targets include alert enrichment and incident context gathering; runbook-assisted diagnosis and remediation recommendations with a human in the loop for production changes; and onboarding and readiness assistants.

On-call:

  • Take operational ownership: join the rotation after onboarding and shadowing, and improve on-call health with better alerts, runbooks, less toil, and blameless postmortems aimed at prevention.
