Docker is hiring - June 2026

Roles

Staff Software Engineer Infrastructure

Compensation

Technology stipend equivalent to $100 USD net/month; 16 weeks of paid parental leave (after 6 months); training stipend for conferences/courses/classes; equity (growing startup).

Salary period: unclear
Equity: all employees have a share in the success of the company.

Benefits

Freedom & flexibility; fit your work around your life
Designated quarterly Whaleness Days plus end of year Whaleness break
Home office setup so you're comfortable while you work
16 weeks of paid parental leave (after 6 months of employment)
Technology stipend equivalent to $100 USD net/month
PTO plan that encourages you to take time to do the things you enjoy
Training stipend for conferences, courses and classes
Equity
Docker Swag
Medical benefits, retirement and holidays vary by country
Remote-first culture, with offices in Seattle and Paris

Tech stack

Incident runbooks, Envoy Gateway, GitHub Actions, Prometheus, Kubernetes, Linux, Amazon EKS, Go, Grafana Cloud, Grafana, Kubernetes networking, GitOps, CI/CD, Terraform, Argo CD, Networking, Docker, OpenTelemetry, Progressive Delivery

Required

Go, Linux, Kubernetes, Amazon EKS, Terraform, Argo CD, Grafana Cloud, Grafana, Prometheus, OpenTelemetry, Envoy Gateway

Nice to have

Ingress, CNI, service mesh, CI/CD, GitHub Actions, canaries, OpenTelemetry, Prometheus, Grafana, progressive delivery, migrations, adoption programs

Location

Remote

Work setup

Employment: full-time
Level: Senior
Remote policy: Remote (remote-first culture). On-call rotation after onboarding and shadowing.
Remote scope: unclear
Timezones: unclear
Visa: Docker considers visa sponsorship on a case-by-case basis based on business needs.

Role details

Responsibilities

Take ambiguous infrastructure problems and turn them into proposals the org can rally around, then drive them through RFCs and architecture reviews across teams.
Design self-service capabilities and platform APIs (primarily in Go) for onboarding, provisioning, deployment, observability defaults, and day-2 operations, with contracts and docs that teams actually use.
Set delivery standards using Terraform, GitOps with Argo CD, progressive rollout, and good testing, including building the continuous-deployment flow we're missing today.
Evolve the multi-tenant EKS foundations toward better reliability, security, scale, and cost: Envoy Gateway ingress, traffic routing, and the multi-region, cross-account connectivity we need.
Improve SLOs, alerting, and incident follow-up on Grafana Cloud so production gets safer and less dependent on heroics.
Help shape AI-assisted and agentic workflows to cut operational toil while staying safe, auditable, and human-reviewed. Early targets include:
- Alert enrichment and incident context-gathering to assemble relevant signals, history, and runbooks.
- Runbook-assisted diagnosis and remediation recommendations, with a human in the loop on anything that changes production.
- Onboarding and readiness assistants that answer the questions experts answer today.
Join the on-call rotation after onboarding and shadowing; improve the health of on-call with better alerts, stronger runbooks, less toil, and blameless postmortems aimed at prevention.

Requirements

8+ years of professional, hands-on, full-time software engineering experience in backend, infrastructure, or platform engineering.
Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience.
Strong software engineering in Go or a similar language: design, testing, debugging, review, long-term maintainability.
Track record designing, shipping, and operating cloud services or infrastructure platforms in production.
Deep expertise in at least one of: Kubernetes, networking, cloud platforms, reliability engineering, or developer platforms, plus solid Linux, networking, and production-ops fundamentals.
Experience setting technical direction and leading work that needs cross-team alignment.
Clear written and verbal communication in a remote environment (RFCs, design docs, incident writeups).

Application

Please mention the word COMFORT and tag RMTk0LjE2My4xNjEuMjIw when applying to show you read the job post completely (#RMTk0LjE2My4xNjEuMjIw).

Portfolio: unclear
GitHub: unclear
Cover letter: unclear
Apply flow: external
Canonical URL: https://remoteOK.com/remote-jobs/remote-staff-software-engineer-infrastructure-docker-1133025

Company context

Product: Container image and application development tools (Docker Desktop, Docker Hub, Docker Scout) and infrastructure platform for building and delivering software.
Industry: Developer tools
HQ: Seattle and Paris (offices mentioned); company location not fully specified
Stage: growing startup

Description

Docker has been one of the most loved brands in developer tooling, with more than 20 million monthly users and over 20 billion container image pulls. As a globally distributed, remote-first team, Docker builds the tools that define how software gets built and delivered. The Staff Software Engineer Infrastructure role will help move expert-driven infrastructure and operational workflows into self-service systems with clear ownership, safe defaults, strong guardrails, and measurable adoption.

Responsibilities (Staff-level, measured by leverage):
- Take ambiguous infrastructure problems and turn them into proposals the org can rally around; drive them through RFCs and architecture reviews.
- Design self-service capabilities and platform APIs (primarily in Go) for onboarding, provisioning, deployment, observability defaults, and day-2 operations; provide contracts and docs.
- Set delivery standards using Terraform, GitOps with Argo CD, progressive rollout, and strong testing, including building a continuous-deployment flow.
- Evolve multi-tenant EKS foundations for reliability, security, scale, and cost, including Envoy Gateway ingress, traffic routing, and multi-region, cross-account connectivity.
- Improve SLOs, alerting, and incident follow-up on Grafana Cloud.

AI-assisted operations:
- Shape AI-assisted and agentic workflows to cut operational toil while staying safe, auditable, and human-reviewed.
- Early targets include alert enrichment and incident context gathering; runbook-assisted diagnosis and remediation recommendations, with a human in the loop for production changes; and onboarding/readiness assistants.

On-call:
- Take operational ownership; join the rotation after onboarding and shadowing; improve on-call health with better alerts, stronger runbooks, less toil, and blameless postmortems aimed at prevention.