Roles
Compensation
Technology stipend equivalent to $100 USD net/month; 16 weeks of paid parental leave (after 6 months); training stipend for conferences/courses/classes; equity (growing startup).
- Salary period
- unclear
- Equity
- Equity, so that all employees have a share in the success of the company.
Benefits
- Freedom & flexibility; fit your work around your life
- Designated quarterly Whaleness Days plus end of year Whaleness break
- Home office setup, so you are comfortable while you work
- 16 weeks of paid parental leave (after 6 months of employment)
- Technology stipend equivalent to $100 USD net/month
- PTO plan that encourages you to take time to do the things you enjoy
- Training stipend for conferences, courses and classes
- Equity
- Docker Swag
- Medical benefits, retirement and holidays vary by country
- Remote-first culture, with offices in Seattle and Paris
Tech stack
Required
Nice to have
Location
Remote
Work setup
- Employment
- full-time
- Level
- Staff
- Remote policy
- Remote (remote-first culture). On-call rotation after onboarding and shadowing.
- Remote scope
- unclear
- Timezones
- unclear
- Visa
- Docker considers visa sponsorship on a case-by-case basis based on business needs.
Role details
Responsibilities
- Take ambiguous infrastructure problems and turn them into proposals the org can rally around, then drive them through RFCs and architecture reviews across teams.
- Design self-service capabilities and platform APIs (primarily in Go) for onboarding, provisioning, deployment, observability defaults, and day-2 operations, with contracts and docs teams actually use.
- Set delivery standards using Terraform, GitOps with Argo CD, progressive rollout, and good testing, including building the continuous-deployment flow we're missing today.
- Evolve the multi-tenant EKS foundations toward better reliability, security, scale, and cost: Envoy Gateway ingress, traffic routing, and the multi-region, cross-account connectivity we need.
- Improve SLOs, alerting, and incident follow-up on Grafana Cloud so production gets safer and less dependent on heroics.
- Help shape AI-assisted and agentic workflows to cut operational toil while staying safe, auditable, and human-reviewed. Early targets include: alert enrichment and incident context-gathering to assemble relevant signals, history, and runbooks; runbook-assisted diagnosis and remediation recommendations with a human in the loop on anything that changes production; and onboarding and readiness assistants that answer questions experts answer today.
- Join the on-call rotation after onboarding and shadowing; improve the health of on-call with better alerts, stronger runbooks, less toil, and blameless postmortems aimed at prevention.
Requirements
- 8+ years of professional, hands-on, full-time software engineering experience in backend, infrastructure, or platform engineering.
- Bachelor's degree in Computer Science, Engineering, or related field, or equivalent practical experience.
- Strong software engineering in Go or a similar language: design, testing, debugging, review, long-term maintainability.
- Track record designing, shipping, and operating cloud services or infrastructure platforms in production.
- Deep expertise in at least one of: Kubernetes, networking, cloud platforms, reliability engineering, or developer platforms, plus solid Linux, networking, and production-ops fundamentals.
- Experience setting technical direction and leading work that needs cross-team alignment.
- Clear written and verbal communication in a remote environment (RFCs, design docs, incident writeups).
Application
Please mention the word COMFORT and tag RMTk0LjE2My4xNjEuMjIw when applying to show you read the job post completely (#RMTk0LjE2My4xNjEuMjIw).
- Portfolio
- unclear
- GitHub
- unclear
- Cover letter
- unclear
- Apply flow
- external
Company context
- Product
- Container image and application development tools (Docker Desktop, Docker Hub, Docker Scout) and infrastructure platform for building and delivering software.
- Industry
- Developer tools
- HQ
- Seattle and Paris (offices mentioned); company location not fully specified
- Stage
- growing startup
Description
Docker has been one of the most loved brands in developer tooling, trusted by more than 20 million monthly users and over 20 billion container image pulls. As a globally distributed, remote-first team, Docker builds the tools that define how software gets built and delivered. The Staff Software Engineer Infrastructure role will help move expert-driven infrastructure and operational workflows into self-service systems with clear ownership, safe defaults, strong guardrails, and measurable adoption.
Responsibilities (Staff-level, measured by leverage):
- Take ambiguous infrastructure problems and turn them into proposals the org can rally around; drive them through RFCs and architecture reviews.
- Design self-service capabilities and platform APIs (primarily in Go) for onboarding, provisioning, deployment, observability defaults, and day-2 operations; provide contracts and docs.
- Set delivery standards using Terraform, GitOps with Argo CD, progressive rollout, and strong testing, including building a continuous-deployment flow.
- Evolve multi-tenant EKS foundations for reliability, security, scale, and cost, including Envoy Gateway ingress, traffic routing, and multi-region, cross-account connectivity.
- Improve SLOs, alerting, and incident follow-up on Grafana Cloud.
AI-assisted operations:
- Shape AI-assisted and agentic workflows to cut operational toil while staying safe, auditable, and human-reviewed.
- Early targets include alert enrichment and incident context gathering; runbook-assisted diagnosis and remediation recommendations with a human in the loop for production changes; and onboarding and readiness assistants.
On-call:
- Operational ownership; join the rotation after onboarding and shadowing; improve on-call health with better alerts, runbooks, less toil, and blameless postmortems aimed at prevention.