Roles
Compensation
USD 150000 - 250000
- Salary period
- yearly
- Location basis
- depending on experience, skills, qualifications, and location
- Equity
- May include equity in the form of stock options.
Benefits
- Competitive total compensation package (salary + equity)
- Retirement or pension plan, in line with local norms
- Health, dental, and vision insurance
- Generous PTO policy, in line with local norms
Tech stack
Required
Location
Remote
Work setup
- Employment
- full-time
- Level
- Principal
- Remote policy
- Remote. Willingness to travel extensively across the fleet (50–75%).
- Remote scope
- unclear
- Travel
- 50–75% travel across the fleet.
Role details
Responsibilities
- Operate as the most senior technical authority for the operational hardware fleet across the hyperscale AI data center portfolio
- Serve as the technical arm of senior operations leadership in the field, leading site assessments and operational audits
- Drive technical readiness of teams ahead of site activation
- Review hardware platforms and integration designs from an operational lens
- Feed operational learnings back into hardware engineering, deployment, and supply chain organizations
- Act as a force multiplier across site hardware leads, deployment teams, and reliability engineers
- Hold OEMs, ODMs, service vendors, and deployment partners accountable and enforce standards without burning relationships
- Author, approve, and execute high-risk MOPs and change records in live production environments
- Lead root cause analysis on significant hardware events and drive corrective actions to closure
- Produce operational health assessments, RCAs, procedure reviews, and design review feedback
- Operate as the senior technical voice across operations, hardware engineering, network, facilities, supply chain, and customer-facing teams
- Travel extensively across the fleet (50–75%)
Requirements
- 10+ years of hands-on experience operating mission-critical hardware infrastructure
- At least 5 years as the senior technical voice on a site, campus, or fleet
- Deep working command of GPU systems, server platforms, storage infrastructure, firmware lifecycle management, and hardware diagnostics
- Demonstrated ability to author, approve, and execute high-risk MOPs and change records in live production environments
- Track record of leading root cause analysis on significant hardware events and driving corrective actions to closure
- Track record of holding OEMs, ODMs, service vendors, and deployment partners accountable
- Strong written communication
- Comfort operating as the senior technical voice across operations and cross-functional teams
- Willingness to travel extensively across the fleet (50–75%)
Application
You will receive a confirmation email once your application has successfully been accepted. If there is an error with your submission and you did not receive a confirmation email, please email careers@fluidstack.io with your resume/CV, the role you've applied for, and the date you submitted your application. Please mention the word LUXURIOUSLY and tag RODguMTk4Ljk5LjE0Mw== when applying to show you read the job post completely (#RODguMTk4Ljk5LjE0Mw==).
- Portfolio
- unclear
- GitHub
- not required
- Cover letter
- unclear
- Apply flow
- unclear
Company context
Make humanity more free by delivering frontier compute infrastructure for aligned AI.
- Product
- Frontier compute infrastructure and hyperscale AI data center operations (design, build, and operate data centers; deliver large-scale compute faster).
- Industry
- unclear
Contact
careers@fluidstack.io
Description
About Fluidstack: We exist to make humanity more free. Powerful AI will be the biggest lever for human choice we've ever built, and whoever deploys frontier compute infrastructure fastest will decide whether AI expands human freedom or shrinks it. We're focused on delivering 10 to 100s of GWs of compute faster than anyone else by acquiring power, designing and building data centers, and operating them with teams spanning hardware and software.
About the Role: We are seeking a Principal Operations Engineer, Hardware to serve as the most senior technical authority for the operational hardware fleet across a hyperscale AI data center portfolio. The role ensures that deployed GPU systems, servers, and supporting hardware are operated, maintained, and continuously improved to the workload's standard. It operates as the technical arm of senior operations leadership: leading site assessments and operational audits, driving technical readiness ahead of site activation, reviewing hardware platforms and integration designs from an operational lens, and feeding operational learnings back into hardware engineering, deployment, and supply chain organizations as the company shifts toward a productized, repeatable build model. The role also acts as a force multiplier across site hardware leads, deployment teams, and reliability engineers, connecting hardware operations, hardware engineering, network, facilities, and customer-facing teams. The ideal candidate has spent a career operating hardware at scale in hyperscale data centers, large HPC environments, or comparable 24/7 infrastructure, and can diagnose hardware issues, lead fleet-wide root cause investigations, and push back on vendors over flawed processes. Formal engineering credentials are valued but not required.
Requirements include: 10+ years of hands-on experience operating mission-critical hardware infrastructure, with at least 5 years as the senior technical voice on a site, campus, or fleet; data center operations experience strongly preferred; deep working command of GPU systems, server platforms, storage infrastructure, firmware lifecycle management, and hardware diagnostics; demonstrated ability to author, approve, and execute high-risk MOPs and change records; a track record of leading root cause analysis on significant hardware events; a track record of holding OEMs, ODMs, service vendors, and deployment partners accountable; strong written communication; comfort operating as the senior technical voice across operations and cross-functional teams; willingness to travel extensively across the fleet (50–75%).