Senior AI Infrastructure & Platform Operations Engineer
Mirantis | Remote | July 3, 2026
Skills
Kubernetes, Linux
Job Description
Role Overview
As a Senior AI Infrastructure & Platform Operations Engineer, you will serve as a technical leader within the operations organization, providing deep expertise across infrastructure, networking, platform operations, and service reliability. You will be responsible for driving operational excellence across complex production environments while acting as a key escalation point for critical incidents and challenging technical issues.

What You Will Do
Lead the investigation and resolution of complex infrastructure, networking, and platform-related incidents.
Troubleshoot complex Linux, Kubernetes, networking, storage, and hardware-related issues.
Analyze platform performance, capacity, stability, and reliability trends to proactively identify risks.

Requirements
7+ years of experience in infrastructure operations, platform operations, site reliability engineering, network operations, cloud operations, datacenter operations, or related technical roles.
Expert-level Linux administration and troubleshooting skills.
Strong networking expertise, including experience diagnosing complex performance, connectivity, and reliability issues.
Strong experience operating Kubernetes in production environments.
Experience supporting large-scale production infrastructure and distributed systems.
Proven experience leading technical investigations and managing complex incidents.
Experience performing root cause analysis and driving long-term operational improvements.
Strong understanding of observability, monitoring, and service reliability practices.
Excellent troubleshooting and analytical skills across multiple infrastructure domains.
Strong communication, collaboration, and stakeholder management skills.

Benefits
Operate some of the most advanced AI infrastructure environments in production today.
Work with the latest NVIDIA GPU technologies, Kubernetes platforms, and high-performance networking environments.
Help define operational standards and reliability practices for next-generation AI infrastructure services.
Influence the adoption of AI-powered operational capabilities through k0rdent AI.
Work alongside highly skilled engineers solving complex infrastructure and platform challenges at scale.
Join a growing organisation investing heavily in AI infrastructure, platform services, and operational innovation.

Originally posted on Himalayas