Senior DevOps Engineer

Qualified Health

Software Engineering

Palo Alto, CA, USA

Posted on Jun 18, 2026

Transform healthcare with us.

At Qualified Health, we’re redefining what’s possible with Generative AI in healthcare. Our infrastructure provides the guardrails for safe AI governance, healthcare-specific agent creation, and real-time algorithm monitoring—working alongside leading health systems to drive real change.

This is more than just a job. It’s an opportunity to build the future of AI in healthcare, solve complex challenges, and make a lasting impact on patient care. If you’re ambitious, innovative, and ready to move fast, we’d love to have you on board.

Join us in shaping the future of healthcare.

Job Summary

We're looking for a Senior DevOps Engineer / Site Reliability Engineer to ensure the reliability, performance, and operational excellence of our production environments powering AI solutions for major health systems. You'll partner closely with engineering teams to make services production-ready, own observability and incident response, and drive the practices that keep our platform stable as we scale. As a key member of our infrastructure team, you'll be the connective tissue between development and production, ensuring new features ship safely while maintaining the reliability standards required for healthcare workloads.

Key Responsibilities

  • Partner with engineering teams to ensure services are production-ready before release, including reviewing deployment patterns, failure modes, resource requirements, and rollback strategies

  • Design and maintain observability infrastructure including metrics, logging, distributed tracing, and dashboards across multi-cloud environments

  • Define and manage alerting policies, SLIs/SLOs, and on-call rotations to ensure timely response to production issues

  • Lead and support incident response for production issues, drive root cause analysis, and coordinate hotfix deployments when needed

  • Author and maintain release documentation, runbooks, incident postmortems, and operational playbooks

  • Provide day-to-day operational support to engineering teams, unblocking deployments, debugging production issues, and improving developer experience around shipping to production

  • Design and maintain zero trust network architectures, ensuring secure connectivity across multi-cloud environments and tenant boundaries

  • Build and improve CI/CD pipelines and release processes to make production deployments safer, faster, and more predictable

  • Develop automation in Python and Terraform to reduce toil and codify operational best practices

  • Manage Kubernetes-based workloads in production, including troubleshooting cluster issues, optimizing resource utilization, and maintaining workload reliability

  • Operate Temporal workflows in production, including monitoring, scaling, and troubleshooting long-running workflow executions

  • Collaborate with security and compliance teams to maintain HIPAA and HITRUST controls across production environments

Required Qualifications

  • 6+ years of experience in Site Reliability Engineering, DevOps, or Infrastructure Engineering, with at least 3 years directly managing production workloads

  • Strong proficiency with Terraform including module development, state management, and multi-environment architectures

  • Deep experience operating production Kubernetes environments, including troubleshooting, networking, workload management, and cluster operations

  • Hands-on experience with both Google Cloud Platform and Microsoft Azure services

  • Strong networking and security knowledge, including zero trust architectures, network segmentation, private connectivity, identity-based access controls, and secrets management

  • Production experience with Temporal or comparable workflow orchestration systems

  • Strong proficiency in Python for automation, tooling, and operational scripting

  • Demonstrated experience designing and operating observability stacks including metrics, logging, tracing, and alerting

  • Experience leading incident response, including on-call rotation management, runbook development, and postmortem processes

  • Track record of partnering with engineering teams to improve production readiness and release practices

  • Excellent written communication skills for authoring runbooks, postmortems, and release documentation

  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience

Desirable Skills

  • Experience in the healthcare industry, with an understanding of HIPAA compliance requirements

  • Familiarity with HITRUST or similar compliance frameworks

  • Experience operating LLM-based systems, agentic workflows, or RAG pipelines in production

  • Experience with GitOps workflows (Rancher Fleet, ArgoCD, or Flux)

  • Experience building and operating multi-tenant SaaS infrastructure

  • Familiarity with chaos engineering and reliability testing practices

  • Prior experience as a founding or early SRE/Platform hire at a startup

Technical Environment

Our infrastructure is built on modern cloud technologies including:

  • Google Cloud Platform (primary) and Microsoft Azure

  • Google Kubernetes Engine (GKE)

  • Terraform and Terragrunt

  • Temporal for workflow orchestration

  • Python, Go, and shell scripting

  • GitOps-based deployment workflows

  • Modern monitoring and observability tools

Why Join Qualified Health?

This is an opportunity to join a fast-growing company and a world-class team that is poised to change the healthcare industry. We are a passionate, mission-driven team building a category-defining product. We are backed by premier investors and are looking for founding team members who are excited to do the best work of their careers.

Our employees are integral to achieving our goals, so we are proud to offer competitive salaries with equity packages, robust medical/dental/vision insurance, flexible working hours, hybrid work options, and an inclusive environment that fosters creativity and innovation.

Our Commitment to Diversity

Qualified Health is an equal opportunity employer. We believe that a diverse and inclusive workplace is essential to our success, and we are committed to building a team that reflects the world we live in. We encourage applications from all qualified individuals, regardless of race, color, religion, gender, sexual orientation, gender identity or expression, age, national origin, marital status, disability, or veteran status.

Pay & Benefits: The pay range for this role is $170,000 to $220,000, depending on your skills, qualifications, experience, and location. This role is also eligible for equity and benefits.

Join our mission to revolutionize healthcare with AI. To apply, please send your resume through the application below.