Salary:
$200-270K - Per Annum
Locations:
San Francisco, United States
Type:
Permanent
Published:
October 2, 2025
Contact:
Raman Rajakumar
Ref:
18619
Required Skills:
AI

Senior Site Reliability Engineer

Location: San Francisco, CA (On-site, Full-time)
Compensation: $200-275K + equity + full benefits package
 

About the Role

Our client is a fast-growing AI scale-up seeking a Senior Site Reliability Engineer to scale and harden the infrastructure behind their core platform. This person will own automation, deployment pipelines, observability, and reliability at scale. They will partner closely with software engineers, ML researchers, and product teams to ensure the platform is secure, performant, and ready for rapid iteration.
 

Key Responsibilities

  • Design and implement infrastructure automation and deployment pipelines using Terraform.

  • Build and maintain monitoring, alerting, and logging systems to ensure reliability and performance.

  • Work with engineering teams to design and deploy scalable, fault-tolerant, and secure production systems on AWS, GCP, or Azure.

  • Define and maintain cloud-native security and compliance practices, using tools such as Vault and KMS.

  • Troubleshoot and resolve complex infrastructure and operations issues in collaboration with cross-functional teams.

  • Lead disaster recovery and business continuity planning.

  • Document systems and processes to improve repeatability and scalability.

  • Mentor junior engineers and provide technical guidance.

 

Qualifications

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or equivalent.

  • 5+ years of professional SRE or infrastructure engineering experience.

  • Strong expertise in Terraform and Kubernetes with a demonstrable track record.

  • Proficiency in Python for building orchestration and automation; Go is a plus.

  • Hands-on experience with Docker and CI/CD tooling such as GitLab CI/CD or GitHub Actions.

  • Familiarity with monitoring and logging platforms such as ELK, Grafana, or Datadog.

  • Strong knowledge of at least one major cloud platform (AWS, GCP, or Azure). GCP or Azure experience is preferred between otherwise equally matched candidates.

  • Ability to thrive in fast-paced, high-growth environments with minimal ramp-up.

  • Excellent communication and collaboration skills.
