Site Reliability Engineer

TEKsystems (Allegis Group Singapore Pte Ltd), Singapore
23 days ago
Job description

A leading global gaming and technology company is seeking a highly capable Site Reliability Engineer (SRE) to join their team in Singapore. This is a mission-critical role where you'll own the reliability, scalability, and performance of complex distributed systems supporting a global platform. You'll work at the intersection of software development and operations: designing robust systems, responding to live incidents, and driving automation across infrastructure and CI / CD processes.

The Position :

  • Monitor production systems using tools like Grafana and New Relic to detect performance issues and security vulnerabilities.
  • Respond to live incidents and outages, perform root cause analysis, and drive postmortem documentation and learning.
  • Maintain up-to-date operational runbooks for common issues and workflows.
  • Collaborate closely with developers to streamline production releases, patches, and deployment workflows.
  • Manage infrastructure across cloud environments (primarily AWS), and optimize CI / CD pipelines for reliability and efficiency.
  • Handle capacity planning, system performance tuning, and implement infrastructure-as-code using tools like Terraform.

The Candidate :

  • Comes from a backend or full-stack development background and is comfortable coding in languages such as Java, JavaScript / TypeScript, or Bash.
  • Has experience running services at scale in cloud environments like AWS, with a strong understanding of Linux.
  • Thinks like a software engineer, but with the mindset of an operator, proactively preventing outages and continuously improving systems.
  • Is adept at debugging under pressure, analyzing logs / metrics, and communicating clearly during incidents.
  • Is passionate about automation, observability, and creating self-healing systems.

Preferred Qualifications :

  • 3+ years of experience in site reliability engineering, DevOps, or software engineering roles.
  • Proven skills in :
    o Monitoring & alerting tools (Grafana, New Relic)
    o CI / CD pipelines (Git, Jenkins, GitHub Actions, etc.)
    o Container orchestration (Docker, Kubernetes)
    o Infrastructure-as-code (Terraform, CloudFormation, Ansible)
    o Managing and securing AWS environments
  • Understanding of authentication / authorization protocols (OAuth, JWT, OpenID)
  • Familiarity with SQL / NoSQL databases (PostgreSQL, Redis, MongoDB)
  • Strong interpersonal skills and a collaborative approach to working with cross-functional teams.

We regret to inform you that only shortlisted candidates will be notified / contacted.

EA Registration No : R22105541, TAY ZHIHENG, DARIUS

Allegis Group Singapore Pte Ltd, Company Reg No. 200909448N, EA License No. 10C4544

Job ID a4VOd0000017OQ3MAM
