Site Reliability Engineer

Tek Systems • Singapore, Pedra Branca, Singapore
20 days ago
Job description
    A leading global gaming and technology company is seeking a highly capable Site Reliability Engineer (SRE) to join their team in Singapore. This is a mission-critical role where you’ll own the reliability, scalability, and performance of complex distributed systems supporting a global platform. You’ll work at the intersection of software development and operations: designing robust systems, responding to live incidents, and driving automation across infrastructure and CI/CD processes.

    The Position:

    • Monitor production systems using tools like Grafana and New Relic to detect performance issues and security vulnerabilities.
    • Respond to live incidents and outages, perform root cause analysis, and drive postmortem documentation and learning.
    • Maintain up-to-date operational runbooks for common issues and workflows.
    • Collaborate closely with developers to streamline production releases, patches, and deployment workflows.
    • Manage infrastructure across cloud environments (primarily AWS), and optimize CI/CD pipelines for reliability and efficiency.
    • Handle capacity planning and system performance tuning, and implement infrastructure-as-code using tools like Terraform.

    The Candidate:

    • Comes from a backend or full-stack development background and is comfortable coding in languages such as Java, JavaScript/TypeScript, or Bash.
    • Has experience running services at scale in cloud environments like AWS, with a strong understanding of Linux.
    • Thinks like a software engineer, but with the mindset of an operator—proactively preventing outages and continuously improving systems.
    • Is adept at debugging under pressure, analyzing logs and metrics, and communicating clearly during incidents.
    • Is passionate about automation, observability, and creating self-healing systems.

    Preferred Qualifications:

    • 3+ years of experience in site reliability engineering, DevOps, or software engineering roles.
    • Proven skills in:
      o Monitoring & alerting tools (Grafana, New Relic)
      o CI/CD pipelines (Git, Jenkins, GitHub Actions, etc.)
      o Container orchestration (Docker, Kubernetes)
      o Infrastructure-as-code (Terraform, CloudFormation, Ansible)
      o Managing and securing AWS environments

    • Understanding of authentication/authorization protocols (OAuth, JWT, OpenID)
    • Familiarity with SQL/NoSQL databases (PostgreSQL, Redis, MongoDB)
    • Strong interpersonal skills and a collaborative approach to working with cross-functional teams.

    We regret to inform you that only shortlisted candidates will be notified or contacted.

      EA Registration No: R22105541, TAY ZHIHENG, DARIUS

      Allegis Group Singapore Pte Ltd, Company Reg No. 200909448N, EA License No. 10C4544
