Talent.com
Site Reliability Engineer - Scalable Infra for AI

SECOND TALENT SG PTE. LTD. • Islandwide, SG
30+ days ago
Job description

Roles & Responsibilities

Overview:

Be the infrastructure hero for one of Asia’s most dynamic AI startups. This is an opportunity to own reliability, scalability, and efficiency across global systems.

Key Responsibilities:

  • Manage container and open-source infrastructure clusters.
  • Build and maintain CI/CD pipelines, monitoring, and logging tools.
  • Troubleshoot and resolve critical incidents rapidly.
  • Enhance system availability through architectural improvements.
  • Drive automation across all levels of operations.
  • Work closely with engineering to champion infrastructure best practices.
  • Participate in 24/7 on-call support rotations.

Requirements:

  • Bachelor's degree in Computer Science or equivalent.
  • 3+ years in SRE, DevOps, or system operations.
  • Strong skills in Linux, Shell/Python scripting, and system-level performance tuning.
  • Hands-on experience with cloud platforms (AWS, GCP, Azure).
  • Advanced knowledge of Kubernetes, Docker, GitLab CI, and ArgoCD.
  • Familiarity with managing MySQL, Redis, Kafka, Nginx, Elasticsearch, and JVM applications.
  • Self-driven with a strong problem-solving mindset.