
Site Reliability Engineer - Scalable Infra for AI

SECOND TALENT SG PTE. LTD. • Islandwide, SG

30+ days ago

Job description

Roles & Responsibilities

Overview:

Be the infrastructure hero for one of Asia’s most dynamic AI startups. This is an opportunity to own reliability, scalability, and efficiency across global systems.

Key Responsibilities:

Manage container and open-source infrastructure clusters.
Build and maintain CI/CD pipelines, monitoring, and logging tools.
Troubleshoot and resolve critical incidents rapidly.
Enhance system availability through architectural improvements.
Drive automation across all levels of operations.
Work closely with engineering to champion infrastructure best practices.
Participate in 24/7 support rotations.

Requirements:

Bachelor's degree in Computer Science or equivalent.

3+ years of experience in SRE, DevOps, or system operations.

Strong skills in Linux, Shell/Python scripting, and system-level performance tuning.

Hands-on experience with cloud platforms (AWS, GCP, Azure).

Advanced knowledge of Kubernetes, Docker, GitLab CI, and ArgoCD.

Familiarity with managing MySQL, Redis, Kafka, Nginx, Elasticsearch, and JVM applications.

Self-driven with a strong problem-solving mindset.
