Talent.com
Site Reliability Engineer - Scalable Infra for AI - SECOND TALENT SG PTE. LTD.

SECOND TALENT SG PTE. LTD., Islandwide, SG
16 days ago
Job description

Roles & Responsibilities

Overview :

Be the infrastructure hero for one of Asia’s most dynamic AI startups. This is an opportunity to own reliability, scalability, and efficiency across global systems.

Key Responsibilities :

  • Manage container and open-source infrastructure clusters.
  • Build and maintain CI/CD pipelines, monitoring, and logging tools.
  • Troubleshoot and resolve critical incidents rapidly.
  • Enhance system availability through architectural improvements.
  • Drive automation across all levels of operations.
  • Work closely with engineering to champion infrastructure best practices.
  • Participate in 24/7 support rotations.

Requirements :

  • Bachelor's degree in Computer Science or equivalent.
  • 3+ years in SRE, DevOps, or system operations.
  • Strong skills in Linux, Shell/Python scripting, and system-level performance tuning.
  • Hands-on with cloud platforms (AWS, GCP, Azure).
  • Advanced knowledge of Kubernetes, Docker, GitLab CI, and ArgoCD.
  • Familiarity with managing MySQL, Redis, Kafka, Nginx, Elasticsearch, and JVM apps.
  • Self-driven with a strong problem-solving mindset.

Skills :

  • Scalability, Kubernetes, Azure, Pipelines, Architectural, MySQL, Nginx, Scripting, Reliability, Logging, Python, Performance Tuning, Docker, GCP, Linux
