Roles & Responsibilities
Job Summary:
We are seeking a highly skilled and experienced Cloud Engineer Lead (Level 3) to support cloud infrastructure for a commercial and Singapore Government-appointed agency operating across commercial cloud platforms. This role requires experience managing multi-cloud environments, predominantly on Amazon Web Services (AWS), with knowledge of Microsoft Azure and Google Cloud Platform (GCP). The ideal candidate will demonstrate strong Infrastructure-as-Code (IaC) capabilities, comprehensive OS lifecycle and patching operations, application deployment and troubleshooting expertise, and proactive operational leadership. This role emphasizes hands-on technical proficiency, security awareness, automation-driven practices, mentorship capabilities, and familiarity with strict uptime, compliance, and audit requirements in network-separated environments.
Key Responsibilities:
Multi-Cloud Infrastructure Operations
- Operate and maintain cloud-native services in production across AWS, Microsoft Azure, and Google Cloud Platform
- Hands-on experience with cloud services including: Lambda, ECS/EKS, FSx, Glue, SES, GuardDuty, WAF, Shield Advanced, Security Hub, KMS, Secrets Manager, SNS, SQS, EventBridge, API Gateway, EC2, S3, CloudWatch, Systems Manager, Azure Virtual Machines, Azure Kubernetes Service (AKS), Azure Functions, Azure Storage, Azure Monitor, Compute Engine, Google Kubernetes Engine (GKE), Cloud Functions, Cloud Storage, Cloud Monitoring
- Monitor and troubleshoot infrastructure performance, uptime, and scalability across all platforms
- Support production and staging environments with 24/7 reliability objectives
- Participate in a 24/7 shift rotation to provide round-the-clock operational support and assist a team of L2 engineers with hands-on troubleshooting of technical issues
Infrastructure as Code (IaC)
- Maintain infrastructure deployment pipelines with working knowledge of at least one of the following: Terraform, Ansible, or Azure Resource Manager (ARM) templates
- Troubleshoot environment drift and pipeline failures across multi-cloud environments
- Promote and drive automation in cloud operations and continuous improvement initiatives
- Implement and maintain GitOps practices for infrastructure deployment
Operating System Lifecycle & Patch Management
- Lead OS patching operations across RHEL (v8 to v10) and Windows Server (2016 to 2025) using AWS Patch Manager, Azure Update Management, WSUS, SCCM, and YUM/DNF
- Maintain basic knowledge of Linux administration with deep expertise in Wintel Operating System patching and management
- Schedule, automate, and track patches across all environments
- Coordinate patch approvals and ensure compliance with organizational policies
- Execute monthly and quarterly patch cycles with minimal disruption
- Perform post-patch validation and remediation activities
Application Deployment & Troubleshooting
- Deploy and troubleshoot applications across Windows and Linux operating systems
- Support application teams with OS-level diagnostics and performance optimization
- Collaborate with development teams to resolve infrastructure and OS-related application issues
- Implement and maintain application monitoring and alerting frameworks
Security & Compliance
- Execute CIS (Center for Internet Security) security remediations across cloud platforms
- Perform security hardening based on CIS Benchmarks and government security baselines
- Conduct vulnerability remediation using tools such as Trend Micro Vision One, Qualys, Tenable, and AWS Config
- Track SSL certificate renewals across all environments
- Identify and remediate End-of-Life (EOL) components including OS versions and Lambda runtimes
- Support compliance with government-level security, audit, and regulatory requirements
Container & DevSecOps
- Demonstrate knowledge of container technologies (Docker, Kubernetes, ECS, EKS, AKS, GKE)
- Familiarity with DevSecOps practices using SHIP-HATS (Secure Hybrid Integration Pipeline - Hive Agile Testing Solutions) under the Singapore Government technology stack
- Support CI/CD pipeline operations and integration with security scanning tools
ITIL & Service Management
- Adhere to ITIL processes including Incident, Problem, Change, and Request Management
- Manage and resolve ITSM tickets via ServiceNow, Jira, or similar platforms
- Drive ITSM ticket escalation between engineering teams and stakeholders
- Coordinate change management activities and participate in Change Advisory Board (CAB) reviews with junior engineers
- Maintain service level agreements (SLAs) and operational level agreements (OLAs)
Tool Integration & Observability
- Integrate third-party tools including NGINX, monitoring dashboards, and observability stacks
- Configure and maintain observability tools for metrics, logs, and alerts across multi-cloud environments
- Implement log aggregation and analysis using CloudWatch, Azure Monitor, and GCP Cloud Logging
Documentation & Knowledge Management
- Create and maintain comprehensive infrastructure runbooks, system documentation, change tracking logs, and infrastructure architecture designs for assigned applications
- Develop standard operating procedures (SOPs) and knowledge base articles
- Ensure audit-readiness through meticulous documentation discipline
- Maintain configuration management databases (CMDB) and asset inventories
Leadership & Mentorship
- Provide technical guidance and mentorship to Level 2 and junior engineers
- Lead technical discussions and architecture reviews
- Facilitate knowledge