Roles & Responsibilities
Job Summary :
We are seeking a highly skilled and experienced Cloud Engineer Lead (Level 3) to support cloud infrastructure for commercial and Singapore Government-appointed agencies operating across commercial cloud platforms. This role requires experience managing multi-cloud environments, predominantly on Amazon Web Services (AWS), with working knowledge of Microsoft Azure and Google Cloud Platform (GCP). The ideal candidate will demonstrate strong Infrastructure-as-Code (IaC) capabilities, comprehensive OS lifecycle and patching operations, application deployment and troubleshooting expertise, and proactive operational leadership. This role emphasizes hands-on technical proficiency, security awareness, automation-driven practices, mentorship capabilities, and familiarity with strict uptime, compliance, and audit requirements in network-separated environments.
Key Responsibilities :
Multi-Cloud Infrastructure Operations
- Operate and maintain cloud-native services in production across AWS, Microsoft Azure, and Google Cloud Platform
- Hands-on experience with cloud services including :
- AWS : Lambda, ECS / EKS, FSx, Glue, SES, GuardDuty, WAF, Shield Advanced, Security Hub, KMS, Secrets Manager, SNS, SQS, EventBridge, API Gateway, EC2, S3, CloudWatch, Systems Manager
- Microsoft Azure : Azure Virtual Machines, Azure Kubernetes Service (AKS), Azure Functions, Azure Storage, Azure Monitor
- Google Cloud Platform : Compute Engine, Google Kubernetes Engine (GKE), Cloud Functions, Cloud Storage, Cloud Monitoring
- Monitor and troubleshoot infrastructure performance, uptime, and scalability across all platforms
- Support production and staging environments with 24 / 7 reliability objectives
- Participate in a 24 / 7 shift rotation to provide round-the-clock operational support and assist a team of L2 engineers with hands-on troubleshooting of technical issues
Infrastructure as Code (IaC)
- Maintain infrastructure deployment pipelines with working knowledge of at least one of the following : Terraform, Ansible, or Azure Resource Manager (ARM) templates
- Troubleshoot environment drift and pipeline failures across multi-cloud environments
- Promote and drive automation in cloud operations and continuous improvement initiatives
- Implement and maintain GitOps practices for infrastructure deployment
Operating System Lifecycle & Patch Management
- Lead OS patching operations across RHEL (v8 to v10) and Windows Server (2016 to 2025) using AWS Patch Manager, Azure Update Management, WSUS, SCCM, and YUM / DNF
- Maintain basic knowledge of Linux administration with deep expertise in Wintel Operating System patching and management
- Schedule, automate, and track patches across all environments
- Coordinate patch approvals and ensure compliance with organizational policies
- Execute monthly and quarterly patch cycles with minimal disruption
- Perform post-patch validation and remediation activities
Application Deployment & Troubleshooting
- Deploy and troubleshoot applications across Windows and Linux operating systems
- Support application teams with OS-level diagnostics and performance optimization
- Collaborate with development teams to resolve infrastructure and OS-related application issues
- Implement and maintain application monitoring and alerting frameworks
Security & Compliance
- Execute CIS (Center for Internet Security) security remediations across cloud platforms
- Perform security hardening based on CIS Benchmarks and government security baselines
- Conduct vulnerability remediation using tools such as Trend Micro Vision One, Qualys, Tenable, and AWS Config
- Track SSL certificate renewals across all environments
- Identify and remediate End-of-Life (EOL) components including OS versions and Lambda runtimes
- Support compliance with government-level security, audit, and regulatory requirements
Container & DevSecOps
- Demonstrate knowledge of container technologies (Docker, Kubernetes, ECS, EKS, AKS, GKE)
- Demonstrate familiarity with DevSecOps practices using SHIP-HATS (Secure Hybrid Integration Pipeline - Hive Agile Testing Solutions) under the Singapore Government technology stack
- Support CI / CD pipeline operations and integration with security scanning tools
ITIL & Service Management
- Adhere to ITIL processes including Incident, Problem, Change, and Request Management
- Manage and resolve ITSM tickets via ServiceNow, Jira, or similar platforms
- Drive ITSM ticket escalation between engineering teams and stakeholders
- Coordinate change management activities and participate in Change Advisory Board (CAB) reviews with junior engineers
- Maintain service level agreements (SLAs) and operational level agreements (OLAs)
Tool Integration & Observability
- Integrate third-party tools including NGINX, monitoring dashboards, and observability stacks
- Configure and maintain observability tools for metrics, logs, and alerts across multi-cloud environments
- Implement log aggregation and analysis using CloudWatch, Azure Monitor, and GCP Cloud Logging
Documentation & Knowledge Management
- Create and maintain comprehensive infrastructure runbooks, system documentation, change tracking logs, and infrastructure architecture designs for assigned applications
- Develop standard operating procedures (SOPs) and knowledge base articles
- Ensure audit-readiness through meticulous documentation discipline
- Maintain configuration management databases (CMDB) and asset inventories
Leadership & Mentorship
- Provide technical guidance and mentorship to Level 2 and junior engineers
- Lead technical discussions and architecture reviews
- Facilitate knowledge transfer sessions and training programs
- Act as escalation point for complex technical issues
- Drive continuous improvement initiatives and best practice adoption
Soft Skills & Competencies
- Problem Solving – Advanced troubleshooting of complex multi-cloud systems
- Communication – Clear and effective communication with technical and non-technical teams, stakeholders, and management
- Leadership – Ability to guide teams and drive technical initiatives
- Collaboration – Cross-functional teamwork across engineering, security, and business teams
- Adaptability – Responsive and effective in rapidly changing environments
- Accountability / Attention to Detail – Takes ownership of outcomes and service delivery, ensures accurate and secure implementations
- Customer Focus – Supportive, service-oriented approach with stakeholder management
- Continuous Learning – Stays current with evolving cloud and security practices
- Resilience – Performs effectively under pressure and during incident response
- Mentorship – Develops and supports junior engineers
SME Expectations – Role Behavior
This Subject Matter Expert (SME) role requires :
- Proficiency across Amazon Web Services with working knowledge of Azure and GCP
- Proven experience in uptime-critical and compliance-driven environments
- Strong mentorship and leadership capabilities for junior and mid-level engineers
- Proactive initiative in incident prevention and operational excellence
- Calm, structured, and methodical approach to incident handling with strict adherence to change management and incident response processes
- Audit-readiness mindset with comprehensive documentation practices
- Ability to drive escalations and manage stakeholder communications effectively
- Experience working within Singapore Government technology frameworks
Required Qualifications :
- Bachelor's degree in Computer Science, Information Systems, or related field
- Minimum 3 years of experience in Commercial Cloud Engineering roles
- At least 2 years of experience in public sector or regulated cloud environments
- Minimum 3 years of hands-on experience with AWS, Microsoft Azure, or Google Cloud Platform
- Experience in 24 / 7 operational support environments with shift rotation
- Demonstrated experience in mentoring and leading junior engineers
- Strong background in ITIL processes and ITSM platforms, with experience in CIS security hardening and remediation
- Familiarity with Singapore Government technology standards and frameworks (e.g., SHIP-HATS, IM8 Policy)
Preferred Certifications :
- AWS Certified Solutions Architect – Associate / Professional
- AWS Certified SysOps Administrator – Associate (preferred)
- Microsoft Certified : Azure Administrator Associate or Azure Solutions Architect Expert
- Microsoft Certified : Windows Server Hybrid Administrator Associate
- RHCE or Linux Professional Institute Certification (LPIC)
- ITIL v3 / v4 Foundation
Skills :
Microsoft Azure
Troubleshooting
Kubernetes
Azure
AWS
Architect
Cloud Storage
Amazon Web Services
Private Cloud
EC2
JIRA
Architecture Design
Docker
GCP
Cloud
ITIL
Ansible
S3
Linux
Amazon Cloud