Principal Engineer, Ops Systems Engineering, Ops Systems Sustainment Centre
HTX (Home Team Science & Technology Agency)
What The Role Is
You play an important role in providing Information and Communications Technology (ICT) engineering services to the Ops Systems Sustainment Centre in HTX.
What You Will Be Working On
Lead in Implementation and Deployment of Ransomware-Resilient Recovery Solutions
- Design, implement and deploy end-to-end recovery architectures to restore systems in the event of ransomware attacks
- Work with system owners to define Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO)
- Architect tiered recovery models, e.g. Critical Information Infrastructure (CII) systems vis-a-vis Significant Information Infrastructure (SII), aligned to RTO / RPO requirements
- Design solutions supporting immutable backups, offline / air-gapped storage, and isolated recovery environments (IRE)
- Define and document clean recovery paths ensuring restoration from known-good, uncompromised backups
- Define and maintain recovery blueprints for different system tiers, including OS, middleware, application, and data layers
- Document and hand over recovery solutions to O&S teams with clear SOPs and technical guides
- Keep abreast of new / emerging technologies to future-proof solutions
Enable Recovery Operations and Readiness
- Develop SOPs, runbooks, and validation procedures for system recovery
- Plan and execute recovery drills and tests to verify restoration speed and data integrity
- Lead the identification and remediation of gaps in recovery readiness
- Oversee backup validation processes, including routine test restores
- Maintain readiness through post-drill reviews and continuous improvement
Function as Technical Authority and Solution Owner
- Function as the technical expert for recovery solutions post-deployment
- Provide in-depth guidance on backup and recovery operations across Ops Systems platforms
- Define and maintain backup retention and recovery validation standards
- Collaborate with infrastructure and application teams to ensure seamless integration of recovery processes into operations
- Provide technical expertise for the Ops Systems Engineering unit’s undertakings, such as O&S expertise for flagship programmes and governance and checklists for O&S gatekeeping of AORs
- Advise on budget or capacity planning for storage, retention, and recovery environments
Coach and guide junior engineers in the Engineering unit
- Provide coaching to junior officers or project team members (no formal people management responsibilities)
- Guide engineers on the application of knowledge, and translation of knowledge into viable solutions
- Manage any other tasks as assigned by the supervisor
What We Are Looking For
- Tertiary qualification in Computer Science, Information Technology, Electrical and Electronics Engineering or equivalent
- Minimum 10 years of experience in IT infrastructure, systems engineering, or operations, with at least 5 years in backup and recovery leadership
- Proven expertise in designing and implementing ransomware-resilient system recovery strategies or equivalent
- Track record of delivering RTO / RPO-aligned recovery capabilities
- Strong leadership and collaboration skills across multi-disciplinary teams
- Detail-oriented, structured, and proactive in identifying and mitigating recovery risks
- Skilled in communicating complex recovery concepts to both technical and non-technical stakeholders
- Ability to cope with a reasonably high level of stress
- Ability to work in a team and independently
- Possess a good grasp of IT industry best practices and processes, while keeping abreast of advances in technology and best practices
All new appointees will be appointed on a two-year contract in the first instance.
Only shortlisted candidates will be notified, within 4 weeks of the closing of the advertisement.
Seniority level
Mid-Senior level
Employment type
Full-time
Industry
Government Administration