Need Singaporean, Cat 1 or Cat 2A
Key Responsibilities:
Administer and manage HPC infrastructure with 700+ compute nodes and 50+ AWS cloud instances.
Ensure smooth operation and integration of HPC systems, storage subsystems, and networking components.
Perform administration on Red Hat and CentOS servers.
Handle patching, compiling, securing, and troubleshooting in a heterogeneous environment.
Implement and maintain system monitoring, configuration management, and automation using tools like Puppet, Splunk, BigFix, Ganglia, and Nagios.
Manage job scheduling environments with PBS or equivalent workload schedulers.
Provide advanced troubleshooting for researchers and developers in HPC environments.
Contribute to system performance, reliability, and scalability enhancements.
Coordinate and implement changes across development, testing, and production environments.
Work closely with internal IT teams and research staff to meet infrastructure demands.
Participate in disaster recovery planning and maintain system documentation.
Required Skills & Tools:
Operating Systems:
HPC Tools & Technologies:
Scripting & Automation:
Cloud Technologies:
Monitoring & Configuration Tools:
Soft Skills:
Network Engineer • Singapore