Posted by Director - Head of Technology Recruitment
My client, a global hedge fund, is actively seeking a hands-on, highly skilled and motivated SRE to join their team. As an SRE, you will play a critical role in driving the adoption of Site Reliability Engineering practices within their organization. The ideal candidate will have a strong technical background and a passion for driving operational efficiency and continuous improvement.
The role :
- Drive the adoption of SRE principles, methodologies, and best practices across the organization.
- Collaborate closely with application development teams to ensure the successful deployment and operation of applications, including early-stage support during development.
- Establish and monitor key metrics, performance indicators, and service level objectives (SLOs) to ensure the reliability and availability of critical systems.
- Identify opportunities to eliminate toil through automation, code improvements, and process optimizations.
- Conduct root cause analyses for system failures and incidents, and implement engineering solutions to prevent future occurrences.
- Lead incident management and resolution efforts, ensuring timely and effective response to incidents, and driving post-incident reviews and process improvements.
- Work closely with cross-functional teams, including infrastructure, networking, and security, to optimize system performance, scalability, and security.
- Collaborate with stakeholders to define and refine service-level agreements (SLAs) and operational requirements.
- Stay abreast of industry trends and emerging technologies in Site Reliability Engineering, and leverage them to drive innovation and enhance operational efficiency.
What you offer :
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- Extensive experience in Site Reliability Engineering or a related field, with a strong understanding of SRE principles, practices, and tools.
- Strong technical background with expertise in areas such as distributed systems, cloud computing, network architecture, and software development.
- Strong experience in Python, as well as with automation and configuration management tools.
- Solid understanding of monitoring and observability frameworks, incident management, and post-incident analysis.
- Excellent problem-solving and troubleshooting skills, with the ability to analyze complex systems and identify areas for improvement.
- Strong leadership skills, with the ability to inspire and motivate a team, and foster a culture of collaboration, innovation, and continuous improvement.
- Excellent communication and interpersonal skills, with the ability to effectively communicate technical concepts to both technical and non-technical stakeholders.
- Experience in the financial industry or hedge fund environment highly preferred.
The sell :
- Extremely competitive compensation and medical benefits
- Brand name that will open doors for your career in the future
- Many opportunities for internal mobility and long-term career growth