Head of Site Reliability Engineering (SRE)

at UKCloud
Location Farnborough
Date Posted April 4, 2021
Job Type Full-time



The Site Reliability Engineering team is responsible for our Cloud Management Platform (CMP), which provides the core tools and systems (such as automation, monitoring, ITSM and IDAM) that enable UKCloud to build and operate our multi-cloud environments efficiently and securely. The team consists of around 20 IT specialists with particular skills in integration and automation.

Head of Site Reliability Engineering (SRE) Responsibilities:

• Operational and line management of the Automation & Service Reliability (ASR) team.

• Lifecycle management of the core components of the Cloud Management Platform (CMP), including reactive support of internal users of the CMP, proactive support such as preventative maintenance, and continuous service improvement/optimisation of existing core services.

• Providing project support for the adoption of new or replacement CMP systems or tools – from requirements, through design and engineering into core service.

• Ensuring appropriate supporting documentation for design, configuration and ongoing support is available and maintained, so that 24/7 operations teams (external to ASR) can take on more of the reactive and proactive support requirements.

• As a member of the Office of the CTO (OCTO) Team, contribute to the development and implementation of an Automation & Service Reliability plan, including recommendations for technical training and wider team development.

• Working with internal stakeholders such as the Customer Services team, the Platform Operations team, Project Managers and the Software Engineering team to ensure their expectations are consistently met and that they are bought into the continued improvement of the ASR function and the Cloud Management Platform.

• Contribute to the creation of technical standards within UKCloud to drive consistency, ease of automation and efficient scalability.

• Providing regular fact-based (metric-driven) reports/scorecards on the performance of the ASR function to the CTO and wider business.

• Maintain procedural and audit-readiness of the function including policies, processes and education.

Head of Site Reliability Engineering (SRE) Requirements:

• Experienced and successful leader of Infrastructure teams.
• A proven track record of delivering infrastructure and systems automation programmes.
• A good understanding of core datacentre technologies (services, servers, networks and storage), gained through an operational support background.
• Experience of executing a vision and strategy through to successful implementation.
• Strong operational support experience of Linux operating systems and virtualised platforms.
• Demonstrable skill in at least one of the following scripting languages or automation tools: Ansible, Bash, Python.
• Leadership of technical teams and the ability to lead by example.

About UKCloud:

UKCloud provides an unbeatable, secure UK public cloud, focused solely on serving the UK Public Sector. We are committed to assurance and security while delivering flexible, agile and value-based cloud hosting to our customers.

Formed in 2012, UKCloud is based in Farnborough (Hampshire) and Corsham (Wiltshire). We have a team of 250+ people and we continue to grow! We are looking for people who want a rewarding career in a business that truly invests in you as an individual.

Type: Full time, Permanent

Location: Farnborough, Hants

Salary: Competitive salary plus 10% bonus

• 25 days' holiday, increasing to 30 days with length of service, plus a half-day of birthday leave and a charity day
• Flexible working
• Contributory pension
• Healthcare
• Life cover
• Access to free parking
• Active social and charity events
• Cycle to work scheme
• Onsite facilities
• Friday breakfasts, fruit and soft drinks

UKCloud is an equal opportunities employer and positively encourages applications from suitably qualified and eligible applicants. Applicants must be eligible to work and live in the UK and will be required to undergo and maintain appropriate UK government security clearance.

You may have experience of the following: Site Reliability Engineer, IT Infrastructure Manager, Infrastructure Engineer, Service Delivery Manager, Site Reliability Manager, Cloud Services, SRE Manager, SRE Engineer, etc.

Ref: 97529
https://ukcloud.livevacancies.co.uk/#/applicant/175?source=Careerbuilder
