Lead Infrastructure DevOps Engineer
Lead DevOps Engineer
Ready to lead the charge in seamless software delivery? Find your bounce at tombola! 🚀
At tombola, we build our own amazing games and platforms, and getting that cutting-edge software to our players reliably and efficiently is key. We're looking for a Lead DevOps Engineer to head up a team/function that designs, implements, and maintains the infrastructure, automation, and deployment processes that power our high-quality software delivery. You'll help bridge the gap between development and operations, ensuring scalable, reliable, and secure systems that always align with our business goals.
What will you be doing? 🔎
As a Lead DevOps Engineer, you'll be instrumental in shaping our infrastructure and delivery pipelines. You'll guide a team, driving best practices in automation, monitoring, and cloud management to keep tombola at the forefront of online gaming.
Key Accountabilities and Responsibilities:
Team Leadership and Management
- Provide leadership, management, and development for your direct reports through effective 1-to-1s, clear objective setting (OKRs), and performance management.
- Make team goals clear and ensure they align with our broader business objectives.
- Collaborate with other teams and departments to achieve shared success.
- Partner with our People Partner for tech to build robust team management practices.
Continuous Integration and Continuous Deployment (CI/CD)
- Develop and maintain CI/CD pipelines: Automate software integration, testing, and deployment to speed up software delivery.
- Integrate various tools: Ensure the development process integrates seamlessly with build and deployment tools (e.g., Octopus Deploy, GitHub, TeamCity).
- Automate deployment processes: Make sure deployments are fully automated and can be performed with minimal manual intervention.
Infrastructure Management
- Provision and manage infrastructure: Use Infrastructure as Code (IaC) tools like Terraform or CloudFormation to provision and manage infrastructure in AWS cloud environments.
- Optimize resource usage: Ensure our infrastructure runs efficiently in terms of cost, resource allocation, and performance.
- Scalability and availability: Ensure systems and applications are scalable and highly available by setting up robust monitoring, scaling, and failover mechanisms.
Monitoring and Incident Management
- Set up monitoring tools: Implement tools like CloudWatch and Dynatrace to monitor system health, performance, and availability.
- Respond to incidents: Be part of the team that responds quickly to system outages or issues, performing root cause analysis and implementing fixes or improvements.
- Log management: Ensure logging is properly set up using tools like Fluentd and Kibana, and use logs for troubleshooting and improving system reliability.
Automation
- Automate manual tasks: Identify and automate repetitive tasks like environment setup, configuration management, and updates.
- Script development: Write scripts to automate common operations, reducing manual intervention and potential errors.
Collaboration with Development and Operations Teams
- Foster collaboration: Work closely with developers, system administrators, and other teams to align goals and requirements, ensuring seamless development and deployment processes.
- Code reviews and quality: Collaborate with developers to ensure code meets operational standards and can be deployed reliably.
Security and Compliance
- Ensure security in the pipeline: Integrate security practices into the development pipeline (DevSecOps), ensuring vulnerabilities are identified early.
- Maintain compliance: Ensure infrastructure and processes comply with industry regulations and standards (e.g., GDPR, ISO, SOC 2).
Cloud Management
- Cloud architecture and management: Design, implement, and maintain infrastructure on AWS.
- Cost management: Monitor cloud costs and optimize resource utilization to control run-rate.
Performance and Reliability Optimization
- Ensure optimal performance: Continuously assess and optimize system performance to handle load efficiently and minimize downtime.
- Disaster recovery planning: Develop and test disaster recovery plans to ensure business continuity.
Version Control and Configuration Management
- Version control systems: Use tools like Git to manage code changes and ensure proper branching, merging, and versioning.
- Configuration management: Use tools like AWS Control Tower, AWS Systems Manager (SSM), and AWS Config to maintain consistent environments across development, staging, and production.
Documentation
- Maintain clear documentation: Document infrastructure, deployment processes, and operational procedures to ensure knowledge sharing across the team.
Performance Metrics and Reporting
- Collect and analyze metrics: Gather system metrics and provide regular reports on system performance, uptime, and resource usage to stakeholders.
- Optimization recommendations: Based on the metrics, suggest improvements to optimize the performance and cost-effectiveness of the system.
Continuous Improvement
- Stay up-to-date with industry trends: Regularly evaluate new tools, technologies, and practices that could improve the development and operations process.
- Implement best practices: Apply best practices for automation, system design, and security to improve the reliability, scalability, and efficiency of the infrastructure.
Location: Sunderland, UK
Remote status: Hybrid
