Vacancy expired!
SRE Team Lead - Cloud Stability
The Cloud Stability group is trusted to support Bloomberg's private cloud infrastructure. This infrastructure runs on our own open-source OpenStack distribution, built on OpenStack, Ubuntu, Chef, Ansible, and Ceph, alongside VMware hypervisors. It spans Bloomberg's own world-class data centers and global private network, hosting business-critical applications and services. You'll be trusted to ensure the high availability and scalability of this environment as part of a team that relies on software engineering and automation to address challenging infrastructure problems at scale.

What's in it for you:
You'll work with modern open-source computing platforms while maintaining mission-critical systems that host a wide array of applications. We'll depend on you to advise on the design, architecture, and scaling of our virtual farms, which use several different technologies, both open and closed source. In addition, you'll play a critical role in improving the stability of all cloud systems, helping us ensure we have a solid platform as we scale.

We'll trust you to:
- Lead a team of infrastructure and software reliability engineers responsible for our global cloud infrastructure
- Own the performance and availability of our cloud products, innovating with your team to continuously improve both
- Build automation around all phases of the cloud lifecycle, eliminating toil, automating responses to failures, and minimizing manual operational work wherever possible
- Inspire and motivate a high-performing team, leading by example while supporting individual growth and development
- Develop technical solutions which combine hardware and software to achieve required service levels, performance, and economics
You'll need to have:
- Experience managing teams responsible for mission-critical production systems
- Hands-on software development experience in C/C++, Python, or both
- The ability to take ownership of issues and drive them effectively to resolution
- Contributions to open-source projects, especially those related to cloud, automation, or performance monitoring
- Thorough knowledge of data structures and experience with complex production troubleshooting
- A keen interest in keeping abreast of technological advances and proven success at incorporating new technology into existing systems
- Experience working with large-scale distributed systems including deep dives into code, cloud infrastructure, networking, and operating systems