Vacancy expired!
- At this client, Production Support and Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to manage and support large-scale, massively distributed, fault-tolerant systems hosted in an external cloud environment.
- When incidents occur, the Production Support engineer is responsible for taking ownership; consulting, engaging, and partnering with Lines of Business to lead the team toward a successful resolution; and conducting problem management activities and implementing prevention strategies.
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems (e.g. end-to-end troubleshooting of 3-tier applications).
- Experience using scripting language(s) to debug and optimize code and to automate routine tasks.
- Experience using and supporting external cloud environments (e.g. troubleshooting cloud-hosted microservice failures); experience with AWS.
- Experience with enterprise monitoring and observability solutions (Splunk, Datadog, PagerDuty, or New Relic).
- Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.