Job Details

ID #19626641
State Texas
City Plano
Job type Permanent
Salary USD TBD TBD
Source Pepsico
Showed 2021-09-15
Date 2021-09-13
Deadline 2021-11-11
Category Architect/engineer/CAD

Operations Center Engineer

Plano, Texas 75024, USA

Vacancy expired!

A Fortune 50 client is seeking an IT Operations Center Engineer to join their expanding SRE team in Plano, Texas. As Operations Center Engineer, you'll work with a variety of talented PepsiCo SRE teammates and serve as a driving force for IT stability, providing intelligent, proactive monitoring to ensure end-to-end insight into mission-critical services for all Technology Platforms.

This is a key technical role that drives observability and automation across the technology stack to prevent incidents and reduce MTTR. The role participates in Incident and Problem Management, providing technical guidance and direction on monitoring and automation to Managed Service Providers (MSPs). This lead role will leverage AIOps capabilities to help PepsiCo IT Operations become a forward-looking, proactive organization that keeps disruption to the business to a minimum. You will partner with Observability engineers to help create highly available observability solutions, transforming and accelerating PepsiCo's AIOps journey. We're looking for someone with solid experience using observability data to debug systems, to reduce the frequency and length of production incidents, and to provide a cohesive overall view of systems health. This engineer will partner with ITSM teams, multiple Managed Service Providers, and Operations SMEs across applications and infrastructure to drive the prevention agenda through monitoring, automation, and process innovations.

- Perform the role of a technical lead specialist in the Operations Center and drive towards overall IT stability.
- Drive the success of an AIOps-based Command Center by providing technical insights during major incidents and providing senior management updates.
- Work with Operations leads and SMEs to provide solutions that follow best practices for monitoring and automation to ensure maximum business benefit.
- Work with DevSecOps teams to provide reliability for services and minimize business impact.
- Work directly with Major Incident Managers (MIMs) and MSPs to identify the root cause of high and critical incidents and opportunities for automation.
- Provide technical and process guidance to team members and MSPs.
- Identify opportunities to improve operational stability and performance.
- Partner with the Problem Management team and MIMs to identify opportunities in monitoring and operations.
- Focus on improving the Mean Time to Resolve (MTTR) for major incidents and improving the Mean Time to Detect (MTTD).

- Master's degree in a related field
- Experience with monitoring and observability solutions and methodologies, including server and network performance, hardware, web synthetics, and application performance monitoring
- 12+ years of experience in an IT Support & Operations role
- 5+ years of experience in an IT Command Center
- Experience implementing and delivering monitoring and automation solutions in a complex hybrid environment
- Demonstrated competence in shell scripting and high-level programming languages, with a strong focus on Python
- Previous experience defining, creating, and supporting monitoring dashboards
- Broad technical knowledge of automation and self-heal capability enablement, observability, and AIOps
- Practical knowledge of various aspects of distributed service design, including messaging protocols, caching strategies, and autonomous software design practices
- Experience with monitoring and observability tools and the methodology of products such as Splunk, ElasticSearch, AppDynamics, Dynatrace, Solarwinds, Nagios, Graphite, Grafana, Prometheus, BigPanda, Solution Mgr., Focus Run, Datadog, etc.
- Strong understanding of the Open Systems Interconnection (OSI) model
- Solid understanding of performance metrics, KPIs, statistical calculations, machine learning, and correlation
- Ability to solve problems across the entire stack: operating systems (Linux/Unix/Windows), software, application, and network
- Experience with a variety of modern distributed software tools, e.g. service discovery, containerization, messaging
- Extensive experience with metrics and logging libraries and aggregators, and with data analysis and visualization tools
- Understanding of ITSM processes, with a focus on Event, Major Incident, and Problem Management
- Broad technical knowledge of OS and platforms: Azure, PCF, Kubernetes, Linux, Windows, VMware, AWS, Cisco, Infoblox, F5, Palo Alto
- API development using 3rd party libraries, REST/API
- Bachelor's degree or higher in computer science, engineering, or a related field
