Vacancy expired!
As an SME in Site Reliability Engineering in PAS (Personal Advisor Services), you will have the opportunity to put your operational savvy and engineering skills to work! On the job you'll partner with PAS teams to ensure the "-ilities" (availability, reliability, scalability, usability, etc.) of PAS systems in both test and production environments. Additionally, you can anticipate working with real-time monitoring and diagnostic data and analyzing trends. As a caretaker of these systems you'll collaborate and plan activities with PAS Technical Leads to ensure that application service level objectives are met.
The goal of this position is to lead and guide all the Personal Advisor Services teams in the following:
- Continually improve the monitoring, alerting, and automation to increase environment reliability, availability, performance, and overall system health.
- Champion the best monitoring setup and control reporting; create and update procedures for enterprise-wide adoption
- Employ analytics; define and report trends
- Run canary and smoke tests and manage chaos injection
- Experiment with and recommend new tools where needed
- Identify process gaps and implement process improvements to increase operational efficiency.
- Work with different groups to develop and improve monitors for products and infrastructure.
- Collaborate and lead both within the team and across the organization.
- Work with operations / Cloud Ops teams to ensure applications and services are highly available and reliable.
- Modernize the environments' tools and technologies to support the IT strategic goals of continuous delivery and cloud migration.
- Automate and implement the build and maintenance of the environments for applications.
- Bachelor's Degree preferred or equivalent technical experience
- An understanding and practical experience with containerization frameworks (Pivotal Cloud Foundry, ECS/Fargate, Heroku, Kubernetes, Docker)
- Experience with Atlassian suite (Jira, Bitbucket, Bamboo, Confluence), Git, Maven, Jenkins, Selenium, Nexus, Artifactory
- Strong background in Java Development
- You have been a part of or led agile development teams
- Experience working with Concourse, Jenkins, and/or Bamboo CI/CD pipelines
- Understanding of data ingestion and analysis with monitoring/telemetry solutions (Splunk, ELK, AppDynamics, etc.)
- Knowledge of Linux/Unix systems
- Passion for problem solving and strategic thinking and a desire to own and execute
- Experience dealing with production issues
- Understanding and application of at least one scripting language (Shell, PHP, Python, etc.) in pursuit of automation
- Experience with configuration automation (Chef, Ansible, Puppet)
- Experience implementing and maintaining distributed applications and systems (Microservices, 12-factor app)
- A flexible schedule - some activities you'll be performing may require off-hours or weekend support
- Experience configuring (or administering) application or infrastructure monitoring tools
- Development experience in scripting/query languages
- Experience with dashboarding tools