Job Details

ID #8431084
State North Carolina
City Raleigh / Durham / CH
Full-time
Salary USD TBD
Source Wells Fargo
Showed 2021-01-21
Date 2021-01-21
Deadline 2021-03-22
Category Et cetera

Automation & Site Reliability Sr. Engineer

Raleigh / Durham / CH, North Carolina 27601, USA

Vacancy expired!

Job Description

Important Note: During the application process, ensure your contact information (email and phone number) is up to date and upload your current resume when submitting your application for consideration. To participate in some selection activities you will need to respond to an invitation. The invitation can be sent by both email and text message. In order to receive text message invitations, your profile must include a mobile phone number designated as 'Personal Cell' or 'Cellular' in the contact information of your application.

At Wells Fargo, we want to satisfy our customers' financial needs and help them succeed financially. We're looking for talented people who will put our customers at the center of everything we do. Join our diverse and inclusive team where you'll feel valued and inspired to contribute your unique skills and experience.

Help us build a better Wells Fargo. It all begins with outstanding talent. It all begins with you.

Wells Fargo Technology is a team of more than 40,000 information technology and security professionals who help keep Wells Fargo at the forefront of America's diversified financial services companies. Employees execute an engineering-led IT strategy to deliver stable, secure, scalable and innovative services that provide Wells Fargo global customers 'round-the-clock' banking access through in-store, online, ATM, and other channels. Wells Fargo Technology plays a critical role in the company's customer and employee experience, business and risk management transformation, and growth agenda.

Platform Management Engineering Services, within Enterprise Functions Technology (EFT), focuses on scaled horizontal enterprise solutions that are stable, secure, and always on. EFT Engineering Services is seeking an Automation Engineer and a Site Reliability Engineer (SRE) to be a part of a newly embedded Site Reliability Engineering practice within EFT supporting multiple technology divisions.

We believe that "Hope is not a Strategy" and we solve operational issues through code.

We are looking for an SRE who enjoys and thrives on solving complex problems through innovation impacting change at scale in a diverse environment. You will join a focused team of SREs introducing and advancing SRE discipline across several hundred applications and multiple vertical lines of business supporting the entire firm. The team will drive technology transformation and adoption of SRE-aligned enterprise capabilities and products, launch new tooling enablement, automate away complex issues and integrate with the latest technology. Site Reliability Engineers leverage their experience as software and systems engineers to ensure applications onboarded to SRE are available, have full-stack observability, are continuously tested, and are integrated with CI/CD. They introduce continuous improvement through code and automation, provide operational insight through analytics, and work with application teams to ensure the products and services we provide are always on.

The Automation Engineer will work within the Wells Fargo Platform Management team, partnering across platform teams, development teams, product owners, scrum masters, and other technology centers of excellence. They are responsible for engineering new solutions (automated and procedural) to improve platform and application stability, performance, staff productivity, metrics and reporting, ensuring all availability, architecture, quality, security, support and risk/compliance standards are met. The work includes collaborating with other teams to accomplish solutions and, in other cases, creating the solution.

The Automation Engineer will be responsible for the following:

Develop and oversee the engineering of automated solutions to improve platform and application stability, performance, staff productivity, metrics and reporting.

Design, build, deploy and maintain engineered solutions through collaborative efforts with team members and third party vendors.

Collaborate with other teams within the Enterprise to design and create effective solutions.

Engage in service capacity planning and demand forecasting, software performance analysis and system tuning.

Perform advanced troubleshooting of incidents in mission-critical systems (on call support as necessary) and participate in preventative problem management activities.

Partner with EFT technology partners/managers to influence and support innovation and a continued drive towards automation and touchless operational sustainment as a design/architecture construct.

This role is posted as an Automation Engineer; the Wells Fargo job title is Systems Operations Engineer.

Sustain operations and reduce risks in the ecosystem by aggressively pursuing safety and soundness actions, including but not limited to vulnerability remediation, patching, end of life and resiliency.

Manage and coordinate Production change requests and release management.

Manage continuous service improvements and drive innovation to meet SLAs, KPIs and OLAs for critical business processes, applications and partner interfaces.

The Site Reliability Engineer will be responsible for the following:

Instantiate the Site Reliability Engineering practice at Wells Fargo EFT, igniting the practice, principles, and culture and leading by example. Assist in training skilled peer engineers by growing the practice within EFT and partnering with peer platform embedded SRE teams.

Onboard 16 critical customer journeys and applications to Site Reliability Engineering, working within EFT and Lines of Business to assess the availability of critical business flows, identify service level objectives and indicators, instrument applications for observability, onboard to the CI/CD pipeline taking advantage of continuous testing, introduce continuous inspection and continuous improvement, and conduct destructive testing to reach 99.99% availability for the firm's critical products and services, leading to higher customer satisfaction and customer experience.

Introduce enterprise capabilities, tools, and innovation that improve availability in a multi-cloud ecosystem by evolving observability, monitoring, logging, CI/CD integration, and continuous testing (performance, smoke, regression, functional, chaos); introduce continuous improvement, standardization/automation, and capabilities to conduct destructive and resiliency testing.

Evolve AIOPS, ChatOps, and NoOps, introducing self-healing and autonomic capabilities that solve complex operational and systemic issues with precision, including building and training models, automating cognitive processes, and leveraging Robotic Process Automation, Unified Communication, and AI/ML to improve availability of the products we provide to customers.

Automate key SRE metrics and IT Service Operations processes, including customer impact, % availability of critical business flows, SLO/SLI adherence, and error budget; automate the incident process for IT Service Operations through data, integrating with unified communications and alerting/notification systems; and evolve ChatOps to reduce time to recovery.

Share support responsibilities for critical applications and customer journeys onboarded to SRE, including remediating issues through Agile, conducting blameless postmortems and root cause analysis, and introducing continuous improvement, solving problems once and for all with the goal of no repeats.

Proven Technical Expertise with one or more of the following:

Software Development: Java, Go, C/C++, Scala, R

OS and Platform: AWS, Lambda, PCF, Kubernetes, OpenShift, Linux, Azure, Windows, VMware

CI/CD and Automation: Jenkins, GitLab, SonarQube, Artifactory, Ansible, Puppet, Apigee

Observability and AIOPS: DataDog, Grafana, Prometheus, ELK, Elastic, Kibana, Kafka, CloudWatch, Jaeger, Zipkin, Kinesis, Apache Airflow, AppDynamics, Splunk

Experience in one or more of the following areas is desired:

AIOPS: Moogsoft, BigPanda, UiPath, Robotic Processing, Artificial Intelligence (AI) and Machine Learning (ML) Frameworks

Operations Tools: ServiceNow, PagerDuty, Microsoft Teams, Symphony/Slack, Remedy, IBM Netcool

Data/Data Structures: Oracle, SQL, Mongo, Hadoop, Cloudera, Spark, Teradata

Testing: Gremlin, Chaos Monkey, Selenium, JMeter, BlazeMeter, Performance Center, Perfecto, Gherkin, DevTest

Capacity Management: Turbonomic, BMC TrueSight

Required Qualifications

7+ years of software engineering experience

7+ years of development experience with languages such as Python, Java, Scala, or R

3+ years of build-deploy automation and configuration experience within the Linux and Unix environment

Desired Qualifications

An industry-standard technology certification

Strong verbal, written, and interpersonal communication skills

Scripting and automation experience

Experience with Ansible automation tool

Incident Management System experience

Configuration Management Tools experience

Experience with Agile Scrum (Daily Standup, Sprint Planning and Sprint Retrospective meetings) and Kanban

Excellent verbal, written, and interpersonal communication skills

Other Desired Qualifications

2+ years working with configuration and monitoring technologies such as Ansible, Telegraf, Grafana

3+ years of Unix or Linux administration experience

2+ years of design, implementation and governance experience with Artificial Intelligence, Natural Language Processing or Machine Learning architecture

4+ years of experience with Cloud technologies

Experience with Ansible automation

Experience with mass vulnerability remediation automation

Experience with system administration across multiple platforms

Experience with one or more Technology Platforms (Cloud, OS, etc.): Pivotal Cloud Foundry (PCF), AWS, Azure, Linux, VMware

Experience with Observability/Monitoring technologies: Splunk, DataDog, Elastic Stack/ELK, Grafana, Prometheus, Kafka, CloudWatch

Experience with Container technologies: Kubernetes, Docker, PKS

Experience with Site Reliability Engineering (SRE)

5+ years of experience with Agile, Kanban, or Lean methodology

Job Expectations

Willingness to work on-site at stated location on the job opening

Ability to travel up to 15% of the time

Street Address

NC-Raleigh: 1100 Corporate Center Dr - Raleigh, NC

NY-New York: 100 Park Ave - New York, NY

AZ-Chandler: 2600 S Price Rd - Chandler, AZ

MN-Minneapolis: 255 2nd Ave S - Minneapolis, MN

CA-Concord: 1755 Grant Street - Concord, CA

TX-DAL-Downtown Dallas: 1445 Ross Ave - Dallas, TX

Disclaimer

All offers for employment with Wells Fargo are contingent upon the candidate having successfully completed a criminal background check. Wells Fargo will consider qualified candidates with criminal histories in a manner consistent with the requirements of applicable local, state and Federal law, including Section 19 of the Federal Deposit Insurance Act.

Relevant military experience is considered for veterans and transitioning service men and women.

Wells Fargo is an Affirmative Action and Equal Opportunity Employer, Minority/Female/Disabled/Veteran/Gender Identity/Sexual Orientation.

Company: Wells Fargo

Req Number: 5561854-6

Updated: 2021-01-20 10:25:36.705 UTC

Location: Raleigh, NC
