Job Details

ID #43488152
State New Jersey
City Piscataway
Job type Contract
Salary USD Depends on Experience
Source Pyramid Consulting, Inc.
Showed 2022-06-22
Date 2022-06-16
Deadline 2022-08-15
Category Et cetera

SRE Engineer

New Jersey, Piscataway, 75014 Piscataway USA

Vacancy expired!

Description:
  • Client is committed to providing the best omnichannel, personalized experience to its customers across all channels, including Digital, Retail, Indirect, and Customer Care. To enable this best-in-class customer experience, we are implementing Site Reliability Engineering (SRE) practices and principles across all customer interactions and applications. The role of the SRE team is to operate mission-critical applications in production and do whatever is necessary to keep the site up and running; the role is often described as a software engineer doing operations work. The SRE lead engineer will be responsible for establishing and maintaining service levels agreed upon with the Business and for managing error budgets for each of their systems. You will be expected to balance your time between operational work (making sure systems work as expected) and improving the systems by writing software to automate processes and reduce toil.
  • We are looking for an SRE Automation lead with 6 years of experience in distributed systems design and integration architectures for business applications using microservices, containers, and cloud. You will lead a team to build a new automation framework for IT operations or modify an existing one, and apply an engineering mindset and development skills to IT operations to improve the overall observability of applications and infrastructure, developing the automation framework in such a way that it reduces manual effort.

What you’ll be doing
  • In this role, you will lead a team that develops the SRE automation framework and practices all tenets of SRE, providing vision and technical leadership to enable best-in-class operations practices that improve the reliability of applications.
  • The incumbent should be a strong technical lead who can help execute our vision for Site Reliability Engineering (SRE), determining how systems relate to each other and using a breadth of tools to build automation that improves reliability for customers. Practices such as limiting time spent on operations and proactively identifying automation opportunities factor into the iterative improvement that is key to both product quality and interesting, dynamic day-to-day work.
  • Implement SRE automation, develop automation across the stack, and optimize operations hours by reducing manual operations.
  • Eliminate toil by automation across all the layers – infrastructure provisioning, configuration management, deployment, testing, and operation.
  • Work on retooling our infrastructure to provide an agile, cloud-based foundation with a common infrastructure management and automation framework.
  • Interface directly with senior staff members within the organization to discuss and assess compliance with IT policies, standards and procedures, suggest opportunities for improvement, and report on the status of specific. Work with development teams throughout the software life cycle ensuring sustainable software releases.
  • Practice sustainable incident response and blameless postmortems.

What we’re looking for
  • Bachelor’s degree or equivalent experience required; Master’s preferred
  • 6+ years of experience in applications development, infrastructure, or database architectures
  • Good understanding of SRE practices and principles to build resilient systems and to provide business continuity.
  • Automation experience and ability to code or script at an advanced level
  • Build and drive adoption for SRE automation for IT operations and deployments
  • Lead and participate in discussion to identify opportunities for SRE automation
  • Ability to program with one or more high-level languages or tools, such as Python, Ansible, CloudFormation, Java, Ruby, and JavaScript
  • Perform analytics on previous incidents and usage patterns to come up with automation opportunities and take proactive actions
  • Experience with software development, software delivery lifecycle, application modernization, DevOps, Service/Infrastructure as Service and Operations
  • Experience in systems architecture; in-depth knowledge of SRE, IT operations, and cloud; coding and scripting experience with Java, JavaScript, Python, and .NET; understanding of AI/ML
  • Intellectual curiosity, problem solving and collaboration skills
  • Experience in vendor management
  • Experience in IT Security and compliance, operations and network services, and application development
  • Experience in Cloud & Container platform Strategies, Design, Architecture and Migration
  • Excelling in delivering high-value solutions in dynamic and ambiguous environments
  • Leading medium to large projects by bringing together the right perspectives, identifying roadblocks, and integrating feedback from clients and team members
