Job Details

ID #46103406
State Texas
City Arlington
Job type Permanent
Salary USD TBD
Source GM Financial
Showed 2022-09-30
Date 2022-09-29
Deadline 2022-11-27
Category Et cetera

Manager - Site Reliability Engineering

Arlington, Texas 76014, USA

Vacancy expired!

Overview

Opportunity to work in a hybrid model: work four days per week from home.

Why GM Financial?

At GM Financial, we're looking for an experienced and highly motivated Manager - Site Reliability Engineering. We are expanding our efforts into complementary data technologies for decision support in areas of ingesting and processing large data sets. Our interests are in enabling data science and Machine Learning applications on large, low-latency data sets in both batch and streaming processing contexts.

To that end, this role will engage with team counterparts in exploring, deploying, and reliably operating technologies for holding data sets created through a combination of batch and streaming transformation processes. These data sets support both offline and inline machine learning training and model execution; other data sets support search engine-based analytics. Exploration and deployment activities include identifying opportunities that impact business strategy, selecting data solutions software, and defining hardware requirements based on business requirements. Responsibilities also include documenting procedures for deploying, monitoring, managing, and switching the environments in production and disaster recovery sites. This role works alongside team counterparts to architect an end-to-end framework built on a group of core data technologies.

Responsibilities

Why Work Here?

  • A work environment built on teamwork, flexibility, and respect.
  • Flexible work options based on business needs.
  • GM Financial is committed to strengthening the communities where we live and work. Each year, through our Signature Events program, we select several philanthropic organizations to support.
  • Professional growth and development programs to help advance your career, as well as tuition reimbursement.
  • GM Financial 401(k) Savings Plan featuring a company match.
  • Paid holidays, paid time off, and floating holidays.
  • All full-time team members receive eight hours of paid time off to volunteer each quarter.

What will you do here?

  • Manage Object Store and Spark cluster environments, on bare-metal and container infrastructure, including service allocation and configuration for the cluster, capacity planning, performance tuning, and ongoing monitoring.
  • Define and refine processes and procedures for the site reliability engineering practice.
  • Set up, manage, and maintain Kubernetes-based scalable environments for high availability, and work with vendors to ensure smooth and continuous operations.
  • Work closely with data scientists, data architects, data engineers, ETL developers, other IT counterparts, and business partners to design and set up the environments that manage datasets ingested and processed from external sources, internal systems, and the data warehouse to extract features of interest.
  • Evaluate, research, and experiment with data processing, management, and scalability technologies in a lab to keep pace with industry innovation, while assessing business impact and viability for the use cases associated with efforts at hand.
  • Design, setup, test, deploy, monitor, document, and troubleshoot data processing and associated automation issues from the operations perspective.
  • Work with IT Operations and Information Security Operations on monitoring and troubleshooting incidents to maintain service levels.
  • Work with Information Security Vulnerability Management and vendors to remediate known, impactful vulnerabilities.
  • Contribute to the evolving distributed systems architecture to meet changing requirements for scaling, reliability, performance, manageability, and cost.
  • Report utilization and performance metrics to user communities.
  • Contribute to the planning and implementation of new/upgraded hardware and software releases.
  • Monitor the Linux, Hadoop, and Spark communities and vendors, and report important defects, feature changes, and enhancements to the team.
  • Research and recommend innovative and, where possible, automated approaches to administration tasks.
  • Identify approaches that improve resource utilization, provide economies of scale, and simplify support.
  • Perform other duties as assigned.
  • Conform with all company policies and procedures.

Qualifications

What you bring to the table:

  • 2-3 years' hands-on experience supporting Linux, AIX, or other production environments
  • 2-3 years' experience supporting Hadoop, Object Store, and/or Spark ecosystem technologies in production
  • 3-5 years' hands-on development/administration experience with Kafka, HBase, Spark, and Solr
  • 3-5 years' experience scripting in Bash, Perl, Ruby, or Python, along with Docker Datacenter
  • Strong working knowledge of disaster recovery, incident management, and security best practices
  • Working knowledge of containers (e.g., Docker) and major orchestrators (e.g., Mesos, Kubernetes, Docker Datacenter)
  • Working knowledge of automation tools (e.g., Puppet, Chef, Ansible)
  • Working knowledge of software defined networking
  • Working knowledge of setting up and customizing interactive data analytics tools (e.g., Apache Zeppelin, Jupyter notebooks)
  • Experience with networking infrastructure, including VLANs and firewalls
  • Experience in Cloud infrastructure development and management in production

Education:

  • Bachelor's Degree

#LI-Hybrid

#LI-NZ1

