Site Reliability Engineer job vacancy

Vacancy expired!

Responsibilities:

Gain deep knowledge of our complex applications.
Serve as a primary point of contact responsible for the overall health, performance, and capacity of one or more of our technology products.
Understand the design principles of monitoring and alerting systems.
Design, implement, and maintain robust monitoring and alerting to improve performance and reliability.
Apply automation and configuration management, and develop infrastructure as code.
Use engineering best practices: deliver high-quality production code, utilize automated testing, and build reusable components.
Develop tools to improve our ability to rapidly deploy and effectively monitor custom applications in a large-scale Windows and Linux environment.
Work closely with development teams to ensure that platforms are designed with "operability" in mind.
Function well in a fast-paced, rapidly-changing environment.
Participate in the operations on-call rotation, triaging and addressing production issues.

Qualifications:

B.S. or higher in Computer Science or other technical discipline, or related practical experience.
Programming skills (Java and shell scripting, or Python, Ruby, Perl, or C).
5 or more years of experience in a large-scale Unix/Linux operations role.
Experience in designing, analyzing, and troubleshooting large-scale distributed systems.
Ability to debug production issues across services and levels of the stack.
Experience with one or more orchestration or deployment tools, such as Docker or Ansible.
Familiarity with Git or other source control systems.
Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines.
Shell or Python experience, specifically for systems automation.
Good experience with performance engineering tools such as Selenium, JMeter, and LoadRunner.
Working knowledge of the TCP/IP stack, internet routing and load balancing.
Experience with monitoring and alerting using technologies such as New Relic, SiteScope, Netcool, Dynatrace, ExtraHop, Moogsoft, Prometheus, Sensu, Nagios, and Splunk.
Optional: Experience designing, implementing, and deploying Docker, Kubernetes, and serverless (Functions or Lambdas).
Strong interpersonal communication skills (including listening, speaking, and writing) and ability to work well in a diverse, team-focused environment with other SREs, Engineers, Product Managers, etc.
Creative thinker and strong problem solver with meticulous attention to detail.
Highly organized, creative, motivated, and passionate about achieving results.
Strong experience with AWS (design, SDKs, best practices) – good to have.
AWS certifications – good to have.

ID	#15339048
State	Georgia
City	Atlanta
Job type	Contract
Salary	USD Depends on Experience
Source	Hexaware Technologies, Inc
Showed	2021-06-11
Date	2021-06-08
Deadline	2021-08-07
Category	Et cetera