Site Reliability Engineer job vacancy

Vacancy expired!

The SRE team at is charged with creating tools, processes, and frameworks that ensure the stability of the Internet Computer, a distributed and scalable system. As a member of the team, you will work with the engineering, infrastructure, and security teams to build reliability and operability into the product from the start by participating in design and code reviews and by identifying risks, problems, and mitigations. This is not a team that exists to be on-call; it is a team that chooses to be on-call because doing so helps it do the job better.

Responsibilities:

Implement tools that ensure high availability of the product
Gain deep knowledge of complex applications
Identify opportunities to automate or improve processes and then implement the automation
Coordinate incident response across multiple teams, clearly understanding and communicating what is happening, the next steps, who is responsible for what, and so on
Implement observability tools to ensure visibility into service stability and performance
Be on-call for production services
Operate, troubleshoot, and deploy software on Unix systems
Approach problems in a systemic, methodical way, especially when troubleshooting

Required Skills:

Expertise in observability and monitoring of applications, services, and networks, using tools such as Prometheus, Grafana, and the ELK logging stack
Unix/Linux experience, including application installation, configuration, and maintenance
Significant experience with site reliability, developer productivity, DevOps, or server infrastructure engineering (including on-call incident response)
Understanding of Internet networking protocols: TCP/IP, TLS, DNS, HTTP/S, SMTP
Experience troubleshooting issues across the entire stack (hardware, software, network, etc.)
Experience writing automation scripts and utilities in a scripting language such as Python, Perl, Shell, or PHP
Experience with incident and problem management
Strong communication and interpersonal skills

Desired Skills:

Experience coding in Rust or C
Experience supporting large-scale, mission-critical services
Experience with CI/CD pipelines

ID	#15434030
State	California
City	San Francisco
Job type	Permanent
Salary	USD $140+
Source	Eclaro
Showed	2021-06-14
Date	2021-05-25
Deadline	2021-07-24
Category	Systems/networking