Job Details

ID #15545428
State Virginia
City Reston
Job type Contract
Salary USD $45
Source Kellton Tech
Showed 2021-06-17
Date 2021-06-16
Deadline 2021-08-15
Category Et cetera

Site reliability engineer

Virginia, Reston, 20190 Reston USA

Vacancy expired!

Team: Digital Operations Center (DOC), a 7-person team that keeps all systems running 24x7. The team also backs up the user support center as needed.

Looking for an engineer with 2-5 years of experience and solid infrastructure monitoring and scripting skills. The team seeks someone with solid communication skills who loves to learn and is methodical about process and documentation.

Imagine working in an environment where applications make use of legacy technologies. Now imagine you are an integral part of the migration to new cutting-edge technologies. You help guide the migration by providing insights into system performance and reliability while leveraging new monitoring mechanisms created as part of an elite engineering team. If this sounds exciting to you, you will work with us as a Site Reliability Engineer.

You will be a member of the Digital Operations Team, a 24x7 operation, and help keep our systems and our customers protected. You will contribute to maximum availability, performance, reliability and security for positive digital experiences at the Client. As a champion of the Observability discipline, you will use leading-edge tools, partner with DevOps teams to adhere to best practices, and develop monitoring to proactively address issues before they affect performance or availability.

Technologies You'll Use: DC/OS, Kubernetes, Nginx, AWS, HybridCloud, Jenkins, Git, Docker, Kafka, Prometheus, Kibana, Elasticsearch, Grafana, Postgres, JavaScript or Python

Responsibilities:
o Work closely with DevOps teams to understand, evaluate, and propose solutions to meet current and anticipated future growth
o Using state-of-the-art tools, automate and optimize our cloud and internal infrastructure
o Design and develop tools to aid in improving infrastructure reliability
o Work with security analysts to apply best security practices with new product monitoring patterns
o Provide support and coordinate triage efforts to resolve technical issues for enterprise-class applications that are hosted in the cloud and datacenter
o Analyze production utilization and incident patterns, identify areas of improvement, and implement automation to improve productivity, avoiding manual tasks and recurring incidents
o Implement new software technology and coordinate simultaneous implementation tasks across teams
o Work with and lead other members of the team in staying on top of key industry innovation and technology, and assist in team development growth
o Incident management: act as a key responder during high-severity incidents and participate in the technical review of the incident for problem management (Root Cause Analyses (RCAs))
o Lead by example to instill a culture of continuous improvement and optimization among your counterparts
o Continually evaluate our processes and technologies and suggest areas for improvement

Skills we hope to see:
o Ability to operate in a high-pressure environment, troubleshoot complex issues quickly, and successfully handle multiple priorities
o A strong desire to continually learn new technologies, tools, and methodologies, including those out of your comfort zone
o A solid understanding of core computer science fundamentals, including common data structures, algorithms, and concurrent programming
o Strong attention to detail and excellent analytical capabilities
o Emphasis on quality
o A creative thinker's approach, not bound by "the way things have always been done". We're looking for someone to help us grow and continue to innovate in our industry.

Qualifications:
o 2+ years of experience as an SRE
o Bachelor's degree in Computer Science, Software Engineering, Computer Engineering, or equivalent experience
o Experience engineering and supporting production services in a modern, containerized cloud environment
o Familiarity with at least one major cloud platform, such as AWS, Azure, or Google Cloud Platform
o Familiarity with Unix/Linux systems
o Proficiency with one or more scripting languages such as JavaScript or Python
o Ability to engage effectively with business and technical teams
o Strong interpersonal skills as well as strong problem-solving and analytical skills
o Understanding of networking, routing, and security concepts
o Understanding of monitoring tools and concepts
o Experience working in a 24/7 support team
o Excellent written and verbal English communication skills
