Vacancy expired!
Job Summary:
- A client of ours in Atlanta, GA is looking for a Site Reliability Engineer with Google Cloud Platform experience for a contract opportunity.
- The Site Reliability Engineer will manage the end-to-end application and system stack and work with one of the leading financial services organizations in the US.
- Site Reliability Engineering (SRE) is a discipline that combines software and systems engineering for building and running large-scale, distributed, fault-tolerant systems.
- SRE ensures that internal and external services meet or exceed reliability and performance expectations.
- SRE is also an engineering approach to building and running production systems: engineers build solutions to operational problems.
- SREs are responsible for overall system operation, using a breadth of tools and approaches to solve a broad set of problems.
- Practices include limiting time spent on operational work, blameless postmortems, and proactive identification and prevention of potential outages.
- You will be part of the team to migrate and transform the on-prem applications and data centers to public Cloud (Google Cloud Platform), and then.
- You will engage in and improve the software development lifecycle from inception and design, through development, deployment, operation and refinement
- Develop and maintain large-scale infrastructure
- Own the build tools and CI/CD automation pipeline
- You will influence and design infrastructure, architecture, standards and methods for large-scale systems
- You will support services prior to production via infrastructure design, software platform development, load testing, capacity planning and launch reviews
- You will maintain services during deployment and in production by measuring and monitoring key performance and service level indicators including availability, latency, and overall system health
- You will automate system scalability and continually work to improve system resiliency, performance and efficiency
- Investigate, diagnose, and resolve performance and reliability problems in a wide range of large-scale and high-throughput services
- Collaborate with architects and application engineers to ensure applications are maintainable, scalable, and follow appropriate disaster recovery and high availability strategies
- Contribute to the handbook, runbooks, and general documentation
- You will remediate tasks within the corrective action plan via sustainable, preventative, and automated measures whenever possible
- BS degree in Computer Science or related technical field, or equivalent job experience required
- 4+ years of SRE experience in Cloud environments
- 2+ years of experience developing and/or administering software in public cloud
- Strong working knowledge of and hands-on experience with Google Cloud Platform (GCP)
- Experience in DevOps and CI/CD pipelines and build tools like Jenkins.
- 2-4 years of experience in languages such as Python, Ruby, Bash, Java, Go, Perl, JavaScript and/or Node.js
- Experience managing Infrastructure as code via tools such as Terraform or CloudFormation
- Must have great communication skills
- Experience operating a production environment at high scale with emphasis on availability and latency
- Deep knowledge of container and orchestration tools such as Docker and Kubernetes
- Familiar with configuration management and deployment tools such as Chef and Octopus
- Experience in software development in one or more of the following: C, C++, Java, Go, Perl and/or Python.
- Prior experience in developing and/or administering software on Windows with .NET applications
- Strong team player with a "can do" attitude, and the flexibility to jump in wherever needed
- Demonstrable cross-functional knowledge with systems, storage, networking, security and databases
- System administration skills, including automation and orchestration of Linux/Windows using Chef, Puppet, Ansible, SaltStack and/or containers (Docker, Kubernetes, etc.)
- Proficiency with continuous integration and continuous delivery tooling and practices
- Strong analytical and troubleshooting skills
- Ability and willingness to learn and apply new tools and technologies
- Extra Points for any of the following:
- Prior experience in developing applications in .NET technologies or Java
- You have expertise designing, analyzing and troubleshooting large-scale distributed systems.
- You take a system problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
- You are passionate about automation, with a desire to eliminate toil whenever possible
- You've built software or maintained systems in a highly secure, regulated or compliant industry
- You thrive in and have experience and passion for working within a DevOps culture and as part of a team
- Kubernetes
- AWS
- Docker
- DevOps