Job Details

ID #49573257
State North Carolina
City Cary
Job type Contract
Salary USD TBD TBD
Source Codeforce 360
Showed 2023-03-28
Date 2023-03-27
Deadline 2023-05-26
Category Et cetera

SRE Lead (Onsite)

North Carolina, Cary, 27512 Cary USA

Vacancy expired!

Career Opportunity

Job Title: SRE Lead (Onsite)

About CodeForce 360

Making a career choice is among the most critical decisions one can make, and it is important to weigh factors such as a company's record of success since its inception. But when you come across a company whose reputation rests on an illustrious run of success since the day it began, you don't need to think of anything else. That's precisely what some of our employees and prospective employees think when they come across CodeForce 360.

Position Overview

SRE Lead (Onsite)

Requirements:

  • Experience working as an IT Operation Automation Solution Architect for a minimum of 2 years.
  • Experience implementing AIOps solutions
  • Strong experience with one AIOps platform (ServiceNow ITOM, Splunk ITSI, Moogsoft)
  • Strong experience with one orchestration and automation platform (ServiceNow Orchestrator, Ansible Tower, IPCenter)
  • Exposure to one APM/AIOps tool (Dynatrace, AppDynamics, Datadog, New Relic)
  • Exposure to at least a couple of infrastructure monitoring tools (for example, SolarWinds, ScienceLogic, Zabbix)
  • Exposure to one or more RPA platforms (Blue Prism, UiPath, Automation Anywhere, etc.)
  • 5+ years' experience designing, implementing, and managing one or more of the following monitoring platforms:
      • AppDynamics / Dynatrace
      • Datadog / Splunk / Moogsoft
      • Sensu
      • New Relic
  • 3-5 years designing, deploying, and managing one or more of the following:
      • Graphite
      • Prometheus
      • TICK stack
  • 3-5 years designing, deploying and managing log aggregation solutions with either Elastic or Splunk
  • Proficiency in at least one high-level programming language used for automation (Ansible, Python, Ruby, Go)
  • Experience developing monitoring integrations into:
      • ServiceNow
      • PagerDuty
      • Slack
      • Microsoft Teams
  • 3-5 years' experience as a system administrator in a predominantly Red Hat Linux environment
  • Proficiency with at least one of the following configuration management tools: Chef, Puppet, Ansible
  • Understanding of application development and deployment practices, primarily in Java
  • Experience monitoring large-scale containerized applications
  • 3+ years developing and designing dashboards in Grafana, Kibana, Tableau or equivalent
  • Identify operations to automate, and design and implement automation frameworks for provisioning, configuration, and deployment of infrastructure and applications.
  • Develop and implement effective incident management processes and procedures to minimize service disruptions and mitigate the impact of incidents when they occur.
  • Coordinate and lead incident response teams to quickly and effectively address incidents.
  • Analyze incidents to identify root causes and implement measures to prevent similar incidents in the future.
  • Develop and implement disaster recovery and business continuity plans to minimize service disruptions and data loss.
  • Develop and implement security best practices and standards to ensure the security and privacy of the data and platform.
  • Design and implement effective monitoring, logging, and alerting systems to ensure the stability and performance of the platform.
  • Collaborate with developers, product managers, and operations teams to ensure the platform meets the needs of the business.
  • Develop and implement continuous integration and delivery pipelines to ensure faster and more reliable software delivery.
  • Maintain a deep understanding of emerging technologies, best practices, and industry trends, and make recommendations for improvements to the platform.
  • Establish processes for deploying, managing, and troubleshooting production systems across multiple cloud providers and on-premise infrastructure.
  • Bachelor's or Master's degree in Computer Science or a related field.
  • 7+ years of experience in designing and managing large-scale, geo-distributed systems, with expertise in cloud computing, networking, and security.
  • Experience in automating operations tasks and deploying applications using continuous integration and delivery pipelines.
  • Strong problem-solving skills and ability to analyze incidents and identify root causes.
  • Passion for learning and staying up-to-date with emerging technologies and best practices.
  • Excellent communication and collaboration skills.
How to Apply

Job ID: JPC - 143431

For more information, please contact: Vishal Shinde

Qualified individuals will be contacted for an interview.
