Job Details

ID #53822204
State Massachusetts
City Boston
Job type Full-time
Salary USD TBD
Source Nexthink
Showed 2025-04-22
Date 2025-04-22
Deadline 2025-06-21
Category Other

Senior Site Reliability Engineer

Boston, Massachusetts 02108, USA

Vacancy expired!

Nexthink is looking for a Site Reliability Engineer who is passionate about building and running a high-performance cloud platform and enabling best-in-class site reliability and operations practices. This role will support US-based operations generally and will also focus on enabling Nexthink to serve the US Public Sector market, in particular through a FedRAMP Moderate offering. The candidate will implement modern, cloud-native SRE processes and manage the operations of Nexthink's multi-tenant, microservices-based cloud platform, which has multiple instances deployed across the globe. This role involves working closely with cross-functional teams to integrate reliability and security into our systems and to ensure they meet federal security standards. The ideal candidate will have extensive experience in both software engineering and systems administration, along with a strong understanding of FedRAMP concepts, requirements, and security practices.

Infrastructure Management:
Oversee the design, deployment, and management of scalable and secure cloud infrastructure.
Drive automation of infrastructure provisioning, configuration, and management using Infrastructure as Code (IaC) tools.

Monitoring and Performance:
Develop and maintain comprehensive monitoring, logging, and alerting systems to ensure high availability and performance.
Lead performance tuning and optimization efforts for applications and infrastructure.

Security and Compliance:
Ensure the implementation and maintenance of security controls and best practices needed to achieve FedRAMP compliance.
Conduct and oversee regular security assessments, vulnerability scans, and penetration testing.
Collaborate with the compliance team to prepare for and respond to FedRAMP audits.

Incident Management:
Lead incident management efforts, ensuring rapid resolution and thorough root cause analysis.
Develop and implement strategies for improving incident response and minimizing downtime.

Collaboration and Communication:
Work closely with development, operations, and security teams to integrate reliability and security into the software development lifecycle.
Communicate effectively with stakeholders, providing regular updates on system performance, reliability, and compliance status.

