Overview
Platform Reliability Manager is responsible for three key areas within the Digital Technology team:
Ensuring the availability of the production environment, lower environment availability, ensuring capacity for growth, and ensuring that the production services meet (or exceed) prescribed Service Level Agreements (SLAs) and Operational Level Agreements (OLAs).
Ensuring that Release Management Practices and Procedures are followed through modern Dev/Ops processes.
Ensuring resiliency of the Digital application through modern application monitoring/observability practices.
Key Areas of Responsibility
Application Monitoring Engineering
Implement modern Observability tools and processes.
Evangelize the importance of logging/monitoring within the ART
Work with Tech Partners to develop the appropriate feature thresholds.
Release Management
Establish and maintain healthy Pipelines for legacy and modern code
Ensure mandatory security and vulnerability scans are automated and up to date
Leverage Test Automation resources to provide fast feedback loops to development teams and business partners.
Environment Support Engineering
Coordinate the infrastructure standards (patching, firewall, SSL)
Establish access standards within the lower environments
Integrate within the infrastructure engineering team for automation.
Reduce lower environment problems through monitoring and reducing operational toil.
Proactively analyze the production environment, recommend improvements, and manage delivery of those recommendations
Manage and guide day-to-day activities within the application reliability teams
Continually evaluate production environments for:
Availability
Responsiveness
Accuracy
Scalability for both organic growth and readiness for acquisition-based growth
Define, document, and evolve application reliability support strategy
Collaborate with engineering, telecom, networking, and information security teams to ensure application reliability and system security
Define, design, and deploy value-added preventative maintenance tasks based on:
Performance testing of new projects and/or functionality being introduced into the production environment.
Vulnerability and/or penetration testing for security loopholes
Application reliability manager is responsible for minimizing the potential of unplanned outages. In the event of an outage, the manager engages problem resolution teams and establishes a business command center for communicating with the lines of business and technology stakeholders.
Application reliability manager is also responsible for establishing and maintaining on-call support arrangements for the production teams based on the criticality of the applications in their domain and their support model (onshore / offshore).
Application reliability manager is also responsible for ensuring that procedures and the application reliability strategy are documented and kept up to date. These may include:
Change management.
Application Access management – Re-certifications
Release and Deployment Management
Incident/problem management
Root cause analysis – for in-house applications and vendor-hosted applications on-premises.
Network and server setup diagrams
Backup and disaster recovery procedures per bank standards
Application Reliability Manager is also responsible for establishing metrics to evaluate the effectiveness of application reliability tasks. These may include operational metrics on systems, database and network performance metrics, and incident management reports.
Key Qualifications Criteria
10+ years of experience managing production environments within an organization or specifically within financial institutions.
Minimum of a Bachelor’s degree in computer science, or a minimum of 15 years of IT professional experience in lieu of a degree
Working knowledge of Site Reliability Engineering Principles or SRE Foundations Certified
Strong technical background from both application development and infrastructure perspective
Hands-on technical expertise in modern logging practices, observability tooling, pipelines, and testing principles.
Strategic Leadership and Vision with a Focus on the “OPS” side of DEV/OPS
Cross-Functional Collaboration/Building Alliances
Requires experience with SDLC procedures.
Must be detail oriented/articulate with strong time management and organizational skills
Experience with problem and incident management
Demonstrated ability to lead and to communicate effectively with staff at all levels of the organization, including both technical staff and business partners
Motivated, with excellent interpersonal skills, including the ability to lead and effect change
Demonstrated understanding of systems and processes within a complex environment
Ability to exercise substantial initiative and strong problem-solving skills
M&T Bank is committed to fair, competitive, and market-informed pay for our employees. The pay range for this position is $115,703.73 - $192,839.55 Annual (USD). The successful candidate’s particular combination of knowledge, skills, and experience will inform their specific compensation.
Location: Buffalo, New York, United States of America
M&T Bank Corporation is an Equal Opportunity/Affirmative Action Employer, including disabilities and veterans.