Vacancy expired!
Senior Site Reliability Engineer - Analytics Job Description
OVERVIEW
CoStar delivers real-time, verified commercial real estate data that helps clients confidently spot great opportunities and make smart choices ahead of competitors. By combining the power of CoStar's independent research organization - the industry's largest - with global data delivery, software, and application solutions, clients can act on opportunities with confidence.
The Analytics team is responsible for the development of CoStar's customer-facing Real Estate Analytics products. We think big, creating innovative data-intensive applications that take the vast amount of data collected by our CoStar Research teams to create a fast, reliable and intuitive analytics platform for our customers. We are a collaborative group with a mix of big data, API/platform and front-end skills, and we are growing rapidly to help invent the future of Real Estate Analytics.
We are searching for an experienced Site Reliability Engineer to join our team and be the bridge between development and operations. This person will apply a software engineering mindset to system administration topics such as availability, performance, monitoring, alerting, emergency response, and capacity planning for our customer-facing Analytics platform at CoStar. We have the industry's largest set of Commercial Real Estate property data, which lets us create innovative products that must be stable, reliable and high-performing.
RESPONSIBILITIES
- Monitor the availability and performance of our mission-critical applications and services.
- Design and develop real-time monitoring and alerting solutions that provide clarity into application health and performance.
- Understand the architecture of our applications and services and ensure that the necessary monitoring and alerting are in place. Where they are lacking, collaborate with other technical stakeholders on a plan to implement them.
- Clearly communicate the scope and impact of any incidents to teammates and stakeholders, following through to closure as necessary.
- Troubleshoot issues, some of them complex, using your knowledge of application architectures and our monitoring tools, and work with other team members as necessary to get fixes deployed.
- Manage your time effectively across monitoring, occasional incident response and longer-term project goals.
- Participate in service capacity planning, load testing, performance analysis and system tuning.
- Continually look to improve our performance, stability, tooling and processes as our applications evolve.
- Develop effective documentation for the processes and tooling that you implement, so that other team members can use them.
QUALIFICATIONS
- 3-5 years of experience with operational ownership of large-scale application architectures.
- Expertise with monitoring tools such as Datadog, Kibana, etc.
- Experience with architectures that use REST APIs or microservices
- Infrastructure as code using CloudFormation, Terraform or similar platforms
- Experience with AWS services such as S3, Lambda, DynamoDB, SageMaker, etc.
- Experience with SQL and NoSQL databases
- Understanding of the Software Development Life Cycle, including CI and CD pipeline architecture
- Experience working with container technologies such as Kubernetes and Docker
- Strong interpersonal and communication skills
- Programming experience or some developer background with either C# or Node.js
- Serverless architecture methodologies
- Agile methodologies and working within sprint cycles
BENEFITS
- Comprehensive healthcare coverage: Medical / Vision / Dental / Prescription Drug
- Life, legal, and supplementary insurance
- Commuter and parking benefits
- 401(k) retirement plan with matching contributions
- Employee stock purchase plan
- Paid time off
- Tuition reimbursement
- Complimentary gourmet coffee, tea, hot chocolate, prepared foods, fresh fruit, and other healthy snacks