Vacancy expired!
Job summary:
Project: Create and maintain data pipelines between an on-premises data center, Azure Data Lake Storage, and an Azure Synapse database using Databricks and Apache Spark/Scala.
- This role is for a senior data engineer who will join a team responsible for managing a growing cloud-based data ecosystem consisting of a metadata-driven data lake and databases that support real-time analytics, extracts, and reporting.
- The right candidate will have a solid background in data engineering and several years of experience on a major cloud platform such as Azure.
Key skills:
- Databricks
- Apache Spark
- Scala programming
- Azure
- Data Warehousing / Big Data Best Practices
Experience requirements:
- Understanding of how best to partition and organize data depending on the technology and use case: 2 years
- Experience regularly working with datasets ranging from hundreds of terabytes up to 1-2 petabytes: 2 years
- Data engineering experience: 5 years
- Cloud platform experience: 2 years
- Version Control (Git or equivalent): 2 years
- Data Integration Tools (Spark/Databricks or equivalent): 2 years
- Scripting (Linux/Unix Shell scripting or equivalent): 2 years
Nice to have:
- Ab Initio experience
- Netezza Experience
Responsibilities:
- Building and maintaining a data processing framework on Azure using Databricks
- Writing code in Apache Spark/Scala
- Optimizing existing Databricks Delta Lake tables for change data capture (CDC) performance
- Working with existing Databricks notebooks to address performance concerns
- Creating new Databricks notebooks or stand-alone Apache Spark/Scala code as needed
- Working with existing on-premises data management tools as required, with a willingness to learn Ab Initio
- Experience level: Experienced
- Minimum 5 years of experience
- Education: No Degree Required
Required skills:
- Databricks (5 years of experience required)
- Apache Spark (5 years of experience required)
- Azure (5 years of experience required)
- Scala (5 years of experience required)