Roles & Responsibilities
- Design, develop, and optimize ETL / ELT pipelines using Databricks (PySpark, Spark SQL, Delta Lake); a brief illustrative sketch follows this list.
- Implement data ingestion, transformation, and cleansing workflows from multiple data sources (structured and unstructured).
- Collaborate with data architects and business teams to ensure data models support analytics and reporting requirements.
- Manage and monitor Databricks clusters, jobs, and workflows for performance and cost optimization.
- Integrate Databricks with cloud platforms (e.g., Azure Data Lake, AWS S3, Synapse, Redshift, Snowflake, etc.).
- Develop and maintain CI / CD pipelines for Databricks notebooks and data workflows.
- Ensure compliance with data governance, security, and privacy standards.
- Troubleshoot and resolve issues in data pipelines and analytics jobs.
- Document technical solutions, data flows, and best practices.
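As a rough illustration of the pipeline work described in the first responsibility above, the sketch below shows a minimal PySpark ingest-and-cleanse job that writes to a Delta Lake table. It is a hypothetical example only: the storage path, column names, and table name are placeholders, not artifacts of this role or any existing codebase.

```python
# Minimal illustrative sketch: ingest raw files, apply basic cleansing,
# and write the result to a Delta Lake table. All names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("orders_ingest").getOrCreate()

# Ingest raw CSV landed in cloud object storage (placeholder path).
raw = (
    spark.read
    .option("header", "true")
    .csv("s3://example-bucket/raw/orders/")
)

# Basic cleansing: deduplicate, standardize types, drop rows missing the key.
cleansed = (
    raw.dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("order_id").isNotNull())
)

# Persist as a Delta table for downstream analytics (placeholder table name).
(
    cleansed.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("analytics.orders_cleansed")
)
```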
Requirements:
- Bachelor’s degree in Computer Science, Information Systems, Data Engineering, or a related field.

Required Skills:
- 3–5 years of hands-on experience with Databricks and Apache Spark.
- Strong proficiency in Python (PySpark), SQL, and Spark SQL.
- Experience with cloud data platforms (Azure, AWS, or GCP).
- Familiarity with Delta Lake, Parquet, and data lakehouse architecture.
- Experience with version control (Git) and DevOps practices (CI / CD).
- Understanding of data modeling, data warehousing, and ETL best practices.
- Strong analytical, problem-solving, and communication skills.

Preferred Skills:
- Experience with Databricks REST APIs, MLflow, or Unity Catalog.
- Exposure to Power BI / Tableau or other visualization tools.
- Familiarity with Airflow, Terraform, or Data Factory for orchestration and deployment.
- Knowledge of data governance and security compliance frameworks.