Roles & Responsibilities
Responsibilities
- Integrate data from multiple sources, such as databases, APIs, or streaming platforms, to provide a unified view of the data
- Implement data quality checks and validation processes to ensure the accuracy, completeness, and consistency of data (a minimal sketch follows this list)
- Identify and resolve data quality issues, monitor data pipelines for errors, and implement data governance and data quality frameworks
- Enforce data security and compliance with relevant regulations and industry-specific standards
- Implement data access controls, encryption mechanisms, and monitor data privacy and security risks
- Optimize data processing and query performance by tuning database configurations, implementing indexing strategies, and leveraging distributed computing frameworks
- Optimize data structures for efficient querying and develop data dictionaries and metadata repositories
- Identify and resolve performance bottlenecks in data pipelines and systems
- Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders
- Document data pipelines, data schemas, and system configurations, making it easier for others to understand and work with the data infrastructure
- Monitor data pipelines, databases, and data infrastructure for errors, performance issues, and system failures
- Set up monitoring tools, alerts, and logging mechanisms to proactively identify and resolve issues to ensure the availability and reliability of data
- A software engineering background is a plus
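As an illustration of the data quality checks mentioned above, here is a minimal, hypothetical sketch using PySpark. The dataset path, column names (order_id, amount), and failure rule are assumptions made for the example, not part of the role description.

```python
# Hypothetical data quality check stage in a PySpark pipeline.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-check-example").getOrCreate()

# Assume order data lands as Parquet; the path is illustrative only.
orders = spark.read.parquet("/data/raw/orders")

total = orders.count()

# Completeness: key columns must not be null.
null_keys = orders.filter(F.col("order_id").isNull()).count()

# Validity: amounts must be non-negative.
bad_amounts = orders.filter(F.col("amount") < 0).count()

# Consistency: order_id should be unique.
duplicates = total - orders.select("order_id").distinct().count()

issues = {"null_keys": null_keys, "bad_amounts": bad_amounts, "duplicates": duplicates}
if any(v > 0 for v in issues.values()):
    # In a real pipeline this would raise an alert or fail the run.
    raise ValueError(f"Data quality check failed: {issues}")
```

In practice such checks are usually wired into the pipeline orchestrator so that a failed check blocks downstream loads and triggers the alerting described in the monitoring responsibilities above.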
Requirements
- Bachelor's or master's degree in computer science, information technology, data engineering, or a related field
- Strong knowledge of databases, data structures, and algorithms
- Proficiency with data engineering tools and technologies, including data integration tools (e.g., Apache Kafka, Azure IoT Hub, Azure Event Hubs), ETL/ELT frameworks (e.g., Apache Spark, Azure Synapse), big data platforms (e.g., Apache Hadoop), and cloud platforms (e.g., Amazon Web Services, Google Cloud Platform, Microsoft Azure)
- Expertise in working with relational databases (e.g., MySQL, PostgreSQL, Azure SQL, Azure Data Explorer) and data warehousing concepts
- Familiarity with data modeling, schema design, indexing, and optimization techniques for building efficient and scalable data systems
- Proficiency in languages such as Python, SQL, KQL, Java, and Scala
- Experience with scripting languages like Bash or PowerShell for automation and system administration tasks
- Strong knowledge of data processing frameworks like Apache Spark, Apache Flink, or Apache Beam for efficiently handling large-scale data processing and transformation tasks
- Understanding of data serialization formats (e.g., JSON, Avro, Parquet) and data serialization libraries (e.g., Apache Avro, Apache Parquet); see the sketch after this list
- Experience with CI/CD and GitHub, demonstrating the ability to work in a collaborative and iterative development environment
- Experience with visualization tools (e.g., Power BI, Plotly, Grafana, Redash) is beneficial
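As a small illustration of the serialization formats listed above, here is a hypothetical sketch of writing and reading Parquet with the pyarrow library; the file name, column names, and values are made up for the example.

```python
# Hypothetical example: column-oriented serialization with Parquet via pyarrow.
import pyarrow as pa
import pyarrow.parquet as pq

# Build a small table; in a real pipeline this would come from an upstream stage.
table = pa.table({
    "device_id": ["a1", "a2", "a3"],
    "temperature": [21.5, 22.0, 19.8],
})

# Parquet stores data by column with built-in compression, which is why it is
# generally preferred over row-oriented JSON for analytics workloads.
pq.write_table(table, "readings.parquet", compression="snappy")

# Reading back only the columns you need avoids scanning the whole file.
subset = pq.read_table("readings.parquet", columns=["temperature"])
print(subset.to_pandas())
```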
Preferred Skills & Characteristics
Consistently displays independent work habits; goal-oriented, self-motivated, and passionate about a growth mindset. Self-driven and proactive in keeping up with new technologies and programming.
Skills
Scala
Azure
Big Data
Data Modeling
Pipelines
Software Engineering
MySQL
Scripting
Data Integration
Data Quality
Data Engineering
SQL
Python
Java
Data Warehousing
Databases