Key Responsibilities:
- Design and develop scalable data pipelines across Hadoop (Hive, Impala, Spark, Kafka, Iceberg) and Teradata environments.
- Build ingestion and transformation frameworks using Java, Spark, Python and shell scripts.
- Develop full-stack applications and internal tools using Python, shell scripting, and modern web frameworks (e.g., Flask, React).
- Create APIs and microservices to expose data and ML models securely to downstream systems and user interfaces.
- Collaborate with data scientists to operationalize ML models using Cloudera Machine Learning (CML).
- Build and deploy GenAI/LLM-powered applications for intelligent data interaction, summarization, and automation.
- Implement enterprise-grade security controls including RBAC, LDAP, Kerberos, Apache Ranger, and row-level access.
- Tune and optimize data applications for performance across Hadoop and Teradata, ensuring efficient resource utilization.
- Support sandbox environments for prototyping, enabling users to build ML models, dashboards, and data pipelines.
Required Skills & Experience:
- Data Engineering: Strong experience with the Hadoop ecosystem (Hive, Impala, Spark, Kafka, Iceberg, Ranger, Atlas), Teradata, and data pipeline orchestration.
- Full Stack Development: Proficiency in Python, shell scripting, REST APIs, and web frameworks (Flask, React, etc.).
- Machine Learning & AI: Hands-on experience with ML platforms (CML), Spark MLlib, Python ML libraries (scikit-learn, XGBoost), and model deployment.
- GenAI/LLM Applications: Familiarity with building applications using large language models (e.g., OpenAI, Hugging Face, LangChain) for enterprise use cases.
- Security & Governance: Experience with enterprise data security (LDAP, Kerberos, RBAC), data masking, and access control.
- Performance Tuning: Proven ability to optimize data applications and queries in large-scale environments (Hadoop, Teradata).
- Tools & Platforms: Cloudera Data Platform (CDP), Informatica, QlikSense, Apache Oozie, Git, CI/CD pipelines.
- Soft Skills: Strong analytical and problem-solving skills, excellent communication, and ability to work in cross-functional teams.
Skills Required:
Python, TensorFlow, Deep Learning, Data Analysis, SQL, Cloud Computing, Machine Learning, Big Data, Hadoop, Kafka, Pyt, ML, Java, CDP