Title: Technical Lead, Data Engineering
Area(s) of responsibility
Job Profile: Data Lakehouse Operations Platform Engineer (AWS)
Key Responsibilities:
- Develop and maintain data ingestion pipelines for various data sources, including transactional databases, streaming big data, and batch data, using tools such as GitHub, Terraform, AWS Glue, PySpark, Kafka, and ECS.
- Set up and manage batch orchestration jobs using Apache Airflow, ensuring timely execution and reliability.
- Monitor data pipelines continuously to ensure operational efficiency, and promptly address any anomalies or incidents that arise.
- Collaborate with the data governance team to ensure compliance with data governance guidelines, including data classification and data quality, using monitoring tools such as Grafana with metrics from Prometheus.
- Document operational procedures, incident reports, and performance metrics to support continuous improvement efforts.
Qualifications:
- Bachelor's degree in IT or information systems.
- Proficient in GitHub and GitHub Actions, as well as Terraform for infrastructure management.
- Experience with AWS Glue, Python, and PySpark for data processing.
- Familiarity with monitoring and visualization tools such as Grafana and Prometheus.
- Knowledge of Amazon Athena and open table formats, including Apache Iceberg.
- Experience with Amazon S3, IAM, and AWS Lambda.
- A solid understanding of object-oriented programming principles and SOLID principles is a plus.