Title: Technical Specialist - Data Engineering
Area(s) of Responsibility
AI Engineer
Role Summary
We are looking for a hands-on AI Engineer to develop and implement AI/ML models and pipelines for enterprise data management use cases — including automated data profiling, semantic mapping, cleansing, validation, enrichment, and deduplication. The candidate will work under the guidance of the AI Architect to build, test, and deploy intelligent automation components.
Key Responsibilities
- Develop and implement AI/ML models for automated schema discovery, data profiling, and statistical analysis of enterprise data sets.
- Build semantic mapping solutions using NLP and LLM techniques — generating source-to-target field mappings with confidence scoring.
- Implement automated data cleansing and standardization logic — format normalization, unit conversion, naming convention enforcement, and industry-specific rule application.
- Develop validation pipelines using AI-inferred rules based on target system metadata — including mandatory field checks, referential integrity, and business rule enforcement.
- Build enrichment modules to identify and fill missing data attributes using external reference sources and domain knowledge bases.
- Implement deduplication solutions using exact and fuzzy matching algorithms; configure survivorship logic and golden record creation.
- Create dashboards and reports for data quality metrics, mapping confidence, exception tracking, and load readiness indicators.
- Capture learnings and outputs into reusable knowledge repositories — mapping dictionaries, rule libraries, and exception pattern catalogs.
Required Skills & Qualifications
- 3–5 years of experience in AI/ML engineering, applied AI, or data engineering with AI/ML components.
- Strong proficiency in Python and AI/ML libraries — scikit-learn, TensorFlow, PyTorch, Hugging Face Transformers, LangChain, or similar.
- Experience applying LLMs and NLP to data tasks — semantic matching, entity resolution, schema mapping, anomaly detection, and text classification.
- Working knowledge of data quality techniques — profiling, cleansing, deduplication (exact and fuzzy), standardization, and enrichment.
- Familiarity with cloud platforms (AWS, Azure, or GCP), including their ML/AI services (SageMaker, Azure ML, Vertex AI) and data services (S3, Glue, Databricks).
- Understanding of ETL/ELT concepts, data pipeline design, and metadata-driven automation approaches.
- Ability to build visualizations and reports using Python libraries (Matplotlib, Plotly, Streamlit) or BI tools for data quality KPIs.
- Good understanding of common enterprise data objects — Items, Vendors, Customers, BOMs, Purchase Orders, Invoices, Inventory.
- Familiarity with version control (Git) and collaborative development practices.
Preferred
- Exposure to ERP data loading formats and interfaces (IDocs, BAPIs, flat file imports, API-based loading).
- Experience in Healthcare, MedTech, or Manufacturing data domains.
- Prior experience building AI/ML models for data quality, entity matching, or data classification use cases.
Education
- Bachelor's or Master's degree in Computer Science, Data Science, Artificial Intelligence, or a related field.