Data Engineer
Prodapt
Job Description
- Develop and maintain ETL pipelines for unstructured data (logs, documents, tickets)
- Preprocess and transform data into model-ready formats (JSONL, embeddings, chunks)
- Assist in small language model (SLM) fine-tuning workflows (dataset preparation, training, evaluation)
- Build and integrate APIs for model inference
- Support data cleaning, deduplication, and validation
- Collaborate with Tech Lead on model experiments and improvements
Must-Have Skills
- Strong proficiency in Python (pandas, data processing, scripting) with experience in building AI/ML solutions
- Experience with unstructured text processing / NLP basics
- Experience in designing and implementing ETL pipelines for data cleaning, transformation, and batch processing
- Experience in dataset creation and curation for model training, including instruction tuning, supervised fine-tuning, and evaluation datasets
- Familiarity with machine learning frameworks (PyTorch / TensorFlow)
- Experience in developing and integrating REST APIs using frameworks like FastAPI or Flask
- Basic understanding of LLMs, embeddings, and fine-tuning concepts