Data Engineer

Prodapt

Chennai | 2 Years Exp | Posted 45d ago

Job Description

  • Develop and maintain ETL pipelines for unstructured data (logs, documents, tickets)
  • Preprocess and transform data into model-ready formats (JSONL, embeddings, chunks)
  • Assist in small language model (SLM) fine-tuning workflows (dataset prep, training, evaluation)
  • Build and integrate APIs for model inference
  • Support data cleaning, deduplication, and validation
  • Collaborate with Tech Lead on model experiments and improvements

Must-Have Skills

  • Strong proficiency in Python (pandas, data processing, scripting) with experience in building AI/ML solutions
  • Experience with unstructured text processing and NLP basics
  • Experience in designing and implementing ETL pipelines for data cleaning, transformation, and batch processing
  • Experience in dataset creation and curation for model training, including instruction tuning, supervised fine-tuning, and evaluation datasets
  • Familiarity with machine learning frameworks (PyTorch / TensorFlow)
  • Experience in developing and integrating REST APIs using frameworks like FastAPI or Flask
  • Basic understanding of LLMs, embeddings, and fine-tuning concepts
