Senior Software Engineer
Caterpillar
Job Description
- Design, develop, and maintain scalable data pipelines on AWS using services such as S3, Glue, Lambda, Redshift, and EMR.
- Build and optimize data warehousing solutions using Snowflake, including performance tuning and data modeling.
- Write efficient and reusable code in Python and SQL for data transformation and processing.
- Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements.
- Develop and optimize solutions using graph databases (e.g., Neo4j, Amazon Neptune), including query design and performance tuning.
- Design, build, and operate vector database solutions (e.g., Milvus, Amazon OpenSearch) to support semantic search, recommendations, RAG, and AI-driven use cases.
- Integrate vector databases with LLM-based applications and AI workflows.
- Monitor, troubleshoot, and improve pipeline performance and reliability.
- Ensure data quality, integrity, and security across all stages of the pipeline.
- Participate in code reviews, architecture discussions, and continuous improvement initiatives.
Required Qualifications
- 8+ years of experience in data engineering or related roles.
- Strong hands-on experience with AWS cloud services, including data and AI workloads.
- Deep understanding of Snowflake architecture, performance tuning, and best practices.
- Advanced proficiency in Python and SQL for data pipelines, transformations, and services.
- Strong understanding of graph and vector data modeling concepts and their practical applications.
- Hands-on experience with graph databases (e.g., Neo4j, Neptune) and vector databases (e.g., Milvus, Amazon OpenSearch).
- Experience with version control systems (e.g., Git) and Git workflows.
- Experience working with Azure DevOps (AzDO) boards for backlog management in Agile environments.
- Excellent analytical and problem-solving skills.
- Strong communication and collaboration abilities.
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
Nice-to-Have Skills
- Knowledge of the NVIDIA ecosystem and its applications in data and AI.
Preferred Qualifications
- Experience with orchestration tools such as AWS Step Functions.
- Familiarity with data governance and compliance practices.
- Exposure to real-time data processing frameworks (e.g., Kafka, Spark Streaming).
More Detail on Knowledge Base
- Experience designing and deploying data ingestion pipelines for unstructured sources such as PDFs, Word documents, and HTML files, including text extraction, chunking strategies, and embedding generation at scale.
- Hands-on expertise with vector databases, specifically Milvus, covering schema design, indexing, and optimizing write performance for large-scale embedding ingestion pipelines.
- Proficiency in building knowledge graph ingestion pipelines using graph databases, including entity extraction, relationship modeling, and populating nodes and attributes.
- Strong pipeline engineering skills in Python and in frameworks for orchestrating multi-stage document processing workflows, with experience deploying and monitoring these pipelines in production environments.
- Bonus: Exposure to RAPIDS libraries (cuDF, cuML, cuGraph) or CUDA-based tooling for GPU-accelerated data processing, enabling faster transformation and optimization during large-scale ingestion workflows.