Senior Software Engineer

Caterpillar

Bangalore, 8 years of experience

Responsibilities

Design, develop, and maintain scalable data pipelines on AWS using services such as S3, Glue, Lambda, Redshift, and EMR.
Build and optimize data warehousing solutions using Snowflake, including performance tuning and data modeling.
Write efficient and reusable code in Python and SQL for data transformation and processing.
Collaborate with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements.
Develop and optimize solutions using graph databases (e.g., Neo4j, Amazon Neptune), including query design and performance tuning.
Design, build, and operate vector database solutions (e.g., Milvus, Amazon OpenSearch) to support semantic search, recommendations, RAG, and AI-driven use cases.
Integrate vector databases with LLM-based applications and AI workflows.
Monitor, troubleshoot, and improve pipeline performance and reliability.
Ensure data quality, integrity, and security across all stages of the pipeline.
Participate in code reviews, architecture discussions, and continuous improvement initiatives.

Required Qualifications

8+ years of experience in data engineering or related roles.
Strong hands-on experience with AWS cloud services, including data and AI workloads.
Deep understanding of Snowflake architecture, performance tuning, and best practices.
Advanced proficiency in Python and SQL for data pipelines, transformations, and services.
Strong understanding of graph and vector data modelling concepts and their practical applications.
Hands-on experience with graph databases (e.g., Neo4j, Neptune) and vector databases (e.g., Milvus, Amazon OpenSearch).
Experience with version control systems (e.g., Git) and Git workflows.
Experience working with Azure DevOps (AzDO) boards for backlog management in Agile environments.
Excellent analytical and problem-solving skills.
Strong communication and collaboration abilities.
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.

Preferred Qualifications

Experience with orchestration tools such as AWS Step Functions.
Familiarity with data governance and compliance practices.
Exposure to real-time data processing frameworks (e.g., Kafka, Spark Streaming).

More detail on Knowledge Base

Experience designing and deploying data ingestion pipelines for unstructured sources such as PDFs, Word documents, and HTML files, including text extraction, chunking strategies, and embedding generation at scale.
Hands-on expertise with vector databases, specifically Milvus, covering schema design, indexing, and optimizing write performance for large-scale embedding ingestion pipelines.
Proficiency in building knowledge graph ingestion pipelines using graph databases, including entity extraction, relationship modelling, and populating nodes and attributes.
Strong pipeline engineering skills in Python and in frameworks for orchestrating multi-stage document processing workflows, with experience deploying and monitoring these pipelines in production environments.
Bonus: Exposure to RAPIDS libraries (cuDF, cuML, cuGraph) or CUDA-based tooling for GPU-accelerated data processing, enabling faster transformation and optimization during large-scale ingestion workflows.