Senior Software Engineer - Data Engineer

CGI

Bangalore | 4 years experience

Job Description

• Design, develop, and maintain scalable data pipelines and architectures using Python, SQL, and big data technologies.
• Build, manage, and optimize modular, testable, and efficient data ingestion and transformation scripts.
• Integrate data from various sources, including RESTful APIs, third-party services, and internal databases.
• Collaborate with Data Scientists, Analysts, and Backend Developers to deliver end-to-end data solutions.
• Write clean, efficient, and maintainable code following data engineering best practices.
• Optimize pipeline performance, processing speed, and scalability for large-scale datasets.
• Implement complex data flows and state management using orchestration tools and reactive data processing frameworks.
• Troubleshoot, debug, and resolve data quality issues, pipeline failures, and performance bottlenecks.
• Participate in code reviews and maintain high coding standards for data infrastructure.
• Ensure data security, integrity, and protection practices are followed (e.g., encryption, access control, GDPR/SOC 2 compliance).
• Contribute to data architecture, data modeling, and technical design discussions.
• Support deployment processes and work with data-centric CI/CD pipelines (DataOps).
• Stay current with emerging data technologies and cloud service updates, and suggest improvements to the data stack.

Required qualifications to be successful in this role:

Must-Have Skills:

• Strong proficiency in Python or Scala and advanced SQL (complex joins, window functions, query optimization).
• Hands-on experience with core data technologies such as Spark, Hadoop, or similar distributed processing frameworks.
• Solid understanding of data modeling (star/snowflake schemas), data warehousing, and lakehouse architectures.
• Experience with workflow orchestration tools such as Apache Airflow, Prefect, or Dagster.
• Strong experience building and consuming APIs for data ingestion and asynchronous data handling.
• Proficiency in Git and version control workflows.
• Understanding of software design principles (modularity, reusability, and maintainability) as applied to data systems.
• Experience with query optimization, performance tuning, and database indexing.
• Familiarity with data security practices (IAM roles, encryption at rest and in transit, data masking).
• Solid understanding of batch and real-time data processing patterns.

Good-to-Have Skills:

• Experience with cloud data warehouses such as Snowflake, BigQuery, or Redshift.
• Exposure to NoSQL databases and search engines (e.g., MongoDB, Cassandra, Elasticsearch).
• Familiarity with data validation and testing frameworks (e.g., Great Expectations, dbt).
• Experience with streaming technologies such as Apache Kafka, Flink, or Spark Streaming.
• Working knowledge of Infrastructure as Code (Terraform) and cloud services (AWS, Azure, or GCP).
• Understanding of machine learning pipeline integration (MLOps) and feature stores.
• Familiarity with containerization (Docker, Kubernetes) for data workloads.
• Basic understanding of CI/CD pipelines for automated data testing and deployment.
• Working knowledge of BI and visualization tools (e.g., Tableau, Power BI, Looker).
• Experience working in Agile/Scrum environments.