CloudOps Engineer
tieto
Job Description
- Be a champion for department initiatives and values by ensuring all actions promote the department’s mission statement
- Work closely with the AWS Professional Services team on the AWS migration project to migrate the Client’s products from other hosting platforms into the AWS Cloud.
- Work closely with the DevOps engineers to set up controlled environments for the Client’s products built on the AWS Cloud using IaC.
- Participate in product release cycles by working closely with Engineering Managers, Architects and Developers during the AWS migration.
- Create and maintain standard operating procedures (SOPs) for performing maintenance tasks, applying configuration changes, and remediating problems in the environment.
- Implement monitoring, alerting, notification and metrics collection for:
  - Infrastructure and application performance
  - System uptime
  - Error rate
- Monitor and continually improve the capacity and reliability of our production environment infrastructure.
- Investigate and fix performance and scalability bottlenecks; proactively identify issues and create work items to improve stability and performance.
- Identify single points of failure and other high-risk architecture issues, and propose resilient solutions that mitigate the risk, thereby improving system reliability.
- Identify opportunities for automation to reduce the operational workload; build scripts and introduce new tools and practices as needed.
- Work with other Cloud Infrastructure engineers and developers to ensure maximum performance, reliability and automation of our deployments and infrastructure.
- Communicate with stakeholders and handle deployment, maintenance, and support efficiently.
- Ticket Handling and Support:
  - Tickets should be handled with clear communication and with the correct stakeholders involved.
  - Tickets should be completed within the SLA; any delays or improperly filed tickets should be clearly communicated and documented.
  - Tickets should be closed with proper comments, including resolution steps and screenshots.
  - Repetitive tickets should be discussed in the standup call for brainstorming and should eventually lead to resolution through automation if necessary.
Skills Required:
- 5+ years of experience with a public cloud provider such as Amazon Web Services (AWS) and with on-prem servers
- Hands-on experience migrating workloads from on-prem or other cloud hosting platforms into the AWS Cloud.
- Solid understanding of standard TCP/IP networking, load balancing, and common protocols such as DNS and HTTPS
- Good knowledge of CI/CD tools such as Azure DevOps (ADO), GitHub Actions, and Jenkins
- Monitoring and logging: experience with application monitoring and logging tools (e.g., Datadog, New Relic, AppDynamics, Application Insights, ELK, Prometheus).
- Incident management experience (with tools such as PagerDuty)
- Good understanding of web servers and databases
- Good understanding of Docker and Kubernetes.
- Good scripting knowledge and understanding of software life cycle models.
- Good understanding of DevOps practices.
- Should have worked on high-traffic, highly scalable systems in the past
- Knowledge of the fundamental aspects of release automation (packaging, dependencies, promotion, deployment, compliance)
- A passion for collecting, evaluating, and improving performance metrics.
- Excellent time management, resource organization, and prioritization skills, and the ability to multitask in a fast-paced environment
- Ability to work quickly and efficiently with minimal supervision
- Excellent written and verbal communication skills
Qualifications:
- 5+ years of Systems Engineering experience in the following areas:
  - Cloud platforms (Azure, AWS) and on-prem servers
  - Windows and Linux servers
  - Application monitoring tools (Datadog, New Relic, AppDynamics, Application Insights)
  - Log aggregation tools (Datadog, ELK, etc.)
  - PowerShell, Bash, or Python scripting
  - CI/CD tools (Azure Pipelines, GitHub Actions, Jenkins, Octopus, etc.)
  - Infrastructure management tools (Terraform, Ansible, etc.)
  - Application Hosting (IIS,