- 4+ years of experience as a DevOps or Site Reliability Engineer, or in a similar software engineering role
- Hands-on experience managing production workloads in AWS
- Strong Unix and scripting experience, with a demonstrated ability to automate tasks and streamline processes using Python; Django experience is a plus
- Experience with Infrastructure as Code (IaC) and configuration management tools (e.g., Terraform, Ansible, Chef) is a must
- Proficiency in observability practices, such as monitoring, SLO-based alerting, and telemetry collection, using tools like Grafana, Datadog, Sentry, and Prometheus
- Problem-solving attitude with a proactive approach to identifying and resolving issues in a timely manner
- BTech in Computer Science, Engineering, or a related field
- AWS certifications are a plus
- Design, implement, and maintain large-scale cloud-based systems, ensuring high availability and scalability
- Collaborate with cross-functional teams to identify and prioritize technical requirements and ensure timely delivery of projects
- Develop and maintain complex Bash scripts and Python utilities to automate system administration tasks
- Lead initiatives to improve the efficiency and scalability of our systems, including optimizing infrastructure and automating processes
- Ensure the smooth operation of our cloud infrastructure, including monitoring and troubleshooting systems and resolving technical issues
- Develop and implement cloud security policies and procedures to protect our systems and data
- Stay up to date with emerging technologies and trends in DevOps engineering, and make recommendations for innovation and improvement
- Provide technical leadership and mentorship to junior engineers, and contribute to the development of team members through knowledge sharing and code reviews