Job Description
Mad Street Den® is looking for a Platform Engineering Manager to lead a team of platform
engineers building scalable, reliable software systems on the cloud (AWS, Azure, GCP). You
will work closely with Data Scientists and ML Engineers to architect, orchestrate and deploy
state-of-the-art ML Systems.
Responsibilities:
- Take ownership of development and advancement of the roadmap for the engineering platform
- Design, architect and evolve multiple platform systems and provide hands-on technical leadership and mentorship to your team
- Provide strong technical leadership in cloud/network architecture and be able to work across multiple cloud providers easily
- Work on top of multiple distributed systems and open-source datastores like SolrCloud, Elasticsearch, and PostgreSQL
- Work closely with internal teams using your platform and help them deploy their services on the Cloud using Kubernetes and Docker
- Develop and evolve infrastructure automation using tools like Terraform and Ansible, and provision and maintain multiple environments
- Set up and iterate on strong engineering and DevOps practices (CI/CD) using tools like Jenkins
- Make sure the infrastructure can meet high-throughput, low-latency requirements at scale
- Be a hands-on leader in production, minimizing downtime, optimizing costs and setting up 360-degree observability for site reliability teams
- Hire, build and nurture your team and create a culture of high performance and quality engineering practices
- Engage with external/client-side technical teams when necessary and manage the relationship
Qualifications:
- 10+ years of experience working on systems & platforms, Linux, OS, Networking, AWS, Web Services
- 5+ years of experience building and managing teams
- A mind for systems. You can appreciate a well-designed architecture. You believe simplicity is the ultimate sophistication
- Strong experience in at least one programming language such as Python, Go, or Java
- Strong experience on the entire stack of at least one cloud provider
- Exceptional understanding of the problems in deploying and maintaining distributed systems
- Experience handling large volumes of traffic and data in production, and performance-tuning different types of datastores
- Experience in building microservices using tools like Kubernetes and Docker
- Ideally, you have been part of a startup building large-scale systems on the cloud. You have seen load balancer dashboards in your dreams. You can almost anticipate when systems will go down