Site Reliability Engineering: DevOps & Automation

Aera Technology

Pune, IN
  • Job Type: Full-Time
  • Function: IT
  • Post Date: 06/15/2021
  • Website: aeratechnology.com
  • Company Address: 707 California Avenue, Mountain View, CA, 94041

About Aera Technology

Aera Technology delivers the Cognitive Operating System™ that enables the Self-Driving Enterprise™. Aera understands how businesses work; makes real-time recommendations; predicts outcomes; and acts autonomously. Using proprietary data crawling, industry models, machine learning and artificial intelligence, Aera is revolutionizing how people relate to data and how organizations function.

Job Description

Do you want to be part of a world-class team of Software Engineers who are shaping the future of enterprise software?
 
At Aera Technology, we apply Internet scale technology to the challenges facing enterprise businesses. Think of the self-driving car: connected, always-on, thinking, and autonomous. Our mission is to enable companies in the same way.
 
Site Reliability Engineering at Aera is creating next-generation, hybrid cloud infrastructure that enables our SaaS platform to process billions of Machine Learning transactions on petabytes of data every day.
 
As our customer base rapidly grows, we are looking for experienced Site Reliability Engineers to join our global Software Engineering team and help us deliver our vision.

In this role you will:

    • Design, build, release, and maintain a fully automated Infrastructure as Code ecosystem that ensures at least four nines (99.99%) availability of our SaaS platform.
    • Continuously innovate your way out of existing and yet-to-be-discovered problems, with an eye on “what’s next” as we anticipate and remain ahead of customer expectations.
    • Obsess about, measure, and optimize system performance, continuously pushing your capabilities beyond current boundaries as our platform scales and our customer base grows.
    • Learn what a “healthy” platform ecosystem looks like, and build observability into the platform so that problems are detected and addressed before they impact service availability.
    • Seek out and build relationships across teams that strengthen our culture of collaboration and innovation, with an understanding of how your work contributes to the bottom line of the business.

Your day will consist of:

    • Participating in infrastructure design, platform management, and capacity planning discussions to ensure we are scaling to meet business needs.
    • Writing code that automates activities that have historically been executed manually.
    • Gathering and analyzing metrics from our platform using Observability methods to assist in performance tuning, debugging, and root cause analysis.
    • Collaborating with development teams to improve our platform services through innovative new designs, rigorous testing and release methods.
    • Ensuring we are meeting our Service Level Objectives (SLOs) by reviewing our Service Level Indicators (SLIs), and reporting deviations along with remediation and mitigation plans and schedules.
    • Helping restore service availability, then debugging and performing root cause analysis for issues that occur in our production environments.
    • Helping provide 24/7/365 coverage in a “Follow-the-Sun” model for on-call support.

Your ideal qualifications are:

    • A Bachelor’s degree in Computer Science or a related technical or scientific discipline. A strong background in advanced mathematics is a plus.
    • Ability to write code (structured and object-oriented) in one or more high-level languages, such as Python, Java, C/C++, or JavaScript.
    • Ability to write infrastructure automation code with tools such as Terraform and Ansible.
    • Working knowledge of Cloud-based technologies, providers, and tools such as Kubernetes, “service meshes”, AWS, Azure, GCP, etc.
    • Experience with large-scale distributed systems that incorporate modern databases (e.g., Cassandra, SQL) and big data platforms (e.g., Exasol).
    • Experience using real-time and historical monitoring tools such as ELK, Datadog, Prometheus, and Nagios to troubleshoot platform issues.
    • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks, as well as an unwavering commitment to identifying root causes of infrastructure issues and resolving them.
    • 3+ years of experience as an SRE maintaining complex, distributed systems in real time.

At Aera, we're on a mission to solve the biggest, most intractable challenges of enterprise software. We envision the rise of the Self-Driving Enterprise: a more autonomously functioning business with a central operating system that connects and orchestrates business operations. Our platform is increasingly used by the world's largest companies to identify and respond to market opportunities faster.
 
If you share our passion for building the next generation of enterprise software, and implementing it for the most sophisticated customers in the world, you’ve met your match. Headquartered in Mountain View, California, we're growing fast, with teams in Mountain View and San Francisco (California), Bucharest and Cluj-Napoca (Romania), Paris (France), Munich (Germany), London (UK), Pune and Bangalore (India), Sydney (Australia) and Singapore. So join us, and let’s build this!

Disclaimer: Local Candidates Only
This company does NOT accept candidates from outside recruiting firms. Agency contacts are not welcome.