Site Reliability Engineer (SRE)/AWS

Olive

Remote / United States of America
  • Job Type: Full-Time
  • Function: IT
  • Post Date: 05/04/2021
  • Website: oliveai.com
  • Company Address: 99 East Main Street, Columbus, OH, 43215

About Olive

Olive builds artificial intelligence and RPA solutions that empower healthcare organizations to improve efficiency and patient care while reducing costly administrative errors.

Job Description

Olive’s AI workforce is built to fix our broken healthcare system by addressing healthcare’s most burdensome issues -- delivering hospitals and health systems increased revenue, reduced costs, and increased capacity. People feel lost in the system today and healthcare employees are essentially working in the dark due to outdated technology that creates a lack of shared knowledge and siloed data. Olive is designed to drive connections, shining a new light on the broken healthcare processes that stand between providers and patient care. She uses AI to reveal life-changing insights that make healthcare more efficient, affordable and effective. Olive’s vision is to unleash a trillion dollars of hidden potential within healthcare by connecting its disconnected systems. Olive is improving healthcare operations today, so everyone can benefit from a healthier industry tomorrow.

Our Infrastructure team is looking to add an Engineer and continue to advance the cloud capabilities and services/systems for our internal engineering teams. As part of our engineering team, you’ll be responsible for ensuring Olive’s applications build, deploy and run smoothly. You’ll help keep our infrastructure up to date, and use new and existing tools to solve technical problems. At Olive, automation, reliability and efficiency are part of everything we do.

Responsibilities

  • Work directly with product engineering teams to architect and deploy applications using AWS services and methodologies.
  • Design and implement pipelines for Continuous Integration and Continuous Delivery.
  • Create high-quality alerts based on business-centric performance metrics, including uptime, error rate, performance baselines, and infrastructure load.
  • Partner with product engineering teams and other SREs to optimize performance and solve issues across the entire stack: hardware, software, application, and network.
  • Plan, develop, and implement automated systems for deployment and automated issue remediation.
  • Embrace changing requirements.
  • Actively participate in architecture, design reviews and operational readiness exercises for new and existing services.
  • Work with container deployment and orchestration technologies, applying fundamentals including service discovery, deployments, monitoring, scheduling, and load balancing. Knowledge of Kubernetes, Go, and Docker preferred.
  • Incident Triage & Response
  • Scalability Reviews (JVM tuning, Load testing, Architecture reviews, Database performance)

Requirements

  • High-level experience in architecting systems using AWS services and methodologies.
  • The ability to identify, document, and execute common deployment patterns to increase service coherency.
  • Past experience as an SRE or Software Engineer, with a keen interest in the performance and scalability of large systems.
  • Programming experience in Python, Golang or Ruby.
  • Experience with APM tools like New Relic or Datadog.
  • Experience with deploying large projects using infrastructure as code tools like Terraform, Serverless Framework, or AWS CDK.
  • Experience running services in a large-scale environment is a bonus but not required.
  • Understanding of the Linux operating system, networking, and databases.
  • A degree in computer science is helpful but not required. We value skills and technical aptitude over degrees.
  • Detect abnormalities in performance and proactively address alerts and deviations to reduce risk to the platform before they impact customers.
  • You will be part of an on-call rotation consisting of SREs and Engineers, but you are not required to solve every infrastructure problem. Our entire engineering team practices DevOps culture and owns their respective services.

Requirements

  • Programming skills in Python and a shell scripting language.
  • Experience with APM tools like New Relic or Datadog; an understanding of the difference between APM and infrastructure monitoring tools is preferred.
  • Experience with infrastructure as code (Terraform, AWS, Azure).
  • Knowledge of TCP/IP, HTTP, and web application security.
  • Able to configure, or learn to troubleshoot, network systems including DNS, DHCP, and load balancer technologies.

Disclaimer: Local Candidates Only
This company does NOT accept candidates from outside recruiting firms. Agency contacts are not welcome.