Site Reliability Engineer

Dremio

Santa Clara, CA, US
  • Job Type: Full-Time
  • Function: IT
  • Post Date: 02/15/2021
  • Website: dremio.com
  • Company Address: 883 N Shoreline Boulevard C100, Mountain View, CA, 94043

About Dremio

Dremio is the Data Lake Engine. Created by veterans of open source and big data technologies, and the creators of Apache Arrow, Dremio is a fundamentally new approach to data analytics that helps companies get more value from their data, faster. Dremio makes data engineering teams more productive, and data consumers more self-sufficient. For more information, visit www.dremio.com.

Job Description

Dremio is the Data Lake Engine company. Our mission is to reshape the world of analytics to deliver on the promise of data with a fundamentally new architecture, purpose-built for the exploding trend towards cloud data lake storage such as AWS S3 and Microsoft ADLS. We dramatically reduce and even eliminate the need for the complex and expensive workarounds that have been in use for decades, such as data warehouses (whether on-premise or cloud-native), structural data prep, ETL, cubes, and extracts. We do this by enabling lightning-fast queries directly against data lake storage, combined with full self-service for data users and full governance and control for IT. The results for enterprises are extremely compelling: 100X faster time to insight; 10X greater efficiency; zero data copies; and game-changing simplicity. And equally compelling is the market opportunity for Dremio, as we are well on our way to disrupting a $25BN+ market.

About the Role

Dremio’s SREs ensure that our internal and externally visible services have reliability and uptime appropriate to users' needs and a fast rate of improvement. You will be joining a newly formed team that will spearhead our efforts to launch a cloud service. This is an opportunity to join a very fast growth startup and help build a cloud service from the ground up.

Responsibilities and Ownership

  • Ability to debug and optimize code and automate routine tasks.
  • Evangelize and advocate for reliability practices across our organization.
  • Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, monitoring/alerting, capacity planning and launch reviews.
  • Analyze and optimize our core product by developing and implementing reliability and performance practices.
  • Scale systems sustainably through automation and evolve systems by pushing for changes that improve reliability and velocity.
  • Be on-call for services that the SRE team owns.
  • Practice sustainable incident response and blameless postmortems.

Qualifications

  • 5+ years of relevant experience in the following areas: SRE, DevOps, Cloud Operations, Systems Engineering, or Software Engineering.
    • Excellent command of cloud services on AWS/GCP/Azure, Kubernetes and CI/CD pipelines.
  • Have moderate-advanced experience in Java, C, C++, Python, Go or other object-oriented programming languages.
  • You are Interested in designing, analyzing and troubleshooting large-scale distributed systems.
  • You have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
  • You have a great ability to debug and optimize code and automate routine tasks.
  • You have a solid background in software development and architecting resilient and reliable applications.

Related Jobs

Site Reliability Engineer

Dremio - Santa Clara, CA, US

Software Engineer - Data Management & Processing

Dremio - Hyderabad, IN

Software Engineer - Dremio SaaS Service

Dremio - Hyderabad, IN

Software Engineer - Query Optimizer

Dremio - Santa Clara, CA, US

Outbound Product Manager

Dremio - Remote
Disclaimer: Local Candidates Only
This company does NOT accept candidates from outside recruiting firms. Agency contacts are not welcome.