Software Architect, Agent Evaluation & Core Framework

Datagrid

San Francisco, CA, US / Remote
  • Job Type: Full-Time
  • Function: IT
  • Post Date: 06/23/2025
  • Website: www.datagrid.com
  • Company Address: San Francisco, CA 94110, US

About Datagrid

Datagrid turns all your data into reasoning AI agents that can automate any work or task.

Job Description

Location:

Remote First

SF Bay Area preferred

About Datagrid

Datagrid is the AI Agent that gets work done for you.

Instead of just answering questions, Datagrid’s agents take action—automating entire workflows across your tools, files, and systems. Whether it’s searching through documents to find answers, cross-referencing data to uncover gaps, or running a financial analysis that updates your Excel file—Datagrid does the work, so you don’t have to.

You get your time back. You 10x your output. The AI runs the playbook.

Behind the scenes, Datagrid connects to over 100 platforms and 2,000+ APIs, including Excel, Google Docs, SharePoint, Slack, PDFs, websites, and more. It handles multimodal problems with ease, from unstructured data such as images and documents to entire databases, and communicates through channels like Teams, Slack, or SMS.

It’s built for trust and precision: agents cite their sources and operate safely in real-time. Enterprise teams get full control with teamspaces, RBAC, and usage reports. You can customize everything—launch fast on your own, or partner with our expert team.

From research to reporting, from digging through files to delivering results—Datagrid doesn’t just assist. It executes.

We’re looking for passionate individuals to join us at the frontier of AI innovation.

About the role:

Datagrid Agents operate where our customers work: across Teams, Slack, and even SMS. Agents make multistep plans, leverage vectorized data from 100+ sources, use tools like DocuSign, and manipulate the Datagrid app.

The Software Architect, Agent Evaluation & Core Framework role is crucial because we cannot manually test the vast array of agent interactions and capabilities. You will own and drive the extension of our evaluation harness so that it produces actionable reports on agent regressions and improvements, directly impacting strategic direction and customer experience. A key part of this work will be incorporating the best open-source benchmarks into our evaluation set and figuring out how to agentically generate evaluations that are representative of customer use cases. As you become established, you will also have the opportunity to make fundamental changes to the Core Framework to improve the way Agents reason, use tools, and collaborate with humans.

What you’ll do:

  • Work closely with an ex-Googler who built Gemini evals to create a harness for evaluating Agent performance, make that harness available both for local development and in CI/CD pipelines, and set up alerting for when Agents misbehave.
  • Influence and contribute to the extension of Datagrid’s Agentic capabilities.
  • Choose the best open- and closed-source components to build out the testing infrastructure.
  • Integrate publicly available benchmarks such as RAGBench into the testing system.
  • Give subject-matter experts the ability to add to the test library using customer queries, manually authored cases, and synthetically generated questions.
  • Expose evaluation performance via alerts and dashboards.

What you’ll have:
  • Proven track record of building test harnesses for chat agents from 0 to 1.
  • 10+ years of B2B software engineering experience.
  • Ability to write effective LLM prompts without assistance.
  • Proficiency with Node.js and server-side frameworks such as NestJS or Next.js.
  • Familiarity with JavaScript frameworks such as React and AngularJS.
  • Experience with databases such as Weaviate and BigQuery.
  • Experience working with GCP or similar cloud providers.

Nice to Haves
  • Experience with any LLM evaluation platform (e.g., Galileo, Arize, LangSmith, Orq)
  • Background in B2B SaaS automation tools
  • Contributions to open-source AI projects or published research
  • Familiarity with prompt engineering or model evaluation

Pay Range and Benefits

$200,000 – $240,000 USD per year, depending on experience and qualifications.

At Datagrid we set pay ranges using market data, internal benchmarks, and the scope of responsibilities. Final compensation within this range will be determined based on relevant experience, skills, and geographic location.

In addition to base salary, this role may be eligible for:

  • Equity in the company
  • Home office set-up reimbursement
  • Health, dental, and vision benefits
  • Flexible PTO and remote work options

Equal Opportunity Employer

Datagrid is an equal opportunity employer and is committed to building a diverse and inclusive team. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law. We encourage candidates from all backgrounds to apply.



Disclaimer: Local Candidates Only
This company does NOT accept candidates from outside recruiting firms. Agency contacts are not welcome.