Example tests

Passing: No ill-formed inputs (Training)
Failing: No SSN (Training)
Passing: No credit card information (Validation, Training)

Passing: Is valid JSON (Validation)
Passing: Is valid Python code (Validation)
Failing: No column drift (Validation, Training)

Passing: High GPT-evaluation score (Model, Validation)
Passing: High answer relevancy (Model, Validation)
Failing: Response is concise (Model, Validation)

Passing: Response is factual (Model, Validation)
Failing: High BLEU-1 score on protected subpopulation (Model, Validation)
Passing: High METEOR score (Model, Validation)

Passing: No ill-formed sentences (Training)
Failing: No new tokens (Training)
Passing: No disparity between gender-related pronouns (Validation, Training)

Failing: No rows from the training set present in validation (Validation, Training)
Passing: No new labels in the validation set (Validation, Training)
Passing: No label drift (Training)

Passing: No significant label drift (Validation, Training)
Passing: No rows from the training set present in validation (Validation, Training)
Failing: Expect high performance on sentences containing "help" (Model, Validation)

Failing: High precision on sentences containing key tokens (Model, Validation)
Failing: High confidence and accuracy on must-pass cases (Model, Validation)
Passing: High precision on "urgent" predictions (Model, Validation)

Passing: No more than 10 rows with nulls (Training)
Failing: No more than 10 duplicate rows (Training)
Passing: No significant difference in accuracy between gender feature values (Validation, Training)

Failing: No rows from the training set present in validation (Validation, Training)
Passing: No new labels in the validation set (Validation, Training)
Passing: No null columns (Training)

Passing: No significant label drift (Validation, Training)
Passing: No rows from the training set present in validation (Validation, Training)
Failing: Expect high performance on young adult females in South Africa (Model, Validation)

Failing: High precision on "fraudulent" predictions (Model, Validation)
Failing: High confidence and accuracy on must-pass cases (Model, Validation)
Passing: High precision on "urgent" predictions (Model, Validation)

Openlayer is the most advanced platform for tracking, testing, and monitoring your AI

Support for your task type

We support a diverse range of task types so that all of your use cases are covered.

LLMs
Text classification
Tabular classification
Tabular regression

Tests

Make sure your AI is at peak performance when it reaches your users. Tests let you track every constraint that matters for your model's performance.

Data integrity

Start by improving the foundation of your AI system: the data.

Data integrity tests in Openlayer

Data consistency

Make sure data stays consistent between different datasets.

Data consistency tests in Openlayer
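
As a rough illustration of what a data consistency check involves, the sketch below flags validation rows that also appear in the training set. It is plain Python, not Openlayer's API; the function name `leaked_rows` and the sample rows are assumptions made up for this example.

```python
def leaked_rows(training, validation):
    """Return the validation rows that also appear, field for field, in training."""
    # Convert each row (a dict) to a hashable, order-independent key.
    seen = {tuple(sorted(row.items())) for row in training}
    return [row for row in validation if tuple(sorted(row.items())) in seen]


# Hypothetical sample data for illustration only.
training = [
    {"text": "reset my password", "label": "account"},
    {"text": "card was declined", "label": "billing"},
]
validation = [
    {"text": "card was declined", "label": "billing"},  # also in training
    {"text": "where is my order", "label": "shipping"},
]

leaks = leaked_rows(training, validation)
test_passes = len(leaks) == 0  # the test fails here because one row leaked
```

A check like this treats two rows as identical only when every field matches; near-duplicates would need a looser comparison.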

Performance

Identify underperforming subpopulations. Choose from the most advanced metrics.

Performance tests in Openlayer
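
To make the idea of a subpopulation test concrete, here is a minimal sketch that compares accuracy on one data slice against a threshold. It is generic Python rather than Openlayer's API; the helper names, the sample rows, and the 0.8 threshold are all assumptions chosen for illustration.

```python
def accuracy(rows):
    """Fraction of rows whose prediction matches the label (None if no rows)."""
    if not rows:
        return None
    return sum(r["label"] == r["prediction"] for r in rows) / len(rows)


def slice_rows(rows, predicate):
    """Keep only the rows that belong to the subpopulation of interest."""
    return [r for r in rows if predicate(r)]


# Hypothetical model outputs for illustration only.
rows = [
    {"text": "please help me log in", "label": "urgent", "prediction": "urgent"},
    {"text": "help, my card was charged twice", "label": "urgent", "prediction": "not_urgent"},
    {"text": "thanks for the update", "label": "not_urgent", "prediction": "not_urgent"},
    {"text": "can you help with a refund", "label": "urgent", "prediction": "urgent"},
]

overall = accuracy(rows)
help_slice = slice_rows(rows, lambda r: "help" in r["text"])
help_accuracy = accuracy(help_slice)

THRESHOLD = 0.8  # assumed pass mark, not a recommended value
test_passes = help_accuracy is not None and help_accuracy >= THRESHOLD
```

Comparing the slice metric with the overall metric is what reveals an underperforming subpopulation that an aggregate score would hide.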

Fairness

Guard against biases and ensure equal treatment of sensitive groups.

Fairness tests in Openlayer

Robustness

Probe for edge cases that your data does not capture but that you may encounter in the wild. See how your AI performs under adversarial attack.

Robustness tests in Openlayer

Monitoring

Add your monitoring pipeline to set production-specific tests and keep a close eye on your model behavior in the wild.

Real-time alerts

Something went wrong in production? Be the first to know with real-time alerts by email, in Slack, or in-app.

Openlayer: @sophia commented on No duplicate rows: "thoughts on changing the threshold?"

Openlayer: Test status updated for No output drift, from 🟢 Passing to 🔴 Failing

Evaluation windows

Different tests require different windows of data. Set custom evaluation windows to determine when to run each test.

Evaluation windows in Openlayer

Monitor dashboard

The monitoring dashboard offers a comprehensive view of current test results. Click on any individual test to dive deeper into its performance history.

Monitor dashboards in Openlayer

Diagnosis

Breeze through the "why?"s behind failed tests. The information you need to find the root cause of issues is at your fingertips.

Root cause analysis

Dive deeper into every test to understand why it is failing. Stop questioning what to do next to improve your model.

A graph in the Openlayer UI

Explainability

Understand which features drive model performance over a particular data slice or over the whole dataset.

Explainability UI in Openlayer

What-if analysis

Perturb individual model inputs and see how the prediction changes. Compare the explainability scores and model predictions side-by-side to validate your hypotheses.

What-if analysis UI in Openlayer
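
The perturb-and-compare idea can be sketched in a few lines of Python. This is not Openlayer's implementation: `toy_model` is a hypothetical stand-in for a trained model, and its scoring rule and the feature names are assumptions invented for the example.

```python
def toy_model(features):
    """Hypothetical stand-in for a trained model: returns a fraud score from 0 to 100."""
    score = 10
    if features["amount"] > 1000:
        score += 50
    if features["new_account"]:
        score += 30
    return min(score, 100)


def what_if(model, features, **changes):
    """Score the original input and a perturbed copy, leaving the original untouched."""
    perturbed = {**features, **changes}
    return model(features), model(perturbed)


original = {"amount": 1500, "new_account": True}

# What if the transaction amount had been much smaller?
before, after = what_if(toy_model, original, amount=200)
print(f"score before: {before}, after: {after}")
```

Changing one input at a time keeps the comparison easy to read: any change in the score can be attributed to that single feature.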

Versioning & experiment tracking

Track and version your models, prompts, and datasets. Compare performance across versions, and systematically choose the best AI stack.

Prompt playground

Experiment with different models, prompts, and parameters and generate test cases.

Prompt playground in the Openlayer UI

Version commits

Keep track of and easily switch between model and dataset versions.

Adding a commit in the Openlayer UI

Side-by-side comparison

Quickly compare versions to pick a winner or revert ones that introduce regressions.

Version comparison in the Openlayer UI

Collaboration

Bring the whole team in on the development of your AI. Work with others to diagnose issues and identify next steps. Keep everyone in the loop.

One shared workspace

Increase visibility with everyone in one place. Everyone stays up to date on the versions and progress towards deployment.

Discussions

Work together to create tests and improve your model by leaving comments. Add context by tagging data, features, and more.

Comments in the Openlayer UI

How to get started

Sign up for free

Get your free account up and running in 60 seconds

Get a product walkthrough

Our team will give you a demo of our platform and answer any questions

Join the community

Join our Discord to leave feedback and chat with other users or our team

Start with an example

Start by creating a project from one of our guided example notebooks