Testing in Machine Learning

Testing in machine learning differs fundamentally from the way we test software traditionally (functional tests, regression tests, etc.), where we check the actual versus expected behaviour of a given application. In the ML world, data sets with the desired behaviour are used to train the model (i.e. to learn the logic), and we then need to check whether the trained model consistently provides the expected output.

Phases in an ML project and the corresponding stages of the data pipeline:

Phases in ML

Source: https://greatexpectations.io/blog/ml-ops-data-quality/

To proceed step by step, look at the validation stages in the diagram above; that is where tests fit into the ML pipeline. The data ingestion stage ingests the raw data set, and the data cleaning stage performs the preprocessing and normalization required on the training samples, such as removing missing values and duplicates. The output of this stage is a cleaned data set.

There are two kinds of tests that you will need to focus on:

  • Pre-train tests: these include assertions about the data set, i.e. data validation after the data ingestion and data cleaning stages. Why do we need them? The cliché "garbage in, garbage out" is popular in machine learning because the quality of the data used for training determines the quality of the predictions. That is why data validation in machine learning is imperative, not optional!
  • Post-train tests: these come into the picture after the model has been trained on our validated data set. As a starting step, you would want to cover the minimum functionality tests and determine the key numbers: f1 score, recall and precision (it is important to have a deep understanding of what each of them means and how to compute them).
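As a rough sketch of the distinction, the two kinds of tests might look like the snippet below. The `training_rows` records and the `predict` function are toy stand-ins invented for illustration, not part of any real project.

```python
# Toy illustration: pre-train tests assert properties of the data,
# post-train tests assert behaviour of the trained model.

training_rows = [
    {"id": 1, "text": "refund my order", "label": "billing"},
    {"id": 2, "text": "app crashes on login", "label": "bug"},
]

def predict(text):
    # Stand-in for a trained model's inference call.
    return "billing" if "refund" in text else "bug"

# Pre-train test: the data set itself is valid.
def test_no_missing_labels():
    assert all(row["label"] for row in training_rows)

def test_ids_are_unique():
    ids = [row["id"] for row in training_rows]
    assert len(ids) == len(set(ids))

# Post-train test: the model behaves as expected on an obvious case.
def test_minimum_functionality():
    assert predict("please refund my order") == "billing"
```

In a real project these would live in a test runner such as pytest and operate on your actual data set and model.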

Getting started with pre-train tests

  • If this is your first time working with data, this might also be the first time you are looking at such a large data set, maybe 100k lines (or more) of JSON records. To read and understand the data set, you might want to familiarize yourself with Jupyter notebooks and the pandas library.
# Load the raw data in a Jupyter notebook to inspect the columns
# and record values, viewing the output of each command.

import pandas as pd

test_df = pd.read_json('path/to/raw_data.jsonl', lines=True)
test_df.columns  # list the columns
test_df.head()   # preview the first few records

  • You can refer to the pandas cheat sheet here.
  • Now that you are comfortable using the pandas library, you can move on to writing some data validation tests. The Great Expectations library helps you write pre-train tests with data validations.

From the Great Expectations documentation, here is what it says:

With Great Expectations, you can assert what you expect from the data you load and transform, and catch data issues quickly — Expectations are basically unit tests for your data.

In brief, the kind of data validations you might like to add:

– expect_column_to_exist

– expect_column_values_to_be_unique

– expect_column_values_to_not_be_null

– expect_column_values_to_match_regex

– you can refer to the full list of expectations here.
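To see what each of these expectations actually checks, here is a plain-pandas equivalent of the first few. This is a sketch on a made-up two-column data frame; the column names and the regex are illustrative, not from the original data set.

```python
import pandas as pd

# Hypothetical sample records; real data would come from read_json.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["refund my order", "app crashes", "login fails"],
})

# expect_column_to_exist
assert "id" in df.columns

# expect_column_values_to_be_unique
assert df["id"].is_unique

# expect_column_values_to_not_be_null
assert df["text"].notnull().all()

# expect_column_values_to_match_regex (digits only, as an example)
assert df["id"].astype(str).str.match(r"^\d+$").all()
```

Great Expectations wraps checks like these in a richer API: each expectation returns a result object with success/failure details instead of raising immediately.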

A basic script with minimal validations using Great Expectations library might look like:


from pathlib import Path

import great_expectations as ge
import pandas as pd

DATA = 'data'  # base data directory; adjust to your project layout

assert Path(DATA).exists()

raw_data = pd.read_json(f'{DATA}/raw/raw_data.jsonl', lines=True)
print('\n-- Columns from raw data --\n', raw_data.columns)

ge_raw = ge.from_pandas(raw_data)

expected_columns = [
    # expected column names go here (elided in the original)
]

result = ge_raw.expect_table_columns_to_match_set(
    expected_columns,
    exact_match=True,
    result_format={'result_format': 'BASIC'},
    include_config=True,
    catch_exceptions=True,
)

assert result['success']

To add the pre-train tests to your pipeline, you first need to understand the existing ML pipeline. In my case, I was introduced to dvc.yaml, which specifies all the stages involved in an ML pipeline in YAML format.

In short, a dvc.yaml file contains all the individual stages of a machine learning pipeline, from data ingestion to training to evaluation. For a better understanding, please read more about DVC and dvc.yaml.

How to add data validation stages to the existing dvc.yaml?


# first stage: data ingestion, second stage: data validation
# (the stage names below are illustrative)
stages:
  get_dataset:
    cmd: wget /path/to/raw_data.jsonl -P ./data/raw
    outs:
      - ./data/raw/raw_test_data.jsonl
  validate_dataset:
    cmd: python ./data_validations/validations_get_dataset.py
    deps:
      - ./data/raw/raw_test_data.jsonl
      - ./data_validations/validations_get_dataset.py

Woah! You just finished adding pre-train tests to your ML pipeline. The next time your pipeline runs to retrain the model, your data validation checks will be in place to make sure that the training data is as expected.

Post-train tests require a deeper understanding of a few concepts, which are briefly summarized (in order) below:

  • There are three types of post-train tests: minimum functionality tests, invariance tests and directional expectation tests. You can read more about these here.
  • To get started with minimum functionality tests and be able to check the consistency of the results, we need to introduce ourselves to the confusion matrix and the classification report (recall, precision and f1 score).
  • Once you have a decent understanding of each of these metrics, you might want to compute these numbers for your data. In machine learning, we usually split our data set into two subsets, training data and testing data, known as the train/test split.
  • Next up, we want to see the numbers as part of the classification report, for which you might want to read up here.
  • Every time you train the model, you can make sure that the numbers do not drop. These tests can be included as part of the ML pipeline in the same way as the pre-train tests.
  • For a more visual representation of the classification report, you can use the matplotlib library.
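Putting the metrics together: precision, recall and f1 can be computed directly from confusion-matrix counts, and a post-train test can then assert that they stay above an agreed floor. The counts and thresholds below are made up for illustration; in practice you would take them from your classification report and baseline run.

```python
# Hypothetical confusion-matrix counts for a binary classifier.
tp, fp, fn = 90, 10, 15  # true positives, false positives, false negatives

precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean

# A minimum-functionality style check: fail the retraining pipeline
# if the metrics drop below an illustrative baseline.
assert precision >= 0.85
assert recall >= 0.80
assert f1 >= 0.82
```

With real data you would compute the same numbers from the test split (e.g. via a classification report) rather than hard-coding the counts.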

I hope the information here helps you get started with testing in machine learning. However, I would be interested to know how you test the data/ML pipeline in your project. Let me know in the comments!

PS: I also write about life experiences in general and share travel stories on YouTube.

Know Our Writer

Anisha Narang

Principal Software Quality Engineer at Red Hat

Anisha has been with Red Hat for over 8 years and is passionate about quality engineering. She has worked with multiple test automation tools and continues to explore more, and she has spoken at multiple tech conferences on topics related to software quality engineering. She is an active member of the Women Leadership Community at Red Hat. Until the pandemic hit she was quite a solo traveler; currently she shares her travel stories on the internet on Medium and YouTube.
