Synapse QA

Testing Data Lake

We have all heard about data lakes and their importance, but we rarely talk about a data lake for testing results and how it can help us. Companies have long struggled with not having a single place to hold all testing results and related data. Testing data has been scattered and hard to consolidate or move around, which has been one of the biggest barriers to testing success.

In many companies, the automation testing team, for instance, has its own way of storing, managing, and maintaining its results data. Performance testing, security testing, and other types of testing likewise have their own ways. This pattern is seen across teams, companies, and the industry.

To give an example from performance testing and engineering: load testing tools have their own way of storing data, and profiling and analysis tools have theirs. Because of this, it is very challenging to port or move data from one store to another. There is no simple way to connect these different data stores to give flexibility and power to testing teams and the wider testing community across organizations.

Data is the new oil today, and its benefits apply equally to the testing domain.

A data lake is a centralized repository that stores structured and unstructured data. A testing data lake can be created by pushing test results and related data into such a repository. For example, AWS S3, an object storage service, can be used as a data lake.

How a testing data lake can be created

Let us consider a company that has a testing team named “TCOE”. “TCOE” has been conducting automation testing, performance testing, exploratory testing, security testing, and accessibility testing across the company's products.

After “TCOE” executes performance test runs, it pushes all test results and data into the data lake it has created. The data can be stored as Parquet, XML, JSON, or any other file format before it is pushed to the data lake. The files could hold response times, hits per second, errors, and CPU and memory utilization.
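As a minimal sketch of this step, the snippet below serializes a few performance samples as JSON Lines and builds a date-partitioned object key for them. The function names, field names, run identifier, and key layout are all illustrative assumptions, not something prescribed here. The upload itself (for example with an S3 client) is left out so the sketch stays self-contained.

```python
import json
from datetime import datetime, timezone


def build_object_key(team: str, test_type: str, run_id: str, run_time: datetime) -> str:
    """Build a date-partitioned object key (hypothetical layout)."""
    return (
        f"{team}/{test_type}/"
        f"year={run_time:%Y}/month={run_time:%m}/day={run_time:%d}/"
        f"{run_id}.jsonl"
    )


def to_json_lines(samples: list) -> str:
    """Serialize one metric sample per line (JSON Lines)."""
    return "\n".join(json.dumps(sample, sort_keys=True) for sample in samples)


# Illustrative values only; real samples would come from the load testing tool.
run_time = datetime(2021, 3, 14, 9, 30, tzinfo=timezone.utc)
samples = [
    {"transaction": "login", "response_time_ms": 420, "hits_per_sec": 35.0,
     "errors": 0, "cpu_pct": 61.5, "memory_pct": 48.2},
    {"transaction": "search", "response_time_ms": 910, "hits_per_sec": 22.5,
     "errors": 3, "cpu_pct": 74.0, "memory_pct": 52.9},
]

key = build_object_key("tcoe", "performance", "run-001", run_time)
body = to_json_lines(samples)

# At this point `body` would be uploaded to the data lake under `key`.
print(key)
```

Partitioning the key by test type and date is one common way to keep later queries cheap, but any layout that suits the chosen query engine would do.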

Automation testing results such as script pass/fail status, validations, screenshots, errors, and other details can be pushed to the data lake as well. Data from other types of testing can be pushed in the same way.
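One way to make results from different testing types easier to combine later is to wrap each of them in a common record before pushing. The sketch below is an assumption about how that could look: `ResultRecord`, `from_automation_result`, and the shape of the raw result dictionary are hypothetical and do not come from any particular automation tool.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ResultRecord:
    """Common envelope for a single test result (hypothetical schema)."""
    product: str
    test_type: str
    name: str
    status: str  # "pass" or "fail"
    details: dict = field(default_factory=dict)


def from_automation_result(product: str, raw: dict) -> ResultRecord:
    """Map a raw automation result (assumed shape) onto the common envelope."""
    return ResultRecord(
        product=product,
        test_type="automation",
        name=raw["script"],
        status="pass" if raw["passed"] else "fail",
        details={"error": raw.get("error"), "screenshot": raw.get("screenshot")},
    )


# Illustrative raw result; a real one would come from the automation framework.
raw = {
    "script": "checkout_flow",
    "passed": False,
    "error": "Timeout waiting for payment button",
    "screenshot": "screenshots/checkout_flow.png",
}

record = from_automation_result("storefront", raw)
record_json = json.dumps(asdict(record), sort_keys=True)
print(record_json)
```

Similar mapping functions could be written for performance, security, or accessibility results, each filling `details` with whatever is specific to that testing type.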

Teams can choose the technologies and services used to store, structure, and push data to the data lake based on their requirements. Similarly, the technology, service, and hosting (cloud or on-premises) for the data lake itself can be chosen based on requirements.

There is a plethora of options to choose from to build the entire ecosystem.

Figure: Data lake ecosystem

Benefits of building a data lake:

  1. Live feed of the overall quality of the product. If a product is getting ready for release, a live dashboard built on the data lake can show, second by second, its overall quality and how it is doing on performance, security, accessibility, and other testing. If there are 50 different products, a data lake can consolidate quality across the company and break it down by product and by testing type. The live dashboard can be flexible enough to meet the needs of different teams and people across the organization, from a CEO wanting a holistic view of the quality of all products to an engineer wanting to look at the details of a specific defect.
  2. The tools, frameworks, and reports used across the various types of testing (performance, automation, security, and others) are all different. Companies have always struggled to consolidate results and reports and to provide a single, unified view of quality. A data lake can help address this challenge.
  3. A data lake provides the flexibility to build any kind of visualization, dashboard, report, or metric.
  4. Perform predictive analysis, real-time analytics, and more.
  5. Build AI solutions & ML models to make better decisions.
  6. Foster innovation around testing and quality.
  7. Drive overall product success.
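To make the first benefit concrete, here is a small sketch of the kind of rollup a dashboard could compute once results sit in one place: the pass rate per product and testing type. The record fields (`product`, `test_type`, `status`) and the sample data are assumptions made for illustration.

```python
from collections import defaultdict


def pass_rates(records: list) -> dict:
    """Return the pass rate for each (product, test_type) pair."""
    totals = defaultdict(lambda: [0, 0])  # key -> [passed, total]
    for record in records:
        key = (record["product"], record["test_type"])
        totals[key][1] += 1
        if record["status"] == "pass":
            totals[key][0] += 1
    return {key: passed / total for key, (passed, total) in totals.items()}


# Illustrative records; in practice these would be read from the data lake.
records = [
    {"product": "A", "test_type": "automation", "status": "pass"},
    {"product": "A", "test_type": "automation", "status": "fail"},
    {"product": "A", "test_type": "security", "status": "pass"},
    {"product": "B", "test_type": "performance", "status": "pass"},
]

rates = pass_rates(records)
for (product, test_type), rate in sorted(rates.items()):
    print(f"{product:<3} {test_type:<12} {rate:.0%}")
```

The same grouped data could feed a company-wide view for a CEO or be filtered down to one product and one testing type for an engineer.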

How testing teams can reap the benefits of a data lake

The whole idea of a testing data lake is to push all test results & test-related data into a data lake. The data could be across:

The above image depicts how a company could build a testing data lake. Once the data lake is built, the sky is the limit for what can be done with it.

Future of testing data lakes

Companies can also look at building a data lake from production data. The production data lake and the testing data lake can be integrated to build a robust solution that provides insight into many things. Some of the advantages are:

The list of other advantages goes on; there are plenty of them.

We could have an open-source testing data lake where companies can subscribe, retrieve data, and use it for the larger benefit of the software industry. We could have an open-source data model with APIs exposed so that companies across the industry can contribute and benefit equally.

The above image depicts how we could have an open-source testing data lake to which companies across the industry can contribute and from which they can benefit. An open-source data lake has benefits such as identifying similar patterns and kinds of issues and helping each other. A testing data lake can be created using different technologies. One example is using AWS S3 as the data lake and other AWS services such as Kinesis, Athena, and QuickSight to build dashboards and analytics.

Predictive analytics and machine learning can be done using AWS deep learning offerings and Amazon SageMaker.

Another way is to use Sumo Logic, Jenkins, Grafana, and a combination of other tools. There are many other ways to build this. Teams need to work out the best approach based on the requirements and needs of their organization.

Building and owning a testing data lake is going to be very important for companies to succeed, so companies need to start thinking about it and laying out the building blocks to put it in place. There are many advantages and benefits to building this for testing teams across the industry.


About the Author:

Mahesh M | Senior Software Development Manager

I have never run away from challenges and have always embraced them with open arms, since I believe challenges shape a person and provide opportunities to grow. My journey, spanning many years, has taken me across companies such as Accenture, Sony, and Ellucian, giving me opportunities to learn, contribute, and grow. I have played various leadership roles in many large-scale engagements and have led various testing operations and engagements.

I have managed challenging and critical engagements that demanded building, managing, and nurturing high-performance, result-oriented teams to support delivery and contribute to the success of the organization. I have always enjoyed the opportunities, challenges, successes, and failures that came my way in my career and have received them equally. Each of them has taught me a lesson and made me stronger.
