
Getting ready for Big Data Testing

As testers, we need trustworthy data! Data is often the root cause of testing issues: we don’t always have the data we need, test cases get blocked, and defects come back marked as “data issues.”

Big data will only add complexity to the issue. What data should be stored, and how long should we keep it? What data should be included in analytical processing, and how do we prepare it properly for analysis? And since most Big Data is unstructured, what does Big Data quality even look like?

Defining Big Data

Let’s first define Big Data. Gartner, and most of the industry, defines big data as “high volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making and process automation.” Big data is usually unstructured, which means that it doesn’t have a defined data model and it doesn’t fit into organized columns and rows. Big Data’s defining characteristics are Volume, Velocity and Variety. In other words, you have to process an enormous amount of data of various formats at high speed.

Testers, can you see some test scenarios here? Yes indeed: with big data comes big testing.

Testing Big Data

Gartner says the average organization loses $14.2 million annually through poor Data Quality. Data quality challenges can be overcome by deploying a structured testing approach. At the highest level, a big data testing approach involves both functional and non-functional components, along with strong test data and test environment management to ensure that the data from varied sources is processed error-free and is of good quality.

Functional testing validates both the quality of the data and the processing of it. Test scenarios for data quality include completeness, correctness and lack of duplication. Data processing can be accomplished in three ways (interactive, real-time and batch), but all of them involve movement of data. Therefore, all big data testing strategies are based on the extract, transform and load (ETL) process.
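As a sketch, the data-quality scenarios above can be expressed as automated checks. The example below assumes a PySpark environment and a hypothetical staged “customers” dataset with an “id” business key and a mandatory “email” column; the paths, column names and rules are illustrative, not part of any particular toolset.

```python
# A minimal sketch of functional data-quality checks (completeness,
# correctness, no duplication), assuming PySpark and a hypothetical
# staged "customers" dataset.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks").getOrCreate()
df = spark.read.parquet("/staging/customers")   # hypothetical staged location

total = df.count()

# Completeness: mandatory columns must not be null
missing_email = df.filter(F.col("email").isNull()).count()

# Correctness: a simple rule-based check (illustrative email pattern)
bad_email = df.filter(~F.col("email").rlike(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")).count()

# Lack of duplication: the business key must be unique
duplicate_ids = total - df.dropDuplicates(["id"]).count()

assert missing_email == 0, f"{missing_email} rows are missing an email"
assert bad_email == 0, f"{bad_email} rows have a malformed email"
assert duplicate_ids == 0, f"{duplicate_ids} duplicate ids found"
```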

Testing begins by validating the quality of the data coming from the source databases, then the transformation or process through which the data is structured, and finally the loading into the target.

ETL testing has three phases. The first is data staging, which is validated by comparing the data coming from the source systems with the data in the staging area. The next is MapReduce validation of the data transformation; MapReduce is the programming model for processing unstructured data, and this testing ensures that the business rules used to aggregate and segregate the data are working properly. The final phase is output validation, in which the output files from MapReduce are verified before they are moved to the data warehouse.
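To make the three phases concrete, here is a rough sketch of the corresponding validation points. It assumes hypothetical locations for the source extract, staging area, job output and data warehouse, and a single aggregation rule for an “orders” dataset; in a real project the comparisons would be driven by the actual business rules.

```python
# A rough sketch of the three ETL validation points, assuming PySpark and
# hypothetical locations for the source extract, staging area, transformation
# output and data warehouse.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-validation").getOrCreate()

# 1. Staging validation: the staged copy must match the source extract
source = spark.read.csv("/extracts/orders.csv", header=True, inferSchema=True)
staged = spark.read.parquet("/staging/orders")
assert source.count() == staged.count(), "staging dropped or duplicated rows"

# 2. Transformation validation: re-apply the business rule independently
#    and compare it with the output of the MapReduce job
expected = staged.groupBy("region").agg(F.sum("amount").alias("total"))
actual = spark.read.parquet("/output/orders_by_region")
assert expected.exceptAll(actual).count() == 0, "aggregation mismatch"
assert actual.exceptAll(expected).count() == 0, "unexpected rows in output"

# 3. Output validation: what was loaded to the warehouse matches the output
warehouse = spark.read.table("dw.orders_by_region")
assert actual.exceptAll(warehouse).count() == 0, "load mismatch"
```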

ETL testing, especially at the speed required for big data, demands automation: automate unit, performance and UI testing, and make sure there is a clear visualization of the aggregate results of all testing activities in a single place that is easily accessible to all stakeholders. The big data regression test suite will be run many times, so it should be automated and reusable; this saves a great deal of time and money during Big Data validations.
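A reusable regression suite can be as simple as a parameterized test that re-runs the same checks against every dataset on every load. The sketch below uses pytest; the datasets, keys and mandatory columns in the rule table are hypothetical.

```python
# A sketch of a reusable, automated regression suite using pytest and
# PySpark; the rule table below is purely illustrative.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("bigdata-regression").getOrCreate()

# (dataset, business key, mandatory columns) - hypothetical rules
RULES = [
    ("orders", "order_id", ["customer_id", "amount"]),
    ("customers", "customer_id", ["email"]),
]

@pytest.mark.parametrize("dataset,key,mandatory", RULES)
def test_dataset_quality(dataset, key, mandatory):
    df = spark.read.parquet(f"/staging/{dataset}")   # hypothetical location
    # No duplicate business keys
    assert df.count() == df.dropDuplicates([key]).count(), "duplicate keys"
    # No nulls in mandatory columns
    for column in mandatory:
        assert df.filter(F.col(column).isNull()).count() == 0, f"nulls in {column}"
```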

The emphasis is not only on managing data complexity and integrity, but also on the performance of the system that makes the data useful. Hence, failover and performance testing of the framework and of data rendering are required to verify the speed at which the system consumes data and to determine optimum application performance.
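As an illustration, a basic throughput check can be scripted against an agreed non-functional target. The landing path and the target figure below are placeholders, not recommendations.

```python
# A rough sketch of an ingestion throughput check, assuming PySpark, a
# hypothetical landing zone and a placeholder service-level target.
import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ingestion-perf").getOrCreate()

TARGET_ROWS_PER_SEC = 50_000   # assumed non-functional requirement

start = time.time()
df = spark.read.json("/landing/clickstream/")   # hypothetical landing zone
rows = df.count()                               # forces the data to be read
elapsed = time.time() - start

throughput = rows / elapsed
assert throughput >= TARGET_ROWS_PER_SEC, (
    f"ingestion ran at {throughput:,.0f} rows/sec, below the agreed target"
)
```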

By applying the right test strategies and following best practices, you can effectively prepare for your Big Data testing initiatives. Automation is the key to success in big data testing, as manual testing is impractical given the volume and variety of the data. As more and more organizations implement big data solutions, testers must sharpen their skills to test these complex implementations.

