From Performance Engineering Lab

The paper 'Data Population of Testing Environments' by Teodora Sandra Buda has been accepted for publication at the ISSTA Doctoral Symposium.

Populating the testing environment is a major challenge in software validation. It generally requires expert knowledge of the system under development, because the data in that environment critically affects the outcome of the tests designed to assess the system. Current practices for populating testing environments either focus on efficient algorithms for generating synthetic data or reuse the production environment for testing. The latter is an invaluable strategy for achieving good code coverage and providing real test cases. However, a production environment generally holds large amounts of data that are difficult to handle and analyze. Sampling the production database is a potential way to overcome these challenges.

In this research, we propose two database sampling approaches for populating the testing environment quickly. One of them aims to preserve the distribution of the data so that the sample is representative. In particular, it focuses on the dependencies between data in different tables and tries to preserve the distributions of these dependencies.
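The announcement does not describe the sampling algorithm itself. As an illustration only, the sketch below shows one simple way to keep a cross-table dependency distribution intact: parent rows are sampled in strata defined by their fan-out (how many child rows reference them through a foreign key), and every child of a sampled parent is kept, so referential integrity also holds. The function name `fanout_preserving_sample`, the `parent_id` column and the in-memory table layout are hypothetical, and this is not the method proposed in the paper.

```python
import random
from collections import Counter, defaultdict


def fanout_preserving_sample(parent_keys, child_rows, fk, rate, seed=0):
    """Hypothetical sketch: sample parents stratified by fan-out.

    parent_keys: list of primary keys of the parent table.
    child_rows:  list of dicts, each holding a foreign key under `fk`.
    rate:        fraction of parents to keep in each stratum (0 < rate <= 1).
    Returns (sorted sampled parent keys, child rows of those parents).
    """
    rng = random.Random(seed)
    # Count how many child rows reference each parent key.
    fanout = Counter(row[fk] for row in child_rows)
    # Group parent keys by fan-out; Counter yields 0 for unreferenced keys.
    strata = defaultdict(list)
    for key in parent_keys:
        strata[fanout[key]].append(key)
    sampled = set()
    for keys in strata.values():
        # Proportional allocation, with at least one key per stratum so
        # rare fan-out values stay represented in the sample.
        k = min(len(keys), max(1, round(len(keys) * rate)))
        sampled.update(rng.sample(keys, k))
    # Keep every child of a sampled parent: no dangling foreign keys.
    sampled_children = [row for row in child_rows if row[fk] in sampled]
    return sorted(sampled), sampled_children
```

For example, sampling 50% of a parent table whose rows have 0, 1 or 3 children each keeps those three fan-out values in the same proportions in the sample. Stratifying on a single foreign key is a simplifying choice here; a schema with several dependent tables would need a joint stratification or one pass per relationship.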