Today I had a discussion about the representation of test data. Creating test data through the system functions gives a 100% complete representation of the objects we want to test. But do we get the same result if we use a separate test data-loading tool that inserts the records directly into the database?

Basically, there are three ways to build your test data:

  1. Through system functions
  2. Through separate test data-loading tools (via SQL, for instance)
  3. Using production data (if you’re working on a migration project)

Personally, I don’t like to use production data. It often contains inconsistencies that the new system doesn’t expect, and I’ve seen a lot of testing time wasted because of that. It can, however, be useful for replaying a specific production situation to see whether the new system improves or corrects it.

Here are some of the pros and cons of using system functions and data-loading tools.

Through system functions
Pros:
  • Implicitly tests the system functions, data integrity, etc.
  • Makes sure we don’t introduce test defects caused by data that violates business rules
Cons:
  • We need a working, running system in order to use the system functions
  • Slower when we need to insert large amounts of data

Through a load tool
Pros:
  • Quicker when we need to insert large amounts of data into the database
  • Most testers know how to use these tools
Cons:
  • No control over the input, which can result in inconsistent or disallowed test situations
  • Less certainty that the objects are represented correctly, for instance because hidden business rules are skipped
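To make the "hidden business rules" con concrete, here is a minimal sketch using an in-memory SQLite table as a stand-in for the system’s database. The `create_customer` function and its "customers must be adults" rule are hypothetical, invented only for illustration. Going through the system function exercises the rule; writing straight to the table, as a load tool would, silently skips it.

```python
import sqlite3

# Hypothetical stand-in for the application: an in-memory table plus one
# "system function" that enforces a business rule before writing.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")


def create_customer(db: sqlite3.Connection, name: str, age: int) -> int:
    """Hypothetical system function: rejects minors, then inserts the row."""
    if age < 18:
        raise ValueError("customers must be adults")
    cur = db.execute("INSERT INTO customer (name, age) VALUES (?, ?)", (name, age))
    return cur.lastrowid


# Route 1: through the system function. The business rule is exercised too.
create_customer(db, "Alice", 34)
try:
    create_customer(db, "Bob", 12)
except ValueError as err:
    print("system function refused:", err)

# Route 2: like a load tool, write straight to the table. The rule is skipped,
# so a record the system would never create ends up in the test data.
db.execute("INSERT INTO customer (name, age) VALUES (?, ?)", ("Bob", 12))

rows = db.execute("SELECT name, age FROM customer ORDER BY id").fetchall()
print(rows)  # [('Alice', 34), ('Bob', 12)]
```

The second `Bob` row is exactly the kind of "not allowed" testing situation a load tool can create without anyone noticing.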

So far I’ve been using a mix of loading tools and system functions during testing. When we need to load large amounts of data into the database to create a shared initial situation, we use the loading tools. To test the situations themselves (such as updating and reading objects), we use the system functions. But be careful! Maintaining this import data set can be a hell of a job. Changing one of the values in the batch can break other tests, so someone should act as a gatekeeper for these changes in a structured way.
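A minimal sketch of that mixed approach, again assuming an in-memory SQLite database as a stand-in. The baseline rows, `load_baseline` and `update_price` are hypothetical names for illustration: the baseline is bulk-inserted directly (load-tool style, in a single transaction), while the situation under test goes through a system function.

```python
import sqlite3

# Hypothetical shared baseline ("central initial situation"). Every test reads
# from it, so a change to any value here should go through a gatekeeper.
BASELINE_PRODUCTS = [
    (1, "Widget", 2.50),
    (2, "Gadget", 10.00),
    (3, "Gizmo", 7.25),
]


def load_baseline(db: sqlite3.Connection, rows) -> None:
    """Load-tool style: bulk insert straight into the table in one transaction."""
    with db:  # commits on success, rolls back on error
        db.executemany("INSERT INTO product (id, name, price) VALUES (?, ?, ?)", rows)


def update_price(db: sqlite3.Connection, product_id: int, new_price: float) -> None:
    """Hypothetical system function that the tests themselves go through."""
    if new_price <= 0:
        raise ValueError("price must be positive")
    with db:
        db.execute("UPDATE product SET price = ? WHERE id = ?", (new_price, product_id))


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")
load_baseline(db, BASELINE_PRODUCTS)   # bulk setup via the "load tool"
update_price(db, 2, 12.00)             # the situation under test via the system

count = db.execute("SELECT COUNT(*) FROM product").fetchone()[0]
price = db.execute("SELECT price FROM product WHERE id = 2").fetchone()[0]
print(count, price)  # 3 12.0
```

Keeping the baseline in one reviewed place makes it easier for a gatekeeper to see exactly which values other tests depend on.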

For most test cases I prefer to generate the data through system functions. This also ensures the system creates unique instances of test objects, so they won’t influence other tests running at the same time. Sometimes I reuse the generated data in a script to test multiple variations of a situation. After every completed test I simply remove all the data the test created, so it won’t affect any of the other tests.
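One way to get unique, self-cleaning test objects is a small context manager that records everything it creates. This is a sketch under assumptions: `create_order` stands in for a real system function, `DataScope` is a hypothetical helper, and SQLite replaces the real database. A `uuid4` suffix makes collisions between parallel tests practically impossible, and only the rows this scope created are deleted afterwards.

```python
import sqlite3
import uuid


def create_order(db: sqlite3.Connection, customer: str) -> int:
    """Hypothetical system function that creates an order."""
    cur = db.execute("INSERT INTO orders (customer) VALUES (?)", (customer,))
    return cur.lastrowid


class DataScope:
    """Creates test objects through the system function and removes them afterwards."""

    def __init__(self, db: sqlite3.Connection):
        self.db = db
        self.created = []  # ids of the rows this scope is responsible for

    def order(self):
        # uuid4 gives each object its own customer name, so parallel tests
        # don't find or modify each other's records.
        customer = f"test-customer-{uuid.uuid4().hex}"
        order_id = create_order(self.db, customer)
        self.created.append(order_id)
        return order_id, customer

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Delete only what this scope created, even if the test failed.
        with self.db:
            self.db.executemany("DELETE FROM orders WHERE id = ?",
                                [(i,) for i in self.created])
        return False  # don't swallow test failures


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
create_order(db, "someone-else")  # data owned by another test

with DataScope(db) as data:
    # Reuse the same generator to try several variations of one situation.
    customers = [data.order()[1] for _ in range(3)]
    during = db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

after = db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(during, after)  # 4 1
```

The other test’s record survives the cleanup because the scope only deletes ids it tracked itself.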

From my point of view, it’s better to create test data on the spot. And if you can cluster some of the test data, organize your tests by feature (or user story). When a test tells you something is broken, you can simply investigate that test (or set of tests) together with the data it needed. If the test data needs to change, you don’t need to worry about breaking other tests.
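As a sketch of clustering tests by user story, the `unittest` test case below keeps all tests for a hypothetical "registration" story together with the data they need. `register_user` and the `users` table are illustrative assumptions, and each test gets a fresh in-memory database built in `setUp`, so changing this story’s data cannot break tests for other stories.

```python
import sqlite3
import unittest


def register_user(db: sqlite3.Connection, email: str) -> int:
    """Hypothetical system function for the 'registration' user story."""
    if "@" not in email:
        raise ValueError("invalid email")
    cur = db.execute("INSERT INTO users (email) VALUES (?)", (email,))
    return cur.lastrowid


class RegistrationStoryTests(unittest.TestCase):
    """All tests (and all test data) for one user story live together."""

    def setUp(self):
        # Fresh, story-local data created on the spot through the system function.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")
        self.existing_id = register_user(self.db, "existing@example.com")

    def tearDown(self):
        self.db.close()

    def test_new_user_can_register(self):
        user_id = register_user(self.db, "new@example.com")
        self.assertNotEqual(user_id, self.existing_id)

    def test_duplicate_email_is_rejected(self):
        with self.assertRaises(sqlite3.IntegrityError):
            register_user(self.db, "existing@example.com")

    def test_invalid_email_is_rejected(self):
        with self.assertRaises(ValueError):
            register_user(self.db, "not-an-email")
```

Run it with `python -m unittest` (or any unittest-compatible runner). When one of these tests fails, everything needed to investigate it is in this one class.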