Generating Test Data Using Random Number Generation

0
1695

Hi! Thanks for dropping by my blog, it is much appreciated!

Generating test data has always been a problem with data warehousing.

This is one reason why our BIDA data models are so light on demonstration data.

However, I recently had to generate some more test data and found a way I didn’t know about to generate key ranges.

So I thought I would put it on my blog for those like me who did not know about this!

The following statement will generate an integer in the range of 1000 to 9999.

 

FLOOR(RAND(CHECKSUM(NEWID()))*(9999-1000)+1000)

 

It will generate a new number for each row that is updated.

So what I have been able to do is to:

Define a domain for a field value.

Set a “field setter” column in the valid range using this random number generator.

Then perform update statements for the field setter to set the text to the values I want.

If I do not want a even coverage then I can run the field setter with constraints on the integer key for the table to randomly set more fields to different values.

Here is an example of how to do this.

Say you have 4 values of Marital Status and the keys go from 93 to 96 in the dimension table. We could then set the text in the vm_person table as follows.

update vm_person
set field_setter = FLOOR(RAND(CHECKSUM(NEWID()))*(96-93)+93)

This would set the field_setter integer field.

We could then issue the following update.

update vm_person set marital_status = case field_setter 
 when 93 then 'Single'
 when 94 then 'Married'
 when 95 then 'Divorced'
 when 96 then 'Widow'
 else 'Unk'
 End

 

Of course, this is the simplest of examples. It could be made much more complicated.

Hope this helps some of you create test data out there!

Best Regards

Peter

Previous articleGetting Field Names And Data Types from SQL Servers Catalog
Next articleHow Do I Determine CPU / Elapsed Time in SQL Server Stored Procedures
Peter Nolan
Peter Nolan is one of the worlds leading thought leaders in Business Intelligence. Across his 29+ years in BI Peter has consistently invented new and innovative ways of designing and building data warehouses. SeETL now stands alone as the worlds most cost effective data warehouse development tool.