Hi! Thanks for dropping by my blog, it is much appreciated!
Generating test data has always been a problem with data warehousing.
This is one reason why our BIDA data models are so light on demonstration data.
However, I recently had to generate some more test data and found a way I didn’t know about to generate key ranges.
So I thought I would put it on my blog for those like me who did not know about this!
The following statement will generate an integer in the range of 1000 to 9999.
FLOOR(RAND(CHECKSUM(NEWID()))*(9999-1000)+1000)
It will generate a new number for each row that is updated.
So what I have been able to do is to:
Define a domain for a field value.
Set a “field setter” column in the valid range using this random number generator.
Then perform update statements for the field setter to set the text to the values I want.
If I do not want a even coverage then I can run the field setter with constraints on the integer key for the table to randomly set more fields to different values.
Here is an example of how to do this.
Say you have 4 values of Marital Status and the keys go from 93 to 96 in the dimension table. We could then set the text in the vm_person table as follows.
update vm_person set field_setter = FLOOR(RAND(CHECKSUM(NEWID()))*(96-93)+93)
This would set the field_setter integer field.
We could then issue the following update.
update vm_person set marital_status = case field_setter when 93 then 'Single' when 94 then 'Married' when 95 then 'Divorced' when 96 then 'Widow' else 'Unk' End
Of course, this is the simplest of examples. It could be made much more complicated.
Hope this helps some of you create test data out there!
Best Regards
Peter