Generating test data
It is enormously useful to be able to quickly generate test data so that we can create QlikView applications and test different aspects of development and discover how different development methods work. By creating our own set of data, we can abstract problems away from the business issues that we are trying to solve because the data is not connected to those problems. Instead, we can resolve the technical issue underlying the business issue. Once we have resolved that issue, we will have built an understanding that allows us to more quickly resolve the real problems with the business data.
We might contemplate that if we are developers who only have access to a certain dataset, then we will only learn to solve the issues in that dataset. For true mastery, we need to be able to solve issues in many different scenarios, and the only way that we can do that is to generate our own test data to do that with.
Generating dimension values
Dimension tables will generally have lower numbers of records; there are a number of websites online that will generate this type of data for you.
For quite a while, I used http://www.generatedata.com to generate random data such as company names, and so on. However, in a recent blog entry by Barry Harmsen (Barry is the co-author of QlikView 11 for Developers, Packt Publishing) at http://www.qlikfix.com, he mentioned http://www.mockaroo.com as a resource for generating such tables.
The following screenshot demonstrates setting up a Customer extract in Mockaroo:
This allows us to create 1,000 customer records that we can include in our QlikView data model. The extract is in the CSV format, so it is quite straightforward to load into QlikView.
Generating fact table rows
While we might often abdicate the creation of test dimension tables to a third-party website like this, we should always try and generate the Fact
table data ourselves.
A good way to do this is to simply generate rows with a combination of the AutoGenerate()
and Rand()
functions.
For even more advanced use cases, we can look at using statistical functions such as NORMINV
to generate normal distributions. There is a good article on this written by Henric Cronström on Qlik Design Blog at http://community.qlik.com/blogs/qlikviewdesignblog/2013/08/26/monte-carlo-methods.
We should be aware of the AutoGenerate()
function that will just simply generate empty rows of data. We can also use the Rand()
function to generate a random number between 0 and 1 (it works both in charts and in the script). We can then multiply this value by another number to get various ranges of values.
In the following example, we load a previously generated set of dimension tables—Customer
, Product
, and Employee
. We then generate a number of order header and line rows based on these dimensions, using random dates in a specified range.
First, we will load the Product
table and derive a couple of mapping tables:
// Load my auto generated dimension files Product: LOAD ProductID, Product, CategoryID, SupplierID, Money#(CostPrice, '$#,##0.00', '.', ',') As CostPrice, Money#(SalesPrice, '$#,##0.00', '.', ',') As SalesPrice FROM Products.txt (txt, utf8, embedded labels, delimiter is '\t', msq); Product_Cost_Map: Mapping Load ProductID, Num(CostPrice) Resident Product; Product_Price_Map: Mapping Load ProductID, Num(SalesPrice) Resident Product;
Now load the other dimension tables:
Customer: LOAD CustomerID, Customer, City, Country, Region, Longitude, Latitude, Geocoordinates FROM Customers.txt (txt, codepage is 1252, embedded labels, delimiter is '\t', msq); Employee: LOAD EmployeeID, Employee, Grade, SalesUnit FROM Employees.txt (txt, codepage is 1252, embedded labels, delimiter is '\t', msq) Where Match(Grade, 0, 1, 2, 3); // Sales people
We will store the record counts from each table in variables:
// Count the ID records in each table Let vCustCount=FieldValueCount('CustomerID'); Let vProdCount=FieldValueCount('ProductID'); Let vEmpCount=FieldValueCount('EmployeeID');
We now generate some date ranges to use in the data calculation algorithm:
// Work out the days Let vStartYear=2009; // Arbitrary - change if wanted Let vEndYear=Year(Now()); // Generate up to date data // Starting the date in April to allow // offset year testing Let vStartDate=Floor(MakeDate($(vStartYear),4,1)); Let vEndDate=Floor(MakeDate($(vEndYear),3,31)); Let vNumDays=vEndDate-vStartDate+1;
Tip
Downloading the example code
You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Run a number of iterations to generate data. By editing the number of iterations, we can increase or decrease the amount of data generated:
// Create a loop of 10000 iterations For i=1 to 10000 // "A" type records are for any date/time // Grab a random employee and customer Let vRnd = Floor(Rand() * $(vEmpCount)); Let vEID = Peek('EmployeeID', $(vRnd), 'Employee'); Let vRnd = Floor(Rand() * $(vCustCount)); Let vCID = Peek('CustomerID', $(vRnd), 'Customer'); // Create a date for any Time of Day 9-5 Let vOrderDate = $(vStartDate) + Floor(Rand() * $(vNumDays)) + ((9/24) + (Rand()/3)); // Calculate a random freight amount Let vFreight = Round(Rand() * 100, 0.01); // Create the header record OrderHeader: Load 'A' & $(i) As OrderID, $(vOrderDate) As OrderDate, $(vCID) As CustomerID, $(vEID) As EmployeeID, $(vFreight) As Freight AutoGenerate(1); // Generate Order Lines // This factor allows us to generate a different number of // lines depending on the day of the week Let vWeekDay = Num(WeekDay($(vOrderDate))); Let vDateFactor = Pow(2,$(vWeekDay))*(1-(Year(Now())-Year($(vOrderDate)))*0.05); // Calculate the random number of lines Let vPCount = Floor(Rand() * $(vDateFactor)) + 1; For L=1 to $(vPCount) // Calculate random values Let vQty = Floor(Rand() * (50+$(vDateFactor))) + 1; Let vRnd = Floor(Rand() * $(vProdCount)); Let vPID = Peek('ProductID', $(vRnd), 'Product'); Let vCost = ApplyMap('Product_Cost_Map', $(vPID), 1); Let vPrice = ApplyMap('Product_Price_Map', $(vPID), 1); OrderLine: Load 'A' & $(i) As OrderID, $(L) As LineNo, $(vPID) As ProductID, $(vQty) As Quantity, $(vPrice) As SalesPrice, $(vCost) As SalesCost, $(vQty)*$(vPrice) As LineValue, $(vQty)*$(vCost) As LineCost AutoGenerate(1); Next // "B" type records are for summer peak // Summer Peak - Generate additional records for summer // months to simulate a peak trading period Let vY = Year($(vOrderDate)); Let vM = Floor(Rand()*2)+7; Let vD = Day($(vOrderDate)); Let vOrderDate = Floor(MakeDate($(vY),$(vM),$(vD))) + ((9/24) + (Rand()/3)); if Rand() > 0.8 Then // Grab a random employee and customer Let vRnd = Floor(Rand() * $(vEmpCount)); Let vEID = Peek('EmployeeID', $(vRnd), 'Employee'); Let vRnd = Floor(Rand() * $(vCustCount)); Let vCID = Peek('CustomerID', $(vRnd), 'Customer'); // Calculate a random freight amount Let vFreight = Round(Rand() * 100, 0.01); // Create the header record OrderHeader: Load 'B' & $(i) As OrderID, $(vOrderDate) As OrderDate, $(vCID) As CustomerID, $(vEID) As EmployeeID, $(vFreight) As Freight AutoGenerate(1); // Generate Order Lines // This factor allows us to generate a different number of // lines depending on the day of the week Let vWeekDay = Num(WeekDay($(vOrderDate))); Let vDateFactor = Pow(2,$(vWeekDay))*(1-(Year(Now())-Year($(vOrderDate)))*0.05); // Calculate the random number of lines Let vPCount = Floor(Rand() * $(vDateFactor)) + 1; For L=1 to $(vPCount) // Calculate random values Let vQty = Floor(Rand() * (50+$(vDateFactor))) + 1; Let vRnd = Floor(Rand() * $(vProdCount)); Let vPID = Peek('ProductID', $(vRnd), 'Product'); Let vCost = ApplyMap('Product_Cost_Map', $(vPID), 1); Let vPrice = ApplyMap('Product_Price_Map', $(vPID), 1); OrderLine: Load 'B' & $(i) As OrderID, $(L) As LineNo, $(vPID) As ProductID, $(vQty) As Quantity, $(vPrice) As SalesPrice, $(vCost) As SalesCost, $(vQty)*$(vPrice) As LineValue, $(vQty)*$(vCost) As LineCost AutoGenerate(1); Next End if Next // Store the Generated Data to QVD Store OrderHeader into OrderHeader.qvd; Store OrderLine into OrderLine.qvd;
Note
Barry Harmsen, co-author of QlikView 11 for Developers, Packt Publishing, recommends a slightly different method for generating seasonal variation. By using the Sin()
or Cos()
functions to generate a table containing the number of records to generate for each day, we can loop across this table and use these values to auto-generate rows for the fact table.