Scenarios

Scenarios allow you to shape numeric distributions based on other columns in your schema. For example, let's say you want to generate a file where each row represents the sale of a car, including the model, region, sale price, and date of sale.


[Image: Sales]

Here we use the normal distribution field type to generate reasonable prices. Let's look at some sample data:

date        model     region  price
2014-10-26  Explorer  SE      25341
2014-10-30  Mustang   NE      25051
2014-10-17  Focus     SE      26003
2014-10-18  Focus     MW      24396
2014-10-02  Mustang   MW      25670
2014-10-09  Explorer  NW      25137
2014-10-14  Explorer  SE      24027
2014-10-24  Focus     SW      26206
2014-10-10  Explorer  SW      22668
2014-10-18  Explorer  NE      23611

See the problem? All models cost about the same on average, which isn't realistic. Let's create a scenario to better model the real-world prices of each model.


[Image: Scenario]

Here we use the value of the model column to control the price range. We make the Focus model less expensive while boosting the price of the Explorer. We also adjust the standard deviation to simulate the wider price fluctuations seen on more expensive models.
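The idea behind the scenario can be sketched outside the tool in a few lines of Python. This is only an illustration, not how the product implements scenarios: the `SCENARIO` table, the function names, and every mean and standard deviation below are assumptions, picked to roughly match the price ranges discussed on this page.

```python
import random

# Hypothetical scenario: model -> (mean price, standard deviation).
# These numbers are assumptions for illustration, not the page's real settings.
SCENARIO = {
    "Focus": (16500, 500),
    "Mustang": (26000, 1000),
    "Explorer": (29500, 1500),  # pricier model, wider spread
}


def sample_price(model, rng):
    """Draw a price from the normal distribution assigned to this model."""
    mean, stddev = SCENARIO[model]
    return round(rng.gauss(mean, stddev))


def generate_sales(n, seed=0):
    """Return n (model, price) rows; the model is picked uniformly at random."""
    rng = random.Random(seed)
    models = list(SCENARIO)
    rows = []
    for _ in range(n):
        model = rng.choice(models)
        rows.append((model, sample_price(model, rng)))
    return rows


if __name__ == "__main__":
    for model, price in generate_sales(5):
        print(model, price)
```

The key point is that the distribution's parameters are looked up from the value of another column (`model`) instead of being fixed for the whole price column.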

Now let's change our schema to use our new scenario...


[Image: Sales weighted]

Let's have a look at some sample data...

date        model     region  price
2014-10-05  Focus     SW      16206
2014-10-20  Explorer  SW      27987
2014-10-13  Explorer  SE      31191
2014-10-17  Focus     SE      16809
2014-10-25  Focus     NE      16229
2014-10-21  Explorer  NW      29149
2014-10-28  Explorer  NW      30061
2014-10-15  Mustang   MW      26221
2014-10-03  Explorer  NE      28423
2014-10-29  Mustang   MW      26568

Much better! Now our sales figures reflect the different average price of each model.
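To check the result numerically, you could average the price per model over the ten sample rows shown above. The snippet below is a small sketch of that check; the function name `mean_price_by_model` is made up for this example, and the rows are copied from the sample data (date and region omitted).

```python
from collections import defaultdict
from statistics import mean

# (model, price) pairs copied from the second sample table above.
SAMPLE_ROWS = [
    ("Focus", 16206),
    ("Explorer", 27987),
    ("Explorer", 31191),
    ("Focus", 16809),
    ("Focus", 16229),
    ("Explorer", 29149),
    ("Explorer", 30061),
    ("Mustang", 26221),
    ("Explorer", 28423),
    ("Mustang", 26568),
]


def mean_price_by_model(rows):
    """Group prices by model and return the average price for each model."""
    groups = defaultdict(list)
    for model, price in rows:
        groups[model].append(price)
    return {model: mean(prices) for model, prices in groups.items()}


if __name__ == "__main__":
    for model, avg in sorted(mean_price_by_model(SAMPLE_ROWS).items()):
        print(f"{model}: {avg:.0f}")
```

On these ten rows the averages come out to roughly 16,415 for the Focus, 26,395 for the Mustang, and 29,362 for the Explorer. A sample this small is only indicative, but it shows the clear separation between models that the first data set lacked.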