A few weeks ago I published a post detailing the process by which to create a histogram with dynamic bins using DAX in PowerPivot. While standard histograms are useful to visualize distribution, even greater analytic value can be derived from that same data by converting a histogram into a distribution chart. Those of you who have taken courses in statistics are probably familiar with the normal distribution curve, which is a distribution chart that expresses common variation from the mean.
As a continuation of my previous post, I’ll build a distribution chart that adds both a dynamic mean and dynamic standard deviation lines to the dynamic bins of the histogram:
First, a dynamic mean needs to be added to PowerPivot using DAX. The dynamic mean [SPEED AVG] is added to the calculation bar as follows:
In order to create a “specialized version” of [SPEED AVG] that won’t slice by the x-axis [BIN NAME] values, another calculation can be added to the calculation bar. It is named [DISTR SPEED AVG]:
DISTR SPEED AVG:=CALCULATE([SPEED AVG],ALL(‘Fact StrikeReport'[BIN NAME]))
Quite often, both the mean and the median are often displayed together as a representation of how close a distribution curve is to a perfectly normal distribution curve. With a perfect normal distribution curve, the mean and the median will be one and the same. I won’t be adding a median line for this example, but if you’d like to give it a try you might want to reference this article from Javier Guillen or this article from Marco Russo.
Now let’s get going on the standard deviation lines. The standard deviation DAX function [StDev] can be added to the calculation bar to calculate dynamically for the population as determined by selected parameters in the report:
StDev:=CALCULATE(STDEV.P([SPEED]),ALL(‘Fact StrikeReport'[BIN NAME]))
It should be noted that the data used for this example does not appear to be on a normal distribution curve, so by no means is the report a textbook use of standard deviation calculations. However using standard deviation lines on the chart provides, if nothing else, a useful visual aid for understanding and interpreting data. The standard deviation lines for each tail of the curve can be created with DAX on the calculation bar as follows:
StDevMaxLine:=[DISTR SPEED AVG] + [StDev]
StDevMinLine:=[DISTR SPEED AVG] – [StDev]
Now that there is a dynamic mean and standard deviation lines, they need to be assigned to bins on the x axis. First, determine where the lines are greater than or equal to a bin floor or less than a bin ceiling using DAX on the calculation bar:
For [DISTR SPEED AVG]:
SpAvgBelow:=COUNTROWS(FILTER(‘Fact StrikeReport’,CALCULATE([DISTR SPEED AVG],’Fact StrikeReport’) < [Bin Ceiling]))
SpAvgAbove:=COUNTROWS(FILTER(‘Fact StrikeReport’,CALCULATE([DISTR SPEED AVG],’Fact StrikeReport’) >= [Bin Floor]))
StdevMaxBelow:=COUNTROWS(FILTER(‘Fact StrikeReport’,CALCULATE([StDevMaxLine],’Fact StrikeReport’) < [Bin Ceiling]))
StdevMaxAbove:=COUNTROWS(FILTER(‘Fact StrikeReport’,CALCULATE([StDevMaxLine],’Fact StrikeReport’) >= [Bin Floor]))
StdevMinBelow:=COUNTROWS(FILTER(‘Fact StrikeReport’,CALCULATE([StDevMinLine],’Fact StrikeReport’) < [Bin Ceiling]))
StdevMinAbove:=COUNTROWS(FILTER(‘Fact StrikeReport’,CALCULATE([StDevMinLine],’Fact StrikeReport’) >= [Bin Floor]))
Last but not least, use the above calculations to determine when each standard deviation line is within a specific bin. Each of the bins [GraphedAvgSpeed], [SdevMax] and [SdevMin] will be assigned the same count of [INCIDENTS] as the actual count of that bin for data visualization purposes:
SdevMax:=IF([StdevMaxBelow] > 0,IF([StdevMaxAbove] > 0,[INCIDENTS],0),0)
SdevMin:=IF([StdevMinBelow] > 0,IF([StdevMinAbove] > 0,[INCIDENTS],0),0)
GraphedAvgSpeed:=IF([SpAvgBelow] > 0,IF([SpAvgAbove] > 0,[INCIDENTS],0),0)
Now, the fun can begin as you graph the data on an Excel PivotChart. A few tips to help optimize the visualization of the data in your dynamic BI report:
- Change the Excel Series Chart Type to “line” for [INCIDENTS] and add [SdevMax], [SdevMin], and [GraphedAvgSpeed] as bars. All three calculations are on the primary y axis, and [BIN NAME] is still on the x axis.
- Right click on a bar, select “Format Data Series,” and under “Series Options” change the overlap to 100%. This places the bars in the middle of the bins. While not completely true to the core calculations, a uniform look can be achieved this way with evenly spaced bars.
- Right click on the line, select “Format Data Series,” and under “Line Style” select “smoothed line.” The distribution line will look less choppy.
Due to placement and centering of the dynamic average and the standard deviation lines within bins, some of the lines will be slightly askew from their appropriate location. However, as mentioned earlier, they can be useful visualization aids for a chart. As a side note, the greater the number of bins, the better the visualization of the data constituting the lines. Following is a distribution chart built from a histogram having dynamic bins, a dynamic mean, and dynamic standard deviation lines:
Filter for Barn Swallows, and the BI chart will adjust accordingly: