Central estimators and uncertainty for making sense of huge experiments

A common use of cadCAD is to run thousands of simulations, each with a different set of parameters and random seeds. A recurring problem is how to compare those experiments reliably, since they can be very complex: they can involve a multitude of metrics as well as compositions of time series. The values that come out of the models can also follow arbitrary distributions, so we can’t rely on standard analysis procedures that assume everything is Gaussian.

Yesterday I had a discussion with @markusbkoch and @mzargham (this post is essentially a summary of it) about how to reliably make sense of the estimators we use to summarize the distributions that arise when observing the experiments. Normally, one would use a central estimator such as the mean or the median to aggregate the results and get a quick view. Going further, one can even pick an L-estimator [1].

But how much can we trust them? How reliable is the central estimator, and how can we make sense of it when we need to make a decision?

Those questions quickly branch into several others:

  • Which central estimator should we use?
  • How do we summarize the spread and skew around the central estimator?
  • How sure are we of the central estimator itself?
  • How much does the central estimator actually explain the data?
  • What is the use case for each one?

Central estimator

The first thing is to choose the central estimator. Why would you want one anyway? Because it usually has the property of being close to the region with the highest density of points in your distribution. It is a sort of treasure map to the place where you can find most occurrences of what you are studying.

Usually, people choose either the mean, because it is intuitive and converges fast, or the median, because it is robust to outliers.

The mean usually works best when your distribution is roughly Normal: symmetric and without additional peaks. The median works best when the distribution is more complicated. When exploring a system, people tend to use the median, because the mean is more easily biased by outliers and skew.

There is also an entire class of metrics that sit in between the mean and the median. They are the L-estimators, and you can find more about them on Wikipedia [1].
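To make the difference concrete, here is a minimal sketch comparing the mean, the median and a trimmed mean (one simple L-estimator) on a small sample with a single outlier. The `trimmed_mean` helper and the `runs` values are hypothetical, made up for illustration; only the Python standard library is used.

```python
import statistics


def trimmed_mean(values, trim_fraction):
    """Mean of the values left after dropping a fraction from each tail.

    trim_fraction=0.0 gives the plain mean; values close to 0.5 approach
    the median. This is one simple example of an L-estimator.
    """
    if not 0.0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)  # points dropped per tail
    return statistics.fmean(ordered[k:len(ordered) - k])


# Hypothetical metric from 10 runs, one of which blew up.
runs = [9.0, 10.0, 10.0, 11.0, 11.0, 12.0, 12.0, 13.0, 14.0, 500.0]

print("mean        :", statistics.fmean(runs))
print("median      :", statistics.median(runs))
print("trimmed mean:", trimmed_mean(runs, 0.1))
```

On this sample the single outlier pulls the mean to about 60, while the median (11.5) and the 10% trimmed mean (11.625) stay where most of the points are.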

Uncertainty of the data

When you have little information about the distribution itself, the best way to summarize the data is to report the quartiles (and the interquartile range between them). This captures both the spread and the skewness of the data, while still reporting them in a form that makes the applicable range easy to visualize.

Normally, you would report the median together with the 25th and 75th percentiles. For example, you could describe your distribution as having a median of 12.0, with 25th / 75th percentiles of 8.0 and 14.5. Here the lower quartile sits 4.0 below the median and the upper quartile only 2.5 above it, which already tells you the distribution is skewed towards lower values.
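A minimal sketch of that kind of report, using only the Python standard library. The `quartile_summary` helper and the `metric` values are hypothetical; the data was picked so that the output matches the numbers quoted above. Note that different quantile conventions (here `method="inclusive"`) can give slightly different quartiles on small samples.

```python
import statistics


def quartile_summary(values):
    """Return (q25, median, q75) for a sample of at least two values."""
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    return q25, q50, q75


# Hypothetical final-state metric collected over several runs.
metric = [3.0, 7.5, 8.0, 11.0, 12.0, 12.5, 14.5, 15.0, 16.0]
q25, med, q75 = quartile_summary(metric)
print(f"median {med}, 25% / 75% percentiles {q25}, {q75}")
```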

Uncertainty of the central estimator itself

An issue that sometimes arises with the central estimator is: how uncertain is the central estimator itself? When we calculate the percentiles and variance of the data, we are characterizing the scale of the randomness of the process behind the data. But what about the uncertainty of our own estimate?

This is usually obtained through the variance of the median (or the variance of the mean, if you prefer). For a Normal distribution and a large sample, the st. dev of the median is simply sigma_median = (pi/2)^(1/2) * sigma / sqrt[N], where sigma is the st. dev of the data and N is the sample size. [2]

Interestingly, if you look up the st. dev of the mean, sigma / sqrt[N], you’ll notice that it is lower by a constant factor (sqrt[pi/2], about 1.25, for a Normal distribution). This means that the estimate of the mean is less uncertain than the estimate of the median for the same amount of data. In other words: there is a trade-off between the robustness of the median and the speed of convergence of the mean.
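The two expressions can be checked numerically. The sketch below (standard library only; the function name, sample size, number of trials and seed are arbitrary choices for illustration) draws many Normal samples, computes the mean and median of each, and compares how much the two estimators fluctuate against sigma / sqrt[N] and sqrt[pi/2] * sigma / sqrt[N]. The median formula is a large-sample approximation, so only rough agreement should be expected.

```python
import math
import random
import statistics


def spread_of_estimators(sample_size, n_trials, sigma=1.0, seed=0):
    """Empirical st. dev. of the sample mean and sample median.

    Draws n_trials independent Normal(0, sigma) samples of the given size
    and measures how much each estimator fluctuates between samples.
    """
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(n_trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return statistics.stdev(means), statistics.stdev(medians)


N, SIGMA = 101, 1.0
sd_mean, sd_median = spread_of_estimators(N, n_trials=2000, sigma=SIGMA)

expected_mean = SIGMA / math.sqrt(N)
expected_median = math.sqrt(math.pi / 2) * SIGMA / math.sqrt(N)

print(f"mean  : simulated {sd_mean:.4f}, formula {expected_mean:.4f}")
print(f"median: simulated {sd_median:.4f}, formula {expected_median:.4f}")
print(f"ratio median/mean: {sd_median / sd_mean:.3f}")
```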

That’s why you would use the median for unknown distributions, and the mean for simple ones.

The expression above is for a Normal distribution, but there is also a computational way of doing it: apply the summarization to the median itself. We would look at the median of the medians, and the variance / percentiles of the medians. This usually requires characterizing or bootstrapping the distribution, which is practically a simulation of the simulation. Otherwise, we could use Monte Carlo runs or grouping. If you want, I’ve attached a notebook to this post that does this for a pathological distribution.
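As a rough illustration of the bootstrapping route (this is not the code from the attached notebook), the sketch below resamples a dataset with replacement, takes the median of each resample, and then summarizes those medians. The `bootstrap_medians` helper, the log-normal stand-in data, the seeds and the number of resamples are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_medians(values, n_resamples=1000, seed=0):
    """Medians of n_resamples resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(values)
    return [
        statistics.median(rng.choices(values, k=n))
        for _ in range(n_resamples)
    ]


# Stand-in for a skewed simulation output (assumption: log-normal draws).
data_rng = random.Random(42)
data = [data_rng.lognormvariate(0.0, 1.0) for _ in range(200)]

medians = bootstrap_medians(data)
cuts = statistics.quantiles(medians, n=40, method="inclusive")
low, high = cuts[0], cuts[-1]  # 2.5% and 97.5% percentiles

print("sample median       :", statistics.median(data))
print("median of medians   :", statistics.median(medians))
print("st. dev. of medians :", statistics.stdev(medians))
print("95% percentile band :", (low, high))
```

The percentile band of the medians makes no Normality assumption, which is the point of doing it computationally. With cadCAD output, the Monte Carlo runs themselves could play the role of the resamples.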

[Figure: variance of the medians as a function of the sample size, computed both numerically and through the math expression]

[Figure: the distribution used for the plot above]

Uncertainty of the central estimator given the data

Now we know how uncertain our data is and how uncertain our central estimator is. How do we connect the two to find out how uncertain our data is relative to our central estimator?

There are several ways of doing that, but an intuitive one is to calculate the uncertainty band and divide it by the central estimator. The uncertainty band is the size of the spread around your central estimator: it can be the difference between your upper and lower percentiles, or the standard deviation.

The ratio of the two tells you how much of the scale of the central estimator can be accounted for by the uncertainty in the data. (With the standard deviation and the mean, this ratio is the coefficient of variation.) It is really an indicative metric, which lets you compare in relative terms how random your process is.
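A minimal sketch of one such ratio, taking the interquartile range as the uncertainty band and the median as the central estimator. The `relative_uncertainty` helper and both samples are hypothetical, and the ratio is only meaningful when the median is not close to zero.

```python
import statistics


def relative_uncertainty(values):
    """Interquartile range divided by the absolute value of the median."""
    q25, q50, q75 = statistics.quantiles(values, n=4, method="inclusive")
    if q50 == 0:
        raise ValueError("ratio is undefined when the median is zero")
    return (q75 - q25) / abs(q50)


tight = [9.0, 9.5, 10.0, 10.5, 11.0]
noisy = [2.0, 6.0, 10.0, 14.0, 18.0]

print("tight process:", relative_uncertainty(tight))
print("noisy process:", relative_uncertainty(noisy))
```

Both samples have the same median (10.0), but the ratio is about 0.1 for the first and 0.8 for the second, so the second process is far more random relative to its typical value.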

When should you use those uncertainty measures?

It depends on what you are looking for. Analysis tasks can sometimes be divided, somewhat artificially, into inference-oriented and characterization-oriented ones. In the first, you are looking for regions on which to bet your decisions; in the second, you are trying to quantify how sure you are of them.

For the first, the median and the interquartile range are normally the way to go. They tell you in a straightforward manner where your certainty lies.

For the second, the variance / estimator ratio is sometimes more useful, as it tells you how physically meaningful your central estimator is in light of the incoming data.


[1] - https://en.wikipedia.org/wiki/L-estimator

[2] - https://mathworld.wolfram.com/StatisticalMedian.html


The link to a Colab notebook calculating the variance of the medians for a pathological distribution: https://colab.research.google.com/drive/136xaod6A4OSe2EwvS2Rs8g_wcqfIHN5v?usp=sharing


I will definitely come back to this. Nassim Taleb’s “Technical Incerto” is very relevant, especially when we want to model economies without the pitfalls of traditional, overly simplifying econometrics: https://arxiv.org/abs/2001.10488
"A few examples:

  • The sample mean is rarely in line with the population mean, with effect on “naive empiricism”, but can be sometimes be estimated via parametric methods.
  • The “empirical distribution” is rarely empirical.
  • Parameter uncertainty has compounding effects on statistical metrics.
  • Dimension reduction (principal components) fails.
  • Inequality estimators (GINI or quantile contributions) are not additive and produce wrong results.
  • Many “biases” found in psychology become entirely rational under more sophisticated probability distributions
  • Most of the failures of financial economics, econometrics, and behavioral economics can be attributed to using the wrong distributions."

What a great reference! I loved it, and I’ll read it thoroughly this week.

It is interesting that you mention the pitfalls of overly simplifying econometrics. You see, I am a physicist, and dealing with uncertainty is one of the cornerstones of our science. Without effective communication of where and how much we are uncertain, both in scale and form, hard data can quickly become a black box by itself.

Last semester I took a course on historical economics, and one of the things that shocked me was how economists dealt with uncertainty when comparing data. Most of the time, they didn’t even bother to give an estimate of how wrong the data could be, as this could weaken their arguments. Sometimes they did bother to state an uncertainty, but most of the time it was grounded on a presumption of normality that had no inherent justification.

A key thing to keep in mind about uncertainty is that probability is, in the end, relative and depends on the prior of the observer. It is hard to reach universal conclusions about anything, but if you are clear and transparent about your priors and preconceptions, and accept that all knowledge is intrinsically probabilistic in the sense that it is always uncertain, then science can follow a more constructive approach.


Wonderful! I was looking for a “quote” for a hitchhiker’s guide to token engineering, to combine the “businessy stakeholder mapping/design thinking parts” that will yield a stock & flow description … which we can start taking over to best practices from this group like adding the “diamonds” and porting over to Differential Syntax… Some feedback has already come back suggesting we separate “the soft” from the “hard engineering” part, but I think that has been the problem for too long… It is also why systems thinking or “Business Dynamics” remained “niche”… We need scientists and engineers to care about the “soft” side and not just say “gimme your assumptions” to the business side - unless we have done a good enough job of helping them arrive at not so wrong assumptions …

Anyways, this I think you’ll like as well: Exponential Economist Meets Finite Physicist | Do the Math

…Will take this thread as an inspiration to knit the communication gap between business and tech, in case of token economies, economic/legal/financial engineers and systems/software engineers