Average vs. Median in a sample vs. the full population

Let's say I take a random sample from a full population whose average and median I know. Can I estimate the average and median of the sample from the average and median of the full population (without an explicit calculation)?

I am assuming the medians should be close, while the averages do not need to be close at all, depending on the distribution.

In other words, if I select a hypothetical random element, the value of that element should be probabilistically closer to the median of the population than to the average.

asked Jun 14, 2022 at 15:11

1 Answer

  1. You can use the sample mean to estimate the population mean and the sample median to estimate the population median. For some distributions they may not be the best estimators, but they are natural estimators.
  2. Whether the absolute distance from the sample median to the population median tends to be larger or smaller than the absolute distance from the sample mean to the population mean will depend on the distribution.

As an example, consider a sample of size $5$ from a normal distribution with mean and median $0$, and let's simulate that $10^5$ times using R. The sample median is closer to $0$ than the sample mean is in about $40\%$ of simulations, and the dispersion for the sample median (red) is wider than for the sample mean (blue):

 set.seed(2022) samplem sims  

[Figure: distributions of the sample median (red) and the sample mean (blue) across the simulations, normal case]
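A minimal sketch of a simulation of this kind in base R, not necessarily the code behind the figures quoted above. The helper name `median_and_mean` and the summary variables are hypothetical, and the standard normal (`rnorm` defaults) is assumed as the population.

```r
set.seed(2022)

# Hypothetical helper: draw one sample of size n from a standard normal
# and return its sample median and sample mean
median_and_mean <- function(n) {
  x <- rnorm(n)
  c(median = median(x), mean = mean(x))
}

# 10^5 simulated samples of size 5; the result is a 2 x 10^5 matrix
# with rows named "median" and "mean"
sims <- replicate(10^5, median_and_mean(5))

# Proportion of simulations in which the sample median is closer to 0
# (the population mean and median) than the sample mean is
prop_median_closer <- mean(abs(sims["median", ]) < abs(sims["mean", ]))
prop_median_closer

# Dispersion of each estimator across the simulations
sd_median <- sd(sims["median", ])
sd_mean <- sd(sims["mean", ])
c(sd_median = sd_median, sd_mean = sd_mean)

# Overlaid density estimates: sample mean in blue, sample median in red
plot(density(sims["mean", ]), col = "blue",
     main = "Sample median (red) vs sample mean (blue)")
lines(density(sims["median", ]), col = "red")
```

For the normal case, `sd_median` should come out larger than `sd_mean`, which is the wider red curve described in the text.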

Now consider a Laplace distribution (i.e. an exponential distribution with random signs). The sample median is closer to $0$ than the sample mean is in about $55\%$ of simulations, and the dispersion for the sample median (red) is narrower than for the sample mean (blue):

set.seed(2022) samplem sims  

[Figure: distributions of the sample median (red) and the sample mean (blue) across the simulations, Laplace case]
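The Laplace case can be sketched the same way. Base R has no Laplace generator, so this sketch builds one from exponential magnitudes with independent random signs, as the description above suggests. The names `rlaplace_sketch` and `median_and_mean_laplace` are hypothetical, and a sample size of $5$ is assumed to match the normal example.

```r
set.seed(2022)

# Hypothetical helper: n draws from a standard Laplace distribution,
# built as Exp(1) magnitudes multiplied by independent random signs
rlaplace_sketch <- function(n) {
  rexp(n) * sample(c(-1, 1), n, replace = TRUE)
}

# Hypothetical helper: sample median and sample mean of one Laplace sample
median_and_mean_laplace <- function(n) {
  x <- rlaplace_sketch(n)
  c(median = median(x), mean = mean(x))
}

# 10^5 simulated samples of size 5 (sample size assumed, as in the normal example)
sims_laplace <- replicate(10^5, median_and_mean_laplace(5))

# Proportion of simulations in which the sample median is closer to 0
prop_median_closer_laplace <-
  mean(abs(sims_laplace["median", ]) < abs(sims_laplace["mean", ]))
prop_median_closer_laplace

# Dispersion of each estimator across the simulations
sd_median_laplace <- sd(sims_laplace["median", ])
sd_mean_laplace <- sd(sims_laplace["mean", ])
c(sd_median = sd_median_laplace, sd_mean = sd_mean_laplace)
```

Here the ordering of the two standard deviations should be the reverse of the normal case, with the sample median the less dispersed estimator.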

  3. Your third paragraph is actually a different question. You would have to explain precisely what you mean by the value of a random variable being "probabilistically closer to the median of the population rather than the average", and note that for a symmetric distribution the population mean and median are equal (if they are both uniquely defined), as in the two previous examples. It is correct that the expected absolute distance to a central point is minimised when that central point is the population median, while the expected squared distance to the central point is minimised when that central point is the population mean.
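The last statement can be checked numerically. The sketch below is an illustration under assumptions rather than a proof: it treats a large simulated exponential sample as a stand-in for a skewed population (so that mean and median differ) and minimises each criterion with `optimize`. The function names `mean_abs_dist` and `mean_sq_dist` are hypothetical.

```r
set.seed(2022)

# Assumed stand-in for a skewed population: 10^4 draws from Exp(1)
x <- rexp(10^4)

# Average absolute and average squared distance from the values to a point m
mean_abs_dist <- function(m) mean(abs(x - m))
mean_sq_dist <- function(m) mean((x - m)^2)

# Numerically minimise each criterion over the range of the data
argmin_abs <- optimize(mean_abs_dist, interval = range(x))$minimum
argmin_sq <- optimize(mean_sq_dist, interval = range(x))$minimum

# The absolute-distance minimiser should sit at (about) the median,
# the squared-distance minimiser at (about) the mean
c(argmin_abs = argmin_abs, median = median(x))
c(argmin_sq = argmin_sq, mean = mean(x))
```

Agreement is only up to the numerical tolerance of `optimize`, and with an even number of values any point between the two middle observations minimises the absolute-distance criterion.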