Sampling his conquered population must certainly have been fun for Genghis Khan, and, as they say, "size does matter".
Slightly more seriously, any actual living population, say of Caucasians, is thought of as having been drawn independently, one at a time, from some imagined process such that the drawings are characterized by that formula for a normal probability density function.
If that is true (that the process really produces normally distributed drawings), then mathematics can be used to prove that the sample mean is an unbiased estimator of the population mean. It turns out that the "adjusted sample variance", which is calculated using N-1 rather than N (where N is the sample size), is an unbiased estimator of the population variance. Its square root, the "adjusted sample standard deviation", is the usual estimate of the population standard deviation, although strictly speaking it is biased slightly low.
So you can get an unbiased estimate of the population variance from a sample of any size with at least two observations, but of course the estimate will be more reliable for greater sample sizes.
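To see what dividing by N-1 rather than N buys you, here is a minimal simulation sketch. The mean of 100, standard deviation of 15, sample size of 5, and the function name are all my own assumptions for illustration, not anything measured. It draws many small samples from a normal process and averages the two competing variance estimates.

```python
import random

def average_variance_estimates(n, trials, mu=100.0, sigma=15.0, seed=0):
    """Average the N-1 and N variance estimates over many simulated samples.

    mu and sigma are assumed values for the hidden normal process.
    """
    rng = random.Random(seed)
    total_n_minus_1 = 0.0
    total_n = 0.0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = sum(sample) / n
        squared_deviations = sum((x - mean) ** 2 for x in sample)
        total_n_minus_1 += squared_deviations / (n - 1)  # adjusted estimate
        total_n += squared_deviations / n                # unadjusted estimate
    return total_n_minus_1 / trials, total_n / trials

adjusted, unadjusted = average_variance_estimates(n=5, trials=20000)
print("true variance:          ", 15.0 ** 2)
print("average using N-1:      ", round(adjusted, 1))
print("average using N:        ", round(unadjusted, 1))
```

Under these assumptions the N-1 average should land close to the true variance of 225, while the N average should come out systematically low, by a factor of roughly (N-1)/N.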
any actual living population, say of Caucasians, is thought of as having been drawn independently, one at a time, from some imagined process such that the drawings are characterized by that formula for a normal probability density function.
I can't get my head around that statement.
What does it mean?
I'm talking about the distinction between the population mean and the sample mean. If you flip a fair coin, you know with certainty that the population mean is 1/2 heads. That comes from the definition of "fair coin". But if you flip such a coin 20 times and calculate the fraction of flips that came up heads, you'll get a number of the form X/20, where X will perhaps be 10 but might also be 9 or 11 or 8 or 12... The number "X/20" is called the sample mean. It is an unbiased estimator of the population mean, which we know to be 1/2.
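The coin example is easy to try for yourself. The sketch below is a simulation under assumed settings (10,000 repetitions of 20 flips, a fixed seed, and a helper name I made up). Each repetition gives one sample mean X/20, and the average of those sample means is compared with the population mean of 1/2.

```python
import random

def sample_mean_of_flips(n_flips, rng):
    """Flip a fair coin n_flips times and return the fraction of heads."""
    heads = sum(1 for _ in range(n_flips) if rng.random() < 0.5)
    return heads / n_flips

rng = random.Random(1)  # fixed seed so the run is repeatable
sample_means = [sample_mean_of_flips(20, rng) for _ in range(10000)]

# Any single sample mean can miss 1/2, but they are centered on it.
print("first five sample means:", sample_means[:5])
print("average of all of them: ", sum(sample_means) / len(sample_means))
```

Individual sample means scatter around 1/2, but their average should sit very close to it, which is what "unbiased" means here.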
We do not know the population mean for the IQ of sub-Saharan Africans. So we test 1000 of them, and calculate the sample mean IQ. We interpret that number as an estimate of what the true mean IQ is for sub-Saharan Africans. We don't know the true mean, and we don't know what the process that generates it is. Is it DNA? Is it the water or the food? Is it the mosquito bites? We don't know. That's why I refer to "some imagined process" (hidden from us) that is determining the outcomes that we observe.
The process that determines coin toss outcomes is not hidden from us because we (think that we) know the physics involved.
That leads to the question: assuming 1000 is an accurate statistical sample, one which provides the confidence level that you will accept, has such testing been done?
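When estimating a distribution that you assume to be normal, 1000 is a huge sample size. The standard error of the sample mean shrinks only in proportion to 1/√N, so the benefits of increasing the sample size taper off quickly: for most purposes they are already modest after you reach 20 or so, and they pretty much disappear after you reach 100.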
All of my remarks use a 95% confidence level, which is customary in science.
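To put rough numbers on how the benefit of a bigger sample tapers off, here is a back-of-the-envelope sketch. It assumes a standard deviation of 15 (the conventional IQ scaling, used here purely as an illustration) and treats it as known, so the 95% interval is the sample mean plus or minus 1.96 standard errors. With an estimated standard deviation and a small sample you would use a t multiplier instead, which widens the interval somewhat.

```python
import math

Z_95 = 1.96        # normal multiplier for a two-sided 95% interval
ASSUMED_SD = 15.0  # assumed population standard deviation (illustrative)

def ci_half_width(n, sd=ASSUMED_SD, z=Z_95):
    """Half-width of the 95% confidence interval for the mean of n draws."""
    return z * sd / math.sqrt(n)

for n in (20, 100, 1000):
    print(f"N = {n:4d}: sample mean +/- {ci_half_width(n):.2f}")
```

Under these assumptions, going from 20 to 100 observations narrows the interval more, in absolute terms, than going from 100 to 1000 does. Multiplying the sample size by ten only shrinks the interval by a factor of about 3.2.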
95% confidence level, which is customary in science.
Now I know where you are coming from.
Well, I am a trained social scientist. That means that I am able to study, and am deeply interested in, how society works and how it can be improved. I spent nine months working for a couple of research economists (James Smith and Finis Welch) at the RAND Corporation in Santa Monica who were studying what determines male/female and black/white earnings differences. I was their programmer, and my job was to run the regression analyses on the decennial U.S. Census data and on the yearly U.S. Current Population Survey data.
As I just stated in a new blog post, I am blogging about this topic primarily to test whether speech is tolerated unconditionally here on steemit.