When creating your study or evaluating the quality of previous research, pay attention to three aspects of the sampling procedure: (a) the composition of the target population; (b) the size of the sample, and (c) the representativeness of the sample with regard to the target population.

1. Does the sample provide good coverage of the target population?
Always specify the target population of your study. The target population is the universe of objects, situations or actors that the results should be informative about. What is the population that you want to make claims about? The sample you analyze should represent the target population as much as possible.
If you use data collected by others, look for statements about the target population, and about the rules that were followed to obtain a representative sample of that population. In reports about experiments, authors often leave the target population unspecified. This implies the assumption that results based on fewer than a hundred students from the authors' own university are representative of humankind. This is typically not the case: experiments often use convenience samples of 'WEIRD' participants: students from universities in Western, Educated, Industrialized, Rich and Democratic countries (Henrich, Heine & Norenzayan, 2010). In addition to a description of the target population, look for a constraints on generality (COG) statement (Simons, Shoda & Lindsay, 2017). If you collect data yourself, discuss the limitations of generalization.

2. Is the sample large enough?
As a rule, larger samples are better than smaller ones, because larger samples increase the statistical power to detect existing relationships among variables. A single case study, for instance, provides no basis for generalization to other contexts. In practice, time and money limit the number of observations you can collect. That is why you should conduct a power analysis before you start collecting data: start from the smallest effect size of interest, and determine the number of observations required to detect it (for guidance, see Lakens, 2021; Perugini, Gallucci & Costantini, 2018).
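The logic of such a power analysis can be sketched in a few lines. The function below is an illustrative normal-approximation calculation for a two-group comparison of means (the function name and defaults are mine, not from the cited sources; dedicated software or the packages discussed by Lakens, 2021, give slightly more exact answers based on the t-distribution):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect a standardized
    mean difference (Cohen's d) in a two-sided two-sample test,
    using the normal approximation to the t-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided criterion
    z_beta = NormalDist().inv_cdf(power)           # desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

# A 'small' effect needs far more observations than a 'large' one:
print(n_per_group(0.2))  # 393 per group
print(n_per_group(0.8))  # 25 per group
```

Note how the required sample size grows with the inverse square of the effect size: halving the smallest effect size of interest quadruples the number of observations you need.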

The overwhelming majority of publications do not report such an a priori power analysis. In most cases, the number of observations is determined by funding limitations, practical circumstances, or rules of thumb such as 50 observations per cell or 15 interviews. These factors produce underpowered samples that do not allow for robust generalizations. Keep this in mind when you read previous research. The flip side of the law of large numbers is that small samples are more likely to produce chance findings. In addition, small-n studies such as laboratory and field experiments with strongly significant results may have been 'p-hacked': the non-significant results are not shown (Simonsohn, Nelson & Simmons, 2014).

While larger samples are generally preferable to smaller ones, large samples have a downside as well: even very weak relationships will easily reach significance. You may find yourself impressed by the many stars marking significant relationships. Regardless of the size of the sample, however, the strength of a relationship (its 'effect size') is more important than its significance. Some (Ziliak & McCloskey, 2004) even argue that if a relationship is not sizeable, significance does not matter ('no size, no significance'). In any case, a strongly significant difference between males and females of one tenth of a standard deviation in a sample of 3 million individuals is less impressive than a marginally significant difference of one and a half standard deviations in a sample of 1,000 individuals. Thus the second rule is: substance outweighs significance.
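The arithmetic behind this rule is simple: for a fixed standardized difference, the test statistic grows with the square root of the sample size, so any nonzero effect eventually becomes 'significant' if the sample is large enough. A minimal sketch (normal approximation; the function name and the example numbers are illustrative):

```python
from math import sqrt
from statistics import NormalDist

def z_and_p(d, n_per_group):
    """Approximate test statistic and two-sided p-value for a
    standardized mean difference d with n observations per group
    (normal approximation to the two-sample test)."""
    z = d * sqrt(n_per_group / 2)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p

# A trivial effect in a huge sample is 'highly significant'...
print(z_and_p(0.1, 1_500_000))  # z well above 80, p effectively 0
# ...while a large effect in a tiny sample barely reaches p < .05.
print(z_and_p(1.5, 4))
```

The effect size d is untouched by the sample size; only the stars change. That is exactly why substance outweighs significance.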

With small sample sizes, statistics can be misleading because they suggest a precision that is strongly sensitive to the composition of the sample, and particularly to outliers. If you have a small sample, reporting the proportion of that sample that has a certain characteristic suggests more precision than the data can support. As a rule of thumb, I suggest you avoid publishing proportions and other derived statistics based on fewer than n = 15 observations. When you have more observations, but fewer than 250, still exercise caution: correlations stabilize only after roughly 250 observations (Schönbrodt & Perugini, 2013). In multilevel analyses, the 'more is better' rule applies even more forcefully. A rule of thumb is that at least 25 higher-level units are required for even the simplest linear models (Bryan & Jenkins, 2016). At this number, however, the variance at the higher level is still easily overestimated, and at least 50 units are required to reduce this bias (Maas & Hox, 2005).
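The instability of small-sample estimates is easy to demonstrate by simulation. The sketch below (an illustrative pure-Python simulation, not the procedure used by Schönbrodt & Perugini) draws many samples from a population with a true correlation of .30 and shows how widely sample correlations scatter at n = 20 compared with n = 250:

```python
import random
from math import sqrt
from statistics import pstdev

def sample_correlation(n, true_r, rng):
    """Draw n pairs from a bivariate normal population with
    correlation true_r; return the sample Pearson correlation."""
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0, 1)
        y = true_r * x + sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(1)
for n in (20, 250):
    estimates = [sample_correlation(n, 0.3, rng) for _ in range(1000)]
    # the scatter of estimates around .30 shrinks markedly as n grows
    print(n, round(pstdev(estimates), 2))
```

At n = 20, individual samples routinely produce correlations near zero or above .50 even though the true value is .30; at n = 250, the estimates cluster tightly around the true value.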

3. Is the sample representative of the target population?
When you work with a sample, specify the extent to which the sample represents the target population. In most cases, the representativeness is unknown. Reports on survey research data often describe samples as ‘nationally representative’. The question you should always ask is “Representative with respect to what?”

The rule here is that variance counts. The results of a study depend not only on the size of the sample, but also on the variance in both the outcome and the predictor variables. Ask yourself: which parts of the target population have not been included?
To some extent, samples are not representative of the entire population because some groups are excluded by the design of the sampling frame; this is called coverage error. The institutionalized population, the imprisoned, the rich, the very sick, the mentally challenged, and those who do not understand the questions or do not speak the dominant language in a country are less likely to be included in samples.
In addition to coverage error, a second source of error is sampling error. When participation in interviews, surveys, or experiments is voluntary, samples typically consist of individuals with above-average intelligence, health, and civic-mindedness. As a result, those who are easier to reach and more willing to help science are overrepresented, while non-voters, citizens in remote areas, and persons in areas with inferior internet connectivity are less well represented. When you work with data that others have collected, examine how much effort was made and which strategies were used to mitigate such risks.
As a rule, snowball and quota samples are not representative. Typically, samples in survey research are weighted on a few key demographic variables, such as gender, age, and place of residence, for which the true population values are known from registers. However, keep in mind that this procedure may yield highly inaccurate results if some weights are very small or very large. Also bear in mind that non-response may be selective. If participants are selected based on their interest in the topic of the study, or on some other characteristic that is related to the independent or the dependent variable, the results are likely to be biased. This complicates accurate testing of hypotheses.
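The weighting procedure itself is straightforward: each stratum receives a weight equal to its known population share divided by its observed sample share. A minimal sketch (the function name and the example shares are illustrative, not taken from any real survey):

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight per stratum = population share / sample share.
    sample_counts: observed counts per stratum in the sample.
    population_shares: known shares per stratum, e.g. from registers."""
    n = sum(sample_counts.values())
    return {stratum: population_shares[stratum] / (count / n)
            for stratum, count in sample_counts.items()}

# A hypothetical sample that over-represents women (60% observed
# versus 51% in the population):
w = poststratification_weights(
    {"female": 600, "male": 400},
    {"female": 0.51, "male": 0.49},
)
print(w)  # {'female': 0.85, 'male': 1.225}
```

The caveat in the text is visible here in miniature: the smaller a stratum is in the sample relative to the population, the larger its weight, and a handful of heavily up-weighted respondents can then dominate the estimates.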

Avoid sampling on the dependent variable. If you are a fundraiser and you ask a sample of donors why they donate to your organization, you will not learn much that helps you recruit new donors. It is more helpful to ask non-donors who considered donating why they chose not to do so, or to ask previous donors why they stopped giving.

As a general rule, looking only at successful cases results in survivorship bias, which can severely distort your findings (Brown, Goetzmann, Ibbotson & Ross, 1992). A somewhat milder form of this problem arises when selection into the study depends on factors that also influence the dependent variable, also known as collider bias (Elwert & Winship, 2014).
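Collider bias can be made concrete with a small simulation (a hypothetical sketch; the variable names and the selection threshold are illustrative). Two variables that are independent in the full population appear negatively correlated once we look only at cases selected on their combined value, the way a study of 'successes' does:

```python
import random

def collider_demo(n=20_000, threshold=1.0, seed=2):
    """x and y are independent in the full data; 'selection' keeps
    only cases where their sum (a collider) exceeds a threshold."""
    rng = random.Random(seed)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    selected = [(x, y) for x, y in data if x + y > threshold]

    def corr(pairs):
        m = len(pairs)
        mx = sum(x for x, _ in pairs) / m
        my = sum(y for _, y in pairs) / m
        cov = sum((x - mx) * (y - my) for x, y in pairs)
        vx = sum((x - mx) ** 2 for x, _ in pairs)
        vy = sum((y - my) ** 2 for _, y in pairs)
        return cov / (vx * vy) ** 0.5

    return corr(data), corr(selected)

full_r, selected_r = collider_demo()
# near zero in the full data, clearly negative among the 'successes'
print(round(full_r, 2), round(selected_r, 2))
```

Among the selected cases, a high value on one variable can 'compensate' for a low value on the other, which manufactures a negative association that does not exist in the population.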
The issue of representativeness is particularly important in case study research. If you can only study a few cases, it is often impossible to select a set that is representative of all existing or possible cases. Therefore, cases are often selected based on the value of the dependent variable, as in a study of 'best practices' in a certain field (Seawright & Gerring, 2008; Gerring & McDermott, 2007). Such studies can still be informative, especially if you ask informants about contrasting cases. When you speak to representatives of best practices, you could ask about decisions at crucial events that contributed to success: "Why did you become a success, while others did not?" You can also ask about counterfactuals: "What practice, if taken away, would destroy your success?" Note that you can also ask these questions of representatives of the worst practices.