After you have done the analyses, it may turn out that an effect you predicted is not there. The data show no difference and no support for the hypothesis. Perhaps the result is even negative: you may find the opposite of what you predicted. Then you will ask yourself: why? Here is a menu of possible answers, and a list of things you can do before you know the results to turn a lack of support for a hypothesis into an informative rejection.

Preventing honest mistakes

Sometimes the results differ from the effect you predicted because there is a mistake in your code. You coded a variable in the wrong direction, forgot to reverse the polarity of negatively formulated items before you constructed a scale, or left the value 99999 in the data instead of treating it as a missing value. You can prevent this by taking a quick break or asking someone else to check your coding. After hours of work and with so many lines of code in front of you, it’s hard to spot mistakes.

Another possibility is that you formulated the prediction incorrectly. Perhaps you made a mistake in writing up the hypothesis: you omitted the word ‘not’, mixed up ‘positive’ and ‘negative’, or forgot to accept tracked changes in the revised draft. Only if it was an honest error in your writing is it okay to reformulate the prediction after you’ve seen the results. You can prevent this from happening by asking someone else to check your writing. After you’ve been reading your own text for the fourth time you may notice mistakes any more, but a fresh pair of eyes will spot them immediately. Such as the omission of the word ‘not’ in the previous sentence.

You recognize an honest language error if fixing it doesn’t force you to rewrite the entire theory section to arrive at the observed result. That is exactly what you cannot do once you know the results: it would be HARKing, hypothesizing after the results are known. If you see the results and you are tempted to change the theory section to make it consistent with the observed results, you have a rejection on your hands. At first, this may feel like a failure. Let me assure you that it is not. Read more about that here.

Strengthening the research design

The most common reason why results turn out to be different from predictions is a flaw in the research design: you didn’t think through all the details of the analysis before you conducted it. If you really believe in your reasoning, this is the area in which you will seek answers to the question why the results turned out differently than you expected. Knowing this before you see the results means you have to invest more in the research design before you conduct the analyses. It is better to think about these details ex ante than post hoc and ad hoc. With a watertight research design, negative or null results must imply that the hypothesis is incorrect. The implication is that you have to invest in the best research design possible, because only true negatives are informative rejections. Here is a list of common issues and ways to prevent them.

1. Non-linear associations

Most theories assume linear associations between variables: the higher the value of X, the higher the value of Y; and conversely, the lower X gets, the lower Y gets. This assumption may not be justified when the functional form of the association is non-linear. For instance, increases in Y may only emerge after X passes some threshold, just as ice only melts once the temperature rises above zero degrees Celsius. Another possibility is that increases in X are associated with increases in Y at decreasing rates. For instance, increases in income are initially associated with large increases in well-being, but with additional increases in income, well-being increases less. The very poor are much less well-off than the not so very poor, while the ultra-rich are only marginally happier than the very rich.

To prevent misspecification of associations, consider the possibility that the association is non-linear before you do the analysis. Rewrite the hypothesis to more accurately reflect the non-linearity you expect. Even if you expect a linear association, it is always good to plot the data before you conduct statistical tests. If you find that the empirical association is in fact non-linear, introduce a non-linear term in the analysis that captures the functional form of the association.
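As an illustration, here is a minimal sketch in Python of that workflow: plot the association first, then add a squared term to the regression to capture a possible non-linear functional form. The data are simulated and the variable names are hypothetical.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated data for illustration only: Y increases with X at a decreasing rate.
    rng = np.random.default_rng(1)
    x = rng.uniform(0, 10, 500)
    y = np.sqrt(x) + rng.normal(0, 0.3, 500)
    df = pd.DataFrame({"x": x, "y": y})

    df.plot.scatter(x="x", y="y")  # eyeball the functional form before testing

    linear = smf.ols("y ~ x", data=df).fit()
    quadratic = smf.ols("y ~ x + I(x ** 2)", data=df).fit()
    print(linear.rsquared, quadratic.rsquared)  # the squared term improves the fit here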

2. Power failure

The effect you’re looking for is likely to be weak. In that case, you need a large number of observations to study it and draw firm conclusions. Most effects in the social sciences are weak: a one standard deviation increase in the predictor variable X is associated with a change of less than one third of a standard deviation in the dependent variable Y. You may have learned about effect sizes from a textbook discussing Cohen’s rules of thumb, which label d = 0.5 a medium-sized effect. These rules are from the dark ages. We now know that the effect sizes you see in the published literature are greatly inflated.
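To get a feel for what ‘weak’ means in practice, here is a minimal power-analysis sketch with statsmodels. The effect size of d = 0.2 and the conventional alpha and power levels are assumptions chosen for illustration, not recommendations.

    from statsmodels.stats.power import TTestIndPower

    # Required sample size per group for a two-sided two-sample t-test,
    # assuming a small effect (d = 0.2), alpha = .05 and 80% power.
    n_per_group = TTestIndPower().solve_power(effect_size=0.2, alpha=0.05, power=0.80)
    print(round(n_per_group))  # roughly 394 participants per group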

At small numbers of observations, outliers can greatly influence the results. You can reduce their influence by winsorizing them, for instance at the 99th percentile. This is better than dropping them from the analysis altogether. Whatever you choose, decide on your strategy for dealing with outliers before you analyze the data, and report results both with and without the outlier treatment.
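A minimal sketch of winsorizing at the 99th percentile, assuming a numeric pandas Series of a skewed variable (the name and the simulated values are illustrative):

    import numpy as np
    import pandas as pd

    income = pd.Series(np.random.default_rng(2).lognormal(10, 1, 1000))
    cap = income.quantile(0.99)                 # value at the 99th percentile
    income_winsorized = income.clip(upper=cap)  # extreme values are capped, not dropped
    print(income.mean(), income_winsorized.mean())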

Another problem that can affect the results is multicollinearity. When two predictors are strongly correlated, you need a large number of cases to obtain robust estimates of their separate effects.
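One way to diagnose multicollinearity, sketched below with the variance inflation factor (VIF) from statsmodels; the simulated predictors and the rule-of-thumb threshold in the comment are assumptions, not prescriptions.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(3)
    x1 = rng.normal(size=300)
    x2 = 0.9 * x1 + rng.normal(scale=0.3, size=300)           # strongly correlated with x1
    X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2}))
    for i, name in enumerate(X.columns):
        print(name, variance_inflation_factor(X.values, i))   # values above ~10 signal trouble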

To prevent power failure, strive for the highest possible number of observations. Especially if you have hypotheses about interactions between variables (such as: the effect is larger for group A than for group B), you need more cases than you would think. One way to increase power is to collect multiple datasets containing the effect you’re interested in and pool them into one analysis. If you have only one dataset containing the measures you need, use the maximum number of observations in that dataset through multiple imputation of missing values.
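For the missing-value part, here is a sketch using scikit-learn’s IterativeImputer. The tooling choice and the variable names are my assumptions (dedicated multiple-imputation packages serve the same purpose), and only a single imputation is shown; genuine multiple imputation repeats the procedure and pools the estimates.

    import numpy as np
    import pandas as pd
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    df = pd.DataFrame({"income": [30, 45, np.nan, 60, 52],
                       "education": [12, 16, 14, np.nan, 18],
                       "giving": [100, 250, 180, 400, np.nan]})
    imputed = pd.DataFrame(IterativeImputer(random_state=0).fit_transform(df),
                           columns=df.columns)
    print(imputed)  # all cases retained instead of dropped listwise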

3. Subgroup differences

The effect you’re looking for may be present or stronger for some cases in the analysis, but absent, weaker or even opposite in sign for other cases. In jargon: the effect may be heterogeneous. An example is that information about the effectiveness of charities may encourage charitable giving among persons who have confidence in charities, but not among those who are less trusting.

A third variable moderates the effect: for some contexts, groups, or individuals the effect is there, but not for others. When there are subgroup differences in the effect, the composition of the sample for your analysis matters. If the sample contains relatively few observations from groups for which the effect is larger, the effect you’re looking for will appear weaker than it is. Obviously, it can also work the other way around: if you are lucky and the sample contains relatively many observations from groups for which the effect is larger, the effect you’re observing will appear stronger than it really is.
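In the analysis, subgroup differences are captured by an interaction term. A minimal sketch with simulated data and hypothetical variable names, following the charitable-giving example above:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(4)
    n = 1000
    trust = rng.integers(0, 2, n)        # 1 = has confidence in charities
    info = rng.integers(0, 2, n)         # 1 = received effectiveness information
    giving = 10 + 5 * info * trust + rng.normal(0, 5, n)  # effect only among the trusting
    df = pd.DataFrame({"giving": giving, "info": info, "trust": trust})

    model = smf.ols("giving ~ info * trust", data=df).fit()
    print(model.params)  # the info:trust interaction picks up the subgroup difference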

4. Countervailing powers

The effect you are looking for may be confounded by other effects. In this case, you need a strategy to remove the confounding influences. For example, positive discrimination of women may enhance their chances of obtaining a better position in the labor market, but once employers attribute the success of women to positive discrimination, it may hurt them in the future.

Another example is when a third variable suppresses the effect that you’re looking for. Measuring it and including it in the analysis as a covariate may uncover the relationship you’re looking for. In the relationship between education and income, for instance, age is likely to suppress the association. More highly educated persons earn higher incomes, and older persons also earn higher incomes. Because persons from more recent birth years are more highly educated, omitting birth year (or age) in a regression of income on education will give you a smaller coefficient than when you include birth year (or age) in the analysis.
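A sketch of this suppression pattern with simulated data (the coefficients are invented for illustration): the education coefficient grows once age is held constant.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(5)
    n = 2000
    age = rng.uniform(25, 65, n)
    education = 20 - 0.1 * age + rng.normal(0, 2, n)   # younger cohorts are more educated
    income = 2 * education + 0.3 * age + rng.normal(0, 5, n)
    df = pd.DataFrame({"income": income, "education": education, "age": age})

    print(smf.ols("income ~ education", data=df).fit().params["education"])        # smaller
    print(smf.ols("income ~ education + age", data=df).fit().params["education"])  # close to the true value of 2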

In experiments, interventions may have multiple effects. The manipulation of X may have had some counteracting effect that reduced the predicted effect. If the effect of the manipulation is the same for all participants, and there is no identifiable subgroup for whom the countervailing effect does not occur, there is nothing you can do to mitigate the effect of confounding.

5. Low measurement reliability

The effect you’re looking for may be biased by measurement error. As soon as the predictor variable or the dependent variable contains error, the estimated association will be surrounded by noise and be less precise. If the measurement error is random and sits in the dependent variable, the point estimate of the association will be okay, but the standard error will be large; random error in the predictor additionally attenuates the estimate toward zero. More accurate measurement increases the precision of estimates, for the predictor variable as well as for the dependent variable. Suppose you’re interested in the effect of income on subjective well-being. If the dependent variable is measured with only one item, e.g., “generally speaking, how satisfied are you with your life?”, the precision of the estimate will be lower than when you have a multiple-item measure of subjective well-being.
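A small simulation of the well-being example, under assumed numbers: averaging several noisy items yields a more precise estimate of the income effect than a single item does.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n = 1000
    income = rng.normal(size=n)
    wellbeing = 0.3 * income + rng.normal(size=n)            # the 'true' outcome

    one_item = wellbeing + rng.normal(scale=1.5, size=n)     # single noisy survey item
    items = wellbeing[:, None] + rng.normal(scale=1.5, size=(n, 5))
    scale_score = items.mean(axis=1)                         # five-item composite

    X = sm.add_constant(income)
    print(sm.OLS(one_item, X).fit().bse[1])      # larger standard error
    print(sm.OLS(scale_score, X).fit().bse[1])   # smaller standard error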

If the measurement error is systematic, it will affect your point estimates. For example, suppose you’re looking for the association between income and political preference. Participants with very high incomes may not want to report how rich they are and report an income that is lower than it actually is, while participants at the bottom of the distribution may exaggerate their income. The reported incomes at the extremes of the distribution are then pulled toward the middle, and the association between income and political preference will appear weaker than it actually is.

To reduce the impact of measurement error, it is usually better to combine information from multiple measures into one composite score. Perhaps participants do not know their exact income, or do not want to report it, but they may be willing to report whether their income is higher or lower than average, or whether it is (in)sufficient to make ends meet.
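A minimal sketch of such a composite: standardize each (hypothetical) income indicator and average the z-scores.

    import pandas as pd

    df = pd.DataFrame({
        "income_above_average": [1, 0, 1, 0, 1],   # 1 = higher than average
        "makes_ends_meet":      [4, 2, 5, 1, 3],   # 1 = great difficulty, 5 = very easily
        "income_bracket":       [3, 1, 4, 1, 2],   # ordinal brackets
    })
    z = (df - df.mean()) / df.std()
    df["income_composite"] = z.mean(axis=1)        # composite score across the indicators
    print(df)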

6. The difference is not an effect

The effect you’re looking for may not be a causal effect of X on Y, but merely a difference in Y for units with different values of X: a difference that is visible in an analysis of differences between individuals, groups or contexts, but not within them. Such a difference emerges because of reverse causality or third variables. An example of reverse causality is the presumed effect of volunteering on generalized social trust, which turns out to be a selection effect of trust on volunteering.
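One way to see the distinction is to compare a between-person with a within-person (fixed effects) analysis. A sketch with simulated panel data and assumed variable names, in the spirit of the volunteering example:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(7)
    n_people, n_waves = 200, 4
    person = np.repeat(np.arange(n_people), n_waves)
    trait_trust = np.repeat(rng.normal(size=n_people), n_waves)   # stable trait drives both
    volunteering = trait_trust + rng.normal(0, 0.5, n_people * n_waves)
    trust = trait_trust + rng.normal(0, 0.5, n_people * n_waves)
    df = pd.DataFrame({"person": person, "volunteering": volunteering, "trust": trust})

    between = smf.ols("trust ~ volunteering", data=df).fit()
    within = smf.ols("trust ~ volunteering + C(person)", data=df).fit()  # person fixed effects
    print(between.params["volunteering"], within.params["volunteering"])  # within estimate is near zero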

7. The timing is off

The effect you’re looking for may occur over a different period of time than your research design can capture. For example, if charitable giving has a warm-glow effect and increases happiness, that effect will not be captured by a regression of current happiness on giving in the past calendar year; you would have to measure happiness immediately after an act of giving.

8. The intervention didn’t work

The effect you’re looking for may not have occurred because the intervention did not work. In experiments, you can include a manipulation check to see whether the manipulation affected participants in the predicted manner. Suppose that you’re looking for the effect of gratitude on giving, and you design a task to increase gratitude: you ask participants to describe a recent event that made them feel grateful. After the writing task you give them an opportunity to donate money to a charitable cause. Before you conclude that feelings of gratitude do not enhance giving, you need to be sure that the writing task actually increased feelings of gratitude. If that is indeed the case, the finding that participants assigned to the gratitude writing task are just as generous as participants who described another recent event is much more powerful. Without the manipulation check, the null finding is not informative: it could be explained by inattentiveness or low commitment of participants, or by a failure of the writing task to move gratitude.
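A manipulation-check sketch for the gratitude example, with simulated data and assumed variable names: compare self-reported gratitude between conditions before interpreting the effect on giving.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    gratitude_treatment = rng.normal(5.4, 1.2, 150)  # ratings after the gratitude writing task
    gratitude_control = rng.normal(4.1, 1.2, 150)    # ratings after the neutral writing task

    t, p = stats.ttest_ind(gratitude_treatment, gratitude_control)
    print(f"t = {t:.2f}, p = {p:.3f}")  # here the task did move gratitude as intended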