On the Significance of Replication in Social Sciences


I want to argue once again here that replication is essential to the construction of scientific knowledge. But what do I mean by replication exactly? I mean that we need to promote the replication of relevant and internally valid studies in both similar and different environments. Unfortunately, in the social sciences in general, and in economics in particular, there is now excessive confidence in the knowledge gathered by a single study in a particular environment, perhaps as a result of a misreading of the virtues of experimentation in the social sciences. As Donald T. Campbell (1969) once wrote:

“Too many social scientists expect single experiments to settle issues once and for all. This may be a mistaken generalization from the history of great crucial experiments in physics and chemistry. In actuality the significant experiments in the physical sciences are replicated thousands of times…. Because we social scientists have less ability to achieve “experimental isolation,” because we have good reasons to expect our treatment effects to interact significantly with a wide variety of social factors many of which we have not yet mapped, we have much greater needs for replication experiments than do physical sciences”.

In general, we cannot presume that an estimated causal relationship is universally true, in the sense that it holds under all conditions, for all types of people, and in every circumstance. All causal statements are inevitably contingent. It is therefore useful to learn as much as possible about these contingencies and, where possible, to identify the relationships that hold more consistently than others.

Any causal study faces two sources of threats to its validity: internal and external (see Campbell, 1957). Most research effort is normally devoted to dealing with threats to internal validity, which concerns whether one can validly infer that, within the context of the study, the average differences in the outcome studied were caused by differences in the relevant explanatory variables. External validity, by contrast, concerns the extent to which a causal relationship holds over variations in persons, settings, and time. Thus, whenever possible, once an identification strategy for a causal construct is deemed reasonably valid internally, it is worth inquiring about the external validity of the results obtained. As Fisher (1935) noted in his seminal work:

“Any given conclusion… has a wider inductive base when inferred from an experiment in which the quantities of other ingredients have been varied, than it would have from any amount of experimentation in which these had been kept strictly constant”.

Related to external validity is the idea of causal generalization, which is concerned with specifying the range of application of a causal mechanism that has been identified with at least one instance of a treatment and outcome and at least one sample of persons and settings. In practice, there is a sense in which all causal generalization is about interpolation and extrapolation. Such an exercise inevitably relies on social science theory. Rubin (1992) suggests that causal generalization is about estimating a response surface, i.e., mapping a third variable to an estimated causal relationship. Even though this is practically difficult to attain, a response surface is a useful way to think about causal generalization (see Shadish, Cook and Campbell, 2002).
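Rubin's response-surface idea can be given a concrete, if stylized, illustration. The sketch below is purely hypothetical and drawn from none of the studies cited here: it treats a handful of made-up replication estimates as noisy observations of a surface mapping a single moderator variable to the true causal effect, and recovers that surface by inverse-variance weighted least squares.

```python
# A minimal sketch of Rubin's (1992) response-surface idea, under
# invented data: each replication i reports an effect estimate b_i with
# standard error se_i in a setting characterized by a moderator x_i
# (e.g., baseline fertility). We fit effect = beta0 + beta1 * x by
# weighted least squares, weighting each study by its inverse variance.
import numpy as np

# Hypothetical replication studies (all numbers illustrative).
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])            # moderator values
b = np.array([-0.10, -0.14, -0.21, -0.24, -0.31])  # effect estimates
se = np.array([0.03, 0.02, 0.04, 0.03, 0.05])      # standard errors

W = np.diag(1.0 / se**2)                      # inverse-variance weights
X = np.column_stack([np.ones_like(x), x])     # intercept + moderator
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ b)

print(f"intercept: {beta[0]:.3f}, slope per unit of moderator: {beta[1]:.3f}")

# The fitted line predicts the effect in a new setting by interpolating
# or extrapolating along the moderator -- the sense in which causal
# generalization amounts to estimating a response surface.
```

With more replications and several moderators, the same regression extends naturally to a multivariate surface, which is where theory must guide which moderators are worth mapping.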

In this vein, Cruces and Galiani (2007) investigate the extent to which the cause-and-effect construct identified by Angrist and Evans (1998) can be generalized to the context of two developing countries where, compared to the US, fertility was known to be higher and female education levels were lower. Thus, they investigate whether in such different socioeconomic environments childbearing also leads to a reduction in female labor supply. They find that the estimates for the US can be generalized both qualitatively and quantitatively to Mexico and Argentina. I believe this is a very important result in that it helps extend the effect identified in the original study to different populations.

In the same spirit, Galiani et al. (2014) provide empirical evidence on the causal effects that upgrading slum dwellings has on the living conditions of the extremely poor in El Salvador, Mexico and Uruguay. The paper experimentally evaluates the impact of a housing project run by the NGO TECHO, which provides basic pre-fabricated houses to members of extremely poor population groups in Latin America; the program's main objective is to improve household well-being. The findings show that better houses have a positive effect on overall housing conditions and general subjective well-being (with larger effects in El Salvador, where the counterfactual situation was worse). In two of the three countries, the research team also documents improvements in children's health. What is more, the one case in which improvements in children's health are not observed is the one in which the experiment took place in a better, more urbanized environment where services were more accessible. There are no other robust effects on the possession of durable goods or on labor outcomes. Thus, the results of this study are unusually robust in terms of both internal and external validity, because they are derived from similar experiments in three different Latin American countries.

Accordingly, I believe there is a potentially high reward in replicating valid empirical strategies for relevant cause-and-effect constructs, and this kind of study should receive more attention among both academics and policy-makers. Ultimately, the external validity of causal estimates is established by replication in new data sets (Angrist, 2003). In addition, external replication of reasonably valid identification strategies would lead, to the extent possible, to causal generalization. In conclusion, replication studies, combined with social science theory, would broaden our knowledge of the cause-and-effect constructs of interest.


Angrist, J. (2003): “Treatment Effect Heterogeneity in Theory and Practice”, NBER WP 9708, Cambridge, MA, US.

Angrist, J. and W. Evans (1998): “Children and their Parents’ Labor Supply: Evidence from Exogenous Variation in Family Size”, American Economic Review 88(3), pp.450-77.

Campbell, D. T. (1969): “Reforms as experiments”, American Psychologist 24, pp. 409-29.

Campbell, D. T. (1957): “Factors relevant for the validity of experiments in social settings”, Psychological Bulletin 54, pp. 297-312.

Cruces, G. and S. Galiani (2007): “Fertility and Female Labor Supply in Latin America: New Causal Evidence”, Labour Economics 14, pp. 565-73.

Fisher, R. (1935): The Design of Experiments, Oliver and Boyd, London.

Galiani, S., P. Gertler, R. Cooper, S. Martinez, A. Ross and R. Undurraga (2014): “Shelter from the Storm: Upgrading Housing Infrastructure in Latin American Slums”, Mimeo.

Rubin, D. (1992): “Meta-Analysis: Literature Synthesis or Effect-Size Surface Estimation?”, Journal of Educational Statistics 17, pp. 363-74.

Shadish, W., T. Cook and D. Campbell (2002): Experimental and Quasi-Experimental Designs for Generalized Causal Inference, Houghton Mifflin Company, New York.
