The ability to accurately predict cancer rates is hardly likely to help you decide how to reduce them. However, knowing that smoking leads to a higher risk of cancer is valuable information, because decreasing smoking rates should then lower cancer rates. Looking at the problem this way, I would think that explanatory modelling would mainly focus on variables which are in the control of the user, either directly or indirectly.
There may be a need to collect other variables, but if you can't change any of the variables in the analysis, then I doubt that explanatory modelling will be useful, except maybe to give you the desire to gain control or influence over those variables which are important. Predictive modelling, crudely, just looks for associations between variables, whether controlled by the user or not.
Is it true that exercising regularly (say, 30 minutes per day) leads to lower blood pressure? To answer this question we may collect data from patients about their exercise regimen and their blood pressure values over time. The goal is to see if we can explain variations in blood pressure by variations in exercise regimen. Blood pressure is affected not only by exercise but also by a wide variety of other factors, such as the amount of sodium a person eats.
These other factors would be considered noise in the above example as the focus is on teasing out the relationship between exercise regimen and blood pressure. When doing a predictive exercise, we are extrapolating into the unknown using the known relationships between the data we have at hand. If I exercise 1 hour per day to what extent is my blood pressure likely to drop? To answer this question, we may use a previously uncovered relationship between blood pressure and exercise regimen to perform the prediction.
In the above context, the focus is not on explanation, although an explanatory model can help with the prediction process. There are also non-explanatory approaches to prediction. One practical issue that arises here is variable selection in modelling. A variable can be an important explanatory variable yet not be useful for prediction; I see the two conflated almost every day in published papers.
Another difference is in the distinction between principal components analysis and factor analysis. PCA is often used in prediction, but is not so useful for explanation. FA involves the additional step of rotation which is done to improve interpretation and hence explanation. There is a nice post today on Galit Shmueli's blog about this. For example, home loans may be strongly related to GDP but that isn't much use for predicting future home loans unless we also have good predictions of GDP.
Here is a deck of slides that I use in my data mining course to teach linear regression from both angles. Even with linear regression alone, and with this tiny example, various issues emerge that lead to different models for explanatory vs. predictive purposes.
A classic example that I have seen is in the context of predicting human performance.
Thus, if you put self-efficacy into a multiple regression along with other variables such as intelligence and degree of prior experience, you often find that self-efficacy is a strong predictor. This has led some researchers to suggest that self-efficacy causes task performance, and that effective interventions are those which focus on increasing a person's sense of self-efficacy. However, the alternative theoretical model sees self-efficacy largely as a consequence of task performance. In this framework, interventions should focus on increasing actual competence and not perceived competence.
Thus, including a variable like self-efficacy might increase prediction, but assuming you adopt the self-efficacy-as-consequence model, it should not be included as a predictor if the aim of the model is to elucidate causal processes influencing performance. This of course raises the issue of how to develop and validate a causal theoretical model.
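To make the self-efficacy point concrete, here is a minimal simulation sketch. It assumes the self-efficacy-as-consequence model is true; the variable names, effect sizes and noise levels are all invented for illustration and do not come from any study.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on X with an intercept; returns (R^2, coefficients)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid.var() / y.var(), coef

rng = np.random.default_rng(0)
n = 5000

# Assumed data-generating process: competence causes performance,
# and self-efficacy is a *consequence* of performance.
competence = rng.normal(size=n)
performance = 1.0 * competence + rng.normal(size=n)
self_efficacy = 0.8 * performance + rng.normal(scale=0.5, size=n)

r2_causal, coef_causal = fit_ols(competence, performance)
r2_both, coef_both = fit_ols(np.column_stack([competence, self_efficacy]), performance)

print("competence only:      R^2 =", round(r2_causal, 3), " competence coef =", round(coef_causal[1], 3))
print("plus self-efficacy:   R^2 =", round(r2_both, 3), " competence coef =", round(coef_both[1], 3))
```

Under these assumptions, adding self-efficacy raises the fit considerably while shrinking the competence coefficient well below the value used to generate the data, so the better-predicting model is the worse guide to the causal process.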
This clearly relies on multiple studies, ideally with some experimental manipulation, and a coherent argument about dynamic processes. I've seen similar issues when researchers are interested in the effects of distal and proximal causes. Proximal causes tend to predict better than distal causes. However, theoretical interest may be in understanding the ways in which distal and proximal causes operate. Finally, a huge issue in social science research is the variable selection issue.
In any given study, there is an infinite number of variables that could have been measured but weren't. Thus, any theoretical interpretation of a model needs to consider the implications of these omitted variables.
"Statistical Modeling: The Two Cultures" by L. Breiman is, perhaps, the best paper on this point. His main conclusions (see also the replies from other prominent statisticians at the end of the document) are as follows:
I haven't read her work beyond the abstract of the linked paper, but my sense is that the distinction between "explanation" and "prediction" should be thrown away and replaced with the distinction between the aims of the practitioner, which are either "causal" or "predictive". In general, I think "explanation" is such a vague word that it means nearly nothing. For example, is Hooke's Law explanatory or predictive?
On the other end of the spectrum, are predictively accurate recommendation systems good causal models of explicit item ratings? I think we all share the intuition that the goal of science is explanation, while the goal of technology is prediction; and this intuition somehow gets lost in consideration of the tools we use, like supervised learning algorithms, that can be employed for both causal inference and predictive modeling, but are really purely mathematical devices that are not intrinsically linked to "prediction" or "explanation".
Having said all of that, maybe the only word that I would apply to a model is interpretable. Regressions are usually interpretable; neural nets with many layers are often not so. I think people sometimes naively assume that a model that is interpretable is providing causal information, while uninterpretable models only provide predictive information.
This attitude seems simply confused to me.
I am still a bit unclear as to what the question is. Having said that, to my mind the fundamental difference between predictive and explanatory models is the difference in their focus.
By definition, explanatory models have as their primary focus the goal of explaining something in the real world. In most instances, we seek to offer simple and clean explanations. By simple I mean that we prefer parsimony (explaining the phenomenon with as few parameters as possible), and by clean I mean that we would like to make statements of a clear, specific form. Given these goals of simple and clear explanations, explanatory models seek to penalize complex models by using appropriate criteria (such as AIC) and prefer to obtain orthogonal independent variables, either via controlled experiments or via suitable data transformations.
The goal of predictive models is to predict something. Thus, they tend to focus less on parsimony or simplicity but more on their ability to predict the dependent variable. However, the above is somewhat of an artificial distinction as explanatory models can be used for prediction and sometimes predictive models can explain something.
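As a rough sketch of how a criterion such as AIC penalizes complexity, the snippet below fits polynomials of increasing degree to simulated data and scores each one. The data-generating curve, the sample size and the helper name `gaussian_aic` are assumptions made for illustration; the AIC expression is the usual least-squares form with Gaussian errors, up to an additive constant.

```python
import numpy as np

def gaussian_aic(y, fitted, n_params):
    """AIC for a least-squares fit with Gaussian errors, up to a constant:
    n * ln(RSS / n) + 2 * k."""
    n = len(y)
    rss = float(np.sum((y - fitted) ** 2))
    return n * np.log(rss / n) + 2 * n_params

rng = np.random.default_rng(3)
x = rng.uniform(-2, 2, size=200)
# Assumed truth: a quadratic curve plus noise.
y = 1.0 + 0.5 * x + 0.8 * x**2 + rng.normal(size=200)

aic_by_degree = {}
for degree in range(1, 7):
    coefs = np.polyfit(x, y, degree)
    fitted = np.polyval(coefs, x)
    # Parameters counted: degree + 1 coefficients plus the error variance.
    aic_by_degree[degree] = gaussian_aic(y, fitted, n_params=degree + 2)

best_degree = min(aic_by_degree, key=aic_by_degree.get)
for degree, aic in aic_by_degree.items():
    print(degree, round(aic, 1))
print("lowest AIC at degree", best_degree)
```

The residual sum of squares can only go down as the degree goes up, so the `2 * k` term is what stops the most complex model from winning by default. A purely predictive workflow would more likely compare the candidates on held-out data.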
With respect, this question could be better focused. Have people ever used one term when the other was more appropriate? Sometimes it's clear enough from context, or you don't want to be pedantic. Sometimes people are just sloppy or lazy in their terminology.
This is true of many people, and I'm certainly no better. What's of potential value here is clarifying the distinction between explanation and prediction. In short, the distinction centers on the role of causality. If you want to understand some dynamic in the world, and explain why something happens the way it does, you need to identify the causal relationships amongst the relevant variables.
To predict, you can ignore causality. For example, you can predict an effect from knowledge about its cause; you can predict the existence of the cause from knowledge that the effect occurred; and you can predict the approximate level of one effect by knowledge of another effect that is driven by the same cause.
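The three cases can be sketched with a small simulation. The structure (one cause driving two effects) follows the description above; the coefficients and noise levels are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Assumed structure: one cause drives two separate effects.
cause = rng.normal(size=n)
effect_a = 2.0 * cause + rng.normal(size=n)
effect_b = -1.5 * cause + rng.normal(size=n)

def r2_simple(x, y):
    """Share of the variance in y captured by a straight-line fit on x."""
    return np.corrcoef(x, y)[0, 1] ** 2

r2_effect_from_cause = r2_simple(cause, effect_a)      # predict an effect from its cause
r2_cause_from_effect = r2_simple(effect_a, cause)      # predict the cause from its effect
r2_effect_from_effect = r2_simple(effect_b, effect_a)  # predict one effect from another

print(r2_effect_from_cause, r2_cause_from_effect, r2_effect_from_effect)
```

All three fits carry predictive information, and the first two are equally good even though the causal arrow runs only one way. Nothing in the fit itself reveals which variable is the cause.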
Why would someone want to be able to do this? To increase their knowledge of what might happen in the future, so that they can plan accordingly. For example, a parole board may want to be able to predict the probability that a convict will recidivate if paroled. However, this is not sufficient for explanation. Of course, estimating the true causal relationship between two variables can be extremely difficult. In addition, models that do capture what are thought to be the real causal relationships are often worse for making predictions.
So why do it, then? First, most of this is done in science, where understanding is pursued for its own sake. Second, if we can reliably pick out true causes, and can develop the ability to affect them, we can exert some influence over the effects. With regard to the statistical modeling strategy, there isn't a large difference.
Primarily the difference lies in how to conduct the study. If your goal is to be able to predict, find out what information will be available to users of the model when they will need to make the prediction. Information they won't have access to is of no value. If they will most likely want to be able to predict at a certain level or within a narrow range of the predictors, try to center the sampled range of the predictor on that level and oversample there. For instance, if a parole board will mostly want to know about criminals with 2 major convictions, you might gather info about criminals with 1, 2, and 3 convictions.
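One way to see why centering the sample helps: in simple linear regression, the standard error of the fitted mean at a point x0 is sigma times sqrt(1/n + (x0 - mean(x))^2 / Sxx). The sketch below compares two hypothetical sampling designs of equal size. The counts are invented, and the comparison assumes a straight-line model holds over the whole sampled range.

```python
import numpy as np

def se_factor(design, x0):
    """Multiplier on sigma for the standard error of the fitted mean at x0
    in simple linear regression: sqrt(1/n + (x0 - mean)^2 / Sxx)."""
    design = np.asarray(design, dtype=float)
    n = design.size
    sxx = ((design - design.mean()) ** 2).sum()
    return np.sqrt(1.0 / n + (x0 - design.mean()) ** 2 / sxx)

# Hypothetical designs, 40 cases each (numbers of prior convictions).
centered = [1] * 10 + [2] * 20 + [3] * 10          # centered on 2, oversampled there
spread = [c for c in range(8) for _ in range(5)]   # 0..7 convictions, 5 cases each

print("centered design:", se_factor(centered, 2))
print("spread design:  ", se_factor(spread, 2))
```

With the same number of cases, the design centered on 2 convictions gives a smaller multiplier at 2 than the design spread evenly over 0 to 7.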
On the other hand, assessing the causal status of a variable basically requires an experiment. That is, experimental units need to be assigned at random to prespecified levels of the explanatory variables.
If there is concern about whether or not the nature of the causal effect is contingent on some other variable, that variable must be included in the experiment. If it is not possible to conduct a true experiment, then you face a much more difficult situation, one that is too complex to go into here.
Brad Efron, one of the commentators on The Two Cultures paper, made the following observation (as discussed in my earlier question): Prediction by itself is only occasionally sufficient.
The post office is happy with any method that predicts correct addresses from hand-written scrawls. Peter Gregory undertook his study for prediction purposes, but also to better understand the medical basis of hepatitis.
Most statistical surveys have the identification of causal factors as their ultimate goal.
Medicine places a heavy weight on model fitting as an explanatory process (the distribution, etc.). Other fields are less concerned with this, and will be happy with a "black box" model that has very high predictive success. This can work its way into the model building process as well.
Most of the answers have helped clarify what modeling for explanation and modeling for prediction are and why they differ.
What is not clear, thus far, is how they differ. So, I thought I would offer an example that might be useful. Suppose we are interested in modeling College GPA as a function of academic preparation, and that we have several measures of academic preparation. If the goal is prediction, I might use all of these variables simultaneously in a linear model and my primary concern would be predictive accuracy.
Whichever of the variables prove most useful for predicting College GPA would be included in the final model. If the goal is explanation, I might be more concerned about data reduction and think carefully about the correlations among the independent variables.
My primary concern would be interpreting the coefficients. In a typical multivariate problem with correlated predictors, it would not be uncommon to observe regression coefficients that are "unexpected".
Given the interrelationships among the independent variables, it would not be surprising to see partial coefficients for some of these variables that are not in the same direction as their zero-order relationships and which may seem counterintuitive and tough to explain. This is not a problem for prediction, but it does pose problems for an explanatory model where such a relationship is difficult to interpret.
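A small simulated illustration of such a sign reversal follows. The two predictors `prep_a` and `prep_b` are hypothetical stand-ins for correlated measures of academic preparation, and the coefficients are chosen purely so that the effect shows up.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 5000

# Two hypothetical preparation measures with correlation of about 0.8.
shared = rng.normal(size=n)
prep_a = shared
prep_b = 0.8 * shared + 0.6 * rng.normal(size=n)

# Assumed outcome model, chosen so the effect appears.
college_gpa = 1.0 * prep_a - 0.5 * prep_b + rng.normal(size=n)

# Zero-order relationship: plain correlation of prep_b with the outcome.
zero_order_b = np.corrcoef(prep_b, college_gpa)[0, 1]

# Partial coefficient: prep_b's coefficient once prep_a is also in the model.
design = np.column_stack([np.ones(n), prep_a, prep_b])
coef, *_ = np.linalg.lstsq(design, college_gpa, rcond=None)
partial_b = coef[2]

print("zero-order correlation of prep_b:", round(zero_order_b, 3))
print("partial coefficient of prep_b:   ", round(partial_b, 3))
```

Here `prep_b` correlates positively with the outcome on its own but gets a negative coefficient once `prep_a` is in the model. That does no harm to the predictions, but it is awkward to narrate as "more preparation lowers GPA".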
This model might provide the best out of sample predictions but it does little to help us understand the relationship between academic preparation and College GPA.
Instead, an explanatory strategy might seek some form of variable reduction, such as principal components, factor analysis, or SEM. Strategies such as these might reduce the predictive power of the model, but they may yield a better understanding of how Academic Preparation is related to College GPA.
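Here is a minimal sketch of the principal components variant, using simulated data. It assumes four noisy measures of a single latent "preparation" construct; the number of measures, the effect sizes and the noise levels are all invented.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on X with an intercept; returns (R^2, coefficients)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - resid.var() / y.var(), coef

rng = np.random.default_rng(4)
n = 3000

# Assumed setup: four noisy measures of one latent preparation construct.
preparation = rng.normal(size=n)
measures = preparation[:, None] + rng.normal(scale=0.5, size=(n, 4))
college_gpa = 0.7 * preparation + rng.normal(scale=0.7, size=n)

# First principal component of the standardized measures, via SVD.
z = (measures - measures.mean(axis=0)) / measures.std(axis=0)
_, _, vt = np.linalg.svd(z, full_matrices=False)
loadings = vt[0]
if loadings.sum() < 0:        # the sign of a component is arbitrary; make it positive
    loadings = -loadings
composite = z @ loadings

r2_full, coef_full = fit_ols(measures, college_gpa)
r2_pc1, coef_pc1 = fit_ols(composite, college_gpa)

print("all four measures: R^2 =", round(r2_full, 3), " coefs =", np.round(coef_full[1:], 3))
print("one composite:     R^2 =", round(r2_pc1, 3), " coef  =", round(coef_pc1[1], 3))
```

The composite is a linear combination of the same measures, so its in-sample fit can never exceed that of the full model. What it gives in exchange is a single coefficient that can be read as the association between overall preparation and GPA.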
Predictive modeling is what happens in most analyses. For example, a researcher sets up a regression model with a bunch of predictors. The regression coefficients then represent predictive comparisons between groups. The predictive aspect comes from the probability model, which treats the observed units as drawn from a superpopulation. The purpose of this model is to predict new outcomes for units emerging from this superpopulation.
Often, this is a vain objective because things are always changing, especially in the social world. Or because your model is about rare units such as countries and you cannot draw a new sample. The usefulness of the model in this case is left to the appreciation of the analyst.
When you try to generalize the results to other groups or future units, this is still prediction but of a different kind. We may call it forecasting for example. The key point is that the predictive power of estimated models is, by default, of descriptive nature. You compare an outcome across groups and hypothesize a probability model for these comparisons, but you cannot conclude that these comparisons constitute causal effects.
The reason is that these groups may suffer from selection bias. That is, they may naturally have a higher score in the outcome of interest, irrespective of the treatment (the hypothetical causal intervention).
Or they may be subject to a different treatment effect size than other groups. This is why, especially for observational data, the estimated models are generally about predictive comparisons and not explanation. Explanation is about the identification and estimation of causal effect and requires well designed experiments or thoughtful use of instrumental variables.
In this case, the predictive comparisons are free of any selection bias and represent causal effects. The model may thus be regarded as explanatory. I found that thinking in these terms has often clarified what I was really doing when setting up a model for some data.
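The selection-bias argument can be sketched with a simulation in which the true treatment effect is known. The effect size, the selection mechanism and the sample size are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20_000
true_effect = 1.0

# Unobserved trait that raises the outcome regardless of treatment.
baseline = rng.normal(size=n)

def mean_difference(treated, outcome):
    """Plain comparison of group means: treated minus untreated."""
    return outcome[treated].mean() - outcome[~treated].mean()

# Observational data: units with a high baseline are more likely to be treated.
selected = (baseline + rng.normal(size=n)) > 0
outcome_obs = true_effect * selected + baseline + rng.normal(size=n)

# Experimental data: treatment assigned by coin flip, independent of baseline.
randomized = rng.random(n) < 0.5
outcome_exp = true_effect * randomized + baseline + rng.normal(size=n)

diff_obs = mean_difference(selected, outcome_obs)
diff_exp = mean_difference(randomized, outcome_exp)

print("self-selected groups:", round(diff_obs, 3))
print("randomized groups:   ", round(diff_exp, 3))
```

Both numbers are valid predictive comparisons between groups. Only the randomized one lands near the effect that was built into the simulation, because the self-selected groups already differed in their baseline.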
We can learn a lot more than we think from black box "predictive" models. In this sense even a purely predictive model can provide explanatory insights. This is a point that is often overlooked or misunderstood by the research community. Just because we do not understand why an algorithm is working doesn't mean the algorithm lacks explanatory power. Overall, from a mainstream point of view, probabilityislogic's succinct reply is absolutely correct.
There is a distinction between what she calls explanatory and predictive applications in statistics.
She says we should know every time we use one or another which one exactly is being used. She says we often mix them up, hence conflation. I agree that in social science applications, the distinction is sensible, but in natural sciences they are and should be the same.
Causal research, also known as explanatory research, is conducted in order to identify the extent and nature of cause-and-effect relationships. Causal research can be conducted in order to assess the impacts of specific changes on existing norms, various processes, etc.
Explanatory research is defined as an attempt to connect ideas to understand cause and effect, meaning researchers want to explain what is going on. Explanatory, analytical and experimental studies explain why a phenomenon is going on and can be used for hypothesis testing.
Qualitative research is designed to explore the human elements of a given topic, while specific qualitative methods examine how individuals see and experience it. Explanatory research is research whose primary purpose is to explain why events occur and to build, elaborate, extend or test theory.