DNV GL, SINTEF, and Dr. Techn. Olav Olsen awarded 14 MNOK research project on causal data-driven models.
Correlation is not Causation! – Why does this matter now more than ever?
New advances in data-driven models and deep learning offer unprecedented insight into data and their correlations. This insight often captures effects that are difficult or impossible to extract from first-principle models of cause-and-effect. Still, observations and correlations can only tell us what is in the data; they do not reveal the causal effects in the systems producing the data. To understand the effect of changes to a system, we need causal models. This is not a question of correlation or causation. To assure the safety of high-consequence systems, we need to combine and master both, alleviating their shortcomings and building on their strengths.
Why can’t we answer some questions just by observing? Why is the revolution of deep learning, although extremely efficient in some ways, still profoundly dumb about cause and effect?
In his latest book, “The Book of Why”, Judea Pearl successfully explains this in a thrilling read on what he terms the “Causal revolution”. As he notes, all that a deep-learning program can do is fit a function to data. Information about the effects of actions or interventions is simply not available in raw data unless it is collected by controlled experimental manipulation. What separates human cognition from that of any other species or machine is our innate ability and propensity to ask one simple question: “Why?”.
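Pearl's point can be illustrated with a toy simulation. The sketch below is not taken from the book or from the project; the variable names, coefficients, and the hidden confounder `z` are all assumptions chosen for illustration. It fits a line to purely observational data, then to data where `x` is set by controlled experiment, and compares both slopes with the causal effect built into the simulation.

```python
import numpy as np


def simulate(n=20_000, seed=0):
    """Compare a fitted slope on observational vs. interventional data.

    All coefficients are hypothetical; the causal effect of x on y is 1.0.
    """
    rng = np.random.default_rng(seed)
    true_effect = 1.0

    # Observational regime: a hidden confounder z drives both x and y.
    z = rng.normal(size=n)
    x_obs = 2.0 * z + rng.normal(size=n)
    y_obs = true_effect * x_obs + 3.0 * z + rng.normal(size=n)

    # Interventional regime: x is set by the experimenter, independent of z.
    z_new = rng.normal(size=n)
    x_do = rng.normal(size=n)
    y_do = true_effect * x_do + 3.0 * z_new + rng.normal(size=n)

    # Fitting a function to data: a straight line in both cases.
    slope_obs = np.polyfit(x_obs, y_obs, 1)[0]
    slope_do = np.polyfit(x_do, y_do, 1)[0]
    return true_effect, slope_obs, slope_do


true_effect, slope_obs, slope_do = simulate()
print(f"true effect:          {true_effect:.2f}")
print(f"observational slope:  {slope_obs:.2f}")
print(f"interventional slope: {slope_do:.2f}")
```

In this toy setup the observational fit overstates the effect of `x` because it also absorbs the influence of `z`, while the controlled experiment recovers a slope close to the causal effect. Both fits are equally good descriptions of their own data; only one answers the question of what happens if we change `x`.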
DNV GL provides assurance that safety-critical systems are inherently safe to operate.
Is it safe?
It is obvious from the nature of this question that data alone cannot answer it. When we consider how any possible scenario can lead to a failure of the system, it is clear that we have not experienced all possible ways a system can fail – especially when the system is a novel technology. The information we seek does not exist in the data!
Why is it safe?
Because our engineering combines the causal knowledge acquired through centuries of scientific questions of why, and our statistical ability to understand variance based on our observed experience.
This means combining causal knowledge from first principles with data-driven knowledge about variance to answer counterfactual questions of “What if?” – which is really what any risk assessment is all about. Our bread and butter.
The many examples of the efficacy of pure data-driven models across many problems have created hype around the promise that anything can be solved if we have enough data. But, as Pearl clearly states, we cannot have data of counterfactuals – we cannot answer “what-if” from the data alone. We cannot answer questions like: “Would the Chernobyl nuclear power plant have had a nuclear meltdown if the technicians had not performed the safety test the night between April 25th and 26th in 1986?” based on data alone. However, based on our current knowledge about the causal mechanisms of how the disaster happened, it seems like a series of events that, although not very likely to happen, was not sufficiently guarded against.
Our job as risk professionals is to ask these counterfactual questions. What if? Or, paraphrased in the words of the Harvard professor Steven Pinker: “Any educated decision maker should know what Bayesian reasoning is; a statistical decision theory of making decisions under uncertainty that trades off the harm of false alarms vs. the harm of misses. How to tell causation from correlation, and to avoid logical fallacies.” These are the tools of rationality we need to employ to ensure safety. We need to combine our causal knowledge with all relevant data to make sure that the best available, and most accurate, information is readily at hand for decision makers operating safety-critical systems – whether they are humans or machines.
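The trade-off between false alarms and misses can be made concrete with a few lines of code. The following is a minimal sketch under assumed numbers: the prior fault probability, the sensor characteristics, and the cost figures are hypothetical and only serve to show the mechanics of Bayes' rule followed by an expected-harm comparison.

```python
def posterior_fault(prior, p_alarm_given_fault, p_alarm_given_ok):
    """Bayes' rule: probability of a fault given that the sensor raised an alarm."""
    evidence = p_alarm_given_fault * prior + p_alarm_given_ok * (1.0 - prior)
    return p_alarm_given_fault * prior / evidence


def should_act(p_fault, cost_miss, cost_false_alarm):
    """Act when the expected harm of a miss exceeds that of a false alarm."""
    return p_fault * cost_miss > (1.0 - p_fault) * cost_false_alarm


# Hypothetical numbers: faults are rare, the sensor is good but not perfect.
p = posterior_fault(prior=0.01, p_alarm_given_fault=0.95, p_alarm_given_ok=0.05)
print(f"P(fault | alarm) = {p:.3f}")

# The same evidence leads to different decisions depending on the consequences.
print("high-consequence system:", should_act(p, cost_miss=100.0, cost_false_alarm=1.0))
print("low-consequence system: ", should_act(p, cost_miss=2.0, cost_false_alarm=1.0))
```

Even with a sensor that is right 95% of the time, the probability of a real fault after an alarm stays modest when faults are rare. Whether it is rational to act then depends on how the harm of a miss compares with the harm of a false alarm.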
DNV GL, together with SINTEF and Dr. Techn. Olav Olsen AS, has successfully attracted funding from the Norwegian Research Council to run a 14 MNOK research project to boost our research on causal data-driven decision support. The RaPiD project (Reciprocal Physics and Data-driven models) aims to provide more specific, accurate and timely decision support in the operation of safety-critical systems, by combining physics-based modelling with data-driven machine learning and probabilistic uncertainty assessment.
The main objective of the research project is to develop and document the methodologies and technologies needed to consistently combine physics-based and data-driven models to alleviate the deficiencies of both by capturing their complementary advantages.
To support this, the project will:
- Increase computational efficiency of advanced modelling tools by reduced-order modelling.
- Develop hybrid analysis and modelling, i.e., combine physical models with data-driven models.
- Reduce computational demand and increase safety by effective selection of relevant simulation scenarios based on probabilistic machine learning and risk-aware objective functions.
- Tailor and demonstrate the integrated modelling approach to selected use cases.
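One common way to combine a physical model with a data-driven one is to let the data-driven part learn only what the physics misses. The sketch below is an illustration of that general idea, not the RaPiD project's method: the linear "physics" model, the cubic term standing in for unmodelled behaviour, and all numerical values are assumptions.

```python
import numpy as np


def physics_model(x, k=2.0):
    """Simplified first-principles model (hypothetical): a linear response k * x."""
    return k * x


def fit_hybrid(x, y, degree=3):
    """Fit a polynomial to the residual between the data and the physics model."""
    residual = y - physics_model(x)
    coeffs = np.polyfit(x, residual, degree)
    # Hybrid prediction = physics prediction + learned correction.
    return lambda x_new: physics_model(x_new) + np.polyval(coeffs, x_new)


def rmse(a, b):
    """Root-mean-square error between two arrays."""
    return float(np.sqrt(np.mean((a - b) ** 2)))


# Synthetic "measurements": the real system has a cubic term the physics ignores.
rng = np.random.default_rng(0)
x_train = rng.uniform(-1.0, 1.0, 200)
y_train = 2.0 * x_train + 0.5 * x_train**3 + rng.normal(scale=0.05, size=200)

hybrid = fit_hybrid(x_train, y_train)

# Evaluate both models against the noise-free system response.
x_test = np.linspace(-1.0, 1.0, 50)
y_test = 2.0 * x_test + 0.5 * x_test**3
rmse_physics = rmse(physics_model(x_test), y_test)
rmse_hybrid = rmse(hybrid(x_test), y_test)
print(f"physics-only RMSE: {rmse_physics:.3f}")
print(f"hybrid RMSE:       {rmse_hybrid:.3f}")
```

The physics model carries the causal structure and remains the backbone of the prediction, while the fitted correction only has to explain the smaller, unmodelled part. Outside the range of the training data the correction should be trusted far less than the physics.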
More information on background research can be found in Background
If you find this interesting and would like to discuss or contribute to the project, please contact:
Simen Eldevik, Principal research scientist – Risk, physics, and Machine learning, DNV GL.
Frank Børre Pedersen, Programme Director – Oil & Gas and Ocean Space, Group Technology and Research