Data Driven Models…The Tide of Opportunity


“There is a tide in the affairs of men which, taken at the flood, leads on to fortune; Omitted, all the voyage of their life is bound in shallows and in miseries.”
William Shakespeare (Julius Caesar)

It’s hard to escape the chatter about “big data”. Evangelists talk of volume, velocity and variety of data across business verticals and the impact that big data will have on traditional analytics. Certainly, surfacing trends and identifying relationships and correlations in a multivariate, multivariant and multidimensional system provides food for thought. But does big data without business context provide any knowledge that enriches upstream decision-making?

The technique to bring context to analysis is to develop data-driven models (DDM). As part of a series, this article will examine the development and deployment of DDM in the oil and gas industry. In the next article, we shall discuss in more detail how data-driven models are changing the world. Oil and gas companies have much to learn from other industries. Not only are data-driven models making the workplace more efficient, they are enabling Generation Y to take innovation to another level.

Navigating murky waters
So the upstream O&G industry generates big data. Great! But running analytical workflows across these data to garner information requires engineers across all disciplines to step into a reality not dictated by ingrained postulates and axioms dressed as first principles. Mother Nature is fickle. No matter how hard theoretical physicists and geoscientists strive to explain the fundamentals that drive our observations, the actual always deviates from the plan. We must consider how our engineering principles relate to the complex systems and different environments of the global oil and gas reservoirs.

Engineers have nurtured a familiarity with the upstream environment to a degree that emboldens definitive strategies to recover hydrocarbons from some of the most complex reservoirs; and with an historical success that enables them to fill the gaps in their knowledge. But they cannot know the reservoir, the contextual wrapper. Thus it is impossible to determine cause and effect unless observed in a laboratory where an artificial environment controls the context. To attain that contextual understanding, we must apply data-driven models (DDM), marrying interpretation with the non-determinism of soft computing methodologies.

What is a DDM?
Let’s start with a definition of data-driven models. Computational intelligence has matured over the past decades, primarily in the area of machine learning, and this has greatly enhanced the capabilities of empirical modeling. The school of thought that embraces these new approaches is termed Data Driven Modeling (DDM). It focuses on analyzing all of the data within a system. One of the central tenets of DDM is to surface connections between the system state variables (input and output) without definitive knowledge of the physical behavior of the system. This methodology pushes the horizon beyond conventional empirical modeling to entertain contributions from superimposed spheres of study. Thus we have evolved through history from a time when empirical models underpinned science to a few centuries ago when the scientific landscape was populated by purely theoretical bastions of thought.

Moving on from computational branches of science, we now stand at the crossroads of a data-intensive exploratory analysis that is ideal for DDM. In the oil and gas industry, we are generating numerical simulations of a reservoir without fully appreciating the contextual wrapper that permeates the subsurface and reflects the dynamic behavior of fluids in a porous medium. The principle of a DDM is that unquantified behaviors in neighboring systems can be observed in the data.

There is a fundamental difference between a discrete data point and how we choose to analyze the specific measurement. Do we consider the value on its own? Or within the context of how a process or equipment performs? Or as an observation point within a system that is responding to various inputs and producing desired outputs?
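To make the distinction concrete, the short sketch below scores the same reading in two ways: against every historical reading, and against readings taken in the same operating state. The pressure values, the state labels and the helper `z_score` are hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev

# Hypothetical standpipe-pressure history (psi), tagged by operating state.
history = [
    ("drilling", 3000), ("drilling", 3050), ("drilling", 2980), ("drilling", 3020),
    ("circulating", 1500), ("circulating", 1520), ("circulating", 1480), ("circulating", 1510),
]

def z_score(value, sample):
    """Number of sample standard deviations between value and the sample mean."""
    return (value - mean(sample)) / stdev(sample)

# A new (hypothetical) reading taken while circulating.
reading_state, reading = "circulating", 2200

all_values = [v for _, v in history]
same_state = [v for s, v in history if s == reading_state]

z_global = z_score(reading, all_values)    # the value considered on its own
z_context = z_score(reading, same_state)   # the value within its operating context

print(f"global z = {z_global:.2f}, in-context z = {z_context:.2f}")
```

Taken on its own, the reading sits close to the overall average and looks unremarkable; judged against other readings from the same operating state, it stands out sharply.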

The first principles of any system are important, but in order to create valuable analytical models we must also understand that there are relationships within the data that can be observed from a systematic viewpoint and can be further developed to define models that describe and allow control over desired outcomes. We would suggest that the solution lies somewhere between a purely first-principles model and a purely data-driven one. A hybrid model that is constrained by first principles but defined by data observations can provide an analytically sound model of the system that can be operationalized in the real world. Consider the complexity of a subsurface process like drilling.

Figure 1: Drilling System Subsurface Complexity


The point is not to replace fundamental scientific observations but to complement them with a DDM, enriching the information and placing the big data within that real-world contextual wrapper.
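One way to read “constrained by first principles but defined by data observations” is as a physics baseline multiplied by a correction factor learned from measurements. The sketch below assumes a hypothetical baseline function `physics_rate`, a hypothetical productivity index and synthetic observations; it illustrates the general pattern rather than any specific drilling or reservoir model.

```python
def physics_rate(drawdown_psi, productivity_index=2.0):
    """Hypothetical first-principles baseline: rate proportional to drawdown."""
    return productivity_index * drawdown_psi

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

# Synthetic observations: (drawdown in psi, water cut fraction, measured rate).
observations = [
    (100, 0.0, 200.0), (150, 0.1, 285.0), (200, 0.2, 360.0),
    (250, 0.3, 425.0), (300, 0.4, 480.0),
]

# Ratio of measured rate to the physics baseline: the part the baseline
# does not explain, which is left to the data to describe.
water_cuts = [wc for _, wc, _ in observations]
ratios = [q / physics_rate(dp) for dp, _, q in observations]
a, b = fit_line(water_cuts, ratios)

def hybrid_rate(drawdown_psi, water_cut):
    """Physics baseline scaled by the data-derived correction factor."""
    return physics_rate(drawdown_psi) * (a + b * water_cut)

print(round(hybrid_rate(400, 0.2), 1))
```

The baseline keeps the prediction physically anchored when extrapolating in drawdown, while the fitted factor absorbs a systematic effect the baseline ignores.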

Example of DDMs outside of O&G
Data-driven models offer considerable value, both economic and social, because the knowledge they surface can transform the way we communicate, understand our environment, perform our jobs and run our lives. An extensive array of previously inconceivable DDM applications is now being implemented, with an innovative influence on people’s lives. Let us touch upon one such example in the fraud detection space.

Insurance companies are exposed to the slings and arrows of criminals focused on fraudulent activity. In fact, fraud can cost the insurance industry in the region of $80 billion each year. CNA, one of the country’s largest commercial lines carriers, combats fraud and witnesses it in 1 of every 10 claims it processes. By implementing predictive models across each line of its business, CNA is able to score all claims and identify those that merit a high alert based on historical signatures in the submitted data, whether structured or unstructured.

This case study illustrates the benefits of developing data-driven models built on historical events garnered from big data. As new claims are processed, the data is fed into the DDMs in real time. Only two years after implementation, CNA observed the ratio of cases flagged for possible fraud increase by some 8%. This resulted in recovered or prevented fraudulent activity that totaled $6.4 million.
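The scoring step in this case study can be pictured as a model that maps each claim to a score between 0 and 1, with claims above a threshold routed for review. The sketch below uses a logistic score with made-up feature names, weights and threshold; it is not CNA’s model, whose details are not given here.

```python
import math

# Hypothetical weights; in practice these would be learned from historical claims.
WEIGHTS = {"claim_amount_k": 0.04, "prior_claims": 0.6, "days_since_policy_start": -0.01}
BIAS = -3.0
ALERT_THRESHOLD = 0.5  # hypothetical cut-off for a "high alert"

def fraud_score(claim):
    """Logistic score in (0, 1); higher means closer to past fraud signatures."""
    z = BIAS + sum(WEIGHTS[name] * claim[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

def high_alerts(claims):
    """Return the ids of claims whose score meets the alert threshold."""
    return [c["id"] for c in claims if fraud_score(c) >= ALERT_THRESHOLD]

# Two synthetic claims scored as they arrive.
claims = [
    {"id": "A", "claim_amount_k": 5, "prior_claims": 0, "days_since_policy_start": 400},
    {"id": "B", "claim_amount_k": 60, "prior_claims": 3, "days_since_policy_start": 20},
]
flagged = high_alerts(claims)
print(flagged)
```

Only structured fields appear in this sketch; unstructured data such as adjuster notes would first need to be converted into numeric features.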

Example of DDMs inside of O&G
The oil and gas industry tends to lag behind other industries in advanced technology adoption. But the full gamut of possible DDM applications in E&P is limited only by an engineer’s imagination. One example that underlines the advantages of implementing a DDM in unconventional assets is found in the Pinedale anticlinal structure in Wyoming, as detailed in SPE paper 135523.

Faced with the analysis of disparate subsurface data generated from over 50 fluvial sand packages in a single wellbore drilled in 5000 feet of vertical section, the operator wanted to comprehend well performance potential and variability across the geologic structure. Such knowledge is imperative to optimize reservoir development and well completion strategies, with an emphasis on evaluating an optimized hydraulic fracture package.

A neural network was chosen from among many soft-computing techniques to evaluate both reservoir parameters and variables controlled by the operator, such as proppant volume and flowback methods. Input data included general information from 211 wells and 2399 stages, PLT data, stimulation treatment data, petrophysical data for formations and sands, flowback data, proppant type, proppant volume, and well production data. The purpose of the analysis was to identify patterns among the poor and exceptional wells and thus understand the decreasing trend of stage production performance over time, identifying the factors that have the most impact on production. The disparate data systems had to be aggregated to produce a robust, managed data set conducive to effective exploratory data analysis. These two steps, data aggregation and exploratory data analysis, are essential to identify plausible and efficient hypotheses worth testing and to steer decision makers towards sound modeling techniques.

Data clustering was used to create different models and assess different parameters. Models were able to identify the relative impact of the most significant variables affecting stage production performance, and develop probability distributions for potential outcomes at different categories of production. The probability distributions provided a basis for completion optimization. Findings included identification of the impact of flowback procedures in total well performance, and the probable sensitivity to key geologic and petrophysical parameters that most affected performance.
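As a rough illustration of the clustering step, the sketch below groups stages with a small k-means routine and then tabulates, for each cluster, the share of stages in each production category. The clustering method and variables actually used in the study are not stated here, so k-means, the two features (proppant volume and net pay), the category labels and all values are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Synthetic stages: (proppant volume in klb, net pay in ft, production category).
stages = [
    (100, 20, "low"), (110, 22, "low"), (105, 18, "medium"),
    (300, 60, "high"), (310, 58, "high"), (295, 62, "medium"),
]

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def k_means(points, centroids, iterations=10):
    """Plain k-means from the given starting centroids; returns cluster labels."""
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return [nearest(p, centroids) for p in points]

points = [(pv, pay) for pv, pay, _ in stages]
labels = k_means(points, centroids=[points[0], points[3]])

# Empirical distribution of production category within each cluster.
counts = defaultdict(Counter)
for (_, _, category), lab in zip(stages, labels):
    counts[lab][category] += 1
distributions = {
    lab: {cat: n / sum(c.values()) for cat, n in c.items()}
    for lab, c in counts.items()
}
print(distributions)
```

With real data the features would normally be scaled before clustering, since k-means is sensitive to the units of each variable.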

Evaluation was conducted on 195 stages, with 49 stages identified for an increase in proppant volume. Through this process, the operator identified a need to update the DDM to include the impact of pressure depletion from down-spacing. Even without accounting for pressure depletion, the operator experienced excellent results. The analysis identified a “very favorable” economic return at a gas price of $3.00 per Mscf, and break-even at $2.60. The team also identified seven stages that, if eliminated, could improve the economics significantly.
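For readers unfamiliar with break-even pricing, the sketch below shows one common way such a figure can be computed: the gas price at which discounted revenue equals discounted cost. All inputs (capital cost, yearly volumes, operating cost, discount rate) are hypothetical and do not reproduce the economics of the study.

```python
def discounted(values, rate):
    """Present value of a series of end-of-year amounts."""
    return sum(v / (1 + rate) ** (t + 1) for t, v in enumerate(values))

def break_even_price(capex, yearly_volumes_mscf, yearly_opex, rate):
    """Gas price in $/Mscf at which net present value is zero."""
    return (capex + discounted(yearly_opex, rate)) / discounted(yearly_volumes_mscf, rate)

def npv(price, capex, yearly_volumes_mscf, yearly_opex, rate):
    """Net present value in $ at a flat gas price in $/Mscf."""
    revenue = discounted([price * v for v in yearly_volumes_mscf], rate)
    return revenue - discounted(yearly_opex, rate) - capex

# Hypothetical stage-level inputs.
capex = 1_000_000
volumes = [400_000, 300_000, 200_000]   # Mscf produced in each year
opex = [100_000, 100_000, 100_000]      # $ spent in each year
rate = 0.10

price = break_even_price(capex, volumes, opex, rate)
print(f"break-even price: ${price:.2f} per Mscf")
```

Stages whose own break-even price sits above the prevailing gas price are natural candidates for elimination.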

Thus E&P engineers can complement the traditional interpretation of their data with DDMs, implementing a suite of soft-computing techniques such as Artificial Neural Networks to garner insight through pattern matching. The data must be put into context, and DDMs enable that context to be defined, since the data tells the story.

So where does Generation Y fit in this equation?
Put simply, the very essence of a data-driven approach is to place the focus on the outcome and allow the mathematical complexity to remain fluid within the model. This aligns with Generation Y’s “technology as enabler” persona. The focus shifts from a deep understanding of the algorithms towards consideration of how to enrich and enable the process with the insight that the DDM provides. As the Bard himself would suggest, data-driven models are our tide of opportunity. If we enable Generation Y to use them, we can expect a flood of good fortune.

Resource: Petroleum Labour Market Information Report