Leveraging Deep Learning and Natural Language Processing for hydrogeological insights from borehole logs

The advent of extensive digital datasets, coupled with advancements in artificial intelligence (AI), is revolutionizing our ability to extract meaningful insights from complex patterns in the natural sciences. In this context, the targeted classification of textual descriptions, particularly those detailing the granulometry of unconsolidated sediments or the fracturing state of rock masses, by combining supervised deep learning and natural language processing (NLP) is a promising way to refine large-scale geological and hydrogeological models by enriching them with a larger volume of data.

Several databases are replete with qualitative geological data such as borehole logs. Although abundant, these data are not readily assimilated into quantitative hydrogeological modeling because of the extensive time required to convert the written descriptions into operationally significant units such as hydrofacies. This conversion typically requires expert analysis of each report, but it can be expedited by applying AI-based NLP techniques.

The primary objectives of this research are twofold: (i) to develop a robust classification model that leverages geological descriptions alongside grain size data, and (ii) to standardize a vast array of sparse and heterogeneous stratigraphic log data for integration into large-scale hydrogeological applications.

The Po River alluvial plain in northern Italy (45,700 km²) serves as the pilot area for this study because of its homogeneous shallow subsurface geology, its dense borehole coverage and the availability of a pre-labelled training set. This research demonstrates the conversion of qualitative geological information from a very large dataset of stratigraphic logs (387,297 text descriptions from 39,265 boreholes) into a dataset of semi-quantitative information. This transformation, intended for hydrogeological modeling, relies on an operational classification system in which a deep learning-based NLP algorithm categorizes complex geological and lithostratigraphic text descriptions according to grain size-based hydrofacies. A supervised text classification algorithm, built on a Long Short-Term Memory (LSTM) architecture, was developed, trained and validated using 86,611 pre-labelled entries covering all sediment types within the study region. The word embedding technique improved model accuracy and learning efficiency by quantifying the semantic distances among geological terms.

The outcome of this work is a novel dataset of semi-quantitative hydrogeological information, produced by a classification model with an accuracy of 97.4%. This dataset was incorporated into large-scale modeling frameworks, enabling the assignment of hydrogeological parameters based on grain size data while integrating the uncertainty stemming from misclassification. It has increased the spatial density of available information from 0.34 data points/km² to 8.7 data points/km². The study findings align closely with the existing literature, offering a robust spatial reconstruction of hydrofacies at different scales. This has significant implications for groundwater research, particularly for quantitative modeling at a regional scale.
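The abstract does not say how misclassification uncertainty is carried into the hydrogeological parameters. As a minimal sketch of one possible approach, and not the authors' method, the example below weights a per-class parameter by the classifier's class probabilities and reports the resulting spread. The hydrofacies names, the log10 hydraulic conductivity values, the function `expected_log10_k` and the example probabilities are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical hydrofacies classes with illustrative log10 hydraulic
# conductivities (m/s). These values are assumptions, not from the study.
LOG10_K = {"gravel": -3.0, "sand": -4.0, "silt": -6.0, "clay": -8.0}


def expected_log10_k(class_probs):
    """Return the probability-weighted mean and standard deviation of log10(K).

    class_probs maps each hydrofacies name to the probability assigned by a
    classifier (for example, a softmax output); the values must sum to 1.
    """
    total = sum(class_probs.values())
    if not math.isclose(total, 1.0, abs_tol=1e-6):
        raise ValueError("class probabilities must sum to 1")
    mean = sum(p * LOG10_K[c] for c, p in class_probs.items())
    var = sum(p * (LOG10_K[c] - mean) ** 2 for c, p in class_probs.items())
    return mean, math.sqrt(var)


# A confident prediction and an ambiguous one (both made up for illustration).
confident = {"gravel": 0.0, "sand": 1.0, "silt": 0.0, "clay": 0.0}
ambiguous = {"gravel": 0.0, "sand": 0.5, "silt": 0.5, "clay": 0.0}

mean_c, std_c = expected_log10_k(confident)
mean_a, std_a = expected_log10_k(ambiguous)
print(f"confident: mean log10(K) = {mean_c:.2f}, std = {std_c:.2f}")
print(f"ambiguous: mean log10(K) = {mean_a:.2f}, std = {std_a:.2f}")
```

Under these assumptions, a description the classifier is sure about yields a single parameter value with no spread, while an ambiguous description yields an intermediate value with a non-zero standard deviation that a downstream model could treat as parameter uncertainty.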

Copernicus GmbH

Alberto Previati, Valerio Silvestri, Giovanni Crosta

2025
