Enhancing Learning in Oil & Gas: SPE Podcast Analytics with Open-Source Tools

Abstract: Expertise in oil and gas operations is essential for professional success, and the Society of Petroleum Engineers (SPE) podcasts are a valuable resource for individual learning. More than one hundred episodes have been published to date. This project combines web scraping, data mining, and open-source large language models (LLMs) to extract insights from the podcasts and to build a podcast recommendation system aimed at supporting individual career success. All SPE podcast audio files were first collected with a web scraper browser plugin. The audio was then transcribed to text with an open-source deep learning algorithm, and the metadata for each episode was obtained with an open-source Python tool. The analysis is published as an open-source GitHub project. A LINE chatbot lets users interact with the LLM and the podcast data to retrieve podcast insights and receive listening recommendations.

More than one hundred SPE podcast transcripts were analyzed. The podcast began publication in 2019, paused near the end of 2020 because of COVID-19, and resumed in early 2023, continuing to the present. Download counts revealed one outlier: an episode released near the end of 2020, shortly before the COVID-19-related pause, reached 4,600 downloads, compared with the usual count of under 2,000. The transcripts were tokenized and analyzed with natural language processing techniques. Supervised machine learning models were built to estimate which variable, or which text in the podcast or its title, most strongly influences the number of downloads. The content of the transcript appears to have the most influence.
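The abstract does not state how the outlier was flagged beyond inspecting download counts. As a minimal sketch, assuming a conventional interquartile-range (IQR) rule and made-up download figures (only the 4,600 value and the under-2,000 baseline come from the abstract), such an episode could be flagged as follows:

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return indices of values above the upper fence Q3 + k * IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    upper_fence = q3 + k * (q3 - q1)
    return [i for i, v in enumerate(values) if v > upper_fence]

# Hypothetical per-episode download counts (illustrative only);
# 4600 mirrors the outlier reported in the abstract.
downloads = [1200, 950, 1800, 1500, 4600, 1100, 1350, 1700, 900, 1450]

flagged = iqr_outliers(downloads)
for i in flagged:
    print(f"episode index {i}: {downloads[i]} downloads")
```

The IQR rule is a stand-in here, not the author's documented method; any robust threshold on the download distribution would serve the same purpose.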
Word cloud analysis indicates that technology is the most frequently discussed subject across the podcasts, followed by drilling, production, reservoir, storage, and automation. Cluster analysis was performed and evaluated with the silhouette score, and the primary terms in the transcripts were highlighted for each cluster. These cluster terms were organized as a list and supplied as input to the LLM application. Finally, a LINE chatbot interface backed by the LLM was developed to enhance user interaction. This study applies a comprehensive data mining and text analysis methodology to more than one hundred published SPE podcasts and implements a recommendation system through the LINE chatbot API. All libraries used in this study are open-source, and the project is freely available on GitHub. The author welcomes future collaboration and maintenance for the benefit of the SPE.
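The abstract names the silhouette score but not the clustering algorithm or library used. As a sketch under stated assumptions (Euclidean distance, and toy 2-D points standing in for transcript feature vectors), the score used to compare candidate clusterings can be computed like this:

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        # Average distance from point i to the listed members, excluding i itself
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton cluster contributes 0 by convention
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, m) for l, m in clusters.items() if l != lab)  # nearest other cluster
        denom = max(a, b)
        total += (b - a) / denom if denom else 0.0
    return total / len(points)

# Hypothetical 2-D stand-ins for transcript vectors: two well-separated groups
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]

good = silhouette_score(points, [0, 0, 0, 1, 1, 1])  # matches the true groups
bad = silhouette_score(points, [0, 0, 1, 1, 1, 0])   # mixes the groups
print(f"good labeling: {good:.3f}, bad labeling: {bad:.3f}")
```

A higher mean silhouette indicates tighter, better-separated clusters, so the same function can be evaluated over several candidate cluster counts and the best-scoring one kept.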

SPE

Suradech Kongkiatpaiboon

SPE/IATMI Asia Pacific Oil & Gas Conference and Exhibition

2025
