
AUTOMATING CYBER THREAT INTELLIGENCE EXTRACTION USING NATURAL LANGUAGE PROCESSING TECHNIQUES

The growing frequency and complexity of cyberattacks have made it clear that organizations must prioritize real-time, actionable, and scalable Cyber Threat Intelligence (CTI) strategies. The classical approach to CTI collection and analysis relies heavily on manual review of raw, unstructured text such as threat reports, blogs, and advisories, and it cannot keep pace with current cybersecurity threats. This study introduces a hybrid Natural Language Processing (NLP) framework that automates CTI extraction by combining state-of-the-art transformer models, namely fine-tuned BERT architectures, with syntactic dependency parsing and domain-specific rule-based post-processing. A custom-annotated dataset of more than 5,000 cybersecurity documents was created, enabling the system to extract key threat entities such as malware names, CVEs, IP addresses, threat actors, and TTPs. Experimental comparisons show that the proposed system substantially outperforms the BiLSTM-CRF and traditional CRF baselines, achieving an F1-score of 0.90 in entity recognition. Error analysis further showed that the syntactic and rule-based enhancements markedly reduced entity fragmentation and false positives. The paper also examines how preprocessing, data source quality, and linking extracted entities to external knowledge bases contribute to effective CTI extraction. The findings demonstrate the potential of advanced NLP methods to transform CTI workflows, enabling more accurate, faster, and more scalable threat intelligence processing in support of proactive cybersecurity defense.
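The abstract describes rule-based post-processing layered on top of a fine-tuned BERT tagger but does not list the actual rules. The sketch below is only a minimal illustration under stated assumptions: regular expressions for CVE identifiers and IPv4 addresses (with octet validation via Python's standard ipaddress module), a hypothetical heuristic that merges same-label fragments split by the tagger, and an exact-match entity-level F1 of the kind commonly used to score entity recognition. All function names, labels, and the example model output are hypothetical and are not taken from the paper.

```python
import ipaddress
import re
from typing import List, Tuple

# (start, end, label) with character offsets; end is exclusive.
Span = Tuple[int, int, str]

# Illustrative patterns (assumptions, not the paper's rules).
CVE_RE = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def rule_entities(text: str) -> List[Span]:
    """Find CVE IDs and valid IPv4 addresses with regex rules."""
    spans: List[Span] = [(m.start(), m.end(), "CVE") for m in CVE_RE.finditer(text)]
    for m in IPV4_RE.finditer(text):
        try:
            ipaddress.IPv4Address(m.group())  # rejects octets such as 999
        except ValueError:
            continue
        spans.append((m.start(), m.end(), "IP"))
    return sorted(spans)


def merge_fragments(spans: List[Span], text: str) -> List[Span]:
    """Merge same-label spans that touch or are separated by one space or
    hyphen (a simple heuristic against entity fragmentation)."""
    merged: List[Span] = []
    for start, end, label in sorted(spans):
        if merged:
            p_start, p_end, p_label = merged[-1]
            if label == p_label and start >= p_end and text[p_end:start] in ("", " ", "-"):
                merged[-1] = (p_start, end, label)
                continue
        merged.append((start, end, label))
    return merged


def post_process(text: str, model_spans: List[Span]) -> List[Span]:
    """Let rule matches override overlapping model spans, then merge fragments."""
    rules = rule_entities(text)

    def overlaps(a: Span, b: Span) -> bool:
        return a[0] < b[1] and b[0] < a[1]

    kept = [s for s in model_spans if not any(overlaps(s, r) for r in rules)]
    return merge_fragments(kept + rules, text)


def entity_f1(gold: List[Span], pred: List[Span]) -> float:
    """Exact-match entity-level F1: boundaries and label must both agree."""
    g, p = set(gold), set(pred)
    tp = len(g & p)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    text = "APT28 exploited CVE-2023-23397 and beaconed to 203.0.113.7, not 999.1.1.1."
    # Hypothetical tagger output: "APT28" split in two, part of the CVE mislabeled.
    model = [(0, 3, "THREAT_ACTOR"), (3, 5, "THREAT_ACTOR"), (16, 24, "MALWARE")]
    print(post_process(text, model))
    # [(0, 5, 'THREAT_ACTOR'), (16, 30, 'CVE'), (47, 58, 'IP')]
```

In this toy run the fragmented actor name is rejoined, the mislabeled partial CVE is replaced by the rule match, and the invalid address 999.1.1.1 is rejected. Whether the paper scores entities by exact match or with partial credit is not stated in the abstract.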

Kashf Institute of Development & Studies

Amjad Jumani, Amber Baig, Engr. Dr. Shamim Akhtar, Muhammad Shahmir Shamim, Hira Zaheer, Areej Changaiz

Kashf Journal of Multidisciplinary Research

2025
