
SynoExtractor: A Novel Pipeline for Arabic Synonym Extraction Using Word2Vec Word Embeddings

Automatic synonym extraction plays an important role in many natural language processing systems, such as those for information retrieval and question answering. Recent research has focused on extracting semantic relations from word embeddings because they capture relatedness and similarity between words. However, word embeddings alone are problematic for synonym extraction because they cannot determine whether the relation between two words is synonymy or some other semantic relation. In this paper, we present a novel solution to this problem: the SynoExtractor pipeline, which filters similar word embeddings to retain synonyms based on specified linguistic rules. Our experiments used embeddings trained on the King Saud University Corpus of Classical Arabic (KSUCCA) and on Gigaword with the continuous bag-of-words (CBOW) and skip-gram (SG) models. We evaluated the automatically extracted synonyms by comparing them with the Alma’any Arabic synonym thesauri, and we also arranged for a manual evaluation by two Arabic linguists. The results show that the SynoExtractor pipeline improves the precision of synonym extraction compared with using the cosine similarity measure alone. SynoExtractor obtained a mean average precision (MAP) of 0.605 on KSUCCA, a 21% improvement over the baseline, and a MAP of 0.748 on the Gigaword corpus, a 25% improvement. SynoExtractor also outperformed the Sketch Engine thesaurus for synonym extraction by 32% in terms of MAP. These promising results suggest that our method can also be applied to other languages.
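The abstract contrasts a cosine-similarity baseline with rule-based filtering of embedding neighbors, with both scored by MAP against a thesaurus. The Python sketch below illustrates that evaluation loop under stated assumptions: the toy 2-D vectors, the `POS` table, and the `same_pos` rule are hypothetical (the abstract does not list SynoExtractor's actual rules), and AP is normalized here by the size of the gold synonym set, which is one common convention and not necessarily the paper's. In practice the vectors would come from a trained CBOW or SG model.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]
Rule = Callable[[str, str], bool]  # (query, candidate) -> keep the candidate?


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(query: str, emb: Dict[str, Vector], k: int) -> List[str]:
    """Cosine-only baseline: the k words most similar to `query`, best first."""
    scored = [(cosine(emb[query], vec), word) for word, vec in emb.items() if word != query]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken alphabetically
    return [word for _, word in scored[:k]]


def apply_rules(query: str, candidates: List[str], rules: List[Rule]) -> List[str]:
    """Keep candidates that pass every rule, preserving their similarity order."""
    return [c for c in candidates if all(rule(query, c) for rule in rules)]


def average_precision(ranked: List[str], gold: Set[str]) -> float:
    """AP of one ranked list, normalized by the size of the gold synonym set."""
    hits, total = 0, 0.0
    for rank, word in enumerate(ranked, start=1):
        if word in gold:
            hits += 1
            total += hits / rank  # precision at each rank where a synonym appears
    return total / len(gold) if gold else 0.0


def mean_average_precision(ranked: Dict[str, List[str]], gold: Dict[str, Set[str]]) -> float:
    """MAP: the mean of per-query AP over all queries in `ranked`."""
    scores = [average_precision(ranked[q], gold.get(q, set())) for q in ranked]
    return sum(scores) / len(scores) if scores else 0.0


# Toy 2-D "embeddings" and POS tags (hypothetical data, for illustration only).
EMB = {
    "big": [1.0, 0.0], "bigness": [0.95, 0.05], "large": [0.9, 0.1],
    "small": [0.8, 0.3], "huge": [0.7, 0.4], "tree": [0.0, 1.0],
}
POS = {"big": "ADJ", "bigness": "NOUN", "large": "ADJ",
       "small": "ADJ", "huge": "ADJ", "tree": "NOUN"}


def same_pos(query: str, candidate: str) -> bool:
    """Hypothetical rule: a synonym should share the query's part of speech."""
    return POS.get(query) == POS.get(candidate)


if __name__ == "__main__":
    gold = {"big": {"large", "huge"}}
    baseline = {"big": nearest_neighbors("big", EMB, k=4)}
    filtered = {q: apply_rules(q, c, [same_pos]) for q, c in baseline.items()}
    print(baseline["big"], mean_average_precision(baseline, gold))
    print(filtered["big"], mean_average_precision(filtered, gold))
```

On this toy data, the baseline ranks `bigness`, `large`, `small`, `huge` (MAP 0.5), and the rule removes the noun `bigness`, raising MAP to about 0.83. The antonym `small` still survives, which shows why embedding similarity alone cannot separate synonymy from other relations and why a real pipeline needs carefully chosen linguistic rules.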

Related Results

Installation Analysis of Matterhorn Pipeline Replacement
Abstract The paper describes the installation analysis for the Matterhorn field pipeline replacement, located in water depths between 800 ft and 1200 ft in the Gul...
Pipeline Resistance
Pipeline resistance is where an often abstract and wonky climate movement meets the bravery and boldness of Indigenous and other frontline defenders of land and water who inspire d...
Learned Text Representation for Amharic Information Retrieval and Natural Language Processing
Over the past few years, word embeddings and bidirectional encoder representations from transformers (BERT) models have brought better solutions to learning text representations fo...
A Fluid-pipe-soil Approach to Stability Design of Submarine Pipelines
Abstract The conventional approach to submarine pipeline stability design considers interactions between water and pipeline (fluid-pipe) and pipeline and seabed (...
test
Feature extraction has transformed the field of Natural Language Processing (NLP) by providing an effective way to represent linguistic features. Various techniques are utilised fo...
Identification of Review Helpfulness Using Novel Textual and Language-Context Features
With the increase in users of social media websites such as IMDb, a movie website, and the rise of publicly available data, opinion mining is more accessible than ever. In the rese...
The Poem "The Arabic Language Laments Its Fate Among Its People" by Hafez Ibrahim: An Analytical Study
Many languages are spoken in the world. The diversity of human languages and colors is a sign of Allah for those of knowledge (Al-Quran, 30:22). Although the Arabic language origin...
Seismic Vulnerability of the Subsea Pipeline
Abstract Unburied marine pipeline vulnerability under seismic impact, a new approach to investigation, and conclusions and recommendations for certain analyzed cases...
