
Automatic summarization of Malayalam documents using clause identification method

<span>Text summarization is an active research area in natural language processing. The huge amount of information on the internet necessitates the development of automatic summarization systems. There are two types of summarization techniques: extractive and abstractive. Extractive summarization selects important sentences from the text and produces a summary using them exactly as they appear in the original document. Abstractive summarization systems produce a summary of the input text as a human being would write it, which requires semantic analysis of the text. Little work has been carried out on abstractive summarization in Indian languages, especially Malayalam, for which only extractive summarization methods have been proposed. In this paper, an abstractive summarization system for Malayalam documents using a clause identification method is proposed. As part of this research work, a POS tagger and a morphological analyzer for Malayalam words in the cricket domain are also developed. The clauses in the input sentences are identified using a modified clause identification algorithm. The clauses are then semantically analyzed using an algorithm that identifies semantic triples: subject, object, and predicate. The score of each clause is then calculated using feature extraction, and the important clauses to be included in the summary are selected based on this score. Finally, an algorithm generates sentences from the semantic triples of the selected clauses; these sentences form the abstractive summary of the input documents.</span>
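The last three stages of the abstract (semantic triples, clause scoring, sentence generation) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the `Clause` class, the two scoring features (keyword overlap and clause position), and the toy English glosses are all assumptions made for illustration, since the abstract does not say which features are extracted or how sentences are generated.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    """A clause reduced to its semantic triple plus its position (hypothetical structure)."""
    subject: str
    obj: str
    predicate: str
    position: int  # 0-based index of the clause in the document


def score_clause(clause: Clause, keywords: set[str], total: int) -> float:
    """Toy score: keyword overlap plus a bonus for early position (assumed features)."""
    words = {clause.subject, clause.obj, clause.predicate}
    keyword_score = len(words & keywords) / len(words)
    position_score = 1.0 - clause.position / total
    return keyword_score + position_score


def summarize(clauses: list[Clause], keywords: set[str], k: int) -> list[str]:
    """Pick the k best-scoring clauses and render each triple as a sentence."""
    total = len(clauses)
    ranked = sorted(clauses, key=lambda c: score_clause(c, keywords, total), reverse=True)
    chosen = sorted(ranked[:k], key=lambda c: c.position)  # restore document order
    # Subject-object-predicate order, mirroring Malayalam's verb-final word order.
    return [f"{c.subject} {c.obj} {c.predicate}." for c in chosen]


# Toy cricket-domain clauses written as English glosses (illustrative data only).
clauses = [
    Clause("India", "match", "won", 0),
    Clause("crowd", "stadium", "left", 1),
    Clause("captain", "century", "scored", 2),
]
summary = summarize(clauses, keywords={"India", "captain", "century", "won"}, k=2)
print(summary)
```

Here the second clause shares no keywords with the assumed keyword set, so it is dropped and the two remaining triples are rendered in document order. A real system would replace the template in `summarize` with a generator that handles Malayalam morphology.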

Related Results

Envisioning Originalism Applied to Bioethics Cases
Originalism is an increasingly prevalent method for interpreting provisions of the US Constitution. It requires strict...
Automatic Text Summarization Berdasarkan Pendekatan Statistika pada Dokumen Berbahasa Indonesia
Propelled by modern technological innovations, data and text will grow more abundant throughout the year. With this much text, automatic text summarization is needed now ...
SUBORDINATE CLAUSES IN ADULTERY NOVEL
A subordinate clause (dependent clause) is a clause that cannot stand alone as a complete sentence because it does not express a complete thought. It explains and gives more inform...
IIST BCI Dataset-1 for Selected Common Malayalam Words
Designing Brain Computer Interfaces (BCIs), for helping patients, needs appropriate datasets which are relevant for the language of the patients. There exists a significant shortag...
Advancements in Automatic Text Summarization using Natural Language Processing
With the rapid expansion of data across various domains, the need for automated text summarization has become increasingly crucial. Given the overwhelming volu...
Quantifying clause chains in Nungon texts
Clause chains are sequences of clauses with under-specified verbal predicates, plus a single clause with a fully-specified verbal predicate. Clause chains represent the mor...
Phrase and Clause Usage in Michael Jackson’s Song Lyrics
This research aims to find out the usage of phrase and clause types that are found in Michael Jackson song-lyrics such as Billie Jean, Beat It, Black or White, Dirty Diana, Earth S...
Mapping the Queer Body: Queer Tropes and Malayalam Cinema
The narrative logic of mainstream Malayalam cinema is often predicated on heteronormative values and homophobic social practices. Though representations of LGBTQIA+ (lesbian, gay, ...
