
Towards automated recipe genre classification using semi-supervised learning

Sharing cooking recipes is a great way to exchange culinary ideas and provide instructions for food preparation. However, categorizing raw recipes found online into appropriate food genres is challenging because adequately labeled data are scarce. In this study, we present the “Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset”, which contains two million culinary recipes, each labeled with its genre, together with extended named entities extracted from the recipe descriptions. The collection includes features such as the title, the named entity recognition (NER) list, the directions, and the extended NER list, as well as nine genre labels: bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides, and fusions. The proposed 3A2M+ pipeline uses two NER extraction tools to extend the NER list with entities missing from the recipe directions, such as heat, time, or process. The 3A2M+ dataset serves as a comprehensive resource for various challenging recipe-related tasks, including classification, named entity recognition, and recipe generation. Furthermore, we evaluated traditional machine learning, deep learning, and pre-trained language models for classifying recipes into their corresponding genres, achieving an overall accuracy of 98.6%. Our investigation indicates that the title feature played a more significant role than the other features in classifying the genre.
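To make the record structure more concrete, the following Python sketch models one 3A2M+-style recipe and shows how an extended NER list might be built from the directions. The field names, the `RecipeRecord` class, the `extend_ner` helper, and both extractors are hypothetical: the abstract does not name the two NER extraction tools, so a regex for heat/time expressions and a small keyword list of cooking processes stand in for them. It is also assumed here that the extended list keeps the original entities and appends the new ones.

```python
import re
from dataclasses import dataclass, field

# The nine genre labels named in the abstract.
GENRES = ("bakery", "drinks", "non-veg", "vegetables", "fast food",
          "cereals", "meals", "sides", "fusions")

# Hypothetical stand-in for NER tool A: heat and time expressions
# such as "350 degrees", "10 minutes", or "medium heat".
HEAT_TIME = re.compile(
    r"\b(?:\d+\s*(?:minutes?|mins?|hours?|degrees?)|(?:low|medium|high) heat)\b",
    re.IGNORECASE,
)
# Hypothetical stand-in for NER tool B: a small, illustrative process vocabulary.
PROCESSES = {"bake", "boil", "fry", "grill", "simmer", "stir", "whisk", "chop", "mix"}


@dataclass
class RecipeRecord:
    title: str
    ner: list[str]          # entities already present in the NER column
    directions: list[str]   # one string per preparation step
    genre: str              # must be one of GENRES
    extended_ner: list[str] = field(default_factory=list)

    def __post_init__(self):
        # Reject labels outside the nine-genre scheme.
        if self.genre not in GENRES:
            raise ValueError(f"unknown genre: {self.genre!r}")


def extract_heat_time(step: str) -> list[str]:
    """Return heat/time mentions in one direction step, lower-cased."""
    return [m.group(0).lower() for m in HEAT_TIME.finditer(step)]


def extract_processes(step: str) -> list[str]:
    """Return words in one direction step that belong to PROCESSES."""
    return [w for w in re.findall(r"[a-z]+", step.lower()) if w in PROCESSES]


def extend_ner(record: RecipeRecord) -> RecipeRecord:
    """Merge both extractors' output into the NER list, keeping order, no duplicates."""
    merged = dict.fromkeys(record.ner)  # dict preserves insertion order
    for step in record.directions:
        for entity in extract_heat_time(step) + extract_processes(step):
            merged.setdefault(entity, None)
    record.extended_ner = list(merged)
    return record
```

Taking the union of the two extractors' outputs means either tool can contribute an entity the other misses, which mirrors the stated goal of recovering heat, time, and process entities absent from the original NER list.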
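The abstract does not specify how semi-supervised learning is applied, so the following is only a sketch of one common approach, self-training (pseudo-labeling), restricted to the title feature in line with the finding that titles carry the most signal. A small multinomial Naive Bayes classifier serves as a stand-in and is not necessarily one of the models the authors used. It is trained on labeled titles, then repeatedly assigns labels to unlabeled titles whose predicted probability reaches a confidence threshold and retrains on the enlarged set. All names (`TitleNB`, `self_train`) and the threshold values are illustrative assumptions.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(title: str) -> list[str]:
    """Lower-case a title and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", title.lower())


class TitleNB:
    """Multinomial Naive Bayes over title tokens with add-one (Laplace) smoothing."""

    def fit(self, titles, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for title, label in zip(titles, labels):
            tokens = tokenize(title)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.totals = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict_proba(self, title):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        # Ignore tokens never seen in training; they carry no evidence.
        tokens = [t for t in tokenize(title) if t in self.vocab]
        log_scores = {}
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)  # log prior
            for t in tokens:
                score += math.log((self.word_counts[c][t] + 1) / (self.totals[c] + v))
            log_scores[c] = score
        # Normalize the log scores, shifting by the max for numerical stability.
        top = max(log_scores.values())
        exps = {c: math.exp(s - top) for c, s in log_scores.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}

    def predict(self, title):
        proba = self.predict_proba(title)
        return max(proba, key=proba.get)


def self_train(labeled, unlabeled, threshold=0.9, max_rounds=5):
    """Pseudo-label confident unlabeled titles, refit, and repeat."""
    titles = [t for t, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    model = TitleNB().fit(titles, labels)
    for _ in range(max_rounds):
        confident, remaining = [], []
        for title in pool:
            proba = model.predict_proba(title)
            best = max(proba, key=proba.get)
            if proba[best] >= threshold:
                confident.append((title, best))
            else:
                remaining.append(title)
        if not confident:
            break  # no new pseudo-labels this round; stop early
        titles += [t for t, _ in confident]
        labels += [y for _, y in confident]
        pool = remaining
        model = TitleNB().fit(titles, labels)
    return model, pool
```

The threshold controls the usual self-training trade-off: a high value adds few but reliable pseudo-labels, while a low value grows the training set faster at the risk of reinforcing early mistakes.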

Related Results

A Cookbook of Her Own
Introduction The recipe is more than just a list of ingredients and the instructions on how to prepare a particular dish. Recipes also are, as Janet Floyd and Laurel Foster argu...
Evaluation of an antimalarial herbal mixture and each extract for DNA and chromosomal mutations in Swiss albino mice and Allium cepa cells
Toxicological evaluation of herbal medicines is necessary because of possible adverse effects that may be associated with their consumption. This study screened antimalarial herbal...
CREATING LEARNING MEDIA IN TEACHING ENGLISH AT SMP MUHAMMADIYAH 2 PAGELARAN ACADEMIC YEAR 2020/2021
The pandemic Covid-19 currently demands teachers to be able to use technology in teaching and learning process. But in reality there are still many teachers who have not been able ...
MDT: semi-supervised medical image segmentation with mixup-decoupling training
Abstract Objective. In the field of medicine, semi-supervised segmentation algorithms hold crucial research significance...
Studies on Preparation of Mango Pickle from Different Genotypes of Akola Maharashtra Region
Mango relishes are extremely popular throughout the Asian continent. In every Indian household, they are the condiment most frequently ingested. It is rich in antioxidants and cont...
Violin miniature in creativity by Liudmila Shukailo: features of the genre interpretation
Background. Rapidness of information flows of contemporary life enforces to concentrate a significant amount of information in small formats. This fact meaningfully increases socia...
Semi-supervised learning: a brief review
Most of the application domain suffers from not having sufficient labeled data whereas unlabeled data is available cheaply. To get labeled instances, it is very difficult because e...
