Cross-modal recipe retrieval has gained prominence due to its ability to retrieve a text representation given an image representation and vice versa. Clustering these recipe representations based on similarity is essential for retrieving relevant information about unknown food images. Existing studies cluster similar recipe representations in the latent space based on class names. Due to inter-class similarity and intra-class variation, associating a recipe with a class name does not provide sufficient knowledge about the recipe to determine similarity. In contrast, the recipe title, ingredients, and cooking actions provide detailed knowledge about a recipe and are better determinants of similar recipes. In this study, we utilize this additional knowledge of recipes, such as the ingredients and recipe title, to identify similar recipes, with particular attention to rare ingredients. To incorporate this knowledge, we propose a knowledge-infused multimodal cooking representation learning network, Ki-Cook, built on the procedural attribute of the cooking process. To the best of our knowledge, this is the first study to adopt a comprehensive recipe similarity determinant to identify and cluster similar recipe representations. The proposed network also incorporates ingredient images to learn a multimodal cooking representation. Since the motivation for clustering similar recipes is to retrieve relevant information for an unknown food image, we evaluate the ingredient retrieval task. Our empirical analysis establishes that the proposed model improves the Coverage of Ground Truth by 12% and the Intersection over Union by 10% compared to the baseline models. On average, the representations learned by our model contain 15.33% more rare ingredients than those of the baseline models. Owing to this difference, our qualitative evaluation shows a 39% improvement in clustering similar recipes in the latent space compared to the baseline models, with an inter-annotator agreement (Fleiss' kappa) of 0.35.
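For illustration, the two retrieval metrics can be read as set overlaps between the retrieved and ground-truth ingredient lists. The following is a minimal sketch under that assumption only; the paper's exact metric definitions are not given in the abstract, and the function names and example data below are hypothetical.

```python
# Minimal sketch of set-overlap metrics for ingredient retrieval.
# Assumption: Coverage of Ground Truth = fraction of ground-truth ingredients
# recovered, and Intersection over Union = Jaccard overlap of the two sets;
# the paper may define these metrics differently.

def coverage_of_ground_truth(retrieved: set[str], ground_truth: set[str]) -> float:
    """Fraction of ground-truth ingredients that appear in the retrieved set."""
    if not ground_truth:
        return 0.0
    return len(retrieved & ground_truth) / len(ground_truth)


def intersection_over_union(retrieved: set[str], ground_truth: set[str]) -> float:
    """Jaccard similarity between the retrieved and ground-truth ingredient sets."""
    union = retrieved | ground_truth
    if not union:
        return 0.0
    return len(retrieved & ground_truth) / len(union)


if __name__ == "__main__":
    # Hypothetical retrieval result for an unknown food image,
    # with "saffron" standing in for a rare ingredient.
    truth = {"flour", "butter", "sugar", "saffron"}
    retrieved = {"flour", "butter", "sugar", "vanilla"}
    print(coverage_of_ground_truth(retrieved, truth))   # 0.75
    print(intersection_over_union(retrieved, truth))    # 0.6
```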
Quantitative Evaluation of AI-Generated Recipes for Health Recommender Systems
The rise of generative Artificial Intelligence (AI) has created the possibility of presenting novel recipes, i.e., recipes that do not exactly match any known recipe, and this has led to the creation of AI-based recipe recommendation systems. AI-based recipe recommendation can accommodate a variety of preferences, including a person's current health conditions (e.g., diabetes), health goals (e.g., weight loss), taste preferences, and cultural or ethical needs (e.g., a vegan diet). However, unlike recipes recommended or created by a human dietitian, recipes created by generative AI do not guarantee accuracy, i.e., the generated recipe may not meet the requirements specified by the user. This work quantitatively evaluates how closely recipes generated by OpenAI's GPT-4 large language model in response to specific prompts match known recipes in a collection of human-curated recipes. The prompts also include requests addressing a health condition, diabetes. The recipes are drawn from the largest online community of home cooks sharing recipes (www.allrecipes.com) and the Mayo Clinic's collection of diabetes meal plan recipes. Recipes from these sources are assumed to be authoritative and are therefore used as ground truth for this evaluation. Quantitative evaluation using NLP techniques, namely Named Entity Recognition (NER) to extract each ingredient from the recipes and cosine similarity metrics, enables scoring the quality of the AI results along a continuum. Our results show that the ingredient list in an AI-generated recipe matches 67-88% of the ingredients in the equivalent recipe in the ground truth database. The corresponding cooking directions match at 64-86%. Ingredients in recipes generated by AI for diabetics match those in known recipes in our ground truth datasets at widely varying levels, between 26-83%. The quantitative evaluation informs the development of a web-based personalized recipe recommendation system for diabetics that uses OpenAI's GPT-4 model for recipe generation.
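For illustration, a minimal sketch of the ingredient-level comparison described above, assuming ingredient phrases have already been extracted (standing in for the NER step) and using a simple bag-of-words cosine similarity; the function name and example recipes are hypothetical, not the paper's actual pipeline.

```python
# Minimal sketch of comparing an AI-generated ingredient list against a
# ground-truth recipe with cosine similarity.
# Assumptions: ingredient phrases were already extracted (e.g., by an NER
# model), and a plain bag-of-words vectorization stands in for whatever
# representation the paper actually uses.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def ingredient_similarity(ai_ingredients: list[str], reference_ingredients: list[str]) -> float:
    """Cosine similarity between two recipes' ingredient lists."""
    docs = [" ".join(ai_ingredients), " ".join(reference_ingredients)]
    vectors = CountVectorizer().fit_transform(docs)
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])


if __name__ == "__main__":
    # Hypothetical diabetes-friendly recipe comparison.
    ai_recipe = ["whole wheat flour", "olive oil", "spinach", "garlic"]
    reference = ["whole wheat flour", "olive oil", "spinach", "onion"]
    print(round(ingredient_similarity(ai_recipe, reference), 2))
```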
- Award ID(s): 2125654
- PAR ID: 10647110
- Publisher / Repository: IEEE
- Date Published:
- Page Range / eLocation ID: 367 to 372
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
People increasingly use the Internet to make food-related choices, prompting research on food recommendation systems. Recently, works that incorporate nutritional constraints into the recommendation process have been proposed to promote healthier recipes. Ingredient substitution is also used, particularly by people motivated to reduce the intake of a specific nutrient or to avoid a particular category of ingredients, for instance due to allergies. This study takes a complementary approach towards empowering people to make healthier food choices by simplifying the process of identifying plausible recipe substitutions. To achieve this goal, this work constructs a large-scale network of similar recipes and analyzes this network to reveal interesting properties that have important implications for the development of food recommendation systems.
-
In the face of climate change, climate literacy is becoming increasingly important. With wide access to generative AI tools, such as OpenAI's ChatGPT, we explore the potential of AI platforms for ordinary citizens asking climate literacy questions. Here, we focus on a global scale and collect responses from ChatGPT (GPT-3.5 and GPT-4) on climate change-related hazard prompts over multiple iterations by utilizing OpenAI's API and comparing the results with credible hazard risk indices. We find a general sense of agreement in the comparisons and consistency in ChatGPT over the iterations. GPT-4 displayed fewer errors than GPT-3.5. Generative AI tools may be used in climate literacy, a timely topic of importance, but must be scrutinized for potential biases and inaccuracies moving forward and considered in a social context. Future work should identify and disseminate best practices for optimal use across various generative AI tools.
-
Generative AI models, specifically large language models (LLMs), have made strides towards the long-standing goal of text-to-code generation. This progress has invited numerous studies of user interaction. However, less is known about the struggles and strategies of non-experts, for whom each step of the text-to-code problem presents challenges: describing their intent in natural language, evaluating the correctness of generated code, and editing prompts when the generated code is incorrect. This paper presents a large-scale controlled study of how 120 beginning coders across three academic institutions approach writing and editing prompts. A novel experimental design allows us to target specific steps in the text-to-code process and reveals that beginners struggle with writing and editing prompts, even for problems at their skill level and when correctness is automatically determined. Our mixed-methods evaluation provides insight into student processes and perceptions, with key implications for non-expert Code LLM use within and outside of education.
-
Mental health stigma manifests differently for different genders, often being more associated with women and overlooked with men. Prior work in NLP has shown that gendered mental health stigmas are captured in large language models (LLMs). However, in the last year, LLMs have changed drastically: newer, generative models not only require different methods for measuring bias, but they also have become widely popular in society, interacting with millions of users and increasing the stakes of perpetuating gendered mental health stereotypes. In this paper, we examine gendered mental health stigma in GPT3.5-Turbo, the model that powers OpenAI's popular ChatGPT. Building off of prior work, we conduct both quantitative and qualitative analyses to measure GPT3.5-Turbo's bias between binary genders, as well as to explore its behavior around non-binary genders, in conversations about mental health. We find that, though GPT3.5-Turbo refrains from explicitly assuming gender, it still contains implicit gender biases when asked to complete sentences about mental health, consistently preferring female names over male names. Additionally, though GPT3.5-Turbo shows awareness of the nuances of non-binary people's experiences, it often over-fixates on non-binary gender identities in free-response prompts. Our preliminary results demonstrate that while modern generative LLMs contain safeguards against blatant gender biases and have progressed in their inclusiveness of non-binary identities, they still implicitly encode gendered mental health stigma, and thus risk perpetuating harmful stereotypes in mental health contexts.