Multi-Modal Augmentation for Large Language Models with Applications to Task-Oriented Dialogues
We introduce MarunaBot V2, an advanced Task-Oriented Dialogue System (TODS) aimed primarily at assisting users with cooking and Do-It-Yourself tasks. We use large language models (LLMs) for data generation and inference, and implement hybrid methods for intent classification, retrieval, and question answering that balance efficiency and performance. A key feature of the system is its multi-modal capability: a multi-modal enrichment technique that uses a fine-tuned CLIP model to supplement recipe instructions with pertinent images, a custom diffusion model for image enhancement and generation, and a method for multi-modal option matching. A further distinguishing aspect is the system's user-centric development approach, facilitated by a custom tool for tracking user interactions and swiftly integrating feedback. For a demonstration of the system, visit https://youtu.be/4MNI-puv_eE.
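The CLIP-based enrichment step can be pictured as text-to-image matching over candidate images. The following is a minimal sketch only, not the system's actual code: it uses the public openai/clip-vit-base-patch32 checkpoint rather than the fine-tuned model, and the helper name and similarity threshold are assumptions.

```python
# Minimal sketch (assumptions, not the system's code): rank candidate images for a
# recipe step with an off-the-shelf CLIP checkpoint and keep the best match only
# if it clears a hypothetical similarity threshold.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def best_image_for_step(step_text, image_paths, threshold=0.25):
    """Return the path of the image whose CLIP embedding is closest to the step
    text, or None if no candidate clears the (illustrative) threshold."""
    images = [Image.open(p) for p in image_paths]
    inputs = processor(text=[step_text], images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    # text_embeds and image_embeds are L2-normalized, so a dot product is cosine similarity
    sims = (out.image_embeds @ out.text_embeds.T).squeeze(-1)   # (n_images,)
    best = int(sims.argmax())
    return image_paths[best] if float(sims[best]) >= threshold else None
```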
- Award ID(s): 2143434
- PAR ID: 10610101
- Publisher / Repository: 2nd Proceedings of Alexa Prize TaskBot (Alexa Prize 2023)
- Date Published:
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Recent advances in Large Language Models (LLMs) have demonstrated significant potential in the field of Recommendation Systems (RSs). Most existing studies have focused on converting user behavior logs into textual prompts and leveraging techniques such as prompt tuning to enable LLMs for recommendation tasks. Meanwhile, research interest has recently grown in multimodal recommendation systems that integrate data from images, text, and other sources using modality fusion techniques. This introduces new challenges to the existing LLM-based recommendation paradigm, which relies solely on text-modality information. Moreover, although Multimodal Large Language Models (MLLMs) capable of processing multi-modal inputs have emerged, how to equip MLLMs with multi-modal recommendation capabilities remains largely unexplored. To this end, we propose the Multimodal Large Language Model-enhanced Sequential Multimodal Recommendation (MLLM-MSR) model. To capture dynamic user preferences, we design a two-stage user preference summarization method. Specifically, we first utilize an MLLM-based item summarizer to extract image features from a given item and convert the image into text. Then, we employ a recurrent user preference summarization paradigm to capture the dynamic changes in user preferences with an LLM-based user summarizer. Finally, to enable the MLLM to perform the multi-modal recommendation task, we fine-tune an MLLM-based recommender using Supervised Fine-Tuning (SFT) techniques. Extensive evaluations across various datasets validate the effectiveness of MLLM-MSR, showcasing its superior ability to capture and adapt to the evolving dynamics of user preferences.
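To make the two-stage summarization described in that entry concrete, here is a minimal sketch under stated assumptions: the MLLM and LLM are passed in as plain callables, and the prompts, field names, and function names are illustrative rather than taken from the paper.

```python
# Hedged sketch of two-stage preference summarization: an MLLM turns each item
# image into text, then an LLM recurrently folds each interaction into a running
# natural-language preference summary. All prompts and names are illustrative.
from typing import Callable

def describe_item_image(mllm: Callable[[str, str], str], image_path: str, title: str) -> str:
    """Stage 1 (item summarizer): ask an MLLM to describe the item image as text."""
    prompt = f"Describe the product '{title}' shown in the image in one sentence."
    return mllm(prompt, image_path)

def summarize_user_preferences(mllm: Callable[[str, str], str],
                               llm: Callable[[str], str],
                               interactions: list) -> str:
    """Stage 2 (user summarizer): update the summary once per interaction,
    processed in chronological order."""
    summary = "No preferences known yet."
    for item in interactions:
        item_text = describe_item_image(mllm, item["image"], item["title"])
        prompt = (f"Current preference summary: {summary}\n"
                  f"The user just interacted with: {item_text}\n"
                  "Update the preference summary in two sentences.")
        summary = llm(prompt)
    return summary
```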
Transcripts of natural, multi-person meetings differ significantly from documents like news articles, which can lead Natural Language Generation models to produce unfocused summaries. We develop an abstractive meeting summarizer that uses both the video and audio of meeting recordings. Specifically, we propose a multi-modal hierarchical attention mechanism across three levels: topic segment, utterance, and word. To narrow the focus down to topically relevant segments, we jointly model topic segmentation and summarization. In addition to traditional textual features, we introduce new multi-modal features derived from visual focus of attention, based on the assumption that an utterance is more important if its speaker receives more attention. Experiments show that our model significantly outperforms the state of the art on both BLEU and ROUGE measures.
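A rough PyTorch sketch of what hierarchical attention pooling of that kind can look like, covering the word-to-utterance and utterance-to-segment levels and folding the visual focus-of-attention signal in as a per-utterance score bias; the module names and additive-attention form are assumptions, not the authors' implementation.

```python
# Illustrative sketch: attention pooling at two of the three levels described above.
import torch
import torch.nn as nn

class AttnPool(nn.Module):
    """Additive attention pooling over a sequence of vectors."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x, bias=None):                  # x: (seq, dim)
        scores = self.score(x).squeeze(-1)            # (seq,)
        if bias is not None:                          # e.g. visual focus of attention
            scores = scores + bias
        weights = torch.softmax(scores, dim=0)        # attention weights over the sequence
        return weights @ x                            # pooled vector: (dim,)

class SegmentEncoder(nn.Module):
    """Pools word vectors into utterance vectors, then utterances into a segment vector."""
    def __init__(self, dim):
        super().__init__()
        self.word_pool = AttnPool(dim)
        self.utt_pool = AttnPool(dim)

    def forward(self, word_embs_per_utt, focus):
        # word_embs_per_utt: one (n_words, dim) tensor per utterance
        # focus: (n_utts,) score for how much visual attention each speaker received
        utt_vecs = torch.stack([self.word_pool(w) for w in word_embs_per_utt])
        return self.utt_pool(utt_vecs, bias=focus)
```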
While node semantics have been extensively explored in social networks, little research attention has been paid to profiling edge semantics, i.e., social relations. Ideal edge semantics should not only show that two users are connected, but also why they know each other and what they share in common. However, relations in social networks are often hard to profile, due to noisy multi-modal signals and limited user-generated ground-truth labels. In this work, we aim to develop a unified and principled framework that can profile user relations as edge semantics in social networks by integrating multi-modal signals in the presence of noisy and incomplete data. Our framework also remains flexible under limited or missing supervision. Specifically, we assume a latent distribution of multiple relations underlying each user link, and learn them with multi-modal graph edge variational autoencoders. We encode the network data with a graph convolutional network, and decode arbitrary signals with multiple reconstruction networks. Extensive experiments and case studies on two public DBLP author networks and two internal LinkedIn member networks demonstrate the superior effectiveness and efficiency of our proposed model.
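A hedged sketch of the edge-level variational autoencoder idea: node features are mixed with a simple normalized-adjacency layer standing in for a GCN, each edge receives a latent relation vector via the reparameterization trick, and one decoder head reconstructs each modality's signal. Layer shapes and the linear decoders are assumptions.

```python
# Illustrative sketch of a multi-modal graph edge variational autoencoder.
import torch
import torch.nn as nn

class EdgeVAE(nn.Module):
    def __init__(self, in_dim, hid_dim, z_dim, modality_dims):
        super().__init__()
        self.gcn = nn.Linear(in_dim, hid_dim)       # applied after neighbor averaging
        self.to_mu = nn.Linear(2 * hid_dim, z_dim)
        self.to_logvar = nn.Linear(2 * hid_dim, z_dim)
        self.decoders = nn.ModuleList([nn.Linear(z_dim, d) for d in modality_dims])

    def forward(self, x, adj, edges):
        # x: (n, in_dim) node features; adj: (n, n) row-normalized adjacency;
        # edges: (m, 2) long tensor of endpoint indices
        h = torch.relu(self.gcn(adj @ x))                         # GCN-style node embeddings
        pair = torch.cat([h[edges[:, 0]], h[edges[:, 1]]], dim=-1)
        mu, logvar = self.to_mu(pair), self.to_logvar(pair)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        recons = [dec(z) for dec in self.decoders]                # one reconstruction per modality
        return recons, mu, logvar
```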
Score-based generative models such as the diffusion model have proven effective at modeling multi-modal data, from image generation to reinforcement learning (RL). However, the inference process of a diffusion model can be slow, which hinders its use in RL with iterative sampling. We propose to apply the consistency model as an efficient yet expressive policy representation, namely the consistency policy, with an actor-critic-style algorithm for three typical RL settings: offline, offline-to-online, and online. For offline RL, we demonstrate the expressiveness of generative models as policies learned from multi-modal data. For offline-to-online RL, the consistency policy is shown to be more computationally efficient than the diffusion policy, with comparable performance. For online RL, the consistency policy demonstrates significant speedup and even higher average performance than the diffusion policy.
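A minimal sketch of the inference-time difference that motivates that work, with the denoiser and consistency function supplied as callables; the step count and noise scale are illustrative assumptions, not values from the paper.

```python
# Illustrative contrast: a diffusion policy denoises an action over many steps,
# while a consistency policy maps noise to an action in a single call.
import torch

@torch.no_grad()
def diffusion_policy_act(denoiser, state, action_dim, n_steps=50):
    """Iterative sampling: `denoiser(a, state, t)` returns a slightly cleaner action."""
    a = torch.randn(action_dim)
    for t in reversed(range(n_steps)):      # many network calls per action
        a = denoiser(a, state, t)
    return a

@torch.no_grad()
def consistency_policy_act(consistency_fn, state, action_dim, sigma_max=80.0):
    """Single-step sampling: one call maps the noised input directly to a clean action."""
    a = torch.randn(action_dim) * sigma_max
    return consistency_fn(a, state, sigma_max)
```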