This content will become publicly available on September 1, 2026

Title: Vision-Language Models for Design Concept Generation: An Actor–Critic Framework
We introduce a novel actor–critic framework that utilizes vision-language models (VLMs) and large language models (LLMs) for design concept generation, particularly for producing a diverse array of innovative solutions to a given design problem. By leveraging the extensive data repositories and pattern-recognition capabilities of these models, our framework achieves this goal by enabling iterative interactions between two agents: an actor (i.e., concept generator) and a critic. The actor, a custom VLM (e.g., GPT-4) created using few-shot learning and fine-tuning techniques, generates initial design concepts that are improved iteratively based on guided feedback from the critic, which is either a prompt-engineered LLM or a set of design-specific quantitative metrics. This process aims to optimize the generated concepts with respect to four metrics: novelty, feasibility, problem–solution relevancy, and variety. The framework incorporates both long-term and short-term memory models to examine how the history of interactions affects decision-making and concept-generation outcomes. We explored the efficacy of incorporating images alongside text in conveying design ideas within our actor–critic framework by experimenting with two mediums for the agents: vision-language and language-only. We extensively evaluated the framework through a case study using the AskNature dataset, comparing its performance against benchmarks such as GPT-4 and real-world biomimetic designs across various industrial examples. Our findings underscore the framework's capability to iteratively refine and enhance the initial design concepts, achieving significant improvements across all metrics. We conclude by discussing the implications of the proposed framework for various design domains, along with its limitations and several directions for future research.
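A minimal sketch, for illustration only, of the iterative actor–critic refinement loop the abstract describes. The Concept container, the generate_concept, critique, and score callables, and the history-window handling below are assumptions standing in for the paper's VLM actor, LLM/metric critic, and four design metrics; they are not the authors' implementation.

```python
# Hedged sketch of an actor-critic concept-refinement loop (assumed interfaces).
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Concept:
    text: str
    image: bytes | None = None                 # optional sketch/render in the vision-language setting
    scores: Dict[str, float] = field(default_factory=dict)


def refine_concept(
    problem: str,
    generate_concept: Callable[[str, List[str]], Concept],  # actor: problem + remembered feedback -> concept
    critique: Callable[[str, Concept], str],                 # critic: problem + concept -> textual feedback
    score: Callable[[Concept], Dict[str, float]],            # novelty, feasibility, relevancy, variety
    max_rounds: int = 5,
    history_window: int | None = None,                       # None ~ long-term memory; k ~ short-term memory
) -> Concept:
    feedback_history: List[str] = []
    best: Concept | None = None
    for _ in range(max_rounds):
        memory = feedback_history if history_window is None else feedback_history[-history_window:]
        concept = generate_concept(problem, memory)          # actor conditions on remembered critiques
        concept.scores = score(concept)
        if best is None or sum(concept.scores.values()) > sum(best.scores.values()):
            best = concept
        feedback_history.append(critique(problem, concept))  # guided feedback for the next round
    return best
```

Setting history_window=1, for instance, would mimic a short-term memory model in which the actor sees only the most recent critique.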
Award ID(s):
2050052
PAR ID:
10590692
Author(s) / Creator(s):
;
Publisher / Repository:
ASME Journal of Mechanical Design
Date Published:
Journal Name:
Journal of Mechanical Design
Volume:
147
Issue:
9
ISSN:
1050-0472
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1.
    In this article, we propose a novel semicentralized deep deterministic policy gradient (SCDDPG) algorithm for cooperative multiagent games. Specifically, we design a two-level actor–critic structure to help the agents interact and cooperate in StarCraft combat. A local actor–critic structure is established for each type of agent using the partially observable information it receives from the environment. A global actor–critic structure is then built to provide the local level with an overall view of the combat based on limited centralized information, such as health values. These two structures work together to generate the optimal control action for each agent and to achieve better cooperation in the games. Compared with fully centralized methods, this design reduces the communication burden by sending only limited information to the global level during the learning process. Furthermore, reward functions are designed for both the local and global structures based on the agents' attributes to further improve learning performance in the stochastic environment. The developed method has been demonstrated on several scenarios in the real-time strategy game StarCraft. The simulation results show that the agents can effectively cooperate with their teammates and defeat the enemies in various StarCraft scenarios.
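A minimal sketch of the semicentralized two-level structure described above: a local actor–critic per agent type operating on partial observations, plus a global critic that scores limited centralized information such as health values. Network sizes and the module boundaries are illustrative assumptions, not the paper's code.

```python
# Hedged sketch of a two-level (local + global) actor-critic structure (assumed architecture).
import torch
import torch.nn as nn


class LocalActor(nn.Module):
    """Maps a partial observation to a continuous control action for one agent type."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, act_dim), nn.Tanh())

    def forward(self, obs):
        return self.net(obs)


class LocalCritic(nn.Module):
    """Local Q-value from this agent type's observation and action."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1))


class GlobalCritic(nn.Module):
    """Scores joint behaviour from limited centralized state (e.g., team health values)."""
    def __init__(self, global_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(global_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, global_state):
        return self.net(global_state)
```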
  2. Reward engineering has long been a challenge in Reinforcement Learning (RL) research, as it often requires extensive human effort and iterative trial-and-error to design effective reward functions. In this paper, we propose RL-VLM-F, a method that automatically generates reward functions for agents to learn new tasks, using only a text description of the task goal and the agent's visual observations, by leveraging feedback from vision language foundation models (VLMs). The key to our approach is to query these models for preferences over pairs of the agent's image observations based on the text description of the task goal, and then learn a reward function from the preference labels, rather than directly prompting the models to output a raw reward score, which can be noisy and inconsistent. We demonstrate that RL-VLM-F successfully produces effective rewards and policies across various domains, including classic control as well as manipulation of rigid, articulated, and deformable objects, without the need for human supervision, outperforming prior methods that use large pretrained models for reward generation under the same assumptions.
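A minimal sketch of the preference-based reward-learning step described above: given preference labels over pairs of image observations (in RL-VLM-F these come from querying a VLM with the task description), fit a reward model with a Bradley–Terry style loss. The feature-based reward network and toy data below are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch: learning a reward model from pairwise preference labels (assumed setup).
import torch
import torch.nn as nn
import torch.nn.functional as F


class RewardModel(nn.Module):
    """Maps (pre-extracted) image features to a scalar reward."""
    def __init__(self, feat_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)


def preference_loss(reward_model, obs_a, obs_b, prefer_a):
    """Bradley-Terry loss: P(a preferred over b) = sigmoid(r(a) - r(b)).
    prefer_a is 1.0 where the VLM preferred observation a, 0.0 where it preferred b."""
    logits = reward_model(obs_a) - reward_model(obs_b)
    return F.binary_cross_entropy_with_logits(logits, prefer_a)


# Toy usage with random features; real labels would come from the VLM's pairwise answers.
model = RewardModel(feat_dim=32)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
obs_a, obs_b = torch.randn(16, 32), torch.randn(16, 32)
prefer_a = torch.randint(0, 2, (16,)).float()
loss = preference_loss(model, obs_a, obs_b, prefer_a)
opt.zero_grad(); loss.backward(); opt.step()
```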
  3. The increasing complexity of integrated circuit design requires customizing Power, Performance, and Area (PPA) metrics according to different application demands. However, most engineers cannot anticipate these requirements early in the design process and often discover mismatches only after synthesis, necessitating iterative optimization or redesign. Several works have shown the promising capabilities of large language models (LLMs) in hardware design generation tasks, but they fail to tackle the PPA trade-off problem. In this work, we propose an LLM-based reinforcement learning framework, PPA-RTL, which aims to establish LLMs as a cutting-edge automation tool by directly incorporating post-synthesis PPA metrics into the hardware design generation phase. We use the PPA metrics as reward feedback to guide the model toward designs aligned with specific optimization objectives across various scenarios. The experimental results demonstrate that PPA-RTL models optimized for power, performance, area, or their various combinations achieve significantly better trade-offs, making PPA-RTL applicable to a variety of application scenarios and project constraints.
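A minimal sketch of a PPA-based reward signal of the kind described above: post-synthesis power, delay (performance), and area are compared against targets and combined with scenario-specific weights. The normalization and weighting scheme is an illustrative assumption, not PPA-RTL's exact formulation.

```python
# Hedged sketch of a weighted post-synthesis PPA reward (assumed normalization scheme).
from dataclasses import dataclass


@dataclass
class PPAResult:
    power_mw: float      # post-synthesis power
    delay_ns: float      # critical-path delay (performance)
    area_um2: float      # cell area


def ppa_reward(result: PPAResult, target: PPAResult,
               w_power: float = 1.0, w_perf: float = 1.0, w_area: float = 1.0) -> float:
    """Higher is better: each term equals its weight when the metric meets the target
    and shrinks as the synthesized design exceeds the target."""
    return (w_power * (target.power_mw / result.power_mw)
            + w_perf * (target.delay_ns / result.delay_ns)
            + w_area * (target.area_um2 / result.area_um2))


# Example: a power-oriented scenario weights the power term more heavily.
measured = PPAResult(power_mw=12.0, delay_ns=2.1, area_um2=5400.0)
target = PPAResult(power_mw=10.0, delay_ns=2.0, area_um2=5000.0)
print(ppa_reward(measured, target, w_power=2.0))
```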
  4. Faggioli, G; Ferro, N; Galuščáková, P; de, A (Ed.)
    This working note documents the participation of CS_Morgan in the ImageCLEFmedical 2024 Caption subtasks, focusing on the Caption Prediction and Concept Detection challenges. The primary objectives included training, validating, and testing multimodal Artificial Intelligence (AI) models intended to automate the generation of captions and the identification of multiple concepts in radiology images. The dataset used is a subset of the Radiology Objects in COntext version 2 (ROCOv2) dataset and contains image-caption pairs and corresponding Unified Medical Language System (UMLS) concepts. To address the caption prediction challenge, different variants of the Large Language and Vision Assistant (LLaVA) models were experimented with and tailored to the medical domain. Additionally, a lightweight Large Multimodal Model (LMM), namely the instruct variant of the Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS (IDEFICS) 9B obtained through quantization, and MoonDream2, a small Vision Language Model (VLM), were explored. Besides LMMs, conventional encoder-decoder models such as Vision Generative Pre-trained Transformer 2 (visionGPT2) and Convolutional Neural Network-Transformer (CNN-Transformer) architectures were considered. This resulted in 10 submissions for the caption prediction task, with the first submission, LLaVA 1.6 on the Mistral 7B weights, securing 2nd position among the participants. This model was adapted using 40.1M parameters and achieved the best performance on the test data across the metrics BERTScore (0.628059), ROUGE (0.250801), BLEU-1 (0.209298), BLEURT (0.317385), METEOR (0.092682), CIDEr (0.245029), and RefCLIPScore (0.815534). For the concept detection task, our single submission, based on the ConvMixer architecture (a hybrid approach leveraging the advantages of CNNs and Transformers), ranked 9th with an F1-score of 0.107645. Overall, the evaluations on the test data for the caption prediction task suggest that LMMs, quantized LMMs, and small VLMs, when adapted and selectively fine-tuned using fewer parameters, have ample potential for understanding the medical concepts present in images.
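A minimal sketch of the kind of parameter-efficient adaptation described above (the report adapts LLaVA 1.6 with roughly 40.1M trainable parameters). The tiny gpt2 checkpoint, LoRA rank, and target modules below are illustrative assumptions so the sketch runs on a laptop; they are not the team's actual configuration or models.

```python
# Hedged sketch of LoRA-style parameter-efficient fine-tuning (assumed settings).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# gpt2 is only a small stand-in so the sketch runs; the report adapted much larger
# multimodal checkpoints (LLaVA 1.6 on Mistral 7B, a quantized IDEFICS 9B, etc.).
base = AutoModelForCausalLM.from_pretrained("gpt2")
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, target_modules=["c_attn"])
model = get_peft_model(base, lora)
model.print_trainable_parameters()   # only the small adapter fraction is trainable
```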
  5. Medical vision-language models (VLMs) combine computer vision (CV) and natural language processing (NLP) to analyze visual and textual medical data. Our paper reviews recent advancements in developing VLMs specialized for healthcare, focusing on publicly available models designed for medical report generation and visual question answering (VQA). We provide background on NLP and CV, explaining how techniques from both fields are integrated into VLMs, with visual and language data often fused using Transformer-based architectures to enable effective learning from multimodal data. Key areas we address include an exploration of 18 public medical vision-language datasets, in-depth analyses of the architectures and pre-training strategies of 16 recent noteworthy medical VLMs, and a comprehensive discussion of evaluation metrics for assessing VLMs' performance in medical report generation and VQA. We also highlight current challenges facing medical VLM development, including limited data availability, data privacy concerns, and the lack of proper evaluation metrics, among others, while proposing future directions to address these obstacles. Overall, our review summarizes recent progress in developing VLMs to harness multimodal medical data for improved healthcare applications.
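A minimal sketch of the Transformer-based fusion pattern the review mentions, in which language tokens attend over visual features via cross-attention. The dimensions and single-block structure are illustrative assumptions, not any specific surveyed model.

```python
# Hedged sketch of cross-attention fusion of text and image tokens (assumed shapes).
import torch
import torch.nn as nn


class CrossAttentionFusion(nn.Module):
    """Text tokens query image tokens; residual connection plus layer norm."""
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, text_tokens, image_tokens):
        fused, _ = self.attn(text_tokens, image_tokens, image_tokens)
        return self.norm(text_tokens + fused)


# Toy usage: batch of 2, 16 text tokens and 49 image patches, embedding dim 256.
fusion = CrossAttentionFusion()
out = fusion(torch.randn(2, 16, 256), torch.randn(2, 49, 256))
print(out.shape)   # torch.Size([2, 16, 256])
```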