
Title: TransformCAD: Multimodal Transformer for Computer-Aided Design Generation
The creation of manufacturable and modifiable 3D shapes using Computer-Aided Design (CAD) remains a predominantly manual and time-consuming process, hindered by the complexity of boundary representations of 3D solids and the lack of intuitive design tools. This paper introduces TransformCAD, a CAD generation model that accepts both images and natural language descriptions as input to generate CAD sequences, producing editable 3D representations relevant to engineering design. TransformCAD incorporates a fine-tuned Contrastive Language-Image Pre-Training (CLIP) model to process multimodal input and employs two prediction branches, sketch and extrude, to improve the parsing rate of CAD generation. Extensive evaluations demonstrate that TransformCAD outperforms existing models in terms of parsing rate, Chamfer distance, minimum matching distance, and Jensen-Shannon divergence. Furthermore, by analyzing the impact of training data, we show that TransformCAD exhibits strong potential for accurately generating long-sequence CAD models, which correspond to higher-complexity designs. Moreover, real-world 3D object images taken with a smartphone are used to validate TransformCAD's practicality, demonstrating its effectiveness in industrial applications. To the best of our knowledge, this is the first attempt to generate 3D CAD models from both image and natural language input. TransformCAD expands the boundaries of automated CAD modeling, enabling a more flexible and intuitive design process that bridges visual perception and structured command-based representations.
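The abstract lists Chamfer distance among its evaluation metrics. As a generic illustration of how this metric is commonly computed for shape comparison (a minimal sketch, not the paper's actual evaluation code), the symmetric Chamfer distance between point clouds sampled from a generated shape and a reference shape can be written as:

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    For each point, find the squared distance to its nearest neighbour in the
    other set, then sum the two directional averages.
    """
    # Pairwise squared distances, shape (N, M)
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # Mean nearest-neighbour distance in both directions
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())
```

In practice the point clouds would be sampled from the surfaces of the parsed CAD models; identical shapes yield a distance of zero, and larger values indicate greater geometric deviation.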
Award ID(s):
2207408
PAR ID:
10661926
Author(s) / Creator(s):
Publisher / Repository:
American Society of Mechanical Engineers
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Despite the power of large language models (LLMs) in various cross-modal generation tasks, their ability to generate 3D computer-aided design (CAD) models from text remains underexplored due to the scarcity of suitable datasets. Additionally, there is a lack of multimodal CAD datasets that include both reconstruction parameters and text descriptions, which are essential for the quantitative evaluation of the CAD generation capabilities of multimodal LLMs. To address these challenges, we developed a dataset of CAD models, sketches, and image data for representative mechanical components such as gears, shafts, and springs, along with natural language descriptions collected via Amazon Mechanical Turk. By using CAD programs as a bridge, we facilitate the conversion of textual output from LLMs into precise 3D CAD designs. To enhance the text-to-CAD generation capabilities of GPT models and demonstrate the utility of our dataset, we developed a pipeline to generate fine-tuning training data for GPT-3.5. We fine-tuned four GPT-3.5 models with various data sampling strategies based on the length of a CAD program. We evaluated these models using parsing rate and intersection over union (IoU) metrics, comparing their performance to that of GPT-4 without fine-tuning. The new knowledge gained from the comparative study on the four different fine-tuned models provided us with guidance on the selection of sampling strategies to build training datasets in fine-tuning practices of LLMs for text-to-CAD generation, considering the trade-off between part complexity, model performance, and cost. 
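This abstract evaluates models by parsing rate, i.e., the fraction of generated CAD programs that can be executed without error. A minimal, generic sketch of that metric follows; the `executor` callable is a placeholder for whatever CAD interpreter the pipeline uses, not the authors' actual toolchain.

```python
def parsing_rate(programs, executor) -> float:
    """Fraction of generated CAD programs that `executor` runs without raising.

    `programs` is a list of program source strings; `executor` is any callable
    that raises an exception when a program fails to parse or execute.
    """
    if not programs:
        return 0.0
    ok = 0
    for src in programs:
        try:
            executor(src)
            ok += 1
        except Exception:
            pass  # syntactically or semantically invalid program
    return ok / len(programs)
```

For example, with Python's built-in `compile` standing in as the executor, a batch containing one valid and one malformed program scores 0.5.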
  2. The evolution of multimodal large language models (LLMs) capable of processing diverse input modalities (e.g., text and images) holds new prospects for their application in engineering design, such as the generation of 3D computer-aided design (CAD) models. However, little is known about the ability of multimodal LLMs to generate 3D design objects, and there is a lack of quantitative assessment. In this study, we develop an approach to enable LLMs to generate 3D CAD models (i.e., LLM4CAD) and perform experiments to evaluate their efficacy, where GPT-4 and GPT-4V were employed as examples. To address the challenge of data scarcity for multimodal LLM studies, we created a data synthesis pipeline to generate CAD models, sketches, and image data of typical mechanical components (e.g., gears and springs) and collect their natural language descriptions with dimensional information using Amazon Mechanical Turk. We positioned the CAD program (a programming script for CAD design) as a bridge, facilitating the conversion of LLMs' textual output into tangible CAD design objects. We focus on two critical capabilities: the generation of syntactically correct CAD programs (Cap1) and the accuracy of the parsed 3D shapes (Cap2), quantified by intersection over union. The results show that both GPT-4 and GPT-4V demonstrate great potential in 3D CAD generation by leveraging only their zero-shot learning ability. Specifically, on average, GPT-4V performs better when processing only text-based input, exceeding the results obtained using multimodal inputs, such as text with image, for Cap1 and Cap2. However, when examining category-specific results for mechanical components, the advantage of multimodal inputs becomes increasingly evident for more complex geometries (e.g., springs and gears) in both Cap1 and Cap2. The potential of multimodal LLMs to improve 3D CAD generation is clear, but their application must be carefully calibrated to the complexity of the target CAD models to be generated.
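The shape-accuracy capability (Cap2) above is quantified by intersection over union. A common way to compute IoU for 3D shapes is to voxelize both the generated and the reference solid and compare the occupancy grids; the sketch below assumes that voxelization step has already happened and is not the authors' implementation.

```python
import numpy as np

def voxel_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean voxel grids of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    # Two empty grids are trivially identical
    return float(inter / union) if union else 1.0
```

An IoU of 1.0 means the parsed shape exactly fills the same voxels as the ground truth, while 0.0 means the two solids do not overlap at all.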
  3. The evolution of multimodal large language models (LLMs) capable of processing diverse input modalities (e.g., text and images) holds new prospects for their application in engineering design, such as the generation of 3D computer-aided design (CAD) models. However, little is known about the ability of multimodal LLMs to generate 3D design objects, and there is a lack of quantitative assessment. In this study, we develop an approach to enable two LLMs, GPT-4 and GPT-4V, to generate 3D CAD models (i.e., LLM4CAD) and perform experiments to evaluate their efficacy. To address the challenge of data scarcity for multimodal LLM studies, we created a data synthesis pipeline to generate CAD models, sketches, and image data of typical mechanical components (e.g., gears and springs) and collect their natural-language descriptions with dimensional information using Amazon Mechanical Turk. We positioned the CAD program (a programming script for CAD design) as a bridge, facilitating the conversion of LLMs' textual output into tangible CAD design objects. We focus on two critical capabilities: the generation of syntactically correct CAD programs (Cap1) and the accuracy of the parsed 3D shapes (Cap2), quantified by intersection over union. The results show that both GPT-4 and GPT-4V demonstrate potential in 3D CAD generation. Specifically, on average, GPT-4V performs better when processing only text-based input, exceeding the results obtained using multimodal inputs, such as text with image, for Cap1 and Cap2. However, when examining category-specific results for mechanical components, while the same trend still holds for Cap2, the advantage of multimodal inputs becomes increasingly evident for more complex geometries (e.g., springs and gears) in Cap1. The potential of multimodal LLMs in enhancing 3D CAD generation is clear, but their application must be carefully calibrated to the complexity of the target CAD models to be generated.
  4. Deep generative models of 3D shapes have received a great deal of research interest. Yet, almost all of them generate discrete shape representations, such as voxels, point clouds, and polygon meshes. We present the first 3D generative model for a drastically different shape representation--describing a shape as a sequence of computer-aided design (CAD) operations. Unlike meshes and point clouds, CAD models encode the user creation process of 3D shapes, widely used in numerous industrial and engineering design tasks. However, the sequential and irregular structure of CAD operations poses significant challenges for existing 3D generative models. Drawing an analogy between CAD operations and natural language, we propose a CAD generative network based on the Transformer. We demonstrate the performance of our model for both shape autoencoding and random shape generation. To train our network, we create a new CAD dataset consisting of 178,238 models and their CAD construction sequences. We have made this dataset publicly available to promote future research on this topic. 
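This abstract's key idea is representing a shape not as a mesh or point cloud but as a sequence of CAD operations, treated analogously to words in a sentence. A toy sketch of that representation is below; the operation set and token strings (`LINE`, `EXT`) are illustrative stand-ins, not the vocabulary of the dataset described in the abstract.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Line:
    """A sketch curve: draw a line to endpoint (x, y) in the sketch plane."""
    x: float
    y: float

@dataclass
class Extrude:
    """Extrude the preceding sketch profile by the given depth."""
    distance: float

CADOp = Union[Line, Extrude]

def serialize(seq: List[CADOp]) -> List[str]:
    """Flatten a CAD construction sequence into tokens a Transformer can model."""
    tokens = []
    for op in seq:
        if isinstance(op, Line):
            tokens.append(f"LINE {op.x} {op.y}")
        else:
            tokens.append(f"EXT {op.distance}")
    return tokens
```

The point of this encoding is that, unlike a mesh, the token sequence preserves the user's construction history, so a generated model remains editable with standard CAD tools.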