CAD-Tokenizer: Towards Text-based CAD Prototyping via Modality-Specific Tokenization
Abstract
CAD-Tokenizer, a multimodal tokenization framework using VQ-VAE, enhances text-guided CAD prototyping by improving instruction following and generation quality.
Computer-Aided Design (CAD) is a foundational component of industrial prototyping, where models are defined not by raw coordinates but by construction sequences such as sketches and extrusions. This sequential structure enables both efficient prototype initialization and subsequent editing. Text-guided CAD prototyping, which unifies Text-to-CAD generation and CAD editing, has the potential to streamline the entire design pipeline. However, prior work has not explored this setting, largely because standard large language model (LLM) tokenizers decompose CAD sequences into natural-language word pieces, failing to capture primitive-level CAD semantics and hindering attention modules from modeling geometric structure. We conjecture that a multimodal tokenization strategy, aligned with CAD's primitive and structural nature, can provide more effective representations. To this end, we propose CAD-Tokenizer, a framework that represents CAD data with modality-specific tokens using a sequence-based VQ-VAE with primitive-level pooling and constrained decoding. This design produces compact, primitive-aware representations that align with CAD's structural nature. Applied to unified text-guided CAD prototyping, CAD-Tokenizer significantly improves instruction following and generation quality, achieving better quantitative and qualitative performance over both general-purpose LLMs and task-specific baselines.
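To make the idea of primitive-level pooling followed by vector quantization concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the function names `pool_by_primitive` and `quantize`, the use of mean pooling, the toy feature values, and the three-entry codebook are all assumptions made for illustration. The abstract does not specify the pooling operator, the encoder, or the codebook size, and constrained decoding is not covered here.

```python
import numpy as np


def pool_by_primitive(features, primitive_ids):
    """Mean-pool per-token features into one vector per primitive.

    Assumption: primitive_ids are integers 0..P-1 and every id occurs
    at least once (otherwise the division below would hit a zero count).
    """
    features = np.asarray(features, dtype=float)
    primitive_ids = np.asarray(primitive_ids)
    n_primitives = int(primitive_ids.max()) + 1
    sums = np.zeros((n_primitives, features.shape[1]))
    # Unbuffered scatter-add: rows sharing a primitive id are accumulated.
    np.add.at(sums, primitive_ids, features)
    counts = np.bincount(primitive_ids, minlength=n_primitives)
    return sums / counts[:, None]


def quantize(vectors, codebook):
    """Return the index of the nearest codebook entry for each vector."""
    vectors = np.asarray(vectors, dtype=float)
    codebook = np.asarray(codebook, dtype=float)
    # Squared Euclidean distances, shape (num_vectors, num_codes).
    dists = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


# Hypothetical toy input: four token features belonging to two primitives
# (for example, one line and one arc inside a sketch).
token_features = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 12.0]]
primitive_ids = [0, 0, 1, 1]
codebook = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]

pooled = pool_by_primitive(token_features, primitive_ids)
codes = quantize(pooled, codebook)
print(pooled.tolist())  # [[1.0, 0.0], [10.0, 11.0]]
print(codes.tolist())   # [1, 2]
```

In this toy setup, four input tokens collapse to two discrete codes, one per primitive, which is the kind of compact, primitive-aware representation the abstract describes. A trained VQ-VAE would learn both the encoder features and the codebook rather than take them as fixed inputs.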
Community
TLDR: We introduce a new CAD-specific tokenizer pretrained with a VQ-VAE, improving both the compression rate and LLM performance on a unified task that covers CAD editing and Text-to-CAD generation.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- CAD-Judge: Toward Efficient Morphological Grading and Verification for Text-to-CAD Generation (2025)
- B-repLer: Semantic B-rep Latent Editor using Large Language Models (2025)
- UniUGG: Unified 3D Understanding and Generation via Geometric-Semantic Encoding (2025)
- Drawing2CAD: Sequence-to-Sequence Learning for CAD Generation from Vector Drawings (2025)
- Training LLMs to be Better Text Embedders through Bidirectional Reconstruction (2025)
- Text4Seg++: Advancing Image Segmentation via Generative Language Modeling (2025)
- PartSAM: A Scalable Promptable Part Segmentation Model Trained on Native 3D Data (2025)