DemoDiff: Graph Diffusion Transformers are In-Context Molecular Designers

This repository contains DemoDiff, a diffusion-based molecular foundation model for in-context inverse molecular design, introduced in the paper Graph Diffusion Transformers are In-Context Molecular Designers.

DemoDiff leverages graph diffusion transformers to generate molecules from contextual examples, enabling few-shot molecular design across diverse chemical tasks without task-specific fine-tuning. It introduces demonstration-conditioned diffusion models, which define a task context with a small set of molecule-score examples, rather than a text description, to guide a denoising Transformer during molecule generation. For scalable pretraining, a novel molecular tokenizer based on Node Pair Encoding represents molecules at the motif level.
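To make the demonstration-conditioning idea concrete, here is a minimal, runnable PyTorch sketch of a toy denoiser conditioned on molecule-score pairs. Everything in it (the class name, layer sizes, the score-embedding scheme, and the use of flat token sequences) is an illustrative assumption; DemoDiff's actual architecture is a graph diffusion transformer and is not reproduced here.

```python
# Toy demonstration-conditioned denoiser (illustrative assumptions throughout;
# not DemoDiff's actual architecture or API).
import torch
import torch.nn as nn

class ToyDemoConditionedDenoiser(nn.Module):
    def __init__(self, vocab_size=3000, hidden=128, heads=4, layers=2, steps=500):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, hidden)   # motif-token embeddings
        self.score_emb = nn.Linear(1, hidden)             # embeds a demo's property score
        self.time_emb = nn.Embedding(steps, hidden)       # one embedding per diffusion step
        layer = nn.TransformerEncoderLayer(hidden, heads, 4 * hidden, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(hidden, vocab_size)         # logits over clean tokens

    def forward(self, noisy_tokens, t, demo_tokens, demo_scores):
        # Each context token carries its demonstration's score; the (molecule,
        # score) pairs act as the "prompt" in place of a text description.
        ctx = self.tok_emb(demo_tokens) + self.score_emb(demo_scores.unsqueeze(-1))
        tgt = self.tok_emb(noisy_tokens) + self.time_emb(t).unsqueeze(1)
        h = self.backbone(torch.cat([ctx, tgt], dim=1))
        return self.head(h[:, ctx.size(1):])              # denoise target positions only

batch, demo_len, tgt_len = 2, 32, 16
model = ToyDemoConditionedDenoiser()
logits = model(
    noisy_tokens=torch.randint(0, 3000, (batch, tgt_len)),
    t=torch.randint(0, 500, (batch,)),
    demo_tokens=torch.randint(0, 3000, (batch, demo_len)),
    demo_scores=torch.rand(batch, demo_len),
)
print(logits.shape)  # torch.Size([2, 16, 3000])
```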

Code: https://github.com/liugangcode/DemoDiff
Model: https://huggingface.co/liuganghuggingface/DemoDiff-0.7B

🌟 Key Features

  • In-Context Learning: Generate molecules using only contextual examples (no fine-tuning required)
  • Graph-Based Tokenization: Novel molecular graph tokenization with a BPE-style vocabulary (a simplified merge loop is sketched after this list)
  • Comprehensive Benchmarks: 30+ downstream tasks covering drug discovery, docking, and polymer design
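
The BPE-style vocabulary construction can be illustrated with a small, runnable merge loop. This is a deliberate simplification: Node Pair Encoding merges adjacent node pairs in molecular graphs into motif tokens, whereas the sketch below (function names are hypothetical) operates on linear atom sequences.

```python
# Illustrative BPE-style merge loop (simplification: Node Pair Encoding works
# on molecular graphs; here we merge pairs along linear atom sequences).
from collections import Counter

def merge_pair(seq, a, b, merged):
    """Replace every adjacent (a, b) pair in seq with the merged token."""
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
            out.append(merged)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def learn_merges(corpus, num_merges):
    """Greedily merge the most frequent adjacent token pair, BPE-style."""
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for seq in corpus:
            pairs.update(zip(seq, seq[1:]))
        if not pairs:
            break
        (a, b), _ = pairs.most_common(1)[0]
        merges.append((a, b))
        corpus = [merge_pair(seq, a, b, a + b) for seq in corpus]
    return merges, corpus

corpus = [list("CCOC"), list("CCOCC"), list("OCCO")]
merges, tokenized = learn_merges(corpus, num_merges=2)
print(merges)     # [('C', 'C'), ('CC', 'O')]
print(tokenized)  # [['CCO', 'C'], ['CCO', 'CC'], ['O', 'CCO']]
```

Repeated merges grow the vocabulary from single atoms toward frequent multi-atom motifs, which is what lets a compact 3,000-token vocabulary represent molecules at the motif level.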

Model Configuration

| Parameter       | Value    | Description                                   |
|-----------------|----------|-----------------------------------------------|
| context_length  | 150      | Maximum sequence length for the input context |
| depth           | 24       | Number of transformer layers                  |
| diffusion_steps | 500      | Number of diffusion steps during training     |
| hidden_size     | 1280     | Hidden dimension size in the transformer      |
| mlp_ratio       | 4        | Expansion ratio in the MLP block              |
| num_heads       | 16       | Number of attention heads                     |
| task_name       | pretrain | Task type for model training                  |
| tokenizer_name  | pretrain | Tokenizer used for model input                |
| vocab_ring_len  | 300      | Length of the circular vocabulary window      |
| vocab_size      | 3000     | Total vocabulary size                         |
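
As a sanity check on these numbers, here is a quick back-of-the-envelope parameter count. It uses a rough standard-transformer estimate; the adaLN term is an assumption about DiT-style conditioning, not a documented detail of DemoDiff.

```python
# Back-of-the-envelope parameter count from the table above (a rough
# standard-transformer estimate, ignoring norms and biases).
hidden, depth, mlp_ratio, vocab = 1280, 24, 4, 3000

attn = 4 * hidden * hidden             # Q, K, V, and output projections
mlp = 2 * mlp_ratio * hidden * hidden  # MLP up- and down-projections
core = depth * (attn + mlp) + vocab * hidden
print(f"core: {core / 1e6:.0f}M")      # ~476M

# If the model adds DiT-style adaLN conditioning (an assumption, not a
# documented detail), that contributes roughly 6 * hidden^2 per layer:
adaln = 6 * hidden * hidden * depth
print(f"with adaLN: {(core + adaln) / 1e6:.0f}M")  # ~712M, consistent with
                                                   # the 0.7B in the model name
```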