DemoDiff: Graph Diffusion Transformers are In-Context Molecular Designers
This repository hosts DemoDiff, a diffusion-based molecular foundation model for in-context inverse molecular design, presented in the paper *Graph Diffusion Transformers are In-Context Molecular Designers*.
DemoDiff leverages graph diffusion transformers to generate molecules from contextual examples, enabling few-shot molecular design across diverse chemical tasks without task-specific fine-tuning. It introduces demonstration-conditioned diffusion models, which define a task context with a small set of molecule-score examples instead of a text description and use that context to guide a denoising Transformer during generation. For scalable pretraining, a novel molecular tokenizer based on Node Pair Encoding represents molecules at the motif level.
Code: https://github.com/liugangcode/DemoDiff
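To make the in-context setup concrete, here is a minimal sketch of what a demonstration-conditioned call could look like. `Demonstration`, `DemoDiffPipeline`, and `generate` are illustrative placeholders, not the repository's actual API; see the GitHub link above for real usage.

```python
# Hypothetical sketch of demonstration-conditioned generation.
# `DemoDiffPipeline` and its methods are illustrative placeholders,
# not the repository's actual API.
from dataclasses import dataclass

@dataclass
class Demonstration:
    smiles: str   # molecule in SMILES form
    score: float  # property value for this molecule

# A task is defined purely by molecule-score examples (no text prompt):
context = [
    Demonstration("CCO", 0.12),
    Demonstration("c1ccccc1O", 0.87),
    Demonstration("CC(=O)Nc1ccc(O)cc1", 0.93),
]

# The context then conditions the denoising Transformer at sampling time:
# pipeline = DemoDiffPipeline.from_pretrained("liugangcode/DemoDiff")  # placeholder
# samples = pipeline.generate(context=context, num_samples=16)         # placeholder
```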
Key Features
- In-Context Learning: Generate molecules using only contextual examples (no fine-tuning required)
- Graph-Based Tokenization: Novel molecular graph tokenization with a BPE-style vocabulary (see the sketch after this list)
- Comprehensive Benchmarks: 30+ downstream tasks covering drug discovery, docking, and polymer design
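As a toy illustration of the BPE-style idea behind Node Pair Encoding (repeatedly merging the most frequent adjacent node pair into a new motif token), consider the sketch below. It operates on bare atom-label pairs and is not the repository's tokenizer.

```python
from collections import Counter

# Toy molecular graph as an edge list over atom-type labels.
# One BPE-style step: find the most frequent adjacent label pair
# and promote it to a motif token in the vocabulary.
edges = [("C", "C"), ("C", "O"), ("C", "C"), ("C", "N"), ("C", "C")]

def most_frequent_pair(edges):
    # Sort each pair so (C, O) and (O, C) count as the same pair.
    counts = Counter(tuple(sorted(e)) for e in edges)
    return counts.most_common(1)[0][0]

pair = most_frequent_pair(edges)   # ('C', 'C')
motif = "-".join(pair)             # new vocabulary entry "C-C"
print(pair, "->", motif)
```

Iterating this merge step grows a motif-level vocabulary, analogous to how byte pair encoding builds subword vocabularies for text.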
Model Configuration
| Parameter | Value | Description |
|---|---|---|
| context_length | 150 | Maximum sequence length for the input context. |
| depth | 24 | Number of transformer layers. |
| diffusion_steps | 500 | Number of diffusion steps during training. |
| hidden_size | 1280 | Hidden dimension size in the transformer. |
| mlp_ratio | 4 | Expansion ratio in the MLP block. |
| num_heads | 16 | Number of attention heads. |
| task_name | pretrain | Task type for model training. |
| tokenizer_name | pretrain | Tokenizer used for model input. |
| vocab_ring_len | 300 | Length of the circular vocabulary window. |
| vocab_size | 3000 | Total vocabulary size. |
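For reference, the same configuration as a plain Python dict. The keys and values mirror the table above; how the repository actually loads its configuration may differ.

```python
# Pretrained defaults from the configuration table above (illustrative form).
config = {
    "context_length": 150,    # max input context length
    "depth": 24,              # transformer layers
    "diffusion_steps": 500,   # diffusion steps during training
    "hidden_size": 1280,      # transformer hidden dimension
    "mlp_ratio": 4,           # MLP expansion ratio
    "num_heads": 16,          # attention heads
    "task_name": "pretrain",
    "tokenizer_name": "pretrain",
    "vocab_ring_len": 300,    # circular vocabulary window length
    "vocab_size": 3000,       # total vocabulary size
}
```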