Model Card for llama32-1b-python-docstrings-qlora
A LoRA/QLoRA adapter, parameter-efficiently fine-tuned on top of meta-llama/Llama-3.2-1B-Instruct, for generating concise one-line Python docstrings from function bodies.
Model Details
Model Description
- Developed by: Abdullah Al-Housni
- Model type: Causal language model with LoRA/QLoRA adapters
- Language(s): Python code as input, English docstrings as output
- License: Same as meta-llama/Llama-3.2-1B-Instruct (Meta Llama 3.2 Community License)
- Finetuned from model: meta-llama/Llama-3.2-1B-Instruct
The model is trained to take a Python function definition and generate a concise, one-line docstring describing what the function does.
Uses
Direct Use
- Automatically generate one-line Python docstrings for functions.
- Improve or bootstrap documentation in Python codebases.
- Educational use for learning how to summarize code behavior.
Typical usage pattern (illustrated below):
- Input: Python function body (source code).
- Output: Single-sentence English description suitable as a docstring.
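For illustration, the intended pairing looks like this (the function and docstring below are hypothetical examples, not taken from the training data):

```python
# Input: a Python function body as a string.
code = """def is_even(n):
    return n % 2 == 0"""

# Expected output: a single-sentence English docstring, e.g.
# "Check whether a number is even."
```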
Out-of-Scope Use
- Generating full, multi-paragraph API documentation.
- Security auditing or correctness guarantees for code.
- Use outside Python (e.g., other programming languages) without additional fine-tuning.
- Any safety-critical application where incorrect summaries could cause harm.
Bias, Risks, and Limitations
- The model can produce incorrect or incomplete summaries, especially for complex or ambiguous functions.
- It may imitate noisy or low-quality patterns from the training data (e.g., overly short or cryptic docstrings).
- It does not understand project-specific context, invariants, or business logic; outputs should be reviewed by a human developer.
Recommendations
- Use the model as an assistive tool, not an authoritative source.
- Always review and edit generated docstrings before committing to production code.
- For non-Python or highly domain-specific code, consider additional fine-tuning on in-domain examples.
How to Get Started with the Model
Example with 🤗 Transformers and PEFT (LoRA adapter):
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
base_model_id = "meta-llama/Llama-3.2-1B-Instruct"
adapter_id = "Abdul1102/llama32-1b-python-docstrings-qlora"
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
# Load the base model, then attach the fine-tuned LoRA adapter.
model = AutoModelForCausalLM.from_pretrained(base_model_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)
def make_prompt(code: str) -> str:
    # Prompt template: ask for a one-line docstring and open the triple quotes.
    return f'Write a one-line Python docstring for this function:\n\n{code}\n\n"""'
code = "def add(a, b):\n return a + b"
inputs = tokenizer(make_prompt(code), return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32, do_sample=False)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
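The decoded text includes the prompt, so a small post-processing step is usually needed to isolate the docstring itself. A minimal sketch, assuming the docstring is the first non-empty line of the newly generated tokens:

```python
# Decode only the newly generated tokens (everything after the prompt).
prompt_length = inputs["input_ids"].shape[1]
generated = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True)

# Keep the first non-empty line and strip stray quote characters.
lines = [line.strip() for line in generated.splitlines() if line.strip()]
docstring = lines[0].strip('"').strip() if lines else ""
print(docstring)
```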
Training Details
Training Data
- Dataset: Python subset of CodeSearchNet (Nan-Do/code-search-net-python)
- Inputs: code column (full Python function body)
- Targets: first non-empty line of docstring
- A filtered subset of ~1,000–2,000 examples was used for efficient QLoRA fine-tuning (see the preprocessing sketch below)
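A minimal sketch of this preprocessing, assuming the dataset exposes `code` and `docstring` columns as described above; the split name, filtering rule, and subsample size are illustrative assumptions:

```python
from datasets import load_dataset

# Load the Python subset of CodeSearchNet (column names per the description above).
ds = load_dataset("Nan-Do/code-search-net-python", split="train")  # split name assumed

def add_target(example):
    # Target is the first non-empty line of the original docstring.
    lines = [line.strip() for line in example["docstring"].splitlines() if line.strip()]
    example["target"] = lines[0] if lines else ""
    return example

ds = ds.map(add_target)
# Drop examples without a usable one-line target, then subsample ~2,000 examples.
ds = ds.filter(lambda ex: len(ex["target"]) > 0)
ds = ds.shuffle(seed=42).select(range(2_000))
```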
Training Procedure
- Objective: Causal language modeling (predict the docstring continuation)
- Method: QLoRA (4-bit quantized base model with LoRA adapters)
- Precision: 4-bit quantized weights, bf16 compute
- Epochs: 1
- Max sequence length: 256–512 tokens
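A minimal sketch of the QLoRA loading step described above (4-bit quantized weights with bf16 compute); the NF4 quantization type and double quantization are assumed common defaults, not confirmed by this card:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit quantized base weights
    bnb_4bit_compute_dtype=torch.bfloat16,  # bf16 compute, as stated above
    bnb_4bit_quant_type="nf4",              # assumed: common QLoRA default
    bnb_4bit_use_double_quant=True,         # assumed: common QLoRA default
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)
```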
Training Hyperparameters
- Learning rate: ~2e-4 (adapter weights only)
- Epochs: 1
- Optimizer: AdamW via the Hugging Face Trainer (see the configuration sketch below)
- LoRA rank: 16
- LoRA alpha: 32
- LoRA dropout: 0.05
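Taken together, the hyperparameters above roughly correspond to a configuration like the following sketch (continuing from the 4-bit loading example; the target modules, batch size, and gradient accumulation are assumptions, not taken from this card):

```python
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import TrainingArguments

# Prepare the quantized base model for adapter training, then attach LoRA.
base_model = prepare_model_for_kbit_training(base_model)
lora_config = LoraConfig(
    r=16,                   # LoRA rank
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)
model = get_peft_model(base_model, lora_config)

training_args = TrainingArguments(
    output_dir="llama32-1b-python-docstrings-qlora",
    num_train_epochs=1,
    learning_rate=2e-4,
    per_device_train_batch_size=4,   # assumed
    gradient_accumulation_steps=4,   # assumed
    bf16=True,
    logging_steps=10,
)
# The adapter weights are then trained with the standard Hugging Face Trainer
# on tokenized prompt/docstring pairs (tokenization and collation omitted here).
```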
Evaluation
Testing Data, Factors & Metrics
Testing Data
Held-out test split from the same CodeSearchNet Python dataset, using the identical code → one-line docstring mapping.
Factors
- Function size and complexity
- Variety in docstring writing styles
- Presence of short or noisy docstrings
Metrics
- BLEU (sacreBLEU): strict n-gram overlap, sensitive to paraphrasing
- ROUGE (ROUGE-1 / ROUGE-2 / ROUGE-L): better for short summaries
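Both metrics can be computed with the Hugging Face `evaluate` library; a minimal sketch with placeholder predictions and references:

```python
import evaluate

bleu = evaluate.load("sacrebleu")
rouge = evaluate.load("rouge")

predictions = ["Return the sum of a and b."]              # model outputs (placeholders)
references = ["Add two numbers and return the result."]   # gold docstrings (placeholders)

# sacreBLEU expects a list of reference lists per prediction.
bleu_result = bleu.compute(predictions=predictions, references=[[r] for r in references])
rouge_result = rouge.compute(predictions=predictions, references=references)

print(bleu_result["score"])  # corpus-level BLEU
print(rouge_result["rouge1"], rouge_result["rouge2"], rouge_result["rougeL"])
```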
Results
Approximate performance on ~50 held-out samples:
- BLEU: ~12.4
- ROUGE-1: ~0.78
- ROUGE-2: ~0.74
- ROUGE-L: ~0.78
Summary
The model frequently reproduces or closely paraphrases the reference docstring. Occasional failures include echoing part of the prompt or returning an empty string. This is strong performance for a 1B model trained briefly on a small dataset.
Model Examination
Not applicable.
Environmental Impact
- Hardware Type: Google Colab GPU (T4/L4)
- Hours Used: ~0.5–1 hour total
- Cloud Provider: Google Colab
- Compute Region: US
- Carbon Emitted: Not estimated (very low due to minimal training time)
Technical Specifications
Model Architecture and Objective
- Base model: Llama 3.2 1B Instruct
- Architecture: Decoder-only transformer
- Objective: Causal language modeling
- Parameter-efficient fine-tuning using LoRA (rank 16)
Compute Infrastructure
Hardware
Single Google Colab GPU (T4 or L4)
Software
- Python
- PyTorch
- Hugging Face Transformers
- PEFT
- bitsandbytes
- Datasets
Citation
Not applicable.
Glossary
Not applicable.
More Information
See the Hugging Face model page for updates or usage examples.
Model Card Authors
Abdullah Al-Housni
Model Card Contact
Available through the Hugging Face model repository.