Llama-3.2-1B-DeepSeek-Distilled

Model Overview

This model uses the meta-llama/Llama-3.2-1B architecture and is distilled from deepseek-ai/deepseek-llm-67b-base, a 67B-parameter teacher model available on Hugging Face.
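The card does not describe the distillation recipe used to train this model. For orientation only, a minimal sketch of generic logit distillation (KL divergence between temperature-softened teacher and student distributions, in the style of Hinton et al.); all function names and the temperature value are illustrative assumptions, not details from this card:

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-softened softmax over a list of raw logits.
    exps = [math.exp(x / temperature) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # so gradient magnitudes are comparable to a hard-label loss.
    # This is a generic sketch, not the recipe used for this model.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2
```

When student and teacher logits agree, the loss is zero; any divergence produces a positive penalty that training pushes down.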

Evaluation Scores

  • XTREME - XNLI - en: Accuracy: 0.365
  • SuperGLUE - BoolQ: Accuracy: 0.721
  • GLUE - SST-2: Accuracy: 0.924
  • SQuAD:
    • Exact Match: 61.5
    • F1 Score: 74.1
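The SQuAD numbers above use the benchmark's standard Exact Match and token-level F1 metrics. A minimal sketch of how those two metrics are computed, following the normalization conventions of the official SQuAD evaluation (lowercasing, stripping punctuation, articles, and extra whitespace); the function names here are illustrative:

```python
import re
import string
from collections import Counter

def normalize(text):
    # Lowercase, drop punctuation and articles, collapse whitespace,
    # as in the standard SQuAD answer normalization.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, gold):
    # 1.0 if the normalized strings are identical, else 0.0.
    return float(normalize(prediction) == normalize(gold))

def f1_score(prediction, gold):
    # Token-overlap F1 between normalized prediction and gold answer.
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

The reported scores are these per-question metrics averaged over the evaluation set (with the maximum taken over reference answers when a question has several).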

Usage

Load the model and tokenizer from the Hugging Face Hub:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the distilled checkpoint and its tokenizer from the Hub.
model = AutoModelForCausalLM.from_pretrained("enesarda22/Llama-3.2-1B-DeepSeek67B-Distilled")
tokenizer = AutoTokenizer.from_pretrained("enesarda22/Llama-3.2-1B-DeepSeek67B-Distilled")
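Once loaded, the model can be used like any causal language model in transformers. A short generation sketch; the prompt and decoding settings below are illustrative assumptions, not recommendations from this card:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "enesarda22/Llama-3.2-1B-DeepSeek67B-Distilled"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Illustrative prompt; swap in your own text.
inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))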