Study Group [Accountability, Discussions, Resources and Common Doubt Resolutions]
Hey guys, I am starting this study group if you want to learn together rather than finishing the course alone.
Key Points for this group:
- You can ask fellow members to motivate you in case you procrastinate a lot (like me).
- If some topic is interesting and you find extra resources, you can share them with the group.
- Resolution of common doubts that we face during learning.
- We can add more points later :p

Leave your Discord in the comments or take part directly here!
Nice work @KanishkNoir. Feel free to share any practical information like weekly calls or check-ins.
Module: Instruction Tuning
Section: Chat Template
When running text-generation, if your response contains a <think> token, you can switch to a standard response by disabling the enable_thinking param in the tokenizer!
Reason: The tokenizer we are using right now has thinking mode enabled by default :)
from transformers import AutoTokenizer, pipeline

# Load the tokenizer and build a text-generation pipeline around SmolLM3-3B
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
pipe = pipeline("text-generation", "HuggingFaceTB/SmolLM3-3B", tokenizer=tokenizer, device_map="auto")

messages = [
    {"role": "system", "content": "You are an angry chatbot that responds in the style of a wild west cowboy."},
    {"role": "user", "content": "Hello, how are you?"}
]

# Apply the chat template with thinking disabled
thinking_disabled_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False
)

# Use the formatted template in the pipeline
response = pipe(thinking_disabled_chat, max_new_tokens=128, temperature=0.7, return_full_text=False)
print(response[0]['generated_text'])
@KanishkNoir Hello!
You can also add a tools parameter and pass tool instructions into the tokenizer, wrapping them with tool tokens! The idea is to leverage AutoTokenizers as a templating engine to make tool calls more deterministic by using tokens the model has seen in post-training or fine-tuning. Leveraging Jinja templates wherever possible is pretty much necessary to reproduce performance across tasks. Of course, this can be achieved with prompts. There is also a documents parameter for RAG scenarios (a sketch follows the snippet below).
I haven't tested these yet but have been researching as I plan an implementation of something like the OpenAI Responses API.
thinking_disabled_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
    tools=tools  # list of tool schemas rendered into the prompt by the template
)
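For the documents parameter mentioned above, here is a minimal sketch, reusing the tokenizer and messages from the earlier snippet. It assumes a model whose chat template actually renders documents (RAG-style templates like Command-R do; SmolLM3's may simply ignore the argument), and the document titles/texts are made up for illustration:
# Hedged sketch of the `documents` argument of apply_chat_template for RAG.
# If the model's template has no document section, the argument is ignored.
documents = [
    {"title": "Paris travel guide", "text": "Paris is the capital of France..."},
    {"title": "Weather primer", "text": "Average September temperature in Paris is around 21 C."},
]

rag_prompt = tokenizer.apply_chat_template(
    messages,
    documents=documents,      # rendered into the prompt by templates that support it
    tokenize=False,
    add_generation_prompt=True,
)
print(rag_prompt)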
Another interesting feature I finally leveraged recently was
return_tensors="np"
which returns tokens as a numpy array, very useful for manipulation. I do a lot of work with OpenVINO, and wrapping the output in ov.Tensor makes AutoTokenizers usable with OpenVINO runtime outside of Optimum-Intel. A bit off topic, but related to apply_chat_template.
# Inside a class method: tokenize the chat template directly to a numpy array
prompt_token_ids = self.encoder_tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    skip_special_tokens=True,
    return_tensors="np"
)
# Wrap the numpy array so the OpenVINO runtime can consume it directly
return ov.Tensor(prompt_token_ids)
Very nice. Gets around jinja errors in OpenVINO GenAI and adds enormous flexibility.
Bonus, because nerd sniping: This took a long time for me to figure out. Another class, AutoProcessor, can be used for multimodal input; it handles formatting of images and other modalities with lots of flexibility across supported architectures. For chat scenarios with models that support image input in Transformers, we do not need to pass an image with every prompt, like we see in ChatGPT, Gemini, Claude, etc., even though the models are multimodal-capable. So I came up with this snippet for a FastAPI application where we expect messages to contain a base64 image and be formatted with role/content keys. Maybe there are simpler ways to do this, but here was my solution, which I cleaned up to share:
import base64
from io import BytesIO

from PIL import Image

# Collected PIL images and the text-only version of the conversation
images = []
text_conversation = []

# Iterate over the messages list
for message in messages:
    # Handle multimodal messages: check if "content" is a list
    if isinstance(message.get("content", ""), list):
        text_parts = []
        for content_item in message["content"]:
            # Case 1: Image content
            if isinstance(content_item, dict) and content_item.get("type") == "image_url":
                image_url = content_item.get("image_url", {})
                # Check if the image is embedded as base64
                if isinstance(image_url, dict) and image_url.get("url", "").startswith("data:image/"):
                    base64_parts = image_url["url"].split(",", 1)
                    if len(base64_parts) > 1:
                        # Decode base64 string into binary
                        image_data = base64.b64decode(base64_parts[1])
                        # Convert binary into a PIL image (force RGB mode)
                        image = Image.open(BytesIO(image_data)).convert("RGB")
                        images.append(image)
            # Case 2: Text content
            elif isinstance(content_item, dict) and content_item.get("type") == "text":
                text_parts.append(content_item.get("text", ""))
        # Build a cleaned message object containing only text
        if text_parts:
            text_message = message.copy()
            text_message["content"] = " ".join(text_parts)
            text_conversation.append(text_message)
        else:
            # Even if no text, keep the message with empty content
            text_message = message.copy()
            text_message["content"] = ""
            text_conversation.append(text_message)
    else:
        # If "content" is not a list, append the raw message as-is
        text_conversation.append(message)

# Apply the processor's chat template to the cleaned, text-only conversation
text_prompt = self.processor.apply_chat_template(
    text_conversation,
    add_generation_prompt=True
)

# Prepare processor inputs depending on whether images were found
# (add_generation_prompt belongs to apply_chat_template, not the processor call)
if images:
    inputs = self.processor(
        text=[text_prompt],
        images=[images],
        padding=True,
        return_tensors="pt"
    )
else:  # if text was found without images, pass the text prompt forward on its own
    inputs = self.processor(
        text=[text_prompt],
        padding=True,
        return_tensors="pt"
    )
@Echo9Zulu
Hey, that was a really cool read! I will try to experiment with this as I progress with the course, and off the course too! Really intrigued by the return_tensors="np" part. Will try it later to see where I can leverage it :)
Thanks for the input and the additional theory!
Module: Instruction Tuning
Section: Chat Template
Sub-Section: Training with Thinking Mode
Application examples of training with thinking mode (a minimal code sketch for the math-tutoring case follows the list):
Educational Systems (Step-by-Step Explanations)
a. Math tutoring: Show reasoning when solving equations or word problems, with an option to toggle concise answers.
b. Science education: Provide hidden reasoning for physics/chemistry derivations, revealing it only when a student asks "show me how."
c. Language learning: Include reasoning behind grammar corrections (verb tense, subject-verb agreement, etc.) that can be shown or hidden.
Debugging & Programming Assistance
a. Code explanations: Reveal the thought process behind bug identification and fixes, while allowing users to view just the final code if preferred.
b. Algorithm walkthroughs: Enable learners and developers to see step-by-step optimization or complexity analysis.
AI Safety & Interpretability
a. Model auditing: Store both reasoning and answers in training data so evaluators can examine how the model reached its conclusions.
b. Bias detection: Allow analysts to inspect the hidden reasoning layer to identify stereotypes or faulty logic in sensitive domains.
Conversational Agents & Assistants
a. Decision-making assistants: Travel planners or shopping bots can internally weigh trade-offs (budget vs. quality) while presenting only the final recommendation.
b. Customer support bots: Log reasoning for QA teams while delivering concise, customer-friendly responses.
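As referenced above, here is a minimal sketch for the math-tutoring case, assuming the same SmolLM3-3B tokenizer used earlier in this thread: the assistant's reasoning lives inside <think> tags so it can be kept in the training data and toggled via enable_thinking when rendering prompts.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")

# Training-style example: reasoning is stored in <think> tags in the assistant turn
messages = [
    {"role": "user", "content": "Solve 3x + 5 = 20."},
    {
        "role": "assistant",
        "content": "<think>\nSubtract 5 from both sides: 3x = 15. Divide by 3: x = 5.\n</think>\nx = 5",
    },
]

# Render the same example with thinking enabled and disabled to compare the prompts
for enable_thinking in (True, False):
    formatted = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=False,
        enable_thinking=enable_thinking,
    )
    print(f"--- enable_thinking={enable_thinking} ---")
    print(formatted)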
[Cross posting from Discord]
Is anyone having issues running lighteval on Colab T4? I tried running the plain command, but I had to do some tweaks to get it working. Is this okay or am I missing some details?
!pip install uv -qqq
!uv init
!uv add "triton<=3.2.0" "vllm<0.10.2" lighteval[vllm] -qqq
!uv run lighteval vllm "model_name=HuggingFaceTB/SmolLM3-3B,dtype=float16" "lighteval|gsm8k|0|0" --push-to-hub --results-org andregustavo
I've made a standalone notebook just for evals here: https://colab.research.google.com/drive/1Sntdimj1WFzLI26QpiR1ykD3ZsQpOOrF#scrollTo=Emybz1V2UcWm
It's boilerplate eval'ing the original model just to make sure it was working before trying my model! 
I am interested
discord: nevermetyou#3659
Module: Instruction Tuning
Section: Tool Usage and Function Calling
Sub-Section: Chat Templates with Tools
The example below is kind of weird:
# Conversation with tool usage
messages = [
    {"role": "system", "content": "You are a helpful assistant with access to tools."},
    {"role": "user", "content": "What's the weather like in Paris?"},
    {
        "role": "assistant", 
        "content": "I'll check the weather in Paris for you.",
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "arguments": '{"location": "Paris, France", "unit": "celsius"}'
                }
            }
        ]
    },
    {
        "role": "tool",
        "tool_call_id": "call_1", 
        "content": '{"temperature": 22, "condition": "sunny", "humidity": 60}'
    },
    {
        "role": "assistant",
        "content": "The weather in Paris is currently sunny with a temperature of 22°C and 60% humidity. It's a beautiful day!"
    }
]
# Apply chat template with tools
formatted_with_tools = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    tokenize=False,
    add_generation_prompt=False
)
print("Chat template with tools:")
print(formatted_with_tools)
If you apply the chat template to the message example above, you will not get the <tool_call>...</tool_call> tags in your output.
Instead, the example should be:
messages = [
    {"role": "system", "content": "You are a helpful assistant with access to tools."},
    {"role": "user", "content": "What's the weather like in Paris?"},
    {
        "role": "assistant",
        "content": 'I\'ll check the weather in Paris for you.\n\n<tool_call>\n{"name": "get_weather", "arguments": {"location": "Paris, France", "unit": "celsius"}}\n</tool_call>',
    },
    {
        "role": "tool",
        "content": '{"temperature": 22, "condition": "sunny", "humidity": 60}',
    },
    {
        "role": "assistant",
        "content": "The weather in Paris is currently sunny with a temperature of 22°C and 60% humidity. It's a beautiful day!",
    },
]
You will get the result below, which looks more correct:
<|im_start|>system
## Metadata
Knowledge Cutoff Date: June 2025
Today Date: 28 September 2025
Reasoning Mode: /think
## Custom Instructions
You are a helpful assistant with access to tools.
<|im_start|>user
What's the weather like in Paris?<|im_end|>
<|im_start|>assistant
I'll check the weather in Paris for you.
<tool_call>
{"name": "get_weather", "arguments": {"location": "Paris, France", "unit": "celsius"}}
</tool_call><|im_end|>
<|im_start|>user
{"temperature": 22, "condition": "sunny", "humidity": 60}<|im_end|>
<|im_start|>assistant
The weather in Paris is currently sunny with a temperature of 22°C and 60% humidity. It's a beautiful day!<|im_end|>
The reason the tool role gets replaced by the user role is line 83 in this file:
https://huggingface.co/HuggingFaceTB/SmolLM3-3B/blob/main/chat_template.jinja
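If you want to confirm this locally without opening the file, a quick sketch, assuming the same SmolLM3-3B tokenizer used above:
from transformers import AutoTokenizer

# Print the raw Jinja chat template to see how "tool" messages are rendered
# (the mapping of the tool role onto the user role happens in this template).
tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
print(tokenizer.chat_template)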
For me this is kind of weird; I think we should use an explicit role name for each message.
All the credit goes to Mr. NOIR on Discord.
[Cross posting from Discord]
Is anyone having issues running lighteval on Colab T4? I tried running the plain command, but I had to do some tweaks to get it working. Is this okay or am I missing some details?
!pip install uv -qqq
!uv init
!uv add "triton<=3.2.0" "vllm<0.10.2" lighteval[vllm] -qqq
!uv run lighteval vllm "model_name=HuggingFaceTB/SmolLM3-3B,dtype=float16" "lighteval|gsm8k|0|0" --push-to-hub --results-org andregustavo
I've made a standalone notebook just for evals here: https://colab.research.google.com/drive/1Sntdimj1WFzLI26QpiR1ykD3ZsQpOOrF#scrollTo=Emybz1V2UcWm
It's boilerplate eval'ing the original model just to make sure it was working before trying my model!
I've had issues getting evals to run with both the SFT and DPO exercises, locally and with hf jobs. Basically, the evals listed in the course seem to be incorrect or not to exist? I basically did the same as you for the first module, and now I'm trying to figure out which evals to actually run for the second (DPO) module.
It seems important to have the correct evals for a leaderboard to make sense. :)
@h-d-h I haven't started the DPO module yet. Can you try a different lighteval version? Maybe the evals were removed. I will also test in a few days when I get there.