import { useEffect, useState, useRef } from "react";

import Chat from "./components/Chat";
import ArrowRightIcon from "./components/icons/ArrowRightIcon";
import StopIcon from "./components/icons/StopIcon";
import Progress from "./components/Progress";

const IS_WEBGPU_AVAILABLE = !!navigator.gpu;
const STICKY_SCROLL_THRESHOLD = 120;
const EXAMPLES = [
  "Give me some tips to improve my time management skills.",
  "What is the difference between AI and ML?",
  "Write python code to compute the nth fibonacci number.",
];

function App() {
  // Create a reference to the worker object.
  const worker = useRef(null);

  const textareaRef = useRef(null);
  const chatContainerRef = useRef(null);

  // Model loading and progress
  const [status, setStatus] = useState(null);
  const [error, setError] = useState(null);
  const [loadingMessage, setLoadingMessage] = useState("");
  const [progressItems, setProgressItems] = useState([]);
  const [isRunning, setIsRunning] = useState(false);

  // Inputs and outputs
  const [input, setInput] = useState("");
  const [messages, setMessages] = useState([]);
  const [tps, setTps] = useState(null);
  const [numTokens, setNumTokens] = useState(null);

  function onEnter(message) {
    setMessages((prev) => [...prev, { role: "user", content: message }]);
    setTps(null);
    setIsRunning(true);
    setInput("");
  }

  function onInterrupt() {
    // NOTE: We do not set isRunning to false here because the worker
    // will send a 'complete' message when it is done.
    worker.current.postMessage({ type: "interrupt" });
  }

  useEffect(() => {
    resizeInput();
  }, [input]);

  function resizeInput() {
    if (!textareaRef.current) return;

    const target = textareaRef.current;
    target.style.height = "auto";
    const newHeight = Math.min(Math.max(target.scrollHeight, 24), 200);
    target.style.height = `${newHeight}px`;
  }

  // We use the `useEffect` hook to set up the worker as soon as the `App` component is mounted.
  useEffect(() => {
    // Create the worker if it does not yet exist.
    if (!worker.current) {
      worker.current = new Worker(new URL("./worker.js", import.meta.url), {
        type: "module",
      });
      worker.current.postMessage({ type: "check" }); // Do a feature check
    }

    // Create a callback function for messages from the worker thread.
    const onMessageReceived = (e) => {
      switch (e.data.status) {
        case "loading":
          // Model file start load: add a new progress item to the list.
          setStatus("loading");
          setLoadingMessage(e.data.data);
          break;

        case "initiate":
          setProgressItems((prev) => [...prev, e.data]);
          break;

        case "progress":
          // Model file progress: update one of the progress items.
          setProgressItems((prev) =>
            prev.map((item) => {
              if (item.file === e.data.file) {
                return { ...item, ...e.data };
              }
              return item;
            }),
          );
          break;

        case "done":
          // Model file loaded: remove the progress item from the list.
          setProgressItems((prev) =>
            prev.filter((item) => item.file !== e.data.file),
          );
          break;

        case "ready":
          // Pipeline ready: the worker is ready to accept messages.
          setStatus("ready");
          break;

        case "start":
          {
            // Start generation
            setMessages((prev) => [
              ...prev,
              { role: "assistant", content: "" },
            ]);
          }
          break;

        case "update":
          {
            // Generation update: update the output text.
            // Parse messages
            const { output, tps, numTokens } = e.data;
            setTps(tps);
            setNumTokens(numTokens);
            setMessages((prev) => {
              const cloned = [...prev];
              const last = cloned.at(-1);
              cloned[cloned.length - 1] = {
                ...last,
                content: last.content + output,
              };
              return cloned;
            });
          }
          break;

        case "complete":
          // Generation complete: re-enable the "Generate" button
          setIsRunning(false);
          break;

        case "error":
          setError(e.data.data);
          break;
      }
    };

    const onErrorReceived = (e) => {
      console.error("Worker error:", e);
    };

    // Attach the callback function as an event listener.
    worker.current.addEventListener("message", onMessageReceived);
    worker.current.addEventListener("error", onErrorReceived);

    // Define a cleanup function for when the component is unmounted.
    return () => {
      worker.current.removeEventListener("message", onMessageReceived);
      worker.current.removeEventListener("error", onErrorReceived);
    };
  }, []);
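  // Summary of the message protocol between this component and ./worker.js,
  // derived from the handlers above ("load" is an assumption; it is the type
  // the "Load model" button below is presumed to send):
  //
  //   App -> worker (postMessage):
  //     { type: "check" }                     // feature detection at startup
  //     { type: "load" }                      // assumed: start downloading the model
  //     { type: "generate", data: messages }  // run generation on the chat history
  //     { type: "interrupt" }                 // stop the current generation
  //     { type: "reset" }                     // clear the worker-side state
  //
  //   worker -> App (e.data.status):
  //     "loading" / "initiate" / "progress" / "done"  // per-file download progress
  //     "ready"                                       // pipeline ready for input
  //     "start" -> "update" (repeated) -> "complete"  // generation lifecycle
  //     "error"                                       // e.data.data carries the message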
  // Send the messages to the worker thread whenever the `messages` state changes.
  useEffect(() => {
    if (messages.filter((x) => x.role === "user").length === 0) {
      // No user messages yet: do nothing.
      return;
    }
    if (messages.at(-1).role === "assistant") {
      // Do not update if the last message is from the assistant
      return;
    }
    setTps(null);
    worker.current.postMessage({ type: "generate", data: messages });
  }, [messages, isRunning]);

  // Keep the chat pinned to the bottom while generating, but only if the user
  // is already near the bottom (within STICKY_SCROLL_THRESHOLD pixels), so
  // manual scrolling up is not hijacked.
  useEffect(() => {
    if (!chatContainerRef.current || !isRunning) return;
    const element = chatContainerRef.current;
    if (
      element.scrollHeight - element.scrollTop - element.clientHeight <
      STICKY_SCROLL_THRESHOLD
    ) {
      element.scrollTop = element.scrollHeight;
    }
  }, [messages, isRunning]);
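  // For illustration only: a minimal sketch of how ./worker.js might service
  // the protocol above using Transformers.js. Everything here (model id,
  // options, the tps/numTokens bookkeeping) is an assumption, not the demo's
  // actual worker implementation:
  //
  //   import { pipeline, TextStreamer } from "@huggingface/transformers";
  //
  //   let generator = null;
  //   self.addEventListener("message", async (e) => {
  //     if (e.data.type === "load") {
  //       generator = await pipeline(
  //         "text-generation",
  //         "onnx-community/Llama-3.2-1B-Instruct-q4f16", // assumed model id
  //         { device: "webgpu", progress_callback: (x) => self.postMessage(x) },
  //       );
  //       self.postMessage({ status: "ready" });
  //     } else if (e.data.type === "generate") {
  //       self.postMessage({ status: "start" });
  //       let numTokens = 0;
  //       const start = performance.now();
  //       const streamer = new TextStreamer(generator.tokenizer, {
  //         skip_prompt: true,
  //         callback_function: (output) => {
  //           const tps = (++numTokens / (performance.now() - start)) * 1000;
  //           self.postMessage({ status: "update", output, tps, numTokens });
  //         },
  //       });
  //       await generator(e.data.data, { max_new_tokens: 1024, streamer });
  //       self.postMessage({ status: "complete" });
  //     }
  //   });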
  return IS_WEBGPU_AVAILABLE ? (
    <div>
      {status === null && messages.length === 0 && (
        <div>
          <h1>Llama-3.2 WebGPU</h1>
          <h2>
            A private and powerful AI chatbot
            <br />
            that runs locally in your browser.
          </h2>

          <p>
            You are about to load{" "}
            {/* Assumed link target: the ONNX export of the model on the Hugging Face Hub */}
            <a
              href="https://huggingface.co/onnx-community/Llama-3.2-1B-Instruct-q4f16"
              target="_blank"
              rel="noreferrer"
            >
              Llama-3.2-1B-Instruct
            </a>
            , a 1.24 billion parameter LLM that is optimized for inference on
            the web. Once downloaded, the model (1.01 GB) will be cached and
            reused when you revisit the page.
          </p>

          <p>
            Everything runs directly in your browser using{" "}
            <a
              href="https://huggingface.co/docs/transformers.js"
              target="_blank"
              rel="noreferrer"
            >
              🤗 Transformers.js
            </a>{" "}
            and ONNX Runtime Web, meaning your conversations aren't sent to a
            server. You can even disconnect from the internet after the model
            has loaded!
          </p>

          <p>
            Want to learn more? Check out the demo's source code on{" "}
            {/* Assumed link target */}
            <a
              href="https://github.com/huggingface/transformers.js-examples"
              target="_blank"
              rel="noreferrer"
            >
              GitHub
            </a>
            !
          </p>

          {error && (
            <div>
              <p>Unable to load model due to the following error:</p>
              <p>{error}</p>
            </div>
          )}

          {/* Assumed control: a button is needed to trigger the initial load */}
          <button
            onClick={() => {
              // Assumed message type; must match the load handler in worker.js.
              worker.current.postMessage({ type: "load" });
              setStatus("loading");
            }}
            disabled={status !== null || error !== null}
          >
            Load model
          </button>
        </div>
      )}
)} {status === "loading" && ( <>

{loadingMessage}

{progressItems.map(({ file, progress, total }, i) => ( ))}
)} {status === "ready" && (
{messages.length === 0 && (
{EXAMPLES.map((msg, i) => (
onEnter(msg)} > {msg}
))}
)}

          <p>
            {tps && messages.length > 0 && (
              <>
                {!isRunning && (
                  <span>
                    Generated {numTokens} tokens in{" "}
                    {(numTokens / tps).toFixed(2)} seconds (
                  </span>
                )}
                <span>{tps.toFixed(2)} tokens/second</span>
                {!isRunning && (
                  <>
                    <span>). </span>
                    <span
                      onClick={() => {
                        worker.current.postMessage({ type: "reset" });
                        setMessages([]);
                      }}
                    >
                      Reset
                    </span>
                  </>
                )}
              </>
            )}
          </p>
        </div>
      )}