Push model using huggingface_hub.
- README.md +5 -126
- config.json +3 -2
- model.safetensors +1 -1

README.md CHANGED
@@ -1,131 +1,10 @@
----
-datasets:
-- thuml/UTSD
-pipeline_tag: time-series-forecasting
----
-# A tutorial on how to build a Foundation Model for Univariate Time Series Forecasting
-
-A concise, reproducible recipe for training a transformer-based, patch-to-patch forecasting model for univariate time series. The approach mirrors Large Language Model (LLM) practices (next-token → next-patch) while remaining lightweight and practical compared to a classic LLM.
-
-## Highlights
-- Next-patch prediction objective (autoregressive, causal)
-- Patch-based representation of time series (tokens ↔ patches)
-- Causal masking self-attention with RoPE (relative positions)
-- RevIN (Reversible Instance Normalization) with causal statistics
-- SwiGLU feed-forward networks
-- Multi-quantile outputs (median + uncertainty bands)
-- Efficient rollout with KV caching
-
 ---
 tags:
--
--
-- transformer
-- patches
-- foundation
-- zero-shot
-pipeline_tag: time-series-forecasting
+- model_hub_mixin
+- pytorch_model_hub_mixin
 ---

 This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
-- Code: [
-- Paper:
-
-## Installation
-```bash
-git clone https://github.com/vilhess/PatchFM
-cd PatchFM
-pip install -r requirements.txt
-```
-
-## Quick Start
-
-```python
-import torch
-from model import Forecaster
-from configs import PatchFMConfig
-
-# --- Instantiate model ---
-config = PatchFMConfig(load_from_hub=True)
-model = Forecaster(config)
-
-# --- Inference ---
-forecast_horizon = 64
-seq = torch.randn(1, 1024)  # (batch, time)
-pred_median, pred_quantiles = model(seq, forecast_horizon=forecast_horizon, quantiles=[0.1, 0.5, 0.9])  # (batch, time, quantiles)
-```
-
-We provide an extended quick start example in [notebooks/tutorial.ipynb](./notebooks/tutorial.ipynb).
-If you don't have suitable hardware, you can also run the extended quick start example in Google Colab:
-
-<a target="_blank" href="https://colab.research.google.com/drive/17sdf-7luCkv5TaeLj3Z6kIaTDkwkz3VR?usp=share_link">
-<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open Quick Start In Colab"/>
-</a>
-
-## Method (TL;DR)
-- Patching: Split a context signal of length $w$ into $P_{num} = w / P_{len}$ patches of length $P_{len}$.
-- RevIN: Normalize patches using causal running mean/variance over past patches, and denormalize outputs to the original scale.
-- Architecture: Input residual MLP → stacked Transformer blocks (MHA + SwiGLU FFN, pre-norm, residual) → $|\mathcal{Q}|$ output heads mapping back to patch space.
-- Positional encoding: Rotary Position Embeddings (RoPE) applied to queries/keys.
-- Training: Multi-quantile (pinball) loss across positions, elements, and quantiles $\mathcal{Q}$.
-- Inference: Predict next patch; roll out autoregressively with KV caching for long horizons.
-
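The Patching and RevIN bullets above boil down to a reshape plus running statistics. A minimal sketch, assuming a `(batch, time)` input whose length is a multiple of $P_{len}$; the function names and the exact causal statistics are illustrative assumptions, not the repository's `revin.py`:

```python
import torch

def make_patches(x: torch.Tensor, patch_len: int = 32) -> torch.Tensor:
    """Split a (batch, time) signal into (batch, num_patches, patch_len) patches."""
    batch, time = x.shape
    assert time % patch_len == 0, "context length must be a multiple of the patch length"
    return x.view(batch, time // patch_len, patch_len)

def causal_revin(patches: torch.Tensor, eps: float = 1e-5):
    """Normalize each patch with mean/variance over all samples up to the end of that patch."""
    batch, num_patches, patch_len = patches.shape
    flat = patches.reshape(batch, num_patches * patch_len)
    counts = torch.arange(1, flat.shape[1] + 1, dtype=flat.dtype, device=flat.device)
    run_mean = flat.cumsum(dim=1) / counts                        # causal running mean
    run_var = (flat**2).cumsum(dim=1) / counts - run_mean**2      # causal running variance
    last = torch.arange(patch_len - 1, flat.shape[1], patch_len, device=flat.device)
    mean = run_mean[:, last].unsqueeze(-1)                        # (batch, num_patches, 1)
    std = (run_var[:, last].clamp_min(0.0) + eps).sqrt().unsqueeze(-1)
    normed = (patches - mean) / std
    return normed, mean, std  # keep the statistics to denormalize the model's outputs
```

The returned `mean` and `std` are the causal statistics used to put the predicted patches back on the original scale.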
-## Problem Formulation
-Given context patches $x_{p_1}, \ldots, x_{p_n}$, predict the next patch $x_{p_{i+1}}$ for each position $i \le n$ using only past patches (causality). The model outputs quantiles $\{\hat{x}_{p_{i+1}}^{(q)}: q \in \mathcal{Q}\}$ with the median ($q=0.5$) as the point forecast.
-
-## Loss: Multi-Quantile (Pinball)
-For residual $u = x - \hat{x}^{(q)}$:
-$$\rho_q(u) = \begin{cases} q\,u, & u \ge 0,\\ (q-1)\,u, & u < 0. \end{cases}$$
-Aggregate over positions, patch elements, and quantiles.
-
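The pinball loss above translates almost line for line into code. A minimal sketch, assuming predictions are stacked over quantiles in the last dimension; this layout is an assumption, not necessarily how `loss.py` organizes its tensors:

```python
import torch

def multi_quantile_loss(pred: torch.Tensor, target: torch.Tensor, quantiles: list[float]) -> torch.Tensor:
    """Pinball loss averaged over positions, patch elements, and quantiles.

    pred:   (batch, num_patches, patch_len, num_quantiles)  one forecast per quantile
    target: (batch, num_patches, patch_len)                 the true next patches
    """
    q = torch.tensor(quantiles, dtype=pred.dtype, device=pred.device)   # (num_quantiles,)
    u = target.unsqueeze(-1) - pred                   # residual u = x - x_hat^(q)
    rho = torch.where(u >= 0, q * u, (q - 1.0) * u)   # rho_q(u), broadcast over the quantile axis
    return rho.mean()
```

With $\mathcal{Q} = \{0.1, \ldots, 0.9\}$, the $q = 0.5$ head doubles as the point forecast.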
-## Architecture
-- Input MLP: $\mathbb{R}^{P_{len}} \to \mathbb{R}^{dim}$ residual 2-layer MLP (ReLU)
-- Multi-Head Attention: causal mask, RoPE; queries/keys/values per head
-- FFN: SwiGLU (SiLU-gated), pre-norm + residual
-- Output heads: $|\mathcal{Q}|$ linear maps $\mathbb{R}^{dim} \to \mathbb{R}^{P_{len}}$ (one per quantile)
-
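The output heads, $|\mathcal{Q}|$ linear maps from $\mathbb{R}^{dim}$ back to patch space, are the simplest piece to picture in code. A minimal sketch using the model's dimensions as defaults; the module name is hypothetical:

```python
import torch
from torch import nn

class QuantileHeads(nn.Module):
    """|Q| independent linear maps d_model -> patch_len, one per quantile."""
    def __init__(self, d_model: int = 2048, patch_len: int = 32, num_quantiles: int = 9):
        super().__init__()
        self.heads = nn.ModuleList(nn.Linear(d_model, patch_len) for _ in range(num_quantiles))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, num_patches, d_model) -> (batch, num_patches, patch_len, num_quantiles)
        return torch.stack([head(h) for head in self.heads], dim=-1)
```

Its output feeds directly into the multi-quantile loss sketched earlier.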
-### Model Details
-- Patch size: 32
-- Max context: 32 patches (1024 steps)
-- Forecast horizon: 32 steps per forward pass
-- Quantiles $\mathcal{Q}$: {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
-- Layers: 6
-- Attention heads: 64 (head dim 32)
-- Model dim: 2048
-- Parameters: ~300M
-
-## Inference
-- Single step: predict next patch ($P_{len}$ values)
-- Long-horizon: append prediction to context and repeat (optionally drop oldest patch to keep window fixed)
-- KV caching: reuse cached keys/values for past patches; compute new Q/K/V only for the appended patch
-
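The rollout described above is a loop around the single-step model. A minimal sketch of the control flow only, assuming a hypothetical `predict_next_patch` callable that maps a `(batch, time)` context to one `(batch, patch_len)` patch; in the repository, the `Forecaster` performs this rollout and the KV caching internally when given a `forecast_horizon`:

```python
import torch

def rollout(predict_next_patch, context: torch.Tensor, horizon: int,
            patch_len: int = 32, max_context: int = 1024) -> torch.Tensor:
    """Autoregressive forecast: predict a patch, append it, repeat.

    Assumes horizon is a multiple of patch_len.
    """
    preds = []
    ctx = context  # (batch, time)
    for _ in range(horizon // patch_len):
        next_patch = predict_next_patch(ctx)        # (batch, patch_len)
        preds.append(next_patch)
        ctx = torch.cat([ctx, next_patch], dim=1)   # grow the context with the new patch
        if ctx.shape[1] > max_context:              # optionally drop the oldest patch
            ctx = ctx[:, -max_context:]
    return torch.cat(preds, dim=1)                  # (batch, horizon)
```

With KV caching, each iteration computes queries/keys/values only for the newly appended patch and reuses the cached keys/values of earlier patches.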
-## Datasets
-- UTSD (Unified Time Series Dataset) [UTSD]: seven domains (Energy, IoT, Nature, Web, Health, Transport, Environment). We start with UTSD-1G (~55M series after preprocessing).
-- Artificial: ~1M synthetic series (sinusoidal, linear, polynomial, logarithmic) plus mixtures via TSMixup [Chronos]; Gaussian Process samples via KernelSynth (mixtures of RBF/periodic/linear kernels with swept hyperparameters).
-
-## Repository Layout
-
-- `model/training/` — main PatchFM model class
-
-  - `modules.py` — core modules (Residual Layers, MHA, SwiGLU, RoPE, Transformer Encoder, ...)
-  - `revin.py` — causal RevIN
-  - `loss.py` — multi-quantile (pinball) loss
-  - `trainer.py` — PyTorch Lightning trainer class
-
-- `model/inference/` — main PatchFM model class for inference with KV caching
-  - `modules.py` — core modules with caching support
-  - `forecaster.py` — Forecasting model with KV caching and rollout logic
-
-- `dataset/` — data loading and preprocessing
-  - `artificial.py` — synthetic dataset: artificial signals + TSMixup + KernelSynth
-  - `utsd.py` — Unified Time Series Dataset (UTSD) loading and preprocessing
-  - `get_data.py` — utility to fetch and preprocess datasets
-  - `generate_data.py` — utility to generate and save the KernelSynth dataset (slow to generate)
-
-- `configs/` — model and training configurations
-- `notebooks/inference` — how to load a trained model and generate forecasts
-- `training.py` — training script using PyTorch Lightning
-
-## Acknowledgements
-We thank the authors of the following repositories for inspiration and code snippets:
-- [TiRex](https://github.com/NX-AI/tirex)
+- Code: [More Information Needed]
+- Paper: [More Information Needed]
+- Docs: [More Information Needed]

config.json CHANGED
@@ -1,8 +1,9 @@
 {
-"ckpt_path": "ckpts/
+"ckpt_path": "ckpts/huge_v8_12g_5000.pth",
 "compile": true,
 "d_model": 2048,
-"load_from_hub":
+"load_from_hub": false,
+"max_seq_len": 1024,
 "n_heads": 64,
 "n_layers_encoder": 6,
 "patch_len": 32,

model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:8aa576f038e409ce4f620cd1f31d6d7b1f2f7f55c07bcdb2c569603dc4465bf2
 size 1275009880