An experiment in cross-market data fusion. A reinforcement learning agent trained to trade Polymarket's 15-minute crypto prediction markets by fusing Binance futures order flow with Polymarket orderbook data.
The thesis: read the "fast" market (Binance) and trade the "slow" market (Polymarket) before the price adjusts.
Note: This represents ~10 hours of paper trading data from a single run on New Year's Eve 2025. The model traded with a $500 fixed position size and $2,000 max exposure (up to 4 concurrent positions).
Results
| Metric | Value |
|---|---|
| Total PnL | $50,195 |
| Return on Exposure | 2,510% |
| Sharpe Ratio | 4.13 |
| Profit Factor | 1.21 |
| Total Trades | 29,176 |
| Win Rate | 23.9% |
| Runtime | ~10 hours |
Learning Progression
Comparing first 25% vs last 25% of trades:
| Phase | Avg PnL/Trade | Win Rate |
|---|---|---|
| First 25% | +$1.27 | 22.5% |
| Last 25% | +$3.56 | 25.3% |
2.8x improvement in avg PnL per trade. Last 25% of trades generated 52% of total profit.
Limitations: Single 10-hour run. No out-of-sample validation. Results could reflect market regime, not learned behavior. We're sharing the raw data—draw your own conclusions.
Performance by Asset
| Asset | PnL | Trades | Win Rate |
|---|---|---|---|
| BTC | +$38,794 | 8,257 | 32.6% |
| ETH | +$9,978 | 7,859 | 27.0% |
| SOL | +$1,752 | 6,310 | 16.3% |
| XRP | -$328 | 6,750 | 16.7% |
Architecture
LACUNA (v5) uses a temporal PPO architecture:
- Temporal Encoder: Sees last 5 states instead of just the present
- Asymmetric Actor-Critic: Separate networks for policy and value
- Feature Normalization: Stabilizes training across different market conditions
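The arrangement can be sketched in a few lines of PyTorch. This is a minimal illustration only, with hypothetical layer sizes (the real dimensions are in config.json): the temporal encoder flattens the last 5 observations, and the actor and critic get separately sized heads.

```python
import torch
import torch.nn as nn

class TemporalActorCritic(nn.Module):
    """Sketch of a temporal, asymmetric actor-critic. Layer sizes are assumptions."""

    def __init__(self, obs_dim=18, history=5, n_actions=3):
        super().__init__()
        # Temporal encoder: embed the flattened 5-state history instead of a single frame.
        self.encoder = nn.Sequential(nn.Linear(obs_dim * history, 128), nn.ReLU())
        # Asymmetric heads: a smaller policy network, a larger value network.
        self.actor = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, n_actions))
        self.critic = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, obs_history):
        # obs_history: (batch, history, obs_dim) of normalized observations.
        z = self.encoder(obs_history.flatten(start_dim=1))
        return self.actor(z), self.critic(z)
```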
Model Constraints
- Fixed position size: $500 per trade
- Max exposure: $2,000 (up to 4 concurrent positions)
- Markets: 15-minute crypto prediction markets (BTC, ETH, SOL, XRP)
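These constraints imply a simple exposure guard before each entry; a sketch (variable and function names are illustrative, not from the actual code):

```python
POSITION_SIZE = 500.0   # fixed dollar size per trade
MAX_EXPOSURE = 2_000.0  # at most 4 concurrent $500 positions

def can_open_position(num_open_positions: int) -> bool:
    """Return True if opening one more $500 position stays within max exposure."""
    return (num_open_positions + 1) * POSITION_SIZE <= MAX_EXPOSURE
```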
Observation Space (18 dimensions)
Fuses data from two sources into an 18-dimensional state:
| Category | Features |
|---|---|
| Momentum | 1m/5m/10m returns |
| Order flow | L1/L5 imbalance, trade flow, CVD acceleration |
| Microstructure | Spread %, trade intensity, large trade flag |
| Volatility | 5m vol, vol expansion ratio |
| Position | Has position, side, PnL, time remaining |
| Regime | Vol regime, trend regime |
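A minimal sketch of how such an 18-dimensional state might be assembled from the six feature groups above (the group lengths are inferred from the table; the actual feature ordering is defined by the released normalization_stats.npz):

```python
import numpy as np

def build_observation(momentum, order_flow, microstructure, volatility, position, regime):
    """Concatenate the six feature groups into one 18-dim state vector.

    Assumed lengths per the table: momentum=3, order_flow=4, microstructure=3,
    volatility=2, position=4, regime=2.
    """
    obs = np.concatenate([momentum, order_flow, microstructure,
                          volatility, position, regime]).astype(np.float32)
    assert obs.shape == (18,)
    return obs
```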
Training Evolution
Five phases over three days. Each taught us something. Only the last earned a name.
Phase 1: Shaped Rewards (Failed)
Duration: ~52 min | Trades: 1,545 | Result: Policy collapse
Started with micro-bonuses to guide learning:
- +0.002 for trading with momentum
- +0.001 for larger positions
- -0.001 for fighting momentum
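For concreteness, a minimal sketch of what such a shaped reward might have looked like (the bonus magnitudes come from the list above; the momentum and position-size inputs are placeholder names, and the size-bonus scaling is an assumption):

```python
def shaped_reward(realized_pnl, traded_with_momentum, fought_momentum, position_size, max_size):
    """Phase 1-style reward: realized PnL plus gameable micro-bonuses (illustrative)."""
    reward = realized_pnl
    if traded_with_momentum:
        reward += 0.002                              # bonus for trading with momentum
    if fought_momentum:
        reward -= 0.001                              # penalty for fighting momentum
    reward += 0.001 * (position_size / max_size)     # bonus for larger positions
    return reward
```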
What happened: Entropy collapsed from 1.09 → 0.36. The agent learned to game the reward function, collecting bonuses while ignoring actual profitability. The buffer showed a 90% win rate while the real trade win rate was 20%.
Lesson: Reward shaping backfired here. When the shaping rewards were gameable and of similar magnitude to the real signal, the agent optimized the wrong thing.
Phase 2: Pure Realized PnL
Duration: ~1 hour | Trades: 2,000+ | Result: 55% ROI
Stripped everything back:
- Reward ONLY on position close
- Increased entropy coefficient (0.05 → 0.10)
- Simplified actions (7 → 3)
- Smaller buffer (2048 → 512)
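As a rough sketch, the Phase 2 changes amount to something like the following (key names follow common PPO conventions and are assumptions, not the actual config keys):

```python
# Hypothetical Phase 2 settings relative to Phase 1 (names are illustrative).
phase2_config = {
    "reward": "realized_pnl_on_close",  # no shaping bonuses; reward only when a position closes
    "entropy_coef": 0.10,               # up from 0.05 to keep exploration alive
    "num_actions": 3,                   # down from 7
    "buffer_size": 512,                 # down from 2048 for faster policy updates
}
```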
| Update | Entropy | PnL | Win Rate |
|---|---|---|---|
| 1 | 0.68 | $5.20 | 33.3% |
| 36 | 1.05 | $10.93 | 21.2% |
Win rate settled at 21%, below random (33%), but the phase was still profitable: binary markets have asymmetric payoffs, so a cheap winner can pay out several times what a loser costs. (Still using probability-based PnL at this point.)
Phase 3: Scaled Up ($50 trades)
Duration: ~50 min | Trades: 4,133 | Result: -$64 → +$23
First update hit -$64 drawdown. But the agent recovered:
| Update | PnL | Win Rate |
|---|---|---|
| 1 | -$63.75 | 29.5% |
| 36 | +$23.10 | 15.6% |
Observation: The agent recovered from -$64 to +$23 without policy collapse.
Phase 4: Share-Based PnL ($500 trades)
Duration: ~1 hour | Trades: 4,873 | Result: 170% ROI
Changed reward signal to reflect actual market economics:
```python
# Old: probability-based
pnl = (exit_price - entry_price) * dollars

# New: share-based
shares = dollars / entry_price
pnl = (exit_price - entry_price) * shares
```
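To see why the change matters, consider a hypothetical trade: $500 entering an outcome at 20¢ that exits at 30¢.

```python
dollars, entry_price, exit_price = 500.0, 0.20, 0.30

prob_pnl = (exit_price - entry_price) * dollars    # $50: old, probability-based signal
shares = dollars / entry_price                     # 2,500 shares
share_pnl = (exit_price - entry_price) * shares    # $250: new, share-based signal
```

Cheap entries are rewarded far more strongly under the share-based signal, which is how binary markets actually pay out.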
| Update | PnL | Win Rate |
|---|---|---|
| 1 | -$197 | 18.9% |
| 20 | -$465 | 18.5% |
| 46 | +$3,392 | 19.0% |
4.5x improvement over Phase 3's reward signal.
Phase 5: LACUNA (Final)
Duration: ~10 hours | Trades: 29,176 | Result: 2,510% ROI
Architecture rethink:
- Temporal encoder: 5-state history instead of single-frame
- Asymmetric actor-critic: Separate network capacities
- Feature normalization: Stable across market regimes
It started with a big loss. Seemed broken. Left it running on New Year's Eve while counting down to midnight—not out of hope, just neglect.
Checked back hours later. The equity curve had inflected. By morning: +$50,195.
Only this version earned a name.
Observed Patterns
These patterns emerged in the data. Whether they represent learned behavior or market regime effects is unclear without further validation.
| Pattern | Observation |
|---|---|
| Low volatility preference | $4.07/trade on calm markets vs -$1.44 on volatile |
| Cheap outcome bias | Cheap entries (<30¢) yield $8.63/trade vs $1.53 for expensive |
| DOWN momentum | 77% of trades bet DOWN when prob is falling |
| Short hold times on winners | 0.35x hold time vs losers |
These could reflect genuine learned strategies or simply profitable patterns in this specific market window.
What We Observed
- Reward shaping backfired - Phase 1 collapsed when the agent gamed micro-bonuses. Pure realized PnL worked better for us.
- Reward signal design mattered - Share-based PnL outperformed probability-based by 4.5x. Match actual market economics.
- Entropy coefficient mattered - 0.05 caused policy collapse; 0.10 maintained exploration.
- Buffer/trade divergence was a warning sign - When buffer win rate diverged from actual trades, the agent was optimizing the wrong thing.
- Give it time - LACUNA started deep in the red. Early performance wasn't indicative.
The Story
This is our final checkpoint. We're done experimenting with LACUNA, but you don't have to be.
Usage
```python
import torch
from safetensors.torch import load_file
import numpy as np
import json

# Load model weights
weights = load_file("model.safetensors")

# Load normalization stats (for preprocessing observations)
stats = np.load("normalization_stats.npz")
obs_mean = stats["obs_mean"]
obs_std = stats["obs_std"]

# Load config for architecture details
with open("config.json") as f:
    config = json.load(f)

# Normalize observations before inference
def normalize_obs(obs):
    return (obs - obs_mean) / (obs_std + 1e-8)
```
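A quick sanity check after loading, as a sketch only (the exact tensor names and config fields depend on the released files):

```python
# Peek at the architecture details and the first few weight tensors.
print(config)
print(list(weights.keys())[:5])

# Normalize a dummy 18-dimensional observation as a smoke test.
dummy_obs = np.zeros(18, dtype=np.float32)
print(normalize_obs(dummy_obs))
```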
Files
- README.md - This documentation
- config.json - Model configuration and architecture details
- model.safetensors - Model weights in SafeTensors format
- normalization_stats.npz - Observation normalization statistics
- trades.csv - All 29,176 trades with full details
- updates.csv - Training updates with metrics over time
Links
- Live Results - Interactive visualization
- Training Code - GitHub repository
License
MIT
Citation
```bibtex
@misc{lacuna2025,
  author = {HumanPlane},
  title = {LACUNA: Cross-Market Data Fusion for Prediction Market Trading},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/HumanPlane/LACUNA}
}
```