Weather + park + pitcher model vs. Kalshi Over/Under ladder. Phase 2: dry run (signals observed, no trades).
Post-4/21 analysis shows the model's win rate (WR) is solid across all buckets (60-70%), but the full-game buckets are losing money because entry prices exceed what the edge supports. A counterfactual backtest of the 4/24 pitcher-threshold fix showed no P&L improvement on 4/21-4/23, confirming that the losses are price-driven, not model-driven.
Working as designed. F5 OVER is the cleanest cell (75%+ WR).
Needs ~60% WR to break even, which is within the model's actual 69% WR. OK.
Needs 70%+ WR to break even. Historical WR is 69%. Skip.
50% historical WR: a coin flip being priced as a favorite. Skip.
Signals still fire at all prices; this is informational guidance while we collect more data.
v2.2 agreement: two EV-positive cells (n=343 backfill, 4/13-4/24)
Backfilling all historical signals against the v2.2 model (offense + home/road splits) revealed an asymmetric pattern. Both EV-positive cells now get a mauve badge on the signal pill.
Counterintuitive but real: on UNDER signals, v1's call is at its best when v2.2 disagrees (i.e., v2.2 expects more runs than v1). The mauve ✗ v2.2 badge marks these. UNDER signals where v2.2 agrees are the worst cell in the entire model lineup: a 64% WR isn't enough at the average entry price.
Why? UNDER alpha lives in volatile pitcher matchups, where v1 (which has no offense factor) nails the call but v2.2's offense factor pushes the projection up. When both models agree on UNDER, it's a low-variance game and the price already reflects that, leaving no edge.
| Game | Time | Probable Pitchers | Env | Expected | Moneyline | Total Ladder | Signal | Link |
|---|---|---|---|---|---|---|---|---|
1. Expected Total
Base total (8.6 runs for the full game, FG; 4.2 for the first five innings, F5) × park factor × temperature × wind × pitcher quality (a composite of ERA, WHIP, K/9, and BAA). F5 applies a 1.5× pitcher amplification because starters carry 100% of F5, with no bullpen dilution. Domes mute weather; retractable roofs apply it at half strength.
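A minimal sketch of step 1, assuming every factor is a multiplier centered on 1.0. The text does not say exactly how the 1.5× F5 amplification or the dome/retractable muting is applied; this sketch assumes each one scales the multiplier's distance from neutral (1.0). The function and constant names are hypothetical, and the composite pitcher factor is taken as a precomputed input.

```python
BASE_TOTAL = {"FG": 8.6, "F5": 4.2}  # base runs, from the description
# Assumption: open air = full weather, retractable roof = half, dome = none.
WEATHER_STRENGTH = {"open": 1.0, "retractable": 0.5, "dome": 0.0}
F5_PITCHER_AMPLIFICATION = 1.5


def scale_deviation(factor: float, strength: float) -> float:
    """Stretch or shrink a multiplier's distance from neutral (1.0)."""
    return 1.0 + (factor - 1.0) * strength


def expected_total(segment: str, park: float, temperature: float, wind: float,
                   pitcher: float, roof: str = "open") -> float:
    """Expected runs for segment "FG" or "F5" (hypothetical helper)."""
    strength = WEATHER_STRENGTH[roof]
    temperature = scale_deviation(temperature, strength)
    wind = scale_deviation(wind, strength)
    if segment == "F5":
        # Assumption: F5 amplifies the pitcher factor's deviation by 1.5x.
        pitcher = scale_deviation(pitcher, F5_PITCHER_AMPLIFICATION)
    return BASE_TOTAL[segment] * park * temperature * wind * pitcher
```

For example, `expected_total("F5", park=1.1, temperature=1.03, wind=1.0, pitcher=0.92, roof="retractable")` halves the weather effects and stretches the pitcher effect before multiplying everything onto the 4.2-run F5 base.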
2. Probability
Normal distribution centered on the expected total (σ = 3.0 full game, 1.9 F5). P(Over X.5) = 1 − Φ((X.5 − mean) / σ), where Φ is the standard normal CDF.
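The step-2 formula can be sketched in plain Python, computing Φ from the standard library's error function. The names `p_over` and `SIGMA` are hypothetical; only the σ values and the formula come from the description.

```python
import math

SIGMA = {"FG": 3.0, "F5": 1.9}  # standard deviations from step 2


def normal_cdf(z: float) -> float:
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_over(line: float, mean: float, segment: str) -> float:
    """P(total > line) for a half-run line such as 8.5."""
    return 1.0 - normal_cdf((line - mean) / SIGMA[segment])
```

With a 9.0-run full-game expectation, `p_over(8.5, 9.0, "FG")` comes out to about 0.566; a line exactly at the mean gives 0.5.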
3. Edge
Per threshold, edge = ourProb − yesAsk (Over) or (1 − ourProb) − noAsk (Under), where ourProb is the model's P(Over). A signal fires when edge ≥ 5pp and the entry price is between 15¢ and 85¢.
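A sketch of the step-3 filter for a single threshold, with prices expressed in dollars (0.42 = 42¢). It assumes the 15-85¢ band is inclusive and applies to the ask on the side being bought; the function name and the small float tolerance are hypothetical additions.

```python
EDGE_MIN = 0.05                      # 5pp minimum edge
PRICE_MIN, PRICE_MAX = 0.15, 0.85    # 15-85 cent band, assumed inclusive
EPS = 1e-9                           # absorbs float rounding, e.g. 0.60 - 0.55


def fired_signals(our_prob_over: float, yes_ask: float,
                  no_ask: float) -> list[tuple[str, float]]:
    """Return (side, edge) pairs that pass the edge and price filters."""
    candidates = [
        ("over", our_prob_over - yes_ask, yes_ask),
        ("under", (1.0 - our_prob_over) - no_ask, no_ask),
    ]
    return [
        (side, edge)
        for side, edge, price in candidates
        if edge >= EDGE_MIN - EPS and PRICE_MIN <= price <= PRICE_MAX
    ]
```

Without the tolerance, a 60% model probability against a 55¢ ask would evaluate to an edge a hair under 0.05 in floating point and silently fail the 5pp check.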
4. Data
Schedule, weather, and linescore come from statsapi.mlb.com. Pitcher hand plus season ERA, WHIP, K/9, BAA, and IP come from the batched people endpoint. Kalshi markets come from 7 MLB series.
On binary contracts, every loss costs the same ($100 on a $100 stake) regardless of entry price. But wins pay out inversely to what you paid. This asymmetry means entry price matters as much as edge percentage when choosing between signals.
| Entry | Profit on Win | Loss | Payout Ratio | Break-even WR |
|---|---|---|---|---|
| 25¢ | +$300 | −$100 | 3 : 1 | 25% |
| 35¢ | +$186 | −$100 | 1.9 : 1 | 35% |
| 50¢ | +$100 | −$100 | 1 : 1 | 50% |
| 65¢ | +$54 | −$100 | 0.5 : 1 | 65% |
| 80¢ | +$25 | −$100 | 0.25 : 1 | 80% |
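The rows of the table above can be reproduced with a few lines of arithmetic. This sketch assumes a $100 stake, contracts that pay $1 on a win, fractional contracts, and no Kalshi trading fees; `payout` is a hypothetical helper name.

```python
def payout(entry_cents: int, stake: float = 100.0) -> dict[str, float]:
    """Win profit, loss, payout ratio, and break-even WR for one entry price."""
    price = entry_cents / 100
    contracts = stake / price            # each contract pays $1 if it wins
    win_profit = contracts - stake
    return {
        "win_profit": win_profit,
        "loss": -stake,
        "payout_ratio": win_profit / stake,
        # p * win_profit = (1 - p) * stake simplifies to p = price
        "breakeven_wr": price,
    }
```

`payout(26)["win_profit"]` is about +$285 and `payout(55)["win_profit"]` about +$82, which makes it easy to check any entry price that is not in the table.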
When picking between signals, a 10pp edge at 40¢ is worth materially more than a 10pp edge at 70¢. A win at 40¢ pays +$150 versus +$43 at 70¢ (about 3.5× as much), with the same $100 downside. Weight entry price alongside your baseball read.
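One way to compare signals on a common footing is expected profit per $100 staked. This sketch assumes the model probability is correct (so each contract's expected value equals its edge) and ignores fees; `ev_per_stake` is a hypothetical name.

```python
def ev_per_stake(edge: float, price: float, stake: float = 100.0) -> float:
    """Expected profit in dollars for a `stake`-dollar position.

    edge  = model probability minus entry price (e.g. 0.10 for 10pp)
    price = entry price in dollars (e.g. 0.40 for 40 cents)
    """
    contracts = stake / price   # cheaper entries buy more contracts
    return contracts * edge     # each contract's expected value is the edge
```

At a 10pp edge this gives $25 per $100 at 40¢ versus about $14.29 at 70¢. The expected-value gap (1.75×) is smaller than the win-payout gap, but it still favors the cheaper entry.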
Why P&L concentrates: a single win at 26¢ (+$285) produces more P&L than three wins at 55¢ combined (+$82 each, $246 total). The top games in the P&L table almost always have the cheapest entries, not the highest edge.
The trap: cheap entries (<35¢) win less often, and Kalshi is pricing them cheap for a reason. The edge has to be large enough to overcome the lower base rate. An 8pp edge at 30¢ is better math than a 12pp edge at 70¢, but only if the model's probability estimate is actually accurate in that tail.
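The "only if the model is accurate in the tail" caveat can be quantified. Expected profit per dollar staked is edge / price (fees ignored). This sketch assumes the expensive signal's probability is well calibrated and asks how many points the model could be overstating the cheap side's probability before the cheap signal stops being the better bet. The helper name is hypothetical.

```python
def cheap_signal_tolerance(cheap_edge: float, cheap_price: float,
                           rich_edge: float, rich_price: float) -> float:
    """Probability overstatement on the cheap side at which both signals tie.

    Solves (cheap_edge - h) / cheap_price == rich_edge / rich_price for h,
    assuming the rich (expensive) signal's edge is accurate.
    """
    rich_rate = rich_edge / rich_price   # expected profit per dollar staked
    return cheap_edge - cheap_price * rich_rate
```

For an 8pp edge at 30¢ against a 12pp edge at 70¢, `cheap_signal_tolerance(0.08, 0.30, 0.12, 0.70)` is about 0.029. If the model overstates its 30¢ probability by more than roughly 3pp, the 70¢ signal is the better bet.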