Architecture Compiler v1


Interactive Frontier

Baseline-aware heuristic mode starts from a known dense Transformer and searches local hardware-fit moves that hold quality constant: parallelism, layout, kernels, cache policy, and scheduling. Shape, GQA, KV quantization, and FP8 changes are optional moves that spend quality.
+0.00% optional loss proxy
Scales the baseline (memory, KV cache, TBT) to the chosen deployment regime so that the bottleneck card surfaces the axes the modifier can actually relieve.

Baseline Delta Frontier

Pick any baseline architecture, hardware target, and workload regime, then choose one or more deltas from the transformation library on the left. The result panel on the right evaluates each delta against the baseline and classifies it against a local heuristic reference set built in the browser.
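The classification step described above can be sketched as a Pareto-dominance test. This is a minimal illustration, not the tool's actual logic: the two axes (quality_loss, latency_ms), the Point type, and the three labels are assumptions chosen for the example.

```python
from typing import NamedTuple

class Point(NamedTuple):
    name: str
    quality_loss: float  # hypothetical loss proxy, lower is better
    latency_ms: float    # hypothetical cost axis, lower is better

def dominates(a: Point, b: Point) -> bool:
    """a dominates b if it is no worse on both axes and strictly better on one."""
    no_worse = a.quality_loss <= b.quality_loss and a.latency_ms <= b.latency_ms
    strictly_better = a.quality_loss < b.quality_loss or a.latency_ms < b.latency_ms
    return no_worse and strictly_better

def classify(candidate: Point, reference: list[Point]) -> str:
    """Label a candidate delta relative to a local reference set."""
    if any(dominates(r, candidate) for r in reference):
        return "dominated"
    if any(dominates(candidate, r) for r in reference):
        return "frontier-improving"
    return "non-dominated"

# Synthetic reference set, for illustration only.
reference = [Point("baseline", 0.0, 10.0), Point("delta-a", 0.2, 8.0)]
print(classify(Point("delta-b", 0.1, 7.0), reference))
```

A real reference set would carry more axes (memory, throughput, TBT), but the dominance test generalizes by comparing every axis in the same way.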

Delta Library

One chip → that delta alone. Multiple chips → deltas stack into a combined evaluation. Hybrid components: pick at most one per category (Type / Placement / Ratio).
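The selection rule above (one chip alone, several chips stacked, at most one hybrid chip per category) can be expressed as a small validator. The function name and the (name, category) tuple shape are hypothetical; only the three category names come from the page.

```python
from collections import Counter

HYBRID_CATEGORIES = {"Type", "Placement", "Ratio"}

def validate_selection(chips):
    """chips: list of (delta_name, category_or_None) pairs.

    Returns "single" or "stacked"; raises ValueError if the selection is
    empty or two hybrid chips share a category.
    """
    if not chips:
        raise ValueError("select at least one delta")
    counts = Counter(cat for _, cat in chips if cat in HYBRID_CATEGORIES)
    clashes = sorted(cat for cat, n in counts.items() if n > 1)
    if clashes:
        raise ValueError("more than one chip selected in: " + ", ".join(clashes))
    return "single" if len(chips) == 1 else "stacked"

# Placeholder delta names, for illustration only.
print(validate_selection([("delta-a", None)]))
print(validate_selection([("hybrid-x", "Type"), ("hybrid-y", "Ratio")]))
```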

Constraint Perturbation

Architecture Dimension Perturbation

What happens if you perturb each dimension of the optimal architecture? Each row shows the marginal cost/benefit of a single change.
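A one-at-a-time perturbation table of this kind can be sketched as follows. The metric here is a rough parameter count used purely as a stand-in; the tool's real cost/benefit measure is not stated on this page, and the step sizes are arbitrary.

```python
def dense_params(d_model, ffn_dim, n_layers):
    # Stand-in metric (assumption): per layer, 4*d^2 for attention projections
    # (MHA, no biases) plus 3*d*ffn for a gated FFN. Embeddings are ignored.
    return n_layers * (4 * d_model**2 + 3 * d_model * ffn_dim)

def perturb(config, steps, metric):
    """Change one dimension at a time and report the relative metric change."""
    base = metric(**config)
    rows = []
    for dim, step in steps.items():
        changed = dict(config, **{dim: config[dim] + step})
        rows.append((dim, step, metric(**changed) / base - 1.0))
    return rows

config = {"d_model": 4096, "ffn_dim": 16384, "n_layers": 32}
steps = {"d_model": 128, "ffn_dim": 256, "n_layers": 1}
for dim, step, rel in perturb(config, steps, dense_params):
    print(f"{dim:>8} {step:+5d} {rel:+.2%}")
```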

Available for 2T token configurations. Constraint perturbation requires optimizer re-runs and is computed for select configurations.

Tile-Aligned Architecture Lattice

Hardware-aware transformer dimensions for H100, B200, and TPU v5e: every efficient (d_model, d_head, n_heads, FFN_dim) combination at each precision and TP degree.

Lattice Browser
Config Calculator
GQA Configs
MoE Sizing
State Dims
Cross-Precision
Validation
Find efficient architectures for a target parameter count. Enumerates all lattice-aligned configurations, computes n_layers to hit target params, and ranks by composite efficiency score.
Columns: Rank, d_model, n_heads, d_head, FFN dim, n_kv_heads, n_layers, Params (B), Tile Util, Wave @2K, Wave @8K, Score
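The Config Calculator procedure (enumerate aligned configurations, solve for n_layers, rank) can be sketched roughly as below. Everything numeric is an assumption: a single tile size of 128, FFN dim fixed at 4 * d_model, a gated FFN, no embedding parameters, and closeness to the target parameter count standing in for the composite efficiency score, whose definition is not given here.

```python
def per_layer_params(d_model, n_heads, n_kv_heads, d_head, ffn_dim):
    q_and_o = 2 * d_model * n_heads * d_head      # Q and output projections
    k_and_v = 2 * d_model * n_kv_heads * d_head   # K and V projections
    ffn = 3 * d_model * ffn_dim                   # gated FFN (assumption)
    return q_and_o + k_and_v + ffn

def enumerate_configs(target_params, tile=128, d_heads=(64, 128), max_d_model=8192):
    """List tile-aligned configs with n_layers chosen to approach target_params."""
    out = []
    for d_model in range(tile, max_d_model + 1, tile):
        for d_head in d_heads:
            if d_model % d_head:
                continue
            n_heads = d_model // d_head
            ffn_dim = 4 * d_model  # multiple of tile whenever d_model is
            per_layer = per_layer_params(d_model, n_heads, n_heads, d_head, ffn_dim)
            n_layers = max(1, round(target_params / per_layer))
            total = n_layers * per_layer
            out.append({
                "d_model": d_model, "n_heads": n_heads, "d_head": d_head,
                "ffn_dim": ffn_dim, "n_layers": n_layers, "params": total,
                "rel_error": abs(total - target_params) / target_params,
            })
    # Stand-in ranking: closest to the target first.
    return sorted(out, key=lambda c: c["rel_error"])

for cfg in enumerate_configs(7_000_000_000)[:3]:
    print(cfg)
```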
Valid (n_heads, n_kv_heads) pairs for each d_model and TP degree, showing which GQA ratios are tile-aligned.
Columns: n_heads, n_kv_heads, GQA Ratio, KV Proj / GPU, Aligned
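One way to enumerate valid GQA pairs is sketched below. It assumes a tile width of 128, that both head counts must divide evenly across the TP group (KV-head replication is ignored), and that "KV Proj / GPU" means the per-GPU output width of the K projection; none of these definitions are confirmed by the page.

```python
def gqa_pairs(d_model, d_head, tp, tile=128):
    """Return (n_heads, n_kv_heads, gqa_ratio, kv_cols_per_gpu, aligned) rows."""
    n_heads = d_model // d_head
    rows = []
    for n_kv in range(1, n_heads + 1):
        # Query heads must group evenly over KV heads, and both must shard over TP.
        if n_heads % n_kv or n_kv % tp or n_heads % tp:
            continue
        kv_cols_per_gpu = (n_kv // tp) * d_head
        rows.append((n_heads, n_kv, n_heads // n_kv,
                     kv_cols_per_gpu, kv_cols_per_gpu % tile == 0))
    return rows

for row in gqa_pairs(d_model=4096, d_head=64, tp=8):
    print(row)
```

With d_head = 64 and TP = 8, the pair with 8 KV heads leaves only 64 columns per GPU, which falls short of the assumed 128-wide tile; that is the kind of case the Aligned column flags.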
Tile-aligned expert FFN dimensions for MoE architectures. Each expert's matmul is a separate kernel invocation — per-expert alignment is critical.
Columns: n_experts, Expert FFN dim, Total FFN equiv, Aligned
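Because each expert runs its own matmul, the alignment check applies to the per-expert width rather than the total. A minimal sketch, assuming a tile of 128 and that "Total FFN equiv" is n_experts times the expert FFN dim:

```python
def round_up(x, tile):
    """Smallest multiple of tile that is >= x."""
    return -(-x // tile) * tile

def moe_rows(total_ffn_target, expert_counts, tile=128):
    """Split a target total FFN width across experts and align each expert."""
    rows = []
    for n in expert_counts:
        raw = -(-total_ffn_target // n)  # ceil division
        aligned_dim = round_up(raw, tile)
        rows.append({
            "n_experts": n,
            "raw_expert_ffn": raw,
            "raw_aligned": raw % tile == 0,
            "aligned_expert_ffn": aligned_dim,
            "total_ffn_equiv": n * aligned_dim,
        })
    return rows

for row in moe_rows(14336, (8, 12, 16)):
    print(row)
```

In this example, 12 experts give a raw width of 1195, which has to be padded to 1280 per expert, so the aligned total grows from 14336 to 15360.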
Valid d_state values for state mechanisms (Mamba-2 / structured SSM). Each value must be tile-aligned for the state update matmul, and the per-head state must fit in SRAM.
Columns: d_state, d_head, SRAM / head (bytes), Aligned
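The two conditions (tile alignment and SRAM fit) can be checked together. This sketch assumes the per-head state is a d_head x d_state matrix, a 16-element tile, 2-byte elements, and a 64 KiB per-head SRAM budget; all four values are placeholders rather than figures from the tool.

```python
def state_dim_rows(d_head, d_states=(16, 32, 64, 128, 256, 512),
                   bytes_per_elem=2, tile=16, sram_budget_bytes=64 * 1024):
    """Return (d_state, d_head, sram_bytes_per_head, ok) rows."""
    rows = []
    for d_state in d_states:
        sram = d_state * d_head * bytes_per_elem  # assumed state footprint
        ok = d_state % tile == 0 and sram <= sram_budget_bytes
        rows.append((d_state, d_head, sram, ok))
    return rows

for row in state_dim_rows(d_head=128):
    print(row)
```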
For mixed-precision architectures (e.g. BF16 attention + FP8 FFN), dimensions must satisfy tile alignment for both precisions simultaneously. The intersection lattice is sparser than either single-precision lattice.
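If each precision's lattice is modeled as the multiples of a single tile size, the intersection is the set of multiples of the least common multiple, which is why it is sparser. The tile sizes below are hypothetical, and real lattices may constrain several dimensions at once.

```python
import math

def intersection_lattice(tile_a, tile_b, limit):
    """Dimensions up to limit that are multiples of both tile sizes."""
    step = math.lcm(tile_a, tile_b)  # requires Python 3.9+
    return list(range(step, limit + 1, step))

# Hypothetical tile sizes for two precisions.
print(intersection_lattice(64, 128, 512))
print(intersection_lattice(96, 128, 1024))
```

When one tile size divides the other, the intersection is simply the coarser lattice; when they are not nested (96 and 128 here), valid dimensions become much rarer.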
Tile alignment check for known production architectures. Verifies whether each dimension satisfies CTA-level alignment.
Columns: Architecture, d_model, d_head, n_heads, FFN dim, n_layers, d_model%K, d_model%N, d_head%K, d_head%N, FFN%N, Status
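The remainder columns in this validation table can be reproduced with plain modulo arithmetic. The tile sizes (K = 64, N = 128) and the status labels are assumptions for the sketch; the actual CTA tile shape depends on hardware, precision, and kernel.

```python
def alignment_report(d_model, d_head, ffn_dim, tile_k=64, tile_n=128):
    """Compute each remainder column and an overall status."""
    remainders = {
        "d_model%K": d_model % tile_k,
        "d_model%N": d_model % tile_n,
        "d_head%K": d_head % tile_k,
        "d_head%N": d_head % tile_n,
        "FFN%N": ffn_dim % tile_n,
    }
    status = "aligned" if all(v == 0 for v in remainders.values()) else "misaligned"
    return {**remainders, "Status": status}

# Example dimensions chosen for illustration.
print(alignment_report(d_model=4096, d_head=128, ffn_dim=14336))
print(alignment_report(d_model=4096, d_head=128, ffn_dim=11000))
```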

Throughput Model

Hardware-aware architecture throughput estimation — dense transformer with GQA, across H100, B200, TPU v5e, and TPU v5p.

Comparison
Layer Breakdown
Decode Analysis
Validation
Cross-Hardware Comparison
Columns: Hardware, Train tok/s, Prefill ms, Decode tok/s, Memory GB, Bottleneck
Training Throughput by Architecture (H100)
Inference TBT (Time Between Tokens) by Architecture (H100)
Inference TTFT (Time To First Token) by Architecture (H100)

Per-Layer Time Breakdown (Training)

Training Operation Costs

Per-Layer Time Breakdown (Prefill)

Per-Layer Time Breakdown (Decode)

Decode Latency vs KV Cache Length

GQA Impact on Decode
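The relationship behind both decode charts can be illustrated with the standard KV cache size formula and a bandwidth-bound lower bound on step time. This is a simplified sketch, not the page's throughput model: it assumes decode is purely memory-bound, batch size 1, and 2-byte cache elements, and the bandwidth value must be supplied by the caller.

```python
def kv_cache_bytes(n_layers, n_kv_heads, d_head, seq_len, bytes_per_elem=2):
    """K and V tensors for one sequence across all layers."""
    return 2 * n_layers * n_kv_heads * d_head * seq_len * bytes_per_elem

def decode_step_lower_bound_s(weight_bytes, kv_bytes, mem_bw_bytes_per_s):
    """Each decode step must read the weights and the KV cache at least once."""
    return (weight_bytes + kv_bytes) / mem_bw_bytes_per_s

mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, d_head=128, seq_len=8192)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, d_head=128, seq_len=8192)
print(mha / 2**30, "GiB with 32 KV heads")
print(gqa / 2**30, "GiB with 8 KV heads")
```

The cache grows linearly with sequence length and with n_kv_heads, so reducing KV heads by a factor of four cuts the KV read traffic per decode step by the same factor.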

Training Validation (H100, ≤25% error target)
Columns: Architecture, Predicted, Measured, Ratio, Error, Status
Decode Validation (H100, ≤25% error target)
Columns: Architecture, Predicted tok/s, Measured tok/s, Ratio, Error, Status
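The Ratio, Error, and Status columns in both validation tables can be derived from the predicted and measured values. The definitions below are assumptions (ratio as predicted over measured, error as the absolute deviation of the ratio from 1, and a pass when the error is within the 25% target); the input numbers are synthetic placeholders, not measurements.

```python
def validate(predicted, measured, target=0.25):
    """Return (ratio, error, status) for one validation row."""
    if measured <= 0:
        raise ValueError("measured value must be positive")
    ratio = predicted / measured
    error = abs(ratio - 1.0)
    status = "PASS" if error <= target else "FAIL"
    return ratio, error, status

# Synthetic placeholder values, for illustration only.
print(validate(predicted=90.0, measured=100.0))
print(validate(predicted=100.0, measured=70.0))
```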