Roofline Model

Milk-V Jupiter · SpacemiT K1 (8× X60 @ 1.6 GHz)

Peak compute measured via cpufp vfmacc.vv · Memory: 32-bit LPDDR4X @ 2666 MT/s · VLEN=256 DLEN=128 · RVV 1.0 · RVA22

Measured Peak Compute (cpufp vfmacc.vv)

Precision    1 core          8 cores
FP16         63.9 GFLOPS     511.5 GFLOPS
FP32         32.0 GFLOPS     255.8 GFLOPS
FP64         16.0 GFLOPS     127.9 GFLOPS
INT8 IME     2.05 TOPS (4 cores only)
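
As a quick sanity check on the table above, the following sketch computes how close the 8-core figures come to ideal linear scaling of the 1-core figures. The numbers are copied from the table; the helper name `scaling_efficiency` is made up for this illustration.

```python
# Measured cpufp peaks copied from the table above, in GFLOPS: (1 core, 8 cores).
PEAK_GFLOPS = {
    "FP16": (63.9, 511.5),
    "FP32": (32.0, 255.8),
    "FP64": (16.0, 127.9),
}
CORES = 8


def scaling_efficiency(one_core, all_cores, cores=CORES):
    """Return the multi-core peak as a fraction of ideal linear scaling."""
    return all_cores / (one_core * cores)


for precision, (one_core, eight_cores) in PEAK_GFLOPS.items():
    eff = scaling_efficiency(one_core, eight_cores)
    print(f"{precision}: {eff:.1%} of linear scaling")
```

All three precisions land within about 0.1% of linear scaling, which is expected for a register-resident vfmacc.vv loop that never touches memory.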

Ridge Points (FP32, 8 cores)

Bandwidth                   Ridge OI       Min N for compute-bound
Theoretical (10.66 GB/s)    24.0 FLOP/B    N ≈ 144
Realistic (~7 GB/s)         36.5 FLOP/B    N ≈ 220
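
The ridge points in this table can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the ridge is simply peak compute divided by bandwidth and that the minimum N follows from the OI = N/6 relation given in this section; the function names are illustrative.

```python
import math

PEAK_FP32_GFLOPS = 255.8  # measured 8-core FP32 peak from the compute table


def ridge_oi(peak_gflops, bandwidth_gbs):
    """Operational intensity (FLOP/byte) at which the two roofs intersect."""
    return peak_gflops / bandwidth_gbs


def min_gemm_n(oi):
    """Smallest square GEMM size whose OI = N/6 reaches the given ridge."""
    return math.ceil(6 * oi)


for label, bw in [("Theoretical", 10.66), ("Realistic", 7.0)]:
    oi = ridge_oi(PEAK_FP32_GFLOPS, bw)
    print(f"{label}: ridge {oi:.1f} FLOP/B, N >= {min_gemm_n(oi)}")
```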

Square N×N FP32 GEMM: 2N³ FLOPs over 3N² × 4 bytes of compulsory memory traffic gives OI = N/6 FLOP/byte.
Orange markers show where each N falls on the roofline.
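
To show how a given N maps onto the roofline, here is a small sketch that computes the GEMM operational intensity and the resulting performance bound. It assumes only compulsory traffic (A and B read once, C written once) and uses the 8-core FP32 peak with the theoretical bandwidth as defaults. The list of N values is an arbitrary example and is not taken from the original plot.

```python
def gemm_oi(n, bytes_per_element=4):
    """OI of a square N x N GEMM, assuming three N x N matrices of traffic."""
    flops = 2 * n**3  # one multiply and one add per inner-product term
    bytes_moved = 3 * n * n * bytes_per_element
    return flops / bytes_moved


def attainable_gflops(oi, peak_gflops=255.8, bandwidth_gbs=10.66):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_gflops, oi * bandwidth_gbs)


for n in (32, 64, 128, 256, 512):  # example sizes only
    oi = gemm_oi(n)
    print(f"N={n}: OI={oi:.1f} FLOP/B, bound={attainable_gflops(oi):.1f} GFLOPS")
```

Real kernels move more data than this ideal count whenever the working set does not fit in cache, so the bound is optimistic for large N without blocking.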

Microarchitecture Notes

X60 core: In-order, dual-issue. VLEN=256, DLEN=128 (128-bit execution datapath). Two vector units can execute in parallel with the scalar FP unit. The microarchitecture closely resembles the XuanTie C908, but with a wider VLEN.

FP32 per-core throughput: ~32 GFLOPS ÷ 1.6 GHz ≈ 20 FLOPs/cycle, i.e. ~10 FP32 FMAs/cycle sustained. The vector side accounts for 2 vector units × (128b / 32b) = 8 FMAs/cycle; the remaining 1–2 FMAs/cycle likely come from the scalar FP unit.
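
The per-cycle breakdown above can be written out step by step. This sketch only restates the arithmetic from the note; attributing the leftover FMAs to the scalar unit is the note's own guess, not something the calculation proves.

```python
PEAK_GFLOPS_PER_CORE = 32.0  # measured single-core FP32 peak
CLOCK_GHZ = 1.6
VECTOR_UNITS = 2             # as stated in the note above
DLEN_BITS = 128
ELEMENT_BITS = 32            # FP32

flops_per_cycle = PEAK_GFLOPS_PER_CORE / CLOCK_GHZ
fmas_per_cycle = flops_per_cycle / 2  # one FMA counts as 2 FLOPs
vector_fmas = VECTOR_UNITS * (DLEN_BITS // ELEMENT_BITS)
unexplained_fmas = fmas_per_cycle - vector_fmas

print(f"FLOPs/cycle:       {flops_per_cycle:.1f}")
print(f"FMAs/cycle:        {fmas_per_cycle:.1f}")
print(f"vector FMAs/cycle: {vector_fmas}")
print(f"remaining FMAs:    {unexplained_fmas:.1f}")
```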

IME (Integer Matrix Extension): SpacemiT custom vendor extension for INT8 matrix multiply. Only available on Cluster 0 (cores 0–3). Delivers ~2 TOPS across 4 cores, which is the "2.0 TOPS AI computing power" in SpacemiT's marketing.

Memory: Single 32-bit LPDDR4X bus @ 2666 MT/s = 10.66 GB/s theoretical peak. Realistic sustained bandwidth with these in-order cores is likely 6–8 GB/s. This puts the ridge point quite high for FP32 on 8 cores: ~24 FLOP/byte at the theoretical peak and ~37 FLOP/byte at ~7 GB/s, so square GEMM needs N of roughly 144–220 or more to become compute-bound.
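
The theoretical bandwidth figure follows directly from bus width and transfer rate. This is a minimal sketch using decimal gigabytes; the function name is illustrative, and the 6–8 GB/s values are the estimate from the note above, not measurements.

```python
def peak_bandwidth_gbs(bus_bits, mega_transfers_per_s):
    """Theoretical DRAM bandwidth in GB/s (decimal) for a bus width and rate."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9


theoretical = peak_bandwidth_gbs(32, 2666)
print(f"theoretical peak: {theoretical:.2f} GB/s")

for sustained in (6.0, 7.0, 8.0):  # estimated sustained range from the note
    print(f"{sustained:.0f} GB/s is {sustained / theoretical:.0%} of peak")
```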