Peak compute measured via cpufp vfmacc.vv · Memory: 32-bit LPDDR4X @ 2666 MT/s · VLEN=256 DLEN=128 · RVV 1.0 · RVA22

| Precision | 1 core | 8 cores |
|---|---|---|
| FP16 | 63.9 GFLOPS | 511.5 GFLOPS |
| FP32 | 32.0 GFLOPS | 255.8 GFLOPS |
| FP64 | 16.0 GFLOPS | 127.9 GFLOPS |
| INT8 IME | n/a | 2.05 TOPS (4 cores only) |

| Bandwidth | Ridge OI (FP32, 8 cores) | Min N for compute-bound GEMM |
|---|---|---|
| Theoretical (10.66 GB/s) | 24.0 FLOP/B | N ≈ 144 |
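| Realistic (~7 GB/s) | 36.5 FLOP/B | N ≈ 220 |

Square N×N FP32 GEMM: 2N³ FLOPs over 3 matrices × N² elements × 4 bytes = 12N² bytes, so OI = N/6 FLOP/byte (assuming each matrix moves to or from DRAM exactly once).
Orange markers show where each N falls on the roofline.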
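
To make the marker positions concrete, the following is a minimal sketch of the roofline model applied to square FP32 GEMM. The function names are illustrative, not from any tool used here. The 7 GB/s figure is the "realistic" estimate from the table, and the traffic model assumes each matrix crosses the memory bus exactly once, which real kernels only approach with good blocking.

```python
def gemm_oi_fp32(n: int) -> float:
    """Operational intensity (FLOP/byte) of a square NxN FP32 GEMM.

    Assumes 2*N^3 FLOPs and that A, B and C each cross the memory bus
    once (3 * N^2 elements * 4 bytes), which reduces to N/6.
    """
    flops = 2 * n**3
    bytes_moved = 3 * n**2 * 4
    return flops / bytes_moved


def attainable_gflops(oi: float, peak_gflops: float, bw_gbs: float) -> float:
    """Roofline bound: the lower of the compute roof and bandwidth * OI."""
    return min(peak_gflops, bw_gbs * oi)


PEAK_FP32_8C = 255.8    # GFLOPS, measured 8-core FP32 peak from the table
BW_THEORETICAL = 10.66  # GB/s, theoretical LPDDR4X peak
BW_REALISTIC = 7.0      # GB/s, assumed sustained bandwidth

for n in (64, 144, 220, 512):
    oi = gemm_oi_fp32(n)
    theo = attainable_gflops(oi, PEAK_FP32_8C, BW_THEORETICAL)
    real = attainable_gflops(oi, PEAK_FP32_8C, BW_REALISTIC)
    print(f"N={n:4d}  OI={oi:6.2f} FLOP/B  "
          f"bound@10.66GB/s={theo:6.1f}  bound@7GB/s={real:6.1f} GFLOPS")
```

Any N whose bound is below 255.8 GFLOPS sits on the sloped (memory-bound) part of the roofline; once the bound equals the peak, the marker is on the flat compute roof.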
X60 core: in-order, dual-issue. VLEN=256, DLEN=128 (128-bit execution datapath). Two vector units can execute in parallel with the scalar FP unit. The microarchitecture closely resembles the XuanTie C908, but with a wider VLEN.
FP32 per-core throughput: ~32 GFLOPS, which is ~20 FLOPs/cycle at 1.6 GHz. Counting each FMA as 2 FLOPs, this implies ~10 FP32 FMAs/cycle sustained: 2 vector units × (128 b / 32 b) = 8 FMAs/cycle from the vector units, with the remaining ~2 likely coming from 1–2 scalar FMAs.
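
The per-cycle arithmetic above can be checked with a short sketch. All inputs are the figures quoted in the text (1.6 GHz clock, 32 GFLOPS measured, two 128-bit vector units). Attributing the leftover FMAs to the scalar unit is the text's own conjecture, not something this calculation proves.

```python
CLOCK_GHZ = 1.6            # core clock quoted in the text
MEASURED_GFLOPS_1C = 32.0  # measured single-core FP32 peak
VECTOR_UNITS = 2           # number of vector units, per the text
DLEN_BITS = 128            # execution datapath width per vector unit
ELEM_BITS = 32             # FP32 element width

# GFLOPS / GHz = FLOPs per cycle
flops_per_cycle = MEASURED_GFLOPS_1C / CLOCK_GHZ

# One fused multiply-add counts as 2 FLOPs
fmas_per_cycle = flops_per_cycle / 2

# FMAs/cycle the vector units can explain on their own
vector_fmas = VECTOR_UNITS * (DLEN_BITS // ELEM_BITS)

# Whatever is left would have to come from elsewhere (e.g. scalar FMA)
residual_fmas = fmas_per_cycle - vector_fmas

print(f"{flops_per_cycle:.1f} FLOPs/cycle, {fmas_per_cycle:.1f} FMAs/cycle")
print(f"vector units: {vector_fmas} FMAs/cycle, residual: {residual_fmas:.1f}")
```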
IME (Integer Matrix Extension): SpacemiT's custom vendor extension for INT8 matrix multiply. It is only available on Cluster 0 (cores 0–3) and delivers ~2 TOPS across those 4 cores, which is the "2.0 TOPS AI computing power" in SpacemiT's marketing.
Memory: a single 32-bit LPDDR4X bus at 2666 MT/s gives 4 bytes × 2666 MT/s ≈ 10.66 GB/s theoretical peak. Realistic sustained bandwidth with these in-order cores is likely 6–8 GB/s. This puts the FP32 8-core ridge point quite high, from ~24 FLOP/byte at theoretical peak to ~37 FLOP/byte at ~7 GB/s, so square GEMM needs N ≥ 144–220 to become compute-bound.
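
As a rough sketch of how these numbers are derived, the snippet below computes the theoretical bandwidth from the bus width and transfer rate, then the ridge point and the smallest N at which square FP32 GEMM (OI = N/6) reaches it. The helper names are hypothetical, and the 6, 7 and 8 GB/s inputs are the text's estimates rather than measurements.

```python
import math


def peak_bandwidth_gbs(bus_bits: int, mega_transfers_per_s: float) -> float:
    """Theoretical DRAM bandwidth in GB/s: bytes per transfer * MT/s / 1000."""
    return (bus_bits / 8) * mega_transfers_per_s / 1000


def ridge_oi(peak_gflops: float, bw_gbs: float) -> float:
    """Operational intensity (FLOP/byte) where the roofline turns flat."""
    return peak_gflops / bw_gbs


def min_n_compute_bound(ridge: float) -> int:
    """Smallest N with N/6 >= ridge, for square FP32 GEMM (OI = N/6)."""
    return math.ceil(ridge * 6)


PEAK_FP32_8C = 255.8  # GFLOPS, measured 8-core FP32 peak

theoretical = peak_bandwidth_gbs(32, 2666)
print(f"theoretical bandwidth: {theoretical:.2f} GB/s")

# Theoretical peak plus the assumed 6-8 GB/s sustained range
for bw in (theoretical, 8.0, 7.0, 6.0):
    r = ridge_oi(PEAK_FP32_8C, bw)
    print(f"BW={bw:5.2f} GB/s  ridge={r:5.1f} FLOP/B  "
          f"min N={min_n_compute_bound(r)}")
```

Lower sustained bandwidth pushes the ridge point to the right, so the pessimistic end of the 6–8 GB/s range requires a larger N than the 220 quoted for ~7 GB/s.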