xAI Training 10 Trillion Parameter Model – Likely Out in Mid 2026

freedomsphoenix.com, 17 days ago

If the reports hold, xAI would lead all labs in raw announced parameter scale. No other lab has publicly confirmed training a 10T, or even a 6T, model right now. The 6T model alone is roughly double the rumored size of Grok 4 and far larger than most current estimates for GPT-5 or Claude 4.6.

Parameter count is only part of the story.

AI models are increasingly judged on:

Active parameters per token (MoE efficiency).

Training data quality and "intelligence density" (xAI claims higher density per gigabyte).

Inference-time compute (reasoning modes, multi-agent orchestration).

Real-world benchmarks (coding, agentic tasks, multimodality).
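The first point above, active parameters per token, is what makes Mixture-of-Experts (MoE) designs efficient: only a few routed experts run for each token, so per-token compute is a fraction of the total parameter count. A minimal sketch, using entirely hypothetical architecture numbers (xAI has not disclosed expert counts or routing for these models):

```python
# Illustrative sketch, NOT xAI's actual architecture: in a MoE model only the
# routed experts execute per token, so "active" params << total params.

def moe_active_params(total_params: float, num_experts: int,
                      experts_per_token: int, shared_frac: float = 0.1) -> float:
    """Rough active-parameter count for a MoE model.

    shared_frac: assumed fraction of parameters (attention, embeddings)
    that every token uses regardless of expert routing.
    """
    shared = total_params * shared_frac
    expert_pool = total_params - shared
    active_experts = expert_pool * (experts_per_token / num_experts)
    return shared + active_experts

# Hypothetical 10T-parameter model with 128 experts, 8 active per token:
active = moe_active_params(10e12, num_experts=128, experts_per_token=8)
print(f"~{active / 1e12:.2f}T active parameters per token")  # ~1.56T
```

Under these assumed numbers, a nominal "10T" model only spends about 1.56T parameters of compute per token, which is why headline parameter counts alone say little about inference cost.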

Chips Needed & Costs for Pre-Training Runs

Exact per-model costs are not public (models are still training), but here are the best analyses and estimates.

Colossus 2 hardware: ~550,000 NVIDIA GPUs (mostly GB200/GB300 Blackwell variants) at ~$18 billion hardware cost alone (average ~$32k-$40k per GPU). This supports the full parallel training lineup.
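The ~$18 billion figure is easy to sanity-check against the quoted GPU count and unit prices:

```python
# Back-of-envelope check of the hardware figure quoted above:
# ~550k GPUs at an average unit price in the $32k-$40k range.
gpus = 550_000
for unit_price in (32_000, 40_000):
    total = gpus * unit_price
    print(f"${unit_price / 1e3:.0f}k/GPU -> ${total / 1e9:.1f}B")
```

This gives a $17.6B-$22.0B band, so the ~$18B figure sits at the low end of the quoted per-GPU price range.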

Total CapEx for Colossus 2 runs to tens of billions of dollars (land, power infrastructure, cooling, networking), including on-site gas turbines and Megapacks for 400+ MW of dedicated power and a rapid buildout.

Per-model rough estimates (community/analyst extrapolations):

10T model: ~$1.5 billion+ in compute (one early analyst estimate; cost scales with FLOPs and run duration). Initial pre-training phase ~2 months on Colossus 2.

6T model: similar order of magnitude but lower; benefits from shared cluster efficiency.

Smaller 1T/1.5T runs: significantly cheaper and faster due to parallelization.
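How do such compute estimates get made? A common rule of thumb is that training compute is roughly C ≈ 6 × N × D FLOPs, where N is active parameters and D is training tokens. A sketch under assumed inputs (token count, per-GPU throughput, and utilization are all illustrative, not disclosed xAI figures):

```python
# Rough training-compute sketch using the common C ~= 6 * N_active * D
# rule of thumb. Every input below is an assumption, not a disclosed figure.

def training_flops(active_params: float, tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return 6.0 * active_params * tokens

def training_days(flops: float, num_gpus: int,
                  flops_per_gpu: float = 2.5e15, mfu: float = 0.4) -> float:
    """Wall-clock days on a cluster.

    flops_per_gpu: assumed low-precision throughput per Blackwell-class GPU.
    mfu: assumed model FLOPs utilization (fraction of peak actually achieved).
    """
    cluster_rate = num_gpus * flops_per_gpu * mfu
    return flops / cluster_rate / 86_400  # seconds per day

# Hypothetical: 1.5T active parameters, 20T training tokens, full 550k-GPU cluster.
c = training_flops(1.5e12, 20e12)
print(f"{c:.2e} FLOPs, ~{training_days(c, 550_000):.1f} days")
```

Note that with these deliberately generous inputs the run finishes in days, far short of the quoted ~2-month phase; the gap implies some combination of far more training tokens, lower utilization, or only a slice of the cluster per model, which is why such estimates vary widely.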

Originally published by freedomsphoenix.com
