CLM is a 3DGS training system that enables large-scale reconstruction on a consumer-grade GPU setup (e.g., an RTX 4090).
Stores parts of the model in CPU memory to overcome GPU memory limits.
Trains 102 million Gaussians on a single RTX 4090, enabling city-scale reconstruction that previously required multi-GPU setups.
Achieves 55–97% of GPU-only training throughput by intelligently pipelining data transfers and computation.
3D Gaussian Splatting is typically trained exclusively on GPUs. As shown in the diagram below, all Gaussians and their attributes are stored in GPU memory. Given a camera view to render, the system selects the in-frustum Gaussians and rasterizes them onto the view.
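The selection step above can be sketched as a frustum-culling test: project each Gaussian's center with the camera's view-projection matrix and keep the ones that land inside the (slightly padded) clip volume. This is a minimal illustration, not the actual rasterizer code; the `margin` padding value is an assumption.

```python
import numpy as np

def in_frustum_mask(positions, view_proj, margin=1.3):
    """Return a boolean mask of Gaussians whose centers project into view.

    positions: (N, 3) Gaussian centers; view_proj: (4, 4) view-projection matrix.
    """
    n = positions.shape[0]
    homo = np.concatenate([positions, np.ones((n, 1), positions.dtype)], axis=1)
    clip = homo @ view_proj.T              # project to clip space, shape (N, 4)
    w = clip[:, 3:4]
    ndc = clip[:, :3] / np.clip(w, 1e-6, None)
    # Keep centers in front of the camera and within padded NDC bounds.
    return (w[:, 0] > 0) & (np.abs(ndc[:, :2]) < margin).all(axis=1)
```

Only the Gaussians selected by this mask are handed to the rasterizer for the current view.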
Fundamental Limitation: While this approach offers fast training speed through GPU parallelism, it faces a critical bottleneck: limited GPU memory capacity. Large or intricate scenes can require hundreds of millions of Gaussians, which easily causes out-of-memory (OOM) failures on consumer-grade GPUs (e.g., 24 GB on an RTX 4090).
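Some back-of-envelope arithmetic shows why. In the standard 3DGS layout each Gaussian carries roughly 59 float32 parameters (3 position, 3 scale, 4 rotation, 1 opacity, 48 spherical-harmonic coefficients), and Adam keeps two extra state tensors per parameter. These counts are the common 3DGS defaults, not exact CLM numbers:

```python
# Rough per-Gaussian memory footprint during training (assumed layout).
floats_per_gaussian = 3 + 3 + 4 + 1 + 48   # position, scale, rotation, opacity, SH
bytes_per_gaussian = floats_per_gaussian * 4  # float32
adam_factor = 3  # parameters + Adam first and second moments

def training_footprint_gb(n_gaussians):
    """Parameters + optimizer state, ignoring gradients and activations."""
    return n_gaussians * bytes_per_gaussian * adam_factor / 1e9

# 100M Gaussians -> ~70 GB, far beyond a 24 GB RTX 4090.
```

Even before counting gradients and rendering buffers, a 100-million-Gaussian model cannot fit on a 24 GB card.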
CLM leverages CPU memory and CPU computation beyond the GPU. Specifically, it stores only the selection-critical attributes (e.g., position and shape) on the GPU, while offloading the remaining non-critical attributes to CPU memory. During training, CLM uses the selection-critical attributes to determine which Gaussians are visible for each view (i.e., those within the view frustum) and thus selected for rendering. Only the non-critical attributes of these in-frustum Gaussians are loaded into GPU memory before the actual rendering. CLM also runs the Adam optimizer updates for the non-critical attributes on the CPU.
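The per-step data flow can be sketched as follows. This is a schematic with assumed variable names (not CLM's actual API): positions stand in for the GPU-resident selection-critical attributes, a color/SH array stands in for the CPU-resident non-critical ones, and a callback stands in for the render-and-backprop step; numpy is used in place of real device transfers.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
positions = rng.standard_normal((N, 3)).astype(np.float32)    # GPU-resident (selection-critical)
colors_cpu = rng.standard_normal((N, 48)).astype(np.float32)  # CPU-resident (non-critical, e.g., SH)

# Adam state for CPU-resident attributes also stays on the CPU.
m = np.zeros_like(colors_cpu)
v = np.zeros_like(colors_cpu)

def train_step(visible_idx, grad_fn, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # 1) Gather only the in-frustum Gaussians' non-critical attributes
    #    (this copy models the CPU -> GPU transfer).
    colors_gpu = colors_cpu[visible_idx].copy()
    # 2) Rasterize the view and backpropagate; grad_fn stands in for that
    #    and returns gradients w.r.t. the gathered attributes.
    g = grad_fn(colors_gpu)
    # 3) Sparse Adam update on the CPU, touching only the visible rows.
    m[visible_idx] = b1 * m[visible_idx] + (1 - b1) * g
    v[visible_idx] = b2 * v[visible_idx] + (1 - b2) * g**2
    m_hat = m[visible_idx] / (1 - b1**t)
    v_hat = v[visible_idx] / (1 - b2**t)
    colors_cpu[visible_idx] -= lr * m_hat / (np.sqrt(v_hat) + eps)
```

Because each view touches only a small in-frustum subset, the CPU-GPU transfer and the CPU-side optimizer work stay proportional to what is actually rendered, which is what makes overlapping transfers with computation effective.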
The diagram below illustrates CLM's offloading pipeline during training:
CLM consistently enables significantly larger models across all scenes. The BigCity scene shows the most notable improvement: 6.7× larger than the GPU-only baseline and 2.2× larger than naive offloading on an RTX 4090.
For small Gaussian models that can fit in GPU memory, CLM achieves 55% (Ithaca) to 90% (Bicycle) of an Enhanced baseline’s throughput on the RTX 4090.
CLM enables training models with 102.2 million Gaussians, achieving a PSNR of 25.15 dB on the BigCity scene. In contrast, the GPU-only baseline is limited to just 15.3 million Gaussians and yields a lower PSNR of 23.93 dB. By supporting 6.7× larger models, CLM demonstrates that quality continues to improve with scale.
@inproceedings{zhao2025clm,
  title={CLM: Removing the GPU Memory Barrier for 3D Gaussian Splatting},
  author={Hexu Zhao and Xiwen Min and Xiaoteng Liu and Moonjun Gong and Yiming Li and Ang Li and Saining Xie and Jinyang Li and Aurojit Panda},
  booktitle={Proceedings of the 2026 International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS'26)},
  year={2026},
  address={Pittsburgh, PA, USA},
  url={https://arxiv.org/abs/2511.04951}
}