Two foundation models built for agentic coding.
Poolside trains models from scratch: our own data, our own infrastructure, our own reinforcement learning. The result is competitive with leading models at a fraction of the compute. Dive into how we build our foundation models.
Laguna XS.2 Our open-weight agentic coding model.
- Laguna XS.2 33B-A3B
- Devstral Small 2 24B dense†
- Gemma 4* 31B dense†
- Qwen3.5 35B-A3B
- Qwen3.6 35B-A3B
- Claude Haiku 4.5* -
- GPT-5.4 Nano -
† We have chosen to include dense models with larger activated parameter counts to highlight the relative efficiency of MoE models.
Laguna M.1 Introducing our most capable model for agentic coding.
- Laguna M.1 225B-A23B
- Devstral 2 123B dense†
- GLM-4.7 355B-A32B
- DeepSeek-V4-Flash 284B-A13B
- Qwen3.5 397B-A17B
- Claude Sonnet 4.6 -
† We have chosen to include dense models with larger activated parameter counts to highlight the relative efficiency of MoE models.
Get started with Laguna. Free to use for a limited time.
OpenAI Chat-compatible API.
Laguna M.1 and XS.2 are available on OpenRouter and through our dedicated API, so you can work with your existing tools and harnesses.
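Because the endpoint is OpenAI Chat-compatible, a request is standard Chat Completions JSON. A minimal sketch of building that request body (the model slug `poolside/laguna-m.1` is an assumed placeholder, not an official identifier; the sampling values mirror the benchmarking setup in the footnotes below):

```python
import json

def build_chat_request(model: str, prompt: str,
                       temperature: float = 0.7, top_k: int = 20) -> str:
    """Build an OpenAI Chat-compatible request body as a JSON string."""
    body = {
        "model": model,  # placeholder slug; check your provider's model list
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "top_k": top_k,  # pass-through sampling parameter; support varies by provider
    }
    return json.dumps(body)

payload = build_chat_request(
    "poolside/laguna-m.1",
    "Refactor this function to remove the global state.",
)
```

POST the payload to your provider's `/chat/completions` endpoint with your API key in the `Authorization` header; any OpenAI-compatible client library works the same way.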
→ Get an API key
Use wherever your work gets done.
For the best experience, use our agent harness, pool, with any ACP-compatible client.
The Model Factory. The system behind the models. Traditional foundation model training is manual, linear, and slow. We built something different.
The Model Factory is Poolside's internal platform for training, scaling, and experimenting with foundation models. It handles automated evaluation during training, reinforcement learning from code execution, architecture ablations, synthetic data generation, and data mixing—all orchestrated across our GPU clusters.
Experiments that used to take weeks to schedule now run in under an hour. We describe a configuration, and the Factory handles the rest.
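As an illustration only, a declarative experiment description of the kind sketched above might look like the following. Every field name here is hypothetical, not the Model Factory's actual schema:

```python
# Hypothetical experiment configuration; all field names are illustrative,
# not the Model Factory's actual schema.
experiment = {
    "name": "moe-ablation-topk",
    "base_model": "laguna-xs",
    "ablations": {"router_top_k": [1, 2, 4]},        # architecture ablation axis
    "data_mix": {"code": 0.7, "synthetic": 0.2, "web": 0.1},
    "eval": {"suite": "during-training", "every_n_steps": 1000},
    "rl": {"reward": "code-execution"},              # RL from code execution
}

# A well-formed data mix should sum to 1.0
assert abs(sum(experiment["data_mix"].values()) - 1.0) < 1e-9
```

The point of a declarative description like this is that scheduling, cluster placement, and evaluation wiring become the platform's job rather than the researcher's.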
Inside the Model Factory
We share work as we go. See our latest thinking on model training, infrastructure, and the path toward AGI.
Poolside's journey to AGI
When we founded Poolside in San Francisco in April 2023, the narrative in the industry was that all we needed to reach AGI was to scale up language modelling.

Tools of the Trade: C2C Activation Offloading on Grace Blackwell
We demonstrate the potential of NVIDIA's NVLink C2C on Grace-based superchips as a high-performance alternative to selective activation checkpointing. By offloading MLP activations to host memory during training, we achieve a 6–13% throughput improvement over selective AC with negligible memory overhead.

Enabling the Agentic Enterprise with Redpanda
Our strategic partnership with Redpanda's Agentic Data Plane
Start building with Laguna
Bring it into your existing tools and harnesses in minutes.

Footnotes
All benchmarking for Laguna M.1 and Laguna XS.2 was completed using the Laude Institute's Harbor Framework with our agent harness, with a maximum of 500 steps and sandboxed execution at 8 GB RAM/2 CPUs (with the exception of Terminal-Bench 2.0; see below). The same sampling parameters were used across both models and all benchmarks: temperature=0.7 and top_k=20.
Some base task images and verifiers were patched to fix infrastructure reliability issues inherent in task setup, such as rate limits on third-party dependencies in external registries used by the verifier. More details outlining these updates and other findings will follow in a future technical blog post.
- SWE-bench Pro: pass@1, averaged over 3 runs.
- SWE-bench Verified: pass@1, averaged over 4 runs.
- SWE-bench Multilingual: pass@1, averaged over 7 runs.
- Terminal-Bench 2.0: pass@1, averaged over 5 runs; sandboxed at 48 GB RAM/32 CPUs.
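The per-benchmark numbers above are pass@1 averaged over independent runs. A small sketch of that aggregation (the solved-task counts and benchmark size in the example are made up for illustration):

```python
def mean_pass_at_1(per_run_solved: list[int], num_tasks: int) -> float:
    """Average pass@1 across independent runs.

    pass@1 for one run = fraction of tasks solved on the first attempt;
    the reported score is the mean of those per-run fractions.
    """
    per_run = [solved / num_tasks for solved in per_run_solved]
    return sum(per_run) / len(per_run)

# Hypothetical example: 3 runs on a 500-task benchmark
mean_pass_at_1([310, 305, 315], 500)  # → 0.62
```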
* We used the highest publicly referenced scores for all comparison models across each benchmark. In all cases these were official scores published in release blog posts or equivalent, with the exception of Gemma 4 31B IT, where the highest published scores were reported by the Qwen team, and Claude Haiku 4.5, where the highest published (verified) scores for SWE-bench Pro and Terminal-Bench 2.0 are from their respective official leaderboards.