Poolside

Two foundation models built for agentic coding.

Poolside trains models from scratch—our own data, our own infrastructure, our own reinforcement learning. Competitive with leading models at a fraction of the compute. Dive into how we build foundation models from scratch.

Laguna XS.2 (New). Our open-weight agentic coding model.

Our lightest and fastest agentic coding model.
A 33B total parameter Mixture of Experts model with 3B activated parameters. Trained completely in-house on 30T tokens. Our newest-generation model, open to the community.
A second-generation architecture and our first open-weight model, built on everything we've learned since training Laguna M.1, across synthetic data and RL. Laguna XS.2 performs at its best in our coding agent, as a strong model for rapid agentic iteration.
→ How Poolside trained Laguna XS
Built light and fast, to code and act.
Benchmark comparison:
  • Laguna XS.2 33B-A3B
  • Devstral Small 2 24B dense†
  • Gemma 4* 31B dense†
  • Qwen3.5 35B-A3B
  • Qwen3.6 35B-A3B
  • Claude Haiku 4.5* -
  • GPT-5.4 Nano -

† We have chosen to include dense models with larger activated parameter counts to highlight the relative efficiency of MoE models.

Open by nature.
We believe the West needs strong open-weight models, and we're committed to contributing to that ecosystem. By releasing Laguna XS.2 under an Apache 2.0 license, we're inviting the community to evaluate, fine-tune, and build on our work. Whether you want to quantize it or serve it, the weights are yours.
→ Download weights on Hugging Face
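If you want to try the weights locally, a minimal sketch with Hugging Face transformers could look like the following. The repository id below is a placeholder, not the published identifier; see the model card on Hugging Face for the actual repo name and recommended settings.

# Minimal sketch: loading an open-weight checkpoint with Hugging Face transformers.
# "poolside/laguna-xs-2" is a placeholder repo id used for illustration only;
# check the Hugging Face model card for the real identifier and usage notes.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "poolside/laguna-xs-2"  # placeholder; replace with the published repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a function that merges two sorted lists."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
outputs = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))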

Laguna M.1 Introducing our most capable model for agentic coding.

Built and trained in-house for agentic coding.
A 225B total parameter Mixture of Experts model with 23B activated parameters. Trained completely in-house on 30T tokens using 6,144 interconnected NVIDIA H200 GPUs.
We trained this model from scratch, with our own data work, training codebase and async on-policy reinforcement learning in our agent harness—all with agentic coding in mind. Laguna M.1 performs at its best in our coding agent.
→ Technical Dive
If you do research on behalf of an institution or university, we are happy to grant access to the model weights for Laguna M.1 or support higher rate limits on request.
→ models@poolside.ai
Benchmarks.
  • Laguna M.1 225B-A23B
  • Devstral 2 123B dense†
  • GLM-4.7 355B-A32B
  • DeepSeek-V4-Flash 284B-A13B
  • Qwen3.5 397B-A17B
  • Claude Sonnet 4.6 -

† We have chosen to include dense models with larger activated parameter counts to highlight the relative efficiency of MoE models.

Get started with Laguna. Free to use for a limited time.

OpenAI Chat-compatible API.

Laguna M.1 and XS.2 are available on OpenRouter and through our dedicated API, so you can work with your existing tools and harnesses.

→ Get an API key
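As a minimal sketch of a request, here is the standard OpenAI Python client pointed at OpenRouter's public endpoint. The model slug is illustrative only; check the provider's model list (or your dedicated API details) for the exact identifier and base URL.

# Minimal sketch: calling Laguna through an OpenAI Chat-compatible endpoint.
# The model slug below is illustrative, not a confirmed identifier.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",  # or your Poolside dedicated API endpoint
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(
    model="poolside/laguna-m-1",  # illustrative slug; check the provider's model list
    temperature=0.7,              # the sampling temperature used in our benchmarking (see Footnotes)
    messages=[
        {"role": "user", "content": "Refactor this function to remove the duplicated branch: ..."},
    ],
)
print(response.choices[0].message.content)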

Use wherever your work gets done.

For the best experience, use our agent harness, pool, with any ACP-compatible client.

curl -fsSL https://downloads.poolside.ai/pool/install.sh | sh

The Model Factory. The system behind the models. Traditional foundation model training is manual, linear, and slow. We built something different.

The Model Factory is Poolside's internal platform for training, scaling, and experimenting with foundation models. It handles automated evaluation during training, reinforcement learning from code execution, architecture ablations, synthetic data generation, and data mixing—all orchestrated across our GPU clusters.

Experiments that used to take weeks to schedule now run in under an hour. We describe a configuration, and the Factory handles the rest.
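To make "describe a configuration" concrete, here is a purely illustrative sketch of what a declarative experiment description could look like. None of these field names or values are the Factory's actual schema; they only echo the capabilities listed above.

# Hypothetical, illustrative experiment description; not Poolside's actual schema.
# The fields mirror the capabilities described above: automated evaluation during
# training, RL from code execution, architecture ablations, synthetic data, data mixing.
experiment = {
    "name": "expert-count-ablation",
    "architecture": {"type": "mixture_of_experts", "total_params": "33B", "active_params": "3B"},
    "data_mix": {"code": 0.6, "synthetic_agentic_traces": 0.3, "natural_language": 0.1},
    "training": {"tokens": "30T", "evaluate_during_training": True},
    "rl": {"style": "async_on_policy", "reward_source": "code_execution"},
}
# A platform like the Factory would validate a description like this, schedule it
# across GPU clusters, and report evaluations as training progresses.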

Inside the Model Factory

We share work as we go. See our latest thinking on model training, infrastructure, and the path toward AGI.

Start building with Laguna

Bring it into your existing tools and harnesses in minutes.

Footnotes.

All benchmarking for Laguna M.1 and Laguna XS.2 was completed using the Laude Institute's Harbor Framework with our agent harness, with a maximum of 500 steps and sandboxed execution on 8 GB RAM/2 CPUs (with the exception of Terminal-Bench 2.0; see below). The same sampling parameters were used across both models and for all benchmarking: temperature=0.7 and top_k=20.

Some base task images and verifiers were patched to fix infrastructure reliability issues inherent in task setup, such as rate limits on third-party dependencies in external registries used by the verifier. More details outlining these updates and other findings will follow in a future technical blog post.

  • SWE-bench Pro: mean pass@1 averaged over 3 runs.
  • SWE-bench Verified: mean pass@1 averaged over 4 runs.
  • SWE-bench Multilingual: mean pass@1 averaged over 7 runs.
  • Terminal-Bench 2.0: mean pass@1 averaged over 5 runs. 48 GB RAM/32 CPUs.

* We used the highest publicly referenced scores for all comparison models across each benchmark. In all cases these were official scores published in release blog posts or equivalent, with the exception of Gemma 4 31B IT, where the highest published scores were reported by the Qwen team, and Claude Haiku 4.5, where the highest published (verified) scores for SWE-bench Pro and Terminal-Bench 2.0 are from their respective official leaderboards.