Today we’re releasing two foundation models and two products focused on agentic coding to the world. The first model is Laguna M.1, our 225B model (23B active), pre-trained last year and now publicly available.
We’re also releasing our next generation model, Laguna XS.2, as open-weights. 33B total parameters, 3B active. Apache 2.0. Available today on Hugging Face.
Both models are free to use for a limited time via our API and on OpenRouter.
To enable the best experience with our models, we have two products going into preview today:
- pool, our terminal-based coding agent, and
- Shimmer, a cloud dev experience for iterating on web apps, APIs, and CLIs with our models.
Our most capable model to date
Laguna M.1 is a 225B total parameter model with 23B activated parameters, built for agentic coding and long-horizon work.
[Benchmark chart: Laguna M.1 225B-A23B compared against Devstral 2 123B dense†, GLM-4.7 355B-A32B, DeepSeek-V4-Flash 284B-A13B, Qwen3.5 397B-A17B, and Claude Sonnet 4.6.]
† We have chosen to include dense models with larger activated parameter counts to highlight the relative efficiency of MoE models.
(Extra) Small model, big story
Laguna XS.2. At 33B total parameters and 3B activated, it runs on a single GPU and stands up against models several times its size for agentic coding. It's our first open-weight release, and we're excited to put it in the hands of the community.
[Benchmark chart: Laguna XS.2 33B-A3B compared against Devstral Small 2 24B dense†, Gemma 4* 31B dense†, Qwen3.5 35B-A3B, Qwen3.6 35B-A3B, Claude Haiku 4.5*, and GPT-5.4 Nano.]
† We have chosen to include dense models with larger activated parameter counts to highlight the relative efficiency of MoE models.
Laguna XS.2 is trained from scratch in our Model Factory, continuing the work pioneered by Laguna M.1, our largest model to date. XS.2 started pre-training 5 weeks ago and today is being released fully post-trained. Expect continued open progress from us as we iterate on and scale up the Laguna family in the future.
Our hope is to work closely with the community, building on its feedback and collaboration to accelerate progress towards the frontier.
Laguna M.1 is by far our most capable model in absolute terms, and while it's a generation behind XS.2, we're excited to get it into the hands of developers.
These models are the work of approximately 60 people who make up our Applied Research organization, across infrastructure, architecture, data, pre-training, and reinforcement learning. We’re excited to share them with you.
What this means for us
Poolside is a foundation model lab focused on agentic models: pre-training, post-training, and agent RL, all within our Model Factory. Today is the first time we're shipping models publicly. We're excited to get everyone’s feedback so we can continue to ship increasingly capable models.
Why open weights for XS.2
We want to see what people build with Laguna XS.2. You get the weights, an Apache 2.0 license, and the freedom to run it however you want.
The open-weight ecosystem in the West is still early in its development. We want to change that. And the fastest way for us to improve our work is to put it in the hands of people who'll push it. Expect more from us in the open ecosystem going forward.
Get started
- Install pool, our terminal-based coding agent, for the best agent experience with our models.
- Both models are available from our own inference endpoints and on OpenRouter.
- Laguna XS.2 is available locally via Ollama, with native support for MLX. Run:
ollama run laguna-xs.2
- Build with both models in Shimmer, an instant-on Virtual Machine sandbox with Poolside Agent installed. Build web apps, APIs, CLIs and more to try out Laguna M.1 and Laguna XS.2.
- Use our API or OpenRouter to interact with Laguna M.1 & XS.2. Create a key at platform.poolside.ai, or call the models on OpenRouter. Free for a limited time (see the first sketch after this list).
- Download Laguna XS.2 model weights from Hugging Face. Weights are live today under Apache 2.0 (a download-and-load sketch follows below).
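As a reference for the API item above, here is a minimal sketch of calling Laguna M.1 through an OpenAI-compatible chat completions client. The base URL and model identifier below are illustrative assumptions, not confirmed values; check platform.poolside.ai or OpenRouter for the actual ones.

```python
# Minimal sketch, assuming an OpenAI-compatible chat completions endpoint.
# The base_url and model name are illustrative assumptions, not confirmed values.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.poolside.ai/v1",  # assumed endpoint; see platform.poolside.ai
    api_key="YOUR_POOLSIDE_API_KEY",
)

response = client.chat.completions.create(
    model="laguna-m.1",  # assumed model identifier
    messages=[
        {"role": "user", "content": "Write a CLI that lists the largest files in a Git repo."}
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)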
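And for the Hugging Face item, a minimal sketch of downloading the open weights and running a quick generation with transformers. The repository ID is an assumption for illustration; use the ID listed on the official model card, which may also specify its own loading instructions.

```python
# Minimal sketch, assuming the weights are published under a Hugging Face repo
# named "poolside/laguna-xs.2" (an illustrative guess, not a confirmed ID).
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "poolside/laguna-xs.2"  # assumed repo ID; check the official model card
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto", torch_dtype="auto")

prompt = "Write a Python function that reverses a singly linked list."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))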
For a fuller technical story—synthetic data and automixing, our work on the Muon optimizer, async on-policy agent RL, our Titan training codebase, and the agent harness we use internally—check out our technical blog post, A Deeper Dive, and stay tuned for our upcoming technical report.
If you're a small team building with models at a startup, university or institution, we're happy to raise rate limits or share Laguna M.1 weights on request. Email us at models@poolside.ai or send us a DM on X.
Footnotes: All benchmarking for Laguna M.1 and Laguna XS.2 was completed with the Laude Institute's Harbor Framework and our agent harness, using a maximum of 500 steps and sandboxed execution on 8 GB RAM/2 CPUs (with the exception of Terminal-Bench 2.0; see below). The same sampling parameters were used across both models and all benchmarks: temperature=0.7 and top_k=20.
Some base task images and verifiers were patched to fix infrastructure reliability issues inherent in task setup, such as rate limits on third-party dependencies in external registries used by the verifier. More details outlining these updates and other findings will follow in a future technical blog post.
- SWE-bench Pro: mean pass@1 averaged over 3 runs.
- SWE-bench Verified: mean pass@1 averaged over 4 runs.
- SWE-bench Multilingual: mean pass@1 averaged over 7 runs.
- Terminal-Bench 2.0: mean pass@1 averaged over 5 runs. 48 GB RAM/32 CPUs.
* We used the highest publicly referenced scores for all comparison models across each benchmark. In all cases these were official scores published in release blog posts or equivalent, with the exception of Gemma 4 31B IT, where the highest published scores were reported by the Qwen team, and Claude Haiku 4.5, where the highest published (verified) scores for SWE-bench Pro and Terminal-Bench 2.0 are taken from their respective official leaderboards.