xLAM Enters Its Next Era: The Evolution of Large Action Models

Introduction 

As AI agents become part of our everyday workflows, we expect them to handle tasks quickly, clearly, and thoroughly. But to do that, they need to fully understand what we’re asking — and that often takes more than a single message.

Take this simple ask:

“Can you cancel an order for customer Jane Doe?”

Seems easy enough, right? But for the AI agent to actually take action, it needs more context. So it follows up with questions like:

“Which order number should I cancel?”

“Would you like to issue a refund as well?”

What started as a one-click task has ballooned into a multi-step conversation—technically known as a multi-turn interaction, where the AI engages in a back-and-forth dialogue to gather information, clarify intent, and complete a task across several conversational turns. In the world of enterprise AI, workflows are rarely straightforward. AI agents need to reason, ask follow-ups and clarifying questions, and adapt in real time.

To meet the demands of these complex workflows, enterprises need more than just a responsive assistant—they need an agentic system. One that can interpret intent, plan ahead, take action, and adapt as new information comes in. It’s not just about understanding language; it’s about doing something with it.

Last year, we introduced xLAM—Salesforce’s family of Large Action Models built for function calling, reasoning, and planning. Today, we’re excited to share some major upgrades to these models: xLAM now supports multi-turn, natural conversations, enabling more complex, real-world agentic tasks. We’ve also expanded the model portfolio to increase accessibility and deployment flexibility across diverse enterprise environments.

What are Large Action Models?

Large Action Models (LAMs) are specialized, compact language models optimized for speed, precision, and real-world execution. Unlike traditional LLMs that focus on predicting the next word, LAMs are built to predict and perform the next action—making them ideal for powering AI agents that can reason, decide, and act.
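The difference can be illustrated with a minimal sketch: rather than free-form text, an action model emits a structured function call that application code can validate and dispatch. The tool names and JSON format below are illustrative assumptions, not xLAM’s actual output schema.

```python
import json

# Hypothetical registry of tools the agent is allowed to invoke.
TOOLS = {
    "cancel_order": lambda order_id, refund=False: f"order {order_id} cancelled (refund={refund})",
    "lookup_orders": lambda customer: f"orders for {customer}: [1001]",
}

def dispatch(model_output: str) -> str:
    """Parse a structured action emitted by the model and execute it."""
    call = json.loads(model_output)  # e.g. '{"name": ..., "arguments": {...}}'
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# A LAM predicts the next *action* rather than the next word:
action = '{"name": "cancel_order", "arguments": {"order_id": 1001, "refund": true}}'
print(dispatch(action))  # order 1001 cancelled (refund=True)
```

Because the output is machine-readable, the surrounding system can check the call against a schema before executing it—one reason structured action prediction suits high-stakes CRM automation.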

In early experiments, this targeted approach has enabled our LAMs to match—and in some cases outperform—larger LLMs, all while maintaining a significantly smaller footprint, as seen on the Berkeley Function-Calling Leaderboard (as of April 30, 2025).

Their smaller size brings key advantages: lower cost, faster inference, and improved sustainability. That speed isn’t just about shaving off milliseconds—it’s what enables real-time responsiveness in high-volume environments like CRM. When a sales rep updates a record, or a service agent triggers an automation, there’s no time to wait for a lagging model. LAMs deliver answers—and actions—on the spot. That makes them especially powerful in CRM, where it’s not just about understanding a request—it’s about executing it, instantly and accurately. Whether automating workflows or managing tasks across systems, LAMs shift AI from passive responses to active outcomes.

xLAM: Salesforce AI Research’s family of large action models

[The Berkeley Function Calling Leaderboard as of April 30, 2025]

At Salesforce AI Research, we’ve made key upgrades to xLAM to meet these real-world demands. Here’s what’s under the hood:

Multi-Turn Tool Calling

In the first version of xLAM, we built models assuming the customer’s query would contain all the information needed to complete the task. This made for a clean, efficient baseline, as shown below.

[Example of a single turn conversation]

But in real-life enterprise environments, this assumption doesn’t hold.

Customers don’t always give complete inputs. Sometimes, they’re vague:

“I need help with a refund.”

Other times, they’re confused:

“I tried to cancel my order but it didn’t work.”

And often, the information needed to complete a task isn’t in the user’s message at all—it lives in another system, like an order management tool, a knowledge base, or an inventory tracker.

To address this, we need to reimagine how AI agents interact not just with customers, but also with tools—and how they stitch those pieces together over the course of a task.

[Example of a multi-turn conversation]

We’ve now introduced support for multi-turn tool calling—a major step beyond traditional one-shot execution, where an agent attempts to complete a task in a single interaction with limited context. With multi-turn capabilities, agents can engage in ongoing dialogue: calling tools, interpreting outputs, asking clarifying questions, and adapting their actions as new information becomes available. It’s how simple requests become intelligent, end-to-end workflows. 
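The loop described above can be sketched in a few lines: at each turn the agent either asks a clarifying question (when required information is missing) or issues a tool call (when it has everything it needs). The slot-filling policy, tool name, and message format here are illustrative assumptions, not xLAM’s actual interface.

```python
def agent_step(history, required_slots):
    """Decide the next move: ask a clarifying question or call a tool."""
    # Collect every piece of information the user has supplied so far.
    known = {k: v for turn in history if turn["role"] == "user"
             for k, v in turn.get("slots", {}).items()}
    missing = [slot for slot in required_slots if slot not in known]
    if missing:
        # Not enough context yet: ask a follow-up question.
        return {"type": "ask", "question": f"Which {missing[0]} should I use?"}
    # All slots filled: emit a concrete tool call.
    return {"type": "tool_call", "name": "cancel_order", "arguments": known}

history = [{"role": "user", "content": "Cancel an order for Jane Doe.",
            "slots": {"customer": "Jane Doe"}}]
required = ["customer", "order_id"]

move = agent_step(history, required)   # order_id missing -> clarify
print(move["question"])                # Which order_id should I use?

history.append({"role": "user", "content": "Order 1001.",
                "slots": {"order_id": 1001}})
move = agent_step(history, required)   # all slots known -> act
print(move["name"], move["arguments"])
```

In a production agent, the model itself would decide when to clarify versus act; the point of the sketch is the control flow, with tool outputs fed back into `history` so the agent can adapt on subsequent turns.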

APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay

To support multi-turn reasoning, you need to train models on high-quality examples of it. And to get that kind of data at scale, we built a new data generation pipeline called APIGen-MT. It’s a two-step framework designed to create realistic, diverse, and verifiable multi-turn agent conversations. First, it generates task blueprints—detailed plans that include the correct sequence of actions—using a mix of LLM reviewers and feedback loops to ensure quality. Then, those blueprints are turned into full interaction flows by simulating back-and-forth dialogues between a human and an agent. These conversations show how agents gather information, call tools at each step, and make decisions along the way. We’re also incorporating real-world patterns to keep the training data relevant and grounded. 
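The two stages can be sketched schematically: first a verifiable task blueprint, then an unrolled simulated dialogue. All names and the validation rule below are assumptions standing in for the LLM-reviewer feedback loop the pipeline actually uses.

```python
from dataclasses import dataclass

@dataclass
class Blueprint:
    intent: str
    actions: list  # ground-truth sequence of tool calls

def validate(bp: Blueprint) -> bool:
    """Stand-in for the LLM-reviewer feedback loop: keep only blueprints
    with a non-empty, well-formed action sequence."""
    return bool(bp.actions) and all("name" in a for a in bp.actions)

def simulate_dialogue(bp: Blueprint) -> list:
    """Stage 2: unroll a blueprint into alternating human/agent/tool turns."""
    turns = [{"role": "human", "content": bp.intent}]
    for action in bp.actions:
        turns.append({"role": "agent", "tool_call": action["name"]})
        turns.append({"role": "tool", "content": f"{action['name']} ok"})
    return turns

# Stage 1 output: a blueprint with the correct action sequence.
bp = Blueprint(intent="Cancel order 1001 and refund it",
               actions=[{"name": "cancel_order"}, {"name": "issue_refund"}])
assert validate(bp)
dialogue = simulate_dialogue(bp)
print(len(dialogue))  # 5 turns: 1 human + 2 x (agent call + tool result)
```

Separating blueprint generation from dialogue simulation is what makes the data verifiable: each generated conversation can be checked against its blueprint’s ground-truth action sequence.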

New Model Sizes Join the xLAM family 

We are now adding more models to the xLAM research portfolio, including:

  • xLAM-2-1B-r: An update of our “Tiny Giant” for on-device applications with better tool calling performance.
  • xLAM-2-3B-r and xLAM-2-8B-r: Models designed for swift academic exploration with limited GPU resources.
  • xLAM-2-32B-r: Ideal for industrial applications striving for a balanced combination of latency, resource consumption, and performance.
  • xLAM-2-70B-r: The best-performing xLAM research model, for teams with ample computational resources (see the Berkeley Function-Calling Leaderboard).

These additions to our family of xLAM models are designed to increase accessibility and deployment flexibility across diverse environments. 

Conclusion

Early results are promising: our Large Action Models are proving to be both fast and precise, thanks to their smaller, specialized architecture and optimization for agent-specific tasks. By navigating complex workflows and making intelligent decisions, xLAM not only boosts operational efficiency, but also reimagines how users interact with CRM systems. 

This work is currently a research effort; however, we are in the process of building stronger xLAM models for internal use in our Salesforce products. While there’s still progress to be made—refining performance, ensuring robustness, and expanding real-world testing—we see strong potential for AI agents to respond more quickly and efficiently in the future using Large Action Models.

Explore More 

Acknowledgements

Enhanced Multi-turn Data Synthesize: Zuxin Liu, Akshara Prabhakar

Unified Data Pipeline: Ming Zhu, Jianguo Zhang, Tulika Awalgaonkar

Model Training: Zuxin Liu, Jianguo Zhang, Thai Hoang

Benchmarking and Evaluation: Zuxin Liu, Ming Zhu, Shiyu Wang, Haolin Chen, Zhiwei Liu, Jianguo Zhang

Management Team: Shelby Heinecke, Weiran Yao, Juan Carlos Niebles, Huan Wang, Caiming Xiong
