The Challenge with Flows Today
Salesforce flows sit at the heart of modern CRM automation, yet authoring them still requires a unique mix of declarative drag-and-drop skills and Apex know-how. To ease this process, Salesforce has committed to incorporating cutting-edge generative AI technologies such as Agentforce for Flow (A4F, now generally available). A4F uses AI to generate complete Salesforce flows from a user prompt, which can then be readily deployed in Flow Builder. These tools have already seen rapid adoption by Salesforce Admins, with thousands of unique org sign-ups within the first few months.
In Figure 2 below, we present a snapshot of results with our A4F models across two deployments: v1, which uses Mistral-Nemo (12B) fine-tuned on text-to-flow data, and v2, which uses a stronger Mistral-Small (32B) backbone as well as a larger training corpus that includes synthetic training samples. As a metric, we report the ready-to-activate rate: the percentage of generations that can be directly activated in a production environment. We benchmark these models against a frontier closed-source LLM, and report performance for two types of flows – those containing only standard objects, and those that also contain custom objects. Despite starting from a significantly smaller backbone than the closed-source LLM, our A4F models strongly outperform the closed-source baseline, especially on custom flows!
This first generation of A4F models, though capable, still treats text-to-flow generation as a token generation problem: accepting a user prompt as input and generating flow metadata as output (formatted as a JSON string; see Figure 1 above). This design passes up the ability to leverage the extensive business know-how underpinning Salesforce flows, e.g. that every flow can be represented as a graph of node “elements” joined by edge “connectors”, with precise triggers that dictate when it runs (in the example above, at 6 am daily). Without this knowledge, we find that models struggle to generate complex flows (e.g. those with large or unusual structure or details), which poses a challenge to deploying them in production.
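To make this graph structure concrete, a flow can be modeled as typed elements joined by directed connectors, plus a trigger describing when it runs. The sketch below is a minimal, hypothetical data model for illustration only – it is not Salesforce's actual flow metadata schema, and all names are invented:

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    """A node in the flow graph, e.g. a lookup, decision, or update."""
    name: str
    kind: str  # e.g. "RecordLookup", "Decision", "RecordUpdate"

@dataclass
class Connector:
    """A directed edge: control passes from `source` to `target`."""
    source: str
    target: str

@dataclass
class Flow:
    trigger: str  # e.g. a schedule such as "daily at 6am"
    elements: list[Element] = field(default_factory=list)
    connectors: list[Connector] = field(default_factory=list)

    def successors(self, name: str) -> list[str]:
        """Names of elements reachable in one step from `name`."""
        return [c.target for c in self.connectors if c.source == name]

# A toy scheduled flow: look up records, branch, then update.
flow = Flow(trigger="daily at 6am")
flow.elements += [Element("find_overdue", "RecordLookup"),
                  Element("check_any", "Decision"),
                  Element("mark_escalated", "RecordUpdate")]
flow.connectors += [Connector("find_overdue", "check_any"),
                    Connector("check_any", "mark_escalated")]
print(flow.successors("check_any"))  # → ['mark_escalated']
```

Encoding this structure explicitly – rather than leaving it implicit in a JSON string – is what allows a model to reason about topology and triggers directly.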
To remedy this, we set out to train Enterprise General Intelligence (EGI) models for flow – proprietary models fine-tuned to surpass out-of-the-box frontier models on enterprise tasks – that explicitly encode such structure and can continually self-improve from interaction within a rich flow simulation environment called Flow Simulator (FlowSim).
How we used Flow Simulator to train EGI models for A4F
Flow Simulator (FlowSim) is a comprehensive framework for building evaluation and training environments that simulate real-world enterprise scenarios. It enables benchmarking and optimization of agents, ensuring they perform reliably in real business applications.
To train flow generation models with FlowSim, we first hand-designed a Domain Specific Language (DSL) representation for flows: a set of function primitives and data models that encode flow structure and domain knowledge, and that can be composed to construct any flow. We implement this DSL in code as a Python schema, and then translate our existing flow metadata from JSON to the DSL. Finally, we train EGI models by fine-tuning a strong open-source backbone to generate DSL flow representations (instead of JSON), in addition to a chain-of-thought trace. With this, we effectively reduce the task to code generation – a task at which LLMs already excel!
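As a rough intuition for what "function primitives that compose into a flow" could look like, here is a heavily simplified, hypothetical sketch – the real DSL is a much richer Python schema, and every name below is invented for illustration:

```python
def new_flow(trigger):
    """DSL primitive: start a flow definition with a trigger."""
    return {"trigger": trigger, "elements": [], "connectors": []}

def add_element(f, name, kind):
    """DSL primitive: declare a node element (returns f for chaining)."""
    f["elements"].append({"name": name, "kind": kind})
    return f

def connect(f, source, target):
    """DSL primitive: add an edge connector between two elements."""
    f["connectors"].append({"source": source, "target": target})
    return f

# The model's output is then a small program composing these primitives,
# rather than a raw JSON string:
f = new_flow("daily at 6am")
add_element(f, "find_overdue", "RecordLookup")
add_element(f, "notify_owner", "Action")
connect(f, "find_overdue", "notify_owner")
```

Because the generation target is now code against a typed schema, many structural mistakes (dangling connectors, unknown element kinds) become mechanically checkable instead of silent JSON errors.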
We also design automated metrics to evaluate the quality of the flow generations along two dimensions: validity (whether the generated flow is syntactically correct) and correctness (whether the generated flow matches the ground truth). By running our fine-tuned model within simulated orgs and automatically scoring its generations using these metrics as rewards, we continue to train the model with reinforcement learning.
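To illustrate the two dimensions, a toy version of such scoring might check validity syntactically and approximate correctness by edge overlap with the ground truth. This is a deliberately crude sketch – the actual metrics run inside sandbox Salesforce orgs and match topology far more carefully:

```python
def is_valid(flow):
    """Validity (toy): every connector references a declared element."""
    names = {e["name"] for e in flow["elements"]}
    return all(c["source"] in names and c["target"] in names
               for c in flow["connectors"])

def correctness(generated, reference):
    """Correctness (toy): fraction of ground-truth edges recovered,
    a crude stand-in for full topology and flow-type matching."""
    gen = {(c["source"], c["target"]) for c in generated["connectors"]}
    ref = {(c["source"], c["target"]) for c in reference["connectors"]}
    return len(gen & ref) / max(len(ref), 1)

def reward(generated, reference):
    """Scalar reward for RL: invalid flows score 0, else edge overlap."""
    return correctness(generated, reference) if is_valid(generated) else 0.0
```

Collapsing both checks into a single scalar is what lets the simulator's judgment be fed back directly as a reinforcement learning reward.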
In summary, by reformulating text-to-flow generation as code generation (in a domain specific language) and applying the EGI playbook, we train text-to-flow models that deliver highly accurate production-ready flows in less time.
EGI Phase → Our Build Phase:

1. Synthesize
• Data Curation: thousands of flows annotated by human experts, including for failed prompts, as well as validated model-generated flows from synthetic user prompts.
• Defining a Domain Specific Language (DSL) for flow: a hand-designed Python schema enriched with domain knowledge and real-world constraints (from developer docs).

2. Measure
• Evaluation: automatically measure the correctness (e.g. topology and flow type) and validity (e.g. ability to load and save) of generated flows within sandbox Salesforce orgs.

3. Train
• EGI Fine-Tuning: train EGI models for <prompt> → <chain-of-thought> + <DSL> generation, starting from a strong open-source base model (Mistral-Small (34B)).
• Iterative self-improvement with Reinforcement Learning (RL): train the EGI model in the FlowSim simulation environment using RL with environment rewards.
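The iterative self-improvement step can be sketched at a high level as: sample candidate flows in simulation, score them with the automated metrics, and update the model on the highest-reward generations. Below is a heavily simplified, rejection-sampling-style sketch of that loop – the production setup uses proper RL with environment rewards, and the function names (`generate`, `score`, `fine_tune`) are hypothetical stand-ins for model and simulator calls:

```python
def self_improve(model, prompts, generate, score, fine_tune,
                 samples_per_prompt=4, threshold=0.8):
    """One toy round of simulator-driven self-improvement.

    generate(model, prompt) -> a candidate flow (DSL program)
    score(flow, prompt)     -> simulator reward in [0, 1]
    fine_tune(model, pairs) -> model updated on (prompt, flow) pairs
    """
    keep = []
    for prompt in prompts:
        # Sample several candidates per prompt inside the simulator.
        candidates = [generate(model, prompt)
                      for _ in range(samples_per_prompt)]
        best = max(candidates, key=lambda flow: score(flow, prompt))
        if score(best, prompt) >= threshold:  # keep only high-reward flows
            keep.append((prompt, best))
    return fine_tune(model, keep)
```

Repeating such rounds lets the model bootstrap from its own best generations, which is the sense in which it "continually self-improves from interaction" with FlowSim.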
To benchmark performance, we had flow experts create a challenging test split of highly complex flows for “AI Appdev” – an ambitious ongoing effort toward fully autonomous software development. As the figure below shows, the first generation of A4F models performs modestly on this difficult test set, achieving ready-to-activate rates of 32-35%. We note that ready-to-activate rate is a stringent metric: flow generations that are not deemed “ready to activate” are typically still largely accurate and can be successfully activated with only a few human edits. Next, we benchmark our EGI models and find that they perform significantly better, with the EGI RL model achieving a 48% activation rate (a ~50% relative improvement), despite being trained on 88% less data!
What’s Next
While these early findings showcase the potential of EGI in action, they are only scratching the surface. With Salesforce’s Flow Simulator, we hope to turbocharge EGI model development for a range of enterprise applications within a single comprehensive and tightly integrated ecosystem. Follow us on X to stay tuned for what’s next!

Viraj Prabhu
Research Scientist, AI Research
Viraj Prabhu is a Research Scientist at Salesforce AI working on developing digital AI agents that can perceive, plan, reason, and act in novel environments towards accomplishing complex goals. Previously, he was a graduate student at Georgia Tech, where he earned his PhD (advised by Judy Hoffman) and Master’s (advised by Devi Parikh, and awarded the MS research award) degrees, both in Computer Science. He has over a decade of experience in AI research spanning a diverse range of topics in computer vision, NLP, and multimodal AI.

Zeyuan Chen
Senior Manager, Research
Zeyuan Chen is a Senior Manager of Research at Salesforce AI Research, where he has been contributing since 2019. His work focuses on advancing computer vision, machine learning, multimodal AI, AI agents, and workflow automation through code generation and data visualization. He holds a Bachelor’s degree from Huazhong University of Science and Technology, a Master’s from Cornell University, and a Ph.D. from North Carolina State University, experiences that have shaped his journey in AI research.

Ran Xu
Director, AI Research
Ran Xu received his Ph.D. in computer science from the University at Buffalo in 2015. Currently, he leads a group of exceptional computer vision and multimodal AI researchers at Salesforce to push the boundaries of research and productive AI for CRM.

Denise Pérez
Senior Product Marketing Manager
I am an AI storyteller and thought leader at Salesforce AI Research, where I shape the narrative on what’s next in AI. I help define how tomorrow’s AI is understood today. Since 2021, I’ve been bridging cutting-edge research with real-world impact, translating complex breakthroughs into compelling narratives for Salesforce, our CRM customers, and beyond. I’m passionate about making AI understandable, human, and impossible to ignore.

Silvio Savarese
Executive Vice President and Chief Scientist, Salesforce AI Research
Silvio Savarese is the Executive Vice President and Chief Scientist of Salesforce AI Research, as well as an Adjunct Faculty of Computer Science at Stanford University, where he served as an Associate Professor with tenure until winter 2021. At Salesforce, he shapes the scientific direction and long-term AI strategy by aligning research and innovation efforts with Salesforce’s mission and objectives. He leads the AI Research organization, including AI for C360 and CRM, AI for Trust, AI for developer productivity, and operational efficiency.