Dynamic Tool Allocation for AI Agents (The RATS Pattern)

12 min read

Dynamic Tool Allocation for AI Agents (The RATS Pattern)

TL;DR

  • Problem: "Tool overload" is a critical bottleneck for AI agents. Providing an LLM with a large, static list of tools bloats the context window, degrading performance, increasing costs, and reducing accuracy.
  • Solution: Implement a "select, then execute" architectural pattern. Use a lightweight "router" agent to first retrieve a small, relevant subset of tools for a specific task. Then, a more capable "specialist" agent uses that curated set to execute the request.
  • Benefits: Lower latency and cost (fewer tokens), higher tool-selection precision, a scalable architecture for large tool catalogs, and improved reliability.
  • Pattern: This pattern is a form of Retrieval-Augmented Generation (RAG) applied to tools, often called Retrieval-Augmented Tool Selection (RATS). It can be combined with State-Based Gating for even greater precision.
  • How: This post provides a complete, production-aware implementation using Google's Agent Development Kit (ADK).

1. Introduction: The Scalability Challenge of "Tool Overload"

What happens when you give your AI agent access to 50 tools, only to watch it fumble and select the wrong one? As AI agents become more sophisticated, they're equipped with an expanding arsenal of tools. However, the standard approach of providing an agent's LLM with a complete list of every available tool presents a significant scaling challenge. Tool schemas and descriptions can add 500–1500 tokens per tool; with 50+ tools, you can easily bloat prompts by over 25k–75k tokens.

This "tool overload" leads to several critical failure modes:

  • Degraded Performance: The model's ability to accurately select the correct tool diminishes as it gets distracted by the noise of irrelevant options.
  • Increased Latency and Cost: Larger prompts require more processing time and result in higher API costs.
  • Reduced Accuracy: The risk of "API hallucination" (calling a non-existent tool or using incorrect parameters) increases. The agent also suffers from "prompt drifting," where its behavior changes unpredictably as the tool list is modified, making it brittle and hard to maintain.

The solution is a powerful architectural pattern you can implement today: Dynamic Tool Allocation. Instead of giving your agent a warehouse of tools, give it a curated spice rack perfectly suited for the task at hand.

2. The "Select, then Execute" Pattern (RATS)

This pattern separates the problem of finding the right capability from executing the specific task. This is RAG for tools, or Retrieval-Augmented Tool Selection (RATS).

  1. Selection (The Router): A lightweight "router" agent analyzes the user's query. Its sole job is to identify and retrieve a small, relevant subset of tools from a much larger collection (e.g., a vector database). This step answers, "What capability is needed?"
  2. Execution (The Specialist): The main "action" agent receives only this curated list of relevant tools. With a focused context, it can accurately choose the correct tool and parameters to fulfill the request. This step answers, "How do I solve this with these specific tools?"

Visual Flow: User Query -> Orchestrator Agent -> Tool Retriever Tool -> Curated Tool List -> Action Agent -> Tool Execution -> Final Response

This pattern can be combined with State-Based Gating for structured workflows. For example, state gating could determine the agent is in an ORDER_INQUIRY state, exposing only order-related tools. RATS would then select the single best tool (track_order) from that pre-filtered set for the specific query (“Where’s my stuff?”), achieving maximum precision.

3. Implementation with Google ADK

We will use two different agent models to optimize for cost and performance:

  • Orchestrator Agent (gemini-1.5-flash): A fast, inexpensive model for routing.
  • Action Agent (gemini-1.5-pro): A more powerful model for complex reasoning and execution.

Step 1: Define a Structured, Production-Ready Toolset

First, we define our full toolset. Note the best practices: functions return structured dictionaries (not JSON strings) and include try/except blocks for robust error handling.

tools/retail_tools.py

from typing import Dict, Any
from google.adk.tools import FunctionTool

# Simulated data sources for the demo
ORDERS = { "A12345": {"status": "Shipped", "eta": "2025-09-06"} }
PRODUCT_CATALOG = [ {"id": "SKU123", "name": "Sneaker X", "price": 79.99} ]

def track_order(order_id: str) -> Dict[str, Any]:
    try:
        order = ORDERS.get(order_id)
        if not order:
            return {"order_id": order_id, "found": False, "message": "Order not found."}
        return {"order_id": order_id, "found": True, "status": order["status"], "eta": order["eta"]}
    except Exception as e:
        return {"order_id": order_id, "found": False, "error": str(e)}

def search_products(query: str) -> Dict[str, Any]:
    try:
        results = [p for p in PRODUCT_CATALOG if query.lower() in p["name"].lower()]
        return {"query": query, "results": results}
    except Exception as e:
        return {"query": query, "results": [], "error": str(e)}

# This dictionary simulates a database of all available tool metadata.
RETAIL_TOOL_METADATA = {
    "track_order": {
        "description": "Get shipping status and estimated delivery for a specific order.",
        "parameters": {"order_id": {"type": "str", "description": "The unique order identification number."}},
        "tool_fn": track_order,
    },
    "search_products": {
        "description": "Search the retail catalog by keyword to find items.",
        "parameters": {"query": {"type": "str", "description": "A search term for products."}},
        "tool_fn": search_products,
    },
}

# A dictionary to hold instantiated FunctionTool objects for easy access.
ALL_RETAIL_TOOLS = {
    name: FunctionTool(
        name=name,
        description=meta["description"],
        parameters=meta["parameters"],
        function=meta["tool_fn"],
    ) for name, meta in RETAIL_TOOL_METADATA.items()
}

Step 2: Build a Smarter Tool Retriever with Limits

This special tool simulates a semantic search by scoring tools and returning only the top-k results that meet a confidence threshold. This prevents re-introducing the "tool overload" problem with a long list of marginally relevant tools.

tools/tool_retrieval.py

from typing import Dict, Any, List
from google.adk.tools import FunctionTool
from .retail_tools import RETAIL_TOOL_METADATA

def retrieve_relevant_tools(user_query: str, top_k: int = 3, min_score: float = 0.2) -> List[Dict[str, Any]]:
    """
    Retrieves a small, relevant subset of tools for the given query.
    In production, replace this with a real vector database query.
    """
    query_terms = set(user_query.lower().split())
    scored_tools = []

    for name, meta in RETAIL_TOOL_METADATA.items():
        description_terms = set(meta['description'].lower().split())
        score = len(query_terms.intersection(description_terms)) / len(query_terms)
        if score >= min_score:
            scored_tools.append({"name": name, "score": round(score, 2)})

    # Sort by score and enforce returning a small shortlist
    sorted_tools = sorted(scored_tools, key=lambda x: x['score'], reverse=True)
    return sorted_tools[:top_k]

ToolRetrieverTool = FunctionTool(
    name="retrieve_relevant_tools",
    description="Selects a ranked list of the most relevant tools based on the user's query.",
    parameters={
        "user_query": {"type": "str", "description": "The full user query."},
        "top_k": {"type": "int", "description": "Max number of tools to return (default: 3)."},
        "min_score": {"type": "float", "description": "Minimum relevance score (0-1, default: 0.2)."}
    },
    function=retrieve_relevant_tools,
)

Step 3: Define the Orchestrator and Specialist Agents

The orchestrator gets a hardened prompt to ensure it calls the retriever. The specialist agent is created dynamically and handles the "no tools found" edge case.

agents/main_agent.py

from typing import Dict, Any, List
from google.adk.agents import Agent
from google.adk.agents.types import Prompt
from tools.retail_tools import ALL_RETAIL_TOOLS
from tools.tool_retrieval import ToolRetrieverTool

# 1. The Hardened Orchestrator (Router) Agent
orchestrator_agent = Agent(
    model="gemini-1.5-flash",
    name="OrchestratorAgent",
    instruction=Prompt(
        "You are an expert routing agent. Your primary and only task is to analyze the user's "
        "query and call the `retrieve_relevant_tools` tool to identify which tools are needed. "
        "Do NOT, under any circumstances, attempt to answer the user's query directly. "
        "Your sole output should be a call to the retrieval tool."
    ),
    tools=[ToolRetrieverTool],
)

# 2. A factory function to create our Specialist Agent on-the-fly
def get_action_agent(retrieved_tools: List[Dict[str, Any]]) -> Agent:
    """Creates a new agent instance with a curated set of tools."""
    tool_names = [t["name"] for t in (retrieved_tools or [])]
    curated_tools = [ALL_RETAIL_TOOLS[name] for name in tool_names if name in ALL_RETAIL_TOOLS]

    if curated_tools:
        instruction = Prompt(
            "You are a specialized retail assistant. Use ONLY the provided tools to answer the user's query."
        )
    else:
        # Handle the edge case where no relevant tools are found
        instruction = Prompt(
            "You are a helpful assistant. You have been provided with no specific tools. "
            "Answer the user's query based on your general knowledge."
        )

    return Agent(
        model="gemini-1.5-pro",
        name="ActionAgent",
        instruction=instruction,
        tools=curated_tools,
    )

Step 4: Create the Dynamic Execution Loop with Session Isolation

The orchestration script runs the two stages. Crucially, it uses a new, clean session for the Action Agent to avoid any "context pollution" from the orchestrator's turn.

run_dynamic_agent.py

import asyncio
from typing import List, Dict, Any
from google.adk.sessions import InMemorySessionService
from google.adk.runners import Runner
from google.genai import types
from agents.main_agent import orchestrator_agent, get_action_agent

async def run_query(user_query: str):
    print(f"
=========================================
USER QUERY: '{user_query}'
=========================================
")

    # === STAGE 1: ORCHESTRATION ===
    print("--- Running Orchestrator Agent to select tools ---")
    orchestrator_session_service = InMemorySessionService()
    orchestrator_runner = Runner(agent=orchestrator_agent, session_service=orchestrator_session_service)
    orchestrator_events = orchestrator_runner.run_async(
        user_id="user123", session_id="session_orch_001", new_message=types.Part(text=user_query)
    )

    retrieved_tools: List[Dict[str, Any]] = []
    async for event in orchestrator_events:
        if event.tool_call_result and event.tool_call_result.call.name == "retrieve_relevant_tools":
            retrieved_tools = event.tool_call_result.output or []
            tool_names = [tool['name'] for tool in retrieved_tools]
            print(f"Orchestrator retrieved tools: {tool_names}
")
            break

    # === STAGE 2: DYNAMIC ACTION ===
    print("--- Creating and running Specialist Action Agent in a clean session ---")
    action_agent = get_action_agent(retrieved_tools)
    
    # Use a new session service or a new session ID to ensure a clean context for the specialist.
    action_session_service = InMemorySessionService()
    action_runner = Runner(agent=action_agent, session_service=action_session_service)
    
    final_response_events = action_runner.run_async(
        user_id="user123", session_id="session_action_001", new_message=types.Part(text=user_query)
    )
    
    async for final_event in final_response_events:
        if final_event.is_final_response():
            print(f"FINAL RESPONSE: {final_event.content.text}")
            break

async def main():
    await run_query("Where is my order A12345?")
    await run_query("Can you tell me a joke?")

if __name__ == "__main__":
    asyncio.run(main())

4. Production Considerations and Best Practices

To move this pattern from a prototype to a robust production system, consider these upgrades:

  • Retrieval: Use a vector database (e.g., Vertex AI Vector Search, Pinecone) to index tool descriptions, parameters, and usage examples. For best results, use a hybrid search approach that combines semantic (vector) search with keyword-based search (like BM25).
  • Ranking and Access Control: Rank retrieved tools not just by semantic similarity but also by signals like historical success rate, execution cost, latency, and user permissions. Implement Guardrails as an access control layer to enforce allow/deny lists based on user roles.
  • Observability and Evaluation: Log the entire decision process: which tools were retrieved, their scores, which one was selected, and the execution outcome. Integrate with an observability platform like OpenTelemetry or Langfuse. Track key metrics like tool-call precision/recall and task completion rate.
  • Schema Management (CI/CD): To prevent schema drift, integrate your tool database with your CI/CD pipeline. Automatically re-index the vector database whenever a tool's schema, function signature, or description changes.
  • Security and Privacy: Always validate and sanitize inputs passed to your tools to prevent injection attacks. Ensure all data handling complies with security and privacy requirements.
  • Caching: Cache frequent retrieval results and tool outputs (with an appropriate TTL) to reduce latency and API costs.

5. When Not to Use This Pattern

Dynamic tool allocation introduces complexity and may be overkill if your agent has:

  • A very small, stable set of tools (e.g., fewer than 7-10).
  • An ultra-low-latency use case where the extra retrieval step is unacceptable.
  • A workflow where every tool is potentially relevant to every query.

6. Conclusion: From Warehouse to Spice Rack

Dynamic tool allocation represents a crucial paradigm shift from building static, monolithic agents to orchestrating dynamic, composable agent systems. It is an essential architecture for building AI agents that are scalable, efficient, and accurate. By graduating from the "tool warehouse" to the curated "spice rack," we build agents that are faster, more accurate, and more cost-effective. Frameworks like Google ADK provide the necessary orchestration capabilities to implement this pattern effectively, paving the way for production-ready AI applications.

Ready to optimize your AI agents? Try implementing the RATS pattern in your next project using the code examples above. I'd love to hear about your results and any challenges you encounter along the way.


Further Reading