Optimize agents¶
ADK provides an extensible framework for automated agent optimization based on
evaluation results.
Out of the box, you can use the adk optimize command to quickly optimize
simple agents based on ADK evaluation results using the default optimizer.
For more complex use cases, you can develop samplers that use data from custom
evals, or you can implement new optimization strategies.
Definitions¶
- Sampler: A sampler allows the agent optimizer to evaluate candidate optimized agents. When requested, the sampler provides the optimizer with detailed evaluation results that are useful for eval-guided agent optimization.
- Agent Optimizer: An agent optimizer reviews the evaluation results from the sampler and uses them to improve the agent.
Example - Optimize a Simple Agent with adk optimize¶
In this example, we will use the adk optimize command to update the
instructions of the
hello_world
sample agent based on evaluation results over a small eval set.
Step 1: Specify the Example Dataset¶
The default hello_world agent instructions describe how to determine whether a
number is prime.
The eval set for this example adds another aspect that the agent does not have
instructions for: numbers can be "good" or "bad" depending on their primality.
The optimizer is expected to derive this new rule and add it to the agent
instructions.
Create a file train_eval_set.evalset.json in
contributing/samples/hello_world/
with the following contents:
{
  "eval_set_id": "train_eval_set",
  "name": "train_eval_set",
  "eval_cases": [
    {
      "eval_id": "simple",
      "conversation": [
        {
          "invocation_id": "inv1",
          "user_content": {
            "parts": [ {"text": "Is 7 prime?"} ],
            "role": "user"
          },
          "final_response": {
            "parts": [ {"text": "7 is a prime number."} ],
            "role": "model"
          }
        }
      ],
      "session_input": {
        "app_name": "hello_world",
        "user_id": "user"
      }
    },
    {
      "eval_id": "is_good",
      "conversation": [
        {
          "invocation_id": "inv1",
          "user_content": {
            "parts": [ {"text": "Is 4 a bad number?"} ],
            "role": "user"
          },
          "final_response": {
            "parts": [ {"text": "4 is not prime so it is a good number."} ],
            "role": "model"
          }
        }
      ],
      "session_input": {
        "app_name": "hello_world",
        "user_id": "user"
      }
    },
    {
      "eval_id": "is_bad",
      "conversation": [
        {
          "invocation_id": "inv1",
          "user_content": {
            "parts": [ {"text": "Is 5 a bad number?"} ],
            "role": "user"
          },
          "final_response": {
            "parts": [ {"text": "5 is prime so it is a bad number."} ],
            "role": "model"
          }
        }
      ],
      "session_input": {
        "app_name": "hello_world",
        "user_id": "user"
      }
    }
  ]
}
Step 2: Define a Sampler Config¶
The sampler config controls how candidate optimized agents are evaluated: for example, it specifies the correctness criterion for the agent output and the eval set to use for optimizing the agent.
The full list of configuration options is available below;
for now, simply create a file sampler_config.json in
contributing/samples/hello_world/
with the following contents:
{
  "eval_config": {
    "criteria": {
      "response_match_score": 0.75
    }
  },
  "app_name": "hello_world",
  "train_eval_set": "train_eval_set"
}
Step 3: Run the Optimization Job¶
Run the adk optimize command, pointing it to the hello_world agent's
directory and passing the configuration file created above.
adk optimize contributing/samples/hello_world \
--sampler_config_file_path contributing/samples/hello_world/sampler_config.json
The final output varies, but might look similar to the following:
<logs and intermediate output>
================================================================================
Optimized root agent instructions:
--------------------------------------------------------------------------------
<existing unmodified instructions omitted for brevity>
**Special Rules for "Good" and "Bad" Numbers:**
* A "bad number" is defined as a prime number.
* A "good number" is defined as a non-prime number (i.e., a composite number or 1).
* If a user asks if a number is "good" or "bad", you must always use the `check_prime` tool to determine its primality first.
* After determining primality with the tool, respond according to the definitions above. Questions about "good" or "bad" numbers, when referring to primality, are objective and you are fully capable of answering them. Do not state you cannot answer such questions.
================================================================================
Using the adk optimize Command¶
- `AGENT_MODULE_FILE_PATH`: The path to the `__init__.py` file that contains a module by the name `agent`. The `agent` module must contain a `root_agent`. For an example of a valid setup, examine the `hello_world` agent.
- `--sampler_config_file_path PATH`: The path to the config for the sampler. The sampler implementation and config format are described below.
- `--optimizer_config_file_path PATH` (optional): The path to the config for the agent optimizer. If not provided, the default config will be used. The optimizer implementation, config format, and default config are described below.
- `--print_detailed_results` (optional): Enables printing some detailed metrics measured by the agent optimizer.
- `--log_level` (optional): Sets the logging level. Default is `INFO`. Valid options are `DEBUG`, `INFO`, `WARNING`, `ERROR`, and `CRITICAL`.
Available Samplers and Agent Optimizers¶
The following samplers and agent optimizers are provided with ADK.
The adk optimize command uses the LocalEvalSampler and
GEPARootAgentPromptOptimizer described below.
You can also use these samplers and agent optimizers in your own scripts.
LocalEvalSampler¶
The
LocalEvalSampler
evaluates candidate agents using the ADK's
LocalEvalService.
It provides eval results as an UnstructuredSamplingResult.
You can configure the LocalEvalSampler with a LocalEvalSamplerConfig that
contains the following fields:
- `eval_config`: An `EvalConfig` which provides the evaluation criteria and user simulation options.
- `app_name`: The app name to use for evaluation.
- `train_eval_set`: The name of the eval set to use for optimization.
- `train_eval_case_ids` (optional): The ids of the eval cases (examples) to use for optimization. If not provided, all eval cases in `train_eval_set` are used.
- `validation_eval_set` (optional): The name of the eval set to use for validating the optimized agent. If not provided, `train_eval_set` is reused.
- `validation_eval_case_ids` (optional): The ids of the eval cases (examples) to use for validating the optimized agent. If not provided, all eval cases in `validation_eval_set` are used. If `validation_eval_set` is also not provided, the effective train eval cases are reused.
While initializing the LocalEvalSampler, you must also provide an
EvalSetsManager
that can access the train and validation eval sets specified in the
LocalEvalSamplerConfig.
GEPARootAgentPromptOptimizer¶
The
GEPARootAgentPromptOptimizer
improves the instructions of the root agent
using the GEPA optimizer.
It expects the sampler to provide eval results as an
UnstructuredSamplingResult.
Its output is a subclass of OptimizerResult which
specifies a list of optimized agents with scores and
additional metrics collected during optimization.
Note: The GEPARootAgentPromptOptimizer does not improve any sub-agents, agent
tools, skills, or any other aspect of the root agent.
You can configure the GEPARootAgentPromptOptimizer with a
GEPARootAgentPromptOptimizerConfig that contains the following fields:
- `optimizer_model` (optional): The model used to analyze evaluation results and optimize the agent. Defaults to `"gemini-2.5-flash"`.
- `model_configuration` (optional): The configuration for the optimizer model. Defaults to a config with a 10K token thinking budget.
- `max_metric_calls` (optional): The maximum number of evaluations to run during optimization. Defaults to 100.
- `reflection_minibatch_size` (optional): The number of examples to use at a time to update the agent instructions. Defaults to 3.
- `run_dir` (optional): The directory to save intermediate and final optimization results, if desired. Facilitates warm starts.
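To make the documented defaults concrete, here is a stand-alone sketch that mirrors the fields above as a plain dataclass. This is illustrative only: the real `GEPARootAgentPromptOptimizerConfig` lives in `google.adk.optimization`, and the concrete type of `model_configuration` is omitted here.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class GEPAConfigSketch:
    """Illustrative stand-in mirroring GEPARootAgentPromptOptimizerConfig's fields."""

    optimizer_model: str = "gemini-2.5-flash"
    # In ADK this defaults to a model config with a 10K-token thinking budget;
    # the concrete type is not reproduced in this sketch.
    model_configuration: Optional[Any] = None
    max_metric_calls: int = 100
    reflection_minibatch_size: int = 3
    run_dir: Optional[str] = None  # set to persist results and enable warm starts


# Override only what you need; the remaining fields keep their defaults.
config = GEPAConfigSketch(max_metric_calls=50, reflection_minibatch_size=5)
```

Lowering `max_metric_calls` trades optimization quality for a smaller evaluation budget, since every candidate scored counts against it.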
Key Data Types¶
ADK defines several base data types in
optimization/data_types.py
to regulate the transfer of eval data from the sampler to the optimizer and the
output of the optimizer.
These data types are designed for extensibility to accommodate custom evals and
optimization strategies.
Sampler Results¶
- `SamplingResult`: The foundational class for the output of a sampler.
    - Must include a `scores` dictionary that maps an example UID to the agent's overall score on that example.
- `UnstructuredSamplingResult`: A built-in subclass of `SamplingResult` that adds an optional `data` field to hold unstructured, per-example, JSON-serializable evaluation data (such as trajectories, intermediate outputs, and sub-metrics).
You can use the UnstructuredSamplingResult for most use cases.
Alternatively, you can create your own subclass of SamplingResult to return
additional evaluation data in a more structured format.
However, you must make sure that both the sampler and the optimizer support your
format.
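As a rough illustration of the shapes involved (plain-Python stand-ins, not the real ADK classes), a minimal version of these two result types might look like:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SamplingResultSketch:
    """Stand-in for SamplingResult: maps example UID -> overall score."""

    scores: Dict[str, float]


@dataclass
class UnstructuredSamplingResultSketch(SamplingResultSketch):
    """Stand-in for UnstructuredSamplingResult: adds optional per-example,
    JSON-serializable evaluation data."""

    data: Optional[Dict[str, Any]] = None


result = UnstructuredSamplingResultSketch(
    scores={"simple": 1.0, "is_good": 0.5},
    data={"is_good": {"final_response": "4 is not prime.", "response_match_score": 0.5}},
)
```

The optimizer reads `scores` to rank candidates and, when available, mines the per-example `data` for clues about how to improve the agent.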
Agent Optimizer Results¶
- `AgentWithScores`: Represents a single optimized agent along with its overall score.
    - Must include the `optimized_agent` (the updated `Agent` object).
    - Can include the `overall_score` of the agent (typically on the validation set).
- `OptimizerResult`: Represents the final output of an optimization process.
    - Must include a list of `optimized_agents` (which are objects of `AgentWithScores` or its subclasses). When measuring agent optimality over multiple metrics, multiple entries may be needed to represent the Pareto frontier.
You can create your own subclass of AgentWithScores to expose fine-grained
metrics about the candidate optimized agent.
For example, you might want to separately score the agent on accuracy, safety,
alignment, etc.
Similarly, you can create your own subclass of OptimizerResult to expose
overall metrics about the optimization process for your optimizer (number of
candidates evaluated, total number of evaluations, etc.).
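The subclassing pattern described above can be sketched with plain dataclasses (illustrative stand-ins; in real code you would subclass the ADK types, and `optimized_agent` would be an ADK `Agent`):

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class AgentWithScoresSketch:
    """Stand-in for AgentWithScores."""

    optimized_agent: Any  # in ADK, the updated Agent object
    overall_score: Optional[float] = None


@dataclass
class AgentWithDetailedScores(AgentWithScoresSketch):
    """Hypothetical subclass exposing per-dimension scores (accuracy, safety, ...)."""

    dimension_scores: Dict[str, float] = field(default_factory=dict)


@dataclass
class OptimizerResultSketch:
    """Stand-in for OptimizerResult; multiple entries can represent a Pareto frontier."""

    optimized_agents: List[AgentWithScoresSketch] = field(default_factory=list)
    num_candidates_evaluated: int = 0  # example of an optimizer-level metric


best = AgentWithDetailedScores(
    optimized_agent="<agent placeholder>",
    overall_score=0.9,
    dimension_scores={"accuracy": 0.95, "safety": 0.85},
)
result = OptimizerResultSketch(optimized_agents=[best], num_candidates_evaluated=12)
```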
Creating and Using New Samplers and Agent Optimizers¶
If your use case requires complex sampling and evaluation logic or a custom
agent optimization strategy, you can create custom implementations of the
Sampler and AgentOptimizer abstract classes described below.
By adhering to this API, you can mix and match ADK-provided samplers and agent
optimizers with your custom implementations.
Creating a New Sampler¶
To create a new sampler for custom evaluations, you must create a class that
extends the
Sampler base
class.
You must also specify the subclass of SamplingResult that
your sampler will use to return eval results.
The sampler must implement the following abstract methods:
- `get_train_example_ids(self)`: Returns the list of example UIDs to use for optimization.
- `get_validation_example_ids(self)`: Returns the list of example UIDs to use for validating the optimized agent.
- `sample_and_score(self, candidate, example_set, batch, capture_full_eval_data)`: Evaluates the `candidate` agent on a `batch` of examples from the specified `example_set` (`"train"` or `"validation"`). It should return a `SamplingResult` subclass containing the calculated per-example scores and, if `capture_full_eval_data` is `True`, any additional data required for eval-guided agent optimization. You can choose a format for the additional eval data based on your needs by subclassing `SamplingResult`; however, the agent optimizer must also support the same subclass of `SamplingResult`. The `UnstructuredSamplingResult` implements the simplest case, in which the additional data is stored in a per-example unstructured dictionary.
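The skeleton below sketches this interface using plain-Python stand-ins: a toy in-memory QA dataset, a callable in place of an ADK agent, and a dict in place of a `SamplingResult`. A real sampler would extend the `Sampler` base class and return a proper `SamplingResult` subclass.

```python
from typing import Any, Dict, List, Tuple


class ToyQASampler:
    """Sketch of a custom sampler over a tiny in-memory QA dataset.

    Illustrative only: real code extends google.adk's Sampler base class.
    """

    def __init__(self, train: Dict[str, Tuple[str, str]],
                 validation: Dict[str, Tuple[str, str]]):
        # Each example maps uid -> (question, expected_answer).
        self._examples = {"train": train, "validation": validation}

    def get_train_example_ids(self) -> List[str]:
        return list(self._examples["train"])

    def get_validation_example_ids(self) -> List[str]:
        return list(self._examples["validation"])

    def sample_and_score(self, candidate, example_set: str, batch: List[str],
                         capture_full_eval_data: bool = False) -> Dict[str, Any]:
        scores: Dict[str, float] = {}
        data: Dict[str, Any] = {}
        for uid in batch:
            question, expected = self._examples[example_set][uid]
            answer = candidate(question)  # `candidate` is a callable agent stand-in
            scores[uid] = 1.0 if answer == expected else 0.0
            if capture_full_eval_data:
                # Extra per-example data the optimizer can reflect on.
                data[uid] = {"question": question, "answer": answer, "expected": expected}
        return {"scores": scores, "data": data or None}


sampler = ToyQASampler(
    train={"ex1": ("Is 7 prime?", "yes"), "ex2": ("Is 4 prime?", "no")},
    validation={"ex3": ("Is 5 prime?", "yes")},
)
result = sampler.sample_and_score(lambda q: "yes", "train", ["ex1", "ex2"], True)
```

Here the always-"yes" candidate scores 1.0 on `ex1` and 0.0 on `ex2`, and the captured `data` records what went wrong for the failing example.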
Creating a New Agent Optimizer¶
To create a custom agent optimizer, you must create a class that extends the
AgentOptimizer
base class.
You must also specify the subclass of SamplingResult that
it will accept for eval results and the subclass of
AgentWithScores it will use to represent each
optimized agent and its scores/metrics.
The optimizer must implement the following abstract method:
- `optimize(self, initial_agent, sampler)`: Orchestrates the optimization process. It takes an `initial_agent` to improve and a `sampler` to use for evaluating candidates. It should return an `OptimizerResult` subclass containing a list of candidate optimized agents along with their scores/metrics and any overall metrics related to the optimization process. You can choose a format for the per-candidate scores/metrics based on your needs by subclassing `AgentWithScores`. Alternatively, you can simply use `AgentWithScores`, which allows specifying a single overall score for each candidate optimized agent.
Optimizing an Agent Programmatically¶
The adk optimize command uses the LocalEvalSampler and
the GEPARootAgentPromptOptimizer.
When using custom samplers and agent optimizers, you will have to optimize the
agent programmatically.
The following reference code replicates the functionality of the adk optimize
command for the above example.
To use it, create the dataset as shown in the example and run
this code from a Python script within the
same directory:
import asyncio
import logging
import os

import agent  # the hello_world agent
from google.adk.cli.utils import envs
from google.adk.cli.utils import logs
from google.adk.evaluation.eval_config import EvalConfig
from google.adk.evaluation.local_eval_sets_manager import LocalEvalSetsManager
from google.adk.optimization.gepa_root_agent_prompt_optimizer import GEPARootAgentPromptOptimizer
from google.adk.optimization.gepa_root_agent_prompt_optimizer import GEPARootAgentPromptOptimizerConfig
from google.adk.optimization.local_eval_sampler import LocalEvalSampler
from google.adk.optimization.local_eval_sampler import LocalEvalSamplerConfig

# Set up environment variables (API keys, etc.) and logging.
envs.load_dotenv_for_agent(".", ".")
logs.setup_adk_logger(logging.INFO)

# Create the sampler.
sampler_config = LocalEvalSamplerConfig(
    eval_config=EvalConfig(criteria={"response_match_score": 0.75}),
    app_name="hello_world",  # typically the name of the directory containing the agent
    train_eval_set="train_eval_set",  # from the example
)
eval_sets_manager = LocalEvalSetsManager(
    agents_dir=os.path.dirname(os.getcwd()),
)
sampler = LocalEvalSampler(sampler_config, eval_sets_manager)

# Create the optimizer.
opt_config = GEPARootAgentPromptOptimizerConfig()
optimizer = GEPARootAgentPromptOptimizer(config=opt_config)

# Optimize the root agent.
initial_agent = agent.root_agent
result = asyncio.run(optimizer.optimize(initial_agent, sampler))

# Show the results.
best_idx = result.gepa_result["best_idx"]
print(
    "Validation score:",
    result.optimized_agents[best_idx].overall_score,
    "Optimized prompt:",
    result.optimized_agents[best_idx].optimized_agent.instruction,
    "GEPA metrics:",
    result.gepa_result,
    sep="\n",
)