Artifacts
In ADK, Artifacts represent a crucial mechanism for managing named, versioned binary data associated either with a specific user interaction session or persistently with a user across multiple sessions. They allow your agents and tools to handle data beyond simple text strings, enabling richer interactions involving files, images, audio, and other binary formats.
What are Artifacts?
- Definition: An Artifact is a piece of binary data (such as the content of a file) identified by a unique filename string within a specific scope (session or user). Each time you save an artifact with the same filename, a new version is created.
- Representation: Artifacts are consistently represented using the standard google.genai.types.Part object. The core data is typically stored in the inline_data attribute of the Part, which itself contains:
    - data: The raw binary content as bytes.
    - mime_type: A string indicating the type of the data (e.g., 'image/png', 'application/pdf'). This is essential for correctly interpreting the data later.
# Example of how an artifact might be represented as a types.Part
import google.genai.types as types

# Assume 'image_bytes' contains the binary data of a PNG image
image_bytes = b'\x89PNG\r\n\x1a\n...'  # Placeholder for actual image bytes

image_artifact = types.Part(
    inline_data=types.Blob(
        mime_type="image/png",
        data=image_bytes
    )
)

# You can also use the convenience constructor:
# image_artifact_alt = types.Part.from_bytes(data=image_bytes, mime_type="image/png")

print(f"Artifact MIME Type: {image_artifact.inline_data.mime_type}")
print(f"Artifact Data (first 10 bytes): {image_artifact.inline_data.data[:10]}...")
- Persistence & Management: Artifacts are not stored directly within the agent or session state. Their storage and retrieval are managed by a dedicated Artifact Service (an implementation of BaseArtifactService, defined in google.adk.artifacts.base_artifact_service.py). ADK provides implementations like InMemoryArtifactService (for testing/temporary storage, defined in google.adk.artifacts.in_memory_artifact_service.py) and GcsArtifactService (for persistent storage using Google Cloud Storage, defined in google.adk.artifacts.gcs_artifact_service.py). The chosen service handles versioning automatically when you save data.
Why Use Artifacts?
While session state is suitable for storing small pieces of configuration or conversational context (like strings, numbers, booleans, or small dictionaries/lists), Artifacts are designed for scenarios involving binary or large data:
- Handling Non-Textual Data: Easily store and retrieve images, audio clips, video snippets, PDFs, spreadsheets, or any other file format relevant to your agent's function.
- Persisting Large Data: Session state is generally not optimized for storing large amounts of data. Artifacts provide a dedicated mechanism for persisting larger blobs without cluttering the session state.
- User File Management: Provide capabilities for users to upload files (which can be saved as artifacts) and retrieve or download files generated by the agent (loaded from artifacts).
- Sharing Outputs: Enable tools or agents to generate binary outputs (like a PDF report or a generated image) that can be saved via save_artifact and later accessed by other parts of the application or even in subsequent sessions (if using user namespacing).
- Caching Binary Data: Store the results of computationally expensive operations that produce binary data (e.g., rendering a complex chart image) as artifacts to avoid regenerating them on subsequent requests.
In essence, whenever your agent needs to work with file-like binary data that must be persisted, versioned, or shared, Artifacts managed by an ArtifactService are the appropriate mechanism within ADK.
Common Use Cases
Artifacts provide a flexible way to handle binary data within your ADK applications.
Here are some typical scenarios where they prove valuable:
- Generated Reports/Files:
    - A tool or agent generates a report (e.g., a PDF analysis, a CSV data export, an image chart).
    - The tool uses tool_context.save_artifact("monthly_report_oct_2024.pdf", report_part) to store the generated file.
    - The user can later ask the agent to retrieve this report, which might involve another tool using tool_context.load_artifact("monthly_report_oct_2024.pdf") or listing available reports using tool_context.list_artifacts().
- Handling User Uploads:
    - A user uploads a file (e.g., an image for analysis, a document for summarization) through a front-end interface.
    - The application backend receives the file, creates a types.Part from its bytes and MIME type, and stores it either by calling the artifact service directly (outside an agent run) or through a dedicated tool/callback within a run via context.save_artifact, potentially using the user: namespace if it should persist across sessions (e.g., user:uploaded_image.jpg).
    - An agent can then be prompted to process this uploaded file, using context.load_artifact("user:uploaded_image.jpg") to retrieve it.
- Storing Intermediate Binary Results:
    - An agent performs a complex multi-step process where one step generates intermediate binary data (e.g., audio synthesis, simulation results).
    - This data is saved using context.save_artifact with a temporary or descriptive name (e.g., "temp_audio_step1.wav").
    - A subsequent agent or tool in the flow (perhaps in a SequentialAgent or triggered later) can load this intermediate artifact using context.load_artifact to continue the process.
- Persistent User Data:
    - Storing user-specific configuration or data that isn't a simple key-value state.
    - An agent saves user preferences or a profile picture using context.save_artifact("user:profile_settings.json", settings_part) or context.save_artifact("user:avatar.png", avatar_part).
    - These artifacts can be loaded in any future session for that user to personalize their experience.
- Caching Generated Binary Content:
    - An agent frequently generates the same binary output based on certain inputs (e.g., a company logo image, a standard audio greeting).
    - Before generating, a before_tool_callback or before_agent_callback checks if the artifact exists using context.load_artifact.
    - If it exists, the cached artifact is used, skipping the generation step.
    - If not, the content is generated, and context.save_artifact is called in an after_tool_callback or after_agent_callback to cache it for next time.
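The check-then-generate flow of the caching scenario can be sketched without ADK at all. The snippet below is a minimal illustration, not ADK code: FakeContext is a hypothetical stand-in that only mimics the save_artifact and load_artifact behavior described in this document (version numbers starting at 0, None for a missing artifact), and render_logo is a made-up placeholder for an expensive generation step. Artifacts are plain bytes here rather than types.Part objects.

```python
from typing import Optional


class FakeContext:
    """Hypothetical stand-in for a callback/tool context (not an ADK class)."""

    def __init__(self) -> None:
        self._store: dict[str, list[bytes]] = {}

    def save_artifact(self, filename: str, artifact: bytes) -> int:
        versions = self._store.setdefault(filename, [])
        versions.append(artifact)
        return len(versions) - 1  # version numbers start at 0

    def load_artifact(self, filename: str) -> Optional[bytes]:
        versions = self._store.get(filename)
        return versions[-1] if versions else None


render_calls = 0


def render_logo() -> bytes:
    """Placeholder for an expensive generation step (assumed for illustration)."""
    global render_calls
    render_calls += 1
    return b"\x89PNG...logo-bytes"


def get_logo(context: FakeContext, filename: str = "user:company_logo.png") -> bytes:
    cached = context.load_artifact(filename)
    if cached is not None:
        return cached  # cache hit: skip generation
    logo = render_logo()  # cache miss: generate once
    context.save_artifact(filename, logo)
    return logo


ctx = FakeContext()
first = get_logo(ctx)
second = get_logo(ctx)
print(f"render_logo was called {render_calls} time(s)")
```

In a real agent the lookup would typically live in a before-callback and the save in an after-callback, as the scenario describes; the single function here only compresses that flow for readability.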
Core Concepts
Understanding artifacts involves grasping a few key components: the service that manages them, the data structure used to hold them, and how they are identified and versioned.
Artifact Service (BaseArtifactService)
- Role: The central component responsible for the actual storage and retrieval logic for artifacts. It defines how and where artifacts are persisted.
- Interface: Defined by the abstract base class BaseArtifactService (google.adk.artifacts.base_artifact_service.py). Any concrete implementation must provide methods for:
    - save_artifact(...) -> int: Stores the artifact data and returns its assigned version number.
    - load_artifact(...) -> Optional[types.Part]: Retrieves a specific version (or the latest) of an artifact.
    - list_artifact_keys(...) -> list[str]: Lists the unique filenames of artifacts within a given scope.
    - delete_artifact(...) -> None: Removes an artifact (and potentially all its versions, depending on implementation).
    - list_versions(...) -> list[int]: Lists all available version numbers for a specific artifact filename.
- Configuration: You provide an instance of an artifact service (e.g., InMemoryArtifactService, GcsArtifactService) when initializing the Runner. The Runner then makes this service available to agents and tools via the InvocationContext.
from google.adk.runners import Runner
from google.adk.artifacts import InMemoryArtifactService # Or GcsArtifactService
from google.adk.agents import LlmAgent # Any agent
from google.adk.sessions import InMemorySessionService
# Example: Configuring the Runner with an Artifact Service
my_agent = LlmAgent(name="artifact_user_agent", model="gemini-2.0-flash")
artifact_service = InMemoryArtifactService() # Choose an implementation
session_service = InMemorySessionService()
runner = Runner(
    agent=my_agent,
    app_name="my_artifact_app",
    session_service=session_service,
    artifact_service=artifact_service # Provide the service instance here
)
# Now, contexts within runs managed by this runner can use artifact methods
Artifact Data (google.genai.types.Part)
- Standard Representation: Artifact content is universally represented using the google.genai.types.Part object, the same structure used for parts of LLM messages.
- Key Attribute (inline_data): For artifacts, the most relevant attribute is inline_data, which is a google.genai.types.Blob object containing:
    - data (bytes): The raw binary content of the artifact.
    - mime_type (str): A standard MIME type string (e.g., 'application/pdf', 'image/png', 'audio/mpeg') describing the nature of the binary data. This is crucial for correct interpretation when loading the artifact.
- Creation: You typically create a Part for an artifact using its from_bytes class method or by constructing it directly with a Blob.
import google.genai.types as types
# Example: Creating an artifact Part from raw bytes
pdf_bytes = b'%PDF-1.4...' # Your raw PDF data
pdf_mime_type = "application/pdf"
# Using the constructor
pdf_artifact = types.Part(
    inline_data=types.Blob(data=pdf_bytes, mime_type=pdf_mime_type)
)
# Using the convenience class method (equivalent)
pdf_artifact_alt = types.Part.from_bytes(data=pdf_bytes, mime_type=pdf_mime_type)
print(f"Created artifact with MIME type: {pdf_artifact.inline_data.mime_type}")
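When the bytes come from a file on disk, the MIME type can often be derived from the filename instead of being hard-coded. The sketch below uses only Python's standard mimetypes module; guess_mime_type is a hypothetical helper name, and the application/octet-stream fallback is an assumption that you may want to replace with stricter validation. The resulting string is what you would pass as mime_type when building the Part.

```python
import mimetypes


def guess_mime_type(filename: str, default: str = "application/octet-stream") -> str:
    """Return a MIME type guessed from the file extension, or a generic default."""
    # Strip the "user:" namespace prefix, which is not part of the file name proper.
    plain_name = filename.removeprefix("user:")
    guessed, _encoding = mimetypes.guess_type(plain_name)
    return guessed or default


for name in ("monthly_report.pdf", "user:avatar.png", "notes.unknownext"):
    print(name, "->", guess_mime_type(name))
```

Extension-based guessing is only a convenience: if the extension and the actual content disagree, the stored mime_type will be wrong, so content that arrives with an explicit MIME type (for example, from an upload request) should keep it.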
Filename (str)
- Identifier: A simple string used to name and retrieve an artifact within its specific namespace (see below).
- Uniqueness: Filenames must be unique within their scope (either the session or the user namespace).
- Best Practice: Use descriptive names, potentially including file extensions (e.g., "monthly_report.pdf", "user_avatar.jpg"), although the extension itself doesn't dictate behavior; the mime_type does.
Versioning (int)
- Automatic Versioning: The artifact service automatically handles versioning. When you call save_artifact, the service determines the next available version number (typically starting from 0 and incrementing) for that specific filename and scope.
- Returned by save_artifact: The save_artifact method returns the integer version number that was assigned to the newly saved artifact.
- Retrieval:
    - load_artifact(..., version=None) (default): Retrieves the latest available version of the artifact.
    - load_artifact(..., version=N): Retrieves the specific version N.
- Listing Versions: The list_versions method (on the service, not context) can be used to find all existing version numbers for an artifact.
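The versioning rules above can be illustrated with a small toy store. This is a sketch under stated assumptions, not the ADK implementation: ToyVersionedStore and its method names are hypothetical, payloads are plain bytes instead of types.Part, and scoping by app, user, and session is left out. It only shows the numbering scheme (first save is version 0), the latest-by-default load, and version listing.

```python
from typing import Optional


class ToyVersionedStore:
    """Illustrative only: keeps every saved payload in a per-filename list."""

    def __init__(self) -> None:
        self._versions: dict[str, list[bytes]] = {}

    def save(self, filename: str, data: bytes) -> int:
        history = self._versions.setdefault(filename, [])
        history.append(data)
        return len(history) - 1  # the list index doubles as the version number

    def load(self, filename: str, version: Optional[int] = None) -> Optional[bytes]:
        history = self._versions.get(filename)
        if not history:
            return None  # unknown filename
        if version is None:
            return history[-1]  # latest version
        if 0 <= version < len(history):
            return history[version]
        return None  # requested version does not exist

    def list_versions(self, filename: str) -> list[int]:
        return list(range(len(self._versions.get(filename, []))))


store = ToyVersionedStore()
v0 = store.save("report.pdf", b"first draft")
v1 = store.save("report.pdf", b"second draft")
print(v0, v1, store.list_versions("report.pdf"))
print(store.load("report.pdf"), store.load("report.pdf", version=0))
```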
Namespacing (Session vs. User)
- Concept: Artifacts can be scoped either to a specific session or more broadly to a user across all their sessions within the application. This scoping is determined by the filename format and handled internally by the ArtifactService.
- Default (Session Scope): If you use a plain filename like "report.pdf", the artifact is associated with the specific app_name, user_id, and session_id. It's only accessible within that exact session context.
    - Internal Path (Example): app_name/user_id/session_id/report.pdf/<version> (as seen in GcsArtifactService._get_blob_name and InMemoryArtifactService._artifact_path)
- User Scope ("user:" prefix): If you prefix the filename with "user:", like "user:profile.png", the artifact is associated only with the app_name and user_id. It can be accessed or updated from any session belonging to that user within the app.
    - Internal Path (Example): app_name/user_id/user/user:profile.png/<version> (The user: prefix is often kept in the final path segment for clarity, as seen in the service implementations).
    - Use Case: Ideal for data that belongs to the user themselves, independent of a specific conversation, such as profile pictures, user preferences files, or long-term reports.
# Example illustrating namespace difference (conceptual)
# Session-specific artifact filename
session_report_filename = "summary.txt"
# User-specific artifact filename
user_config_filename = "user:settings.json"
# When saving 'summary.txt', it's tied to the current session ID.
# When saving 'user:settings.json', it's tied only to the user ID.
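To make the two path patterns concrete, here is a small sketch that builds a storage path from a filename. The function artifact_path is a hypothetical helper written for this illustration; it follows the example paths given in this section and is not the code used by the ADK services.

```python
USER_PREFIX = "user:"


def artifact_path(app_name: str, user_id: str, session_id: str,
                  filename: str, version: int) -> str:
    """Build an illustrative storage path following the patterns in this section."""
    if filename.startswith(USER_PREFIX):
        # User scope: the session ID is replaced by the literal segment "user".
        return f"{app_name}/{user_id}/user/{filename}/{version}"
    # Session scope: the path includes the session ID.
    return f"{app_name}/{user_id}/{session_id}/{filename}/{version}"


print(artifact_path("my_app", "u1", "s1", "summary.txt", 0))
print(artifact_path("my_app", "u1", "s1", "user:settings.json", 2))
print(artifact_path("my_app", "u1", "s2", "user:settings.json", 2))
```

The last two calls use different session IDs but produce the same path, which is why a user-scoped artifact is visible from every session of that user.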
These core concepts work together to provide a flexible system for managing binary data within the ADK framework.
Interacting with Artifacts (via Context Objects)
The primary way you interact with artifacts within your agent's logic (specifically within callbacks or tools) is through methods provided by the CallbackContext and ToolContext objects. These methods abstract away the underlying storage details managed by the ArtifactService.
Prerequisite: Configuring the ArtifactService
Before you can use any artifact methods via the context objects, you must provide an instance of a BaseArtifactService implementation (like InMemoryArtifactService or GcsArtifactService) when initializing your Runner.
from google.adk.runners import Runner
from google.adk.artifacts import InMemoryArtifactService # Or GcsArtifactService
from google.adk.agents import LlmAgent
from google.adk.sessions import InMemorySessionService
# Your agent definition
agent = LlmAgent(name="my_agent", model="gemini-2.0-flash")
# Instantiate the desired artifact service
artifact_service = InMemoryArtifactService()
# Provide it to the Runner
runner = Runner(
    agent=agent,
    app_name="artifact_app",
    session_service=InMemorySessionService(),
    artifact_service=artifact_service # Service must be provided here
)
If no artifact_service is configured in the InvocationContext (which happens if it's not passed to the Runner), calling save_artifact, load_artifact, or list_artifacts on the context objects will raise a ValueError.
Accessing Methods
The artifact interaction methods are available directly on instances of CallbackContext (passed to agent and model callbacks) and ToolContext (passed to tools and tool callbacks). Remember that ToolContext inherits from CallbackContext.
Saving Artifacts
- Method: context.save_artifact(filename: str, artifact: types.Part) -> int
- Available Contexts: CallbackContext, ToolContext.
- Action:
    - Takes a filename string (which may include the "user:" prefix for user-scoping) and a types.Part object containing the artifact data (usually in artifact.inline_data).
    - Passes this information to the underlying artifact_service.save_artifact.
    - The service stores the data and assigns the next available version number for that filename and scope.
    - Crucially, the context automatically records this action by adding an entry to the current event's actions.artifact_delta dictionary (defined in google.adk.events.event_actions.py). This delta maps the filename to the newly assigned version.
- Returns: The integer version number assigned to the saved artifact.
- Code Example (within a hypothetical tool or callback):
import google.genai.types as types
from google.adk.agents.callback_context import CallbackContext # Or ToolContext
async def save_generated_report(context: CallbackContext, report_bytes: bytes):
    """Saves generated PDF report bytes as an artifact."""
    report_artifact = types.Part.from_bytes(
        data=report_bytes,
        mime_type="application/pdf"
    )
    filename = "generated_report.pdf"
    try:
        version = context.save_artifact(filename=filename, artifact=report_artifact)
        print(f"Successfully saved artifact '{filename}' as version {version}.")
        # The event generated after this callback will contain:
        # event.actions.artifact_delta == {"generated_report.pdf": version}
    except ValueError as e:
        print(f"Error saving artifact: {e}. Is ArtifactService configured?")
    except Exception as e:
        # Handle potential storage errors (e.g., GCS permissions)
        print(f"An unexpected error occurred during artifact save: {e}")
# --- Example Usage Concept ---
# report_data = b'...' # Assume this holds the PDF bytes
# await save_generated_report(callback_context, report_data)
Loading Artifacts
- Method: context.load_artifact(filename: str, version: Optional[int] = None) -> Optional[types.Part]
- Available Contexts: CallbackContext, ToolContext.
- Action:
    - Takes a filename string (potentially including "user:").
    - Optionally takes an integer version. If version is None (the default), it requests the latest version from the service. If a specific integer is provided, it requests that exact version.
    - Calls the underlying artifact_service.load_artifact.
    - The service attempts to retrieve the specified artifact.
- Returns: A types.Part object containing the artifact data if found, or None if the artifact (or the specified version) does not exist.
- Code Example (within a hypothetical tool or callback):
import google.genai.types as types
from google.adk.agents.callback_context import CallbackContext # Or ToolContext

async def process_latest_report(context: CallbackContext):
    """Loads the latest report artifact and processes its data."""
    filename = "generated_report.pdf"
    try:
        # Load the latest version
        report_artifact = context.load_artifact(filename=filename)

        if report_artifact and report_artifact.inline_data:
            print(f"Successfully loaded latest artifact '{filename}'.")
            print(f"MIME Type: {report_artifact.inline_data.mime_type}")
            # Process the report_artifact.inline_data.data (bytes)
            pdf_bytes = report_artifact.inline_data.data
            print(f"Report size: {len(pdf_bytes)} bytes.")
            # ... further processing ...
        else:
            print(f"Artifact '{filename}' not found.")

        # Example: Load a specific version (if version 0 exists)
        # specific_version_artifact = context.load_artifact(filename=filename, version=0)
        # if specific_version_artifact:
        #     print(f"Loaded version 0 of '{filename}'.")

    except ValueError as e:
        print(f"Error loading artifact: {e}. Is ArtifactService configured?")
    except Exception as e:
        # Handle potential storage errors
        print(f"An unexpected error occurred during artifact load: {e}")

# --- Example Usage Concept ---
# await process_latest_report(callback_context)
Listing Artifact Filenames (Tool Context Only)
- Method: tool_context.list_artifacts() -> list[str]
- Available Context: ToolContext only. This method is not available on the base CallbackContext.
- Action: Calls the underlying artifact_service.list_artifact_keys to get a list of all unique artifact filenames accessible within the current scope (including both session-specific files and user-scoped files prefixed with "user:").
- Returns: A sorted list of str filenames.
- Code Example (within a tool function):
from google.adk.tools.tool_context import ToolContext
def list_user_files(tool_context: ToolContext) -> str:
    """Tool to list available artifacts for the user."""
    try:
        available_files = tool_context.list_artifacts()
        if not available_files:
            return "You have no saved artifacts."
        else:
            # Format the list for the user/LLM
            file_list_str = "\n".join([f"- {fname}" for fname in available_files])
            return f"Here are your available artifacts:\n{file_list_str}"
    except ValueError as e:
        print(f"Error listing artifacts: {e}. Is ArtifactService configured?")
        return "Error: Could not list artifacts."
    except Exception as e:
        print(f"An unexpected error occurred during artifact list: {e}")
        return "Error: An unexpected error occurred while listing artifacts."
# This function would typically be wrapped in a FunctionTool
# from google.adk.tools import FunctionTool
# list_files_tool = FunctionTool(func=list_user_files)
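Because the returned list mixes session-scoped and user-scoped filenames, it can help to separate them before presenting them. The sketch below is a plain-Python illustration; split_by_scope is a hypothetical helper, and the sample list merely stands in for a value returned by tool_context.list_artifacts().

```python
def split_by_scope(filenames: list[str]) -> dict[str, list[str]]:
    """Group artifact filenames into session-scoped and user-scoped buckets."""
    groups: dict[str, list[str]] = {"session": [], "user": []}
    for name in filenames:
        scope = "user" if name.startswith("user:") else "session"
        groups[scope].append(name)
    return groups


# Sample data standing in for the list returned by tool_context.list_artifacts().
sample = ["generated_report.pdf", "summary.txt", "user:avatar.png", "user:settings.json"]
groups = split_by_scope(sample)
print(groups["session"])
print(groups["user"])
```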
These context methods provide a convenient and consistent way to manage binary data persistence within ADK, regardless of the chosen backend storage implementation (InMemoryArtifactService, GcsArtifactService, etc.).
Available Implementations
ADK provides concrete implementations of the BaseArtifactService interface, offering different storage backends suitable for various development stages and deployment needs. These implementations handle the details of storing, versioning, and retrieving artifact data based on the app_name, user_id, session_id, and filename (including the user: namespace prefix).
InMemoryArtifactService
- Source File: google.adk.artifacts.in_memory_artifact_service.py
- Storage Mechanism: Uses a Python dictionary (self.artifacts) held in the application's memory to store artifacts. The dictionary keys represent the artifact path (incorporating app, user, session/user-scope, and filename), and the values are lists of types.Part, where each element in the list corresponds to a version (index 0 is version 0, index 1 is version 1, etc.).
- Key Features:
    - Simplicity: Requires no external setup or dependencies beyond the core ADK library.
    - Speed: Operations are typically very fast as they involve in-memory dictionary lookups and list manipulations.
    - Ephemeral: All stored artifacts are lost when the Python process running the application terminates. Data does not persist between application restarts.
- Use Cases:
    - Ideal for local development and testing where persistence is not required.
    - Suitable for short-lived demonstrations or scenarios where artifact data is purely temporary within a single run of the application.
- Instantiation:
from google.adk.artifacts import InMemoryArtifactService
# Simply instantiate the class
in_memory_service = InMemoryArtifactService()
# Then pass it to the Runner
# runner = Runner(..., artifact_service=in_memory_service)
GcsArtifactService
- Source File: google.adk.artifacts.gcs_artifact_service.py
- Storage Mechanism: Leverages Google Cloud Storage (GCS) for persistent artifact storage. Each version of an artifact is stored as a separate object within a specified GCS bucket.
- Object Naming Convention: It constructs GCS object names (blob names) using a hierarchical path structure, typically:
    - Session-scoped: {app_name}/{user_id}/{session_id}/{filename}/{version}
    - User-scoped: {app_name}/{user_id}/user/{filename}/{version} (Note: The service handles the user: prefix in the filename to determine the path structure).
- Key Features:
    - Persistence: Artifacts stored in GCS persist across application restarts and deployments.
    - Scalability: Leverages the scalability and durability of Google Cloud Storage.
    - Versioning: Explicitly stores each version as a distinct GCS object.
    - Configuration Required: Needs configuration with a target GCS bucket_name.
    - Permissions Required: The application environment needs appropriate credentials and IAM permissions to read from and write to the specified GCS bucket.
- Use Cases:
    - Production environments requiring persistent artifact storage.
    - Scenarios where artifacts need to be shared across different application instances or services (by accessing the same GCS bucket).
    - Applications needing long-term storage and retrieval of user or session data.
- Instantiation:
from google.adk.artifacts import GcsArtifactService
# Specify the GCS bucket name
gcs_bucket_name = "your-gcs-bucket-for-adk-artifacts" # Replace with your bucket name
try:
    gcs_service = GcsArtifactService(bucket_name=gcs_bucket_name)
    print(f"GcsArtifactService initialized for bucket: {gcs_bucket_name}")
    # Ensure your environment has credentials to access this bucket.
    # e.g., via Application Default Credentials (ADC)
    # Then pass it to the Runner
    # runner = Runner(..., artifact_service=gcs_service)
except Exception as e:
    # Catch potential errors during GCS client initialization (e.g., auth issues)
    print(f"Error initializing GcsArtifactService: {e}")
    # Handle the error appropriately - maybe fall back to InMemory or raise
Choosing the appropriate ArtifactService implementation depends on your application's requirements for data persistence, scalability, and operational environment.
Best Practices
To use artifacts effectively and maintainably:
- Choose the Right Service: Use InMemoryArtifactService for rapid prototyping, testing, and scenarios where persistence isn't needed. Use GcsArtifactService (or implement your own BaseArtifactService for other backends) for production environments requiring data persistence and scalability.
- Meaningful Filenames: Use clear, descriptive filenames. Including relevant extensions (.pdf, .png, .wav) helps humans understand the content, even though the mime_type dictates programmatic handling. Establish conventions for temporary vs. persistent artifact names.
- Specify Correct MIME Types: Always provide an accurate mime_type when creating the types.Part for save_artifact. This is critical for applications or tools that later call load_artifact and need to interpret the bytes data correctly. Use standard IANA MIME types where possible.
- Understand Versioning: Remember that load_artifact() without a specific version argument retrieves the latest version. If your logic depends on a specific historical version of an artifact, be sure to provide the integer version number when loading.
- Use Namespacing (user:) Deliberately: Only use the "user:" prefix for filenames when the data truly belongs to the user and should be accessible across all their sessions. For data specific to a single conversation or session, use regular filenames without the prefix.
- Error Handling:
    - Always check if an artifact_service is actually configured before calling context methods (save_artifact, load_artifact, list_artifacts); they will raise a ValueError if the service is None. Wrap calls in try...except ValueError.
    - Check the return value of load_artifact, as it will be None if the artifact or version doesn't exist. Don't assume it always returns a Part.
    - Be prepared to handle exceptions from the underlying storage service, especially with GcsArtifactService (e.g., google.api_core.exceptions.Forbidden for permission issues, NotFound if the bucket doesn't exist, network errors).
- Size Considerations: Artifacts are suitable for typical file sizes, but be mindful of potential costs and performance impacts with extremely large files, especially with cloud storage. InMemoryArtifactService can consume significant memory if storing many large artifacts. Evaluate if very large data might be better handled through direct GCS links or other specialized storage solutions rather than passing entire byte arrays in-memory.
- Cleanup Strategy: For persistent storage like GcsArtifactService, artifacts remain until explicitly deleted. If artifacts represent temporary data or have a limited lifespan, implement a strategy for cleanup. This might involve:
    - Using GCS lifecycle policies on the bucket.
    - Building specific tools or administrative functions that utilize the artifact_service.delete_artifact method (note: delete is not exposed via context objects for safety).
    - Carefully managing filenames to allow pattern-based deletion if needed.
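The error-handling advice above can be folded into one small helper. The following is a sketch only: safe_load is a hypothetical function, it assumes a synchronous load_artifact(filename=..., version=...) call as used in the examples in this document, and the two stub classes are stand-ins that imitate a context with no artifact service and a context with no stored artifacts.

```python
from typing import Any, Optional


def safe_load(context: Any, filename: str, version: Optional[int] = None) -> Optional[Any]:
    """Load an artifact, returning None when it is missing or no service is configured."""
    try:
        artifact = context.load_artifact(filename=filename, version=version)
    except ValueError as exc:
        # Per the guidance above, raised when no artifact service is configured.
        print(f"Artifact service unavailable: {exc}")
        return None
    if artifact is None:
        print(f"Artifact '{filename}' (version={version}) not found.")
    return artifact


class NoServiceContext:
    """Hypothetical stub that mimics a context without an artifact service."""

    def load_artifact(self, filename: str, version: Optional[int] = None):
        raise ValueError("Artifact service is not initialized.")


class EmptyContext:
    """Hypothetical stub whose store contains no artifacts."""

    def load_artifact(self, filename: str, version: Optional[int] = None):
        return None


no_service_result = safe_load(NoServiceContext(), "report.pdf")
missing_result = safe_load(EmptyContext(), "report.pdf", version=3)
```

Storage-level exceptions (such as permission errors from a cloud backend) are deliberately not caught here, so they surface to the caller instead of being mistaken for a missing artifact.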