garf for Media Tagging API

garf-media-tagging simplifies interaction with the media_tagging library via SQL queries and can be used with the garf framework.

Prerequisites

  • media_tagging library installed locally or running as an HTTP service.

Installation

pip install garf-media-tagging

Usage

Run via CLI

Install the garf-executors package to run queries via the CLI (pip install garf-executors).

garf <PATH_TO_QUERIES> --source media-tagging \
  --output <OUTPUT_TYPE> \
  --source.endpoint=MEDIA_TAGGING_API_ENDPOINT_URL

where:

Available source parameters

name      values  comments
endpoint          HTTP endpoint of the media-tagging API when it is running as a service
db-uri            Optional connection string to the DB where tagging results can be found
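
To make the parameter passing concrete, here is a minimal Python sketch that assembles the CLI call shown above as an argument list. The helper name build_garf_command, the query path, the csv output type, and the localhost URL are placeholders chosen for illustration. Passing db-uri as --source.db-uri is an assumption made by analogy with --source.endpoint.

```python
import shlex


def build_garf_command(queries_path, output, endpoint=None, db_uri=None):
    """Assemble the garf CLI call as an argument list (hypothetical helper)."""
    cmd = ["garf", queries_path, "--source", "media-tagging", "--output", output]
    if endpoint:
        cmd.append(f"--source.endpoint={endpoint}")
    if db_uri:
        # Assumption: db-uri is passed with the same --source.<name>= prefix.
        cmd.append(f"--source.db-uri={db_uri}")
    return cmd


# Placeholder values; substitute your own query path, output type and endpoint.
cmd = build_garf_command("queries/tags.sql", "csv", endpoint="http://localhost:8000")
print(shlex.join(cmd))
```

The list form can be handed to subprocess.run as-is, which avoids shell quoting issues when paths or URLs contain special characters.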

Queries for Media Tagging API

SELECT
  media_url,
  content.tags[].name AS tags
FROM tag
WHERE
  media_type = 'image'
  AND tagger_type = 'gemini'
  AND media_path IN ({{media}})

Resources

  • tag - identifies tags (name: score pairs) that uniquely define the media.
  • description - custom description of the media; usually fine-tuned via the custom_prompt parameter.

Filters

  • media_type - Required, one of: IMAGE, YOUTUBE_VIDEO, WEBPAGE, TEXT, VIDEO.
  • tagger_type - Tagger used to identify tags / descriptions.
  • media_path - location of media.
  • tagging_options - optional parameters to fine-tune tagging.
    • n_tags - number of tags to return.
    • tags - custom tags to find in the media.
    • custom_prompt - prompt to send to LLM.
    • custom_schema - schema for structured output. Supports several built-in schemas.
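
Since media_type is required and limited to a fixed set of values, it can be checked before a query is sent. The sketch below is a hypothetical helper, not part of garf-media-tagging. The sample queries in this document pass 'image' in lowercase while the list above uses uppercase names, so the sketch assumes matching is case-insensitive, which is not confirmed here.

```python
# Values copied from the media_type filter description.
ALLOWED_MEDIA_TYPES = {"IMAGE", "YOUTUBE_VIDEO", "WEBPAGE", "TEXT", "VIDEO"}


def normalize_media_type(value: str) -> str:
    """Return the upper-case media type or raise ValueError if it is unknown."""
    normalized = value.strip().upper()  # assumption: case-insensitive matching
    if normalized not in ALLOWED_MEDIA_TYPES:
        allowed = ", ".join(sorted(ALLOWED_MEDIA_TYPES))
        raise ValueError(f"Unsupported media_type {value!r}; expected one of: {allowed}")
    return normalized


print(normalize_media_type("image"))
```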

Fields

You can extract any of the following elements from each row of the API response.

  • media_type
  • media_url
  • identifier
  • processed_at
  • content
    • text for description
    • {name, score} for tag
  • hash
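
To show how a field path such as content.tags[].name relates to the fields above, here is a small Python sketch that walks a dotted path over one row. The row is made-up sample data, the nesting of tags under content follows the first query in this document, and the extract function is a hypothetical illustration, not the way garf resolves fields.

```python
# Made-up sample row; real rows come from the media tagging API.
row = {
    "media_type": "IMAGE",
    "media_url": "https://example.com/cat.png",
    "identifier": "cat",
    "content": {
        "tags": [
            {"name": "cat", "score": 0.9},
            {"name": "pet", "score": 0.7},
        ]
    },
}


def extract(value, path):
    """Follow a dotted path; a segment ending in [] maps the rest over a list."""
    if not path:
        return value
    head, _, rest = path.partition(".")
    if head.endswith("[]"):
        return [extract(item, rest) for item in value[head[:-2]]]
    return extract(value[head], rest)


print(extract(row, "media_url"))
print(extract(row, "content.tags[].name"))
```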

Custom schemas

Built-in schemas

Instead of specifying a schema in full, you can use one of the built-in schemas:

string - returns text.

SELECT
  media_url,
  content.text AS description
FROM description
WHERE
  media_type = 'image'
  AND tagger_type = 'gemini'
  AND media_path IN ({{media}})
  AND tagging_options.custom_prompt='What is this image about?'
  AND tagging_options.custom_schema='string'

number - returns floating-point numbers.

SELECT
  media_url,
  content.text AS description
FROM description
WHERE
  media_type = 'image'
  AND tagger_type = 'gemini'
  AND media_path IN ({{media}})
  AND tagging_options.custom_prompt='What is the share of green in this image?'
  AND tagging_options.custom_schema='number'

integer - returns integers.

SELECT
  media_url,
  content.text AS description
FROM description
WHERE
  media_type = 'image'
  AND tagger_type = 'gemini'
  AND media_path IN ({{media}})
  AND tagging_options.custom_prompt='Rate quality of this image from 1 to 5.'
  AND tagging_options.custom_schema='integer'

boolean - returns True/False.

SELECT
  media_url,
  content.text AS description
FROM description
WHERE
  media_type = 'image'
  AND tagger_type = 'gemini'
  AND media_path IN ({{media}})
  AND tagging_options.custom_prompt='Is this image advertising?'
  AND tagging_options.custom_schema='boolean'

enum - returns a value from a specific set of possible strings; useful for classification tasks.

SELECT
  media_url,
  content.text AS description
FROM description
WHERE
  media_type = 'image'
  AND tagger_type = 'gemini'
  AND media_path IN ({{media}})
  AND tagging_options.custom_prompt='Classify this image into one of the categories.'
  AND tagging_options.custom_schema='enum:Category1,Category2'
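
The built-in names read as shorthand for small JSON schemas. The following Python sketch shows one plausible expansion of each shorthand; it is an assumption made for illustration and not necessarily how media_tagging translates them internally. The function name expand_schema is hypothetical.

```python
PRIMITIVE_SCHEMAS = {"string", "number", "integer", "boolean"}


def expand_schema(shorthand: str) -> dict:
    """Expand a built-in schema name into a JSON-schema-like dict (assumed mapping)."""
    if shorthand in PRIMITIVE_SCHEMAS:
        return {"type": shorthand}
    if shorthand.startswith("enum:"):
        # Everything after "enum:" is a comma-separated list of allowed values.
        values = [v.strip() for v in shorthand[len("enum:"):].split(",") if v.strip()]
        if not values:
            raise ValueError("enum schema needs at least one value")
        return {"type": "string", "enum": values}
    raise ValueError(f"Unknown built-in schema: {shorthand!r}")


print(expand_schema("integer"))
print(expand_schema("enum:Category1,Category2"))
```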

Full schema specification

If the built-in schemas are not enough, you can specify the schema directly in the query:

SELECT
  media_url,
  content.text AS description
FROM description
WHERE
  media_type = 'image'
  AND tagger_type = 'gemini'
  AND media_path IN ({{media}})
  AND tagging_options.custom_prompt='Rate this image quality and explain why.'
  AND tagging_options.custom_schema = {{
    {
      "type": "object",
      "properties": {
        "quality_score": {
          "type": "integer",
          "description": "Number from 1 to 5 where 1 means the poorest quality and 5 is the highest"
        },
        "reason": {
          "type": "string",
          "description": "Reason for assigning a particular score."
        }
      }
    }
  }}
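
Once a full schema is in place, replies can be sanity-checked on the client side. The Python sketch below mirrors the schema from the query above and checks a reply against the declared property types. The reply is made-up sample data, and matches_schema is a hypothetical helper that covers only the two types used here, not a general JSON Schema validator.

```python
import json

# Same structure as the schema embedded in the query above.
SCHEMA = {
    "type": "object",
    "properties": {
        "quality_score": {"type": "integer"},
        "reason": {"type": "string"},
    },
}

PYTHON_TYPES = {"integer": int, "string": str}


def matches_schema(reply: dict, schema: dict) -> bool:
    """Check that every declared property present in reply has the declared type."""
    for name, spec in schema["properties"].items():
        if name not in reply:
            continue  # the schema above does not mark any property as required
        value = reply[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, PYTHON_TYPES[spec["type"]]):
            return False
    return True


# Made-up reply for illustration.
reply = json.loads('{"quality_score": 4, "reason": "Sharp focus, good lighting."}')
print(matches_schema(reply, SCHEMA))
```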