Overview
media-similarity finds how similar media are and combines them into clusters.
Similarity is calculated based on tags provided by media-tagging library and takes into
account how many similar tags two media have.
Key features
- Clusters media into groups of similar media.
- Finds the most similar media for a set of seed media.
- Performs detailed explanation why two media are consider similar.
Installation
pip install media-similarity
uv add media-similarity
Usage
Clustering
media-similarity can take several media, tag them and combine them into clusters.
media-similarity cluster image1.png image2.png image3.png \
--media-type IMAGE \
--tagger gemini
cluster commands writes data to local / remote storage via garf-io with the following columns:
cluster_id- id of a cluster medium belongs.media_url- identifier of medium.
from media_similarity import MediaClusteringRequest, MediaSimilarityService
service = MediaSimilarityService()
request = MediaClusteringRequest(
media_paths=[
'image1.png',
'image2.png',
'image3.png',
],
media_type='IMAGE',
tagger_type='gemini',
)
clusters = service.cluster_media(request)
cluster_media returns ClusteringResults object which contains the following properties:
clusters- mapping between medium and its assigned cluster id.adaptive_threshold- threshold used to identify whether two media belong to the same cluster.graph_info- stores information on each medium and its relationship to other media.
ClusteringResults can be written to local / remote storage via garf-io with to_garf_report method with the following columns:
cluster_id- id of a cluster medium belongs.media_url- identifier of medium.
Similarity Search
media-similarity can search for similar media given a set of seed media.
Please note that this requires persistence setup.
media-similarity search image1 image2 \
--media-type IMAGE \
--db-uri=<CONNECTION_STRING>
search commands writes data to local / remote storage via garf-io with the following columns:
seed_media_identifier- identifier of media used to perform a search.media_identifier- identifier of found similar media.score- similarity score showing how strong the connection between two media.
from media_similarity import MediaSimilaritySearchRequest, MediaSimilarityService
service = MediaSimilarityService.from_connecting_string(
'sqlite:///tagging_results.db'
)
request = MediaSimilaritySearchRequest(
media_paths=[
'image3.png',
],
media_type='IMAGE',
n_results=1,
)
similar_media = service.find_similar_media(request)
find_similar_media returns list of SimilaritySearchResults objects each containing the following properties:
seed_media_identifier- identifier of media used to perform a search.results- identifiers of the most similar media with their similarity scores.
SimilaritySearchResults can be written to local / remote storage via garf-io with to_garf_report method with the following columns:
cluster_id- id of a cluster medium belongs.media_url- identifier of medium.
Comparison
media-similarity can provide detailed information how two media as similar to each other.
Please note that this requires persistence setup.
media-similarity compare image1 image2 image3 \
--media-type IMAGE \
--db-uri=<CONNECTION_STRING>
compare commands writes data to local / remote storage via garf-io with the following columns:
media_pair_identifier- identifier of a media pair (pipe separated media ids of two media).score- similarity score for media pair.similar_tags- number of common tags between media.similarity_weight_normalized- weight of similar tags normalized by inverse-document frequency.similarity_weight_unnormalized- weight of similar tags.dissimilar_tags- number of dissimilar tags between media.dissimilarity_weight_normalized- weight of dissimilar tags normalized by inverse-document frequency.dissimilarity_weight_unnormalized- weight of dissimilar tags.
from media_similarity import MediaSimilarityComparisonRequest, MediaSimilarityService
service = MediaSimilarityService.from_connecting_string(
'sqlite:///tagging_results.db'
)
request = MediaSimilarityComparisonRequest(
media_paths=[
'image1.png',
'image2.png',
'image3.png',
],
media_type='IMAGE',
)
compared_media = service.compare_media(request)
compare_media returns list of MediaSimilarityComparisonResult objects each containing the following properties:
media_pair_identifier- identifier of a media pair (pipe separated media ids of two media).similarity_score- contains information on number of similar / dissimilar tags and their weights.
MediaSimilarityComparisonResult can be written to local / remote storage via garf-io with to_garf_report method with the following columns:
media_pair_identifier- identifier of a media pair (pipe separated media ids of two media).score- similarity score for media pair.similar_tags- number of common tags between media.similarity_weight_normalized- weight of similar tags normalized by inverse-document frequency.similarity_weight_unnormalized- weight of similar tags.dissimilar_tags- number of dissimilar tags between media.dissimilarity_weight_normalized- weight of dissimilar tags normalized by inverse-document frequency.dissimilarity_weight_unnormalized- weight of dissimilar tags.