You can attach metadata to the vectors in an Amazon S3 vector bucket and then use it to filter similarity queries based on specific conditions.
Example: Let's say you have a vector bucket that stores product embeddings for an e-commerce store. You want to be able to perform similarity searches on the products, but also filter the results based on product category and date added.
To do this, attach metadata to each vector when you write it to an index in the vector bucket. The metadata could include fields like:
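- product_category: Electronics, Apparel, Home, etc.
- date_added: the date the product was added to the catalog, stored as a number (for example, Unix epoch seconds) so that it can be compared with range operators such as $gte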
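As a sketch, the helper below builds one vector entry in the shape accepted by the `vectors` list of the boto3 `s3vectors` client's `put_vectors` call. Each entry has a `key`, a `data` dict holding a `float32` list, and a `metadata` dict. The helper name `build_vector_entry` and the choice to use the product ID as the vector key are illustrative assumptions. The helper converts `date_added` to Unix epoch seconds so that range filters compare numbers.

```python
from datetime import date, datetime, timezone


def build_vector_entry(product_id: str, embedding: list[float],
                       category: str, added: date) -> dict:
    """Hypothetical helper: package one product embedding for put_vectors."""
    # Midnight UTC on the day the product was added, as whole epoch seconds.
    added_epoch = int(datetime(added.year, added.month, added.day,
                               tzinfo=timezone.utc).timestamp())
    return {
        "key": product_id,  # assumed: one vector per product, keyed by its ID
        "data": {"float32": [float(x) for x in embedding]},
        "metadata": {
            "product_category": category,
            "date_added": added_epoch,
        },
    }


entry = build_vector_entry("12345", [0.1, 0.2, 0.3], "Electronics", date(2023, 5, 15))
print(entry["metadata"])  # {'product_category': 'Electronics', 'date_added': 1684108800}
```

You would then pass a list of such entries as `vectors=[...]` to `put_vectors`, along with `vectorBucketName` and `indexName`.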
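When you run a similarity query against the index, you can include a filter so that only vectors whose metadata matches certain conditions are returned. For example, with the boto3 s3vectors client:
```python
import time
import boto3

s3vectors = boto3.client("s3vectors")

# Search for products similar to a given product vector (product_vector is a
# list of floats), but only return results in the "Electronics" category that
# were added in the last 30 days (date_added is stored as Unix epoch seconds)
cutoff = int(time.time()) - 30 * 24 * 60 * 60
response = s3vectors.query_vectors(
    vectorBucketName="my-vector-bucket",
    indexName="product-index",
    queryVector={"float32": product_vector},
    topK=10,
    filter={"$and": [
        {"product_category": {"$eq": "Electronics"}},
        {"date_added": {"$gte": cutoff}},
    ]},
    returnMetadata=True,
    returnDistance=True,
)
```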
This narrows your similarity search results to the most relevant products based on the metadata you've attached. You can add further fields, such as brand or price, to support other filters on top of your vector data.
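Conceptually, a filtered query returns the nearest vectors among those whose metadata satisfies the filter. The in-memory sketch below illustrates that behavior. It uses a hand-written evaluator for a small subset of the filter syntax (`$and`, `$eq`, `$gte`) and brute-force cosine distance. The function names and sample data are hypothetical, and this is not how S3 Vectors executes queries internally.

```python
import math


def matches(metadata: dict, flt: dict) -> bool:
    """Evaluate a small subset of the filter syntax: $and, $eq and $gte."""
    if "$and" in flt:
        return all(matches(metadata, sub) for sub in flt["$and"])
    for field, condition in flt.items():
        value = metadata.get(field)
        if "$eq" in condition and value != condition["$eq"]:
            return False
        if "$gte" in condition and (value is None or value < condition["$gte"]):
            return False
    return True


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def filtered_search(items: list[tuple], query: list[float], flt: dict, top_k: int) -> list[str]:
    """Drop items whose metadata fails the filter, then return the top_k nearest keys."""
    scored = [(cosine_distance(query, vector), key)
              for key, vector, metadata in items if matches(metadata, flt)]
    return [key for _, key in sorted(scored)[:top_k]]


catalog = [
    ("tv", [1.0, 0.0], {"product_category": "Electronics", "date_added": 200}),
    ("phone", [0.9, 0.1], {"product_category": "Electronics", "date_added": 50}),
    ("shirt", [1.0, 0.0], {"product_category": "Apparel", "date_added": 200}),
    ("radio", [0.5, 0.5], {"product_category": "Electronics", "date_added": 150}),
]
recent_electronics = {"$and": [
    {"product_category": {"$eq": "Electronics"}},
    {"date_added": {"$gte": 100}},
]}
print(filtered_search(catalog, [1.0, 0.0], recent_electronics, top_k=2))  # ['tv', 'radio']
```

The two filters exclude different items. "phone" is excluded by the date condition, even though it is closer to the query than "radio". "shirt" is excluded by the category condition, even though it matches the query exactly.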
Plan ahead: decide which metadata fields will be most useful for your use case before you start storing vectors in an Amazon S3 vector bucket. Metadata is filterable by default. Fields you only want returned with results (long descriptions, for example) can be declared non-filterable, but only when the index is created.
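One way to act on that plan is to write the expected fields down as a schema and check each record before it is written. The schema and the `schema_errors` function below are hypothetical. This is a client-side check, not an S3 Vectors feature. Rejecting unexpected fields is a design choice; you might prefer to allow them.

```python
# Hypothetical metadata schema for the products index: field -> expected type.
PRODUCT_METADATA_SCHEMA = {
    "product_id": str,
    "product_category": str,
    "product_brand": str,
    "date_added": int,  # Unix epoch seconds, so range filters compare numbers
}


def schema_errors(metadata: dict, schema: dict = PRODUCT_METADATA_SCHEMA) -> list[str]:
    """List the ways a metadata dict deviates from the schema (empty list = OK)."""
    errors = []
    for field, expected_type in schema.items():
        if field not in metadata:
            errors.append(f"missing field: {field}")
        elif type(metadata[field]) is not expected_type:
            errors.append(f"{field}: expected {expected_type.__name__}, "
                          f"got {type(metadata[field]).__name__}")
    for field in sorted(metadata.keys() - schema.keys()):
        errors.append(f"unexpected field: {field}")
    return errors
```

For example, a record with `"date_added": "2023-05-15"` would be flagged, because the schema expects an integer timestamp.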
When organizing your S3 vector buckets for efficient search, there are a few key principles to consider:
Use one vector bucket with separate indexes. Create a top-level vector bucket (e.g. my-ecommerce-vectors) and, within it, a separate index for each type of vector, e.g. products-index and customers-index. This lets you scope each search to the appropriate index, and each index can have the dimension and distance metric that suit its embeddings.
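A sketch of per-index settings follows. It builds keyword arguments for the boto3 `s3vectors` `create_index` call (`vectorBucketName`, `indexName`, `dataType`, `dimension`, `distanceMetric` and the optional `metadataConfiguration`). The dimensions, the `description` field and the helper name are hypothetical placeholders for whatever your embedding models and catalog actually use.

```python
# Hypothetical per-type settings; real dimensions depend on your embedding models.
INDEX_SPECS = {
    "products-index": {"dimension": 1024, "non_filterable": ["description"]},
    "customers-index": {"dimension": 256, "non_filterable": []},
}


def create_index_params(bucket: str, index_name: str) -> dict:
    """Build keyword arguments for s3vectors.create_index for one vector type."""
    spec = INDEX_SPECS[index_name]
    params = {
        "vectorBucketName": bucket,
        "indexName": index_name,
        "dataType": "float32",
        "dimension": spec["dimension"],
        "distanceMetric": "cosine",
    }
    # Keys listed here can be stored and returned, but not used in filters.
    if spec["non_filterable"]:
        params["metadataConfiguration"] = {
            "nonFilterableMetadataKeys": spec["non_filterable"]
        }
    return params
```

Usage would look like `boto3.client("s3vectors").create_index(**create_index_params("my-ecommerce-vectors", "products-index"))`.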
Attach metadata. As discussed earlier, attach relevant metadata such as product category, date added, and brand to each vector, and use it to filter your similarity searches.
Partition by time. If your vector data has a strong temporal aspect (e.g. products added over time), you can split it further into one index per period, e.g. products-2023, products-2024, and so on. Indexes don't nest and index names can't contain slashes, so the period goes into the index name rather than a sub-path. Searches over recent data then only need to query the most recent indexes.
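With one index per year, a query over a date range becomes one query per relevant index followed by a merge. The sketch below shows the naming and merge steps. The `products-<year>` naming scheme and the helper names are assumptions. Each per-index list stands in for the `vectors` field of a `query_vectors` response requested with `returnDistance=True`.

```python
import heapq


def indexes_for_years(prefix: str, first_year: int, last_year: int) -> list[str]:
    """Names of the per-year indexes covering an inclusive range of years."""
    return [f"{prefix}-{year}" for year in range(first_year, last_year + 1)]


def merge_top_k(results_per_index: dict[str, list[dict]], top_k: int) -> list[dict]:
    """Merge per-index hits (each with 'key' and 'distance') into one top-k list."""
    all_hits = (hit for hits in results_per_index.values() for hit in hits)
    # Smaller distance means more similar, so keep the top_k smallest.
    return heapq.nsmallest(top_k, all_hits, key=lambda hit: hit["distance"])
```

Merging on raw distance is only valid if every yearly index uses the same embedding model and distance metric. Otherwise the distances are not comparable.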
Use descriptive names. Name your indexes so that they clearly convey their purpose, e.g. products-index and customers-index.
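A descriptive name also has to be a legal one. The check below encodes our understanding of the index naming rules at the time of writing: 3 to 63 characters; lowercase letters, digits, hyphens and dots; starting and ending with a letter or digit. Treat these rules as an assumption and confirm them against the current S3 Vectors documentation.

```python
import re

# Assumed rules: 3-63 chars of [a-z0-9.-], starting and ending with [a-z0-9].
INDEX_NAME_RE = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def is_valid_index_name(name: str) -> bool:
    """True if the name satisfies the assumed index naming rules."""
    return INDEX_NAME_RE.fullmatch(name) is not None
```

Under these rules, a nested name such as `products-index/2024` is rejected, while `products-2024` passes.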
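Monitor and adjust. As your vector data grows, regularly review query latency and result quality, and adjust partitioning or metadata as needed to keep query times sub-second. An index's dimension, distance metric, and non-filterable metadata keys are fixed when it is created, so changing them means creating a new index and re-inserting the vectors.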
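To make "review performance" concrete, you could time a representative filtered query repeatedly and track its tail latency over time. In the sketch below, `run_query` is any zero-argument callable you supply, for example a lambda wrapping your filtered similarity query. The function name and the choice of p50/p95 are assumptions.

```python
import statistics
import time
from collections.abc import Callable


def latency_percentiles(run_query: Callable[[], object], runs: int = 20) -> dict[str, float]:
    """Time repeated calls to run_query and report p50/p95 latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    # n=20 yields 19 cut points (5%, 10%, ..., 95%); index 18 is the 95th percentile.
    cuts = statistics.quantiles(samples, n=20, method="inclusive")
    return {"p50": statistics.median(samples), "p95": cuts[18]}
```

Recording these numbers after each change to partitioning or metadata shows whether the change actually helped.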
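In our e-commerce example, the bucket might then be organized like this, with one products index per year:
```
my-ecommerce-vectors        (vector bucket)
  products-2023             (index)
  products-2024             (index)
  products-2025             (index)
  customers-index           (index)
  promotions-index          (index)
```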
Each vector in a products index would carry metadata like the following, where date_added is 2023-05-15 00:00 UTC expressed as Unix epoch seconds:
```json
{
  "product_id": "12345",
  "product_category": "Electronics",
  "product_brand": "Acme",
  "date_added": 1684108800
}
```
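When vectors are split into per-year indexes, each write must also go to the right index. The minimal sketch below assumes flat per-year index names of the form `products-<year>` (a hypothetical scheme). It accepts a `date_added` value given either as Unix epoch seconds or as an ISO `YYYY-MM-DD` string.

```python
from datetime import date, datetime, timezone


def added_year(value) -> int:
    """Year from date_added, given as epoch seconds or an ISO 'YYYY-MM-DD' string."""
    if isinstance(value, str):
        return date.fromisoformat(value).year
    return datetime.fromtimestamp(value, tz=timezone.utc).year


def index_for_product(metadata: dict, prefix: str = "products") -> str:
    """Hypothetical routing rule: write each product to its per-year index."""
    return f"{prefix}-{added_year(metadata['date_added'])}"
```

The returned name would be passed as `indexName` to `put_vectors` for that product's vector.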