Vector Search / Nearest Neighbors
Vi is designed to make building vector search systems easy.
Throughout this guide, results are displayed as tables using the client's `show_json` and `results_to_df` helpers.
[6]:
import os
username = os.environ['VI_USERNAME']
api_key = os.environ['VI_API_KEY']
url = "https://api.vctr.ai"
[7]:
collection_name = 'ecommerce'
[9]:
from vectorai.client import ViClient
vi_client = ViClient(username, api_key, url)
from vectorai.models.deployed import ViText2Vec, ViImage2Vec, ViAudio2Vec
text_encoder = ViText2Vec(username, api_key, url)
image_encoder = ViImage2Vec(username, api_key, url)
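Here we insert the documents into a collection and encode their `name` field with a text2vec encoder. `documents` is assumed to be a list of product dictionaries loaded beforehand (the loading step is not shown). When a field is encoded, a new field called `<field>_vector_` is created, e.g. `name` -> `name_vector_`.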
[6]:
vi_client.insert_documents(
collection_name,
documents,
models={
'name':text_encoder.encode,
}
)
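To make the naming convention concrete, the sketch below mimics locally what insertion adds to each document. It is not Vi's implementation: `encode_field` and `toy_encode` are hypothetical helpers, and `toy_encode` returns two character statistics rather than a real text2vec embedding.

```python
from typing import Callable, Dict, List


def encode_field(documents: List[Dict], field: str,
                 encode: Callable[[str], List[float]]) -> List[Dict]:
    """Return copies of `documents` with a new `<field>_vector_` key (hypothetical helper)."""
    encoded = []
    for doc in documents:
        new_doc = dict(doc)  # shallow copy so the input documents are not modified
        new_doc[f"{field}_vector_"] = encode(doc[field])
        encoded.append(new_doc)
    return encoded


def toy_encode(text: str) -> List[float]:
    # Stand-in for text_encoder.encode: length and vowel count, NOT a real embedding.
    return [float(len(text)), float(sum(ch in "aeiou" for ch in text.lower()))]


docs = [{"_id": "1_50310", "name": "Galaxy Buds"}]
print(encode_field(docs, "name", toy_encode)[0])
# {'_id': '1_50310', 'name': 'Galaxy Buds', 'name_vector_': [11.0, 3.0]}
```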
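Once the documents are inserted, you can also run jobs on top of the collection to encode fields that you didn't encode initially at insert time. Here the `image_url` field is encoded with an image encoder, creating `image_url_vector_`.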
[7]:
job = vi_client.encode_image_job(collection_name, 'image_url')
vi_client.wait_till_jobs_complete(collection_name, job['job_id'], job['job_name'])
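Search
To search, encode the query text with the same encoder used at insert time and search against the matching vector field (here `name_vector_`). `page_size` and `page` control pagination.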
[17]:
search_results = vi_client.search(collection_name,
text_encoder.encode('samsung phone'), 'name_vector_', page_size=5, page=1)
vi_client.show_json(search_results, selected_fields=['_id', 'name', 'sku'],
image_fields=['image_url'], image_width=100)
[17]:
| | _id | name | sku |
|---|---|---|---|
| 0 | 1_43661 | Samsung - SM-J320ZZWNXSA - Galaxy J3 2106 - White | 1091002705 |
| 1 | 1_50310 | Samsung - SM-R170NZKAXSA - Galaxy Buds | SM-R170NZKAXSA |
| 2 | 1_43031 | Samsung - SM-G930FZSAXSA - Galaxy S7 32GB - Silver | SM-G930FZSAXSA |
| 3 | 1_43029 | Samsung - SM-G930FZDAXSA - Galaxy S7 32GB - Gold | SM-G930FZDAXSA |
| 4 | 1_43030 | Samsung - SM-G930FZKAXSA - Galaxy S7 32GB - Black | SM-G930FZKAXSA |
Recommendations/Search by _id
[21]:
product_id = '1_50198'
search_by_id_results = vi_client.search_by_id(collection_name, product_id, 'name_vector_', page_size=3)
vi_client.show_json(search_by_id_results, selected_fields=['_id', 'name', 'sku'],
image_fields=['image_url'], image_width=100)
[21]:
| | _id | name | sku |
|---|---|---|---|
| 0 | 1_50198 | Apple AirPods with Charging Case (2nd Gen) - MV7N2ZA/A | MV7N2ZA/A |
| 1 | 1_50196 | Apple Wireless Charging Case for AirPods - MR8U2ZA/A | MR8U2ZA/A |
| 2 | 1_50197 | Apple AirPods with Wireless Charging Case - MRXJ2ZA/A | MRXJ2ZA/A |
Hybrid Search
Pure vector search is not always the best solution. Product SKUs are one example: searching for the SKU `R170NZKAXSA` with vector search alone does not return the Galaxy Buds product whose SKU is `SM-R170NZKAXSA`.
[25]:
[25]:
dict_keys(['_id', 'short_description', 'on_special', 'product_url', 'rating', 'description', 'saleable', 'pre_order', 'thumbnail_url', 'model_no', 'was_now', 'manufacturer', 'cashback_amount', 'price', 'merchandising_blocks', 'add_to_cart_html', 'attribute_set', 'id', 'sku', 'brand_img', 'store_id', 'quickview_url', 'compare_url', 'price_html', 'ticket_price', 'image_url', 'tax_class_id', 'cashback_time', 'memory_gb', 'webcode', 'special_time', 'name', '_clusters_', 'objectID', '_dr_', 'status', '_search_score'])
[31]:
search_results = vi_client.search(collection_name, text_encoder.encode('R170NZKAXSA'), 'name_vector_', page_size=3)
vi_client.show_json(search_results, selected_fields=['_id', 'name', 'sku'],
image_fields=['image_url'], image_width=150)
[31]:
| | _id | name | sku |
|---|---|---|---|
| 0 | 1_43035 | NutriBullet RX 1700 - N171007M | N171007M |
| 1 | 1_39666 | Euroflex - AC6904194 - Antibak | AC6904194 |
| 2 | 1_52451 | SEBO K Service Kit - 6695ER | 6695ER |
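In such cases, a hybrid search that combines traditional text matching (on `name`, via `text_fields`) with vector search (on `name_vector_`, via `fields`) can give better results. `traditional_weight` sets how much the traditional text-matching component contributes.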
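The sketch below illustrates why a traditional component helps with SKUs. It is only an illustration: the actual scoring formula behind `hybrid_search` is not described here, so the hypothetical `hybrid_score` assumes a simple weighted sum of cosine similarity and an exact substring match. The two-dimensional vectors are made up, and the weight of 0.5 is chosen for these toy scores; it is not comparable to `traditional_weight=0.015` above.

```python
import math
from typing import List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def hybrid_score(query: str, query_vec: List[float], doc_text: str,
                 doc_vec: List[float], traditional_weight: float) -> float:
    # Assumed formula: vector similarity plus a weighted exact-substring match bonus.
    text_match = 1.0 if query.lower() in doc_text.lower() else 0.0
    return cosine(query_vec, doc_vec) + traditional_weight * text_match


# Made-up vectors: the SKU query embeds closer to the blender than to the earbuds.
docs = {
    "NutriBullet RX 1700 - N171007M": [1.0, 0.1],
    "Samsung - SM-R170NZKAXSA - Galaxy Buds": [0.5, 0.5],
}
query, query_vec = "R170NZKAXSA", [1.0, 0.0]

vector_only = max(docs, key=lambda name: cosine(query_vec, docs[name]))
hybrid = max(docs, key=lambda name: hybrid_score(query, query_vec, name, docs[name], 0.5))
print(vector_only)  # NutriBullet RX 1700 - N171007M
print(hybrid)       # Samsung - SM-R170NZKAXSA - Galaxy Buds
```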
[35]:
search_results = vi_client.hybrid_search(collection_name, 'R170NZKAXSA',
text_encoder.encode('R170NZKAXSA'),
fields=['name_vector_'], text_fields=['name'],
traditional_weight=0.015,
page_size=3)
vi_client.show_json(search_results, selected_fields=['_id', 'name', 'sku'],
image_fields=['image_url'], image_width=150)
[35]:
| | _id | name | sku |
|---|---|---|---|
| 0 | 1_50310 | Samsung - SM-R170NZKAXSA - Galaxy Buds | SM-R170NZKAXSA |
| 1 | 1_43035 | NutriBullet RX 1700 - N171007M | N171007M |
| 2 | 1_45830 | Samsung - SM-R150NZKAXSA - Gear IconX | SM-R150NZKAXSA |
Collection Metadata
View the collection schema.
[12]:
vi_client.collection_schema(collection_name)
[12]:
{'_clusters_': 'dict',
'_clusters_.image_url_vector_': 'dict',
'_clusters_.image_url_vector_.default': 'numeric',
'_clusters_.name_vector_': 'dict',
'_clusters_.name_vector_.default': 'numeric',
'_dr_': 'dict',
'_dr_.default': 'dict',
'_dr_.default.2': 'dict',
'_dr_.default.2.name_vector_': 'vector',
'_dr_.default.2048': 'dict',
'_dr_.default.2048.image_url_vector_': 'vector',
'add_to_cart_html': 'text',
'attribute_set': 'text',
'brand_img': 'text',
'cashback_amount': 'text',
'cashback_time': 'numeric',
'categories': 'numeric',
'categories_search': 'text',
'clearance_item': 'text',
'compare_url': 'text',
'description': 'text',
'description_vector_': 'vector',
'display_url_vector_': 'vector',
'id': 'text',
'image_url': 'text',
'image_url_vector_': 'vector',
'insert_date_': 'date',
'manufacturer': 'text',
'memory_gb': 'text',
'merchandising_blocks': 'text',
'model_no': 'text',
'name': 'text',
'name_vector_': 'vector',
'objectID': 'text',
'on_special': 'bool',
'pre_order': 'numeric',
'price': 'numeric',
'price_html': 'text',
'product_url': 'text',
'quickview_url': 'text',
'rating': 'text',
'saleable': 'numeric',
'short_description': 'text',
'sku': 'text',
'special_time': 'numeric',
'status': 'text',
'store_id': 'numeric',
'tax_class_id': 'text',
'thumbnail_url': 'text',
'ticket_price': 'text',
'was_now': 'text',
'webcode': 'text'}
View statistics about your collection!
[13]:
vi_client.collection_stats(collection_name)
[13]:
{'size_mb': 756.213328,
'number_of_documents': 6068,
'number_of_searches': 32698,
'number_of_id_lookups': 30519}
List the collections you have created (only the first five are shown here)!
[14]:
vi_client.list_collections()[0:5]
[14]:
['aggregated_ecommerce',
'audio_quickstart',
'common_voice',
'ecommerce',
'instagram']
Retrieve a document by its id, excluding its vectors.
[15]:
document = vi_client.id(collection_name, '1_50198', include_vector=False)
document.keys()
document['sku'], document['name'], document['description'], document['categories'], document['manufacturer']
[15]:
dict_keys(['short_description', 'on_special', 'categories_search', 'product_url', 'rating', 'description', 'saleable', 'pre_order', 'thumbnail_url', 'model_no', 'was_now', 'manufacturer', 'cashback_amount', 'price', 'merchandising_blocks', 'add_to_cart_html', 'attribute_set', 'id', 'categories', 'sku', 'brand_img', 'store_id', 'quickview_url', 'compare_url', 'price_html', 'ticket_price', 'image_url', 'tax_class_id', 'insert_date_', 'clearance_item', 'cashback_time', 'memory_gb', 'webcode', 'special_time', 'name', '_clusters_', 'objectID', '_dr_', 'status'])
[15]:
('MV7N2ZA/A',
'Apple AirPods with Charging Case (2nd Gen) - MV7N2ZA/A',
'Designed by Apple<br />Automatically on, automatically connected<br />Easy setup for all your Apple devices6<br />Quick access to Siri<br />Double tap to play or skip forward<br />Charges quickly in the case<br />Case can be charged with a Lightning connector<br />Rich, high-quality audio and voice<br />Seamless switching between devices<br /><br />Accessibility: Live Listen audio<br />AirPods sensors (each): Dual beamforming microphones, Dual optical sensors, Motion-detecting accelerometer, Speech-detecting accelerometer<br /><br />AirPods: Bluetooth<br />Charging case: Lightning connector<br /><br />AirPods with charging case: More than 24 hours of listening time3, up to 18 hours of talk time<br />AirPods (single charge): Up to five hours of listening time1, up to three hours of talk time; 15 minutes in the case equals three hours of listening time4 or up to two hours of talk time<br /><br />iPhone, iPad and iPod touch models: With iOS 12.2 or later<br />Apple Watch models: With watchOS 5.2 or later<br />Mac models: With macOS 10.14.4 or later<br />Apple TV models: With tvOS 12.2 or later<br /><br />iPhone Models: iPhone XS, iPhone XS Max, iPhone XR, iPhone X, iPhone 8, iPhone 8 Plus, iPhone 7, iPhone 7 Plus, iPhone 6s, iPhone 6s Plus, iPhone 6, iPhone 6 Plus, iPhone SE, iPhone 5s<br />iPad Models: iPad Air (3rd generation), iPad mini (5th generation), 11-inch iPad Pro, 12.9-inch iPad Pro (3rd generation), 10.5-inch iPad Pro, iPad (6th Generation), iPad (5th Generation), iPad Pro 12.9-inch (2nd Generation), iPad Pro 12.9-inch (1st Generation), iPad Pro 9.7-inch, iPad mini 4, iPad mini 3, iPad mini 2, iPad Air 2, iPad Air (1st generation)<br />Mac Models: 12-inch MacBook, 13-inch MacBook Air with Retina display, 13-inch MacBook Air, 11-inch MacBook Air, 13-inch MacBook Pro - Thunderbolt 3 (USB-C), 13-inch MacBook Pro, 15-inch MacBook Pro - Thunderbolt 3 (USB-C), 15-inch MacBook Pro, 21.5-inch iMac — Thunderbolt 3 (USB-C), 21.5-inch iMac — Thunderbolt 2, 27-inch 
iMac — Thunderbolt 3 (USB-C), 27-inch iMac — Thunderbolt 2, iMac Pro, Mac Pro, Mac mini — Thunderbolt 3 (USB-C), Mac mini<br />Watch Models: Series 4, Series 3, Series 2, Series 1<br />Apple TV Models: Apple TV 4K, Apple TV HD<br />iPod Models: iPod touch (6th Generation)<br />',
[1261, 4, 31, 1300, 1394, 1395, 2048, 2197, 2219, 2226, 1132, 2379],
'Apple')
Advanced Search
Search can go well beyond a single query vector! Thanks to Vi's document-oriented approach, where vectors are stored as fields alongside ordinary fields, you can sum or average several vectors and combine vector search with traditional filters.
Using Facets and Filters in search
Facets are the frequency counts of each unique value in a field (for numeric fields such as `price`, the minimum, maximum and average are returned instead). In this section we:
1. randomly create a filter and use it to filter the collection, then
2. combine filters with vector search to make it even more powerful.
Start by retrieving the facets for an overview of each field's values.
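As a rough illustration of what facets compute, the hypothetical helpers below count values client-side over a list of dictionaries and return the same shapes as the facets output: a frequency list for text fields and min/max/avg for numeric fields. The documents are made up.

```python
from collections import Counter
from typing import Dict, List


def text_facets(documents: List[Dict], field: str, page_size: int = 20) -> List[Dict]:
    """Frequency of each unique value of `field`, most frequent first."""
    counts = Counter(doc[field] for doc in documents if field in doc)
    return [{field: value, "frequency": n} for value, n in counts.most_common(page_size)]


def numeric_facets(documents: List[Dict], field: str) -> Dict[str, float]:
    """Min, max and average of a numeric field."""
    values = [doc[field] for doc in documents if field in doc]
    return {"min": min(values), "max": max(values), "avg": sum(values) / len(values)}


docs = [
    {"manufacturer": "Samsung", "price": 100.0},
    {"manufacturer": "Apple", "price": 300.0},
    {"manufacturer": "Samsung", "price": 200.0},
]
print(text_facets(docs, "manufacturer"))
# [{'manufacturer': 'Samsung', 'frequency': 2}, {'manufacturer': 'Apple', 'frequency': 1}]
print(numeric_facets(docs, "price"))
# {'min': 100.0, 'max': 300.0, 'avg': 200.0}
```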
[16]:
vi_client.facets(collection_name, page=1, page_size=20, asc=False, fields=['attribute_set', 'price', 'manufacturer'])
[16]:
{'attribute_set': [{'attribute_set': 'Phone Accessories', 'frequency': 164},
{'attribute_set': 'Mobile Phones', 'frequency': 163},
{'attribute_set': 'USB Cables, Hubs & Readers', 'frequency': 121},
{'attribute_set': 'Sinks', 'frequency': 116},
{'attribute_set': 'Bluetooth Portable Speaker', 'frequency': 107},
{'attribute_set': 'Free Standing Dishwasher', 'frequency': 99},
{'attribute_set': 'Home Security', 'frequency': 86},
{'attribute_set': '60cm Built-In Ovens', 'frequency': 85},
{'attribute_set': 'Canopy Rangehoods', 'frequency': 84},
{'attribute_set': 'Induction Cooktops', 'frequency': 81},
{'attribute_set': 'Bottom Mount Fridge', 'frequency': 80},
{'attribute_set': 'Bluetooth Headphones', 'frequency': 75},
{'attribute_set': 'Ceramic Cooktop', 'frequency': 73},
{'attribute_set': 'French Door Fridge', 'frequency': 73},
{'attribute_set': 'Other Shelf Appliances', 'frequency': 73},
{'attribute_set': 'Front Load Washers', 'frequency': 70},
{'attribute_set': 'Power Accessories', 'frequency': 70},
{'attribute_set': 'Wearable Equipment', 'frequency': 69},
{'attribute_set': 'External Hard Drives', 'frequency': 68},
{'attribute_set': 'Taps', 'frequency': 68}],
'manufacturer': [{'manufacturer': 'Samsung', 'frequency': 356},
{'manufacturer': 'Smeg', 'frequency': 286},
{'manufacturer': 'Westinghouse', 'frequency': 230},
{'manufacturer': 'Miele', 'frequency': 225},
{'manufacturer': 'Cygnett', 'frequency': 180},
{'manufacturer': 'Apple', 'frequency': 154},
{'manufacturer': 'Philips', 'frequency': 144},
{'manufacturer': 'LG', 'frequency': 139},
{'manufacturer': 'Breville', 'frequency': 131},
{'manufacturer': 'Panasonic', 'frequency': 129},
{'manufacturer': 'Fisher & Paykel', 'frequency': 126},
{'manufacturer': 'Electrolux', 'frequency': 119},
{'manufacturer': 'Bosch', 'frequency': 110},
{'manufacturer': 'DeLonghi', 'frequency': 110},
{'manufacturer': 'Sony', 'frequency': 101},
{'manufacturer': 'Sunbeam', 'frequency': 95},
{'manufacturer': 'Blanco', 'frequency': 94},
{'manufacturer': 'JBL', 'frequency': 89},
{'manufacturer': 'Alogic', 'frequency': 83},
{'manufacturer': 'Beko', 'frequency': 77}],
'price': {'min': 2.0, 'max': 99999.0, 'avg': 974.5360909690178}}
Take a randomly generated filter query!
[21]:
filter_query = vi_client.random_filter_query(collection_name, text_filters=1, numeric_filters=0)
filter_query
[21]:
[{'field': 'status',
'filter_type': 'text',
'condition_value': 'Enabled',
'condition': '=='}]
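A filter query is a list of conditions. The sketch below evaluates such a list against plain dictionaries to show how the keys fit together. It assumes conditions are combined with AND and that other comparison operators exist alongside `==` (only `==` appears in this guide); `matches` is a hypothetical helper, not part of the Vi client, and it ignores `filter_type`.

```python
import operator
from typing import Dict, List

# Assumed mapping from condition strings to comparisons; only '==' appears in this guide.
CONDITIONS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt,
              ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def matches(document: Dict, filters: List[Dict]) -> bool:
    """True if the document satisfies every filter (assumed to be ANDed together)."""
    for f in filters:
        if f["field"] not in document:
            return False
        compare = CONDITIONS[f["condition"]]  # 'filter_type' is ignored in this sketch
        if not compare(document[f["field"]], f["condition_value"]):
            return False
    return True


filter_query = [{'field': 'status', 'filter_type': 'text',
                 'condition_value': 'Enabled', 'condition': '=='}]
docs = [{"_id": "a", "status": "Enabled"}, {"_id": "b", "status": "Disabled"}]
print([d["_id"] for d in docs if matches(d, filter_query)])  # ['a']
```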
View your filter results
[22]:
vi_client.results_to_df(vi_client.filters(collection_name, filter_query))[['_id', 'name', 'description', 'sku']]
[22]:
| | _id | name | description | sku |
|---|---|---|---|---|
| 0 | 1_48435 | Fisher & Paykel - CG604DX1 - 60cm Gas on Steel... | Cast iron trivets<br />Electronic ignition<br ... | CG604DX1 |
| 1 | 1_51255 | Kelvinator - KSV50HWH - 5kW/6kW Reverse Cycle ... | Kelvinator Connect Wi-Fi enabled<br />Heating ... | KSV50HWH |
| 2 | 1_52877 | Philips - HC5612/15 - Washable Hair Clipper | Trim-n-Flow PRO technology<br />28 length sett... | HC5612/15 |
| 3 | 1_52875 | LG - GS-B680DSLE - 679L Side-by-Side Fridge | <style type="text/css">\r\nbody {\r\n backg... | GSB680DSLE |
| 4 | 1_47181 | Asko - DBI654IB.S - XL 60cm Built-in Dishwasher | 15 Place Settings<br />13 Wash Programs<br />T... | DBI654IBS |
| 5 | 1_46728 | Panasonic - Wireless CD Hi-Fi System - SC-PMX1... | 3-ways speaker system<br />High resolution aud... | SCPMX152GNS |
| 6 | 1_52611 | Sunbeam - EM5300K - Barista Max Espresso Machi... | All-in-one, easy to use espresso machine<br />... | EM5300K |
| 7 | 1_50483 | JBL - Charge 4 Blue - Portable Bluetooth Speak... | Wireless Bluetooth Streaming<br />20 Hours of ... | JBLCHARGE4BLU |
| 8 | 1_48390 | Husky - HUS-C3-840 - 307L Bar Fridge - Silver | 6 x fully adjustable heavy duty shelves<br />D... | HUSC3840HY |
| 9 | 1_47665 | Alogic - MF-AUS2PC7-02 - 2m Aus 2 Pin Mains Pl... | Male to Female Power Cable<br />Insulated Pins... | MF-AUS2PC7-02R |
Using facets and filters in vector search
[23]:
advanced_search_query = {
'text' : {'vector': text_encoder.encode('iphone earphones'), 'fields' : ['name_vector_']}
}
filter_query = [
{'field': 'attribute_set',
'filter_type': 'text',
'condition_value': 'Bluetooth Headphones',
'condition': '=='}
]
results = vi_client.advanced_search(collection_name, advanced_search_query,
filters=filter_query, include_facets=True,
min_score=0.01, page_size=3, facets=['attribute_set', 'price'])
[24]:
vi_client.results_to_df(results)[['_id', 'name', 'description', 'sku']]
[24]:
| | _id | name | description | sku |
|---|---|---|---|---|
| 0 | 1_45511 | Apple AirPods | Designed by Apple<br />Automatically on, autom... | MMEF2ZA/A |
| 1 | 1_51937 | Apple AirPods Pro | Go to <a href="http://www.apple.com/airpods-pr... | MWP22ZA/A |
| 2 | 1_45609 | EarPods with 3.5mm Headphone Plug | Designed by Apple<br />Deeper, richer bass ton... | MNHF2FE/A |
[25]:
results['facets']
[25]:
{'attribute_set': [{'attribute_set': 'Sports Headphones', 'frequency': 6},
{'attribute_set': 'In-ear headphones', 'frequency': 8},
{'attribute_set': 'Bluetooth One Box HiFi', 'frequency': 15},
{'attribute_set': 'Headphones Phone Controls', 'frequency': 21},
{'attribute_set': 'True Wireless Headphones', 'frequency': 21},
{'attribute_set': 'Noise Cancelling headphones', 'frequency': 34},
{'attribute_set': 'Headphones', 'frequency': 43},
{'attribute_set': 'Bluetooth Headphones', 'frequency': 75},
{'attribute_set': 'Bluetooth Portable Speaker', 'frequency': 107}],
'price': {'min': 4.0, 'max': 749.0, 'avg': 178.83030303030304}}
Advanced Vector Search
Multi vector search
By Sum
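Multi-vector search lets you query several vector fields at once (below, a text query against `name_vector_` and an image query against `image_url_vector_`). By default, the per-field scores are summed into a single search score per document: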
\(TextScore + ImageScore = SearchScore\)
Documents are then ranked by this combined SearchScore, so a collection of 1000 documents yields 1000 search scores (unlike the unsummed comparison method described below, which yields one score per queried field).
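The sum-based scoring can be sketched in a few lines. This simplified version assumes cosine similarity as the per-field score and takes one query vector per field; `summed_scores` and the toy two-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def summed_scores(query: Dict[str, List[float]],
                  documents: List[Dict]) -> List[Tuple[str, float]]:
    """One score per document: the sum of its per-field cosine scores, best first."""
    results = []
    for doc in documents:
        score = sum(cosine(vec, doc[field]) for field, vec in query.items())
        results.append((doc["_id"], score))
    return sorted(results, key=lambda r: r[1], reverse=True)


# One (made-up) query vector per vector field.
query = {"name_vector_": [1.0, 0.0], "image_url_vector_": [0.0, 1.0]}
docs = [
    {"_id": "a", "name_vector_": [1.0, 0.0], "image_url_vector_": [1.0, 0.0]},
    {"_id": "b", "name_vector_": [0.6, 0.8], "image_url_vector_": [0.6, 0.8]},
]
for doc_id, score in summed_scores(query, docs):
    print(doc_id, round(score, 3))
# b 1.4
# a 1.0
```

Document `b` ranks first with a combined score of about 1.4, even though document `a` is a perfect match on `name_vector_` (its combined score is 1.0).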
[37]:
import IPython.display as ipd
image_vector = image_encoder.encode('https://specials-images.forbesimg.com/imageserve/5e19e401a854780006e84e28/960x0.jpg?fit=scale')
ipd.Image('https://specials-images.forbesimg.com/imageserve/5e19e401a854780006e84e28/960x0.jpg', width=200)
[37]:
[39]:
advanced_search_query = {
'text': {'vector': text_encoder.encode('256gb'), 'fields': ['name_vector_']},
'image_url': {'vector': image_vector, 'fields': ['image_url_vector_']}
}
results = vi_client.advanced_search(collection_name, advanced_search_query, page_size=3)
vi_client.show_json(results, selected_fields=['_id', 'name', 'sku'], image_fields=['image_url'], image_width=150)
[39]:
| | _id | name | sku |
|---|---|---|---|
| 0 | 1_52947 | iPhone SE 256GB White | MXVU2X/A |
| 1 | 1_51518 | iPhone 11 Pro 256GB Gold | MWC92X/A |
| 2 | 1_51501 | iPhone 11 Pro Max 256GB Gold | MWHL2X/A |
Without Sum
Alternatively, the per-field scores can be kept separate instead of summed, so that each field's score is compared on its own:
\(TextScore = SearchScore\)
\(ImageScore = SearchScore\)
Results are then ranked across all of these scores, so with two query vectors a collection of 1000 documents yields 2000 search scores, and a document can rank highly by matching strongly on just one field.
To do this, set `sum_fields=False`:
[40]:
results = vi_client.advanced_search(collection_name, advanced_search_query,
page_size=3, sum_fields=False)
vi_client.show_json(results, selected_fields=['_id', 'name', 'description', 'sku'], image_fields=['image_url'])
[40]:
| | _id | name | description | sku |
|---|---|---|---|---|
| 0 | 1_53840 | Blanco - BWCB - Wooden Chopping Board | Ash board Supported by cushioned feet Fits perfectly across the width of the sink Integrated into the food preparation area Designed to suit the ZEROX, CLARON, ANDANO, DIVON & SUBLINE sinks | BWCB |
| 1 | 1_51518 | iPhone 11 Pro 256GB Gold | Go to www.apple.com/au/iphone-11-pro/specs/ for a complete set. | MWC92X/A |
| 2 | 1_51516 | iPhone 11 Pro Max 64GB Gold | Go to www.apple.com/au/iphone-11-pro/specs/ for a complete set. | MWHG2X/A |
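For contrast with summing, the sketch below ranks every (document, field) pair separately, assuming cosine similarity as the score. `per_field_scores` and the toy vectors are hypothetical, and the function returns the raw pairs without deduplicating documents that appear more than once.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def per_field_scores(query: Dict[str, List[float]],
                     documents: List[Dict]) -> List[Tuple[str, str, float]]:
    """One score per (document, field) pair, all ranked together; no deduplication."""
    results = []
    for doc in documents:
        for field, vec in query.items():
            results.append((doc["_id"], field, cosine(vec, doc[field])))
    return sorted(results, key=lambda r: r[2], reverse=True)


query = {"name_vector_": [1.0, 0.0], "image_url_vector_": [0.0, 1.0]}
docs = [
    {"_id": "a", "name_vector_": [1.0, 0.0], "image_url_vector_": [1.0, 0.0]},
    {"_id": "b", "name_vector_": [0.6, 0.8], "image_url_vector_": [0.6, 0.8]},
]
for doc_id, field, score in per_field_scores(query, docs):
    print(doc_id, field, round(score, 3))
# a name_vector_ 1.0
# b image_url_vector_ 0.8
# b name_vector_ 0.6
# a image_url_vector_ 0.0
```

Document `a` ranks first on the strength of a single field, and two documents queried on two fields produce four scores.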
Multi vector search with different weightings:
\(0.9\times TextScore + 0.5\times ImageScore = SearchScore\)
Documents are then ranked by the weighted SearchScore, so a collection of 1000 documents again yields 1000 search scores. To weight a component, pass `fields` as a dictionary mapping each vector field to its weight. In the advanced search query, the top-level keys (`text` and `image_url` below) are only aliases for each query component and can be renamed freely; the `vector` and `fields` keys inside each component are required.
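A weighted version of the summed score can be sketched as follows, assuming cosine similarity per field. `weighted_score` and the vectors are hypothetical; the 0.9/0.5 weights match the formula above.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def weighted_score(query: Dict[str, Tuple[List[float], float]], doc: Dict) -> float:
    """Sum over fields of weight * cosine(query vector, document vector)."""
    return sum(weight * cosine(vec, doc[field]) for field, (vec, weight) in query.items())


# field -> (made-up query vector, weight); weights follow 0.9 x TextScore + 0.5 x ImageScore.
query = {"name_vector_": ([1.0, 0.0], 0.9), "image_url_vector_": ([0.0, 1.0], 0.5)}
text_match = {"_id": "a", "name_vector_": [1.0, 0.0], "image_url_vector_": [1.0, 0.0]}
image_match = {"_id": "b", "name_vector_": [0.0, 1.0], "image_url_vector_": [0.0, 1.0]}
print(weighted_score(query, text_match))   # 0.9
print(weighted_score(query, image_match))  # 0.5
```

With equally strong matches, the text match outranks the image match because its field carries the larger weight.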
[42]:
advanced_search_query = {
'text': {'vector': text_encoder.encode('iphone 11'), 'fields': {'name_vector_':0.9}},
'image_url': {'vector': image_vector, 'fields': {'image_url_vector_':0.5}}
}
results = vi_client.advanced_search(collection_name, advanced_search_query, page_size=3)
vi_client.show_json(results, selected_fields=['_id', 'name', 'sku'], image_fields=['image_url'], image_width=150)
[42]:
| | _id | name | sku |
|---|---|---|---|
| 0 | 1_51518 | iPhone 11 Pro 256GB Gold | MWC92X/A |
| 1 | 1_51516 | iPhone 11 Pro Max 64GB Gold | MWHG2X/A |
| 2 | 1_51501 | iPhone 11 Pro Max 256GB Gold | MWHL2X/A |
Vector based Recommendations (Search by Id)
Vector-based recommendations work the same way as search, with one difference: instead of encoding a query, you pass the `_id` of a document, and the vector already stored for that document is used as the query.
Single Item Recommendation (One -> Many)
e.g. I liked product A, recommend me products similar to it, or products that users also bought.
Here, the vector of product A stored in the collection is used as the search query:
\(Vector_{ProductA} = Search Query\)
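Conceptually, search by id looks up the stored vector and reuses it as the query. Below is a minimal sketch over an in-memory list, assuming cosine similarity; `search_by_id` here is a hypothetical local function, not the client method, and the vectors are made up.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def search_by_id(documents: List[Dict], doc_id: str, vector_field: str,
                 page_size: int = 3) -> List[Tuple[str, float]]:
    """Use the stored vector of `doc_id` as the query; rank all documents by cosine."""
    by_id = {doc["_id"]: doc for doc in documents}
    query = by_id[doc_id][vector_field]
    scored = [(doc["_id"], cosine(query, doc[vector_field])) for doc in documents]
    return sorted(scored, key=lambda r: r[1], reverse=True)[:page_size]


docs = [
    {"_id": "toaster_yellow", "name_vector_": [1.0, 0.1]},
    {"_id": "kettle_yellow", "name_vector_": [0.2, 1.0]},
    {"_id": "toaster_red", "name_vector_": [0.9, 0.3]},
]
print([doc_id for doc_id, _ in search_by_id(docs, "toaster_yellow", "name_vector_")])
# ['toaster_yellow', 'toaster_red', 'kettle_yellow']
```

As in the results below, the queried product is its own top match, since its vector is identical to the query.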
[47]:
product_id = '1_50017'
results = vi_client.search_by_id(collection_name, product_id, 'name_vector_', page_size=10)
vi_client.show_json(results, selected_fields=['_id', 'name', 'sku'], image_fields=['image_url'],
nrows=10, image_width=80)
[47]:
| | _id | name | sku |
|---|---|---|---|
| 0 | 1_50017 | DeLonghi - CTOC 2003.Y - Icona Capitals 2 Slice Toaster - New York Yellow | CTOC2003Y |
| 1 | 1_49987 | DeLonghi - CTOC 2003.R - Icona Capitals 2 Slice Toaster - Tokyo Red | CTOC2003R |
| 2 | 1_50016 | DeLonghi - CTOC 2003.BL - Icona Capitals 2 Slice Toaster - London Blue | CTOC2003BL |
| 3 | 1_49988 | DeLonghi - CTOC 2003.W - Icona Capitals 2 Slice Toaster - Sydney White | CTOC2003W |
| 4 | 1_50018 | DeLonghi - CTOC 4003.Y - Icona Capitals 4 Slice Toaster - New York Yellow | CTOC4003Y |
| 5 | 1_50021 | DeLonghi - KBOC 2001.Y - Icona Capitals Kettle - New York Yellow | KBOC2001Y |
| 6 | 1_50032 | DeLonghi - CTOC 4003.BL - Icona Capitals 4 Slice Toaster - London Blue | CTOC4003BL |
| 7 | 1_49989 | DeLonghi - CTOC 4003.R - Icona Capitals 4 Slice Toaster - Tokyo Red | CTOC4003R |
| 8 | 1_52910 | DeLonghi - CTOC 4003.O - Icona Capitals 4 Slice Toaster - Rome Orange | CTOC4003O |
| 9 | 1_49990 | DeLonghi - CTOC 4003.W - Icona Capitals 4 Slice Toaster - Sydney White | CTOC4003W |
Advanced Recommendations With Multiple Vectors
[51]:
results = vi_client.advanced_search_by_id(collection_name, product_id,
{'name_vector_':1, 'image_url_vector_':1}, page_size=3)['results']
vi_client.show_json(results, selected_fields=['_id', 'name', 'sku'], image_fields=['image_url'], image_width=150)
[51]:
| | _id | name | sku |
|---|---|---|---|
| 0 | 1_50017 | DeLonghi - CTOC 2003.Y - Icona Capitals 2 Slice Toaster - New York Yellow | CTOC2003Y |
| 1 | 1_49987 | DeLonghi - CTOC 2003.R - Icona Capitals 2 Slice Toaster - Tokyo Red | CTOC2003R |
| 2 | 1_50018 | DeLonghi - CTOC 4003.Y - Icona Capitals 4 Slice Toaster - New York Yellow | CTOC4003Y |
Multi Item Recommendations (Many -> Many)
e.g. I liked products A and B, recommend me products similar to them
e.g. I like product A only about 7/10 and product B 10/10:
\(0.7\times Vector_{ProductA} + 1\times Vector_{ProductB} = Search Query\)
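The weighted query vector in the formula above is an element-wise weighted sum. The hypothetical `combine_vectors` below computes it from stored vectors; its `{id: weight}` argument mirrors the dictionaries passed to `advanced_search_by_ids`, but how Vi applies those weights internally is an assumption here, and the vectors are made up.

```python
from typing import Dict, List


def combine_vectors(weighted_ids: Dict[str, float],
                    vectors: Dict[str, List[float]]) -> List[float]:
    """Element-wise weighted sum of the stored vectors of several documents."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for doc_id, weight in weighted_ids.items():
        for i, x in enumerate(vectors[doc_id]):
            query[i] += weight * x
    return query


vectors = {"product_a": [1.0, 0.0], "product_b": [0.0, 1.0]}
# 0.7 x Vector_A + 1 x Vector_B
print(combine_vectors({"product_a": 0.7, "product_b": 1.0}, vectors))  # [0.7, 1.0]
```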
Recommendations for multiple products.
[54]:
results = vi_client.search_by_ids(collection_name, ['1_50017', '1_50021'], 'name_vector_', page_size=5)
vi_client.show_json(results, selected_fields=['_id', 'name', 'sku'], image_fields=['image_url'], image_width=100)
[54]:
| | _id | name | sku |
|---|---|---|---|
| 0 | 1_50021 | DeLonghi - KBOC 2001.Y - Icona Capitals Kettle - New York Yellow | KBOC2001Y |
| 1 | 1_50017 | DeLonghi - CTOC 2003.Y - Icona Capitals 2 Slice Toaster - New York Yellow | CTOC2003Y |
| 2 | 1_50018 | DeLonghi - CTOC 4003.Y - Icona Capitals 4 Slice Toaster - New York Yellow | CTOC4003Y |
| 3 | 1_50016 | DeLonghi - CTOC 2003.BL - Icona Capitals 2 Slice Toaster - London Blue | CTOC2003BL |
| 4 | 1_49987 | DeLonghi - CTOC 2003.R - Icona Capitals 2 Slice Toaster - Tokyo Red | CTOC2003R |
Advanced Recommendations With Multiple Vectors For Multiple Products
[55]:
results = vi_client.advanced_search_by_ids(collection_name, {'1_50017':1, '1_50021':1},
{'name_vector_':1, 'image_url_vector_':1},
page_size=5, vector_operation='mean')
vi_client.show_json(results, selected_fields=['_id', 'name', 'sku'], image_fields=['image_url'], image_width=100)
[55]:
| | _id | name | sku |
|---|---|---|---|
| 0 | 1_50017 | DeLonghi - CTOC 2003.Y - Icona Capitals 2 Slice Toaster - New York Yellow | CTOC2003Y |
| 1 | 1_50021 | DeLonghi - KBOC 2001.Y - Icona Capitals Kettle - New York Yellow | KBOC2001Y |
| 2 | 1_50018 | DeLonghi - CTOC 4003.Y - Icona Capitals 4 Slice Toaster - New York Yellow | CTOC4003Y |
| 3 | 1_52912 | DeLonghi - KBOC 2001.O - Icona Capitals Kettle - Rome Orange | KBOC2001O |
| 4 | 1_49987 | DeLonghi - CTOC 2003.R - Icona Capitals 2 Slice Toaster - Tokyo Red | CTOC2003R |
e.g. I liked products A and B but not product C, recommend me products based on that:
\(Vector_{ProductA} + Vector_{ProductB} - Vector_{ProductC} = Search Query\)
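Subtracting a disliked product's vector pushes the query away from it. The sketch below, assuming cosine similarity and made-up three-dimensional vectors, builds A + B - C and shows that a candidate that also resembles C scores lower; `positive_negative_query` is a hypothetical helper.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def positive_negative_query(positive: Dict[str, float], negative: Dict[str, float],
                            vectors: Dict[str, List[float]]) -> List[float]:
    """Weighted sum of liked vectors minus weighted sum of disliked vectors."""
    dim = len(next(iter(vectors.values())))
    query = [0.0] * dim
    for ids, sign in ((positive, 1.0), (negative, -1.0)):
        for doc_id, weight in ids.items():
            for i, x in enumerate(vectors[doc_id]):
                query[i] += sign * weight * x
    return query


vectors = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [0.0, 0.0, 1.0]}
query = positive_negative_query({"A": 1, "B": 1}, {"C": 1}, vectors)
print(query)  # [1.0, 1.0, -1.0]

# A candidate that also resembles the disliked product C scores lower.
print(round(cosine(query, [1.0, 1.0, 0.0]), 3))  # 0.816
print(round(cosine(query, [1.0, 1.0, 1.0]), 3))  # 0.333
```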
Recommendations from multiple products/Search by IDs
[57]:
results = vi_client.advanced_search_by_positive_negative_ids(collection_name,
{'1_50017':1, '1_50018':1}, {'1_50021':1},
{'name_vector_':1}, page_size=5,
vector_operation='sum')
vi_client.show_json(results, selected_fields=['_id', 'name', 'sku'], image_fields=['image_url'], image_width=150)
[57]:
| | _id | name | sku |
|---|---|---|---|
| 0 | 1_50018 | DeLonghi - CTOC 4003.Y - Icona Capitals 4 Slice Toaster - New York Yellow | CTOC4003Y |
| 1 | 1_50017 | DeLonghi - CTOC 2003.Y - Icona Capitals 2 Slice Toaster - New York Yellow | CTOC2003Y |
| 2 | 1_49989 | DeLonghi - CTOC 4003.R - Icona Capitals 4 Slice Toaster - Tokyo Red | CTOC4003R |
| 3 | 1_52910 | DeLonghi - CTOC 4003.O - Icona Capitals 4 Slice Toaster - Rome Orange | CTOC4003O |
| 4 | 1_50032 | DeLonghi - CTOC 4003.BL - Icona Capitals 4 Slice Toaster - London Blue | CTOC4003BL |
Combining Query + Recommendation
A user's like and dislike history can also be embedded in a search query:
\(Search Query + Vector_{ProductA} + Vector_{ProductB} - Vector_{ProductC} = New Search Query\)
[58]:
results = vi_client.advanced_search_with_positive_negative_ids_as_history(
collection_name,
text_encoder.encode('Delonghi'),
positive_document_ids={'1_50017':1, '1_50018':1},
negative_document_ids={'1_50021':1},
fields={'name_vector_':1},
page_size=5,
vector_operation='sum')
vi_client.show_json(results, selected_fields=['_id', 'name', 'sku'], image_fields=['image_url'], image_width=100)
[58]:
| | _id | name | sku |
|---|---|---|---|
| 0 | 1_50018 | DeLonghi - CTOC 4003.Y - Icona Capitals 4 Slice Toaster - New York Yellow | CTOC4003Y |
| 1 | 1_50017 | DeLonghi - CTOC 2003.Y - Icona Capitals 2 Slice Toaster - New York Yellow | CTOC2003Y |
| 2 | 1_50032 | DeLonghi - CTOC 4003.BL - Icona Capitals 4 Slice Toaster - London Blue | CTOC4003BL |
| 3 | 1_49987 | DeLonghi - CTOC 2003.R - Icona Capitals 2 Slice Toaster - Tokyo Red | CTOC2003R |
| 4 | 1_49989 | DeLonghi - CTOC 4003.R - Icona Capitals 4 Slice Toaster - Tokyo Red | CTOC4003R |
Vector Analytics/Aggregation
Traditional aggregation
We randomly create an aggregation query and use it to aggregate.
This will be the basis for the more advanced aggregations enabled by vectors.
[37]:
aggregation_query = vi_client.random_aggregation_query(collection_name, groupby=1, metrics=1)
aggregation_query
[37]:
{'groupby': [{'name': 'memory_gb', 'field': 'memory_gb', 'agg': 'texts'}],
'metrics': [{'name': 'price', 'field': 'price', 'agg': 'avg'}]}
View aggregated results.
[38]:
vi_client.aggregate(collection_name, aggregation_query)
[38]:
[{'memory_gb': '4GB', 'frequency': 5, 'price': 137.0},
{'memory_gb': '16GB', 'frequency': 3, 'price': 332.3333333333333},
{'memory_gb': '8GB', 'frequency': 3, 'price': 272.3333333333333},
{'memory_gb': '8 GB', 'frequency': 2, 'price': 1249.0},
{'memory_gb': '32GB', 'frequency': 1, 'price': 49.0},
{'memory_gb': '8GB microSD™ Class 10', 'frequency': 1, 'price': 649.0},
{'memory_gb': 'Internal solid state', 'frequency': 1, 'price': 149.0},
{'memory_gb': 'No', 'frequency': 1, 'price': 199.0},
{'memory_gb': 'microSD', 'frequency': 1, 'price': 3099.0}]
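The aggregation above groups by the text field `memory_gb` and averages `price`. A rough client-side equivalent in plain Python, with a hypothetical `aggregate` helper and made-up documents, looks like this. It also shows why `'8GB'` and `'8 GB'` appear as separate groups above: grouping is by exact text value.

```python
from collections import defaultdict
from typing import Dict, List


def aggregate(documents: List[Dict], groupby_field: str, metric_field: str) -> List[Dict]:
    """Group by exact text value; report group size and the average of a numeric field."""
    groups = defaultdict(list)
    for doc in documents:
        if groupby_field in doc:
            groups[doc[groupby_field]].append(doc.get(metric_field))
    rows = []
    for key, values in groups.items():
        numeric = [v for v in values if v is not None]
        avg = sum(numeric) / len(numeric) if numeric else None
        rows.append({groupby_field: key, "frequency": len(values), metric_field: avg})
    return sorted(rows, key=lambda r: r["frequency"], reverse=True)


docs = [
    {"memory_gb": "8GB", "price": 200.0},
    {"memory_gb": "8GB", "price": 300.0},
    {"memory_gb": "8 GB", "price": 1249.0},
]
print(aggregate(docs, "memory_gb", "price"))
# [{'memory_gb': '8GB', 'frequency': 2, 'price': 250.0}, {'memory_gb': '8 GB', 'frequency': 1, 'price': 1249.0}]
```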
[39]:
aggregation_query = {'groupby': [{'name': 'manufacturer',
'field': 'manufacturer',
'agg': 'texts'}],
'metrics': [{'name': 'price', 'field': 'price', 'agg': 'avg'}]}
vi_client.aggregate(collection_name, aggregation_query)
[39]:
[{'manufacturer': 'Samsung', 'frequency': 356, 'price': 1216.1797752808989},
{'manufacturer': 'Smeg', 'frequency': 286, 'price': 3285.076923076923},
{'manufacturer': 'Westinghouse',
'frequency': 230,
'price': 1097.9782608695652},
{'manufacturer': 'Miele', 'frequency': 225, 'price': 3856.133333333333},
{'manufacturer': 'Cygnett', 'frequency': 180, 'price': 34.19444444444444},
{'manufacturer': 'Apple', 'frequency': 154, 'price': 985.7922077922078},
{'manufacturer': 'Philips', 'frequency': 144, 'price': 137.51388888888889},
{'manufacturer': 'LG', 'frequency': 139, 'price': 1661.4244604316548},
{'manufacturer': 'Breville', 'frequency': 131, 'price': 325.35114503816794},
{'manufacturer': 'Panasonic', 'frequency': 129, 'price': 412.8992248062016}]
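Aggregations can also be published
Publishing an aggregation writes its results to a new collection (`aggregated_collection_name` below). Searching across the original and the aggregated collections enables, for example: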
Product -> Product Search
Product -> Brand Search
Brand -> Brand Search
[40]:
aggregated_collection_name = 'aggregated_{}'.format(collection_name)
aggregation_name = 'aggregation_{}'.format(collection_name)
[41]:
vi_client.publish_aggregation(collection_name,
aggregation_query,
aggregation_name=aggregation_name,
aggregated_collection_name=aggregated_collection_name,
description='some aggregation for {}'.format(collection_name),
refresh_time=2,
start_immediately=False)
vi_client.start_aggregation(aggregation_name)
[41]:
{'status': 'error',
'message': 'Something went wrong, please raise the error with maintainer.',
'error': 'Unknown Error'}
[41]:
{'status': 'complete', 'message': 'aggregation_ecommerce started'}
[42]:
vi_client.results_to_df(vi_client.retrieve_documents(aggregated_collection_name, 3))
[42]:
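Basic Vector Analytics
Vector analytics helps you understand how your vectors, and therefore your searches, behave. Below we launch two jobs on `name_vector_`: a dimensionality-reduction job that projects the vectors down to 2 dimensions (`n_components=2`), and a clustering job that groups them into 50 clusters (`n_clusters=50`).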
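The clustering job's algorithm is not specified here; `n_clusters` and `n_init` mirror the parameter names of k-means in scikit-learn, so a k-means-like method is a reasonable assumption. The minimal k-means sketch below (a single run, without `n_init` restarts) shows the idea on made-up two-dimensional points; `kmeans` is a hypothetical helper.

```python
import math
import random
from typing import List, Tuple


def kmeans(vectors: List[List[float]], n_clusters: int, n_iter: int = 20,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means (a single run): returns a cluster label per vector and the centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest centroid (Euclidean distance).
        labels = [min(range(n_clusters), key=lambda c: math.dist(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(n_clusters):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centroids = kmeans(points, n_clusters=2)
print(labels[0] == labels[1], labels[2] == labels[3], labels[0] != labels[2])
# True True True
```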
[43]:
dr_job = vi_client.dimensionality_reduction_job(collection_name, vector_field='name_vector_', n_components=2)
cluster_job = vi_client.advanced_clustering_job(
collection_name=collection_name, alias='default', vector_field='name_vector_',
n_clusters=50, n_init=5)
Both jobs run in the background, so we call the `wait_till_jobs_complete` method to wait until each has finished:
[44]:
vi_client.wait_till_jobs_complete(collection_name, dr_job['job_id'], dr_job['job_name'])
vi_client.wait_till_jobs_complete(collection_name, cluster_job['job_id'], cluster_job['job_name'])
Viewing the advanced cluster facets for only the first cluster:
[45]:
vi_client.advanced_cluster_facets(collection_name, vector_field='name_vector_',
facets_fields=['manufacturer', 'attribute_set'])['0']
[45]:
{'attribute_set': [{'attribute_set': 'Vacuum Accessories', 'frequency': 19},
{'attribute_set': 'Stick Vacuum', 'frequency': 15},
{'attribute_set': 'Undermount Rangehoods', 'frequency': 15},
{'attribute_set': 'Compact Combi Built-In Microwave', 'frequency': 13},
{'attribute_set': 'Induction Cooktops', 'frequency': 13},
{'attribute_set': 'Bagless Vacuum ', 'frequency': 9},
{'attribute_set': 'Canopy Rangehoods', 'frequency': 9},
{'attribute_set': 'Compact Combi Steamers', 'frequency': 9},
{'attribute_set': 'Warming Draws', 'frequency': 8},
{'attribute_set': 'All Fridge', 'frequency': 6},
{'attribute_set': 'Ceramic Cooktop', 'frequency': 6},
{'attribute_set': 'Dryers', 'frequency': 6},
{'attribute_set': 'Fixed Rangehoods', 'frequency': 6},
{'attribute_set': 'Bag Vacuums', 'frequency': 5},
{'attribute_set': 'Hair Dryers', 'frequency': 5},
{'attribute_set': 'Upright Vacuum', 'frequency': 5},
{'attribute_set': '60cm Gas Cooktop', 'frequency': 4},
{'attribute_set': 'Built-in Coffee Machines', 'frequency': 4},
{'attribute_set': 'Evaporative Air Cooler', 'frequency': 3},
{'attribute_set': '60cm Slideout Rangehoods', 'frequency': 2},
{'attribute_set': '90cm Built-In Ovens', 'frequency': 2},
{'attribute_set': '90cm Gas Cooktop', 'frequency': 2},
{'attribute_set': 'Automatic Coffee Maker', 'frequency': 2},
{'attribute_set': 'Bottom Mount Fridge', 'frequency': 2},
{'attribute_set': 'Dehumidifiers', 'frequency': 2},
{'attribute_set': 'Hand Held Floor Cleaners', 'frequency': 2},
{'attribute_set': 'Shampoo Vacuums', 'frequency': 2},
{'attribute_set': '70cm Gas Cooktop', 'frequency': 1},
{'attribute_set': 'Fully Integrated Dishwasher', 'frequency': 1},
{'attribute_set': 'General Accessories', 'frequency': 1},
{'attribute_set': 'Hair Straighteners', 'frequency': 1},
{'attribute_set': 'Other Shelf Appliances', 'frequency': 1},
{'attribute_set': 'Outdoor Wine', 'frequency': 1},
{'attribute_set': 'Ovens Upright Duel Fuel 100cm', 'frequency': 1},
{'attribute_set': 'Rechargeable Vacuum Cleaner', 'frequency': 1},
{'attribute_set': 'Steam Irons', 'frequency': 1},
{'attribute_set': 'Vertical Freezers', 'frequency': 1},
{'attribute_set': 'Washer + Dryer Packages', 'frequency': 1},
{'attribute_set': 'Wet/Dry Vacuum', 'frequency': 1},
{'attribute_set': 'Wine Refrigerators', 'frequency': 1}],
'manufacturer': [{'manufacturer': 'Miele', 'frequency': 138},
{'manufacturer': 'Dyson', 'frequency': 21},
{'manufacturer': 'DeLonghi', 'frequency': 4},
{'manufacturer': 'LG', 'frequency': 4},
{'manufacturer': 'Shark', 'frequency': 4},
{'manufacturer': 'Honeywell', 'frequency': 3},
{'manufacturer': 'Samsung', 'frequency': 3},
{'manufacturer': 'Bissell', 'frequency': 2},
{'manufacturer': 'Numatic International', 'frequency': 2},
{'manufacturer': 'Remington', 'frequency': 2},
{'manufacturer': 'Vax', 'frequency': 2},
{'manufacturer': 'Black & Decker', 'frequency': 1},
{'manufacturer': 'Bosch', 'frequency': 1},
{'manufacturer': 'FoodSaver', 'frequency': 1},
{'manufacturer': 'Unilux', 'frequency': 1}]}
Viewing the cluster aggregation statistics.
[46]:
vi_client.advanced_cluster_aggregate(
collection_name,
aggregation_query,
vector_field='name_vector_')['0']
[46]:
[{'manufacturer': 'Miele', 'frequency': 138, 'price': 3647.804347826087},
{'manufacturer': 'Dyson', 'frequency': 21, 'price': 625.0},
{'manufacturer': 'DeLonghi', 'frequency': 4, 'price': 1174.0},
{'manufacturer': 'LG', 'frequency': 4, 'price': 896.0},
{'manufacturer': 'Shark', 'frequency': 4, 'price': 399.0},
{'manufacturer': 'Honeywell', 'frequency': 3, 'price': 449.0},
{'manufacturer': 'Samsung', 'frequency': 3, 'price': 965.6666666666666},
{'manufacturer': 'Bissell', 'frequency': 2, 'price': 349.0},
{'manufacturer': 'Numatic International', 'frequency': 2, 'price': 849.0},
{'manufacturer': 'Remington', 'frequency': 2, 'price': 184.0}]
[47]:
vi_client.advanced_cluster_aggregate(
collection_name,
aggregation_query,
vector_field='name_vector_')['1']
[47]:
[{'manufacturer': 'Targus', 'frequency': 22, 'price': 34.45454545454545},
{'manufacturer': 'Blanco', 'frequency': 9, 'price': 636.7777777777778},
{'manufacturer': 'STM', 'frequency': 7, 'price': 63.285714285714285},
{'manufacturer': 'Tauris', 'frequency': 1, 'price': 599.0}]
[48]:
vi_client.plot_dimensionality_reduced_vectors(
collection=collection_name,
cluster_field='name_vector_',
cluster_label='_clusters_.name_vector_.default',
point_label='name',
dim_reduction_field='_dr_.default.2.name_vector_',
include_centroids=True,
alias='default')
[49]:
docs = vi_client.retrieve_documents(collection_name)['documents']
[50]:
vi_client.plot_1d_cosine_similarity(docs, vector_fields='name_vector_', label='name', anchor_document=docs[0])
[51]:
vi_client.plot_2d_cosine_similarity(docs[:20], docs[:2], vector_fields='name_vector_', label='name')
Clean up
Stop and delete the published aggregation, then delete the aggregated collection.
[53]:
vi_client.stop_aggregation(aggregation_name)
vi_client.delete_published_aggregation(aggregation_name)
vi_client.delete_collection(aggregated_collection_name)
[53]:
{'status': 'complete', 'message': 'aggregation_ecommerce stopped'}
[53]:
{'status': 'complete', 'message': 'aggregation_ecommerce deleted'}
[53]:
{'status': 'complete', 'message': 'aggregated_ecommerce deleted'}