Client

Documentation for the Vector AI client.

class vectorai.client.ViClient(username: Optional[str] = None, api_key: Optional[str] = None, url: str = 'https://vectorai-development-api.azurewebsites.net', analytics_url='https://vector-analytics.vctr.ai', verbose: bool = True)

The main Vi client with most of the available read and write methods available to it.

Parameters
  • username – your username for accessing vectorai

  • api_key – your api key for accessing vectorai

  • url – url of the deployed vectorai database

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.list_collections()
vectorai.client.request_api_key(self, email, description, referral_code='api_referred', **kwargs)

Request an API key. Make sure to save the API key somewhere safe. If you have a valid referral code, you can receive the API key more quickly.

Parameters
  • username (Username you'd like to create, lowercase only) –

  • email (Email you are using to sign up) –

  • description (Description of your intended use case) –

  • referral_code (The referral code you've been given to allow you to register for an api key before others) –

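Example

A minimal sketch; assumes an existing client instance, and the email and description values are illustrative:

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.request_api_key(email='you@example.com', description='Prototyping product search')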
class vectorai.client.ViCollectionClient(collection_name: str, username: str, api_key: str, url: str = 'https://api.vctr.ai', verbose: bool = True)

The Vi client to use when you are mainly working with a single collection.

Parameters
  • username – your username for accessing vecdb

  • api_key – your api key for accessing vecdb

  • url – url of the deployed vecdb database

  • collection_name – The name of the collection

Example

>>> from vectorai.client import ViCollectionClient
>>> vi_client = ViCollectionClient(collection_name, username, api_key, vectorai_url)
>>> vi_client.insert_documents(documents)
add_collection_metadata(collection_name, metadata, **kwargs)

Add metadata about a collection, notably description, data source, etc.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • metadata (Metadata for a collection, e.g. {'description' : 'collection for searching products'}) –

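Example

A minimal sketch, reusing the metadata example above; assumes an existing vi_client:

>>> vi_client.add_collection_metadata(collection_name,
...     metadata={'description': 'collection for searching products'})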
Advanced Vector Similarity Search on Chunks, with support for multiple vectors, vector weightings, facets and filtering. Search with multiple chunk vectors for the most similar documents.

Advanced chunk search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • chunk_field (Field where the array of chunked documents is stored.) –

  • chunk_scoring (Scoring method for ranking between document chunks.) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vectors similarity search score as 1 or separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • filters (Query for filtering the search results) –

  • facets (Fields to include in the facets, if [] then all) –

  • min_score (Minimum score for similarity metric) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • include_facets (Include facets in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • asc (Whether to sort results by ascending or descending order) –

  • multivector_query (Query for advanced search that allows for multiple vector and field querying) –

  • chunk_page (Page of the chunk results) –

  • chunk_page_size (Size of each page of chunk results) –

advanced_cluster_aggregate(collection_name, aggregation_query, vector_field, alias, filters=[], page_size=20, page=1, asc=False, flatten=True, **kwargs)

Aggregate every cluster in a collection. Takes an aggregation query and gets the aggregate of each cluster in a collection. This helps you interpret each cluster and what is in it.

Can only be used after a vector field has been clustered with /advanced_cluster.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • aggregation_query (Aggregation query to aggregate data) –

  • filters (Query for filtering the search results) –

  • page_size (Size of each page of results) –

  • page (Page of the results) –

  • asc (Whether to sort results by ascending or descending order) –

  • flatten

  • vector_field (Clustered vector field) –

  • alias (Alias of a cluster) –

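Example

A minimal sketch; assumes an existing vi_client, a vector field that has already been clustered with /advanced_cluster, and illustrative field names:

>>> aggregation_query = {
...     'groupby': [{'name': 'category', 'field': 'item_category', 'agg': 'category'}],
...     'metrics': [{'name': 'average_price', 'field': 'price', 'agg': 'avg'}]
... }
>>> vi_client.advanced_cluster_aggregate(collection_name, aggregation_query=aggregation_query,
...     vector_field='product_description_vector_', alias='default')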
Advanced Vector Similarity Search on Clusters. Can only be used after a vector field has been clustered with /advanced_cluster. Performs advanced_search on each cluster.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vectors similarity search score as 1 or separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • filters (Query for filtering the search results) –

  • facets (Fields to include in the facets, if [] then all) –

  • min_score (Minimum score for similarity metric) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • include_facets (Include facets in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • include_search_relevance (Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results) –

  • search_relevance_cutoff_aggressiveness (How aggressive the search_relevance cutoff score is (higher value the less results will be relevant)) –

  • asc (Whether to sort results by ascending or descending order) –

  • keep_search_history (Whether to store the history of search or not) –

  • multivector_query (Query for advanced search that allows for multiple vector and field querying) –

  • vector_field (Vector field to perform clustering on) –

  • alias (Alias is used to name a cluster) –

advanced_clustering_job(collection_name: str, vector_field: str, alias: str = 'default', n_clusters: int = 0, n_init: int = 5, n_iter: int = 10, refresh: bool = True, return_curl: bool = False)

Clusters a collection by a vector field

Clusters a collection into groups using unsupervised machine learning. Clusters can then be aggregated to understand what's in them and how vectors are separating data into different groups. Advanced cluster allows for more parameters to tune and an alias to name each differently trained cluster.

Parameters
  • vector_field – Vector field to perform clustering on

  • alias – Alias is used to name a cluster

  • n_clusters – Number of clusters

  • n_iter – Number of iterations in each run

  • n_init – Number of runs to run with different centroid seeds

  • refresh – Whether to refresh the whole collection and retrain the cluster model

  • collection_name – Name of Collection

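Example

A minimal sketch; assumes an existing vi_client and an illustrative vector field name:

>>> vi_client.advanced_clustering_job(collection_name,
...     vector_field='product_description_vector_',
...     alias='kmeans_10', n_clusters=10, n_init=5, n_iter=10)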
Advanced search of a text field with vector and text, using Vector Search and Traditional Search: advanced vector similarity search plus traditional fuzzy search with text and vector.

You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.

Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vectors similarity search score as 1 or separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • filters (Query for filtering the search results) –

  • facets (Fields to include in the facets, if [] then all) –

  • min_score (Minimum score for similarity metric) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • include_facets (Include facets in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • include_search_relevance (Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results) –

  • search_relevance_cutoff_aggressiveness (How aggressive the search_relevance cutoff score is (higher value the less results will be relevant)) –

  • asc (Whether to sort results by ascending or descending order) –

  • keep_search_history (Whether to store the history of search or not) –

  • multivector_query (Query for advanced search that allows for multiple vector and field querying) –

  • text (Text Search Query (not encoded as vector)) –

  • text_fields (Text fields to search against) –

  • traditional_weight (Multiplier of traditional search. A value of 0.025~0.1 is good.) –

  • fuzzy (Fuzziness of the search. A value of 1-3 is good. For automated fuzziness use -1.) –

  • join (Whether to consider cases where there is a space in the word. E.g. Go Pro vs GoPro.) –

Vector Similarity Search on Chunks. Advanced multistep chunk search involves a simple search followed by a chunk search.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • chunk_field (Field where the array of chunked documents is stored.) –

  • chunk_scoring (Scoring method for ranking between document chunks.) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vectors similarity search score as 1 or separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • filters (Query for filtering the search results) –

  • facets (Fields to include in the facets, if [] then all) –

  • min_score (Minimum score for similarity metric) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • include_facets (Include facets in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • asc (Whether to sort results by ascending or descending order) –

  • first_step_multivector_query (Query for advanced search that allows for multiple vector and field querying) –

  • chunk_step_multivector_query (Query for advanced search that allows for multiple vector and field querying) –

  • first_step_page (Page of the results) –

  • first_step_page_size (Size of each page of results) –

Advanced Vector Similarity Search, with support for multiple vectors, vector weightings, facets and filtering. Enables machine learning search with vector search: search with multiple vectors for the most similar documents.

For example: Search with a product's image and description vectors to find the most similar products by what they look like and what they are described to do.

You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.

Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vectors similarity search score as 1 or separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • filters (Query for filtering the search results) –

  • facets (Fields to include in the facets, if [] then all) –

  • min_score (Minimum score for similarity metric) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • include_facets (Include facets in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • include_search_relevance (Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results) –

  • search_relevance_cutoff_aggressiveness (How aggressive the search_relevance cutoff score is (higher value the less results will be relevant)) –

  • asc (Whether to sort results by ascending or descending order) –

  • keep_search_history (Whether to store the history of search or not) –

  • multivector_query (Query for advanced search that allows for multiple vector and field querying) –

advanced_search_by_id(collection_name, document_id, search_fields, page=1, page_size=20, approx=0, sum_fields=True, metric='cosine', filters=[], facets=[], min_score=None, include_fields=[], include_vector=False, include_count=True, include_facets=False, hundred_scale=False, include_search_relevance=False, search_relevance_cutoff_aggressiveness=1, asc=False, keep_search_history=False, **kwargs)

Advanced Single Product Recommendations (search by an ID).

For example: Search with the ID of a product in the database, using the product's image and description vectors to find the most similar products by what they look like and what they are described to do.

You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.

Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vectors similarity search score as 1 or separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • filters (Query for filtering the search results) –

  • facets (Fields to include in the facets, if [] then all) –

  • min_score (Minimum score for similarity metric) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • include_facets (Include facets in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • include_search_relevance (Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results) –

  • search_relevance_cutoff_aggressiveness (How aggressive the search_relevance cutoff score is (higher value the less results will be relevant)) –

  • asc (Whether to sort results by ascending or descending order) –

  • keep_search_history (Whether to store the history of search or not) –

  • document_id (ID of a document) –

  • search_fields (Vector fields to search against, and the weightings for them.) –

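Example

A minimal sketch; assumes an existing vi_client, an illustrative document ID, and that search_fields maps vector fields to their weightings:

>>> vi_client.advanced_search_by_id(collection_name, document_id='product_123',
...     search_fields={'image_vector_': 1.0, 'description_vector_': 0.5})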
advanced_search_by_ids(collection_name, document_ids, search_fields, page=1, page_size=20, approx=0, sum_fields=True, metric='cosine', filters=[], facets=[], min_score=None, include_fields=[], include_vector=False, include_count=True, include_facets=False, hundred_scale=False, include_search_relevance=False, search_relevance_cutoff_aggressiveness=1, asc=False, keep_search_history=False, vector_operation='sum', **kwargs)

Advanced Multi Product Recommendations (search by IDs).

For example: Search with multiple IDs of products in the database, using the products' image and description vectors to find the most similar products by what they look like and what they are described to do.

You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.

You can also give weightings to each product, e.g. product ID-A weighted 100% whilst product ID-B 50%.

Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vectors similarity search score as 1 or separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • filters (Query for filtering the search results) –

  • facets (Fields to include in the facets, if [] then all) –

  • min_score (Minimum score for similarity metric) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • include_facets (Include facets in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • include_search_relevance (Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results) –

  • search_relevance_cutoff_aggressiveness (How aggressive the search_relevance cutoff score is (higher value the less results will be relevant)) –

  • asc (Whether to sort results by ascending or descending order) –

  • keep_search_history (Whether to store the history of search or not) –

  • document_ids (Document IDs to get recommendations for, and the weightings of each document) –

  • search_fields (Vector fields to search against, and the weightings for them.) –

  • vector_operation (Aggregation for the vectors, choose from ['mean', 'sum', 'min', 'max']) –

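Example

A minimal sketch; assumes an existing vi_client and that document_ids and search_fields map IDs/fields to their weightings:

>>> vi_client.advanced_search_by_ids(collection_name,
...     document_ids={'product_123': 1.0, 'product_456': 0.5},
...     search_fields={'image_vector_': 1.0, 'description_vector_': 0.5},
...     vector_operation='mean')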
advanced_search_by_positive_negative_ids(collection_name, positive_document_ids, negative_document_ids, search_fields, page=1, page_size=20, approx=0, sum_fields=True, metric='cosine', filters=[], facets=[], min_score=None, include_fields=[], include_vector=False, include_count=True, include_facets=False, hundred_scale=False, include_search_relevance=False, search_relevance_cutoff_aggressiveness=1, asc=False, keep_search_history=False, vector_operation='sum', **kwargs)

Advanced Multi Product Recommendations with Likes and Dislikes (search by IDs).

For example: Search with multiple IDs of liked and disliked products in the database, then use the products' image and description vectors to find the most similar products to the positives and the most dissimilar products to the negatives.

You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.

You can also give weightings to each product, e.g. product ID-A weighted 100% whilst product ID-B 50%.

Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vectors similarity search score as 1 or separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • filters (Query for filtering the search results) –

  • facets (Fields to include in the facets, if [] then all) –

  • min_score (Minimum score for similarity metric) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • include_facets (Include facets in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • include_search_relevance (Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results) –

  • search_relevance_cutoff_aggressiveness (How aggressive the search_relevance cutoff score is (higher value the less results will be relevant)) –

  • asc (Whether to sort results by ascending or descending order) –

  • keep_search_history (Whether to store the history of search or not) –

  • positive_document_ids (Positive Document IDs to get recommendations for, and the weightings of each document) –

  • negative_document_ids (Negative Document IDs to get recommendations for, and the weightings of each document) –

  • search_fields (Vector fields to search against, and the weightings for them.) –

  • vector_operation (Aggregation for the vectors, choose from ['mean', 'sum', 'min', 'max']) –

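Example

A minimal sketch; assumes an existing vi_client and that the ID and field arguments map to weightings:

>>> vi_client.advanced_search_by_positive_negative_ids(collection_name,
...     positive_document_ids={'product_123': 1.0},
...     negative_document_ids={'product_456': 1.0},
...     search_fields={'image_vector_': 1.0})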
advanced_search_post_cluster(collection_name, multivector_query, cluster_vector_field, page=1, page_size=20, approx=0, sum_fields=True, metric='cosine', filters=[], facets=[], min_score=None, include_fields=[], include_vector=False, include_count=True, include_facets=False, hundred_scale=False, include_search_relevance=False, search_relevance_cutoff_aggressiveness=1, asc=False, keep_search_history=False, n_clusters=0, n_init=5, n_iter=10, return_as_clusters=False, **kwargs)

Performs clustering on the top X search results. This will first perform an advanced search and then cluster the top X (page_size) search results. Once you have the clusters:

Cluster 0: [A, B, C]
Cluster 1: [D, E]
Cluster 2: [F, G]
Cluster 3: [H, I]

(each cluster is ordered by highest to lowest search score), this intermediately returns:

results_batch_1: [A, H, F, D] (ordered by highest search score)
results_batch_2: [G, E, B, I] (ordered by highest search score)
results_batch_3: [C]

This then returns the final results:

results: [A, H, F, D, G, E, B, I, C]

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vectors similarity search score as 1 or separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • filters (Query for filtering the search results) –

  • facets (Fields to include in the facets, if [] then all) –

  • min_score (Minimum score for similarity metric) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • include_facets (Include facets in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • include_search_relevance (Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results) –

  • search_relevance_cutoff_aggressiveness (How aggressive the search_relevance cutoff score is (higher value the less results will be relevant)) –

  • asc (Whether to sort results by ascending or descending order) –

  • keep_search_history (Whether to store the history of search or not) –

  • multivector_query (Query for advanced search that allows for multiple vector and field querying) –

  • cluster_vector_field (Vector field to perform clustering on) –

  • n_clusters (Number of clusters) –

  • n_init (Number of runs to run with different centroid seeds) –

  • n_iter (Number of iterations in each run) –

  • return_as_clusters (If True, return as clusters as opposed to results list) –

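Example

A minimal sketch; assumes an existing vi_client and that multivector_query is a list of {'vector': ..., 'fields': ...} entries (an assumption; vector values and field names are illustrative):

>>> query = [{'vector': [0.1, 0.2, 0.3], 'fields': ['description_vector_']}]
>>> vi_client.advanced_search_post_cluster(collection_name, multivector_query=query,
...     cluster_vector_field='description_vector_', n_clusters=4, return_as_clusters=True)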
advanced_search_with_positive_negative_ids_as_history(collection_name, multivector_query, positive_document_ids, negative_document_ids, page=1, page_size=20, approx=0, sum_fields=True, metric='cosine', filters=[], facets=[], min_score=None, include_fields=[], include_vector=False, include_count=True, include_facets=False, hundred_scale=False, include_search_relevance=False, search_relevance_cutoff_aggressiveness=1, asc=False, keep_search_history=False, vector_operation='sum', **kwargs)

Advanced Search with Likes and Dislikes as history. For example: Vector search of a query vector together with multiple IDs of liked and disliked products in the database, then use the products' image and description vectors to find the most similar products to the positives and the most dissimilar products to the negatives.

You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.

You can also give weightings to each product, e.g. product ID-A weighted 100% whilst product ID-B 50%.

Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vectors similarity search score as 1 or separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • filters (Query for filtering the search results) –

  • facets (Fields to include in the facets, if [] then all) –

  • min_score (Minimum score for similarity metric) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • include_facets (Include facets in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • include_search_relevance (Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results) –

  • search_relevance_cutoff_aggressiveness (How aggressive the search_relevance cutoff score is (higher value the less results will be relevant)) –

  • asc (Whether to sort results by ascending or descending order) –

  • keep_search_history (Whether to store the history of search or not) –

  • multivector_query (Query for advanced search that allows for multiple vector and field querying) –

  • positive_document_ids (Positive Document IDs to get recommendations for, and the weightings of each document) –

  • negative_document_ids (Negative Document IDs to get recommendations for, and the weightings of each document) –

  • vector_operation (Aggregation for the vectors, choose from ['mean', 'sum', 'min', 'max']) –

aggregate(collection_name, aggregation_query, filters=[], page_size=20, page=1, asc=False, flatten=True, **kwargs)

Aggregate a collection. Aggregation/groupby of a collection using an aggregation query. The aggregation query is a JSON body that follows the schema of:

{
    "groupby": [
        {"name": "<alias>", "field": "<field in the collection>", "agg": "category"},
        {"name": "<alias>", "field": "<another groupby field in the collection>", "agg": "numeric"}
    ],
    "metrics": [
        {"name": "<alias>", "field": "<numeric field in the collection>", "agg": "avg"},
        {"name": "<alias>", "field": "<another numeric field in the collection>", "agg": "max"}
    ]
}

For example, one can use the following aggregation to group scores based on region and player name:

{
    "groupby": [
        {"name": "region", "field": "player_region", "agg": "category"},
        {"name": "player_name", "field": "name", "agg": "category"}
    ],
    "metrics": [
        {"name": "average_score", "field": "final_score", "agg": "avg"},
        {"name": "max_score", "field": "final_score", "agg": "max"},
        {"name": "total_score", "field": "final_score", "agg": "sum"},
        {"name": "average_deaths", "field": "final_deaths", "agg": "avg"},
        {"name": "highest_deaths", "field": "final_deaths", "agg": "max"}
    ]
}

  • "groupby" is the fields you want to split the data into. These are the available groupby types:
    • "category": groupby a field that is a category

    • "numeric": groupby a field that is numeric

  • "metrics" is the metrics you want to calculate in each of those groups; every aggregation includes a frequency metric. These are the available metric types:
    • "avg", "max", "min", "sum", "cardinality"

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • aggregation_query (Aggregation query to aggregate data) –

  • filters (Query for filtering the search results) –

  • page_size (Size of each page of results) –

  • page (Page of the results) –

  • asc (Whether to sort results by ascending or descending order) –

  • flatten

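Example

A minimal sketch, reusing the region/player aggregation above; assumes an existing vi_client:

>>> aggregation_query = {
...     'groupby': [{'name': 'region', 'field': 'player_region', 'agg': 'category'}],
...     'metrics': [{'name': 'average_score', 'field': 'final_score', 'agg': 'avg'}]
... }
>>> vi_client.aggregate(collection_name, aggregation_query=aggregation_query)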
aggregate_fetch(collection_name, aggregation_query, filters=[], page_size=20, page=1, asc=False, flatten=True, **kwargs)

Aggregate a collection and fetch the documents. Performs an aggregation and then a bulk ID lookup, using the IDs of the aggregated results to get the documents alongside the aggregations.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • aggregation_query (Aggregation query to aggregate data) –

  • filters (Query for filtering the search results) –

  • page_size (Size of each page of results) –

  • page (Page of the results) –

  • asc (Whether to sort results by ascending or descending order) –

  • flatten

bulk_delete_by_id(collection_name, document_ids, **kwargs)

Delete multiple documents in a collection by their IDs.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • document_ids (IDs of documents) –

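Example

A minimal sketch; the document IDs are illustrative:

>>> vi_client.bulk_delete_by_id(collection_name, document_ids=['doc_1', 'doc_2'])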
bulk_edit_document(collection_name, documents={}, insert_date=True, **kwargs)

Edit multiple documents in a collection by their IDs, providing key-value pairs of the fields you are adding or changing; make sure to include the "_id" in each document.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • documents (A list of documents. A document is JSON-like data that we store our metadata and vectors with. For specifying the ID of a document use the field '_id'; for specifying a vector field use the suffix '_vector_') –

  • insert_date (Whether to include insert date as a field ‘insert_date_’.) –

bulk_insert(collection_name, documents={}, insert_date=True, overwrite=True, update_schema=True, quick=False, pipeline=[], **kwargs)

Insert multiple documents into a collection. When inserting a document you can specify your own ID for it by using the field name "_id". For specifying your own vector, use the suffix (ends with) "_vector_" for the field name, e.g. "product_description_vector_".

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • documents (A list of documents. A document is JSON-like data that we store our metadata and vectors with. For specifying the ID of a document use the field '_id'; for specifying a vector field use the suffix '_vector_') –

  • insert_date (Whether to include insert date as a field ‘insert_date_’.) –

  • overwrite (Whether to overwrite document if it exists.) –

  • update_schema (Whether the api should check the documents for vector datatype to update the schema.) –

  • quick (This will run the quickest insertion possible, which means there will be no schema checks or collection checks.) –

  • pipeline (This will run pipelines for the insert. example: pipeline=["encoders"]) –

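Example

A minimal sketch; the documents follow the '_id' and '_vector_' conventions above, with illustrative values:

>>> documents = [
...     {'_id': 'doc_1', 'title': 'A product', 'product_description_vector_': [0.1, 0.2, 0.3]},
...     {'_id': 'doc_2', 'title': 'Another product', 'product_description_vector_': [0.3, 0.1, 0.2]},
... ]
>>> vi_client.bulk_insert(collection_name, documents=documents)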
bulk_insert_and_encode(encoders, collection_name, documents={}, insert_date=True, overwrite=True, update_schema=True, quick=False, store_to_pipeline=True, **kwargs)

Insert and encode multiple documents into a collection. Insert multiple documents and encode specified fields into vectors with provided model URLs or model names:

[
    {"model_url": "https://a_vector_model_url.com/encode_image_url", "body": "url", "field": "thumbnail"},
    {"model_url": "https://a_vector_model_url.com/encode_text", "body": "text", "field": "short_description"},
    {"model_name": "text", "alias": "bert", "field": "short_description"}
]

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • encoders (An array structure of models to encode fields with.) –

    Encoders can be a model_url or a model_name. For model_name, the options are: image_text, image, text, text_multi, text_image. Note: image_text encodes images for text to image search whereas text_image encodes texts for text to image search (text to image search/image to text search works both ways). For model_url, you are free to deploy your own model and specify the required body as such.

    [
        {"model_url": "https://a_vector_model_url.com/encode_image_url", "body": "url", "field": "thumbnail"},
        {"model_url": "https://a_vector_model_url.com/encode_text", "body": "text", "field": "short_description"},
        {"model_name": "text", "body": "text", "field": "short_description", "alias": "bert"},
        {"model_name": "image_text", "body": "url", "field": "thumbnail"}
    ]

  • collection_name (Name of Collection) –

  • documents (A list of documents. A document is JSON-like data that we store our metadata and vectors with. For specifying the ID of a document use the field '_id'; for specifying a vector field use the suffix '_vector_') –

  • insert_date (Whether to include insert date as a field ‘insert_date_’.) –

  • overwrite (Whether to overwrite document if it exists.) –

  • update_schema (Whether the api should check the documents for vector datatype to update the schema.) –

  • quick (This will run the quickest insertion possible, which means there will be no schema checks or collection checks.) –

  • store_to_pipeline (Whether to store the encoders to pipeline) –

bulk_missing_id(collection_name, document_ids, **kwargs)

Look up in bulk whether the IDs exist in the collection; returns all the missing ones as a list.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • document_ids (IDs of documents) –

check_schema(collection_name: str, document: Dict = None)

Check the schema of a given collection.

Parameters

collection_name – Name of collection.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.check_schema(collection_name)

Vector Similarity Search on Chunks. Chunk Search allows one to search through chunks inside a document. The major difference between chunk search and normal search in Vector AI is that it relies on the _chunkvector_ field.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • chunk_field (Field where the array of chunked documents is stored.) –

  • chunk_scoring (Scoring method for ranking between document chunks.) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vectors similarity search score as 1 or separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • filters (Query for filtering the search results) –

  • facets (Fields to include in the facets, if [] then all) –

  • min_score (Minimum score for similarity metric) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • include_facets (Include facets in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • asc (Whether to sort results by ascending or descending order) –

  • vector (Vector, a list/array of floats that represents a piece of data) –

  • search_fields (Vector fields to search against) –

  • chunk_page (Page of the chunk results) –

  • chunk_page_size (Size of each page of chunk results) –

cluster_aggregate(collection_name, aggregation_query, filters=[], page_size=20, page=1, asc=False, flatten=True, **kwargs)

Aggregate every cluster in a collection. Takes an aggregation query and gets the aggregate of each cluster in a collection. This helps you interpret each cluster and what is in it.

Can only be used after a vector field has been clustered with /cluster.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • aggregation_query (Aggregation query to aggregate data) –

  • filters (Query for filtering the search results) –

  • page_size (Size of each page of results) –

  • page (Page of the results) –

  • asc (Whether to sort results by ascending or descending order) –

  • flatten

cluster_comparator(collection_name, cluster_field, cluster_value, vector_field, alias, **kwargs)

Compare clusters. Compares the clusters of a clustered vector field across the given cluster values.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (the name of the collection) –

  • cluster_field (the cluster field) –

  • cluster_value (the cluster values by which to compare on) –

  • vector_field (The vector field that has been clustered) –

  • alias (The alias of the vector field) –

clustering_job(collection_name: str, vector_field: str, n_clusters: int = 0, refresh: bool = True, return_curl=False, **kwargs)

Clusters a collection by a vector field

Clusters a collection into groups using unsupervised machine learning. Clusters can then be aggregated to understand what's in them and how vectors are separating data into different groups.

Parameters
  • vector_field – Vector field to perform clustering on

  • n_clusters – Number of clusters

  • refresh – Whether to refresh the whole collection and retrain the cluster model

  • collection_name – Name of Collection

compare_search_by_id(collection_name: str, vector_fields: List[str], document_id: str, fields_to_display: List[str] = None, image_fields: List[str] = [], audio_fields: List[str] = [], x_axis_title: str = 'Fields', y_axis_title: str = 'Vector fields', header: str = '<h1>Top-K Ranking Comparator</h1>', subheader: str = '<h2>Compare ranks in the different lists.</h2>', colors: List[str] = ['#ccff99', 'powderblue', '#ffc2b3'])

Compare Searching By ID

copy_collection(collection_name, original_collection_name, collection_schema={}, rename_fields={}, remove_fields=[], **kwargs)

Copy a collection into a new collection. You can use this to rename fields and change the data schema. This is considered a project job.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • original_collection_name (Name of collection to copy from) –

  • collection_schema (Schema to change, if unspecified then schema is unchanged. Defaults to no schema change) –

  • rename_fields (Fields to rename {'old_field': 'new_field'}. Defaults to no renames) –

  • remove_fields (Fields to remove, e.g. ['random_field', 'another_random_field']. Defaults to no removals) –

copy_collection_from_another_user(collection_name, source_collection_name, source_username, source_api_key, **kwargs)

Copy a collection from another user's projects into your project. This is considered a project job.

Parameters
  • collection_name (Collection to copy into) –

  • username (Your username) –

  • api_key (Your api key to access the username) –

  • source_collection_name (Collection to copy from) –

  • source_username (Source username whom the collection belongs to) –

  • source_api_key (Api key to access the source username) –

create_collection(collection_name: str, collection_schema: Dict = {}, **kwargs)

Create a collection

A collection can store documents to be searched, retrieved, filtered and aggregated (similar to Collections in MongoDB, Tables in SQL, Indexes in ElasticSearch).

If you are inserting your own vector, use the suffix (ends with) "_vector_" for the field name, and specify the length of the vector in collection_schema as in the example below:

{
    "collection_schema": {
        "celebrity_image_vector_": 1024,
        "celebrity_audio_vector" : 512,
        "product_description_vector" : 128
    }
}
Parameters
  • collection_name – Name of a collection

  • collection_schema – A collection schema. This is necessary if the first document is not representative of the overall collection schema. This should be specified if the items need to be edited. The schema needs to look like this: { vector_field_name: vector_length }

Example

>>> collection_schema = {'image_vector_':2048}
>>> ViClient.create_collection(collection_name, collection_schema)
create_collection_from_document(collection_name: str, document: dict, **kwargs)

Creates a collection by inferring the schema from a document.

If you are inserting your own vector, use the suffix (ends with) "_vector_" for the field name, e.g. "product_description_vector_".

Parameters
  • collection_name – Name of Collection

  • document – A document is JSON-like data that we store our metadata and vectors with. For specifying the ID of the document use the field '_id'; for specifying a vector field use the suffix '_vector_'

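Example

A minimal sketch; the document fields are illustrative:

>>> document = {'_id': 'doc_1', 'title': 'A product', 'product_description_vector_': [0.1, 0.2, 0.3]}
>>> vi_client.create_collection_from_document(collection_name, document=document)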
create_filter_query(collection_name: str, field: str, filter_type: str, filter_values: Union[List[str], str] = None)

Filter type can be one of contains / exact_match / categories / exists / insert_date / numeric_range:

  • contains: Field must contain this specific string. Not case sensitive.

  • exact_match: Field must have an exact match.

  • categories: Matches entire field.

  • exists: If field exists in document.

  • >= / > / < / <=: Larger than or equal to / larger than / smaller than / smaller than or equal to. These can only be applied to numeric/date values; check collection_schema.

Parameters
  • collection_name – The name of the collection

  • field – The field to filter on

  • filter_type – One of contains/exact_match/categories/>=/>/<=/<

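Example

A minimal sketch, reusing the filter examples from the filters method below; assumes an existing vi_client:

>>> filter_query = vi_client.create_filter_query(collection_name, field='item_brand',
...     filter_type='contains', filter_values='samsu')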
delete_by_filters(collection_name, filters=[], **kwargs)

Delete documents matching the given filters.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • filters (Query for filtering the search results) –

delete_collection(collection_name: str, **kwargs)

Delete a collection via the collection name.

Parameters

collection_name – Name of collection to delete.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.delete_collection(collection_name)
dimensionality_reduce(collection_name, vectors, vector_field, alias='default', n_components=1, **kwargs)

Reduces the dimensions of a list of input vectors down to a desired dimension.

This can only reduce to dimensions less than or equal to the n_components that the dimensionality reduction model is trained on.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • vectors (Vectors to perform dimensionality reduction on) –

  • vector_field (Vector field to perform dimensionality reduction on) –

  • alias (Alias of the dimensionality reduced vectors) –

  • n_components (The size/length to reduce the vector down to.) –

dimensionality_reduction(collection_name, vector_field, n_components, alias='default', task='dimensionality_reduction', refresh=False, store_to_pipeline=True, **kwargs)

Start a job to dimensionality reduce a vector field. Dimensionality reduction allows your vectors to be reduced down to any number of dimensions greater than 0 using unsupervised machine learning.

This is useful for even faster search and for visualising the vectors.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • vector_field (Vector field to perform dimensionality reduction on) –

  • alias (Alias is used to name a dimensionality reduced vector field) –

  • task (The name of the task for the job.) –

  • n_components (The size/length to reduce the vector down to. If 0 is set then the highest possible number of components is used; when this is done you can get reduction on demand of any length.) –

  • refresh (If True, overwrite all labelled dimensionality reduced fields, otherwise just add the fields that don't have dimensionality reduced fields.) –

  • store_to_pipeline (Whether to store the dimensionality reduction model to the dimensionality reductions pipeline) –

dimensionality_reduction_job(collection_name: str, vector_field: str, n_components: int = 0, alias: str = 'default', refresh: bool = True, return_curl: bool = False, **kwargs)

Trains a Dimensionality Reduction model on the collection

Dimensionality reduction allows your vectors to be reduced down to any dimensions greater than 0 using unsupervised machine learning. This is useful for even faster search and visualising the vectors.

Parameters
  • vector_field – Vector field to perform dimensionality reduction on

  • alias – Alias is used to name the dimensionality reduced vectors

  • n_components – The size/length to reduce the vector down to. If 0 is set then the highest possible number of components is used; when this is done you can get reduction on demand of any length.

  • refresh – Whether to refresh the whole collection and retrain the dimensionality reduction model

  • collection_name – Name of Collection

edit_document(collection_name, document_id, edits, insert_date=True, **kwargs)

Edit a document in a collection by its ID, providing key-value pairs of the fields you are adding or changing.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • document_id (ID of a document) –

  • edits (A dictionary to edit and add fields to a document.) –

  • insert_date (Whether to include insert date as a field ‘insert_date_’.) –

edit_documents(collection_name: str, edits: Dict, chunk_size: int = 15, verbose: bool = False, **kwargs)

Edit documents in a collection

Parameters
  • collection_name – Name of collection

  • edits – What edits to make in a collection. Ensure that _id is stored in the document.

  • workers – Number of parallel processes to run.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.edit_documents(collection_name, edits=documents, workers=10)
edit_search_history(collection_name, search_history_id, edits, **kwargs)

Edit search history by its ID, providing key-value pairs of the fields you are adding or changing.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • search_history_id (Search history ID of the collection.) –

  • edits (A dictionary to edit and add fields to a document.) –

encode_dictionary(collection_name, dictionary, dictionary_field, **kwargs)

Encode a dictionary into a vector. For example: a dictionary that represents a person's characteristics visiting a store, field "person_characteristics":

{"height": 180, "age": 40, "weight": 70}

-> <Encode the dictionary to vector> ->

| height | age | weight | purchases | visits |
|--------|-----|--------|-----------|--------|
| 180    | 40  | 70     | 0         | 0      |

dictionary vector: [180, 40, 70, 0, 0]

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • dictionary (A dictionary to encode into vectors) –

  • dictionary_field (The dictionary field that encoding of the dictionary is trained on) –

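Example

A minimal sketch, reusing the person-characteristics example above; assumes an existing vi_client and an already trained dictionary field:

>>> vi_client.encode_dictionary(collection_name,
...     dictionary={'height': 180, 'age': 40, 'weight': 70},
...     dictionary_field='person_characteristics')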
encode_fields(collection_name, document, vector_name, **kwargs)

Encode fields into a vector. For example: we choose the fields ["height", "age", "weight"]

document field: {"height": 180, "age": 40, "weight": 70, "purchases": 20, "visits": 12}

-> <Encode the fields to vectors> ->

| height | age | weight |
|--------|-----|--------|
| 180    | 40  | 70     |

document vector: {"person_characteristics_vector_": [180, 40, 70]}

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • document (A document to encode into vectors) –

  • vector_name (The name of the vector that the fields turn into) –

encode_fields_to_vector(collection_name, vector_name, selected_fields, **kwargs)

Encode all selected fields for a collection into vectors. Within a collection, encode the specified fields in every document into vectors.

For example: we choose the fields ["height", "age", "weight"]

document 1 field: {"height": 180, "age": 40, "weight": 70, "purchases": 20, "visits": 12}

document 2 field: {"height": 160, "age": 32, "weight": 50, "purchases": 10, "visits": 24}

-> <Encode the fields to vectors> ->

| height | age | weight |
|--------|-----|--------|
| 180    | 40  | 70     |
| 160    | 32  | 50     |

document 1 vector: {"person_characteristics_vector_": [180, 40, 70]}

document 2 vector: {"person_characteristics_vector_": [160, 32, 50]}

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • vector_name (The name of the vector that the fields turn into) –

  • selected_fields (The fields to turn into vectors) –

encode_multiple_arrays(collection_name, multiarray_query, **kwargs)

Encode multiple arrays into vectors. The multiarray query is in the format:

{
    "array_1": {"array": ["YES"], "field": "sample_array"},
    "array_2": {"array": ["NO"], "field": "sample_array_2"}
}

This will then return:

{
    "array_1": [1e-7, 1],
    "array_2": [1, 1e-7]
}

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • multiarray_query (List of array fields) –

filters(collection_name, filters=[], page=1, page_size=20, asc=False, include_vector=False, sort=[], **kwargs)

Filters a collection. Filter is used to retrieve documents that match the conditions set in a filter query. This is used in advanced search to filter the documents that are searched.

The filters query is a JSON body that follows the schema of:

[
    {"field": "<field to filter>", "filter_type": "<type of filter>", "condition": "==", "condition_value": "america"},
    {"field": "<field to filter>", "filter_type": "<type of filter>", "condition": ">=", "condition_value": 90}
]

These are the available filter_type types:

  1. "contains": for filtering documents that contain a string.

    {"field": "item_brand", "filter_type": "contains", "condition": "==", "condition_value": "samsu"}

  2. "exact_match"/"category": for filtering documents that match a string or list of strings exactly.

    {"field": "item_brand", "filter_type": "category", "condition": "==", "condition_value": "sumsung"}

  3. "categories": for filtering documents that contain any category from a list of categories.

    {"field": "item_category_tags", "filter_type": "categories", "condition": "==", "condition_value": ["tv", "smart", "bluetooth_compatible"]}

  4. "exists": for filtering documents that contain a field.

    {"field": "purchased", "filter_type": "exists", "condition": "==", "condition_value": " "}

    If you are looking to filter for documents where a field doesn't exist, run this:

    {"field": "purchased", "filter_type": "exists", "condition": "!=", "condition_value": " "}

  5. "date": for filtering by date range.

    {"field": "insert_date_", "filter_type": "date", "condition": ">=", "condition_value": "2020-01-01"}

  6. "numeric": for filtering by numeric range.

    {"field": "price", "filter_type": "numeric", "condition": ">=", "condition_value": 90}

  7. "ids": for filtering by document IDs.

    {"field": "ids", "filter_type": "ids", "condition": "==", "condition_value": ["1", "10"]}

These are the available conditions:

"==", "!=", ">=", ">", "<", "<="

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • filters (Query for filtering the search results) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • asc (Whether to sort results by ascending or descending order) –

  • include_vector (Include vectors in the search results) –

  • sort (Fields to sort by) –

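Example

A minimal sketch, reusing the filter examples above; assumes an existing vi_client:

>>> filter_query = [
...     {'field': 'item_brand', 'filter_type': 'contains', 'condition': '==', 'condition_value': 'samsu'},
...     {'field': 'price', 'filter_type': 'numeric', 'condition': '>=', 'condition_value': 90}
... ]
>>> vi_client.filters(collection_name, filters=filter_query)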
head(collection_name: str, page_size: int = 5, return_as_pandas_df: bool = True)

Returns the first page_size documents of a collection, by default as a pandas DataFrame.

Parameters
  • collection_name – The name of your collection

  • page_size – The number of results to return

  • return_as_pandas_df – If True, return as a pandas DataFrame rather than a JSON.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.head(collection_name, page_size=10)
hybrid_search_with_filters(collection_name: str, text: str, vector: List, fields: List, text_fields: List, filters: List = [], sum_fields: bool = True, metric: str = 'cosine', min_score=None, traditional_weight=0.075, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False, **kwargs)

Search a text field with vector and text using Vector Search and Traditional Search

Vector similarity search + Traditional Fuzzy Search with text and vector.

You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.

Hybrid search with filters also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.

Parameters
  • collection_name – Name of Collection

  • page – Page of the results

  • page_size – Size of each page of results

  • approx – Used for approximate search

  • sum_fields – Whether to sum the multiple vectors similarity search score as 1 or separate

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • filters – Query for filtering the search results

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • include_facets – Include facets in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • multivector_query – Query for advanced search that allows for multiple vector and field querying

  • text – Text Search Query (not encoded as vector)

  • text_fields – Text fields to search against

  • traditional_weight – Multiplier of the traditional search score. A value between 0.025 and 0.1 works well.

  • fuzzy – Fuzziness of the search. A value of 1-3 is good.

  • join – Whether to consider cases where there is a space in the word, e.g. Go Pro vs GoPro.

  • asc – Whether to sort the score in ascending order (default is False, which returns the most similar results first)
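
Example

A minimal sketch, assuming a deployed text encoder and a collection with 'description' and 'description_vector_' fields (all illustrative):

>>> from vectorai.client import ViClient
>>> from vectorai.models.deployed import ViText2Vec
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> results = vi_client.hybrid_search_with_filters(
...     collection_name,
...     text='wireless headphones',
...     vector=text_encoder.encode('wireless headphones'),
...     fields=['description_vector_'],
...     text_fields=['description'],
...     filters=[{'field': 'price', 'filter_type': 'numeric', 'condition': '<=', 'condition_value': 200}],
...     traditional_weight=0.075)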

insert(collection_name, document={}, insert_date=True, overwrite=True, update_schema=True, quick=False, pipeline=[], **kwargs)

Insert a document into a Collection. When inserting the document you can specify your own id for a document by using the field name "_id". To specify your own vector, use the suffix (ends with) "_vector_" for the field name, e.g. "product_description_vector_".

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • document (A Document is a JSON-like data that we store our metadata and vectors with. For specifying id of the document use the field '_id', for specifying vector field use the suffix of '_vector_') –

  • insert_date (Whether to include insert date as a field ‘insert_date_’.) –

  • overwrite (Whether to overwrite document if it exists.) –

  • update_schema (Whether the api should check the documents for vector datatype to update the schema.) –

  • quick (This will run the quickest insertion possible, which means there will be no schema checks or collection checks.) –

  • pipeline (This will run pipelines for the insert. example: pipeline=["encoders"]) –
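
Example

A minimal sketch; the field names and vector values are illustrative:

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> document = {'_id': 'product_1', 'name': 'TV', 'product_description_vector_': [0.1, 0.2, 0.3]}
>>> vi_client.insert(collection_name, document=document)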

insert_and_encode(encoders, collection_name, document={}, insert_date=True, overwrite=True, update_schema=True, quick=False, store_to_pipeline=True, **kwargs)

Insert and encode a document into a Collection. Insert a document and encode specified fields into vectors with the provided model URLs or model names, e.g.:

{
    "thumbnail": {"model_url": "https://a_vector_model_url.com/encode_image_url", "body": "url"},
    "short_description": {"model_url": "https://a_vector_model_url.com/encode_text", "body": "text"},
    "short_description": {"model_url": "bert", "alias": "bert"}
}

This primarily uses deployed models.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • encoders (An array structure of models to encode fields with.) –

    Encoders can be a model_url or a model_name. For model_name, the options are: image_text, image, text, text_multi, text_image. Note: image_text encodes images for text-to-image search whereas text_image encodes texts for text-to-image search (text-to-image search/image-to-text search works both ways). For model_url, you are free to deploy your own model and specify the required body as such.

    [
        {"model_url": "https://a_vector_model_url.com/encode_image_url", "body": "url", "field": "thumbnail"},
        {"model_url": "https://a_vector_model_url.com/encode_text", "body": "text", "field": "short_description"},
        {"model_name": "text", "body": "text", "field": "short_description", "alias": "bert"},
        {"model_name": "image_text", "body": "url", "field": "thumbnail"}
    ]

  • collection_name (Name of Collection) –

  • document (A Document is a JSON-like data that we store our metadata and vectors with. For specifying id of the document use the field '_id', for specifying vector field use the suffix of '_vector_') –

  • insert_date (Whether to include insert date as a field ‘insert_date_’.) –

  • overwrite (Whether to overwrite document if it exists.) –

  • update_schema (Whether the api should check the documents for vector datatype to update the schema.) –

  • quick (This will run the quickest insertion possible, which means there will be no schema checks or collection checks.) –

  • store_to_pipeline (Whether to store the encoders to pipeline) –
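
Example

A minimal sketch using the dictionary shape shown above; the field and model name are illustrative:

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> encoders = {'short_description': {'model_url': 'bert', 'alias': 'bert'}}
>>> vi_client.insert_and_encode(encoders, collection_name,
...     document={'short_description': 'AirPods deliver effortless, all-day audio on the go.'})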

insert_cluster_centroids(collection_name, cluster_centers, vector_field, alias='default', job=False, job_metric='cosine', **kwargs)

Insert cluster centroids Insert your own cluster centroids for it to be used in approximate search settings and cluster aggregations.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • cluster_centers (Cluster centers with the key being the index number) –

  • vector_field (Clustered vector field) –

  • alias (Alias is used to name a cluster) –

  • job (Whether to run a job where each document is assigned a cluster from the cluster_center) –

  • job_metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –
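
Example

A minimal sketch; the centroids and the vector field name are illustrative:

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> cluster_centers = {'0': [0.1, 0.2, 0.3], '1': [0.9, 0.8, 0.7]}
>>> vi_client.insert_cluster_centroids(collection_name, cluster_centers,
...     vector_field='product_description_vector_', alias='kmeans_2')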

insert_df(collection_name: str, df: pandas.core.frame.DataFrame, models: Dict[str, Callable] = {}, chunksize: int = 15, workers: int = 1, verbose: bool = True, use_bulk_encode: bool = False, **kwargs)

Insert dataframe into a collection

Parameters
  • collection_name – Name of collection

  • df – Pandas DataFrame

  • models – Models with an encode method

  • verbose – Whether to print document ids that have failed when inserting.

Example

>>> from vectorai.models.deployed import ViText2Vec
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> documents_df = pd.DataFrame.from_records([{'chicken': 'Big chicken'}, {'chicken': 'small_chicken'}, {'chicken': 'cow'}])
>>> vi_client.insert_df(collection_name, documents_df, models={'chicken': text_encoder.encode})
insert_document(collection_name: str, document: Dict, verbose=False)

Insert a document into a collection

Parameters
  • collection_name – Name of collection

  • document – A document (dict/JSON) to insert.

Example

>>> from vectorai import ViClient
>>> from vectorai.models.deployed import ViText2Vec
>>> vi_client = ViClient()
>>> collection_name = 'test_collection'
>>> document = {'chicken': 'Big chicken'}
>>> vi_client.insert_document(collection_name, document)
insert_documents(collection_name: str, documents: List, models: Dict[str, Callable] = {}, chunksize: int = 15, workers: int = 1, verbose: bool = False, use_bulk_encode: bool = False, overwrite: bool = False, show_progress_bar: bool = True, quick: bool = False, preprocess_hook: Callable = None, **kwargs)

Insert documents into a collection with an option to encode with models.

Parameters
  • collection_name – Name of collection

  • documents – All documents.

  • models – Models with an encode method

  • use_bulk_encode – Use the bulk_encode method in models

  • verbose – Whether to print document ids that have failed when inserting.

  • overwrite – If True, overwrites document based on _id field.

  • quick – If True, skip the collection schema checks. Not advised if this is your first time using the API until you are used to using Vector AI.

  • preprocess_hook – Document-level function that updates each document before it is inserted.

Example

>>> from vectorai.models.deployed import ViText2Vec
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> documents = [{'chicken': 'Big chicken'}, {'chicken': 'small_chicken'}, {'chicken': 'cow'}]
>>> vi_client.insert_documents(collection_name, documents, models={'chicken': text_encoder.encode})
insert_single_document(collection_name: str, document: Dict)

Insert a single document into a collection.

Parameters

document – A document (dict/JSON) to insert.

Example

>>> from vectorai import ViClient
>>> from vectorai.models.deployed import ViText2Vec
>>> vi_client = ViClient()
>>> collection_name = 'test_collection'
>>> document = {'chicken': 'Big chicken'}
>>> vi_client.insert_single_document(collection_name, document)
predict_knn_regression(collection_name, vector, search_field, target_field, impute_value, k=5, weighting=True, predict_operation='mean', **kwargs)

Predict KNN regression. Predict with KNN regression using normal search.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • vector (Vector, a list/array of floats that represents a piece of data) –

  • search_field (The field to search with.) –

  • target_field (The field to perform regression on.) –

  • k (The number of results for KNN.) –

  • weighting (weighting) –

  • impute_value (What value to fill if target field is missing.) –

  • predict_operation (How to predict using the vectors.) –
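
Example

A minimal sketch, assuming a collection with a numeric 'price' field and a 3-dimensional 'product_description_vector_' field (both illustrative):

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.predict_knn_regression(collection_name,
...     vector=[0.1, 0.2, 0.3],
...     search_field='product_description_vector_',
...     target_field='price',
...     impute_value=0, k=5)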

process_doc(collection_name, file_url, filename, **kwargs)

Process a doc or docx file. Insert a Word doc or docx file into Vector AI.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (What collection to insert the word doc into) –

  • file_url (The file url blob) –

  • filename (The name of the Doc or DocX file) –

process_pdf(collection_name, file_url, filename, **kwargs)

Process a PDF. Insert a PDF into Vector AI.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (What collection to insert the PDF into) –

  • file_url (The file url blob) –

  • filename (The name of the PDF file.) –

random_aggregation_query(collection_name: str, groupby: int = 1, metrics: int = 1)

Generates a random aggregation query.

Parameters
  • collection_name – name of collection

  • groupby – The number of groupbys to randomly generate

  • metrics – The number of metrics to randomly generate

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.random_aggregation_query(collection_name, groupby=1, metrics=1)
random_documents_with_filters(collection_name, seed=10, include_fields=[], page_size=20, include_vector=True, filters=[], **kwargs)

Retrieve some documents randomly with filters. Mainly for testing purposes.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • seed (Random Seed for retrieving random documents.) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • page_size (Size of each page of results) –

  • include_vector (Include vectors in the search results) –

  • filters (Query for filtering the search results) –

random_filter_query(collection_name: str, text_filters: int = 1, numeric_filters: int = 0)

Generates a random filter query.

Parameters
  • collection_name – name of collection

  • text_filters – The number of text filters to randomly generate

  • numeric_filters – The number of numeric filters to randomly generate

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.random_filter_query(collection_name, text_filters=1, numeric_filters=0)
random_recommendation(collection_name: str, search_field: str, seed=None, sum_fields: bool = True, metric: str = 'cosine', min_score=0, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, approx: int = 0, hundred_scale=True, asc: bool = False, **kwargs)

Recommend by random ID using vector search.

Parameters
  • document_id – ID of a document

  • collection_name – Name of Collection

  • field – Vector fields to search through

  • approx – Used for approximate search

  • sum_fields – Whether to sum the multiple vector similarity search scores as one or keep them separate

  • page_size – Size of each page of results

  • page – Page of the results

  • metric – Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • asc – Whether to sort the score in ascending order (default is False, which returns the most similar results first)
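
Example

A minimal sketch; the vector field name is illustrative:

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.random_recommendation(collection_name,
...     search_field='product_description_vector_', seed=42, page_size=5)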

remove_encoder_from_pipeline(collection_name, vector_fields, **kwargs)

Remove an encoder from the collection’s encoders pipeline

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • vector_fields (Vector fields that identifies an encoder to remove from pipeline) –

resume_insert_documents(collection_name: str, documents: List, models: Dict[str, Callable] = {}, chunksize: int = 15, workers: int = 1, verbose: bool = False, use_bulk_encode: bool = False, show_progress_bar: bool = True)

Resume inserting documents

retrieve_all_documents(collection_name: str, sort: List = [], asc: bool = True, include_vector: bool = True, include_fields: List = [], retrieve_chunk_size: int = 1000, **kwargs)

Retrieve all documents in a given collection. We recommend specifying specific fields to extract as otherwise this function may take a long time to run.

Parameters
  • collection_name – Name of collection.

  • sort – The fields to sort by.

  • asc – If True, returns results in ascending order of the sort fields.

  • include_vector – If true, includes _vector_ fields to return them.

  • include_fields – Adjust which fields are returned.

  • retrieve_chunk_size – The number of documents to retrieve per request.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> all_documents = vi_client.retrieve_all_documents(collection_name)
retrieve_and_edit(collection_name: str, edit_fn: Callable, refresh: bool = False, edited_fields: list = [], include_fields: list = [], chunksize: int = 15)

Retrieve all documents and edit them with a custom function.

Parameters
  • collection_name – Name of collection

  • edit_fn – Function for editing an entire document

  • include_fields – The fields to retrieve, to speed up the document retrieval step

  • chunksize – The number of results to retrieve and then edit in one go

  • edited_fields – The fields that the edit function changes

retrieve_and_encode(collection_name: str, models: Dict[str, Callable] = {}, chunksize: int = 15, use_bulk_encode: bool = False, filters: list = [], refresh: bool = False)

Retrieve all documents and re-encode with new models.

Parameters
  • collection_name – Name of collection

  • models – Models as a dictionary

  • chunksize – The number of results to retrieve and then encode in one go

  • use_bulk_encode – Whether to use bulk_encode on the models.

  • filters – Filters for the documents to encode; only the filter for the first model is applied

  • refresh – If True, retrieves and encodes from scratch; otherwise, only encodes fields that are not already there.
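
Example

A minimal sketch, assuming a deployed text encoder and a 'chicken' field as in the earlier examples:

>>> from vectorai.models.deployed import ViText2Vec
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> vi_client.retrieve_and_encode(collection_name, models={'chicken': text_encoder.encode})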

retrieve_documents_with_filters(collection_name, include_fields=[], cursor=None, page_size=20, sort=[], asc=False, include_vector=True, filters=[], **kwargs)

Retrieve some documents with filters. A cursor is provided so you can retrieve even more documents; loop through it to retrieve all documents in the database.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • cursor (Cursor to paginate the document retrieval) –

  • page_size (Size of each page of results) –

  • sort (Fields to sort by) –

  • asc (Whether to sort results by ascending or descending order) –

  • include_vector (Include vectors in the search results) –

  • filters (Query for filtering the search results) –
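
Example

A minimal sketch of looping through the cursor; the 'documents' and 'cursor' keys in the response are assumptions based on the description above:

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> docs, cursor = [], None
>>> while True:
...     resp = vi_client.retrieve_documents_with_filters(
...         collection_name, cursor=cursor, page_size=100,
...         filters=[{'field': 'purchased', 'filter_type': 'exists', 'condition': '==', 'condition_value': ' '}])
...     if not resp['documents']:
...         break
...     docs.extend(resp['documents'])
...     cursor = resp['cursor']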

search(collection_name: str, vector: List, field: List, filters: List = [], approx: int = 0, sum_fields: bool = True, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False, **kwargs)

Vector Similarity Search. Search a vector field with a vector, a.k.a. Nearest Neighbors Search.

Enables machine learning search with vector search. Search with a vector for the most similar vectors.

For example: search with a person's characteristics to find the most similar people (querying the "persons_characteristics_vector" field):

Query person's characteristics as a vector:
[180, 40, 70] representing [height, age, weight]

Search Results:
[
    {"name": "Adam Levine", "persons_characteristics_vector": [180, 56, 71]},
    {"name": "Brad Pitt", "persons_characteristics_vector": [180, 56, 65]},
...]
Parameters
  • vector – Vector, a list/array of floats that represents a piece of data.

  • collection_name – Name of Collection

  • field – Vector fields to search through

  • approx – Used for approximate search

  • sum_fields – Whether to sum the multiple vector similarity search scores as one or keep them separate

  • page_size – Size of each page of results

  • filters – Filters for search

  • page – Page of the results

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • asc – Whether to sort the score in ascending order (default is False, which returns the most similar results first)
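
Example

A minimal sketch, assuming a deployed text encoder and a 'product_description_vector_' field (both illustrative):

>>> from vectorai.client import ViClient
>>> from vectorai.models.deployed import ViText2Vec
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> vi_client.search(collection_name,
...     vector=text_encoder.encode('gift for the holidays'),
...     field=['product_description_vector_'],
...     page_size=5)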

search_with_audio(collection_name, audio_url, model_url, search_fields, page=1, page_size=20, approx=0, sum_fields=True, metric='cosine', filters=[], facets=[], min_score=None, include_fields=[], include_vector=False, include_count=True, include_facets=False, hundred_scale=False, include_search_relevance=False, search_relevance_cutoff_aggressiveness=1, asc=False, keep_search_history=False, **kwargs)

Advanced search of an audio field with audio using Vector Search. Vector similarity search with an audio file directly.

Note: the audio has to be stored somewhere and provided as audio_url, a URL that stores the audio.

For example: an audio_url that represents the sound a pokemon makes:

https://play.pokemonshowdown.com/audio/cries/pikachu.mp3

-> <Encode the audio to vector> ->

audio vector: [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 …]

-> <Vector Search> ->

Search Results: {…}

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vector similarity search scores as one or keep them separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • filters (Query for filtering the search results) –

  • facets (Fields to include in the facets, if [] then all) –

  • min_score (Minimum score for similarity metric) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • include_facets (Include facets in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • include_search_relevance (Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results) –

  • search_relevance_cutoff_aggressiveness (How aggressive the search_relevance cutoff score is (higher value the less results will be relevant)) –

  • asc (Whether to sort results by ascending or descending order) –

  • keep_search_history (Whether to store the history of search or not) –

  • audio_url (The audio url of an audio to encode into a vector) –

  • model_url (The model url of a deployed vectorhub model) –

  • search_fields (Vector fields to search against) –

search_with_audio_upload(collection_name, audio, model_url, search_fields, page=1, page_size=20, approx=0, sum_fields=True, metric='cosine', filters=[], facets=[], min_score=None, include_fields=[], include_vector=False, include_count=True, include_facets=False, hundred_scale=False, include_search_relevance=False, search_relevance_cutoff_aggressiveness=1, asc=False, keep_search_history=False, **kwargs)

Advanced search of audio fields with uploaded audio using Vector Search. Vector similarity search with an uploaded audio file directly.

Note: the audio has to be sent as a base64 encoded string.

For example: an audio that represents the sound a pokemon makes:

https://play.pokemonshowdown.com/audio/cries/pikachu.mp3

-> <Encode the audio to vector> ->

audio vector: [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 …]

-> <Vector Search> ->

Search Results: {…}

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vector similarity search scores as one or keep them separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • filters (Query for filtering the search results) –

  • facets (Fields to include in the facets, if [] then all) –

  • min_score (Minimum score for similarity metric) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • include_facets (Include facets in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • include_search_relevance (Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results) –

  • search_relevance_cutoff_aggressiveness (How aggressive the search_relevance cutoff score is (higher value the less results will be relevant)) –

  • asc (Whether to sort results by ascending or descending order) –

  • keep_search_history (Whether to store the history of search or not) –

  • audio (Audio represented as a base64 encoded string) –

  • model_url (The model url of a deployed vectorhub model) –

  • search_fields (Vector fields to search against) –

search_with_dictionary(collection_name, search_fields, search_history_id, dictionary, dictionary_field, page_size=20, page=1, approx=0, sum_fields=True, metric='cosine', min_score=None, include_fields=[], include_vector=False, include_count=True, hundred_scale=False, asc=False, keep_search_history=False, **kwargs)

Search a dictionary field with a dictionary using Vector Search. Vector similarity search with a dictionary directly.

For example: a dictionary that represents a person’s characteristics visiting a store, field “person_characteristics”:

{"height": 180, "age": 40, "weight": 70}

-> <Encode the dictionary to vector> ->

| height | age | weight | purchases | visits |
|--------|-----|--------|-----------|--------|
| 180    | 40  | 70     | 0         | 0      |

dictionary vector: [180, 40, 70, 0, 0]

-> <Vector Search> ->

Search Results: {…}

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • search_fields (Vector fields to search against) –

  • page_size (Size of each page of results) –

  • page (Page of the results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vector similarity search scores as one or keep them separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • min_score (Minimum score for similarity metric) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • asc (Whether to sort results by ascending or descending order) –

  • keep_search_history (Whether to store the history of search or not) –

  • search_history_id (Search history ID of the collection.) –

  • dictionary (A dictionary to encode into vectors) –

  • dictionary_field (The dictionary field that encoding of the dictionary is trained on) –

search_with_fields(collection_name, search_fields, search_history_id, document, selected_fields, vector_name, page_size=20, page=1, approx=0, sum_fields=True, metric='cosine', min_score=None, include_fields=[], include_vector=False, include_count=True, hundred_scale=False, asc=False, keep_search_history=False, **kwargs)

Search fields with a document using Vector Search. Vector similarity search with selected fields directly.

For example: we choose the fields ["height", "age", "weight"]

document field: {"height": 180, "age": 40, "weight": 70, "purchases": 20, "visits": 12}

-> <Encode the fields to vectors> ->

| height | age | weight |
|--------|-----|--------|
| 180    | 40  | 70     |

document dictionary vector: {"person_characteristics_vector_": [180, 40, 70]}

-> <Vector Search> ->

Search Results: {…}

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • search_fields (Vector fields to search against) –

  • page_size (Size of each page of results) –

  • page (Page of the results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vector similarity search scores as one or keep them separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • min_score (Minimum score for similarity metric) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • asc (Whether to sort results by ascending or descending order) –

  • keep_search_history (Whether to store the history of search or not) –

  • search_history_id (Search history ID of the collection.) –

  • document (A document to encode into vectors) –

  • selected_fields (The fields to turn into vectors) –

  • vector_name (A name to call the vector that the fields turn into) –

search_with_filters(collection_name: str, vector: List, field: List, filters: List = [], approx: int = 0, sum_fields: bool = True, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False, **kwargs)

Vector Similarity Search. Search a vector field with a vector, a.k.a. Nearest Neighbors Search.

Enables machine learning search with vector search. Search with a vector for the most similar vectors.

For example: search with a person's characteristics to find the most similar people (querying the "persons_characteristics_vector" field):

Query person's characteristics as a vector:
[180, 40, 70] representing [height, age, weight]

Search Results:
[
    {"name": "Adam Levine", "persons_characteristics_vector": [180, 56, 71]},
    {"name": "Brad Pitt", "persons_characteristics_vector": [180, 56, 65]},
...]
Parameters
  • vector – Vector, a list/array of floats that represents a piece of data.

  • collection_name – Name of Collection

  • field – Vector fields to search through

  • approx – Used for approximate search

  • sum_fields – Whether to sum the multiple vector similarity search scores as one or keep them separate

  • page_size – Size of each page of results

  • page – Page of the results

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • asc – Whether to sort the score in ascending order (default is False, which returns the most similar results first)
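
Example

A minimal sketch; same as search, with a numeric filter added (field names illustrative):

>>> from vectorai.client import ViClient
>>> from vectorai.models.deployed import ViText2Vec
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> vi_client.search_with_filters(collection_name,
...     vector=text_encoder.encode('gift for the holidays'),
...     field=['product_description_vector_'],
...     filters=[{'field': 'price', 'filter_type': 'numeric', 'condition': '<=', 'condition_value': 50}])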

search_with_image(collection_name, image_url, model_url, search_fields, page=1, page_size=20, approx=0, sum_fields=True, metric='cosine', filters=[], facets=[], min_score=None, include_fields=[], include_vector=False, include_count=True, include_facets=False, hundred_scale=False, include_search_relevance=False, search_relevance_cutoff_aggressiveness=1, asc=False, keep_search_history=False, **kwargs)

Advanced search of an image field with an image using Vector Search. Vector similarity search with an image directly.

Note: the image has to be stored somewhere and provided as image_url, a URL that stores the image.

For example: an image_url that represents an image of a celebrity:

https://www.celebrity_images.com/brad_pitt.png

-> <Encode the image to vector> ->

image vector: [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 …]

-> <Vector Search> ->

Search Results: {…}

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vector similarity search scores as one or keep them separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • filters (Query for filtering the search results) –

  • facets (Fields to include in the facets, if [] then all) –

  • min_score (Minimum score for similarity metric) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • include_facets (Include facets in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • include_search_relevance (Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results) –

  • search_relevance_cutoff_aggressiveness (How aggressive the search_relevance cutoff score is (higher value the less results will be relevant)) –

  • asc (Whether to sort results by ascending or descending order) –

  • keep_search_history (Whether to store the history of search or not) –

  • image_url (The image url of an image to encode into a vector) –

  • model_url (The model url of a deployed vectorhub model) –

  • search_fields (Vector fields to search against) –
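
Example

A minimal sketch reusing the URLs from the description above; the 'image_vector_' field is illustrative:

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.search_with_image(collection_name,
...     image_url='https://www.celebrity_images.com/brad_pitt.png',
...     model_url='https://a_vector_model_url.com/encode_image_url',
...     search_fields=['image_vector_'])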

search_with_image_upload(collection_name, image, model_url, search_fields, page=1, page_size=20, approx=0, sum_fields=True, metric='cosine', filters=[], facets=[], min_score=None, include_fields=[], include_vector=False, include_count=True, include_facets=False, hundred_scale=False, include_search_relevance=False, search_relevance_cutoff_aggressiveness=1, asc=False, keep_search_history=False, **kwargs)

Advanced search of an image field with an uploaded image using Vector Search. Vector similarity search with an uploaded image directly.

Note: the image has to be sent as a base64 encoded string.

For example: an uploaded image of a celebrity:

https://www.celebrity_images.com/brad_pitt.png

-> <Encode the image to vector> ->

image vector: [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 …]

-> <Vector Search> ->

Search Results: {…}

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vector similarity search scores as one or keep them separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • filters (Query for filtering the search results) –

  • facets (Fields to include in the facets, if [] then all) –

  • min_score (Minimum score for similarity metric) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • include_facets (Include facets in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • include_search_relevance (Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results) –

  • search_relevance_cutoff_aggressiveness (How aggressive the search_relevance cutoff score is (higher value the less results will be relevant)) –

  • asc (Whether to sort results by ascending or descending order) –

  • keep_search_history (Whether to store the history of search or not) –

  • image (Image represented as a base64 encoded string) –

  • model_url (The model url of a deployed vectorhub model) –

  • search_fields (Vector fields to search against) –

search_with_text(collection_name, text, search_fields, page=1, page_size=20, approx=0, sum_fields=True, metric='cosine', filters=[], facets=[], min_score=None, include_fields=[], include_vector=False, include_count=True, include_facets=False, hundred_scale=False, include_search_relevance=False, search_relevance_cutoff_aggressiveness=1, asc=False, keep_search_history=False, **kwargs)

Advanced search of text fields with text using Vector Search. Vector similarity search with text directly.

For example: “product_description” represents the description of a product:

“AirPods deliver effortless, all-day audio on the go. And AirPods Pro bring Active Noise Cancellation to an in-ear headphone — with a customisable fit”

-> <Encode the text to vector> ->

i.e. text vector, “product_description_vector_”: [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 …]

-> <Vector Search> ->

Search Results: {…}

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • page (Page of the results) –

  • page_size (Size of each page of results) –

  • approx (Used for approximate search) –

  • sum_fields (Whether to sum the multiple vector similarity search scores as one or keep them separate) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • filters (Query for filtering the search results) –

  • facets (Fields to include in the facets, if [] then all) –

  • min_score (Minimum score for similarity metric) –

  • include_fields (Fields to include in the search results, empty array/list means all fields.) –

  • include_vector (Include vectors in the search results) –

  • include_count (Include count in the search results) –

  • include_facets (Include facets in the search results) –

  • hundred_scale (Whether to scale up the metric by 100) –

  • include_search_relevance (Whether to calculate a search_relevance cutoff score to flag relevant and less relevant results) –

  • search_relevance_cutoff_aggressiveness (How aggressive the search_relevance cutoff score is (higher value the less results will be relevant)) –

  • asc (Whether to sort results by ascending or descending order) –

  • keep_search_history (Whether to store the history of search or not) –

  • text (Text to encode into vector and vector search with) –

  • search_fields (Vector fields to search against) –
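
Example

A minimal sketch; the vector field name is illustrative:

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.search_with_text(collection_name,
...     text='AirPods with Active Noise Cancellation',
...     search_fields=['product_description_vector_'])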

store_encoders_pipeline(encoders, collection_name, **kwargs)

Store encoder to the collection’s pipeline

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • encoders (An array structure of models to encode fields with.) –

    Encoders can be a model_url or a model_name. For model_name, the options are: image_text, image, text, text_multi, text_image. Note: image_text encodes images for text-to-image search whereas text_image encodes texts for text-to-image search (text-to-image search/image-to-text search works both ways). For model_url, you are free to deploy your own model and specify the required body as such.

    [
        {"model_url": "https://a_vector_model_url.com/encode_image_url", "body": "url", "field": "thumbnail"},
        {"model_url": "https://a_vector_model_url.com/encode_text", "body": "text", "field": "short_description"},
        {"model_name": "text", "body": "text", "field": "short_description", "alias": "bert"},
        {"model_name": "image_text", "body": "url", "field": "thumbnail"}
    ]

  • collection_name (Name of Collection) –

store_taggers_pipeline(collection_name, taggers, **kwargs)

Store multiple tagger pipelines. Store a pipeline for taggers.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • taggers (Taggers contain the metadata used to encode and then tag using a specific collection, e.g. {"field": field, "vector_field": vector_field, "tag_field": tag_field, "tag_vector_field": tag_vector_field, "number_of_tags": number_of_tags, "alias": alias, "metric": metric}) –

tag_collection_from_vectors(collection_name, vector_field, tag_collection_name, tag_field='tag', tag_vector_field='tag_vector_', metric='cosine', alias='default', number_of_tags=5, include_tag_vector=True, pad_vector_length=100, refresh=True, return_tagged_documents=True, **kwargs)

Add tagging. Tag documents by matching their vectors against a tag collection.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • vector_field (vector field from the source collection to tag on) –

  • tag_collection_name (Name of the collection you are retrieving the tags from) –

  • tag_field (The field in the tag collection to use for tagging.) –

  • tag_vector_field (vector field in the tag collection, used for matching the vectors to tag.) –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • alias (The alias of the tags. Defaults to 'default') –

  • number_of_tags (The number of tags to retrieve.) –

  • include_tag_vector (Whether to include the one hot encoded tag vector.) –

  • pad_vector_length (Whether to pad the vector length of the one hot encoded array.) –

  • refresh (If True, retags the whole collection.) –

  • return_tagged_documents (If True, returns the original documents with tags.) –

tag_job(collection_name, tag_collection_name, hub_username, hub_api_key, field='', encoder_task='text', tag_field='tag', tag_vector_field='tag_vector_', alias='default', metric='cosine', number_of_tags=5, include_tag_vector=True, refresh=False, store_to_pipeline=True, **kwargs)

Start a job for encoding a field and then tagging. Encode using an encoder and then tag.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • tag_collection_name (Name of the collection you are retrieving the tags from) –

  • field (The field in the source collection to be tagged.) –

  • encoder_task (Name of the task to run an encoding job on. This can be one of text, text-image, text-multi, image-text.) –

  • tag_field (The field in the tag collection to use for tagging.) –

  • tag_vector_field (vector field in the tag collection, used for matching the vectors to tag.) –

  • alias (The alias of the tags. Defaults to 'default') –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • number_of_tags (The number of tags to retrieve.) –

  • include_tag_vector (Whether to include the one hot encoded tag vector.) –

  • refresh (If True, Re-tags from scratch.) –

  • hub_username (The username of the hub account.) –

  • hub_api_key (The API key of the hub account.) –

  • store_to_pipeline (Whether to store the encoders to pipeline) –

tag_vector_job(collection_name, tag_collection_name, vector_field, hub_username, hub_api_key, tag_field='tag', tag_vector_field='tag_vector_', field='', alias='default', metric='cosine', number_of_tags=5, include_tag_vector=True, refresh=False, store_to_pipeline=True, **kwargs)

Start a job for tagging vectors. Search for a tag and then encode.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • tag_collection_name (Name of the collection you are retrieving the tags from) –

  • vector_field (vector field from the source collection to tag on) –

  • tag_field (The field in the tag collection to use for tagging.) –

  • tag_vector_field (vector field in the tag collection, used for matching the vectors to tag.) –

  • field (The field in the source collection to be tagged.) –

  • alias (The alias of the tags. Defaults to 'default') –

  • metric (Similarity Metric, choose from ['cosine', 'l1', 'l2', 'dp']) –

  • number_of_tags (The number of tags to retrieve.) –

  • include_tag_vector (Whether to include the one hot encoded tag vector.) –

  • refresh (If True, Re-tags from scratch.) –

  • hub_username (The username of the hub account.) –

  • hub_api_key (The API key of the hub account.) –

  • store_to_pipeline (Whether to store the encoders to pipeline) –

text_chunking(collection_name, text_field, chunk_field, insert_results_to_seperate_collection_name, refresh=True, insert_results=True, return_processed_documents=False, **kwargs)

Chunk a text field in a collection, e.g. a paragraph text field into sentence chunks.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • text_field (Text field to chunk) –

  • chunk_field (The field that the text chunks will belong in) –

  • refresh (Whether to refresh the whole collection and re-encode all to vectors) –

  • insert_results (Whether to insert the processed document chunks into the collection.) –

  • insert_results_to_seperate_collection_name (If specified, the chunks will be inserted into a separate collection. Default is None, which means no separate collection.) –

  • return_processed_documents (Whether to return the processed documents.) –

text_chunking_encoder(collection_name, text_field, chunk_field, insert_results_to_seperate_collection_name, encoder_task='text', refresh=True, store_to_pipeline=True, **kwargs)

Chunk a text field and encode the chunks. Split text into separate sentences and encode each sentence to create chunkvectors. These are stored as _chunkvector_. The chunk field created is field + _chunk_.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • text_field (Text field to chunk) –

  • chunk_field (The field that the text chunks will belong in) –

  • encoder_task (Encoder that is used to turn the text chunks into vectors) –

  • refresh (Whether to refresh the whole collection and re-encode all to vectors) –

  • insert_results_to_seperate_collection_name (If specified, the chunks will be inserted into a separate collection. Default is None, which means no separate collection.) –

  • store_to_pipeline (Whether to store the encoder to the chunking pipeline) –
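
Example

A minimal sketch, chunking an illustrative 'description' field into sentence chunks and encoding them:

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.text_chunking_encoder(collection_name,
...     text_field='description',
...     chunk_field='description_chunk_',
...     insert_results_to_seperate_collection_name=None,
...     encoder_task='text')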

update_by_filters(collection_name, updates, filters=[], **kwargs)

Updates documents by filters. The updates are made to the documents returned by a filter. The updates should be specified in the format {"field_name": "value"}, e.g. {"item.status": "Sold Out"}.

Parameters
  • username (Username) –

  • api_key (Api Key, you can request it from request_api_key) –

  • collection_name (Name of Collection) –

  • updates (Updates to make to the documents. It should be specified in a format of {"field_name": "value"}. e.g. {"item.status" : "Sold Out"}) –

  • filters (Query for filtering the search results) –
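
Example

A minimal sketch reusing the update and filter shapes from the description above:

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.update_by_filters(collection_name,
...     updates={'item.status': 'Sold Out'},
...     filters=[{'field': 'price', 'filter_type': 'numeric', 'condition': '>=', 'condition_value': 90}])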

wait_till_jobs_complete(collection_name: str, job_id: str, job_name: str)

Wait until a specific job is complete.

Parameters
  • collection_name – Name of collection.

  • job_id – ID of the job.

  • job_name – Name of the job.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> job = vi_client.dimensionality_reduction_job('nba_season_per_36_stats_demo', vector_field='season_vector_', n_components=2)
>>> vi_client.wait_till_jobs_complete('nba_season_per_36_stats_demo', **job)