Client

Documentation for the Vector AI client.

class vectorai.client.ViClient(username: str = None, api_key: str = None, url: str = 'https://api.vctr.ai', verbose: bool = True)

The main Vi client with most of the available read and write methods available to it.

Parameters
  • username – your username for accessing vectorai

  • api_key – your api key for accessing vectorai

  • url – url of the deployed vectorai database

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.list_collections()
vectorai.client.request_api_key(username: str, email: str, description: str = "I'd like to try it out.", referral_code: str = 'github_referred')

Request an API key. Make sure to save the API key somewhere safe. If you have a valid referral code, you can receive the API key more quickly.

Parameters
  • username – Username you’d like to create, lowercase only

  • email – Email you are using to sign up

  • description – Description of your intended use case

  • referral_code – The referral code you’ve been given to allow you to register for an api key before others

class vectorai.client.ViCollectionClient(collection_name: str, username: str, api_key: str, url: str = 'https://api.vctr.ai', verbose: bool = True)

The Vi client to use when you are mainly working with a single collection.

Parameters
  • username – your username for accessing vecdb

  • api_key – your api key for accessing vecdb

  • url – url of the deployed vecdb database

  • collection_name – The name of the collection

Example

>>> from vectorai.client import ViCollectionClient
>>> vi_client = ViCollectionClient(collection_name, username, api_key, vectorai_url)
>>> vi_client.insert_documents(documents)
advanced_cluster_aggregate(collection_name: str, aggregation_query: Dict, vector_field: str, alias: str = 'default', page: int = 1, page_size: int = 10, asc: bool = False, filters: list = [], flatten: bool = True)

Aggregate every cluster in a collection

Takes an aggregation query and gets the aggregate of each cluster in a collection. This helps you interpret each cluster and what is in them.

Only can be used after a vector field has been clustered with /advanced_cluster.

Parameters
  • collection_name – Name of Collection

  • aggregation_query – Aggregation query to aggregate data

  • page_size – Size of each page of results

  • page – Page of the results

  • asc – Whether to sort results by ascending or descending order

  • vector_field – Clustered vector field

  • alias – Alias of a cluster

  • flatten – Whether to flatten the aggregated results into a list of dictionaries or a dictionary of lists.
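As a sketch, an aggregation query for this method might look like the following; the field names ("colour", "price") and the commented-out call are hypothetical illustrations, not confirmed values.

```python
# Hypothetical aggregation query for advanced_cluster_aggregate: group each
# cluster's documents by a category field and average a numeric field.
aggregation_query = {
    "groupby": [
        {"name": "colour_group", "field": "colour", "agg": "category"}
    ],
    "metrics": [
        {"name": "average_price", "field": "price", "agg": "avg"}
    ],
}

# The call itself would then look roughly like:
# vi_client.advanced_cluster_aggregate(
#     collection_name, aggregation_query,
#     vector_field="image_vector_", alias="default")
```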

advanced_cluster_centroid_documents(collection_name: str, vector_field: str, alias: str = 'default', metric: str = 'cosine', include_vector: bool = True)

Returns the document closest to each cluster center of a collection

Only can be used after a vector field has been clustered with /advanced_cluster.

Parameters
  • vector_field – Clustered vector field

  • alias – Alias is used to name a cluster

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • include_vector – Include vectors in the search results

  • collection_name – Name of Collection

advanced_cluster_centroids(collection_name: str, vector_field: str, alias: str = 'default')

Returns the cluster centers of a collection by a vector field

Only can be used after a vector field has been clustered with /advanced_cluster.

Parameters
  • vector_field – Clustered vector field

  • alias – Alias is used to name a cluster

  • collection_name – Name of Collection

advanced_cluster_facets(collection_name: str, vector_field: str, alias: str = 'default', facets_fields: List = [], asc: bool = True)

Get Facets in each cluster in a collection

Takes a high level aggregation of every field and every cluster in a collection. This helps you interpret each cluster and what is in them.

Only can be used after a vector field has been clustered with /advanced_cluster.

Parameters
  • vector_field – Clustered vector field

  • alias – Alias is used to name a cluster

  • facets_fields – Fields to include in the facets, if [] then all

  • page_size – Size of facet page

  • page – Page of the results

  • asc – Whether to sort results by ascending or descending order

  • date_interval – Interval for date facets

  • collection_name – Name of Collection

advanced_clustering_job(collection_name: str, vector_field: str, alias: str = 'default', n_clusters: int = 0, n_init: int = 5, n_iter: int = 10, refresh: bool = True)

Clusters a collection by a vector field

Clusters a collection into groups using unsupervised machine learning. Clusters can then be aggregated to understand what is in them and how the vectors are separating the data into different groups. Advanced clustering allows for more parameters to tune, and an alias to name each differently trained set of clusters.

Parameters
  • vector_field – Vector field to perform clustering on

  • alias – Alias is used to name a cluster

  • n_clusters – Number of clusters

  • n_iter – Number of iterations in each run

  • n_init – Number of runs to run with different centroid seeds

  • refresh – Whether to refresh the whole collection and retrain the cluster model

  • collection_name – Name of Collection

Advanced search of a text field with vectors and text, using Vector Search and Traditional Search

Advanced Vector similarity search + Traditional Fuzzy Search with text and vector.

You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.

Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.

Parameters
  • collection_name – Name of Collection

  • page – Page of the results

  • page_size – Size of each page of results

  • approx – Used for approximate search

  • sum_fields – Whether to sum the multiple vectors similarity search score as 1 or separate

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • filters – Query for filtering the search results

  • facets – Fields to include in the facets, if [] then all

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • include_facets – Include facets in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • multivector_query – Query for advanced search that allows for multiple vector and field querying

  • text – Text Search Query (not encoded as vector)

  • text_fields – Text fields to search against

  • traditional_weight – Multiplier of traditional search. A value of 0.025~0.1 is good.

  • fuzzy – Fuzziness of the search. A value of 1-3 is good.

  • join – Whether to consider cases where there is a space in the word. E.g. Go Pro vs GoPro.

  • asc – Whether to sort the score in ascending order (default is false, for getting the most similar results)
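A minimal sketch of a multivector query with per-field weightings, following the query shape used in the advanced_search example below; the dict-of-weights form for fields is an assumption, and the vectors and field names are placeholders.

```python
# Placeholder vectors; in practice these come from your encoders.
image_vector = [0.1, 0.2, 0.3]
text_vector = [0.4, 0.5, 0.6]

# image_vector_ weighted at 100%, description_vector_ at 50%.
# NOTE: the field -> weight dict syntax is an assumption, not confirmed API.
multivector_query = {
    "image": {"vector": image_vector, "fields": {"image_vector_": 1.0}},
    "description": {"vector": text_vector, "fields": {"description_vector_": 0.5}},
}
```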

Advanced Vector Similarity Search. Support for multiple vectors, vector weightings, facets and filtering

Advanced Vector Similarity Search enables machine learning search with vector search. Search with multiple vectors for the most similar documents.

For example: Search with a product's image and description vectors to find the most similar products by what they look like and what they're described to do.

You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.

Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.

Parameters
  • collection_name – Name of Collection

  • multivector_query – Query for advanced search that allows for multiple vector and field querying

  • page – Page of the results

  • page_size – Size of each page of results

  • approx – Used for approximate search

  • sum_fields – Whether to sum the multiple vectors similarity search score as 1 or separate

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • filters – Query for filtering the search results

  • facets – Fields to include in the facets, if [] then all

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • include_facets – Include facets in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • asc – Whether to sort the score in ascending order (default is false, for getting the most similar results)

Example

>>> vi_client = ViCollectionClient(collection_name, username, api_key, url)
>>> advanced_search_query = {
        'text' : {'vector': encode_question("How do I cluster?"), 'fields' : ['function_vector_']}
    }
>>> vi_client.advanced_search(advanced_search_query)
advanced_search_by_id(collection_name: str, document_id: str, fields: Dict, sum_fields: bool = True, facets: List = [], filters: List = [], metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, include_facets: bool = False, asc: bool = False)

Advanced Single Product Recommendations (Search by an id).

For example: Search with the id of a product in the database, and use the product's image and description vectors to find the most similar products by what they look like and what they're described to do.

You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.

Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.

Parameters
  • collection_name – Name of Collection

  • page – Page of the results

  • page_size – Size of each page of results

  • approx – Used for approximate search

  • sum_fields – Whether to sum the multiple vectors similarity search score as 1 or separate

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • filters – Query for filtering the search results

  • facets – Fields to include in the facets, if [] then all

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • include_facets – Include facets in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • document_id – ID of a document

  • search_fields – Vector fields to search against, and the weightings for them.

  • asc – Whether to sort the score in ascending order (default is false, for getting the most similar results)

Example

>>> filter_query = [
        {'field': 'field_name',
        'filter_type': 'text',
        'condition_value': 'monkl',
        'condition': '=='}
    ]
>>> results = client.advanced_search_by_id(document_id=client.random_documents()['documents'][0]['_id'],
fields={'image_url_field_flattened_vector_':1}, filters=filter_query)
advanced_search_by_ids(collection_name: str, document_ids: Dict, fields: Dict, vector_operation: str = 'mean', sum_fields: bool = True, facets: List = [], filters: List = [], metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, include_facets: bool = False, asc: bool = False)

Advanced Multi Product Recommendations (Search by ids).

For example: Search with multiple ids of products in the database, and use the products' image and description vectors to find the most similar products by what they look like and what they're described to do.

You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.

You can also give weightings on each product as well, e.g. product ID-A weights 100% whilst product ID-B 50%.

Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.

Parameters
  • collection_name – Name of Collection

  • page – Page of the results

  • page_size – Size of each page of results

  • approx – Used for approximate search

  • sum_fields – Whether to sum the multiple vectors similarity search score as 1 or separate

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • filters – Query for filtering the search results

  • facets – Fields to include in the facets, if [] then all

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • include_facets – Include facets in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • document_ids – Document IDs to get recommendations for, and the weightings of each document

  • search_fields – Vector fields to search against, and the weightings for them.

  • vector_operation – Aggregation for the vectors, choose from [‘mean’, ‘sum’, ‘min’, ‘max’]

  • asc – Whether to sort the score in ascending order (default is false, for getting the most similar results)
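Since document_ids and fields are both dicts of weightings, the arguments can be sketched as below; the IDs and field names are hypothetical.

```python
# Weight product ID-A at 100% and ID-B at 50% (hypothetical IDs).
document_ids = {"product-id-a": 1.0, "product-id-b": 0.5}

# Weight image vectors more heavily than description vectors.
fields = {"image_vector_": 1.0, "description_vector_": 0.5}

# The call itself would then look roughly like:
# vi_client.advanced_search_by_ids(collection_name, document_ids, fields)
```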

advanced_search_by_positive_negative_ids(collection_name: str, positive_document_ids: Dict, negative_document_ids: Dict, fields: Dict, vector_operation: str = 'mean', sum_fields: bool = True, facets: List = [], filters: List = [], metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, include_facets: bool = False, asc: bool = False)

Advanced Multi Product Recommendations with likes and dislikes (Search by ids).

For example: Search with multiple ids of liked and disliked products in the database. Then, using the products' image and description vectors, find the most similar products by what they look like and what they're described to do against the positives, and the most dissimilar products for the negatives.

You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.

You can also give weightings on each product as well, e.g. product ID-A weights 100% whilst product ID-B 50%.

Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.

Parameters
  • collection_name – Name of Collection

  • page – Page of the results

  • page_size – Size of each page of results

  • approx – Used for approximate search

  • sum_fields – Whether to sum the multiple vectors similarity search score as 1 or separate

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • filters – Query for filtering the search results

  • facets – Fields to include in the facets, if [] then all

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • include_facets – Include facets in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • positive_document_ids – Positive Document IDs to get recommendations for, and the weightings of each document

  • negative_document_ids – Negative Document IDs to get recommendations for, and the weightings of each document

  • search_fields – Vector fields to search against, and the weightings for them.

  • vector_operation – Aggregation for the vectors, choose from [‘mean’, ‘sum’, ‘min’, ‘max’]

  • asc – Whether to sort the score in ascending order (default is false, for getting the most similar results)
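A hedged sketch of the liked/disliked weighting dicts this method expects (the IDs and field name are hypothetical):

```python
# Liked products pull results toward them, disliked products push away.
positive_document_ids = {"liked-product-id": 1.0}
negative_document_ids = {"disliked-product-id": 0.5}
fields = {"image_vector_": 1.0}

# The call itself would then look roughly like:
# vi_client.advanced_search_by_positive_negative_ids(
#     collection_name, positive_document_ids, negative_document_ids, fields)
```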

advanced_search_with_positive_negative_ids_as_history(collection_name: str, vector: List, positive_document_ids: Dict, negative_document_ids: Dict, fields: Dict, vector_operation: str = 'mean', sum_fields: bool = True, facets: List = [], filters: List = [], metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, include_facets: bool = False, asc: bool = False)

Advanced Search with Likes and Dislikes as history

For example: Vector search of a query vector with multiple ids of liked and disliked products in the database. Then, using the products' image and description vectors, find the most similar products by what they look like and what they're described to do against the positives, and the most dissimilar products for the negatives.

You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.

You can also give weightings on each product as well, e.g. product ID-A weights 100% whilst product ID-B 50%.

Advanced search also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.

Parameters
  • collection_name – Name of Collection

  • page – Page of the results

  • page_size – Size of each page of results

  • approx – Used for approximate search

  • sum_fields – Whether to sum the multiple vectors similarity search score as 1 or separate

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • filters – Query for filtering the search results

  • facets – Fields to include in the facets, if [] then all

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • include_facets – Include facets in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • positive_document_ids – Positive Document IDs to get recommendations for, and the weightings of each document

  • negative_document_ids – Negative Document IDs to get recommendations for, and the weightings of each document

  • search_fields – Vector fields to search against, and the weightings for them.

  • vector_operation – Aggregation for the vectors, choose from [‘mean’, ‘sum’, ‘min’, ‘max’]

  • vector – Vector, a list/array of floats that represents a piece of data

  • asc – Whether to sort the score in ascending order (default is false, for getting the most similar results)

aggregate(collection_name: str, aggregation_query: Dict, page: int = 1, page_size: int = 10, asc: bool = False, flatten: bool = True)

Aggregate a collection

Aggregation/Groupby of a collection using an aggregation query. The aggregation query is a json body that follows the schema of:

{
    "groupby" : [
        {"name": <nickname/alias>, "field": <field in the collection>, "agg": "category"},
        {"name": <another_nickname/alias>, "field": <another groupby field in the collection>, "agg": "category"}
    ],
    "metrics" : [
        {"name": <nickname/alias>, "field": <numeric field in the collection>, "agg": "avg"}
    ]
}
  • “groupby” is the fields you want to split the data into. These are the available groupby types:

    • “category” : groupby a field that is a category

  • “metrics” is the metrics you want to calculate for each of those groups (every aggregation also includes a frequency metric). These are the available metric types:

    • “avg”, “max”, “min”, “sum”, “cardinality”

Parameters
  • collection_name – Name of Collection

  • aggregation_query – Aggregation query to aggregate data

  • page_size – Size of each page of results

  • page – Page of the results

  • asc – Whether to sort results by ascending or descending order

  • flatten – Whether to flatten the aggregated results into a list of dictionaries or a dictionary of lists.
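To illustrate what a "groupby category + avg metric" aggregation computes, here is a pure-Python sketch over a tiny in-memory document list; the field names and values are hypothetical, and the real computation happens server-side.

```python
# Hypothetical documents with one category field and one numeric field.
documents = [
    {"colour": "red", "price": 10.0},
    {"colour": "red", "price": 30.0},
    {"colour": "blue", "price": 20.0},
]

# The aggregation query this sketch mimics:
aggregation_query = {
    "groupby": [{"name": "colour_group", "field": "colour", "agg": "category"}],
    "metrics": [{"name": "avg_price", "field": "price", "agg": "avg"}],
}

# Equivalent local computation: group by "colour", count frequency, average "price".
groups = {}
for doc in documents:
    group = groups.setdefault(doc["colour"], {"frequency": 0, "total": 0.0})
    group["frequency"] += 1
    group["total"] += doc["price"]

summary = {
    colour: {"frequency": g["frequency"], "avg_price": g["total"] / g["frequency"]}
    for colour, g in groups.items()
}
# summary == {"red": {"frequency": 2, "avg_price": 20.0},
#             "blue": {"frequency": 1, "avg_price": 20.0}}
```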

bulk_id(collection_name: str, document_ids: List[str])

Look up multiple documents by their ids

Parameters
  • document_ids – IDs of documents

  • include_vector – Include vectors in the search results

  • collection_name – Name of Collection

bulk_insert(collection_name: str, documents: List, insert_date: bool = True, overwrite: bool = True)

Insert multiple documents into a Collection. When inserting a document you can specify your own id for it by using the field name “_id”. For specifying your own vector use the suffix (ends with) “_vector_” for the field name, e.g. “product_description_vector_”.

Parameters
  • collection_name – Name of Collection

  • documents – A list of documents. Document is a JSON-like data that we store our metadata and vectors with. For specifying id of the document use the field ‘_id’, for specifying vector field use the suffix of ‘_vector_’

  • insert_date – Whether to include insert date as a field ‘insert_date_’.

  • overwrite – Whether to overwrite document if it exists.
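A sketch of a documents payload for bulk_insert: “_id” sets the document id, and fields ending in “_vector_” hold user-supplied vectors. The ids, field names, and vector values here are hypothetical.

```python
# Hypothetical documents payload: "_id" sets the id, "_vector_" suffix marks vectors.
documents = [
    {
        "_id": "doc-1",
        "product_name": "Red shirt",
        "product_description_vector_": [0.1, 0.4, 0.2],
    },
    {
        "_id": "doc-2",
        "product_name": "Blue shirt",
        "product_description_vector_": [0.3, 0.1, 0.5],
    },
]

# The call itself would then look roughly like:
# vi_client.bulk_insert(collection_name, documents, insert_date=True, overwrite=True)
```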

bulk_insert_and_encode(collection_name: str, docs: list, models: dict)

Client-side encoding of documents to improve speed of inserting. This removes the step of retrieving the vectors and can be useful to accelerate the encoding process if required. Models can be one of ‘text’, ‘audio’ or ‘image’.

bulk_missing_id(collection_name: str, document_ids: List[str])

Return IDs that are not in a collection.

check_schema(collection_name: str, document: Dict = None)

Check the schema of a given collection.

Parameters

collection_name – Name of collection.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.check_schema(collection_name)

Chunk search functionality

Parameters
  • collection_name – Name of collection

  • vector – A list of values

  • search_fields – A list of fields to search

  • chunk_scoring – How each chunk should be scored

  • approx – How many approximate neighbors to go through

cluster_aggregate(collection_name: str, aggregation_query: Dict, page: int = 1, page_size: int = 10, asc: bool = False, flatten: bool = True)

Aggregate every cluster in a collection

Takes an aggregation query and gets the aggregate of each cluster in a collection. This helps you interpret each cluster and what is in them.

Only can be used after a vector field has been clustered with /cluster.

Parameters
  • collection_name – Name of Collection

  • aggregation_query – Aggregation query to aggregate data

  • page_size – Size of each page of results

  • page – Page of the results

  • asc – Whether to sort results by ascending or descending order

  • flatten – Whether to flatten the aggregated results into a list of dictionaries or a dictionary of lists.

cluster_centroid_documents(collection_name: str, vector_field: str, metric: str = 'cosine', include_vector: bool = True)

Returns the document closest to each cluster center of a collection

Only can be used after a vector field has been clustered with /cluster.

Parameters
  • vector_field – Clustered vector field

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • include_vector – Include vectors in the search results

  • collection_name – Name of Collection

cluster_centroids(collection_name: str, vector_field: str)

Returns the cluster centers of a collection by a vector field

Only can be used after a vector field has been clustered with /cluster.

Parameters
  • vector_field – Clustered vector field

  • collection_name – Name of Collection

cluster_facets(collection_name: str, fields: List = [], asc: bool = True)

Get Facets in each cluster in a collection

Takes a high level aggregation of every field and every cluster in a collection. This helps you interpret each cluster and what is in them.

Only can be used after a vector field has been clustered with /cluster.

Parameters
  • facets_fields – Fields to include in the facets, if [] then all

  • page_size – Size of facet page

  • page – Page of the results

  • asc – Whether to sort results by ascending or descending order

  • date_interval – Interval for date facets

  • collection_name – Name of Collection

clustering_job(collection_name: str, vector_field: str, n_clusters: int = 0, refresh: bool = True)

Clusters a collection by a vector field

Clusters a collection into groups using unsupervised machine learning. Clusters can then be aggregated to understand what is in them and how the vectors are separating the data into different groups.

Parameters
  • vector_field – Vector field to perform clustering on

  • n_clusters – Number of clusters

  • refresh – Whether to refresh the whole collection and retrain the cluster model

  • collection_name – Name of Collection

collection_schema(collection_name: str)

Retrieves the schema of a collection

The schema of a collection can include types of: text, numeric, date, bool, etc.

Parameters

collection_name – Name of Collection

collection_stats(collection_name: str)

Retrieves stats about a collection

Stats include: size, searches, number of documents, etc.

Parameters

collection_name – Name of Collection

compare_vector_search_results(collection_name: str, vector_fields: List[str], label: str, id_document: str = None, id_value: str = None, num_rows=10)

Compare vector search results

Parameters
  • vector_fields – The list of vectors

  • id_value – The value of the ID of the document

  • id_document – The document with the id_value in it

  • label – The label for the vector

  • num_rows – The number of rows to compare search results for

Example

>>> vi_client.compare_vector_search_results(collection_name, vector_fields)

create_collection(collection_name: str, collection_schema: Dict = {})

Create a collection

A collection can store documents to be searched, retrieved, filtered and aggregated (similar to Collections in MongoDB, Tables in SQL, Indexes in ElasticSearch).

If you are inserting your own vector use the suffix (ends with) “_vector_” for the field name, and specify the length of the vector in collection_schema like the example below:

{
    "collection_schema": {
        "celebrity_image_vector_": 1024,
        "celebrity_audio_vector_" : 512,
        "product_description_vector_" : 128
    }
}
Parameters
  • collection_name – Name of a collection

  • collection_schema – A collection schema. This is necessary if the first document is not representative of the overall schema collection. This should be specified if the items need to be edited. The schema needs to look like this : { vector_field_name: vector_length }

Example

>>> collection_schema = {'image_vector_':2048}
>>> ViClient.create_collection(collection_name, collection_schema)
create_collection_from_document(collection_name: str, document: dict)

Creates a collection by inferring the schema from a document

If you are inserting your own vector use the suffix (ends with) “_vector_” for the field name. e.g. “product_description_vector_”

Parameters
  • collection_name – Name of Collection

  • document – A Document is a JSON-like data that we store our metadata and vectors with. For specifying id of the document use the field ‘_id’, for specifying vector field use the suffix of ‘_vector_’

create_filter_query(collection_name: str, field: str, filter_type: str, filter_values: Union[List[str], str] = None)

Create a filter query. Filter type can be one of contains / exact_match / categories / exists / insert_date / numeric_range:

  • contains: Field must contain this specific string. Not case sensitive.

  • exact_match: Field must have an exact match

  • categories: Matches entire field

  • exists: If field exists in document

  • >= / > / < / <= : Larger than or equal to / Larger than / Smaller than / Smaller than or equal to. These, however, can only be applied to numeric/date values. Check collection_schema.

Parameters
  • collection_name – The name of the collection

  • field – The field to filter on

  • filter_type – One of contains/exact_match/categories/>=/>/<=/<.
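A manually built filter following the filter format shown in the advanced_search_by_id example; the field, value, and choice of filter_type here are hypothetical illustrations.

```python
# Hypothetical filter: only keep documents whose "colour" field contains "red".
filter_query = [
    {
        "field": "colour",
        "filter_type": "contains",
        "condition_value": "red",
        "condition": "==",
    }
]

# Such a list can then be passed as the filters argument of the search methods.
```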

delete_by_id(collection_name: str, document_id: str)

Delete a document in a Collection by its id

Parameters
  • document_id – ID of a document

  • collection_name – Name of Collection

delete_collection(collection_name: str)

Delete the collection via the collection name.

Parameters

collection_name – Name of collection to delete.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.delete_collection(collection_name)
dimensionality_reduce(collection_name: str, vectors: List[List[float]], vector_field: str, n_components: int, alias: str = 'default')

Trains a Dimensionality Reduction model on the collection

Dimensionality reduction allows your vectors to be reduced down to any dimensions greater than 0 using unsupervised machine learning. This is useful for even faster search and visualising the vectors.

Parameters
  • vector_field – Vector field to perform dimensionality reduction on

  • alias – Alias is used to name the dimensionality reduced vectors

  • n_components – The size/length to reduce the vector down to. If set to 0, the highest possible number of components is used; you can then get reduction on demand at any length.

  • refresh – Whether to refresh the whole collection and retrain the dimensionality reduction model

  • collection_name – Name of Collection

dimensionality_reduction_job(collection_name: str, vector_field: str, n_components: int = 0, alias: str = 'default', refresh: bool = True)

Trains a Dimensionality Reduction model on the collection

Dimensionality reduction allows your vectors to be reduced down to any dimensions greater than 0 using unsupervised machine learning. This is useful for even faster search and visualising the vectors.

Parameters
  • vector_field – Vector field to perform dimensionality reduction on

  • alias – Alias is used to name the dimensionality reduced vectors

  • n_components – The size/length to reduce the vector down to. If set to 0, the highest possible number of components is used; you can then get reduction on demand at any length.

  • refresh – Whether to refresh the whole collection and retrain the dimensionality reduction model

  • collection_name – Name of Collection

edit_document(collection_name: str, edits: Dict[str, str], verbose=True)

Edit a document in a collection based on ID

Parameters
  • collection_name – Name of collection

  • edits – What edits to make in a collection.

  • document_id – Id of the document

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.edit_document(collection_name, edits=edits, document_id=document_id)
edit_documents(collection_name: str, edits: Dict, workers: int = 1)

Edit documents in a collection

Parameters
  • collection_name – Name of collection

  • edits – What edits to make in a collection. Ensure that _id is stored in the document.

  • workers – Number of parallel processes to run.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.edit_documents(collection_name, edits=documents, workers=10)
encode_array(collection_name: str, array: List, array_field: str)

Encode an array into a vector

For example: an array that represents a movie’s categories, field “movie_categories”:

["sci-fi", "thriller", "comedy"]

-> <Encode the arrays to vectors> ->

| sci-fi | thriller | comedy | romance | drama |
|--------|----------|--------|---------|-------|
| 1      | 1        | 1      | 0       | 0     |

array vector: [1, 1, 1, 0, 0]
Parameters
  • array_field – The array field that encoding of the dictionary is trained on

  • array – The array to encode into vectors

  • collection_name – Name of Collection
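The table above amounts to one-hot encoding over a fixed category vocabulary. A pure-Python sketch of that mapping (the vocabulary is assumed to have been learned from the trained array field, and the helper name is hypothetical):

```python
# Vocabulary assumed to come from the trained array field.
vocabulary = ["sci-fi", "thriller", "comedy", "romance", "drama"]

def one_hot_encode(array):
    """One-hot encode an array of categories against the vocabulary."""
    return [1 if category in array else 0 for category in vocabulary]

one_hot_encode(["sci-fi", "thriller", "comedy"])  # -> [1, 1, 1, 0, 0]
```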

encode_array_field(collection_name: str, array_fields: List)

Encode all arrays in a field for a collection into vectors

Within a collection encode the specified array field in every document into vectors.

For example, an array that represents a movie’s categories, field “movie_categories”:

document 1 array field: {"category" : ["sci-fi", "thriller", "comedy"]}

document 2 array field: {"category" : ["sci-fi", "romance", "drama"]}

-> <Encode the arrays to vectors> ->

| sci-fi | thriller | comedy | romance | drama |
|--------|----------|--------|---------|-------|
| 1      | 1        | 1      | 0       | 0     |
| 1      | 0        | 0      | 1       | 1     |

document 1 array vector: {"movie_categories_vector_": [1, 1, 1, 0, 0]}

document 2 array vector: {"movie_categories_vector_": [1, 0, 0, 1, 1]}
Parameters
  • array_fields – The array field to train on to encode into vectors

  • collection_name – Name of Collection

encode_audio(collection_name: str, audio)

Encode audio into a vector

_note: audio has to be stored somewhere and be provided as audio_url, a url that stores the audio_

For example: an audio_url represents sounds that a pokemon make:

"https://play.pokemonshowdown.com/audio/cries/pikachu.mp3"

-> <Encode the audio to vector> ->

audio_url vector: [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...]
Parameters
  • audio_url – The audio url of an audio to encode into a vector

  • collection_name – Name of Collection

encode_audio_job(collection_name: str, audio_field: str, refresh: bool = False)

Encode all audios in a field into vectors

Within a collection encode the specified audio field in every document into vectors.

_note: audio has to be stored somewhere and be provided as audio_url, a url that stores the audio_

For example, an audio_url field “pokemon_cries” represents sounds that a pokemon make:

document 1 audio_url field: {"pokemon_cries" : "https://play.pokemonshowdown.com/audio/cries/pikachu.mp3"}

document 2 audio_url field: {"pokemon_cries" : "https://play.pokemonshowdown.com/audio/cries/meowth.mp3"}

-> <Encode the audios to vectors> ->

document 1 audio_url vector: {"pokemon_cries_vector_": [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...]}

document 2 audio_url vector: {"pokemon_cries_vector_": [0.8364648222923279, 0.6280597448348999, 0.8112713694572449, 0.36105549335479736, 0.005313870031386614 ...]}
Parameters
  • audio_field – The audio field to encode into vectors

  • refresh – Whether to refresh the whole collection and re-encode all to vectors

  • collection_name – Name of Collection

encode_dictionary(collection_name: str, dictionary: Dict, dictionary_field: str)

Encode a dictionary into a vector

For example: a dictionary that represents a person’s characteristics visiting a store, field “person_characteristics”:

{"height":180, "age":40, "weight":70}

-> <Encode the dictionary to vector> ->

| height | age | weight | purchases | visits |
|--------|-----|--------|-----------|--------|
| 180    | 40  | 70     | 0         | 0      |

dictionary vector: [180, 40, 70, 0, 0]
Parameters
  • collection_name – Name of Collection

  • dictionary – A dictionary to encode into vectors

  • dictionary_field – The dictionary field that encoding of the dictionary is trained on
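The dictionary encoding can likewise be sketched locally: each known key becomes one vector position, and missing keys become 0. A minimal illustration assuming a fixed key schema (`schema`); the hosted encoder's actual behaviour may differ:

```python
def encode_dict(dictionary, schema):
    """Map a dictionary onto a fixed key schema; missing keys become 0."""
    return [dictionary.get(key, 0) for key in schema]

schema = ["height", "age", "weight", "purchases", "visits"]
print(encode_dict({"height": 180, "age": 40, "weight": 70}, schema))
# [180, 40, 70, 0, 0]
```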

encode_dictionary_field(collection_name: str, dictionary_fields: List)

Encode all dictionaries in a field for a collection into vectors

Within a collection encode the specified dictionary field in every document into vectors.

For example: a dictionary that represents a person’s characteristics visiting a store, field “person_characteristics”:

document 1 field: {"person_characteristics" : {"height":180, "age":40, "weight":70}}

document 2 field: {"person_characteristics" : {"age":32, "purchases":10, "visits": 24}}

-> <Encode the dictionaries to vectors> ->

| height | age | weight | purchases | visits |
|--------|-----|--------|-----------|--------|
| 180    | 40  | 70     | 0         | 0      |
| 0      | 32  | 0      | 10        | 24     |

document 1 dictionary vector: {"person_characteristics_vector_": [180, 40, 70, 0, 0]}

document 2 dictionary vector: {"person_characteristics_vector_": [0, 32, 0, 10, 24]}
Parameters
  • dictionary_fields – The dictionary field to train on to encode into vectors

  • collection_name – Name of Collection

encode_image(collection_name: str, image)

Encode image into a vector

_note: image has to be stored somewhere and be provided as image_url, a url that stores the image_

For example: an image_url represents an image of a celebrity:

"https://www.celebrity_images.com/brad_pitt.png"

-> <Encode the image to vector> ->

image vector: [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...]
Parameters
  • image – The image url of an image to encode into a vector

  • collection_name – Name of Collection

encode_image_job(collection_name: str, image_field: str, refresh: bool = False)

Encode all images in a field into vectors

Within a collection encode the specified image field in every document into vectors.

_note: image has to be stored somewhere and be provided as image_url, a url that stores the image_

For example, an image_url field “celebrity_image” represents an image of a celebrity:

document 1 image_url field: {"celebrity_image" : "https://www.celebrity_images.com/brad_pitt.png"}

document 2 image_url field: {"celebrity_image" : "https://www.celebrity_images.com/brad_pitt.png"}

-> <Encode the images to vectors> ->

document 1 image_url vector: {"celebrity_image_vector_": [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...]}

document 2 image_url vector: {"celebrity_image_vector_": [0.8364648222923279, 0.6280597448348999, 0.8112713694572449, 0.36105549335479736, 0.005313870031386614 ...]}
Parameters
  • image_field – The image field to encode into vectors

  • refresh – Whether to refresh the whole collection and re-encode all to vectors

  • collection_name – Name of Collection

encode_text(collection_name: str, text)

Encode text into a vector

For example: a text field “product_description” represents the description of a product:

"AirPods deliver effortless, all-day audio on the go. And AirPods Pro bring Active Noise Cancellation to an in-ear headphone — with a customisable fit"

-> <Encode the text to vector> ->

text vector: [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...]
Parameters
  • text – Text to encode into vector

  • collection_name – Name of Collection

encode_text_job(collection_name: str, text_field: str, refresh: bool = False)

Encode all texts in a field into vectors

Within a collection encode the specified text field in every document into vectors.

For example, a text field “product_description” represents the description of a product:

document 1 text field: {"product_description" : "AirPods deliver effortless, all-day audio on the go. And AirPods Pro bring Active Noise Cancellation to an in-ear headphone — with a customisable fit."}

document 2 text field: {"product_description" : "MacBook Pro elevates the notebook to a whole new level of performance and portability. Wherever your ideas take you, you’ll get there faster than ever with high‑performance processors and memory, advanced graphics, blazing‑fast storage and more — all in a compact package."}

-> <Encode the texts to vectors> ->

document 1 text vector: {"product_description_vector_": [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...]}

document 2 text vector: {"product_description_vector_": [0.8364648222923279, 0.6280597448348999, 0.8112713694572449, 0.36105549335479736, 0.005313870031386614 ...]}
Parameters
  • text_field – The text field to encode into vectors

  • refresh – Whether to refresh the whole collection and re-encode all to vectors

  • collection_name – Name of Collection

facets(collection_name: str, fields: List[str] = [], page: int = 1, page_size: int = 20, asc: bool = False)

Retrieve the facets of a collection

Takes a high-level aggregation of every field in a collection. This is used in advanced search to help create the filter bar for search.

Parameters
  • facets_fields – Fields to include in the facets, if [] then all

  • date_interval – Interval for date facets

  • page_size – Size of facet page

  • page – Page of the results

  • asc – Whether to sort results by ascending or descending order

  • collection_name – Name of Collection

filters(collection_name: str, filters: List, page=1, page_size=10, include_vector: bool = False)

Filters a collection

Filter is used to retrieve documents that match the conditions set in a filter query. This is used in advanced search to filter the documents that are searched.

The filters query is a json body that follows the schema of:

[
    {'field' : <field to filter>, 'filter_type' : <type of filter>, "condition":"==", "condition_value":"america"},
    {'field' : <field to filter>, 'filter_type' : <type of filter>, "condition":">=", "condition_value":90},
]

These are the available filter_type types:

1. "contains": for filtering documents that contain a string.
        {'field' : 'category', 'filter_type' : 'contains', "condition":"==", "condition_value": "bluetoo"}
2. "exact_match"/"category": for filtering documents that match a string or list of strings exactly.
        {'field' : 'category', 'filter_type' : 'exact_match', "condition":"==", "condition_value": "tv"}
3. "categories": for filtering documents that contain any category from a list of categories.
        {'field' : 'category', 'filter_type' : 'categories', "condition":"==", "condition_value": ["tv", "smart", "bluetooth_compatible"]}
4. "exists": for filtering documents that contain a field.
        {'field' : 'purchased', 'filter_type' : 'exists', "condition":">=", "condition_value":" "}
5. "date": for filtering by date range.
        {'field' : 'insert_date_', 'filter_type' : 'date', "condition":">=", "condition_value":"2020-01-01"}
6. "numeric": for filtering by numeric range.
        {'field' : 'price', 'filter_type' : 'numeric', "condition":">=", "condition_value":90}

These are the available conditions:

“==”, “!=”, “>=”, “>”, “<”, “<=”

Parameters
  • collection_name – Name of Collection

  • filters – Query for filtering the search results

  • page – Page of the results

  • page_size – Size of each page of results

  • asc – Whether to sort results by ascending or descending order

  • include_vector – Include vectors in the search results
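To make the filter semantics concrete, here is a small local re-implementation of the matching rules above. It is a sketch for intuition only, not the server's actual query engine; `apply_filters` and the sample documents are illustrative:

```python
import operator

# Condition strings from the docs mapped to Python comparison operators.
OPS = {"==": operator.eq, "!=": operator.ne, ">=": operator.ge,
       ">": operator.gt, "<": operator.lt, "<=": operator.le}

def apply_filters(documents, filters):
    """Keep only the documents that satisfy every filter clause (simplified)."""
    def matches(doc, f):
        if f["filter_type"] == "exists":
            return f["field"] in doc
        if f["field"] not in doc:
            return False
        value = doc[f["field"]]
        if f["filter_type"] == "contains":
            return f["condition_value"] in str(value)
        return OPS[f["condition"]](value, f["condition_value"])
    return [d for d in documents if all(matches(d, f) for f in filters)]

docs = [{"category": "bluetooth tv", "price": 120},
        {"category": "radio", "price": 40}]
query = [{"field": "category", "filter_type": "contains",
          "condition": "==", "condition_value": "bluetoo"},
         {"field": "price", "filter_type": "numeric",
          "condition": ">=", "condition_value": 90}]
print(apply_filters(docs, query))  # only the bluetooth tv matches
```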

head(collection_name: str, page_size: int = 5, return_as_pandas_df: bool = True)

Return the first few documents in a collection for quick inspection.

Parameters
  • collection_name – The name of your collection

  • page_size – The number of results to return

  • return_as_pandas_df – If True, return as a pandas DataFrame rather than a JSON.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.head(collection_name, page_size=10)

Search a text field with vector and text using Vector Search and Traditional Search

Vector similarity search + Traditional Fuzzy Search with text and vector.

Parameters
  • text – Text Search Query (not encoded as vector)

  • vector – Vector, a list/array of floats that represents a piece of data.

  • text_fields – Text fields to search against

  • traditional_weight – Multiplier of traditional search. A value of 0.025~0.1 is good.

  • fuzzy – Fuzziness of the search. A value of 1-3 is good.

  • join – Whether to consider cases where there is a space in the word. E.g. Go Pro vs GoPro.

  • collection_name – Name of Collection

  • search_fields – Vector fields to search through

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity search scores of multiple vectors into one score or keep them separate

  • page_size – Size of each page of results

  • page – Page of the results

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • asc – Whether to sort the score by ascending order (default is false, for getting the most similar results)
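The service's exact blending formula is not documented here, but the role of traditional_weight can be sketched: the traditional text-match score is scaled down and added to the vector-similarity score, acting as a tie-breaker. `hybrid_score` below is a hypothetical illustration, not the actual scoring function:

```python
def hybrid_score(vector_score, text_score, traditional_weight=0.05):
    """Blend a vector-similarity score with a traditional text-match score.

    With a small traditional_weight (the docs suggest ~0.025-0.1), the
    vector match dominates and the text match nudges the ranking.
    """
    return vector_score + traditional_weight * text_score

print(hybrid_score(0.90, 2.0))  # 1.0
```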

id(collection_name: str, document_id: str, include_vector: bool = True)

Look up a document by its id

Parameters
  • document_id – ID of a document

  • include_vector – Include vectors in the search results

  • collection_name – Name of Collection

insert(collection_name: str, document: Dict, insert_date: bool = True, overwrite: bool = True)

Insert a document into a Collection. When inserting a document you can specify your own id for it by using the field name “_id”. To specify your own vector, use the suffix (ends with) “_vector_” for the field name, e.g. “product_description_vector_”.

Parameters
  • collection_name – Name of Collection

  • document – A Document is a JSON-like data that we store our metadata and vectors with. For specifying id of the document use the field ‘_id’, for specifying vector field use the suffix of ‘_vector_’

  • insert_date – Whether to include insert date as a field ‘insert_date_’.

  • overwrite – Whether to overwrite document if it exists.
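A quick way to check that a document follows the naming conventions above before inserting it. `vector_fields` is an illustrative helper, not part of the client:

```python
def vector_fields(document):
    """Return the fields that follow the '_vector_' suffix convention."""
    return [key for key in document if key.endswith("_vector_")]

doc = {
    "_id": "product_1",                                  # user-specified id
    "product_description": "AirPods deliver effortless, all-day audio.",
    "product_description_vector_": [0.79, 0.35, 0.21],   # user-supplied vector
}
print(vector_fields(doc))  # ['product_description_vector_']
```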

insert_df(collection_name: str, df: pandas.core.frame.DataFrame, models: Dict[str, Callable] = {}, chunksize: int = 15, workers: int = 1, verbose: bool = True, use_bulk_encode: bool = False)

Insert dataframe into a collection

Parameters
  • collection_name – Name of collection

  • df – Pandas DataFrame

  • models – Models with an encode method

  • verbose – Whether to print document ids that have failed when inserting.

Example

>>> from vectorai.models.deployed import ViText2Vec
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> documents_df = pd.DataFrame.from_records([{'chicken': 'Big chicken'}, {'chicken': 'small_chicken'}, {'chicken': 'cow'}])
>>> vi_client.insert_df(collection_name, documents_df, models={'chicken': text_encoder.encode})
insert_document(collection_name: str, document: Dict, verbose=False)

Insert a document into a collection

Parameters
  • collection_name – Name of collection

  • documents – List of documents/jsons/dictionaries.

Example

>>> from vectorai.models.deployed import ViText2Vec
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> document = {'chicken': 'Big chicken'}
>>> vi_client.insert_document(collection_name, document)
insert_documents(collection_name: str, documents: List, models: Dict[str, Callable] = {}, chunksize: int = 15, workers: int = 1, verbose: bool = False, use_bulk_encode: bool = False, overwrite: bool = False, show_progress_bar: bool = True)

Insert documents into a collection with an option to encode with models.

Parameters
  • collection_name – Name of collection

  • documents – All documents.

  • models – Models with an encode method

  • use_bulk_encode – Use the bulk_encode method in models

  • verbose – Whether to print document ids that have failed when inserting.

  • overwrite – If True, overwrites document based on _id field.

Example

>>> from vectorai.models.deployed import ViText2Vec
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> documents = [{'chicken': 'Big chicken'}, {'chicken': 'small_chicken'}, {'chicken': 'cow'}]
>>> vi_client.insert_documents(collection_name, documents, models={'chicken': text_encoder.encode})
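The chunksize parameter splits the payload into batches before they are sent (and workers processes batches in parallel). A local sketch of the batching only, not the client's actual code:

```python
def chunk(documents, chunksize=15):
    """Split documents into batches, mirroring the chunksize parameter."""
    for i in range(0, len(documents), chunksize):
        yield documents[i:i + chunksize]

docs = [{"chicken": f"chicken_{i}"} for i in range(40)]
print([len(batch) for batch in chunk(docs, chunksize=15)])  # [15, 15, 10]
```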
insert_single_document(collection_name: str, document: Dict)

Insert a single document into a collection.

Parameters

documents – List of documents/jsons/dictionaries.

Example

>>> from vectorai.models.deployed import ViText2Vec
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> document = {'chicken': 'Big chicken'}
>>> vi_client.insert_single_document(collection_name, document)
job_status(collection_name: str, job_id: str, job_name: str)

Get the status of a job: whether it is starting, running, failed or finished.

Parameters
  • job_id – ID of the job

  • job_name – Name of the job

  • collection_name – Name of Collection

list_jobs(collection_name: str)

Get history of jobs

List and get a history of all the jobs and its job_id, parameters, start time, etc.

Parameters

collection_name – Name of Collection

publish_aggregation(collection_name: str, aggregation_query: dict, aggregation_name: str, aggregated_collection_name: str, description: str = 'published aggregation', date_field: str = 'insert_date_', refresh_time: int = 30, start_immediately: bool = True)

Publishes your aggregation query to a new collection. This publishes and schedules your aggregation query and saves the results to a new collection. This new collection is just like any other collection and you can read, filter and aggregate it.

Parameters
  • source_collection – The collection where the data to aggregate comes from

  • dest_collection – The name of collection of where the data will be aggregated to

  • aggregation_name – The name for the published scheduled aggregation

  • description – The description for the published scheduled aggregation

  • aggregation_query – The aggregation query to schedule

  • date_field – The date field to check whether there is new data coming in

  • refresh_time – How often should the aggregation check for new data

  • start_immediately – Whether to start the published aggregation immediately

random_aggregation_query(collection_name: str, groupby: int = 1, metrics: int = 1)

Generates a random aggregation query.

Parameters
  • collection_name – name of collection

  • groupby – The number of groupbys to randomly generate

  • metrics – The number of metrics to randomly generate

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.random_aggregation_query(collection_name, groupby=1, metrics=1)
random_documents(collection_name: str, page_size: int = 20, seed: int = None, include_vector: bool = True, include_fields: list = [])

Retrieve some documents randomly

Mainly for testing purposes.

Parameters
  • seed – Random Seed for retrieving random documents.

  • page_size – Size of each page of results

  • include_vector – Include vectors in the search results

  • collection_name – Name of Collection

random_filter_query(collection_name: str, text_filters: int = 1, numeric_filters: int = 0)

Generates a random filter query.

Parameters
  • collection_name – name of collection

  • text_filters – The number of text filters to randomly generate

  • numeric_filters – The number of numeric filters to randomly generate

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.random_filter_query(collection_name, text_filters=1, numeric_filters=0)
retrieve_all_documents(collection_name: str, sort_by: List = [], asc: bool = True, include_vector: bool = True, include_fields: List = [], retrieve_chunk_size: int = 1000)

Retrieve all documents in a given collection. We recommend specifying particular fields to extract, as otherwise this function may take a long time to run.

Parameters
  • collection_name – Name of collection.

  • sort_by – Select the fields by which to sort by.

  • asc – If true, returns results in ascending order of the sorted fields.

  • include_vector – If true, includes _vector_ fields to return them.

  • include_fields – Adjust which fields are returned.

  • retrieve_chunk_size – The number of documents to retrieve per request.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> all_documents = vi_client.retrieve_all_documents(collection_name)
retrieve_documents(collection_name: str, page_size: int = 20, cursor: str = None, sort: List = [], asc: bool = True, include_vector: bool = True, include_fields: List = [])

Retrieve some documents

Cursor is provided to retrieve even more documents. Loop through it to retrieve all documents in the database.

Parameters
  • include_fields – Fields to include in the document, if empty list [] then all is returned

  • cursor – Cursor to paginate the document retrieval

  • page_size – Size of each page of results

  • sort – Fields to sort the documents by

  • asc – Whether to sort results by ascending or descending order

  • include_vector – Include vectors in the search results

  • collection_name – Name of Collection
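The cursor loop described above can be sketched as follows. `StubClient` is a hypothetical stand-in for `ViClient`, and the `{"documents": ..., "cursor": ...}` response shape is an assumption made for illustration:

```python
class StubClient:
    """Minimal stand-in for ViClient's paginated endpoint (demo only)."""
    def __init__(self, docs):
        self.docs = docs

    def retrieve_documents(self, collection_name, page_size=20, cursor=None):
        start = int(cursor or 0)
        return {"documents": self.docs[start:start + page_size],
                "cursor": str(start + page_size)}

def retrieve_all(client, collection_name, page_size=20):
    """Loop a cursor-paginated endpoint until no documents come back."""
    documents, cursor = [], None
    while True:
        response = client.retrieve_documents(
            collection_name, page_size=page_size, cursor=cursor)
        if not response["documents"]:
            break
        documents.extend(response["documents"])
        cursor = response["cursor"]
    return documents

print(len(retrieve_all(StubClient([{"i": i} for i in range(45)]), "demo")))  # 45
```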

search(collection_name: str, vector: List, field: List, approx: int = 0, sum_fields: bool = True, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)

Vector Similarity Search. Search a vector field with a vector, a.k.a Nearest Neighbors Search

Enables machine learning search with vector search. Search with a vector for the most similar vectors.

For example: Search with a person’s characteristics, who are the most similar (querying the “persons_characteristics_vector” field):

Query person's characteristics as a vector:
[180, 40, 70] representing [height, age, weight]

Search Results:
[
    {"name": "Adam Levine", "persons_characteristics_vector" : [180, 56, 71]},
    {"name": "Brad Pitt", "persons_characteristics_vector" : [180, 56, 65]},
...]
Parameters
  • vector – Vector, a list/array of floats that represents a piece of data.

  • collection_name – Name of Collection

  • search_fields – Vector fields to search through

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity search scores of multiple vectors into one score or keep them separate

  • page_size – Size of each page of results

  • page – Page of the results

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • asc – Whether to sort the score by ascending order (default is false, for getting the most similar results)
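The nearest-neighbour ranking can be sketched locally with cosine similarity. This is an illustrative re-implementation for intuition, not the hosted search engine:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def vector_search(query, documents, field, page_size=10):
    """Rank documents by cosine similarity of `field` to the query vector."""
    ranked = sorted(documents, key=lambda d: cosine(query, d[field]), reverse=True)
    return ranked[:page_size]

people = [{"name": "Adam Levine", "persons_characteristics_vector": [180, 56, 71]},
          {"name": "Brad Pitt", "persons_characteristics_vector": [180, 56, 65]}]
results = vector_search([180, 40, 70], people, "persons_characteristics_vector")
print(results[0]["name"])
```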

search_audio(collection_name: str, audio, fields: List, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)

Search an audio field with audio using Vector Search

Vector similarity search with an audio directly.

_note: audio has to be stored somewhere and be provided as audio_url, a url that stores the audio_

For example: an audio_url represents sounds that a pokemon make:

"https://play.pokemonshowdown.com/audio/cries/pikachu.mp3"

-> <Encode the audio to vector> ->

audio vector: [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...]

-> <Vector Search> ->

Search Results: {...}
Parameters
  • audio_url – The audio url of an audio to encode into a vector

  • collection_name – Name of Collection

  • search_fields – Vector fields to search through

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity search scores of multiple vectors into one score or keep them separate

  • page_size – Size of each page of results

  • page – Page of the results

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • asc – Whether to sort the score by ascending order (default is false, for getting the most similar results)

search_audio_by_upload(collection_name: str, audio, fields: List, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)

Search an audio field with an uploaded audio using Vector Search

Vector similarity search with an uploaded audio directly.

_note: audio has to be sent as a base64 encoded string_

Parameters
  • collection_name – Name of Collection

  • search_fields – Vector fields to search against

  • page_size – Size of each page of results

  • page – Page of the results

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity search scores of multiple vectors into one score or keep them separate

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • audio – Audio in local file path

  • asc – Whether to sort the score by ascending order (default is false, for getting the most similar results)

search_by_id(collection_name: str, document_id: str, field: str, sum_fields: bool = True, metric: str = 'cosine', min_score=0, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)

Single Product Recommendations (Search by an id)

Recommendation by retrieving the vector from the specified id’s document, then performing a search with that vector.

Parameters
  • document_id – ID of a document

  • collection_name – Name of Collection

  • search_field – Vector fields to search through

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity search scores of multiple vectors into one score or keep them separate

  • page_size – Size of each page of results

  • page – Page of the results

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • asc – Whether to sort the score by ascending order (default is false, for getting the most similar results)

search_by_ids(collection_name: str, document_ids: List, field: str, vector_operation: str = 'mean', sum_fields: bool = True, metric: str = 'cosine', min_score=0, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)

Multi Product Recommendations (Search by ids)

Recommendation by retrieving the vectors from the documents with the specified ids, then performing a search with an aggregated vector that is the combination (depending on vector_operation) of those vectors.

Parameters
  • document_ids – IDs of documents

  • vector_operation – Aggregation for the vectors, choose from [‘mean’, ‘sum’, ‘min’, ‘max’]

  • collection_name – Name of Collection

  • search_field – Vector fields to search through

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity search scores of multiple vectors into one score or keep them separate

  • page_size – Size of each page of results

  • page – Page of the results

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • asc – Whether to sort the score by ascending order (default is false, for getting the most similar results)
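The vector_operation aggregation can be sketched directly; a minimal illustration, not the server's implementation:

```python
# Each operation combines vectors component-wise into one query vector.
OPERATIONS = {
    "mean": lambda vs: [sum(col) / len(col) for col in zip(*vs)],
    "sum":  lambda vs: [sum(col) for col in zip(*vs)],
    "min":  lambda vs: [min(col) for col in zip(*vs)],
    "max":  lambda vs: [max(col) for col in zip(*vs)],
}

def aggregate_vectors(vectors, vector_operation="mean"):
    """Combine several document vectors into one search vector."""
    return OPERATIONS[vector_operation](vectors)

print(aggregate_vectors([[1.0, 2.0], [3.0, 6.0]], "mean"))  # [2.0, 4.0]
```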

search_by_positive_negative_ids(collection_name: str, positive_document_ids: List, negative_document_ids: List, field: str, vector_operation: str = 'mean', sum_fields: bool = True, metric: str = 'cosine', min_score=0, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)

Multi Product Recommendations with Likes and Dislikes (Search by ids)

Recommendation by retrieving the vectors from the documents with the specified positive and negative ids, then performing a search with an aggregated vector that is the combination (depending on vector_operation) of the positive id vectors minus the negative id vectors.

Parameters
  • positive_document_ids – Positive Document IDs to get recommendations for, and the weightings of each document

  • negative_document_ids – Negative Document IDs to get recommendations for, and the weightings of each document

  • vector_operation – Aggregation for the vectors, choose from [‘mean’, ‘sum’, ‘min’, ‘max’]

  • collection_name – Name of Collection

  • search_field – Vector fields to search through

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity search scores of multiple vectors into one score or keep them separate

  • page_size – Size of each page of results

  • page – Page of the results

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • asc – Whether to sort the score by ascending order (default is false, for getting the most similar results)
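The likes-minus-dislikes aggregation can be sketched as follows (using mean aggregation); an illustration only, not the service's code:

```python
def positive_negative_vector(positive_vectors, negative_vectors):
    """Aggregate liked vectors minus disliked vectors (mean aggregation)."""
    def mean(vectors):
        return [sum(col) / len(col) for col in zip(*vectors)]
    pos, neg = mean(positive_vectors), mean(negative_vectors)
    # The query vector points toward the likes and away from the dislikes.
    return [p - n for p, n in zip(pos, neg)]

print(positive_negative_vector([[4.0, 2.0]], [[1.0, 1.0]]))  # [3.0, 1.0]
```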

search_image(collection_name: str, image, fields: List, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)

Search an image field with image using Vector Search

Vector similarity search with an image directly.

_note: image has to be stored somewhere and be provided as image_url, a url that stores the image_

For example: an image_url represents an image of a celebrity:

"https://www.celebrity_images.com/brad_pitt.png"

-> <Encode the image to vector> ->

image vector: [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...]

-> <Vector Search> ->

Search Results: {...}
Parameters
  • image_url – The image url of an image to encode into a vector

  • collection_name – Name of Collection

  • search_fields – Vector fields to search through

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity search scores of multiple vectors into one score or keep them separate

  • page_size – Size of each page of results

  • page – Page of the results

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • asc – Whether to sort the score by ascending order (default is false, for getting the most similar results)

search_image_by_upload(collection_name: str, image, fields: List, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector=False, include_count=True, asc=False)

Search an image field with uploaded image using Vector Search

Vector similarity search with an uploaded image directly.

_note: image has to be sent as a base64 encoded string_

Parameters
  • collection_name – Name of Collection

  • search_fields – Vector fields to search against

  • page_size – Size of each page of results

  • page – Page of the results

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity search scores of multiple vectors into one score or keep them separate

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • image – Image in local file path

  • asc – Whether to sort the score by ascending order (default is false, for getting the most similar results)

search_text(collection_name: str, text, fields: List, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)

Search a text field using Vector Search with text directly.

For example: “product_description” represents the description of a product:

"AirPods deliver effortless, all-day audio on the go. And AirPods Pro bring Active Noise Cancellation to an in-ear headphone — with a customisable fit"

-> <Encode the text to vector> ->

i.e. text vector, "product_description_vector_": [0.794617772102356, 0.3581121861934662, 0.21113917231559753, 0.24878688156604767, 0.9741804003715515 ...]

-> <Vector Search> ->

Search Results: {...}
Parameters
  • text – Text to encode into vector and vector search with

  • collection_name – Name of Collection

  • search_fields – Vector fields to search through

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity scores of multiple vectors into one score or keep them separate

  • page_size – Size of each page of results

  • page – Page of the results

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
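The `metric` options above can be illustrated with plain-Python implementations. This is only a sketch of what each option name conventionally means (`dp` read as dot product); the server-side implementation may differ in scaling and sign conventions:

```python
import math

def similarity(a, b, metric="cosine"):
    """Illustrative versions of the metrics named by `metric`."""
    if metric == "dp":        # dot product
        return sum(x * y for x, y in zip(a, b))
    if metric == "cosine":    # dot product normalised by vector lengths
        dp = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dp / (na * nb)
    if metric == "l1":        # Manhattan distance
        return sum(abs(x - y) for x, y in zip(a, b))
    if metric == "l2":        # Euclidean distance
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    raise ValueError(f"unknown metric: {metric}")
```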

search_with_array(collection_name: str, array: List, array_field: str, fields: List, sum_fields: bool = True, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)

Search an array field using Vector Search with an array directly.

For example: an array that represents a movie’s categories, field “movie_categories”:

["sci-fi", "thriller", "comedy"]

-> <Encode the arrays to vectors> ->

| sci-fi | thriller | comedy | romance | drama |
|--------|----------|--------|---------|-------|
| 1      | 1        | 1      | 0       | 0     |

array vector: [1, 1, 1, 0, 0]

-> <Vector Search> ->

Search Results: {...}
Parameters
  • array_field – The array field that the array encoding is trained on

  • array – The array to encode into vectors

  • collection_name – Name of Collection

  • search_fields – Vector fields to search through

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity scores of multiple vectors into one score or keep them separate

  • page_size – Size of each page of results

  • page – Page of the results

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
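The one-hot encoding shown in the table above can be sketched as follows (the `encode_array` helper and the fixed vocabulary are hypothetical; in practice the encoding is trained server-side on the `array_field`):

```python
def encode_array(array, vocabulary):
    """One-hot encode a list of categories against a fixed vocabulary."""
    return [1 if term in array else 0 for term in vocabulary]

vocabulary = ["sci-fi", "thriller", "comedy", "romance", "drama"]
encode_array(["sci-fi", "thriller", "comedy"], vocabulary)
# [1, 1, 1, 0, 0]
```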

search_with_dictionary(collection_name: str, dictionary: Dict, dictionary_field: str, fields: List, sum_fields: bool = True, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)

Search a dictionary field using Vector Search with a dictionary directly.

For example: a dictionary that represents the characteristics of a person visiting a store, field “person_characteristics”:

{"height":180, "age":40, "weight":70}

-> <Encode the dictionary to vector> ->

| height | age | weight | purchases | visits |
|--------|-----|--------|-----------|--------|
| 180    | 40  | 70     | 0         | 0      |

dictionary vector: [180, 40, 70, 0, 0]

-> <Vector Search> ->

Search Results: {...}
Parameters
  • collection_name – Name of Collection

  • search_fields – Vector fields to search against

  • page_size – Size of each page of results

  • page – Page of the results

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity scores of multiple vectors into one score or keep them separate

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • dictionary – A dictionary to encode into vectors

  • dictionary_field – The dictionary field that the dictionary encoding is trained on

  • asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
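The dictionary-to-vector mapping shown in the table above can be sketched like this (the `encode_dictionary` helper and the fixed key schema are hypothetical; the real encoding is trained on the `dictionary_field`):

```python
def encode_dictionary(dictionary, schema):
    """Map a dictionary onto a fixed key schema, filling missing keys with 0."""
    return [dictionary.get(key, 0) for key in schema]

schema = ["height", "age", "weight", "purchases", "visits"]
encode_dictionary({"height": 180, "age": 40, "weight": 70}, schema)
# [180, 40, 70, 0, 0]
```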

search_with_positive_negative_ids_as_history(collection_name: str, vector: List, positive_document_ids: List, negative_document_ids: List, field: str, vector_operation: str = 'mean', sum_fields: bool = True, metric: str = 'cosine', min_score=0, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False)

Multi Product Recommendations with Likes and Dislikes (Search by ids)

Search by retrieving the vectors of the documents in the specified lists of positive and negative IDs, then performing a search with the query vector and an aggregated vector: the aggregate (depending on vector_operation) of the positive ID vectors minus that of the negative ID vectors.

Parameters
  • vector – Vector, a list/array of floats that represents a piece of data.

  • positive_document_ids – Positive Document IDs to get recommendations for, and the weightings of each document

  • negative_document_ids – Negative Document IDs to get recommendations for, and the weightings of each document

  • vector_operation – Aggregation for the vectors, choose from [‘mean’, ‘sum’, ‘min’, ‘max’]

  • collection_name – Name of Collection

  • search_field – Vector fields to search through

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity scores of multiple vectors into one score or keep them separate

  • page_size – Size of each page of results

  • page – Page of the results

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
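The aggregation described above, combining the positive ID vectors and subtracting the combined negative ID vectors, can be sketched as follows (a hypothetical helper; the actual computation happens server-side):

```python
def aggregate_history(positive_vectors, negative_vectors, vector_operation="mean"):
    """Combine positive vectors and negative vectors element-wise with the
    chosen operation, then subtract the negative aggregate from the positive."""
    ops = {
        "mean": lambda col: sum(col) / len(col),
        "sum": sum,
        "min": min,
        "max": max,
    }
    op = ops[vector_operation]
    pos = [op(col) for col in zip(*positive_vectors)]  # aggregate per dimension
    neg = [op(col) for col in zip(*negative_vectors)]
    return [p - n for p, n in zip(pos, neg)]

aggregate_history([[1, 2], [3, 4]], [[1, 1]], "mean")
# [1.0, 2.0]
```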

wait_till_jobs_complete(collection_name: str, job_id: str, job_name: str)

Wait until a specific job is complete.

Parameters
  • collection_name – Name of collection.

  • job_id – ID of the job.

  • job_name – Name of the job.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> job = vi_client.dimensionality_reduction_job('nba_season_per_36_stats_demo', vector_field='season_vector_', n_components=2)
>>> vi_client.wait_till_jobs_complete('nba_season_per_36_stats_demo', **job)