Read¶

class vectorai.api.read.ViReadAPIClient(username, api_key, url=None)¶

Read Operations

collection_stats(collection_name: str, return_curl: bool = False, **kwargs)¶

Retrieves stats about a collection

Stats include: size, searches, number of documents, etc.

Parameters: collection_name – Name of Collection

collection_schema(collection_name: str, return_curl: bool = False, **kwargs)¶

Retrieves the schema of a collection

The schema of a collection can include types of: text, numeric, date, bool, etc.

Parameters: collection_name – Name of Collection

id(collection_name: str, document_id: str, include_vector: bool = True, return_curl: bool = False, **kwargs)¶

Look up a document by its id

Parameters

document_id – ID of a document
include_vector – Include vectors in the search results
collection_name – Name of Collection

bulk_id(collection_name: str, document_ids: List[str], return_curl: bool = False, **kwargs)¶

Look up multiple documents by their ids

Parameters

document_ids – IDs of documents
include_vector – Include vectors in the search results
collection_name – Name of Collection

retrieve_documents(collection_name: str, page_size: int = 20, cursor: str = None, sort: List = [], asc: bool = True, include_vector: bool = True, include_fields: List = [], return_curl: bool = False, **kwargs)¶

Retrieve some documents

A cursor is provided to retrieve further documents. Loop through it to retrieve all documents in the database.

Parameters

include_fields – Fields to include in the document, if empty list [] then all is returned
cursor – Cursor to paginate the document retrieval
page_size – Size of each page of results
sort – Fields to sort the documents by
asc – Whether to sort results by ascending or descending order
include_vector – Include vectors in the search results
collection_name – Name of Collection
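
The cursor loop described above can be sketched as follows. This is a minimal sketch, not the library's implementation: it assumes each response is a dict with a "documents" list and a "cursor" string, and that an empty page marks the end, so check the actual response shape before relying on it. `fetch_all` and `FakeClient` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, List, Optional


def fetch_all(client: Any, collection_name: str, page_size: int = 20) -> List[Dict]:
    """Page through a collection by passing each returned cursor back in.

    Assumption: the response looks like {"documents": [...], "cursor": "..."}.
    """
    documents: List[Dict] = []
    cursor: Optional[str] = None
    while True:
        response = client.retrieve_documents(
            collection_name, page_size=page_size, cursor=cursor
        )
        page = response.get("documents", [])
        if not page:
            break  # an empty page is treated as the end of the collection
        documents.extend(page)
        cursor = response.get("cursor")
        if cursor is None:
            break  # no cursor returned, nothing more to fetch
    return documents


class FakeClient:
    """Hypothetical in-memory stand-in so the sketch runs without a server."""

    def __init__(self, docs: List[Dict]):
        self._docs = docs

    def retrieve_documents(self, collection_name, page_size=20, cursor=None, **kwargs):
        start = int(cursor) if cursor else 0
        page = self._docs[start:start + page_size]
        return {"documents": page, "cursor": str(start + len(page))}


fake = FakeClient([{"_id": str(i)} for i in range(45)])
all_docs = fetch_all(fake, "example_collection", page_size=20)
```

With a real client, the same loop would be driven by `retrieve_documents` on the client object instead of `FakeClient`.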

random_documents(collection_name: str, page_size: int = 20, seed: int = None, include_vector: bool = True, include_fields: list = [], return_curl: bool = False, **kwargs)¶

Retrieve some documents randomly

Mainly for testing purposes.

Parameters

seed – Random Seed for retrieving random documents.
page_size – Size of each page of results
include_vector – Include vectors in the search results
collection_name – Name of Collection
return_curl – Return CURL statement

id_lookup_joined(join_query: dict, doc_id: str, return_curl: bool = False, **kwargs)¶

Look up a document by its id with joins

Parameters

join_query –
.
doc_id – ID of a Document

aggregate(collection_name: str, aggregation_query: Dict, page: int = 1, page_size: int = 10, asc: bool = False, flatten: bool = True, return_curl: bool = False, **kwargs)¶

Aggregate a collection

Aggregation/Groupby of a collection using an aggregation query. The aggregation query is a json body that follows the schema of:

{
    "groupby" : [
        {"name": <nickname/alias>, "field": <field in the collection>, "agg": "category"},
        {"name": <another_nickname/alias>, "field": <another groupby field in the collection>, "agg": "category"}
    ],
    "metrics" : [
        {"name": <nickname/alias>, "field": <numeric field in the collection>, "agg": "avg"}
    ]
}

“groupby” is the list of fields you want to split the data into. These are the available groupby types:
- “category”: groupby a field that is a category
“metrics” is the list of metrics you want to calculate in each of those groups. Every aggregation includes a frequency metric. These are the available metric types:
- “average”, “max”, “min”, “sum”, “cardinality”
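
As a concrete instance of the schema above, the following sketch builds an aggregation query as a plain Python dict. The field names (`category`, `price`) and the aliases are hypothetical, and the `"avg"` value is copied from the schema example above.

```python
import json

# Hypothetical collection fields: "category" (categorical) and "price" (numeric).
aggregation_query = {
    "groupby": [
        {"name": "category", "field": "category", "agg": "category"},
    ],
    "metrics": [
        {"name": "average_price", "field": "price", "agg": "avg"},
    ],
}

# The query is a JSON body, so it should serialise cleanly.
body = json.dumps(aggregation_query)
```

A dict like this would be passed as the `aggregation_query` argument of `aggregate`.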

Parameters

collection_name – Name of Collection
aggregation_query – Aggregation query to aggregate data
page_size – Size of each page of results
page – Page of the results
asc – Whether to sort results by ascending or descending order
flatten – Whether to flatten the aggregated results into a list of dictionaries or dictionary of lists.
return_curl – Return the CURL statement
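
The two result shapes that `flatten` chooses between can be illustrated in plain Python. This sketch only shows the difference between a list of dictionaries and a dictionary of lists. The sample rows are made up, the helper names are hypothetical, and which shape corresponds to `flatten=True` is not stated here.

```python
from typing import Any, Dict, List


def to_dict_of_lists(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Convert a list of dictionaries into a dictionary of lists."""
    columns: Dict[str, List[Any]] = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def to_list_of_dicts(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Convert a dictionary of equal-length lists into a list of dictionaries."""
    keys = list(columns)
    return [dict(zip(keys, values)) for values in zip(*columns.values())]


# Made-up aggregation rows, for illustration only.
rows = [
    {"category": "tv", "frequency": 3},
    {"category": "audio", "frequency": 5},
]
columns = to_dict_of_lists(rows)
round_trip = to_list_of_dicts(columns)
```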

facets(collection_name: str, fields: List[str] = [], page: int = 1, page_size: int = 20, asc: bool = False, return_curl: bool = False, **kwargs)¶

Retrieve the facets of a collection

Takes a high-level aggregation of every field in a collection. This is used in advanced search to help create the filter bar for search.

Parameters

facets_fields – Fields to include in the facets, if [] then all
date_interval – Interval for date facets
page_size – Size of facet page
page – Page of the results
asc – Whether to sort results by ascending or descending order
collection_name – Name of Collection

filters(collection_name: str, filters: List, page=1, page_size=10, include_vector: bool = False, return_curl: bool = False, **kwargs)¶

Filters a collection

Filter is used to retrieve documents that match the conditions set in a filter query. This is used in advanced search to filter the documents that are searched.

The filters query is a json body that follows the schema of:

[
    {'field' : <field to filter>, 'filter_type' : <type of filter>, "condition":"==", "condition_value":"america"},
    {'field' : <field to filter>, 'filter_type' : <type of filter>, "condition":">=", "condition_value":90},
]

These are the available filter_type types:

1. "contains": for filtering documents that contain a string.
        {'field' : 'category', 'filter_type' : 'contains', "condition":"==", "condition_value": "bluetoo"}
2. "exact_match"/"category": for filtering documents that match a string or list of strings exactly.
        {'field' : 'category', 'filter_type' : 'exact_match', "condition":"==", "condition_value": "tv"}
3. "categories": for filtering documents that contain any category from a list of categories.
        {'field' : 'category', 'filter_type' : 'categories', "condition":"==", "condition_value": ["tv", "smart", "bluetooth_compatible"]}
4. "exists": for filtering documents that contain a field.
        {'field' : 'purchased', 'filter_type' : 'exists', "condition":">=", "condition_value":" "}
5. "date": for filtering by date range.
        {'field' : 'insert_date_', 'filter_type' : 'date', "condition":">=", "condition_value":"2020-01-01"}
6. "numeric": for filtering by numeric range.
        {'field' : 'price', 'filter_type' : 'numeric', "condition":">=", "condition_value":90}

These are the available conditions:

“==”, “!=”, “>=”, “>”, “<”, “<=”

Parameters

collection_name – Name of Collection
filters – Query for filtering the search results
page – Page of the results
page_size – Size of each page of results
asc – Whether to sort results by ascending or descending order
include_vector – Include vectors in the search results
return_curl – Returns Curl statement for debugging
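
A filters query following the schema above can be assembled and sanity-checked before it is sent. The helper `make_filter` is hypothetical (it is not part of the client); it only checks the filter types and conditions listed above, and the field names are made up.

```python
from typing import Any, Dict, List

# Values taken from the lists above.
FILTER_TYPES = {"contains", "exact_match", "category", "categories", "exists", "date", "numeric"}
CONDITIONS = {"==", "!=", ">=", ">", "<", "<="}


def make_filter(field: str, filter_type: str, condition: str, condition_value: Any) -> Dict[str, Any]:
    """Build one filter dict, rejecting unknown filter types or conditions."""
    if filter_type not in FILTER_TYPES:
        raise ValueError(f"unknown filter_type: {filter_type!r}")
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition!r}")
    return {
        "field": field,
        "filter_type": filter_type,
        "condition": condition,
        "condition_value": condition_value,
    }


# Hypothetical fields "category" and "price".
filters: List[Dict[str, Any]] = [
    make_filter("category", "categories", "==", ["tv", "smart"]),
    make_filter("price", "numeric", ">=", 90),
]
```

The resulting list would be passed as the `filters` argument.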

job_status(collection_name: str, job_id: str, job_name: str, return_curl: bool = False, **kwargs)¶

Get the status of a job: whether it is starting, running, failed or finished.

Parameters

job_id – ID of the job.
job_name – Name of the job.
collection_name – Name of Collection

list_jobs(collection_name: str, return_curl: bool = False, **kwargs)¶

Get history of jobs

List and get a history of all the jobs and their job_id, parameters, start time, etc.

Parameters: collection_name – Name of Collection

bulk_missing_id(collection_name: str, document_ids: List[str], return_curl: bool = False, **kwargs)¶

Return IDs that are not in a collection.

random_documents_with_filters(collection_name: str, seed: int = None, include_fields: List[str] = [], page_size: int = 20, include_vector: bool = False, filters: List[Dict] = [], return_curl: bool = False, **kwargs)¶

Random documents with filters.

vector_health(collection_name: str, return_curl: bool = False, **kwargs)¶

Return vector health of a collection

Read operations designed for Python

class vectorai.read.ViReadClient(username: str, api_key: str, url: str = 'https://api.vctr.ai')¶

random_aggregation_query(collection_name: str, groupby: int = 1, metrics: int = 1)¶

Generates a random aggregation query.

Parameters

collection_name – name of collection
groupby – The number of groupbys to randomly generate
metrics – The number of metrics to randomly generate

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.random_aggregation_query(collection_name, groupby=1, metrics=1)

search(collection_name: str, vector: List, field: List, filters: List = [], approx: int = 0, sum_fields: bool = True, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False, **kwargs)¶

Vector Similarity Search. Search a vector field with a vector, a.k.a Nearest Neighbors Search

Enables machine learning search with vector search. Search with a vector for the most similar vectors.

For example, search with a person’s characteristics to find the most similar people (querying the “persons_characteristics_vector” field):

Query person's characteristics as a vector:
[180, 40, 70] representing [height, age, weight]

Search Results:
[
    {"name": "Adam Levine", "persons_characteristics_vector" : [180, 56, 71]},
    {"name": "Brad Pitt", "persons_characteristics_vector" : [180, 56, 65]},
...]

Parameters

vector – Vector, a list/array of floats that represents a piece of data.
collection_name – Name of Collection
search_fields – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores of multiple vectors into one score or keep them separate
page_size – Size of each page of results
filters – Filters for search
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
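
To make the idea of nearest neighbors search concrete, here is a brute-force sketch in plain Python using the example above. It is not how the service performs search (there is no approximation, filtering or paging by page number), and `cosine_similarity` and `nearest` are hypothetical helpers written for this illustration.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query: Sequence[float], documents: List[Dict], field: str, page_size: int = 10) -> List[Dict]:
    """Rank documents by cosine similarity of doc[field] to the query, highest first."""
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query, doc[field]),
        reverse=True,
    )
    return ranked[:page_size]


documents = [
    {"name": "Brad Pitt", "persons_characteristics_vector": [180, 56, 65]},
    {"name": "Adam Levine", "persons_characteristics_vector": [180, 56, 71]},
]
# Query vector representing [height, age, weight].
results = nearest([180, 40, 70], documents, "persons_characteristics_vector")
```

Sorting with `reverse=True` mirrors the default `asc=False`: the most similar documents come first.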

random_filter_query(collection_name: str, text_filters: int = 1, numeric_filters: int = 0)¶

Generates a random filter query.

Parameters

collection_name – name of collection
text_filters – The number of text filters to randomly generate
numeric_filters – The number of numeric filters to randomly generate

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.random_filter_query(collection_name, text_filters=1, numeric_filters=0)

head(collection_name: str, page_size: int = 5, return_as_pandas_df: bool = True)¶

Returns a small number of documents from a collection for a quick preview.

Parameters

collection_name – The name of your collection
page_size – The number of results to return
return_as_pandas_df – If True, return a pandas DataFrame rather than JSON.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.head(collection_name, page_size=10)

retrieve_all_documents(collection_name: str, sort: List = [], asc: bool = True, include_vector: bool = True, include_fields: List = [], retrieve_chunk_size: int = 1000, **kwargs)¶

Retrieve all documents in a given collection. We recommend specifying the fields to extract; otherwise this function may take a long time to run.

Parameters

collection_name – Name of collection.
sort_by – The fields by which to sort.
asc – If true, returns documents in ascending order of the sort fields.
include_vector – If true, includes _vector_ fields to return them.
include_fields – Adjust which fields are returned.
retrieve_chunk_size – The number of documents to retrieve per request.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> all_documents = vi_client.retrieve_all_documents(collection_name)

wait_till_jobs_complete(collection_name: str, job_id: str, job_name: str)¶

Wait until a specific job is complete.

Parameters

collection_name – Name of collection.
job_id – ID of the job.
job_name – Name of the job.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> job = vi_client.dimensionality_reduction_job('nba_season_per_36_stats_demo', vector_field='season_vector_', n_components=2)
>>> vi_client.wait_till_jobs_complete('nba_season_per_36_stats_demo', **job)
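
Waiting for a job amounts to polling its status until it leaves the starting/running states. The sketch below illustrates that pattern only and is not the library's implementation. It assumes a callable that returns a status string such as "running" or "finished" (the real `job_status` response shape may differ), and `wait_until_done` and `fake_status` are hypothetical names.

```python
import time
from typing import Callable


def wait_until_done(get_status: Callable[[], str], poll_seconds: float = 1.0, timeout: float = 60.0) -> str:
    """Poll get_status() until it reports a terminal state or the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status in ("finished", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout} seconds")
        time.sleep(poll_seconds)


# Hypothetical status source: reports "starting", then "running", then "finished".
_states = iter(["starting", "running", "finished"])


def fake_status() -> str:
    return next(_states)


final_status = wait_until_done(fake_status, poll_seconds=0.01)
```

A timeout keeps the loop from running forever if a job never reaches a terminal state.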

check_schema(collection_name: str, document: Optional[Dict] = None)¶

Check the schema of a given collection.

Parameters: collection_name – Name of collection.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.check_schema(collection_name)

list_collections() → List[str]¶

List Collections

Parameters

username – Username
api_key – Api Key, you can request it from request_api_key

Returns

List of collections

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.list_collections()

search_collections(keyword: str) → List[str]¶

Performs keyword matching in collections.

Parameters: keyword – Matches based on keywords

Returns: List of collection names

Example

>>> from vectorai import ViClient
>>> vi_client = ViClient()
>>> vi_client.search_collections('example')

random_recommendation(collection_name: str, search_field: str, seed=None, sum_fields: bool = True, metric: str = 'cosine', min_score=0, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, approx: int = 0, hundred_scale=True, asc: bool = False, **kwargs)¶

Recommend by random ID using vector search

Parameters

document_id – ID of a document
collection_name – Name of Collection
field – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores of multiple vectors into one score or keep them separate
page_size – Size of each page of results
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
hundred_scale – Whether to scale up the metric by 100
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)

create_filter_query(collection_name: str, field: str, filter_type: str, filter_values: Optional[Union[List[str], str]] = None)¶

Filter type can be one of contains/exact_match/categories/exists/insert_date/numeric_range.

Filter types can be one of:

contains – Field must contain this specific string. Not case sensitive.
exact_match – Field must have an exact match
categories – Matches entire field
exists – If field exists in document
>= / > / < / <= – Larger than or equal to / Larger than / Smaller than / Smaller than or equal to. These, however, can only be applied on numeric/date values. Check collection_schema.

Parameters

collection_name – The name of the collection
field – The field to filter on
filter_type – One of contains/exact_match/categories/>=/>/<=/<.

search_with_filters(collection_name: str, vector: List, field: List, filters: List = [], approx: int = 0, sum_fields: bool = True, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False, **kwargs)¶

Vector Similarity Search. Search a vector field with a vector, a.k.a Nearest Neighbors Search

Enables machine learning search with vector search. Search with a vector for the most similar vectors.

For example, search with a person’s characteristics to find the most similar people (querying the “persons_characteristics_vector” field):

Query person's characteristics as a vector:
[180, 40, 70] representing [height, age, weight]

Search Results:
[
    {"name": "Adam Levine", "persons_characteristics_vector" : [180, 56, 71]},
    {"name": "Brad Pitt", "persons_characteristics_vector" : [180, 56, 65]},
...]

Parameters

vector – Vector, a list/array of floats that represents a piece of data.
collection_name – Name of Collection
search_fields – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores of multiple vectors into one score or keep them separate
page_size – Size of each page of results
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)

hybrid_search_with_filters(collection_name: str, text: str, vector: List, fields: List, text_fields: List, filters: List = [], sum_fields: bool = True, metric: str = 'cosine', min_score=None, traditional_weight=0.075, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False, **kwargs)¶

Search a text field with vector and text using Vector Search and Traditional Search

Vector similarity search + Traditional Fuzzy Search with text and vector.

You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.

Hybrid search with filters also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.

Parameters

collection_name – Name of Collection
page – Page of the results
page_size – Size of each page of results
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores of multiple vectors into one score or keep them separate
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters – Query for filtering the search results
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
include_facets – Include facets in the search results
hundred_scale – Whether to scale up the metric by 100
multivector_query – Query for advanced search that allows for multiple vector and field querying
text – Text Search Query (not encoded as vector)
text_fields – Text fields to search against
traditional_weight – Multiplier of traditional search. A value of 0.025~0.1 is good.
fuzzy – Fuzziness of the search. A value of 1-3 is good.
join – Whether to consider cases where there is a space in the word. E.g. Go Pro vs GoPro.
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)