Read

class vectorai.api.read.ViReadAPIClient(username, api_key, url=None)

Read Operations

collection_stats(collection_name: str, return_curl: bool = False, **kwargs)

Retrieves stats about a collection

Stats include: size, searches, number of documents, etc.

Parameters

collection_name – Name of Collection

collection_schema(collection_name: str, return_curl: bool = False, **kwargs)

Retrieves the schema of a collection

The schema of a collection can include types of: text, numeric, date, bool, etc.

Parameters

collection_name – Name of Collection

id(collection_name: str, document_id: str, include_vector: bool = True, return_curl: bool = False, **kwargs)

Look up a document by its id

Parameters
  • document_id – ID of a document

  • include_vector – Include vectors in the search results

  • collection_name – Name of Collection

bulk_id(collection_name: str, document_ids: List[str], return_curl: bool = False, **kwargs)

Look up multiple documents by their ids

Parameters
  • document_ids – IDs of documents

  • include_vector – Include vectors in the search results

  • collection_name – Name of Collection

retrieve_documents(collection_name: str, page_size: int = 20, cursor: str = None, sort: List = [], asc: bool = True, include_vector: bool = True, include_fields: List = [], return_curl: bool = False, **kwargs)

Retrieve some documents

A cursor is provided to retrieve further documents. Pass it to the next call and loop until no documents remain to retrieve all documents in the database.

Parameters
  • include_fields – Fields to include in the document, if empty list [] then all is returned

  • cursor – Cursor to paginate the document retrieval

  • page_size – Size of each page of results

  • sort – Fields to sort the documents by

  • asc – Whether to sort results by ascending or descending order

  • include_vector – Include vectors in the search results

  • collection_name – Name of Collection
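The cursor loop described for retrieve_documents can be sketched as follows. This is a minimal illustration rather than the client's own code: `fetch_page` is a hypothetical stand-in for a call such as `retrieve_documents`, and the response keys `documents` and `cursor` are assumptions about the response shape that should be checked against a real response.

```python
from typing import Callable, Dict, List, Optional


def retrieve_all(fetch_page: Callable[[Optional[str]], Dict], max_pages: int = 1000) -> List[Dict]:
    """Follow cursors until a page comes back empty or no cursor is returned."""
    documents: List[Dict] = []
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard cap guards against an endless loop
        response = fetch_page(cursor)
        page = response.get("documents", [])  # assumed response key
        documents.extend(page)
        cursor = response.get("cursor")  # assumed response key
        if not page or cursor is None:
            break
    return documents


# Hypothetical in-memory stand-in for the API, used only to exercise the loop.
_FAKE_DB = [{"_id": str(i)} for i in range(45)]


def fake_fetch_page(cursor: Optional[str], page_size: int = 20) -> Dict:
    start = int(cursor) if cursor else 0
    end = start + page_size
    next_cursor = str(end) if end < len(_FAKE_DB) else None
    return {"documents": _FAKE_DB[start:end], "cursor": next_cursor}


all_docs = retrieve_all(fake_fetch_page)
```

With a real client, `fetch_page` would wrap the API call and forward the cursor from the previous response.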

random_documents(collection_name: str, page_size: int = 20, seed: int = None, include_vector: bool = True, include_fields: list = [], return_curl: bool = False, **kwargs)

Retrieve some documents randomly

Mainly for testing purposes.

Parameters
  • seed – Random Seed for retrieving random documents.

  • page_size – Size of each page of results

  • include_vector – Include vectors in the search results

  • collection_name – Name of Collection

  • return_curl – Return CURL statement

id_lookup_joined(join_query: dict, doc_id: str, return_curl: bool = False, **kwargs)

Look up a document by its id with joins

Parameters
  • join_query

    .

  • doc_id – ID of a Document

aggregate(collection_name: str, aggregation_query: Dict, page: int = 1, page_size: int = 10, asc: bool = False, flatten: bool = True, return_curl: bool = False, **kwargs)

Aggregate a collection

Aggregation/Groupby of a collection using an aggregation query. The aggregation query is a JSON body that follows this schema:

{
    "groupby" : [
        {"name": <nickname/alias>, "field": <field in the collection>, "agg": "category"},
        {"name": <another_nickname/alias>, "field": <another groupby field in the collection>, "agg": "category"}
    ],
    "metrics" : [
        {"name": <nickname/alias>, "field": <numeric field in the collection>, "agg": "avg"}
    ]
}
  • “groupby” is the list of fields you want to split the data by. These are the available groupby types:

    • “category”: groupby a field that is a category

  • “metrics” is the list of metrics you want to calculate within each group. Every aggregation also includes a frequency metric. These are the available metric types:

    • “average”, “max”, “min”, “sum”, “cardinality”

Parameters
  • collection_name – Name of Collection

  • aggregation_query – Aggregation query to aggregate data

  • page_size – Size of each page of results

  • page – Page of the results

  • asc – Whether to sort results by ascending or descending order

  • flatten – Whether to flatten the aggregated results into a list of dictionaries or dictionary of lists.

  • return_curl – Return the CURL statement
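An aggregation query of the shape shown in the schema can be assembled as a plain Python dictionary. The helper below is a hypothetical sketch, not part of the library: the function name, the alias naming scheme and the field names `brand` and `price` are made up for illustration, and the accepted metric names combine the list above with the "avg" spelling used in the schema.

```python
from typing import Dict, List

# Metric names from the list above, plus "avg" as used in the schema example.
ALLOWED_METRICS = {"avg", "average", "max", "min", "sum", "cardinality"}


def build_aggregation_query(groupby_fields: List[str], metrics: Dict[str, str]) -> Dict:
    """Build a query dict; `metrics` maps a numeric field to a metric type."""
    for field, agg in metrics.items():
        if agg not in ALLOWED_METRICS:
            raise ValueError(f"unsupported metric {agg!r} for field {field!r}")
    return {
        "groupby": [
            {"name": field, "field": field, "agg": "category"}
            for field in groupby_fields
        ],
        "metrics": [
            {"name": f"{agg}_{field}", "field": field, "agg": agg}
            for field, agg in metrics.items()
        ],
    }


# Field names "brand" and "price" are invented for this example.
query = build_aggregation_query(["brand"], {"price": "avg"})
```

The resulting dictionary could then be passed as `aggregation_query` to `aggregate`.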

facets(collection_name: str, fields: List[str] = [], page: int = 1, page_size: int = 20, asc: bool = False, return_curl: bool = False, **kwargs)

Retrieve the facets of a collection

Takes a high-level aggregation of every field in a collection. This is used in advanced search to help create the filter bar for search.

Parameters
  • facets_fields – Fields to include in the facets, if [] then all

  • date_interval – Interval for date facets

  • page_size – Size of facet page

  • page – Page of the results

  • asc – Whether to sort results by ascending or descending order

  • collection_name – Name of Collection

filters(collection_name: str, filters: List, page=1, page_size=10, include_vector: bool = False, return_curl: bool = False, **kwargs)

Filters a collection

Filter is used to retrieve documents that match the conditions set in a filter query. This is used in advanced search to filter the documents that are searched.

The filters query is a JSON body that follows this schema:

[
    {'field' : <field to filter>, 'filter_type' : <type of filter>, "condition":"==", "condition_value":"america"},
    {'field' : <field to filter>, 'filter_type' : <type of filter>, "condition":">=", "condition_value":90},
]

These are the available filter_type types:

1. "contains": for filtering documents that contain a string.
        {'field' : 'category', 'filter_type' : 'contains', "condition":"==", "condition_value": "bluetoo"}
2. "exact_match"/"category": for filtering documents that match a string or list of strings exactly.
        {'field' : 'category', 'filter_type' : 'exact_match', "condition":"==", "condition_value": "tv"}
3. "categories": for filtering documents that contain any category from a list of categories.
        {'field' : 'category', 'filter_type' : 'categories', "condition":"==", "condition_value": ["tv", "smart", "bluetooth_compatible"]}
4. "exists": for filtering documents that contain a field.
        {'field' : 'purchased', 'filter_type' : 'exists', "condition":">=", "condition_value":" "}
5. "date": for filtering by date range.
        {'field' : 'insert_date_', 'filter_type' : 'date', "condition":">=", "condition_value":"2020-01-01"}
6. "numeric": for filtering by numeric range.
        {'field' : 'price', 'filter_type' : 'numeric', "condition":">=", "condition_value":90}

These are the available conditions:

“==”, “!=”, “>=”, “>”, “<”, “<=”

Parameters
  • collection_name – Name of Collection

  • filters – Query for filtering the search results

  • page – Page of the results

  • page_size – Size of each page of results

  • asc – Whether to sort results by ascending or descending order

  • include_vector – Include vectors in the search results

  • return_curl – Returns Curl statement for debugging
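A filters query can be built and sanity-checked in Python before it is sent. The sketch below is illustrative only: `make_filter` is a hypothetical helper, not a library function, and the sets of allowed values are copied from the filter types and conditions listed above.

```python
from typing import Any, Dict, List

# Values taken from the lists of filter types and conditions above.
FILTER_TYPES = {"contains", "exact_match", "category", "categories", "exists", "date", "numeric"}
CONDITIONS = {"==", "!=", ">=", ">", "<", "<="}


def make_filter(field: str, filter_type: str, condition: str, condition_value: Any) -> Dict[str, Any]:
    """Return one filter entry after checking its type and condition."""
    if filter_type not in FILTER_TYPES:
        raise ValueError(f"unknown filter_type: {filter_type!r}")
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {condition!r}")
    return {
        "field": field,
        "filter_type": filter_type,
        "condition": condition,
        "condition_value": condition_value,
    }


# Example query: documents in either category, priced at 90 or more.
filters: List[Dict[str, Any]] = [
    make_filter("category", "categories", "==", ["tv", "smart"]),
    make_filter("price", "numeric", ">=", 90),
]
```

The resulting list could be passed as the `filters` argument of `filters`.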

job_status(collection_name: str, job_id: str, job_name: str, return_curl: bool = False, **kwargs)

Get the status of a job: whether it is starting, running, failed or finished.

Parameters
  • job_id

    .

  • job_name

    .

  • collection_name – Name of Collection

list_jobs(collection_name: str, return_curl: bool = False, **kwargs)

Get history of jobs

List and get a history of all the jobs and their job_id, parameters, start time, etc.

Parameters

collection_name – Name of Collection

bulk_missing_id(collection_name: str, document_ids: List[str], return_curl: bool = False, **kwargs)

Return IDs that are not in a collection.

random_documents_with_filters(collection_name: str, seed: int = None, include_fields: List[str] = [], page_size: int = 20, include_vector: bool = False, filters: List[Dict] = [], return_curl: bool = False, **kwargs)

Random documents with filters.

vector_health(collection_name: str, return_curl: bool = False, **kwargs)

Return vector health of a collection

Read operations designed for Python

class vectorai.read.ViReadClient(username: str, api_key: str, url: str = 'https://api.vctr.ai')

random_aggregation_query(collection_name: str, groupby: int = 1, metrics: int = 1)

Generates a random aggregation query.

Parameters
  • collection_name – name of collection

  • groupby – The number of groupbys to randomly generate

  • metrics – The number of metrics to randomly generate

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.random_aggregation_query(collection_name, groupby=1, metrics=1)

search(collection_name: str, vector: List, field: List, filters: List = [], approx: int = 0, sum_fields: bool = True, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False, **kwargs)

Vector Similarity Search. Search a vector field with a vector, a.k.a. Nearest Neighbors Search

Enables machine learning search with vector search. Search with a vector for the most similar vectors.

For example, given a person’s characteristics, find the people who are most similar (querying the “persons_characteristics_vector” field):

Query person's characteristics as a vector:
[180, 40, 70] representing [height, age, weight]

Search Results:
[
    {"name": "Adam Levine", "persons_characteristics_vector" : [180, 56, 71]},
    {"name": "Brad Pitt", "persons_characteristics_vector" : [180, 56, 65]},
...]

Parameters
  • vector – Vector, a list/array of floats that represents a piece of data.

  • collection_name – Name of Collection

  • search_fields – Vector fields to search through

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity search scores of multiple vectors into one score or keep them separate

  • page_size – Size of each page of results

  • filters – Filters for search

  • page – Page of the results

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
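To make the idea of nearest-neighbour search concrete, the following sketch ranks the two example people by cosine similarity to the query vector in plain Python. It only illustrates the concept and says nothing about how the service computes or scales its scores; the helper names `cosine_similarity` and `nearest` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query: Sequence[float], documents: List[Dict], field: str, top_k: int = 10) -> List[Tuple[str, float]]:
    """Score every document against the query and return the best matches first."""
    scored = [(doc["name"], cosine_similarity(query, doc[field])) for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


people = [
    {"name": "Brad Pitt", "persons_characteristics_vector": [180, 56, 65]},
    {"name": "Adam Levine", "persons_characteristics_vector": [180, 56, 71]},
]

# Query vector [height, age, weight] from the example above.
ranked = nearest([180, 40, 70], people, "persons_characteristics_vector")
```

A brute-force scan like this is exact but linear in the number of documents.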

random_filter_query(collection_name: str, text_filters: int = 1, numeric_filters: int = 0)

Generates a random filter query.

Parameters
  • collection_name – name of collection

  • text_filters – The number of text filters to randomly generate

  • numeric_filters – The number of numeric filters to randomly generate

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.random_filter_query(collection_name, text_filters=1, numeric_filters=0)

head(collection_name: str, page_size: int = 5, return_as_pandas_df: bool = True)

Return the first few documents of a collection, optionally as a pandas DataFrame.

Parameters
  • collection_name – The name of your collection

  • page_size – The number of results to return

  • return_as_pandas_df – If True, return the results as a pandas DataFrame rather than as JSON.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.head(collection_name, page_size=10)

retrieve_all_documents(collection_name: str, sort: List = [], asc: bool = True, include_vector: bool = True, include_fields: List = [], retrieve_chunk_size: int = 1000, **kwargs)

Retrieve all documents in a given collection. We recommend restricting the fields to extract, as otherwise this function may take a long time to run.

Parameters
  • collection_name – Name of collection.

  • sort_by – The fields to sort by.

  • asc – If true, returns documents in ascending order of the sort fields.

  • include_vector – If true, includes _vector_ fields in the returned documents.

  • include_fields – Adjust which fields are returned.

  • retrieve_chunk_size – The number of documents to retrieve per request.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> all_documents = vi_client.retrieve_all_documents(collection_name)

wait_till_jobs_complete(collection_name: str, job_id: str, job_name: str)

Wait until a specific job is complete.

Parameters
  • collection_name – Name of collection.

  • job_id – ID of the job.

  • job_name – Name of the job.
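Waiting for a job generally amounts to polling its status until it reaches a terminal state. The loop below is a minimal sketch of that pattern and not the library's implementation: `get_status` is a hypothetical callable (for example a wrapper around `job_status`), and the status strings "finished" and "failed" are assumed from the job_status description rather than taken from a real response.

```python
import time
from typing import Callable


def wait_until_done(get_status: Callable[[], str],
                    poll_seconds: float = 1.0,
                    timeout_seconds: float = 60.0) -> str:
    """Poll until the job reports a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status in ("finished", "failed"):  # assumed terminal statuses
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)


# Hypothetical sequence of statuses standing in for repeated API calls.
_states = iter(["starting", "running", "running", "finished"])
final_status = wait_until_done(lambda: next(_states), poll_seconds=0.01)
```

The timeout keeps a stuck job from blocking the caller indefinitely.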

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> job = vi_client.dimensionality_reduction_job('nba_season_per_36_stats_demo', vector_field='season_vector_', n_components=2)
>>> vi_client.wait_till_jobs_complete('nba_season_per_36_stats_demo', **job)
check_schema(collection_name: str, document: Optional[Dict] = None)

Check the schema of a given collection.

Parameters

collection_name – Name of collection.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.check_schema(collection_name)

list_collections() → List[str]

List Collections

Parameters
  • username – Username

  • api_key – Api Key, you can request it from request_api_key

Returns

List of collections

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.list_collections()

search_collections(keyword: str) → List[str]

Performs keyword matching in collections.

Parameters

keyword – Matches based on keywords

Returns

List of collection names

Example

>>> from vectorai import ViClient
>>> vi_client = ViClient()
>>> vi_client.search_collections('example')

random_recommendation(collection_name: str, search_field: str, seed=None, sum_fields: bool = True, metric: str = 'cosine', min_score=0, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, approx: int = 0, hundred_scale=True, asc: bool = False, **kwargs)

Recommend by random ID using vector search

Parameters
  • document_id – ID of a document

  • collection_name – Name of Collection

  • field – Vector fields to search through

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity search scores of multiple vectors into one score or keep them separate

  • page_size – Size of each page of results

  • page – Page of the results

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • asc – Whether to sort the score by ascending order (default is false, for getting most similar results)

create_filter_query(collection_name: str, field: str, filter_type: str, filter_values: Optional[Union[List[str], str]] = None)

Filter type can be one of contains/exact_match/categories/exists/insert_date/numeric_range.

Filter types can be one of:

  • contains – Field must contain this specific string. Not case sensitive.

  • exact_match – Field must have an exact match

  • categories – Matches entire field

  • exists – If field exists in document

  • >= / > / < / <= – Larger than or equal to / Larger than / Smaller than / Smaller than or equal to. These, however, can only be applied on numeric/date values. Check collection_schema.

Parameters
  • collection_name – The name of the collection

  • field – The field to filter on

  • filter_type – One of contains/exact_match/categories/>=/>/<=/<.

search_with_filters(collection_name: str, vector: List, field: List, filters: List = [], approx: int = 0, sum_fields: bool = True, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False, **kwargs)

Vector Similarity Search. Search a vector field with a vector, a.k.a. Nearest Neighbors Search

Enables machine learning search with vector search. Search with a vector for the most similar vectors.

For example, given a person’s characteristics, find the people who are most similar (querying the “persons_characteristics_vector” field):

Query person's characteristics as a vector:
[180, 40, 70] representing [height, age, weight]

Search Results:
[
    {"name": "Adam Levine", "persons_characteristics_vector" : [180, 56, 71]},
    {"name": "Brad Pitt", "persons_characteristics_vector" : [180, 56, 65]},
...]

Parameters
  • vector – Vector, a list/array of floats that represents a piece of data.

  • collection_name – Name of Collection

  • search_fields – Vector fields to search through

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity search scores of multiple vectors into one score or keep them separate

  • page_size – Size of each page of results

  • page – Page of the results

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
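The `metric` parameter accepts ‘l1’, ‘l2’ and ‘dp’ besides the default ‘cosine’. The sketch below shows one common reading of those three names: Manhattan distance, Euclidean distance and dot product. That reading is an assumption, since the text above does not define the abbreviations, and the service may scale or normalise its scores differently.

```python
import math
from typing import Callable, Dict, Sequence


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: sum of coordinate-wise products."""
    return sum(x * y for x, y in zip(a, b))


# Assumed mapping from metric names to functions (not confirmed by the text above).
METRICS: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
    "l1": l1_distance,
    "l2": l2_distance,
    "dp": dot_product,
}

query_vector = [180, 40, 70]      # [height, age, weight] from the example above
candidate_vector = [180, 56, 71]  # first search result from the example above
scores = {name: fn(query_vector, candidate_vector) for name, fn in METRICS.items()}
```

Note that for the two distances a smaller value means a closer match, whereas for the dot product a larger value does.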

hybrid_search_with_filters(collection_name: str, text: str, vector: List, fields: List, text_fields: List, filters: List = [], sum_fields: bool = True, metric: str = 'cosine', min_score=None, traditional_weight=0.075, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False, **kwargs)

Search a text field with vector and text using Vector Search and Traditional Search

Vector similarity search + Traditional Fuzzy Search with text and vector.

You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.

Hybrid search with filters also supports filtering to only search through filtered results and facets to get the overview of products available when a minimum score is set.

Parameters
  • collection_name – Name of Collection

  • page – Page of the results

  • page_size – Size of each page of results

  • approx – Used for approximate search

  • sum_fields – Whether to sum the similarity search scores of multiple vectors into one score or keep them separate

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • filters – Query for filtering the search results

  • min_score – Minimum score for similarity metric

  • include_vector – Include vectors in the search results

  • include_count – Include count in the search results

  • include_facets – Include facets in the search results

  • hundred_scale – Whether to scale up the metric by 100

  • multivector_query – Query for advanced search that allows querying multiple vectors and fields

  • text – Text Search Query (not encoded as vector)

  • text_fields – Text fields to search against

  • traditional_weight – Multiplier of traditional search. A value of 0.025~0.1 is good.

  • fuzzy – Fuzziness of the search. A value of 1-3 is good.

  • join – Whether to consider cases where there is a space in the word. E.g. Go Pro vs GoPro.

  • asc – Whether to sort the score by ascending order (default is false, for getting most similar results)