Read
- class vectorai.api.read.ViReadAPIClient(username, api_key, url=None)
Read Operations
- collection_stats(collection_name: str, return_curl: bool = False, **kwargs)
Retrieves stats about a collection
Stats include: size, searches, number of documents, etc.
- Parameters
collection_name – Name of Collection
- collection_schema(collection_name: str, return_curl: bool = False, **kwargs)
Retrieves the schema of a collection
The schema of a collection can include types of: text, numeric, date, bool, etc.
- Parameters
collection_name – Name of Collection
- id(collection_name: str, document_id: str, include_vector: bool = True, return_curl: bool = False, **kwargs)
Look up a document by its id
- Parameters
document_id – ID of a document
include_vector – Include vectors in the search results
collection_name – Name of Collection
- bulk_id(collection_name: str, document_ids: List[str], return_curl: bool = False, **kwargs)
Look up multiple documents by their ids
- Parameters
document_ids – IDs of documents
include_vector – Include vectors in the search results
collection_name – Name of Collection
- retrieve_documents(collection_name: str, page_size: int = 20, cursor: str = None, sort: List = [], asc: bool = True, include_vector: bool = True, include_fields: List = [], return_curl: bool = False, **kwargs)
Retrieve a page of documents
Use the cursor to paginate: pass the cursor returned by one call into the next call to retrieve more documents, and loop until all documents in the collection have been retrieved.
- Parameters
include_fields – Fields to include in the document; if an empty list [] is given, all fields are returned
cursor – Cursor to paginate the document retrieval
page_size – Size of each page of results
sort – Fields to sort the documents by
asc – Whether to sort results by ascending or descending order
include_vector – Include vectors in the search results
collection_name – Name of Collection
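The cursor loop described above can be sketched as follows. This is a hedged illustration, not the SDK's code: `FakeReadClient` is a hypothetical in-memory stand-in, and the sketch assumes that each `retrieve_documents` response is a dict with `"documents"` and `"cursor"` keys and that an empty page marks the end of the collection. Check the real response shape before relying on it.

```python
from typing import Dict, Iterator, List, Optional


class FakeReadClient:
    """Hypothetical in-memory stand-in for ViReadAPIClient.retrieve_documents."""

    def __init__(self, documents: List[Dict]):
        self._documents = documents

    def retrieve_documents(self, collection_name: str, page_size: int = 20,
                           cursor: Optional[str] = None, **kwargs) -> Dict:
        # Assumption made for this fake: the cursor encodes the next offset.
        start = int(cursor) if cursor else 0
        page = self._documents[start:start + page_size]
        return {"documents": page, "cursor": str(start + len(page))}


def iterate_all_documents(client, collection_name: str,
                          page_size: int = 20) -> Iterator[Dict]:
    """Yield every document by feeding each returned cursor into the next call."""
    cursor = None
    while True:
        response = client.retrieve_documents(collection_name,
                                             page_size=page_size, cursor=cursor)
        documents = response["documents"]
        if not documents:  # assumed end-of-collection signal
            break
        yield from documents
        cursor = response["cursor"]


# Usage: 45 documents are fetched in pages of 20, 20 and 5.
client = FakeReadClient([{"_id": str(i)} for i in range(45)])
all_docs = list(iterate_all_documents(client, "example_collection", page_size=20))
print(len(all_docs))  # 45
```

Wrapping the loop in a generator keeps memory use bounded to one page at a time, which matters for large collections.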
- random_documents(collection_name: str, page_size: int = 20, seed: int = None, include_vector: bool = True, include_fields: list = [], return_curl: bool = False, **kwargs)
Retrieve some documents randomly
Mainly for testing purposes.
- Parameters
seed – Random Seed for retrieving random documents.
page_size – Size of each page of results
include_vector – Include vectors in the search results
collection_name – Name of Collection
return_curl – Return the CURL statement
- id_lookup_joined(join_query: dict, doc_id: str, return_curl: bool = False, **kwargs)
Look up a document by its id with joins
- Parameters
join_query –
.
doc_id – ID of a Document
- aggregate(collection_name: str, aggregation_query: Dict, page: int = 1, page_size: int = 10, asc: bool = False, flatten: bool = True, return_curl: bool = False, **kwargs)
Aggregate a collection
Aggregation/Groupby of a collection using an aggregation query. The aggregation query is a JSON body that follows the schema of:
{
  "groupby": [
    {"name": <nickname/alias>, "field": <field in the collection>, "agg": "category"},
    {"name": <another_nickname/alias>, "field": <another groupby field in the collection>, "agg": "category"}
  ],
  "metrics": [
    {"name": <nickname/alias>, "field": <numeric field in the collection>, "agg": "avg"}
  ]
}
“groupby” lists the fields you want to split the data by. These are the available groupby types:
“category”: group by a field that is a category
“metrics” lists the metrics you want to calculate for each group. Every aggregation includes a frequency metric. These are the available metric types:
“average”, “max”, “min”, “sum”, “cardinality”
- Parameters
collection_name – Name of Collection
aggregation_query – Aggregation query to aggregate data
page_size – Size of each page of results
page – Page of the results
asc – Whether to sort results by ascending or descending order
flatten – Whether to flatten the aggregated results into a list of dictionaries or a dictionary of lists.
return_curl – Return the CURL statement
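The aggregation query schema above can also be assembled programmatically. The sketch below builds a query dict in that shape and shows the two result layouts that `flatten` chooses between. `build_aggregation_query`, `list_of_dicts_to_dict_of_lists` and the example fields (`country`, `price`) are hypothetical; the metric name "avg" is taken from the schema example, and which layout the API returns for each `flatten` value is not confirmed here.

```python
from typing import Dict, List


def build_aggregation_query(groupby_fields: List[str],
                            metrics: Dict[str, str]) -> Dict:
    """Build a query following the documented groupby/metrics schema.

    groupby_fields: fields to group by, each using the "category" groupby type.
    metrics: mapping of numeric field -> metric type, e.g. {"price": "avg"}.
    """
    return {
        "groupby": [{"name": field, "field": field, "agg": "category"}
                    for field in groupby_fields],
        "metrics": [{"name": f"{agg}_{field}", "field": field, "agg": agg}
                    for field, agg in metrics.items()],
    }


def list_of_dicts_to_dict_of_lists(rows: List[Dict]) -> Dict[str, List]:
    """Convert [{"a": 1}, {"a": 2}] into {"a": [1, 2]} (assumes uniform keys)."""
    keys = rows[0].keys() if rows else []
    return {key: [row[key] for row in rows] for key in keys}


query = build_aggregation_query(["country"], {"price": "avg"})

# The same hypothetical result in both layouts.
rows = [{"country": "US", "avg_price": 10.0}, {"country": "FR", "avg_price": 12.5}]
print(list_of_dicts_to_dict_of_lists(rows))
# {'country': ['US', 'FR'], 'avg_price': [10.0, 12.5]}
```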
- facets(collection_name: str, fields: List[str] = [], page: int = 1, page_size: int = 20, asc: bool = False, return_curl: bool = False, **kwargs)
Retrieve the facets of a collection
Takes a high-level aggregation of every field in a collection. This is used in advanced search to help create the filter bar for search.
- Parameters
fields – Fields to include in the facets; if [] then all fields are included
date_interval – Interval for date facets
page_size – Size of facet page
page – Page of the results
asc – Whether to sort results by ascending or descending order
collection_name – Name of Collection
- filters(collection_name: str, filters: List, page=1, page_size=10, include_vector: bool = False, return_curl: bool = False, **kwargs)
Filters a collection
Filters are used to retrieve documents that match the conditions set in a filter query. This is used in advanced search to filter the documents that are searched.
The filters query is a JSON body that follows the schema of:
[
  {'field' : <field to filter>, 'filter_type' : <type of filter>, "condition":"==", "condition_value":"america"},
  {'field' : <field to filter>, 'filter_type' : <type of filter>, "condition":">=", "condition_value":90},
]
These are the available filter_type types:
1. "contains": for filtering documents that contain a string. {'field' : 'category', 'filter_type' : 'contains', "condition":"==", "condition_value": "bluetoo"}
2. "exact_match"/"category": for filtering documents that match a string or list of strings exactly. {'field' : 'category', 'filter_type' : 'exact_match', "condition":"==", "condition_value": "tv"}
3. "categories": for filtering documents that contain any category from a list of categories. {'field' : 'category', 'filter_type' : 'categories', "condition":"==", "condition_value": ["tv", "smart", "bluetooth_compatible"]}
4. "exists": for filtering documents that contain a field. {'field' : 'purchased', 'filter_type' : 'exists', "condition":">=", "condition_value":" "}
5. "date": for filtering by date range. {'field' : 'insert_date_', 'filter_type' : 'date', "condition":">=", "condition_value":"2020-01-01"}
6. "numeric": for filtering by numeric range. {'field' : 'price', 'filter_type' : 'numeric', "condition":">=", "condition_value":90}
These are the available conditions:
“==”, “!=”, “>=”, “>”, “<”, “<=”
- Parameters
collection_name – Name of Collection
filters – Query for filtering the search results
page – Page of the results
page_size – Size of each page of results
asc – Whether to sort results by ascending or descending order
include_vector – Include vectors in the search results
return_curl – Return the CURL statement for debugging
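To make the filter semantics above concrete, here is a local approximation that applies a filters list to in-memory documents. It is a sketch, not the server's implementation: case-insensitive substring matching for "contains", plain string comparison of ISO-8601 dates for "date", "!=" negating the non-range filter types, and AND-ing all filters together are assumptions made for illustration. The sample documents are made up.

```python
import operator
from typing import Any, Dict, List

CONDITIONS = {
    "==": operator.eq, "!=": operator.ne,
    ">=": operator.ge, ">": operator.gt,
    "<": operator.lt, "<=": operator.le,
}


def matches(document: Dict[str, Any], f: Dict[str, Any]) -> bool:
    """Return True if a single document satisfies a single filter dict."""
    field, ftype = f["field"], f["filter_type"]
    cond, value = f["condition"], f["condition_value"]
    if ftype == "exists":
        return field in document
    if field not in document:
        return False
    actual = document[field]
    if ftype in ("date", "numeric"):
        # Range filters compare with the documented condition operator.
        return CONDITIONS[cond](actual, value)
    if ftype == "contains":
        hit = str(value).lower() in str(actual).lower()
    elif ftype in ("exact_match", "category"):
        targets = value if isinstance(value, list) else [value]
        hit = actual in targets
    elif ftype == "categories":
        actual_items = actual if isinstance(actual, list) else [actual]
        hit = any(item in value for item in actual_items)
    else:
        raise ValueError(f"Unknown filter_type: {ftype}")
    # Assumption: "!=" negates the match for the non-range filter types.
    return not hit if cond == "!=" else hit


def apply_filters(documents: List[Dict], filters: List[Dict]) -> List[Dict]:
    """Keep the documents that satisfy every filter (filters are AND-ed)."""
    return [doc for doc in documents if all(matches(doc, f) for f in filters)]


docs = [
    {"_id": "1", "category": "bluetooth speaker", "price": 120, "insert_date_": "2020-03-01"},
    {"_id": "2", "category": "tv", "price": 80, "insert_date_": "2019-12-31"},
    {"_id": "3", "category": "smart", "price": 95},
]
price_filter = [{"field": "price", "filter_type": "numeric", "condition": ">=", "condition_value": 90}]
print([d["_id"] for d in apply_filters(docs, price_filter)])  # ['1', '3']
```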
- job_status(collection_name: str, job_id: str, job_name: str, return_curl: bool = False, **kwargs)
Get the status of a job: whether it is starting, running, failed or finished.
- Parameters
job_id – ID of the job
job_name – Name of the job
collection_name – Name of Collection
- list_jobs(collection_name: str, return_curl: bool = False, **kwargs)
Get history of jobs
List the history of all jobs, including each job's job_id, parameters, start time, etc.
- Parameters
collection_name – Name of Collection
- bulk_missing_id(collection_name: str, document_ids: List[str], return_curl: bool = False, **kwargs)
Return IDs that are not in a collection.
- random_documents_with_filters(collection_name: str, seed: int = None, include_fields: List[str] = [], page_size: int = 20, include_vector: bool = False, filters: List[Dict] = [], return_curl: bool = False, **kwargs)
Retrieve random documents that match the given filters.
- vector_health(collection_name: str, return_curl: bool = False, **kwargs)
Return vector health of a collection
Read operations designed for Python
- class vectorai.read.ViReadClient(username: str, api_key: str, url: str = 'https://api.vctr.ai')
- random_aggregation_query(collection_name: str, groupby: int = 1, metrics: int = 1)
Generates a random aggregation query.
- Parameters
collection_name – name of collection
groupby – The number of groupbys to randomly generate
metrics – The number of metrics to randomly generate
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.random_aggregation_query(collection_name, groupby=1, metrics=1)
- search(collection_name: str, vector: List, field: List, filters: List = [], approx: int = 0, sum_fields: bool = True, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False, **kwargs)
Vector Similarity Search. Search a vector field with a vector, a.k.a. Nearest Neighbors Search
Enables machine learning search with vector search. Search with a vector for the most similar vectors.
For example, search with a person’s characteristics to find the most similar people (querying the “persons_characteristics_vector” field):
Query person's characteristics as a vector: [180, 40, 70] representing [height, age, weight]
Search Results:
[
  {"name": "Adam Levine", "persons_characteristics_vector": [180, 56, 71]},
  {"name": "Brad Pitt", "persons_characteristics_vector": [180, 56, 65]},
  ...
]
- Parameters
vector – Vector, a list/array of floats that represents a piece of data.
collection_name – Name of Collection
field – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores of multiple vector fields into one score or keep them separate
page_size – Size of each page of results
filters – Filters for search
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
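The metric options can be illustrated locally on the person example above. This sketch uses the common textbook definitions of cosine similarity, L1 (Manhattan) distance, L2 (Euclidean) distance and dot product (`dp`); the service may normalise or scale its scores differently (see `hundred_scale` elsewhere in this page), so treat it as an illustration rather than the service's scoring code. `nearest` is a hypothetical brute-force helper, not an SDK function.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def l1(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dp(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


# metric name -> (score function, whether a higher score means more similar)
METRICS = {"cosine": (cosine, True), "l1": (l1, False), "l2": (l2, False), "dp": (dp, True)}


def nearest(query: Sequence[float], documents: List[Dict], field: str,
            metric: str = "cosine", page_size: int = 10) -> List[Tuple[float, Dict]]:
    """Brute-force nearest neighbours: score every document, most similar first."""
    score_fn, higher_is_better = METRICS[metric]
    scored = [(score_fn(query, doc[field]), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=higher_is_better)
    return scored[:page_size]


query = [180, 40, 70]  # [height, age, weight]
people = [
    {"name": "Adam Levine", "persons_characteristics_vector": [180, 56, 71]},
    {"name": "Brad Pitt", "persons_characteristics_vector": [180, 56, 65]},
]
for score, person in nearest(query, people, "persons_characteristics_vector", metric="l2"):
    print(person["name"], round(score, 2))
```

Note that cosine and dot product are similarities (higher is closer) while L1 and L2 are distances (lower is closer), which is why the sketch flips the sort direction per metric.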
- random_filter_query(collection_name: str, text_filters: int = 1, numeric_filters: int = 0)
Generates a random filter query.
- Parameters
collection_name – name of collection
text_filters – The number of text filters to randomly generate
numeric_filters – The number of numeric filters to randomly generate
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.random_filter_query(collection_name, text_filters=1, numeric_filters=0)
- head(collection_name: str, page_size: int = 5, return_as_pandas_df: bool = True)
Preview a few documents from a collection, similar to pandas' DataFrame.head().
- Parameters
collection_name – The name of your collection
page_size – The number of results to return
return_as_pandas_df – If True, return as a pandas DataFrame rather than JSON.
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.head(collection_name, page_size=10)
- retrieve_all_documents(collection_name: str, sort: List = [], asc: bool = True, include_vector: bool = True, include_fields: List = [], retrieve_chunk_size: int = 1000, **kwargs)
Retrieve all documents in a given collection. We recommend specifying which fields to extract with include_fields, as otherwise this function may take a long time to run.
- Parameters
collection_name – Name of collection.
sort – Fields to sort the documents by.
asc – If True, return documents in ascending order of the sort fields.
include_vector – If True, include the _vector_ fields in the returned documents.
include_fields – Adjust which fields are returned.
retrieve_chunk_size – The number of documents to retrieve per request.
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> all_documents = vi_client.retrieve_all_documents(collection_name)
- wait_till_jobs_complete(collection_name: str, job_id: str, job_name: str)
Wait until a specific job is complete.
- Parameters
collection_name – Name of collection.
job_id – ID of the job.
job_name – Name of the job.
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> job = vi_client.dimensionality_reduction_job('nba_season_per_36_stats_demo', vector_field='season_vector_', n_components=2)
>>> vi_client.wait_till_jobs_complete('nba_season_per_36_stats_demo', **job)
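Waiting for a job presumably amounts to polling `job_status` until the job finishes or fails. The sketch below shows such a polling loop with a timeout added so that a stuck job cannot block forever. It is an illustration under assumptions: `wait_for_job` is hypothetical, the status response is assumed to be a dict with a `"status"` key, and the terminal values "finished" and "failed" are inferred from the states listed for `job_status`, not confirmed strings.

```python
import time
from typing import Callable, Dict


def wait_for_job(get_status: Callable[[], Dict], poll_interval: float = 5.0,
                 timeout: float = 600.0) -> Dict:
    """Poll get_status() until the job reaches a terminal state or the timeout expires.

    get_status is any zero-argument callable returning a status dict, e.g. a
    hypothetical lambda: client.job_status(collection_name, job_id, job_name).
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        # Assumption: terminal states are reported as "finished" or "failed".
        if str(status.get("status", "")).lower() in ("finished", "failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still not complete after {timeout} seconds")
        time.sleep(poll_interval)


# Usage with a fake status source that finishes on the third poll.
states = iter([{"status": "Starting"}, {"status": "Running"}, {"status": "Finished"}])
print(wait_for_job(lambda: next(states), poll_interval=0, timeout=5))
```

Passing the status lookup in as a callable keeps the loop independent of any particular client object and makes it easy to test.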
- check_schema(collection_name: str, document: Optional[Dict] = None)
Check the schema of a given collection.
- Parameters
collection_name – Name of collection.
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.check_schema(collection_name)
- list_collections() → List[str]
List Collections
- Parameters
username – Username
api_key – API key; you can request one from request_api_key
- Returns
List of collections
Example
>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.list_collections()
- search_collections(keyword: str) → List[str]
Performs keyword matching on collection names.
- Parameters
keyword – Matches based on keywords
- Returns
List of collection names
Example
>>> from vectorai import ViClient
>>> vi_client = ViClient()
>>> vi_client.search_collections('example')
- random_recommendation(collection_name: str, search_field: str, seed=None, sum_fields: bool = True, metric: str = 'cosine', min_score=0, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, approx: int = 0, hundred_scale=True, asc: bool = False, **kwargs)
Recommend by random ID using vector search
- Parameters
collection_name – Name of Collection
search_field – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores of multiple vector fields into one score or keep them separate
page_size – Size of each page of results
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
hundred_scale – Whether to scale up the metric by 100
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
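Conceptually, a random recommendation can be read as: pick a document at random (reproducibly when `seed` is set), then run a nearest-neighbour search with that document's vector. The in-memory sketch below illustrates that reading only. The helper is hypothetical, and excluding the chosen document from its own results and multiplying the cosine score by 100 when `hundred_scale` is true are choices made here, not confirmed behaviour of the API.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def random_recommendation_sketch(documents: List[Dict], search_field: str,
                                 seed: Optional[int] = None, page_size: int = 10,
                                 hundred_scale: bool = True
                                 ) -> Tuple[Dict, List[Tuple[float, Dict]]]:
    """Pick a random anchor document, then rank the others by similarity to it."""
    rng = random.Random(seed)  # the same seed always picks the same anchor
    anchor = rng.choice(documents)
    scale = 100 if hundred_scale else 1
    scored = [(scale * cosine(anchor[search_field], doc[search_field]), doc)
              for doc in documents if doc["_id"] != anchor["_id"]]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return anchor, scored[:page_size]


docs = [
    {"_id": "a", "item_vector_": [1.0, 0.0]},
    {"_id": "b", "item_vector_": [0.9, 0.1]},
    {"_id": "c", "item_vector_": [0.0, 1.0]},
]
anchor, recommendations = random_recommendation_sketch(docs, "item_vector_", seed=42)
print(anchor["_id"], [(round(score, 1), doc["_id"]) for score, doc in recommendations])
```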
- create_filter_query(collection_name: str, field: str, filter_type: str, filter_values: Optional[Union[List[str], str]] = None)
Filter type can be one of contains/exact_match/categories/exists/insert_date/numeric_range. The filter types behave as follows:
contains: Field must contain this specific string. Not case sensitive.
exact_match: Field must have an exact match.
categories: Matches entire field.
exists: Field must exist in the document.
>= / > / < / <=: Larger than or equal to / Larger than / Smaller than / Smaller than or equal to. These can only be applied to numeric/date values; check collection_schema.
- Parameters
collection_name – The name of the collection
field – The field to filter on
filter_type – One of contains/exact_match/categories/>=/>/<=/<.
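A local sketch of the kind of dict such a helper could produce, following the filter schema documented under `filters`. `make_filter` is hypothetical and is not the library's code: mapping the comparison operators to `filter_type` "numeric" (date fields would need "date" instead), using "==" as the condition for the other types, and the `" "` placeholder value for "exists" (copied from the `filters` example) are all assumptions.

```python
from typing import Dict, List, Optional, Union

COMPARISONS = {">=", ">", "<", "<="}
MATCH_TYPES = {"contains", "exact_match", "categories", "exists"}


def make_filter(field: str, filter_type: str,
                filter_values: Optional[Union[List[str], str, float]] = None) -> Dict:
    """Build one filter dict in the documented filters schema (hypothetical helper)."""
    if filter_type in COMPARISONS:
        # Assumption: comparison operators become a numeric range filter.
        return {"field": field, "filter_type": "numeric",
                "condition": filter_type, "condition_value": filter_values}
    if filter_type in MATCH_TYPES:
        # "exists" needs no value; the filters example passes a single space.
        value = " " if filter_type == "exists" else filter_values
        return {"field": field, "filter_type": filter_type,
                "condition": "==", "condition_value": value}
    raise ValueError(f"Unsupported filter_type: {filter_type!r}")


query_filters = [
    make_filter("price", ">=", 90),
    make_filter("category", "categories", ["tv", "smart"]),
]
print(query_filters)
```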
- search_with_filters(collection_name: str, vector: List, field: List, filters: List = [], approx: int = 0, sum_fields: bool = True, metric: str = 'cosine', min_score=None, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False, **kwargs)
Vector Similarity Search. Search a vector field with a vector, a.k.a. Nearest Neighbors Search
Enables machine learning search with vector search. Search with a vector for the most similar vectors.
For example, search with a person’s characteristics to find the most similar people (querying the “persons_characteristics_vector” field):
Query person's characteristics as a vector: [180, 40, 70] representing [height, age, weight]
Search Results:
[
  {"name": "Adam Levine", "persons_characteristics_vector": [180, 56, 71]},
  {"name": "Brad Pitt", "persons_characteristics_vector": [180, 56, 65]},
  ...
]
- Parameters
vector – Vector, a list/array of floats that represents a piece of data.
collection_name – Name of Collection
field – Vector fields to search through
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores of multiple vector fields into one score or keep them separate
page_size – Size of each page of results
filters – Filters for search
page – Page of the results
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
- hybrid_search_with_filters(collection_name: str, text: str, vector: List, fields: List, text_fields: List, filters: List = [], sum_fields: bool = True, metric: str = 'cosine', min_score=None, traditional_weight=0.075, page: int = 1, page_size: int = 10, include_vector: bool = False, include_count: bool = True, asc: bool = False, **kwargs)
Search a text field with vector and text using Vector Search and Traditional Search
Vector similarity search + Traditional Fuzzy Search with text and vector.
You can also give weightings of each vector field towards the search, e.g. image_vector_ weights 100%, whilst description_vector_ 50%.
Hybrid search with filters also supports filtering, so that only the filtered results are searched, and facets, which give an overview of the products available when a minimum score is set.
- Parameters
collection_name – Name of Collection
page – Page of the results
page_size – Size of each page of results
approx – Used for approximate search
sum_fields – Whether to sum the similarity scores of multiple vector fields into one score or keep them separate
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
filters – Query for filtering the search results
min_score – Minimum score for similarity metric
include_vector – Include vectors in the search results
include_count – Include count in the search results
include_facets – Include facets in the search results
hundred_scale – Whether to scale up the metric by 100
multivector_query – Query for advance search that allows for multiple vector and field querying
text – Text Search Query (not encoded as vector)
text_fields – Text fields to search against
traditional_weight – Multiplier of traditional search. A value of 0.025~0.1 is good.
fuzzy – Fuzziness of the search. A value of 1-3 is good.
join – Whether to consider cases where there is a space in the word, e.g. Go Pro vs GoPro.
asc – Whether to sort the score by ascending order (default is false, for getting most similar results)
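One plausible reading of `traditional_weight` ("multiplier of traditional search") together with the per-field weightings described above is a weighted sum: each vector field's similarity is multiplied by its weight, and the traditional text score is multiplied by `traditional_weight` before being added. The sketch below only illustrates that assumed formula with made-up scores; `hybrid_score` is hypothetical and the service's actual scoring may differ.

```python
from typing import Dict


def hybrid_score(vector_scores: Dict[str, float], field_weights: Dict[str, float],
                 text_score: float, traditional_weight: float = 0.075) -> float:
    """Combine per-field vector similarities with a traditional text score.

    Assumed formula: sum(weight_f * score_f) + traditional_weight * text_score.
    Fields without an explicit weight default to 1.0.
    """
    vector_part = sum(field_weights.get(field, 1.0) * score
                      for field, score in vector_scores.items())
    return vector_part + traditional_weight * text_score


# image_vector_ weighted 100%, description_vector_ weighted 50%, as in the text above.
score = hybrid_score(
    vector_scores={"image_vector_": 0.8, "description_vector_": 0.6},
    field_weights={"image_vector_": 1.0, "description_vector_": 0.5},
    text_score=4.0,
)
print(round(score, 3))
```

Under this reading, a small `traditional_weight` such as 0.025 to 0.1 keeps raw text-match scores, which can be much larger than similarity scores, from dominating the vector part.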