Write

This is documentation for the Write API for Vector AI.

Write operations for Vi that involve inserting, editing or deleting documents.

class vectorai.write.ViWriteClient(username, api_key, url='https://api.vctr.ai')

Bases: vectorai.api.client.ViAPIClient, vectorai.utils.UtilsMixin

Class for writing to the database.

static chunk(documents: Union[pandas.core.frame.DataFrame, List], chunk_size: int = 20)

Split documents (a list or a pandas DataFrame) into chunks of chunk_size items.

Parameters

documents – List of dictionaries or a pandas DataFrame
chunk_size – The number of items in each chunk.

Example

>>> documents = [{...}]
>>> ViClient.chunk(documents)
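The chunking idea can be sketched in plain Python as follows. This is a minimal sketch assuming chunks are consecutive slices of at most `chunk_size` items; `chunk_documents` is a hypothetical helper, not the library's implementation, and it only handles lists (a DataFrame would be sliced by rows instead).

```python
from typing import Iterator, List


def chunk_documents(documents: List[dict], chunk_size: int = 20) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most chunk_size documents (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    for start in range(0, len(documents), chunk_size):
        yield documents[start:start + chunk_size]


# Usage: 45 documents with chunk_size=20 give chunks of 20, 20 and 5.
documents = [{"_id": str(i)} for i in range(45)]
sizes = [len(chunk) for chunk in chunk_documents(documents, chunk_size=20)]
print(sizes)  # [20, 20, 5]
```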

static dummy_vector(vector_length)

Create a dummy vector to stand in for missing vector fields.

Parameters: vector_length – Length of the dummy vector.

Example

>>> from vectorai.client import ViClient
>>> dummy_vector = ViClient.dummy_vector(20)
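A minimal sketch of how a dummy vector might be used to pad documents that lack a vector field. `make_dummy_vector` and `fill_missing_vectors` are hypothetical helpers, and the small constant fill value is an assumption (the library may use a different value); a non-zero value is chosen here because cosine similarity is undefined for an all-zero vector.

```python
from typing import Dict, List


def make_dummy_vector(vector_length: int, fill_value: float = 1e-7) -> List[float]:
    """Return a constant placeholder vector (hypothetical; the fill value is an assumption)."""
    return [fill_value] * vector_length


def fill_missing_vectors(documents: List[Dict], vector_field: str, vector_length: int) -> List[Dict]:
    """Add a placeholder vector to every document that lacks vector_field."""
    for doc in documents:
        if vector_field not in doc:
            doc[vector_field] = make_dummy_vector(vector_length)
    return documents


docs = [{"image_vector_": [0.1, 0.2, 0.3]}, {"name": "no vector yet"}]
fill_missing_vectors(docs, "image_vector_", 3)
print(docs[1]["image_vector_"])  # [1e-07, 1e-07, 1e-07]
```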

static set_field(field: str, doc: Dict, value: Any, handle_if_missing=True)

For nested dictionaries, tries to write to the respective field. If you set handle_if_missing to False, it will output errors if the field is not found. For example, with field = kfc.item and value = “fried wings”, the following entries are created if they don't exist:

{
    "kfc": {
        "item": "fried wings"
    }
}

Parameters

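field – Field of the document to write, using dot notation for nested fields (e.g. kfc.item).
doc – Python dictionary
value – Value to write
handle_if_missing – If True (the default), create missing fields; if False, output an error when the field is not found.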

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> sample_document = {'kfc': {'item': ''}}
>>> vi_client.set_field('kfc.item', sample_document, 'chickens')
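The dotted-path behaviour described above can be sketched as a hypothetical re-implementation, `set_nested_field`. Raising `KeyError` when `handle_if_missing` is False is an assumption about how the error is reported.

```python
from typing import Any, Dict


def set_nested_field(field: str, doc: Dict, value: Any, handle_if_missing: bool = True) -> Dict:
    """Write value at a dotted path such as 'kfc.item' (hypothetical re-implementation)."""
    keys = field.split(".")
    current = doc
    # Walk (and, if allowed, create) every intermediate dictionary
    for key in keys[:-1]:
        if key not in current:
            if not handle_if_missing:
                raise KeyError(f"Field '{key}' not found while setting '{field}'")
            current[key] = {}
        current = current[key]
    last = keys[-1]
    if last not in current and not handle_if_missing:
        raise KeyError(f"Field '{last}' not found while setting '{field}'")
    current[last] = value
    return doc


print(set_nested_field("kfc.item", {}, "fried wings"))  # {'kfc': {'item': 'fried wings'}}
```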

create_collection(collection_name: str, collection_schema: Dict = {}, **kwargs)

Create a collection.

A collection can store documents to be searched, retrieved, filtered and aggregated (similar to Collections in MongoDB, Tables in SQL, Indexes in ElasticSearch).

If you are inserting your own vector, use the suffix (ends with) “_vector_” for the field name and specify the length of the vector in collection_schema, as in the example below:

{
    "collection_schema": {
        "celebrity_image_vector_": 1024,
        "celebrity_audio_vector_": 512,
        "product_description_vector_": 128
    }
}

Parameters

collection_name – Name of a collection
collection_schema – A collection schema. This is necessary if the first document is not representative of the schema of the overall collection, and it should be specified if the items need to be edited. The schema needs to look like this: { vector_field_name: vector_length }

Example

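>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> collection_schema = {'image_vector_': 2048}
>>> vi_client.create_collection(collection_name, collection_schema)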
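A schema of the form `{vector_field_name: vector_length}` can be checked locally before creating a collection. The helper below, `validate_collection_schema`, is hypothetical and only enforces the two rules stated above: the `_vector_` suffix and a positive integer length.

```python
from typing import Dict


def validate_collection_schema(collection_schema: Dict[str, int]) -> None:
    """Check a {vector_field_name: vector_length} schema (hypothetical helper)."""
    for field, length in collection_schema.items():
        if not field.endswith("_vector_"):
            raise ValueError(f"Vector field '{field}' should end with '_vector_'")
        if not isinstance(length, int) or length <= 0:
            raise ValueError(f"Vector length for '{field}' must be a positive integer")


validate_collection_schema({"image_vector_": 2048})  # passes silently
try:
    validate_collection_schema({"celebrity_audio_vector": 512})
except ValueError as err:
    print(err)  # Vector field 'celebrity_audio_vector' should end with '_vector_'
```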

delete_collection(collection_name: str, **kwargs)

Delete a collection by its name.

Parameters: collection_name – Name of collection to delete.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> vi_client.delete_collection(collection_name)

encode_documents_with_models_using_encode(documents: List[Dict], models: Dict)

Encode documents with appropriate models without a bulk_encode function.

Parameters

documents – List of documents/JSONs/dictionaries.
models – A dictionary of fields and models to determine the type of encoding for each field.

encode_documents_with_models(documents: List[Dict], models: Union[Dict[str, Callable], List[Dict]] = {}, use_bulk_encode=False)

Encode documents with appropriate models.

Parameters

documents – List of documents/jsons/dictionaries.
models – A dictionary of fields and models to determine the type of encoding for each field.
use_bulk_encode – Whether to use the bulk_encode method of the models.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> from vectorai.models.deployed import ViText2Vec
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> documents = [{'chicken': 'Big chicken'}, {'chicken': 'small_chicken'}, {'chicken': 'cow'}]
>>> vi_client.encode_documents_with_models(documents=documents, models={'chicken': text_encoder.encode})
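Conceptually, each field listed in `models` is passed through its encoder and the result is stored as a vector field. In this sketch, `encode_with_models` and `toy_text_encoder` are hypothetical, and the output name `<field>_vector_` is an assumption; the library may name the vector fields differently.

```python
from typing import Callable, Dict, List


def encode_with_models(documents: List[Dict], models: Dict[str, Callable]) -> List[Dict]:
    """Encode each mapped field once per document (hypothetical sketch).

    The output name '<field>_vector_' is an assumption for illustration only.
    """
    for doc in documents:
        for field, encode in models.items():
            if field in doc:
                doc[f"{field}_vector_"] = encode(doc[field])
    return documents


def toy_text_encoder(text: str) -> List[float]:
    # Stand-in for a real encoder such as ViText2Vec.encode
    return [float(len(text)), float(text.count(" "))]


docs = encode_with_models([{"chicken": "Big chicken"}, {"chicken": "cow"}], {"chicken": toy_text_encoder})
print(docs[0]["chicken_vector_"])  # [11.0, 1.0]
```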

encode_documents_with_models_in_bulk(documents: List[Dict], models: Dict)

Encode documents with models that provide a bulk_encode method.

Parameters

documents – List of documents/jsons/dictionaries.
models – A dictionary of fields and models to determine the type of encoding for each field.
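The difference from per-document encoding is that a bulk encoder receives a whole list of values in one call. This sketch assumes a bulk encoder takes a list and returns one vector per item; `encode_in_bulk`, `toy_bulk_encoder` and the `<field>_vector_` output name are hypothetical.

```python
from typing import Callable, Dict, List


def encode_in_bulk(documents: List[Dict], models: Dict[str, Callable[[List], List]]) -> List[Dict]:
    """Encode each field with one batched call per field (hypothetical sketch)."""
    for field, bulk_encode in models.items():
        targets = [doc for doc in documents if field in doc]
        vectors = bulk_encode([doc[field] for doc in targets])
        for doc, vector in zip(targets, vectors):
            doc[f"{field}_vector_"] = vector  # output field name is an assumption
    return documents


calls = []


def toy_bulk_encoder(texts: List[str]) -> List[List[float]]:
    calls.append(len(texts))  # record the batch size of each call
    return [[float(len(t))] for t in texts]


docs = encode_in_bulk(
    [{"chicken": "Big chicken"}, {"chicken": "cow"}, {"other": 1}],
    {"chicken": toy_bulk_encoder},
)
print(calls)  # [2]
```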

insert_document(collection_name: str, document: Dict, verbose=False)

Insert a document into a collection.

Parameters

collection_name – Name of collection
document – A document (JSON-like dictionary) to insert.

Example

>>> from vectorai import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> collection_name = 'test_collection'
>>> document = {'chicken': 'Big chicken'}
>>> vi_client.insert_document(collection_name, document)

insert_single_document(collection_name: str, document: Dict)

Insert a single document into a collection.

Parameters

collection_name – Name of collection
document – A document (JSON-like dictionary) to insert.

Example

>>> from vectorai import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> collection_name = 'test_collection'
>>> document = {'chicken': 'Big chicken'}
>>> vi_client.insert_single_document(collection_name, document)

insert_documents(collection_name: str, documents: List, models: Dict[str, Callable] = {}, chunksize: int = 15, workers: int = 1, verbose: bool = False, use_bulk_encode: bool = False, overwrite: bool = False, show_progress_bar: bool = True, quick: bool = False, preprocess_hook: Optional[Callable] = None, **kwargs)

Insert documents into a collection with an option to encode with models.

Parameters

collection_name – Name of collection
documents – The documents to insert.
models – A dictionary mapping fields to models' encode functions
chunksize – The number of documents to process in each chunk.
workers – Number of parallel processes to run.
use_bulk_encode – Use the bulk_encode method in models
verbose – Whether to print document ids that have failed when inserting.
overwrite – If True, overwrites documents based on the _id field.
show_progress_bar – Whether to show a progress bar.
quick – If True, skip the collection schema checks. Not advised until you are familiar with Vector AI.
preprocess_hook – Optional document-level function that updates each document before it is inserted.

Example

>>> from vectorai.client import ViClient
>>> from vectorai.models.deployed import ViText2Vec
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> documents = [{'chicken': 'Big chicken'}, {'chicken': 'small_chicken'}, {'chicken': 'cow'}]
>>> vi_client.insert_documents(collection_name, documents, models={'chicken': text_encoder.encode})
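The chunking and `preprocess_hook` parameters can be pictured with the following sketch. `insert_in_chunks`, `fake_send_batch` and `lowercase_hook` are hypothetical; `send_batch` stands in for the real insert request and is assumed to return the ids of documents that failed, which is what `verbose` would print.

```python
from typing import Callable, Dict, List, Optional


def insert_in_chunks(
    documents: List[Dict],
    send_batch: Callable[[List[Dict]], List[str]],
    chunksize: int = 15,
    preprocess_hook: Optional[Callable[[Dict], Dict]] = None,
) -> List[str]:
    """Send documents in chunks and collect failed ids (hypothetical sketch)."""
    failed: List[str] = []
    for start in range(0, len(documents), chunksize):
        chunk = documents[start:start + chunksize]
        if preprocess_hook is not None:
            # Apply the document-level hook before the chunk is sent
            chunk = [preprocess_hook(doc) for doc in chunk]
        failed.extend(send_batch(chunk))
    return failed


batches = []


def fake_send_batch(chunk):
    batches.append(len(chunk))  # record each chunk size
    return [doc["_id"] for doc in chunk if not doc.get("chicken")]  # empty text "fails"


def lowercase_hook(doc):
    return {**doc, "chicken": doc.get("chicken", "").lower()}


docs = [{"_id": str(i), "chicken": "Big chicken" if i % 10 else ""} for i in range(32)]
failed = insert_in_chunks(docs, fake_send_batch, chunksize=15, preprocess_hook=lowercase_hook)
print(batches, failed)  # [15, 15, 2] ['0', '10', '20', '30']
```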

resume_insert_documents(collection_name: str, documents: List, models: Dict[str, Callable] = {}, chunksize: int = 15, workers: int = 1, verbose: bool = False, use_bulk_encode: bool = False, show_progress_bar: bool = True)

Resume inserting documents.

insert_df(collection_name: str, df: pandas.core.frame.DataFrame, models: Dict[str, Callable] = {}, chunksize: int = 15, workers: int = 1, verbose: bool = True, use_bulk_encode: bool = False, **kwargs)

Insert a pandas DataFrame into a collection.

Parameters

collection_name – Name of collection
df – Pandas DataFrame
models – Models with an encode method
verbose – Whether to print document ids that have failed when inserting.

Example

>>> import pandas as pd
>>> from vectorai.client import ViClient
>>> from vectorai.models.deployed import ViText2Vec
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> text_encoder = ViText2Vec(username, api_key, vectorai_url)
>>> documents_df = pd.DataFrame.from_records([{'chicken': 'Big chicken'}, {'chicken': 'small_chicken'}, {'chicken': 'cow'}])
>>> vi_client.insert_df(collection_name, documents_df, models={'chicken': text_encoder.encode})
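Inserting a DataFrame amounts to turning each row into a document. This sketch assumes `insert_df` converts rows to dictionaries before chunking; `dataframe_to_document_chunks` is hypothetical, while `DataFrame.to_dict(orient="records")` is standard pandas. Missing values (NaN) are not valid JSON and may need handling before insertion.

```python
import pandas as pd


def dataframe_to_document_chunks(df: pd.DataFrame, chunksize: int = 15):
    """Convert DataFrame rows to dictionaries and group them into chunks (hypothetical sketch)."""
    records = df.to_dict(orient="records")  # one dict per row, keyed by column name
    return [records[start:start + chunksize] for start in range(0, len(records), chunksize)]


documents_df = pd.DataFrame.from_records(
    [{"chicken": "Big chicken"}, {"chicken": "small_chicken"}, {"chicken": "cow"}]
)
chunks = dataframe_to_document_chunks(documents_df, chunksize=2)
print(chunks)
# [[{'chicken': 'Big chicken'}, {'chicken': 'small_chicken'}], [{'chicken': 'cow'}]]
```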

edit_documents(collection_name: str, edits: Dict, chunk_size: int = 15, verbose: bool = False, **kwargs)

Edit documents in a collection.

Parameters

collection_name – Name of collection
edits – What edits to make in a collection. Ensure that each edited document includes its _id field.
workers – Number of parallel processes to run.

Example

>>> from vectorai.client import ViClient
>>> vi_client = ViClient(username, api_key, vectorai_url)
>>> documents = [{'_id': '1', 'chicken': 'Bigger chicken'}]
>>> vi_client.edit_documents(collection_name, edits=documents, workers=10)
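Because every edit must carry its `_id`, it can help to check the edits before sending them. The hypothetical helper `chunk_edits` below validates `_id` and groups edits by `chunk_size`; it does not call the API.

```python
from typing import Dict, List


def chunk_edits(edits: List[Dict], chunk_size: int = 15) -> List[List[Dict]]:
    """Check that every edit carries an _id, then group edits into chunks (hypothetical sketch)."""
    missing = [i for i, doc in enumerate(edits) if "_id" not in doc]
    if missing:
        raise ValueError(f"Edits at positions {missing} have no '_id' field")
    return [edits[start:start + chunk_size] for start in range(0, len(edits), chunk_size)]


edits = [
    {"_id": "1", "chicken": "Bigger chicken"},
    {"_id": "2", "chicken": "Tiny chicken"},
    {"_id": "3", "chicken": "cow"},
]
print([len(chunk) for chunk in chunk_edits(edits, chunk_size=2)])  # [2, 1]
```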

retrieve_and_encode(collection_name: str, models: Dict[str, Callable] = {}, chunksize: int = 15, use_bulk_encode: bool = False, filters: list = [], refresh: bool = False)

Retrieve all documents and re-encode them with new models.

Parameters

collection_name – Name of collection
models – Models as a dictionary
chunksize – The number of results to retrieve, encode and then edit in one go.
use_bulk_encode – Whether to use bulk_encode on the models.
filters – Filtering
refresh – If True, retrieves and encodes from scratch; otherwise, only encodes fields that are not there. Only the filter for the first model is applied.
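The `refresh` flag can be read as choosing which retrieved documents to encode. This is a minimal sketch assuming the encoded field is stored as `<field>_vector_`; `select_documents_to_encode` is hypothetical.

```python
from typing import Dict, List


def select_documents_to_encode(documents: List[Dict], field: str, refresh: bool = False) -> List[Dict]:
    """Pick the documents that need (re-)encoding for field (hypothetical sketch)."""
    vector_field = f"{field}_vector_"  # assumed naming convention
    if refresh:
        return list(documents)  # encode everything from scratch
    return [doc for doc in documents if vector_field not in doc]


retrieved = [
    {"_id": "1", "chicken": "Big chicken", "chicken_vector_": [0.5, 0.1]},
    {"_id": "2", "chicken": "cow"},
]
print([doc["_id"] for doc in select_documents_to_encode(retrieved, "chicken")])  # ['2']
print(len(select_documents_to_encode(retrieved, "chicken", refresh=True)))  # 2
```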

retrieve_and_edit(collection_name: str, edit_fn: Callable, refresh: bool = False, edited_fields: list = [], include_fields: list = [], chunksize: int = 15)

Retrieve all documents and edit them with edit_fn.

Parameters

collection_name – Name of collection
edit_fn – Function for editing an entire document
include_fields – The fields to retrieve, to speed up the document retrieval step.
chunksize – The number of results to retrieve, edit and then write back in one go.
edited_fields – These are the edited fields used to change
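A minimal sketch of the retrieve, edit and write-back loop, assuming documents arrive in pages of `chunksize`. `edit_in_chunks` and `add_length` are hypothetical, and `send_edits` stands in for the request that writes the edited documents back.

```python
from typing import Callable, Dict, Iterable, List


def edit_in_chunks(
    pages: Iterable[List[Dict]],
    edit_fn: Callable[[Dict], Dict],
    send_edits: Callable[[List[Dict]], None],
) -> int:
    """Apply edit_fn to every document, one retrieved page at a time (hypothetical sketch)."""
    edited = 0
    for page in pages:
        # Copy each document so the retrieved page itself is left unchanged
        edits = [edit_fn(dict(doc)) for doc in page]
        send_edits(edits)
        edited += len(edits)
    return edited


def add_length(doc: Dict) -> Dict:
    # Example edit_fn: keep _id and add a derived field
    doc["chicken_length"] = len(doc["chicken"])
    return doc


pages = [[{"_id": "1", "chicken": "Big chicken"}], [{"_id": "2", "chicken": "cow"}]]
sent = []
total = edit_in_chunks(pages, add_length, sent.append)
print(total)  # 2
```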

create_collection_from_document(collection_name: str, document: dict, **kwargs)

Creates a collection by inferring the schema from a document.

If you are inserting your own vector use the suffix (ends with) “_vector_” for the field name. e.g. “product_description_vector_”

Parameters

collection_name – Name of Collection
document – A document is JSON-like data in which we store metadata and vectors. To specify the id of the document, use the field ‘_id’; to specify a vector field, use the suffix ‘_vector_’.
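Inferring a schema from a document can be pictured as mapping each `_vector_` field to the length of its vector. `infer_vector_schema` is a hypothetical sketch that only inspects top-level fields; the server's actual inference rules may differ.

```python
from typing import Any, Dict


def infer_vector_schema(document: Dict[str, Any]) -> Dict[str, int]:
    """Map each top-level '_vector_' field to its length (hypothetical sketch)."""
    return {
        field: len(value)
        for field, value in document.items()
        if field.endswith("_vector_") and isinstance(value, (list, tuple))
    }


document = {
    "_id": "1",
    "product_description": "Red shoe",
    "product_description_vector_": [0.1, 0.2, 0.3],
}
print(infer_vector_schema(document))  # {'product_description_vector_': 3}
```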

class vectorai.api.write.ViWriteAPIClient(username, api_key, url=None)

Bases: vectorai.api.read.ViReadAPIClient

Write Operations

create_collection_from_document(collection_name: str, document: dict, return_curl: bool = False, **kwargs)

Creates a collection by inferring the schema from a document.

If you are inserting your own vector use the suffix (ends with) “_vector_” for the field name. e.g. “product_description_vector_”

Parameters

collection_name – Name of Collection
document – A document is JSON-like data in which we store metadata and vectors. To specify the id of the document, use the field ‘_id’; to specify a vector field, use the suffix ‘_vector_’.

bulk_insert_and_encode(collection_name: str, docs: list, models: dict, return_curl: bool = False, **kwargs)

Client-side encoding of documents to improve the speed of inserting. This removes the step of retrieving the vectors and can be useful to accelerate the encoding process if required. Models can be one of ‘text’, ‘audio’ or ‘image’.

bulk_insert(collection_name: str, documents: List, insert_date: bool = True, overwrite: bool = True, quick: bool = False, return_curl: bool = False, **kwargs)

Insert multiple documents into a Collection. When inserting the documents, you can specify your own id for a document by using the field name “_id”. To specify your own vector, use the suffix (ends with) “_vector_” for the field name, e.g. “product_description_vector_”.

Parameters

collection_name – Name of Collection
documents – A list of documents. A document is JSON-like data in which we store metadata and vectors. To specify the id of a document, use the field ‘_id’; to specify a vector field, use the suffix ‘_vector_’.
insert_date – Whether to include the insert date as a field ‘insert_date_’.
overwrite – Whether to overwrite a document if it already exists.
quick – If True, skips collection schema checks.
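The `_id` and `overwrite` semantics can be simulated with an in-memory dictionary keyed by `_id`. `bulk_insert_into_store` is hypothetical; generating a UUID for documents without an `_id`, and skipping (rather than failing) existing ids when `overwrite` is False, are assumptions for illustration.

```python
import uuid
from typing import Dict, List


def bulk_insert_into_store(store: Dict[str, Dict], documents: List[Dict], overwrite: bool = True) -> List[str]:
    """In-memory stand-in for the _id/overwrite behaviour (hypothetical sketch).

    Returns the ids that were skipped because they already existed.
    """
    skipped = []
    for doc in documents:
        # Keep a caller-supplied _id; otherwise generate one (assumption)
        doc_id = doc.setdefault("_id", str(uuid.uuid4()))
        if doc_id in store and not overwrite:
            skipped.append(doc_id)
            continue
        store[doc_id] = doc
    return skipped


collection = {}
bulk_insert_into_store(collection, [{"_id": "kfc-1", "item": "fried wings"}])
print(bulk_insert_into_store(collection, [{"_id": "kfc-1", "item": "popcorn chicken"}], overwrite=False))  # ['kfc-1']
print(collection["kfc-1"]["item"])  # fried wings
```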

insert(collection_name: str, document: Dict, insert_date: bool = True, overwrite: bool = True, return_curl: bool = False, **kwargs)

Insert a document into a Collection. When inserting the document, you can specify your own id for it by using the field name “_id”. To specify your own vector, use the suffix (ends with) “_vector_” for the field name, e.g. “product_description_vector_”.

Parameters

collection_name – Name of Collection
document – A document is JSON-like data in which we store metadata and vectors. To specify the id of the document, use the field ‘_id’; to specify a vector field, use the suffix ‘_vector_’.
insert_date – Whether to include the insert date as a field ‘insert_date_’.
overwrite – Whether to overwrite a document if it already exists.

bulk_edit_document(collection_name: str, documents: List[Dict], return_curl: bool = False, **kwargs)

Edits documents by providing a key-value pair of fields you are adding or changing. Make sure to include the “_id” in the documents.

Parameters

collection_name – Name of collection
documents – A list of documents. A document is JSON-like data in which we store metadata and vectors. To specify the id of a document, use the field ‘_id’; to specify a vector field, use the suffix ‘_vector_’.

delete_by_id(collection_name: str, document_id: str, return_curl: bool = False, **kwargs)

Delete a document in a Collection by its id

Parameters

document_id – ID of a document
collection_name – Name of Collection

publish_aggregation(collection_name: str, aggregation_query: dict, aggregation_name: str, aggregated_collection_name: str, description: str = 'published aggregation', date_field: str = 'insert_date_', refresh_time: int = 30, start_immediately: bool = True, return_curl: bool = False, **kwargs)

Publish and schedule your aggregation query, saving its results to a new collection. This new collection is just like any other collection: you can read, filter and aggregate it.

Parameters

collection_name – The collection where the data to aggregate comes from
aggregated_collection_name – The name of the collection where the aggregated data will be written
aggregation_name – The name for the published scheduled aggregation
description – The description for the published scheduled aggregation
aggregation_query – The aggregation query to schedule
date_field – The date field to check whether there is new data coming in
refresh_time – How often the aggregation should check for new data
start_immediately – Whether to start the published aggregation immediately

start_aggregation(aggregation_name: str, return_curl: bool = False, **kwargs)

Start or resume your published aggregation. The published aggregation can be stopped with /stop_aggregation.

Parameters: aggregation_name – The name for the published scheduled aggregation

stop_aggregation(aggregation_name: str, return_curl: bool = False, **kwargs)

Stop or pause your published aggregation. The published aggregation can be resumed or started with /start_aggregation.

Parameters: aggregation_name – The name for the published scheduled aggregation

delete_published_aggregation(aggregation_name: str, return_curl: bool = False)

Delete a published aggregation and the associated collection it creates.

Parameters: aggregation_name – The name for the published scheduled aggregation

join_collections(join_query: dict, joined_collection_name: str, return_curl: bool = False, **kwargs)

Join collections with a query: perform a join query on a whole collection and write the results to a new collection. Currently only left joins are supported.

Parameters

join_query – The join query to perform.
joined_collection_name – Name of the new collection that contains the joined results
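Left-join semantics can be illustrated in memory: every document in the left collection is kept, and matching documents from the right collection are merged into it. `left_join` is a hypothetical illustration, not the format of `join_query`; letting left-hand fields win on key collisions is an arbitrary choice made here.

```python
from typing import Dict, List


def left_join(left: List[Dict], right: List[Dict], left_key: str, right_key: str) -> List[Dict]:
    """Left join two lists of documents on a key (illustration of left-join semantics only)."""
    # Index right-hand documents by their join key; documents without the key never match
    right_index: Dict = {}
    for doc in right:
        if right_key in doc:
            right_index.setdefault(doc[right_key], []).append(doc)
    joined = []
    for doc in left:
        matches = right_index.get(doc[left_key], []) if left_key in doc else []
        if not matches:
            joined.append(dict(doc))  # unmatched left documents are kept unchanged
        for match in matches:
            joined.append({**match, **doc})  # left fields win on key collisions
    return joined


orders = [{"_id": "o1", "customer_id": "c1"}, {"_id": "o2", "customer_id": "c9"}]
customers = [{"customer_id": "c1", "name": "Ann"}]
joined = left_join(orders, customers, "customer_id", "customer_id")
print(joined)  # [{'customer_id': 'c1', 'name': 'Ann', '_id': 'o1'}, {'_id': 'o2', 'customer_id': 'c9'}]
```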