Cluster

Documentation for the vector clustering methods of ViClusterClient.

class vectorai.api.cluster.ViClusterClient(username, api_key, url=None)

Clustering

clustering_job(collection_name: str, vector_field: str, n_clusters: int = 0, refresh: bool = True, return_curl=False, **kwargs)

Clusters a collection by a vector field

Clusters a collection into groups using unsupervised machine learning. Clusters can then be aggregated to understand what's in them and how the vectors separate the data into different groups.

Parameters

vector_field – Vector field to perform clustering on
n_clusters – Number of clusters
refresh – Whether to refresh the whole collection and retrain the cluster model
collection_name – Name of Collection
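This page does not say which algorithm clustering_job runs on the server. As a rough mental model only, the sketch below clusters a list of documents by a vector field with plain k-means (Lloyd's algorithm, using a deterministic farthest-point initialisation for simplicity) and writes a cluster label back onto each document. The document layout, the `_cluster_` label field and the `kmeans_label_documents` and `assign` helpers are hypothetical, not part of the Vector AI API.

```python
from typing import Dict, List


def squared_distance(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors: List[List[float]], centroids: List[List[float]]) -> List[int]:
    """Index of the nearest centroid for every vector."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(v, centroids[k]))
        for v in vectors
    ]


def kmeans_label_documents(
    docs: List[Dict], vector_field: str, n_clusters: int, n_iter: int = 10
) -> List[List[float]]:
    """Hypothetical helper: k-means on doc[vector_field] for every document.

    Writes each document's cluster label to doc["_cluster_"] and returns the centroids.
    """
    vectors = [doc[vector_field] for doc in docs]
    # Farthest-point initialisation: start from the first vector, then keep adding
    # the vector that is farthest from every centroid chosen so far.
    centroids = [list(vectors[0])]
    while len(centroids) < n_clusters:
        farthest = max(vectors, key=lambda v: min(squared_distance(v, c) for c in centroids))
        centroids.append(list(farthest))

    for _ in range(n_iter):
        labels = assign(vectors, centroids)  # assignment step
        for k in range(n_clusters):          # update step
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]

    # Final assignment against the last centroids, stored on each document.
    for doc, label in zip(docs, assign(vectors, centroids)):
        doc["_cluster_"] = label
    return centroids


docs = [
    {"_id": "a", "vec": [0.0, 0.1]},
    {"_id": "b", "vec": [0.1, 0.0]},
    {"_id": "c", "vec": [5.0, 5.1]},
    {"_id": "d", "vec": [5.1, 5.0]},
]
centroids = kmeans_label_documents(docs, "vec", n_clusters=2)
print({doc["_id"]: doc["_cluster_"] for doc in docs})  # {'a': 0, 'b': 0, 'c': 1, 'd': 1}
```

Once every document carries a label like this, the aggregation, facet and centroid methods below can group documents by it.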

cluster_aggregate(collection_name: str, aggregation_query: Dict, page: int = 1, page_size: int = 10, asc: bool = False, flatten: bool = True, return_curl=False, **kwargs)

Aggregate every cluster in a collection

Takes an aggregation query and computes the aggregate for each cluster in a collection. This helps you interpret each cluster and what it contains.

Can only be used after a vector field has been clustered with /cluster.

Parameters

collection_name – Name of Collection
aggregation_query – Aggregation query to aggregate data
page_size – Size of each page of results
page – Page of the results
asc – Whether to sort results in ascending (True) or descending (False) order
flatten – Whether to flatten the aggregated results into a list of dictionaries or a dictionary of lists
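The aggregation_query format is not described on this page, so the sketch below does not try to reproduce it. It only illustrates the idea behind cluster_aggregate: group documents by their cluster label, compute a per-cluster aggregate (here a document count and the mean of a hypothetical numeric `price` field), sort with `asc`, slice out one page with `page` and `page_size`, and present the rows either as a list of dictionaries or as a dictionary of lists. Which of these two shapes `flatten=True` produces is not stated here, so the sketch offers both without mapping them to the real flag. The `_cluster_` and `price` fields and the helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def aggregate_clusters(
    docs: List[Dict], field: str, page: int = 1, page_size: int = 10, asc: bool = False
) -> List[Dict]:
    """Per-cluster document count and mean of `field`, sorted by count, one page at a time."""
    groups: Dict[int, List[float]] = defaultdict(list)
    for doc in docs:
        groups[doc["_cluster_"]].append(doc[field])
    rows = [
        {"cluster": label, "count": len(values), f"avg_{field}": sum(values) / len(values)}
        for label, values in groups.items()
    ]
    # asc=False puts the largest clusters first.
    rows.sort(key=lambda row: row["count"], reverse=not asc)
    start = (page - 1) * page_size  # pages are 1-indexed
    return rows[start:start + page_size]


def rows_to_columns(rows: List[Dict]) -> Dict[str, List]:
    """Convert a list of dictionaries into a dictionary of lists."""
    columns: Dict[str, List] = defaultdict(list)
    for row in rows:
        for key, value in row.items():
            columns[key].append(value)
    return dict(columns)


docs = [
    {"_cluster_": 0, "price": 10.0},
    {"_cluster_": 0, "price": 20.0},
    {"_cluster_": 1, "price": 100.0},
]
rows = aggregate_clusters(docs, "price")
print(rows)                   # list of dictionaries, one per cluster
print(rows_to_columns(rows))  # {'cluster': [0, 1], 'count': [2, 1], 'avg_price': [15.0, 100.0]}
```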

cluster_facets(collection_name: str, facets_fields: List = [], asc: bool = True, page_size: int = 1000, page: int = 1, date_interval: str = 'monthly', return_curl: bool = False)

Get facets for each cluster in a collection

Computes a high-level aggregation of every field for every cluster in a collection. This helps you interpret each cluster and what it contains.

Can only be used after a vector field has been clustered with /cluster.

Parameters

facets_fields – Fields to include in the facets; if empty ([]), all fields are included
page_size – Size of facet page
page – Page of the results
asc – Whether to sort results in ascending (True) or descending (False) order
date_interval – Interval for date facets. Defaults to 'monthly'
collection_name – Name of Collection
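As an illustration of what a per-cluster date facet with `date_interval='monthly'` might contain, the sketch below counts, for each cluster, how many documents fall into each calendar month of a date field. The `_cluster_` and `created_at` field names, the ISO date strings, the use of `asc` to order the month buckets and the `monthly_facets` helper are all assumptions; the real response format is not shown on this page.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, List, Tuple


def monthly_facets(docs: List[Dict], date_field: str, asc: bool = True) -> Dict[int, List[Tuple[str, int]]]:
    """For each cluster, count documents per 'YYYY-MM' bucket of `date_field`."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for doc in docs:
        day = date.fromisoformat(doc[date_field])
        counts[doc["_cluster_"]][f"{day.year:04d}-{day.month:02d}"] += 1
    # Buckets are ordered chronologically; asc=False puts the latest month first.
    return {
        label: sorted(counter.items(), reverse=not asc)
        for label, counter in sorted(counts.items())
    }


docs = [
    {"_cluster_": 0, "created_at": "2020-01-05"},
    {"_cluster_": 0, "created_at": "2020-01-20"},
    {"_cluster_": 0, "created_at": "2020-02-03"},
    {"_cluster_": 1, "created_at": "2020-02-14"},
]
print(monthly_facets(docs, "created_at"))
# {0: [('2020-01', 2), ('2020-02', 1)], 1: [('2020-02', 1)]}
```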

cluster_centroids(collection_name: str, vector_field: str, return_curl: bool = False, **kwargs)

Returns the cluster centers of a collection by a vector field

Can only be used after a vector field has been clustered with /cluster.

Parameters

vector_field – Clustered vector field
collection_name – Name of Collection
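In k-means-style clustering, a cluster center is usually the element-wise mean of the vectors assigned to that cluster. Whether the server computes its centroids exactly this way is not stated here; the sketch below only shows what a returned center would represent under that assumption. The `_cluster_` and `vec` field names and the `compute_centroids` helper are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def compute_centroids(docs: List[Dict], vector_field: str) -> Dict[int, List[float]]:
    """Element-wise mean of `vector_field` for each cluster label."""
    members: Dict[int, List[List[float]]] = defaultdict(list)
    for doc in docs:
        members[doc["_cluster_"]].append(doc[vector_field])
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in sorted(members.items())
    }


docs = [
    {"_cluster_": 0, "vec": [0.0, 2.0]},
    {"_cluster_": 0, "vec": [2.0, 4.0]},
    {"_cluster_": 1, "vec": [9.0, 9.0]},
]
print(compute_centroids(docs, "vec"))  # {0: [1.0, 3.0], 1: [9.0, 9.0]}
```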

cluster_centroid_documents(collection_name: str, vector_field: str, metric: str = 'cosine', include_vector: bool = True, return_curl: bool = False, **kwargs)

Returns the document closest to each cluster center of a collection

Can only be used after a vector field has been clustered with /cluster.

Parameters

vector_field – Clustered vector field
metric – Similarity metric, choose from ['cosine', 'l1', 'l2', 'dp']
include_vector – Include vectors in the search results
collection_name – Name of Collection
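The sketch below shows one way to pick the document closest to each cluster center under the four metric names listed above. It assumes 'l1' and 'l2' are Manhattan and Euclidean distances (smaller is closer), that 'dp' means dot product (larger is closer, like cosine similarity), and that the search is restricted to each cluster's own members; these readings, the field names and the `score` and `closest_documents` helpers are assumptions rather than documented server behaviour. `include_vector=False` simply drops the vector field from each returned document.

```python
import math
from typing import Dict, List


def score(a: List[float], b: List[float], metric: str) -> float:
    """Similarity score where larger always means closer."""
    if metric == "dp":
        return sum(x * y for x, y in zip(a, b))
    if metric == "cosine":
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0
    if metric == "l1":
        return -sum(abs(x - y) for x, y in zip(a, b))  # negated so smaller distance wins
    if metric == "l2":
        return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    raise ValueError(f"unknown metric: {metric}")


def closest_documents(
    docs: List[Dict],
    centroids: Dict[int, List[float]],
    vector_field: str,
    metric: str = "cosine",
    include_vector: bool = True,
) -> Dict[int, Dict]:
    """For each cluster, return the member document that scores best against its center."""
    result = {}
    for label, center in centroids.items():
        members = [doc for doc in docs if doc["_cluster_"] == label]
        if not members:
            continue  # nothing to return for an empty cluster
        best = max(members, key=lambda doc: score(doc[vector_field], center, metric))
        if not include_vector:
            best = {key: value for key, value in best.items() if key != vector_field}
        result[label] = best
    return result


docs = [
    {"_id": "a", "_cluster_": 0, "vec": [1.0, 0.0]},
    {"_id": "b", "_cluster_": 0, "vec": [3.0, 0.0]},
    {"_id": "c", "_cluster_": 1, "vec": [0.0, 2.0]},
]
centroids = {0: [2.5, 0.0], 1: [0.0, 2.0]}
print(closest_documents(docs, centroids, "vec", metric="l2", include_vector=False))
# {0: {'_id': 'b', '_cluster_': 0}, 1: {'_id': 'c', '_cluster_': 1}}
```

The choice of metric can change the answer: with 'cosine', documents a and b point in exactly the same direction as the center of cluster 0, so both score 1.0 and `max` keeps the first one (a), whereas 'l2' and 'dp' both pick b.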

advanced_clustering_job(collection_name: str, vector_field: str, alias: str = 'default', n_clusters: int = 0, n_init: int = 5, n_iter: int = 10, refresh: bool = True, return_curl: bool = False)

Clusters a collection by a vector field

Clusters a collection into groups using unsupervised machine learning. Clusters can then be aggregated to understand what's in them and how the vectors separate the data into different groups. Advanced clustering exposes more parameters to tune, plus an alias to name each differently trained clustering.

Parameters

vector_field – Vector field to perform clustering on
alias – Alias used to name this trained clustering, so that differently trained clusterings of the same vector field can be told apart
n_clusters – Number of clusters
n_iter – Number of iterations in each run
n_init – Number of runs with different centroid seeds
refresh – Whether to refresh the whole collection and retrain the cluster model
collection_name – Name of Collection
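The n_init and n_iter parameters match the usual k-means restart scheme: run Lloyd's algorithm n_init times from different random centroid seeds, n_iter iterations each, and keep the run with the lowest inertia (total squared distance from each vector to its assigned center). The sketch below implements that scheme in plain Python, assuming advanced_clustering_job behaves similarly; the `lloyd_run` and `best_of_n_init` helpers and the seeding strategy are hypothetical.

```python
import random
from typing import List, Tuple


def squared_distance(a: List[float], b: List[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors: List[List[float]], centroids: List[List[float]]) -> List[int]:
    """Index of the nearest centroid for every vector."""
    return [
        min(range(len(centroids)), key=lambda k: squared_distance(v, centroids[k]))
        for v in vectors
    ]


def lloyd_run(
    vectors: List[List[float]], n_clusters: int, n_iter: int, rng: random.Random
) -> Tuple[float, List[int], List[List[float]]]:
    """One k-means run from randomly seeded centroids; returns (inertia, labels, centroids)."""
    centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
    for _ in range(n_iter):
        labels = assign(vectors, centroids)
        for k in range(n_clusters):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = [sum(dim) / len(members) for dim in zip(*members)]
    labels = assign(vectors, centroids)
    inertia = sum(squared_distance(v, centroids[k]) for v, k in zip(vectors, labels))
    return inertia, labels, centroids


def best_of_n_init(
    vectors: List[List[float]], n_clusters: int, n_init: int = 5, n_iter: int = 10, seed: int = 0
) -> Tuple[float, List[int], List[List[float]]]:
    """Run k-means n_init times with different seeds and keep the lowest-inertia run."""
    rng = random.Random(seed)
    runs = [lloyd_run(vectors, n_clusters, n_iter, rng) for _ in range(n_init)]
    return min(runs, key=lambda run: run[0])


vectors = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
inertia, labels, centroids = best_of_n_init(vectors, n_clusters=2, n_init=20)
print(labels, round(inertia, 4))
```

With a single run, an unlucky seed (both initial centroids drawn from the same tight group) leaves this example stuck in a poor partition that pairs each point with one from the other group; additional restarts make that outcome far less likely, which is the point of n_init.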

advanced_cluster_aggregate(collection_name: str, aggregation_query: Dict, vector_field: str, alias: str = 'default', page: int = 1, page_size: int = 10, asc: bool = False, filters: list = [], flatten: bool = True, return_curl=False, **kwargs)

Aggregate every cluster in a collection

Takes an aggregation query and computes the aggregate for each cluster in a collection. This helps you interpret each cluster and what it contains.

Can only be used after a vector field has been clustered with /advanced_cluster.

Parameters

collection_name – Name of Collection
aggregation_query – Aggregation query to aggregate data
page_size – Size of each page of results
page – Page of the results
asc – Whether to sort results in ascending (True) or descending (False) order
vector_field – Clustered vector field
alias – Alias of the trained clustering to use
flatten – Whether to flatten the aggregated results into a list of dictionaries or a dictionary of lists
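The alias lets several clusterings of the same vector field coexist, for example one trained with two clusters and another with three. The sketch below models that with a dictionary keyed by (vector_field, alias) and counts documents per cluster for whichever clustering is requested. The storage layout, the example labels, the alias name "three_way" and the `ClusterStore` class are hypothetical; they only illustrate why the alias passed here (default 'default') has to match the alias used when the clustering was trained.

```python
from collections import Counter
from typing import Dict, List, Tuple


class ClusterStore:
    """Hypothetical store of cluster labels, one label list per (vector_field, alias)."""

    def __init__(self) -> None:
        self._labels: Dict[Tuple[str, str], List[int]] = {}

    def save(self, vector_field: str, labels: List[int], alias: str = "default") -> None:
        """Record the labels produced by one trained clustering."""
        self._labels[(vector_field, alias)] = labels

    def cluster_sizes(self, vector_field: str, alias: str = "default") -> Dict[int, int]:
        """Count documents per cluster for one trained clustering."""
        key = (vector_field, alias)
        if key not in self._labels:
            raise KeyError(f"{vector_field!r} has not been clustered with alias {alias!r}")
        return dict(sorted(Counter(self._labels[key]).items()))


store = ClusterStore()
store.save("vec", [0, 0, 1, 1, 1])                     # stored under alias "default"
store.save("vec", [0, 1, 2, 2, 2], alias="three_way")  # a second clustering of the same field
print(store.cluster_sizes("vec"))               # {0: 2, 1: 3}
print(store.cluster_sizes("vec", "three_way"))  # {0: 1, 1: 1, 2: 3}
```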

advanced_cluster_facets(collection_name: str, vector_field: str, alias: str = 'default', facets_fields: List = [], asc: bool = True, page_size: int = 1000, return_curl: bool = False, **kwargs)

Get facets for each cluster in a collection

Computes a high-level aggregation of every field for every cluster in a collection. This helps you interpret each cluster and what it contains.

Can only be used after a vector field has been clustered with /advanced_cluster.

Parameters

vector_field – Clustered vector field
alias – Alias of the trained clustering to use
facets_fields – Fields to include in the facets; if empty ([]), all fields are included
page_size – Size of facet page
page – Page of the results
asc – Whether to sort results in ascending (True) or descending (False) order
date_interval – Interval for date facets
collection_name – Name of Collection
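In the sketch below, an empty facets_fields means "facet every field", mirroring the parameter description above, while page_size caps how many values are kept per field and asc orders the values by how often they occur. Skipping the vector field and the label field, ordering by frequency, the field names and the `cluster_field_facets` helper are assumptions of this sketch, not documented server behaviour.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def cluster_field_facets(
    docs: List[Dict],
    vector_field: str,
    facets_fields: Sequence[str] = (),
    asc: bool = True,
    page_size: int = 1000,
) -> Dict[int, Dict[str, List[Tuple[object, int]]]]:
    """For each cluster, count how often each value of each facet field occurs."""
    counts: Dict[int, Dict[str, Counter]] = defaultdict(lambda: defaultdict(Counter))
    for doc in docs:
        # An empty facets_fields means every field except the vector and the label.
        fields = facets_fields or [key for key in doc if key not in (vector_field, "_cluster_")]
        for field in fields:
            if field in doc:
                counts[doc["_cluster_"]][field][doc[field]] += 1
    result = {}
    for label in sorted(counts):
        result[label] = {
            # Order values by count (ascending unless asc=False) and keep one page of them.
            field: sorted(counter.items(), key=lambda item: item[1], reverse=not asc)[:page_size]
            for field, counter in counts[label].items()
        }
    return result


docs = [
    {"_cluster_": 0, "vec": [0.1], "colour": "red", "size": "S"},
    {"_cluster_": 0, "vec": [0.2], "colour": "red", "size": "M"},
    {"_cluster_": 0, "vec": [0.3], "colour": "blue", "size": "M"},
    {"_cluster_": 1, "vec": [0.9], "colour": "green", "size": "L"},
]
print(cluster_field_facets(docs, "vec", asc=False))
print(cluster_field_facets(docs, "vec", facets_fields=["size"], page_size=1))
```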

advanced_cluster_centroids(collection_name: str, vector_field: str, alias: str = 'default', **kwargs)

Returns the cluster centers of a collection by a vector field

Can only be used after a vector field has been clustered with /advanced_cluster.

Parameters

vector_field – Clustered vector field
alias – Alias of the trained clustering to use
collection_name – Name of Collection

advanced_cluster_centroid_documents(collection_name: str, vector_field: str, alias: str = 'default', metric: str = 'cosine', include_vector: bool = True, return_curl: bool = False, **kwargs)

Returns the document closest to each cluster center of a collection

Can only be used after a vector field has been clustered with /advanced_cluster.

Parameters

vector_field – Clustered vector field
alias – Alias of the trained clustering to use
metric – Similarity metric, choose from ['cosine', 'l1', 'l2', 'dp']
include_vector – Include vectors in the search results
collection_name – Name of Collection