Cluster
Documentation for vector clustering goes here.
- class vectorai.api.cluster.ViClusterClient(username, api_key, url=None)
Clustering
- clustering_job(collection_name: str, vector_field: str, n_clusters: int = 0, refresh: bool = True, return_curl=False, **kwargs)
Clusters a collection by a vector field
Clusters a collection into groups using unsupervised machine learning. Clusters can then be aggregated to understand what is in them and how the vectors separate the data into different groups.
- Parameters
vector_field – Vector field to perform clustering on
n_clusters – Number of clusters
refresh – Whether to refresh the whole collection and retrain the cluster model
collection_name – Name of Collection
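A minimal sketch of the basic workflow: train clusters on a vector field, then read back the cluster centers. It uses only the clustering_job and cluster_centroids signatures documented on this page. The helper name, the collection name, the field name, and the credentials in the commented usage are placeholders, and this page does not state the shape of the return values or whether the job has finished when the call returns, so the helper passes both results back untouched.

```python
def cluster_then_fetch_centroids(client, collection_name, vector_field, n_clusters):
    """Run a clustering job, then fetch the resulting cluster centers.

    `client` is any object exposing the documented `clustering_job` and
    `cluster_centroids` methods (for example a ViClusterClient instance).
    """
    job = client.clustering_job(
        collection_name=collection_name,
        vector_field=vector_field,
        n_clusters=n_clusters,
        refresh=True,  # refresh the whole collection and retrain the model
    )
    centroids = client.cluster_centroids(
        collection_name=collection_name,
        vector_field=vector_field,
    )
    return job, centroids


# Hypothetical usage; every literal below is a placeholder:
# from vectorai.api.cluster import ViClusterClient
# client = ViClusterClient("my_username", "my_api_key")
# job, centroids = cluster_then_fetch_centroids(
#     client, "my_collection", "my_vector_field", n_clusters=10
# )
```

Taking the client as an argument keeps the helper easy to exercise with a stand-in object before pointing it at a real collection.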
- cluster_aggregate(collection_name: str, aggregation_query: Dict, page: int = 1, page_size: int = 10, asc: bool = False, flatten: bool = True, return_curl=False, **kwargs)
Aggregate every cluster in a collection
Takes an aggregation query and gets the aggregate of each cluster in a collection. This helps you interpret each cluster and what is in them.
Can only be used after a vector field has been clustered with /cluster.
- Parameters
collection_name – Name of Collection
aggregation_query – Aggregation query to aggregate data
page_size – Size of each page of results
page – Page of the results
asc – Whether to sort results by ascending or descending order
flatten – Whether to flatten the aggregated results into a list of dictionaries or a dictionary of lists
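The two result shapes named for flatten can be illustrated locally. The sketch below converts between a list of dictionaries and a dictionary of lists. The sample rows are made up and are not real API output, and this page does not say which of the two shapes flatten=True produces, so the helper names are neutral.

```python
def rows_to_columns(rows):
    """Convert a list of dictionaries into a dictionary of lists."""
    keys = []
    for row in rows:
        for key in row:
            if key not in keys:
                keys.append(key)  # preserve first-seen key order
    return {key: [row.get(key) for row in rows] for key in keys}


def columns_to_rows(columns):
    """Convert a dictionary of equal-length lists into a list of dictionaries."""
    keys = list(columns)
    length = len(columns[keys[0]]) if keys else 0
    return [{key: columns[key][i] for key in keys} for i in range(length)]


# Made-up aggregate rows, one per cluster (not real API output).
rows = [
    {"cluster": 0, "count": 12},
    {"cluster": 1, "count": 7},
]
columns = rows_to_columns(rows)
print(columns)  # {'cluster': [0, 1], 'count': [12, 7]}
print(columns_to_rows(columns) == rows)  # True
```

The row-oriented shape is convenient for iterating over clusters one at a time, while the column-oriented shape suits plotting or loading into a table.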
- cluster_facets(collection_name: str, facets_fields: List = [], asc: bool = True, page_size: int = 1000, page: int = 1, date_interval: str = 'monthly', return_curl: bool = False)
Get Facets in each cluster in a collection
Takes a high level aggregation of every field and every cluster in a collection. This helps you interpret each cluster and what is in them.
Can only be used after a vector field has been clustered with /cluster.
- Parameters
facets_fields – Fields to include in the facets; if [], all fields are included
page_size – Size of facet page
page – Page of the results
asc – Whether to sort results by ascending or descending order
date_interval – Interval for date facets. Defaults to “monthly”
collection_name – Name of Collection
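To make the idea of per-cluster facets concrete, here is a local sketch that counts how often each field value occurs inside each cluster. It is an illustration of the concept only, not the service's implementation: the function name, the sample documents, and the "cluster_label" field are all hypothetical, and the real facets may also include numeric or date summaries that this sketch does not attempt.

```python
from collections import Counter, defaultdict


def facet_counts_per_cluster(documents, cluster_field, facet_fields):
    """Count how often each value of each facet field occurs in each cluster."""
    facets = defaultdict(lambda: defaultdict(Counter))
    for doc in documents:
        cluster = doc[cluster_field]
        for field in facet_fields:
            if field in doc:
                facets[cluster][field][doc[field]] += 1
    # Convert the nested defaultdicts and Counters to plain dicts.
    return {
        cluster: {field: dict(counts) for field, counts in fields.items()}
        for cluster, fields in facets.items()
    }


# Made-up documents; "cluster_label" is a hypothetical field name.
documents = [
    {"cluster_label": 0, "category": "shoes", "brand": "A"},
    {"cluster_label": 0, "category": "shoes", "brand": "B"},
    {"cluster_label": 1, "category": "hats", "brand": "A"},
]
result = facet_counts_per_cluster(documents, "cluster_label", ["category", "brand"])
print(result[0]["category"])  # {'shoes': 2}
print(result[1]["brand"])     # {'A': 1}
```

Reading the counts side by side is one way to see what distinguishes a cluster, for example that cluster 0 above contains only shoes.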
- cluster_centroids(collection_name: str, vector_field: str, return_curl: bool = False, **kwargs)
Returns the cluster centers of a collection by a vector field
Can only be used after a vector field has been clustered with /cluster.
- Parameters
vector_field – Clustered vector field
collection_name – Name of Collection
- cluster_centroid_documents(collection_name: str, vector_field: str, metric: str = 'cosine', include_vector: bool = True, return_curl: bool = False, **kwargs)
Returns the document closest to each cluster center of a collection
Can only be used after a vector field has been clustered with /cluster.
- Parameters
vector_field – Clustered vector field
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
include_vector – Include vectors in the search results
collection_name – Name of Collection
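The selection this method describes, picking the document nearest to each cluster center under a chosen metric, can be sketched in plain Python. This is an illustration under assumptions rather than the service's code: it reads 'dp' as dot product, treats 'l1' and 'l2' as distances (negated so that a larger score always means closer), and uses made-up documents with hypothetical "_id" and "vec" fields.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def similarity(a, b, metric="cosine"):
    """Return a score where larger always means 'closer'."""
    if metric == "dp":  # assumed to mean dot product
        return _dot(a, b)
    if metric == "cosine":
        norm = math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b))
        return _dot(a, b) / norm if norm else 0.0
    if metric == "l1":  # negated Manhattan distance
        return -sum(abs(x - y) for x, y in zip(a, b))
    if metric == "l2":  # negated Euclidean distance
        return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    raise ValueError(f"unknown metric: {metric!r}")


def closest_documents(centroids, documents, vector_field, metric="cosine"):
    """For each centroid, pick the document whose vector scores highest."""
    return [
        max(documents, key=lambda doc: similarity(centroid, doc[vector_field], metric))
        for centroid in centroids
    ]


# Made-up 2-D data; "vec" and "_id" are hypothetical field names.
documents = [
    {"_id": "a", "vec": [1.0, 0.0]},
    {"_id": "b", "vec": [0.0, 1.0]},
    {"_id": "c", "vec": [0.9, 0.8]},
]
centroids = [[1.0, 0.1], [0.1, 1.0]]
print([d["_id"] for d in closest_documents(centroids, documents, "vec", "l2")])
# ['a', 'b']
```

The metric matters when vectors differ in length: cosine ignores magnitude, while dot product and the distance metrics do not, so the "closest" document can change between them.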
- advanced_clustering_job(collection_name: str, vector_field: str, alias: str = 'default', n_clusters: int = 0, n_init: int = 5, n_iter: int = 10, refresh: bool = True, return_curl: bool = False)
Clusters a collection by a vector field
Clusters a collection into groups using unsupervised machine learning. Clusters can then be aggregated to understand what is in them and how the vectors separate the data into different groups. Advanced clustering exposes more parameters to tune and an alias to name each differently trained set of clusters.
- Parameters
vector_field – Vector field to perform clustering on
alias – Alias is used to name a cluster
n_clusters – Number of clusters
n_iter – Number of iterations in each run
n_init – Number of runs with different centroid seeds
refresh – Whether to refresh the whole collection and retrain the cluster model
collection_name – Name of Collection
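The n_init and n_iter parameters read like those of a k-means-style algorithm, although this page does not name the algorithm the service uses. Under that assumption, the plain-Python sketch below shows how the two interact: each of n_init runs starts from different seed centroids, refines them for n_iter iterations, and the run with the lowest total squared distance is kept. The function, the seed argument, and the sample points are illustrative only.

```python
import random


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _assign(vectors, centroids):
    """Index of the nearest centroid for every vector."""
    return [
        min(range(len(centroids)), key=lambda k: _sq_dist(v, centroids[k]))
        for v in vectors
    ]


def kmeans(vectors, n_clusters, n_init=5, n_iter=10, seed=0):
    """Return (centroids, labels) from the best of `n_init` runs."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_init):
        # Each run starts from a different random choice of seed centroids.
        centroids = [list(v) for v in rng.sample(vectors, n_clusters)]
        for _ in range(n_iter):
            labels = _assign(vectors, centroids)
            for k in range(n_clusters):
                members = [v for v, label in zip(vectors, labels) if label == k]
                if members:  # keep the old centroid if a cluster is empty
                    centroids[k] = [sum(col) / len(members) for col in zip(*members)]
        labels = _assign(vectors, centroids)
        # Total squared distance to the assigned centroid; lower is better.
        inertia = sum(_sq_dist(v, centroids[label]) for v, label in zip(vectors, labels))
        if best is None or inertia < best[0]:
            best = (inertia, centroids, labels)
    return best[1], best[2]


# Two well-separated made-up groups of 2-D points.
vectors = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
centroids, labels = kmeans(vectors, n_clusters=2)
print(labels)
```

Raising n_init lowers the chance of keeping a poor local optimum, and raising n_iter gives each run more time to converge; both increase the cost of the job.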
- advanced_cluster_aggregate(collection_name: str, aggregation_query: Dict, vector_field: str, alias: str = 'default', page: int = 1, page_size: int = 10, asc: bool = False, filters: list = [], flatten: bool = True, return_curl=False, **kwargs)
Aggregate every cluster in a collection
Takes an aggregation query and gets the aggregate of each cluster in a collection. This helps you interpret each cluster and what is in them.
Can only be used after a vector field has been clustered with /advanced_cluster.
- Parameters
collection_name – Name of Collection
aggregation_query – Aggregation query to aggregate data
page_size – Size of each page of results
page – Page of the results
asc – Whether to sort results by ascending or descending order
vector_field – Clustered vector field
alias – Alias of a cluster
flatten – Whether to flatten the aggregated results into a list of dictionaries or a dictionary of lists
- advanced_cluster_facets(collection_name: str, vector_field: str, alias: str = 'default', facets_fields: List = [], asc: bool = True, page_size: int = 1000, return_curl: bool = False, **kwargs)
Get Facets in each cluster in a collection
Takes a high level aggregation of every field and every cluster in a collection. This helps you interpret each cluster and what is in them.
Can only be used after a vector field has been clustered with /advanced_cluster.
- Parameters
vector_field – Clustered vector field
alias – Alias is used to name a cluster
facets_fields – Fields to include in the facets; if [], all fields are included
page_size – Size of facet page
page – Page of the results
asc – Whether to sort results by ascending or descending order
date_interval – Interval for date facets
collection_name – Name of Collection
- advanced_cluster_centroids(collection_name: str, vector_field: str, alias: str = 'default', **kwargs)
Returns the cluster centers of a collection by a vector field
Can only be used after a vector field has been clustered with /advanced_cluster.
- Parameters
vector_field – Clustered vector field
alias – Alias is used to name a cluster
collection_name – Name of Collection
- advanced_cluster_centroid_documents(collection_name: str, vector_field: str, alias: str = 'default', metric: str = 'cosine', include_vector: bool = True, return_curl: bool = False, **kwargs)
Returns the document closest to each cluster center of a collection
Can only be used after a vector field has been clustered with /advanced_cluster.
- Parameters
vector_field – Clustered vector field
alias – Alias is used to name a cluster
metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]
include_vector – Include vectors in the search results
collection_name – Name of Collection
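Because each advanced clustering run is stored under an alias, several cluster models can coexist on the same vector field. The sketch below trains one model per alias and collects each model's centroids, using only the advanced_clustering_job and advanced_cluster_centroids signatures documented on this page. The helper name, alias names, and cluster counts are hypothetical, and the return shapes are not specified here, so results are stored as returned.

```python
def train_and_compare_aliases(client, collection_name, vector_field, cluster_counts):
    """Train one cluster model per alias and collect each model's centroids.

    `cluster_counts` maps an alias name to the `n_clusters` value to train with.
    `client` is any object exposing the documented advanced_* methods.
    """
    centroids_by_alias = {}
    for alias, n_clusters in cluster_counts.items():
        client.advanced_clustering_job(
            collection_name=collection_name,
            vector_field=vector_field,
            alias=alias,
            n_clusters=n_clusters,
        )
        centroids_by_alias[alias] = client.advanced_cluster_centroids(
            collection_name=collection_name,
            vector_field=vector_field,
            alias=alias,
        )
    return centroids_by_alias


# Hypothetical usage; every literal below is a placeholder:
# from vectorai.api.cluster import ViClusterClient
# client = ViClusterClient("my_username", "my_api_key")
# results = train_and_compare_aliases(
#     client, "my_collection", "my_vector_field", {"coarse": 5, "fine": 50}
# )
```

Keeping a coarse and a fine model under separate aliases makes it possible to compare groupings without retraining over the same alias.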