Cluster

This page documents the vector clustering methods of ViClusterClient.

class vectorai.api.cluster.ViClusterClient(username, api_key, url=None)

Clustering

clustering_job(collection_name: str, vector_field: str, n_clusters: int = 0, refresh: bool = True, return_curl=False, **kwargs)

Clusters a collection by a vector field

Clusters a collection into groups using unsupervised machine learning. The clusters can then be aggregated to understand what is in them and how the vectors separate the data into different groups.

Parameters
  • vector_field – Vector field to perform clustering on

  • n_clusters – Number of clusters

  • refresh – Whether to refresh the whole collection and retrain the cluster model

  • collection_name – Name of Collection
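A minimal sketch of the call order described on this page, assuming client is an already-constructed ViClusterClient (construction is omitted). The helper name cluster_and_get_centroids and any collection or field names are hypothetical. The sketch also assumes clustering_job has finished before cluster_centroids is called; whether the job blocks until training completes is not stated here, and the returned centroids are passed through unchanged because their shape is not documented.

```python
from typing import Any


def cluster_and_get_centroids(client: Any, collection_name: str,
                              vector_field: str, n_clusters: int) -> Any:
    """Train a cluster model on vector_field, then fetch its cluster centers.

    client is assumed to be a ViClusterClient; only methods and keyword
    arguments documented on this page are used.
    """
    # refresh=True (also the default) retrains on the whole collection.
    client.clustering_job(
        collection_name=collection_name,
        vector_field=vector_field,
        n_clusters=n_clusters,
        refresh=True,
    )
    # Valid only once the field has been clustered; assumes the job has finished.
    return client.cluster_centroids(
        collection_name=collection_name,
        vector_field=vector_field,
    )
```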

cluster_aggregate(collection_name: str, aggregation_query: Dict, page: int = 1, page_size: int = 10, asc: bool = False, flatten: bool = True, return_curl=False, **kwargs)

Aggregate every cluster in a collection

Takes an aggregation query and computes the aggregate for each cluster in a collection. This helps you interpret each cluster and what it contains.

Can only be used after a vector field has been clustered with /cluster.

Parameters
  • collection_name – Name of Collection

  • aggregation_query – Aggregation query to aggregate data

  • page_size – Size of each page of results

  • page – Page of the results

  • asc – Whether to sort results by ascending or descending order

  • flatten – Whether to flatten the aggregated results into a list of dictionaries or a dictionary of lists.
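To make the two result shapes named by flatten concrete, here is a local illustration rather than the service's implementation: it groups hypothetical documents by a hypothetical cluster key, averages a hypothetical numeric field per cluster, and then pivots the list of dictionaries into a dictionary of lists. Which shape flatten=True actually returns is not stated on this page.

```python
from collections import defaultdict
from typing import Dict, List


def mean_per_cluster(docs: List[dict], label_key: str, value_key: str) -> List[Dict]:
    """Average value_key within each cluster; one dictionary per cluster."""
    totals: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for doc in docs:
        totals[doc[label_key]] += doc[value_key]
        counts[doc[label_key]] += 1
    return [
        {"cluster": c, "mean": totals[c] / counts[c], "count": counts[c]}
        for c in sorted(totals)
    ]


def to_dict_of_lists(rows: List[Dict]) -> Dict[str, list]:
    """Pivot a list of same-keyed dictionaries into a dictionary of parallel lists."""
    return {key: [row[key] for row in rows] for key in rows[0]} if rows else {}
```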

cluster_facets(collection_name: str, facets_fields: List = [], asc: bool = True, page_size: int = 1000, page: int = 1, date_interval: str = 'monthly', return_curl: bool = False)

Get Facets in each cluster in a collection

Takes a high-level aggregation of every field in every cluster of a collection. This helps you interpret each cluster and what it contains.

Can only be used after a vector field has been clustered with /cluster.

Parameters
  • facets_fields – Fields to include in the facets; if [], all fields are included

  • page_size – Size of facet page

  • page – Page of the results

  • asc – Whether to sort results by ascending or descending order

  • date_interval – Interval for date facets (defaults to “monthly”)

  • collection_name – Name of Collection
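The date_interval parameter buckets date fields before counting them. The sketch below is a local, hypothetical illustration of monthly date facets per cluster; the document keys are made up, and only the documented default, “monthly”, is implemented because the other accepted intervals are not listed here.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, List


def date_facets_per_cluster(docs: List[dict], label_key: str, date_key: str,
                            date_interval: str = "monthly") -> Dict[int, Counter]:
    """Count documents per cluster in calendar-month buckets keyed "YYYY-MM"."""
    if date_interval != "monthly":
        # Only the documented default is illustrated in this sketch.
        raise ValueError("this sketch only implements 'monthly'")
    facets: Dict[int, Counter] = defaultdict(Counter)
    for doc in docs:
        d: date = doc[date_key]
        facets[doc[label_key]][f"{d.year:04d}-{d.month:02d}"] += 1
    return dict(facets)
```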

cluster_centroids(collection_name: str, vector_field: str, return_curl: bool = False, **kwargs)

Returns the cluster centers of a collection by a vector field

Can only be used after a vector field has been clustered with /cluster.

Parameters
  • vector_field – Clustered vector field

  • collection_name – Name of Collection
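For centroid-based methods such as k-means, a cluster center is the component-wise mean of the vectors assigned to that cluster. The sketch below computes that locally from hypothetical vectors and cluster labels; it assumes, without confirmation from this page, that the service defines its centroids the same way.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def centroids_from_labels(vectors: Sequence[Sequence[float]],
                          labels: Sequence[int]) -> Dict[int, List[float]]:
    """Return the component-wise mean vector of each cluster label."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = defaultdict(int)
    for vec, label in zip(vectors, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        # Accumulate each component of the member vector.
        for i, x in enumerate(vec):
            sums[label][i] += x
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}
```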

cluster_centroid_documents(collection_name: str, vector_field: str, metric: str = 'cosine', include_vector: bool = True, return_curl: bool = False, **kwargs)

Returns the document closest to each cluster center of a collection

Can only be used after a vector field has been clustered with /cluster.

Parameters
  • vector_field – Clustered vector field

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • include_vector – Include vectors in the search results

  • collection_name – Name of Collection
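The metric names are read here under their conventional meanings (an assumption, since this page does not define them): cosine similarity, Manhattan (L1) distance, Euclidean (L2) distance, and dot product. This local sketch picks, for each centroid, the best-scoring document; distances are negated so that a higher score is always better. The document layout and key names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def _score(a: Sequence[float], b: Sequence[float], metric: str) -> float:
    """Similarity score where higher is always better (distances are negated)."""
    if metric == "dp":
        return sum(x * y for x, y in zip(a, b))
    if metric == "cosine":
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0
    if metric == "l1":
        return -sum(abs(x - y) for x, y in zip(a, b))
    if metric == "l2":
        return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    raise ValueError(f"unknown metric: {metric}")


def centroid_documents(docs: List[dict], vector_key: str,
                       centroids: Dict[int, Sequence[float]],
                       metric: str = "cosine") -> Dict[int, dict]:
    """For each centroid, return the document whose vector scores best."""
    return {
        label: max(docs, key=lambda d: _score(d[vector_key], center, metric))
        for label, center in centroids.items()
    }
```

Because dp is not length-normalized, it can favor a document with a large-magnitude vector over one that points in nearly the same direction as the centroid.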

advanced_clustering_job(collection_name: str, vector_field: str, alias: str = 'default', n_clusters: int = 0, n_init: int = 5, n_iter: int = 10, refresh: bool = True, return_curl: bool = False)

Clusters a collection by a vector field

Clusters a collection into groups using unsupervised machine learning. The clusters can then be aggregated to understand what is in them and how the vectors separate the data into different groups. Advanced clustering exposes more parameters to tune, plus an alias for naming each differently trained cluster model.

Parameters
  • vector_field – Vector field to perform clustering on

  • alias – Alias is used to name a cluster

  • n_clusters – Number of clusters

  • n_iter – Number of iterations in each run

  • n_init – Number of runs, each started from different centroid seeds

  • refresh – Whether to refresh the whole collection and retrain the cluster model

  • collection_name – Name of Collection
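The n_init and n_iter descriptions match the restart-and-iterate structure of k-means. As an assumption (this page does not name the algorithm), the sketch below implements Lloyd's k-means in plain Python: each of n_init runs starts from randomly chosen seed points and performs n_iter assignment and update steps, and the run with the lowest inertia (sum of squared distances to the assigned centers) is kept. The seed argument is a hypothetical addition for reproducibility; the n_init and n_iter defaults mirror the method's documented defaults.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Result = Tuple[float, List[int], List[List[float]]]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _nearest(p: Vector, centers: List[List[float]]) -> int:
    return min(range(len(centers)), key=lambda c: _sq_dist(p, centers[c]))


def _one_run(points: List[Vector], k: int, n_iter: int, rng: random.Random) -> Result:
    """One run of Lloyd's k-means from k distinct randomly chosen seed points."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest center.
        labels = [_nearest(p, centers) for p in points]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [_nearest(p, centers) for p in points]
    inertia = sum(_sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return inertia, labels, centers


def kmeans(points: List[Vector], n_clusters: int, n_init: int = 5,
           n_iter: int = 10, seed: int = 0) -> Result:
    """Repeat _one_run n_init times and keep the run with the lowest inertia."""
    if n_init < 1:
        raise ValueError("n_init must be at least 1")
    rng = random.Random(seed)
    best = _one_run(points, n_clusters, n_iter, rng)
    for _ in range(n_init - 1):
        result = _one_run(points, n_clusters, n_iter, rng)
        if result[0] < best[0]:
            best = result
    return best
```

Restarting from several seeds guards against a single run settling into a poor local optimum, so a higher n_init trades compute for more stable clusters.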

advanced_cluster_aggregate(collection_name: str, aggregation_query: Dict, vector_field: str, alias: str = 'default', page: int = 1, page_size: int = 10, asc: bool = False, filters: list = [], flatten: bool = True, return_curl=False, **kwargs)

Aggregate every cluster in a collection

Takes an aggregation query and computes the aggregate for each cluster in a collection. This helps you interpret each cluster and what it contains.

Can only be used after a vector field has been clustered with /advanced_cluster.

Parameters
  • collection_name – Name of Collection

  • aggregation_query – Aggregation query to aggregate data

  • page_size – Size of each page of results

  • page – Page of the results

  • asc – Whether to sort results by ascending or descending order

  • vector_field – Clustered vector field

  • alias – Alias of a cluster

  • flatten – Whether to flatten the aggregated results into a list of dictionaries or a dictionary of lists.
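The page, page_size, and asc parameters describe ordinary pagination over sorted results. Below is a local sketch, assuming 1-indexed pages (page defaults to 1) and a caller-supplied sort key, since this page does not say what the results are sorted by; the defaults mirror the method's documented defaults.

```python
from typing import Any, Callable, List, TypeVar

T = TypeVar("T")


def paginate(rows: List[T], page: int = 1, page_size: int = 10, asc: bool = False,
             key: Callable[[T], Any] = lambda row: row) -> List[T]:
    """Sort rows by key, then return the 1-indexed page of length page_size."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    ordered = sorted(rows, key=key, reverse=not asc)
    start = (page - 1) * page_size
    return ordered[start:start + page_size]
```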

advanced_cluster_facets(collection_name: str, vector_field: str, alias: str = 'default', facets_fields: List = [], asc: bool = True, page_size: int = 1000, return_curl: bool = False, **kwargs)

Get Facets in each cluster in a collection

Takes a high-level aggregation of every field in every cluster of a collection. This helps you interpret each cluster and what it contains.

Can only be used after a vector field has been clustered with /advanced_cluster.

Parameters
  • vector_field – Clustered vector field

  • alias – Alias is used to name a cluster

  • facets_fields – Fields to include in the facets; if [], all fields are included

  • page_size – Size of facet page

  • page – Page of the results

  • asc – Whether to sort results by ascending or descending order

  • date_interval – Interval for date facets

  • collection_name – Name of Collection
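A local sketch of per-cluster facets that follows the parameter descriptions: an empty facets_fields means all fields, asc orders values by their counts, and page_size caps how many values are kept per field. Document keys are hypothetical, field values must be hashable (so vector fields should be excluded), and the sketch takes None rather than [] as its default to avoid Python's shared mutable default argument.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Facets = Dict[int, Dict[str, List[Tuple[object, int]]]]


def facets_per_cluster(docs: List[dict], label_key: str,
                       facets_fields: Optional[List[str]] = None,
                       asc: bool = True, page_size: int = 1000) -> Facets:
    """Count field values within each cluster.

    An empty or missing facets_fields means every field except the label.
    """
    counters: Dict[int, Dict[str, Counter]] = defaultdict(lambda: defaultdict(Counter))
    for doc in docs:
        fields = facets_fields or [f for f in doc if f != label_key]
        for field in fields:
            if field in doc:
                counters[doc[label_key]][field][doc[field]] += 1
    # Sort each field's (value, count) pairs by count, then keep page_size of them.
    return {
        label: {
            field: sorted(counts.items(), key=lambda kv: kv[1], reverse=not asc)[:page_size]
            for field, counts in per_field.items()
        }
        for label, per_field in counters.items()
    }
```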

advanced_cluster_centroids(collection_name: str, vector_field: str, alias: str = 'default', **kwargs)

Returns the cluster centers of a collection by a vector field

Can only be used after a vector field has been clustered with /advanced_cluster.

Parameters
  • vector_field – Clustered vector field

  • alias – Alias is used to name a cluster

  • collection_name – Name of Collection
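Because each advanced model is stored under an alias, several clusterings of the same vector field can coexist. The sketch below trains one model per alias and then fetches each model's centers. Here client is assumed to be a ViClusterClient, the helper and alias names are hypothetical, and, as with the basic job, it assumes each advanced_clustering_job has completed before its centroids are requested.

```python
from typing import Any, Dict


def centroids_for_aliases(client: Any, collection_name: str, vector_field: str,
                          n_clusters_by_alias: Dict[str, int]) -> Dict[str, Any]:
    """Train one advanced cluster model per alias, then fetch each model's centers."""
    for alias, n_clusters in n_clusters_by_alias.items():
        client.advanced_clustering_job(
            collection_name=collection_name,
            vector_field=vector_field,
            alias=alias,
            n_clusters=n_clusters,
        )
    # The alias passed here must match the one used when training.
    return {
        alias: client.advanced_cluster_centroids(
            collection_name=collection_name,
            vector_field=vector_field,
            alias=alias,
        )
        for alias in n_clusters_by_alias
    }
```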

advanced_cluster_centroid_documents(collection_name: str, vector_field: str, alias: str = 'default', metric: str = 'cosine', include_vector: bool = True, return_curl: bool = False, **kwargs)

Returns the document closest to each cluster center of a collection

Can only be used after a vector field has been clustered with /advanced_cluster.

Parameters
  • vector_field – Clustered vector field

  • alias – Alias is used to name a cluster

  • metric – Similarity Metric, choose from [‘cosine’, ‘l1’, ‘l2’, ‘dp’]

  • include_vector – Include vectors in the search results

  • collection_name – Name of Collection
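Setting include_vector to False leaves the vectors out of the returned documents, which keeps responses smaller when you only need to read the representative documents. The sketch below mimics that locally on already-fetched documents; the caller must supply the vector field names, and title_vector_ is a hypothetical example.

```python
from typing import Iterable, List


def drop_vector_fields(docs: Iterable[dict], vector_fields: Iterable[str],
                       include_vector: bool = True) -> List[dict]:
    """Return copies of docs, removing the named vector fields unless include_vector."""
    if include_vector:
        return [dict(doc) for doc in docs]
    drop = set(vector_fields)
    return [{k: v for k, v in doc.items() if k not in drop} for doc in docs]
```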