{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Vector Analytics\n", "\n", "Vector analytics is useful when you want to assess the quality of your vectors.\n", "\n", "Once we have our multi-dimensional vectors (128D, 256D, 768D, etc.), we may want a way to visually grasp how good our encodings are, to check whether there are any anomalies in our data, and to decide whether we should try a different model. Vi supports visualising your vectors in 2D using high-level libraries such as Plotly.\n", "\n", "Vector analytics includes clustering the vectors, reducing their dimensionality, and then visualising the result." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# %pip install -e .." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import os\n", "from vectorai import ViClient\n", "\n", "# Read credentials from environment variables rather than hard-coding them.\n", "username = os.environ['VI_USERNAME']\n", "api_key = os.environ['VI_API_KEY']\n", "collection_name = 'nlp-qa'" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [], "source": [ "vi_client = ViClient(username, api_key)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Clustering\n", "Quickstart using the Vi client.\n", "\n", "Download the Google QUEST Kaggle data from: https://www.kaggle.com/c/google-quest-challenge/data\n", "\n", "Once you have downloaded the data, move `train.csv` into your local working directory." ] }, { "cell_type": "code", "execution_count": 60, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | qa_id | question_title | question_body | question_user_name | question_user_page | answer | answer_user_name | answer_user_page | url | category | ... | question_well_written | answer_helpful | answer_level_of_information | answer_plausible | answer_relevance | answer_satisfaction | answer_type_instructions | answer_type_procedure | answer_type_reason_explanation | answer_well_written |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | What am I losing when using extension tubes in... | After playing around with macro photography on... | ysap | https://photo.stackexchange.com/users/1024 | I just got extension tubes, so here's the skin... | rfusca | https://photo.stackexchange.com/users/1917 | http://photo.stackexchange.com/questions/9169/... | LIFE_ARTS | ... | 1.000000 | 1.000000 | 0.666667 | 1.000000 | 1.000000 | 0.800000 | 1.0 | 0.0 | 0.000000 | 1.000000 |
| 1 | 1 | What is the distinction between a city and a s... | I am trying to understand what kinds of places... | russellpierce | https://rpg.stackexchange.com/users/8774 | It might be helpful to look into the definitio... | Erik Schmidt | https://rpg.stackexchange.com/users/1871 | http://rpg.stackexchange.com/questions/47820/w... | CULTURE | ... | 0.888889 | 0.888889 | 0.555556 | 0.888889 | 0.888889 | 0.666667 | 0.0 | 0.0 | 0.666667 | 0.888889 |

2 rows × 41 columns

| | _id | qa_id | question_title | question_body | question_user_name | question_user_page | answer | answer_user_name | answer_user_page | url | ... | answer_relevance | answer_satisfaction | answer_type_instructions | answer_type_procedure | answer_type_reason_explanation | answer_well_written | insert_date_ | _clusters_ | _dr_ | question_vector_ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3e80C3QBW7TQbeJX1K3V | 5899 | Eigenvalues of a transition probability matrix | I have read that, for\\n$$I - \\alpha P$$\\nwhere... | user243716 | https://math.stackexchange.com/users/243716 | As stated, the result is FALSE. Presuming P i... | Mark L. Stone | https://math.stackexchange.com/users/240387 | http://math.stackexchange.com/questions/130034... | ... | 1.000000 | 1.000000 | 0.0 | 0.000000 | 1.000000 | 1.000000 | 2020-08-21T05:20:49.072579 | {'question_vectors_': {'1st_cluster': 43}, 'qu... | {'default': {'2': {'question_vectors_': [1.229... | [0.15878267586231232, 0.2284708470106125, -0.0... |
| 1 | dn00C3QBv-xqKfY1H0KD | 2660 | Does adding a comma before \"or\" change the mea... | For example, the definition given from the OAL... | kiamlaluno | https://ell.stackexchange.com/users/95 | Regarding general usage (in the U.S., at least... | Scott | https://ell.stackexchange.com/users/357 | http://ell.stackexchange.com/questions/3208/do... | ... | 0.888889 | 0.733333 | 0.0 | 0.333333 | 0.666667 | 0.888889 | 2020-08-21T05:20:49.073579 | {'question_vectors_': {'1st_cluster': 42}, 'qu... | {'default': {'2': {'question_vectors_': [-0.19... | [0.24962793290615082, 0.05536836013197899, -0.... |

2 rows × 46 columns

| | question_bert_vector_ | question_albert_vector_ |
|---|---|---|
| 0 | This Main Building, and the library collection, was entirely destroyed by a fire in April 1879, and the school closed immediately and students were sent home. The university founder, Fr. Sorin and the president at the time, the Rev. William Corby, immediately planned for the rebuilding of the st... | This Main Building, and the library collection, was entirely destroyed by a fire in April 1879, and the school closed immediately and students were sent home. The university founder, Fr. Sorin and the president at the time, the Rev. William Corby, immediately planned for the rebuilding of the st... |
| 1 | Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend \"Venite Ad Me Omnes\". Next to the Main Building... | This Main Building, and the library collection, was entirely destroyed by a fire in April 1879, and the school closed immediately and students were sent home. The university founder, Fr. Sorin and the president at the time, the Rev. William Corby, immediately planned for the rebuilding of the st... |
| 2 | Architecturally, the school has a Catholic character. Atop the Main Building's gold dome is a golden statue of the Virgin Mary. Immediately in front of the Main Building and facing it, is a copper statue of Christ with arms upraised with the legend \"Venite Ad Me Omnes\". Next to the Main Building... | As of 2012[update] research continued in many fields. The university president, John Jenkins, described his hope that Notre Dame would become \"one of the pre–eminent research institutions in the world\" in his inaugural address. The university has many multi-disciplinary institutes devoted to res... |