Example - Vector Recommendations With NBA Players

As the NBA keeps introducing new ways of calculating player statistics, it makes less and less sense to compare players with traditional, human-oriented approaches and judgement calls.

Here, we want to identify players whose statistics resemble those of other players without comparing every statistic line by line. We want the answer directly.

Below, we find each player's most similar counterparts and compare how efficient the different players are.

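import pandas as pd

# skiprows=[0] skips the first row of the per-36 sheet before reading the header
nba_per_36 = pd.read_excel('data/nba_per_36.xlsx', skiprows=[0])
nba_per_game = pd.read_excel('data/nba_per_game.xlsx')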
     FULL NAME      TEAM  POS    AGE  GP   MPG  MIN%  USG%  TOR%  FTA    FT%  2PA    2P%  3PA    3P%   eFG%    TS%   PPG  RPG  TRB%  APG  AST%   SPG   BPG  TOPG    VI   ORTG   DRTG
151  Angel Delgado  Lac   C    24.39   2   7.4  15.4  16.8   0.0    2  0.500    5  0.200    0  0.000  0.200  0.255   1.5  2.0  14.3  0.0   0.0  0.50  0.00  0.00   0.0   79.2   98.9
587  John Wall      Was   G    28.60  32  34.5  71.9  28.8  16.3  175  0.697  382  0.508  169  0.302  0.491  0.528  20.7  3.6   5.7  8.7  39.2  1.53  0.91  3.81  10.0  104.1  111.5

Column definitions:
GP: games played. MPG: minutes per game.
MIN% (Minutes Percentage): percentage of team minutes used by a player while he was on the floor.
USG% (Usage Rate): also called usage percentage, an estimate of the percentage of team plays used by a player while he was on the floor.
TOR% (Turnover Rate): an estimate of the number of turnovers a player commits per 100 possessions.
eFG% (Effective Shooting Percentage): three-point shots made are worth 50% more than two-point shots made; eFG% = (FGM + 0.5 × 3PM) / FGA.
TS% (True Shooting Percentage): a measure of shooting efficiency that takes into account field goals, three-point field goals, and free throws.
PPG, RPG, APG, SPG, BPG, TOPG: points, rebounds, assists, steals, blocks, and turnovers per game.
TRB% (Total Rebound Percentage): estimated percentage of available rebounds grabbed by the player while he is on the court.
AST% (Assist Percentage): estimated percentage of teammate field goals a player assisted while he is on the court.
VI (Versatility Index): measures a player's ability to produce points, assists, and rebounds. The average player scores around 5 on the index, while top players score above 10.
ORTG (Offensive Rating): points produced by a player per 100 total individual possessions.
DRTG (Defensive Rating): estimated points the player allowed per 100 possessions he individually faced while on the court.
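As a sanity check on the eFG% definition above, John Wall's row can be recomputed from his attempts and percentages. This sketch assumes FTA, 2PA and 3PA are season totals (consistent with his 32 games) and uses the commonly cited TS% formula PTS / (2 × (FGA + 0.44 × FTA)); the column notes do not state that formula, so treat it as an assumption about how the data source computes TS%.

```python
# John Wall's per-game row from the sample above (attempts assumed to be season totals).
fta, ft_pct = 175, 0.697
two_pa, two_pct = 382, 0.508
three_pa, three_pct = 169, 0.302
games = 32

# Recover made shots as attempts x percentage, rounded to whole makes.
two_pm = round(two_pa * two_pct)        # 194
three_pm = round(three_pa * three_pct)  # 51
ftm = round(fta * ft_pct)               # 122

fga = two_pa + three_pa   # field goal attempts
fgm = two_pm + three_pm   # field goals made
points = 2 * two_pm + 3 * three_pm + ftm

# eFG% exactly as defined in the column notes.
efg = (fgm + 0.5 * three_pm) / fga
# TS% via the common 0.44 free-throw weighting (assumed formula).
ts = points / (2 * (fga + 0.44 * fta))
ppg = points / games

print(round(efg, 3), round(ts, 3), round(ppg, 1))  # 0.491 0.528 20.7
```

All three values match the table, which suggests the shooting columns are internally consistent.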
from vectorai import ViClient
# username, api_key and url are your own Vector AI account credentials
vi_client = ViClient(username, api_key, url)
Logged in. Welcome jacky-wong. To view list of available collections, call list_collections() method.
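from sklearn.preprocessing import StandardScaler

def create_collection(df, collection_name):
    # Treat missing stats as 0 so every player gets a complete vector
    df = df.fillna(0)
    # Standardise every numeric stat to mean 0 and standard deviation 1 so that
    # large-count columns (e.g. FTA) do not dominate small ratios (e.g. FT%)
    scaler = StandardScaler()
    season_vector = scaler.fit_transform(df.drop(['FULL NAME', 'TEAM', 'POS', 'AGE', 'MPG'], axis=1))
    df['season_vector_'] = season_vector.tolist()
    # Start from a clean collection on re-runs (assumes ViClient.delete_collection)
    if collection_name in vi_client.list_collections():
        vi_client.delete_collection(collection_name)
    return vi_client.insert_df(collection_name, df, chunksize=100)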
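Why standardise before building season_vector_? Raw box-score columns sit on very different scales, so an unscaled distance between two players is driven almost entirely by count columns. The sketch below takes three columns from the two sample rows and measures how much of the squared Euclidean distance between the players each column accounts for, before and after StandardScaler. Euclidean distance is chosen for illustration only; which metric Vector AI applies to season_vector_ is not stated here.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Three columns copied from the two sample rows above.
stats = pd.DataFrame(
    {"FTA": [2, 175], "FT%": [0.500, 0.697], "TS%": [0.255, 0.528]},
    index=["Angel Delgado", "John Wall"],
)

def column_shares(X):
    """Fraction of the squared Euclidean distance between the two rows owed to each column."""
    sq = np.square(X[0] - X[1])
    return sq / sq.sum()

raw = stats.to_numpy(dtype=float)
scaled = StandardScaler().fit_transform(stats)  # mean 0, population std 1 per column

print(column_shares(raw).round(4))     # [1. 0. 0.]  FTA swamps the ratios
print(column_shares(scaled).round(4))  # [0.3333 0.3333 0.3333]
```

After scaling, each stat contributes equally here, which is the property create_collection relies on when it turns every season into a single vector.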

create_collection(nba_per_game, 'nba_season_per_game_stats_demo')

{'inserted_successfully': 622, 'failed': 0, 'failed_document_ids': []}
create_collection(nba_per_36, 'nba_season_per_36_stats_demo')

{'inserted_successfully': 212, 'failed': 0, 'failed_document_ids': []}
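Once every player is a vector, "most similar players" becomes a nearest-neighbour query. The Vector AI collection answers that on the server; the sketch below reproduces the idea locally with cosine similarity in NumPy so the mechanics are visible. Cosine similarity is an assumption (the default metric of the service is not stated here), and the player names and numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

def most_similar(vectors: pd.DataFrame, name: str, k: int = 3) -> pd.Series:
    """Return the k rows with the highest cosine similarity to the row labelled `name`."""
    X = vectors.to_numpy(dtype=float)
    # L2-normalise each row so a dot product equals cosine similarity
    # (assumes no all-zero rows).
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    query = X[vectors.index.get_loc(name)]
    sims = pd.Series(X @ query, index=vectors.index)
    # Exclude the query player, then keep the k most similar.
    return sims.drop(name).sort_values(ascending=False).head(k)

# Hypothetical standardised vectors, not real NBA stats.
toy = pd.DataFrame(
    [[1.0, 0.5, -0.2], [0.9, 0.6, -0.1], [-1.0, -0.4, 0.3], [0.1, -1.2, 1.0]],
    index=["Player A", "Player B", "Player C", "Player D"],
    columns=["PPG", "APG", "RPG"],
)
result = most_similar(toy, "Player A", k=2)
print(result)
```

With real data, the rows would be the season_vector_ lists indexed by FULL NAME, provided player names are unique in the table.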

Visualising NBA players

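# Run one 2-D reduction job per collection and wait for each to finish
job_per_game = vi_client.dimensionality_reduction_job('nba_season_per_game_stats_demo', vector_field='season_vector_', n_components=2)
vi_client.wait_till_jobs_complete('nba_season_per_game_stats_demo', **job_per_game)
job_per_36 = vi_client.dimensionality_reduction_job('nba_season_per_36_stats_demo', vector_field='season_vector_', n_components=2)
vi_client.wait_till_jobs_complete('nba_season_per_36_stats_demo', **job_per_36)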
{'status': 'Finished'}
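The reduction job projects each season_vector_ down to n_components=2 so players can be drawn on a plane. Which algorithm the service runs is not stated here, so the sketch below swaps in scikit-learn's PCA purely to show what a local 2-component projection looks like; the input is random data used only to illustrate shapes.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Random stand-in: 50 "players" x 23 numeric stats (not real NBA data).
stats = rng.normal(size=(50, 23))

vectors = StandardScaler().fit_transform(stats)  # same scaling as create_collection
pca = PCA(n_components=2)
coords = pca.fit_transform(vectors)  # one (x, y) point per player

print(coords.shape)  # (50, 2)
# Share of total variance the 2-D view keeps; a low value means the plot hides a lot.
print(pca.explained_variance_ratio_.sum())
```

Checking the retained variance is a cheap way to judge how far to trust distances in the resulting scatter plot.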
# rename cluster_field to vector_field
fig = vi_client.plot_dimensionality_reduced_vectors(
    collection='nba_season_per_game_stats_demo', point_label='FULL NAME', cluster_field='season_vector_', cluster_label='POS',
    title="NBA Players Stats Per Game",