Analyzing NBA Free Agency using Machine Learning.

Source: unsplash.com/

The National Basketball Association is one of many professional sports leagues that impose financial restrictions on team salaries. These restrictions are called the NBA Salary Cap and are defined by the league’s collective bargaining agreement (CBA). The Salary Cap is imposed to ensure balance among teams, and it prevents the Lakers from signing every All-Star they can get their hands on. NBA general managers play a crucial role in building a team capable of winning the Championship while keeping team salary within the Salary Cap rules. In this article, we investigate how successful teams are in allocating their contracts and optimizing their rosters. This study is restricted to NBA Free Agency classes from 2016 to the most recent class in 2019.

Why? Because it’s all we could find. :(

Methodology:

This study considers Free Agency signings and the amount paid in the first year of a new contract. That is, if a player signs a 3-year contract, only the first year’s salary is considered. To explore a team’s Free Agency practices, we built a Multiple Linear Regression model in Python that predicts NBA player salaries from their statistics. Using the residuals from this model (actual salary minus predicted salary), we can analyze which players are inaccurately valued and which teams tend to overvalue or undervalue players when signing them.

Data:

All data used for this model is sourced from Basketball-Reference and Basketball Real GM and consists of player information, regular statistics, advanced statistics, and individual accomplishments. The collected features are as follows:

Player information:
  1. Player Name
  2. Age
  3. Position
  4. Old Team (before Free Agency)
  5. Draft Pick
  6. Draft Year
  7. Years Played in the League
  8. New Team (Team that signed them)
  9. Salary for the first year of a contract

Regular statistics:
  1. Games Played
  2. Games Started
  3. Minutes Played/Game
  4. Field Goals/Game
  5. Field Goals Attempted/Game
  6. Field Goals Percentage
  7. 3 Points/Game
  8. 3 Points Attempted/Game
  9. 3 Points Percentage
  10. 2 Points/Game
  11. 2 Points Attempted/Game
  12. 2 Points Percentage
  13. Effective Field Goals Percentage
  14. Free Throws/Game
  15. Free Throws Attempted
  16. Free Throws Percentage
  17. Offensive Rebounds/Game
  18. Defensive Rebounds/Game
  19. Total Rebounds/Game
  20. Assists/Game
  21. Steals/Game
  22. Blocks/Game
  23. Turnovers/Game
  24. Personal Fouls/Game
  25. Points/Game

Advanced statistics:
  1. Player Efficiency Rating (PER)
  2. True Shooting Percentage
  3. 3 Point Attempt Rate (3PAr)
  4. Free Throw Attempt Rate (FTr)
  5. Offensive Rebounds Percentage
  6. Defensive Rebounds Percentage
  7. Total Rebounds Percentage
  8. Assists Percentage
  9. Steals Percentage
  10. Block Percentage
  11. Turnovers Percentage
  12. Usage Percentage
  13. Offensive Win Shares
  14. Defensive Win Shares
  15. Win Shares
  16. Win Shares Per 48 Minutes
  17. Offensive Box Plus/Minus
  18. Defensive Box Plus/Minus
  19. Box Plus/Minus
  20. Value over Replacement Player (VORP)

Individual accomplishments:
  1. NBA All-Star Appearance This Year
  2. Total NBA All-Star Appearances
  3. All NBA First, Second or Third Team Nomination This Year
  4. Total All NBA First, Second or Third Team Nominations
  5. All-Defensive First or Second Team Nomination This Year
  6. Total All-Defensive First or Second Team Nominations
  7. All-Rookie First or Second Team Nomination This Year
  8. Total All-Rookie First or Second Team Nominations
  9. Most Valuable Player (MVP) This Year
  10. Total Most Valuable Player Awards
  11. Rookie of the Year This Year
  12. Total Rookie of the Year Awards
  13. Defensive Player of the Year This Year
  14. Total Defensive Player of the Year Awards
  15. Sixth Man of the Year This Year
  16. Total Sixth Man of the Year Awards
  17. Most Improved Player of the Year This Year
  18. Total Most Improved Player of the Year Awards

According to the CBA, individual player contracts are limited to a certain percentage of the Salary Cap rather than a fixed dollar amount, because the Cap fluctuates from season to season with league revenue. For this model, we therefore used each player’s first-year salary as a percentage of that season’s Salary Cap as the target variable.
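As a rough illustration of how this target could be computed, here is a minimal sketch. The cap figures in `SALARY_CAP`, the function name `pct_of_cap`, and the example contract are assumptions added for illustration and are not taken from the original dataset; the cap values should be checked against the league’s official announcements.

```python
# Salary cap per Free Agency class, in USD.
# ASSUMPTION: illustrative figures, verify against official league sources.
SALARY_CAP = {
    2016: 94_143_000,
    2017: 99_093_000,
    2018: 101_869_000,
    2019: 109_140_000,
}


def pct_of_cap(first_year_salary: float, free_agency_year: int) -> float:
    """Return the first-year salary as a percentage of that season's cap."""
    return 100.0 * first_year_salary / SALARY_CAP[free_agency_year]


# Hypothetical contract: $20M in the first year, signed in the 2018 class.
example = pct_of_cap(20_000_000, 2018)
print(f"{example:.2f}% of the cap")
```

Expressing salaries this way makes contracts from different seasons comparable even though the cap itself changes every year.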

Multiple Linear Regression:

Using Python’s Scikit-Learn library, we built a Linear Regression model. To refine this model, we used a Backward Elimination algorithm to reduce its dimensionality and 10-fold cross-validation to evaluate its performance. Backward Elimination starts from the full feature set and repeatedly drops the least useful feature, which keeps the features that have the most impact on our target and helps the model generalize. The following features were chosen by the Backward Elimination algorithm:
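The article does not say which elimination criterion was used, so the following is only a sketch of one way to combine backward elimination with 10-fold cross-validation. It assumes Scikit-Learn’s `SequentialFeatureSelector` with `direction="backward"` as a stand-in for the author’s procedure, and it runs on synthetic data rather than the real player table.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data: 200 "players", 6 features, only the first 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)

# 10-fold cross-validation, shuffled with a fixed seed for reproducibility.
cv = KFold(n_splits=10, shuffle=True, random_state=0)

# Backward elimination: start with all features and drop the one whose
# removal hurts the cross-validated R^2 the least, until 3 remain.
selector = SequentialFeatureSelector(
    LinearRegression(),
    n_features_to_select=3,  # assumption: chosen by hand for this toy example
    direction="backward",
    scoring="r2",
    cv=cv,
)
selector.fit(X, y)
selected = np.flatnonzero(selector.get_support())

# Evaluate the reduced model with the same 10 folds.
X_sel = selector.transform(X)
r2_scores = cross_val_score(LinearRegression(), X_sel, y, cv=cv, scoring="r2")
rmse_scores = -cross_val_score(
    LinearRegression(), X_sel, y, cv=cv, scoring="neg_root_mean_squared_error"
)

print("Selected feature indices:", selected.tolist())
print("Mean R^2 :", r2_scores.mean())
print("Mean RMSE:", rmse_scores.mean())
```

A p-value based elimination is another common choice; the cross-validated score is used here only because it needs nothing beyond Scikit-Learn.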

  • Age
  • All NBA Second Team Nomination This Year
  • NBA All-Star Appearance This Year
  • Positions
  • Defensive Win Shares
  • Draft Year
  • Field Goals Attempted
  • Games Played
  • Offensive Rebounds
  • Offensive Win Shares
  • Turnovers
  • Turnover Percentage
  • Total Sixth Man of the Year Awards
  • Total All-Defensive Second Team Nominations
  • Total All NBA First Team Nominations
  • Total Most Valuable Player Awards

This resulted in a model with an R-squared value of 0.728 and a Root Mean Squared Error of 3.9. This means that our model explains 72.8% of the variance in the ‘Percentage of Salary Cap’ across players and that its predictions typically miss the actual value by about 3.9 percentage points.
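For readers who want to see what these two metrics measure, here is a small sketch that computes them by hand. The `actual` and `predicted` values are made-up numbers for illustration, not results from the model.

```python
import math


def r_squared(actual, predicted):
    """1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Square root of the mean squared residual."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up 'Percentage of Salary Cap' values for four hypothetical players.
actual = [5.0, 10.0, 20.0, 30.0]
predicted = [7.0, 9.0, 18.0, 31.0]

r2 = r_squared(actual, predicted)
error = rmse(actual, predicted)
print(r2, error)
```

Because RMSE squares each miss before averaging, large misses weigh more heavily than in a plain average of absolute errors.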

Residual Plot from Linear Model.

Feature Impact:

The linear regression model gives the following coefficients:

NBA All-Star Appearance This Year, Offensive and Defensive Win Shares, Total All-Defensive Second Team Nominations, and Total Sixth Man of the Year Awards have the highest positive coefficients. This means these features have a strong positive impact on each player’s ‘Percentage of Salary Cap’, i.e. the higher these features, the higher the predicted ‘Percentage of Salary Cap’.

All NBA Second Team Nomination This Year, Total Most Valuable Player Awards, and Turnovers/Game have the strongest negative impact. It makes sense that a high Turnovers/Game value would decrease the ‘Percentage of Salary Cap’ allocated to a player. However, the first two features seem to make no sense at first. On further investigation, we observe that these features normalize the predicted value for high-impact players. A model without these two features would produce extreme values for a star player.

Predicted % of Cap before and after dropping normalizing features.

As the model produces predictions for each player’s ‘Percentage of Salary Cap’, we categorized players into:

  • Overpaid: Those who are paid at least 2.5 percentage points more than their predicted amount.
  • Underpaid: Those who are paid at least 2.5 percentage points less than their predicted amount.

This margin gives NBA teams some flexibility, as it would be unreasonable to expect teams to pay the exact amount predicted by this model.
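The categorization above can be sketched as a simple threshold on the residual (actual minus predicted). The function name `classify` and the label `"fair"` for players inside the margin are assumptions made for this example; the article only names the overpaid and underpaid groups.

```python
THRESHOLD = 2.5  # percentage points of the Salary Cap, as stated above


def classify(actual_pct: float, predicted_pct: float,
             threshold: float = THRESHOLD) -> str:
    """Label a signing by comparing actual and predicted % of cap."""
    residual = actual_pct - predicted_pct
    if residual >= threshold:
        return "overpaid"
    if residual <= -threshold:
        return "underpaid"
    return "fair"  # hypothetical label for signings within the margin


# Made-up examples: (actual % of cap, predicted % of cap).
print(classify(20.0, 15.0))
print(classify(10.0, 14.0))
print(classify(12.0, 11.0))
```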

Model Insights:

Looking at the residuals, we get the following insights:

  • The model tends to undervalue star players. This could be because All NBA Second Team Nomination This Year and Total Most Valuable Player Awards have strong negative impacts on the target variable.
  • The model tends to overvalue Centers. This makes sense given the current landscape of the league. Centers are no longer considered as valuable in modern basketball, with teams focusing more on three-point shooting and perimeter-oriented offense.

Hence the following insights are observed:

  • Looking at the tables below, we observe that teams tend to overpay players who had good careers but have started to regress due to age or major injuries. Older players and role players tend to be underpaid by teams.
Overpaid players with high residuals.
Overpaid Players with high Percentage of Salary Cap.
Underpaid players with high residuals.
Underpaid players with high Percentage of Salary Cap.
  • Looking at age distributions, we observe that older players tend to get underpaid. This is likely because teams are reluctant to sign them over younger, more energetic players.
Age distributions of Players
  • Looking at position distributions, it’s clear that Centers tend to be underpaid.
Distribution of Positions.
Underpaid Players Ratio Chart
Overpaid Players Ratio Chart
  • The Sacramento Kings have the highest ratio of overpaid players. However, they do not tend to overpay them by a high margin: their average residual margin for overpaid players is 4.42%.
  • The Charlotte Hornets and Minnesota Timberwolves overpay by a large margin when they do overpay players; however, they do not overpay players often.
  • The Chicago Bulls and Philadelphia 76ers tend to overpay players and do so by a relatively high margin. They overpay players with an average residual margin of 6.85% and 6.63% respectively.
  • The New Orleans Pelicans have the highest ratio of underpaid players, and they tend to underpay them by a sizable margin. However, they also tend to overpay players by a high margin, with an average residual margin of 6.3% for overpaid players.
  • The Cleveland Cavaliers have the highest average residual margin for underpaid players. However, they seldom underpay players.
  • The Indiana Pacers and Golden State Warriors have a high ratio of underpaid players and are amongst the teams with a low ratio of overpaid players.
  • The Golden State Warriors have a very high average residual margin for underpaid players at -6.76%. It’s safe to claim that they have been very successful in Free Agency.

The Golden State Warriors and Indiana Pacers are amongst the most cost-efficient teams. The Philadelphia 76ers and Chicago Bulls tend to overvalue talent and are the least cost-efficient teams.
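The team-level numbers above can be reproduced in spirit with a small aggregation like the one below. This is only a sketch: it assumes that a team’s "ratio" is the share of its own signings that fall in each group and that the "average residual margin" is the mean residual within that group, which the article does not state explicitly. The function `team_summary`, the team names, and the rows are all made up.

```python
from collections import defaultdict


def team_summary(signings, threshold=2.5):
    """Summarize over/underpayment per team.

    signings: iterable of (team, actual_pct, predicted_pct) tuples.
    """
    buckets = defaultdict(lambda: {"n": 0, "over": [], "under": []})
    for team, actual, predicted in signings:
        residual = actual - predicted
        bucket = buckets[team]
        bucket["n"] += 1
        if residual >= threshold:
            bucket["over"].append(residual)
        elif residual <= -threshold:
            bucket["under"].append(residual)

    def mean_or_none(values):
        return sum(values) / len(values) if values else None

    return {
        team: {
            "overpaid_ratio": len(b["over"]) / b["n"],
            "underpaid_ratio": len(b["under"]) / b["n"],
            "avg_over_margin": mean_or_none(b["over"]),
            "avg_under_margin": mean_or_none(b["under"]),
        }
        for team, b in buckets.items()
    }


# Made-up signings: (team, actual % of cap, predicted % of cap).
rows = [
    ("Team A", 20.0, 14.0),
    ("Team A", 8.0, 7.0),
    ("Team B", 5.0, 10.0),
    ("Team B", 12.0, 16.0),
]
summary = team_summary(rows)
print(summary)
```

Looking at the ratio and the margin together is what separates a team that overpays often but slightly from one that overpays rarely but heavily.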

Problems and Potential Improvements

  • This model fails to consider intangible characteristics. Leadership qualities, player potential, and team spirit are difficult to quantify numerically. However, they have massive impacts on team success. This is one of the reasons why this model undervalues star players like Jimmy Butler and Klay Thompson.
  • The big market effect has not been explored in this model. Players gravitate towards Big Market cities such as Los Angeles or New York because of the commercial success that comes with increased media coverage and sponsorship deals. Smaller Market teams are forced to overpay their players as an incentive to stay, e.g. Buddy Hield managed to negotiate a larger contract by arguing that no ‘Big-Name’ Free Agent comes to the Sacramento Kings.
  • The dataset is quite small. Having more data from older Free Agency classes would provide a more stable and accurate model.

Github Repo

Thanks for Reading

Data Scientist | Basketball Enthusiast | Comic Book Geek
