Machine Learning

Machine Learning — Overview

Machine learning is a subset of artificial intelligence (AI) that involves the development of algorithms and statistical models that enable computers to learn from and make predictions or decisions based on data. Rather than being explicitly programmed to perform specific tasks, machine learning systems use data to identify patterns and improve their performance over time.

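Supervised learning: In this type of machine learning, the model is trained on a labeled dataset, meaning that each training example is paired with an output label. The model learns to predict the output from the input data. Common algorithms include decision trees, support vector machines, and neural networks. Linear regression, used in the example here, is also a supervised method.

Machine learning example: The scatter plot produced by the following code shows the linear regression model's predicted values against the actual values. Points close to the dashed red diagonal are predictions that closely match the actual value.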

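# Plot actual vs. predicted values for the linear regression model.
# Assumes y_test (true targets) and y_pred_test (model predictions) already exist.
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

data1 = pd.DataFrame({'Actual': y_test, 'Predicted': y_pred_test})

sns.scatterplot(data=data1, x='Actual', y='Predicted')
# Dashed red diagonal marks perfect predictions (Predicted == Actual).
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', lw=3)
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()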
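The plotting code above relies on actual and predicted values whose construction is not shown in this write-up. As a minimal sketch of the supervised workflow (fit on labeled training data, then predict on held-out data), the example below fits a one-feature least-squares line using only the standard library. The synthetic data, the 80/20 split, and the helper `fit_line` are assumptions made for illustration; they are not the dataset or the model code used in the analysis.

```python
import random

# Synthetic, made-up data: y is roughly 3*x + 5 plus noise (not the FIFA dataset).
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(100)]
ys = [3.0 * x + 5.0 + random.gauss(0, 1) for x in xs]

# Hold out the last 20% of the examples as a test set.
split = int(0.8 * len(xs))
x_train, x_test = xs[:split], xs[split:]
y_train, y_test = ys[:split], ys[split:]


def fit_line(x, y):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Learn the relationship from the labeled training examples only.
slope, intercept = fit_line(x_train, y_train)

# Predictions on unseen inputs; these play the role of y_pred_test in the plot.
y_pred_test = [slope * x + intercept for x in x_test]

# Mean absolute error on the held-out examples.
mae = sum(abs(a - p) for a, p in zip(y_test, y_pred_test)) / len(y_test)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MAE={mae:.2f}")
```

Here `y_test` and `y_pred_test` are plain lists; wrapping them in a pandas Series would be needed before calling `.min()` and `.max()` as the plotting code does.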

Conclusions: Comparing wages and performance between the top 20 and the bottom 20 players, ranked by potential, gives a basis for a plan to help bottom-ranked players move toward the top. The charts that follow help identify areas that may be improved, such as weight and pace.

# Top 20 players ranked by potential (df1 is the player DataFrame loaded earlier).
top_players = df1.sort_values(by='potential', ascending=False).head(20)
top_players

# Plot the distribution of selected attributes for the top 20 players.
plt.figure(figsize=(15,15))
#x = ['overall','potential','wage_eur','pace','physic','skills']
x = ['work_rate','preferred_foot','wage_eur','pace','weight_kg']
plt.subplots_adjust(left=0.1,
                    bottom=0.1,
                    right=0.9,
                    top=0.9,
                    wspace=0.4,
                    hspace=0.4)

# Lay the histograms out on a 4 x 3 grid, one subplot per attribute.
width = 3
height = 4
for index, i in enumerate(x, start=1):
    plt.subplot(height, width, index)
    sns.histplot(x=top_players[i], kde=True)
    plt.xlabel(i)
    plt.xticks(rotation=45)
plt.show()

# Bottom 20 players ranked by potential (tail of the descending sort).
bottom_players = df1.sort_values(by='potential', ascending=False).tail(20)
bottom_players

# Plot the same attributes for the bottom 20 players for comparison.
plt.figure(figsize=(15,15))
#x = ['overall','potential','wage_eur','pace','physic','skills']
x = ['work_rate','preferred_foot','wage_eur','pace','weight_kg']
plt.subplots_adjust(left=0.1,
                    bottom=0.1,
                    right=0.9,
                    top=0.9,
                    wspace=0.4,
                    hspace=0.4)

# Lay the histograms out on a 4 x 3 grid, one subplot per attribute.
width = 3
height = 4
for index, i in enumerate(x, start=1):
    plt.subplot(height, width, index)
    sns.histplot(x=bottom_players[i], kde=True)
    plt.xlabel(i)
    plt.xticks(rotation=45)
plt.show()
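The histograms above show each group separately. As a rough sketch of how the two groups could also be compared numerically, the example below computes the mean of a few numeric attributes for each group and the gap between them. The tiny `df1` here is an invented stand-in with made-up values (only the column names match those used above), and `n = 3` replaces the 20 players used in the analysis to keep the example small.

```python
import pandas as pd

# Hypothetical miniature stand-in for df1; all values are invented for illustration.
df1 = pd.DataFrame({
    "potential": [93, 91, 90, 65, 62, 60],
    "wage_eur": [320000, 270000, 250000, 2000, 1500, 1000],
    "pace": [90, 85, 88, 60, 58, 55],
    "weight_kg": [72, 75, 70, 80, 82, 78],
})

n = 3  # the write-up uses 20; 3 fits the toy data
ranked = df1.sort_values(by="potential", ascending=False)
top = ranked.head(n)      # highest-potential players
bottom = ranked.tail(n)   # lowest-potential players

# Mean of each numeric attribute per group, plus the difference between groups.
numeric_cols = ["wage_eur", "pace", "weight_kg"]
comparison = pd.DataFrame({
    "top_mean": top[numeric_cols].mean(),
    "bottom_mean": bottom[numeric_cols].mean(),
})
comparison["gap"] = comparison["top_mean"] - comparison["bottom_mean"]
print(comparison)
```

A positive gap means the top group averages higher on that attribute; a table like this can help decide which histograms deserve a closer look.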

Conclusion
To summarize, while I think more work can be done targeting specific players for areas of improvement, we can focus on each individual chart displayed to find specific instances for further analysis. The idea is to explore the data and perform the necessary linear regression to develop relationships between different players at different skill levels. I was surprised by the insight into the many aspects of game play that can be quantified to develop an exercise program to improve players of all levels.

I think future work could usefully develop predictive models, as well as indicators that predict future outcomes more consistently.

Resources
The dataset used can be found here:
https://www.kaggle.com/datasets/stefanoleone992/fifa-22-complete-player-dataset/data