Data Visualization and Machine Learning

FIFA 22: Exploratory Data Analysis
This analysis covers importing libraries, cleaning and manipulating the data, and exploring it.



Import pandas, a fast, powerful, flexible, and easy-to-use open source data analysis and manipulation tool built on top of the Python programming language.

Import NumPy, a library for the Python programming language that adds support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.

Import random, the Python standard library module for generating pseudo-random numbers.

Import plotly, a Python graphing library that makes interactive, publication-quality graphs, including line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes, polar charts, and bubble charts.

Import Seaborn, a Python data visualization library based on matplotlib.

Import Matplotlib, a comprehensive library for creating static, animated, and interactive visualizations in Python.

Machine Learning

Import scikit-learn (sklearn), a machine learning library for Python that provides simple and efficient tools for predictive data analysis, accessible to everybody and reusable in various contexts. Linear regression uses the relationship between the data points to fit the straight line that lies closest to them overall (it will not usually pass through every point). This line can then be used to predict values for new inputs. DecisionTreeClassifier is a class capable of performing multi-class classification on a dataset.
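As a rough illustration of the line-fitting idea described above, here is a minimal sketch using LinearRegression. The five data points are made up for the example and are not taken from the FIFA 22 data; the variable names are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up points that lie close to a straight line (not FIFA data).
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # one feature per row
y = np.array([3.1, 4.9, 7.2, 8.8, 11.0])           # target values

# Fit the least-squares line through the points.
model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]        # change in y per unit change in x
intercept = model.intercept_  # value of the line at x = 0

# Use the fitted line to predict the target for an unseen input.
predicted = model.predict(np.array([[6.0]]))[0]
print(slope, intercept, predicted)
```

The fitted line does not have to pass through any of the points exactly; it is the line that keeps the squared vertical distances to the points as small as possible.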

import pandas as pd 
import numpy as np
import random as rnd
import plotly.express as px

# visualization
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

# machine learning
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# load the player data (expects players_22.csv in the working directory)
df = pd.read_csv('players_22.csv')

Now that we have loaded the data, let's see if we can answer some questions.

Is the data clean, or do we need to remove nulls or fix other problems that would distort the analysis before we can use it?
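One way to start answering this question is to count the missing values in each column. The sketch below uses a small made-up DataFrame as a stand-in so that it runs on its own; the column names and values are hypothetical and are not taken from players_22.csv. The same calls should work on df.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the players table (names and values are invented).
sample = pd.DataFrame({
    "short_name": ["Player A", "Player B", "Player C", "Player D"],
    "overall": [91, 88, np.nan, 85],
    "club_name": ["Club X", None, "Club Y", None],
})

# Number of missing values in each column.
null_counts = sample.isnull().sum()

# Fraction of rows that are missing in each column.
null_share = sample.isnull().mean()

# Names of the columns that contain at least one missing value.
columns_with_nulls = null_counts[null_counts > 0].index.tolist()

# One possible cleanup: drop rows that lack a value in a column we need.
cleaned = sample.dropna(subset=["overall"])

print(null_counts)
print(columns_with_nulls)
```

Whether to drop rows, fill in values, or leave the nulls alone depends on the column and on the question being asked, so the dropna call here is only one option.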