ML Project on Detection of Parkinson’s Disease with Python
Being a Python Developer I have always heard about its application in all the latest technological advancements going on around the world. This always fascinated me and I’ve felt encouraged to delve into the details of how it is done.
Coincidentally, I heard news about one of my relatives suffering from Parkinson’s Disease and how their doctors were unable to diagnose it in the earlier stages.
This is where I got the idea for my next adventure with Python. I decided to build a system that could detect Parkinson’s Disease quite easily. For those who are unfamiliar with the disease, it is a disorder of the nervous system that affects the movement of parts of the body.
MY MACHINE LEARNING ADVENTURE WITH PYTHON
Insights from the Project
For this Machine Learning Project, I used different Python libraries such as scikit-learn, NumPy, pandas, and xgboost to build a model by using XGBClassifier. The flow of the project goes like this — loading the data, getting the features and labels, scaling the features, then splitting the dataset, building an XGBClassifier, and then calculating the accuracy of the model.
I used the UCI ML Parkinson’s dataset for my project. It has 24 columns and 195 records and is sized only 39.7 KB.
Download Link: UCI ML Parkinsons dataset
I installed the following libraries through pip:
pip install numpy pandas sklearn xgboost
I also installed JupyterLab and then run through the following command:
This prompted a new JupyterLab window into the browser. Then I created a new console and typed in my code, then I pressed Shift+Enter to execute the lines.
THIS IS HOW I BUILT MY PROJECT:
The first step of any project is to make all the necessary imports for the project. I used the following commands:
import numpy as np
import pandas as pd
import os, sys
from sklearn.preprocessing import MinMaxScaler
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
Now the next step was to read the data into a DataFrame so that I get the first five records.
#Rinu— Read the data
Then I got the features and labels from the DataFrame (dataset). The features are all the columns except ‘status’, and the labels are those in the ‘status’ column.
#Rinu — Get the features and labels
The ‘status’ column has values 0 and 1 as labels; then I counted the labels for both 0 and 1.
#Rinu— Get the count of each label (0 and 1) in labels
There are 147 ones and 48 zeros in the status column in the dataset.
So, now I initialized a MinMaxScaler and scaled the features to between -1 and 1 to normalize them. The MinMaxScaler is meant to transform the features by scaling them to a given range. The fit_transform() method does two things; fits to the data and then transforms it. There’s no need to scale the labels.
#Rinu — Scale the features to between -1 and 1
Time to split the dataset into training and testing sets keeping only 20% of the data for testing.
#Rinu — Split the dataset
x_train,x_test,y_train,y_test=train_test_split(x, y, test_size=0.2, random_state=7)
I then initialized an XGBClassifier and trained the model. The classification is done using eXtreme Gradient Boosting, i.e. using gradient boosting algorithms for modern Data Science problems. This comes under the category of Ensemble Learning in ML, where we train and predict using many models to produce one superior output.
#Rinu — Train the model
The final step is to generate y_pred (predicted values for x_test) and to calculate the accuracy for the model. Finally, I printed it.
#Rinu — Calculate the accuracy
THE FINAL NOTE…
In this machine learning project of mine, I detected the presence of Parkinson’s Disease in people using various factors. For this, I used an XGBClassifier and made use of the sklearn library to prepare the dataset. Through this process, I achieved an accuracy of 94.87%, which is more than decent considering the number of lines of code in this Python project.
Using Python to Detect Early Onset Parkinson’s Disease was originally published in codeburst on Medium, where people are continuing the conversation by highlighting and responding to this story.