Building your first Machine Learning Classifier in Python


Machine Learning is the buzzword right now. Some incredible stuff is being done with the help of machine learning. From being our personal assistant, to deciding our travel routes, helping us shop, aiding us in running our businesses, to taking care of our health and wellness, machine learning is integrated into our daily existence at such fundamental levels that most of the time we don’t even realize that we are relying on it. In this article, we will follow a beginner’s approach to implementing a standard machine learning classifier in Python.

Overview of Machine Learning
A Template for Machine Learning Classifiers
Machine Learning Classifier Problem

Overview of Machine Learning

Machine Learning is a concept that allows a machine to learn from examples and experience without being explicitly programmed. Instead of writing the logic yourself, you feed data to a generic algorithm, and the algorithm builds the logic based on that data.


Machine Learning involves the ability of machines to make decisions, assess the results of their actions, and improve their behavior to get successively better results.

The learning process takes place in three major ways:

Supervised Learning
Unsupervised Learning
Reinforcement Learning

A Template for Machine Learning Classifiers

Machine learning tools are conveniently provided in a Python library named scikit-learn, which is simple to access and apply.

Install scikit-learn through the command prompt using:


pip install -U scikit-learn

If you are an anaconda user, on the anaconda prompt you can use:


conda install scikit-learn

scikit-learn depends on the NumPy and SciPy packages; pip and conda install them automatically if they are not already on your system.

Preprocessing: The first and most necessary step in any machine learning-based data analysis is the preprocessing part. Correct representation and cleaning of the data is absolutely essential for the ML model to train well and perform to its potential.

Step 1 – Import necessary libraries


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Step 2 – Import the dataset


dataset = pd.read_csv(<pathtofile>)

Then we split the dataset into independent and dependent variables. The independent variables shall be the input data, and the dependent variable is the output data.


X=dataset.iloc[<range of rows and input columns>].values
y=dataset.iloc[<range of rows and output column>].values
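As a concrete illustration of this split, here is a minimal sketch on a small made-up DataFrame. The column names and values are hypothetical and only serve to show the indexing: every column except the last is treated as input, and the last column as output.

```python
import pandas as pd

# Hypothetical toy data: two input columns and one output column
dataset = pd.DataFrame({
    "age": [25, 32, 47],
    "salary": [40000, 52000, 81000],
    "purchased": ["no", "yes", "yes"],
})

X = dataset.iloc[:, :-1].values  # all rows, every column except the last
y = dataset.iloc[:, -1].values   # all rows, last column only

print(X.shape)  # (3, 2)
print(y.shape)  # (3,)
```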

Step 3 – Handle missing data

The dataset may contain blank or null values, which can cause errors in our results. Hence we need to deal with such entries. A common practice is to replace the null values with a common value, like the mean or the most frequent value in that column.


# Imputer was removed in scikit-learn 0.22; SimpleImputer replaces it
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
imputer = imputer.fit(X[<range of rows and columns>])
X[<range of rows and columns>] = imputer.transform(X[<range of rows and columns>])
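To see mean imputation at work, here is a small runnable sketch. It assumes a current scikit-learn release, where the imputer class is SimpleImputer in sklearn.impute, and it uses a made-up array rather than a real dataset.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Made-up data with one missing value in each column
X = np.array([[1.0, 10.0],
              [np.nan, 20.0],
              [3.0, np.nan]])

# Replace each NaN with the mean of the known values in its column
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)

print(X_filled)
# [[ 1. 10.]
#  [ 2. 20.]
#  [ 3. 15.]]
```

Passing strategy="most_frequent" instead would fill each gap with the most common value in the column.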

Step 4 – Convert categorical variables to numeric variables


from sklearn.preprocessing import LabelEncoder
le_X=LabelEncoder()
X[<range of rows and columns>]=le_X.fit_transform(X[<range of rows and columns>])
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)

Now, after encoding, the model may treat the integer codes as a ranking of the categories. To avoid implying an order that does not exist, we convert the codes to one-hot vectors using the OneHotEncoder class.


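from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# The categorical_features argument was removed in scikit-learn 0.22; ColumnTransformer selects the columns instead
oneHE = ColumnTransformer([("onehot", OneHotEncoder(), [<indices of categorical columns>])],
                          remainder="passthrough", sparse_threshold=0)
X = oneHE.fit_transform(X)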
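The difference between the two encodings is easiest to see on a tiny example. The sketch below uses a made-up column of colour names: LabelEncoder assigns each category an integer code in alphabetical order, while OneHotEncoder gives each category its own 0/1 column.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Made-up categorical column
colors = np.array(["red", "green", "blue", "green"])

# Label encoding: one integer per category (classes are sorted alphabetically)
le = LabelEncoder()
codes = le.fit_transform(colors)
print(le.classes_)  # ['blue' 'green' 'red']
print(codes)        # [2 1 0 1]

# One-hot encoding: one column per category, no implied order
ohe = OneHotEncoder()
one_hot = ohe.fit_transform(colors.reshape(-1, 1)).toarray()
print(one_hot)
# [[0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
```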

Step 5 – Perform scaling

This step deals with discrepancies arising from mismatched scales of the variables. We standardize each feature to zero mean and unit variance, so that all features receive comparable weight when they are fed to the model. We use an object of the StandardScaler class for this purpose.


from sklearn.preprocessing import StandardScaler
sc_X=StandardScaler()
X=sc_X.fit_transform(X)
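A quick way to check what the scaler does is to look at the column means and standard deviations afterwards. This sketch uses a made-up array whose two columns differ in scale by a factor of 100.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: the second column is 100 times larger than the first
X = np.array([[1.0, 100.0],
              [2.0, 200.0],
              [3.0, 300.0]])

sc = StandardScaler()
X_scaled = sc.fit_transform(X)

# After scaling, each column has mean ~0 and standard deviation ~1
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))
```

Both columns end up with identical scaled values, because they differ only by a constant factor.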

Step 6 – Split the dataset into training and testing data

As the last step of preprocessing, the dataset needs to be divided into a training set and a test set. A common train-test ratio is 75%-25%, which is also what train_test_split() uses by default; we can modify it as required.


from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.25)
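Here is a small sketch of the split on made-up data with 20 rows. The random_state argument is an optional addition that is not used elsewhere in this article; fixing it makes the shuffle, and therefore the split, repeatable from run to run.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 samples with 2 features each, and alternating labels
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

# 25% of the 20 rows (5 rows) go to the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape, X_test.shape)  # (15, 2) (5, 2)
```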

Model Building: This step is actually quite simple. Once we decide which model to apply on the data, we can create an object of its corresponding class, and fit the object on our training set, considering X_train as the input and y_train as the output.


from sklearn.<class module> import <model class>
classifier = <model class>(<parameters>)
classifier.fit(X_train, y_train)

The model is now trained and ready. We can apply it to the test set to obtain the predicted output.


y_pred = classifier.predict(X_test)
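To show the template filled in, here is one possible instance. The choice of KNeighborsClassifier and the synthetic data from make_classification are assumptions made purely for illustration; any scikit-learn classifier follows the same fit and predict pattern.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-class data, generated only for this illustration
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# <class module> = neighbors, <model class> = KNeighborsClassifier
classifier = KNeighborsClassifier(n_neighbors=5)
classifier.fit(X_train, y_train)

# One predicted label per test row
y_pred = classifier.predict(X_test)
print(y_pred[:10])
```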

Viewing Results: The performance of a classifier can be assessed with the metrics accuracy, precision, recall and f1-score. These values can be printed using the classification_report() function. The results can also be viewed as a confusion matrix, which shows how many entries of each category were classified correctly.


from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

from sklearn.metrics import classification_report
target_names = [<list of class names>]
print(classification_report(y_test, y_pred, target_names=target_names))
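Reading a confusion matrix is easier with a hand-checkable example. In the sketch below the labels are made up; rows are the true classes, columns are the predicted classes, and the diagonal holds the correct predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true and predicted labels
y_true = ["cat", "cat", "dog", "dog", "dog", "bird"]
y_hat = ["cat", "dog", "dog", "dog", "cat", "bird"]

# Fix the row/column order explicitly
labels = ["bird", "cat", "dog"]
cm = confusion_matrix(y_true, y_hat, labels=labels)
print(cm)
# [[1 0 0]
#  [0 1 1]
#  [0 1 2]]

# Accuracy is the diagonal sum divided by the total: 4 / 6
acc = accuracy_score(y_true, y_hat)
print(round(acc, 3))  # 0.667
```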

Machine Learning Classifier Problem

We will use the very popular and simple Iris dataset, containing dimensions of flowers in 3 categories – Iris-setosa, Iris-versicolor, and Iris-virginica. There are 150 entries in the dataset.


# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# Importing the dataset
dataset = pd.read_csv('iris.csv')

Let us view the dataset now.


dataset.head()

We have 4 independent variables (excluding the Id), namely column numbers 1-4, and column 5 is the dependent variable. So we can separate them out.


X = dataset.iloc[:, 1:5].values
y = dataset.iloc[:, 5].values
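If you do not have iris.csv at hand, scikit-learn ships a copy of the same dataset. The sketch below is an alternative way to obtain X and y, not the route this article follows: the bundled version has no Id column, and it encodes the species as the integers 0, 1 and 2 instead of strings.

```python
from sklearn.datasets import load_iris

iris = load_iris()
X = iris.data    # the 4 flower measurements
y = iris.target  # species encoded as 0, 1, 2

print(X.shape)            # (150, 4)
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']
```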

Now we can split the dataset into training and test sets.


# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)

Now we will apply a Logistic Regression classifier to the dataset.


# Building and training the model
from sklearn.linear_model import LogisticRegression
classifier = LogisticRegression()
classifier.fit(X_train, y_train)

# Predicting the Test set results
y_pred = classifier.predict(X_test)

The last step will be to analyze the performance of the trained model.


# Making the Confusion Matrix
from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)

In our run, this showed that 13 entries of the first category, 11 of the second, and 9 of the third were correctly predicted by the model. Because the split is random, your counts may differ.


# Generating accuracy, precision, recall and f1-score
from sklearn.metrics import classification_report
target_names = ['Iris-setosa','Iris-versicolor','Iris-virginica']
print(classification_report(y_test, y_pred, target_names=target_names))

The report shows the precision, recall, f1-score and accuracy values of the model on our test set, which consists of 38 entries (25% of the dataset).
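The accuracy can also be derived directly from the confusion matrix: the diagonal holds the correct predictions, so accuracy is the diagonal sum divided by the number of test entries. The sketch below repeats the workflow under a few assumptions that differ from the code above: it loads the copy of Iris bundled with scikit-learn, fixes random_state so the result is repeatable, and raises max_iter to help the solver converge.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

classifier = LogisticRegression(max_iter=200)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

cm = confusion_matrix(y_test, y_pred)
correct = cm.trace()  # sum of the diagonal = correct predictions
total = cm.sum()      # number of test entries
accuracy = correct / total

print(cm)
print(f"{correct} of {total} correct, accuracy = {accuracy:.3f}")
```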

Congratulations, you have successfully created and implemented your first machine learning classifier in Python!
