
Regression Versus Classification Machine Learning: What’s the Difference?


The difference between regression machine learning algorithms and classification machine learning algorithms often confuses data scientists, leading them to apply the wrong methodology to their prediction problems.

Andreybu, who is from Germany and has more than 5 years of machine learning experience, says that “understanding whether the machine learning task is a regression or classification problem is key for selecting the right algorithm to use.”


Let’s start by talking about the similarities between the two techniques.

Supervised machine learning

Regression and classification are categorized under the same umbrella of supervised machine learning.

Both share the same concept of utilizing known datasets (referred to as training datasets) to make predictions.

In supervised learning, an algorithm is employed to learn the mapping function from the input variable (x) to the output variable (y); that is, y = f(x).

The objective is to approximate the mapping function (f) as accurately as possible, so that whenever new input data (x) arrives, the output variable (y) can be predicted.

Here is a chart that shows the different groupings of machine learning:

[Image: chart of machine learning categories — supervised learning vs. unsupervised learning]

Unfortunately, this is where the similarity between regression and classification machine learning ends.

The main difference between them is that the output variable in regression is numerical (or continuous) while that for classification is categorical (or discrete).
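That distinction is easy to see side by side. The sketch below fits both kinds of model to the same made-up house-size data (the sizes, prices, and labels are invented for illustration): the regression model returns a real number, while the classifier returns one of the category labels.

```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# Toy data: hypothetical house sizes in square meters (single input feature)
sizes = [[50], [80], [120], [160]]

# Regression target: continuous prices (in thousands)
prices = [150.0, 220.0, 310.0, 400.0]
reg = LinearRegression().fit(sizes, prices)
print(reg.predict([[100]]))  # a real-valued number on the price scale

# Classification target: discrete labels for the same houses
labels = ["cheap", "cheap", "expensive", "expensive"]
clf = DecisionTreeClassifier().fit(sizes, labels)
print(clf.predict([[100]]))  # one of the two category labels
```

The inputs are identical in both cases; only the type of output variable changes, and that is what determines which family of algorithms applies.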

Regression in machine learning

In machine learning, regression algorithms attempt to estimate the mapping function (f) from the input variables (x) to numerical or continuous output variables (y).

In this case, y is a real value, which can be an integer or a floating-point value. Regression problems therefore typically predict a quantity or a size.

For example, if you are given a dataset about houses and asked to predict their prices, that is a regression task, because price is a continuous output.

Examples of the common regression algorithms include linear regression, Support Vector Regression (SVR), and regression trees.

Some algorithms, such as logistic regression, have the word “regression” in their names, but they are not regression algorithms.
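A quick sketch makes the point: despite its name, logistic regression outputs discrete class labels (and class probabilities), not a continuous quantity. The hours-studied data below is entirely made up for illustration.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical data: hours studied vs. pass (1) / fail (0)
hours = [[1], [2], [3], [8], [9], [10]]
passed = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(hours, passed)

print(model.predict([[2], [9]]))     # discrete class labels, not real-valued targets
print(model.predict_proba([[5.5]]))  # probabilities over the two classes
```

Because its output is categorical, logistic regression belongs with the classification algorithms discussed later in this article.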

Here is an example of a linear regression problem in Python:

import numpy as np
import pandas as pd
# importing the model
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split  # cross_validation was removed in newer scikit-learn
# importing the module for calculating the performance metrics of the model
from sklearn import metrics

data_path = "http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv"  # loading the advertising dataset
data = pd.read_csv(data_path, index_col=0)

feature_cols = ['TV', 'radio', 'newspaper']  # creating a list of the feature columns
X = data[feature_cols]  # choosing a subset of the dataset
y = data.sales  # sales

# dividing X and y into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

linearreg = LinearRegression()  # applying the linear regression model
linearreg.fit(X_train, y_train)  # fitting the model to the training data
y_predict = linearreg.predict(X_test)  # making predictions on the testing set

print(np.sqrt(metrics.mean_squared_error(y_test, y_predict)))  # calculating the RMSE

# output gives the RMSE as 1.4046514230328955
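Linear regression is only one of the regression algorithms listed above; Support Vector Regression (SVR) can be swapped into essentially the same pipeline. The sketch below uses synthetic advertising-style data — the numbers and column roles are invented for illustration, since the original CSV may not always be reachable.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for an advertising-style dataset (hypothetical numbers)
rng = np.random.RandomState(1)
X = rng.uniform(0, 100, size=(200, 3))  # three spend channels
y = 0.05 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 1, 200)  # noisy "sales"

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)
svr = SVR(kernel="rbf").fit(X_train, y_train)  # same fit/predict workflow as before
rmse = np.sqrt(mean_squared_error(y_test, svr.predict(X_test)))
print(rmse)  # a continuous-error metric, just like the linear model's RMSE
```

Note that the evaluation metric (RMSE) is the same kind of continuous-error measure in both cases; that, too, is characteristic of regression problems.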

Classification in machine learning

On the other hand, classification algorithms attempt to estimate the mapping function (f) from the input variables (x) to discrete or categorical output variables (y).

In this case, y is a category that the mapping function predicts. Given one or more input variables, a classification model attempts to assign each input to one of a set of discrete classes.

For example, when provided with a dataset about houses, a classification algorithm could try to predict whether each house will sell above or below its recommended retail price.

Here, the houses are classified into two discrete categories: above or below that price.

Examples of the common classification algorithms include logistic regression, Naïve Bayes, decision trees, and K Nearest Neighbors.

Here is an example of a classification problem that differentiates between an orange and an apple:

from sklearn import tree

# Gathering training data
# features = [[155, "rough"], [180, "rough"], [135, "smooth"], [110, "smooth"]]  # input to classifier
features = [[155, 0], [180, 0], [135, 1], [110, 1]]  # scikit-learn requires real-valued features
# labels = ["orange", "orange", "apple", "apple"]  # output values
labels = [1, 1, 0, 0]

# Training classifier
classifier = tree.DecisionTreeClassifier()  # using a decision tree classifier
classifier = classifier.fit(features, labels)  # find patterns in the data

# Making predictions
print(classifier.predict([[120, 1]]))

# Output is [0], i.e. an apple
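Any of the other classifiers listed above could be dropped into the same example. As a sketch, here is k-nearest neighbors on the identical fruit data; with three neighbors, the two closest apples outvote the nearest orange.

```python
from sklearn.neighbors import KNeighborsClassifier

# Same fruit data as above: [weight in grams, texture (0 = rough, 1 = smooth)]
features = [[155, 0], [180, 0], [135, 1], [110, 1]]
labels = [1, 1, 0, 0]  # 1 = orange, 0 = apple

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(features, labels)
print(knn.predict([[120, 1]]))  # [0] — apple, agreeing with the decision tree
```

The interchangeability of classifiers behind a common fit/predict interface is exactly why identifying the problem type first matters more than memorizing any single algorithm.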

Wrapping up

Selecting the correct algorithm for your machine learning problem is critical to achieving the results you need.

As a data scientist, you need to know how to differentiate between regression predictive models and classification predictive models so that you can choose the best one for your specific use case.

Do you have any comments or questions?

Please post them below.

About author

Dr. Michael Jurgen Garbade is the founder of LiveEdu.TV, Kyuda, and Education Ecosystem. He is a future venture capitalist and future politician, always on the lookout for the next big challenge. He obtained a master's degree in business administration and physics, and a Ph.D. in finance, with professional work experience in high-paced environments at Fortune 500 companies like Amazon and General Electric. Expertise: Python, PHP, Sencha Touch, C++, SEO, finance, strategy, and e-commerce. He speaks English and German and has worked in the US, Europe, and Asia. At Education Ecosystem he is the CEO and runs business operations.