
Supervised Learning using Python

Simple Linear Regression Using Python

The very basics of Machine Learning start with
understanding Simple Linear Regression and
implementing it in a programming language.
Python, being already equipped with many
statistical computing and analysis libraries, makes
this task much easier.

Let's learn how we can use Python to predict some
values using SLR (Simple Linear Regression). I am using Spyder as my IDE.

We will be using a dataset containing salaries and
years of experience, like this:


YearsExperience    Salary
1.1                39343
1.3                46205
1.5                37731
2.0                43525
2.2                39891
2.9                56642
3.0                60150
3.2                54445
3.2                64445


The dataset has 30 records and can be found in
the link

Once done, you need to import this dataset into
Python, for which you can use the pandas library.

Import it by writing:

import pandas as pd

You now need to import the dataset:

dataset = pd.read_csv('Salary_Data.csv')

Ensure that the file is in the same directory as
your code. If not, mention the path of the
file in the read_csv() function.

Now you have the complete data imported into the
dataset variable.
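A quick way to peek at the imported data is the head() method. A minimal sketch, using a small in-memory CSV (values from the table above) in place of Salary_Data.csv:

```python
import io

import pandas as pd

# Stand-in for Salary_Data.csv: the first few records from the table above.
csv = io.StringIO(
    "YearsExperience,Salary\n"
    "1.1,39343\n"
    "1.3,46205\n"
    "1.5,37731\n"
    "2.0,43525\n"
    "2.2,39891\n"
)
dataset = pd.read_csv(csv)

# head() shows the first rows, one column per CSV field.
print(dataset.head())
```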

But you want the values of X (years of experience)
and y (salary) in separate variables. What to do?

Simple! Just slice the dataset using iloc:

X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values

The first parameter of iloc selects the rows:
a ':' colon means take all rows.
The second parameter selects the columns.
':-1' takes every column except the last, which keeps X as a
2D array of features (the shape scikit-learn expects), while 1
picks the second column (Salary) for y.
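A minimal check of the slicing, using a small in-memory frame in place of the CSV:

```python
import pandas as pd

# Three illustrative records from the table above.
dataset = pd.DataFrame({
    "YearsExperience": [1.1, 1.3, 1.5],
    "Salary": [39343, 46205, 37731],
})

X = dataset.iloc[:, :-1].values  # all rows, every column but the last -> 2D
y = dataset.iloc[:, 1].values    # all rows, second column -> 1D

print(X.shape, y.shape)  # (3, 1) (3,)
```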

Once done, you can see the variables X and y
containing years of experience and salary respectively.



I hope the concepts have been easy to understand
so far.
Now let's dive a little deeper and divide our data into
a training set and a test set.
Why do we need a separate test set?
Because once you conclude a best-fit line using
the training set, you must have some unseen data to check
how good your estimate is.

Splitting data into training and test sets

To split the data into random training and test sets,
the function train_test_split from the sklearn library is used.
It can be imported like this:

from sklearn.model_selection import train_test_split

It returns X_train, X_test, y_train, y_test,
in that order.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

Let's understand the parameters of train_test_split():

  • The first parameter, X, is the array of all X values (years of experience).
  • The second parameter, y, is the array of all y values (salary).
  • The third parameter, test_size, is the fraction of the data reserved for the test set. Here the 30 records are split so that X_test and y_test contain (1/3) * 30 = 10 records, while X_train and y_train contain the remaining (2/3) * 30 = 20 records.
  • The fourth parameter, random_state, seeds the shuffling of the data. Fixing it (here to 0) makes the split reproducible; leaving it unset picks a new random split, and hence a slightly different model, on each run.
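The split sizes described above can be verified with a toy 30-record stand-in for the dataset (values are illustrative, not the blog's CSV):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-in for the 30-record dataset.
X = np.arange(30).reshape(-1, 1).astype(float)  # 30 rows, 1 feature
y = 2.0 * np.arange(30) + 5.0                   # a simple linear target

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0)

# (1/3) * 30 = 10 test records, (2/3) * 30 = 20 training records.
print(len(X_train), len(X_test))  # 20 10
```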

You can check the test and training data underneath.
Now it's time to build the model. For this purpose we need to import LinearRegression.

from sklearn.linear_model import LinearRegression

Once imported, just create an object of it using
regressor = LinearRegression()

Then fit the training values into it using the fit() function;
fitting is the same as training.
Once trained, the model can make predictions, usually with a .predict() method call.

regressor.fit(X_train, y_train)

y_pred = regressor.predict(X_test)

This predicts the values of y for the X_test values.



If you compare y_pred (the predicted salaries) with y_test
(the actual salaries from the test set),
you will find that the model gives very close predictions
for the X_test values.
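The fit-and-predict steps above can be sketched end-to-end on illustrative data (the coefficients and noise level below are assumptions, not the blog's CSV):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Illustrative data: salary roughly linear in experience (coefficients assumed).
rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=(30, 1))                # years of experience
y = 25000 + 9000 * X.ravel() + rng.normal(0, 3000, 30)  # salary with noise

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0)

regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)

# Predicted vs. actual salaries, side by side.
for predicted, actual in zip(y_pred, y_test):
    print(f"predicted {predicted:9.0f}   actual {actual:9.0f}")
```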


Finally, we have come to the point where we will plot the data and see what it looks like.

Visualizing data

We have a library, matplotlib.pyplot, which helps in visualizing the data we have. Import it as:

import matplotlib.pyplot as plt

To plot the training points, simply use the scatter() function:

plt.scatter(X_train, y_train, color='red')
plt.show()

This will show the scattered points for the coordinates.


If you want to plot the best-fit line, plot X_train against
the model's predictions for X_train, i.e. regressor.predict(X_train),
because the prediction is just the estimated salary (y) for each
value of years of experience (X).

So the call will look like:

plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.show()

Now, if we see the line along with the scattered points,
we can observe how the actual y values vary around the line for each X_train value.


If you want to add a title and labels for the X and Y axes,
you can do it using:

plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.title("Salary VS Experience")


Now, if we plot the same best-fit line along with the
test values, we can visualize how close our predictions are
to the y_test values for each X_test.

Just replace plt.scatter(X_train, y_train, color='red')
in the lines above with plt.scatter(X_test, y_test, color='red')
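Putting the plotting steps together, here is a self-contained sketch of the test-set plot. The data values are assumed for illustration, and the figure is saved to a file (a hypothetical name) instead of shown, so it runs headless:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs headless
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny illustrative dataset (values assumed, not the blog's full CSV).
X_train = np.array([[1.1], [2.0], [3.0], [4.0], [5.0]])
y_train = np.array([39343, 43525, 60150, 56957, 66029])
X_test = np.array([[1.5], [3.2]])
y_test = np.array([37731, 54445])

regressor = LinearRegression().fit(X_train, y_train)

plt.scatter(X_test, y_test, color="red")                     # actual test points
plt.plot(X_train, regressor.predict(X_train), color="blue")  # best-fit line
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.title("Salary VS Experience (Test set)")
plt.savefig("salary_vs_experience.png")  # hypothetical output file
```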


Just see how well the best-fit line fits!
The test points are very close to it, and some even fall on the line.
So, this is how you can predict values using simple linear regression in Python.

Now let's try it out on my experience. ;p
I have 1.42 years of experience, which is almost 1 year 5 months.
So, what salary should I expect in my current
organization now, or what should I expect if I switch companies?

y_p = regressor.predict([[1.42]])

Note that predict() expects a 2D array, hence the double brackets around the single value.

The value it prints is the salary one could expect with 1.42 years of experience.
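A self-contained sketch of that single-value prediction, with an assumed exact relation (salary = 25000 + 9000 * years) so the expected output is easy to check:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Illustrative fit: salary = 25000 + 9000 * years (coefficients assumed).
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = 25000 + 9000 * X_train.ravel()

regressor = LinearRegression().fit(X_train, y_train)

# predict() expects a 2D array, so a single value is passed as [[value]].
y_p = regressor.predict([[1.42]])
print(round(y_p[0], 2))  # 37780.0, i.e. 25000 + 9000 * 1.42
```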



Hope you enjoyed reading the blog. For any doubt or query, leave a comment.

