
Supervised Learning using Python

Simple Linear Regression Using Python

The very basics of machine learning start with
understanding simple linear regression and
implementing it in a programming language.
Python already comes with many libraries for
statistical computing and analysis, which makes
this task much easier.

Let's learn how we can use Python to predict some
values using simple linear regression (SLR). I am using Spyder as my IDE.

We will be using a dataset containing salaries and
years of experience, like this:


YearsExperience    Salary
1.1                39343
1.3                46205
1.5                37731
2                  43525
2.2                39891
2.9                56642
3                  60150
3.2                54445
3.2                64445


The dataset has 30 records and can be found in
the link

Once you have the file, you need to import the dataset into
Python, for which you can use the pandas library.

Import pandas by writing

import pandas as pd

You now need to load the dataset.

dataset = pd.read_csv('Salary_Data.csv')

Make sure the file is in the same directory where you are
saving your code. If it is not, pass the full path of the
file to the read_csv() function.
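To check that the load worked, it can help to look at the shape and the first few rows. The sketch below is only an illustration: instead of reading Salary_Data.csv it reads the first rows of the table above from an in-memory string, so it runs without the file. The csv_text variable is an assumption made for this example.

```python
import io

import pandas as pd

# First rows of the table shown above, inlined so the sketch runs
# without Salary_Data.csv being present (assumption for this example).
csv_text = """YearsExperience,Salary
1.1,39343
1.3,46205
1.5,37731
2.0,43525
2.2,39891
"""

# read_csv accepts a file path or any file-like object.
dataset = pd.read_csv(io.StringIO(csv_text))

print(dataset.shape)   # (number of rows, number of columns)
print(dataset.head())  # first rows, to confirm the columns were parsed
```

With the real file, the same two print calls after pd.read_csv('Salary_Data.csv') should report 30 rows and 2 columns.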

Now you have the complete data imported in the
dataset variable, where it appears as a table of the two columns.

But you want the values of X (years of experience)
and y (salary) in separate variables. What to do?

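Simple! Just slice it using iloc

X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values

Here the first parameter of iloc selects the rows.
A ':' (colon) means take all rows.
The second parameter selects the columns.
For X we use :-1, which takes every column except the last one. That leaves only the first column (years of experience) but keeps X two-dimensional, which scikit-learn expects for the input features.
For y we use 1, which takes the second column (salary) as a plain one-dimensional array.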
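One detail worth checking here is the shape of the arrays, because scikit-learn estimators expect the features X as a two-dimensional array (rows by columns) and the target y as a one-dimensional array. The sketch below uses a small hand-built DataFrame with the first four rows of the table (an assumption for this example) to show how the choice of column selector changes the shape.

```python
import pandas as pd

# Small stand-in for the real dataset (first four rows of the table).
dataset = pd.DataFrame({
    "YearsExperience": [1.1, 1.3, 1.5, 2.0],
    "Salary": [39343, 46205, 37731, 43525],
})

X_1d = dataset.iloc[:, 0].values    # single integer -> 1-D array
X_2d = dataset.iloc[:, :-1].values  # slice -> 2-D array with one column
y = dataset.iloc[:, 1].values       # target stays 1-D

print(X_1d.shape)  # (4,)
print(X_2d.shape)  # (4, 1)
print(y.shape)     # (4,)
```

The two-dimensional form is the one to pass to fit() and predict() later on.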

Once done, you can see that X and y contain
the years of experience and the salary respectively.



I hope it has been easy to understand the concepts
so far.
Now let's dive a little deeper and divide our data into
a test set and a training set.
Why do we need a separate test set?
Because once you have found a best fit line using
the training set, you need some unseen data to check
how good your estimate is.

Splitting data into test set and training set

To split the data randomly into a test set and a training set,
the function train_test_split from the sklearn library is used.
It can be imported like this:

from sklearn.model_selection import train_test_split

It returns the training and test parts of X and y
in the following order.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)

Let's understand the parameters of train_test_split()

  • The first parameter, X, holds all the X values (years of experience).
  • The second parameter, y, holds all the y values (salary).
  • The third parameter, test_size, is the fraction of the data to put in the test set. Here the 30 records are split so that X_test and y_test each get (⅓)*30 = 10 records, while X_train and y_train get the remaining (⅔)*30 = 20 records.
  • The fourth parameter, random_state, is the seed for the random shuffling done before the split. With a fixed integer such as 0 you get the same training and test sets on every run. If you leave it out, each run picks a new split and so gives a slightly different prediction.
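The effect of test_size and random_state can be checked with a short sketch. The 30 rows below are placeholder numbers, not the salary data; only the row count matches the dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 30 placeholder rows (assumption: values are arbitrary, only the count matters).
X = np.arange(30, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1/3, random_state=0
)
print(len(X_train), len(X_test))  # 20 10

# Calling it again with the same random_state reproduces the same split.
_, X_test_again, _, _ = train_test_split(X, y, test_size=1/3, random_state=0)
print(np.array_equal(X_test, X_test_again))  # True
```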

You can now inspect the test and training data.
Fine so far.
Now it's time to fit the model. For this we need to import LinearRegression.

from sklearn.linear_model import LinearRegression

Once imported, create an object of it using

regressor = LinearRegression()

Then fit the training values into it using the fit function.
Fitting is the same as training.
Once it is trained, the model can be used to make predictions with a .predict() method call.

regressor.fit(X_train, y_train)

y_pred = regressor.predict(X_test)

This predicts the values of y for the X_test values.
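What fit() learns for simple linear regression is just a slope and an intercept, which scikit-learn exposes as coef_ and intercept_. The sketch below uses made-up points that lie exactly on the line y = 25000 + 9000 * x (an assumption chosen so the result is easy to verify), not the salary data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training points lying exactly on y = 25000 + 9000 * x.
X_train = np.array([[1.0], [2.0], [3.0], [4.0]])
y_train = 25000.0 + 9000.0 * X_train.ravel()

regressor = LinearRegression()
regressor.fit(X_train, y_train)

slope = regressor.coef_[0]        # change in y per unit of x
intercept = regressor.intercept_  # value of y at x = 0

# predict() applies intercept + slope * x to each row.
y_pred = regressor.predict(np.array([[5.0]]))
print(slope, intercept, y_pred)
```

On real data the points will not sit exactly on a line, so the fitted slope and intercept are the ones that minimise the squared error.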



If you place y_pred next to y_test, you can see the predicted
salaries alongside the actual salaries from the test
data set. Compare y_pred with y_test and
you will find that the model has given very close predictions for the X_test values.
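One way to make this comparison outside the IDE is to put both arrays in a small table and look at the differences. This is a sketch with placeholder numbers rather than the real predictions, and the mean absolute error summary is an addition that the steps above do not use.

```python
import numpy as np
import pandas as pd

# Placeholder values standing in for the real y_test and y_pred.
y_test = np.array([40000.0, 55000.0, 60000.0])
y_pred = np.array([41000.0, 54000.0, 62000.0])

comparison = pd.DataFrame({"Actual": y_test, "Predicted": y_pred})
comparison["Error"] = comparison["Predicted"] - comparison["Actual"]
print(comparison)

# Average size of the error, ignoring its sign.
mae = comparison["Error"].abs().mean()
print(mae)
```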


Finally, we have come to the point where we will plot the data and see what it looks like.

Visualizing data

The matplotlib.pyplot library helps in visualizing the data we have. Import it first:

import matplotlib.pyplot as plt

To plot the training points, simply use the function scatter()

plt.scatter(X_train, y_train, color='red')
plt.show()

This will show the scattered points for the coordinates.


To plot the best fit line, you need
the values of X_train along with the predicted values of
y for X_train, i.e. regressor.predict(X_train), because the line is simply the estimated salary (y) for each value of years of experience (X).

So the call will look like

plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.show()

Now, if we look at the line along with the scattered points,
we can see how the actual y values vary around the line for each X_train value.


If you want to add a title and labels for the X axis and Y axis,
you can do it using the following calls before plt.show().

plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.title("Salary VS Experience")


Now, if we plot the same best fit line along with the
test values, we can visualize how close our predictions
are to the y_test values for the respective X_test values.

Just replace plt.scatter(X_train, y_train, color='red')
in the above lines with plt.scatter(X_test, y_test, color='red')
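Putting the plotting calls together gives something like the sketch below. The data points are placeholders rather than the salary data, and the Agg backend line is an assumption made so the sketch runs without a display. In Spyder you would drop that line and finish with plt.show().

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove in Spyder
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder values, not the real salary data.
X_train = np.array([[1.0], [2.0], [4.0], [5.0]])
y_train = np.array([35000.0, 43000.0, 62000.0, 70000.0])
X_test = np.array([[3.0], [6.0]])
y_test = np.array([52000.0, 80000.0])

regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Test points in red, best fit line (learned from the training set) in blue.
plt.scatter(X_test, y_test, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.title("Salary VS Experience")
# In an interactive session, call plt.show() here.
```

The line is still drawn from X_train because the fitted line is the same wherever it is evaluated; only the scattered points change.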


Just see how well the best fit line fits!
The test points are very close to it, and some even fall on the best fit line.
So, this is how you can predict values using simple linear regression in Python.

Now let's try it out on my own experience.. ;p
I have 1.42 years of experience, which is almost 1 year 5 months.
So, what salary should I expect in my current
organization, or what should I expect if I switch companies now?
Note that predict() expects a two-dimensional array, so the single value goes inside double brackets.

y_p = regressor.predict([[1.42]])

The output I received is



So, this is the salary that one could expect with 1.42 years of experience. 
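A single-value prediction can be tried end to end with the sketch below. Note the assumption: it fits only on the nine rows displayed in the table at the top of this post, not on the full 30-record file, so the number it prints will differ from the output of the full model.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# The nine rows displayed in the table earlier (not the full dataset).
X = np.array([[1.1], [1.3], [1.5], [2.0], [2.2], [2.9], [3.0], [3.2], [3.2]])
y = np.array([39343, 46205, 37731, 43525, 39891, 56642, 60150, 54445, 64445],
             dtype=float)

regressor = LinearRegression()
regressor.fit(X, y)

# predict() expects a 2-D array: one row, one feature.
y_p = regressor.predict([[1.42]])
print(y_p.shape)  # (1,)
print(y_p[0])     # estimated salary for 1.42 years of experience
```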



I hope you enjoyed reading the blog. For any doubt or query, leave a comment.
