Sunday, April 8, 2018

The bias variance tradeoff

This page is easily one of the best explanations of bias, variance and the tradeoff between them!

Saturday, March 17, 2018

A gentle introduction to Gradient Descent


One way to think about machine learning is as a series of optimization problems, i.e. finding the minimum or maximum of a mathematical function (often called the objective function, or the loss function when it is being minimized). Gradient descent is a workhorse for optimization and is widely used to train neural networks.

Imagine that you are hiking through a landscape that looks like the Chocolate Hills of the Philippines and that your objective is to get to the lowest point amongst these hills.

But wait a second! What does this have to do with machine learning? In fact, it is so fundamental that it is easy to miss the intuition! Let’s say you are trying to build a predictive model. The way you would build the perfect model is by comparing your prediction against the actual data and constructing what is known as a loss function. Simplistically, that could be:
Loss function = Your prediction – Actual data
(In practice the difference is usually squared, or its absolute value taken, so that positive and negative errors do not cancel out.)

Optimization in this scenario means finding the model that reduces the error between your prediction and the actual data. In other words, optimization aims to find the model that gives the lowest possible value of the loss function. The hilly landscape can be imagined as the error for various models, and finding the lowest point in the landscape corresponds to finding the perfect model!
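
To make the "landscape of errors" a little more concrete, here is a minimal sketch. Everything in it is made up for illustration: the toy data, the one-parameter model y = slope * x, and the use of mean squared error as the loss. Each candidate slope is one "model", and its loss is the height of the landscape at that spot.

```python
# Toy data, made up for illustration: y is roughly 2 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]


def mse_loss(slope):
    """Mean squared error of the model y = slope * x on the toy data."""
    errors = [(slope * x - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


# Each candidate slope is one "model"; its loss is the height of the landscape there.
candidates = [0.0, 1.0, 1.5, 2.0, 2.5, 3.0]
landscape = {slope: mse_loss(slope) for slope in candidates}
best_slope = min(landscape, key=landscape.get)

for slope, loss in landscape.items():
    print(f"slope={slope:.1f}  loss={loss:.3f}")
print("lowest point among the candidates:", best_slope)
```

Trying every candidate like this only works for tiny problems, which is why a smarter way of walking downhill is needed.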

The ‘objective’ function you are trying to optimize during the hike is your height above sea level. The parameters of this objective function are your latitude and longitude. Gradient descent offers a set of techniques to iteratively locate the lowest (or highest) point. Using gradient descent, you can figure out which direction to take your steps in and how long each step should be.

Say that you’ve stopped to take in the beautiful view of the hills and you stretch one foot out in every direction around you to feel the ground. The tilt of your foot as it lands on the ground is the gradient or the slope. This is exactly the ‘gradient’ in gradient descent too!
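
Feeling the ground with your foot is roughly what a numerical (finite-difference) gradient does. The sketch below assumes a hypothetical bowl-shaped `height` function with its lowest point at (1, -2); the function, its name and the step size are illustrative choices, not anything taken from a real landscape.

```python
def height(lat, lon):
    """Hypothetical landscape: a single bowl whose lowest point is at (1, -2)."""
    return (lat - 1.0) ** 2 + (lon + 2.0) ** 2


def numerical_gradient(f, lat, lon, step=1e-6):
    """Estimate the slope in each direction with a central difference."""
    d_lat = (f(lat + step, lon) - f(lat - step, lon)) / (2 * step)
    d_lon = (f(lat, lon + step) - f(lat, lon - step)) / (2 * step)
    return d_lat, d_lon


grad = numerical_gradient(height, 3.0, 0.0)
print(grad)  # close to (4.0, 4.0), the exact gradient of this bowl at (3, 0)
```

The gradient points uphill, so walking downhill means stepping in the opposite direction.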
In logical terms, all of gradient descent can be encapsulated within the following few lines:
Set initial random parameters (say latitude, longitude = 43N, 45W)
Calculate height above sea level at these initial parameters (objective function)
Repeat until change in objective function is very small or you’re tired of walking (number of steps has reached a large value):
         Compute gradient with given parameter values (lat, long)
         New parameters = Old parameters – Learning rate * Gradient
         Calculate height above sea level (objective function) at these new parameters (new objective function)
         Is the change in your height above sea level very small? If so, then stop
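
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `height` objective is a hypothetical bowl with its lowest point at (1, -2), its gradient is written out by hand, and the starting point, learning rate, tolerance and step limit are arbitrary choices.

```python
def height(lat, lon):
    """Hypothetical objective: a bowl with its lowest point at (1, -2)."""
    return (lat - 1.0) ** 2 + (lon + 2.0) ** 2


def gradient(lat, lon):
    """Exact gradient of height with respect to (lat, lon)."""
    return 2.0 * (lat - 1.0), 2.0 * (lon + 2.0)


def gradient_descent(lat, lon, learning_rate=0.1, tolerance=1e-10, max_steps=10_000):
    current = height(lat, lon)
    for step in range(1, max_steps + 1):
        g_lat, g_lon = gradient(lat, lon)
        # New parameters = old parameters - learning rate * gradient
        lat -= learning_rate * g_lat
        lon -= learning_rate * g_lon
        new = height(lat, lon)
        # Stop once the height barely changes between steps.
        if abs(current - new) < tolerance:
            return lat, lon, step
        current = new
    # "Tired of walking": the step limit was reached.
    return lat, lon, max_steps


lat, lon, steps = gradient_descent(lat=4.0, lon=3.0)
print(f"stopped after {steps} steps at ({lat:.4f}, {lon:.4f})")
```

On this simple bowl the walk ends very close to (1, -2); on a real hilly landscape it may settle in whichever dip happens to be nearest the starting point.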

The learning rate used above determines how fast you arrive at a solution (whether optimal or not). A large learning rate implies that each step you take during your hike is long. You would ‘learn’ quickly with a large learning rate, but you also risk missing the lowest point in the hills, because you may have just stepped over it. A smaller learning rate covers the ground more thoroughly, but makes for a very slow process. Imagine a 6 ft human being hiking through these hills versus an 18 ft troll. The human being might cover the entire hilly landscape thoroughly but take months, while the troll might miss the lowest point, but take only a few hours.
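
Here is a small sketch of the human-versus-troll effect, assuming the simplest possible landscape, f(x) = x**2, whose lowest point is at x = 0. The three learning rates, the starting point and the step count are arbitrary values picked for illustration.

```python
def descend(learning_rate, start=10.0, steps=20):
    """Run gradient descent on f(x) = x**2, whose gradient is 2 * x."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2.0 * x
    return x


for rate in (0.01, 0.4, 1.1):
    print(f"learning rate {rate}: x after 20 steps = {descend(rate):.4f}")
```

With the smallest rate the walker is still far from 0 after 20 steps, with the middle rate it is essentially there, and with the largest rate every step overshoots the bottom by more than the last, so it ends up further away than where it started.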

There are three variants of gradient descent, and they differ in how much information you use to determine your direction of travel and length of step.
  1. Batch gradient descent
    The gradient in batch gradient descent is calculated using the entire available training data set. For convex landscapes and a suitably small learning rate this converges to the global minimum, but you can imagine it being very slow. Before every step you (the hiker) take, you compute the gradient based on all available data.
  2. Stochastic gradient descent
    In this case, before every step that you, as the hiker, take, you stochastically (or randomly) select a single training data point and use it to compute the gradient. This can lead to a lot of variation and less evenly taken steps compared to batch gradient descent, since each new step now depends on a randomly chosen data point. The nice thing though is that each step is much cheaper to compute than in the batch method.
  3. Mini-batch gradient descent
    This is the Chicago blend popcorn of the gradient descent world! It uses small batches of training examples to compute the gradient (not all of them as in batch gradient descent, and not just one as in stochastic gradient descent). The combination of stable parameter updates and manageable computation time makes this technique an attractive option when choosing gradient descent for optimization problems.
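
In code, the three variants can be seen as the same loop with a different batch size. The sketch below is an illustration only: the data is a made-up, noise-free line y = 3 * x, the model has a single slope parameter, and the learning rate, epoch count and seed are arbitrary. Because the data has no noise, all three settle on the same answer here; with real, noisy data the single-point updates would wander around much more.

```python
import random

# Toy data for illustration: y = 3 * x exactly, so the best slope is 3.
xs = [0.5 * i for i in range(1, 21)]
ys = [3.0 * x for x in xs]
data = list(zip(xs, ys))


def gradient(slope, batch):
    """Gradient of the mean squared error of y = slope * x over a batch."""
    return sum(2.0 * (slope * x - y) * x for x, y in batch) / len(batch)


def fit(batch_size, learning_rate=0.005, epochs=200, seed=0):
    rng = random.Random(seed)
    slope = 0.0
    for _ in range(epochs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        # One parameter update per batch of the chosen size.
        for start in range(0, len(shuffled), batch_size):
            batch = shuffled[start:start + batch_size]
            slope -= learning_rate * gradient(slope, batch)
    return slope


print("batch      :", fit(batch_size=len(data)))  # all 20 points per update
print("stochastic :", fit(batch_size=1))          # one point per update
print("mini-batch :", fit(batch_size=5))          # five points per update
```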

Tuesday, April 19, 2016

Friday, September 19, 2014

Freshly baked slackware

I typically run into three problems with a fresh install of slackware (all of two times!!)
1. Anyone (other than root) out there?
useradd <myname>
passwd <myname>
Typically, my home directory lives on a separate, untouched partition, so to link it up, I simply re-add my old username.

2. Volume volume people!
Run alsamixer and turn on your speakers :)

3. Wireless networking
Without preamble...
This is not my work...only a link to the actual article and its printed form

Friday, November 15, 2013

Handy bash and vim tips that I've used from time to time

1. Copying files while retaining their directory structure
cd ~/dir1 && find . -name "*.mat" -print0 | xargs -0 -I{} cp --parents {} ~/<destination>
The *.mat files found below ~/dir1/ are copied to <destination> with their directory structure intact (cp --parents, a GNU cp option, recreates each file's path under the destination).
2. Scp-ing only a set of files from multiple directories (use find to build a tar archive that retains the directory structure, then scp the archive out)
find . -name "*.mat" -print0 | tar --null --no-recursion -czf archive.tar.gz --files-from -

1. To comment or uncomment a block of code
ctrl+v --> select your block of code --> :norm I# (inserts a # in front of each line) or :norm ^x (removes the first character from each line in the selected block)

Thursday, October 31, 2013

Installing java on your linux machine without removing the native version it comes with

I like to have all my software installations in a single place so I know exactly what state my machine is in by glancing at a single directory, say ~/Software
My ancient CentOS setup doesn't have jdk 1.7 in its repo, so instead of fooling around with alternatives <shudder>, I downloaded the jdk version I wanted and put it into said ~/Software directory.

Here's a simple way to go about changing the default version of java that your system picks up without having to uninstall the native java that it comes with.

1. Switch to root (su)
2. Create a file, say in /etc/profile.d/, containing:


3. Save the file, then log out and log back in to your machine.
4. To test out your java version, run
java -version
5. To see which executable will be picked up, run
which java

Wednesday, September 25, 2013

Installing 32 bit java on my 64 bit CentOS 6 machine

This is a right old pain to do, so unless you really need to, I would say find a way to stick to the native 64 bit jre/jdk install.
As of this writing, the latest jdk version available is 1.7.0_40

A quick listing of the steps I followed is below. The commands are italicised.

1. yum erase jdk (to remove the existing 64 bit java devel environment)

2. Download the x86 version of jdk (the rpm), say jdk-7u40-linux-i586.rpm

3. yum provides
    yum install libgcc-4.4.7-3.el6.i686 
(which is what came up as the required package from the command above)

4. yum install jdk-7u40-linux-i586.rpm

5. Check your installation with the usual 
    java -version

In case you want both the 64 and 32 bit versions of java installed, you'll have to muck about with alternatives, which I happily chose to avoid!