
Sunday, July 9, 2017

What's in your neurons?

Thanks to Tony Stark and Elon Musk, everyone seems to be talking about AI, Big Data and Machine Learning, even though very few actually understand them thoroughly. To most people, AI is a magical tool fuelling self-driving cars, robots and the ever-ubiquitous recommender systems, among other magical things. True indeed, most Machine Learning (ML) models are black boxes, trained on copious quantities of data. They are fed data from which they learn, but little is known about how they learn and what each unit ends up learning.

Definitely magic
But it's not...



Backed by calculus and linear algebra, Machine Learning is far from magic. Having learnt how the models learn, I was curious to interpret what they actually learn.

Support Vector Machine (SVM)
This is no doubt one of the simplest models used for classification. Given input data with dimensionality D and C different output classes, it uses a score function to compute a score for each class and assigns the input to the class with the highest score:

f(x_i) = W * x_i + b

where W is a C*D weight matrix, x_i is a D*1 input sample vector, and b is a C*1 bias vector. The result is a C*1 score vector with a score for each of the C classes; the output is the class with the highest score. This operation can be visually represented as follows:
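Equivalently, in code, the whole thing is one matrix-vector product plus a bias. A minimal NumPy sketch (the sizes C = 3 and D = 4 are made up for illustration):

```python
import numpy as np

# Minimal sketch of the linear score function f(x_i) = W * x_i + b,
# with hypothetical sizes: C = 3 classes, D = 4 input dimensions.
C, D = 3, 4
rng = np.random.default_rng(0)

W = rng.standard_normal((C, D))   # C x D weight matrix
b = rng.standard_normal((C, 1))   # C x 1 bias vector
x = rng.standard_normal((D, 1))   # D x 1 input sample

scores = W @ x + b                # C x 1 score vector
predicted_class = int(np.argmax(scores))
print(scores.ravel(), predicted_class)
```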


Each row i of W can be interpreted as the ideal vector, or template, for the corresponding class, and the product W * x_i as a set of dot products that measure the similarity between each class template and the input. The closer they match, the higher the score, and vice versa. For image classification, seeing this is as easy as plotting each row of W of a trained model as a D-dimensional image.
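As a sketch of that idea, with a random W standing in for a trained one and CIFAR-10-sized images assumed (32 x 32 x 3, so D = 3072):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sketch: visualize each row of a weight matrix W as an image.
# Assumes CIFAR-10-like inputs and C = 10 classes; a real W would come
# from a trained model rather than from np.random.
C, H, Wd, ch = 10, 32, 32, 3
W = np.random.randn(C, H * Wd * ch)

fig, axes = plt.subplots(1, C, figsize=(2 * C, 2))
for i, ax in enumerate(axes):
    template = W[i].reshape(H, Wd, ch)
    # Rescale to [0, 1] so the row can be displayed as an image.
    template = (template - template.min()) / (template.max() - template.min())
    ax.imshow(template)
    ax.axis("off")
plt.show()
```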


Taken from the Stanford course cs231n website, these images reveal what the model thinks each class should look like. The uncanny similarity between a deer and a bird speaks volumes about the limited capacity of the SVM. Although SVMs are easily trained and interpreted, they don't exactly produce the best results, hence the need for more sophisticated models: Neural Networks.

Neural Networks
Built from SVM-like units called neurons arranged in one or more layers, Neural Networks (NNs) go a step further. Unlike an SVM, each neuron has a single output computed with a sigmoid-like activation function, which indicates whether the neuron fires or not. It is this added non-linearity that makes Neural Networks universal approximators, capable of approximating virtually any function, of course at the expense of transparency.
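A single neuron is essentially the SVM's affine step followed by a squashing non-linearity. A minimal sketch, with made-up sizes:

```python
import numpy as np

def sigmoid(z):
    """Squash a raw score into (0, 1); values near 1 mean the neuron 'fires'."""
    return 1.0 / (1.0 + np.exp(-z))

# One neuron: the same w * x + b affine step as the SVM,
# followed by a non-linear activation. Sizes are illustrative.
D = 4
w = np.random.randn(D)   # weights of one neuron
b = 0.1                  # its bias
x = np.random.randn(D)   # an input sample

activation = sigmoid(w @ x + b)
print(activation)        # a value in (0, 1)
```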

Folks at Google accepted the challenge and published interesting results in their famous Inceptionism post. By turning the network upside down, they were able to show what the network had learnt about each class. For instance, the image below shows what the network learnt about bananas, starting from random noise.
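Stripped of the regularization tricks the Google team actually used, the basic recipe is gradient ascent on the input image. A hedged sketch, where `model`, the class index, the image size and the step settings are all placeholders rather than their code:

```python
import torch

# Sketch of "turning the network upside down": start from noise and nudge
# the input so a chosen class score goes up. `model` stands for any trained
# image classifier returning class scores of shape (1, num_classes).
def visualize_class(model, class_idx, steps=200, lr=0.05):
    model.eval()
    img = torch.randn(1, 3, 224, 224, requires_grad=True)  # start from noise
    for _ in range(steps):
        score = model(img)[0, class_idx]     # score of the target class
        model.zero_grad()
        if img.grad is not None:
            img.grad = None
        score.backward()
        with torch.no_grad():
            # Normalized gradient-ascent step on the image itself.
            img += lr * img.grad / (img.grad.norm() + 1e-8)
    return img.detach()
```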

In addition, they deduced that the higher up a layer is, the more abstract the features it detects. This was done by "asking" a network layer to enhance whatever it detected in an image. Lower layers detected low-level features and patterns like edges and strokes.
Patterns seen by low layers

Higher layers produced more interesting images, things you'd see in your dreams.  
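In the same spirit, the layer-enhancement idea swaps the objective: instead of boosting a class score, you boost the norm of whatever activations a chosen layer produces. A hedged sketch, where `model`, `layer` and the step settings are placeholders, and the real DeepDream code adds tricks (jitter, octaves, regularization) omitted here:

```python
import torch

# Sketch of "enhance whatever you see": amplify a chosen layer's activations
# by gradient ascent on the input image. `layer` is one nn.Module inside
# `model`; a forward hook captures its output on each pass.
def enhance_layer(model, layer, image, steps=50, lr=0.01):
    model.eval()
    image = image.clone().requires_grad_(True)
    captured = {}

    def hook(module, inputs, output):
        captured["act"] = output           # grab the layer's activations

    handle = layer.register_forward_hook(hook)
    for _ in range(steps):
        model(image)
        loss = captured["act"].norm()      # "enhance whatever you already see"
        model.zero_grad()
        if image.grad is not None:
            image.grad = None
        loss.backward()
        with torch.no_grad():
            image += lr * image.grad / (image.grad.norm() + 1e-8)
    handle.remove()
    return image.detach()
```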




Even more interesting are Recurrent Neural Networks (RNNs), Neural Networks with memory that are used for predicting sequences. In his famous blog post, Andrej Karpathy (a Deep Learning researcher who should know), wrote not only about how effective they are but also about how they can be interpreted. Using the LSTM variant, he was able to generate interesting text in the style of Paul Graham's essays, Shakespeare, research papers, all the way to the Linux source code. In addition, he was able to single out what some neurons detected. Only about 5% of the neurons detected clearly defined patterns, but those patterns are interesting to say the least.
This one fired at line endings.

This one fired on text between quotes.


NB: Red indicates higher activation compared to blue.
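To make that concrete, here is a hypothetical sketch of how one might probe a single hidden unit: run a character-level LSTM over some text and record that unit's value at every character. The LSTM here is untrained and the unit index is arbitrary; in the real experiment both come from a trained model.

```python
import torch
import torch.nn as nn

# Hypothetical probe of one LSTM unit, in the spirit of Karpathy's analysis.
text = 'He said "hello" and left.\nShe stayed.\n'
vocab = sorted(set(text))
char_to_idx = {c: i for i, c in enumerate(vocab)}

lstm = nn.LSTM(input_size=len(vocab), hidden_size=128, batch_first=True)
unit = 42  # the hidden unit we want to watch (made up)

# One-hot encode the characters into a (1, time, vocab) tensor.
x = torch.zeros(1, len(text), len(vocab))
for t, c in enumerate(text):
    x[0, t, char_to_idx[c]] = 1.0

with torch.no_grad():
    outputs, _ = lstm(x)              # hidden state at every time step
activations = outputs[0, :, unit]     # this unit's value per character
for c, a in zip(text, activations.tolist()):
    print(repr(c), round(a, 3))       # colouring by value reproduces the plots
```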
I hope this is enough to convince you that AI isn't black-box magic that computers will use to exterminate mankind, but a very scientific process. It may not be clear in advance what each neuron will learn, but we can influence what it learns using hyperparameters and regularization, and after training, figure out what each has learnt.

No doubt, the original neural network, the brain, learns in a similar way. Just as connection weights are strengthened through backpropagation, neurons reinforce connections that are repeatedly used. Neuroscientists dispute this analogy, but what is true in both cases is that learning is highly influenced by input data and hyperparameters. In NNs these hyperparameters include the learning rate and regularization constants, while in the brain they are influenced by emotions, curiosity, and frequency of learning. We can't control all of them, but we can definitely control the kind of data we feed our brains, and how often we do so. On that note, what are you feeding your brain? What fires your neurons? Have a great week ahead!

Sunday, July 2, 2017

Sorting pancakes with Bill Gates


I've seen this meme all over the internet. We know how comforting it is when life is simply not working out. It lends itself to the notion that you can struggle as a student and still be useful to the world. The only problem is that the reality is the complete opposite. Bill Gates was far from a sloppy student; he was the type of engaged student who read and thought a lot. In fact, as an undergrad, he published an academic paper that for 30 years remained the state-of-the-art solution to a very fundamental problem. Although the title 'Bounds for Sorting by Prefix Reversal' [1] sounds boring and academic, it's about pancakes, so grab some pancakes and let's talk about pancakes.

What exactly is the problem with pancakes? None; such a sweet and comforting food is far from a nuisance. The problem is with the chef who, like me, likes to throw things around with little attention to aesthetics. Pancakes come in different sizes, and most people agree they look better when sorted with the biggest at the bottom. Given an arbitrary order produced by our chef, produce a pleasant-looking stack with the minimum number of flips. As easy as it sounds, it is still an open Computer Science problem, since the only move allowed is reversing a sequence of pancakes from the top.

When Prof. Harry Lewis casually posed this question to his undergrad class 40 years ago, he wasn't expecting any breakthrough. That was until one William Gates brought him a solution two days later. Previously, it was known that at most 2(n-1) flips are needed, because the lazy approach is to start by flipping the biggest pancake to the bottom (at most 2 moves) and iteratively do the same for the next biggest until everything is sorted. Like all brute-force algorithms, it works but doesn't scale well: imagine a very hungry customer ordering 1000 pancakes. He'll definitely starve. Good guy Bill proposed an algorithm with a substantially better upper bound.
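For concreteness, here is a minimal sketch of that lazy approach (my own illustration, not Gates' algorithm), counting flips as it goes:

```python
def naive_pancake_sort(stack):
    """Sort a stack of pancake sizes with prefix reversals, the 'lazy' way.

    Repeatedly flip the biggest unsorted pancake to the top, then flip it
    down into place: at most 2 flips per pancake. Returns the flip count.
    """
    stack = list(stack)
    flips = 0
    for size in range(len(stack), 1, -1):
        pos = stack.index(max(stack[:size]))   # biggest unsorted pancake
        if pos == size - 1:
            continue                           # already in place
        if pos != 0:
            stack[:pos + 1] = reversed(stack[:pos + 1])  # flip it to the top
            flips += 1
        stack[:size] = reversed(stack[:size])            # flip it into place
        flips += 1
    return stack, flips

print(naive_pancake_sort([3, 1, 4, 2]))   # ([1, 2, 3, 4], 4) -- within 2(n-1)
```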

Pancakes are represented as a permutation of numbers reflecting their sizes. If two neighbours in the permutation differ by 1, the pair is called an adjacency. A block is a maximal series of consecutive adjacencies, and an element that is not part of a block is a free element. The perfectly sorted permutation 123...n has n-1 adjacencies and one block; other permutations have fewer adjacencies and more blocks. The algorithm Bill proposed increases adjacencies by designing minimal flip sequences for all possible configurations of the first element of the permutation with respect to its neighbours. For each configuration, a different kind of flip is performed, increasing the number of adjacencies by at least one and changing the number of blocks in different ways. The result is a linear programming problem with the objective of maximizing the number of flips, constrained by the numbers of adjacencies and blocks. Using the duality theorem of von Neumann, Kuhn, Tucker and Gale, the upper bound on the number of flips is established to be about five thirds (more precisely (5n+5)/3) of the number of elements in the permutation. As if a 1.2x speedup were not enough, the paper goes on to establish a lower bound as well as bounds for the restricted case where pancakes not only have to be sorted but also sorted right side up.
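To pin down the definitions, here is a small illustrative sketch (my own, not code from the paper) that counts the adjacencies and blocks of a permutation:

```python
def adjacencies(perm):
    """Count neighbouring pairs that differ by exactly 1."""
    return sum(1 for a, b in zip(perm, perm[1:]) if abs(a - b) == 1)

def blocks(perm):
    """Count maximal runs of consecutive adjacencies (runs of 2+ elements)."""
    count, run = 0, 1
    for a, b in zip(perm, perm[1:]):
        if abs(a - b) == 1:
            run += 1
        else:
            if run >= 2:
                count += 1
            run = 1
    if run >= 2:
        count += 1
    return count

print(adjacencies([1, 2, 3, 4]), blocks([1, 2, 3, 4]))  # 3 adjacencies, 1 block
print(adjacencies([2, 1, 4, 3]), blocks([2, 1, 4, 3]))  # 2 adjacencies, 2 blocks
print(adjacencies([3, 1, 4, 2]), blocks([3, 1, 4, 2]))  # 0 adjacencies, 0 blocks
```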

Sorting pancakes may not sound like a groundbreaking problem, but the same problem is encountered when establishing gene similarities between species. For 30 years, Gates' algorithm remained the state of the art, until folks at the University of Texas produced a marginal improvement [2], utilizing automation made possible by the same Bill Gates.

I have always known that it takes a lot more than quitting Harvard to revolutionize the tech industry, or at the very least to launch a successful startup. Now I know that uncertainty fused with bountiful curiosity is a major ingredient of the recipe. Bill Gates didn't foresee anything; he simply solved a problem that interested him, building his confidence to solve bigger problems. If you are confused about your purpose in life, hang in there and savor the feeling; it's building up to something. In the meantime, get even more inspired with Gates Notes and have a great weekend ahead.

References:
[1] W. H. Gates and C. H. Papadimitriou, "Bounds for Sorting by Prefix Reversal"
[2] B. Chitturi, W. Fahle, Z. Meng, L. Morales, C. O. Shields, I. H. Sudborough and W. Voit, "An 18n/11 Upper Bound for Sorting by Prefix Reversals"