
Thursday, September 21, 2017

A gentle introduction to CNNs [Part 1]

No, not the news channel.
Let's talk about Convolutional Neural Networks (CNNs), also known as ConvNets. You probably should care because they power a considerable portion of your apps, whether you realize it or not. Facebook uses them to auto-tag your photos; Google uses them not just for image tagging and street sign translation but also to create weird dreams. Pinterest, Instagram, Amazon, you name it, they all employ these networks in one way or another. We know CNNs are state of the art for computer vision because they have produced winners in the famous ImageNet challenge, among other competitions.

facebook auto-tag
Google 'weird' deep dream
Google Image translate




So, what exactly are CNNs?
To start with, they are neural networks (NNs). Modelled after our very own neuronal structure, neural networks are interconnected processing elements (neurons) that process information by responding to external inputs.

Biological neuron
Artificial Neuron



You can imagine a single neuron as a black box that takes in numerical inputs and produces an output by applying a linear function followed by a non-linear activation function. When many neurons are stacked in a column they form a layer, and interconnected layers form a neural network.
Shallow Neural Network of Fully Connected layers



Deep Neural Network



For a detailed explanation of NNs, see this post.
As shown above, all neurons in each layer are connected to all neurons in the adjacent layers, making them Fully Connected (FC) layers.
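To make that concrete, here is a minimal numpy sketch of one fully connected layer: a linear transform followed by a non-linear activation. The layer sizes are arbitrary and purely for illustration.

    import numpy as np

    def fc_layer(x, W, b):
        # One fully connected layer: linear part (W @ x + b)
        # followed by a non-linear activation (sigmoid here).
        z = W @ x + b
        return 1.0 / (1.0 + np.exp(-z))

    # Toy example: 4 input features feeding 3 neurons
    x = np.random.randn(4)       # input vector
    W = np.random.randn(3, 4)    # one row of weights per neuron
    b = np.random.randn(3)       # one bias per neuron
    print(fc_layer(x, W, b))     # 3 outputs, one per neuron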

CNNs go a step further: instead of generic neurons, they are modelled after our very own visual cortex. They are composed not only of multiple layers but of different kinds of layers: Input (IN), Convolutional (CONV), Non-Linear (RELU), Pooling (POOL), Normalization (optional), Fully Connected (FC) and Output (OUT) layers.

Of interest is the Convolutional layer, which performs the linear function, convolution. A convolutional neuron is basically a filter that sweeps across the width and height of the input, computing the dot product between itself and the input elements in its receptive field, producing a 2D activation map. The dot product is computed along all three dimensions of the input: width, height and depth. For raw images the depth is 3, one channel per RGB pixel intensity. When multiple filters are stacked in a layer, each filter produces a different 2D activation map, rendering a 3D output. The CONV layer therefore maps a 3D input to a 3D output, which may differ in dimensions.

3D  Convolution
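To make the convolution itself concrete, here is a minimal numpy sketch of a single filter sliding over a 3-channel input with stride 1 and no padding; the sizes are arbitrary and real libraries are, of course, far more optimized.

    import numpy as np

    def conv2d_single_filter(inp, filt):
        # Slide one filter over the input, taking a dot product at each
        # position across width, height and depth -> one 2D activation map.
        H, W, D = inp.shape
        fh, fw, _ = filt.shape
        out = np.zeros((H - fh + 1, W - fw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                receptive_field = inp[i:i + fh, j:j + fw, :]
                out[i, j] = np.sum(receptive_field * filt)
        return out

    image = np.random.rand(32, 32, 3)                        # small RGB image
    filters = [np.random.randn(3, 3, 3) for _ in range(8)]   # 8 filters in the layer
    maps = np.stack([conv2d_single_filter(image, f) for f in filters], axis=-1)
    print(maps.shape)                                        # (30, 30, 8): 3D in, 3D out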


The Non-Linear layer applies a non-linear function such as tanh or sigmoid, but most commonly ReLU (Rectified Linear Unit). It changes the values but leaves the dimensions unchanged.

The Pooling layer reduces the spatial size of its input by selecting a single value to represent a relatively small region of the input. Common pooling operations include the maximum, the average and the L2-norm. This layer reduces the height and width of the input but does not affect the depth.
A series of CONV-RELU-POOL layers is normally stacked to form a bipyramid-like structure where the height and width decrease while the depth increases, as the number of filters grows in higher layers.

bi-pyramid structure
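A similarly minimal sketch of the RELU and POOL steps mentioned above, using 2x2 max pooling with stride 2 on a stack of activation maps (sizes chosen just for illustration):

    import numpy as np

    def relu(x):
        # Element-wise ReLU: values change, dimensions don't.
        return np.maximum(0, x)

    def max_pool_2x2(fmaps):
        # 2x2 max pooling with stride 2: halves height and width,
        # leaves the depth (number of maps) untouched.
        H, W, D = fmaps.shape
        trimmed = fmaps[:H - H % 2, :W - W % 2, :]
        return trimmed.reshape(H // 2, 2, W // 2, 2, D).max(axis=(1, 3))

    maps = np.random.randn(30, 30, 8)
    pooled = max_pool_2x2(relu(maps))
    print(pooled.shape)   # (15, 15, 8)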

Finally, the network is concluded with FC layers similar to those in traditional neural networks.

full-stack


Despite their complexity, CNNs have a competitive advantage over traditional neural networks. This comes as a result of their peculiar features, some of which I have outlined below.


  1. Convolution filters can be thought of as FC neurons that share parameters across all filter-sized portions of the input. This is in contrast to FC neurons, where each input feature requires a separate parameter. Sharing leads to a far smaller memory requirement for parameter storage (see the rough count after this list). Fewer parameters also reduce the chances of overfitting to the training data.
  2. CNNs pick up patterns that arise from the order of the input features. In FC layers this order doesn't matter, since all features are treated as independent and their locations are ignored. CONV filters, on the other hand, have a local receptive field, so patterns that arise out of proximity, or the lack thereof, are easily detected. For instance, CNNs can easily detect edge patterns, or that eyes are close to the nose in faces.
  3. Spatial subsampling (pooling) ensures insensitivity to variations in size, position and slant. This is important because images are unstructured data, meaning each pixel doesn't represent a fixed feature: a nose pixel in one selfie is most likely in a different location in another selfie. A good model stays invariant to these distortions. In CNNs this is achieved by the pooling layer, which places importance on the relative rather than the absolute position of features in an image.
  4. Sparse, locally connected neurons eliminate the need for feature engineering. During training, the network determines which features are important by allocating appropriate parameters to different locations; zero-valued parameters imply that a feature is not important.
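To put point 1 in rough numbers, compare a fully connected layer with a convolutional layer on a small RGB image (the layer sizes here are purely illustrative):

    # Input: a 32x32 RGB image = 32 * 32 * 3 = 3072 features
    fc_params = 3072 * 100 + 100         # FC layer with 100 neurons: 307,300 parameters
    conv_params = (3 * 3 * 3 + 1) * 100  # 100 conv filters of size 3x3x3: 2,800 parameters
    print(fc_params, conv_params)        # the conv layer is roughly 100x smaller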


CNNs are useful not just for images but also for time-series and speech data. They are a hot topic in both research and industry, and this post has barely scratched the surface. Therefore, as the title suggests, there will be a Part 2 follow-up, where we walk through the details of training CNNs from scratch. Thanks for reading!!!



Monday, September 4, 2017

On packaging and shipping tribes.


It all started with Facebook, the one social media platform I have a love-hate relationship with. I like that I find interesting news and opportunities there, but it always comes at a cost. Cat pictures are fun to watch but not very productive, and the carefully curated best versions of my friends online are sometimes sad to watch. I recently realized that most of the interesting things are shared in groups rather than on personal profiles. I figured I needed to see headlines from the groups I most care about, to decide whether it's worth logging in or not, so I made a tool for it: enter Tribe. With Tribe, all you do is enter the group id and, optionally, a start date, end date and data format, and out come the posts in a properly formatted file.
pip install fb-tribe


json magic
csv data
Isn't the Facebook web interface much nicer than a CSV file, you ask? Yes, and I've considered a friendlier interface; in the meantime it will remain as minimalist as it stands. Of what use is data in a structured file? Thanks for asking; ask a data scientist friend the same question and watch their eyes glow. So basically you built a mini-Facebook for data scientists and minimalist-disguised hippies? Exactly! I built it in Python and, as useless as it may sound, it had its challenges.

First off was the scope. Ideally I wanted to be able to scrape all groups, but the Facebook Graph API allows CLI apps to scrape only open groups. Closed and secret groups require a user token, which can only be obtained with the more secure Web and Mobile apps. Bummer! I settled on an MVP for public groups only.

Having not used Python for a while, I tackled some interesting language problems, from the mundane issue of importing modules in subpackages to dealing with emojis and Chinese characters in the data. The topic of encoding probably deserves its own blogpost, but the rule of thumb is to always explicitly write to files with 'utf-8' encoding. The Python default (charmap on my machine) doesn't speak emoji very well.
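The fix, in practice, is a one-liner: always pass the encoding explicitly when writing the scraped text (the file name and content below are placeholders):

    post_text = "Emoji test: 😀 and 中文"   # placeholder for scraped content
    # Explicit utf-8 keeps emojis and Chinese characters intact,
    # regardless of the platform's default encoding.
    with open('posts.txt', 'w', encoding='utf-8') as f:
        f.write(post_text)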

Equally fun was packaging and uploading my first package to PyPI, because, you know, who doesn't like to just 'pip install [package]'? Trial and error, plus Jamie's blogpost, helped me get through most of the issues: package directory structure, defining entry points and console scripts in setup.py. All was well until I couldn't upload the zip file due to a permission issue. After lots of tea and internet consumption, it dawned on me that another package called tribe already exists, hence the name change to fb-tribe. The package was uploaded, and all was well.
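For anyone fighting the same setup.py battles, the console-script plumbing boils down to something like the sketch below; the module and function names are illustrative, not the actual fb-tribe configuration.

    # setup.py (simplified sketch)
    from setuptools import setup, find_packages

    setup(
        name='fb-tribe',              # 'tribe' was already taken on PyPI
        version='0.1.0',
        packages=find_packages(),
        entry_points={
            'console_scripts': [
                # installs a `tribe` command that calls main() in tribe/cli.py
                'tribe = tribe.cli:main',
            ],
        },
    )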

As you can tell, I had so much fun experimenting with the Facebook Graph API, Python and PyPI. If logging on to Facebook gives you anxiety, let's be friends: go ahead and 'pip install fb-tribe' from your command line. If you really love it, kindly give it a star (I like stars). If you find a bug or want to see a new feature, feel free to create an issue, and if you would like to contribute, feel free to open a pull request. Yes, the source code is public on GitHub, here.

So what's next?
Glad you asked. Perhaps some semantic analysis with NLP before presenting a post. Perhaps presenting it in a proper web interface, or an email service, or both; time will tell. In the meantime, I will be getting my Facebook updates the good old-fashioned way, like an offline magazine. Thanks for reading and have a great week ahead!


Tuesday, August 22, 2017

Course review: Neural Networks and Deep Learning

If you have been in the Machine Learning space, you know of the visionary and my favorite machine learning scientist, Andrew Ng. Not a lot of people can claim the titles of Stanford professor, Coursera co-founder, founder of the Google Brain project and chief scientist at Baidu, so I have every reason to look up to him. His latest venture is DeepLearning.ai, a startup for Deep Learning training which has launched a Deep Learning Specialization on Coursera. I'm not exactly a beginner in the field; in addition to doing the Machine Learning course on Coursera, I did other ML courses at school and did my thesis on an ML-related project. I thought it would be a good idea to do the specialization as a refresher, to learn a different approach and to advance on to deeper networks. I might have had my motivations, but none of them was as motivating as this tweet.



I just finished the first course, Neural Networks and Deep Learning, and I'd like to share a few nuggets.
The specialization is intended for beginners who are comfortable with Python and linear algebra and who intend to create neural network classifiers in 4 weeks. It builds up to Neural Networks from the simple logistic regression classifier. Andrew slowly goes through the concepts of matrix operations, gradient descent and backpropagation, from a single neuron to a deep neural network. This pace allows one to develop intuition for the concepts without feeling overwhelmed. There are also programming assignments for the final 3 weeks, leading up to the final task of building a cat image classifier.
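In that spirit, the logistic regression building block the course starts from boils down to a vectorized gradient descent loop like this one (a bare-bones sketch of my own, not the assignment code):

    import numpy as np

    def train_logistic_regression(X, y, lr=0.1, iters=1000):
        # X: (n_features, m) data matrix, y: (1, m) labels in {0, 1}
        n, m = X.shape
        w, b = np.zeros((n, 1)), 0.0
        for _ in range(iters):
            a = 1 / (1 + np.exp(-(w.T @ X + b)))   # sigmoid activation
            dz = a - y                             # gradient of the loss w.r.t. z
            w -= lr * (X @ dz.T) / m               # gradient descent updates
            b -= lr * np.sum(dz) / m
        return w, b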


I've completed it with full marks, thanks to the following features that made it a whole lot more enjoyable:
  • It guides you to develop intuitions about neural networks. Although it may sound annoying, Andrew keeps repeating the same equations and concepts in a lot of the videos, and it really does help them stick. Some videos are even dedicated to debugging numpy code as well as matrix equations.

  • The programming assignments require no setup, letting you focus on the concepts. They are done in Jupyter Notebooks hosted on the Coursera hub. This is a feature I'm sure complete beginners will appreciate, since the last thing you want is installation errors.
  • My very favourite feature was Heroes of Deep Learning, where he interviews the heroes of deep learning: Ian Goodfellow, the inventor of Generative Adversarial Networks; Geoff Hinton, one of the inventors of backpropagation, who also pioneered the use of backprop for word embeddings, Boltzmann machines and deep belief nets, among other things; and finally Pieter Abbeel, who has developed algorithms that enable helicopters to do advanced aerobatics. Reading about these heroes is a lot different from actually hearing them talk. I mean, I never would have guessed that Geoff Hinton struggled to get a job and even tried carpentry while figuring out his research interests. They all gave very good advice on how to get into Machine Learning, and I was both surprised and delighted that they all emphasized project-based learning instead of swallowing all the papers written to date (which is obviously impossible).
There were times when it felt very repetitive, but it's a tradeoff I'm willing to take, and it's also good in the long run. I genuinely enjoyed the course, so here's to finishing the remaining 4 courses in the specialization and doing rad deep things! Enjoy the rest of your week!



Sunday, July 9, 2017

What's in your neurons?

Thanks to Tony Stark and Elon Musk, everyone seems to be talking about AI, Big Data and Machine Learning, even though very few actually understand them thoroughly. To most people AI is a magical tool fuelling Self-Driving Cars, Robots and the ever-ubiquitous recommender systems, among other magical things. True indeed, most Machine Learning (ML) models are black boxes, trained with copious quantities of data. They are fed with data from which they learn, but little is known about what they learn and what each unit will learn.

Definitely magic
But it's not..



Backed by calculus and linear algebra, Machine Learning is far from magic. Having learnt how the models learn, I was curious to interpret what the models actually learn.

Support Vector Machine (SVM)
This is no doubt one of the simplest models used for classification. Given an input with dimensionality D and C possible output classes, it uses a linear score function,

f(xi) = W·xi + b,

to compute a score for each class and assigns the input to the class with the highest score. Here W is a C×D weight matrix, xi is a D×1 input sample vector, and b is a C×1 bias vector. The result is a C×1 score vector with one score for each of the C classes; the output is the class with the highest score. This operation can be visually represented as follows:


Each row i of W can be interpreted as the ideal (template) vector for the corresponding class, and the product W·xi as a set of dot products that measure the similarity between each class template and the input. The closer they match, the higher the score, and vice versa. For image classification, seeing this is as easy as plotting each row of W of a trained model as a D-dimensional image.
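In code, both the scoring and the 'plot each row of W' trick come down to a few lines; this sketch uses made-up weights and CIFAR-10-like 32x32 RGB shapes rather than an actual trained model.

    import numpy as np
    import matplotlib.pyplot as plt

    C, D = 10, 32 * 32 * 3            # 10 classes, flattened 32x32 RGB input
    W = np.random.randn(C, D)         # pretend this is a trained weight matrix
    b = np.random.randn(C, 1)

    x = np.random.rand(D, 1)          # one flattened input image
    scores = W @ x + b                # C x 1 score vector
    predicted_class = int(np.argmax(scores))

    # Visualize what class 0 "looks like" to the model
    template = W[0].reshape(32, 32, 3)
    template = (template - template.min()) / (template.max() - template.min())
    plt.imshow(template)
    plt.show()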


Taken from the Stanford cs231n course website, these images reveal what the model thinks each class should look like. The uncanny similarity between a deer and a bird speaks volumes about the limited capacity of the SVM. Although such models are easily trained and interpreted, they don't exactly produce the best results, hence the need for more sophisticated models: Neural Networks.

Neural Networks.
Neural Networks (NNs) are made of SVM-like units called neurons, arranged in one or more layers. Unlike SVMs, neurons have a single output, computed with a sigmoid-like activation function, which indicates whether a neuron fires or not. It is due to this added non-linearity that Neural Networks are universal approximators, capable of approximating virtually any function, of course at the expense of transparency.

The folks at Google accepted the challenge and published interesting results in their famous Inceptionism post. By turning the network upside down, they were able to show what the network had learnt about each class. For instance, the image below shows what the network learnt about bananas, generated from random noise.

In addition, they deduced that the higher up a layer is, the more abstract the features it detects. This was done by "asking" a network layer to enhance what it learnt from the images. Lower layers detected low-level features and patterns like edges and strokes.
Patterns seen by low layers

Higher layers produced more interesting images, things you'd see in your dreams.  




Even more interesting are Recurrent Neural Networks (RNNs), Neural Networks with memory that are used to predict sequences. In his famous blogpost, Andrej Karpathy (a Deep Learning researcher who should know) wrote not only about how effective they are but also about how they can be interpreted. Using the LSTM variant, he was able to generate interesting texts: Paul Graham essays, Shakespearean poetry, research papers, all the way to Linux source code. In addition, he was able to single out what some neurons detected. Only about 5% of the neurons detected clearly defined patterns, but those patterns are interesting to say the least.
This one fired at line endings.

This one fired on text between quotes.


NB: Red indicates higher activation compared to blue.
I hope this is enough to convince you that AI isn't black-box magic that computers will use to exterminate mankind, but a very scientific process. It may not be clear beforehand what each neuron will learn, but we can control what it learns using hyperparameters and regularization, and, after training, figure out what each has learnt.

No doubt the original neural network, the brain, learns in a similar way. Just as connection weights are strengthened through backpropagation, neurons reinforce connections that are repeatedly used. Some neuroscientists refute this claim, but what is true in both cases is that learning is highly influenced by the input data and the hyperparameters. In NNs these hyperparameters include the learning rate and regularization constants, while in the brain they are influenced by emotions, curiosity and the frequency of learning. We can't control all of them, but we can definitely control the kind of data we feed our brains, and how often we do so. On that note, what stuff are you feeding your brain? What fires your neurons? Have a great week ahead!

Sunday, July 2, 2017

Sorting pancakes with Bill Gates


I've seen this meme all over the internet. We know how comforting it is when life is simply not working out. It feeds the notion that you can struggle as a student and still be useful to the world. The only problem is that the reality is the complete opposite. Bill Gates was far from a sloppy student; he was the type of engaged student who read and thought a lot. In fact, as an undergrad, he published an academic paper that for 30 years remained the state-of-the-art solution to a very fundamental problem. Although the title 'Bounds for Sorting by Prefix Reversal' [1] sounds boring and academic, it's about pancakes, so grab some pancakes and let's talk about pancakes.

What exactly is the problem with pancakes? None; such a sweet and comforting food source is far from a nuisance. The problem is with the chef who, like me, likes to throw things around with little attention to aesthetics. Pancakes come in different sizes, and most people agree they look better when sorted with the biggest at the bottom. Given an arbitrary order produced by our chef, produce a pleasant-looking stack with the minimum number of flips. As easy as it sounds, it is still an open Computer Science problem, since the only move allowed is reversing a sequence of pancakes from the top.

When Prof. Harry Lewis casually posed this question to his undergrad class 40 years ago, he wasn't expecting any breakthrough. That was until one William Gates brought him a solution 2 days later. Previously, it was known that the solution takes at most 2(n-1) flips, because the lazy approach is to start by flipping the biggest pancake to the bottom (at most 2 moves) and to do so iteratively for the next biggest until everything is sorted. Like all brute-force algorithms it works but doesn't scale: imagine a very hungry customer orders 1000 pancakes. He'll definitely starve. Good guy Bill proposed an algorithm with a substantially better upper bound.
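For the curious, the lazy 2(n-1)-flip approach looks something like this (a quick sketch of my own, representing pancake sizes as integers):

    def naive_pancake_sort(stack):
        # Repeatedly bring the largest unsorted pancake to the top (one flip),
        # then flip it down into place (another flip): at most 2(n-1) flips.
        flips = 0
        for size in range(len(stack), 1, -1):
            i = stack.index(max(stack[:size]))        # largest unsorted pancake
            if i != size - 1:                         # already in place? skip it
                if i != 0:
                    stack[:i + 1] = reversed(stack[:i + 1])   # flip it to the top
                    flips += 1
                stack[:size] = reversed(stack[:size])         # flip it into place
                flips += 1
        return stack, flips

    print(naive_pancake_sort([3, 1, 4, 2]))   # ([1, 2, 3, 4], 4 flips)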

Pancakes are represented as a permutation of numbers reflecting their sizes. If two neighbors in the permutation differ by 1, the pair is called an adjacency. A block is a series of consecutive adjacencies, and an element that is not part of a block is a free element. The perfectly sorted permutation 123...n has n-1 adjacencies and one block; other permutations have fewer adjacencies and more blocks. The algorithm Bill proposed increases adjacencies by designing minimum flips for all possible configurations of the first element of the permutation with respect to its neighbors. For each configuration a different kind of flip is performed, increasing the number of adjacencies by at least one and changing the number of blocks in different ways. The result is a linear programming problem with the objective of maximizing the number of flips, constrained by the number of adjacencies and blocks. Using the duality theorem of Von Neumann, Kuhn, Tucker and Gale, the upper bound on the number of flips is established to be on the order of five thirds (actually (5n-5)/3) of the number of elements in the permutation. As if a 1.2-times speedup were not enough, the paper goes on to establish a lower bound, as well as bounds for the restricted case where pancakes not only have to be sorted but also have to end up right side up.

Sorting pancakes may not sound like a ground-breaking problem, but it is the same problem encountered when establishing gene similarities between species. For 30 years, Gates' algorithm remained the state of the art, until folks at the University of Texas produced a marginal improvement [2], aided by the automation made possible by the same Bill Gates.

I have always known that it takes a lot more than quitting Harvard to revolutionize the tech industry, or at the very least to launch a successful startup. Now I know that uncertainty fused with bountiful curiosity is a major ingredient of the recipe. Bill Gates didn't foresee anything; he simply solved a problem that interested him, building his confidence to solve bigger problems. If you are confused about your purpose in life, hang in there and savor the feeling: it's building up to something. In the meantime, get even more inspired with Gates Notes and have a great weekend ahead.

References:
[1] Gates, W. H. and Papadimitriou, C. H., 'Bounds for Sorting by Prefix Reversal'.
[2] Chitturi, B., Fahle, W., Meng, Z., Morales, L., Shields, C. O., Sudborough, I. H. and Voit, W., 'An 18n/11 Upper Bound for Sorting by Prefix Reversals'.

Saturday, June 24, 2017

Learning and growing.

Aloha!



Hello from the other side, the side where student discounts and free wifi no longer apply. As hard as it has been for me to accept, I have come to embrace, or at least try to embrace, adulthood and all the perks and challenges it brings.
Speaking of perks, there is the void that is evenings and weekends, disguised as free time. In my attempts to fill this void by consuming the internet, I stumbled across the Coursera Learning How to Learn course by Dr. Barbara Oakley and Dr. Terrence Sejnowski. I was so hooked that I vowed to actually finish it. Well, I did, sorta: I followed on until the part where Barbara mentioned something about 'if something doesn't excite you anymore, let it go'. Actually, she may or may not have mentioned it, but I do remember not enjoying it anymore. However, I did learn cool things that I'd like to share.

  • The difference between diffuse and focused thinking, and how they complement each other. Particularly interesting is diffuse thinking, which, as opposed to focused thinking, happens when you are not actively concentrating on the subject, for instance when chatting or watching related videos.
  • A few good hours of focused thinking a day, spread over a longer period, beats trying to cram everything into one day, and spares you the stress that comes with it.
  • The importance of REST and SLEEP. (Bold for a reason)
  • The importance of running. Okay, I'm pretty sure Dr. Terrence mentioned the importance of sports, but he's a runner so that's all I heard. 
  • How to procrastinate productively. Procrastination is inevitable and everyone does it. Just allow for it after the important bits are done. 
  • The importance of feedback. If you ever find yourself lacking motivation for a task, feedback, negative or positive will probably solve your problems. 
  • Barbara is a super awesome human being. She casually transitioned from linguistics to Industrial and Systems Engineering, worked as a radio operator in Antarctica and runs a cool newsletter that I immediately subscribed to. Check it out here.
The course touched on productivity and learning, I dare say the core of my existence. I'll get back to it one day, but, like Bill Gates, my attention was captured by another interesting course: the Computer Vision course on Udacity. The idea of being able to detect cats in pictures and to direct self-driving cars sounded very appealing at the time. Aside from laughing at the hilarious jokes of Dr. Aaron Bobick, I learnt a thing or two.
  • Computer Vision (CV) != Image Processing != Computational Photography, even though they have lots of overlap. While Image Processing manipulates images to create new images, Computer Vision extracts models and features from images. Computational Photography, meanwhile, is sort of the inverse of Computer Vision, as it involves rendering good images of objects.
  • Linear Algebra and Coordinate Geometry are very handy for CV; it's worth refreshing these concepts before starting the course. Images are, after all, just matrices.
  • Homography, Stereo, Filtering, Hough Transform, Perspective Projection and all the new jargon I can now speak. 
  • Image and Object Recognition can be done with classical CV or with ML techniques. Classical CV is faster and less hacky but probably less accurate than deep learning.
  • The course is very long and graduate-level (900 videos of content); ideally it is to be done in 4 months, and I got through 2/3 of it in a month. I'm sure Barbara approves of the fact that I need a break from it.
That's exactly what I did. Not that I intended to, but I came across a Computational Bioinformatics course on Stepik that pressed all the right buttons.
  • It's free.
  • A combination of Math, Biology (especially Genetics) and Programming is very hard to resist.
  • It is very easy to follow for people without a Biology background.
  • It is text based; no video required. That may not seem like a plus, but in the land of South Africa, where data is not cheap, it means everything.
  • It has easy coding challenges that I looked forward to solving. Check out my repo. Although I could get away with Java, I wish I had implemented them in a functional language. Oh yeah, there goes my next challenge; I will definitely write about it.
  • It's free; need I say more?
Ladies and gentlemen, I would like to take this opportunity to announce that I finished it (in a week!) -- round of applause, please. Even though it only covers one chapter, I'm thirsty for more, so the next step is, as you might expect, enrolling in a proper course, assuming my short-lived attention span is not captivated elsewhere.
Considering that I'm working at a very fast-paced company with a steep learning curve, you might wonder: why bother? I happen to be a very curious character in a very fast-paced, fascinating field called tech. There is always something new to learn, a constant reminder to stay humble. Plus, I very much see myself going back to school for further studies, and by then I'll know exactly what excites me.
So here's to learning and hopefully making our lives better.

What are you currently learning? Please tweet me. I swear I'll tweet back.

Have a great weekend!