Perceptrons, Sigmoid, Oh My!

So one of my takeaways from last weekend's Women Techmakers Summit 2017 was this great resource on neural networks and deep learning - Neural Networks and Deep Learning. Machine learning, big data and artificial intelligence are hot topics in the technology world right now, and there are a lot of online and offline resources out there. Here's what I like most about Neural Networks and Deep Learning:

  • it's a free online book, accessible to anyone with a computer and Internet access (free is almost always a good thing!)
  • the conversational writing style of the author, Michael Nielsen
  • Nielsen's philosophy of building on the core concepts and theory rather than becoming a Jack or Jill of all trades who learns some long list of concepts. He wants the reader to build and develop a solid (and deep) understanding of key concepts. I love the analogy he makes in the introduction to learning the core concepts of a programming language: once you know the fundamentals, you can pick up different libraries and frameworks in no time! You just need to gain expertise in the core material. There's that word again - core!

I'm currently making my way through chapter 1. So far I've been enlightened about the following:
  • Perceptrons
  • Sigmoid Neurons


Perceptrons

So the buzzwords are Binary, Perceptron, Neuron, Weights. Good stuff. 

Binary means the output takes on one of two values: either zero or one.

A perceptron takes several inputs, each with an associated weight, and produces a single binary output. The output is either zero or one, based on whether the weighted sum of the inputs is greater than or less than some predefined threshold value. The weights can be tweaked or modified to give certain inputs "more influence" or "less influence", while the threshold controls how easily the perceptron fires.

If the following holds, then the output is zero (translation: no, the intended condition or output won't happen):

$$ \sum_{i=1}^{n} w_i x_i \leq \text{threshold} $$

If the following holds, then the output is one (translation: yes, the intended condition or output will happen):
$$ \sum_{i=1}^{n} w_i x_i > \text{threshold} $$
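
Here's how I'd sketch that rule in Python - my own toy code as I read along, not code from the book, and the names are just ones I made up:

```python
# A toy perceptron: output 1 only when the weighted sum beats the threshold.
def perceptron(inputs, weights, threshold):
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

# Two binary inputs; the first input carries more weight than the second.
print(perceptron([1, 0], weights=[6, 2], threshold=5))  # 6 > 5, so outputs 1
print(perceptron([0, 1], weights=[6, 2], threshold=5))  # 2 <= 5, so outputs 0
```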

That's a lot of vocabulary, but that's pretty much all there is to perceptrons.
Interestingly, in the book, Nielsen reworks the equations (introducing vector notation for w and x) and renames and redefines the threshold as the perceptron's bias. I won't repeat the same material here, but essentially the bias is the threshold negated: $$ \text{bias} = -\text{threshold} $$
A very negative bias means it will be hard for the neuron to output a one; a very large (positive) bias gets us much closer to an output of one. Check out the book for more details and to see how the equations are reworked.
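
In code, the reworked rule becomes a simple sign check on w · x + b. Again, this is my own sketch of the idea, not Nielsen's code:

```python
# Same toy perceptron, but with a bias instead of a threshold.
# Fire (output 1) exactly when w . x + b > 0.
def perceptron_with_bias(inputs, weights, bias):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

# bias = -5 behaves exactly like threshold = 5 in the earlier sketch.
print(perceptron_with_bias([1, 0], weights=[6, 2], bias=-5))  # outputs 1
print(perceptron_with_bias([0, 1], weights=[6, 2], bias=-5))  # outputs 0
```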

Perceptrons can be combined into a network of perceptrons, which can be used as a computational device and, ideally, as the basis of a learning algorithm. Such an algorithm would automatically tune the weights and biases of a network of artificial neurons, with the tuning happening in response to external stimuli, without intervention from a human programmer.

The idea is that a small change (delta) in a weight or bias should result in a small change in the output. If this were true, I could essentially nudge the weights, for example, to get the desired output from my network. However, this is not what happens when the network contains perceptrons; a small change in a weight (or bias) can completely flip the output from, say, a one to a zero or vice versa (the sketch below shows this). So it's not immediately clear how a network of perceptrons can be adapted for learning. This is where a "new" type of artificial neuron comes into play. Introducing the sigmoid neuron 👏
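
First, though, here's a tiny made-up example of that flip, reusing the toy perceptron from above. The weights and bias are values I picked purely for illustration, so that the weighted sum sits exactly on the decision boundary:

```python
def perceptron_with_bias(inputs, weights, bias):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0

x = [1, 1]
# 2.0 + 3.0 - 5.0 = 0, which is not > 0, so the output is 0...
print(perceptron_with_bias(x, weights=[2.0, 3.0], bias=-5.0))    # outputs 0
# ...but nudging one weight by just 0.001 flips the output all the way to 1.
print(perceptron_with_bias(x, weights=[2.0, 3.001], bias=-5.0))  # outputs 1
```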

Sigmoid Neurons

The main charm of sigmoid neurons is exactly the point just discussed: a small change in a weight (or bias) results in a small change in the output. As desired.

Sigmoid, oh sigmoid! I also came across the sigmoid function (also known as the logistic function, which is at the heart of logistic regression) in the Coursera course on machine learning. I'm still a machine learning newbie, but it's clear that sigmoid is a big deal in the machine learning world. The output of a sigmoid neuron is

$$ \frac{1}{1 + \exp\left(-\sum_j w_j x_j - b\right)} $$

Sigmoid neurons behave very similarly to perceptrons when
$$ w \cdot x + b $$
is very large or very negative. However, the difference between the two models manifests itself for less extreme values. One key difference between sigmoid neurons and perceptrons is that the sigmoid can output any real number between zero and one, e.g. 0.121 or 1/3. It's not just binary with sigmoid. It can also accept any real number between zero and one as an input. Perceptrons, however, are limited to two values: zero and one.
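
To see both behaviours in code, here's one more of my own sketches. The same 0.001 weight nudge that flipped the perceptron barely moves the sigmoid neuron's output, while very large or very negative values of w · x + b push it towards one or zero, just like a perceptron:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def sigmoid_neuron(inputs, weights, bias):
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

x = [1, 1]
# The 0.001 weight nudge that flipped the perceptron barely moves the sigmoid.
print(sigmoid_neuron(x, weights=[2.0, 3.0], bias=-5.0))    # 0.5
print(sigmoid_neuron(x, weights=[2.0, 3.001], bias=-5.0))  # ~0.50025
# For very large or very negative w . x + b, it acts like a perceptron.
print(sigmoid(10))   # ~0.99995, essentially 1
print(sigmoid(-10))  # ~0.0000454, essentially 0
```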

That's it for now. I plan to share more as I continue my deep dive into deep learning. Wish me luck.

