Deep Learning SIMPLIFIED

Are you overwhelmed by overly-technical explanations of Deep Learning? If so, this series will bring you up to speed on this fast-growing field – without any of the math or code.

Deep Learning is an important subfield of Artificial Intelligence (AI) that connects various topics like Machine Learning, Neural Networks, and Classification. The field has advanced significantly over the years thanks to the work of giants like Andrew Ng, Geoff Hinton, Yann LeCun, Adam Gibson, and Andrej Karpathy. Many companies have also invested heavily in Deep Learning and AI research - Google with DeepMind and its driverless car, NVIDIA with CUDA and GPU computing, and recently Toyota with its plan to allocate one billion dollars to AI research.

You've probably looked up videos on YouTube and found that most of them contain too much math for a beginner. The few videos that promise to present only the concepts are usually still too high-level for someone just getting started, and videos that jump straight into complicated code only make things worse.

There’s nothing wrong with technical explanations, and to go far in this field you must understand them at some point. However, Deep Learning is a complex topic with a lot of information, so it can be difficult to know where to begin and what path to follow.

The goal of this series is to give you a road map with enough detail that you’ll understand the important concepts, but not so much detail that you’ll feel overwhelmed. The hope is to further explain the concepts that you already know and bring to light the concepts that you need to know. In the end, you’ll be able to decide whether or not to invest additional time on this topic.

So while the math and the code are important, you will see neither in this series. The focus is on the intuition behind Deep Learning – what it is, how to use it, who’s behind it, and why it’s important. You'll first get an overview of Deep Learning and a brief introduction of how to choose between different models. Then we'll see some use cases. After that, we’ll discuss various Deep Learning tools including important software libraries and platforms where you can build your own Deep Nets.

With plenty of machine learning tools currently available, why would you ever choose an artificial neural network over all the rest? This clip and the next could open your eyes to their awesome capabilities! You'll get a closer look at neural nets without any of the math or code - just what they are and how they work. Soon you'll understand why they are such a powerful tool!

Deep Learning is primarily about neural networks, where a network is an interconnected web of nodes and edges. Neural nets were designed to perform complex tasks, such as the task of placing objects into categories based on a few attributes. This process, known as classification, is the focus of our series.

Classification involves taking a set of objects and some data features that describe them, and placing them into categories. This is done by a classifier, which takes the data features as input and assigns a score (typically between 0 and 1) to each object, a process sometimes called firing or activation. A high score means one class and a low score means another. There are many different types of classifiers, such as Logistic Regression, Support Vector Machine (SVM), and Naïve Bayes.
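If you'd like to see what a simple classifier looks like in practice, here's a minimal sketch using scikit-learn's Logistic Regression on one of its built-in toy datasets. The library, dataset, and settings here are illustrative choices, not part of the series itself.

```python
# A minimal sketch of classification with a non-neural classifier:
# scikit-learn's LogisticRegression on a built-in toy dataset.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)            # data features and 0/1 labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000)                # the classifier
clf.fit(X_train, y_train)                              # learn from labelled data

scores = clf.predict_proba(X_test)[:, 1]               # values between 0 and 1
print("high score -> class 1, low score -> class 0")
print(scores[:5], clf.score(X_test, y_test))
```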

Neural nets are highly structured networks, and have three kinds of layers - an input layer, an output layer, and so-called hidden layers, which refer to any layers between the input and the output layers. Each node (also called a neuron) in the hidden and output layers has a classifier. The input neurons first receive the data features of the object. After processing the data, they send their output to the first hidden layer. The hidden layer processes this output and sends the results to the next hidden layer. This continues until the data reaches the final output layer, where the output value determines the object's classification. This entire process is known as Forward Propagation, or Forward prop. The scores at the output layer determine which class a set of inputs belongs to.
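To make Forward Propagation a little more concrete, here's a minimal sketch in NumPy. The layer sizes, random weights, and sigmoid activation are assumptions made purely for illustration.

```python
# A minimal sketch of forward propagation through one hidden layer.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.random(4)                                    # 4 data features for one object

W1, b1 = rng.standard_normal((5, 4)), np.zeros(5)    # input -> hidden (5 neurons)
W2, b2 = rng.standard_normal((1, 5)), np.zeros(1)    # hidden -> output (1 neuron)

hidden = sigmoid(W1 @ x + b1)                        # each hidden neuron acts as a classifier
output = sigmoid(W2 @ hidden + b2)                   # final score decides the class
print("class 1" if output[0] > 0.5 else "class 0", output)
```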

With so many alternatives available, why are neural nets used for Deep Learning? Neural nets excel at complex pattern recognition and they can be trained quickly with GPUs.

Historically, computers have only been useful for tasks that we can explain with a detailed list of instructions. As such, they tend to fail in applications where the task at hand is fuzzy, such as recognizing patterns. Neural Networks fill this gap in our computational abilities by advancing machine perception – that is, they allow computers to start making complex judgements about environmental inputs. Most of the recent hype in the field of AI has been due to progress in the application of deep neural networks.

Neural nets tend to be too computationally expensive for data with simple patterns; in such cases you should use a model like Logistic Regression or an SVM. As the pattern complexity increases, neural nets start to outperform other machine learning methods. At the highest levels of pattern complexity – high-resolution images for example – neural nets with a small number of layers will require a number of nodes that grows exponentially with the number of unique patterns. Even then, the net would likely take excessive time to train, or simply would fail to produce accurate results.

As a result, deep nets are essentially the only practical choice for highly complex patterns such as the human face. The reason is that different parts of the net can detect simpler patterns and then combine them together to detect a more complex pattern. For example, a convolutional net can detect simple features like edges, which can be combined to form facial features like the nose and eyes, which are then combined to form a face (Credit: Andrew Ng). Deep nets can do this accurately – in fact, a deep net from Google beat a human for the first time at pattern recognition.

However, the strength of deep nets is coupled with an important cost – computational power. The resources required to effectively train a deep net were prohibitive in the early years of neural networks. However, thanks to advances in high-performance GPUs of the last decade, this is no longer an issue. Complex nets that once would have taken months to train, now only take days.

Deep Nets come in a large variety of structures and sizes, so how do you decide which kind to use? The answer depends on whether you are classifying objects or extracting features. Let’s take a look at your choices.

A forewarning: this section contains several new terms, but rest assured – they will all be explained in the upcoming video clips.

If your goal is to train a classifier with a set of labelled data, you should use a Multilayer Perceptron (MLP) or a Deep Belief Network (DBN). Here are some guidelines if you are targeting any of the following applications:

  • Natural Language Processing: use a Recursive Neural Tensor Network (RNTN) or a Recurrent Net.
  • Image Recognition: use a DBN or a Convolutional Net.
  • Object Recognition: use a Convolutional Net or an RNTN.
  • Speech Recognition: use a Recurrent Net.

If your goal is to extract potentially useful patterns from a set of unlabelled data, you should use a Restricted Boltzmann Machine (RBM) or some other kind of autoencoder. For any work that involves the processing of time series data, use a Recurrent Net.

If deep neural networks are so powerful, why aren’t they used more often? The reason is that they are very difficult to train due to an issue known as the vanishing gradient.

To train a neural network over a large set of labelled data, you must continuously compute the difference between the network’s predicted output and the actual output. This difference is called the cost, and the process for training a net is known as backpropagation, or backprop. During backprop, weights and biases are tweaked slightly until the lowest possible cost is achieved. An important aspect of this process is the gradient, which is a measure of how much the cost changes with respect to a change in a weight or bias value.
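As a tiny illustration of the cost and gradient idea, here's a sketch with a single weight and a squared-error cost; the one-weight "network" and all the numbers are purely illustrative.

```python
# A minimal sketch of cost, gradient, and repeated small weight tweaks.
def cost(w, x, target):
    prediction = w * x                      # a toy "network" with one weight
    return (prediction - target) ** 2       # difference between output and actual output

def gradient(w, x, target):
    return 2 * (w * x - target) * x         # how much the cost changes per change in w

w, x, target, learning_rate = 0.1, 2.0, 1.0, 0.05
for step in range(100):                     # backprop repeatedly tweaks the weight
    w -= learning_rate * gradient(w, x, target)
print(w, cost(w, x, target))                # w approaches 0.5, cost approaches 0
```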

Backprop suffers from a fundamental problem known as the vanishing gradient. During training, the gradient decreases in value back through the net. Because higher gradient values lead to faster training, the layers closest to the input layer take the longest to train. Unfortunately, these early layers are responsible for detecting the simple patterns in the data, while the later layers combine the simple patterns into complex patterns. Without properly detecting simple patterns, a deep net will not have the building blocks necessary to handle the complexity. This problem is equivalent to trying to build a house without a proper foundation.

So what causes the gradient to decay back through the net? Backprop, as the name suggests, requires the gradient to be calculated first at the output layer, then backwards across the net to the first hidden layer. Each time the gradient is calculated, the net must compute the product of all the previous gradients up to that point. Since these gradients are typically fractions between 0 and 1 – and the product of fractions in this range is an even smaller fraction – the gradient keeps shrinking as it moves back through the net.

For example, if the first two gradients are one fourth and one third, then the next gradient would be one fourth of one third, which is one twelfth. The following gradient would be one twelfth of one fourth, which is one forty-eighth, and so on. Since the layers near the input layer receive the smallest gradients, the net takes a very long time to train. As a result, the overall accuracy suffers.
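Here's a small numeric illustration of that shrinking product, reusing the fractions from the example above.

```python
# The product of per-layer gradients between 0 and 1 shrinks quickly with depth.
layer_gradients = [1/4, 1/3, 1/4, 1/3, 1/4]    # illustrative gradients between 0 and 1

running_product = 1.0
for depth, g in enumerate(layer_gradients, start=1):
    running_product *= g                        # product of all gradients so far
    print(f"after layer {depth}: {running_product:.6f}")
# The product falls to roughly 0.0017 after only five layers, so the earliest
# layers receive tiny updates and train very slowly.
```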

So what was the breakthrough that allowed deep nets to combat the vanishing gradient problem? The answer has two parts, the first of which involves the RBM, an algorithm that can automatically detect the inherent patterns in data by reconstructing the input.

Geoff Hinton of the University of Toronto, a pioneer and giant in the field, was able to devise a method for training deep nets. His work led to the creation of the Restricted Boltzmann Machine, or RBM.

Structurally, an RBM is a shallow neural net with just two layers – the visible layer and the hidden layer. In this net, each node connects to every node in the adjacent layer. The “restriction” refers to the fact that no two nodes from the same layer share a connection.

The goal of an RBM is to recreate the inputs as accurately as possible. During a forward pass, the inputs are modified by weights and biases and are used to activate the hidden layer. In the next pass, the activations from the hidden layer are modified by weights and biases and sent back to the input layer for activation. At the input layer, the modified activations are viewed as an input reconstruction and compared to the original input. A measure called KL Divergence is used to analyze the accuracy of the net. The training process involves continuously tweaking the weights and biases during both passes until the input is as close as possible to the reconstruction.
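For the curious, here's a minimal sketch of the forward pass and reconstruction described above, written in NumPy. The layer sizes, sigmoid units, and the simple weight tweak are simplifying assumptions – real RBM training uses a method called contrastive divergence.

```python
# A simplified RBM-style forward pass and reconstruction (not contrastive divergence).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
v = rng.integers(0, 2, size=6).astype(float)        # visible (input) layer, 6 units

W = 0.1 * rng.standard_normal((6, 4))               # weights shared by both passes
b_hidden, b_visible = np.zeros(4), np.zeros(6)

hidden = sigmoid(v @ W + b_hidden)                  # forward pass: activate hidden layer
reconstruction = sigmoid(hidden @ W.T + b_visible)  # backward pass: rebuild the input

# Tweak the weights so the reconstruction moves closer to the original input.
learning_rate = 0.1
W += learning_rate * np.outer(v - reconstruction, hidden)
print("input:", v, "\nreconstruction:", np.round(reconstruction, 2))
```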

Because RBMs try to reconstruct the input, the data does not have to be labelled. This is important for many real-world applications because most data sets – photos, videos, and sensor signals for example – are unlabelled. By reconstructing the input, the RBM must also decipher the building blocks and patterns that are inherent in the data. Hence the RBM belongs to a family of feature extractors known as auto-encoders.

An RBM can extract features and reconstruct input data, but it still lacks the ability to combat the vanishing gradient. However, through a clever combination of several stacked RBMs and a classifier, you can form a neural net that can solve the problem. This net is known as a Deep Belief Network.

The Deep Belief Network, or DBN, was also conceived by Geoff Hinton. These powerful nets are believed to be used by Google for their work on the image recognition problem. In terms of structure, a DBN is identical to a Multilayer Perceptron, but structure is where the similarities end – a DBN has a radically different training method, which allows it to tackle the vanishing gradient.

The method is known as layer-wise, unsupervised, greedy pre-training. Essentially, the DBN is trained two layers at a time, and these two layers are treated like an RBM. Throughout the net, the hidden layer of one RBM acts as the input layer of the next. So the first RBM is trained, and its outputs are then used as inputs to the next RBM. This procedure is repeated until the output layer is reached.
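Here's a minimal sketch of that layer-wise idea: each layer's activations become the input to the next. The train_rbm helper below is a hypothetical stand-in for one RBM training pass, not a real implementation.

```python
# Greedy layer-wise pre-training: train one pair of layers, feed its
# hidden activations into the next, and repeat.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(data, n_hidden, rng):
    """Hypothetical stand-in: returns weights mapping the data to n_hidden units."""
    return 0.1 * rng.standard_normal((data.shape[1], n_hidden))

rng = np.random.default_rng(0)
data = rng.random((100, 20))                 # 100 unlabelled examples, 20 features
layer_sizes = [16, 12, 8]                    # sizes of the stacked hidden layers

weights = []
layer_input = data
for size in layer_sizes:                     # greedy: one RBM at a time
    W = train_rbm(layer_input, size, rng)
    weights.append(W)
    layer_input = sigmoid(layer_input @ W)   # hidden activations feed the next RBM
print([W.shape for W in weights])
```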

After this training process, the DBN is capable of recognizing the inherent patterns in the data. In other words, it's a sophisticated, multilayer feature extractor. The unique aspect of this type of net is that each layer ends up learning the full input structure. In other types of deep nets, layers generally learn progressively more complex patterns – for facial recognition, early layers could detect edges and later layers would combine them to form facial features. A DBN, on the other hand, learns the hidden patterns globally, like a camera slowly bringing an image into focus.

In the end, a DBN still requires a set of labels to apply to the resulting patterns. As a final step, the DBN is fine-tuned with supervised learning and a small set of labelled examples. After these minor tweaks to the weights and biases, the net achieves a slight increase in accuracy.

This entire process can be completed in a reasonable amount of time using GPUs, and the resulting net is typically very accurate. Thus the DBN is an effective solution to the vanishing gradient problem. As an added real-world bonus, the training process only requires a small set of labelled data.

Out of all the current Deep Learning applications, machine vision remains one of the most popular. Since Convolutional Neural Nets (CNN) are one of the best available tools for machine vision, these nets have helped Deep Learning become one of the hottest topics in AI.

CNNs are deep nets that are used for image, object, and even speech recognition. Pioneered by Yann LeCun at New York University, these nets are widely deployed in the tech industry – Facebook, for example, uses them for facial recognition. If you start reading about CNNs you will quickly discover the ImageNet challenge, a project that was started to showcase the state of the art and to help researchers access high-quality image data. Every top Deep Learning team in the world joins the competition, and each time it's a CNN that ends up taking first place.

A CNN tends to be a difficult concept to grasp. If you’ve ever struggled while trying to learn about these nets, please comment and share your experiences.

CNNs have multiple types of layers, the first of which is the convolutional layer. To visualize this layer, imagine a set of evenly spaced flashlights all shining directly at a wall. Every flashlight is looking for the exact same pattern through a process called convolution. A flashlight’s area of search is fixed in place, and it is bounded by the individual circle of light cast on the wall. The entire set of flashlights forms one filter, which is able to output location data of the given pattern. A CNN typically uses multiple filters in parallel, each scanning for a different pattern in the image. Thus the entire convolutional layer is a 3-dimensional grid of these flashlights.

Connecting some dots

  • A series of filters forms layer one, called the convolutional layer. The weights and biases in this layer determine the effectiveness of the filtering process.

  • Each flashlight represents a single neuron. In the layers we've seen so far, neurons simply activate or fire; in the convolutional layer, neurons instead search for patterns through convolution. Neurons from different filters search for different patterns, and thus they process the input differently.

  • Unlike the nets we've seen thus far where every neuron in a layer is connected to every neuron in the adjacent layers, a CNN has the flashlight effect. A convolutional neuron will only connect to the input neurons that it “shines” upon.

The convolved input is then sent to the next layer for activation. CNNs use backprop for training, but because a special activation function called the Rectified Linear Unit (ReLU) is used, these nets largely avoid the vanishing gradient problem.

In real-world applications, image convolution results in hundreds of millions of weights and biases, which has an adverse effect on performance. Thus after ReLU, the activations are typically pooled in an adjacent layer to reduce dimensionality. Afterwards, there is usually a fully connected layer that acts as a classifier.

CNNs that are in use typically have an architecture with repeated sets of layers. Set 1 is a convolutional layer followed by a ReLU. This set can be repeated a few times, and the repeated structure is followed by a pooling layer. This resulting combination forms set 2, which is also repeated a few more times. The final resulting structure is then attached to a fully connected layer at the end. This architecture allows the net to continuously build complex patterns from simple ones, all while lowering computing costs with dimensionality reduction.
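If you'd like to see this repeated conv/ReLU/pooling pattern spelled out, here's a minimal sketch using the Keras API (assuming TensorFlow/Keras is available). The layer counts, filter sizes, and the 64x64 RGB input are illustrative choices.

```python
# Repeated sets of convolution + ReLU, followed by pooling, ending in a
# fully connected classifier.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(64, 64, 3)),                  # small RGB image
    layers.Conv2D(32, (3, 3), activation="relu"),     # set 1: convolution + ReLU
    layers.Conv2D(32, (3, 3), activation="relu"),     # repeated
    layers.MaxPooling2D((2, 2)),                      # pooling reduces dimensionality
    layers.Conv2D(64, (3, 3), activation="relu"),     # set 2 repeats the pattern
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),             # fully connected classifier
    layers.Dense(10, activation="softmax"),           # 10 illustrative classes
])
model.summary()
```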

CNNs are a powerful tool, but there is one drawback – they require tens of millions of labelled data points for training. They also must be trained with GPUs for the process to be completed in a reasonable amount of time.

Our previous discussions of deep net applications were limited to static patterns, but how can a net decipher and label patterns that change with time? For example, could a net be used to scan traffic footage and immediately flag a collision? Through the use of a recurrent net, these real-time interactions are now possible.

The Recurrent Neural Net (RNN) is the brainchild of Juergen Schmidhuber and Sepp Hochreiter. The three deep nets we've seen thus far – MLP, DBN, and CNN – are known as feedforward networks since a signal moves in only one direction across the layers. In contrast, RNNs have a feedback loop where the net's output is fed back into the net along with the next input. Since a basic RNN has just one layer of recurrent neurons, it is structurally one of the simplest types of nets.
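Here's a minimal sketch of that feedback loop in NumPy: the previous state is fed back in alongside each new input. The sizes and the tanh activation are assumptions for illustration.

```python
# A bare-bones recurrent step: the state is updated from the new input
# and the previous state, then fed back in at the next time step.
import numpy as np

rng = np.random.default_rng(0)
W_input = 0.1 * rng.standard_normal((8, 3))    # input -> hidden
W_state = 0.1 * rng.standard_normal((8, 8))    # previous state -> hidden (the feedback loop)
b = np.zeros(8)

state = np.zeros(8)                            # the net's "memory"
sequence = rng.random((5, 3))                  # 5 time steps, 3 features each

for x in sequence:
    state = np.tanh(W_input @ x + W_state @ state + b)   # output fed back in
print(np.round(state, 3))
```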

Like other nets, RNNs receive an input and produce an output. Unlike other nets, the inputs and outputs can come in a sequence. Here are some sample applications for different input-output scenarios:

  • Single input, sequence of outputs: image captioning
  • Sequence of inputs, single output: document classification
  • Sequence of inputs, sequence of outputs: video processing by frame, statistical forecasting of demand in Supply Chain Planning

Have you ever used an RNN in one of your projects before? If so, please comment and tell us about your experience.

RNNs are trained using backpropagation through time, which reintroduces the vanishing gradient problem. In fact, the problem is worse with an RNN because each time step is the equivalent of a layer in a feedforward net. Thus if the net is trained for 1000 time steps, the gradient will vanish exponentially as it would in a 1000-layer MLP.

There are different approaches to address this problem, the most popular of which is gating. Gating takes the output of any time step and the next input, and performs a transformation before feeding the result back into the RNN. There are several types of gates, the LSTM being the most popular. Other approaches to address this problem include gradient clipping, steeper gates, and better optimizers.
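Here's a minimal sketch of the gating idea: a learned gate (a value between 0 and 1) decides how much of the previous state passes through unchanged, which helps the gradient survive many time steps. This is a simplified gate for illustration, not a full LSTM.

```python
# A simplified gated update: the gate blends the old state with a candidate
# update instead of overwriting it at every time step.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W_gate = 0.1 * rng.standard_normal((4, 4 + 2))   # gate looks at the state and the input
W_cand = 0.1 * rng.standard_normal((4, 4 + 2))   # candidate update

state = np.zeros(4)
for x in rng.random((10, 2)):                    # 10 time steps, 2 features each
    combined = np.concatenate([state, x])
    gate = sigmoid(W_gate @ combined)            # close to 1 -> keep the old state
    candidate = np.tanh(W_cand @ combined)       # proposed new information
    state = gate * state + (1 - gate) * candidate
print(np.round(state, 3))
```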

GPUs are an essential tool for training an RNN. A team at Indico compared the speed boost from using a GPU over a CPU, and found a 250-fold increase. That’s the difference between 1 day and over 8 months!

A recurrent net has one additional capability – it can predict the next item in a sequence, essentially acting as a forecasting engine.

Autoencoders are a family of neural nets that are well suited for unsupervised learning, a method for detecting inherent patterns in a data set. These nets can also be used to label the resulting patterns.

Essentially, autoencoders reconstruct a data set and, in the process, figure out its inherent structure and extract its important features. An RBM is a type of autoencoder that we have previously discussed, but there are several others.

Autoencoders are typically shallow nets, the most common of which have one input layer, one hidden layer, and one output layer. Some nets, like the RBM, have only two layers instead of three. Input signals are encoded along the path to the hidden layer, and these same signals are decoded along the path to the output layer. Like the RBM, the autoencoder can be thought of as a 2-way translator.

Autoencoders are trained with backpropagation and a new concept known as loss. Loss measures the amount of information about the input that was lost through the encoding-decoding process. The lower the loss value, the stronger the net.

Some autoencoders have a very deep structure, with an equal number of layers for both encoding and decoding. A key application for deep autoencoders is dimensionality reduction. For example, these nets can transform a 256x256 pixel image into a representation with only 30 numbers. The image can then be reconstructed with the appropriate weights and biases; additionally, some nets inject random noise at this stage in order to make the discovered patterns more robust. The reconstructed image won't be perfect, but it will be a decent approximation depending on the strength of the net. The purpose of this compression is to reduce the size of the inputs before feeding them to a deep classifier. Smaller inputs lead to large computational speedups, so this preprocessing step is worth the effort.
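Here's a minimal sketch of a deep autoencoder for dimensionality reduction, again in the Keras API (assumed available). The 256x256 input and the 30-number code come from the example above; the layer sizes in between are guesses for illustration.

```python
# A deep autoencoder: the encoder compresses the input down to 30 numbers,
# the decoder reconstructs it, and the loss measures the information lost.
from tensorflow.keras import layers, models

input_dim = 256 * 256                               # flattened greyscale image
encoder = models.Sequential([
    layers.Input(shape=(input_dim,)),
    layers.Dense(1024, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(30, activation="linear"),          # 30-number representation
])
decoder = models.Sequential([
    layers.Input(shape=(30,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(1024, activation="relu"),
    layers.Dense(input_dim, activation="sigmoid"),  # reconstructed image
])
autoencoder = models.Sequential([encoder, decoder])
autoencoder.compile(optimizer="adam", loss="mse")   # loss: how much was lost in the round trip
autoencoder.summary()
```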

Deep autoencoders are much more powerful than their predecessor, principal component analysis. In the video, you'll see a comparison of the codes each model produces for news stories of different topics; of the two models, the deep autoencoder proves far superior.

Certain patterns are innately hierarchical, like the underlying parse tree of a natural language sentence. A Recursive Neural Tensor Network (RNTN) is a powerful tool for deciphering and labelling these types of patterns.

The RNTN was conceived by Richard Socher in order to address a key shortcoming of earlier sentiment analysis techniques – double negatives being treated as negatives. Structurally, an RNTN is a binary tree with three nodes: a root and two leaves. The root and leaf nodes are not single neurons but groups of neurons – the more complicated the input data, the more neurons are required. As expected, the root group connects to each leaf group, but the leaf groups do not share a connection with each other. Despite the simple structure of the net, an RNTN is capable of extracting deep, complex patterns out of a set of data.

An RNTN detects patterns through a recursive process. In a sentence-parsing application where the objective is to identify the grammatical elements in a sentence (like a noun phrase or a verb phrase, for example), the first and second words are initially converted into an ordered set of numbers known as a vector. The conversion method is highly technical, but the numerical values in the vector indicate how closely related the words are to each other compared to other words in the vocabulary.

Once the vectors for the first and second word are formed, they are fed into the left and right leaf groups respectively. The root group outputs, among other things, a vector representation of the current parse. The net then feeds this vector back into one of the leaf groups and, recursively, feeds different combinations of the remaining words into the other leaf group. It is through this process that the net is able to analyze every possible syntactic parse. If during the recursion the net runs out of input, the current parse is scored and compared to the previously discovered parses. The one with the highest score is considered to be the optimal parse or grammatical structure, and it is delivered as the final output.

After determining the optimal parse, the net backtracks to figure out the appropriate labels to apply to each substructure; in this case, substructures could be noun phrases, verb phrases, prepositional phrases, and so on.

RNTNs are used in Natural Language Processing for both sentiment analysis and syntactic parsing. They can also be used in scene parsing to identify different parts of an image.

Despite its popularity, machine vision is not the only Deep Learning application. Deep nets have started to take over text processing as well, beating every traditional method in terms of accuracy. They are also used extensively for cancer detection and medical imaging. When a data set has highly complex patterns, deep nets tend to be the optimal choice of model.

Demo URLs
Clarifai - http://www.clarifai.com
Metamind - https://www.metamind.io/language/twitter

As we have previously discussed, Deep Learning is used in many areas of machine vision. Facebook uses deep nets to detect faces from different angles, and the startup Clarifai uses these nets for object recognition. Other applications include scene parsing and vehicular vision for driverless cars.

Deep Nets are also starting to beat out other models in certain Natural Language Processing tasks like sentiment analysis, which can be seen with new tools like MetaMind. Recurrent nets can be used effectively in document classification and character-level text processing.

Deep Nets are even being used in the medical space. A Stanford team was able to use deep nets to identify 6,642 factors that help doctors better predict the chances of cancer survival. Researchers from IDSIA in Switzerland used a deep net to identify invasive breast cancer cells. In drug discovery, Merck hosted a deep learning challenge to predict the biological activity of molecules based on chemical structure.

In finance, deep nets are trained to make predictions based on market data streams, portfolio allocations, and risk profiles. In digital advertising, these nets are used to optimize the use of screen space, and to cluster users in order to offer personal ads. They are even used to detect fraud in real time, and to segment customers for upselling/cross-selling in a sales environment.

A deep learning platform enables a user to apply deep nets without building one from scratch. They come in two different forms: software platforms and full platforms.

A platform is a set of tools that users can build on top of. Platforms in other contexts include iOS/Android and MacOS/Windows for example. A Deep Net platform provides a set of tools that simplify the process of building a deep net for a custom application. They typically allow the user to select a particular deep net, integrate and munge data, and manage models from a UI. Some platforms also help to enhance performance when dealing with large data sets.

Ersatz Labs is an example of a full platform because it hosts your Deep Learning applications in the cloud. The platform handles the technical aspects like hardware, code, and networking; the user only needs to build and manage deep nets through a UI. In contrast, software platforms require the user to train and run the nets on their own hardware.

H2O.ai and Dato GraphLab are two examples of machine learning software platforms that offer Deep Nets; since they aren’t full platforms, you will need to install them on your own hardware infrastructure in order to use them.

H2O.ai is a software platform that offers a host of machine learning algorithms, as well as one deep net model. It also provides sophisticated data munging, an intuitive UI, and several built-in enhancements for handling data. However, the tools must be run on your own hardware.

H2O.ai was founded by SriSatish Ambati, Cliff Click, and Arno Candel. In addition to its only deep net – a vanilla MLP – the platform offers a variety of models like GLM, Distributed Random Forest, Naive Bayes, and K-Means clustering, among others. H2O.ai can be linked to multiple data sources in order to train on large data loads.

The UI is highly intuitive, but you can also work with the tools through other apps like Tableau or Excel. These interfaces allow you to set up a deep net by configuring its hyper-parameters.
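As a rough idea of what this looks like, here's a minimal sketch using H2O's Python API (assuming the h2o package is installed). The file path, column choices, and hyper-parameter values are illustrative placeholders.

```python
# Configuring H2O's deep net by setting hyper-parameters through its Python API.
import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()                                           # connect to a local H2O instance
frame = h2o.import_file("example_data.csv")          # hypothetical data source
train, valid = frame.split_frame(ratios=[0.8])

model = H2ODeepLearningEstimator(hidden=[200, 200],  # two hidden layers of 200 neurons
                                 epochs=10,
                                 activation="Rectifier")
model.train(x=frame.columns[:-1], y=frame.columns[-1],
            training_frame=train, validation_frame=valid)
print(model.model_performance(valid))
```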

H2O.ai needs to be deployed and maintained on your own hardware, which may be a limiting factor. However, the platform comes with many performance enhancements like in-memory map-reduce, columnar compression, and distributed parallel processing. Depending on your hardware’s capabilities, training on big data sets could be completed in a reasonable amount of time. As an added note, it’s unclear whether or not GPU support is a built-in feature at this point in time.

Dato GraphLab is a good software platform for Deep Learning projects that require graph analytics and other important algorithms. It provides two deep nets, sophisticated data munging, an intuitive UI, and built-in enhancements for handling big data.

Dato GraphLab currently offers a vanilla MLP and a convolutional net. An important feature of the platform is the Graph Analytics toolset, which can be run alongside the deep learning models. Other provided tools include text analytics, a recommender, classification, regression, and clustering. You can also point GraphLab at multiple data sources in order to train on large data loads.

The platform has an intuitive UI along with an extension called the GraphLab Canvas. This extension offers highly sophisticated visualizations of your models.

Even though GraphLab needs to be deployed and maintained on your own hardware, the platform comes with many performance enhancements that speed up training on big data sets.
GraphLab offers three different types of built-in storage – tabular, columnar, and graph. In addition, the platform provides built-in GPU support which is extremely beneficial for training. You can also set up each type of model as a service that can be accessed programmatically through an API.

Deep Learning libraries provide pre-written, professional-quality code that you can use for your own projects. Given the complexity of deep net applications, reusing code is a wise choice for a developer.

A library is a set of functions and modules that you can call through your own programs. Library code is typically created by highly-qualified software teams, and many libraries bring together large communities that support and extend the codebase. If you’re a developer, you’ve almost certainly used a library at one point in time.

For a commercial-grade deep learning application, the best libraries are deeplearning4j, Torch, and Caffe. The library Theano is suited for educational, research, and scientific projects. Other available libraries include Deepmat and Neon.

Theano is a Python library that defines a set of mathematical functions for building deep nets. Nets that use these functions as their building blocks will be highly optimized for training.

The core feature of Theano is the use of vectors and matrices for all of its functions. Vectorized code runs quickly since multiple values can be processed in parallel. Since Deep Nets require large amounts of computation throughout the training process, vectorization is a highly-recommended feature. Theano is multi-threaded with GPU support, so deep nets can be trained on just a single machine within a reasonable amount of time.

To use Theano for Deep Learning, you must code every aspect of a deep net including the layers, the nodes, the activation, and the training rate. However, all the functions that run your code will be vectorized, resulting in an efficient implementation. Many software libraries extend Theano, making it easier to use in your projects. The Blocks library helps by parameterizing Theano functions. The Lasagne library allows you to specify hyper-parameters in order to build a net layer by layer. Niche libraries like Passage help implement recurrent nets for text analysis.
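Here's a minimal sketch of Theano's style: declare symbolic matrices, describe the math, and compile an optimized function. The single logistic layer below is just an illustrative building block, not a full deep net.

```python
# Theano works with symbolic, vectorized expressions that it compiles into
# fast functions (optionally on a GPU).
import numpy as np
import theano
import theano.tensor as T

x = T.dmatrix("x")                                   # symbolic input matrix
W = theano.shared(np.zeros((3, 1)), name="W")        # weights as shared state
b = theano.shared(np.zeros(1), name="b")

output = T.nnet.sigmoid(T.dot(x, W) + b)             # vectorized activation
layer = theano.function(inputs=[x], outputs=output)  # compiled, optimized function

print(layer(np.random.rand(4, 3)))                   # 4 examples, 3 features each
```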

Deeplearning4j is one of the few libraries that allows you to train your net over a distributed, multi-node cluster. The library provides an Iterative Map-Reduce procedure as well as a set of tools for configuring a Deep Net using hyper-parameters.

The Deeplearning4j Java library was created by Adam Gibson in response to the lack of distributed, multi-node capabilities in other Deep Net libraries. Deeplearning4j can also be used from Scala and Clojure, and it provides built-in GPU support within a distributed framework. You can also use the library to set up a deep net by configuring its hyper-parameters.

Deeplearning4j supports nearly every type of deep net, including the MLP, RBM/DBN, Convolutional Net, Recurrent Net, RNTN, and autoencoders. In addition, the Canova vectorization library is included with the package.

How does the Iterative Map-Reduce procedure differ from standard Map-Reduce? In Deeplearning4j, there are two different steps:

  • MAP: Input data is distributed throughout the cluster, with every node receiving a different portion of the data. Each node begins training with its input set.
  • REDUCE: After training, the parameters of all the nets are averaged. Every node overwrites its net’s parameters with this global average.

These two steps are repeated iteratively until the error is sufficiently small.
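Here's a minimal sketch of that iterative averaging idea in plain NumPy; the "nodes", their local training step, and the toy data are illustrative stand-ins for the distributed cluster that Deeplearning4j actually manages.

```python
# Iterative map-reduce, simplified: MAP trains locally on each data shard,
# REDUCE averages the parameters and overwrites every node's copy.
import numpy as np

rng = np.random.default_rng(0)
data_shards = np.array_split(rng.random((300, 5)), 3)   # MAP: one shard per "node"
weights = np.zeros(5)                                    # shared starting parameters

def local_training_step(w, shard):
    """Hypothetical stand-in for one pass of training on a node's shard."""
    return w + 0.01 * shard.mean(axis=0)

for iteration in range(10):
    local = [local_training_step(weights, shard) for shard in data_shards]  # MAP
    weights = np.mean(local, axis=0)                     # REDUCE: average, then overwrite
print(np.round(weights, 3))
```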

Torch is another great library for developing Deep Learning applications. Several useful libraries extend its codebase, all of which are backed by an active community.

Torch is a LuaJIT library that was developed by Ronan Collobert and Soumith Chintala of Facebook, Clement Farabet of Twitter, and Koray Kavukcuoglu of Google DeepMind. With Torch, you can configure a deep net by selecting options for certain hyper-parameters, and then you can access the deep net within your code.

Several libraries extend the functionalities offered by Torch. The CUDA library Cutorch provides GPU support. Other libraries like NN, Cephes, DP, and NNgraph provide you the necessary tools to build nearly every kind of deep net.

Caffe is a Deep Learning library that is well suited for machine vision and forecasting applications. With Caffe you can build a net with sophisticated configuration options, and you can access premade nets in an online community.

Caffe is a C++/CUDA library that was developed by Yangqing Jia, who created it during his PhD at UC Berkeley before joining Google. The library was initially designed for machine vision tasks, but recent versions support sequences, speech and text, and reinforcement learning. Since it's built on top of CUDA, Caffe supports the use of GPUs.

Caffe allows the user to configure the hyper-parameters of a deep net. The layer configuration options are robust and sophisticated – individual layers can be set up as vision layers, loss layers, activation layers, and many others. Caffe’s community website allows users to contribute premade deep nets along with other useful resources.

Vectorization is achieved through specialized arrays called “blobs”, which help optimize the computational costs of various operations.

Thank you for reading.
