
What is the vanishing gradient problem, with an example?

A comparison while using ANNs, CNNs and RNNs

Let’s first understand what a gradient is. For a function of a single independent variable, y = f(x), the gradient is simply the derivative f'(x), which equals the slope of the tangent drawn at a point (x0, y0) on the curve. For a function of multiple independent variables, z = f(x, y) {where x, y are independent variables}, the gradient is the vector of partial derivatives of that function: ∇f(x, y) = [∂f/∂x, ∂f/∂y].
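As a quick sanity check on the definition above, the two partial derivatives can be approximated numerically with central differences. The function f(x, y) = x² + 3y and the evaluation point below are arbitrary choices for illustration.

```python
# Approximate the gradient [df/dx, df/dy] of a two-variable function
# using central differences, and compare with the analytic answer.

def grad(f, x, y, h=1e-6):
    """Return [df/dx, df/dy] at (x, y) via central differences."""
    dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return [dfdx, dfdy]

f = lambda x, y: x**2 + 3*y
print(grad(f, 2.0, 1.0))  # analytically [2x, 3] = [4.0, 3.0]
```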

The gradient is a vector quantity. Its direction, ∇f(x, y), points in the direction of steepest ascent, i.e. the fastest way towards the top if the surface of the curve is imagined as a mountain. Its magnitude, ||∇f(x, y)||, tells how steep that ascent is at the point, i.e. the rate of change along the steepest direction; the actual step size taken towards the top is chosen separately (the learning rate, during optimisation).
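A minimal sketch of climbing such a mountain: the surface f(x, y) = −(x² + y²) + 4 (peaked at the origin), the analytic gradient, and one uphill step. The surface and the learning rate are illustrative assumptions, not anything from a real network.

```python
import math

# One gradient-ascent step on f(x, y) = -(x**2 + y**2) + 4.
def grad(x, y):
    return (-2 * x, -2 * y)  # analytic partial derivatives of f

x, y = 3.0, 4.0
gx, gy = grad(x, y)
magnitude = math.hypot(gx, gy)                 # steepness at (x, y)
direction = (gx / magnitude, gy / magnitude)   # unit vector of steepest ascent

lr = 0.1                                       # step size, chosen by us
x, y = x + lr * gx, y + lr * gy                # move uphill, towards the peak
print(magnitude, direction, (x, y))
```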

The above discussion is just the mathematical way of explaining gradients. Let’s now understand the problem of vanishing gradients in ANNs (artificial neural networks), taking handwritten-digit recognition as an example. Say our data-set contains input images of dimensions 28×28, i.e. 784 pixels each. These 784 pixel intensities form the input layer of the network.

Now, the next step is to decide the number of hidden layers and the number of neurons in each. This is usually settled by intuition and/or experimentation. Let’s say we use two hidden layers with five neurons each.

Since we are doing a handwritten-digit recognition task, the task itself fixes the output layer: a digit can only be 0–9, so we need ten neurons in the output layer. Every layer is connected to its adjacent layer, with each neuron of the previous layer connected to every neuron of the next layer, and each connection carries a weight. Training the network means adjusting these weights until the network solves the problem.

Every neuron is a combination of two functions: a summation followed by an activation function. The summation is straightforward, but the activation function has to be chosen; common choices are sigmoid, tanh, ReLU, etc. Here we will use the sigmoid activation to understand the problem of vanishing gradients, because the shape of its curve, nearly flat at both ends, is what introduces the problem.
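The shape of the sigmoid curve matters through its derivative: σ'(z) = σ(z)(1 − σ(z)) peaks at only 0.25 (at z = 0) and decays towards zero for large |z|, where the curve is flat ("saturated"). A small sketch:

```python
import numpy as np

# Sigmoid and its derivative. The derivative never exceeds 0.25, and is
# close to 0 wherever the curve is flat -- the seed of vanishing gradients.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

print(sigmoid_prime(0.0))  # 0.25, the maximum possible value
print(sigmoid_prime(5.0))  # ~0.0066, the saturated region
```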

After deciding the above parameters, our next task is to choose the loss function used to train the network. Here we take the cross-entropy loss, which is widely used for classification problems.
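For a one-hot digit label, cross-entropy reduces to the negative log of the probability the network assigns to the correct class. The predicted probabilities below are made up purely for illustration.

```python
import numpy as np

# Cross-entropy between a one-hot label and a predicted distribution.
def cross_entropy(y_true, y_pred, eps=1e-12):
    y_pred = np.clip(y_pred, eps, 1.0)   # guard against log(0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.zeros(10); y_true[3] = 1.0         # the image is the digit 3
y_pred = np.full(10, 0.02); y_pred[3] = 0.82   # network is fairly confident
print(cross_entropy(y_true, y_pred))           # -log(0.82), about 0.198
```

The more confident (and correct) the prediction, the closer the loss is to zero.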

Our next task is to train the network. Weights are initially assigned at random; the network is then trained through repeated forward and backward passes. During the forward pass, the network is fed an input image’s pixel intensity values, which are passed layer by layer through the network to compute an output, called the predicted output. Since classification is a supervised learning process, the data-set also carries the actual output associated with every input image.
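The forward pass through the 784 → 5 → 5 → 10 network can be sketched as below, with random weights standing in for trained ones and a random vector standing in for one image’s pixel intensities.

```python
import numpy as np

# A forward pass: summation (matrix multiply + bias), then sigmoid, per layer.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(42)
sizes = [784, 5, 5, 10]
weights = [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes[:-1], sizes[1:])]
biases  = [np.zeros(n) for n in sizes[1:]]

x = rng.random(784)            # stand-in for one image's pixel intensities
a = x
for W, b in zip(weights, biases):
    a = sigmoid(a @ W + b)     # summation, then activation

predicted = int(np.argmax(a))  # the most activated output neuron = the digit
print(a.shape, predicted)
```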

Therefore, the cross-entropy loss is calculated between the predicted output and the actual output, and this value is used to tune the weights of the network. Now the backward pass takes place, known as the back-propagation algorithm. This algorithm propagates the loss backwards through the network, using the chain rule to compute the gradient of the loss with respect to each weight, and each weight is then adjusted against its gradient. Because the chain rule multiplies in one activation derivative per layer, and the sigmoid’s derivative is at most 0.25, the gradients shrink rapidly as they travel back towards the early layers; those layers barely learn. This is the vanishing gradient problem.
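The shrinking effect described above is easy to see numerically: even in the best case (every neuron at the sigmoid’s steepest point, derivative 0.25), the gradient reaching the first layer carries a factor of 0.25 per layer. The layer counts below are illustrative.

```python
# Best-case gradient attenuation from sigmoid derivatives alone:
# one factor of (at most) 0.25 per layer traversed during back-propagation.
max_deriv = 0.25
for depth in [2, 5, 10, 20]:
    print(depth, max_deriv ** depth)
# At 20 layers the factor is below 1e-12 -- early weights effectively stop updating.
```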

To understand the concept of the vanishing gradient with the help of the sigmoid function in more depth, please refer to the link below:

https://towardsdatascience.com/the-vanishing-gradient-problem-69bf08b15484


