

Hands-On Deep Learning Algorithms with Python

Metadata

Highlights

what is the difference between neurons and linear regression? In neurons, we introduce non-linearity to the result, z, by applying a function called the activation or transfer function. — location: 604 ^ref-23412
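A minimal NumPy sketch of this idea: the linear part z = w·x + b is exactly what linear regression computes, and the activation (sigmoid here, chosen only for illustration) is what makes it a neuron.

```python
import numpy as np

def sigmoid(z):
    # squashes the weighted sum into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 1.0, -0.3])   # inputs
w = np.array([0.2, -0.4, 0.7])   # weights
b = 0.1                          # bias

z = np.dot(x, w) + b   # linear part, same as linear regression
a = sigmoid(z)         # applying the activation introduces the non-linearity
print(z, a)
```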


The number of neurons in the output layer is based on the type of problem we want our network to solve. If it is a binary classification, then the number of neurons in the output layer is one that tells us which class the input belongs to. If it is a multi-class classification say, with five classes, and if we want to get the probability of each class as an output, then the number of neurons in the output layer is five, each emitting the probability. If it is a regression problem, then we have one neuron in the output layer. — location: 631 ^ref-44958
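As a hedged illustration, the three cases map to output layers like the following (Keras-style; `tensorflow.keras` works equally, and the layer sizes are just examples):

```python
from keras.layers import Dense

# binary classification: one neuron with a sigmoid output
binary_output = Dense(1, activation='sigmoid')

# multi-class classification with five classes: five neurons with softmax,
# each emitting the probability of one class
multiclass_output = Dense(5, activation='softmax')

# regression: one neuron with a linear (default) activation
regression_output = Dense(1)
```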


If we do not apply the activation function, then a neuron simply resembles the linear regression. — location: 639 ^ref-39984

This is because the activation function introduces non-linearity into the mechanism.
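A tiny NumPy check of this point: without activations, two stacked layers collapse into a single linear map, so the extra layer adds nothing (the shapes and random values below are illustrative).

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(4)        # one input sample
W1 = np.random.randn(4, 3)    # layer-1 weights
W2 = np.random.randn(3, 2)    # layer-2 weights

# two linear layers without activations ...
y = (x @ W1) @ W2

# ... are equivalent to one linear layer with weights W1 @ W2
y_single = x @ (W1 @ W2)
print(np.allclose(y, y_single))   # True
```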


Leaky ReLU is a variant of the ReLU function that solves the dying ReLU problem. Instead of converting every negative input to zero, it has a small slope for a negative value — location: 670 ^ref-40475
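A minimal sketch of Leaky ReLU; alpha = 0.01 is a common default rather than anything prescribed by the book.

```python
import numpy as np

def leaky_relu(x, alpha=0.01):
    # positive inputs pass through unchanged;
    # negative inputs keep a small slope alpha instead of being zeroed out
    return np.where(x > 0, x, alpha * x)
```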


We can also set the value of alpha to some random value, and it is called the Randomized ReLU function. — location: 678 ^ref-24689

We can also set the value of alpha to some random value, and this is called the Randomized ReLU function.
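A simplified sketch of that idea; in practice the sampling range, and whether alpha is drawn per call or per element, varies, so treat this as illustrative.

```python
import numpy as np

def randomized_relu(x, low=0.1, high=0.3):
    # like Leaky ReLU, but the negative-side slope alpha is drawn at random
    alpha = np.random.uniform(low, high)
    return np.where(x > 0, x, alpha * x)
```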


Swish is a non-monotonic function, which means it is neither always non-increasing nor non-decreasing. It provides better performance than ReLU. It is simple and can be expressed as follows: Here, σ is the sigmoid function. The Swish function is shown in the following diagram: — location: 689 ^ref-52610

Check the function images in the Kindle book
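The equation itself did not survive the highlight; the standard Swish definition is f(x) = x · σ(βx), which reduces to x · sigmoid(x) when β = 1. A sketch under that assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x, beta=1.0):
    # f(x) = x * sigmoid(beta * x); non-monotonic because of the
    # small dip it takes for moderately negative inputs
    return x * sigmoid(beta * x)
```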


The softmax function is basically the generalization of the sigmoid function. It is usually applied to the final layer of the network while performing multi-class classification tasks. It gives the probability of each class being the output and thus, the sum of softmax values will always equal 1. — location: 700 ^ref-37759
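A minimal softmax sketch; subtracting the maximum is a standard numerical-stability trick, an implementation detail not mentioned in the highlight.

```python
import numpy as np

def softmax(z):
    # subtracting the max does not change the result but avoids overflow
    e = np.exp(z - np.max(z))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())   # class probabilities that always sum to 1
```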


The dimensions of the weight matrix must be number of neurons in the current layer x number of neurons in the next layer. Why is that? Because it is a basic matrix multiplication rule. To multiply any two matrices, AB, the number of columns in matrix A must be equal to the number of rows in matrix B. So, the dimension of the input-to-hidden weight matrix should be number of neurons in the input layer x number of neurons in the hidden layer, — location: 720 ^ref-8322
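A quick shape check of that rule; the layer sizes (3 inputs, 4 hidden units) and the name W_xh for the input-to-hidden weights are just illustrative.

```python
import numpy as np

n_input, n_hidden = 3, 4
x = np.random.randn(1, n_input)            # one sample with 3 features
W_xh = np.random.randn(n_input, n_hidden)  # (neurons in input layer) x (neurons in hidden layer)
b_h = np.zeros((1, n_hidden))

h = x @ W_xh + b_h
print(h.shape)   # (1, 4): works because columns of x equal rows of W_xh
```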


Gradient descent is a first-order optimization algorithm, which means we only take into account the first derivative when performing the updates: — location: 778 ^ref-53691
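A minimal sketch of the first-order update rule theta = theta - lr * gradient, applied to a toy quadratic loss; the loss function and learning rate are illustrative.

```python
def gradient(theta):
    # derivative of the toy loss J(theta) = (theta - 3)^2
    return 2 * (theta - 3)

theta, lr = 0.0, 0.1
for _ in range(100):
    # first-order update: only the first derivative is used
    theta = theta - lr * gradient(theta)
print(theta)   # converges close to the minimum at 3
```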


This whole process of backpropagating the network from the output layer to the input layer and updating the weights of the network using gradient descent to minimize the loss is called backpropagation. — location: 788 ^ref-28594
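A compact sketch of one forward pass and one backward pass for a single-hidden-layer sigmoid network with a squared-error loss; the network size, loss, and learning rate are assumptions for illustration, not the book's exact example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
X = np.random.randn(5, 3)    # 5 samples, 3 features
y = np.random.rand(5, 1)     # targets in (0, 1)

W1, W2 = np.random.randn(3, 4), np.random.randn(4, 1)
lr = 0.1

for _ in range(1000):
    # forward pass: input layer -> hidden layer -> output layer
    h = sigmoid(X @ W1)
    y_hat = sigmoid(h @ W2)

    # backward pass: propagate the error from the output layer back
    d_out = (y_hat - y) * y_hat * (1 - y_hat)     # error at the output pre-activation
    d_hidden = (d_out @ W2.T) * h * (1 - h)       # error at the hidden pre-activation

    # gradient descent updates to minimize the loss
    W2 -= lr * h.T @ d_out / len(X)
    W1 -= lr * X.T @ d_hidden / len(X)
```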


Before moving on, let's familiarize ourselves with some of the frequently used terminologies in neural networks:

Forward pass: Forward pass implies forward propagating from the input layer to the output layer.

Backward pass: Backward pass implies backpropagating from the output layer to the input layer.

Epoch: The epoch specifies the number of times the neural network sees our whole training data. So, we can say one epoch is equal to one forward pass and one backward pass for all training samples.

Batch size: The batch size specifies the number of training samples we use in one forward pass and one backward pass.

Number of iterations: The number of iterations implies the number of passes, where one pass = one forward pass + one backward pass. — location: 844 ^ref-61469
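A quick arithmetic example of how these terms relate; the numbers are illustrative.

```python
n_samples = 1000
batch_size = 10

iterations_per_epoch = n_samples // batch_size     # 100 passes (forward + backward) per epoch
epochs = 5
total_iterations = epochs * iterations_per_epoch   # 500 iterations in total
print(iterations_per_epoch, total_iterations)
```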


in order to compute anything on TensorFlow, we need to create a TensorFlow session. — location: 1031 ^ref-23301
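This describes TensorFlow 1.x; a minimal sketch of defining a graph and running it inside a session.

```python
import tensorflow as tf   # TensorFlow 1.x style

a = tf.constant(2)
b = tf.constant(3)
c = tf.add(a, b)          # only defines the op; nothing is computed yet

with tf.Session() as sess:
    print(sess.run(c))    # the session actually executes the graph -> 5
```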


Unlike tf.Variable(), we cannot pass the value directly to tf.get_variable(); instead, we use initializer. — location: 1072 ^ref-38744
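A short TensorFlow 1.x sketch of the difference; variable names and shapes are illustrative.

```python
import tensorflow as tf   # TensorFlow 1.x

# tf.Variable(): the value is passed in directly
v = tf.Variable(tf.random_normal([2, 3]), name='v')

# tf.get_variable(): the value is provided through an initializer instead
w = tf.get_variable('w', shape=[2, 3],
                    initializer=tf.random_normal_initializer())
```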


Variables created using tf.Variable() cannot be shared, and every time we call tf.Variable(), it will create a new variable. But tf.get_variable() checks the computational graph for an existing variable with the specified parameter. If the variable already exists, then it will be reused; otherwise, a new variable will be created: — location: 1077 ^ref-8920
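A hedged TensorFlow 1.x sketch of sharing through a variable scope; the scope and variable names are illustrative.

```python
import tensorflow as tf   # TensorFlow 1.x

with tf.variable_scope('layer', reuse=tf.AUTO_REUSE):
    w1 = tf.get_variable('weights', shape=[3, 3],
                         initializer=tf.zeros_initializer())
    w2 = tf.get_variable('weights', shape=[3, 3])   # same name -> reused, not recreated

print(w1 is w2)   # True: both handles point to the same variable
```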


In order to visualize in TensorBoard, we first need to save our event files. It can be done using tf.summary.FileWriter(). It takes two important parameters, logdir and graph. — location: 1161 ^ref-4102
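A minimal TensorFlow 1.x sketch; the ./graphs log directory matches the tensorboard command quoted below and is otherwise arbitrary.

```python
import tensorflow as tf   # TensorFlow 1.x

a = tf.constant(2, name='a')
b = tf.constant(3, name='b')
c = tf.add(a, b, name='sum')

with tf.Session() as sess:
    # write the event file for TensorBoard, including the graph
    writer = tf.summary.FileWriter(logdir='./graphs', graph=sess.graph)
    print(sess.run(c))
    writer.close()
```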


tensorboard --logdir=graphs --port=8000 — location: 1172 ^ref-25777


model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy']) — location: 1528 ^ref-8273


model.fit(x=data, y=labels, epochs=100, batch_size=10) — location: 1535 ^ref-8981


model.evaluate(x=data_test,y=labels_test) — location: 1540 ^ref-60376


model.evaluate(x=data,y=labels) — location: 1542 ^ref-55446
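The compile/fit/evaluate highlights above belong to one workflow. A self-contained sketch tying them together with random data; the architecture, data shapes, and the plain `keras` import (swap in `tensorflow.keras` if preferred) are assumptions, not the book's exact example.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# toy binary-classification data (shapes are illustrative)
data = np.random.random((1000, 10))
labels = np.random.randint(2, size=(1000, 1))
data_test = np.random.random((200, 10))
labels_test = np.random.randint(2, size=(200, 1))

model = Sequential()
model.add(Dense(16, activation='relu', input_dim=10))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])
model.fit(x=data, y=labels, epochs=100, batch_size=10)
model.evaluate(x=data_test, y=labels_test)   # performance on held-out data
model.evaluate(x=data, y=labels)             # performance on the training data itself
```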



Last update: September 21, 2023
Created: September 21, 2023