Optimization
When we do not have a closed-form solution, we can use optimization to estimate the parameters. To set up an optimization, you need a loss function and a procedure that minimizes the value of the loss function with respect to the parameter values.
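For instance, a loss function for estimating a single mean parameter mu could be the sum of squared errors. The short sketch below is only an illustration (the data values are made up); the remainder of the page works with a polynomial f(x) instead.

import numpy as np
# Illustrative data (made up for this example)
data = np.array([1.2, 0.8, 1.5, 1.1])
def loss(mu):
    # sum of squared deviations between the data and the parameter mu
    return np.sum((data - mu) ** 2)
# the sample mean minimizes this loss
print(loss(data.mean()), loss(0.0))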
Contents
- Newton-Raphson method
- Gradient Descent
- License
Newton-Raphson method
The Newton-Raphson algorithm is one of the oldest iterative algorithms for approximately finding the roots of a real-valued function (here, the loss function); that is, it solves f(x) = 0. The idea behind it is a simple linear approximation. Given a starting point x_0, we repeatedly solve for the root of the tangent line at the current point,

x_{n+1} = x_n - f(x_n) / f'(x_n),

and with each step we get closer to the solution.
The simple and plain version of the Newton-Raphson algorithm is given below.
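A minimal sketch of this plain version, assuming a function f with derivative df (both are defined just below), a starting point x_s, and a fixed iteration count max_ite:

x = x_s                   # starting point
for n in range(max_ite):  # run a fixed number of iterations
    x = x - f(x) / df(x)  # Newton-Raphson update
print(x)                  # approximate root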
Since we do not know in advance how many iterations are needed, it is better to replace the for loop with a while loop that stops once the updates become small enough. Below, we present the algorithm in Python.
To write the algorithm in code, we consider the function f(x) = x^4 - 3x^3 + 2, which is also used in the example on Wikipedia.
Define the function and its derivative:
def f(x):
    return x**4 - 3 * x**3 + 2

def df(x):
    # derivative of f: f'(x) = 4x^3 - 9x^2
    return 4 * x**3 - 9 * x**2
To see the minimum of the function, plot it:
from matplotlib import pyplot as plt
import numpy as np
x_ran_0 = np.linspace(-7, 7, 100)
plt.plot(x_ran_0, f(x_ran_0))
plt.xlabel('x')
plt.ylabel('f(x)')
plt.show(block=False)
To see how the algorithm works, we can also plot the tangent line at each iterate while the iterations are running:
import matplotlib as mpl
x_s = -4                 # starting point
cur_x = x_s
diff_cur_prev = 1
precision = 0.001
max_ite = 1000           # maximum number of iterations
# Choose a different color for each iteration's tangent line
colors = mpl.cm.autumn(np.linspace(0, 1, max_ite))
plt.plot(x_ran_0, f(x_ran_0))   # plot f(x) again
i = 1
while diff_cur_prev > precision and i < max_ite:
    # Plot the current point
    plt.scatter(cur_x, f(cur_x), color='black', s=10, zorder=2)
    plt.scatter(cur_x, f(cur_x), color='white', s=5, zorder=2)
    # Plot the tangent line at the current point
    x_ran_1 = np.linspace(cur_x - 5, cur_x + 5, 10)
    plt.plot(x_ran_1, df(cur_x) * (x_ran_1 - cur_x) + f(cur_x), color=colors[i], zorder=1)
    # Newton-Raphson update
    prev_x = cur_x
    cur_x += -f(prev_x) / df(prev_x)
    diff_cur_prev = abs(cur_x - prev_x)
    print("Run: %d, Current: %f, Previous: %f, Difference: %f" % (i, cur_x, prev_x, diff_cur_prev))
    i += 1
plt.xlim([-10, 10])
plt.xlabel('x')
plt.ylabel('f(x)')
plt.show(block=False)
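If SciPy is available, the root found by the loop can be cross-checked against SciPy's implementation of Newton's method; a minimal sketch, reusing f, df, and the same starting point:

from scipy import optimize
# scipy.optimize.newton runs the same Newton-Raphson iteration
root = optimize.newton(f, x0=-4, fprime=df)
print(root, f(root))  # f(root) should be close to zero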
Gradient Descent
Gradient Descent is a variant of Newton-Raphson that can be used to find a local minimum of a differentiable function. The local minimum can be obtained by iterating

x_{n+1} = x_n - gamma * f'(x_n).

We often use theta instead of x:

theta_{n+1} = theta_n - gamma * L'(theta_n),

where L'(theta_n) is the derivative of the loss function at the given parameter values, and gamma is called the step size or learning rate, see Wikipedia. To avoid taking steps that are too big, we use a small gamma. Obviously, the update criterion is different from that of the Newton-Raphson algorithm, which divides the function value by its derivative instead of scaling the derivative by a fixed gamma.
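As a quick worked step with the function f used above, a starting point x_0 = -4, and gamma = 0.001 (the same values used in the script below): f'(-4) = 4(-4)^3 - 9(-4)^2 = -256 - 144 = -400, so the first update is x_1 = -4 - 0.001 * (-400) = -3.6.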
To write a simple piece of code that finds the minimum using Gradient Descent, consider the same function f(x) = x^4 - 3x^3 + 2 that was already used to explain Newton-Raphson. The plot of the gradient steps can be obtained by running the following script:
import matplotlib as mpl
x_s = -4                 # starting point
cur_x = x_s
diff_cur_prev = 1
gamma = 0.001            # step size (learning rate)
precision = 0.001
max_ite = 1000           # maximum number of iterations
# Choose a different color for each iteration's tangent line
colors = mpl.cm.autumn(np.linspace(0, 1, max_ite))
plt.plot(x_ran_0, f(x_ran_0))   # plot f(x) again
i = 1
while diff_cur_prev > precision and i < max_ite:
    # Plot the current point
    plt.scatter(cur_x, f(cur_x), color='black', s=10, zorder=2)
    plt.scatter(cur_x, f(cur_x), color='white', s=5, zorder=2)
    # Plot the tangent line at the current point
    x_ran_1 = np.linspace(cur_x - 5, cur_x + 5, 10)
    plt.plot(x_ran_1, df(cur_x) * (x_ran_1 - cur_x) + f(cur_x), color=colors[i], zorder=1)
    # Gradient Descent update
    prev_x = cur_x
    cur_x += -gamma * df(prev_x)
    diff_cur_prev = abs(cur_x - prev_x)
    print("Run: %d, Current: %f, Previous: %f, Difference: %f" % (i, cur_x, prev_x, diff_cur_prev))
    i += 1
plt.xlim([-10, 10])
plt.xlabel('x')
plt.ylabel('f(x)')
plt.show(block=False)
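For this f, the derivative f'(x) = 4x^3 - 9x^2 = x^2(4x - 9) vanishes at x = 0 and x = 9/4, and the minimum is at x = 9/4 = 2.25. Note that with a small gamma and this stopping rule, the iterates may stop in the flat region around x = 0, where f'(x) is close to zero, before reaching the minimum; tightening precision or adjusting gamma changes this. If SciPy is available, the minimizer can also be cross-checked with a minimal sketch:

from scipy import optimize
# Cross-check the minimizer of f; the analytic answer is x = 9/4
res = optimize.minimize_scalar(f)
print(res.x, f(res.x))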
License
Copyright (c) 2019 Saeid Amiri