Mastering Step Size and Learning Rate in Optimization

Understanding Step Size and Learning Rate

In the realm of Machine Learning optimization algorithms, the step size, commonly referred to as the learning rate, plays a pivotal role. It dictates how much the model parameters are adjusted in each iteration of the optimization process.

What are the implications of selecting an inappropriate step size or learning rate?

→ A step size that is too small can lead to slow convergence, requiring numerous iterations to approach the minimum of the cost function.

→ Conversely, a step size that is excessively large may cause the algorithm to overshoot the minimum or oscillate without achieving convergence.
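To make this trade-off concrete, here is a minimal sketch (an assumed example, not part of the original discussion) that runs plain gradient descent on the toy function f(x) = x**2 with two fixed learning rates:

# Illustrative sketch: gradient descent on f(x) = x**2, whose derivative is 2*x
def gradient_descent(learning_rate, steps=20, x=5.0):
    for _ in range(steps):
        x = x - learning_rate * 2 * x   # move against the gradient
    return x

print(gradient_descent(0.01))  # too small: after 20 steps x is still far from the minimum at 0
print(gradient_descent(1.1))   # too large: each update overshoots and the iterates diverge

With the small rate the iterate only creeps toward 0, while the large rate makes every update overshoot so badly that the distance from the minimum grows instead of shrinking.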

Line Search Algorithm: An Essential Tool

The Line Search algorithm is crucial in Machine Learning optimization. It assists in identifying the optimal step size for adjusting model parameters during each iteration of the optimization process. This method can be utilized for both univariate and multivariate optimization tasks.

To implement the Line Search algorithm, a starting point and a search direction are necessary. Throughout the optimization journey, this algorithm automatically chooses a scaling factor, denoted as alpha, to establish the step size in the specified direction. Typically, the search direction is opposite to the gradient of the objective function, which indicates the steepest descent.

The first video discusses line search methods for unconstrained optimization, providing insights into their implementation and effectiveness.

Defining the Search Direction

The alpha value serves as a scaling factor for the search direction, constrained between 0 and 1. During the line search, the objective is to minimize the function by adjusting the current position with a scaled direction:

minimize f(x + alpha * direction)
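As a quick illustration of this one-dimensional subproblem, the sketch below (an assumed toy example, not the function used later in the article) minimizes over alpha directly with scipy.optimize.minimize_scalar:

from scipy.optimize import minimize_scalar

g = lambda x: (x - 0.3) ** 2          # assumed toy objective
dg = lambda x: 2 * (x - 0.3)          # its derivative
x0 = 0.0
direction = -dg(x0)                   # steepest-descent direction, here 0.6

# Minimize g(x0 + alpha * direction) over alpha in [0, 1]
result = minimize_scalar(lambda alpha: g(x0 + alpha * direction),
                         bounds=(0.0, 1.0), method='bounded')
print(result.x)                       # about 0.5 for this toy setup

A line search algorithm solves essentially this problem at every iteration, but only approximately, using conditions such as the ones described below.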

Implementing the Line Search Algorithm

To effectively execute the line search algorithm, knowledge of the first derivative of the objective function is essential. Additionally, a starting point and a defined search range are required. It is feasible to conduct multiple searches with varying directions (both positive and negative with different magnitudes) to identify the optimal solution.

The line search function takes eight inputs:

  1. Objective function
  2. Derivative of the objective function
  3. Initial evaluation point
  4. A direction (often the negative of the derivative)
  5. Initial step size (alpha) with a default value of 1
  6. Shrinkage factor (rho) with a default value of 0.5
  7. A small positive constant to validate the Armijo condition (more on this later)
  8. Maximum number of iterations, with a default value of 100

def line_search_function(f, df, x, d, alpha_init=1.0, rho=0.5, c=1e-4, max_iter=100):

Next, several variables must be initialized:

  1. Set alpha as the initial step size
  2. Define phi0 as the objective function value at the initial evaluation point x
  3. Calculate dphi0 as the directional derivative of the objective function f at the evaluation point x along the direction d
  4. Initialize i as a counter variable to track iterations

alpha = alpha_init

phi0 = f(x)

dphi0 = np.dot(df(x), d)

i = 0

The algorithm enters a loop that continues until the maximum number of iterations is reached or a suitable step size is found:

while i < max_iter:

The next step involves calculating the value of the objective function at the point x + alpha * d:

phi = f(x + alpha * d)

Checking the Armijo Condition

The Armijo condition, also known as the first Wolfe condition, ensures that the chosen step size sufficiently reduces the objective function. It requires that the decrease in the objective function be at least a specified proportion of the decrease predicted by the first-order Taylor approximation at the current point. This can be expressed as:

f(x + alpha * d) ≤ f(x) + c * alpha * dphi0

Here, c is a constant between 0 and 1, dphi0 is the directional derivative at the current point along d, and alpha is the step size.
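Before looking at the full implementation, here is a minimal backtracking sketch (an assumed simplification, not the article's final function) that relies on the Armijo condition alone: alpha is repeatedly shrunk by rho until the sufficient decrease test passes.

import numpy as np

def backtracking(f, df, x, d, alpha=1.0, rho=0.5, c=1e-4, max_iter=50):
    dphi0 = np.dot(df(x), d)                       # directional derivative at x
    for _ in range(max_iter):
        if f(x + alpha * d) <= f(x) + c * alpha * dphi0:
            return alpha                           # sufficient decrease achieved
        alpha *= rho                               # otherwise shrink the step
    return alpha

The full line search below builds on this same test and adds the curvature check and the zoom refinement.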

To check for the Armijo condition:

if phi > phi0 + c * alpha * dphi0 or (phi >= f(x + (alpha / 2) * d) and i > 0):

If not satisfied, we call another function, the zoom function, to refine the step size:

return zoom(f, df, x, d, phi0, dphi0, alpha, alpha / 2, rho, c)

The second video explores the fundamental concepts of line search, offering practical examples of its application in optimization.

Curvature Condition: Wolfe Second Condition

This condition guards against step sizes that are excessively large. In its strong form, it requires that the magnitude of the directional derivative at the new point be no larger than a chosen fraction of its magnitude at the current point. In the implementation it is verified as:

abs(df(x + alpha * d).dot(d)) <= -c * dphi0

Here, c is a constant between 0 and 1 (this implementation reuses the same c as in the Armijo check; in practice a separate, larger curvature constant is often used). Since dphi0 is negative for a descent direction, -c * dphi0 is a positive bound.

The directional derivative dphi at the new point is computed first; if it satisfies the curvature condition, the current alpha is returned. If instead dphi is non-negative, the step has gone past the minimum, and the zoom function is invoked again to refine the step size (otherwise alpha is shrunk by rho and the loop continues):

dphi = np.dot(df(x + alpha * d), d)

if dphi >= 0:
    return zoom(f, df, x, d, phi0, dphi0, alpha / 2.0, alpha, rho, c)

Zoom Function: Refining the Step Size

The zoom function is activated when the Armijo or curvature conditions are unmet. It takes in eleven arguments:

  1. Objective function f
  2. Derivative of the objective function df
  3. Current point x
  4. Search direction d
  5. Objective function value at the current point, phi0
  6. Directional derivative at the current point, dphi0
  7. First candidate step size, alpha_lo
  8. Second candidate step size, alpha_hi
  9. Shrinkage factor rho
  10. Small positive constant c
  11. Maximum allowed iterations max_iter, with a default value of 100

Initialize a counter i to zero and run a loop for at most max_iter iterations:

i = 0

while i < max_iter:

Calculate the midpoint between the two candidate step sizes:

alpha = (alpha_lo + alpha_hi) / 2.0

Next, compute the objective function value at the point x + alpha * d:

phi = f(x + alpha * d)

Adjust the bounds on the step size based on the sufficient decrease condition. If the new value fails the Armijo test, or is not lower than the value at alpha_lo, set alpha_hi to alpha:

if phi > phi0 + c * alpha * dphi0 or phi >= f(x + alpha_lo * d):

alpha_hi = alpha

else:

Compute the directional derivative at x + alpha * d. If it meets the curvature condition, return the value of alpha:

dphi = np.dot(df(x + alpha * d), d)

if abs(dphi) <= -c * dphi0:

return alpha

Otherwise, update the bounds on the step size:

if dphi * (alpha_hi - alpha_lo) >= 0:

alpha_hi = alpha_lo

alpha_lo = alpha

Increment the counter i and repeat the loop while i < max_iter:

i += 1

If the loop concludes without returning a value, return alpha, which is the current best estimate for the optimal step size.

Bringing It All Together

import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import line_search

def line_search_function(f, df, x, d, alpha_init=1.0, rho=0.5, c=1e-4, max_iter=100):
    alpha = alpha_init
    phi0 = f(x)
    dphi0 = np.dot(df(x), d)
    i = 0
    while i < max_iter:
        phi = f(x + alpha * d)
        # Sufficient decrease (Armijo) check
        if phi > phi0 + c * alpha * dphi0 or (phi >= f(x + (alpha / 2) * d) and i > 0):
            return zoom(f, df, x, d, phi0, dphi0, alpha, alpha / 2.0, rho, c)
        # Curvature (strong Wolfe) check
        dphi = np.dot(df(x + alpha * d), d)
        if abs(dphi) <= -c * dphi0:
            return alpha
        if dphi >= 0:
            return zoom(f, df, x, d, phi0, dphi0, alpha / 2.0, alpha, rho, c)
        # Shrink the step size and try again
        alpha = alpha * rho
        i += 1
    return alpha

def zoom(f, df, x, d, phi0, dphi0, alpha_lo, alpha_hi, rho, c, max_iter=100):
    i = 0
    while i < max_iter:
        # Bisect the interval between the two candidate step sizes
        alpha = (alpha_lo + alpha_hi) / 2.0
        phi = f(x + alpha * d)
        if phi > phi0 + c * alpha * dphi0 or phi >= f(x + alpha_lo * d):
            alpha_hi = alpha
        else:
            dphi = np.dot(df(x + alpha * d), d)
            if abs(dphi) <= -c * dphi0:
                return alpha
            if dphi * (alpha_hi - alpha_lo) >= 0:
                alpha_hi = alpha_lo
            alpha_lo = alpha
        i += 1
    return alpha

# Define the objective function and its derivative
def f(x):
    return x**2 + 2*x + 1

def df(x):
    return 2*x + 2

# Define the range of x values to plot
x = np.linspace(-5, 5, 100)

# Plot the objective function in blue
plt.plot(x, f(x), 'b', label='f(x)')

# Plot the derivative in red
plt.plot(x, df(x), 'r', label="f'(x)")

# Add axis labels and a legend
plt.xlabel('x')
plt.ylabel('y')
plt.legend()

# Show the plot
plt.show()

# Define the initial point and direction
x = 10
d = -df(x)

# Call the custom line search function and print the step size it finds
print(line_search_function(f, df, x, d))

# Find the step size with SciPy's built-in line_search for comparison
alpha = line_search(f, df, x, d)

# Print the step size
print(alpha[0])
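As an assumed follow-up (not in the original code), the step size returned by line_search_function can drive a simple steepest-descent loop; for f(x) = x**2 + 2*x + 1 the iterates should settle near the minimum at x = -1:

# Hypothetical usage sketch: repeat "pick a direction, line-search a step, move"
x_k = 10.0
for _ in range(20):
    d_k = -df(x_k)                                   # steepest-descent direction
    alpha_k = line_search_function(f, df, x_k, d_k)  # step size from the custom search
    x_k = x_k + alpha_k * d_k
print(x_k)                                           # should be close to -1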

Conclusion

Thank you for taking the time to read this article! If you have any suggestions for additional content, please let me know. Don't forget to subscribe to receive updates on my future publications. If you enjoyed this piece, consider following me for further insights. For those looking to delve deeper into this topic, my book "Data-Driven Decisions: A Practical Introduction to Machine Learning" is available at an affordable price, providing a comprehensive starting point in Machine Learning. Thank you once again!
