Why Simulated Annealing works
Of all optimization methods, Simulated Annealing is one of the most fascinating. If you need a quick refresher on Simulated Annealing, see this slide. Is Simulated Annealing better than other techniques at finding the global optimum? Perhaps. Below I discuss why I consider it one of the best optimization techniques.
Simulated Annealing is another example of interdisciplinary research, about which I wrote two weeks ago. Annealing is a metallurgical process in which heating a metal beyond a critical temperature and then cooling it slowly gives it unique properties (related to lower energy states). Simulated Annealing is, as the name suggests, a simulation of the annealing process, and the algorithm stays very close to the physical one. In fact, the acceptance probability it uses follows the same distribution that governs the movement of molecules during annealing: the Boltzmann distribution.
P(E) = e^{-E/kT}

where,
P(E) = probability of the system being in a state with energy E (Boltzmann distribution)
E = energy of the state
T = temperature of the system
k = Boltzmann constant (1.380 × 10⁻²³ J/K)
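To make the role of T concrete, here is a minimal sketch in Python of the Boltzmann factor; as is common in optimization, k is folded into the temperature (k = 1), and the function name is mine:

```python
import math

def boltzmann_factor(delta_e, temperature, k=1.0):
    """Relative probability of a state delta_e above the current energy
    at the given temperature (Boltzmann distribution). In optimization,
    k is usually folded into the temperature, hence k=1 by default."""
    return math.exp(-delta_e / (k * temperature))

# The same energy gap is far more acceptable when the system is hot:
print(boltzmann_factor(delta_e=1.0, temperature=10.0))  # ~0.905
print(boltzmann_factor(delta_e=1.0, temperature=0.1))   # ~0.000045
```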
The function above gives the probability of a molecule occupying a state of a given energy at a given temperature; in Simulated Annealing, this translates to movement across the search space. At a high temperature T there is little restriction on the search, and the algorithm is free to move almost anywhere. Certainly this means some inefficient movement across the solution space, and perhaps in the wrong direction. As counter-intuitive as it may sound, **it is this erratic behaviour that gives Simulated Annealing the power to find the global optimum** and not get stuck at a local optimum.
Since following the local gradient alone would lead the heuristic into a local optimum, a non-negligible probability of stepping against the descent direction pushes the search agent out of that locality. This ensures the search goes beyond the nearest slope. One caveat: it also means that sometimes, even after the agent has found the 'best solution', it will move away from it. This is in the nature of all stochastic algorithms, though in practice it is easy to sidestep by remembering the best solution seen so far.
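Concretely, the escape mechanism is the Metropolis acceptance rule: improvements are always taken, while worse moves are accepted with the Boltzmann probability. A minimal sketch (the function name is mine):

```python
import math
import random

def accept_move(current_energy, candidate_energy, temperature):
    """Metropolis criterion: always accept improvements; accept a worse
    candidate with probability exp(-delta_e / T), which shrinks as T drops."""
    delta_e = candidate_energy - current_energy
    if delta_e <= 0:
        return True  # downhill move: always take it
    return random.random() < math.exp(-delta_e / temperature)
```

Because uphill moves become exponentially rarer as T falls, the erratic walk at high temperature smoothly degrades into pure descent at low temperature.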
Now, if there were no way to eventually stop this random walk, the method would never settle on a solution and the heuristic would be equivalent to random search. But, as the energy function above shows, when the temperature T gradually decreases, movement across the solution space becomes constrained, and the algorithm gradually starts behaving like gradient descent. Below a critical temperature (T_low), it zeroes in on the slope it has currently found.
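Putting the pieces together, here is a sketch of the full loop with a geometric cooling schedule that stops below T_low. All names and defaults (t_start, alpha, steps_per_t) are illustrative choices, not canonical values:

```python
import math
import random

def anneal(energy, neighbor, x0, t_start=10.0, t_low=1e-3, alpha=0.95,
           steps_per_t=100):
    """Sketch of simulated annealing with geometric cooling.
    `energy` maps a state to its cost; `neighbor` proposes a random move."""
    x, e = x0, energy(x0)
    best_x, best_e = x, e          # remember the best state ever visited
    t = t_start
    while t > t_low:               # stop below the critical temperature
        for _ in range(steps_per_t):
            cand = neighbor(x)
            cand_e = energy(cand)
            delta = cand_e - e
            # Metropolis criterion: uphill moves get rarer as t falls
            if delta <= 0 or random.random() < math.exp(-delta / t):
                x, e = cand, cand_e
                if e < best_e:
                    best_x, best_e = x, e
        t *= alpha                 # geometric cooling
    return best_x, best_e
```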
Consider a wild case of a solution space where the basin of the global optimum is very narrow (and hence has a low probability of being hit by an agent sampling uniformly) while a local optimum has a broad basin. Any search agent then has a higher probability of getting stuck at the local optimum, e.g., as in the figure below:
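A small numerical illustration of such a landscape, reusing the `anneal` sketch above (the function shape and all constants are illustrative):

```python
import math
import random

# A deceptive 1-D landscape: a broad local minimum near x = 0 and a
# much narrower global minimum near x = 5.
def energy(x):
    return -1.0 * math.exp(-x**2 / 8.0) - 2.0 * math.exp(-(x - 5.0)**2 / 0.5)

def neighbor(x):
    return x + random.gauss(0.0, 1.0)   # random local perturbation

best_x, best_e = anneal(energy, neighbor, x0=0.0)
print(f"best x = {best_x:.3f}, energy = {best_e:.3f}")
# A pure descent from x = 0 stalls in the broad basin (energy ~ -1);
# annealing usually escapes to the narrow well near x = 5 (energy ~ -2).
```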
It is for these kinds of search spaces that Simulated Annealing works and performs much better than many other optimization techniques. Certainly it is not the best method for every solution space; some require more foresighted heuristics. Nevertheless, thanks to this unique property, Simulated Annealing is not going to become obsolete anytime soon.
More detailed presentation of Simulated Annealing