This article aims to be a follow-up of my introduction into methods of non-linear optimization. The recent was giving an overview about heuristic methods which you can read here
Optimizing like Bees and Birds … and now with Golang
Machine Learning … probably is the biggest tech and research trend of this century. This article is not about Machine…
Now we will turn ourselves towards very often used standard algorithms which you can find in action in many applications … especially in our so much loved subject ‘machine learning’.
Let’s start easy:
The main task of line search is this:
In words, we are given a function f which is continuous and maps an interval of the real-line into the real-line. We are searching for a point within this interval which locally minimizes f.
The main idea is to construct a sequence of intervals such that each has the property that f takes a local minimum inside. Moreover the intervals are constructed to narrow with each iteration. After sufficiently enough steps, the intervals’ width has decreased so much that one of its endpoints can be considered as approximation for a local minimum.
It needs a little bit of thinking in order to construct such a sequence by only having to pick one more point in each iteration. Let us look into the details:
This can be illustrated by:
By comparing the values of f at c and d we can make conclusions like:
From this statement we know what we have to do next: Narrowing the interval in accordance to above outcome. That is, if a local minimum is ensured in [a,d] then this becomes our next considered interval, otherwise [c,b].
First let us assume [a,d] would be the chosen interval. But how to proceed now? This interval has a, d as its ends and contains one intermediate value, that is c. In order to repeat above argument, we would have to chose a fourth point. Actually we could do this arbitrarily, but in order to keep the algorithm as unbiased as possible, we aim to choose this point exactly the same way we have chosen c and d previously.
So our requirements to this new point are: It lies in with a distance of q(d-a) at the right of a. This is easily achieved. On the other hand we also want c to fulfill the corresponding relation w.r.t the new interval, that is: c is lying with a distance of q(d-a) at the left of d.
Of course, the later is not automatically fulfilled. We have to chose q in a suitable manner. We can formulate above requirement:
If [c,b] had been the chosen interval we would have d as intermediate point. This times we requires d to be the point next to the lower end of the new interval. Again, we can pose an equation for q, and find that it is the same equation as in previous case (details are in analogy with the above). Therefore, we can say, the solution of q works for both scenarios.
This all gives us the full algorithm at hand. Remember, all what we have done in the first step, we can, by using this specific q, repeat in the next step.
At the very beginning we define our q and initialize the c and d for the first step. See, how this exactly coincides with our above definitions.
In each step we construct a new smaller interval but always remain our notations in a sense that: a denotes the lower end, b the upper end, c the one at right of a and d the one at left of b.
This means b-a is always the current interval’s width, and so we can check if our accuracy (acc) has been reached.
In each step we compare the value of the inner points c and d. The case f(c)<f(d) corresponds to the case which we have elaborated in detail and you easily verify how the new interval’s points are determined from the old ones. The case f(c) ≥ f(d) is covered in analogy within the else-block.
You can checkout and play with the code by using this link to a go-playground: https://play.golang.org/p/JhGuQBAgkov
By construction the initial interval is shrunk by the factor (1-q). This means after n steps our accuracy reaches (1-q)^n*(b-a). For a function which contains exactly one local minimum inside [a,b], the procedure will converge against this minimum. This directly follows from each interval ensuring to contain a local minimum, which in this case can only be the unique one. In case a function contain several local minima, it converges against any of them.
One interesting note is that we don’t necessarily have a sequence of values converging against a local minimum, but rather a sequence of intervals each enclosing more and more narrow local minima.
In production-ready implementations there are often used improved versions of line search which guaranty a much higher rate of convergence.