Pseudo-loss function in distributed AdaBoost
This is a sketch of the proof of correctness of the pseudo-loss function in the case of a distributed boosting algorithm. It is well known that boosting weak learners, each with accuracy above 0.5, is sufficient to guarantee a lower bound on the accuracy of the iteratively pooled ensemble. A similar sketch is presented here for the distributed case, where the score from each stage is not shared across all the computing nodes. While this guarantees correctness, it does not guarantee convergence, which remains a highly sought-after problem in distributed algorithms.
For readers unfamiliar with the problem, it is highly advisable to read this paper or, for a much simpler and faster read, these slides.
Notations
- \(\mathbf{w}^{t}\) = weight vector at trial \(t\)
- \(T\) = number of iterations (trials)
- \(\beta \in [0,1)\) = parameter
- \(\ell^{t}\) = loss vector for trial \(t\), with each component \(\ell_{i}^{t}\in[0,1]\)
- \(N\) = total number of weak learners
- Strategy \(i\)'s cumulative loss: \(L_{i}=\sum_{t=1}^{T}\ell_{i}^{t}\)
- \(L_{A}\) = algorithm \(A\)'s total cumulative loss \(=\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}\)
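As a small illustration of the last two definitions (a hypothetical toy setup of my own, not taken from the paper), the cumulative losses can be computed directly from the loss vectors and the allocations:

```python
import numpy as np

# Hypothetical toy setup: N = 2 strategies, T = 3 trials.
# Row t holds the loss vector l^t; entries lie in [0, 1].
losses = np.array([[0.2, 0.9],
                   [0.7, 0.1],
                   [0.4, 0.5]])

# Allocations p^t chosen at each trial (each row sums to 1).
allocations = np.array([[0.5, 0.5],
                        [0.6, 0.4],
                        [0.7, 0.3]])

L_i = losses.sum(axis=0)            # strategy-wise cumulative losses, roughly [1.3, 1.5]
L_A = np.sum(allocations * losses)  # algorithm's cumulative loss sum_t p^t . l^t, roughly 1.44
print(L_i, L_A)
```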
Algorithm
Consider the following algorithm:
Hedge(\(\beta\)):

Initialize the weight vector \(\mathbf{w}^{1}\) with \(w_{i}^{1}\geq 0\) and \(\sum_{i=1}^{N}w_{i}^{1}=1\).

Do for t = 1, 2, …, T

- Choose allocation
\[
\mathbf{p}^{t}=\frac{\mathbf{w}^{t}}{\sum_{i=1}^{N}w_{i}^{t}}
\]
- Receive the loss vector \(\ell^{t}\in[0,1]^{N}\) from the environment
- Suffer loss \(\mathbf{p}^{t}\cdot\ell^{t}\)
- Set the new weights
\[
w_{i}^{t+1}=w_{i}^{t}\beta^{\ell_{i}^{t}}
\]
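A minimal sketch of this loop in Python/NumPy (my own illustration, not code from the referenced paper); the function name `hedge` and the uniform initialisation are choices made for the example:

```python
import numpy as np

def hedge(losses, beta):
    """Run Hedge(beta) on a T x N array of losses in [0, 1].

    Returns the algorithm's cumulative loss sum_t p^t . l^t and the
    final unnormalised weight vector w^{T+1}.
    """
    T, N = losses.shape
    w = np.full(N, 1.0 / N)            # w^1: non-negative weights summing to 1
    total_loss = 0.0
    for t in range(T):
        p = w / w.sum()                # choose allocation p^t
        loss_vec = losses[t]           # receive loss vector l^t from the environment
        total_loss += p @ loss_vec     # suffer loss p^t . l^t
        w = w * beta ** loss_vec       # set new weights w_i^{t+1} = w_i^t * beta^{l_i^t}
    return total_loss, w
```

Any non-negative initial weights summing to 1 work; the uniform choice simply treats all weak learners the same at the start.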
To Prove
\[
\ln\left(\sum_{i=1}^{N}w_{i}^{T+1}\right)\leq-(1-\beta)L_{Hedge(\beta)}
\]
Proof
From the Hedge(\(\beta\)) update rule, we know that
\[
\sum_{i=1}^{N}w_{i}^{t+1}=\sum_{i=1}^{N}w_{i}^{t}\beta^{\ell_{i}^{t}}
\]
Since \(\beta\geq 0\) and \(\ell_{i}^{t}\in[0,1]\), by convexity we know that \(\beta^{\ell_{i}^{t}}\leq 1-(1-\beta)\ell_{i}^{t}\); therefore the above equation can be rewritten as
\[
\sum_{i=1}^{N}w_{i}^{t+1}\leq\sum_{i=1}^{N}w_{i}^{t}(1-(1-\beta)\ell_{i}^{t})
\]
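To spell out the convexity step used above (a standard argument, stated here for completeness): for \(\ell\in[0,1]\), write \(\ell\) as the convex combination \(\ell\cdot 1+(1-\ell)\cdot 0\); since \(x\mapsto\beta^{x}\) is convex,
\[
\beta^{\ell}=\beta^{\ell\cdot 1+(1-\ell)\cdot 0}\leq\ell\,\beta^{1}+(1-\ell)\,\beta^{0}=1-(1-\beta)\ell
\]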
Substituting \(p_{i}^{t}=w_{i}^{t}/\sum_{j=1}^{N}w_{j}^{t}\),
\[
\sum_{i=1}^{N}w_{i}^{t+1}\leq\left(\sum_{i=1}^{N}w_{i}^{t}\right)\left(1-(1-\beta)\,\mathbf{p}^{t}\cdot\ell^{t}\right)
\]
Applying this bound repeatedly for \(t=1,\ldots,T\), and using \(\sum_{i=1}^{N}w_{i}^{1}=1\),
\[
\sum_{i=1}^{N}w_{i}^{T+1}\leq\prod_{t=1}^{T}\left(1-(1-\beta)\,\mathbf{p}^{t}\cdot\ell^{t}\right)
\]
Since \(1+x\leq e^{x}\) for all real \(x\),
\[
\sum_{i=1}^{N}w_{i}^{T+1}\leq\exp\left(-(1-\beta)\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}\right)
\]
Taking the natural logarithm of both sides,
\[
\ln\left(\sum_{i=1}^{N}w_{i}^{T+1}\right)\leq-(1-\beta)\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}
\]
which is the inequality to be proved.
Negating both sides and flipping the inequality,
\[
(1-\beta)\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}\leq-\ln\left(\sum_{i=1}^{N}w_{i}^{T+1}\right)
\]
Dividing through by \(1-\beta>0\),
\[
\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}\leq\frac{-\ln\left(\sum_{i=1}^{N}w_{i}^{T+1}\right)}{1-\beta}
\]
Now, from the definition of \(L_{A}\): since the algorithm here is Hedge(\(\beta\)), where \(\beta\) is the parameter, the quantity \(\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}\) is exactly \(L_{Hedge(\beta)}\); therefore,
\[
L_{Hedge(\beta)}\leq\frac{-\ln\left(\sum_{i=1}^{N}w_{i}^{T+1}\right)}{1-\beta}
\]
QED
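As a sanity check (an illustrative script of my own, not part of the original proof; the values of T, N, and \(\beta\) are arbitrary), the bound can be verified numerically on random losses by repeating the Hedge(\(\beta\)) update loop inline:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, beta = 50, 5, 0.6
losses = rng.uniform(0.0, 1.0, size=(T, N))  # arbitrary loss vectors in [0, 1]

w = np.full(N, 1.0 / N)                       # w^1, sums to 1
total_loss = 0.0
for t in range(T):
    p = w / w.sum()                           # allocation p^t
    total_loss += p @ losses[t]               # accumulate L_Hedge(beta)
    w = w * beta ** losses[t]                 # weight update w_i^{t+1} = w_i^t * beta^{l_i^t}

bound = -np.log(w.sum()) / (1.0 - beta)       # -ln(sum_i w_i^{T+1}) / (1 - beta)
print(total_loss, bound)                      # total_loss never exceeds bound
assert total_loss <= bound + 1e-9
```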