Pseudo loss function in distributed AdaBoost

This is a sketch of the proof of correctness of the pseudo loss function L_{Hedge(\beta)} in the case of a distributed boosting algorithm. It is well known that boosting weak learners, each with accuracy better than 0.5, is sufficient to guarantee a minimum accuracy for the iteratively pooled ensemble. A similar sketch is presented here for the distributed case, where the score from each stage is not shared across all the computing nodes. While this guarantees correctness, it does not guarantee convergence, which remains an open problem in distributed algorithms.

For readers unfamiliar with the problem, it is highly advised to read this paper or, for a much simpler and faster read, these slides.

Notations

\mathbf{w} = Weight vector

T = Number of iterations

\beta = Parameter, \beta\in[0,1]

\ell^{t} = Loss vector for the t-th iteration (trial)

N = Total number of weak learners

L_{i} = Strategy i's cumulative loss = \sum_{t=1}^{T}\ell_{i}^{t}

L_{A} = Algorithm A's total cumulative loss = \sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}

 

Algorithm

Consider the following algorithm.

Hedge(\beta):

Initialize the weight vector \mathbf{w}^{1}\in[0,1]^{N} with \sum_{i=1}^{N}w_{i}^{1}=1.

Do for t = 1, 2, …, T

  1. Choose allocation

\[
\mathbf{p}^{t}=\frac{\mathbf{w}^{t}}{\sum_{i=1}^{N}w_{i}^{t}}
\]

  2. Receive the loss vector \ell^{t}\in[0,1]^{N} from the environment

  3. Suffer the loss \mathbf{p}^{t}\cdot\ell^{t}

  4. Set the new weights

\[
w_{i}^{t+1}=w_{i}^{t}\beta^{\ell_{i}^{t}}
\]
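
The update loop above translates directly into code. The following is a minimal sketch in Python/NumPy; the function name hedge, the uniform initial weights, and the interface (a precomputed matrix of loss vectors rather than losses arriving online from the environment) are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def hedge(beta, losses):
    """Minimal sketch of Hedge(beta).

    losses: array of shape (T, N); losses[t, i] is the loss of weak
    learner i at trial t, assumed to lie in [0, 1].
    Returns the algorithm's cumulative loss and the final weights w^{T+1}.
    """
    T, N = losses.shape
    w = np.full(N, 1.0 / N)               # initial weights, summing to 1
    cumulative_loss = 0.0
    for t in range(T):
        p = w / w.sum()                   # 1. choose allocation p^t
        cumulative_loss += p @ losses[t]  # 2-3. receive loss vector, suffer p^t . l^t
        w = w * beta ** losses[t]         # 4. set new weights w^{t+1} = w^t * beta^{l^t}
    return cumulative_loss, w
```

Any normalized initial weight vector satisfies the assumption used in the proof below; the uniform choice w_{i}^{1}=1/N is taken here only for concreteness.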

To Prove

\[
\ln\left(\sum_{i=1}^{N}w_{i}^{T+1}\right)\leq-(1-\beta)L_{Hedge(\beta)}
\]

Proof

From the Hedge(\beta) weight update, we know that

\[
\sum_{i=1}^{N}w_{i}^{t+1}=\sum_{i=1}^{N}w_{i}^{t}\beta^{\ell_{i}^{t}}
\]

For \alpha\geq0 and r\in[0,1], by convexity we know that
\alpha^{r}\leq1-(1-\alpha)r, so the equation above can be rewritten
as

\[
\sum_{i=1}^{N}w_{i}^{t+1}\leq\sum_{i=1}^{N}w_{i}^{t}(1-(1-\beta)\ell_{i}^{t})
\]
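
The convexity bound used in the previous step can be checked directly: writing r as the convex combination r\cdot1+(1-r)\cdot0 and applying the convexity of \alpha^{x} in x,

\[
\alpha^{r}=\alpha^{r\cdot1+(1-r)\cdot0}\leq r\,\alpha^{1}+(1-r)\,\alpha^{0}=1-(1-\alpha)r
\]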

Pulling out the factor \sum_{i=1}^{N}w_{i}^{t} and using the definition \mathbf{p}^{t}=\mathbf{w}^{t}/\sum_{i=1}^{N}w_{i}^{t}, the bound becomes

\[
\sum_{i=1}^{N}w_{i}^{t+1}\leq\left(\sum_{i=1}^{N}w_{i}^{t}\right)\left(1-(1-\beta)\,\mathbf{p}^{t}\cdot\ell^{t}\right)
\]

Applying this bound repeatedly for the trials t=1,\ldots,T, and using \sum_{i=1}^{N}w_{i}^{1}=1,

\[
\sum_{i=1}^{N}w_{i}^{T+1}\leq\prod_{t=1}^{T}\left(1-(1-\beta)\,\mathbf{p}^{t}\cdot\ell^{t}\right)
\]

Since 1+x\leq e^{x} for all x, applying this to each factor of the product gives

\[
\sum_{i=1}^{N}w_{i}^{T+1}\leq\exp\left(-(1-\beta)\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}\right)
\]

Taking the natural log on both sides,

\[
\ln\left(\sum_{i=1}^{N}w_{i}^{T+1}\right)\leq-(1-\beta)\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}
\]

This is exactly the inequality stated above. To turn it into a bound on the algorithm's loss:

Multiplying both sides by -1 and flipping the inequality,

\[
(1-\beta)\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}\leq-\ln\left(\sum_{i=1}^{N}w_{i}^{T+1}\right)
\]

\[
\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}\leq\frac{-\ln\left(\sum_{i=1}^{N}w_{i}^{T+1}\right)}{1-\beta}
\]

Now recall the definition L_{A}=\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}. Since the algorithm here is Hedge(\beta) with parameter \beta, this L_{A} is denoted L_{Hedge(\beta)}.

Therefore,

\[
L_{Hedge(\beta)}\leq\frac{-\ln\left(\sum_{i=1}^{N}w_{i}^{T+1}\right)}{1-\beta}
\]

QED
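
As an informal sanity check (not part of the original proof), the final bound can be verified numerically: run the Hedge(\beta) update on random loss vectors with normalized initial weights and compare the suffered loss against -\ln\left(\sum_{i}w_{i}^{T+1}\right)/(1-\beta). All names below (rng, losses, L_hedge, bound) are illustrative.

```python
import numpy as np

# Informal check of L_Hedge(beta) <= -ln(sum_i w_i^{T+1}) / (1 - beta)
# on random loss vectors in [0, 1]^N, with normalized initial weights.
rng = np.random.default_rng(0)
beta, T, N = 0.5, 50, 10
losses = rng.uniform(0.0, 1.0, size=(T, N))

w = np.full(N, 1.0 / N)          # sum of initial weights is 1
L_hedge = 0.0
for t in range(T):
    p = w / w.sum()              # allocation p^t
    L_hedge += p @ losses[t]     # suffered loss p^t . l^t
    w = w * beta ** losses[t]    # weight update w^{t+1} = w^t * beta^{l^t}

bound = -np.log(w.sum()) / (1.0 - beta)
print(f"L_Hedge = {L_hedge:.4f}, bound = {bound:.4f}")
assert L_hedge <= bound + 1e-9   # the bound should hold for any such run
```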

Written on December 10, 2014