Pseudo-loss function in distributed AdaBoost
This is a sketch of the proof of correctness of the pseudo-loss function in the case of a distributed boosting algorithm. It is well known that boosting weak learners, each with accuracy above 0.5, is sufficient to guarantee a lower bound on the accuracy of the iteratively pooled ensemble. A similar sketch is presented here for the distributed case, where the score from each stage is not shared across all the computing nodes. While this guarantees correctness, it does not guarantee convergence, which remains a highly sought-after problem in distributed algorithms.
For readers unfamiliar with the problem, it is highly advisable to read this paper or, for a much simpler and faster read, these slides.
Notations
- \(\mathbf{w}^{t}\) = weight vector at trial \(t\)
- \(T\) = number of iterations (trials)
- \(\beta \in [0,1)\) = parameter
- \(\ell^{t}\) = loss vector for trial \(t\), with each component \(\ell_{i}^{t}\in[0,1]\)
- \(N\) = total number of weak learners
- Strategy \(i\)'s cumulative loss: \(L_{i}=\sum_{t=1}^{T}\ell_{i}^{t}\)
- \(L_{A}\) = algorithm \(A\)'s total cumulative loss \(=\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}\)
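As a small illustration of the last two definitions (a hypothetical toy setup of my own, not taken from the paper), the cumulative losses can be computed directly from the loss vectors and the allocations:

```python
import numpy as np

# Hypothetical toy setup: N = 2 strategies, T = 3 trials.
# Row t holds the loss vector l^t; entries lie in [0, 1].
losses = np.array([[0.2, 0.9],
                   [0.7, 0.1],
                   [0.4, 0.5]])

# Allocations p^t chosen at each trial (each row sums to 1).
allocations = np.array([[0.5, 0.5],
                        [0.6, 0.4],
                        [0.7, 0.3]])

L_i = losses.sum(axis=0)            # strategy-wise cumulative losses, roughly [1.3, 1.5]
L_A = np.sum(allocations * losses)  # algorithm's cumulative loss sum_t p^t . l^t, roughly 1.44
print(L_i, L_A)
```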
Algorithm
Consider the following algorithm:
Hedge(\(\beta\)):

Initialize the weight vector \(\mathbf{w}^{1}\) with \(w_{i}^{1}\geq 0\) and \(\sum_{i=1}^{N}w_{i}^{1}=1\).

Do for t = 1, 2, …, T

- Choose allocation
\[
\mathbf{p}^{t}=\frac{\mathbf{w}^{t}}{\sum_{i=1}^{N}w_{i}^{t}}
\]
- Receive the loss vector \(\ell^{t}\in[0,1]^{N}\) from the environment
- Suffer loss \(\mathbf{p}^{t}\cdot\ell^{t}\)
- Set the new weights
\[
w_{i}^{t+1}=w_{i}^{t}\beta^{\ell_{i}^{t}}
\]
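A minimal sketch of this loop in Python/NumPy (my own illustration, not code from the referenced paper); the function name `hedge` and the uniform initialisation are choices made for the example:

```python
import numpy as np

def hedge(losses, beta):
    """Run Hedge(beta) on a T x N array of losses in [0, 1].

    Returns the algorithm's cumulative loss sum_t p^t . l^t and the
    final unnormalised weight vector w^{T+1}.
    """
    T, N = losses.shape
    w = np.full(N, 1.0 / N)            # w^1: non-negative weights summing to 1
    total_loss = 0.0
    for t in range(T):
        p = w / w.sum()                # choose allocation p^t
        loss_vec = losses[t]           # receive loss vector l^t from the environment
        total_loss += p @ loss_vec     # suffer loss p^t . l^t
        w = w * beta ** loss_vec       # set new weights w_i^{t+1} = w_i^t * beta^{l_i^t}
    return total_loss, w
```

Any non-negative initial weights summing to 1 work; the uniform choice simply treats all weak learners the same at the start.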
To Prove
\[
\ln\left(\sum_{i=1}^{N}w_{i}^{T+1}\right)\leq-(1-\beta)L_{Hedge(\beta)}
\]
Proof
From the Hedge(\(\beta\)) update rule, we know that
\[
\sum_{i=1}^{N}w_{i}^{t+1}=\sum_{i=1}^{N}w_{i}^{t}\beta^{\ell_{i}^{t}}
\]
Since \(\beta\geq 0\) and \(\ell_{i}^{t}\in[0,1]\), by convexity we know that \(\beta^{\ell_{i}^{t}}\leq 1-(1-\beta)\ell_{i}^{t}\); therefore the above equation can be rewritten as
\[
\sum_{i=1}^{N}w_{i}^{t+1}\leq\sum_{i=1}^{N}w_{i}^{t}(1-(1-\beta)\ell_{i}^{t})
\]
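To spell out the convexity step used above (a standard argument, stated here for completeness): for \(\ell\in[0,1]\), write \(\ell\) as the convex combination \(\ell\cdot 1+(1-\ell)\cdot 0\); since \(x\mapsto\beta^{x}\) is convex,
\[
\beta^{\ell}=\beta^{\ell\cdot 1+(1-\ell)\cdot 0}\leq\ell\,\beta^{1}+(1-\ell)\,\beta^{0}=1-(1-\beta)\ell
\]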
Substituting \(p_{i}^{t}=w_{i}^{t}/\sum_{j=1}^{N}w_{j}^{t}\),
\[
\sum_{i=1}^{N}w_{i}^{t+1}\leq\left(\sum_{i=1}^{N}w_{i}^{t}\right)\left(1-(1-\beta)\,\mathbf{p}^{t}\cdot\ell^{t}\right)
\]
Applying this bound repeatedly for \(t=1,\ldots,T\), and using \(\sum_{i=1}^{N}w_{i}^{1}=1\),
\[
\sum_{i=1}^{N}w_{i}^{T+1}\leq\prod_{t=1}^{T}\left(1-(1-\beta)\,\mathbf{p}^{t}\cdot\ell^{t}\right)
\]
Since \(1+x\leq e^{x}\) for all real \(x\),
\[
\sum_{i=1}^{N}w_{i}^{T+1}\leq\exp\left(-(1-\beta)\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}\right)
\]
Taking the natural logarithm of both sides,
\[
\ln\left(\sum_{i=1}^{N}w_{i}^{T+1}\right)\leq-(1-\beta)\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}
\]
which is the inequality to be proved.
Negating both sides and flipping the inequality,
\[
(1-\beta)\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}\leq-\ln\left(\sum_{i=1}^{N}w_{i}^{T+1}\right)
\]
Dividing through by \(1-\beta>0\),
\[
\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}\leq\frac{-\ln\left(\sum_{i=1}^{N}w_{i}^{T+1}\right)}{1-\beta}
\]
Now, from the definition of \(L_{A}\): since the algorithm here is Hedge(\(\beta\)), where \(\beta\) is the parameter, the quantity \(\sum_{t=1}^{T}\mathbf{p}^{t}\cdot\ell^{t}\) is exactly \(L_{Hedge(\beta)}\); therefore,
\[
L_{Hedge(\beta)}\leq\frac{-\ln\left(\sum_{i=1}^{N}w_{i}^{T+1}\right)}{1-\beta}
\]
QED
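As a sanity check (an illustrative script of my own, not part of the original proof; the values of T, N, and \(\beta\) are arbitrary), the bound can be verified numerically on random losses by repeating the Hedge(\(\beta\)) update loop inline:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N, beta = 50, 5, 0.6
losses = rng.uniform(0.0, 1.0, size=(T, N))  # arbitrary loss vectors in [0, 1]

w = np.full(N, 1.0 / N)                       # w^1, sums to 1
total_loss = 0.0
for t in range(T):
    p = w / w.sum()                           # allocation p^t
    total_loss += p @ losses[t]               # accumulate L_Hedge(beta)
    w = w * beta ** losses[t]                 # weight update w_i^{t+1} = w_i^t * beta^{l_i^t}

bound = -np.log(w.sum()) / (1.0 - beta)       # -ln(sum_i w_i^{T+1}) / (1 - beta)
print(total_loss, bound)                      # total_loss never exceeds bound
assert total_loss <= bound + 1e-9
```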