Speech enhancement algorithms use a voice activity detector (VAD), both to restrict robust processing to frames that actually contain speech and to track the noise floor dynamically. In an adaptive VAD, the threshold for speech detection is updated continuously.
Consider an energy-based VAD, with the energy computed as an average of the instantaneous temporal energies. Suppose the signals received at the microphones are given by:

$y_i[n] = s[n-\tau_i] + \nu_i[n]$

where $s[n]$ is the desired speech signal, $\tau_i$ is the delay relative to microphone 1 (so $\tau_1 = 0$), and $\nu_i[n]$ is i.i.d. zero-mean Gaussian noise. Then, the threshold can be adaptively computed using the equation:

$\alpha_T[n] = \beta_1 \underset{n}{\mathrm{argmax}} \sum\limits_{m=0}^{M-1} \left(\sum\limits_{i=1}^{N} y_i[n-m]\right)^2 + (1-\beta_1) \underset{n}{\mathrm{argmin}} \sum\limits_{m=0}^{M-1} \left(\sum\limits_{i=1}^{N} y_i[n-m]\right)^2$

where $M$ is the number of samples per frame, $N$ is the number of microphones in the array, and $0 \le \beta_1 \le 1$ is a design parameter. Figure 1 shows a sample performance of this algorithm.
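The threshold rule can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation behind the figure: the argmax and argmin terms are read as the largest and smallest frame energies, frames are non-overlapping, and the extrema are taken over a sliding history of recent frames. The function names and the `history` parameter are hypothetical.

```python
import numpy as np

def frame_energies(y, M):
    """Energy of each non-overlapping frame of M samples.

    y has shape (N, L): one row per microphone. The channels are summed
    first (inner sum over i), then squared and summed over the frame
    (outer sum over m), following the threshold equation.
    """
    summed = y.sum(axis=0)
    n_frames = summed.size // M
    frames = summed[:n_frames * M].reshape(n_frames, M)
    return (frames ** 2).sum(axis=1)

def adaptive_threshold(energies, beta1, history=50):
    """Per-frame threshold: beta1 * max + (1 - beta1) * min of recent energies.

    Assumption: the extrema are taken over the last `history` frames,
    including the current one. The source does not specify the window.
    """
    thresholds = np.empty_like(energies, dtype=float)
    for k in range(energies.size):
        recent = energies[max(0, k - history + 1):k + 1]
        thresholds[k] = beta1 * recent.max() + (1.0 - beta1) * recent.min()
    return thresholds

def vad(y, M, beta1, history=50):
    """Return (speech decisions, frame energies, thresholds)."""
    energies = frame_energies(y, M)
    thresholds = adaptive_threshold(energies, beta1, history)
    return energies > thresholds, energies, thresholds

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, M = 4000, 200
    s = np.zeros(L)
    # A sinusoidal burst stands in for speech (illustrative only).
    s[2000:3000] = np.sin(2 * np.pi * 0.05 * np.arange(1000))
    delays = [0, 3, 5]  # integer sample delays tau_i, with tau_1 = 0
    y = np.stack([
        np.concatenate([np.zeros(d), s[:L - d]]) + 0.05 * rng.standard_normal(L)
        for d in delays
    ])
    decisions, energies, thresholds = vad(y, M=M, beta1=0.1)
    print(decisions.astype(int))
```

In this sketch, a small $\beta_1$ keeps the threshold close to the minimum energy, which acts as the noise-floor estimate and makes the detector more sensitive. A larger $\beta_1$ pushes the threshold toward the peak energy and makes it more conservative.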