Background Subtraction using Compressed Sensing

In many security applications, the changes between frames matter more than the frames themselves. For example, a camera recording an empty back hallway may see an unchanging scene for hours or even days. If that hallway needs to be secure, however, the camera may still transmit the same image continuously to a receiver, wasting bandwidth (in the case of a wireless video network) and power. Ideally, we would transmit only the changes in the video, but most background subtraction algorithms require far more processing than a camera's onboard processor can provide.

Compressed sensing makes this possible with very little processing or complexity at the transmitter, pushing the complexity to the receiver. Because most video networks already use a PC as the receiver, this division of labor fits current network topologies. Compressed sensing is a mathematical framework in which a signal is undersampled (sampled below the Nyquist rate) and then recovered by ℓ₁ minimization, provided that the signal is sparse in some basis and that the measurement process is incoherent with that basis. Concretely, the camera takes M linear measurements y = Φx of an N-pixel frame x, with M < N, and the receiver recovers x by finding the vector with the smallest ℓ₁ norm that is consistent with y.
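The undersample-and-recover idea can be sketched in a few lines, assuming NumPy is available. The sketch measures a length-128 signal with 5 nonzero entries using only 64 random Gaussian measurements (a common choice of incoherent measurement matrix) and then recovers it. As a stand-in for exact ℓ₁ minimization it uses the iterative shrinkage-thresholding algorithm (ISTA), which solves the ℓ₁-regularized least-squares (LASSO) problem min ½‖y − Φx‖² + λ‖x‖₁; with a small λ its solution lies close to the exact one, up to a small shrinkage bias. The function names `soft_threshold` and `ista`, the sizes, the seed, λ and the iteration count are all hypothetical, illustrative choices.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(phi, y, lam=0.01, n_iter=5000):
    """Approximately solve min_x 0.5*||y - phi @ x||^2 + lam*||x||_1 (ISTA).

    This is an l1-regularized stand-in for exact l1 minimization, not the
    equality-constrained problem itself.
    """
    step = 1.0 / np.linalg.norm(phi, 2) ** 2  # 1/L, L = largest singular value squared
    x = np.zeros(phi.shape[1])
    for _ in range(n_iter):
        grad = phi.T @ (phi @ x - y)  # gradient of the least-squares term
        x = soft_threshold(x - step * grad, lam * step)
    return x


# Hypothetical sizes chosen only for illustration.
rng = np.random.default_rng(0)
n, m, k = 128, 64, 5  # signal length, number of measurements (m < n), nonzeros

# A k-sparse test signal whose nonzero magnitudes lie between 1 and 2.
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=k) * rng.uniform(1.0, 2.0, size=k)

# Random Gaussian measurement matrix (assumed; incoherent with the pixel basis w.h.p.).
phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
y = phi @ x_true  # 64 measurements of a length-128 signal

x_hat = ista(phi, y)
print("max abs error:", np.max(np.abs(x_hat - x_true)))
```

A smaller λ reduces the shrinkage bias but slows ISTA's convergence; an exact basis-pursuit solver (for instance, a linear-programming formulation) would remove the bias at the cost of a heavier dependency.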

To understand how this works, take two sequential frames x₁ and x₂ of a video. Assume that the first frame shows just an empty hallway and the second shows someone in that hallway. If the camera measures every frame with the same matrix Φ, so that y₁ = Φx₁ and y₂ = Φx₂, then by linearity the difference of the measurements is d = y₁ − y₂ = Φ(x₁ − x₂). The difference image x₁ − x₂ can therefore be recovered directly from d using the same ℓ₁ minimization that would be used to recover x₁ or x₂ individually.

Because the camera is stationary, the background cancels in x₁ − x₂, which is nonzero only where the scene changed. The difference image is therefore usually much sparser than either frame, since most of its pixels are zero. The number of measurements needed for recovery grows with sparsity (roughly in proportion to K log(N/K) for a K-sparse signal of length N), so fewer measurements suffice to recover the difference than would be needed for either frame, and only the information needed to reconstruct the difference has to be transmitted. The result is that only the change between two frames is sent, effectively removing the background from the video of a stationary camera.
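The frame-difference idea can be sketched the same way, again assuming NumPy and using ISTA as a stand-in for exact ℓ₁ minimization. In this sketch a 16×16 "hallway" background is dense (every pixel is nonzero), so it is not sparse in the pixel basis, while the difference between the two frames is nonzero only on a hypothetical 2×3-pixel intruder. Subtracting the 96 measurements of each frame gives d = Φ(x₁ − x₂), from which the 256-pixel difference image is recovered. The image size, measurement count, intruder shape, helper names, and the reuse of one fixed Φ for every frame are illustrative assumptions.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding (proximal operator of t * ||v||_1)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def ista(phi, y, lam=0.01, n_iter=5000):
    """Approximate l1 recovery via ISTA: min_x 0.5*||y - phi @ x||^2 + lam*||x||_1."""
    step = 1.0 / np.linalg.norm(phi, 2) ** 2
    x = np.zeros(phi.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - step * (phi.T @ (phi @ x - y)), lam * step)
    return x


# Hypothetical scene and sizes, chosen only for illustration.
rng = np.random.default_rng(1)
h, w = 16, 16
n, m = h * w, 96  # 256 pixels, 96 measurements per frame

# Frame 1: empty hallway. The background is dense (every pixel nonzero).
x1 = rng.uniform(0.0, 1.0, size=(h, w))
# Frame 2: same background plus a small bright 2x3 "intruder".
x2 = x1.copy()
x2[5:7, 8:11] = 2.0

# Assumption: the camera applies the same fixed measurement matrix to every frame.
phi = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
y1 = phi @ x1.ravel()
y2 = phi @ x2.ravel()

# Measurement-domain difference; by linearity d = phi @ (x1 - x2).
d = y1 - y2
diff_hat = ista(phi, d).reshape(h, w)

true_diff = x1 - x2  # nonzero only on the intruder's pixels
print("nonzeros in frame:", np.count_nonzero(x1))
print("nonzeros in difference:", np.count_nonzero(true_diff))
print("max abs error:", np.max(np.abs(diff_hat - true_diff)))
```

Recovering x₁ itself from y₁ with this pixel-domain solver would not work: an ℓ₁ solution consistent with 96 measurements has at most 96 nonzero entries here, so it cannot equal a dense 256-pixel frame. Only the sparse difference is recoverable from so few measurements.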