
Histogram of Oriented Gradients (HOG) for Object Detection

Histogram of Oriented Gradients (HOG) is a feature descriptor widely used in several domains to characterize objects by their shape. Local object appearance and shape can often be described by the distribution of local intensity gradients or edge directions.


Fig.1 The sequence of object detection using HOG

HOG is widely used as a feature descriptor of image regions for object detection tasks such as human face or human body detection. To increase the efficiency of the object search, the gamma and colors of the image should be normalized. The search applies the detector to small sub-images defined by a sliding detector window, which probes the original input image and its scaled versions region by region.
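The sliding-window search over the image and its scaled versions can be sketched as follows. This is a minimal illustration, not a definitive implementation: the function name `sliding_windows`, the 64×128 window, the 8-pixel stride and the 1.2 scale step are all assumptions chosen for the example, not values given in the text.

```python
def sliding_windows(img_w, img_h, win_w=64, win_h=128, stride=8, scale_step=1.2):
    """Yield (scale, x, y) for every detector window position.

    (x, y) is the top-left corner of the window inside the image
    downscaled by `scale`. All default values are illustrative.
    """
    scale = 1.0
    while True:
        # Size of the input image after downscaling by the current factor.
        w, h = int(img_w / scale), int(img_h / scale)
        if w < win_w or h < win_h:
            break  # the window no longer fits, so the search ends
        for y in range(0, h - win_h + 1, stride):
            for x in range(0, w - win_w + 1, stride):
                yield scale, x, y
        scale *= scale_step


if __name__ == "__main__":
    for scale, x, y in sliding_windows(80, 144):
        print(scale, x, y)
```

Each yielded position identifies one sub-image for which a descriptor would be computed and classified.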

The first step in HOG detection is to divide the source image into blocks (for example 16×16 pixels). Each block is divided into small regions called cells (for example 8×8 pixels). Blocks usually overlap, so the same cell may belong to several blocks. For each pixel within a cell, the horizontal and vertical gradients are computed. The simplest way to do that is to use 1-D centered derivative masks [-1, 0, 1] in the horizontal and vertical directions:

Gx(y,x) = Y(y,x+1) - Y(y,x-1)
Gy(y,x) = Y(y+1,x) - Y(y-1,x)

Y(y,x) is the pixel intensity at coordinates x and y. Gx(y,x) is the horizontal gradient, and Gy(y,x) is the vertical gradient. The magnitude and phase of the gradient are determined as:

G(y,x) = sqrt(Gx(y,x)^2 + Gy(y,x)^2)
θ(y,x) = arctan(Gy(y,x) / Gx(y,x))
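A minimal sketch of the gradient step, using the centered differences defined above. The function name `gradients` is hypothetical, the image is assumed to be a 2-D list of intensities, and border pixels are handled by clamping coordinates to the image, which is an assumption since the text does not specify border handling. `math.atan2` is used for the phase so that Gx = 0 does not cause a division by zero.

```python
import math


def gradients(Y):
    """Return (magnitude, angle) grids for a 2-D list of intensities Y.

    Angles are signed and given in degrees, in the range (-180, 180].
    Border handling by clamping is an assumption of this sketch.
    """
    h, w = len(Y), len(Y[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Centered differences: Gx = Y(y,x+1) - Y(y,x-1), Gy = Y(y+1,x) - Y(y-1,x)
            gx = Y[y][min(x + 1, w - 1)] - Y[y][max(x - 1, 0)]
            gy = Y[min(y + 1, h - 1)][x] - Y[max(y - 1, 0)][x]
            mag[y][x] = math.hypot(gx, gy)              # sqrt(Gx^2 + Gy^2)
            ang[y][x] = math.degrees(math.atan2(gy, gx))
    return mag, ang


if __name__ == "__main__":
    ramp = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]  # intensity grows left to right
    mag, ang = gradients(ramp)
    print(mag[1][1], ang[1][1])
```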


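Next, a histogram of oriented gradients is built for each cell. The angle range is divided into Q bins (for example Q=9). Usually unsigned orientation is used, so angles below 0° are increased by 180° and the histogram covers the range from 0° to 180°. Each pixel in the cell votes for the bin that contains its gradient angle, typically with a weight given by its gradient magnitude.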
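The histogram step for a single cell could look like the sketch below. It assumes per-pixel magnitudes and signed angles in degrees are already available, that each pixel's vote is weighted by its gradient magnitude, and that each vote goes entirely to one bin (no interpolation between neighboring bins). These are simplifying assumptions, and `cell_histogram` is a hypothetical name.

```python
def cell_histogram(mag, ang, bins=9):
    """Build a Q-bin unsigned-orientation histogram for one cell.

    mag, ang: equally sized 2-D lists with gradient magnitudes and
    signed angles in degrees for the pixels of the cell.
    """
    width = 180.0 / bins  # angular width of one bin
    hist = [0.0] * bins
    for mag_row, ang_row in zip(mag, ang):
        for m, a in zip(mag_row, ang_row):
            a = a % 180.0  # unsigned orientation: fold the angle into [0, 180)
            # The final "% bins" guards against a rounding result of exactly 180.0.
            hist[int(a // width) % bins] += m
    return hist


if __name__ == "__main__":
    # Two pixels: magnitude 1 at 10 degrees, magnitude 2 at -45 degrees (folds to 135).
    print(cell_histogram([[1.0, 2.0]], [[10.0, -45.0]]))
```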

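Since different images may have different contrast, contrast normalization can be very useful. Normalization is applied to the histogram vector v within a block. One of the following norms could be used, where ε is a small constant that prevents division by zero:

L2-norm: v → v / sqrt(‖v‖₂² + ε²)
L1-norm: v → v / (‖v‖₁ + ε)
L1-sqrt: v → sqrt(v / (‖v‖₁ + ε))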
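A minimal sketch of block normalization, assuming three commonly used schemes (L2-norm, L1-norm and L1-sqrt) and a small constant `eps` that avoids division by zero. The function name `normalize_block`, the scheme labels and the default `eps` are assumptions made for illustration. The vector `v` is taken to be the concatenated cell histograms of one block, so its entries are non-negative.

```python
import math


def normalize_block(v, scheme="l2", eps=1e-6):
    """Return a contrast-normalized copy of the block histogram vector v."""
    if scheme == "l2":
        norm = math.sqrt(sum(x * x for x in v) + eps * eps)
        return [x / norm for x in v]
    if scheme == "l1":
        norm = sum(abs(x) for x in v) + eps
        return [x / norm for x in v]
    if scheme == "l1-sqrt":
        # Square root of the L1-normalized vector; requires non-negative entries.
        norm = sum(abs(x) for x in v) + eps
        return [math.sqrt(x / norm) for x in v]
    raise ValueError("unknown scheme: " + scheme)


if __name__ == "__main__":
    print(normalize_block([3.0, 4.0], "l2"))
    print(normalize_block([1.0, 3.0], "l1"))
    print(normalize_block([1.0, 3.0], "l1-sqrt"))
```

Scaling all entries of `v` by the same contrast factor leaves the normalized result almost unchanged, which is the purpose of this step.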


A descriptor is assigned to each detector window. This descriptor consists of all the cell histograms for each block in the detector window. The detector window descriptor is used as the information for object recognition: training and testing are both performed on this descriptor. Many methods exist to classify objects using the descriptor, such as support vector machines (SVM) and neural networks.
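Because overlapping blocks share cells, the descriptor is longer than the number of distinct cells would suggest. The short calculation below illustrates this. The 64×128 window and the 8-pixel block stride are assumptions chosen for the example, while the 8×8 cells, 16×16 blocks and Q=9 bins follow the examples in the text. `descriptor_length` is a hypothetical helper.

```python
def descriptor_length(win_w=64, win_h=128, cell=8, block=16, stride=8, bins=9):
    """Number of values in the descriptor of one detector window."""
    blocks_x = (win_w - block) // stride + 1   # block positions per row
    blocks_y = (win_h - block) // stride + 1   # block positions per column
    cells_per_block = (block // cell) ** 2     # e.g. 2 x 2 cells per block
    return blocks_x * blocks_y * cells_per_block * bins


if __name__ == "__main__":
    # 7 x 15 blocks, 4 cells per block, 9 bins per cell
    print(descriptor_length())
```

With these assumed values, the descriptor contains 7 × 15 × 4 × 9 = 3780 values, and this vector is what a classifier such as an SVM would receive for each window.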


VOCAL Technologies, Ltd.