A device that supports video conferencing needs an image sensor for video input, a display for video output, a microphone for audio input, and a speaker for audio output.  These peripherals are usually connected to a central processing unit (CPU) that has access to networking hardware for transferring video and audio data to and from a remote device.  Together, these are the basic hardware components of a video conferencing device.

The CPU also needs software to support video conferencing.  Device drivers, usually integrated into the operating system, communicate directly with the video and audio peripherals and expose them through a standard API that video conferencing applications use to exchange raw image and audio data with the hardware.  Audio and video codecs are required to compress the raw data into a form suitable for sending over a network, and to decompress the data received from the remote device.  The codecs may be implemented in software on the CPU or on a separate co-processor.  Finally, the device needs protocols that define how it interacts with other devices to establish and run media sessions.  SIP is commonly used for signaling (setting up, modifying and ending a session), and RTP is commonly used to carry the compressed audio and video once the session is established.
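
To make the transport step more concrete, the following is a minimal sketch of how one compressed frame might be wrapped in the 12-byte fixed RTP header before being sent.  The function names `pack_rtp_header` and `unpack_rtp_header`, the payload type 96 (a value from the dynamic range) and the SSRC value are assumptions chosen for illustration; a real device would negotiate the payload type during signaling and would normally rely on an existing RTP library.

```python
import struct

RTP_VERSION = 2  # version field value used by current RTP


def pack_rtp_header(payload_type, sequence, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (no padding, extension or CSRCs)."""
    # Byte 0: version (2 bits), padding (1), extension (1), CSRC count (4)
    byte0 = RTP_VERSION << 6
    # Byte 1: marker (1 bit), payload type (7 bits)
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    # Network byte order: two bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC
    return struct.pack(
        "!BBHII",
        byte0,
        byte1,
        sequence & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )


def unpack_rtp_header(packet):
    """Parse the fixed header fields from the start of a received packet."""
    byte0, byte1, sequence, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": sequence,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Hypothetical usage: wrap one encoded frame (placeholder bytes) for sending.
encoded_frame = b"\x00\x01\x02\x03"
packet = pack_rtp_header(96, sequence=1, timestamp=3000, ssrc=0x1234) + encoded_frame
```

The receiver uses the sequence number to detect lost or reordered packets and the timestamp to schedule playback, which is why both fields travel with every packet rather than only at session setup.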

Video Conferencing Device block diagram