Saturday, April 26, 2014

Data Compression in Computer Networks



Introduction to Data Compression

The goal of data compression is to represent an information source (e.g. a data file, a speech signal, an image, or a video signal) as accurately as possible using the fewest number of bits.
Compression is used almost everywhere. Most images on the web are compressed, typically in the JPEG or GIF formats; most modems use compression; HDTV broadcasts are commonly compressed using MPEG-2; and several file systems automatically compress files when they are stored. The rest of us do it by hand.
The task of compression consists of two components: an encoding algorithm that takes a message and generates a “compressed” representation (hopefully with fewer bits), and a decoding algorithm that reconstructs the original message, or some approximation of it, from the compressed representation. These two components are typically closely tied together, since both have to understand the shared compressed representation.
Our goal is to develop a set of algorithms based on different techniques, to compress image/video along with audio data.
Encoding and decoding
Message. Binary data M we want to compress.
Encode. Generate a "compressed" representation C(M).
Decode. Reconstruct the original message M, or some approximation M' of it, from C(M).
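The three steps above can be sketched in a few lines of Python. This is only an illustration: it assumes the standard-library zlib module (DEFLATE) as the codec, and the function names encode and decode are placeholders chosen for this example, not part of any particular system.

```python
import zlib

def encode(message: bytes) -> bytes:
    """Generate a compressed representation C(M) of the message M."""
    return zlib.compress(message)

def decode(compressed: bytes) -> bytes:
    """Reconstruct M' from C(M). DEFLATE is lossless, so M' equals M."""
    return zlib.decompress(compressed)

# A repetitive message, so there is redundancy for the encoder to remove.
M = b"the quick brown fox jumps over the lazy dog " * 50
C = encode(M)
M_prime = decode(C)

print(len(M), "bytes before,", len(C), "bytes after")
print("exact reconstruction:", M_prime == M)
```

Both functions have to agree on the same compressed format, which is why the encoder and decoder are usually designed as a pair.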
There are two basic types of compression technique:
·       Lossless Compression
·       Lossy Compression
Lossless Compression
Lossless data compression has been suggested for many space science exploration mission applications, either to increase the science return or to reduce the requirements for on-board memory, station contact time, and data archival volume. A lossless compression technique guarantees full reconstruction of the original data without incurring any distortion in the process. The recommended lossless data compression technique preserves the accuracy of the source data by removing redundancy from the application source data. During decompression, the original source data is reconstructed from the compressed data by restoring the removed redundancy, so the reconstructed data is an exact replica of the original source data. The quantity of redundancy removed from the source data is variable and depends heavily on the source data statistics, which are often non-stationary.
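To make "removing redundancy" and "restoring the removed redundancy" concrete, here is a minimal sketch using run-length encoding. Run-length encoding is chosen only because it is short enough to read at a glance; it is not the technique recommended for the space applications described above, and the names rle_encode and rle_decode are made up for this example.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (run_length, byte_value) pairs, with runs capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and run < 255 and data[i + run] == data[i]:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)

def rle_decode(encoded: bytes) -> bytes:
    """Restore the removed redundancy by expanding each (run_length, value) pair."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        run, value = encoded[k], encoded[k + 1]
        out += bytes([value]) * run
    return bytes(out)

redundant = b"\x00" * 900 + b"\xff" * 100   # long runs: highly redundant
varied = bytes(range(256)) * 4              # no two adjacent bytes are equal

for name, data in (("redundant", redundant), ("varied", varied)):
    packed = rle_encode(data)
    assert rle_decode(packed) == data       # exact replica of the source
    print(name, len(data), "->", len(packed), "bytes")
```

The redundant input shrinks dramatically while the varied input actually grows, which illustrates how strongly the result depends on the statistics of the source data.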
The lossless data compression algorithm can be applied at the application data source or performed as a function of the on-board data system, as shown in the following figure. The performance of the data compression algorithm is independent of where it is applied. However, if the data compression algorithm is part of the on-board data system, the on-board data system will, in general, have to capture the data in a buffer. In both cases, it may be necessary to rearrange the data into an appropriate sequence before applying the data compression algorithm. The purpose of rearranging the data is to improve the compression ratio.
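As a rough illustration of why rearranging data can help, the sketch below groups the bytes of fixed-width records by position before compressing them. Everything here is an assumption made for the example: the synthetic 32-bit samples, the use of zlib as the compressor, and the helper names transpose_bytes and untranspose_bytes. The source does not say which rearrangement is used in practice.

```python
import random
import struct
import zlib

def transpose_bytes(data: bytes, width: int) -> bytes:
    """Group byte 0 of every record, then byte 1, and so on."""
    return b"".join(data[k::width] for k in range(width))

def untranspose_bytes(data: bytes, width: int) -> bytes:
    """Invert transpose_bytes, restoring the original record order."""
    n = len(data) // width
    out = bytearray(len(data))
    for k in range(width):
        out[k::width] = data[k * n:(k + 1) * n]
    return bytes(out)

# Synthetic sensor readings: a fixed offset plus one noisy low byte.
rng = random.Random(42)
samples = [0x00018600 + rng.randrange(256) for _ in range(4096)]
raw = struct.pack(">4096I", *samples)        # 4 bytes per sample, big-endian

rearranged = transpose_bytes(raw, 4)
assert untranspose_bytes(rearranged, 4) == raw   # the rearrangement is reversible

print("as recorded:", len(zlib.compress(raw)), "bytes")
print("rearranged: ", len(zlib.compress(rearranged)), "bytes")
```

After the rearrangement, the three constant bytes of every sample sit next to each other in long runs, so the compressor only has to spend bits on the noisy byte.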
Lossy Compression
Lossy compression is compression in which some of the information from the original message sequence is lost. This means the original sequence cannot be regenerated exactly from the compressed sequence. Losing information does not necessarily mean that the perceived quality of the output is reduced.
For example, random noise has very high information content, but when it is present in an image or a sound file, we would typically be perfectly happy to drop it. Also, certain losses in images or sound might be completely imperceptible to a human viewer or listener (e.g. the loss of very high frequencies). For this reason, lossy compression algorithms on images can often achieve a factor of 2 better compression than lossless algorithms with an imperceptible loss in quality. However, when quality does start degrading in a noticeable way, it is important to make sure it degrades in the way that is least objectionable to the viewer (e.g., dropping random pixels is probably more objectionable than dropping some colour information). For these reasons, the way most lossy compression techniques are used depends heavily on the medium being compressed. Lossy compression for sound, for example, is very different from lossy compression for images.
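A minimal sketch of the lossy idea is to quantize 8-bit samples down to 4 bits each. This is a deliberately crude scheme picked for brevity, not how JPEG or audio codecs work, and the names lossy_encode and lossy_decode are placeholders for this example. It shows that the decoder can only return an approximation M', but that the error stays within a known bound.

```python
def lossy_encode(samples: bytes) -> bytes:
    """Keep the high 4 bits of each 8-bit sample and pack two samples per byte."""
    if len(samples) % 2:
        raise ValueError("this sketch expects an even number of samples")
    return bytes((samples[i] & 0xF0) | (samples[i + 1] >> 4)
                 for i in range(0, len(samples), 2))

def lossy_decode(packed: bytes) -> bytes:
    """Rebuild approximate samples, placing each at the middle of its 16-value bin."""
    out = bytearray()
    for b in packed:
        out.append((b & 0xF0) | 0x08)          # first sample of the pair
        out.append(((b & 0x0F) << 4) | 0x08)   # second sample of the pair
    return bytes(out)

M = bytes(range(256))            # every possible 8-bit sample value
C = lossy_encode(M)
M_prime = lossy_decode(C)

max_error = max(abs(a - b) for a, b in zip(M, M_prime))
print(len(M), "->", len(C), "bytes")
print("exact reconstruction:", M_prime == M)
print("largest per-sample error:", max_error)
```

The low 4 bits of every sample are discarded for good, so no decoder can recover them; whether that loss is acceptable depends on what the samples represent, which is the media dependence described above.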