Introduction to Data Compression
The goal of data compression is to represent an information source (e.g. a
data file, a speech signal, an image, or a video signal) as accurately as
possible using the fewest bits.
Compression is used just about everywhere. Nearly all the images you get on the
web are compressed, typically in the JPEG or GIF formats; most modems use
compression; HDTV will be compressed using MPEG-2; and several file systems
automatically compress files when they are stored. The rest of us do it by
hand.
The task of compression consists of two components: an encoding algorithm that
takes a message and generates a “compressed” representation (hopefully with
fewer bits), and a decoding algorithm that reconstructs the original message,
or some approximation of it, from the compressed representation. These two
components are typically intricately tied together, since they both have to
understand the shared compressed representation.
Our goal is to develop a set of algorithms, based on different techniques, to
compress image and video data along with audio data.
Encoding and decoding
Message. Binary data M we want to compress.
Encode. Generate a "compressed" representation C(M).
Decode. Reconstruct the original message M or some approximation M'.
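To make the M, C(M), M' vocabulary concrete, here is a minimal sketch of an encoder/decoder pair. It uses run-length encoding, which is chosen here purely for illustration and is not a scheme named in the text above; the function names `encode` and `decode` and the (count, value) byte-pair layout are assumptions of this sketch.

```python
def encode(message: bytes) -> bytes:
    """Run-length encode M into C(M) as (count, value) byte pairs.

    Illustrative layout (an assumption of this sketch): each run of up to
    255 identical bytes becomes two bytes, the run length and the value.
    """
    out = bytearray()
    i = 0
    while i < len(message):
        value = message[i]
        run = 1
        # Extend the run while the next byte repeats and the count fits in a byte.
        while i + run < len(message) and message[i + run] == value and run < 255:
            run += 1
        out += bytes((run, value))
        i += run
    return bytes(out)


def decode(compressed: bytes) -> bytes:
    """Rebuild the message from the (count, value) pairs produced by encode."""
    out = bytearray()
    for k in range(0, len(compressed), 2):
        count, value = compressed[k], compressed[k + 1]
        out += bytes([value]) * count
    return bytes(out)


M = b"\x00" * 40 + b"\xff" * 20 + b"\x07" * 4   # message with long runs
C = encode(M)                                    # compressed representation C(M)
M_prime = decode(C)                              # reconstruction M'

print(len(M), "bytes ->", len(C), "bytes")       # 64 bytes -> 6 bytes
print("exact reconstruction:", M_prime == M)     # exact reconstruction: True
```

Both functions must agree on the pair layout, which is the sense in which encoder and decoder are tied together by the shared representation. The "hopefully" in "hopefully with fewer bits" also matters: on input with no runs, such as `b"abc"`, this particular scheme doubles the size.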
There are two basic types of compression techniques:
· Lossless Compression
· Lossy Compression
Lossless Compression
Lossless data compression has been suggested for many space science
exploration mission applications, either to increase the science return or to
reduce the requirements for on-board memory, station contact time, and data
archival volume. A lossless compression technique guarantees full
reconstruction of the original data without incurring any distortion in the
process. The recommended lossless data compression technique preserves the
accuracy of the source data by removing redundancy from the application source
data. During decompression, the original source data is reconstructed from the
compressed data by restoring the removed redundancy, so the reconstructed data
is an exact replica of the original source data. The quantity of redundancy
removed from the source data is variable and is highly dependent on the source
data statistics, which are often non-stationary.
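The dependence on source statistics can be seen with a small experiment. The sketch below uses Python's standard `zlib` module (DEFLATE) as a stand-in lossless compressor; it is not the technique the text refers to, and the helper name `compression_ratio` and the two test inputs are assumptions made for illustration.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return original size / compressed size, checking the round trip is exact."""
    compressed = zlib.compress(data, 9)
    # Lossless: decompression must give back an exact replica.
    assert zlib.decompress(compressed) == data
    return len(data) / len(compressed)


# Highly redundant source: the same four bytes repeated.
redundant = b"ABCD" * 2500

# Noise-like source: pseudo-random bytes (fixed seed so runs are repeatable).
rng = random.Random(0)
noise_like = bytes(rng.getrandbits(8) for _ in range(10_000))

print(f"redundant : {compression_ratio(redundant):.1f}x")
print(f"noise-like: {compression_ratio(noise_like):.2f}x")
```

Both inputs are 10,000 bytes long, yet the redundant one shrinks by a large factor while the noise-like one does not shrink at all, because there is essentially no redundancy to remove.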
The lossless data compression algorithm can be applied at the application data
source or performed as a function of the on-board data system, as shown in the
following figure. The performance of the data compression algorithm is
independent of where it is applied. However, if the data compression algorithm
is part of the on-board data system, the on-board data system will, in general,
have to capture the data in a buffer. In both cases, it may be necessary to
rearrange the data into an appropriate sequence before applying the data
compression algorithm. The purpose of rearranging the data is to improve the
compression ratio.
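As a rough illustration of why rearranging helps, the sketch below interleaves two hypothetical instrument channels, one noise-like and one constant, and compares compressed sizes before and after grouping each channel's bytes together. The channel contents, the helper names `rearrange` and `restore`, and the use of `zlib` as the compressor are all assumptions of this example, not details from the text.

```python
import random
import zlib

rng = random.Random(0)
n = 10_000
noisy = bytes(rng.getrandbits(8) for _ in range(n))  # channel A: noise-like samples
steady = bytes(n)                                    # channel B: constant zeros

# Data as captured in the buffer: A0 B0 A1 B1 A2 B2 ...
interleaved = bytes(b for pair in zip(noisy, steady) for b in pair)


def rearrange(data: bytes) -> bytes:
    """Group the samples by channel: all of A, then all of B (even length assumed)."""
    return data[0::2] + data[1::2]


def restore(data: bytes) -> bytes:
    """Invert rearrange() so the original interleaved order is recovered exactly."""
    half = len(data) // 2
    out = bytearray(len(data))
    out[0::2] = data[:half]
    out[1::2] = data[half:]
    return bytes(out)


size_before = len(zlib.compress(interleaved, 9))
size_after = len(zlib.compress(rearrange(interleaved), 9))

print("compressed as captured :", size_before, "bytes")
print("compressed rearranged  :", size_after, "bytes")
print("reversible:", restore(rearrange(interleaved)) == interleaved)
```

The rearrangement is itself lossless, so the decoder can undo it after decompression. Grouping the constant channel into one long run gives the compressor redundancy it can exploit, whereas in the interleaved order the noise breaks up every run.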
Lossy Compression
Lossy compression is compression in which some of the information from the
original message sequence is lost. This means the original sequence cannot be
regenerated exactly from the compressed sequence. Just because information is
lost doesn’t mean the perceived quality of the output is reduced.
For example, random noise has very high information content, but when present
in an image or a sound file, we would typically be perfectly happy to drop it.
Certain losses in images or sound might also be completely imperceptible to a
human viewer or listener (e.g. the loss of very high frequencies). For this
reason, lossy compression algorithms on images can often achieve a factor of 2
better compression than lossless algorithms with an imperceptible loss in
quality. However, when quality does start degrading in a noticeable way, it is
important to make sure it degrades in a way that is least objectionable to the
viewer (e.g., dropping random pixels is probably more objectionable than
dropping some colour information). For these reasons, the ways in which most
lossy compression techniques are used are highly dependent on the media being
compressed. Lossy compression for sound, for example, is very different from
lossy compression for images.
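A very simple way to see what "some approximation M'" means is uniform quantization: keep only the high-order bits of each sample. This is a toy sketch under stated assumptions (8-bit samples, a synthetic sine wave as the message, and the hypothetical helpers `lossy_encode` and `lossy_decode`); real image and audio codecs decide what to discard using perceptual models, which this example does not attempt.

```python
import math


def lossy_encode(samples: list[int], bits_dropped: int = 4) -> list[int]:
    """Quantize 8-bit samples by discarding their low-order bits."""
    return [s >> bits_dropped for s in samples]


def lossy_decode(codes: list[int], bits_dropped: int = 4) -> list[int]:
    """Reconstruct each sample as the midpoint of its quantization interval."""
    half_step = (1 << bits_dropped) >> 1
    return [(c << bits_dropped) + half_step for c in codes]


# Synthetic 8-bit "signal": one slow sine wave, values in 0..255.
M = [round(127.5 + 127.5 * math.sin(2 * math.pi * t / 50)) for t in range(200)]

C = lossy_encode(M)            # each code now fits in 4 bits instead of 8
M_approx = lossy_decode(C)     # M', an approximation of M

max_error = max(abs(a - b) for a, b in zip(M, M_approx))
print("exact reconstruction:", M_approx == M)
print("largest sample error:", max_error)
```

The original samples cannot be recovered from the codes, but every reconstructed sample is within 8 levels (out of 256) of the original. Whether an error of that size is noticeable depends on the medium, which is why practical lossy schemes are tailored to images, sound, or video.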