Binary Tree Predictive Coding
John A Robinson
john@userport.com
The Latest: APT
The pages you are now reading document an old version of BTPC.
Please see
this page
for information about APT 1.0, the successor to BTPC.
Introduction
Binary Tree Predictive Coding (BTPC) is an efficient compression method
for still images. While JPEG and GIF are both used for general-purpose
still-image coding (for example for inline images on WWW pages), JPEG is
intended for lossy coding of photos, and GIF for lossless coding of
graphics. In contrast, BTPC is designed to perform both lossless and
lossy compression, and to be effective
for both photos and graphics. It is also suitable for compressing
multimedia images, which integrate two or more types of visual material.
Typical examples are document pages containing both text and
continuous-tone natural images, annotated photographs, and presentation
graphics with inset photos.
Here is a photo, compressed to 1.34 bits per pel by JPEG.
BTPC 4.1 uses 0.94 bits per pel to achieve the same quality
(34.7 dB PSNR).
Here is a graphic, compressed to 0.81 bits per pel with GIF.
Lossless coding with BTPC 4.1 requires 0.87 bits per pel, but using lossy
BTPC at the default quality setting codes the picture at 0.73 bits per
pel and produces this output:
The reconstruction accuracy is 44.80 dB PSNR.
(Baseline JPEG cannot code losslessly, but provides similar reconstruction
accuracy (41.93 dB) at 2.06 bits per pel.)
In general, BTPC can usually code graphics at or below the GIF rate with
invisible degradations.
Here is a multimedia image, compressed to 2.36 bits per pel by JPEG.
BTPC 4.1 uses 1.57 bits per pel to achieve the same quality
(30.7 dB PSNR). (Lossless GIF coding requires 5.57 bits per pel.)
BTPC compresses color images too. Performance is superior to JPEG on almost
all images, and far superior on graphics (see the results section for more
detail). However, BTPC lossless compression of color graphics is significantly
more costly than GIF coding with a color palette. As with monochrome graphics,
lossy color graphic compression with transparent degradation is usually
possible at around the GIF rate, (Improvement of lossless performance on
color graphics is a current research priority. See Current and Future Work
below).
The purpose of this series of HTML documents is to provide an online
description of the methods used in BTPC.
I retain the copyright on the code and this documentation, but, so far
as I know, there are no patents associated with BTPC. I have written a
paper on BTPC 1 [1] which uses a simple predictor and a separate lossless
coding stage. This was submitted to IEEE Transactions on Image Processing
in December 1994 and is currently on its first round of review.
Unfortunately it is not available online.
However, most of the information is included
here, and BTPC 2, which has shape-based prediction and integrated adaptive
huffman coding, is also discussed. BTPC 2 is the version implemented in
the evaluation code. Some material from [2] is also included here.
Here are the sections of the documentation, with synopses:
- Design criteria for general-purpose
still-image coding General-purpose means
the method must compress photos, graphics, text, B/W, monochrome or
color, with or without loss. It must be time and space efficient.
- Image pyramids and trees Image pyramids
represent the picture information as a set of bands of increasing size
(number of sample points) and bandwidth. Because high frequency activity
is localized, higher bands have many zero points. Arranging pels into
trees allows leaf codewords to be used to signal that pels in successive
bands are all zero.
- Binary pyramid structure BTPC orders pels into
a binary pyramid. At each level of the pyramid, a "band" of pel values
L(n) is split into two "subbands" L(n+1) and H(n+1), each with half as
many pels. L(n+1) is a downsampling of L(n) done by removing every even
pel on every odd-numbered row and every odd pel on every even-numbered
row. H(n+1) stores the differences between the true values of the
intervening pels, and predicted values based on neighboring L pels. The
L(n+1) channel is then used to generate the next level of the pyramid
(L(n+2) and H(n+2)).
- Binary tree structure
Each pel in band n+1 is associated with two children
in band n, and a leaf codeword is used to indicate that a particular pel
and all its descendents are zero.
- BTPC 1 Predictor design The basic
predictor uses closest-opposite-pair prediction. The current pel is
surrounded by a square of four pels from earlier bands; the average
of two of these four pels is used to estimate the current pel's
value; the chosen two are at opposite corners of the prediction
square and are closer in value to each other than the other
opposite-pair.
- Shape-based predictor design BTPC 2 uses
shape-based prediction, where the relative values of the four surrounding
pels are interpreted as edge, flat, point, ridge, etc., and a different
prediction formed for each case.
- Quantizer design Lossy data compression is
achieved by quantizing the difference (H) bands. BTPC 2 uses linear
deadzone quantizers with representative levels in the middle of each
quantizer bin.
- Coder design The H band histograms
are highly peaked at 0 and roughly laplacian in shape, so entropy coding
provides further lossless compression. BTPC 2 uses integrated adaptive
Huffman coders (adapted from Unix compact).
- Implementation issues Encoding is a
three-pass process, decoding a one-pass. Shape-based prediction is done
efficiently with lookup tables. Adaptive Huffman coding requires walking
a tree for each symbol and is therefore the processing bottleneck.
- Experimental results and discussion For
lossless compression of photographs BTPC is far superior to JPEG and GIF,
but for lossless compression of graphics, it is slightly inferior to GIF.
For lossy compression of photographs, its performance is close to that of
JPEG, and usually superior. Lossy coding of graphics is less
objectionable with BTPC than with JPEG, and is often acceptable at the
(lossless) GIF rate.
- Current and future work Current activity
includes: dealing more effectively with color, speeding up decoding (and,
with less emphasis, encoding), applying BTPC to moving images - videos
and animations.
- References
Please send comments and questions to John Robinson.