1991-10-06 20:00:00 -04:00
|
|
|
|
1992-03-16 19:00:00 -05:00
|
|
|
JPEG SYSTEM ARCHITECTURE 29-FEB-92
|
1991-10-06 20:00:00 -04:00
|
|
|
|
|
|
|
|
|
|
|
This file provides an overview of the "architecture" of the portable JPEG
|
|
|
|
software; that is, the functions of the various modules in the system and the
|
|
|
|
interfaces between modules. For more precise details about any data structure
|
|
|
|
or calling convention, see the header files.
|
|
|
|
|
|
|
|
Important note: when I say "module" I don't mean "a C function", which is what
|
|
|
|
some people seem to think the term means. A separate C source file is closer
|
|
|
|
to the mark. Also, it is frequently the case that several different modules
|
|
|
|
present a common interface to callers; the term "object" or "method" refers to
|
|
|
|
this common interface (see "Poor man's object-oriented programming", below).
|
|
|
|
|
1992-03-16 19:00:00 -05:00
|
|
|
JPEG-specific terminology follows the JPEG standard:
|
1991-10-06 20:00:00 -04:00
|
|
|
A "component" means a color channel, e.g., Red or Luminance.
|
|
|
|
A "sample" is a pixel component value (i.e., one number in the image data).
|
|
|
|
A "coefficient" is a frequency coefficient (a DCT transform output number).
|
|
|
|
The term "block" refers to an 8x8 group of samples or coefficients.
|
|
|
|
"MCU" (minimum coded unit) is the same as "MDU" of the R8 draft; i.e., an
|
|
|
|
interleaved set of blocks of size determined by the sampling factors,
|
|
|
|
or a single block in a noninterleaved scan.
|
|
|
|
|
|
|
|
|
|
|
|
*** System requirements ***
|
|
|
|
|
|
|
|
We must support compression and decompression of both Huffman and
|
|
|
|
arithmetic-coded JPEG files. Any set of compression parameters allowed by the
|
|
|
|
JPEG spec should be readable for decompression. (We can be more restrictive
|
|
|
|
about what formats we can generate.) (Note: for legal reasons no arithmetic
|
|
|
|
coding implementation is currently included in the publicly available sources.
|
|
|
|
However, the architecture still supports it.)
|
|
|
|
|
|
|
|
We need to be able to handle both raw JPEG files (more specifically, the JFIF
|
|
|
|
format) and JPEG-in-TIFF (C-cubed's format, and perhaps Kodak's). Even if we
|
|
|
|
don't implement TIFF ourselves, other people will want to use our code for
|
|
|
|
that. This means that generation and scanning of the file header has to be
|
|
|
|
separated out.
|
|
|
|
|
|
|
|
Perhaps we should be prepared to support the JPEG lossless mode (also referred
|
|
|
|
to in the spec as spatial DPCM coding). A lot of people seem to believe they
|
|
|
|
need this... whether they really do is debatable, but the customer is always
|
|
|
|
right. On the other hand, there will not be much sharable code between the
|
|
|
|
lossless and lossy modes! At best, a lossless program could be derived from
|
|
|
|
parts of the lossy version. For now we will only worry about the lossy mode.
|
|
|
|
|
|
|
|
I see no real value in supporting the JPEG progressive modes (note that
|
|
|
|
spectral selection and successive approximation are two different progressive
|
|
|
|
modes). These are only of interest when painting the decompressed image in
|
|
|
|
real-time, which nobody is going to do with a pure software implementation.
|
|
|
|
|
|
|
|
There is some value in supporting the hierarchical mode, which allows for
|
|
|
|
successive frames of higher resolution. This could be of use for including
|
1992-03-16 19:00:00 -05:00
|
|
|
"thumbnail" representations. However, this appears to add a lot more
|
1991-10-06 20:00:00 -04:00
|
|
|
complexity than it is worth.
|
|
|
|
|
|
|
|
A variety of uncompressed image file formats and user interfaces must be
|
|
|
|
supported. These aspects therefore have to be kept separate from the rest of
|
|
|
|
the system. A particularly important issue is whether color quantization of
|
|
|
|
the output is needed (i.e., whether a colormap is used). We should be able to
|
|
|
|
support both adaptive quantization (which requires two or more passes over the
|
|
|
|
image) and nonadaptive (quantization to a prespecified colormap, which can be
|
|
|
|
done in one pass).
|
|
|
|
|
|
|
|
Memory usage is an important concern, since we will port this code to 80x86
|
|
|
|
and other limited-memory machines. For large intermediate structures, we
|
|
|
|
should be able to use either virtual memory or temporary files.
|
|
|
|
|
|
|
|
It should be possible to build programs that handle compression only,
|
|
|
|
decompression only, or both, without much duplicate or unused code in any
|
|
|
|
version. (In particular, a decompression-only version should have no extra
|
|
|
|
baggage.)
|
|
|
|
|
|
|
|
|
|
|
|
*** Compression overview ***
|
|
|
|
|
|
|
|
The *logical* steps needed in (non-lossless) JPEG compression are:
|
|
|
|
|
|
|
|
1. Conversion from incoming image format to a standardized internal form
|
1992-03-16 19:00:00 -05:00
|
|
|
(either RGB or grayscale).
|
1991-10-06 20:00:00 -04:00
|
|
|
|
|
|
|
2. Color space conversion (e.g., RGB to YCbCr). This is a null step for
|
1992-03-16 19:00:00 -05:00
|
|
|
grayscale (unless we support mapping color inputs to grayscale, which
|
1991-10-06 20:00:00 -04:00
|
|
|
would most easily be done here). Gamma adjustment may also be needed here.
|
|
|
|
|
|
|
|
3. Subsampling (reduction of number of samples in some color components).
|
|
|
|
This step operates independently on each color component.
|
|
|
|
|
|
|
|
4. MCU extraction (creation of a single sequence of 8x8 sample blocks).
|
|
|
|
This step and the following ones are performed once for each scan
|
|
|
|
in the output JPEG file, i.e., once if making an interleaved file and more
|
|
|
|
than once for a noninterleaved file.
|
|
|
|
Note: both this step and the previous one must deal with edge conditions
|
|
|
|
for pictures that aren't a multiple of the MCU dimensions. Alternately,
|
|
|
|
we could expand the picture to a multiple of an MCU before doing these
|
|
|
|
two steps. (The latter seems better and has been adopted below.)
|
|
|
|
|
|
|
|
5. DCT transformation of each 8x8 block.
|
|
|
|
|
|
|
|
6. Quantization scaling and zigzag reordering of the elements in each 8x8
|
|
|
|
block.
|
|
|
|
|
|
|
|
7. Huffman or arithmetic encoding of the transformed block sequence.
|
|
|
|
|
|
|
|
8. Output of the JPEG file with whatever headers/markers are wanted.
|
|
|
|
|
|
|
|
Of course, the actual implementation will combine some of these logical steps
|
|
|
|
for efficiency. The trick is to keep these logical functions as separate as
|
|
|
|
possible without losing too much performance.
|
|
|
|
|
|
|
|
In addition to these logical pipeline steps, we need various modules that
|
|
|
|
aren't part of the data pipeline. These are:
|
|
|
|
|
|
|
|
A. Overall control (sequencing of other steps & management of data passing).
|
|
|
|
|
|
|
|
B. User interface; this will determine the input and output files, and supply
|
|
|
|
values for some compression parameters. Note that this module is highly
|
|
|
|
platform-dependent.
|
|
|
|
|
|
|
|
C. Compression parameter selection: some parameters should be chosen
|
|
|
|
automatically rather than requiring the user to find a good value.
|
|
|
|
The prototype only does this for the back-end (Huffman or arithmetic)
|
|
|
|
parameters, but further in the future, more might be done. A
|
|
|
|
straightforward approach to selection is to try several values; this
|
|
|
|
requires being able to repeatedly apply some portion of the pipeline and
|
|
|
|
inspect the results (without actually outputting them). Probably only
|
|
|
|
entropy encoding parameters can reasonably be done this way; optimizing
|
|
|
|
earlier steps would require too much data to be reprocessed (not to mention
|
|
|
|
the problem of interactions between parameters for different steps).
|
|
|
|
What other facilities do we need to support automatic parameter selection?
|
|
|
|
|
|
|
|
D. A memory management module to deal with small-memory machines. This must
|
|
|
|
create the illusion of virtual memory for certain large data structures
|
|
|
|
(e.g., the subsampled image or the transformed coefficients).
|
|
|
|
The interface to this must be defined to minimize the overhead incurred,
|
|
|
|
especially on virtual-memory machines where the module won't do much.
|
|
|
|
|
|
|
|
In many cases we can arrange things so that a data stream is produced in
|
|
|
|
segments by one module and consumed by another without the need to hold it all
|
|
|
|
in (virtual) memory. This is obviously not possible for any data that must be
|
|
|
|
scanned more than once, so it won't work everywhere.
|
|
|
|
|
|
|
|
The major variable at this level of detail is whether the JPEG file is to be
|
|
|
|
interleaved or not; that affects the order of processing so fundamentally that
|
|
|
|
the central control module must know about it. Some of the other modules may
|
|
|
|
need to know it too. It would simplify life if we didn't need to support
|
|
|
|
noninterleaved images, but that is not reasonable.
|
|
|
|
|
|
|
|
Many of these steps operate independently on each color component; the
|
|
|
|
knowledge of how many components there are, and how they are interleaved,
|
|
|
|
ought to be confined to the central control module. (Color space conversion
|
|
|
|
and MCU extraction probably have to know it too.)
|
|
|
|
|
|
|
|
|
|
|
|
*** Decompression overview ***
|
|
|
|
|
|
|
|
Decompression is roughly the inverse process from compression, but there are
|
|
|
|
some additional steps needed to produce a good output image.
|
|
|
|
|
|
|
|
The *logical* steps needed in (non-lossless) JPEG decompression are:
|
|
|
|
|
|
|
|
1. Scanning of the JPEG file, decoding of headers/markers etc.
|
|
|
|
|
|
|
|
2. Huffman or arithmetic decoding of the coefficient sequence.
|
|
|
|
|
|
|
|
3. Quantization descaling and zigzag reordering of the elements in each 8x8
|
|
|
|
block.
|
|
|
|
|
|
|
|
4. MCU disassembly (conversion of a possibly interleaved sequence of 8x8
|
|
|
|
blocks back to separate components in pixel map order).
|
|
|
|
|
1992-03-16 19:00:00 -05:00
|
|
|
5. (Optional) Cross-block smoothing per JPEG section K.8 or a similar
|
1991-10-06 20:00:00 -04:00
|
|
|
algorithm. (Steps 5-8 operate independently on each component.)
|
|
|
|
|
|
|
|
6. Inverse DCT transformation of each 8x8 block.
|
|
|
|
|
|
|
|
7. De-subsampling. At this point a pixel image of the original dimensions
|
|
|
|
has been recreated.
|
|
|
|
|
|
|
|
8. Post-subsampling smoothing. This can be combined with de-subsampling,
|
|
|
|
by using a convolution-like calculation to generate each output pixel
|
|
|
|
directly from one or more input pixels.
|
|
|
|
|
|
|
|
9. Cropping to the original pixel dimensions (throwing away duplicated
|
|
|
|
pixels at the edges). It is most convenient to do this now, as the
|
|
|
|
preceding steps are simplified by not having to worry about odd picture
|
|
|
|
sizes.
|
|
|
|
|
|
|
|
10. Color space reconversion (e.g., YCbCr to RGB). This is a null step for
|
1992-03-16 19:00:00 -05:00
|
|
|
grayscale. (Note that mapping a color JPEG to grayscale output is most
|
|
|
|
easily done in this step.) Gamma adjustment may also be needed here.
|
1991-10-06 20:00:00 -04:00
|
|
|
|
|
|
|
11. Color quantization (only if a colormapped output format is requested).
|
1992-03-16 19:00:00 -05:00
|
|
|
NOTE: it is probably preferable to perform quantization in the internal
|
|
|
|
(JPEG) colorspace rather than the output colorspace. Doing it that way,
|
|
|
|
color conversion need only be applied to the colormap entries, not to
|
|
|
|
every pixel; and quantization gets to operate in a non-gamma-corrected
|
|
|
|
space. But the internal space may not be suitable for some algorithms.
|
|
|
|
The system design is such that only the color quantizer module knows
|
|
|
|
whether color conversion happens before or after quantization.
|
1991-10-06 20:00:00 -04:00
|
|
|
|
|
|
|
12. Writing of the desired image format.
|
|
|
|
|
|
|
|
As before, some of these will be combined into single steps. When dealing
|
|
|
|
with a noninterleaved JPEG file, steps 2-9 will be performed once for each
|
1992-03-16 19:00:00 -05:00
|
|
|
scan; the resulting data will need to be buffered up so that steps 10-12 can
|
1991-10-06 20:00:00 -04:00
|
|
|
process all the color components together.
|
|
|
|
|
|
|
|
The same auxiliary modules are needed as before, except for compression
|
|
|
|
parameter selection. Note that rerunning a pipeline stage should never be
|
|
|
|
needed during decompression. This may allow a simpler control module. The
|
|
|
|
user interface might also be simpler since it need not supply any compression
|
|
|
|
parameters.
|
|
|
|
|
|
|
|
As before, not all of these steps require the whole image to be stored.
|
|
|
|
Actually, two-pass color quantization is the only step that logically requires
|
|
|
|
this; everything else could be done a few raster lines at a time (at least for
|
|
|
|
interleaved images). We might want to make color quantization be a separate
|
|
|
|
program because of this fact.
|
|
|
|
|
|
|
|
Again, many of the steps should be able to work on one color component in
|
|
|
|
ignorance of the other components.
|
|
|
|
|
|
|
|
|
|
|
|
*** Implications of noninterleaved formats ***
|
|
|
|
|
|
|
|
Much of the work can be done in a single pass if an interleaved JPEG file
|
|
|
|
format is used. With a noninterleaved JPEG file, separating or recombining
|
|
|
|
the components will force use of virtual memory (on a small-memory machine,
|
|
|
|
we probably would want one temp file per color component).
|
|
|
|
|
|
|
|
If any of the image formats we read or write are noninterleaved, the opposite
|
|
|
|
condition might apply: processing a noninterleaved JPEG file would be more
|
|
|
|
efficient. Offhand, though, I can't think of any popular image formats that
|
|
|
|
work that way; besides the win would only come if the same color space were
|
|
|
|
used in JPEG and non-JPEG files. It's not worth the complexity to make the
|
|
|
|
system design accommodate that case efficiently.
|
|
|
|
|
|
|
|
An argument against interleaving is that it makes the decompressor need more
|
|
|
|
memory for cross-block smoothing (since the minimum processable chunk of the
|
|
|
|
image gets bigger). With images more than 1000 pixels across, 80x86 machines
|
|
|
|
are likely to have difficulty in handling this feature.
|
|
|
|
|
|
|
|
Another argument against interleaving is that the noninterleaved format allows
|
|
|
|
a wider range of sampling factors, since the limit of ten blocks per MCU no
|
|
|
|
longer applies. We could get around this by blithely ignoring the spec's
|
|
|
|
limit of ten blocks, but that seems like a bad idea (especially since it makes
|
|
|
|
the above problem worse).
|
|
|
|
|
|
|
|
The upshot is that we need to support both interleaved and noninterleaved JPEG
|
|
|
|
formats, since for any given machine and picture size one may be much more
|
|
|
|
efficient than the other. However, the non-JPEG format we convert to or from
|
|
|
|
will be assumed to be an interleaved format (i.e., it produces or stores all
|
|
|
|
the components of a pixel together).
|
|
|
|
|
|
|
|
I do not think it is necessary for the compressor to be able to output
|
|
|
|
partially-interleaved formats (multiple scans, some of which interleave a
|
|
|
|
subset of the components). However, the decompressor must be able to read
|
|
|
|
such files to conform to the spec.
|
|
|
|
|
|
|
|
|
|
|
|
*** Data formats ***
|
|
|
|
|
|
|
|
Pipeline steps that work on pixel sample values will use the following data
|
|
|
|
structure:
|
|
|
|
|
|
|
|
typedef something JSAMPLE; a pixel component value, 0..MAXJSAMPLE
|
|
|
|
typedef JSAMPLE *JSAMPROW; ptr to a row of samples
|
|
|
|
typedef JSAMPROW *JSAMPARRAY; ptr to a list of rows
|
|
|
|
typedef JSAMPARRAY *JSAMPIMAGE; ptr to a list of color-component arrays
|
|
|
|
|
|
|
|
The basic element type JSAMPLE will be one of unsigned char, (signed) char, or
|
|
|
|
unsigned short. Unsigned short will be used if samples wider than 8 bits are
|
|
|
|
to be supported (this is a compile-time option). Otherwise, unsigned char is
|
|
|
|
used if possible. If the compiler only supports signed chars, then it is
|
|
|
|
necessary to mask off the value when reading. Thus, all reads of sample
|
|
|
|
values should be coded as "GETJSAMPLE(value)", where the macro will be defined
|
|
|
|
as "((value)&0xFF)" on signed-char machines and "(value)" elsewhere.
|
|
|
|
|
|
|
|
With these conventions, JSAMPLE values can be assumed to be >= 0. This should
|
|
|
|
simplify correct rounding during subsampling, etc. The JPEG draft's
|
|
|
|
specification that sample values run from -128..127 will be accommodated by
|
|
|
|
subtracting 128 just as the sample value is copied into the source array for
|
|
|
|
the DCT step (this will be an array of signed shorts or longs). Similarly,
|
|
|
|
during decompression the output of the IDCT step will be immediately shifted
|
|
|
|
back to 0..255. (NB: different values are required when 12-bit samples are in
|
|
|
|
use. The code should be written in terms of MAXJSAMPLE and CENTERJSAMPLE,
|
|
|
|
which will be #defined as 255 and 128 respectively in an 8-bit implementation,
|
|
|
|
and as 4095 and 2048 in a 12-bit implementation.)
|
|
|
|
|
|
|
|
On compilers that don't support "unsigned short", signed short can be used for
|
|
|
|
a 12-bit implementation. To support lossless coding (which allows up to
|
|
|
|
16-bit data precision) masking with 0xFFFF in GETJSAMPLE might be necessary.
|
|
|
|
(But if "int" is 16 bits then using "unsigned int" is the best solution.)
|
|
|
|
|
|
|
|
Notice that we use a pointer per row, rather than a two-dimensional JSAMPLE
|
|
|
|
array. This choice costs only a small amount of memory and has several
|
|
|
|
benefits:
|
|
|
|
|
|
|
|
* Code using the data structure doesn't need to know the allocated width of
|
|
|
|
the rows. This will simplify edge expansion/compression, since we can work
|
|
|
|
in an array that's wider than the logical picture width.
|
|
|
|
|
|
|
|
* The rows forming a component array may be allocated at different times
|
|
|
|
without extra copying. This will simplify working a few scanlines at a time,
|
|
|
|
especially in smoothing steps that need access to the previous and next rows.
|
|
|
|
|
|
|
|
* Indexing doesn't require multiplication; this is a performance win on many
|
|
|
|
machines.
|
|
|
|
|
|
|
|
Note that each color component is stored in a separate array; we don't use the
|
|
|
|
traditional structure in which the components of a pixel are stored together.
|
|
|
|
This simplifies coding of steps that work on each component independently,
|
|
|
|
because they don't need to know how many components there are. Furthermore,
|
|
|
|
we can read or write each component to a temp file independently, which is
|
|
|
|
helpful when dealing with noninterleaved JPEG files.
|
|
|
|
|
|
|
|
A specific sample value will be accessed by code such as
|
|
|
|
GETJSAMPLE(image[colorcomponent][row][col])
|
|
|
|
where col is measured from the image left edge, but row is measured from the
|
|
|
|
first sample row currently in memory. Either of the first two indexings can
|
|
|
|
be precomputed by copying the relevant pointer.
|
|
|
|
|
|
|
|
|
|
|
|
Pipeline steps that work on frequency-coefficient values will use the
|
|
|
|
following data structure:
|
|
|
|
|
|
|
|
typedef short JCOEF; a 16-bit signed integer
|
|
|
|
typedef JCOEF JBLOCK[64]; an 8x8 block of coefficients
|
|
|
|
typedef JBLOCK *JBLOCKROW; ptr to one horizontal row of 8x8 blocks
|
|
|
|
typedef JBLOCKROW *JBLOCKARRAY; ptr to a list of such rows
|
|
|
|
typedef JBLOCKARRAY *JBLOCKIMAGE; ptr to a list of color component arrays
|
|
|
|
|
|
|
|
The underlying type is always a 16-bit signed integer (this is "short" on all
|
|
|
|
machines of interest, but let's use the typedef name anyway). These are
|
|
|
|
grouped into 8x8 blocks (we should use #defines DCTSIZE and DCTSIZE2 rather
|
|
|
|
than "8" and "64"). The contents of a block may be either in "natural" or
|
|
|
|
zigzagged order, and may be true values or divided by the quantization
|
|
|
|
coefficients, depending on where the block is in the pipeline.
|
|
|
|
|
|
|
|
Notice that the allocation unit is now a row of 8x8 blocks, corresponding to
|
|
|
|
eight rows of samples. Otherwise the structure is much the same as for
|
|
|
|
samples, and for the same reasons.
|
|
|
|
|
|
|
|
On machines where malloc() can't handle a request bigger than 64Kb, this data
|
|
|
|
structure limits us to rows of less than 512 JBLOCKs, which would be a picture
|
|
|
|
width of 4000 pixels. This seems an acceptable restriction.
|
|
|
|
|
|
|
|
|
|
|
|
On 80x86 machines, the bottom-level pointer types (JSAMPROW and JBLOCKROW)
|
|
|
|
must be declared as "far" pointers, but the upper levels can be "near"
|
|
|
|
(implying that the pointer lists are allocated in the DS segment).
|
|
|
|
To simplify sharing code, we'll have a #define symbol FAR, which expands to
|
|
|
|
the "far" keyword when compiling on 80x86 machines and to nothing elsewhere.
|
|
|
|
|
|
|
|
|
|
|
|
The data arrays used as input and output of the DCT transform subroutine will
|
|
|
|
be declared using a separate typedef; they could be arrays of "short", "int"
|
|
|
|
or "long" independently of the above choices. This would depend on what is
|
|
|
|
needed to make the compiler generate correct and efficient multiply/add code
|
|
|
|
in the DCT inner loops. No significant speed or memory penalty will be paid
|
|
|
|
to have a different representation than is used in the main image storage
|
|
|
|
arrays, since some additional value-by-value processing is done at the time of
|
|
|
|
creation or extraction of the DCT data anyway (e.g., add/subtract 128).
|
|
|
|
|
|
|
|
|
|
|
|
*** Poor man's object-oriented programming ***
|
|
|
|
|
|
|
|
It should be pretty clear by now that we have a lot of quasi-independent
|
|
|
|
steps, many of which have several possible behaviors. To avoid cluttering the
|
|
|
|
code with lots of switch statements, we'll use a simple form of object-style
|
|
|
|
programming to separate out the different possibilities.
|
|
|
|
|
|
|
|
For example, Huffman and arithmetic coding will be implemented as two separate
|
|
|
|
modules that present the same external interface; at runtime, the calling code
|
|
|
|
will access the proper module indirectly through an "object".
|
|
|
|
|
|
|
|
We can get the limited features we need while staying within portable C. The
|
|
|
|
basic tool is a function pointer. An "object" is just a struct containing one
|
|
|
|
or more function pointer fields, each of which corresponds to a method name in
|
|
|
|
real object-oriented languages. During initialization we fill in the function
|
|
|
|
pointers with references to whichever module we have determined we need to use
|
|
|
|
in this run. Then invocation of the module is done by indirecting through a
|
|
|
|
function pointer; on most architectures this is no more expensive (and
|
|
|
|
possibly cheaper) than a switch, which would be the only other way of making
|
|
|
|
the required run-time choice. The really significant benefit, of course, is
|
|
|
|
keeping the source code clean and well structured.
|
|
|
|
|
|
|
|
For example, the interface for entropy decoding (Huffman or arithmetic
|
|
|
|
decoding) might look like this:
|
|
|
|
|
|
|
|
struct function_ptr_struct {
|
|
|
|
...
|
|
|
|
/* Entropy decoding methods */
|
|
|
|
void (*prepare_for_scan) ();
|
|
|
|
void (*get_next_mcu) ();
|
|
|
|
...
|
|
|
|
};
|
|
|
|
|
|
|
|
typedef struct function_ptr_struct * function_ptrs;
|
|
|
|
|
|
|
|
The struct pointer is what will actually be passed around. A call site might
|
|
|
|
look like this:
|
|
|
|
|
|
|
|
some_function (function_ptrs fptrs)
|
|
|
|
{
|
|
|
|
...
|
|
|
|
(*fptrs->get_next_mcu) (...);
|
|
|
|
...
|
|
|
|
}
|
|
|
|
|
|
|
|
(It might be worth inventing some specialized macros to hide the rather ugly
|
|
|
|
syntax for method definition and call.) Note that the caller doesn't know how
|
|
|
|
many different get_next_mcu procedures there are, what their real names are,
|
|
|
|
nor how to choose which one to call.
|
|
|
|
|
|
|
|
An important benefit of this scheme is that it is easy to provide multiple
|
|
|
|
versions of any method, each tuned to a particular case. While a lot of
|
|
|
|
precalculation might be done to select an optimal implementation of a method,
|
|
|
|
the cost per invocation is constant. For example, the MCU extraction step
|
|
|
|
might have a "generic" method, plus one or more "hardwired" methods for the
|
|
|
|
most popular sampling factors; the hardwired methods would be faster because
|
|
|
|
they'd use straight-line code instead of for-loops. The cost to determine
|
|
|
|
which method to use is paid only once, at startup, and the selection criteria
|
|
|
|
are hidden from the callers of the method.
|
|
|
|
|
|
|
|
This plan differs a little bit from usual object-oriented structures, in that
|
|
|
|
only one instance of each object class will exist during execution. The
|
|
|
|
reason for having the class structure is that on different runs we may create
|
|
|
|
different instances (choose to execute different modules).
|
|
|
|
|
|
|
|
To minimize the number of object pointers that have to be passed around, it
|
|
|
|
will be easiest to have just a few big structs containing all the method
|
1992-03-16 19:00:00 -05:00
|
|
|
pointers. We'll actually use two such structs, one for "system-dependent"
|
|
|
|
methods (memory allocation and error handling) and one for everything else.
|
1991-10-06 20:00:00 -04:00
|
|
|
|
|
|
|
Because of this choice, it's best not to think of an "object" as a specific
|
|
|
|
data structure. Rather, an "object" is just a group of related methods.
|
|
|
|
There would typically be one or more C modules (source files) providing
|
|
|
|
concrete implementations of those methods. You can think of the term
|
|
|
|
"method" as denoting the common interface presented by some set of functions,
|
|
|
|
and "object" as denoting a group of common method interfaces, or the total
|
|
|
|
shared interface behavior of a group of modules.
|
|
|
|
|
|
|
|
|
|
|
|
*** Data chunk sizes ***
|
|
|
|
|
|
|
|
To make the cost of this object-oriented style really minimal, we should make
|
|
|
|
sure that each method call does a fair amount of computation. To do that we
|
|
|
|
should pass large chunks of data around; for example, the colorspace
|
|
|
|
conversion method should process much more than one pixel per call.
|
|
|
|
|
|
|
|
For many steps, the most natural unit of data seems to be an "MCU row".
|
|
|
|
This consists of one complete horizontal strip of the image, as high as an
|
|
|
|
MCU. In a noninterleaved scan, an MCU row is always eight samples high (when
|
|
|
|
looking at samples) or one 8x8 block high (when looking at coefficients). In
|
|
|
|
an interleaved scan, an MCU row consists of all the data for one horizontal
|
|
|
|
row of MCUs; this may be from one to four blocks high (eight to thirty-two
|
|
|
|
samples) depending on the sampling factors. The height and width of an MCU
|
|
|
|
row may be different in each component. (Note that the height and width of an
|
|
|
|
MCU row changes at the subsampling and de-subsampling steps. An unsubsampled
|
|
|
|
image has the same size in each component. The preceding statements apply to
|
|
|
|
the subsampled dimensions.)
|
|
|
|
|
|
|
|
For example, consider a 1024-pixel-wide image using (2h:2v)(1h:1v)(1h:1v)
|
|
|
|
subsampling. In the noninterleaved case, an MCU row of Y would contain 8x1024
|
|
|
|
samples or the same number of frequency coefficients, so it would occupy
|
|
|
|
8K bytes (samples) or 16K bytes (coefficients). An MCU row of Cb or Cr would
|
|
|
|
contain 8x512 samples and occupy half as much space. In the interleaved case,
|
|
|
|
an MCU row would contain 16x1024 Y samples, 8x512 Cb and 8x512 Cr samples, so
|
|
|
|
a total of 24K (samples) or 48K (coefficients) would be needed. This is a
|
|
|
|
reasonable amount of data to expect to retain in memory at one time. (Bear in
|
|
|
|
mind that we'll usually need to have several MCU rows resident in memory at
|
|
|
|
once, at the inputs and outputs to various pipeline steps.)
|
|
|
|
|
|
|
|
The worst case is probably (2h:4v)(1h:1v)(1h:1v) interleaving (this uses 10
|
|
|
|
blocks per MCU, which is the maximum allowed by the spec). An MCU will then
|
|
|
|
contain 32 sample rows worth of Y, so it would occupy 40K or 80K bytes for a
|
|
|
|
1024-pixel-wide image. The most memory-intensive step is probably cross-block
|
|
|
|
smoothing, for which we'd need 3 MCU rows of coefficients as input and another
|
|
|
|
one as output; that would be 320K of working storage. Anything much larger
|
|
|
|
would not fit in an 80x86 machine. (To decompress wider pictures on an 80x86,
|
|
|
|
we'll have to skip cross-block smoothing or else use temporary files.)
|
|
|
|
|
|
|
|
This unit is thus a reasonable-sized chunk for passing through the pipeline.
|
|
|
|
Of course, its major advantage is that it is a natural chunk size for the MCU
|
|
|
|
assembly and disassembly steps to work with.
|
|
|
|
|
|
|
|
For the entropy (Huffman or arithmetic) encoding/decoding steps, the most
|
|
|
|
convenient chunk is a single MCU: one 8x8 block if not interleaved, three to
|
|
|
|
ten such blocks if interleaved. The advantage of this is that when handling
|
|
|
|
interleaved data, the blocks have the same sequence of component membership on
|
|
|
|
each call. (For example, Y,Y,Y,Y,Cb,Cr when using (2h:2v)(1h:1v)(1h:1v)
|
|
|
|
subsampling.) The code needs to know component membership so that it can
|
|
|
|
apply the right set of compression coefficients to each block. A prebuilt
|
|
|
|
array describing this membership can be used during each call. This chunk
|
|
|
|
size also makes it easy to handle restart intervals: just count off one MCU
|
|
|
|
per call and reinitialize when the count reaches zero (restart intervals are
|
|
|
|
specified in numbers of MCU).
|
|
|
|
|
|
|
|
For similar reasons, one MCU is also the best chunk size for the frequency
|
|
|
|
coefficient quantization and dequantization steps.
|
|
|
|
|
|
|
|
For subsampling and desubsampling, the best chunk size is to have each call
|
|
|
|
transform Vk sample rows from or to Vmax sample rows (Vk = this component's
|
|
|
|
vertical sampling factor, Vmax = largest vertical sampling factor). There are
|
|
|
|
eight such chunks in each MCU row. Using a whole MCU row as the chunk size
|
|
|
|
would reduce function call overhead a fraction, but would imply more buffering
|
|
|
|
to provide context for cross-pixel smoothing.
|
|
|
|
|
|
|
|
|
|
|
|
*** Compression object structure ***
|
|
|
|
|
|
|
|
I propose the following set of objects for the compressor. Here an "object"
|
|
|
|
is the common interface for one or more modules having comparable functions.
|
|
|
|
|
|
|
|
Most of these objects can be justified as information-hiding modules.
|
|
|
|
I've indicated what information is private to each object/module.
|
|
|
|
|
|
|
|
Note that in all cases, the caller of a method is expected to have allocated
|
|
|
|
any storage needed for it to return its result. (Typically this storage can
|
|
|
|
be re-used in successive calls, so malloc'ing and free'ing once per call is
|
|
|
|
not reasonable.) Also, much of the context required (compression parameters,
|
|
|
|
image size, etc) will be passed around in large common data structures, which
|
|
|
|
aren't described here; see the header files. Notice that any object that
|
|
|
|
might need to allocate working storage receives an "init" and a "term" call;
|
|
|
|
"term" should be careful to free all allocated storage so that the JPEG system
|
|
|
|
can be used multiple times during a program run. (For the same reason,
|
1992-03-16 19:00:00 -05:00
|
|
|
depending on static initialization of variables is a no-no. The only
|
|
|
|
exception to the free-all-allocated-storage rule is that storage allocated for
|
|
|
|
the entire processing of an image need not be explicitly freed, since the
|
|
|
|
memory manager's free_all cleanup will free it.)
|
1991-10-06 20:00:00 -04:00
|
|
|
|
|
|
|
1. Input file conversion to standardized form. This provides these methods:
|
|
|
|
input_init: read the file header, report image size & component count.
|
|
|
|
get_input_row: read one pixel row, return it in our standard format.
|
|
|
|
input_term: finish up at the end.
|
|
|
|
In implementations that support multiple input formats, input_init could
|
|
|
|
set up an appropriate get_input_row method depending on the format it
|
|
|
|
finds. Note that in most applications, the selection and opening of the
|
|
|
|
input file will be under the control of the user interface module; and
|
|
|
|
indeed the user interface may have already read the input header, so that
|
|
|
|
all that input_init may have to do is return previously saved values. The
|
|
|
|
behind-the-scenes interaction between this object and the user interface is
|
|
|
|
not specified by this architecture.
|
|
|
|
(Hides format of input image and mechanism used to read it. This code is
|
|
|
|
likely to vary considerably from one implementation to another. Note that
|
|
|
|
the color space and number of color components of the source are not hidden;
|
|
|
|
but they are used only by the next object.)
|
|
|
|
|
|
|
|
2. Gamma and color space conversion. This provides three methods:
|
|
|
|
colorin_init: initialization.
|
|
|
|
get_sample_rows: read, convert, and return a specified number of pixel
|
|
|
|
rows (not more than remain in the picture).
|
|
|
|
colorin_term: finish up at the end.
|
|
|
|
The most efficient approach seems to be for this object to call
|
|
|
|
get_input_row directly, rather than being passed the input data; that way,
|
|
|
|
any intermediate storage required can be local to this object.
|
|
|
|
(get_sample_rows might tell get_input_row to read directly into its own
|
|
|
|
output area and then convert in place; or it may do something different.
|
|
|
|
For example, conversion in place wouldn't work if it is changing the number
|
|
|
|
of color components.) The output of this step is in the standardized
|
|
|
|
sample array format shown previously.
|
|
|
|
(Hides all knowledge of color space semantics and conversion. Remaining
|
|
|
|
modules only need to know the number of JPEG components.)
|
|
|
|
|
|
|
|
3. Edge expansion: needs only a single method.
|
|
|
|
edge_expand: Given an NxM sample array, expand to a desired size (a
|
|
|
|
multiple of the MCU dimensions) by duplicating the last
|
|
|
|
row or column. Repeat for each component.
|
|
|
|
Expansion will occur in place, so the caller must have pre-allocated enough
|
|
|
|
storage. (I'm assuming that it is easier and faster to do this expansion
|
|
|
|
than it is to worry about boundary conditions in the next two steps.
|
|
|
|
Notice that vertical expansion will occur only once, at the bottom of the
|
|
|
|
picture, so only horizontal expansion by a few pixels is speed-critical.)
|
|
|
|
(This doesn't really hide any information, so maybe it could be a simple
|
|
|
|
subroutine instead of a method. Depends on whether we want to be able to
|
|
|
|
use alternative, optimized methods.)
|
|
|
|
|
|
|
|
4. Subsampling: this will be applied to one component at a time.
|
|
|
|
subsample_init: initialize (precalculate convolution factors, for
|
|
|
|
example). This will be called once per scan.
|
|
|
|
subsample: Given a sample array, reduce it to a smaller number of
|
|
|
|
samples using specified sampling factors.
|
|
|
|
subsample_term: clean up at the end of a scan.
|
|
|
|
If the current component has vertical sampling factor Vk and the largest
|
|
|
|
sampling factor is Vmax, then the input is always Vmax sample rows (whose
|
|
|
|
width is a multiple of Hmax) and the output is always Vk sample rows.
|
|
|
|
Vmax additional rows above and below the nominal input rows are also passed
|
|
|
|
for use by partial-pixel-averaging sampling methods. (Is this necessary?)
|
|
|
|
At the top and bottom of the image, these extra rows are copies of the
|
|
|
|
first or last actual input row.
|
|
|
|
(This hides whether and how cross-pixel averaging occurs.)
|
|
|
|
|
|
|
|
5. MCU extraction (creation of a single sequence of 8x8 sample blocks).
|
|
|
|
extract_init: initialize as needed. This will be called once per scan.
|
|
|
|
extract_MCUs: convert a sample array to a sequence of MCUs.
|
|
|
|
extract_term: clean up at the end of a scan.
|
|
|
|
Given one or more MCU rows worth of image data, extract sample blocks in the
|
|
|
|
appropriate order; pass these off to subsequent steps one MCU at a time.
|
|
|
|
The input must be a multiple of the MCU dimensions. It will probably be
|
|
|
|
most convenient for the DCT transform, frequency quantization, and zigzag
|
|
|
|
reordering of each block to be done as simple subroutines of this step.
|
|
|
|
Once a transformed MCU has been completed, it'll be passed off to a
|
|
|
|
method call, which will be passed as a parameter to extract_MCUs.
|
|
|
|
That routine might either encode and output the MCU immediately, or buffer
|
|
|
|
it up for later output if we want to do global optimization of the entropy
|
|
|
|
encoding coefficients. Note: when outputting a noninterleaved file this
|
|
|
|
object will be called separately for each component. Direct output could
|
|
|
|
be done for the first component, but the others would have to be buffered.
|
|
|
|
(Again, an object mainly on the grounds that multiple instantiations might
|
|
|
|
be useful.)
|
|
|
|
|
|
|
|
6. DCT transformation of each 8x8 block. This probably doesn't have to be a
|
|
|
|
full-fledged method, but just a plain subroutine that will be called by MCU
|
|
|
|
extraction. One 8x8 block will be processed per call.
|
|
|
|
|
|
|
|
7. Quantization scaling and zigzag reordering of the elements in each 8x8
|
|
|
|
block. (This can probably be a plain subroutine called once per block by
|
|
|
|
MCU extraction; hard to see a need for multiple instantiations here.)
|
|
|
|
|
|
|
|
8. Entropy encoding (Huffman or arithmetic).
|
|
|
|
entropy_encoder_init: prepare for one scan.
|
|
|
|
entropy_encode: accepts an MCU's worth of quantized coefficients,
|
|
|
|
encodes and outputs them.
|
|
|
|
entropy_encoder_term: finish up at end of a scan (dump any buffered
|
|
|
|
bytes, for example).
|
|
|
|
The data output by this module will be sent to the entropy_output method
|
|
|
|
provided by the pipeline controller. (It will probably be worth using
|
|
|
|
buffering to pass multiple bytes per call of the output method.) The
|
|
|
|
output method could be just write_jpeg_data, but might also be a dummy
|
|
|
|
routine that counts output bytes (for use during cut-and-try coefficient
|
|
|
|
optimization).
|
|
|
|
(This hides which entropy encoding method is in use.)
|
|
|
|
|
|
|
|
9. JPEG file header construction. This will provide these methods:
|
|
|
|
write_file_header: output the initial header.
|
|
|
|
write_scan_header: output scan header (called once per component
|
|
|
|
if noninterleaved mode).
|
|
|
|
write_jpeg_data: the actual data output method for the preceding step.
|
|
|
|
write_scan_trailer: finish up after one scan.
|
|
|
|
write_file_trailer: finish up at end of file.
|
|
|
|
Note that compressed data is passed to the write_jpeg_data method, in case
|
|
|
|
a simple fwrite isn't appropriate for some reason.
|
|
|
|
(This hides which variant JPEG file format is being written. Also, the
|
|
|
|
actual mechanism for writing the file is private to this object and the
|
|
|
|
user interface.)
|
|
|
|
|
|
|
|
10. Pipeline control. This object will provide the "main loop" that invokes
|
|
|
|
all the pipeline objects. Note that we will need several different main
|
|
|
|
loops depending on the situation (interleaved output or not, global
|
|
|
|
optimization of encoding parameters or not, etc). This object will do
|
|
|
|
most of the memory allocation, since it will provide the working buffers
|
|
|
|
that are the inputs and outputs of the pipeline steps.
|
|
|
|
(An object mostly to support multiple instantiations; however, overall
|
|
|
|
memory management and sequencing of operations are known only here.)
|
|
|
|
|
|
|
|
11. Overall control. This module will provide at least two routines:
|
|
|
|
jpeg_compress: the main entry point to the compressor.
|
|
|
|
per_scan_method_selection: called by pipeline controllers for
|
|
|
|
secondary method selection passes.
|
|
|
|
jpeg_compress is invoked from the user interface after the UI has selected
|
|
|
|
the input and output files and obtained values for all compression
|
|
|
|
parameters that aren't dynamically determined. jpeg_compress performs
|
|
|
|
basic initialization (e.g., calculating the size of MCUs), does the
|
|
|
|
"global" method selection pass, and finally calls the selected pipeline
|
|
|
|
control object. (Per-scan method selections will be invoked by the
|
|
|
|
pipeline controller.)
|
|
|
|
Note that jpeg_compress can't be a method since it is invoked prior to
|
|
|
|
method selection.
|
|
|
|
|
|
|
|
12. User interface; this is the architecture's term for "the rest of the
|
|
|
|
application program", i.e., that which invokes the JPEG compressor. In a
|
|
|
|
standalone JPEG compression program the UI need be little more than a C
|
|
|
|
main() routine and argument parsing code; but we can expect that the JPEG
|
|
|
|
compressor may be incorporated into complex graphics applications, wherein
|
|
|
|
the UI is much more complex. Much of the UI will need to be written
|
|
|
|
afresh for each non-Unix-like platform the compressor is ported to.
|
|
|
|
The UI is expected to supply input and output files and values for all
|
|
|
|
non-automatically-chosen compression parameters. (Hence defaults are
|
1992-03-16 19:00:00 -05:00
|
|
|
determined by the UI; we should provide helpful routines to fill in
|
|
|
|
the recommended defaults.) The UI must also supply error handling
|
1991-10-06 20:00:00 -04:00
|
|
|
routines and some mechanism for trace messages.
|
|
|
|
(This module hides the user interface provided --- command line,
|
|
|
|
interactive, etc. Except for error/message handling, the UI calls the
|
|
|
|
portable JPEG code, not the other way around.)
|
|
|
|
|
|
|
|
13. (Optional) Compression parameter selection control.
|
|
|
|
entropy_optimize: given an array of MCUs ready to be fed to entropy
|
|
|
|
encoding, find optimal encoding parameters.
|
|
|
|
The actual optimization algorithm ought to be separated out as an object,
|
|
|
|
even though a special pipeline control method will be needed. (The
|
|
|
|
pipeline controller only has to understand that the output of extract_MCUs
|
|
|
|
must be built up as a virtual array rather than fed directly to entropy
|
|
|
|
encoding and output. This pipeline behavior may also be useful for future
|
|
|
|
implementation of hierarchical modes, etc.)
|
|
|
|
To minimize the amount of control logic in the optimization module, the
|
|
|
|
pipeline control doesn't actually hand over big-array pointers, but rather
|
|
|
|
an "iterator": a function which knows how to scan the stored image.
|
|
|
|
(This hides the details of the parameter optimization algorithm.)
|
|
|
|
|
|
|
|
The present design doesn't allow for multiple passes at earlier points
|
|
|
|
in the pipeline, but allowing that would only require providing some
|
|
|
|
new pipeline control methods; nothing else need change.
|
|
|
|
|
|
|
|
14. A memory management object. This will provide methods to allocate "small"
|
|
|
|
things and "big" things. Small things have to fit in memory and you get
|
|
|
|
back direct pointers (this could be handled by direct calls to malloc, but
|
|
|
|
it's cleaner not to assume malloc is the right routine). "Big" things
|
|
|
|
mean buffered images for multiple passes, noninterleaved output, etc.
|
|
|
|
In this case the memory management object will give you room for a few MCU
|
|
|
|
rows and you have to ask for access to the next few; dumping and reloading
|
|
|
|
in a temporary file will go on behind the scenes. (All big objects are
|
|
|
|
image arrays containing either samples or coefficients, and will be
|
|
|
|
scanned top-to-bottom some number of times, so we can apply this access
|
|
|
|
model easily.) On a platform with virtual memory, the memory manager can
|
|
|
|
treat small and big things alike: just malloc up enough virtual memory for
|
|
|
|
the whole image, and let the operating system worry about swapping the
|
|
|
|
image to disk.
|
|
|
|
|
|
|
|
Most of the actual calls on the memory manager will be made from pipeline
|
|
|
|
control objects; changing any data item from "small" to "big" status would
|
|
|
|
require a new pipeline control object, since it will contain the logic to
|
|
|
|
ask for a new chunk of a big thing. Thus, one way in which pipeline
|
|
|
|
controllers will vary is in which structures they treat as big.
|
|
|
|
|
|
|
|
The memory manager will need to be told roughly how much space is going to
|
|
|
|
be requested overall, so that it can figure out how big a buffer is safe
|
|
|
|
to allocate for a "big" object. (If it happens that you are dealing with
|
|
|
|
a small image, you'd like to decide to keep it all in memory!) The most
|
|
|
|
flexible way of doing this is to divide allocation of "big" objects into
|
|
|
|
two steps. First, there will be one or more "request" calls that indicate
|
|
|
|
the desired object sizes; then an "instantiate" call causes the memory
|
|
|
|
manager to actually construct the objects. The instantiation must occur
|
|
|
|
before the contents of any big object can be accessed.
|
|
|
|
|
|
|
|
For 80x86 CPUs, we would like the code to be compilable under small or
|
|
|
|
medium model, meaning that pointers are 16 bits unless explicitly declared
|
|
|
|
FAR. Hence space allocated by the "small" allocator must fit into the
|
|
|
|
64Kb default data segment, along with stack space and global/static data.
|
|
|
|
For normal JPEG operations we seem to need only about 32Kb of such space,
|
|
|
|
so we are within the target (and have a reasonable slop for the needs of
|
|
|
|
a surrounding application program). However, some color quantization
|
|
|
|
algorithms need 64Kb or more of all-in-memory space in order to create
|
|
|
|
color histograms. For this purpose, we will also support "medium" size
|
|
|
|
things. These are semantically the same as "small" things but are
|
|
|
|
referenced through FAR pointers.
|
|
|
|
|
|
|
|
The following methods will be needed:
|
|
|
|
alloc_small: allocate an object of given size; use for any random
|
|
|
|
data that's not an image array.
|
|
|
|
free_small: release same.
|
1992-03-16 19:00:00 -05:00
|
|
|
alloc_medium: like alloc_small, but returns a FAR pointer. Use for
|
|
|
|
any object bigger than a couple kilobytes.
|
1991-10-06 20:00:00 -04:00
|
|
|
free_medium: release same.
|
|
|
|
alloc_small_sarray: construct an all-in-memory image sample array.
|
|
|
|
free_small_sarray: release same.
|
|
|
|
alloc_small_barray,
|
|
|
|
free_small_barray: ditto for block (coefficient) arrays.
|
|
|
|
request_big_sarray: request a virtual image sample array. The size
|
|
|
|
of the in-memory buffer will be determined by the
|
|
|
|
memory manager, but it will always be a multiple
|
|
|
|
of the passed-in MCU height.
|
|
|
|
request_big_barray: ditto for block (coefficient) arrays.
|
|
|
|
alloc_big_arrays: instantiate all the big arrays previously requested.
|
|
|
|
This call will also pass some info about future
|
|
|
|
memory demands, so that the memory manager can
|
|
|
|
figure out how much space to leave unallocated.
|
|
|
|
access_big_sarray: obtain access to a specified portion of a virtual
|
|
|
|
image sample array.
|
|
|
|
free_big_sarray: release a virtual sample array.
|
1992-03-16 19:00:00 -05:00
|
|
|
access_big_barray,
|
1991-10-06 20:00:00 -04:00
|
|
|
free_big_barray: ditto for block (coefficient) arrays.
|
1992-03-16 19:00:00 -05:00
|
|
|
free_all: release any remaining storage. This is called
|
|
|
|
before normal or error termination; the main reason
|
|
|
|
why it must exist is to ensure that any temporary
|
|
|
|
files will be deleted upon error termination.
|
1991-10-06 20:00:00 -04:00
|
|
|
|
|
|
|
alloc_big_arrays will be called by the pipeline controller, which does
|
|
|
|
most of the memory allocation anyway. The only reason for having separate
|
|
|
|
request calls is to allow some of the other modules to get big arrays.
|
|
|
|
The pipeline controller is required to give an upper bound on total future
|
|
|
|
small-array requests, so that this space can be discounted. (A fairly
|
|
|
|
conservative estimate will be adequate.) Future small-object requests
|
|
|
|
aren't counted; the memory manager has to use a slop factor for those.
|
|
|
|
10K or so seems to be sufficient. (In an 80x86, small objects aren't an
|
|
|
|
issue anyway, since they don't compete for far-heap space. "Medium"-size
|
|
|
|
objects will have to be counted separately.)
|
|
|
|
|
|
|
|
The distinction between sample and coefficient array routines is annoying,
|
|
|
|
but it has to be maintained for machines in which "char *" is represented
|
1992-03-16 19:00:00 -05:00
|
|
|
differently from "int *". On byte-addressable machines some of these
|
|
|
|
methods could perhaps point to the same code.
|
1991-10-06 20:00:00 -04:00
|
|
|
|
|
|
|
The array routines will operate on only 2-D arrays (one component at a
|
|
|
|
time), since different components may require different-size arrays.
|
|
|
|
|
|
|
|
(This object hides the knowledge of whether virtual memory is available,
|
|
|
|
as well as the actual interface to OS and library support routines.)
|
|
|
|
|
|
|
|
Note that any given implementation will presumably contain only one
|
|
|
|
instantiation of input file header reading, overall control, user interface,
|
|
|
|
and memory management. Thus these could be called as simple subroutines,
|
|
|
|
without bothering with an object indirection. This is essential for overall
|
1992-03-16 19:00:00 -05:00
|
|
|
control (which has to initialize the object structure); for consistency we
|
|
|
|
will impose objectness on the other three.
|
1991-10-06 20:00:00 -04:00
|
|
|
|
|
|
|
|
|
|
|
*** Decompression object structure ***
|
|
|
|
|
|
|
|
I propose the following set of objects for decompression. The general
|
|
|
|
comments at the top of the compression object section also apply here.
|
|
|
|
|
|
|
|
1. JPEG file scanning. This will provide these methods:
|
|
|
|
read_file_header: read the file header, determine which variant
|
|
|
|
JPEG format is in use, read everything through SOF.
|
|
|
|
read_scan_header: read scan header (up through SOS). This is called
|
|
|
|
after read_file_header and again after each scan;
|
|
|
|
it returns TRUE if it finds SOS, FALSE if EOI.
|
|
|
|
read_jpeg_data: fetch data for entropy decoder.
|
|
|
|
read_scan_trailer: finish up after one scan, prepare for another call
|
|
|
|
of read_scan_header (may be a no-op).
|
|
|
|
read_file_trailer: finish up at end of file (probably a no-op).
|
|
|
|
The entropy decoder must deal with restart markers, but all other JPEG
|
|
|
|
marker types will be handled in this object; useful data from the markers
|
|
|
|
will be extracted into data structures available to subsequent routines.
|
|
|
|
Note that on exit from read_file_header, only the SOF-marker data should be
|
|
|
|
assumed valid (image size, component IDs, sampling factors); other data
|
|
|
|
such as Huffman tables may not appear until after the SOF. The overall
|
|
|
|
image size and colorspace can be determined after read_file_header, but not
|
|
|
|
whether or how the data is interleaved. (This hides which variant JPEG
|
|
|
|
file format is being read. In particular, for JPEG-in-TIFF the read_header
|
|
|
|
routines might not be scanning standard JPEG markers at all; they could
|
|
|
|
extract the data from TIFF tags. The user interface will already have
|
|
|
|
opened the input file and possibly read part of the header before
|
|
|
|
read_file_header is called.)
|
|
|
|
|
|
|
|
NOTE: for JFIF/raw-JPEG file format, the read_jpeg_data routine is actually
|
|
|
|
supplied by the user interface; the jrdjfif module uses read_jpeg_data
|
|
|
|
internally to scan the input stream. This makes it possible for the user
|
|
|
|
interface module to single-handedly implement special applications like
|
|
|
|
reading from a non-stdio source. For JPEG-in-TIFF format, the need for
|
|
|
|
random access will make it impossible for this to work; hence the TIFF
|
1992-03-16 19:00:00 -05:00
|
|
|
header module will override the UI-supplied read_jpeg_data routine.
|
1991-10-06 20:00:00 -04:00
|
|
|
Non-stdio input from a TIFF file will require extensive surgery to the TIFF
|
|
|
|
header module, if indeed it is practical at all.
|
|
|
|
|
|
|
|
2. Entropy (Huffman or arithmetic) decoding of the coefficient sequence.
|
|
|
|
entropy_decoder_init: prepare for one scan.
|
|
|
|
entropy_decode: decodes and returns an MCU's worth of quantized
|
|
|
|
coefficients per call.
|
|
|
|
entropy_decoder_term: finish up after a scan (may be a no-op).
|
|
|
|
This will read raw data by calling the read_jpeg_data method (I don't see
|
|
|
|
any reason to provide a further level of indirection).
|
|
|
|
(This hides which entropy encoding method is in use.)
|
|
|
|
|
|
|
|
3. Quantization descaling and zigzag reordering of the elements in each 8x8
|
|
|
|
block. (This can probably be a plain subroutine called once per block;
|
|
|
|
hard to see a need for multiple instantiations here.)
|
|
|
|
|
|
|
|
4. MCU disassembly (conversion of a possibly interleaved sequence of 8x8
|
|
|
|
blocks back to separate components in pixel map order).
|
|
|
|
disassemble_init: initialize. This will be called once per scan.
|
|
|
|
disassemble_MCU: Given an MCU's worth of dequantized blocks,
|
|
|
|
distribute them into the proper locations in a
|
|
|
|
coefficient image array.
|
|
|
|
disassemble_term: clean up at the end of a scan.
|
|
|
|
Probably this should be called once per MCU row and should call the
|
|
|
|
preceding two objects repeatedly to obtain the row's data. The output is
|
|
|
|
always a multiple of an MCU's dimensions.
|
|
|
|
(An object on the grounds that multiple instantiations might be useful.)
|
|
|
|
|
1992-03-16 19:00:00 -05:00
|
|
|
5. Cross-block smoothing per JPEG section K.8 or a similar algorithm.
|
1991-10-06 20:00:00 -04:00
|
|
|
smooth_coefficients: Given three block rows' worth of a single
|
|
|
|
component, emit a smoothed equivalent of the
|
|
|
|
middle row. The "above" and "below" pointers
|
|
|
|
may be NULL if at top/bottom of image.
|
|
|
|
The pipeline controller will do the necessary buffering to provide the
|
|
|
|
above/below context. Smoothing will be optional since a good deal of
|
|
|
|
extra memory is needed to buffer the additional block rows.
|
|
|
|
(This object hides the details of the smoothing algorithm.)
|
|
|
|
|
1992-03-16 19:00:00 -05:00
|
|
|
6. Inverse DCT transformation of each 8x8 block.
|
|
|
|
reverse_DCT: given an MCU row's worth of blocks, perform inverse
|
|
|
|
DCT on each block and output the results into an array
|
|
|
|
of samples.
|
|
|
|
We put this method into the jdmcu module for symmetry with the division of
|
|
|
|
labor in compression. Note that the actual IDCT code is a separate source
|
|
|
|
file.
|
1991-10-06 20:00:00 -04:00
|
|
|
|
|
|
|
7. De-subsampling and smoothing: this will be applied to one component at a
|
|
|
|
time. Note that cross-pixel smoothing, which was a separate step in the
|
|
|
|
prototype code, will now be performed simultaneously with expansion.
|
|
|
|
unsubsample_init: initialize (precalculate convolution factors, for
|
|
|
|
example). This will be called once per scan.
|
|
|
|
unsubsample: Given a sample array, enlarge it by specified sampling
|
|
|
|
factors.
|
|
|
|
unsubsample_term: clean up at the end of a scan.
|
|
|
|
If the current component has vertical sampling factor Vk and the largest
|
|
|
|
sampling factor is Vmax, then the input is always Vk sample rows (whose
|
|
|
|
width is a multiple of Hk) and the output is always Vmax sample rows.
|
|
|
|
Vk additional rows above and below the nominal input rows are also passed
|
|
|
|
for use in cross-pixel smoothing. At the top and bottom of the image,
|
|
|
|
these extra rows are copies of the first or last actual input row.
|
|
|
|
(This hides whether and how cross-pixel smoothing occurs.)
|
|
|
|
|
|
|
|
8. Cropping to the original pixel dimensions (throwing away duplicated
|
|
|
|
pixels at the edges). This won't be a separate object, just an
|
|
|
|
adjustment of the nominal image size in the pipeline controller.
|
|
|
|
|
|
|
|
9. Color space reconversion and gamma adjustment.
|
|
|
|
colorout_init: initialization. This will be passed the component
|
|
|
|
data from read_file_header, and will determine the
|
|
|
|
number of output components.
|
|
|
|
color_convert: convert a specified number of pixel rows. Input and
|
|
|
|
output are image arrays of same size but possibly
|
|
|
|
different numbers of components.
|
|
|
|
colorout_term: cleanup (probably a no-op except for memory dealloc).
|
1992-03-16 19:00:00 -05:00
|
|
|
In practice will usually be given an MCU row's worth of pixel rows, except
|
1991-10-06 20:00:00 -04:00
|
|
|
at the bottom where a smaller number of rows may be left over. Note that
|
|
|
|
this object works on all the components at once.
|
1992-03-16 19:00:00 -05:00
|
|
|
When quantizing colors, color_convert may be applied to the colormap
|
|
|
|
instead of actual pixel data. color_convert is called by the color
|
|
|
|
quantizer in this case; the pipeline controller calls color_convert
|
|
|
|
directly only when not quantizing.
|
1991-10-06 20:00:00 -04:00
|
|
|
(Hides all knowledge of color space semantics and conversion. Remaining
|
|
|
|
modules only need to know the number of JPEG and output components.)
|
|
|
|
|
|
|
|
10. Color quantization (used only if a colormapped output format is requested).
|
|
|
|
We use two different strategies depending on whether one-pass (on-the-fly)
|
|
|
|
or two-pass quantization is requested. Note that the two-pass interface
|
|
|
|
is actually designed to let the quantizer make any number of passes.
|
|
|
|
color_quant_init: initialization, allocate working memory. In 1-pass
|
|
|
|
quantization, should call put_color_map.
|
|
|
|
color_quantize: convert a specified number of pixel rows. Input
|
|
|
|
and output are image arrays of same size, but input
|
|
|
|
is N coefficients and output is only one. (Used only
|
|
|
|
in 1-pass quantization.)
|
|
|
|
color_quant_prescan: prescan a specified number of pixel rows in
|
|
|
|
2-pass quantization.
|
|
|
|
color_quant_doit: perform multi-pass color quantization. Input is a
|
|
|
|
"big" sample image, output is via put_color_map and
|
|
|
|
put_pixel_rows. (Used only in 2-pass quantization.)
|
|
|
|
color_quant_term: cleanup (probably a no-op except for memory dealloc).
|
1992-03-16 19:00:00 -05:00
|
|
|
The input to the color quantizer is always in the unconverted colorspace;
|
|
|
|
its output colormap must be in the converted colorspace. The quantizer
|
|
|
|
has the choice of which space to work in internally. It must call
|
|
|
|
color_convert either on its input data or on the colormap it sends to the
|
|
|
|
output module.
|
1991-10-06 20:00:00 -04:00
|
|
|
For one-pass quantization the image is simply processed by color_quantize,
|
|
|
|
a few rows at a time. For two-pass quantization, the pipeline controller
|
1992-03-16 19:00:00 -05:00
|
|
|
accumulates the output of steps 1-8 into a "big" sample image. The
|
1991-10-06 20:00:00 -04:00
|
|
|
color_quant_prescan method is invoked during this process so that the
|
1992-03-16 19:00:00 -05:00
|
|
|
quantizer can accumulate statistics. (If the input file has multiple
|
|
|
|
scans, the prescan may be done during the final scan or as a separate
|
|
|
|
pass.) At the end of the image, color_quant_doit is called; it must
|
|
|
|
create and output a colormap, then rescan the "big" image and pass mapped
|
|
|
|
data to the output module. Additional scans of the image could be made
|
|
|
|
before the output pass is done (in fact, prescan could be a no-op).
|
1991-10-06 20:00:00 -04:00
|
|
|
As with entropy parameter optimization, the pipeline controller actually
|
|
|
|
passes an iterator function rather than direct access to the big image.
|
|
|
|
(Hides color quantization algorithm.)
|
|
|
|
|
|
|
|
11. Writing of the desired image format.
|
|
|
|
output_init: produce the file header given data from read_file_header.
|
|
|
|
put_color_map: output colormap, if any (called by color quantizer).
|
1992-03-16 19:00:00 -05:00
|
|
|
If used, must be called before any pixel data is output.
|
1991-10-06 20:00:00 -04:00
|
|
|
put_pixel_rows: output image data in desired format.
|
|
|
|
output_term: finish up at the end.
|
1992-03-16 19:00:00 -05:00
|
|
|
The actual timing of I/O may differ from that suggested by the routine
|
|
|
|
names; for instance, writing of the file header may be delayed until
|
|
|
|
put_color_map time if the actual number of colors is needed in the header.
|
|
|
|
Also, the colormap is available to put_pixel_rows and output_term as well
|
|
|
|
as put_color_map.
|
1991-10-06 20:00:00 -04:00
|
|
|
Note that whether colormapping is needed will be determined by the user
|
|
|
|
interface object prior to method selection. In implementations that
|
|
|
|
support multiple output formats, the actual output format will also be
|
|
|
|
determined by the user interface.
|
|
|
|
(Hides format of output image and mechanism used to write it. Note that
|
1992-03-16 19:00:00 -05:00
|
|
|
several other objects know the color model used by the output format.
|
|
|
|
The actual mechanism for writing the file is private to this object and
|
|
|
|
the user interface.)
|
1991-10-06 20:00:00 -04:00
|
|
|
|
|
|
|
12. Pipeline control. This object will provide the "main loop" that invokes
|
|
|
|
all the pipeline objects. Note that we will need several different main
|
|
|
|
loops depending on the situation (interleaved input or not, whether to
|
|
|
|
apply cross-block smoothing or not, etc). We may want to divvy up the
|
|
|
|
pipeline controllers into two levels, one that retains control over the
|
|
|
|
whole file and one that is invoked per scan.
|
|
|
|
This object will do most of the memory allocation, since it will provide
|
|
|
|
the working buffers that are the inputs and outputs of the pipeline steps.
|
|
|
|
(An object mostly to support multiple instantiations; however, overall
|
|
|
|
memory management and sequencing of operations are known only here.)
|
|
|
|
|
|
|
|
13. Overall control. This module will provide at least two routines:
|
|
|
|
jpeg_decompress: the main entry point to the decompressor.
|
|
|
|
per_scan_method_selection: called by pipeline controllers for
|
|
|
|
secondary method selection passes.
|
|
|
|
jpeg_decompress is invoked from the user interface after the UI has
|
|
|
|
selected the input and output files and obtained values for all
|
|
|
|
user-specified options (e.g., output file format, whether to do block
|
|
|
|
smoothing). jpeg_decompress calls read_file_header, performs basic
|
|
|
|
initialization (e.g., calculating the size of MCUs), does the "global"
|
|
|
|
method selection pass, and finally calls the selected pipeline control
|
|
|
|
object. (Per-scan method selections will be invoked by the pipeline
|
|
|
|
controller.)
|
|
|
|
Note that jpeg_decompress can't be a method since it is invoked prior to
|
|
|
|
method selection.
|
|
|
|
|
|
|
|
14. User interface; this is the architecture's term for "the rest of the
|
|
|
|
application program", i.e., that which invokes the JPEG decompressor.
|
|
|
|
The UI is expected to supply input and output files and values for all
|
|
|
|
operational parameters. The UI must also supply error handling routines.
|
|
|
|
(This module hides the user interface provided --- command line,
|
|
|
|
interactive, etc. Except for error handling, the UI calls the portable
|
|
|
|
JPEG code, not the other way around.)
|
|
|
|
|
|
|
|
15. A memory management object. This will be identical to the memory
|
|
|
|
management for compression (and will be the same code, in combined
|
|
|
|
programs). See above for details.
|
|
|
|
|
|
|
|
|
|
|
|
*** Initial method selection ***
|
|
|
|
|
|
|
|
The main ugliness in this design is the portion of startup that will select
|
|
|
|
which of several instantiations should be used for each of the objects. (For
|
|
|
|
example, Huffman or arithmetic for entropy encoding; one of several pipeline
|
|
|
|
controllers depending on interleaving, the size of the image, etc.) It's not
|
|
|
|
really desirable to have a single chunk of code that knows the names of all
|
|
|
|
the possible instantiations and the conditions under which to select each one.
|
|
|
|
|
|
|
|
The best approach seems to be to provide a selector function for each object
|
|
|
|
(group of related method calls). This function knows about each possible
|
|
|
|
instantiation of its object and how to choose the right one; but it doesn't
|
|
|
|
know about any other objects.
|
|
|
|
|
|
|
|
Note that there will be several rounds of method selection: at initial startup,
|
|
|
|
after overall compression parameters are determined (after the file header is
|
|
|
|
read, if decompressing), and one in preparation for each scan (this occurs
|
|
|
|
more than once if the file is noninterleaved). Each object method will need
|
|
|
|
to be clearly identified as to which round sets it up.
|
|
|
|
|
|
|
|
|
|
|
|
*** Implications of DNL marker ***
|
|
|
|
|
|
|
|
Some JPEG files may use a DNL marker to postpone definition of the image
|
|
|
|
height (this would be useful for a fax-like scanner's output, for instance).
|
|
|
|
In these files the SOF marker claims the image height is 0, and you only
|
|
|
|
find out the true image height at the end of the first scan.
|
|
|
|
|
|
|
|
We could handle these files as follows:
|
|
|
|
1. Upon seeing zero image height, replace it by 65535 (the maximum allowed).
|
|
|
|
2. When the DNL is found, update the image height in the global image
|
|
|
|
descriptor.
|
|
|
|
This implies that pipeline control objects must avoid making copies of the
|
|
|
|
image height, and must re-test for termination after each MCU row. This is
|
|
|
|
no big deal.
|
|
|
|
|
|
|
|
In situations where image-size data structures are allocated, this approach
|
|
|
|
will result in very inefficient use of virtual memory or
|
|
|
|
much-larger-than-necessary temporary files. This seems acceptable for
|
|
|
|
something that probably won't be a mainstream usage. People might have to
|
|
|
|
forgo use of memory-hogging options (such as two-pass color quantization or
|
|
|
|
noninterleaved JPEG files) if they want efficient conversion of such files.
|
|
|
|
(One could improve efficiency by demanding a user-supplied upper bound for the
|
|
|
|
height, less than 65536; in most cases it could be much less.)
|
|
|
|
|
|
|
|
Alternately, we could insist that DNL-using files be preprocessed by a
|
|
|
|
separate program that reads ahead to the DNL, then goes back and fixes the SOF
|
|
|
|
marker. This is a much simpler solution and is probably far more efficient.
|
|
|
|
Even if one wants piped input, buffering the first scan of the JPEG file
|
|
|
|
needs a lot smaller temp file than is implied by the maximum-height method.
|
|
|
|
For this approach we'd simply treat DNL as a no-op in the decompressor (at
|
|
|
|
most, check that it matches the SOF image height).
|
|
|
|
|
1992-03-16 19:00:00 -05:00
|
|
|
We will not worry about making the compressor capable of outputting DNL.
|
|
|
|
Something similar to the first scheme above could be applied if anyone ever
|
|
|
|
wants to make that work.
|
|
|
|
|
|
|
|
|
|
|
|
*** Memory manager internal structure ***
|
|
|
|
|
|
|
|
The memory manager contains the most potential for system dependencies.
|
|
|
|
To isolate system dependencies as much as possible, we have broken the
|
|
|
|
memory manager into two parts. There is a reasonably system-independent
|
|
|
|
"front end" (jmemmgr.c) and a "back end" that contains only the code
|
|
|
|
likely to change across systems. All of the memory management methods
|
|
|
|
outlined above are implemented by the front end. The back end provides
|
|
|
|
the following routines for use by the front end (none of these routines
|
|
|
|
are known to the rest of the JPEG code):
|
|
|
|
|
|
|
|
jmem_init, jmem_term system-dependent initialization/shutdown
|
|
|
|
|
|
|
|
jget_small, jfree_small interface to malloc and free library routines
|
|
|
|
|
|
|
|
jget_large, jfree_large interface to FAR malloc/free in MS-DOS machines;
|
|
|
|
otherwise same as jget_small/jfree_small
|
|
|
|
|
|
|
|
jmem_available estimate available memory
|
|
|
|
|
|
|
|
jopen_backing_store create a backing-store object
|
|
|
|
|
|
|
|
read_backing_store, manipulate a backing store object
|
|
|
|
write_backing_store,
|
|
|
|
close_backing_store
|
|
|
|
|
|
|
|
On some systems there will be more than one type of backing-store object
|
|
|
|
(specifically, in MS-DOS a backing store file might be an area of extended
|
|
|
|
memory as well as a disk file). jopen_backing_store is responsible for
|
|
|
|
choosing how to implement a given object. The read/write/close routines
|
|
|
|
are method pointers in the structure that describes a given object; this
|
|
|
|
lets them be different for different object types.
|
|
|
|
|
|
|
|
It may be necessary to ensure that backing store objects are explicitly
|
|
|
|
released upon abnormal program termination. (For example, MS-DOS won't free
|
|
|
|
extended memory by itself.) To support this, we will expect the main program
|
|
|
|
or surrounding application to arrange to call the free_all method upon
|
|
|
|
abnormal termination; this may require a SIGINT signal handler, for instance.
|
|
|
|
(We don't want to have the system-dependent module install its own signal
|
|
|
|
handler, because that would pre-empt the surrounding application's ability
|
|
|
|
to control signal handling.)
|
1991-10-06 20:00:00 -04:00
|
|
|
|
|
|
|
|
|
|
|
*** Notes for MS-DOS implementors ***
|
|
|
|
|
|
|
|
The standalone cjpeg and djpeg applications can be compiled in "small" memory
|
|
|
|
model, at least at the moment; as the code grows we may be forced to switch to
|
|
|
|
"medium" model. (Small = both code and data pointers are near by default;
|
|
|
|
medium = far code pointers, near data pointers.) Medium model will slow down
|
|
|
|
calls through method pointers, but I don't think this will amount to any
|
|
|
|
significant speed penalty.
|
|
|
|
|
|
|
|
When integrating the JPEG code into a larger application, it's a good idea to
|
|
|
|
stay with a small-data-space model if possible. An 8K stack is much more than
|
|
|
|
sufficient for the JPEG code, and its static data requirements are less than
|
1992-03-16 19:00:00 -05:00
|
|
|
1K. When executed, it will typically malloc about 10K-20K worth of near heap
|
1991-10-06 20:00:00 -04:00
|
|
|
space (and lots of far heap, but that doesn't count in this calculation).
|
|
|
|
This figure will vary depending on image size and other factors, but figuring
|
1992-03-16 19:00:00 -05:00
|
|
|
30K should be more than sufficient. Thus you have about 25K available for
|
1991-10-06 20:00:00 -04:00
|
|
|
other modules' static data and near heap requirements before you need to go to
|
|
|
|
a larger memory model. The C library's static data will account for several K
|
|
|
|
of this, but that still leaves a good deal for your needs. (If you are tight
|
|
|
|
on space, you could reduce JPEG_BUF_SIZE from 4K to 1K to save 3K of near heap
|
|
|
|
space.)
|
|
|
|
|
|
|
|
As the code is improved, we will endeavor to hold the near data requirements
|
|
|
|
to the range given above. This does imply that certain data structures will
|
|
|
|
be allocated as FAR although they would fit in near space if we assumed the
|
|
|
|
JPEG code is stand-alone. (The LZW tables in jrdgif/jwrgif are examples.)
|
|
|
|
To make an optimal implementation, you might want to move these structures
|
|
|
|
back to near heap if you know there is sufficient space.
|
|
|
|
|
1992-03-16 19:00:00 -05:00
|
|
|
FAR data space may also be a tight resource when you are dealing with large
|
|
|
|
images. The most memory-intensive case is decompression with two-pass color
|
|
|
|
quantization. This requires a 128Kb color histogram plus strip buffers
|
|
|
|
amounting to about 150 bytes per column for typical sampling ratios (eg, about
|
|
|
|
96000 bytes for a 640-pixel-wide image). You may not be able to process wide
|
|
|
|
images if you have large data structures of your own.
|
|
|
|
|
1991-10-06 20:00:00 -04:00
|
|
|
|
|
|
|
*** Potential optimizations ***
|
|
|
|
|
|
|
|
For colormapped input formats it might be worthwhile to merge the input file
|
|
|
|
reading and the colorspace conversion steps; in other words, do the colorspace
|
|
|
|
conversion by hacking up the colormap before inputting the image body, rather
|
|
|
|
than doing the conversion on each pixel independently. Not clear if this is
|
|
|
|
worth the uglification involved. In the above design for the compressor, only
|
|
|
|
the colorspace conversion step ever sees the output of get_input_row, so this
|
|
|
|
sort of thing could be done via private agreement between those two modules.
|
|
|
|
|
|
|
|
Level shift from 0..255 to -128..127 may be done either during colorspace
|
|
|
|
conversion, or at the moment of converting an 8x8 sample block into the format
|
|
|
|
used by the DCT step (which will be signed short or long int). This could be
|
|
|
|
selectable by a compile-time flag, so that the intermediate steps can work on
|
|
|
|
either signed or unsigned chars as samples, whichever is most easily handled
|
|
|
|
by the platform. However, making sure that rounding is done right will be a
|
|
|
|
lot easier if we can assume positive values. At the moment I think that
|
|
|
|
benefit is worth the overhead of "& 0xFF" when reading out sample values on
|
|
|
|
signed-char-only machines.
|