Those advanced writing functions must be used in a particular sequence
to make their intended effect. Their aim is to control when/where
the [Strip/Tile][Offsets/ByteCounts] arrays are written into the file.
The purpose of this is to generate 'cloud-optimized geotiff' files where
the first KB of the file only contain the IFD entries without the potentially
large strile arrays. Those are written afterwards.
The typical sequence of calls is:
TIFFOpen()
[ TIFFCreateDirectory(tif) ]
Set fields with calls to TIFFSetField(tif, ...)
TIFFDeferStrileArrayWriting(tif)
TIFFWriteCheck(tif, ...)
TIFFWriteDirectory(tif)
... potentially create other directories and come back to the above directory
TIFFForceStrileArrayWriting(tif): emit the arrays at the end of file
See test/defer_strile_writing.c for a practical example.
Found on GDAL with https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=14894
Disabling the TIFF_DEFERSTRILELOAD bit in ChopupStripArray() was a
bad idea since when using TIFFReadDirectory() to reload the directory again
would lead to a different value of td_rowsperstrip, which could confuse
readers if they relied on the value found initially.
This function replaces the use of TIFFReadEncodedStrip()/TIFFReadEncodedTile()
when the user can provide the buffer for the input data, for example when
he wants to avoid libtiff to read the strile offset/count values from the
[Strip|Tile][Offsets/ByteCounts] array.
... and add per-strile offset/bytecount loading capabilities.
Part of this commit makes the behaviour that was previously met when
libtiff was compiled with -DDEFER_STRILE_LOAD available for default builds
when specifying the new 'D' (Deferred) TIFFOpen() flag. In that mode, the [Tile/Strip][ByteCounts/Offsets]
arrays are only loaded when first accessed. This can speed-up the opening
of files stored on the network when just metadata retrieval is needed.
This mode has been used for years by the GDAL library when compiled with
its embeded libtiff copy.
To avoid potential out-of-tree code (typically codecs) that would use
the td_stripbytecount and td_stripoffset array inconditionnaly assuming they
have been loaded, those have been suffixed with _p (for protected). The
use of the new functions mentionned below is then recommended.
Another addition of this commit is the capability of loading only the
values of the offset/bytecount of the strile of interest instead of the
whole array. This is enabled with the new 'O' (Ondemand) flag of TIFFOpen()
(which implies 'D'). That behaviour has also been used by GDAL, which hacked
into the td_stripoffset/td_stripbytecount arrays directly. The new code
added in the _TIFFFetchStrileValue() and _TIFFPartialReadStripArray() internal
functions is mostly a port of what was in GDAL GTiff driver previously.
Related to that, the public TIFFGetStrileOffset[WithErr]() and TIFFGetStrileByteCount[WithErr]()
functions have been added to API. They are of particular interest when
using sparse files (with offset == bytecount == 0) and you want to detect
if a strile is present or not without decompressing the data, or updating
an existing sparse file.
They will also be used to enable a future enhancement where client code can entirely
skip bytecount loading in some situtations
A new test/defer_strile_loading.c test has been added to test the above
capabilities.
In most situations of BigTIFF file, the tile/strip sizes are of reasonable size,
that is they fit on a 4-byte LONG. So in that case, use LONG instead of LONG8
to save some space. For uncompressed file, it is easy to detect such situations
by checking at the TIFFTileSize64()/TIFFStripSize64() return. For compressed file,
we must take into account the fact that compression may sometimes result in
larger compressed data. So we allow this optimization only for a few select
compression times, and take a huge security margin (10x factor). We also only
apply this optimization on multi-strip files, so as to allow easy on-the-fly
growing of single-strip files whose strip size could grow above the 4GB threshold.
This change is compatible with the BigTIFF specification. According to
https://www.awaresystems.be/imaging/tiff/bigtiff.html:
"The StripOffsets, StripByteCounts, TileOffsets, and TileByteCounts tags are
allowed to have the datatype TIFF_LONG8 in BigTIFF. Old datatypes TIFF_LONG,
and TIFF_SHORT where allowed in the TIFF 6.0 specification, are still valid in BigTIFF, too. "
On a practical point of view, this is also compatible on reading/writing of
older libtiff 4.X versions.
The only glitch I found, which is rather minor, is when using such a BigTIFF
file with TileByteCounts/StripByteCounts written with TIFF_LONG, and updating
it with an older libtiff 4.X version with a change in the
[Tile/Strip][ByteCounts/Offsets] array. In that case the _TIFFRewriteField()
function will rewrite the directory and array with TIFF_LONG8, instead of updating
the existing array (this is an issue fixed by this commit). The file will
still be valid however, hence the minor severity of this.
EXIF tags FILESOURCE and SCENETYPE are defined as TIFF_UNDEFINED and field_readcount==1!
There is a bug in TIFFReadDirEntryByte() preventing to read correctly type TIFF_UNDEFINED fields with field_readcount==1
Upgrade of TIFFReadDirEntryByte() with added TIFF_UNDEFINED switch-entry allows libtiff to read those tags correctly.
Fixes https://github.com/OSGeo/gdal/issues/1439
When rewriting a LZW tile/strip whose existing size is very close to a multiple of
1024 bytes (and larger than 8192 bytes) with compressed data that is larger,
the new data was not placed at the end of the file, causing corruption.