From fe6c8d05d12c752cf4d887f9698ea273a29b4665 Mon Sep 17 00:00:00 2001
From: "Fred L. Drake, Jr."
Date: Mon, 22 Apr 2002 17:03:39 +0000
Subject: [PATCH] Formatted version of the Unix manpage, ready for installation.

---
 expat/doc/xmlwf.1 | 203 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 203 insertions(+)
 create mode 100644 expat/doc/xmlwf.1

diff --git a/expat/doc/xmlwf.1 b/expat/doc/xmlwf.1
new file mode 100644
index 00000000..b2c56168
--- /dev/null
+++ b/expat/doc/xmlwf.1
@@ -0,0 +1,203 @@
+.\" This manpage has been automatically generated by docbook2man
+.\" from a DocBook document. This tool can be found at:
+.\"
+.\" Please send any bug reports, improvements, comments, patches,
+.\" etc. to Steve Cheng.
+.TH "XMLWF" "1" "22 April 2002" "" ""
+.SH NAME
+xmlwf \- Determines if an XML document is well-formed
+.SH SYNOPSIS
+
+\fBxmlwf\fR [ \fB-s\fR] [ \fB-n\fR] [ \fB-p\fR] [ \fB-x\fR] [ \fB-e \fIencoding\fB\fR] [ \fB-w\fR] [ \fB-d \fIoutput-dir\fB\fR] [ \fB-c\fR] [ \fB-m\fR] [ \fB-r\fR] [ \fB-t\fR] [ \fB-v\fR] [ \fBfile ...\fR]
+
+.SH "DESCRIPTION"
+.PP
+\fBxmlwf\fR uses the Expat library to determine
+if an XML document is well-formed. It is non-validating.
+.PP
+If you do not specify any files on the command line,
+and you have a recent version of xmlwf, the input
+file will be read from STDIN.
+.SH "WELL-FORMED DOCUMENTS"
+.PP
+A well-formed document must adhere to the
+following rules:
+.TP 0.2i
+\(bu
+The file begins with an XML declaration. For instance,
+<?xml version="1.0" standalone="yes"?>.
+\fBNOTE:\fR xmlwf does not currently
+check for a valid XML declaration.
+.TP 0.2i
+\(bu
+Every start tag is either empty (<tag/>)
+or has a corresponding end tag.
+.TP 0.2i
+\(bu
+There is exactly one root element. This element must contain
+all other elements in the document. Only comments, white
+space, and processing instructions may come after the close
+of the root element.
+.TP 0.2i
+\(bu
+All elements nest properly.
+.TP 0.2i
+\(bu
+All attribute values are enclosed in quotes (either single
+or double).
+.PP
+If the document has a DTD, and it strictly complies with that
+DTD, then the document is also considered \fBvalid\fR.
+xmlwf is a non-validating parser -- it does not check the DTD.
+However, it does support external entities (see the -x option).
+.SH "OPTIONS"
+.PP
+When an option includes an argument, you may specify the argument either
+separate ("-d output") or mashed ("-doutput"). xmlwf supports both.
+.TP
+\fB-c\fR
+If the input file is well-formed and xmlwf doesn't
+encounter any errors, the input file is simply copied to
+the output directory unchanged.
+This implies no namespaces (turns off -n) and
+requires -d to specify an output file.
+.TP
+\fB-d output-dir\fR
+Specifies a directory to contain transformed
+representations of the input files.
+By default, -d outputs a canonical representation
+(described below).
+You can select different output formats using -c and -m.
+
+The output filenames will
+be exactly the same as the input filenames or "STDIN" if the input is
+coming from STDIN. Therefore, you must be careful that the
+output file does not go into the same directory as the input
+file. Otherwise, xmlwf will delete the input file before
+it generates the output file (just like running
+cat < file > file in most shells).
+
+Two structurally equivalent XML documents have a byte-for-byte
+identical canonical XML representation.
+Note that ignorable white space is considered significant and
+is treated equivalently to data.
+More on canonical XML can be found at
+http://www.jclark.com/xml/canonxml.html .
+.TP
+\fB-e encoding\fR
+Specifies the character encoding for the document, overriding
+any document encoding declaration. xmlwf
+has four built-in encodings:
+US-ASCII,
+UTF-8,
+UTF-16, and
+ISO-8859-1.
+Also see the -w option.
+.TP
+\fB-m\fR
+Outputs some strange sort of XML file that completely
+describes the input file, including character positions.
+Requires -d to specify an output file.
+.TP
+\fB-n\fR
+Turns on namespace processing. (describe namespaces)
+-c disables namespaces.
+.TP
+\fB-p\fR
+Tells xmlwf to process external DTDs and parameter
+entities.
+
+Normally xmlwf never parses parameter entities.
+-p tells it to always parse them.
+-p implies -x.
+.TP
+\fB-r\fR
+Normally xmlwf memory-maps the XML file before parsing.
+-r turns off memory-mapping and uses normal file IO calls instead.
+Of course, memory-mapping is automatically turned off
+when reading from STDIN.
+.TP
+\fB-s\fR
+Prints an error if the document is not standalone.
+A document is standalone if it has no external subset and no
+references to parameter entities.
+.TP
+\fB-t\fR
+Turns on timings. This tells Expat to parse the entire file,
+but not perform any processing.
+This gives a fairly accurate idea of the raw speed of Expat itself
+without client overhead.
+-t turns off most of the output options (-d, -m, -c, ...).
+.TP
+\fB-v\fR
+Prints the version of the Expat library being used, and then exits.
+.TP
+\fB-w\fR
+Enables Windows code pages.
+Normally, xmlwf will throw an error if it runs across
+an encoding that it is not equipped to handle itself. With
+-w, xmlwf will try to use a Windows code page. See
+also -e.
+.TP
+\fB-x\fR
+Turns on parsing external entities.
+
+Non-validating parsers are not required to resolve external
+entities, or even expand entities at all.
+Expat always expands internal entities (?),
+but external entity parsing must be enabled explicitly.
+
+External entities are simply entities that obtain their
+data from outside the XML file currently being parsed.
+
+This is an example of an internal entity:
+
+.nf
+<!ENTITY vers '1.0.2'>
+.fi
+
+And here are some examples of external entities:
+
+.nf
+<!ENTITY header SYSTEM "header.xml">  (parsed)
+<!ENTITY logo SYSTEM "logo.png" NDATA PNG>  (unparsed)
+.fi
+.TP
+\fB--\fR
+For some reason, xmlwf specifically ignores "--"
+anywhere it appears on the command line.
+.PP
+Older versions of xmlwf do not support reading from STDIN.
+.SH "OUTPUT"
+.PP
+If an input file is not well-formed, xmlwf outputs
+a single line describing the problem to STDOUT.
+If a file is well-formed, xmlwf outputs nothing.
+Note that the result code is \fBnot\fR set.
+.SH "BUGS"
+.PP
+According to the W3C standard, an XML file without a
+declaration at the beginning is not considered well-formed.
+However, xmlwf allows this to pass.
+.PP
+xmlwf returns a 0 - noerr result, even if the file is
+not well-formed. There is no good way for a program to use
+xmlwf to quickly check a file -- it must parse xmlwf's STDOUT.
+.PP
+The errors should go to STDERR, not STDOUT.
+.PP
+There should be a way to get -d to send its output to STDOUT
+rather than forcing the user to send it to a file.
+.PP
+I have no idea why anyone would want to use the -d, -c
+and -m options. If someone could explain it to me, I'd
+like to add this information to this manpage.
+.SH "ALTERNATIVES"
+.PP
+Here are some XML validators on the web:
+
+.nf
+http://www.hcrc.ed.ac.uk/~richard/xml-check.html
+http://www.stg.brown.edu/service/xmlvalid/
+http://www.scripting.com/frontier5/xml/code/xmlValidator.html
+http://www.xml.com/pub/a/tools/ruwf/check.html