c12bc4de58
git-svn-id: https://svn.wxwidgets.org/svn/wx/wxWidgets/trunk@9492 c3d73ce0-8a6f-49c7-b76d-6d57e0e08775
117 lines
5.3 KiB
Plaintext
117 lines
5.3 KiB
Plaintext
Using Structured Text
|
|
|
|
The goal of StructuredText is to make it possible to express
|
|
structured text using a relatively simple plain text format. Simple
|
|
structures, like bullets or headings are indicated through
|
|
conventions that are natural, for some definition of
|
|
"natural". Hierarchical structures are indicated through
|
|
indentation. The use of indentation to express hierarchical
|
|
structure is inspired by the Python programming language.
|
|
|
|
Use of StructuredText consists of one to three logical steps. In the
|
|
first step, a text string is converted to a network of objects using
|
|
the 'StructuredText.Basic' facility, as in the following
|
|
example::
|
|
|
|
raw=open("mydocument.txt").read()
|
|
import StructuredText
|
|
st=StructuredText.Basic(raw)
|
|
|
|
The output of 'StructuredText.Basic' is simply a
|
|
StructuredTextDocumemt object containing StructuredTextParagraph
|
|
objects arranged in a hierarchy. Paragraphs are delimited by strings
|
|
of two or more whitespace characters beginning and ending with
|
|
newline characters. Hierarchy is indicated by indentation. The
|
|
indentation of a paragraph is the minimum number of leading spaces
|
|
in a line containing non-white-space characters after converting tab
|
|
characters to spaces (assuming a tab stop every eight characters).
|
|
|
|
StructuredTextNode objects support the read-only subset of the
|
|
Document Object Model (DOM) API. It should be possible to process
|
|
'StructuredTextNode' hierarchies using XML tools such as XSLT.
|
|
|
|
The second step in using StructuredText is to apply additional
|
|
structuring rules based on text content. A variety of differentText
|
|
rules can be used. Typically, these are used to implement a
|
|
structured text language for producing documents, but any sort of
|
|
structured text language could be implemented in the second
|
|
step. For example, it is possible to use StructuredText to implement
|
|
structured text formats for representing structured data. The second
|
|
step, which could consist of multiple processing steps, is
|
|
performed by processing, or "coloring", the hierarchy of generic
|
|
StructuredTextParagraph objects into a network of more specialized
|
|
objects. Typically, the objects produced should also implement the DOM
|
|
API to allow processing with XML tools.
|
|
|
|
A document processor is provided to convert a StructuredTextDocument
|
|
object containing only StructuredStructuredTextParagraph objects
|
|
into a StructuredTextDocument object containing a richer collection
|
|
of objects such as bullets, headings, emphasis, and so on using
|
|
hints in the text. Hints are selected based on conventions of the
|
|
sort typically seen in electronic mail or news-group postings. It
|
|
should be noted, however, that these conventions are somewhat
|
|
culturally dependent, fortunately, the document processor is easily
|
|
customized to implement alternative rules. Here's an example of
|
|
using the DOC processor to convert the output of the previous example::
|
|
|
|
doc=StructuredText.Document(st)
|
|
|
|
The final step is to process the colored networks produced from the
|
|
second step to produce additional outputs. The final step could be
|
|
performed by Python programs, or by XML tools. A Python outputter is
|
|
provided for the document processor output that produces Hypertext Markup
|
|
Language (HTML) text::
|
|
|
|
html=StructuredText.HTML(doc)
|
|
|
|
Customizing the document processor
|
|
|
|
The document processor is driven by two tables. The first table,
|
|
named 'paragraph_types', is a sequence of callable objects or method
|
|
names for coloring paragraphs. If a table entry is a string, then it
|
|
is the name of a method of the document processor to be used. For
|
|
each input paragraph, the objects in the table are called until one
|
|
returns a value (not 'None'). The value returned replaces the
|
|
original input paragraph in the output. If none of the objects in
|
|
the paragraph types table return a value, then a copy of the
|
|
original paragraph is used. The new object returned by calling a
|
|
paragraph type should implement the ReadOnlyDOM,
|
|
StructuredTextColorizable, and StructuredTextSubparagraphContainer
|
|
interfaces. See the 'Document.py' source file for examples.
|
|
|
|
A paragraph type may return a list or tuple of replacement
|
|
paragraphs, this allowing a paragraph to be split into multiple
|
|
paragraphs.
|
|
|
|
The second table, 'text_types', is a sequence of callable objects or
|
|
method names for coloring text. The callable objects in this table
|
|
are used in sequence to transform the input text into new text or
|
|
objects. The callable objects are passed a string and return
|
|
nothing ('None') or a three-element tuple consisting of:
|
|
|
|
- a replacement object,
|
|
|
|
- a starting position, and
|
|
|
|
- an ending position
|
|
|
|
The text from the starting position is (logically) replaced with the
|
|
replacement object. The replacement object is typically an object
|
|
that implements that implements the ReadOnlyDOM, and
|
|
StructuredTextColorizable interfaces. The replacement object can
|
|
also be a string or a list of strings or objects. Replacement is
|
|
done from beginning to end and text after the replacement ending
|
|
position will be passed to the character type objects for processing.
|
|
|
|
Example: adding wiki links
|
|
|
|
We want to add support for Wiki links. A Wiki link is a string of
|
|
text containing mixed-case letters, such that at least two of the
|
|
letters are upper case and such that the first letter is upper case.
|
|
|
|
|
|
|
|
|
|
|
|
|