Chapter 4 XML Processing
equivalent XML representation is often called serialization or marshalling. Some
processing models support both processing types, but others, such as SAX, do not.
Just as you would avoid manually parsing XML documents, you should avoid
manually constructing XML documents. It is better to rely on higher-level, reliable
APIs (such as DOM, DOM-like APIs, or JAXB technology) to construct XML
documents, because these APIs enforce the construction of well-formed
documents. In some instances, these APIs also allow you to validate
constructed XML documents.
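As a rough illustration of the point above, the following Java sketch builds a small document with the standard DOM API (`javax.xml.parsers`, `org.w3c.dom`) and serializes it with a JAXP `Transformer`. The element and attribute names (`order`, `item`, `id`, `quantity`) and the class name are hypothetical, chosen only for the example. Because the product name is added as a DOM text node, the serializer escapes characters such as `&`, which is easy to get wrong with manual string concatenation.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class BuildOrderDocument {

    // Builds a hypothetical <order> document through the DOM API instead of string concatenation.
    public static String buildOrder(String orderId, String product, int quantity) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

            Element order = doc.createElement("order");
            order.setAttribute("id", orderId);
            doc.appendChild(order);

            Element item = doc.createElement("item");
            item.setAttribute("quantity", Integer.toString(quantity));
            // Text nodes are escaped on output, so '&' in the product name stays well formed.
            item.appendChild(doc.createTextNode(product));
            order.appendChild(item);

            // Serialize (marshal) the DOM tree to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildOrder("A-17", "Fish & Chips", 2));
    }
}
```

The serializer, not the application, is responsible for escaping and for balancing start and end tags, which is what makes the DOM route safer than building the markup by hand.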
Now let's take a closer look at these XML processing APIs.
SAX Programming Model
When you use SAX to process an XML document, you have to implement event
handlers to handle events generated by the parser when it encounters the various
tokens of the markup language. Because a SAX parser generates a transient flow of
these events, it is advisable to process the source document in the following fashion.
Intercept the relevant types of events generated by the parser. You can use the
information passed as parameters of the events to help identify the relevant
information that needs to be extracted from the source document. Once the
information has been extracted from the document, the application logic can process it.
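A minimal sketch of this approach follows, assuming a hypothetical order document whose `item` elements carry a `sku` attribute. The handler extends `org.xml.sax.helpers.DefaultHandler`, overrides only `startElement`, and uses the event's `qName` and `Attributes` parameters to decide what to extract; every other event falls through to the default no-op implementation. The `handleSku` method is a hypothetical stand-in for application logic.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SkuExtractor extends DefaultHandler {
    private final List<String> skus = new ArrayList<>();

    // Intercept only the event type of interest: the start of an element.
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        // The event parameters (element name, attributes) identify what to extract.
        if ("item".equals(qName)) {
            String sku = attributes.getValue("sku");
            if (sku != null) {
                handleSku(sku);
            }
        }
    }

    // Stand-in for application logic: here it simply records each extracted value.
    private void handleSku(String sku) {
        skus.add(sku);
    }

    public static List<String> extractSkus(String xml) {
        try {
            SkuExtractor handler = new SkuExtractor();
            // The factory is not namespace aware by default, so qName holds the element name.
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.skus;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order><item sku='X-1' quantity='2'/><note sku='ignored'/><item sku='Y-9'/></order>";
        System.out.println(extractSkus(xml)); // prints [X-1, Y-9]
    }
}
```

For the sample input in `main`, the `note` element is skipped even though it has a `sku` attribute, because only `item` start events are treated as relevant.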
Typically, with SAX processing, an application may have to maintain some
context so that it can logically aggregate or consolidate information from the flow
of events. Such consolidation is often done before invoking or applying the
application's logic. The developer has two choices when using SAX processing:
1. The application can invoke the business logic on the extracted information
on the fly. That is, the logic is invoked as soon as the information is extracted,
or after only minimal consolidation. With this approach, referred to as stream
processing, the document can be processed in one step.
2. The application invokes the business logic after it completes parsing the
document and has completely consolidated the extracted information. This
approach takes two steps to process a document.
Note that what we refer to as consolidated information may in fact be
domain-specific objects that can be directly passed to the business logic.
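The two choices can be contrasted in a short sketch. It assumes a hypothetical `OrderItem` class as the domain-specific object that consolidation produces, and hypothetical `item` elements with a `quantity` attribute. The handler keeps a little context (whether it is inside an `item`, and the text accumulated so far, since a parser may split character data across several `characters` calls) and emits one `OrderItem` per `item` element to a `java.util.function.Consumer`. Passing the business logic itself as that consumer gives stream processing; collecting into a list and then running the logic over the list gives the two-step approach.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class OrderProcessing {

    // Hypothetical domain-specific object produced by consolidating SAX events.
    public static final class OrderItem {
        final String product;
        final int quantity;
        OrderItem(String product, int quantity) { this.product = product; this.quantity = quantity; }
        @Override public String toString() { return quantity + " x " + product; }
    }

    // Maintains context across events and hands each consolidated OrderItem to a sink.
    static final class ItemHandler extends DefaultHandler {
        private final Consumer<OrderItem> sink;
        private final StringBuilder text = new StringBuilder();
        private boolean inItem;
        private int quantity;

        ItemHandler(Consumer<OrderItem> sink) { this.sink = sink; }

        @Override public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("item".equals(qName)) {
                inItem = true;
                quantity = Integer.parseInt(attrs.getValue("quantity"));
                text.setLength(0);
            }
        }
        @Override public void characters(char[] ch, int start, int length) {
            if (inItem) text.append(ch, start, length); // text may arrive in several chunks
        }
        @Override public void endElement(String uri, String localName, String qName) {
            if ("item".equals(qName)) {
                inItem = false;
                sink.accept(new OrderItem(text.toString().trim(), quantity));
            }
        }
    }

    static void parse(String xml, DefaultHandler handler) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Choice 1, stream processing: business logic runs as each item is extracted (one step).
    public static void processStreaming(String xml, Consumer<OrderItem> businessLogic) {
        parse(xml, new ItemHandler(businessLogic));
    }

    // Choice 2, first step: consolidate the whole document before any business logic runs.
    public static List<OrderItem> consolidate(String xml) {
        List<OrderItem> items = new ArrayList<>();
        parse(xml, new ItemHandler(items::add));
        return items;
    }

    public static void main(String[] args) {
        String xml = "<order><item quantity='2'>Fish</item><item quantity='1'>Chips</item></order>";
        processStreaming(xml, item -> System.out.println("processing " + item));

        // Choice 2, second step: business logic over the fully consolidated objects.
        int total = 0;
        for (OrderItem item : consolidate(xml)) total += item.quantity;
        System.out.println("total quantity: " + total);
    }
}
```

The same handler serves both styles; only the point at which the business logic runs differs. Stream processing generally avoids holding the whole result in memory, while the two-step approach suits logic that needs the complete document, such as the total computed in `main`.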