Software and Languages

XML Parsing

XML is typically processed either as a stream of SAX events or as a DOM tree. In Java, you can use the standard libraries to write an application that interprets the tags in the order you expect and performs actions as the document is processed. If you do this a great deal, you will notice patterns emerging in the code you write.

The Java application available as an Eclipse project, and as browsable source code under Software -> XML -> Parser, implements a parser that processes XML documents using SAX events. The structure of the XML document is supplied to the parser as a grammar that is itself written in XML. A grammar describes both the structure of an XML tree and the actions to be performed on each node. For example, here is a simple language that constructs instances of classes in the Java package xcore as it processes an XML document (the grammar is available in the data directory of the source):

<Grammar>
  <Rule name="Value">
    <Or>
      <Call name="Atom"/>
      <Or>
        <Call name="List"/>
        <Or>
          <Call name="Obj"/>
          <Call name="Ref"/>
        </Or>
      </Or>
    </Or>
  </Rule>
  <Rule name="Atom">
    <Or>
      <Call name="Bool"/>
      <Or>
        <Call name="Float"/>
        <Or>
          <Call name="Int"/>
          <Call name="Str"/>
        </Or>
      </Or>
    </Or>
  </Rule>
  <Rule name="Bool">
    <Element tag="Bool">
      <Cnstr name="xcore.Bool">
        <Arg name="value"/>
      </Cnstr>
      <Att name="value"/>
    </Element>
  </Rule>
  <Rule name="Float">
    <Element tag="Float">
      <Cnstr name="xcore.Float">
        <Arg name="value"/>
      </Cnstr>
      <Att name="value"/>
    </Element>
  </Rule>
  <Rule name="Int">
    <Element tag="Int">
      <Cnstr name="xcore.Int">
        <Arg name="value"/>
      </Cnstr>
      <Att name="value"/>
    </Element>
  </Rule>
  <Rule name="Str">
    <Element tag="Str">
      <VarRef name="value"/>
      <Att name="value"/>
    </Element>
  </Rule>
  <Rule name="List">
    <Element tag="List">
      <Star>
        <Call name="Value"/>
      </Star>
    </Element>
  </Rule>
  <Rule name="Obj">
    <Element tag="Obj">
      <And>
        <Bind name="type">
          <Call name="Value"/>
        </Bind>
        <And>
          <Bind name="slots">
            <Star>
              <Call name="Slot"/>
            </Star>
          </Bind>
          <Cnstr name="xcore.Obj">
            <Arg name="id"/>
            <Arg name="type"/>
            <Arg name="slots"/>
          </Cnstr>
        </And>
      </And>
      <Att name="id"/>
    </Element>
  </Rule>
  <Rule name="Slot">
    <Element tag="Slot">
      <And>
        <Bind name="value">
          <Call name="Value"/>
        </Bind>
        <Cnstr name="xcore.Slot">
          <Arg name="name"/>
          <Arg name="value"/>
        </Cnstr>
      </And>
      <Att name="name"/>
    </Element>
  </Rule>
  <Rule name="Ref">
    <Element tag="Ref">
      <Cnstr name="xcore.Ref">
        <Arg name="id"/>
      </Cnstr>
      <Att name="id"/>
    </Element>
  </Rule>
</Grammar>
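For comparison, the hand-written approach described at the start of this section looks roughly like the following minimal SAX handler, which covers only the Int, Bool and List tags from the grammar above. It is a sketch and not part of the project: the class HandWrittenSax is hypothetical, it relies only on the standard javax.xml.parsers and org.xml.sax APIs, and it builds plain Integer, Boolean and ArrayList values instead of xcore objects. Every extra tag means another case in startElement and, for container tags, more bookkeeping in endElement, which is the kind of recurring pattern mentioned above.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical hand-written example; not taken from the project's source. */
public class HandWrittenSax {
    /** Interprets Int, Bool and List tags directly from SAX events. */
    static class ValueHandler extends DefaultHandler {
        // One collecting list per List element that is currently open
        private final Deque<List<Object>> open = new ArrayDeque<>();
        Object result;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            switch (qName) {
                case "List" -> open.push(new ArrayList<>());
                case "Int" -> emit(Integer.parseInt(atts.getValue("value")));
                case "Bool" -> emit(Boolean.parseBoolean(atts.getValue("value")));
                default -> throw new IllegalArgumentException("unexpected tag: " + qName);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // A List is complete only when its end tag arrives
            if (qName.equals("List")) {
                emit(open.pop());
            }
        }

        // Add a finished value to the enclosing List, or keep it as the top-level result
        private void emit(Object value) {
            if (open.isEmpty()) {
                result = value;
            } else {
                open.peek().add(value);
            }
        }
    }

    static Object parse(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ValueHandler handler = new ValueHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parse("<List> <Int value='10'/> <Bool value='true'/> <List/> </List>"));
    }
}
```

Running main prints [10, true, []]; the inner empty List element becomes an empty ArrayList.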
Once it has been loaded, the language definition shown above can parse the following XML:
<Obj id='1'>
  <Ref id='1'/>
  <Slot name='x'>
    <List> <Int value='10'/> <Bool value='true'/> <Ref id='1'/> </List>
  </Slot>
</Obj>
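One way to picture how grammar elements such as Or, Star, Element and Call might be interpreted is as parser combinators. The sketch below is an illustrative assumption, not the project's implementation: it works over a hypothetical in-memory Elem tree rather than SAX events, hands an element's attribute map straight to an action function instead of using Att, Bind and Cnstr, and covers only the Int, Bool and List rules. All names (CombinatorSketch, Elem, P, or, star, element, call) are hypothetical. Each parser consumes sibling elements starting at a position and returns a value plus the next position, or null on failure.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

public class CombinatorSketch {
    /** Hypothetical in-memory element, standing in for the SAX event stream. */
    record Elem(String tag, Map<String, String> atts, List<Elem> kids) {}

    /** The value built so far and the index of the next unconsumed sibling. */
    record Result(Object value, int next) {}

    /** Tries to consume siblings from index i; returns null on failure. */
    interface P {
        Result parse(List<Elem> in, int i);
    }

    static final Map<String, P> rules = new HashMap<>();

    // Like <Call name="..."/>: look the rule up at parse time so rules can be recursive
    static P call(String name) {
        return (in, i) -> rules.get(name).parse(in, i);
    }

    // Like a binary <Or>: try the first alternative, then the second
    static P or(P a, P b) {
        return (in, i) -> {
            Result r = a.parse(in, i);
            return r != null ? r : b.parse(in, i);
        };
    }

    // Like <Star>: zero or more matches, collecting their values in a list
    static P star(P p) {
        return (in, i) -> {
            List<Object> values = new ArrayList<>();
            int pos = i;
            Result r;
            while ((r = p.parse(in, pos)) != null) {
                values.add(r.value());
                pos = r.next();
            }
            return new Result(values, pos);
        };
    }

    // Matches no children and yields no value (for leaf elements)
    static final P EMPTY = (in, i) -> new Result(null, i);

    // Like <Element tag="...">: match one element whose children are all consumed by body,
    // then apply an action to its attributes and the body's value
    static P element(String tag, P body, BiFunction<Map<String, String>, Object, Object> action) {
        return (in, i) -> {
            if (i >= in.size() || !in.get(i).tag().equals(tag)) return null;
            Elem e = in.get(i);
            Result r = body.parse(e.kids(), 0);
            if (r == null || r.next() != e.kids().size()) return null;
            return new Result(action.apply(e.atts(), r.value()), i + 1);
        };
    }

    static {
        rules.put("Value", or(call("Int"), or(call("Bool"), call("List"))));
        rules.put("Int", element("Int", EMPTY, (atts, v) -> Integer.parseInt(atts.get("value"))));
        rules.put("Bool", element("Bool", EMPTY, (atts, v) -> Boolean.parseBoolean(atts.get("value"))));
        rules.put("List", element("List", star(call("Value")), (atts, v) -> v));
    }

    static Elem el(String tag, Map<String, String> atts, Elem... kids) {
        return new Elem(tag, atts, List.of(kids));
    }

    // Parses a single root element with the Value rule; null if it does not match
    static Object parse(Elem root) {
        Result r = call("Value").parse(List.of(root), 0);
        return r == null ? null : r.value();
    }

    public static void main(String[] args) {
        Elem doc = el("List", Map.of(),
                el("Int", Map.of("value", "10")),
                el("Bool", Map.of("value", "true")),
                el("List", Map.of()));
        System.out.println(parse(doc));
    }
}
```

For the nested list built in main, the sketch prints [10, true, []]. Looking rules up by name inside call is what lets the List rule refer back to Value recursively, in the same way as <Call name="Value"/> in the grammar.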
The main method that parses this document is defined in parser/Grammar.java. After reading the file, it prints out the result:
0=($0)[x=(10.(true.($0.[])))]
Notice how labels are used to show the cycles in the data: 0= labels the object, and each occurrence of $0 refers back to it. The conversion of object ids into references, and the printing of the resulting cycles, are handled by general-purpose walkers (see the source code).
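A minimal sketch of walkers with this behaviour is shown below. It is not the project's code: it assumes simplified, hypothetical stand-ins for the xcore classes (Obj, Slot, Ref) and represents lists as cons cells ending in NIL, as the printed output suggests. One walk indexes objects by id, a second replaces each Ref with the object it names (which is what creates the cycle), a third records which objects or cons cells are reached more than once, and the printer labels such a node n= the first time and prints $n afterwards. The "," separator between slots is an assumption, since the example has only one slot.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class CyclePrinter {
    // Hypothetical stand-ins for the xcore classes; lists are cons cells ending in NIL
    static final class Obj {
        final String id; Object type; final List<Slot> slots;
        Obj(String id, Object type, List<Slot> slots) { this.id = id; this.type = type; this.slots = slots; }
    }
    static final class Slot {
        final String name; Object value;
        Slot(String name, Object value) { this.name = name; this.value = value; }
    }
    static final class Cons {
        Object head, tail;
        Cons(Object head, Object tail) { this.head = head; this.tail = tail; }
    }
    record Ref(String id) {}
    enum Nil { NIL }

    // Walker 1: record every object by its id (the input is still a tree at this point)
    static void index(Object v, Map<String, Obj> table) {
        if (v instanceof Obj o) {
            table.put(o.id, o);
            index(o.type, table);
            for (Slot s : o.slots) index(s.value, table);
        } else if (v instanceof Cons c) {
            index(c.head, table);
            index(c.tail, table);
        }
    }

    // Walker 2: replace each Ref with the object it names; this is where cycles appear
    static Object resolve(Object v, Map<String, Obj> table) {
        if (v instanceof Ref r) return table.get(r.id());
        if (v instanceof Obj o) {
            o.type = resolve(o.type, table);
            for (Slot s : o.slots) s.value = resolve(s.value, table);
        } else if (v instanceof Cons c) {
            c.head = resolve(c.head, table);
            c.tail = resolve(c.tail, table);
        }
        return v;
    }

    // Walker 3: collect objects and cons cells reached more than once, compared by identity
    static void findShared(Object v, Set<Object> seen, Set<Object> shared) {
        if (!(v instanceof Obj || v instanceof Cons)) return;
        if (!seen.add(v)) { shared.add(v); return; }
        if (v instanceof Obj o) {
            findShared(o.type, seen, shared);
            for (Slot s : o.slots) findShared(s.value, seen, shared);
        } else if (v instanceof Cons c) {
            findShared(c.head, seen, shared);
            findShared(c.tail, seen, shared);
        }
    }

    // Walker 4: print a shared node as n=... the first time and as $n afterwards
    static String print(Object v, Set<Object> shared, Map<Object, Integer> labels) {
        if (labels.containsKey(v)) return "$" + labels.get(v);
        String prefix = "";
        if (shared.contains(v)) {
            labels.put(v, labels.size());
            prefix = labels.get(v) + "=";
        }
        if (v instanceof Obj o) {
            return prefix + "(" + print(o.type, shared, labels) + ")[" + o.slots.stream()
                    .map(s -> s.name + "=" + print(s.value, shared, labels))
                    .collect(Collectors.joining(",")) + "]";
        }
        if (v instanceof Cons c) {
            return prefix + "(" + print(c.head, shared, labels) + "." + print(c.tail, shared, labels) + ")";
        }
        return prefix + (v instanceof Nil ? "[]" : String.valueOf(v));
    }

    static String show(Object root) {
        Map<String, Obj> table = new HashMap<>();
        index(root, table);
        Object graph = resolve(root, table);
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Set<Object> shared = Collections.newSetFromMap(new IdentityHashMap<>());
        findShared(graph, seen, shared);
        return print(graph, shared, new IdentityHashMap<>());
    }

    public static void main(String[] args) {
        // The example document, built by hand rather than parsed
        Object list = new Cons(10, new Cons(true, new Cons(new Ref("1"), Nil.NIL)));
        Obj root = new Obj("1", new Ref("1"), List.of(new Slot("x", list)));
        System.out.println(show(root));
    }
}
```

With the example document built by hand in main, the sketch prints 0=($0)[x=(10.(true.($0.[])))], the same string as the output above. Tracking nodes by identity (IdentityHashMap) rather than by equals is what lets the printer detect sharing and cycles without looping forever.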