Another very popular text format is XML. Unfortunately, there is no stable serialization/deserialization library for the XML format. However, this is not necessarily a shortcoming. XML is often used to store datasets so large that it would be inefficient to load them entirely before converting the data into an internal format. In these cases, it may be more efficient to scan the file or incoming stream and process it as it is read.
The xml_example project is a rather convoluted program that scans the XML file specified on the command line and, in a procedural fashion, loads information from the file into a Rust data structure. It is meant to read the ../data/sales.xml file. This file has a structure corresponding to the JSON file we saw in the previous section. The following lines show an excerpt of that file:
<?xml version="1.0" encoding="utf-8"?>
<sales-and-products>
<product>
<id>862</id>
</product>
<sale>
<id>2020-3987</id>
</sale>
</sales-and-products>
XML files typically start with a header (the XML declaration) in the first line, followed by exactly one root element; in this case, the root element is named sales-and-products. This element contains two kinds of elements: product and sale. Both kinds of elements have specific sub-elements, which are the fields of the corresponding data. In this example, only the id fields are shown.
To run the project, open its folder and type in cargo run ../data/sales.xml. Some lines will be printed on the console. The first four of them should be as follows:
Got product.id: 862.
Got product.category: fruit.
Got product.name: cherry.
Exit product: Product { id: 862, category: "fruit", name: "cherry" }
These describe the contents of the specified XML file. In particular, the program found a product with ID 862, then it detected that it is a fruit, then that it is a cherry, and then, when the whole product had been read, the whole struct representing the product was printed. A similar output will appear for sales.
The parsing is performed using only the xml-rs crate. This crate provides the parsing mechanism shown in the following code excerpt:
let file = std::fs::File::open(pathname).unwrap();
let file = std::io::BufReader::new(file);
let parser = EventReader::new(file);
for event in parser {
    match &location_item {
        LocationItem::Other => ...
        LocationItem::InProduct => ...
        LocationItem::InSale => ...
    }
}
An object of the EventReader type scans the buffered file and generates an event whenever a step is performed in the parsing. The application code handles these events according to its needs.
The word event is used by this crate, but the word transition would probably be a better description of the data extracted by the parser.
A complex language is hard to parse, but for languages as simple as our data, the situation during the parsing can be modeled by a state machine. For that purpose, three enum variables are declared in the source code: location_item, with the LocationItem type; location_product, with the LocationProduct type; and location_sale, with the LocationSale type.
The first one indicates the current position of the parsing in general. We can be inside a product (InProduct), inside a sale (InSale), or outside of both (Other). If we are inside a product, the LocationProduct enum indicates the current position of parsing inside the current product. This can be within any of the allowed fields or outside of all of them. Similar states happen for sales.
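The state machine described above can be sketched as follows. This is only an illustration, not the project's actual source: the variants Other, InProduct, InSale, and InCategory come from the text, while InId, InName, the contents of LocationSale, and the on_start and on_end helper functions are assumptions made for this example.

```rust
// Top-level state: which kind of item the parser is currently inside.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LocationItem {
    Other,
    InProduct,
    InSale,
}

// State inside a <product>. InId and InName are assumed variant names.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LocationProduct {
    Other,
    InId,
    InCategory,
    InName,
}

// State inside a <sale>. Only the id field appears in the excerpt, so the
// other fields of a sale are deliberately left out of this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LocationSale {
    Other,
    InId,
}

/// Hypothetical helper: top-level state after an element with this name begins.
fn on_start(current: LocationItem, element_name: &str) -> LocationItem {
    match (current, element_name) {
        (LocationItem::Other, "product") => LocationItem::InProduct,
        (LocationItem::Other, "sale") => LocationItem::InSale,
        // Sub-elements such as <id> do not change the top-level state.
        (state, _) => state,
    }
}

/// Hypothetical helper: top-level state after an element with this name ends.
fn on_end(current: LocationItem, element_name: &str) -> LocationItem {
    match (current, element_name) {
        (LocationItem::InProduct, "product") => LocationItem::Other,
        (LocationItem::InSale, "sale") => LocationItem::Other,
        (state, _) => state,
    }
}
```

Keeping the top-level state separate from the per-item states means that each match only has to consider the few transitions that are legal at that level.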
The iteration encounters several kinds of events. The main ones are the following:
- XmlEvent::StartElement: Signals that an XML element is beginning. It is decorated by the name of the beginning element and the possible attributes of that element.
- XmlEvent::EndElement: Signals that an XML element is ending. It is decorated by the name of the ending element.
- XmlEvent::Characters: Signals that the textual content of an element is available. It is decorated by that available text.
The program declares a mutable product struct, with the Product type, and a mutable sale struct, with the Sale type. They are initialized with default values. Whenever there are some characters available, they are stored in the corresponding field of the current struct.
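As a rough sketch of such a struct, the Product type could look like the following. The field names are taken from the program output shown earlier; the u32 type of id, the derived traits, and the store_id helper are assumptions for illustration. A Sale type would be analogous, but its fields other than id are not shown in the excerpt, so it is omitted here.

```rust
// Deriving Default gives the "default values" mentioned above:
// 0 for the number and empty strings for the text fields.
#[derive(Debug, Default, Clone, PartialEq)]
struct Product {
    id: u32, // assumed integer type
    category: String,
    name: String,
}

/// Hypothetical helper: stores the text of an <id> element in the struct,
/// reporting an error instead of panicking if the text is not a number.
fn store_id(product: &mut Product, characters: &str) -> Result<(), std::num::ParseIntError> {
    product.id = characters.trim().parse()?;
    Ok(())
}
```

Formatting a filled-in value with `{:?}` yields text in the same shape as the Exit product line printed by the program.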
For example, consider a situation where the value of location_item is LocationItem::InProduct and the value of location_product is LocationProduct::InCategory; that is, we are in the category field of a product. In this situation, the next event can be either the text of the category name or the end of the category element. To get the name of the category, the code contains this arm of a match statement:
Ok(XmlEvent::Characters(characters)) => {
    product.category = characters.clone();
    println!("Got product.category: {}.", characters);
}
In this statement, the characters variable gets the name of the category and a clone of it is assigned to the product.category field. Then, the name is printed to the console.
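To see how such arms fit together, here is a minimal, self-contained sketch of the handling of one product. It does not use xml-rs: the Event enum is a simplified stand-in for the three XmlEvent kinds discussed above (the real events carry richer data, such as qualified names and attributes), and process_product, InId, InName, and the u32 type of id are assumptions made for this example. Instead of printing, the function collects the lines it would print, so the result is easy to check.

```rust
// Simplified stand-in for the xml-rs event kinds; not the crate's real type.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    StartElement(String),
    EndElement(String),
    Characters(String),
}

#[derive(Debug, Default, Clone, PartialEq)]
struct Product {
    id: u32, // assumed integer type
    category: String,
    name: String,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum LocationProduct {
    Other,
    InId,
    InCategory,
    InName,
}

/// Feeds the events of one <product> element through the state machine and
/// returns the lines that would be printed to the console.
fn process_product(events: &[Event]) -> Vec<String> {
    let mut location_product = LocationProduct::Other;
    let mut product = Product::default();
    let mut log = Vec::new();
    for event in events {
        match (location_product, event) {
            // Between fields: a start tag selects the field we are entering.
            (LocationProduct::Other, Event::StartElement(name)) => {
                location_product = match name.as_str() {
                    "id" => LocationProduct::InId,
                    "category" => LocationProduct::InCategory,
                    "name" => LocationProduct::InName,
                    _ => LocationProduct::Other,
                };
            }
            // Between fields: the end of <product> completes the struct.
            (LocationProduct::Other, Event::EndElement(name)) if name.as_str() == "product" => {
                log.push(format!("Exit product: {:?}", product));
            }
            (LocationProduct::InId, Event::Characters(characters)) => {
                // In this sketch, malformed numbers fall back to 0.
                product.id = characters.trim().parse().unwrap_or(0);
                log.push(format!("Got product.id: {}.", characters));
            }
            (LocationProduct::InCategory, Event::Characters(characters)) => {
                product.category = characters.clone();
                log.push(format!("Got product.category: {}.", characters));
            }
            (LocationProduct::InName, Event::Characters(characters)) => {
                product.name = characters.clone();
                log.push(format!("Got product.name: {}.", characters));
            }
            // Any other end tag closes the current field.
            (_, Event::EndElement(_)) => location_product = LocationProduct::Other,
            _ => {}
        }
    }
    log
}
```

Matching on the pair of current state and event keeps every legal transition in a single arm, and anything unexpected falls through to the final catch-all arm.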