If you realize that you cannot track the objects created by your application because the code that creates them is in many different places instead of a single function/method, you should consider using the Factory Method pattern [Eckel08, page 187]. The Factory Method centralizes object creation, which makes tracking your objects much easier. Note that it is absolutely fine to create more than one Factory Method, and this is how it is typically done in practice. Each Factory Method logically groups the creation of objects that have similarities. For example, one Factory Method might be responsible for connecting you to different databases (MySQL, SQLite), another Factory Method might be responsible for creating the geometrical object that you request (circle, triangle), and so on.
The Factory Method is also useful when you want to decouple object creation from object usage. We are not coupled/bound to a specific class when creating an object; we just provide partial information about what we want by calling a function. This means that it is easy to change the function without requiring any changes to the code that uses it [Zlobin13, page 30].
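To make the decoupling concrete, here is a minimal sketch built on the geometrical example mentioned above. The names shape_factory, Circle, and Triangle, as well as the single size parameter (and the choice of an equilateral triangle), are assumptions made for illustration only.

```python
import math


class Circle:
    def __init__(self, size):
        self.radius = size

    def area(self):
        return math.pi * self.radius ** 2


class Triangle:
    """Equilateral triangle (an assumption that keeps the sketch short)."""

    def __init__(self, size):
        self.side = size

    def area(self):
        return math.sqrt(3) / 4 * self.side ** 2


def shape_factory(kind, size):
    """Hypothetical Factory Method: the caller names a kind, not a class."""
    shapes = {'circle': Circle, 'triangle': Triangle}
    if kind not in shapes:
        raise ValueError('Unknown shape: {}'.format(kind))
    return shapes[kind](size)


if __name__ == '__main__':
    for kind in ('circle', 'triangle'):
        shape = shape_factory(kind, 2)
        print(kind, round(shape.area(), 2))
```

The calling code depends only on shape_factory() and the area() method, so a new shape can be supported by touching the factory alone.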
Another use case worth mentioning is related to improving the performance and memory usage of an application. A Factory Method can improve both by creating new objects only if it is absolutely necessary [Zlobin13, page 28]. When we create objects using direct class instantiation, extra memory is allocated every time a new object is created (unless the class uses caching internally, which is usually not the case). We can see this in practice in the following code (file id.py), which creates two instances of the same class A and uses the id() function to compare their memory addresses. The addresses are also printed in the output so that we can inspect them. Different memory addresses mean that two distinct objects were created:
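A minimal sketch of what such a file might contain is shown here. Only the class name A and the use of id() come from the text; the function name main() and the exact print statements are assumptions.

```python
class A:
    pass


def main():
    a = A()
    b = A()
    # Both objects are alive at the same time, so they cannot share an address.
    print(id(a) == id(b))
    # The default repr of each instance includes its memory address.
    print(a, b)
    return a, b


if __name__ == '__main__':
    main()
```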
Executing id.py on my computer gives the following output:
Note that the addresses you see when you execute the file will differ from mine, because they depend on the current memory layout and allocation. The result, however, must be the same: the two addresses should be different. There is one exception, which occurs if you write and execute the code in the Python Read-Eval-Print Loop (REPL, the interactive prompt), but that is a REPL-specific optimization that does not normally happen.
Data comes in many forms. There are two main file categories for storing/retrieving data: human-readable files and binary files. Examples of human-readable files are XML, Atom, YAML, and JSON. Examples of binary files are the .sq3 file format used by SQLite and the .mp3 file format used to listen to music.
In this example, we will focus on two popular human-readable formats: XML and JSON. Although human-readable files are generally slower to parse than binary files, they make data exchange, inspection, and modification much easier. For this reason, it is advised to prefer working with human-readable files, unless there are other restrictions that do not allow it (mainly unacceptable performance and proprietary binary formats).
In this problem, we have some input data stored in an XML and a JSON file, and we want to parse them and retrieve some information. At the same time, we want to centralize the client's connection to those (and all future) external services. We will use the Factory Method to solve this problem. The example focuses only on XML and JSON, but adding support for more services should be straightforward.
First, let's take a look at the data files. The XML file, person.xml, is based on the Wikipedia example [j.mp/wikijson] and contains information about individuals (firstName, lastName, gender, and so on) as follows:
The JSON file, donut.json, comes from the GitHub account of Adobe [j.mp/adobejson] and contains donut information (type, price per unit, that is, ppu, topping, and so on) as follows:
[
    {
        "id": "0001",
        "type": "donut",
        "name": "Cake",
        "ppu": 0.55,
        "batters": {
            "batter": [
                { "id": "1001", "type": "Regular" },
                { "id": "1002", "type": "Chocolate" },
                { "id": "1003", "type": "Blueberry" },
                { "id": "1004", "type": "Devil's Food" }
            ]
        },
        "topping": [
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5005", "type": "Sugar" },
            { "id": "5007", "type": "Powdered Sugar" },
            { "id": "5006", "type": "Chocolate with Sprinkles" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
        ]
    },
    {
        "id": "0002",
        "type": "donut",
        "name": "Raised",
        "ppu": 0.55,
        "batters": {
            "batter": [
                { "id": "1001", "type": "Regular" }
            ]
        },
        "topping": [
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5005", "type": "Sugar" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
        ]
    },
    {
        "id": "0003",
        "type": "donut",
        "name": "Old Fashioned",
        "ppu": 0.55,
        "batters": {
            "batter": [
                { "id": "1001", "type": "Regular" },
                { "id": "1002", "type": "Chocolate" }
            ]
        },
        "topping": [
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
        ]
    }
]
We will use two libraries that are part of the Python distribution for working with XML and JSON: xml.etree.ElementTree and json as follows:
The JSONConnector class parses the JSON file and has a parsed_data() method that returns all data as native Python objects (for donut.json, a list of dictionaries, because the top-level JSON value is an array). The property decorator is used to make parsed_data() appear as a normal variable instead of a method as follows:
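A minimal sketch of such a class, assuming the file is read once in the constructor and kept in an attribute named data (both choices are assumptions; only the names JSONConnector and parsed_data come from the text):

```python
import json


class JSONConnector:
    def __init__(self, filepath):
        # Parse the whole file up front; json.load returns a list for a
        # top-level JSON array and a dict for a top-level JSON object.
        with open(filepath, mode='r', encoding='utf-8') as f:
            self.data = json.load(f)

    @property
    def parsed_data(self):
        # Accessed as connector.parsed_data, without parentheses.
        return self.data
```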
The XMLConnector class parses the XML file and has a parsed_data() method that returns all data as a list of xml.etree.ElementTree.Element objects as follows:
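A minimal sketch of such a class follows. It assumes that the returned list holds the direct children of the root element; which elements the list should contain is not stated in the text, so treat that choice, along with the attribute name tree, as an assumption.

```python
import xml.etree.ElementTree as etree


class XMLConnector:
    def __init__(self, filepath):
        # etree.parse reads the file and returns an ElementTree object.
        self.tree = etree.parse(filepath)

    @property
    def parsed_data(self):
        # Iterating over an Element yields its direct children.
        return list(self.tree.getroot())
```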
The connection_factory() function is a Factory Method. It returns an instance of JSONConnector or XMLConnector depending on the extension of the input file path as follows:
The connect_to() function is a wrapper of connection_factory(). It adds exception handling as follows:
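The factory and its wrapper can be sketched together. To keep the example self-contained, the two connector classes below are stand-ins that only store the file path. Raising ValueError for an unsupported extension and returning None from connect_to() on failure are assumptions, since the text does not name the exception type or the fallback value.

```python
class JSONConnector:
    """Stand-in for the JSON connector (parsing omitted)."""

    def __init__(self, filepath):
        self.filepath = filepath


class XMLConnector:
    """Stand-in for the XML connector (parsing omitted)."""

    def __init__(self, filepath):
        self.filepath = filepath


def connection_factory(filepath):
    # Choose the connector class from the file extension.
    if filepath.endswith('.json'):
        connector = JSONConnector
    elif filepath.endswith('.xml'):
        connector = XMLConnector
    else:
        raise ValueError('Cannot connect to {}'.format(filepath))
    return connector(filepath)


def connect_to(filepath):
    # Wrap the factory so that callers get None instead of an exception.
    factory = None
    try:
        factory = connection_factory(filepath)
    except ValueError as ve:
        print(ve)
    return factory
```

Calling connect_to() with an unsupported extension, such as a hypothetical person.sq3, prints the error message and returns None, so the caller never needs to know which connector classes exist.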
The main() function demonstrates how the Factory Method design pattern can be used. The first part makes sure that exception handling is effective as follows:
The next part shows how to work with the XML files using the Factory Method. XPath is used to find all person elements that have the last name Liar. For each matched person, the basic name and phone number information are shown as follows:
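The XPath lookup can be sketched on a small inline document. The sample data is hypothetical, and the phoneNumbers/phoneNumber element names and the type attribute are assumptions; only person, firstName, and lastName are named in the text. The helper names find_by_last_name() and describe() are also made up for this sketch.

```python
import xml.etree.ElementTree as etree

# Hypothetical sample data that loosely follows the fields named in the text.
SAMPLE_XML = """<persons>
  <person>
    <firstName>Ann</firstName>
    <lastName>Liar</lastName>
    <phoneNumbers>
      <phoneNumber type="home">555-0100</phoneNumber>
    </phoneNumbers>
  </person>
  <person>
    <firstName>Bob</firstName>
    <lastName>Smith</lastName>
    <phoneNumbers>
      <phoneNumber type="home">555-0101</phoneNumber>
    </phoneNumbers>
  </person>
</persons>"""


def find_by_last_name(root, last_name):
    """Return all person elements whose lastName child has the given text."""
    return root.findall(".//person[lastName='{}']".format(last_name))


def describe(person):
    """Build a one-line summary with the name and all phone numbers."""
    first = person.find('firstName').text
    last = person.find('lastName').text
    phones = ['{}: {}'.format(p.attrib['type'], p.text)
              for p in person.iter('phoneNumber')]
    return '{} {} ({})'.format(first, last, ', '.join(phones))


if __name__ == '__main__':
    root = etree.fromstring(SAMPLE_XML)
    for person in find_by_last_name(root, 'Liar'):
        print(describe(person))
```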
The final part shows how to work with the JSON files using the Factory Method. Here, there's no pattern matching, and therefore the name, price, and topping of all donuts are shown as follows:
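A minimal sketch of that loop, run on a shortened copy of the first donut.json entry. The helper name describe_donuts() and the output layout are assumptions.

```python
import json

# Shortened copy of the first entry of donut.json.
SAMPLE_JSON = """
[
  {
    "name": "Cake",
    "ppu": 0.55,
    "topping": [
      { "id": "5001", "type": "None" },
      { "id": "5002", "type": "Glazed" }
    ]
  }
]
"""


def describe_donuts(donuts):
    """Return one text line per name, price, and topping."""
    lines = []
    for donut in donuts:
        lines.append('name: {}'.format(donut['name']))
        lines.append('price: ${}'.format(donut['ppu']))
        for topping in donut['topping']:
            lines.append('topping: {} {}'.format(topping['id'], topping['type']))
    return lines


if __name__ == '__main__':
    for line in describe_donuts(json.loads(SAMPLE_JSON)):
        print(line)
```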
For completeness, here is the full code of the Factory Method implementation (factory_method.py):
Here is the output of this program:
Notice that although JSONConnector and XMLConnector have the same interface, what is returned by parsed_data() is not handled in a uniform way. Different Python code must be used to work with each connector. Although it would be nice to be able to use the same code for all connectors, this is usually not realistic unless we use some kind of common mapping for the data, which is very often provided by external data providers. Assuming that you can use exactly the same code for handling the XML and JSON files, what changes are required to support a third format, for example, SQLite? Find an SQLite file or create your own and try it.
As it is now, the code does not forbid direct instantiation of a connector. Is it possible to forbid it? Try doing it.
Hint: Functions in Python can have nested classes.