Unlocking the shapefile
At last, we are ready to start working with some geospatial data. Open up a command line or terminal window and cd
into the TM_WORLD_BORDERS-0.3
directory you unzipped earlier. Then type python
to fire up your Python interpreter.
We're going to start by loading the OGR library we installed earlier:
>>> import osgeo.ogr
We next want to open the shapefile
using OGR:
>>> shapefile = osgeo.ogr.Open("TM_WORLD_BORDERS-0.3.shp")
After executing this statement, the shapefile
variable will hold an osgeo.ogr.Datasource
object representing the geospatial data source we have opened. OGR data sources can support multiple layers of information, even though a shapefile
has only a single layer. For this reason, we next need to extract the (one and only) layer from the shapefile:
>>>layer = shapefile.GetLayer(0)
Let's iterate through the various features within the shapefile, processing each feature in turn. We can do this using the following:
>>> for i in range(layer.GetFeatureCount()): >>> feature = layer.GetFeature(i)
The feature
object, an instance of osgeo.ogr.Feature
, allows us to access the geometry associated with the feature, along with the feature's attributes. According to the README.txt
file, the country's name is stored in an attribute called NAME
. Let's extract that name now:
>>> feature_name = feature.GetField("NAME")
Note
Notice that the attribute is in uppercase. Shapefile attributes are case sensitive, so you have to use the exact capitalization to get the right attribute. Using feature.getField("name")
would generate an error.
To get a reference to the feature's geometry object, we use the GetGeometryRef()
method:
>>> geometry = feature.GetGeometryRef()
We can do all sorts of things with geometries, but for now, let's just see what type of geometry we've got. We can do this using the GetGeometryName()
method:
>>>> geometry_type = geometry.GetGeometryName()
Finally, let's print out the information we have extracted for this feature:
>>> print i, feature_name, geometry_type
Here is the complete mini-program we've written to unlock the contents of the shapefile:
import osgeo.ogr shapefile = osgeo.ogr.Open("TM_WORLD_BORDERS-0.3.shp") layer = shapefile.GetLayer(0) for i in range(layer.GetFeatureCount()): feature = layer.GetFeature(i) feature_name = feature.GetField("NAME") geometry = feature.GetGeometryRef() geometry_type = geometry.GetGeometryName() print i, feature_name, geometry_type
If you press Return a second time to close off the for
loop, your program will run, displaying useful information about each country extracted from the shapefile:
0 Antigua and Barbuda MULTIPOLYGON 1 Algeria POLYGON 2 Azerbaijan MULTIPOLYGON 3 Albania POLYGON 4 Armenia MULTIPOLYGON 5 Angola MULTIPOLYGON 6 American Samoa MULTIPOLYGON 7 Argentina MULTIPOLYGON 8 Australia MULTIPOLYGON 9 Bahrain MULTIPOLYGON ...
Notice that the geometry associated with some countries is a polygon, while for other countries the geometry is a multipolygon. As the name suggests, a multipolygon is simply a collection of polygons. Because the geometry represents the outline of each country, a polygon is used where the country's outline can be represented by a single shape, while a multipolygon is used when the outline has multiple parts. This most commonly happens when a country is made up of multiple islands. For example:
As you can see, Algeria is represented by a polygon, while Australia with its outlying islands would be a multipolygon.