Transforming the coordinate system and calculating the area of all countries
Now, the objective is to know how much area is occupied by each country. However, the coordinates of country borders are expressed in latitude and longitude, and we can't calculate areas in this coordinate system. We want the area to be in the metric system, so first we need to convert the spatial reference system of the geometries.
Let's also take a step further in the programming techniques and start using functions to avoid the repetition of code. Perform the following steps:
- Create a new file in the
Chapter1
directory, name this fileworld_areas.py
, and program this first function:import ogr def open_shapefile(file_path): """Open the shapefile, get the first layer and returns the ogr datasource. """ datasource = ogr.Open(file_path) layer = datasource.GetLayerByIndex(0) print("Opening {}".format(file_path)) print("Number of features: {}".format( layer.GetFeatureCount())) return datasource
- Run the code, go to Run | Run... in the menu, and select
world_areas
. If everything is correct, nothing should happen. This is because we are not calling our function. Add this line of code at the end and outside the function:datasource = open_shapefile("../data/world_borders_simple.shp")
- Now, run the code again with Shift + F10 and check the output, as follows:
Opening ../data/world_borders_simple.shp Number of features: 246 Process finished with exit code 0
That's wonderful! You just created a piece of very useful and reusable code. You now have a function that can open any shapefile, print the number of features, and return the ogr
datasource. From now on, you can reuse this function in any of your projects.
You are already familiar with how this code works, but there are a few novelties here that deserve an explanation. The def
statement defines a function with the def function_name(arguments):
syntax.
Remember when I told you that OGR doesn't follow Python's naming convention? Well, the convention is that function names should all be in lowercase with an underscore between words. A good hint for names is to follow the verb_noun
rule.
Tip
These conventions are described in a document called PEP-8, where PEP stands for Python Enhancement Program. You can find this document at https://www.python.org/dev/peps/pep-0008/.
Right after the function's definition, you can see a description between triple quotes; this is a docstring, and it is used to document the code. It's optional but very useful for you to know what the function does.
Now, let's get back to our code. The second important thing to point out is the return
statement. This makes the function return the values of the variables listed after the statement—in this case, the datasource.
Note
It's very important that all pieces of the OGR objects flow together through the program. In this case, if we return only the layer, for example, we will get a runtime error later in our program. This happens because in OGR internals, the layer has a reference to the data source, and when you exit a Python function, all objects that don't exit the function are trashed, and this breaks the reference.
Now, the next step is to create a function that performs the transformation. In OGR, the transformation is made in the feature's geometry, so we need to iterate over the features, get the geometry, and transform its coordinates. We will do this using the following steps:
- Add the following function to your
world_areas.py
file just after theopen_shapefile
function:def transform_geometries(datasource, src_epsg, dst_epsg): """Transform the coordinates of all geometries in the first layer. """ # Part 1 src_srs = osr.SpatialReference() src_srs.ImportFromEPSG(src_epsg) dst_srs = osr.SpatialReference() dst_srs.ImportFromEPSG(dst_epsg) transformation = osr.CoordinateTransformation(src_srs, dst_srs) layer = datasource.GetLayerByIndex(0) # Part 2 geoms = [] layer.ResetReading() for feature in layer: geom = feature.GetGeometryRef().Clone() geom.Transform(transformation) geoms.append(geom) return geoms
The function takes three arguments: the ogr layer, the EPSG code of the coordinate system of the file, and the EPSG code for the transformation output.
Here, it created an
osr.CoordinateTransformation
object; this object contains the instructions to perform the transformation.Probably by now, Pycharm should be complaining that
osr
is anunresolved reference
;osr
is the part of GDAL that deals with coordinate systems. - Now, import the module by adding this line at the top of your code:
import osr
Here, the code iterates over all features, gets a reference to the geometry, and performs the transformation. As we don't want to change the original data, the geometry is cloned, and the transformation is made on the clone.
Python lists are ordered; this means that the elements are in the same order in which they are appended to the list, and this order is always kept. This allows us to create a list of geometries in the same order of the features that are in the data source. This means that the geometries in the list and the features have the same index and can be related in the future by the index.
- Now, let's test the code; add the following lines at the end of the file (the first line is the one that you already added before):
datasource = open_shapefile("../data/world_borders_simple.shp") layer = datasource.GetLayerByIndex(0) feature = layer.GetFeature(0) print("Before transformation:") print(feature.GetGeometryRef()) transformed_geoms = transform_geometries(datasource, 4326, 3395) print("After transformation:") print(transformed_geoms[0])
- Finally, before you run the code, add one more
import
at the beginning of the program. It should be the first statement of your code, as follows:from __future__ import print_function
This
import
allows us to use theprint()
function from Python 3 with the desired behavior, thus maintaining the compatibility. - The complete code should look similar to this:
from __future__ import print_function import ogr import osr def open_shapefile(file_path): ... def transform_geometries(datasource, src_epsg, dst_epsg): ... datasource = open_shapefile("../data/world_borders_simple.shp") layer = datasource.GetLayerByIndex(0) feature = layer.GetFeature(0) print("Before transformation:") print(feature.GetGeometryRef()) transformed_geoms = transform_geometries(datasource, 4326, 3395) print("After transformation:") print(transformed_geoms[0])
- Run your program again by pressing Shift + F10. In the output, note the difference in the coordinates before and after the transformation:
Opening ../data/world_borders_simple.shp Number of features: 246 Before transformation: MULTIPOLYGON (((-61.686668 17.024441000000138 ... ))) After transformation: MULTIPOLYGON (((-6866928.4704937246 ... ))) Process finished with exit code 0
- Now, add another function. This function will calculate the area in square meters (because we will use the geometries that have coordinates in meters), convert the value (or not) to square kilometers or square miles, and store the values in another list with the same order as before. Execute the following code:
def calculate_areas(geometries, unity='km2'): """Calculate the area for a list of ogr geometries.""" # Part 1 conversion_factor = { 'sqmi': 2589988.11, 'km2': 1000000, 'm': 1} # Part2 if unity not in conversion_factor: raise ValueError( "This unity is not defined: {}".format(unity)) # Part 3 areas = [] for geom in geometries: area = geom.Area() areas.append(area / conversion_factor[unity]) return areas
Firstly, note that in the function definition, we use
unity='km2'
; this is a keyword argument, and when you call the functions, this argument is optional.In
Part 1
, a dictionary is used to define a few conversion factors for the area unit. Feel free to add more units if you wish. By the way, Python doesn't care if you use single or double quotes.In
Part 2
, a verification is made to check whether the passed unity exists and whether it is defined inconversion_factor
. Another way of doing this is catching the exception later; however, for now, let's opt for readability.In
Part 3
, the code iterates theogr
geometries, calculates the area, converts the values, and puts it on a list. - Now, to test the code, edit your first line, including
division
to the future imports. This will ensure that all divisions return floating point numbers and not integers. Here's how it should look:from __future__ import print_function, division
- Then, update the testing part of your code to the following:
datasource = open_shapefile("../data/world_borders_simple.shp") transformed_geoms = transform_geometries(datasource, 4326, 3395) calculated_areas = calculate_areas(transformed_geoms, unity='sqmi') print(calculated_areas)
- Run it, change the unity, then run again, and note how the results change.
Very well, unity conversion is another very important procedure in geoprocessing, and you just implemented it in your calculate_areas
function.
However, having a list of numbers as the output is not very useful to us. So, it's time to combine everything that we did so far in order to extract valuable information.