Learning Geospatial Analysis with Python-Second Edition

An effective guide to geographic information systems and remote sensing analysis using Python 3

Product type: Paperback
Published in: Dec 2015
Publisher: Packt
ISBN-13: 9781783552429
Length: 394 pages
Edition: 1st Edition
Author: Joel Lawhead
Table of Contents (12 chapters)

Preface
1. Learning Geospatial Analysis with Python
2. Geospatial Data
3. The Geospatial Technology Landscape
4. Geospatial Python Toolbox
5. Python and Geographic Information Systems
6. Python and Remote Sensing
7. Python and Elevation Data
8. Advanced Geospatial Python Modeling
9. Real-Time Data
10. Putting It All Together
Index

Common vector GIS concepts

This section will discuss the different types of GIS processes commonly used in geospatial analysis. This list is not exhaustive; however, it provides you with the essential operations that all other operations are based on. If you understand these operations, you can quickly understand much more complex processes as they are either derivatives or combinations of these processes.

Data structures

GIS vector data uses coordinates consisting of, at a minimum, an x horizontal value and a y vertical value to represent a location on the Earth. In many cases, a point may also contain a z value. Other ancillary values are possible including measurements or timestamps.

These coordinates are used to form points, lines, and polygons to model real-world objects. Points can be a geometric feature in and of themselves or they can connect line segments. Closed areas created by line segments are considered polygons. Polygons model objects such as buildings, terrain, or political boundaries.

A GIS feature can consist of a single point, line, or polygon or it can consist of more than one shape. For example, in a GIS polygon dataset containing world country boundaries, the Philippines, which is made up of 7,107 islands, would be represented as a single country made up of thousands of polygons.
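As a rough illustration of these structures, the following sketch uses plain Python tuples and lists. The variable names and the dictionary layout are assumptions made for this example and do not correspond to any particular GIS file format or library:

```python
# Hypothetical, library-free representations of vector geometry.
point = (-89.5, 30.2)            # (x, y)
point_z = (-89.5, 30.2, 12.0)    # (x, y, z) with an optional elevation value

# A line is an ordered list of points.
line = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]

# A polygon is a closed ring: the first and last points are identical.
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]

# A single feature made of more than one shape, such as a country of islands.
island_nation = {
    "attributes": {"name": "Example Islands"},
    "parts": [
        polygon,
        [(10.0, 10.0), (12.0, 10.0), (11.0, 12.0), (10.0, 10.0)],
    ],
}

print(len(island_nation["parts"]))  # 2
```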

Vector data typically represents topographic features better than raster data. Vector data has better accuracy potential and is more precise. However, collecting vector data on a large scale has traditionally been more costly than collecting raster data.

Two other important terms related to vector data structures are bounding box and convex hull. The bounding box or minimum bounding box is the smallest possible rectangle, aligned with the coordinate axes, that contains all of the points in a dataset. The following image demonstrates a bounding box for a collection of points:

[Image: bounding box for a collection of points]
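A bounding box can be computed by taking the minimum and maximum of the x and y values. The following is a minimal sketch; the function name bounding_box and the sample points are made up for illustration:

```python
def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a sequence of (x, y) points."""
    xs, ys = zip(*points)
    return min(xs), min(ys), max(xs), max(ys)


sample_points = [(1, 5), (3, 2), (6, 4), (4, 7), (2, 3)]
print(bounding_box(sample_points))  # (1, 2, 6, 7)
```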

The convex hull of a dataset is similar to the bounding box, but instead of a rectangle, it is the smallest possible convex polygon that can contain a dataset. The bounding box of a dataset always contains its convex hull. The following image shows the same point data as the previous example with the convex hull polygon shown in red:

[Image: convex hull polygon, shown in red, around the same collection of points]
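The text does not prescribe a particular hull algorithm. One common choice is Andrew's monotone chain, sketched below under the assumption that points are (x, y) tuples. The function names are illustrative only:

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (ring not closed)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would create a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


hull = convex_hull([(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)])
print(hull)  # [(0, 0), (4, 0), (4, 4), (0, 4)]
```

Interior points such as (2, 2) and (1, 3) are discarded, leaving only the corner vertices.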

Buffer

A buffer operation can be applied to spatial objects including points, lines, or polygons. This operation creates a polygon around the object at a specified distance. Buffer operations are used for proximity analysis, for example, establishing a safety zone around a dangerous area. In the following image, the black shapes represent the original geometry while the red outlines represent the larger buffer polygons generated from the original shape:

[Image: original geometry in black with red buffer polygons around it]
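For the simplest case, a point, a buffer can be approximated by a polygon whose vertices lie on a circle around that point. The sketch below covers only this case; buffering lines and polygons is considerably more involved and is normally left to a geometry library. The name buffer_point and the segments parameter are assumptions for this example:

```python
import math


def buffer_point(x, y, distance, segments=32):
    """Approximate a circular buffer around (x, y) as a closed polygon ring."""
    ring = []
    for i in range(segments):
        angle = 2 * math.pi * i / segments
        ring.append((x + distance * math.cos(angle),
                     y + distance * math.sin(angle)))
    ring.append(ring[0])  # close the ring
    return ring


safety_zone = buffer_point(1.0, 2.0, 5.0)
print(len(safety_zone))  # 33
```

A larger segments value gives a smoother outline at the cost of more points.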

Dissolve

A dissolve operation creates a single polygon out of adjacent polygons. A common use for a dissolve operation is to merge two adjacent properties in a tax database that have been purchased by a single owner. Dissolves are also used to simplify data extracted from remote sensing:

[Image: adjacent polygons dissolved into a single polygon]
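One simple way to picture a dissolve is to remove the edges that two polygons share and keep only the outer boundary. The sketch below works only under strong assumptions: every ring is closed, all rings are wound in the same direction, shared edges use identical vertices, and the result is a single ring without holes. It is not a general-purpose implementation:

```python
def dissolve(polygons):
    """Dissolve edge-sharing polygons into one closed ring (see assumptions)."""
    edges = [edge for ring in polygons for edge in zip(ring[:-1], ring[1:])]
    edge_set = set(edges)
    # A shared edge appears once in each direction; keep only unshared edges.
    boundary = {a: b for a, b in edges if (b, a) not in edge_set}
    start = next(iter(boundary))
    ring = [start]
    # Follow the boundary edges from vertex to vertex until the ring closes.
    for _ in range(len(boundary)):
        ring.append(boundary[ring[-1]])
    return ring


parcel_a = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
parcel_b = [(1, 0), (2, 0), (2, 1), (1, 1), (1, 0)]
print(dissolve([parcel_a, parcel_b]))
# [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1), (0, 0)]
```

The vertices (1, 0) and (1, 1) remain on the outline even though they are now collinear with their neighbors; a generalize step could remove them.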

Generalize

Objects that have more points than necessary for the geospatial model can be generalized to reduce the number of points used to represent the shape. This operation usually requires a few attempts to get the optimal number of points without compromising the overall shape. It is a data optimization technique that simplifies data for more efficient computing or better visualization. This technique is useful in web-mapping applications. Computer screens have a limited resolution, traditionally cited as 72 dots per inch (dpi), although many modern displays are denser. Highly detailed point data, which would not be visible at that resolution, can be reduced so that less bandwidth is used to send a visually equivalent map to the user:

[Image: a shape before and after generalization]
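The text does not name a specific generalization algorithm. A widely used one is the Douglas-Peucker algorithm, which drops every point that lies within a given tolerance of the straight line between the points that are kept. The following is a minimal recursive sketch with illustrative function names:

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dx == 0 and dy == 0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    return abs(dy * (p[0] - a[0]) - dx * (p[1] - a[1])) / math.hypot(dx, dy)


def simplify(points, tolerance):
    """Douglas-Peucker simplification of an ordered list of (x, y) points."""
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the line joining the two end points.
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    # Keep the farthest point and simplify both halves separately.
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


track = [(0, 0), (1, 0.1), (2, -0.1), (3, 0), (4, 4), (5, 0)]
print(simplify(track, 0.5))  # [(0, 0), (3, 0), (4, 4), (5, 0)]
```

Trying a few tolerance values, as the text suggests, is usually how a suitable balance between point count and shape fidelity is found.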

Intersection

An intersection operation is used to see whether any part of a feature intersects with one or more other features. This operation is used for spatial queries in proximity analysis and is often a follow-on operation to a buffer analysis:

[Image: intersecting features]
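At the lowest level, intersection tests between lines and polygon boundaries come down to checking whether two line segments cross. The sketch below uses the standard orientation test for that building block; it is not a full feature-level intersection, and the function names are chosen for this example:

```python
def orientation(a, b, c):
    """Positive if a->b->c turns counter-clockwise, negative if clockwise, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def on_segment(a, b, p):
    """True if p, already known to be collinear with a-b, lies between a and b."""
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0]) and
            min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    """True if segment p1-p2 crosses or touches segment q1-q2."""
    d1 = orientation(q1, q2, p1)
    d2 = orientation(q1, q2, p2)
    d3 = orientation(p1, p2, q1)
    d4 = orientation(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True  # each segment straddles the other
    # Touching cases: an end point lies on the other segment.
    return ((d1 == 0 and on_segment(q1, q2, p1)) or
            (d2 == 0 and on_segment(q1, q2, p2)) or
            (d3 == 0 and on_segment(p1, p2, q1)) or
            (d4 == 0 and on_segment(p1, p2, q2)))


print(segments_intersect((0, 0), (4, 4), (0, 4), (4, 0)))  # True
print(segments_intersect((0, 0), (1, 1), (0, 1), (1, 2)))  # False
```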

Merge

A merge operation combines two or more non-overlapping shapes into a single multishape object. In a multishape object, the shapes maintain separate geometries but are treated by the GIS as a single feature with a single set of attributes:

[Image: separate shapes merged into a single multishape feature]
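A merge can be sketched as collecting the geometries of several features under one attribute record. The dictionary layout with "attributes" and "parts" keys is a hypothetical structure for this example only:

```python
def merge(features, attributes):
    """Combine features into one multishape feature with a single attribute set."""
    return {
        "attributes": dict(attributes),
        # Geometries are kept separate; nothing is dissolved or reshaped.
        "parts": [part for feature in features for part in feature["parts"]],
    }


north_island = {"attributes": {"id": 1}, "parts": [[(0, 0), (2, 0), (1, 2), (0, 0)]]}
south_island = {"attributes": {"id": 2}, "parts": [[(5, 5), (7, 5), (6, 7), (5, 5)]]}

country = merge([north_island, south_island], {"name": "Example Islands"})
print(len(country["parts"]))  # 2
```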

Point in polygon

A fundamental geospatial operation is checking to see whether a point is inside a polygon. This one operation is the atomic building block of many different types of spatial queries. If the point is on the boundary of the polygon, it is considered inside. Very few spatial queries exist that do not rely on this calculation in some way. However, it can be very slow on a large number of points.

One of the most common and efficient algorithms to detect whether a point is inside a polygon is the ray casting algorithm. First, a test is performed to see if the point is on the polygon boundary. Next, the algorithm draws a line from the point in question in a single direction. The program counts the number of times the line crosses the polygon boundary until it reaches the bounding box of the polygon, which is the smallest box that can be drawn around the entire polygon. If the number of boundary crossings is odd, the point is inside. If it is even, the point is outside:

[Image: ray casting from a point across a polygon boundary]
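The steps described above can be sketched as follows. The example assumes a closed ring of (x, y) tuples, casts the ray in the positive x direction, and treats boundary points as inside, as the text states. The function names and the eps tolerance are assumptions for this sketch:

```python
def point_on_segment(p, a, b, eps=1e-12):
    """True if p lies on the segment a-b (within a small tolerance)."""
    cross = (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    if abs(cross) > eps:
        return False
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0]) and
            min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))


def point_in_polygon(point, ring):
    """Ray casting test; points on the boundary count as inside."""
    x, y = point
    edges = list(zip(ring[:-1], ring[1:]))
    # Step 1: boundary check.
    if any(point_on_segment(point, a, b) for a, b in edges):
        return True
    # Step 2: count crossings of a ray heading in the +x direction.
    inside = False
    for (x1, y1), (x2, y2) in edges:
        if (y1 > y) != (y2 > y):  # edge spans the ray's y value
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside  # odd number of crossings means inside
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
print(point_in_polygon((2, 2), square))  # True
print(point_in_polygon((5, 2), square))  # False
print(point_in_polygon((4, 2), square))  # True (on the boundary)
```

Toggling a boolean on each crossing is equivalent to counting crossings and checking whether the total is odd.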

Union

The union operation is less common but very useful for combining two or more overlapping polygons into a single shape. It is similar to dissolve, but in this case, the polygons overlap rather than being adjacent. Usually, this operation is used to clean up automatically generated feature datasets from remote sensing operations:

[Image: overlapping polygons combined into a single shape by a union]

Join

A join or SQL join is a database operation used to combine two or more tables of information. Relational databases are designed to avoid storing redundant information for one-to-many relationships. For example, a U.S. state may contain many cities. Rather than creating a table for each state containing all of its cities, a table of states with numeric IDs is created, while a single table of all the cities in every state stores the numeric ID of each city's state.

In a GIS, you can also have spatial joins that are part of the spatial extension software for a database. In a spatial join, you combine the attributes of two features in the same way that you do in a SQL join, but the relation is based on the spatial proximity of the two features. To follow the previous cities example, we could add the name of the county that each city resides in using a spatial join. The cities layer could be loaded over a county polygon layer whose attributes contain the county name. The spatial join would determine which city is in which county and perform a SQL join to add the county name to each city's attribute row.
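The cities and counties example can be sketched without a database by pairing a containment test with an attribute copy. Everything here is hypothetical: the county and city names, the dictionary keys, and the helper functions. A real spatial database extension would do this through its own spatial SQL functions:

```python
def contains(ring, point):
    """Ray casting containment test; boundary cases are not handled here."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring[:-1], ring[1:]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def spatial_join(cities, counties):
    """Add a 'county' attribute to each city based on which polygon contains it."""
    joined = []
    for city in cities:
        county = next((c["name"] for c in counties
                       if contains(c["ring"], city["xy"])), None)
        joined.append({**city, "county": county})
    return joined


counties = [
    {"name": "East County", "ring": [(5, 0), (10, 0), (10, 5), (5, 5), (5, 0)]},
    {"name": "West County", "ring": [(0, 0), (5, 0), (5, 5), (0, 5), (0, 0)]},
]
cities = [
    {"name": "Alpha", "xy": (1, 1)},
    {"name": "Beta", "xy": (7, 3)},
    {"name": "Gamma", "xy": (20, 20)},
]

for row in spatial_join(cities, counties):
    print(row["name"], row["county"])
# Alpha West County
# Beta East County
# Gamma None
```

A city that falls outside every county polygon receives None, similar to an unmatched row in a SQL left join.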

Geospatial rules about polygons

In geospatial analysis, there are several general rules of thumb regarding polygons that are different from mathematical descriptions of polygons:

  • Polygons must have at least four points—the first and last points must be the same
  • A polygon boundary should not overlap itself
  • Polygons in a layer shouldn't overlap
  • A polygon in a layer inside another polygon is considered as a hole in the underlying polygon

Different geospatial software packages and libraries handle exceptions to these rules differently, which can lead to confusing errors or software behaviors. The safest route is to make sure that your polygons obey these rules.

There is one more important piece of information about polygons. A polygon is by definition a closed shape, which means that the first and last vertices of the polygon are identical. Some geospatial software will throw an error if you haven't explicitly duplicated the first point as the last point in the polygon dataset. Other software will automatically close the polygon without complaining. The data format that you use to store your geospatial data may also dictate how polygons are defined. This issue is a gray area, but knowing this quirk will come in handy someday when you run into an error that you can't explain easily.
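A small defensive helper can close a ring explicitly before handing it to other software. The sketch below checks only the closure and minimum point count rules; it does not detect self-overlap or overlaps between polygons. The function names are illustrative:

```python
def close_ring(ring):
    """Return a copy of the ring with the first point repeated at the end if needed."""
    ring = list(ring)
    if ring and ring[0] != ring[-1]:
        ring.append(ring[0])
    return ring


def is_valid_ring(ring):
    """Check the 'at least four points, first equals last' rule only."""
    return len(ring) >= 4 and ring[0] == ring[-1]


triangle = [(0, 0), (1, 0), (1, 1)]
print(is_valid_ring(triangle))              # False
print(is_valid_ring(close_ring(triangle)))  # True
```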
