Recent discussions on both the SANPCPA group and now the KRMNPA group have shown that many people are trying to find information about park boundaries, locations, and other fun facts, like whether or not SOTA peaks exist within the park boundaries. This usually involves a substantial amount of trawling through Internet sites, and usually a lot of swearing, some of it colourful.
As much as I enjoy a good curse, it pains me to see folks repeating each other’s mistakes, reinventing the wheel and generally avoiding what is, for me at least, a more efficient way. I have been meaning to explain this for a while, and now is the time to unleash the power of GIS. I’ve used it to generate the initial SOTA to WWFF mappings used on Parks n’ Peaks, grab a list of the latest CPs for the VK5 Parks award, and also to answer some questions on the NA SOTA group around boundary summits near association borders.
I will now detail the technique.
What is GIS?
GIS stands for Geographic Information Systems. This is a very active area of research, covering the breadth of surveying, planning, spatial database research and general computer science. Its main benefit to us is that the researchers have done almost all the heavy lifting already. There’s almost no query you might wish to answer that hasn’t already been considered, in generic form, by GIS boffins. This is useful.
The key point is getting GIS data for what you are looking for. In our case, what we want access to is the CAPAD 2014 database. CAPAD stands for Collaborative Australian Protected Areas Database, and we want the latest version, 2014. It is updated every 2 years, and is probably as close to canonical data as you can find. It is collated by each state government and then coordinated by the Federal Government into a single database of information. I believe they are required to do this under treaty obligations, but whatever the reason, the data is available to the public under a Creative Commons CC-BY license (anyone can use it, you just have to acknowledge where the data came from).
Getting the data
The first step is to get the data from the FED site. This used to be known as the DIG site, but you know, new government and all. In the search box, enter CAPAD 2014. The first two search results are the 2014 examples, one for terrestrial parks, the other for marine. For the sake of demonstration, we’ll just take the first one.
Click on the first entry, then on the words “Full Statement”. It will pop up a new tab containing information about the data. At the top of that, there should be two options – Details and Download. Click on Download. You’ll be presented with a file list containing one file, a zip file, in ArcGIS Shapefile format. Click on that, and save it to a location that you can find again.
Once it is downloaded, unzip the contents into another location you can find again.
Reading the data
“But wait, what is ArcGIS Shapefile format? That doesn’t sound all spreadsheet-y or word processor-y”, I hear you cry (and if I don’t hear you cry, then you hear me cry rhetorically). ArcGIS Shapefiles (or ESRI shapefiles now they are standardised) are the most common professional GIS format out there – KML/KMZ files from Google Earth may outnumber them, but don’t seem to have all of the fun capabilities of Shapefiles, and frankly, almost all government data comes in Shapefile format, regardless of the government involved.
ArcGIS is an expensive product, so instead, we download QGIS, a free and open source GIS package. I run Linux exclusively, so under Fedora, I simply type “yum install qgis” in a root shell. Other distros will be similar. If you run Windows or a Mac, there are options for download from the website, and a quick Google search will help you find telephone numbers of people who can offer counselling and support for your illness.
Once it’s installed, you are good to go.
Let’s open up QGIS now through whatever means your operating system requires (Start Menu, typing ‘qgis’, etc). You will be presented with a simple screen:
QGIS 2.0.1 after it is opened. Later versions may differ a little.
Let’s now open up CAPAD.
QGIS works on the principle of layers. There can be many different attributes for a particular location that exist in different formats. For example, Anakie Gorge is located in Brisbane Ranges National Park (a NP layer), Victoria (a state layer), Golden Plains Shire (a LGA layer), and Australia (a country layer). Layers can also be used to combine information from different datasets.
There are two main kinds of layers, vector and raster. Vector data consists of points (SOTA peaks, eg), lines (roads) or polygons (park boundaries and all their inclusive area). Raster data is evenly spaced data, usually elevation data or data such as Landsat data (this square is grass, this square is industrial, etc). CAPAD data, and shapefile data in general, is Vector data.
Go to the Layer menu, and choose “Add Vector Layer”. A dialog will pop up. The defaults are usually fine: System encoding, and File as the source type (we’re using a file). Now, select Browse, go to the location you unzipped CAPAD into, and select the CAPAD_2014_Terrestrial.shp file:
Select the CAPAD file
OK the dialog box, and watch the screen fill with polygon data:
You can see the rough outline of Australia, and you can see parks spreading all the way out to Heard Island and McDonald Islands to the south-west. Success! So now, what can we do with this data?
Exporting to KML
Let’s try something simple – an export to KML of all the VK5 Parks for the Parks award.
On the layers tab to the left of the map, you will see the words “CAPAD_2014_Terrestrial”. You can right click on that name to bring up a context menu. Let’s first choose “Open Attribute Table”.
The Attribute Table
This shows us all the data we have in CAPAD. Spend a few minutes scrolling around to see what is here. You can select items here and they will appear yellow in the main screen, although we are still zoomed out a fair way and it might not be obvious.
At the top of the window, there are buttons. The fourth one from the left is a yellow rectangle with a Greek letter epsilon on it – Select by Expression. Click on that.
Under Fields and Values, you can find all of the fields we can query against. We’re going to use TYPE_ABBR and STATE. Double click on STATE, and you will see it added to the Expression box below. Make the box read
"STATE" = 'SA' AND ("TYPE_ABBR" = 'CP' OR "TYPE_ABBR" = 'NP'). The use of quotes is important: double quotes go around field names, and single quotes go around values. Make sure the words “Expression is invalid” are not showing. If they are, recheck what you have typed. Click on Select, then close the dialog and the attribute table.
Selecting by Expression
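If you think better in code, the expression above is just a boolean filter over the rows of the attribute table. Here is a minimal Python sketch of the same logic. The rows and park names are made up for illustration, and I am assuming a NAME field alongside the STATE and TYPE_ABBR fields used above; real CAPAD records carry many more columns.

```python
# Illustrative rows only (made-up names); not taken from CAPAD itself.
parks = [
    {"NAME": "Example Park A", "STATE": "SA", "TYPE_ABBR": "CP"},
    {"NAME": "Example Park B", "STATE": "SA", "TYPE_ABBR": "NP"},
    {"NAME": "Example Park C", "STATE": "SA", "TYPE_ABBR": "GR"},
    {"NAME": "Example Park D", "STATE": "VIC", "TYPE_ABBR": "NP"},
]


def is_vk5_park(record):
    """Mirror of: "STATE" = 'SA' AND ("TYPE_ABBR" = 'CP' OR "TYPE_ABBR" = 'NP')."""
    return record["STATE"] == "SA" and record["TYPE_ABBR"] in ("CP", "NP")


# Keep only the rows the expression would select.
selected = [p["NAME"] for p in parks if is_vk5_park(p)]
print(selected)
```

The brackets in the QGIS expression matter for the same reason they would in code: without them, AND binds tighter than OR and you would pick up every NP in the country.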
You should now see some yellow areas, particularly in the bigger NPs in the north of South Australia. Right click again on the CAPAD layer in the layer tab and choose “Save Selection As”. From the format box, choose “Keyhole Markup Language [KML]”. Enter a filename, or browse to save the file. The rest of the defaults are fine, so choose OK.
Save as KML format
You will now have a KML file with the boundaries of the parks shown located in your saved file. That took no more than 10 minutes – including download time. Easy!
What about a CSV?
Follow the same process as above, but instead of Keyhole Markup Language as a Format, choose Comma Separated Values! Use the X_Coord and Y_Coord columns to find the parks’ longitudes and latitudes respectively.
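Once you have the CSV, pulling the coordinates out is a few lines of Python. This is only a sketch: the sample rows are made up, the NAME column is an assumption, and the in-memory sample stands in for your exported file (swap it for an open() call on whatever filename you chose).

```python
import csv
import io

# Stand-in for the exported CSV, with made-up values. In practice, replace
# this with open("your_export.csv", newline="") (hypothetical filename).
sample = io.StringIO(
    "NAME,TYPE_ABBR,X_Coord,Y_Coord\n"
    "Example Park A,CP,138.70,-34.90\n"
    "Example Park B,NP,138.60,-32.50\n"
)


def park_positions(handle):
    """Return (name, latitude, longitude) tuples from an exported CSV."""
    rows = []
    for row in csv.DictReader(handle):
        # X is longitude (east-west), Y is latitude (north-south).
        rows.append((row["NAME"], float(row["Y_Coord"]), float(row["X_Coord"])))
    return rows


positions = park_positions(sample)
for name, lat, lon in positions:
    print(f"{name}: {lat:.4f}, {lon:.4f}")
```

Note the swap when printing: most of us say "latitude, longitude", but GIS tools store X (longitude) first.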
What SOTA summits are in a park?
Let’s get a bit more creative now. Follow the same process above, but now we want to export as an ESRI Shapefile. Save it as “vk5_nps_cps.shp”.
Now, go to the Layer menu and choose “Add Vector Layer”. This time, instead of loading CAPAD, load your vk5_nps_cps.shp file. It should cover up your selection with a new colour. Our next step is to get a list of the SOTA summits from the SOTA website.
Go to the Layer menu, and Add Delimited Text Layer, this time with the summitslist.csv file. It will pop up a dialog – make sure the X coordinate is Longitude and the Y coordinate is Latitude. You may have to skip the first line to remove the SOTA header. Once you hit OK, there will be some lines it cannot import – none of these are important, although I have not chased down which summits are at fault.
You should now see a bunch of point data displayed on your map. To make it a bit more viewable, remove the tick next to CAPAD_2014_Terrestrial, then right click on the vk5_nps_cps layer and choose “Zoom to Layer Extent”.
Now, go to the Vector menu, and choose “Spatial Query”. If it isn’t there, you may need to enable the plugin (Plugins -> Manage and Install Plugins). Plugins can be downloaded from the QGIS website, or installed via your distribution’s software repository (or both).
Spatial Query dialog
We want to select source features from summitslist (we want the summits), where the feature is Within the reference features of vk5_nps_cps, and create a new selection. Click on Apply, and let it do its thing.
Spatial Query results
Close the dialog, and then right click on summitslist, and Save Selection As. Save it in Shapefile format (or in CSV format). This is a list of SOTA summits inside parks. You can reverse the query to find the parks containing summits.
Add in the layer (you should know how to do that now!) and right click on it, selecting “Open Attribute Table”. You now have a list of all the SOTA summits in the parks. This is not complete, as some summits sit close to a park boundary, with part of the activation zone inside the park but the summit point itself outside. That requires a more complex query.
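If you are curious what a “Within” test boils down to, here is a minimal Python sketch of the classic ray-casting point-in-polygon check. I am not claiming this is how QGIS implements it; the rectangular “park” and the summit references are made up, and real park polygons have holes and multiple parts that this sketch ignores.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside the ring of (lon, lat) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed by a
        # horizontal ray heading east from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                # Each crossing flips us between outside and inside.
                inside = not inside
    return inside


# Made-up rectangular "park" and made-up summits, for illustration only.
park = [(138.0, -35.0), (139.0, -35.0), (139.0, -34.0), (138.0, -34.0)]
summits = {"EX-001": (138.5, -34.5), "EX-002": (139.5, -34.5)}

in_park = [ref for ref, (lon, lat) in summits.items()
           if point_in_polygon(lon, lat, park)]
print(in_park)
```

This also shows why the list is incomplete: a summit a few metres outside the ring simply fails the test, however much of its activation zone lies inside.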
More complex queries
This is usually about where I stop using QGIS and start using PostGIS. PostGIS is an extension to a PostgreSQL database that allows you to run SQL queries against spatial tables. If that last sentence doesn’t make sense, then PostGIS is probably a bit complex, although I’m happy to help anyone who has queries (no pun intended).
In a nutshell, set up a PostgreSQL database, add the PostGIS extension (“create extension postgis”), then import the Shapefile data, either through SPIT under QGIS or my preferred method, shp2pgsql. You can then run queries. For example, for the “near to boundary of park” question above, you could query using the ST_DWithin() function. Or use a spatial join to show both Summit Code and Park Name in a single query, using ST_Within() or ST_Contains().
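To give a flavour of what those queries might look like, here is a rough sketch that builds the SQL as Python strings (so you can paste the output into psql or hand it to your database driver of choice). Everything named here is an assumption: the table names, the summit_code and name columns, and the geom geometry column all depend on how you imported your data. The distance query also assumes both tables hold longitude/latitude geometries that can be cast to geography, so that the distance is in metres.

```python
# All table and column names below are hypothetical; adjust them to match
# whatever your import actually created.
PARKS_TABLE = "vk5_nps_cps"
SUMMITS_TABLE = "summits"


def summits_in_parks_sql():
    """Spatial join: each summit paired with the park that contains it."""
    return (
        f"SELECT s.summit_code, p.name\n"
        f"FROM {SUMMITS_TABLE} AS s\n"
        f"JOIN {PARKS_TABLE} AS p ON ST_Within(s.geom, p.geom);"
    )


def summits_near_parks_sql(metres):
    """Summits within a given distance of a park, including ones just outside."""
    return (
        f"SELECT s.summit_code, p.name\n"
        f"FROM {SUMMITS_TABLE} AS s\n"
        f"JOIN {PARKS_TABLE} AS p\n"
        f"  ON ST_DWithin(s.geom::geography, p.geom::geography, {float(metres)});"
    )


print(summits_in_parks_sql())
print(summits_near_parks_sql(100))
```

The 100 metres is an arbitrary figure for the example, not a SOTA rule; pick whatever distance suits the question you are asking.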
In short, stop manually transcribing boundaries! Stop resorting to weird internet sites! Use QGIS and CAPAD to manage it all as closely as possible to reality. No data is perfect, but CAPAD is about as close as we mere mortals will manage. QGIS helps run simple queries over that data. PostGIS can run very complex queries. Let technology work for you.