It is common for mappers to find information with a spatial component available in formats that are not yet optimized to work with mapping software. One such occurrence is working with spreadsheet data.
In this tutorial you will learn how to join tables to GIS shapefiles to create a new, mappable dataset with both geographic boundaries and statistical information.
Often, spreadsheets contain useful information we want to map. GIS software, however, does not know how to translate a spreadsheet without any inherent spatial information (such as point coordinates) into a visual map.
Take, for instance, this spreadsheet of weekly COVID rates for each town in Massachusetts for the week of January 26, 2022.
Despite the obvious role of geography in this data (the data is arranged by town), if we were to bring this spreadsheet into a GIS program, the program would not know (without being told by us) which column to interpret as the “geography” to display visually. The formats
.xlsx are not inherently spatial data formats.
On the other hand, take this Massachusetts municipalities dataset which we downloaded from the state government’s open data portal and opened up in a desktop mapping software. The data comes available as a
shapefile, which is an inherently geospatial format.
Because this data is spatial, GIS software can recognize it, and automatically make a map out of it.
The image above shows how mapping software displays GIS data, but what does the “data” actually look like? Let’s look at the data’s underlying
Here, we can see that for each row, which represents a single GIS
feature (in this case, a town), there are some basic facts (each column) about that town. Included here are town name, town ID, population count, etc.
Besides this basic information and the geospatial polygon shapes of the towns, this dataset is not entirely useful for mapping, when it comes to showing any meaningful statistics.
Most of the time, the information we want to map, such as our example COVID rates, are not made available as shapefiles. They are created and distributed as spreadsheets, and in order to map them, we must join the spreadsheet together with GIS shapefile data using a GIS software like QGIS.
Map of weekly Massachusetts COVID data spreadsheet from January 26, 2022, after being joined to a Massachusetts municipalities shapefile from MassGIS.
In this tutorial, we will walk step-by-step through performing a join in QGIS. This tutorial is part of a series called the Preparing census data for digital mapping, but you can also use this guide separately to learn about joins.
Before we dive in to step-by-step instructions, however, let’s go over the basic concept of how joins work.
To perform a join, you must have a column in each dataset which contains the same values.
For instance, when we joined the COVID spreadsheet to the Massachusetts municipalities shapefile, we used a column common to both datasets:
If you don’t already have the tutorial data, you can either:
- Follow the Preparing census data for digital mapping series to obtain them.
- Download them here
Add the census tract shapes data to the QGIS document by following these steps.
Add the statistical tenure data to the map by following these steps.
Right-click the tracts layer in the layer list and select
In the menu, select
In the bottom of the wizard interface, select the green plus sign
Add new join icon.
- Join layer: tenure-2019
- Join field: GEOFIPS
- Target field: GEOID
Select the checkbox next to
Scroll all the way to the bottom and check off only:
Right-click the Cambridge tracts layer in the layer list and select
Open Attribute Table.
Confirm the join worked by scrolling to the far-right of the attribute table and checking that fields
SE_A10060_003 are populated with numerical data.
Export → Save Features Asand save the new joined dataset to preserve the join.