Georeferencing Protocols
The purpose of the Mountains and Plains Spatio-Temporal Database Informatics Initiative (MaPSTeDI) georeferencing process is to assign geographic coordinates to the zoological, botanical, and paleontological specimen records in the museum collections databases of participating institutions. Georeferencing is the core activity of the MaPSTeDI project because it generates and validates the spatial data that will permit the analysis of biodiversity changes in space and time using the MaPSTeDI web interface. It is critical that project participants adhere to the georeferencing protocols in order to ensure consistency and reproducibility in georeferencing decisions and to eliminate as much subjectivity as possible from the georeferencing process.
A variety of georeferencing methodologies and technologies are presently in use, and more are being developed. This document details the MaPSTeDI georeferencing procedures and protocols for project personnel (georeferencers) at participating institutions; we also encourage non-participants to try the MaPSTeDI method and provide feedback. The document contains procedural instructions, describes how to resolve many of the most common georeferencing problems, and includes appendices that list additional MaPSTeDI standards.
It is recommended that the reader refer to Appendix A: Georeferencing Standards and Appendix B: Materials for Georeferencing to ensure the correct set-up for georeferencing.
This document is divided into four sections: Finding Coordinates, Assigning Confidence Values, Quality Checking, and Georeferencing Procedure.
This document is also available in PDF format.
I. Finding Coordinates
Assigning coordinates to museum specimens based on the locality information provided in the museum catalogue or collections database is the basis of georeferencing. The amount and accuracy of locality data in each record determine how difficult it is to georeference. This section provides a process by which most points can be found easily. All coordinates are given in Universal Transverse Mercator (UTM) coordinates on the North American Datum of 1927 (NAD27).
A. Coordinates given
Usually, there is no need to find coordinates for these records. However, it is necessary to verify that the coordinates actually match the rest of the locality data provided. Occasionally, they will not match because of collecting errors or datum differences. Other sources of error include rounding when reading coordinates from UTM grids on maps and GPS error, which can be as much as 200 meters. However, the collector chose the coordinates for a reason, so be sure that an error exists before correcting it. Coordinates given in other systems, such as latitude and longitude, may need to be converted to UTM.
Example: Boulder Falls, 465408E, 4428396N, Boulder County, Colorado (probably collected from a GPS unit or gazetteer)
Example: Boulder Falls, 465500E, 4428500N, Boulder County, Colorado (probably collected from a UTM grid on a paper map)
Example: Boulder Falls, 40° 0' 25" N, 105° 24' 19" W, Boulder County, Colorado
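Converting a latitude/longitude record such as the third example into UTM can be sketched in Python. This is a minimal sketch, assuming the Clarke 1866 ellipsoid (the reference ellipsoid of NAD27) and the standard series formulas for the forward Transverse Mercator projection; the helper names are illustrative, and in practice a dedicated projection library would normally be used. The sketch does not perform a datum shift: if the latitude and longitude are on a different datum (for example, a modern GPS datum), they must be shifted to NAD27 first.

```python
import math

# Clarke 1866 ellipsoid, the reference ellipsoid of NAD27.
A = 6378206.4            # semi-major axis (m)
B = 6356583.8            # semi-minor axis (m)
K0 = 0.9996              # UTM scale factor on the central meridian
FALSE_EASTING = 500000.0
FALSE_NORTHING_SOUTH = 10000000.0


def dms_to_decimal(deg, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = deg + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


def utm_zone(lon):
    """Standard 6-degree UTM zone number (ignores the Norway/Svalbard exceptions)."""
    return int((lon + 180) // 6) + 1


def latlon_to_utm(lat, lon):
    """Forward Transverse Mercator on Clarke 1866. Does NOT change datum."""
    zone = utm_zone(lon)
    lon0 = math.radians((zone - 1) * 6 - 180 + 3)  # central meridian of the zone
    phi = math.radians(lat)
    e2 = 1 - (B / A) ** 2          # first eccentricity squared
    ep2 = e2 / (1 - e2)            # second eccentricity squared
    n = A / math.sqrt(1 - e2 * math.sin(phi) ** 2)
    t = math.tan(phi) ** 2
    c = ep2 * math.cos(phi) ** 2
    a = math.cos(phi) * (math.radians(lon) - lon0)
    # Meridional arc length from the equator to phi.
    m = A * ((1 - e2 / 4 - 3 * e2**2 / 64 - 5 * e2**3 / 256) * phi
             - (3 * e2 / 8 + 3 * e2**2 / 32 + 45 * e2**3 / 1024) * math.sin(2 * phi)
             + (15 * e2**2 / 256 + 45 * e2**3 / 1024) * math.sin(4 * phi)
             - (35 * e2**3 / 3072) * math.sin(6 * phi))
    easting = FALSE_EASTING + K0 * n * (
        a + (1 - t + c) * a**3 / 6
        + (5 - 18 * t + t**2 + 72 * c - 58 * ep2) * a**5 / 120)
    northing = K0 * (m + n * math.tan(phi) * (
        a**2 / 2 + (5 - t + 9 * c + 4 * c**2) * a**4 / 24
        + (61 - 58 * t + t**2 + 600 * c - 330 * ep2) * a**6 / 720))
    if lat < 0:
        northing += FALSE_NORTHING_SOUTH
    return zone, easting, northing


lat = dms_to_decimal(40, 0, 25, "N")
lon = dms_to_decimal(105, 24, 19, "W")
zone, e, n = latlon_to_utm(lat, lon)
print(zone, round(e), round(n))
```

Comparing the result with the coordinates in the first example is one way to check whether two records describe the same point.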
B. Township/Range/Section (TRS) given
If there is no other usable locality data, or if TRS is the most precise information provided in the locality description, place the point at the center of the smallest TRS unit given (the section, or the ¼ section or smaller aliquot part). Otherwise, TRS is used only as one factor in determining the final coordinates (see Appendix C: Township/Range/Section for a complete description of the TRS system).
Example: Boulder Falls, PM 6 T1N R72W Sec.35 NW¼ NE¼ NW¼
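Finding the center of an aliquot part such as "NW¼ NE¼ NW¼" can be sketched as repeatedly quartering a square. This is a minimal sketch, assuming an idealized one-mile-square section whose southwest corner (the hypothetical `sw_easting`, `sw_northing` arguments) has already been read from the map in UTM; real sections are often irregular, so the actual section corners should still be checked. The calls are applied from last to first, because "NW¼ NE¼ NW¼" means the NW¼ of the NE¼ of the NW¼ of the section.

```python
SECTION_SIZE_M = 1609.344  # one statute mile; an idealized section (real ones vary)

# Offset of each quarter's southwest corner, in units of the quarter's side.
QUARTER_OFFSETS = {"NW": (0, 1), "NE": (1, 1), "SW": (0, 0), "SE": (1, 0)}


def aliquot_center(sw_easting, sw_northing, calls, size=SECTION_SIZE_M):
    """Return the center of an aliquot part such as ["NW", "NE", "NW"].

    Calls are listed as in a legal description (smallest parcel first)
    and applied in reverse, starting from the whole section.
    """
    x, y, side = sw_easting, sw_northing, size
    for call in reversed(calls):
        dx, dy = QUARTER_OFFSETS[call]
        side /= 2
        x += dx * side
        y += dy * side
    return x + side / 2, y + side / 2


# Hypothetical 1600 m section whose SW corner is at (0, 0):
print(aliquot_center(0.0, 0.0, ["NW", "NE", "NW"], size=1600.0))
```

An empty list of calls returns the center of the whole section.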
C. Place Name Only
The majority of locality descriptions reference a place name and may or may not add information clarifying where within that place the specimen was collected. Finding coordinates for these place names is usually simple. The Geographic Names Information System (GNIS) contains almost 2 million place names in the United States. Search GNIS in one of several ways (see Appendix B) to locate the correct place name and its corresponding coordinates. Using the GNIS coordinates ensures that MaPSTeDI records with the same locality receive the same coordinates.
Example: Boulder Falls, Boulder County, Colorado
If the place name cannot be found in GNIS initially, it is worthwhile to check alternate spellings and partial names. If there is still no success, the georeferencer should search additional gazetteers as well as Internet resources and historical gazetteers for the place name. There are very few place names that cannot be found in one of these resources.
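The search order described above (exact name, then partial names, then alternate spellings) can be sketched against a local list of names. This is a minimal sketch, assuming a hypothetical in-memory list of place names; it does not query GNIS itself, and fuzzy matching with Python's standard `difflib` module stands in for checking alternate spellings by hand. The `cutoff` similarity threshold is an illustrative choice.

```python
import difflib


def find_place_name(query, names, cutoff=0.8):
    """Return candidate place names: exact, then partial, then similar spellings."""
    q = query.strip().lower()
    lowered = {name.lower(): name for name in names}
    if q in lowered:                       # exact match (case-insensitive)
        return [lowered[q]]
    partial = [name for name in names if q in name.lower()]
    if partial:                            # query is part of a longer name
        return partial
    # Alternate spellings: closest names above the similarity cutoff.
    close = difflib.get_close_matches(q, list(lowered), n=3, cutoff=cutoff)
    return [lowered[c] for c in close]


# Hypothetical gazetteer entries (names only):
names = ["Boulder Falls", "Boulder Creek", "Middle Boulder Creek", "North Boulder Creek"]
print(find_place_name("Middle Boulder", names))
print(find_place_name("Bolder Falls", names))
```

An empty result is the signal to move on to additional and historical gazetteers.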
If additional information is provided about the locality, adjust the point within the boundaries of the place name to fit the locality most accurately.
Example: just below Boulder Falls on the west bank, Boulder County, Colorado
D. Place Name with Offset
Many locality descriptions indicate that the specimen was collected a certain distance from a place name. This distance is called an offset. Collectors measure offsets in two ways. An offset by air is measured in a straight line from the place name to the point of collection in a specific direction. An offset by road or river, by contrast, is measured along a road or river that runs roughly in the indicated direction. Unless the collector indicates how the offset was measured, it can be difficult to determine which method to use.
1. Offset by Air
Offsets by air are measured along a straight line from the geographic reference point, which is almost always the GNIS coordinates for the place name in the locality description. A collector occasionally specifies that an offset is by air, but more often the georeferencer chooses this method when there is no indication that the offset follows a specific road, river, or other linear feature. For instance, in the example below, the offset is almost certainly by air, since no features run northwest away from the falls. The GNIS coordinates for Boulder Falls are used as the starting point for a line that extends 2.1 miles due northwest from the falls.
Example: 2.1 miles NW of Boulder Falls, Boulder County, Colorado
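An offset by air reduces to simple trigonometry in UTM. This is a minimal sketch, assuming the compass direction is converted to an azimuth and that UTM grid north can stand in for true north (grid convergence is ignored); the function name is illustrative. The reference coordinates below are those of the first Boulder Falls example in section A, standing in for the GNIS coordinates.

```python
import math

METERS_PER_MILE = 1609.344

# Compass points as azimuths in degrees clockwise from north.
AZIMUTHS = {"N": 0, "NE": 45, "E": 90, "SE": 135,
            "S": 180, "SW": 225, "W": 270, "NW": 315}


def offset_by_air(easting, northing, miles, direction):
    """Move a straight-line distance from a UTM reference point.

    Treats UTM grid north as true north (ignores grid convergence).
    """
    d = miles * METERS_PER_MILE
    az = math.radians(AZIMUTHS[direction])
    return easting + d * math.sin(az), northing + d * math.cos(az)


# 2.1 miles NW of the reference point:
e, n = offset_by_air(465408.0, 4428396.0, 2.1, "NW")
print(round(e), round(n))
```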
2. Offset by Linear Feature
Offsets by a linear feature are measured by tracing the feature for the listed distance in the listed direction. The reference point should be the point on the feature that is closest to the place name. Often, the feature runs in the indicated direction for only part of the distance before turning in a somewhat different direction; the offset is still measured along the feature. An offset by linear feature is used when a road, river, or other linear feature is mentioned in the locality description. It can also be used when a suitable linear feature clearly presents itself on the map. In the example below, CO State Highway 119 runs generally east-west right next to Boulder Falls. The offset would therefore be measured by tracing CO-119 for 2.1 miles east from the point on CO-119 that is closest to the falls.
Example: 2.1 miles E of Boulder Falls, Boulder County, Colorado
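Tracing a feature can be sketched by snapping the place name to the nearest point on the feature and then walking along the feature's vertices. This is a minimal sketch, assuming the road or river has been digitized as a hypothetical polyline of UTM vertices ordered in the direction of travel (for CO-119 in the example, west to east); the function names and the sample road are illustrative.

```python
import math

METERS_PER_MILE = 1609.344


def nearest_point_on_polyline(polyline, p):
    """Return (segment index, point) of the spot on the polyline closest to p."""
    best = None
    for i, ((ax, ay), (bx, by)) in enumerate(zip(polyline, polyline[1:])):
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        # Project p onto the segment, clamped to the segment's endpoints.
        t = 0.0 if seg_len2 == 0 else ((p[0] - ax) * dx + (p[1] - ay) * dy) / seg_len2
        t = max(0.0, min(1.0, t))
        q = (ax + t * dx, ay + t * dy)
        d = math.dist(p, q)
        if best is None or d < best[0]:
            best = (d, i, q)
    return best[1], best[2]


def offset_along_polyline(polyline, reference, meters):
    """Walk `meters` along the polyline, in vertex order, from the point nearest `reference`."""
    i, pos = nearest_point_on_polyline(polyline, reference)
    remaining = meters
    for nxt in polyline[i + 1:]:
        step = math.dist(pos, nxt)
        if step >= remaining:
            f = remaining / step if step else 0.0
            return (pos[0] + f * (nxt[0] - pos[0]), pos[1] + f * (nxt[1] - pos[1]))
        remaining -= step
        pos = nxt
    return pos  # the offset runs past the end of the digitized feature


# Hypothetical road digitized west to east (UTM meters) and a place-name point:
road = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 5000.0)]
print(offset_along_polyline(road, (10.0, 50.0), 2.1 * METERS_PER_MILE))
```

To trace in the opposite direction, reverse the vertex list before calling `offset_along_polyline`.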
3. Offset with More Than One Direction
Offsets with more than one direction are almost always measured by air. Occasionally, however, the locality will mention roads, which may indicate otherwise.
Example: 2 miles north, 1 mile east of Boulder Falls, Boulder County, Colorado
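Because each component of such an offset is measured by air along a grid axis, the components can simply be added to the reference coordinates. This is a minimal sketch with an illustrative function name, assuming the UTM grid axes are close enough to true north and east over the distances involved.

```python
import math

METERS_PER_MILE = 1609.344


def compound_offset(easting, northing, north_miles=0.0, east_miles=0.0):
    """Apply a by-air offset given as separate north and east components.

    Use negative values for south and west.
    """
    return (easting + east_miles * METERS_PER_MILE,
            northing + north_miles * METERS_PER_MILE)


# "2 miles north, 1 mile east" of a reference point at (0, 0):
e, n = compound_offset(0.0, 0.0, north_miles=2, east_miles=1)
print(e, n, math.hypot(e, n) / METERS_PER_MILE)  # last value: straight-line miles
```

The straight-line distance from the reference point is about 2.24 miles, not the 3 miles obtained by adding the two components.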
4. Undetermined Offsets
In certain cases, it is not possible to determine whether an offset has been measured by air or by linear feature. In such cases, the georeferencer should place the point midway between the two possible locations and address the difference by using confidence values (see II, Assigning Confidence Values).
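Splitting the difference between the two candidates can be sketched as a midpoint calculation. This is a minimal sketch with an illustrative function name; it also returns half the separation between the two candidates, one quantity the georeferencer may want to weigh when assigning the confidence value. How that quantity maps to a confidence value is defined in Section II, not here.

```python
import math


def split_difference(by_air, by_feature):
    """Midpoint between two candidate UTM points, plus half their separation.

    If the true point is one of the two candidates, it lies exactly
    this half-distance from the midpoint.
    """
    (e1, n1), (e2, n2) = by_air, by_feature
    mid = ((e1 + e2) / 2, (n1 + n2) / 2)
    return mid, math.dist(by_air, by_feature) / 2


print(split_difference((0.0, 0.0), (600.0, 800.0)))
```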
5. Special Note for Offsets by River
Offsets along rivers are often expressed with the words "above" and "below" instead of cardinal directions. "Above" means upstream of the feature, while "below" means downstream. The direction a river flows can be easily determined on a topographic map by looking at the contour lines and elevations: where contour lines cross a river, they always bend to point upstream. Also, remember that rivers and lakes can change size, and thus location, rapidly; this is a possible source of error when georeferencing specimen records of aquatic taxa.
Example: N. Boulder Creek, 1.3 miles above Boulder Falls, Boulder County, Colorado
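Deciding which way to trace for "above" or "below" can be sketched by comparing elevations at the two ends of a digitized channel. This is a minimal sketch, assuming the river has been digitized as a hypothetical list of (easting, northing, elevation) vertices with elevations read from contour crossings; comparing end elevations stands in for reading the contour lines by eye. The oriented vertices (without the elevation value) can then be traced as an offset by linear feature, as in section D.2.

```python
def orient_upstream(vertices):
    """Order river vertices so that the list runs from downstream to upstream.

    `vertices` is a list of (easting, northing, elevation) tuples digitized
    along the channel; water flows from the higher end toward the lower end.
    """
    if vertices[0][2] > vertices[-1][2]:
        return list(reversed(vertices))
    return list(vertices)


def trace_direction(word):
    """Map a river modifier to the direction to trace along the channel."""
    return {"above": "upstream", "below": "downstream"}[word.lower()]


# Hypothetical channel digitized from its upper end (elevations in feet):
creek = [(0.0, 0.0, 7300.0), (0.0, 500.0, 7250.0), (0.0, 1000.0, 7200.0)]
print(trace_direction("above"), orient_upstream(creek)[0])
```

After orientation, "above" means walking toward the end of the list and "below" means walking toward its start.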
E. Other Modifiers
Other modifiers can adjust where the point is placed for a locality record.
Elevation markings can narrow down the area in which you place a point. More often than not, however, they seem to create inconsistency. While elevation should not be ignored, it is important to realize that elevation was often measured inaccurately and/or imprecisely, especially early in the 20th century. One of the best uses of elevation in a locality description is to pinpoint a location along a road or river in a topographically complex area, especially when the rest of the locality description is vague.
Example: Middle Boulder Creek, 7125', Boulder County, Colorado
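Using an elevation to pinpoint a location along a river or road can be sketched as linear interpolation along an elevation profile. This is a minimal sketch, assuming a hypothetical profile of (distance along the feature, elevation) pairs read from contour crossings on the topographic map, with elevations in the same units as the record (feet here). On a road the elevation may be reached more than once; this sketch returns only the first crossing, which is an assumption to revisit in such cases.

```python
def distance_at_elevation(profile, elevation):
    """Return the first distance along the feature where `elevation` is reached.

    `profile` is a list of (distance_m, elevation) pairs ordered by distance.
    Returns None if the elevation is never crossed.
    """
    for (d0, z0), (d1, z1) in zip(profile, profile[1:]):
        if min(z0, z1) <= elevation <= max(z0, z1):
            if z1 == z0:
                return d0  # flat stretch: take its start
            # Interpolate linearly between the two contour crossings.
            return d0 + (elevation - z0) / (z1 - z0) * (d1 - d0)
    return None


# Hypothetical profile (meters along the creek, elevation in feet):
profile = [(0.0, 7000.0), (1000.0, 7100.0), (2000.0, 7200.0)]
print(distance_at_elevation(profile, 7125.0))
```

The returned distance can then be converted to coordinates by walking that far along the digitized feature.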
The year in which a specimen was collected can change where its point should be placed, because town names, road names, and even county names and boundaries change over time. For instance, the example below has four possible interpretations depending on the year. Before 1935, US Highway 287 was called US Highway 285. Between 1937 and 1946, US 287 south of Fort Collins was replaced by US 87. For records from before 1935 or from 1937 to 1946, this locality is therefore ambiguous. After 1946, the record would be located along what would eventually become 120th Street in Broomfield. However, in 2001, part of Boulder County became part of the new Broomfield County. Thus, for records collected after that change, the locality would no longer refer to 120th Street but would instead reference a section of highway that runs through eastern Boulder County.
Example: Two miles from the county border along US-287, Boulder County, Colorado
Street addresses are sometimes recorded when specimens are collected in cities or towns. When possible, plot the point at the indicated address with the aid of a local road map or a mapping product such as MapQuest. If the exact address cannot be found, estimate the location as closely as possible. Remember that many towns number addresses on a grid system; for instance, addresses between 12th Street and 13th Street would fall between 1200 and 1300. Be aware, however, that street names often change over time!
Example: Backyard of 593 West Street, Louisville, Boulder County, Colorado
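Estimating a position from a grid-numbered address can be sketched as interpolation along the block. This is a minimal sketch, assuming house numbers run evenly from N00 at one cross street to (N+1)00 at the next; the block endpoints (for 593 West Street, where West Street meets the 500 and 600 cross streets) are hypothetical inputs read from a road map. Real numbering is rarely perfectly even, so the result is only the estimate the text calls for when the exact address cannot be found.

```python
def interpolate_address(house_number, block_start, block_end):
    """Estimate a house's position on a grid-numbered block.

    block_start / block_end are (easting, northing) points where the street
    meets the cross streets for N00 and (N+1)00 (e.g., 12th and 13th Streets).
    Numbering is assumed to run evenly across the block.
    """
    frac = (house_number % 100) / 100
    (e0, n0), (e1, n1) = block_start, block_end
    return e0 + frac * (e1 - e0), n0 + frac * (n1 - n0)


# Hypothetical block ends for the 500 block of a street (UTM meters):
print(interpolate_address(593, (1000.0, 2000.0), (1100.0, 2000.0)))
```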
Building names are often given to clarify the location within a town or city. However, GNIS rarely provides coordinates for buildings. Most buildings can be found in local yellow pages (which are often available via the Internet). Unlike natural features, buildings often change names or even disappear over time, so verify that the building named in the record existed at that location at the time of collection.
Example: Greenhouse at 20th and Broadway, Boulder, Boulder County, Colorado
Highway directions often create confusion in locality descriptions. The numbering system for Interstate and U.S. highways specifies that highways with odd numbers run north-south while highways with even numbers run east-west. In reality, however, highways do not always travel in the direction their number indicates. One example is US Highway 36 in Colorado. US 36 originates in north Denver and runs northwest to Boulder. From Boulder, it goes north to the town of Lyons and then turns northwest to end in Estes Park. Because it is an even-numbered route, however, US 36 is an "east-west" highway even though it never travels directly east-west; motorists go "east" on US 36 from Boulder to Denver. This can be confusing in locality descriptions. If a locality description reads "Five miles east of Boulder on US 36", the correct placement of the point is five miles from Boulder along US 36 toward Denver, even though US 36 leaves Boulder at the south end of the city and never travels directly east. To further complicate the issue, the even/odd convention applies only to highways with one- or two-digit route numbers and often does not apply to state highway systems. The Santa Cruz Public Libraries website provides a short description of the Interstate and US highway numbering rules.
Example: Five miles east of Boulder on US 36, Boulder County, Colorado
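The numbering convention can be written as a small check. This is a minimal sketch with an illustrative function name, assuming the simplified rule stated above (one- or two-digit Interstate and US routes only; state highways are not covered). A result of "east-west" means that "east" and "west" in a locality should be read as the route's directions of travel, traced along the road, rather than as compass bearings.

```python
def nominal_axis(route_number):
    """Nominal direction implied by an Interstate or US route number.

    Applies only to one- or two-digit routes; returns None otherwise.
    State highways often do not follow the convention.
    """
    if not 1 <= route_number <= 99:
        return None
    return "north-south" if route_number % 2 else "east-west"


print(nominal_axis(36), nominal_axis(25), nominal_axis(287))
```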
F. Vagueness
The bane of the georeferencing process is vagueness in locality records, and this vagueness comes in many forms. Vagueness, along with actual errors in locality coordinates, is what necessitates the use of confidence values in the georeferencing process (see II, Assigning Confidence Values). The most common cause of vagueness is incorrect data entry, so checking the original catalog books or field notes should be the first step in georeferencing a vague record. This step is only possible, however, when the georeferencer has access to these catalog books.
Example: Near Boulder Falls, Boulder County, Colorado.
An indication that a specimen was collected "near" a location is common. Records such as these should be assigned the coordinates of the location and the vagueness should be addressed in the selection of the confidence value.
Example: 5 miles from Boulder Falls, Boulder County, Colorado.
Offsets without a direction are often the result of errors by the collector when recording the locality. They should be given the coordinates of the location and the vagueness should be addressed in the selection of the confidence value. Occasionally, these localities are data entry errors. If it is possible to view the original collection catalogs, there may be more information.
Example: North of Boulder Falls, Boulder County, Colorado.
Offsets without a distance should be placed at the northern boundary of the indicated feature. If the feature is too small to have a northern boundary, the point can be placed just north of the feature. In either case, the vagueness should be addressed in the selection of the confidence value. Occasionally, these localities are data entry errors. If it is possible to view the original collection catalogs, there may be more information.
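The placement rules for the three kinds of vague offsets above can be collected into one function. This is a minimal sketch under several labeled assumptions: the feature's extent is a hypothetical bounding box, the point on the northern boundary is taken at the middle of the box's north edge, "just north" of a small feature is a hypothetical 100 m nudge, and the "North of" rule is generalized to the other cardinal directions. None of these specifics come from the protocol, and the vagueness must still be reflected in the confidence value.

```python
JUST_BEYOND_M = 100.0  # hypothetical nudge for features too small to have a boundary


def place_vague_offset(ref_point, distance=None, direction=None, bbox=None):
    """Place a point for 'near X', 'D miles from X', or 'North of X' records.

    ref_point: (easting, northing) of the named place.
    direction: one of "N", "S", "E", "W", or None.
    bbox: optional (min_e, min_n, max_e, max_n) extent of the named feature.
    """
    if distance is not None and direction is not None:
        raise ValueError("not vague: handle as an ordinary offset")
    if direction is None:
        # "near X" or a distance with no direction: use the place's coordinates.
        return ref_point
    e, n = ref_point
    if bbox is None:
        # Feature too small to have a boundary: step just beyond it.
        steps = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
        de, dn = steps[direction]
        return (e + de * JUST_BEYOND_M, n + dn * JUST_BEYOND_M)
    # Direction without distance: middle of the corresponding boundary edge.
    min_e, min_n, max_e, max_n = bbox
    mid_e, mid_n = (min_e + max_e) / 2, (min_n + max_n) / 2
    return {"N": (mid_e, max_n), "S": (mid_e, min_n),
            "E": (max_e, mid_n), "W": (min_e, mid_n)}[direction]


print(place_vague_offset((0.0, 0.0), direction="N", bbox=(0.0, 0.0, 200.0, 100.0)))
```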
Example: North of Boulder Falls on CO SH-119, Boulder County, Colorado.
This locality contradicts itself. It states that the specimen was collected on CO State Highway 119, but this road actually lies south of Boulder Falls. Unfortunately, contradictions of this type are common in locality descriptions. There are several ways to address contradictory records. First, because the confusion may be due to an error in data entry, check any written catalog that may exist. If the written catalog does not reveal an error (or if there is no access to the written catalogs), verify that the contradiction is indeed a contradiction: localities often seem to contradict themselves only because of the way the georeferencer interprets them, and changes in roads and rivers over time can also create apparent contradictions. If there is no question that the locality contains an error, the point can be placed at the reference point (in this case, Boulder Falls) and the confidence values increased. Alternatively, the georeferencer can attempt to locate where the specimen was probably collected. For instance, since the locality indicates that the specimen was collected on CO-119, the georeferencer could reason that the locality should actually read "South of Boulder Falls" and place the point accordingly. Whichever approach is chosen, the georeferencer should get a second opinion on the interpretation of the locality data.
G. Linear Features
Localities indicating linear features such as rivers and roads can be difficult to plot because linear features may extend for miles, so a record referencing a linear feature may not have a single solid reference point. In such cases, the georeferencer should place the point at the midpoint of the possible range of locations along the feature and address the uncertainty when assigning confidence values. Usually, additional locality information such as the county will help narrow down the possibilities along a linear feature.
Example: Boulder Creek, Boulder County, Colorado
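Finding the midpoint of the possible range along a feature can be sketched as the point halfway along a polyline's length. This is a minimal sketch, assuming the portion of the feature that remains possible (for example, the stretch of Boulder Creek inside Boulder County) has already been digitized and clipped as a hypothetical polyline of UTM vertices. Taking the midpoint by length along the feature is one reading of "midpoint of the possible range"; by construction, half the total length is the along-feature distance from this point to either end.

```python
import math


def polyline_midpoint(polyline):
    """Return the point halfway along a polyline's total length."""
    segments = list(zip(polyline, polyline[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    remaining = sum(lengths) / 2
    for (a, b), length in zip(segments, lengths):
        if length >= remaining:
            f = remaining / length if length else 0.0
            return (a[0] + f * (b[0] - a[0]), a[1] + f * (b[1] - a[1]))
        remaining -= length
    return polyline[-1]  # single-vertex input


# Hypothetical clipped creek (UTM meters):
creek = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 2000.0)]
print(polyline_midpoint(creek))
```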
H. Point Cannot Be Determined
If a locality description does not provide information beyond the county or state level, it is not necessary to plot a point. Leave the coordinate fields blank and address the issue when assigning confidence values.