Wim Cardoen, University of Utah, May 14, 2018
The Objective
As part of a broader Class I project for the Bureau of Land Management (BLM), the University of Utah Archaeological Center (UUAC) was contracted to create a statistical model for predicting the likelihood of archaeological sites, also referred to as cultural resources, across the Grand Staircase-Escalante National Monument (GSENM). This Cultural Resources Predictive Model (GSENM-CRPM) uses a complete sample of all known archaeological sites, broken into time-period-specific components, to predict unknown sites from the environmental characteristics associated with the known sites. It does so with a species distribution model that follows a Maximum Entropy (MaxEnt) approach. The result is a set of robust statistical models capable of predicting the occurrence of cultural resources throughout the region.
The UUAC director, Dr. Brian Codding, and University of Utah Anthropology graduate students Peter Yaworsky and Kenneth B. Vernon began developing a MaxEnt model specific to archaeological sites in August 2017. When they set out to generate these models, the group realized they did not have the computational resources to accomplish the task, and they reached out to the Center for High Performance Computing (CHPC), with Peter acting as the primary contact. During an initial meeting with Dr. Anita Orendt, CHPC’s Research Consulting & Faculty Engagement Coordinator and ACI-REF, Peter discussed the computational needs of the project. The memory and CPU requirements of the statistical calculations (run in the R statistical programming environment) exceeded what the group's office computers could provide, and even if the simulations could have been run on those machines, they would have taken too long to meet the project deadline. However, there were barriers to moving this research to CHPC, as Peter had no experience using Linux or HPC.
The Solution
At the end of Peter’s initial meeting, Anita introduced Peter to CHPC Scientific Consultant and ACI-REF Dr. Wim Cardoen, who, in his current role, takes care of the R statistical package at CHPC in the broad sense: he teaches an introductory class on R, performs the installation of R (core and external packages) on the CHPC clusters, writes the corresponding Slurm scripts, and consults with the CHPC user base when they have R-related questions. Peter then met with Wim to gain some familiarity with the CHPC clusters, specifically working in a Linux environment and using a batch system to submit his analyses to the cluster rather than running them directly on his Windows computer.
As a first step, Wim installed eight external R packages required to process spatial data. Among these packages were rgdal for spatial data processing and rgeos for vector processing, which are R interfaces to the C/C++ libraries GDAL (Geospatial Data Abstraction Library) and GEOS (Geometry Engine – Open Source), respectively. Wim first installed the underlying libraries, and then had to address the fact that they were installed in non-default locations. Along with the R packages, Wim also installed the MaxEnt package on the clusters.
In addition, Peter shared his R code and data sets with Wim. Wim modified part of Peter’s original code to make it better suited to run on CHPC’s HPC clusters. He also experimented with Peter’s R code to determine the optimal use of the compute nodes. Wim found that running on multiple cores (set through the environment variable OMP_NUM_THREADS) did not significantly improve the performance of the code. He therefore decided to run multiple serial simulations per node, in order to make efficient use of the multi-core compute nodes. However, with one simulation per core, the memory needs of each simulation placed an additional constraint on the runs: each simulation required about 6 GB, more than the memory per core of most of the CHPC nodes. The number of simulations per node was therefore determined by the node's memory rather than its core count. With this knowledge, Wim created the corresponding Slurm scripts for Peter, allowing him to proceed with the validation of the data, the generation of the models, and the use of the models to create the predictive rasters, as described below.
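The node-packing reasoning above can be sketched as a small calculation. This is a minimal Python sketch, not the Slurm scripts Wim wrote; the function name `sims_per_node` and the example node sizes are hypothetical, and only the figure of roughly 6 GB per simulation comes from the text.

```python
def sims_per_node(cores: int, node_mem_gb: float, mem_per_sim_gb: float = 6.0) -> int:
    """Number of serial simulations that fit on one node.

    Each simulation occupies one core, so the count is capped both by the
    core count and by how many simulations fit in the node's memory.
    """
    if cores < 1 or node_mem_gb <= 0 or mem_per_sim_gb <= 0:
        raise ValueError("cores, node_mem_gb and mem_per_sim_gb must be positive")
    fit_in_memory = int(node_mem_gb // mem_per_sim_gb)
    return min(cores, fit_in_memory)


# Hypothetical node shapes (assumptions, not taken from the CHPC hardware list).
for cores, mem_gb in [(16, 64), (28, 128)]:
    print(f"{cores} cores, {mem_gb} GB -> {sims_per_node(cores, mem_gb)} simulations")
```

For both hypothetical nodes the memory limit, not the core count, sets the number of concurrent simulations, which is the situation the paragraph describes.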
The original data set consisted of 132 geospatial predictor variables, called rasters. The geospatial rasters fell into five categories: resource distribution, climate, environmental productivity, landscape, and soil attributes. Only 110 of the 132 geospatial rasters were used, because the other 22 contained too many missing values. The initial calculations tested whether the sample areas (areas inventoried for archaeological sites) were adequately represented by parameters derived from the 110 predictor rasters. Wim assisted Peter in running these preliminary calculations, which were finished on CHPC resources within several days.
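The screening of rasters for missing values might look like the following. This is an illustrative Python sketch rather than the project's R code: the text does not say what cutoff defined an "abundance" of missing values, so the `max_missing` threshold, the function names, and the toy data are all assumptions.

```python
import math


def missing_fraction(cells):
    """Share of cells that are missing (None or NaN)."""
    if not cells:
        raise ValueError("raster has no cells")
    missing = sum(
        1 for v in cells
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(cells)


def usable_rasters(rasters, max_missing=0.1):
    """Keep rasters whose missing share is at most max_missing (assumed cutoff)."""
    return {
        name: cells
        for name, cells in rasters.items()
        if missing_fraction(cells) <= max_missing
    }


# Toy rasters flattened to lists of cell values (hypothetical names and data).
rasters = {
    "elevation": [1200.0, 1350.0, 1410.0, 1298.0],
    "soil_depth": [None, None, 0.4, None],
    "precip": [210.0, float("nan"), 190.0, 205.0],
}
kept = usable_rasters(rasters, max_missing=0.25)
print(sorted(kept))
```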
In the subsequent calculations, the MaxEnt method was used to produce and analyze two generations of models. MaxEnt also allowed the relative weight of the different predictor rasters to be determined. In the first generation, 37 models were created using all 110 predictor rasters, each model associated with a particular time frame. The second generation refined these by selecting the most important predictor rasters and dropping the predictors that were strongly correlated with the selected ones. In the final stage, the five refined models were used to create predictive rasters for four time periods: Archaic, Formative, Late Prehistoric, and Historic. These four rasters were then combined in two ways: averaging them produced the General Time Period predictive raster, and taking the highest probability among them at each location produced the Combined Time Period predictive raster (see Figure 1).
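One common way to carry out the refinement step is a greedy pass over the predictors in order of importance. The Python sketch below illustrates that idea only: the text does not state how importance was ranked, which correlation measure was used, or what cutoff counted as "strong", so Pearson correlation, the 0.7 threshold, the function names, and the toy data are all assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def select_predictors(values, importance, max_abs_r=0.7):
    """Visit predictors from most to least important and keep one only if
    its |r| with every already-kept predictor is at most max_abs_r."""
    kept = []
    for name in sorted(importance, key=importance.get, reverse=True):
        if all(abs(pearson(values[name], values[k])) <= max_abs_r for k in kept):
            kept.append(name)
    return kept


# Toy predictor values sampled at five locations (hypothetical data).
values = {
    "elevation": [1.0, 2.0, 3.0, 4.0, 5.0],
    "temperature": [9.8, 8.1, 6.2, 3.9, 2.0],  # falls almost linearly with elevation
    "dist_to_water": [3.0, 1.0, 4.0, 1.0, 5.0],
}
importance = {"elevation": 0.5, "temperature": 0.3, "dist_to_water": 0.2}
print(select_predictors(values, importance))
```

In this toy case `temperature` is dropped because it is almost perfectly anti-correlated with the more important `elevation`, while `dist_to_water` is retained.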
The Result
The first training run of MaxEnt resulted in 37 predictive models based on the 110 predictor rasters and all 4400 archaeological sites. These models allow one to predict where, for example, a residential site dating to the Archaic period, or a rock art site from the Late Prehistoric period is likely to be found within the Monument. The second training run of MaxEnt resulted in four new time period specific predictive models and one general time period predictive model. These models differ from the preliminary time period models in that they utilize a subset of the 110 raster variables that did not covary with one another.
The refined models were used to create six predictive rasters or maps showing the probability of site occurrence (from 0 to 1) at a 5 m² resolution. They include the four specific time period rasters (Archaic, Formative, Late Prehistoric, and Historic), one General time period raster, and one Combined time period raster. The Combined raster was created by overlaying the four time period rasters and keeping the highest cell values. Where the General time period raster identifies only locations where sites affiliated with specific time periods are likely to occur together, the Combined time period raster identifies any location where sites affiliated with specific time periods are likely to occur together or separately. Thus, the UUAC project was capable of addressing the longstanding problem of underestimating the potential for archaeological resources that accompanied the more promiscuous lumping strategies of previous modeling efforts.
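The difference between a cell-wise average and a cell-wise maximum of the four time period rasters can be seen in a small example. This is a Python sketch on toy 2 x 2 grids with invented probabilities; the helper `combine_cellwise` is hypothetical and is not the project's R code.

```python
def combine_cellwise(rasters, reducer):
    """Apply reducer to the values the input rasters hold at each cell."""
    rows, cols = len(rasters[0]), len(rasters[0][0])
    for r in rasters:
        if len(r) != rows or any(len(row) != cols for row in r):
            raise ValueError("all rasters must share the same grid shape")
    return [
        [reducer([r[i][j] for r in rasters]) for j in range(cols)]
        for i in range(rows)
    ]


def mean(values):
    return sum(values) / len(values)


# Invented probabilities of site occurrence for each time period.
archaic = [[0.75, 0.0], [0.25, 0.5]]
formative = [[0.75, 0.0], [0.25, 0.0]]
late_prehistoric = [[0.5, 1.0], [0.25, 0.0]]
historic = [[1.0, 0.0], [0.25, 0.5]]
periods = [archaic, formative, late_prehistoric, historic]

general = combine_cellwise(periods, mean)   # cell-wise average
combined = combine_cellwise(periods, max)   # cell-wise highest value

print(general)
print(combined)
```

The top-right cell is very likely to hold a Late Prehistoric site and unlikely to hold any other kind. Averaging pulls it down to 0.25, while the maximum keeps it at 1.0, which is why the maximum-based raster also flags locations where sites from a single period occur on their own.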
The research, therefore, has both intellectual merit and broader impacts. First, it furnishes anthropology and archaeology with a new method for evaluating hypotheses regarding the evolution of human land-use through time. Second, the project provides a stepping stone to future research aimed at addressing questions of prehistoric land-use on a regional scale. Finally, it equips federal land managers with a powerful new tool, allowing them to craft more effective preservation strategies on public lands.
Collaborators and Resources
- Peter M. Yaworsky, M.S. Graduate Researcher, University of Utah Archaeological Center
- Kenneth B. Vernon, M.A. Graduate Researcher, University of Utah Archaeological Center
- Brian F. Codding, Ph.D., Associate Professor of Anthropology and Director of the University of Utah Archaeological Center
- Wim R. Cardoen, Ph.D., CHPC Scientific Consultant and ACI-REF
Funding Sources
This work was funded by the Bureau of Land Management, subaward from the Colorado Plateau Archaeological Alliance to the University of Utah Archaeological Center. Wim Cardoen was supported in part by a grant from the National Science Foundation, Award #1341935, Advanced Cyberinfrastructure – Research and Educational Facilitation: Campus-Based Computational Research Support.