
ARCI, Archaeology eResearch Collaboration Initiative
Joshua Emmitt, Graduate Teaching Assistant; Dr Rebecca Phillipps, Research Fellow; Prof Simon Holdaway; School of Social Sciences; Sina Masoud-Ansari, eResearch Support; Prof Mark Gahegan, Centre for eResearch

Figure 1: Excavation on the Ahuahu Great Mercury Island Project, New Zealand. Artefacts are measured in space with a total station before being registered into a database in-field with a tablet.
Background
Archaeologists are interested in human-environment interrelationships over long spans of time and often engage in comparative analyses of these relationships. Data are compiled from a range of sources. The Archaeology eResearch Collaboration Initiative (ARCI) is a research group specialising in the management and analysis of data-intensive archaeology. The project is currently working with data from field projects in Egypt, New Zealand, Saudi Arabia, and Australia. These projects generate large amounts of data that need to be shared regularly between researchers around the globe.
Data collection
Data acquired in the field consist of high-resolution sampling of archaeological phenomena, recorded as a collection of geographic information with corresponding attribute information. It is not unusual for tens of thousands of observations to be collected during a single field season. Survey data can also include photographs of archaeological features, including high-resolution GigaPan imagery, and LiDAR point data. These data often involve complex file structures or large files that must be read with specialist software such as ESRI’s ArcGIS, which makes them difficult to share in a structured way amongst researchers, both domestic and international. The Centre for eResearch provided access to the NeSI Data Fabric and customised servers, which facilitated data sharing and collaboration and greatly increased the productivity of the projects involved.
Methods for collecting and recording archaeological data can vary between projects to accommodate different research interests. However, projects usually share the need to record both the geographic and the archaeological context of artefacts and features. The questions are at what scale different researchers record this information, and how their data can be meaningfully related. To facilitate this, each data point is given a unique identifier (UNID), and each project is given a unique project code (PC). UNIDs are unique within a project, and no PC is ever reused. Together, the UNID and PC form a dual-identifier system (PCUNID) that is unique across all projects, which enables data to be compared between projects.
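The dual-identifier idea can be illustrated with a minimal Python sketch. The `-` separator, the function names `make_pcunid` and `split_pcunid`, and the project codes `PRJA` and `PRJB` are hypothetical and not taken from ARCI; the point is only that UNIDs repeat across projects while the combined keys do not.

```python
def make_pcunid(pc: str, unid: int) -> str:
    """Combine a project code (PC) and a unique identifier (UNID).

    Hypothetical format: PC and UNID joined by '-'. Assumes project codes
    never contain '-', so the combined key can be split back unambiguously.
    """
    if "-" in pc:
        raise ValueError("project code must not contain '-'")
    return f"{pc}-{unid}"


def split_pcunid(pcunid: str) -> tuple[str, int]:
    """Recover the (PC, UNID) pair from a combined identifier."""
    pc, unid = pcunid.split("-", 1)
    return pc, int(unid)


# Hypothetical records: UNIDs 1 and 2 occur in both projects.
records = [("PRJA", 1), ("PRJA", 2), ("PRJB", 1), ("PRJB", 2)]
keys = [make_pcunid(pc, unid) for pc, unid in records]
print(keys)  # ['PRJA-1', 'PRJA-2', 'PRJB-1', 'PRJB-2']
# UNIDs alone collide across projects; the combined keys do not.
print(len({unid for _, unid in records}), len(set(keys)))  # 2 4
```

A separator that cannot occur in a project code keeps the combined key reversible. In a relational database, storing PC and UNID as two columns under a composite key avoids the parsing step altogether.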
With the help of the Centre for eResearch, the ARCI team further developed a pre-existing data representation schema so that it could represent geographic and archaeological data across a wide range of physical environments. The improved schema supports both excavation-based and survey-based data collection, and can incorporate data collected using different instruments and methods. It also enables the recording of metadata about how the data were recorded. The schema was implemented in a PostgreSQL relational database, using PostGIS to link spatial attributes to records via the PCUNID. This allowed geographic and descriptive data to be stored in the same database and created a centralised, authoritative source of project data that could be accessed concurrently through various desktop and web-based applications. Using a database also provided fine-grained access control, so that students who only needed to edit part of the dataset could be given access to just that part. This further aided the research workflow by minimising problems in sharing and integrating changes made by different team members.
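To illustrate how a composite PC + UNID key can tie descriptive and spatial records together in one relational database, here is a minimal Python sketch. It is not the ARCI schema: it assumes SQLite (from the Python standard library) as a stand-in for PostgreSQL, plain x, y, z columns in place of a PostGIS geometry column, and hypothetical table names (`artefact`, `artefact_location`) and sample values.

```python
import sqlite3

# SQLite stands in for PostgreSQL; plain x/y/z columns stand in for a
# PostGIS geometry column. Table and column names are hypothetical.
SCHEMA = """
CREATE TABLE artefact (              -- descriptive (attribute) data
    pc       TEXT    NOT NULL,       -- project code
    unid     INTEGER NOT NULL,       -- unique only within a project
    material TEXT,
    PRIMARY KEY (pc, unid)           -- PC + UNID together form the key
);
CREATE TABLE artefact_location (     -- spatial data for the same artefacts
    pc   TEXT    NOT NULL,
    unid INTEGER NOT NULL,
    x REAL, y REAL, z REAL,          -- 3D position of the artefact
    PRIMARY KEY (pc, unid),
    FOREIGN KEY (pc, unid) REFERENCES artefact (pc, unid)
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding two hypothetical artefacts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FKs off by default
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO artefact VALUES (?, ?, ?)",
        [("PRJA", 1, "obsidian"), ("PRJB", 1, "chert")],
    )
    conn.executemany(
        "INSERT INTO artefact_location VALUES (?, ?, ?, ?, ?)",
        [("PRJA", 1, 10.0, 20.0, -0.5), ("PRJB", 1, 3.0, 4.0, -1.2)],
    )
    return conn


def joined_records(conn: sqlite3.Connection) -> list[tuple]:
    """Join descriptive and spatial rows on the (PC, UNID) pair."""
    return conn.execute(
        """
        SELECT a.pc, a.unid, a.material, l.x, l.y, l.z
        FROM artefact AS a
        JOIN artefact_location AS l ON a.pc = l.pc AND a.unid = l.unid
        ORDER BY a.pc, a.unid
        """
    ).fetchall()


conn = build_demo_db()
for row in joined_records(conn):
    print(row)
```

Because the key is the pair (pc, unid), both projects can hold an artefact with UNID 1, while a second `PRJA` record with UNID 1, or a location row with no matching artefact, is rejected with `sqlite3.IntegrityError`. In PostgreSQL, the fine-grained access control mentioned above would typically be expressed with `GRANT` privileges on tables or columns, which this sketch does not cover.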

Figure 2: In-field data collection using notebooks in the Fayum, Egypt.
Workflows and tools
Projects that use the ARCI schema may not have started their data collection with it, so their data require management and cleaning. The Centre for eResearch provided software development support, and a number of programs have been written to help manage and clean the data. Examples include tools that synchronise a collection of photos with their UNIDs in a photo database, and software that creates attribute fields for geospatial data files based on their location in the file system. Some of the most important tools merge the attributes from many spatial data files into a standardised schema. In one project, this reduced 1,200 shapefiles with differing attributes to three, each with a common set of attributes reflecting the ARCI schema.
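The attribute-merging step can be sketched as a mapping from each file's field names onto one common schema. In this minimal Python sketch, the target field names, the alias table and the `<PC>/<SQUARE>/<file>.shp` folder layout are all hypothetical, and each record is assumed to have already been read from its shapefile into a plain dictionary (for example with a library such as pyshp or Fiona, not shown). It also derives attributes from a file's location in the file system, echoing the other tool described above.

```python
from pathlib import PurePosixPath

# Hypothetical common schema and field-name aliases; real ARCI names may differ.
TARGET_FIELDS = ("PC", "UNID", "FEATURE_TYPE", "SQUARE")
FIELD_ALIASES = {
    "unid": "UNID",
    "id": "UNID",
    "artefact_id": "UNID",
    "type": "FEATURE_TYPE",
    "feat_type": "FEATURE_TYPE",
}


def harmonise(record: dict, source_path: str) -> dict:
    """Map one record's attributes onto the common schema.

    PC and SQUARE come from the file's location, assuming a hypothetical
    <PC>/<SQUARE>/<file>.shp layout. Unrecognised fields are dropped, and
    target fields with no source value are left as None.
    """
    parts = PurePosixPath(source_path).parts
    out = dict.fromkeys(TARGET_FIELDS)
    out["PC"], out["SQUARE"] = parts[-3], parts[-2]
    for name, value in record.items():
        target = FIELD_ALIASES.get(name.lower())
        if target is not None:
            out[target] = value
    return out


# Two hypothetical files whose attribute tables use different field names.
files = {
    "PRJA/SQ01/hearths.shp": [{"ID": 12, "Type": "hearth"}],
    "PRJA/SQ02/pits.shp": [{"artefact_id": 7, "feat_type": "pit", "notes": "x"}],
}
merged = [harmonise(rec, path) for path, recs in files.items() for rec in recs]
for row in merged:
    print(row)
```

An explicit alias table keeps the mapping easy to review when hundreds of files have drifted apart in naming; a record whose fields are not in the table ends up with empty (None) values that a later cleaning step can flag.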
Data collected in the field are piece-provenanced, meaning that the 3D position of each artefact is recorded (Figure 1). Attribute data are also collected on each artefact, either in the field (Figure 2) or in the lab. While the recording of information is systematic, errors can occur, so collected data are checked before integration into the main database. To this end, a series of programs and database workflows were created to help clean the data before integration. These workflows identify duplicate, conflicting, or incomplete records. For example, one workflow compares the PCUNIDs in the descriptive and geospatial datasets to identify records that lack a corresponding entry in the other dataset.
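The PCUNID comparison described above can be sketched in Python with set operations, assuming each dataset has already been exported as a list of PCUNID strings. The function name `check_pcunids` and the sample identifiers are hypothetical. The sketch covers duplicates and missing counterparts only; detecting conflicting records would also require comparing attribute values field by field.

```python
from collections import Counter


def check_pcunids(descriptive: list[str], spatial: list[str]) -> dict[str, list[str]]:
    """Cross-check PCUNIDs between the descriptive and geospatial datasets.

    Returns PCUNIDs duplicated within either dataset, and PCUNIDs that
    appear in one dataset but have no counterpart in the other.
    """
    desc_counts = Counter(descriptive)
    spat_counts = Counter(spatial)
    duplicated = {k for k, n in desc_counts.items() if n > 1}
    duplicated |= {k for k, n in spat_counts.items() if n > 1}
    return {
        "duplicates": sorted(duplicated),
        # described artefacts with no recorded position
        "missing_spatial": sorted(desc_counts.keys() - spat_counts.keys()),
        # recorded positions with no attribute record
        "missing_descriptive": sorted(spat_counts.keys() - desc_counts.keys()),
    }


# Hypothetical PCUNIDs exported from each dataset.
report = check_pcunids(
    descriptive=["PRJA-1", "PRJA-2", "PRJA-2", "PRJA-3"],
    spatial=["PRJA-1", "PRJA-2", "PRJA-4"],
)
print(report)
# {'duplicates': ['PRJA-2'], 'missing_spatial': ['PRJA-3'], 'missing_descriptive': ['PRJA-4']}
```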
What’s next
The ARCI project plans to roll out several of its interfaces over the next year. These include a website that will make several of the workflows developed by the project available to other researchers. The website will also serve as a front end for our online database, which will at first be accessible to researchers involved in the individual projects, with the future goal of providing open access to archaeological data, something increasingly required by funding agencies.
With increased public outreach, and numerous articles being published in academic journals, we hope that more researchers will be interested in incorporating their datasets into our schema and database model. For more information, please email the ARCI team at arci@auckland.ac.nz.