Novel tools and methods for designing and wrangling multifunctional, machine-readable evidence synthesis databases
Journal contribution, posted on 2021-05-07, 00:25, authored by NR Haddaway, Charles Gray and M Grainger
One of the most important steps in conducting a systematic review or map is data extraction and the production of a database of coding, metadata and study data. There are many ways to structure these data, but to date no guidelines or standards have been produced to support the evidence synthesis community in producing them. Furthermore, easily machine-readable, readily reusable and adaptable databases have seen little adoption, even though such databases would be easier to translate into different formats, both for review authors (for example, for tabulation, visualisation and analysis) and for readers of the review or map. As a result, systematic review and map authors commonly produce bespoke, complex data structures that, although typically provided digitally, require considerable effort to understand, verify and reuse. Here, we report on an analysis of systematic reviews and maps published by the Collaboration for Environmental Evidence, and discuss major issues that hamper machine readability, data reuse and verification. We highlight the different justifications for the alternative data formats we found: condensed databases, long databases and wide databases. We describe these challenges in the context of data science principles that can support the curation and publication of machine-readable Open Data. We then make recommendations to review and map authors on how to plan and structure their data, and we provide a suite of novel R-based functions to support efficient and reliable translation of databases between formats that are useful for presentation (condensed, human-readable tables), filtering and visualisation (wide databases), and analysis (long databases). We hope that our recommendations for standard practices in database formatting, together with the tools needed to move rapidly between formats, will provide a step-change in the transparency and replicability of Open Data in evidence synthesis.
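To make the three layouts concrete, the sketch below moves a tiny coding table between them. It is an illustration under stated assumptions, not the authors' R functions: the function names (`condensed_to_long`, `long_to_wide`, `long_to_condensed`), the column names and the semicolon separator are all hypothetical. A condensed table packs several codes into one delimited cell (one row per study), a long table has one row per study-code pair, and a wide table has one row per study with a True/False column per code.

```python
"""Hypothetical sketch: translating a coding table between condensed,
long and wide layouts. Names and the ';' separator are illustrative
assumptions, not the authors' R-based tools."""

SEP = "; "  # assumed delimiter used inside condensed (multi-value) cells


def condensed_to_long(rows, key, field):
    """Split each delimited cell into one row per value (empty values are skipped)."""
    long_rows = []
    for row in rows:
        for value in row[field].split(";"):
            value = value.strip()
            if value:
                long_rows.append({key: row[key], field: value})
    return long_rows


def long_to_wide(long_rows, key, field):
    """One row per key, with one True/False column per distinct value."""
    values = sorted({r[field] for r in long_rows})
    keys = list(dict.fromkeys(r[key] for r in long_rows))  # first-seen order
    present = {(r[key], r[field]) for r in long_rows}
    return [
        {key: k, **{f"{field}.{v}": (k, v) in present for v in values}}
        for k in keys
    ]


def long_to_condensed(long_rows, key, field):
    """Collapse long rows back into one delimited cell per key."""
    grouped = {}
    for r in long_rows:
        grouped.setdefault(r[key], []).append(r[field])
    return [{key: k, field: SEP.join(vs)} for k, vs in grouped.items()]


if __name__ == "__main__":
    # Illustrative records only; not data from the reviewed syntheses.
    condensed = [
        {"study_id": "S1", "outcome": "biomass; species richness"},
        {"study_id": "S2", "outcome": "species richness"},
    ]
    long = condensed_to_long(condensed, "study_id", "outcome")
    wide = long_to_wide(long, "study_id", "outcome")
    print(long)
    print(wide)
    print(long_to_condensed(long, "study_id", "outcome") == condensed)  # True
```

Treating the long layout as the hub keeps each conversion simple: condensed and wide tables are both derived from, or reduced to, one row per study-code pair, which is also the shape most analysis code expects.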