When fiction is presented as fact, false data can seem valid. If the data is inaccessible or has been destroyed, it can become more difficult to tell the difference between the two. The destruction of data, a modern-day version of book burning, is a legitimate concern, especially when politics guides decision-making. The wholesale destruction of data can be couched in "cost-saving measures," as has happened in Canada. The closure of Department of Fisheries and Oceans libraries north of the border resulted in the loss of, or eliminated access to, some government data collected 30 years ago. Some of the printed archive was taken by private companies at no cost, and some was thrown away. To date, the government hasn't followed through on a commitment to digitize and share any of the paper-based material.
So when Republican Scott Pruitt, who had vowed in the past to shut down the Environmental Protection Agency (EPA), was appointed to head the agency, the fear that climate data could be destroyed became very real. Now the removal of scientists from an EPA review board, in an apparent move to make room for advocates of the very industries the agency is supposed to regulate, creates a blatant conflict of interest that heightens the specter of data destruction. Many scientists and concerned citizens anticipated these developments in late 2016 and started planning.
Grassroots Data Preservation
DataRescue, Environmental Data Governance Initiative (EDGI), Climate Mirror, ProjectARCC, #ProtectClimateData and other collaborations brought together scientists, academic institutions, librarians, archivists and individuals interested in science to preserve at-risk environmental data. The groups developed protocols and software for identifying, reviewing and flagging data sets for archiving. One primary goal is to safeguard the data by keeping it in the public domain. More than 40 data archiving events have been held to date, and more are scheduled.
Some crowdsourcing events jump-started Data Rescue, the name given to the overall movement, and garnered media attention when everyday people sat next to tech geeks and scientists to find data sets stored on government websites.
“Almost all of these events happen at universities and are supported by universities and libraries at universities,” says Toly Rinberg, member of the EDGI Steering Committee. “A lot of the time, [participants] are just concerned about data or want to find something to do in response to what’s happening in the country. We had so much interest … we wanted to capitalize [on] that.”
Transparency and availability around scientific data have been growing over the years, according to Ge Peng, research scholar with NOAA’s Cooperative Institute for Climate and Satellites in North Carolina (CICS-NC).
“U.S. federal mandates require that federally funded research data and results be archived and made publicly accessible in a timely fashion,” she says. “Federal funding agencies have developed data access and sharing policies or guidelines.
“Scientific-journal publishers have been taking steps to make research data and results more accessible, for example, with more and more online open-access journals and reproducibility requirements.”
Peng says data archiving and analysis haven’t kept pace with the recent explosion of data collection, so creating standards for data set structure and archiving procedures is essential to making that information usable now and into the future. Citing a research study about the Internet of Things, Peng says less than 1 percent of the world’s digital data are being analyzed.
What is Archiving?
“Many people tend to think data archiving is simply storing data on a disk and that this is only the responsibility of archives or repositories,” Peng says. “It is much more complicated and effort-intense than that. However, I have started to see the culture shift as scientists/data producers see the benefits of working with data centers.
“Efforts have been [made] at archives and repositories to improve the workflow of requesting that a data set be archived, collecting data and information and creating metadata and archival packages so that scientists do not necessarily need to know everything about data archiving standards to participate in the archival process.”
Metadata is data that describes the content of an electronic resource, such as a website, document or data set. These descriptions make it possible for a keyword search to find available material.
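To illustrate how descriptive metadata lets a keyword search find material, here is a minimal Python sketch. The records, their field names (title, description, keywords, source_url) and the search function are all hypothetical, not any archive's actual schema or search engine; real repositories use richer metadata standards and indexing.

```python
# Hypothetical metadata records; the field names are illustrative, not a standard schema.
records = [
    {
        "title": "Monthly sea surface temperature",
        "description": "Gridded ocean temperature averages",
        "keywords": ["ocean", "temperature", "climate"],
        "source_url": "https://example.org/sst",  # placeholder URL recording where the data came from
    },
    {
        "title": "Regional air quality readings",
        "description": "Hourly ozone and particulate measurements",
        "keywords": ["air quality", "ozone"],
        "source_url": "https://example.org/aq",  # placeholder URL
    },
]


def search(records, term):
    """Return the titles of records whose title, description or keywords mention the term."""
    term = term.lower()
    matches = []
    for record in records:
        # Only the descriptive metadata is searched; the underlying data itself is never read.
        text = " ".join([record["title"], record["description"], *record["keywords"]]).lower()
        if term in text:
            matches.append(record["title"])
    return matches


print(search(records, "ozone"))        # ['Regional air quality readings']
print(search(records, "temperature"))  # ['Monthly sea surface temperature']
```

Without the description and keyword fields, a data set would be findable only by someone who already knew its exact title or location.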
One lesson Rinberg and his peers have learned from the Data Rescue events is that “downloading the datasets is not that interesting,” whereas documenting the source of the data is more engaging. That makes metadata an essential step that supporters can champion.
“We would like to have these [grassroots] events be more about engaging the community and engaging scientists to add metadata and information around the datasets. What is this data set, what URLs is it linked to?” he says. “It’s not super difficult, but it’s important to do it properly and to make sure there’s the right provenance for it.”
Previous efforts to make government data available have met with limited success; data.gov is one such attempt, but its execution is poor, according to Rinberg. And finding common ground for creating accessibility in a field that is experiencing explosive growth is difficult.
“How to standardize on a data format for all disciplines has been the subject of active, ongoing discussions for quite some time,” says Peng. “I believe that there is still some way to go before we could agree on one particular data format—if ever.”
While the wholesale purge of government data feared after the change in administration hasn’t materialized, data are starting to go missing from the EPA website, according to EDGI. One target serves a group with no political power or influence: kids.
“EDGI’s Website Tracking Committee has released a report detailing a particular instance of loss of access to information concerning the EPA’s website, A Student’s Guide to Global Climate Change, an educational site for kids with more than 50 web pages … is currently not accessible.”
Top photo by Pixabay
Margo is a science writer poking her nose into everything that piques her curiosity, from NASA and sea turtles to climate change and green tech.