Session: Open Source for Open Science with Frictionless Data
Scientific research is undergoing a cultural change, pivoting towards more transparency and openness spurred by what some have termed a “reproducibility crisis”. Many findings from published research cannot be replicated, exposing gaps in scientific rigor and eroding public trust in science. But many scientists and developers are now working to make research more reproducible, and open source projects are a natural fit to help. This talk will discuss how our open source project, Frictionless Data for Reproducible Research, aims to improve researchers’ data workflows and champion reproducible science.

The Frictionless Data initiative at Open Knowledge Foundation (http://okfn.org) aims to reduce friction in working with data, with the goal of getting from messy data to insight faster. “Frictions” arise when data is in a hard-to-use format, is hard to find, or is poorly structured; these frictions make data difficult to use, publish, and share. The project is a suite of open source software, tools, and specifications focused on improving data and metadata interoperability, making it easier to move data among different tools and platforms for further analysis.

In this talk, I will discuss the lessons we learned from integrating this open software into researchers’ existing data pipelines by showcasing recent collaborative use cases with biologists. For example, I will show how we worked with oceanographers to implement Frictionless Data Python code into their data ingest pipelines, integrating disparate data while maintaining quality metadata in an easy-to-use interface.

The talk is well suited for scientists and researchers, but is equally applicable to anyone who works with open data or with messy data. The talk is somewhat technical, but can be easily understood by beginner-level audience members too.
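To give a flavor of the ingest-pipeline idea mentioned above, the sketch below shows the general shape of a Frictionless-style setup: a `datapackage.json`-like descriptor carries a Table Schema for each resource, and incoming rows are checked against that schema before ingest. This is a minimal, self-contained illustration, not the actual `frictionless-py` library; the dataset name, field names, and the oceanographic example values are invented for the sketch.

```python
# Illustrative sketch of Frictionless-style validation (not the
# frictionless-py library): a Data Package descriptor declares a
# Table Schema, and a pipeline checks incoming rows against it.
import json

# Hypothetical descriptor for a CSV of ocean temperature readings;
# the resource and field names here are invented for illustration.
DESCRIPTOR = json.loads("""
{
  "name": "ocean-temperatures",
  "resources": [{
    "name": "readings",
    "path": "readings.csv",
    "schema": {
      "fields": [
        {"name": "station_id", "type": "string"},
        {"name": "depth_m", "type": "number"},
        {"name": "temp_c", "type": "number"}
      ]
    }
  }]
}
""")

# Map Table Schema field types to Python casts (subset for the sketch).
CASTS = {"string": str, "number": float, "integer": int}

def validate_row(row, schema):
    """Return a list of error messages for one row dict."""
    errors = []
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        if name not in row:
            errors.append(f"missing field: {name}")
            continue
        try:
            CASTS[ftype](row[name])
        except (ValueError, TypeError):
            errors.append(f"bad {ftype} in {name}: {row[name]!r}")
    return errors

schema = DESCRIPTOR["resources"][0]["schema"]
good = {"station_id": "A1", "depth_m": "10.5", "temp_c": "17.2"}
bad = {"station_id": "A1", "depth_m": "ten", "temp_c": "17.2"}
print(validate_row(good, schema))  # → []
print(validate_row(bad, schema))   # → ["bad number in depth_m: 'ten'"]
```

Keeping the schema in a standalone descriptor, as the Frictionless specs do, means the same metadata that documents the dataset also drives validation, so data quality checks and documentation cannot drift apart.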