A growing number of publishers and funding agencies require scientists to make their data available upon publication. Four foundational principles, Findability, Accessibility, Interoperability, and Reusability (FAIR), support data producers and users by increasing the value gained from contemporary, formal scholarly digital publishing. Data literacy and data management are becoming basic skills for scientists.

The goal of the workshop is to debate how research and education, both funded heavily with public money, can accelerate their potential by being open (Open Access), transparent and largely processed in the public domain (Open Science). Practical examples will be given, such as the “reproducibility crisis”, retracted papers that steered public opinion, and bugs in proprietary data analysis software that compromised results, together with an overview of current practical solutions to improve scientific practices (e.g. Project DEAL, FAIR data management, GOSH Roadmap). The workshop will show how open practices make scientific results and publications more reliable and reproducible.

Participants will collaborate on their own projects while acquiring the digital and non-digital knowledge necessary to fulfil novel standards in data management and analysis reports. The workshop is particularly suited to groups aiming to foster collaboration between their members; participants will explain their workflow and their data to their colleagues as part of the work.

We will conclude with how the adoption of Open Science practices not only benefits the scientific community and society as a whole, but also facilitates and optimizes individual workflows.

Research Data Management

Open Data is becoming a standard requested by funders, publishers and universities, and Research Data Management (RDM) has been recognised as a core competency for researchers. As a former scientist with extensive experience in RDM and open data, I have been developing workshops that teach RDM with a practical focus, and I would be happy to discuss the possibility of offering workshops through your graduate school. I am proposing a constructivist approach (short theoretical introductions, examples treated in the whole group, and practical work in small groups on the management of personal research data). Students will work as data specialists for their own project and as outsiders for the other projects. In addition to learning about RDM (data format, tidy spreadsheets, metadata, data organisation, backup and storage, data sharing, data citation), they will also have to make their data and projects understandable for the other members of their group, which helps them experience the importance of data documentation and metadata. The workshop is aimed at researchers working with long-tail data, independently of their research focus.

As good RDM is a time saver in the long run, it would be most effective for researchers to attend such a workshop early in their career, and I hope we can help the next generation of researchers to produce better, shareable datasets.

Course content

Data management in a Reproducible Research Workflow (RRW)

  • From experimental design to publication
  • The art of the spreadsheet: CSV and XLSX, tidy data, interoperability, machine and human readability
  • Metadata: experiment-wide and sample-wide, content, timing
  • Data inventory, folder organisation, file names, backup
  • Open & FAIR data: repositories, licences, FAIR principles
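As a rough illustration of the "tidy data" and "machine readability" items above, the following minimal Python sketch reshapes a wide spreadsheet export (one column per measurement day) into a long, tidy table with one observation per row. The sample names, column names and values are invented for this example, and the helper functions `wide_to_tidy` and `tidy_to_csv` are hypothetical; the course itself works with spreadsheets and R, so this is only one possible way to show the idea.

```python
import csv
import io

# Hypothetical "wide" spreadsheet export: one row per sample, one column per day.
WIDE_CSV = """sample_id,day_1,day_2,day_3
A,1.2,1.5,1.9
B,0.8,,1.1
"""


def wide_to_tidy(text, id_column="sample_id"):
    """Return one dict per (sample, variable) observation; empty cells become None."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column, cell in record.items():
            if column == id_column:
                continue
            rows.append({
                id_column: record[id_column],
                "variable": column,
                "value": float(cell) if cell else None,
            })
    return rows


def tidy_to_csv(rows):
    """Serialise tidy rows back to CSV text; missing values are written as empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


tidy_rows = wide_to_tidy(WIDE_CSV)
print(tidy_to_csv(tidy_rows))
```

In the tidy form every row is a single observation, so a missing value stays visible as an explicit empty cell rather than a gap in a grid, and the table can be filtered or grouped without knowing how many day columns exist.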

Reproducibility and data analysis

  • Version control & helper tools (git, RStudio, GitHub)
  • How to combine data from different sources
  • Data modification, analysis documentation with Excel, R and RStudio
  • Make your analysis human readable – code commenting: conventions and examples, dplyr package
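The "combine data from different sources" item above can be sketched as a join on a shared identifier. The example below is a minimal Python illustration using only the standard library, under the assumption that both sources share a `sample_id` column; the data, the column names and the helper `join_on_key` are hypothetical. In the course, the same step would typically be done in R, for instance with the join functions of the dplyr package.

```python
import csv
import io

# Hypothetical inputs: instrument measurements and sample metadata kept in separate files.
MEASUREMENTS_CSV = """sample_id,weight_mg
S1,10.5
S2,11.2
S4,9.8
"""
METADATA_CSV = """sample_id,genotype
S1,wild_type
S2,mutant
S3,mutant
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_key(left_rows, right_rows, key):
    """Merge rows that share `key`; return (joined rows, left keys with no match)."""
    lookup = {row[key]: row for row in right_rows}
    joined, unmatched = [], []
    for row in left_rows:
        match = lookup.get(row[key])
        if match is None:
            # Report the mismatch instead of silently dropping the measurement.
            unmatched.append(row[key])
            continue
        joined.append({**row, **match})
    return joined, unmatched


joined, unmatched = join_on_key(
    read_rows(MEASUREMENTS_CSV), read_rows(METADATA_CSV), "sample_id"
)
print(joined)
print("measurements without metadata:", unmatched)
```

Returning the unmatched identifiers alongside the joined rows is a deliberate choice in this sketch: a measurement without metadata usually signals a documentation gap that is worth recording in the analysis notes.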

Methodology

Our courses are geared towards adult learning and use participatory approaches. The trainer encourages participants to add their experience and knowledge to the course content. Topics covered are backed by real examples and relate to the participants’ field of research.

Before the course, participants can submit specific questions and their own presentation examples by email. The course content will be adjusted to the specific needs and requirements of the participants.

Participants receive reading material to be discussed during the course, as well as a course summary with their achievements.

Course duration: 2 consecutive days (9am – 5pm)
Number of participants: 8-12

Reading Suggestions & Resources

Gregory K, Khalsa SJ, Michener WK, Psomopoulos FE, de Waard A, Wu M (2018) Eleven quick tips for finding research data. PLoS Comput Biol 14(4): e1006038. doi.org/10.1371/journal.pcbi.1006038

Wilkinson MD et al. (2016) The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3: 160018. nature.com/scientificdata

The Data FAIRport initiative is an open movement started as the practical follow-up of a Lorentz Workshop held in Leiden, The Netherlands, in January 2014, named "Jointly designing a Data FAIRport". Its vision is to join and support existing communities that try to realise and enable a situation where valuable scientific data is ‘FAIR’ in the sense of being Findable, Accessible, Interoperable and Reusable. | datafairport.org

DMPonline helps you to create, review, and share data management plans that meet institutional and funder requirements. It is provided by the Digital Curation Centre (DCC). | dmponline.dcc.ac.uk