Computational workflows

Workflows have become essential in research data processing, facilitating key activities such as data collection, cleaning, analytics, and populating public archives with updated data.

Recognising their critical role in life sciences research, the ELIXIR-UK Node works to improve computational workflows in three areas: workflow platforms, standards and methods, and workflow registries.

Workflow platforms

Workflow platforms are vital for designing, executing, and sharing computational workflows in research. These tools ensure scalability, reproducibility, and transparency, enabling researchers to efficiently handle complex data processing tasks. ELIXIR-UK is deeply involved in promoting and developing best practices for these platforms, including:

  • Nextflow

    A workflow management tool that facilitates scalable and reproducible scientific workflows. It supports running pipelines across different computing environments, from local machines to cloud and cluster systems.
  • Snakemake

    A workflow management system that simplifies the creation, execution, and scaling of data analysis workflows. It automates workflows, ensuring reproducibility from local systems to high-performance clusters and cloud environments.
  • Galaxy

    An open-source platform for accessible, reproducible, and transparent data analysis. It allows users to create, share, and publish workflows in a web-based environment and supports various scientific domains.

Standards and methods

Workflows inherently contribute to the FAIR data principles by processing data according to established metadata standards, generating metadata during data processing, and meticulously tracking data provenance.

However, for this to be possible, well-established standards and methods must be supported and used when developing and sharing workflows. To this end, ELIXIR-UK actively participates in the development and promotion of key standards such as:

  • Bioschemas

    Enhances the discoverability of life sciences resources by promoting consistent use of Schema.org markup on websites, making resources like datasets, workflows and software easier to find and integrate.
  • EDAM

    An ontology for scientific data analysis and management in life sciences. It categorises topics, operations, data types, and formats, providing a structured hierarchy for semantic annotation and data management.
  • RO-Crate

    Research Object Crate packages research data with structured metadata, supporting FAIR principles. It offers a self-contained description of research objects, ensuring sufficient context for data reuse.
  • Common Workflow Language (CWL)

    An open standard for describing and connecting command line tools to create workflows. It enables the portability of workflows across platforms, scaling from laptops to large computing environments.
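
As a rough illustration of how these standards fit together, the sketch below builds a minimal RO-Crate metadata document that describes a single workflow file, using the `ComputationalWorkflow` type that Bioschemas profiles. It follows the general layout of the RO-Crate 1.1 specification but is not a complete or validated Workflow RO-Crate. The file name `workflow.cwl`, the title, the description, the date and the licence URL are all hypothetical placeholders.

```python
import json


def build_crate_metadata(workflow_path, name, description, license_url, date_published):
    """Return a minimal RO-Crate 1.1 style metadata document for one workflow file.

    All argument values are supplied by the caller; nothing here is looked up
    from a registry or validated against a profile.
    """
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {
                # Metadata descriptor: identifies this file and points at the root dataset.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset: the crate itself, with the workflow as its main entity.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "description": description,
                "datePublished": date_published,
                "license": {"@id": license_url},
                "hasPart": [{"@id": workflow_path}],
                "mainEntity": {"@id": workflow_path},
            },
            {
                # The workflow file, typed so that tools can recognise it as a workflow.
                "@id": workflow_path,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
            },
        ],
    }


# Hypothetical example values, for illustration only.
metadata = build_crate_metadata(
    workflow_path="workflow.cwl",
    name="Example alignment workflow",
    description="A placeholder description of what the workflow does.",
    license_url="https://spdx.org/licenses/Apache-2.0",
    date_published="2024-01-01",
)

print(json.dumps(metadata, indent=2))
```

Written to a file named `ro-crate-metadata.json` next to the workflow, a document of this shape gives the workflow a self-contained, machine-readable description. A real crate would normally add authors, the workflow language, and inputs and outputs.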

Workflow registries

Workflows are digital objects in their own right, so the FAIR principles should also be applied to them. To ensure that workflows are as accessible and citable as data, they need to be archived and referenced using citation metadata.

However, most workflows are not yet registered in specialised repositories. They are often stored in general-purpose repositories alongside other software, and the specialised registries that do exist tend to cater only to specific workflow management systems.

WorkflowHub addresses these challenges by providing a centralised registry where workflows can be registered, archived and made accessible, complete with rich metadata and persistent identifiers. It also supports a range of workflow management systems, fostering interoperability.

The ELIXIR-UK Node endorses WorkflowHub as a recognised Node service, and several Node members lead on its development and sustainability.

  • WorkflowHub

    WorkflowHub is a new registry for describing, sharing and publishing scientific computational workflows. It is an ELIXIR Tools Platform service and part of the ELIXIR Tools ecosystem. The registry has […]
  • BioDT

  • BY-COVID

  • EOSC-life

  • Enabling the reuse, extension, scaling, and reproducibility of scientific workflows

  • Standardising the fluxomics workflows

  • Reference hCNV datasets, use-case workflows and benchmarking