Leader(s)

Difficulty & duration

  • difficulty: intermediate
  • duration: 1-2 days

Abstract

Our current aims are:

  • Finalize the SpatialExperiment class structure by creating and testing example datasets from several technological platforms (to be collected in the STdata package)
  • Adapt the visualization functions to data from each platform (to be collected in the ggspavis package)
  • Create a short analysis workflow for each platform, using one of its example datasets

This will allow us to build an integrated Bioconductor-based infrastructure for analyzing spatially resolved transcriptomics data. Ultimately, this will all be showcased in our OSTA online book.

So far, we have mainly worked with data from the 10x Genomics Visium platform. The STdata package (under development) currently contains two example datasets from the Visium platform (human DLPFC and mouse coronal).

In this set of “Dataset” challenges, we will aim to select and prepare several additional datasets for demonstration and testing purposes.

The two Visium datasets above each contain only a single sample. However, we would also like to ensure that our infrastructure correctly handles multi-sample datasets, which we expect to become more common as Visium and other platforms are more widely adopted. This applies to both the SpatialExperiment class and the plotting functions.

Targets

This challenge consists of:

  • selecting a suitable publicly accessible dataset containing multiple samples
  • formatting this into a SpatialExperiment object
  • writing a reproducible script that builds the object from the raw data files, saving the object, and adding both the script and the saved object to the STdata package
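
The data-preparation part of these targets can be sketched in base R roughly as follows. This is a minimal illustration under stated assumptions, not one of the STdata scripts: the two samples are simulated rather than read from raw files, and the names make_sample, combine_samples, sampleA, and sampleB are hypothetical. It focuses on what differs from the single-sample case: spot barcodes repeat across samples, so every column needs a sample-prefixed ID and a sample_id entry. The call to the SpatialExperiment constructor is left as a comment, because the class structure is still being finalized.

```r
# Simulated stand-ins for two samples (assumption: a real script would
# read the raw count and spot-position files for each sample instead).
set.seed(1)
genes <- paste0("gene", 1:5)

make_sample <- function(sample_id, n_spots) {
  barcodes <- paste0("spot", seq_len(n_spots))
  counts <- matrix(rpois(length(genes) * n_spots, lambda = 2),
                   nrow = length(genes),
                   dimnames = list(genes, barcodes))
  coords <- data.frame(barcode = barcodes,
                       x = runif(n_spots), y = runif(n_spots),
                       stringsAsFactors = FALSE)
  list(sample_id = sample_id, counts = counts, coords = coords)
}

combine_samples <- function(samples) {
  ref_genes <- rownames(samples[[1]]$counts)
  for (s in samples) {
    # Genes must match across samples before the matrices are column-bound.
    stopifnot(identical(rownames(s$counts), ref_genes))
    # Coordinates must be listed in the same spot order as the count columns.
    stopifnot(identical(s$coords$barcode, colnames(s$counts)))
  }

  ids <- vapply(samples, function(s) s$sample_id, character(1))
  n_spots <- vapply(samples, function(s) ncol(s$counts), integer(1))
  barcodes <- unlist(lapply(samples, function(s) colnames(s$counts)))

  # Barcodes repeat across samples, so prefix them with the sample ID.
  col_ids <- paste(rep(ids, times = n_spots), barcodes, sep = "_")

  counts <- do.call(cbind, lapply(samples, function(s) s$counts))
  colnames(counts) <- col_ids

  col_data <- data.frame(sample_id = rep(ids, times = n_spots),
                         barcode = barcodes,
                         row.names = col_ids,
                         stringsAsFactors = FALSE)

  coords <- do.call(rbind, lapply(samples, function(s) {
    as.matrix(s$coords[, c("x", "y")])
  }))
  rownames(coords) <- col_ids

  list(counts = counts, col_data = col_data, spatial_coords = coords)
}

samples <- list(make_sample("sampleA", 4), make_sample("sampleB", 3))
combined <- combine_samples(samples)

# The three aligned tables would next be passed to the SpatialExperiment
# constructor (not called here). Saving to a temporary file stands in for
# saving the finished object.
out_file <- file.path(tempdir(), "multi_sample_example.rds")
saveRDS(combined, out_file)
```

Keeping the rows of col_data and spatial_coords in the same order as the columns of counts means the plotting functions can later subset or facet by sample_id without re-matching spots.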

As a starting point, we can adapt the make-data.R scripts that were used to build the existing objects in the STdata package.

One good option may be to use some of the additional samples from the human DLPFC dataset from our paper (Maynard and Collado-Torres et al. 2020). Currently, we have used sample 151673 from this dataset for the “human DLPFC” dataset in the STdata package. However, we would also be interested to find out if there are any other suitable publicly available multi-sample datasets, either from Visium or other platforms.

Also, as a possible extension to this challenge, we could consider:

  • creating a set of scripts to load all Visium datasets from the 10x Genomics Visium website (10 datasets in total), formatting these as SpatialExperiment objects, and re-distributing them as an additional data package (similar to the existing TENxPBMCData package for single-cell data)