GeoSeeq ::

The Data Model

Organization: Represents a company or university, like "MetaSUB Consortium"

Has multiple projects
Researchers can join the organization

Project: Represents a group of samples, like "MetaSUB Paris"

Stores samples
Stores group-level results

Sample: Represents a single datum, e.g. one swab

Can belong to multiple groups
Stores sample-level results
Stores metadata

Result: Represents a result folder

Stores related files (e.g. Reads 1 & 2) together

Field: Represents actual data, like sequencing reads

Tracks files in S3 (or similar)
Stores a version number
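
The hierarchy above can be sketched as a handful of plain Python dataclasses. This is only an illustration of how the pieces nest, not GeoSeeq's actual schema or client API: every class name, attribute name, bucket URI, and metadata value below is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Field:
    """Hypothetical: one piece of actual data, tracked by location and version."""
    name: str
    uri: str          # e.g. a file in S3 (or similar)
    version: int = 1


@dataclass
class Result:
    """Hypothetical: a result folder that keeps related fields together."""
    name: str
    fields: List[Field] = field(default_factory=list)


@dataclass
class Sample:
    """Hypothetical: a single datum, with metadata and sample-level results."""
    name: str
    metadata: Dict[str, Any] = field(default_factory=dict)
    results: List[Result] = field(default_factory=list)


@dataclass
class Project:
    """Hypothetical: a group of samples plus group-level results."""
    name: str
    samples: List[Sample] = field(default_factory=list)
    results: List[Result] = field(default_factory=list)


@dataclass
class Organization:
    """Hypothetical: owns projects; researchers join as members."""
    name: str
    projects: List[Project] = field(default_factory=list)
    members: List[str] = field(default_factory=list)


# Illustrative values only: one swab with paired reads, inside one project.
swab = Sample("swab-001", metadata={"surface": "handrail"})
swab.results.append(Result("raw_reads", [
    Field("read_1", "s3://example-bucket/swab-001_R1.fastq.gz"),
    Field("read_2", "s3://example-bucket/swab-001_R2.fastq.gz"),
]))
paris = Project("MetaSUB Paris", samples=[swab])
org = Organization("MetaSUB Consortium", projects=[paris])
```

Reading the nesting from the outside in gives the same order as the outline: organization, project, sample, result, field.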

GeoSeeq employs a simple data model that can support a variety of use cases. The core of this data model is the Sample.

To organize samples, GeoSeeq supports projects (also called Sample Groups). A project is quite literally just a group of samples. A sample may belong to many different groups, which supports different analyses and sub-group analyses; the only restrictions relate to privacy. The one exception is the Sample Library (often just called a Library in our documentation). A Sample Library is also a Sample Group, but with a special property: every sample must belong to exactly one Sample Library. This library is, in effect, the sample's home base.
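
The membership rule can be illustrated with a small in-memory sketch: a sample has exactly one library but may sit in any number of other groups. The SampleRegistry class and its methods are hypothetical and exist only for this example; requiring a library before a sample can join other groups is an assumption, and privacy restrictions are not modeled.

```python
from typing import Dict, Set


class SampleRegistry:
    """Hypothetical registry enforcing 'exactly one library per sample'."""

    def __init__(self) -> None:
        self.library_of: Dict[str, str] = {}       # sample -> its one library
        self.groups_of: Dict[str, Set[str]] = {}   # sample -> all groups it is in

    def add_to_library(self, sample: str, library: str) -> None:
        current = self.library_of.get(sample)
        if current is not None and current != library:
            raise ValueError(f"{sample} already belongs to library {current}")
        self.library_of[sample] = library
        # A library is itself a group, so it also counts as a membership.
        self.groups_of.setdefault(sample, set()).add(library)

    def add_to_group(self, sample: str, group: str) -> None:
        # Assumption: a sample needs its home library before joining other groups.
        if sample not in self.library_of:
            raise ValueError(f"{sample} has no library yet")
        self.groups_of[sample].add(group)


registry = SampleRegistry()
registry.add_to_library("swab-001", "Paris Library")
registry.add_to_group("swab-001", "Metro stations")
registry.add_to_group("swab-001", "Winter samples")
```

Trying to add "swab-001" to a second library raises ValueError, while adding it to further groups always succeeds.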

The real strength of GeoSeeq is its ability to connect data and analyses to samples. Samples contain Analysis-Results, which represent either raw data from the sample or results derived from analysis of that data. For example, the raw reads from paired-end DNA sequencing of a sample would be stored as one Analysis-Result with two Analysis-Result-Fields, one each for the forward and reverse reads. Each Field can point to a file stored in the cloud or, for results that require less storage, hold the data directly in GeoSeeq.
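
The two storage options for a Field can be sketched as follows. This is a minimal illustration, not GeoSeeq's implementation: the ResultField class, its attributes, the bucket URIs, and the inline value are all assumptions for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ResultField:
    """Hypothetical field: either a pointer to a cloud file or an inline value."""
    name: str
    uri: Optional[str] = None   # location of a large file in cloud storage
    value: Any = None           # small payload stored directly

    def __post_init__(self) -> None:
        # Exactly one of the two storage options must be used.
        if (self.uri is None) == (self.value is None):
            raise ValueError("set exactly one of 'uri' or 'value'")

    @property
    def is_remote(self) -> bool:
        return self.uri is not None


# Paired-end raw reads: one result with two remote fields (illustrative URIs).
raw_reads = {
    "read_1": ResultField("read_1", uri="s3://example-bucket/swab-001_R1.fastq.gz"),
    "read_2": ResultField("read_2", uri="s3://example-bucket/swab-001_R2.fastq.gz"),
}

# A small derived result stored inline (made-up value).
read_count = ResultField("read_count", value={"n_reads": 1000})
```

Keeping both options on one type means code that consumes a result only needs to check is_remote to decide whether a download is required.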

Projects may also contain Analysis-Results (shown on the group result tab). In this case, an Analysis-Result represents anything that applies to all the samples at once. An example would be a pairwise distance matrix between all samples in a dataset.
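
As a rough illustration of why such a result belongs to the project and not to any single sample, the sketch below builds a pairwise distance matrix from per-sample numeric profiles. The function name, the use of Euclidean distance, and the profile values are assumptions for the example; the source does not say which metric or input a real analysis would use.

```python
import math
from typing import Dict, List


def pairwise_distances(profiles: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Return a nested dict of Euclidean distances between every pair of samples."""
    names = sorted(profiles)
    return {
        a: {b: math.dist(profiles[a], profiles[b]) for b in names}
        for a in names
    }


# Made-up per-sample profiles; each value depends on two samples at once,
# so the matrix can only be stored at the project (group) level.
profiles = {
    "swab-001": [0.0, 0.0],
    "swab-002": [3.0, 4.0],
    "swab-003": [0.0, 4.0],
}
matrix = pairwise_distances(profiles)
```

The matrix is symmetric with zeros on the diagonal, and adding a sample to the project changes its shape, which is why it is stored once per group.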

There may be multiple replicates of the same type of Analysis-Result, and each Analysis-Result may list the other Analysis-Results it was derived from. This records the provenance of each result and supports reproducible research.
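
A minimal sketch of how "derived from" links make provenance traceable: each result keeps a list of its parents, and walking those links recovers the full lineage. The AnalysisResult class, the lineage function, and the result names are hypothetical and chosen only for this example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import List


@dataclass
class AnalysisResult:
    """Hypothetical result record with a replicate number and parent links."""
    name: str
    replicate: int = 0
    derived_from: List["AnalysisResult"] = field(default_factory=list)


def lineage(result: AnalysisResult) -> List[AnalysisResult]:
    """Return every upstream result, nearest ancestors first, without duplicates."""
    seen = set()
    order: List[AnalysisResult] = []
    queue = deque(result.derived_from)
    while queue:
        parent = queue.popleft()
        key = (parent.name, parent.replicate)
        if key in seen:
            continue
        seen.add(key)
        order.append(parent)
        queue.extend(parent.derived_from)
    return order


# Illustrative chain: raw reads -> cleaned reads -> two replicate profiles.
raw = AnalysisResult("raw_reads")
cleaned = AnalysisResult("cleaned_reads", derived_from=[raw])
profile_a = AnalysisResult("taxonomic_profile", replicate=0, derived_from=[cleaned])
profile_b = AnalysisResult("taxonomic_profile", replicate=1, derived_from=[cleaned])

upstream = [r.name for r in lineage(profile_b)]
```

Both replicates trace back to the same cleaned and raw reads, so either one can be regenerated from its recorded inputs.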