Developer Information
Application Architecture
Introduction
The DataMeta application stack comprises three main applications:
The web application (aka DataMeta)
A memcached server
A PostgreSQL database server
Relational data model
The relational database model comprises the following entities:
User
A database user, uniquely identifiable by their email address. The attributes are self-explanatory. A user can be disabled but should not be deleted, so that submitted records remain traceable.
Group
Every user is associated with exactly one group.
File
A file corresponds to a single file submitted by the user. It holds the following attributes:
name
- The original name of the file as submitted by the user
name_storage
- The name under which the file is stored in the storage backend. The webapp uses a fixed naming scheme for this purpose which ensures that the filename is unique on the storage backend: {file_id.rjust(10,'0')}_{user_id}_{group_id}_{file_size}_{md5_checksum}
checksum
- The MD5 checksum, computed immediately after upload
filesize
- The filesize of the file
checksum_crypt
- The checksum of the file after encryption
filesize_crypt
- The filesize of the file after encryption
user_id
- The UID of the user who uploaded the file
group_id
- The GID of the user who uploaded the file (see User and Group)
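The naming scheme for the storage backend can be illustrated with a short Python sketch. The function name storage_name and the example values are assumptions made for illustration only; the actual implementation in the webapp may differ.

```python
def storage_name(file_id: int, user_id: int, group_id: int,
                 file_size: int, md5_checksum: str) -> str:
    """Build a storage-backend file name following the documented scheme."""
    # The file ID is left-padded with zeros to a width of 10 characters.
    padded_id = str(file_id).rjust(10, "0")
    return f"{padded_id}_{user_id}_{group_id}_{file_size}_{md5_checksum}"


# Hypothetical values, chosen only to show the resulting layout.
print(storage_name(42, 7, 3, 1024, "d41d8cd98f00b204e9800998ecf8427e"))
# -> 0000000042_7_3_1024_d41d8cd98f00b204e9800998ecf8427e
```

Because the file ID is part of the name, two uploads with identical content and owner still receive distinct names.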
MetaDatum
A metadatum describes a datum conceptually; it does not hold any data. It thus corresponds to a column in the submitted sample sheet, but not to the data held in that column. The corresponding table in the database is filled with values when the DataMeta instance is configured and typically remains static throughout the lifetime of the instance. However, there are scenarios in which one may want to add a new column at a later stage, modify the linting constraints defined for a column, etc. Implementing this will require migration concepts for these events, e.g. old data may no longer pass linting.
A metadatum has the following attributes:
name
- The name of the metadatum. This defines the text required in the header of the corresponding sample sheet column.
regexp
- A regular expression. If specified, it is applied to the field during linting.
short_description
- A message to display in the interface when the verification based on regexp fails, e.g. “Only three digits are allowed in this field”.
datetimefmt
- A C standard date time format code string. If specified, the application assumes that this field holds a datetime value. When reading sample sheets from plain text formats (CSV, TSV) or if the corresponding column in a submitted Excel sheet is formatted as plain text, this format string is applied to parse the column values. Additionally, this format string is applied to display the values of this column in the user interface. Note that date, time and datetime values are always stored as plain text datetime values in ISO 8601 format internally. For values that hold only a date or only a time, the missing component falls back to 00:00 and 1900-01-01 respectively.
datetimemode
- An enum holding DATE, TIME or DATETIME. If set, denotes that the field should be treated as the corresponding type internally.
mandatory
- A flag indicating whether this field may be empty in sample sheets or not
order
- An integer. When displaying the sample sheet in the user interface, the columns are shown in the order of their order values.
isfile
- A flag indicating whether this column corresponds to a file.
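How regexp, mandatory and datetimefmt could interact during linting is sketched below in Python. The function names lint_value and to_iso are hypothetical, and the choice to require a match of the whole field is an assumption; the source only states that the expression is applied to the field.

```python
import re
from datetime import datetime
from typing import Optional


def lint_value(value: str, regexp: Optional[str], mandatory: bool) -> bool:
    """Return True if the value passes the (assumed) linting rules."""
    if value == "":
        # Empty fields are only acceptable for non-mandatory columns.
        return not mandatory
    if regexp is None:
        return True
    # Assumption: the whole field has to match the expression.
    return re.fullmatch(regexp, value) is not None


def to_iso(value: str, datetimefmt: str) -> str:
    """Parse a value with a C-style format string and return ISO 8601 text."""
    # strptime fills missing components with 1900-01-01 and 00:00:00.
    return datetime.strptime(value, datetimefmt).isoformat()


print(lint_value("123", r"\d{3}", mandatory=True))  # -> True
print(to_iso("24.12.2020", "%d.%m.%Y"))             # -> 2020-12-24T00:00:00
print(to_iso("13:45", "%H:%M"))                     # -> 1900-01-01T13:45:00
```

The last two calls show the fallback behaviour described for datetimefmt: a date without a time is stored with 00:00, and a time without a date is stored with 1900-01-01.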
MetaDatumRecord
Corresponds to one captured value of one metadatum. Every filled field in the sample sheet (except the header) corresponds to one MetaDatumRecord. Attributes:
metadatum_id
- The ID of the metadatum this record corresponds to
metadataset_id
- The MetaDataSet (see below) this record is part of.
file_id
- The ID of the associated file, set if this metadatum record corresponds to a file and it has been submitted (see Conceptual Notes)
value
- The plain text value of this metadatum
MetaDataSet
A metadataset corresponds to one row in the sample sheet. It groups all metadatum records that correspond to this row and additionally holds information about the owner of the metadata and whether it has been submitted (aka committed) or not. Attributes are:
user_id
- The owner’s UID
group_id
- The owner’s GID at the time of creation
submission_id
- The submission ID once this metadata has been committed (see below).
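The relationship between MetaDatum (column), MetaDataSet (row) and MetaDatumRecord (cell) can be sketched in Python as follows. The dataclass and the to_rows helper are illustrative assumptions, not the application's actual model classes, and the example data is made up.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class MetaDatumRecord:
    metadataset_id: int  # the sample sheet row this value belongs to
    metadatum_id: int    # the sample sheet column this value belongs to
    value: str           # always plain text


def to_rows(records, column_names):
    """Group records by metadataset and return one dict per sample sheet row."""
    rows = defaultdict(dict)
    for rec in records:
        rows[rec.metadataset_id][column_names[rec.metadatum_id]] = rec.value
    return dict(rows)


# Hypothetical example data: two columns, two rows, one empty field.
columns = {1: "SampleID", 2: "Date"}
records = [
    MetaDatumRecord(10, 1, "S-001"),
    MetaDatumRecord(10, 2, "2020-12-24T00:00:00"),
    MetaDatumRecord(11, 1, "S-002"),
]
print(to_rows(records, columns))
```

Row 11 has no entry for Date, mirroring the statement that only filled fields correspond to a MetaDatumRecord.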
Submission
A submission corresponds to the event of a user clicking the COMMIT button on the /submit view. It holds the time at which the submission took place.
AppSettings
A table holding application settings configured by the administrator running the DataMeta instance. This is currently not used.
Conceptual Notes
All captured metadata is treated as text internally
The generic application design, i.e. that the metadata fields can be defined dynamically at runtime, implies that there cannot be a 1:1 mapping of the sample sheet into the relational data model in the form of a corresponding relation / table, unless one resorts to using CREATE / ALTER TABLE at runtime. The current data model design stores all captured values in one attribute of one relation (metadatumrecord.value), which makes type reflection on the data model level impossible. At the same time, plain text formats such as CSV or TSV are accepted as input, so all values that are to be held must be serializable anyway. Thus the value field of the MetaDatumRecord entity is TEXT / VARCHAR.
Files and MetaDatumRecords are detached until submission
Until submission (aka commit), the data model does not link files and metadatumrecords. The integration of the file names and file uploads into the sample sheet (Pending annotated submissions) is purely visual on the client side through name-based matching. This is also utilized internally, i.e. to differentiate pending files from files that have been submitted, as files do not have a direct relation to a submission themselves. Only when a data record is submitted, files get linked to metadatumrecords and those get linked to a submission (via metadataset).
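The name-based matching described above can be sketched in Python. The function match_pending_files and the example file names are hypothetical; in DataMeta this matching is performed for display on the client side, so the snippet only illustrates the idea.

```python
def match_pending_files(file_column_values, pending_file_names):
    """Map each file name referenced in the sheet to whether it was uploaded."""
    uploaded = set(pending_file_names)
    # No database relation is involved: the link is made purely by name.
    return {name: name in uploaded for name in file_column_values}


# Hypothetical data: values of an isfile column and the pending uploads.
referenced = ["a.fastq.gz", "b.fastq.gz"]
uploaded = ["b.fastq.gz", "c.fastq.gz"]
print(match_pending_files(referenced, uploaded))
# -> {'a.fastq.gz': False, 'b.fastq.gz': True}
```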
Both files and metadatasets have owners
The previous point, i.e. pending files and metadatasets not being connected, requires that ownership is documented both on the file and on the metadataset level.
Who has access to what data
What is shown on the /submit view is private to a UID/GID combination. Other members of the same group cannot access the pending submission, and neither can the user themselves after changing their group. After submission, the user can see all submitted data from their group in the /view view (not yet implemented).
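The two visibility rules can be expressed as simple filters. The following Python sketch uses a hypothetical in-memory MetaDataSet class and helper names; it is not the application's query code, and the group-wide view is described in the source as not yet implemented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetaDataSet:
    user_id: int
    group_id: int
    submission_id: Optional[int] = None  # None while the record is pending


def pending_for(metadatasets, uid, gid):
    """Pending records are visible only to the exact UID/GID combination."""
    return [m for m in metadatasets
            if m.submission_id is None and m.user_id == uid and m.group_id == gid]


def submitted_for_group(metadatasets, gid):
    """Submitted records are visible to every member of the owning group."""
    return [m for m in metadatasets
            if m.submission_id is not None and m.group_id == gid]


# Hypothetical data: user 1 moved from group 5 to group 6.
data = [MetaDataSet(1, 5), MetaDataSet(1, 6), MetaDataSet(2, 5, submission_id=9)]
print(len(pending_for(data, uid=1, gid=6)))   # -> 1
print(len(submitted_for_group(data, gid=5)))  # -> 1
```

In the example, user 1 no longer sees the record that was created while they belonged to group 5.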
Visual Studio Code Setup
Installation instructions for using the remote container feature of Visual Studio Code (vscode)
The remote container feature lets you run the editor’s backend inside a Docker container that is readily set up for development.
Quickstart
Clone this repository:
git clone https://github.com/ghga-de/datameta.git
Then open the created directory in vscode, for example:
code ./datameta
Install the remote development extension:
click on the extensions symbol in the side bar
search for Remote Development and install it
To reopen vscode inside the dev container:
select View > Command Palette in the dropdown menu
then select (or type): Remote-Containers: Reopen in Container
If you are executing this for the first time, the containers will be set up via docker-compose. This might take some time.
Using the container
Once the build is successful, you will be able to use vscode as usual.
The workspace will be mounted at /workspace.
However, before you start, you first have to install datameta in edit mode. Just type in the terminal:
dev_install
(this will execute the script at /workspace/docker/dev_install)
You only have to run this once (unless you re-build the container or want to re-install datameta).
Every time you would like to deploy datameta, just type:
dev_launcher
(this will execute the script at /workspace/docker/dev_launcher)
The frontend should be available at http://localhost:8080/ in your browser.
Configuration
Any configuration regarding the dev container environment can be found at /.devcontainer.
The environment includes a few useful vscode extensions out of the box.
If you find an extension that might be of use to everybody, feel free to add it to /.devcontainer/devcontainer.json.
For general information on this vscode feature, see the Visual Studio Code documentation on developing inside a container.