Python-Based API Client

We offer a Python-based client for the RESTful API of a DataMeta server (e.g. the CoGDat Portal). It can be used either as a Python library or from the command line.

Its primary use case is automating the staging and submission of files and metadata.

If this client does not fit your needs, e.g. because you want to interact with a DataMeta server from an application not written in Python, you can also use the server's RESTful API directly. Please see the API documentation for details.

Installation

Requirements:

The client is compatible with all major operating systems (Linux, macOS, and Windows). However, the following requirements must be satisfied:

Python 3.6 or higher (Installation instructions for all platforms)
pip
git (for the development version, Installation instructions)

Installation from PyPI:

The latest release of the client can be installed from PyPI:

python3 -m pip install datameta_client

Install the development version:

Alternatively, you can install the latest development version from GitHub:

python3 -m pip install git+https://github.com/ghga-de/datameta-client

Check if the installation succeeded:

To verify that the installation was successful, run:

dmclient --help

This should print a basic CLI description.

Configure the client to connect to a DataMeta server

To connect the client to a DataMeta server (e.g. the CoGDat Portal), there are two important configuration parameters:

1. The URL of the DataMeta server

This is https://data.cogdat.de/ in the case of CoGDat.
Make sure to specify the server's index/root URL. Do not include any sub-routes (e.g. the API route https://data.cogdat.de/api/v0).

2. An API key/token obtained from the DataMeta server

The easiest way to obtain an API key is through the UI. Please see this section.
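Regarding the server URL: if you assemble it programmatically, a quick sanity check can catch an accidentally included API sub-route before any request is made. The following standard-library sketch is illustrative only: `check_server_url` is a hypothetical helper, not part of the client, and rejecting any URL whose path contains an `api` segment is an assumption based on the example above.

```python
from urllib.parse import urlparse


def check_server_url(url: str) -> str:
    """Return the URL unchanged if it looks like a server root, else raise.

    Hypothetical helper (not part of datameta_client). It rejects non-http(s)
    URLs and URLs whose path contains an "api" segment, such as
    https://data.cogdat.de/api/v0.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {url!r}")
    segments = [segment for segment in parsed.path.split("/") if segment]
    if "api" in segments:
        raise ValueError(f"use the server root, not an API sub-route: {url!r}")
    return url


check_server_url("https://data.cogdat.de/")          # accepted
# check_server_url("https://data.cogdat.de/api/v0")  # would raise ValueError
```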

There are generally three ways to provide these parameters:

using a configuration file in YAML format
via environment variables
as arguments to the function call or the command line

Please note: if a parameter is specified in more than one way, the option listed later takes precedence, i.e. function or command-line arguments override environment variables, which in turn override the configuration file.
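The precedence rule above can be pictured as a simple lookup chain. This is a sketch of the documented behaviour, not the client's actual implementation: `resolve_setting` is a hypothetical name, and the configuration file is represented by an already-parsed dictionary so that the example needs no YAML library.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    name: str,
    explicit: Optional[str] = None,
    file_config: Optional[Mapping[str, str]] = None,
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Resolve "url" or "token" using the documented precedence:
    explicit argument > DATAMETA_* environment variable > config file.
    Hypothetical helper illustrating the rule, not datameta_client code.
    """
    if explicit is not None:
        return explicit
    env_value = environ.get(f"DATAMETA_{name.upper()}")
    if env_value is not None:
        return env_value
    return (file_config or {}).get(name)


file_config = {"url": "https://from-file.example/", "token": "file-token"}
env = {"DATAMETA_TOKEN": "env-token"}
print(resolve_setting("url", file_config=file_config, environ=env))    # https://from-file.example/
print(resolve_setting("token", file_config=file_config, environ=env))  # env-token
print(resolve_setting("token", "arg-token", file_config, env))         # arg-token
```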

Configure using a YAML configuration file

The required parameters can be stored in a YAML config file containing url and token properties.

The content might look like this:

# please adapt values accordingly:
url: https://data.cogdat.de/
token: SYV40qEjbURZwiI1yXKP5GhDqdd3u0kpa7BUASULBd8QpWvR1kTXnIKaO7lQmgTj

If the file is placed at the following location in your home directory, it is picked up automatically:

~/.dmclient.yaml for Linux or macOS
C:\Users\YourUser\.dmclient.yaml for Windows

Alternatively, you can pass the config file on the command line via the --config argument:

dmclient --config /path/to/your/config.yaml <commands>

Configure using environment variables

You can also use the following environment variables to specify the required parameters:

DATAMETA_URL
DATAMETA_TOKEN

On Linux or macOS, you can set them as follows:

# please adapt values accordingly
export DATAMETA_URL=https://data.cogdat.de/
export DATAMETA_TOKEN=SYV40qEjbURZwiI1yXKP5GhDqdd3u0kpa7BUASULBd8QpWvR1kTXnIKaO7lQmgTj

To set environment variables on Windows, please see this tutorial.
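As a platform-independent alternative, the variables can also be set from Python before the client is called. This only affects the running Python process and any processes it starts, and it assumes the client reads the variables at call time; the values below are placeholders.

```python
import os

# Placeholder values: adapt them to your server and API token.
os.environ["DATAMETA_URL"] = "https://data.cogdat.de/"
os.environ["DATAMETA_TOKEN"] = "your-api-token"

# Subsequent calls such as datameta_client.files.stage(path=...) can then
# omit url= and token= (assuming the client reads these variables when called).
```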

Provide parameters as arguments to function call or command line

Alternatively, you can specify the parameters directly when calling a Python function or a CLI command.

For instance, to upload a file with the CLI:

dmclient \
  files \
  stage \
  --url "https://data.cogdat.de/" \
  --token "SYV40qEjbURZwiI1yXKP5GhDqdd3u0kpa7BUASULBd8QpWvR1kTXnIKaO7lQmgTj" \
  /path/to/the/file/to/upload

Within Python, the above example would look like this:

from datameta_client.files import stage
stage(
    path="/path/to/the/file/to/upload",
    url="https://data.cogdat.de/",
    token="SYV40qEjbURZwiI1yXKP5GhDqdd3u0kpa7BUASULBd8QpWvR1kTXnIKaO7lQmgTj"
)

Upload and submit data

The data submission process consists of three steps:

uploading/staging files
staging metadata
submitting a set of files and the associated metadata records

For a more detailed explanation, please refer to the General Concepts section.

Prepare metadata records:

For a detailed discussion of all mandatory and optional metadata fields, please refer to this section.

To format metadata records for use with the Python client, you have three choices:

1. JSON File: Store metadata records in a JSON file.
For a single metadata record, the content might look like this:

{
    "ID": "my_unique_id_01",
    "Date": "2021-02-23",
    "ZIPCode": "692",
    "RawFQ1": "sample_01.fastq",
    "AssemblyFA": "sample_01_assembly.fasta",
    "SeqPlatform": "ONT",
    "AmpKit": "COVIDSeq"
}

You may also provide a list of metadata records, e.g.:

[
    {
        "ID": "my_unique_id_01",
        "Date": "2021-02-23",
        "ZIPCode": "692",
        "RawFQ1": "sample_01.fastq",
        "AssemblyFA": "sample_01_assembly.fasta",
        "SeqPlatform": "ONT",
        "AmpKit": "COVIDSeq"
    },
    {
        "ID": "my_unique_id_02",
        "Date": "2021-02-23",
        "ZIPCode": "692",
        "RawFQ1": "sample_02.fastq",
        "AssemblyFA": "sample_02_assembly.fasta",
        "SeqPlatform": "ONT",
        "AmpKit": "COVIDSeq"
    }
]

2. JSON string: Instead of storing the metadata in a file, you may also provide it as a JSON-formatted string.
For the single-record example above, the string would look like this:

"{\"ID\": \"my_unique_id_01\", \"Date\": \"2021-02-23\", \"ZIPCode\": \"692\", \"RawFQ1\": \"sample_01.fastq\", \"AssemblyFA\": \"sample_01_assembly.fasta\", \"SeqPlatform\": \"ONT\", \"AmpKit\": \"COVIDSeq\"}"

3. Python dictionary: When using the client as a Python library, you may provide a single metadata record as a Python dictionary or multiple records as a list of Python dictionaries. The syntax is identical to the JSON file example above.
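When building your own tooling around the client, it can be convenient to accept all three formats and normalise them into a list of dictionaries. The helper below is a hypothetical sketch (the client performs its own parsing); deciding that a string is inline JSON when it starts with `{` or `[`, and a file path otherwise, is an assumption of this sketch.

```python
import json
from pathlib import Path
from typing import List, Union

MetadataInput = Union[dict, List[dict], str, Path]


def load_metadata_records(source: MetadataInput) -> List[dict]:
    """Normalise a dict, a list of dicts, a JSON string or a JSON file path
    into a list of metadata records (hypothetical helper)."""
    if isinstance(source, dict):
        return [source]
    if isinstance(source, list):
        return source
    text = str(source)
    if not text.lstrip().startswith(("{", "[")):
        # Not inline JSON: treat the value as a path to a JSON file.
        text = Path(text).read_text(encoding="utf-8")
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


record = {"ID": "my_unique_id_01", "Date": "2021-02-23", "SeqPlatform": "ONT"}
# The dict and its JSON string form yield the same list of records.
same = load_metadata_records(record) == load_metadata_records(json.dumps(record))
```

Note that `json.dumps(record)` also produces the JSON string form of a record, which spares you escaping the quotes by hand before quoting it for your shell.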

Shortcut - stage and submit in one go:

The Python client provides a “shortcut” functionality that will stage and submit files and metadata records in one go. This is the recommended procedure for most use cases. However, if you need more fine-grained control over the staging/submission process, please refer to the following section.

The following parameters are required:

metadatasets_json: a single metadata record or a list of metadata records (see here)
files_dir: the path to the directory containing the files to upload.

Please note: only files referenced by file name in the metadata records will be considered; any other files in the directory are ignored.

The following parameter is optional:

label: human-readable label/title describing the submission

The response will contain information about the submission including the ID of the submission itself as well as the IDs of the associated files and metadata records (see here for an explanation on ID usage).
E.g. it might look like this:

{
    "id": {
        "uuid": "ed233c00-26de-4a1a-8384-4c613d4bcb33",
        "site": "CDS-97478693"
    },
    "metadataset_ids": [
        {
            "uuid": "be0fe621-5cd9-4f2e-a901-1df2bbe255cb",
            "site": "CDM-91697562"
        }, 
        {
            "uuid": "065d31d3-bcec-4c7a-9d82-06ceb4bb8761",
            "site": "CDM-07002994"
        }
    ],
    "file_ids": [
        {
            "uuid": "be0fe621-5cd9-4f2e-a901-1df2bbe255cb",
            "site": "CDF-91697562"
        },
        {
            "uuid": "3d3effdb-8181-4750-8b3b-38e0e5cf8b24",
            "site": "CDF-81097563"
        },
        {
            "uuid": "7199115f-0ef6-4f0d-8b45-0e32317dadad",
            "site": "CDF-16594243"
        },
        {
            "uuid": "e56198f5-d6bc-4dc2-801f-53397c5f841b",
            "site": "CDF-34906130"
        }
    ],
    "label": "Patient Group X"
}

Example using the CLI:

# The metadata can also be provided as a JSON string instead of a file path.
# The response is printed as JSON to stdout.
dmclient \
    shortcuts \
    stage-and-submit \
    --label "Patient Group X" \
    "/path/to/metadata.json" \
    "/path/to/files/dir" \
    > submission_response.json

Example using the Python library:

from datameta_client.shortcuts import stage_and_submit

metadata_records = [
    {
        "ID": "my_unique_id_01",
        "Date": "2021-02-23",
        "ZIPCode": "692",
        "RawFQ1": "sample_01.fastq",
        "AssemblyFA": "sample_01_assembly.fasta",
        "SeqPlatform": "ONT",
        "AmpKit": "COVIDSeq"
    },
    {
        "ID": "my_unique_id_02",
        "Date": "2021-02-23",
        "ZIPCode": "692",
        "RawFQ1": "sample_02.fastq",
        "AssemblyFA": "sample_02_assembly.fasta",
        "SeqPlatform": "ONT",
        "AmpKit": "COVIDSeq"
    }
]

submission_response = stage_and_submit(
    metadatasets_json = metadata_records, # can also be provided as
                                          # JSON string or path to JSON file
    files_dir = "/path/to/files/dir",
    label = "Patient Group X"
)
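Since only files referenced in the metadata records are considered, it can be worth checking beforehand that every referenced file actually exists in `files_dir`. This sketch assumes that the file-name fields are exactly those used in the example records (`RawFQ1` and `AssemblyFA`); the real set of file fields depends on the server's metadata configuration, and `check_referenced_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# Assumption: the metadata fields holding file names (adapt to your server).
FILE_FIELDS = ("RawFQ1", "AssemblyFA")


def check_referenced_files(
    records: Iterable[dict],
    files_dir: str,
    file_fields: Tuple[str, ...] = FILE_FIELDS,
) -> Tuple[List[str], List[str]]:
    """Return (all referenced file names, referenced names missing from files_dir)."""
    referenced = sorted(
        {rec[field] for rec in records for field in file_fields if rec.get(field)}
    )
    present = {p.name for p in Path(files_dir).iterdir() if p.is_file()}
    missing = [name for name in referenced if name not in present]
    return referenced, missing


# Usage before calling stage_and_submit:
# referenced, missing = check_referenced_files(metadata_records, "/path/to/files/dir")
# if missing:
#     raise FileNotFoundError(f"referenced but not found: {missing}")
```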

Staging and submission - step by step:

If you need more control over the submission process, you can perform all steps individually. These steps include:

staging a file (repeat for each file)
staging the corresponding metadata records (repeat for each record)
(optional) pre-validating a set of files and associated metadata records for submission
submitting a set of files and associated metadata records
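The four steps can be chained as sketched below. To keep the sketch runnable without a server, the client functions are passed in as parameters; in real use you would pass `files.stage`, `metadatasets.stage`, `submissions.prevalidate` and `submissions.submit` from `datameta_client`, with the keyword arguments documented in the following steps. The orchestration function itself is hypothetical, and it uses the site IDs from each response.

```python
from typing import Callable, Iterable, List, Optional


def submit_step_by_step(
    file_paths: Iterable[str],
    records: Iterable[dict],
    stage_file: Callable[..., dict],
    stage_metadata: Callable[..., dict],
    prevalidate: Callable[..., bool],
    submit: Callable[..., dict],
    label: Optional[str] = None,
) -> dict:
    """Run steps 1-4 in order and return the submission response.

    Hypothetical helper, not part of datameta_client.
    """
    # Step 1: stage each file and keep its site ID.
    file_ids: List[str] = [stage_file(path=p)["id"]["site"] for p in file_paths]
    # Step 2: stage each record on its own (lists of records are not allowed).
    metadataset_ids: List[str] = [
        stage_metadata(metadata_json=rec)["id"]["site"] for rec in records
    ]
    # Step 3: dry run; do not submit anything if the bundle is invalid.
    if not prevalidate(
        metadataset_ids=metadataset_ids, file_ids=file_ids, label=label
    ):
        raise RuntimeError("pre-validation failed; nothing was submitted")
    # Step 4: the actual submission.
    return submit(metadataset_ids=metadataset_ids, file_ids=file_ids, label=label)


# Real use (needs a configured client and a reachable server):
# from datameta_client import files, metadatasets, submissions
# response = submit_step_by_step(
#     ["/path/to/sample_01.fastq", "/path/to/sample_01_assembly.fasta"],
#     [metadata_record],
#     files.stage, metadatasets.stage,
#     submissions.prevalidate, submissions.submit,
#     label="Patient Group X",
# )
```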

Step 1 - Staging a file:

Parameters to set:

path: Path to the file to upload. (required)
name: File name to be used after upload. By default, the original file name is used. (optional)

The response will contain information about the uploaded file including its ID (see here for an explanation on ID usage).
E.g. it might look like this:

{
    "id": {
        "uuid": "3cc89e9d-50f9-4bbc-9dfc-421836b21477",
        "site": "CDF-91697562"
    },
    "name": "test_file_1.txt",
    "content_uploaded": true,
    "checksum": "5a105e8b9d40e1329780d62ea2265d8a",
    "user_id": {
        "uuid": "5bea80d2-fd1e-4433-87a8-00cf6da76300",
        "site": "CDU-73698886"
    },
    "group_id": {
        "uuid": "14bded6e-505e-4007-b4b9-eb43fb865a42",
        "site": "CDG-00268878"
    },
    "expires": "2021-03-26T11:18:31.592234",
    "filesize": 5
}

Example using the CLI:

dmclient \
    files \
    stage \
    --name "sample_01.fastq" \
    "/path/to/file" \
    > file_response.json # response is printed as json to stdout

Example using the Python library:

from datameta_client.files import stage

file_response = stage(
    path = "/path/to/file",
    name = "sample_01.fastq"
)
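The `checksum` field of the file response can be used to confirm that the server received the file intact. Its 32 hexadecimal characters suggest an MD5 digest, but that is an assumption; the helpers below are hypothetical and read the file in chunks so that large sequencing files need not fit into memory.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_matches(path: str, file_response: dict) -> bool:
    """Check the staged file's response against the local file (assumes MD5)."""
    return (
        file_response.get("content_uploaded") is True
        and md5_of_file(path) == file_response.get("checksum")
    )


# Usage after staging:
# upload_matches("/path/to/file", file_response)
```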

Step 2 - Staging a metadata record:

Parameters to set:

metadata_json: A single metadata record (see here). Lists of records are not allowed.

The response will contain information about the staged metadata record including its ID (see here for an explanation on ID usage).
E.g. it might look like this:

{
    "record": {
        "ID": "my_unique_id_01",
        "Date": "2021-02-23T00:00:00",
        "ZIPCode": "692",
        "RawFQ1": "sample_01.fastq",
        "AssemblyFA": "sample_01_assembly.fasta",
        "SeqPlatform": "ONT",
        "AmpKit": "COVIDSeq"
    },
    "id": {
        "uuid": "be0fe621-5cd9-4f2e-a901-1df2bbe255cb",
        "site": "CDM-91697562"
    },
    "group_id": {
        "uuid": "14bded6e-505e-4007-b4b9-eb43fb865a42",
        "site": "CDG-00268878"
    },
    "user_id": {
        "uuid": "5bea80d2-fd1e-4433-87a8-00cf6da76300",
        "site": "CDU-73698886"
    },
    "submission_id": null
}

Example using the CLI:

dmclient \
    metadatasets \
    stage \
    "/path/to/metadata.json" \
    > metadata_response.json # response is printed as json to stdout

Example using the Python library:

from datameta_client.metadatasets import stage

metadata_record = {
    "ID": "my_unique_id_01",
    "Date": "2021-02-23",
    "ZIPCode": "692",
    "RawFQ1": "sample_01.fastq",
    "AssemblyFA": "sample_01_assembly.fasta",
    "SeqPlatform": "ONT",
    "AmpKit": "COVIDSeq"
}

metadata_response = stage(
    metadata_json = metadata_record
)

Step 3 - Pre-validation:

A submission bundles a set of files with a set of metadata records. Before performing the actual submission, you can run the optional pre-validation step as a "dry run" to check whether the submission request is valid.

Parameters to set:

metadataset_ids: A list of metadataset IDs. Specify a single comma-separated string (when using the CLI) or a Python list of strings (when using the Python library).
file_ids: A list of file IDs. Specify a single comma-separated string (when using the CLI) or a Python list of strings (when using the Python library).
label: human-readable label/title describing the submission.

The metadataset and file IDs can be taken from the responses of steps 1 and 2. You may use either the site IDs or the UUIDs.

The response will be True if the request is valid and False if it is not.
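Because the CLI expects comma-separated strings while the Python functions expect lists, both forms can be derived from the responses of steps 1 and 2. The helper below is hypothetical; it reads the ID from the `id` object shown in the example responses, using the site ID by default or the UUID if requested.

```python
from typing import Iterable, List, Tuple


def collect_ids(responses: Iterable[dict], kind: str = "site") -> Tuple[List[str], str]:
    """Return the IDs from stage responses as a Python list and as a
    comma-separated string for the CLI. kind is "site" or "uuid"."""
    ids = [response["id"][kind] for response in responses]
    return ids, ",".join(ids)


file_responses = [
    {"id": {"uuid": "3cc89e9d-50f9-4bbc-9dfc-421836b21477", "site": "CDF-91697562"}},
    {"id": {"uuid": "3d3effdb-8181-4750-8b3b-38e0e5cf8b24", "site": "CDF-81097563"}},
]
file_ids, file_ids_cli = collect_ids(file_responses)
print(file_ids_cli)  # CDF-91697562,CDF-81097563
```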

Example using the CLI:

dmclient \
    submissions \
    prevalidate \
    --label "Patient Group X" \
    "CDM-91697562,CDM-07002994" \
    "CDF-91697562,CDF-81097563,CDF-16594243,CDF-34906130" \
    > prevalidation_response.txt # response (True/False) will be written to stdout

Example using the Python library:

from datameta_client.submissions import prevalidate

metadataset_ids = [
    "CDM-91697562",
    "CDM-07002994"
]

file_ids = [
    "CDF-91697562",
    "CDF-81097563",
    "CDF-16594243",
    "CDF-34906130"
]

valid = prevalidate(
    metadataset_ids = metadataset_ids,
    file_ids = file_ids,
    label = "Patient Group X"
)

Step 4 - Submission:

This step actually stores the submitted data and metadata on the server.

The parameters are identical to those of step 3:

metadataset_ids: A list of metadataset IDs. Specify a single comma-separated string (when using the CLI) or a Python list of strings (when using the Python library).
file_ids: A list of file IDs. Specify a single comma-separated string (when using the CLI) or a Python list of strings (when using the Python library).
label: human-readable label/title describing the submission.

The response will contain information about the submission including the ID of the submission itself as well as the IDs of the associated files and metadata records (see here for an explanation on ID usage).

E.g. it might look like this:

{
    "id": {
        "uuid": "ed233c00-26de-4a1a-8384-4c613d4bcb33",
        "site": "CDS-97478693"
    },
    "metadataset_ids": [
        {
            "uuid": "be0fe621-5cd9-4f2e-a901-1df2bbe255cb",
            "site": "CDM-91697562"
        }, 
        {
            "uuid": "065d31d3-bcec-4c7a-9d82-06ceb4bb8761",
            "site": "CDM-07002994"
        }
    ],
    "file_ids": [
        {
            "uuid": "be0fe621-5cd9-4f2e-a901-1df2bbe255cb",
            "site": "CDF-91697562"
        },
        {
            "uuid": "3d3effdb-8181-4750-8b3b-38e0e5cf8b24",
            "site": "CDF-81097563"
        },
        {
            "uuid": "7199115f-0ef6-4f0d-8b45-0e32317dadad",
            "site": "CDF-16594243"
        },
        {
            "uuid": "e56198f5-d6bc-4dc2-801f-53397c5f841b",
            "site": "CDF-34906130"
        }
    ],
    "label": "Patient Group X"
}

Example using the CLI:

dmclient \
    submissions \
    submit \
    --label "Patient Group X" \
    "CDM-91697562,CDM-07002994" \
    "CDF-91697562,CDF-81097563,CDF-16594243,CDF-34906130" \
    > submission_response.json # response is printed as json to stdout

Example using the Python library:

from datameta_client.submissions import submit

metadataset_ids = [
    "CDM-91697562",
    "CDM-07002994"
]

file_ids = [
    "CDF-91697562",
    "CDF-81097563",
    "CDF-16594243",
    "CDF-34906130"
]

submission_response = submit(
    metadataset_ids = metadataset_ids,
    file_ids = file_ids,
    label = "Patient Group X"
)