# Python-Based API Client

We offer a Python-based client to talk to the RESTful API of a DataMeta server (e.g. the [CoGDat Portal](https://data.cogdat.de/)). It can be used either as a Python library or from the command line. Its primary use case is to automate the staging and submission of files and metadata.

If this client does not fit your needs, e.g. if you would like to interact with a DataMeta server through a non-Python-based application, you can also use the RESTful API of that server directly. Please see the [API documentation](api.html) for details.

## Installation

### Requirements:

The client is compatible with all major OS platforms (Linux, macOS, and Windows). However, the following requirements must be met:

* Python 3.6 or higher ([Installation instructions for all platforms](https://www.tutorialdocs.com/tutorial/python3/setup-guide.html))
* pip
* git (for the development version, [Installation instructions](https://git-scm.com/downloads))

### Installation from PyPI:

The latest release of the client can be installed from [PyPI](https://pypi.org/project/datameta-client/):

```bash
python3 -m pip install datameta_client
```

### Install the development version:

Alternatively, you can install the latest development version from GitHub:

```bash
python3 -m pip install git+https://github.com/ghga-de/datameta-client
```

### Check if the installation succeeded:

Please check whether the installation was successful by running:

```bash
dmclient --help
```

This should print a basic CLI description.

## Configure the client to connect to a DataMeta server

To connect the client to a DataMeta server (e.g. the [CoGDat Portal](https://data.cogdat.de/)), there are two important configuration parameters:

**1. The URL of the DataMeta server**

This is `https://data.cogdat.de/` in the case of CoGDat. Please make sure to specify the server's root URL. Do **not** include any sub-routes (e.g. the API route `https://data.cogdat.de/api/v0`).

**2. An API key/token obtained from the DataMeta server**

The easiest way to obtain an API key is through the web UI. Please see [this section](web_client.html#api-keys).

There are generally three ways to provide these parameters:

* using a configuration file in YAML format
* via environment variables
* as arguments to the function call or the command line

Please note: if a parameter is specified via multiple options, the option listed last above takes priority. That is, function or command-line arguments override environment variables, which in turn override the configuration file.

### Configure using a YAML configuration file

The required parameters can be stored in a YAML config file containing a `url` and a `token` property. The content might look like this:

```yaml
# please adapt values accordingly:
url: https://data.cogdat.de/
token: SYV40qEjbURZwiI1yXKP5GhDqdd3u0kpa7BUASULBd8QpWvR1kTXnIKaO7lQmgTj
```

If placed at the following location in your home directory, this file is picked up automatically:

* `~/.dmclient.yaml` for Linux or macOS
* `C:\Users\YourUser\.dmclient.yaml` for Windows

Alternatively, you can also provide the config file on the command line via the `--config` argument:

```bash
dmclient --config /path/to/your/config.yaml
```

### Configure using environment variables

You can also use the following environment variables to specify the required parameters:

* `DATAMETA_URL`
* `DATAMETA_TOKEN`

On Linux or macOS, you might set them as in the following example:

```bash
# please adapt values accordingly
export DATAMETA_URL=https://data.cogdat.de/
export DATAMETA_TOKEN=SYV40qEjbURZwiI1yXKP5GhDqdd3u0kpa7BUASULBd8QpWvR1kTXnIKaO7lQmgTj
```

To set environment variables on Windows, please see [this tutorial](https://www.computerhope.com/issues/ch000549.htm).

### Provide parameters as arguments to function call or command line

Alternatively, you can also specify the parameters directly when calling a Python function or using a CLI command.
For instance, for the file upload functionality, the CLI can be used like this:

```bash
dmclient \
    files \
    stage \
    --url "https://data.cogdat.de/" \
    --token "SYV40qEjbURZwiI1yXKP5GhDqdd3u0kpa7BUASULBd8QpWvR1kTXnIKaO7lQmgTj" \
    /path/to/the/file/to/upload
```

Within Python, the above example would look like this:

```python
from datameta_client.files import stage

stage(
    path="/path/to/the/file/to/upload",
    url="https://data.cogdat.de/",
    token="SYV40qEjbURZwiI1yXKP5GhDqdd3u0kpa7BUASULBd8QpWvR1kTXnIKaO7lQmgTj",
)
```

## Upload and submit data

The data submission process consists of three steps:

1. uploading/staging files
2. staging metadata
3. submitting a set of files and the associated metadata records

For a more detailed explanation, please refer to the [General Concepts](about_datameta.html#general-concepts) section.

### Prepare metadataset records:

For a detailed discussion of all mandatory and optional metadata fields, please refer to [this section](metadata.html).

To format metadata records for use with the Python client, you have three choices:

**1. JSON file:** Store metadata records in a JSON file. For a single metadata record, the content might look like this:

```json
{
    "ID": "my_unique_id_01",
    "Date": "2021-02-23",
    "ZIPCode": "692",
    "RawFQ1": "sample_01.fastq",
    "AssemblyFA": "sample_01_assembly.fasta",
    "SeqPlatform": "ONT",
    "AmpKit": "COVIDSeq"
}
```

You may also provide a list of metadata records, e.g.:

```json
[
    {
        "ID": "my_unique_id_01",
        "Date": "2021-02-23",
        "ZIPCode": "692",
        "RawFQ1": "sample_01.fastq",
        "AssemblyFA": "sample_01_assembly.fasta",
        "SeqPlatform": "ONT",
        "AmpKit": "COVIDSeq"
    },
    {
        "ID": "my_unique_id_02",
        "Date": "2021-02-23",
        "ZIPCode": "692",
        "RawFQ1": "sample_02.fastq",
        "AssemblyFA": "sample_02_assembly.fasta",
        "SeqPlatform": "ONT",
        "AmpKit": "COVIDSeq"
    }
]
```

**2. JSON string:** Instead of storing it in a file, you may also provide the JSON-formatted metadata as a string.
For the above single metadata record example, this would look like this:

```
"{\"ID\": \"my_unique_id_01\", \"Date\": \"2021-02-23\", \"ZIPCode\": \"692\", \"RawFQ1\": \"sample_01.fastq\", \"AssemblyFA\": \"sample_01_assembly.fasta\", \"SeqPlatform\": \"ONT\", \"AmpKit\": \"COVIDSeq\"}"
```

**3. Python dictionary:** When using the client as a Python library, you may provide a single metadata record as a Python dictionary or multiple records as a list of Python dictionaries. The syntax is identical to the JSON file example above.

### Shortcut - stage and submit in one go:

The Python client provides a "shortcut" functionality that stages and submits files and metadata records in one go. This is the recommended procedure for most use cases. However, if you need more fine-grained control over the staging/submission process, please refer to the [following section](#staging-and-submission---step-by-step).

The following parameters are required:

- **metadatasets_json**: a single metadata record or a list of metadata records (see [here](#prepare-metadataset-records))
- **files_dir**: the path to the directory containing the files to upload. Please note: only files referenced by file name in the metadata records will be considered.

The following parameter is optional:

- **label**: a human-readable label/title describing the submission

The response will contain information about the submission, including the ID of the submission itself as well as the IDs of the associated files and metadata records (see [here](api.html#identifiers) for an explanation of ID usage). For example,
it might look like this:

```json
{
    "id": {
        "uuid": "ed233c00-26de-4a1a-8384-4c613d4bcb33",
        "site": "CDS-97478693"
    },
    "metadataset_ids": [
        {
            "uuid": "be0fe621-5cd9-4f2e-a901-1df2bbe255cb",
            "site": "CDM-91697562"
        },
        {
            "uuid": "065d31d3-bcec-4c7a-9d82-06ceb4bb8761",
            "site": "CDM-07002994"
        }
    ],
    "file_ids": [
        {
            "uuid": "be0fe621-5cd9-4f2e-a901-1df2bbe255cb",
            "site": "CDF-91697562"
        },
        {
            "uuid": "3d3effdb-8181-4750-8b3b-38e0e5cf8b24",
            "site": "CDF-81097563"
        },
        {
            "uuid": "7199115f-0ef6-4f0d-8b45-0e32317dadad",
            "site": "CDF-16594243"
        },
        {
            "uuid": "e56198f5-d6bc-4dc2-801f-53397c5f841b",
            "site": "CDF-34906130"
        }
    ],
    "label": "Patient Group X"
}
```

#### Example using the CLI:

```bash
# The metadata can be provided as a path to a JSON file or as a JSON string.
dmclient \
    shortcuts \
    stage-and-submit \
    --label "Patient Group X" \
    "/path/to/metadata.json" \
    "/path/to/files/dir" \
    > submission_response.json # response is printed as json to stdout
```

#### Example using the Python library:

```python
from datameta_client.shortcuts import stage_and_submit

metadata_records = [
    {
        "ID": "my_unique_id_01",
        "Date": "2021-02-23",
        "ZIPCode": "692",
        "RawFQ1": "sample_01.fastq",
        "AssemblyFA": "sample_01_assembly.fasta",
        "SeqPlatform": "ONT",
        "AmpKit": "COVIDSeq"
    },
    {
        "ID": "my_unique_id_02",
        "Date": "2021-02-23",
        "ZIPCode": "692",
        "RawFQ1": "sample_02.fastq",
        "AssemblyFA": "sample_02_assembly.fasta",
        "SeqPlatform": "ONT",
        "AmpKit": "COVIDSeq"
    }
]

submission_response = stage_and_submit(
    # can also be provided as a JSON string or a path to a JSON file
    metadatasets_json=metadata_records,
    files_dir="/path/to/files/dir",
    label="Patient Group X",
)
```

### Staging and submission - step by step:

If you need more control over the submission process, you can perform all steps individually. These steps include:

1. staging a file (repeat for each file)
2. staging the corresponding metadata records (repeat for each record)
3. (optional) pre-validating a set of files and associated metadata records for submission
4. submitting a set of files and associated metadata records

#### Step 1 - Staging a file:

Parameters to set:

- **path**: Path to the file to upload. (required)
- **name**: File name to be used after upload. By default, the original file name is used. (optional)

The response will contain information about the uploaded file, including its ID (see [here](api.html#identifiers) for an explanation of ID usage). For example, it might look like this:

```json
{
    "id": {
        "uuid": "3cc89e9d-50f9-4bbc-9dfc-421836b21477",
        "site": "CDF-91697562"
    },
    "name": "test_file_1.txt",
    "content_uploaded": true,
    "checksum": "5a105e8b9d40e1329780d62ea2265d8a",
    "user_id": {
        "uuid": "5bea80d2-fd1e-4433-87a8-00cf6da76300",
        "site": "CDU-73698886"
    },
    "group_id": {
        "uuid": "14bded6e-505e-4007-b4b9-eb43fb865a42",
        "site": "CDG-00268878"
    },
    "expires": "2021-03-26T11:18:31.592234",
    "filesize": 5
}
```

##### Example using the CLI:

```bash
dmclient \
    files \
    stage \
    --name "sample_01.fastq" \
    "/path/to/file" \
    > file_response.json # response is printed as json to stdout
```

##### Example using the Python library:

```python
from datameta_client.files import stage

file_response = stage(
    path="/path/to/file",
    name="sample_01.fastq",
)
```

#### Step 2 - Staging a metadata record:

Parameters to set:

- **metadata_json**: A single metadata record (see [here](#prepare-metadataset-records)). Lists of records are not allowed.

The response will contain information about the staged metadata record, including its ID (see [here](api.html#identifiers) for an explanation of ID usage). For example,
it might look like this:

```json
{
    "record": {
        "ID": "my_unique_id_01",
        "Date": "2021-02-23T00:00:00",
        "ZIPCode": "692",
        "RawFQ1": "sample_01.fastq",
        "AssemblyFA": "sample_01_assembly.fasta",
        "SeqPlatform": "ONT",
        "AmpKit": "COVIDSeq"
    },
    "id": {
        "uuid": "be0fe621-5cd9-4f2e-a901-1df2bbe255cb",
        "site": "CDM-91697562"
    },
    "group_id": {
        "uuid": "14bded6e-505e-4007-b4b9-eb43fb865a42",
        "site": "CDG-00268878"
    },
    "user_id": {
        "uuid": "5bea80d2-fd1e-4433-87a8-00cf6da76300",
        "site": "CDU-73698886"
    },
    "submission_id": null
}
```

##### Example using the CLI:

```bash
dmclient \
    metadatasets \
    stage \
    "/path/to/metadata.json" \
    > metadata_response.json # response is printed as json to stdout
```

##### Example using the Python library:

```python
from datameta_client.metadatasets import stage

metadata_record = {
    "ID": "my_unique_id_01",
    "Date": "2021-02-23",
    "ZIPCode": "692",
    "RawFQ1": "sample_01.fastq",
    "AssemblyFA": "sample_01_assembly.fasta",
    "SeqPlatform": "ONT",
    "AmpKit": "COVIDSeq"
}

metadata_response = stage(
    metadata_json=metadata_record
)
```

#### Step 3 - Pre-validation

For a successful submission, you have to bundle a set of files and a set of metadata records. Before performing the actual submission, you may carry out the optional pre-validation step as a "dry run" to check the validity of a submission request.

Parameters to set:

- **metadataset_ids**: A list of metadataset IDs. Specify a single comma-separated string (when using the CLI) or a Python list of strings (when using the Python library).
- **file_ids**: A list of file IDs. Specify a single comma-separated string (when using the CLI) or a Python list of strings (when using the Python library).
- **label**: human-readable label/title describing the submission.

The metadataset and file IDs can be taken from the responses returned in [step 1](#step-1---staging-a-file) and [step 2](#step-2---staging-a-metadata-record). You may use either the site IDs or the UUIDs.
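
If you use the Python library for steps 1 and 2, these ID lists can also be assembled programmatically instead of being copied by hand. The following is a minimal sketch, assuming each staging response is a Python dictionary shaped like the JSON examples in steps 1 and 2; `collect_ids` is a hypothetical helper and not part of `datameta_client`. It returns a Python list for the library, which can be joined into the comma-separated string the CLI expects.

```python
from typing import Dict, List


def collect_ids(responses: List[Dict], kind: str = "site") -> List[str]:
    """Hypothetical helper: extract one ID per staging response.

    Assumes each response contains an "id" entry of the form
    {"uuid": ..., "site": ...}, as in the JSON examples above.
    `kind` selects either the site ID or the UUID.
    """
    if kind not in ("site", "uuid"):
        raise ValueError("kind must be 'site' or 'uuid'")
    return [response["id"][kind] for response in responses]


# Trimmed-down responses mirroring the documented examples
file_responses = [
    {"id": {"uuid": "3cc89e9d-50f9-4bbc-9dfc-421836b21477", "site": "CDF-91697562"}},
    {"id": {"uuid": "3d3effdb-8181-4750-8b3b-38e0e5cf8b24", "site": "CDF-81097563"}},
]
metadata_responses = [
    {"id": {"uuid": "be0fe621-5cd9-4f2e-a901-1df2bbe255cb", "site": "CDM-91697562"}},
]

file_ids = collect_ids(file_responses)              # Python list for the library
metadataset_ids = collect_ids(metadata_responses)
cli_file_ids = ",".join(file_ids)                   # comma-separated string for the CLI
```

Passing `kind="uuid"` returns the UUIDs instead, since either form is accepted.
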
The response will be either `True` if the request is valid or `False` if it is not.

##### Example using the CLI:

```bash
dmclient \
    submissions \
    prevalidate \
    --label "Patient Group X" \
    "CDM-91697562,CDM-07002994" \
    "CDF-91697562,CDF-81097563,CDF-16594243,CDF-34906130" \
    > prevalidation_response.txt # response (True/False) will be written to stdout
```

##### Example using the Python library:

```python
from datameta_client.submissions import prevalidate

metadataset_ids = [
    "CDM-91697562",
    "CDM-07002994"
]

file_ids = [
    "CDF-91697562",
    "CDF-81097563",
    "CDF-16594243",
    "CDF-34906130"
]

valid = prevalidate(
    metadataset_ids=metadataset_ids,
    file_ids=file_ids,
    label="Patient Group X"
)
```

#### Step 4 - Submission

This step actually stores your submitted data and metadata on the server.

The parameters are identical to those of [step 3](#step-3---pre-validation):

- **metadataset_ids**: A list of metadataset IDs. Specify a single comma-separated string (when using the CLI) or a Python list of strings (when using the Python library).
- **file_ids**: A list of file IDs. Specify a single comma-separated string (when using the CLI) or a Python list of strings (when using the Python library).
- **label**: human-readable label/title describing the submission.

The response will contain information about the submission, including the ID of the submission itself as well as the IDs of the associated files and metadata records (see [here](api.html#identifiers) for an explanation of ID usage). For example,
it might look like this:

```json
{
    "id": {
        "uuid": "ed233c00-26de-4a1a-8384-4c613d4bcb33",
        "site": "CDS-97478693"
    },
    "metadataset_ids": [
        {
            "uuid": "be0fe621-5cd9-4f2e-a901-1df2bbe255cb",
            "site": "CDM-91697562"
        },
        {
            "uuid": "065d31d3-bcec-4c7a-9d82-06ceb4bb8761",
            "site": "CDM-07002994"
        }
    ],
    "file_ids": [
        {
            "uuid": "be0fe621-5cd9-4f2e-a901-1df2bbe255cb",
            "site": "CDF-91697562"
        },
        {
            "uuid": "3d3effdb-8181-4750-8b3b-38e0e5cf8b24",
            "site": "CDF-81097563"
        },
        {
            "uuid": "7199115f-0ef6-4f0d-8b45-0e32317dadad",
            "site": "CDF-16594243"
        },
        {
            "uuid": "e56198f5-d6bc-4dc2-801f-53397c5f841b",
            "site": "CDF-34906130"
        }
    ],
    "label": "Patient Group X"
}
```

##### Example using the CLI:

```bash
dmclient \
    submissions \
    submit \
    --label "Patient Group X" \
    "CDM-91697562,CDM-07002994" \
    "CDF-91697562,CDF-81097563,CDF-16594243,CDF-34906130" \
    > submission_response.json # response is printed as json to stdout
```

##### Example using the Python library:

```python
from datameta_client.submissions import submit

metadataset_ids = [
    "CDM-91697562",
    "CDM-07002994"
]

file_ids = [
    "CDF-91697562",
    "CDF-81097563",
    "CDF-16594243",
    "CDF-34906130"
]

submission_response = submit(
    metadataset_ids=metadataset_ids,
    file_ids=file_ids,
    label="Patient Group X"
)
```
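
The four steps can also be chained in a single script. The sketch below is one possible way to do so, not a function provided by the client: `run_stepwise_submission` is a hypothetical name, and it assumes that the staging and submission functions return dictionaries shaped like the JSON responses shown above and that `prevalidate` returns a boolean. The client functions are passed in as arguments, so the logic can be tried out with stand-ins before wiring it to a live server.

```python
from typing import Callable, Dict, List


def run_stepwise_submission(
    file_paths: List[str],
    metadata_records: List[Dict],
    label: str,
    stage_file: Callable[..., Dict],
    stage_metadata: Callable[..., Dict],
    prevalidate: Callable[..., bool],
    submit: Callable[..., Dict],
) -> Dict:
    """Hypothetical wrapper chaining steps 1 to 4 of the step-by-step procedure."""
    # Step 1: stage each file and keep its site ID.
    file_ids = [stage_file(path=path)["id"]["site"] for path in file_paths]
    # Step 2: stage each metadata record individually (lists are not allowed).
    metadataset_ids = [
        stage_metadata(metadata_json=record)["id"]["site"]
        for record in metadata_records
    ]
    # Step 3: dry run; stop before submitting if the request is invalid.
    if not prevalidate(
        metadataset_ids=metadataset_ids, file_ids=file_ids, label=label
    ):
        raise RuntimeError("Pre-validation failed; nothing was submitted.")
    # Step 4: the actual submission.
    return submit(metadataset_ids=metadataset_ids, file_ids=file_ids, label=label)
```

With the client installed, the real functions would be passed in, e.g. `stage_file=datameta_client.files.stage`, `stage_metadata=datameta_client.metadatasets.stage`, `prevalidate=datameta_client.submissions.prevalidate` and `submit=datameta_client.submissions.submit`. Stopping on a failed pre-validation mirrors the purpose of step 3 as a dry run.
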