Data Landing Zone (DLZ)

version: 2022-06-09
update: 2024-04-04

Introduction

The Data Landing Zone (DLZ) API is a public API that allows you to safely push data to a specific myDRE Workspace. Data transfer authentication and authorisation are managed by two API keys: a subscription-specific key that is shared by all workspaces of an Azure subscription, and a workspace-specific API key that is generated for a single workspace.


Detailed description


Authentication

The Data Landing Zone (DLZ) API uses a two-tier API key authentication system for data transfer:

  • Subscription-specific Key: This key is shared across all workspaces within an Azure subscription and can be obtained from your Local Research Support team.

  • Workspace-specific API Key: This key is unique to a specific workspace and can be generated by an Accountable or Privileged Member in the Details section of that workspace.


Include both keys in either the request header or query string:

Ensure the security of your API keys. Do not share them publicly.
Header:
  1. Ocp-Apim-Subscription-Key: <your_subscription_key>

  2. Workspace-API-Key: <your_workspace_key>

Query string:
  1. subscription-key=<your_subscription_key>
  2. workspace-api-key=<your_workspace_key>
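For illustration, here is a minimal Python sketch that attaches both keys to a request, first via the header and then via the query string, using the authorization test endpoint described under Endpoints below. The base URL is a placeholder; use the azure-api.net endpoint provided for your environment.

    import requests

    BASE_URL = 'https://YOUR_DLZ_API_HOST'  # placeholder: the DLZ API endpoint for your environment

    # Option 1: pass both keys in the request header
    headers = {
        'Ocp-Apim-Subscription-Key': '<your_subscription_key>',
        'Workspace-API-Key': '<your_workspace_key>',
    }
    response = requests.get(f'{BASE_URL}/api/ping/authorized', headers=headers)
    print(response.status_code, response.text)

    # Option 2: pass both keys as query-string parameters
    params = {
        'subscription-key': '<your_subscription_key>',
        'workspace-api-key': '<your_workspace_key>',
    }
    response = requests.get(f'{BASE_URL}/api/ping/authorized', params=params)
    print(response.status_code, response.text)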

Endpoints

  • Testing and Authorization

    • GET /api/ping

      • Tests your connection to the API.

    • GET /api/ping/authorized

      • Tests your authorized connection to the API.

  • Workspace Upload Container Management

    • GET /api/workspace/{workspaceName}/files/containers

      • Description: Lists all upload containers (and their titles) for a given workspace.

      • Parameters:

        • workspaceName (string, required): The workspace name (e.g., dws-1-ACRONYM)

      • Responses:

        • 200 (OK): Success (returns a JSON object of containers)

        • 401 (Unauthorized): Invalid or missing API keys.

        • 404 (Not Found): Workspace not found.

        • 500 (Server Error): An internal server error occurred.
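As a sketch under the same assumptions (placeholder base URL, example workspace name), listing the upload containers of a workspace with Python could look like this:

    import requests

    BASE_URL = 'https://YOUR_DLZ_API_HOST'   # placeholder: the DLZ API endpoint for your environment
    workspace_name = 'dws-1-ACRONYM'         # example workspace name

    headers = {
        'Ocp-Apim-Subscription-Key': '<your_subscription_key>',
        'Workspace-API-Key': '<your_workspace_key>',
    }

    response = requests.get(
        f'{BASE_URL}/api/workspace/{workspace_name}/files/containers',
        headers=headers,
    )
    response.raise_for_status()  # raises on 401, 404 or 500 responses
    print(response.json())       # 200: JSON object listing the upload containers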



Example using Python

This example demonstrates how to automatically upload data to a myDRE Workspace using the DLZ API and Python; it covers both local-to-workspace and workspace-to-workspace scenarios, and it may differ from the setup you intend to use it for. Please note that configuring the Data Landing Zone API requires a technical understanding of APIs; if you are unsure how to configure it for your specific application, please contact your Local Research Support for help.
Please note that if you perform these actions from inside the virtual machine (e.g. a workspace-to-workspace transfer), you need the workspace role of Accountable or Privileged Member.

Prerequisites

  • Python installation (with administrator rights)

  • Azure-api.net allowlisted (if performing the action from inside the workspace)

  • API Keys:

    • Subscription key (tenant key; request this from your Local Research Support)

    • Workspace key (can be generated in your workspace in Details section)

    • Workspace name (can be found in Details section of your workspace)

  • Python packages: azure-storage-blob, azure-identity (install using pip install azure-storage-blob azure-identity)

Instructions

  • Create two Python files in the same directory:

    • main.py

    • DataUpload.py

  • In main.py, add your keys and workspace name:


  1. import json
    import DataUpload as DU
    import pandas as pd

    DU.workspace_key = 'YOUR_WORKSPACE_KEY'
    DU.tenant_key = 'YOUR_TENANT_KEY'
    DU.workspace_name = 'YOUR_WORKSPACE_NAME'

    # create test dictionary
    data = {'Name': ['John', 'Jane', 'Bob', 'Alice'],
            'Age': [25, 30, 35, 28],
            'City': ['New York', 'Los Angeles', 'Chicago', 'San Francisco']}

    # Create a DataFrame from the dictionary
    df = pd.DataFrame(data)


    # create new container
    container = DU.create_workspace_container(DU.workspace_name, title='My Data Upload')
    print(f"New container created at: {container}")

    # upload text from memory
    text_data = f'{container=} is uploaded'
    DU.upload_text_to_container(container, text_data)

    # upload file
    local_file_path = 'DataUpload.py'
    DU.upload_file_to_azure(container, local_file_path)

    #upload pandas dataframe
    DU.upload_dataframe_to_azure(container, df, filename='data2.csv')


    # commit container to workspace
    commit_result = DU.commit_workspace_container(DU.workspace_name, container)
    print(commit_result) 


    # get all containers
    all_containers = DU.list_workspace_containers(DU.workspace_name).json()

    # delete all containers
    for container_key in all_containers:
      try:
        response = DU.delete_workspace_container(DU.workspace_name, container_key['identifier'])
        print(f'{container_key["identifier"]}, {response}')
      except Exception as err:
        print(f'{container_key["identifier"]}, ERROR: {err}')

  • Place the API helper function code in DataUpload.py

  1. import requests
    import os
    import io
    from datetime import datetime
    from azure.storage.blob import ContainerClient, ContentSettings

    # workspace = {'name': '', 'apiKey': ''}
    # config ={'tenantkey': ''}
    BASE_URL = 'https://YOUR_DLZ_API_HOST'  # placeholder: set this to the DLZ API endpoint used in your environment
    workspace_key = ''
    tenant_key = ''
    workspace_name = ''


    def getHeaders():
        return {
            'Api-Key': workspace_key,
            'Ocp-Apim-Subscription-Key': tenant_key
        }
        
        # return {
        #     'Api-Key': workspace['apiKey'],
        #     'Ocp-Apim-Subscription-Key': config['tenantkey']
        # }


    def _make_request(method, endpoint, data=None):
        url = f"{BASE_URL}{endpoint}"
        response = requests.request(method, url, headers=getHeaders(), json=data) 
        response.raise_for_status()  # Raise exception for error codes
        return response


    def test_connection():
        """Tests basic connection to the API"""
        return _make_request('GET', '/api/ping')


    def list_workspace_containers(workspace_name):
        """Gets upload containers for a workspace"""
        endpoint = f"/api/workspace/{workspace_name}/files/containers"
        return _make_request('GET', endpoint)


    def create_workspace_container(workspace_name, title=None):
        """Creates an upload container for a specified workspace.

        Args:
            workspace_name: The name of the workspace (e.g., 'dws-1-ACRONYM').
            title: An optional title for the container.

        Returns:
            str: The location of the created container (from the response header).
        """
        timestamp = f'{datetime.now():%Y-%m-%d %H:%M:%S}'
        title = f'{timestamp} {title}' if title else timestamp

        endpoint = f"/api/workspace/{workspace_name}/files/containers"
        url = f"{BASE_URL}{endpoint}"

        # Add title as a query parameter if provided
        params = {'title': title} if title else None

        response = requests.post(url, headers=getHeaders(), params=params)
        response.raise_for_status()  

        return response.headers['Location']  # Assuming location is in the header


    def delete_workspace_container(workspace_name, container_location):
        """Deletes an upload container in the specified workspace.

        Args:
            workspace_name: The name of the workspace (e.g., 'dws-1-ACRONYM').
            container_location: The location/identifier of the container. 
        """
        if not container_location:
            return 'Create container first'

        # Extract the container identifier from the location
        container_identifier = container_location.rsplit('/', 1)[-1]

        endpoint = f"/api/workspace/{workspace_name}/files/containers/{container_identifier}"
        url = f"{BASE_URL}{endpoint}"

        response = requests.delete(url, headers=getHeaders())
        response.raise_for_status()  # Raise an exception for error codes
        return response



    def commit_workspace_container(workspace_name, container_location):
        """Commits changes to an upload container.

        Args:
            workspace_name: The name of the workspace.
            container_location: The location/identifier of the container.
        """

        # Extract the container identifier (adapt if necessary)
        container_identifier = container_location.rsplit('/', 1)[-1]

        endpoint = f"/api/workspace/{workspace_name}/files/containers/{container_identifier}"
        url = f"{BASE_URL}{endpoint}"

        response = requests.patch(url, headers=getHeaders())
        response.raise_for_status()  # Raise an exception for error codes

        # You might get some useful data in the response upon a successful commit
        return response


    def upload_text_to_container(container, text_data, filename='my_text.txt'):
        """Uploads text data to a container using container URL.

        Args:
            container: An Azure Blob Storage URL.
            text_data: The text content to be uploaded.
            filename: The desired filename within the container.
        """

        container_client = ContainerClient.from_container_url(container)
        blob_client = container_client.get_blob_client(filename)

        # Upload using the appropriate method on your container client 
        blob_client.upload_blob(
            text_data.encode('utf-8'),
            content_settings=ContentSettings(content_type='text/plain') 
        )


    def upload_dataframe_to_azure(container, df, filename='data.csv'):
        """Uploads text data to a container using container URL.

        Args:
            container: An Azure Blob Storage URL.
            df: pandas data frame
            filename: The desired filename within the container.
        """
        csv_buffer = io.StringIO()
        df.to_csv(csv_buffer, index=False)

        csv_data = csv_buffer.getvalue().encode('utf-8')  # Encoding for Blob Storage

        container_client = ContainerClient.from_container_url(container)
        blob_client = container_client.get_blob_client(filename)

        blob_client.upload_blob(csv_data, overwrite=True)


    def upload_file_to_azure(container, local_file_path):
        """Uploads text data to a container using container URL.

        Args:
            container: An Azure Blob Storage URL.
            df: pandas data frame
            filename: The desired filename within the container.
        """
        try:
            # Get the file name from the local file path
            file_name = os.path.basename(local_file_path)

            # Create a ContainerClient using the SAS URI
            container_client = ContainerClient.from_container_url(container)

            # Upload the file to the container
            with open(local_file_path, "rb") as file_to_upload:
                result = container_client.upload_blob(file_name, file_to_upload, overwrite=True)

            print(f"File '{file_name}' uploaded successfully. \n{result=} \nto {container=}")

        except Exception as ex:
            print(f"Exception: {ex}")
  1. Run: python main.py

Google Colab note

If using Google Colab, add this to the first cell:
  1. %%capture 

    !pip install azure-storage-blob azure-identity

Code Explanation (main.py)

The main.py file demonstrates how to use the functions in DataUpload.py to:

  • Create sample data

  • Create an upload container

  • Upload various data types to the container (text, a file, a DataFrame)

  • Commit the container to the workspace

  • List and delete containers (optional for cleanup)


Important Notes

  • Secure your API keys. Avoid hardcoding them in your scripts (for example, read them from environment variables, as sketched below) and do not share them publicly.

  • Adapt file paths and container identifiers if necessary, based on your specific setup.

  • Google Colab: The %%capture magic suppresses output from the installation cell, keeping your Colab notebook tidy.
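
As one possible approach (a minimal sketch; the environment variable names are arbitrary and only used here for illustration), the keys in main.py can be read from environment variables instead of being hardcoded:

    import os
    import DataUpload as DU

    # Read the keys from environment variables set outside the script,
    # e.g. export DLZ_WORKSPACE_KEY=... / export DLZ_TENANT_KEY=... (names are examples)
    DU.workspace_key = os.environ['DLZ_WORKSPACE_KEY']
    DU.tenant_key = os.environ['DLZ_TENANT_KEY']
    DU.workspace_name = os.environ['DLZ_WORKSPACE_NAME']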


Further reading

If you are configuring DLZ API for Android applications: https://github.com/Azure/azure-storage-android

    • Related Articles

      • Data Handling policy

      • Data Protection policy

      • Data Breach Procedure

      • Data - ownership, responsibility, and control

      • Data Protection Impact Assessment (DPIA)