Direct upload of large datasets to blob

Some datasets do not upload well using the upload mechanisms described in Uploading your data and Blob storage. This is mostly the case with large datasets, both those containing many files and those containing large files. At Radboudumc, we offer the option to directly upload your large datasets to blob storage within your workspace.

Warning: when data is uploaded via this route, the upload itself is not logged, meaning that there will be no trace of the activity detailing the timing of the upload and who initiated it.
We want everyone to use the technical solutions provided by anDREa as much as possible. Direct upload will only be facilitated if the existing upload mechanisms of the DRE do not allow you to upload your dataset in a reasonable way.

Requirements:
- Your dataset
- A workspace with a blob container (see Blob storage - Creating a blob container)
- Privileged or Accountable access to said workspace
- The IP address of the PC from which the upload will be initiated
- Azure Storage Explorer installed on that PC

Requesting upload support

For direct upload to blob, you need full access to the storage account of your workspace. Direct access can be granted by your support person, and will be made possible only from a specified IP address.

To request access, submit a 'Request a non-standard feature' ticket and mention the following information:
- The date and time you plan to start your upload, at least 1 workday in advance
- The workspace you want to upload your data to (dws-xxxx-aaaaaa)
- The name of your blob container (if there is more than one in your workspace)
- The IP address of the PC you want to upload your data from
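
The details in the list above can be sanity-checked before the ticket is submitted. The following is a minimal Python sketch, not an official tool. The helper name `validate_request` is hypothetical, and the workspace pattern is an assumption inferred from the placeholder `dws-xxxx-aaaaaa` (digits in the middle, letters or digits at the end). The check for private addresses assumes that support needs the public IP address of your PC.

```python
import ipaddress
import re

# Assumed pattern, inferred from the placeholder "dws-xxxx-aaaaaa".
WORKSPACE_PATTERN = re.compile(r"^dws-\d+-[a-z0-9]+$", re.IGNORECASE)


def validate_request(workspace: str, ip: str) -> list[str]:
    """Return a list of problems; an empty list means the input looks plausible."""
    problems = []
    if not WORKSPACE_PATTERN.match(workspace):
        problems.append(
            f"workspace name {workspace!r} does not look like dws-xxxx-aaaaaa"
        )
    try:
        address = ipaddress.ip_address(ip)
    except ValueError:
        problems.append(f"{ip!r} is not a valid IP address")
    else:
        # Assumption: access is granted for a public address, so an address
        # from a private range (e.g. 192.168.x.x) is probably not what is needed.
        if address.is_private:
            problems.append(f"{ip} is a private address; check your public IP")
    return problems


if __name__ == "__main__":
    for problem in validate_request("dws-1234-acronym", "192.168.1.10"):
        print("Check before submitting:", problem)
```

The same check can be repeated on the day of the upload to confirm that the IP address still matches the one in the request.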

You will receive confirmation once the ticket has been seen by support, and will also be notified when direct access should be possible.

Uploading your data

Once you have been granted direct access to your storage account, there are two ways to upload your data directly to your blob container:

1. Azure Storage Explorer

Normally, when you upload data using Storage Explorer, you make use of a temporary blob container that is not connected to your workspace. With your request for upload support, you can ask for a SAS URL that can be used to directly access the blob container. This SAS URL grants access for a limited amount of time, which will be set to 24 hours by default. If you require a longer upload time, please specify this in the request.

Steps:

a) Open Azure Storage Explorer and click on the power plug icon (red circle), then choose the option Blob container (blue circle).


b) Select the Shared Access Signature (SAS) option and click on Next.

c) Paste the SAS URL you received in the bottom field, and check the container name that appears in the top box. Then click on Next and then Connect.

d) You should now see the contents of your blob container. Click on Upload, then choose the option file or folder in the next screen.

e) Browse for your file or folder by clicking on the three dots. Choose the access tier (Hot, Cool, or Archive; the default is Hot unless support has changed this for you). If applicable, specify a new 'subfolder' where the file or folder should be uploaded to. Please be aware that blob storage does not support real folder structures; it merely imitates them. Click on Upload.

You will see the upload progress at the bottom of the screen. Once Azure Storage Explorer says the upload has finished, make sure to double-check the contents of the container.
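
For datasets with many files, double-checking the container by eye is error-prone. One possible approach, sketched below under stated assumptions, is to build a list of local files with their sizes and compare it with the names and sizes shown for the container. The function names `local_manifest` and `compare` are hypothetical, and the `remote` dictionary has to be filled in by you, for example from the listing in Azure Storage Explorer.

```python
from pathlib import Path


def local_manifest(root: str) -> dict[str, int]:
    """Map each file path relative to root (using '/' separators) to its size in bytes."""
    base = Path(root)
    return {
        path.relative_to(base).as_posix(): path.stat().st_size
        for path in base.rglob("*")
        if path.is_file()
    }


def compare(local: dict[str, int], remote: dict[str, int]) -> dict[str, list[str]]:
    """Report local files that are missing remotely or differ in size."""
    missing = sorted(name for name in local if name not in remote)
    size_mismatch = sorted(
        name for name in local if name in remote and local[name] != remote[name]
    )
    return {"missing": missing, "size_mismatch": size_mismatch}


if __name__ == "__main__":
    # Illustrative values only; replace with your own manifest and listing.
    local = {"scans/a.dcm": 1024, "scans/b.dcm": 2048, "readme.txt": 10}
    remote = {"scans/a.dcm": 1024, "scans/b.dcm": 2000}
    print(compare(local, remote))
```

Note that blob names include the 'subfolder' prefix chosen during upload, so the remote names may need that prefix removed before comparing. Equal sizes do not prove that the contents are identical; this only catches missing or truncated files.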

2. Azure portal

If Azure Storage Explorer cannot be used, it is possible to upload your data directly through the Azure portal.
The following steps should be followed carefully. Straying from the path may result in permanent loss of data or may otherwise negatively affect your workspace.

Steps:

a) Go to https://portal.azure.com and log in with your myDRE credentials.

b) In the search bar on the website, type in the name of your workspace without hyphens, followed by 'data' (e.g., dws1234acronymdata). The search results will appear in a drop-down box. Find the storage account resource that exactly matches your search term and click on it.
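
The naming rule in step b) can be written as a small Python sketch. The function name `storage_account_name` is hypothetical, and the input `dws-1234-acronym` is assumed to be the workspace behind the example search term above.

```python
def storage_account_name(workspace: str) -> str:
    """Build the search term: workspace name without hyphens, followed by 'data'."""
    # Azure storage account names contain only lowercase letters and digits,
    # so the result is lowercased as well.
    return workspace.replace("-", "").lower() + "data"


if __name__ == "__main__":
    print(storage_account_name("dws-1234-acronym"))
```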


c) On the left-hand side, click on Containers (circled in red), and then click on the blob container where you would like to upload your data.


d) Select the files/folders you would like to upload, and when applicable fill in any additional information under Advanced. Finally, click on Upload and wait until your files have finished uploading.


Tips

- If you are uploading data from an external hard disk, the process will be slow. Try connecting it to a USB 3.0, USB-C or Thunderbolt 3 port, if available.
- Check the expiration date/time of your SAS URL, and adjust if necessary.
- On the day of your planned upload, check whether your IP address still matches the IP address filled in on your request. If not, let Radboudumc support know.
