Stanford provides unlimited* Google Drive space for all labs. There are some restrictions: a daily upload limit of 750 GB per person (a single file can be up to 5 TB, which makes "tar"ing a folder good practice), a limit of 2 files per second (so transfers of many small files are very slow), and a cap on the number of files each teamdrive can store. Multiple teamdrives can be set up on Google Drive to work around the file-count limit. Several teamdrives are set up for Yiorgo's lab, named "Skiniotis Lab", "SkiniotisLab2", etc. You can use the rclone module on Sherlock to copy your files to these teamdrives.
- In order to transfer data using rclone, you should first configure rclone for your account on Sherlock to link rclone to these teamdrives on Google Drive.
- I prepared several template submission scripts to tar, split and transfer data, and also to make a list of files. You should edit those submission scripts and submit them to transfer your assigned folders.
- Transferring and tarring are long processes, so submitting these jobs to the queue is better than running interactive jobs, which require your open terminal to keep a constant internet connection.
- Please follow the instructions below to configure rclone and transfer data.
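The tar-and-split idea mentioned above can be sketched as follows. This is only an illustration and not the lab's template script: the folder name `my_dataset`, the scratch directory, and the 1 MiB chunk size are placeholders chosen so the example runs anywhere; for real data you would use a much larger chunk size.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical demo data in a scratch directory (replace with your real folder).
work=$(mktemp -d)
mkdir -p "$work/my_dataset/sub"
head -c 3000000 /dev/urandom > "$work/my_dataset/sub/big.bin"
echo "hello" > "$work/my_dataset/note.txt"

# 1) Pack the folder into a single tar file. One large file uploads far more
#    efficiently than many small files under the 2 files/second limit.
tar -cf "$work/my_dataset.tar" -C "$work" my_dataset

# 2) Split the tar file into fixed-size chunks (1048576 bytes = 1 MiB here).
split -b 1048576 "$work/my_dataset.tar" "$work/my_dataset.tar.part_"
part_count=$(ls "$work"/my_dataset.tar.part_* | wc -l | tr -d ' ')
echo "number of parts: $part_count"

# 3) To restore, concatenate the parts in name order and compare checksums.
cat "$work"/my_dataset.tar.part_* > "$work/restored.tar"
orig_sum=$(cksum < "$work/my_dataset.tar")
restored_sum=$(cksum < "$work/restored.tar")
if [ "$orig_sum" = "$restored_sum" ]; then
    echo "restored tar matches the original"
fi

# Clean up the scratch directory.
rm -rf "$work"
```

Splitting is only needed when a single tar file would be too large to handle comfortably; otherwise step 1 alone is enough.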
How to configure Rclone for your Sherlock account
Please follow the following steps for each teamdrive you would like to use. More info in this link. https://uit.stanford.edu/service/gsuite/drive
PS: Commands are denoted with the "$" sign, and you should not include the "$" sign when you type these commands.
"#" sign denotes my comments and you should not include anything after this sign when you type these commands.
First, log in to Sherlock by opening a terminal on your local computer and typing the following:
$ ssh -Y email@example.com #do not forget to change your_sunet_id
#Follow the instructions to log in
$ ml system rclone
$ rclone config
type --> n (to create a new remote config file)
type --> teamremote (create a name for the team drive on Google Drive for your rclone configuration on Sherlock, you need a new name for each teamdrive)
type --> drive (to select for Google Drive)
client_id> (leave empty and press enter)
client_secret> (leave empty and press enter)
scope> (leave empty and press enter)
root_folder_id> (leave empty and press enter)
service_account_file> (leave empty and press enter)
Edit advanced config? --> n
Use auto config? --> n
#Copy the given link and open it in a browser, then select your Stanford account (not your personal Gmail account) to give permission to Google Drive.
#Copy the verification code and enter it on the terminal
Configure this as a team drive --> y
Enter a Team Drive ID --> 1 (There are several Team Drives on Google Drive, ask Mike which one to choose)
y/e/d --> y (yes to confirm config details)
e/n/d/r/c/s/q --> q (quit to finalize configuration)
You have now configured rclone for your Sherlock account: the remote named "teamremote" links one of the teamdrives on our Google Drive account to your Sherlock account.
When a teamdrive on Google Drive is full, run the same configuration again to link another teamdrive on our Google Drive to your Sherlock account (perhaps with the name "teamremote2"). Alternatively, you can transfer all the files in the full teamdrive to another teamdrive.
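After the configuration, it can be useful to confirm that the remote really exists before submitting long jobs. The sketch below assumes the remote name "teamremote" used above; `check_remote` is a hypothetical helper name, while `rclone listremotes` and `rclone lsd` are standard rclone subcommands.

```shell
#!/usr/bin/env bash
# Name chosen during "rclone config" (this guide uses "teamremote").
remote="teamremote"

check_remote() {
    # Report whether the given rclone remote exists in your configuration.
    local name="$1"
    if ! command -v rclone >/dev/null 2>&1; then
        echo "rclone not found; load it first with: ml system rclone"
        return 2
    fi
    # "rclone listremotes" prints one configured remote per line, e.g. "teamremote:".
    if rclone listremotes | grep -qx "${name}:"; then
        echo "remote '${name}' is configured"
        # List the top-level folders of the teamdrive as a connectivity check.
        rclone lsd "${name}:"
    else
        echo "remote '${name}' is not configured; run 'rclone config' again"
        return 1
    fi
}

check_remote "$remote" || true
```

If you later add a second remote such as "teamremote2", the same check can be run with that name.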
How to transfer AN assigned folder from Sherlock to Google Drive
First create your submission script on Sherlock to transfer data
# Open a Sherlock terminal and change directory to rclone on Yiorgo's home directory
$ cd /share/PI/yiorgo/rclone
#Create a folder with your name
$ mkdir your_folder
#Copy relevant templates from templates folder
$ cp templates/*.sbatch your_folder
You can either directly transfer a folder or you can tar a folder first and transfer the tar file.
Each method has its own advantages.
This needs to be updated
# Edit the following inputs in your submission script
---------- Please follow the instructions below for every new assignment ----------
# Edit the folder path in your submission script. If you would like to transfer a sub-folder, you should maintain the sub-folder structure on the teamremote; please check the template submission script.
# Edit the name of the "teamremote" if necessary so that it matches the name you defined on Sherlock for the remote Google Drive
# Run the following command to submit your transfer job (do not forget to replace "yourname" with your name)
$ cd /share/PI/yiorgo/rclone
$ sbatch yourname_rclone.sbatch
# Check the name of the log file that was created by typing the "ls log-files-rclone" command. You can then follow the transfer by typing the command below
$ tail -n 100 log-files-rclone/rcl-yourname_$jobid.log
# If you hit the number-of-files limit, you do not need to stop the job; you can just log in to Google Drive with your Stanford email address and move some folders to another teamdrive.
# Each transfer job will be terminated after 7 days, or when the transfer ends or crashes. Resubmit your job with the necessary changes accordingly (e.g. a new folder assignment).
# After all the file transfers finish for a given assignment, it is better to submit the transfer again to see all the errors for the remaining files.
# You can also use the command below to delete all the empty folders and check the folders/files that are not transferred yet.
$ rclone rmdirs path
# Please delete your log-files once in a while
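For orientation, a submission script of the kind edited above might look roughly like the sketch below. The lab's actual templates are not reproduced here, so everything in it is an assumption: the source and destination paths are placeholders, the 7-day time limit mirrors the job lifetime mentioned above, and the log name follows the rcl-yourname_$jobid.log pattern. The sketch writes the script to a scratch directory and only checks its syntax; it does not submit anything.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical location and name; not the lab's real template.
outdir=$(mktemp -d)
script="$outdir/yourname_rclone.sbatch"

cat > "$script" <<'EOF'
#!/bin/bash
#SBATCH --job-name=rcl-yourname
#SBATCH --time=7-00:00:00
#SBATCH --output=log-files-rclone/rcl-yourname_%j.log

# Load rclone on Sherlock.
ml system rclone

SRC=/path/to/assigned_folder        # placeholder: the folder assigned to you
DEST=teamremote:assigned_folder     # remote name from "rclone config" plus target path

# Copy SRC to the teamdrive; -v writes per-file messages into the job log.
rclone copy -v "$SRC" "$DEST"
EOF

# Syntax check only.
bash -n "$script" && echo "wrote $script"
```

Slurm replaces %j with the job ID and does not create the log directory for you, so log-files-rclone must exist before the job starts.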
Google drive limits
- Upload limit - 750 GB of data per day & 2 files per second.
- Single file size limit - 5 TB in size (single file transfer can exceed 750GB/day limit, so single tar file can be beneficial).
- If a single file exceeds the 750 GB daily limit, that file will upload. Subsequent files will not upload until the daily upload limit resets the next day.
- Number of files limit - 400,000 files and folders per teamdrive (you will get an error if this number is exceeded, and you should then create a new teamdrive).
- Subfolder limit - nest up to 20 subfolders
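Before transferring a folder without tarring it, you can estimate whether it fits within the file-count and nesting limits listed above. The following is a minimal sketch: `count_entries` and `max_depth` are hypothetical helper names, the demo folder is created only so the example is self-contained, and how exactly Google counts nesting levels is an assumption.

```shell
#!/usr/bin/env bash
set -euo pipefail

count_entries() {
    # Count files and folders under a directory, excluding the directory itself.
    find "$1" -mindepth 1 | wc -l | tr -d ' '
}

max_depth() {
    # Deepest sub-folder nesting level below the given directory.
    ( cd "$1" && find . -mindepth 1 -type d \
        | awk -F/ '{ d = NF - 1; if (d > m) m = d } END { print m + 0 }' )
}

# Small demo tree: 3 nested folders and 2 files (replace with your real folder).
demo=$(mktemp -d)
mkdir -p "$demo/a/b/c"
touch "$demo/a/f1" "$demo/a/b/c/f2"

entries=$(count_entries "$demo")
depth=$(max_depth "$demo")
echo "entries: $entries, deepest sub-folder level: $depth"

if [ "$entries" -gt 400000 ]; then
    echo "warning: more than 400,000 files and folders; tar the folder or use another teamdrive"
fi
if [ "$depth" -gt 20 ]; then
    echo "warning: sub-folders nested deeper than 20 levels"
fi
```

The entry count only covers the folder you are about to upload; whatever is already stored on the teamdrive also counts toward the limit.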
Some Other notes
# If you are transferring any file larger than 750 GB, the transfer may fail due to the daily upload limit. You can add the --bwlimit=8M flag to the rclone command to avoid this potential error, but using this flag is not a good idea in general.
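The reasoning behind --bwlimit=8M can be checked with a little arithmetic. The sketch below assumes that rclone interprets the "M" suffix as MiB per second and that the 750 GB limit is counted in decimal gigabytes; the 1 TB file size is a hypothetical example.

```shell
#!/usr/bin/env bash
# Assumption: "8M" means 8 MiB/s (8 * 1024 * 1024 bytes per second).
bytes_per_sec=$(( 8 * 1024 * 1024 ))
seconds_per_day=$(( 24 * 60 * 60 ))
bytes_per_day=$(( bytes_per_sec * seconds_per_day ))

# Assumption: the daily limit is 750 decimal gigabytes.
daily_limit=$(( 750 * 1000 * 1000 * 1000 ))

echo "uploaded per day at 8 MiB/s: $bytes_per_day bytes"
if [ "$bytes_per_day" -lt "$daily_limit" ]; then
    echo "this stays below the 750 GB daily limit"
fi

# Cost of throttling: whole hours needed for a hypothetical 1 TB tar file.
file_bytes=$(( 1000 * 1000 * 1000 * 1000 ))
hours=$(( file_bytes / bytes_per_sec / 3600 ))
echo "a 1 TB file takes about $hours hours at this rate"
```

The throttle keeps a job under the daily quota at the price of a much longer transfer.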
$ rclone rmdirs remote:path [flags] #Remove empty directories under the path (useful flags: --help & --leave-root)