{"id":137955,"date":"2021-12-03T15:27:12","date_gmt":"2021-12-03T20:27:12","guid":{"rendered":"http:\/\/www.bu.edu\/tech\/?page_id=137955"},"modified":"2023-08-25T12:48:20","modified_gmt":"2023-08-25T16:48:20","slug":"cloud-applications","status":"publish","type":"page","link":"https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/transferring-files\/cloud-applications\/","title":{"rendered":"Transferring Files using a Cloud Application"},"content":{"rendered":"<h2>Overview<\/h2>\n<p>Data is often stored in cloud storage services such as AWS S3 buckets, Google Drive, or OneDrive. Alternatively, data may be made available via an Application Programming Interface (API) by the organization hosting the data. Accessing these sources requires an internet connection and often an application to interact with the remote servers hosting the data.<\/p>\n<p>A dedicated node, called the Data Transfer Node, is available on the SCC for these types of data transfer tasks. The Data Transfer Node has its own high-bandwidth internet connection and is intended only for data transfer tasks. This page contains the following sections:<\/p>\n<ol>\n<li><a href=\"#DTN\">The Data Transfer Node<\/a><\/li>\n<li><a href=\"#JOB\">Submitting a Batch Job<\/a><\/li>\n<li><a href=\"#OOD\">Using SCC OnDemand for Interactive Data Transfer<\/a><\/li>\n<li><a href=\"#CLOUD_APPS\">Suggested Software and Modules for Transferring Data from Cloud Storage<\/a>\n<ol type=\"a\">\n<li><a href=\"#LFTP\">lftp<\/a><\/li>\n<li><a href=\"#RCLONE\">Rclone<\/a><\/li>\n<li><a href=\"#AWSCLI\">AWSCLI<\/a><\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<h2><a name=\"DTN\"><\/a>The Data Transfer Node<\/h2>\n<p>The data transfer node is a single server named <span style=\"white-space: nowrap;\">\u201c<code>scc-globus.bu.edu<\/code>\u201d<\/span> that provides a direct 10GbE connection from the SCC to the internet for data transfer. 
The data transfer node is intended only for data transfer tasks and not for compute-intensive workloads. In comparison, the SCC compute nodes offer large computing capacity, but reside on a private network and share a connection to the outside world, which limits their data transfer speeds. Jobs are placed on the data transfer node with the <span style=\"white-space: nowrap;\">\u201c<code><span class=\"command\">-l download<\/span><\/code>\u201d<\/span> command line option; they have a maximum runtime of 24 hours and are limited to a single processor core.<\/p>\n<h4><a name=\"JOB\"><\/a>Submitting a Batch Job<\/h4>\n<p>The <code><span class=\"command\">-l download<\/span><\/code> command line option for <code><span class=\"command\">qsub<\/span><\/code> will place your job on the data transfer node. This can be done for standard <code><span class=\"command\">qsub<\/span><\/code> batch job submissions either on the command line or inside of a <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/running-jobs\/batch-script-examples\/\">batch script<\/a>.<\/p>\n<p><b>Example <code><span class=\"command\">qsub<\/span><\/code> command line:<\/b><\/p>\n<pre style=\"margin-bottom: 30px;\"><code class=\"code-block\" style=\"padding: 0em 1em 0em 1em;\"><span class=\"prompt\">scc1$<\/span> <span class=\"command\">qsub<\/span> -P <span class=\"placeholder\">projectname<\/span> <span class=\"command\">-l download<\/span> <span class=\"placeholder\">download.qsub<\/span><\/code><\/pre>\n<p><b>Example batch script:<\/b><br \/>\nThis example script uses the data transfer node to synchronize files from the Amazon Web Services (AWS) S3 service. 
A researcher should change the module and commands for their own purposes.<\/p>\n<pre style=\"margin-bottom: 30px;\"><code class=\"code-block\" style=\"padding: 0em 1em 0em 1em;\"><span style=\"color: #1a6d20; font-weight: 800;\">#!\/bin\/bash -l<\/span>\r\n\r\n#$ -P <span class=\"placeholder\">projectname<\/span>\r\n#$ <span class=\"command\">-l download<\/span>\r\n\r\n<span class=\"command\">module load<\/span> <span class=\"placeholder\">awscli<\/span>\r\n<span class=\"command\">aws<\/span> s3 sync s3:\/\/mybucket scc-directory<\/code><\/pre>\n<p>Note that the data transfer node does not accept interactive jobs from the scheduler; neither <code><span class=\"command\">qsh<\/span><\/code> nor <code><span class=\"command\">qrsh<\/span><\/code> requests will work with the <code><span class=\"command\" style=\"white-space: nowrap;\">-l download<\/span><\/code> command line option. For interactive use, researchers can SSH directly to <span style=\"white-space: nowrap;\"><code>scc-globus.bu.edu<\/code><\/span> from within the cluster or use <a href=\"#OOD\">SCC OnDemand<\/a>. The scc-globus node is dedicated to file transfers and interactive usage through SSH is subject to the same policies as apply on the login nodes.<\/p>\n<h2><a name=\"OOD\"><\/a>Using SCC OnDemand for Interactive Data Transfer<\/h2>\n<p>Some data transfer programs must be run interactively, such as using the Firefox browser to access Google Drive cloud storage. For these applications, we recommend using a Desktop session on <a href=\"https:\/\/www.bu.edu\/tech\/support\/research\/system-usage\/connect-scc\/scc-ondemand\/\">SCC OnDemand<\/a> for a remote desktop on the data transfer node. 
This can be done by selecting a \u201cDesktop\u201d interactive app, adding the <span style=\"white-space: nowrap;\">\u201c<code><span class=\"command\">-l download<\/span><\/code>\u201d<\/span> option to the \u201cExtra Qsub Options\u201d field, and launching the job.<\/p>\n<p><img loading=\"lazy\" src=\"\/tech\/files\/2020\/05\/scc-globus1.png\" style=\"border: 1px solid #000;\" width=\"821\" height=\"877\" class=\"alignnone\" alt=\"SCC OnDemand's Desktop session shows options for Remote desktop access for data transfer.\" \/><\/p>\n<p>Once the job starts, connect to the desktop session. You will be on the <span style=\"white-space: nowrap;\"><code>scc-globus.bu.edu<\/code><\/span> node, where you can open web browsers or launch interactive download applications.<\/p>\n<p><img loading=\"lazy\" src=\"\/tech\/files\/2020\/05\/scc-globus2.png\" width=\"1099\" height=\"718\" class=\"alignnone\" alt=\"Black command-line terminal: open browsers, launch downloads via SCC-globus.bu.edu commands.\" \/><\/p>\n<h2><a name=\"CLOUD_APPS\"><\/a>Suggested Software and Modules for Transferring Data from Cloud Storage<\/h2>\n<p>Many third-party applications try to make transferring data from a cloud service provider easier by automating the task with only a few commands. These tools commonly require upfront configuration to set up authentication before they can connect to a cloud service and access the data. Although we suggest these modules for your use, we do not guarantee they will work in all scenarios.<\/p>\n<h3><a name=\"LFTP\"><\/a>lftp<\/h3>\n<p><a href=\"https:\/\/lftp.yar.ru\/\">lftp<\/a> is a command line program that can transfer files from remote servers using FTP, SFTP, HTTP, and several other protocols. It is available as a system utility and does not require any modules to be loaded. lftp offers a wide array of commands, and its website has tutorials and documentation. 
Here is an example of using lftp to transfer an entire directory from a remote FTP server:<\/p>\n<p><code>lftp -u username <strong>ftp.someserver.org\/path\/to\/remote\/directory<\/strong> -e \"lcd <strong>path\/to\/scc\/destination<\/strong> ; mirror; exit\"<\/code><\/p>\n<p>You will be prompted for the password on the remote server. The command then changes to the local destination directory on the SCC (<code>lcd<\/code>) and copies the specified remote directory into it (<code>mirror<\/code>).<\/p>\n<h3><a name=\"RCLONE\"><\/a>Rclone<\/h3>\n<p><a href=\"https:\/\/rclone.org\/\">Rclone<\/a> is a command line program that allows one to configure a connection to over 40 cloud storage products. Use the following command to load the rclone module:<\/p>\n<pre style=\"margin-bottom: 30px;\"><code class=\"code-block\" style=\"padding: 0em 1em 0em 1em;\"><span class=\"prompt\">scc1$<\/span> <span class=\"command\">module load rclone<\/span><\/code><\/pre>\n<p>After loading the module, rclone needs to be configured to communicate with the cloud provider of your choice. Below are links to instructions on how to configure rclone with common cloud providers. Click on the link for the cloud provider you are connecting to and follow the instructions.<\/p>\n<ul>\n<li><a href=\"https:\/\/rclone.org\/dropbox\/\">Dropbox<\/a><\/li>\n<li><a href=\"https:\/\/rclone.org\/s3\/#configuration\">AWS S3<\/a><\/li>\n<li><a href=\"https:\/\/rclone.org\/drive\/\">Google Drive<\/a><\/li>\n<li><a href=\"https:\/\/rclone.org\/googlecloudstorage\/\">Google Cloud Storage<\/a><\/li>\n<li><a href=\"https:\/\/rclone.org\/onedrive\/\">Microsoft OneDrive<\/a><\/li>\n<\/ul>\n<p>The list of all supported providers is available on <a href=\"https:\/\/rclone.org\/#providers\">rclone&#8217;s homepage<\/a>.<\/p>\n<h4>Rclone: Using the OnDemand Files App<\/h4>\n<p>After you\u2019ve configured your cloud account with rclone, a file listing your authenticated cloud accounts will be saved in your home folder:  <code>~\/.config\/rclone\/rclone.conf<\/code>. 
OnDemand will automatically detect any entries in this file, and they will appear in your Files menu as shown below  <code>[1]<\/code>.<br \/>\n<img loading=\"lazy\" src=\"\/tech\/files\/2023\/08\/rclone_1-636x396.png\" alt=\"\" width=\"636\" height=\"396\" class=\"alignnone size-medium wp-image-147317\" srcset=\"https:\/\/www.bu.edu\/tech\/files\/2023\/08\/rclone_1-636x396.png 636w, https:\/\/www.bu.edu\/tech\/files\/2023\/08\/rclone_1.png 659w\" sizes=\"(max-width: 636px) 100vw, 636px\" \/><br \/>\nYou can move or copy files between your cloud account and the SCC using the built-in functionality highlighted below. Navigate to the source path of your files, check the box of the file to be transferred <code>[1]<\/code>, and select <strong>Copy\/Move<\/strong> <code>[2]<\/code>.<br \/>\n<img loading=\"lazy\" src=\"\/tech\/files\/2023\/08\/rclone_2-636x376.png\" alt=\"\" width=\"636\" height=\"376\" class=\"alignnone size-medium wp-image-147318\" srcset=\"https:\/\/www.bu.edu\/tech\/files\/2023\/08\/rclone_2-636x376.png 636w, https:\/\/www.bu.edu\/tech\/files\/2023\/08\/rclone_2.png 717w\" sizes=\"(max-width: 636px) 100vw, 636px\" \/><br \/>\nA menu will appear on the left dashboard with the option to <strong>Copy or Move<\/strong> <code>[1]<\/code>. You must first change to the target directory, either by entering it manually with the <strong>Change directory<\/strong> <code>[2]<\/code> option or by selecting your project space on the left. When you have changed to the correct target directory, select your transfer option <code>[1]<\/code>. 
<em>Note: <strong>Move<\/strong> will remove the file from the source<\/em>.<br \/>\n<img loading=\"lazy\" src=\"\/tech\/files\/2023\/08\/rclone_3-636x379.png\" alt=\"\" width=\"636\" height=\"379\" class=\"alignnone size-medium wp-image-147319\" srcset=\"https:\/\/www.bu.edu\/tech\/files\/2023\/08\/rclone_3-636x379.png 636w, https:\/\/www.bu.edu\/tech\/files\/2023\/08\/rclone_3-1024x610.png 1024w, https:\/\/www.bu.edu\/tech\/files\/2023\/08\/rclone_3-768x457.png 768w, https:\/\/www.bu.edu\/tech\/files\/2023\/08\/rclone_3.png 1310w\" sizes=\"(max-width: 636px) 100vw, 636px\" \/><br \/>\nYour transfer will then start, and you can track its progress with the in-page status <code>[1]<\/code>.<br \/>\n<img loading=\"lazy\" src=\"\/tech\/files\/2023\/08\/rclone_4-636x332.png\" alt=\"\" width=\"636\" height=\"332\" class=\"alignnone size-medium wp-image-147316\" srcset=\"https:\/\/www.bu.edu\/tech\/files\/2023\/08\/rclone_4-636x332.png 636w, https:\/\/www.bu.edu\/tech\/files\/2023\/08\/rclone_4-768x401.png 768w, https:\/\/www.bu.edu\/tech\/files\/2023\/08\/rclone_4.png 938w\" sizes=\"(max-width: 636px) 100vw, 636px\" \/><\/p>\n<h3><a name=\"AWSCLI\"><\/a>AWSCLI<\/h3>\n<p><a href=\"https:\/\/aws.amazon.com\/cli\/\">AWS Command Line Interface<\/a> (AWSCLI) is a command line program used to manage AWS services; it can also be used to transfer data to and from AWS S3 buckets. Use the following module command to load awscli:<\/p>\n<pre style=\"margin-bottom: 30px;\"><code class=\"code-block\" style=\"padding: 0em 1em 0em 1em;\"><span class=\"prompt\">scc1$<\/span> <span class=\"command\">module load awscli<\/span><\/code><\/pre>\n<p>If you are connecting to a secured S3 bucket, you will need to configure awscli with authentication information in order to gain access to the data. For a basic configuration setup, run the <span style=\"white-space: nowrap;\">\u201c<code><span class=\"command\">aws configure<\/span><\/code>\u201d<\/span> command. 
This will ask you to enter the &#8220;AWS Access Key ID&#8221;, the &#8220;AWS Secret Access Key&#8221;, and the &#8220;Default region name&#8221;. The administrator of your AWS account can help you find this information.<\/p>\n<p>After the configuration is complete, you will use the <span style=\"white-space: nowrap;\">\u201c<code><span class=\"command\">aws s3<\/span><\/code>\u201d<\/span> group of commands to interact with an S3 bucket. For example, to synchronize an AWS S3 bucket with a local directory, one may run the following command:<\/p>\n<pre style=\"margin-bottom: 30px;\"><code class=\"code-block\" style=\"padding: 0em 1em 0em 1em;\"><span class=\"prompt\">scc1$<\/span> <span class=\"command\">aws<\/span> s3 sync s3:\/\/mybucket scc-directory<\/code><\/pre>\n<p>The following are links to help pages for commands that may be helpful for exploring and transferring data from an S3 bucket:<\/p>\n<ul>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/cli\/latest\/reference\/s3\/cp.html\">cp<\/a> &#8211; Copy<\/li>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/cli\/latest\/reference\/s3\/ls.html\">ls<\/a> &#8211; List S3 Objects<\/li>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/cli\/latest\/reference\/s3\/mv.html\">mv<\/a> &#8211; Move<\/li>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/cli\/latest\/reference\/s3\/rm.html\">rm<\/a> &#8211; Delete S3 Object<\/li>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/cli\/latest\/reference\/s3\/sync.html\">sync<\/a> &#8211; Sync directories<\/li>\n<\/ul>\n<p><a href=\"https:\/\/docs.aws.amazon.com\/cli\/latest\/reference\/s3\/index.html\">Click here<\/a> to access the documentation of all &#8220;s3&#8221; commands available.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Overview Data is often stored in cloud storage services such as AWS S3 buckets, Google Drive, or OneDrive. Alternatively, data may be made available via an Application Programming Interface (API) by the organization hosting the data. 
Accessing these sources requires an internet connection and often an application to interact with the remote servers hosting the&#8230;<\/p>\n","protected":false},"author":1692,"featured_media":0,"parent":137947,"menu_order":3,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/137955"}],"collection":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/users\/1692"}],"replies":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/comments?post=137955"}],"version-history":[{"count":18,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/137955\/revisions"}],"predecessor-version":[{"id":147325,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/137955\/revisions\/147325"}],"up":[{"embeddable":true,"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/pages\/137947"}],"wp:attachment":[{"href":"https:\/\/www.bu.edu\/tech\/wp-json\/wp\/v2\/media?parent=137955"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}