Feb 24, 2024
Targets available on AWS Snowball Edge for data loading – Addressing Disconnected Scenarios with AWS Snow Family
There are several types of targets available on an AWS Snowball Edge device that you can use to load data.
NFS endpoint on the AWS Snowball Edge device
This option lets you access and manage data on the Snowball Edge device using the familiar NFS protocol. You can mount the device as a network file share, much as you would a NAS device, and then perform standard file operations such as reading, writing, moving, and deleting files, including by drag and drop, just as you would on a departmental file share. Linux and macOS both have NFS support built in, while Windows requires installing the optional Services for NFS component or a third-party NFS client.
This is generally the most convenient method and the most readily understood. Standard client-side tools such as rsync, xcopy, Robocopy, or the like can be used with no modifications.
This target has a practical maximum throughput of around 3 Gbit/s.
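As a rough illustration of the point that ordinary tooling works unchanged, the following Python sketch copies a directory tree onto the share using only the standard library. It assumes the NFS endpoint is already mounted; the function name `copy_tree_to_share` and any paths you pass to it are hypothetical and not taken from the device documentation.

```python
import shutil
from pathlib import Path


def copy_tree_to_share(source_dir, share_dir):
    """Copy source_dir into share_dir; return how many files sit under the copy.

    share_dir is assumed to be a directory on an already-mounted NFS share.
    """
    source = Path(source_dir)
    target = Path(share_dir) / source.name
    # dirs_exist_ok (Python 3.8+) lets a rerun copy into a tree left by an earlier run
    shutil.copytree(source, target, dirs_exist_ok=True)
    # Count regular files now present under the target directory
    return sum(1 for p in target.rglob("*") if p.is_file())
```

With a share mounted at a hypothetical `/mnt/snowball/my-bucket`, a call such as `copy_tree_to_share("/data/export", "/mnt/snowball/my-bucket")` would place the data under `/mnt/snowball/my-bucket/export`. For large datasets, a purpose-built tool such as rsync is likely the better choice; the sketch only shows that no Snowball-specific API is involved.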
S3 endpoint on the AWS Snowball Edge device
All members of the AWS Snow Family expose a local version of the same kind of S3 endpoint you would work with in an AWS Region. You simply target the S3 endpoint IP on the AWS Snowball Edge device with commands from the AWS CLI or your own code (for instance, a Python script using boto3):

Figure 4.5 – S3 endpoint on an AWS Snowball Edge device
You can also target this local endpoint with third-party programs that know how to work with S3 – common examples include enterprise backup software packages such as Veeam or Commvault.
This target can ingest data at speeds in excess of 20 Gbit/s, but reaching that rate requires considerable optimization of the client-side transfer mechanism.
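A minimal sketch of what targeting the local endpoint from boto3 might look like is shown here, together with one simple form of client-side optimization: uploading several files concurrently. The port `8443`, the profile name `snowball`, and the helper names are assumptions made for illustration; check the values that apply to your device, and note that the device certificate must be supplied for TLS verification.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def endpoint_url(device_ip, port=8443, scheme="https"):
    """Build the URL of the device-local S3 endpoint (the port is an assumption)."""
    return f"{scheme}://{device_ip}:{port}"


def make_client(device_ip, profile="snowball", ca_bundle=None):
    """Create a boto3 S3 client aimed at the device rather than at a Region."""
    import boto3  # imported lazily so the other helpers work without boto3

    session = boto3.Session(profile_name=profile)  # hypothetical profile name
    # endpoint_url redirects every S3 call to the device; verify takes the
    # path to the device certificate when one is provided
    return session.client(
        "s3", endpoint_url=endpoint_url(device_ip), verify=ca_bundle
    )


def upload_many(client, bucket, paths, workers=8):
    """Upload files concurrently; return the object keys in input order."""

    def upload(path):
        key = Path(path).name  # flat key layout, chosen only for brevity
        client.upload_file(str(path), bucket, key)
        return key

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload, paths))
```

Running several uploads in parallel is only one of the tuning options; the number of workers that helps depends on file sizes, the client machine, and the network path, so treat `workers=8` as a starting point rather than a recommendation.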
EC2 instance running on the AWS Snowball Edge device
Another approach is to bypass the native endpoints on the device altogether by spinning up an EC2 instance on it:

Figure 4.6 – EC2 instances running on an AWS Snowball Edge device
That instance could run any third-party data transfer software you want, and the limitations on throughput would be specific to that vendor’s software.
AWS DataSync agent
The AWS DataSync agent is a special kind of EC2 instance you can spin up on an AWS Snow Family device. Note that this type of target pulls data, rather than having data pushed to it as all of the other targets do. DataSync supports pulling data from the following types of shared storage in your on-premises environment:
NFS exports
Windows Server shares using Server Message Block (SMB), also known as CIFS
Hadoop Distributed File System (HDFS)
Self-managed object stores (some NAS devices can host S3-compatible stores)

Figure 4.7 – Launching the DataSync agent from OpsHub
You create DataSync tasks in the AWS Management Console. Each task tells the agent how to access these resources in your environment, when to pull files, how much bandwidth to consume, and whether any manipulation of the data is needed along the way. The agent optimizes the transfer by employing techniques such as parallelization, data deduplication, and delta detection to minimize transfer times and bandwidth usage.
A single DataSync task can relay data to an AWS Region at up to 10 Gbit/s. However, the achievable rate depends on the resources available to the instance type chosen when the agent is deployed onto the device. At a minimum, an instance type with 2 vCPUs must be used. The more vCPUs the agent has at its disposal, the more it can parallelize the transfer and the higher the speeds it can attain.
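To put the throughput figures quoted in this section in perspective, the following sketch converts a dataset size and a sustained rate into a transfer time. The rates are the ones stated above (around 3 Gbit/s for NFS, 10 Gbit/s for a single DataSync task, 20 Gbit/s for the S3 endpoint); the function names are hypothetical, and real transfers rarely sustain the quoted maximum, so the results are best-case estimates.

```python
# Rates quoted in the text, in Gbit/s; these are upper bounds, not guarantees
RATES_GBIT = {"NFS endpoint": 3, "DataSync task": 10, "S3 endpoint": 20}


def transfer_seconds(size_bytes, rate_gbit_per_s):
    """Time to move size_bytes at a sustained rate given in Gbit/s."""
    bits = size_bytes * 8
    return bits / (rate_gbit_per_s * 1e9)


def estimate_hours(size_tb):
    """Best-case hours per target for a dataset of size_tb decimal terabytes."""
    size_bytes = size_tb * 1e12
    return {
        name: transfer_seconds(size_bytes, rate) / 3600
        for name, rate in RATES_GBIT.items()
    }
```

For a hypothetical 50 TB dataset, `estimate_hours(50)` gives roughly 37 hours at the NFS rate, about 11 hours at the DataSync rate, and under 6 hours at the S3 rate, which is one way to see why the extra tuning effort for the S3 endpoint can be worthwhile on large jobs.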