Aug 19, 2024
DNIs – Addressing Disconnected Scenarios with AWS Snow Family

Direct Network Interfaces (DNIs) were introduced to AWS Snow Family devices to support advanced network use cases. DNIs provide layer 2 network access without any translation or filtering, enabling features such as multicast streams, transitive routing, and load balancing. This direct access enhances network performance and allows for customized network configurations.

DNIs support VLAN tags, enabling network segmentation and isolation within the Snow Family device. Additionally, the MAC address can be customized for each DNI, providing further flexibility in network configuration:

Figure 4.18 – AWS Snowball Edge device with one DNI
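As a point of reference, a DNI is created with the Snowball Edge client after the device is unlocked. The following is only a sketch: the interface ID, instance ID, VLAN tag, and MAC address are placeholders, and the exact option names should be confirmed against the Snowball Edge client documentation for your device.

    # List device details, including the physical network interface IDs
    snowballEdge describe-device

    # Create a DNI on a physical port and attach it to a running EC2 instance,
    # optionally with a VLAN tag and a custom MAC address (all values are examples)
    snowballEdge create-direct-network-interface \
        --physical-network-interface-id s.ni-8abc7876 \
        --instance-id s.i-0123456789abcdef0 \
        --vlan 100 \
        --mac 4a:2b:3c:4d:5e:6f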

DNIs and security groups

It’s important to note that traffic on DNIs is not protected by security groups, so additional security measures need to be implemented at the application or network level.
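For example, a host firewall inside the instance can stand in for security groups on traffic arriving over a DNI. The sketch below assumes the DNI appears in the guest as eth1 and that only SSH from the 192.168.100.0/24 network should be allowed; adjust both to your environment.

    # Allow return traffic and SSH from the local RFC 1918 network on the DNI interface...
    sudo iptables -A INPUT -i eth1 -m state --state ESTABLISHED,RELATED -j ACCEPT
    sudo iptables -A INPUT -i eth1 -p tcp --dport 22 -s 192.168.100.0/24 -j ACCEPT
    # ...and drop everything else arriving on that interface
    sudo iptables -A INPUT -i eth1 -j DROP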

Snowball Edge devices support DNIs on all types of physical Ethernet ports, with each port capable of accommodating up to seven DNIs. For example, RJ45 port #1 can have seven DNIs, with four DNIs mapped to one EC2 instance and three DNIs mapped to another instance. RJ45 port #2 could simultaneously accommodate an additional seven DNIs for other EC2 instances.

Note that the Storage Optimized variant of AWS Snowball Edge does not support DNIs:

Figure 4.19 – AWS Snowball Edge network flows with DNIs

Looking at Figure 4.19, we can see that al2-1 has two Ethernet ports configured inside Linux. One is on the typical 34.223.14.128/25 subnet, while the other sits directly on the 192.168.100.0/24 RFC 1918 space. A DNI such as this is the only case in which an interface on an EC2 instance on an AWS Snow Family device should be configured for any subnet other than 34.223.14.128/25.

Figure 4.20 shows what a DNI looks like from the perspective of the EC2 instance that has one attached:

Figure 4.20 – DNI details under Amazon Linux 2
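If the new interface does not come up with an address automatically, it can be configured inside Amazon Linux 2 with a standard ifcfg file. This is a sketch only; the interface name (eth1) and the 192.168.100.55 address are assumptions for illustration.

    # /etc/sysconfig/network-scripts/ifcfg-eth1 (interface name and address are examples)
    DEVICE=eth1
    BOOTPROTO=static
    IPADDR=192.168.100.55
    NETMASK=255.255.255.0
    ONBOOT=yes

    # Bring the interface up
    sudo ifup eth1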

Storage allocation

All AWS Snowball Edge device variants work the same way with respect to storage allocation. Object or file storage can draw from the device’s HDD storage capacity, while block volumes used by EC2 instances can be drawn from either the device’s HDD or SSD capacity. Figure 4.21 shows an example of this:

Figure 4.21 – Storage allocation on AWS Snowball Edge

S3 buckets on a device can be thought of as being thin-provisioned in the sense that they start out consuming 0 bytes, and as objects are added, they only take the amount needed for those objects from the HDD capacity.

Block volumes for EC2 instances, on the other hand, can be thought of as thick-provisioned. When the volume is created, a capacity is specified, and that capacity is immediately removed from the corresponding HDD or SSD pool and becomes unavailable for any other use.
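To make the distinction concrete, here is a hedged sketch of creating a thick-provisioned block volume against the device’s local EC2-compatible endpoint with the AWS CLI. The endpoint address and port, the profile name, and the exact set of required options are assumptions to verify against your device’s documentation; sbg1 and sbp1 are the HDD-backed and SSD-backed volume types, respectively.

    # Create a 500 GiB HDD-backed (sbg1) volume; use sbp1 for an SSD-backed volume
    # (endpoint, port, and profile are examples; confirm required options locally)
    aws ec2 create-volume \
        --size 500 \
        --volume-type sbg1 \
        --endpoint http://192.168.1.100:8008 \
        --profile snowballEdge

The full 500 GiB is deducted from the HDD pool at creation time, regardless of how much data the instance eventually writes to it.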

Jul 4, 2024
AWS SNOW “PRIVATE” SUBNET – 34.223.14.128/25 – Addressing Disconnected Scenarios with AWS Snow Family

Something you will notice right away in Figure 4.15 is that the EC2 instances are configured for an internal network of 34.223.14.128/25, which is a routable prefix on the internet. At the same time, the “public” IPs mapped to them on their VNIs live on 192.168.100.0/24 – a non-routable RFC 1918 address space. This is counter-intuitive and the opposite of how public subnets work inside an AWS region.

Rest assured this is done for a reason. The 34.223.14.128/25 space is registered to AWS with IANA, and it is not actually used on the internet. AWS chose to do this to make deployment of Snow Family devices simpler by ensuring the default private subnet is never the same as whatever RFC 1918 address space a customer is using.

Note that while you can make the “public” subnet of the VNIs live on whatever you wish, it is not possible to change the “private” subnet on any AWS Snow Family device – it is always 34.223.14.128/25.
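For reference, a VNI with a static “public” IP of your choosing can be created with the Snowball Edge client along the following lines. This is a sketch: the interface ID and addressing are placeholders, and the option names should be checked against the Snowball Edge client documentation.

    # Find the physical network interface ID the VNI should ride on
    snowballEdge describe-device

    # Create a VNI with a static address on the customer-chosen "public" subnet
    snowballEdge create-virtual-network-interface \
        --physical-network-interface-id s.ni-8abc7876 \
        --ip-address-assignment STATIC \
        --static-ip-address-configuration IpAddress=192.168.100.210,Netmask=255.255.255.0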

Two VNIs sharing one physical Ethernet port

In certain situations, physical constraints may prevent the ideal configuration of separating VNIs onto different Ethernet ports on an AWS Snowball Edge device:

Figure 4.16 – AWS Snowball Edge device with two VNIs on a single PNI

Figure 4.17 illustrates the two network paths possible from EC2 instance al2-1. This instance can connect to devices outside the AWS Snowball Edge environment via VNI 1, which is configured as a 1:1 NAT entry mapping 192.168.100.210 to the instance’s internally configured IP of 34.223.14.193:

Figure 4.17 – AWS Snowball Edge network flows with VNIs

At the same time, al2-1 can communicate directly to centos-1 across the AWS Snowball Edge device’s internal subnet of 34.223.14.128/25.

VLANs on AWS Snow Family

Through VLAN tagging, it is possible for a Snow Family device to have VNIs that share the same physical Ethernet port but are configured for two different RFC 1918 subnets. This helps to mitigate some security concerns expressed by customers, but be aware: instances will always be able to talk directly on the internal 34.223.14.128/25 subnet. It is therefore important that security groups are used to limit this.
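As a sketch of what that might look like, security group rules can be managed through the device’s EC2-compatible endpoint using the familiar AWS CLI commands. The endpoint address and port, profile name, group ID, and the specific rule are all placeholders here.

    # Inspect the security groups that exist on the device
    aws ec2 describe-security-groups \
        --endpoint http://192.168.1.100:8008 --profile snowballEdge

    # Allow only the traffic the instances actually need on the internal subnet,
    # for example HTTPS between instances (group ID and rule are examples)
    aws ec2 authorize-security-group-ingress \
        --group-id sg-0123456789abcdef0 \
        --protocol tcp --port 443 --cidr 34.223.14.128/25 \
        --endpoint http://192.168.1.100:8008 --profile snowballEdge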

Jun 22, 2024
Logical networking – Addressing Disconnected Scenarios with AWS Snow Family

First, we must level-set on some terms that have specific meanings within the context of AWS Snowball Edge. These terms differ a bit from how they are used in an EC2 VPC:

Public IP: In this context, the term “public” does not mean a routable IP on the internet. It simply means an IP address on the “outside” network of the device – this is the network your device will acquire Dynamic Host Configuration Protocol (DHCP) addresses from when you plug it in for the first time. The default gateway for this network will be a router that you own. DNS and NTP will also be pointing toward addresses you use now on your network.

Private IP: An IP address on the “inside” network of the device. Perhaps confusingly, AWS has chosen to make the private network range on all AWS Snow Family devices 34.223.14.128/25. This cannot be changed, and yes – it is a routable prefix registered to AWS with the Internet Assigned Numbers Authority (IANA). There are no services attached to the “real” version of this prefix out on the internet, so don’t worry.

Virtual Network Interface (VNI): A static 1:1 NAT mapping of a public IP to a private IP. These are needed for EC2 instances to talk to any network outside of the private range inside the device.

Direct Network Interface (DNI): This is a way to map one of the physical Ethernet ports on the device directly to an EC2 instance inside it, thus bypassing the 1:1 NAT translation from 34.223.14.128/25 to 192.168.x.x (or whatever your network’s IP range is).

Two VNIs each on a different physical Ethernet port

Configuring an AWS Snowball Edge device with two VNIs, each on a separate physical Ethernet port, offers several key benefits. First, it provides increased network bandwidth and throughput by leveraging the capabilities of two separate network connections. This is particularly advantageous in scenarios that require high-speed data transfer or processing, allowing for faster and more efficient operations.

Secondly, having separate physical Ethernet ports for each VNI allows for network segregation and isolation at a hardware level. This enables the Snowball Edge device to maintain a strict separation between different types of network traffic or data flows. By keeping the networks isolated, organizations can ensure enhanced security, compliance, and operational control over their data and applications.

Furthermore, the configuration with two separate physical Ethernet ports provides inherent redundancy and high availability (HA). If one network connection or port experiences an issue, the Snowball Edge device can automatically switch to the other port, maintaining uninterrupted connectivity and data transfer. This redundancy ensures continuity of operations and minimizes the impact of any network failures:

Figure 4.15 – AWS Snowball Edge device with two VNIs on separate PNIs

May 10, 2024
Physical networking – Addressing Disconnected Scenarios with AWS Snow Family

AWS Snowball Edge devices have several Ethernet interfaces you can use to connect them to your network. The interfaces can operate at 1 Gbit/s, 10 Gbit/s, 25 Gbit/s, 40 Gbit/s, or 100 Gbit/s:

Figure 4.10 – Physical network interfaces (PNIs) on AWS Snowball Edge

Interfaces

RJ45: The RJ45 ports on an AWS Snowball Edge device support Ethernet over copper twisted-pair cables at either 1 Gbit/s or 10 Gbit/s. The interface will negotiate one or the other depending on what type of switch port is on the other end. Note that 10 Gbit/s operation requires, at minimum, a Cat6a cable; otherwise, you can expect to drop packets. Cat8 cables are recommended.

Small Form-factor Pluggable 28 (SFP28): These are empty slots into which you must insert a transceiver module of some type. You must supply the transceiver module, as none ship with an AWS Snowball device of any type. The 28 at the end indicates that they accept Ethernet SFP modules running at up to 25 Gbit/s. These slots are also backward compatible with older 10 Gbit/s or even 1 Gbit/s modules:

Figure 4.11 – 25 GbE fiber optic (left) and 25 GbE RJ45 copper SFPs (right)

With SFP modules, you must supply the correct cable type as well. In the case of the 25 GbE fiber optic SFPs shown in Figure 4.11, those would be 50-micron LC-LC OM3 (or better) multimode cables. LC stands for Lucent Connector; these are the smaller squarish connectors that have a receive and a transmit strand. OM3 stands for Optical Multimode version 3. These cables typically have an aqua-colored jacket and a core size of 50 micrometers. In the case of 25 GbE over copper, a Cat8 twisted pair is required (see Figure 4.12):

Figure 4.12 – Cat8 twisted-pair RJ45 cable

Alternatively, 25 GbE SFP28 Twinax cables can be used in these slots. A Twinax cable, also called a direct-attach copper (DAC) cable, has transceivers on both ends and the cable is molded together as one big unit (see Figure 4.13). The cable part inside Twinax is copper, but it isn’t twisted-pair. It is essentially two coaxial cables bundled together – hence the name Twinax(ial):

Figure 4.13 – 25 GbE SFP28 Twinax cable

Quad Small Form-factor Pluggable 28 (QSFP28): Like the SFP28 slots, these are empty sockets into which you must insert a transceiver. As is the case with the SFP28 slots, you must supply the transceiver yourself. Whereas SFP28 slots have a single 25 Gbit/s lane, the Quad in QSFP28 denotes that these have four lanes. They can, therefore, support up to 100 Gbit/s over this single interface. Connectivity options remain the same as with SFP28, but in practice, Twinax cables are almost always used with QSFP. Note that these slots support older 40 Gbit/s modules as well:

Figure 4.14 – 100 GbE QSFP28 Twinax cable

Apr 29, 2024
Other considerations – Addressing Disconnected Scenarios with AWS Snow Family

Let us assume the following conditions for a migration using an AWS Snowball Edge device:

A SAN array has two servers as clients

Each server utilizes two Logical Unit Numbers (LUNs) on the SAN

One server runs Windows Server 2019

One server runs Red Hat Enterprise Linux 8 (RHEL 8)

The Windows server exposes its data for copying through a CIFS share

The Linux server exposes its data using an NFS export

The desktop is going to act as a data mover for the AWS Snowball Edge device

Figure 4.9 – Hypothetical data movement paths

Looking at Figure 4.9, we can see several places where the throughput could get slowed down:

The disk groups/pools on the SAN array

The controllers/I/O ports on the SAN array

The Fibre Channel fabric connecting the SAN array to the servers

The hardware configuration of either server

The OS and file-serving configuration of either server

Whether either server is dedicated to this task or is running other apps

Differences in the CIFS and NFS protocols or their versions

The network between the servers and the desktop

Hardware and software configuration of the desktop

An even worse possibility is that the servers and the desktop can pull the data from the SAN at the maximum speed of all devices and links involved, only to discover this causes the SAN controller to queue I/O requests for a third client you weren’t aware of.

It turns out this third server is running a large Microsoft SQL Server database that consumes LUNs from the same disk pool on the SAN array, and it also shares the same pair of SAN controllers on the front end. The 10 Gbit/s of sequential reads causes head thrashing on the disk pool and overruns the shared cache on the controllers.

As a result, the mission-critical application that depends on this database suffers performance degradation – or worse, an outage. Anyone who has overseen many data center migrations – to the cloud or otherwise – has probably witnessed such a situation. Figuring out how fast you can possibly move data onto a device is important, but it is even more important to determine the maximum speed at which data can be read from the source without impacting it.

Mar 31, 2024
Client-side mechanisms for loading data onto AWS Snowball Edge – Addressing Disconnected Scenarios with AWS Snow Family

With the exception of AWS DataSync, file loading is a push operation from your data loader workstation. Thus, you will need to use an appropriate client application to communicate with the target you have selected.

Performance tip – batching

Regardless of the client-side mechanism you use to copy data, there is a certain amount of per-file overhead incurred by operations such as encryption. This is why copying a thousand 1 KB files is slower than copying one 1,000 KB file. If the data you are loading consists of many small files spread across many subdirectories, you will probably save time by batching them up into one large archive with utilities such as tar (optionally gzip-compressed) or zip. This is true even if you obtain zero compression by doing so.
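For instance, a tree of small files can be rolled into a single archive before it is copied to the device and unpacked again after import on the AWS side (the paths and file names here are examples):

    # Batch many small files into one archive before copying it to the device
    tar -czf images-batch-001.tar.gz ./many-small-files/

    # After import into the destination bucket, unpack the archive as needed
    tar -xzf images-batch-001.tar.gz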

AWS OpsHub for Snow Family

The simplest thing to do is use the drag-and-drop interface in the AWS OpsHub application. Customers who prefer a GUI download and use this application anyway to unlock the device and make configuration changes to it. This option also requires no special target configuration.

While it might be convenient, as you can see from Figure 4.8, it is also quite slow, with a maximum speed of around 0.3 Gbit/s:

Figure 4.8 – Uploading files via drag and drop in AWS OpsHub

NFS client

When using the NFS endpoint, your data loader workstation must have an NFS client installed. This is usually installed by default on macOS or Linux. While Windows does offer an NFS client, it is not installed by default, and the performance tends to be lower.
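On a Linux or macOS loader, mounting the device’s NFS endpoint looks like mounting any other NFS export. The device IP, export path, and mount point below are placeholders; take the actual export path from what the device advertises once its NFS endpoint is started.

    # Mount the Snowball Edge NFS endpoint (IP and export path are examples)
    sudo mkdir -p /mnt/snowball
    sudo mount -t nfs 192.168.1.100:/buckets/my-bucket /mnt/snowball

    # Then copy with any standard tool, for example:
    rsync -av --progress /data/to-migrate/ /mnt/snowball/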

AWS CLI

The AWS CLI should be installed anyway on your data loader workstation. It can be used to target the locally running S3 endpoint on the AWS Snowball Edge device. Using the aws s3 sync command, you can do bulk data transfer operations the same way you would with an S3 bucket in an AWS region.
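A minimal example, assuming the device’s S3 endpoint is reachable at 192.168.1.100 on port 8443 and that a profile named snowballEdge holds the local access keys (all of these are placeholders):

    # Sync a local directory into a bucket on the Snowball Edge device
    # (add --ca-bundle pointing at the device certificate if TLS verification fails)
    aws s3 sync /data/to-migrate/ s3://my-bucket/ \
        --endpoint-url https://192.168.1.100:8443 \
        --profile snowballEdge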

s5cmd

The AWS CLI is a general-purpose utility written in Python; it wasn’t explicitly designed to maximize file transfer speed, which means it usually can’t push as fast as the S3 endpoint can receive. Fortunately, s5cmd can. It is an open source project available on GitHub, written in Go and focused on maximum parallelization. The more CPU cores your data loader has, the faster it can move data. However, given that most laptops or even desktops don’t have 128 cores and 25 Gbit/s interfaces, this option tends to be used when the loader itself is a server in the customer’s data center.
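A hedged sketch of the same transfer using s5cmd, with the endpoint, worker count, and bucket name as placeholders to tune for your loader; s5cmd picks up credentials from the standard AWS environment variables or profile:

    # Parallel upload to the device's local S3 endpoint with s5cmd
    s5cmd --endpoint-url https://192.168.1.100:8443 --numworkers 64 \
        cp './data/to-migrate/*' s3://my-bucket/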

Feb 24, 2024
Targets available on AWS Snowball Edge for data loading – Addressing Disconnected Scenarios with AWS Snow Family

There are several types of targets available on an AWS Snowball Edge device that you can use to load data.

NFS endpoint on the AWS Snowball Edge device

This option allows users to access and manage data on the Snowball Edge device using the familiar NFS protocol. This means you can easily mount the Snowball Edge device as a network file share, similar to mounting a NAS device. You can then perform standard file operations such as reading, writing, moving, and deleting files using drag and drop like you would on a departmental file share. Both Linux and macOS have NFS support built in, while Windows requires installation of the Services for NFS optional component or a third-party NFS client.

This is generally the most convenient method and the most readily understood. Standard client-side tools such as rsync, xcopy, Robocopy, or the like can be used with no modifications.

This target has a practical maximum throughput of around 3 Gbit/s.

S3 endpoint on the AWS Snowball Edge device

All members of the AWS Snow Family have a local version of the same sort of S3 endpoint as you would work with in a region. You simply target the S3 endpoint IP on the AWS Snowball Edge device with commands from the AWS CLI or your own code (for instance, a Python script using boto3):

Figure 4.5 – S3 endpoint on an AWS Snowball Edge device

You can also target this local endpoint with third-party programs that know how to work with S3 – common examples include enterprise backup software packages such as Veeam or Commvault.

This target can ingest at speeds in excess of 20 Gbit/s. However, achieving this requires considerable optimization of the client-side transfer mechanism.

EC2 instance running on the AWS Snowball Edge device

Another approach is to bypass the native endpoints on the device altogether by spinning up an EC2 instance on it:

Figure 4.6 – EC2 instances running on an AWS Snowball Edge device

That instance could run any third-party data transfer software you want, and the limitations on throughput would be specific to that vendor’s software.

AWS DataSync agent

The AWS DataSync agent is a special kind of EC2 instance you can spin up on an AWS Snow Family device. It is important to note that this type of target pulls the data rather than having data pushed to it like all of the others do. DataSync supports pulling data from the following types of shared storage in your on-premises environment:

NFS exports

Windows Server (CIFS/Server Message Block (SMB)) shares

Hadoop Distributed File System (HDFS)

Self-managed object stores (some NAS devices can host S3-compatible stores)

Figure 4.7 – Launching the DataSync agent from OpsHub

You create DataSync tasks inside the AWS Management Console that tell the agent how to access these resources in your environment, when to pull files, how much bandwidth to consume, and whether any manipulations need to be done in the process. The agent optimizes the data transfer process by employing techniques such as parallelization, data deduplication, and delta detection to minimize transfer times and optimize bandwidth usage.

A single DataSync task is capable of relaying data to an AWS region at 10 Gbit/s. However, this is dependent upon the resources available within the instance type chosen when the agent is deployed onto the device. At a minimum, an instance type with 2 vCPUs must be used. The more vCPUs the agent has at its disposal, the more it can parallelize the transfer and attain higher speeds.
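The same setup can also be scripted with the AWS CLI once the agent has been activated; the hostnames, ARNs, and bucket in this sketch are placeholders.

    # Source: an on-premises NFS export, read through the agent running on the device
    aws datasync create-location-nfs \
        --server-hostname 10.0.0.50 \
        --subdirectory /exports/data \
        --on-prem-config AgentArns=arn:aws:datasync:us-west-2:111122223333:agent/agent-0abc

    # Destination: an S3 bucket in the region (bucket and role ARNs are examples)
    aws datasync create-location-s3 \
        --s3-bucket-arn arn:aws:s3:::my-migration-bucket \
        --s3-config BucketAccessRoleArn=arn:aws:iam::111122223333:role/datasync-s3-role

    # Tie the two together as a task that can then be started on demand or on a schedule
    aws datasync create-task \
        --source-location-arn arn:aws:datasync:us-west-2:111122223333:location/loc-src \
        --destination-location-arn arn:aws:datasync:us-west-2:111122223333:location/loc-dst \
        --name snowball-edge-offload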

Jan 27, 2024
End-to-end network throughput – Addressing Disconnected Scenarios with AWS Snow Family

Of course, before starting any migration, even to a local device, one must evaluate all of the physical network links involved end to end. Having the AWS Snowball device connected to a 40 GbE switchport via Quad-Small Form-factor Pluggable (QSFP) won’t do much good if an upstream network link operates at a single gigabit:

Figure 4.3 – A full end-to-end throughput path

Additionally, there can be choke points on backend Storage Area Network (SAN) fabrics, disk arrays, Network-Attached Storage (NAS) devices, or virtualization software somewhere in the middle. In Figure 4.3, for example, the data being copied ultimately resides inside Virtual Machine Disk (VMDK) files on an aging SAN array attached via Fibre Channel (FC) to a server running VMware ESXi.

From the laptop’s perspective, the data is being copied over Common Internet File System (CIFS) from one of the VMware VMs, but in reality, there is a virtualization layer and yet another layer of networking behind that. If, for whatever reason, that SAN array’s controller or disk group could only push 4 Gbit/s to the VMware host, it simply doesn’t matter that all components of the “normal” network support 10 Gbit/s.

Data loader workstation resources

When transferring data to an AWS Snowball Edge device, it is important to note that the throughput achieved is highly dependent upon the available CPU resources of the machine doing the transfer.

Figure 4.4 – AWS Snowball Edge device loading from a laptop

In Figure 4.4, we can see that a reasonably powerful laptop with 8 CPU cores can transfer around 6 Gbit/s, even though there are effectively 10 Gbit/s available end to end on the network. Using a more powerful machine, particularly one with more CPU cores, we would expect the net throughput to rise.

Dec 20, 2023
Using AWS Snowball Edge – Addressing Disconnected Scenarios with AWS Snow Family

There is no longer a division between AWS Snowball and AWS Snowball Edge. Now, all such devices fall under the AWS Snowball Edge line, even if their intended use case is a straightforward data migration to S3.

There are four configurations with which an AWS Snowball Edge device can be ordered (see Figure 4.1):

                      | Storage Optimized w/80 TB | Compute Optimized Type 1 | Compute Optimized Type 2 (1) | Compute Optimized w/GPU
HDD in TB             | 80                        | 39.5                     | 39.5                         | 39.5
SSD in TB             | 1                         | 7.68                     | 0                            | 7.68
NVMe in TB            | 0                         | 0                        | 28                           | 0
vCPUs                 | 24                        | 52                       | 104                          | 52
vRAM in GB            | 80                        | 208                      | 416                          | 208
GPU type              | None                      | None                     | None                         | NVIDIA V100
10 Gbit RJ45 ports    | 1                         | 2                        | 2                            | 2
25 Gbit SFP ports     | 1                         | 1                        | 1                            | 1
100 Gbit QSFP ports   | 1                         | 1                        | 1                            | 1
Volume (in³)          | 5381                      | 5381                     | 5381                         | 5381
Weight (lbs)          | 47                        | 47                       | 47                           | 47
Power draw (avg)      | 304 W                     | 304 W                    | 304 W                        | 304 W
Power draw (max)      | 1200 W                    | 1200 W                   | 1200 W                       | 1200 W
Voltage range         | 100-240 V                 | 100-240 V                | 100-240 V                    | 100-240 V

(1) At the time of writing, this variant is limited to US-based regions only.

Table 4.1 – Comparison of AWS Snowball Edge variants

The AWS Snowball Edge Storage Optimized variant is now used for data migrations in place of the old AWS Snowball. There is a local S3 endpoint to which files can be directly copied using AWS OpsHub, the AWS Command Line Interface (AWS CLI), or direct API commands from a script.

The local compute capacity can be used to host an AWS DataSync instance, an AWS Tape Gateway instance, an AWS File Gateway instance, or another instance that provides a different type of loading interface of your choosing.

Migrating data to the cloud

Table 4.2 illustrates how long migrations of varying sizes would take depending upon the network throughput:

              | 50 Mbps    | 100 Mbps   | 1 Gbps     | 2 Gbps     | 5 Gbps     | 10 Gbps    | 25 Gbps    | 40 Gbps   | 100 Gbps
50 Terabytes  | 3.3 months | 1.7 months | 5 days     | 2.5 days   | 1 day      | 12 hours   | 5 hours    | 3 hours   | 1 hour
500 Terabytes | 2.8 years  | 1.4 years  | 1.7 months | 25 days    | 10 days    | 5 days     | 2 days     | 1.25 days | 12 hours
5 Petabytes   | 28.5 years | 14.3 years | 1.4 years  | 8.5 months | 3.4 months | 1.7 months | 20 days    | 12 days   | 5 days
10 Petabytes  | 57 years   | 28.5 years | 2.8 years  | 1.4 years  | 6.8 months | 3.4 months | 1.3 months | 24 days   | 10 days

Table 4.2 – Comparison of migration times
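As a sanity check on these figures: 50 terabytes is roughly 400 terabits (50 × 8), and at a sustained 1 Gbit/s that works out to about 400,000 seconds, or roughly 4.6 days – in line with the 5 days shown in the table once protocol overhead and less-than-perfect link utilization are accounted for.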

Many organizations don’t have high-throughput internet connections that could be fully dedicated to a migration, nor do they have access to, or familiarity with, the techniques needed to fully utilize such a connection once latency rises above a few milliseconds.

This is why loading one or more devices connected to a local network and physically shipping them to AWS is so popular – despite the days the devices spend on a truck at either end:

Figure 4.2 – An AWS Snowball Edge device being loaded with data

Nov 19, 2023
Introduction to the AWS Snow Family – Addressing Disconnected Scenarios with AWS Snow Family

In today’s interconnected world, reliable connectivity is often taken for granted. However, there are numerous scenarios where maintaining a consistent network connection is a challenge, such as remote locations, disaster-stricken areas, or environments with limited or intermittent network access. In these disconnected scenarios, organizations require a solution that can ensure data availability, enable efficient data processing, and support critical operations. This is where the AWS Snow Family comes into play, providing a range of robust and versatile solutions designed specifically to address the unique requirements of disconnected environments.

In this chapter, we will explore how the AWS Snow Family empowers organizations to overcome the limitations of disconnected scenarios and seamlessly bridge the gap between on-premises infrastructure and the cloud. We will delve into the features and capabilities of AWS Snow Family offerings and discuss their use cases, benefits, and considerations. Whether it’s securely transferring large amounts of data, performing on-site data processing and analysis, or extending cloud services to the edge, the AWS Snow Family offers reliable, scalable, and cost-effective solutions that cater to the needs of disconnected environments. Join us as we discover the power of AWS Snow to enable data-driven decision-making and unlock new possibilities in disconnected scenarios.

Here are the main headings:

Introduction to the AWS Snow Family

Using AWS Snowball Edge

Using AWS Snowcone

Introduction to the AWS Snow Family

The original AWS Snowball service was introduced in 2015. It started out as a mechanism to move large amounts of data when doing so over the network wasn’t reasonable. In the ensuing years, customer demand for new capabilities has driven the expansion of this line into different variants with use-case-specific capabilities:

Figure 4.1 – AWS Snow Family devices

All offer an interface and operating model that is consistent with Amazon EC2 and Amazon S3, and they are all designed to run autonomously. All AWS Snow Family devices operate their own local control, management, and data planes. Thus, they do not require a consistent network connection back to the AWS cloud to operate.

AWS Snow Family devices can all host local object storage buckets that utilize the same API/CLI interface as Amazon S3 buckets. When a customer orders one, it is sent to them, they copy their data to these local buckets, and then they ship the unit back to AWS. Return shipping is facilitated by an e-ink display on the unit, which eliminates the need to pack it in a box or obtain a shipping label separately. When the device is received by AWS, the data is uploaded to the relevant “real” version of the Amazon S3 bucket in question.

Additionally, AWS Snow Family devices do not have the same restrictive environmental requirements as most off-the-shelf compute and storage hardware. AWS Snow Family devices are found operating in a wide variety of field situations that would be impractical with standard off-the-shelf servers. First responders heading to the site of a disaster can even check them in as luggage.
