Aug 19, 2024
DNIs – Addressing Disconnected Scenarios with AWS Snow Family

Direct network interfaces (DNIs) were introduced to AWS Snow Family devices to support advanced network use cases. DNIs provide Layer 2 network access without any translation or filtering, enabling features such as multicast streams, transitive routing, and load balancing. This direct access improves network performance and allows for customized network configurations.

DNIs support VLAN tags, enabling network segmentation and isolation within the Snow Family device. Additionally, the MAC address can be customized for each DNI, providing further flexibility in network configuration:

Figure 4.18 – AWS Snowball Edge device with one DNI

DNIs and security groups

It’s important to note that traffic on DNIs is not protected by security groups, so additional security measures need to be implemented at the application or network level.

Snowball Edge devices support DNIs on all types of physical Ethernet ports, with each port capable of accommodating up to seven DNIs. For example, RJ45 port #1 can have seven DNIs, with four DNIs mapped to one EC2 instance and three DNIs mapped to another instance. RJ45 port #2 could simultaneously accommodate an additional seven DNIs for other EC2 instances.
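The seven-DNIs-per-port constraint can be sanity-checked with a small sketch before you configure anything. The port and instance names below are made up for illustration; the only real rule encoded is the per-port limit described above:

```python
# Validate a planned DNI layout against the per-port limit of seven.
MAX_DNIS_PER_PORT = 7

def validate_dni_plan(plan):
    """plan maps a physical port name to a list of (instance_id, dni_count)."""
    for port, assignments in plan.items():
        total = sum(count for _, count in assignments)
        if total > MAX_DNIS_PER_PORT:
            raise ValueError(f"{port}: {total} DNIs exceeds the limit of {MAX_DNIS_PER_PORT}")
    return True

# The example from the text: RJ45 #1 carries 4 + 3 DNIs, RJ45 #2 carries 7 more.
plan = {
    "rj45-1": [("instance-a", 4), ("instance-b", 3)],
    "rj45-2": [("instance-c", 7)],
}
print(validate_dni_plan(plan))  # True
```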

Note that the Storage Optimized variant of AWS Snowball Edge does not support DNIs:

Figure 4.19 – AWS Snowball Edge network flows with DNIs

Looking at Figure 4.19, we can see that al2-1 has two Ethernet interfaces configured inside Linux. One is on the typical 34.223.14.128/25 subnet, but the other sits directly on the 192.168.100.0/24 RFC 1918 space. A configuration such as this is the only time an interface on an EC2 instance on an AWS Snow Family device should be configured for any subnet other than 34.223.14.128/25.

Figure 4.20 shows what a DNI looks like from the perspective of the EC2 instance that has one attached:

Figure 4.20 – DNI details under Amazon Linux 2

Storage allocation

All AWS Snowball Edge device variants work the same way with respect to storage allocation. Object or file storage draws from the device’s HDD storage capacity, while block volumes used by EC2 instances can be drawn from either the device’s HDD or SSD capacity. Figure 4.21 shows an example of this:

Figure 4.21 – Storage allocation on AWS Snowball Edge

S3 buckets on a device can be thought of as being thin-provisioned in the sense that they start out consuming 0 bytes, and as objects are added, they only take the amount needed for those objects from the HDD capacity.

Block volumes for EC2 instances, on the other hand, can be thought of as thick-provisioned. When a volume is created, a capacity is specified, and that capacity is immediately removed from the pool, making it unavailable for any other use.
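The thin-versus-thick distinction can be captured in a minimal model. This is illustrative only – not an AWS API – but it shows why a large block volume reduces free capacity the moment it is created, while a bucket consumes capacity only as objects arrive:

```python
# Toy model of a device storage pool: thick block volumes reserve
# capacity up front; objects consume it only as they are written.
class StoragePool:
    def __init__(self, capacity_tb):
        self.capacity_tb = capacity_tb
        self.used_tb = 0.0

    def _take(self, tb):
        if self.used_tb + tb > self.capacity_tb:
            raise RuntimeError("insufficient capacity")
        self.used_tb += tb

    def create_block_volume(self, size_tb):
        self._take(size_tb)   # thick: the full size is reserved immediately

    def put_object(self, size_tb):
        self._take(size_tb)   # thin: buckets start at 0 bytes and grow

pool = StoragePool(capacity_tb=80)
pool.create_block_volume(10)  # 10 TB gone from the pool right away
pool.put_object(0.5)          # objects only take what they store
print(pool.used_tb)           # 10.5
```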

Jul 4, 2024
AWS SNOW “PRIVATE” SUBNET – 34.223.14.128/25 – Addressing Disconnected Scenarios with AWS Snow Family

Something you will notice right away in Figure 4.15 is that the EC2 instances are configured for an internal network of 34.223.14.128/25, which is a routable prefix on the internet. At the same time, the “public” IPs mapped to them on their virtual network interfaces (VNIs) live on 192.168.100.0/24 – a non-routable RFC 1918 address space. This is counter-intuitive and the opposite of how public subnets work inside an AWS region.

Rest assured, this is done for a reason. AWS owns the 34.223.14.128/25 space, and it is not actually used on the internet. AWS chose this approach to make deployment of Snow Family devices simpler by ensuring the default private subnet is never the same as whatever RFC 1918 address space a customer is using.

Note that while you can make the “public” subnet of the VNIs live on whatever you wish, it is not possible to change the “private” subnet on any AWS Snow Family device – it is always 34.223.14.128/25.
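Python’s standard ipaddress module makes it easy to see exactly what this fixed subnet provides – including the fact that, despite being used as a “private” network here, it is technically public address space:

```python
import ipaddress

# The fixed internal subnet used by all AWS Snow Family devices.
subnet = ipaddress.ip_network("34.223.14.128/25")
hosts = list(subnet.hosts())

print(subnet.is_private)    # False - it is technically routable space
print(hosts[0], hosts[-1])  # 34.223.14.129 34.223.14.254
print(len(hosts))           # 126 usable addresses
```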

Two VNIs sharing one physical Ethernet port

In certain situations, physical constraints may prevent the ideal configuration of separating VNIs onto different Ethernet ports on an AWS Snowball Edge device:

Figure 4.16 – AWS Snowball Edge device with two VNIs on a single PNI

Figure 4.17 illustrates the two network paths possible from EC2 instance al2-1. This instance can connect to devices outside the AWS Snowball Edge environment via VNI 1, which is configured as a 1-1 NAT entry mapping 192.168.100.210 to the IP the EC2 instance has configured internally, 34.223.14.193:

Figure 4.17 – AWS Snowball Edge network flows with VNIs

At the same time, al2-1 can communicate directly to centos-1 across the AWS Snowball Edge device’s internal subnet of 34.223.14.128/25.
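The 1-1 NAT relationship a VNI performs can be pictured as a simple lookup table. This is only an illustration of the mapping concept, not an AWS API; the addresses mirror the figures in this section:

```python
# One VNI = one static mapping between a "public" IP and an internal IP.
vni_nat = {"192.168.100.210": "34.223.14.193"}  # VNI 1 -> al2-1 internal IP

def inbound(public_ip):
    # Traffic arriving at the VNI's RFC 1918 "public" address is
    # forwarded to the instance's internal 34.223.14.128/25 address.
    return vni_nat[public_ip]

def outbound(internal_ip):
    # Replies are translated back in the opposite direction.
    reverse = {v: k for k, v in vni_nat.items()}
    return reverse[internal_ip]

print(inbound("192.168.100.210"), outbound("34.223.14.193"))
```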

VLANs on AWS Snow Family

It is possible for a Snow Family device to have VNIs that share the same physical Ethernet port configured for two different RFC 1918 subnets through the use of VLAN tagging. This helps to mitigate some security concerns expressed by customers, but be aware: instances will always be able to talk directly on the internal 34.223.14.128/25 subnet. It is therefore important that security groups are used to limit this.

Feb 24, 2024
Targets available on AWS Snowball Edge for data loading – Addressing Disconnected Scenarios with AWS Snow Family

There are several types of targets available on an AWS Snowball Edge device that you can use to load data.

NFS endpoint on the AWS Snowball Edge device

This option allows users to access and manage data on the Snowball Edge device using the familiar NFS protocol. This means you can easily mount the Snowball Edge device as a network file share, similar to mounting a NAS device. You can then perform standard file operations such as reading, writing, moving, and deleting files using drag and drop, just as you would on a departmental file share. Linux and macOS both have NFS support built in, while Windows requires installation of the Services for NFS optional component or a third-party NFS client.

This is generally the most convenient method and the most readily understood. Standard client-side tools such as rsync, xcopy, Robocopy, or the like can be used with no modifications.

This target has a practical maximum throughput of around 3 Gbit/s.
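As a hedged sketch, the mount-and-copy workflow above can be expressed as commands built in Python. The /buckets/&lt;bucket-name&gt; export path, IP, and mount point are assumptions for illustration – confirm the actual export path that AWS OpsHub displays for your device:

```python
# Build the commands for mounting the device's NFS export and copying
# data into it. The export path is an assumption; check OpsHub for the
# real one on your device. Run the results with subprocess.run(..., check=True).
def mount_cmd(device_ip, bucket, mount_point):
    return ["mount", "-t", "nfs", f"{device_ip}:/buckets/{bucket}", mount_point]

def rsync_cmd(src_dir, mount_point):
    # -a preserves permissions/timestamps; --progress reports per file
    return ["rsync", "-a", "--progress", src_dir, mount_point]

print(" ".join(mount_cmd("192.168.1.50", "my-bucket", "/mnt/snowball")))
```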

S3 endpoint on the AWS Snowball Edge device

All members of the AWS Snow Family have a local version of the same sort of S3 endpoint as you would work with in a region. You simply target the S3 endpoint IP on the AWS Snowball Edge device with commands from the AWS CLI or your own code (for instance, a Python script using boto3):

Figure 4.5 – S3 endpoint on an AWS Snowball Edge device

You can also target this local endpoint with third-party programs that know how to work with S3 – common examples include enterprise backup software packages such as Veeam or Commvault.

This target can ingest at speeds in excess of 20 Gbit/s. However, this requires considerable optimization of the client-side transfer mechanism to achieve.
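A hedged sketch of targeting the device-local S3 endpoint from Python follows. The device IP, port, bucket name, and credential setup are placeholders for your environment (8443 is the usual HTTPS port for the S3 adapter, but verify this for your device and software version); pushing many objects in parallel is the main client-side lever for approaching the higher speeds noted above:

```python
# Point boto3 at the Snowball Edge's local S3 endpoint and upload in
# parallel. Requires the boto3 package plus the device's credentials.
from concurrent.futures import ThreadPoolExecutor

def endpoint_url(device_ip, port=8443):
    # Builds the endpoint URL passed to the AWS SDK or the CLI's
    # --endpoint option. Port 8443 is an assumption to verify.
    return f"https://{device_ip}:{port}"

def snowball_s3_client(device_ip):
    import boto3  # local import; needs the boto3 package installed
    return boto3.client("s3", endpoint_url=endpoint_url(device_ip),
                        verify=False)  # or point verify= at the device cert

def upload_many(client, bucket, paths, workers=16):
    # Many concurrent streams keep the device's ingest pipeline busy.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda p: client.upload_file(p, bucket, p), paths))

print(endpoint_url("192.168.26.222"))  # https://192.168.26.222:8443
```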

EC2 instance running on the AWS Snowball Edge device

Another approach is to bypass the native endpoints on the device altogether by spinning up an EC2 instance on it:

Figure 4.6 – EC2 instances running on an AWS Snowball Edge device

That instance could run any third-party data transfer software you want, and the limitations on throughput would be specific to that vendor’s software.

AWS DataSync agent

The AWS DataSync agent is a special kind of EC2 instance you can spin up on an AWS Snow Family device. It is important to note that this type of target pulls data rather than having data pushed to it, as all of the others do. DataSync supports pulling data from the following types of shared storage in your on-premises environment:

NFS exports

Windows Server shares (CIFS/Server Message Block (SMB))

Hadoop Distributed File System (HDFS)

Self-managed object stores (some NAS devices can host S3-compatible stores)

Figure 4.7 – Launching the DataSync agent from OpsHub

You create DataSync tasks inside the AWS Management Console that tell the agent how to access these resources in your environment, when to pull files, how much bandwidth to consume, and whether any manipulations need to be done in the process. The agent optimizes the data transfer process by employing techniques such as parallelization, data deduplication, and delta detection to minimize transfer times and optimize bandwidth usage.

A single DataSync task is capable of relaying data to an AWS region at 10 Gbit/s. However, this is dependent upon the resources available within the instance type chosen when the agent is deployed onto the device. At a minimum, an instance type with 2 vCPUs must be used. The more vCPUs the agent has at its disposal, the more it can parallelize the transfer and attain higher speeds.

Jan 27, 2024
End-to-end network throughput – Addressing Disconnected Scenarios with AWS Snow Family

Of course, before starting any migration, even to a local device, one must evaluate all of the physical network links involved end to end. Having the AWS Snowball Edge device connected to a 40 GbE switchport via Quad Small Form-factor Pluggable (QSFP) won’t do much good if an upstream network link operates at a single gigabit:

Figure 4.3 – A full end-to-end throughput path

Additionally, there can be choke points on backend Storage Area Network (SAN) fabrics, disk arrays, Network-Attached Storage (NAS) devices, or virtualization software somewhere in the middle. In Figure 4.3, for example, the data being copied ultimately resides inside Virtual Machine Disk (VMDK) files on an aging SAN array attached via Fibre Channel (FC) to a server running VMware ESXi.

From the laptop’s perspective, the data is being copied over Common Internet File System (CIFS) from one of the VMware VMs, but in reality, there is a virtualization layer and yet another layer of networking behind that. If, for whatever reason, that SAN array’s controller or disk group could only push 4 Gbit/s to the VMware host, it simply doesn’t matter that all components of the “normal” network support 10 Gbit/s.
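This “weakest link” reasoning is just a minimum taken over all the hops. The link speeds below mirror the Figure 4.3 scenario and are illustrative:

```python
# The end-to-end ceiling is simply the slowest hop in the path.
links_gbps = {
    "laptop NIC": 10,
    "access switch": 10,
    "VMware host uplink": 10,
    "SAN array controller": 4,   # the hidden choke point behind the VM
    "Snowball Edge port": 40,
}

bottleneck = min(links_gbps, key=links_gbps.get)
print(bottleneck, links_gbps[bottleneck])  # SAN array controller 4
```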

Data loader workstation resources

When transferring data to an AWS Snowball Edge device, it is important to note that the throughput achieved is highly dependent upon the available CPU resources of the machine doing the transfer.

Figure 4.4 – AWS Snowball Edge device loading from a laptop

In Figure 4.4, we can see that a reasonably powerful laptop with 8 CPU cores can transfer around 6 Gbit/s, even though there are effectively 10 Gbit/s available end to end on the network. Using a more powerful machine, particularly one with more CPU cores, we would expect the net throughput to rise.

Dec 20, 2023
Using AWS Snowball Edge – Addressing Disconnected Scenarios with AWS Snow Family

There is no longer a division between AWS Snowball and AWS Snowball Edge. Now, all such devices fall under the AWS Snowball Edge line, even if their intended use case is a straightforward data migration to S3.

There are four configurations with which an AWS Snowball Edge device can be ordered (see Figure 4.1):

                      Storage Optimized  Compute Optimized  Compute Optimized  Compute Optimized
                      w/80 TB            Type 1             Type 2 (1)         w/GPU
HDD (TB)              80                 39.5               39.5               39.5
SSD (TB)              1                  7.68               0                  7.68
NVMe (TB)             0                  0                  28                 0
vCPUs                 24                 52                 104                52
RAM (GB)              80                 208                416                208
GPU type              None               None               None               NVIDIA V100
10 Gbit RJ45 ports    1                  2                  2                  2
25 Gbit SFP ports     1                  1                  1                  1
100 Gbit QSFP ports   1                  1                  1                  1
Volume (in³)          5,381              5,381              5,381              5,381
Weight (lbs)          47                 47                 47                 47
Power draw (avg)      304 W              304 W              304 W              304 W
Power draw (max)      1,200 W            1,200 W            1,200 W            1,200 W
Voltage range         100-240 V          100-240 V          100-240 V          100-240 V

(1) At the time of writing, this variant is limited to US-based regions only.

Table 4.1 – Comparison of AWS Snowball Edge variants

The AWS Snowball Edge Storage Optimized variant is now used for data migrations in place of the old AWS Snowball. There is a local S3 endpoint to which files can be directly copied using AWS OpsHub, the AWS Command Line Interface (AWS CLI), or direct API commands from a script.

The local compute capacity can be used to host an AWS DataSync instance, an AWS Tape Gateway instance, an AWS File Gateway instance, or another instance that provides a different type of loading interface of your choosing.

Migrating data to the cloud

Table 4.2 illustrates how long migrations of varying sizes would take depending upon the network throughput:

               50 Mbps     100 Mbps    1 Gbps      2 Gbps      5 Gbps      10 Gbps     25 Gbps     40 Gbps    100 Gbps
50 Terabytes   3.3 months  1.7 months  5 days      2.5 days    1 day       12 hours    5 hours     3 hours    1 hour
500 Terabytes  2.8 years   1.4 years   1.7 months  25 days     10 days     5 days      2 days      1.25 days  12 hours
5 Petabytes    28.5 years  14.3 years  1.4 years   8.5 months  3.4 months  1.7 months  20 days     12 days    5 days
10 Petabytes   57 years    28.5 years  2.8 years   1.4 years   6.8 months  3.4 months  1.3 months  24 days    10 days

Table 4.2 – Comparison of migration times
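The table’s arithmetic is easy to reproduce: time is simply data volume divided by line rate. Real transfers rarely sustain 100% utilization, so treat the results as best-case floors:

```python
# Best-case transfer time: bits to move divided by bits per second.
def transfer_days(terabytes, gbps):
    bits = terabytes * 8 * 10**12
    seconds = bits / (gbps * 10**9)
    return seconds / 86400

print(round(transfer_days(50, 1), 1))    # ~4.6 days (table rounds to 5 days)
print(round(transfer_days(500, 10), 1))  # ~4.6 days (table: 5 days)
```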

Many organizations don’t have high-throughput internet connections that could be fully dedicated to a migration. Nor do they have access to, or familiarity with, the techniques needed to fully utilize such a connection once the latency climbs above a few milliseconds.

This is why loading one or more devices connected to a local network and physically shipping them to AWS is so popular – despite the days the devices spend on a truck at either end:

Figure 4.2 – An AWS Snowball Edge device being loaded with data

Nov 19, 2023
Introduction to the AWS Snow Family – Addressing Disconnected Scenarios with AWS Snow Family

In today’s interconnected world, reliable connectivity is often taken for granted. However, there are numerous scenarios where maintaining a consistent network connection is a challenge, such as remote locations, disaster-stricken areas, or environments with limited or intermittent network access. In these disconnected scenarios, organizations require a solution that can ensure data availability, enable efficient data processing, and one that will support critical operations. This is where the AWS Snow Family comes into play, providing a range of robust and versatile solutions designed specifically to address the unique requirements of disconnected environments.

In this chapter, we will explore how the AWS Snow Family empowers organizations to overcome the limitations of disconnected scenarios and seamlessly bridge the gap between on-premises infrastructure and the cloud. We will delve into the features and capabilities of AWS Snow Family offerings and discuss their use cases, benefits, and considerations. Whether it’s securely transferring large amounts of data, performing on-site data processing and analysis, or extending cloud services to the edge, the AWS Snow Family offers reliable, scalable, and cost-effective solutions that cater to the needs of disconnected environments. Join us as we discover the power of AWS Snow to enable data-driven decision-making and unlock new possibilities in disconnected scenarios.

Here are the main headings:

Introduction to the AWS Snow Family

Using AWS Snowball Edge

Using AWS Snowcone

Introduction to the AWS Snow Family

The original AWS Snowball service was introduced in 2015. It started out as a mechanism to move large amounts of data when doing so over the network wasn’t reasonable. In the ensuing years, customer demand for new capabilities has driven the expansion of this line into different variants with use-case-specific capabilities:

Figure 4.1 – AWS Snow Family devices

All offer an interface and operating model that is consistent with Amazon EC2 and Amazon S3, and they are all designed to run autonomously. All AWS Snow Family devices operate their own local control, management, and data planes. Thus, they do not require a consistent network connection back to the AWS cloud to operate.

AWS Snow Family devices can all host local object storage buckets that utilize the same API/CLI interface as Amazon S3 buckets. When a customer orders one, it is sent to them, they copy their data to these local buckets, and then they ship the unit back to AWS. Return shipping is facilitated by an e-ink display on the unit, which eliminates the need to pack it in a box or obtain a shipping label separately. When the device is received by AWS, the data is uploaded to the relevant “real” version of the Amazon S3 bucket in question.

Additionally, AWS Snow Family devices do not have the same restrictive environmental requirements as most off-the-shelf compute and storage hardware. AWS Snow Family devices are found operating in a wide variety of field situations that would be impractical with standard off-the-shelf servers. First responders heading to the site of a disaster can even check them in as luggage.

Oct 29, 2023
Global Navigation Satellite System (GLONASS) – Understanding Network and Security for Far-Edge Computing

Contemporaneously with the rollout of the US’s GPS, the Soviet Union began deployment of a similar system known as GLONASS. The first satellite was launched in 1982; the system has since been developed by the Russian Federation and is operated by Roscosmos. Due to economic constraints in the 1990s and 2000s, followed by sanction-related obstacles in the 2010s, GLONASS has faced numerous challenges. However, it remains operational and available for anyone to use.

Compared to GPS, GLONASS is less accurate on average (though only slightly). That said, due to the different configuration of its orbits, GLONASS is a bit more accurate than GPS at high latitudes (such as within the Arctic or Antarctic circles).

Galileo

Created by the European Union via the European Space Agency, Galileo is a multinational effort to operate a global positioning system that provides independence from single-country control as is seen with GPS and GLONASS. The system went live in 2016 and currently operates 30 satellites in MEO.

At the time of writing, Galileo is the most accurate of the three global systems for the average user.

Regional and augmentation systems

In addition to the three global systems, there are a few regional and augmentation systems. These include the following:

Quasi-Zenith Satellite System (QZSS): Operated by Japan, QZSS uses a combination of satellites in geostationary and highly elliptical orbits to augment GPS, improving performance for terminals in Japan and the surrounding region.

Navigation with Indian Constellation (NavIC): Deployed by India, NavIC uses a combination of geostationary and geosynchronous satellites to improve positioning performance for terminals in South Asia.

Wide Area Augmentation System (WAAS): The US Federal Aviation Administration (FAA) operates three satellites in geostationary orbit to improve navigation for civilian aircraft in North America.

European Geostationary Navigation Overlay Service (EGNOS): A distinct system from Galileo, EGNOS is a set of three geostationary satellites that augment GPS for European users. Future plans include the ability to augment the Galileo system as well.

Other uses for GNSS

When a very precise clock source is needed – one accurate down to nanoseconds – expensive atomic clocks are one approach. However, because GNSS satellites carry one or more atomic clocks onboard, their signals can be used to indirectly gain access to a free atomic clock. For example, 5G NFV functions or virtual machines running a Software-Defined Radio (SDR) application require access to a precise physical clock. Network Time Protocol (NTP) and Precision Time Protocol (PTP) servers frequently save money by making use of GNSS signals.

Summary

In this chapter, we introduced you to elements that are common to all wireless communication technologies that are used at the far edge – concepts such as wavelength, frequency, duplexing, modulation, multipathing, and antenna design.

We built upon that by diving into cellular networking technologies such as 4G/LTE and 5G, reviewing the key advantages of 5G networks and how they enable new low-latency/high-throughput use cases. You were given a survey of LPWAN technologies such as LoRaWAN and NB-IoT, both of which are crucial to use cases such as smart agriculture, V2X, and smart cities.

Finally, we discussed the basics needed to understand SATCOM technologies and the services based on them – upon which the most remote edge computing use cases are dependent.

In the next chapter, we will cover the AWS Snow Family of services. These target remote/disconnected edge compute situations.

Sep 1, 2023
GEOMETRIC DILUTION OF PRECISION (GDOP) – Understanding Network and Security for Far-Edge Computing

GDOP is a calculated value that combines the impact of several factors related to the angle at which the ground station can reach the satellites into a single coefficient that expresses how accurate a calculated position is.

Referring back to the previous figure, we can see an example of good geometry of the satellites involved. They are spread across the sky in all three axes. Contrast that with the following situation. In this case, the user is in an area surrounded by mountains. The terminal has no choice but to use samples from satellites that are closer together in the sky, and the calculated position will be less accurate as a result:

Figure 3.43 – Poor geometry due to obstructions

Other sources of GNSS inaccuracy

Atmospheric refraction occurs when a satellite’s signal is bent slightly while traveling through the upper layers of the atmosphere. Sunspot activity can cause interference. Lower-quality receivers are more susceptible to measurement noise, which can occur even under perfect environmental conditions. And because the signals travel at the speed of light, a clock error of just 1 nanosecond (a billionth of a second) can introduce roughly 30 centimeters (about a foot) of imprecision.
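The range error from clock inaccuracy scales directly with the speed of light, which a one-line calculation makes concrete:

```python
# Range error per nanosecond of clock error: distance = c * t.
C_M_PER_S = 299_792_458          # speed of light, meters per second
error_m = C_M_PER_S * 1e-9       # distance light covers in 1 ns

print(round(error_m, 2))         # ~0.3 meters per nanosecond
```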

Urban environments pose a particular challenge to GNSSs. Not only is the geometry compromised by buildings, but the signals the user can receive are often reflected off of them – causing unwanted multipath propagation as previously discussed. If you’ve ever requested a ride from an app on your phone and wondered why the driver thinks you’re at a restaurant two streets away, these are likely culprits.

Global Positioning System (GPS)

The first satellite for what we now know as GPS was launched in 1978 by the United States Air Force. At first, only the US military had access to the system.

In 1983, pilots of a commercial flight from Alaska to Korea made a navigational error that took their aircraft into Soviet airspace over the Kamchatka Peninsula and Sakhalin Island. In response, a Soviet Su-15 interceptor shot down the Boeing 747, killing all 269 civilians onboard. To prevent future incidents, the US opened GPS for civilian use.

As of 2020, GPS is operated by the United States Space Force and remains open for anyone to use. At the time of writing, it has 32 satellites in a semi-synchronous21 medium Earth orbit (MEO) with an altitude of 20,200 kilometers (12,600 miles). The satellites are spread across six orbital planes, providing global ground coverage.

21 A semi-synchronous orbit is one in which the spacecraft passes over a given point on the Earth twice per day.

Aug 7, 2023
LOW-EARTH ORBIT (LEO) – Understanding Network and Security for Far-Edge Computing

LEO satellites are positioned in orbit around the Earth at an altitude of up to 2,000 kilometers (1,200 miles). Because of this, they are in constant motion relative to an observer.

LEO satellites are known for their ability to collectively provide coverage over a large area of the Earth’s surface, since they orbit the Earth relatively quickly (compared to GEO satellites). This allows a constellation to provide communication and other services to a large number of users, as well as to track the movement of objects on the surface of the Earth.

The primary technical advantage of LEO-based SATCOM systems is their much lower latency than GEO (as low as ~20 ms RTT). The main disadvantage stems from the fact that they are in constant motion relative to any given point on the ground. They must use mechanisms such as motorized tracking antennas (or complex phased-array antennas), and constellations must be of sufficient size to ensure users on the ground can always reach at least one satellite.

Here are some examples of LEO-based SATCOM services:

Certus 700: An L-band service from Iridium that supports speeds as high as 704 Kbps. It is served by 66 cross-linked satellites in LEO.

Starlink Roam: A Ka/Ku-band service from Starlink that supports speeds up to 200 Mbps. It is served by over 3,500 cross-linked satellites in LEO20, with plans to grow to as many as 12,000.

20 As of February 2023.

Global Navigation Satellite System (GNSS)

GNSS is an overarching term that includes all of the systems that use timing signals from satellite constellations to determine a position on the ground for navigation purposes.

GNSS for positioning

Trilateration

All satellite-based navigation systems discussed in this section determine a terminal’s position using trilateration. Unlike triangulation, it measures distances – not angles. Satellites in these systems repeatedly broadcast their current position and local time, derived from multiple onboard atomic clocks.

The following figure demonstrates a point on the ground receiving the same broadcast from four satellites:

Figure 3.42 – Trilateration using four satellites

From these four pieces of data, a terminal can calculate its position within a margin of error that varies from centimeters to hundreds of meters, depending on the circumstances.
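As a toy illustration of the principle, here is trilateration in two dimensions with three known transmitters. Real GNSS solves in three dimensions plus a receiver clock-bias term, which is why a fourth satellite is needed; the positions and distances below are invented for the example:

```python
# 2-D trilateration: subtracting pairs of circle equations
# |x - p_i|^2 = d_i^2 yields a linear system solvable by Cramer's rule.
def trilaterate(p1, d1, p2, d2, p3, d3):
    a11, a12 = 2 * (p2[0] - p1[0]), 2 * (p2[1] - p1[1])
    a21, a22 = 2 * (p3[0] - p1[0]), 2 * (p3[1] - p1[1])
    b1 = d1**2 - d2**2 + p2[0]**2 - p1[0]**2 + p2[1]**2 - p1[1]**2
    b2 = d1**2 - d3**2 + p3[0]**2 - p1[0]**2 + p3[1]**2 - p1[1]**2
    det = a11 * a22 - a12 * a21
    return ((b1 * a22 - a12 * b2) / det, (a11 * b2 - b1 * a21) / det)

# Receiver actually at (3, 4); distances measured to three transmitters.
x, y = trilaterate((0, 0), 5.0, (10, 0), 65**0.5, (0, 10), 45**0.5)
print(round(x, 6), round(y, 6))  # 3.0 4.0
```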

Jul 26, 2023
Satellite orbits – Understanding Network and Security for Far-Edge Computing

Geostationary orbit (GEO)

GEO satellites are positioned in orbit around the Earth at an altitude of about 35,786 kilometers (22,236 miles). They are designed to remain in a fixed location relative to a point on the Earth’s surface as they orbit the Earth at the same rate that the Earth rotates.

This makes things easy for ground-based users. There are mobile apps that will tell you exactly where in the sky to point your antenna, and then you’re done:

Figure 3.41 – GEO satellite distance

The downside is the high latency incurred when signals have to travel that far. The speed of light is fast, but it is finite: a signal needs roughly 120 milliseconds just to climb from the ground to a GEO satellite, and another 120 to come back down, so each request/response round trip spends nearly 480 ms in space. Factor in the latency of any ground segment and a 600ms RTT is considered typical.
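The underlying arithmetic can be sketched directly from the altitude figure above; the straight-line light time is a lower bound, since actual slant paths and processing add more:

```python
# Light time to GEO altitude and the resulting request/response RTT.
C_KM_PER_S = 299_792.458   # speed of light
GEO_ALT_KM = 35_786        # GEO altitude

one_hop_ms = GEO_ALT_KM / C_KM_PER_S * 1000   # ground -> satellite
rtt_ms = 4 * one_hop_ms                       # up/down out, up/down back

print(round(one_hop_ms))   # ~119 ms per hop
print(round(rtt_ms))       # ~477 ms before any ground-segment latency
```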

Here are some typical GEO-based SATCOM data services:

Broadband Global Area Network (BGAN): This is an L-band service from Inmarsat. It can achieve speeds up to 492 Kbps for standard IP data traffic and up to 800 Kbps for streaming data (usually video), although this depends heavily upon the terminal involved. Six geostationary satellites provide global coverage (including polar regions) for this service. It is extremely reliable, supporting a 99.9% uptime SLA.

Global Xpress (GX): This is a Ka-band service from Inmarsat. It can achieve download speeds up to 50 Mbps and upload speeds up to 5 Mbps. Five geostationary satellites provide near-global coverage.

European Aviation Network (EAN): This is a hybrid service comprised of a single Inmarsat S-band satellite in geostationary orbit above Europe and Vodafone’s terrestrial 4G/LTE network. Specifically built to provide data services onboard aircraft in European airspace, it supports data rates as high as 100 Mbps. Aircraft use the terrestrial network below 10,000 feet and switch to the S-band service above this altitude.

ViaSat-3: This is a Ka-band service that uses a constellation of three geostationary satellites operated by ViaSat. Each satellite serves a specific region (AMER, EMEA, or APAC) and has a total network capacity greater than 1 terabit per second. Typical consumer plans are 100 Mbps, while contracts for defense and commercial entities can be higher.

GEO HTS: This is a Ku-band service from SES that can achieve speeds up to 10 Mbps. It has near-global coverage using four satellites in geostationary orbit.

FlexGround: This is a Ku-band service from Intelsat that supports download speeds up to 10 Mbps and upload speeds up to 3 Mbps. Being one of the pioneers in SATCOM19, Intelsat has over 50 satellites in geostationary orbit.

19 Intelsat launched its first satellite in 1965.
