Deploy SAS Grid 9.4 on Azure NetApp Files

SAS analytics software provides a suite of services and tools for drawing insights from data and making intelligent decisions. SAS solutions provide analytics, artificial intelligence, business intelligence, customer intelligence, data management, and fraud and security intelligence.

If you're deploying SAS Grid on Azure, Azure NetApp Files is a viable primary storage option. When you use the scalable services of Azure NetApp Files, you can scale the storage allocations up or down at any time without interruption to the services. You can also adjust the storage service level to the performance requirements dynamically.

SAS offers these primary platforms, which Microsoft has validated:

  • SAS Grid 9.4
  • SAS Viya

SAS Grid 9.4 has been validated on Linux.

This article provides general information for running SAS Grid 9.4 on Azure, using Azure NetApp Files for SASDATA storage. It also provides guidance on storage options for SASWORK. These guidelines are based on the assumption that you host your own SAS solution on Azure, in your own tenant. SAS doesn't provide hosting for SAS Grid on Azure.

Architecture

Diagram that shows an architecture for running SAS Grid on Azure.

Download a PowerPoint file of all diagrams in this article.

Dataflow

The compute tier uses SASDATA (and optionally SASWORK) volumes to share data across the grid. SASDATA is an NFS-connected volume on Azure NetApp Files.

  • A compute node reads input data from SASDATA and writes results back to SASDATA.
  • A subsequent part of the analytics job can be run by another node in the compute tier. It uses the same procedure to obtain and store the information that it needs to process.

Potential use cases

A scalable SAS Grid deployment that uses Azure NetApp Files is applicable to these use cases:

  • Financial analytics
  • Fraud detection
  • Tracking and protection of endangered species
  • Science and medicine
  • Analytics and AI

Requirements for storage performance

For SAS 9.4 (SAS Grid or SAS Analytics Pro) deployments on Azure, Azure NetApp Files is a viable primary storage option for SAS Grid clusters of limited size. SAS recommends 100 MiB/s throughput per physical core. Given that recommendation, SAS Grid clusters that use an Azure NetApp Files volume for SASDATA (persistent SAS data files) are scalable to 32 to 48 physical cores across two or more Azure virtual machines. SAS cluster sizes are based on the architectural constraint of a single SASDATA namespace per SAS cluster and the available single Azure NetApp Files volume bandwidth. The core count guidance will be revisited as Azure infrastructure (compute, network, and per file system storage bandwidth) increases over time.
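
As a quick illustration of that guidance (a sketch only; the per-core figure comes from the SAS recommendation above, and the cluster size is an example value):

# Illustrative sizing check based on the 100 MiB/s-per-physical-core guidance.
PHYSICAL_CORES=32       # example: total physical cores across the SAS Grid compute tier
PER_CORE_MIBS=100
echo "Required SASDATA throughput: $(( PHYSICAL_CORES * PER_CORE_MIBS )) MiB/s"
# Compare the result with the ~3,000 MiB/s combined (80:20) throughput of a single regular
# Azure NetApp Files volume (next section); clusters beyond roughly 32-48 physical cores
# outgrow a single volume.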

Azure NetApp Files volume performance expectations

A single Azure NetApp Files volume can handle up to 4,500 MiB/s of reads and 1,500 MiB/s of writes. Given an Azure instance type with sufficient egress bandwidth, a single virtual machine can consume all the write bandwidth of a single Azure NetApp Files volume. However, only the largest single virtual machine can consume all the read bandwidth of a single volume.

SASDATA, the main shared workload of SAS 9.4, has an 80:20 read/write ratio. The important per-volume numbers for an 80:20 workload with 64-KiB reads and writes are:

  • 2,400 MiB/s of read throughput and 600 MiB/s of write throughput running concurrently (~3,000 MiB/s combined).

For more information, see Azure NetApp Files performance benchmarks for Linux.

Note

The Azure NetApp Files large volumes feature is now available. Large volumes provide higher per-volume throughput than regular Azure NetApp Files volumes. Consider large volumes if your SASDATA (or SASWORK) volumes need more performance than a regular volume provides. For details, see the large volumes documentation.

Capacity recommendations

The Azure NetApp Files performance calculator can provide guidance for sizing SASDATA volumes.

It's important to choose an appropriate service level because:

  • Volume bandwidth is based on volume capacity.
  • Capacity cost is based on the service level.
  • Your choice of service level is based on capacity versus bandwidth needs.

In the calculator, select advanced, select a region, and enter the following values.

  • Volume size: Desired capacity
  • Throughput: Desired throughput, considering 100 MiB/s per core
  • Read percentage: 80%
  • IOPS: 0
  • I/O size: 64KiB Sequential

The output at the bottom of the screen provides recommended capacity requirements at each service level and the cost per month, based on the price for the selected region:

  • Throughput. The bandwidth of the volume, based on the workload mix. For an 80% 64-KiB sequential read workload, 3,096 MiB/s is the expected maximum.
  • IOPS. The number of IOPS the volume provides at the specified throughput.
  • Volume Size. The amount of capacity needed by the volume at the given service levels to achieve the required throughput. Volume capacity (reported in GiBs) can be equal to or less than capacity pool size. This recommendation is based on the assumption that you're using automatic QoS capacity pool types. To further optimize capacity versus throughput distribution across volumes within a capacity pool, consider manual QoS capacity pool types.
  • Capacity Pool Size. The pool size. A volume's capacity is carved from a capacity pool. Capacity pools are sized in 1-TiB increments.
  • Capacity Pool Cost (USD/month). The cost per month of the capacity pool at the given size and service level.
  • Volume Show Back (USD/month). The cost per month of the capacity for the volume at the specified size. Charges are based on the allocated capacity pool size; the volume show back indicates the share of that cost attributable to the volume.

Note

The user experience is the same regardless of the service level, as long as sufficient bandwidth is provisioned.

Control costs as needed by using volume shaping in Azure NetApp Files. Two dynamic options are available to influence performance and cost:

  • Dynamically resize volumes and capacity pools to match changing capacity requirements.
  • Dynamically change the service level of a volume to match changing performance requirements.

Learn more about the Azure NetApp Files cost model.

Data protection

Azure NetApp Files uses snapshots to help you protect your data. Snapshots provide space-efficient, crash-consistent, near-instantaneous images of your Azure NetApp Files volumes. You can create snapshots manually at any time or schedule them by using a snapshot policy on the volume.

Use a snapshot policy to add automated data protection to your volumes. You can restore snapshots in place quickly by using snapshot revert. Or you can restore a snapshot to a new volume for fast data recovery. You can also use restore to new volume functionality to provide test/dev environments with current data.
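
For example, with the Azure CLI you can create a snapshot policy and attach it to the SASDATA volume. Treat the following as a hedged sketch: the resource names are placeholders, and parameter names can differ between az netappfiles versions, so check az netappfiles snapshot policy create --help before you rely on it.

# Sketch: daily snapshots at 01:00, keep 7; then attach the policy to the SASDATA volume.
az netappfiles snapshot policy create \
  --resource-group sas-rg \
  --account-name sas-anf-account \
  --snapshot-policy-name sasdata-daily \
  --location westeurope \
  --daily-snapshots 7 --daily-hour 1 --daily-minute 0 \
  --enabled true

POLICY_ID=$(az netappfiles snapshot policy show \
  --resource-group sas-rg \
  --account-name sas-anf-account \
  --snapshot-policy-name sasdata-daily \
  --query id -o tsv)

az netappfiles volume update \
  --resource-group sas-rg \
  --account-name sas-anf-account \
  --pool-name sas-pool \
  --name sasdata \
  --snapshot-policy-id "$POLICY_ID"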

For extra levels of data protection, you can use data protection solutions that use Azure NetApp Files backup or partner backup software.

Components

  • Azure Virtual Machines: SAS Grid requires high memory, storage, and I/O bandwidth, in an appropriate ratio with the number of cores. Azure offers predefined virtual machine (VM) sizes with lower vCPU counts that can help to balance the number of cores required with the amount of memory, storage, and I/O bandwidth.

    For more information, see Constrained vCPU capable VM sizes. It's important to thoroughly understand what compute resources are available with each instance. To run SAS Grid on Azure with Azure NetApp Files, we recommend these instance types:

    • Standard_E64-16ds_v4 or Standard_E64-16ds_v5
    • Standard_E64-32ds_v4 or Standard_E64-32ds_v5

    Be sure to review the best practices for using SAS on Azure, including the updates in the comments.

  • Azure NetApp Files: You can store SASDATA on an Azure NetApp Files volume, shared across the compute cluster.

    You can optionally also use Azure NetApp Files NFS volumes for SASWORK.

    Azure NetApp Files is available in three performance service levels:

    • Standard
    • Premium
    • Ultra

    Your volume performance is mostly defined by the service level. The size of your volume is also a factor, because the obtainable throughput is determined by the service level and the size of the volume.
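
    As a rough guide (an illustrative sketch, not an official sizing tool): the current published per-TiB throughput limits are about 16 MiB/s for Standard, 64 MiB/s for Premium, and 128 MiB/s for Ultra, so you can estimate the provisioned size a volume needs for a target throughput. Verify the current limits in the Azure NetApp Files documentation before sizing.

    # Illustrative only: estimate the provisioned TiB needed for a target throughput
    # at each service level, using approximate MiB/s-per-TiB figures.
    TARGET_MIBS=3000   # example SASDATA target for an 80:20 workload (see the earlier section)
    for level in "Standard 16" "Premium 64" "Ultra 128"; do
      set -- $level
      echo "$1: ~$(( (TARGET_MIBS + $2 - 1) / $2 )) TiB provisioned"
    done
    # Approximate output: Standard ~188 TiB, Premium ~47 TiB, Ultra ~24 TiB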

Storage options for SASDATA

Because Azure NetApp Files provides high-throughput, low-latency access to storage, it's a viable, and faster, alternative to premium managed disks. Network-attached storage isn't throttled at the VM level the way managed disks are, so you get higher throughput to storage.

To estimate the required tier for your SASDATA capacity, use the Azure NetApp Files Performance Calculator. (Be sure to select advanced.)

Because Azure NetApp Files NFS volumes are shared, they're a good candidate for hosting SASDATA, when used with the properly sized VM instance types and Red Hat Enterprise Linux (RHEL) distribution, discussed later in this article.

Storage options for SASWORK

The following table shows the most common storage options for deploying SASWORK on Azure. Depending on your size (capacity) and speed (bandwidth) requirements, you have three options: temporary storage, managed disk, and Azure NetApp Files.

        Temporary storage   Managed disk   Azure NetApp Files
Size    Small               Large          Extra large
Speed   Extra large         Small          Medium

Take these considerations into account when choosing an option:

  • Temporary storage (or ephemeral storage) provides the highest bandwidth, but it's available only in smaller sizes. (Size depends on the VM SKU.) Depending on the available and required capacities, this option might be best.
  • If the required SASWORK capacity exceeds the temporary storage size of the VM SKU that you've selected, consider using an Azure managed disk to host SASWORK. Keep in mind, however, that the throughput to a managed disk is limited by the VM architecture by design, and that it varies depending on the VM SKU. Therefore, this storage option is viable only for environments that have lower SASWORK performance requirements.
  • For the highest SASWORK capacity requirements and an average performance requirement beyond what Azure managed disks can provide, consider Azure NetApp Files for SASWORK. It provides a large size together with fast throughput.

Important

In any scenario, keep in mind that SASWORK can't be shared between VM compute nodes, so you need to create a separate SASWORK volume for each compute node. Each volume should be NFS-mounted on only one compute node.

When you use the preceding table to decide whether your needs are small, medium, large, or extra large, take into account the scale of the deployment, the number of VMs and cores, and the associated capacity and performance requirements. Make this assessment for each deployment.

The options in the table correspond to deployments described in the architectures that follow. In all scenarios, SASDATA is hosted on an Azure NetApp Files NFS volume and shared across the compute nodes. For some RHEL distributions, we recommend using the NFS nconnect option to create multiple network flows to the volume. For more information, see the NFS mount options section of this article.

Temporary storage architecture

Diagram that shows a temporary storage architecture.

For smaller SASWORK capacity requirements, Azure VM temporary storage is a fast and cost-effective solution. In this architecture, each VM in the compute tier is equipped with some temporary storage. To determine the temporary storage sizes for the VMs you use, see the Azure VM documentation.
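
For example, SASWORK can point at a directory on the temporary disk. This is a minimal sketch: the mount point of the temporary disk varies by image (/mnt/resource is common on RHEL), and the sas service account and SAS configuration path are placeholders.

# Sketch: place SASWORK on the VM's temporary (ephemeral) disk.
TMPDISK=/mnt/resource                  # verify the temporary disk mount point on your image
sudo mkdir -p ${TMPDISK}/saswork
sudo chown sas:sas ${TMPDISK}/saswork  # placeholder service account
# Point the SAS WORK option at this directory, for example in sasv9.cfg: -WORK /mnt/resource/saswork
# Remember: data on the temporary disk is lost when the VM is deallocated or redeployed.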

Dataflow

  • A compute node reads input data from SASDATA and writes results back to SASDATA.
  • A subsequent part of the analytics job can be run by another node in the compute tier. It uses the same procedure to obtain and store the information that it needs to process.
  • The temporary work directory SASWORK isn't shared. It's stored in temporary storage on each compute node.

Managed disk architecture

Diagram that shows a managed disk architecture.

If your capacity requirements for SASWORK exceed the capacities available in temporary storage, Azure managed disks are a good alternative. Managed disks are available in various sizes and performance levels. For more information, see Scalability and performance targets for VM disks.
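
A minimal sketch for preparing a managed data disk for SASWORK follows; the device path and mount point are examples, so identify the attached disk (for example with lsblk) before formatting it.

# Sketch: format an attached managed data disk and mount it for SASWORK.
sudo mkfs.xfs /dev/disk/azure/scsi1/lun0    # example device path for the first data disk
sudo mkdir -p /saswork
echo '/dev/disk/azure/scsi1/lun0  /saswork  xfs  defaults,nofail  0 2' | sudo tee -a /etc/fstab
sudo mount /saswork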

Dataflow

  • A compute node reads input data from SASDATA and writes results back to SASDATA.
  • A subsequent part of the analytics job can be run by another node in the compute tier. It uses the same procedure to obtain and store the information that it needs to process.
  • The temporary work directory SASWORK isn't shared. It's stored on managed disks that are attached to each compute node.

Azure NetApp Files architecture

Diagram that shows an Azure NetApp Files architecture.

For higher SASWORK capacity requirements or medium performance requirements, consider using Azure NetApp Files. Azure NetApp Files provides volume capacities as high as 100 TiB. Each node in the compute tier should have its own SASWORK volume. The volumes shouldn't be shared.

Dataflow

  • A compute node reads input data from SASDATA and writes results back to SASDATA.
  • A subsequent part of the analytics job can be run by another node in the compute tier. It uses the same procedure to obtain and store the information that it needs to process.
  • The temporary work directory SASWORK isn't shared. It's stored on individual Azure NetApp Files volumes that are attached to each compute node.

Scale and configuration recommendations

RHEL distributions and NFS settings

RHEL distributions

RHEL is the recommended distribution for running SAS 9 on Linux. Each kernel supported by Red Hat has its own NFS bandwidth constraints.

For specifics about running SAS on Azure, see Best Practices for Using SAS on Azure.

Azure Standard_E64-16ds_v4 and Standard_E64-32ds_v4 VMs, or their v5 equivalents, are recommended for SAS. Taking these recommendations into account, this section provides some guidelines for using SAS with Azure NetApp Files.

  • If you use RHEL 7, Standard_E64-16ds_v4 or Standard_E64-16ds_v5 is the best choice, based on the 100-MiB/s per physical core target for SASDATA.

    • Standard_E64-16ds_v4: 90-100 MiB/s per core
    • Standard_E64-32ds_v4: 45-50 MiB/s per core
  • If you use RHEL 8.2, either Standard_E64-16ds_v4 or Standard_E64-32ds_v4, or their v5 equivalents, are possible options. Standard_E64-16ds_v4 is preferable, given the 100-MiB/s per core target for SASDATA.

    • Standard_E64-16ds_v4: 150-160 MiB/s per core
    • Standard_E64-32ds_v4: 75-80 MiB/s per core
  • If you use RHEL 8.3, both Standard_E64-16ds_v4 and Standard_E64-32ds_v4, or their v5 equivalents, are fully acceptable, given the per-core throughput target:

    • Validation indicates 3,200 MiB/s of reads.
    • These results are achieved with the NFS nconnect mount option.

Testing shows that a single RHEL 7 instance achieves no more than roughly 750-800 MiB/s of read throughput against a single Azure NetApp Files storage endpoint (that is, against a single network socket). Writes of 1,500 MiB/s are achievable against the same endpoint if you use 64-KiB rsize and wsize NFS mount options. Some evidence suggests that this read throughput ceiling is an artifact of the 3.10 kernel. For more information, see RHEL CVE-2019-11477.

Testing shows that a single RHEL 8.2 instance, with its 4.18 kernel, is free of the limitations noted for the 3.10 kernel, so 1,200-1,300 MiB/s of read traffic is achievable if you use 64-KiB rsize and wsize NFS mount options. For large sequential writes, you can expect the same 1,500 MiB/s of achievable throughput that you'd get on RHEL 7.

With a single RHEL 8.3 instance and the nconnect mount option (which is new in the RHEL 8.3 distribution), about 3,200 MiB/s of read throughput is achievable from a single Azure NetApp Files volume. Don't expect more than 1,500 MiB/s of writes to a single Azure NetApp Files volume, even when you apply nconnect.

Kernel tunables

Slot table entries

NFSv3 doesn't have a mechanism to negotiate concurrency between the client and the server; each side defines its limits without awareness of the other. For the best performance, align the maximum number of client-side sunrpc slot table entries with the number of outstanding operations the server supports without pushback. When a client overwhelms the server network stack's ability to process a workload, the server responds by decreasing the window size for the connection, which isn't ideal for performance.

By default, modern Linux kernels set the per-connection sunrpc slot table entry limit (sunrpc.tcp_max_slot_table_entries) to support 65,536 outstanding operations. These slot table entries define the limits of concurrency. Values this high are unnecessary, because Azure NetApp Files defaults to 128 outstanding operations.

We recommend that you tune the client to the same number (an example of how to apply the setting follows the list):

  • Kernel tunables (via /etc/sysctl.conf)
    • sunrpc.tcp_max_slot_table_entries=128
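
A minimal way to apply the setting persistently and immediately (the drop-in file name is an example; /etc/sysctl.conf works equally well):

# Persist the sunrpc slot table setting and apply it without a reboot.
echo 'sunrpc.tcp_max_slot_table_entries=128' | sudo tee /etc/sysctl.d/90-sas-nfs.conf
sudo sysctl -p /etc/sysctl.d/90-sas-nfs.conf
sysctl sunrpc.tcp_max_slot_table_entries   # verify the running value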

File system cache tunables

You also need to understand the following factors about file system cache tunables:

  • Flushing a dirty buffer leaves the data in a clean state, usable for future reads until memory pressure leads to eviction.
  • There are three triggers for an asynchronous flush operation:
    • Time based: a dirty buffer reaches the age defined by vm.dirty_expire_centisecs. (The flusher threads that enforce this wake up every vm.dirty_writeback_centisecs.)
    • Memory-pressure based: the amount of dirty data reaches the threshold defined by vm.dirty_bytes (or vm.dirty_ratio).
    • Close based: when an NFS file handle is closed, close-to-open consistency flushes any dirty buffers for that file.

These factors are controlled by four tunables. You can adjust each tunable dynamically and persistently by using tuned or by using sysctl in the /etc/sysctl.conf file. Tuning these variables improves performance for SAS Grid (a sample tuned profile that applies these settings follows the list):

  • Kernel tunables (via custom tuned profile)
    • include = throughput-performance
    • vm.dirty_bytes = 31457280
    • vm.dirty_expire_centisecs = 100
    • vm.dirty_writeback_centisecs = 300
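
Here's a minimal sketch of such a custom tuned profile; the profile name sas-performance is an example.

# Create a custom tuned profile that layers the SAS settings on throughput-performance.
sudo mkdir -p /etc/tuned/sas-performance
sudo tee /etc/tuned/sas-performance/tuned.conf <<'EOF'
[main]
summary=SAS Grid on Azure NetApp Files
include=throughput-performance

[sysctl]
vm.dirty_bytes = 31457280
vm.dirty_expire_centisecs = 100
vm.dirty_writeback_centisecs = 300
EOF
sudo tuned-adm profile sas-performance   # activate the profile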

NFS mount options

We recommend the following NFS mount options for NFS shared file systems that are used for permanent SASDATA files:

RHEL 7 and 8.2

bg,rw,hard,rsize=65536,wsize=65536,vers=3,noatime,nodiratime,rdirplus,acdirmin=0,tcp,_netdev

RHEL 8.3

bg,rw,hard,rsize=65536,wsize=65536,vers=3,noatime,nodiratime,rdirplus,acdirmin=0,tcp,_netdev,nconnect=8

We recommend the following mount options for SASWORK volumes, where the respective volumes are used exclusively for SASWORK and not shared between nodes:

RHEL 7 and 8.2

bg,rw,hard,rsize=65536,wsize=65536,vers=3,noatime,nodiratime,rdirplus,acdirmin=0,tcp,_netdev,nocto

RHEL 8.3

bg,rw,hard,rsize=65536,wsize=65536,vers=3,noatime,nodiratime,rdirplus,acdirmin=0,tcp,_netdev,nocto,nconnect=8

For more information on the benefits and cost of the nocto mount option, see Close-to-open consistency and cache attribute timers.
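
For example, /etc/fstab entries on a RHEL 8.3 compute node might look like the following; the IP address, export paths, and mount points are placeholders for your environment.

# SASDATA: shared by all compute nodes
10.0.1.4:/sasdata  /sasdata  nfs  bg,rw,hard,rsize=65536,wsize=65536,vers=3,noatime,nodiratime,rdirplus,acdirmin=0,tcp,_netdev,nconnect=8  0 0
# SASWORK: one dedicated volume per compute node, never shared
10.0.1.4:/saswork-node1  /saswork  nfs  bg,rw,hard,rsize=65536,wsize=65536,vers=3,noatime,nodiratime,rdirplus,acdirmin=0,tcp,_netdev,nocto,nconnect=8  0 0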

You should also review Azure NetApp Files: A shared file system to use with SAS Grid on MS Azure, including all updates in the comments.

NFS read-ahead settings

We recommend that you set the NFS read-ahead tunable for all RHEL distributions to 15,360 KiB. For more information, see How to persistently set read-ahead for NFS mounts.
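
You can check and adjust the value for a mounted volume as follows (non-persistent; the /sasdata mount point is a placeholder). For a persistent setting, use the udev rule described in the linked article.

# Read-ahead is set per backing device (bdi); 'mountpoint -d' prints its major:minor ID.
DEV=$(mountpoint -d /sasdata)
cat /sys/class/bdi/${DEV}/read_ahead_kb            # current read-ahead in KiB
echo 15360 | sudo tee /sys/class/bdi/${DEV}/read_ahead_kb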

Alternatives

The storage solution in the preceding architectures is highly available, as specified by the Azure NetApp Files service level agreement. For extra protection and availability, you can replicate the storage volumes to another Azure region by using Azure NetApp Files cross-region replication.

There are two key advantages to replicating the volumes via the storage solution:

  • There's no additional load on the application VMs.
  • This solution eliminates the need to run VMs in the destination region during normal operation.

The storage contents are replicated without the use of any compute infrastructure resources, and the destination region doesn't need to run the SAS software. The destination VMs don't need to be running to support this scenario.

The following architecture shows how the storage content on Azure NetApp Files is replicated to a second region, where the storage is populated with a replica of the production data. If there's a failover, the secondary region is brought online, and the VMs are started so production can resume in the second region. You need to reroute traffic to the second region by reconfiguring load balancers that aren't shown in the diagram.

Diagram that shows an architecture with cross-region replication.

The typical RPO for this solution is less than 20 minutes when the cross-region replication update interval is set to 10 minutes.

Dataflow

  • A compute node reads input data from SASDATA and writes results back to SASDATA.
  • A subsequent part of the analytics job can be run by another node in the compute tier. It uses the same procedure to obtain and store the information that it needs to process.
  • The temporary work directory SASWORK isn't shared. It's stored on individual Azure NetApp Files volumes that are attached to each compute node.
  • Azure NetApp Files cross-region replication asynchronously replicates the SASDATA volume, including all snapshots, to a DR region to facilitate failover if there's a regional disaster.

Considerations

These considerations implement the pillars of the Azure Well-Architected Framework, a set of guiding tenets that you can use to improve the quality of a workload. For more information, see Microsoft Azure Well-Architected Framework.

Reliability

Reliability ensures your application can meet the commitments you make to your customers. For more information, see Overview of the reliability pillar.

Azure NetApp Files provides a standard 99.99% availability SLA for all tiers and all supported regions. Azure NetApp Files also supports provisioning volumes in availability zones that you choose, and HA deployments across zones.

For improved RPO/RTO SLAs, integrated data protection with snapshots and backup is included with the service. Cross-region replication provides the same benefits across Azure regions.

Security

Security provides assurance against deliberate attacks and the abuse of your valuable data and systems. For more information, see Overview of the security pillar.

Azure NetApp Files provides an inherent level of security because volumes are provisioned within your virtual networks, and data traffic stays there. There's no publicly addressable endpoint. All data is encrypted at rest at all times. You can optionally encrypt data in transit.

Azure Policy can help you enforce organizational standards and assess compliance at scale. Azure NetApp Files supports Azure Policy via custom and built-in policy definitions.

Performance efficiency

Performance efficiency is the ability of your workload to scale to meet the demands placed on it by users in an efficient manner. For more information, see Performance efficiency pillar overview.

Performance

Depending on your requirements for throughput and capacity, keep the following consideration in mind:

Note

The Azure NetApp Files large volumes feature is now available. Large volumes provide higher per-volume throughput than regular Azure NetApp Files volumes. Consider large volumes if your SASDATA (or SASWORK) volumes need more performance than a regular volume provides. For details, see the large volumes documentation.

Scalability

You can easily scale compute performance by adding VMs to the scale sets that run the three tiers of the SAS solution.

You can dynamically scale storage of Azure NetApp Files volumes. If you use automatic QoS, performance is scaled at the same time. For more granular control of each volume, you can also control the performance of each volume separately by using manual QoS for your capacity pools.

Azure NetApp Files volumes are available in three performance tiers: Ultra, Premium, and Standard. Choose the tier that best suits your performance requirements, taking into account that available performance bandwidth scales with the size of a volume. You can change the service level of a volume at any time. For more information about the Azure NetApp Files cost model, see these pricing examples.

You can use the Azure NetApp Files Performance Calculator to get started.

Cost optimization

Cost optimization is about reducing unnecessary expenses and improving operational efficiencies. For more information, see Overview of the cost optimization pillar.

Cost model

Understanding the cost model for Azure NetApp Files can help you manage your expenses.

Azure NetApp Files billing is based on provisioned storage capacity, which you allocate by creating capacity pools. Capacity pools are billed monthly based on a set cost per allocated GiB per hour.

If your capacity pool size requirements fluctuate (for example, because of variable capacity or performance needs), consider dynamically resizing your volumes and capacity pools to balance cost with your capacity and performance needs.

If your capacity pool size requirements remain the same but performance requirements fluctuate, consider dynamically changing the service level of a volume. You can provision and deprovision capacity pools of different types throughout the month, providing just-in-time performance and reducing costs during periods when you don't need high performance.
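
If you use the Azure CLI, a service level change is performed by moving the volume to a capacity pool of the target service level. The following is a hedged sketch; the resource and pool names are placeholders, and you should confirm the exact parameters for your CLI version with az netappfiles volume pool-change --help.

# Sketch: move the sasdata volume from a Premium pool to a Standard pool (assumed names).
az netappfiles volume pool-change \
  --resource-group sas-rg \
  --account-name sas-anf-account \
  --pool-name premium-pool \
  --name sasdata \
  --new-pool-resource-id $(az netappfiles pool show \
      --resource-group sas-rg \
      --account-name sas-anf-account \
      --pool-name standard-pool \
      --query id -o tsv)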

Pricing

Based on your capacity and performance requirements, decide which Azure NetApp Files service level you need (Standard, Premium, or Ultra). Then use the Azure Pricing calculator to evaluate the costs for these components:

  • SAS on Azure components
  • Azure NetApp Files
  • Managed disk (optionally)
  • Virtual network

Operational excellence

Operational excellence covers the operations processes that deploy an application and keep it running in production. For more information, see Overview of the operational excellence pillar.

SAS Grid on Azure provides flexibility and a fast deployment. Here are some benefits:

  • Meet changing business demands with dynamic workload balancing
  • Create a highly available SAS computing environment
  • Get faster results from your existing IT infrastructure
  • Grow computing resources incrementally and cost-effectively
  • Manage all your analytical workloads
  • Easily transition from a siloed server or multiple-PC environment to a SAS grid environment

Deploy this scenario

It's best to deploy the workloads by using an infrastructure as code (IaC) process. SAS workloads can be sensitive to misconfigurations that often occur in manual deployments and reduce productivity.

To get started with designing your SAS Grid on Azure solution, review SAS on Azure Architecture and Automating SAS Deployment on Azure by using GitHub Actions.

Contributors

This article is maintained by Microsoft.


Next steps