This example scenario provides architecture and design guidance for any organization that wants to perform image-based modeling on Azure infrastructure-as-a-service (IaaS). The scenario is designed for running photogrammetry software on Azure Virtual Machines (VMs) using high-performance storage that accelerates processing time. The environment can be scaled up and down as needed and supports terabytes of storage without sacrificing performance.
Relevant use cases
Relevant use cases include:
- Modeling and measuring buildings, engineering structures, and forensic accident scenes.
- Creating visual effects for computer games and movies.
- Using digital images to indirectly measure objects of various scales, as in urban planning and other applications.
This example describes the use of Agisoft PhotoScan photogrammetry software backed by Avere vFXT storage. PhotoScan was chosen for its popularity in geographic information system (GIS) applications, cultural heritage documentation, game development, and visual effects production. It is suitable for both close-range photogrammetry and aerial photogrammetry.
The concepts in this article apply to any high-performance computing (HPC) workload based on a scheduler and worker nodes managed as infrastructure. For this workload, Avere vFXT was selected for its superior performance during benchmark tests. However, the scenario decouples the storage from the processing so that other storage solutions can be used (see alternatives later in this document).
This architecture also includes Active Directory domain controllers to control access to Azure resources and provide internal name resolution through the Domain Name System (DNS). Jump boxes provide administrator access to the Windows and Linux VMs that run the solution.
- A user submits a number of images to PhotoScan.
- The PhotoScan Scheduler runs on a Windows VM that serves as the head node and directs processing of the user's images.
- PhotoScan searches for common points on the photographs and constructs the geometry (mesh) using the PhotoScan processing nodes running on VMs with graphics processing units (GPUs).
- Avere vFXT provides a high-performance storage solution on Azure based on Network File System version 3 (NFSv3). It comprises at least four VMs: one controller and at least three cluster nodes.
- PhotoScan renders the model.
- Agisoft PhotoScan: The PhotoScan Scheduler runs on a Windows Server 2016 VM, and the processing nodes use five VMs with GPUs that run CentOS Linux 7.5.
- Avere vFXT is a file caching solution that uses object storage and traditional network-attached storage (NAS) to optimize storage of large datasets. It includes:
- Avere Controller. This VM runs Ubuntu 18.04 LTS and executes the script that installs the Avere vFXT cluster. The VM can be used later to add or remove cluster nodes, or to destroy the cluster.
- vFXT cluster. At least three VMs are used, one for each of the Avere vFXT nodes based on Avere OS 184.108.40.206. These VMs form the vFXT cluster, which is attached to Azure Blob storage.
- Microsoft Active Directory domain controllers allow the hosts to access domain resources and provide DNS name resolution. Avere vFXT adds a number of A records; for example, each A record for a vFXT cluster points to the IP address of one Avere vFXT node. In this setup, all VMs use the round-robin pattern to access vFXT exports.
- Other VMs serve as jump boxes used by the administrator to access the scheduler and processing nodes. The Windows jump box is mandatory because it allows the administrator to access the head node via Remote Desktop Protocol (RDP). The second jump box is optional and runs Linux for administration of the worker nodes.
- Network security groups limit access to the public IP address (PIP) and allow ports 3389 (RDP) and 22 (SSH) for access to the VMs attached to the Jumpbox subnet.
- Virtual network peering connects a PhotoScan virtual network to an Avere virtual network.
- Azure Blob storage works with Avere vFXT as the core filer to store the committed data being processed. Avere vFXT identifies the active data stored in Azure Blob storage and tiers it onto the solid-state drives (SSDs) used for caching in its compute nodes while a PhotoScan job is running. If changes are made, the data is asynchronously committed back to the core filer.
- Azure Key Vault is used to store the administrator passwords and PhotoScan activation code.
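The round-robin access pattern mentioned for the vFXT A records can be pictured with a short Python sketch. This is only an illustration of the rotation idea, not how Avere vFXT or the DNS servers are implemented; the host name and IP addresses are hypothetical placeholders.

```python
from itertools import cycle

# Hypothetical A records: one cluster name that resolves to one IP address
# per vFXT node. The name and addresses are illustrative only.
VFXT_A_RECORDS = {
    "vfxt.example.internal": ["10.0.0.11", "10.0.0.12", "10.0.0.13"],
}


def round_robin_resolver(records):
    """Return a resolve(name) function that hands out addresses in rotation."""
    rotations = {name: cycle(ips) for name, ips in records.items()}

    def resolve(name):
        # Each call returns the next address in the rotation for that name.
        return next(rotations[name])

    return resolve


resolve = round_robin_resolver(VFXT_A_RECORDS)

# Five clients mounting the export in turn are spread across the three nodes.
mounts = [resolve("vfxt.example.internal") for _ in range(5)]
print(mounts)
```

Because successive lookups rotate through the node addresses, client connections are spread across the cluster nodes rather than concentrating on one.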
- To take advantage of Azure services for managing an HPC cluster, use tools such as Azure CycleCloud or Azure Batch instead of managing the resources through templates or scripts.
- Deploy the BeeGFS parallel virtual file system as the back-end storage on Azure instead of Avere vFXT. Use the BeeGFS template to deploy this end-to-end solution on Azure.
- Deploy the storage solution of your choice, such as GlusterFS, Lustre, or Windows Storage Spaces Direct. To do this, edit the PhotoScan template to work with the storage solution you want.
- Deploy the worker nodes with the Windows operating system instead of Linux, which is the default option. When you choose Windows nodes, the deployment templates do not run the storage integration options. You must manually integrate the environment with an existing storage solution, or customize the PhotoScan template to provide such automation, as described in the repository.
This scenario is designed specifically to provide high-performance storage for an HPC workload, whether it is deployed on Windows or Linux. In general, the storage configuration of the HPC workload should match the appropriate best practices used for on-premises deployments.
Deployment considerations depend on the applications and services used, but a few notes apply:
- When building high-performance applications, use Azure Premium Storage and optimize the application layer. Optimize storage for frequent access using Azure Blob hot tier access.
- Use a storage replication option that meets your availability and performance requirements. In this example, Avere vFXT is configured for high availability by default, with locally redundant storage (LRS). For load balancing, all VMs in this setup use the round-robin pattern to access vFXT exports.
- If the backend storage will be consumed by both Windows clients and Linux clients, use Samba servers to support the Windows nodes. A version of this example scenario based on BeeGFS uses Samba to support the scheduler node of the HPC workload (PhotoScan) running on Windows. A load balancer is deployed to act like a smart replacement for DNS round robin.
- Run HPC applications using the VM type best suited for your Windows or Linux workload.
- To isolate the HPC workload from the storage resources, deploy each in its own virtual network, then use virtual network peering to connect the two. Peering creates a low-latency, high-bandwidth connection between resources in different virtual networks and routes traffic over the Microsoft backbone infrastructure using private IP addresses only.
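Virtual network peering requires that the two virtual networks use address spaces that do not overlap. A minimal Python sketch of a pre-deployment check, assuming hypothetical address ranges for the PhotoScan and Avere virtual networks, might look like this:

```python
import ipaddress


def address_spaces_overlap(space_a, space_b):
    """Return True if two CIDR ranges share any addresses."""
    net_a = ipaddress.ip_network(space_a)
    net_b = ipaddress.ip_network(space_b)
    return net_a.overlaps(net_b)


# Hypothetical address spaces; substitute the ranges planned for your deployment.
PHOTOSCAN_VNET = "10.0.0.0/16"
AVERE_VNET = "10.1.0.0/16"

if address_spaces_overlap(PHOTOSCAN_VNET, AVERE_VNET):
    print("Address spaces overlap: choose different ranges before peering.")
else:
    print("Address spaces are disjoint: the networks can be peered.")
```

The check uses only the standard library and does not call any Azure API, so it can run before any resources are created.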
This example focuses on deploying a high-performance storage solution for an HPC workload and is not a security solution. Make sure to involve your security team for any changes.
For added security, this example infrastructure enables all the Windows VMs to be domain-joined and uses Active Directory for central authentication. It also provides custom DNS services for all VMs. To help protect the environment, this template relies on network security groups, which offer basic traffic filters and security rules.
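As a rough illustration of how priority-ordered security rules filter traffic, the following Python sketch evaluates a simplified rule list in which the lowest priority number is checked first and the first match decides the outcome. The `Rule` class, the rule values, and the `evaluate` function are hypothetical and model only the destination port; real network security group rules also consider direction, protocol, source, and destination, and include built-in default rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    priority: int          # lower numbers are evaluated first
    port: Optional[int]    # None matches any port
    access: str            # "Allow" or "Deny"


def evaluate(rules, port):
    """Return the access decision of the first rule that matches the port."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.port is None or rule.port == port:
            return rule.access
    # Assumption for this sketch: traffic matching no rule is denied.
    return "Deny"


# Hypothetical inbound rules for the jump box subnet: allow RDP and SSH only.
JUMPBOX_RULES = [
    Rule(priority=4096, port=None, access="Deny"),
    Rule(priority=100, port=3389, access="Allow"),
    Rule(priority=110, port=22, access="Allow"),
]

for port in (3389, 22, 445):
    print(port, evaluate(JUMPBOX_RULES, port))
```

Sorting by priority before matching means the order in which rules are declared does not matter; only their priority numbers do.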
Consider the following options to further improve security in this scenario:
- Use network virtual appliances such as those from Fortinet, Check Point, and Juniper.
- Apply Azure role-based access control (Azure RBAC) to the resource groups.
- Enable just-in-time (JIT) VM access if jump boxes are accessed via the internet.
- Use Azure Key Vault to store the passwords used by administrator accounts.
The cost of running this scenario can vary greatly. It depends on the number and size of VMs, how much storage is required, and the amount of time needed to complete a job.
The following sample cost profile in the Azure pricing calculator is based on a typical configuration for Avere vFXT and PhotoScan:
- 1 A1_v2 Ubuntu VM to run the Avere controller.
- 3 D16s_v3 Avere OS VMs, one for each of the Avere vFXT nodes that form the vFXT cluster.
- 5 NC24_v2 Linux VMs to provide the GPUs needed by the PhotoScan processing nodes.
- 1 D8s_v3 CentOS VM for the PhotoScan scheduler node.
- 1 DS2_v2 CentOS VM used as the administrator jump box.
- 2 DS2_v2 VMs for the Active Directory domain controllers.
- Premium managed disks.
- General purpose v2 (GPv2) Blob storage with LRS and hot tier access (only GPv2 storage accounts expose the Access Tier attribute).
- Virtual network with support for 10 TB data transfer.
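The compute portion of a profile like this can be estimated as the sum of VM count times hourly rate times job duration. The Python sketch below shows that arithmetic only. The hourly rates are made-up placeholders, not Azure prices, and storage, managed disks, and data transfer are left out; use the pricing calculator for real figures.

```python
from decimal import Decimal

# (description, VM count, hourly rate). The VM sizes and counts come from the
# profile above; the rates are HYPOTHETICAL placeholders, not Azure prices.
LINE_ITEMS = [
    ("A1_v2 Avere controller", 1, Decimal("0.05")),
    ("D16s_v3 vFXT nodes", 3, Decimal("1.00")),
    ("NC24_v2 processing nodes", 5, Decimal("8.00")),
    ("D8s_v3 scheduler node", 1, Decimal("0.50")),
    ("DS2_v2 jump box", 1, Decimal("0.15")),
    ("DS2_v2 domain controllers", 2, Decimal("0.15")),
]


def estimate_compute_cost(line_items, hours):
    """Sum count * hourly rate * hours over all VM line items."""
    return sum(
        (count * rate * hours for _, count, rate in line_items),
        Decimal("0"),
    )


# Hypothetical job duration in hours.
JOB_HOURS = 10
print(estimate_compute_cost(LINE_ITEMS, JOB_HOURS))
```

Keeping the rates in one table makes it easy to swap in different VM sizes or counts and see how the estimate changes with job duration.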
For details about this architecture, see the e-book. To see how the pricing would change for your particular use case, choose different VM sizes in the pricing calculator to match your expected deployment.
For step-by-step instructions for deploying this architecture, including all the prerequisites for using either Avere vFXT or BeeGFS, download the e-book Deploy Agisoft PhotoScan on Azure With Avere vFXT for Azure or BeeGFS.
The following resources provide more information on the components used in this scenario, along with alternative approaches for batch computing on Azure.
- Overview of Avere vFXT for Azure
- Agisoft PhotoScan home page
- Azure Storage Performance and Scalability Checklist
- Parallel Virtual File Systems on Microsoft Azure: Performance tests of Lustre, GlusterFS, and BeeGFS (PDF)
- An example scenario for computer-aided engineering (CAE) on Azure
- HPC on Azure home page
- Overview of Big Compute: HPC & Microsoft Batch