Hello everyone. I've been trying to achieve a robust network design for infrastructure on the Azure platform. I've established vnets in both West US and South Central US. The vnets are peered, and the following two options are enabled on each side of the peering.
wus-vnet
Allow wus-vnet to access scus-vnet
Allow wus-vnet to receive forwarded traffic from scus-vnet
scus-vnet
Allow scus-vnet to access wus-vnet
Allow scus-vnet to receive forwarded traffic from wus-vnet
I've deployed an NVA (FortiGate) perimeter router to each vnet. The goal is that either NVA can serve as a perimeter gateway for both vnets in the event that one NVA fails or is down for maintenance. It's an alternative, if you will, to an HA firewall/router setup, which IMO isn't a great option on a VM platform. This design also lends itself well to leveraging Azure Site Recovery between regions, with the perimeter already staged and ready.
The real trick was getting all of the routing within our on-prem environment (which speaks OSPF) and our Azure network (which is only compatible with BGP) to update automatically and correctly under different failover scenarios. A smattering of route-maps on these NVAs handles route suppression and redistribution of enterprise routes, as well as the propagation of the zero route (0.0.0.0/0) from Azure out to the internet.
All seems to be working, but I'm struggling to find anyone else who has been down this road. It seems you run into people who know networking but not Azure, or the reverse: strong in Azure but lacking in networking experience.
What I find troubling is the FAQ for Azure Route Server, which I am leveraging to accomplish this design.
https://learn.microsoft.com/en-us/azure/route-server/route-server-faq
Specifically, the question "Can I peer two Azure Route Servers in two peered virtual networks and enable the NVAs connected to the Route Servers to talk to each other?"
This isn't exactly what I am doing, but it's close. In my case, each of the NVAs has a peering with the Route Server in each vnet. I'm using AS path prepending to make the remote NVA look less desirable, so the remote NVA isn't used unless the local NVA is down.
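For anyone who wants to reason about the prepending behavior, here is a minimal Python sketch of the idea, not the actual FortiGate or Route Server logic. It assumes that the AS path length is the only tie-breaker in play (real BGP best-path selection checks other attributes first), and the ASNs (65001, 65002), the NVA names, and the prepend count of 2 are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    prefix: str
    next_hop: str    # name of the NVA that advertised the route
    as_path: tuple   # AS numbers, nearest AS first


def prepend(route, asn, times):
    """Return a copy of the route with `asn` prepended `times` extra times."""
    return Route(route.prefix, route.next_hop, (asn,) * times + route.as_path)


def best_path(routes):
    """Pick the route with the shortest AS path; None if nothing is advertised."""
    return min(routes, key=lambda r: len(r.as_path), default=None)


# Hypothetical private ASNs and NVA names, for illustration only.
WUS_NVA_ASN, SCUS_NVA_ASN = 65001, 65002

# View from the Route Server in wus-vnet: the local NVA advertises the
# zero route as-is, while the remote NVA prepends its own ASN twice.
local = Route("0.0.0.0/0", "wus-nva", (WUS_NVA_ASN,))
remote = prepend(Route("0.0.0.0/0", "scus-nva", (SCUS_NVA_ASN,)), SCUS_NVA_ASN, 2)

normal = best_path([local, remote])   # both NVAs up: shorter path wins
failover = best_path([remote])        # local NVA down: its route is withdrawn

print(normal.next_hop)    # wus-nva
print(failover.next_hop)  # scus-nva
```

The same comparison would apply in mirror image from the Route Server in scus-vnet, where the WUS NVA is the one advertising the longer path.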
The other detail that caught my eye on the FAQ page was the question "Does Azure Route Server support virtual network peering?" Microsoft's answer to this question talks about enabling the vnet peering feature "Use the remote virtual network's gateway or Route Server". Initially I was hopeful this would basically peer the two Route Servers together, but it seems this can't be enabled if you have a Route Server deployed in both vnets. You get an error with language like, "Failed to save virtual network peering vnet already has a gateway configured". This peering feature appears to be geared toward a vnet that has no gateway or Route Server in it, which is not compatible with my design.
A high-level overview of the design is below.
So here I am with a working solution that seems to meet our needs. We have a point of entry into our Azure network from both West US and South Central US, and all workloads reconverge automatically if either NVA is taken offline. Obviously, latency goes up a bit if WUS traffic flows through SCUS to the outside world, but it's still acceptable. Have I committed a network or Azure crime here? I'm meeting with a third party soon to review this and provide input. It would be great to get this "certified" in some way before we start moving critical workloads onto the platform. I haven't seen a whitepaper from Microsoft with config examples or suggestions.