question

PorscheMe-6235 asked prmanhas-MSFT commented

L32sv2 vs DS5v2

We want to build a 50 TB Elasticsearch cluster on Azure.
Replicas count: 3
Shard count: 1000

For this, we are considering the following two VM SKUs.

DS5v2 – Intel Xeon processor
RAM: 56 GB
vCPU(s): 16
Managed Disk: Premium SSD

L32s v2: AMD EPYC 7551 processor
RAM: 256 GB
vCPU(s): 32
Local NVMe premium SSD disks attached to VMs (ephemeral)
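
A rough sizing sketch for the numbers above may help when comparing the two SKUs. It assumes that 50 TB is the primary data size, that "Replicas count: 3" means three replica copies per primary, and that 1000 is the number of primary shards; none of this is stated explicitly in the question. The per-node disk capacity (7.68 TB) and the fill fraction (0.75) are illustrative inputs, not figures from this thread, so check them against the Lsv2-series documentation and your own disk watermark settings.

```python
import math

def cluster_sizing(primary_tb, replicas, primary_shards, node_disk_tb, fill_fraction):
    """Estimate total storage, average shard size, and node count.

    All inputs are assumptions supplied by the caller (decimal units).
    """
    total_tb = primary_tb * (1 + replicas)            # primaries plus all replica copies
    shard_gb = primary_tb * 1000 / primary_shards     # average primary shard size in GB
    usable_per_node_tb = node_disk_tb * fill_fraction # leave headroom on each node
    nodes = math.ceil(total_tb / usable_per_node_tb)
    return {"total_tb": total_tb, "shard_gb": shard_gb, "nodes": nodes}

# Hypothetical inputs: 7.68 TB of local NVMe per node, filled to 75%.
plan = cluster_sizing(primary_tb=50, replicas=3, primary_shards=1000,
                      node_disk_tb=7.68, fill_fraction=0.75)
print(plan)
```

If 50 TB is instead the total including replicas, pass `replicas=0` to get the node count for that reading.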


By default, I was assuming we’d have to use managed disks as we couldn’t rely on the local temporary storage. That could get blown up at any time. Attached managed storage seems to be the way most people talk about supporting this scenario online.

However, the Lsv2-series has both temporary storage and NVMe disks. The spec sheet describes it as ideal for “Big Data, SQL, and NoSQL databases”, which seems to fit our problem space.

L32sv2 looks very promising, so we would like to know which SLAs apply to this VM SKU.



azure-virtual-machines, azure-virtual-machines-scale-set, azure-managed-disks

@PorscheMe-6235 Any updates on the issue?

Please "Accept as Answer" if the response below by @AndreasBaumgarten helped you, so that it can help others in the community looking for help on similar topics.

Thanks


1 Answer

AndreasBaumgarten answered AndreasBaumgarten commented

Hi @PorscheMe-6235 ,

As far as I know, there are no dedicated SLAs for specific VM SKUs such as L32sv2.

You will find the SLA for Azure Virtual Machines here:
https://azure.microsoft.com/en-us/support/legal/sla/virtual-machines/v1_9/
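
As a rough aid for reading an availability SLA, an uptime percentage can be converted into the downtime it still permits. The sketch below assumes a 30-day month, and the percentages in the loop are example inputs only; the linked SLA page is the authoritative source for the actual figures and for how downtime is measured.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Minutes of downtime that still satisfy the given uptime percentage."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)

# Example percentages (assumed for illustration, not quoted from the SLA).
for sla in (99.9, 99.95, 99.99):
    print(f"{sla}% uptime allows about {allowed_downtime_minutes(sla):.1f} minutes per 30 days")
```

Note that this only describes availability of the VM, not what happens to data on its disks.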


(If the reply was helpful please don't forget to upvote and/or accept as answer, thank you)

Regards
Andreas Baumgarten


Thanks for the reply, @AndreasBaumgarten.

We don’t really understand how to parse the idea of a ‘connectivity’ SLA. We don’t really care if we can’t connect to a VM for a few minutes occasionally.

We do really care if all our data is gone when we come back to it 😊. Obviously, the occasional loss of disks and/or machines is fine. Elasticsearch can handle that.

What’d be a problem is if Azure decided to restart 50% of them and they came back with clean disks. There’s no way to recover from that without rebuilding everything from scratch, which’d take days.
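
To put a number on "days", a back-of-the-envelope estimate is data size divided by sustained ingest throughput. The throughput below (200 MB/s) is a made-up figure for illustration; the real rate depends on the source system, indexing cost, and cluster size.

```python
def rebuild_days(data_tb, throughput_mb_per_s):
    """Days needed to re-ingest data_tb terabytes at a sustained rate (decimal units)."""
    seconds = data_tb * 1_000_000 / throughput_mb_per_s  # 1 TB = 1,000,000 MB
    return seconds / 86_400                              # seconds per day

# Hypothetical: 50 TB re-indexed at a sustained 200 MB/s.
print(f"{rebuild_days(50, 200):.1f} days")
```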


Hi @PorscheMe-6235 ,

The SLA related to disks can be found here: https://azure.microsoft.com/en-us/support/legal/sla/managed-disks/v1_0/
Link to Storage SLA

And maybe this is helpful as well: Locally-redundant storage for managed disks

Locally-redundant storage (LRS) replicates your data three times within a single data center in the selected region. LRS protects your data against server rack and drive failures.

https://docs.microsoft.com/en-us/azure/virtual-machines/disks-redundancy?tabs=azure-cli#locally-redundant-storage-for-managed-disks



Regards
Andreas Baumgarten


It looks like you are recommending managed disks over the L32sv2 SKU, which has local NVMe SSDs; did I read you correctly?


Hi @PorscheMe-6235 ,

I just wanted to explain how managed disks are handled in Azure. With 3 copies of the disk in different stamps in the datacenter, it should be "ok". The fact that managed disks are non-ephemeral is also a plus.

My biggest concern is with the local NVMe disks. As they are ephemeral, all data on them will be lost if the VM gets stopped or deallocated.

Local NVMe Disks are ephemeral, data will be lost on these disks if you stop/deallocate your VM.

https://docs.microsoft.com/en-us/azure/virtual-machines/lsv2-series

On the other hand the managed disks, for instance Premium SSD, are not that fast.
https://docs.microsoft.com/en-us/azure/virtual-machines/disks-types#premium-ssd

Hard decision ;-) But in the end: if it takes days to rebuild the data, what is more important, the performance or the use of non-ephemeral disks?



Regards
Andreas Baumgarten






Thanks @AndreasBaumga.

We are going with Local NVMe disks combined with Elasticsearch replication and 3 different availability zones.

The rationale for the above is:
- The majority of operations (reboot or planned maintenance) will still leave us with access to the local data (even if there are temporary connectivity issues)
- If we do encounter bad hardware, the natural data replication/recovery policy should handle it
- If there is a widespread outage in one zone, the other zones should handle it
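
A minimal sketch of how this plan could be expressed and sanity-checked follows. It assumes each node is tagged with a custom attribute named `zone` (for example `node.attr.zone` in its `elasticsearch.yml`) and that shard allocation awareness is enabled through the cluster settings API; the attribute name and the helper functions are illustrative, not something from this thread. The second helper assumes awareness spreads the copies of a shard as evenly as possible across zones.

```python
import json
import math

def awareness_settings(attribute="zone"):
    """Body for a cluster settings update that enables shard allocation awareness."""
    return {
        "persistent": {
            "cluster.routing.allocation.awareness.attributes": attribute
        }
    }

def copies_left_after_zone_loss(replicas, zone_count):
    """Copies of a shard that survive losing the zone holding the most copies."""
    copies = 1 + replicas                          # one primary plus its replicas
    worst_zone = math.ceil(copies / zone_count)    # most copies any single zone holds
    return copies - worst_zone

print(json.dumps(awareness_settings(), indent=2))
print(copies_left_after_zone_loss(replicas=3, zone_count=3))
```

With the replica count from the question and 3 zones, at least one copy of every shard should remain outside any single failed zone, provided the nodes are tagged correctly.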

