DeMatteisTiziano-5142 asked vipullag-MSFT answered

Error creating CycleCloud cluster (No nodes found for nodearray hpc)

Hello,

I'm trying to set up my first CycleCloud cluster, but I keep getting an error in the initialization phase.

In particular, it complains about not finding nodes for "nodearray hpc".

The full error message:

 CycleCloud Version: 8.2.0-1616
 Cluster: Test2 (version 8.2.x)
 ==============================
    
 Status: Error [Software Configuration] (retrying)
 Start Time: 2022-03-13T17:31:29.377Z
    
 Description: Unable to execute command `"bash"  "/tmp/chef-script20220313-15831-14pjjqq"` (exit code 1)
    
 Detail: 
 STDOUT: 
 STDERR: Upgrade not required!
 Bucket has a max_count <= 0, defined for machinetype=='Standard_F2s_v2'. Skipping
 /opt/cycle/slurm/cyclecloud_slurm.py:571: DeprecationWarning: The 'warn' function is deprecated, use 'warning' instead
   logging.warn("No nodes were created for nodearray %s using name format %s and offset %s: %s", request_set.nodearray, request_set.name_format,
 No nodes were created for nodearray hpc using name format hpc-pg0-%d and offset 1: Limited by 200 total cores (10 of Standard_D4_v2) quota in eastus
 Bucket has a max_count <= 0, defined for machinetype=='Standard_F2s_v2'. Skipping
 Unhandled failure.
 Traceback (most recent call last):
   File "/opt/cycle/slurm/cyclecloud_slurm.py", line 1101, in <module>
     main()
   File "/opt/cycle/slurm/cyclecloud_slurm.py", line 1078, in main
     args.func(**kwargs)
   File "/opt/cycle/slurm/cyclecloud_slurm.py", line 296, in generate_slurm_conf
     _generate_slurm_conf(partitions, writer, subprocess)
   File "/opt/cycle/slurm/cyclecloud_slurm.py", line 222, in _generate_slurm_conf
     raise RuntimeError("No nodes found for nodearray %s. Please run 'cyclecloud_slurm.sh create_nodes' first!" % partition.nodearray)
 RuntimeError: No nodes found for nodearray hpc. Please run 'cyclecloud_slurm.sh create_nodes' first!
 Traceback (most recent call last):
   File "/opt/cycle/jetpack/system/embedded/lib/python3.8/runpy.py", line 194, in _run_module_as_main
     return _run_code(code, main_globals, None,
   File "/opt/cycle/jetpack/system/embedded/lib/python3.8/runpy.py", line 87, in _run_code
     exec(code, run_globals)
   File "/opt/cycle/slurm/cyclecloud_slurm.py", line 1101, in <module>
     main()
   File "/opt/cycle/slurm/cyclecloud_slurm.py", line 1078, in main
     args.func(**kwargs)
   File "/opt/cycle/slurm/cyclecloud_slurm.py", line 296, in generate_slurm_conf
     _generate_slurm_conf(partitions, writer, subprocess)
   File "/opt/cycle/slurm/cyclecloud_slurm.py", line 222, in _generate_slurm_conf
     raise RuntimeError("No nodes found for nodearray %s. Please run 'cyclecloud_slurm.sh create_nodes' first!" % partition.nodearray)
 RuntimeError: No nodes found for nodearray hpc. Please run 'cyclecloud_slurm.sh create_nodes' first!
 EXCEPTION: bash[Create cyclecloud.conf] (slurm::scheduler line 156) had an error: Mixlib::ShellOut::ShellCommandFailed: Expected process to exit with [0], but received '1'
    
    
 Affected Nodes (1):
 ---
 Node Name: scheduler
 Hostname: ip-0A000005
 IP Address: 10.0.0.5
 Azure Resource ID: /subscriptions/8e7cef8b-5b7e-469a-8dd8-4cb4835f1727/resourceGroups/Test2-MEYTCMLBMU2GMLJQGU4DCLJUGA/providers/Microsoft.Compute/virtualMachines/scheduler-MIZGGNZQGRQWGLJVGUYGCLJUGB
 Azure VM ID: df70aba9-e8a0-4f15-8d86-764423838920
 Cluster-Init: slurm:default:2.4.7, slurm:scheduler:2.4.7
 Node ID: be007f51-9d51-4a42-ae4b-0eab606c563a


Any suggestions on what could be the cause?

My cluster configuration is the following:
- CycleCloud 8.2
- Slurm as the cluster manager
- Standard_D12_v2 for the scheduler, Standard_D4_v2 for the HPC instances, and Standard_F2s_v2 for the HTC instances
- 16 as the maximum number of HPC cores (I wanted just two machines)
- CentOS 7 as the image for all instances
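
For reference, the arithmetic behind the 16-core limit can be sketched as below. The vCPU counts are an assumption taken from Azure's published VM size tables (they are not stated in this thread), and `max_nodes` is a hypothetical helper, not something CycleCloud provides. CycleCloud's own calculation may take other limits into account.

```python
# vCPU counts per VM size, taken from Azure's published size tables.
# Assumption: these values are not stated in the thread itself.
VCPUS = {
    "Standard_D12_v2": 4,
    "Standard_D4_v2": 8,
    "Standard_F2s_v2": 2,
}


def max_nodes(max_cores, vm_size):
    """Number of whole VMs of vm_size that fit under a core limit."""
    return max_cores // VCPUS[vm_size]


# 16 maximum HPC cores with Standard_D4_v2 instances
print(max_nodes(16, "Standard_D4_v2"))
```

With these numbers, a 16-core limit corresponds to two Standard_D4_v2 machines, which matches the intent described in the list above.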

Thank you for your support!


azure-cyclecloud

@DeMatteisTiziano-5142

I think the issue is related to quota. Can you please check whether you have sufficient quota for the hpc node array?

Check the quotas in your subscription; you could also try a different SKU and test the configuration. Besides that, the D-series v2 is quite outdated, and v3 and v4 are available. Can you try a more recent D-series size?
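
The quota hint comes from the STDERR output in the question, where the "No nodes were created" line carries the reason. A minimal sketch for pulling that line out of a longer log follows. It assumes the message format shown in the question stays the same in other versions, and `find_nodearray_failures` is a hypothetical helper, not part of CycleCloud.

```python
import re

# Pattern based on the single message shown in the question's STDERR.
# Assumption: other CycleCloud versions may word it differently.
NO_NODES_RE = re.compile(
    r"No nodes were created for nodearray (?P<nodearray>\S+) "
    r"using name format (?P<name_format>\S+) and offset (?P<offset>\d+): "
    r"(?P<reason>.+)"
)


def find_nodearray_failures(log_text):
    """Return distinct (nodearray, reason) pairs found in the log text."""
    seen = []
    for line in log_text.splitlines():
        match = NO_NODES_RE.search(line)
        if match:
            pair = (match.group("nodearray"), match.group("reason").strip())
            if pair not in seen:
                seen.append(pair)
    return seen


# Two lines copied from the error message in the question
sample = (
    "Bucket has a max_count <= 0, defined for machinetype=='Standard_F2s_v2'. Skipping\n"
    "No nodes were created for nodearray hpc using name format hpc-pg0-%d and offset 1: "
    "Limited by 200 total cores (10 of Standard_D4_v2) quota in eastus\n"
)
for nodearray, reason in find_nodearray_failures(sample):
    print(f"{nodearray}: {reason}")
```

For the sample above, the only match is the hpc node array, with the reason text that mentions the core quota in eastus.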


Hi @vipullag-MSFT

I'm not sure about the "HPC node array" quota (I didn't find it in my quotas page), but changing the SKU to v4 did the trick and the initialization has been successful.

Thanks!

vipullag-MSFT replied to DeMatteisTiziano-5142

@DeMatteisTiziano-5142

Thanks for confirming that the issue is resolved. Please 'Accept as answer' so that it can help others in the community looking for help on similar topics.


1 Answer

vipullag-MSFT answered

@DeMatteisTiziano-5142

Thanks for reaching out on the Microsoft Q&A platform.

You can use a different SKU and test the configuration. Besides that, the D-series v2 is quite outdated, and v3 and v4 are available. Can you try a more recent D-series size?

Hope this helps.
Please 'Accept as answer' if it helped, so that it can help others in the community looking for help on similar topics.
