Projects

A project is a collection of resources which define node configurations. Projects contain specs. When a node starts, it is configured by processing and running a sequence of specs.

Azure CycleCloud uses projects to manage clustered applications, such as batch schedulers. In the CycleCloud HPCPack project, for example, an hn spec and a cn spec define the configurations and recipes for the HPCPack headnode and computenode.

Below is a partial node definition. The docker-registry node will run three specs: the bind spec from version 1.3.0 of the okta project, and the core and registry specs from version 2.0.0 of the docker project:

[[node docker-registry]]
    Locker = base-storage
    [[[cluster-init okta:bind:1.3.0]]]
    [[[cluster-init docker:core:2.0.0]]]
    [[[cluster-init docker:registry:2.0.0]]]

The trailing tag is the project version number.

[[[cluster-init <project>:<spec>:<project version>]]]
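As a rough illustration of the reference format above, the following Python sketch splits a spec reference into its three parts. The name parse_spec_ref is hypothetical and is not part of CycleCloud; the sketch only mirrors the colon-separated layout described here.

```python
from typing import NamedTuple


class SpecRef(NamedTuple):
    project: str
    spec: str
    version: str


def parse_spec_ref(ref: str) -> SpecRef:
    """Split '<project>:<spec>:<project version>' into its three parts."""
    parts = ref.split(":")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <project>:<spec>:<version>, got {ref!r}")
    return SpecRef(*parts)


ref = parse_spec_ref("okta:bind:1.3.0")
print(ref.project, ref.spec, ref.version)
```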

A locker is a reference to a storage account container and credential. Nodes have a default locker, so this attribute is not strictly necessary.

Azure CycleCloud uses a shorthand for storage accounts, so https://mystorage.blob.core.windows.net/mycontainer can be written as az://mystorage/mycontainer.
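The mapping between the two forms can be sketched in Python as follows. The helper to_az_shorthand is a hypothetical name used only for illustration, and the sketch assumes the standard blob.core.windows.net endpoint shown above.

```python
from urllib.parse import urlparse

BLOB_SUFFIX = ".blob.core.windows.net"


def to_az_shorthand(url: str) -> str:
    """Rewrite an https blob URL as the az://account/container shorthand."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc.endswith(BLOB_SUFFIX):
        raise ValueError(f"not a blob storage URL: {url!r}")
    account = parsed.netloc[: -len(BLOB_SUFFIX)]
    return f"az://{account}{parsed.path.rstrip('/')}"


print(to_az_shorthand("https://mystorage.blob.core.windows.net/mycontainer"))
```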

The node will download each project it references from the locker using the pogo tool:

pogo get az://mystorage/mycontainer/projects/okta/1.3.0/bind

If a project is defined on a node but does not exist in the expected storage location, the node will report a Software Installation Failure to CycleCloud.

CycleCloud has internal projects that run by default on all nodes to perform special volume and network handling and to set up communication with CycleCloud. These internal projects are mirrored to the locker automatically.

The user is responsible for mirroring any additional projects to the locker. The CycleCloud CLI has methods to compose projects:

cyclecloud project init myproject

and to mirror projects to lockers:

cyclecloud project upload mylocker

Specs are made up of python, shell, or powershell scripts.

Create a New Project

To create a new project, use the CLI command cyclecloud project init myproject, where myproject is the name of the project you wish to create. This will create a project called "myproject", with a single spec named "default" that you can change. The directory tree will be created with skeleton files you will amend to include your own information.

Directory Structure

The following directories will be created by the project command:

      \myproject
          ├── project.ini
          ├── blobs
          ├── templates
          └── specs
              └── default
                  ├── cluster-init
                  │   ├── scripts
                  │   ├── files
                  │   └── tests
                  └── chef
                      ├── site-cookbooks
                      ├── data_bags
                      └── roles

The templates directory will hold your cluster templates, while specs will contain the specifications defining your project. Each spec has two subdirectories: cluster-init and chef. cluster-init contains directories which have special meaning: scripts (scripts that are executed in lexicographical order on the node), files (raw data files that will be put on the node), and tests (tests to be run when a cluster is started in testing mode).
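Lexicographical ordering can be surprising when script names start with numbers. The Python sketch below assumes plain character-by-character string comparison, as done by Python's sorted; the script names are hypothetical, and CycleCloud's exact comparison rules may differ.

```python
def execution_order(script_names):
    """Return script names in plain lexicographical (string) order."""
    return sorted(script_names)


# Unpadded numeric prefixes: "10_" sorts before "1_" and "2_".
print(execution_order(["2_configure.sh", "10_install.sh", "1_prepare.sh"]))

# Zero-padded prefixes keep the intended numeric order.
print(execution_order(["02_configure.sh", "10_install.sh", "01_prepare.sh"]))
```

Zero-padding numeric prefixes is one simple way to make the string order match the intended sequence.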

The chef subdirectory has three directories: site-cookbooks (for cookbook definitions), data_bags (databag definitions), and roles (chef role definition files).

project.ini

project.ini is the file containing all the metadata for your project. It can contain:

Parameter   Description
name        Name of the project. Words must be separated by dashes, e.g. order-66-2018
label       Name of the project. Long name (with spaces) of the cluster for display purposes.
type        Three options: scheduler, application, <blank>. Determines the type of project and generates the appropriate template. Default: application
version     Format: x.x.x
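A minimal Python sketch of checking these fields is shown below. It assumes the metadata lives in a [project] section, which is an assumption made for illustration, and the sample values, check_project_ini, and the regular expressions are all hypothetical.

```python
import configparser
import re

# Hypothetical sample; the [project] section name is an assumption.
SAMPLE_INI = """
[project]
name = order-66-2018
label = Order 66 2018
type = application
version = 1.0.0
"""

NAME_RE = re.compile(r"^[A-Za-z0-9]+(-[A-Za-z0-9]+)*$")
VERSION_RE = re.compile(r"^\d+\.\d+\.\d+$")
ALLOWED_TYPES = {"scheduler", "application", ""}


def check_project_ini(text: str) -> list[str]:
    """Return a list of problems found in the project metadata."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    meta = parser["project"]
    problems = []
    if not NAME_RE.match(meta.get("name", "")):
        problems.append("name must be dash-separated words")
    if not VERSION_RE.match(meta.get("version", "1.0.0")):
        problems.append("version must look like x.x.x")
    if meta.get("type", "application") not in ALLOWED_TYPES:
        problems.append("type must be scheduler, application, or blank")
    return problems


print(check_project_ini(SAMPLE_INI))
```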

Lockers

Project contents are stored within a locker. You can upload the contents of your project to any locker defined in your CycleCloud install via the command cyclecloud project upload (locker), where (locker) is the name of a cloud storage locker in your CycleCloud install. This locker will be set as the default target. Alternatively, you can see what lockers are available to you with the command cyclecloud locker list. Details about a specific locker can be viewed with cyclecloud locker show (locker).

If you add more than one locker, you can set your default with cyclecloud project default_target (locker), then simply run cyclecloud project upload. You can also set a global default locker that can be shared by projects with the command cyclecloud project default locker (locker) -global.

Note

Default lockers will be stored in the cyclecloud config file (usually located in ~/.cycle/config.ini), not in the project.ini. This is done to allow project.ini to be version controlled.

Uploading your project contents will zip the chef directories and sync both chef and cluster-init to your target locker. These will be stored at:

  • (locker)/projects/(project)/(version)/(spec_name)/cluster-init
  • (locker)/projects/(project)/(version)/(spec_name)/chef

Blob Download

Use project download to download all blobs referenced in the project.ini to your local blobs directory. The command uses the [locker] parameter and will attempt to download blobs listed in project.ini from the locker to local storage. An error will be returned if the files cannot be located.

Project Setup

Specs

When creating a new project, a single default spec is defined. You can add additional specs to your project via the cyclecloud project add_spec command.

Versioning

By default, all projects have a version of 1.0.0. You can set a custom version as you develop and deploy projects by setting version=x.y.z in the project.ini file.

The version is part of the path your specs are uploaded to. For example, if "locker_url" was "az://my-account/my-container/projects", the project was named "Order66", the version was "1.6.9", and the spec was "default", the URLs would be:

  • az://my-account/my-container/projects/Order66/1.6.9/default/cluster-init
  • az://my-account/my-container/projects/Order66/1.6.9/default/chef
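The path layout above can be reproduced with a few lines of Python. This is only a sketch of the naming scheme; spec_upload_urls is a hypothetical helper, not a CycleCloud function.

```python
def spec_upload_urls(locker_url: str, project: str, version: str, spec: str) -> list[str]:
    """Build the cluster-init and chef locations for one spec."""
    base = f"{locker_url.rstrip('/')}/{project}/{version}/{spec}"
    return [f"{base}/cluster-init", f"{base}/chef"]


for url in spec_upload_urls(
    "az://my-account/my-container/projects", "Order66", "1.6.9", "default"
):
    print(url)
```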

Blobs

There are two types of blob: project blobs and user blobs.

Project Blobs

Project Blobs are binary files provided by the author of the project with the assumption that they can be distributed (i.e. a binary file for an open source project you are legally allowed to redistribute). Project Blobs go into the "blobs" directory of a project, and when uploaded to a locker they will be located at /project/blobs.

To add blobs to projects, add the file(s) to your project.ini:

[[blobs optionalname]]
  Files = projectblob1.tgz, projectblob2.tgz, projectblob3.tgz

Multiple blobs can be separated by a comma. You can also specify the relative path to the project's blob directory.

User Blobs

User Blobs are binary files that the author of the project cannot legally redistribute, such as UGE binaries. These files are not packaged with the project, but instead must be staged to the locker manually. The files will be located at /blobs/my-project/my-blob.tgz. User Blobs do not need to be defined in the project.ini.

To download any blob, use the jetpack download command from the CLI, or the jetpack_download Chef resource. CycleCloud will look for the user blob first. If that file is not located, the project level blob will be used.

Note

It is possible to override a project blob with a user blob of the same name.
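The lookup order described above (user blob first, then project blob) can be sketched in Python. The function resolve_blob and the sets of file names are hypothetical stand-ins for what is actually staged in the locker.

```python
def resolve_blob(name: str, user_blobs: set[str], project_blobs: set[str]) -> str:
    """Return which kind of blob would be used for the given file name."""
    if name in user_blobs:
        return "user"  # a user blob takes precedence over a project blob
    if name in project_blobs:
        return "project"
    raise FileNotFoundError(name)


user = {"licensed-installer.tgz", "shared.tgz"}
project = {"projectblob1.tgz", "shared.tgz"}

print(resolve_blob("projectblob1.tgz", user, project))
print(resolve_blob("shared.tgz", user, project))
```

Because "shared.tgz" exists in both sets, the user blob is chosen, which is the override behavior mentioned in the note.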

Specify Project within a Cluster Template

Project syntax allows you to specify multiple specs on your nodes. To define a project, use the following:

[[[cluster-init myspec]]]
  Project = myproject # inferred from name
  Version = x.y.z
  Spec = default  # (alternatively, you can name your own spec to be used here)
  Locker = default  # (optional, will use default locker for node)

Note

The name specified after 'spec' can be anything, but can and should be used as a shortcut to define some common properties.

You can also apply multiple specs to a given node as follows:

[[node scheduler]]
  [[[cluster-init myspec]]]
    Project = myproject
    Version = x.y.z
    Spec = default  # (alternatively, you can name your own spec to be used here)
    Locker = default  # (optional, will use default locker for node)

  [[[cluster-init otherspec]]]
    Project = otherproject
    Version = a.b.c
    Spec = otherspec  # (optional)

By separating the project name, spec name, and version with colons, CycleCloud can parse those values into the appropriate Project/Version/Spec settings automatically:

[[node scheduler]]
  AdditionalClusterInitSpecs = $ClusterInitSpecs
  [[[cluster-init myproject:myspec:x.y.z]]]
  [[[cluster-init otherproject:otherspec:a.b.c]]]

Specs can also be inherited between nodes. For example, you can share a common spec between all nodes, then run a custom spec on the scheduler node:

[[node defaults]]
  [[[cluster-init my-project:common:1.0.0]]]
    Order = 2 # optional

[[node scheduler]]
  [[[cluster-init my-project:scheduler:1.0.0]]]
    Order = 1 # optional

[[nodearray execute]]
  [[[cluster-init my-project:execute:1.0.0]]]
    Order = 1 # optional

This would apply both the common and scheduler specs to the scheduler node, while only applying the common and execute specs to the execute nodearray.

By default, the specs will be run in the order they are shown in the template, running inherited specs first. Order is an optional integer set to a default of 1000, and can be used to define the order of the specs.
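One way to picture this ordering is a stable sort on Order. The Python sketch below assumes that lower Order values run first and that ties keep their template position (inherited specs first); both points are assumptions about the behavior, and run_order is a hypothetical helper.

```python
DEFAULT_ORDER = 1000


def run_order(specs):
    """Sort (name, order) pairs by Order, keeping template order for ties.

    `specs` is listed in template order with inherited specs first;
    `order` is None when the template does not set Order.
    """
    def effective_order(item):
        _, order = item
        return DEFAULT_ORDER if order is None else order

    # sorted() is stable, so equal Order values keep their original position.
    return [name for name, _ in sorted(specs, key=effective_order)]


# Specs seen by the scheduler node in the example above.
print(run_order([("my-project:common:1.0.0", 2), ("my-project:scheduler:1.0.0", 1)]))

# With no Order set, the inherited spec stays first.
print(run_order([("my-project:common:1.0.0", None), ("my-project:scheduler:1.0.0", None)]))
```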

If only one name is specified in the [[[cluster-init]]] definition, it will be assumed to be the spec name. For example:

[[[cluster-init myspec]]]
Project = myproject
Version = 1.0.0

is a valid spec setup in which Spec=myspec is implied by the name.

run_list

You can specify a runlist at the project or spec level within your project.ini:

[spec scheduler]
run_list = role[a], recipe[b]

When a node includes the spec "scheduler", the run_list defined will be automatically appended to any previously-defined runlist. For example, if my run_list defined under [configuration] was run_list = recipe[test], the final runlist would be run_list = recipe[cyclecloud], recipe[test], role[a], recipe[b], recipe[cluster_init].

You can also overwrite a runlist at the spec level on a node. This will replace any run_list included in the project.ini. For example, if we changed the node definition to the following:

[cluster-init test-project:scheduler:1.0.0]
run_list = recipe[different-test]

The runlist defined in the project would be ignored, and the above would be used instead. The final runlist on the node would then be run_list = recipe[cyclecloud], recipe[test], recipe[different-test], recipe[cluster_init].
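The two cases above can be summarized in a short Python sketch. It simply reproduces the examples given in this section; final_run_list is a hypothetical helper, and the surrounding recipe[cyclecloud] and recipe[cluster_init] entries are taken from the examples rather than from any API.

```python
def final_run_list(configuration, project_spec, node_override=None):
    """Compose the run_list as described in the examples above.

    A run_list set on the node's cluster-init section replaces the one
    defined for the spec in project.ini.
    """
    spec_part = project_spec if node_override is None else node_override
    return ["recipe[cyclecloud]", *configuration, *spec_part, "recipe[cluster_init]"]


# run_list from project.ini is appended to the [configuration] run_list.
print(final_run_list(["recipe[test]"], ["role[a]", "recipe[b]"]))

# run_list on the node overrides the one from project.ini.
print(final_run_list(["recipe[test]"], ["role[a]", "recipe[b]"], ["recipe[different-test]"]))
```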

Note

Runlists are specific to Chef and do not apply otherwise.

File Locations

The zipped chef files will be downloaded during the bootstrapping phase of node startup. They are downloaded to $JETPACK_HOME/system/chef/tarballs and unzipped to $JETPACK_HOME/system/chef/chef-repo/, and used when converging the node.

Note

To run custom cookbooks, you MUST specify them in the run_list for the node.

The cluster-init files will be downloaded to /mnt/cluster-init/(project)/(spec)/. For "my-project" and "my-spec", you will see your scripts, files, and tests located in /mnt/cluster-init/my-project/my-spec.

Syncing Projects

CycleCloud projects can be synced from mirrors into cluster local cloud storage. Set a SourceLocker attribute on a [cluster-init] section within your template. The name of the locker specified will be used as the source of the project, and its contents will be synced to your locker at cluster start. You can also use the name of the locker as the first part of the cluster-init name. For example, if the source locker was "cyclecloud", the following two definitions are the same:

[cluster-init my-project:my-spec:1.2.3]
  SourceLocker=cyclecloud

[cluster-init cyclecloud/my-project:my-spec:1.2.3]
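The equivalence of the two forms can be illustrated with a small Python sketch that separates an optional locker prefix from the rest of the cluster-init name. The helper split_source_locker is hypothetical and only models the naming convention described here.

```python
def split_source_locker(name: str):
    """Split 'locker/project:spec:version' into (locker, 'project:spec:version').

    Returns (None, name) when no locker prefix is present.
    """
    locker, separator, reference = name.partition("/")
    if not separator:
        return None, name
    return locker, reference


print(split_source_locker("cyclecloud/my-project:my-spec:1.2.3"))
print(split_source_locker("my-project:my-spec:1.2.3"))
```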

Large File Storage

Projects support large files. At the top level of a newly created project you will see a "blobs" directory for your large files (blobs). Please note that blobs placed in this directory have a specific purpose, and will act differently than the items within the "files" directory.

Items within the "blobs" directory are spec and version independent: anything in "blobs" can be shared between specs or project versions. As an example, an installer for a program that changes infrequently can be stored within "blobs" and referenced within your project.ini. As you iterate on versions of your project, that single file remains the same and is only copied into your cloud storage once, which saves on transfer and storage cost.

To add a blob, simply place a file into the "blobs" directory and edit your project.ini to reference that file:

[blobs]
  Files=big_file1.tgz

When you use the project upload command, all blobs referenced in the project.ini will be transferred into cloud storage.

Log Files

Log files generated when running cluster-init are located in $JETPACK_HOME/logs/cluster-init/(project)/(spec).

Run Files

When a cluster-init script is run successfully, a file is placed in /mnt/cluster-init/.run/(project)/(spec) to ensure it isn't run again on a subsequent converge. If you want to run the script again, delete the appropriate file in this directory.

Script Directories

When CycleCloud executes scripts in the scripts directory, it sets environment variables containing the name and path of the project and spec directories:

CYCLECLOUD_PROJECT_NAME
CYCLECLOUD_PROJECT_PATH
CYCLECLOUD_SPEC_NAME
CYCLECLOUD_SPEC_PATH

On Linux, a project named "test-project" with a spec of "default" would have values as follows:

CYCLECLOUD_PROJECT_NAME = test-project
CYCLECLOUD_PROJECT_PATH = /mnt/cluster-init/test-project
CYCLECLOUD_SPEC_NAME = default
CYCLECLOUD_SPEC_PATH = /mnt/cluster-init/test-project/default
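A Python cluster-init script could use these variables to locate files shipped with the spec. The sketch below assumes the spec's files directory sits directly under CYCLECLOUD_SPEC_PATH; the helper spec_file and the file name app.conf are hypothetical.

```python
import os
from pathlib import PurePosixPath


def spec_file(name: str, environ=os.environ) -> PurePosixPath:
    """Return the path of a file in the current spec's files directory."""
    spec_path = environ.get("CYCLECLOUD_SPEC_PATH")
    if spec_path is None:
        raise RuntimeError("CYCLECLOUD_SPEC_PATH is not set; run from a cluster-init script")
    # Assumption: files live in a 'files' directory under the spec path.
    return PurePosixPath(spec_path) / "files" / name


# Simulate the environment shown above instead of relying on a real node.
example_env = {"CYCLECLOUD_SPEC_PATH": "/mnt/cluster-init/test-project/default"}
print(spec_file("app.conf", example_env))
```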

Run Scripts Only

To run ONLY the cluster-init scripts:

jetpack converge --cluster-init

Output from the command will go to both STDOUT and jetpack.log. Each script will also have its output logged to:

      $JETPACK_HOME/logs/cluster-init/(project)/(spec)/scripts/(script.sh).out

Custom chef and Composable Specs

Each spec has a chef directory in it. Before a converge, each spec will be untarred and extracted into the local chef-repo, replacing any existing cookbooks, roles, and data bags with the same name(s). This is done in the order the specs are defined, so in the case of a naming collision, the last defined spec will always win.
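The "last defined spec wins" rule behaves like successive dictionary updates. The Python sketch below models only that rule; merge_chef_repo and the spec and cookbook names are hypothetical.

```python
def merge_chef_repo(specs):
    """Map each cookbook name to the spec that finally provides it.

    `specs` is a list of (spec_name, cookbook_names) in the order the
    specs are defined; later specs replace earlier cookbooks of the same name.
    """
    repo = {}
    for spec_name, cookbook_names in specs:
        for cookbook in cookbook_names:
            repo[cookbook] = spec_name
    return repo


print(merge_chef_repo([
    ("common", ["base", "monitoring"]),
    ("scheduler", ["monitoring", "queue"]),
]))
```

Here the "monitoring" cookbook from the scheduler spec replaces the one from the common spec because scheduler is defined later.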

jetpack download

To download a blob within a cluster-init script, use the command jetpack download (filename) to pull it from the blobs directory. Running this command from a cluster-init script will determine the project and base URL for you. To use it in a non-cluster-init context, you will need to specify the project (see --help for more information).

For chef users, a jetpack_download LWRP has been created:

jetpack_download "big-file1.tgz" do
  project "my-project"
end

In chef, the default download location is #{node[:jetpack][:downloads]}. To change the file destination, use the following:

jetpack_download "foo.tgz" do
  project "my-project"
  dest "/tmp/download.tgz"
end

When used within chef, you must specify the project.