rxPingNodes: Simple Test of Compute Cluster Nodes
This function provides a simple test of the compute context's ability to perform a round trip through one or more computation nodes.
rxPingNodes(computeContext = rxGetOption("computeContext"), timeout = 0, filter = NULL)
computeContext: A distributed compute context. The RxSpark context is supported. See the details section for more information.
timeout: A non-negative integer value. Real-valued numbers are cast to integers. This parameter sets the total (wall-clock) time, in seconds, allowed for attempting a ping. The timer does not start until the system has determined which nodes are unavailable (meaning down, unreachable, or not usable for jobs, such as scheduler-only nodes on LSF). At least one attempt to complete a ping is made regardless of the setting of timeout. If the default value of 0 is used, there is no timeout.
filter: NULL, or a character vector containing one or more of the ping states. If NULL, no filtering is performed. If a character vector of ping states is provided, only those nodes found to be in the listed states are returned.
This function provides an application level "ping" to one or more nodes in a cluster or cloud. To this end, a trivial job is launched for each node to be pinged. A successful ping means that communication between the end user and the scheduler and communication between the end user and the shared data directory was successful; that R was launched and ran a trivial function successfully on the host being pinged; and that the results were returned successfully.
While this function does not test certain aspects of the messaging required for HPA functions (e.g., rxSummary), it does allow the user to easily test the majority of the end-to-end job functionality within a supported cloud or cluster.
The compute context provided is used to determine the cluster or queue that is to be pinged. The nodes to be pinged are determined in the usual fashion: NULL in the nodes field indicates use of all nodes; values in the queue field cause the set of nodes checked to be the intersection of all the nodes in the queue specified and the nodes explicitly listed in the nodes parameter; and so forth. Note that for clusters and clouds that have a head node, computeOnHeadNode is respected. For more information, see the compute context constructors or rxGetNodeInfo.
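The node-selection rule described above amounts to a set intersection. The base R sketch below illustrates the idea with made-up node names. `queueNodes`, `requestedNodes`, and `resolveNodes` are hypothetical names used only for illustration; the actual resolution is performed by the compute context and the scheduler and may differ in detail.

```r
# Hypothetical inputs: the nodes belonging to the specified queue, and the
# nodes listed explicitly in the compute context's nodes field.
queueNodes     <- c("node-01", "node-02", "node-03", "node-04")
requestedNodes <- c("node-02", "node-04", "node-09")

# Illustrative helper (not part of RevoScaleR): NULL means "no restriction",
# mirroring the convention that NULL in the nodes field selects all nodes.
resolveNodes <- function(queueNodes, requestedNodes = NULL) {
  if (is.null(requestedNodes)) {
    return(queueNodes)
  }
  intersect(queueNodes, requestedNodes)
}

resolveNodes(queueNodes, requestedNodes)  # nodes present in both sets
resolveNodes(queueNodes)                  # every node in the queue
```

A node named in the nodes field but absent from the queue (here `node-09`) simply drops out of the set to be pinged.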
Most other values in the compute context are respected when determining how a ping will be sent. The following fields in particular are of note when using this tool:
priority: May be used to allow the ping jobs to run sooner than other longer running jobs.
exclusive: Should usually be avoided.
autoCleanup: Should almost always be set to TRUE; however, may be of use to a system administrator diagnosing a problem.
An object of type rxPingResults. This is essentially a list in which each component is named using an rxMakeRNodeNames-translated node name, in the same manner and for the same reasons described for rxGetNodeInfo with the getWorkersOnly parameter set to FALSE. Each component of this list contains two elements: nodeName, which holds the true, unmangled name of the node, and status, which contains a character scalar with one of the following values:
unavail: The node failed its scheduler-level check prior to an attempt to actually ping the node. This does not necessarily mean that the node is not functional; rather, it only means that it cannot support having a job run on it.
success: The round trip job was a success.
failedJob: The scheduler failed the job. This could be due to permissions, corrupt libraries, or a problem relating to the GUID directory.
failedSession: The R process on the worker host was started, but failed.
timeout: The ping was sent, but a response was never received. This could be due to a problem with the installation, other long running jobs being queued ahead of the ping job, or a system failure.
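To make the shape of the return value concrete, the sketch below builds a plain list that mimics the documented layout (one component per node, each holding nodeName and status) and groups the true node names by state. The list `mockPing`, its node names, and its component names are invented for illustration; it is not a real rxPingResults object, and the status strings are the ones that appear in this page's examples.

```r
# Mock of the documented layout. Component names stand in for the
# translated node names; every value here is made up.
mockPing <- list(
  node_01 = list(nodeName = "node-01", status = "success"),
  node_02 = list(nodeName = "node-02", status = "failedJob"),
  node_03 = list(nodeName = "node-03", status = "unavail"),
  node_04 = list(nodeName = "node-04", status = "success")
)

# One status string and one true node name per component.
statuses  <- vapply(mockPing, function(node) node$status, character(1))
nodeNames <- vapply(mockPing, function(node) node$nodeName, character(1))

# Group the true node names by the state each node reported.
byState <- split(unname(nodeNames), statuses)
byState
```

Grouping by state in this way gives a quick overview of which nodes need attention before a larger job is submitted.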
An as.vector method is provided for the rxPingResults object. It returns a character vector of the non-mangled (rxMakeRNodeNames translated) node names for use in another compute context, filtered by the filter parameter originally provided.
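The idea behind the as.vector method can be approximated in base R. The sketch below is a stand-in written against a mock list, not the RevoScaleR implementation: `mockPing` and `pingNodeNames` are hypothetical names, and the real method's behavior may differ.

```r
# Mock results in the documented layout; all values are made up.
mockPing <- list(
  node_01 = list(nodeName = "node-01", status = "success"),
  node_02 = list(nodeName = "node-02", status = "failedSession"),
  node_03 = list(nodeName = "node-03", status = "success")
)

# Illustrative helper: return the true node names, optionally restricted
# to nodes whose status is one of `states` (NULL keeps every node).
pingNodeNames <- function(results, states = NULL) {
  keep <- if (is.null(states)) {
    rep(TRUE, length(results))
  } else {
    vapply(results, function(node) node$status %in% states, logical(1))
  }
  unname(vapply(results[keep], function(node) node$nodeName, character(1)))
}

# Names of the nodes that completed the round trip.
healthyNodes <- pingNodeNames(mockPing, states = "success")
healthyNodes
```

A vector such as `healthyNodes` is the kind of value that could then be supplied as the node list of another compute context.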
The rxPingResults object has a logical attribute associated with it: allOk. This attribute is set to TRUE if all of the pinged nodes' states (after filtering) were set to "success". Otherwise, this attribute is set to FALSE.
Microsoft Technical Support
## Not run:
# Attempts to ping all the nodes for the current compute context.
rxPingNodes()

# Pings all the nodes from myCluster, returning values only for those that are
# currently not operational
rxPingNodes(myCluster, filter = c("unavail", "failedJob", "failedSession"))

# Pings all the nodes from myCluster; times out after 2 minutes
rxPingNodes(myCluster, timeout = 120)

# Extract the allOk attribute from the return value
attr(rxPingNodes(), "allOk")
## End(Not run)
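The allOk attribute can serve as a simple gate before submitting real work. The following sketch runs without a cluster by attaching the attribute to a mock list according to the documented rule; `mockPing` and `checkCluster` are hypothetical names, and in practice the attribute would come from the object returned by rxPingNodes.

```r
# Mock results in the documented layout; all values are made up.
mockPing <- list(
  node_01 = list(nodeName = "node-01", status = "success"),
  node_02 = list(nodeName = "node-02", status = "failedJob")
)

# Mirror the documented rule: TRUE only when every state is "success".
statuses <- vapply(mockPing, function(node) node$status, character(1))
attr(mockPing, "allOk") <- all(statuses == "success")

# Illustrative helper: report whether it looks safe to proceed.
checkCluster <- function(results) {
  if (isTRUE(attr(results, "allOk"))) {
    "all nodes responded"
  } else {
    "at least one node needs attention"
  }
}

checkCluster(mockPing)
```

Using `isTRUE()` treats a missing attribute the same as FALSE, which is a deliberately cautious choice in this sketch.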