Collision on non-unique "headnodehost" hostname across HDInsight clusters

Anonymous
2021-03-03T10:10:10.343+00:00

The context:
Suppose I have multiple HDInsight 4.0 clusters, and I would like to access the Hadoop services (e.g., the JobHistory server) running inside them. I fetch the corresponding JobHistory address from each cluster via the Ambari client configuration API; to be precise, I read the value of the "mapreduce.jobhistory.address" Hadoop property.

Ambari answers back with the string "headnodehost:10020". This is fine, one might guess, if I'm on a cluster node, since every node has an /etc/hosts file that knows the "headnodehost" hostname:
[Screenshot: /etc/hosts on a cluster node, where "headnodehost" is listed as an alias of the head node's internal IP]
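For readers who cannot see the screenshot, the relevant entries look roughly like this. The IP addresses are made up; only the cluster-specific DNS suffix matches my example further down:

```
10.0.0.18  hn0-hdi101.iuyf3i2yrrvetpdqnyswcj2c3b.fx.internal.cloudapp.net  hn0-hdi101  headnodehost
10.0.0.21  hn1-hdi101.iuyf3i2yrrvetpdqnyswcj2c3b.fx.internal.cloudapp.net  hn1-hdi101
```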

But I'm not on a cluster node! I have a setup that registers all of the unique hostnames of every HDInsight cluster's nodes, so I can reach them from my own node; in other words, I'm on a node that has network connectivity to all of my HDInsight clusters. As you would guess, this is where things get complicated: what should I do with "headnodehost"? I cannot use the returned "headnodehost" hostname to establish TCP/IP connectivity, simply because every one of my HDInsight clusters has one, and it resolves to a different internal IP in each cluster. One might say I could look up the unique hostname alternative for that very same node, such as "hn0-hdi101.iuyf3i2yrrvetpdqnyswcj2c3b.fx.internal.cloudapp.net" or "hn0-hdi101", and use that for TCP/IP, but my automation (and the client libraries) rely on the "mapreduce.jobhistory.address" Hadoop property, as well as on the following properties fetched from the cluster via Ambari, so this approach would be a bottomless rabbit hole (a client-side rewrite along these lines is sketched after the screenshot):

[Screenshot: further Hadoop properties fetched via Ambari whose values also refer to "headnodehost" (JobHistory, Timeline Service, and related service addresses)]
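To illustrate why this is a rabbit hole, here is roughly what such a client-side workaround would look like. This is only a sketch under assumptions: the per-cluster FQDN mapping and the affected property set are hypothetical and would have to be discovered and maintained by hand for every cluster:

```python
# Hypothetical per-cluster mapping from the "headnodehost" alias to the
# cluster's unique head node FQDN; it has to be kept in sync manually,
# including across head node failovers.
HEADNODE_FQDN = {
    "hdi101": "hn0-hdi101.iuyf3i2yrrvetpdqnyswcj2c3b.fx.internal.cloudapp.net",
    # one entry per cluster, forever maintained by hand
}

def rewrite_headnodehost(properties: dict, cluster: str) -> dict:
    """Replace the "headnodehost" alias with the cluster's unique head node
    FQDN in every property value, since the alias only resolves in-cluster."""
    fqdn = HEADNODE_FQDN[cluster]
    return {k: v.replace("headnodehost", fqdn) for k, v in properties.items()}

# Example: every property fetched from Ambari has to pass through this
# filter before any client library sees it.
props = {"mapreduce.jobhistory.address": "headnodehost:10020"}
print(rewrite_headnodehost(props, "hdi101"))
```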

My questions:

  • Is it possible to provision an HDInsight cluster in such a way that the JobHistory service is configured with one of the unique hostnames?
  • Alternatively, is it possible to make the "headnodehost" alias globally unique, e.g. by prefixing it with the cluster ID ("hdi101.headnodehost") and configuring that for Hadoop services such as the JobHistory server during cluster creation? Keeping the plain "headnodehost" entry in the cluster's /etc/hosts would additionally maintain backward compatibility for existing applications.

1 answer

  1. Anonymous
    2021-03-29T22:13:24.28+00:00


    Hi KranthiPakala-MSFT,

    I think it might be better to give a very direct suggestion to illustrate what I want to achieve here. (Sorry for this approach; I hope it isn't inappropriate. I'm doing it with good intent, to give you a different perspective.)

    • Please configure the JobHistory and Timeline Service services so that the above-mentioned properties refer to <clustername>.headnodehost (following my example above, this would result in e.g. mapreduce.jobhistory.address=hdi101.headnodehost:10020). It is important that Ambari report this value (see the sketch after this list).
    • Please add the <clustername>.headnodehost alias to the networking setup. You might as well maintain its IP address during failovers, just as you do for the headnodehost alias. This would result in an additional, externally referable, unique hostname alias for the master node that won't collide even if I happen to have multiple HDInsight 4.0 clusters in the network at any given point in time.
    • You can keep all the existing hostname aliases as indicated in my screenshot (including the headnodehost alias, to maintain backward compatibility on your side).
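    To make the end state concrete, this is roughly what I would expect a cluster to report and resolve after such a change; the IP address and the exact alias list are illustrative:

```
# Reported by Ambari in mapred-site:
mapreduce.jobhistory.address=hdi101.headnodehost:10020

# Resolvable both on the cluster and from peered networks (via /etc/hosts or DNS):
10.0.0.18  hn0-hdi101.iuyf3i2yrrvetpdqnyswcj2c3b.fx.internal.cloudapp.net  hn0-hdi101  headnodehost  hdi101.headnodehost
```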

    Cheers
