Apache HBase/Phoenix - Tips, Tricks & Best Practices in HDInsight
We will keep this page updated with commonly asked questions about HBase and Phoenix on HDInsight. You can leave comments or questions on this blog. The official channel for HDInsight feedback and feature requests is here.
What is the advantage of using HBase in Azure HDInsight?
Can't wait. Can you give me a quick link to deploy an HBase cluster in HDInsight?
How can I deploy OpenTSDB with HDInsight HBase?
Sure, check this out.
So, I just got HBase up and running in HDInsight and want to test the performance without writing any code. How can I "take HBase for a spin"?
SSH into your cluster and run:
hbase org.apache.hadoop.hbase.PerformanceEvaluation sequentialWrite 1
The PerformanceEvaluation tool accepts a number of commands and options; run hbase org.apache.hadoop.hbase.PerformanceEvaluation with no arguments to see all of them.
Now open the HBase shell and type list. You will see a new table (named TestTable by default), and you can experiment with many more options.
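If you want to script several benchmark runs, it helps to see how the command line is put together: options come first, then the command name, then the number of clients. The sketch below is a hypothetical helper (`pe_command` is not part of HBase) that only builds the argument list; `--nomapred` (run the clients as threads instead of a MapReduce job) and `--rows` (rows per client) are real PerformanceEvaluation options, but the set of commands it accepts is deliberately a small subset.

```python
import shlex

# A subset of PerformanceEvaluation commands; the real tool supports more
# (run it with no arguments to list them all).
KNOWN_COMMANDS = {"sequentialWrite", "sequentialRead", "randomWrite", "randomRead", "scan"}


def pe_command(command, clients=1, rows=None, nomapred=False):
    """Build the argv for an HBase PerformanceEvaluation run (hypothetical helper)."""
    if command not in KNOWN_COMMANDS:
        raise ValueError(f"unsupported command: {command}")
    if clients < 1:
        raise ValueError("clients must be >= 1")
    argv = ["hbase", "org.apache.hadoop.hbase.PerformanceEvaluation"]
    # Options must precede the command name.
    if nomapred:
        argv.append("--nomapred")  # run clients as threads, not as a MapReduce job
    if rows is not None:
        argv.append(f"--rows={rows}")  # number of rows each client writes/reads
    argv += [command, str(clients)]
    return argv


print(shlex.join(pe_command("sequentialWrite", clients=1, rows=10000, nomapred=True)))
# prints: hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --rows=10000 sequentialWrite 1
```

The printed string can be pasted into the SSH session; passing the list to `subprocess.run` on the cluster head node would run it directly.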
Are there free online training courses on HDInsight and HBase?
Yes, check this out
Great! I am an enterprise customer and want to secure the cluster inside a virtual network. How can I do that?
Please follow the article here.
I really need to lock down the VNET. Which IP addresses and ports does Azure need to operate the service?
If you need to install HDInsight into a secured Virtual Network, you must allow inbound access over port 443 for the following IP addresses, which allow Azure to manage the HDInsight cluster.
22.214.171.124 126.96.36.199 188.8.131.52 184.108.40.206
Allowing inbound access on port 443 from these addresses lets you install HDInsight into a secured virtual network.
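To make the rule concrete, here is a minimal model of a single "allow inbound TCP 443 from these sources" rule using Python's `ipaddress` module. It is only a sketch: the addresses are hypothetical RFC 5737 documentation placeholders, not the Azure management IPs, and a real network security group evaluates several prioritized rules (including Azure's default rules) rather than this one check.

```python
import ipaddress

# Hypothetical placeholders from the RFC 5737 documentation ranges; substitute
# the Azure management addresses that apply to your cluster.
ALLOWED_SOURCES = [ipaddress.ip_network(n) for n in ("192.0.2.10/32", "198.51.100.0/24")]
MANAGEMENT_PORT = 443


def inbound_allowed(source_ip: str, dest_port: int) -> bool:
    """Model of one allow rule: TCP to port 443 from a listed source network."""
    ip = ipaddress.ip_address(source_ip)
    return dest_port == MANAGEMENT_PORT and any(ip in net for net in ALLOWED_SOURCES)


print(inbound_allowed("192.0.2.10", 443))   # True: listed source, management port
print(inbound_allowed("192.0.2.10", 22))    # False: listed source, other port
print(inbound_allowed("203.0.113.5", 443))  # False: source not in the allowlist
```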
Enough playing around. Give me a few best practices for great HBase performance.
There are some inconsistencies when I run “hbase hbck”. When I then run “sudo -u hbase<or hdfs> hbase hbck -repair”, it reports access denied on folders in Azure Data Lake Store.
Try adding “-ignorePreCheckPermission” as a command parameter, for example:
hbase hbck -ignorePreCheckPermission
I have Hive and HBase clusters in the same VNET. How can I access an HBase table from Hive?
In the example below, the HBase table 'TestTable' is mapped to the Hive table 'hive_table'.
Step 1 - Open the Hive shell, passing the ZooKeeper quorum and parent znode of the HBase cluster as shown below
hive --hiveconf hbase.zookeeper.quorum=zk0-xxxx.xxxxxxxxxxxxxxxxxxxxxxx.cx.internal.cloudapp.net,zk1-xxxx.xxxxxxxxxxxxxxxxxxxxxxx.cx.internal.cloudapp.net,zk2-xxxx.xxxxxxxxxxxxxxxxxxxxxxx.cx.internal.cloudapp.net --hiveconf zookeeper.znode.parent=/hbase-unsecure
Step 2 - Map the Hive table to the HBase table
hive> CREATE EXTERNAL TABLE hive_table(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,info:0") TBLPROPERTIES ("hbase.table.name" = "TestTable");
Step 3 - Get the data
hive> select * from hive_table;
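The key piece of Step 2 is `hbase.columns.mapping`: it is positional, so entry i describes Hive column i, `:key` marks the column backed by the HBase row key, and every other entry is an HBase `family:qualifier`. The sketch below is a hypothetical helper (`hive_hbase_ddl` is not a Hive API) that generates the DDL from that description; it always places the row key first, which Hive does not strictly require. Called with the names from this example, it reproduces the Step 2 statement exactly.

```python
def hive_hbase_ddl(hive_table, hbase_table, key_col, columns):
    """Build a CREATE EXTERNAL TABLE statement mapping a Hive table onto an HBase table.

    key_col: (name, hive_type) for the column backed by the HBase row key.
    columns: list of (name, hive_type, "family:qualifier") for the remaining columns.
    """
    for _, _, fq in columns:
        if ":" not in fq or fq.startswith(":"):
            raise ValueError(f"expected 'family:qualifier', got {fq!r}")
    hive_cols = [f"{key_col[0]} {key_col[1]}"] + [f"{name} {typ}" for name, typ, _ in columns]
    # Positional mapping: ':key' lines up with the first Hive column in this sketch,
    # and each later entry lines up with the Hive column at the same position.
    mapping = ",".join([":key"] + [fq for _, _, fq in columns])
    return (
        f"CREATE EXTERNAL TABLE {hive_table}({', '.join(hive_cols)}) "
        "STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' "
        f'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "{mapping}") '
        f'TBLPROPERTIES ("hbase.table.name" = "{hbase_table}");'
    )


print(hive_hbase_ddl("hive_table", "TestTable", ("key", "int"), [("value", "string", "info:0")]))
```

Adding more `(name, type, "family:qualifier")` tuples extends the mapping in step with the column list, which avoids the common mistake of a mapping string whose entry count does not match the Hive columns.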
Running hbase hbck shows multiple regions not assigned and holes in the region chain
Symptoms: the HBase Master UI shows region counts that are not balanced across the nodes, and hbck reports multiple unassigned regions and holes in the region chain. To fix this:
1. Run hbase zkcli
2. rmr /hbase/regions-in-transition (or rmr /hbase-unsecure/regions-in-transition)
3. Exit hbase zkcli (type quit)
4. Restart Active HMaster from Ambari
5. Run hbase hbck again to confirm the issue is fixed (no unassigned regions and no holes).
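What does a "hole in the region chain" mean? A table's regions, sorted by start key, should tile the whole key space: the first starts at the empty key, each one ends exactly where the next begins, and the last ends at the empty key. The toy model below checks exactly that, plus which regions have no server and how many regions each server holds. It is an illustration of the concept only, assuming a hypothetical `(start_key, end_key, server)` tuple format; it is not how hbck is implemented (hbck cross-checks hbase:meta, HDFS and the region servers).

```python
from collections import Counter


def check_region_chain(regions):
    """Toy check of an HBase table's region chain.

    regions: list of (start_key, end_key, server) tuples. Keys are bytes; b"" as a
    start key means "beginning of the table" and b"" as an end key means "end of
    the table". server is None for a region not assigned to any server.
    Returns (problems, unassigned, regions_per_server).
    """
    ordered = sorted(regions, key=lambda r: r[0])
    problems = []
    if ordered and ordered[0][0] != b"":
        # Nothing covers the keys before the first region's start key.
        problems.append(("hole", b"", ordered[0][0]))
    for (_, end, _), (next_start, _, _) in zip(ordered, ordered[1:]):
        if end == b"" or end > next_start:
            # This region runs past the start of the next one (b"" = to the end).
            problems.append(("overlap", next_start, end))
        elif end < next_start:
            # Keys in [end, next_start) belong to no region.
            problems.append(("hole", end, next_start))
    if ordered and ordered[-1][1] != b"":
        # Nothing covers the keys after the last region's end key.
        problems.append(("hole", ordered[-1][1], b""))
    unassigned = [(start, end) for start, end, server in ordered if server is None]
    per_server = Counter(server for _, _, server in ordered if server is not None)
    return problems, unassigned, per_server


problems, unassigned, per_server = check_region_chain(
    [(b"", b"c", "rs1"), (b"c", b"m", "rs1"), (b"p", b"", None)]
)
print(problems)  # [('hole', b'm', b'p')]
```

In the example, keys from `m` up to `p` belong to no region, and the last region is unassigned; a healthy table returns no problems and an even spread in `per_server`.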
How can I fine tune Apache Phoenix?
The community has built an excellent guide. Check it out here.