Broadcast join

Today, regular joins are executed on a single cluster node. Broadcast join is an execution strategy of join that distributes the join over cluster nodes. This strategy is useful when the left side of the join is small (up to several tens of MBs). In this case, a broadcast join is more performant than a regular join.

Use the lookup operator if the right side is smaller than the left side. The lookup operator runs in broadcast strategy by default when the right side is smaller than the left.

Note

If the left side of the join is larger than several tens of MBs, the query will fail.

You can run the following query to estimate the size of the left side, in bytes:

leftSide
| summarize sum(estimate_data_size(*))

If left side of the join is a small dataset, then you may run join in broadcast mode using the following syntax (hint.strategy = broadcast):

leftSide 
| join hint.strategy = broadcast (factTable) on key

The performance improvement is more noticeable in scenarios where the join is followed by other operators such as summarize. See the following query for example:

leftSide 
| join hint.strategy = broadcast (factTable) on Key
| summarize dcount(Messages) by Timestamp, Key