Architectural model

Completed

When GraphLab is launched on a cluster, one instance of its engine is started on each machine, as we described earlier. All engine instances are symmetric; that is, they have the same capability. Moreover, they all communicate directly with each other using a customized, asynchronous Remote Procedure Call (RPC) protocol over TCP/IP. The first triggered engine instance, however, will have the additional responsibility of serving as monitor/master engine, although instances on other machines will still work and communicate directly without coordination from the master. This arrangement makes GraphLab a peer-to-peer system.1 The master engine computes the atom mapping based on the atom index file and instructs corresponding engine instances to load their assigned atoms. Subsequently, all instances load their atoms in parallel. Each engine then executes the journal in each of its assigned atoms, generating a partition of the input graph, and runs the user-defined program specified in Algorithm 1. As each engine loads its atoms, generates its partitions, executes code, and schedules vertices without polling the master engine, GraphLab employs a push-based, thread-scheduling strategy. As a consequence of employing a symmetric design, GraphLab exposes high scalability and precludes centralized bottlenecks and single points of failure (SPOFs), unlike both Hadoop MapReduce 1.0 and Pregel.


1 Recall that in peer-to-peer systems, a master may be adopted, but only for purposes like monitoring the system and injecting commands. Processes in such systems work and communicate directly without having to contact master machines.

Check your knowledge

1.

What kind of system is GraphLab?

2.

What strategy does GraphLab use to schedule vertices?