place, and the resource manager has already been identified. If a conversion request is made on the node managing the resource, no messages need be exchanged. If the resource manager is not the node on which the request is being made, either one or two messages are required. For example, in some cases in which the requested mode is compatible with the granted mode, the request
VAXcluster Systems
The VAX/VMS Distributed Lock Manager
[Figure 3: New Root-lock Request When a Resource Manager Exists. Node A is the resource manager; node B is the directory node; node C issues the request. The figure's callouts:
1. When a new root-lock request is received, local copies of the resource block and lock block are created.
2. A message requesting the lock is then sent to the directory node.
3. The response indicates that node A is currently the resource manager.
4. The lock request is again sent to node A. A master-copy lock block is created on the resource manager and linked to the resource block.
5. A granted response is returned.
Key: resource block; directory entry for resource (implemented as a resource block); lock block.]
can be unilaterally granted, and a single message sent to notify the resource manager of the change. In others, the resource manager must make a decision based on the other requests that are granted. A request is then sent to the resource manager, who must respond. In all cases, no communications are required with the directory node. Figure 6 illustrates a conversion request.
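The message-count rules for conversions can be summarized in a short sketch. This is an illustrative model of the accounting described above, not VMS code; the function name and its boolean parameters are invented for the example.

```python
def conversion_messages(on_manager: bool, unilateral: bool) -> int:
    """Messages needed for one lock-conversion request.

    Per the rules in the text:
    - on the resource manager itself, no messages are exchanged;
    - if the new mode is compatible enough to grant unilaterally,
      one message notifies the manager of the change;
    - otherwise the manager must decide: a request plus a response.
    The directory node is never involved in a conversion.
    """
    if on_manager:
        return 0
    if unilateral:
        return 1
    return 2

assert conversion_messages(on_manager=True, unilateral=False) == 0
assert conversion_messages(on_manager=False, unilateral=True) == 1
assert conversion_messages(on_manager=False, unilateral=False) == 2
```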
Operation During Periods of Resource Contention
The operation is slightly more complicated during periods of contention. When a resource manager receives a lock request that cannot be granted because an incompatible lock exists, two actions are required. First, all holders of incompatible locks that have indicated a desire to receive blocking ASTs must be notified that a process is waiting. To accomplish this, a message is sent to each node where a lock holder resides. The process holding the lock is notified only once, even though it may be blocking multiple lock requests. Second, the requester of the lock must be told to wait; this is accomplished by sending a response to the lock request. When the blocking lock is later released, a message is sent to each waiting requester indicating that the lock is now granted. Table 4 summarizes the numbers of messages used for different types of lock requests.
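The two actions taken under contention can be sketched as follows. This is a simplified model, not the VMS implementation; the data structures and message tuples are invented for illustration.

```python
# Model of a resource manager handling a request that conflicts with
# granted locks: notify each holder that asked for a blocking AST
# (at most once per process), then tell the requester to wait.
from collections import namedtuple

Lock = namedtuple("Lock", "holder_node process wants_blocking_ast")

def handle_conflicting_request(granted_locks, messages):
    notified = set()
    for lock in granted_locks:
        if lock.wants_blocking_ast and lock.process not in notified:
            notified.add(lock.process)
            # one message to the node where the blocking holder resides
            messages.append(("blocking-ast", lock.holder_node, lock.process))
    messages.append(("wait-response",))  # the requester must wait
    return messages

msgs = handle_conflicting_request(
    [Lock("A", "p1", True),   # p1 holds two blocking locks ...
     Lock("A", "p1", True),   # ... but is notified only once
     Lock("B", "p2", False)], # p2 declined blocking ASTs
    [])
assert msgs == [("blocking-ast", "A", "p1"), ("wait-response",)]
```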
Digital Technical Journal No. 5 September 1987
[Figure 4: A Sublock Request on a Node that Is Not the Resource Manager. Node A is the resource manager; node B is the directory node; node C issues the request. The figure's callouts:
1. When a sublock request is received, a lock block is created. If this is the first lock on the subresource, a resource block is also created.
2. The request is sent to the resource manager. No directory lookup is required.
3. If locks already exist on the subresource, only a lock block is created. Otherwise, both a lock block and a resource block are created.
4. A granted response is returned.
Key: resource block; directory entry for resource (implemented as a resource block); lock block.]
Scaling Behavior of the Distributed Lock Manager
It can be shown that the number of messages required for any locking operation is bounded by a small constant that is independent of the number of nodes, or cluster size, in a VAXcluster system. This section addresses how the size of the data representing the locking state and the total number of locking messages vary with a cluster's size.
The distributed lock manager uses a fixed-size control block to represent both a lock and a lock request. An instance of this control block exists on the node requesting the lock. If the resource manager is a different node, another instance exists on the resource manager. A resource is represented by another fixed-size control block. An instance of this control block exists on each node requesting the lock, on the resource manager, and on the directory node. Whenever any of these categories overlap (i.e., requester, resource manager, and directory node), only one instance of the control block is present. The control blocks for locks and resources are dynamically allocated and deallocated.
At least one lock is represented for every resource represented. Conversely, a resource is represented for every lock represented. For each lock, the upper bound on the storage requirements is two lock control blocks and three resource control blocks. This upper bound is usually quite loose and does not depend on a cluster's size.
[Figure 5: Unlock Request for the Last Remaining Lock on a Root Resource. Node A is the resource manager; node B is the directory node. The figure's callouts:
1. When an unlock request is received for a root lock, the lock block is deallocated. If this is the last lock on the resource, the resource block is also deallocated.
2. A message is sent to the resource manager. No response is required.
3. The resource manager deallocates the lock block. If this is the last lock on the resource, the resource block is also deallocated.
4. A message is sent to the directory node.
5. The directory entry is removed.
Key: resource block; directory entry for resource (implemented as a resource block); lock block.]
VAXcluster applications are typically designed so that their algorithms do not change as the size of the cluster changes. Therefore, an instance of a typical application running on one node exhibits a behavior with respect to the number of outstanding locks and the frequency of locking operations that is independent of the number of additional instances of that application running on the same or other nodes. If multiple instances of the application are running, the number of outstanding locks and the frequency of locking operations increase in proportion to the number of copies of the application, independent of the cluster size.
Both the number of messages per locking operation and the storage requirements for a lock are bounded by constants that are independent of the cluster size. Therefore, the rate at which messages must be exchanged and the total storage required to represent the locking state are proportional to the number of instances of the application that are running, which is also independent of the cluster's size. If the number of instances of the application is proportional to the cluster size, the rate of message exchange and the total storage required to represent the locking state are both bounded by a constant times the cluster size.
This argument is also valid when multiple instances of each of several applications are present.
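The scaling argument can be restated numerically. All constants below are illustrative assumptions (the article gives no specific per-operation message bound); the point is only that the totals scale with application instances, not with node count.

```python
# Per-operation messages are bounded by a constant, so the total message
# rate is (instances) x (ops/sec per instance) x (constant bound).
MSGS_PER_OP_MAX = 4  # assumed constant bound on messages per operation

def total_message_rate(instances: int, ops_per_sec_per_instance: int) -> int:
    return instances * ops_per_sec_per_instance * MSGS_PER_OP_MAX

# Doubling the node count with a fixed workload leaves the bound unchanged;
# doubling the application instances doubles it.
r8 = total_message_rate(instances=8, ops_per_sec_per_instance=100)
r16 = total_message_rate(instances=16, ops_per_sec_per_instance=100)
assert r8 == 3200 and r16 == 2 * r8
```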
[Figure 6: Conversion Request on a Node that Is Not the Resource Manager. Node C is the resource manager; node B is the directory node; node A issues the request. The figure's callouts:
1. A conversion request is received.
2. The request is sent to the resource manager.
3. The request is granted.
4. A granted response is returned.
Note: conversion requests on the resource manager require no messages.
Key: resource block; directory entry for resource (implemented as a resource block); lock block.]
These characteristics of the distributed lock manager (i.e., total space and message traffic behavior that is subject to a linear bound in the "workload") are a significant factor in allowing VAXcluster systems to act as distributed operating systems. These characteristics suggest that, from the distributed lock manager's viewpoint, additional growth in the size of VAXcluster configurations is certainly viable.
Performance Aspects of the Distributed Lock Manager
Table 5 summarizes the performance of the distributed lock manager. The measurements reflect operations that are normally done in pairs. Such operations include a $ENQ followed by a $DEQ, and a conversion to a more restrictive mode (up) followed by a conversion to a less restrictive mode (down). The operations reported in the table are performed on sublocks.
When Processors Join or Leave the VAXcluster System
The connection manager plays a major role in the lock manager's ability to deal with configuration changes when one or more nodes join or leave the VAXcluster system. When the membership of the cluster must be altered, a coordinator node is elected to lead the other nodes through the state transition. Any node can become the coordinator.
[Table 4 (messages per type of lock request) was flattened here; only fragments survive:
- Sublock request on resource manager: 0 messages.
- Unlock request on resource manager with ...
- Sublock requests and subsequent root-lock requests: no messages from a system that is the directory node; otherwise two messages, a directory-lookup request followed by a "do local" response.
- Remove-directory-entry message sent to directory node; no message sent if manager ...
- Dequeue message to manager; the manager may then send a remove-directory message to ...]
[Table 5 header fragment: Using the Computer Interconnect (CI780), ENQ + DEQ ...]
The other members have the option of agreeing or disagreeing with the proposed configuration. They will disagree if they can construct a better configuration based on the number of nodes they can communicate with and on the assignment of votes to those nodes. The resulting VAXcluster system can consist only of a strongly connected group of nodes in which every node has a connection to each of the others.
In case of disagreement, the coordinator backs down; disagreements are quickly resolved so that the node that can put together the best configuration becomes the coordinator. At this point, the new configuration has been described to all nodes and they have agreed; therefore, commit messages are sent.
Thus the connection manager is able to provide the distributed lock manager with a consistent view of the processors that are members of the VAXcluster system. The connection manager can also ensure that the vectors used to identify the directory node for a given resource are identical on all nodes. In addition, the manager assigns a unique identifier, called the cluster system ID (CSID), to each processor admitted into the VAXcluster system.
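The role of those identical vectors can be illustrated with a sketch. This is not the VMS algorithm: the hash function and vector layout are assumptions, chosen only to show why agreement on the vector lets every node compute the same directory node without exchanging messages.

```python
# Each node hashes a resource name into a shared vector of cluster
# system IDs (CSIDs); because the vector is identical everywhere, all
# nodes independently pick the same directory node for a resource.
import zlib

def directory_node(resource_name: bytes, csid_vector: list) -> int:
    index = zlib.crc32(resource_name) % len(csid_vector)
    return csid_vector[index]

vector = [0x10001, 0x10002, 0x10003]  # hypothetical CSIDs
a = directory_node(b"DISK$USER1", vector)
b = directory_node(b"DISK$USER1", vector)
assert a == b and a in vector  # every node computes the same answer
```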