Introduction
In the area of distributed programming there are several established approaches for solving the problem of communication between separate programs.
Within the wide range of solutions, from low-level socket operations to high-level, domain-specific information exchange systems, two "middle-level" approaches are particularly interesting: they hide implementation details while still offering a generic interface that can be deployed in a variety of application domains. These two solutions are RPC-oriented communication and messaging.
This article tries to highlight the essential difference between these two communication approaches.
The opportunities and traps of RPC
RPC, which stands for Remote Procedure Call, is a concept that tries to generalize a regular procedure invocation to the case where caller and receiver do not reside in the same process - and are potentially distributed across separate machines.
The essential goal of this approach is to make remote invocation as similar as possible to regular procedure calls and to hide the details of the physical connection.
This goal is very compelling in that it potentially turns the distribution of the final system into a deployment-time decision. In other words, from the programmer's perspective it does not matter whether a call is local or remote as long as it looks the same syntactically, and the final decision about the distribution of individual system components can be made later. Removing the distribution aspect from the code can be very beneficial for a project if the final details of its deployment are not fully known at its early stages.
The potential benefits of RPC have their costs, however, and they mainly come from the fact that the syntax of regular calls leaves no place for information that would be useful if the system is actually distributed. Regular procedure calls, as known from mainstream programming languages, provide only a simplified notion of the call that assumes several "facts":
- The mechanics of the actual call (that is, transfer of execution from the caller to the receiver) can be neglected in terms of its run-time costs.
Even though some programming languages, like C++, can make the programmer vigilant with respect to the potential cost of parameter passing, the whole idea of a procedure call is that execution simply transfers to the called subprogram or function and continues there until it returns to the point of call. This perception of a "free call" invites programmers to refactor their code into many smaller procedures and functions. That low cost, improved even further by compile-time inlining, fits well into languages that do not provide any means for monitoring the execution cost of the call.
- The recipient of the call is an inherent and logical part of the same system.
This assumption itself has several security and integrity consequences. First, many programming languages provide no security-related tools for managing procedure calls - every call is assumed to be safe. Second, in a statically typed language, the programmer is reassured by the compiler that the signature and the protocol of the call match on both sides. Neither of these assumptions is guaranteed to be true in a distributed system (see also the What Is Wrong With IDL article for a wider coverage of related problems).
The problem with RPC is that by hiding the fact of distribution at the syntax level, it makes it more difficult for the programmer to properly address the inherent challenges that come with the physical aspects of distribution.
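The effect can be illustrated with a minimal sketch. It is not taken from any particular RPC framework: the names LocalCalculator, CalculatorProxy and FakeTransport are hypothetical, and the "network" is simulated in-process so that round trips can be counted. The point is only that the calling code is identical in both cases, so nothing at the call site reveals that each small call has become a full request/reply exchange.

```python
import json


class LocalCalculator:
    """In-process implementation: a call is just a jump and a return."""

    def add(self, a, b):
        return a + b


class FakeTransport:
    """Hypothetical stand-in for a network link; it only counts round trips."""

    def __init__(self, target):
        self.target = target
        self.round_trips = 0

    def send(self, payload: bytes) -> bytes:
        self.round_trips += 1
        request = json.loads(payload)
        method = getattr(self.target, request["method"])
        result = method(*request["args"])
        return json.dumps({"result": result}).encode()


class CalculatorProxy:
    """RPC-style stub with the same signature as LocalCalculator.add."""

    def __init__(self, transport):
        self._transport = transport

    def add(self, a, b):
        payload = json.dumps({"method": "add", "args": [a, b]}).encode()
        reply = self._transport.send(payload)
        return json.loads(reply)["result"]


def sum_all(calc, values):
    """Caller code: written once, works with local and remote calculators."""
    total = 0
    for v in values:
        total = calc.add(total, v)  # looks free, may be a network round trip
    return total


transport = FakeTransport(LocalCalculator())
local_total = sum_all(LocalCalculator(), [1, 2, 3, 4])
remote_total = sum_all(CalculatorProxy(transport), [1, 2, 3, 4])
```

Both totals are the same, but the second run performs one simulated round trip per element. The signature add(a, b) has no place for a timeout, a priority or a credential, which is exactly the information a distributed call would need.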
Messaging as an alternative
Messaging as a communication concept is very different from RPC in that it does not attempt to hide the physical aspects of communication. It still tries to hide the implementation details, but not to the point of dismissing the notions related to the run-time costs of exchanging data.
Messaging is easily explained through its similarities to the e-mail system. The most important of these similarities is the fact that messages are recognized as first-class entities, and that users think of each message as something tangible that is acted upon. The focus here is not on hiding, where the communication and its challenges are not visible at all - but rather on encapsulation, where the fact of communication is exposed in a form that the user can interact with. In the e-mail analogy, a message is something that is not only transmitted, but something that can also be backed up or printed.
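As a small illustration of what "first-class entity" can mean in code, the sketch below models a message as an ordinary value. The Message class and its fields are assumptions made for this example and do not correspond to any specific messaging library. Because the message is a plain object, it can be printed, stored and restored independently of any transmission.

```python
import json
import uuid
from dataclasses import dataclass, field, asdict


@dataclass
class Message:
    """Hypothetical message type: a value the program can hold and inspect."""

    sender: str
    recipient: str
    body: dict
    message_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def to_json(self) -> str:
        # A textual form that can be written to a file as a back-up.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Message":
        return cls(**json.loads(text))


msg = Message("alice", "bob", {"text": "hello"})
backup = msg.to_json()                 # "backed up"
restored = Message.from_json(backup)   # recovered later, possibly elsewhere
print(msg)                             # "printed"
```

None of these operations involves sending anything, which is the difference from a procedure call: a call exists only while it executes, whereas a message exists as data before, during and after its delivery.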
The following is a non-exhaustive list of opportunities that can be, depending on implementation, exposed as regular functional features of a messaging system:
- Ability to manage and react to communication delays. A messaging system can have timeouts that are controlled with arbitrary granularity - even at the level of individual messages.
- Ability to monitor the progress (and to estimate the actual time) of physical data transfer.
- Message priorities that allow the transport layer to differentiate messages based on their importance.
- Message persistency - a messaging system might be able to store messages for reliability or to allow complete decoupling of senders and receivers in terms of their execution time. Back-ups also provide a possible application of message persistency.
- Exploratory and dynamic interpretation of message content - messages can have bodies that do not necessarily conform to any statically agreed structure, which provides more flexibility in managing evolution of a distributed system (again, see What Is Wrong With IDL for more details).
- Ability to adapt to both direct and indirect transport systems, including those with automated replication of content. In the e-mail analogy, mailing lists (with archives, thanks to message persistency) are good examples of indirect transport systems that do not require any extensions to the basic concepts of e-mail.
- Message tagging, meta-information and tracing - it is possible, for example, to obtain a full report of the transport path that was "visited" by the message until it finally reached its destination. Thanks to the meta-information, messages can also become parts of higher-level communication structures. In the e-mail analogy, so-called "threads" in discussion groups show a possible application of these concepts.
- Security-related features like digital signatures of data content and access tokens can be used on a per-message basis for detailed control of who can do what in the distributed system.
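A few of the features listed above (per-message deadlines, priorities and tracing) can be sketched in a few lines. The Envelope and Outbox names are hypothetical, the convention that a lower number means higher priority is an assumption of this example, and time is passed in explicitly so that the behaviour is deterministic.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass
class Envelope:
    """Hypothetical message wrapper carrying per-message meta-information."""

    body: str
    priority: int = 5                  # assumption: lower number = more urgent
    deadline: float = float("inf")     # absolute time after which it is stale
    trace: list = field(default_factory=list)  # nodes visited so far


class Outbox:
    """Priority-ordered queue that drops messages whose deadline has passed."""

    def __init__(self, node_name):
        self.node_name = node_name
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def put(self, env):
        env.trace.append(self.node_name)   # record this hop in the trace
        heapq.heappush(self._heap, (env.priority, next(self._counter), env))

    def get(self, now):
        """Return the most urgent message that has not expired, or None."""
        while self._heap:
            _, _, env = heapq.heappop(self._heap)
            if env.deadline >= now:
                return env
        return None


outbox = Outbox("gateway")
outbox.put(Envelope("nightly report", priority=9))
outbox.put(Envelope("stale quote", priority=1, deadline=10.0))
outbox.put(Envelope("alarm", priority=2, deadline=100.0))

first = outbox.get(now=50.0)    # the expired quote is skipped
second = outbox.get(now=50.0)
third = outbox.get(now=50.0)    # nothing left
```

With these inputs the alarm is delivered first, the report second, and the quote is never delivered because its deadline passed while it was queued. Each of these decisions is driven by data carried in the message itself rather than by the configuration of a hidden transport.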
The above list of possible features shows that what is considered to be a deployment detail in the RPC approach is turned into a wide range of functional opportunities in the messaging approach.
It is still possible to use messaging as a distributed implementation of the "call", and in fact object-oriented methodology uses the term "message" to refer to requests that can be sent between objects. The advantage of messaging, however, is that by exposing the fact of communication in the tangible form of the message as a first-class entity, the programmer gains opportunities for expansion into functional areas that are either uncomfortable within the constraints of a "procedure call mindset" or just not possible at all.
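One possible way of building a "call" on top of messages is the request/reply pattern, sketched below with in-process queues and a thread standing in for a remote service. The function names, the dictionary layout of the messages and the use of a correlation identifier are assumptions of this example, not a prescribed protocol. Unlike a plain procedure call, the caller states how long it is willing to wait and decides for itself what a missing reply means.

```python
import queue
import threading
import uuid

requests = queue.Queue()  # stands in for the channel to the service


def server():
    """Hypothetical service: answers requests until it receives None."""
    while True:
        msg = requests.get()
        if msg is None:
            break
        reply = {"correlation_id": msg["id"], "result": sum(msg["args"])}
        msg["reply_to"].put(reply)


def call(args, timeout):
    """Send a request message and wait for the matching reply."""
    reply_box = queue.Queue()
    msg = {"id": uuid.uuid4().hex, "args": args, "reply_to": reply_box}
    requests.put(msg)
    try:
        reply = reply_box.get(timeout=timeout)
    except queue.Empty:
        return None  # no reply in time; the caller chooses how to react
    if reply["correlation_id"] != msg["id"]:
        return None  # a reply to some other request; ignored in this sketch
    return reply["result"]


worker = threading.Thread(target=server)
worker.start()
answer = call([2, 3], timeout=1.0)           # service is running
requests.put(None)                           # stop the service
worker.join()
no_server_answer = call([1, 1], timeout=0.05)  # nobody answers any more
```

The first call returns a result; the second returns None after its timeout because the service has stopped. The request remains a message throughout, so it could also be given a priority, logged or signed without changing the shape of the calling code.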