Saturday, October 9, 2010

Distributed, Integrated or Networked?

Distributed is a very common adjective. I usually tell my students to be careful when using it to refer to anything that gets in a network, which is usually the case. Even more, I tell them to beware of saying that REST or SOA are the answer to distributed needs, since it is not totally true.  The question I receive is then: When can we tell something is distributed or not? Ok, this post will try to clarify the difference between the three terms you can read in the title, as they are very often confused.

Let’s begin by thinking about systems. A system is a set of related components that perform some interaction, following some rules. The components may be anything, from hardware to software, for virtual to real, from people to things. Now, a solution is the implementation of processes and flows that yield the resolution of (or helps solving) a problem. The solution may be a monolithic thing, or it may be a system.

If you take a system, and then you analyze what is it doing, you may find that it may not be good to have all processing done at the same place. So, you decide to split it, and place some parts somewhere else. Think of tasks, all performed by one big process in one big environment. You may need to split the system and send some tasks to be performed in other environment, by another process. That way, you can obtain some benefits like scaling, encapsulation, “replaceability” and better use of assets. Some say you have also reusability, but that is not actually true all the time, as it depends on how you split the system, and that is definitely not the goal of splitting. But, you also lose some other things, like performance due to increased interprocess communication (IPC). Fair. You have there a distributed system, a whole that was split but still needs to work as a whole.

Now, imagine you have two systems. Each system solves a problem. You may discover that joining forces, the systems can solve higher level problems. So, you decide they must communicate somehow, to try working together. Fair. You then create communication channels and you have an integrated macro system. That is, each system is treated as a whole that may be composed, reused, to create a bigger system. We need to keep the individual system’s inner workings a secret, and treat is as a whole, so we need it encapsulated, reusable and loose coupled. We use standard interfaces, and we use higher level processes. Also, the frequency of interaction is low. And of course, the user knows this bigger system is a large composed thing, where we know the individual parts. We call an Integrated system.

But wait: isn’t that the same idea as above? Well, not really. It may sound the same, but when implementing, and semantically, they are not. See, usually distribution requires the system to work as a whole, and thus communication is a pain the user should not be aware of. It must be fast, and should be tight coupled. Since the parts of a distributed system are parts of a whole, they know each other and that allows for faster communication if that information is used in coupling.

Let’s check an example. For distributed systems, suggesting Services through messaging as the communication channel may not work, since that actually increases the overload for communications. (Distribution with services seems to be a bad idea, an antipattern).

On the other hand, you have the integrated system. Here each subsystem is a standalone system. There may not be the need to have fast communication, but reliable and loose coupled one. There, a service fits, since we don’t need to know the internals of one system to integrate it with another one.

Fine. What about the Networked system? Well, that is also a system, but made as a network. A networked system has its components interacting as a network. That is, you have nodes and communication between all of them, you have routing to get to a node, and you have a very dynamic node topology. Yes, you can have integrated systems and distributed systems running in a network, but that does not make any of those systems a network system. You may have the same parts of a distributed/integrated system in one big machine, no network, and still be integrated or distributed.

In a networked system, you can easily have more than one copy of a node. Routing and balancing gives you the ability to increase and decrease the node number and thus scaling in place, incrementally, focused. You can add more nodes without affecting the ones in the network, and you can create subsystems with them. In fact, all you can do with a computer network you can do with a networked system. I guess the difference between integrated/distributed systems and networked ones is clear. No? Ok.

Let’s do it again.
- A distributed system is a whole system that was split to improve its parts, that needs fast and reliable communication, just like if the component being called is near at hand. Usually communication is direct, tight coupled. It has a high interaction between parts. Finally, the reuse of part is optional.
- An integrated system is the one made of whole systems largely distributed, encapsulated. Low frequency of interaction between parts, loose coupled (high cost) communication. The parts are made reusable, and the user is usually aware of the composition.
- A networked system, all nodes form a network. An application is a composition of node services. Topology may vary, dynamically, nodes may not be standalone components, and it needs low coupling and other messaging services like routing and balancing. Nodes can be added and functionality adjusted without much impact. Scalable locally, and may contain subsystems.

Ok. One big networked system is the web. Now, in the web you can have integration (B2B transactions), distribution (an enterprise with subsidiaries) and other networking systems (maybe mashups, or 
Grids).

Clear now? Great! Based on that clarification, I pretend to write a future post: REST is tuned for networked systems! Stay tuned...