Sunday, May 1, 2011

Dealing with versions in REST? Think again, you may not need to.

At the recent IWS 2010, I saw Zachman present an interesting note about enterprise architecture. He quoted Brooks, saying that redundancy drives the system to chaos, to entropy. Well, actually some “managed”, or better said “controlled”, redundancy is necessary to allow scalability. In the nice case, that redundancy is a “mirrored” one: a copy, an instance. That means each redundant part is an exact copy of all the other redundant parts. But not all is so nice: there is a meaner redundancy, the one created by specialization or customization. That one is partial redundancy, where one part of the instance is the same for all instances, but another part is unique, different from the others. When that happens, we usually say we have a new version.

So, version is a way to differentiate between two things that are similar but not quite the same. Version is something like an ID. That is partial redundancy. If there were no redundancies, or if all redundancies were absolute, then we would not need versions.

Now, if two versions of something were to be treated the same, no difference in processing them, then there is no need to differentiate.  In other words, nobody cares about the part that is different in the instances. Thus, version can be defined as the ID plus the special processing or treatment that ID requires.

Let’s look at it from another perspective. To create a version, we need to change something: data, structure, a process, anything. The original form is kept and the new form is assigned a version ID to differentiate it from the original. In other words, versioning is basically a technique to handle the differences that appear when something changes. The real issue is then modifiability.

Modifiability? When we modify something and the rest of the system has to adapt to that change, we say we have no modifiability: the system is tightly coupled. Modifiability is a quality property that allows some things (even instances) to change without breaking the system. It is the capacity of the system to support the change of one of its parts (here, each individual instance counts as one part). When something changes and we start creating special versions of things to handle the change while keeping the old parts working, we are adding partial redundancy. And that is bad, remember?

So, as a piece of advice, try to stay away from versioning as hard as you can.

But, what if you need to use versions, how do you work with them in REST? Well, REST is a style that welcomes redundancy, but tries to keep the flexibility to change without disrupting the system. That is, it offers modifiability.

Let’s see what can change. In REST we have different servers that can come and go. Topology can change. Since we are using layers, the fact that a server changes in a second level or beyond does not affect us. If the server that goes away is the one that we are talking to, there should be another that takes its place (redundancy). If there is no other, then we are in trouble. So, here we have total redundancy, not partial. 

Now, let’s see the next level. The resources that are actually operated by a server may change. Say we no longer have a text file but a database, or the resource is no longer a static image but a video: any change that a resource can suffer. Since we manage resources through representations, that change may not be a problem. It is solved by keeping the same representations as before, and even adding new ones that take advantage of the changes in the resource.

Ok, we may say: what if the representation is the one that changes!? Well, here we have two types of possible changes: structure or content. If the representation structure changes, then we may be facing a totally new structure that needs a new media type. Maybe the structure changes only a little, as between WSDL 1.1 and 2.0. Well, in that case, the new 2.0 needs its own IANA registration (actually, application/wsdl+xml is assigned to 2.0). The client will know what is being sent by reading the media type. Hmm, ok, that may not always happen: getting new types for new versions of already existing media types may be difficult. In that case, the new type description could include backward compatibility, or the old spec is simply not supported anymore.
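
To make the "client reads the media type" idea concrete, here is a minimal sketch in Python. Only application/wsdl+xml comes from the text above. The use of text/xml for an older document, the handler table and the function names are my own assumptions for illustration, not part of any spec.

```python
def parse_media_type(content_type: str) -> str:
    """Return the bare media type, dropping parameters such as charset."""
    return content_type.split(";", 1)[0].strip().lower()


# Hypothetical handler table: one entry per media type the client understands.
# "text/xml" standing in for an older document format is an assumption.
HANDLERS = {
    "application/wsdl+xml": lambda body: ("wsdl-2.0", body),
    "text/xml": lambda body: ("generic-xml", body),
}


def handle_response(content_type: str, body: str):
    """Dispatch on the media type of the response, not on a version in the URL."""
    media_type = parse_media_type(content_type)
    handler = HANDLERS.get(media_type)
    if handler is None:
        raise ValueError(f"unsupported media type: {media_type}")
    return handler(body)
```

A structurally new representation then means one more entry in the table, and the old entries keep working for servers that still send the old type.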

Now, let’s analyze content modification in the representation. What does it mean? I guess that the data being sent by the application is not the same as usual. Hmm. Does that mean the client is statically bound to a set of values? Well, that is not a problem of REST, right? The content should then have a way to tell the client what has changed. A version, you say? Well, yes, but an application-driven one. REST places no restriction on what values or fields you send out. One of them could be a version number, but that is application data, application semantics, totally outside REST’s concern.
A little note here. We may have a well-known generic data type like XML. And we may have one normal document structure the client is reading. What if we add some new elements with some new data, and remove others? Well, here we are not changing the data type (XML is still XML). What we changed is the XML schema. And of course, the schema is specified in the XML body. So, a client can perfectly parse the XML if it understands the schema. Again, nothing related to REST.
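
As a rough illustration of a client that survives added and removed elements, here is a small Python sketch using the standard xml.etree.ElementTree module. The customer document and its element names are invented for the example. The client reads only the elements it knows, ignores the rest, and falls back to a default when one is missing.

```python
import xml.etree.ElementTree as ET


def read_customer(xml_text: str) -> dict:
    """Read only the elements this client understands; ignore everything else."""
    root = ET.fromstring(xml_text)
    return {
        # A missing element yields the default instead of an error.
        "name": root.findtext("name", default=""),
        "email": root.findtext("email", default=None),
    }


# Hypothetical documents: the newer one adds <loyalty> and drops <email>.
OLD_DOC = "<customer><name>Ann</name><email>ann@example.org</email></customer>"
NEW_DOC = "<customer><name>Ann</name><loyalty>gold</loyalty></customer>"
```

Both documents are still XML, so the media type is untouched. Whether the client copes with the change is decided entirely by how it was written, which is the application's business and not REST's.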

Let’s continue. Now, I want to change the actual process. Say we need to do some POSTs to complete a process. Suddenly, a new POST and a verification GET are required. In this case, unless the client is a static thing that does not follow the Hypermedia as the Engine of Application State constraint, that is solved simply by adding a couple of links to one particular resource representation.

Ok, the resource that is changed with two more links may be the one to be versioned, right? Well, no. If you still need the old resource representation, you can create a new representation, create a modified special representation based on the client request (maybe driven by a query parameter), or create a new resource that will be used upon client selection. Think of a virtual store that has a normal four-step checkout process. If you have a credit card registered, you may want to skip the credit card form, or even perform a one-click checkout. Note that in these cases we are simply adding options that drive the client to a new process flow. Old clients (the ones that only understand the full checkout process, although that is not a nice implementation) are still able to follow the old process link. So, we keep all the old things and add new ones without breaking the app: the magic of Hypermedia.
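
The checkout case can be sketched in a few lines of Python. The shape of the representation (a "links" list with "rel" and "href"), the relation names and the URLs are all assumptions made up for the example. The point is only that each client picks the link relations it understands and ignores the others.

```python
def pick_link(representation: dict, preferred_rels: list) -> str:
    """Return the href of the first link relation the client understands."""
    links = {link["rel"]: link["href"] for link in representation.get("links", [])}
    for rel in preferred_rels:
        if rel in links:
            return links[rel]
    raise LookupError(f"none of {preferred_rels} offered by the server")


# Hypothetical cart representation after the store added one-click checkout.
CART = {
    "items": 3,
    "links": [
        {"rel": "checkout", "href": "/cart/42/checkout"},
        {"rel": "one-click-checkout", "href": "/cart/42/one-click"},
    ],
}

# An old client only knows the full process; a new one prefers the shortcut.
old_client_next = pick_link(CART, ["checkout"])
new_client_next = pick_link(CART, ["one-click-checkout", "checkout"])
```

The same cart resource serves both clients, and nothing in the URLs or the media type carries a version.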

Did I forget something? I guess not. All the cases can be resolved without making artificial URLs with versions or creating custom media types. The only cases where a version ID may be needed are the ones that are application specific and not related to REST.


Thursday, April 21, 2011

Forced Domain and Forced Paradigm anti-patterns

These two common (and usually overlooked) “anti-patterns” are frequently followed by developers. Actually, I’m not sure if we can officially call them anti-patterns, but they are for sure common patterns that are not good to follow.

Let us explain what the symptoms are for each.

The Forced Domain occurs when a domain that is not the natural one for the solution is forced into that solution. Here forcing means using the foreign domain’s concepts and processes to solve the problem in a usually unnatural way. The most common example is the IT domain forced into business logic, user interface, user experience or any part of the solution that needs to be in the business domain. For instance, when we build software that presents data in tables, shows codes (data keys), exposes the notion of next record, the notion of indexes or even the notion of fields, we are forcing database concepts on the user. To use the software, the user needs to learn those concepts first. The same happens with processes, when backups, initialization, remote calls and such are forced on the user.

Another case is when we are integrating applications. Each solution we integrate may have its own domain, and it may not be the same domain as that of the other application we are integrating with. Integrated applications should communicate in a loosely coupled manner. So, for one solution to use the other, both will need to share some common business concepts, and those should be the only ones exchanged between them. For example, a banking system that offers loans may need geographical information about the land we are using as collateral. Forcing the banking system to understand geolocations, azimuths and GPS readings is nonsense. To obtain the needed information, the bank should request it using business information it manages, like a particular address. It is the job of the geographical system to convert that request data, in the form of text containing an address, into longitude and latitude pairs, and to convert the response information into data the bank can read.

Ok, that may raise a question: Isn’t that forcing the bank domain information into the Geo-System? No, actually. The Geo-System is probably offering that query as a service, which means it is decoupled from the implementation, and using standard documents (which may even be generic) to pass info. Note that the Geo-System will not get bank specific information, just a street address.
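
Here is a small Python sketch of that boundary. Every name in it (LandReport, GeoService, describe_land, the sample parcel and its values) is hypothetical. What matters is that the bank sends a street address and gets business terms back, while the coordinates never leave the geographical system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LandReport:
    """What the bank receives: business terms only, no coordinates."""
    address: str
    district: str
    area_square_meters: float


class GeoService:
    """Stand-in for the geographical system; the parcel data is invented."""

    # Internal detail of the geo domain: address -> (lat/lon, district, area).
    _PARCELS = {
        "12 example street": ((9.93, -84.08), "Central", 450.0),
    }

    def describe_land(self, address: str) -> LandReport:
        clean = address.strip()
        _coordinates, district, area = self._PARCELS[clean.lower()]
        # The coordinates are used inside the geo domain and never returned.
        return LandReport(address=clean, district=district, area_square_meters=area)


def loan_collateral_summary(geo: GeoService, address: str) -> str:
    """Banking-side code: it only ever handles an address and a report."""
    report = geo.describe_land(address)
    return f"{report.address} ({report.district}), {report.area_square_meters:.0f} m2"
```

The banking code never sees a latitude, so the geo system is free to change how it locates parcels without the bank noticing.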

Why is that Forced Domain a bad thing?
A domain is not just a set of concepts. It is also a very complex environment with processes that are natural to it and that work smoothly with the environment’s rules. The concepts fit, and all data is structured in a way that facilitates the processing. If not, the domain would not work. Think of a medical domain, in surgery, where the concepts do not match or where the rules make the surgeon juggle tools not made for the job at hand. For example, think of a surgeon who is given a fish knife to work with, and who is forced to kill the fish before opening it. The tasks of cleaning a fish may sound similar to some of those performed by a surgeon, but the end goal is clearly different, and it is not a good idea to practice surgery with a fish cleaning knife.

Domain forcing also produces coupling of a very special kind. In some cases it will require application A to know about Application B to use it, and in some others it will force Application B to know about A in order for application A to use B! It is like teaching me mechanics so the mechanic can fix my car by asking me car construction questions. In collaboration contexts, our banking application is forced to store and process geographical data because it needs those values to query the geographical system.

The Forced paradigm anti-pattern is more related to development.
That is, when something is not done naturally as it should, but done my way. A very common example occurs when the developer, coding a service-consuming client, uses an OO language. The service is usually defined as a port to which we send messages in a pattern. To do that in Java, for instance, we map the port and message idea to a method, defined in a stub, which acts like a local class. Thus, the notion of sending a message is lost. The developer sees the operation as just invoking a method locally. There, we are forcing one paradigm into another; we are forcing the method call operation onto a message send operation. In Java we are supposed to have objects and invoke methods, not send messages. That is true, but then we should build an object that sends the message for us, and we just invoke that functionality. It sounds the same, but semantically it is not.
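
To show the difference between a stub that pretends to be local and an object that openly sends a message, here is a minimal sketch. I am using Python rather than Java just to keep it short, and Message, Port, the JSON wire format and the fake transport are all my own assumptions, not any real toolkit's API.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Message:
    """An explicit message: the caller can see that data goes over the wire."""
    action: str
    body: dict


class Port:
    """An object whose only job is to send a message to an endpoint."""

    def __init__(self, endpoint: str, transport: Callable[[str, bytes], bytes]):
        self._endpoint = endpoint
        self._transport = transport  # assumed: a callable that does the real I/O

    def send(self, message: Message) -> Message:
        payload = json.dumps({"action": message.action, "body": message.body})
        raw_reply = self._transport(self._endpoint, payload.encode("utf-8"))
        reply = json.loads(raw_reply.decode("utf-8"))
        return Message(action=reply["action"], body=reply["body"])


def fake_transport(endpoint: str, payload: bytes) -> bytes:
    """Test double standing in for the network; it answers every request."""
    request = json.loads(payload.decode("utf-8"))
    reply = {"action": request["action"] + "Reply", "body": {"price": 10}}
    return json.dumps(reply).encode("utf-8")


port = Port("http://example.org/quotes", fake_transport)
reply = port.send(Message(action="getQuote", body={"symbol": "ACME"}))
```

We still invoke a method on an object, as the language expects, but the method is called send and it takes a Message, so nobody mistakes it for a cheap local call.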

Is that bad too?
Of course it is. Forcing a paradigm causes impedance mismatch. It also increases overhead due to paradigm conversions, and removes semantics that would help the developer write well-performing code (stubs, for example, may trick developers into treating a remote call as a local one, which causes bad performance). Actually, if you take a look at the current specifications for Java, you may see that many of them map the actual interface to objects following this antipattern, hiding the actual process from developers. That is done to make the API “easy” for them to use, but the resulting API is not good enough.

Do you have any examples of these antipatterns? What is your take on how bad they are for your health?

Back to Posts! Blogging Season Ahead...

Ok, six months of not posting anything seems like too much.
There are lots of things keeping me busy. But I hope the next months will allow me to finish the 4-5 posts that I have started but never finished.
I have one almost ready, but I feel I cannot hold it longer. So, it is free then to be published, although soon I may write about the topic once more, to clarify more things.

Off we blog!