Wednesday, October 24, 2012
Friday, October 19, 2012
If a single tweet can attract 33,026 followers, you must be a really interesting fellow!
This enormous followers-to-tweets ratio (FTR) is what Larry Ellison, Oracle's Big Chief, can boast about. His entire tweet history even fits in a single popup frame. And interestingly, it's more SAP bashing than anything else. Or is this a very good hoax?
Just to compare: Barack Obama has over 20 million followers, but his FTR is not even 3k. Bill Gates is doing fine; his FTR is close to 14k. (My own FTR is far below one!)
Who can beat Larry Ellison?
Monday, October 01, 2012
In the newly released "SOA with REST" book by Thomas Erl et al. (http://servicetechbooks.com/rest), a few new design patterns are presented for SOA. One of these is "Idempotent Capability" on page 470. In principle, idempotency is important for both SOAP and RESTful services, but with REST, idempotency and safety of requests must be dealt with at the HTTP level: it's in the specs. For example, HTTP "PUT" must be idempotent, so if you implement it, you must take care of this. SOAP has no idempotency requirements built into its specs, which may be why this pattern did not show up before.
I have written on idempotency before in this blog (http://ignazw.blogspot.be/2012/01/idempotent-services.html), and I am surprised that the described pattern does not correctly deal with service requests that will create or update data. It reads: "The design of an idempotent capability can include the use of a unique identifier with each request so that repeated requests (with the same identifier value) that have already been processed will be discarded or ignored by the service capability, rather than being processed again."
This statement bypasses a fundamental characteristic of idempotency, namely that the response to the same request must always be the same. Hence, the service capability should not discard or ignore the duplicate request at all! Rather, it should make sure the identical response to the previous identical request is given. This is not a simple task, but essential for service clients that contain state that is dependent on the service capability's outcome, for example for stateful business processes using the service.
Imagine this not-so-uncommon scenario: the client issues an update request, but the response does not arrive within the predefined timeout. The request is sent a second time. (Load balancers and reverse proxies can do this too, hidden from the end user.) What response should I get? A correct idempotency implementation should respond with the same response that never reached me the first time. If I were to receive nothing, or, say, a duplicate key exception, then my business may end up in an erroneous state: I may have created a resource, but I miss the reference to it!
Ergo: idempotent service capabilities should not discard or ignore a duplicate request!
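A minimal sketch of this replay behaviour, assuming the unique request identifier from the book's pattern (all class and field names here are invented for illustration): instead of discarding the duplicate, the capability stores the first response per request id and returns it again.

```python
class IdempotentCapability:
    """Toy service capability that replays responses instead of
    discarding duplicate requests."""

    def __init__(self):
        self._responses = {}  # request_id -> first response given
        self._store = {}      # resource key -> data

    def create(self, request_id, key, data):
        # A duplicate request gets the identical response:
        # neither silence nor a DuplicateKeyException.
        if request_id in self._responses:
            return self._responses[request_id]
        if key in self._store:
            raise KeyError("duplicate key")  # only for genuinely new requests
        self._store[key] = data
        response = {"status": "created", "key": key}
        self._responses[request_id] = response
        return response

svc = IdempotentCapability()
first = svc.create("req-001", "order-1", {"amount": 10})
second = svc.create("req-001", "order-1", {"amount": 10})  # retried message
assert first == second  # same response, including the same key
```

The client that timed out and retried now receives the reference to the resource it created, keeping its own state consistent.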
Thursday, September 27, 2012
Wednesday, July 04, 2012
Come and visit!
Saturday, June 02, 2012
A simple tweet by Werner Vogels of Amazon:
"Back to Basic weekend reading: Mattern's paper on vector clocks "Virtual Time and Global States of Distributed Systems" http://wv.ly/JUbKQ7"
This is exactly why you need a stateful orchestrator when automating distributed business processes.
It is also interesting to see that cosmology and the cloud have something in common.
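The core idea of Mattern's vector clocks can be sketched in a few lines (a toy illustration of the mechanism, not the paper's full algorithm): each process keeps a counter per process, merges clocks on receive, and can then decide whether one event causally happened before another.

```python
def vc_increment(clock, node):
    """Advance this node's own component on a local event or send."""
    clock = dict(clock)
    clock[node] = clock.get(node, 0) + 1
    return clock

def vc_merge(local, received, node):
    """On receive: take the component-wise maximum, then tick locally."""
    merged = {n: max(local.get(n, 0), received.get(n, 0))
              for n in set(local) | set(received)}
    return vc_increment(merged, node)

def vc_happened_before(a, b):
    """a -> b iff a <= b component-wise and a != b."""
    keys = set(a) | set(b)
    return all(a.get(k, 0) <= b.get(k, 0) for k in keys) and a != b

# Process P sends a message; Q merges it into its own clock.
p = vc_increment({}, "P")        # {"P": 1}
q = vc_merge({"Q": 1}, p, "Q")   # {"P": 1, "Q": 2}
assert vc_happened_before(p, q)  # P's send causally precedes Q's state
```

It is exactly this causal ordering across machines that a stateful orchestrator has to get right.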
Thursday, May 10, 2012
Monday, April 02, 2012
Monday, January 23, 2012
If web services use HTTP as the communication protocol, it must be realized that HTTP cannot guarantee an exactly-once quality of service (QoS). You can achieve either best effort or at least once: the former when you do no retries after a communication error, the latter when you do retry after a communication error.
Since best effort means you may lose a message every now and then, this QoS is rarely preferred. Usually some retry mechanism is implemented or configured. Hence, the QoS is most often at least once.
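The difference between the two QoS levels comes down to the retry loop. A sketch, using a hypothetical transport function that raises on a communication error, shows how retrying turns "maybe lost" into "maybe delivered twice":

```python
def send_best_effort(transport, message):
    """Best effort: one attempt; on a communication error the message is lost."""
    try:
        return transport(message)
    except ConnectionError:
        return None  # message possibly lost, no second attempt

def send_at_least_once(transport, message, max_retries=5):
    """At least once: retry on error; the receiver may see duplicates."""
    for _ in range(max_retries):
        try:
            return transport(message)
        except ConnectionError:
            continue
    raise ConnectionError("gave up after retries")

# Simulate a service that receives the message but whose reply is lost.
received = []

def flaky_transport(message):
    received.append(message)               # the service did get the message...
    if len(received) == 1:
        raise ConnectionError("ack lost")  # ...but the acknowledgement never arrives
    return "ok"

assert send_at_least_once(flaky_transport, "order-42") == "ok"
assert received == ["order-42", "order-42"]  # delivered twice: at least once
```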
Consider this: suppose you send a message over HTTP to a web service, but you receive a timeout. What has happened? Was the message accepted by the web service? Was it even processed? Or was it never sent? We can't really know.
When the sender of the message is an interactive application, the person using the application generally pushes the submit button again. The message is resent. This is typical behaviour. (Resending may occur at many points along the communication line: proxies, load balancers, etc. Don't think this is only human!)
But what if the message was received by the web service? The service may have processed it. Suppose the processing results in the insertion of data in a database. What happens when the second message arrives?
Surely, the second message may lead to a second insertion of the same data, leaving multiple records for exactly the same data. This is not what the average database owner wants. Database designers identify data with unique keys, and trying to insert the same data with the same unique key violates a uniqueness constraint, which results in an error. In Java, we are all familiar with the DuplicateKeyException.
So the sender of the message first inserts data into the service's database but, due to the HTTP timeout, never receives an acknowledgement. He sends it again, but now he receives a DuplicateKeyException. Hmm, that's unexpected. Now what?
The sender needs to perform at least one read operation against the service's database to verify whether the data was correctly inserted. A human operating an application may do this naturally, but implementing it automatically can be very complex indeed. And who needs to implement this complexity? The service consumer, not the service provider. From a business point of view, that is not very customer friendly.
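The consumer-side reconciliation described above might look like this (the service API and its behaviour are invented to illustrate the burden): after a timeout, read back before retrying, to avoid a duplicate insert.

```python
def create_with_reconciliation(service, key, data):
    """After a timeout, read back to learn whether the first attempt landed.
    This is the complexity the provider pushes onto every consumer."""
    try:
        return service.create(key, data)
    except TimeoutError:
        existing = service.read(key)          # did the insert actually happen?
        if existing is not None:
            return existing                   # yes: recover the lost acknowledgement
        return service.create(key, data)      # no: it is safe to retry

class FlakyService:
    """Fake provider: the first create succeeds but the response is lost."""
    def __init__(self):
        self.db = {}
        self.first_call = True
    def create(self, key, data):
        self.db[key] = data                   # insert succeeds...
        if self.first_call:
            self.first_call = False
            raise TimeoutError                # ...but the response never arrives
        return self.db[key]
    def read(self, key):
        return self.db.get(key)

svc = FlakyService()
assert create_with_reconciliation(svc, "k1", "v1") == "v1"
assert svc.db == {"k1": "v1"}  # exactly one record, no duplicate insert
```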
All this can be avoided if the service is implemented as an idempotent service. Idempotency means that no matter how often I send the same message, I always get the same response.
Read operations are idempotent. No matter how often I read the same data, I will always get the same answer. (Yes, of course, until somebody changes those data, but that is not the point here.)
It is the create, update, and delete operations where idempotency becomes important. Suppose I want to create some data in a database. In normal operation, the service accepts my data, will insert it in a database, and will respond with a success message, very likely including the unique key which identifies the inserted data.
If I were to send the same data again for creation, an idempotent service will not respond with a DuplicateKeyException, but with the same answer as I would have received if this were the first message to insert these data in the database. Thus, I should receive the same success message again, and if the unique key is included in that message, it should be the same unique key.
For update and delete operations, idempotency essentially works the same as for the create operation.
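One reason updates need the same care is that only absolute updates are naturally idempotent; a relative update applied twice corrupts the state. A toy illustration (function names are invented):

```python
def set_balance(db, account, value):
    """Absolute update: applying it twice leaves the same state (idempotent)."""
    db[account] = value

def add_to_balance(db, account, delta):
    """Relative update: a retried message changes the state again (not idempotent)."""
    db[account] = db.get(account, 0) + delta

db = {}
set_balance(db, "acct", 100)
set_balance(db, "acct", 100)     # duplicate message: state unchanged
assert db["acct"] == 100

add_to_balance(db, "acct", 10)
add_to_balance(db, "acct", 10)   # duplicate message: 10 was added twice
assert db["acct"] == 120
```

A service exposing relative updates therefore needs the request-identifier mechanism to detect and absorb duplicates.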
Of course, implementing idempotency can be complex for the service provider, and complexity costs money. That's probably the real reason why idempotent services are so rare. (At least in the government environments I tend to work in.)
But once a service is made idempotent, it is foolproof, and can guarantee data consistency across integration boundaries. Services should be made idempotent. It should be an architectural and service design principle, especially for services used by many service consumers. You can't really push the complexity of keeping your data consistent onto your consumers, can you? The consumers are your customers, and you should treat them as such. You gain more customers if you make it easy for them.
Saturday, January 14, 2012
So who came up with this silly Hello World! idea?
A quick search on Wikipedia shows it was Brian Kernighan, back in 1974. I had never heard of him, so I read on. Back in the 70s he worked at the famous Bell Labs. The letter K in the AWK programming language is the first letter of Kernighan. He turned the original "Unics" spelling into "Unix." He worked on graph partitioning and the Travelling Salesman Problem. And he did lots of other important IT work.
That certainly sheds a different light on Hello World!
Now, I understand that it was quite an achievement to come up with the Hello World! program in 1974. But anno 2012, haven't we been able to come up with anything better? Open any IT book and you are likely to find Hello World! somewhere in it. Nearly 40 years later, shouldn't we at least upgrade the semantics to Hello, Honourable Reader!?
Monday, January 09, 2012
Those who provide data make it available through (web) services, generally without taking care of the business needs of the consumers. And why should they? It's their data, they know everything about it, and they keep it consistent. They are the "authentic source" of the data. Their role is to focus on their data, not on others' business.
Those who consume data need it within a specific context, often unique to their business. It is quite complex for them to connect to the providers' (web) services: those services often return much more data than the individual consumer needs, and often use complex data models described by complex XML schemas.
In the middle, between providers and consumers, should sit a mediator. The mediator is the glue between the consumers' needs and the providers' offering, and it should look both left and right. There are two approaches to mediation: data-driven and consumer-driven. The technical implementation can be done using a service bus, but mediation is more than just a technical issue.
Data-driven mediation is a bottom-up approach. Some characteristics:
- Consumers are tightly coupled to providers: service life cycles, data models, ...
- Any change in a provider service involves complex governance of both mediator and all consumers
- Each consumer must interpret the data and assemble it correctly before it can be used in the business; the provider's data model is captured by the consumer
- The role of the mediator is limited: just expose the same interface as the service provider; only some technical advantages of a common security model, common logging and auditing, etc.; no added business value
Consumer-driven mediation is a top-down approach. Some characteristics:
- Consumers are loosely coupled to providers: consumer service life cycles and data models do not depend on those offered by the provider
- A change in a provider service does not automatically imply a change at the consumer's side, which simplifies governance
- The mediator interprets the provider's data and transforms it into the business domain language; the provider's data model is shielded from the consumer; there is an abstraction made by the mediator
- The role of the mediator is extensive: it is the glue between the data service providers and the business consumers
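The abstraction step can be sketched as a small transformation inside the mediator (the provider fields and the consumer model here are invented for illustration): the consumer sees only its own business vocabulary, never the provider's technical record.

```python
def mediate_person(provider_record):
    """Consumer-driven mediation: reduce the provider's rich, technical
    record to the consumer's business domain language. The provider's
    data model never leaks past this function."""
    return {
        "name": f"{provider_record['firstName']} {provider_record['lastName']}",
        "isAdult": provider_record["ageInYears"] >= 18,
    }

provider_record = {            # imagined "authentic source" payload
    "firstName": "Ada",
    "lastName": "Lovelace",
    "ageInYears": 36,
    "internalId": "P-00042",   # technical detail the consumer never sees
    "sourceSystem": "HR-DB2",
}
assert mediate_person(provider_record) == {"name": "Ada Lovelace", "isAdult": True}
```

If the provider later renames or restructures its fields, only this mapping changes; the consumers' contracts stay untouched.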
But with data-driven mediation, once you start re-using services, governance starts to take over the budget: changes will occur and will propagate everywhere. The governance nightmare begins, and the maintenance costs of the SOA rise.
So, if the goal of mediation is to expose reusable services, then it does make sense to make the initial investment and to go for consumer-driven mediation. The mediator becomes an important part of the IT integration landscape, and its role should not be underestimated.