Wednesday, October 24, 2012

Devoxx Conference

At the Devoxx Conference I will speak again about Service Versioning in SOA and Cloud architectures. Come and listen on Wednesday 14 November 2012, 17:50 - 18:50. For details, see

Friday, October 19, 2012

Who can beat Larry Ellison?

Can one measure one's popularity by the number of Twitter followers? And what is the value of a tweet?

If a single tweet can attract 33,026 followers, you must be a really interesting fellow!

This enormous followers-to-tweets ratio (FTR) is what Larry Ellison, Oracle's Big Chief, can boast about. His entire tweet history even fits in a single popup frame. And interestingly, it's more SAP bashing than anything else. Or is this a very good hoax?

Just to compare: Barack Obama has over 20 million followers, but his FTR is not even 3k. Bill Gates is doing fine, his FTR is close to 14k. (My own FTR is far below zero!)

Who can beat Larry Ellison?

Monday, October 01, 2012

SOA design pattern: Idempotent Capability

In the newly released "SOA with REST" book by Thomas Erl et al., a few new design patterns are presented for SOA. One of these is "Idempotent Capability", on page 470. In principle, idempotency is important for both SOAP and RESTful services, but with REST, idempotency and safety of requests must be dealt with at the HTTP level: it's in the specs. For example, HTTP PUT must be idempotent, so if you implement it, you must take care of this. SOAP has no idempotency requirements built into its specs. Maybe that is why this pattern did not show up before.

I have written on idempotency before in this blog, and I am surprised that the described pattern does not deal correctly with service requests that create or update data. It reads: "The design of an idempotent capability can include the use of a unique identifier with each request so that repeated requests (with the same identifier value) that have already been processed will be discarded or ignored by the service capability, rather than being processed again."

This statement bypasses a fundamental characteristic of idempotency, namely that the response to the same request must always be the same. Hence, the service capability should not discard or ignore the duplicate request at all! Rather, it should make sure the identical response to the previous identical request is given. This is not a simple task, but essential for service clients that contain state that is dependent on the service capability's outcome, for example for stateful business processes using the service.

Imagine this not so uncommon scenario: the client issues an update request, but the response does not arrive within the predefined time-out, so the request is sent a second time. (Load balancers and reverse proxies can do this too, seemingly hidden from the end user.) What response should I get? A correct idempotency implementation should respond with the same response that never reached me the first time. If I were to receive nothing, or, let's say, a duplicate key exception, then my business may be in an erroneous state: I may have created a resource, but I miss the reference to it!

Ergo: idempotent service capabilities should not discard or ignore a duplicate request!
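A minimal sketch of what such a capability could look like (an illustration of the argument above, not code from the book; the function and store names are hypothetical): the service keeps the full response under the client-supplied request identifier and replays it on a duplicate, instead of discarding the request or failing with a duplicate key error.

```python
import uuid

# In-memory stores standing in for the service's database (hypothetical).
_responses = {}   # request_id -> the response previously returned
_orders = {}      # resource key -> resource data

def create_order(request_id, data):
    """Idempotent create: a duplicate request (same request_id) receives
    the SAME response as the first attempt, rather than being discarded
    or answered with a duplicate key error."""
    if request_id in _responses:
        return _responses[request_id]      # replay, do not reprocess
    order_key = str(uuid.uuid4())          # unique key of the new resource
    _orders[order_key] = data
    response = {"status": "created", "key": order_key}
    _responses[request_id] = response      # remember for future retries
    return response
```

A retried request now returns the original unique key, so the client never ends up with a created resource it holds no reference to.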

Thursday, September 27, 2012

Service Versioning Talk

The slides of my talk at the 5th International SOA Symposium in London, 24-25 September 2012 can be found here:

Wednesday, July 04, 2012

More on Service Versioning

I have written a few blog posts on service versioning before. Now I will give a service versioning talk at the 5th International SOA, Cloud, and Service Technology Symposium, 24-25th September in London, UK.

Come and visit!

Saturday, June 02, 2012

Distributed state machine

A simple tweet by Werner Vogels of Amazon:

"Back-to-Basics weekend reading: Mattern's paper on vector clocks, 'Virtual Time and Global States of Distributed Systems'"

This is exactly why you need a stateful orchestrator when automating distributed business processes.

It is also interesting to see that cosmology and the cloud have something in common.
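For readers who have not met Mattern's vector clocks: each process keeps one counter per process, merges clocks on message receipt, and can thereby order events causally. A toy sketch (the function names are mine, not from the paper):

```python
def merge(local, received):
    """Element-wise maximum: the receiver's knowledge after a message."""
    return [max(a, b) for a, b in zip(local, received)]

def happened_before(vc_a, vc_b):
    """True if the event stamped vc_a causally precedes the one stamped vc_b."""
    return all(a <= b for a, b in zip(vc_a, vc_b)) and vc_a != vc_b

# Three processes; process 0 sends a message to process 1.
p0 = [1, 0, 0]              # p0's clock after its local send event
p1 = merge([0, 1, 0], p0)   # p1 merges the incoming clock on receipt...
p1[1] += 1                  # ...and ticks its own component
```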

Thursday, May 10, 2012

Version compatibility in web service contracts

It is surprising to see how few web services are backward compatible when a new version of the service contract is released. The result is often that all service consumers must update their client code to work with the new version of the service: a heavy burden for governance.

It is often stated that adding an optional field or attribute to an existing XML schema is a backward compatible change. That is, XML instance documents of the previous version remain valid against the new version of the schema.

For web services, this means that request messages of the old version will be understood by the web service, because they are valid against the new message structure. For response messages, however, this is not true at all. The service returns an XML instance document of the new version to a service consumer who only understands XML instance documents of the previous version. As long as the optional element or attribute is absent from the instance document, there is no issue. But if it is present, the service consumer suddenly receives data that is invalid according to his service usage contract.

So service contract version compatibility is not the same as XML schema version compatibility.

For this to work with service contracts in the general case, XML schemas must be designed to be both forward and backward compatible. Forward compatibility can be achieved using wildcards, which are extension points in the schema. However, such schemas become very messy after a few changes, especially complex schemas.
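As an illustration, a hypothetical extension point with a wildcard could look like this (using namespace="##other" to sidestep Unique Particle Attribution conflicts with elements of the schema's own namespace); old-version consumers then leniently accept trailing elements they do not know:

```xml
<xs:complexType name="Customer">
  <xs:sequence>
    <xs:element name="name" type="xs:string"/>
    <xs:element name="email" type="xs:string" minOccurs="0"/>
    <!-- extension point: a future version may append elements here -->
    <xs:any namespace="##other" processContents="lax"
            minOccurs="0" maxOccurs="unbounded"/>
  </xs:sequence>
</xs:complexType>
```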

Also, XML binding frameworks want 'clean' schemas, not the ugly ones with wildcard and anyType elements, which are harder for code development. Thus, forward compatibility is often ignored in web service contract design, with versioning issues as a result. These issues manifest themselves in service governance. Trying to be smart and simple in service design means we end up with complex service version governance.

Some of these governance issues can be alleviated by using a service mediator, such as an ESB, in front of the web service. Data transformations can then be performed between the interfaces of the previous version and the new version. For example, for old-version service consumers, the mediator can filter out 'unexpected' XML instance document elements.

Another way to achieve forward compatibility would be to define service contracts using Schematron instead of XML Schema: rule-based instead of grammar-based contracts. But that's the subject of a future post.

Here are two good articles detailing some of the versioning compatibility issues:

Monday, April 02, 2012

XML namespace versioning in web services

Defining a version number in the namespace of an XML schema is a form of hardcoding. Handling versions of schemas can also be based on the value of a defined version element or attribute.

A hardcoded version number makes sense if you want to enforce versioning at design time; a version value, by contrast, makes sense at runtime.

Design time versioning tightly couples service consumers to the service version. Upgrading the service version means all consumers must upgrade. A version number in the namespace translates to a package name in Java: changing the number means changing the client code.

Runtime versioning is much more flexible and has less impact on client code. But it does require an effort in governance: who is using which service version? Who is upgrading when?
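To make the contrast concrete, here are two hypothetical request fragments (the example.org namespaces are made up). In the first, the version is hardcoded in the namespace, so it is fixed at design time; in the second, one namespace is kept and the version travels as data, so the provider can route on it at runtime:

```xml
<!-- design time: version baked into the namespace -->
<GetCustomer xmlns="http://example.org/customer/v2"/>

<!-- runtime: stable namespace, version carried as a value -->
<GetCustomer xmlns="http://example.org/customer" version="2"/>
```

In the first form, moving to v3 changes every generated Java package and thus every client; in the second, an unchanged client can keep sending version="2" while newer consumers send version="3".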

Changing the version number in a namespace can trigger a cascade of changes in other XML schemas, e.g., through schema imports. This can become so complex that I have seen several large web service providers still at version 1.0, even after numerous non-backward compatible changes. Effectively this is a no-version strategy, where each change pushes high costs onto all consumers and nothing stimulates governance.

Runtime versioning is more friendly towards consumers, and ensures that proper governance is put in place, which increases the service maturity level.

It can't be a surprise that I favour the runtime versioning strategy.

Monday, January 23, 2012

Idempotent services

An important service concept which is often neglected in (web) service design is idempotency.

If web services use HTTP as the communication protocol, it must be realized that HTTP cannot guarantee an exactly-once quality of service (QoS). You can achieve either best effort or at least once: the former when you do not retry after a communication error, the latter when you do.

Since best effort may mean that you lose a message every now and then, this QoS is rarely preferred. Usually some retry mechanism is implemented or configured. Hence, the QoS is most often at least once.

Take a look at this. Suppose you send a message over HTTP to a web service, but you receive a time-out. What has happened? Was the message accepted by the web service? Was it even processed? Or was it never sent? We can't really know.

When the sender of the message is an interactive application, the person using the application generally pushes the submit button again. The message is resent. This is typical behaviour. (Resending may occur at many points along the communication line: proxies, load balancers, etc. Don't think this is only human!)
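The resending behaviour described above can be sketched in a few lines. This is an illustration with a hypothetical send function, not a real client library: any retry policy turns best effort into at least once, because a time-out does not tell the sender whether the request was processed.

```python
import time

def send_with_retries(send, message, retries=3, delay=0.0):
    """At-least-once delivery: on a time-out we resend, so the
    receiver may process the same message more than once."""
    for attempt in range(retries):
        try:
            return send(message)
        except TimeoutError:      # response lost -- or was it processed anyway?
            time.sleep(delay)     # back off, then resend
    raise TimeoutError("gave up after %d attempts" % retries)
```

If the service did process the first message but its response was lost, the retry delivers an exact duplicate, which is precisely the scenario the rest of this post is about.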

But what if the message was received by the web service? The service may have processed it. Suppose the processing results in the insertion of data in a database. What happens when the second message arrives?

Surely, the second message may lead to a second insertion of the same data, and thus to multiple records for exactly the same data. This is not really what the average database owner wants. Database designers identify data with unique keys, and trying to insert the same data, with the same unique key, violates a uniqueness constraint. This results in an error; in Java, we are all familiar with the DuplicateKeyException.

So the sender of the message first inserts data into the service's database, but, due to the HTTP time-out, receives no acknowledgement of this. He sends it again, but now he receives a DuplicateKeyException. Hmm, that's unexpected. Now what?

The sender now needs at least one read operation on the service's database to verify whether the data was correctly inserted or not. A human operating an application may do this naturally, but implementing it automatically can be very complex indeed. And who has to implement this complexity? The service consumer, not the service provider. From a business point of view, that is not very customer friendly.

All this can be avoided if the service would be implemented as an idempotent service. Idempotency means that no matter how often I send the same message, I always get the same response.

Read operations are idempotent. No matter how often I read the same data, I will always get the same answer. (Yes, of course, until somebody changes those data, but that is not the point here.)

It is the create, update, and delete operations where idempotency becomes important. Suppose I want to create some data in a database. In normal operation, the service accepts my data, will insert it in a database, and will respond with a success message, very likely including the unique key which identifies the inserted data.

If I were to send the same data again for creation, an idempotent service will not respond with a DuplicateKeyException, but with the same answer as I would have received if this were the first message to insert these data in the database. Thus, I should receive the same success message again, and if the unique key is included in that message, it should be the same unique key.

For update and delete operations, idempotency essentially works the same as for the create operation.

Of course, implementing idempotency can be complex for the service provider. And complexity costs money. That's probably the real reason why idempotent services are so rare. (At least in the government environments I tend to work in.)

But once a service is made idempotent, it is foolproof, and can guarantee data consistency across integration boundaries. Services should be made idempotent. It should be an architectural and service design principle. Especially so for services which are used by many service consumers. You can't really push out your own complexity of keeping data consistent to your consumers, can you? The consumers are your customers, you should treat them as such. You gain more customers if you make it easy for them.

Saturday, January 14, 2012

Hello Honourable Reader!

I hate Hello World! programs. They are too simple. They don't provide any insight into the programming language. And worst: they don't show any creativity at all.

So who came up with this silly Hello World! idea?

A quick search on Wikipedia shows it was Brian Kernighan, back in 1974. I had never heard of him, so I read on. Back in the 70s he worked at the famous Bell Labs. The k in the awk programming language is the first letter of Kernighan. He turned the original "unics" spelling into "unix." He worked on graph partitioning and the travelling salesman problem. And he did lots of other important IT work.

That certainly sheds a different light on Hello World!

However, I understand that it was quite an astonishing achievement to come up with the Hello World! program in 1974. But anno 2012, haven't we been able to come up with anything better? Open any IT book, and you are likely to find Hello World! somewhere in it. Nearly 40 years later, shouldn't we at least upgrade the semantics to Hello, Honourable Reader!?

Monday, January 09, 2012

Service mediation within the government

In a government, we find organisations who provide data, and those who consume data.

Those who provide data make it available through (web) services, generally not taking care of the business needs of the consumers. And why should they? It's their data, they know everything about it, and they keep it consistent. They are the "authentic source" of the data. Their role is to focus on their data, not on others' business.

Those who consume data need data within a specific context, often unique to their business. It is quite complex for them to connect to the providers' (web) services. Those services often give much more data than the individual consumers need, and often consist of complex data models, described by complex XML schemas.

In the middle between providers and consumers should be a mediator. The mediator is the glue between the consumers' needs and the providers' offering. The mediator should look both left and right: it should implement both consumer-driven and data-driven approaches to mediation. The technical implementation can be done using a service bus, but mediation is more than just a technical issue.

Data-driven mediation is a bottom-up approach. Some characteristics:
  • Consumers are tightly coupled to providers: service life cycles, data models, ...
  • Any change in a provider service involves complex governance of both mediator and all consumers
  • Each consumer must interpret the data and assemble it correctly before it can be used in the business; the provider's data model is captured by the consumer
  • The role of the mediator is limited: just expose the same interface as the service provider; only some technical advantages of a common security model, common logging and auditing, etc.; no added business value

Consumer-driven mediation is a top-down approach. Some of its characteristics:
  • Consumers are loosely coupled to providers: consumer service life cycles and data models do not depend on those offered by the provider
  • A change in a provider service does not automatically imply a change at the consumer's side, which simplifies governance
  • The mediator interprets the provider's data and transforms it into the business domain language; the provider's data model is shielded from the consumer; there is an abstraction made by the mediator
  • The role of the mediator is extensive: it is the glue between the data service providers and the business consumers
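As a toy illustration of the difference (all names and fields hypothetical): a data-driven mediator passes the provider's record through untouched, while a consumer-driven mediator maps it into the consumer's own domain language, shielding the provider's model.

```python
# The provider's record, in the authentic source's own data model.
provider_record = {
    "prsn_fam_nm": "Jansen",
    "prsn_giv_nm": "Piet",
    "dt_brth": "1970-05-01",
}

def data_driven(record):
    """Pass-through: the consumer must understand the provider's model."""
    return record

def consumer_driven(record):
    """Transform into the consumer's domain language; the provider's
    model stays hidden behind the mediator."""
    return {
        "surname": record["prsn_fam_nm"],
        "firstName": record["prsn_giv_nm"],
        "dateOfBirth": record["dt_brth"],
    }
```

When the provider renames a field, only the consumer_driven mapping changes; with the pass-through, every consumer changes.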

It is clear that the initial cost of consumer-driven mediation is larger than that for data-driven mediation. This is probably why data-driven mediation is seen so frequently: you can quickly expose services. And for the first few consumers, this is probably the cheapest solution.

But once you start re-using services, governance starts to take over the budget. Changes will occur, and will propagate everywhere. The governance nightmare starts, and the maintenance costs of the SOA rise.

So, if the goal of mediation is to expose reusable services, then it does make sense to make the initial investment and to go for consumer-driven mediation. The mediator becomes an important part of the IT integration landscape, and its role should not be underestimated.

In a government SOA, there should thus be an integration mediator, which interacts both with the authentic sources (the data providers) and with the consumers, and which has enough power to fulfil its role as consumer-driven mediator.