Saturday, March 29, 2008

Dumb XML

Beautiful Code is a book that cannot leave you indifferent. While reading it, I was really annoyed by a statement made by one of the authors:
What I always tell people is that XML documents are just big text strings. Therefore, it's usually easier to just write one out using StringBuffer rather than trying to build a DOM (Document Object Model) or using a special XML generator library.

I fully disagree with this position, because I have seen time and again the adverse results of such a simplistic approach to XML generation:
  • Malformed XML: basic string construction does not handle entity escaping, element-name validity or tag balancing. I once had to deal with an XML document that was so broken I initially thought it was SGML. It was neither, and I ended up using regular expressions instead of SAX to parse it.
  • Invalid XML: applications that generate XML should be polite enough to validate the data they produce before sharing it with other applications. Back when DTDs were current, my practice was to add the external declaration only once the document had been validated, like a proofing stamp. Of course, I also had to deal with an application that never considered it important to output XML compliant with its own schema!
  • Bad encoding: I realize that many developers live and work in a place where 7-bit ASCII is enough to represent all the characters they need. But the rest of the world cares about accents and other language particularities. Hence, again: basic string building gives no guarantee in terms of correct representation of Unicode characters.

Of course, for trivial XML blocks that will never contain any special character nor vary much in form, using a StringBuilder (not StringBuffer, by the way: most of the time the synchronized version is unnecessary) is more than enough. You can also use a helper class to encode all strings and escape all entities.

But if you go beyond the trivial use cases, or if the data you integrate in your XML document comes from a source you do not control (like a database connection or another application layer), use a proper library for building XML.
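To illustrate the difference, here is a minimal sketch in Java (the class and element names are made up for the example): naive StringBuilder concatenation lets special characters through unescaped, while a StAX writer from the JDK handles the escaping for us.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

public class XmlEscaping {

    // Naive string concatenation: & and < pass through raw, producing malformed XML.
    static String naive(String name) {
        return new StringBuilder("<customer>").append(name).append("</customer>").toString();
    }

    // StAX writer: writeCharacters() escapes entities for us.
    static String wellFormed(String name) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement("customer");
        w.writeCharacters(name);
        w.writeEndElement();
        w.flush();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String name = "Smith & Wesson <inc>";
        System.out.println(naive(name));      // malformed: raw & and < in content
        System.out.println(wellFormed(name)); // escaped: &amp; and &lt;
    }
}
```

The same escaping guarantee applies when the data comes from a database or another layer, which is exactly where hand-rolled strings break.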

XML is simple, but do not dumb it down into something simplistic

Thursday, March 27, 2008

Bugs Of Opportunity

Last night, I fixed a trivial bug in NxBRE. In doing so, I spent almost five times as long writing tests asserting the current behavior and the expected one once the bug would be killed.

This reminds me of an earlier reflection on how becoming test infected changed my reaction to incoming bugs. Before I became addicted to green lights, bugs were my enemy and received a treatment that depended on my mood:

They are now an opportunity to improve and increase test coverage, regardless of the mood I happen to be in:
But, as you have certainly noticed, grumbling is still part of the process!

Wednesday, March 19, 2008

Fighting The Good Fight

An article published yesterday in the Wall Street Journal (Pleasing Google's Tech-Savvy Staff) made me reflect on the fight for corporate standardization of technologies, a fight in which I have been pretty involved over the past four years.

This battle happens at many different levels: operating systems; database, application and web servers; development platforms, tools, frameworks and libraries. Is it worth fighting?

First, let us bear in mind that there are strong reasons for unifying the different technologies in an IT landscape. Here are a few:
  • Fewer and simpler licensing and support contract negotiations,
  • Facilitated software maintenance and operations,
  • Improved interoperability and potential for re-use.
But there are risks too, like:
  • Golden hammerism: Forcing the use of technologies into scenarios where they are not adequate,
  • Team entrenchment: any technological choice usually satisfies as many people as it displeases,
  • Kool-aid intoxication: single vendor dependency often reduces options and opportunities for using better fitted approaches.
Is the Google way, where people are left free to make - and responsible for - their technological choices, a viable approach? Can it even work anywhere else, where the density of geniuses is much lower?

I will not discuss the advantages of single-server operating system strategies: despite the clear advantage in terms of resource management, the rise of virtualization platforms has made a multi-OS environment an easier proposition. Narrowing the discussion to developers, who have the tough job of making decisions in a world where every day brings a new and promising tool, what is the actual risk of letting them choose?

My experience in the matter taught me that:
  • No project gets doomed by its technological choices: I have seen more harm done by the abuse or misuse of a particular framework than by the framework itself. And yes, this also holds true for projects condemned to use Entity Beans.
  • Applications designed on the same platform do not inter-operate by the sole fact of being developed and run on the same platform.
  • Similarly, re-use does not happen by unifying technologies (except if you develop for portals and want to re-use widgets without resorting to HTML scraping).
  • Paid-for support for open source projects is seldom useful (while "quick-start" consulting gigs are valuable).
So, rather than the quixotic pursuit of the perfect unified IT environment, what is the good fight worth sweating and bleeding for? I believe it is worth fighting for the following:
  • Practice quality over dependency uniformity: more than the usage of a common set of libraries, improving code readability, test coverage, application design and build practices has the biggest impact on the maintainability and evolvability of a particular application.
  • Loose coupling over platform uniformity: an IT landscape greatly benefits from systems that have clear contracts with each other, interact in well-defined and contained manners, and can gracefully survive when their neighbors have temporarily fallen into digital limbo.
  • Operation friendliness over environment uniformity: applications developed with operations in mind have a happier life in this world. Targeting a particular server, database or OS does not automatically translate into an application that will be easily handled by a company's production team.
Let us fight the good fight!

Just Read: Beautiful Code

Beautiful Code is probably the most uneven software book I have ever read, both in terms of style and content. Some chapters are very formal and academic, while others are more relaxed and down to earth. Some chapters really offer food for thought on software development, while others are arid displays of obscure code with no lesson to be gathered. If all the royalties were not given to Amnesty International, I would have felt totally frustrated by this book. At least, I have the feeling of having served a good cause with my money.

Saturday, March 15, 2008

Unchaining Backward Chaining

NxBRE's Flow Engine is currently under work: I have added a simple backward chaining scheduler that can execute sets in order to produce a specified goal in the rule context. Not all rule bases qualify for it: to be usable in this context, no construct should exist outside of a set.

This will make it possible to support Reaction RuleML as an input, alongside the current proprietary syntaxes, and hence to consume the output of Acumen's RuleManager, an excellent tool for BRE-related development on the .NET platform.
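For readers unfamiliar with the idea, here is a generic sketch of goal-driven set scheduling (this is an illustration of the principle only, in Java, and does not reflect NxBRE's actual API or data structures): to produce a goal, the scheduler first resolves the premises of the set that concludes it, recursively.

```java
import java.util.*;

// A generic backward chaining sketch: each "set" has a conclusion and premises;
// to reach a goal, we schedule the sets that produce its premises first.
public class BackwardChainer {
    final Map<String, List<String>> sets = new HashMap<>(); // conclusion -> premises

    void addSet(String conclusion, String... premises) {
        sets.put(conclusion, Arrays.asList(premises));
    }

    // Returns the execution order of set conclusions needed to reach the goal.
    // Premises with no producing set are treated as facts already in the context.
    List<String> schedule(String goal) {
        List<String> order = new ArrayList<>();
        resolve(goal, order, new HashSet<>());
        return order;
    }

    private void resolve(String goal, List<String> order, Set<String> visiting) {
        if (order.contains(goal) || !sets.containsKey(goal)) return;
        if (!visiting.add(goal)) throw new IllegalStateException("cycle at " + goal);
        for (String premise : sets.get(goal)) resolve(premise, order, visiting);
        order.add(goal);
    }

    public static void main(String[] args) {
        BackwardChainer bc = new BackwardChainer();
        bc.addSet("discount", "customerRating", "orderTotal");
        bc.addSet("customerRating", "orderHistory");
        System.out.println(bc.schedule("discount")); // [customerRating, discount]
    }
}
```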

Stay tuned for the upcoming release. In the meantime, the most adventurous can already try the backward chaining engine by checking out the latest version from SVN!

Friday, March 07, 2008

Was @SD West 08

SD West 2008 is over and my brain hurts. But my batteries are recharged: this is what happens when you get close to the luminaries of our industry. And it is a great feeling.

The CMP event team did a great job both expanding the conference with new sessions and filtering out the vendor kool-aid sessions that sometimes managed to slip into the schedule.

Kudos and a big thank you to the organizing team!

@SD West 08: Highlights of Day Five

Ten Ways to Improve Your Code (Neal Ford)

Even if I do not fly airplanes anymore (for now?), I try to stay informed about what happens in the pilots' world. One aspect of it that has always impressed me is the importance given to constantly improving one's practice. Software development should not be different. This is why I like this kind of session, as there is always something to improve somewhere!

Neal presented ten ways to walk this path of improvement. Here is a very short version of these ways; consult the slides on-line for the full version:
  1. TDD for its design benefits (including DI).
  2. Static analysis (byte-code & source analysis).
  3. Good citizenship (encapsulation, invariants preservation from construction to mutation, cautious usage of singleton).
  4. YAGNI (no speculative development, no more ivory-towerish frameworks pleaeaease).
  5. Occam's razor (make the difference between essential & accidental complexity).
  6. Question authority (including established so-called standards, rebuke anti-patterns).
  7. SLAP (single level of abstraction principle: for this, refer to yesterday's "Clean Code").
  8. Polyglot programming (leverage languages targeted at specific problems)
  9. Learn the nuances of Java (discover the hidden JDK gems!).
  10. Anti-objects (too inspired by the real world and solving problems in a reverse manner).
Neal also presented 10 corporate development smells that would be hilarious if they were not tragic! Discover them now. For point 4 (the inane debate about stored procedures), you can also read my take on it in this discussion.

To finish, I think it is worth quoting Neal's answer to clients who try to find excuses for not making things better:
"Your problem is not more complex or so different than everybody else's!"

Responsible Web Design (Scott Fegette)

As if anything in software development could be responsible (software liabilities, anyone?), Scott restated the need to stay abreast of current standards and best practices during his very open and non-dogmatic session. He also reminded us where we come from and all the progress that has been made along the way.

Here are a few key points:
  • Thinking semantically (avoiding layout-specific markup as much as possible), with microformats for example.
  • Properly managing CSS styles.
  • Opting for unobtrusive JavaScript (enough of these links that break when JS is disabled!).
  • Staying current with the standards and practices.
I will not claim I followed everything in this session, even though Scott did a great job demonstrating his points with actual web sites and code samples.

Memory Leaks in Java Applications (Gregg Sporar)

My worst memory leak, forgotten anniversaries aside, happened in 2002 when using Xalan. Each XSL transformation was leaving a bunch of uncollectible objects in memory, until the JVM had had enough and died. Since then, I have been worried about all sorts of leakage (and since I am getting older, this should not surprise you), but I still like XSL ;-)

Gregg is obsessed with memory leaks too, but he works for Sun and knows the problem like the back of his hand. The goal he had for us in his class was:
"To understand the different types of tools and techniques available for finding memory leaks."

He then went through a thorough review and demonstration of different techniques:
  • Post-mortem inspection: analyzing a heap dump, with proper tooling, after the JVM has crashed.
  • Instrumentation: to perform live analysis of what is going on in the JVM, mainly by analyzing the trend in object generation count.
  • A combination of both: to capture dumps on running code.
He also explained the lack of user-friendly tooling for analyzing class-loader related memory leaks (which can lead to the infamous permanent generation errors). To learn more about this interesting and tricky subject, I encourage you to read the two articles he co-authored for ST&P in April and May 2007, from which he extracted the material for his talk.
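Since Java "leaks" always come down to unintentionally reachable objects, here is a minimal sketch of the most classic pattern (the class and sizes are made up): a long-lived collection that accumulates references, so the garbage collector can never reclaim them. This growth trend is exactly what the instrumentation techniques above reveal.

```java
import java.util.ArrayList;
import java.util.List;

// The classic Java "memory leak": a long-lived registry that keeps strong
// references to objects which were only needed transiently.
public class LeakDemo {
    static final List<byte[]> CACHE = new ArrayList<>(); // never evicted: leaks

    static void handleRequest(int size) {
        CACHE.add(new byte[size]); // "cached" forever, even after the request is done
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) handleRequest(1024);
        // Every buffer is still strongly reachable, so the GC cannot reclaim any:
        // a heap dump would show the object count for byte[] growing monotonically.
        System.out.println(CACHE.size()); // 1000
    }
}
```

The fix is always the same in spirit: evict, unregister, or hold the references weakly.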

Mock Objects / Mock Turtles: The Role of Patterns in TDD (Scott Bain)

After reminding us of the youth of our industry, Scott made this very interesting statement:
"Software Development is both an intellectual and a practical profession."

I find this interesting because I tend to over-emphasize the intellectual part of the job when I digress about the nature of our profession. So yes, there is also a practical dimension to it, and Scott argued that testing is a driving force for this concretization.

So how do design patterns play a role in unit testing? On top of allowing us to use a simple name to communicate a lot of information and context, which is extremely valuable, did the GoF give us best practices for testing these patterns? Unfortunately not. But all hope is not lost, as Scott explained while detailing applicable test strategies for the most prominent patterns (strategy, decorator, façade). This is a work in progress that can be contributed to on the Net Objectives web site.

So what is the relationship with turtles? Some patterns force you to know a lot about the chain of objects behind the scenes (turtles all the way down) in order to test them; placing a mock at the right spot can alleviate this issue. It is for this astutely located mock that Scott coined the term mock turtle (well, in fact, he recycled the term).

Thursday, March 06, 2008

@SD West 08: Highlights of Day Four

HTTP for Web Developers (Jason Hunter)

After yesterday's talk about caching, Jason detailed the core of the HTTP magic. Indeed, modern web development tools very often abstract HTTP out of the development paradigm. You end up with developers who talk about controls on forms as if they were developing Access applications (do not laugh, it happened to me).

HTTP is a wild world: with browsers and servers interpreting the standard in their own manner (often leading to bastardized additions to the standard itself), developers need to know what is happening under the hood.

I will not detail everything Jason talked about, but I am always amazed by the amount of extra knowledge you can get when an expert revisits the basics! A highly recommended exercise for anyone who does things with the webernet.

Clean Code: Functions in Java (Robert C. Martin)

Woohoo! A new class from Uncle Bob! It is of course impossible to properly summarize an hour and a half of such a masterful session. The main concept the talk detailed is the following:
"Making code more readable allows us to write code faster, because we end-up reading code twenty times more than writing code when we are in the process of writing code."

So how to achieve this? The general idea is that a function should be an executive summary of what happens below it: it must be short and should not mix concepts from different layers of the application, in order to remain understandable. This leads to a recursive functional decomposition and the creation of functions with meaningful and descriptive names all the way down through the abstraction layers. And by the way: if you find it hard to find a good name for a method, it is probably because it does too many things.

Here is a short summary of the talk:
  • Write small functions and, if possible, write even smaller ones.
  • In a method, do one thing to preserve cohesion, i.e. stay at one level of abstraction.
  • Three is the absolute maximum number of arguments: think about how two arguments are already confusing (what is the right order?).
  • No side effects: a method should have no strange temporal coupling; it should always have the same semantics whenever it is called.
  • Throw exceptions instead of returning error codes, because error codes get mixed with the natural outcome of the method and force ugly calling code.
Someone in the audience asked a question that is dear to me: "there is nothing new here, why keep repeating it?" Uncle Bob replied: "because we do not believe it". This sounds like the classic "Shema Yisrael", with prophets and priests repeating the same things again and again because the people were unfaithful. We need to hear things again and again to internalize them so we practice them. Writing code as if reading it matters is not natural, but we can reach this goal thanks to method-extraction refactoring.
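The rules above can be sketched in a few lines of Java (my own made-up example, not Uncle Bob's): the public method reads like an executive summary, each extracted helper stays at a single level of abstraction, and failure is signaled with an exception rather than an error code.

```java
// A sketch of method-extraction refactoring: generateReport() summarizes the
// steps; each helper does one thing at one level of abstraction.
public class ReportGenerator {

    public String generateReport(String rawData) {
        validate(rawData);
        String cleaned = normalize(rawData);
        return format(cleaned);
    }

    // Exception instead of an error code: the failure cannot be confused
    // with a legitimate result, and callers are not forced into if-chains.
    private void validate(String rawData) {
        if (rawData == null || rawData.isEmpty())
            throw new IllegalArgumentException("no data to report on");
    }

    private String normalize(String rawData) {
        return rawData.trim().toLowerCase();
    }

    private String format(String cleaned) {
        return "report: " + cleaned;
    }
}
```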

Ouch. I will spend the rest of my life refactoring my own code.

Parallel or Perish!! - Are you Ready? (James Reinders)

After assuring us that multi-core processors are not a temporary trick to gain performance that will later be deprecated by faster single-core processors, James made very clear the urgency and necessity of a mindset shift towards parallel programming, especially for client-side developers.

He then gave us several tips that you can read in an article he wrote for DDJ a while ago.

I really appreciated his remark on considering multicore parallelism in light of an increased workload, and not only as a way to accelerate existing code.

I was surprised by James' remark about the lack of parallelism abstraction in Java: since version 1.5, the JDK has offered a wealth of concurrency-oriented high-level constructs (like collections, conditions, mutexes, futures and whatnot...). He might have been referring to thread management itself, but again, I find that executors offer an interesting way of splitting work across threads. He confessed his main focus is on C/C++, though!
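As a quick sketch of what I mean by executors splitting work across threads (written with today's lambda syntax for brevity; in 2008 you would spell out anonymous Callable classes, and the class name here is my own):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Split a sum over an array into chunks submitted to a thread pool,
// then combine the partial results via Futures.
public class ParallelSum {
    static long sum(int[] data, int chunks) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            int step = (data.length + chunks - 1) / chunks;
            for (int start = 0; start < data.length; start += step) {
                final int from = start, to = Math.min(start + step, data.length);
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (int i = from; i < to; i++) s += data[i];
                    return s; // partial sum of one chunk
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get(); // blocks until done
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        int[] data = new int[1000];
        Arrays.fill(data, 2);
        System.out.println(sum(data, 4)); // 2000
    }
}
```

Thread creation, queuing and lifecycle are entirely hidden behind the ExecutorService, which is precisely the kind of abstraction I felt was missing from James' picture of Java.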

I am in fact more surprised by the number of Java developers who do not have clear (or at least basic) guidelines on how to write code that runs concurrently without coughing. As James said: we all have to learn and think about parallel development. Coming from Intel, this shows how much learning we can expect in the coming years!

In the meantime, thanks to either smart scheduling or pure randomness, this keynote was followed by two sessions on Java concurrency and parallelism: so I had an immediate opportunity to keep climbing the learning curve!

Thousands of Threads and Blocking I/O (Paul Tyma)

I was really looking forward to this session and was not disappointed. The breadth and depth of the material Paul shared with the audience was worth attending for. He did a great job debunking some myths about synchronous vs. asynchronous server models, all backed with hard facts. Here are a few of them:
  • Java asynchronous NIO has higher throughput than Java IO (false)
  • Thread context switching is expensive (false)
  • Synchronization is expensive (false, usually)
  • Thread per connection servers cannot scale (false)
Visit Paul's blog for more details and even the slides of the presentation. All in all, this talk confirmed my quasi-gutsy feeling about concurrent development: let the threads flow and be smart about their touch points!
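To make the debunked myths concrete, here is a minimal thread-per-connection echo server in the blocking-I/O style Paul defends (my own sketch, not code from his talk): one plain thread per client and straightforward sequential code, no selector loop in sight.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// A minimal thread-per-connection server using plain blocking I/O.
public class EchoServer {
    static void serve(ServerSocket server) {
        try {
            while (true) {
                Socket client = server.accept();
                new Thread(() -> handle(client)).start(); // one thread per connection
            }
        } catch (IOException done) { /* server socket closed: stop accepting */ }
    }

    // Straight-line blocking code: read a line, write a line, repeat.
    static void handle(Socket client) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) out.println("echo: " + line);
        } catch (IOException ignored) { }
    }

    // Start a server on an ephemeral port, send one message, return the reply.
    static String roundTrip(String message) throws IOException {
        ServerSocket server = new ServerSocket(0);
        new Thread(() -> serve(server)).start();
        try (Socket s = new Socket("localhost", server.getLocalPort());
             PrintWriter out = new PrintWriter(s.getOutputStream(), true);
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()))) {
            out.println(message);
            return in.readLine();
        } finally {
            server.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hello")); // echo: hello
    }
}
```

Compare the readability of handle() with the equivalent NIO selector state machine, and Paul's point about thread-per-connection scaling becomes very tangible.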

Anti-Patterns in Software Projects: Human Factor (Rob Daigneau)

Writing code is a very mental process, and is hence subject to our human nature, with all its up and down sides. Rob gave a great presentation about how the human factor affects software development and how leaders of all sorts can act to make the workplace a better place. This of course spawned a lot of lively discussions, as everybody has so much to say about what happens in their own software development life.

It is hard to summarize everything Rob said, but I really liked his emphasis on passion and how essential it is to let it burn in developers (without letting them burn out, which is destructive). I also appreciated one of his final words: unless we write life-critical software, it does not really matter that much, so we'd better have fun while working. Sounds like good advice to me.

Developer Bowl

Last year's Developer Bowl was all about Googlers and their incredible supremacy in terms of computer science knowledge. With Google being in no shortage of big brains, I walked into the theater expecting a fair deal of deja-vu...

This year was as entertaining as last year's edition. The questions, partially submitted by DDJ's readers, were much less oriented towards the fundamentals of computer science and more towards history and anecdotes. So this year Google did not survive the first round, and IBM beat Intel!

Oh well, it is all rigged anyway ;-)

Wednesday, March 05, 2008

@SD West 08: Highlights of Day Three

Demonstrating WCF: Beyond the Endpoints - Juval Löwy

So .NET is legacy and WCF is here to increase your productivity? What is this all about? Juval did a great job demonstrating how, by building this new platform on the CLR, Microsoft has delivered a complete development environment that offers a clean and efficient programming model for "enterprise" applications.

But is there anything new here? For .NET developers, surely yes. But from a JEE development standpoint: not really. All this sounds like a mix of EJB3 (framework-free classes, remote exceptions), JBoss call stack model (dynamic proxies, client-side and server-side interception), unified synchronous/asynchronous invocation model and workflow for long running operations.

To be fair, in this big mix of already known stuff, there are some pretty powerful features, like resilience to changes in service contracts. WCF goes to great lengths to transparently allow clients and servers at different version levels to keep exchanging messages even after they have changed.

Moreover, unlike the usual stack of disparate half-baked products that is common in Java-land, WCF is a typical Microsoft product: it comes complete with a wealth of tools (like the pretty impressive visual call stack analyzer) and offers developers a trustworthy and stable development framework. Conclusion for .NET developers: if you are not using WCF today, do not delay leveraging it any further!

Behavior-Driven Database Design (BDDD) - Scott Ambler

Scott warned the audience that he was going to be blunt. And he was. Let me quote him: "any monkey can rename a column in a production database"! If this sounds like a scary prospect to you, do not think you are alone: surveys show that this is still a challenge for a majority of corporations. Why is it so? Mostly because of the mystical belief that what is in the database is perfect and trustable, and hence does not require testing.

No testing? Wait a minute. Surveys show that a majority of companies have business-critical functionality in their databases (triggers, stored procedures...). So... no testing? Does this sound reasonable? When all serious software developers are now test infected, is it acceptable that the data management community lags ten years behind in terms of quality-oriented practices? When the problem of bad data quality is estimated to cost $600B per year in the USA, can we keep going on like this? Of course not.

Is this lack of testing the only factor that makes database refactoring look so hard? Not at all. Scott stated another factor very clearly: poor data access architectures leading to tight coupling with the database itself. Ouch! Blunt again. True again.

So how can BDDD help? In short: BDDD is an evolutionary and test-driven (not model-driven) approach to refactoring databases. It encourages considering databases as having nothing special about them, and hence daring to apply the whole panoply of agile software development to them: continuous integration, automated builds, SCM, versioning, developer sandboxes, granular refactoring (to minimize the risk of collision) and regular deployments between environments (frequent from development to integration, less frequent to QA, highly controlled to production), to name a few.

Because it is so different from traditional approaches, it appears as a threat to most data professionals. This should not be the case, as their skills and knowledge are needed for this refactoring to succeed. Moreover, with many developers now using some sort of O/R mapping tool and tempted to dumb databases down to "just storage", the feedback of data professionals can help leverage the different features of databases, which must be considered first-class citizens of a software architecture (as application servers are).

The following stop sign in Sacramento illustrates the current situation: data and quality orthogonal to each other, and a big stop sign in the middle.

It does not have to stay this way. With the software development and data communities working together, a little understanding and a good share of courage, it is possible to make things better!

Object-Oriented Programming and Generic Programming and What Else? (Bjarne Stroustrup)

When the father of C++ gives a keynote about OOP and GP, everybody sits quietly and listens, because everybody knows that anything less than fully concentrated attention will not be enough to follow. I did my best to follow the master and did not regret it! Bjarne detailed and compared the strengths and weaknesses of OOP versus GP (using the classical shapes example). He then presented where C++ is heading (for its 0x version), making the interesting statement that the needs of concurrent programming will more and more shape the destiny of languages.

Bjarne was honored with the Dr. Dobb's Excellence in Programming Award later that day, a well-deserved recognition of the extent of his contribution to our field.

Web Caching (Jason Hunter)

After reminding us why web caching is critical (which I will not detail here, because if you do not know why, you should stop reading this blog and start searching the web), Jason detailed the basics of HTTP before delving into five different cache techniques from Yahoo's Mike Radwin. Here is a non-exhaustive list of what he mentioned:
  • browser cache and conditional get for revalidation (leading to very cheap 304 replies from the server),
  • proxy cache that happens at different levels (company, country...),
  • beware of what can prevent caching (cookies, authentication), and use mitigation techniques like:
  • serving static content from a cookie-free TLD or using cache control directives,
  • deciding that images never expire and using different versions of them (there is an Apache mod that helps with this),
  • going even further and versioning JavaScript and Flash files too (adapting HTML files so that links target the right, consistent versions),
  • leveraging cache and expire directives correctly (for example, instead of using 0 to mean "already expired", use a date far in the past, but not before the epoch),
  • not trusting anyone to understand these directives or their latest versions, so covering the whole spectrum of parameters and planning for stubborn clients (for example, a 302 redirect to a highly cached site).
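The conditional-GET revalidation mentioned above is simple enough to sketch (class and ETag values are my own illustration): if the client's If-None-Match header still matches the resource's current ETag, the server can answer with a body-less 304 instead of re-sending the content.

```java
// A sketch of HTTP conditional-GET revalidation based on ETags.
public class Revalidation {

    // Decide the status code for a GET carrying an If-None-Match header.
    static int status(String resourceEtag, String ifNoneMatch) {
        if (resourceEtag.equals(ifNoneMatch)) return 304; // Not Modified: cheap, body-less reply
        return 200; // full response with the new content and a fresh ETag
    }

    public static void main(String[] args) {
        System.out.println(status("\"v42\"", "\"v42\"")); // 304
        System.out.println(status("\"v43\"", "\"v42\"")); // 200
    }
}
```

This is why the 304 replies Jason mentioned are so cheap: the server validates a short token instead of shipping the whole resource again.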

The Busy .NET Developer's Guide to Rules and Rules Engines (Ted Neward)

I always wonder how good or bad the stuff I am doing with NxBRE is, so I decided to attend this session and listen to Ted give a review of business rules engines in the world of .NET.

We all understand that business rules are natural members of software applications and, I hope, we all realize that hardcoding them makes our lives more difficult, especially for rules that change often and bear a lot of conditional branching. Hence the rise of business rules engines (BREs), whose origins are rooted in the artificial intelligence efforts and expert systems of the 70s. Ted gave this interesting definition of BREs:

"Business rules engines are generalized expert systems in which the expertise is missing, to be entered via some form of programming language at a later date"

I like it because it relates BREs to expert systems while showing the part left to developers. It reinforces the idea that there will be a learning curve and that integration will be needed, and hence that a BRE is not for every project or budget.

Other interesting aspects to consider when using such beasts include rule editing (and the mythical "business user" editor), testing and validation (about which you can read my take) and the operational lifecycle (promotion to production and vice versa).

I am glad that Ted mentioned NxBRE in this session (without throwing stones at it, woohoo), even if he mainly focused on Microsoft Workflow Engine and Drools, aka JBoss Rules .NET (definitely more interesting for the audience anyway, as WF is available to anyone with .NET 3.x while Drools has been around for a long while).

The fact that he did not mention RuleML even once, even when making fun of MS WF's indigestible XML syntax, puzzled me... maybe RuleML still tastes too much of academia for the industry?

18th Annual Jolt Product Excellence Awards
Dr. Dobb’s Excellence in Programming Award

When someone asked Uncle Bob if he would attend the Jolt ceremony, I overheard him reply "naaahh". Still, I can remember him four years ago cracking jokes from the first row with Alexa. So is all the fun gone for real? It is with these thoughts in mind that I entered the theater, ready to be more surprised by the results than ever, as I unfortunately had to skip Jolt judging this round.

Well, this ceremony was simply excellent. The whole process has been streamlined, with Productivity Awards simply named, and good drum-roll-to-slide synchronization! Robert X. Cringely, who was hosting the event, first provided us with great insight into the transient nature of the software we produce, though he stated that this transience removes nothing of its importance. He then proceeded to the formal trophy-handing celebration with brilliance and pizzazz.

I leave it up to the DDJ site to list all the winners of this year. My highlights are the following:
  • Atlassian (or should I say Cenqua?) received well-deserved awards for both Fisheye and Clover.
  • Smart Bear has been recognized for Code Collaborator, whose company-published book I have talked about before.
  • O'Reilly Radar was high on the radar screen! If you do not read it yet, well, you know what you have to do...
  • I was really hoping to see the Spring Framework get the Jolt, but it went to Google's Guice. I am not convinced that Guice will ever jolt the industry the way Spring does. But, hey, a vote is a vote!