- Activity Log - This is a detailed audit trail of every user action you can capture. It provides detailed feedback on your features and on whether you've made them usable or not. Storing this data in a PostgreSQL partitioned table worked well for us. At higher volumes, you may want to go NoSQL.
- Error Log - An embarrassing stack-trace festival that may or may not have a direct impact on the end user. Needless to say, this log is best kept empty. A service like Hoptoad can help you with that by putting errors in your face until you resolve them.
- Trace Log - This is where you take the true measure of what your application is actually doing, which is less than obvious in highly distributed applications. Logging correlation IDs and aggregating logs in a central place via syslog or Scribe is a good approach. You'll need search capabilities over these logs: think Clarity or Splunk, depending on your constraints and budget.
- Response Time - This is an obvious metric that will shed some light on your design and implementation. Just be sure you're logging it and paying attention to it.
- DB TPS - Though outside the pure realm of your application feedback loop, this metric gives you a good measure of how DB-intensive your application is and whether it needs some redesign, for example low-hanging fruit where caching could help.
- Cache Hit/Miss - Caching brings as many problems as it solves: a cache-happy application doesn't come for free, especially if it is distributed. Measuring the hit/miss ratio of each cache can help validate its usefulness, or lack thereof (see the sketch after this list).
- MQ Throughput - Monitoring queues for high-watermark thresholds is commonly done outside of the application's realm. An interesting MQ-related data point an application can log is how long a message has been in flight, whether or not that includes the processing time after the message has been consumed.
- Activity Intensity - This last one is a fun one: by plotting the number of active application sessions against the current database activity, you can get a great idea of how active (or bored) your users are.
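As an illustration of the cache hit/miss point above, here is a minimal sketch of a ratio-counting cache wrapper; MeasuredCache and its API are made up for the example and not tied to any particular caching product.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper that counts hits and misses around a plain key/value lookup.
public class MeasuredCache<K, V> {
    private final ConcurrentHashMap<K, V> delegate = new ConcurrentHashMap<K, V>();
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    public V get(K key) {
        V value = delegate.get(key);
        if (value == null) {
            misses.incrementAndGet();
        } else {
            hits.incrementAndGet();
        }
        return value;
    }

    public void put(K key, V value) {
        delegate.put(key, value);
    }

    // The number to chart and alert on: a ratio that keeps dropping means the cache
    // is not earning its keep.
    public double hitRatio() {
        long h = hits.get(), m = misses.get();
        return (h + m) == 0 ? 0.0 : (double) h / (h + m);
    }
}
```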
Tuesday, October 26, 2010
Listen to Your Applications
Tuesday, October 12, 2010
Wednesday, September 15, 2010
DevOps: Time for Agile Operations!
Go check it out!
Friday, September 10, 2010
Erlang + Cloud Files = cferl
- Browse the readme document that contains many more syntax examples.
- Download cferl 1.0.
- Fork the project or report issues on github.
Sunday, September 05, 2010
Recently Reviewed: Patterns-Based Engineering
Friday, August 27, 2010
Monday, May 24, 2010
Data Interaction Patterns
As you know, when data is involved, caching comes into play as soon as performance and scalability are sought. In the following diagrams, the cache is represented as a vertical rectangle. The persistent storage is represented as a vertical blue cylinder, while horizontal cylinders represent some form of reliable, asynchronous message delivery channel. The data interactions are represented with curvy arrows: they can represent reads or writes.
Direct [R/W]
Besides the obvious drawbacks coming from the temporal coupling with the persistent storage mechanism, the interesting thing to note in such a trivial data access pattern is that there is often some form of request-scoped caching happening without the need to explicitly do anything. This first level of cache, which you get from most data access layers, helps optimize operations provided they occur within the same request (to which the transaction, if one exists, is bound).
Being short lived, this kind of caching is free from the problem of evicting expired cache entries: it can kick in transparently without the application being aware of it.
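To make the idea concrete, here is a minimal sketch of what such a request-scoped, first-level cache amounts to; RequestScopedCache is a made-up name and the loader stands in for whatever actually hits the persistent storage.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical first-level cache: it lives only as long as the request that owns it,
// so expired-entry eviction is never a concern.
public class RequestScopedCache {
    private final Map<Object, Object> entries = new HashMap<>();

    @SuppressWarnings("unchecked")
    public <K, V> V lookup(K key, Function<K, V> loader) {
        // Within a single request, repeated loads of the same key hit this map
        // instead of the persistent storage.
        return (V) entries.computeIfAbsent(key, k -> loader.apply((K) k));
    }

    // Discarded with the request: nothing to expire, nothing to keep consistent.
    public void clear() {
        entries.clear();
    }
}
```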
Through Cache [R/W]
Reading through cache is a simple and powerful mechanism where an application tries first to read from a long lived cache (a very cheap operation) and, if the requested data can't be found, proceeds with a read in the persistent storage (a way more expensive operation).
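A minimal read-through sketch, assuming a generic Cache contract and a storage reader; both names are invented for the example and not tied to any specific caching product.

```java
import java.util.function.Function;

// Hypothetical read-through helper: cheap cache lookup first, expensive storage read on a miss.
public class ReadThroughCache<K, V> {

    // Assumed minimal cache contract; null means "not cached".
    public interface Cache<CK, CV> {
        CV get(CK key);
        void put(CK key, CV value);
    }

    private final Cache<K, V> cache;
    private final Function<K, V> storageReader;

    public ReadThroughCache(Cache<K, V> cache, Function<K, V> storageReader) {
        this.cache = cache;
        this.storageReader = storageReader;
    }

    public V read(K key) {
        V value = cache.get(key);              // very cheap
        if (value == null) {
            value = storageReader.apply(key);  // the much more expensive round trip
            cache.put(key, value);             // repopulate for subsequent readers
        }
        return value;
    }
}
```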
It's interesting to note that write operations don't necessarily happen the same way, i.e. it is entirely possible that a write to the persistent storage doesn't perform a similar write in the cache. Why is that? Cached data is often a specific representation of the data available in the storage: it can be, for example, an aggregation of different data points that corresponds to a particular cache key. The same persistent data can lead to the creation of several different cache entries. In that case, a write can simply lead to an immediate cache flush, waiting for subsequent read operations to repopulate these entries with new data.
Conversely, it's possible to have write operations update the cache, which opens the interesting problem of consistency. In the current scenario, the persistent storage remains the absolute truth of consistency: the application must handle the case when the cache was inconsistent and led to an invalid data operation in the persistent storage. I've found that localized cache evictions work well: the system goes through a little hiccup but quickly restores its data sanity.
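A sketch of that write path with localized evictions; the affectedKeys function is an assumption, standing for whatever logic knows which cache entries a given entity feeds.

```java
import java.util.Collection;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical write path: the persistent storage stays the source of truth and the
// affected cache entries are simply evicted, to be repopulated by the next read.
public class WriteWithEviction<E, K> {
    private final Consumer<E> storageWriter;                // the authoritative write
    private final Consumer<K> cacheEvictor;                 // e.g. a method reference to cache eviction
    private final Function<E, Collection<K>> affectedKeys;  // which cache entries this entity feeds

    public WriteWithEviction(Consumer<E> storageWriter,
                             Consumer<K> cacheEvictor,
                             Function<E, Collection<K>> affectedKeys) {
        this.storageWriter = storageWriter;
        this.cacheEvictor = cacheEvictor;
        this.affectedKeys = affectedKeys;
    }

    public void write(E entity) {
        storageWriter.accept(entity);                      // consistency lives here
        affectedKeys.apply(entity).forEach(cacheEvictor);  // localized hiccup, repopulated on read
    }
}
```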
Though some data access technologies allow the automatic management of this kind of second-level caching, I personally prefer that my applications interact explicitly with the caching technology they use, and that they do so at the service layer. This is especially true when considering distributed caching and the need to address the inherent idiosyncrasies of such a caching model.
This said, session stickiness skews load balancing and doesn't play well when you alter a pool of servers: I've really become convinced that you get better applications by preventing stickiness and letting requests hit any server. In that case, cache distribution or clustering becomes necessary: the former presents some challenges (like getting stale data after a repartitioning of the caching continuum) but scales better than the latter.
Write Behind [W]
Writing behind consists in updating the cache synchronously and then deferring the write to the persistent storage to an asynchronous process, through a reliable messaging channel.
This is possible with regular caching technologies if there are no strong integrity constraints or if it's acceptable to present temporarily wrong data to the data consumer. If the application has strong integrity constraints, the caching technology must be able to become the primary source of integrity truth: a consistent distributed cache that supports some form of transactional data manipulation becomes necessary.
In this scenario, the persistent storage doesn't enforce any form of data constraint, mostly because it is too hard to propagate violation issues back to the upstream layers in any meaningful form. One could wonder what the point of such a persistent storage is if it is dumbed down to such a mundane role: if this storage is an RDBMS, there is still value in writing to it, because external systems like a back office or business intelligence tools often require access to a standard data store.
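A minimal write-behind sketch; the in-memory BlockingQueue stands in for the reliable messaging channel, which in a real deployment would be a durable broker.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical write-behind: the cache is updated synchronously, while the persistent
// write is deferred to an asynchronous consumer fed by a messaging channel.
public class WriteBehindCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final BlockingQueue<Map.Entry<K, V>> channel = new LinkedBlockingQueue<>();

    public WriteBehindCache(Consumer<Map.Entry<K, V>> storageWriter) {
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    storageWriter.accept(channel.take());  // deferred, off the request path
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "write-behind-consumer");
        consumer.setDaemon(true);
        consumer.start();
    }

    public void write(K key, V value) {
        cache.put(key, value);                         // synchronous: readers see it immediately
        channel.offer(new SimpleEntry<>(key, value));  // persistence happens later
    }

    public V read(K key) {
        return cache.get(key);                         // the cache acts as the primary source here
    }
}
```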
Cache Push [R]
Pushing to cache is very useful for data whose lifecycle is not related to the interactions with its consumers. This is valid for feeds or the result of expensive computations not triggered by client requests.
The mechanism that pushes to cache can be something like a scheduled task or a process consuming asynchronous message channels.
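A sketch of the scheduled variant; the feedFetcher supplier is an assumption, standing for any expensive computation whose lifecycle is decoupled from client requests.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical cache push: a scheduled task refreshes an entry regardless of consumer activity.
public class CachePusher {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public void schedule(String cacheKey, Supplier<Object> feedFetcher, long periodSeconds) {
        scheduler.scheduleAtFixedRate(
            () -> cache.put(cacheKey, feedFetcher.get()),  // push the fresh result into the cache
            0, periodSeconds, TimeUnit.SECONDS);
    }

    public Object read(String cacheKey) {
        return cache.get(cacheKey);  // consumers only ever read what was pushed
    }
}
```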
Future Read [R]
In this scenario, the data producers synchronously answers the consumers with the promise of the future delivery of the requested data. When available, this data is delivered to the client via some sort of server push mechanism (see next section).
This approach works very well for expensive computations triggered by client requests.
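A minimal sketch of the promise part, using CompletableFuture; the pushToClient callback stands in for whatever server push mechanism is in place (see the next section).

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical future read: the producer answers the request immediately with a promise
// and delivers the actual data later, through some server push channel.
public class FutureRead {
    public CompletableFuture<String> request(Supplier<String> expensiveComputation,
                                             Consumer<String> pushToClient) {
        CompletableFuture<String> promise =
            CompletableFuture.supplyAsync(expensiveComputation);  // triggered by the client request
        promise.thenAccept(pushToClient);                         // delivered via server push when ready
        return promise;                                           // returned synchronously to the caller
    }
}
```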
Server Push [R]
Server push can be used to complement any of the previous interactions: in that case, a process prepares some data and delivers it directly to the consumer. There are many well known technological approaches for this, including HTTP long-polling, AJAX/CometD, web sockets or AMQP. Enabling server push in an application opens the door to very interesting data interactions as it allows to decouple the activities of the data consumers and producers.
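A bare-bones sketch of the decoupling this enables; the listener registry below stands in for the actual transport (long-polling, CometD, web sockets or AMQP), none of which is shown here.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical server push hub: producers publish whenever data is ready and every
// subscribed consumer receives it, without having to ask for it at that moment.
public class PushHub<T> {
    private final Map<String, List<Consumer<T>>> subscribers = new ConcurrentHashMap<>();

    public void subscribe(String topic, Consumer<T> consumer) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(consumer);
    }

    public void publish(String topic, T data) {
        subscribers.getOrDefault(topic, Collections.emptyList())
                   .forEach(consumer -> consumer.accept(data));  // delivery decoupled from consumer requests
    }
}
```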
Monday, May 17, 2010
Infected but not driven
There are some interesting discussions going on around TDD and its applicability, which I think are mostly fueled by the heavy insistence of TDD advocates on their particular way of approaching software development in general and testing in particular. The more time I spend thinking about these discussions, the clearer it becomes to me that, as far as testing is concerned, the usual rule of precaution of our industry applies: i.e. it depends.
Behavior
Public Interface
When creating non-trivial public functions, I've found it a great help to go through a serious amount of code reading in the different places where these functions are envisioned to be used. Reading a lot of code before writing a little of it is commonplace in our industry: while going through the reading phase, you're actually loading all sorts of contextual information into your short-term memory. Armed with such a mental model, it becomes possible to design new moving parts that will naturally fit into this edifice. So I guess that practice would be RDD (reading-driven development).
Friday, May 14, 2010
Just Read: Zabbix 1.8 Network Monitoring
Friday, April 30, 2010
Grafting Mule Endpoints
Note: The following code samples are applicable to Mule 2.
In Mule ESB, outbound dispatching to a destination whose address is known only at runtime is a pretty trivial endeavor. A less frequent practice consists in programmatically defining inbound service endpoints, as the notes and sketch below illustrate.
- The class implements MuleContextAware in order to receive an instance of the MuleContext, which is the key to the gate of Mule's innermosts. Some might consider fetching the context from the connector object that gets injected into this class: I personally find this less desirable, for design reasons that I'll let Demeter explain.
- The endpoint is bound to the desired connector by passing its name in the URI used to create it. This allows picking up the right connector, which is compulsory for any Mule configuration with more than one instance of a particular connector (file connectors in this case).
- Endpoint specific configuration parameters, like moveToDirectory, are configured as extra URI parameters. You can also add other parameters, as key/value pairs: they will be automatically added to the message properties dispatched from this endpoint.
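Here is a rough sketch of what these notes amount to, from memory: the connector name (myFileConnector) and directories are invented, and the exact registry/endpoint factory calls may need adjusting to your Mule 2.x version.

```java
import org.mule.api.MuleContext;
import org.mule.api.MuleException;
import org.mule.api.context.MuleContextAware;
import org.mule.api.endpoint.InboundEndpoint;

// Rough sketch only: the endpoint factory lookup is from memory and should be
// checked against the Mule 2.x API you are running.
public class GraftedEndpointFactory implements MuleContextAware {
    private MuleContext muleContext;

    public void setMuleContext(MuleContext muleContext) {
        this.muleContext = muleContext;  // injected by Mule because of MuleContextAware
    }

    public InboundEndpoint createFileInboundEndpoint(String directory) throws MuleException {
        // The connector parameter binds the endpoint to the intended file connector;
        // moveToDirectory (and any extra key/value pair) rides along as a URI parameter.
        String uri = "file://" + directory
            + "?connector=myFileConnector"
            + "&moveToDirectory=" + directory + "/done";
        return muleContext.getRegistry().lookupEndpointFactory().getInboundEndpoint(uri);
    }
}
```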
Wednesday, April 28, 2010
Just Read: 97 Things Every Programmer Should Know
If you're familiar with the practices that the Agile, XP or Software Craftsmanship movements are putting forward, you'll find that you already know and agree with most of the book. In that case, the real value of this book will come from the few essays you find yourself questioning or disagreeing with, as you will have to introspect and decide whether your disagreement is founded or based on prejudice.
Tuesday, April 20, 2010
Wednesday, April 14, 2010
Just Read: Coders at Work
Thursday, February 25, 2010
The Holy Grail Of Persistence?
- Schema Migration - For a startup, it's critical to be able to evolve a database schema with as little friction as possible, as features are often in a state of flux.
Using a standard DB like PostgreSQL allowed us to leverage Ruby's ActiveRecord Migration, which is not only handy for migrating forward (as you do in production) but also backwards (as you sometimes have to do in development). Though Mnesia record evolution is possible, the fact that data migration concerns permeate the application code is very unpleasant. Going schema free was a tempting option but would not have come close to the flexibility ActiveRecord and PostgreSQL gave us.
- Supporting Resources - Being able to solve problems quickly is essential for a startup: for everything that is not your core business, you usually rely a lot on the information available out there.
PostgreSQL has an extensive body of knowledge available online and in print. When things go haywire, or in case of doubt, you're pretty much guaranteed that a Google search will bring you at least a couple of pages where people asked the exact same question and got answers. With Mnesia, the amount of available information is much smaller, simply because it's still very much a niche database.
- Standard Connectivity - When you're focused on building something new, the last thing you want is to waste time re-inventing the wheel: interoperable building blocks are key.
Using a standard database like PostgreSQL gave us immediate access to tools like Pentaho's Data Integration, which we use to massage data. Though we could have built an army of supporting tools to do the same on Mnesia, it's always better to use something that's already there. It has also allowed us to fully leverage Ruby on Rails to build an awesome back office in no time. Though there are some Ruby-Erlang bridges out there, none gives you all the RAD features you get when plugging Rails into a standard database.
- Operational Simplicity - In a startup, there's no DBA to nurse your database engine: you have to deal with it yourself, so it had better be simple to operate.
Installing, upgrading, backing up and restoring PostgreSQL databases are all well-defined operations, supported by a wealth of tools. The security model is straightforward too. And there are plenty of options for monitoring what's happening under the hood and for analyzing and tuning performance. I have no doubt all this is possible with Mnesia, but in a less familiar and straightforward manner.