Published on February 23, 2013
I love monitoring. Well actually I don’t, but I realized there’s no sane way to live without it and so I’ve grown to love it.
One of the key components of successfully running big data infrastructure like Apache Hadoop, Apache Cassandra or Apache Zookeeper in production is monitoring the heck out of it. This is crucial in several respects. First and foremost is the learning aspect: looking at monitoring charts can teach you a lot about the internal workings and behavior of these systems. Second is tuning configuration. For example, when you change a Column Family’s caching configuration you need insight into how the change affects performance and the overall behavior of the cluster. Third is problem identification: you want to measure the behavior of your infrastructure over time to identify performance degradations, bottlenecks, etc. Fourth is capacity planning and the ability to measure performance as the environment around your infrastructure changes (e.g., traffic growth).
So monitoring is important, very important.
Monitoring Apache Hadoop, Cassandra and Zookeeper using Graphite and JMXTrans continued »
Published on July 19, 2011
Everyone seems to think annotations are cool and wants to use them nowadays. A lot has been written about the pros and cons of using annotations, and personally I think they should be used with caution. Mixing annotation domains to the point of total confusion and spreading configuration-related metadata across large code bases until refactoring becomes very hard are only some of the risks. Nevertheless, annotations are cool.
I wanted to add annotation functionality to some “aspect” of our code base. An aspect being an interface Foo and multiple implementations of Foo in various artifacts (e.g., FooA from project A, FooB from project B and so on).
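As a rough illustration of the idea, here is a minimal sketch that marks Foo implementations with a custom runtime annotation and indexes them by the annotation's value. It uses plain reflection as a stand-in for the Spring scanning itself; the @FooHandler annotation, its value attribute and the FooRegistry class are hypothetical names invented for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical marker annotation; it must be retained at runtime to be visible via reflection.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface FooHandler {
    String value();
}

// The "aspect": one interface with implementations that may live in different artifacts.
interface Foo {
    String handle(String input);
}

@FooHandler("a")
class FooA implements Foo {
    public String handle(String input) { return "A:" + input; }
}

@FooHandler("b")
class FooB implements Foo {
    public String handle(String input) { return "B:" + input; }
}

class FooRegistry {
    // Index the given implementations by their annotation value, skipping unannotated ones.
    static Map<String, Foo> index(List<Foo> candidates) {
        Map<String, Foo> byName = new LinkedHashMap<>();
        for (Foo foo : candidates) {
            FooHandler meta = foo.getClass().getAnnotation(FooHandler.class);
            if (meta != null) {
                byName.put(meta.value(), foo);
            }
        }
        return byName;
    }

    public static void main(String[] args) {
        Map<String, Foo> registry = index(Arrays.asList(new FooA(), new FooB()));
        System.out.println(registry.get("b").handle("x")); // prints "B:x"
    }
}
```

In a Spring application the list of candidates would come from the container rather than being built by hand; the lookup by annotation value stays the same.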
Spring custom annotations using context:component-scan continued »
Published on July 9, 2011
Elasticsearch is a good abstraction around the Lucene search engine and an alternative to Solr. Elasticsearch provides out-of-the-box index distribution along with a decent JSON RESTful interface backed by a Java/Groovy API. Elasticsearch also has an elaborate list of modules for a variety of integrations.
I wanted to integrate Elasticsearch in my Spring-based application and a factory/configuration abstraction came to mind.
An embedded node within a cluster can be instantiated and used to execute requests against the distributed index. It deserves its own factory bean since it has lifecycle management operations and configuration associated with it. Note that configLocation can be provided along with a local map of settings; if neither is provided, Elasticsearch will look for its configuration files in the working directory or a “config” subdirectory. The local property can also be set (overridable via a system property for unit tests).
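The settings handling described above could be sketched roughly as follows. This is not the factory bean itself and it makes no Elasticsearch calls; it only shows one way to combine a config file, a local map of settings and a system property override. The class name, the override property name "sketch.node.local", the "node.local" key and the rule that the local map wins over the file are all assumptions made for this example.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

class NodeSettingsSketch {
    // Hypothetical system property that lets a unit test force the "local" flag.
    static final String LOCAL_OVERRIDE_PROPERTY = "sketch.node.local";

    // Merge settings: file first, then the local map (assumed to take precedence),
    // then the "local" flag, which the system property may override.
    static Map<String, String> resolve(Reader configFile, Map<String, String> localSettings, boolean local) {
        Map<String, String> resolved = new LinkedHashMap<>();
        if (configFile != null) {
            Properties fromFile = new Properties();
            try {
                fromFile.load(configFile);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            for (String key : fromFile.stringPropertyNames()) {
                resolved.put(key, fromFile.getProperty(key));
            }
        }
        if (localSettings != null) {
            resolved.putAll(localSettings);
        }
        String override = System.getProperty(LOCAL_OVERRIDE_PROPERTY);
        boolean effectiveLocal = (override != null) ? Boolean.parseBoolean(override) : local;
        resolved.put("node.local", String.valueOf(effectiveLocal)); // key name is an assumption
        return resolved;
    }

    public static void main(String[] args) {
        Map<String, String> local = new LinkedHashMap<>();
        local.put("cluster.name", "from-map");
        Map<String, String> settings =
                resolve(new StringReader("cluster.name=from-file\n"), local, false);
        System.out.println(settings.get("cluster.name")); // prints "from-map"
    }
}
```

When both inputs are null the resulting map holds only the flag, which corresponds to the case where Elasticsearch falls back to its own configuration lookup.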
Elasticsearch with Spring continued »
Published on July 9, 2011
Recently I had to implement active-passive redundancy for a singleton service in our production environment, where the general rule is to always have “more than one of anything”. The main motivation is to alleviate the need to manually monitor and manage these services, whose presence is crucial to the overall health of the site.
This means that we sometimes have a service installed on several machines for redundancy, but only one of them is active at any given moment. If the active service goes down for some reason, another instance rises to do its work. Turns out this is actually called leader election. One of the most prominent open source implementations facilitating the process of leader election is Zookeeper.
Originally developed by Yahoo Research, Zookeeper is a service providing reliable distributed coordination. It is highly concurrent, very fast and suitable mainly for read-heavy access patterns. Reads can be done against any node of a Zookeeper cluster while writes are quorum-based. To reach a quorum, Zookeeper utilizes an atomic broadcast protocol.
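To make the idea concrete, here is a minimal in-memory sketch of the commonly used election scheme in which every candidate registers under an increasing sequence number and the lowest surviving number is the leader. It makes no Zookeeper client calls; the TreeMap stands in for the sequential nodes a real cluster would hold, and the class and method names are invented for this example. It also shows the majority size a quorum-based write needs.

```java
import java.util.Map;
import java.util.TreeMap;

class LeaderElectionSketch {
    // Sequence number -> service id; a stand-in for sequential nodes kept by the coordinator.
    private final TreeMap<Integer, String> candidates = new TreeMap<>();
    private int nextSequence = 0;

    // A service instance joins the election and receives its sequence number.
    int join(String serviceId) {
        int sequence = nextSequence++;
        candidates.put(sequence, serviceId);
        return sequence;
    }

    // Called when an instance goes down; in a real cluster its node would vanish with its session.
    void leave(int sequence) {
        candidates.remove(sequence);
    }

    // The candidate with the lowest surviving sequence number is the active one.
    String leader() {
        Map.Entry<Integer, String> first = candidates.firstEntry();
        return first == null ? null : first.getValue();
    }

    // A write needs acknowledgement from a strict majority of the ensemble.
    static int quorumSize(int ensembleSize) {
        return ensembleSize / 2 + 1;
    }

    public static void main(String[] args) {
        LeaderElectionSketch election = new LeaderElectionSketch();
        int first = election.join("service-1");
        election.join("service-2");
        System.out.println(election.leader()); // prints "service-1"
        election.leave(first);                 // the active instance goes down
        System.out.println(election.leader()); // prints "service-2"
        System.out.println(quorumSize(5));     // prints 3
    }
}
```

The passive instance takes over only when the entry ahead of it disappears, which is what keeps exactly one instance active at a time.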
Leader Election with Zookeeper continued »
Published on March 27, 2011
One of the challenges of writing a presumably successful Facebook application is taking care of scale. With an ever-growing user base and the high viral growth potential brought by this social platform, you could be looking at very high traffic if your application is successful. It is wise to plan ahead. Integrating a service like Google App Engine or Amazon AWS could do the trick, especially for small and medium enterprises, which can find the cost model beneficial for their first baby steps.
This post is not going to dwell on the details of creating an App Engine account or a Facebook application. Those are described very well by their providers and across the web.
The example is an application called “famousity”. It authenticates a Facebook user, then looks through their friends and determines how famous they are by executing a Google search on each friend’s exact name and comparing the hit counts. Not very clever, but it will do for getting a first feel of these tools.
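The ranking step could look roughly like the sketch below. It assumes the friends' names have already been fetched and that some function returns a hit count for a query; that function is a hypothetical stand-in supplied by the caller, so no Facebook or Google API is called here, and the class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

class FamousitySketch {
    // Wrap the name in quotes so the search matches the exact name.
    static String exactNameQuery(String fullName) {
        return "\"" + fullName + "\"";
    }

    // Look up each friend's hit count once, then order friends from most to least hits.
    static List<String> rankByFame(List<String> friends, ToLongFunction<String> hitCounter) {
        Map<String, Long> hits = new HashMap<>();
        for (String friend : friends) {
            hits.put(friend, hitCounter.applyAsLong(exactNameQuery(friend)));
        }
        List<String> ranked = new ArrayList<>(friends);
        ranked.sort((a, b) -> Long.compare(hits.get(b), hits.get(a)));
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up hit counts standing in for real search results.
        Map<String, Long> fakeHits = new HashMap<>();
        fakeHits.put("\"John Doe\"", 50L);
        fakeHits.put("\"Ada Lovelace\"", 900L);
        List<String> ranked = rankByFame(
                Arrays.asList("John Doe", "Ada Lovelace"),
                query -> fakeHits.get(query));
        System.out.println(ranked); // prints "[Ada Lovelace, John Doe]"
    }
}
```

Counting hits once per friend before sorting keeps the number of search requests equal to the number of friends.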
Creating a Facebook app with Google App Engine and Google Web Toolkit continued »
Published on June 25, 2010
Quartz is an excellent, open-source scheduler which provides many enterprise features such as job persistence to a variety of job store implementations (e.g., RAM, JDBC, etc.), transactions and clustering. Spring offers good integration for Quartz and provides some nice abstractions for using it within the IoC container such as MethodInvokingJobDetailFactoryBean which allows you to use any old bean/method as a JobDetail.
Quartz and Spring Integration continued »