Apr 10 2007

I found a cool utility to help me handle 2-up PDF creation on Linux, since OpenOffice doesn’t do this. It is called PDFJam and it is available in the Ubuntu repositories. Very cool PDF utility.

Apr 06 2007

In the two previous episodes we looked at implicit updates and delayed SQL and the consequences of these pitfalls.

In this episode we’ll explore the pitfall known as cache fetching.

This pitfall occurs when you have an EntityManager that you do many operations with, including updating and selecting the same objects many times. We’ve seen that Hibernate keeps a cache of Objects that have been inserted, updated, deleted and also the Objects that have been queried. This pitfall is just an extension of that caching mechanism. Here’s an example:

As you can see here, although we inserted one Test Object and then queried for another, since our query resulted in the Test we had previously added, we just got that one back. In addition, any modifications we made to our Test Object are visible when we execute the query.
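To make the behavior concrete without pulling in Hibernate, here is a minimal sketch in plain Java. It is a toy model, not Hibernate's implementation: `ToySession`, `queryById` and the `Test` fields are all names I made up, and the "database" is just a map. The only point is that a query which resolves to an already-cached id hands back the cached instance, in-memory changes included.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: a hand-rolled identity map, not Hibernate's real implementation.
class CacheFetchSketch {
    // Stand-in for the article's Test entity; the fields are assumptions.
    static class Test {
        final int id;
        String name;
        Test(int id, String name) { this.id = id; this.name = name; }
    }

    // Stand-in for an EntityManager with its per-session cache.
    static class ToySession {
        private final Map<Integer, String> database;              // id -> name column
        private final Map<Integer, Test> cache = new HashMap<>(); // managed instances

        ToySession(Map<Integer, String> database) { this.database = database; }

        // Simplification: write the row immediately and start managing the instance.
        void persist(Test t) {
            database.put(t.id, t.name);
            cache.put(t.id, t);
        }

        // A "query" by id: the row is read, but an already-cached instance wins over it.
        Test queryById(int id) {
            String row = database.get(id);
            if (row == null) return null;
            return cache.computeIfAbsent(id, key -> new Test(key, row));
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> database = new HashMap<>();
        ToySession session = new ToySession(database);

        Test inserted = new Test(1, "Foo");
        session.persist(inserted);
        inserted.name = "Bar"; // in-memory change only, the row is untouched

        Test queried = session.queryById(1);
        System.out.println(queried == inserted); // is it the very same instance?
        System.out.println(queried.name);        // state seen through the query
        System.out.println(database.get(1));     // state actually stored in the row
    }
}
```

In this toy, the query result and the inserted object are one and the same instance, so the query can never show you the stored value once you have touched the object.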

So, what’s bad about this? First, there is no way to retrieve the original Object after you have modified it unless you create a new EntityManager and use that for the query. We can’t call the refresh method because that would clobber the call to setName("Bar"). Creating extra EntityManagers can become very expensive because each EntityManager uses a different JDBC connection. You could quickly run out of connections, depending on the application (I’ve seen this happen).

Furthermore, if the database has changed underneath you (possibly via straight JDBC or by another server), you will not see the change. This is by far the trickiest pitfall we have seen in this series and one of the more non-intuitive Hibernate behaviors.
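The stale-read case can be sketched the same way. Again this is a toy in plain Java under my own assumptions (`ToySession`, `queryById` and `refresh` here are hypothetical stand-ins, and a map plays the database), not how Hibernate is actually written. The external change is simulated by writing to the map directly, the way plain JDBC or another server would bypass the session.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: shows a session cache hiding a change made behind its back.
class StaleReadSketch {
    // Stand-in for the article's Test entity; the fields are assumptions.
    static class Test {
        final int id;
        String name;
        Test(int id, String name) { this.id = id; this.name = name; }
    }

    static class ToySession {
        private final Map<Integer, String> database;              // id -> name column
        private final Map<Integer, Test> cache = new HashMap<>(); // managed instances

        ToySession(Map<Integer, String> database) { this.database = database; }

        // A "query" by id: an already-cached instance is returned untouched.
        Test queryById(int id) {
            String row = database.get(id);
            if (row == null) return null;
            return cache.computeIfAbsent(id, key -> new Test(key, row));
        }

        // Re-read the row and overwrite the managed instance's state.
        void refresh(Test t) {
            t.name = database.get(t.id);
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> database = new HashMap<>();
        database.put(1, "Foo");
        ToySession session = new ToySession(database);

        Test first = session.queryById(1);     // loads the row and caches the instance
        database.put(1, "Changed elsewhere");  // simulates an update that bypasses the session

        Test second = session.queryById(1);
        System.out.println(second.name);       // what the second query reports

        session.refresh(second);
        System.out.println(second.name);       // what the row holds after an explicit refresh
    }
}
```

The catch the toy makes visible: the session has no reason to call `refresh` unless something tells it the row changed.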

Just so folks know, other ORMs are divided on this issue. Some feel that going direct to the DB for queries is best, others feel that queries are heavy and hitting a cache is better. I personally feel that caches should be added when necessary, but not by default. The ORM should do what the name implies, the Object-Relational Mapping, and not caching.

Here’s an example of that:

You can see that if we first add our Test Object to the cache using a query and then modify it using plain old JDBC, the next time we execute the query, we do not get the new values from the database even though we explicitly asked for them. Furthermore, if we use MySQL and take a look at the database we get this:

This means that we must be exceptionally omniscient and know that something in our database has changed so that we can call refresh to re-fetch from the database. The problem grows if you have, say, 1500 machines or so. If your EntityManager is mostly read-only, you can call refresh after every query and sorta fix the problem, but this really isn’t usable at all. Because Hibernate doesn’t have any good work-arounds for this problem and there really isn’t any way to fix it, the cache fetching pitfall gets a suckiness rating of 9.5 (out of 10).

Apr 03 2007

This is the second installment of the Hibernate Pitfalls montage. Before we get to this one, I’d like to sum up some comments from the previous one. First of all, I did rename the series. Someone suggested that my title was a bit off and I agreed. Second, many folks wrote in to tell me I’m an idiot about delayed SQL and someone went as far as to tell me that I have no idea about databases. Delaying the execution of SQL really doesn’t impact any portion of Hibernate at all except batching. If Hibernate went to the database after every statement, the only impact would possibly be on performance, and that would be it. It would still be able to lazy fetch and keep all the other nice features it has.

Remember that Hibernate is an ORM tool and I’m identifying points of pain that I and many others have felt for a long time with Hibernate. I’m not stating that Hibernate as a tool is completely worthless, I’m just pointing out places where I believe Hibernate is trying to be too smart, or is doing something I feel is cumbersome, annoying, obscure, etc. Now, on to today’s installment.

In this episode we will be delving into another facet of Hibernate that has also spilled over into JPA (it appears) and that can have consequences that are very difficult to handle.

In this episode we’ll explore the pitfall known as implicit updates:

As we have already seen, Hibernate maintains a cache of Objects that have been inserted, updated or deleted. It also maintains a cache of Objects that have been queried from the database. These Objects are referred to as persistent Objects as long as the EntityManager that was used to fetch them is still active. What this means is that any changes to these Objects within the bounds of a transaction are automatically persisted when the transaction is committed. These updates are implicit within the boundary of the transaction and you don’t have to explicitly call any method to persist the values. Here’s an example to illustrate this. This uses the same table and entity as part 1:

As you can see, Hibernate is storing a reference to the Test Object we fetched from the database using a JPA query. If we modify that Object using any of the properties within a transaction, all of the modifications will be implicitly persisted when the transaction is committed.
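Here is a minimal sketch of that mechanism in plain Java. It is a toy model under my own assumptions, not Hibernate's code: `ToySession`, `queryById`, `commit` and `thirdPartyHelper` are hypothetical names, and a map plays the database. The session remembers the state each Object was loaded with and, at commit, writes back anything that differs, with no explicit update call anywhere.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: mimics commit-time dirty checking, not Hibernate's real code.
class ImplicitUpdateSketch {
    // Stand-in for the article's Test entity; the fields are assumptions.
    static class Test {
        final int id;
        String name;
        Test(int id, String name) { this.id = id; this.name = name; }
    }

    static class ToySession {
        private final Map<Integer, String> database;                    // id -> name column
        private final Map<Integer, Test> managed = new HashMap<>();     // managed instances
        private final Map<Integer, String> snapshots = new HashMap<>(); // state as loaded

        ToySession(Map<Integer, String> database) { this.database = database; }

        // Load an instance, start managing it, and remember its loaded state.
        Test queryById(int id) {
            Test cached = managed.get(id);
            if (cached != null) return cached;
            String row = database.get(id);
            if (row == null) return null;
            Test loaded = new Test(id, row);
            managed.put(id, loaded);
            snapshots.put(id, row);
            return loaded;
        }

        // Write back every managed instance that differs from its snapshot.
        // Returns the number of implicit updates performed.
        int commit() {
            int updates = 0;
            for (Test t : managed.values()) {
                if (!t.name.equals(snapshots.get(t.id))) {
                    database.put(t.id, t.name);
                    snapshots.put(t.id, t.name);
                    updates++;
                }
            }
            return updates;
        }
    }

    // Hypothetical helper standing in for library code that mutates its argument.
    static void thirdPartyHelper(Test t) {
        t.name = t.name.toUpperCase();
    }

    public static void main(String[] args) {
        Map<Integer, String> database = new HashMap<>();
        database.put(1, "foo");
        ToySession session = new ToySession(database);

        Test t = session.queryById(1);
        thirdPartyHelper(t);                      // nobody asked for an update...
        System.out.println(session.commit());     // ...yet commit reports how many rows it wrote
        System.out.println(database.get(1));      // and the row now holds the helper's change
    }
}
```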

Okay, so why is this an issue? The main downside to this is that we don’t really know whether or not modifications made to an object might later be persisted. Some code might modify the Object and not even realize that it is in a transaction. In fact we might call a toolkit or external library, which might modify the Object and we might not even know that the Object was modified. There is no way around this unless you forcibly refresh the Object instance from the EntityManager.

Another downside is that we must manage all of our Objects by hand. Instead of telling the EntityManager to update an Object (which is far more intuitive), we must tell the EntityManager which Objects NOT to update. We do this by calling refresh, which essentially rolls back a single entity. We do this just prior to calling commit on the transaction or when we realize the Object shouldn’t be updated. There is a downside to this however that is difficult to remedy. If we want to maintain the changes we’ve made to the Object thus far, but not persist the Object, we have very few options and sometimes no options for accomplishing this. We might use a copy constructor to make a copy of all our work and then refresh the persistent Object, but this is error prone, a maintenance nightmare and brittle.
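The copy-then-refresh workaround looks roughly like this. As before, it is a plain-Java toy under my own assumptions (`ToySession`, `load`, `refresh`, `commit` and the copy constructor on `Test` are all hypothetical), meant only to show the shape of the dance, not real Hibernate behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: keep in-memory work in a copy, then refresh the managed instance.
class CopyThenRefreshSketch {
    // Stand-in for the article's Test entity; the fields are assumptions.
    static class Test {
        final int id;
        String name;
        Test(int id, String name) { this.id = id; this.name = name; }
        // Copy constructor: the copy is not managed by any session.
        Test(Test other) { this(other.id, other.name); }
    }

    static class ToySession {
        private final Map<Integer, String> database;                    // id -> name column
        private final Map<Integer, Test> managed = new HashMap<>();     // managed instances
        private final Map<Integer, String> snapshots = new HashMap<>(); // state as loaded

        ToySession(Map<Integer, String> database) { this.database = database; }

        // Load a row (assumed to exist) and start managing the instance.
        Test load(int id) {
            Test t = new Test(id, database.get(id));
            managed.put(id, t);
            snapshots.put(id, t.name);
            return t;
        }

        // Throw away in-memory changes by re-reading the row.
        void refresh(Test t) {
            t.name = database.get(t.id);
            snapshots.put(t.id, t.name);
        }

        // Write back every managed instance that differs from its snapshot.
        int commit() {
            int updates = 0;
            for (Test t : managed.values()) {
                if (!t.name.equals(snapshots.get(t.id))) {
                    database.put(t.id, t.name);
                    snapshots.put(t.id, t.name);
                    updates++;
                }
            }
            return updates;
        }
    }

    public static void main(String[] args) {
        Map<Integer, String> database = new HashMap<>();
        database.put(1, "Foo");
        ToySession session = new ToySession(database);

        Test t = session.load(1);
        t.name = "Bar";               // work we want to keep, but not persist

        Test copy = new Test(t);      // save the work in an unmanaged copy
        session.refresh(t);           // roll the managed instance back

        System.out.println(session.commit()); // number of rows written at commit
        System.out.println(database.get(1));  // the stored value
        System.out.println(copy.name);        // the work survives only in the copy
    }
}
```

Every field added to the entity has to be added to the copy constructor too, which is where the brittleness comes from.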

So, although there is a solution for this issue, it is completely non-intuitive and can cause some applications to be designed in horrible ways just to handle this pitfall. Therefore, implicit updates gets a pitfall rating of 7 (out of 10).

Apr 02 2007

This will be my first episode in a series of posts regarding Hibernate. I’ve already written a few sporadic posts about Hibernate, but now I’m collecting everything into a few well-directed posts. These will all be using JPA rather than Hibernate APIs. I’ll finish off with a post about other JPA solutions and how they handle things.

In this episode we’ll explore the pitfall known as delayed SQL.

Hibernate maintains an in-memory cache of inserted entities, updated entities and removed entities. This cache is later used to generate SQL that is passed to the database. This means that Hibernate does not execute any SQL against the underlying database until it absolutely has to or you tell it to using the flush method. So, if you have something simple like this:

Okay here’s the code (I’ve cut out all the junk like getters and setters to make this simple):

And here is the code that illustrates this problem. I’m gonna use comments to show you where things happen, but remember that Hibernate gets to decide when it calls the database so it isn’t guaranteed to work this way. It is up to Hibernate.

As you can see, this code illustrates the fact that Hibernate is caching up inserts, updates and deletes until it decides they need to be flushed. It does not execute the SQL when persist is called. There are a few ways to ensure that it does call the database and execute the SQL:

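1. Call flush
2. Insert an object that has a @GeneratedValue annotation (with an identity-style generator, Hibernate has to run the INSERT to obtain the id)

Short of committing the transaction, these are the main ways of ensuring that Hibernate executes the statements.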
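The queue-then-flush idea can be sketched in a few lines of plain Java. This is a toy model and not Hibernate's implementation: `ToySession`, its `pending` and `executed` lists, and the SQL strings are all assumptions of mine, chosen only to show that `persist` records work while `flush` is what sends it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: statements are queued on persist/remove and sent on flush.
class DelayedSqlSketch {
    static class ToySession {
        private final List<String> pending = new ArrayList<>(); // queued, not yet sent
        final List<String> executed = new ArrayList<>();        // what the "database" has seen

        // Queue an INSERT; nothing reaches the database yet.
        void persist(String name) {
            pending.add("INSERT INTO test (name) VALUES ('" + name + "')");
        }

        // Queue a DELETE; nothing reaches the database yet.
        void remove(String name) {
            pending.add("DELETE FROM test WHERE name = '" + name + "'");
        }

        // Send everything queued so far, in order.
        void flush() {
            executed.addAll(pending);
            pending.clear();
        }

        // Committing has to flush whatever is still queued.
        void commit() {
            flush();
        }
    }

    public static void main(String[] args) {
        ToySession session = new ToySession();

        session.persist("Foo");
        session.remove("Foo");
        System.out.println(session.executed.size()); // statements sent before any flush

        session.flush();
        System.out.println(session.executed.size()); // statements sent after the flush
        System.out.println(session.executed);
    }
}
```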

So, why is this an issue? It is an issue when you want to know which of several operations failed. Unless you call flush after each operation, you have no way of telling. Here’s an example:
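A minimal sketch of the difference, as a plain-Java toy rather than real Hibernate code. Everything here is an assumption for illustration: `ToySession`, the unique-name "table" held in a set, and the `run` helper that reports at which call the failure surfaced.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: shows where a constraint failure surfaces with and without flushing.
class FlushPerOperationSketch {
    static class ToySession {
        private final Set<String> uniqueNames = new HashSet<>(); // "table" with a unique name column
        private final List<String> pending = new ArrayList<>();  // queued inserts

        // Queue an insert; no constraint is checked yet.
        void persist(String name) {
            pending.add(name);
        }

        // Execute queued inserts in order; a duplicate name fails the whole flush.
        void flush() {
            List<String> batch = new ArrayList<>(pending);
            pending.clear();
            for (String name : batch) {
                if (!uniqueNames.add(name)) {
                    throw new IllegalStateException("duplicate name: " + name);
                }
            }
        }
    }

    // Returns a label for the call at which the failure was observed.
    static String run(boolean flushEachTime) {
        ToySession session = new ToySession();
        String[] names = {"Foo", "Bar", "Foo"}; // the third insert violates uniqueness
        for (int i = 0; i < names.length; i++) {
            session.persist(names[i]);
            if (flushEachTime) {
                try {
                    session.flush();
                } catch (IllegalStateException e) {
                    return "persist #" + (i + 1);
                }
            }
        }
        try {
            session.flush(); // stands in for the flush done at commit
        } catch (IllegalStateException e) {
            return "commit";
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // failure only shows up at commit
        System.out.println(run(true));  // failure is pinned to the offending operation
    }
}
```

Without the per-operation flush, all the caller learns is that something in the batch failed at commit time.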

The main issue is that there is no global way of controlling the flush behavior. Hibernate is storing these things up so that it can batch them in an effort to increase performance. The issue with that approach is that Hibernate is no longer an ORM. It is now acting somewhat like an Object DB or Object cache instead. But unfortunately, Hibernate isn’t a DB because it doesn’t correctly handle all the glorious transactional and distributed computing issues that most enterprise RDBMS and Object caches (e.g. Tangosol Coherence) do.

So, you have a number of solutions to this issue, which makes the suckiness factor of delayed SQL a 3 (out of 10).