Dec 03, 2009

For some of the Inversoft products I’ve been adding new tables for new features to our standard schema. The tricky part with updating the database is giving customers all the update scripts they need to run, along with instructions on how to run them correctly.

I figured I’d see what happened if I added all the JPA Entity classes to the persistence.xml file and didn’t update the database. Since none of the existing JPA Entity classes was changed, the application appears to work fine without the tables in the database.
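The idea looks something like this in persistence.xml (the class names here are hypothetical, just to illustrate the shape of the change):

```xml
<persistence-unit name="inversoft" transaction-type="RESOURCE_LOCAL">
  <!-- Existing entity, backed by an existing table -->
  <class>com.inversoft.domain.User</class>
  <!-- New entity for a new feature; its table may not exist yet in a
       customer's database, which is fine as long as the entity is
       never touched -->
  <class>com.inversoft.domain.NewFeature</class>
</persistence-unit>
```

Note that this only holds if you aren’t running Hibernate’s schema validation at startup (hibernate.hbm2ddl.auto set to validate), which would fail on the missing tables.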

This is good news for me as it allows me to ship new versions of the application without any fear of breaking customers who don’t update their database. This is true only if that customer doesn’t use any of the new features that access the new tables.

Apr 06, 2007

In the two previous episodes we looked at implicit updates and delayed SQL and the consequences of these pitfalls.

In this episode we’ll explore the pitfall known as cache fetching.

This pitfall occurs when you have an EntityManager that you perform many operations with, including updating and selecting the same objects many times. We’ve seen that Hibernate keeps a cache of Objects that have been inserted, updated or deleted, as well as the Objects that have been queried. This pitfall is just an extension of that caching mechanism. Here’s an example:
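The original listing is gone, but the behavior can be sketched roughly like this, assuming a simple Test entity with id and name properties and an open EntityManager em (the names are my reconstruction):

```java
EntityTransaction tx = em.getTransaction();
tx.begin();

// Insert a new Test Object
Test test = new Test();
test.setId(1);
test.setName("Foo");
em.persist(test);

// Modify it after the persist call
test.setName("Bar");

// Now query for it -- the query hits the database, but Hibernate hands
// back the cached instance, modifications and all
Test fetched = (Test) em.createQuery("select t from Test t where t.id = 1")
                        .getSingleResult();
// fetched == test is true, and fetched.getName() returns "Bar"

tx.commit();
```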

As you can see here, although we inserted one Test Object and then queried for another, since our query matched the Test we had previously added, we simply got that same instance back. In addition, any modifications we made to our Test Object are visible when we execute the query.

So, what’s bad about this? First, there is no way to retrieve the original Object after you have modified it unless you create a new EntityManager and use that for the query. We can’t call the refresh method because that would clobber the call to setName("Bar"). This can become very expensive because each EntityManager uses a different JDBC connection. You could quickly run out of connections, depending on the application (I’ve seen this happen).

Furthermore, if the database has changed underneath you (possibly via straight JDBC or by another server), you will not see the change. This is by far the trickiest pitfall we have seen in this series and one of the more non-intuitive Hibernate behaviors.

Just so folks know, other ORMs divide on this issue. Some feel that going directly to the DB for queries is best; others feel that queries are heavy and hitting a cache is better. I personally feel that caches should be added when necessary, but not by default. The ORM should do what the name implies, Object-Relational Mapping, and not caching.

Here’s an example of that:
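Something along these lines (the JDBC connection details are placeholders):

```java
// Pull the Test Object into the EntityManager's cache with a query
Test test = (Test) em.createQuery("select t from Test t where t.id = 1")
                     .getSingleResult();

// Change the row behind Hibernate's back with plain old JDBC
Connection conn = DriverManager.getConnection(
    "jdbc:mysql://localhost/test", "user", "pass");
PreparedStatement ps = conn.prepareStatement(
    "update Test set name = ? where id = ?");
ps.setString(1, "Baz");
ps.setInt(2, 1);
ps.executeUpdate(); // the database row now says "Baz"
ps.close();
conn.close();

// Query again -- the SELECT goes to the database, but the result is
// resolved against the cache, so we still see the old name
Test again = (Test) em.createQuery("select t from Test t where t.id = 1")
                      .getSingleResult();
// again.getName() still returns the stale cached value, not "Baz"
```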

You can see that if we first pull our Test Object into the cache using a query and then modify it using plain old JDBC, the next time we execute the query we do not get the new values from the database, even though we explicitly asked for them. Furthermore, if we use MySQL and take a look at the database we get this:
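(The original console output is lost; it showed something like this, with the database containing the value written via JDBC while the JPA query still returned the old one — the values here are hypothetical:)

```
mysql> select * from Test;
+----+------+
| id | name |
+----+------+
|  1 | Baz  |
+----+------+
```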

This means that we must be omniscient and know that something in our database has changed so that we can call refresh to re-fetch from the database. This gets worse when you have, say, 1,500 machines or so. If your EntityManager is mostly read-only, you can call refresh after every query and sorta fix the problem, but this really isn’t usable at all. Because Hibernate doesn’t have any good work-arounds for this problem and there really isn’t any method of fixing it, the cache fetching pitfall gets a suckiness rating of 9.5 (out of 10).

Apr 03, 2007

This is the second installment of the Hibernate Pitfalls montage. Before we get to this one, I’d like to sum up some comments from the previous one. First of all, I did rename the series. Someone suggested that my title was a bit off and I agreed. Second, many folks wrote in to tell me I’m an idiot about delayed SQL, and someone went as far as to tell me that I have no idea about databases. Delaying the execution of SQL really doesn’t impact any portion of Hibernate at all except batching. If Hibernate went to the database after every statement, the only impact would possibly be performance, and that would be it. It would still be able to lazy fetch and do all the other nice things it does.

Remember that Hibernate is an ORM tool and I’m identifying points of pain that I and many others have felt for a long time with Hibernate. I’m not stating that Hibernate as a tool is completely worthless, I’m just pointing out places where I believe Hibernate is trying to be too smart, or is doing something I feel is cumbersome, annoying, obscure, etc. Now, on to today’s installment.

In this episode we will be delving into another facet of Hibernate that has also spilled over into JPA (it appears) and that can have some very difficult-to-handle consequences.

In this episode we’ll explore the pitfall known as implicit updates:

As we have already seen, Hibernate maintains a cache of Objects that have been inserted, updated or deleted. It also maintains a cache of Objects that have been queried from the database. These Objects are referred to as persistent Objects as long as the EntityManager that was used to fetch them is still active. What this means is that any changes to these Objects within the bounds of a transaction are automatically persisted when the transaction is committed. These updates are implicit within the boundary of the transaction and you don’t have to explicitly call any method to persist the values. Here’s an example to illustrate this. This uses the same table and entity as part 1:
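Roughly like this (again assuming a simple Test entity with id and name properties and an open EntityManager em; the original listing is reconstructed from memory):

```java
EntityTransaction tx = em.getTransaction();
tx.begin();

// Fetch a persistent Object -- note there is no update call anywhere below
Test test = (Test) em.createQuery("select t from Test t where t.id = 1")
                     .getSingleResult();

// Just a plain setter call on the managed instance
test.setName("Bar");

tx.commit();
// Hibernate issues: update Test set name = 'Bar' where id = 1
// even though we never asked it to
```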

As you can see, Hibernate is storing a reference to the Test Object we fetched from the database using a JPA query. If we modify that Object using any of its properties within a transaction, all of the modifications will be implicitly persisted when the transaction is committed.

Okay, so why is this an issue? The main downside is that we don’t really know whether modifications made to an Object will later be persisted. Some code might modify the Object and not even realize that it is in a transaction. In fact, we might call a toolkit or external library that modifies the Object, and we might not even know that the Object was modified. There is no way around this unless you forcibly refresh the Object instance via the EntityManager.

Another downside is that we must manage all of our Objects by hand. Instead of telling the EntityManager to update an Object (which is far more intuitive), we must tell the EntityManager which Objects NOT to update. We do this by calling refresh, which essentially rolls back a single entity, just prior to calling commit on the transaction or whenever we realize the Object shouldn’t be updated. There is a downside to this, however, that is difficult to remedy: if we want to keep the changes we’ve made to the Object thus far but not persist them, we have very few options, and sometimes none. We might use a copy constructor to make a copy of all our work and then refresh the persistent Object, but this is error prone, brittle and a maintenance nightmare.
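To make that last point concrete, the copy-constructor dance looks something like this (the Test(Test) constructor is something you would have to write and keep in sync by hand, which is exactly why it’s brittle):

```java
Test managed = em.find(Test.class, 1);
managed.setName("work in progress");

// Snapshot the state we want to keep...
Test snapshot = new Test(managed); // hand-written copy constructor

// ...then roll the persistent Object back so commit won't persist it
em.refresh(managed);

// snapshot holds our changes; managed will not be implicitly updated
```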

So, although there is a solution for this issue, it is completely non-intuitive and can cause some applications to be designed in horrible ways just to handle this pitfall. Therefore, implicit updates gets a pitfall rating of 7 (out of 10).

Apr 02, 2007

This will be my first episode in a series of posts regarding Hibernate. I’ve already written a few sporadic posts about Hibernate, but now I’m collecting everything into a few well-directed posts. These will all use JPA rather than the Hibernate APIs. I’ll finish off with a post about other JPA solutions and how they handle things.

In this episode we’ll explore the pitfall known as delayed SQL.

Hibernate maintains an in-memory cache of inserted, updated and removed entities. This cache is later used to generate the SQL that is passed to the database. This means that Hibernate does not execute any SQL against the underlying database until it absolutely has to, or until you tell it to using the flush method. So, if you have something simple like this:
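(assuming a Test entity and an open EntityManager em — variable names are mine):

```java
Test test = new Test();
test.setId(1);
test.setName("Foo");
em.persist(test); // no INSERT is executed here -- it's only queued up
```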

Okay here’s the code (I’ve cut out all the junk like getters and setters to make this simple):
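The original listing is gone, so here’s a reconstruction of what the entity roughly looked like (note the hand-assigned id; a @GeneratedValue id would force Hibernate to hit the database immediately, which is exactly point 2 below):

```java
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Test {
    @Id
    private int id; // assigned by hand, not generated

    private String name;

    public int getId() { return id; }
    public void setId(int id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}
```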

and the code that illustrates this problem. I’m gonna use comments to show you where things happen, but remember that Hibernate gets to decide when it calls the database so it isn’t guaranteed to work this way. It is up to Hibernate.
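A reconstruction of the demo; the flush point shown in the final comment is where Hibernate went to the database in my runs, but as noted, that’s up to Hibernate:

```java
EntityTransaction tx = em.getTransaction();
tx.begin();

Test test = new Test();
test.setId(1);
test.setName("Foo");
em.persist(test);
// <-- no SQL has been executed yet; the INSERT lives in Hibernate's cache

test.setName("Bar");
// <-- still nothing

tx.commit();
// <-- only here does Hibernate flush and execute the INSERT
//     (with name already set to "Bar")
```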

As you can see, this code illustrates the fact that Hibernate is caching up inserts, updates and deletes until it decides they need to be flushed. It does not execute the SQL when persist is called. There are a few ways to ensure that it does call the database and execute the SQL:

1. Call flush
2. Insert an Object whose id has a @GeneratedValue annotation (Hibernate must execute the INSERT immediately to obtain the generated key)

These are the only ways, short of committing the transaction (which also flushes), of ensuring that Hibernate executes the statements.

So, why is this an issue? It is an issue when you want to know which of several operations failed. Unless you call flush after each operation, you have no way of telling. Here’s an example:
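A sketch of the problem, assuming the Test table has a unique constraint on name:

```java
tx.begin();

Test first = new Test();
first.setId(1);
first.setName("duplicate");
em.persist(first); // fine on its own

Test second = new Test();
second.setId(2);
second.setName("duplicate"); // violates the unique constraint
em.persist(second); // no exception here -- nothing has hit the database yet

try {
    tx.commit(); // both INSERTs execute now and one of them blows up
} catch (RollbackException e) {
    // Which persist failed? The exception arrives long after both calls
    // returned successfully, so there's no way to tell without calling
    // em.flush() after each operation.
}
```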

The main issue is that there is no global way of controlling the flush behavior. Hibernate is storing these operations up so that it can batch them in an effort to increase performance. The problem with that approach is that Hibernate is no longer acting as an ORM; it is acting somewhat like an Object DB or Object cache instead. But unfortunately, Hibernate isn’t a DB, because it doesn’t correctly handle all the glorious transactional and distributed-computing issues that most enterprise RDBMSes and Object caches (e.g. Tangosol Coherence) do.

So, you have a number of solutions to this issue, which makes the suckiness factor of delayed SQL a 3 (out of 10).

Sep 26, 2006

The Grails command-line utility is based on Ant. This seems really cumbersome to me, since Ant is the complete opposite of a CLI framework. Some of the more annoying things are the lack of CLI switches and the inability to pass parameters. Instead it reads from stdin when doing pretty much anything. Like this:
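From memory, it goes something like this — rather than passing the name as an argument, you get prompted on stdin:

```
$ grails create-controller
Enter controller name:
Book
```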

Pretty annoying.

Second of all, the code generation uses tabs. This is just a preference thing, but really they should stick to the Java coding conventions to get the best coverage.

Okay, been trying to get a simple app up and running and thus far I’ve had a considerable amount of headache with the generator system. If you add new fields to a model, you need to delete and re-generate the entire controller/view layer in order for them to show up. This is really cumbersome. They really should scaffold out the view just like Rails does, so that adding new fields is straightforward for a scaffolded app. I think this is just because of the generation. I’m going to try scaffolding next… Here goes.

Okay, so scaffolding like this works:
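For the record, the scaffolded controller is just this (syntax from memory for the Grails version I’m using; Book is whatever domain class you’re scaffolding):

```groovy
class BookController {
    def scaffold = Book
}
```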

I wonder if they let you selectively override certain methods like Rails does… Let’s find out. Yep, selective overrides seem to work fine.

Okay, next annoyance… In Rails you can add new controllers and domains as you go; in Grails you can’t add new controllers while the app is running. I’m not positive on the domain front, but it seems that way: even if you could add a domain without a restart, you probably couldn’t do anything with it without a controller. This really slows development down compared to Rails, because it means a server restart each time you add something new.

Next, I never harp on naming, except right now. This is something they have to change for everyone who uses Linux, a shell and the tab key (or XP and Cygwin). There are two directories in the root of the project that have the same prefix:
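From the project layout Grails generated for me (your version may differ):

```
grails-app/
grails-tests/
```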

This must change; there is absolutely no reason these should not be named app and test. The fact that I can’t type in one or two characters and then hit tab is REALLY annoying and very slow. Another naming annoyance is in the grails-app directory:
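The grails-app directory looks roughly like this (again from the layout generated for me), with the clash on the first three characters:

```
grails-app/conf/
grails-app/controllers/
grails-app/domain/
grails-app/i18n/
grails-app/views/
```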

Typing in con is not enough to resolve the directory name, when I should only have to type 1-2 characters. Anyways, just a simple annoyance. They could move conf or call it something different and save a lot of folks some serious headache.

Oh, the evil copy issue! It bites everyone at some point, and it seems that Grails decided to use this as their preferred mechanism. They apparently make a copy of the web app to a dir called tmp and run it from there. I guess they must run some type of checker that looks for file changes and then copies files over or something, because Windows doesn’t have symlinks (junctions aside) and there aren’t any symlinks in there. Anyways, if you create a domain and then later delete it, you must delete your tmp directory! Even if you stop and restart the container, this tmp directory never gets cleared. This is a bummer. They really should find a more elegant way to handle things. At least you can count on Rails never using a copy of any Ruby class, or even the rhtml files that are interpreted by ERB (if I recall correctly).

To sum up, the reason I did this little exercise is that I wanted to test whether or not Hibernate would work as the ORM layer. I’ve had a lot of issues with Hibernate, including collections, sessions/transactions and of course the nasty session corruption issue that I haven’t written about yet, but that a lot of folks have issues with, where something like a unique key violation will completely kill the session, making the use of the open-session-in-view filter very difficult. So, I wanted to test mostly the use of collections and the session/transaction issues. Here is everything I’ve found:

  • Grails scaffolding blows up when there are any database errors (unique key, whatever). This causes a stack trace and the error page
  • Grails, like every other Hibernate application, still has problems with sessions and immediate updates. I set up a simple test that would update a row with a unique key violation. According to GORM, the update was successful (i.e. the call returned true). This is because Hibernate said it was successful. Hibernate is really dangerous in this respect because it doesn’t do the update immediately and instead caches it for later. This is large and nasty and I’ll write about it later.
  • Grails still suffers from the open-session-in-view syndrome. They might not actually be opening the session in the view to support lazy loading (I couldn’t determine this without looking at the code), but they have the same issue. If I update the entry, cause a unique key violation and do a redirect, everything is okay, because it is a new session. However, if I use a forward, then it stack traces and I get the error page. This is because the session is totally jacked the second a database exception is hit. Again, large and nasty and I’ll write about this later.

So, my suggestion to the Grails folks is: bail on Hibernate and either use something better like iBatis, which can handle unique violations and doesn’t cache updates and all that crap that Hibernate does, or write your own. You’ll piss off so many Rails people looking to come back to Java that you’ll lose them forever. Not to mention you’ll be battling with Hibernate forever, since Gavin seems blind to the fact that the session was a really bad idea.

More later as I keep working with Grails.