Hibernate Pitfalls part 3

In the two previous episodes we looked at implicit updates and delayed SQL and the consequences of these pitfalls.

In this episode we’ll explore the pitfall known as cache fetching.

This pitfall occurs when you have an EntityManager that you perform many operations with, including updating and selecting the same objects multiple times. We’ve seen that Hibernate keeps a cache of Objects that have been inserted, updated, or deleted, as well as the Objects that have been queried. This pitfall is just an extension of that caching mechanism. Here’s an example:

EntityManager em = emf.createEntityManager();
EntityTransaction et = em.getTransaction();

et.begin();
Test t = new Test();
t.setName("Foo");
t.setId(1);
em.persist(t);

t.setName("Bar");

Test t2 = (Test) em.createQuery("select test from Test test where test.id=1").
    getSingleResult();
Assert.assertSame(t, t2);
Assert.assertEquals("Bar", t2.getName());
et.commit();
em.close();

As you can see here, although we inserted one Test Object and then queried for what looks like another, the query resolved to the Test we had previously persisted, so we simply got that instance back. In addition, any modifications we made to our Test Object are visible in the query result.
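Conceptually, the persistence context behaves like an identity map keyed by primary key. Here is a toy sketch of that idea (my own illustration, not Hibernate’s actual code) showing why a query that resolves to an already-managed id hands back the same, possibly modified, instance:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of a persistence context as an identity map. Illustration
// only -- Hibernate's real implementation is far richer than this.
class IdentityMapDemo {
    static class Entity {
        int id;
        String name;
        Entity(int id, String name) { this.id = id; this.name = name; }
    }

    private final Map<Integer, Entity> context = new HashMap<>();

    // persist() registers the instance under its id, like em.persist()
    Entity persist(Entity e) {
        context.put(e.id, e);
        return e;
    }

    // A lookup by id resolves to the cached instance; only on a miss
    // would we "go to the database" (simulated here by a fresh object).
    Entity queryById(int id) {
        return context.computeIfAbsent(id, k -> new Entity(k, "from-db"));
    }

    public static void main(String[] args) {
        IdentityMapDemo session = new IdentityMapDemo();
        Entity t = session.persist(new Entity(1, "Foo"));
        t.name = "Bar";                   // in-memory modification
        Entity t2 = session.queryById(1); // the "query" resolves in the cache
        System.out.println(t == t2);      // true: same instance
        System.out.println(t2.name);      // Bar: the edit is visible
    }
}
```

The same mechanism explains both assertions in the example above: assertSame passes because the map returns the one managed instance, and the name check passes because that instance carries the in-memory edit.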

So, what’s bad about this? First, there is no way to retrieve the original Object after you have modified it unless you create a new EntityManager and use that for the query. We can’t call the refresh method because that would clobber the call to setName(“Bar”). This can become very expensive because each EntityManager uses a different JDBC connection; depending on the application, you could quickly run out of connections (I’ve seen this happen).
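That workaround might look like this — a sketch only, assuming the emf and Test entity from the example above and that the row has already been committed:

```java
// A second EntityManager has its own persistence context, so it re-reads
// the row instead of resolving to the modified, cached instance.
// Note that this checks out a second JDBC connection.
EntityManager em2 = emf.createEntityManager();
Test original = em2.find(Test.class, 1); // fresh copy straight from the database
// ... compare original against the modified instance as needed ...
em2.close(); // releases the extra JDBC connection
```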

Furthermore, if the database has changed underneath you (possibly via straight JDBC or by another server), you will not see the change. This is by far the trickiest pitfall we have seen in this series and one of the more non-intuitive Hibernate behaviors.

Just so folks know, other ORMs divide on this issue. Some feel that going direct to the DB for queries is best, others feel that queries are heavy and hitting a cache is better. I personally feel that caches should be added when necessary, but not default. The ORM should do what the name implies, the Object-Relational Mapping and not caching.

Here’s an example of that:

EntityManager em = emf.createEntityManager();
EntityTransaction et = em.getTransaction();

et.begin();
Test t = new Test();
t.setName("Foo");
t.setId(1);
em.persist(t);
et.commit();

// Put the Object in the cache
Test t2 = (Test) em.createQuery("select test from Test test where test.id=1").
    getSingleResult();

// This method uses a separate JDBC connection to update the row
executeViaJDBC("update Test set name = 'Baz' where id = 1");

Test t3 = (Test) em.createQuery("select test from Test test where test.id=1").
    getSingleResult();
Assert.assertEquals("Foo", t3.getName()); // Still "Foo" -- the cached instance was returned

em.refresh(t3);
Assert.assertEquals("Baz", t3.getName()); // Now it is Baz since we reloaded it

em.close();

You can see that if we first put our Test Object into the cache using a query and then modify the row using plain old JDBC, the next time we execute the query we do not get the new values from the database, even though we explicitly asked for them. Furthermore, if we use MySQL and take a look at the database, we see this:

mysql> select * from Test;
+----+------+
| id | name |
+----+------+
|  1 | Baz  |
+----+------+

This means that we must be omniscient and know that something in our database has changed so that we can call refresh to re-fetch from the database. The problem gets worse when you have, say, 1500 machines or so all writing to the same database. If your EntityManager is mostly read-only, you can call refresh after every query and sorta fix the problem, but this really isn’t usable at all. Because Hibernate doesn’t have any good work-arounds for this problem and there really isn’t any method of fixing it, the cache fetching pitfall gets a suckiness rating of 9.5 (out of 10).

30 thoughts on “Hibernate Pitfalls part 3”

  1. Brian,

    Every technology has weaknesses. If it is not the right tool for the job then don’t use it. Many people opt for Ibatis for the very reasons you describe. There are a lot of pitfalls and situations where hibernate (and jpa) make things far harder than they need to be. However, Hibernate is an incredibly powerful tool. Generally, I’ve tended to favor the open session in view pattern with hibernate. This tends to leverage the cache and lazy loading features to the fullest. Furthermore, this principle also can be extended to a service layer where the view is an xml response to the client rather than html to a browser.

    Are you trying to dismiss Hibernate or just point out potential problems? From reading other people’s comments, it sounds like others are confused. Would you ever use hibernate?


  2. Yeah, I totally agree. Hell most of my projects have pretty solid weaknesses. 🙂 I think my goal is to open things up a bit. I’ve always been very – okay extremely – cynical. I don’t ever sit back and say that some framework solves all my problems, because they never do. I tear into things and figure out how they work, look at the code and figure out why I just spent 7 hours tracking down some bug. I’m also not one to say, ditch Hibernate and use something else. I let folks make up their own mind on that front. You decide what works best.

    As for the open session in view pattern, I use this all the time. It seems to work the best for everything you’ve mentioned. However, unless you ignore Hibernate’s rule about closing EntityManager when exceptions occur, this pattern is extremely painful with data models where unique key constraints are common and extremely useful.

    In terms of selecting Hibernate, I use Hibernate a lot. In two projects recently I’ve used Hibernate, well really JPA with Hibernate, but Hibernate has worked well in most cases.


  3. One last thing, I highly recommend ignoring the Hibernate close the entity manager rule. I post a topic on the forums and no one responded, but from everything I can tell, Hibernate’s EntityManager implementation works just fine after exceptions. There is absolutely no reason to chuck them, just rollback the current transaction and everything works fine.

    Here’s the forum link:

    http://forum.hibernate.org/viewtopic.php?t=972713&start=0&postdays=0&postorder=asc&highlight=


  4. Ok Brian now you really lost it 😉

    Have you ever heard about FlushMode.NONE ?

    Transactional integrity ?

    Shadow-instances ?

    Does StatelessSession ring a bell ?

    etc.

    And finally just a tiny correction – it is actually JPA you are “pitfalling” about more than Hibernate.


  5. Max,

There isn’t a FlushMode.NONE. There is a MANUAL, which means I have to flush everything by hand, if that is what you are talking about. NEVER is pretty much deprecated at this point. Still, this post doesn’t really have anything to do with flushing because I’m just doing queries.

When you say transactional integrity, what are you referring to? I’m not sure what that has to do with cache fetching. The transactions are managed by the database and never by Hibernate, especially across multiple machines. In any case, this pitfall has nothing to do with transactions.

You lost me with shadow-instances. What are those?

    As for StatelessSession, I’ve already covered this in other comments, but that class is not really useful for most cases especially with lazy-loading needs.

    Remember, this is specifically addressing the fact that queries don’t hit the database. I’m not aware of methods of changing that behavior unless I use StatelessSession. This doesn’t have anything to do with lazy fetching, inserts, updates, etc.


Sorry – I mistyped. I meant FlushMode.NEVER, which btw. is the exact same thing as FlushMode.MANUAL, but the latter is the better and non-deprecated name.

    So what is the thing I call shadow-instances ?

    Well imagine this code:

    c1 = session.get(Customer.class, 42);
    c2 = session.get(Customer.class, 42);
    (note: the get could be replaced with any other kind of query, hql, criteria, *lazy navigation* etc.)

    Now if I apply your wanted logic:

assertSame(c1, c2); should fail but
assertEquals(c1, c2); would actually pass (if you have a proper equals/hashCode implementation)

    Now what harm would it do if we had this kind of logic ?

    Changing c1 attributes would not affect c2.

    If Hibernate (or any other JPA implementation) would not do this you would have these “shadow objects” all over the place.

    You would have to manually keep track and ensure that you only use *one* of these objects.

From reading your other “pitfall” stories you probably don’t think that is bad and believe that you should have the responsibility to manage this… and as I’ve said many times before: use StatelessSession.

    Then you start claiming that StatelessSession does not do enough; e.g. does not do lazy loading of objects etc…but try and rethink this again when you also ask for Hibernate to not use its cache to resolve already loaded instances ?

    What would happen in the case of you lazy loading something like this:

    // we get a customer
    c1 = session.get(Customer.class, 42);

    // we get that customers orders
    List c1orders = c1.getOrders();

    // we get one of those orders customer
    c2 = c1orders.get(0).getCustomer();

    Now, what would you expect now ?

They should definitely be equals, otherwise your domain logic using Sets will be broken:
    assertEquals(c1,c2);

    Should they be the same ?
    assertSame(c1,c2);

In your “world” they would not be, because you don’t want queried objects to be resolved in a cache.

    And imagine how it would go for Collections ?

    c1.getOrders().get(0).getOrders().get(…etc.

    You would be loading more and more of the *same* data into memory and your object graph would be inconsistent.

So there you have “shadow-instances” explained, and this also relates to what I mentioned as transactional integrity. If you allow “shadow-instances” you also let go of transactional integrity, because the object models you are working on will not be consistent!

And to your last point, “the fact that queries don’t hit the database” is actually factually wrong! Queries (as in HQL, Criteria, native SQL) *always* hit the database, BUT we always resolve entities and collections in the session cache to avoid all the things above.

    So yes it has *everything* to do with lazy fetching!

    If you really wanted to get a “snapshot” of what data is in the database (making your data operations “non-transactional”) then you got several options:

    0) Use StatelessSession

1) Open a new session/entitymanager from the session/entitymanagerfactory (which you claim is too heavy for you)

2) Open a new session on the *existing* connection (sf.openSession(s1.connection()), which removes the “heaviness” you are referring to but is actually an antipattern, since you can end up mixing up connections managed by the container and by you)

    3) Use native queries/hql/criteria with ResultTransformer to get value objects instead of entities for you to use.
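Options 0 and 3 might look roughly like this — a sketch assuming a Hibernate SessionFactory and an open Session are available, plus a CustomerDTO value bean that I’m inventing for illustration (StatelessSession and Transformers.aliasToBean are Hibernate-specific APIs):

```java
// Option 0: StatelessSession has no persistence context, so every fetch
// goes to the database and returns a distinct, uncached instance.
StatelessSession ss = sessionFactory.openStatelessSession();
Customer fresh = (Customer) ss.get(Customer.class, 42);
ss.close();

// Option 3: a ResultTransformer produces detached value objects instead of
// managed entities; CustomerDTO is an assumed bean with id/name properties.
List dtos = session.createQuery("select c.id as id, c.name as name from Customer c")
    .setResultTransformer(Transformers.aliasToBean(CustomerDTO.class))
    .list();
```

The value objects from option 3 are never registered in the session cache, so re-running the query always reflects the current database rows.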

    This turned out to be a very long and hopefully my last comment on this – I just couldn’t let your “one-sided” arguments against these stand alone.


Max, thanks for the excellent response. This is a great discussion and much more productive than the “you’re an idiot” comments I’ve gotten. Let’s discuss some of this stuff:

Sorry – I mistyped. I meant FlushMode.NEVER, which btw. is the exact same thing as FlushMode.MANUAL, but the latter is the better and non-deprecated name.

FlushMode.MANUAL just means that I gotta call flush by hand. So, I’m still missing what you mean by this one, I guess. I think the way EntityTransactions work is good, and I don’t really need Hibernate to stop flushing for me. I just want it to flush after each command unless I explicitly tell it to batch. The default should be least surprise.

    So what is the thing I call shadow-instances ?

    Well imagine this code:

    c1 = session.get(Customer.class, 42);
    c2 = session.get(Customer.class, 42);
    (note: the get could be replaced with any other kind of query, hql, criteria, *lazy navigation* etc.)

    Now if I apply your wanted logic:

assertSame(c1, c2); should fail but
assertEquals(c1, c2); would actually pass (if you have a proper equals/hashCode implementation)

    Now what harm would it do if we had this kind of logic ?

    Changing c1 attributes would not affect c2.

I’m gonna take this stuff in chunks because it is so long. This is correct. If I’ve queried some object from the database deep in some class and change it, tweak it, and jack with it just to experiment to see if some logic is gonna work, I’ve actually completely jacked with every other place in the entire stack (tiers, whatever) that has ever gotten the same instance from Hibernate. This scares me a lot because it is not modular. I’ve introduced coupling because the instances are the same. If I wanted that, I would just create a ThreadLocal Map and store everything in there. I’m asking Hibernate to fetch me an instance, not to make a ThreadLocal cache of it.

    If Hibernate (or any other JPA implementation) would not do this you would have these “shadow objects” all over the place.

    You would have to manually keep track and ensure that you only use *one* of these objects.

    From reading your other “pitfall” stories you probably don’t think that is bad and think that you should have the responsibility to manage this…..and as I’ve said many times before: Use StatelessSession.

    (I’ll get to StatelessSession in a bit) Yep. I’d much prefer this because then my libraries/components/modules are completely de-coupled and I can now experiment with objects easily without having to clone or copy construct them. Plus, I don’t have to teach all 300 developers at my company that they need to clone/copy construct an object if they want to change it. If I want to persist changes I can fetch from the database, modify that instance and save it. Or pass that instance around and then save it. Either way works for me.

    Then you start claiming that StatelessSession does not do enough; e.g. does not do lazy loading of objects etc…but try and rethink this again when you also ask for Hibernate to not use its cache to resolve already loaded instances?

    What would happen in the case of you lazy loading something like this:

    // we get a customer
    c1 = session.get(Customer.class, 42);

    // we get that customers orders
    List c1orders = c1.getOrders();

    // we get one of those orders customer
    c2 = c1orders.get(0).getCustomer();

    Now, what would you expect now ?

    They should definitly be equals, otherwise you domainlogic using set’s will be broken:
    assertEquals(c1,c2);

    Should they be the same ?
    assertSame(c1,c2);

    In your “world” they would not be because you don’t want queried object to be resolved in a cache.

    I think you went one step too far. There is a difference between lazy loading and pointers. In my mind what you are describing here is pointers and not lazy loading. Of course the Order is going to have a pointer to the Customer and vice versa. You are creating a closed object-graph that uses pointer identity. That’s fine because if you open the classes you would expect it.

public class Customer {
    List<Order> orders;
}

public class Order {
    Customer customer;
}

    This is not surprising and Hibernate does this fine. If I were going to build this by hand I would ensure that they are the same customer.

    And imagine how it would go for Collections ?

    c1.getOrders().get(0).getOrders().get(…etc.

    You would be loading more and more of the *same* data into memory and your object graph would be inconsistent.

    Just to re-iterate, if I were writing this by hand I wouldn’t do that. I would ensure they were the same pointer. I think a good example is:

    c1 = session.get(Customer.class, 42);
    c2 = session.get(Customer.class, 42);

    assertSame(c1, c1.getOrders().get(0).getCustomer());
    assertSame(c2, c2.getOrders().get(0).getCustomer());
    assertNotSame(c1, c2);
    assertNotSame(c1.getOrders().get(0).getCustomer(), c2.getOrders().get(0).getCustomer());

    In my case this should all pass.

    So there you have “shadow-instances” explained and this actually also relates to what I mentioned as transactional integrity. If you allow “shadow-instances” you also let go of transactional integrity because the object models your are working on will not be consistent!

    I disagree on transactional integrity. The object model in my case is just fine. The difference here is that I want a query to create a new instance AND re-fetch from the database.

And to your last point, “the fact that queries don’t hit the database” is actually factually wrong! Queries (as in HQL, Criteria, native SQL) *always* hit the database, BUT we always resolve entities and collections in the session cache to avoid all the things above.

    Hmmmm… I didn’t get far enough in my debugger session to determine exactly if this is accurate, but if this is true and considering the test case I have it would appear that Hibernate is in fact going to the database and then not using the data returned. Is that accurate? Is this actually a wasted round trip to the database or am I missing something?

    So yes it has *everything* to do with lazy fetching!

    You’ll still have to convince me of that. Lazy fetching in my case, where each query creates a new instance, still works fine and still constructs the correct object model and correctly keeps pointer integrity. So, given my points above, what does cache querying have to do with lazy-fetching? I still want getOrders() to hit the database and populate the List. The only thing I could see is that if I call getOrders a second time it doesn’t hit the database but just uses the List it already fetched. Which is fine by me. If I need to update the list I’ll refetch or refresh in the exact same manner I would if I were building this by hand.

    If you really wanted to get a “snapshot” of what data is in the database (making your data operations “non-transactional”) then you got several options:

I’m not sure how getting a snapshot makes things non-transactional. Again, you’ll have to convince me of that given my points above. How does having two Objects (remember, these are Java objects, not cursors) break transactions? I can still update an Object and persist it, then check the other Object I fetched before and see that it is different and requires a refresh or a re-query. This makes sense to me. I’d much rather have an Object that is local to a method and NEVER passed out of that method stay unchanged. Hibernate violates that contract in a non-obvious and non-intuitive way.

    0) Use StatelessSession

    Yeah, probably never use that thing, but it does make sense for some cases, just not 95% of the cases out there.

1) Open a new session/entitymanager from the session/entitymanagerfactory (which you claim is too heavy for you)

It is heavy to check JDBC connections in and out constantly, and it is a documented anti-pattern. Plus it is nasty to deal with Objects when you have many sessions working on the same instances. I much prefer the open session in view model, where I can lazy load more easily.

2) Open a new session on the *existing* connection (sf.openSession(s1.connection()), which removes the “heaviness” you are referring to but is actually an antipattern, since you can end up mixing up connections managed by the container and by you)

This would be the closest solution, since it would remove my “ThreadLocal cache” concern. It does seem like a royal pain, but I would have to try it and see. Not sure this is possible via JPA. I’m pretty sure it isn’t.

    3) Use native queries/hql/criteria with ResultTransformer to get value objects instead of entities for you to use.

    Never thought about that. I’ll have to look into it and see. This would obviously be a Hibernate specific solution.

    This turned out to be a very long and hopefully my last comment on this – I just couldn’t let your “one-sided” arguments against these stand alone.

I really appreciate the responses that you have given. You’re definitely the most knowledgeable person responding and definitely have great information to share. One suggestion I would have for you in the future would be to be pragmatic. You work for JBoss it seems, so you are obviously slanted somewhat, but even I look at my own work and tear into it constantly. I find flaws and re-write or completely ditch things because they didn’t work, or didn’t address concerns, or were just a pain to maintain, train, debug, whatever. Like Mike said earlier, everything has flaws and doesn’t solve everyone’s problems, and I find that locating flaws and opening them up makes engineers better at their jobs. When someone says, “I love Savant and it just works for everything we need,” I know they are either pretty green or just haven’t gotten to that point where it doesn’t work for them, because at some point it won’t. On the flip side, when someone says, “Savant really lacks support for X,” or “Savant makes Y really painful and it isn’t obvious how it works at all,” I know that these folks are pretty solid engineers and are working to make things better.


(not sure if the formatting will go well and there is no preview feature, so bear with me 😉)

    * FlushMode.NEVER

    I talked about this one because you said you did not want changes applied to the database automatically. You wanted to be explicit about it.

You want em.persist(o) to execute SQL immediately; and I say use em.persist(o); em.flush(); if you want that. One could consider having a FlushMode.IMMEDIATELY or something, but I have a feeling that would not work well in all cases once you start operating with object graphs and cascades. It would most likely result in more updates than necessary, which is against the goal of not hitting the database more than necessary.
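In code, Max’s suggestion amounts to flushing by hand right after the operation — a minimal sketch, assuming a managed EntityManager em inside an active transaction:

```java
em.persist(t);  // queued in the persistence context, no SQL issued yet
em.flush();     // forces the INSERT to be executed now, before commit
```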

    So what is the thing I call shadow-instances ?

    Well imagine this code:

    c1 = session.get(Customer.class, 42);
    c2 = session.get(Customer.class, 42);
    (note: the get could be replaced with any other kind of query, hql, criteria, *lazy navigation* etc.)

    Now if I apply your wanted logic:

assertSame(c1, c2); should fail but
assertEquals(c1, c2); would actually pass (if you have a proper equals/hashCode implementation)

    Now what harm would it do if we had this kind of logic ?

    Changing c1 attributes would not affect c2.

I’m gonna take this stuff in chunks because it is so long. This is correct. If I’ve queried some object from the database deep in some class and change it, tweak it, and jack with it just to experiment to see if some logic is gonna work, I’ve actually completely jacked with every other place in the entire stack (tiers, whatever) that has ever gotten the same instance from Hibernate. This scares me a lot because it is not modular. I’ve introduced coupling because the instances are the same.

    This I simply cannot follow 😉

Hibernate’s way of working gives you loose coupling, since you do *not* have to manually keep track of changes done in your subsystems.

If you don’t trust your subsystems to access/mutate data correctly, why are you handing them your precious objects?

Yep. I’d much prefer this because then my libraries/components/modules are completely de-coupled and I can now experiment with objects easily without having to clone or copy construct them. Plus, I don’t have to teach all 300 developers at my company that they need to clone/copy construct an object if they want to change it. If I want to persist changes I can fetch from the database, modify that instance and save it. Or pass that instance around and then save it. Either way works for me.

I fail to see how “experimenting with objects” should ever be done on business objects being passed around.

I really sense that you have a very JDBC/statement approach to your application design, and that is probably what is causing you to see the behavior in Hibernate/JPA as pitfalls. Imagine if you had never seen JDBC or a similar way of doing things but had just worked with objects; why would you ever expect to need to keep track of changes manually?

    Then you start claiming that StatelessSession does not do enough; e.g. does not do lazy loading of objects etc…but try and rethink this again when you also ask for Hibernate to not use its cache to resolve already loaded instances?

    What would happen in the case of you lazy loading something like this:

    // we get a customer
    c1 = session.get(Customer.class, 42);

    // we get that customers orders
    List c1orders = c1.getOrders();

    // we get one of those orders customer
    c2 = c1orders.get(0).getCustomer();

    Now, what would you expect now ?

They should definitely be equals, otherwise your domain logic using Sets will be broken:
    assertEquals(c1,c2);

    Should they be the same ?
    assertSame(c1,c2);

In your “world” they would not be, because you don’t want queried objects to be resolved in a cache.

    I think you went one step too far. There is a difference between lazy loading and pointers. In my mind what you are describing here is pointers and not lazy loading. Of course the Order is going to have a pointer to the Customer and vice versa. You are creating a closed object-graph that uses pointer identity. That’s fine because if you open the classes you would expect it.

public class Customer {
    List<Order> orders;
}

public class Order {
    Customer customer;
}

    This is not surprising and Hibernate does this fine. If I were going to build this by hand I would ensure that they are the same customer.

How would you do that without having a “threadlocal cache”, as you call it? It is required (or you would have to do some pretty heavy model-dependent magic to wire these together).

    And imagine how it would go for Collections ?

    c1.getOrders().get(0).getOrders().get(…etc.

    You would be loading more and more of the *same* data into memory and your object graph would be inconsistent.

    Just to re-iterate, if I were writing this by hand I wouldn’t do that. I would ensure they were the same pointer. I think a good example is:

    c1 = session.get(Customer.class, 42);
    c2 = session.get(Customer.class, 42);

    assertSame(c1, c1.getOrders().get(0).getCustomer());
    assertSame(c2, c2.getOrders().get(0).getCustomer());
    assertNotSame(c1, c2);
    assertNotSame(c1.getOrders().get(0).getCustomer(), c2.getOrders().get(0).getCustomer());

    In my case this should all pass.

How would you actually implement this? You would need something similar to a session/entitymanager-level cache to figure out that the order’s CUSTOMER_ID column with value 42 points back to the exact same reference that you loaded N steps before.

    I disagree on transactional integrity. The object model in my case is just fine. The difference here is that I want a query to create a new instance AND re-fetch from the database.

Why would you ever want to have the same entity with different data twice in one unit of work? (Note, I’m writing *entity* here, not *value* – different things.)

And to your last point, “the fact that queries don’t hit the database” is actually factually wrong! Queries (as in HQL, Criteria, native SQL) *always* hit the database, BUT we always resolve entities and collections in the session cache to avoid all the things above.

    Hmmmm… I didn’t get far enough in my debugger session to determine exactly if this is accurate, but if this is true and considering the test case I have it would appear that Hibernate is in fact going to the database and then not using the data returned. Is that accurate? Is this actually a wasted round trip to the database or am I missing something?

Yes, Hibernate will return the existing object if it is already represented in the session (otherwise your lazy loading would not work!).
And no, it is not a wasted round trip, because you are doing a *query*, and Hibernate is not a database, it’s an ORM – and an ORM is *not* just about mapping relational tuples to objects; it also involves maintaining object graphs and keeping track of state, as is natural in the “Object world”.

    So yes it has *everything* to do with lazy fetching!

    You’ll still have to convince me of that. Lazy fetching in my case, where each query creates a new instance, still works fine and still constructs the correct object model and correctly keeps pointer integrity. So, given my points above, what does cache querying have to do with lazy-fetching? I still want getOrders() to hit the database and populate the List. The only thing I could see is that if I call getOrders a second time it doesn’t hit the database but just uses the List it already fetched. Which is fine by me. If I need to update the list I’ll refetch or refresh in the exact same manner I would if I were building this by hand.

    So you accept that the following won’t fail:

    c1 = session.get(Customer.class, 42);
    c2 = session.get(Customer.class, 42);
    c3 = c2.getOrders().get(0).getCustomer();

    assertSame(c1,c2,c3);

    Correct ?

    But you want the following to actually fail:

    c1 = session.get(Customer.class, 42);
c2 = (Customer) session.createQuery("from Customer c where c.id = 42").uniqueResult();
    c3 = c2.getOrders().get(0).getCustomer();

    assertSame(c1,c2,c3);

Because following your logic, c2 is now fetched by a query and not by lazy fetching or a session-level lookup.

That would definitely be inconsistent and very surprising! Why should entities returned from a session be different depending on which way they were fetched? This is again about different views of things: Hibernate is *state* oriented and you are *statement* oriented.

If the above were the way of doing things in Hibernate (or other JPA implementations), you would not be able to query up object graphs piecemeal via queries; you would have to do it all through object navigation, which is (by its nature) going to be a lot less efficient (yes – there are solutions for that in some cases, but not in general).

2) Open a new session on the *existing* connection (sf.openSession(s1.connection()), which removes the “heaviness” you are referring to but is actually an antipattern, since you can end up mixing up connections managed by the container and by you)

This would be the closest solution, since it would remove my “ThreadLocal cache” concern. It does seem like a royal pain, but I would have to try it and see. Not sure this is possible via JPA. I’m pretty sure it isn’t.

    Last time I read the title of your blog it was about what you considered Hibernate pitfalls; not JPA pitfalls 😉

I really appreciate the responses that you have given. You’re definitely the most knowledgeable person responding and definitely have great information to share. One suggestion I would have for you in the future would be to be pragmatic. You work for JBoss it seems, so you are obviously slanted somewhat, but even I look at my own work and tear into it constantly.

Yes, I work for JBoss and have been part of the Hibernate Team since forever 😉

…at their jobs. When someone says, “I love Savant and it just works for everything we need,” I know they are either pretty green or just haven’t gotten to that point where it doesn’t work for them, because at some point it won’t. On the flip side, when someone says, “Savant really lacks support for X,” or “Savant makes Y really painful and it isn’t obvious how it works at all,” I know that these folks are pretty solid engineers and are working to make things better.

    Let me try and do an analogy here 😉

    You saying Hibernate should not have “cache fetching” would be me
    saying a pitfall with respect to Savant is that it has “transitive dependencies”;
    why should it do that automatically, when I should list all my dependencies explicitly
    to avoid “least surprise”.


  9. Btw. just a comment on why executing DML immediately won’t work, or at least why a flush operation still needs to be there.

    Namely that we would need to do automatic dirty checking of all objects (or at least the ones with mutable properties) on every operation, and that is bad.

    But I have a feeling that automatic dirty checking will be in one of your next pitfall blogs – it would fit with what you seem to consider pitfalls 😉


  10. Max, just one thing first before I respond: please quote correctly in the future. I did a small edit to your comment.

    If you don’t trust your subsystems to access/mutate data correctly why are you
    handing them your precious objects ?

    I think we have boiled it down to this gap. 🙂 I don’t think a POJO, or for that matter a struct, is a precious object. It is just data. I like anemic domains and service-oriented architecture because they promote loose coupling in a simpler manner, but not everyone feels that way. In that model, what you get from Hibernate are not business Objects; they are just data.

    I fail to see how “experimenting with objects” should ever be done on
    business objects being passed around.

    I really sense that you have a very JDBC/statement-oriented approach to
    your application design, and that is probably what is causing you to
    perceive the behavior in Hibernate/JPA as pitfalls. Imagine if you hadn’t
    seen JDBC or similar ways of doing things but had just worked with objects;
    why would you ever expect to need to keep track of changes manually?

    Okay, so I’ll put myself in the business object perspective for a second and I still feel that copy/clone to “mess around” with business objects is not grand. I just don’t trust libraries and you seem to trust them implicitly. I think you see the enterprise as ambivalent because you don’t mind passing your Objects to any code. You assume whatever they do is nice and persistable. I would tell you that from experience this is very dangerous. I’d further tell you to create a duplicate domain model for those sub-systems and copy values back into your business object layer on demand in order to “de-couple” yourself. But this is again where we differ. I think my time at Orbitz slightly jaded me because when data is flying around a few thousand boxes and a few hundred libraries, you can’t trust anything or anyone to play nice. I’ve seen implicit trust cost millions of dollars in down time, so I guess I’m definitely jaded.

    How would you do that without having a “threadlocal cache” as you call it ?
    It is required (or you would have to do some pretty heavy model dependent magic to wire these together)

    No way 😉 Definitely possible, because you are intercepting the method calls to getOrders(): you have the customer it was called on, and you’ve got all the information about the current object graph at your disposal. You know which is the top-level object because you wrote the code that instantiated it, and therefore could do just about whatever is clever to wire up a graph.

    Why would you ever want to have the same entity with different data twice in one unit of work? (Note, I’m writing *entity* here, not *value* – different things.)

    Hang on quick. If I wanted entities like this I would use a generation framework like Jaxor. I want value objects. Hibernate, from all outward appearances, looks like it hands you back a POJO. This was even worse prior to annotations. You have absolutely no clue where that object came from and where it is going in very large systems, so yeah, I would definitely want to have two objects with the same value available to me. Then I could compare them.

    Customer c1 = query();

    library.change(c1);

    Customer c2 = query();
    if (c1.equals(c2)) {
        // it wasn’t changed by the library, so I’m cool to try this other thing here
        library.changeDifferent(c1);
    }

    This is simplistic but just imagine your data is flowing around at least 3-5 machines and going in and out of 10-20 libraries for a single transaction. I’d sure like to know what the heck is happening to it.

    So you accept that the following won’t fail:

    c1 = session.get(Customer.class, 42);
    c2 = session.get(Customer.class, 42);
    c3 = c2.getOrders().get(0).getCustomer();

    assertSame(c1,c2,c3);

    This should fail. c1 and c2 and c3 are not the same. Only c2 and c3 are the same.

    You saying Hibernate should not have “cache fetching” would be me saying a pittfall with respect to Savant is that it has “Transitive dependencies”; why should it do that automatically, when I should list all my dependencies explicitly to avoid “least surprise”.

    Nice comparison 🙂 I think though that cache fetching is a bit different, but I definitely appreciate the reference. Cache fetching, to someone new to Hibernate who is familiar with roll-your-own or other ORMs, seems a little surprising because it isn’t standard. ActiveRecord does it like this:

    btp1 = BillingTransactionProduct.find(1)
    btp2 = BillingTransactionProduct.find(1)
    assert(btp1)
    assert(!btp1.equal?(btp2))
    btp1.number = 42
    assert_equal(42, btp1.number)
    assert_equal(2, btp2.number)
    

    However, just to include Savant in the discussion: transitive dependencies by definition mean fetching the dependencies of your dependencies. An ORM query by definition doesn’t mean hitting a cache. That’s all I meant by different.

    Also, just as a little plug (hehe), you can easily turn off transitive dependencies in Savant with the includedependencies="false" flag.


  11. Okay, so I’ll put myself in the business object perspective for a
    second and I still feel that copy/clone to “mess around” with business
    objects is not grand. I just don’t trust libraries and you seem to
    trust them implicitly. I think you see the enterprise as ambivalent
    because you don’t mind passing your Objects to any code. You assume
    whatever they do is nice and persistable. I would tell you that from
    experience this is very dangerous. I’d further tell you to create a
    duplicate domain model for those sub-systems and copy values back into
    your business object layer on demand in order to “de-couple”
    yourself. But this is again where we differ. I think my time at Orbitz
    slightly jaded me because when data is flying around a few thousand
    boxes and a few hundred libraries, you can’t trust anything or anyone
    to play nice. I’ve seen implicit trust cost millions of dollars in
    down time, so I guess I’m definitely jaded.

    I don’t necessarily trust subsystems by default and you apparently don’t
    do that “ever” 🙂 So why don’t you just evict() objects from Hibernate’s Session
    before passing them down to subsystems ?

    Ah yes – you want something like the OpenSessionInView pattern… guess what: that kinda
    implies that you trust your layers.

    btw. All of the above sounds like a perfect use case for detached objects, where
    you are getting a lot (read: most) of the responsibility for ensuring consistency anyway.

    How would you do that without having a “threadlocal cache” as you call it ?
    It is required (or you would have to do some pretty heavy model dependent magic to wire these together)

    No way 😉 Definitely possible, because you are intercepting the method
    calls to getOrders(): you have the customer it was called on, and you’ve
    got all the information about the current object graph at your
    disposal. You know which is the top-level object because you wrote the
    code that instantiated it, and therefore could do just about whatever
    is clever to wire up a graph.

    That is what I call “pretty heavy *model dependent* magic”. What if you have an even deeper hierarchy? What about
    an order.getPreviousCustomer() that might correspond back to any customer, including the root or someone else? How would you resolve that?

    You would traverse the whole object graph to track down these objects? Nah, don’t think so (talk about slow and unpredictable results)

    Why would you ever want to have the same entity with different data twice in one unit of work? (Note, I’m writing *entity* here, not *value* – different things.)

    Hang on quick. If I wanted entities like this I would use a generation framework like Jaxor. I want value objects. Hibernate from all outward appearances looks like it hands you back a POJO

    Yes, sure it does. But value != POJO and value != entity. I think you should go read Hibernate in Action to get some terms in place 😉

    So you accept that the following won’t fail:

    c1 = session.get(Customer.class, 42);
    c2 = session.get(Customer.class, 42);
    c3 = c2.getOrders().get(0).getCustomer();

    assertSame(c1,c2,c3);

    This should fail. c1 and c2 and c3 are not the same. Only c2 and c3 are the same.

    A session guarantees session-scoped identity for objects! If you don’t want that,
    then use two different sessions to get different instances; what is the big problem about this?
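    A toy sketch of that session-scoped identity, with made-up classes rather than Hibernate’s real API: the session keeps a first-level cache (an identity map keyed by id), which is why repeated get() calls on one session return the very same instance, while a second session hands back a distinct one.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not Hibernate): a Session with a first-level cache / identity map.
public class SessionIdentityDemo {
    static class Customer {
        final int id;
        Customer(int id) { this.id = id; }
    }

    static class Session {
        // The session-scoped identity map: one instance per id per session.
        private final Map<Integer, Customer> identityMap = new HashMap<>();

        Customer get(int id) {
            // Return the cached instance if this session already loaded it,
            // otherwise "hydrate" a fresh object and remember it.
            return identityMap.computeIfAbsent(id, Customer::new);
        }
    }

    public static void main(String[] args) {
        Session s1 = new Session();
        Customer c1 = s1.get(42);
        Customer c2 = s1.get(42);
        assert c1 == c2;              // same session, same id -> same instance

        Session s2 = new Session();
        assert s2.get(42) != c1;      // a new session has its own identity map
    }
}
```

    The same mechanism is what makes c2.getOrders().get(0).getCustomer() resolve back to the already-loaded instance in the quoted example.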

    Cache fetching, to someone new to Hibernate who is familiar with roll-your-own or other ORMs, seems a little surprising because it isn’t standard.

    How can you say it is not standard when it definitely is in all JPA and even JDO implementations/specs?

    I can tell you that I stopped using other ORMs before Hibernate because they did not have something like
    session-scoped identity. Some lacked it completely; others even made it application-scoped identity, which is
    much worse.

    ActiveRecord does it like this:

    btp1 = BillingTransactionProduct.find(1)
    btp2 = BillingTransactionProduct.find(1)
    assert(btp1)
    assert(!btp1.equal?(btp2))
    btp1.number = 42
    assert_equal(42, btp1.number)
    assert_equal(2, btp2.number)

    ActiveRecord (in general) is not a full-blown ORM, because it does not handle the problem of identity etc.
    In any case, what in those lines of code defines the “unit of work”? How should anyone know what the following code does:

    c1 = somebtp.getCustomer();
    c2 = someotherbtp.getCustomer();

    is c1 == c2 ?

    That all depends on what you did before these lines.

    How does “ActiveRecord” guarantee that btp1 == btp1.getX().getY().getBillingTransactionProduct(), assuming X/Y are associations such that
    getBillingTransactionProduct() points back to the root? (My bet is it doesn’t – and if it does, it basically has attached something like a “session”
    to these objects per each call to find().)

    With a proper session/unit-of-work abstraction you can guarantee they will be
    the same (assuming you use concepts like OpenSessionInView etc.), which
    is useful in many apps.

    If you don’t want that, use different
    sessions to get them – pure and simple.


  12. Max, we could seriously go back and forth on this forever. I think we just break down at the fact that you like entities that are fully stateful and managed, and I don’t. I really don’t like my frameworks doing things unless I tell them to, is all. And we probably won’t convince each other, since we are both very entrenched in our philosophies. But my main issue with just letting it drop is that you seem very defensive, not open to other views, and have a very Java/Hibernate/JPA/JDO/EJB mindset. I don’t want folks reading this to leave thinking that Hibernate will solve all their problems, or that the pitfalls I’ve suggested aren’t in fact things to be aware of when using and designing for Hibernate, because they are extremely vital to understand.

    In fact, I’m perfectly willing to use Hibernate in many cases and let it manage all the state. I understand how to detach and what you lose by doing that. I know many of the Hibernate trade-offs and have picked different approaches based on needs. In some cases I’ve used other frameworks that handled things differently because of Hibernate’s approach to ORM. I’m not stating that Hibernate is fundamentally flawed; I’m giving folks perspective on Hibernate concerns they might not have thought about before.

    But I keep needing to clarify, because some of the comments you offer are somewhat off with respect to the underlying pitfalls I’ve mentioned. Like your comment about OpenSessionInView and trusting layers – the OpenSessionInView pattern really only lets me lazy load. It doesn’t mean I’m trusting layers, because queries aren’t transactional.

    Or your comment about cache fetching being standard – I offered a very quick example of another widely used ORM that doesn’t use it, and there are others out there as well.

    Or your comment about using two sessions to fix things – which can actually cause issues, such as resource problems (because you now have multiple JDBC connections per execution thread) and issues when passing objects between sessions.

    Or your comment about heavy object-graph magic – because you can associate a fetch cache with an object graph instead of a session, and it behaves just like multiple sessions without the overhead.

    I doubt we’ll agree even on these points. Perhaps at this point we should agree to disagree, and I’ll post another blog entry explaining to readers that my intention was to inform and discuss, not to turn them away from Hibernate. Hibernate, for what it is, is great, and most developers will save reams of time using it – that is equally important to understand, as are my pitfalls. Sound fair?


  13. Sounds fair 😉

    Just two (maybe three) points:

    Or your comment about cache fetching being standard – I offered a very quick example of another widely used ORM that doesn’t use it, and there are others out there as well.

    Sure, and I know about those; but I don’t go out and say some cornerstone feature of ActiveRecord is “bad” because it is or is not a standard.

    That is where you seem to ‘tick’ me 😉 Claiming that your view is the “right one” and thus everyone else would think these are pitfalls – I’m trying to turn that upside down to show that in many cases it actually does make sense (even more so when you consider the consequences of having or not having certain features).

    Or your comment about using two sessions to fix things – which can actually cause issues, such as resource problems (because you now have multiple JDBC connections per execution thread) and issues when passing objects between sessions.

    As I’ve said before, you *don’t* need two connections for this. This would be very much similar to your ActiveRecord example (assuming that version of ActiveRecord “magically ensures that an object graph rooted in X is consistent identity-wise” – you still haven’t answered that btw 😉).

    Let me conclude this (again 😉) with a link that describes one (our) way of classifying and comparing ORMs:

    http://blog.hibernate.org/cgi-bin/blosxom.cgi/Christian%20Bauer/relational/comparingpersistence.html

    As far as I can see, your “standard” view is closest to “light object mapping”, whereas my “standard” view is closest to “full object mapping”.

    (And please remember that Hibernate actually has facilities to cover both of these areas, but of course the “full” one is the best covered.)


  14. If it’s ok, I may take a different approach to looking at this issue.
    Hopefully this makes sense 🙂

    Note: I’m treating the ‘cache’ you refer to as the JPA Persistence Context.
    This can be confusing in the sense that many ORMs have another ‘global cache’.

    To start with, the way I view the JPA Persistence Context is…

    Persistence Context has 2 functions/jobs
    – Building ‘consistent’ object graphs
    – in JPA, used during flush to persist all ‘dirty’ persistent objects

    Persistence Context has 3 levels
    – Statement (my concept, not in JPA)
    – Transaction
    – Extended (across multiple transactions)

    The way I see the issue you describe here, Brian, is:
    Either: you’d like JPA not to flush dirty objects automatically.
    In the example, setName(“Bar”) means that “Bar” is used in the update.

    Or: you want the query to run with a “statement-level persistence context”.
    In the example the query runs in the same transaction, and by default there is a
    ‘transaction-level persistence context’. As the object is already in the persistence
    context, the query returns the same instance.

    It seems to me that if either of these options were available, they would
    resolve the issue you described?

    Note: statement-level Persistence Context is NOT mentioned in JPA or
    anywhere else (that I know of). It’s a concept I use.

    I wrote this document on Persistence Context that may help (hopefully).
    http://www.avaje.org/persistencecontext.html

    Does any of that make sense?
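    A toy model of that flush-then-query behavior, using made-up classes rather than the JPA API: with a transaction-level persistence context, a query first flushes dirty managed objects to the database and then prefers the already-managed instance – which is exactly the “cache fetching” behavior from the original post.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (not JPA): a transaction-scoped persistence context with
// flush-before-query semantics.
public class FlushBeforeQueryDemo {
    static class Entity {
        int id;
        String name;
    }

    // Stand-in for the database table: id -> name.
    static final Map<Integer, String> DB = new HashMap<>();

    static class EntityManager {
        // The persistence context: all managed objects, keyed by id.
        private final Map<Integer, Entity> context = new HashMap<>();

        void persist(Entity e) { context.put(e.id, e); }

        // Flush: push the current in-memory state of every managed object to the DB.
        void flush() {
            for (Entity e : context.values()) DB.put(e.id, e.name);
        }

        // A query flushes first, then returns the managed instance if one exists
        // ("cache fetching"); only otherwise does it hydrate from the DB.
        Entity queryById(int id) {
            flush();
            Entity managed = context.get(id);
            if (managed != null) return managed;
            Entity e = new Entity();
            e.id = id;
            e.name = DB.get(id);
            context.put(id, e);
            return e;
        }
    }

    public static void main(String[] args) {
        EntityManager em = new EntityManager();
        Entity t = new Entity();
        t.id = 1;
        t.name = "Foo";
        em.persist(t);
        t.name = "Bar";                  // in-memory change, no explicit save
        Entity t2 = em.queryById(1);
        assert t2 == t;                  // the query returned the managed instance
        assert "Bar".equals(DB.get(1));  // the change was flushed before the query
    }
}
```

    A statement-level context would differ only in that queryById would use a fresh, per-call map instead of the shared one.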


  15. Hmm, not too sure what the -> means.

    According to me (my testing of StatelessSession) and the Hibernate docs, StatelessSession operates without ANY first-level cache (aka no persistence context at all).

    My definition of “Statement level persistence context” is…
    A Persistence Context exists and is scoped to a single statement (for the purposes of building a consistent object graph).

    e.g. Execute a single query returning orders and order lines – does the object graph contain “data aliasing” (multiple instances that represent the same logical entity)?

    // … session.createQuery("from Order o left join fetch o.details"); …
    Does each “orderline” refer to the same instance of order?

    With StatelessSession they do not (which is consistent with the docs). If it used a “statement-level persistence context” they would (my definition).

    That is, Hibernate’s StatelessSession states that it has no “first-level cache”, as opposed to “its first-level cache is scoped to a single statement”.

    Hmm, interesting…
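    A toy sketch of that difference, with made-up classes (not Hibernate’s hydration code): without any persistence context, each joined result row builds a fresh Order instance (“data aliasing”), while a statement-scoped identity map makes all order lines of the same order share one instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (not Hibernate) of hydrating a "join fetch" result set.
public class StatementContextDemo {
    static class Order {
        final int id;
        Order(int id) { this.id = id; }
    }

    static class Line {
        final Order order;
        Line(Order order) { this.order = order; }
    }

    // Two joined result rows, both belonging to order 7.
    static final int[] ROW_ORDER_IDS = {7, 7};

    // Stateless hydration: no identity map, so each row builds a fresh Order.
    static List<Line> hydrateStateless() {
        List<Line> lines = new ArrayList<>();
        for (int id : ROW_ORDER_IDS) {
            lines.add(new Line(new Order(id)));
        }
        return lines;
    }

    // Statement-scoped context: one Order instance per id within this query.
    static List<Line> hydrateWithStatementContext() {
        Map<Integer, Order> context = new HashMap<>();
        List<Line> lines = new ArrayList<>();
        for (int id : ROW_ORDER_IDS) {
            lines.add(new Line(context.computeIfAbsent(id, Order::new)));
        }
        return lines;
    }

    public static void main(String[] args) {
        List<Line> stateless = hydrateStateless();
        assert stateless.get(0).order != stateless.get(1).order; // data aliasing

        List<Line> scoped = hydrateWithStatementContext();
        assert scoped.get(0).order == scoped.get(1).order;       // shared instance
    }
}
```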


  16. A StatelessSession operates with a temporary persistence context. Check the code.

    After verifying with some extra tests, I see there are cases that are not covered; that needs to be fixed.

    “// … session.createQuery("from Order o left join fetch o.details"); …
    Does each “orderline” refer to the same instance of order?”

    Currently that will only work if you use a “… details left join fetch details.order” style query.


  17. Why Hibernate, or any other ORM, in the first place? The first problem people face is performance when a joined query involves multiple tables: Hibernate executes hundreds of selects and is painfully slow. commons.dbutils provides a balance between class/table mapping and flexibility. ORM tools just add complexity and numerous pitfalls, resulting in a maintainability nightmare.


  18. It’s too bad Max gave you such a hard time over your post. There are so many other posts about “how wonderful Hibernate is”; there have to be others (like yours) outlining how Hibernate can be difficult, to offset the imbalance.

    I have used Hibernate on 2 large projects: one where we designed the system from the ground up (hence the database was built for the object model), and another where we applied Hibernate to a legacy product (thus the database had no object model design behind it).

    In the first project, Hibernate was nice, as it did all the work for us. However, in terms of scalability, we hit a wall later once we realized how many queries it makes to keep its cache in sync with the DB. Hibernate is still there, but we had to find many workarounds to scale. As someone said before on your posts, every framework has weaknesses, and finding them was not cheap for us.

    On the second project, we had to throw Hibernate out and move to commons.dbutils (something equivalent). We realized that mapping Hibernate to a legacy database structure was difficult, and Hibernate did not handle it. One can argue that the design of this database was bad, but there are many “bad” designs out there and we need to continue maintaining them. Nevertheless, I was committed to keeping Hibernate for this project until we reached a point where Hibernate was throwing exceptions for bad code it generated. No one answered my posting on their site, so I was forced to pull it out.

    Hibernate is a fine tool for getting something up and running quickly and easily. However, for large-scale projects with high performance requirements, I am doubtful (based on my experience) of its benefits.


  19. I dislike Hibernate and I don’t think it eases development. Developers may not need to write the SQL (how hard could it be?), but they are at the same time limited by Hibernate’s ‘rules’. Hibernate just has too many tricks. And it almost always creates unnecessary database queries:

    Suppose we have two classes, Person and Address, with a bi-directional one-to-many relationship (one person having multiple addresses). Hibernate says we use the following to add a new address to a person:

    Session session = …;

    Address newAddress = new Address();
    Person person = session.get(…);
    person.getAddresses().add(newAddress); // loads every existing address just to add one
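    A toy sketch of why that last line can be expensive, with made-up classes (not Hibernate’s collection wrappers): a lazy collection must be initialized – i.e. every existing address loaded from the database – before it can be mutated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Toy model (not Hibernate) of a lazily-loaded collection that must be
// fully initialized before any mutation, costing one extra query.
public class LazyCollectionDemo {
    static int dbLoads = 0;  // counts simulated "select * from address" hits

    static class LazyList<T> {
        private final Supplier<List<T>> loader;
        private List<T> loaded;  // null until first access

        LazyList(Supplier<List<T>> loader) { this.loader = loader; }

        List<T> get() {
            if (loaded == null) loaded = loader.get();  // initialization = a query
            return loaded;
        }
    }

    public static void main(String[] args) {
        LazyList<String> addresses = new LazyList<>(() -> {
            dbLoads++;  // simulate fetching every existing address row
            return new ArrayList<>(List.of("old address 1", "old address 2"));
        });

        // Mutating through the collection forces the load first.
        addresses.get().add("new address");
        assert dbLoads == 1;
        assert addresses.get().size() == 3;
    }
}
```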


  20. +1 to Brian for keeping a level, constructive, and informative discussion going. Always better to attack the issues than the person. So refreshing in cyberspace…


  21. That anyone would use Hibernate for any project just shows that they don’t know anything about software system design. What do I gain from using Hibernate? Nothing but a mess! You will find that Hibernate lovers don’t really understand the power of a hash map or a result set. Bulk data is easy. Ask them if they would return 10 books to the library one trip at a time in their car – book-delivery object, DAO, or whatever. And why are we writing SQL on top of objects? SQL is for relational databases. If I do have objects, the last thing I want to do is go back to SQL again. I just have to shake my head at the trends people follow. 😐


  22. OMG, Hibernate sucks so bad. Ever seen any performance benchmarks with this crap? It’s slower every time – EVERY TIME (which should not come as a shock at all). And the slowdown is as large as or larger than the performance differences between many database types. That’s right – it almost makes the performance of your particular database a moot issue, because the real question regarding performance is, “Are you using Hibernate?”

    If you are, you’re guaranteed headaches, problems, more work, and PISS POOR performance. The end.


  23. The sheer fact that Max and Brian had to discuss/debate so many details about Hibernate and O/R concepts is proof in and of itself that this is not KISS.
    Brian, and also Max, may be experts at Hibernate, and I appreciate the knowledge if I have to use Hibernate to get my paycheck. But
    I could have written and completed entire applications in the amount of time spent dissecting or fixing O/R implementation concepts. What happened to KISS – “Keep It Simple, Silly”? If you are dissecting a tool or library to this degree to get it to work right for you and to understand it, then throw it in the recycle bin! That’s what my mom did when the VCR had too many buttons, options, and controls. User-friendly is not just for end users.


  24. My current project is using Hibernate and it performs badly. We are considering replacing Hibernate with other options. After all these years of practice as an independent consultant, my belief in KISS keeps growing. Thank you, Brian, for such a wonderful post.


  25. Dug up these “pitfall” articles while musing over some old Hibernate wounds. Definitely made for interesting reading (especially the comments section ;-).

    You stay classy Brian!

