Hibernate pitfalls part 1

This will be the first episode in a series of posts about Hibernate. I’ve already written a few sporadic posts about Hibernate, but now I’m collecting everything into a few well-directed posts. These will all use JPA rather than the Hibernate APIs. I’ll finish off with a post about other JPA solutions and how they handle things.

In this episode we’ll explore the pitfall known as delayed SQL.

Hibernate maintains an in-memory cache of inserted, updated and removed entities. This cache is later used to generate the SQL that is passed to the database. This means that Hibernate does not execute any SQL against the underlying database until it absolutely has to, or until you tell it to by calling the flush method. So, if you have something simple like this:

CREATE TABLE Test (
  id bigint(20) not null,
  name varchar(50),
  primary key (id)
);

Okay, here’s the entity (I’ve cut out all the junk like getters and setters to keep it simple):

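import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Test {
  @Id
  private int id;      // maps to the id column
  private String name; // maps to the name column
  // getters and setters omitted
}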

And here’s the code that illustrates the problem. I’m going to use comments to show you where things happen, but remember that Hibernate decides when it calls the database, so it isn’t guaranteed to work this way.

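EntityManager em = emf.createEntityManager();
// Borrow the JDBC connection behind the EntityManager (Hibernate 3.x API;
// newer Hibernate versions expose the connection differently)
Connection c = ((EntityManagerImpl) em).getSession().connection();
Test t = new Test();
t.setId(1);
t.setName("Foo");

EntityTransaction et = em.getTransaction();
et.begin();
em.persist(t); // Only queued in the persistence context, no INSERT yet

// Check if it is there, using the same connection and therefore the same transaction
Statement s = c.createStatement();
ResultSet rs = s.executeQuery("select * from Test where id = 1");
Assert.assertFalse(rs.next()); // No row, because the INSERT hasn't been executed yet
rs.close();
s.close();

et.commit(); // The INSERT is flushed and committed here
em.close();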

As you can see, this code illustrates that Hibernate queues up inserts, updates and deletes until it decides they need to be flushed. It does not execute the SQL when persist is called. There are a few ways to ensure that it does call the database and execute the SQL:

1. Call flush
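2. Persist an object whose @GeneratedValue uses the IDENTITY strategy (Hibernate has to run that INSERT right away to get the generated key back)

Apart from queries, which can trigger an automatic flush depending on the flush mode, these are the only ways of ensuring that Hibernate executes the statements.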
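To picture what is going on, here is a toy model in plain Java rather than Hibernate itself; `ToyDatabase`, `ToySession` and the `identityKey` flag are hypothetical names, and a real persistence context is far more involved. The session queues INSERTs in memory and only sends them to the "database" when `flush()` is called, except for entities with identity-generated keys, which have to be inserted straight away so the key can be read back.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the database: it only records the SQL it receives.
class ToyDatabase {
    final List<String> executed = new ArrayList<>();

    void execute(String sql) {
        executed.add(sql);
    }
}

// Hypothetical toy session that mimics write-behind:
// persist() normally queues the INSERT, flush() sends the queue to the database.
class ToySession {
    private final ToyDatabase db;
    private final List<String> pending = new ArrayList<>(); // queued INSERTs, in persist order

    ToySession(ToyDatabase db) {
        this.db = db;
    }

    // identityKey models @GeneratedValue(strategy = GenerationType.IDENTITY):
    // the key only exists once the INSERT has run, so it cannot be deferred.
    void persist(String table, String values, boolean identityKey) {
        String sql = "insert into " + table + " values (" + values + ")";
        if (identityKey) {
            db.execute(sql); // executed immediately to obtain the generated key
        } else {
            pending.add(sql); // deferred until flush()
        }
    }

    // Sends every queued statement to the database, in order, and empties the queue.
    void flush() {
        for (String sql : pending) {
            db.execute(sql);
        }
        pending.clear();
    }

    int pendingCount() {
        return pending.size();
    }
}
```

In this model, persisting an identity-keyed entity only forces that entity’s own INSERT; everything else still waits for `flush()`.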

So, why is this an issue? It is an issue when you want to know which of several operations failed. Unless you call flush after each operation, you have no way of telling. Here’s an example:

em.persist(a);
// em.flush();   - solves the problem
...
em.persist(b);
// em.flush();  - solves the problem
...
em.persist(c);
// em.flush();  - solves the problem
...
// Normally, any unique key violations or other SQL errors
// occur here so we don't really know which statement failed
et.commit();
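A toy model of this, again in plain Java with hypothetical class names (`UniqueKeyDatabase`, `QueueingSession`, `FailureAttribution`), assuming a unique key on a single name column. With write-behind every `persist()` succeeds and the violation only shows up at flush time; flushing after each `persist()` turns the failure into the index of the call that caused it, at the cost of one round trip per statement.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical database with a unique key on name: inserting a duplicate fails.
class UniqueKeyDatabase {
    private final Set<String> names = new HashSet<>();

    void insert(String name) {
        if (!names.add(name)) {
            throw new IllegalStateException("unique key violation");
        }
    }
}

// Hypothetical write-behind session: inserts wait in a queue until flush().
class QueueingSession {
    private final UniqueKeyDatabase db;
    private final List<String> pending = new ArrayList<>();

    QueueingSession(UniqueKeyDatabase db) {
        this.db = db;
    }

    // Never fails: nothing reaches the database yet.
    void persist(String name) {
        pending.add(name);
    }

    // The queue is emptied before executing, so a failure leaves nothing pending.
    void flush() {
        List<String> batch = new ArrayList<>(pending);
        pending.clear();
        for (String name : batch) {
            db.insert(name); // the exception does not say which persist() queued this row
        }
    }
}

class FailureAttribution {
    // Returns the index of the persist call that failed, or -1 if all succeeded.
    // Flushing after every persist is what makes the index meaningful.
    static int persistEachAndFlush(QueueingSession session, String... names) {
        for (int i = 0; i < names.length; i++) {
            try {
                session.persist(names[i]);
                session.flush();
            } catch (IllegalStateException e) {
                return i;
            }
        }
        return -1;
    }
}
```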

The main issue is that there is no global setting that makes Hibernate flush after every operation; the available flush modes only decide whether it flushes before queries, at commit, or only when told to. Hibernate stores these changes up so that it can batch them in an effort to increase performance. The problem with that approach is that Hibernate is no longer an ORM. It is now acting somewhat like an object DB or object cache instead. But unfortunately, Hibernate isn’t a DB, because it doesn’t correctly handle all the glorious transactional and distributed computing issues that most enterprise RDBMSs and object caches (e.g. Tangosol Coherence) do.
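A simplified sketch of what the flush modes do and don’t give you. `ToyFlushMode` is a hypothetical enum modeled loosely on JPA’s `FlushModeType` (`AUTO`, `COMMIT`), and `FlushModeSession` is likewise made up. In the model, `AUTO` flushes before every query and both modes flush at commit, but neither flushes when `persist()` is called. Real Hibernate is smarter about `AUTO` and may skip the flush when a query does not touch the affected tables.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for JPA's FlushModeType.
enum ToyFlushMode { AUTO, COMMIT }

class FlushModeSession {
    final List<String> database = new ArrayList<>(); // rows that have reached the "database"
    private final List<String> pending = new ArrayList<>();
    private final ToyFlushMode mode;

    FlushModeSession(ToyFlushMode mode) {
        this.mode = mode;
    }

    // Deferred in every mode: there is no "flush immediately" option.
    void persist(String row) {
        pending.add(row);
    }

    // A query only sees pending rows if the mode flushes before queries.
    int countRows() {
        if (mode == ToyFlushMode.AUTO) {
            flush();
        }
        return database.size();
    }

    // Every mode flushes at commit.
    void commit() {
        flush();
    }

    private void flush() {
        database.addAll(pending);
        pending.clear();
    }
}
```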

So, you have a number of solutions to this issue, which makes the suckiness factor of delayed SQL a 3 (out of 10).

39 thoughts on “Hibernate pitfalls part 1”

  1. This is a strange post. I guess my main concern with the extreme stance of the title is that it implies there may be a better solution with similar performance characteristics.

    Some people (myself included) consider the characteristic cited to be a feature, not a bug, precisely because the performance characteristics of a fully transactional distributed system are quite different. It’s not a matter of “good” vs. “bad”, and the decision matrix for choosing either involves many factors.

    These zero-sum game articles appear quite misleading to an uninformed user.

  2. Marcus, thanks for the thoughtful comment. The name was really just something I slapped down while writing the initial versions of all the articles. I’ll definitely re-think the naming for the next one.

    On the flip side, as the article states, I think this is really a horrible feature/issue. The principles of KISS, least surprise, transparency and so many other things are completely violated. Having worked on enormous applications with thousands of servers and terabytes of database data, I’ve found that batching is usually the least of the performance problems. So, building the entire framework around that concept seems pretty worthless to me.

    One last comment I have is that many other ORM frameworks don’t do this and favor a direct-to-the-database approach with clear return values (like ActiveRecord returning a boolean indicating whether or not the record was saved). This approach is simple, zero surprise, transparent and so much more.

    You would get the updates if you had done the query or SQL in Hibernate. Hibernate flushes before queries. Why are you using a mixture of SQL and JPA? It isn’t usually necessary.

    ...seriously – your argumentation does not hold, since “delayed SQL” actually has a lot of performance and throughput benefits in OLTP scenarios (the biggest intended usage of Hibernate, or ORMs in general) that you seem to ignore.

    Plus, of course, you’re ignoring the other options you have in Hibernate to get exactly the behavior you want; e.g. cascade options could have helped you reduce those 3 persist calls to one (depending, of course, on how these objects are connected).

    And if you don’t want Hibernate to be a full *state* oriented ORM but rather a *statement* oriented ORM, then use StatelessSession.

    > You would get the updates if you had done the query or SQL in Hibernate. Hibernate flushes before queries. Why are you using a mixture of SQL and JPA? It isn’t usually necessary.

    This is for illustration purposes only. It is just there to show that Hibernate is not going to the database. Yes, queries do flush the session I think in all cases. This might not be 100% true though. I would have to take a look at the code to tell.

    In any case, there are a lot of situations, the simplest being multiple JVMs or multiple ORMs in the same JVM, where this behaves the same way.

    > What do the docs for EntityTransaction say about begin() and commit()?

    > Is auto-commit turned on?

    No, JPA turns auto-commit off when starting a transaction. I believe if auto-commit is turned on Hibernate still doesn’t execute the SQL, but I’d have to verify that.

    1. I have verified that even if you configure auto-commit as true, you still need to call commit() manually to submit your changes to the DB.

    > ...seriously – your argumentation does not hold, since “delayed SQL” actually has a lot of performance and throughput benefits in OLTP scenarios (the biggest intended usage of Hibernate, or ORMs in general) that you seem to ignore.

    I’ve used Hibernate in VERY large scale OLTP situations and in most cases I’ve never needed batching of anything. I’ve also found this extremely painful for OLTP processing where unique key violations are not only common but used for control flow logic and persistence logic.

    > Plus, of course, you’re ignoring the other options you have in Hibernate to get exactly the behavior you want; e.g. cascade options could have helped you reduce those 3 persist calls to one (depending, of course, on how these objects are connected).

    Yeah, you jumped the gun on that one, I think. Cascading really is not a solution for this problem; it merely reduces code. The second example I have illustrates a long transaction with many inserts/updates across many unrelated tables. Even if they were related, cascading would not fix the issue but make it worse, because now a single statement does many inserts/updates and you again have no idea which one failed because of a unique key violation.

    > And if you don’t want Hibernate to be a full *state* oriented ORM but rather a *statement* oriented ORM, then use StatelessSession.

    This might be possible via JPA. I’ll give it a try using the hibernate.cfg.xml file. However, the default behavior of Hibernate is stateful, and in my opinion (and others’) Hibernate is a bit invasive.

  8. From a quick read of the Javadoc for StatelessSession:

    > A stateless session does not implement a first-level cache nor interact with any second-level cache, nor does it implement transactional write-behind or automatic dirty checking, nor do operations cascade to associated instances. Collections are ignored by a stateless session. Operations performed via a stateless session bypass Hibernate’s event model and interceptors. Stateless sessions are vulnerable to data aliasing effects, due to the lack of a first-level cache.

    I haven’t used it, but if this all holds true, it would seem that lazy fetching would be disabled and there would be no opportunity to intercept inserts/updates/deletes. That would reduce the overall effectiveness of the framework in OLTP situations, since I would now have to manage all those insertDate and updateDate columns by hand and would have to pre-fetch everything I need. Remember, my issue in this article is with inserts and updates, NOT with fetching. I’ll cover fetching later on.

  9. Hi,
    I was a big fan of Hibernate back in 2003.
    I still think it’s great, but my concern is maintenance. Well, I’m a consultant, so I should not care that much.
    I found iBatis less clean but more maintainable for the people who stay on the job after I’ve left for another contract.
    My main concern about Hibernate is that it’s difficult to follow what it’s doing, especially when things are going wrong. You don’t really know what is happening on the DB. I know that you can turn tracing on… but I still found it difficult to follow.
    iBatis is much more practical because the SQL queries are actually written in the file.
    OO databases are not there yet, I mean on a day-to-day basis in the contract world. They will come… I hope, but until then we will have to live with ersatz versions of what a persistence store should be. I know db4o and Prevayler as well, but… they’re difficult to sell to a client.

    > I’ve also found this extremely painful for OLTP processing where unique key violations are not only common but used for control flow logic and persistence logic.

    If you’re getting unique key violations, and you are expecting them in order to control logic, then I don’t think I’m going to find any useful information in your blog. You’re whining about Hibernate _features_ and then telling us about how you use exceptions like if statements. Perhaps you need to reconsider the architecture of your software, mate.

    > If you’re getting unique key violations, and you are expecting them in order to control logic, then I don’t think I’m going to find any useful information in your blog. You’re whining about Hibernate _features_ and then telling us about how you use exceptions like if statements. Perhaps you need to reconsider the architecture of your software, mate.

    Tomorrow’s post will definitely include a good summary of this stuff, but before then I’ll get to this comment. How is ActiveRecord:

    keyword = Keyword.new("foo")
    unless keyword.save
      # Not saved (save returns false when a validation such as uniqueness fails)
      keyword = Keyword.find("foo")
    end
    

    using exceptions as flow logic? Let’s be sure we are talking about ORM tools in general. Even so, since we are stuck with Hibernate exceptions, this is still valid logic for handling unique constraints:

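    Keyword keyword = new Keyword("foo");
    try {
      entityManager.persist(keyword);
      entityManager.flush(); // force the INSERT so a violation surfaces here rather than at commit
    } catch (PersistenceException e) {
      // Hibernate's ConstraintViolationException arrives wrapped in a PersistenceException.
      // Caveat: the transaction is now marked for rollback and the failed instance may still
      // be in the persistence context, so real code should roll back and re-read in a fresh one.
      keyword = entityManager.find(Keyword.class, "foo");
    }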
    

    Albeit a simple example, this is perfectly acceptable persistence logic. You could probably easily extrapolate this out into more complex logic blocks and delegate to different workflows and other classes depending on the results from the persist method.

    Just to be clear, I’m focusing on specific items that are painful to code around or completely obscure, and that other ORMs handle much more elegantly.

    This post sucks. You call write-behind an “anti-pattern” but do not explain why you view it that way. It doesn’t make sense to make 30 remote calls to the DB when you could make 5. There IS a way to control flush behavior: you can set the flush mode on a session. It is true there is no global configuration to change it, but it is unclear to me that you’d want to do so unless you don’t really understand how databases work. I don’t think even Cameron would be in line with your connection to Tangosol Coherence here. It just doesn’t make sense: what does a distributed transactional cache have to do with how an ORM should handle persistence? I would argue the write-behind behavior of Hibernate is a pattern rather than an anti-pattern. In fact, it has been a tried and true method of performance-tuning ORM from TopLink to EJB-CMP engines. In fact, Tangosol (now Oracle) Coherence employs it as well.

    > This post sucks. You call write-behind an “anti-pattern” but do not explain why you view it that way. It doesn’t make sense to make 30 remote calls to the DB when you could make 5.

    Agreed, but for that 99% case, it doesn’t really improve things. And for that 1% case, and I’ve been there, I didn’t use it. We could have used it, but things worked fine without it.

    > There IS a way to control flush behavior: you can set the flush mode on a session.

    Nope. There is no flush mode that flushes IMMEDIATELY.

    > It is true there is no global configuration to change it, but it is unclear to me that you’d want to do so unless you don’t really understand how databases work.

    Remember that this only got a 3 out of 10. There are many ways around it and it isn’t that big of a deal for many cases. So, let’s avoid throwing out things that don’t make much sense like, “databases are meant for batching and you just have no clue how they work otherwise you’d love ORMs to have write-behind.” This is talking about a “feature” of Hibernate which I think is unmaintainable, obscure, hard to manage, and annoying.

    > I don’t think even Cameron would be in line with your connection to Tangosol Coherence here. It just doesn’t make sense: what does a distributed transactional cache have to do with how an ORM should handle persistence? I would argue the write-behind behavior of Hibernate is a pattern rather than an anti-pattern. In fact, it has been a tried and true method of performance-tuning ORM from TopLink to EJB-CMP engines. In fact, Tangosol (now Oracle) Coherence employs it as well.

    I’m pretty sure he would since he writes a cache product. Hibernate is not a cache product. It is an ORM. They are very different. ORMs are a mapping between objects and relational databases, not caches.

  14. Oh my, you will create a _series_ of nonsense posts like this one? Write-behind behavior (reducing lock times and therefore contention on your single most important shared resource, the database) makes a stateful ORM software an “object database”? A distributed cache? Where is the connection? It sounds like you just want people to believe that this is the case without ever saying why that is so.

    > Oh my, you will create a _series_ of nonsense posts like this one? Write-behind behavior (reducing lock times and therefore contention on your single most important shared resource, the database) makes a stateful ORM software an “object database”? A distributed cache? Where is the connection? It sounds like you just want people to believe that this is the case without ever saying why that is so.

    Not exactly constructive, but I rarely reject comments. This isn’t nonsense. I’ve said it a couple of times, but I’m addressing concerns with Hibernate that I feel are non-obvious, obscure and break “least surprise”. Hibernate is in fact acting partially like an object database. You query against objects in memory, and when you manipulate objects your changes are automatically persisted. These are some of the principles of object databases.

    In terms of distributed caches, my point is that Hibernate is NOT a distributed cache. Having worked closely with a distributed cache, I know that any time you lazily flush records to the database, you must ensure that any machine can fail without causing data issues at all. This is paramount. Hibernate only knows about a single VM. Placing it into a distributed environment means you need to be careful how you use it, because it does not flush immediately. You can’t assume that other machines can respond to an insert, because it hasn’t occurred yet; although the VM that did the insert will always see the new data, other machines won’t. This is really important to understand.
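    A toy illustration of the multi-VM point, with hypothetical `SharedDatabase` and `NodeSession` classes standing in for the database and each VM’s session. The node that persisted the row sees it through its own cache, while another node sees nothing until the insert is flushed. The model ignores transactions entirely; against a real database the row would typically also have to be committed before other connections can read it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared database that several VMs talk to.
class SharedDatabase {
    final Map<Integer, String> rows = new HashMap<>();
}

// Hypothetical per-VM session holding its own not-yet-flushed inserts.
class NodeSession {
    private final SharedDatabase db;
    private final Map<Integer, String> pending = new HashMap<>();

    NodeSession(SharedDatabase db) {
        this.db = db;
    }

    // Write-behind: the row stays local until flush().
    void persist(int id, String name) {
        pending.put(id, name);
    }

    // Like find(): checks this session's own cache before going to the database.
    String find(int id) {
        String local = pending.get(id);
        return local != null ? local : db.rows.get(id);
    }

    void flush() {
        db.rows.putAll(pending);
        pending.clear();
    }
}
```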

  16. Hum…

    You know, Hibernate and JPA aren’t magic. You still need to understand some things about *how* they work and how relational databases work — particularly transaction commits and rollbacks.

    The problem here is that this example is running into some side effects of CMT (container managed transactions). Hibernate is working *correctly*. In fact, flush() is limited in when it can force a database commit. It’s limited by the container’s transaction boundary.

    Most Java programmers don’t need to worry about the exact time of a commit to the underlying database. Hibernate in conjunction with the container will do it correctly nearly all of the time. In fact, the tool does you a favor by preventing you from corrupting the transaction boundary and screwing up your ability to do a rollback in the event of an error.

    If you want to override the tool, be sure you understand what is going on first.

    > The problem here is that this example is running into some side effects of CMT (container managed transactions). Hibernate is working *correctly*. In fact, flush() is limited in when it can force a database commit. It’s limited by the container’s transaction boundary.

    This code was actually run from the command line without any container. Hibernate batches all database activity until a transaction boundary is hit, and this is the pitfall I describe here. You just have to know this and plan for it.

    Perhaps this is slightly off topic, but I’d like to see how this relates to JPA as well (as opposed to just Hibernate).

    The way I see it, this example shows what I consider 3 specific issues with JPA (as it currently stands). Specifically I would personally like to see JPA provide more control to developers in 3 areas.

    1. Control over Statement Batching (on/off, batch size, generated keys)
    2. Control over Cascading of Persist/Delete (the ability to turn it off)
    3. Better support for using raw JDBC

    The example here as I see it revolves around the ‘transparent batching’ that is occurring (until the ‘flush’ in this example). As I see it there is nothing in JPA that allows you to control the statement batching – It may be that vendors do not see your pain?

    Another weakness in JPA (as I see it) is the lack of support for getting the Connection and using it (for more advanced uses such as stored proc calls, savepoints etc). Normally a query would implicitly flush (in JPA, Hibernate and Ebean) but raw use of Connection is not supported in JPA (currently).

    Disclaimer: I’ve created Ebean at http://www.avaje.org to try and improve JPA (and simplify life).

  19. Is this statement true????

    “Debugging is tough and error handling in hibernate leaves a lot to be desired. If you make mistakes or have a complex database, hooking up hibernate can be a nightmare and cause your project to go way over budget.”

    Mark, I don’t know if this is a proper answer to your question, but here’s something I ran into today … we are converting data from a legacy system into a new system with business objects persisted with Hibernate. The query against the old table, also mapped in Hibernate, returns 26,000 objects … everything is running really slowly: 80 seconds to process 100 entries. Queries are taking forever. The CPU is running at 50% and the DB is hardly breaking a sweat. The solution? Clear the Hibernate cache immediately after the query that brings in the old table’s objects, and clear it every 100 entries written to the DB. Now it takes 8 seconds to process 100 records. The problem is that I would have expected Hibernate to just take care of this for me. It is not obvious (to me) that this needs to be done.
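    The fix described above matches the flush-and-clear pattern from the batch processing chapter of Hibernate’s reference documentation. The toy below (hypothetical `ManagedCache` and `BatchLoader` classes) assumes a flush after every record, for instance triggered by a per-record query, and that each flush inspects every managed entity, which is roughly what dirty checking does. Without clearing, that work grows with the number of records already loaded.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a persistence context: every persisted entity stays
// managed, and is inspected at every flush, until clear() is called.
class ManagedCache {
    private final Map<Integer, String> managed = new HashMap<>();
    long inspections = 0; // total entity inspections performed by all flushes so far

    void persist(int id, String data) {
        managed.put(id, data);
    }

    // Models dirty checking: a flush looks at every managed entity.
    void flush() {
        inspections += managed.size();
    }

    // Detaches everything, like EntityManager.clear().
    void clear() {
        managed.clear();
    }

    int managedCount() {
        return managed.size();
    }
}

class BatchLoader {
    // Loads `records` rows, flushing after each one (assumed: a per-record query triggers auto-flush).
    // When clearEvery > 0 the cache is cleared every clearEvery records, mirroring
    // the em.flush(); em.clear(); pattern used for batch processing.
    static ManagedCache load(int records, int clearEvery) {
        ManagedCache cache = new ManagedCache();
        for (int i = 1; i <= records; i++) {
            cache.persist(i, "row " + i);
            cache.flush();
            if (clearEvery > 0 && i % clearEvery == 0) {
                cache.clear();
            }
        }
        return cache;
    }
}
```

    With 1000 records the model performs 500,500 inspections without clearing and 50,500 when clearing every 100 records.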

    I don’t necessarily agree with the thrust of this post but I think it and its replies are quite educational; so thanks to the author for writing it.

    Nonsense or not, this article (parts 2 and 3 as well) helped us in our project, where we were not considering flush() much. I think the people at hibernate.org should come out with a standard list of the mistakes or misconceptions that a consultant/contractor/developer can have.

    I have used hibernate and frankly there are few hidden monsters. Best of all, the documentation is good and normally you get answers to all your questions. I would consider write-behind to be a feature and not an *issue*.

    In my view, Hibernate is for people who don’t know how to write performant SQL queries or don’t want to invest the time in it. Maintaining and converting a Hibernate project is very painful, in that you have no idea how the queries are executed.

  24. “Hibernate sucks because I don’t understand anything about transaction boundaries”

    hibernate is good stuff if you aren’t too lazy/incompetent to figure it out. Spring’s hibernate template & hibernate transaction manager are a big help.

    I think the points discussed here are worthwhile. I don’t know if hibernate is good or not, but I was trying out something simple like this

    HibernateTemplate ht = getHibernateTemplate();
    ht.save( b );
    ht.flush();

    and this resulted in all those mysterious errors that the article talks about.

    I would definitely not prefer Hibernate over simple C++ or over the great ORM capabilities of Ruby on Rails. At least we have complete control of the exceptions and errors in the case of C++ or RoR. In that respect Hibernate sucks: there isn’t even a good post on the net about this problem from the Hibernate team, and above all the team tries to defend itself for its follies.

    I tried to like Hibernate, but I finally gave up. It was really painful to use in my projects. The main reason, I think, is that it is too intrusive. It dictates the way you design your application. It blurs the boundary between database and application. The database schema inevitably leaks into the application’s data model. The result is that if you change either the data model or the database schema, you have to change the other. And because Hibernate’s hbm files are so complicated, changing the database schema makes it painful to update the hbms. Lastly, Hibernate has too many caveats. You have to be very careful when writing beans for Hibernate.

    Instead, I found it was a pleasure to use iBatis, as I had full control over the SQL and my code was much more concise.

    > Instead, I found it was a pleasure to use iBatis, as I had full control over the SQL and my code was much more concise.

    iBatis is a dumb toy.
    Consider the fact that returning autogenerated keys requires ** different ** xml depending on the database (derby VS. sqlserver), what’s the whole point of “shielding” yourself from DB vagaries if each DB has its own XML syntax?

    Now get the spring wannabes in there, and you have a crap wrapper around a crap wrapper around pure JDBC. That’s at least 15 layers of crap!

  28. oh and by the way, EclipseLink kicks Hibernate’s ass up and down the street, and considering the license of EclipseLink is superior and furthermore it’s also JPA ref impl, I think you can safely dump Hibernate.
    Fortunately I never lost any time with Hibernate, but I took a whiff of it a few years ago, and it reeked of wannabe.

  29. Hibernate does suck. If it wants to be a crappy ORM tool, fine, do that. Stop bloating your damn libraries with “features” and CRAP that nobody wants or needs. STOP TRYING TO DO THE DATABASE’S JOB BETTER THAN IT DOES! If a database needs caching to improve performance, ORACLE or whoever is designing the database will determine that. They don’t need some retarded “software improvers” trying to fix things that aren’t broken, causing more problems in the mean time.

    And “jimmy” if you mean to imply that Hibernate is not crap piled upon more crap internally (take a look at the awesome source code), you are either delusional or you have special needs.

    Hibernate is for lazy developers who think they can avoid work by using some crap ORM tool. They fail to realize just how much time it costs them getting the ORM crap to do what it’s supposed to do in the first place. And by the time you’ve learned it and become good with it–there’s a new version out and you’re a Hibernate tard again. If they had just focused on simplicity and sat down to do the work in the first place, they’d have been done before they reach the point where they discover new “issues” in Hibernate’s caching, logging, performance, etc. AND even the latest college graduate would be able to understand how it works.

    Hibernate = job security perhaps? That’s about it.

  30. John Newman: “hibernate is good stuff if you aren’t too lazy/incompetent to figure it out. Spring’s hibernate template & hibernate transaction manager are a big help.”

    Pathetic. They even need the Spring team’s help getting their “framework” to a tolerable level of usability.

    Have fun figuring out all of those retarded caveats while those idiots work on the next version of Crapbernate, and you have to start all over again. The rest of us will actually be completing our software projects in the mean time. ROFFLE

    > And “jimmy” if you mean to imply that Hibernate is not crap piled upon more crap internally (take a look at the awesome source code), you are either delusional or you have special needs.

    Nope, I certainly did not mean to imply that. I peeked at the hibernate architecture some years ago and rejected it. Currently we are using EclipseLink for some aspects of our project. It has some nice autogenerate features which so far works great on derby, SQLServer and PostgreSQL. The real ‘meat’ of our relational model, however, is ROLAP and for that we’re not using JPA.

  32. I guess the person who is describing the pitfalls of Hibernate does not know he is describing the pitfalls of the Unit of Work pattern as described by Martin Fowler.

    I suggest you read up on Unit of Work pattern as Hibernate carries it out in a great manner.

    In addition to what has already been said about Hibernate, the evil is in dynamic SQL and procedure caching on the database server side. Please have a look at the article

    http://www.sommarskog.se/dynamic_sql.html

    and occasionally run these 2 queries on the SQL Server you use with Hibernate

    select SUM(SC.pagesused)*8192/1024 as [Procedure buffer (execution plans) size Kbytes] from [master].dbo.syscacheobjects SC

    select
    SC.cacheobjtype
    ,SC.objtype
    ,SC.sqlbytes
    ,SC.pagesused
    ,SC.usecounts
    ,SC.[sql]
    from [master].dbo.syscacheobjects SC
    -- where usecounts = 1
    ORDER BY SC.pagesused DESC

    Pay attention to the size of RAM occupied by queries with usecounts = 1 – it is mostly wasted for nothing.
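    A toy model of why single-use plans pile up: a hypothetical `PlanCache` keyed on the exact SQL text, standing in for the server’s procedure cache. Queries with literal values embedded produce a new entry per value, while a parameterized query reuses one entry. Real servers have their own rules (SQL Server can auto-parameterize some simple statements), so treat this as an illustration only.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plan cache keyed by the exact SQL text.
class PlanCache {
    private final Map<String, Integer> useCounts = new HashMap<>();

    // Reuses the cached plan when the text matches exactly, otherwise adds a new entry.
    void execute(String sql) {
        useCounts.merge(sql, 1, Integer::sum);
    }

    int entries() {
        return useCounts.size();
    }

    // Plans that were compiled but used only once (the "usecounts = 1" rows).
    long singleUse() {
        return useCounts.values().stream().filter(c -> c == 1).count();
    }
}
```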

    I have 4 years of experience on professional Hibernate projects for financial applications running in production. I’ve also worked with Oracle PL/SQL for many years. And I must honestly admit: Hibernate sucks. As far as I can remember, the SQL paradigm was invented about 30 years ago or more. Many great brains worked on that. Databases like DB2, Oracle and MS-SQL were built by many, many smart engineers. I don’t see a reason to use a tricky framework of very low quality to do a job that is already very well done by the databases themselves.
    Unfortunately it’s hard to find java job where hibernate is not used. I hope it changes in the future.

  35. followup:
    we ** were ** using JPA. I am now convinced that JPA sucks, hibernate sucks, and pure JDBC is the way to go (EclipseLink sucks too).
    Hibernasty sucks hardcore (screw EclipseLink too).

    Guys, take this advice: Hibernate+Spring really sucks. We are now facing problems because “The wise guy of the company” implemented that crap. He said many things about the benefits.
    -Less Code
    -Faster
    – and more crap that is not true.
    WHY?
    Less code? A: More configuration, more things in memory.
    Faster? A: Take an update of 1000 records. The simple way is one statement (update x set w=f where a = b) …. Hibernate does them one by one.

    If you wanna have a complicated code with many patches…. implement that crap.
