Dec 05, 2006
 

Okay, I’m totally hacked! java.net.URL class officially sucks! The equals method on this shining example of the JDK API mess actually does a blocking DNS lookup to resolve each host string to an IP address and then compares the IP addresses rather than the host strings. What freakin’ sense does that make?

Simple example:

Let’s say these map to these IP addresses:
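Which address each name maps to depends on DNS at the moment you ask. A small helper along these lines (the `HostLookup` name is made up for illustration) reports what a host resolves to on your machine via `InetAddress.getByName`; an IP literal such as `192.0.2.1` is parsed directly, without a lookup:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Optional;

// Hypothetical helper: reports what a host name resolves to right now.
class HostLookup {

    // Returns the first address the resolver hands back, or empty if the
    // name can't be resolved (for example, with the network disconnected).
    // Note: for a host name (not an IP literal) this call blocks on DNS.
    static Optional<String> resolve(String host) {
        try {
            return Optional.of(InetAddress.getByName(host).getHostAddress());
        } catch (UnknownHostException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Pass host names as program arguments.
        for (String host : args) {
            System.out.println(host + " -> " + resolve(host).orElse("(unresolved)"));
        }
    }
}
```

Run it with the same names at different times, or with the network unplugged, and the answers can change.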

Here’s the scary part:
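A minimal sketch of that comparison, using the two hosts that come up in the comments below (`foo.example.com` and `www.example.com`, assumed here to resolve to the same IP address; the class and method names are hypothetical). Comparing the strings as `java.net.URI` values is purely syntactic, while `URL.equals` resolves both hosts first:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical demo class. foo.example.com and www.example.com are assumed
// to resolve to the same IP address.
class SimpleUrlExample {

    static final String FIRST = "http://foo.example.com";
    static final String SECOND = "http://www.example.com";

    // Syntactic comparison of the URI components. No network access.
    static boolean equalAsUris(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    // URL.equals resolves both host names (a blocking DNS lookup) and
    // compares the resulting IP addresses.
    static boolean equalAsUrls(String a, String b) throws MalformedURLException {
        URL u1 = URI.create(a).toURL();
        URL u2 = URI.create(b).toURL();
        return u1.equals(u2);
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println("URI equals: " + equalAsUris(FIRST, SECOND));
        System.out.println("URL equals: " + equalAsUrls(FIRST, SECOND));
    }
}
```

When both names resolve to the same address, the second line prints `true` while the URI comparison prints `false`; what you actually see depends on DNS at the time you run it.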

These two URLs are NOT equal, ever! They could be different web apps, have different failover IPs and even be served by different servers behind a load balancer. At 2AM they might have the same IP, at 2PM they might have different ones. They might be in different states or different countries. And here is the kicker! URL’s equals method isn’t even consistent, which the equals contract requires! If you call it with an Internet connection you might get a different result than if you call it without one! Seriously! I disconnected my Ethernet card and got this:
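The flip comes from the host rule in the URL.equals Javadoc (quoted in comment 14): if both hosts resolve, their addresses are compared; if either can’t be resolved, the names are compared ignoring case. Below is a simplified model of that rule, not the JDK source, with a pluggable resolver so the online and offline cases can be shown side by side. The class name and the 192.0.2.x addresses are hypothetical:

```java
import java.util.Map;
import java.util.function.Function;

// Simplified model of the host check documented for URL.equals.
class HostRule {

    // resolver maps a host name to an IP address, or returns null when the
    // name can't be resolved.
    static boolean hostsEquivalent(String h1, String h2, Function<String, String> resolver) {
        if (h1 == null || h2 == null) {
            return h1 == null && h2 == null; // both null counts as equal
        }
        String ip1 = resolver.apply(h1);
        String ip2 = resolver.apply(h2);
        if (ip1 != null && ip2 != null) {
            return ip1.equals(ip2);          // both resolved: compare addresses
        }
        return h1.equalsIgnoreCase(h2);      // otherwise: compare names, ignoring case
    }

    public static void main(String[] args) {
        // Hypothetical DNS answers: both names point at the same address.
        Map<String, String> online = Map.of(
                "foo.example.com", "192.0.2.10",
                "www.example.com", "192.0.2.10");
        Map<String, String> offline = Map.of(); // nothing resolves

        System.out.println(hostsEquivalent("foo.example.com", "www.example.com", online::get));
        System.out.println(hostsEquivalent("foo.example.com", "www.example.com", offline::get));
    }
}
```

With the fake online answers, `main` prints `true`; with the offline resolver the names are compared instead and it prints `false`. The real implementation works on `InetAddress` values, but the branching follows the same documented rule.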

Plus, if you have an Internet connection, URL’s equals method has to do a DNS lookup to compare IP addresses, and hashCode resolves the host too. This is horribly slow and just a plain old bad idea. You can’t use these as keys in Maps or Sets, or really ever call equals, because it can take nearly a second, best case, to resolve a host name, even on a fast connection.
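The usual alternative is `java.net.URI`: its `equals` and `hashCode` are purely syntactic and never touch the network. A sketch with a hypothetical `UriKeys` helper, normalizing paths first so that `/a/../b` and `/b` land on the same key:

```java
import java.net.URI;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: URIs as collection keys. No DNS lookups happen here.
class UriKeys {

    static Set<URI> distinct(String... specs) {
        Set<URI> seen = new HashSet<>();
        for (String spec : specs) {
            // normalize() removes "." and ".." path segments
            seen.add(URI.create(spec).normalize());
        }
        return seen;
    }

    static Map<URI, Integer> countHits(String... specs) {
        Map<URI, Integer> hits = new HashMap<>();
        for (String spec : specs) {
            hits.merge(URI.create(spec).normalize(), 1, Integer::sum);
        }
        return hits;
    }
}
```

URI equality is still only syntactic: `http://example.com` (empty path) and `http://example.com/` are different keys, so normalize consistently if that matters.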

This code took over 4 minutes to execute:
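When the URLs come from another library, one way to keep code like that off the network is to convert each one with `URL.toURI()`, which does not resolve the host, before it goes into a hash-based collection. A minimal sketch; `UrlDedup` and its methods are hypothetical names:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper: de-duplicate URLs by their URI form, so hashing and
// equality never go through URL.hashCode/URL.equals (and never hit DNS).
class UrlDedup {

    // Builds a URL without the deprecated URL(String) constructor.
    static URL url(String spec) {
        try {
            return URI.create(spec).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(spec, e);
        }
    }

    static Set<URI> asUris(Collection<URL> urls) {
        Set<URI> result = new LinkedHashSet<>(); // keeps first-seen order
        for (URL url : urls) {
            try {
                result.add(url.toURI()); // syntactic conversion; no host lookup
            } catch (URISyntaxException e) {
                throw new IllegalArgumentException("Not a valid URI: " + url, e);
            }
        }
        return result;
    }
}
```

Any code path that still calls `hashCode` or `equals` on the `URL` objects themselves (for example, a `HashSet<URL>`) will trigger the lookups again.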

Argh! This class sucks and I refuse to ever use it again. I’ll always use URI from now on since it doesn’t suck.
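In practice “always use URI” can look like the sketch below: a value type whose identity is a `URI`, converting to a `URL` only at the moment a connection is opened. `WebResource` is a hypothetical name, and the example assumes Java 16+ for records:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.Objects;

// Hypothetical value type: the URI is the identity; a URL is created only
// when a connection is actually needed.
record WebResource(URI uri) {

    WebResource {
        Objects.requireNonNull(uri, "uri");
    }

    static WebResource of(String spec) {
        return new WebResource(URI.create(spec));
    }

    // Network access happens here, explicitly, not inside equals/hashCode.
    InputStream open() throws IOException {
        return uri.toURL().openStream();
    }
}
```

Two `WebResource` values are equal exactly when their URIs are, so they are safe as map keys, and the only network access is the explicit `open()` call.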

The author tag still says James Gosling, so I ask you Mr. Gosling, what were you thinking?

  40 Responses to “Mr. Gosling – why did you make URL equals suck?!?”

  1. Actually, this would be a nice WTF :) Just imagine putting a collection of URLs into a TreeSet…

  2. Working on it now. hehe

  3. funny shit, Brian – good catch

  4. yup, i’ve run into this a lot w/ more than just URL from the jdk. regex.Pattern comes to mind, although slightly more complex to implement an equals() for such things as

    .*?[a-z|0-9]

    -versus-

    .*?[0-9|a-z]

    but i remain shocked that no one’s done this.

    any type of DOM document equals as well.
    there’s a canonicalizer in apache’s xml-sec suite, but seems like a lot to roll in another dependency just to see if two documents are equal

  5. Reposting my comment from Reddit in case you don’t spot it:

    Wow. That’s monumentally bad for any library, let alone the standard library. Whoever thought that would be a good idea?

    Argh! This class sucks and I refuse to ever use it again. I’ll always use URI from now on since it doesn’t suck.

    Sorry, URI sucks too. Granted, it doesn’t do anything quite so stupid as URL, but it still gives plenty of incorrect results.

  6. “People are encouraged to use URI for parsing and URI comparison, and leave URL class for accessing the URI itself, getting at the protocol handler, interacting with the protocol etc. So, at present, we don’t plan on changing the URL.equals/hashCode behavior and we will leave the bug open until Tiger, when we re-investigate our options.”

    From http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4434494 , 2001.

  7. Ummm…try reading the javadoc for URL:

    http://java.sun.com/j2se/1.5.0/docs/api/java/net/URL.html

    A URL object is not the same as a String object. The equals method does not compare the String portion of the URL.

  8. Why did he make the language suck so that it even matters?

    newtype MyURL = MU URL

    instance Eq MyURL where

  9. Wow, Gruber is taking his Java stalking seriously (re: your incoming Daring Fireball link, which implies that this is the only way to compare URLs in Java). But yeah, it is bizarrely bad the way the URL class thinks nothing of opening network connections, when the natural assumption is that it’s just for storing and validating URLs. Seems like the whole class should be deprecated in favor of URI + HttpClient, or something.

  10. i agree that this is a WTF, but at least it does exactly what the documentation says. on second thought … i guess actually they just wrote the doc so it says exactly what the method does :-)

  11. Umm..if the IP addresses are identical they are in fact the same URL for all intents and purposes and therefore equal assuming you are using the same application, say a browser, to access them which uses a well-known port. Only one app can bind to one port on an IP at a time right?

    So, since you weren’t specifying the port, it probably ‘defaulted’ to port 80 for both making them indeed exactly equivalent. Use either one and you’ll get exactly the same server app on port 80 presumably speaking HTTP since that is the well-known port for that protocol. I agree it could be more clear, but ‘caveat developer’. Read Thine Manual.

    So if one RTFMs on ‘equals’:
    "Two URL objects are equal if they have the same protocol, reference equivalent hosts, have the same port number on the host, and the same file and fragment of the file."

    • You are missing something if you think you can only have one app in this situation. If you are using the URL to send an HTTP request to the server, you include a Host header that specifies the host name you are using from the URL. There are whole data centers full of servers where every site requires that host name to figure out which site to serve, since this is a common way to host multiple web sites off one server and is used by hosting companies extensively. Since it is just a header, it’s easy for certain equipment to route requests to completely different servers based on it as well, perhaps after accepting them on the public IP address that DNS returns.

  12. There is a great bug report for this issue that was filed back in 2001.

    http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4434494

  13. I’d be perfectly fine with something along these lines:

    new IPAddress("http://foo.example.com").equals(new IPAddress("http://www.example.com"))

    However, URL and URI do not equal just because their IPs equal.

    Once again, good catch Brian – Dare’s post (http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=1684521e-3709-41fe-8712-5f9b5fe5cb93) reminded me of this WTF.

  14. Seems reasonable:

    equals

    public boolean equals(Object obj)
    Compares this URL for equality with another object.
    If the given object is not a URL then this method immediately returns false.

    Two URL objects are equal if they have the same protocol, reference equivalent hosts, have the same port number on the host, and the same file and fragment of the file.

    Two hosts are considered equivalent if both host names can be resolved into the same IP addresses; else if either host name can’t be resolved, the host names must be equal without regard to case; or both host names equal to null.

    Since hosts comparison requires name resolution, this operation is a blocking operation.

    Note: The defined behavior for equals is known to be inconsistent with virtual hosting in HTTP.

    Overrides:
    equals in class Object
    Parameters:
    obj – the URL to compare against.
    Returns:
    true if the objects are the same; false otherwise.
    See Also:
    Object.hashCode(), Hashtable

  15. Wow, that’s bad design! A great WTF in fact. Using HttpClient to access resources makes sense. When you have a fairly static URI class that has no such capabilities, having a URL class that basically wraps up URI and HttpClient into one incomprehensible mega-class is just insane! Naming the class URL is just wrong, and implementing equals() to do nslookups and whatnot is such a bad design decision that I can’t even begin to describe it. I’ll end as I started: Wow!

  16. Words. Fail. Me.

  17. I was just screwing with you guys.

  18. Dude, java.net.URL was written for JDK1.0 – and possibly even for the earlier alpha or beta pre-releases.

    The code is 12 years old, and was written with a set-top box system as the target platform, not for a general-purpose cross-platform enterprise computing language. I’d love to see all the can’t-predict-the-future assumptions that you’ve hard-coded into your software over the last 12 years!

    Yes, the class isn’t well written, but unfortunately there’s no way to fix it while still maintaining backwards compatibility.

  19. “””Umm..if the IP addresses are identical they are in fact the same URL for all intents and purposes and therefore equal assuming you are using the same application, say a browser, to access them which uses a well-known port. “””

    That may have been true in the distant past, but not today. A single IP address can be used to host more than one domain. The server differentiates the domain name based on the incoming request’s Host: header.

    http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.23

    Of course, that was introduced in HTTP 1.1, and if the spec for this Java class is really 12 years old then it in fact predates HTTP 1.1. Of course, that in no way excuses making a network call to compare the equality of two URIs.

  20. Screw backwards compatibility. It is long past time to go for a clean break and fix the ton of poorly designed crap that Java 1.x is holding on to.

    And whoever wrote shit like this, not to mention the Date class, should not be allowed anywhere near it.

  21. override equals() if you don’t like it?

  22. URL is final so overriding equals isn’t an available option unfortunately. Besides, this doesn’t work for libraries that instantiate URLs. You would have to wrap every call to libraries that return URLs with a custom URL class, that is if Java was dumb and made URL non-final.

    Would look like this:

    URL url = new BetterURL(myLibrary.getURL());

  23. Ah, fair enough, I didn’t know it was final. I’m not too fond of things being final. Since the linker knows the whole inheritance structure it can infer finality for itself. I wonder if this will ever be relaxed in the API?

  24. Umm..if the IP addresses are identical they are in fact the same URL for all intents and purposes and therefore equal assuming you are using the same application, say a browser, to access them which uses a well-known port. Only one app can bind to one port on an IP at a time right?

    Wrong. I work closely with the IT manager of a small datacenter, and they have as many as 500 websites running on a single external IP address. In other words, this comparison method could say that http://foo.net is equal to http://bar.org.au, simply because both of those domains are hosted with the same company.

  26. Not only is it wrong, but it can be very slow. It goes to DNS to do equals or hashCode. Not only that, it appears that “file” URLs do a host lookup too. Plus the boot class loader seems to rely on this to load JAR files.

  27. Hello
    I agree, the code is 12 years old, and was written with a set-top box system as the target platform. Override equals() if you don’t like it?

    Regards,
    Alex Bell

  28. URL is final, so unless you compile in some AOP, you can’t do much with it.

  29. Yes Brian, it is final. BUT you can write your own myEquals() and iterator instead can you not?

    And to those saying that the writer should summarily be sacked, I’d love to see your code :) Stop crying about 12-year-old technology and do something about it. Damn programmers are so lazy nowadays that if it isn’t written for them they can’t do it. All lazy programmers want to do is call APIs and expect them to work the way THEY want.

  30. @Alan,

    Okay, first off, I’m not crying. I’m pointing out an issue for others to be careful of. I’m perfectly capable of fixing it and offer a suggestion in the post about how to avoid URL – namely always use java.net.URI.

    Secondly, you’re not really attacking my code are you? That’s a bad move. My code is available in many open source projects and if you bothered to look around and check out some of my code, you’d see that it is not only well written and tested, but usually not prone to horrendous issues like java.net.URL. Probably should do some research before you post next time.

  33. Not only do you have the issue of equals returning true for “http://foo.example.com” and “http://www.example.com” when it should return false, the opposite also occurs. If you are doing load balancing with round-robin DNS, there is a good chance that two URL objects for “http://www.mysite.com” compare as not equal when they should be equal, because each lookup can return a different IP.

  35. […] “Mr. Gosling — why did you make url equals suck?” explains one such problem. Just get in the habit of using java.net.URI instead. […]
