Mr. Gosling – why did you make URL equals suck?!?

Okay, I’m totally hacked! java.net.URL class officially sucks! The equals method on this shining example of the JDK API mess actually does a blocking DNS lookup on the host string to resolve to an IP address and then compares the IP addresses rather than the host string. What freakin’ sense does that make?

Simple example:

URL url1 = new URL("http://foo.example.com");
URL url2 = new URL("http://example.com");

Let’s say these map to these IP addresses:

http://foo.example.com => 245.10.10.1
http://example.com => 245.10.10.1

Here’s the scary part:

url1.equals(url2) => true!

These two URLs are NOT equal, ever! They could be different web apps, have different failover IPs and even sit on different servers behind a load balancer. At 2AM they might have the same IP, at 2PM they might have different ones. They might be in different states or different countries. And here is the kicker! URL’s equals method isn’t even consistent! If you call it with an Internet connection you might get a different result than if you call it without one! Seriously! I disconnected my ethernet card and got this:

url1.equals(url2) => false!

Plus, if you have an Internet connection, the equals method of URL has to do a DNS lookup to compare IP addresses. This is horribly slow and just a plain old bad idea. You can’t use these in Maps, Sets, sorted anything, or just ever call equals, because it takes nearly a second best case to resolve the host name, even on a fast connection.

This code took over 4 minutes to execute:

Set<URL> excludeURLs = ...; // approximately 20
List<URL> testURLs = ...; // approximately 400

for (URL url : testURLs) {
    if (excludeURLs.contains(url)) {
        continue;
    }

    doSomeWork(url);
}

Argh! This class sucks and I refuse to ever use it again. I’ll always use URI from now on since it doesn’t suck.
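
As a rough sketch of that switch, the exclusion loop can be written against java.net.URI instead. The class name UriExcludeDemo, the filter helper and the sample addresses are made up for illustration; the point is only that URI's equals and hashCode compare the parsed components as text and never touch the network.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class: filters a list of URIs against an exclusion set.
class UriExcludeDemo {

    // Returns the candidates that are not in the exclusion set.
    // Set.contains uses URI.hashCode/equals, which are purely textual.
    static List<URI> filter(Set<URI> exclude, List<URI> candidates) {
        List<URI> kept = new ArrayList<>();
        for (URI uri : candidates) {
            if (exclude.contains(uri)) {
                continue;
            }
            kept.add(uri);
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<URI> exclude = new HashSet<>();
        exclude.add(URI.create("http://example.com/private"));

        List<URI> candidates = List.of(
                URI.create("http://example.com/private"),      // excluded: same text
                URI.create("http://EXAMPLE.com/private"),      // excluded: host case is ignored
                URI.create("http://foo.example.com/private")); // kept: different host name

        System.out.println(filter(exclude, candidates));
    }
}
```

Keep in mind that this is still a textual comparison: http://example.com and http://example.com:80 are different URIs as far as equals is concerned, so normalize first if that matters to you.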

The author tag still says James Gosling, so I ask you Mr. Gosling, what were you thinking?

45 thoughts on “Mr. Gosling – why did you make URL equals suck?!?”

  1. yup, i’ve run into this a lot w/ more than just URL from the jdk. regex.Pattern comes to mind, although slightly more complex to implement an equals() for such things as

    .*?[a-z|0-9]

    -versus-

    .*?[0-9|a-z]

    but i remain shocked that no one’s done this.

    any type of DOM document equals as well.
    there’s a canonicalizer in apache’s xml-sec suite, but seems like a lot to roll in another dependency just to see if two documents are equal


  2. Reposting my comment from Reddit in case you don’t spot it:

    Wow. That’s monumentally bad for any library, let alone the standard library. Whoever thought that would be a good idea?

    Argh! This class sucks and I refuse to ever use it again. I’ll always use URI from now on since it doesn’t suck.

    Sorry, URI sucks too. Granted, it doesn’t do anything quite so stupid as URL, but it still gives plenty of incorrect results.


  3. Wow, Gruber is taking his Java stalking seriously (re: your incoming Daring Fireball link, which implies that this is the only way to compare URLs in Java). But yeah, it is bizarrely bad the way the URL class thinks nothing of opening network connections, when the natural assumption is that it’s just for storing and validating URLs. Seems like the whole class should be deprecated in favor of URI + HttpClient, or something.


  4. i agree that this is a WTF, but at least it does exactly what the documentation says. on second thought … i guess actually they just wrote the doc so it says exactly what the method does 🙂


  5. Umm..if the IP addresses are identical they are in fact the same URL for all intents and purposes and therefore equal assuming you are using the same application, say a browser, to access them which uses a well-known port. Only one app can bind to one port on an IP at a time right?

    So, since you weren’t specifying the port, it probably ‘defaulted’ to port 80 for both making them indeed exactly equivalent. Use either one and you’ll get exactly the same server app on port 80 presumably speaking HTTP since that is the well-known port for that protocol. I agree it could be more clear, but ‘caveat developer’. Read Thine Manual.

    So if one RTFMs on ‘equals’:
    "Two URL objects are equal if they have the same protocol, reference equivalent hosts, have the same port number on the host, and the same file and fragment of the file."


    1. You are missing something if you think you can only have one app in this situation. If you are using the URL to send an HTTP request to the server, you include a Host header that specifies the host name you are using from the URL. There are whole data centers full of servers where every site requires that host name to figure out which site to serve, since this is a common way to host multiple web sites off one server and is used by hosting companies extensively. Since it is just a header, it’s easy for certain equipment to also route the request to completely different servers based on it, perhaps after answering on the public IP address DNS gives.
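
A minimal sketch of the name-based dispatch described above, assuming a made-up site table and a hypothetical siteFor helper (real servers and load balancers do far more than this, and the port stripping here ignores IPv6 literals). It only shows that the site is chosen from the Host header, not from the IP address the request arrived on.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of name-based virtual hosting on one IP address.
class VirtualHostSketch {

    // Picks a site from the Host header value, ignoring case and any ":port" suffix.
    // Assumption: the header holds a plain host name, not an IPv6 literal.
    static String siteFor(Map<String, String> sites, String hostHeader) {
        String host = hostHeader.trim().toLowerCase(Locale.ROOT);
        int colon = host.lastIndexOf(':');
        if (colon >= 0) {
            host = host.substring(0, colon);
        }
        return sites.getOrDefault(host, "default-site");
    }

    public static void main(String[] args) {
        // Both names could resolve to the same address, yet they select different sites.
        Map<String, String> sites = Map.of(
                "foo.example.com", "site-foo",
                "example.com", "site-main");

        System.out.println(siteFor(sites, "foo.example.com"));
        System.out.println(siteFor(sites, "example.com:80"));
    }
}
```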


  6. Seems reasonable:

    equals

    public boolean equals(Object obj)
    Compares this URL for equality with another object.
    If the given object is not a URL then this method immediately returns false.

    Two URL objects are equal if they have the same protocol, reference equivalent hosts, have the same port number on the host, and the same file and fragment of the file.

    Two hosts are considered equivalent if both host names can be resolved into the same IP addresses; else if either host name can’t be resolved, the host names must be equal without regard to case; or both host names equal to null.

    Since hosts comparison requires name resolution, this operation is a blocking operation.

    Note: The defined behavior for equals is known to be inconsistent with virtual hosting in HTTP.

    Overrides:
    equals in class Object
    Parameters:
    obj – the URL to compare against.
    Returns:
    true if the objects are the same; false otherwise.
    See Also:
    Object.hashCode(), Hashtable
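
The host rule quoted above can be sketched as a small function. This is not the JDK source, just an illustration of the documented rule: the resolver is passed in as a parameter (a fake lookup table here, so nothing touches the network), and the names hostsEquivalent, fakeDns and offline are invented for the example.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Illustration of the documented host-equivalence rule with a pluggable resolver.
class HostEquivalenceSketch {

    // Resolver backed by a fixed table; stands in for a working DNS.
    static Function<String, Optional<String>> fakeDns(Map<String, String> table) {
        return host -> Optional.ofNullable(table.get(host));
    }

    // Resolver that never resolves anything; stands in for "no network".
    static Function<String, Optional<String>> offline() {
        return host -> Optional.empty();
    }

    static boolean hostsEquivalent(String h1, String h2,
                                   Function<String, Optional<String>> resolver) {
        Optional<String> a1 = (h1 == null) ? Optional.empty() : resolver.apply(h1);
        Optional<String> a2 = (h2 == null) ? Optional.empty() : resolver.apply(h2);
        if (a1.isPresent() && a2.isPresent()) {
            return a1.get().equals(a2.get());   // both resolve: compare addresses
        }
        if (h1 != null && h2 != null) {
            return h1.equalsIgnoreCase(h2);     // otherwise: compare names, ignoring case
        }
        return h1 == null && h2 == null;        // or both hosts are null
    }

    public static void main(String[] args) {
        Map<String, String> table = Map.of(
                "foo.example.com", "245.10.10.1",
                "example.com", "245.10.10.1");

        // Same inputs, different answers depending on whether names resolve.
        System.out.println(hostsEquivalent("foo.example.com", "example.com", fakeDns(table))); // true
        System.out.println(hostsEquivalent("foo.example.com", "example.com", offline()));      // false
    }
}
```

That is the online/offline flip from the post: the result depends on state outside the two objects being compared.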


  7. Wow, that’s bad design! A great WTF in fact. Using HttpClient to access resources makes sense. When you have a fairly static URI class that has no such capabilities, having a URL class which basically wraps up URI and HttpClient into one incomprehensible mega-class is just insane! Naming the class URL is just wrong and implementing equals() to do nslookups and whatnot is such a bad design decision that I can’t even begin to describe it. I’ll end as I started: Wow!


  8. Dude, java.net.URL was written for JDK1.0 – and possibly even for the earlier alpha or beta pre-releases.

    The code is 12 years old, and was written with a set-top box system as the target platform, not for a general-purpose cross-platform enterprise computing language. I’d love to see all the can’t-predict-the-future assumptions that you’ve hard-coded into your software over the last 12 years!

    Yes, the class isn’t well written, but unfortunately there’s no way to fix it while still maintaining backwards compatibility.


  9. “””Umm..if the IP addresses are identical they are in fact the same URL for all intents and purposes and therefore equal assuming you are using the same application, say a browser, to access them which uses a well-known port. “””

    That may have been true in the distant past, but not today. A single IP address can be used to host more than one domain. The server tells the domains apart based on the incoming request’s Host: header.

    http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.23

    Of course, that was introduced in HTTP 1.1, and if the spec for this Java class is really 12 years old then it in fact predates HTTP 1.1. Of course, that in no way excuses making a network call to compare the equality of two URIs.


  10. Screw backwards compatibility. It is long past time to go for a clean break and fix the ton of poorly designed crap that Java 1.x is holding on to.

    And whoever wrote shit like this, not to mention the Date class, should not be allowed anywhere near it.


  11. URL is final so overriding equals isn’t an available option unfortunately. Besides, this doesn’t work for libraries that instantiate URLs. You would have to wrap every call to libraries that return URLs with a custom URL class, that is if Java was dumb and made URL non-final.

    Would look like this:

    URL url = new BetterURL(myLibrary.getURL());
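
Since subclassing is off the table, one workaround is composition: wrap the URL in a small key class and use that in collections. The sketch below is only an illustration; the UrlKey name and the choice to compare the exact toExternalForm() text (case-sensitive, no normalization) are assumptions, not anything the JDK provides.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical wrapper: equality is based on the URL's text, never on DNS.
final class UrlKey {
    private final URL url;
    private final String text;

    UrlKey(URL url) {
        this.url = Objects.requireNonNull(url);
        this.text = url.toExternalForm(); // plain string form, no lookup involved
    }

    // Convenience factory that hides the checked exception.
    // Note: newer JDKs deprecate the URL constructors; kept here to match the post.
    static UrlKey of(String spec) {
        try {
            return new UrlKey(new URL(spec));
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    URL url() {
        return url;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof UrlKey && text.equals(((UrlKey) o).text);
    }

    @Override
    public int hashCode() {
        return text.hashCode();
    }

    @Override
    public String toString() {
        return text;
    }

    public static void main(String[] args) {
        Set<UrlKey> seen = new HashSet<>();
        seen.add(UrlKey.of("http://example.com/index.html"));

        System.out.println(seen.contains(UrlKey.of("http://example.com/index.html")));     // true
        System.out.println(seen.contains(UrlKey.of("http://foo.example.com/index.html"))); // false
    }
}
```

URLs handed back by a library would be wrapped at the boundary, e.g. new UrlKey(myLibrary.getURL()), and unwrapped with url() when the real object is needed.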


  12. Ah, fair enough, I didn’t know it was final. I’m not too fond of things being final. Since the linker knows the whole inheritance structure it can infer finality for itself. I wonder if this will ever be relaxed in the API?


  13. Umm..if the IP addresses are identical they are in fact the same URL for all intents and purposes and therefore equal assuming you are using the same application, say a browser, to access them which uses a well-known port. Only one app can bind to one port on an IP at a time right?

    Wrong. I work closely with the IT manager of a small datacenter, and they have as many as 500 websites running on a single external IP address. In other words, this comparison method could say that http://foo.net is equal to http://bar.org.au, simply by both of those domains being hosted with the same company.


  14. Not only is it wrong, but it can be very slow. It goes to DNS to do equals or hashCode. Not only that, it appears that “file” URLs do a host lookup too. Plus the boot class loader seems to rely on this to load JAR files.


  15. Hello
    I agree. The code is 12 years old, and was written with a set-top box system as the target platform. Override equals() if you don’t like it?

    Regards,
    Alex Bell


  16. Yes Brian, it is final. BUT you can write your own myEquals() and iterate instead, can you not?

    And to those saying that the writer should summarily be sacked, I’d love to see your code 🙂 Stop crying about 12 year old technology and do something about it. Damn programmers are so lazy nowadays that if it isn’t written for them they can’t do it. All lazy programmers want to do is call APIs and expect them to work the way THEY want.


  17. @Alan,

    Okay, first off, I’m not crying. I’m pointing out an issue for others to be careful of. I’m perfectly capable of fixing it and offer a suggestion in the post about how to avoid URL – namely always use java.net.URI.

    Secondly, you’re not really attacking my code are you? That’s a bad move. My code is available in many open source projects and if you bothered to look around and check out some of my code, you’d see that it is not only well written and tested, but usually not prone to horrendous issues like java.net.URL. Probably should do some research before you post next time.


  18. Hi!

    I would like to improve my SQL skills.
    I have read many SQL books and would like to
    read more about SQL for my work as a DB2 database manager.

    What would you recommend?

    Thanks,
    Werutz


  19. Not only do you have the issue of equals returning true for “http://foo.example.com” and “http://www.example.com” when it should return false, the opposite also occurs. If you are doing load balancing with round-robin DNS, there is a good chance that comparing “http://www.mysite.com” with “http://www.mysite.com” returns false when it should return true.


  20. When I initially commented I seem to have clicked the -Notify me when new comments are added- checkbox and now every time a comment is added I receive 4 emails with the exact same comment. Perhaps there is an easy way you can remove me from that service?
    Thank you!


  21. looks like misunderstanding.
    handler.equals is where comparison happens.
    so if you don’t like default behavior, just use public URL(URL context, String spec, URLStreamHandler handler)
    URLStreamHandler can be easily overridden, namely its hostsEqual or sameFile.


    1. Not necessarily a misunderstanding. Sure you can override the default behavior, but it is the default behavior that I was talking about, and the default is dangerous.

      If you are going to use your own URLStreamHandler, I would suggest changing the URLStreamHandlerFactory that URL uses to build the URLStreamHandlers if they aren’t provided to the constructor. This will ensure that you are safe everywhere, including in libraries and frameworks.
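
A minimal sketch of the custom-handler route discussed in this thread, assuming a hypothetical NoLookupHandler that compares host names as text. hostsEqual and hashCode are overridden together because the default hashCode also resolves the host; connections are simply delegated to a URL built the normal way. The class and method names are invented for the example.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.util.Locale;
import java.util.Objects;

// Hypothetical helper: builds URLs whose equals/hashCode never hit DNS.
class NoLookupUrls {

    static class NoLookupHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL u) throws IOException {
            // Delegate to the JDK's regular handler for this protocol.
            return new URL(u.toExternalForm()).openConnection();
        }

        @Override
        protected boolean hostsEqual(URL u1, URL u2) {
            String h1 = u1.getHost();
            String h2 = u2.getHost();
            return (h1 == null) ? h2 == null : h1.equalsIgnoreCase(h2);
        }

        @Override
        protected int hashCode(URL u) {
            String protocol = u.getProtocol();
            String host = u.getHost();
            return Objects.hash(
                    protocol == null ? null : protocol.toLowerCase(Locale.ROOT),
                    host == null ? null : host.toLowerCase(Locale.ROOT),
                    u.getPort(), u.getFile(), u.getRef());
        }
    }

    // Uses the three-argument constructor mentioned above (deprecated in newer JDKs).
    static URL parse(String spec) {
        try {
            return new URL(null, spec, new NoLookupHandler());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL url1 = parse("http://foo.example.com");
        URL url2 = parse("http://example.com");
        System.out.println(url1.equals(url2)); // false, with or without a network
    }
}
```

This only covers URLs you construct yourself. To catch URLs created inside libraries you would register a factory with URL.setURLStreamHandlerFactory, which can be set at most once per JVM, so treat that as an application-wide decision.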

