Mr. Gosling - why did you make URL equals suck?!?
Okay, I’m totally hacked! java.net.URL class officially sucks! The equals method on this shining example of the JDK API mess actually does a blocking DNS lookup on the host string to resolve to an IP address and then compares the IP addresses rather than the host string. What freakin’ sense does that make?
Simple example:
URL url1 = new URL("http://foo.example.com");
URL url2 = new URL("http://example.com");
Let’s say these map to these IP addresses:
http://foo.example.com => 245.10.10.1
http://example.com => 245.10.10.1
Here’s the scary part:
url1.equals(url2) => true!
These two URLs are NOT equal, ever! They could be different web apps, have different failover IPs and even sit on different servers behind a load balancer. At 2AM they might have the same IP, at 2PM they might have different ones. They might be in different states or different countries. And here is the kicker! URL’s equals method isn’t consistent! If you call it with an Internet connection you might get a different result than if you call it without one! Seriously! I disconnected my ethernet card and got this:
url1.equals(url2) => false!
Plus, if you have an Internet connection you have to do a DNS lookup in the equals method of URL to compare IP addresses. This is horribly slow and just a plain old bad idea. You can’t use these in Maps, Sets, Sorted anything or just ever call equals because it takes nearly a second best case to resolve the host name, even on a fast connection.
This code took over 4 minutes to execute:
Set<URL> excludeURLs = ...; // approximately 20
List<URL> testURLs = ...; // approximately 400
for (URL url : testURLs) {
    if (excludeURLs.contains(url)) {
        continue;
    }
    doSomeWork(url);
}
Argh! This class sucks and I refuse to ever use it again. I’ll always use URI from now on since it doesn’t suck.
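For what it's worth, here is a rough sketch of the same exclude-list loop built on java.net.URI instead. The class name UriExcludeFilter, the filter method and the sample addresses are made up for illustration. URI.equals and URI.hashCode work on the parsed components only, so the set lookup should never touch the network.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the JDK.
class UriExcludeFilter {

    // Returns the candidates that are not in the exclude set.
    // Set.contains relies on URI.hashCode/equals, which do no DNS lookups.
    static List<URI> filter(Set<URI> exclude, List<URI> candidates) {
        List<URI> kept = new ArrayList<>();
        for (URI uri : candidates) {
            if (exclude.contains(uri)) {
                continue;
            }
            kept.add(uri);
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<URI> exclude = new HashSet<>(Arrays.asList(
                URI.create("http://example.com/")));
        List<URI> candidates = Arrays.asList(
                URI.create("http://foo.example.com/"),
                URI.create("http://example.com/"));
        // Only the URI that is not excluded survives the filter.
        System.out.println(filter(exclude, candidates));
    }
}
```

One caveat: URI.equals is a component-by-component comparison, so "http://example.com" and "http://example.com/" count as different URIs unless you normalize them yourself first.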
The author tag still says James Gosling, so I ask you Mr. Gosling, what were you thinking?
December 6th, 2006 at 1:15 pm
Actually, this would be a nice WTF :) Just imagine putting a collection of URLs into a TreeSet…
December 6th, 2006 at 3:39 pm
Working on it now. hehe
December 6th, 2006 at 11:17 pm
funny shit, Brian - good catch
December 7th, 2006 at 9:13 am
yup, i’ve run into this a lot w/ more than just URL from the jdk. regex.Pattern comes to mind, although slightly more complex to implement an equals() for such things as
.*?[a-z|0-9]
-versus-
.*?[0-9|a-z]
but i remain shocked that no one’s done this.
any type of DOM document equals as well.
there’s a canonicalizer in apache’s xml-sec suite, but seems like a lot to roll in another dependency just to see if two documents are equal
January 31st, 2007 at 12:53 pm
Reposting my comment from Reddit in case you don’t spot it:
Sorry, URI sucks too. Granted, it doesn’t do anything quite so stupid as URL, but it still gives plenty of incorrect results.
January 31st, 2007 at 1:30 pm
“People are encouraged to use URI for parsing and URI comparison, and leave URL class for accessing the URI itself, getting at the protocol handler, interacting with the protocol etc. So, at present, we don’t plan on changing the URL.equals/hashCode behavior and we will leave the bug open until Tiger, when we re-investigate our options.”
From http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4434494 , 2001.
January 31st, 2007 at 1:49 pm
Ummm…try reading the javadoc for URL:
http://java.sun.com/j2se/1.5.0/docs/api/java/net/URL.html
A URL object is not the same as a String object. The equals method does not compare the String portion of the URL.
January 31st, 2007 at 5:41 pm
Why did he make the language suck so that it even matters?
newtype MyURL = MU URL
instance Eq MyURL where
…
January 31st, 2007 at 5:44 pm
Wow, Gruber is taking his Java stalking seriously (re: your incoming Daring Fireball link, which implies that this is the only way to compare URLs in Java). But yeah, it is bizarrely bad the way the URL class thinks nothing of opening network connections, when the natural assumption is that it’s just for storing and validating URLs. Seems like the whole class should be deprecated in favor of URI + HttpClient, or something.
January 31st, 2007 at 6:24 pm
i agree that this is a WTF, but at least it does exactly what the documentation says. on second thought … i guess actually they just wrote the doc so it says exactly what the method does :-)
January 31st, 2007 at 6:51 pm
Umm..if the IP addresses are identical they are in fact the same URL for all intents and purposes and therefore equal assuming you are using the same application, say a browser, to access them which uses a well-known port. Only one app can bind to one port on an IP at a time right?
So, since you weren’t specifying the port, it probably ‘defaulted’ to port 80 for both making them indeed exactly equivalent. Use either one and you’ll get exactly the same server app on port 80 presumably speaking HTTP since that is the well-known port for that protocol. I agree it could be more clear, but ‘caveat developer’. Read Thine Manual.
So if one RTFMs on ‘equals’:
"Two URL objects are equal if they have the same protocol, reference equivalent hosts, have the same port number on the host, and the same file and fragment of the file."
January 31st, 2007 at 6:52 pm
There is a great bug report for this issue that was filed back in 2001.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4434494
January 31st, 2007 at 7:55 pm
I’d be perfectly fine with something along these lines:
new IPAddress("http://foo.example.com").equals(new IPAddress("http://www.example.com"))
However, URL and URI do not equal just because their IPs equal.
Once again, good catch Brian - Dare’s post (http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=1684521e-3709-41fe-8712-5f9b5fe5cb93) reminded me of this WTF.
January 31st, 2007 at 9:38 pm
Seems reasonable:
equals
public boolean equals(Object obj)
Compares this URL for equality with another object.
If the given object is not a URL then this method immediately returns false.
Two URL objects are equal if they have the same protocol, reference equivalent hosts, have the same port number on the host, and the same file and fragment of the file.
Two hosts are considered equivalent if both host names can be resolved into the same IP addresses; else if either host name can’t be resolved, the host names must be equal without regard to case; or both host names equal to null.
Since hosts comparison requires name resolution, this operation is a blocking operation.
Note: The defined behavior for equals is known to be inconsistent with virtual hosting in HTTP.
Overrides:
equals in class Object
Parameters:
obj - the URL to compare against.
Returns:
true if the objects are the same; false otherwise.
See Also:
Object.hashCode(), Hashtable
February 1st, 2007 at 5:14 am
Wow, that’s bad design! A great WTF in fact. Using HttpClient to access resources makes sense. When you have a fairly static URI class that has no such capabilities, having a URL class which basically wraps up URI and HttpClient into one incomprehensible mega-class is just insane! Naming the class URL is just wrong, and implementing equals() to do nslookups and whatnot is such a bad design decision that I can’t even begin to describe it. I’ll end as I started: Wow!
February 1st, 2007 at 7:06 am
Words. Fail. Me.
February 1st, 2007 at 8:54 am
I was just screwing with you guys.
February 1st, 2007 at 10:58 am
Dude, java.net.URL was written for JDK1.0 - and possibly even for the earlier alpha or beta pre-releases.
The code is 12 years old, and was written with a set-top box system as the target platform, not for a general-purpose cross-platform enterprise computing language. I’d love to see all the can’t-predict-the-future assumptions that you’ve hard-coded into your software over the last 12 years!
Yes, the class isn’t well written, but unfortunately there’s no way to fix it while still maintaining backwards compatibility.
February 1st, 2007 at 9:29 pm
"Umm..if the IP addresses are identical they are in fact the same URL for all intents and purposes and therefore equal assuming you are using the same application, say a browser, to access them which uses a well-known port."
That may have been true in the distant past, but not today. A single IP address can be used to host more than one domain. The server tells the domains apart using the incoming request’s Host: header.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.23
Of course, that was introduced in HTTP 1.1, and if the spec for this Java class is really 12 years old then it in fact predates HTTP 1.1. Of course, that in no way excuses making a network call to compare the equality of two URIs.
February 2nd, 2007 at 2:26 am
Screw backwards compatibility. It is long past time to go for a clean break and fix the ton of poorly designed crap that Java 1.x is holding on to.
And whoever wrote shit like this, not to mention the Date class, should not be allowed anywhere near it.
February 2nd, 2007 at 11:35 am
override equals() if you don’t like it?
February 2nd, 2007 at 11:51 am
URL is final so overriding equals isn’t an available option unfortunately. Besides, this doesn’t work for libraries that instantiate URLs. You would have to wrap every call to libraries that return URLs with a custom URL class, that is if Java was dumb and made URL non-final.
Would look like this:
URL url = new BetterURL(myLibrary.getURL());
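Since URL is final, the closest workable thing is probably composition rather than subclassing. Below is a minimal sketch of that idea. UrlKey and its of() factory are hypothetical names, and it assumes that comparing the external string form is the equality you actually want. URL.toExternalForm() only reassembles the parsed pieces, so equals and hashCode here should not hit DNS.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical wrapper: a value object that is safe to use as a Map or Set key.
final class UrlKey {
    private final URL url;
    private final String text; // cached string form, used for equals/hashCode

    UrlKey(URL url) {
        this.url = url;
        this.text = url.toExternalForm();
    }

    // Convenience factory that turns the checked exception into an unchecked one.
    static UrlKey of(String spec) {
        try {
            return new UrlKey(new URL(spec));
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + spec, e);
        }
    }

    // The wrapped URL, for handing back to libraries that need one.
    URL url() {
        return url;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof UrlKey && text.equals(((UrlKey) other).text);
    }

    @Override
    public int hashCode() {
        return text.hashCode();
    }

    @Override
    public String toString() {
        return text;
    }
}
```

Usage would be something like new UrlKey(myLibrary.getURL()) wherever the URL is about to go into a collection. The comparison is a plain string match, so it is stricter than URL.equals: a difference in host case or an explicit default port makes two keys unequal.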
February 2nd, 2007 at 6:34 pm
Ah, fair enough, I didn’t know it was final. I’m not too fond of things being final. Since the linker knows the whole inheritance structure it can infer finality for itself. I wonder if this will ever be relaxed in the API?
February 4th, 2007 at 1:03 am
Wrong. I work closely with the IT manager of a small datacenter, and they have as many as 500 websites running on a single external IP address. In other words, this comparison method could say that http://foo.net is equal to http://bar.org.au, simply by both of those domains being hosted with the same company.
May 28th, 2007 at 9:14 am
Not only is it wrong, but it can be very slow. It goes to DNS to do equals or hashCode. Not only that, it appears that “file” URLs do a host lookup too. Plus the boot class loader seems to rely on this to load JAR files.
October 17th, 2007 at 1:26 am
Hello
I agree. The code is 12 years old, and was written with a set-top box system as the target platform. Override equals() if you don’t like it?
Regards,
Alex Bell
October 29th, 2007 at 1:32 pm
URL is final, so unless you compile in some AOP, you can’t do much with it.
April 16th, 2008 at 9:36 am
Yes Brian, it is final. BUT you can write your own myEquals() and iterator instead can you not?
And to those saying that the writer should summarily be sacked, I’d love to see your code :) Stop crying about 12 year old technology and do something about it. Damn programmers are so lazy nowadays that if it isn’t written for them they can’t do it. All lazy programmers want to do is call APIs and expect them to work the way THEY want.
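A hand-rolled comparison in the spirit of the myEquals() idea above might look roughly like this. UrlCompare, sameUrl, effectivePort and parse are made-up names. The sketch assumes you want the javadoc's component rules (protocol, host, port, file, fragment) minus the name resolution, with hosts compared as case-insensitive strings.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Objects;

// Hypothetical helper, not part of the JDK.
final class UrlCompare {
    private UrlCompare() {
    }

    // Compares two URLs component by component without resolving host names.
    static boolean sameUrl(URL a, URL b) {
        return a.getProtocol().equalsIgnoreCase(b.getProtocol())
                && a.getHost().equalsIgnoreCase(b.getHost())
                && effectivePort(a) == effectivePort(b)
                && Objects.equals(a.getFile(), b.getFile())
                && Objects.equals(a.getRef(), b.getRef());
    }

    // getPort() returns -1 when no port was written; fall back to the protocol default.
    static int effectivePort(URL u) {
        return u.getPort() != -1 ? u.getPort() : u.getDefaultPort();
    }

    // Small convenience for building URLs without a checked exception.
    static URL parse(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a valid URL: " + spec, e);
        }
    }
}
```

This only helps in code you control. Anything that calls URL.equals or URL.hashCode internally, such as a HashSet of URLs, still goes through the original methods.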
April 16th, 2008 at 10:34 am
@Alan,
Okay, first off, I’m not crying. I’m pointing out an issue for others to be careful of. I’m perfectly capable of fixing it and offer a suggestion in the post about how to avoid URL - namely always use java.net.URI.
Secondly, you’re not really attacking my code are you? That’s a bad move. My code is available in many open source projects and if you bothered to look around and check out some of my code, you’d see that it is not only well written and tested, but usually not prone to horrendous issues like java.net.URL. Probably should do some research before you post next time.
May 2nd, 2008 at 12:53 am
how to make two objects equal in java