Okay, I’m totally hacked! java.net.URL class officially sucks! The equals method on this shining example of the JDK API mess actually does a blocking DNS lookup on the host string to resolve to an IP address and then compares the IP addresses rather than the host string. What freakin’ sense does that make?
Simple example:
URL url1 = new URL("http://foo.example.com");
URL url2 = new URL("http://example.com");
Let’s say these map to these IP addresses:
http://foo.example.com => 245.10.10.1
http://example.com => 245.10.10.1
Here’s the scary part:
url1.equals(url2) => true!
These two URLs are NOT equal, ever! They could be different web apps, have different failover IPs and even different servers behind a load balancer. At 2AM they might have the same IP, at 2PM they might have different ones. They might be in different states or different countries. And here is the kicker! URL’s equals method isn’t even consistent! If you call it with an Internet connection you might get a different result than if you call it without one! Seriously! I disconnected my ethernet card and got this:
url1.equals(url2) => false!
Plus, if you have an Internet connection you have to do a DNS lookup in the equals method of URL to compare IP addresses. This is horribly slow and just a plain old bad idea. You can’t use these in Maps, Sets, Sorted anything or just ever call equals because it takes nearly a second best case to resolve the host name, even on a fast connection.
This code took over 4 minutes to execute:
Set<URL> excludeURLs = ...; // approximately 20
List<URL> testURLs = ...; // approximately 400
for (URL url : testURLs) {
    if (excludeURLs.contains(url)) {
        continue;
    }
    doSomeWork(url);
}
Argh! This class sucks and I refuse to ever use it again. I’ll always use URI from now on since it doesn’t suck.
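For what it's worth, here is a minimal sketch of the same exclude check done with java.net.URI. URI.equals and URI.hashCode compare the parsed components as text and never go to DNS. The class name UriExcludes and the method name filter are made up for illustration, not taken from the code above.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class UriExcludes {
    // Returns the candidates that are NOT in the exclude set.
    // Set.contains relies on URI.hashCode/equals, which work on text only.
    static List<URI> filter(Set<URI> excludes, List<URI> candidates) {
        List<URI> kept = new ArrayList<>();
        for (URI uri : candidates) {
            if (!excludes.contains(uri)) {
                kept.add(uri);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<URI> excludes = Set.of(URI.create("http://example.com"));
        List<URI> tests = List.of(
                URI.create("http://foo.example.com"),
                URI.create("http://example.com"));
        // Only the first URI survives, regardless of what the hosts resolve to.
        System.out.println(filter(excludes, tests));
    }
}
```

If a library hands you a URL, url.toURI() gets you a URI without a lookup, though it can throw URISyntaxException for URLs that are not valid URIs.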
The author tag still says James Gosling, so I ask you Mr. Gosling, what were you thinking?
Actually, this would be a nice WTF 🙂 Just imagine putting a collection of URLs into a TreeSet…
LikeLike
Working on it now. hehe
LikeLike
funny shit, Brian – good catch
LikeLike
yup, i’ve run into this a lot w/ more than just URL from the jdk. regex.Pattern comes to mind, although slightly more complex to implement an equals() for such things as
.*?[a-z|0-9]
-versus-
.*?[0-9|a-z]
but i remain shocked that no one’s done this.
any type of DOM document equals as well.
there’s a canonicalizer in apache’s xml-sec suite, but seems like a lot to roll in another dependency just to see if two documents are equal
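To illustrate the Pattern point: java.util.regex.Pattern does not override equals, so two patterns compiled from the same string are still not equal. A rough workaround, assuming textual equality is good enough, is to compare the source string and flags. The helper name sameSource is hypothetical, and this does nothing about semantically equivalent patterns like the two above.

```java
import java.util.regex.Pattern;

class PatternText {
    // True when both patterns were compiled from the same source text and flags.
    // This is a textual check only; it cannot tell that two differently
    // written expressions match the same strings.
    static boolean sameSource(Pattern a, Pattern b) {
        return a.pattern().equals(b.pattern()) && a.flags() == b.flags();
    }

    public static void main(String[] args) {
        Pattern p1 = Pattern.compile(".*?[a-z|0-9]");
        Pattern p2 = Pattern.compile(".*?[a-z|0-9]");
        System.out.println(p1.equals(p2));      // identity comparison inherited from Object
        System.out.println(sameSource(p1, p2)); // textual comparison
    }
}
```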
LikeLike
Reposting my comment from Reddit in case you don’t spot it:
Sorry, URI sucks too. Granted, it doesn’t do anything quite so stupid as URL, but it still gives plenty of incorrect results.
LikeLike
“People are encouraged to use URI for parsing and URI comparison, and leave URL class for accessing the URI itself, getting at the protocol handler, interacting with the protocol etc. So, at present, we don’t plan on changing the URL.equals/hashCode behavior and we will leave the bug open until Tiger, when we re-investigate our options.”
From http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4434494 , 2001.
LikeLike
Ummm…try reading the javadoc for URL:
http://java.sun.com/j2se/1.5.0/docs/api/java/net/URL.html
A URL object is not the same as a String object. The equals method does not compare the String portion of the URL.
LikeLike
Why did he make the language suck so that it even matters?
newtype MyURL = MU URL
instance Eq MyURL where
…
LikeLike
Wow, Gruber is taking his Java stalking seriously (re: your incoming Daring Fireball link, which implies that this is the only way to compare URLs in Java). But yeah, it is bizarrely bad the way the URL class thinks nothing of opening network connections, when the natural assumption is that it’s just for storing and validating URLs. Seems like the whole class should be deprecated in favor of URI + HttpClient, or something.
LikeLike
i agree that this is a WTF, but at least it does exactly what the documentation says. on second thought … i guess actually they just wrote the doc so it says exactly what the method does 🙂
LikeLike
Umm..if the IP addresses are identical they are in fact the same URL for all intents and purposes and therefore equal assuming you are using the same application, say a browser, to access them which uses a well-known port. Only one app can bind to one port on an IP at a time right?
So, since you weren’t specifying the port, it probably ‘defaulted’ to port 80 for both making them indeed exactly equivalent. Use either one and you’ll get exactly the same server app on port 80 presumably speaking HTTP since that is the well-known port for that protocol. I agree it could be more clear, but ‘caveat developer’. Read Thine Manual.
So if one RTFMs on ‘equals’:
"Two URL objects are equal if they have the same protocol, reference equivalent hosts, have the same port number on the host, and the same file and fragment of the file."
LikeLike
You are missing something thinking you can only have one app in this situation. If you are using the URL to send an HTTP request to the server, you include a HOST header that specifies the host name you are using from the URL. There are whole data centers full of servers where every site requires that host name to figure out which site to serve since this is a common way to host multiple web sites off one server and is used by hosting companies extensively. Since it is just a header, it’s easy for certain equipment to also send it to completely different servers based on it as well, perhaps after answering to the public IP address DNS gives.
LikeLike
There is a great bug report for this issue that was filed back in 2001.
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4434494
LikeLike
I’d be perfectly fine with something along these lines:
new IPAddress("http://foo.example.com").equals(new IPAddress("http://www.example.com"))
However, URL and URI do not equal just because their IPs equal.
Once again, good catch Brian – Dare’s post (http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=1684521e-3709-41fe-8712-5f9b5fe5cb93) reminded me of this WTF.
LikeLike
Seems reasonable:
equals
public boolean equals(Object obj)
Compares this URL for equality with another object.
If the given object is not a URL then this method immediately returns false.
Two URL objects are equal if they have the same protocol, reference equivalent hosts, have the same port number on the host, and the same file and fragment of the file.
Two hosts are considered equivalent if both host names can be resolved into the same IP addresses; else if either host name can’t be resolved, the host names must be equal without regard to case; or both host names equal to null.
Since hosts comparison requires name resolution, this operation is a blocking operation.
Note: The defined behavior for equals is known to be inconsistent with virtual hosting in HTTP.
Overrides:
equals in class Object
Parameters:
obj – the URL to compare against.
Returns:
true if the objects are the same; false otherwise.
See Also:
Object.hashCode(), Hashtable
LikeLike
Wow, that’s bad design! A great WTF in fact. Using HttpClient to access resources makes sense. When you have a fairly static URI class that has no such capabilities, having a URL class which basically wraps up URI and HttpClient into one incomprehensible mega-class is just insane! Naming the class URL is just wrong and implementing equals() to do nslookups and whatnot is such a bad design decision that I can’t even begin to describe it. I’ll end as I started: Wow!
LikeLike
Words. Fail. Me.
LikeLike
I was just screwing with you guys.
LikeLike
Dude, java.net.URL was written for JDK1.0 – and possibly even for the earlier alpha or beta pre-releases.
The code is 12 years old, and was written with a set-top box system as the target platform, not for a general-purpose cross-platform enterprise computing language. I’d love to see all the can’t-predict-the-future assumptions that you’ve hard-coded into your software over the last 12 years!
Yes, the class isn’t well written, but unfortunately there’s no way to fix it while still maintaining backwards compatibility.
LikeLike
“””Umm..if the IP addresses are identical they are in fact the same URL for all intents and purposes and therefore equal assuming you are using the same application, say a browser, to access them which uses a well-known port. “””
That may have been true in the distant past,
but not today. A single IP address can be used to host more than one domain. The server differentiates the domain name based on the incoming request’s Host: header.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.23
Of course, that was introduced in HTTP 1.1, and if the spec for this Java class is really 12 years old then it in fact predates HTTP 1.1. Of course, that in no way excuses making a network call to compare the equality of two URIs.
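As a small illustration of the Host: header point, here is a sketch that builds the text of two HTTP/1.1 requests. Both could be sent to the same IP address, and the only thing telling the server which site is wanted is the Host line. The class and method names are invented for this example.

```java
class HostHeaderDemo {
    // Builds the text of a minimal HTTP/1.1 GET request for the given host and path.
    static String request(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    public static void main(String[] args) {
        // Same path, possibly the same server IP, but two different sites.
        System.out.print(request("foo.example.com", "/"));
        System.out.print(request("example.com", "/"));
    }
}
```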
LikeLike
Screw backwards compatibility. It is long past time to go for a clean break and fix the ton of poorly designed crap that Java 1.x is holding on to.
And whoever wrote shit like this, not to mention the Date class, should not be allowed anywhere near it.
LikeLike
override equals() if you don’t like it?
LikeLike
URL is final so overriding equals isn’t an available option unfortunately. Besides, this doesn’t work for libraries that instantiate URLs. You would have to wrap every call to libraries that return URLs with a custom URL class, that is if Java was dumb and made URL non-final.
Would look like this:
URL url = new BetterURL(myLibrary.getURL());
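Since subclassing is out, the closest thing that does work is composition: wrap the URL in your own key type and compare the text. A minimal sketch, assuming strict textual equality of the external form is what you want (so it treats hosts case-sensitively). UrlKey and its methods are hypothetical names, not anything from the JDK.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

final class UrlKey {
    private final URL url;
    private final String text;

    UrlKey(URL url) {
        this.url = url;
        // toExternalForm() just rebuilds the string; no DNS lookup involved.
        this.text = url.toExternalForm();
    }

    // Convenience factory for the sketch; wraps the checked exception.
    static UrlKey of(String spec) {
        try {
            return new UrlKey(URI.create(spec).toURL());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    URL url() {
        return url;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof UrlKey && text.equals(((UrlKey) o).text);
    }

    @Override
    public int hashCode() {
        return text.hashCode();
    }
}
```

You would still have to wrap every URL a library returns, e.g. new UrlKey(myLibrary.getURL()), before putting it in a Set or Map.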
LikeLike
Ah, fair enough, I didn’t know it was final. I’m not too fond of things being final. Since the linker knows the whole inheritance structure it can infer finality for itself. I wonder if this will ever be relaxed in the API?
LikeLike
Wrong. I work closely with the IT manager of a small datacenter, and they have as many as 500 websites running on a single external IP address. In other words, this comparison method could say that http://foo.net is equal to http://bar.org.au, simply because both of those domains are hosted with the same company.
LikeLike
Not only is it wrong, but it can be very slow. It goes to DNS to do equals or hashCode. Not only that, it appears that “file” URLs do a host lookup too. Plus the boot class loader seems to rely on this to load JAR files.
LikeLike
Hello
I agree. The code is 12 years old, and was written with a set-top box system as the target platform. Override equals() if you don’t like it?
Regards,
Alex Bell
LikeLike
URL is final, so unless you compile in some AOP, you can’t do much with it.
LikeLike
Yes Brian, it is final. BUT you can write your own myEquals() and iterate instead, can you not?
And to those saying that the writer should summarily be sacked, I’d love to see your code 🙂 Stop crying about 12 year old technology and do something about it. Damn programmers are so lazy nowadays that if it isn’t written for them they can’t do it. All lazy programmers want to do is call APIs and expect them to work the way THEY want.
LikeLike
@Alan,
Okay, first off, I’m not crying. I’m pointing out an issue for others to be careful of. I’m perfectly capable of fixing it and offer a suggestion in the post about how to avoid URL – namely always use java.net.URI.
Secondly, you’re not really attacking my code are you? That’s a bad move. My code is available in many open source projects and if you bothered to look around and check out some of my code, you’d see that it is not only well written and tested, but usually not prone to horrendous issues like java.net.URL. Probably should do some research before you post next time.
LikeLike
how to make two objects equal in java
LikeLike
Not only do you have the issue of equals returning true for “http://foo.example.com” and “http://www.example.com” when it should return false, the opposite also occurs. If you are doing load balancing with a round robin DNS, there is a good chance that “http://www.mysite.com” and “http://www.mysite.com” return false when they should return true.
LikeLike
None can doubt the veracity of this article.
LikeLike
When I initially commented I seem to have clicked the -Notify me when
new comments are added- checkbox and now every time a comment is added I
receive 4 emails with the exact same comment. Perhaps there is an easy method you can remove me from that service?
Thank you!
LikeLike
Maybe he wanted the class like him.
See also java.util.Date
LikeLike
looks like misunderstanding.
handler.equals is where comparison happens.
so if you don’t like default behavior, just use public URL(URL context, String spec, URLStreamHandler handler)
URLStreamHandler can be easily overridden, namely its hostsEqual or sameFile.
LikeLike
Not necessarily a misunderstanding. Sure you can override the default behavior, but it is the default behavior that I was talking about. The default behavior has some dangerous behavior.
If you are going to use your own URLStreamHandler, I would suggest changing the URLStreamHandlerFactory that URL uses to build the URLStreamHandlers if they aren’t provided to the constructor. This will ensure that you are safe everywhere, including in libraries and frameworks.
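Here is a rough sketch of the handler approach, under the assumption that you only need the URL objects for comparison. It overrides hostsEqual to compare host names as text and also overrides hashCode, since the default URLStreamHandler.hashCode resolves the host too. TextualHostHandler and parse are made-up names, and this particular handler refuses to open connections, so treat it as a demonstration rather than a drop-in replacement.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

class TextualHostHandler extends URLStreamHandler {
    @Override
    protected URLConnection openConnection(URL u) throws IOException {
        // Assumption for this sketch: the handler is used for comparison only.
        throw new IOException("comparison-only handler, cannot connect to " + u);
    }

    @Override
    protected boolean hostsEqual(URL u1, URL u2) {
        // Compare host names as text, ignoring case, with no name resolution.
        String h1 = u1.getHost();
        String h2 = u2.getHost();
        return h1 == null ? h2 == null : h1.equalsIgnoreCase(h2);
    }

    @Override
    protected int hashCode(URL u) {
        // Hash the same components that equals looks at, without touching DNS.
        int h = 17;
        h = 31 * h + (u.getProtocol() == null ? 0 : u.getProtocol().toLowerCase().hashCode());
        h = 31 * h + (u.getHost() == null ? 0 : u.getHost().toLowerCase().hashCode());
        h = 31 * h + u.getPort();
        h = 31 * h + (u.getFile() == null ? 0 : u.getFile().hashCode());
        h = 31 * h + (u.getRef() == null ? 0 : u.getRef().hashCode());
        return h;
    }

    // Builds a URL bound to this handler via the three-argument constructor.
    static URL parse(String spec) {
        try {
            return new URL(null, spec, new TextualHostHandler());
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }
}
```

With this, TextualHostHandler.parse("http://foo.example.com") and TextualHostHandler.parse("http://example.com") compare as different no matter what they resolve to. Note that newer JDKs mark the URL constructors as deprecated, so expect a compiler warning.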
LikeLike