Aug 27, 2007
 

The phone screen

Question #1: How would you go about making a copy from a node in a directed graph?

This I had done before for Savant’s dependency tree. I had never distributed this type of operation, but since the operation is simple on a single machine, distributing it seems to be just a matter of dividing up the graph and then sharing information via standard distributed operations like scatter, gather, etc.

Question #2: How would you write a cache that uses an LRU?

This I had also worked on. At Orbitz the hotel rates were stored in a very large cache that used an LRU eviction policy. The LRU implementation actually came from Apache, but the principle was simple. The cache was just a large array addressed with double hashing (for better slot usage). Collisions were immediately replaced, although a bucket system could also be used to prevent throwing away recent cache stores. The LRU itself was just a linked list of pointers to the elements in the hash. When an element was hit, its node in the linked list was removed and appended to the head. This was faster if the hash stored a struct that contained a pointer to the node in the linked list.
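
As a sketch of the hash-plus-linked-list design described above: Java’s `LinkedHashMap` in access-order mode maintains exactly this kind of linked list over its hash entries, so an LRU cache can be built on it directly (the class name here is mine):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache: LinkedHashMap in access-order mode keeps a doubly linked
// list over its hash entries and moves an entry to the tail on every
// access -- the same scheme as the hand-rolled hash + linked list.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // true => order by access, not insertion
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry
    }
}
```

A hand-rolled version would store, alongside each hash entry, a pointer to its list node so the move-to-head is O(1), which is the struct trick mentioned above.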

Question #3: How would you find the median of an array that can’t fit in memory?

This question I had some problems with. I wasn’t fond of theory and numerical computation at school, so I did what was needed and retained very little. Not to say that I couldn’t learn the material, it just didn’t interest me all that much, and to date I’ve never used any of it. Of course, if I was going to work on the search engine code at Google, I would need to brush up. Anyways, I started thinking about slicing the array into segments and then distributing those. Each agent in the grid could further slice and distribute to build a tree. Then each agent would find its median and push that value up to its parent. That is as far as I got, because I didn’t have the math to know if there was an algorithm that uses those medians to find the global median. Well, after the call I busted out the CLR and found that all this stuff is called “selection” algorithms. There is one algorithm that does exactly as I described, but it then takes the “median-of-medians” and partitions the entire list around that value. This could probably be done on the grid, and there was a PDF paper I stumbled across that talked about doing things this way. I’m not sure that is the best call though. After thinking about it more, I wondered if the distributed grid could be a b-tree of sorts that uses the data set’s (e.g. real numbers) median value (always a known quantity if the data set is known) to build the tree. Once the tree was built you would just recursively ask each node for its count and then ask for the ith element, where i = count / 2. Again, I couldn’t really find anything conclusive online to state this was a possible solution.
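
For reference, the single-machine version of the selection algorithm described above can be sketched like this (names are mine; a distributed variant would partition across machines around the same pivot):

```java
import java.util.Arrays;

// Selection: find the i-th smallest element (0-based) in linear time,
// using the median-of-medians-of-groups-of-5 pivot from CLR.
public class Select {
    public static int select(int[] a, int i) {
        return select(a, 0, a.length - 1, i);
    }

    private static int select(int[] a, int lo, int hi, int i) {
        while (true) {
            if (lo == hi) return a[lo];
            int pivot = medianOfMedians(a, lo, hi);
            int p = partition(a, lo, hi, pivot);
            if (i == p) return a[p];
            else if (i < p) hi = p - 1;
            else lo = p + 1;
        }
    }

    // Median of the medians of groups of 5 -- guarantees a good pivot.
    private static int medianOfMedians(int[] a, int lo, int hi) {
        int n = hi - lo + 1;
        if (n <= 5) {
            int[] g = Arrays.copyOfRange(a, lo, hi + 1);
            Arrays.sort(g);
            return g[g.length / 2];
        }
        int[] medians = new int[(n + 4) / 5];
        for (int k = 0; k < medians.length; k++) {
            int gLo = lo + k * 5;
            int gHi = Math.min(gLo + 4, hi);
            int[] g = Arrays.copyOfRange(a, gLo, gHi + 1);
            Arrays.sort(g);
            medians[k] = g[g.length / 2];
        }
        return select(medians, 0, medians.length - 1, medians.length / 2);
    }

    // Lomuto partition around the pivot *value*; returns its final index.
    private static int partition(int[] a, int lo, int hi, int pivot) {
        for (int k = lo; k <= hi; k++) {
            if (a[k] == pivot) { swap(a, k, hi); break; } // park pivot at end
        }
        int store = lo;
        for (int k = lo; k < hi; k++) {
            if (a[k] < pivot) swap(a, k, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```

The median of an n-element set is then `select(a, n / 2)`. In the distributed setting, each machine would partition its own chunk around the chosen pivot and report counts, so the recursion narrows the search the same way.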

Question #4: How would you randomly select N elements from an array of unknown size so that an equal distribution was achieved for infinite selects?

This one I sorta understood and got a very minimal answer to. I started by thinking about selecting N elements from the array. If there were more elements, you continue to select until you have 2N elements. Then you go back and throw out N elements at random. This operation continues until the entire array has been consumed and you now have a set of N elements which are random. The problem is that the last N elements in the array get favored, because they only had the opportunity to be thrown out once, while the first N had the opportunity to be thrown out L / N times, where L is the length. The interviewer then mentioned something about probability and I started thinking about the Java Memory Model (these really have little in common, but that’s where my brain went). The memory model promotes objects into tenure after they have been around for a while. In the same sorta system, as you get to the end of the list the probability of keeping those last N elements is very low. You apply a probability weight to those elements that determines if they are kept or thrown out. Apparently you can solve this sucker by selecting a single element and applying some type of probability function to it and deciding to keep it or not. I again had little luck finding information online about this, and I ended up reading up on Monte Carlo, Markov chains, sigma-algebras and a ton of other stuff, but I have yet to put it all together to make a reasonable guess at the probability function. Since the length of the list is unknown, the function must use the number of elements seen thus far in calculating the probability of keeping an element. And, in order to handle an array of length N, it must always select the first N. Therefore, it must have some mechanism for going back and throwing values out. So, I think I was on the right track, just didn’t get all the way to the end.
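
The probability function being circled here is what is now commonly called reservoir sampling (Knuth’s “Algorithm R”): always keep the first N, then keep the i-th element (0-based) with probability N/(i+1), evicting a random reservoir slot when you do. Every element ends up in the sample with probability N/L without ever knowing L. A sketch (names are mine):

```java
import java.util.Iterator;
import java.util.Random;

// Reservoir sampling: uniform sample of n elements from a stream of
// unknown length, in one pass and O(n) memory.
public class Reservoir {
    public static int[] sample(Iterator<Integer> stream, int n, Random rnd) {
        int[] reservoir = new int[n];
        int seen = 0;
        while (stream.hasNext()) {
            int x = stream.next();
            if (seen < n) {
                reservoir[seen] = x;           // always keep the first n
            } else {
                int j = rnd.nextInt(seen + 1); // uniform in [0, seen]
                if (j < n) reservoir[j] = x;   // keep with probability n/(seen+1)
            }
            seen++;
        }
        return reservoir;
    }
}
```

This matches the constraints worked out above: the first N are always selected, the keep probability shrinks as more elements are seen, and the “throwing out” happens by overwriting a previously kept slot.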

Boulder office

Second round of interviews was at the Boulder office. I interviewed with two guys from there and again things were very specific towards algorithms. The questions I received from these two were:

Question #1: If you have a list of integers and I ask you to find all the pairs in the list whose sum equals X, what is the best way to do this?

This is pretty simple brute force. You can iterate over the list and for each value figure out the value necessary to sum to X. Then just brute force search the list for that value. You can however speed this up with a sorted list by proximity searching. You’ll be able to find the second value faster based on your current position in the list and the values around you. You could also split the list into positive and negative around an index, saving some time. You could hash the values. There are lots of ways to solve this sucker, it just depends on the constraints you have, the size of things and what type of performance you need.
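
As one concrete instance of the hashing option mentioned above, a single pass with a set of already-seen values finds all pairs in O(n) time (a sketch, names mine; duplicate values would report duplicate pairs, which a count map could handle if needed):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// All pairs summing to x: for each value v, check whether its
// complement x - v has already been seen.
public class PairSum {
    public static List<int[]> pairs(int[] values, int x) {
        Set<Integer> seen = new HashSet<>();
        List<int[]> result = new ArrayList<>();
        for (int v : values) {
            if (seen.contains(x - v)) {
                result.add(new int[] { x - v, v });
            }
            seen.add(v);
        }
        return result;
    }
}
```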

I had a few other questions in Boulder, but I can’t for the life of me recall what they were. Nothing too difficult. Just all algorithms and time calculations (big-o).

California office

Last interview was in California and they flew me out for it. I was put up quite a ways from the campus in a slightly seedy motel. Hey, I figured they would Microsoft me (Microsoft put us up at the Willows Lodge for the first MTS, which is a 4/5 star in Washington. Very posh) because they are Google and have a few billion in the bank. I wonder if they did that to throw me off a bit and see how I did under less than optimal conditions, but who knows.

Question #1: Find all the anagrams of a given word

This was the start of my demise. I asked the interviewer why we were going to optimize for the dictionary when it was only a few hundred thousand words. Well, that was the wrong approach for sure, and we started arguing about the size of the common dictionary. He said it was like 1 million words and I said it was more like 150K. Anyways, it doesn’t matter really, but this interviewer and I had a personality conflict, and his accent was so thick I kept having to ask him to repeat himself. I think this is why they didn’t hire me.

Anyways, this problem’s simple. Just sort the letters of each word and use that as the key into a Map whose value is a list of all the words with those letters. Of course, he and I were arguing so much about things by this point that I didn’t get the answer and he had to tell me, but the answer’s simple.
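
A sketch of that sorted-letters index (class name is mine): pre-process the dictionary once, and an anagram lookup becomes one sort plus one map get.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Anagram index: words keyed by their letters in sorted order.
public class Anagrams {
    private final Map<String, List<String>> index = new HashMap<>();

    public Anagrams(Collection<String> dictionary) {
        for (String word : dictionary) {
            index.computeIfAbsent(key(word), k -> new ArrayList<>()).add(word);
        }
    }

    public List<String> anagramsOf(String word) {
        return index.getOrDefault(key(word), Collections.emptyList());
    }

    private static String key(String word) {
        char[] letters = word.toLowerCase().toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }
}
```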

Question #2: If you have an algorithm for selecting candidates that has 10% error and you only select 10% from the available list, how many bad candidates do you select given a list of 100? How can you reduce the number of bad candidates if you can’t fix the error?

I was asked this question by the same interviewer from Question #1 and we were still at odds. Well, I started to solve it and he kinda got annoyed. He got up and started trying to solve it for me (strange). Then he sat there for 3-5 minutes trying and eventually gave up without the solution. The best part was when he said, “just trust me”. I was trying to do the math to solve this thing and he couldn’t even start to do the math and finally gave up. This really tipped me off to the fact that Google has a list of questions that interviewers can pick from and this guy picked one and forgot the solution. He hadn’t ever solved this problem himself, that I’m sure of.

As for my solution, I wanted to see the set sizes reduce as the variables changed. If you have an error of 10%, that means you either throw out 10 good candidates or hire 10 bad candidates from a pool of 100, or some mixture of those (e.g. 5 good 5 bad or 3 good 7 bad). Since error is fixed the only way to reduce the number of bad candidates hired is to reduce the initial set size. You want to reduce 100 to 10 or 5. That way you minimize your error. The issue then is that since error is fixed, over time you still hire the same number of bad candidates. As you repeat the process using a smaller set size, you eventually end up with the same number of bad candidates as you would with the original set size.

So, and this is what I argued with the interviewer, the only real solution with the information provided and without making large assumptions is to reduce the error. You have to somehow fix the problem of 10% error because it doesn’t matter in the long run what the set size is. Of course, he didn’t want to discuss that issue, just wanted me to solve the original problem.
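
The expected-value arithmetic behind that argument can be made explicit (a trivial sketch; the names are mine): with a fixed error rate, the expected number of bad hires depends only on the total number of hiring decisions, not on how they are batched.

```java
// Expected bad hires = error rate * hires per batch * number of batches.
// Shrinking the batch size only spreads the same expected error over
// more batches; it does not reduce it.
public class HiringError {
    public static double expectedBad(double errorRate, int batchSize, int batches) {
        return errorRate * batchSize * batches;
    }
}
```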

Question #3: More discussion of the sum problem

We talked more about the sum problem from the Boulder interview. We discussed reducing the processing time, finding the sums faster and pretty much all the permutations you can think of. It wasn’t too bad and the guy interviewing me for this question was okay. One strange thing was he had a tag-along that was a new hire and the tag-along kept sorta smiling at me. That was really disconcerting. Again, this seemed like they were trying to throw me off or something. This was very unprofessional in my opinion, but whatever.

Question #4: Two color graph problem

This is the old graph coloring problem that everyone gets in school. He wanted some pseudo code for this sucker and I busted out a quick and dirty recursion for it. Of course there were issues in my code, because I only had about 2 minutes to do it. We fixed those and then tried to solve it without recursion. This is always pretty simple to do as long as you can tail it. Just use a linked list and anytime you hit a graph node that has links, add those nodes to the end of the list. Then you can just iterate over the list until there are no more elements. This is how Savant traverses the dependency graph it builds, so I’ve done it before.
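
The non-recursive traversal described above looks roughly like this (a sketch with my own naming; the graph is assumed to be given as adjacency lists):

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Two-coloring without recursion: a work list of nodes to visit,
// appending neighbors and sweeping until the list is empty.
// Returns the coloring, or null if no two-coloring exists (odd cycle).
public class TwoColor {
    public static int[] color(List<List<Integer>> adj) {
        int n = adj.size();
        int[] color = new int[n];
        Arrays.fill(color, -1); // -1 = uncolored
        Deque<Integer> work = new ArrayDeque<>();
        for (int start = 0; start < n; start++) {
            if (color[start] != -1) continue; // handle disconnected graphs
            color[start] = 0;
            work.add(start);
            while (!work.isEmpty()) {
                int node = work.remove();
                for (int next : adj.get(node)) {
                    if (color[next] == -1) {
                        color[next] = 1 - color[node];
                        work.add(next); // add linked nodes to the end of the list
                    } else if (color[next] == color[node]) {
                        return null; // two neighbors share a color: impossible
                    }
                }
            }
        }
        return color;
    }
}
```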

Summary

The interesting point of the entire process of interviewing with Google was that I was never asked a single question that was not algorithmic. I’ve talked to dozens of other folks that have interviewed and they got questions about design, classes, SQL, modeling and loads of other areas. I think I understand why they were asking me these questions: ego. I’ve worked on large scale systems, which probably are larger than a lot of the applications that most folks at Google work on. I mean not everyone there can work on the search engine or GMail. Most of them probably work on smaller apps and only a few get to touch the big ones. I think, and have no supporting evidence to back it up, that because I worked at Orbitz the interviewers out in California wanted to show me up. The humorous part of this is that by doing so they firmly proved their own interview question. By allowing ego and dominance to drive their hiring process they are increasing their error. I hope Google’s smart enough to put in checks and balances and remove interviewers like this from the system, because, like I mentioned above, the only way to fix the problem is to reduce the error, and the only way to reduce the error is to ensure that the interviewers are going to make the best decision possible for the company and not for themselves.

  20 Responses to “Google interviewing”

  1. This post was interesting. I always enjoy reading about interview questions for positions at large software companies. Thank you for writing this up.

  2. Thanks for the interesting read. I too interviewed for a position at Google Zurich in 2006, and also did not get it. The hotel they put me up in was pretty decent though (-;

    The problems they posed me were more differentiated than yours, very much so in fact, and a lot of fun to work on. Besides several phone interviews with Mountainview, I had a day with six 45-minute interviews in Zurich. I learned a lot and enjoyed discussing the diverse topics with them. When I was turned down I was very disappointed (this was 3 months into the process) and they would not tell me anything about how they arrived at their decision. The only thing they mentioned was that if I wanted to try again in 6 months or a year, I should certainly do so. I think this means that either there was an internal organisational issue that prevented the hire, or that they wanted to test my mettle and see if I reapplied (which I didn’t). Since then I have started my own business as a Java contractor and am doing fine.

    Did they give you any reasons for not hiring you?

  3. alot of weird questions

    on California office/Question #2 i think the best solution would be re-write the program(if its that simple, example says it is), i understand that they cannot fix it(maybe its written in some dead language or they dont have the source)

    google is about the graph.

  4. Sam: No problem

    Adriaan: The main reason they gave me was that they thought “my coding skills were not sufficient”. Which translated means, “we think you won’t be able to actually write good code.” Just as an interesting fact, I asked them if they had actually looked at my code because I do have 6 open source projects I run and I’ve worked on a lot of others. They told me that they never actually look at candidates code, but try to learn everything from the white-board coding.

    Raveman: Yeah, graphs are big there. I’ve written some good graphs over at Savant and I was hoping they would check out my code during the hiring process, but from what I could glean, they probably don’t have the band-width to do that. They are trying to double in size this year in most offices, which means that they spend a lot of time just weeding out candidates.

  5. Last year I applied at Google. I also interviewed with someone in CA over the phone, with almost the same questions. I also made it to the Boulder office, and had the same question, plus a couple on LRU, hashing implementations, and big-o.

    I also got just a voice mail from the recruiter informing me of a ‘no’ decision. It took me a week more of phone tag to talk to the recruiter to find out why. The reason was that my “skills weren’t a good fit”. I had the option to interview with other offices, but I wasn’t interested in relocating at that time.

    The biggest surprise to me was that they had a very specific skill set they were looking for, in a very specific position, despite the very generic nature of many of their job listings calling for “a broad range of Software Engineer positions” in “a number of areas”. At least now it looks like they’re putting real requirements up, so they don’t waste their time as well as interviewees…

  6. Boulder problem with the list of integers can be solved in one pass on a sorted list, walking from both ends of the list:

    int sortedlist[]
    int position1=0, position2=sortedlist.length()-1;

    while (position1 X) {
    position2--;
    } else if (summ

  7. sorry .. should have encoded < and >
    list sorted ascending

    int sortedlist[]
    int position1=0, position2=sortedlist.length()-1;
    while (position1 >= position2) {
    summ = sortedlist[position1] + sortedlist[position2];
    if (summ > X) {
    position2--;
    } else if (summ < X) {
    position1++;
    } else {
    println("found a pair " + sortedlist[position1] + " " + sortedlist[position2]);
    }
    }

  8. oups again .. “while” has inverted condition and when pair is found should increment position1/decrement position2.

  9. anon: Yeah, many folks have similar experiences as you and I did. I’m not sure why Google is tossing good engineers. Perhaps we just aren’t Googley enough. I know my entrepreneur nature definitely doesn’t help either.

  10. Anver: Yeah, that’s what I meant by sorting and doing proximity searching. Like you have, you look at the elements near you to figure out the next matches. Pretty straight-forward stuff. The tricky part comes when you have a list that doesn’t fit in memory. Something like a trillion 128 bit numbers or something crazy like that. Then you have to get more distributed or more brute force. I personally am really starting to tend to jump on b-trees for answering any “it doesn’t fit in memory” questions. Some can be solved without b-trees, but it is such a cool data-structure and you can do a lot of cool crap with it.

  11. It’s not your entrepreneurial nature that didn’t help, it’s your attitude. You used the word “simple” 7 times in your post. They actually want people who are entrepreneurial. But I’m going to give up because this will fall on deaf ears anyway.

    Remember, Google doesn’t just want smart people, they want smart NICE people who are easy to get along with. You completely failed the behavioral test. Why the hell would you argue with an interviewer and question him/her if you want to get the job?

  12. Digital,

    Dude, I can’t stress this enough – you have to question things everyday. I can guarantee that the founders of Google DON’T want people who just roll over and answer the questions. Pushing boundaries takes people who are willing to poke and prod. I’ve done a lot of interviewing and pair interviewed with a lot of really smart people and I can tell you that having folks who question things is much better than having them just answer the questions.

    Furthermore, you read a lot into things. Saying things are simple doesn’t mean I’m a jerk. I didn’t fail any behavioral test either. I’m not gonna pretend like questions are the hardest thing on the planet when they aren’t. That’s just lame as hell. My only point is that the Mountain View interviewers sucked and I felt like I wasn’t even interviewing, but like I was involved in some CS pissing contest. You weren’t there so you have no idea what the interviews were like. Had you actually been through a similar process, you might change your tune. In fact, there are tons of people who have been through similar processes at Google and other companies and they’ll tell you sometimes it really is the interviewers that botch it up.

    Oh and you gotta learn to be precise. I used the word 6 times, not 7. Try not to blow things out of proportion, you just end up looking defensive and discredit yourself.

  13. Really sorry to read about this experience; it’s never the intent for candidates to feel caught up in a “CS pissing contest” (as you say elsewhere), but unfortunately it happens. Hopefully those interviewers do get retrained.

    BTW, an English language dictionary is

  14. I suppose the less than sign is significant to some parser somewhere ;)

    …an English language dictionary is less than 200,000 words, but a multilingual dictionary can be much larger. The “correct” number depends on your frame of reference, and a good interviewer knows that and would have been able to avoid the destructive conflict of wills that seems to have happened here.

  15. Yeah, I’ve worked a lot with the dictionary for the Inversoft Profanity Filter and Database. We have a speed-trimmed dictionary for English that is around 150K. But you are correct, including other languages increases that number dramatically.

    The question at its core is excellent because pre-processing is indeed a great performance boost and very easy to perform. I think like you mentioned, the destructive conflict caused the interview to fall apart when it should have been extremely fun to work through.

    I think that is something that is generally missing for many interviews at most companies – fun. I feel that making the person comfortable and interactive should really help in the decision making process. You want to see them having fun and interacting so you can determine if you want to work with them everyday or not.

  16. I’m confused by the apparent conflict in these two statements in CA question 1:

    “…but this interviewer and I had a personality conflict and his accent was so thick I kept having to ask him to repeat himself. I think this is why they didn’t hire me.”

    and a couple lines down you say:

    “Of course he and I were arguing so much about things by this point I didn’t get the answer and he had to tell me, but the answers simple.”

    Could the reason they didn’t hire you have to do with the fact that you didn’t have the answer to what you admit was a simple problem? Instead you felt you had to argue with them about validity of the premise of a theoretical problem. That sounds less like open minded debate and more like defensiveness because you can’t solve the problem.

    I can see Digital Boy’s point. However, that doesn’t mean the interviewers didn’t also have issues.

  17. Sean,

    Not really a conflict in those statements, so I’m not sure what you are getting at. I think though that you are proposing that I was being defensive and argumentative, as some others have said. The truth is that I wasn’t. I’ll admit that I didn’t know the answer immediately; however, I think the reason was that I found it difficult to interact with someone that I could not understand. I’ll definitely stand behind the fact that interviewing native English speakers should be done by folks that speak English clearly. Interviewing native Hindi speakers should be done by those who speak Hindi.

    The language barrier was an enormous problem because I think the interviewer was extremely annoyed that I couldn’t understand him. Each time I asked him to repeat himself, he seemed to get more and more annoyed. So, when I asked him if we were prematurely optimizing something that was relatively small in size (i.e. the dictionary), mostly to help work through the problem, he really got mad.

    I’ll admit that I didn’t come to the answer at first, mostly because I was nervous and because I didn’t clearly understand what he was asking until I asked him to repeat the question a few times. This can be an extremely frustrating and nerve-wracking situation to be placed in. You are asked a question and you can’t understand it. You have to ask for the question a few more times, and each time the interviewer is getting steadily more upset and you are getting steadily more nervous, until things just completely fall apart. You try to work through it with a discussion of the merits of premature optimization and that just makes it worse. Then you’re claiming accurate numbers about sizes and the interviewer is telling you that you’re completely wrong, and the both of you are in a place where you can’t really reconcile the situation.

    I’ll tell you that this was not enjoyable at all and I do feel that this was why I wasn’t hired. Furthermore, as to my point about the answer being simple, it is a simple answer. I don’t know if I would have gotten it on my own, and I’ll never know, but once I was told, I just thought, “DUH!”. As for my failing at a simple problem, that might be. However, I doubt it, because I’ve seen brilliant people miss simple questions, and I’ve interviewed a lot of folks and I personally feel that missing questions isn’t the deciding factor. It’s more about thought process, challenging your assumptions and looking at things from different angles. I mean anyone can come up with questions that would stump even folks at Google. The idea is to work with the person you are interviewing and see how they think and work through problems.

  18. I think like some others: maybe they did not find a fit for you in the company due to your attitude. I am not judging your attitude to be bad, it is just not what they were looking for. I also think you should prepare yourself better for the interviews and not be that confident of your actual knowledge. For example, the selection algorithm is a very popular algorithm you did not have any idea about. Then, the problem of sums is resolved optimally in O(n) and you did not give any idea that resolves the problem in that time. I recommend you view the video lectures of the Algorithms course that is at aduni_dot_org. You can maintain dozens of open source projects with naive knowledge of algorithms; I am pretty sure it is not your case, but Google must make sure of that.

  19. My interview with Google CA was similar in ways. To be specific, I was interviewing for a web developer position and was asked questions like “with an xterm, text editor, and C-compiler, determine the hardware architecture of the local machine based on the directionality of the stack” and “how to determine if a very-large list of integers had more negative or positive numbers”. Yeah, web development position. There were a handful of others too, each very algorithmic. No questions on designing applications, web stacks, best practices, or even team dynamics.

    I also came from Orbitz and while being exposed to a massive end-to-end stack and spending most of my time on the server-side, I entrusted the 3rd party APIs to do the dirty work while I create customizations that make $$. Lots of it ;)

    Like most people, I enjoy peeling back the layers and getting into the finer details, but ultimately, the employer needs productivity and results — something that we have to entrust COTS or open-source software to do the details while we put everything together at higher levels.

    Eh… I only interviewed with them because the opportunity came up. Ultimately, I really *really* enjoy my current job so I just throw this into the ‘experience’ bucket.

  20. Vincent:
    The point is NOT to prepare yourself and know all the answers. I was prepared for a number of the questions they asked me and knew the answers and what the interviewer did from that point was to ask me another question until he found something I didn’t know.

    Also, the selection algorithm is probably not as popular as you make it seem. In addition, the problem is not selection, it is selection on HUGE sets. Selection in memory is well-known and somewhat simple. Selection on a stream of unknown size, where the size is larger than can ever fit into memory, is more difficult. You then have to use distributed memory or perform some other type of selection. In fact, my solution is a distributed-memory approach using a b-tree, which is a known solution that uses incremental sorting. So, you are correct that I didn’t know the theory or the terminology of the solution, but I give myself mad props for figuring out a complex distributed solution on my own without knowing the theory or terminology. That’s better than memorization in my opinion.

    As to the sum problem. I did in fact write code for a linear-time solution to this. I just didn’t put that in the blog post.

    Thanks for the link, I might check it out if I have time. Also, you’re more than welcome to check out the algorithms in my open source projects. My dependency graph and traversals are pretty decent linear time with no tail recursion. Also, I’ve done some pretty advanced string matching using a modified Aho-Corasick multiple-pattern search algorithm for the Inversoft Profanity Filter that takes into account character replacements, spaces and padding between characters of a word, phonetics, and much more. This thing performs really well: ~5000 words to search for in ~1 meg takes around 10 seconds, with support for Unicode. And again I give myself mad props because I wrote the whole thing and then discovered there was already an algorithm for what I had written.
