Correctness in the Real World

Music: Glitch Mob – Drink The Sea

I failed calculus in college. I’m an applications guy, in that, if I can’t see the point, if I can’t see the payoff, I have a lot of trouble convincing myself effort is worthwhile.

What happened was this: It was my first semester in college, and I took my professor at his word. He said there would be no curve at the end of the semester. The day of the final, I had two finals, one at 8 AM (calculus) and one at 11. Never quite a morning person, and making a 37 grade in calculus thus far, I slept past that one and instead got an A on my CS recursion final.

I could see the point in recursion; it was real to me.

A few days later I talked with one of my calculus peers who was also aiming at a grade of 35 or so out of 100. He went to the final. His final grade? He passed the class with a ‘B’. The professor was so bad, or perhaps just trying to teach us a little something about life, that he curved anyway.

My final grade? A 37, because I skipped the final I knew I would not pass. I took the course again a year later and passed (with a healthy and generous curve).

What we perceive as correctness/truth and what actually happens or matters in the real world are often two different things. This is a concept that can be incredibly hard for engineers to grasp, but this concept is key to surviving in the real world.

Often you are given a complex or seemingly unsolvable problem with a timeline that makes no sense, and your “correctness” problem-solving skills go out the window. In these real-life situations, you’re going to find your career teaching you a lot more about artfully cutting corners than about precise algorithmic correctness.

Consider for example the following interview problem:

You are given a list of items. You know the list’s size. The list contains apples and oranges. Find the last apple in the list with the fewest number of comparisons.

The conceptual, algorithmically “most correct” answer to this question is simple: you do a binary search. That is, an algorithmic way of dividing the list in pieces and searching for your needle in the haystack with as few steps as possible.

A binary search is a fundamental algorithm you learn in college and is almost guaranteed to find your item in as few steps as necessary. If you don’t see this as an answer fairly quickly, the interviewer will probably be concerned.

But, as a smart little coder, you know the answer, and you’ll enthusiastically answer “I’d use a binary search!”

That’s when the real fun begins. The interviewer will then respond: “Right, binary search is what works. Now code that, on the whiteboard, in ten minutes”…

At that point your mind turns to mush and you immediately start stressing out about horrific bounds checking nightmares that you’ll never possibly solve correctly in ten minutes.
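For the curious, here is a rough sketch of what that whiteboard attempt might look like in Python. It assumes something the question does not actually state: that every apple sits ahead of every orange, which is what makes halving the list valid in the first place. The function name and the "apple"/"orange" labels are made up for illustration.

```python
def last_apple_index(items):
    """Return the index of the last "apple", or -1 if there are none.

    Assumption (not from the interview question): all apples come
    before all oranges, so the list can be halved safely.
    """
    lo, hi = 0, len(items) - 1
    result = -1  # stays -1 for an empty list or a list with no apples
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == "apple":
            result = mid   # best candidate so far; look further right
            lo = mid + 1
        else:
            hi = mid - 1   # landed on an orange; the last apple is to the left
    return result
```

Even at this size, the empty list, the all-oranges list, and the `lo <= hi` versus `lo < hi` decision are exactly the kind of bounds-checking details that eat the ten minutes.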

The trick to the interview question is this: a good interviewer isn’t looking for the absolute *correct* answer alone. A good interviewer is trying to see how you think, how you work, and how you’ll approach the eventual bounds checking nightmare and corner cases. Hell, a good interviewer may not care if the code you write is completely correct at all, as long as you can communicate your intentions and think on your toes as you work through the problem.

But what about the real world?

Suppose you get the job, and your former-interviewer/new-coworker says “remember how you said binary search? well, can you do that for us on this project..?”

This is where the theoretical idealism of correctness starts fading and the real world sets in.

When presented with anything algorithmically complex, you should be asking yourself “how can I make this easier?”, and ask as many short-cut questions as possible. There’s no point in writing or (better..) finding a binary search implementation if it’s not actually what’s needed.

Your engineering training background will feed you little worrying thoughts of “doing it right, the 100% correct way”, but it’s your job to bridge the gap between theoretical fantasy land and the real world.

Will your project be fundamentally better if you burn a week solving this problem, rather than 2 hours? Probably not. Will your project manager or dev lead be happier with you for solving the problem 95% effectively in 1/10th the time? Yes. And, when 95% isn’t enough, you can explain to management that you need 10x the time of your 95% solution to get the more-correct answer.

When you solve problems effectively and efficiently, you’ll have more time later to re-evaluate which little cut-corners to invest more effort in. When you solve problems correctly in the pedantic academic sense, you’ll be slipping ship dates.

But how do we simplify?

The answer is simple: Take a walk. Get away from the computer, grab a coworker, and go for a walk. Free yourself to think on the problem from more than the “100% correct, perfect” angle.

Another easy way to simplify something is to ask yourself “how difficult will it be to test this?”. If it’s going to take you a day to write a decent unit test suite, figure a week of solid engineering and debugging time to get the algorithm right.

When we don’t have 6 days, we figure out what works and simplify, allowing room for solving the problem better in the future.

A real world example of this came up for us on our TumbleOn product. We’re adding a feature to the application that allows a user to jump anywhere in a blog’s posts. A Tumblr blog has a mixture of post types such as text, video, photo, etc. TumbleOn only looks at photo posts, and Tumblr only tells us the total count (including all post types) of a blog’s posts, so essentially we’re looking for apples in a long list of a known size. This becomes a problem when the user says “jump to the end”, because, while we know the total number of posts, we don’t know how far backward we’ll search to find an interesting post.

Our first attack on the problem was the “correctness” attack, a slightly optimized binary search of sorts, with all kinds of complex rules built into it:

a) when a blog has more than 20,000 posts, check 1,000 back, see if we get a hit, then try 500 back, etc
b) when a blog has between X and Y, check different intervals
c) and so on.

Long story short, it was complicated shit, and unnecessarily so, and the worst part was developing a test suite to catch all of the horrible edge cases. The test suite itself was turning into a nightmare, never mind the algorithm itself... all to find the last apple in the list.

We didn’t want to take days to solve this problem, so we thought outside the box for a bit, away from the computer. We asked ourselves what corners we could cut now that would work well enough, be easy to test, and be improvable in the future (if we ever cared). After only a few moments we came up with something much easier, our so-called “screw-it algorithm.”

The “screw-it” algorithm to reverse search (aka: brute force):

Just search backward from the end until we find something, stopping with an error after 10 search attempts inward from the end.
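As a sketch only, the idea fits in a few lines of Python. The 50 posts per attempt and the limit of 10 attempts come from this post; everything else is an assumption made for illustration: the names, the `fetch_posts(offset, limit)` callback standing in for the real Tumblr call, the dict-with-a-"type"-key post shape, and treating the blog as a list where the "end" is the highest offset. This is not the actual TumbleOn code.

```python
PAGE_SIZE = 50      # posts fetched per attempt
MAX_ATTEMPTS = 10   # give up after 10 * 50 = 500 posts


class NoPhotosFound(Exception):
    """Raised when no photo post turns up within the attempt budget."""


def find_last_photos(total_posts, fetch_posts):
    """Walk backward from the end of the blog, one page at a time.

    fetch_posts(offset, limit) is a hypothetical callback that returns
    a list of posts, each a dict with a "type" key.
    Returns (offset, photo_posts) for the first page containing a photo.
    """
    offset = max(total_posts - PAGE_SIZE, 0)
    for _ in range(MAX_ATTEMPTS):
        posts = fetch_posts(offset, PAGE_SIZE)
        photos = [p for p in posts if p["type"] == "photo"]
        if photos:
            return offset, photos
        if offset == 0:
            break  # reached the start of the blog; nothing left to check
        offset = max(offset - PAGE_SIZE, 0)
    raise NoPhotosFound(
        f"No photos in the last {MAX_ATTEMPTS * PAGE_SIZE} posts."
    )
```

Testing this takes a fake blog in a list and a lambda that slices it, which is most of the appeal.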

Many an engineer will have a hearty laugh at our cop-out/simplification, but wait, there’s real-world ‘science’ to the choice:

a) if/when Tumblr exposes the photo post count in their API, our algorithm is even easier, and this will be trashed.
b) our application is only looking for photos, and there are 50 posts per attempt, so if we can’t find a photo in 500 posts, chances are this blog’s not really photo-centric to begin with.
c) we don’t want to spend 3 days debugging our algorithm, and 2 days writing it
d) we want the algorithm to be testable, easily
e) we want to have time to add other value to the application

We started implementing the “screw-it” algorithm when it hit us: why not do the “who even cares?” algorithm instead?

The “who even cares?” algorithm:

When it’s a corner case, if at all possible, just show a descriptive error and let the user deal with it. If it becomes a big issue later, work on the issue some more.

In this particular case, the chances that a user jumps to post 100,000 and finds no photos are slim, and it will take the user 1 second to instead jump to post 99,000 in this very rare case. We can (but never will) optimize on this usability corner case in the future if we really want to, but that’d be pretty pedantic and there’d be about 10,000 other things we’d improve first.
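In code, “who even cares?” is barely an algorithm at all. A minimal sketch, again with made-up names and a hypothetical `fetch_posts(offset, limit)` callback in place of the real Tumblr call:

```python
class NoPhotosHere(Exception):
    """Raised so the UI can show a descriptive message to the user."""


def jump_to(offset, fetch_posts, page_size=50):
    """Fetch one page at the requested spot; no searching, no retries.

    fetch_posts(offset, limit) is a hypothetical callback that returns
    a list of posts, each a dict with a "type" key.
    """
    posts = fetch_posts(offset, page_size)
    photos = [p for p in posts if p["type"] == "photo"]
    if not photos:
        # The corner case: tell the user and let them pick another spot.
        raise NoPhotosHere(
            f"No photos found near post {offset}. Try jumping somewhere else."
        )
    return photos
```

There is one branch to test, and the user handles the rest.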

You’re taught, and interviewed, as if you’re supposed to be making 100% correct uses of famous algorithms, and nothing but. It’s important to know about famous algorithms, and to know when to use them. And if you’re at JPL or working on something like a medical device, the “who even cares?” algorithm is not an option. But 99.999% of the engineers in the world are not at all involved in a project where a hand-written reverse binary search algorithm (or quick sort, or a dozen other super-complicated-to-get-correct algorithms) is at all relevant or crucial.

The thing is, the future is sort-of infinite. The features you want to get into your software by the next deadline are known, but the features you can get into some future version are unknown and therefore infinite in possibility. Burning time today being pedantic about a million irrelevant details just eats tomorrow.

If you can get 15 features into your application today with a handful of difficult corner cases avoided until later, or 3 features in with all corner cases solved, which would you choose?

Being a software developer in the real world requires an ability to be flexible and to know when to go academic vs. practical. This is the fundamental difference between “correctness” in the academic sense and correctness in the real world.