Today, guest scientist Andrew Boyd drives us to work. The University of Houston presents this series about the machines that make our civilization run, and the people whose ingenuity created them
Imagine that you wake up tomorrow and find your favorite route to work is closed. Not only that, but so many roads are under construction you can't think of a good alternative. Going to the computer, you visit a web site and ask for the shortest route. It responds in a fraction of a second with the answer. And if all the information is correct, it's sure to be the shortest route.
So how did the web site do this? It certainly didn't try every option and pick the shortest one. Even at the speed of today's computers, this strategy would take longer than the estimated age of the universe for a city like Houston.
What the web site did was apply a specialized optimization method. These methods can be used to search for lowest price airline tickets, find the least cost way of providing supplies to troops, or to solve any other problem where there is a clear measure of "best."
Methods for problems like finding the shortest route to work often do just what you might -- they start with a good guess and refine it until no better solution can be found. Driving to work, you might try what seems like a good route and then improve upon it by taking shortcuts. At some point you find you just can't seem to do any better. But how do you know you've found the best route? Perhaps there's another, shorter route that is very different from the one you've settled on. When optimization methods stop finding improvements they also provide a mathematical guarantee that there is no better solution. For the shortest route problem this guarantee can be found very quickly. It's actually a clever argument that there are absolutely no shortcuts left.
Not all problems are as easy as finding the shortest route to work. Take the closely-related traveling salesman problem, in which a mythical salesman seeks the shortest route to visit all of his customers around the city. While good solutions can be found in the same way we find a shortest route to work, actually showing a solution is the best is very difficult. So far a simple guarantee isn't known for the traveling salesman problem.
If the relative difficulty of these two problems seems puzzling, you're not alone. Over the last half century many of the brightest people in the world have tried to show that the traveling salesman problem is either harder to solve than the shortest route problem or just as easy, but so far no luck.
In a slightly more general form, it's considered the most important problem in theoretical computer science. If it could be solved quickly, it would show that thousands of other very important problems could also be solved quickly, saving the world economy hundreds of billions of dollars annually. The problem even carries a million dollar reward for a solution. Only six other mathematical problems offer such enormous rewards. But as any mathematician will tell you, it's not the money that makes the problem interesting. It's the opportunity to solve a fascinating problem that your peers agree is superbly difficult.
So the next time you visit a web site for driving directions, or to purchase an airline ticket, or to Google your favorite topic, remember there's a lot of ingenuity lying just beneath the surface. As one professional society likes to say, it's the science of better.
I'm Andrew Boyd, at the University of Houston, where we're interested in the way inventive minds work.
Dr. Andrew Boyd is Chief Scientist and Senior Vice President at PROS, a provider of pricing and revenue optimization solutions. Dr. Boyd received his A.B. with Honors at Oberlin College with majors in Mathematics and Economics in 1981, and his Ph.D. in Operations Research from MIT in 1987. Prior to joining PROS, he enjoyed a successful ten year career as a university professor.
The professional society referred to in the essay is INFORMS, the Institute for Operations Research and the Management Sciences. For more information, see http://www.informs.org/.
In 1962, Proctor and Gamble offered a contest for selecting the shortest route from Chicago through all the cities shown in red -- much more difficult than it first appeared.
Some additional Notes
The shortest route problem is one of a class of problems for which fast algorithms are known. "Fast" is formally defined as a problem that can be solved within a guaranteed number of computational steps that is dependent on the problem size. It's fair to let the algorithm take longer for Houston than for Terlingua, Texas, whose population is only 267.
Specifically, if S represents the problem size - which for our purposes we can think of as a list of the intersections, road segments, and their lengths - and, if there is an algorithm A and a polynomial function f(S) such that the number of steps required to solve the problem with A is no more than f(S), we say the problem belongs to the class P. P actually stands for "polynomially solvable," and the algorithm A is called "polynomial time."
There are actually many polynomial time algorithms for solving the shortest route problem. Dijkstra's algorithm is guaranteed to find a shortest route in no more than O(n*n) steps, where n is the number of intersections. The notation O(n*n) means that the computational time is dominated by the term n*n for large n. The Bellman-Ford algorithm is guaranteed to find a shortest route in no more than O(n*m) steps, where m is the number of road segments. These expressions are polynomial functions of the problem size.
Both the shortest route problem and the traveling salesman problem belong to another class of problems called NP. To be in the class NP, when a problem is a "yes instance," there must exist a certificate that verifies this in polynomial time.
For example, "is there a route to work that is no longer than 15 miles" is a problem in NP, since if there is such a route it's easy to verify -- simply check that its length is no longer than 15 miles. The route is the certificate. Similarly, "is there a traveling salesman tour that is no longer than 100 miles" is also a problem in NP.
Note that being in NP does not require a quick way to find a certificate, only that if the problem is a yes instance then there exists a certificate that can be verified in polynomial time. Further, being in NP does not require that if the problem is a "no instance" that this can be verified in polynomial time. If all traveling salesman tours are longer than 100 miles, how is this demonstrated? One way is to show that each and every tour is longer than 100 miles, but this is not a polynomial time-verification procedure.
On the other hand, if all routes to work are longer than 15 miles it is possible to verify this in polynomial time -- run the polynomial time algorithm that finds the shortest route, and if it's longer than 15 miles you're done. This is possible because the shortest route problem is in P. Note that if the problem is a "yes instance," that is, if there is a route no longer than 15 miles, this can also be verified by running the polynomial time algorithm that finds the shortest route. So the fact that a problem is in the class P is sufficient to demonstrate that it is also in NP; that is, the set NP contains the set P.
We thus come upon the big question: is every problem in NP also in P, in which case P = NP, or is NP a strict superset of P? This is a concise, abstract statement of the big question, but it is important to realize it is stems from a fundamentally practical problem. Problems in P are generally easy to solve to optimality, whereas there are many, many practical problems that we know are in NP but as yet we don't know if there exist polynomial time algorithms that will solve them. The definition of NP may seem a bit abstract, but it was, in fact, defined so to capture an enormous number of very practical problems.
Algorithmic complexity theory goes further by showing that there are problems in NP that are NP-complete. Loosely speaking, a problem is NP-complete if it is in NP and at least as hard as any other problem in NP. More correctly, if a polynomial time algorithm can be found for an NP-complete problem, thus showing it is in P, then every problem in NP has a polynomial time algorithm and P = NP. The traveling salesman problem is NP-complete, so a polynomial time algorithm to solve it would prove P = NP.
Interestingly, most known problems in NP are NP-complete. Because the proof methodology of finding a polynomial time algorithm for a problem is so straightforward, and because there are so many NP-complete problems, you regularly see proposed proofs that P = NP. So far, they have all been wrong.
The definitions of the classes P and NP are really quite different, so it might very well be expected that P is not equal to NP. This is further supported by the fact that no polynomial time algorithm has ever been found for an NP-complete problem even though many brilliant people have tried. Thus, the common conjecture is that P and NP are not identical. However, a proof of this fact has not been forthcoming. The conjecture is so widely held that a vast number of mathematical papers have been published based on the assumption P is not equal to NP. If by some chance it turns out that P = NP, we have wasted a lot of time and paper.
The million dollar prize for resolving whether or not P = NP is sponsored by the Clay Mathematics Institute. See http://www.claymath.org/millennium/ for more information.