Sudoku Solver

This page describes an algorithm for solving Sudoku puzzles, which have suddenly become popular in the UK, and provides a Python script to automate the process.

Oct 9 2005 - Replaced the Python script with an easier to understand version that doesn't use recursion.

The "Easy" Example

 1 9 4 4 8 6 7 5 2 9 1 2 4 3 5 4 6 3 8 7 3 6 8 4 1 2 9

We will start off with an easy example. This one appeared in the Evening Standard on 31st May 2005, where it was described as "Hard". It has 27 squares filled in, the examples they describe as "Easy" have 32. This doesn't make much difference to a computer algorithm, though if you were working through this by hand the more initial numbers you are given the quicker you should be able to solve it.

I call it "easy" as it can be solved after applying only the simpler parts of the algorithm.

Introduction

The algorithm works by writing out all the possible answers and then eliminating those that are incompatible with the numbers published in the puzzle. The starting position is that every cell can contain any of the numbers 1-9. We represent this by writing all the possible values into each cell, which gives us a grid that looks like this.

 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789

The overall approach is to eliminate the impossible so that what remains must be the answer.

Applying the Given Numbers

In the example above we have been given the values for 27 of the cells. The first thing to do is to apply these to our grid and eliminate all the values that are inconsistent with these. Working across the top row the first value we have is in the 5th cell. This is a 1. This means that there can be no other '1's on the top row or in the 5th column, or in the top-middle block of 9. Crossing out all these '1's leaves

 23456789 23456789 23456789 23456789 1 23456789 23456789 23456789 23456789 123456789 123456789 123456789 23456789 23456789 23456789 123456789 123456789 123456789 123456789 123456789 123456789 23456789 23456789 23456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 23456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 23456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 23456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 23456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 23456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 123456789 23456789 123456789 123456789 123456789 123456789

Continuing this process; after entering the 9 given values on the top three rows we have

 2368 2368 2368 23567 1 9 3578 4 3578 1239 1239 4 8 2357 2357 6 13579 13579 7 5 13689 346 346 346 1389 1389 2 12345689 12346789 12356789 12345679 23456789 12345678 12345789 12356789 13456789 12345689 12346789 12356789 12345679 23456789 12345678 12345789 12356789 13456789 12345689 12346789 12356789 12345679 23456789 12345678 12345789 12356789 13456789 12345689 12346789 12356789 12345679 23456789 12345678 12345789 12356789 13456789 12345689 12346789 12356789 12345679 23456789 12345678 12345789 12356789 13456789 12345689 12346789 12356789 12345679 23456789 12345678 12345789 12356789 13456789

This hasn't made much impact on the bottom two thirds of the grid because so far we've only had one value in each column. But we've got 18 more values to place. When we get to '6' in the 6th column of the 6th row the grid is becoming a lot sparser and we notice that the only possible value left for the sixth cell on the third row is '4'. So we have found our first missing number (shown in green).
 2368 2368 2368 23567 1 9 3578 4 3578 1239 123 4 8 2357 57 6 13579 13579 7 5 13689 36 346 4 1389 1389 2 368 9 3678 1 578 2 3578 35678 4 12468 124678 12678 579 5789 3 125789 1256789 156789 5 12378 12378 4 789 6 123789 123789 13789 1234689 1234678 12356789 235679 23456789 14578 12345789 12356789 1356789 1234689 1234678 12356789 235679 23456789 14578 12345789 12356789 1356789 1234689 1234678 12356789 235679 23456789 14578 12345789 12356789 1356789

After we've used up all the given numbers (and the one we found) we we are left with the grid.

 236 2368 238 3567 1 9 3578 4 578 1239 23 4 8 2357 57 6 159 1579 7 5 1389 36 36 4 1389 189 2 36 9 378 1 578 2 578 568 4 1246 24678 1278 579 578 3 125789 125689 156789 5 278 1278 4 78 6 12789 3 1789 8 24 259 56 456 15 1259 7 3 239 237 6 357 357 8 4 1259 159 34 1 357 2 9 57 58 568 568

So we move onto phase 2, "Looking for Singletons".

Looking for Singletons

Phase two involves counting how often the undecided numbers appear in a set of cells (row, column or 3x3 super-cell). If we start with the first column, the undecided cells are the 1st, 2nd, 4th, 5th, 8th and 9th. The possibilities are '236', '1239', '36', '1246', '239' and '34'. If we count how often each undecided number appears (the known ones are excluded from this process) we get

 Number 1 2 3 4 5 6 7 8 9 Frequency 2 4 5 2 - 3 - - 2

Every undecided number appears at least twice, so this doesn't help us. Repeating this exercise for columns 2 and 3 doesn't produce any useful information either. When we try column 4 though, we get

 Number 1 2 3 4 5 6 7 8 9 Frequency - - 3 - 4 3 3 - 1

Now the '9' only appears once, in the 5th row, so that is where it must be. We have found our second number. We can now set the '9' and repeat the process described in phase 1 giving.

 236 2368 238 3567 1 9 3578 4 578 1239 23 4 8 2357 57 6 159 1579 7 5 1389 36 36 4 1389 189 2 36 9 378 1 578 2 578 568 4 1246 24678 1278 9 578 3 12578 12568 15678 5 278 1278 4 78 6 12789 3 1789 8 24 259 56 456 15 1259 7 3 239 237 6 357 357 8 4 1259 159 34 1 357 2 9 57 58 568 568

That didn't help very much, so we restart this phase at the beginning. The first four columns offer no additional insights, but now the frequencies in column 5 are

 Number 1 2 3 4 5 6 7 8 9 Frequency - 1 3 1 5 2 5 3 -

Which gives us two more discoveries, '2' on the second row and '4' on the 7th. Filling these in using the phase 1 algorithm sets off a chain reaction...

 Action Consequences (2,5) = 2 (2,2) = 3 (2,2) = 3 None (7,5) = 4 (7,2) = 2 (7,2) = 2 (8,2) = 7 (8,2) = 7 (6,2) = 8 (6,2) = 8 (1,2) = 6, (6,5) = 7 (1,2) = 6 (1,1) = 2, (5,2) = 4 (6,5) = 7 None (1,1) = 2 (1,3) = 8 (5,2) = 4 None (1,3) = 8 None

Co-ordinates are (row, column) with the top left being (1,1). The grid now looks like this (with the new discoveries highlighted).

 2 6 8 357 1 9 357 4 57 19 3 4 8 2 57 6 159 1579 7 5 19 36 36 4 1389 189 2 36 9 37 1 58 2 578 568 4 16 4 127 9 58 3 12578 12568 15678 5 8 12 4 7 6 129 3 19 8 2 59 56 4 15 159 7 3 39 7 6 35 35 8 4 1259 159 34 1 35 2 9 57 58 568 568

Restartind g this phase counting down  the columns shows us that the bottom cell in the first column must be '4'. This doesn't lead to any further discoveries. Trying again reveals that the top cell in the 4th column must be '7'. This second discovery forces both (1,9) and (2,6) to be '5', which in turn forces three more discoveries, leaving.

 2 6 8 7 1 9 3 4 5 19 3 4 8 2 5 6 19 179 7 5 19 36 36 4 189 189 2 36 9 37 1 58 2 578 568 4 16 4 127 9 58 3 12578 12568 1678 5 8 12 4 7 6 129 3 19 8 2 59 56 4 1 59 7 3 39 7 6 35 35 8 4 1259 19 4 1 35 2 9 7 58 568 68

Searching the columns again we see that the third cell in the fifth column must be a 6. Setting this triggers another cascade of discoveries which culminates with the solution.

 2 6 8 7 1 9 3 4 5 1 3 4 8 2 5 6 9 7 7 5 9 3 6 4 1 8 2 3 9 7 1 8 2 5 6 4 6 4 2 9 5 3 7 1 8 5 8 1 4 7 6 2 3 9 8 2 5 6 4 1 9 7 3 9 7 6 5 3 8 4 2 1 4 1 3 2 9 7 8 5 6

For this example that is as far as we need to go, which is why I've described it as "Easy". I don't have any examples, but any patterns that can be solved only by using the first phase could be described as "Trivial".  The next sections discuss some more difficult examples of the problem.

Pairs

The first two phases are sufficient to solve most examples published in popular newspapers, but here is a more difficult example someone sent me. It only has 23 given numbers, 4 less than the previous example.

 2 5 8 6 3 8 5 1 4 7 6 9 6 5 7 7 3 9 4 7 6 8 9 8 1 9

After applying phases 1 and 2 we are still left with.

 134 379 134 1578 14589 14 2 679 13469 1234 5 8 127 1249 6 3479 79 1349 6 279 124 3 1249 124 479 8 5 238 1 23 4 7 5 6 29 289 9 4 6 128 128 12 5 3 7 5 28 7 6 3 9 1 4 28 7 6 12345 125 1245 8 349 259 2349 234 23 2345 9 2456 7 8 1 2346 1248 28 9 125 12456 3 47 2567 246

Which appears to have still a long way to go. If we examine the second column we can see things of interest.

1. The numbers 7 and 9 only appear twice, and when the do appear they appear together.
2. The are two cells that can only contain 2 or 8.

The first of these rules will allow us to simplify cells (1,2) and (3,2) which currently show the options 3,7,9 and 2,7,9. Because 7 and 9 only appear in these two cells, one must contain 7 and the other 9, though we don't know which is which. This does though allow us to remove the possibilities that these cells contain a 3 or a 2.

The second of these rules applies to cells (6,2) and (9,2), both of which can only contain a 2 or an 8. Again this means that one of the cells must contain 2 and the other 8. Although we don't know which is which we do no that none of the other cells in the column can be a 2 or an 8. This allows us to remove the twos from (3,2) and (8,2), though in this case (3,2) has already had its two removed using the previous rule.

The table now looks like

 134 79 134 1578 14589 14 2 679 13469 1234 5 8 127 1249 6 3479 79 1349 6 79 124 3 1249 124 479 8 5 238 1 23 4 7 5 6 29 289 9 4 6 128 128 12 5 3 7 5 28 7 6 3 9 1 4 28 7 6 12345 125 1245 8 349 259 2349 234 3 2345 9 2456 7 8 1 2346 1248 28 9 125 12456 3 47 2567 246

With the changed cells in green. The fact that 3 is now in a cell on it's own means that we can eliminate all the other 3's in the same block or row.

 134 79 134 1578 14589 14 2 679 13469 1234 5 8 127 1249 6 3479 79 1349 6 79 124 3 1249 124 479 8 5 238 1 23 4 7 5 6 29 289 9 4 6 128 128 12 5 3 7 5 28 7 6 3 9 1 4 28 7 6 1245 125 1245 8 349 259 2349 24 3 245 9 2456 7 8 1 246 1248 28 9 125 12456 3 47 2567 246

In this case there are no more pairs that can be used to simplify the grid, so we move onto phase 4, Eliminating Bad Guesses.

This phase is simple to described but harder to implement. In the previous example, after exhausting phase 3 we are still left with

 134 79 134 1578 14589 14 2 679 13469 1234 5 8 127 1249 6 3479 79 1349 6 79 124 3 1249 124 479 8 5 238 1 23 4 7 5 6 29 289 9 4 6 128 128 12 5 3 7 5 28 7 6 3 9 1 4 28 7 6 1245 125 1245 8 349 259 2349 24 3 245 9 2456 7 8 1 246 1248 28 9 125 12456 3 47 2567 246

The next phase involves making guesses and seeing the consequences. There are three possible outcomes,

1. the search runs to completion with a solution
3. the search is inconclusive

This first outcome at first glance appears to be ideal, but at this moment we don't know there is only one solution so we ignore it. Instead what we look for is outcome 2, the contradiction becase we now know that that guess must be wrong. Looking at the table above there are 141 possible guesses, starting with the top left cell being 1.

This part of the algorithm involves making that guess and then applying the three previous phases to see where it leads us. In this case setting cell (1,1) to 1 leads to a contradiction so we can eliminate that guess leaving only 3 or 4 in the top left cell. Trying 3 causes no problems but 4 leads to a contradiction, so we can eliminate that too.

Now we only have 3 left so that must be the value in the top left cell. Fixing this value and applying the previous phases leads to the solution

 3 9 1 8 5 4 2 7 6 4 5 8 7 2 6 3 9 1 6 7 2 3 9 1 4 8 5 8 1 3 4 7 5 6 2 9 9 4 6 1 8 2 5 3 7 5 2 7 6 3 9 1 4 8 7 6 4 2 1 8 9 5 3 2 3 5 9 6 7 8 1 4 1 8 9 5 4 3 7 6 2

If All Else Fails

If we take the example from the start of phase 3 (Pairs) and remove the '6' on the fifth row we are left with this puzzle.

 2 5 8 6 3 8 5 1 4 7 6 9 5 7 7 3 9 4 7 6 8 9 8 1 9

After applying the first 4 phases we are left with

 134 79 134 78 1589 45 2 67 1346 1234 5 8 127 129 6 3479 379 134 126 79 26 3 129 14 479 8 5 8 1 35 4 7 25 6 239 39 9 34 346 168 68 12 5 23 7 56 2 7 56 3 9 1 4 8 7 6 1234 12 1245 8 349 359 2349 2345 34 2345 9 2456 7 8 1 2346 145 8 9 1256 12456 3 47 567 246

Phase 5 works by working through the possible values recursively until it finds a solution or a contradiction. It steps through the possible guesses sequentially. The first cell (1,1) can be 1, 3 or 4. Choosing 1 gives

 1 79 34 8 59 45 2 67 346 34 5 8 7 2 6 349 39 1 26 79 26 3 19 14 479 8 5 8 1 35 4 7 25 6 239 39 9 34 346 16 8 12 5 23 7 56 2 7 56 3 9 1 4 8 7 6 1 2 45 8 349 359 349 2345 34 2345 9 456 7 8 1 346 45 8 9 156 1456 3 47 567 2

The next guess is to try 7 in cell (1,2). This leads to

 1 7 3 8 9 5 2 6 4 4 5 8 7 2 6 39 39 1 2 9 6 3 1 4 7 8 5 8 1 5 4 7 2 6 39 39 9 3 4 6 8 1 5 2 7 6 2 7 5 3 9 1 4 8 7 6 1 2 4 8 39 5 39 3 4 2 9 5 7 8 1 6 5 8 9 1 6 3 4 7 2

The third guess, setting (2,7) to 3 leads to a solution

 1 7 3 8 9 5 2 6 4 4 5 8 7 2 6 3 9 1 2 9 6 3 1 4 7 8 5 8 1 5 4 7 2 6 3 9 9 3 4 6 8 1 5 2 7 6 2 7 5 3 9 1 4 8 7 6 1 2 4 8 9 5 3 3 4 2 9 5 7 8 1 6 5 8 9 1 6 3 4 7 2

Stepping back one guess and trying (2,7) = 9 also leads to a solution. All in all there are 32 possibilities, see here.