“Jigsaw Puzzles: Problem B”

Michael Levison Queen's University, Ontario levison@cs.queensu.ca Craig Thomas Queen's University, Ontario thomas@cs.queensu.ca

A jigsaw puzzle problem arises whenever an archaeologist finds an old manuscript broken into fragments and wishes to reconstruct it. The best known examples are the Dead Sea Scrolls, reconstructed by hand during the middle of the 20th century. In fact, there are two different types of problem, each requiring a different strategy. If the fragments come from a familiar text, the manuscript can be reconstructed by locating each fragment in an existing copy without reference to other pieces of the puzzle. If the text is unfamiliar, we must compare the shapes of fragments to one another to discover good matches with reasonable letter sequences formed across the boundaries. We haved called the former Jigsaw Problem A, the latter Problem B. The purpose is to find computer techniques which substantially reduce the human work involved in reconstruction.

Previous Work

Very little has been published on this topic since it was brought to the first author's attention 35 years ago. A previous paper ( Levison and Wu, 1999) reported on some experiments involving artificial fragments derived from a paragraph of Alice in Wonderland. This work essentially provided a computer solution for Problem A, predicting whether a single site is likely for a given fragment based on letter-sequence frequencies and, for our fragments, finding such a unique location in Alice whenever it was predicted. Experiments on problem B proved less successful. A program matched each of the 53 fragments to each of the others with all possible vertical shifts, computing a score for each juxtaposition. The "best" matches were listed for each fragment, different scoring formulae being tried without materially affecting the outcome. The correct match often occurred among the best five, but only a few times in top or second place. In effect, two or three of 52 competitors often scored better than the correct one by chance alone. Extrapolation suggests that, given a set of a few thousand fragments, a human might have to examine a few hundred matches per fragment to determine the correct one -- a ten-fold improvement over a human-only approach, but still very time-consuming. For a satisfactory outcome, we want to narrow the field, allowing a human researcher to inspect only a few matches per fragment. Incidentally, these experiments did suggest that letter-sequences play only a secondary role in the scoring, shape being the most important feature. This is perhaps not surprising since letter-sequences can be assessed only on lines which fit together.

Refinement

In Levison and Wu (1999), the left- and right-profiles of each fragment are represented by a sequence of measurements from an arbitrary vertical datum line. At first sight, it is the step-like nature of the edges which causes the problem, allowing too many good fits between unrelated fragments. In fact, however, any shape represented by a finite sequence of measurements can be viewed as step-like. The true problem is the granularity of the measurements: a single measure per line of text, with a (fixed) character width, about 2.5 mm, as the horizontal unit. In new experiments, a further paragraph of Alice was fragmented and three measurements were made for each line of text to the nearest 0.5 mm. Once again each fragment was compared with every other in all different vertical positions. The scoring algorithm measured the total gap between lines when two fragments were "just touching". This initial measure was adjusted according to the number of lines which contributed to it, and the final score biased to reward juxtapositions where several lines fitted closely but not exactly. In the set of 61 new fragments from the paragraph, 41 correct matches occurred in first place, with five each in second and third, a major improvement on the earlier results.

Tear and Wear

Do these improvements persist if the number of fragments is much higher? With many correct fits now occurring in first place, the rough extrapolation used earlier no longer applies. Although few good shape matches may appear by chance among 61 fragments, they may well arise among thousands. To answer the question we need to experiment with much larger sets. Unfortunately no such sets are available, while tearing and measuring thousands of artificial fragments is not an enticing prospect. We have therefore created sets of "virtual fragments". The comparison process uses only the left- and right-profiles of each fragment. We therefore generate vertical "tears" which form the right- profiles of one group of fragments and the left-profiles of an adjacent one. A tear is simply a sequence of measurements, each randomly displaced from the one above. The displacements are chosen from a non- uniform distribution, small changes being common, larger ones less frequent. In fact, the actual distribution determines how rough or smooth the edges of the virtual fragments are. In practice, the shapes of these virtual fragments may be too perfect. Since the right-profile of a fragment and the left-profile of its neighbour come from the same tear, the correct match will involve no gap at all, helping to ensure that it is always the best. We therefore submit the virtual fragments to a process of "wear". Each profile measurement is altered by a random amount to reduce the quality of the fit between correctly matching profiles.

Results

Many sets of fragments ranging from about 60 to 5400 in number were simulated and compared using the matching process. Typical results are shown in Table 1. For sets of around 60 fragments, the average number of correct matches occurring in the first three places is around 88%, very close to the results obtained for the paragraph from Alice. As expected, this percentage diminishes as the size of the set increases. For 3600 fragments, it is still about 66%. In other words, if we have a set of 3600 fragments whose profiles do not deviate substantially from our virtual sets, we expect to find the correct match among our three top choices about two-thirds of the time. This meets the objective set earlier. In fact, we can expect even better results if we apply the process in both directions, and use it interactively, so that the human scholar confirms some matches and thereby eliminates some profiles from later comparisons. The comparison program is written in C, and was run on a 500Mhz PC. Table 2 shows the time taken to carry out the comparisons for sets of different sizes. In principle, the time should be proportional to the square of the number of fragments, and this is closely borne out by the last three observations. (The processing portion of the two smaller cases is swamped by the initialization and output portions of the program.) Even for very large sets the process is feasible on current computers.

Table 1. Typical results of the comparison process.
	number of fragments	number of correct matches among top three	percentage
	62	50	80.7%
	62	56	90.3%
	58	54	93.1%
average			87.9%
	199	152	76.4%
	181	157	86.7%
	193	146	75.6%
average			79.4%
	406	303	74.6%
	407	284	69.8%
	383	297	77.5%
average			73.9%
	3636	2443	67.2%
	3667	2448	66.8%
	3627	2412	66.5%
average			66.8%

Table 2. Timings for sets of different sizes
fragments	time (seconds)
60	1
111	1.5
272	4
1084	48
5381	1100 (= 18 minutes 20 seconds)

Further Stages

The comparison process is the central component of the reconstruction task. It permits many of the fragments to be combined into pairs and, by extension, into horizontal bands, from which scholars can easily complete the reconstruction. A complete program suite might go further, making use of digital photographs to display best fits for human judgement, applying the comparison process in the vertical direction to the horizontal bands, and so on. One time-consuming activity remains -- that of accurately measuring fragments to obtain their profiles. We have therefore investigated the possibility of deriving measurements directly from digital photographs. Some of our artificial fragments were photographed horizontally against a bright red ground using a digital camera. For this discussion we assume that the text is black on white. Our purpose is simply to distinguish three kinds of pixel: text, "paper" and background. If the text is faded or the material dirty, there are well-known digital processes for "cleaning" the image. The horizontal rows of pixels are scanned, and the numbers of black pixels in each row are counted. In principle, the numbers will be near zero for the inter-line gaps, higher for the bodies of text lines, and intermediate for rows corresponding to risers and descenders if any. This lets us locate the pixel rows which delimit the body of each text line; after which, scanning the top, middle and bottom rows of each line to find where red pixels stop and start again allows us to compute the desired measurements. Experiments confirm that this holds true in practice. In essence, then, the processes described here provide a computer solution to Problem B.

References

M. Levison. “The Siting of Fragments.” Computer Journal. 1965. 7: 275-277.

M. Levison. “The Computer in Literary Studies.” Machine Translation. Ed. A. D. Booth. Amsterdam: North-Holland Publishing Company, 1967. 173-194.

M. Levison J. Wu. “The Jigsaw Puzzle Problem Revisited.” ACH-ALLC 1999, Charlottesville. : , 1999.

J. A. Ogden. “The siting of papyrus fragments: an experimental application of digital computers.” University of Glasgow, 1969.