Reconstruct text pages cut by shredder



Q1: I have a page of a printed text file, but it was cut by a paper shredder (from top to bottom). You can download all the paper fragments here. Here are some of them. How can I reconstruct it?

shredded text

Q2: This time, the situation is even worse - the page was cut by a shredder twice (from top to bottom and from left to right)! (Get the fragments here).

really shredded text

My code for Q1:

filenames = FileNames["*.bmp", "C:\\Users\\Liao\\Desktop\\B\\file1"];
pics = Import /@ filenames;
binarypic = Binarize[#, 0.9] & /@ pics;
binarydata = ImageData /@ binarypic;
leftandrightdata = 
MapIndexed[{#2[[1]], #1[[All, 1]], #1[[All, -1]]} &, binarydata];
result = {};
firstPic = Select[leftandrightdata, ! MemberQ[#[[2]], 0] &][[1, 1]];
start = leftandrightdata[[firstPic]];
DeleteCases[leftandrightdata, start] //. 
      x_ /; Length[x] > 0 :> (sort = 
   SortBy[x, HammingDistance[#[[2]], start[[3]]] &];
     AppendTo[result, start = sort[[1]]]; Rest[sort]);
ImageAssemble[{pics[[Prepend[result[[All, 1]], firstPic]]]}]

This code may look confusing, but that doesn't matter. The core idea is simple and straightforward: use HammingDistance between the edge columns to decide whether two images can be joined.
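As a minimal illustration of that idea on made-up data (not the contest fragments): two strips fit when the right-hand pixel column of one is nearly identical to the left-hand pixel column of the other, so the Hamming distance between the two binarized columns is small. The names rightEdge, leftGood and leftBad are hypothetical.

```mathematica
(* toy binarized edge columns: 1 = white, 0 = black *)
rightEdge = {1, 0, 0, 1, 1, 0, 1, 1};  (* right column of some strip *)
leftGood  = {1, 0, 0, 1, 1, 1, 1, 1};  (* left column of its true neighbour *)
leftBad   = {1, 1, 1, 0, 0, 1, 1, 0};  (* left column of an unrelated strip *)

(* number of positions where the two columns differ *)
HammingDistance[rightEdge, leftGood]  (* 1 *)
HammingDistance[rightEdge, leftBad]   (* 6 *)
```

The strip whose left column gives the smallest distance is taken as the next one to the right.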

But for Q2, I can't really find a good way to solve it.


Posted 2013-09-19T01:39:23.387

Reputation: 2 533

How did you manage to put those strips into the shredder? :p – Kuba – 2013-09-19T04:58:16.530

@Kuba It was tough work! Well, actually it's a problem from a Mathematical Contest in Modeling. Now that the contest has finished, I am posting it here, looking forward to fun and excellent answers. – mmjang – 2013-09-19T05:27:34.290

You can try ImageAssemble on each permutation, then TextRecognize -> DictionaryLookup -> Length. I think it would be fun-compact-neverending :) – Kuba – 2013-09-19T10:13:31.833

@Kuba It's too horrible to imagine what will happen :p – mmjang – 2013-09-19T14:54:25.140

@mm.Jang Just out of curiosity: this is not real shredded paper, is it? – Murta – 2013-09-20T03:17:09.160

Can you upload the files in a non-.rar format? Perhaps a .zip or .tar.gz? – rm -rf – 2013-09-20T03:17:27.770

@Murta I suppose not, stripes fit too well :) – Kuba – 2013-09-20T03:31:54.597

@rm-rf Oh, I'm sorry. I have changed the links to .zip format. – mmjang – 2013-09-20T06:10:26.163

I came up with a clever algorithm involving constructing a graph over the pieces in Q2 and then finding its maximum-weight spanning tree. But I need the edges to be labeled, and the result of Combinatorica's MaximumSpanningTree discards edge labels :( – None – 2013-09-20T09:26:57.800

is the data from Q2 the same as Q1 ? just trying to understand how the images are organised. Are the strips cut in the same amount of squares ? – lalmei – 2013-10-05T14:44:54.737

@lalmei it's 19x11 – Dr. belisarius – 2013-10-06T18:02:21.710

@belisarius Spoiler! that is the part of the question :p – Kuba – 2013-10-07T08:20:58.930

@Kuba And that is part of my answer :) – Dr. belisarius – 2013-10-07T11:24:16.987

@Kuba Now seriously, I can't find a clean way without using too many heuristics (i.e. guessing). – Dr. belisarius – 2013-10-07T11:26:33.280

@belisarius Me too. By modifying my code I was able to create one full horizontal line and a couple of blocks of 2-5 pieces, which is of course not enough. I'm afraid the pieces are too small, and I failed even at extracting the borders of the image. For example, there is a picture with only an "A" in the middle :P. I know where it should go, but how to do this automatically? Since I'm not an IT/algorithms specialist, I'll just wait for an answer from someone smarter :) – Kuba – 2013-10-07T11:32:24.713

In the blockbuster movie "The tourist" they show a computer reconstructing a letter from burnt fragments. Even in the movie it was a difficult process. And ultimately it proved useless. – magma – 2013-10-07T14:55:56.257

@magma in the movie I think the person set it on fire too, let's hope that's not the parameters for a Q3. – lalmei – 2013-10-07T15:41:26.473

@RahulNarain I was able to modify a basic maximum-spanning-tree implementation of Kruskal's algorithm: in the "finding a cycle" part of the algorithm you can simply strip any edge labels. My problem is finding a good weight for the edges (ignoring white edges). – lalmei – 2013-10-08T11:34:31.113

Andrew Glassner considered this problem in this note; I haven't gotten around to seeing whether any of the answers below correspond to an implementation of Glassner's ideas. – J. M.'s ennui – 2015-05-04T03:55:51.700

@J.M. Thanks for the reference! It seems his clustering algorithm is similar to what I created, with some modifications so that pieces don't overlap for multiple-sided pieces. The problem with the words is that there are great matchings between wrong pieces in all the weighting methods I tried. I will expand my post to show the problems at some point. He also suggested removing white edges from the algorithm. This is a great read! – lalmei – 2015-05-04T09:26:09.230




pics = Binarize /@ (Import /@ FileNames["*.bmp"]);

(*extracting edges*)
left = ImageData[#][[All, 1]] & /@ pics; 
right = ImageData[#][[All, -1]] & /@ pics;

Let's find the left edge, here I'm assuming that it is all white. So if

pics // First // ImageDimensions
{72, 1980}

and an image is binarized then the left edge is:

Total /@ left // Position[#, 1980, 1] &

Instead of HammingDistance I'm calculating Norm:

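cons = {4}; (* start from strip 4, taken here to be the one with the all-white left edge *)
Do[
  (* distance from the right edge of the last placed strip to every left edge *)
  With[{dists = Norm[right[[cons[[-1]]]] - #] & /@ left}, 
   AppendTo[cons, Position[dists, Min@dists, 1][[1, 1]]]],
  {Length@pics - 1}]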
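A toy run of the same chaining loop, as a sketch on invented edge data rather than the real strips: three strips are given in shuffled order, each described only by its left and right pixel column. The values in left and right are made up so that the correct order is 2, 3, 1.

```mathematica
(* left and right edge columns of three shuffled toy strips *)
left  = {{0, 0, 1, 1}, {1, 1, 1, 1}, {1, 0, 1, 0}};
right = {{0, 1, 1, 0}, {1, 0, 1, 0}, {0, 0, 1, 1}};

(* the first strip is the one whose left edge is all white (sum = column height) *)
start = Position[Total /@ left, 4, 1][[1, 1]];

cons = {start};
Do[
  With[{dists = Norm[right[[Last[cons]]] - #] & /@ left},
   AppendTo[cons, Position[dists, Min[dists], 1][[1, 1]]]],
  {Length[left] - 1}];

cons  (* {2, 3, 1} *)
```

Like the loop above, this sketch does not exclude strips that were already placed; that is fine while every true neighbour is also the nearest edge, but it can revisit a strip on noisier data.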


enter image description here



Reputation: 129 207

Scroll down, my answer only solves the simple case. – Kuba – 2015-04-30T05:25:56.037


Greedy algorithm (1D & 2D)

Data loading:

data = N@ImageData@Import[#] & /@ FileNames["*.bmp"] // Developer`ToPackedArray;
n = Length[data];

How well two edges $e_1$ and $e_2$ match is measured by the ratio

$$ \frac{\mathop{\rm Var} [e_1+e_2]}{\mathop{\rm Var} [e_1-e_2]} $$

where $\mathop{\rm Var}[X]$ is the variance of $X$

r = 7;
var[list_] := 
  Total[#, {3, 4}] &[(Transpose[list, {1, 2, 4, 3}] - 
     Mean@Transpose[list, {2, 3, 1, 4}])^2];
corr[list1_, list2_] := 
  var@GaussianFilter[list1 + list2, {{0, 0, r, 0}}]/(0.0001 + 
     var@GaussianFilter[list1 - list2, {{0, 0, r, 0}}]);

Here r=7 is the smoothing radius. It is very helpful for real images (see below). 0.0001 is a small number that prevents division by zero.
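To see why this ratio works, here is a stripped-down sketch on made-up 1D grayscale edges, without the Gaussian smoothing and without colour channels (so it is not the var/corr code above). When two edges are similar, their sum varies a lot while their difference is nearly constant, so the ratio is large. The names score, a, near and far are hypothetical.

```mathematica
(* variance ratio for two plain lists of gray values *)
score[e1_, e2_] := Variance[e1 + e2]/(0.0001 + Variance[e1 - e2]);

a    = {0.1, 0.8, 0.9, 0.2, 0.1, 0.7};  (* edge of one tile *)
near = {0.2, 0.7, 0.9, 0.1, 0.2, 0.8};  (* almost the same profile *)
far  = {0.9, 0.1, 0.2, 0.8, 0.7, 0.1};  (* roughly the opposite profile *)

score[a, near] > score[a, far]  (* True *)
```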

Comparing each tile with every other tile:

top = corr[ConstantArray[data[[All, 1]], n], 
   Transpose@ConstantArray[data[[All, -1]], n]];
bottom = Transpose[top];
left = corr[ConstantArray[data[[All, All, 1]], n], 
   Transpose@ConstantArray[data[[All, All, -1]], n]];
right = Transpose[left];

Tiles in the resulting mosaic and possible positions of the next tiles:

tiles = SparseArray[{}, {2 n, 2 n}];
next = {};

The addition of the tile num to the position {i,j} of the resulting mosaic and the update of next tiles:

add[num_, {i_, j_}] := (next = DeleteCases[next, {i, j}];
  tiles[[i, j]] = num;
  left[[num, All]] = right[[num, All]] = top[[num, All]] = bottom[[num, All]] = 0.0; 
  If[tiles[[##]] == 0 && Not@MatchQ[next, {##}], AppendTo[next, {##}];] & 
      @@@ {{i, j + 1}, {i, j - 1}, {i + 1, j}, {i - 1, j}})

Quality of the matching of all possible tiles at the position {i,j} is just the sum of the matching with neighbors:

quality[i_, j_] := 
  If[#2 != 0, #[[All, #2]], 0 #[[All, 1]]] & @@@ {{left, 
      tiles[[i, j + 1]]}, {right, tiles[[i, j - 1]]}, {top, 
      tiles[[i + 1, j]]}, {bottom, tiles[[i - 1, j]]}} // Total;

Addition of two tiles with the best matching:

(add[#1, {n, n}]; add[#2, {n, n + 1}]) & @@@ Position[left, Max[left]];

Addition of n-2 remaining tiles to the best positions:

Do[add[#2, next[[#]]] & @@ First@Position[#, Max[#]] &[quality @@@ next], {n - 2}];

The result for Q1:

view[margins__] := 
 Image@Flatten[#, {{1, 3}, {2, 4}}] &@
  Map[If[# <= 0, 1 + 0 data[[1]], data[[#]]] &, 
   tiles[[#1 ;; #2, #3 ;; #4]] &[margins], {2}]

view[Min[#1], Max[#1], Min[#2], Max[#2]] & @@ Transpose@tiles["NonzeroPositions"]

enter image description here

Unfortunately Q2 has too many blank edges to reconstruct, so I use the Lena photo instead:

data = N@ImageData /@ #[[PermutationList@RandomPermutation@Length[#]]] &@
  Flatten@ImagePartition[ExampleData[{"TestImage", "Lena"}], 64];

After repeating the steps above we get the result:

enter image description here

Without smoothing of edges (see r above) the matching fails because of noise and fine structure (especially in the hat).



Reputation: 41 907

Cool! But my bounty has expired......:-/ – mmjang – 2013-10-10T06:11:20.593

@mm.Jang If the bounty has expired, is the reputation lost? I have never started a bounty. – ybeltukov – 2013-10-10T19:21:21.477

Very nice! My suggestion for the blank edges may be to give them huge (unnatural) weights, that way it avoids connecting any white edges, if there is no way to move across a image with blank edges (in the case of the letter) it may be a natural gap. – lalmei – 2013-11-01T16:13:36.400

I'm trying to understand your weights between edges better (specifically how those transpose's are acting on the image data), since I'm not very familiar with image analysis. Are you taking a variance over each color channel ? then summing ? – lalmei – 2013-11-04T17:15:55.067

@lalmei Yes, I sum squares of every edge pixel and each color channel (Total with parameter {3,4}). Transpose is a bit complicated, but it helps me to subtract the mean value (also with each channel separately). – ybeltukov – 2013-11-04T17:23:00.040


Q2: This answer has been sitting in my trunk of files for a year and a half now. I was hoping to improve it and use it to solve the original problem but never got around to it. At the moment it solves images/photos with noisy data. Might as well post it before I forget how it even works.

So I wanted to solve this problem with a Minimum Spanning Tree Algorithm. I decided to use Kruskal's algorithm, the idea being that people solve a jigsaw puzzle as they find pieces that match (maybe there is a way to parallelize this). This way you build a Forest of Trees made up of the best matches. So you can stop at any moment, and in the worst case you have a bunch of larger matched pieces.
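For readers unfamiliar with it, here is a minimal sketch of plain Kruskal on a toy graph. It is not the code used in this answer: it detects cycles with a simple union-find instead of the depth-first search used below, and the names kruskal, find and edges are made up for the illustration. Edges are taken in order of increasing weight and kept only if they join two different trees.

```mathematica
(* edges are given as {weight, u, v}; vertices are numbered 1..n *)
kruskal[n_Integer, edges_List] :=
 Module[{parent = Range[n], find, tree = {}},
  (* find the representative of the tree that contains x *)
  find[x_] := If[parent[[x]] == x, x, find[parent[[x]]]];
  Do[
   With[{ru = find[e[[2]]], rv = find[e[[3]]]},
    (* different representatives: the edge closes no loop, so keep it *)
    If[ru != rv,
     parent[[ru]] = rv;
     AppendTo[tree, e]]],
   {e, SortBy[edges, First]}];
  tree]

edges = {{4, 1, 2}, {1, 2, 3}, {3, 1, 3}, {2, 3, 4}, {5, 2, 4}};
kruskal[4, edges]
(* {{1, 2, 3}, {2, 3, 4}, {3, 1, 3}} *)
```

Stopping the loop early leaves a forest of partial trees, which is the property exploited here.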

At the time my idea was to use this algorithm for the main question and get it to a point where matching by text recognition wasn't too time-consuming, since you would then have fewer, larger pieces with bigger chunks of words on them (make a greedy algorithm that matches the pieces so as to maximize the number of English words found). But even then, text recognition took a very long time.

The main problem I ran into was making sure that when you merge two trees the images don't overlap. I made a modification to make it work; I call this the Minimum Spanning Geometrical Tree. There are two functions: Chargeds, which gives the relative positions of the pieces in a tree, and ChargedPath, which checks whether there is a path to a specific relative position for a piece. Here I call the positions charges, since I envisioned this for more general use (e.g. particle tracking).

The main part of the code builds an adjacency list like this

$$ \{node1, \{\{\text{"up"}, node2\}, \{\text{"down"}, node24\}, \{\text{"left"}, node64\}, \cdots\}\}, \quad \{node2, \cdots\}, \quad \cdots $$

except that instead of "up", "down", "left", "right" I use 1, 2, 3, 4 (maybe not in that order).


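For weights I used the same variance measure as @ybeltukov, since it works best for noisy images. Note that the ratio below is inverted relative to his, so a smaller value means a better match, which is what a minimum spanning tree needs.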

r = 12;
var2[list_] := Total@(Variance[#] & /@ (Thread[list]));
corr[list1_, list2_] := 
   var2@GaussianFilter[list1 - list2, {{0, 0, r, 0}}]/
   (var2@GaussianFilter[list1 + list2, {{0, 0, r, 0}}]);

And I organize the edge pixels as the OP did.

testdata = #[[PermutationList@RandomPermutation@Length[#]]] &@
 Flatten@ImagePartition[ExampleData[{"TestImage", "Lena"}], 64];
pics = testdata;

test2 = ImageData[#] & /@ pics;
test3 = MapIndexed[{#2[[1]], #1[[All, 1]], #1[[All, -1]], 
     #1[[1, All]], #1[[-1, All]]} &, test2];

Then I create a distance matrix: it computes the $n^2$ distances between the edges, for left/right and up/down. It would be better organized as a bipartite graph, but this is how it is for now. Amazingly, this is the slowest part of the code.

n = Length[pics];
leftrightF = 
  Table[If[i == j, 64^2, N[corr[test3[[i, 3]], test3[[j, 2]]]]], {i, 
    1, n}, {j, 1, n}];
updownF = 
  Table[If[i == j, 64^2, N[corr[test3[[i, 5]], test3[[j, 4]]]]], {i, 
    1, n}, {j, 1, n}];

Minimum Spanning Geometrical Tree

The main ingredient of Kruskal's algorithm is that when you add an edge you have to make sure you create no loops. If there is already a path between two points, adding a connection between them would create a loop, so this function checks whether such a path exists. I use Throw and Catch with a While loop for a depth-first search, and pack it all inside a Return. (There might be a more efficient way to do this with Scan.)

ITAPath[INadjgraph_List, point1_Integer, point2_Integer, label_: 1] :=
 Module[{adjgraph = fillin[Function[{y}, 
       {y[[1]], Function[{x}, x[[2]]] /@ (y[[2]])}] /@ INadjgraph], 
   stack = {}, currentpos, marked},
  Return[Catch[
    marked = Table[False, {i, 1, Length[adjgraph]}];
    marked[[point1]] = True;
    stack = Join[stack, adjgraph[[point1, 2]]];

    If[point1 == point2, Throw[True];];
    While[Length[stack] != 0,
     currentpos = Last@stack;
     stack = Drop[stack, {-1}];
     (* skip nodes that were already visited; this check and the closing lines
        are restored to match Chargeds below *)
     If[! marked[[currentpos]],
      If[currentpos == point2, Throw[True];];
      marked[[currentpos]] = True;
      stack = Join[stack, adjgraph[[currentpos, 2]]];
      ];
     ];
    (* stack exhausted: no path between the two points *)
    Throw[False];
    ]]]

This is essentially a depth-first search to see if there is a path (ITAP) between two points in the adjacency list. I use a fill function (fillin) to make the edges undirected before traversing. For some reason I keep the adjacency list itself directed. (Actually I think this is because the minimum weight between two edges does not always lead back to the same node.)

The second ingredient is to make sure the new point added to the tree does not overlap (in 2D space) with other points already in the tree. I do this by looking up the "charge", or position, of each point in the tree relative to the new point. This is done with the ChargedPath function:

ChargedPath[INadjgraph_List, point1_Integer, charge_] :=
 Module[{adjgraph = fillin2@INadjgraph, stack = {}, currentpos, 
   marked, chargem, chargeadd, side, parent},
  Return[Catch[
    marked = Table[False, {i, 1, Length[adjgraph]}];
    parent = Table[Null, {i, 1, Length[adjgraph]}];
    marked[[point1]] = True;
    chargem = {0, 0};
    stack = Join[stack, adjgraph[[point1, 2]]];
    (* remember, for every neighbour, the charge of the node it was reached from *)
    Scan[(parent[[#[[2]]]] = {chargem, point1}) &, 
     adjgraph[[point1, 2]]];

    If[chargem == charge, Throw[True];];
    While[Length[stack] != 0,
     {side, currentpos} = Last@stack;
     stack = Drop[stack, {-1}];
     (* visited check and closing lines restored to match Chargeds below *)
     If[! marked[[currentpos]],
      chargeadd = 
       Which[side == 1, {0, -1}, 
        side == 2, {0, 1}, 
        side == 3, {-1, 0},
        side == 4, {1, 0}];
      chargem = parent[[currentpos, 1]];
      chargem = chargeadd + chargem;
      If[chargem == charge, Throw[True];];
      marked[[currentpos]] = True;
      stack = Join[stack, adjgraph[[currentpos, 2]]];
      Scan[(parent[[#[[2]]]] = {chargem, currentpos}) &, 
       adjgraph[[currentpos, 2]]];
      ];
     ];
    (* no node of the tree sits at the requested charge *)
    Throw[False];
    ]]]

Again it is a Depth-First Search algorithm to quickly find if there is a point with the same "charge" or position.
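The geometric test itself can be sketched independently of the graph traversal. The following is not the ChargedPath/Chargeds code from this answer; it assumes the relative positions of each tree are already known and stored in an Association, and the names treeA, treeB and overlapQ are hypothetical. Joining piece b of one tree at offset d from piece a of the other shifts the whole second tree, and the merge is only allowed if no two pieces land on the same grid cell.

```mathematica
(* positions of the pieces in two assembled trees, each relative to its own root *)
treeA = <|1 -> {0, 0}, 2 -> {1, 0}, 3 -> {1, 1}|>;
treeB = <|4 -> {0, 0}, 5 -> {0, 1}|>;

(* would the trees overlap if piece b were placed at offset d from piece a? *)
overlapQ[tA_, tB_, a_, b_, d_] :=
 With[{shift = tA[a] + d - tB[b]},
  IntersectingQ[Values[tA], (# + shift) & /@ Values[tB]]]

overlapQ[treeA, treeB, 2, 4, {1, 0}]  (* False: treeB lands on free cells *)
overlapQ[treeA, treeB, 1, 4, {1, 0}]  (* True: piece 4 would land on piece 2 *)
```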

These are some housekeeping functions: fillin and fillin2 fill the adjacency list with undirected edges for traversing the graph, and Chargeds lists the charges (positions) of each node in the tree.

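(* add the reverse of every edge, for an unlabeled adjacency list {node, {neighbours}} *)
fillin[adj_] := Module[{adjtest = adj},
   Scan[Function[{y},
     Scan[Function[{x},
       If[! MemberQ[adjtest[[x, 2]], y[[1]]],
        adjtest[[x, 2]] = Append[adjtest[[x, 2]], y[[1]]];
        ]], y[[2]]];
     ], adjtest];
   adjtest];

(* opposite side label: 1 <-> 2, 3 <-> 4 *)
reve[y_] := Which[y == 1, 2, y == 2, 1, y == 3, 4, y == 4, 3];

(* same as fillin, for a labeled adjacency list {node, {{side, neighbour}, ...}} *)
fillin2[adj_] := Module[{adjtest = adj},
   Scan[Function[{y},
     Scan[Function[{x},
       If[! MemberQ[adjtest[[x[[2]], 2]], {reve[x[[1]]], y[[1]]}],
        adjtest[[x[[2]], 2]] = 
         Append[adjtest[[x[[2]], 2]], {reve[x[[1]]], y[[1]]}];
        ]], y[[2]]];
     ], adjtest];
   adjtest];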

Chargeds[INadjgraph_List, point1_Integer] :=
 Module[{adjgraph = fillin2@INadjgraph, stack = {}, currentpos, 
   marked, chargem, chargeadd, side, parent},
  (* the Reap wrapper is restored so that Last@Chargeds[...] gives the list of
     {node, charge} pairs, which is how the function is used below *)
  Last@Reap[
    marked = Table[False, {i, 1, Length[adjgraph]}];
    parent = Table[Null, {i, 1, Length[adjgraph]}];
    marked[[point1]] = True;

    chargem = {0, 0};
    Sow[{point1, chargem}, 1];
    stack = Join[stack, adjgraph[[point1, 2]]];

    Scan[(parent[[#[[2]]]] = {chargem, point1}) &, 
     adjgraph[[point1, 2]]];

    While[Length[stack] != 0,
     {side, currentpos} = Last@stack;
     stack = Drop[stack, {-1}];
     If[! marked[[currentpos]],
      chargeadd = 
       Which[side == 1, {-1, 0}, 
        side == 2, {1, 0}, 
        side == 3, {0, 1},
        side == 4, {0, -1}];

      chargem = parent[[currentpos, 1]];
      chargem = chargeadd + chargem;

      Sow[{currentpos, chargem}, 1];

      marked[[currentpos]] = True;
      stack = Join[stack, adjgraph[[currentpos, 2]]];
      Scan[(parent[[#[[2]]]] = {chargem, currentpos}) &, 
       adjgraph[[currentpos, 2]]];
      ];
     ]]]

Building the Adjacency List

Update: simplified the code significantly

Building the spanning tree is the following code.

The main If statements decide which labels to give the new edge, since we are looking at the global minimum over all edges (up, down, left, right). Once we have the correct label, we check for "no loops" (Kruskal's algorithm for the MST) and that the trees do not overlap geometrically (the new part of the algorithm), on top of a bunch of checks to make sure each node has at most 4 edges and that the edge has not already been connected to some other part of the tree.

You can also set a maximum weight: once the smallest remaining weight exceeds it, the algorithm stops and returns the remaining forest. At that point you could apply some additional weights to decide how to match up the remaining trees.

MSGT[leftright_, updown_, n_, minset_: 50., LRsidesize_: 180,UDsidesize_: 72] := 
   Module[{leftrightweights = leftright, updownweights = updown,
        Adjs = Table[{i, {}}, {i, 1, n}], minlr, minud, pos, 
        location2, location1, i, k, directions, labels, 
        Adjs2, oldimage, newimage, newcount, oldcount, t},

     (*Since the result is a tree, I can only add n-1 edges*)       

     k = n - 1;

     While[k > 0,

     (*find global minimum*) 

        minlr = Min[leftrightweights];
        minud = Min[updownweights];

     (* check if the minimum is an up down edge or left right *)
      (* Once I pass by a minimum I replace it with a large weight, It might be easier or faster to just keep track of *)
        If[minlr < minud,
        pos = Position[leftrightweights, minlr];
        leftrightweights [[First@(First@pos), Last@(First@pos)]] =LRsidesize^2;
        directions = {-1, 0};
        labels = {1, 2};
        pos = Position[updownweights, minud];
        updownweights [[First@(First@pos), Last@(First@pos)]] =UDsidesize^2;
        directions = {0, 1};
        labels = {3, 4};

         location1 = pos[[1, 1]];
         location2 =  pos[[1, 2]];

      If[location1 < location2, 
         {location1, location2} = {location1, location2} /. {location1 ->location2,location2 -> location1};
         labels = RotateLeft[labels];
         directions = -directions;

       (* if the weights are less the pre-set minimum quit *)
       If[(minlr > minset && minud > minset), Break[];];

 (* Main meat of the algorithm *)    
(* I group the checks in two sets *)   
(* first set is to check if the adjacency list is being build correctly, i.e. is it full ? does another node points to it from the opposite label etc. *)

If[(MemberQ[Adjs[[location1, 2]], {Last@labels, _}]), Continue[];];
If[(MemberQ[Adjs, {Last@labels, location2}, {3}]), Continue[];];  
If[(MemberQ[Adjs, {First@labels, location1}, {3}]), Continue[];];
If[(MemberQ[Adjs[[location2, 2]], {First@labels, _}]), 

(* The second set checks is the Path check and the charge path check *)

If[(ITAPath[Adjs, location1, location2, 1]), Continue[];];
        Or[##] & @@ (ChargedPath[Adjs, 
          location1, -directions + #] & /@ (Last@
          Thread@Last@Chargeds[Adjs, location2])) && (! 
         Or[##] & @@ (ChargedPath[Adjs, location2, 
           directions + #] & /@ (Last@
           Thread@Last@Chargeds[Adjs, location1])))))),    

 Adjs[[location1, 2]] = 
  Append[Adjs[[location1, 2]], {Last@labels, location2}];
  (* I don't add the incoming edge, but you could, I don't think it affects the algorithm *)
 (*   Adjs[[location2,2]]=Append[Adjs[[location2,2]],{First@


Now we can finally build the adjacency tree:

{ttt, res3} = AbsoluteTiming@MSGT[leftrightF, updownF, n, 3];


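(* drop the side labels, keeping only {node, {neighbours}} *)
adjtest = 
  Function[{y}, {y[[1]], Function[{x}, x[[2]]] /@ (y[[2]])}] /@ res3;
(* list of undirected edges without duplicates *)
d = DeleteDuplicates[
    Sort[#] & /@ ((#[[1]] -> #[[2]]) & /@ (Flatten[
    Thread[#] & /@ adjtest, 1]))];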
GraphPlot[d, VertexCoordinateRules -> Rule @@@ Last@Chargeds[res3, 2]]

enter image description here

Building the image from the Adjacency List

Using VertexRenderingFunction this is rather simple,

GraphPlot[d, VertexCoordinateRules -> Rule @@@ Last@Chargeds[res3, 2],
  VertexRenderingFunction -> (Inset[Image[pics[[#2]], 
     ImageSize -> {41, 41}], #1] &)]

enter image description here

Here is an animation of the building of the puzzle for every step in MSGT,

enter image description here



Reputation: 3 174

No judgements, but it will take me a while to go through it. – rcollyer – 2015-04-29T15:47:44.103

@rcollyer: I wonder if anyone can find a method to reconstruct the entire Lenna image from just the piece Mathematica knows... ;) – J. M.'s ennui – 2015-05-04T03:58:53.837

Hey lalmei, I had just started editing your answer when I saw you were doing the same. Perhaps you can fix the indentation in the first three code blocks? By the way, I love the .gif :). – Jacob Akkerboom – 2015-05-09T11:34:37.790

@JacobAkkerboom I edited most of the code, but it might need more work; I want to do another update tomorrow. By the way, the frames for the gif took 6 hours to render (I used Heike's packing algorithm to produce each frame), but then I ran into the size limit for gifs and had to throw away a third of the frames. I don't know why the exported gif has these edge effects in some of the frames, though. – lalmei – 2015-05-09T13:12:47.773

@lalmei I made a few more tiny edits. I added AbsoluteTiming somewhere, so that the code now completely works when copy-pasting. It looks great! I don't know much about gifs unfortunately. But I often hear that in many cases gifs can be made a lot smaller. The gif looks as if it shouldn't be too big. Great that you had the perseverance to let it finish if it took 6 hours :). – Jacob Akkerboom – 2015-05-09T13:27:39.997

@J.M. gives the background and some of the history (as well as wikipedia). Would probably be possible to write a wolfram api call to search for similar images to the test image... – erfink – 2017-03-19T04:24:28.447

@erfink, I'm well aware of the history; ;) it was merely a joke on unreasonable expectations some people have of reconstruction algorithms. – J. M.'s ennui – 2017-03-19T05:20:16.910



Get pictures

smallSquare = Import /@ FileNames["*", "D:\\...file1"]

Compute a mean threshold for Binarize, so that most of the information in the pictures is preserved.

(* mean gray level over all fragments, used as the Binarize threshold *)
threshold = 
  Mean[Flatten[
    ImageData[ColorConvert[#, "Grayscale"]] & /@ smallSquare]]


Find the leftmost picture, i.e. the one without any text on its left, top and bottom sides.

left = Select[smallSquare, 
  MatchQ[AllTrue[#, EqualTo[1]] & /@ 
     Through[{First, Last, First@*Transpose, Last@*Transpose}[
       ImageData[Binarize[#, threshold]]]], {True, True, True, 
     False}] &]

Using MatchingDissimilarity, I find the sequence of pictures adjacent to the leftmost one, continuing until the rightmost picture, which has no text on its right side.

         First[Nearest[Complement[smallSquare, left], Last[left], 1, 
           DistanceFunction -> (N[MatchingDissimilarity @@ {Last[
           Transpose[ImageData[ColorNegate@Binarize[#1, threshold]]]], 
         First[Transpose[ImageData[ColorNegate@Binarize[#2, threshold]]]]}] &)]]]], 
       threshold]]]], EqualTo[1]]]

Combine all pictures



Well, since nobody has conquered the second question yet, that is a good reason to climb this mountain. But as ybeltukov said here:

Unfortunately Q2 have too many blank edges to reconstruction

I hit the same difficulty and could not overcome it, but this is at least a good attempt, or a starting point, towards the target.

smallSquare = Import /@ FileNames["*", "...\\file2"]
threshold = 
  Mean[Flatten[
    ImageData[ColorConvert[#, "Grayscale"]] & /@ smallSquare]];
data = ImageData[Binarize[#, threshold]] & /@ smallSquare;
UpEdge = MapIndexed[{First[#1], First[#2], {1, 0}} &, data];
DownEdge = MapIndexed[{Last[#1], First[#2], {-1, 0}} &, data];
LeftEdge = 
  MapIndexed[{First[Transpose[#1]], First[#2], {0, 1}} &, data];
RightEdge = 
  MapIndexed[{Last[Transpose[#1]], First[#2], {0, -1}} &, data];
edges = Select[Join[UpEdge, DownEdge, LeftEdge, RightEdge], 
   AnyTrue[First[#], EqualTo[0]] &];
init = 10;
mat = {{19, 19} -> {init, data[[init]]}};
margin = Select[edges, #[[2]] == init &];
Do[{stackMargin, next} = 
        Select[edges = 
          DeleteCases[edges, _?(#[[2]] == Last[margin][[2]] &)], 
         Last[#] == -Last[currentEdge] &]}], {currentEdge, margin}]], 
    MatchingDissimilarity @@ #[[All, 1]] &, 1]];
 nextPos = 
  Select[mat, First[Values[#]] == stackMargin[[2]] &][[1, 1]] + 
 AppendTo[mat, nextPos -> {next[[2]], data[[next[[2]]]]}];
 margin = 
  Join[DeleteCases[margin, stackMargin], 
   Select[edges, (#[[2]] == next[[2]] && 
       Last[#] =!= Last[next] &)]], 11]
   Array[0 &, 
      Keys[matRule = Thread[Keys[mat] -> Last /@ Values[mat]]]]]], 



Reputation: 19 940