## Identifying and labeling multiple letters in an image

To learn AI/ML, I have taken on the task of building a Boggle solver. The idea is that the system takes an image of a Boggle board, identifies the letters and their positions relative to one another, and then matches them against a dictionary.

Using Blender, I wrote a script that automatically generates scenes with varying camera orientations, letter positions, etc. For each render it also produces a corresponding segmentation mask and a list of letters, for example:

```
letters: [20, 3, 23, 1, 23, 15, 24, 7, 22, 12, 11, 3, 24, 9, 18, 20, 1, 0, 10, 23, 20, 25, 11, 8, 21]
```
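
A minimal sketch of how a label list like this could be turned into the `(25, 27)` target that the network's final `Reshape` layer outputs. It assumes (hypothetically, since the question does not say) that the 25 entries are in row-major order on a 5×5 board, that indices 0 to 25 map to `A` to `Z`, and that index 26 is an extra class such as Boggle's `Qu` face. The helper names are illustrative only.

```python
# Hypothetical class mapping (an assumption, not stated in the question):
# 0-25 -> 'A'-'Z', 26 -> 'Qu'.
NUM_CLASSES = 27
BOARD_SIZE = 5


def index_to_symbol(i: int) -> str:
    """Map a class index to a printable symbol under the assumed mapping."""
    if 0 <= i < 26:
        return chr(ord("A") + i)
    if i == 26:
        return "Qu"
    raise ValueError(f"class index out of range: {i}")


def to_grid(letters: list[int]) -> list[list[str]]:
    """Arrange a flat, row-major list of 25 class indices as a 5x5 board."""
    if len(letters) != BOARD_SIZE * BOARD_SIZE:
        raise ValueError("expected 25 letters")
    return [
        [index_to_symbol(i) for i in letters[r * BOARD_SIZE:(r + 1) * BOARD_SIZE]]
        for r in range(BOARD_SIZE)
    ]


def to_one_hot(letters: list[int]) -> list[list[float]]:
    """Build a 25 x 27 one-hot target: one row per tile, one column per class."""
    target = [[0.0] * NUM_CLASSES for _ in letters]
    for tile, cls in enumerate(letters):
        if not 0 <= cls < NUM_CLASSES:
            raise ValueError(f"class index out of range: {cls}")
        target[tile][cls] = 1.0
    return target


letters = [20, 3, 23, 1, 23, 15, 24, 7, 22, 12, 11, 3, 24, 9, 18,
           20, 1, 0, 10, 23, 20, 25, 11, 8, 21]
for row in to_grid(letters):
    print(" ".join(row))
```

Under this assumed mapping, the first row of the example board prints as `U D X B X`. Keeping the target as one row per tile is what makes a per-tile loss possible later.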

I initially tried to use a U-Net to reconstruct the mask, but that was unsuccessful. I now suspect that reconstructing the mask is unnecessary, and I would like to go straight from the rendered image to the list of letters. What would be a good approach for this type of problem? My current network is the encoder (contracting) half of a U-Net, followed by Dense layers that reduce it to a one-hot tensor for the entire list of letters (25 positions × 27 classes), but it fails to make any progress: accuracy hovers around 0.04.
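
Two observations may help narrow this down. First, an accuracy of about 0.04 is roughly chance level for 27 classes (1/27 ≈ 0.037), so the network appears to be guessing. Second, because each of the 25 tiles has exactly one correct class, a common setup for a `(25, 27)` output is a softmax over the 27 classes of each tile separately (in Keras, a softmax along the last axis after the `Reshape`), trained with categorical cross-entropy averaged over tiles. The stdlib-only sketch below illustrates that per-tile arithmetic on plain Python lists; the helper names are hypothetical and it is not part of the original model.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over one tile's class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def per_tile_loss_and_accuracy(logits: list[list[float]], labels: list[int]):
    """Mean cross-entropy and accuracy over tiles.

    logits: 25 rows of 27 raw scores (the network output after Reshape).
    labels: 25 integer class indices (the letter list).
    """
    total_loss = 0.0
    correct = 0
    for scores, label in zip(logits, labels):
        probs = softmax(scores)  # normalise within this tile only
        total_loss += -math.log(probs[label])
        predicted = max(range(len(probs)), key=probs.__getitem__)
        correct += int(predicted == label)
    n = len(labels)
    return total_loss / n, correct / n


NUM_CLASSES = 27
labels = [20, 3, 23, 1, 23, 15, 24, 7, 22, 12, 11, 3, 24, 9, 18,
          20, 1, 0, 10, 23, 20, 25, 11, 8, 21]

# All-zero logits give a uniform distribution for every tile,
# so the mean cross-entropy equals ln(27).
uniform = [[0.0] * NUM_CLASSES for _ in labels]
loss, acc = per_tile_loss_and_accuracy(uniform, labels)
print(f"mean loss {loss:.3f} vs ln(27) = {math.log(NUM_CLASSES):.3f}")
```

If the training loss stays close to ln 27 ≈ 3.30, that is consistent with the model emitting near-uniform predictions for every tile.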

```
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         [(None, 256, 256, 3)]     0
_________________________________________________________________
lambda (Lambda)              (None, 256, 256, 3)       0
_________________________________________________________________
conv2d (Conv2D)              (None, 256, 256, 16)      448
_________________________________________________________________
dropout (Dropout)            (None, 256, 256, 16)      0
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 256, 256, 16)      2320
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 128, 128, 16)      0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 128, 128, 32)      4640
_________________________________________________________________
dropout_1 (Dropout)          (None, 128, 128, 32)      0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 128, 128, 32)      9248
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 64, 64, 32)        0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 64, 64, 64)        18496
_________________________________________________________________
dropout_2 (Dropout)          (None, 64, 64, 64)        0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 64, 64, 64)        36928
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 32, 32, 64)        0
_________________________________________________________________
conv2d_8 (Conv2D)            (None, 32, 32, 128)       73856
_________________________________________________________________
dropout_5 (Dropout)          (None, 32, 32, 128)       0
_________________________________________________________________
conv2d_9 (Conv2D)            (None, 32, 32, 128)       147584
_________________________________________________________________
flatten (Flatten)            (None, 131072)            0
_________________________________________________________________
dense_3 (Dense)              (None, 1024)              134218752
_________________________________________________________________
dropout_6 (Dropout)          (None, 1024)              0
_________________________________________________________________
dense_4 (Dense)              (None, 675)               691875
_________________________________________________________________
reshape (Reshape)            (None, 25, 27)            0
```
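
As a sanity check on the summary above, the parameter counts can be reproduced by hand: a k×k `Conv2D` has `k·k·C_in·C_out + C_out` parameters and a `Dense` layer has `N_in·N_out + N_out`. The sketch below assumes 3×3 kernels (inferred from the counts, not stated in the question). It shows that `dense_3`, which connects the 131072 flattened features to 1024 units, holds over 99% of the model's roughly 135 million parameters, while all the convolutional layers together hold fewer than 300,000.

```python
def conv2d_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights + biases of a k x k Conv2D layer."""
    return k * k * c_in * c_out + c_out


def dense_params(n_in: int, n_out: int) -> int:
    """Weights + biases of a Dense layer."""
    return n_in * n_out + n_out


# (c_in, c_out) for each Conv2D in the summary, in order.
conv_layers = [(3, 16), (16, 16), (16, 32), (32, 32),
               (32, 64), (64, 64), (64, 128), (128, 128)]
conv_total = sum(conv2d_params(ci, co) for ci, co in conv_layers)

flat = 32 * 32 * 128                   # Flatten output: 131072 features
dense_3 = dense_params(flat, 1024)     # matches 134218752 in the summary
dense_4 = dense_params(1024, 25 * 27)  # matches 691875 in the summary
total = conv_total + dense_3 + dense_4

print(f"all Conv2D layers: {conv_total:,}")
print(f"dense_3: {dense_3:,} ({dense_3 / total:.1%} of {total:,})")
```

In other words, almost all of the model's capacity sits in a single fully connected layer fed by the 32×32×128 feature maps.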