Number Plate Recognition
This page describes a simple program that isolates car number plates in fairly well controlled images so that they can be read by an OCR package such as GOCR/JOCR. At present it’s tuned for UK plates, but it should be easily adaptable to other countries that use a Roman alphabet and ‘Arabic’ numerals. The algorithm handles some scaling of the images but not rotation. As cars are upright under normal operating conditions, this is not considered a problem.
How it works
Text is detected in a scene by exploiting a characteristic of its appearance in images: blocks of Roman text contain a high density of vertical lines. The extraction algorithm detects blocks of these features that form rectangles with approximately the aspect ratio of a number plate. These rectangles are then isolated and passed on to the OCR engine. The steps in the extraction process are given below:
The most important step in the process is to obtain a good quality image. Noise, shadows etc. will cause the algorithm to fail.
An averaging filter is used to reduce noise in the image.
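As an illustration of the noise reduction step, here is a minimal sketch of an averaging (box) filter in plain Python. The image representation (a list of rows of grey levels), the function name `mean_filter` and the 3x3 window are my assumptions for the example; the original program's filter size is not stated.

```python
def mean_filter(image, radius=1):
    """Smooth a greyscale image (list of rows) with a box filter.

    The window is (2 * radius + 1) pixels square. Near the border the
    average is taken over the part of the window inside the image.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            row.append(sum(window) / len(window))
        result.append(row)
    return result


# A single bright noise pixel in an otherwise flat area.
noisy = [
    [10, 10, 10],
    [10, 100, 10],
    [10, 10, 10],
]
smoothed = mean_filter(noisy)
print(smoothed[1][1])  # mean of all nine values: 20.0
```

The isolated spike of 100 is pulled down towards its neighbours, which is the effect wanted before edge detection.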
Detect vertical edges
Vertical edges are detected by differentiating the image along the horizontal.
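Differentiating along the horizontal can be sketched as a simple central difference between the pixels to the left and right of each position. The function name `vertical_edges`, the use of the absolute value and the border handling (repeating the edge pixel) are assumptions made for this example, not details taken from the original program.

```python
def vertical_edges(image):
    """Absolute horizontal central difference of a greyscale image.

    A large response means the grey level changes quickly from left to
    right, which is what a vertical edge looks like.
    """
    width = len(image[0])
    result = []
    for row in image:
        out = []
        for x in range(width):
            left = row[max(0, x - 1)]           # repeat the border pixel
            right = row[min(width - 1, x + 1)]
            out.append(abs(right - left))
        result.append(out)
    return result


# A dark-to-bright step running vertically through the image.
step = [[0, 0, 255, 255] for _ in range(3)]
edges = vertical_edges(step)
print(edges[0])  # [0, 255, 255, 0]
```

A horizontal edge gives no response at all, because each row is constant along the direction of differentiation.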
The image is dilated to join up line fragments.
The image is subjected to a 15 pixel 1D convolution in the horizontal direction to filter out small areas of detail and emphasize larger areas.
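The horizontal convolution can be sketched as follows. The text only gives the kernel length (15 pixels), so the uniform kernel weights, the zero padding at the row ends and the name `horizontal_box_filter` are assumptions for illustration.

```python
def horizontal_box_filter(image, length=15):
    """Convolve each row with a uniform 1D kernel of the given length.

    Pixels outside the row are treated as zero. Narrow features are
    strongly attenuated; features at least as wide as the kernel keep
    their full strength.
    """
    half = length // 2
    result = []
    for row in image:
        width = len(row)
        out = []
        for x in range(width):
            total = 0
            for k in range(-half, half + 1):
                i = x + k
                if 0 <= i < width:
                    total += row[i]
            out.append(total / length)
        result.append(out)
    return result


# One row with an isolated 1-pixel spike and a 15-pixel-wide block.
row = [0] * 40
row[5] = 255
for x in range(20, 35):
    row[x] = 255
smoothed = horizontal_box_filter([row])
print(smoothed[0][5], smoothed[0][27])  # 17.0 255.0
```

The spike is reduced to a fifteenth of its height while the centre of the wide block is unchanged, so a later threshold can separate the two.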
The image is thresholded at two standard deviations (2 SD) from the mean pixel value.
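A minimal sketch of this threshold, assuming that pixels more than two standard deviations above the mean are kept and that the population standard deviation of all pixels is used. Both choices, and the name `threshold_two_sd`, are my assumptions.

```python
from statistics import fmean, pstdev


def threshold_two_sd(image, k=2.0):
    """Binarise an image: 1 where pixel > mean + k * SD, else 0."""
    pixels = [p for row in image for p in row]
    cutoff = fmean(pixels) + k * pstdev(pixels)
    return [[1 if p > cutoff else 0 for p in row] for row in image]


# 10 x 10 image: mostly zero, two strong responses.
image = [[0] * 10 for _ in range(10)]
image[4][4] = image[4][5] = 100
binary = threshold_two_sd(image)
print(sum(map(sum, binary)))  # 2
```

Here the mean is 2 and the standard deviation is 14, so the cutoff is 30 and only the two strong pixels survive.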
A dilate-open-dilate sequence is applied to remove tiny areas and join up any fragmented areas.
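The dilate-open-dilate sequence can be sketched with square structuring elements on a binary mask. An opening is an erosion followed by a dilation. The structuring element sizes are not given in the text, so the radii below (1 for the dilations, 2 for the opening) and all function names are assumptions chosen to make the effect visible.

```python
def _morph(mask, radius, combine):
    """Apply max (dilate) or min (erode) over a square window."""
    height, width = len(mask), len(mask[0])
    return [
        [
            combine(
                mask[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            )
            for x in range(width)
        ]
        for y in range(height)
    ]


def dilate(mask, radius=1):
    return _morph(mask, radius, max)


def erode(mask, radius=1):
    return _morph(mask, radius, min)


def dilate_open_dilate(mask, join_radius=1, open_radius=2):
    joined = dilate(mask, join_radius)                             # bridge small gaps
    opened = dilate(erode(joined, open_radius), open_radius)       # drop tiny areas
    return dilate(opened, join_radius)


# Two 5x5 fragments separated by a 2-pixel gap, plus an isolated speck.
mask = [[0] * 24 for _ in range(11)]
for y in range(3, 8):
    for x in list(range(3, 8)) + list(range(10, 15)):
        mask[y][x] = 1
mask[5][20] = 1

cleaned = dilate_open_dilate(mask)
print(cleaned[5][8], cleaned[5][20])  # 1 0
```

The gap between the fragments is filled and the speck disappears, leaving a single solid region for the labelling step.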
Create a list of rectangles
The distinct areas are labelled and their bounding rectangles calculated.
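Labelling and bounding rectangle calculation can be combined in one pass with a flood fill. This sketch assumes 4-connectivity and returns rectangles as `(x, y, width, height)` tuples; both conventions and the name `bounding_rectangles` are mine rather than the original program's.

```python
from collections import deque


def bounding_rectangles(mask):
    """Return (x, y, width, height) for each 4-connected region of 1s."""
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    rectangles = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from the first unseen pixel of a region.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                x, y = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            rectangles.append(
                (min_x, min_y, max_x - min_x + 1, max_y - min_y + 1)
            )
    return rectangles


mask = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
rects = bounding_rectangles(mask)
print(rects)  # [(0, 0, 3, 2), (5, 1, 1, 2)]
```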
Rectangles too small for OCR are removed and rectangles that are approximately the correct aspect ratio of a UK number plate are retained.
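The rectangle filter might look like the sketch below. The target aspect ratio is taken from the standard UK oblong plate size of 520 mm by 111 mm (about 4.7:1). The minimum size and the 30% tolerance are hypothetical values picked for the example; the original program's limits are not stated.

```python
UK_PLATE_ASPECT = 520 / 111  # width / height of a standard UK plate in mm


def plate_candidates(rectangles, min_width=60, min_height=12, tolerance=0.3):
    """Keep (x, y, w, h) rectangles that are big enough for OCR and whose
    aspect ratio is within `tolerance` (as a fraction) of a UK plate.
    """
    keep = []
    for x, y, w, h in rectangles:
        if w < min_width or h < min_height:
            continue  # too small to give the OCR engine usable characters
        aspect = w / h
        if abs(aspect - UK_PLATE_ASPECT) <= tolerance * UK_PLATE_ASPECT:
            keep.append((x, y, w, h))
    return keep


rects = [
    (10, 40, 140, 30),   # plate-like: aspect about 4.7
    (5, 5, 20, 4),       # right shape but too small
    (200, 10, 80, 80),   # large but square
]
candidates = plate_candidates(rects)
print(candidates)  # [(10, 40, 140, 30)]
```

A fairly loose tolerance is used because the bounding rectangle rarely matches the plate outline exactly after the morphological steps.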
Extract candidate areas
The target rectangles are removed from the image…
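Extracting a candidate area amounts to cropping the rectangle out of an image. A minimal sketch, assuming the `(x, y, width, height)` rectangle convention and a list-of-rows image; the name `crop` is illustrative.

```python
def crop(image, rect):
    """Return the sub-image covered by rect = (x, y, width, height)."""
    x, y, w, h = rect
    return [row[x:x + w] for row in image[y:y + h]]


image = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
]
patch = crop(image, (1, 1, 2, 2))
print(patch)  # [[5, 6], [9, 10]]
```

The same rectangle can be applied either to the binarised image or to the original captured image, since both share the same pixel coordinates.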
The extracted image area is then passed to the OCR engine for recognition. I used GOCR pre-trained with the standard UK number plate font and obtained the following results:
This example has errors due to the extracted region extending outside of the plate area. Further processing of the extracted plate area is necessary.
i) Currently the main problem is errors caused by edges of the plate being included in the OCR image. Further filtering of this area is necessary.
ii) Also, better results might be obtained by taking the rectangle areas from the original captured images, as GOCR might perform better on these than on binarised images. This might also reduce the problems described in i).
iii) Use higher resolution images.
iv) This technique currently isn’t very robust across different images. This may be improved by iii).