Luminous Landscape Forum

Raw & Post Processing, Printing => Digital Image Processing => Topic started by: afaf9999 on September 09, 2009, 09:40:33 am

Title: How to extract the handwritten characters from table boxes?
Post by: afaf9999 on September 09, 2009, 09:40:33 am
Hi to all

I have a graduation project, and the problem I need to solve is:
How to extract the handwritten characters from table boxes?
For example this is the original picture:
(http://img35.imageshack.us/img35/4314/aa2og.jpg)

And I want it to look like this:
(http://img196.imageshack.us/img196/2177/aa3q.jpg)

In the project proposal I suggested using the projection profile method (X-Y tree) or the Hough transform to find the straight lines in the image (the table boxes) and remove them.
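
For reference, the projection profile idea can be sketched in a few lines. This is only a toy illustration in plain Python (not MATLAB), assuming the scan is already binarized (1 = ink), stored as a list of rows, and not skewed. The names `projection_profiles` and `find_line_indices` and the 0.8 coverage fraction are made up for the example.

```python
def projection_profiles(img):
    """Return (row_sums, col_sums): the ink count of every row and every column."""
    row_sums = [sum(row) for row in img]
    col_sums = [sum(col) for col in zip(*img)]
    return row_sums, col_sums


def find_line_indices(profile, length, min_fraction=0.8):
    """Indices whose ink count covers at least min_fraction of the full length.

    min_fraction is an assumed tuning value, not something from the thread.
    """
    return [i for i, count in enumerate(profile) if count >= min_fraction * length]


# Toy 7x7 image: a box drawn on the border plus a short diagonal pen stroke.
img = [[0] * 7 for _ in range(7)]
for i in range(7):
    img[0][i] = img[6][i] = 1   # top and bottom grid lines
    img[i][0] = img[i][6] = 1   # left and right grid lines
for r, c in [(2, 2), (3, 3), (4, 4)]:
    img[r][c] = 1               # handwriting

rows, cols = projection_profiles(img)
line_rows = find_line_indices(rows, length=len(img[0]))
line_cols = find_line_indices(cols, length=len(img))
print(line_rows, line_cols)
```

Rows and columns that are almost entirely ink are taken to be grid lines. The handwriting only adds a few pixels to each profile bin, so it stays below the threshold.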

I tried to apply the projection profile method (X-Y tree) in MATLAB, but I could not find any code or an algorithm for it. Then I tried the Hough transform, but I did not fully succeed with it, and I also don't know how to delete the straight lines after finding them.
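
To make the second question concrete, here is a rough sketch of a Hough-style vote followed by a naive erase. It is plain Python rather than MATLAB, and it is deliberately simplified: only four angles are tried (an assumption that suits an unskewed table), the image is a binary list of rows, and `line_rho`, `hough_lines` and `erase_lines` are names invented for this example.

```python
import math

ANGLES_DEG = (0, 45, 90, 135)  # coarse angle set, assumed to keep the toy small


def line_rho(x, y, angle_deg):
    """Rounded distance from the origin of the line through (x, y) whose normal is at angle_deg."""
    theta = math.radians(angle_deg)
    return round(x * math.cos(theta) + y * math.sin(theta))


def hough_lines(img, min_votes):
    """Let every ink pixel vote for each (rho, angle) it lies on; keep the strong lines."""
    votes = {}
    for y, row in enumerate(img):
        for x, ink in enumerate(row):
            if ink:
                for angle in ANGLES_DEG:
                    key = (line_rho(x, y, angle), angle)
                    votes[key] = votes.get(key, 0) + 1
    return sorted(key for key, n in votes.items() if n >= min_votes)


def erase_lines(img, lines):
    """Naive delete: clear every ink pixel that lies on one of the detected lines."""
    wanted = set(lines)
    return [
        [0 if ink and any((line_rho(x, y, a), a) in wanted for a in ANGLES_DEG) else ink
         for x, ink in enumerate(row)]
        for y, row in enumerate(img)
    ]


# Toy 9x9 image: one horizontal and one vertical grid line plus a short diagonal stroke.
img = [[0] * 9 for _ in range(9)]
for i in range(9):
    img[4][i] = 1
    img[i][4] = 1
for i in (1, 2, 3):
    img[i][i] = 1

lines = hough_lines(img, min_votes=7)
cleaned = erase_lines(img, lines)
print(lines)
```

Deleting a line this way also removes the pixels where a pen stroke crosses it, so a real solution needs an extra rule for the crossings.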

So these are my problems, and I hope someone can help me find solutions for them.

Thanks.

ALI.
Title: How to extract the handwritten characters from table boxes?
Post by: BernardLanguillier on September 09, 2009, 10:29:45 am
It would be very easy to find an algo if there were no intersection between the grid and the characters.

Still, it shouldn't be too hard to find measurable characteristics for the pixels belonging to the grid vs those belonging to characters, knowing that:

- the grid is regularly spaced,
- it is made up of adjacent pixels that are close to being black,
- the grid is surrounded by a mostly white area,
- ...
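
One way to turn the characteristics above into something measurable is a run-length test: a pixel is a grid candidate if it is close to black and sits in a long unbroken run of dark pixels, horizontally or vertically. The sketch below is plain Python on a grayscale image stored as a list of rows. The names `dark_mask`, `mark_long_runs` and `grid_candidates`, the threshold of 80 and the minimum run length are all assumptions for illustration.

```python
def dark_mask(gray, dark_threshold=80):
    """1 where the pixel is close to black, else 0 (threshold is an assumed value)."""
    return [[1 if v <= dark_threshold else 0 for v in row] for row in gray]


def mark_long_runs(rows, min_run):
    """For each row, flag the pixels that sit in a run of at least min_run consecutive 1s."""
    out = []
    for row in rows:
        flags = [0] * len(row)
        start = None
        for i, v in enumerate(list(row) + [0]):  # sentinel 0 closes a run at the end
            if v and start is None:
                start = i
            elif not v and start is not None:
                if i - start >= min_run:
                    flags[start:i] = [1] * (i - start)
                start = None
        out.append(flags)
    return out


def grid_candidates(mask, min_run):
    """Pixels that belong to a long horizontal or a long vertical dark run."""
    horiz = mark_long_runs(mask, min_run)
    vert_t = mark_long_runs(zip(*mask), min_run)  # columns handled as rows
    vert = list(zip(*vert_t))                     # transpose back
    return [[int(h or v) for h, v in zip(hr, vr)] for hr, vr in zip(horiz, vert)]


# Toy 6x8 grayscale image on a white background.
gray = [[255] * 8 for _ in range(6)]
gray[2] = [10] * 8        # a dark grid line across the whole row
for x in (1, 2, 3):
    gray[4][x] = 30       # a short dark pen stroke
gray[0][5] = 200          # light gray noise, not close to black

grid = grid_candidates(dark_mask(gray), min_run=6)
print(grid[2], grid[4])
```

Handwriting rarely produces long, perfectly straight runs, so it mostly fails the run-length test, while the grid passes it.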

Cheers,
Bernard

Title: How to extract the handwritten characters from table boxes?
Post by: tomrock on September 09, 2009, 11:14:25 am
Make the boxes a color that the scanner won't see.
Title: How to extract the handwritten characters from table boxes?
Post by: afaf9999 on September 09, 2009, 01:59:40 pm
Thanks for replying.
Actually, the idea of the project is to extract the characters from the grid even when they overlap or intersect it. Both objects have the same color (black), so my solution was to find the straight lines, because handwritten characters cannot be perfectly straight.
But my problem is: how do I apply this solution?
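
As a sketch of one possible way to apply it: once a grid row is known, clear it pixel by pixel, but keep a pixel when there is ink directly across the line on both sides, since that suggests a pen stroke passing through. This is plain Python on a binary image (list of rows, 1 = ink), it assumes a one-pixel-thick horizontal line, and `ink` and `erase_row_keep_crossings` are hypothetical helper names.

```python
def ink(img, y, x):
    """Pixel value, with out-of-bounds reads treated as background."""
    if 0 <= y < len(img) and 0 <= x < len(img[0]):
        return img[y][x]
    return 0


def erase_row_keep_crossings(img, row):
    """Clear a one-pixel-thick horizontal line, keeping pixels where a stroke crosses it."""
    out = [list(r) for r in img]
    for x in range(len(img[row])):
        crosses = (
            (ink(img, row - 1, x) and ink(img, row + 1, x))             # straight through
            or (ink(img, row - 1, x - 1) and ink(img, row + 1, x + 1))  # diagonal, one way
            or (ink(img, row - 1, x + 1) and ink(img, row + 1, x - 1))  # diagonal, other way
        )
        if not crosses:
            out[row][x] = 0
    return out


# Toy 5x9 image: a grid line on row 2, crossed by a vertical and a diagonal stroke.
img = [[0] * 9 for _ in range(5)]
img[2] = [1] * 9
for y in range(5):
    img[y][6] = 1   # vertical stroke
    img[y][y] = 1   # diagonal stroke

cleaned = erase_row_keep_crossings(img, row=2)
print(cleaned[2])
```

Vertical lines could be handled the same way on the transposed image. Thicker lines and strokes that merely touch the line without crossing it would need more care than this toy rule provides.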

Quote from: BernardLanguillier
It would be very easy to find an algo if there were no intersection between the grid and the characters.

Still, it shouldn't be too hard to find measurable characteristics for the pixels belonging to the grid vs those belonging to characters, knowing that:

- the grid is regularly spaced,
- it is made up of adjacent pixels that are close to being black,
- the grid is surrounded by a mostly white area,
- ...

Cheers,
Bernard
Title: How to extract the handwritten characters from table boxes?
Post by: afaf9999 on September 09, 2009, 02:00:59 pm
They are supposed to be the same color (black).

Quote from: tomrock
Make the boxes a color that the scanner won't see.
Title: How to extract the handwritten characters from table boxes?
Post by: BernardLanguillier on September 10, 2009, 12:49:06 am
Quote from: afaf9999
Thanks for replying.
Actually, the idea of the project is to extract the characters from the grid even when they overlap or intersect it. Both objects have the same color (black), so my solution was to find the straight lines, because handwritten characters cannot be perfectly straight.
But my problem is: how do I apply this solution?

Got it, but it shouldn't be that hard.

Just scan from outside the frame, assume that the first pixel found belongs to the frame, then move on from there: identify straight segments (start point, length and direction), deduce the step of the grid statistically from them, and that should get you the grid pixels.

You'll have to deal with the width of these lines too...
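
A small sketch of the statistical part, in plain Python: given the row indices already flagged as grid rows (however they were found), merge neighbouring indices into bands so that line width is accounted for, then take the median gap between band centers as the grid step. The helper names `group_lines` and `grid_step` and the sample indices are invented for the example.

```python
from statistics import median


def group_lines(indices):
    """Merge consecutive indices into (start, width) bands, one band per grid line."""
    bands = []
    for i in sorted(indices):
        if bands and i == bands[-1][0] + bands[-1][1]:
            bands[-1][1] += 1        # index continues the current band
        else:
            bands.append([i, 1])     # index starts a new band
    return [(start, width) for start, width in bands]


def grid_step(bands):
    """Median distance between the centers of neighbouring bands."""
    centers = [start + (width - 1) / 2 for start, width in bands]
    gaps = [b - a for a, b in zip(centers, centers[1:])]
    return median(gaps)


# Hypothetical detected grid rows: lines two or three pixels thick, roughly 20 apart.
detected_rows = [0, 1, 20, 21, 40, 41, 42, 60, 61]
bands = group_lines(detected_rows)
step = grid_step(bands)
print(bands, step)
```

The median is used so that one badly detected line does not skew the estimate. With the step known, missing or doubtful lines can be checked against the expected positions.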

Cheers,
Bernard
Title: How to extract the handwritten characters from table boxes?
Post by: papa v2.0 on September 15, 2009, 10:11:48 am
hi

Have you tried the 'edge' function in MATLAB?

[g] =edge(f,'sobel',T,dir);

g is a logical image map of the detected edges, f is the input image, 'sobel' selects the edge detector, T is the threshold, and dir is the direction ('horizontal', 'vertical' or 'both', which is the default).

assuming your image is square in the first place
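
For anyone without MATLAB, the core of a directional Sobel detector can be sketched in plain Python. This is not a reimplementation of MATLAB's edge (which does more than the sketch, such as edge thinning). It only shows the two Sobel kernels and a threshold on a grayscale image stored as a list of rows, and `sobel_edges` is a made-up name.

```python
def sobel_edges(gray, threshold, direction="both"):
    """Boolean edge map from Sobel gradients; border pixels are left False."""
    h, w = len(gray), len(gray[0])
    out = [[False] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            p = [gray[y + dy][x - 1:x + 2] for dy in (-1, 0, 1)]  # 3x3 neighbourhood
            # Horizontal gradient (responds to vertical edges).
            gx = (p[0][2] + 2 * p[1][2] + p[2][2]) - (p[0][0] + 2 * p[1][0] + p[2][0])
            # Vertical gradient (responds to horizontal edges).
            gy = (p[2][0] + 2 * p[2][1] + p[2][2]) - (p[0][0] + 2 * p[0][1] + p[0][2])
            if direction == "horizontal":
                mag = abs(gy)
            elif direction == "vertical":
                mag = abs(gx)
            else:
                mag = (gx * gx + gy * gy) ** 0.5
            out[y][x] = mag > threshold
    return out


# Toy 6x6 image: black top half, white bottom half, so one horizontal edge.
gray = [[0] * 6 for _ in range(3)] + [[255] * 6 for _ in range(3)]
horiz = sobel_edges(gray, threshold=100, direction="horizontal")
vert = sobel_edges(gray, threshold=100, direction="vertical")
print(horiz[2], any(any(row) for row in vert))
```

Restricting the direction gives separate maps for the horizontal and the vertical grid lines. A thin dark line shows up as two edges, one on each side of it.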
Title: How to extract the handwritten characters from table boxes?
Post by: Jonathan Wienke on September 15, 2009, 07:25:37 pm
What OP is probably really interested in is figuring out a better way to remove obscurations from CAPTCHA images so that spambots can more easily register themselves on forums...
Title: How to extract the handwritten characters from table boxes?
Post by: dkekesi on September 27, 2009, 06:23:23 pm
While it does not strictly fit the profile of this site, document imaging applications can solve your problem easily, and do much more. I work with Kofax Capture (www.kofax.com) as a solution provider (in a different part of the world, so I may not be of further use). It can do what you seek and even recognize handprinted characters. This is a standard task for that software. It is not cheap, but it does the job fine. Kofax has resellers all over the world who will be more than glad to help you.
Title: How to extract the handwritten characters from table boxes?
Post by: Brad Proctor on September 27, 2009, 07:46:54 pm
Quote from: Jonathan Wienke
What OP is probably really interested in is figuring out a better way to remove obscurations from CAPTCHA images so that spambots can more easily register themselves on forums...

lol, I'd guess you're right.