
Author Topic: OCR  (Read 6588 times)

user

  • Newbie
  • *
  • Offline Offline
  • Posts: 39
OCR
« on: October 03, 2007, 01:29:48 am »

hello

I would like to digitize a book by taking photos of the book pages and then performing OCR on them

can you please tell me what characteristics a camera must have to do this? big zoom? many megapixels? specific features?

OCR needs a 300dpi scan from a scanner, so can you please tell me what the equivalent is for a digital camera photo? I mean how many megapixels, what distance from the source, how much lighting etc

any specific camera settings? does the room need to be brightly lit? do I need a tripod? any specific add-ons for the camera? any software?
any suggestion would be much appreciated
 
thanks
Logged

Jonathan Wienke

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 5829
    • http://visual-vacations.com/
OCR
« Reply #1 on: October 03, 2007, 07:45:19 am »

Unless the book is particularly rare or expensive, I'd remove pages one by one and scan them. It will be faster, better resolution, and far less hassle than dealing with lighting and such.
Logged

user

  • Newbie
  • *
  • Offline Offline
  • Posts: 39
OCR
« Reply #2 on: October 03, 2007, 07:53:46 am »

thanks but no way, these books cost a lot
Logged

DiaAzul

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 777
    • http://photo.tanzo.org/
OCR
« Reply #3 on: October 03, 2007, 08:07:19 am »

For OCR work you have a number of options depending on how easy it is to remove the pages. If you can't remove the pages then you need to find some way to keep the book flat when you take a picture of each page.

1/ I am assuming that you do not need colour critical photography - this will be for optical character recognition, not accurate reproduction of colour.

2/ Lots of pages, so speed is key

3/ Given (1) and (2) above you need some form of copy stand, this link shows an example of a copy stand - a post on which to mount the camera and lighting to provide even illumination. You can shop around for cheaper versions, however, you typically get what you pay for.

4/ You can use a regular point and shoot camera. There are two conditions to bear in mind: (i) you need to mount the camera high enough that there are no distortions (barrel or pincushion) in the image, i.e. don't be so close to the book that you have to use the widest-angle setting, or so far away that it becomes difficult to work. You probably need to be at around 3x the longest dimension of the book, so for an A4 document about 1m should be OK, but you may need to experiment. (ii) Choice of camera: if you need 300 pixels per inch, multiply the longest side of the document (in inches) by 300 and that will tell you how many pixels the camera needs along that side. So for a 12" document you need 3,600 pixels on the longest edge of the camera's sensor. I would suggest that for A4 any camera at 10 Mpix or 12 Mpix would be more than ample.

5/ If you can remove the pages from the book then I would recommend a document scanner which can produce PDF documents from the scans. This could scan a 100-page document in a couple of minutes. You may consider going to a reprographics shop and asking if they can do it for you. We typically use a Canon CLC5151, though this may be a little outside most people's price range. Cheaper options exist.
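
The rules of thumb in point 4 can be worked through with a short Python sketch. It is only an illustration: the function names are made up for this example, the 4:3 sensor aspect ratio is an assumption (typical of compact cameras, not stated in the post), and the 9" short side used in the test case is hypothetical.

```python
def pixels_needed(long_in, short_in, dpi=300):
    """Pixel dimensions needed to cover a page at the given resolution."""
    return round(long_in * dpi), round(short_in * dpi)


def sensor_megapixels(long_in, short_in, dpi=300, aspect=4 / 3):
    """Smallest sensor (in megapixels) of the given aspect ratio that covers the page.

    Assumes the long side of the page is aligned with the long side of the
    sensor. The 4:3 default is an assumption, not a value from the thread.
    """
    long_px, short_px = pixels_needed(long_in, short_in, dpi)
    # The sensor must be wide enough for the long side and, through its
    # aspect ratio, tall enough for the short side.
    width = max(long_px, short_px * aspect)
    height = width / aspect
    return width * height / 1e6


def working_distance_m(long_mm, factor=3.0):
    """Rule-of-thumb camera height: about `factor` times the longest page side."""
    return long_mm * factor / 1000.0


if __name__ == "__main__":
    # A4 is 210 x 297 mm, i.e. roughly 8.27 x 11.69 inches.
    print("A4 pixels at 300 ppi:", pixels_needed(11.69, 8.27))
    print("4:3 sensor needed (MP):", round(sensor_megapixels(11.69, 8.27), 1))
    print("Suggested height (m):", round(working_distance_m(297), 2))
```

Under these assumptions an A4 page at 300 pixels per inch comes out at a little under 10 megapixels, which agrees with the 10 to 12 Mpix suggestion, and the 3x rule gives a camera height of roughly 0.9 m.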
Logged
David Plummer    http://photo.tanzo.org/

user

  • Newbie
  • *
  • Offline Offline
  • Posts: 39
OCR
« Reply #4 on: October 03, 2007, 08:31:20 am »

thanks

I am interested in a cheap solution like this
http://www.adorama.com/BG1723.html

also, what about optical zoom? how much should it have?
can all cameras shoot in black/white?
do I need custom white balance?

these cameras are superb (except for the canon, which is blurry)
http://www.luminous-landscape.com/essays/back-testing.shtml

but they must be very expensive

is there any cheaper solution?

also these book scanners use cameras:
kirtas-tech.com
atiz.com
and their scan samples are marvelous

can I reproduce these results?
Logged

Jonathan Wienke

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 5829
    • http://visual-vacations.com/
OCR
« Reply #5 on: October 03, 2007, 04:09:00 pm »

All of the cameras (including the "blurry" Canon 1-series) in the page you mention are expensive. A new 1Ds Mark II is probably going to cost you over $6000, and the medium format backs can run over $30000 for just the back, not including the camera to attach it to, or a lens. You can get a scanner perfectly adequate for the job for $300 or less. $5700 will buy a lot of books to disassemble, and then there's the cost of the lighting stuff if you use a camera...
« Last Edit: October 03, 2007, 04:10:31 pm by Jonathan Wienke »
Logged

user

  • Newbie
  • *
  • Offline Offline
  • Posts: 39
OCR
« Reply #6 on: October 03, 2007, 10:14:45 pm »

thanks

can you please suggest a perfectly adequate camera for the job

also, can you please post some photos of book pages taken with your cameras

I would like shots from various cameras, mentioning the camera model you used

thanks
Logged

user

  • Newbie
  • *
  • Offline Offline
  • Posts: 39
OCR
« Reply #7 on: October 21, 2007, 11:50:50 am »

ok, no more "not enough megapixels" excuse please:

this image is A4 and only 100dpi (850 x 1103  Pixels), which means it can be shot with a 0.9MP camera!
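
The arithmetic behind that claim can be checked with a short Python sketch. The function names are illustrative only, and the sketch assumes the image covers exactly one A4 page (210 x 297 mm, about 8.27 x 11.69 inches), which the post does not confirm.

```python
A4_INCHES = (8.27, 11.69)  # 210 x 297 mm, width x height


def megapixels(width_px, height_px):
    """Total pixel count in millions."""
    return width_px * height_px / 1e6


def effective_dpi(width_px, height_px, page_in=A4_INCHES):
    """Horizontal and vertical pixels per inch, assuming the image
    exactly covers a page of size `page_in` (an assumption)."""
    return width_px / page_in[0], height_px / page_in[1]


if __name__ == "__main__":
    print("Megapixels:", round(megapixels(850, 1103), 2))
    h_dpi, v_dpi = effective_dpi(850, 1103)
    print("Effective dpi (h, v):", round(h_dpi, 1), round(v_dpi, 1))
```

The pixel count works out to roughly 0.94 megapixels and the effective resolution to about 100 dpi in each direction, so the numbers in the post are self-consistent.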

and this image performs just MARVELOUSLY in OCR, hitting 99.2% accuracy

am I missing something? we all thought that megapixels were the limitation? well, they're not, something else must count more, sharpness? contrast? that's why I am here, to find out...

how can I produce this image with a camera? come on, it's an image with a meagre 100dpi...

post processing and image editing suggestions/solutions are welcome of course, it won't hurt

EDIT: oh yeah, and it's JPG...
Logged

MikeMike

  • Full Member
  • ***
  • Offline Offline
  • Posts: 145
OCR
« Reply #8 on: October 22, 2007, 10:51:48 am »

You're digging too deep.

Try things and see if they work or not!
Logged

user

  • Newbie
  • *
  • Offline Offline
  • Posts: 39
OCR
« Reply #9 on: October 22, 2007, 11:09:58 am »

mm love diggin

anyway

now, I need recommendations for compact/ultracompact with best sharpness and image quality above 8MP! if you have specific recommendations for 8MP, 10MP, 12MP would be great!

thanks!
Logged

spidermike

  • Sr. Member
  • ****
  • Offline Offline
  • Posts: 535
OCR
« Reply #10 on: November 05, 2007, 06:35:50 am »

Quote
mm love diggin

anyway

now, I need recommendations for compact/ultracompact with best sharpness and image quality above 8MP! if you have specific recommendations for 8MP, 10MP, 12MP would be great!

thanks!

You are asking people's advice on something they have probably never done.
You reject the advice you have been given and then you have the cheek to ask:
Quote
also, can you post me some photos of book pages with your cameras please

You seem to know what you want to do, so there is an easy answer: take a book (any book) down to a local camera store, along with your SD or CF card. Explain to the assistant what you want to do and take some sample pictures with two or three different cameras (maybe one point and shoot, one compact and one DSLR). Then take your memory card home, load the pictures onto your computer and you will see first-hand which camera you want.

Simple, really.
Logged