Convert scans to text with OCR technology

   When paper documents are scanned to image files, they are easy to upload and transfer. The problem is that it is hard to extract text from a scanned file, so it is hard to get information out of it. For a single scanned page, we can retype the text by hand. With thousands of pages, that is no longer practical. In this article, I will show you how to convert scans to text using OCR technology.

  The software I use is VeryDOC Raster to Text OCR Converter Command Line. With it, we can convert scanned files in English, French, German, Italian, Czech, Danish, Dutch, Norwegian, Polish, Portuguese, Spanish and Swedish to text. In the following part, I will show you how to use this software.

Step 1. Download Raster to Text OCR Converter Command Line

  • On the website, there are two licenses: the server version and the developer version. If you only use this software on a single computer, laptop or server and do not use it for development, simply choose the server version.
  • When the download finishes, you will have a zip file. Extract it to a folder; you can then call the executable file from an MS-DOS (Command Prompt) window.

Step 2. Convert scan to text

  • When using this software, please refer to its usage and examples.
  • Here is the usage for your reference: Usage: pdf2txtocr.exe [options] <PDF-file> <Text-file>
  • Here are some examples for your reference. The scanned file can be in any of the following formats: TIFF, JPG, PNG, BMP, GIF, PCX, TGA, JP2, PNM and MNG.
  • pdf2txtocr.exe C:\in.tif C:\out.txt
    pdf2txtocr.exe C:\in.jpg C:\out.txt
    pdf2txtocr.exe C:\in.bmp C:\out.txt
    pdf2txtocr.exe C:\in.png C:\out.txt
    To convert these scanned files to text, give the full path of the scanned file followed by the full path of the output text file. This way, you can convert a scanned file to text directly.

  • When converting a TIFF file in a language other than English, please refer to the following command line template.
    pdf2txtocr.exe -lang deu C:\in.tif C:\out.txt
    Please add the -lang parameter followed by the corresponding language code. This software supports more than 50 OCR languages, such as French, German, Italian, Czech, Danish, Dutch, Norwegian, Polish, Portuguese, Spanish and Swedish, but you need to download the corresponding language package from the website. Please use the right language code, for example:
     
    Language      Package
    Bulgarian     bul.zip
    Catalan       cat.zip
    Czech         ces.zip
    German        deu.zip
    Greek         ell.zip
    English       eng.zip
    Finnish       fin.zip
    French        fra.zip
    Hungarian     hun.zip
    Indonesian    ind.zip
    Italian       ita.zip
    Latvian       lav.zip
    Lithuanian    lit.zip
    Dutch         nld.zip
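
With thousands of pages, typing each command by hand is impractical. As a minimal sketch, assuming Python is available on the same machine, the commands above could be generated for every image in a folder. Only the pdf2txtocr.exe name, the argument order and the -lang parameter come from the usage shown in this article. The function names, the folder paths and the choice of Python are illustrative assumptions, and the extension list simply mirrors the formats named earlier.

```python
import subprocess
from pathlib import Path

# Extensions for the input formats named in this article (TIFF, JPG, PNG, ...).
# ".tiff" and ".jpeg" are assumed spelling variants, not taken from the article.
IMAGE_EXTENSIONS = {
    ".tif", ".tiff", ".jpg", ".jpeg", ".png", ".bmp", ".gif",
    ".pcx", ".tga", ".jp2", ".pnm", ".mng",
}


def build_command(exe, image_path, text_path, lang=None):
    """Return the argument list for one conversion.

    Follows the form used in the article's examples:
    pdf2txtocr.exe [options] <input file> <output text file>
    """
    command = [str(exe)]
    if lang:
        # For example "deu" for German, as in the -lang example in the article.
        command += ["-lang", lang]
    command += [str(image_path), str(text_path)]
    return command


def build_batch(exe, input_dir, output_dir, lang=None):
    """Build one command per supported image found directly in input_dir."""
    input_dir = Path(input_dir)
    output_dir = Path(output_dir)
    commands = []
    for image in sorted(input_dir.iterdir()):
        if image.is_file() and image.suffix.lower() in IMAGE_EXTENSIONS:
            # Keep the original extension in the output name so that
            # in.tif and in.jpg do not overwrite each other's text file.
            text_path = output_dir / (image.name + ".txt")
            commands.append(build_command(exe, image, text_path, lang))
    return commands


def run_batch(commands):
    """Run each command in turn; raises if one exits with a non-zero status."""
    for command in commands:
        subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical install path; adjust to wherever the zip was extracted.
    print(build_command(r"C:\ocr\pdf2txtocr.exe", r"C:\in.tif", r"C:\out.txt", lang="deu"))
```

Calling run_batch(build_batch(...)) would then execute the conversions one after another. Printing the generated commands first is a cheap way to check the paths before anything is run.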

So this software can be a really helpful assistant when you need to extract text from scanned files. It has more parameters than I can list here. If you have any questions while using it, please contact us as soon as possible.
