How to convert scanned PDF to searchable PDF?

Some PDF files are created by sending Office files, images, etc. to an Acrobat-like PDF printer, and others are created by scanning physical paper such as pages of a book or legal documents. Generally speaking, a PDF whose pages are stored as images cannot be edited, and text cannot be extracted from it. This becomes a problem when you need to reuse the content of a scanned PDF. In this article, I will show you how to convert a scanned PDF to a searchable PDF.

I use VeryDOC Raster to Text OCR Converter Command Line, which can also convert a PDF to a plain text document saved in TXT format that can be edited freely. More information is available on its homepage. In the following part, I will show you how to make the conversion from scanned PDF to searchable PDF. A searchable PDF is a text-based PDF file that lets you search, copy, and paste its text easily.

Step 1. Download Raster to Text OCR Converter Command Line

  • As its name shows, this is command line software. When the download finishes, you will have a zip file. Extract it to a folder, then call the executable file from an MS-DOS window (Command Prompt).
  • This is Windows software; it supports all Windows systems, both 32-bit and 64-bit.
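
If you prefer to script the extraction step, the following Python sketch unpacks the downloaded archive and looks for the executable. The archive and folder names passed in are up to you, and the layout inside the zip is not documented here, so the function simply searches the extracted folder recursively for `pdf2txtocr.exe` (the name used in the usage line of Step 2).

```python
import zipfile
from pathlib import Path


def extract_and_find_exe(zip_path, dest_dir, exe_name="pdf2txtocr.exe"):
    """Extract the downloaded archive and return the path of the executable.

    Returns None if no file with that name exists in the extracted tree.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # The folder layout inside the archive is an unknown, so search recursively.
    for candidate in dest.rglob("*"):
        if candidate.is_file() and candidate.name.lower() == exe_name.lower():
            return candidate
    return None
```

The returned path can then be used in place of the bare `pdf2txtocr.exe` name when the folder is not on your PATH.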

Step 2. Convert scanned PDF to searchable PDF

  • Here is the usage for your reference: pdf2txtocr.exe [options] <PDF-file> <Text-file>
  • When converting scanned PDF to searchable PDF, please refer to the following command line templates.
    pdf2txtocr.exe -ocr -lang deu -ocrmode 1 C:\in.pdf C:\out.pdf
    pdf2txtocr.exe -ocr -lang eng -ocrmode 2 C:\in.pdf C:\out.pdf
    pdf2txtocr.exe -ocr -lang eng -ocrmode 3 C:\in.pdf C:\out.pdf
    pdf2txtocr.exe -ocr -lang eng -ocrmode 2 -outboxfile C:\in.pdf C:\out.pdf
    pdf2txtocr.exe -ocr -lang fra -ocrmode 1 C:\in.pdf C:\out.pdf
    pdf2txtocr.exe -ocr -lang ita -ocrmode 1 C:\in.pdf C:\out.pdf
    pdf2txtocr.exe -ocr -lang nld -ocrmode 1 C:\in.pdf C:\out.pdf
    pdf2txtocr.exe -ocr -lang spa -ocrmode 1 C:\in.pdf C:\out.pdf
    pdf2txtocr.exe -bitcount 24 -ocrmode 4 -ocr C:\in.pdf C:\out.pdf
    pdf2txtocr.exe -bitcount 8 -ocrmode 4 -ocr C:\in.pdf C:\out.pdf
  • Now let us check related parameters.
    -ocr                : enable OCR function for scanned PDF file
    -lang <string>      : choose the language for OCR engine
    -ocrmode <int>      : set OCR mode
      -ocrmode 0: output to text file
      -ocrmode 1: OCR PDF pages and insert new text layer under original PDF pages
      -ocrmode 2: output to plain text based PDF file
      -ocrmode 3: output to OCRed PDF file (BW) with hidden text layer
      -ocrmode 4: output to OCRed PDF file (Color) with hidden text layer
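
To call the tool from a script instead of typing each command, a small Python wrapper can assemble the argument list. This is only a sketch: the option names and mode numbers are copied from the templates and parameter list above, the `-bitcount` option appears in the templates but is not described here, and the sketch assumes the order of options does not matter (the templates show them in more than one order). The helper names `build_command` and `convert` are made up for illustration.

```python
import subprocess

# Mode numbers and meanings as given in the parameter list.
OCR_MODES = {
    0: "output to text file",
    1: "insert new text layer under original PDF pages",
    2: "plain text based PDF file",
    3: "OCRed PDF file (BW) with hidden text layer",
    4: "OCRed PDF file (Color) with hidden text layer",
}


def build_command(in_pdf, out_file, mode, lang=None, bitcount=None,
                  exe="pdf2txtocr.exe"):
    """Return the argument list for one conversion, without running it."""
    if mode not in OCR_MODES:
        raise ValueError(f"unknown -ocrmode value: {mode}")
    cmd = [exe, "-ocr"]
    if lang is not None:
        cmd += ["-lang", lang]
    if bitcount is not None:
        cmd += ["-bitcount", str(bitcount)]
    cmd += ["-ocrmode", str(mode), in_pdf, out_file]
    return cmd


def convert(in_pdf, out_file, mode, **options):
    """Run the converter and return its exit code.

    Needs Windows with the executable on PATH (or pass exe=<full path>).
    """
    return subprocess.run(build_command(in_pdf, out_file, mode, **options)).returncode
```

For example, `build_command(r"C:\in.pdf", r"C:\out.pdf", 1, lang="deu")` produces the same arguments as the first template above.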

This software supports more than 50 OCR languages, so it can convert scanned PDFs in most languages, such as English, French, German, Italian, Czech, Danish, Dutch, Norwegian, Polish, Portuguese, Spanish, Swedish, etc., to searchable PDF files. As the parameters above show, it supports 5 OCR modes (0 to 4), which can help you OCR scanned PDF files more accurately.

There are too many functions in this software to detail here, so check the readme.txt file for more useful information. If you have any questions while using it, please contact us as soon as possible.
