Sometimes we need to convert an image to PDF, but when the conversion finishes and we check the PDF file, we find that it differs from other PDF files: its text cannot be copied and pasted. To solve this problem, VeryDOC offers a way to convert raster files to searchable PDF. The software I use here is VeryDOC Raster to Text OCR Converter Command Line, which can also convert images to text. In the following part, I will show you how to use this software.
Step 1. Download the software.
- Once the download finishes, you will have a zip file. Extract it to a folder, and then you can call the executable file from the MS-DOS command prompt window.
- The package includes help documents and a bat file that you can run to check the conversion result at once.
Step 2. Convert the raster file.
- Before you use this software, please read the usage and the parameter list carefully.
- Here is the usage for your reference. Usage: pdf2txtocr.exe [options] <PDF-file> <Text-file>
- When converting raster files, please refer to the following command line templates.
pdf2txtocr.exe -ocrmode 1 -threshold 200 -ocr C:\in.tif C:\out.pdf
pdf2txtocr.exe -ocrmode 2 -rotate 90 -ocr C:\in.jpg C:\out.pdf
pdf2txtocr.exe -ocrmode 3 -threshold 200 -ocr C:\in.png C:\out.pdf
pdf2txtocr.exe -ocrmode 4 -rotate 90 -ocr C:\in.bmp C:\out.pdf
pdf2txtocr.exe -ocrmode 3 -threshold 200 -ocr C:\in.gif C:\out.pdf
pdf2txtocr.exe -ocrmode 4 -rotate 90 -ocr C:\in.tga C:\out.pdf
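If you have many images, the command line templates above can be generated by a small script instead of being typed by hand. The following is only a minimal sketch in Python, not part of the VeryDOC package. The helper name build_command and the rule of saving the PDF next to the input image are my own assumptions; the executable name and the options come from the templates above.

```python
from pathlib import PureWindowsPath

# Executable name taken from the usage line in this article; adjust the path
# to the folder where you extracted the zip file.
EXE = "pdf2txtocr.exe"

# Raster extensions that appear in the command line templates above.
RASTER_EXTS = {".tif", ".jpg", ".png", ".bmp", ".gif", ".tga"}


def build_command(image_path, ocrmode=3, threshold=None, rotate=None):
    """Return the argument list for converting one image to a searchable PDF.

    build_command is a hypothetical helper written for this article.
    """
    src = PureWindowsPath(image_path)
    if src.suffix.lower() not in RASTER_EXTS:
        raise ValueError(f"unsupported extension: {src.suffix}")

    cmd = [EXE, "-ocrmode", str(ocrmode)]
    if threshold is not None:
        cmd += ["-threshold", str(threshold)]
    if rotate is not None:
        cmd += ["-rotate", str(rotate)]

    # Assumption of this sketch: the PDF is written next to the input image
    # and keeps the same base name.
    cmd += ["-ocr", str(src), str(src.with_suffix(".pdf"))]
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so it can be checked first.
    print(" ".join(build_command(r"C:\in.tif", ocrmode=1, threshold=200)))
```

On a Windows machine where the software is installed, the returned list can be passed to subprocess.run to start the conversion.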
This software provides 5 OCR modes; please check the related parameters below. Please note: do not use -ocrmode 0 for this task, because that mode outputs a text file from the input image file instead of a PDF file.
-ocrmode <int> : set OCR mode
-ocrmode 0: output to text file
-ocrmode 1: OCR PDF pages and insert new text layer under original PDF pages
-ocrmode 2: output to plain text based PDF file
-ocrmode 3: output to OCRed PDF file (BW) with hidden text layer
-ocrmode 4: output to OCRed PDF file (Color) with hidden text layer
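To avoid mixing up the modes, the list above can be kept in a small lookup table that is checked before the command is run. This is only an illustrative sketch in Python: the helper names are hypothetical, and the ".txt" extension for mode 0 is my own assumption, since the parameter list only says "text file".

```python
from pathlib import PureWindowsPath

# Mode descriptions copied from the parameter list above.
OCR_MODES = {
    0: "output to text file",
    1: "OCR PDF pages and insert new text layer under original PDF pages",
    2: "output to plain text based PDF file",
    3: "output to OCRed PDF file (BW) with hidden text layer",
    4: "output to OCRed PDF file (Color) with hidden text layer",
}


def expected_extension(ocrmode):
    """Return the output extension this sketch expects for a given mode."""
    if ocrmode not in OCR_MODES:
        raise ValueError(f"unknown -ocrmode value: {ocrmode}")
    # Assumption: mode 0 writes a .txt file, every other mode writes a PDF.
    return ".txt" if ocrmode == 0 else ".pdf"


def check_output(ocrmode, output_path):
    """Return True when the output file extension matches the chosen mode."""
    suffix = PureWindowsPath(output_path).suffix.lower()
    return suffix == expected_extension(ocrmode)


if __name__ == "__main__":
    print(check_output(3, r"C:\out.pdf"))  # True
    print(check_output(0, r"C:\out.pdf"))  # False: mode 0 does not produce a PDF
```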
Also, before the conversion you can adjust the input image in advance. For example, you can rotate the input image, adjust the image resolution, and so on. Here are some parameters for your reference:
-bitcount <int> : set color depth when render PDF page to image data, it can be set 1, 8, 24, default is 8-bit
-rotate <int> : rotate pages before OCR
-threshold <int> : lightness threshold that used to convert image to B&W
-ocr : enable OCR function for scanned PDF file
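To give an idea of what a lightness threshold does, here is a small sketch in Python. It is not the code used by the software. I assume lightness values from 0 to 255, and I assume that a pixel at or above the threshold becomes white and any other pixel becomes black; the exact rule used by -threshold is not described in the parameter list.

```python
def to_black_and_white(gray_rows, threshold=200):
    """Map 0-255 lightness values to 0 (black) or 255 (white).

    Hypothetical illustration only: values at or above the threshold
    become white, everything else becomes black.
    """
    return [[255 if value >= threshold else 0 for value in row]
            for row in gray_rows]


if __name__ == "__main__":
    sample = [
        [12, 199, 200],
        [255, 180, 230],
    ]
    print(to_black_and_white(sample))
    # [[0, 0, 255], [255, 0, 255]]
```

Under these assumptions, a higher threshold turns more pixels black, and a lower threshold keeps more of the page white.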
With this software, you can convert most raster image files, such as TIFF, JPG, PNG, BMP, GIF, PCX, TGA, JP2, PNM and MNG, to searchable PDF files. If you have any questions while using it, please contact us as soon as possible.