CuneiForm PDF recognition
Are there plans to add PDF file recognition to CuneiForm?
The problem is that in some PDFs (with Russian text, for example) the text cannot be copied correctly with the "text selection tool" of a PDF viewer: because of an incorrect internal code page or internal fonts in the PDF document, the text does not come out correctly when pasted into a text editor.
At present, one way to get the text out of such a "problematic" PDF file is to open its pages in GIMP "as images" (not as layers!) at 300 dpi or more (needed for successful recognition afterwards) and save each page as a 16-bit (to keep the file small) *.bmp (or *.tiff) file. Otherwise a very large input file (greater than 100 MB) makes cuneiform fail with "*** buffer overflow detected ***: cuneiform terminated".
After that, the resulting *.bmp files can be recognized by cuneiform, which produces a separate text file for each of them.
So are you planning to add PDF file recognition, at least (or in some other way) as the following steps performed in the background for the user:
1. import of the .pdf into temporary .bmp (or .tiff) files, with the dpi resolution set manually;
2. automatic batch recognition of each of them;
3. automatic creation of one merged output text file from the recognized images, instead of a series of separate files?
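The three steps above can already be scripted outside of CuneiForm. The following is only a minimal sketch in Python, not something CuneiForm provides: it assumes that Ghostscript (`gs`) and the `cuneiform` command-line tool are installed and on the PATH, that the `bmpgray` Ghostscript device is an acceptable substitute for the 16-bit BMP export from GIMP, and that `rus` is the wanted recognition language. The function names (`gs_command`, `cuneiform_command`, `merge_texts`, `recognize_pdf`) are made up for this example.

```python
import subprocess
import tempfile
from pathlib import Path


def gs_command(pdf, out_pattern, dpi=300, device="bmpgray"):
    """Build a Ghostscript command that renders every PDF page to its own BMP file."""
    # out_pattern must contain a printf-style page counter, e.g. "page-%04d.bmp".
    return ["gs", "-q", f"-sDEVICE={device}", f"-r{dpi}",
            "-o", str(out_pattern), str(pdf)]


def cuneiform_command(image, out_txt, lang="rus"):
    """Build a cuneiform command that recognizes one image into a plain text file."""
    return ["cuneiform", "-l", lang, "-f", "text", "-o", str(out_txt), str(image)]


def merge_texts(text_files, merged):
    """Concatenate the per-page text files, in the given order, into one file."""
    parts = [Path(p).read_text(encoding="utf-8", errors="replace") for p in text_files]
    Path(merged).write_text("\n".join(parts), encoding="utf-8")


def recognize_pdf(pdf, merged, dpi=300, lang="rus"):
    """Step 1: PDF -> temporary BMPs; step 2: batch OCR; step 3: one merged text file."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        # Step 1: render the pages at the requested resolution.
        subprocess.run(gs_command(pdf, tmp / "page-%04d.bmp", dpi), check=True)
        # Step 2: recognize each page image, in page order.
        texts = []
        for image in sorted(tmp.glob("page-*.bmp")):
            out_txt = image.with_suffix(".txt")
            subprocess.run(cuneiform_command(image, out_txt, lang), check=True)
            texts.append(out_txt)
        # Step 3: merge the per-page results before the temporary files are removed.
        merge_texts(texts, merged)
```

With those assumptions, `recognize_pdf("input.pdf", "output.txt", dpi=300)` would render the pages, recognize them one by one and write a single merged text file. Rendering each page to its own file also keeps every input well below the size at which the buffer overflow was seen.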
Thank you in advance.
Question information
- Language:
- English
- Status:
- Solved
- Assignee:
- No assignee
- Solved by:
- Sergey Torokhov