Cuneiform reorders text columnwise - I'd like strict line-by-line processing
(tried mailing list, but my message somehow didn't appear there - sorry...)
Hi there,
I'm using Cuneiform for Linux 0.6.0 to extract the (German) text from a fax that is sent and received digitally. I then parse the text to extract certain strings identified by keywords on the fax.
While the actual recognition is at about 100%, Cuneiform sometimes seems to group the text into strangely ordered rectangular blocks and then process the text inside each block. As this messes up the order of the text and thus makes post-processing difficult, I would like Cuneiform to process the text strictly line by line.
Is there a way to do this? Some hidden command-line flag? Since this badly messes up the order of the text, I think it could be considered a bug.
Illustration of FAX:
KEY1 : STRING NO. 1 HERE
KEY2 : STRING NO. 2 HERE
How Cuneiform seems to process it:
|------|-------------------|
| KEY1 | STRING NO. 1 HERE |
| KEY2 | STRING NO. 2 HERE |
|------|-------------------|
Output(!):
STRING NO. 1 HERE
KEY1
KEY2
STRING NO. 2 HERE
Imagine a lot more keywords, some of them recognized inline, some of them 'ripped out' like this, and strings that may span multiple lines. As you can see, it's hard to determine which string belongs to which keyword.
What I would love to *always* get as output (most of the text is actually recognized like this):
KEY1 : STRING NO. 1 HERE
(STRING NO. 1 MULTILINE)
KEY2 : STRING NO. 2 HERE
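To show why the order matters, here is a minimal Python sketch of the kind of keyword parsing described above. It is only an illustration under assumptions: `KEYWORDS`, `KEY_LINE` and `parse_fields` are made-up names, the keywords are the placeholders from the illustration, and it assumes every keyword is followed by a colon on the same line as the start of its string.

```python
import re

# Hypothetical keyword list; the real fax uses different keywords.
KEYWORDS = ("KEY1", "KEY2")

# Matches "KEY : value" at the start of a line, for known keywords only.
KEY_LINE = re.compile(
    r"^\s*(%s)\s*:\s*(.*)$" % "|".join(re.escape(k) for k in KEYWORDS)
)


def parse_fields(text):
    """Map each keyword to its string, joining continuation lines."""
    fields = {}
    current = None
    for line in text.splitlines():
        match = KEY_LINE.match(line)
        if match:
            # A new keyword starts a new field.
            current = match.group(1)
            fields[current] = match.group(2).strip()
        elif current is not None and line.strip():
            # A non-empty line without a keyword continues the previous string.
            fields[current] = (fields[current] + " " + line.strip()).strip()
    return fields


sample = """KEY1 : STRING NO. 1 HERE
(STRING NO. 1 MULTILINE)
KEY2 : STRING NO. 2 HERE"""

fields = parse_fields(sample)
```

A parser like this depends entirely on the text arriving line by line. With the reordered output shown earlier, the keywords and their strings end up on separate lines in the wrong order, so there is nothing reliable to match on.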
Who can help?
Thanks in advance,
Matt
Question information
- Language: English
- Status: Solved
- Assignee: No assignee
- Solved by: MaXmuc