I decided to take a new approach with the scans and OCR conversion I did with “The Etude” music magazine a couple of years ago. I took a long break from that project, in which I scanned hundreds upon hundreds of pages into the ABBYY FineReader OCR software, then hand-edited the errors as best I could. The goal was to make the text fully searchable without so much of the gobbledygook text commonly found in unedited OCR conversions. Additionally, I thought it seemed worthwhile to make individual sections of the magazine independently searchable, which I did using the formidable swish-e search software.
More recently I thought I’d try the Flipbook approach. After a somewhat arduous decision-making process I settled on the FlexPaper Desktop Publisher, from Devaldi.com.
Using ABBYY FineReader I converted the scanned pages to PDF format. Then I dropped them into FlexPaper. My intent was to turn these pages into fully searchable HTML5 documents, making the text content available to search engines without losing the layout and fundamental character of the printed magazine.
The FlexPaper approach seemed promising but unfortunately it did not work. I suspect the text-heavy nature of the content (combined with the teeny-tiny size of said text) causes FlexPaper to choke. When attempting to save and publish a PDF as HTML5 I was rewarded with an HTML5 document that is graphically warped and mangled, and which essentially grinds to a halt any web browser with which I attempt to view it.
Creating the PDFs and publishing them through FlexPaper is laboriously time-consuming. Having spent $150 for the software, I find the inability to republish “The Etude” in HTML5 a disappointment. The software is also pocked with numerous interface annoyances and idiosyncrasies too irritating to explicate.
FlexPaper’s OCR conversion engine makes mistakes, as any reasonable person should expect from an OCR engine, especially when damaged and stained publications are involved.
But correcting OCR errors in FlexPaper is incredibly annoying. Text matter is stored in a gigormous single-line JavaScript file in which every single word (and blank space) is surrounded with enormous blobs of code.
Want to change something in the words “Piano Pieces”? Just open your handy-dandy vi editor and look for this:
[353,336,38,14,4,"Piano"],[353,373,10,14,4," "],[353,384,55,14,4,"Pieces. "]
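A short script can take some of the sting out of hunting through that single line by hand. What follows is only a sketch in Python, not anything FlexPaper supplies. It assumes every entry has the shape seen above (five integers followed by the text), that the real file uses plain straight double quotes, and that the integers can be left alone. What those integers mean is a guess on my part (probably position and size), so a corrected word of a different length may still sit oddly on the page. The function name replace_word is made up for illustration.

```python
import re

# One entry looks like [353,336,38,14,4,"Piano"]: five integers, then the text.
# Assumption: the integers describe position and size; they are copied through untouched.
ENTRY = re.compile(r'\[(\d+),(\d+),(\d+),(\d+),(\d+),"((?:[^"\\]|\\.)*)"\]')


def replace_word(data: str, old: str, new: str) -> str:
    """Replace the text of every entry whose text equals `old`, ignoring trailing spaces.

    `new` is inserted as-is, so it must not contain double quotes or backslashes.
    """
    def fix(match: re.Match) -> str:
        text = match.group(6)
        stripped = text.rstrip()
        if stripped != old:
            return match.group(0)          # not the word we want; leave it alone
        trailing = text[len(stripped):]    # keep whatever spaces followed the word
        numbers = ",".join(match.group(i) for i in range(1, 6))
        return f'[{numbers},"{new}{trailing}"]'

    return ENTRY.sub(fix, data)


sample = '[353,336,38,14,4,"Piano"],[353,373,10,14,4," "],[353,384,55,14,4,"Pieces. "]'
fixed = replace_word(sample, "Pieces.", "Pieces")
print(fixed)
```

To use it on a real file, read the whole file into a string, pass it through replace_word, and write the result to a new file so the original stays intact in case the guess about the format is wrong.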
Nevertheless, the software and its makers seem trustworthy (unlike some of FlexPaper’s competitors) so I shall stick with it for now while keeping my eyes open for better solutions.
Unable to publish to HTML5, I find that publishing from FlexPaper to HTML4 at least works. I’ve shared a handful of issues of “The Etude” from 1899 as an experiment, to see if search engines really do index the text matter and if the content finds new audiences in its “responsive” format (which promises viewability on virtually any device).
Take a look, and tell me what you think of these editions of “The Etude” from the year 1899:
This is the first real posting to Szapp.Com. Szapp is pronounced “shop” and this is where I go to Talk Szapp about the continuous aggravations and annoyances I encounter whilst doing combat with freeware, commercialware, payware, and any other software that demands more of me than I have time or energy to give it.
When I encounter failure messages from software or web sites I see a world filled with rooms of software developers laughing, laughing at me for my intellectual inability to navigate their perfectly obvious software product, those rooms full of overpaid software developers thence going back to their company-supplied pool table to complain about their paying audience’s lack of qualifications to use the product. (I have been in these belly-slapping rooms. I know how it works.)
Every posting to Szapp ends with a map of a random location somewhere in our world.