The goal of converting “The Etude” magazine to Flipbook format was to make the text searchable by commercial search engines without losing the fundamental character of the printed matter.

I’ve given it a couple of weeks. Searchability appears to have been accomplished, but with a lot of caveats.

Unlike the flat HTML pages of etudemagazine.com, which were carefully edited from the original OCR, the text being barfed up to the searchies by FlexPaper is akin to the kind of gobbledygook produced by many other scan-to-text web products.

The ABBYY FineReader software used to scan and convert “The Etude” to text is head and shoulders above whatever OCR engine FlexPaper uses.

A bigger problem is that searchies are indexing non-existent pages.

A real page URL from the June 1899 issue looks like this:

http://etudemagazine.com/1899/06/index.php?page=5

That page is indexed, but for some reason the searchies are also gobbling up pages with no content, requesting page numbers well beyond the last page of the magazine.

Thousands of URLs like this do nothing to improve “The Etude”’s SEO:

http://etudemagazine.com/1899/06/index.php?page=519

So far I find no way to prevent this, but I look forward to a reply from FlexPaper’s tech support.
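
For what it’s worth, the general remedy for this kind of crawl problem, outside of anything FlexPaper itself offers, is for the server to answer out-of-range page numbers with a 404 instead of an empty page, since search engines generally drop URLs that return 404. Below is a minimal sketch of that check in Python. It is not FlexPaper’s code: the function name `status_for` and the page count of 48 are made up for illustration, and the real page count of the issue would have to be supplied.

```python
from urllib.parse import urlparse, parse_qs


def status_for(url: str, page_count: int) -> int:
    """Return the HTTP status a flipbook page URL should receive.

    page_count is the number of real pages in the issue (an assumption
    here; it must come from the site itself).
    """
    query = parse_qs(urlparse(url).query)
    # A URL with no page parameter is treated as the first page.
    raw = query.get("page", ["1"])[0]
    try:
        page = int(raw)
    except ValueError:
        # Non-numeric page values are not real pages.
        return 404
    # Only pages that exist in the issue get a normal response.
    return 200 if 1 <= page <= page_count else 404


# Hypothetical issue with 48 pages.
real = status_for("http://etudemagazine.com/1899/06/index.php?page=5", 48)
bogus = status_for("http://etudemagazine.com/1899/06/index.php?page=519", 48)
```

With a check like this in front of the viewer, the first URL would be served normally and the second would be refused, which keeps the phantom pages out of the index.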

FlexPaper has not worked exactly as advertised. FlexPaper is “Open Source”, but the “supported” product is expensive next to comparable commercial offerings. I must convert to HTML4, not the more desirable HTML5, because of conversion errors that no one at FlexPaper can explain.

This is technology’s Rubicon: when the human beings responsible for a piece of software have no idea why it does not work. The Internet will end like this.

Add that to the long list of this planet’s software mysteries. HTML4 is probably just fine for getting the content indexed, but I wish there were a usable way to edit the OCRed text.

UPDATE: June 3, 2014. I contacted FlexPaper about the search engine bug, in which searchies are gobbling up hundreds, even thousands, of SEO-hostile pages with zero content save for PHP error messages. The response came a few days later. I’m told that a new version will be released soon to address the problem. I guess I won’t do any more conversions from PDF to Flipbook until the new release.