An e-mail to the support folks at FlexPaper returned a quick fix to an annoying bug. Search engines were indexing the Flipbooks all right, but after indexing the real pages they continued on into infinity, following hundreds of links to non-existent pages. A code fix to the index.php file looks like it should fix it, but I’ll keep an eye on it.

The code fix was copied and pasted into an e-mail, and also attached as a PHP file. For some reason when I copied and pasted it from the body of the e-mail I got a T_STRING error on line 14, even though nothing whatsoever appeared to be wrong with the syntax of that line. Copying and pasting from the PHP file attachment worked perfectly. Both lines of code were identical to the nekkid eye. Some UTF-8-esque encoding bug? Who knows.
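One way to test the encoding guess is to scan the pasted code for characters that look like ordinary spaces or quotes but are not plain ASCII. The sketch below is only an illustration: the function name `find_suspect_chars` and the sample PHP line are made up, and I have no evidence that a no-break space was the actual culprit in the e-mailed code.

```python
import unicodedata

def find_suspect_chars(text):
    """Return (line, column, codepoint, name) for each non-ASCII or control character."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        line = line.rstrip("\r")  # ignore Windows line endings
        for col, ch in enumerate(line, start=1):
            is_control = ord(ch) < 32 and ch != "\t"
            if ord(ch) > 126 or is_control:
                name = unicodedata.name(ch, "UNNAMED")
                hits.append((line_no, col, f"U+{ord(ch):04X}", name))
    return hits

# Hypothetical sample: line 2 hides a no-break space (U+00A0) after the "=" sign.
sample = "<?php\n$page =\u00a0(int) $_GET['page'];\n"

for hit in find_suspect_chars(sample):
    print(hit)
```

Running the same function over the text pasted from the e-mail body and over the attachment would show whether the two "identical" lines really differ.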

I moved one edition that was being indexed into infinity (“The Etude”, February, 1899) to a new URL and disallowed searchies from continuing to gobble up all the bad URLs at the old address. If all goes as planned that should clear the pipes and get better search indexing on that edition. I’ll see how it goes with this one before changing the others.
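A disallow rule of this kind can be sanity-checked before the crawlers see it. The sketch below uses Python's standard `urllib.robotparser` to confirm that a rule blocks the old address while leaving the new one open. The paths `/old-flipbook/` and `/new-flipbook/` are placeholders, not the real URLs, and the robots.txt text is an assumed example rather than the file actually deployed.

```python
from urllib.robotparser import RobotFileParser

# Placeholder path: the real old URL is not shown here.
OLD_PATH = "/old-flipbook/"

# Assumed robots.txt content that blocks every crawler from the old address.
ROBOTS_TXT = f"""\
User-agent: *
Disallow: {OLD_PATH}
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url_path, agent="Googlebot"):
    """True if the given crawler may fetch url_path under ROBOTS_TXT."""
    return parser.can_fetch(agent, url_path)

print(allowed("/old-flipbook/index.php?page=900"))  # bogus page at the old address
print(allowed("/new-flipbook/index.php?page=3"))    # real page at the new address
```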

Derf! That was fast. Google picked up the new URL just 8 minutes after I changed it. It has already indexed up to page 16 of 31, though the numbering highlights one of the annoying vagaries of digital publishing: page 25 of a PDF copy of a printed book is unlikely to be the same page 25 as in the hard copy.

A goal with all this is to get the content of all my copies of “The Etude” fully indexed and then build a Google site-search to search all the magazines.

Watching this tick by:

tail -f /var/log/httpd/domains/etudemagazine.com.log | grep '1899_02'
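For a tally rather than a live stream, the same filter can be applied in a few lines of Python. This is a rough sketch that assumes the log uses the Apache "combined" format and that `1899_02` appears in the request path. The sample lines are invented for illustration and are not taken from the real log.

```python
import re
from collections import Counter

# Captures the request path and the last quoted field (the user agent)
# of a line in the assumed Apache "combined" log format.
LINE_RE = re.compile(
    r'"(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def crawler_hits(lines, needle="1899_02"):
    """Count Googlebot and Bingbot requests whose path contains needle."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m or needle not in m.group("path"):
            continue
        agent = m.group("agent").lower()
        for bot in ("googlebot", "bingbot"):
            if bot in agent:
                counts[bot] += 1
    return counts

# Invented sample lines, for illustration only.
SAMPLE = [
    '66.249.66.1 - - [01/Jan/2024:00:00:01 +0000] "GET /1899_02/index.php?page=3 HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [01/Jan/2024:00:00:09 +0000] "GET /1899_02/index.php?page=4 HTTP/1.1" 200 5093 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [01/Jan/2024:00:00:12 +0000] "GET /1899_02/index.php?page=1 HTTP/1.1" 200 4980 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '66.249.66.1 - - [01/Jan/2024:00:00:15 +0000] "GET /1900_05/index.php?page=2 HTTP/1.1" 200 5001 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

print(crawler_hits(SAMPLE))
```

To run it against the real file, pass an open file handle instead of `SAMPLE`, since the function only needs an iterable of lines.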

In the past I used the mighty swish-e search for text content. I may revisit that for this project, but for now I’m trying to keep things as simple as possible. Swish-e’s documentation does not seem to specifically mention support for HTML4 or HTML5, or the Flipbook format for that matter. It is a highly versatile piece of software, though, so I should be able to use it sooner or later.

Each issue of the magazine includes a search box at the top right, allowing a full text search for that edition. That issue-specific search works well enough, though it’s a little twitchy:

  • Search only works when you are viewing at 100%. It does not work if you are zoomed in even a little bit.
  • When using Chrome the search term is highlighted on the page but that highlight disappears when you start zooming in. “The Etude” content is so text-heavy that I frequently lose the location of the highlighted word when zooming in. This does not happen when using Internet Explorer, but IE has another twitch:
  • Search terms are highlighted when using IE, and the search term remains highlighted whilst zooming in, but sometimes when zooming in the text is blurry and barely readable. This happens inconsistently. As with Chrome, one must be at 100% zoom to be able to search.
  • There appears to be no way to bookmark a search result. That would be an ace feature.

For some reason Bing is not crawling the content at all, though it has a healthy appetite for the rest of the site’s pages. I shall manually submit it, I guess.