I made a strange and eerie discovery regarding my Sorabji.MOBI project, and other things I do with my mobile phone.

Over on Sorabji.MOBI I sometimes post stories which are dictated by me and automatically converted to text by Google’s voice recognition software. These dictations are usually riddled with errors, which are sometimes funnier or more poignant than what I actually said.

I saw an article on CNET about how Google saves actual audio files of your voice as you issue commands and interact with Google Voice Search on your phone. If you’ve ever asked Google “Where is 123 Main Street?” then audio of you asking that question resides on Google’s servers, where you can access and listen to it yourself should you be fascinated by the sound of your own voice. Audio is said to be retained for a limited span of time solely for the purpose of improving the quality of their voice recognition software.

I did not think much of this CNET report because I never do commands. I don’t remember ever vocally asking Google anything through my phone. Talking to computers makes me feel like an imbecile, and it’s only been recently that I’ve warmed up to doing dictation in the manner I do at .MOBI.

I checked my Google account anyway and was thus surprised to discover over a year’s worth of audio in which I am speaking in a deliberate staccato voice, enunciating words clearly so that they will be transcribed as accurately as possible, and speaking punctuation marks such as “comma” and “period”. I am not speaking those commands that CNET said Google was recording. I am using voice-to-text to write text messages, story drafts, e-mails, etc. It sounds like an intermediary version of my voice hanging on the aural walls of an uninhabited audio museum.

It reminded me of a span of time a few years ago in which I was sick and depressed and walking into walls for no physically diagnosable reasons anyone could explain. I had a brain MRI performed, took a bunch of drugs, had gallons of blood taken, but never got a diagnosis beyond extreme mental depression.

It was during this period that I got a field recorder. To try it out I recorded myself saying something like “Mary had a little lamb.” When I played it back my voice sounded warped, and elongated. I thought I had spoken the words quickly but they played back at maybe two-thirds the rate they were supposed to sound: “Mmmmmmaaaaaarrrrrryyyyyy hhhaaaaaaddd aaaaaa…” I seriously freaked out about this, thinking “Did I just have a stroke?” I thought for a hot second that while I thought I was speaking normally I was actually talking in a slow, sickly drawl, and that my mind was playing tricks on me.

A few seconds of talking out loud almost convinced me that I really was speaking normally. I also was encouraged to remember that I had just been in public less than an hour before, speaking to people who would certainly have given some indication if my voice was messed up.

I was not fully convinced everything was clear, though, until I realized what had happened: I had unintentionally activated a feature of the voice recorder which slowed down playback of audio. This feature makes it easier for typists to transcribe spoken word audio files on playback from the device. My brain was fine or, as the MRI report memorably confirmed, “unremarkable”.

With all the attention being directed at my brain at the time I don’t think it was too far off to at least momentarily think I’d had some kind of seizure.

Google claims that “Only you can see this data”, implying that Google itself has no way of accessing it. I don’t think any reasonably informed individual honestly believes that. If someone was suspected of a crime and their text messages or correspondences were garbled on account of being passed through this speech-to-text software then I think we’d find that Google is perfectly able to access that person’s audio archive and turn it over to authorities, even if the suspect thought s/he had deleted the files.

I am a little miffed that there seems to be no easy way for me to download this stuff should I want to use it for my own purposes. Obviously I could record the playback but who has time… The only way I can find to download these files is to view source, go to the bottom of the page, and look for code chunks that look like this:

["/history/audio/play/1446490275998644?authuser=1"]

Delete the leading and trailing brackets and quotation marks, so you are left with this:

/history/audio/play/1446490275998644?authuser=1

Then prepend the string with http://history.google.com. That will give you this full URL:

http://history.google.com/history/audio/play/1446490275998644?authuser=1
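
If you have more than a handful of recordings, doing that by hand gets old quickly. Here is a rough Python sketch of the same steps: strip the brackets and quotation marks from a chunk and prepend the host. The function names and the regular expression are my own inventions for illustration, and the pattern is only a guess based on the single chunk shown above, so the real page source may well need a different one. It only builds the URLs; it does not log in or download anything.

```python
import re

# The host from the steps above.
BASE_URL = "http://history.google.com"

# Assumption: every chunk looks like the one example shown above, e.g.
# ["/history/audio/play/1446490275998644?authuser=1"]
CHUNK_PATTERN = re.compile(r'\["(/history/audio/play/\d+\?authuser=\d+)"\]')


def chunk_to_url(chunk):
    """Turn one bracketed, quoted chunk into a full URL."""
    # Remove the leading and trailing brackets, then the quotation marks.
    path = chunk.strip().strip("[]").strip('"')
    return BASE_URL + path


def find_audio_urls(page_source):
    """Return a full URL for every matching chunk in the saved page source."""
    return [BASE_URL + path for path in CHUNK_PATTERN.findall(page_source)]
```

To use it, save the page source to a file, read it into a string, and pass that string to `find_audio_urls`. Each URL it returns can then be pasted into the browser as described next.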

You can put that URL in your web browser and access the MP3 file directly. Using Chrome on Windows 7 I am presented with this screen, where I can right-click and choose “Save” or type Control-S to save the audio file:

Direct MP3 Download of google Voice History

It is a clumsy method but if you really, really want to save that sound file of your voice then it works, as would finding a way to record it using something like Audacity.

I blame my ignorance of the fact that Google is allowed to intercept dictation spoken into applications it did not develop on my blindness to the not-exactly-hidden Google logo that appears whenever I activate voice recognition in any app. I typically do dictation out of doors, where the bright sun prevents me from clearly seeing the screen. Partly on account of that I guess I just never noticed it or paid it any mind.

Google retains recordings of outgoing text messages, everything that I speak into the Note Everything text editor, any e-mails that I dictate, and so on. Basically any time you activate speech recognition in any Android app the sound is recorded and sent to Google.

Some of it could be embarrassing (especially the text messages) should anyone somehow access it, but I’m not concerned about that. I’m just a little creeped out to have unexpectedly discovered this weird trove of audio that I specifically thought did not exist. I actually thought I had found an efficiency in bypassing the actual recording of audio and converting my words straight to text. Instead I unwittingly clog the cloud with yet more of my digital accumulation.

I could delete it all but it has potential to be a useful repository: I sometimes have no idea what I actually said when reading the transcriptions, as the software makes significant errors almost every single time I use it. If I simply cannot remember what I said I can now just play it back – if I can stand to listen to that mutant version of my voice.