Monday, April 30, 2012

Dispatches from the Content Tsunami


Last week, TV writer and broadcaster Tom Scharpling tweeted exciting news:
@scharpling
I can now reveal the Big Announcement: Michael Lewis is writing his next book about me. It's gonna be called RADIO MAN. Very exciting!
https://twitter.com/#!/scharpling/statuses/190124529006292992
Two hours later, however, the big news had soured slightly:
@scharpling
Bad news. It turns out that the Michael Lewis writing a book about me was a spambot. Beyond disappointed. 
https://twitter.com/#!/scharpling/statuses/190156016103600129

Yes, it appears that the Low Quality Content Tsunami is slowly seeping into popular consciousness. We are staring into the black abyss of low quality texts, to which translators must add a multilingual layer of Low Quality Translations to complete the Literary Sandwich of the Future.

Picture yourself, for instance, relaxing next to a Caribbean beach while using your smartphone to post-edit a masterpiece by Israeli Nobel-Prize-winning psychologist Daniel Kahneman, Fast and Slow Thinking. Life is easy. The sun is shining. Solicitous waiters bring you a healthy lunch silently, so as not to disturb you.

Wait, no. Rewind. It turns out that Daniel Kahneman’s best seller is called Thinking, Fast and Slow. What you are post-editing is a deceptively similar tome written by an entity called “Karl Daniels”. See what the scumbag Content Tsunami-ers did there by choosing a fake name that sounds sneakily like Kahneman’s? Unfortunately, the book is no longer available from Amazon (I could KICK myself! However, Google Books managed to at least salvage some memory of this oeuvre in its vast Babelian library). The non-Nobel book, according to Geoffrey Pullum of the Language Log, is:
a compilation of snippets from Wikipedia articles and the like, dressed up like a book. Edited by robots for you to buy by mistake. It's a spam book, part of the "gigantic, unstoppable tsunami of what can only be described as bookspam"
Yes, the world has been enriched by this book by about the same degree as it has been by all those Nigerian scam emails.

But wait, Hal Karl Daniels isn’t the only binary brain that has been busily recycling low quality content to add to the giant tidal wave of information that is going to make us all rich and enlightened. Perhaps you can apply your post-editing chops to the little reports produced by a company called ICON Publishing in San Diego founded by a computer science professor called Philip Parker (Is automated publishing the future?):
Parker's production costs are only 23 cents per book because they're made by computers. Algorithms search through incredible amounts of data from published research and government reports. That info is then plugged into a book format. It's kind of like a very high tech form of Mad Libs. Parker came up with the idea in the '90s when he was writing economic reports.
Oh, wait, the book cost 23 cents to make. Not per word. I mean the WHOLE ENTIRE book cost 23 cents to make. It is highly unlikely that anyone will pay even $0.01 per word to translate it, and even that rate is below subsistence level in most countries on Earth. So scratch that opportunity. On to the next thing.

Heard about the recent spate of flash crashes driven by stock trading algorithms? Well, it seems that the computer programs that feed on news and use it as a signal for buying and selling might be taking their cues from other computers. Which is always reassuring when you mention computers and ginormous amounts of money. Ever heard of the phrase feedback loop? This is from a recent article by Evgeny Morozov ("A Robot Stole my Pulitzer"):
Forbes—one of financial journalism’s most venerable institutions—now employs a company called Narrative Science to automatically generate online articles about what to expect from upcoming corporate earnings statements. Just feed it some statistics and, within seconds, the clever software produces highly readable stories. Or, as Forbes puts it, “Narrative Science, through its proprietary artificial intelligence platform, transforms data into stories and insights.” Don't miss the irony here: Automated platforms are now “writing” news reports about companies that make their money from automated trading. These reports are eventually fed back into the financial system, helping the algorithms to spot even more lucrative deals. Essentially, this is journalism done by robots and for robots. The only upside here is that humans get to keep all the cash.
Maybe Narrative Science needs a couple of financial translators moonlighting as post-editors? Once again, though, the production cost of the source text is so negligible as to make out-and-out raw MT the likeliest candidate for translation.

If you truly expect binary recursiveness to feed you, perhaps you can write to the company that published Computer Game Bot Turing Test and propose your services as a post-editor into Spanish or French for this instant classic. Computer engineer Carlos Bueno describes the book as follows (I am indebted to Spanish IT translation über-geek @jordibal for this anecdote):
Let me tell you about another book, “Computer Game Bot Turing Test”. It's one of over 100,000 “books” “written” by a Markov chain running over random Wikipedia articles, bundled up and sold online for a ridiculous price. The publisher, Betascript, is notorious for this kind of thing.
It gets better. There are whole species of other bots that infest the Amazon Marketplace, pretending to have used copies of books, fighting epic price wars no one ever sees. So with “Turing Test” we have a delightful futuristic absurdity: a computer program, pretending to be human, hawking a book about computers pretending to be human, while other computer programs pretend to have used copies of it. A book that was never actually written, much less printed and read.
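For the curious, the "Markov chain" Bueno mentions is less exotic than it sounds: the program records which words follow which in the source text, then strings words together by repeatedly picking a plausible follower at random. Here is a toy sketch in Python. To be clear, this is only an illustration of the general technique and not Betascript's actual code, which I have never seen; the function names (build_chain, generate) and the sample sentence are invented for the example.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return dict(chain)


def generate(chain, length=12, seed=0):
    """Walk the chain from a random starting key, emitting at most `length` words."""
    rng = random.Random(seed)        # seeded so that runs are repeatable
    key = rng.choice(sorted(chain))  # random starting point taken from the source
    out = list(key)
    while len(out) < length:
        followers = chain.get(tuple(out[-len(key):]))
        if not followers:            # dead end: nothing ever followed these words
            break
        out.append(rng.choice(followers))
    return " ".join(out)


# Hypothetical stand-in for the text of a scraped encyclopedia article.
sample = "the bot writes the book and the bot sells the book to the bot"
print(generate(build_chain(sample), length=10))
```

Every adjacent pair of words in the output did appear somewhere in the source, which is why the result looks locally grammatical while saying nothing at all. Scale the sample up to a few thousand articles and you have a catalogue.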
But take heart, not all low quality content is attributable to computers. Last week it was reported that China had censored the sex scene in the 3-D version of Titanic. Many online news outlets (Daily Mail, MSNBC, Entertainment Weekly, E Online…) included a statement from the Chinese Ministry justifying the move thusly:
"Considering the vivid 3D effects, we fear that viewers may reach out their hands for a touch and thus interrupt other people's viewing. To avoid potential conflicts between viewers and out of consideration of building a harmonious ethical social environment, we've decided to cut off the nudity scenes."
Too good to be true… and it was. Gawker reports that the quote came from a satirical website:
Tons of English-language news outlets are running with this quote even though, guys, it's obviously not real. The rumor probably originated with this blog post, which fails to mention the joke aspect. 
The Chinese state news agency Xinhua reports that "there is no official response to the roll-back of the censorship policy concerning the 3D film." 
Also, the Chinese movie-going public are not medieval villagers; they understand how 3D works.
And, in closing, a widely circulated article estimates that a good chunk of the Content Tsunami is actually sex videos:
It’s probably not unrealistic to say that porn makes up 30% of the total data transferred across the internet. http://www.extremetech.com/computing/123929-just-how-big-are-porn-sites/2
Humorist Stephen Colbert paraphrased the finding thusly: “Thirty percent of all internet traffic is porn, according to a new report by the New England Journal of Underestimating Everything.”


Miguel Llorens is a freelance financial translator based in Madrid who works from Spanish into English. He specializes in equity research, economics, accounting, and investment strategy. To contact him, visit his website and write to the address listed there. Feel free to join his LinkedIn network or to follow him on Twitter.

1 comment:

tlumaczenia ekonomiczne Warszawa said...

Spambook. I will remember it! Now I know how to call a book on the Polish market called "Cook with the Pope". Needless to say it had nothing to do neither with cooking nor with the Pope. :-)