Baidu, the Chinese internet giant that made its name with search, released an updated version of its Baidu Translate app for iOS yesterday. In addition to the usual typed input, voice-to-text translation, and Optical Character Recognition (it will translate your friend’s T-shirt for you), the update comes equipped with image-recognition software.
That means you can go outside, take a photo of a fire hydrant, and Baidu Translate will display the words “Fire Hydrant” along with the Chinese “消防栓”.
Baidu tells us that the souped-up Baidu Translate marks the broad introduction of the company’s image-recognition technology, though in the past it has dabbled in image search for narrow fields like dog breeds and album covers. The bulk of the research behind the app took place at the company’s Institute of Deep Learning in Silicon Valley, where top data scientists have been working to bring the nimbleness and complexity of the human brain into the confines of computer algorithms.
Curious to see how these new features measure up, I took the app for a test drive in one of Taipei’s best knick-knack filled cafes.
For the image-recognition feature, I started out simple – a coffee mug. Here’s what Baidu came up with.
Okay, not bad – we’re looking at a coffee mug, so we’re in the ballpark. But that first test highlights one of the weaknesses of Baidu Translate, and a rather unexpected one at that – it overshoots the mark. Rather than coming up with “cup” or “glass,” Baidu churned out a number of different frou-frou beverages. What if I were drinking tea? “Shannon Coffee,” meanwhile, appears to be a literal translation of the Chinese, which ought to be interpreted as “Rich, fragrant coffee,” but that’s an issue for the translation team, not the image team.
Here’s another example of an overshoot:
Four dog breeds, no dogs. But given that we’re dealing with a two-dimensional teddy bear and a tapir, it’s impressive that Baidu has identified these two inanimate objects as animals in the first place.
Shifting gears a bit:
Even the “overshoot” answer is impressive. Baidu doesn’t just want to tell you you’re looking at a motorcycle – it wants to tell you the type of motorcycle you’re looking at.
Can it help me read the menu?
Tech in Asia had never tested Baidu’s OCR features before, so we decided to take them for a spin while we were at it.
There is a slew of English-Chinese OCR apps on the market, though of the ones I’ve used, none has been reliable enough to stay on my iPhone for very long. Baidu is no exception. At its best, when you hover the panel over English words, Baidu will identify two or three letters and then guess the entire word.
Going from Chinese to English yields even poorer results, though since I live in Taipei and not the mainland, perhaps this is due to a lack of optimization for traditional Chinese text.
Baidu Translate: ‘Caption This’ edition
Of course, Baidu Translate’s image recognition can get amusing at times. Here are some of my favorite gaffes:
Depends on how you define “mobile” I guess…
“Look at this chandelier I bought for the living room, honey, it’s ‘European-style.’”
While Baidu’s image recognition feature has a “cool” factor of 10, it thus far has few practical use cases. Unless you want to know what something is as well as how to say it in a second language, you’ll be much better off just using the plain-Jane type-input translation feature.
Still, the app’s navigation and interface are quick and seamless, and unlike Google Goggles’ image recognition software, it never gets stumped. Even when Baidu misses the bullseye, it’s hard not to get excited when it hits the target – and that’s almost every time you snap a photo. At the moment, Baidu’s image recognition software is amazing technology that doesn’t work. Here’s to hoping that with a little more research and tweaking, it soon will.
(Editing by Steven Millward)
(Top Image via Flickr user bfishadow)