By Brett Smith for redOrbit.com – Your Universe Online
As smartphones become an ever more integral part of our daily lives, a group of engineers at the University of Rochester is predicting that the devices will soon be able to gauge a person’s mood the same way our best friends and family members can: from voice inflections.
This week, the team will debut a new computer program – one that gauges human feelings from speech patterns more accurately than any previous approach – at the 2012 IEEE Workshop on Spoken Language Technology.
Built by graduate student Na Yang, the program takes the form of a smartphone app that displays either a happy or a sad face based on its analysis of a user’s voice.
“The research is still in its early days,” says Na’s professor, Wendi Heinzelman, “but it is easy to envision a more complex app that could use this technology for everything from adjusting the colors displayed on your mobile to playing music fitting to how you’re feeling after recording your voice.”
The app analyzes 12 features of speech, including pitch and volume, to identify one of six different emotions from a voice recording. According to the researchers, it achieves 81 percent accuracy – improving on earlier programs that typically achieved about 55 percent accuracy.
To ‘teach’ the app how to determine a speaker’s mood, the researchers first established the 12 specific speech features to be measured in each recording. They then categorized the recordings by emotion and used them to teach the program which sounds were “sad,” “happy,” “fearful,” “disgusted” or “neutral.”
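That training step amounts to a standard supervised-learning pipeline. The sketch below is purely illustrative and is not the Rochester team’s actual code: it assumes the 12 acoustic features (pitch, volume and so on) have already been extracted and hand-labeled, uses placeholder data, and stands in an off-the-shelf classifier for whatever model the researchers used.

```python
# Minimal sketch (not the team's implementation): train a classifier on
# 12-dimensional acoustic feature vectors from annotated recordings.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Six emotion classes per the article; the exact label set is an assumption.
EMOTIONS = ["happy", "sad", "fearful", "disgusted", "angry", "neutral"]

rng = np.random.default_rng(0)
X_train = rng.normal(size=(600, 12))       # 600 hypothetical recordings x 12 features
y_train = rng.choice(EMOTIONS, size=600)   # hypothetical human-annotated labels

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# Per-emotion scores for a new (placeholder) recording's feature vector.
probs = clf.predict_proba(rng.normal(size=(1, 12)))[0]
print(dict(zip(clf.classes_, probs.round(2))))
```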
“We actually used recordings of actors reading out the date of the month – it really doesn’t matter what they say, it’s how they’re saying it that we’re interested in,” Heinzelman said.
After being shown what to look for, the program analyzed new voice recordings and tried to match each one to one of its known emotions. If it could not choose between two or more emotions, it labeled the recording unclassified.
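The article does not say how the program decides a recording is too ambiguous to label. One common approach, sketched below in Python with an assumed confidence-margin rule, is to fall back to “unclassified” whenever the top two candidate emotions score nearly the same.

```python
# Sketch of the "unclassified" fallback described above; the margin rule is an
# assumption, since the article only says ambiguous recordings go unlabeled.
def classify(probs: dict, margin: float = 0.1) -> str:
    """Return the most likely emotion, or 'unclassified' when the top two
    candidates are within `margin` of each other."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    (best_label, best_score), (_, runner_up_score) = ranked[0], ranked[1]
    if best_score - runner_up_score < margin:
        return "unclassified"
    return best_label

print(classify({"happy": 0.45, "sad": 0.40, "neutral": 0.15}))  # -> unclassified
print(classify({"happy": 0.70, "sad": 0.20, "neutral": 0.10}))  # -> happy
```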
The engineers are currently working with Rochester psychologists Melissa Sturge-Apple and Patrick Davies, who are assessing the technology in the context of interactions between teenagers and their parents.
“A reliable way of categorizing emotions could be very useful in our research,” Sturge-Apple said. “It would mean that a researcher doesn’t have to listen to the conversations and manually input the emotion of different people at different stages.”
She explained that the engineers are trying to replicate an innate process in humans that occurs mostly on a subconscious level.
“You might hear someone speak and think ‘oh, he sounds angry!’ But what is it that makes you think that?” asked Sturge-Apple.
She noted that our brains learn to detect a speaker’s emotions based on the way they alter their voice’s volume, pitch, and subtle harmonics.
“We don’t pay attention to these features individually, we have just come to learn what angry sounds like – particularly for people we know,” she said.
One problem that has plagued previous efforts to identify a speaker’s mood through a computer program has cropped up again in the new Rochester team’s work: if the voice being analyzed is different from the one that trained the system, the app’s accuracy drops from 81 percent to about 30 percent.
The researchers said they are looking at ways of minimizing this effect, potentially by training the system with voices from the same age group and gender as the person whose speech is being analyzed.
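That mitigation can be pictured as a simple filtering step over the training data before the classifier is fit. The sketch below is illustrative only; the field names and matching rule are hypothetical, not the team’s method.

```python
# Hypothetical sketch: keep only training recordings whose speaker
# demographics match the target user, then train on that subset.
from dataclasses import dataclass

@dataclass
class Recording:
    features: list      # 12 acoustic features
    emotion: str        # annotator label
    age_group: str      # e.g. "teen", "adult" (assumed categories)
    gender: str

def matched_training_set(recordings, target_age_group, target_gender):
    """Filter recordings to speakers matching the target user's demographics."""
    return [r for r in recordings
            if r.age_group == target_age_group and r.gender == target_gender]
```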