AI method for evaluating user performance based on audio pitch re: public speaking


First, let me clarify the context: I have to learn new technologies for my bachelor thesis. I am building a mobile application similar to Flappy Bird, except it is voice controlled. The idea is to use the app as a practice tool for people with voice problems (monotony). The bird flies upwards when the user produces a high pitch and downwards when the pitch is low.

The app part is pretty much complete. But my project also includes a website where the vocal coach can follow their users' progress. I save all of a user's game history in a database. More specifically, every time the bird passes through a pipe opening, I save the pitch values that the user correctly produced. I also save the pitch value that made them lose the game, along with the pitch value they were supposed to produce to avoid losing.

With all this data in my database, this is where I would like to add AI as a new technology to my project. I have a feature in mind that would calculate the user's strong and weak points. For example, it might report that a user is strong at high pitches but weak at low pitches. Since I know nothing about AI, however, I am not sure to what extent this is possible and have no idea where to start.

I would really appreciate it if someone could share their knowledge by pointing me to existing libraries/frameworks! I would preferably include the AI code inside my API, which is directly connected to my database, so I can return the calculated data straight to my website. My API is a Spring Boot app, so I guess I would need Java AI libraries?

M. Benamar

Posted 2019-05-15T20:40:40.257

Reputation: 51



This is a typical literature question: a certain practical problem exists, and the question is what the roughly 10 million papers from the AI community say about it. The first thing to know is that the described use case of a mobile game went viral on social media under the name "voice-activated Flappy Bird". Many YouTube videos show the software in action: a woman shouts "jump, jump" into the microphone, and as a result the Flappy Bird character moves upward. In the literature this kind of interaction is called "voice control" and was explained in a paper about wheelchairs. [1]

According to Google Scholar, many more papers are available on this subject. If I have understood the use case correctly, the entire speech of the human into the microphone is recorded and analyzed. This sounds like a typical problem for a data-mining / deep-learning application.

A rudimentary description of such a system is given in [2]. The chances are high that this paper will not solve the initial question completely, but it provides additional literature and some keywords for identifying relevant work.

  • [1] Nishimori, Masato, Takeshi Saitoh, and Ryosuke Konishi. "Voice controlled intelligent wheelchair." SICE Annual Conference 2007. IEEE, 2007.

  • [2] Dmitrieva, H., and Kirill Nikitin. "Design of Automatic Speech Emotion Recognition System." Proceedings of the workshop on applications in information technology. 8-10 October, 2015. 2015.

Manuel Rodriguez

Posted 2019-05-15T20:40:40.257

Reputation: 1

I did mention my bachelor thesis, but it is really a practical project. I am actually programming this application, so it is not a literature question. – M. Benamar – 2019-05-15T22:47:57.133

@M.Benamar You would like to store an audio stream in a database, use AI software to monitor the user's voice, and encapsulate the code in an API, but you don't reference any existing literature? Tell me more about the project. – Manuel Rodriguez – 2019-05-16T09:27:57.300

The audio analysis is already done by the mobile app. The mobile app calculates the frequencies and saves them to my database. What I need from AI is simple maths/statistics to find patterns in the data in my database. – M. Benamar – 2019-05-16T10:41:32.950
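The "simple maths/statistics" described in the comment above can be done in plain Java inside the Spring Boot API, with no dedicated AI library. A minimal sketch, assuming hypothetical names (`Attempt`, `successRateByBand`) and placeholder pitch-band thresholds in Hz — not the project's actual schema:

```java
import java.util.*;
import java.util.stream.*;

/**
 * Hypothetical sketch: given (target pitch, passed?) records from the
 * saved game history, compute the success rate per pitch band to surface
 * strong and weak ranges. Names and thresholds are illustrative only.
 */
public class PitchStats {

    // One gameplay event: the pitch the player was asked to produce,
    // and whether they passed the pipe opening at that pitch.
    record Attempt(double targetPitchHz, boolean passed) {}

    // Bucket pitches into coarse bands; the Hz cut-offs are placeholders
    // and would need tuning per voice type.
    static String band(double hz) {
        if (hz < 150) return "low";
        if (hz < 300) return "mid";
        return "high";
    }

    // Success rate per band: mean of 1.0 (passed) / 0.0 (failed).
    static Map<String, Double> successRateByBand(List<Attempt> attempts) {
        return attempts.stream().collect(Collectors.groupingBy(
                a -> band(a.targetPitchHz),
                Collectors.averagingDouble(a -> a.passed ? 1.0 : 0.0)));
    }

    public static void main(String[] args) {
        List<Attempt> history = List.of(
                new Attempt(120, true), new Attempt(130, false),
                new Attempt(200, true), new Attempt(220, true),
                new Attempt(350, false), new Attempt(360, false));

        // A low rate in a band marks a weak point for the coach's report.
        successRateByBand(history).forEach((b, rate) ->
                System.out.printf("%s: %.0f%% success%n", b, rate * 100));
    }
}
```

A service method like this could back a REST endpoint that the website queries per user; anything fancier (trend over time, per-session regression) stays in the same descriptive-statistics territory.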