Android Speech Processing API Review

Recently I got a chance to work with the Android Speech Processing API. Our team developed an app called Meeting1 that takes meeting minutes.

When the app is turned on:

  1. The app listens to the meeting and converts the speech into text
  2. Then the text is sent to the server, where it is consolidated into minutes at the end of the meeting (a rough sketch of this capture-and-upload step follows this list)
  3. In multi-user mode, many devices are used in parallel to process the speech, and the server eliminates duplicate content before delivering the meeting minutes to the organizer
  4. The organizer can then share the minutes by email or through Salesforce (the app was developed for the Dreamforce hackathon)
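For steps 1 and 2, the basic flow looks roughly like the sketch below: fire the built-in recognizer intent, take the top transcription it returns, and post it to the backend. The endpoint URL and upload format here are placeholders, not the actual Meeting1 backend; the real code is in the repository linked at the end of this post.

    import java.io.IOException;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.List;

    import android.app.Activity;
    import android.content.Intent;
    import android.speech.RecognizerIntent;
    import android.util.Log;

    // Sketch of steps 1 and 2: capture a snippet of speech and upload the text.
    // The server URL is a placeholder, not the real Meeting1 backend.
    public class CaptureActivity extends Activity {

        private static final int REQUEST_SPEECH = 1;

        private void startListening() {
            Intent intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
            intent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 1);
            startActivityForResult(intent, REQUEST_SPEECH);
        }

        @Override
        protected void onActivityResult(int requestCode, int resultCode, Intent data) {
            super.onActivityResult(requestCode, resultCode, data);
            if (requestCode == REQUEST_SPEECH && resultCode == RESULT_OK && data != null) {
                // The recognizer returns candidate transcriptions, best match first.
                List<String> matches =
                        data.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS);
                if (matches != null && !matches.isEmpty()) {
                    uploadSnippet(matches.get(0));
                }
            }
        }

        private void uploadSnippet(final String text) {
            // Post the snippet to the server that consolidates the minutes.
            new Thread(new Runnable() {
                @Override
                public void run() {
                    try {
                        URL url = new URL("https://example.com/meeting1/snippets");
                        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                        conn.setRequestMethod("POST");
                        conn.setDoOutput(true);
                        conn.getOutputStream().write(text.getBytes("UTF-8"));
                        conn.getResponseCode();   // fire and forget in this sketch
                        conn.disconnect();
                    } catch (IOException e) {
                        Log.e("Meeting1", "upload failed", e);
                    }
                }
            }).start();
        }
    }

A nice side effect of the intent-based approach is that the Google app handles the microphone UI, so the calling app does not need the RECORD_AUDIO permission itself.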

Android Speech Processing API:

I would say that the Android Speech Processing API is at an early stage. Below is my review of the API:

  • Speed of speech: It did a poor job of converting speech at normal speed, but slowing your speech down by even 20% improved the accuracy a lot
  • Context: If the context is about ‘YOU’ or a ‘LOCATION’ or other things you can ask Google Now about, like weather, sports, or TV shows, it did a better job. But it did a terrible job when the context shifted to a particular topic like ‘Android’, ‘America’, or ‘washing machine’, or when reading excerpts from a book
  • Length of speech: It can recognize a maximum of 12–18 words in a single attempt. In our case, we had to call the API repeatedly in a loop to get continuous speech processing (sketched later in this post)
  • Noisy background: As expected, a noisy background led to poor results
  • Accent: Surprisingly, Android does well with different accents (Google has done a great job of training it on a wide variety of accents)

Given all these limitations, which can and likely will be improved, the speech processing API is best used for COMMANDS that operate your app, such as getting driving directions, finding the best book, or filtering and sorting results. (Since this field is improving rapidly, you can expect major progress soon.)
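As an illustration of that command-style usage, here is a hand-wavy sketch of dispatching on the top recognition result inside an Activity. The keyword matching and the sortResultsByPrice()/showFilterDialog() helpers are invented for the example.

    import java.util.Locale;

    import android.app.Activity;
    import android.content.Intent;
    import android.net.Uri;

    // Sketch of treating the top recognition result as a command.
    public class CommandActivity extends Activity {

        // Called with the top result returned by the recognizer.
        private void handleCommand(String spokenText) {
            String command = spokenText.toLowerCase(Locale.US);
            if (command.contains("directions to")) {
                // Hand the destination off to Google Maps navigation.
                String destination = command.substring(
                        command.indexOf("directions to") + "directions to".length()).trim();
                startActivity(new Intent(Intent.ACTION_VIEW,
                        Uri.parse("google.navigation:q=" + Uri.encode(destination))));
            } else if (command.contains("sort by price")) {
                sortResultsByPrice();   // hypothetical helper
            } else if (command.contains("filter")) {
                showFilterDialog();     // hypothetical helper
            }
        }

        private void sortResultsByPrice() { /* app-specific */ }
        private void showFilterDialog()   { /* app-specific */ }
    }

Keyword matching like this is crude, but it works well enough when the set of commands is small and fixed.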

To counter these limitations, we implemented the following techniques in our app.

  • Run the API in a continuous loop to get more text converted at a go (sketched below)
  • Context: Allow the user to correct the text, and learn from those corrections on the server. Use multiple devices for simultaneous conversion of speech (I am not sure this would really make a difference in the long run, but it did for us)
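The continuous loop can be built on SpeechRecognizer with a RecognitionListener that simply starts a new listening pass whenever the previous one finishes or errors out. The sketch below shows that pattern with snippets just logged; it is a simplification, not a copy of the Meeting1 code. Unlike the intent-based approach above, SpeechRecognizer needs the RECORD_AUDIO permission in the manifest.

    import java.util.List;

    import android.content.Context;
    import android.content.Intent;
    import android.os.Bundle;
    import android.speech.RecognitionListener;
    import android.speech.RecognizerIntent;
    import android.speech.SpeechRecognizer;
    import android.util.Log;

    // Restart the recognizer after every result (or error), so dictation beyond
    // the 12–18 word limit is captured as a series of short snippets.
    public class ContinuousListener implements RecognitionListener {

        private final SpeechRecognizer recognizer;
        private final Intent intent;

        public ContinuousListener(Context context) {
            recognizer = SpeechRecognizer.createSpeechRecognizer(context);
            recognizer.setRecognitionListener(this);
            intent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
            intent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                    RecognizerIntent.LANGUAGE_MODEL_FREE_FORM);
        }

        public void start() {
            recognizer.startListening(intent);
        }

        @Override
        public void onResults(Bundle results) {
            List<String> matches =
                    results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
            if (matches != null && !matches.isEmpty()) {
                // Append the snippet to the running transcript / queue it for upload.
                Log.d("Meeting1", "snippet: " + matches.get(0));
            }
            recognizer.startListening(intent);   // loop: listen again immediately
        }

        @Override
        public void onError(int error) {
            // On timeouts or no-match errors, simply start another listening pass.
            recognizer.startListening(intent);
        }

        // The remaining RecognitionListener callbacks are not needed for the loop.
        @Override public void onReadyForSpeech(Bundle params) { }
        @Override public void onBeginningOfSpeech() { }
        @Override public void onRmsChanged(float rmsdB) { }
        @Override public void onBufferReceived(byte[] buffer) { }
        @Override public void onEndOfSpeech() { }
        @Override public void onPartialResults(Bundle partialResults) { }
        @Override public void onEvent(int eventType, Bundle params) { }
    }

One caveat: each pass still ends after a pause or at the length limit, so there is a brief gap between passes in which a few words can be lost.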

Finally, if you want to take a look at the code of our app, it is available at https://github.com/anooshm/meeting1_android