A Speech Corpus (or Spoken Corpus) is a database of speech audio files and the text transcriptions of those files, stored in a format that can be used to create Acoustic Models (which can then be used with a Speech Recognition Engine). ISIP's Switchboard database is a good example of this.

A corpus is one such database. Corpora is the plural of corpus (i.e. several such databases).
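To make the idea of audio files paired with transcriptions concrete, here is a minimal Python sketch. The `Utterance` class, the `parse_prompts` function, the `wav` directory name and the one-line-per-utterance layout ("file id, then the words spoken") are all assumptions made for illustration; they are not the format of Switchboard or of any particular corpus.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Utterance:
    audio_path: Path  # recording of one utterance
    transcript: str   # text of what was said in that recording


def parse_prompts(lines, audio_dir="wav"):
    """Pair each audio file with its transcription.

    Assumes (hypothetically) one utterance per line: "<file_id> <words...>".
    """
    corpus = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        file_id, _, text = line.partition(" ")
        corpus.append(Utterance(Path(audio_dir) / f"{file_id}.wav", text.strip()))
    return corpus


# Two made-up entries, purely for illustration.
example = parse_prompts([
    "a0001 THE QUICK BROWN FOX",
    "",
    "a0002 JUMPS OVER THE LAZY DOG",
])
```

Each `Utterance` ties one recording to its text, which is the pairing an acoustic model trainer needs; a real corpus would usually carry extra metadata (speaker, sample rate, recording conditions) alongside each pair.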

There are two types of Speech Corpora:

(1) Read Speech - which includes:

(2) Spontaneous Speech - which includes:

© 2005-2009 VoxForge; Legal: Terms and Conditions
