A Speech Corpus (or Spoken Corpus) is a database of speech audio files and text transcriptions of those files, in a format that can be used to create Acoustic Models (which can then be used with a Speech Recognition Engine). ISIP's Switchboard database is a good example of this.
A corpus is one such database. Corpora is the plural of corpus (i.e. many such databases).
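As a rough illustration of the audio-plus-transcription pairing described above, here is a minimal Python sketch. The line format ("utterance ID, then the transcript"), the `wav` directory, the `.wav` naming, and the names `Utterance` and `parse_prompts` are all assumptions made for this example; they are not the layout of any particular corpus.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Utterance:
    """One corpus entry: an audio file and the text spoken in it."""
    audio_path: Path
    transcript: str


def parse_prompts(prompt_lines, audio_dir):
    """Pair each transcript line with its audio file.

    Assumed (hypothetical) line format: "<utterance-id> <transcript words...>",
    with the audio stored as "<audio_dir>/<utterance-id>.wav".
    """
    corpus = []
    for line in prompt_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split off the first token as the utterance ID; the rest is the text.
        utt_id, _, text = line.partition(" ")
        corpus.append(Utterance(Path(audio_dir) / f"{utt_id}.wav", text.strip()))
    return corpus


# Made-up example data, for illustration only.
example_lines = [
    "a0001 THE QUICK BROWN FOX",
    "",
    "a0002 JUMPS OVER THE LAZY DOG",
]
corpus = parse_prompts(example_lines, "wav")
for utt in corpus:
    print(utt.audio_path, "->", utt.transcript)
```

Keeping each audio path next to its transcript in a single record is the point of the sketch: acoustic model training needs to know exactly which words were spoken in which recording.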
There are two types of Speech Corpora:
(1) Read Speech - which includes:
(2) Spontaneous Speech - which includes:
© 2005-2009 VoxForge