SPEECH TOOLS

The NICO toolkit was developed for automatic speech recognition problems, so special commands are available for speech input and output. The set of tools is not a full toolkit for speech processing, but it is enough to start experimenting. It is possible to extract filterbank features (mel or bark scaled) and to compute cepstrum features and velocity and acceleration coefficients. A tool for creating targets for phoneme classifiers from phonetic label files and a simple label file editor are also available.

In phonetic classification experiments, you have an output group of N output units -- one unit for each phoneme. The output unit with the highest activation at each time point is the network's phonetic classification.
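The rule above can be sketched in a few lines of Python. This is only an illustration of picking the most active output unit per frame, not code from the toolkit; the function name classify_frames and the example phoneme set and activations are made up for the example.

```python
def classify_frames(activations, phonemes):
    """Return, for each frame, the phoneme whose output unit is most active.

    activations: list of frames, each a list of N output activations.
    phonemes:    list of N phoneme symbols, one per output unit.
    """
    labels = []
    for frame in activations:
        # Index of the output unit with the highest activation in this frame.
        best = max(range(len(frame)), key=lambda i: frame[i])
        labels.append(phonemes[best])
    return labels


# Hypothetical example: three output units, three time points.
phonemes = ["a", "s", "t"]
activations = [
    [0.1, 0.7, 0.2],
    [0.6, 0.3, 0.1],
    [0.2, 0.2, 0.6],
]
print(classify_frames(activations, phonemes))  # ['s', 'a', 't']
```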

The phonetic targets are typically specified in "label files", whose formats are not directly supported by NICO streams. These are text files with basically one phoneme per row, together with the start and end time of each phoneme. See the reference section File formats for more details.

Creating phonetic target files

The command Lab2Targ converts label files to a file format that can be read by a stream in a network. This is the syntax for Lab2Targ: "LabelListFile" is a list of the phonemes to discriminate, so the number of target values per frame is equal to the number of rows in "LabelListFile". The program also needs to know the number of samples for each utterance, so you need to give the datafile of one arbitrary other stream of the network (for example an input stream) and the size of this stream. These are the arguments "InputFile" and "InputSize". Following the general convention of the toolkit, all data files for the same utterance have the same basename, so the filename of the created target file is constructed from "InputFile". The directories and extensions of the input file, label file and output target file can be specified with various options (see the manual page for Lab2Targ).
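To make the conversion concrete, here is a minimal Python sketch of the idea: one target value per phoneme in the list, one target vector per frame, and a target filename derived from the basename of the input file. It is not the Lab2Targ implementation. The label rows as (start, end, phoneme) tuples in seconds, the frame_period parameter, the 0/1 target values and the ".targ" extension are all assumptions made for the illustration; the real formats are described in the File formats section.

```python
import os


def labels_to_targets(label_rows, phoneme_list, num_frames, frame_period):
    """Build one target vector per frame from (start, end, phoneme) rows.

    Assumed conventions (hypothetical, not taken from the toolkit):
    times are in seconds, and a frame belongs to the phoneme whose
    interval contains the centre of the frame.
    """
    index = {p: i for i, p in enumerate(phoneme_list)}
    targets = [[0.0] * len(phoneme_list) for _ in range(num_frames)]
    for start, end, phoneme in label_rows:
        for t in range(num_frames):
            centre = (t + 0.5) * frame_period
            if start <= centre < end:
                targets[t][index[phoneme]] = 1.0
    return targets


def target_path(input_file, target_dir, target_ext):
    """Derive the target filename from the basename of the input file."""
    base = os.path.splitext(os.path.basename(input_file))[0]
    return os.path.join(target_dir, base + target_ext)


# Hypothetical utterance: 5 frames of 10 ms, two labelled phonemes.
phoneme_list = ["a", "s", "t"]
label_rows = [(0.00, 0.02, "s"), (0.02, 0.05, "a")]
targets = labels_to_targets(label_rows, phoneme_list, 5, 0.01)
for row in targets:
    print(row)
print(target_path("data/utt01.fbank", "targets", ".targ"))
```

The number of frames is passed in explicitly here, which mirrors why Lab2Targ asks for the datafile of another stream: the label file alone does not say how many frames the utterance has.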

Lab2Targ can read several different label file formats and write different target file formats. File formats are described in more detail in the Reference section File formats.

Creating input feature files

Input features are computed with the Barkfib command: Barkfib can extract mel, bark or linearly spaced filterbank features and do some standard preprocessing, such as pre-emphasis. On demand, it also computes cepstrum features from the filterbank, as well as the energy coefficient. Barkfib can read several different audio file formats; see the File formats section for details. The created datafiles get the same basenames as the audio files, but the file extension and output directory can be specified by command switches. See the manual page for Barkfib for more details.
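As background for the terms used above, the following Python sketch shows two of the standard building blocks: a first-order pre-emphasis filter and filter centre frequencies spaced evenly on the mel scale. It is not Barkfib's code. The mel formula used is the common 2595 * log10(1 + f/700) variant, and the pre-emphasis coefficient 0.97, the number of filters and the frequency range are assumed values chosen only for the example; the toolkit may use different ones.

```python
import math


def hz_to_mel(f):
    """Common mel-scale formula (an assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_centres(num_filters, low_hz, high_hz):
    """Centre frequencies (Hz) of filters spaced evenly on the mel scale."""
    low, high = hz_to_mel(low_hz), hz_to_mel(high_hz)
    step = (high - low) / (num_filters + 1)
    return [mel_to_hz(low + step * (i + 1)) for i in range(num_filters)]


def pre_emphasis(samples, coeff=0.97):
    """y[n] = x[n] - coeff * x[n-1]; the first sample is passed through."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - coeff * samples[n - 1])
    return out


# Hypothetical settings: 8 filters between 0 Hz and 4000 Hz.
centres = mel_centres(8, 0.0, 4000.0)
print([round(c) for c in centres])
print(pre_emphasis([1.0, 1.0, 1.0]))
```

The centres come out densely spaced at low frequencies and sparsely at high frequencies, which is the point of a mel or bark scaled filterbank compared with a linearly spaced one.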

The command MakeCep can be used to create a cepstrum representation from the filterbank input. This is done by adding a layer of hidden linear units to the network and connecting them with the appropriate weights. Similarly, MakeDiff creates a group of units computing the first order derivative of another group by adding a new hidden layer and connecting it to the old layer with time-delay and look-ahead connections. Thus, by combining MakeCep and MakeDiff, the popular combination of cepstrum, velocity and acceleration features can be created.
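The two ideas can be illustrated outside the network with a short Python sketch. A cepstrum is a fixed linear transform of the filterbank outputs, so it can be written as a weight matrix applied by a linear layer, and a first order derivative can be formed from the previous and the next frame, which is what time-delay and look-ahead connections provide. The weights that MakeCep and MakeDiff actually set are not given here, so the DCT-II cosine weights, the symmetric difference (x[t+1] - x[t-1]) / 2 and the edge handling below are assumptions, as are all function names.

```python
import math


def dct_weights(num_cep, num_filters):
    """Weight matrix of an (unnormalised) DCT-II, one row per cepstrum unit."""
    return [
        [math.cos(math.pi * k * (j + 0.5) / num_filters)
         for j in range(num_filters)]
        for k in range(num_cep)
    ]


def linear_layer(weights, frame):
    """Output of a layer of linear units: weighted sums of the inputs."""
    return [sum(w * x for w, x in zip(row, frame)) for row in weights]


def delta(frames):
    """First order derivative over time: (x[t+1] - x[t-1]) / 2.

    The first and last frames are repeated at the edges (an assumption).
    """
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        prev = frames[max(t - 1, 0)]    # time-delay connection
        nxt = frames[min(t + 1, last)]  # look-ahead connection
        out.append([(n - p) / 2.0 for n, p in zip(nxt, prev)])
    return out


# Hypothetical log filterbank frames with 4 channels each.
fbank = [[1.0, 1.0, 1.0, 1.0], [1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0]]
weights = dct_weights(3, 4)
cepstrum = [linear_layer(weights, frame) for frame in fbank]
velocity = delta(cepstrum)       # first derivative of the cepstrum
acceleration = delta(velocity)   # derivative of the derivative
for c, v, a in zip(cepstrum, velocity, acceleration):
    print(c + v + a)             # combined feature vector for one frame
```

Applying delta twice gives the acceleration, which corresponds to using MakeDiff on a group that was itself created by MakeDiff.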

In the Speech Example we put all the pieces discussed above together in a phonetic classification experiment.