NAME: Lab2Targ
SYNOPSIS:
Lab2Targ [options] LabelListFile InputSize InputFile
DESCRIPTION
The command Lab2Targ converts phonetic label files to activation target
values in a file format that can be read by streams of a network.
The names of the phonemes and their order are specified by the file
'LabelListFile', a text file with one phoneme name per row. When Lab2Targ
reads a label file, it compares the labels of the label file with the
phoneme names. If the label is found, the respective target value is set
to 1.0 for all frames covered by that phoneme. All other targets get the
value 0.0. Use the same file 'LabelListFile' when creating the output units
of the network to be sure that the phonemes are in the right order (see
the -S option of the AddUnit command).
The number of target values per frame is given by the number of rows in
'LabelListFile', but to create a target file, the command also needs to
know the number of frames of the utterance. Therefore, you need
to give the data file of an arbitrary other stream of the network
(for example an input stream) and the size of this stream. These are the
arguments 'InputSize' and 'InputFile'. Typically, you use the Melfib command
to create the input feature files first and then run this command to create
the output targets. In that case, 'InputSize' is the number of filters and
'InputFile' is the filterbank feature file. Following the convention
of the toolkit, all data files for the same utterance have the same basename,
so the filename of the created target file is constructed from 'InputFile'.
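The target generation described above can be illustrated with a minimal Python sketch. It is not the Lab2Targ implementation: the function name, the in-memory representation, and the assumption that segment boundaries have already been converted to frame indices are all hypothetical.

```python
def one_hot_targets(phonemes, segments, num_frames, high=1.0, low=0.0):
    """Build a num_frames x len(phonemes) target matrix (illustrative only).

    phonemes: names in 'LabelListFile' order, one per row of that file.
    segments: (start_frame, end_frame, label) tuples, end exclusive
              (assumed to be already converted from samples to frames).
    """
    index = {name: i for i, name in enumerate(phonemes)}
    targets = [[low] * len(phonemes) for _ in range(num_frames)]
    for start, end, label in segments:
        if label not in index:
            continue  # label not in the list: no target is switched on
        for frame in range(max(start, 0), min(end, num_frames)):
            targets[frame][index[label]] = high
    return targets


# Hypothetical three-phoneme list and a five-frame utterance.
phonemes = ["sil", "a", "t"]
segments = [(0, 2, "sil"), (2, 4, "a"), (4, 5, "t")]
targets = one_hot_targets(phonemes, segments, num_frames=5)
for row in targets:
    print(row)
```

The column order follows the phoneme list, which is why the same 'LabelListFile' should be used when the output units of the network are created.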
OPTIONS
-S
|
This option changes the interpretation of the argument 'InputFile'.
With the -S option specified, 'InputFile' is a script file, and the input
file names are read from its rows. Otherwise, 'InputFile' is itself a
filename.
|
-x ext
|
Sets the file extension for output target files. The default extension is:
"targ".
|
-d dir
|
Sets the directory for output target files. The default is the current
directory.
|
-q ext
|
Specifies the file extension for label files. The default extension is:
"lab".
|
-p dir
|
Specifies the directory for label files. The default is the current
directory.
|
-Q ext
|
Specifies the file extension for input files (no default extension).
|
-P dir
|
Specifies the directory for input files. The default is the current
directory.
|
Options for fileformats
-F format
|
Sets the output format for target files. 'format' is one of: binary, ascii,
htk, etc. See the reference section File formats
for details and a complete list of supported parameter file formats.
|
-L format
|
Specifies the label file format. 'format' is one of: mix, htk, etc.
See the reference section File formats
for details and a complete list of supported label file formats.
|
-I format
|
Specifies the file format of input files. 'format' is one of: binary, ascii,
htk, etc. See the reference section File formats
for details and a complete list of supported parameter file formats.
|
Options specifying target generation
-e win
|
Suspend normal target generation and instead generate a parameter file
with only one target per frame that is 1.0 at segment borders, 0.0 in the
interior of segments, and 0.5 in windows centered at borders. The width of
the windows is specified by 'win' in frames.
|
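The border targets produced by the -e option can be sketched in Python as follows. This is an illustration, not the tool's code: the function name is hypothetical, borders are assumed to be given as frame indices, and the window is assumed to extend 'win' // 2 frames to each side of a border (how Lab2Targ treats even widths is not stated here).

```python
def boundary_targets(borders, num_frames, win):
    """One target per frame: 1.0 at borders, 0.5 near them, 0.0 elsewhere."""
    half = win // 2  # assumption: window is centered with integer half-width
    targets = [0.0] * num_frames
    # First mark the windows around every border with 0.5 ...
    for b in borders:
        for frame in range(max(b - half, 0), min(b + half + 1, num_frames)):
            targets[frame] = 0.5
    # ... then set the border frames themselves to 1.0.
    for b in borders:
        if 0 <= b < num_frames:
            targets[b] = 1.0
    return targets


# A single border at frame 3 of a seven-frame utterance, window width 3.
print(boundary_targets([3], num_frames=7, win=3))
# prints [0.0, 0.0, 0.5, 1.0, 0.5, 0.0, 0.0]
```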
-f matrix
|
Suspend normal target generation and instead generate a feature
representation of the phonemes. The translation from phoneme to features
is given by the file 'matrix' with one row per phoneme; the target
values are the values of the respective row, i.e., the number of features
is equal to the number of columns of the matrix.
|
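The lookup performed with the -f option can be pictured with the Python sketch below. It is only an illustration: the function names and the feature values are hypothetical, and the matrix file is assumed to hold whitespace-separated numbers, which this reference does not specify.

```python
def parse_matrix(text):
    """Parse one row of numbers per phoneme (whitespace-separated: assumed)."""
    return [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]


def feature_targets(phonemes, matrix, frame_labels):
    """Replace each frame's phoneme label by that phoneme's matrix row."""
    rows = dict(zip(phonemes, matrix))  # row i belongs to phoneme i
    return [list(rows[label]) for label in frame_labels]


# Hypothetical two-phoneme list with three features per phoneme.
phonemes = ["a", "t"]
matrix = parse_matrix("1 0 1\n0 1 0\n")
features = feature_targets(phonemes, matrix, ["a", "a", "t"])
for row in features:
    print(row)
```

Each output frame has as many values as the matrix has columns, independent of the number of phonemes.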
-i label
|
Implicit label. If, for a frame, no target is > 0.5, then the phoneme
with name 'label' is implicitly "on" and gets the target 1.0 (or the value
given by the -1 option).
|
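The effect of the -i option can be sketched in Python as shown below. The function name and the list-of-lists representation are assumptions made for illustration; only the rule itself (no target above 0.5 switches the implicit phoneme on) comes from the description above.

```python
def apply_implicit_label(targets, phonemes, label, high=1.0):
    """Switch on 'label' in every frame where no target exceeds 0.5."""
    column = phonemes.index(label)
    result = [list(row) for row in targets]  # leave the input untouched
    for row in result:
        if all(value <= 0.5 for value in row):
            row[column] = high
    return result


# Hypothetical example: frame 0 has no active phoneme, frame 1 has "a".
phonemes = ["sil", "a", "t"]
targets = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
filled = apply_implicit_label(targets, phonemes, "sil")
print(filled)
# prints [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
```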
-D num
|
Delay all targets by 'num' frames.
|
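A delay of the kind set by the -D option can be pictured with the Python sketch below: the target of frame t is moved to frame t + num. How Lab2Targ fills the first 'num' frames is not stated in this reference; repeating the first target frame is purely an assumption of the sketch, as is the function name.

```python
def delay_targets(targets, num):
    """Shift all target frames 'num' frames later, keeping the length."""
    if num <= 0 or not targets:
        return [list(row) for row in targets]
    # Assumption: the vacated leading frames repeat the first target frame.
    padding = [list(targets[0]) for _ in range(min(num, len(targets)))]
    kept = targets[:max(len(targets) - num, 0)]
    return padding + [list(row) for row in kept]


# Four frames with one target each, delayed by two frames.
print(delay_targets([[1.0], [2.0], [3.0], [4.0]], 2))
# prints [[1.0], [1.0], [1.0], [2.0]]
```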
-1 high
|
Value for phonemes that are "on". The default is 1.0.
|
-0 low
|
Value for phonemes that are "off". The default is 0.0.
|
-l scale
|
Label time scale (samples/frame). The default is 160, corresponding to a
sample rate of 16 kHz and a frame step of 10 ms.
|
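The default scale of the -l option follows from simple arithmetic: 16000 samples per second times 0.010 seconds per frame gives 160 samples per frame. The Python sketch below shows the calculation and a conversion from a label time in samples to a frame index. The function names are hypothetical, and rounding down to the nearest frame is an assumption, not documented behavior.

```python
def samples_per_frame(sample_rate_hz, frame_step_ms):
    """Label time scale for a given sample rate and frame step."""
    return sample_rate_hz * frame_step_ms // 1000


def sample_to_frame(sample, scale=160):
    """Frame index containing a sample position (rounding down: assumed)."""
    return sample // scale


print(samples_per_frame(16000, 10))  # prints 160
print(samples_per_frame(8000, 10))   # prints 80
print(sample_to_frame(4800))         # prints 30
```

For data sampled at 8 kHz with the same 10 ms frame step, the value to pass to -l would therefore be 80.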
-o offset
|
Label time offset (samples). Can be used to shift phoneme boundaries if
the feature representation is not correctly aligned (if frames don't start at
the correct sample). This should not be necessary if Melfib is used.
|