This is the syntax of NormStream:
USAGE: NormStream [options] Net [Input]

   Option      Default
   -S          Treat 'Input' as an inputfile script              (off)
   -s stream   Process the specified stream                      (all streams)
   -d mult     Center around mean mapping:
               [mean-s*mult, mean+s*mult] -> [-1,1]              (min,max -> -1,1)
   -0          Reset linear coeff's                              (no Input)

"Input" is a training-data file. By examining the external data of "Input", the program computes the normalization parameters. Often the training data is stored in many different files (in speech applications, typically one file per utterance). In that case the -S option should be used. When the -S option is used, "Input" is not treated as a datafile, but as a script file holding a list of datafiles. In either case, it is the base name of the datafiles that matters -- the directory and the extension are taken from the information in each stream of the network (see the Streams section).
This method of specifying multiple datafiles with the -S option is shared by all training and evaluation commands in the toolkit. It is practical to make separate script files for the training files, the evaluation files, the test files, and so on.
The default normalization method normalizes each component so that the minimum value is mapped to -1 and the maximum value is mapped to +1. However, this can sometimes be a bad idea, since it considers only the most extreme, and often rare, values. The -d option provides an alternative to this min/max strategy: it maps the mean of the external data to 0.0, and a user-specified factor times one standard deviation is mapped to +/-1.0.
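To make the two strategies concrete, here is a minimal Python sketch (illustrative only, not toolkit code) of the linear coefficients each method computes for one component:

```python
# Sketch of the two per-component normalization strategies described
# above; NormStream computes equivalents of these internally.
import statistics

def minmax_coeffs(values):
    """Default strategy: map [min, max] -> [-1, 1] via y = a*x + b."""
    lo, hi = min(values), max(values)
    a = 2.0 / (hi - lo)
    b = -1.0 - a * lo
    return a, b

def mean_std_coeffs(values, mult):
    """-d mult strategy: map [mean - mult*std, mean + mult*std] -> [-1, 1]."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    a = 1.0 / (mult * s)
    return a, -m * a
```

Note that with mean_std_coeffs a single extreme outlier shifts the mapping far less than with minmax_coeffs, which is the point of the -d option.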
USAGE: BackProp [options] Net Input

   Option          Default
   -S              Treat 'Input' as an inputfile script          (off)
   -s obj1 obj2    Select connections from obj1 to obj2          (all)
   -g gain         Linear gain factor                            (1e-3)
   -m float        Momentum parameter                            (0.9)
   -w factor       Weight decay                                  (off)
   -f num          Update frequency - update every 'num' frames  (Max+1)
   -F min max      Random update after between min and max
                   frames                                        (off)
   -E              Epoch updating                                (off)
   -p name         Name of runtime error progress report file    (none)
   -T level        Level of detail in progress report file       (0)
   -P name m n     Update progress every m:th and net every
                   n:th iteration
   -i iter         Number of iterations                          (100)
   -V set stream   Specify a Validation set and stream           (off)
   -B decay num    Multiply gain with 'decay' after epochs
                   where the validation set's global error is
                   not improved, but maximum 'num' times.        (off)
   -e float        Error Criterium                               (off)
   -d              Store external data in RAM                    (off)
   -M address      Send mail to 'address' when finished          (off)

Again, "Input" is either a data file or, with the -S option, a script file with a list of datafiles.
The back-propagation training often takes a lot of time. Typically BackProp is run in the background, sometimes for several days. In such cases (and in other cases too) I recommend that you use the -p option (or the -P option). The -p option will cause the progress of a training session to be printed after each epoch (one epoch = one iteration through all training data). Then, if something is wrong and the network isn't learning, you can stop the session and restart with new parameters.
The -f and -F options are used when working with dynamic networks. Theoretically, the back-propagation through time algorithm requires that, for each data file, you first do the forward pass to the end of the file and then do the backward pass from the end back to the beginning. But this would allow the weights to be updated only once per file. However, if the temporal dependencies that we expect the network to learn have a smaller time-span than whole files, we can allow ourselves to restart the back-propagation after a smaller number of samples. For speech recognition, for example, -F 20 30 seems sufficient when a sample (frame) is 10 ms.
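The effect of -F min max can be sketched as choosing random restart points through the file, with segment lengths drawn between min and max frames (a Python illustration under that reading, not BackProp's actual code):

```python
# Sketch of how -F min max could segment one data file for truncated
# back-propagation through time (illustrative only).
import random

def truncation_points(num_frames, fmin, fmax, seed=0):
    """Return frame indices at which back-propagation is restarted,
    with segment lengths drawn uniformly from [fmin, fmax]."""
    rng = random.Random(seed)
    points, pos = [], 0
    while pos < num_frames:
        pos += rng.randint(fmin, fmax)   # inclusive bounds
        points.append(min(pos, num_frames))
    return points
```

With -F 20 30 and 10 ms frames, each weight update then covers roughly 200-300 ms of speech instead of happening once per utterance.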
The optimal gain and momentum parameters are highly task-dependent. You should not trust the default values for the -g and -m options. It can be worthwhile to experiment a little with these before starting a big training session.
The -B option defines a scheme for reducing the gain during the backprop run. A reduction occurs if the error on the validation set does not improve from one epoch to the next. The first argument after -B specifies the factor to multiply the gain with in each reduction, and the second argument specifies the maximum allowed number of reductions. For example, "-B 0.5 5" means that the gain is halved in each reduction and that at most five reductions will take place. If the error still does not improve after five reductions, the training is terminated.
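The scheme can be sketched as follows (a Python illustration of the rule as described above, not BackProp's actual code), replaying a sequence of per-epoch validation errors:

```python
# Sketch of the -B decay num gain schedule (illustrative only).
def schedule_gain(val_errors, gain, decay, max_reductions):
    """Return the gain used at each epoch, plus the epoch at which
    training would terminate (None if it runs to the end)."""
    best = float("inf")
    reductions = 0
    gains = []
    for epoch, err in enumerate(val_errors):
        gains.append(gain)
        if err < best:
            best = err               # improvement: keep the gain
        elif reductions == max_reductions:
            return gains, epoch      # no improvement left: terminate
        else:
            gain *= decay            # no improvement: reduce the gain
            reductions += 1
    return gains, None
```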
The -s option provides a way to train only a selected set of connection weights. This can also be done by manipulating the connections' plasticity (see the Connections section and the command SetPlast).
For classification problems the -V option can be useful. By a classification problem I mean that the stream should have one component for each class, and the "choice" of the network is the component with the highest value (see also the Evaluation section). The validation set should be different from the training set. This way you can monitor in the progress file (-p option) how the classification performance changes during the training. You can then stop the training when the classification starts to degrade due to over-learning of the training data.
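The classification criterion just described amounts to an argmax over the stream's components; a minimal Python sketch (illustrative, not toolkit code):

```python
# Sketch of the "choice" rule monitored with -V: the chosen class is
# the output component with the highest value, and accuracy is the
# fraction of frames where that choice matches the target class.
def classification_accuracy(outputs, targets):
    """outputs: list of per-frame output vectors;
    targets: list of correct class indices."""
    correct = sum(
        1 for vec, t in zip(outputs, targets)
        if max(range(len(vec)), key=vec.__getitem__) == t
    )
    return correct / len(targets)
```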