STREAMS

The network's communication with external data is performed by streams. The following is the syntax of the command that creates a new stream in a network:
USAGE: AddStream [options] Size Mode Name Net
         Mode ::=  r | w | t | i | 0
           (read, write, target, interactive or no-action)
       Option                                                        Default
       -x         Specify file extension                             (data)
       -d         Specify default directory                          (.)
       -f filter  Data will be piped through 'filter'                (off)
       -F format  Specify file format (ascii, binary, htk etc.)      (binary)
       -S file    Load component names from rows of 'file'           (unnamed)
Streams are responsible for reading and writing data files and for converting the data to vectors of real numbers in a range suitable for unit activations in the network. Each stream must therefore know whether it should read or write data, which file format to read or write, and where to look for files.

The mode, data directory, format, extension and component names of an already created stream can be altered with the command EditStream. This is particularly practical when you wish to run a network on new data files, possibly in a new directory and with a different format.

The file format of the external data can be specified using the -F option. The complete set of supported file formats is described in the Reference section File formats. The default format is "binary", a very simple format with no header: vectors of floats are written in binary form, exactly as they are represented on the particular operating system used.
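As an illustration of what a headerless file of float vectors looks like, here is a minimal Python sketch that writes and reads such a file. It is not part of the toolkit. The function names are hypothetical, and the sketch assumes 4-byte floats in the machine's native byte order, which the text above does not state explicitly.

```python
from array import array


def write_binary_vectors(path, vectors, typecode="f"):
    """Write vectors back to back with no header.

    Assumption: typecode "f" (native C float, usually 4 bytes).
    """
    with open(path, "wb") as handle:
        for vector in vectors:
            array(typecode, vector).tofile(handle)


def read_binary_vectors(path, size, typecode="f"):
    """Read a headerless file and split it into vectors of `size` components."""
    values = array(typecode)
    with open(path, "rb") as handle:
        values.frombytes(handle.read())
    if len(values) % size != 0:
        raise ValueError("file length is not a whole number of vectors")
    return [list(values[i:i + size]) for i in range(0, len(values), size)]
```

Because the file has no header, the reader must be told the vector size, just as the stream is told its Size when it is created.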

File name convention

By convention, all data files must have the same base file name at any given time. For example, a network may read its input from a file called "data4711.input" in one stream and its targets for backprop training from another file called "data4711.targets". It is NOT possible, however, to use "input.data" and "output.data", because their base names differ. This convention makes it possible to handle multiple streams in a network by specifying, for each stream, its directory and file name extension. This is done with the -d and -x options. Many different training, testing and evaluation runs can then be simulated by using different base names for the files in different simulations.
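The convention can be pictured with a short Python sketch: one base name is combined with each stream's own directory and extension. This is only an illustration under assumptions. The function names and the stream names INPUT and TARGETS are hypothetical, and the file name is assumed to be formed as directory/base.extension, as the "data4711.input" example suggests.

```python
import os


def stream_file(directory, base_name, extension):
    """Build the file name for one stream (assumed layout: dir/base.ext)."""
    return os.path.join(directory, base_name + "." + extension)


def files_for_run(base_name, streams):
    """Map each stream name to its file for a given base name.

    `streams` maps a stream name to its (directory, extension) pair,
    i.e. the values that would be given with -d and -x.
    """
    return {
        name: stream_file(directory, base_name, extension)
        for name, (directory, extension) in streams.items()
    }


# Hypothetical stream setup: both streams use the current directory.
streams = {"INPUT": (".", "input"), "TARGETS": (".", "targets")}
run_files = files_for_run("data4711", streams)
```

Changing only the base name, for instance to a hypothetical "data4712", selects a new set of files for every stream at once.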

File mode

The mode of the stream determines how the data should be handled. For training a network, at least one stream with mode "target" is necessary. In most cases the network also needs input streams with mode "read". The other modes, "write", "interactive" and "no-action", are less common. In fact, "write" is currently not implemented and "interactive" is only an internal mode in some experimental commands. "no-action" can be used to turn off a stream without removing it from the network.

Example

Here is an example where we add one input stream to a network:

Now let's assume that the network somehow detects five colors from its input (never mind why or how). Let's say that we want to detect green, blue, red, magenta, and yellow. Create a file called, for example, "colors.list" with one row for each name, like this:
green
blue
red
magenta
yellow
Now we can add a target stream by:

The -S option reads the file of component names for the stream. Now run this command:

Information about the two streams is printed. If you type:

Only information about the output stream is printed. It should appear like this:
Stream     :  COLORS
  size     :     5
  path     :  .
  extension:  data
  format   :  binary
  mode     :  target (t)
Names            A             B
green           1.000000      0.500000
blue            1.000000      0.500000
red             1.000000      0.500000
magenta         1.000000      0.500000
yellow          1.000000      0.500000
For each component of the stream, the name is printed and then there are two columns with the parameters of the linear transformation between the internal activation values and the values of the external data.

Data transformation

Internal values are computed by the formula x / b - a, where x is the external value and a and b are the parameters of the linear transformation. These are the parameters A and B that were printed in the example above. The output stream of the example has the default transformation from the range [-1; 1] (which is the range of tanhyp units) to the range [0; 1], which is the desired range if we want to estimate probabilities. In this case, the values in the target files should be in the range [0; 1]. Now look at the input stream:
Stream     :  INPUT
  size     :     7
  path     :  .
  extension:  data
  format   :  binary
  mode     :  read (r)
     A             B
     0.000000      1.000000
     0.000000      1.000000
     0.000000      1.000000
     0.000000      1.000000
     0.000000      1.000000
     0.000000      1.000000
     0.000000      1.000000
We see that the components are unnamed and that the linear transformation is the identity mapping. This is the default for input streams. Often, however, the data in the external files is not well suited for direct use as unit activations. It is therefore common practice to linearly transform the input to values roughly in the range [-1; 1]. This is done with the command NormStream, which is described in more detail in the Training section.
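The formula x / b - a can be checked against the two listings with a few lines of Python. This is a sketch for illustration only. The function names are hypothetical, and the inverse mapping is not given in the text but derived algebraically from the formula.

```python
def to_internal(x, a, b):
    """External value -> internal activation, using the formula x / b - a."""
    return x / b - a


def to_external(y, a, b):
    """Internal activation -> external value (algebraic inverse, derived here)."""
    return (y + a) * b


# Target stream defaults from the COLORS listing: A = 1.0, B = 0.5.
# External [0; 1] maps onto internal [-1; 1].
low = to_internal(0.0, 1.0, 0.5)
high = to_internal(1.0, 1.0, 0.5)

# Input stream defaults from the INPUT listing: A = 0.0, B = 1.0 (identity).
unchanged = to_internal(0.37, 0.0, 1.0)
```

With A = 1.0 and B = 0.5 the external values 0 and 1 become the internal values -1 and 1, and with A = 0.0 and B = 1.0 every value passes through unchanged.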

Linking streams to groups and units

The transformed values of the input stream components can be copied to input units of the network, and the transformed values of the target stream can be imposed on output units. This is called linking units to a stream. The easiest way to do this is to link the units of a group to a stream with the command LinkGroup. The number of units in the group must equal the number of components in the stream. Here is an example, extending our previous "color network":

It is a good convention to give a group linked to a stream the same name as the stream, but to write the stream name in capital letters, as in this example.

LinkUnit is a more low-level command to link one unit to a specified component of a stream.