STREAMS
The network's communication with external
data is performed by streams. The following is the syntax of the
command that creates a new stream in a network:
USAGE: AddStream [options] Size Mode Name Net
       Mode ::= r | w | t | i | 0
       (read, write, target, interactive or no-action)

  Option      Description                                      Default
  -x          Specify file extension                           (data)
  -d          Specify default directory                        (.)
  -f filter   Data will be piped through 'filter'              (off)
  -F format   Specify file format (ascii, binary, htk etc.)    (binary)
  -S file     Load component names from rows of 'file'         (unnamed)
Streams are responsible for reading and writing
data from files and converting the data to vectors of real numbers in a
range suitable for unit activations in the network. Each stream must
therefore know whether it should read or write data, which file format to
use and where to look for files.
The mode, data directory, format, extension and component names of an
existing stream can be changed with the command EditStream. This is
particularly practical when you wish to run a network on new data files,
possibly in a new directory and with a different format.
The file format of the external data can
be specified using the -F option. The complete set of supported file formats
is described in the Reference section "File formats". The default format
is "binary", a very simple format: there is no header, and float vectors are
written in binary exactly as they are represented on the particular
system used.
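To make the "binary" format concrete, the following Python sketch writes and reads headerless float vectors in the machine's native representation. It is an illustration only: it assumes 32-bit floats stored back to back, and the helper names write_vectors and read_vectors are hypothetical, not part of the toolkit.

```python
import struct


def write_vectors(path, vectors):
    """Write float vectors back to back with no header (hypothetical helper)."""
    with open(path, "wb") as f:
        for vec in vectors:
            # "@" selects native byte order; "f" is assumed to be a 32-bit float.
            f.write(struct.pack("@%df" % len(vec), *vec))


def read_vectors(path, size):
    """Read a headerless file back as vectors of `size` components each."""
    fmt = "@%df" % size
    step = struct.calcsize(fmt)
    with open(path, "rb") as f:
        raw = f.read()
    if len(raw) % step != 0:
        raise ValueError("file length is not a multiple of the vector size")
    return [list(struct.unpack_from(fmt, raw, i))
            for i in range(0, len(raw), step)]
```

Because there is no header, the reader has to be told the vector size, just as a stream is given its Size when it is created.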
File name convention
By convention, all data files used together by the streams of a network
must have the same base filename. For example, a network may read its input
from a file called "data4711.input" in one stream and its targets for backprop
training from another file called "data4711.targets". It is NOT possible
to use "input.data" and "output.data", because their base names differ.
This convention makes it possible to handle multiple streams in a network
by specifying, for each stream, the directory and filename extension for
that particular stream. This is done with the -d and -x options. Many
different training, testing and evaluation runs can then be simulated by
using different base names for the files in different simulations.
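The convention can be pictured with a small Python sketch. It assumes that a stream's file name is formed as directory/base.extension, which matches the "data4711.input" example but is not confirmed in detail by this section; stream_path and the directory name "targets_dir" are hypothetical.

```python
import os


def stream_path(base_name, directory=".", extension="data"):
    """Compose the file one stream would use for a given base name.

    The defaults mirror the defaults of the -d and -x options.
    """
    return os.path.join(directory, base_name + "." + extension)


# Two streams sharing the base name "data4711" but differing in
# directory and extension (both values are illustrative only).
streams = {
    "INPUT": {"directory": ".", "extension": "input"},
    "COLORS": {"directory": "targets_dir", "extension": "targets"},
}

paths = {name: stream_path("data4711", **opts) for name, opts in streams.items()}
```

Switching to another run then only means changing the base name; the per-stream directory and extension stay fixed.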
File mode
The mode of the stream determines how
the data should be handled. For training a network, at least one stream
with mode "target" is necessary. In most cases the network also needs input
streams with mode "read". The other modes, "write", "interactive" and
"no-action", are less common. In fact, "write" is currently not implemented
and "interactive" is only an internal mode in some experimental commands.
"no-action" can be used to turn off a stream without removing it from the
network.
Example
Here is an example where we add one input
stream to a network:
CreateNet my_net my_net.rtdnn
AddStream 7 r INPUT my_net.rtdnn
Now let's assume that the network somehow
detects five colors from its input (why or how does not matter here). Let's
say that we want to detect green, blue, red, magenta and yellow. Create
a file called, for example, "colors.list" with one row for each name, like
this:
green
blue
red
magenta
yellow
Now we can add a target stream by:
AddStream -S colors.list 5 t COLORS my_net.rtdnn
The -S option reads
the file of component names for the stream. Now run this command:
Display -X2 -s my_net.rtdnn
Information about the two streams is
printed. If you type:
Display -o COLORS -X2 my_net.rtdnn
only information about the output stream is
printed. It should look like this:
Stream : COLORS
size : 5
path : .
extension: data
format : binary
mode : target (t)
Names A B
green 1.000000 0.500000
blue 1.000000 0.500000
red 1.000000 0.500000
magenta 1.000000 0.500000
yellow 1.000000 0.500000
For each component of the stream, the name
is printed and then there are two columns with the parameters of the linear
transformation between the internal activation values and the values of
the external data.
Data transformation
Internal values are computed by the formula x / b - a, where x is the
external value and a and b are the parameters of the linear transformation.
These are the parameters that were printed in columns A and B in the
example above. The output stream of the example
has the default transformation from the range [-1; 1] (which is the range
of tanhyp units) to the range [0; 1] which is the desired range if we want
to estimate probabilities. In this case, the values in the target files
should be in the range [0; 1]. Now look at the input stream:
Display -X2 -oINPUT my_net.rtdnn
Stream : INPUT
size : 7
path : .
extension: data
format : binary
mode : read (r)
A B
0.000000 1.000000
0.000000 1.000000
0.000000 1.000000
0.000000 1.000000
0.000000 1.000000
0.000000 1.000000
0.000000 1.000000
We see that the components are unnamed and
the linear transformation is the identity mapping, which is the default
for input streams. Often, however, the data in the external files is not
well suited for direct use as unit activations. It is therefore
common practice to linearly transform the input to values roughly in the
range [-1; 1]. This is done with the command NormStream, which is described
further in the Training section.
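As a check on the formula, here is a minimal Python sketch of the mapping with the parameter values shown in the two listings. The function names are hypothetical, and to_external is simply the algebraic inverse of x / b - a rather than something taken from the toolkit.

```python
def to_internal(x, a, b):
    """Map an external value x to an internal activation: x / b - a."""
    return x / b - a


def to_external(y, a, b):
    """Algebraic inverse of to_internal: (y + a) * b."""
    return (y + a) * b


# Target stream defaults from the COLORS listing: A = 1.0, B = 0.5.
# External values in [0; 1] land on internal values in [-1; 1].
low = to_internal(0.0, 1.0, 0.5)
high = to_internal(1.0, 1.0, 0.5)

# Input stream defaults from the INPUT listing: A = 0.0, B = 1.0,
# which leaves every value unchanged.
same = to_internal(3.25, 0.0, 1.0)
```

With A = 1.0 and B = 0.5 the endpoints 0 and 1 map to -1 and 1, which agrees with the ranges stated for the target stream.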
Linking streams to groups and units
The transformed values of the input stream
components can be copied to input units of the network, and the transformed
values of the target stream can be imposed on output units. This is called
linking units to a stream. The easiest way to do this is to link the units
of a group to a stream with the command LinkGroup. The number of
units in the group must equal the number of components in the stream. Here
is an example, extending our previous color network:
AddGroup input my_net.rtdnn
AddUnit -i -u7 input my_net.rtdnn
LinkGroup input INPUT my_net.rtdnn
AddGroup colors my_net.rtdnn
AddUnit -o -Scolors.list colors my_net.rtdnn
LinkGroup colors COLORS my_net.rtdnn
It is a good convention to give a group the same name as the stream it is
linked to, with the stream name written in capital letters, as
in this example.
LinkUnit is a lower-level command
that links one unit to a specified component of a stream.