2006/08/23

Seadragon Technical Description

Here is a summary of the process flow from cetacean sound to human interface, the c-to-h flow (c2h), and some references to the h-to-c flow (h2c).

c = cetacean (e.g., dolphin)
h = human

The Nodes:

There are three major nodes in the backbone: c, c2h, and h. Each backbone node can be deployed on its own host, or two or all three of them can share a single machine (i.e., a PC, as in the configuration packaged in Version 2.1). The nodes exchange data as UTF-8 text containing XML tags.
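
As an illustration of the kind of tagged text that could travel between nodes, here is a minimal sketch. The tag name `freq` and the class `TaggedMessage` are assumptions made for this example; the actual Seadragon tags are not listed in this post.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical message format: the tag name "freq" is an assumption,
// not the actual Seadragon schema.
final class TaggedMessage {
    private static final Pattern FREQ = Pattern.compile("<freq>([^<]+)</freq>");

    // Encode one frequency value (Hz) as the UTF-8 bytes of an XML-tagged text.
    static byte[] encode(double hz) {
        String text = "<freq>" + hz + "</freq>";
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode the bytes back to a frequency; throws if the tag is missing.
    static double decode(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_8);
        Matcher m = FREQ.matcher(text);
        if (!m.find()) {
            throw new IllegalArgumentException("no <freq> tag in: " + text);
        }
        return Double.parseDouble(m.group(1));
    }
}
```

Because the payload is plain text, the same bytes can be handed to a socket or passed directly to another object on the same host.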

There are also nodes for components other than the backbone, but these are not detailed here. Some of them support a peer-to-peer (p2p) network for human users, connecting a large number of people to a single backbone, which in turn connects to one or many cetaceans.

When the backbone nodes are deployed on different hosts, the data exchange takes place over TCP/IP sockets. When two nodes are on the same host, the data exchange takes place between Java objects running in different threads, without involving sockets.
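
One way to keep the node logic independent of the transport is to hide it behind a small interface. The sketch below shows only the same-host case, using a thread-safe queue. The names `NodeLink` and `InProcessLink` are hypothetical and are not taken from the Seadragon code.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical interface: every transport carries the same text messages.
interface NodeLink {
    void send(String message);
    String receive();
}

// Same-host transport: a thread-safe queue shared by two node threads.
final class InProcessLink implements NodeLink {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    @Override
    public void send(String message) {
        queue.add(message); // the queue is unbounded, so this never blocks
    }

    @Override
    public String receive() {
        try {
            return queue.take(); // blocks until a message is available
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

A socket-based implementation of the same interface would write each message to a TCP stream instead, and the code in the nodes would not need to change.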

The three-node backbone design, chosen a few years ago, allows us, for example, to build an underwater system composed of two hosts: one handheld device for the human interface (hosting the h node) and another device hosting the c2h node and the c node. Such a system is feasible with off-the-shelf parts today (or shortly, with some testing and debugging). The Seadragon software is configured to run on multiple hosts through properties in files that it reads at startup. For the proposed underwater system, the same software would be used for the h node on its own host and for the c and c2h nodes on another host; the two installations simply use different properties at startup. This feature, among others, is provided by the generic Leafy API.
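
To make the idea of startup properties concrete, here is a minimal sketch using the standard `java.util.Properties` class. The keys (`nodes`, `c2h.host`, `c2h.port`), the default values, and the class `NodeConfig` are assumptions made for this example; the real property names used by Seadragon and Leafy are not given here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

// Hypothetical keys and defaults; the real Seadragon/Leafy keys are not given.
final class NodeConfig {
    final List<String> localNodes; // which backbone nodes run on this host
    final String c2hHost;          // where to reach the c2h node if it is remote
    final int c2hPort;

    NodeConfig(Properties p) {
        String nodes = p.getProperty("nodes", "c,c2h,h").trim();
        localNodes = Arrays.asList(nodes.split("\\s*,\\s*"));
        c2hHost = p.getProperty("c2h.host", "localhost").trim();
        c2hPort = Integer.parseInt(p.getProperty("c2h.port", "4000").trim());
    }

    // Parse properties from text (a real installation would read a file).
    static NodeConfig parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new NodeConfig(p);
    }

    boolean hosts(String node) {
        return localNodes.contains(node);
    }
}
```

Under these assumptions, the handheld device would start with `nodes=h` plus the address of the other device, while the second device would start with `nodes=c,c2h`.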

The Data Flows:

The c2h flow between backbone nodes is: c to c2h to h (these are nodes inside the application).

The h2c flow is: h to c2h to c.

The flow between backbone nodes is the same whether the nodes are hosted on different hosts or on the same host. Seadragon also supports nodes other than backbone nodes; these are for peer-to-peer networks, including end points that are cell phones.

Back to the data flow: The c node is in charge of the cetacean interface: it emits underwater sounds to cetaceans and it acquires underwater sounds. In the h2c flow, it receives data from the c2h node (in text form), converts it to a numeric representation of voltage levels, and then to an actual analog voltage. This voltage goes to a hydrophone (e.g., a piezoelectric crystal) that converts it to vibrations (sound).
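
The conversion from frequency values to voltage numbers could look roughly like the sketch below, which produces a sine wave whose frequency follows the series. This is an illustration under assumptions: the class `WhistleSynth`, the fixed number of samples per frequency value, and the normalized output range are not taken from the Seadragon code.

```java
// Hypothetical synthesis step: each frequency value is held for a fixed
// number of samples. Not the actual Seadragon implementation.
final class WhistleSynth {
    // Returns samples in [-1, 1]; a digital-to-analog stage would scale
    // these numbers to real voltages.
    static double[] synthesize(double[] hz, int samplesPerValue, double sampleRate) {
        double[] out = new double[hz.length * samplesPerValue];
        double phase = 0.0;
        for (int i = 0; i < out.length; i++) {
            double f = hz[i / samplesPerValue];
            out[i] = Math.sin(phase);
            // Advance the phase instead of recomputing it from i, so the
            // waveform stays continuous when the frequency changes.
            phase += 2.0 * Math.PI * f / sampleRate;
        }
        return out;
    }
}
```

Keeping a running phase avoids audible clicks at the boundary between two frequency values.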

The c-to-h Flow:

In the c2h flow, the c node acquires sounds as voltage levels (numbers). Using an FFT, it converts each block of 1024 voltage numbers to a single frequency value (in Hz, or cycles per second) and sends that value to the c2h node for processing, i.e., for the attempt at recognition.
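
The step from 1024 voltage numbers to one frequency value can be sketched as follows: run an FFT on the block and report the frequency of the strongest bin. This is a textbook radix-2 FFT written for illustration. The class `PeakFrequency`, the choice of the strongest bin as "the" frequency, and any sample rate are assumptions, since the post does not say how Seadragon selects the value.

```java
// Illustrative sketch; not the actual Seadragon code.
final class PeakFrequency {
    // In-place iterative radix-2 FFT; the array length must be a power of two.
    static void fft(double[] re, double[] im) {
        int n = re.length;
        // Bit-reversal permutation.
        for (int i = 1, j = 0; i < n; i++) {
            int bit = n >> 1;
            for (; (j & bit) != 0; bit >>= 1) {
                j ^= bit;
            }
            j ^= bit;
            if (i < j) {
                double t = re[i]; re[i] = re[j]; re[j] = t;
                t = im[i]; im[i] = im[j]; im[j] = t;
            }
        }
        // Butterfly passes over blocks of length 2, 4, ..., n.
        for (int len = 2; len <= n; len <<= 1) {
            double ang = -2.0 * Math.PI / len;
            for (int start = 0; start < n; start += len) {
                for (int k = 0; k < len / 2; k++) {
                    double wr = Math.cos(ang * k);
                    double wi = Math.sin(ang * k);
                    int a = start + k;
                    int b = a + len / 2;
                    double xr = re[b] * wr - im[b] * wi;
                    double xi = re[b] * wi + im[b] * wr;
                    re[b] = re[a] - xr;
                    im[b] = im[a] - xi;
                    re[a] += xr;
                    im[a] += xi;
                }
            }
        }
    }

    // Frequency (Hz) of the strongest positive-frequency bin, ignoring DC.
    static double dominantHz(double[] samples, double sampleRate) {
        int n = samples.length;
        double[] re = samples.clone();
        double[] im = new double[n];
        fft(re, im);
        int best = 1;
        double bestPower = -1.0;
        for (int k = 1; k < n / 2; k++) {
            double power = re[k] * re[k] + im[k] * im[k];
            if (power > bestPower) {
                bestPower = power;
                best = k;
            }
        }
        return best * sampleRate / n;
    }
}
```

With 1024 samples, the resolution is the sample rate divided by 1024; at an assumed rate of 40,960 samples per second, for instance, each bin would be 40 Hz wide.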

The c2h flow summary: sound --> hydrophone (part of a c node) --> voltage --> analog-to-digital --> voltage levels --> FFT --> frequency value --> send text (single frequency value) --> c2h node: assembly of frequency values into a series (i.e., a whistle) --> pattern matching --> signal object in lexicon (new or old) --> send text --> h node: writing the text to the human user in the msg window.

The h-to-c Flow:

The other data flow is h-to-c, human to cetacean (h2c). It is similar to the reverse of the c2h flow but does not involve frequency pattern matching, because the human user can only emit a whistle that is already present in the lexicon and has to refer to it by its unique text name. One could say that this flow involves the simpler matching of text names, but no frequency matching.
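
The text name matching amounts to a lookup by key, as in this minimal sketch. The class `WhistleNames` and its methods are hypothetical and stand in for whatever structure the lexicon really uses.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical name-keyed view of the lexicon for the h2c direction.
final class WhistleNames {
    private final Map<String, double[]> byName = new HashMap<>();

    void put(String name, double[] hz) {
        byName.put(name, hz.clone());
    }

    // Returns the stored frequency series, or null when the name is unknown
    // (in which case there is nothing to emit).
    double[] lookup(String name) {
        double[] hz = byName.get(name.trim());
        return hz == null ? null : hz.clone();
    }
}
```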

Whistle Recognition: Pattern Matching

The most complex part of the system is the assembly of frequency values into a series (i.e., a complete whistle) in the c2h flow. This involves, for example, recognizing the start and the end of a whistle and completing the data in between, prior to comparing it with the signals in the lexicon (held in memory). This process must be extremely efficient, and it took me many months to fine-tune, because the quantity of data arriving in real time is large (currently 10 or 40 frequency values per second). I would like to process even more, ideally perhaps 100 frequency values per second.
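
A much simplified version of this assembly step might look like the sketch below. It assumes that a value of 0.0 means "no tone detected in this frame", that a whistle ends after more than `maxGap` consecutive silent frames, and that shorter gaps are completed by linear interpolation. These rules and the class `WhistleAssembler` are assumptions made for illustration; the real process is more involved.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical assembler: 0.0 means "no tone in this frame", and maxGap is
// an assumed tuning parameter. Not the actual Seadragon algorithm.
final class WhistleAssembler {
    private final int maxGap;
    private final List<Double> current = new ArrayList<>();
    private int silentRun = 0;

    WhistleAssembler(int maxGap) {
        this.maxGap = maxGap;
    }

    // Feed one frequency value. Returns a completed whistle when one has
    // just ended, or null otherwise.
    double[] accept(double hz) {
        if (hz > 0.0) {
            if (!current.isEmpty() && silentRun > 0) {
                // Complete the short gap by interpolating between neighbours.
                double last = current.get(current.size() - 1);
                for (int i = 1; i <= silentRun; i++) {
                    current.add(last + (hz - last) * i / (silentRun + 1));
                }
            }
            silentRun = 0;
            current.add(hz);
            return null;
        }
        if (current.isEmpty()) {
            return null; // still waiting for the start of a whistle
        }
        silentRun++;
        if (silentRun <= maxGap) {
            return null; // possibly a dropout inside the whistle
        }
        // Too many silent frames: the whistle has ended.
        double[] whistle = new double[current.size()];
        for (int i = 0; i < whistle.length; i++) {
            whistle[i] = current.get(i);
        }
        current.clear();
        silentRun = 0;
        return whistle;
    }
}
```

Each call does a constant amount of work except when a gap is filled or a whistle is emitted, which matters when values arrive many times per second.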

Pattern matching is performed in the c2h node. Once the start and end of an incoming whistle have been determined (not trivial), the frequencies of the incoming whistle are compared with the frequencies of the signals present in the lexicon. A score is calculated for each comparison. If the score is outside the acceptable limit, then that match is abandoned and the incoming whistle is compared with another whistle in the lexicon. After this process, either a best match or no match is obtained. If there is a match, the name of the whistle from the lexicon is sent to the h node. If there is no match, the incoming whistle is given a unique name (by the system), the whistle is added to the lexicon, and the name is sent to the h node.
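
The matching loop described above can be sketched as follows. The score used here (mean absolute difference in Hz between series of equal length), the limit, the generated names, and the class `Lexicon` are all assumptions chosen to keep the example short; the actual Seadragon scoring is not described in this post.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lexicon; the score, limit, and naming scheme are assumptions.
final class Lexicon {
    private final Map<String, double[]> signals = new LinkedHashMap<>();
    private final double limitHz;
    private int nextId = 1;

    Lexicon(double limitHz) {
        this.limitHz = limitHz;
    }

    void put(String name, double[] hz) {
        signals.put(name, hz.clone());
    }

    // Mean absolute difference in Hz. Returns +infinity as soon as the
    // running sum shows the candidate cannot stay within the limit.
    private double score(double[] a, double[] b) {
        if (a.length != b.length) {
            return Double.POSITIVE_INFINITY;
        }
        double budget = limitHz * a.length;
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
            if (sum > budget) {
                return Double.POSITIVE_INFINITY; // abandon this candidate
            }
        }
        return sum / a.length;
    }

    // Returns the name of the best match. A whistle with no acceptable match
    // is added to the lexicon under a newly generated name.
    String recognize(double[] whistle) {
        String bestName = null;
        double bestScore = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, double[]> e : signals.entrySet()) {
            double s = score(whistle, e.getValue());
            if (s < bestScore) {
                bestScore = s;
                bestName = e.getKey();
            }
        }
        if (bestName == null) {
            do {
                bestName = "whistle-" + nextId++;
            } while (signals.containsKey(bestName));
            signals.put(bestName, whistle.clone());
        }
        return bestName;
    }
}
```

Abandoning a candidate as soon as its running score exceeds the limit keeps the cost low when the lexicon is large, because most candidates are rejected after a few values.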
