2006/01/04

Processing dolphin whistles in Seadragon v. 1.0

Here is a summary of the process and data flows from cetacean sound to human interface, i.e., the c-to-h flow (c2h), and some references to the h-to-c flow (h2c), in Seadragon; c2h involves underwater signal acquisition and recognition, while h2c involves human input and emission of signals underwater.

There are 3 major nodes in the backbone subsystem in Seadragon: c, c2h, h. Each node can be deployed on its own host (e.g., a PC) or on the same host as another node. In version 1.0, these 3 nodes are deployed on a single host. The nodes exchange data as text (UTF-8) containing XML tags. When nodes are deployed on different hosts, the data exchange takes place over TCP/IP sockets; when two nodes are on the same host, it takes place between Java objects running in different threads, without involving sockets.
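To make the message format concrete, here is a minimal sketch of one frequency value wrapped in XML-tagged UTF-8 text and passed between two nodes on the same host. It is written in Python for brevity (Seadragon itself is Java), and the tag and attribute names (`msg`, `freq`, `from`, `to`, `unit`) are assumptions made up for this illustration, not Seadragon's actual tags. The in-memory queue stands in for the same-host exchange; on different hosts the same bytes would travel over a TCP/IP socket.

```python
import queue
import xml.etree.ElementTree as ET


def encode_frequency(hz: float) -> bytes:
    """Wrap one frequency value in a (hypothetical) XML message, as UTF-8 bytes."""
    msg = ET.Element("msg", {"from": "c", "to": "c2h"})
    ET.SubElement(msg, "freq", {"unit": "Hz"}).text = f"{hz:.1f}"
    return ET.tostring(msg, encoding="utf-8")


def decode_frequency(data: bytes) -> float:
    """Read the frequency value back out of the XML message."""
    msg = ET.fromstring(data)
    return float(msg.find("freq").text)


# Same-host case: the two nodes share an in-memory queue between threads.
link = queue.Queue()
link.put(encode_frequency(8613.3))   # c node side
print(decode_frequency(link.get()))  # c2h node side
```

Because the payload is plain text in both cases, the sending and receiving code does not need to know which transport carried it.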

The 3-backbone-node design, chosen a few years ago, allows us to easily build an underwater system composed of two or three hosts, for example one handheld device for the human interface (hosting the h node) and another device hosting the c2h node and the c node, as described in another post on this blog. Such a system is feasible with off-the-shelf parts today (or shortly, with some testing and debugging). The Seadragon software is configured to run on multiple hosts through properties in files that it reads at startup. For the proposed underwater system, the same software would be used for the h node on its own host and for the c and c2h nodes on another host; the two installations just use different properties at startup. This feature, among others, is provided by the generic Leafy API. This multi-host design will also allow us to easily use the more powerful processors that will be required when we process signals more complex than tonal whistles, such as the complex signals used by larger cetaceans (and other species such as elephants) and the mixtures of whistles and clicks used by smaller cetaceans. The design will also be useful when we add complex multi-signal structure recognition (e.g., real-time analysis of grammatical theories) on top of single-signal recognition (e.g., converting a whistle to a textual identifier).
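The properties-driven deployment described above could look roughly like the sketch below, written in Python for brevity. The property keys (`nodes.local`, `<node>.host`, `<node>.port`), the addresses, and the helper names are all hypothetical; the real keys are defined by Seadragon and the Leafy API and are not shown in this post. The point is only that one code base can pick an in-process link or a socket link from what it reads at startup.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def local_nodes(props):
    """Names of the nodes hosted by this installation."""
    return {n.strip() for n in props.get("nodes.local", "").split(",") if n.strip()}


def transport_for(props, peer):
    """Choose how to reach a peer node: in-process, or a socket address."""
    if peer in local_nodes(props):
        return ("in-process", None)
    return ("socket", (props[peer + ".host"], int(props[peer + ".port"])))


# Hypothetical properties for the two installations of the underwater system.
HANDHELD = "nodes.local=h\nc2h.host=10.0.0.2\nc2h.port=5000\n"
MAIN_UNIT = "nodes.local=c,c2h\nh.host=10.0.0.1\nh.port=5000\n"

print(transport_for(parse_properties(HANDHELD), "c2h"))
print(transport_for(parse_properties(MAIN_UNIT), "c"))
```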

The c2h flow between backbone nodes is: c to c2h to h.

The h2c flow is: h to c2h to c.

The flow between backbone nodes is the same whether the nodes are hosted on different hosts or on the same host. Seadragon also supports nodes other than backbone nodes; these are for peer-to-peer (p2p) networks, either human or cetacean. The cetacean p2p has not been fully implemented, but a human-side p2p has been tested, including with hosts that are cell phones.

The c node is in charge of the cetacean interface: it emits underwater sounds to cetaceans and it acquires underwater sounds. In the h2c flow, it receives data from c2h (in text form), converts it to a representation of voltage levels (numbers) and then to an actual (analog) voltage, which goes to a hydrophone (e.g., a piezo-electric crystal) that converts the voltage to vibrations (sound). In the c2h flow, it acquires sounds as voltage levels (numbers). Using an FFT, it converts each block of 1024 voltage numbers to a single frequency value (Hz, or cycles per second) and sends that value to the c2h node for processing, i.e., an attempt at recognition.
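The step from 1024 voltage numbers to one frequency value can be sketched as follows, in Python for brevity. This is an illustration under assumptions: the 44100 Hz sampling rate is made up for the example, the FFT is a plain textbook radix-2 version, and picking the strongest spectral bin is the simplest possible estimator; the post does not say how Seadragon derives its value from the FFT output.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two (1024 is)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest bin, ignoring the DC bin."""
    n = len(samples)
    spectrum = fft(samples)
    # For real input only the first half of the spectrum is independent.
    peak = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
    return peak * sample_rate / n


# One block of 1024 samples of a synthetic 8000 Hz tone (assumed 44100 Hz rate).
RATE = 44100
block = [math.sin(2 * math.pi * 8000 * i / RATE) for i in range(1024)]
print(dominant_frequency(block, RATE))
```

With these assumed numbers the resolution is one bin, i.e., 44100 / 1024, or about 43 Hz, so the printed value is the nearest bin centre rather than exactly 8000.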

The c2h flow summary: sound --> c node: hydrophone --> voltage --> analog-to-digital --> voltage levels --> FFT --> frequency value --> send text (single frequency value) --> c2h node: assembly of frequency values into a series (i.e., a whistle) --> pattern matching --> signal object in lexicon (unrecognized acquired whistle) --> send text --> h node: writing the text to the human user in the msg window.

The other flow is h-to-c, human to cetacean (h2c). It is similar to the reverse of the c2h flow but does not involve frequency pattern matching, because the human user can only emit a whistle that is already present in the lexicon and has to refer to it by its unique text name. One could say that this flow involves the simpler matching of text names, but no frequency matching.
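As a rough illustration of the h2c direction, the sketch below (Python for brevity) looks up a whistle by its text name and turns its frequency series into voltage-level numbers. Everything specific here is an assumption: the lexicon contents, the 44100 Hz rate, the 1024 samples per frequency value, and the sine synthesis itself are chosen for the example and are not taken from Seadragon.

```python
import math


def synthesize(freqs_hz, sample_rate=44100, samples_per_value=1024):
    """Turn a series of frequency values into voltage levels in [-1, 1]."""
    samples = []
    phase = 0.0
    for f in freqs_hz:
        step = 2 * math.pi * f / sample_rate
        for _ in range(samples_per_value):
            samples.append(math.sin(phase))
            phase += step  # carrying the phase over avoids jumps between values
    return samples


def emit(lexicon, name):
    """Text name matching only: the whistle must already be in the lexicon."""
    if name not in lexicon:
        raise KeyError("no whistle named " + repr(name) + " in the lexicon")
    return synthesize(lexicon[name])


# Hypothetical lexicon with one rising whistle.
lexicon = {"upsweep": [8000.0, 9000.0, 10000.0]}
levels = emit(lexicon, "upsweep")
print(len(levels))
```

The numbers returned by `emit` would then go through digital-to-analog conversion before reaching the transducer.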

The most complex part, as far as code structure is concerned, is the *assembly of frequency values into a series (i.e., a whistle)* in the c2h flow. This involves, for example, recognizing the start and the end of a whistle and completing the whistle before comparing it with the signals in the lexicon (a fancy name for Seadragon's whistle database). This process must be extremely efficient, and it took me many months to fine-tune it, because the quantity of data arriving in real time is huge (between 10 and 40 frequency values per second now). I would like to process even more, ideally maybe 100 frequency values per second, when off-the-shelf PCs are fast enough.
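To give a feel for the assembly step, here is a deliberately simplified sketch in Python. It is not Seadragon's algorithm, which took months of tuning and is not described in this post. The rules used here are assumptions for illustration: a whistle starts at the first frequency value after silence, short gaps are tolerated, a longer gap ends the whistle, and very short series are discarded as noise. `None` stands for a block with no tonal signal.

```python
class WhistleAssembler:
    """Collects a stream of frequency values into complete whistles."""

    def __init__(self, max_gap=2, min_length=3):
        self.max_gap = max_gap        # silent blocks tolerated inside a whistle
        self.min_length = min_length  # shorter series are treated as noise
        self._values = []
        self._gap = 0

    def feed(self, hz):
        """Feed one value; return a finished whistle (list) or None."""
        if hz is not None:
            self._values.append(hz)
            self._gap = 0
            return None
        if not self._values:
            return None  # still waiting for a whistle to start
        self._gap += 1
        if self._gap <= self.max_gap:
            return None  # short dropout, keep the whistle open
        whistle, self._values, self._gap = self._values, [], 0
        return whistle if len(whistle) >= self.min_length else None


stream = [None, 8000, 8100, None, 8200, None, None, None,
          9000, None, None, None]
assembler = WhistleAssembler()
whistles = [w for w in map(assembler.feed, stream) if w is not None]
print(whistles)
```

Each call to `feed` does a constant amount of work, which matters when values arrive continuously in real time.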

An overview of the pattern matching (a.k.a. signal recognition): It is performed in the c2h node. Once an incoming whistle's start and end have been determined and we have all the intermediate frequency values (not trivial), the frequencies of the incoming whistle are compared with the frequencies of the signals present in the lexicon. A score is tallied for each comparison. If the score falls outside the acceptable limit, then this match is abandoned and the incoming whistle is compared with another whistle in the lexicon. Out of this process, either a best match or no match is obtained. If we have a match, then the name of the matching whistle from the lexicon is sent to the h node and displayed in the *msg* window. If there is no match, then the incoming whistle is given a unique name (by the system), the whistle is added to the lexicon, and the name is sent to the h node and displayed.
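The overall shape of this loop (score each candidate, abandon early, keep the best, otherwise add a new entry) can be sketched as below, in Python for brevity. The scoring itself is a stand-in: summing absolute frequency differences over whistles of equal length, with an assumed limit of 200 Hz per value, is chosen only to make the example run and should not be read as Seadragon's scoring. The generated names (`whistle-1`, `whistle-2`, ...) are likewise hypothetical.

```python
def score(incoming, candidate, limit):
    """Sum of absolute differences, or None as soon as the limit is exceeded."""
    if len(incoming) != len(candidate):
        return None  # simplification: only equal-length whistles are compared
    total = 0.0
    for a, b in zip(incoming, candidate):
        total += abs(a - b)
        if total > limit:
            return None  # abandon this candidate early
    return total


def recognize(incoming, lexicon, limit_per_value=200.0):
    """Return the name of the best match, adding a new entry if none matches."""
    limit = limit_per_value * len(incoming)
    best_name, best_score = None, None
    for name, candidate in lexicon.items():
        s = score(incoming, candidate, limit)
        if s is not None and (best_score is None or s < best_score):
            best_name, best_score = name, s
    if best_name is None:
        n = len(lexicon) + 1
        while "whistle-" + str(n) in lexicon:
            n += 1  # make sure the generated name is unique
        best_name = "whistle-" + str(n)
        lexicon[best_name] = list(incoming)
    return best_name


lexicon = {"up": [8000, 9000, 10000], "down": [10000, 9000, 8000]}
print(recognize([8050, 8950, 10020], lexicon))  # close to "up"
print(recognize([4000, 4000, 4000], lexicon))   # unknown, gets a new name
```

The returned name is what would be sent on to the h node for display.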

A detailed description of the pattern matching process between two whistles will be published later.

serge
