Instrument for Soundscape Recognition, Identification and Evaluation

  • Members: Oliver Bunting, Jon Stammers, Dave Chesmore
  • Start Date: October 2006

Soundscapes can encompass an endless variety of sounds. The types and intensity of sounds within a particular soundscape vary with location, and throughout the day. Metrics for measuring a soundscape have been dictated by what can be practically measured.

The predominant metric for soundscape measurement is the A-weighted sound pressure level measured at a point. This metric has been used since the earliest sound level meters. As a consequence, most legislative noise controls are now defined using A-weighted sound levels averaged over long time periods. However, the existing system has limitations. Noise is a subjective issue, with people preferring some sounds over others. The current A-weighted metric cannot distinguish between different sources, and so cannot weight them according to whether they are deemed 'good' or 'bad'. Another failure of the time-averaged A-weighted metric is its insensitivity to short-duration loud events, such as low-flying aircraft, which may cause great annoyance but barely affect averaged noise levels. The aim of the ISRIE project is to develop an instrument capable of separating out sound components from within a soundfield and automatically classifying them.
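The insensitivity of a long-term average to short events can be illustrated with a small calculation. The sketch below computes an equivalent continuous level (an energy average of dB(A) samples) for a hypothetical 16-hour day, with and without a single flyover. The background level, flyover level and durations are assumed values chosen for illustration, not measurements from the project.

```python
import math

def leq(levels_db):
    """Equivalent continuous level of equally spaced dB(A) samples (energy average)."""
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)

# Hypothetical 16-hour day, sampled once per second, at a steady 60 dB(A).
SECONDS = 16 * 3600
quiet_day = [60.0] * SECONDS

# The same day with one assumed 10-second aircraft flyover at 80 dB(A).
flyover_day = list(quiet_day)
flyover_day[1000:1010] = [80.0] * 10

leq_quiet = leq(quiet_day)
leq_flyover = leq(flyover_day)

print(f"Leq without flyover: {leq_quiet:.2f} dB(A)")
print(f"Leq with flyover:    {leq_flyover:.2f} dB(A)")
print(f"Peak level:          {max(flyover_day):.1f} dB(A)")
```

Under these assumed numbers the flyover is 20 dB above the background, yet the 16-hour average rises by less than 0.1 dB.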

This project is in collaboration with the consultancy arm of the Institute of Sound and Vibration Research based at Southampton University. ISVR aim to analyse existing legislation, and to evaluate the contribution ISRIE could make to future drafts of noise standards and planning guidance documents. Partners based at Newcastle University are modelling and developing an ad-hoc wireless network to monitor soundscapes over wide areas.

 

Source Separation

The problem of separating multiple sources from mixtures is not a new one, having first been considered around 1953. The aim is to extract the component sources from a mixture, with no knowledge of what the sources are or how they were mixed together. This is often referred to as blind source separation, or perhaps more convivially as the cocktail party problem, likened to listening to only one person at a noisy cocktail party. Many methods have evolved in the effort to solve the cocktail party problem, each applying some assumption about the underlying signals. For this application, the number of sources in a soundscape can quickly exceed the number of sensors used to capture it. This rules out many traditional separation methods, such as ICA. Instead, we assume that the sources are sparse in some domain, and use this sparseness as the basis for the separation algorithm. By using a B-format microphone, we can calculate the direction of arrival of sources. With knowledge of the 3D position of the sources, we have developed a new algorithm to separate out the sources.
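As an illustration of how a direction of arrival can be obtained from B-format signals, the sketch below uses a common intensity-style estimate: the time-averaged products of the omnidirectional channel W with the figure-of-eight channels X, Y and Z. This is not the project's own algorithm, which is not described here. The function names, the 1/sqrt(2) scaling of W (a traditional B-format convention) and the synthetic test tone are all assumptions, and the estimate is only meaningful when a single source dominates the analysed segment.

```python
import math

def encode_b_format(signal, azimuth, elevation):
    """Encode a mono signal as a plane wave in first-order B-format (W, X, Y, Z)."""
    w = [s / math.sqrt(2) for s in signal]  # assumed traditional W scaling
    x = [s * math.cos(azimuth) * math.cos(elevation) for s in signal]
    y = [s * math.sin(azimuth) * math.cos(elevation) for s in signal]
    z = [s * math.sin(elevation) for s in signal]
    return w, x, y, z

def direction_of_arrival(w, x, y, z):
    """Estimate (azimuth, elevation) in radians from summed W*X, W*Y and W*Z."""
    ix = sum(a * b for a, b in zip(w, x))
    iy = sum(a * b for a, b in zip(w, y))
    iz = sum(a * b for a, b in zip(w, z))
    azimuth = math.atan2(iy, ix)
    elevation = math.atan2(iz, math.hypot(ix, iy))
    return azimuth, elevation

# Synthetic check: a 1 kHz tone sampled at 48 kHz, arriving from an assumed
# azimuth of 60 degrees and elevation of 20 degrees.
tone = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(4800)]
w, x, y, z = encode_b_format(tone, math.radians(60), math.radians(20))
est_az, est_el = direction_of_arrival(w, x, y, z)
print(f"azimuth {math.degrees(est_az):.1f}, elevation {math.degrees(est_el):.1f}")
```

In a sparse-source setting, an estimate of this kind would typically be computed per time-frequency bin rather than over a whole recording, so that each bin can be attributed to the source that dominates it.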

The location of the sources in a recording is not always known, however, so we are working on a novel method using neural networks to pinpoint a noise source against a noisy background.

 

Signal Classification

The signal classification part of the ISRIE project aims to develop novel techniques for analysing and recognising single-source audio signals. The output of the analysis will be a statement of which sound category the signal belongs to. Based on current noise legislation requirements in the UK, some key sounds have been identified that are to be recognised within the urban soundscape.

 

 

The soundscape is split into 3 main categories: Anthrophony, those sounds made by humans or human presence; Geophony, naturally occurring sounds; and Biophony, sounds caused by the presence of animals.

 

Initially it was thought that a standard classification system could be used. Such systems typically consist of a feature extractor and a classifier (and sometimes some post-processing to account for anomalies). The feature extraction technique being employed is Time-Domain Signal Coding (TDSC). This is a purely time-domain analysis method which has shown excellent results in previous bio-acoustic applications. Compared to frequency-domain techniques it is computationally inexpensive, making it ideal for a real-time system. So far, the outputs of the TDSC feature extractor have been classified using a very simple Self-Organising Map (SOM) neural network. At first the results looked inconclusive. However, when analysed as a histogram of SOM class frequency, it was noticed that similar sounds gave similar outputs. It is therefore thought that the TDSC/SOM classifier could be used as a pre-processor to a further classification stage. At present it is thought that the final classifier may be some form of statistical analysis technique. The diagram below shows how this system will be constructed.
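The idea of summarising a sound by a histogram of SOM class frequency can be sketched as follows. This is a simplified illustration, not the ISRIE implementation. It assumes that each epoch (the run of samples between successive zero crossings) is described by its duration and a simple shape count, which is our reading of how TDSC-style coding works. The prototype vectors are hand-picked stand-ins for trained SOM weights, and all function names are hypothetical.

```python
import math
from collections import Counter

def epochs(signal):
    """Split a signal into epochs, the runs of samples between zero crossings."""
    result, current = [], [signal[0]]
    for sample in signal[1:]:
        if (sample >= 0) != (current[-1] >= 0):  # sign change closes the epoch
            result.append(current)
            current = []
        current.append(sample)
    result.append(current)
    return result

def describe(epoch):
    """Return (duration in samples, shape); shape counts interior minima of |x|."""
    mags = [abs(v) for v in epoch]
    shape = sum(1 for i in range(1, len(mags) - 1)
                if mags[i] < mags[i - 1] and mags[i] < mags[i + 1])
    return len(epoch), shape

def best_matching_unit(feature, prototypes):
    """Index of the prototype nearest to the feature (squared Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda k: sum((f - p) ** 2
                                 for f, p in zip(feature, prototypes[k])))

def class_histogram(signal, prototypes):
    """Normalised histogram of winning prototype indices over all epochs."""
    winners = [best_matching_unit(describe(e), prototypes) for e in epochs(signal)]
    counts = Counter(winners)
    return [counts[k] / len(winners) for k in range(len(prototypes))]

# Hypothetical prototypes (duration, shape) standing in for trained SOM weights.
PROTOTYPES = [(5, 0), (20, 0), (20, 2)]

RATE = 8000  # assumed sample rate in Hz
low_tone = [math.sin(2 * math.pi * 200 * n / RATE) for n in range(800)]
high_tone = [math.sin(2 * math.pi * 800 * n / RATE) for n in range(800)]

print("200 Hz tone:", class_histogram(low_tone, PROTOTYPES))
print("800 Hz tone:", class_histogram(high_tone, PROTOTYPES))
```

The two test tones concentrate in different histogram bins, which is the property a later classification stage could exploit. A real system would learn the prototypes from data rather than fixing them by hand.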

 

This work has so far been presented at two conferences: the Institute of Acoustics Spring Conference 2008 and Acoustics '08 in Paris.

 

 
