PhD research projects are currently available in the following areas. Some projects listed may also be suitable for the MSc by Research – please contact the member of academic staff involved if you are interested in the MSc by Research. Please note that funding opportunities, if available, are advertised on our Funding page.
Dr John Szymanski, email: jes1@ohm.york.ac.uk
Preliminary studies at York have confirmed that it is possible to separate a monophonic musical recording into multiple tracks by the use of prior parametric source models as part of a structured analysis stage, the output of which is then used to define the controlling parameters of a suite of adaptive filters. 'Proof of principle' separations have been carried out of up to seven simultaneous pitched (violin) notes, as well as for instrument mixtures (such as saxophone, violin, clarinet and piano), and for the extraction of individual instruments from a commercial recording, allowing the remixing of that recording. The potential for high-fidelity remastering of monophonic music, archive and film soundtrack material to stereo and surround sound is considerable. Improved algorithms will be developed using more powerful data analysis techniques and more sophisticated models of instrumental sounds. Testing will involve assessing the quality of the separation into multiple tracks of a variety of representative sample mixes.
Dr Jez Wells, email: jjw100@ohm.york.ac.uk
Previous work at York in spectral modelling of audio has focused on highly accurate single frame descriptions of signals for real-time applications. There are many possibilities for sophisticated, signal-informed audio processing with such descriptions of sound, for example effects which vary their control parameters according to the types of component within a signal. To cater for the widest range of input sounds, a thorough study is required of the extent to which spectral modelling techniques for monophonic sounds can be used to describe the homogeneity of ensembles of sounds. This study would focus on the application of measurements of non-stationarity of spectral components to estimate the interaction of ensembles of partials at similar frequencies. Can the beating behaviour give us information about the nature of the ensemble? Can we use this information to infer the parameters of instruments within the ensemble? Ultimately, is spectral modelling of ensembles feasible and useful?
Dr John Szymanski, email: jes1@ohm.york.ac.uk
A wide range of audio restoration techniques exist for handling localized degradation of audio material, by clicks or low-frequency noise transients for example. These are most commonly statistical or interpolative techniques that attempt to predict a suitable waveform to reduce the damage or to 'patch' the gap in the audio. This project investigates the more powerful approach of using multiple instrument models to assist in separating the signal into multiple channels, each representing different sound sources, so that each channel can be treated separately and then recombined to produce the overall restoration. This approach will be much more robust than trying to parameterize the complexities of a hybrid signal in a single process since, although the waveforms themselves are changing fairly rapidly, the underlying parameters that characterize the signal due to each sound source (e.g. pitch, amplitude, frequency structure, etc.) are varying at a much slower rate - the values of, and variations in, these parameters will be mixed, masked or cancelled in the original signal, but can be extracted from at least some of the separated channels. This will lead to enhanced estimates of any missing material. Testing will involve assessing the quality of the restoration of a variety of audio corruption arising from different physical sources and recording media.
Dr Andy Hunt, email: adh@ohm.york.ac.uk
Sonification is the art and science of displaying data as sound (see the website which Dr Hunt co-authors at http://www.interactive-sonification.org/). Current physiotherapy techniques can be enhanced by monitoring the body's internal operation, for instance by gathering the electrical impulses from muscles. This can be used to aid diagnosis. Our work involves creating a real-time audio signal which can be heard by both therapist and patient, and used as biofeedback, for example in recovering from injury or stroke. This project, working alongside physiotherapists from Teesside Rehabilitation Unit, will investigate and develop methods of monitoring body signals and presenting them to clinicians and patients in real time as they are undergoing therapy. There is also the option of developing methods for the patient to continue their therapy at home using portable systems. (Dr Hunt's other projects and research overview, which will place this work in context, can be found at http://www-users.york.ac.uk/~adh2/research_overview.htm).
Prof John Robinson, email: jar11@ohm.york.ac.uk
The University's "immersive demonstration space" (http://www.york.ac.uk/ctc/facilities/) provides surround video and sound in a meeting-room environment. This research project is to determine how best to enhance and use the space to support collaborative design and creativity. We want to augment intellectual work along the continuum from solitary activity to rich collaboration. People will be able to move from absorbed concentration on their own work, through to interaction of various kinds that allow ideas/data/knowledge to be shared in planned, concentrated, long, short, serendipitous, or fleeting transactions. We expect that the walls/screens will change only occasionally and that sounds will be subtle and localised. Individuals working on personal devices (e.g. laptops and phones) might "project" their screens onto the walls of the space on demand and have facilities for merging views from different people. The environment might monitor what people are doing and suggest possible connections which are then visualised on the walls. The design of such an environment involves sensing, communications, pattern analysis, HCI and organisational behaviour. It is likely that the research will be conducted in collaboration with others in the University's HCI group (http://www.cs.york.ac.uk/hci/).
Dr Dave Chesmore, email: edc1@ohm.york.ac.uk
The project will be a follow-on from an EPSRC-funded project investigating the separation and identification of sounds in 3-D in a soundscape using soundfield microphones. The previous project indicated that sound separation methods based on sparse signal separation worked well for natural sounds (birds, insects, etc) and the proposed project will make use of these results to develop the methods further. It will also look at classifying the separated signals using novel time domain algorithms.
Dr Dave Chesmore, email: edc1@ohm.york.ac.uk
Dr Chesmore’s research into acoustic species identification is at the cutting edge of the computational bioacoustics field and has been successfully applied to the detection and identification of insects in the field (grasshoppers), quarantine insects in imported goods and more generalised biodiversity assessment. The project will look at the analysis and identification of complex time-varying signals such as bird vocalisations using time domain methods, time-frequency methods and syntactic pattern recognition.
Dr Dave Chesmore, email: edc1@ohm.york.ac.uk
The identification of species such as insects can be very difficult and often requires many years of expertise. Dr Chesmore has been working on the development of novel methods for achieving automated identification using image processing and has worked on moths, bees, hoverflies and beetles. The project will continue this work and extend it to identification in the field using hand held computers.
Prof John Robinson, email: jar11@ohm.york.ac.uk
The group has recently filed a patent on a method for estimating the attributes of faces detected in pictures. These attributes include the location and orientation and facial "landmarks", such as the eyes, nose tip and chin tip, along with the age, sex, race and facial expression of the person. A visualization of the method working frame-by-frame on a video can be seen at http://www.elec.york.ac.uk/visual/jar11/kbdemo2.avi. This project's aim is to apply these methods to computational photography, linking with other work in the group on shake removal and high-dynamic-range imaging. The project will use information about facial position, pose, eye gaze and expression to control enhancement and restoration algorithms. When applied online within the camera, this will aid in capturing stills of highly-dynamic scenes of people (potentially with application to surveillance as well as photography). When applied offline, it will assist with automatic portrait touch-up and the restoration of old photos and films.
Dr Dave Chesmore, email: edc1@ohm.york.ac.uk
Much of the research in the Biological Systems Laboratory involves the development of automated identification systems for species using either bioacoustics or image processing. The application of computing to species identification is part of computer-aided taxonomy. There are a number of different application areas including pest and invasive species identification and rapid biodiversity assessment. There are also a number of different approaches which analyse signals in different ways; these are incompatible with each other and cannot be integrated. What is needed is a conceptual framework for species identifiers; this will be investigated using the internet as a vehicle for delivering automated identification of species from user-supplied images or sounds (or other sources).
Dr John Szymanski, email: jes1@ohm.york.ac.uk
Advances in digital imaging mean that it is now possible to acquire large amounts of close-range, high-resolution imagery data quickly and easily. This opens up new application possibilities in areas as diverse as architectural or heritage recording, and virtual, immersive or augmented reality. This research concentrates on the challenge of 'stitching' large numbers of such images together into a single massive composite image - a seamless 'mosaic' which is accurate enough to allow a non-technical user to explore and interact with vast quantities of data. Algorithms will be developed which can handle the general case when the images may have been acquired from many different (unknown) positions and orientations and where the images are subject to significant distortions due to perspective, parallax and focal depth effects. Testing will be carried out on a variety of sample data sets, including stained-glass windows imaged from multiple positions and ranges from a remotely-controlled aerial vehicle.
Dr Jez Wells, email: jjw100@ohm.york.ac.uk
Stereo audio has been available as a consumer format for over half a century and in that time a huge wealth of spatial audio heritage has been accumulated. Tastes in the presentation of spatial information vary over time and from person to person. Also, there are a number of different speaker arrays over which this audio is reproduced. If the audio contained within these recordings can be separated into different directions then there is much greater flexibility over reproduction. Obtaining more than two audio streams from just two inputs is a significant challenge for all but the simplest acoustic scene. Work currently under way at York is investigating highly adaptive time-frequency representations of audio and how these can be combined with prior information about the recording to produce high quality directional streams that can enhance reproduction of and interaction with audio from stereo microphone arrays. There are a number of projects available in this area ranging from the application of Computational Auditory Scene Modelling to investigating the extent to which the variability of individual microphones affects the separation process.
Prof David Howard, email: dh@ohm.york.ac.uk
This project will look at the use of iPads as distributed technology amongst music performers, providing communication links between the devices in addition to the audio and visual communication between performers. The work will look at the advantages of, and potential for, new ways of interacting during performances using such systems.
Dr Andy Hunt, email: adh@ohm.york.ac.uk
Sonification is the art and science of displaying data as sound (see the website which Dr Hunt co-authors at http://www.interactive-sonification.org/). Film music and sound are an internationally accepted way of giving 'information' subliminally to an audience, often portraying a sense of place or mood, and inflecting what is perceived visually. In contrast, the relatively young discipline of sonification commonly uses simple sounds to portray information. This project will investigate how knowledge of film music and sound can be used in the field of Auditory Displays. (Dr Hunt's other projects and research overview, which will place this work in context, can be found at http://www-users.york.ac.uk/~adh2/research_overview.htm).
Prof John Robinson, email: jar11@ohm.york.ac.uk
The group has pioneered the application of conditional probability density estimation to describing human faces. Our methods allow a system to estimate the age, sex, race and facial expression of people in a real-time video sequence. Accuracy is acceptable for some uses but not for demanding security applications where lighting may be poor and faces partly occluded. To tackle these difficult cases, we will combine our methods with new approaches in face landmark extraction, motion analysis and cascade classification. This project will explore these broad alternatives before focusing on promising avenues that will result in high-accuracy, high-speed robust face description.
Prof John Robinson, email: jar11@ohm.york.ac.uk
Many projects in Engineering and Computer Science involve global multivariate optimisation. Classical methods include genetic algorithms, simulated annealing and particle filters. One way to think about global optimisation is as density estimation, where Markov chain Monte Carlo methods attempt to learn the properties of an optimisation surface as if it were a probability distribution. This project is about doing that surface characterisation in a new way. The idea is to begin local ascent algorithms from multiple start points and then use the information about distance and directions travelled in each case to estimate maxima density. From this, we would make principled choices about how many starts to attempt and stopping criteria for a high probability of finding the global maximum. An informal statement of the ideas behind this project is available at: http://the-aerodrome-incline.blogspot.co.uk/2011/12/global-optimization.html.
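The multi-start idea described above can be illustrated with a minimal sketch. It is not the group's method: the 1-D test surface `f`, the fixed-step gradient ascent, the merge tolerance and the crude "distinct maxima per unit length" density figure are all assumptions chosen only to make the example small and runnable.

```python
import math

def f(x):
    """Hypothetical 1-D test surface: six local maxima on [0, 5], global maximum at x = 2."""
    return math.cos(2 * math.pi * x) - 0.1 * (x - 2) ** 2

def grad_f(x):
    """Analytic derivative of f."""
    return -2 * math.pi * math.sin(2 * math.pi * x) - 0.2 * (x - 2)

def local_ascent(x0, lo, hi, rate=0.01, tol=1e-10, max_iter=5000):
    """Fixed-step gradient ascent from x0, clamped to the interval [lo, hi]."""
    x = x0
    for _ in range(max_iter):
        x_new = min(hi, max(lo, x + rate * grad_f(x)))
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x

def multi_start(starts, lo, hi, merge_tol=1e-3):
    """Run a local ascent from every start and summarise what was found."""
    maxima = []  # distinct end points reached so far
    travel = []  # signed displacement (distance and direction) of each ascent
    for x0 in starts:
        x_end = local_ascent(x0, lo, hi)
        travel.append(x_end - x0)
        # Treat end points closer than merge_tol as the same maximum.
        if all(abs(x_end - m) > merge_tol for m in maxima):
            maxima.append(x_end)
    return {
        "maxima": sorted(maxima),
        "best": max(maxima, key=f),
        # Crude density estimate (an assumption): distinct maxima per unit length.
        "density": len(maxima) / (hi - lo),
        # Mean distance travelled gives a rough scale for the basins of attraction.
        "mean_travel": sum(abs(d) for d in travel) / len(travel),
    }

# Ten evenly spaced start points, chosen so the example is deterministic.
starts = [0.25 + 0.5 * k for k in range(10)]
result = multi_start(starts, 0.0, 5.0)
print(result["maxima"], result["best"], result["density"])
```

In a real study the start points would be sampled rather than placed on a grid, and the density and stopping rule would come from a principled statistical model rather than the simple count used here.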
Prof David Howard, email: dh@ohm.york.ac.uk
Today’s electronic voice synthesis is highly intelligible but rarely if ever mistaken for the voice of a human speaker (it is highly non-natural). The next key step in improving the naturalness of electronic voice synthesis (speech and singing) involves creating a virtual model of the oral (mouth) and nasal (nose) tracts that can be varied in shape dynamically according to human articulation. Steps are in progress to achieve this, but in order for it to function in a human-like manner, it requires a voice source that is as human-like as possible, and this is the focus of this PhD. The vocal folds of the human larynx will be modelled in such a way that they will vibrate when a virtual lung airstream is applied, according to the settings of appropriately placed virtual muscles to control pitch and voice quality. The output from the model will be compared to signals that monitor the live outputs from human informants, with the purpose of creating as natural a voice source as possible. This source will be linked to the physical model to establish how natural the resulting voice output is.
Prof David Howard, email: dh@ohm.york.ac.uk
When a quartet sings a cappella (unaccompanied) it tends towards a just temperament in individual chords. This has been accurately measured using the electrolaryngograph, which is a non-acoustic device and hence can be used on a number of subjects at the same time without cross interference. The purpose of this PhD work is to confirm pilot results on intonation drift when singing in tune with key change, and to locate and test choral a cappella repertoire for which such a pitch drift can be predicted. A real-time visual display will be developed to support singers in the development of just intonation when singing together; it will be harmonically informed with respect to the music itself in setting its pitch targets. The application of the display in practice will be tested.
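One textbook illustration of how such a drift can be predicted is the I-vi-ii-V-I progression sung with pure intervals and with common tones held exactly between chords. The sketch below works through the arithmetic. The choice of progression, the pure ratios 3/2 and 5/4, and the assumption that held notes do not move are illustrative assumptions, not results from the pilot study.

```python
from fractions import Fraction
import math

FIFTH = Fraction(3, 2)        # pure perfect fifth
MAJOR_THIRD = Fraction(5, 4)  # pure major third

# Start on the tonic (ratio 1). The third of chord I is held and becomes
# the fifth of chord vi; each later chord root is then held as the fifth
# of the next chord (vi -> ii -> V -> I), so every new root lies a pure
# fifth below the held note.
pitch = Fraction(1) * MAJOR_THIRD
for _ in range(4):
    pitch = pitch / FIFTH

# Fold the final tonic back into the octave nearest the starting pitch.
while pitch < Fraction(2, 3):
    pitch *= 2

drift_ratio = pitch
drift_cents = 1200 * math.log2(drift_ratio)
print(drift_ratio, round(drift_cents, 2))  # 80/81, about -21.51 cents
```

Under these assumptions the tonic returns a syntonic comma (81/80) flat, so repeating the progression would lower the pitch by roughly a fifth of a semitone each time.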
Dr Andy Hunt, email: adh@ohm.york.ac.uk
Sonification is the art and science of displaying data as sound (see the website which Dr Hunt co-authors at http://www.interactive-sonification.org/). Athletes are often told they need to use certain muscles more effectively to avoid injury and to enhance performance. This can be fine when working in a gym alongside a trained therapist, but out in the field or on the track it is a different matter. This project, working alongside physiotherapists from Teesside Rehabilitation Unit, will investigate portable methods of monitoring body signals and presenting them to athletes as sound in real time as they run. (Dr Hunt's other projects and research overview, which will place this work in context, can be found at http://www-users.york.ac.uk/~adh2/research_overview.htm).
Tony Tew, email: ait@ohm.york.ac.uk
Understanding speech in noisy conditions is more successful using two ears rather than one. The improvement which is observed indicates that the hearing system is doing much more than simply combining the signals arriving at each ear. Hearing aids that incorporate sophisticated binaural signal processing are becoming practical. They have the potential to emulate features of the human ear and to enhance the residual binaural hearing of people with some forms of hearing impairment.
Tony Tew, email: ait@ohm.york.ac.uk
The human hearing system uses a variety of acoustic, visual and other cues to determine the direction and distance of a sound. The binaural reproduction of sound using inadequate or conflicting cues is thought to be a major cause of unsatisfactory spatial impression. The cues interact in complex ways that need to be understood more clearly before robust spatial audio can be provided widely.
Tony Tew, email: ait@ohm.york.ac.uk
Under certain conditions, sounds which are perceptible on their own become imperceptible when mixed with other sounds. This fact is exploited in perceptual coding schemes such as the ubiquitous MP3 standard, which achieves a large reduction in stereo signal storage requirements with minimal impact on audio quality. The investigation and expansion of these psychoacoustic principles in spatial hearing holds an important key to improving the performance of spatial audio encoder/decoders.