John Robinson's pages on
Research

Selected recent journal papers 

J A Robinson, "A software system for laboratory experiments in image processing", IEEE Transactions on Education, Vol 43, No 4, November 2000, pp 455-459.
Laboratory experiments for image processing courses are usually software implementations of processing algorithms, but students of image processing come from diverse backgrounds with widely differing software experience. To avoid learning overhead, the software system should be easy to learn and use, even for those with no exposure to mathematical programming languages or object-oriented programming. The "Class Library for Image Processing" (CLIP) supports users with knowledge of C, by providing three C++ types with small public interfaces, including natural and efficient operator overloading. CLIP programs are compact and fast. Experience in using the system in undergraduate and graduate teaching indicates that it supports subject matter learning with little distraction from language/system learning.

J A Robinson, N Gosset, A Druet, "Video Compression with Binary Tree Recursive Motion Estimation and Binary Tree Residue Coding", IEEE Transactions on Image Processing, Vol 9, No 7, July 2000, pp 1288-1292.  
Binary Tree Predictive Coding (BTPC) is an efficient general-purpose still-image compression scheme, competitive with JPEG for natural image coding and with GIF for graphics. We report in this paper the extension of BTPC to video compression using motion estimation and compensation techniques which are simple, efficient, non-linear and predictive. The new methods, Binary Tree Recursive Motion Estimation and Coding (BTRMEC) and Binary Tree Residue Coding (BTRC), exploit the hierarchical structure of BTPC, in the first case giving progressively refined motion estimates for increasing numbers of pels and in the second case providing efficient residue coding. Compression results for BTRMEC and BTRC are compared against conventional block-based motion compensated coding as provided by MPEG. They show that both BTRMEC and BTRC are efficient methods to code video sequences.

J A Robinson, Y Shu, "Zerotree Pattern Coding of Motion Picture Residues for Error-Resilient Image and Video Transmission", IEEE Journal on Selected Areas in Communications, Volume 18, No 6, June 2000, pp 1099-1110.
This paper describes a compression scheme for difference-image residues in video coding. Structured spatial patterns are used to map residue pixel values into a quadtree structure, which is then coded in significance order with the SPIHT algorithm. Thus the wavelet coefficient values of standard zerotree coding are replaced by untransformed (but carefully positioned) residue pixel values. The new zerotree pattern coding method compresses as well as zerotree wavelet coding and much better than DCT coding (as in MPEG) over error-free channels. Over noisy channels, zerotree pattern coding provides built-in error resilience, allowing transmission of residue data without error control overhead. A simple post-processing technique provides additional error concealment.

L L Winger, J A Robinson, M E Jernigan, "Low-Complexity Character Extraction in Low-Contrast Scene Images", International Journal of Pattern Recognition and Artificial Intelligence, Vol 14, No 2, March 2000, pp 113-135.
There is wide application for the extraction of textual information from low-contrast, complex natural images. We are particularly interested in segmentation and thresholding algorithms for use in a portable text-to-speech system for the vision impaired. Reading low-contrast LCD displays is the target application. We present a low-complexity method for automatically extracting text of any size, font, and format from images acquired by a video camera that may be poorly focused and aimed, under conditions of inadequate and uneven illumination. The new method consists of fast thresholding that combines a local variance measure with a logical stroke-width method, and with a low-complexity statistical and contextual noise segmentation. The performance of this method compares favorably with more complex methods for the extraction of characters from scene images. Initial results are encouraging for application in a robust portable reader.
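As a rough illustration of the local variance ingredient only, the sketch below marks pixels whose neighbourhood variance exceeds a threshold. The window radius, the threshold and the function names are assumptions made for this example; the stroke-width test and the noise segmentation described in the abstract are not reproduced, and this is not the paper's algorithm.

```python
def local_variance(img, r, c, radius=1):
    """Variance of grey levels in the (clipped) square window centred on (r, c)."""
    rows, cols = len(img), len(img[0])
    vals = [
        img[rr][cc]
        for rr in range(max(0, r - radius), min(rows, r + radius + 1))
        for cc in range(max(0, c - radius), min(cols, c + radius + 1))
    ]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def high_variance_mask(img, threshold, radius=1):
    """1 where the local variance exceeds the (assumed) threshold, else 0."""
    return [
        [1 if local_variance(img, r, c, radius) > threshold else 0
         for c in range(len(img[0]))]
        for r in range(len(img))
    ]


# Hypothetical test image: a vertical step edge between dark and bright columns.
edge = [[0, 0, 100, 100] for _ in range(3)]
edge_mask = high_variance_mask(edge, threshold=100.0)
```

Flat regions give zero variance and are left unmarked, so only pixels near an intensity transition (such as a character stroke) survive as candidates.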

D M Manoranjan, J A Robinson, "Practical, low-cost visual communication using binary images for deaf sign language", IEEE Transactions on Rehabilitation Engineering, Vol 8, No 1, March 2000, pp 81-88.
Deaf sign language transmitted by video requires a temporal resolution of 8 to 10 frames per second for effective communication. Conventional videoconferencing applications, when operated over low-bandwidth telephone lines, provide very low temporal resolution, of the order of less than a frame per second, resulting in jerky movement of objects. This paper presents a practical solution for sign language communication, offering adequate temporal resolution of images using moving binary sketches or cartoons, implemented on standard personal computer hardware with low-cost cameras and communicating over telephone lines. To extract cartoon points, an efficient feature extraction algorithm adaptive to the global statistics of the image is proposed. To improve the subjective quality of the binary images, irreversible pre-processing techniques, such as isolated point removal and predictive filtering, are used. A simple, efficient and fast recursive temporal pre-filtering scheme, using histograms of successive frames, reduces the additive and multiplicative noise from low-cost cameras. An efficient three-dimensional compression scheme codes the binary sketches. Subjective tests performed on the system confirm that it can be used for sign language communication over telephone lines.
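The isolated point removal step mentioned above can be sketched as follows. This is a minimal illustration that assumes an 8-connected neighbourhood and a list-of-lists binary image; the paper's actual neighbourhood definition and data layout are not stated in the abstract, and the function name is hypothetical.

```python
def remove_isolated_points(img):
    """Clear every set pixel that has no set pixel among its 8 neighbours."""
    rows, cols = len(img), len(img[0])
    out = [row[:] for row in img]  # copy so decisions use the original image
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1:
                continue
            # Count set pixels in the clipped 3x3 window, excluding the centre.
            neighbours = sum(
                img[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ) - 1
            if neighbours == 0:
                out[r][c] = 0
    return out


# Hypothetical sketch frame: one stray point and one two-pixel stroke.
frame = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
cleaned = remove_isolated_points(frame)
```

The step is irreversible, as the abstract notes: once a stray point is cleared, the original frame cannot be recovered from the cleaned one.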

M Farimani, J A Robinson, "Multipoint Activity Sharing Services for Fast Prototyping of Groupware Applications", International Journal of Computers and Applications, Vol 22, No 1, January 2000, pp 23-28.
Activity Sharing systems allow existing application programs to be displayed on multiple computers, while their users interact simultaneously with the shared applications.  In this paper we  introduce a new role for Activity Sharing -- as a platform to provide services for sharing visual information and managing multiple users' input.  "Aware" application programs that share information and interact with several users concurrently can be developed efficiently using these services.  Such programs have a high degree of interworking capability and can be prototyped quickly.

J A Robinson, "Engineering Thinking and Rhetoric", Journal of Engineering Education, Vol 87, No 8, July 1998, pp 227-229.
Engineers seek optimal solutions to problems. Often, though, the constraints of the problem and the solution criteria are of several, qualitatively different types, and there is no formal way to find the best trade-offs. Nevertheless, engineers make judgments and provide explanations to justify their choices. Engineering thinking and rhetoric is the development of such explanations that identify and validate a particular solution as the best. Engineering thinking involves analogical reasoning as well as deduction. This implies that in teaching engineering, descriptive case-based examples are important to the student as source analogs for problem solving.

J A Robinson, J Fischl, B Miller, "Muscle-based analysis/synthesis video coding", Canadian Journal of Electrical and Computer Engineering, Vol 23, Nos 1-2, 1998, pp 23-30.
We present an analysis/synthesis coding scheme for visual telephony, based on a wire frame and simulated muscle model of the human head. Anatomically accurate muscles control facial synthesis from a stored texture map, providing economical representation of human facial expressions. Our main contribution is the development of a steepest-descent analysis algorithm that accurately and robustly tracks head movement and facial expression in a moving sequence, in terms of muscle parameters. Coding of the head part of the "Miss America" sequence is achieved at below 1,000 bits/frame, with most of the data allocated to texture updates for the eyes and mouth.

J A Robinson, V M Liang, J A M Chambers, C L MacKenzie, "Computer User Verification Using Login String Keystroke Dynamics", IEEE Transactions on Systems, Man and Cybernetics -- Part A: Systems and Humans, Vol 28, No 2, March 1998, pp 236-241.
The keystroke dynamics of a computer user's login string provide a characteristic pattern that can be used for identity verification.  Timing vectors for several hundred login attempts were collected for 10 'valid' users and 10 'forgers', and classification analysis was applied to discriminate between them. Three different classifiers were applied, and in each case the key hold times were more effective features for discrimination than the interkey times. Best performance was achieved by an Inductive Learning classifier using both interkey and hold times. A high rate of typographical errors during login entry is reported. In practice, these are usually corrected errors - that is, they are strings which include backspaces to correct earlier typos - but their presence confounds the use of typing-style analysis as a practical means of securing access to computer systems.
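To make the two feature types concrete, the sketch below derives hold times and interkey times from one login attempt. The event layout (key, press time, release time, in milliseconds) and the definition of interkey time as the gap from one key's release to the next key's press are assumptions for illustration; the abstract does not specify either, and the classifiers themselves are not shown.

```python
from typing import List, Tuple

# (key, press_time_ms, release_time_ms): an assumed layout, not the paper's format.
KeyEvent = Tuple[str, float, float]


def timing_vector(events: List[KeyEvent]) -> Tuple[List[float], List[float]]:
    """Return (hold_times, interkey_times) for one login attempt."""
    # Hold time: how long each key stays down.
    hold = [release - press for _, press, release in events]
    # Interkey time (assumed definition): next key's press minus this key's
    # release; it can be negative when keystrokes overlap.
    interkey = [
        events[i + 1][1] - events[i][2] for i in range(len(events) - 1)
    ]
    return hold, interkey


# Hypothetical three-key login string.
attempt = [("j", 0.0, 95.0), ("a", 140.0, 220.0), ("r", 300.0, 370.0)]
hold, interkey = timing_vector(attempt)
```

A login string of n keys yields n hold times and n - 1 interkey times, which together form the timing vector passed to a classifier.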

J A Robinson, "Efficient General-Purpose Image Compression with Binary Tree Predictive Coding", IEEE Transactions on Image Processing, Vol 6, No 4, April 1997, pp 601-607.
Binary Tree Predictive Coding uses a non-causal, shape-adaptive predictor to decompose an image into a binary tree of prediction errors and zero blocks. Fast compression performance is comparable with JPEG for photographs, with GIF for graphics, and superior to the state of the art for composite images.