John Robinson's pages on
Research
Selected recent journal papers
J A Robinson, "A software system for laboratory
experiments in image processing", IEEE Transactions on Education,
Vol 43, No 4, November 2000, pp 455-459.
Laboratory experiments for image processing courses are
usually software implementations of processing algorithms, but students of
image processing come from diverse backgrounds with widely differing software
experience. To avoid learning overhead, the software system should be easy
to learn and use, even for those with no exposure to mathematical programming
languages or object-oriented programming. The "Class Library for Image
Processing" (CLIP) supports users with knowledge of C, by providing three
C++ types with small public interfaces, including natural and efficient
operator overloading. CLIP programs are compact and fast. Experience in
using the system in undergraduate and graduate teaching indicates that it
supports subject matter learning with little distraction from language/system
learning.
J A Robinson, N Gosset, A Druet, "Video Compression
with Binary Tree Recursive Motion Estimation and Binary Tree Residue Coding",
IEEE Transactions on Image Processing, Vol 9, No 7, July 2000, pp 1288-1292.
Binary Tree Predictive Coding (BTPC) is an efficient
general-purpose still-image compression scheme, competitive with JPEG for
natural image coding and with GIF for graphics. We report in this paper
the extension of BTPC to video compression using motion estimation and
compensation techniques which are simple, efficient, non-linear and
predictive.
The new methods, Binary Tree Recursive Motion Estimation and Coding (BTRMEC)
and Binary Tree Residue Coding (BTRC), exploit the hierarchical structure
of BTPC, in the first case giving progressively refined motion estimates
for increasing numbers of pels and in the second case providing efficient
residue coding. Compression results for BTRMEC and BTRC are compared against
conventional block-based motion compensated coding as provided by MPEG.
They show that both BTRMEC and BTRC are efficient methods to code video
sequences.
J A Robinson, Y Shu, "Zerotree Pattern
Coding of Motion Picture Residues for Error-Resilient Image and Video
Transmission", IEEE Journal on Selected Areas in Communications,
Vol 18, No 6, June 2000, pp 1099-1110.
This paper describes a compression scheme
for difference-image
residues in video coding. Structured spatial patterns are used to map
residue pixel values into a quadtree structure, which is then coded in
significance
order with the SPIHT algorithm. Thus the wavelet coefficient values of
standard zerotree coding are replaced by untransformed (but carefully
positioned)
residue pixel values. The new zerotree pattern coding method compresses
as well as zerotree wavelet coding and much better than DCT coding (as
in MPEG) over error-free channels. Over noisy channels, zerotree pattern
coding provides built-in error resilience, allowing transmission of residue
data without error control overhead. A simple post-processing technique
provides additional error concealment.
L L Winger, J A Robinson, M E Jernigan, "Low-Complexity Character Extraction
in Low-Contrast Scene Images", International Journal of Pattern Recognition
and Artificial Intelligence, Vol 14, No 2, March 2000, pp 113-135.
There is wide application for the extraction of textual information
from low-contrast, complex natural images.
We are particularly interested in segmentation
and thresholding algorithms for use in a portable text-to-speech
system for the vision impaired.
Reading low-contrast LCD displays is the target application.
We present a low-complexity method for automatically extracting
text of any size, font, and format from images acquired by a video
camera that may be poorly focused and aimed, under conditions of inadequate
and uneven illumination.
The new method consists of fast thresholding that combines
a local variance measure with a logical stroke-width method,
and with a low-complexity statistical and contextual noise segmentation.
The performance of this method compares favorably with more complex methods
for the extraction of characters from scene images.
Initial results are encouraging for application in a robust portable reader.
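As a rough illustration of thresholding guided by a local variance measure, the sketch below marks a pixel as text only when its neighbourhood has enough variance and the pixel is darker than the local mean. It is a generic example and not the method of the paper: the function name, the window radius, the variance cut-off and the dark-text-on-light-background assumption are all hypothetical, and the stroke-width and noise segmentation stages are omitted.

```python
from typing import List


def local_variance_threshold(
    img: List[List[int]], radius: int = 1, min_variance: float = 100.0
) -> List[List[int]]:
    """Binarize a greyscale image (hypothetical helper, not from the paper).

    A pixel becomes 1 (text) when the variance of its (2*radius+1)^2
    neighbourhood is at least min_variance AND the pixel is darker than
    the neighbourhood mean. Flat regions are left as 0 (background).
    """
    rows = len(img)
    cols = len(img[0]) if rows else 0
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Collect the neighbourhood, clipped at the image border.
            window = [
                img[rr][cc]
                for rr in range(max(0, r - radius), min(rows, r + radius + 1))
                for cc in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            mean = sum(window) / len(window)
            variance = sum((v - mean) ** 2 for v in window) / len(window)
            if variance >= min_variance and img[r][c] < mean:
                out[r][c] = 1
    return out


# A single dark pixel on a bright background, and a featureless patch.
stroke = [[200, 200, 200], [200, 50, 200], [200, 200, 200]]
flat = [[120, 120, 120], [120, 120, 120], [120, 120, 120]]
stroke_mask = local_variance_threshold(stroke)
flat_mask = local_variance_threshold(flat)
```

Gating on variance keeps uniformly lit background from being thresholded into noise, which is one plausible reason to combine a variance measure with a threshold under uneven illumination.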
D M Manoranjan, J A Robinson, "Practical, low-cost
visual communication using binary images for deaf sign language", IEEE
Transactions on Rehabilitation Engineering, Vol 8, No 1, March 2000, pp 81-88.
Deaf sign language transmitted by video requires a temporal resolution of 8
to 10 frames per second for effective communication. Conventional
videoconferencing applications, when operated over low bandwidth telephone lines, provide
very low temporal resolution of pictures, of the order of less than a frame
per second, resulting in jerky movement of objects. This paper presents
a practical solution for sign language communication, offering adequate
temporal resolution of images using moving binary sketches or cartoons,
implemented on standard personal computer hardware with low cost cameras
and communicating over telephone lines. To extract cartoon points, an efficient
feature extraction algorithm adaptive to the global statistics of
the image is proposed. To improve the subjective quality of the binary
images, irreversible pre-processing techniques, such as isolated point
removal and predictive filtering, are used. A simple, efficient and fast
recursive temporal pre-filtering scheme, using histograms of successive
frames, reduces the additive and multiplicative noise from low cost cameras.
An efficient three dimensional compression scheme codes the binary sketches.
Subjective tests performed on the system confirm that it can be used for
sign language communication over telephone lines.
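Isolated point removal, one of the pre-processing steps named above, can be sketched as follows. This is a minimal illustration under assumptions: the abstract does not define the neighbourhood, so an 8-connected one is assumed here, and the function name and the list-of-lists image layout are hypothetical.

```python
from typing import List


def remove_isolated_points(img: List[List[int]]) -> List[List[int]]:
    """Clear every set pixel that has no set pixel among its 8 neighbours.

    The 8-neighbourhood is an assumption; the input is a binary image
    stored as rows of 0/1 values and is not modified.
    """
    rows = len(img)
    cols = len(img[0]) if rows else 0
    out = [row[:] for row in img]
    for r in range(rows):
        for c in range(cols):
            if img[r][c] != 1:
                continue
            # Look for any other set pixel in the 3x3 window around (r, c).
            has_neighbour = any(
                img[rr][cc] == 1
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            if not has_neighbour:
                out[r][c] = 0
    return out


sketch = [
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 0, 0],
]
cleaned = remove_isolated_points(sketch)
```

The step is irreversible, as the abstract notes: once the lone pixel at the top left is cleared, the decoder has no way to restore it.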
M Farimani, J A Robinson, "Multipoint Activity
Sharing Services for Fast Prototyping of Groupware Applications",
International
Journal of Computers and Applications, Vol 22, No 1, January 2000, pp
23-28.
Activity
Sharing systems allow existing application programs to be displayed on
multiple computers, while their users interact simultaneously with the
shared applications. In this paper we introduce a new role
for Activity Sharing -- as a platform to provide services for sharing
visual
information and managing multiple users' input. "Aware" application
programs that share information and interact with several users
concurrently
can be developed efficiently using these services. Such programs
have a high degree of interworking capability and can be prototyped
quickly.
J A Robinson, "Engineering Thinking and
Rhetoric", Journal of Engineering Education, Vol 87, No 8, July 1998, pp
227-229.
Engineers seek optimal solutions to problems.
Often,
though, the constraints of the problem and the solution criteria are of
several, qualitatively different types, and there is no formal way to find
the best trade-offs. Nevertheless, engineers make judgments and provide
explanations to justify their choices. Engineering thinking and rhetoric
is the development of such explanations that identify and validate a
particular
solution as the best. Engineering thinking involves analogical reasoning
as well as deduction. This implies that in teaching engineering,
descriptive
case-based examples are important to the student as source analogs for
problem solving.
J A Robinson, J Fischl, B Miller, "Muscle-based
analysis/synthesis video coding", Canadian Journal of Electrical and
Computer
Engineering, Vol 23, Nos 1-2, 1998, pp 23-30.
We present
an analysis/synthesis coding scheme for visual telephony, based on a wire
frame and simulated muscle model of the human head. Anatomically accurate
muscles control facial synthesis from a stored texture map, providing
economical
representation of human facial expressions. Our main contribution is the
development of a steepest-descent analysis algorithm that accurately and
robustly tracks head movement and facial expression in a moving sequence,
in terms of muscle parameters. Coding of the head part of the "Miss
America"
sequence is achieved at below 1,000 bits/frame, with most of the data
allocated
to texture updates for the eyes and mouth.
J A Robinson, V M Liang, J A M Chambers, C
L MacKenzie, "Computer User Verification Using Login String Keystroke
Dynamics",
IEEE Transactions on Systems, Man and Cybernetics -- Part A: Systems and
Humans, Vol 28, No 2, March 1998, pp 236-241.
The keystroke dynamics of a computer user's login string provide a
characteristic pattern that can be used for identity verification. Timing
vectors for several hundred login attempts were collected for 10 'valid'
users and 10 'forgers', and classification analysis was applied to
discriminate between them. Three
different classifiers were applied, and in each case the key hold times
were more effective features for discrimination than the interkey times.
Best performance was achieved by an Inductive Learning classifier using
both interkey and hold times. A high rate of typographical errors during
login entry is reported. In practice, these are usually corrected errors
- that is, they are strings which include backspaces to correct earlier
typos - but their presence confounds the use of typing-style analysis as
a practical means of securing access to computer systems.
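The two feature types compared above, key hold times and interkey times, can be computed from press and release timestamps as in the sketch below. The event layout and function name are hypothetical, and interkey time is assumed here to run from the release of one key to the press of the next; the paper may define it differently.

```python
from typing import List, Tuple

# One keystroke as (key, press_time_ms, release_time_ms); this layout is an assumption.
KeyEvent = Tuple[str, float, float]


def timing_vector(events: List[KeyEvent]) -> Tuple[List[float], List[float]]:
    """Return (hold_times, interkey_times) for one login attempt."""
    # Hold time: how long each key stays down.
    hold = [release - press for _, press, release in events]
    # Interkey time: release of one key to press of the next.
    # It is negative when consecutive keystrokes overlap.
    interkey = [
        events[i + 1][1] - events[i][2] for i in range(len(events) - 1)
    ]
    return hold, interkey


attempt = [("j", 0.0, 90.0), ("o", 150.0, 230.0), ("h", 300.0, 370.0)]
hold_times, interkey_times = timing_vector(attempt)
```

A login string of n characters yields n hold times and n-1 interkey times, and a classifier could be trained on either set or on both.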
J A Robinson, "Efficient General-Purpose Image
Compression
with Binary Tree Predictive Coding", IEEE Transactions on Image
Processing,
Vol 6, No 4, April 1997, pp 601-607.
Binary Tree Predictive
Coding uses a non-causal, shape-adaptive predictor to decompose an image
into a binary tree of prediction errors and zero blocks. Fast compression
performance is comparable with JPEG for photographs, with GIF for
graphics,
and superior to the state of the art for composite images.