“Optical Music Recognition: Stroke Tracing and Reconstruction of Handwritten Manuscripts”
Kia Ng
University of Leeds
kia@kcng.org
Nowadays, the computer is an important instrument in music. It can not only
generate sound (audio synthesis) but is also able to perform a wide range of
time-consuming and repetitive tasks, such as transposition and part extraction,
with speed and accuracy. However, a score must be represented in a
machine-readable format before any operation can be carried out. Current input
methods, such as using an electronic keyboard, are laborious and require human
intervention. Optical Music Recognition (OMR) provides an efficient and
automatic method to transform paper-based music scores into a machine
representation.
The potential benefits of an Optical Music Recognition system were recognised
over thirty years ago. A robust OMR system can provide a convenient and
time-saving input method to transform paper-based music scores into a machine
readable format for widely available music software, in the same way as Optical
Character Recognition (OCR) is useful for text processing applications.
In this paper, we present a brief survey of Optical Music Recognition
developments and currently available commercial packages for printed music
scores, followed by an introduction to handwritten manuscript recognition with a
discussion of the obstacles associated with it. We then discuss our framework
design and low-level pre-processing modules, and illustrate the modules with
example inputs and outputs. Our prototype takes a digitised music-score grey-level
image (300 d.p.i. with 256 grey levels) as input. An iterative thresholding method
is used to obtain a threshold value, and the image is binarised. Using the black
and white image, the skew of the input image, usually introduced during the
digitisation process, can be automatically detected by reference to the
music-typographical features of the roughly parallel stave lines, and the image
is deskewed by rotation. This is followed by layout analysis to determine the
general normalisation factors, including the stave line thickness.
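The paper does not give the exact thresholding formula, but an iterative threshold selection of the kind described (in the style of the isodata/Ridler-Calvard method) can be sketched as follows; the stopping tolerance and the dark-pixels-are-ink convention are assumptions, not details from the paper:

```python
import numpy as np

def iterative_threshold(gray):
    """Iteratively place the threshold midway between the mean of the
    'ink' (dark) pixels and the mean of the 'paper' (light) pixels."""
    t = gray.mean()                      # initial guess: global mean
    while True:
        fg = gray[gray <= t]             # assumed: darker pixels are ink
        bg = gray[gray > t]              # lighter pixels are paper
        new_t = 0.5 * (fg.mean() + bg.mean())
        if abs(new_t - t) < 0.5:         # assumed convergence tolerance
            return new_t
        t = new_t

def binarise(gray):
    """Binarise a grey-level image: True = ink, False = background."""
    return gray <= iterative_threshold(gray)
```

The midpoint update converges quickly on document images, since ink and paper form two well-separated intensity clusters.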
Stave line thickness and inter-stave-line spacing are used as normalisation
factors, as stave lines form a grid system for musical symbols, of which most
are related to the geometry of the staves: for example, the height of a note
head must approximate the distance between two stave lines plus the thickness of
the two stave lines. Hence, the sum of the average distance between two stave
lines and the average stave-line thickness forms the fundamental unit used by
the classification process.
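As an illustration of how these normalisation factors might be measured, the following sketch estimates stave-line thickness and inter-stave-line spacing from vertical black/white run lengths in the binarised image; the most-common-run heuristic is a plausible assumption, not necessarily the paper's exact method:

```python
import numpy as np
from collections import Counter

def staff_metrics(binary):
    """Estimate stave-line thickness and inter-line spacing from vertical
    run lengths. On a staff image, the most common black run approximates
    line thickness; the most common white run approximates spacing."""
    black_runs, white_runs = Counter(), Counter()
    for col in binary.T:                 # scan each pixel column, top to bottom
        run, val = 0, col[0]
        for px in col:
            if px == val:
                run += 1
            else:
                (black_runs if val else white_runs)[run] += 1
                run, val = 1, px
        (black_runs if val else white_runs)[run] += 1
    thickness = black_runs.most_common(1)[0][0]
    spacing = white_runs.most_common(1)[0][0]
    return thickness, spacing, thickness + spacing   # fundamental unit
```

Taking the modal run length makes the estimate robust to occasional thick symbols or noise crossing the staff.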
This paper focuses on our stroke-based segmentation approach, which uses a hybrid
of edge-detection techniques and mathematical morphology. We make use of the edges,
curvature and variations in relative thickness to extract and segment the
musical objects from the underlying grid (stave lines) and disassemble them into
lower-level graphical primitives, such as vertical and horizontal lines, curves,
ellipses and others. These primitives are classified using a k-Nearest-Neighbour
(kNN) classifier with simple features such as the aspect ratio, normalised width
and height, and other feature vectors. After recognition, these sub-segmented
primitives need to be reconstructed.
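A minimal kNN classifier over such feature vectors can be sketched as follows; the training examples, feature values, and class labels here are hypothetical placeholders rather than the paper's actual training data:

```python
import numpy as np

# Hypothetical training set: (aspect_ratio, normalised_width, normalised_height)
# per graphical primitive, with features normalised by the fundamental unit.
TRAIN = [
    ((0.1, 0.1, 1.0), "vertical-line"),    # tall and thin: stems, bar lines
    ((10.0, 1.0, 0.1), "horizontal-line"), # wide and flat: beams, ledger lines
    ((1.3, 0.35, 0.27), "ellipse"),        # note heads: slightly wider than high
]

def knn_classify(feat, k=1):
    """Vote among the k nearest training samples by Euclidean distance."""
    ranked = sorted(TRAIN,
                    key=lambda t: np.linalg.norm(np.subtract(feat, t[0])))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)
```

With normalised features, the same trained classifier applies across scores written at different staff sizes.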
As with other forms of optical document analysis, such as OCR, imperfections
introduced during the printing and digitising process that are normally
tolerable to the human eye can often complicate the recognition process.
Musical notation is highly interconnected, and features may connect horizontally
(for example beams), vertically (for example chords) or sometimes be overlaid
(for example, slurs cutting through stems or bar lines). Furthermore, when
symbols are grouped (beamed), they may vary in shape and size: for example,
consider the shape of isolated semiquavers and the many possible appearances of
four-semiquaver groups. To resolve any ambiguities, contextual information is
required, for example, a dot, classified by the kNN classifier, could be an
expression sign or a duration modifier depending on its relative position with
respect to a note-head nearby.
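The dot example above can be expressed as a simple contextual rule; the distance thresholds (measured in the fundamental unit) and the coordinate convention below are illustrative assumptions, not the system's actual rule set:

```python
def classify_dot(dot_x, dot_y, head_x, head_y, unit):
    """Disambiguate a dot by its position relative to the nearest note head.
    Assumed convention: x grows rightward, y grows downward, distances in
    pixels, `unit` = stave-line spacing + thickness (the fundamental unit)."""
    dx, dy = dot_x - head_x, dot_y - head_y
    if 0 < dx <= 2 * unit and abs(dy) < 0.5 * unit:
        return "duration-dot"            # to the right, same height: dotted note
    if abs(dx) < 0.5 * unit and abs(dy) >= 0.5 * unit:
        return "staccato"                # directly above or below the head
    return "unknown"                     # defer to higher-level analysis
```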
After classification and reconstruction, basic musical syntax and a number of
high-level musical analysis techniques are employed to enhance the recognition:
the reconstructed results are re-examined in the light of this analysis, and
corrections and enhanced guesses are made where necessary. We attempt to detect
global information such as the key and time signatures and use them to provide
evidence in the detection and correction of possible mis-recognition. It is
clearly important that the OMR system should accurately detect the time
signature of a piece or section of a piece of music in order to produce an
accurate representation of the musical score.
Many factors in the input source (for example, resolution, contrast, and other
inherent complexities) could influence the performance of the automatic
recognition and transcription process, especially for handwritten manuscripts.
In order to provide flexible and efficient transcription, we are designing a
graphical user-interface editor with built-in basic musical syntax and some
contextual intelligence to resolve uncertainties and to assist transcription and
output. The default output format is currently set to ExpMIDI, which is
compatible with the standard MIDI file format, and is capable of representing
and storing expressive symbols such as accents, phrase markings and others.
We believe that the application of domain knowledge is essential for complex
document analysis and recognition, and it is particularly important in
handwritten-manuscript recognition, since it takes years of experience for a
trained copyist or engraver to intelligently decipher poorly or inconsistently
written scores.