Regularity

Gerry Wolff gerry at sees.bangor.ac.uk
Fri Jan 8 04:05:32 PST 1999


Dear Detlef and mail list members,

Thanks again for your several points. Perhaps I have been too much
preoccupied with trying "walk before I try to run" and have not given
enough attention to the kinds of issues you raise. That said, I have
given some thought to those kinds of issues and will say something below
(after quoting from your last message). I'll take it in two stages: "How
'functional' regularity may be related to 'relatively frequent
occurrence of patterns'" and "How a learning system might
invent/discover the concept of addition".

HOW 'FUNCTIONAL' REGULARITY MAY BE RELATED TO 'RELATIVELY FREQUENT
OCCURRENCE OF PATTERNS'

> > This idea goes hand-in-hand with the working hypothesis
> > that *all* kinds of knowledge might be analysed in terms
> > of the principle which is used in all 'standard' methods for
> > compression: "identify relatively large patterns which
> > repeat 'relatively often'..."
> 
> I wish you could widen this hypothesis. As I tried to show in "Regularity",
> there might exist regularities (I think, you agree, knowledge can be treated
> being something regular) which cannot been found looking for repeated
> occurrence of patterns only. And there may arise some problems from the
> attempt modelling such "functional" knowledge on machines which are
> specialised in repeated patterns. If the structure of a model does not match
> the structure of the process to be modelled, it is an awful procedure to
> reflect the process in the model "turning knobs" only.
> 
> > In this kind of material, a regularity is anything that
> > repeats 'relatively often'.
> 
> "Repeated occurrence of entities in space/time" is a sub-set of regularity.
> It will result in a specialised class of functions
> An ... Fn (see "Regularity") and in a specialised kind of detector.
> (It might be attractive reflecting on how this function class could be
> described mathematically. Certainly, some NN's mathematical apparatus would
> match.)
> 
> Humans highly prioritise this "repetitive" regularity for biological
> reasons. Our vision, e.g., will isolate a moving object from the background
> by its repeated occurrence in the scene. Subconsciously, we prefer this kind
> of regularity because we must not "think" about it, it was "built in" by
> evolution or is learnt early in life.
> 
> Detecting "functional" regularity, which I suggest to not ignore, is much
> more difficult. We do not have a tuned detector at hand (as in the case of
> repetitive regularity). Our compression daemon must tune a detector instance
> (induce the rule) for a given environment snapshot. This takes massively
> longer than detecting repetitive regularities, applying pre-configured
> detectors. You feel tired after a while because it consumes large amounts of
> "CPU". That's why it takes a lot of self-discipline to force oneself to this
> kind of cognitive activity.
> 
> I wish I could see a way making an SP machine induce
> XYZ   Z=X+Y
> ---
> 54
> 21
> 76
> 
> ("inventing" addition) from
> XYZ
> ---
> 549
> 213
> 76D

Algorithmic Information Theory defines randomness in terms of
compressibility: if a body of data can be expressed by a computer
program which is shorter than the data, then the data is compressible
and is not random. If no such program can be found then, pending the
discovery of a program which can compress the data, we may regard the
data as random.

In these terms, a progam (or 'function') which compresses some data may
be regarded as an expression of 'regularities' in the data. This seems
to correspond with the kind of 'functional' regularity you are
discussing.

Why should this have any connection with the concept of regularity as
"relatively frequent repetition of patterns"? I'll try to identify the
things that suggest to me that it does:

1 If you look at computer programs (in almost any high level language)
with the right 'spectacles' it is possible to recognise programming
devices which are essentially the same as the devices used in 'standard'
techniques for data compression. For example:

* Isolating a function or sub-routine and giving it a name is very much
like the idea of identifying a repeating pattern and giving it a 'tag'
or 'code'. If a programmer finds that he or she has repeated a 'chunk'
of code in two different places, this is often a spur to pull out the
chunk and make it into a named function. This can be referenced
('called') from different parts of the program. Doing this makes the
program shorter.

* If we make a 'while' loop or a 'for' loop or a 'repeat ... until' loop
we avoid having to write out the code for each turn round the loop. This
can save a lot of space. It is essentially the same as 'run length
coding' in standard compression techniques.

* In a similar way, the use of class hierarchies with inheritance can
save a lot of space. This is essentially the same as a technique which
is sometimes called 'schema plus correction'.

I have made these kinds of points in Section 4.5 of my article about
"Computing, Cognition and Information Compression".

2 What is the difference between a 'well-structured program' or a
'badly-structured program'?

* Michael Jackson said that a 'well-structured' program is one that
'models' the structure of its input data and/or its output. For the case
we are considering (representing a body of data in compressed form),
there would be no input and the output should be the given body of data
in uncompressed form.

* In a similar way, it is recognised that, to be 'well-structured', an
object-oriented program (using classes and inheritance) must 'model' the
'real world' objects which are the source of inputs and/or outputs for
the program.

These principles are now widely accepted amongst software engineers.

3 Concepts like 'well structured' and 'model' are intuitive. What have
they got to do with information compression and the idea of "relatively
frequent repetition of patterns"? Here are some points that suggest that
they are connected:

* I think it is generally acknowledged amongst software engineers that a
'well structured' program is likely to be shorter than a
badly-structured program. I think Michael Jackson made this point
explicitly and it seems to be generally acknowledged amongst
object-oriented programmers. [There may be times when a program will run
faster if repeated applications of a chunk of code are written out and
not 'called' but, I think, this does not alter the basic principle that
well-structured programs are shorter than badly-structured programs.]

* It is not too hard to see that compression is greatest if the 'chunks'
which are identified in functions, loops or classes are relatively
frequent. For a given size, a named function which is used only 2 or 3
times will yield less compression than one which is used hundreds of
times.

* If the data which is to be compressed is this:
'ABCDABCDPQRABCDPQRPQRABCDPQRPQRPQRABCDABCD' then a program which is
modelled around these two chunks - 'ABCD' and 'PQR' - will yield more
compression than one which is modelled round chunks like 'DABC' or
'QRABC'. The first two are more frequent (6 and 6, respectively) than
the second two (2 and 3, respectively) - I think I have counted right!

4 There are many parallels between the organisation of a grammar for
natural languages (in formalisms like Context-Free Phrase-Structure
Grammar or Definite Clause Grammars) and the organisation of a computer
program. In fact Definite Clause Grammars are basically Prolog programs
with some 'syntactic sugar'.

4.1 That being the case, principles which apply to the 'induction' or
'discovery' of grammars are likely to carry over to the process of
forming well-structured programs.

4.2 Prominent amongst these principles is the recognition of patterns
which occur with a 'relatively high frequency'.

4.3 I recall now that an early version of the SP model (SP6) was
designed to do this kind of thing. It was/is limited to rather small
examples because the search method is horrendously inefficient. My
article called "Towards a new concept of software" (on my main Web page)
describes how it is possible to induce a 'program' ('function') using a
'regularity detector' based on the idea of searching for relatively
frequent patterns.

To summarise a bit:

* Programs or functions can be seen as devices for expressing data in a
compressed form.

* They do this best if they are 'well structured'.

* This seems to mean that the internal structure of the program is
modelled round patterns in the data or output which occur relatively
frequently.

HOW A LEARNING SYSTEM MIGHT INVENT/DISCOVER THE CONCEPT OF ADDITION

I think this needs to be tackled in three stages:

1 How can a learning system learn a concept of number?

2 How can addition be performed by ICMAUS?

3 Given a concept of number, how can a learning system learn the concept
of addition?

For questions 1 and 3, we shall assume that the learning is done by
unsupervised induction, not instruction by a teacher: the learner has
access to a set of positive examples and infers the concept from that.

1 How can a learning system learn a concept of number?

1.1 To make things simple, lets assume that the learner is to learn
numbers expressed in 'unary' notation (0 = 0, 1 = 1, 2 = 11, 3 = 111, 4
= 1111 etc).

The learner needs to see 'patterns' each of which associates some
non-symbolic representation of a number (eg a collection of apples) with
the corresponding symbolic representation. The learner may be presented
with the patterns like this:

  0
A 01
AA 011
AAA 0111
AAAA 01111
etc

('A' stands for 'apple')

Let's assume that the learner has managed to isolate each pattern from
its neighbours (either because it knows the conventions that there is
one pattern on each line or by some 'chunking' process as discussed
above).

Where is the repeating pattern within a set of patterns like those
above? What repeats is the discontinuous patten 'A ... 1'. I think I am
right in saying that the recursion can be captured with an 'SP' pattern
something like this:

# A # ; 1 ;

Given this pattern, together with a pattern representing '0' (something
like '# 0 ;'), any pattern like those above may be 'parsed' by multiple
alignment something like this:

  A   A   A   A   A    0   1   1   1   1   1
  |   |   |   |   |    |   |   |   |   |   |
  |   |   |   |   |  # 0 ; |   |   |   |   |
  |   |   |   |   |      | |   |   |   |   |
# A # |   |   |   |      ; 1 ; |   |   |   |
    | |   |   |   |          | |   |   |   |
    # A # |   |   |          ; 1 ; |   |   |
        | |   |   |              | |   |   |
        # A # |   |              ; 1 ; |   |
            | |   |                  | |   |
            # A # |                  ; 1 ; |
                | |                      | |
                # A #                    ; 1 ;

[IF THIS ALIGNMENT, AND SIMILAR ONES BELOW, DO NOT DISPLAY PROPERLY, IT
IS PROBABLY NECESSARY TO SAVE THE MESSAGE AS 'PLAIN TEXT' AND VIEW IT
WITH AN ORDINARY TEXT EDITOR.]

It should be possible to do similar things for binary numbers, decimal
numbers etc. Here is a partial answer to defining binary arithmetic:

It is quite easy to define binary arithmetic if we ignore 'semantics'
(ie the objects being counted). Here is a definition:

# 0 ;
# 1 ;
# # ; # ; ;

With this definition we can parse a binary number like this:

      1     0         1     1 
      |     |         |     |
    # 1 ;   |         |     |
    |   |   |         |     |
    |   | # 0 ;       |     |
    |   | |   |       |     |
    |   | |   |     # 1 ;   |
    |   | |   |     |   |   |    
    |   | |   |     |   | # 1 ;
    |   | |   |     |   | |   |
  # #   ; #   ; ;   |   | |   |
  |             |   |   | |   |
  |             | # #   ; #   ; ;
  |             | |             |
# #             ; #             ; ;

Likewise, we can define the 'semantic' side something like this:

@ %
@ A %
@ @ % @ % %

and 'parse' a collection of apples like this:

      A     A       A
      |     |       |
    @ A %   |       |
    |   |   |       |
    |   | @ A %     |
    |   | |   |     |
  @ @   % @   % %   |
  |             |   |
  |             | @ A %
  |             | |   |
@ @             % @   % %

I am not clear at present how to combine the semantics and the syntax so
that a collection of apples of a given size is assigned its correct
binary number. Answers will be gratefully received (and acknowledged if
used in future publications).

To summarise, the number system itself is highly redundant and this
redundancy manifests itself as recursive repetition of patterns. 

2 How can addition be performed by ICMAUS?

Again, to keep things simple, consider adding a 1-bit binary number to
another 1-bit binary number.

In this case, the 'function' can be defined as a table like this:

N1 N2   C R

0  0  : 0 0
1  0  : 0 1
0  1  : 0 1
1  1  : 1 0

Here, N1 and N2 are the two 1-bit numbers to be added, R is the result
and C is the 'carry'.

With a table like this, any addition may be performed by the formation
of an alignment like this:

0 1 :       - this is the 'input' to the addition
| | |
0 1 : 0 1   - the two digits on the right are the 'output' of the
addition.

(Why is the ':' symbol needed? If it is not there, there is too much
ambiguity about the 'correct' alignment of symbols.)

I am not going to attempt to do it here, but I think it is possible to
create 2-bit adders, 3-bit adders etc by a recursive 'cascade' of the
1-bit adder.

3 Given a concept of number, how can a learning system learn the concept
of addition?

In the case of the 1-bit adder, learning addition simply means learning
the table shown above. I think it is not too hard to see how a process
which is sensitive to patterns which have a relatively high frequency
should enable to the learner to pick out each row as a discrete 'chunk',
especially if, in every occurrence, the pattern were prefixed by the
symbol 'add' or '+'.

How can this leaning be generalised so that the system can add numbers
containing 2, 3 or more bits?

At present, I can't spell this out in detail but it seems to me quite
likely that, if the learner already knows the number system including
its recursive feature, it should be possible to apply this recursion so
that the 1-bit adder can be used repeatedly to perform the addition of
numbers with 2, 3 or more bits.

WOULD ANYONE LIKE TO TRY TO WORK OUT HOW THIS MIGHT BE DONE??? For
binary arithmetic it may be possible to work with a definition of the
syntax (like the one above) and ignore the semantics. As before, any
answers will be gratefully received and acknowledged if used in future
publications.

CONCLUSION

I have tried to describe some of the ideas which seem (to me) to support
the working hypothesis that 'functional' regularity may not be so very
different from other kinds of regularity and that learning functional
regularities probably needs a system which can detect patterns which
occur relatively frequently.

As usual, comments, reactions etc will be very welcome.

Best wishes,

Gerry



More information about the Casc mailing list