Regularity
Sergio Navega
snavega at ibm.net
Wed Jan 27 08:20:07 PST 1999
As usual, Gerry Wolff's message is both insightful and
thought-provoking.
This is Gerry's last paragraph:
>Having said all this, I do have an open mind! It could well be that
>there are kinds of regularity that don't fit into the framework I am
>trying to construct. As I said above, the repetition idea is just a
>working hypothesis. I will keep pushing it until it 'breaks'.
Gerry, I'm also confident that your hypothesis is correct, and I'm
also trying to push it to see if it breaks somewhere. So far,
I haven't seen any evidence that it will.
However, I do think we have to add something here to allow this
process to proceed cumulatively.
Back to the beginning:
> Gerry Wolff wrote:
>[snip]
>Well, this is just one simple example. What about z = x + y (closer to
>your example)? The table would look something like this:
>
>x y z
>1 1 2
>1 2 3
>2 1 3
>1 3 4
>3 1 4
>etc (very tedious!)
>
>As you say, it is not obvious where the repetition is here. Given just
>the table, how could the function z = x + y be inferred? Let's imagine
>that, like a mathematician, the inference/learning system already knows
>the number system. And let's imagine, just to make things simple for the
>sake of discussion, that the system knows that we are looking for a
>function z = f(x, y) and that f() is one of add, subtract, multiply or
>divide.
>
>Given this amount of help, it is not difficult to see how the function
>could be inferred: try +, -, x and / for the first row and make a note
>of which one fits (only + works here). Then do the same for the second
>row. Ditto. Likewise, for the third and subsequent rows.
>
>After we have done this for the finite number of rows that we have been
>shown, we see that the only function that works for every row is +. In
>short, + ***repeats*** on every row! Here is the repetition that we are
>discussing.
>
This is in the eye of the tornado! I guess we can find several other
examples with a similar outcome: repetition.
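Gerry's procedure can be sketched in a few lines of Python. This is only a minimal sketch under his stated assumptions (the learner already knows the number system and the four candidate functions); the names CANDIDATES, ROWS and fitting_ops are mine, chosen for illustration.

```python
import operator

# Candidate functions the learner is assumed to know in advance.
CANDIDATES = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

# Rows (x, y, z) copied from the table above.
ROWS = [(1, 1, 2), (1, 2, 3), (2, 1, 3), (1, 3, 4), (3, 1, 4)]

def fitting_ops(rows):
    """Return the names of the candidates that reproduce z on every row."""
    survivors = set(CANDIDATES)
    for x, y, z in rows:
        # Keep only the functions that still fit after this row.
        survivors = {name for name in survivors if CANDIDATES[name](x, y) == z}
    return survivors

print(fitting_ops(ROWS))  # {'+'}
```

The function that survives is the one that repeats on every row. Note that a single row may leave several candidates alive (x = 2, y = 2, z = 4 fits both + and *), so it is the accumulation of rows that settles the matter.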
What I think is missing in this scheme, to boot it up, is something that
stands for *learning*. By learning, in this context, I mean the ability
to *improve performance* as the number of experiences of the agent
grows.
To exemplify what I mean, I'll resort to a practical example.
Suppose we want to design an agent capable of interpreting simple
images. How can we teach this agent to recognize a tree? I'll
reduce the problem here to fit the space of this message (also
because I still have to work out other details of the problem :-)
Imagine that the following pattern is a symbolic representation of
the digitization of the edge of an image of a trunk of a tree
(taken as a single horizontal scan):
a
aaa
aa
aaaa
a
aaa
aa
aaaa
a
aaa
aa
aaaa
First, try to solve this by yourself. It's not difficult, although
it's not immediate either. I'm concerned with the process we may be
using to look for regularities in this pattern. Eventually, our
system will find that this pattern is made up of one irreducible
sequence, which I will call EEL (Edge Element) but which the system
may call, say, #SYM3242:
a EEL -+
aaa |
aa |
aaaa -----+
a EEL -+
aaa |
aa |
aaaa -----+
a EEL -+
aaa |
aa |
aaaa -----+
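One simple way to look for such an irreducible sequence is to try ever longer prefixes until one of them, repeated, rebuilds the whole scan. The Python below is only a sketch of that idea, assuming the scan is an exact repetition with no noise; the function name smallest_repeating_unit is hypothetical.

```python
def smallest_repeating_unit(rows):
    """Return (unit, count) for the shortest prefix whose repetition rebuilds rows."""
    n = len(rows)
    for size in range(1, n + 1):
        # A prefix can only tile the scan if its length divides the scan length.
        if n % size == 0 and rows[:size] * (n // size) == rows:
            return rows[:size], n // size
    return rows, 1  # only reached for an empty scan

# The 12-row edge pattern shown above.
scan = ["a", "aaa", "aa", "aaaa"] * 3
unit, count = smallest_repeating_unit(scan)
print(unit, count)  # ['a', 'aaa', 'aa', 'aaaa'] 3
```

A real system would have to tolerate noise and partial repetitions, which this brute-force search does not.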
So this pattern can be subsumed into this sequence:
EEL
EEL
EEL
which is pretty regular. Found a regularity? Give it a name.
Say #SYM9982. But we can inform the system that this is
a "concept" with a *public* name: we call it EDGE. Now we
must go on and present to the system *other* examples of
EDGES:
a
aaa
aa
aaaa
a
aaa
aa
aaaa
"EEL EEL is also an EDGE", the system thinks. Its concept
will receive this added information (it will keep the exemplars
as separate instances). We move on with another exemplar:
a
aaa
aa
aaaa
a
aaa
aa
aaaa
a
aaa
aa
aaaa
a
aaa
aa
aaaa
a
aaa
aa
aaaa
a
aaa
aa
aaaa
"EEL EEL EEL EEL EEL EEL is also an EDGE".
This is the time for the system to conclude (inductively) that
EDGE is something comprised of a variable number of EELs:
[n:EEL] -> EDGE { n occurrences of EEL }
and that typical numbers for n are in the range of 2 to 6 (it
may revise this in the future).
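A rough Python sketch of this inductive step: the concept keeps one count per exemplar and reports the range of n seen so far. The class name RepetitionConcept and its methods are my own invention for illustration, not part of any existing system.

```python
class RepetitionConcept:
    """[n:unit] -> name, with the observed values of n kept as exemplars."""

    def __init__(self, name, unit):
        self.name = name
        self.unit = list(unit)
        self.counts = []  # one entry per exemplar seen so far

    def observe(self, rows):
        """Record an exemplar if it is a whole number of repetitions of unit."""
        size = len(self.unit)
        if not rows or len(rows) % size != 0:
            return False
        n = len(rows) // size
        if self.unit * n != list(rows):
            return False
        self.counts.append(n)
        return True

    def typical_range(self):
        """Range of n over the exemplars seen so far (open to later revision)."""
        return (min(self.counts), max(self.counts))

EEL = ["a", "aaa", "aa", "aaaa"]
edge = RepetitionConcept("EDGE", EEL)
for n in (3, 2, 6):  # the three exemplars presented above
    edge.observe(EEL * n)
print(edge.typical_range())  # (2, 6)
```

Keeping the raw counts, rather than just a minimum and maximum, is what lets the system revise its idea of "typical" when new exemplars arrive.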
Now imagine that the following pattern is a horizontal scan of the
central part of a trunk of a tree:
aa
a
aa
a
aa
a
aa
a
By a similar process, our system will conclude that the central trunk
is composed of:
[n:CT] -> TRUNK
where CT stands for "aa a" and TRUNK is the public name (given by us)
of the internal concept found by the system.
It is time to tell the system that a horizontal scan of a tree is
composed of two edges with a central trunk in the middle:
[TREE] -> [EDGE] [TRUNK] [EDGE]
What would we have if we let our system analyze the following sequence?
a
aaa
aa
aaaa
a
aaa
aa
aaaa
a
aaa
aa
aaaa
aa
a
aa
a
aa
a
aa
a
aa
a
aa
a
a
aaa
aa
aaaa
a
aaa
aa
aaaa
a
aaa
aa
aaaa
a
aaa
aa
aaaa
Seen for the first time, this would be obvious neither to the system
nor to us. But the system already knows about some concepts. So
besides initiating a thread doing the "hard work" (trying to find all
regularities from the start), it will also start *other* threads
in which *previously acquired concepts* are put to use. Eventually,
one of the threads (taking info from several others) will settle
on the following sequence:
EEL EEL EEL CT CT CT CT CT CT EEL EEL EEL EEL
[  EDGE   ] [     TRUNK     ] [    EDGE     ]
---------------- [ T R E E ] ----------------
Because the amount of work to reach this conclusion is much less
than the amount of work of the "hard approach" (because the
latter started from zero and because of the parallelism of
other threads recognizing elements very fast), this conclusion
will pop out much faster and the system will be able to recognize
the feature in a reasonable amount of time. I guess this is
not only efficient, but also neuropsychologically plausible.
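To make the shortcut concrete, here is a small Python sketch of recognition with previously acquired concepts. It is a deliberately sequential simplification (no threads, no fallback to the "hard work"), and the names UNITS, RUN_CONCEPT, TREE_RULE, tokenize and recognize_tree are hypothetical.

```python
from itertools import groupby

# Previously learned units, under the public labels used above.
UNITS = {"EEL": ["a", "aaa", "aa", "aaaa"], "CT": ["aa", "a"]}
# Which higher-level concept a run of each unit stands for.
RUN_CONCEPT = {"EEL": "EDGE", "CT": "TRUNK"}
# [TREE] -> [EDGE] [TRUNK] [EDGE]
TREE_RULE = ["EDGE", "TRUNK", "EDGE"]

def tokenize(rows):
    """Greedily cover rows with known units, trying the longest unit first."""
    tokens, i = [], 0
    by_length = sorted(UNITS.items(), key=lambda kv: -len(kv[1]))
    while i < len(rows):
        for name, unit in by_length:
            if rows[i:i + len(unit)] == unit:
                tokens.append(name)
                i += len(unit)
                break
        else:
            return None  # no known unit fits here
    return tokens

def recognize_tree(rows):
    """True if rows reduce to a run of EELs, a run of CTs, and a run of EELs."""
    tokens = tokenize(rows)
    if tokens is None:
        return False
    # Collapse each run of identical units into its higher-level concept.
    concepts = [RUN_CONCEPT[name] for name, _ in groupby(tokens)]
    return concepts == TREE_RULE

# Rows as listed in the scan above.
scan = UNITS["EEL"] * 3 + UNITS["CT"] * 6 + UNITS["EEL"] * 4
print(recognize_tree(scan))  # True
```

The saving comes from the fact that each step is a cheap comparison against something already known, instead of a fresh search for regularities over the raw rows.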
Obviously, this "method" has dozens of unaccounted details
but I find it appealing. One of the things I like most is
its "Gestalt" aspect.
Say you're given a phone number to call. It's this one:
(321) 212-298
When you fix your eyes on that number, something odd surfaces.
After a couple of seconds, you may wonder if there's a digit
missing, because your brain is used to finding things like:
(xxx) xxx-xxxx
and not
(xxx) xxx-xxx
For me, this is an unconscious and automatic mechanism that
works independently of our will, alerting us to everything
that "doesn't fit" the rules. "Doesn't fit" means "is not easily
explained by the rules we learned from previous experiences".
Some rules may be pretty fancy, with conditional situations,
specific exemplars, generalization, etc.
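The simplest version of this mechanism can be sketched in Python: reduce each number to its "shape" and flag any shape never seen before. This is only an illustration; the class ShapeMemory is hypothetical and the training numbers are made up.

```python
import re

def shape(text):
    """Abstract a string to its template: every digit becomes 'x'."""
    return re.sub(r"\d", "x", text)

class ShapeMemory:
    """Remembers the shapes of past experiences and flags unfamiliar ones."""

    def __init__(self):
        self.seen = set()

    def learn(self, text):
        self.seen.add(shape(text))

    def fits(self, text):
        return shape(text) in self.seen

memory = ShapeMemory()
for number in ["(555) 010-0199", "(555) 013-4477"]:  # made-up past experiences
    memory.learn(number)

print(shape("(321) 212-298"))          # (xxx) xxx-xxx
print(memory.fits("(321) 212-298"))    # False: a digit seems to be missing
print(memory.fits("(321) 212-2981"))   # True: hypothetical completed number
```

Real rules would of course be richer than a single template, as noted above, but the alarm is raised the same way: by a mismatch with what was learned.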
There is a subdiscipline of Psychology called Implicit Learning
that deals with similar situations. One line of experiments
uses "artificial grammars" and confirms experimentally
our brain's ability to capture "hidden regularities"
unconsciously. I guess this is all related and I'm pretty
confident that the ideas of cognition based on compression
are fundamentally important for the understanding of the
way we think.
Regards,
Sergio Navega.