Biological Foundations?

Sergio Navega snavega at ibm.net
Mon Sep 14 08:32:22 PDT 1998


Detlef Morgenstern wrote:
>Dear Sergio,
>
>In our "Design a brain!" discussion you wrote:
>> Exactly. But with a subtle difference: this "brain" could
>> develop its own software, based on its interaction with
>> the world.
>
>I see two distinct levels of "software" which must be considered here.
>
>(1)  The "application" level. This is a large library of laws (rules, causal
>dependencies) we build up/maintain during our lifetime, "experimenting with
>the world". We use these laws to model the world and predict what is going
>to happen when certain conditions (causes, assumptions) hold. Feed a law
>black box in your brain with "Cause_X1", and it will respond "Effect_X1".

>[snip]

That seems reasonable. I would just emphasize that things are a little bit
more complicated, because our world is uncertain and fuzzy. Generally we
don't have a single cause_x1 for a single effect_x1. We have a set of
causes with a probabilistic relationship between them and those causes may
give rise to a set of effects. The brain seems very good at deciding
which of the causes are relevant *in the current context*.

>
>(2)  The "firmware" level. It defines one basic operating principle of the
>brain - being able to abstract, applying some compression-like technique.
>This capability is "built-in". When a child is born, the child is "equipped"
>with it. We need not "learn" it (and I think we cannot learn it). We inherit
>it.
>

Yes, and that seems the essential point to understand: in which way our
neural mechanism is able to receive information from one side and produce
adaptive (and compressed) knowledge from the other.

>I agree, it is misleading to simply say "hardware", when I mean this (2)
>"inherited basic operating principle of hardware". And of course, you can
>"run" (simulate, emulate) this "firmware" principle in any other universal
>computing environment - be it hardware centered  or software centered. I
>wanted to point out, we must not forget about these two completely different
>operation levels of brain and that a lot of AI frustration comes from
>separating one from the other and from attempting to explain intelligence
>being EITHER (1) OR (2). (Or - even worse -  (3) ONLY, see below.)


We are now in a better position to understand this, thanks to the great
advances in neuroscience. When AI started in the mid-1950s, much of this
problem was simplified away. Even today there are attempts that do not
follow this bottom-up method (Lenat's CYC project is just one example).

>
>> The main problem is how to develop inductive machines
>> that can recognize when to stop.
>
>I see no problem at all. Induction is a transformation procedure which aims
>at increasing abstraction ratio of a "lawbase" (not a database!). Each step
>of transformation can be verified checking whether the stimulus->response
>functionality of the "lawbase" is still identical to what it was before the
>current transformation step. Abstraction ratio can be (reciprocally)
>measured in "storage resource usage" units. And induction can be stopped,
>when abstraction ratio reached a maximum (when we cannot find further
>transformation steps, which increase abstraction not losing functional
>integrity).
>

The great problem of induction is that the process of increasing abstraction
is not always sound. From a set of regular occurrences (patterns) we usually
have *dozens* of possible abstractions. Induction alone is very dangerous
because it leads to "mysticisms". On top of induction we must put other
methods (like deduction, analogical reasoning, etc.) to try to restrict
the wrong "ascending paths" of the reasoning.

>> The basic principle behind induction is that everything
>> that happened several times in the past will have a good
>> chance of happening in the future.
>
>I disagree. In many cases, we induce a law, having sensed *different*
>cause->effect associations, which did not repeat at all. We want to find out
>*what is common* to them even if they look very different. This, I think, is
>the powerful side of induction. Limiting it to repeated (in space and time)
>occurrence of similar sensations, we throttle its power nearly to the idle.

Finding what is common means finding things that happened earlier and are
equal to those that happen now. Inducing "laws" is only allowed once you
have seen enough occurrences of the premises to justify the creation of a
law. We can't forget that induction happens not only with respect to the
incoming sensations of the agent, but also between occurrences in which
what repeats is the sequence of "concepts". Concepts are abstractions that
the agent makes based on grouping (initial induction) of the sensory inputs
received. Another level of induction is applied *over those concepts* to
make one "rule" (law). Thus, it may appear at first that the agent is
jumping from a non-recurring situation to a conclusion, but the
non-recurring situation was itself the result of similar situations in the
past.

>This is, why
>(a) All known implementations of pattern recognition perform so poorly
>(b) Compression which only seeks similarities does not find abstractions
>    but leads to "binary soup".
>
>Is there some regularity in these observations?:
>0110001001000
>1101101111000
>Is it just "000" or "110" or "11" occurring repeatedly?
>
>This was one reason for me to ask 'What is regularity? How does a
>"regularity detector" work?' Is it simply "repeated occurrence of similar
>instances of something" or is it something more global?
>

You raised the most important point of our discussion.

The fundamental point here is that patterns do not occur only in raw
input signals. If you monitor the spikes of firing neurons in our brain
with an oscilloscope, you will see a sequence of pulses that barely seems
regular at all. However, these spikes may represent a cup, as seen by our
visual cortex. How is it possible to find a cup in that bunch of spikes?
It is because our neural structure applies inductive processes
*recursively*, not only to the original information but also to
abstracted concepts. See for example this pattern:

21211212122121121212

Is it regular? I made a program that discovers "concepts" within this
kind of pattern:

Original Input Pattern---------------------
21211212122121121212

Concepts Discovered------------------------
21        AA1AA2AA1AA2
12        2B1BBB2B1BBB
212       C11C12C11C12
121       2DD2122DD212
2121      E1E2E1E2
11        212F21212212F21212
211       21G2121221G21212
1211      2H212122H21212
112       212I1212212I1212
2112      21J121221J1212
1121      212K212212K212
1212      2121L122121L12
22        212112121M121121212
122       21211212N121121212
2122      2121121O121121212
221       212112121P21121212
1221      21211212Q21121212
2212      212112121R1121212

Most of the discovered concepts are not interesting. But let's take concept
'21' for a moment. If we rebuild the original pattern using the letter 'A'
to substitute for '21' we have this: 'AA1AA2AA1AA2' (letter B is used above
to substitute for '12', C for '212' and so on).
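
A minimal Python sketch of one way to produce a listing like the one above.
It is not the original program. It assumes that a "concept" is any distinct
substring of 2 to 4 symbols, listed in order of first appearance, and that
rebuilding replaces non-overlapping occurrences from left to right. The
names discover_concepts and rebuild are hypothetical.

```python
def discover_concepts(pattern, min_len=2, max_len=4):
    """Return each distinct substring of min_len..max_len symbols,
    in the order in which it is first completed while reading the pattern."""
    concepts = []
    for end in range(min_len, len(pattern) + 1):  # 'end' is exclusive
        for length in range(min_len, max_len + 1):
            start = end - length
            if start < 0:
                break
            candidate = pattern[start:end]
            if candidate not in concepts:
                concepts.append(candidate)
    return concepts


def rebuild(pattern, concept, symbol):
    """Rewrite the pattern, replacing non-overlapping occurrences of the
    concept (scanning left to right) with a single symbol."""
    return pattern.replace(concept, symbol)


pattern = "21211212122121121212"
# One letter per concept; this input yields 18 concepts, hence A..R.
for concept, symbol in zip(discover_concepts(pattern), "ABCDEFGHIJKLMNOPQR"):
    print(f"{concept:<10}{rebuild(pattern, concept, symbol)}")
```

Under these assumptions the first line printed is the '21' row, with the
pattern rebuilt as AA1AA2AA1AA2.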

Now, apply *the same process* to the pattern of concept A. We obtain:

Original Input Pattern---------------------
AA1AA2AA1AA2

Concepts Discovered------------------------
AA        a1a2a1a2
A1        AbAA2AbAA2
AA1       cAA2cAA2
1A        AAdA2AAdA2
A1A       AeA2AeA2
AA1A      fA2fA2
1AA       AAg2AAg2
A1AA      Ah2Ah2
A2        AA1AiAA1Ai
AA2       AA1jAA1j
1AA2      AAkAAk
2A        AA1AAlA1AA2
A2A       AA1AmA1AA2
AA2A      AA1nA1AA2
2AA       AA1AAo1AA2
A2AA      AA1Ap1AA2
2AA1      AA1AAqAA2

(here, the concept AA has been called 'a', so the rebuilt input pattern
became a1a2a1a2)

Now we go through another cycle and find that a1a2a1a2 can be seen (among
a dozen other uninteresting things) as the grouping of two concepts 'Y',
where Y = 'a1a2'. This leads us to rebuild that pattern as just 'YY',
which can be further reduced to 'Z', where Z is obviously 'YY'. Bingo! We
have reduced the initial pattern to the 'one concept' level. Of course,
this reduction happened because our initial pattern was *very* regular. In
the real world that rarely happens. But it serves our purpose of
investigating regularities at multiple levels. Each level shows us another
set of regularities that may even be "invisible" when we look at the
original pattern. Induction works in the same way. We often induce things
not directly from the sensory input we receive, but from *abstractions* we
assemble from those inputs. Now, let's summarize what my algorithm
discovered, but from conclusion to premises (reverse expansion):

Z
(YY)
((a1a2) (a1a2))
((AA 1 AA 2) (AA 1 AA 2))
((2121) 1 (2121) 2) ((2121) 1 (2121) 2)
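
The chain above can be replayed mechanically. The sketch below is only an
illustration in Python, not the original algorithm: it takes the path
(21 -> A, AA -> a, a1a2 -> Y, YY -> Z) as given rather than searching for
it, and the names PATH, reduce_along and expand are hypothetical.

```python
# Each step is (new symbol, the lower-level sequence it stands for).
PATH = [("A", "21"), ("a", "AA"), ("Y", "a1a2"), ("Z", "YY")]


def reduce_along(pattern, path):
    """Apply each substitution in turn; return the pattern at every level."""
    levels = [pattern]
    for symbol, body in path:
        pattern = pattern.replace(body, symbol)
        levels.append(pattern)
    return levels


def expand(compressed, path):
    """Undo the substitutions, from the most abstract symbol downwards."""
    for symbol, body in reversed(path):
        compressed = compressed.replace(symbol, body)
    return compressed


levels = reduce_along("21211212122121121212", PATH)
print(levels)             # from the raw pattern down to 'Z'
print(expand("Z", PATH))  # back to the raw pattern
```

Expanding 'Z' recovers the original 20 symbols, so no information is lost
along this particular path.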

Simple, isn't it? What's the lesson here? I dare say that the lesson
here is the *path* we have taken to get there. This path is, obviously,
not unique. From our example we have dozens of other paths that lead us to
"compressed" patterns with similar regularity.

But these paths are *very* few compared to the thousands of possible
expansions of all concepts. As time passes and the agent receives new
information, the set of paths that lead to good results receives
*reinforcements*. These paths will survive, and all the others will
vanish. With time, this set of paths will provide a model of the "world"
that generated the patterns (here I'm assuming that the patterns mean
something, that they came from a relatively regular and coherent world, such
as the one we live in). So, the computational cost of expanding all the
paths will decrease with time. Initially, the agent knows nothing and for
that reason it must explore all alternatives. This leads to a combinatorial
explosion that demands a lot of computational effort. The good news is that
this cost drops dramatically, as the agent is able to use previously
acquired knowledge to simplify the tasks at hand.
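
One way to picture this reinforcement is a weight per concept that decays
on every round and is topped up whenever the concept proves useful. This is
a hedged Python sketch: the function name reinforce, the reward, decay and
floor values, and the two-concept example are all assumptions made for
illustration.

```python
def reinforce(weights, useful, reward=1.0, decay=0.5, floor=0.1):
    """Decay every concept's weight, reward the useful ones, and drop
    any concept whose weight falls below the floor."""
    updated = {}
    for concept, weight in weights.items():
        weight = weight * decay + (reward if concept in useful else 0.0)
        if weight >= floor:
            updated[concept] = weight
    return updated


# Two candidate concepts start out equal; only '21' keeps paying off.
weights = {"21": 1.0, "22": 1.0}
for _ in range(4):
    weights = reinforce(weights, useful={"21"})
print(weights)
```

With these made-up settings the unused concept '22' is pruned on the fourth
round, so later rounds have one less alternative to expand.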

This is very similar to what happens in our brain. Neural plasticity is
the growth of synaptic connections, the joining points between neurons.
They grow like branches from a tree. During initial childhood these
connections are very plastic, extending in all directions in a somewhat
random way.

As the child matures, there is a noticeable reduction of this plasticity,
and what remains are the "surviving paths", the connections that presented
the greatest payoff. Neural plasticity continues during our entire life,
but on an increasingly reduced scale. I think this is because after some
time the child grabs the "essence" of the things of the world, something
that is usually called "common sense". Neural plasticity continues to
exist to support the learning of adults. The process repeats when we are
presented with an unknown subject and must study it.

The relation of these neurological facts with my experiments is still
speculative, and this is at the center of my current research.
But I think the similarity is promising.

>You might object, only after having seen one and the same cause->effect
>association several times, we may be sure "there is really some law behind
>it". Disagree again. If it occurred at least once, there was a law behind
>it. What we forget is, there was some context in which we observed the
>event. And this context provides part of the law. If we do not sense the
>context we cannot find "the law". Doing things repeatedly (in different
>context!) or demanding that similar events must be observed repeatedly is
>just to make sure we did not ignore a hidden context. It is simply for
>feeling more comfortable about our observation & prediction precision. But
>it has nothing to do with the precision of induction, which - if correctly
>applied - will be 100%. You say it a bit different, but aim at the same:
>
>> Then, this allows us to keep expectancies about the future.
>> When those expectancies doesn't fulfill, this may inform us of
>> something important. This deviation may mean that we need
>> to take into consideration additional factors in our prediction,
>> which will lead us into revising our original induction.
>
>> When this process is applied for some time, we will end up
>> with a lot of "rules" that hold with great probability. These rules
>> are the building blocks of our theories and these rules are best
>> manipulated using deductive reasoning. For me, deductive
>> reasoning can only be applied after initial application of induction.
>
>Mostly agree, but in my view, we can apply "entry level deduction" even
>after having seen only lots of cause->effect associations, still not having
>compacted (induced from) them. Once we "know" the (flat) cause->effect
>association, and we see the cause, we can deduce this very effect. Of
>course, this is deduction in idle gear. It will develop its power only on
>induced knowledge, helping predict even effects for causes never observed
>before.
>

I mostly agree with that. The point I want to emphasize is that I'm not
against using some deduction in the initial levels of comprehension, as
long as the "rules" being employed are *tentative*. That means we will
have lots of tentative deductions happening, and the ones that will
"survive" will be those with the best inductive support with respect to
the remaining knowledge of the agent.

>This is the answer to the above regularity question. The law is:
>0110  0010  01000     // 06+02=08
>1101  1011  11000     // 13+11=24
>
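
Read as binary numbers, both quoted rows do check out as additions. A small
Python check, assuming each 13-symbol observation splits into two 4-bit
operands and a 5-bit sum (the names split_observation and is_binary_sum are
hypothetical):

```python
def split_observation(observation):
    """Split a 13-symbol observation into operand, operand, sum."""
    return observation[:4], observation[4:8], observation[8:]


def is_binary_sum(left, right, total):
    """True when the two operands, read in base 2, add up to the total."""
    return int(left, 2) + int(right, 2) == int(total, 2)


for observation in ("0110001001000", "1101101111000"):
    left, right, total = split_observation(observation)
    print(left, right, total, is_binary_sum(left, right, total))
```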

This regularity can be discovered by the agent as long as it knows the
concepts associated with arithmetic operations. And the agent will
discover these concepts from even simpler patterns, all the way down to
counting, which seems to be the first step toward mathematical knowledge.
Knowing the basic arithmetic operations allows the agent to discover the
"essence" of this pattern:

1 2 4 8 16 32...

Granted, it is a difficult task in computational terms, but it is similar
to the process I presented earlier. Now one could ask: what is the
importance of this ability?

Suppose this agent is presented with a table of experimental
data relating force, mass and acceleration. What AI must find is a *generic*
method of inducing the rules that relate these quantities, until the agent
discovers f=m*a. The situation is, obviously, more complicated in nature.
Here, we will have lots of variables to track and only a few of them will
be significant to our problem. The problem is huge, but our brain is the
living proof that it is solvable.
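
To make the task concrete, here is a deliberately naive Python sketch. It
is a brute-force search over a tiny family of candidate laws, nowhere near
the *generic* method asked for above. The measurements are invented for
illustration, 't' is a deliberately irrelevant variable, and the names
candidate_laws, squared_error and induce_law are hypothetical.

```python
import operator
from itertools import permutations

# Invented measurements: mass m, acceleration a, force f,
# plus an irrelevant variable t that the search should ignore.
ROWS = [
    {"m": 1.0, "a": 2.0, "t": 20.0, "f": 2.0},
    {"m": 2.0, "a": 3.0, "t": 25.0, "f": 6.0},
    {"m": 4.0, "a": 0.5, "t": 18.0, "f": 2.0},
    {"m": 3.0, "a": 3.0, "t": 30.0, "f": 9.0},
]

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def candidate_laws(variables):
    """Yield (name, function) for every 'x op y' over ordered variable pairs."""
    for x, y in permutations(variables, 2):
        for sign, op in OPS.items():
            yield f"{x} {sign} {y}", (lambda row, x=x, y=y, op=op: op(row[x], row[y]))


def squared_error(law, rows, target="f"):
    """Sum of squared differences between the law's prediction and the data."""
    return sum((law(row) - row[target]) ** 2 for row in rows)


def induce_law(rows, variables, target="f"):
    """Return the name of the candidate law with the smallest error."""
    best = min(candidate_laws(variables),
               key=lambda named: squared_error(named[1], rows, target))
    return best[0]


print("f =", induce_law(ROWS, ["m", "a", "t"]))
```

Even with three variables and four operators there are 24 candidates to
test, which hints at how quickly the search grows with real data.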

That's the real importance of induction and of "cognitive compression": to
help the agent perceive the initial correlations among some variables in
the bunch of data available. Intelligence, for me, is a measure of the
agent's ability to make sense of this flood of data.

>
>> The "logicist" approach (Newell, Simon, McCarthy) proposed
>> that intelligence was just the result of logic manipulations.
>
>We should be able to describe intelligence in terms of logic manipulations.
>Foundations of intelligence cannot be illogical, or logic-free. When we aim
>at finding a "basic principle", we need not look outside logic. I think we
>must completely move the focus. Instead of permanently adding complexity to
>logic on the higher (complex) levels, we must revise its firmware,
>introducing some new "particle" symmetry.
>

Detlef, we have similar thoughts, perhaps differing in nomenclature. If by
logic manipulations you mean "computationally implementable", then we agree.
But I don't see intelligence as being the result of "logic entailment" only.
In logic, we have some premises (antecedents) and one conclusion (consequent).
AI started with this kind of reasoning:

All men are mortal.               Man(x) -> Mortal(x)
Socrates is a man.                Man(Socrates)
So, Socrates is mortal.           Mortal(Socrates)

This is known as modus ponens, and is the starting point of the logic-based
approach to AI. We know that our world is much more complex and uncertain
than this.
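
For reference, this crisp style of inference takes only a few lines of
Python. The sketch below restricts itself to one-argument predicates and
rules of the form P(x) -> Q(x); the representation and the name
forward_chain are assumptions made for illustration.

```python
def forward_chain(facts, rules):
    """Apply modus ponens repeatedly until no new fact can be derived.

    facts: set of (predicate, subject) pairs, e.g. ("Man", "Socrates")
    rules: list of (premise, conclusion) pairs meaning premise(x) -> conclusion(x)
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in rules:
            for predicate, subject in list(derived):
                if predicate == premise and (conclusion, subject) not in derived:
                    derived.add((conclusion, subject))
                    changed = True
    return derived


facts = {("Man", "Socrates")}
rules = [("Man", "Mortal")]
print(forward_chain(facts, rules))
```

Every rule here either fires or does not, with no room for context or
degrees of belief.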

Regards,
Sergio Navega.





