Regularity

Brendan Macmillan bren at cs.monash.edu.au
Sat Jun 10 01:00:30 PDT 2000


Hi all,

>      p = 1/(the number of possible strings of
>                 the same length using the same 
>                 alphabet of symbol types)
[snip] 
> According to this formula, the probability of a regular string of symbols of
> a given length is the same as the probability of a random string of the same
> length.
> 
> This is strictly true, but will strike many people as being wrong. I think
> the reason that it seems wrong is that, in discussing these issues, it is
> easy to overlook the question of whether a given string is in its original
> form or whether it has been recoded to take advantage of regularities in the
> string.
Another reason why it may strike people as being wrong is because it is not
generally possible to compress a string, without first knowing something
about the kinds of "regularity" you expect in the string.

The problem is that any possible recoding will necessarily make some strings
longer - specifically, those strings which do not follow the pattern expected.

This is because any recoding scheme can be seen as a redistributions of
probabilities for each string.  Assigning some strings a higher probability
implies giving other strings a lower probability, as it all must add up to 1.

(An actual encoding scheme is helpful to verify this fundamental result.)

> A regular string is one that can be recoded to something shorter.  For
> example, the string "AAAAAAAAAAAAAAAAAAAA" can be recoded to something like
> "A[20]", meaning that it is the letter 'A' repeated twenty times. A random
> string is one that cannot be recoded like this.

My apologies - I'm sure this result has a name, but I've forgotten it.

Of course, it dooms neither compression nor prediction, because we usually
deal with strings in which some strings *are* more common than others.  That
is, we do know something about the kind of "regularity" expected.


Brendan
--
bren at csse.monash.edu.au                             BH:  +61 (3)  9905 1502
Email is checked daily                              Phone is rarely attended



More information about the Casc mailing list