Fundamental Compressionist Philosophy.
Andrew Stanworth
andrew.stanworth at bigfoot.com
Fri Jun 1 06:22:38 PDT 2001
Hi Detlef,
| It may be boring, but again: We need not compress the data.
| We must compress **the theory**!
|
| (This is one of those jewel ideas. It was right under my nose.
| I was lucky to find it, and now nobody cares. Strange.)
That's not true - I do care. So I will talk a little about this.
As far as compaction goes, there is 'lossy' and 'lossless'. 'Lossy' can be
thought of as a primary round of data conditioning to render it more
amenable (i.e., more compressible) to a second round of targeted lossless
compression (this is a broad generalisation intended for convenience).
Whether you want to view lossy compression as an adjunct to lossless
compression or not, what is under discussion is reducing the size of the
data set which the algorithm has been presented with. A third approach to
compression, which I view as 'functional compression', works in a
fundamentally different way. For the sake of argument, lets say that I have
a very good pseudo-random sequence generator (PRSG) which with the
appropriate combination of seed values and counters (or whatever - this is
just for ease of explanation!). I could compact a data set by interrogating
(theoretically) my excellent PRSG for matches to sequences in my target
data. In the end, instead of the compacted data being the distilled essence
of the original data - as with ordinary lossless compression - the compacted
data now bears no direct relationship to the it (i.e., it becomes instead a
list of parameters for generating the appropriate matching sequences from my
PRSG). (as an aside, if a perfect PRSG algorithm could be constructed, and
it were possible to search it exhaustively, it would then be theoretically
possible to recompact the data generated from each completed round until
arriving at the smallest it could possibly be (at least for any single
implementation of the PRSG algorithm)).
So what? It is just a different way of doing lossless compression - it
looks different, but in essence it still does the same job - it just encodes
the compacted object sequences differently. But now consider that the
equation of a straight line is similar to my PRSG in that from a list of
parameters supplied to it, it is possible to generate many specific
sequences in a deterministic fashion - it is as though all straight lines
(for the sake of this argument no further precision is necessary) have been
compacted within this equation - i.e., the functionality of straight lines
has been compacted to its minimum (I must point out, for completeness, that
there is an overhead incurred which is often overlooked, namely the method
of uncompacting the data (i.e. just because it takes less characters/symbols
to convey to information it does not necessarily follow that it is the most
compact when the method of extraction is included in the count) - which may
oftentimes be very complex and large), which can then be used to generate
any number of straight lines often in a much more efficient way (in terms of
raw data needed). A form of compaction using this may be seen when a bitmap
picture file is converted to some form of geometrical drawing program file -
which can quite often produce significant compaction.
| We must build a theory of how the data might have been generated.
| This can be an initially largely redundant theory. A long (lookup)
| table containing all samples which have been observed so far may
| serve as such an initial 'theory':
It is my understanding that you (Detlef) are interested in using compaction
to work backwards (in a sense) from the process of functional compaction I
have already described - in that you propose to begin with the uncompacted
data and search for the function/functions which most efficiently produce it
or its components (efficient in terms of smallest size) - this is my
understanding of your compressing **the theory** - and that such functions
should contain a 'description' of some aspects of the system, or its
components (assuming the original data come from observations of such a
system). Is this fair comment (as far as it goes) or have I misunderstood
you?
It also seems to me, that in addition to this general aim, you are
attempting
to use compaction to finding more efficient functions from within the output
of a particular function - rather than trying to find a replacement for
that function from outside of it - but again I am not sure if I am following
you correctly.
| When a significantly smaller theory can be found which reproduces the
| initial set, we have
| (1) an abstract representation of the law which generated the data
| (2) a means of predicting results for samples which until know had
| never been observed!!
2 doesn't necessarily follow from 1 - you might just end up with a smaller
1 - i.e., removing obscured redundancies from it. I see that what you need
is a data set which is the partial story and not the whole story to achieve
2 - the functionality extracted from it then being able to automatically
move beyond what is 'known' from that specific data set(s). It is worth
noting the thread that just because you find a function which is more
efficient (in a compacted sense) it does not necessarily follow that you
will be able to understand/recognise any extra functionality it may spit out
(e.g., consider that many mathematicians deliberately choose problems they
view as too abstract to have practical applications (i.e., which have no
known application) - which later on prove to be crucial when a particular
class of practical application appears).
Best wishes,
Andrew.
More information about the Casc
mailing list