[self-interest] compact encoding

Sat Aug 28 18:30:42 UTC 1999

I decided to check whether the compact encoding I proposed a while
ago (with two bits per instruction mixed in with the literals) was
a good idea or not.

So first I got me a list of all the methods in the system (actually,
their mirrors):

  enumerating all enumerate asList copyFilteredBy: [|:e|
                    e isReflecteeMethod]

This yields a list of 26320 elements in the normal Snapshot. Then,
in this list, I evaluated the following expression to see how
the coding would affect each method:

  | d <- dictionary copyRemoveAll |
  do: [ | :m. delta <- -4 |

      "delta starts out with -4 since we will no longer need a
       bytecode object for each method, and each bytevector object
       has 4 words of overhead"

      delta: delta - (m codes size /+ 4).

      "here we also discount the bytes themselves, converting them
       to a rounded up number of words"

      delta: delta + (m codes size /+ 15).

      "on the other hand, the instructions now add words to the
       literal vector. But now we pack them 15 to a word"

      m byteCodesDo: [ | :bci. :op. :lit |
          delta: delta + 1.

          "we suppose that every single instruction will need
           exactly one literal. Note that in the case of pushSelf
           that originally didn't need a literal, now we have
           changed it to implicitSend 'self' which does need a
           literal"

          op = bytecodeFormat opcodes resendOp ifTrue: [delta: delta+1].

          "by converting resends to use primitives, we will need
           an extra literal for them"

          op = bytecodeFormat opcodes return ifTrue: [delta: delta-1].

          "return instructions don't need a literal, so we correct
           our mistake here"
      ].
      delta: delta - m literals size.

      "since we want to know how many words are added to the literal
       vector, we have to discount how many were in there before"

      d at: delta Put: (d at: delta IfAbsent: 0) + 1.

      "count yet another method with delta amount of memory
       words added"
  ].
  d
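
For readers without a Self system at hand, the same bookkeeping can be
sketched in Python. This is only a model of the arithmetic above, not
code that runs against a snapshot: the names size_delta and ceil_div,
the opcode strings and the sample method are all made up for
illustration.

```python
from collections import Counter

def ceil_div(n, d):
    """Round-up integer division, standing in for Self's /+ operator."""
    return -(-n // d)

def size_delta(code_bytes, ops, literal_count):
    """Estimated change, in words, for one method under the new encoding.

    code_bytes:    size of the old bytecode vector, in bytes
    ops:           one opcode name per instruction (hypothetical names)
    literal_count: size of the old literal vector, in words
    """
    delta = -4                          # bytevector object overhead goes away
    delta -= ceil_div(code_bytes, 4)    # old bytecodes, 4 bytes per word
    delta += ceil_div(code_bytes, 15)   # opcodes now packed 15 to a word
    for op in ops:
        delta += 1                      # assume one literal per instruction
        if op == "resend":
            delta += 1                  # resends need an extra literal
        elif op == "return":
            delta -= 1                  # returns need no literal at all
    delta -= literal_count              # discount the literals already there
    return delta

# A made-up method: 8 one-byte instructions ending in a return, 5 literals.
sample = size_delta(8, ["send"] * 7 + ["return"], 5)

# Histogram of deltas, playing the role of the dictionary d.
histogram = Counter([sample, size_delta(1, ["resend"], 1)])
```

For the made-up method the terms are -4 - 2 + 1 + 8 - 1 - 5, so it
would shrink by 3 words.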

So now we have a 70-element dictionary that has as keys the number of
words that my encoding scheme would add to Self and as values the
number of methods that will increase by that amount. Some simple
expressions will reveal that:

number of methods that increase or stay the same:    959
memory increase due to these methods:               8200 words
number of methods that decrease in size:           25361
memory decrease due to these methods:             102380 words

net memory saved by the new encoding:              94180 words

Since that is about 368 KB at 4 bytes per word, it might turn out
to be a good idea after all.
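
The "simple expressions" are not shown here, so the following is just
one way such a summary could be computed from the histogram, sketched
in Python. The function name and the toy histogram are invented; only
the totals in the final checks are the figures reported above, and
4 bytes per word is an assumption taken from the /+ 4 in the script.

```python
def summarize(histogram):
    """Split a {delta_in_words: method_count} histogram into growth and shrinkage."""
    grew = [(d, n) for d, n in histogram.items() if d >= 0]
    shrank = [(d, n) for d, n in histogram.items() if d < 0]
    return {
        "methods_same_or_larger": sum(n for _, n in grew),
        "words_added": sum(d * n for d, n in grew),
        "methods_smaller": sum(n for _, n in shrank),
        "words_removed": -sum(d * n for d, n in shrank),
    }

# Toy histogram, invented for illustration (not the real 70-entry one).
toy = {-5: 10, -1: 4, 0: 3, 2: 2}
summary = summarize(toy)
net_saved = summary["words_removed"] - summary["words_added"]

# Consistency checks on the totals reported in the post.
total_methods = 959 + 25361         # should match the 26320 methods listed
net_words = 102380 - 8200           # should match the 94180 words saved
BYTES_PER_WORD = 4                  # assumption, see the lead-in
saved_bytes = net_words * BYTES_PER_WORD
```

The reported figures are consistent with each other: the two method
counts add up to 26320, and the net saving of 94180 words comes to
376720 bytes under the 4-byte assumption.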

Another space saving measure (I haven't tested this one to see
if it is any good) would be to allow singleton objects to be
their own maps. Only when a new object is copied for the first
time would we bother to separate it from the map. That might
save some memory for all these methods.

-- Jecel