[self-interest] Re: compact encoding

Stefan Matthias Aust sma at netsurf.de
Sun Aug 29 13:07:55 UTC 1999


Jecel Assumpcao Jr wrote:

>I decided to check out if the compact encoding I proposed a while
>ago (with two bits per instruction mixed in the the literals) was
>a good idea or not.

I think, this was your encoding:

    11  SELF_SEND_NEXT_LITERAL (or end of method marker, if no more
                                literals are available)

Instructions and literals both stored in one 32-bit word array.

I don't remember what your idea was when you've more than 15 instructions,
but I think, you could use the following algorithm:

You actually don't need the "00" instruction.  Non-local returns only occur
in block objects and are always the last instruction.  You could encode in
the block object whether it's a block with "^" non-local return or a normal
block.  Now you could use "00" as mark for the end of the instruction
stream.  If you now reach the 16th instruction and it's no "00", you will
use the word following the last used literal as source for the next
instructions.  Here's some (pseudo) Java-code to illustrate this. Note that
instructions are stored "the wrong way":

int ip = 0;  // instruction pointer
int lp = 0;  // literal pointer
loop:while (true) {
  int iw = method.words[ip];
  for (int n=0; n<16; n++) {
    int i = iw & 3;  // instruction
    iw >>>= 2; // always shifts-in 0 and doesn't propagate sign bit
    switch (i) {
    case 0: break loop;
    case 1: 
      int literal = method.worlds[++lp];
    ip = ++lp;
if (method.hasNonLocalReturn()) // ...
else //...


However, I wonder how your idea compares to this variant (which I wanted to
use for my mySelf interpreter):

The idea is to minimize the number of literal references, which need 4
bytes each time.  One need to identify the 63 most common literals,
selectors and/or primitive selectors.  These are directly encoded.  Only
otherwise a literal vector is used, encoded into the byte vector similar to
the above idea.

Each instruction is byte-sized, which also helps to efficiently interpret
them, I think.  Everything is stored in 32 bit words.  Upto four
instructions are followed by zero or more (at most four) literal
references.  Each instruction encodes a type in 2 bits and carries a 6 bit

0 special instruction
 - normal return
 - non-local return
 - push self
 - push special literal (nil, true, false, -1, 0, 1, 2, 10)
 and these are useful for interpreters
 which optimize a few control flow methods...
 - drop stack 
 - dup stack
 - conditional branch
 - unconditional branch
 - push next block
 and still 48 instructions free!

1 push literal
 - push next literal
 - push special literal (1..63)

2 stack send
 - send next literal
 - send special literal (1..63)
   (probably things like #traits, #at:, #clone, #copy, #size, etc.)

3 implicit self send
 - send next literal
 - send special literal (1..63)
   (same as with 2 send)

Either of the return instructions end the method's (or block's) execution.
This means, the shortest possible code is "push self", "normal return".

The method "clone = (cloneSize: 10)" whould be encoded as "implicit self
send"+#cloneSize:, "push special"+10, "normal return" - still one word
(assuming that cloneSize: is one of the special literals).  Jecels encoding
would need two words - as mine would without the special literal encoding.

It's difficult for me to guess what would save more space.

In any case, I agree that Jecel's encoding is an improvement over the old


>Another space saving measure (I haven't tested this one to see
>if it is any good) would be to allow singleton objects to be
>their own maps. Only when a new object is copied for the first
>time would we bother to separate it from the map. That might
>save some memory for all these methods.

I don't think that this saving is worth the additional time.  I don't know
the original self system, but I'd guess that oddballs aren't that common.


Stefan Matthias Aust  //  Bevor wir fallen, fallen wir lieber auf.

More information about the Self-interest mailing list