[self-interest] bytecode formats (was: Squeak and Self)

Thu Jan 14 19:28:31 UTC 1999

Stefan Matthias Aust wrote:
> We'd need slot objects, slot array objects and byte array object - which
> are all already supported by Squeak. The only difference is the map
> reference instead of the class reference.  However, I'm no expert for the
> ObjectMemory class and I've only a vague idea how the whole ObjectMemory
> works.

You are probably right that Self objects could be the same format
as Squeak objects with no problems at all. ObjectMemory is a lot
more complex than it needs to be in order to save space (several
different header formats, for example).

> >>2) copy Compiler and patch it so it will parse Squeak
> >>   sources into Self objects and bytecodes (this
> >>   involves throwing out a lot of stuff, but see step 4)
> 
> What kind of stuff?  Typically, a Smalltalk VM needs to support the
> following kinds of instructions:
> 
> SMALLTALK              SELF
> 
> push self              SELF
> push slot #            SELF SEND i_<slotname>
> push temp #            SELF SEND t_<tempname>
> push literal #         LITERAL #

I wouldn't add the "i_" and "t_" prefixes, though. If you define a
temporary variable with the same name as an instance variable in
a method, you have no way to access the instance variable at all.
Self gives you this shadowing automatically.

> Shared variables are accessed using variable binding objects to which
> #value and #value: are sent. I think, typical instruction sets have special
> instructions here, but this is only an optimization.  The same is true for
> "push constant" type instructions.

No need for this - shared variables (pool dictionaries, global
variables and so on) can be handled by making the class objects
have parent slots pointing to simple objects with the right slot
names. Then SELF SEND <global variable name> will do the trick.
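
A rough sketch of that lookup, in Python rather than Self. The Obj
class, its lookup method and the slot names are all made up for
illustration, and real Self lookup has more rules (ambiguity errors,
cycle detection) than this handles:

```python
class Obj:
    """Hypothetical stand-in for an object with slots and parent slots."""

    def __init__(self, slots=None, parents=()):
        self.slots = dict(slots or {})
        self.parents = list(parents)

    def lookup(self, name):
        # Own slots win; otherwise try each parent in turn.
        if name in self.slots:
            return self.slots[name]
        for parent in self.parents:
            try:
                return parent.lookup(name)
            except KeyError:
                pass
        raise KeyError(name)


# Simple objects holding the shared variables.
globals_obj = Obj({"Transcript": "<transcript>"})
pool = Obj({"Black": 0})

# The class object has parent slots pointing at them.
class_obj = Obj({"instanceCount": 0}, parents=[pool, globals_obj])
instance = Obj({"x": 1}, parents=[class_obj])

print(instance.lookup("Transcript"))  # found through the class's parents
print(instance.lookup("Black"))
```

No binding objects and no #value sends are involved: the global is
just another slot found along the parent chain.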

> store slot #           SELF SEND i_<slotname>:
> store temp #           SELF SEND t_<tempname>:
> 
> Store doesn't pop the stack. Two other instructions, dup and pop are
> typically used to optimize cascading and multiple assignments. There's no
> direct way to express this using the SELF bytecodes, however that's no
> problem as the stack will be always cleaned up when the method execution
> has been completed.

I think Mario's Smalltalk simply ignores cascades. They can be emulated
easily enough with hidden temporary variables:

   makeSilly
    "create a new initialized silly object"
      ^ Silly new center: 0@0;
                  color: Paint black;
                  border: 2;
                  pen: Pen new;
                  yourself

would become:

   makeSilly = ( "create a new initialized silly object"
        | cascTemp1 |
        cascTemp1: global_Silly new.
        cascTemp1 center: 0@0.
        cascTemp1 color: global_Paint black.
        cascTemp1 border: 2.
        cascTemp1 pen: global_Pen new.
        cascTemp1 yourself
   ).

You can always use "global pen" instead of "global_Pen". The important
thing is to make it different from "self pen"!
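
The rewrite is mechanical enough to sketch. This is Python working on
plain strings, not a real parser; desugar_cascade and its arguments
are hypothetical names, and the receiver and messages are assumed to
be already translated to Self syntax:

```python
def desugar_cascade(receiver, messages, temp="cascTemp1"):
    """Turn 'receiver m1; m2; ...' into statements on a hidden temporary."""
    lines = ["| %s |" % temp, "%s: %s." % (temp, receiver)]
    for i, message in enumerate(messages):
        is_last = i == len(messages) - 1
        # Every statement but the last one is terminated with a period.
        lines.append("%s %s%s" % (temp, message, "" if is_last else "."))
    return lines


# A shortened version of the makeSilly cascade.
body = desugar_cascade(
    "global_Silly new",
    ["color: global_Paint black", "border: 2", "yourself"],
)
print("\n".join(body))
```

Nested cascades would need a fresh temporary name each time, which
this sketch leaves to the caller.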

> send msg, argnum       [SELF] SEND msg (sends to self must be treated specially)

No - this will give you the wrong results. For a message sent to
self in Smalltalk you want this instead:

  PUSH_SELF
  SEND <msg>
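
To show the difference, here is a toy interpreter in Python. It is
only a sketch: objects are plain dictionaries, a unary send is just a
slot read, and run is a made-up helper. It also assumes that an
implicit-self send sees the method's temporaries before the
receiver's slots, which is the case that goes wrong:

```python
def run(bytecodes, receiver, temps):
    """Execute a tiny bytecode list and return the top of the stack."""
    stack = []
    for op, *args in bytecodes:
        if op == "PUSH_SELF":
            stack.append(receiver)
        elif op == "SELF_SEND":
            name = args[0]
            # Implicit receiver: temporaries are found first.
            stack.append(temps[name] if name in temps else receiver[name])
        elif op == "SEND":
            target = stack.pop()
            # Explicit receiver: only the target's own slots are searched.
            stack.append(target[args[0]])
        else:
            raise ValueError("unknown bytecode: %r" % (op,))
    return stack[-1]


receiver = {"size": 10}
temps = {"size": 3}  # a temporary that happens to share the name

implicit = run([("SELF_SEND", "size")], receiver, temps)
explicit = run([("PUSH_SELF",), ("SEND", "size")], receiver, temps)
print(implicit, explicit)
```

The first form picks up the temporary, the second goes to the object
itself, which is what "self size" means in Smalltalk.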

As for the control flow bytecodes, I was counting on removing them
from the parser (Smalltalk compiler) and having Jitter put them
back in. The total complexity would be roughly the same, though
performance-wise it is not attractive to move complexity to runtime
like this.

Neither the NON_LOCAL_RETURN nor the PUSH_SELF bytecodes need their
index field (always zero, currently), so up to 62 new bytecodes
could be added to Self. I've looked into this (the idea was to
make the bytecodes complete enough that primitives could be written
in them) and also into getting rid of a few of the 8 bytecodes. The
PUSH_SELF bytecode can easily be replaced with SELF_SEND 'self', so
it isn't really needed. The two resend bytecodes could be replaced
with primitives since they are pretty rare both statically and
dynamically (and tend to be used in initialization methods). I also
had the idea to merge the literal vector and the bytecodes so I wouldn't
need the EXTEND_INDEX bytecodes either. We would then need only two
bits per instruction:

    00  NON_LOCAL_RETURN
    01  PUSH_NEXT_LITERAL
    10  SEND_NEXT_LITERAL
    11  SELF_SEND_NEXT_LITERAL (or end of method marker, if no more
                                literals are available)

The very first literal would now hold the first 15 instructions. With
this scheme, when the same literal is used twice it must appear
twice in the literal vector. Repeated literals aren't as common as you
might think, however, so this idea might have some merit.
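
A minimal sketch of the packing in Python, under some assumptions
that are mine and not part of the scheme: the first instruction sits
in the highest bits of the code word, unused positions are padded
with 00, and only methods that fit in a single code word are handled.
pack and unpack are hypothetical helper names:

```python
NON_LOCAL_RETURN, PUSH, SEND, SELF_SEND = 0b00, 0b01, 0b10, 0b11
CODES_PER_WORD = 15  # from the text; the layout inside the word is assumed


def pack(instructions):
    """instructions: list of (opcode, literal); literal is None for
    NON_LOCAL_RETURN. Returns the merged literal vector."""
    codes, literals = [], []
    for op, literal in instructions:
        codes.append(op)
        if op != NON_LOCAL_RETURN:
            literals.append(literal)
    codes.append(SELF_SEND)  # end marker: 11 with no literal left
    if len(codes) > CODES_PER_WORD:
        raise ValueError("this sketch handles a single code word only")
    codes += [NON_LOCAL_RETURN] * (CODES_PER_WORD - len(codes))
    word = 0
    for code in codes:  # first instruction ends up in the highest bits
        word = (word << 2) | code
    return [word] + literals


def unpack(vector):
    """Inverse of pack for well-formed vectors."""
    word, literals = vector[0], list(vector[1:])
    out = []
    for i in range(CODES_PER_WORD):
        op = (word >> (2 * (CODES_PER_WORD - 1 - i))) & 0b11
        if op == NON_LOCAL_RETURN:
            out.append((op, None))
        elif op == SELF_SEND and not literals:
            break  # end of method marker
        else:
            out.append((op, literals.pop(0)))
    return out


program = [(SELF_SEND, "x"), (SEND, "print"), (PUSH, "@"),
           (SEND, "print"), (SELF_SEND, "y"), (SEND, "print")]
vector = pack(program)
print(format(vector[0], "030b"), vector[1:])
```

Since every code except 00 consumes the next literal in order, no
index field is needed, and so no index extension either.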

So the method

   print = (x print. '@' print. y print)

which is normally coded as

   literals:  0) 'x'
              1) 'print'
              2) '@'
              3) 'y'

   bytecodes: 0) SELF_SEND 0
              1) SEND 1
              2) PUSH_LITERAL 2
              3) SEND 1
              4) SELF_SEND 3
              5) SEND 1

could become:

   literals:  0) 11 10 01 10 11 10 11 00 00 00 00 00 00 00 00 00
              1) 'x'
              2) 'print'
              3) '@'
              4) 'print'
              5) 'y'
              6) 'print'

When you take into account header words, this second version is
the same size or shorter even though it has the nasty repeated
literals (which aren't very common, as I noted above).
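
Here is the arithmetic as a small Python sketch. The numbers are
assumptions for illustration only: 4-byte words, byte arrays padded
to whole words, and the same number of header words on every object.
The function names are made up:

```python
WORD = 4  # bytes per word, assuming a 32-bit machine


def separate_size(n_literals, n_bytecodes, header_words):
    """Literal vector plus a separate byte array of bytecodes."""
    literal_vector = (header_words + n_literals) * WORD
    padded = -(-n_bytecodes // WORD) * WORD  # round up to whole words
    byte_array = header_words * WORD + padded
    return literal_vector + byte_array


def merged_size(n_instructions, header_words):
    """One code word plus one literal per instruction (no returns here)."""
    return (header_words + 1 + n_instructions) * WORD


# The print method: 4 distinct literals, 6 bytecodes.
for header_words in (1, 2):
    print(header_words,
          separate_size(4, 6, header_words),
          merged_size(6, header_words))
```

Under these assumptions the two versions tie at 32 bytes with one
header word per object, and the merged one wins 36 to 40 with two,
because it saves a whole object header.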

Now this would be a nice format for MPEG-4, wouldn't it ;-) ?

-- Jecel
