[self-interest] bytecodes

Tue Oct 31 22:35:15 UTC 2000

On Tue, 31 Oct 2000, Marko Mikulicic wrote:
> Jecel Assumpcao Jr wrote:
> >     5  readLocal - access local slot
> >     6  writeLocal - change value of local slot
> 
> Why?
> Why can't the compiler figure out what is a local slot and inline the
> access, instead of using the parser?
> Is it to speed up the compiler?

I don't think there are any advantages for a compiler, only for an
interpreter.

> >     7  lexicalLevel - change what "local" means for previous
> >                             instructions
> 
> ??
> What are the possibilities?
> Are these some kind of registers for frequently accessed objects or
> are they used for local slots in the method activation?

By "previous" I meant the readLocal and writeLocal instructions listed
just before it, not the instructions that precede it in some method. If
you have something like:

          someMethod = ( | tmp <- 9. r <- 7 |
                 [ r < 10 ] whileTrue: [ | a <- 3 |
                         r: r + a.
                         r > tmp ifTrue: [ | z <- 0 | tmp: a + z ].
                 ]
          )

This isn't supposed to make any sense at all. But if we look at the
bytecodes inside the block passed to 'ifTrue:', we get:

           81 = 0x51   readLocal 1
          113 = 0x71   lexLevel 1
           80 = 0x50   readLocal 0
           32 = 0x20   send 0 "+"
          114 = 0x72   lexLevel 2
           98 = 0x62   writeLocal 2

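where lexical level 0, slot 1 is 'z'; level 1, slot 0 is 'a'; and level 2,
slot 2 is 'tmp'. Level 0 is the 'ifTrue:' block itself, level 1 is the
block passed to 'whileTrue:', and level 2 is the method.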

This idea should be familiar to people used to Algol-based languages
(like Pascal) or Scheme. C, which has no nested functions, doesn't have
anything like this.
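A minimal Python sketch of how an interpreter might run the six bytecodes of the 'ifTrue:' block. Each byte is split into a 4-bit opcode (high nibble) and a 4-bit index (low nibble), which is what the listing shows (0x51 is readLocal 1, 0x62 is opcode 6 with index 2), and a Python list stands in for the chain of lexical (static) links. Several details are assumptions, not taken from the post: that lexLevel applies only to the next readLocal or writeLocal and then resets to 0, that writeLocal pops the value it stores, that `send` only has to handle the binary selector '+', and the names `decode` and `run` plus the placeholder slot contents.

```python
# Hypothetical sketch: the opcode numbers are the ones quoted in this thread;
# the reset rule, pop-on-write and all names are assumptions.
OPCODE_NAMES = {2: "send", 5: "readLocal", 6: "writeLocal", 7: "lexLevel",
                11: "branchIndexed"}


def decode(byte):
    """Split a bytecode into (mnemonic, index): high nibble, low nibble."""
    opcode, index = byte >> 4, byte & 0x0F
    return OPCODE_NAMES.get(opcode, f"op{opcode}"), index


def run(code, literals, scopes):
    """Execute straight-line code.

    scopes[0] holds the current block's slots, scopes[1] those of its
    lexical parent, and so on outward (a list standing in for static links).
    """
    stack = []
    level = 0                         # lexical level for the next local access
    for byte in code:
        name, index = decode(byte)
        if name == "lexLevel":        # select an enclosing scope
            level = index
        elif name == "readLocal":     # push a slot of the selected scope
            stack.append(scopes[level][index])
            level = 0                 # assumed: the level prefix is used up
        elif name == "writeLocal":    # pop tos into a slot of the selected scope
            scopes[level][index] = stack.pop()
            level = 0
        elif name == "send":          # only binary "+" in this sketch
            selector = literals[index]
            if selector != "+":
                raise NotImplementedError(selector)
            arg = stack.pop()
            receiver = stack.pop()
            stack.append(receiver + arg)
        else:
            raise NotImplementedError(name)
    return stack


if __name__ == "__main__":
    code = [81, 113, 80, 32, 114, 98]   # the ifTrue: block body
    for b in code:
        print(b, hex(b), *decode(b))
    if_true_block = [None, 0]   # slot 1 is z (z <- 0); slot 0 is a placeholder
    while_block = [3]           # slot 0 is a (a <- 3)
    method = [None, None, 9]    # slot 2 is tmp (tmp <- 9); others are placeholders
    run(code, ["+"], [if_true_block, while_block, method])
    print("tmp =", method[2])
```

With z = 0 and a = 3 the block stores 3 into tmp, two scopes out; in the sketch each local access is a plain indexed load, with no lookup by name.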

> >    11 branchIndexed - tos is an index into a "branch vector"
> 
> Which is the literal of the bytecode?
> Is it a self vector?
> What does this vector contain? smallInts, as in bytecodes 8, 9 and 10?

It seems to be a normal self vector containing smallInts.
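A hedged sketch of what executing branchIndexed might look like. The thread only says that the top of stack is an index into a self vector of smallInts; treating those smallInts as bytecode offsets, using 0-based indices, and falling through to the next instruction on an out-of-range index are all assumptions, as is the function name `branch_indexed`.

```python
def branch_indexed(stack, branch_vector, pc_next):
    """Pop an index from the stack and return the pc to continue at.

    Assumptions: branch_vector holds bytecode offsets (the smallInts of the
    literal self vector), indices are 0-based, and an out-of-range index
    falls through to pc_next.
    """
    index = stack.pop()
    if isinstance(index, int) and 0 <= index < len(branch_vector):
        return branch_vector[index]
    return pc_next


if __name__ == "__main__":
    stack = [1]
    print(branch_indexed(stack, [10, 20, 30], 5))   # jumps to offset 20
```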

> >      1  pop - eliminates the tos
> 
> I imagine it can be used when multiple expressions are used ("some code .
> something ").
> But, since methods should be small, I see no advantage. Where is it used?

Exactly as you suggested - one for each "." separating expressions in a
method. Though the methods are small, the lack of "pops" would cause an
interpreter to leave garbage on the stack (tinySelf 1 does that, for
example). Since you throw away a method's stack when it returns, it
doesn't seem to matter. But if you write an interpreter like the
original Digitalk Smalltalk/V that uses the hardware stack, this could
become a problem.
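A toy count of what the missing pops mean, assuming (for illustration only) that every statement leaves exactly one value on the stack; the function `final_depth` is hypothetical and models a method body as a sequence of such statements separated by ".".

```python
def final_depth(statements, emit_pops):
    """Stack depth left by a method body of `statements` statements.

    Assumes each statement pushes exactly one value. With emit_pops, a
    "pop" follows every "." so only the last value (the result) remains.
    """
    depth = 0
    for i in range(statements):
        depth += 1                          # the statement pushes its value
        if emit_pops and i < statements - 1:
            depth -= 1                      # "pop" for the "." that follows
    return depth


if __name__ == "__main__":
    for n in (1, 4, 10):
        print(n, final_depth(n, True), final_depth(n, False))
```

With one statement both cases agree; without pops the leftover values grow by one per extra statement until the frame is discarded.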

Note that a compiler has to do dataflow analysis on the method anyway,
so it can find out for itself which values are unused even without
"pop" bytecodes, which is why Self didn't have them before.
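For straight-line code, the simplest form of that analysis is a symbolic stack simulation: record which instruction pushed each value, and whatever is still below the top at the end was never consumed. The instruction tuples and the name `unused_values` are hypothetical. A real compiler must also handle branches, and it can only drop the unused value, not the send that produced it, since the send may have side effects.

```python
def unused_values(instrs):
    """instrs: list of (name, values_popped, values_pushed) tuples.

    Returns the names of instructions whose pushed value is never consumed;
    the value left on top at the end is treated as the method result.
    """
    stack = []                      # index of the instruction that pushed each value
    for i, (_name, pops, pushes) in enumerate(instrs):
        for _ in range(pops):
            stack.pop()
        stack.extend([i] * pushes)
    return [instrs[i][0] for i in stack[:-1]]


if __name__ == "__main__":
    # "x foo. y bar" written without any pop bytecode
    code = [("implicitSelfSend x", 0, 1), ("send foo", 1, 1),
            ("implicitSelfSend y", 0, 1), ("send bar", 1, 1)]
    print(unused_values(code))      # the result of "x foo" is dead
```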

> Is the Java in Self emulator available to the public?
> I have read something about it. Is it the basis of the HotSpot Java VM?

See http://www.sun.com/research/kanban/

My impression is that Pep was just a proof of concept. HotSpot was
derived from Animorphic's Java implementation. But others in this list
know much more about it than I do.

> > For those who missed it, I had made a proposal which used only 4
> > bytecodes (0 = push literal, 1 = send, 2 = selfSend, 3 =
> > nonLocalReturn) and used primitives for resends.
> 
> Power of simplicity :-)
> I think I missed it. Some considerations:
> [...]
> I think bytecodes must encode the behaviour of a method in the most
> efficient way; they must not follow the philosophy of Self step by step.

Sorry - I shouldn't have called them "bytecodes". In my scheme there is
only the literal vector: every element whose index is a multiple of 16
(0, 16, 32, ...) is a smallInt containing fifteen 2-bit opcodes. The
literals are used in sequence, so there is no need for an index field
or an index extension instruction. The "push self" bytecode is replaced
with an "implicitSelfSend 'self'" instruction. This works since all
method objects have a ':self*' slot. The literal vector for the value
method of the 'ifTrue:' block in my example would look like:

      0: 10 10 01 10 00 00 00 00 00 00 00 00 00 00 00
      1: 'z'
      2: 'a'
      3: '+'
      4: 'tmp:'

Though this encoding can repeat literals (with no index field, a
selector used twice must appear twice), it actually uses less memory
than the normal one.
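A minimal sketch of packing and unpacking such a literal vector, using the four opcodes from the proposal (0 = push literal, 1 = send, 2 = selfSend, 3 = nonLocalReturn). Fifteen 2-bit opcodes take 30 bits, so the word fits in a smallInt. Two points are assumptions, not from the post: the first opcode sits in the most significant bits (matching the left-to-right listing), and every instruction consumes exactly one literal, which holds for this example but leaves open how nonLocalReturn is laid out. The names `pack` and `unpack` are hypothetical.

```python
PUSH_LITERAL, SEND, SELF_SEND, NON_LOCAL_RETURN = 0, 1, 2, 3
SLOTS_PER_WORD = 15                      # fifteen 2-bit opcodes = 30 bits


def pack(instrs):
    """instrs: list of (opcode, literal). Returns one combined literal vector."""
    vector = []
    for start in range(0, len(instrs), SLOTS_PER_WORD):
        chunk = instrs[start:start + SLOTS_PER_WORD]
        word = 0
        for slot in range(SLOTS_PER_WORD):
            op = chunk[slot][0] if slot < len(chunk) else 0   # pad unused slots
            word = (word << 2) | op      # first opcode ends up in the high bits
        vector.append(word)              # lands at index 0, 16, 32, ...
        vector.extend(lit for _, lit in chunk)
    return vector


def unpack(vector):
    """Inverse of pack: recover the (opcode, literal) pairs in order."""
    instrs = []
    for base in range(0, len(vector), SLOTS_PER_WORD + 1):
        word = vector[base]
        literals = vector[base + 1:base + 1 + SLOTS_PER_WORD]
        for slot, lit in enumerate(literals):
            shift = 2 * (SLOTS_PER_WORD - 1 - slot)
            instrs.append(((word >> shift) & 0b11, lit))
    return instrs


if __name__ == "__main__":
    block = [(SELF_SEND, "z"), (SELF_SEND, "a"), (SEND, "+"), (SELF_SEND, "tmp:")]
    vector = pack(block)
    word = vector[0]
    print(" ".join(f"{(word >> 2 * (14 - i)) & 0b11:02b}" for i in range(SLOTS_PER_WORD)))
    print(vector[1:])
```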

> I'm interested to see how many "index" bytecodes are in the 4.1.2 world as
> opposed to the 4.0 world,
> and also the mean length of methods in the two systems. Can anyone try to get
> this information?

Self 4.0:

   mean length = 184835 / 26369 = 7.00956 bytes
   mean literals = 141237 / 26369 = 5.35618
   index bytecodes = 3233

Self 4.1.2:

   mean length = 356743 / 37376 = 9.54471 bytes
   mean literals = 152172 / 37376 = 4.07138
   index bytecodes = 15386

So while there are more bytecodes (roughly 5 times as many index
bytecodes), there are fewer literals (the readLocal and writeLocal
bytecodes don't use them, while the implicitSelfSend does) for a net
saving of 3.66 bytes per method.
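The per-method figures above can be recomputed from the raw totals. The post does not give the size of a literal-vector entry or show how the 3.66-byte figure was derived, so this sketch stops at the per-method deltas rather than recomputing the net saving; the names `STATS` and `per_method` are illustrative.

```python
# Raw totals quoted above for the two worlds.
STATS = {
    "4.0":   {"bytes": 184835, "literals": 141237, "methods": 26369, "index": 3233},
    "4.1.2": {"bytes": 356743, "literals": 152172, "methods": 37376, "index": 15386},
}


def per_method(version):
    """Return (mean bytecode bytes, mean literals) per method."""
    s = STATS[version]
    return s["bytes"] / s["methods"], s["literals"] / s["methods"]


if __name__ == "__main__":
    old_len, old_lits = per_method("4.0")
    new_len, new_lits = per_method("4.1.2")
    ratio = STATS["4.1.2"]["index"] / STATS["4.0"]["index"]
    print(f"extra bytecode bytes per method: {new_len - old_len:.3f}")
    print(f"fewer literals per method:       {old_lits - new_lits:.3f}")
    print(f"index bytecode ratio:            {ratio:.2f}")
```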

> What are the advantages of the 4.1.2 bytecodes? Is it because of the
> interpreter?

Exactly.

> I have implemented the VM using 8 bytecodes. Do you think it would be helpful
> to use the 4.1.2 scheme?

Only if you have an interpreter or if you want to be able to read in
snapshots created by Self 4.1.2.

-- Jecel