[self-interest] bytecodes

Wed Nov 1 16:46:51 UTC 2000

Marko Mikulicic wrote:
> Jecel Assumpcao Jr wrote:
> > where lexical level 0, slot 1 is 'z', level 1 slot 0 is 'a' and level 2
> > slot 2 is 'tmp'.
> 
> ah. I understand. This is only to allow fast indexing through local slots, without
> literals.

Exactly. In his thesis, Urs made several comparisons between ParcPlace
Smalltalk and the Self NIC (the Non Inlining Compiler), and also
speculated on a Self interpreter. He saw that the main performance
difference between the NIC and an interpreter is the use of message
sends for local slot access (page 31). These new bytecodes fix this.
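
To make the difference concrete, here is a minimal Python sketch (not Self VM
code) of reading a local slot by (lexical level, slot index) versus finding it
by name. The Activation class, its fields, the assumption that level 0 is the
innermost scope, and the placeholder names 'w', 'x' and 'y' are all invented
for illustration; only the three (level, slot) pairs for 'z', 'a' and 'tmp'
come from the example quoted above.

```python
class Activation:
    """One activation: a flat list of local slots plus a link to the
    lexically enclosing activation (hypothetical layout)."""

    def __init__(self, names, values, parent=None):
        self.names = list(names)
        self.slots = list(values)
        self.parent = parent

    def read_indexed(self, level, index):
        """Walk `level` links outward, then index directly.
        No literal and no name lookup is needed."""
        act = self
        for _ in range(level):
            act = act.parent
        return act.slots[index]

    def read_by_name(self, name):
        """Send-style access: search each scope for a slot with this name."""
        act = self
        while act is not None:
            if name in act.names:
                return act.slots[act.names.index(name)]
            act = act.parent
        raise KeyError(name)


# Level 2 holds 'tmp' in slot 2, level 1 holds 'a' in slot 0,
# level 0 (innermost, by assumption) holds 'z' in slot 1.
outer = Activation(["x", "y", "tmp"], [0, 0, 30])
middle = Activation(["a"], [20], parent=outer)
inner = Activation(["w", "z"], [0, 10], parent=middle)

z = inner.read_indexed(0, 1)
a = inner.read_indexed(1, 0)
tmp = inner.read_indexed(2, 2)
```

Both access paths return the same values; the indexed one just skips the
name search, which is roughly the cost the new bytecodes are meant to avoid.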

He also found that the NIC would be 2.5 times faster if it inlined
the "if" and "while" messages and blocks, as Smalltalk does. The
branch bytecodes would fix this (if they were used).

> But does it depend too much on the implementation?
> In some paper I have read that the lowest level which Self code can access is
> bytecodes and

Yes - there are messages to mirrors that let you see the bytecodes and
literal vectors. And since they are regular Self objects, you could (in
theory) change them from Self code. There are primitives that will let
you dump the compiled code (on the Sparc, for example) to the standard
output, but Self can't "see" this or change it in any way.

> in some other paper that the system tries hard to maintain the illusion that what
> is going on is what the user expects.
> For example, dynamic deoptimization and debugging. The original bytecode was quite
> abstract and looked good
> also when one thought of a block as a real object ... the bytecodes were so
> abstract that they didn't depend on the lexical context
> of the block.

Right - using normal inheritance (delegation for the picky people) as
variable scopes was simply brilliant, even if dynamically scoped Lisp
(and Logo) had something very similar.

> I think that maybe for the interpreter there could be two levels of bytecode, one
> created by the parser (the standard set) and
> one deeper, closer to the interpreter (just in time interpreting). Of course there
> is a waste of space (like for machine code) but at least
> I don't break compatibility with older snapshots or Self programs which accessed
> bytecodes (emulators,...) only because I want to make my VM
> portable.
> What do you think?

Urs suggested something like this. You could include PICs and other
complications in this second set of bytecodes. The advantage of such a
solution over the current NIC is that it could easily be ported to any
machine with a C++ compiler.

It would be great if someone were to do this.

> Why should the compiler try to analyze the dataflow (quite expensive) only to save
> a couple of words on the stack ?

The compiler has to follow the dataflow to optimize register usage,
find out the types of the expressions and so on. As long as it is doing
these things, it might as well notice expressions which aren't used and
free the storage (register or stack) they are taking up.

> Cannot find any links regarding Pep.

The OOPSLA99 workshop and TPOS97 papers:

    http://www.sun.com/research/kanban/oopsla-vm-wkshp.pdf

    http://www.sunlabs.com/research/java-topics/pubs/97-pep.ps

> > Though this encoding can have repeated literals, it actually uses up
> > less memory than the normal one.
> 
> Very good idea.
> Hope you didn't patent it :-)

No. I decided to not even patent the Self-in-hardware architecture I
developed.

> Can you tell me how much less memory it uses?
> 
> the example above would be:
> (0 = push literal, 1 = send, 2 = selfSend, 3 =
> 
> 0: selfSend, 0
> 1: selfSend, 1
> 2: push, 2
> 3: selfSend, 3
> ----------
> 1*3 = 3 bytes

Bytecode 2 should be a "send", not a "push". And I counted 4 bytecodes
above instead of 3.

> 1: 'z'
> 2: 'a'
> 3: '+'
> 4: 'tmp:'
> ------
> 4*4 = 16 bytes
> 
> total: 19 bytes    versus yours 5*4 = 20 bytes   => you lose 1 byte :-)

Don't forget to add 4 more words for the bytevector (header word, map
pointer, size, bytes pointer). I came out 16 bytes ahead!
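
The tally can be checked with a few lines of Python. This is only a sketch of
the arithmetic in this thread: it assumes 4-byte words, one byte per bytecode,
and counts only the items named here (it ignores, for instance, any header on
the literal vector). The function names are made up for the example.

```python
WORD = 4  # bytes per word (assumed 32-bit)


def standard_encoding_bytes(n_bytecodes, n_literals):
    """Bytecodes in a bytevector plus one literal-vector pointer per literal."""
    bytecodes = n_bytecodes * 1          # one byte per bytecode
    literal_pointers = n_literals * WORD
    # Bytevector overhead: header word, map pointer, size, bytes pointer.
    bytevector_overhead = 4 * WORD
    return bytecodes + literal_pointers + bytevector_overhead


def word_encoding_bytes(n_words):
    """The alternative encoding, counted as whole words."""
    return n_words * WORD


standard = standard_encoding_bytes(n_bytecodes=4, n_literals=4)
alternative = word_encoding_bytes(n_words=5)
saving = standard - alternative
print(standard, alternative, saving)  # prints: 36 20 16
```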

> I imagine that for a bit longer methods there is a gain of 1 byte (at least for 16
> codes).
> But every duplicate eats 4 bytes!

See the details in my 28 Aug 1999 message to this list:

    http://www.egroups.com/message/self-interest/299

I saved 400KB.

> I'm very interested to see what the real-world says.
> Have you implemented this on something that can read the self4 world ?

Not yet, though it would be trivial to write a Self method to save the
image in this format.

> Very interesting! Can you please also tell me the mean size of the literals (I
> suppose the "mean literals" above
> is the number of literals per method) to estimate the percentage saving of 3.66
> bytes per method?

You save 4 bytes per pointer that you no longer have in the literal
vector, plus the size of the literal itself (counted only once for each
distinct literal), but only if that string is not used for anything
else in the system. If you have a local slot called 'a', you might
still need the canonical string 'a' for something else (in this case
that is sure to happen, since this string also stands for what would be
the character $a in Smalltalk). So it seems very complicated to
calculate how much you will save by eliminating the literals.

> I thought to stay more or less compatible but I think I won't follow that.
> Maybe it won't be bad if we could share the images between OpenSelf and Self/R.

That would be great, but being able to read each other's textual
Self source would be a good start too.

> What have you implemented until now, Jecel?

Nothing since tinySelf 1, unfortunately. I have been doing a lot of
designing, however.

-- Jecel