[self-interest] Self implementation details

Thu Oct 5 17:32:22 UTC 2006

MilanVandrovec wrote:

> I'm designing a network protocol where I need to serialize objects, and I'm
> trying to use some Self ideas.
> I've been studying the memory representation of objects and I can understand
> almost everything, except for the "virtual function pointer", the fourth word
> of a map. What does it do, and how does it look? (I guess it's an array of
> pointers to native functions, but I don't know what they do).

See page 175 of Craig Chambers' dissertation, "The Design and
Implementation of the Self Compiler, an Optimizing Compiler for
Object-Oriented Programming Languages". The virtual machine is a huge
C++ program, and this pointer makes maps look like regular C++ objects
(see "vtable" in any explanation of typical C++ implementations).

> Another question is how the Literal array of Bytecode object looks like? Is it
> an array of SmallInts/tagged pointers?

Actually, a method object has a bytecode vector (with the raw bytes) and
a literal vector (tagged pointers, including small integers and floating
point numbers).
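
Since you want to serialize these, here is a minimal sketch of how one
word of a literal vector could be classified by its tag. It assumes
32-bit words with a 2-bit tag in the low bits; the tag values and the
function names are my assumptions for illustration, not copied from
the VM sources, so check them before relying on them.

```python
# Sketch only: tag values and helper names are assumptions, not the VM's API.
TAG_BITS = 2
TAG_MASK = (1 << TAG_BITS) - 1
WORD_MASK = 0xFFFFFFFF                          # assume 32-bit words

INT_TAG, MEM_TAG, FLOAT_TAG = 0b00, 0b01, 0b10  # assumed tag assignment


def tag_small_int(n):
    """Pack a small integer into a word: value in the high bits, INT_TAG low."""
    return ((n << TAG_BITS) | INT_TAG) & WORD_MASK


def untag_small_int(word):
    """Recover the signed integer from a word that carries INT_TAG."""
    if word & 0x80000000:        # sign bit set: reinterpret as negative
        word -= 1 << 32
    return word >> TAG_BITS


def tag_reference(address):
    """Pack a word-aligned object address into a tagged pointer."""
    if address & TAG_MASK:
        raise ValueError("object addresses must be word aligned")
    return address | MEM_TAG


def kind_of(word):
    """Tell a serializer what one literal-vector word holds."""
    tag = word & TAG_MASK
    if tag == INT_TAG:
        return "smallInt"
    if tag == MEM_TAG:
        return "reference"
    if tag == FLOAT_TAG:
        return "float"
    return "other"
```

A serializer would walk the literal vector, call kind_of on each word,
and write immediates (integers, floats) by value and references by
some object id instead of the raw address.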

> How are the closures implemented and where are they stored?

They are faked. During normal execution Self just uses the regular
processor stack. When an error occurs, or some other condition lets the
programmer "see" what the system is doing, the system creates activation
objects as needed to hide the actual stack. The debugger only ever deals
with these activations, so as far as a programmer can tell, linked lists
of activation objects are how the system implements closures (in theory
they are just cloned from the method objects themselves, which are very
closure-like).

> And why is there only one literal array? I think it serves two functions,
> a) method names, and b) literals. No element can serve both functions.

It could, but this would be so rare that it really isn't worth worrying
about:

..... perform: 'perform:with:' with: twoElementVector ...

> So splitting it into two arrays would save some INDEX-EXTENSION opcodes,
> while requiring some constant overhead. Am I right, and have the designers
> decided that it's better the current way?

The extension bytecodes are *very* rare, so eliminating some of them
would have almost no impact on the system. I have experimented with the
opposite: merging the bytecode and literal vectors (though not the way
it is normally done in Smalltalk), since that saves one pointer and one
object header per method. See

http://tech.groups.yahoo.com/group/self-interest/message/299
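
To show why extension bytecodes are so rare, here is a rough sketch of
an index encoding. It assumes each bytecode is one byte with a 3-bit
opcode and a 5-bit index, so an extension is only needed once a literal
index reaches 32. The opcode numbers and the bit split are assumptions
made for the example, not taken from the VM.

```python
# Sketch only: the 3+5 bit split and opcode numbers are assumed.
INDEX_BITS = 5
INDEX_MASK = (1 << INDEX_BITS) - 1   # 31

OP_INDEX_EXTENSION = 0               # hypothetical opcode number
OP_LITERAL = 2                       # hypothetical opcode number


def encode(op, index):
    """Encode one instruction, prefixed by extension bytes if index > 31."""
    chunks = [index & INDEX_MASK]    # 5-bit pieces, lowest first
    index >>= INDEX_BITS
    while index:
        chunks.append(index & INDEX_MASK)
        index >>= INDEX_BITS
    out = bytearray()
    for chunk in reversed(chunks[1:]):           # high pieces go first
        out.append((OP_INDEX_EXTENSION << INDEX_BITS) | chunk)
    out.append((op << INDEX_BITS) | chunks[0])   # real opcode + low piece
    return bytes(out)


def decode(code):
    """Return (op, index) pairs, folding extensions into the next index."""
    result = []
    pending = 0
    for byte in code:
        op, low = byte >> INDEX_BITS, byte & INDEX_MASK
        if op == OP_INDEX_EXTENSION:
            pending = (pending << INDEX_BITS) | low
        else:
            result.append((op, (pending << INDEX_BITS) | low))
            pending = 0
    return result
```

Under these assumptions a method needs more than 32 literals and
selectors before the first extension byte appears, and few methods are
that large.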

> Also, how is the Bytecode array of Bytecode object ended? Is there a length
> field somewhere, and when all the bytecodes are interpreted, the top of
> the current stack is pushed into the caller's stack?

Exactly. The bytecode vector is just a kind of byteVector (like strings),
which stores its length in bytes in its third word. Its fourth word
points to the actual bytes, which live in a separate memory area from
objects so the garbage collector doesn't have to skip over them.
Running out of bytes is interpreted as a local return, as you said, but
in a block we might execute a non-local return bytecode and never
actually get to the end. Sending the _Restart primitive is another
way of not reaching the end.
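
As a toy illustration of the "running out of bytes is a local return"
rule, here is a small interpreter loop. It handles only two made-up
opcodes and models a non-local return as an exception. The opcode
numbers, the 3+5 bit layout, and all names are assumptions for the
sketch and do not reflect how the real VM is written.

```python
# Sketch only: opcode numbers, bit layout and names are assumed.
INDEX_BITS = 5
INDEX_MASK = (1 << INDEX_BITS) - 1

OP_LITERAL = 2            # hypothetical: push literals[index]
OP_NON_LOCAL_RETURN = 3   # hypothetical: leave the enclosing method


class NonLocalReturn(Exception):
    """Carries the result out of a block, past the end of its bytecodes."""

    def __init__(self, value):
        super().__init__(value)
        self.value = value


def run(bytecodes, literals):
    """Interpret a bytecode vector; falling off the end is a local return."""
    stack = []
    pc = 0
    while pc < len(bytecodes):        # the length comes from the vector itself
        byte = bytecodes[pc]
        pc += 1
        op, index = byte >> INDEX_BITS, byte & INDEX_MASK
        if op == OP_LITERAL:
            stack.append(literals[index])
        elif op == OP_NON_LOCAL_RETURN:
            raise NonLocalReturn(stack.pop())
        else:
            raise ValueError("opcode %d not handled in this sketch" % op)
    return stack.pop()                # out of bytes: return top of stack
```

There is no explicit "return" bytecode in this loop: the length stored
in the byte vector is the only end marker.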

I hope this clears up some things for you. There is a *lot* of
information in the two theses listed in the "papers" section of the Self
web site.

-- Jecel