[self-interest] embedding based prototype languages
Jecel Assumpcao Jr
jecel at merlintec.com
Wed Nov 6 23:48:56 UTC 2002
| claimed a 1 clock dispatch for an embedded Self (recently renamed
"Oliver Smalltalk" in an effort to make other Smalltalkers see that we
are really a part of their team).
On Wednesday 06 November 2002 18:26, James McCartney wrote:
> OK I've been curious about this ever since you posted it. If it is
> not a secret, I wish you would tell more about it. I assume you are
> flattening parents into a single object.
There are several tricks that work together. A simple masking function
is used to convert an object's pointer into a pointer for its map, so
we don't have to look that up in the object's first word or something.
The drawback is that sometimes an object has to change maps after a
programming change (added or removed a slot) and that means changing
the object's pointer, which means scanning the whole memory to update
any references to the old pointer.
This is a 16 bit machine so I can't afford to have many objects. So the
sources for all of an object's slots are gathered into one string which
is stored in the object's map (so it doesn't take up an object pointer
of its own). The object's "outliner" gives you a more Self-like view,
so this implementation detail is hidden. When you make any programming
changes the new string is used to build a new map, which is then
compiled and finally any of the object's children are recompiled.
Besides the source string, a map contains a table and machine language
code. This table includes a two word entry for every single selector
defined in the system. There can be up to 4096 selectors, so this per
map table can grow to be as much as 16KB!! But like I said, I have
memory to spare. The tables for all maps are always exactly the same
size and they grow as you define new selectors.
Most instructions (except for "send") are only 5 bits long, so up to
three of them can fit in a 16 bit word. With two words we can have up
to 6 instructions or 3 instructions plus a 16 bit literal. When we
compile a map, for each possible selector we fill in its two word entry
with the right code. If it is a constant slot, we do something like
pushLiteral 42 ; return
while if it is a data slot we might do
read 3 ; return
or an assignment slot
write 3 ; return
and if it is a method slot we can simply stuff the code into those two
words if it will fit, or else put the code at the end of the map and
put a jump to that code in the two words.
Note that most entries would have nothing defined for them in a given
object, so we simple compile code to call the debugger with a "message
not found" error. These entries are considered "empty" in what is
Before actually compiling a map, we copy the table and extra code from
its first parent (which already includes everything from its parents),
then everything from the second parent and so on. Then we compile the
map overwritting anything copied from the parents.
So at runtime, given the pointer to an object and a selector of a
message to send to it, we can
1) find the object's map with no memory accesses
2) fetch from 1 the first word from the entry corresponding to the
3) execute that word
> Do you also flatten the
> instance's own slots along with the parents?
If you mean that you can't tell what came from the object and what came
from the parent, you are right.
What I described above won't work in some cases: selectors that access
data slots in the parent and resends. In the first case, code is
compiled to explicitly delegate to the parent. It is like adding this
buttonColor: c = (globals ui preferences buttonColor: c)
I actually got rid of the lobby and globals so this sort of thing
wouldn't be so common (it adds a lot of overhead to every single
And about the resends, the compiler simply inlines the parent's source
for the method into the spot where the resend was.
> It would seem like you'd
> need to hash in that case, but obviously you're not since you say 1
> clock. But I'm not even sure what that means since even an array
> lookup is not 1 clock.
If we are executing a method in objA and the current top of stack points
to objB, we can have these accesses via the cache (which needs a 16
object pointer and a 16 bit field offset to find the 16 bit word we
clock N: cache ( map(objA), PCa ) ===> "send selector23"
clock N+1: cache ( map(objB), 23<<1 ) ===> "read 3 ; return"
clock N+2: cache ( objB, 3 ) ===> data slot 3's contents
clock N+3: ----
clock N+4: cache ( map(objA), PCa+1) ===> "selfsend selector92"
It is annoying that the return takes 2 clocks (N+3 and N+4). With a
little more hardware that can be reduced to 1 clock or even, when the
instruction in N+2 doesn't do a memory access (which "read 3" does, but
most don't), 0 clocks.
Anyway, what I mean by "1 clock dispatch" is that in clock N we fetched
a send instruction and in clock N+1 we were already fetching the first
instruction from the right method. This is possible because the
hardware is a simple stack machine which isn't pipelined.
More information about the Self-interest