[self-interest] Re: Caching method lookup smalltalk

Thu Sep 2 20:14:10 UTC 1999

Stefan Matthias Aust wrote:
> 
> Jecel:
> >Exactly. But note that the class of "self" is different for
> >executions of the same method associated with different keys.
> >This suggests compiling different native codes for the same
> >source method, which is the idea behind "customization".
> 
> Either you didn't understand my example or I don't understand your answer
> (probably the latter, I guess).  You're talking about Smalltalk, right?
> Why is the class of "self" different and what keys do you refer to?

I was talking about Smalltalk and Self, but the problem of different
"classes" for self is, as the name of the language implies, more
important in Self.

Imagine classes A, B and C as in your example below. The lookup
cache has:

  <C, #x> -> <A>>x>
  <B, #x> -> <A>>x>
  <A, #x> -> <A>>x>

Suppose both A and C define a method "y":

  <C, #y> -> <C>>y>
  <B, #y> -> <A>>y>
  <A, #y> -> <A>>y>

Now look at the expression "self y" inside of method "x". While
you would be executing the exact same bytecodes <A>>x>, the object
denoted by "self" can be of class A, B or C. Which means (in case
I want to inline it), that the expression "self y" can refer to
either <A>>y> or <C>>y>.
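
In case it helps, here is a rough sketch of that cache in Python
(not Smalltalk). The names Klass, full_lookup, cached_lookup and
entry are made up for this illustration and are not part of any
real VM. It fills the cache with the six entries above and then
collects the possible targets of "self y" inside <A>>x>:

```python
class Klass:
    """Hypothetical stand-in for a Smalltalk class: name, parent, selectors."""
    def __init__(self, name, parent=None, methods=()):
        self.name = name
        self.parent = parent
        self.methods = set(methods)

def full_lookup(klass, selector):
    """Walk up the inheritance chain; return the class that defines selector."""
    while klass is not None:
        if selector in klass.methods:
            return klass
        klass = klass.parent
    return None

cache = {}  # (class name, selector) -> defining class

def cached_lookup(klass, selector):
    """Consult the cache first and fall back to the full lookup on a miss."""
    key = (klass.name, selector)
    if key not in cache:
        cache[key] = full_lookup(klass, selector)
    return cache[key]

def entry(klass, selector):
    """Format one cache entry in the notation used in this email."""
    owner = cached_lookup(klass, selector)
    return "<%s, #%s> -> <%s>>%s>" % (klass.name, selector, owner.name, selector)

A = Klass("A", methods=["x", "y"])
B = Klass("B", parent=A)
C = Klass("C", parent=B, methods=["y"])

for selector in ("x", "y"):
    for klass in (C, B, A):
        print(entry(klass, selector))

# All three classes run the same <A>>x>, but "self y" inside it
# can end up in more than one place:
targets_of_self_y = {cached_lookup(k, "y").name for k in (A, B, C)}
print(sorted(targets_of_self_y))
```

The last line is the point: one bytecode method, two possible
targets for the inner send.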

If I allow the compiler to generate only one machine code method
for each bytecode method, the best it can do (in a C-like syntax)
is:

  compiled_method_for: <A>>x>

      ...
      switch ( class_of(self) ) {
         case A:
         case B:  code_for: <A>>y>;
                  break;
         case C:  code_for: <C>>y>;
                  break;
      };
      ...

Actually, the compiler will probably do much worse than this
(generate something like look_up_and_call("y");). Now imagine
that we make the compiler generate a different machine code method
for each entry in your cache (we call this "customization"). We
would have:

  compiled_method_for: <A>>x> Customized_for: A

      ...
      code_for: <A>>y>;
      ...

  compiled_method_for: <A>>x> Customized_for: B

      ...
      code_for: <A>>y>;
      ...

  compiled_method_for: <A>>x> Customized_for: C

      ...
      code_for: <C>>y>;
      ...

Now you might think it is a waste of space since the first two
customized versions have exactly the same native code, and you
would be right. But this is the key to Self 4.0's speed. There
are plenty of papers about this at the Self site, if you are
interested.
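
A toy version of customization, again in Python and only as a
sketch: customize_x is a hypothetical helper, and a Python closure
stands in for generated machine code. The send "self y" is resolved
once per receiver class when the customized version is created, so
the customized code calls its target directly:

```python
class A:
    def y(self):
        return "A>>y"

    def x(self):
        return self.y()  # uncustomized: looked up at every call

class B(A):
    pass

class C(B):
    def y(self):
        return "C>>y"

def customize_x(receiver_class):
    """Build a version of A>>x specialised for one receiver class.

    The target of "self y" is found here, once, so the returned
    function makes a direct call with no lookup at run time.
    """
    target = getattr(receiver_class, "y")  # resolved via the class hierarchy

    def x_customized(self):
        return target(self)

    x_customized.target = target  # kept only so we can inspect it below
    return x_customized

# One customized version per receiver class, like one per cache entry.
compiled = {cls: customize_x(cls) for cls in (A, B, C)}

for cls in (A, B, C):
    print(cls.__name__, "->", compiled[cls](cls()))

# The versions for A and B bind the very same target: the duplicated code.
print(compiled[A].target is compiled[B].target)
```

This ignores everything a real compiler would then do with the
bound target (inlining it, for one), which is where the speed
actually comes from.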

> >Yes, maps can be a part of the key for the cache.
> 
> ?

I think we are using different terms for the same things. For
me, a cache would be a list of associations like this:

   key1 -> value1
   key2 -> value2

So I was saying you can replace

   <#x,classOf(A)> -> <A>>x>

with

   <'x',mapOf(A)> -> <A>>x>

> Of course.  Here's the example (for Smalltalk):  Suppose we've A and B,
> subclass of A and C, subclass of B.  Suppose that A implements method x
> (which I write as A>>x).  Let a,b,c instances of A, B and C.
> 
> If we have "c x", this generates  <C, #x> -> <A>>x>
> If we have "b x", this generates  <B, #x> -> <A>>x>
> 
> If we now add x to B, the caches need to be flushed and the same message
> sends generate different keys now.  Is this what you were referring to above?

No, as I explained I was talking about "self" becoming polymorphic
due to inheritance.

> "c x" -->  <C, #x> -> <B>>x>
> "b x" -->  <B, #x> -> <B>>x>
> "a x" -->  <A, #x> -> <A>>x>
> 
> If B>>x contained a "super x", we would have to pretend to be a
> superclass(receiver), that is an A, to do the lookup.  The invocation is of
> course made for b.

This is an exception to what I was talking about before, since "super"
(resend in Self) makes the lookup start at the superclass of the class
where the method *was defined*, not at the class of the object that
received the message. If B>>x includes a "super x", this *always* means
<A>>x> even when executed for an object of class C.
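
A small Python sketch of the difference, using a made-up Klass
model (lookup and super_lookup are names invented for this
example). Note that super_lookup does not even take the receiver's
class as an argument, which is why the result is fixed:

```python
class Klass:
    """Hypothetical stand-in for a class: name, parent, selectors."""
    def __init__(self, name, parent=None, methods=()):
        self.name = name
        self.parent = parent
        self.methods = set(methods)

def lookup(start, selector):
    """Normal send: search from the receiver's class upwards."""
    klass = start
    while klass is not None:
        if selector in klass.methods:
            return klass
        klass = klass.parent
    return None

def super_lookup(defining_class, selector):
    """Super send: search from the parent of the class that holds the
    method containing the "super" expression."""
    return lookup(defining_class.parent, selector)

A = Klass("A", methods=["x"])
B = Klass("B", parent=A, methods=["x"])  # B>>x contains "super x"
C = Klass("C", parent=B)

print(lookup(C, "x").name)        # "c x" finds B>>x
print(super_lookup(B, "x").name)  # "super x" inside B>>x finds A>>x
```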

> >The answer is to do away with all of this and use reflection with
> >some aggressive optimization (partial evaluation, for example).
> 
> Do you have a concrete example as why reflection would help here and how?

You also asked this in the email "Re: Objects with dynamic slots".
I suppose "here" means "custom message passing semantics", as we
were talking about that earlier.

Basically, the idea is that instead of making message passing
a fixed feature of the VM design, you would associate one or
more "meta-objects" with each object and they are the ones that
define what happens. Note that classes in Smalltalk and maps in
Self are already examples of meta-objects, but we could add
many more. Jeff McAffer added the following seven meta-objects
to make a more reflective Smalltalk called CodA:

  http://web.yl.is.s.u-tokyo.ac.jp/members/jeff/research/coda.html

 1) send - starts the message transfer process
 2) accept - interacts on behalf of the receiver with the sender's
             "send" meta-object
 3) queue - can decouple accepting a message from executing it
 4) receive - when the object is ready, the message is removed
              from the queue and processed
 5) protocol - knows how to get the right method based on the
               message selector
 6) execution - given a method, it knows how to get the receiver
                object to actually execute it
 7) state - knows how to access instance variables in the receiver

Each of these meta-objects is a real Smalltalk object, with all
that implies. In particular, it implements a set of methods which
are invoked by the Virtual Machine.

If you associate every object in the system with the default
meta-objects, you get a system that behaves exactly like the
normal Smalltalk (and CodA is highly optimized for this case).
The standard "send" and "accept" meta-objects implement a
synchronous message passing which blocks the sender. The "accept"
invokes "protocol" directly (bypassing "queue" and "receive")
which implements the regular method lookup with a cache. And
"execution" simply invokes the Virtual Machine to interpret the
method.

If you need something different, all you have to do is define
one or more new meta-objects and ask the system to associate
your objects with them. For example:

  ProxyAccept>>accept: message for: base
     | ra |
     ra := (base meta state instVarAt: 'representee') meta accept.
     ^ ra accept: message for: base

Associating this meta-object with a normal object that has an
instance variable called 'representee' will make it into a
proxy object in a much cleaner way than #doesNotUnderstand: does.
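
To make the idea concrete, here is a loose Python imitation. None
of this is CodA's real API: Meta, meta_of, send, DefaultAccept and
DefaultState are invented for the sketch, only the "accept" and
"state" meta-objects are modelled, and DefaultAccept lumps
"protocol" and "execution" into a single getattr call. I also
assume the forwarded accept should see the representee as its base
object:

```python
class DefaultState:
    def inst_var_at(self, base, name):
        """Read an instance variable of the base object."""
        return getattr(base, name)

class DefaultAccept:
    def accept(self, message, base):
        """Ordinary synchronous send: find the method and run it."""
        selector, args = message
        return getattr(base, selector)(*args)

class ProxyAccept:
    def accept(self, message, base):
        """Hand the message to the accept meta-object of the representee."""
        target = meta_of(base).state.inst_var_at(base, "representee")
        return meta_of(target).accept.accept(message, target)

class Meta:
    """The meta-objects associated with one base object."""
    def __init__(self, accept=None, state=None):
        self.accept = accept or DefaultAccept()
        self.state = state or DefaultState()

DEFAULT_META = Meta()
_metas = {}  # id(object) -> Meta, for objects with non-default meta-objects

def meta_of(obj):
    return _metas.get(id(obj), DEFAULT_META)

def send(receiver, selector, *args):
    """Every message send goes through the receiver's accept meta-object."""
    return meta_of(receiver).accept.accept((selector, args), receiver)

class Account:
    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount
        return self.balance

class Stub:
    """A normal object whose only feature is a 'representee' variable."""
    def __init__(self, representee):
        self.representee = representee

real = Account()
proxy = Stub(real)
_metas[id(proxy)] = Meta(accept=ProxyAccept())  # turn the stub into a proxy

result = send(proxy, "deposit", 10)
print(result, real.balance)
```

Stub never defines deposit and there is no catch-all error handler
anywhere; the forwarding lives entirely in the meta-object.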

> >As I mentioned in my other email, this is very complicated. The
> >error ends up in a method in the process object, and it checks
> >if the object happens to understand the message 'undefinedSelector:...'
> 
> I'd do it the other way round. First let the VM determine whether there's
> the right method.  If yes, let's call it directly without notifying the
> process (or thread object as I would probably call it)

That is how it is now. In fact, if inlining happened like in my
example at the start of this email then the message send is much
more direct (it may even have been optimized away completely!!).

>  Otherwise, it's the
> right way (at least the way I'd have expected) to notify the process that
> there's an object that doesn't understand what it should.

That is what happens. And then the process (in Self code) takes
care of it.

> BTW, what if there's no process object or that object doesn't understand
> the right message? Kernel panic?  ;-)  Perhaps the right time to reinvent
> the "guru meditation"...

There are a few "kernel panics" in Self 4.0, but message sending
doesn't have such problems. There is always a process object,
even in a very empty Self world (though not necessarily a scheduler
object).

> >[PICs are very hard to integrate into an interpreter]
> 
> if (receiver.map != map_I_expect_here) {
>   push(receiver);
>   map_I_expect_here = receiver.map;
>   method = do_the_normal_lookup(selector);
> }
> invoke_remembered(method);
> // "map_I_expect_here" and "method" static local vars
> // need additional storage for "prev_cache" and "next_cache"
> 
> which would normally be some kind of assembler.

I didn't understand where this code is supposed to go. If you
compile bytecodes to native machine language, then you can
insert this sort of thing there. But if we are talking about
an interpreter, how does this piece of code get associated with
the right message-send-bytecode?

Jecel