Hi!
Currently, my interpreter looks up every method each time it is called. I'd like to cache this somehow, but I'm unsure whether this works as outlined below.
My approach is to assume a static inheritance hierarchy and then flush the caches in all cases where this assumption turns out to be wrong. So far so easy.
It's a well-known optimization for Smalltalk systems to cache
<class, selector> -> <method>
mappings. The class is the receiver's class, not the method's implementing class. This means more than one key could refer to the same method.
This kind of cache can remove about 80%-95% of all method lookups, depending on its size, the quality of the hashing algorithm used, and of course the application.
Can I simply replace "class" with "map object" in Self and use the same caching with the same results? Map objects are used to share common slot layout and behavior of objects. They're transparent to the user.
Now, using #doesNotUnderstand in Smalltalk is not that uncommon, and using DNU should be cacheable, too. I'm not sure how to extend the mechanism, but I think mapping
<myclass, #foobar> -> <Object>>doesNotUnderstand:>
after you notice that #foobar doesn't exist should be sufficient. Am I wrong? I recently read in comp.lang.smalltalk that DNUs aren't slower than normal method calls anymore in most Smalltalk systems, so it can be optimized. But how?
By the way, how does Self notify the user?
Are 'doesNotUnderstand:' and 'ambiguousMessage:' (to make up two names, I don't know the right names at the moment) part of "defaultBehavior"? What if an object doesn't implement or inherit these messages? Are these messages sent to another object?
Let's assume it would work so far. What if the inheritance graph changes? Flushing all caches would be a solution (of course) but can we find some better way please?
If a map gets or loses a parent slot, or a parent slot's value is modified, we have to flush all cache entries for that map and for all maps which inherit from it (further complicated by the fact that inheritance isn't a nice tree but a directed graph, not even acyclic).
To find them, we must either enumerate all maps and search, or store back links, i.e. a list of direct submaps for each map. This seems difficult, especially for dynamic parent slots, to which you can assign new values at any time.
Are there better ways to cache slot lookup?
Polymorphic inline caches are said to give even better caching results, but I think they are more difficult to flush because they are distributed over the whole code. You'd probably have to maintain a linked list of these caches.
BTW, what's the best way to handle slots?
The original Self paper describes map objects as sequential lists (or vectors) of slots. Searching for a slot means comparing selector addresses in sequential order. Each comparison is very fast, but unfortunately a lookup needs n/2 of them on average, so it's O(n).
Smalltalk traditionally uses hash tables (IdentityDictionaries, actually). While it takes some time to compute the hash code, lookup should be faster, as a (good) hash typically needs about 2 probes, i.e. O(1). On the other hand, hash tables need more space (about twice as much, but at minimum 25%-33% overhead). For some 25000 methods, this means about 100k.
We also have to keep in mind that if the lookup caching works, a full lookup is only performed in 5%-10% of all cases. So lookup speed perhaps isn't that critical.
bye -- Stefan Matthias Aust // Bevor wir fallen, fallen wir lieber auf.
Stefan Matthias Aust wrote:
It's a well-known optimization for Smalltalk systems to cache
<class, selector> -> <method>
mappings. The class is the receiver's class, not the method's implementing class. This means more than one key could refer to the same method.
Exactly. But note that the class of "self" is different for executions of the same method associated with different keys. This suggests compiling different native codes for the same source method, which is the idea behind "customization".
Can I simply replace "class" with "map object" in Self and use the same caching with the same results? Map objects are used to share common slot layout and behavior of objects. They're transparent to the user.
Yes, maps can be a part of the key for the cache.
Now, using #doesNotUnderstand in Smalltalk is not that uncommon, and using DNU should be cacheable, too. I'm not sure how to extend the mechanism, but I think mapping
<myclass, #foobar> -> <Object>>doesNotUnderstand:>
after you notice that #foobar doesn't exist should be sufficient. Am I wrong?
Supposing that myclass overrides #doesNotUnderstand:, it would be better to map <myclass, #foobar> to <myclass>>doesNotUnderstand:>, right?
I recently read in comp.lang.smalltalk that DNUs aren't slower than normal method calls anymore in most Smalltalk systems, so it can be optimized. But how?
Certainly using the cache the way I indicated above would help, but you would still have an indirection: the method myclass>>doesNotUnderstand: would do some kind of dispatching before getting to the actual code of #foobar, and that would be slower than the built-in message dispatching.
The answer is to do away with all of this and use reflection with some aggressive optimization (partial evaluation, for example).
By the way, how does Self notify the user?
Are 'doesNotUnderstand:' and 'ambiguousMessage:' (to make up two names, I don't know the right names at the moment) part of "defaultBehavior"? What if an object doesn't implement or inherit these messages? Are these messages sent to another object?
As I mentioned in my other email, this is very complicated. The error ends up in a method in the process object, and it checks if the object happens to understand the message 'undefinedSelector:...' or 'ambiguousSelector:...' or so on. If it does, then the process uses the _Perform: primitive to send one of these messages to the object, letting it handle the error itself. If the object doesn't implement the corresponding message (doesn't understand #doesNotUnderstand ;-) then the debugger is invoked. So error handling works even if the object doesn't inherit from anybody at all.
Let's assume it would work so far. What if the inheritance graph changes? Flushing all caches would be a solution (of course) but can we find some better way please?
This is the way I would do it in a first implementation.
If a map gets or loses a parent slot, or a parent slot's value is modified, we have to flush all cache entries for that map and for all maps which inherit from it (further complicated by the fact that inheritance isn't a nice tree but a directed graph, not even acyclic).
In Self 4.0, all methods are connected via doubly linked lists to all the slots that were involved in their compilation. If *any* of them is changed (eliminated, a new one with the same name added in a child, etc.), then the method is purged from the compiled code cache.
I don't like all this overhead, but you might consider some variation of this idea to optimize flushing your lookup cache.
Polymorphic inline caches are said to give even better caching results, but I think they are more difficult to flush because they are distributed over the whole code. You'd probably have to maintain a linked list of these caches.
PICs are very hard to integrate into an interpreter. Note that your cache works at the receiving site (and so mixes the results of all callers), while ICs and PICs work at the calling site, and so there are more of them and they are more precise.
Please look at the papers at http://www.cs.ucsb.edu/oocsb/papers.shtml (specially the ones about Dynamic Dispatch).
-- Jecel
Jecel:
Me:
It's a well-known optimization for Smalltalk systems to cache
<class, selector> -> <method>
mappings. The class is the receiver's class, not the method's implementing class. This means more than one key could refer to the same method.
Exactly. But note that the class of "self" is different for executions of the same method associated with different keys. This suggests compiling different native codes for the same source method, which is the idea behind "customization".
Either you didn't understand my example or I don't understand your answer (probably the latter, I guess). You're talking about Smalltalk, right? Why is the class of "self" different, and what keys do you refer to?
Can I simply replace "class" with "map object" in Self and use the same caching with the same results?
Yes, maps can be a part of the key for the cache.
?
Now, using #doesNotUnderstand in Smalltalk is not that uncommon, and using DNU should be cacheable, too. I'm not sure how to extend the mechanism, but I think mapping
<myclass, #foobar> -> <Object>>doesNotUnderstand:>
after you notice that #foobar doesn't exist should be sufficient. Am I wrong?
Supposing that myclass overrides #doesNotUnderstand:, it would be better to map <myclass, #foobar> to <myclass>>doesNotUnderstand:>, right?
Of course. Here's the example (for Smalltalk): Suppose we have A, then B, a subclass of A, and C, a subclass of B. Suppose that A implements method x (which I write as A>>x). Let a, b, c be instances of A, B and C.
If we have "c x", this generates <C, #x> -> <A>>x>
If we have "b x", this generates <B, #x> -> <A>>x>
If we now add x to B, the caches need to be flushed and the same message sends now generate different keys. Is this what you were referring to above?
"c x" --> <C, #x> -> <B>>x>
"b x" --> <B, #x> -> <B>>x>
"a x" --> <A, #x> -> <A>>x>
If B>>x contained a "super x", we would have to pretend to be a superclass(receiver), that is an A, to do the lookup. The invocation is of course still made for b.
The answer is to do away with all of this and use reflection with some aggressive optimization (partial evaluation, for example).
Do you have a concrete example of why reflection would help here, and how?
As I mentioned in my other email, this is very complicated. The error ends up in a method in the process object, and it checks if the object happens to understand the message 'undefinedSelector:...'
I'd do it the other way round. First let the VM determine whether there's a matching method. If yes, call it directly without notifying the process (or thread object, as I would probably call it). Otherwise, it's the right way (at least the way I'd have expected) to notify the process that there's an object that doesn't understand what it should.
This should improve performance, as the VM is probably the part of the system that can do the fastest method lookup and invocation. Anything I missed?
BTW, what if there's no process object, or that object doesn't understand the right message? Kernel panic? ;-) Perhaps the right time to reinvent the "guru meditation"...
In Self 4.0, all methods are connected via doubly linked lists to all the slots that were involved in their compilation. If *any* of them is changed (eliminated, a new one with the same name added in a child, etc.), then the method is purged from the compiled code cache.
Probably the only way to do it. Does anybody have a better idea? Dave perhaps? Anything you'd change for a new Self system?
PICs are very hard to integrate into an interpreter. Note that your cache works at the receiving site (and so mixes the results of all callers), while ICs and PICs work at the calling site, and so there are more of them and they are more precise.
I know. This is why inline caches are considered to achieve even better hit ratios. Creating a doubly linked list of all PICs could do the trick. The following -- as written down without further checking -- should work:

if (receiver.map != map_I_expect_here) {
    push(receiver);
    map_I_expect_here = receiver.map;
    method = do_the_normal_lookup(selector);
}
invoke_remembered(method);
// "map_I_expect_here" and "method" are static local vars
// need additional storage for "prev_cache" and "next_cache"
which would normally be some kind of assembler. One problem could be that in the cache-hit case we need a jump, which is bad for modern processors, so the logic should be inverted. However, in any case we'd need a jump over the static variables if they're stored local to the invocation. One could set up an external cache array which would then store (map, method) pairs (prev and next aren't needed in that case), but this would break the locality of the code. Self-modifying code which would hide the vars in constant assignments to some dummy register would be yet another solution.
bye -- Stefan Matthias Aust // Bevor wir fallen, fallen wir lieber auf.
Stefan Matthias Aust wrote:
Jecel:
Exactly. But note that the class of "self" is different for executions of the same method associated with different keys. This suggests compiling different native codes for the same source method, which is the idea behind "customization".
Either you didn't understand my example or I don't understand your answer (probably the latter, I guess). You're talking about Smalltalk, right? Why is the class of "self" different, and what keys do you refer to?
I was talking about Smalltalk and Self, but the problem of different "classes" for self is, as the name of the language implies, more important in Self.
Imagine classes A, B and C as in your example below. The lookup cache has:
<C, #x> -> <A>>x>
<B, #x> -> <A>>x>
<A, #x> -> <A>>x>
Suppose both A and C define a method "y":
<C, #y> -> <C>>y>
<B, #y> -> <A>>y>
<A, #y> -> <A>>y>
Now look at the expression "self y" inside of method "x". While you would be executing the exact same bytecodes <A>>x>, the object denoted by "self" can be of class A, B or C. This means (in case I want to inline it) that the expression "self y" can refer to either <A>>y> or <C>>y>.
If I allow the compiler to generate only one machine code method for each bytecode method, the best it can do (in a C-like syntax) is:
compiled_method_for: <A>>x>
...
switch ( class_of(self) ) {
    case A:
    case B: code_for: <A>>y>; break;
    case C: code_for: <C>>y>; break;
};
...
Actually, the compiler will probably do much worse than this (generate something like look_up_and_call("y");). Now imagine that we make the compiler generate a different machine code method for each entry in your cache (we call this "customization"). We would have:
compiled_method_for: <A>>x> Customized_for: A
... code_for: <A>>y>; ...
compiled_method_for: <A>>x> Customized_for: B
... code_for: <A>>y>; ...
compiled_method_for: <A>>x> Customized_for: C
... code_for: <C>>y>; ...
Now you might think it is a waste of space since the first two customized versions have exactly the same native code, and you would be right. But this is the key to Self 4.0's speed. There are plenty of papers about this at the Self site, if you are interested.
Yes, maps can be a part of the key for the cache.
?
I think we are using different terms for the same things. For me, a cache would be a list of associations like this:
key1 -> value1
key2 -> value2
So I was saying you can replace
<#x,classOf(A)> -> <A>>x>
with
<'x',mapOf(A)> -> <A>>x>
Of course. Here's the example (for Smalltalk): Suppose we have A, then B, a subclass of A, and C, a subclass of B. Suppose that A implements method x (which I write as A>>x). Let a, b, c be instances of A, B and C.
If we have "c x", this generates <C, #x> -> <A>>x>
If we have "b x", this generates <B, #x> -> <A>>x>
If we now add x to B, the caches need to be flushed and the same message sends now generate different keys. Is this what you were referring to above?
No, as I explained I was talking about "self" becoming polymorphic due to inheritance.
"c x" --> <C, #x> -> <B>>x>
"b x" --> <B, #x> -> <B>>x>
"a x" --> <A, #x> -> <A>>x>
If B>>x contained a "super x", we would have to pretend to be a superclass(receiver), that is an A, to do the lookup. The invocation is of course still made for b.
This is an exception to what I was talking about before, since "super" (resend in Self) makes the lookup start at the class where the method *was defined*, not at the class of the object that received the message. If B>>x includes a "super x", this *always* means <A>>x>, even when executed for an object of class C.
The answer is to do away with all of this and use reflection with some aggressive optimization (partial evaluation, for example).
Do you have a concrete example of why reflection would help here, and how?
You also asked this in the email "Re: Objects with dynamic slots". I suppose "here" means "custom message passing semantics", as we were talking about that earlier.
Basically, the idea is that instead of making message passing a fixed feature of the VM design, you would associate one or more "meta-objects" with each object, and they are the ones that define what happens. Note that classes in Smalltalk and maps in Self are already examples of meta-objects, but we could add many more. Jeff McAffer added the following seven meta-objects to make a more reflective Smalltalk called CodA:
http://web.yl.is.s.u-tokyo.ac.jp/members/jeff/research/coda.html
1) send - starts the message transfer process
2) accept - interacts on behalf of the receiver with the sender's "send" meta-object
3) queue - can decouple accepting a message from executing it
4) receive - when the object is ready, the message is removed from the queue and processed
5) protocol - knows how to get the right method based on the message selector
6) execution - given a method, it knows how to get the receiver object to actually execute it
7) state - knows how to access instance variables in the receiver
Each of these meta-objects is a real Smalltalk object, with all that implies. In particular, it implements a set of methods which are invoked by the Virtual Machine.
If you associate every object in the system with the default meta-objects, you get a system that behaves exactly like the normal Smalltalk (and CodA is highly optimized for this case). The standard "send" and "accept" meta-objects implement a synchronous message passing which blocks the sender. The "accept" invokes "protocol" directly (bypassing "queue" and "receive") which implements the regular method lookup with a cache. And "execution" simply invokes the Virtual Machine to interpret the method.
If you need something different, all you have to do is define one or more new meta-objects and ask the system to associate your objects with them. For example:
ProxyAccept>>accept: message for: base
    | ra |
    ra := (base meta state instVarAt: 'representee') meta accept.
    ^ ra accept: message for: base
Associating this meta-object with a normal object that has an instance variable called 'representee' will make it into a proxy object in a much cleaner way than #doesNotUnderstand:
As I mentioned in my other email, this is very complicated. The error ends up in a method in the process object, and it checks if the object happens to understand the message 'undefinedSelector:...'
I'd do it the other way round. First let the VM determine whether there's a matching method. If yes, call it directly without notifying the process (or thread object, as I would probably call it)
That is how it is now. In fact, if inlining happens like in my example at the start of this email, then the message send is much more direct (it may even have been optimized away completely!!).
Otherwise, it's the right way (at least the way I'd have expected) to notify the process that there's an object that doesn't understand what it should.
That is what happens. And then the process (in Self code) takes care of it.
BTW, what if there's no process object, or that object doesn't understand the right message? Kernel panic? ;-) Perhaps the right time to reinvent the "guru meditation"...
There are a few "kernel panics" in Self 4.0, but message sending doesn't have such problems. There is always a process object, even in a very empty Self world (but not necessarily a scheduler object, though).
[PICs are very hard to integrate into an interpreter]
if (receiver.map != map_I_expect_here) {
    push(receiver);
    map_I_expect_here = receiver.map;
    method = do_the_normal_lookup(selector);
}
invoke_remembered(method);
// "map_I_expect_here" and "method" are static local vars
// need additional storage for "prev_cache" and "next_cache"
which would normally be some kind of assembler.
I didn't understand where this code is supposed to go. If you compile bytecodes to native machine language, then you can insert this sort of thing there. But if we are talking about an interpreter, how does this piece of code get associated with the right message-send-bytecode?
Jecel
self-interest@lists.selflanguage.org