[self-interest] Caching method lookup smalltalk

Sun Aug 29 15:27:35 UTC 1999

Hi!

Currently, my interpreter lookup all methods every time they are called.
I'd like to cache this somehow, but I'm unsure whether this works as lined
out below.

My approach is to assume a static inheritance hierarchy and then flush
caches in all cases were this assumption was wrong.  So far so easy.

It's a well known optimization for Smalltalk systems, to cache

<class, selector> -> <method>

mappings.  Class is the receiver's class, not the method's implementation
class.  This means, more than one key could reference to the same method.

This kind of cache can remove about 80%-95% of all method lookups,
depending on its size, the quality of the hashing algorithm used and of
course the application.

Can I simply replace "class" with "map object" in Self and use the same
caching with the same results?  Map objects are used to share common slot
layout and behavior of objects.  They're transparent to the user.

Now, using #doesNotUnderstand in Smalltalk is not that uncommon and using
DNU should be cachable, too.  I'm not sure how to extend the mechanism, but
I think, mapping

<myclass, #foobar> -> <Object>>doesNotUnderstand:>

after you noticed that #foobar doesn't exists, should be sufficient. Am I
wrong?  I recently read in comp.lang.smalltalk, that DNUs aren't slower
than normal method calls anymore in most Smalltalk system so it can be
optimized.  But how?

By the way, how does Self notify the user?  

Are 'doesNotUnderstand:' and 'ambiguousMessage:' (to make up two names, I
don't know the right names at the moment) part of "defaultBehavior"?  What
if an object doesn't implement or inherit these messages?  Are these
messages sent to another object?

Let's assume it would work so far.  What if the inheritance graph changes?
Flushing all caches would be a solution (of course) but can we find some
better way please?

If a map gets or loses a parent slot or a parent slot's value is modified,
we have to flush all cache entries for that map and all maps which inherit
from that map (further complicated because of the fact that inheritance
isn't a nice tree but directed graph, not even anti-cyclic).

To find them, we must either enumerate all maps and search or we must store
back links, a list of direct submaps for each map.  This seems to be
difficult especially for dynamic parent slots, to which you could assign
new values at any time.

Are there better ways to cache slot lookup?  

Polymorphic inline caches are said to give even better caching results but
there are more difficult to flush I think, because they are distributed
over the whole code.  You'd probably have to maintain a linked list of
these caches.  

BTW, what's the best way to handle slots?

The original Self paper describes map objects as sequential lists (or
vectors) of slots.  Searching for a slot means to compare selector
addresses in sequental order.  A very fast operation but unfortunately, but
with a complexity of O(n/2).

Smallalk traditionally uses hash tables (IdentityDictionaries actually).
While it took some time to compute the hash code, it should be faster as
typically a (good) hashing needs about 2 tries (it's O(2)).  On the other
hand, hash tables need more space (about twice as much, but minimal 25%-33%
overhead).  For some 25000 methods, this means about 100k.

We've also to keep in mind, that if the lookup caching works, lookup is
only performed in 10%-5% of all cases.  So lookup speed isn't that critical
perhaps.

bye
--
Stefan Matthias Aust  //  Bevor wir fallen, fallen wir lieber auf.