I have been thinking about resends lately. I don't like them as they are. Here are the statistics for the bytecodes of the 26339 methods in the standard snapshot:
implicitSend  81590
send          62233
literal       27795
pushSelf       4639
return         4158
index          3233
resendOp        598
delegatee        17
This means we have 581 undirected resends and 17 directed ones. Given the small number of directed resends and their awkwardness (two bytecodes), I had worried the most about them.
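Just to make that arithmetic explicit, here is a tiny sketch (Python rather than Self, the names are mine):

    counts = {
        'implicitSend': 81590, 'send': 62233, 'literal': 27795,
        'pushSelf': 4639, 'return': 4158, 'index': 3233,
        'resendOp': 598, 'delegatee': 17,
    }
    # A directed resend takes two bytecodes (resendOp plus delegatee),
    # so the undirected resends are the remainder.
    directed = counts['delegatee']
    undirected = counts['resendOp'] - directed
    print(directed, undirected)   # 17 581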
But now I am wondering whether we really need the undirected resends at all. They were certainly important when we had parent priorities in early Selfs and complex tie-breaking rules, but in modern Self it would be no big deal to always specify in which parent the lookup should proceed. Moving slots around so that this parent would change is much less likely to happen now.
The reason I am thinking about this is that I would like to replace resends with a more general explicit delegation mechanism. Where we now have
comparisonParent.isLessThan: arg
we could do
(helpers at: i).isLessThan: arg
Hmm... I remember that it was possible to do this with primitives in early Selfs, but I can find nothing like it in either Self 3 or Self 4. Anyway, the use of "." for the resend syntax was a needless complication. Maybe the more popular "::" would be easier to parse?
Anyway, directed resends fit in perfectly with the semantics of delegation (lookup in another object but use me as "self"), while undirected resends do not (lookup in me but skipping any slots with the same name).
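For readers more used to class-based languages, here is a rough analogy (Python, my own illustration, not Self) of what a directed resend does: the method is looked up in an explicitly named parent, but the original receiver stays bound as self.

    class ComparisonParent:
        def is_less_than(self, other):
            return self.value < other.value

    class Point(ComparisonParent):
        def __init__(self, value):
            self.value = value

        def is_less_than(self, other):
            # "Directed resend": look the method up in a specific parent,
            # but keep the current receiver bound as self.
            return ComparisonParent.is_less_than(self, other)

    print(Point(1).is_less_than(Point(2)))   # True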
-- Jecel
Hi!
Jecel wrote three weeks ago:
Here are the statistics for the bytecodes of the 26339 methods in the standard snapshot:
implicitSend  81590
send          62233
literal       27795
pushSelf       4639
return         4158
index          3233
resendOp        598
delegatee        17
Which made me think about the encoding again. We can use these numbers to do some math. Yeah!
A Self method object looks like this, I think:
(| bytecodes. literals. |)
where bytecodes contains a byteVector and literals a vector (or nil). As I don't know how many methods contain nil there, I'll assume every method object has at least an empty vector. Let's further assume that every normal object has an object header of 8 bytes (4 bytes = 1 word = reference to the map, plus 4 bytes for hash value, GC flags and type flags). Vectors take 12 bytes (an additional 4 bytes for the size), but let's assume we're using a compact encoding for short vectors. (Jecel even mentioned a 16-byte object header.)
That means that, for 26339 methods, there's a VM overhead of (at least) 26339*3*8 = 632136 bytes. Reducing a method object to a single 32-bit-word object (a wordVector) would save (at most) 66%, 26339*2*8 = 421424 bytes, needing only 210712 bytes.
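A quick check of those numbers (Python, using the assumptions just stated):

    methods = 26339
    header  = 8    # assumed bytes of header per object

    # Today: method object + bytecodes byteVector + literals vector.
    current  = methods * 3 * header    # 632136 bytes
    # Proposed: one wordVector per method.
    proposed = methods * 1 * header    # 210712 bytes

    print(current, proposed, current - proposed)   # 632136 210712 421424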
This is already quite impressive.
Now, Self needs 183648 bytes for its bytecode instructions (ignoring resends). For Jecel's encoding and for mine, this is more difficult to calculate. Let's go on.
Each method needs about 7 bytecodes. However, Jecel uses only sends and pushes, and needs to add a return to every method, not only to the 4158 that have an explicit one. So I'll assume 202596 instructions (183648 - 4158 - 3233 + 26339: dropping the explicit return and index bytecodes and adding one return per method). This is about 7.7 instructions per method, which means we can still assume that one word of instructions per method is enough in general. This sums up to 26339 * 4 = 105356 bytes. Calculating the size for the literals is easy again, as every push and send instruction needs exactly 4 bytes: 176257 * 4 = 705028 bytes (that is the 81590 + 62233 + 27795 + 4639 send and push references).
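Spelled out as a sketch (Python; the one-word-of-instructions-per-method assumption is the one stated above):

    methods   = 26339
    bytecodes = 183648                 # all bytecodes, resends ignored

    # Only sends and pushes survive as instructions: drop the 4158 explicit
    # returns and the 3233 index bytecodes, add one return per method.
    instructions = bytecodes - 4158 - 3233 + methods        # 202596

    instr_bytes   = methods * 4        # ~7.7 instructions packed into one word per method
    literal_refs  = 81590 + 62233 + 27795 + 4639             # sends, pushes of literals and self
    literal_bytes = literal_refs * 4                          # 705028

    print(instructions, instr_bytes, literal_bytes)           # 202596 105356 705028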
So Jecel's approach needs about 210712 (VM overhead) + 105356 (instructions) + 705028 (literals) = 1021096 bytes or about 997k.
Compared to the original Self, that is 76k less for instructions, but at least 18k more for literals (4639 * 4 = 18556 bytes for the pushSelf references), and probably much more, since the original can share literals. The big saving of ~410k comes from the reduced VM overhead. The net encoding saving is less than 60k!
I can't really calculate more for the original Self, as I don't know how often instructions can share a literal or how many methods have literals at all. 93% of all instructions need literals, which means about 6.5 literal references per method. My VisualWorks Smalltalk has 34066 methods; only 9787 of them don't share literals. That is, 20% share one literal and 51% share two or more. However, the typical VisualWorks method has 19 bytecodes. So let's assume that about 50% of all Self methods share at least one symbol. That is, of the 171618 literal references in the 26339 methods, we can remove at least 13170 references, or 13170 * 4 = 52680 bytes. Compared to Jecel, the net saving is reduced to about 8k. Am I right?
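The same estimate as a sketch (the 50% sharing figure is the assumption above, not something measured in Self):

    # Assumption from the VisualWorks data: about half of the 26339 methods
    # share at least one symbol, saving one 4-byte literal reference each.
    shared      = 13170
    saved_bytes = shared * 4               # 52680

    # Net advantage of Jecel's encoding over the original, roughly:
    instr_saving  = 183648 - 105356        # instruction bytes saved
    pushself_cost = 4639 * 4               # pushSelf now needs a literal reference
    print(saved_bytes, instr_saving - pushself_cost - saved_bytes)   # 52680, then roughly 7-8k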
Now to my suggestion. I need 202596 instructions, just like Jecel, but these are all single bytes which need to fit into words. 7.7 instructions per method will fit into two words: 26339 * 8 = 210712 bytes. Let's look at VisualWorks again: its 34066 methods contain 304607 message sends or other symbol literals. These are the top 10 message sends:
#+        6561
#==       5977
#new      4478
#=        4172
#at:put:  4051
#-        3292
#at:      3272
#size     2702
#@        2635
#value:   2328
The top 63 sends account for 92945 sends, or 30% of all sends. This means we can remove at least 30%, or 43147, of the 143823 send literals. The other 27795 literals are probably numbers, strings or something else. Because -1, 0, 1 and 2 can be encoded as bytecodes, I'll remove an additional 10%, or 2780 references. In sum: (100676 + 25015 = 125691) * 4 = 502764 bytes.
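And the corresponding sketch for this encoding (the 30% and 10% figures are the estimates above):

    send_refs  = 81590 + 62233          # 143823 implicit and explicit sends
    other_lits = 27795

    # The 63 most frequent selectors get their own bytecodes (~30% of the
    # send literals), and -1, 0, 1, 2 become bytecodes too (~10% of the rest).
    send_refs_left  = send_refs - 43147     # 100676
    other_lits_left = other_lits - 2780     # 25015

    literal_bytes = (send_refs_left + other_lits_left) * 4     # 502764
    total = 210712 + 210712 + literal_bytes                    # overhead + instructions + literals
    print(literal_bytes, total, round(total / 1024))           # 502764 924188 903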
So my approach needs about 210712 (VM overhead) + 210712 (instructions) + 502764 (literals) = 924188 bytes or about 903k.
Better but still disappointing.
I could save a few bytes by using this tricky encoding:
Bytecode instructions are stored from the end of the combined vector and literals from the beginning. That way you don't have to interweave the two and you don't lose memory to padding to 4 bytes. Why didn't I think of this earlier? (This would also work for Jecel's idea, to be fair.) It would save the tremendous amount of some 8116 bytes. Cool.
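A minimal sketch of that layout (Python, my own illustration; 32-bit little-endian literal references assumed, object header and alignment ignored):

    import struct

    def pack_method(literals, bytecodes):
        # One combined per-method vector: literal words grow from the front,
        # instruction bytes fill it from the back, so the two never interleave.
        size = len(literals) * 4 + len(bytecodes)
        buf = bytearray(size)
        for i, lit in enumerate(literals):
            struct.pack_into('<I', buf, i * 4, lit)
        buf[size - len(bytecodes):] = bytes(bytecodes)
        return bytes(buf)

    # 3 literal references plus 7 instruction bytes take exactly 19 bytes.
    print(len(pack_method([0x100, 0x104, 0x108], [1, 2, 3, 4, 5, 6, 7])))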
To wrap up: if you really want to save space, don't bother with the bytecodes of all these little methods; either write big methods (uuh!) or reduce the VM object overhead.
Actually, you don't need any VM overhead at all if you add the code directly to the map object. This would probably add some work for the garbage collector, but otherwise it would free up the complete VM overhead of 210712 bytes.
bye -- Stefan Matthias Aust // Bevor wir fallen, fallen wir lieber auf.
self-interest@lists.selflanguage.org