Is there a list for discussing prototype languages in general?
I am interested in figuring out some details concerning languages like Omega, Kevo, and Obliq that prefer embedding over delegation. The idea of embedding never really made sense to me until I read some slides by Luca Cardelli ("Object based vs Class based Languages", PLDI 96 Tutorial). Some things still seem problematic with the approach, but I like a few things about it: it is easy to make dispatch fast, and objects are self-sufficient. For me, fast dispatch is important since I am currently using a Smalltalk-like language with constant-time dispatch.
It seems that change management would be a nightmare when you want to update a method that was inherited by a lot of objects. This might be solved by using "become:"? The slides contain this quote: "In embedding based languages such as Kevo and Omega, pervasive changes are achieved even without donor hierarchies." How?
Also it seems more difficult to use inheritance in certain "part-of" situations (as was done in NewtonScript, for example), or doing so would cause a lot of unnecessary space usage.
This quote is also there: "Super and override make sense for implicit inheritance but not for explicit inheritance." But don't you still need to call the 'super' method in the (explicitly) overriding method sometimes?
Another observation is that if you have doesNotUnderstand: then it is trivial to reimplement parent delegation in an embedding-only language. So why not build it into the dispatch routine anyway?
I can't seem to find any info at all on the web on Kevo other than brief mentions. Omega seems very out of date. I'm unsure of the status of Obliq.
On Monday 30 September 2002 14:16, James McCartney wrote:
Is there a list for discussing prototype languages in general?
There were one or two attempts to start one. Currently there is a more generic "langsmiths" list at Yahoo groups where you can find interested people. But posting here is probably a good way to reach people interested in prototype based languages.
I am interested in figuring out some details concerning languages like Omega, Kevo, and Obliq that prefer embedding over delegation. The idea of embedding never really made sense to me until I read some slides by Luca Cardelli ("Object based vs Class based Languages", PLDI 96 Tutorial).
Thanks for the reference. I hadn't seen it before and it is very interesting:
http://research.microsoft.com/Users/luca/Slides/PLDI96Tutorial.pdf
Some things still seem problematic with the approach, but I like a few things about it: it is easy to make dispatch fast, and objects are self-sufficient. For me, fast dispatch is important since I am currently using a Smalltalk-like language with constant-time dispatch.
And I am implementing a version of Self with constant time dispatch (1 clock, not counting cache misses). It doesn't have dynamic inheritance and wastes memory. You can say that "under the hood" it uses embedding instead of delegation. But it looks like normal Self at the programming level.
It seems that change management would be a nightmare when you want to update a method that was inherited by a lot of objects. This might be solved by using "become:"? The slides contain this quote: "In embedding based languages such as Kevo and Omega, pervasive changes are achieved even without donor hierarchies." How?
The main complication is how to express such sweeping changes in these languages. At the implementation level it doesn't look like much of a problem to me if you can accept that such changes might take a second or two.
Also it seems more difficult to use inheritance in certain "part-of" situations (as was done in NewtonScript, for example), or doing so would cause a lot of unnecessary space usage.
Again, space usage problems can be solved with implementation tricks. But embedding languages have a more static feel to them, in my opinion, and wouldn't work as well in the kind of programming that NewtonScript was used for.
This quote is also there: "Super and override make sense for implicit inheritance but not for explicit inheritance." But don't you still need to call the 'super' method in the (explicitly) overriding method sometimes?
Indeed, but by definition you already have some notation for talking about the methods/attributes that you are explicitly inheriting. So you can use that notation instead of something more "magic" like "super".
Another observation is that if you have doesNotUnderstand: then it is trivial to reimplement parent delegation in an embedding-only language. So why not build it into the dispatch routine anyway?
Sure, but then it becomes a delegating language instead of an embedding one, right? Note that you might not be able to fully emulate delegation without some language support - in "real" delegation the value of self is the receiver of the original message, not the receiver of the delegated one.
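The self-binding difference Jecel points out is easy to see in a sketch. This is a hypothetical illustration (Python standing in for an embedding-only language, with __getattr__ playing the role of doesNotUnderstand:; all names here are made up): naive forwarding re-binds self to the parent, while "real" delegation keeps self bound to the original receiver.

```python
class Parent:
    color = "blue"
    def describe(self):
        return self.color        # which "self"? that is the whole question

class NaiveChild:
    color = "red"
    def __init__(self, parent):
        self._parent = parent
    def __getattr__(self, name):
        # doesNotUnderstand:-style forwarding: re-send the message to the
        # parent object, so inside describe, self is now the *parent*
        return getattr(self._parent, name)

class DelegatingChild(NaiveChild):
    def __getattr__(self, name):
        # "real" delegation: fetch the parent's method but keep the
        # original receiver bound as self
        method = getattr(type(self._parent), name)
        return method.__get__(self)

p = Parent()
print(NaiveChild(p).describe())       # "blue": self was rebound to the parent
print(DelegatingChild(p).describe())  # "red":  self is still the child
```

The second variant needs the language's help to look a method up without re-binding the receiver, which is exactly the language support Jecel mentions.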
I can't seem to find any info at all on the web on Kevo other than brief mentions.
Kevo itself (for the Mac) was posted to this list a month or two ago. I have not had much luck finding papers about it. "On the Notion of Inheritance" is interesting, if you haven't already read it:
http://www.csee.umbc.edu/331/resources/papers/Inheritance.pdf
Omega seems very out of date. I'm unsure of the status of Obliq.
My impression is that all interest in languages that weren't slightly patched Java died out in 1997 or so. But I could be wrong...
-- Jecel
Thank you for your informative responses.
On Monday, September 30, 2002, at 03:37 PM, Jecel Assumpcao Jr wrote:
Sure, but then it becomes a delegating language instead of an embedding one, right? Note that you might not be able to fully emulate delegation without some language support - in "real" delegation the value of self is the receiver of the original message, not the receiver of the delegated one.
Yes, however in Obliq you do have a way to explicitly do "real" delegation to another object.
Kevo itself (for the Mac) was posted to this list a month or two ago. I have not had much luck finding papers about it. "On the Notion of Inheritance" is interesting, if you haven't already read it:
http://www.csee.umbc.edu/331/resources/papers/Inheritance.pdf
No I hadn't seen that. Thanks.
Again this is not about Self exactly but I don't think there's any bandwidth problem on this list..
OK I've been curious about this ever since you posted it. If it is not a secret, I wish you would tell more about it. I assume you are flattening parents into a single object. Do you also flatten the instance's own slots along with the parents? It would seem like you'd need to hash in that case, but obviously you're not since you say 1 clock. But I'm not even sure what that means since even an array lookup is not 1 clock.
On Monday, September 30, 2002, at 03:37 PM, Jecel Assumpcao Jr wrote:
And I am implementing a version of Self with constant time dispatch (1 clock, not counting cache misses). It doesn't have dynamic inheritance and wastes memory. You can say that "under the hood" it uses embedding instead of delegation. But it looks like normal Self at the programming level.
On Wednesday, November 6, 2002, at 12:26 PM, James McCartney wrote:
It would seem like you'd need to hash in that case, but obviously you're not since you say 1 clock. But I'm not even sure what that means since even an array lookup is not 1 clock.
A perfect hash would take care of that (I use them in Io), but it doesn't scale to large numbers of slots, which you'd get if you're flattening the inheritance tree.
Cheers, Steve
On Wednesday, November 6, 2002, at 03:43 PM, Steve Dekorte wrote:
A perfect hash would take care of that (I use them in Io)
I'm not really sure that the perfect hashes as you have implemented them really buy you anything. You use a prime sized table and the % operator instead of using a power of two sized table and masking. Mod % is a pretty slow operation. In the time it takes to do the % operation you could probably have chained along once. With a 0.5 load factor and separate chaining, the average chain length is 1.27. So I'd think that your perfect hash is not really winning on performance, but is losing big on space.
On Wednesday, November 6, 2002, at 07:30 PM, James McCartney wrote:
A perfect hash would take care of that (I use them in Io)
I'm not really sure that the perfect hashes as you have implemented them really buy you anything. You use a prime sized table and the % operator
Actually, I don't use a prime size table. The table grows by one until the hash is perfect, so it can be any size above the minimum.
instead of using a power of two sized table and masking.
Memory usage may be a problem there. The important bit about getting a small perfect hash is having a lot of sizes to try out (because the chance of any given small size working is small). There aren't many powers of 2 before you get big numbers.
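The grow-until-perfect search Steve describes can be sketched in a few lines (Python used for illustration only; Io's actual implementation will differ):

```python
def perfect_table_size(keys, hash_fn=hash):
    """Grow the table size by one until no two keys collide, i.e. until
    hash_fn(k) % size is a perfect hash for this particular key set."""
    size = max(len(keys), 1)
    while True:
        slots = {hash_fn(k) % size for k in keys}
        if len(slots) == len(keys):   # every key got its own slot
            return size
        size += 1
```

Trying every size, rather than only primes or powers of two, is what gives the search many candidates and so keeps the resulting table small.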
Mod % is a pretty slow operation.
That's true.
In the time it takes to do the % operation you could probably have chained along once. With a 0.5 load factor and separate chaining, the average chain length is 1.27 . So I'd think that your perfect hash is not really winning on performance, but is losing big on space.
I've found it doesn't take much space as long as the number of slots is low (which is typical when using differential inheritance).
But I'd certainly like to move to another technique if it's better. I'll run a memory & speed comparison if you send me a piece of your sample code (just the hash code) and post the results.
Cheers, Steve OSX freeware and shareware: http://www.dekorte.com/downloads.html
On Wednesday, November 6, 2002, at 10:24 PM, Steve Dekorte wrote:
I've found it doesn't take much space as long as the number of slots is low (which is typical when using differential inheritance).
For what I'd like to do, which is flattening the parents, the number of slots would be large for the parent.
But I'd certainly like to move to another technique if it's better. I'll run a memory & speed comparison if you send me a piece of your sample code (just the hash code) and post the results.
The Lua hash table lookup scheme looks pretty good. It is separate chaining fitted into an open-addressing table. I would keep my load factor lower than they do, though, to keep the chains shorter. I'd be curious how your scheme compares in speed to the way Lua does it.
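The chained-scatter idea behind Lua's tables can be sketched like this (a simplified illustration in Python; the names are made up, and real Lua additionally relocates a colliding element that is not in its main position, which keeps chains from coalescing):

```python
class ScatterTable:
    """Chained scatter table: collision chains are threaded through the
    array itself via a parallel 'next' index, instead of heap-allocated
    chain nodes."""

    def __init__(self, size):
        self.keys = [None] * size
        self.vals = [None] * size
        self.next = [-1] * size      # -1 marks the end of a chain
        self.free = size             # scan position for free slots

    def _main_position(self, key):
        return hash(key) % len(self.keys)

    def _free_slot(self):
        while self.free > 0:
            self.free -= 1
            if self.keys[self.free] is None:
                return self.free
        raise MemoryError("table full; a real table would grow and rehash")

    def set(self, key, val):
        i = self._main_position(key)
        if self.keys[i] is None:     # main position free: just take it
            self.keys[i], self.vals[i] = key, val
            return
        j = i                        # walk the chain, updating in place
        while True:
            if self.keys[j] == key:
                self.vals[j] = val
                return
            if self.next[j] < 0:
                break
            j = self.next[j]
        k = self._free_slot()        # append a new link to the chain
        self.keys[k], self.vals[k] = key, val
        self.next[j] = k

    def get(self, key):
        i = self._main_position(key)
        while i >= 0:
            if self.keys[i] == key:
                return self.vals[i]
            i = self.next[i]
        raise KeyError(key)
```

Lookup touches only the array, so the memory behaviour is closer to open addressing, while the average probe count is that of separate chaining.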
On Thursday, November 7, 2002, at 12:16 AM, James McCartney wrote:
On Wednesday, November 6, 2002, at 10:24 PM, Steve Dekorte wrote:
But I'd certainly like to move to another technique if it's better. I'll run a memory & speed comparison if you send me a piece of your sample code (just the hash code) and post the results.
The Lua hash table lookup scheme looks pretty good. It is separate chaining fitted into an open-addressing table. I would keep my load factor lower than they do, though, to keep the chains shorter. I'd be curious how your scheme compares in speed to the way Lua does it.
Me too. Unfortunately Lua's hash code is all mixed up with its VM. It would take some effort to pull it out.
Cheers, Steve OSX freeware and shareware: http://www.dekorte.com/downloads.html
On Thursday, November 7, 2002, at 12:16 AM, James McCartney wrote:
On Wednesday, November 6, 2002, at 10:24 PM, Steve Dekorte wrote:
I've found it doesn't take much space as long as the number of slots is low (which is typical when using differential inheritance).
For what I'd like to do, which is flattening the parents, the number of slots would be large for the parent.
That's true. Btw, I ran some tests today and found the load factor on my perfect hashes is typically around 0.8 for fewer than 20 slots, and above 0.5 for between 20 and 50 slots.
Also, I was thinking about the masking idea. Maybe there's some short combination of binary operations that could provide the variability needed for perfect hashing while being faster than a %. Some tests on my G4 showed % to be about 1/3 the speed of a &. I poked around on the net, but the ones I found were more expensive than %. Any ideas?
Seems like the hw folks should have implemented a mod instruction by now. :-)
Cheers, Steve OSX freeware and shareware: http://www.dekorte.com/downloads.html
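One candidate answer to Steve's question (an outside suggestion for illustration, not something proposed in the thread) is multiply-shift, i.e. Fibonacci hashing: a single multiply and shift replace the modulo, and varying the odd multiplier, rather than the table size, supplies the variability a perfect-hash search needs, at the cost of being restricted to power-of-two tables:

```python
KNUTH = 2654435769                    # ~ 2**32 / golden ratio (Knuth)

def multiply_shift(h, multiplier, bits):
    """Hash h into a table of size 2**bits with one multiply and one
    shift -- no % needed."""
    return ((h * multiplier) & 0xFFFFFFFF) >> (32 - bits)

def find_perfect_multiplier(hashes, bits, tries=4096):
    """Vary the (odd) multiplier instead of the table size until the
    given hash values map with no collisions."""
    for k in range(tries):
        m = (KNUTH + 2 * k) | 1       # keep the multiplier odd
        if len({multiply_shift(h, m, bits) for h in hashes}) == len(hashes):
            return m
    return None
```

Whether this actually beats % on a G4 would of course need measuring; the point is only that the search space of multipliers is as rich as the search space of prime table sizes.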
I claimed a 1 clock dispatch for an embedded Self (recently renamed "Oliver Smalltalk" in an effort to make other Smalltalkers see that we are really a part of their team).
On Wednesday 06 November 2002 18:26, James McCartney wrote:
OK I've been curious about this ever since you posted it. If it is not a secret, I wish you would tell more about it. I assume you are flattening parents into a single object.
There are several tricks that work together. A simple masking function is used to convert an object's pointer into a pointer for its map, so we don't have to look that up in the object's first word or something. The drawback is that sometimes an object has to change maps after a programming change (added or removed a slot) and that means changing the object's pointer, which means scanning the whole memory to update any references to the old pointer.
This is a 16 bit machine so I can't afford to have many objects. So the sources for all of an object's slots are gathered into one string which is stored in the object's map (so it doesn't take up an object pointer of its own). The object's "outliner" gives you a more Self-like view, so this implementation detail is hidden. When you make any programming changes the new string is used to build a new map, which is then compiled and finally any of the object's children are recompiled.
Besides the source string, a map contains a table and machine language code. This table includes a two word entry for every single selector defined in the system. There can be up to 4096 selectors, so this per map table can grow to be as much as 16KB!! But like I said, I have memory to spare. The tables for all maps are always exactly the same size and they grow as you define new selectors.
Most instructions (except for "send") are only 5 bits long, so up to three of them can fit in a 16 bit word. With two words we can have up to 6 instructions or 3 instructions plus a 16 bit literal. When we compile a map, for each possible selector we fill in its two word entry with the right code. If it is a constant slot, we do something like
pushLiteral 42 ; return
while if it is a data slot we might do
read 3 ; return
or an assignment slot
write 3 ; return
and if it is a method slot we can simply stuff the code into those two words if it will fit, or else put the code at the end of the map and put a jump to that code in the two words.
Note that most entries would have nothing defined for them in a given object, so we simply compile code to call the debugger with a "message not found" error. These entries are considered "empty" in what is described next.
Before actually compiling a map, we copy the table and extra code from its first parent (which already includes everything from its parents), then everything from the second parent and so on. Then we compile the map, overwriting anything copied from the parents.
So at runtime, given the pointer to an object and a selector of a message to send to it, we can
1) find the object's map with no memory accesses
2) fetch, from the map found in 1, the first word of the entry corresponding to the selector
3) execute that word
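In software terms this scheme amounts to Selector Table Indexing: every map carries a table indexed by a global selector number, copied from the parents (embedding) and then overwritten with the object's own definitions. A rough Python model (the names here are illustrative, not from the actual implementation):

```python
SELECTOR_IDS = {}                      # selector name -> global index

def selector_id(name):
    return SELECTOR_IDS.setdefault(name, len(SELECTOR_IDS))

class Map:
    def __init__(self, parents=()):
        self.table = []                # one entry per selector in the system
        for p in parents:              # embed: copy parents' entries first...
            for i, fn in enumerate(p.table):
                if fn is not None:
                    self._set(i, fn)

    def _set(self, i, fn):
        while len(self.table) <= i:    # tables grow as selectors are defined
            self.table.append(None)
        self.table[i] = fn

    def define(self, name, fn):        # ...then overwrite with our own slots
        self._set(selector_id(name), fn)

    def send(self, receiver, name, *args):
        i = selector_id(name)
        fn = self.table[i] if i < len(self.table) else None
        if fn is None:                 # an "empty" entry
            raise LookupError("message not found: " + name)
        return fn(receiver, *args)     # dispatch is a single table index
```

The point is that send involves no search at all, only one index into the receiver's map table, at the cost of one table entry per selector per map.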
Do you also flatten the instance's own slots along with the parents?
If you mean that you can't tell what came from the object and what came from the parent, you are right.
What I described above won't work in some cases: selectors that access data slots in the parent and resends. In the first case, code is compiled to explicitly delegate to the parent. It is like adding this local slot:
buttonColor: c = (globals ui preferences buttonColor: c)
I actually got rid of the lobby and globals so this sort of thing wouldn't be so common (it adds a lot of overhead to every single map!).
And about the resends, the compiler simply inlines the parent's source for the method into the spot where the resend was.
It would seem like you'd need to hash in that case, but obviously you're not since you say 1 clock. But I'm not even sure what that means since even an array lookup is not 1 clock.
If we are executing a method in objA and the current top of stack points to objB, we can have these accesses via the cache (which needs a 16 bit object pointer and a 16 bit field offset to find the 16 bit word we want):
clock N:   cache ( map(objA), PCa )   ===> "send selector23"
clock N+1: cache ( map(objB), 23<<1 ) ===> "read 3 ; return"
clock N+2: cache ( objB, 3 )          ===> data slot 3's contents
clock N+3: ----
clock N+4: cache ( map(objA), PCa+1 ) ===> "selfsend selector92"
It is annoying that the return takes 2 clocks (N+3 and N+4). With a little more hardware that can be reduced to 1 clock or even, when the instruction in N+2 doesn't do a memory access (which "read 3" does, but most don't), 0 clocks.
Anyway, what I mean by "1 clock dispatch" is that in clock N we fetched a send instruction and in clock N+1 we were already fetching the first instruction from the right method. This is possible because the hardware is a simple stack machine which isn't pipelined.
-- Jecel
Thanks for the long reply. It seems like you've made things fast for common execution, but have made creating and modifying prototypes very expensive in time and space.
The drawback is that sometimes an object has to change maps after a programming change (added or removed a slot) and that means changing the object's pointer, which means scanning the whole memory to update any references to the old pointer.
[...]
When you make any programming changes the new string is used to build a new map, which is then compiled and finally any of the object's children are recompiled.
Besides the source string, a map contains a table and machine language code. This table includes a two word entry for every single selector defined in the system. There can be up to 4096 selectors, so this per map table can grow to be as much as 16KB!! But like I said, I have memory to spare. The tables for all maps are always exactly the same size and they grow as you define new selectors.
So this is the equivalent, in a class-based system, of having a full all-classes-by-all-selectors matrix.
And about the resends, the compiler simply inlines the parent's source for the method into the spot where the resend was.
So you can't reassign a method slot? Or if you do you have to rebuild the map of all children..
On Friday 08 November 2002 03:57, James McCartney wrote:
Thanks for the long reply. It seems like you've made things fast for common execution, but have made creating and modifying prototypes very expensive in time and space.
As long as "very expensive in time" is less than two seconds, it seems like a good choice for this particular project.
So this is the equivalent, in a class-based system, of having a full all-classes-by-all-selectors matrix.
Yes, what is called "Selector Table Indexing" (STI) in "Message Dispatch on Pipelined Processors" by Karel Driesen, Urs Hölzle and Jan Vitek (ECOOP '95 conference proceedings). Since the table is less than 5% full for a typical Smalltalk system, the authors considered STI a purely theoretical alternative to be used as a baseline for comparison with the practical ones.
This is "very expensive in space", but I can't buy a cheaper memory chip than an 8MB SDRAM, yet mass storage is less than 1MB of Flash memory. So I can waste space.
And about the resends, the compiler simply inlines the parent's source for the method into the spot where the resend was.
So you can't reassign a method slot? Or if you do you have to rebuild the map of all children..
You can change a method slot using reflection, if that is what you mean by "reassigning" it. You have to rebuild the maps for all of the children objects, which makes it too slow an operation to be used in the middle of a program. But the pause is acceptable after pressing the green button in an editor.
My implementation does not keep the extensive dependency lists that Self 4 does, so any change at all to an object requires the children to be recompiled, even if they end up exactly the same after the recompilation.
-- Jecel
But those authors pack their tables in the paper I read. In fact I use exactly their selector based row displacement packing technique in my own vm.
On Friday, November 8, 2002, at 10:55 AM, Jecel Assumpcao Jr wrote:
Yes, what is called "Selector Table Indexing" (STI) in "Message Dispatch on Pipelined Processors" by Karel Driesen, Urs Hölzle and Jan Vitek (ECOOP '95 conference proceedings). Since the table is less than 5% full for a typical Smalltalk system, the authors considered STI a purely theoretical alternative to be used as a baseline for comparison with the practical ones.
In Self this operation is considered reflective, but I'm not sure why. I would consider assigning methods to slots to be a normal kind of operation. It is a more specific way to change behaviour than dynamic inheritance via assigning parents, which I would think would be the more reflective operation.
On Friday, November 8, 2002, at 10:55 AM, Jecel Assumpcao Jr wrote:
You can change a method slot using reflection, if that is what you mean by "reassigning" it.
On Friday 08 November 2002 17:35, James McCartney wrote:
In Self this operation is considered reflective, but I'm not sure why. I would consider assigning methods to slots to be a normal kind of operation. It is a more specific way to change behaviour than dynamic inheritance via assigning parents, which I would think would be the more reflective operation.
Methods are considered constants in Self. If I have an object like this:
( | a <- 3. b = 4 | )
then changing the value of "a" can be done with a simple message send and is considered "normal programming", while changing the value of "b" requires either a primitive (_AddSlots:) or using a mirror on this object, either of which would be considered "meta programming".
So though both "a" and "b" can be changed, having the programmer explicitly give the system a tip about how they will probably be used allows the compiler to do a better job. So does having all method slots be constant.
One change in Self 4.1.6 compared to previous releases is that now you have the option of storing method objects in data slots.
One complication of having assignable method slots is that though assignment is unambiguous enough, what happens when you read from the slot? Do you get the method object as the result or is the method automatically executed? No matter which one you choose as the default, you will need a special scheme for the other.
In another email James McCartney wrote:
But those authors pack their tables in the paper I read. In fact I use exactly their selector based row displacement packing technique in my own vm.
The paper I mentioned can be found at http://www.cs.ucsb.edu/labs/oocsb/papers.shtml though for some reason I can't access that right now.
All compression schemes require you to add some extra tests to your dispatch sequence. Normally it is well worth it. But I have a machine running at 57MHz with lots of extra memory that I can't otherwise use, so I don't compress.
-- Jecel
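The row displacement packing mentioned in this exchange can be sketched as follows (generic row displacement in Python, for illustration; the selector-based variant in the Driesen et al. work simply transposes the matrix so that rows are selectors rather than classes, which packs much tighter):

```python
def pack_rows(rows, empty=None):
    """Pack the sparse rows of a dispatch matrix into one master array:
    each row gets an offset at which its occupied columns land only on
    free master slots.  Lookup is then master[offset[row] + column];
    a real system also stores a check value per entry to detect the
    'message not found' holes."""
    master, offsets = [], []
    for row in rows:
        cols = [c for c, v in enumerate(row) if v is not empty]
        off = 0
        while any(off + c < len(master) and master[off + c] is not empty
                  for c in cols):
            off += 1                  # slide the row along until it fits
        while len(master) < off + len(row):
            master.append(empty)
        for c in cols:
            master[off + c] = row[c]
        offsets.append(off)
    return master, offsets
```

Compared with the full STI matrix, this trades the extra check per dispatch for a table whose size is close to the number of defined entries rather than rows times columns.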
At OOPSLA there was a paper titled "Fast Algorithm for Creating Space Efficient Dispatching Tables with Application to Multi-Dispatching", described at http://oopsla.acm.org/fp/files/pap-4-optimizations.html and available on the author's homepage at http://www.cs.technion.ac.il/~zyoav/
This work looks very promising but requires some time to stop and freeze the objects (and the relevant protocols) involved. Possibly the best way to handle it would be to allow a collection of objects to use one meta-object to describe and implement that lookup algorithm for them all collectively.
On Saturday 09 November 2002 04:49, Brian T Rice wrote:
[OOPSLA paper] This work looks very promising but requires some time to stop and freeze the objects (and the relevant protocols) involved.
They claim that for even very large type hierarchies this only takes a small fraction of a second on a fast machine. And that they have an incremental version of their system (not described in the paper, I think) where this would be even less of a problem.
Possibly the best way to handle it would be to allow a collection of objects to use one meta-object to describe and implement that lookup algorithm for them all collectively.
This would work very well with their system if you have one meta-object per "type partition", but I think it is the overhead of grouping objects into partitions that you are trying to eliminate, right?
The numbers in the paper were very interesting, though I think that there are far more "types" and messages in Self 4 than what they evaluated. I wonder if that would have any effect on the results.
-- Jecel
On Tuesday, November 12, 2002, at 06:38 AM, Jecel Assumpcao Jr wrote:
[OOPSLA paper] This work looks very promising but requires some time to stop and freeze the objects (and the relevant protocols) involved.
They claim that for even very large type hierarchies this only takes a small fraction of a second on a fast machine. And that they have an incremental version of their system (not described in the paper, I think) where this would be even less of a problem.
That is interesting, because they say that in the majority of cases the table creation time was less than 1/100 of a second. For my own work, music, even a pause of several milliseconds is too long, but this algorithm might be fast enough. I'm looking for constant time lookup and incremental update of inheritance. They say that they sacrifice constant time lookup with this scheme. I'll have to read the paper now..
On Tuesday 12 November 2002 19:02, James McCartney wrote:
That is interesting, because they say that in the majority of cases the table creation time was less than 1/100 of a second. For my own work, music, even a pause of several milliseconds is too long, but this algorithm might be fast enough.
You need to create the table only when first reading in the system into an "empty world". You probably won't be playing music just then.
An option would be to throw away the table and build a new one every time the programmer makes any changes, and with this system even this would be fast enough. If you just patch the tables then the pause will be far shorter.
I'm looking for constant time lookup and incremental update of inheritance. They say that they sacrifice constant time lookup with this scheme. I'll have to read the paper now..
They have a small binary search. In Self we have a linear search in the PICs (polymorphic inline caches) so we don't have constant lookup times either. In fact, no processor with caches will ever really have constant lookup times.
-- Jecel
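The polymorphic inline cache Jecel mentions is essentially a short per-call-site list of (map, method) pairs searched linearly before any full lookup. A minimal model (Python for illustration, with the receiver's class standing in for its map):

```python
class PIC:
    """Per-call-site cache: linear search over a few (map, method)
    pairs, falling back to a full (slow) lookup on a miss and caching
    the result for next time."""

    def __init__(self, full_lookup, limit=8):
        self.entries = []
        self.full_lookup = full_lookup
        self.limit = limit

    def dispatch(self, receiver, *args):
        m = type(receiver)                  # stand-in for the object's map
        for klass, fn in self.entries:      # the linear search
            if klass is m:
                return fn(receiver, *args)
        fn = self.full_lookup(m)            # slow path on a cache miss
        if len(self.entries) < self.limit:
            self.entries.append((m, fn))
        return fn(receiver, *args)
```

Hit time grows with the number of receiver types seen at the site, which is why lookup through a PIC is linear rather than constant time.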
On Tuesday, November 12, 2002, at 02:32 PM, Jecel Assumpcao Jr wrote:
On Tuesday 12 November 2002 19:02, James McCartney wrote:
That is interesting, because they say that in the majority of cases the table creation time was less than 1/100 of a second. For my own work, music, even a pause of several milliseconds is too long, but this algorithm might be fast enough.
You need to create the table only when first reading in the system into an "empty world". You probably won't be playing music just then.
Well that is what I am doing now. However I would like to have a system where new types and slots can be added dynamically while I *am* playing music. That is what I am looking for. The only solution that is truly incremental so far seems to be hashing.
An option would be to throw away the table and build a new one every time the programmer makes any changes, and with this system even this would be fast enough. If you just patch the tables then the pause will be far shorter.
I'm already using row displacement dispatch tables which can do this. I just cannot do it while playing.
I'm looking for constant time lookup and incremental update of inheritance. They say that they sacrifice constant time lookup with this scheme. I'll have to read the paper now..
They have a small binary search. In Self we have a linear search in the PICs (polymorphic inline caches) so we don't have constant lookup times either. In fact, no processor with caches will ever really have constant lookup times.
A memory cache miss is still O(1) time, just with a larger constant factor. But a polymorphic cache miss is not O(1), because you then have to do an actual lookup, which is not O(1).
Jecel Assumpcao Jr wrote:
In fact, no processor with caches will ever really have constant lookup times.
Well, every processor I've ever seen, you can turn the caches off ;-)
-- Jecel
self-interest@lists.selflanguage.org