Hi!
I've set up a web page for my first version of ``mySelf´´, my Self parser and simulator written in Smalltalk! If you like, check out
http://freeweb.digiweb.com/computers/sma/
MySelf works both for VisualWorks Smalltalk and Squeak, however the Squeak port is really crude. As Squeak's syntax unfortunately doesn't conform with the "standard", I had to work around some problems, mainly the fact that "_" isn't a valid character in message names. Therefore, I recommend VisualWorks NC, which is also _much_ faster.
Happy New Year! bye -- Stefan Matthias Aust // Are you ready to discover the twilight zone?
------------------------------------------------------------------------ eGroup home: http://www.eGroups.com/list/self-interest Free Web-based e-mail groups by eGroups.com
I'm back from vacation and only now reading my December email. I'm glad to see people having fun with the Self syntax.
As you have already found out, '^' and '|' are NOT valid binary selectors.
About expressions in parentheses, in Self 1.0 they WERE parsed as method objects, though they were tagged as "inner methods" as opposed to the normal "outer methods". So the expression
3 + ( 4 * 5 )
would result in the following bytecodes:
    push literal ( | | 4 * 5 )
    push literal 3
    send '+'
Just as pushing a block does something special (it actually creates a block context and pushes that instead), the push inner method resulted in pushing the result of evaluating that method.
This was eliminated in Self 2.0 or 3.0, and I patched my tinySelf0 parser (in C) to handle this by testing for inner methods that had exactly one expression and no slots and "expanding" them so that the following bytecodes would now be generated:
    push literal 5
    push literal 4
    send '*'
    push literal 3
    send '+'
If you write something like
3 + ( | x=9 | x * 5 )
then tinySelf0 will generate an inner method, but Self 4.0 will complain that inner methods are no longer supported.
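The expansion heuristic described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the tinySelf0 code (which was written in C): the node classes, `is_expandable`, `compile_expr` and the `<inner method>` placeholder are all hypothetical names, and the argument-before-receiver emission order is simply copied from the bytecode listings in this message.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Literal:
    value: object


@dataclass
class BinarySend:
    receiver: "Node"
    selector: str
    argument: "Node"


@dataclass
class Paren:
    """Anything written in parentheses: optional slots plus a list of expressions."""
    slots: List[str]
    body: List["Node"]


Node = Union[Literal, BinarySend, Paren]


def is_expandable(node: Paren) -> bool:
    # The heuristic from the thread: no slots and exactly one expression.
    return not node.slots and len(node.body) == 1


def compile_node(node: Node, out: List[Tuple[str, object]]) -> None:
    if isinstance(node, Literal):
        out.append(("push literal", node.value))
    elif isinstance(node, BinarySend):
        # Argument first, then receiver, to match the listings in the thread.
        compile_node(node.argument, out)
        compile_node(node.receiver, out)
        out.append(("send", node.selector))
    elif isinstance(node, Paren):
        if is_expandable(node):
            # "Expand" the parentheses: compile the single expression inline.
            compile_node(node.body[0], out)
        else:
            # Otherwise the parenthesised code stays an inner method literal.
            out.append(("push literal", "<inner method>"))


def compile_expr(node: Node) -> List[Tuple[str, object]]:
    out: List[Tuple[str, object]] = []
    compile_node(node, out)
    return out
```

Compiling the tree for 3 + ( 4 * 5 ) gives the five expanded bytecodes, while a parenthesised body with a slot such as x=9 stays a single inner method literal.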
Hi!
I've set up a web page for my first version of ``mySelf´´, my Self parser and simulator written in Smalltalk! If you like, check out
http://freeweb.digiweb.com/computers/sma/
MySelf works both for VisualWorks Smalltalk and Squeak, however the Squeak port is really crude. As Squeak's syntax unfortunately doesn't conform with the "standard", I had to work around some problems, mainly the fact that "_" isn't a valid character in message names. Therefore, I recommend VisualWorks NC, which is also _much_ faster.
I'll have a look as soon as I can.
I have downloaded two very interesting programs. Here are the first lines of the README files for them:
--------------------------
Ultimardrev-Self is an interpreter of Self 1.0 implemented in Smalltalk 2.5 by P.Mulet and F.Rivard in 1991.
--------------------------
The PROTOTALK platform for the simulation of prototype-based languages.
--------------------------
I have never run them as I had no access to a ParcPlace Smalltalk and didn't want to port them to some other dialect. It would be interesting to see if they would run on the latest version of VW NC. They don't seem to be available for FTP, though, so I am wondering if I should make them available on my site (contacting the authors to ask them about this will probably be a bit complicated).
Happy New Year! bye -- Stefan Matthias Aust // Are you ready to discover the twilight zone?
-- Jecel
About expressions in parentheses, in Self 1.0 they WERE parsed as method objects, though they were tagged as "inner methods" as opposed to the normal "outer methods".
Then SELF 1.0 had no '( expr )' at all. Everything in () was a new object, however only objects in slot definitions got a 'self' slot. Right?
This was eliminated in Self 2.0 or 3.0, and I patched my tinySelf0 parser (in C) to handle this by testing for inner methods that had exactly one expression and no slots and "expanding" them so that the following bytecodes would now be generated:
Which is the same heuristic as I used. However, I dislike this solution because I'd prefer one which can be expressed with a context free LL(1) grammar.
then tinySelf0 will generate an inner method, but Self 4.0 will complain that inner methods are no longer supported.
But this means the grammar given in the Self 4.0 programmer's reference is wrong (or at least ambiguous). If I understand this correctly, objects with code (aka methods) are only allowed in constant slot definitions and never in general expressions. This would lead to...
    object --> '(' [object-slot-list] ')'
    method --> '(' [object-slot-list] code ')'
    block --> '[' [block-slot-list] [code] ']'
(btw, what's the result of the empty block []?)
    receiver --> [primary]
    primary --> 'self' | '(' expression ')' | constant
    constant --> number | string | object | block
expression --> keyword-message and so on... (see my web site for a full grammar)
    unary-slot --> identifier ['*'] '=' method
    binary-slot --> operator [identifier] '=' method
    keyword-slot --> ... '=' method
data-slot --> identifier ['*'] [('<-' | '=') expression]
and now the only problem is distinguishing a constant data-slot from a unary-slot. (| a = (3 + 4). |) would actually be a method, not a constant.
I have downloaded two very interesting programs. Here are the first lines of the README files for them:
I'd be interested in the Ultimardrev-Self thing, and might even try to port this to VW3 or Squeak, which might be easier.
bye -- Stefan Matthias Aust // Are you ready to discover the twilight zone?
Then SELF 1.0 had no '( expr )' at all. Everything in () was a new object, however only objects in slot definitions got a 'self' slot. Right?
Yes, but these inner methods had an invisible "lexical parent" slot just like block method contexts do. Another strange thing in Self 1.0 was that the first byte in the bytecode vector was used to indicate what kind of method this object was. I really hated that - it was so un-object-oriented.
[patch in tinySelf0 parser]
Which is the same heuristic as I used. However, I dislike this solution because I'd prefer one which can be expressed with a context free LL(1) grammar.
I wish it were possible, but there are several cases in Self where grammar alone won't get the job done. There is a great parser generator for Self called Mango, yet it was never used to generate a parser for Self itself!!
then tinySelf0 will generate an inner method, but Self 4.0 will complain that inner methods are no longer supported.
But this means the grammar given in the Self 4.0 programmer's reference is wrong (or at least ambiguous). If I understand this correctly, objects with code (aka methods) are only allowed in constant slot definitions and never in general expressions. This would lead to...
    object --> '(' [object-slot-list] ')'
    method --> '(' [object-slot-list] code ')'
    block --> '[' [block-slot-list] [code] ']'
(btw, what's the result of the empty block []?)
It is just a block that when sent the 'value' message will return the receiver of the method in which it lexically appears.
    receiver --> [primary]
    primary --> 'self' | '(' expression ')' | constant
    constant --> number | string | object | block
Ok, so here you distinguish between "normal" objects and methods in the grammar. That is probably a good solution for a new parser. Since both mine and Self 4's evolved from Self 1.0, the hack was the best choice.
and now the only problem is distinguishing a constant data-slot from a unary-slot. (| a = (3 + 4). |) would actually be a method, not a constant.
It should only be interpreted as a constant slot if it can't be interpreted as a unary method slot. Actually, I see 'a' as a constant slot where the constant value happens to be a method object!
Did you find handling the '.' character hard? The way things are defined, it is impossible to cleanly separate the lexer and the parser. The problem is that white space is significant:
    x. y: 9.1 + 2 "my y = 11.1"
    x.y: 9.1 + 2 "my x parent's y = 11.1"
    x. y: 9. 1 + 2 "my y = 9"
    x.y: 9. 1 + 2 "my x parent's y = 9"
I also found that the implicit self things really made the parser a bit messy, but I like that enough that it is worth the hassle.
I'd be interested in the Ultimardrev-Self thing, and might even try to port this to VW3 or Squeak, which might be easier.
I will try to upload a compressed version of it (180KB) so you can get it as
http://www.lsi.usp.br/~jecel/ult.tgz
It will certainly be easier to port to VW3 than to Squeak.
-- Jecel
Yes, but these inner methods had an invisible "lexical parent" slot just like block method contexts do. Another strange thing in Self 1.0 was that the first byte in the bytecode vector was used to indicate what kind of method this object was. I really hated that - it was so un-object-oriented.
Well, for the evaluation function, you need to at least detect methods and blocks, as one needs to clone them and create an activation object. How are these objects detected? Currently, I'm checking whether a "self*" or a "(parent)*" slot exists, but that's also a hack. I think I'll introduce a flag in the map.
(btw, whats the result of the empty block []?)
It is just a block that when sent the 'value' message will return the receiver of the method in which it lexically appears.
Okay, so [] is really an abbreviation for [self].
    receiver --> [primary]
    primary --> 'self' | '(' expression ')' | constant
    constant --> number | string | object | block
Ok, so here you distinguish between "normal" objects and methods in the grammar. That is probably a good solution for a new parser.
I wrote my parser from scratch, only using the grammar which was available. I've no fear of actually changing the grammar to improve it, if the language is still Selfish enough :-)
For example, I am right now thinking about restricting the use of "*" to constant slots. Perhaps an unneeded restriction, but I think it's easier to create a Self system which is not only simulated in Smalltalk but which actually runs using the normal Smalltalk (Squeak) VM.
Did you find handling the '.' character hard? The way things are defined, it is impossible to cleanly separate the lexer and the parser. The problem is that white space is significant:
Yes, actually I have three kinds of periods which are handled in the scanner. When reading a number, the scanner will check whether there's a digit following the period. If not, it's a normal period, otherwise it's part of the number. If the scanner finds a period, it also checks the next character. For small letters, a "resendDot" flag is set. It's a bit tricky, and perhaps not the best language design, to make the meaning of the period so dependent on the occurrence of white space.
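The three kinds of period described here can be told apart with one character of lookahead. The following is a minimal sketch of that idea, not the mySelf scanner itself (which is written in Smalltalk): the function name `scan` and the token names are assumptions made for illustration, and the resend test is simplified to "a lowercase letter follows the period".

```python
def scan(source: str):
    """Split a Self-like line into (kind, text) tokens, classifying each period."""
    tokens = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch.isspace():
            i += 1
        elif ch == '"':
            # Comments are written in double quotes; skip to the closing quote.
            j = source.find('"', i + 1)
            i = n if j < 0 else j + 1
        elif ch.isdigit():
            j = i
            while j < n and source[j].isdigit():
                j += 1
            # A period belongs to the number only if a digit follows it.
            if j + 1 < n and source[j] == "." and source[j + 1].isdigit():
                j += 1
                while j < n and source[j].isdigit():
                    j += 1
            tokens.append(("number", source[i:j]))
            i = j
        elif ch.isalpha() or ch == "_":
            j = i
            while j < n and (source[j].isalnum() or source[j] == "_"):
                j += 1
            if j < n and source[j] == ":":
                j += 1
                tokens.append(("keyword", source[i:j]))
            else:
                tokens.append(("identifier", source[i:j]))
            i = j
        elif ch == ".":
            # A lowercase letter directly after the period marks a resend.
            if i + 1 < n and source[i + 1].islower():
                tokens.append(("resendDot", "."))
            else:
                tokens.append(("period", "."))
            i += 1
        else:
            tokens.append(("operator", ch))
            i += 1
    return tokens
```

Run on the four example lines quoted above, the sketch yields a plain period after "x. ", a resendDot in "x.y:", the single number 9.1, and the number 9 followed by a period in "9. 1".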
I also found that the implicit self things really made the parser a bit messy, but I like that enough that it is worth the hassle.
This actually is quite simple in my parser. When looking for a receiver, I expect either an identifier (which might be the special string "self" or not), a number, a string, a block or something with parenthesis. Everything else isn't accepted but nil is returned.
When finally creating a message send, a nil for the receiver expression is accepted as valid result. When looking for arguments, the nil is rejected and an error is issued.
My problem was to detect resends. When reading an identifier, the parser must scan the next token and if that's a period, the resendDot flag must be set. Otherwise, I'd have to rewind and check for a normal message send.
[Ultimardrev-Self]
Thanks.
bye -- Stefan Matthias Aust // Are you ready to discover the twilight zone?
[hated that first byte identified method "type"]
Well, for the evaluation function, you need to at least detect methods and blocks, as one needs to clone them and create an activation object. How are these objects detected? Currently, I'm checking whether a "self*" or a "(parent)*" slot exists, but that's also a hack. I think I'll introduce a flag in the map.
You could have several map subclasses and use that instead of a class. Blocks and Methods would do the right thing when you sent the exact same message to their map. If an object had a generic map instead, something different would happen. Checking for certain slots is a bad use of reflection (though both Self 4.0 and the current tinySelf use it. But tinySelf now uses Self 4.0 objects and maps - when it gets its own in the next version I will change to a more object oriented style).
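A rough sketch of the map-subclass idea follows, in Python rather than in either implementation's language. Every name in it (`Map`, `MethodMap`, `BlockMap`, `SelfObject`, `evaluate`) is hypothetical, and the "activation" is reduced to a dictionary. The only point is that the evaluator sends the same message to every map and never inspects slots.

```python
class Map:
    """Generic map: evaluating a plain object just yields the object itself."""

    def evaluate(self, obj, receiver):
        return obj


class MethodMap(Map):
    """Map for methods: build an activation from the method's slots and run its code."""

    def evaluate(self, obj, receiver):
        activation = dict(obj.slots)   # stand-in for cloning the method object
        activation["self"] = receiver
        return obj.code(activation)


class BlockMap(Map):
    """Map for blocks: do not run the code yet, hand back something to call later."""

    def evaluate(self, obj, receiver):
        def block_value():
            activation = dict(obj.slots)
            activation["self"] = receiver
            return obj.code(activation)
        return block_value


class SelfObject:
    def __init__(self, map_, slots=None, code=None):
        self.map = map_
        self.slots = dict(slots or {})
        self.code = code


def evaluate_slot_contents(obj, receiver):
    # No check for a "self*" or "(parent)*" slot: the map decides what happens.
    return obj.map.evaluate(obj, receiver)
```

A flag in the map, as proposed in the quoted message, would carry the same information; the subclass version just moves the decision into the dispatch.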
(btw, whats the result of the empty block []?)
It is just a block that when sent the 'value' message will return the receiver of the method in which it lexically appears.
Okay, so [] is really an abbreviation for [self].
Oooops! I was very wrong, here. I just tested it and it seems that the empty block gets a new empty object installed as the "method" in the value slot. Sending the 'value' message to the block returns this empty object instead of the receiver of the lexical context. I'd better check these things the next time.
I wrote my parser from scratch, only using the grammar which was available. I've no fear of actually changing the grammar to improve it, if the language is still Selfish enough :-)
I wonder why they didn't use '::' for resends? They added a lot of Cisms to the language anyway, and it would be very familiar to C++ programmers. There would be no confusion with the normal ':' for keyword selectors and the '.' would have been left free for numbers and statement ends.
For example, I am right now thinking about restricting the use of "*" to constant slots. Perhaps an unneeded restriction, but I think it's easier to create a Self system which is not only simulated in Smalltalk but which actually runs using the normal Smalltalk (Squeak) VM.
Having parent data slots (dynamic inheritance - DI) seemed like a good idea but was only used in a very simple example (tree nodes that change behavior if they are empty or not). The problem is that no current implementation handles this very well. I have a solution, but if you eliminate DI you won't miss much.
I also found that the implicit self things really made the parser a bit messy, but I like that enough that it is worth the hassle.
This actually is quite simple in my parser. When looking for a receiver, I expect either an identifier (which might be the special string "self" or not), a number, a string, a block or something with parenthesis. Everything else isn't accepted but nil is returned.
When finally creating a message send, a nil for the receiver expression is accepted as valid result. When looking for arguments, the nil is rejected and an error is issued.
Hmm - that is pretty good. Even if you do this in Self itself, nil can never be receiver at parse time:
nil printString
is parsed as
    selfSend 'nil'
    send 'printString'
so the nil will only appear on the stack at runtime.
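As a small illustration of the implicit-self rule, here is a sketch that compiles a chain of unary messages. It is an assumption-laden toy, not either parser: `compile_unary_chain`, `pushSelf` and `pushLiteral` are made-up names, and only `selfSend` and `send` come from the listing above.

```python
def compile_unary_chain(source: str):
    """Compile 'a b c' style unary chains into (opcode, operand) tuples."""
    words = source.split()
    if not words:
        return []
    first, rest = words[0], words[1:]
    code = []
    if first == "self":
        code.append(("pushSelf",))
    elif first.isdigit():
        # An explicit literal receiver.
        code.append(("pushLiteral", int(first)))
    else:
        # No explicit receiver was found, so the first word is itself a
        # message sent to the implicit self (this covers 'nil' too).
        code.append(("selfSend", first))
    for selector in rest:
        code.append(("send", selector))
    return code
```

With this, "nil printString" compiles to a selfSend of 'nil' followed by a send of 'printString', so nil only shows up at run time.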
My problem was to detect resends. When reading an identifier, the parser must scan the next token and if that's a period, the resendDot flag must be set. Otherwise, I'd have to rewind and check for a normal message send.
My solution was way more ugly. I had to allow an N token lookahead to get things to work. Just a bad hack - I first got things working without the resend and then went back and tried to patch it to work. The lexer had to have a two character lookahead, which I find rather annoying.
In another message you mentioned that it looked easier to port Ultimardrev to Squeak and mentioned some of the difficulties. I agree that ParcPlace Smalltalk probably changed enough, especially in the UI classes, to make things worse than Squeak. You might consider if all this effort will be worth it, though. Self 1.0 was fun, but quite different from 4.0 in several aspects. Even if you don't get it running, I thought that looking at the sources of Ultimardrev might help give you some ideas.
I once considered very seriously changing Squeak into a Self implementation. With the Jitter "compiler", this would actually have a performance very close to the current Squeak. And with a little care, it could be nearly 100% compatible with the Squeak Smalltalk code (see Mario Wolczko's GNU Smalltalk port to Self 4.0 for an example of what is possible).
-- Jecel
Jecel wrote:
You could have several map subclasses and use that instead of a class. Blocks and Methods would do the right thing when you sent the exact same message to their map. If an object had a generic map instead, something different would happen. Checking for certain slots is a bad use of reflection (though both Self 4.0 and the current tinySelf use it. But tinySelf now uses Self 4.0 objects and maps - when it gets its own in the next version I will change to a more object oriented style).
Yes, but... well, I tried to imitate the implementation as described in the various papers and this can't have map classes, just simple chunks of map memory. Therefore, I assumed there must be some undocumented map flags somewhere. I don't want to build on OO techniques where OO might not be available - when creating a SELF VM out of Squeak, for example.
Okay, so [] is really an abbreviation for [self].
Oooops! I was very wrong, here. I just tested it and it seems that the empty block gets a new empty object installed as the "method" in the value slot. Sending the 'value' message to the block returns this empty object instead of the receiver of the lexical context. I'd better check these things the next time.
I'm afraid I like the "[self]"-idea better than replacing "[]" with "[(| |) _Clone]" (or even "[(| |)]").
But wait, what's the value of an empty method? Can I define one? I think, that's impossible, as a method object is only a method if and only if it holds some code. Now when it's impossible to create an empty method, why should it be possible to create an empty block?
I wonder why they didn't use '::' for resends? They added a lot of Cisms to the language anyway, and it would be very familiar to C++ programmers. There would be no confusion with the normal ':' for keyword selectors and the '.' would have been left free for numbers and statement ends.
Hm, "super::someMessage" could mean an undirected resend while "sma::someMessage" would resend someMessage to sma. "::" would become a new operator and it's the responsibility of the parser, to distinguish keywords from "::", which just needs a lookup buffer of 2.
Having parent data slots (dynamic inheritance - DI) seemed like a good idea but was only used in a very simple example (tree nodes that change behavior if they are empty or not). The problem is that no current implementation handles this very well. I have a solution, but if you eliminate DI you won't miss much.
This was my estimation. Furthermore, if I allow direct resends to non-parent-slots, I again get DI through the back door, don't I?
In another message you mentioned that it looked easier to port Ultimardrev to Squeak and mentioned some of the difficulties. I agree that ParcPlace Smalltalk probably changed enough, especially in the UI classes, to make things worse than Squeak. You might consider if all this effort will be worth it, though. Self 1.0
I gave up after the Squeak VM crashed the second time. I always forget that I cannot proceed after changing a method. This is really annoying. I managed to file in all code and even made the SelfParser run so that SelfObject could initialize all globals.
was fun, but quite different from 4.0 in several aspects. Even if you don't get it running, I thought that looking at the sources of Ultimardrev might help give you some ideas.
I browsed through the code and my first impression was (besides noticing that I don't know enough French to understand the comments) that the code needs refactoring. I'd say you can reduce it to half the size. It would also be a good idea to remove some 30 Globals which are added to Smalltalk and to separate UI and Self system.
I once considered very seriously changing Squeak into a Self implementation. With the Jitter "compiler", this would actually have
I'm very interested in this idea, too. Well, perhaps with a slightly different emphasis.
a performance very close to the current Squeak. And with a little care, it could be nearly 100% compatible with the Squeak Smalltalk code (see Mario Wolczko's GNU Smalltalk port to Self 4.0 for an example of what is possible).
You thought about simulating the old Squeak Smalltalk code on a new Self VM, didn't you? I'd like to create a VM which can run both simultaneously. I've no concrete idea yet how to do this, but it should be just an extension. The second step then would remove all Smalltalk-only stuff from the VM, hopefully reducing its complexity.
bye -- Stefan Matthias Aust // Are you ready to discover the twilight zone?
Yes, but... well, I tried to imitate the implementation as described in the various papers and this can't have map classes, just simple chunks of map memory.
Actually, one of the first words in a map is the C++ VTable - its runtime type (class)!
Therefore, I assumed there must be some undocumented map flags somewhere. I don't want to build on OO techniques where OO might not be available - when creating a SELF VM out of Squeak, for example.
Squeak was OO the last time I checked ;-) but if you mean that you want to use their Smalltalk to C translator, then you are right.
I'm afraid I like the "[self]"-idea better than replacing "[]" with "[(| |) _Clone]" (or even "[(| |)]").
But wait, what's the value of an empty method? Can I define one? I think, that's impossible, as a method object is only a method if and only if it holds some code. Now when it's impossible to create an empty method, why should it be possible to create an empty block?
Good point. You can define an empty method, but then it is only a normal object and not a method. Blocks look like methods, but they have a lot more hidden stuff. Since Self allows most objects (including nil) to understand the 'value' message, it wouldn't be such a hardship if empty blocks were forbidden. But the way it is now is reasonable too.
[I wonder why they didn't use '::' for resends?]
Hm, "super::someMessage" could mean an undirected resend while "sma::someMessage" would resend someMessage to sma. "::" would become a new operator and it's the responsibility of the parser, to distinguish keywords from "::", which just needs a lookup buffer of 2.
Why can't the lexer handle this?
[DI is *very* rare]
This was my estimation. Furthermore, if I allow direct resends to non-parent-slots, I again get DI through the back door, don't I?
You can't resend to non-parents, except by using some special primitives. But in that case, your performance won't be very good anyway, right?
I browsed through the code and my first impression was (besides noticing that I don't know enough French to understand the comments) that the code needs refactoring. I'd say you can reduce it to half the size. It would also be a good idea to remove some 30 Globals which are added to Smalltalk and to separate UI and Self system.
The French thing is a complication. It prompted me to eliminate all Portuguese comments from my code a few years ago :-)
You thought about simulating the old Squeak Smalltalk code on a new Self VM, didn't you? I'd like to create a VM which can run both simultaneously. I've no concrete idea yet how to do this, but it should be just an extension. The second step then would remove all Smalltalk-only stuff from the VM, hopefully reducing its complexity.
I would recompile old Squeak code to Self bytecodes. The Jitter translator would generate the same threaded code as before, if all went well. Here is the plan I came up with:
1. subclass ObjectMemory and patch it so it deals with Self style object and headers

2. copy Compiler and patch it so it will parse Squeak sources into Self objects and bytecodes (this involves throwing out a lot of stuff, but see step 4)

3. add a method to (1) so it will use (2) to convert an existing Squeak image into Self format. Now we generate a Selfish Squeak 1.31 image. An alternative would be to use the SystemTracer, but I think that might make me write a lot of code twice.

4. subclass (1) by copying all the Jitter classes (if this had been created as a decent framework, this wouldn't be necessary. But it isn't a perfect world). Modify the copied classes to translate from Self bytecodes to thread code (this mostly means adding back the code we threw out in step 2, but in very different classes and execution time).

5. translate (4) to C and compile it. We now would have a Squeak that looks and works exactly like the original, but is very different inside.

6. copy Compiler yet again, and this time patch it to translate from Self sources to Self bytecodes. Also make a new inspector to deal with Self objects "natively".
   This can be released as a dual Self/Squeak system.

7. fileIn and patch as much code from Self 4.0 as possible. Patch Squeak morphs and stuff to allow more Self 4.0 code to be integrated into the system.
   This can be a second release, and be used as a very slow replacement for Self 4.0.
You can tell when this was from the reference to Squeak 1.31. The need to copy the whole Jitter hierarchy to change it was particularly bothersome, in my opinion. I don't normally miss multiple inheritance, but this was a case when I did.
I never got very far into step 1, for I decided it would be nicer to take the nearly working tinySelf1 and finish that instead. Good plan, now I just have to do it :-)
But it would be great if someone were to create a Self/Squeak like I described above.
Cheers, -- Jecel
Jecel Assumpcao Jr wrote:
[different classes of map objects?]
Actually, one of the first words in a map is the C++ VTable - its runtime type (class)!
I see! Pretty obvious actually. Probably that "I couldn't see the forest for the trees" problem.
Squeak was OO the last time I checked ;-) but if you mean that you want to use their Smalltalk to C translator, then you are right.
It was? Amazing :-)
But you guessed right, I was referring to the translator. The alternative would be of course to extend the translator to support methods using vtable lookup. The week definitely needs more weekends.
Good point. You can define an empty method, but then it is only a normal object and not a method. Blocks look like methods, but they have a lot
No :-) An empty method isn't a method, as a method is a method if and only if it has code. But it's pretty clear what you meant.
Hm, "super::someMessage" could mean an undirected resend while "sma::someMessage" would resend someMessage to sma. "::" would become a new operator and it's the responsibility of the parser, to distinguish keywords from "::", which just needs a lookup buffer of 2.
Why can't the lexer handle this?
You're right. It's of course the scanner (lexer), not the parser. My fault.
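To show that a lexer alone can separate a hypothetical "::" resend operator from ordinary keyword selectors, here is a small regular-expression sketch. The "::" syntax is only the proposal discussed in this thread, not real Self, and `TOKEN` and `tokenize` are illustrative names. The negative lookahead after the colon plays the role of the two-character lookahead buffer.

```python
import re

# Alternatives are tried left to right; a keyword is an identifier followed by
# exactly one colon, so "super::x" never lexes "super:" as a keyword.
TOKEN = re.compile(
    r"(?P<resend>::)"
    r"|(?P<keyword>[A-Za-z_][A-Za-z0-9_]*:(?!:))"
    r"|(?P<identifier>[A-Za-z_][A-Za-z0-9_]*)"
    r"|(?P<number>\d+)"
    r"|(?P<space>\s+)"
)


def tokenize(source: str):
    """Return (kind, text) pairs, dropping white space."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Here "super::someMessage" comes out as identifier, resend, identifier, while "at: 1 put: 2" still produces two keyword tokens.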
I would recompile old Squeak code to Self bytecodes. The Jitter translator would generate the same threaded code as before, if all went well. Here is the plan I came up with:
1. subclass ObjectMemory and patch it so it deals with Self style object and headers
We'd need slot objects, slot array objects and byte array objects - which are all already supported by Squeak. The only difference is the map reference instead of the class reference. However, I'm no expert on the ObjectMemory class and I've only a vague idea how the whole ObjectMemory works.
2. copy Compiler and patch it so it will parse Squeak sources into Self objects and bytecodes (this involves throwing out a lot of stuff, but see step 4)
What kind of stuff? Typically, a Smalltalk VM needs to support the following kinds of instructions:
    SMALLTALK            SELF

    push self            SELF
    push slot #          SELF SEND i_<slotname>
    push temp #          SELF SEND t_<tempname>
    push literal #       LITERAL #
Shared variables are accessed using variable binding objects to which #value and #value: are sent. I think typical instruction sets have special instructions here, but this is only an optimization. The same is true for "push constant" type instructions.
    store slot #         SELF SEND i_<slotname>:
    store temp #         SELF SEND t_<tempname>:
Store doesn't pop the stack. Two other instructions, dup and pop are typically used to optimize cascading and multiple assignments. There's no direct way to express this using the SELF bytecodes, however that's no problem as the stack will be always cleaned up when the method execution has been completed.
    send msg, argnum         [SELF] SEND msg (sends to self must be treated specially)
    supersend msg, argnum    SUPER SEND msg
SELF probably determines the number of arguments from the selector which isn't problematic as this happens only at (byte code) compile time.
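Deriving the argument count from the selector alone is straightforward. A minimal sketch, with `argument_count` as a made-up helper name: unary selectors take no argument, binary operators take one, and keyword selectors take one argument per colon.

```python
def argument_count(selector: str) -> int:
    """Number of arguments implied by a Smalltalk/Self style selector."""
    if selector[0].isalpha() or selector[0] == "_":
        # Unary selectors contain no colon; keyword selectors have one per part.
        return selector.count(":")
    # Anything else is treated as a binary operator such as + or <=.
    return 1
```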
RETURN STACKTOP --- (No need to translate this, as this is the default action after the method execution has been completed.)
BLOCKRETURN STACKTOP NON-LOCAL-RETURN
The following instructions aren't needed by SELF because it doesn't inline common control flow instructions like #ifTrue: or #whileTrue:. It might be a good idea however, to extend the original SELF instruction set to support similar instructions if the instructions aren't compiled and optimized but just interpreted. As the SELF instruction bears no parameter, there's room for 31 other parameterless instructions.
    BRANCH TO address
    BRANCH IF STACKTOP IS FALSE TO address
PRIMITIVE number
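The instruction table in this message can be written down as a small translation function. This sketch only restates the table; it is not taken from any real compiler. The name `translate`, the string spelling of the instructions and the `to_self` flag are assumptions, and the branch and primitive instructions are deliberately rejected because the table gives no Self counterpart for them.

```python
def translate(op: str, operand=None, to_self: bool = False):
    """Map one Smalltalk-style instruction to a list of Self-style instructions."""
    if op == "push self":
        return ["SELF"]
    if op == "push slot":
        return [f"SELF SEND i_{operand}"]
    if op == "push temp":
        return [f"SELF SEND t_{operand}"]
    if op == "push literal":
        return [f"LITERAL {operand}"]
    if op == "store slot":
        # Stores become keyword sends, hence the trailing colon.
        return [f"SELF SEND i_{operand}:"]
    if op == "store temp":
        return [f"SELF SEND t_{operand}:"]
    if op == "send":
        # Sends to self are the special case marked [SELF] in the table.
        return [f"SELF SEND {operand}" if to_self else f"SEND {operand}"]
    if op == "supersend":
        return [f"SUPER SEND {operand}"]
    if op == "return stacktop":
        # Returning the top of stack is the default at the end of a method.
        return []
    if op == "blockreturn stacktop":
        return ["NON-LOCAL-RETURN"]
    raise ValueError(f"no Self counterpart listed for {op!r}")
```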
I think, with one exception, byte codes can be translated very easily. The exception is cascades, which need to be compiled a bit differently. Before code generation, we perform the following translation:
rcv m1; m2; ... mN --> ([:value | value m1. value m2 ... value mN] value: (rcv))
Alternatively, we can add DUP and POP instructions to the SELF instruction set.
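The block-based rewriting of cascades can be sketched as a source-to-source helper. This assumes the cascade has already been split into a receiver expression and a list of message strings; `cascade_to_block` is a hypothetical name and no real parsing is attempted.

```python
def cascade_to_block(receiver: str, messages: list) -> str:
    """Rewrite 'rcv m1; m2; ... mN' as a one-argument block applied to rcv."""
    body = ". ".join(f"value {message}" for message in messages)
    return f"([:value | {body}] value: ({receiver}))"
```

The receiver is evaluated once, as the block argument, and every cascaded message is then sent to that argument.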
Either I overlooked something obvious or there are no big problems in generating SELF instructions instead of Smalltalk instructions, assuming we have the same class hierarchy infrastructure.
5. translate (4) to C and compile it. We now would have a Squeak that looks and works exactly like the original, but is very different inside.
Sounds too simple to be true :-)
You can tell when this was from the reference to Squeak 1.31. The need to copy the whole Jitter hierarchy to change it was particularly bothersome, in my opinion. I don't normally miss multiple inheritance, but this was a case when I did.
:-) Time to refactor the whole interpreter/dynamicInterpreter stuff.
But it would be great if someone were to create a Self/Squeak like I described above.
Indeed.
bye -- Stefan Matthias Aust // Are you ready to discover the twilight zone?
Stefan Matthias Aust wrote:
We'd need slot objects, slot array objects and byte array objects - which are all already supported by Squeak. The only difference is the map reference instead of the class reference. However, I'm no expert on the ObjectMemory class and I've only a vague idea how the whole ObjectMemory works.
You are probably right that Self objects could be the same format as Squeak objects with no problems at all. ObjectMemory is a lot more complex than it needs to be in order to save space (several different header formats, for example).
- copy Compiler and patch it so it will parse Squeak
sources into Self objects and bytecodes (this involves throwing out a lot of stuff, but see step 4)
What kind of stuff? Typically, a Smalltalk VM needs to support the following kinds of instructions:
SMALLTALK          SELF
push self          SELF
push slot #        SELF SEND i_<slotname>
push temp #        SELF SEND t_<tempname>
push literal #     LITERAL #
I wouldn't add the "i_" and "t_" prefixes, though. If you define a temporary variable with the same name as an instance variable in a method, you have no way to access the instance variable at all, and Self gives you that behavior automatically.
Shared variables are accessed using variable binding objects to which #value and #value: are sent. I think typical instruction sets have special instructions here, but this is only an optimization. The same is true for "push constant" type instructions.
No need for this - shared variables (pool dictionaries, global variables and so on) can be handled by making the class objects have parent slots pointing to simple objects with the right slot names. Then SELF SEND <global variable name> will do the trick.
store slot #       SELF SEND i_<slotname>:
store temp #       SELF SEND t_<tempname>:
Store doesn't pop the stack. Two other instructions, dup and pop, are typically used to optimize cascading and multiple assignments. There's no direct way to express this using the SELF bytecodes; however, that's no problem, as the stack will always be cleaned up when the method execution has completed.
I think Mario's Smalltalk simply ignores cascades. They can be emulated easily enough with hidden temporary variables:
makeSilly
    "create a new initialized silly object"
    ^ Silly new
        center: 0@0;
        color: Paint black;
        border: 2;
        pen: Pen new;
        yourself
would become:
makeSilly = ( "create a new initialized silly object"
    | cascTemp1 |
    cascTemp1: global_Silly new.
    cascTemp1 center: 0@0.
    cascTemp1 color: global_Paint black.
    cascTemp1 border: 2.
    cascTemp1 pen: global_Pen new.
    cascTemp1 yourself ).
You can always use "global pen" instead of "global_Pen". The important thing is to make it different from "self pen"!
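The hidden-temporary rewrite above is mechanical enough to sketch in a few lines. The following Python is only an illustration of the idea, not code from any of the systems discussed; the function name `desugar_cascade` and the string-based representation of statements are assumptions made for the example.

```python
def desugar_cascade(receiver, messages, temp_name="cascTemp1"):
    """Rewrite `receiver m1; m2; ... mN` into statements on a hidden temp.

    `receiver` and each entry of `messages` are plain source strings
    (a simplification for this sketch; a real compiler would work on
    parse nodes). `temp_name` is a hypothetical hidden temporary.
    """
    # First statement stores the receiver expression into the hidden temp.
    statements = [f"{temp_name}: {receiver}"]
    # Every cascaded message is then sent to the temp, in order.
    statements += [f"{temp_name} {message}" for message in messages]
    return statements


stmts = desugar_cascade(
    "global_Silly new",
    ["center: 0@0", "color: global_Paint black", "border: 2",
     "pen: global_Pen new", "yourself"],
)
print(".\n".join(stmts))
```

The last statement's value is the value of the whole cascade, which matches the hand-written makeSilly translation.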
send msg, argnum   [SELF] SEND msg (sends to self must be treated specially)
No - this will give you the wrong results. You want this for messages to self in Smalltalk instead:
PUSH_SELF
SEND <msg>
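One way to see why the two encodings differ: in Self, an implicit-self send starts its lookup in the current activation (arguments and locals) before reaching the receiver, while a send to an explicitly pushed self looks only at the receiver. The Python below is a deliberately simplified sketch of that difference; it ignores parent slots entirely, and the names `self_send` and `explicit_send` and the dictionary representation of slots are assumptions for illustration.

```python
def self_send(activation_slots, receiver_slots, name):
    """Implicit-self send: local slots of the activation shadow the receiver."""
    if name in activation_slots:
        return activation_slots[name]
    return receiver_slots[name]


def explicit_send(receiver_slots, name):
    """PUSH_SELF followed by SEND: lookup starts at the receiver itself."""
    return receiver_slots[name]


# A Smalltalk method with a temporary called `size` that also does `self size`.
activation = {"size": 3}      # the temporary variable
receiver = {"size": 10}       # what the receiver's `size` method answers

print(self_send(activation, receiver, "size"))   # finds the temporary
print(explicit_send(receiver, "size"))           # finds the method
```

With a plain SELF SEND, `self size` would silently read the temporary, which is not what the Smalltalk source means.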
About the control flow bytecodes, I was counting on removing this from the parser (Smalltalk compiler) and having Jitter put them back in. The total complexity would be roughly the same, though performancewise it is not interesting to move complexity to runtime like this.
Neither the NON_LOCAL_RETURN nor the PUSH_SELF bytecodes need their index field (always zero, currently), so up to 62 new bytecodes could be added to Self. I've looked into this (the idea was to make the bytecodes complete enough that primitives could be written in them) and also into getting rid of a few of the 8 bytecodes. The PUSH_SELF bytecode can easily be replaced with SELF_SEND 'self', so it isn't really needed. The two resend bytecodes could be replaced with primitives since they are pretty rare both statically and dynamically (and tend to be used in initialization methods). So I had the idea to merge the literal vector and bytecodes so I wouldn't need the EXTEND_INDEX bytecodes either. We would need two bits:
00 NON_LOCAL_RETURN
01 PUSH_NEXT_LITERAL
10 SEND_NEXT_LITERAL
11 SELF_SEND_NEXT_LITERAL (or end of method marker, if no more literals are available)
The very first literal would now hold the first 15 instructions. With this scheme, when the same literal is used twice it must appear twice in the literal vector. This isn't as common as you might think, however, so this idea might have some merit.
So the method
print = (x print. '@' print. y print)
which is normally coded as
literals:
    0) 'x'
    1) 'print'
    2) '@'
    3) 'y'
bytecodes:
    0) SELF_SEND 0
    1) SEND 1
    2) PUSH_LITERAL 2
    3) SEND 1
    4) SELF_SEND 3
    5) SEND 1
could become:
literals:
    0) 11 10 01 10 11 10 11 00 00 00 00 00 00 00 00 00
    1) 'x'
    2) 'print'
    3) '@'
    4) 'print'
    5) 'y'
    6) 'print'
When you take into account header words, this second version is the same size or shorter even though it has the nasty repeated literals (which aren't very common, as I noted above).
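To make the packed layout concrete, here is a small Python sketch that produces the instruction word and literal vector for the print example. It is an illustration only: the function name `pack`, the 16-slot word size (taken from the example above) and the restriction to methods that fit in a single instruction word are all assumptions of this sketch, since the message does not say how longer methods would be continued.

```python
# Two-bit opcodes as listed in the message.
NON_LOCAL_RETURN, PUSH, SEND, SELF_SEND = 0b00, 0b01, 0b10, 0b11
SLOTS = 16  # two-bit codes per instruction word, as in the example


def pack(instructions):
    """Turn [(opcode, literal), ...] into (codes, literals).

    Every instruction consumes the next literal, so a literal used twice
    appears twice. A trailing SELF_SEND with no literal left acts as the
    end-of-method marker; the rest of the word is padded with 00.
    """
    if len(instructions) >= SLOTS:
        raise ValueError("sketch only handles methods that fit in one word")
    codes = [op for op, _ in instructions]
    literals = [lit for _, lit in instructions]
    codes.append(SELF_SEND)                      # end-of-method marker
    codes += [NON_LOCAL_RETURN] * (SLOTS - len(codes))  # 00 padding
    return codes, literals


# print = (x print. '@' print. y print)
codes, literals = pack([
    (SELF_SEND, "x"), (SEND, "print"),
    (PUSH, "@"), (SEND, "print"),
    (SELF_SEND, "y"), (SEND, "print"),
])
code_string = " ".join(format(c, "02b") for c in codes)
print(code_string)
print(literals)
```

The printed word matches literal 0 of the packed example, and the literal list shows the repeated 'print' entries.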
Now this would be a nice format for MPEG-4, wouldn't it ;-) ?
-- Jecel
I wouldn't add the "i_" and "t_" prefixes, though. If you define a temporary variable with the same name as an instance variable in a method, you have no way to access the instance variable at all, and Self gives you that behavior automatically.
Wait. I'm comparing byte codes, disregarding any additional constraints of the Smalltalk language. As instance variables and temporary variables are typically referenced by indices, you _can_ distinguish them, even if they have the same name. You're right, however, that this distinction isn't really needed. You do, however, have to distinguish the slot accessor methods needed for SELF from normal methods.
Shared variables are accessed using variable binding objects to which #value and #value: are sent. I think typical instruction sets have special instructions here, but this is only an optimization. The same is true for "push constant" type instructions.
No need for this - shared variables (pool dictionaries, global variables and so on) can be handled by making the class objects have parent slots pointing to simple objects with the right slot names. Then SELF SEND <global variable name> will do the trick.
Good point. Of course you're right. However, does this way really have an advantage over the "classical" Smalltalk way? Actually, don't I have to support the old way anyway, to support all kinds of meta class manipulation stuff that people probably expect to work?
Let's say we implement "Smalltalk" as a unique object which has a named slot for every global in the Smalltalk world. The typical idiom "Smalltalk at: aSymbol" must be supported. The SELF Smalltalk object isn't a SystemDictionary, so it must emulate that at: method, probably using some (probably available) mirror primitives to access a slot by name.
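A rough sketch of what such an emulated at: could look like, using Python's own reflection as a stand-in for the mirror primitives. The class name `GlobalsObject` and the method names `at` and `at_put` are hypothetical; nothing here reflects the actual Self mirror API.

```python
class GlobalsObject:
    """Hypothetical stand-in for a 'Smalltalk' object with one slot per global."""

    def at(self, symbol):
        # vars(self) plays the role of a mirror: it exposes the object's
        # own slots by name. A missing global raises KeyError.
        return vars(self)[symbol]

    def at_put(self, symbol, value):
        # Adding a global means adding a slot to the object.
        setattr(self, symbol, value)
        return value


smalltalk = GlobalsObject()
smalltalk.at_put("Transcript", "a TranscriptStream")

# Dictionary-style access and direct slot access see the same value.
print(smalltalk.at("Transcript"))
print(smalltalk.Transcript)
```

Direct slot access corresponds to compiled code doing a plain send of the global's name, while `at` covers reflective code that only has the symbol.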
BTW, I recently read that SELF handles Symbols (unique strings) inside the VM. A Smalltalk VM does this typically in Smalltalk. Would this be a problem?
I think Mario's Smalltalk simply ignores cascades. They can be emulated easily enough with hidden temporary variables:
As you might have seen, I suggested nearly the same transformation :-)
No - this will give you the wrong results. You want this for messages to self in Smalltalk instead:
You're right. But this makes the byte code transformation even simpler.
About the control flow bytecodes, I was counting on removing this from the parser (Smalltalk compiler) and having Jitter put them back in. The total complexity would be roughly the same, though performancewise it is not interesting to move complexity to runtime like this.
I think, when we start to discuss variants of the instruction set and its encoding, we first need to decide whether this set shall be optimized for interpretation or compilation. Squeak's Jitter can probably perform simple macro expansions, but I don't know whether it can perform the more complex inlining and unrolling operations which would probably be needed to reach an acceptable execution speed.
Therefore, it might be worth considering an instruction set tailored for interpretation together with a parser/code generator which would even flatten if, while and for statements.
You proposed a clever and compact encoding, but it's tailored towards compiling. An interpreter would have to decode the instruction bits instead of using a simple jump table. You need to maintain a current literal pointer. You need to extract the argument count from message selectors (probably not that difficult: if the first character is a letter, just count the ":"; otherwise the argument count is 1). And your parser needs to macro-expand delegation into primitives. Finally, your interpreter must interpret a non-local-return as the end of execution even for methods, because otherwise you cannot deal with instruction streams with sizes other than n*16.
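The selector rule in the parenthesis is short enough to write down. This Python sketch follows the rule as stated; treating a leading underscore like a letter (so that primitive selectors such as `_IntAdd:` are counted by their colons) is an assumption added here, and the function name `argument_count` is made up for the example.

```python
def argument_count(selector):
    """Number of arguments implied by a selector's spelling.

    Unary and keyword selectors start with a letter (or, by assumption,
    an underscore for primitives): count the colons. Anything else is a
    binary selector and takes exactly one argument.
    """
    first = selector[0]
    if first.isalpha() or first == "_":
        return selector.count(":")
    return 1


for sel in ["print", "at:put:", "+", "_IntAdd:"]:
    print(sel, argument_count(sel))
```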
bye -- Stefan Matthias Aust // Are you ready to discover the twilight zone?
Stefan Matthias Aust wrote:
[...] You do, however, have to distinguish the slot accessor methods needed for SELF from normal methods.
Ooops - you are right. Otherwise this isn't possible:
frame "returns my current frame" ^ frame
[global objects]
[...] The typical idiom "Smalltalk at: aSymbol" must be supported.
Ooops again! Yes, it would be easier simply to have Smalltalk be a dictionary and implement globals the traditional way. The dictionary protocol could be emulated for "global" objects, but it wouldn't be worth it. Besides, with my idea any addition of a global variable would cause all methods that use *any* global variable to have to be recompiled.
BTW, I recently read that SELF handles Symbols (unique strings) inside the VM. A Smalltalk VM does this typically in Smalltalk. Would this be a problem?
Self calls them canonical strings, and Smalltalk calls them symbols. In practice, they are the same thing. Self could do things the way Smalltalk does and there would be no problems at all. Note that even Smalltalk needs some support from the VM for symbols.
I think Mario's Smalltalk simply ignores cascades. They can be emulated easily enough with hidden temporary variables:
As you might have seen, I suggested nearly the same transformation :-)
I like your use of blocks, but it would be more work for the native/threaded code generator to figure this out.
I think, when we start to discuss variants of the instruction set and its encoding, we first need to decide whether this set shall be optimized for interpretation or compilation. Squeak's Jitter can probably perform simple macro expansions, but I don't know whether it can perform the more complex inlining and unrolling operations which would probably be needed to reach an acceptable execution speed.
As it is, Jitter can handle none of these things. I was thinking of a major extension (which would be a lot like the code that would be removed from the parser, so total complexity would be roughly the same).
Therefore, it might be worth considering an instruction set tailored for interpretation together with a parser/code generator which would even flatten if, while and for statements.
Great idea, but the current Squeak bytecodes already do this pretty well. Maybe just a SELF_SEND and SET_DELEGATEE would be needed to fully support the Self semantics?
You proposed a clever and compact encoding, but it's tailored towards compiling. An interpreter would have to decode the instruction bits instead of using a simple jump table. You need to maintain a current literal pointer. You need to extract the argument count from message selectors (probably not that difficult: if the first character is a letter, just count the ":"; otherwise the argument count is 1). And your parser needs to macro-expand delegation into primitives. Finally, your interpreter must interpret a non-local-return as the end of execution even for methods, because otherwise you cannot deal with instruction streams with sizes other than n*16.
Yes, the format is much better for compiling than for interpreting. I first thought of the non-local-return as an end of execution, but then I couldn't distinguish between
[|:x| y: x+1. x-1]
and
[|:x| y: x+1. ^ x-1]
So I changed things so that a SELF_SEND when the literal pointer was past the end of the literal vector means end of execution.
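The end-of-method rule can be illustrated with a small decoder for the two-bit format. This is a sketch under assumptions, not tinySelf code: the function name `decode`, the list-of-integers representation of the instruction word, and the literal lists used for the two example blocks are all made up for the illustration.

```python
NON_LOCAL_RETURN, PUSH, SEND, SELF_SEND = 0b00, 0b01, 0b10, 0b11
NAMES = {PUSH: "PUSH_LITERAL", SEND: "SEND", SELF_SEND: "SELF_SEND"}


def decode(codes, literals):
    """List the instructions of a packed method or block.

    Each non-return instruction consumes the next literal. A SELF_SEND
    seen after the literals have run out is the end-of-method marker;
    a NON_LOCAL_RETURN is a real instruction and also ends the walk.
    """
    listing = []
    next_literal = 0
    for code in codes:
        if code == NON_LOCAL_RETURN:
            listing.append("NON_LOCAL_RETURN")
            break
        if next_literal >= len(literals):
            if code == SELF_SEND:
                break  # end-of-method marker, not an instruction
            raise ValueError("instruction needs a literal but none is left")
        listing.append(f"{NAMES[code]} {literals[next_literal]!r}")
        next_literal += 1
    return listing


lits = ["x", 1, "+", "y:", "x", 1, "-"]
body = [SELF_SEND, PUSH, SEND, SELF_SEND, SELF_SEND, PUSH, SEND]
pad = [NON_LOCAL_RETURN] * 8

plain = decode(body + [SELF_SEND] + pad, lits)             # [|:x| y: x+1. x-1]
returning = decode(body + [NON_LOCAL_RETURN] + pad, lits)  # [|:x| y: x+1. ^ x-1]
print(plain[-1])
print(returning[-1])
```

The first block ends on its last send, while the second ends on an explicit NON_LOCAL_RETURN, so the two are no longer confused.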
-- Jecel
Where is a good reference on Self maps?
Also, the grammar talk and implementation talk has been interesting. Keep it up.
Dru Nelson Redwood City, California
http://www.lsi.usp.br/~jecel/ult.tgz
It will certainly be easier to port to VW3 than to Squeak.
I doubt it. I looked through the code and it looks more like Squeak than anything else. The code uses all that old MVC stuff of Forms, DisplayScreen and so on. It also uses "_" for assignments. Actually, there are just five undefined classes (BooleanView, TextCompositor, InspectorView, NotifierController and NotifierView) which are subclassed. I think BooleanView is some yes/no dialog, TextCompositor is something like a CompositionScanner (I don't know), InspectorView is probably class Inspector, and the Notifier stuff is now implemented in the Debugger class, I believe. The only big problem I currently have with the Squeak version is that some methods have more than 32 temps & args, and this isn't supported by Squeak at all. Furthermore, the ULT code also uses block local variables, so I hacked the Squeak parser to support this. I also hacked the parser to accept identifiers with embedded periods :-)
bye -- Stefan Matthias Aust // Are you ready to discover the twilight zone?
Hi!
A month ago, Jecel mentioned a Self 1.0 system for ObjectWorks Smalltalk called Ultimardrev. I did a crude port to VisualWorks 3.0. Now that it can at least successfully file in all the Self files provided, I consider my job done. It still has errors, but I'm afraid the French source code comments keep me from continuing this project. Please email me if YOU want to continue this.
Currently, I have an ENVY version, but I can provide a non-ENVY source file if needed. You can file in the stuff without modifying the base image. However, it makes a subtle change to the semantics of the parser, allowing selectors with embedded dots. Be aware of that. Currently, you can't cleanly remove the mess, as it adds literally dozens of global variables and also uses self-modifying code. There's one unreferenced global called PT.
I replaced the screen and view classes and all tools with a new workspace-like application called SelfWorkspace, based on the usual VW ApplicationModel (though I tried to preserve the funny cursors and the background pattern). It has all the buttons of the former SelfView, but I'm sure they don't work correctly yet. I've no idea about the tracing functionality.
I made two modifications to the *.slf files. If caching is activated, the file-in fails in 'ascii.slf'. Therefore, I commented out "_HighSpeed" in all.slf. Furthermore, float.slf contains the constant "1e100", which is too big for VW. I changed this to 1e38.
If you want to try out the source, file-in everything, evaluate
SelfObject selfKernel. SelfWorkspace open
and then evaluate
_SourceDir: 'ult\'. 'all' _RunScript
in the Self workspace (if you unzipped the *.slf files to a location other than the current directory, use a different file name here) and be patient.
Then try to print
3 + 4
;-)
bye -- Stefan Matthias Aust // Don't talk. Just doIt.
Great job! Too bad I don't have time to finish it. If you want to make it available, I could place it in the same directory as the original Ultimardrev. The non-Envy version would probably be better for most people.
While Self has grown a lot since the 1.0 days, there was a lot of complexity there that no longer exists (parent priorities, privacy, inner methods and "tie breaker" rules). Anyway, I hope it was worthwhile as a learning experience and it is fun to be able to type "3 + 4" and get the answer ;-)
On a related note, I have tried to download Gordon's Linux port about a dozen times but never got more than 2MB before the connection was broken. That is why I haven't made any comments about it.
-- Jecel
Thank you very much. I suspected that. :-)
I will study the work in this console (I'm interested in the Self language itself).
Regards,
Jose