Summary of copy-on-write

Wed Mar 30 13:20:29 UTC 1994

  David Ungar wrote:
    > A Lieberman-style system (with copy-on-write) avoids the
    > non-concrete traits problem but makes it harder to express shared state--
    > how do you know when NOT to copy-on-write?

Sorry, I am probably missing some important points, but to me this seems to be a non-problem.
Just let the programmer tell you (or rather SELF :-) what he wants on a per-slot basis: full sharing, partial sharing, default sharing or no sharing at all.

Maybe the misunderstanding arises from the fact that your remark refers to copy-on-write at the external interface level, while I am talking about COW at the delegation interface, i.e. the interface an object implicitly offers to its children? I have the impression that these two levels have been mixed up in the previous discussion: without saying so explicitly, different contributions referred to different levels at which copy-on-write could be applied.
Let me explain (if you have five spare minutes).

THE EXTERNAL AND THE DELEGATION INTERFACE 

Encapsulation and sharing are two distinct, orthogonal issues, and in my opinion all sensible choices along both dimensions should be explicitly specifiable by the programmer. Nevertheless, there are also many interesting parallels. One can view both as different aspects of _one_ concept, accessibility. While encapsulation specifies access permissions/restrictions at the level of the _external interface_, sharing specifies the same at the level of the _delegation interface_. SELF does not (yet) support this distinction, but it might be worthwhile to think about a version that does.

In the current SELF version one can allow different degrees of encapsulation by hiding slots or allowing execute and/or write access (using the _, ^_, ^, and _^ annotations). These access permissions (which I will abbreviate as --, x-, -w, xw for no access, execute-only, write-only and execute-and-write access) take effect at the level of the _external interface_, i.e. they control what messages may be sent to an object.

The flexible specification of access restrictions applicable to message _sends_ is currently complemented by a hard-wired, complete "freedom of access" at the level of message _delegation_. This is based on the view that "parents are shared parts of objects", hence full (xw) access is given to all "shared parts" of the same object.

Adding --, x-, -w, xw access specifications at the level of the _delegation_ interface allows finer-grained control over the extent to which "parents are shared parts". It makes it possible to protect objects used as prototypes from being manipulated by their children in undesirable ways. E.g. consider a "mammal" object with the following structure:

slot              content       external         delegation
give_birth        method        x-               x-
breathe           method        x-               x-
die               method        --               x-
...
population        0             x-               xw 
generation        1             x-               --
#_of_legs         4             xw               x-
...

Note the different access permissions at the external interface and delegation interface level for most of the shown slots. In contrast to current SELF only the population slot is xw-accessible by delegation, while the others are write-protected.
Full (xw) delegation access would change the mammal prototype where only a change to a child is intended, if the child mistakenly does not have its own local slot. E.g. the #_of_legs of "mammal" is obviously a default that should not be changed when a descendant "human" object that erroneously has no local #_of_legs slot says to itself "#_of_legs: 2".
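
The two permission columns can be modelled roughly as follows. This is only a sketch in Python, not SELF: the names Obj, define, read, write and AccessError are invented for illustration, and the rule "use the external column when the receiver itself holds the slot, the delegation column when the slot is reached through a parent" is a simplifying assumption.

```python
class AccessError(Exception):
    """Raised when a slot access violates its permission string."""


class Obj:
    """Toy prototype object; each slot carries two permission strings."""

    def __init__(self, parent=None):
        self.parent = parent
        self.slots = {}  # name -> [value, external_perm, delegation_perm]

    def define(self, name, value, external, delegation):
        self.slots[name] = [value, external, delegation]

    def _holder(self, name):
        # Walk the parent chain until some object defines the slot.
        obj = self
        while obj is not None and name not in obj.slots:
            obj = obj.parent
        if obj is None:
            raise KeyError(name)
        return obj

    def _check(self, holder, name, position, letter):
        # Assumption: external column if the receiver holds the slot itself,
        # delegation column if the slot was found in a parent.
        perm = holder.slots[name][1 if holder is self else 2]
        if perm[position] != letter:
            raise AccessError(f"{name}: '{perm}' forbids this access")

    def read(self, name):
        holder = self._holder(name)
        self._check(holder, name, 0, "x")
        return holder.slots[name][0]

    def write(self, name, value):
        holder = self._holder(name)
        self._check(holder, name, 1, "w")
        holder.slots[name][0] = value


mammal = Obj()
mammal.define("population", 0, "x-", "xw")
mammal.define("#_of_legs", 4, "xw", "x-")
human = Obj(parent=mammal)  # erroneously has no local #_of_legs slot

human.write("population", 1)  # allowed: xw by delegation
try:
    human.write("#_of_legs", 2)  # rejected: x- by delegation
    rejected = False
except AccessError:
    rejected = True
```

With the x- delegation permission the misstructured child gets an error instead of silently changing the default held by "mammal".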

THE CORRUPTED PROTOTYPE PROBLEM (CPP)

Note that the above problem is similar to the "corrupted prototype problem" mentioned by Ian Woollard at the beginning of this discussion:

   > Currently:
   > - accidently forgetting the copy is a real screw up, you end up
   > modifying the prototype

The distinction is that Ian's CPP is at the level of the external interface: mistakenly sending "prototype slot: value" instead of "prototype copy slot: value" changes "slot" in the prototype, not in its copy.

Obviously there are two instances of the CPP: Ian's "misdirected message send CPP" (MMS-CPP) and the "misstructured child delegation CPP" (MCD-CPP) from the mammal example. 
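
Both variants can be reproduced in a toy model with unrestricted (xw) delegation. The sketch below is Python, not SELF, and the class Proto with its get, set and copy methods is hypothetical; it only mimics "assignment lands in whichever object holds the slot".

```python
class Proto:
    """Toy object with unrestricted (xw) delegation access."""

    def __init__(self, parent=None, **slots):
        self.parent = parent
        self.slots = dict(slots)

    def holder(self, name):
        # Find the object along the parent chain that defines the slot.
        obj = self
        while obj is not None and name not in obj.slots:
            obj = obj.parent
        if obj is None:
            raise KeyError(name)
        return obj

    def get(self, name):
        return self.holder(name).slots[name]

    def set(self, name, value):
        # The assignment lands in whichever object holds the slot.
        self.holder(name).slots[name] = value

    def copy(self):
        return Proto(self.parent, **self.slots)


# MMS-CPP: the message goes to the prototype because "copy" was forgotten.
point = Proto(x=0, y=0)
p = point              # should have been point.copy()
p.set("x", 3)          # changes the prototype itself

# MCD-CPP: the child lacks a local slot, so delegation reaches the parent.
mammal = Proto(legs=4)
human = Proto(parent=mammal)
human.set("legs", 2)   # changes the default in mammal
```

In the first case the error is in the sender's code; in the second it is in the structure of the child.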

MULTI-USER FEATURES

There have been different proposals to avoid MMS-CPP by introducing multiuser features:

    Jecel (jecel at lsi.usp.br) wrote:
    > My suggestion for the "corrupted prototype problem" is to turn Self
    > into a multiuser system. If the prototype belongs to another user,
    > you can read it and clone it, but if you try to modify it you will
    > get an error. You would have to divide the system into many pseudo
    > users for this to work well. The idea is that the system should be
    > "open" but safe from accidents.

    David Ungar wrote:
    > >I think that the problem might be best solved by extending the VM to
    > >have restrictions on what processes can change what objects/slots. It
    > >could be ids, groups of ids or whatever.
    > >
    > >This might be prototyped by modifying slots assignment code.
    > >
    > >This potentially allows multiuser ability.
    > >
    > >You could also allow 'different' objects (e.g. lobby) for different
    > >processes (this also allows multiuser ability). This also avoids one
    > >process/user trashing another as well.
    > >
    > >-Ian
    > 
    > This is a good idea that has surfaced before--I think it has the
    > germ of something quite powerful!
    > 
    > Dave

Yes, this could be a powerful feature and I would love to have a multi-user SELF system incorporating it.
Nevertheless (and again I might be misunderstanding your ideas), how can multi-user features solve the CPP? They would keep it from spreading in a multi-user environment, but if I am working alone, how can they ensure that I do not change one of my _own_ prototypes just because I forgot a "copy" in the right place?

Access permissions based on user ids, group ids or whatever are not general enough in this case, since they only allow specifying a finite and usually a-priori known number of ids, while there is an infinite number of objects that might send messages or delegate to a specific object. Also, most of these objects do not even exist when the object that is to be protected is created.

COPY-ON-WRITE (COW)

Another proposal to prevent corrupted prototypes was to introduce copy-on-write behaviour:

  alex at XAIT.Xerox.COM (Alexis Layton)
    > This present discussion dovetails with something I have been thinking about
    > exploring for some time -- the idea is to have a stack or tree of contexts
    > that provide essentially copy-on-write semantics on objects; this would allow
    > one to do exploratory computations and then unwind any object changes.

Let me see if I understood your proposal. If an object receives a message which tries to modify it, a _new object_ is created and the computation continues on the new object, right?
The only other interpretation I can think of is to _add_ the slots to be changed to some _existing object_. But this would obviously contradict your goal of doing exploratory computations, which requires that none of the objects involved be modified. Even without the aim of exploratory computations I would strongly discourage copy-on-write which adds slots to existing objects that are _unrelated_ to the one that received the message. Allowing such behaviour would e.g. imply that the structure of an object could be changed without prior notice (especially without being intended by the object's programmer) as a side-effect of sending a message to another object.

Let me summarize: in my understanding Alexis' proposal refers to COW at the level of the _external interface_, suggesting the creation of (at least) one _new object_ in whose context one can do AI-style exploratory computations whose effects can be undone simply by removing the new object(s). 

This is obviously different from COW as proposed by J.J.Larea:

    > I think there are definite advantages to unifying this behaviour by
    > providing a mechanism for copy-on-write on a per-slot basis.  For
    > example, in a word-processor I might want all paragraphs to appear in
    > a default typeface unless specifically overridden by the user; if the
    > default is changed all paragraphs which have not been so overridden
    > should reflect that change; and a paragraph-object with an overridden
    > "typeface" ("style", whatever) slot should be revertable to the default
    > behaviour by deleting the slot.  

    > I think that is what Alexis was getting at.
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

I don't! J.J. refers to COW at the level of the _delegation interface_. Since it only applies to objects which are related by delegation, this does not require creation of new objects. Adding previously inherited slots to a child does _not_ change its logical structure (visible to a client through the external interface).
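
J.J.'s word-processor example can be sketched like this. It is Python rather than SELF, the class Slots and its methods get, set and revert are made-up names, and the typeface values are arbitrary; the only point is that writes always land in the receiver and that deleting the local slot restores the inherited default.

```python
class Slots:
    """Toy object with copy-on-write at the delegation interface."""

    def __init__(self, parent=None, **slots):
        self.parent = parent
        self.local = dict(slots)

    def get(self, name):
        # Look locally first, then follow the parent chain.
        obj = self
        while obj is not None:
            if name in obj.local:
                return obj.local[name]
            obj = obj.parent
        raise KeyError(name)

    def set(self, name, value):
        # Copy-on-write: the slot is created in the receiver, never in a parent.
        self.local[name] = value

    def revert(self, name):
        # Deleting the local slot re-exposes the inherited default.
        self.local.pop(name, None)


default_paragraph = Slots(typeface="Times")
p1 = Slots(parent=default_paragraph)
p2 = Slots(parent=default_paragraph)

p1.set("typeface", "Helvetica")             # override in p1 only
default_paragraph.set("typeface", "Courier")  # change the default

overridden = p1.get("typeface")   # p1 keeps its override
following = p2.get("typeface")    # p2 follows the new default
p1.revert("typeface")
reverted = p1.get("typeface")     # p1 follows the default again
```

No new object is created at any point; only the set of local slots of p1 changes.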

Note that COW at the external interface level requires COW at the delegation interface level. Suppose an object sends a message to HUMAN which is a child of MAMMAL:

       O ---msg---> HUMAN - - - - - -> MAMMAL

If HUMAN requires COW, a new object TMP would be created and the msg evaluated in its context.
This could be done in two ways: either TMP is a copy of HUMAN (including its parent pointers) or TMP could be an empty object that delegates to HUMAN:

        ---msg---> TMP (=HUMAN) - - - - - - - - - - - -> MAMMAL

        ---msg---> TMP (=empty)  - - -> HUMAN - - - - -> MAMMAL

In both cases the evaluation of the message may involve update messages to slots that are "inherited" from MAMMAL. Now COW at the delegation interface level is required, in order to apply the change to a slot created on-the-fly in the current self, TMP, not to MAMMAL. 

Hence COW at the delegation interface level is the more basic mechanism.
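
The second variant (TMP empty, delegating to HUMAN) behaves much like Python's collections.ChainMap, where lookups search the whole chain but writes always go to the first mapping. The sketch below uses that as an analogy only; the slot names and values are invented.

```python
from collections import ChainMap

mammal = {"legs": 4, "population": 0}
human = {"name": "human"}

# HUMAN - - -> MAMMAL: lookups fall through from human to mammal.
human_view = ChainMap(human, mammal)

# TMP (=empty) - - -> HUMAN - - -> MAMMAL
tmp = human_view.new_child()

# Evaluating msg updates a slot "inherited" from MAMMAL. ChainMap writes
# go to the first mapping, so the slot is created on the fly in TMP.
tmp["legs"] = 2

seen_through_tmp = tmp["legs"]          # value visible in the context of TMP
seen_through_human = human_view["legs"]  # HUMAN and MAMMAL are untouched
```

Undoing the exploratory computation then amounts to dropping tmp; human and mammal were never modified.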

COx and CORRUPTED PROTOTYPES 

Let's see what happens if we generalize COW to COx, where x can be execute or write and if we extend the access right hierarchy by adding a side-effecting copy dimension, C. 
Then the hierarchy
                            xw
                           /  \
                         x-    -w
                           \  /
                            --
is replaced by
                            xw
                           /  \
                          xC   Cw
                          / \ / \
                        x-  CC   -w
                          \ / \ /
                          C-   -C
                           \   /
                            --
where C in the first position means "copy-on-execute" whereas in the second position it means "copy-on-write".
(This level of generality might not always be needed. I have included it in the discussion because a uniform model is preferable and it does not seem to introduce any additional problems.)
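
One way to read the extended diagram is as the product of two three-step chains, "-" below "C" below "x" in the first position and "-" below "C" below "w" in the second. That reading is an assumption taken from the shape of the diagram, and the Python names below (EXEC, WRITE, leq, covers) are invented for the sketch.

```python
from itertools import product

EXEC = "-Cx"   # first position: no access < copy-on-execute < execute
WRITE = "-Cw"  # second position: no access < copy-on-write < write


def leq(a, b):
    """True if permission a grants no more than permission b."""
    return (EXEC.index(a[0]) <= EXEC.index(b[0])
            and WRITE.index(a[1]) <= WRITE.index(b[1]))


# All nine permission pairs of the extended hierarchy.
PERMS = ["".join(pair) for pair in product(EXEC, WRITE)]


def covers(a):
    """Permissions drawn directly above a in the diagram."""
    above = [b for b in PERMS if b != a and leq(a, b)]
    return sorted(b for b in above
                  if not any(c != b and leq(c, b) for c in above))


above_cc = covers("CC")
above_bottom = covers("--")
```

Under this reading x- and -w remain incomparable, exactly as in the original four-element hierarchy.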

Now consider a "mammal" object with the following structure:

slot              content       external         delegation
give_birth        method        x-               x-
breathe           method        x-               x-
die               method        --               x-
...
population        0             x-               xw 
generation        1             x-               CC   <---
#_of_legs         4             xw               xC   <---
...

- COW and MISSTRUCTURED CHILD DELEGATION CPP

Obviously the CC and xC annotations in the delegation interface are better suited to describe the semantics of the example. The default #_of_legs will remain in effect for child objects until changed by them (which will lead to adding the slot locally).
In contrast, the generation is not a default applicable to children, but a slot which only describes the "local" object. Hence it should be copied to every child "as soon as possible", so that a later local change to the prototype does not propagate to the child.

The generation slot example also shows the limits of this approach. If the prototype changes its generation value (which is unlikely in this case but might happen in other examples) _before_ the child's first execute or write access to the slot, the local change will propagate to the child. In order to avoid this one would need a more general concept of "always copy" (CA) slots, which will always be immediately included in any copy of a prototype. Such a concept would allow specifying a kind of minimal homogeneous structure of clones, which is similar to instantiation except for the missing guarantee that the copies will not change this structure at will.
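
The timing difference between CC and CA slots can be sketched as follows. This is a Python toy model, not SELF; Proto, child, read and the always_copy argument are hypothetical names, and only the copy-on-first-access half of CC is modelled.

```python
class Proto:
    """Toy prototype with copy-on-first-access (CC) and always-copy (CA) slots."""

    def __init__(self, slots, always_copy=(), parent=None):
        self.slots = dict(slots)
        self.always_copy = set(always_copy)  # hypothetical CA slot names
        self.parent = parent

    def child(self):
        kid = Proto({}, self.always_copy, parent=self)
        for name in self.always_copy:
            kid.slots[name] = self.slots[name]  # CA: copied at creation time
        return kid

    def read(self, name):
        if name not in self.slots:
            if self.parent is None:
                raise KeyError(name)
            # CC: copy the parent's *current* value on first access.
            self.slots[name] = self.parent.read(name)
        return self.slots[name]


# CC: the prototype changes before the child's first access.
cc_mammal = Proto({"generation": 1})
cc_kid = cc_mammal.child()
cc_mammal.slots["generation"] = 2
leaked = cc_kid.read("generation")   # the change has propagated
cc_mammal.slots["generation"] = 3
frozen = cc_kid.read("generation")   # later changes no longer propagate

# CA: the slot is copied when the child is created.
ca_mammal = Proto({"generation": 1}, always_copy=["generation"])
ca_kid = ca_mammal.child()
ca_mammal.slots["generation"] = 2
kept = ca_kid.read("generation")     # the child keeps the value from creation
```

Only the CA variant gives the child a value that is independent of when it first touches the slot.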

Summarizing, COW at the delegation level solves the MCD-CPP. Its generalizations (COx and CA) even exhibit some desirable additional features.

- COW and MISDIRECTED MESSAGE SEND CPP

If xC, CC, and -C permissions are set in the _external interface_, this would make _every_ write access to the corresponding slot create a new object. If the prototype is copied, the copy will have the same access restrictions. Hence the MMS-CPP is not really solved by copy-on-write, since one loses the ability to make _any_ change to the corresponding slots, in the prototype as well as in any of its clones.
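
A small Python sketch of this effect, with the invented class CowObject standing in for an object whose slots are all copy-on-write at the external interface:

```python
class CowObject:
    """Toy object whose slots are copy-on-write at the external interface."""

    def __init__(self, **slots):
        self._slots = dict(slots)

    def get(self, name):
        return self._slots[name]

    def copy(self):
        # The copy has the same slots and the same (COW) restrictions.
        return CowObject(**self._slots)

    def set(self, name, value):
        # External COW: never touch the receiver, answer a modified new object.
        new = self.copy()
        new._slots[name] = value
        return new


point = CowObject(x=0, y=0)

# Forgetting "copy" no longer corrupts the prototype ...
q = point.set("x", 3)

# ... but an explicit clone cannot be changed in place either.
c = point.copy()
c2 = c.set("x", 5)
```

The prototype is safe, but so is every clone: each write yields yet another object, and none of the existing ones can ever be updated.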

If this is what Dave meant by his remark, then I apologize for my initial comment. There really seems to be no solution to the MMS-CPP.

The MMS-CPP arises from the fact that the prototype and its copy may have the same slots, with the same access permissions in their external interface. Therefore I believe that in a purely prototype-based language there exists NO solution to the problem (it does not occur in languages where objects are created from classes, since classes and instances have a different external interface - some aspects of classes are really not so bad :-) ). 

The best one can do is to try to avoid making errors by restructuring the system in such a way that the "safer" syntax is the default, as Dave and Jecel suggested:

  Dave (David.Ungar at Eng.Sun.COM) wrote:
    > One way to mitigate the prototype-corruption problem that I
    > am seriously considering would be to make "globals" no longer
    > be a parent and to change its name to something shorter, like "the".
    > The drawback: every time you refer to a prototype, you would have to say
    > "the point copy" instead of "point copy".
    > The benefits: slightly harder to forget copy, and better for showing
    > all of an object's inherited attributes in a single place.
    > ...
    > 
    > -- Dave

  Jecel (jecel at lsi.usp.br) wrote:
    > As there really aren't "globals" in Self, we could have the
    > following alternative:
    > 
    > traits graphics _AddSlotsIfAbsent: ( | point = () | )
    > traits point _Define:              ( | .......... | )
    > _AddSlotsIfAbsent: ( | proto* = ().      the = () | )
    > the _AddSlotsIfAbsent:             ( | point = () | )
    > the point _Define: ( | parent* = traits point.
    >                        x <- 0. y <- 0 | )
    > proto _AddSlots:   ( | point = ( the point copy ) | )
    > 
    > Now we can just say "point" and get a copy of the prototype, but
    > must write "the point" when we really want to refer to the
    > original prototype itself.
    > 
    > - Jecel
    > 
    > P.S.: I don't really my last suggestion, but I think it
    > is a good idea to make the "safest" operations the default.

Agreed.

LAST WORDS

I wrote this summary because the things you have been discussing are strongly related to my Ph.D. thesis and I was intrigued by statements that at first glance seemed to contradict some of my views. I hope I didn't just write what was so obvious to everyone else that they didn't even care to mention it.

As I touched almost every topic of the previous discussion I'm prepared to find a lot of comments and critique in my mailbox when I return to my office by the end of next week. 
I'm looking forward to it.

Gunter