MuPADCombinatWiki - Generators

Specifications of Generators

This page is for discussing the specifications of generators, and in particular the library combinat::generators.

A generator is a device that allows for iterating over a large or even infinite quantity of objects, without having to produce and store all of them at once. This is typically very handy in combinatorics.

In principle a generator is any MuPAD object g such that the function call g() is defined. Each call g() returns either the next object, or FAIL if no more objects are available. The effect of calling g() after the FAIL value has been returned is undefined (i.e. possibly disastrous).

Generators are furthermore assumed to display the so-called reference effect, in particular when passed as argument to a function. That is, after assigning a generator g to f, f and g should refer to the same object.

Example:

>> g:=combinat::generators::fromNext(0, i->i+1):
>> g(), g() ;
                                              0, 1
>> f := g:
>> f(), f(), g(), g(), f(), f() ;
                                        2, 3, 4, 5, 6, 7

Copy of generators

There are cases where one needs to make a copy of a generator, so that after copying a generator g onto f, g and f are fully independent, and both produce the same objects, in the same order (we probably would need to be more precise about what we mean by same object; syntactical equality?).

Example:

>> f := combinat::generators::copy(g):
>> f(), f(), g(), g(), f(), f() ;
                                       8, 9, 8, 9, 10, 11

Below, we call a generator copyable if it displays by default the reference effect, and provides a mean to copy it.

!Design goals

Define a standard interface for copyable generators
Make sure that all the generators in MuPAD-Combinat are copyable
Make it easy for the user to write copyable generators
Making a generator g copyable should not impact the efficiency of calls to g().
On the other hand, generators are not copied very often, so the copy itself does not need to be extremely efficient.
If possible, make this work for all versions of MuPAD. (well, 2.5.0 is probably a minimum, in order to have copyClosure.

Constant generators (that is function objects without internal state like g:1, or g: ()->x^3) have no internal state. So copy and reference semantic coincide, and nothing needs to be done for them.

Most generators are implemented by closures whose internal state is stored in local variables. If the variables of such a generator contain only objects with a copy semantic (that is most MuPAD object), then copying the closure of the generator with copyClosure(g) does the job.

However, quite often some of the variables contain objects which display the reference effect, typically other generators. Those need to be explicitly copied. Copying recursively all the generators inside a generator (deep copyClosure) may be inefficient, and is not always safe (see the example we had found). So it is probably simpler, more efficient, and safer to leave the control of what needs to be copied to the implementor of the generator. This does not require kernel changes, and will work with any version of MuPAD >= 2.5.0.

Current attempt:

Interface for users

To copy a generator g, one should call combinat::generators::copy(g)

Interface for implementers of generators

To be copiable, a generator g can be
- a DOM_PROC which can be safely copied with copyClosure (typically its variables contain only objects without reference effect)
- a MuPAD object such that g::copy(g) returns a copy of itself (typically a funcenv; in fact, in the current implementation of generators::copy, this object is required to be a funcenv).
- any other MuPAD function object without internal state and without g::copy slot.
To build such a copyable generator one can use: combinat::generators::generatorWithCopy(g: DOM_PROC, copy) where g and copy are two DOM_PROC which have been built inside the exact same lexical scope. g contains the usual code of the generator.

Calling copy should do whatever is required to replace the state of g by a copy of it; typically replacing all objects in the context of g which display a reference effect by copies of them.

As Florent says, one downside of this design is that some generators are copyable and some not, and there is no way, neither for the system, nor for the user to know whether a given generator is copyable or not, except to read the documentation of the code. In other words, copyable generators are not typed any differently than untyped generators.

So, should we enforce some sort of typing?

We could, for example, use one of the following approaches

enforce that copyable generators have a copy slot (even a dummy one);
encapsulate generators inside objects of some domain.

This is a question of balance between efficiency (2 is not good to this respect), safeness, and practicalness for writing generators.

Practicalness is important, because writing generators is already something tricky and tedious.

For the programmer, this would amount to have to always use some construction like the following to build a new generator:

        generators::newWithCopy(g <, copy= NIL >)

My first feeling is that this would be cumbersome, but that's just a feeling.

To get some better opinion, I suggest that we run through all our generators to

fix those that are not copy safe using the current design
check the others to see how much extra code that would require to type them
and by the way, check what would be the overhead in time and code to type all our generators. So far, the consensus had been not to type them (mupad functions are not really typed either), but maybe we were wrong!

Full example (taken from the test file of combinat::generators):

makeGenerator :=
proc()
    option escape;
    local g, n;
begin
    n := 0;
    g := G::fromNext(1, i->i+1);
    G::generatorWithCopy(proc()
                         begin
                             n := n+1;
                             [g(), n];
                         end_proc,
                         proc()
                         begin
                             g := G::copy(g);
                         end_proc);
end_proc:

g := makeGenerator(0):
prog::test([g() $ i = 1..3],
           [[1, 1], [2, 2], [3, 3]]):

g1 := g::copy(g):
prog::test([g() $ i = 1..3],
           [[4, 4], [5, 5], [6, 6]]):
prog::test([g1() $ i = 1..2],
           [[4, 4], [5, 5]]):
prog::test([g() $ i = 1..3],
           [[7, 7], [8, 8], [9, 9]]):

g2 := g1::copy(g1):
prog::test([g() $ i = 1..3],
           [[10, 10], [11, 11], [12, 12]]):
prog::test([g1() $ i = 1..2],
           [[6, 6], [7, 7]]):
prog::test([g2() $ i = 1..1],
           [[6, 6]]):
prog::test([g() $ i = 1..3],
           [[13, 13], [14, 14], [15, 15]]):
prog::test([g1() $ i = 1..2],
           [[8, 8], [9, 9]]):
prog::test([g2() $ i = 1..1],
           [[7, 7]]):

Note: g := random(3) is a generator. But not a copyable generator! On the other hand, frandom(20) could be made copyable, as could stats::weibullRandom(a, b, seed=22) and the others with a specified seed.

Implementation of generatorWithCopy

(taken from the source of combinat::generators):

    generatorWithCopy :=
    proc(generator: DOM_PROC, copy: DOM_PROC)
    begin
        /* Note: at first, generator and copy have different
           DOM_PROC_ENV's (which should refer to the same lexical
           context. When I do the copy, I make a copyClosure of
           g, and assign the DOM_PROC_ENV of g to copy. This seems to
           work. But is it safe?
         */
        funcenv
        (generator,
         NIL,
         table("copyState" = copy,
               "copy" =
               proc(x)
                   local generator, environment;
               begin
                   generator := copyClosure(extop(x,1));
                   environment := extop(generator,12);
                   x := subsop(x,
                               1=generator,
                               3=map(extop(x,3),
                                     f -> if domtype(f) = DOM_PROC then
                                              extsubsop(f, 12=environment)
                                          else
                                              f
                                          end_if)
                              );
                   x::copyState();
                   x;
               end_proc));
    end_proc;

?ChristopherCreutzig: The above code has the disadvantage of not accepting an existing DOM_FUNC_ENV as its first argument.

NicolasThiéry: Yes, but I am not sure we will need such a feature, and there would be quite a few things to be carefull with with a funcenv (like checking whether the various slots share the same environment or not). This is really intended to be a low-level constructor which you are going to call just after creating the two procedures. If at some point in the future we see a use for allowing funcenv's, it will always be time to add this feature.

?ChristopherCreutzig: What is much worse: The extsubsop of the 12th operand should be guarded with a comparison with the 12th operand of x, for otherwise the following breaks:

f := proc(x) option escape; begin y -> x+y; end(2):
f := funcenv(f):
f::somethingElse := proc(x) option escape; begin y -> x+y; end(3):
f::andAnotherOne := combinat::generators::copy(f):

g := combinat::generators::copy(f):
prog::test(nops({extop(extop(g, 1), 12),
                 extop(g::somethingElse, 12),
                 extop(extop(g::andAnotherOne, 1), 12)}), 3):
prog::test(nops({extop(extop(f, 1), 12),
                 extop(extop(g, 1), 12)}), 2)
prog::test(nops({extop(f::somethingElse, 12),
                 extop(g::somethingElse, 12)}), 2)
prog::test(nops({extop(extop(f::andAnotherOne, 1), 12),
                 extop(extop(g::andAnotherOne, 1), 12)}), 2)

Sorry for getting nasty, but you asked for it. :-)

NicolasThiéry: Yes, that would be much safer indeed. The problem is just that I don't know how to do this comparison, due to the following :

bla :=
proc(x)
    option escape;
begin
    [() -> x, ()->x+1];
end:
[f, g] := bla():
extop(f,12), extop(g,12) ;

                        DOM_PROC_ENV(138383748), DOM_PROC_ENV(138340264)

How do I know that f and g share the same context? They do not share the same DOM_PROC_ENV. Also, is it meaningfull and safe to assign the DOM_PROC_ENV of f to the one of g? (I tried, it happened to work, so I used it, but ...)

?ChristopherCreutzig: Actually, that is very easy to find out, just ask MuPAD and ignore the display on screen:

>> bool(extop(f,12)= extop(g,12))

                             TRUE

NicolasThiéry: Ah! That's the point. Thanks. I'll update the code soon. Well, now we have an alternative design: if generators::copy receive a function environment, then it calls copyClosure on the underlying procedure f, assigns that closure to all the func_env entries f::truc that shared the same context as f, and finally calls f::copy() if it exists. This design is slightly simpler; no need anymore for generators::generatorWithCopy, to build a generator with copy, the user just does a funcenv(generatorCode, table("copy" = ...)). Also, adding new slots for counting how many times the generator has beend called, or maybe to have a reverse generator fits directly in this framework. On the other hand, the interface is less abstract and dictactes the underlying implementation, which would make it more difficult to change later on.

FlorentHivert : Hum... I don't understand what CCR was trying to do above... The current solution seems to be very easy to use, As an exercice, I've updated all the generators of the library. It does not seems to be a great work to update all the generator of MuPAD-Combinat. It's seems that if the standard user don't try to do ugly things as Christopher do :-) it perfectly solves the problem.

Just a little bug: trivial generator (not functions) are not accepted by G::copy.

NicolasThiéry: Fixed, and updated the specifications above.

?ChristopherCreutzig: So you want some "real" application? Right now, the best I can come up with is:

>> g := combinat::generators::fromNext(0, x->x+1):
>> g := funcenv(g):
>> g::countPrint := combinat::generators::fromNext(1, x->x+1):
>> g::print := () -> "integer-generator (looked at ".
&>                   expr2text(g::countPrint())." times)":
>> g

             integer-generator (looked at 1 times)
>> g

             integer-generator (looked at 2 times)
>> g

             integer-generator (looked at 3 times)

(NicolasThiéry: Cute. Even more useful application: having print display how many times the generator has been called.)

In any case, the original question was "is this safe?", which I (maybe wronlgy) interpreted as "is this safe for general generators?" I think it is possible (and not even all that hard) to fix the issues I pointed out -- and it would not mean a change for the user.

Naming conventions

Should we use generators::multisets and generators::powersets instead of multiset and powerset?

Zip

What do you think of generators::zip(g1, g2, f)?