Should we switch to SAGE???

This wiki page discusses an essential issue for the future of MuPAD-Combinat. Please read carefully, and get involved!

SAGE is a completely open source mathematical system which was launched around 2004 by William Stein. It's based on a general purpose language (python), and readily has a large range of functionality thanks to the integration of existing components (GMP, GAP, singular, linbox, R, just to name a few)

http://www.sagemath.org/

One year ago, we got contacted by Mike Hansen. Since then, he has ported 30k lines of code from MuPAD-Combinat into SAGE. With François, Jason, and Anne, we were invited at the SAGE days 7 last week in Los Angeles (see http://mupad-combinat.sourceforge.net/#sage_days_7 for my presentation slides). Our goal was to play hard with the system and discuss seriously with the SAGE developers to evaluate how good an alternative SAGE would be. Since then, some further experiments were done (http://wiki.sagemath.org/combinat/). This is a report about those.

In short, I have been amazed by the dynamism of the growing SAGE community. Furthermore, our experiments (partial reimplementation of our crystal code) seem to indicate that there would be no major technical issues to port our code: most of the idioms and paradigms we use have some natural equivalent. On the other hand, porting 7 years worth of work in MuPAD, and more importantly migrating the community will be a lot of work and worst may cause the loss of users and developers.

Now is the time to take a decision
  • Should we switch to SAGE?
  • If yes, what should be the time line?

The answer is far from clear to me, and anyway this is really a decision that we have to take all together. There are lots of pros and cons, which I have tried to summaries below.

Your feedback is essential. Please answer by e-mail, or edit our Wiki

http://mupad-combinat.sourceforge.net/Wiki/Sage


Independently of when the switch would occur, here are some points
  • We should leave MuPAD-Combinat in a good state. If not just so that the MuPAD people can reuse whatever they see fit in there. We owe them this for all the support they provided us.
  • We don't necessarily need to switch all at the same time. But none of us want to have one foot in each system for long, even if the sage-MuPAD interface may level things a bit.
  • Anne: we have an opportunity now to prepare the switch, since Jason, Mike, Anne and myself are all together at MSRI, and Florent, Jean-Christophe and François will visit us at some point.
  • An hypothetical time line???

    • Spring 08, MSRI:

      • Design, prioritization, planning
      • Hopefully: implementation by the SAGE team of the fundamentals we need (free modules, basic data structures, graph stuff)
      • Implementation of some fundamental features by a core team (symmetric functions, decomposable objects)
    • June 08, FPSAC:

      • final stable release of MuPAD-Combinat
      • official announcement of the switch
      • Goal: being able to launch most new basic users directly with SAGE
    • September 08: SAGE days (Nancy)

      • Get all the MuPAD-Combinat developers started with SAGE
      • Full migration of those how can switch
    • Spring 09 ????

      • Full migration of all developers and all users?

Here is a summary of pros and cons (please edit!)

Development model and community:

+ Completely open source
+ Free (as in free beer)
+ (Hyper)Active fast growing community, with members in all areas of mathematics
  Will this continue at the same rate?
+ Regular SAGE days meetings (3 more planned for 2008, including one in France)
+ Excellent community organization (see sagetrac.org)
+ A new release comes out quite often (about every two weeks)
+ Releases are managed by a release manager which is paid for this,
  and takes administrative burden away (think running tests, making
  sure everything builds on all platforms, ...)
+ The quality of the code is enforced by systematic peer-reviews
  before integration into SAGE itself. This may motivate people to
  submit more stable/correct/easy-to-read/tested/documented code.
+ Mercurial distributed version control (better than subversion when
  it comes to have several lines of development in parallel).
- A Windows version is not yet fully available!
- NSF supported. What about the long run? Mark thinks that AMS / NSF
  would be likely to push further SAGE.
- MuPAD is somewhat a european technology, unlike many others including
  SAGE which is financed by NSF. Switching to SAGE will make it more difficult
  for our european developers to ask for funding from the european community
  for *-Combinat. On the other hand, we haven't so far really used this "european" lever,
  and, the algebraic community (and more generally the symbolic computation in
  general) is too small to afford such a split along a non-scientific line.
  Also, SAGE is pretty young. It happens to have started in the states, while
  MuPAD-Combinat was started in France, but both have an international vocation.
  If sufficiently many european researchers get involved in SAGE it is still time to
  "hijack" it and make it as much european as american (hoping for others from
  all other the world to also join the hijacking!).

- SAGE is huge. That surely is the price for the integration of so many other software.
  But will they manage to keep a large scale vision, and keep everything consistent?

Documentation:

+ easier to write
+ close to the code (written in the source file, within each function)
+ the doc does compile easily on all platforms
- Not as structured as in MuPAD (latex instead of XML)
  The platform by itself wont impose high quality standards
- syntactic tests instead of semantic tests
  needs to update the results if the pretty printing changes

Tests:

+ close to the code (as the doc)
+ may migrate progressively to pythonunit

Technical level:

+ Based on a general purpose language (Python) with a huge community:
  + Tons of books, tutorials, ...
  + Many potential users already know it, and for those that don't, it is an investment that they can reuse elsewhere
  + Lots of full featured and stable development tools (editors, IDE's, debugger, ...)
  + Thousands of general purpose python libraries
+ Speed?
  + For the language itself (mostly relevant for all the combinatorial routines): I don't expect a huge difference (see
    micro benchmark below). The interpreter and memory management should be more optimized though, and this may make a
    difference for higly recursive code.
  + Cython compiler: low level critical sections in python can be
    compiled without rewrite into C (up to some restrictions). It can be
    further optimized by using C-types instead of python types. This should allow us to gain on the fundamental
    combinatorial routines (think integer lists lex tools).
  + For all the other fundamental tools (linear algebra, group theory, groebner basis)
    having preexisting interfaces with optimized external tools (linbox, GAP, Singular, ...) will make a serious difference.
    In general, in the long run, we can hope to delegate more of the non algebraic combinatorics algorithmic to specialists
    than we currently can in MuPAD (larger community).
  + One of Sage's goal is in particular to beat Magma; not a small feat. So speed is certainly a concern for them!
+ Much easier integration of external C/C++ code
  (can drop MAPITL)
+ Continuations (yield) which makes it much easier to write
  generators. However: it is not sure that you can do copies of those
  generators and we use this intensively (see generators::subset)
+ parent/element: sets are objects like the others
  (see the discussion about posets last June)
- some issues with multiple inheritance
- no standard mantra for input and output type checking
- the coercion model may need some adaptations

For whatever it's worth, here is a ultra quick benchmark on a piece of code which was straightforwardly ported by mike and should be a relatively typical of combinatorial computations (counting partitions with some constraints which is done stupidly by generating them one by one lexicographically; this essentially involves lots of little manipulations of lists of integers):

sage: P = Partitions(20, maxlength=10)
sage: %timeit P.count()
10 loops, best of 3: 42 ms per loop
>> prog::timeThis(10, combinat::partitions::count(20, MaxLength=10))
CPU: 122.4 ms/call Wallclock: 122.4 ms/call

sage: P = Partitions(40, maxlength=10)
sage: %timeit P.count()
10 loops, best of 3: 2.84 s per loop
>> prog::timeThis(10, combinat::partitions::count(40, MaxLength=10))
CPU: 3702.2 ms/call Wallclock: 3702.2 ms/call

sage: %timeit P.count()
10 loops, best of 3: 16.5 s per loop
>> prog::timeThis(10, combinat::partitions::count(50, MaxLength=10))
CPU: 13653.3 ms/call Wallclock: 13653.3 ms/call

No big difference really, especially since the benchmark may well be unfair (on my machine, MuPAD is running in 32bits emulation, sage in 64bits, ...).

Note however that in python there is room for optimization if we compile the code, make it run with machine integers, etc.


To do list (Mike, please update!):

 - Basic combinatorial classes      [ 75%]
 - Decomposable objects             [    ] (Mike received funding from Google to work on this over the summer)
 - Symmetric functions              [ 90%]
 - k-Schur & the like               [ 50%]
 - Root systems / ...               [ 25%]
 - Crystals                         [ 50%]
 - Free modules & such              [  5%]
 - Algebra (desosseur, ...)         [  0%]
 - Hopf algebra framework           [  0%]
 - * with several bases             [  0%]
 - Operads                          [  0%]
 - Linbox interface                 [100%] (compares to 10% in MuPAD)
 - GAP interface                    [100%] (compares to  1% in MuPAD)
 - Interface for fast Gröbner basis [100%] (compares to  0% in MuPAD)
 - Nauty                            [100%]
 - Symmetrica                       [ 60%]
 - GLIP                                    Deprecated by Robert's generic code
 - dot2tex                          [ 10%] Delegated to the SAGE graph theorists
 - Database access                  [100%]
 - MachineIntegerListsLex           [  0%] Will be easy via cython
 - Rigged configurations            [  0%]
 - Basic abstract data structures   [100%] (fast stacks, AVL, ...) compares to just the basic ones in MuPAD + no real way to implement some with serious speed ourselves

Hello, all. I'm Kurt Luoto, currently a grad student at University of Washington, and an erstwhile software developer. Nicolas asked me to post a note about myself, my interests, and potential involvement, since I could possibly contribute to the project if it does migrate to Sage.

I have within the last month begun to learn Sage/Python for my own research purposes. I was earlier writing my own code from scratch (in Caml). However I would now like to move to what is likely to be a more standard platform since I am now working with collaborators. My current research project involves quasisymmetric functions (QSym) and non-symmetric Macdonald polynomials. I have written some code for the latter, independent from Mike Hansen's file (sage/combinat/sf/ns_macdonald.py). My code computes these polynomials via diagram fillings per the work of Haglund, Haiman, and Loehr, and offers some other related functionality. I am currently trying to implement quasisymmetric functions using a file from Mike as a starting point. It is not very complete yet, but I hope to learn more of the fine points soon. Currently the code is written to meet my own needs, but I would not mind putting in some work to make it more widely usable by the larger community.

Email: kwluoto "at" math.washington.edu or kwluoto "at" gmail.com