This wiki page discusses an essential issue for the future of MuPAD-Combinat. Please read carefully, and get involved!
One year ago, we got contacted by Mike Hansen. Since then, he has
ported 30k lines of code from MuPAD-Combinat into SAGE. With François,
Jason, and Anne, we were invited at the SAGE days 7 last week in Los
Angeles (see
http://mupad-combinat.sourceforge.net/#sage_days_7 for my
presentation slides). Our goal was to play hard with the system and
discuss seriously with the SAGE developers to evaluate how good an
alternative SAGE would be. Since then, some further experiments were
done (
http://wiki.sagemath.org/combinat/). This is a report about those.
In short, I have been amazed by the dynamism of the growing SAGE community. Furthermore, our experiments (partial reimplementation of our crystal code) seem to indicate that there would be no major technical issues to port our code: most of the idioms and paradigms we use have some natural equivalent. On the other hand, porting 7 years worth of work in MuPAD, and more importantly migrating the community will be a lot of work and worst may cause the loss of users and developers.
The answer is far from clear to me, and anyway this is really a decision that we have to take all together. There are lots of pros and cons, which I have tried to summaries below.
An hypothetical time line???
Spring 08, MSRI:
June 08, FPSAC:
September 08: SAGE days (Nancy)
Spring 09 ????
Here is a summary of pros and cons (please edit!)
Development model and community:
+ Completely open source + Free (as in free beer) + (Hyper)Active fast growing community, with members in all areas of mathematics Will this continue at the same rate? + Regular SAGE days meetings (3 more planned for 2008, including one in France) + Excellent community organization (see sagetrac.org) + A new release comes out quite often (about every two weeks) + Releases are managed by a release manager which is paid for this, and takes administrative burden away (think running tests, making sure everything builds on all platforms, ...) + The quality of the code is enforced by systematic peer-reviews before integration into SAGE itself. This may motivate people to submit more stable/correct/easy-to-read/tested/documented code. + Mercurial distributed version control (better than subversion when it comes to have several lines of development in parallel). - A Windows version is not yet fully available! - NSF supported. What about the long run? Mark thinks that AMS / NSF would be likely to push further SAGE. - MuPAD is somewhat a european technology, unlike many others including SAGE which is financed by NSF. Switching to SAGE will make it more difficult for our european developers to ask for funding from the european community for *-Combinat. On the other hand, we haven't so far really used this "european" lever, and, the algebraic community (and more generally the symbolic computation in general) is too small to afford such a split along a non-scientific line. Also, SAGE is pretty young. It happens to have started in the states, while MuPAD-Combinat was started in France, but both have an international vocation. If sufficiently many european researchers get involved in SAGE it is still time to "hijack" it and make it as much european as american (hoping for others from all other the world to also join the hijacking!). - SAGE is huge. That surely is the price for the integration of so many other software. But will they manage to keep a large scale vision, and keep everything consistent?
Documentation:
+ easier to write + close to the code (written in the source file, within each function) + the doc does compile easily on all platforms - Not as structured as in MuPAD (latex instead of XML) The platform by itself wont impose high quality standards - syntactic tests instead of semantic tests needs to update the results if the pretty printing changes
Tests:
+ close to the code (as the doc) + may migrate progressively to pythonunit
Technical level:
+ Based on a general purpose language (Python) with a huge community:
+ Tons of books, tutorials, ...
+ Many potential users already know it, and for those that don't, it is an investment that they can reuse elsewhere
+ Lots of full featured and stable development tools (editors, IDE's, debugger, ...)
+ Thousands of general purpose python libraries
+ Speed?
+ For the language itself (mostly relevant for all the combinatorial routines): I don't expect a huge difference (see
micro benchmark below). The interpreter and memory management should be more optimized though, and this may make a
difference for higly recursive code.
+ Cython compiler: low level critical sections in python can be
compiled without rewrite into C (up to some restrictions). It can be
further optimized by using C-types instead of python types. This should allow us to gain on the fundamental
combinatorial routines (think integer lists lex tools).
+ For all the other fundamental tools (linear algebra, group theory, groebner basis)
having preexisting interfaces with optimized external tools (linbox, GAP, Singular, ...) will make a serious difference.
In general, in the long run, we can hope to delegate more of the non algebraic combinatorics algorithmic to specialists
than we currently can in MuPAD (larger community).
+ One of Sage's goal is in particular to beat Magma; not a small feat. So speed is certainly a concern for them!
+ Much easier integration of external C/C++ code
(can drop MAPITL)
+ Continuations (yield) which makes it much easier to write
generators. However: it is not sure that you can do copies of those
generators and we use this intensively (see generators::subset)
+ parent/element: sets are objects like the others
(see the discussion about posets last June)
- some issues with multiple inheritance
- no standard mantra for input and output type checking
- the coercion model may need some adaptations
For whatever it's worth, here is a ultra quick benchmark on a piece of code which was straightforwardly ported by mike and should be a relatively typical of combinatorial computations (counting partitions with some constraints which is done stupidly by generating them one by one lexicographically; this essentially involves lots of little manipulations of lists of integers):
sage: P = Partitions(20, maxlength=10) sage: %timeit P.count() 10 loops, best of 3: 42 ms per loop >> prog::timeThis(10, combinat::partitions::count(20, MaxLength=10)) CPU: 122.4 ms/call Wallclock: 122.4 ms/call sage: P = Partitions(40, maxlength=10) sage: %timeit P.count() 10 loops, best of 3: 2.84 s per loop >> prog::timeThis(10, combinat::partitions::count(40, MaxLength=10)) CPU: 3702.2 ms/call Wallclock: 3702.2 ms/call sage: %timeit P.count() 10 loops, best of 3: 16.5 s per loop >> prog::timeThis(10, combinat::partitions::count(50, MaxLength=10)) CPU: 13653.3 ms/call Wallclock: 13653.3 ms/call
No big difference really, especially since the benchmark may well be unfair (on my machine, MuPAD is running in 32bits emulation, sage in 64bits, ...).
Note however that in python there is room for optimization if we compile the code, make it run with machine integers, etc.
To do list (Mike, please update!):
- Basic combinatorial classes [ 75%] - Decomposable objects [ ] (Mike received funding from Google to work on this over the summer) - Symmetric functions [ 90%] - k-Schur & the like [ 50%] - Root systems / ... [ 25%] - Crystals [ 50%] - Free modules & such [ 5%] - Algebra (desosseur, ...) [ 0%] - Hopf algebra framework [ 0%] - * with several bases [ 0%] - Operads [ 0%] - Linbox interface [100%] (compares to 10% in MuPAD) - GAP interface [100%] (compares to 1% in MuPAD) - Interface for fast Gröbner basis [100%] (compares to 0% in MuPAD) - Nauty [100%] - Symmetrica [ 60%] - GLIP Deprecated by Robert's generic code - dot2tex [ 10%] Delegated to the SAGE graph theorists - Database access [100%] - MachineIntegerListsLex [ 0%] Will be easy via cython - Rigged configurations [ 0%] - Basic abstract data structures [100%] (fast stacks, AVL, ...) compares to just the basic ones in MuPAD + no real way to implement some with serious speed ourselves
Hello, all. I'm Kurt Luoto, currently a grad student at University of Washington, and an erstwhile software developer. Nicolas asked me to post a note about myself, my interests, and potential involvement, since I could possibly contribute to the project if it does migrate to Sage.
I have within the last month begun to learn Sage/Python for my own research purposes. I was earlier writing my own code from scratch (in Caml). However I would now like to move to what is likely to be a more standard platform since I am now working with collaborators. My current research project involves quasisymmetric functions (QSym) and non-symmetric Macdonald polynomials. I have written some code for the latter, independent from Mike Hansen's file (sage/combinat/sf/ns_macdonald.py). My code computes these polynomials via diagram fillings per the work of Haglund, Haiman, and Loehr, and offers some other related functionality. I am currently trying to implement quasisymmetric functions using a file from Mike as a starting point. It is not very complete yet, but I hope to learn more of the fine points soon. Currently the code is written to meet my own needs, but I would not mind putting in some work to make it more widely usable by the larger community.
Email: kwluoto "at" math.washington.edu or kwluoto "at" gmail.com