Saturday, October 11, 2025

FSet 2.0 is coming!

I have pushed and tagged the first release candidate, v2.0.0-rc0, of FSet version 2!  I'm keeping it in a GitLab Merge Request (MR) for the moment, but I am very much hoping to get some FSet users to try it out and give me some feedback.

One major change is that sets and maps now use the CHAMP implementations by default.  This change should be transparent as long as:

  • you haven't written any complex custom compare methods (if all the method does is call compare-slots, it can be easily converted to use the new macro define-equality-slots), and
  • you don't care about the ordering of your sets and maps, or in the cases where you do care, you've used the new custom-ordering features.

The second major change is to the defaulting behavior of maps and seqs.  FSet 1 uses a "default default" of nil, meaning that if you don't supply an explicit default when creating a map or seq, its default is nil.  The default is returned on a map lookup when the supplied key is not in the map; it is returned on a seq lookup when the supplied index is not in bounds (the bounds being 0 up to, but excluding, the size of the seq).

In FSet 2, there is no default default.  If you don't supply an explicit default, the map or seq has no default, and an access attempt will signal an error instead in these cases.  So, migrating your code to FSet 2 will probably require a little debugging — running your test suite, noting when you get one of the new errors, finding the form where the map or seq involved is initially created, and adding :default nil to the form or wrapping it in (with-default ... nil).

Examples:

;; The map constructor macros accept a default anywhere in the form
(map)          -->  (map :default nil)
(map ('a 3))   -->  (map ('a 3) :default nil)
(replay-map (14 'x) (9 'q)) --> (replay-map :default nil (14 'x) (9 'q))
;; The seq constructor macro does not
(seq 3 1 4)    -->  (with-default (seq 3 1 4) nil)
;; The constructor functions take a :default keyword argument
(empty-map)    -->  (empty-map :default nil)
(empty-seq)    -->  (empty-seq :default nil)
;; Tuples associate defaults with the keys rather than the tuples
(define-tuple-key foo) --> (define-tuple-key foo :default nil) 

But, there's good news!  You don't have to convert your code if you don't want to.  Merely loading FSet 2 doesn't expose your code to these changes; the behavior of names exported from package fset has mostly not changed.   Instead, I've added a new package, fset2, that exports its own versions of the names with new behavior.  So, to use FSet 2, change :use fset in your defpackage form(s) to :use fset2.

(There is one change you will see even if you don't use the new package, having to do with the printing of map and seq defaults.  Previously, a nil default would not be printed explicitly; now, it will be, so you'll see things like ##{| (a 3) |}/NIL and #[ 3 1 4 ]/NIL.)

For complete details of all changes in this release, see the MR.

So, for anybody who wants to help me out, here's what I ask:

  1. Clone this repo (or this one), and in your copy, do: git checkout fset2.
  2. If you didn't clone it in ~/quicklisp/local-projects/, arrange for Quicklisp to find this copy, in whatever way you do that (e.g. by pushing the directory pathname onto asdf:*central-registry*).
  3. Recompile your client code and test it.  If anything doesn't work, please let me know immediately.
  4. Go into the :use clause of your defpackage form(s) and change fset to fset2.
  5. Recompile your client code again, and test it again.  This time you may need to make some changes, as discussed above.  Let me know how much trouble you have, whether a little or a lot (and especially let me know if you give up).  You can post comments in the MR, or in this GitHub issue.

Again, this is a release candidate, not yet a release.  I've tested it pretty thoroughly, but there could still be bugs.  OTOH, if there's something in particular you don't like about it, I may be more willing to make changes than I will be after it's released. 

Share and enjoy!

 

Sunday, September 21, 2025

How well different Common Lisps run FSet

I just did some quick-and-dirty benchmarking, using FSet's test suite.  It was not designed as a benchmark, but I think it still gives a useful rough indication of how well FSet runs on different platforms.

These tests were all run on an Intel Xeon "Ivy Bridge" except the first one, which was on an Apple M2 MacBook Pro.  Values are the time to run 100 iterations of the test suite; smaller is better.

  SBCL 2.5.3 on M2           1.9
  SBCL 2.5.3                 2.8
  LispWorks 8.0.1            5.8  (Personal Edition, but I doubt that matters)
  CCL 1.13                  13.3 
  Franz Allegro 11          15.7
  ECL 24.5.10               58.1
  CLASP 2.5.0               79.4
  ABCL 1.9.2                85.3
 

Yikes!  Ignoring the M2 number, that's a factor of 30 — a very wide range.  I don't think the test is entirely fair, because I develop on SBCL and haven't put any effort into optimizing for other platforms.  I suspect the CCL and Allegro times could be improved somewhat.  The poor performance of ECL and CLASP surprises me; FSet spends most of its time doing ordinary struct and simple-vector accesses, which I would think would translate well into C.  Maybe they're still doing a lot of type- and bounds-checking, even though I've requested safety 0?

As for ABCL, I think it's a remarkable achievement that it is compatible enough to run FSet at all; I can't fault it for not being a speed demon.  My guess is that the biggest gains to be had here would be from improving ABCL itself, rather than tweaking FSet.

Friday, September 19, 2025

FSet now has CHAMP sets and maps!

I have just released FSet 1.6.0, which completes the work on CHAMP sets and maps that I started months ago.

CHAMP is a hash-based data structure by Michael Steindorfer, that improves a little on Phil Bagwell's widely-used HAMT.  (The HAMT is used, for example, by Clojure.)

See the GitLab MR  for the details of how to use it.

I did some quick micro-benchmarking, using sets of integers, comparing CHAMP against my older weight-balanced trees.  On lookup (testing whether a value is in the set), CHAMP is about twice as fast at size 4, growing to almost 5x faster at size 2048.  On update (adding an element to the set, with about a 25% chance that it's already there), CHAMP is roughly even with WB at size 4, but over 40% faster at size 2048.

So to summarize, there's a significant and welcome improvement in update performance, and quite a remarkable improvement in lookup performance.  W00t!

 

Monday, September 1, 2025

Receiving Multiple Values

Background: the nlet macro

'Way back in 1980, when I was an MIT undergraduate hacking on the Lisp Machines at the AI Lab, I was already starting to experiment with a more functional style than most people there used.  And one thing I quickly discovered was the usefulness of the multiple-value feature in this style.  For example, instead of doing (when (something) (rotatef a b)) to conditionally swap the values of two variables, I found I could do this:

(multiple-value-bind (a b)
    (if (something) (values b a)
      (values a b))
  ...)

Whether this is really a stylistic improvement or not is certainly debatable, though I still do it, but it does get rid of the assignments.

Of course, the more common use of multiple values is to return more than one piece of information from a function.  Some languages have "out parameters" for this purpose, but Common Lisp, like Lisp Machine Lisp before it, has multiple values; and some CL builtins are specified to return multiple values (floor and ceiling come to mind, but there are many others).  On most implementations, returning multiple values is much faster than consing up a list or other structure to return, because the semantics don't make the tuple of values a firstclass object (unless, of course, you do that explicitly with multiple-value-list).  Instead, like arguments to a call, they are generally returned in registers or on the stack.

Anyway, as I was making fairly frequent use of multiple-value-bind, sometimes nesting two or three such forms together, it seemed to me that I should be able to write them similarly to let forms — that the difference between binding one variable and binding two or more was not so large that I should have to use a completely different construct for the latter.  Also, I noticed that there was space in the syntax of let to extend it: let's binding clauses are specified to be lists of length two (or one, but let's ignore this case), of which the first is the variable to bind and the second is the init-form whose value is to be bound to.  The obvious generalization is that all subforms of the clause but the last are variables to be bound, and the last subform is the init-form that will supply those values.

It also occurred to me, as I was using both let for parallel binding and let* for sequential binding, that there was a way to allow arbitrary combinations of parallel and sequential binding using clause nesting: nesting an inner clause inside an outer one would make the variables bound by the outer one available to the inner one.

Thus was born my macro nlet.  Here's an example:

  (nlet ((a b c (zot))
         ((d (quux a c))
          ((e f (mumble b d))
           (g (mung a))))
         ((h (frobnicate c))
          ((i (glumph h))))
         (*print-level* 3))
    ...)

First a, b, and c are bound to the first three values of (zot), and in parallel, *print-level* is bound to 3; then d and h are bound; then e, f, g, and i are bound.

As this example illustrates, all bindings at a given nesting level are done in parallel, with all bindings at a deeper level following. Stylistically, it is expected that init-forms in nested clauses will refer only to variables bound in containing clause lists.  (More on this point below.)

My Misc-Extensions system exports this macro under two names, nlet and let.   In my own code, I import it as let, shadowing cl:let.  (You'll see it all over the FSet codebase.)

Critical reception: crickets

I published Misc-Extensions at the same time I first published FSet, which I think was 2007 or 2008.  I don't recall anyone ever commenting to me either positively or negatively about nlet.  My guess is that few if any people besides me have ever used it, but I have no way to know that for sure.  In fairness, I didn't make much noise about it; I didn't even add a README with documentation until May 2024, though there was always documentation in the source file.

Let me explain why I like it, though.  Consider: why do people ever use let (I mean cl:let) when let* exists?  One possible reason is because they want to shadow a variable in the containing scope, but they also need one of the init-forms to refer to the outer variable, and they also need that init-form to be evaluated after the init-form that provides the value for the inner variable.  Like this:

(let ((a (something)))
  ...
  (let ((a (must-come-first))
        (b (must-come-second a)))
    ...))

If they didn't want to shadow a, they could use let*.  (And even if they did want to shadow a, if the two init-forms could be reordered, they could still use let*).  Since you never really have to shadow a variable — you can always pick a new name for the inner one — let is redundant; we could always use let*.  And yet, in the CL code I've seen (not counting my own), let is several times as common as let*.  Why is this?

In the preface to Structure and Interpretation of Computer Programs, Hal Abelson and Gerry Sussman famously said, "Programs must be written for people to read, and only incidentally for machines to execute."  While I think that's a tad overstated, it makes an important point.

When let* is used, someone reading the code needs to track the data flow through the clauses, by seeing which init-forms reference which variables bound in previous clauses, in order to understand it.  In contrast, the use of let communicates immediately that there are no data flow dependences between the clauses.  Thus, the reader need not look for them; each of the init-forms can be viewed independently.

The advantage nlet has over let*, then, is that it permits even greater precision in indicating which clauses depend on which other clauses.  Let's look again at this example:

  (nlet ((a b c (zot))
         ((d (quux a c))
          ((e f (mumble b d))
           (g (mung a))))
         ((h (frobnicate c))
          ((i (glumph h))))
         (*print-level* 3))
    ...)

The grouping of the clauses tells the reader at a glance that only the mumble and mung calls can depend on d, and only the glumph call can depend on h.  (This isn't enforced, but it's an easy rule to follow when writing.)  Any of the calls can depend on *print-level*, but if they actually do, it would have been clearer to put that at the top.  If you wanted to make sure that they can't see that binding, you could just wrap the clause in two more parenthesis pairs.

Well.  All that said, my guess is that most people are still going to find this syntax a bridge too far.  What do you think?

Compromise: mvlet and mvlet*

I have just now released a new Misc-Extensions with two new variations on the above theme, mvlet and mvlet*.  These retain the multiple-value binding feature of nlet, but don't support clause nesting; instead they have the familiar semantics of doing fully parallel or fully sequential binding, like let and let* respectively.  I think this is probably the right compromise to attract wider use.  Do you agree?

It's probably clear by now how to use mvlet and mvlet*, but here's an example anyway:

  (let ((a (foo)))
    ...
    (mvlet ((a b (bar))
            (c (baz a))  ; outer 'a'
            ...)
      ...)

    (mvlet* ((a b (bar))
             (c (baz a))  ; inner 'a'
             ...)
      ...))

I've looked around for other published macros that provide the same features — a succinct syntax for binding to multiple values, multiple binding clauses, and a choice of parallel or sequential binding — and I haven't found much.  Alexandria doesn't have it.  I have a faint recollection of seeing a generalized-destructuring macro that allows the use of values expressions as patterns, but I can't seem to find it now.  I don't see this functionality in Trivia.  The closest thing I can find is Ron Garret's bb macro, but (a) it's in his Ergolib project, which is not in Quicklisp, (b) it does only sequential binding, and (c) it doesn't handle declarations in the body correctly (this is noted as a TODO).  If you know of something else, please comment.

I would also note that Dylan, which was created by people who had worked on the design of CL, has a similar syntax for binding to multiple values:

let (a, b, c) = values(...)

I take this as validation of the concept.

I would like this post to start a conversation.  I'll post it on Reddit under /r/Common_Lisp; if you use Reddit, that might be the best place.  Otherwise, feel free to comment below.  Do you use multiple values much?  If so, do you agree that the builtin syntax is a bit clunky?  Do you think you would use any of the macros I've introduced here?

Monday, August 25, 2025

FSet now supports Iterate!

For FSet 1.5.1, I have added support for the popular Iterate iteration macro.  The issue page has a good explanation.

I don't use Iterate myself, or I would no doubt have done this sooner.

Also, to implement some of the Iterate functionality, I extended the stateful seq iterator to support iterating a subsequence, and added a reverse version. So now the iterator method on wb-seq takes keyword arguments start, end, and from-end?, whose semantics is familiar from cl:find etc.

If there's anything else about which you think "I would like to use FSet, but it doesn't work for me because it doesn't have X", I would like to know what that is.  Please post in the blog comments or on Reddit (I'll link to this from r/Common_Lisp).

 

Tuesday, August 19, 2025

FSet 1.5.0 gets custom orderings!

The ordering of the "setlike" collections — sets, maps, and bags — in FSet has always been determined by the generic function fset:compare.  This approach is often very convenient, as it allows you to define the ordering of a new type simply by adding a method on compare; there is no need to supply the ordering explicitly every time you create a new collection.

However, as people have complained from time to time, it is also a bit limiting.  Say you want to make something like a telephone directory (anyone remember telephone directories?) which maps string keys to values, and you would like it maintained in lexicographic order of the keys.  To do this with FSet, you have heretofore had to define a wrapper class, and then a compare method on that class, something like:

(defstruct lexi-string
  value)
(defmethod compare ((a lexi-string) (b lexi-string))
  (compare-lexicographically (lexi-string-value a) (lexi-string-value b)))

Then you would have to wrap your keys in lexi-strings before adding them to your map.  That seems a little wasteful of both time and space.

A second problem with always using fset:compare is that you have to pay the cost of the generic function dispatch several times every time the collection gets searched for an element, as in contains? on a set or lookup on a map.  (The number of such calls is roughly the base-2 logarithm of the size of the collection.)  One micro-benchmark I ran showed this cost to be around 20% of the access time, which is not insignificant.

So, in response to popular demand, I have added custom orderings to FSet: you can supply your own comparison functions when creating collections, and FSet will call those instead of compare.  Use of this feature is completely optional; existing code is not affected.  But if you want to do it, now you can!

I refer you to the PR description for the details.

There is one aspect of this change that might surprise you.  When given objects of different classes, fset:compare doesn't compare the contents of the objects; it just compares their class names and returns :less or :greater accordingly.  So, for instance, a list cannot be equal? to a vector or seq, even if they have the same elements in the same order.  This rule now also covers cases where the objects are collections of the same kind (sets, bags, or maps) but with different orderings.  So just as a wb-set and a ch-set can never be :equal, so two wb-sets with different orderings can never be :equal; compare will just look at the comparison function names to impose an artificial ordering.

I'm not suggesting this is an ideal situation, but I don't see a way around it.  Since comparing two wb-sets of the same ordering relies on that ordering, a combined relation on wb-sets of different orderings would in general fail to be transitive; you would get situations where a < b and b < c, but c < a.

Thursday, October 17, 2024

Comparison: FSet vs. Sycamore

[BULLETIN: Quicklisp now has the latest version of FSet.]

Sycamore, primarily by Neil Dantam, is a functional collections library that is built around the same weight-balanced binary tree data structure (with leaf vectors) that FSet uses.  While the README on that page comments briefly on the differences between Sycamore and FSet, I don't feel that it does FSet justice.  Here is my analysis.

Dantam claims that his library is 30% to 50% faster than FSet on common operations.  While I haven't done comprehensive micro-benchmarking, a couple of quick tests indicates that this claim is plausible.  A look through the internals of the implementation confirms that it is clean and tight, and I must commend him.  There may be some techniques in here that I could usefully borrow.

Most of the performance difference is necessitated by two design choices that were made differently in the two libraries.  One of these Dantam mentions in his comparison: FSet's use of a single, global ordering relation implemented as a CLOS generic function, vs. Sycamore's more standard choice of requiring a comparison function to be supplied when a collection is created.  The other one he doesn't mention: the fact that FSet supports a notion of equivalent-but-unequal values, which are values that are incomparable — there's no way, or at least no obvious way, to say which is less than the other, and yet we want to treat them as unequal.  The simplest example is the integer 1 and the single-float 1.0, which have equal numerical values (and cl:= returns true on them), but which are nonetheless not eql.  (I have a previous blog post that goes into a lot more detail about equality and comparison.)  Since Sycamore expects the user-supplied comparison function to return an integer that is negative, zero, or positive to indicate the ordering of its arguments, there's no encoding for the equivalent-but-unequal case, nor is there any of the code that would be required to handle that case.

Both of these decisions were driven by my goal for the FSet project.  I didn't just want to provide a functional collections library that could be called occasionally when one had a specific need for such a data structure.  My ambition was much grander: to make functional collections into a reasonable default choice for the vast majority of programming situations.  I wanted FSet users (including, of course, myself) to be able to use functional collections freely, with very little extra effort or thought.  While Lisp by itself reaches a little bit in this direction — lists can certainly be used functionally — lists used as functional collections run into severe time complexity problems as those collections get large.  I wanted the FSet collections to be as convenient and well-supported as lists, but without the time complexity issues.

— Or rather, I wanted them to be even more convenient than lists.  Before writing FSet, I had spent years working in a little-known proprietary language called Refine, which happened to be implemented on top of Common Lisp, so it was not unusual to switch between the two languages.  And I had noticed something.  In contrast to CL, with its several different predefined equality predicates and with its functions that take :test arguments to specify which one to use, Refine has a single notiion of equality.  The value space is cleanly divided between immutable types, which are compared by value — along with numbers, these include strings, sets, maps, and seqs — and mutable objects, which are always compared by identity.  And it worked!  I found I did not miss the ability to specify an equality predicate when performing an operation such as "union".  It was just never needed.  Get equality right at the language level, and the problem goes away.

Although FSet's compare generic function isn't just for equality — it also defines an ordering that is used by the binary trees — I thought it would probably turn out to be the case that a single global ordering, implemented as a generic function and therefore extensible, would be fine the vast majority of the time.  I think experience has borne this out.  And just as you can mix types in Lisp lists — say, numbers and symbols — without further thought, so you can have any combination of types in an FSet set, effortlessly.  (A project I'm currently working on actually takes considerable advantage of this capability.)

As for supporting equivalent-but-unequal values, this desideratum flows directly from the principle of least astonishment.  While it might not be too surprising for a set or map implementation to fail distinguish the integer 1 from the float 1.0, it certainly would be very surprising, and almost certainly a source of bugs in a compiler that used it, for it to fail to distinguish two uninterned symbols with the same name.  (I saw a macro expansion recently that contained two distinct symbols that both printed as #:NEW.  It happens.)  A compiler using Sycamore for a map on symbols would have to supply a comparison function that accounted for this; it couldn't just compare the package name and symbol name.  (You'd have to do something like keep a weak hash table mapping symbols to integers, assigned in the order in which the comparison function encountered them.  It's doable, but FSet protects you from this madness.)

Along with those deep semantic design choices, I've spent a lot of time on developing a wide and featureful API for FSet (an effort that's ongoing).  FSet has many features that Sycamore lacks, including:

  • seqs, a binary-tree sequence implementation that holds arbitrary Lisp objects (Sycamore ropes hold only characters, which is certainly an important special case, but why restrict ourselves?)
  • default values for maps and seqs (the value to return for an invalid key is associated with the collection, not supplied at the call site; this turns out to be a significant convenience)
  • generic functions that operate on both lists and FSet collections, to shadow the CL builtins
  • the powerful map-union and map-intersection operations (I'll blog about these in the future)
  • more ways to iterate over the collections (the FSet tutorial has a good summary, about 3/4 of the way down)
  • speaking of the tutorial, FSet has lots more documentation

Let me digress slightly to give an example of how FSet makes programming more elegant and convenient.  Joe Marshall just put up a blog post comparing Go(lang) with Common Lisp, which is worth a read on its own; I'm just going to grab a code snippet from there to show a little bit of what programming with FSet is like.  Here's Joe's code:

 (defun collate (items &key (key #'identity) (test #'eql) (merger (merge-adjoin #'eql)) (default nil))
   (let ((table (make-hash-table :test test)))
     (dolist (item items table)
       (let ((k (funcall key item)))
         (setf (gethash k table) (funcall merger (gethash k table default) item))))))

 (defun merge-adjoin (test)
   (lambda (collection item)
     (adjoin item collection :test test)))

And here's what I would write using FSet:

 (defun collate (items &key (key #'identity))
   (let ((result (map :default (set))))
     (dolist (item items result)
       (includef (@ result (funcall key item)) item))))

(Well, I would probably move result outside the dolist form to make it clearer what the return value is, but let's go with Joe's stylistic choice here.)

For those who haven't used FSet: the form (map :default (set)) creates a map whose default is the empty set, meaning that lookups on that map will return the empty set if the key is not in the map.  This saves the includef form from having to handle that possibility.

My version makes assumptions, it's true, about how you want to collect the items with a given key; it doesn't give you other choices.  It could, but what would be the point?  It's already using a general set with better time complexity than lists, and saving you from having to write anything like merge-adjoin.  The extensible global equivalence relation means you're not going to need to supply a :test either.

I think the FSet-enhanced code is cleaner, more elegant, and therefore clearer than the plain-CL version.  Don't you agree?  Maybe you wouldn't say it's a huge improvement, okay, but it's a small example; in a larger codebase, I would argue, these small improvements add up.

* * * * *

To summarize: if you just want a library you can call in a few places for specific purposes, Sycamore might work better for you (but think hard if you're writing a comparator for symbols).  FSet can certainly be used that way, but it can be much more.  If you want to see one way in which Common Lisp can be made into a better language, without giving up anything that we love about it, I urge you to give FSet a try.

FSet has changed the way I write Lisp programs.  — an FSet user

(UPDATE: the magnitude of the performance difference between FSet and Sycamore surprised me, and inspired me to do some profiling of FSet.  It turned out that I could get a 20% speedup on one micro-benchmark simply by adding some inline declarations.  Mea culpa, mea culpa, mea maxima culpa; I should have done this years ago.   With that change, the generic function overhead appears to be the only significant cause of the remaining ~20% performance difference.  I tried creating a Sycamore set using a thin wrapper around fset:compare, and the resulting performance was very similar to that of FSet with its new inlines.)