Saturday, October 18, 2025

FSet 2.0 update

Someone asked me what the rationale is for the decision, in my FSet 2.0 release candidate, to have no default default for maps and seqs, so that an out-of-domain lookup will signal an error.  I started to write an answer, but after putting the arguments for and against this change down on the page and mulling them over for a few days, I concluded it was a mistake and decided to reverse it.

So in FSet 2.0, it will still be the case, unless you specify otherwise, that an out-of-domain lookup on a map, or an out-of-bounds lookup on a seq, will simply return nil (with a nil second value).  You do, as before, have the option to specify a different default, and now you also have the option to specify no default, if you want out-of-domain/bounds lookups to signal an error.

I have tagged v2.0.0-rc1. 

This has been a difficult decision that I have changed my mind about a few times.  Let me summarize the arguments for and against the change.  I'll start with some in favor of not having a default default:

  • It will be simpler to explain to new FSet users that the map or seq has a default only if explicitly given one.
  • Users will supply a default of nil only for those maps and seqs which actually have out-of-domain/bounds lookups done on them.  More maps and seqs will have no default, which will surface cases when an intended invariant, that the lookups are all in-domain, is violated; this will improve the overall robustness of their code.
  • Some operations, primarily map-union, map-intersection, and compose, are easier to use when their arguments have no defaults; if they have nil defaults, the function passed in to combine or map values (often specified as a lambda expression) must explicitly handle nil, which is often inelegant.  If there is no default default, fewer people will trip over this speed bump.

Some arguments in favor of a nil default default:

  • It's consistent with FSet past practice; having no default default will require migration effort on the part of FSet users.
  • It's consistent with the majority of CL collection accessors (assoc, gethash, nth).
  • It's consistent with other FSet behaviors, such as that of arb on an empty set, which returns two nil values.

Minimizing migration effort is somewhat desirable, of course, but I try not to overweight it.  There's an old story I once heard about Stu Feldman, the original author of make.  He wrote it and passed it around to his colleagues at Bell Labs.  Pretty soon he realized that the syntax was a dumpster fire, but he didn't want to fix it, the story goes, because he already had ten users.  And now millions of us have to live with it.

So I'm willing to impose some migration pain on existing users, as long as it doesn't seem excessive, if I believe they themselves will be happier in the long run.   It's not that their interests don't count; it's just that future benefits can outweigh present pain.  And in this case, I think the amount of present pain would not have been large; I did the conversion on some of my own code that uses FSet, and it didn't seem very hard.  So all told, the migration argument carried a little weight, but not a huge amount.

As for the CL collection accessors, there is some inconsistency there already.  Sequence accessors — svref, elt, and aref — do signal an error on an out-of-bounds index, except perhaps at safety 0.  (Surprisingly, at least to me, of these only elt is specified to signal an error, but the other two do so also in all the implementations I've tried.)  nth is a funny case; at least in the major implementations, on a positive index greater than or equal to the length of the list, it just returns nil, but on a negative index it signals an error.  The consistency-with-CL argument is thus not quite as strong as it may sound, when CL isn't even completely self-consistent.  Of course, the map accessors assoc and gethash do return nil on an out-of-domain lookup.  All told, again, this argument carries somewhat more weight for me than the migration argument, but it's not overwhelming.

The argument from internal consistency of FSet was the one that tipped the balance for me.  There are other access operations besides lookup that indicate failure by returning a second (or sometimes third) value which is false.  I suppose I could have changed these to signal errors also, but this seemed a bridge too far; in the cases of set and bag operations, there isn't currently a way you could select between the error behavior and the return-nil behavior, the way that the choice of defaults allows you to do for maps and seqs.

I also tried to estimate the frequency of the following two cases:

  • In a no-default-default FSet, how often would users have to add an explicit :default nil to prevent undesired lookup errors?
  • In a nil-default-default FSet,  how often would users have to add an explicit :no-default or :no-default? t to cause errors on out-of-domain lookups, or for reasons having to do with map-union etc?

Although it's hard to be extremely confident about my estimates without seeing a lot of code others have written against FSet, my experience suggests that the former would be several times as frequent as the latter.  This argument also helps tip the balance toward a nil default default.

Saturday, October 11, 2025

FSet 2.0 is coming!

I have pushed and tagged the first release candidate, v2.0.0-rc0, of FSet version 2!  I'm keeping it in a GitLab Merge Request (MR) for the moment, but I am very much hoping to get some FSet users to try it out and give me some feedback.

One major change is that sets and maps now use the CHAMP implementations by default.  This change should be transparent as long as:

  • you haven't written any complex custom compare methods (if all the method does is call compare-slots, it can be easily converted to use the new macro define-equality-slots), and
  • you don't care about the ordering of your sets and maps, or in the cases where you do care, you've used the new custom-ordering features.

The second major change is to the defaulting behavior of maps and seqs.  FSet 1 uses a "default default" of nil, meaning that if you don't supply an explicit default when creating a map or seq, its default is nil.  The default is returned on a map lookup when the supplied key is not in the map; it is returned on a seq lookup when the supplied index is not in bounds (the bounds being 0 up to, but excluding, the size of the seq).

In FSet 2, there is no default default.  If you don't supply an explicit default, the map or seq has no default, and an access attempt will signal an error instead in these cases.  So, migrating your code to FSet 2 will probably require a little debugging — running your test suite, noting when you get one of the new errors, finding the form where the map or seq involved is initially created, and adding :default nil to the form or wrapping it in (with-default ... nil)UPDATE: this decision has been reversed in v2.0.0-rc1. 

Examples:

;; The map constructor macros accept a default anywhere in the form
(map)          -->  (map :default nil)
(map ('a 3))   -->  (map ('a 3) :default nil)
(replay-map (14 'x) (9 'q)) --> (replay-map :default nil (14 'x) (9 'q))
;; The seq constructor macro does not
(seq 3 1 4)    -->  (with-default (seq 3 1 4) nil)
;; The constructor functions take a :default keyword argument
(empty-map)    -->  (empty-map :default nil)
(empty-seq)    -->  (empty-seq :default nil)
;; Tuples associate defaults with the keys rather than the tuples
(define-tuple-key foo) --> (define-tuple-key foo :default nil) 

But, there's good news!  You don't have to convert your code if you don't want to.  Merely loading FSet 2 doesn't expose your code to these changes; the behavior of names exported from package fset has mostly not changed.   Instead, I've added a new package, fset2, that exports its own versions of the names with new behavior.  So, to use FSet 2, change :use fset in your defpackage form(s) to :use fset2.

(There is one change you will see even if you don't use the new package, having to do with the printing of map and seq defaults.  Previously, a nil default would not be printed explicitly; now, it will be, so you'll see things like ##{| (a 3) |}/NIL and #[ 3 1 4 ]/NIL.)

For complete details of all changes in this release, see the MR.

So, for anybody who wants to help me out, here's what I ask:

  1. Clone this repo (or this one), and in your copy, do: git checkout fset2.
  2. If you didn't clone it in ~/quicklisp/local-projects/, arrange for Quicklisp to find this copy, in whatever way you do that (e.g. by pushing the directory pathname onto asdf:*central-registry*).
  3. Recompile your client code and test it.  If anything doesn't work, please let me know immediately.
  4. Go into the :use clause of your defpackage form(s) and change fset to fset2.
  5. Recompile your client code again, and test it again.  This time you may need to make some changes, as discussed above.  Let me know how much trouble you have, whether a little or a lot (and especially let me know if you give up).  You can post comments in the MR, or in this GitHub issue.

Again, this is a release candidate, not yet a release.  I've tested it pretty thoroughly, but there could still be bugs.  OTOH, if there's something in particular you don't like about it, I may be more willing to make changes than I will be after it's released. 

Share and enjoy!

 

Sunday, September 21, 2025

How well different Common Lisps run FSet

I just did some quick-and-dirty benchmarking, using FSet's test suite.  It was not designed as a benchmark, but I think it still gives a useful rough indication of how well FSet runs on different platforms.

These tests were all run on an Intel Xeon "Ivy Bridge" except the first one, which was on an Apple M2 MacBook Pro.  Values are the time to run 100 iterations of the test suite; smaller is better.

  SBCL 2.5.3 on M2           1.9
  SBCL 2.5.3                 2.8
  LispWorks 8.0.1            5.8  (Personal Edition, but I doubt that matters)
  CCL 1.13                  13.3 
  Franz Allegro 11          15.7
  ECL 24.5.10               58.1
  CLASP 2.5.0               79.4
  ABCL 1.9.2                85.3
 

Yikes!  Ignoring the M2 number, that's a factor of 30 — a very wide range.  I don't think the test is entirely fair, because I develop on SBCL and haven't put any effort into optimizing for other platforms.  I suspect the CCL and Allegro times could be improved somewhat.  The poor performance of ECL and CLASP surprises me; FSet spends most of its time doing ordinary struct and simple-vector accesses, which I would think would translate well into C.  Maybe they're still doing a lot of type- and bounds-checking, even though I've requested safety 0?

As for ABCL, I think it's a remarkable achievement that it is compatible enough to run FSet at all; I can't fault it for not being a speed demon.  My guess is that the biggest gains to be had here would be from improving ABCL itself, rather than tweaking FSet.

Friday, September 19, 2025

FSet now has CHAMP sets and maps!

I have just released FSet 1.6.0, which completes the work on CHAMP sets and maps that I started months ago.

CHAMP is a hash-based data structure by Michael Steindorfer, that improves a little on Phil Bagwell's widely-used HAMT.  (The HAMT is used, for example, by Clojure.)

See the GitLab MR  for the details of how to use it.

I did some quick micro-benchmarking, using sets of integers, comparing CHAMP against my older weight-balanced trees.  On lookup (testing whether a value is in the set), CHAMP is about twice as fast at size 4, growing to almost 5x faster at size 2048.  On update (adding an element to the set, with about a 25% chance that it's already there), CHAMP is roughly even with WB at size 4, but over 40% faster at size 2048.

So to summarize, there's a significant and welcome improvement in update performance, and quite a remarkable improvement in lookup performance.  W00t!

 

Monday, September 1, 2025

Receiving Multiple Values

Background: the nlet macro

'Way back in 1980, when I was an MIT undergraduate hacking on the Lisp Machines at the AI Lab, I was already starting to experiment with a more functional style than most people there used.  And one thing I quickly discovered was the usefulness of the multiple-value feature in this style.  For example, instead of doing (when (something) (rotatef a b)) to conditionally swap the values of two variables, I found I could do this:

(multiple-value-bind (a b)
    (if (something) (values b a)
      (values a b))
  ...)

Whether this is really a stylistic improvement or not is certainly debatable, though I still do it, but it does get rid of the assignments.

Of course, the more common use of multiple values is to return more than one piece of information from a function.  Some languages have "out parameters" for this purpose, but Common Lisp, like Lisp Machine Lisp before it, has multiple values; and some CL builtins are specified to return multiple values (floor and ceiling come to mind, but there are many others).  On most implementations, returning multiple values is much faster than consing up a list or other structure to return, because the semantics don't make the tuple of values a firstclass object (unless, of course, you do that explicitly with multiple-value-list).  Instead, like arguments to a call, they are generally returned in registers or on the stack.

Anyway, as I was making fairly frequent use of multiple-value-bind, sometimes nesting two or three such forms together, it seemed to me that I should be able to write them similarly to let forms — that the difference between binding one variable and binding two or more was not so large that I should have to use a completely different construct for the latter.  Also, I noticed that there was space in the syntax of let to extend it: let's binding clauses are specified to be lists of length two (or one, but let's ignore this case), of which the first is the variable to bind and the second is the init-form whose value is to be bound to.  The obvious generalization is that all subforms of the clause but the last are variables to be bound, and the last subform is the init-form that will supply those values.

It also occurred to me, as I was using both let for parallel binding and let* for sequential binding, that there was a way to allow arbitrary combinations of parallel and sequential binding using clause nesting: nesting an inner clause inside an outer one would make the variables bound by the outer one available to the inner one.

Thus was born my macro nlet.  Here's an example:

  (nlet ((a b c (zot))
         ((d (quux a c))
          ((e f (mumble b d))
           (g (mung a))))
         ((h (frobnicate c))
          ((i (glumph h))))
         (*print-level* 3))
    ...)

First a, b, and c are bound to the first three values of (zot), and in parallel, *print-level* is bound to 3; then d and h are bound; then e, f, g, and i are bound.

As this example illustrates, all bindings at a given nesting level are done in parallel, with all bindings at a deeper level following. Stylistically, it is expected that init-forms in nested clauses will refer only to variables bound in containing clause lists.  (More on this point below.)

My Misc-Extensions system exports this macro under two names, nlet and let.   In my own code, I import it as let, shadowing cl:let.  (You'll see it all over the FSet codebase.)

Critical reception: crickets

I published Misc-Extensions at the same time I first published FSet, which I think was 2007 or 2008.  I don't recall anyone ever commenting to me either positively or negatively about nlet.  My guess is that few if any people besides me have ever used it, but I have no way to know that for sure.  In fairness, I didn't make much noise about it; I didn't even add a README with documentation until May 2024, though there was always documentation in the source file.

Let me explain why I like it, though.  Consider: why do people ever use let (I mean cl:let) when let* exists?  One possible reason is because they want to shadow a variable in the containing scope, but they also need one of the init-forms to refer to the outer variable, and they also need that init-form to be evaluated after the init-form that provides the value for the inner variable.  Like this:

(let ((a (something)))
  ...
  (let ((a (must-come-first))
        (b (must-come-second a)))
    ...))

If they didn't want to shadow a, they could use let*.  (And even if they did want to shadow a, if the two init-forms could be reordered, they could still use let*).  Since you never really have to shadow a variable — you can always pick a new name for the inner one — let is redundant; we could always use let*.  And yet, in the CL code I've seen (not counting my own), let is several times as common as let*.  Why is this?

In the preface to Structure and Interpretation of Computer Programs, Hal Abelson and Gerry Sussman famously said, "Programs must be written for people to read, and only incidentally for machines to execute."  While I think that's a tad overstated, it makes an important point.

When let* is used, someone reading the code needs to track the data flow through the clauses, by seeing which init-forms reference which variables bound in previous clauses, in order to understand it.  In contrast, the use of let communicates immediately that there are no data flow dependences between the clauses.  Thus, the reader need not look for them; each of the init-forms can be viewed independently.

The advantage nlet has over let*, then, is that it permits even greater precision in indicating which clauses depend on which other clauses.  Let's look again at this example:

  (nlet ((a b c (zot))
         ((d (quux a c))
          ((e f (mumble b d))
           (g (mung a))))
         ((h (frobnicate c))
          ((i (glumph h))))
         (*print-level* 3))
    ...)

The grouping of the clauses tells the reader at a glance that only the mumble and mung calls can depend on d, and only the glumph call can depend on h.  (This isn't enforced, but it's an easy rule to follow when writing.)  Any of the calls can depend on *print-level*, but if they actually do, it would have been clearer to put that at the top.  If you wanted to make sure that they can't see that binding, you could just wrap the clause in two more parenthesis pairs.

Well.  All that said, my guess is that most people are still going to find this syntax a bridge too far.  What do you think?

Compromise: mvlet and mvlet*

I have just now released a new Misc-Extensions with two new variations on the above theme, mvlet and mvlet*.  These retain the multiple-value binding feature of nlet, but don't support clause nesting; instead they have the familiar semantics of doing fully parallel or fully sequential binding, like let and let* respectively.  I think this is probably the right compromise to attract wider use.  Do you agree?

It's probably clear by now how to use mvlet and mvlet*, but here's an example anyway:

  (let ((a (foo)))
    ...
    (mvlet ((a b (bar))
            (c (baz a))  ; outer 'a'
            ...)
      ...)

    (mvlet* ((a b (bar))
             (c (baz a))  ; inner 'a'
             ...)
      ...))

I've looked around for other published macros that provide the same features — a succinct syntax for binding to multiple values, multiple binding clauses, and a choice of parallel or sequential binding — and I haven't found much.  Alexandria doesn't have it.  I have a faint recollection of seeing a generalized-destructuring macro that allows the use of values expressions as patterns, but I can't seem to find it now.  I don't see this functionality in Trivia.  The closest thing I can find is Ron Garret's bb macro, but (a) it's in his Ergolib project, which is not in Quicklisp, (b) it does only sequential binding, and (c) it doesn't handle declarations in the body correctly (this is noted as a TODO).  If you know of something else, please comment.

I would also note that Dylan, which was created by people who had worked on the design of CL, has a similar syntax for binding to multiple values:

let (a, b, c) = values(...)

I take this as validation of the concept.

I would like this post to start a conversation.  I'll post it on Reddit under /r/Common_Lisp; if you use Reddit, that might be the best place.  Otherwise, feel free to comment below.  Do you use multiple values much?  If so, do you agree that the builtin syntax is a bit clunky?  Do you think you would use any of the macros I've introduced here?

Monday, August 25, 2025

FSet now supports Iterate!

For FSet 1.5.1, I have added support for the popular Iterate iteration macro.  The issue page has a good explanation.

I don't use Iterate myself, or I would no doubt have done this sooner.

Also, to implement some of the Iterate functionality, I extended the stateful seq iterator to support iterating a subsequence, and added a reverse version. So now the iterator method on wb-seq takes keyword arguments start, end, and from-end?, whose semantics is familiar from cl:find etc.

If there's anything else about which you think "I would like to use FSet, but it doesn't work for me because it doesn't have X", I would like to know what that is.  Please post in the blog comments or on Reddit (I'll link to this from r/Common_Lisp).

 

Tuesday, August 19, 2025

FSet 1.5.0 gets custom orderings!

The ordering of the "setlike" collections — sets, maps, and bags — in FSet has always been determined by the generic function fset:compare.  This approach is often very convenient, as it allows you to define the ordering of a new type simply by adding a method on compare; there is no need to supply the ordering explicitly every time you create a new collection.

However, as people have complained from time to time, it is also a bit limiting.  Say you want to make something like a telephone directory (anyone remember telephone directories?) which maps string keys to values, and you would like it maintained in lexicographic order of the keys.  To do this with FSet, you have heretofore had to define a wrapper class, and then a compare method on that class, something like:

(defstruct lexi-string
  value)
(defmethod compare ((a lexi-string) (b lexi-string))
  (compare-lexicographically (lexi-string-value a) (lexi-string-value b)))

Then you would have to wrap your keys in lexi-strings before adding them to your map.  That seems a little wasteful of both time and space.

A second problem with always using fset:compare is that you have to pay the cost of the generic function dispatch several times every time the collection gets searched for an element, as in contains? on a set or lookup on a map.  (The number of such calls is roughly the base-2 logarithm of the size of the collection.)  One micro-benchmark I ran showed this cost to be around 20% of the access time, which is not insignificant.

So, in response to popular demand, I have added custom orderings to FSet: you can supply your own comparison functions when creating collections, and FSet will call those instead of compare.  Use of this feature is completely optional; existing code is not affected.  But if you want to do it, now you can!

I refer you to the PR description for the details.

There is one aspect of this change that might surprise you.  When given objects of different classes, fset:compare doesn't compare the contents of the objects; it just compares their class names and returns :less or :greater accordingly.  So, for instance, a list cannot be equal? to a vector or seq, even if they have the same elements in the same order.  This rule now also covers cases where the objects are collections of the same kind (sets, bags, or maps) but with different orderings.  So just as a wb-set and a ch-set can never be :equal, so two wb-sets with different orderings can never be :equal; compare will just look at the comparison function names to impose an artificial ordering.

I'm not suggesting this is an ideal situation, but I don't see a way around it.  Since comparing two wb-sets of the same ordering relies on that ordering, a combined relation on wb-sets of different orderings would in general fail to be transitive; you would get situations where a < b and b < c, but c < a.