112

I'm developing a library intended for public release. It contains various methods for operating on sets of objects - generating, inspecting, partitioning and projecting the sets into new forms. In case it's relevant, it's a C# class library containing LINQ-style extensions on IEnumerable, to be released as a NuGet package.

Some of the methods in this library can be given unsatisfiable input parameters. For example, in the combinatoric methods, there is a method to generate all sets of n items that can be constructed from a source set of m items. For example, given the set:

1, 2, 3, 4, 5

and asking for combinations of 2 would produce:

1, 2
1, 3
1, 4
etc...
5, 3
5, 4

Now, it's obviously possible to ask for something that can't be done, like giving it a set of 3 items and then asking for combinations of 4 items while setting the option that says it can only use each item once.

In this scenario, each parameter is individually valid:

  • The source collection is not null, and does contain items
  • The requested size of combinations is a positive nonzero integer
  • The requested mode (use each item only once) is a valid choice

However, the state of the parameters when taken together causes problems.

In this scenario, would you expect the method to throw an exception (eg. InvalidOperationException), or return an empty collection? Either seems valid to me:

  • You can't produce combinations of n items from a set of m items where n > m if you're only allowed to use each item once, so this operation can be deemed impossible, hence InvalidOperationException.
  • The set of combinations of size n that can be produced from m items when n > m is an empty set; no combinations can be produced.

The argument for an empty set

My first concern is that an exception prevents idiomatic LINQ-style chaining of methods when you're dealing with datasets that may have unknown size. In other words, you might want to do something like this:

var result = someInputSet
    .CombinationsOf(4, CombinationsGenerationMode.Distinct)
    .Select(combo => /* do some operation to a combination */)
    .ToList();

If your input set is of variable size, this code's behaviour is unpredictable. If .CombinationsOf() throws an exception when someInputSet has fewer than 4 elements, then this code will sometimes fail at runtime without some pre-checking. In the above example this checking is trivial, but if you're calling it halfway down a longer chain of LINQ then this might get tedious. If it returns an empty set, then result will be empty, which you may be perfectly happy with.

The argument for an exception

My second concern is that returning an empty set might hide problems - if you're calling this method halfway down a chain of LINQ and it quietly returns an empty set, then you may run into issues some steps later, or find yourself with an empty result set, and it may not be obvious how that happened given that you definitely had something in the input set.

What would you expect, and what's your argument for it?

ErikE
  • 1,181
anaximander
  • 2,295

13 Answers13

145

Return an Empty Set

I would expect an empty set because:

There are 0 combinations of 4 numbers from the set of 3 when i can only use each number once

Ewan
  • 83,178
82

When in doubt, ask someone else.

Your example function has a very similar one in Python: itertools.combinations. Let's see how it works:

>>> import itertools
>>> input = [1, 2, 3, 4, 5]
>>> list(itertools.combinations(input, 2))
[(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5)]
>>> list(itertools.combinations(input, 5))
[(1, 2, 3, 4, 5)]
>>> list(itertools.combinations(input, 6))
[]

And it feels perfectly fine to me. I was expecting a result that I could iterate over and I got one.

But, obviously, if you were to ask something stupid:

>>> list(itertools.combinations(input, -1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: r must be non-negative

So I'd say, if all your parameters validate but the result is an empty set return an empty set, you’re not the only one doing it.


As said by @Bakuriu in the comments, this is also the same for an SQL query like SELECT <columns> FROM <table> WHERE <conditions>. As long as <columns>, <table>, <conditions> are well formed formed and refer to existing names, you can build a set of conditions that exclude each other. The resulting query would just yield no rows instead of throwing an InvalidConditionsError.

71

In layman's terms:

  • If there is an error, you should raise an exception. That may involve doing things in steps instead of in a single chained call in order to know exactly where the error happened.
  • If there is no error but the resulting set is empty, don't raise an exception, return the empty set. An empty set is a valid set.
Tulains Córdova
  • 39,570
  • 13
  • 100
  • 156
53

I agree with Ewan's answer but want to add a specific reasoning.

You are dealing with mathematical operations, so it might be a good advice to stick with the same mathematical definitions. From a mathematical standpoint the number of r-sets of an n-set (i.e. nCr) is well defined for all r > n >= 0. It is zero. Therefore returning an empty set would be the expected case from a mathematical standpoint.

sigy
  • 716
24

I find a good way of determining whether to use an exception, is to imagine people being involved in the transaction.

Taking fetching the contents of a file as an example:

  1. Please fetch me the contents of file, "doesn't exist.txt"

    a. "Here's the contents: an empty collection of characters"

    b. "Erm, there's a problem, that file doesn't exist. I don't know what to do!"

  2. Please fetch me the contents of file, "exists but is empty.txt"

    a. "Here's the contents: an empty collection of characters"

    b. "Erm, there's a problem, there's nothing in this file. I don't know what to do!"

No doubt some will disagree, but to most folk, "Erm, there's a problem" makes sense when the file doesn't exist and returning "an empty collection of characters" when the file is empty.

So applying the same approach to your example:

  1. Please give me all all combinations of 4 items for {1, 2, 3}

    a. There aren't any, here's an empty set.

    b. There's a problem, I don't know what to do.

Again, "There's a problem" would make sense if eg null were offered as the set of items, but "here's an empty set" seems a sensible response to the above request.

If returning an empty value masks a problem (eg a missing file, a null), then an exception generally should be used instead (unless your chosen language supports option/maybe types, then they sometimes make more sense). Otherwise, returning an empty value will likely simplify the cost and complies better with the principle of least astonishment.

David Arno
  • 39,599
  • 9
  • 94
  • 129
10

As it's for a general purpose library my instinct would be Let the end user choose.

Much like we have Parse() and TryParse() available to us we can have the option of which we use depending on what output we need from the function. You'd spend less time writing and maintaining a function wrapper to throw the exception than arguing over choosing a single version of the function.

James Snell
  • 3,188
4

You need to validate the arguments provided when your function is called. And as a matter of fact, you want to know how to handle invalid arguments. The fact that multiple arguments depend on each other, doesn't make up for the fact that you validate the arguments.

Thus I would vote for the ArgumentException providing the necessary information for the user to understand what went wrong.

As an example, check the public static TSource ElementAt<TSource>(this IEnumerable<TSource>, Int32) function in Linq. Which throws an ArgumentOutOfRangeException if the index is less than 0 or greater than or equal to the number of elements in source. Thus the index is validated in regards to the enumerable provided by the caller.

sbecker
  • 189
  • 5
3

I can see arguments for both use cases - an exception is great if downstream code expects sets which contain data. On the other hand, simply an empty set is great if if this is expected.

I think it depends on the expectations of the caller if this is an error, or an acceptable result - so I would transfer the choice to the caller. Maybe introduce an option?

.CombinationsOf(4, CombinationsGenerationMode.Distinct, Options.AllowEmptySets)

3

There are two approaches to decide if there's no obvious answer:

  • Write code assuming first one option, then the other. Consider which one would work best in practice.

  • Add a "strict" boolean parameter to indicate whether you want the parameters to be strictly verified or not. For example, Java's SimpleDateFormat has a setLenient method to attempt parsing inputs that don't fully match the format. Of course, you'd have to decide what the default is.

3

You should do one of the following (though continuing to consistently throw on basic problems such as a negative number of combinations):

  1. Provide two implementations, one that returns an empty set when the inputs together are nonsensical, and one that throws. Try calling them CombinationsOf and CombinationsOfWithInputCheck. Or whatever you like. You can reverse this so the input-checking one is the shorter name and the list one is CombinationsOfAllowInconsistentParameters.

  2. For Linq methods, return the empty IEnumerable on the exact premise you've outlined. Then, add these Linq methods to your library:

    public static class EnumerableExtensions {
       public static IEnumerable<T> ThrowIfEmpty<T>(this IEnumerable<T> input) {
          return input.IfEmpty<T>(() => {
             throw new InvalidOperationException("An enumerable was unexpectedly empty");
          });
       }
    
       public static IEnumerable<T> IfEmpty<T>(
          this IEnumerable<T> input,
          Action callbackIfEmpty
       ) {
          var enumerator = input.GetEnumerator();
          if (!enumerator.MoveNext()) {
             // Safe because if this throws, we'll never run the return statement below
             callbackIfEmpty();
          }
          return EnumeratePrimedEnumerator(enumerator);
       }
    
       private static IEnumerable<T> EnumeratePrimedEnumerator<T>(
          IEnumerator<T> primedEnumerator
       ) {
          yield return primedEnumerator.Current;
          while (primedEnumerator.MoveNext()) {
             yield return primedEnumerator.Current;
          }
       }
    }
    

    Finally, use that like so:

    var result = someInputSet
       .CombinationsOf(4, CombinationsGenerationMode.Distinct)
       .ThrowIfEmpty()
       .Select(combo => /* do some operation to a combination */)
       .ToList();
    

    or like this:

    var result = someInputSet
       .CombinationsOf(4, CombinationsGenerationMode.Distinct)
       .IfEmpty(() => _log.Warning(
          $@"Unexpectedly received no results when creating combinations for {
             nameof(someInputSet)}"
       ))
       .Select(combo => /* do some operation to a combination */)
       .ToList();
    

    Please note that the private method being different from the public ones is required for the throwing or action behavior to occur when the linq chain is created instead of some time later when it is enumerated. You want it to throw right away.

    Note, however, that of course it has to enumerate at least the first item in order to determine if there are any items. This is a potential drawback that I think is mostly mitigated by the fact that any future viewers can quite easily reason that a ThrowIfEmpty method has to enumerate at least one item, so should not be surprised by it doing so. But you never know. You could make this more explicit ThrowIfEmptyByEnumeratingAndReEmittingFirstItem. But that seems like gigantic overkill.

I think #2 is quite, well, awesome! Now the power is in the calling code, and the next reader of the code will understand exactly what it's doing, and won't have to deal with unexpected exceptions.

ErikE
  • 1,181
2

Based on your own analysis, returning the empty set seems clearly right — you've even identified it as something some users may actually want and have not fallen into the trap of forbidding some usage because you can't imagine users ever wanting to use it that way.

If you really feel that some users may want to force nonempty returns, then give them a way to ask for that behavior rather than forcing it on everyone. For example, you might:

  • Make it a configuration option on whatever object is performing the action for the user.
  • Make it a flag the user can optionally pass into the function.
  • Provide an AssertNonempty check they can put into their chains.
  • Make two functions, one that asserts nonempty and one that does not.
2

It really depends on what your users expect to get. For (a somewhat unrelated) example if your code performs division, you may either throw an exception or return Inf or NaN when you divide by zero. Neither is right or wrong, however:

  • if you return Inf in a Python library, people will assault you for hiding errors
  • if you raise an error in a Matlab library, people will assault you for failing to process data with missing values

In your case, I'd pick the solution which will be least astonishing for end users. Since you're developing a library dealing with sets, an empty set seems like something your users would expect to deal with, so returning it sounds like a sensible thing to do. But I may be mistaken: you have a much better understanding of the context than anyone else here, so if you expect your users to rely on the set always being not empty, you should throw an exception right away.

Solutions which let the user choose (like adding a "strict" parameter) aren't definitive, since they replace the original question with a new equivalent one: "Which value of strict should be the default?"

1

It is common conception (in mathematics) that when you select elements over a set you could find no element and hence you obtain an empty set. Of course you have to be consistent with mathematics if you go this way:

Common set rules:

  • Set.Foreach(predicate); // always returns true for empty sets
  • Set.Exists(predicate); // always returns false for empty sets

Your question is very subtle:

  • It could be that input of your function has to respect a contract: In that case any invalid input should raise a exception, that's it, the function is not working under regular parameters.

  • It could be that input of your function has to behave exactly like a set, and therefore should be able to return an empty set.

Now if I were in you I would go the "Set" way, but with a big "BUT".

Assume you have a collection that "by hypotesis" should have only female students:

class FemaleClass{

    FemaleStudent GetAnyFemale(){
        var femaleSet= mySet.Select( x=> x.IsFemale());
        if(femaleSet.IsEmpty())
            throw new Exception("No female students");
        else
            return femaleSet.Any();
    }
}

Now your collection is no longer a "pure set", because you have a contract on it, and hence you should enforce your contract with a exception.

When you use your "set" functions in a pure way you should not throw exceptions in case of empty set, but if you have collections that are no more "pure sets" then you should throw exceptions where proper.

You should always do what feels more natural and consistent: to me a set should adhere to set rules, while things that are not sets should have their properly thought rules.

In your case it seems a good idea to do:

List SomeMethod( Set someInputSet){
    var result = someInputSet
        .CombinationsOf(4, CombinationsGenerationMode.Distinct)
        .Select(combo => /* do some operation to a combination */)
        .ToList();

    // the only information here is that set is empty => there are no combinations

    // BEWARE! if 0 here it may be invalid input, but also a empty set
    if(result.Count == 0)  //Add: "&&someInputSet.NotEmpty()"

    // we go a step further, our API require combinations, so
    // this method cannot satisfy the API request, then we throw.
         throw new Exception("you requsted impossible combinations");

    return result;
}

But it is not really a good idea, we have now a invalid state that can occurr at runtime at random moments, however that's implicit in the problem so we cannot remove it, sure we can move the exception inside some utility method (it is exactly the same code, moved in different places), but that's wrong and basically the best thing you can do is stick to regular set rules.

Infact adding new complexity just to show you can write linq queries methods seems not worth for your problem, I'm pretty sure that if OP can tell us more about it's domain, probably we could find the spot where the exception is really needed (if at all, it is possible the problem does not require any exception at all).