21

Suppose we have a method foo(String bar) that only operates on strings that meet certain criteria; for example, it must be lowercase, must not be empty or have only whitespace, and must match the pattern [a-z0-9-_./@]+. The documentation for the method states these criteria.

Should the method reject any and all deviations from this criteria, or should the method be more forgiving about some criteria? For example, if the initial method is

public void foo(String bar) {
    if (bar == null) {
        throw new IllegalArgumentException("bar must not be null");
    }
    if (!bar.matches(BAR_PATTERN_STRING)) {
        throw new IllegalArgumentException("bar must match pattern: " + BAR_PATTERN_STRING);
    }
    this.bar = bar;
}

And the second forgiving method is

public void foo(String bar) {
    if (bar == null) {
        throw new IllegalArgumentException("bar must not be null");
    }
    if (!bar.matches(BAR_PATTERN_STRING)) {
        bar = bar.toLowerCase().trim().replaceAll(" ", "_");
        if (!bar.matches(BAR_PATTERN_STRING) {
            throw new IllegalArgumentException("bar must match pattern: " + BAR_PATTERN_STRING);
        }
    }
    this.bar = bar;
}

Should the documentation be changed to state that it will be transformed and set to the transformed value if possible, or should the method be kept as simple as possible and reject any and all deviations? In this case, bar could be set by the user of an application.

The primary use-case for this would be users accessing objects from a repository by a specific string identifier. Each object in the repository should have a unique string to identify it. These repositories could store the objects in various ways (sql server, json, xml, binary, etc) and so I tried to identify the lowest common denominator that would match most naming conventions.

Zymus
  • 2,533

6 Answers6

48

Your method should do what it says it does.

This prevents bugs, both from use and from maintainers changing behavior later. It saves time because maintainers don't need to spend as much time figuring out what is going on.

That said, if the defined logic isn't user friendly, it should perhaps be improved.

Telastyn
  • 110,259
21

There are a few points:

  1. Your implementation must do what the documented contract states, and should do nothing more.
  2. Simplicity is important, for both contract and implementation, though more for the former.
  3. Trying to correct for erroneous input adds complexity, counter-intuitively not only to contract and implementation but also to use.
  4. Errors should only be caught early if that improves debugability and doesn't compromise efficiency too much.
    Remember that there are debug-assertions for diagnosing logic errors in debug-mode, which mostly alleviates any performance-concerns.
  5. Efficiency, as far as available time and money allow without compromising simplicity too much, is always a goal.

If you implement a user-interface, friendly error-messages (including suggestions and other help) are part of good design.
But remember that APIs are for programmers, not end-users.


A real-life experiment in being fuzzy and permissive with input is HTML.
Which resulted in everyone doing it slightly differently, and the spec, now it is documented, being a mammoth tome full of special cases.
See Postel's law ("Be conservative in what you do, be liberal in what you accept from others.") and a critic touching on that (Or a far better one MichaelT made me aware of).

Deduplicator
  • 9,209
16

A method's behavior should be clear cut, intuitive, predictable, and simple. In general, we should be very hesitant to do extra processing on a caller's input. Such guesses about what the caller intended invariably have a lot of edge cases that produce undesired behavior. Consider an operation as simple as file path joining. Many (or perhaps even most) file path joining functions will silently discard any preceding paths if one of the paths being joined appears to be rooted! For example, /abc/xyz joined with /evil will result in just /evil. This is almost never what I intend when I join file paths, but because there's no interface that doesn't behave this way, I'm forced to either have bugs or write extra code covering these cases.

That said, there are rare occasions when it makes sense for a method to be "forgiving," but it should always be within the power of the caller to decide when and whether these processing steps apply to their situation. So when you've identified a common preprocessing step that you want to apply to arguments in a variety of situations, you should expose interfaces for:

  • The raw functionality, without any preprocessing.
  • The preprocessing step by itself.
  • The combination of the raw functionality and the preprocessing.

The last is optional; you should only provide it if a large number of calls will use it.

Exposing the raw functionality gives the caller the ability to use it without the preprocessing step when they need to. Exposing the preprocessor step by itself allows the caller to use it for situations where they're not even calling the function or when they want to preprocess some input before calling the function (like when they want to pass it into another function first). Providing the combination allows callers to invoke both without any hassle, which is useful primarily if most callers will use it this way.

jpmc26
  • 5,489
4

As others have said, making the string matching "forgiving" means introducing additional complexity. That means more work in implementing the matching. You now have many more test cases, for example. You have to do additional work to ensure there are no semantically-equal names in the namespace. More complexity also means there is more to go wrong in the future. A simpler mechanism, such as a bicycle, requires less maintenance than a more complex one, such as a car.

So is the lenient string matching worth all that extra cost? It does depend on the use case, as others have noted. If the strings are some kind of external input you have no control over, and there is a definite advantage to lenient matching, it might be worth it. Perhaps the input is coming from end users who may not be very conscientious about space characters and capitalization, and you have a strong incentive to make your product easier to use.

On the other hand, if the input were coming from, say, properties files assembled by technical folks, who should understand that "Fred Mertz" != "FredMertz", I'd be more inclined to make the matching stricter and save the development cost.

I do think in any case there is value in trimming and disregarding leading and trailing spaces, though -- I've seen too many hours wasted on debugging those kinds of issues.

mat_noshi
  • 181
3

You mention some of the context from which this question comes.

Given that, I would have the method do just one thing, it asserts the requirements on the string, let it execute based on that - I wouldn't try to transform it here. Keep it simple and keep it clear; document it, and endeavour to keep the documentation and code in sync with each other.

If you wish to transform the data that comes from the user database in some more forgiving manner, put that functionality into a separate transformation method and document the functionality tied it.

At some point the function's requirements need to be metered out, clearly documented and the execution must continue. The "forgiveness", at his point, is a little mute, it's a design decision and I would argue for the function not mutating its argument. Having the function mutate the input hides some of the validation that would be required of the client. Having a function that does the mutation helps the client get it right.

The big emphasis here is clarity and to document what the code does.

Niall
  • 1,851
-1
  1. You can name a method according to the action like doSomething() , takeBackUp () .
  2. To make it easy to maintain you can keep the common contracts and validation on different procedures . Call them as per use cases.
  3. Defensive programming : your procedure handles wide range of input including (Minimum things those are use cases must be covered anyway)