31

Say I have a request payload:

PUT /user
{
  email: "invalid"
  ...
}

In the backend there is an email regex, which I cannot modify. Currently the behavior is to output:

{
  "error": "'email' fails to pass regex '<some_regex_here>'"
}

Should I go with the existing behavior or change the output response to:

{
  "error": "'email' is invalid"
}
夢のの夢

6 Answers

150

For any error message (and mostly for any message at all), you need to ask yourself:

  • Who is the audience of the message?
  • What can they do about the problem?
  • What information do they need to solve the problem?

I would argue that knowing the regex is pretty much useless to the end user, because even if they know what a regex is, it doesn't help them fix the problem:

  • They made a typo; the fact that the email is wrong is enough information for them to take a second look at it.
  • The email is correct; that means the regex is probably wrong. Doesn't help them (the end user) to fix the problem, because they don't have a problem. It is you (the developer) that has the problem.

Knowing the regex would allow me to tweak the email address so that it passes the regex, but that makes no sense; if I tweak the email address just so that it passes the regex, it will no longer work for the intended purpose.

Mike P
Jörg W Mittag
31

Yes, this is bad for various reasons.

A normal end user is not going to gain anything from reading the validation regex over just reading an error message.

An attacker may or may not be able to use the exact regex to craft an attack string that causes denial of service or compromise of security. This is not likely, but it's certainly more likely with the regex than without it.

Requirements on the format of user-selectable values should always be expressible in a single, simple sentence. Anything more complex will cause more confusion than it resolves. Note: simply saying that your email must satisfy RFC XXXX is not simple enough - the official spec for email addresses is already surprisingly (or perhaps staggeringly) complex.

Kilian Foth
16

As someone who previously used email addresses that too many sites thought were invalid, I appreciated at least knowing that you used a regex for validation, because unless all it does is check for an @ with at least one character on each side, I almost guarantee you got it wrong. In the worst case I saw, it accepted my email during registration, but later rejected it during login.

Even a non-technical user can post a question somewhere that says, "Site X won't accept my email address. It keeps saying it doesn't match the regex, whatever that is." And someone can tell them it's most likely the site's fault for only accepting a subset of valid email addresses, and they'll know to look out for the "regex" word, even if they don't know what it means.
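The deliberately permissive check described above (an @ with at least one character on each side) could be sketched like this. The function name `looks_like_email` is hypothetical and not something from the question's backend:

```python
def looks_like_email(value: str) -> bool:
    """Return True if value contains an '@' with at least one character on each side."""
    # Split on the LAST '@', since a quoted local part may itself contain '@'.
    local, sep, domain = value.rpartition("@")
    return bool(sep) and bool(local) and bool(domain)
```

A check this loose rejects obvious typos without rejecting unusual but valid addresses; whether the address actually works is then best confirmed by sending mail to it.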

Karl Bielefeldt
10

From a general security perspective, the "best practice" principle is to avoid exposing internal details of the system to a user when an error occurs, to prevent a hacker from using that information to breach the system.

That's why IIS operates in two modes: a "User Mode," where a faulty page displays, at most, an HTTP response code like 404 or 500, and an authenticated "Administrative Mode," which will also supply detailed error information like stack traces.

In some cases, pages will actually display incomplete or outright wrong information. For example, in login pages it is common to respond to an incorrect password with something like "Authentication Failed," without identifying whether the login name or password is the problem. If a user tries to open a web page for which they don't have adequate permissions, the web server may simply respond with 500 instead of telling the user they don't have permission.
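As a rough illustration of the login example, here is a minimal sketch in which an unknown user name and a wrong password produce exactly the same response. The in-memory `_USERS` store and the `login` function are assumptions for illustration only; a real system would store password hashes, not plain text.

```python
# Hypothetical in-memory user store, for illustration only.
# A real system would store salted password hashes instead.
_USERS = {"alice": "correct horse"}


def login(username: str, password: str) -> dict:
    """Return the same generic error whether the user or the password is wrong."""
    stored = _USERS.get(username)
    if stored is None or stored != password:
        # Do not reveal which of the two checks failed.
        return {"error": "Authentication failed"}
    return {"ok": True}
```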

Robert Harvey
2

You have an HTTP API. Probably a RESTful one, but there's no need to jump to conclusions.

There are three points of view in play:

  • An API is usually consumed by other code, which means it is consumed by whoever wrote that code: a programmer or a tech-savvy user. Giving them as detailed an error message as possible is good user experience. If you are worried about the end user, you needn't be. Just change the message on the FRONTEND to something your END USER will understand.
  • This being an HTTP API, and the e-mail in question being user input, this particular behavior should be implemented as 400 Bad Request. Again, at this point we are dealing with a 4xx client error, the client being the frontend or another API consuming your API. It is good practice to include enough information in 4xx error messages for the consumer to fix things on their side, and to let them (the developers of the frontend) deal with end users and transform the error messages. IMHO it's too soon to draw conclusions about end users at the API level.
  • Finally, security. I don't see any security problem with displaying the regex used to validate an e-mail. Security by obscurity is a discouraged practice and does not achieve any real security. Implement proper security instead.

With that being said, definitely include the regex in the error message. As a client-side developer, I want to know why our users cannot register with the app that's using your API without DMing you or looking into your backend code.
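The split described in this answer could look roughly like the sketch below: the API answers with 400 and a detailed, machine-readable error, and the frontend maps that error to wording for the end user. Everything here is an assumption made for illustration: `EMAIL_PATTERN` is a stand-in for the backend's real regex, and the `field`/`code`/`detail` layout is not the format from the question.

```python
import re

# Hypothetical stand-in for the backend's real (unmodifiable) regex.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+")


def validate_user_payload(payload: dict):
    """API side: return (status, body) with a detailed error for the API consumer."""
    email = payload.get("email")
    if not isinstance(email, str) or EMAIL_PATTERN.fullmatch(email) is None:
        return 400, {
            "error": {
                "field": "email",
                "code": "invalid_format",
                "detail": f"'email' fails to pass regex '{EMAIL_PATTERN.pattern}'",
            }
        }
    return 200, {}


# Frontend side: translate (field, code) pairs into end-user wording.
FRONTEND_MESSAGES = {
    ("email", "invalid_format"): "Please enter a valid email address.",
}


def message_for_end_user(body: dict) -> str:
    """Pick a friendly message; fall back to a generic one for unknown errors."""
    err = body.get("error", {})
    key = (err.get("field"), err.get("code"))
    return FRONTEND_MESSAGES.get(key, "Something went wrong.")
```

With this layout the developer still sees the regex in `detail`, while the end user only ever sees the text chosen by the frontend.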

netchkin
0

Things aren't black and white here. There are multiple questions to answer.

  1. What is the character of the API?
    • Public API: If the REST API is public, you will want to document it well and add samples that are both valid and invalid. This is good practice in general. I like the informational flavor of an error that shows which regex was used for the validation, which might be helpful for developers. However, it is possible that the email is correct and the regex fails anyway (see point 3).
    • Private API: The REST API is used for communication between internal systems. You don't need to be as verbose as in the previous point; however, it is still good practice in general.
  2. What happens with the error message?
    • Caught and logged: It might be useful to see the regex that the validation failed on, at least in the logs.
    • Propagated to the front-end: The question here is who the consumer is, in other words, what qualifications the reader of the error message has. If it is anybody who is either not technically qualified OR for whom knowing the regex has no benefit, it makes no sense to add it.
  3. What if the email address is correct but still fails the regex?
    • Null validation: Be careful. You might check for null and then throw a regex error message, which is misleading. This applies to any validation happening either before or after the regex validation.
    • Regex correctness: There are various email regexes, and each one behaves a bit differently and follows different standards. Read more here and here. A scenario in which a valid email doesn't pass is possible. Again, whether such a verbose error message belongs there depends on who has to deal with it. If it is a public API and the user is a tech-savvy person, they might open an issue saying that the email is valid but the regex doesn't match it, with a link to a Regex101.com sample. For anybody else, such information has minimal or no real value.
  4. How about security?
    • Safety first: Is there even a minimal risk that knowledge of internal system details could be abused and cause harm in the future? If so, forget about exposing such information. Also, it is good practice to merge such messages into a generic one:

      The combination of the email, birth date, password and security code is invalid.

Answering these would give you a general idea of whether it is better to include a Regex or not.

Disclaimer: If you do decide to include the regex in the error message, remember to output the actual one and not hard-coded text. There is nothing worse than displaying a regex that differs from the one actually used.
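A minimal sketch of the disclaimer and of the null-validation point: the message is built from the compiled pattern object itself, so it cannot drift from the regex that actually ran, and a missing value gets its own message instead of a misleading regex error. The helper `check_field` and the sample pattern are hypothetical.

```python
import re
from typing import Optional


def check_field(name: str, value: Optional[str], pattern: "re.Pattern[str]") -> Optional[str]:
    """Return an error message, or None if the value passes."""
    if value is None:
        # A missing value is reported as such, not as a regex failure.
        return f"'{name}' is required"
    if pattern.fullmatch(value) is None:
        # pattern.pattern is the source text of the regex that was just applied.
        return f"'{name}' fails to pass regex '{pattern.pattern}'"
    return None


# Hypothetical pattern, used only to exercise the helper.
sample_pattern = re.compile(r"[^@\s]+@[^@\s]+")
print(check_field("email", "invalid", sample_pattern))
```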

Nikolas