119

The codebase I'm working on frequently uses instance variables to share data between various trivial methods. The original developer is adamant that this adheres to the best practices stated in the Clean Code book by Uncle Bob/Robert Martin: "The first rule of functions is that they should be small." and "The ideal number of arguments for a function is zero (niladic). (...) Arguments are hard. They take a lot of conceptual power."

An example:

public class SomeBusinessProcess {
  @Inject private Router router;
  @Inject private ServiceClient serviceClient;
  @Inject private CryptoService cryptoService;

  private byte[] encodedData;
  private EncryptionInfo encryptionInfo;
  private EncryptedObject payloadOfResponse;
  private URI destinationURI;

  public EncryptedResponse process(EncryptedRequest encryptedRequest) {
    checkNotNull(encryptedRequest);

    getEncodedData(encryptedRequest);
    getEncryptionInfo();
    getDestinationURI();
    passRequestToServiceClient();

    return cryptoService.encryptResponse(payloadOfResponse);
  }

  private void getEncodedData(EncryptedRequest encryptedRequest) {
    encodedData = cryptoService.decryptRequest(encryptedRequest, byte[].class);
  }

  private void getEncryptionInfo() {
    encryptionInfo = cryptoService.getEncryptionInfoForDefaultClient();
  }

  private void getDestinationURI() {
    destinationURI = router.getDestination().getUri();
  }

  private void passRequestToServiceClient() {
    payloadOfResponse = serviceClient.handle(destinationURI, encodedData, encryptionInfo);
  }
}

I would refactor that into the following using local variables:

public class SomeBusinessProcess {
  @Inject private Router router;
  @Inject private ServiceClient serviceClient;
  @Inject private CryptoService cryptoService;

  public EncryptedResponse process(EncryptedRequest encryptedRequest) {
    checkNotNull(encryptedRequest);

    byte[] encodedData = cryptoService.decryptRequest(encryptedRequest, byte[].class);
    EncryptionInfo encryptionInfo = cryptoService.getEncryptionInfoForDefaultClient();
    URI destinationURI = router.getDestination().getUri();
    EncryptedObject payloadOfResponse = serviceClient.handle(destinationURI, encodedData,
      encryptionInfo);

    return cryptoService.encryptResponse(payloadOfResponse);
  }
}

This is shorter, it eliminates the implicit data coupling between the various trivial methods and it limits the variable scopes to the minimum required. Yet despite these benefits I still cannot seem to convince the original developer that this refactoring is warranted, as it appears to contradict the practices of Uncle Bob mentioned above.

Hence my questions: What is the objective, scientific rationale to favor local variables over instance variables? I can't quite seem to put my finger on it. My intuition tells me that hidden couplings are bad and that a narrow scope is better than a broad one. But what is the science to back this up?

And conversely, are there any downsides to this refactoring that I have possibly overlooked?

Alexander
  • 1,083

12 Answers12

178

What is the objective, scientific rationale to favor local variables over instance variables?

Scope isn't a binary state, it's a gradient. You can rank these from largest to smallest:

Global > Class > Local (method) > Local (code block, e.g. if, for, ...)

Edit: what I call a "class scope" is what you mean by "instance variable". To my knowledge, those are synonymous, but I'm a C# dev, not a Java dev. For the sake of brevity, I've lumped all statics into the global category since statics are not the topic of the question.

The smaller the scope, the better. The rationale is that variables should live in the smallest scope possible. There are many benefits to this:

  • It forces you to think about the current class' responsibility and helps you stick to SRP.
  • It enables you to not have to avoid global naming conflicts, e.g. if two or more classes have a Name property, you're not forced to prefix them like FooName, BarName, ... Thus keeping your variable names as clean and terse as possible.
  • It declutters the code by limiting the available variables (e.g. for Intellisense) to those that are contextually relevant.
  • It enables some form of access control so your data can't be manipulated by some actor you don't know about (e.g. a different class developed by a colleague).
  • It makes the code more readable as you ensure that the declaration of these variables tries to stay as close to the actual usage of these variables as is possible.
  • Wantonly declaring variables in an overly wide scope is often indicative of a developer who doesn't quite grasp OOP or how to implement it. Seeing overly widely scoped variables acts as a red flag that there's probably something going wrong with the OOP approach (either with the developer in general or the codebase in specific).
  • (Comment by Kevin) Using locals forces you to do things in the right order. In the original (class variable) code, you could wrongly move passRequestToServiceClient() to the top of the method, and it would still compile. With locals, you could only make that mistake if you passed an uninitialized variable, which is hopefully obvious enough that you don't actually do it.

Yet despite these benefits I still cannot seem to convince the original developer that this refactoring is warranted, as it appears to contradict the practices of Uncle Bob mentioned above.

And conversely, are there any downsides to this refactoring that I have possibly overlooked?

The issue here is that your argument for local variables is valid, but you've also made additional changes which are not correct and cause your suggested fix to fail the smell test.

While I understand your "no class variable" suggestion and there's merit to it, you've actually also removed the methods themselves, and that's a whole different ballgame. The methods should have stayed, and instead you should've altered them to return their value rather than store it in a class variable:

private byte[] getEncodedData() {
    return cryptoService.decryptRequest(encryptedRequest, byte[].class);
}

private EncryptionInfo getEncryptionInfo() { return cryptoService.getEncryptionInfoForDefaultClient(); }

// and so on...

I do agree with what you've done in the process method, but you should've been calling the private submethods rather than executing their bodies directly.

public EncryptedResponse process(EncryptedRequest encryptedRequest) {
    checkNotNull(encryptedRequest);
byte[] encodedData = getEncodedData();
EncryptionInfo encryptionInfo = getEncryptionInfo();

//and so on...

return cryptoService.encryptResponse(payloadOfResponse);

}

You'd want that extra layer of abstraction, especially when you run into methods that need to be reused several times. Even if you don't currently reuse your methods, it's still a matter of good practice to already create submethods where relevant, even if only to aid code readability.

Regardless of the local variable argument, I immediately noticed that your suggested fix is considerably less readable than the original. I do concede that wanton use of class variables also detracts from code readability, but not at first sight compared to you having stacked all the logic in a single (now long-winded) method.

Flater
  • 58,824
85

The original code is using member variables like arguments. When he says to minimize the number of arguments, what he really means is to minimize the amount of data that the methods requires in order to function. Putting that data into member variables doesn't improve anything.

Alex
  • 911
  • 5
  • 4
54

Other answers have already explained the benefits of local variables perfectly, so all that remains is this part of your question:

Yet despite these benefits I still cannot seem to convince the original developer that this refactoring is warranted, as it appears to contradict the practices of Uncle Bob mentioned above.

That should be easy. Simply point him to the following quote in Uncle Bob's Clean Code:

Have No Side Effects

Side effects are lies. Your function promises to do one thing, but it also does other hidden things. Sometimes it will make unexpected changes to the variables of its own class. Sometimes it will make them to the parameters passed into the function or to system globals. In either case they are devious and damaging mistruths that often result in strange temporal couplings and order dependencies.

(example omitted)

This side effect creates a temporal coupling. That is, checkPassword can only be called at certain times (in other words, when it is safe to initialize the session). If it is called out of order, session data may be inadvertently lost. Temporal couplings are confusing, especially when hidden as a side effect. If you must have a temporal coupling, you should make it clear in the name of the function. In this case we might rename the function checkPasswordAndInitializeSession, though that certainly violates “Do one thing.”

That is, Uncle Bob doesn't just say that a function should take few arguments, he also says that functions should avoid interacting with non-local state whenever possible.

meriton
  • 4,338
27

"It contradicts what someone's uncle thinks" is NEVER a good argument. NEVER. Don't take wisdom from uncles, think for yourself.

That said, instance variables should be used to store information that is actually required to be stored permanently or semi-permanently. The information here isn't. It is very simple to live without the instance variables, so they can go.

Test: Write a documentation comment for each instance variable. Can you write anything that isn't completely pointless? And write a documentation comment to the four accessors. They are equally pointless.

Worst is, assume the way to decrypt changes, because you use a different cryptoService. Instead of having to change four lines of code, you have to replace four instance variables with different ones, four getters with different ones, and change four lines of code.

But of course the first version is preferable if you are paid by the line of code. 31 lines instead of 11 lines. Three times more lines to write, and to maintain forever, to read when you are debugging something, to adapt when changes are needed, to duplicate if you support a second cryptoService.

(Missed the important point that using local variables forces you to make the calls in the right order).

gnasher729
  • 49,096
16

What is the objective, scientific rationale to favor local variables over instance variables? I can't quite seem to put my finger on it. My intuition tells me that hidden couplings are bad and that a narrow scope is better than a broad one. But what is the science to back this up?

Instance variables are for representing the properties of their host object, not for representing properties specific to threads of computation more narrowly scoped than the object itself. Some of the reasons for drawing such a distinction that appear not to already have been covered revolve around concurrency and reentrancy. If methods exchange data by setting the values of instance variables, then two concurrent threads can easily clobber each other's values for those instances variables, yielding intermittent, hard-to-find bugs.

Even one thread alone can encounter problems along those lines, because there is a high risk that a data exchange pattern relying on instance variables makes methods non-reentrant. Similarly, if the same variables are used to convey data between different pairs of methods, then there is a risk that a single thread executing even a non-recursive chain of method invocations will run into bugs revolving around unexpected modifications of the instance variables involved.

In order to get reliably correct results in such a scenario, you need either to use separate variables to communicate between each pair of methods where one invokes the other, or else to have every method implementation take into account all the implementation details of all the other methods it invokes, whether directly or indirectly. This is brittle, and it scales poorly.

9

Discussing just process(...), your colleagues example is far more legible in the sense of business logic. Conversely your counter example takes more than a cursory glance to extract any meaning.

That being said, clean code is both legible and good quality - pushing local state out into a more global space is just high-level assembly, so a zero for quality.

public class SomeBusinessProcess {
  @Inject private Router router;
  @Inject private ServiceClient serviceClient;
  @Inject private CryptoService cryptoService;

  public EncryptedResponse process(EncryptedRequest request) {
    checkNotNull(encryptedRequest);

    return encryptResponse
      (routeTo
         ( destination()
         , requestData(request)
         , destinationEncryption()
         )
      );
  }

  private byte[] requestData(EncryptedRequest encryptedRequest) {
    return cryptoService.decryptRequest(encryptedRequest, byte[].class);
  }

  private EncryptionInfo destinationEncryption() {
    return cryptoService.getEncryptionInfoForDefaultClient();
  }

  private URI destination() {
    return router.getDestination().getUri();
  }

  private EncryptedObject routeTo(URI destinationURI, byte[] encodedData, EncryptionInfo encryptionInfo) {
    return serviceClient.handle(destinationURI, encodedData, encryptionInfo);
  }

  private void encryptResponse(EncryptedObject payloadOfResponse) {
    return cryptoService.encryptResponse(payloadOfResponse);
  }
}

This is a rendition that removes the need for variables at any scope. Yes the compiler will generate them but the important part is that it controls that so the code will be efficient. While also being relatively legible.

Just a point on naming. You want the shortest name that is meaningful and expands on the information already available. ie. destinationURI, the 'URI' is already known by the type signature.

Kain0_0
  • 16,561
7

I would just remove these variables and private methods altogether. Here's my refactor:

public class SomeBusinessProcess {
  @Inject private Router router;
  @Inject private ServiceClient serviceClient;
  @Inject private CryptoService cryptoService;

  public EncryptedResponse process(EncryptedRequest encryptedRequest) {
    return cryptoService.encryptResponse(
        serviceClient.handle(router.getDestination().getUri(),
        cryptoService.decryptRequest(encryptedRequest, byte[].class),
        cryptoService.getEncryptionInfoForDefaultClient()));
  }
}

For private method, e.g. router.getDestination().getUri() is clearer and more readable than getDestinationURI(). I would even just repeat that if I use the same line twice in the same class. To look at it another way, if there's a need for a getDestinationURI(), then it probably belongs in some other class, not in SomeBusinessProcess class.

For variables and properties, the common need for them is to hold values to be used later in time. If the class has no public interface for the properties, they probably shouldn't be properties. The worst kind of class properties use is probably for passing values between private methods by way of side effects.

Anyway, the class only needs to do process() and then the object will be thrown away, there's no need to keep any state in memory. Further refactor potential would be to take out the CryptoService out of that class.

Based on comments, I want to add this answer is based on real world practice. Indeed, in code review, the first thing that I'd pick out is to refactor the class and move out the encrypt/decrypt work. Once that's done, then I'd ask if the methods and variables are needed, are they named correctly and so on. The final code will probably closer to this:

public class SomeBusinessProcess {
  @Inject private Router router;
  @Inject private ServiceClient serviceClient;

  public Response process(Request request) {
    return serviceClient.handle(router.getDestination().getUri());
  }
}

With above code, I don't think it needs further refactor. As with the rules, I think it takes experience to know when and when not to apply them. Rules are not theories that are proven to work in all situations.

Code review on the other hand has real impact on how long before a piece of code can pass. My trick is to have less code and make it easy to understand. A variable name can be a point of discussion, if I can remove it reviewers wouldn't even need to think about it.

imel96
  • 3,608
4

Flater's answer covers issues of scoping quite well, but I think that there's another issue here too.

Note that there's a difference between a function that processes data and a function that simply accesses data.

The former executes actual business logic, whereas the latter saves typing and perhaps adds safety by adding a simpler and more reusable interface.

In this case it seems that the data-access functions don't save typing, and aren't reused anywhere (or there would be other issues in removing them). So these functions simply shouldn't exist.

By keeping only the business logic in named functions, we get the best of both worlds (somewhere between Flater's answer and imel96's answer):

public EncryptedResponse process(EncryptedRequest encryptedRequest) {

    byte[] requestData = decryptRequest(encryptedRequest);
    EncryptedObject responseData = handleRequest(router.getDestination().getUri(), requestData, cryptoService.getEncryptionInfoForDefaultClient());
    EncryptedResponse response = encryptResponse(responseData);

    return response;
}

// define: decryptRequest(), handleRequest(), encryptResponse()
2

The first and most important thing: Uncle Bob seems to be like a preacher sometimes, but states that there are exceptions to his rules.

The whole idea of Clean Code is to improve readability and to avoid errors. There are several rules that are violating each other.

His argument on functions is that niladic functions are best, however that up to three parameters are acceptable. I personally think that 4 are also ok.

When instance variables are used, they should make a coherent class. That means, the variables should be used in many, if not all non-static methods.

Variables that are not used in many places of the class, should be moved.

I would neither consider the original nor the refactored version optimal, and @Flater stated already very well what can be done with return values. It improves readability and reduces errors to use return values.

kap
  • 121
  • 3
1

Local variables reduce scope hence limits the ways in which the variables can be used and hence helps prevent certain classes of error, and improves readability.

Instance variable reduce the ways in which the function can be called which also helps reduce certain classes of error, and improves readability.

To say one is right and the other is wrong may well be a valid conclusion in any one particular case, but as general advice...

TL;DR: I think the reason you smell too much zeal is, there's too much zeal.

drjpizzle
  • 264
0

Despite the fact that methods starting with get... should not return void, the separation of levels of abstractions within the methods is given in the first solution. Although the second solution is more scoped it is still harder to reason about what is going on in the method. The assignments of local variables are not needed here. I would keep the method names and refactor the code to something like that:

public class SomeBusinessProcess {
  @Inject private Router router;
  @Inject private ServiceClient serviceClient;
  @Inject private CryptoService cryptoService;

  public EncryptedResponse process(EncryptedRequest encryptedRequest) {
    checkNotNull(encryptedRequest);

    return getEncryptedResponse(
            passRequestToServiceClient(getDestinationURI(), getEncodedData(encryptedRequest) getEncryptionInfo())
        );
  }

  private EncryptedResponse getEncryptedResponse(EncryptedObject encryptedObject) {
    return cryptoService.encryptResponse(encryptedObject);
  }

  private byte[] getEncodedData(EncryptedRequest encryptedRequest) {
    return cryptoService.decryptRequest(encryptedRequest, byte[].class);
  }

  private EncryptionInfo getEncryptionInfo() {
    return cryptoService.getEncryptionInfoForDefaultClient();
  }

  private URI getDestinationURI() {
    return router.getDestination().getUri();
  }

  private EncryptedObject passRequestToServiceClient(URI destinationURI, byte[] encodedData, EncryptionInfo encryptionInfo) {
    return serviceClient.handle(destinationURI, encodedData, encryptionInfo);
  }
}
0

Both things do the same and the performance differences are unnoticeable, so I don't think there's a scientific argument. It comes down to subjective preferences then.

And I too tend to like your way better than your colleague's. Why? Because I think it's easier to read and understand, despite what some book author says.

Both ways accomplish the same thing, but his way is more spread out. To read that code you need to shift back and forth between several functions and member variables. It's not all condensed in one place, you need to remember all that in your head to understand it. It's a much greater cognitive load.

In contrast, your approach packs everything much more densely, but no so as to make it impenetrable. You just read it line after line, and you don't need to memorize so much to understand it.

However if he's used to code being laid out in such a fashion, I can imagine that for him it might be the other way round.

Vilx-
  • 5,420