14

For those of you who have the good fortune not to work in a language with dynamic scope, let me give you a little refresher on how that works. Imagine a pseudo-language, called "RUBELLA", that behaves like this:

function foo() {
    print(x); // not defined locally => uses whatever value `x` has in the calling context
    y = "tetanus";
}
function bar() {
    x = "measles";
    foo();
    print(y); // not defined locally, but set by the call to `foo()`
}
bar(); // prints "measles" followed by "tetanus"

That is, variables propagate up and down the call stack freely - all variables defined in foo are visible to (and mutable by) its caller bar, and the reverse is also true. This has serious implications for code refactorability. Imagine that you have the following code:

function a() { // defined in file A
    x = "qux";
    b();
}
function b() { // defined in file B
    c();
}
function c() { // defined in file C
    print(x);
}

Now, calls to a() will print qux. But then, someday, you decide that you need to change b a little bit. You don't know all the calling contexts (some of which may in fact be outside your codebase), but that should be alright - your changes are going to be completely internal to b, right? So you rewrite it like this:

function b() {
    x = "oops";
    c();
}

And you might think that you haven't changed anything, since you've just defined a local variable. But, in fact, you've broken a! Now, a prints oops rather than qux.
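
For readers who want to poke at this, here is a rough Python emulation of the a/b/c example. It is only a sketch: a single shared dict named `env` stands in for the dynamically scoped symbol table, and `printed`, `b_original` and `b_rewritten` are names made up for illustration, not anything RUBELLA or MUMPS provides.

```python
env = {}      # one shared symbol table, standing in for dynamic scope (assumption)
printed = []  # records what print(x) would have emitted

def c():
    # c never receives x; it reads whatever is currently in the shared table
    printed.append(env.get("x", ""))

def b_original():
    c()

def b_rewritten():
    env["x"] = "oops"  # looks like a local, but overwrites the caller's x
    c()

def a(b):
    env["x"] = "qux"
    b()

a(b_original)   # behaves as a() did before the change
a(b_rewritten)  # same caller, silently different result
print(printed)  # ['qux', 'oops']
```

Nothing in `a` or `c` changed between the two calls, which is exactly why this kind of breakage is hard to spot in review.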


Bringing this back out of the realm of pseudo-languages, this is exactly how MUMPS behaves, albeit with different syntax.

Modern ("modern") versions of MUMPS include the so-called NEW statement, which allows you to prevent variables from leaking from a callee to a caller. So in the first example above, if we had done NEW y = "tetanus" in foo(), then print(y) in bar() would print nothing (in MUMPS, all names point to the empty string unless explicitly set to something else). But there is nothing that can prevent variables from leaking from a caller to a callee: if we have function p() { NEW x = 3; q(); print(x); }, for all we know, q() could mutate x, despite not explicitly receiving x as a parameter. This is still a bad situation to be in, but not as bad as it probably used to be.
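
To make the asymmetry concrete, here is a hedged Python sketch of NEW-like behaviour. The `new` context manager is a hypothetical helper that hides the named variables on entry and restores the caller's values on exit; it approximates the description above and is not a faithful model of any MUMPS implementation. As in the earlier description, unset names read as the empty string.

```python
from contextlib import contextmanager

env = {}             # shared symbol table standing in for dynamic scope (assumption)
_MISSING = object()  # marks names that were unset before the NEW

@contextmanager
def new(*names):
    """Hide the caller's values for `names` until the block exits (hypothetical helper)."""
    saved = {n: env.pop(n, _MISSING) for n in names}
    try:
        yield
    finally:
        for n, v in saved.items():
            if v is _MISSING:
                env.pop(n, None)  # name did not exist before, so remove it again
            else:
                env[n] = v        # put the caller's value back

def foo():
    with new("y"):
        env["y"] = "tetanus"  # confined to foo(); does not leak to the caller

def q():
    env["x"] = 99  # callee mutates the caller's x; NEW in p() cannot stop this

def p():
    with new("x"):
        env["x"] = 3
        q()
        return env["x"]

foo()
leaked_y = env.get("y", "")  # "" because foo() protected y
x_seen_by_p = p()            # 99, not 3
```

So NEW protects the caller from the callee's leftovers, but it does nothing to protect `p` from what `q` does to `x` while `p` is still running.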

With these dangers in mind, how can we safely refactor code in MUMPS or any other language with dynamic scoping?

There are some obvious good practices for making refactoring easier, like never using variables in a function other than those you initialize (NEW) yourself or receive as explicit parameters, and explicitly documenting any parameters that are implicitly passed from a function's callers. But in a decades-old, ~10^8-LOC codebase, these are luxuries one often does not have.

And, of course, essentially all good practices for refactoring in languages with lexical scope are also applicable in languages with dynamic scope - write tests, and so forth. The question, then, is this: how do we mitigate the risks specifically associated with the increased fragility of dynamically-scoped code when refactoring?

(Note that while "How do you navigate and refactor code written in a dynamic language?" has a similar title to this question, it is wholly unrelated.)

senshin
  • 334

3 Answers

5

Wow.

I do not know MUMPS as a language, so I do not know whether my comment applies here. Generally speaking, you must refactor from the inside out. The consumers (readers) of global state (global variables) must be refactored into methods/functions/procedures that take parameters. The method c should look like this after refactoring:

function c(c_scope_x) {
   print(c_scope_x); // reads only its parameter, not the caller's x
}

All usages of c must then be rewritten (which is a mechanical task) into

c(x)

This isolates the "inner" code from the global state by using local state. When you are done with that, you will have to rewrite b into:

function b() {
   x = "oops";
   c(x); // pass x explicitly; c does the printing
}

The x="oops" assignment is there to keep the side effects. Now we must consider b as polluting the global state. If you only have one polluted element, consider this refactoring:

function b() {
   x = "oops";
   c(x);
   return x; // expose the side effect as a return value
}

and rewrite each use of b as x=b(). When doing this refactoring, function b must use only methods that have already been cleaned up (you may want to rename c to make that clear). After that, you should refactor b so that it no longer pollutes the global environment.

function b() {
   newvardefinition b_scoped_x = "oops";
   c_cleaned(b_scoped_x);
   return b_scoped_x;
}

Rename b to b_cleaned. I guess you will have to play with that a bit to get accustomed to this refactoring. Sure, not every method can be refactored this way, but you will have to start from the inner parts. Try it with Eclipse and Java (extract method) and "global state", a.k.a. class members, to get an idea.

function x() {
  fifth_to_refactor();
  {
    fourth_to_refactor()
    ....
    {
      second_to_refactor();
    }
    ...
    third_to_refactor();
  }
  first_to_refactor()
}

hth.

Question: With these dangers in mind, how can we safely refactor code in MUMPS or any other language with dynamic scoping?

  • Maybe someone else can give a hint.

Question: How do we mitigate the risks specifically associated with the increased fragility of dynamically-scoped code when refactoring?

  • Write a program that does the safe refactorings for you.
  • Write a program that identifies safe candidates / first candidates.
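
A very rough idea of what such a candidate finder could look like, written in Python against the toy syntax from the question. Everything here is an assumption made for illustration: the `implicit_variables` name, the regexes, the restriction to function bodies without nested braces, and the extra function `d` in the sample. A real tool would need a proper parser for the actual language.

```python
import re

# Matches `function name(params) { body }`; assumes no nested braces in the body.
FUNC = re.compile(r"function\s+(\w+)\s*\(([^)]*)\)\s*\{([^}]*)\}")

def implicit_variables(source):
    """Map each function name to the variables it touches without owning them."""
    source = re.sub(r"//[^\n]*", "", source)   # drop line comments
    source = re.sub(r'"[^"]*"', '""', source)  # blank out string literals
    report = {}
    for name, params, body in FUNC.findall(source):
        owned = {p.strip() for p in params.split(",") if p.strip()}
        owned |= set(re.findall(r"\bNEW\s+(\w+)", body))  # variables declared with NEW
        calls = set(re.findall(r"\b(\w+)\s*\(", body))    # names used as function calls
        used = set(re.findall(r"\b[A-Za-z_]\w*\b", body))
        report[name] = sorted(used - owned - calls - {"NEW"})
    return report

sample = '''
function a() { x = "qux"; b(); }
function b() { c(); }
function c() { print(x); }
function d(v) { NEW t = v; print(t); }
'''
report = implicit_variables(sample)
print(report)  # {'a': ['x'], 'b': [], 'c': ['x'], 'd': []}
```

Functions with a non-empty list are the ones that need explicit parameters first. Note that this only looks at each function in isolation: b shows up as clean even though it calls c, so the call graph would still have to be walked separately.
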
thepacker
  • 893
3

I guess your best shot is to bring the full code base under your control and make sure you have an overview of the modules and their dependencies.

So at least you have a chance of doing global searches, and have a chance to add regression tests for the parts of the system where you expect an impact by a code change.

If you do not see a chance to accomplish the first, my best advice is: do not refactor any module that is reused by other modules, or for which you cannot rule out that others rely on it. In any codebase of a reasonable size, the chances are high that you can find modules on which no other module depends. So if you have a module A depending on B, but not vice versa, and no other module depends on A, then even in a dynamically scoped language you can make changes to A without breaking B or any other module.

This gives you a chance to replace the dependency of A on B with a dependency of A on B2, where B2 is a sanitized, rewritten version of B. B2 should be newly written with the rules you mentioned above in mind, to make the code more evolvable and easier to refactor.
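
Once you have the dependency overview, finding the modules on which no other module depends is simple. A minimal Python sketch, assuming you have already extracted a mapping from each module to the modules it depends on (the `unreferenced_modules` name and the sample data are made up for illustration):

```python
def unreferenced_modules(depends_on):
    """Return the modules that no other module depends on."""
    referenced = set()
    for module, deps in depends_on.items():
        referenced |= set(deps) - {module}  # ignore self-references
    return sorted(set(depends_on) - referenced)

# Hypothetical dependency overview: A uses B, B uses C, D uses C.
deps = {"A": {"B"}, "B": {"C"}, "C": set(), "D": {"C"}}
candidates = unreferenced_modules(deps)
print(candidates)  # ['A', 'D']
```

This is only as good as the dependency data. With dynamic scope, an implicit variable handed down the call stack is a dependency too, so the mapping has to include those, not just explicit calls.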

Doc Brown
  • 218,378
1

To state the obvious: How to do refactoring here? Proceed very carefully.

(As you've described it, developing and maintaining the existing code base should be difficult enough, let alone attempting to refactor it.)

I believe I would retroactively apply a test-driven approach here. This would involve writing a suite of tests to ensure the current functionality keeps working as you start refactoring, at first only to make the code easier to test. (Yes, I expect a chicken-and-egg problem here, unless your code is already modular enough to test without changing it at all.)

Then you can proceed with other refactoring, checking that you haven't broken any tests as you go.

Finally, you can start writing tests that expect new functionality and then write the code to make those tests work.
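
The first step, pinning down current behaviour before touching anything, might look roughly like this in Python. It is a sketch under assumptions: the `legacy_*` functions and the shared `env` dict merely imitate the a/b/c example from the question, and `run_and_capture` is a hypothetical helper. The point is that the test records the variables a routine leaves behind as well as its output, since with dynamic scope both are part of its observable behaviour.

```python
import io
import unittest
from contextlib import redirect_stdout

env = {}  # shared symbol table standing in for dynamic scope (assumption)

def legacy_c():
    print(env.get("x", ""))

def legacy_b():
    legacy_c()

def legacy_a():
    env["x"] = "qux"
    legacy_b()

def run_and_capture(routine):
    """Run `routine` from a clean symbol table; return (stdout text, leftover variables)."""
    env.clear()
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        routine()
    return buffer.getvalue(), dict(env)

class CharacterizeA(unittest.TestCase):
    def test_output_and_leaked_state(self):
        printed, leaked = run_and_capture(legacy_a)
        self.assertEqual(printed, "qux\n")       # what a() prints today
        self.assertEqual(leaked, {"x": "qux"})   # what a() leaves behind today
```

Run it with `python -m unittest` before and after each change to b. A rewrite that sets x inside b would fail both assertions.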

Mark Hurd
  • 343