1

I don't dislike global state, but that could be due to my lack of experience. I was thinking about what the usual implementation of global state is:

A big variable where data flows in an inconsistent, unpredictable and, most importantly, non-standardized way across all CRUD operations.

An implementation like global $steps; where you would always do global $steps; $steps['insert']['key_name']['and_value']; to get a value is, in my opinion, why global state is hated. I also believe it is partly because people have become a bit too accustomed to objects, for no good reason.

I'll naively try to show how these problems could be overcome, to support my question.

The problem with polluting the global state with all kinds of data inside a variable is that this approach has no consistent way to perform CRUD, nor does it have rules. Anything can happen: whereas objects must respect interfaces, return types and so on, a variable can be... anything. It's extremely hard to predict what goes in and out or, worse yet, in what form.

But what if that global state was accessed in well-thought-out, documented & clear ways, so that it's always predictable?

Take my cache object example, which serves as a temporary cache per PHP request:

class Cache
{
    /**
     * Holds all of our data in a key=>value manner.
     *
     * @var array
     */
    private $data = [];

    /**
     * Adds data to a key.
     *
     * @param string $key The key, used as an identifier.
     * @param mixed $data The data that we're adding to that identifier.
     */
    public function addData( $key, $data )
    {
        //Should perform integrity checks, etc.
        $this->data[$key][] = $data;
    }

    /**
     * Retrieves data based on a key.
     *
     * @param string $key The key, used as an identifier.
     * @return mixed
     */
    public function getData( $key )
    {
        return $this->data[$key];
    }

    /**
     * Changes data, where data is the value of a key in the big array.
     *
     * @param string $key The key, used as an identifier.
     * @param mixed $data The new data that replaces the old value.
     * @return bool True if the data was changed, false if the checks failed.
     */
    public function changeData( $key, $data )
    {
        if( checkIntegrityAndOtherStuff( $this->data[$key] ) ) {
            $this->data[$key] = $data;
            return true;
        }

        //If we failed our checks
        return false;
    }
}

It's a very minimal implementation, but, assuming we had strong checks & rules in place (I will come back in a bit to this extremely vital point, which I think could be the Achilles' heel of it all), then we have a predictable system that we can refer to:

global $cache;
$cache = new Cache;

that we can always rely on to work in just one specific way and nothing else:

$cache->addData( 'user_list', [['name' => 'John'], ['name' => 'Jen']] );
$cache->getData( 'user_list' );

We have all the clear signs of a good implementation: good naming, predictability, testability, conciseness and, most importantly, ease of use.

So what is wrong here?

The possible Achilles' heel. The one thing that defeats it all could be the fact that you cannot impose any type of contract / pattern on the data being added to these keys. With objects you can set return types / interfaces and know what to expect when you retrieve something. Here you can't: the developer must know beforehand what he's getting, otherwise he's in the dark. The other, worse side of it is that anyone, even if well-intentioned, can change the data contents (and therefore the structure) without any consequences, rendering code that relies on it unusable.

If we had things such as "data contracts" that were bound to, and required for, the data we add (for retrieval later on), then no one could add the wrong data type / structure or retrieve it, creating a perfectly predictable, well-structured & orderly environment that everyone can benefit from and access.

It might look something like this:

public function addData( $key, $data, DataScheme $data_scheme )
{
    if( dataDoesNotRespectScheme( $data, $data_scheme ) ) {
        //Break, do not allow it.
        return false;
    }

    //Keep the scheme next to the data so later changes can be checked against it.
    $this->data[$key] = ['data' => $data, 'scheme' => $data_scheme];
    return true;
}

public function changeData( $key, $new_data )
{
    if( dataDoesNotRespectScheme( $new_data, $this->data[$key]['scheme'] ) ) {
        //Fail.
        return false;
    }

    //Store the new data, which is 100% identical in scheme to the old one.
    $this->data[$key]['data'] = $new_data;
    return true;
}

As such, the developer only has to know about the data structure, but he's 100% guaranteed to get it. In essence, this creates a data interface, which means that, no matter what, code relying on retrieving this saved data cannot fail.
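To make the idea concrete, here is a minimal sketch of what the DataScheme class and the dataDoesNotRespectScheme() helper used above might look like. Both are hypothetical: the constructor, the "field name => type" map and the rule that data is a list of rows are assumptions made purely for illustration, not part of any real library.

```php
<?php
// Hypothetical sketch: DataScheme and its rules are illustrative assumptions.
final class DataScheme
{
    /** @var array Map of required field name => expected gettype() name. */
    private $structure;

    public function __construct(array $structure)
    {
        $this->structure = $structure;
    }

    public function getStructure(): array
    {
        return $this->structure;
    }
}

/**
 * Returns true when $data is NOT a list of rows that all match the scheme.
 */
function dataDoesNotRespectScheme($data, DataScheme $data_scheme): bool
{
    if (!is_array($data)) {
        return true;
    }

    foreach ($data as $row) {
        if (!is_array($row)) {
            return true;
        }
        foreach ($data_scheme->getStructure() as $field => $type) {
            // gettype() reports names such as "string", "integer", "array".
            if (!array_key_exists($field, $row) || gettype($row[$field]) !== $type) {
                return true;
            }
        }
    }

    return false;
}

$user_scheme = new DataScheme(['name' => 'string']);

var_dump(dataDoesNotRespectScheme([['name' => 'John'], ['name' => 'Jen']], $user_scheme)); // bool(false)
var_dump(dataDoesNotRespectScheme([['name' => 42]], $user_scheme));                         // bool(true)
var_dump(dataDoesNotRespectScheme([['id' => 1]], $user_scheme));                            // bool(true)
```

Note that this only checks the shape of the data at the moment it is stored; it says nothing about who stores it or when.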

Is this what the global state must overcome to be accepted?

If not, what would be a better solution to this? Or is the original intent of "I want to access data everywhere" a bad way of doing things?

coolpasta
  • 657

4 Answers

15

Is this what the global state must overcome to be accepted?

No. Adding contracts to what goes into and out of global state is as easy as using a language with real types. There are lots of those and global state is still evil there.

Things you're missing:

  • Who fubar'd my data?!? - the biggest, most obvious problem with global mutable state is that anyone can change it. What happens when a bug pops up because the contents of the data aren't what you expected? Literally any part of your program could be the culprit.
  • I want to reuse this thing. - Oops, you can't. It relies on global mutable state along with the entire rest of your program. That broad coupling tends to make things less modular and encourages people to add functionality to your God Object.
  • Why is this thing slow?!? - Less of a problem in PHP, but global mutable state is really, really unfriendly to concurrency. That limits the scalability, and usually the performance, of that code. And since almost every unit test framework will parallelize test runs, your global mutable state will probably break that too.

(along with a few smaller things)
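
As a rough illustration of the first point, the sketch below uses a cache that does enforce a contract on its data, and still ends up with a surprise. ValidatedCache, renderUserList() and unrelatedCleanup() are hypothetical names invented for this example.

```php
<?php
// Hypothetical sketch; all class and function names here are illustrative.
final class ValidatedCache
{
    private $data = [];

    public function set(string $key, array $rows): void
    {
        // The "contract": every row must carry a string name.
        foreach ($rows as $row) {
            if (!is_array($row) || !isset($row['name']) || !is_string($row['name'])) {
                throw new InvalidArgumentException('Row does not respect the contract.');
            }
        }
        $this->data[$key] = $rows;
    }

    public function get(string $key): array
    {
        return $this->data[$key] ?? [];
    }
}

// A no-op at file scope, but keeps the sketch working if the file is
// included from inside a function.
global $cache;
$cache = new ValidatedCache();

function renderUserList(): string
{
    global $cache;
    return implode(', ', array_column($cache->get('user_list'), 'name'));
}

function unrelatedCleanup(): void
{
    global $cache;
    // Perfectly valid according to the contract, yet it silently changes
    // what renderUserList() produces.
    $cache->set('user_list', []);
}

$cache->set('user_list', [['name' => 'John'], ['name' => 'Jen']]);
echo renderUserList(), "\n"; // John, Jen

unrelatedCleanup();
echo renderUserList(), "\n"; // prints an empty line
```

The contract was never violated, and the output still changed. Finding out why means searching every function that can reach $cache, which is all of them.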

Telastyn
  • 110,259
3

Global state is not evil. Global state hates being anthropomorphised, so don’t do it.

The problem is not how global state is used. The problem is how it can be used. If I see code changing global state, that’s fine because I know it is changed. If I don’t see changes to global state, that’s the problem because I have no idea whether it’s unchanged, or whether I missed the one line of code changing it. Or all of the 200 lines changing it.

gnasher729
  • 49,096
1

Your question actually concerns two different things:

  1. Global variables

  2. An untyped key-value store versus a strongly typed repository with input validation, consistency and access control.

Your question is whether global variables are considered evil because they are typically used for untyped and unconstrained data.

The answer is no. Global variables are considered bad due to them being global - i.e. directly accessible from anywhere in the program. This creates a tight coupling between all parts of the program which defeats all other architectural constraints like layering and encapsulation.

A global variable can be anything, e.g. just a boolean flag. The problem is when this flag can be set in some component and then affect behavior in some distant, seemingly unrelated component.

Of course, using a strongly typed repository has many advantages compared to an untyped store. This is just independent of the question of whether global variables are bad.
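
A minimal sketch of that situation, with hypothetical names ($skip_tax, importLegacyOrders() and invoiceTotal() are invented for illustration, as is the 20% tax rate):

```php
<?php
// Hypothetical sketch; the flag, both functions and the tax rate are illustrative.
// "global" is a no-op at file scope, but keeps the sketch working if the file
// is included from inside a function.
global $skip_tax;
$skip_tax = false;

function importLegacyOrders(): void
{
    global $skip_tax;
    // Meant for this import only, but nothing ever resets it.
    $skip_tax = true;
}

function invoiceTotal(int $net): int
{
    global $skip_tax;
    // Adds an assumed 20% tax unless the flag says otherwise.
    return $skip_tax ? $net : $net + intdiv($net * 20, 100);
}

echo invoiceTotal(100), "\n"; // 120
importLegacyOrders();
echo invoiceTotal(100), "\n"; // 100
```

The flag is fully typed, and nothing in invoiceTotal() hints that an import routine elsewhere can change its result.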

JacquesB
  • 61,955
  • 21
  • 135
  • 189
1

The problem with global state isn’t type safety or inconsistency; the problem with global state is that it is present globally. Every function or procedure has access to every bit of global state and, if it’s mutable, can change it.

The thing is, a single global used in a couple of places isn’t a big deal: while the potential scope is large, as a practical matter you can find all usages and then mentally scope it to that. But it doesn’t scale. When you have hundreds or thousands of them, and they are in turn used anywhere from a couple of places to thousands of different places, your chances of knowing where all of those places are without looking them up are extremely low. Even knowing where all of them are after looking them up may be impossible.

So, it may very well be that the best solution to a particular problem is to have a global, but every new global you add makes the others harder to use, harder to remember and harder to reason about. Globals aren’t evil, they are costly, they use up one of the most limited resources you have - human memory space.

I’ve called them costly before, but let’s put some actual numbers to it. If you had to pay even a single hour’s wage to every developer on the team each time a global was added to the system or an existing global was assigned a new value, how many globals do you think you’d have, and how many times would you let them be assigned a new value? And IMO a single hour’s wage would be cheap: I’ve dealt with lots of globals and spent much more than an hour tracking down the problem caused by a single global with an unexpected value.

In short, accessing everywhere may occasionally be necessary, but accessing everything anywhere is never necessary. Be frugal with what you decide needs to be accessible from anywhere.

jmoreno
  • 11,238