26

I work in a data warehouse that sources multiple systems via many streams and layers, with maze-like dependencies linking the various artifacts. Pretty much every day I run into situations like this: I run something, it doesn't work, and I go through loads of code. Hours later I realise that I've only managed to map out a tiny portion of what, as I learn later in the day, is actually required. So I ask someone, and they tell me that some other stream has to be run first, and that if I had checked here (indicating some seemingly arbitrary portion of an enormous stack of other coded dependencies) I would have seen this. It's incredibly frustrating.

I'd like to suggest to the team that we do more to make the dependencies between objects visible and obvious, rather than embedding them deep in recursive levels of code, or even in data that has to be present because another stream populates it. If I could back that suggestion with a well-known, tried and tested software paradigm, I might be able to make my job and everyone else's a lot simpler.

It's kind of difficult to explain the benefits of this to my team. They tend to accept things the way they are and do not 'think big' about the benefits of being able to conceptualise the entire system in a new way. They don't really see that if you can model a huge system efficiently, you are less likely to encounter memory inefficiencies, stream-stopping unique constraints and duplicate keys, and nonsense data, because it's much easier to design in keeping with the original vision. You then avoid the problems we are now experiencing, which I know from past jobs to be unusual, but which they seem to think of as inevitable.

So, does anyone know of a software paradigm that emphasises dependencies and also promotes a common conceptual model of a system with a view to ensuring long term adherence to an ideal? At the moment we pretty much have a giant mess and the solution every sprint seems to be "just add on this thing here, and here and here" and I'm the only one that's concerned that things are really beginning to fall apart.

8 Answers

19

Discoverability

Its absence plagues many organizations. Where is that tool that Fred built again? In the Git repository, sure. Where?

The software pattern that comes to mind is Model-View-ViewModel. To the uninitiated, this pattern is a complete mystery. I explained it to my wife as "five widgets floating above the table talking to each other via some mysterious force." Understand the pattern, and you understand the software.

Many software systems fail to document their architecture because they assume that it is self-explanatory, or emerges naturally from the code. It isn't, and it doesn't. Unless you're using a well-defined architecture, new people will get lost. If it's not documented (or well-known), new people will get lost. And veterans will get lost too, once they've been away from the code for a few months.

It is the team's responsibility to come up with a sensible organizational architecture and document it. This includes things like

  • Folder organization
  • Project references
  • Class documentation (what it is, what it does, why it exists, how it is used)
  • Project, module, assembly, whatever documentation.

It is the team's responsibility to make things organized and discoverable so that the team does not constantly reinvent the wheel.

By the way, the notion that "code should be self-documenting" is only partially correct. While it is true that your code should be clear enough so that you don't have to explain every line of code with a comment, the relationships between artifacts like classes, projects, assemblies, interfaces and the like are non-obvious, and still need to be documented.

Robert Harvey
  • 200,592
10

The best way to approach these sorts of problems is incrementally. Don't get frustrated and propose wide, sweeping architectural changes. Those will never get approved, and the code will never improve. That's assuming you can even determine the correct wide, sweeping architectural changes to make, which is unlikely.

What is likely is that you could determine a smaller change that would have helped you with the specific problem you just solved. Maybe inverting some dependencies, adding some documentation, creating an interface, writing a script that warns of a missing dependency, etc. So propose that smaller change instead. Even better, depending on your company culture, they may tolerate or even expect you to make improvements like that as part of your original task.
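As one possible shape for the "script that warns of a missing dependency" idea, here is a minimal Python sketch. The stream names, the `DEPENDS_ON` map and the `completed` set are all hypothetical; in a real warehouse the completed set would come from whatever run log or control table already exists.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical map: each stream lists the streams that must run before it.
DEPENDS_ON = {
    "load_customers": set(),
    "load_orders": {"load_customers"},
    "build_sales_mart": {"load_orders", "load_customers"},
}


def missing_dependencies(stream, completed):
    """Return prerequisites of `stream` that have not run yet, in run order."""
    # Walk the declared map to collect direct and indirect prerequisites.
    needed, stack = set(), [stream]
    while stack:
        for dep in DEPENDS_ON.get(stack.pop(), set()):
            if dep not in needed:
                needed.add(dep)
                stack.append(dep)
    # Sort prerequisites so the warning doubles as a run plan.
    graph = {s: DEPENDS_ON.get(s, set()) & needed for s in needed}
    order = TopologicalSorter(graph).static_order()
    return [s for s in order if s not in completed]


if __name__ == "__main__":
    todo = missing_dependencies("build_sales_mart", completed={"load_customers"})
    if todo:
        print("Run these streams first:", ", ".join(todo))
```

The point of the sketch is less the code than the single declared map: once dependencies live in one readable place, the "you should have checked here" knowledge stops being tribal.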

When you make these smaller changes a regular part of your work, and by your example encourage others to do so as well, they really add up over time. Much more effective than whining about single larger changes you aren't allowed to make.

Karl Bielefeldt
  • 148,830
2

Architecture.

There is no single, specific, universal principle or practice that solves the discoverability and maintainability problems which applies to all aspects of all software. But, the broad term for the stuff that makes a project sane is architecture.

Your architecture is the whole body of decisions around each point of potential (or historical) confusion -- including the designation of how architectural decisions are made and documented. Everything pertaining to development process, folder structure, code quality, design patterns, and so forth are all things that might go into your architecture, but not one of them is an architecture.

Ideally, those rules are unified by a singularity of mind.

A small team can certainly create architecture collaboratively. But, with varying opinions, this can lead quickly to a very schizophrenic architecture that doesn't serve to maintain your sanity. The simplest way to ensure that your architecture, and the many TLAs and patterns therein, all serve the success of the team with a singularity of mind is to make a single mind responsible for them.

Now, that doesn't necessarily require an "architect" to pontificate. And, while some teams may want an experienced person to just make those decisions, the primary point is that somebody needs to own the architecture, especially as the team grows. Somebody needs to keep their finger on the team's pulse, moderate architectural discussions, document decisions, and monitor decisions and work going forward for compliance with the architecture and its ethos.

I'm not a big fan of any one person making all the decisions; but, identifying an "architect" or "technical product owner" who is responsible for moderating architectural discussions and documenting decisions combats a greater evil: The diffusion of responsibility that leads to no discernible architecture.

svidgen
  • 15,252
1

Welcome to Software Engineering (in both senses) ;) This is a good question, but really there are no easy answers, as I'm sure you are aware. It's really a case of evolving into better practices over time and training people to be more skillful (by definition, most people in the industry are of mediocre competence)...

Software engineering as a discipline suffers from a build-it-first, design-it-as-we-go mentality, partly out of expediency and partly out of necessity. It is just the nature of the beast. And of course hacks get built on hacks over time, as the aforementioned coders quickly put in place functional solutions that resolve the short-term need, often at the cost of introducing technical debt.

The paradigm you need to use is essentially this: get better people, train the people you have well, and emphasize the importance of taking time over planning and architecture. One cannot easily be that "Agile" when working with a monolithic system. It can take considerable planning to put in place even small changes. Getting a great high-level documentation process in place will also help key people get to grips with the code more quickly.

The ideas you could focus on would be (over time, gradually) isolating and refactoring key parts of the system in a way that makes them more modular, decoupled, readable and maintainable. The trick is in working this into existing business requirements, so that the reduction in technical debt can be done simultaneously with delivering visible business value. So the solution is part improving practices and skills and part trying to move more towards long-term architectural thinking, as I can tell you already are.

Note that I have answered this question from a software development methodology perspective rather than a coding technique perspective because really this is a problem that is much bigger than the details of coding or even architectural style. It's really a question of how you plan for change.

1

I like @RobertHarvey's idea of conventions and think they help. I also like @KarlBielefeldt's idea to "document as you go" and know that's essential because that's the only way to keep documentation current. But I think the over-arching idea is that documenting how to find all pieces of your code, build, and deploy them is important!

I recently emailed the maintainer of a significant open source project whose XML configuration for generating code was totally undocumented. I asked, "Where is this XML code generation process documented? Where is the test database setup documented?" and he said, "It's not." It's basically a single-contributor project, and now I know why.

Look, if you're that person and you are reading this, I really appreciate what you are doing. I practically worship the fruits of your labors! But if you spent an hour documenting how your really creative stuff is put together, I might spend a couple days coding new features that could help you. When faced with the brick wall of "lack of documentation isn't a problem," I'm not even going to try.

In a business, a lack of documentation is a huge waste of time and energy. Projects like that often get farmed out to consultants who cost even more, just so that they can figure out basic stuff like, "where are all the pieces and how do they fit together."

In Conclusion

What's needed is not so much a technology or methodology, but a culture shift; a shared belief that documenting how things are built and why is important. It should be part of code reviews, a requirement for moving to production, tied to raises. When everyone believes that and acts on it, things will change. Otherwise, it's going to be like my failed open source contribution.

GlenPeterson
  • 14,950
1

To answer the question as it is posed (rather than giving you advice for your particular situation):

The programming paradigm known as pure functional programming requires that everything which affects the output of a function must be specified in its input parameters. There are no hidden dependencies or global variables or other mysterious forces acting invisibly across the code base. There is no "you have to do this first" temporal coupling.
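A small sketch of the difference, written in Python rather than a purely functional language, so the purity here is a convention and not something the language enforces. The function and variable names are made up for illustration.

```python
# Hidden dependency: the result relies on module state that some other
# step must have populated first ("you have to do this first").
_exchange_rates = {}


def convert_impure(amount, currency):
    # Raises KeyError unless another piece of code loaded the rates earlier.
    return amount * _exchange_rates[currency]


# Pure version: everything that affects the output is a parameter,
# so the dependency on the rates is visible in the signature.
def convert_pure(amount, currency, exchange_rates):
    return amount * exchange_rates[currency]


if __name__ == "__main__":
    print(convert_pure(10, "EUR", {"EUR": 2}))
```

With the pure version, a caller cannot forget the prerequisite, because the call does not type-check or even make sense without supplying it.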

JacquesB
  • 61,955
0

Each data warehouse is different but there is a lot you can do to make things easier for yourselves.

For starters, every row in the database had a DATE_ADDED and a DATE_UPDATED column so we could see when it was added to the database and when it was changed. We also had a SOURCE_CODE column so we could track where every bit of data entered the system.
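A minimal sketch of such audit columns, using Python's built-in `sqlite3` module and an in-memory database. The `customer` table, the `save_customer` helper and the source code value are hypothetical, and the update-timestamp column is written as `DATE_UPDATED` in this sketch.

```python
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        id           INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        DATE_ADDED   TEXT NOT NULL,  -- when the row first entered the warehouse
        DATE_UPDATED TEXT NOT NULL,  -- when the row was last changed
        SOURCE_CODE  TEXT NOT NULL   -- which source system or stream supplied it
    )
    """
)


def save_customer(conn, customer_id, name, source_code):
    """Insert or update a row while keeping the audit columns honest."""
    now = datetime.now(timezone.utc).isoformat()
    # Try an update first; DATE_ADDED and SOURCE_CODE keep their original values.
    cur = conn.execute(
        "UPDATE customer SET name = ?, DATE_UPDATED = ? WHERE id = ?",
        (name, now, customer_id),
    )
    if cur.rowcount == 0:
        # No existing row, so this is the first time the data enters the system.
        conn.execute(
            "INSERT INTO customer (id, name, DATE_ADDED, DATE_UPDATED, SOURCE_CODE) "
            "VALUES (?, ?, ?, ?, ?)",
            (customer_id, name, now, now, source_code),
        )


save_customer(conn, 1, "Acme", "CRM_NIGHTLY")
save_customer(conn, 1, "Acme Ltd", "CRM_NIGHTLY")
```

When a row looks wrong, these three columns answer "which stream put this here, and when?" without reading any code.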

Next we had common tools that ran across all our data warehouses such as sorts, table matches, slicers and dicers etc.

Bespoke code was kept to an absolute minimum and even then, it had to conform to various coding and reporting styles.

I'm going to assume you're already familiar with ETL suites. There is a lot of functionality you get for free these days that wasn't present when I was in the game about a decade ago.

You might also want to look at data marts for presenting a more friendly, sanitised version of your data warehouse. Not a silver bullet of course but could help with certain issues rather than having to rebuild/correct your data warehouse.

Robbie Dee
  • 9,823
0

I don't know how relevant this is to your case, but there are some strategies for making dependencies more visible and for general maintenance of code:

  • Avoid global variables, use parameters instead. This applies to cross language calls also.
  • Avoid changing/mutating the values of variables as much as you can. When you need to change a value, make a new variable and use that instead, if possible.
  • Make the code modular. If you cannot describe what a portion is doing (not how) in a simple sentence, break it up into modules that satisfy this condition.
  • Name your code portions properly. When you can actually describe what a portion of code is doing in simple terms, those terms become the name of the portion. Thus, the code becomes self-documenting through the names of modules/classes/functions/procedures/methods etc.
  • Test your code. Test whether the entities in your code justify their names, as discussed in the previous point.
  • Log events in the code. Maintain at least two levels of logging. The first is always enabled (even in production) and logs only critical events. The second logs basically everything, but can be turned on or off.
  • Find and use suitable tools to browse, maintain and develop your codebase. Even a simple "Search Everything" option of Visual Studio Code made my life a lot easier for certain cases.
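The two-level logging point above could be sketched like this with Python's standard `logging` module. The logger name `warehouse`, the `configure_logging` helper and its parameters are assumptions made for illustration, not a prescribed setup.

```python
import logging
import sys


def configure_logging(verbose=False, critical_stream=sys.stderr, verbose_stream=sys.stdout):
    """Attach an always-on critical handler plus an optional log-everything handler."""
    logger = logging.getLogger("warehouse")  # hypothetical logger name
    logger.handlers.clear()       # avoid duplicate handlers on reconfiguration
    logger.setLevel(logging.DEBUG)  # let the handlers decide what to keep
    logger.propagate = False

    # Level one: always enabled, even in production, critical events only.
    always_on = logging.StreamHandler(critical_stream)
    always_on.setLevel(logging.CRITICAL)
    logger.addHandler(always_on)

    # Level two: logs basically everything, switched on only when asked for.
    if verbose:
        everything = logging.StreamHandler(verbose_stream)
        everything.setLevel(logging.DEBUG)
        logger.addHandler(everything)
    return logger


if __name__ == "__main__":
    log = configure_logging(verbose=True)
    log.info("stream load_orders started")
    log.critical("stream load_orders failed: upstream table missing")
```

In practice the `verbose` flag would typically come from an environment variable or a command-line switch, so the detailed log can be turned on without a code change.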
Gulshan
  • 9,532