32

I'm working on a large software project which is highly customized for various customers around the world. This means that maybe 80% of the code is common between the various customers, but there is also a lot of code that has to change from one customer to the other. In the past we did our development in separate repositories (SVN), and when a new project started (we have few, but large customers) we created another repository based on whichever past project had the best code basis for our needs. This worked in the past, but we ran into several problems:

  • Bugs which are fixed in one repository are not patched in the other repositories. This might be a problem of organization, but I find it difficult to fix and patch a bug in 5 different repositories, keeping in mind that the team maintaining a given repository might be in another part of the world, and we have neither their test environment nor knowledge of their schedule or requirements (a "bug" in one country might be a "feature" in another).
  • Features and improvements made for one project that might also be useful for another project are either lost, or, when they are carried over, cause big headaches when merging them from one code base to the other (since both branches might have been developed independently for a year).
  • Refactorings and code improvements made in one development branch are either lost or cause more harm than good if you have to merge all these changes between the branches.

We are now discussing how to solve these problems, and so far we have come up with the following ideas:

  1. Keep development in separate branches, but organize better by having a central repository into which general bug fixes are merged, and have all projects merge changes from this central repository into their own on a regular (e.g. daily) basis. This requires huge discipline and a lot of merging effort between the branches, so I'm not convinced it will work and that we can keep up this discipline, especially when time pressure sets in.

  2. Abandon the separate development branches and have a central code repository where all our code lives, doing our customization through pluggable modules and configuration options. We are already using Dependency Injection containers to resolve dependencies in our code, and we follow the MVVM pattern in most of our code to cleanly separate business logic from the UI.
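
To make this concrete, here is a minimal sketch of the kind of pluggable customization we have in mind (the interface, the policy classes and the customer name are made up, and we use Microsoft.Extensions.DependencyInjection syntax purely for illustration; any DI container works similarly):

    using Microsoft.Extensions.DependencyInjection;

    // Contract defined in the shared core library.
    public interface IDiscountPolicy
    {
        decimal Apply(decimal price);
    }

    // Default behaviour shipped with the core.
    public class StandardDiscountPolicy : IDiscountPolicy
    {
        public decimal Apply(decimal price) => price;
    }

    // Customer-specific deviation, living in a separate plug-in assembly.
    public class AcmeDiscountPolicy : IDiscountPolicy
    {
        public decimal Apply(decimal price) => price * 0.95m; // 5% blanket discount
    }

    public static class CompositionRoot
    {
        // The composition root selects implementations per customer,
        // driven by configuration rather than by forked code.
        public static void Register(IServiceCollection services, string customer)
        {
            if (customer == "Acme")
                services.AddSingleton<IDiscountPolicy, AcmeDiscountPolicy>();
            else
                services.AddSingleton<IDiscountPolicy, StandardDiscountPolicy>();
        }
    }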

The second approach seems more elegant, but it leaves us with many unsolved problems. For example: how do you handle changes/additions in your model/database? We are using .NET with Entity Framework to have strongly typed entities. I don't see how we can handle properties which are required for one customer but useless for another without cluttering our data model. We are thinking of solving this in the database by using satellite tables (separate tables where the extra columns for a specific entity live, with a 1:1 mapping to the original entity), but this only covers the database. How do you handle this in code? Our data model lives in a central library, which we would not be able to extend for each customer using this approach.
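
To illustrate the satellite-table idea, here is a rough sketch; the entity and property names are invented, and the mapping uses EF Core's fluent API for brevity (our actual EF version may differ):

    using System;
    using Microsoft.EntityFrameworkCore;

    // Core entity in the shared data-model library.
    public class Order
    {
        public int Id { get; set; }
        public DateTime OrderDate { get; set; }
    }

    // Satellite entity holding one customer's extra columns, 1:1 with Order.
    public class OrderAcmeExtension
    {
        public int OrderId { get; set; }        // shared primary key, FK to Order
        public string CustomsCode { get; set; } // only this customer needs it
        public Order Order { get; set; }
    }

    public class AppDbContext : DbContext
    {
        public DbSet<Order> Orders { get; set; }
        public DbSet<OrderAcmeExtension> OrderAcmeExtensions { get; set; }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            // Map the 1:1 relationship over the shared key.
            modelBuilder.Entity<OrderAcmeExtension>().HasKey(e => e.OrderId);
            modelBuilder.Entity<OrderAcmeExtension>()
                .HasOne(e => e.Order)
                .WithOne()
                .HasForeignKey<OrderAcmeExtension>(e => e.OrderId);
        }
    }

The unsolved part is where OrderAcmeExtension should live: our central data-model library cannot reference customer-specific assemblies.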

I'm sure that we are not the only team struggling with this problem and I'm shocked to find so little material on the topic.

So my questions are the following:

  1. What experience do you have with highly customized software, what approach did you choose and how did it work for you?
  2. What approach do you recommend and why? Is there a better approach?
  3. Are there any good books or articles on the topic that you can recommend?
  4. Do you have specific recommendations for our technical environment (.NET, Entity Framework, WPF, DI)?

Edit:

Thanks for all the suggestions. Most of the ideas match those that we already had in our team, but it is really helpful to see the experience you had with them and tips to better implement them.

I'm still not sure which way we will go and I'm not making the decision (alone), but I will pass this along in my team and I'm sure it will be helpful.

At the moment the tenor seems to be a single repository with various customer-specific modules. I'm not sure whether our architecture is up to this, or how much we will have to invest to make it fit, so some things might live in separate repositories for a while, but I think it's the only long-term solution that will work.

So, thanks again for all responses!

aKzenT

9 Answers

12

It sounds like the fundamental problem is not just code repository maintenance, but a lack of suitable architecture.

  1. What is the core/essence of the system that will always be shared by all customer systems?
  2. What enhancements/deviations are required by each customer?

A framework or standard library encompasses the former, while the latter would be implemented as add-ons (plugins, subclasses, DI, whatever makes sense for the code structure).
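
As a purely hypothetical illustration (all names invented), the framework can expose explicit extension points that customer add-ons override:

    // Framework/core class: stable, shared by every customer.
    public abstract class ShippingCostCalculator
    {
        public decimal Calculate(decimal weightKg)
        {
            decimal baseCost = weightKg * 1.20m; // core rule, common to everyone
            return AdjustForCustomer(baseCost);  // deviation hook
        }

        // Default: no deviation.
        protected virtual decimal AdjustForCustomer(decimal baseCost) => baseCost;
    }

    // Customer add-on: a subclass in its own assembly, wired in via DI.
    public sealed class GermanyShippingCalculator : ShippingCostCalculator
    {
        protected override decimal AdjustForCustomer(decimal baseCost)
            => baseCost + 4.90m; // invented country-specific surcharge
    }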

A source control system that manages branches and distributed development would probably help as well; I'm a fan of Mercurial, others prefer Git. The framework would be the main branch and each customized system a sub-branch, for example.

The specific technologies used to implement the system (.NET, WPF, whatever) are largely unimportant.

Getting this right is not easy, but it is critical for long-term viability. And of course the longer you wait, the greater the technical debt you'll have to deal with.

You may find the book Software Architecture: Organizational Principles and Patterns useful.

Good luck!

11

One company I've worked for had the same problem, and the approach taken to tackle it was this: a common framework for all new projects was created, containing everything that has to be the same in every project, e.g. form-generating tools, export to Excel, logging. Effort was taken to make sure that this common framework was only improved (when a new project needed new features), but never forked.

Based on that framework, customer-specific code was maintained in separate repositories. When useful or necessary, bug fixes and improvements were copy-pasted between projects (with all the caveats described in the question). Globally useful improvements went into the common framework, though.

Having everything in a common codebase for all customers has some advantages, but on the other hand, reading the code becomes difficult when there are countless ifs to make the program behave differently for each customer.

EDIT: One anecdote to make this more understandable:

The domain of that company is warehouse management, and one task of a warehouse management system is to find a free storage location for incoming goods. Sounds easy, but in practice, a lot of constraints and strategies have to be observed.

At one point in time, management asked a programmer to create a flexible, parameterisable module for finding storage locations, which implemented several different strategies and was meant to be used in all subsequent projects. The noble effort resulted in a complex module which was very difficult to understand and maintain. In the next project, the project lead couldn't figure out how to make it work in that warehouse, and the developer of said module was gone, so he eventually ignored it and wrote a custom algorithm for the task.

A few years later, the layout of the warehouse where this module was originally used changed, and the module with all its flexibility didn't match the new requirements; so I replaced it with a custom algorithm there, too.

I know LOC is not a good measurement, but anyway: the size of the "flexible" module was ~3000 LOC (PL/SQL), while a custom module for the same task takes ~100-250 LOC. Trying to be flexible therefore increased the size of the code base enormously, without gaining the reusability we had hoped for.

user281377
5

One of the projects I've worked on supported multiple platforms (more than 5) across a large number of product releases. A lot of the challenges you are describing were things we faced, albeit in a slightly different way. We had a proprietary DB, so we didn't have the same types of problems in that arena.

Our structure was similar to yours, but we had a single repository for our code. Platform-specific code went into its own project folders within the code tree. Common code lived within the tree based upon the layer it belonged to.

We had conditional compilation based upon the platform being built. Maintaining that was kind of a pain, but it only had to be done when new modules were added at the platform-specific layer.
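
As an illustration in C# terms (the symbols, paths and values are invented, and the original system was not necessarily .NET), per-platform conditional compilation looks roughly like this:

    // Each build configuration defines exactly one platform symbol,
    // e.g. via <DefineConstants>PLATFORM_A</DefineConstants> in the project file.
    public static class PlatformSettings
    {
    #if PLATFORM_A
        public const string DataRoot = @"D:\appdata";
        public const int MaxConnections = 16;
    #elif PLATFORM_B
        public const string DataRoot = "/var/appdata";
        public const int MaxConnections = 4;
    #else
    #error No platform symbol defined for this build
    #endif
    }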

Having all of the code in a single repository made it easy for us to make bug fixes across multiple platforms and releases at the same time. We had an automated build environment for all the platforms to serve as a backstop in case new code broke a presumably unrelated platform.

We tried to discourage it, but there were cases where a platform needed a fix for a platform-specific bug that was in otherwise common code. If we could conditionally override the compile without making the module look fugly, we'd do that first. If not, we would move the module out of common territory and push it into the platform-specific layer.

For the database, we had a few tables that had platform specific columns / modifications. We would make sure that every platform version of the table met a baseline level of functionality so common code could reference it without worrying about platform dependencies. Platform specific queries / manipulations were pushed into the platform project layers.

So, to answer your questions:

  1. Lots, and that was one of the best teams I've worked with. The codebase at that time was around 1M LOC. I didn't get to choose the approach, but it worked out pretty dang well. Even in hindsight, I haven't seen a better way of handling things.
  2. I recommend the second approach you suggested with the nuances I mention in my answer.
  3. No books that I can think of, but I would research multi-platform development as a starter.
  4. Institute some strong governance. It's the key to making sure your coding standards are followed. Following those standards is the only way to keep things manageable and maintainable. We had our share of impassioned pleas to break the model we were following, but none of those appeals ever swayed the entire senior development team.
4

I worked for many years on a Pension Administration application which had similar issues. Pension plans are vastly different between companies, and require highly specialized knowledge for implementing calculation logic and reports, as well as very different data designs. I can only give a brief description of part of the architecture, but maybe it will give enough of the idea.

We had 2 separate teams: a core development team, which was responsible for the core system code (which would be your 80% shared code above), and an implementation team, which had domain expertise in pension systems, and was responsible for learning client requirements and coding scripts and reports for the client.

We had all of our tables defined in Xml (this was before the time when entity frameworks were time-tested and common). The implementation team would design all the tables in Xml, and the core application could be prompted to generate the actual database tables from that Xml. There were also associated VB script files, Crystal Reports, Word docs, etc. for each client. (There was also an inheritance model built into the Xml to enable reusing other implementations.)

The core application (one application for all clients) would cache all the client-specific stuff when a request for that client came in, and it generated a common data object (kind of like a remote ADO record set), which could be serialized and passed around.

This data model is less slick than entity/domain objects, but it is highly flexible and universal, and can be processed by one set of core code. Perhaps in your case you could define your base entity objects with only the common fields and have an additional Dictionary for custom fields (plus some kind of set of data descriptors on your entity object so that it has metadata for the custom fields).
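
A minimal sketch of that last suggestion, with all names invented: keep the common fields strongly typed and push the customer-specific ones into a dictionary described by metadata:

    using System;
    using System.Collections.Generic;

    // Metadata describing one custom field for one client.
    public class CustomFieldDescriptor
    {
        public string Name { get; set; }
        public Type ValueType { get; set; }
        public bool Required { get; set; }
    }

    // Entity with strongly typed common fields plus an open-ended bag
    // of client-specific values described by the metadata above.
    public class ParticipantEntity
    {
        public int Id { get; set; }          // common to all clients
        public string FullName { get; set; } // common to all clients

        public IList<CustomFieldDescriptor> CustomFieldMetadata { get; set; }
            = new List<CustomFieldDescriptor>();
        public IDictionary<string, object> CustomFields { get; }
            = new Dictionary<string, object>();
    }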

We had separate source repositories for the core system code and for the implementation code.

Our core system actually had very little business logic, other than some very standard common calculation modules. The core system functioned as: screen generator, script runner, report generator, data access and transport layer.

Segmenting core logic and customized logic is a tough challenge. However, we always felt it was better to have one core system running multiple clients, rather than multiple copies of the system running for each client.

Sam Goldberg
2

I've worked on a smaller system (20 kloc) and found that DI and configuration are both great ways to manage differences between clients, but not enough to avoid forking the system. The database is split into an application-specific part, which has a fixed schema, and a client-dependent part, which is defined through a custom XML configuration document.

We've kept a single branch in Mercurial that's configured as if it were deliverable, but branded and configured for a fictional client. Bug fixes are mainlined into that project, and new development of core functionality only happens there. Releases to actual clients are branches off of that, stored in their own repositories. We keep track of large changes to the code through manually assigned version numbers, and track bug fixes using commit numbers.

2

I am afraid that I do not have direct experience of the problem that you describe, but I do have some comments.

The second option, of bringing the code together into a central repository (as much as practicable) and architecting for customization (again, as much as practicable), is almost certainly the way to go in the long term.

The problem is how you plan to get there, and how long it is going to take.

In this situation, it is probably OK to (temporarily) have more than one copy of the application in the repository at a time.

This will enable you to gradually move to an architecture that directly supports customization without having to do it in one fell swoop.

William Payne
2

The second approach seems more elegant, but it leaves us with many unsolved problems.

I am sure any of those problems can be solved, one after another. If you get stuck, ask here or on SO about the specific problem.

As others have pointed out, having one central codebase / one repository is the option you should prefer. I'll try to answer your example question.

For example: how do you handle changes/additions in your model/database? We are using .NET with Entity Framework to have strongly typed entities. I don't see how we can handle properties which are required for one customer but useless for another without cluttering our data model.

There are some possibilities, all of them I have seen in real-world systems. Which one to choose depends on your situation:

  • live with the cluttering to a certain degree
  • introduce tables "CustomAttributes" (describing names and types) and "CustomAttributeValues" (for the values, stored for example as a string representation, even if they are numbers). That will allow you to add such attributes at install time or run time, with individual values for each customer. Don't insist on having each custom attribute modeled "visibly" in your data model (a sketch follows this list).

  • now it should be clear how to use this in code: have generic code for accessing those tables, and individual code (perhaps in a separate plug-in DLL; that is up to you) for interpreting those attributes correctly

  • another alternative is to give each entity table a big string field where you can store an individual XML string
  • try to generalize some concepts so they can be reused more easily across different customers. I recommend Martin Fowler's book "Analysis Patterns". Though this book is not about customizing software per se, it may be helpful for you as well.
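
A minimal sketch of the "CustomAttributes" idea, with table and property names that are only illustrative:

    // "CustomAttributes": one row per custom attribute definition.
    public class CustomAttribute
    {
        public int Id { get; set; }
        public string EntityName { get; set; }    // e.g. "Order"
        public string AttributeName { get; set; } // e.g. "CustomsCode"
        public string DataType { get; set; }      // e.g. "string", "decimal"
    }

    // "CustomAttributeValues": one row per attribute value per entity row,
    // stored as a string representation even for numeric types.
    public class CustomAttributeValue
    {
        public int Id { get; set; }
        public int CustomAttributeId { get; set; } // FK to CustomAttribute
        public int EntityId { get; set; }          // key of the row being extended
        public string Value { get; set; }
    }

Generic core code only reads and writes these two tables; the customer-specific plug-in knows that, say, "CustomsCode" is a string and validates or renders it accordingly.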

And for specific code: you can also try to introduce a scripting language into your product, especially for adding customer-specific scripts. That way you not only create a clear line between your core code and customer-specific code, you can also allow your customers to customize the system to some degree by themselves.
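
As one concrete (and purely illustrative) possibility in .NET, the Roslyn scripting engine can evaluate customer-specific snippets against a controlled set of globals; all names below are invented:

    using System.Threading.Tasks;
    using Microsoft.CodeAnalysis.CSharp.Scripting;

    // The only state a customer script is allowed to touch.
    public class PricingGlobals
    {
        public decimal Total { get; set; }
    }

    public static class CustomerScriptRunner
    {
        // 'scriptText' would be loaded from the customer's configuration,
        // e.g. "Total > 1000m ? Total * 0.97m : Total".
        public static async Task<decimal> RunPricingScript(string scriptText, decimal total)
        {
            return await CSharpScript.EvaluateAsync<decimal>(
                scriptText,
                globals: new PricingGlobals { Total = total });
        }
    }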

Doc Brown
0

When I'm asked to start the development of B, which shares 80% of its functionality with A, I will either:

  1. Clone A and modify it.
  2. Extract the functionality that both A and B share into C which they will use.
  3. Make A configurable enough to fulfill the needs of both B and itself (therefore B is embedded in A).

You chose 1, and it doesn't seem to fit your situation well. Your mission is to predict which of 2 and 3 is a better fit.

Zippo
0

I have only built one such application. I'd say that 90% of the units sold were sold as is, no modifications. Each customer had their own customized skin, and we served up the system within that skin. When a mod did come in that affected the core sections, we first tried IF branching. When mod #2 came in for the same section, we switched to CASE logic, which allowed for future expansion. This seemed to handle most of the minor requests.

Any further minor custom requests were handled by implementing CASE logic.

If the mods were too radical, we built a clone (a separate include) and wrapped a CASE around it to pull in the different module.
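
The original was Classic ASP (Select Case); translated into a C# sketch with invented names, the dispatch looked conceptually like this:

    public static class InvoiceRenderer
    {
        // Core behaviour, used as-is by ~90% of customers.
        private static string RenderDefault() => "<standard invoice>";

        // Radical mods live in their own modules (separate includes back then).
        private static string RenderCustomA() => "<invoice with customs block>";
        private static string RenderCustomB() => "<invoice with VAT split>";

        // The CASE wrapper choosing which module to pull in.
        public static string Render(string customerId)
        {
            switch (customerId)
            {
                case "CUST_A": return RenderCustomA();
                case "CUST_B": return RenderCustomB();
                default: return RenderDefault();
            }
        }
    }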

Bug fixes and modifications to the core affected all users. We tested thoroughly in development before going to production. We always sent out email notifications to accompany any change, and NEVER, NEVER, NEVER pushed production changes on Fridays... NEVER.

Our environment was Classic ASP and SQL Server. We were NOT a spaghetti-code shop; everything was modular, using Includes, Subroutines and Functions.