8

From the Agile Manifesto:

We follow these principles:

Welcome changing requirements, even late in development. Agile processes harness change for the customer's competitive advantage.

I have a requirement where we are doing cross database joins with multiple partner teams in many different data sources. Some are traditional RDMBS, some are columnar, and one is more of an API than a database for security reasons (so I can't run arbitrary SELECT ... JOIN against it).

Our solution is to set up ETL jobs against the various sources and try to get snapshots of relevant data in a single place, and use these local tables to do the cross db joins and eventually generate reports.

The problem is that if I need to use some new column as a join key (a new data source is introduced), or if a new business question is being asked, and I haven't pulled that data, then I need to modify the schemas and then rerun all the extracts again, and in some cases change the business calculations in the transforms which can take hours and requires a lot of operational overhead.

I keep hoping that requirements won't keep changing, but honestly that hope is not Agile - I should be able to respond to additional business questions on the fly. Another part of Agile is that the PMs should try to plan for these sorts of changes as early as possible - is that where my process is breaking down? Alternatively, how can I structure dataflows and related processes to be just as agile as other types of code?

durron597
  • 7,610
  • 10
  • 39
  • 67

2 Answers2

6

I do not believe this is about being Agile with database design. You could remove the database from this question and replace it with a SOAP or REST interface, or anything else involving multiple system communicating.

The core issue here is "what does Agile have to say about multiple teams developing multiple products in tandem that have to work together?" or "how do we manage change outside of our control?"

As usual, the Agile Manifesto says what the team does, not how. Agile is a philosophy, not a specific technique. It just so happens that we have honed techniques to manage the complexity you are dealing with.

In a word: automate.

When interfaces change, there will be manual changes and adjustments. Try to automate as much as possible. Maybe link some fields in a configuration file, and use an automated process to build out the code that actually moves data. Another automated process then moves the data through the system.

One project I worked on had a similar setup, where data was moving through a web service and into a database. Since the interfaces were changing even late in development, I got used to plugging in a new WSDL, updating a DAO's configuration, adding a line of code to move the data, then running the automated build process. It would generate code, create an installer, install the application, and move data from one end to the other and verify the records pushed into the web service made it to the database as expected.

Regarding the time for these processes to work: sounds like a good candidate for an overnight process.

There is really not much else you can do, since trying to get multiple teams (probably with stakeholders who earn more in one paycheck than you do in a year) to reach consensus is worse than herding cats. In my experience, project managers are fairly useless in these situations. They tend to sit around and argue with each other, while my time is better spent automating the process and saving time.

3

Discussed this in chat with @Thomas Owens.

The conclusion we reached is that if backfills are common and columns changes are common, then the user story is:

We need to have proper tools to respond quickly to changing requirements and backfills in a way that doesn't consume significant developer time.

The actual length of time for backfills to run isn't the problem, the problem is my time doing them all by hand. The solution is delivering working software that meets the requirements of the environment that I'm in, in other words, automating as much of this data maintenance as possible.

durron597
  • 7,610
  • 10
  • 39
  • 67