
What are some pros and cons of representing routes as legs or as stops?

A leg is a departure and arrival location, a departure time and a duration (or an arrival time and a duration).

A stop is an arrival time, a location, and a departure time.

The domain I'm modelling is a marketplace where people describe the route they're planning, then suppliers can bid on routes they're interested in. We have an existing system that uses stops. We don't have any major problems with stops, but I'm wondering if using legs would be better, and whether there are pros or cons I'm not aware of.

I searched for "software design trip modelling", "software design route modelling", "software trip leg stop", and all I found is documentation on a tool by Oracle and model railroad software.

I do understand that legs and stops are duals of each other, but in the database, there should be a single representation. https://dba.stackexchange.com/ is silent on the matter.

Thanks in advance!

Christophe

5 Answers


but I'm wondering if using legs would be better

When one has two alternative data structures representing the same thing, I would recommend against the assumption that one representation could be universally "better" than the other.

Semantically equivalent data structures may behave differently with regard to resource consumption, but often a structure which gives increased performance for certain processes will do so at the expense of requiring more memory, or vice versa. Or, a data structure which is more performant for a process X might be slower when used for process Y. For many real-world cases, however, these differences are completely negligible.

Hence one should pick what works well for their specific use case - if both approaches work well, one may start with the one which appears to be easier to implement, or the one which involves less redundancy (that's often the same, since redundancy can cause inconsistencies, and to deal with those correctly, one has to write and test extra code). If that does not settle the decision, one should flip a coin.

The case you described is probably one where the differences don't matter. The only use case for the data mentioned in the question was "description of a route" (for the purpose of bidding on it), not even something more complicated like path finding, and you wrote that you don't have any real issues with it. Since the current approach seems to work well, I would recommend staying with it. If you are in doubt, build a prototype using "legs" and evaluate its behaviour in comparison to the existing system.

That said, I actually see one point here which could make a difference: in the comments, you mentioned some requirements for metadata related to "stops" - stop types like "pick up, drop off, pit stop, and others". You did not mention similar metadata for "legs". This might be an indicator that both representations are not really equivalent, and that a "leg" representation may simply not be sufficient for the situation.

Moreover, these approaches are not mutually exclusive. If you also need metadata for legs (such as which road to take), you could implement a model which contains both:

  • stops with location, arrival/departure times and metadata

  • legs as a connection between two stops, plus "leg metadata" (but without arrival/departure times or locations of their own)
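A minimal sketch of such a combined model could look like the following. All class and field names here (`Stop`, `Leg`, `Route`, `kind`, `road`, `add_stop`) are made up for illustration and are not taken from the existing system; the point is only that times and locations live on the stops, while a leg just references two stops and carries its own metadata.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Stop:
    location: str
    arrival: Optional[datetime]      # None at the first stop of a route
    departure: Optional[datetime]    # None at the last stop of a route
    kind: str = "pit stop"           # stop metadata, e.g. "pick up", "drop off"


@dataclass
class Leg:
    origin: Stop                     # times and locations are read from the stops
    destination: Stop
    road: Optional[str] = None       # leg metadata, e.g. which road to take


@dataclass
class Route:
    stops: List[Stop] = field(default_factory=list)
    legs: List[Leg] = field(default_factory=list)

    def add_stop(self, stop: Stop, road: Optional[str] = None) -> None:
        """Append a stop and, if there is a previous stop, the leg leading to it."""
        if self.stops:
            self.legs.append(Leg(self.stops[-1], stop, road))
        self.stops.append(stop)


# Hypothetical two-stop route from "A" to "B".
route = Route()
route.add_stop(Stop("A", None, datetime(2024, 5, 1, 8, 0), kind="pick up"))
route.add_stop(Stop("B", datetime(2024, 5, 1, 12, 30), None, kind="drop off"),
               road="some road")
```

Because a leg holds references to its stops instead of copies of their data, there is no redundant time or location information that could get out of sync.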

So double check your requirements: are both representations really dual when it comes to metadata? Then it probably does not matter which one you pick. If not, you may end up using neither one nor the other alone, but a combination of both.

Doc Brown

A route is just a path in a graph that is annotated with time information:

  • A leg describes an edge in this path (from, to, duration), with a time annotation for either start or end of the edge.

  • A stop describes a node/vertex in this path (location), with time annotations related to its incoming and outgoing edges.

You will find in the more general graph-related literature a huge number of articles on path finding algorithms using either the one or the other, and certainly some comparative analysis. The timing information is just a decoration of the graph data.

Additional remarks

From my own limited experience in this area:

  • I found the representation of a path as a succession of edges (so what you call legs) more powerful in complex graphs where many edges relate to the same vertices.
  • The simpler and more elegant succession of nodes is fine as well, but you need to calculate or store the information about what happens between the nodes.

If you have to merge several routes (e.g. a train route with one carrier, followed by a truck route with another carrier):

  • with the leg/edge approach, the overall route is just the concatenation of the partial routes (each edge can be uniquely assigned to one carrier).
  • with the stop/vertex approach, the overall route needs to duplicate vertices when you change carrier.

But in the end, whether you use one or the other, you can switch from the one to the other with some computations if you’ve kept the right annotations.
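To make the conversion concrete, here is a small sketch using the definitions from the question (a stop has a location, an arrival time and a departure time; a leg has two locations, a departure time and a duration). The type and function names are hypothetical, and the code assumes that each leg starts where the previous one ended.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Stop:
    location: str
    arrival: Optional[datetime]      # None at the first stop
    departure: Optional[datetime]    # None at the last stop


@dataclass
class Leg:
    origin: str
    destination: str
    departure: datetime
    duration: timedelta


def stops_to_legs(stops: List[Stop]) -> List[Leg]:
    """Each pair of consecutive stops becomes one leg."""
    return [
        Leg(a.location, b.location, a.departure, b.arrival - a.departure)
        for a, b in zip(stops, stops[1:])
    ]


def legs_to_stops(legs: List[Leg]) -> List[Stop]:
    """Rebuild the stops; assumes each leg starts where the previous one ended."""
    if not legs:
        return []
    stops = [Stop(legs[0].origin, None, legs[0].departure)]
    # Pair every leg with its successor (None after the last leg).
    for leg, following in zip(legs, legs[1:] + [None]):
        stops.append(Stop(
            leg.destination,
            leg.departure + leg.duration,
            following.departure if following else None,
        ))
    return stops


# Hypothetical route A -> B -> C.
stops = [
    Stop("A", None, datetime(2024, 5, 1, 8, 0)),
    Stop("B", datetime(2024, 5, 1, 10, 0), datetime(2024, 5, 1, 10, 30)),
    Stop("C", datetime(2024, 5, 1, 13, 0), None),
]
legs = stops_to_legs(stops)
round_trip = legs_to_stops(legs)
```

In this sketch the round trip reproduces the original stops, but only because the stops carry nothing beyond location and times. Any extra annotation on a stop (or on a leg) would need a place to live in the other representation for the conversion to stay lossless.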

From the customer point of view, a representation of stops is very natural, since many bus stops in the world use this notation, and it allows a compact presentation of the schedule.

So it all depends on what you are the most interested in and use the most often.

A final remark is about your current algorithmic assets: if you have an existing system for which one approach worked well, and you have teams experienced with this approach, there may be an efficiency gain in keeping this logic.

Christophe

I would say it depends on what the suppliers are bidding on:

  • If suppliers are bidding on providing services for a single leg, such as providing a truck to carry a package from point A to B, then it would be more natural to represent the data as legs.
  • If suppliers are bidding on providing services at single stops, such as providing labor to unload from one truck and load another truck, then it would be more natural to represent the data as stops.
  • If suppliers are bidding on handling the entire route, it doesn't really matter how you represent it, although stops probably fit better with how a user would initially build the route.

There's nothing particularly wrong with using both, either. I would consider this a fairly natural fit for a graph database because you have nodes and edges that are related with each other.

Karl Bielefeldt

When you have two data structures that are seemingly dual to each other for your purposes, look at the edge cases. In your example, that would be the empty route:

  • A route with zero legs? Fine. A route with a single leg? Fine.
  • A route with zero stops? Fine. A route with two stops? Fine. A route with a single stop? Wait…

Similarly, look at the start and end of each route. With a list of stops, you'd have some stops without an arrival and some stops without a departure, whereas a leg always has both - non-nullable data is easier to work with.
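As a rough illustration of those edge cases, a stop-based route needs extra validation that a leg-based route gets for free. The sketch below uses a hypothetical `Stop` tuple and a hypothetical `stop_route_problems` helper; the rules it checks are my reading of the cases above, not requirements from the question.

```python
from datetime import datetime
from typing import List, NamedTuple, Optional


class Stop(NamedTuple):
    location: str
    arrival: Optional[datetime]      # legitimately missing at the first stop
    departure: Optional[datetime]    # legitimately missing at the last stop


def stop_route_problems(stops: List[Stop]) -> List[str]:
    """Return human-readable problems found in a stop-based route."""
    problems = []
    if len(stops) == 1:
        problems.append("a single stop does not describe any travel")
    for i, stop in enumerate(stops):
        is_first = i == 0
        is_last = i == len(stops) - 1
        if not is_first and stop.arrival is None:
            problems.append(f"{stop.location}: missing arrival")
        if not is_last and stop.departure is None:
            problems.append(f"{stop.location}: missing departure")
    return problems


t = datetime(2024, 5, 1, 8, 0)
lonely = stop_route_problems([Stop("A", None, None)])
broken = stop_route_problems([Stop("A", None, t), Stop("B", t, None), Stop("C", t, None)])
```

With legs, none of these checks are needed: every leg has both a departure and an arrival, and a list of zero or more legs is always a meaningful route.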

Sure, you might not want to model empty routes at all, so there might be an additional constraint that forbids these cases anyway. But thinking about their semantics, what they may represent if they were to occur, helps find the more appropriate (or at least: more elegant) solution.

That said, the theoretically cleaner approach is not always the more practical one. Some database designs perform better than others, despite (or because of) not being properly normalised. So take a look at the (main/most important) queries you'd want to run against these tables, what constraints (like foreign keys, or checks like departure_time < arrival_time) you want to apply on these tables, and what indices you need. One approach might be easier to handle than the other.
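For instance, here is how such checks might look in both designs, sketched with Python's built-in `sqlite3` module and an in-memory database. The table and column names are invented for the example, and times are assumed to be stored as ISO-8601 text so that string comparison matches chronological order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leg (
    id             INTEGER PRIMARY KEY,
    route_id       INTEGER NOT NULL,
    origin         TEXT NOT NULL,
    destination    TEXT NOT NULL,
    departure_time TEXT NOT NULL,          -- every column can be NOT NULL
    arrival_time   TEXT NOT NULL,
    CHECK (departure_time < arrival_time)  -- travel takes time
);

CREATE TABLE stop (
    id             INTEGER PRIMARY KEY,
    route_id       INTEGER NOT NULL,
    position       INTEGER NOT NULL,       -- order of the stop within the route
    location       TEXT NOT NULL,
    arrival_time   TEXT,                   -- NULL at the first stop
    departure_time TEXT,                   -- NULL at the last stop
    CHECK (arrival_time IS NULL OR departure_time IS NULL
           OR arrival_time <= departure_time),
    UNIQUE (route_id, position)
);
""")

# A valid leg is accepted ...
conn.execute(
    "INSERT INTO leg (route_id, origin, destination, departure_time, arrival_time) "
    "VALUES (1, 'A', 'B', '2024-05-01T08:00', '2024-05-01T12:30')"
)

# ... while a leg that arrives before it departs is rejected by the CHECK.
try:
    conn.execute(
        "INSERT INTO leg (route_id, origin, destination, departure_time, arrival_time) "
        "VALUES (1, 'B', 'C', '2024-05-01T12:00', '2024-05-01T09:00')"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Note the asymmetry: in the leg table the "travel takes time" rule fits into a single-row check, whereas in the stop table the same rule spans two rows (departure of one stop versus arrival at the next), which a row-level `CHECK` cannot express.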

Bergi

OK, so once you have the data in memory it's trivial to convert stops to legs and vice versa.

Your problem will come when you are searching your database for particular sets of routes.

For example, I am at A and want to get to B in a single stop. If I am storing legs, I can search for rows in the database which depart A and arrive at B. If I am storing stops, I have to do a much more complicated search, getting multiple rows and their next and previous stops.
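A rough sketch of the difference, again using Python's built-in `sqlite3` with invented table names, a hypothetical `position` column to order the stops, and made-up sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leg  (route_id INTEGER, origin TEXT, destination TEXT);
CREATE TABLE stop (route_id INTEGER, position INTEGER, location TEXT);

-- Route 1 goes A -> B -> C, route 2 goes A -> C directly.
INSERT INTO leg  VALUES (1, 'A', 'B'), (1, 'B', 'C'), (2, 'A', 'C');
INSERT INTO stop VALUES (1, 0, 'A'), (1, 1, 'B'), (1, 2, 'C'),
                        (2, 0, 'A'), (2, 1, 'C');
""")

# Legs: one row already contains both ends of the hop.
direct_by_leg = conn.execute(
    "SELECT route_id FROM leg WHERE origin = ? AND destination = ?",
    ("A", "B"),
).fetchall()

# Stops: each stop has to be joined to the next stop of the same route.
direct_by_stop = conn.execute(
    """
    SELECT s1.route_id
    FROM stop AS s1
    JOIN stop AS s2
      ON s2.route_id = s1.route_id
     AND s2.position = s1.position + 1
    WHERE s1.location = ? AND s2.location = ?
    """,
    ("A", "B"),
).fetchall()
```

Both queries find route 1 here, but the stop version needs a self-join and relies on the stops having a gap-free ordering column, which is extra structure you have to maintain.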

What kind of search you expect will determine how to store the data efficiently. I would imagine a hybrid approach might be best for the kind of train journey route you describe, i.e. A to D with stops B and C.

Ewan