26

I was browsing a GitHub project and found this module, which has more than 10,000 lines.

Is it a common practice to have that much code in a single module?

It seems to me that this should be split over multiple modules, maybe one for each database engine.

What benefit does the developer get from making one huge module like this (other than "having it all in one place") or what downside is there from splitting it up (other than "complexity")?

4 Answers

16

What you encountered is the so-called "God object", because it does all or knows all. Run away from it (if you can).

There isn't a definite number of LOC per module, but it should be something that makes it easy to browse the code and easily understand what the methods are doing. From my personal experience, if your module goes beyond 1k lines*, you are doing something wrong.

* Even a 1k line module is very big.

Panzercrisis
  • 3,213
BЈовић
  • 14,049
7

This appears to be a module where typical size limits may not apply. The majority of the functionality is in the first 2k lines of code and comments. The rest of the file appears to be a lot of adapter classes and other support classes which appear to be tightly coupled to the module. In other languages the classes would be in separate files of a reasonable size.

Some more docstrings might be useful, but they would increase the size of an already large module. The code is clear and self-explanatory, with appropriate comments where needed.

BillThor
  • 6,310
5

Of course the actual "limit" varies by your project and a multitude of factors.

But I'll chime in with a rule of thumb: 200 lines of decent Python. That is, not C or Java code written in Python, but good, idiomatic Python.

mike3996
  • 197
1

Wow.

I guess I don't know the full answer to this one, but for the title question, "How large should a Python module be?", I like to think in terms of Parnas's idea that a module should hide a secret. In this case the module seems to do that properly (and that's quite a big secret it hides).

I've recently been digging into papers that talk a lot about coupling and cohesion. Maybe splitting this into many db modules would force too many calls between modules, leading to exactly what is considered bad practice: lower cohesion and higher coupling?

I've seen experimental data showing programmers choosing to sacrifice good practice for the sake of simplicity and understanding. In fact, good practices can conflict with each other as well. Say, optimizing for performance doesn't usually make the people who do maintenance happy later. I'm not quite sure how legibility would be improved in this case with such a big module.

Another thing I noticed is that part of the code is described as generic, and the database-specific parts extend it. I'm not a Python programmer, but perhaps this could justify something?
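
To make the "generic part plus extensions" idea concrete, here is a minimal sketch of what I mean. The class and method names (`BaseAdapter`, `SQLiteAdapter`, `MySQLAdapter`, `select_by_id`) are made up for illustration and are not taken from the project in question; I'm only assuming the general shape is something like this.

```python
class BaseAdapter:
    """Generic behaviour shared by every engine (hypothetical name)."""

    # Parameter marker used in generated SQL; engines may override it.
    placeholder = "?"

    def quote(self, name):
        # Default identifier quoting: double quotes.
        return '"' + name + '"'

    def select_by_id(self, table):
        # Generic logic written once, using the engine-specific hooks above.
        return f"SELECT * FROM {self.quote(table)} WHERE id = {self.placeholder}"


class SQLiteAdapter(BaseAdapter):
    """Inherits everything; the generic defaults are good enough here."""


class MySQLAdapter(BaseAdapter):
    """Overrides only the pieces that differ for this engine."""

    placeholder = "%s"

    def quote(self, name):
        # MySQL-style backtick quoting.
        return "`" + name + "`"


print(SQLiteAdapter().select_by_id("users"))
print(MySQLAdapter().select_by_id("users"))
```

If the real code looks roughly like this, every engine class depends on the generic one, so the subclasses are tightly tied to it whether they sit in the same file or in separate ones. That might be part of why the author kept them together, though it wouldn't stop each subclass from living in its own module and importing the base.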

So, I don't have a final answer, but I hope someone highlights these points too!