Not so long ago I talked to my colleague and he was definitely against using bit masks because it is hard to understand all the values that are stored in the database. In my opinion it is not always a bad idea to use them, for example to determine the roles of the current user. Otherwise you need to store it in a separate table, which will cause one more JOIN. Can you please tell me if I am wrong? Any other side-effects, advantages/disadvantages of using bit masks?
8 Answers
I work with an application that uses bitmasks to store user role assignments. It's a pain in the butt. If this makes me biased, guilty as charged.
If you're already using a relational database, it is an anti-pattern that violates most relational theory and all the normalization rules. When you build your own data storage, it may not be such a bad idea.
There is such a thing as too many tables being joined, but relational databases are built to handle this. Many have additional features if performance becomes an issue: indexes, indexed views, etc. Even if the values you're looking up don't change very often, which is an advantage for Bitmask, the over-head of having to manage indexing is pretty easy on the database.
Although database do a good job of aggregating data, they can get sluggish when you start introducing things like complex formulas or Scalar Functions into datasets. You can do the bitwise in your app, but if all you're doing is getting related data (looking up a user's role(s)), you're not taking advantage of what your data storage does best.
My last argument against it would be simplicity for other developers. You have users, roles and assignments. It's a many-to-many relation set (because there's more than one relationship) that is so common, it should be easy to manage. It's just CRUD stuff.
- 36,956
You have already named the relevant pros and cons:
- Bit fields save space.
- They store data in the record itself, so you don't need JOINs to find them. (But individual flag fields in the record would do the same.)
- They are badly readable if you want to work productively with raw SQL output.
Deciding what to do requires more info:
- Just how scarce is disk space for your use case?
- Do you actually read user roles so often that the time to JOIN them is a bottleneck?
- Are you going to read SQL output and make decisions based on that - or is an unreadable data base record immaterial, just like that fact that the machine code of your system is unreadable?
So what you have to do is gather the risk factors and then weight them, to see whether the pros outweigh the cons.
- 110,899
If you're really, really, really strapped for disk space, then you might consider bitmaps for user permissions. If performance is your worry, then forget about them altogether, because picking them apart will actually be slower. You can't index a bitmapped field meaningfully, resulting in database table scans, which are [almost] always a performance killer.
Unless you're Amazon or Netflix, the amount of data involved in user permissions will be negligible compared to everything else you're holding.
Any serious DBMS can handle that "extra join" without even blinking.
- 13,093
Back when storage was expensive, the boon with bit masks was that they saved space. In the days of big data, this isn't the issue it once was.
Taking the example you cite - having roles stored as a bit mask would be something of a code smell from a database design point of view as it would violate first normal form. In this sense, they're an anti-pattern.
All this being said, it doesn't have to be one or the other. You could store the data as a bit mask and then have a view that can pull the user roles on the fly. You'd also then have the benefit of checking at a glance which users had the same roles.
- 9,823
The only advantage to using bitmasks is if the bit fields' meaning is not static. Relational tables only work well if you know ahead of time what each field is on a record: you have to identify the fields in the CREATE TABLE DDL statement after all.
If the meaning of each bit field is configurable at runtime, or otherwise not known ahead of time, then it might make sense to store booleans as a bit field. Even then, it is possible to define a table with arbitrary fields: field_1, field_2, etc. This gives you a cleaner relational design, although still not ideal. Whether this is preferential to a bit field is largely a matter of opinion, since neither solution is ideal.
If you know what the bits represent during development, then create fields for each bit and give them meaningful names.
Just be careful of the inner platform effect. If you end up defining arbitrary but well-typed fields that is one thing, but if you go too much farther than that you will be reinventing a relational database... inside of a relational database.
I am ambivalent about bitmasks. I find most of their detractors do not understand binary and hexadecimal. For clarity, use good mnemonics.
An advantage not mentioned above is the ability to add new meaning to bit masks without the potentially time-consuming addition of a new column. Our db designers (that preceded me) have them in a table that now gets 5 million new records daily. Adding a new column to represent a new behavior would take a long time, while defining a new bit (we've consumed 33 of 64) requires no table rebuild.
No, bit masks cannot be indexed but building 33 indexes would be ridiculous and would slow insertions to a crawl. Table searches use the dates & record "owners" indexes, hence indexes on this bit mask, if possible, would never be used.
- 21
If the goal is just to save some disk space, I think it's a bad idea:
- look at the cost of the GB today,
- compare it to the cost of the time of those who write reports and querries and have to figure out what's in the field, and how to address a specific bit, the cost/benefit comparison might end on the wrong side.
- if you're working with an SQL database, the additional bit access operations required in many querries might also consume more computing time than necessary
However there are some cases, that can jusitfiy the use of bit fields :
- if your bits represent a complex set of flags that you always handle together as a whole,
- even more if you need to apply some pattern matching algorithms on these sets,
- and especially if this data is not amongst the most frequently used selection criteria.
- 81,699
I used to work with a database when bitmask flags were used. The database stored information about call records (CDR) and contained more than 500 millions records in calls table. A call itself had a few attributes like incoming/outgoing, internal/external, etc. So for these attributes we used bitmask flags. In that case it saved space but these flags were actively used in reports so it was really hard to read.
Another example is a project I'm working on right now. I store about billion URLs in PostgreSQL database and I use URL normalization to avoid duplicates and save some space. For example, I don't store http:// or https:// prefix and it saves a few bytes per every URL. Also, I remove trailing slash from the URL. I save information about how URL was normalized in bitmask smallint field. Otherwise I would have to create about 10 boolean columns in my table.
So my point is that sometimes it can be reasonable but, of course should be well documented.
P.S. If you try to save some space by storing values in bit flags please don't forget that some RDBMS (like PostgreSQL) use alignment when storing data in tuples and you should consider column order to make this work.
- 111
- 2