4

I'm trying to decide what format I should save data in, and hopefully this question will help me come to a decision. The program I'm writing uses 'databases' in which word=meaning (not a dictionary, that was just an example). I don't really want people to be able to easily edit these 'databases' manually without my program, but at the same time I'm looking for interoperability between programming languages (namely C#, PHP, Java and Objective-C).

At the minute, the main contenders are SQLite, XML and SQL Server CE (although PHP doesn't have support for Server CE, but that doesn't have to be too much of an issue). Really, my question is: should I use encryption at all, given that I have read it is often just as easy to decrypt? And if you do recommend encryption, how do you deal with the encryption key? Do I just generate a long, complicated string?

Or is binary enough? Nothing in the database is too critical; however, I would like to have an isEditable 'key' so my program knows whether to allow edits or not. And I'd also like to use my own file extension.

Andy
  • 372

5 Answers

6

I'd say no, you should not encode them, for multiple reasons:

  • Having them more open is less work for you. Does your business need to spend the extra work hiding it?
  • Having it encrypted creates lock-in. Lock-in is not a selling feature.
  • If your app decrypts it, that means that the encryption can be cracked by examining your app, so you've just done work for little gain - as soon as one person cracks it, anyone can repeat it.
  • Having it open empowers your users to do things with the data you did not foresee and did not add to your app - this means that you've given them extra utility.
  • Open data will make future partnerships easier because a partner will not need to write handling logic.
  • Open data does not mean that competing tools using that data will magically appear - competitors still need to write their own software, which is likely not as good a proposition as just using yours - how much effort will someone want to go to just to NOT use your software?

This is by no means a complete list, just what's off the top of my head.

Daenyth
  • 8,147
5

One really important fact that people always overlook about encryption is that it does absolutely nothing to ensure the integrity of your data. Encryption solves the problem of confidentiality; it prevents unauthorized users from reading the data. Encryption will make it harder for someone to modify the data in a useful way, but it isn't solving the right problem.

If you're worried about other programs changing your data, what you actually need is a trusted fingerprint of your data. A common way to do this is to generate a hash of the database when your program modifies it and then sign that hash with a private key. When you load the database, compute the hash, compare it to the signed hash, and verify that the signature is valid.
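A rough sketch of the save-then-verify flow described above, in Python. Note the substitution: the standard library has no public-key signatures, so this uses an HMAC (a keyed hash) as a symmetric stand-in for "hash, then sign with a private key". `SECRET_KEY`, `make_tag` and `is_untampered` are hypothetical names, and a key shipped inside the application can be extracted by anyone who examines it, so treat this as a deterrent against casual edits only.

```python
import hashlib
import hmac

# Hypothetical key for illustration. Anyone who extracts it from the
# application can forge tags, so this only deters casual editing.
SECRET_KEY = b"example-key-embedded-in-the-app"


def make_tag(data: bytes) -> str:
    """Return a keyed SHA-256 fingerprint (HMAC) of the database bytes."""
    return hmac.new(SECRET_KEY, data, hashlib.sha256).hexdigest()


def is_untampered(data: bytes, stored_tag: str) -> bool:
    """Recompute the fingerprint on load and compare in constant time."""
    return hmac.compare_digest(make_tag(data), stored_tag)


database = b"word=meaning\nisEditable=false\n"
tag = make_tag(database)  # stored alongside the database on save

print(is_untampered(database, tag))  # True
print(is_untampered(database.replace(b"false", b"true"), tag))  # False
```

With a real signature scheme the application would only need the public key to verify, which is the main advantage over the shared-key version shown here.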

Now that the matter of theory is out of the way, it is worth noting that, in practice, encrypting your data might well be good enough for this kind of thing. Any blind modifications to encrypted structured data will likely corrupt the database, so that will prevent users from modifying the database using anything other than your application. If you also want to make sure no one else can read your data (it wasn't clear in the question), then you're probably safe to leave it at encryption and be done with it.

Dan Albert
  • 536
1

Do you want to limit tampering or limit the view of the data? The potential solution for each would require slightly different implementations.

For tampering, you would use hashes. As a real-world example, in one application I worked on we computed the hashes of strings and/or files. When users downloaded a file, we checked the hash. When a user opened the file, we checked the hash again. We did this because our client didn't want end users tampering with the files (jpg, etc.). If they did tamper with a file, the hashes didn't match and they got an error. Any file or string can be hashed, so if you don't want users tampering with strings or files undetected, store a hash of that data along with the string or file. Passwords do not need to be encrypted; the solution is to salt the password and then hash it. That way neither the data store nor the application has any knowledge of user passwords.
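For the salt-then-hash point, here is a minimal sketch in Python using PBKDF2 from the standard library. The function names, the 16-byte salt and the iteration count are assumptions for illustration, not what our application used.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor; tune for your hardware


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is made when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def check_password(password, salt, expected_digest):
    """Re-derive the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)


# Only the salt and digest are stored, never the password itself.
salt, digest = hash_password("correct horse")
print(check_password("correct horse", salt, digest))  # True
print(check_password("wrong guess", salt, digest))  # False
```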

Encryption would be used to hide information such as credit card and/or personal data. Legal and regulatory requirements should define what needs to be encrypted. If the application is storing sensitive data, then one of the ways to prevent access is via encryption. You can also limit access by defining ACLs etc., but if you only want your user(s) to view the data, then encryption is your best option.

Also, only implement this if required. Going back to the example above, originally we did not have any tamper resistance in the application. Users downloaded files, then opened and viewed them locally. Only after a customer required that users couldn't tamper with files did we implement a hashing check. During design we considered two options: limiting file system access to the downloaded files, or using a hash. We chose a hash because we were already calculating it on upload, so it was relatively easy to add a check before the user could view the file.

Jon Raynor
  • 11,773
1

As I understand your question, you don't really care if someone modifies or looks at your database, it's just you'd rather they didn't.

Daenyth makes a good argument for why you shouldn't do that, but if you decide to go with it anyway, consider a few things:

  • Crypto is hard to get right

Even if you do encrypt your data with AES or what have you, the value of the data doesn't really call for encryption/protection. It may also make it harder to read the data in the future in a different language or even with a different library.

  • You are better off with obfuscation

which means a simple Base64 will do. Of course that will do nothing to verify the integrity of the data as Dan noted but implementing signatures is still too costly. As an alternative, you might add padding to your file to conform with a set of fixed checksums (for some checksum function). Much lighter than signatures but still quite expensive.
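To give an idea of how little the Base64 route involves, here is a small Python sketch. The one-pair-per-line `word=meaning` layout and the function names are assumptions for illustration; Base64 is an encoding, not encryption, so anyone who recognises it can decode the file.

```python
import base64


def obfuscate(entries: dict) -> bytes:
    """Join word=meaning pairs into lines and Base64-encode the result."""
    text = "\n".join(f"{word}={meaning}" for word, meaning in entries.items())
    return base64.b64encode(text.encode("utf-8"))


def deobfuscate(blob: bytes) -> dict:
    """Reverse obfuscate(); splits on the first '=' of each line only."""
    text = base64.b64decode(blob).decode("utf-8")
    return dict(line.split("=", 1) for line in text.splitlines())


entries = {"hello": "a greeting", "isEditable": "false"}
blob = obfuscate(entries)

print(deobfuscate(blob) == entries)  # True
```

Every language in the question has a Base64 routine, so the interoperability cost is close to zero.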

If in the future you need to have a trusted connection to a remote database, ask again when you implement it.

As for SQLite vs XML, there is a question on Stack Overflow on the same topic. Although closed, it has some excellent answers.


Edit: If you are using Java you might want to consider zipping the file. You get both an illegible binary blob and a checksum. I'm sure C# also has a version of this but I'm not familiar with that language.
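The same idea sketched in Python rather than Java, since the ZIP format is the same everywhere. The entry name `entries.txt` and the helper names are made up for the example. ZIP stores a CRC-32 per entry, and reading the entry fails when the data no longer matches it. A CRC catches accidental or naive edits, not a deliberate one that also rewrites the checksum.

```python
import io
import struct
import zipfile
import zlib

ENTRY = "entries.txt"  # hypothetical name of the single entry in the archive


def pack(text: str) -> bytes:
    """Write the text as one deflated entry and return the archive bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(ENTRY, text)
    return buffer.getvalue()


def unpack(blob: bytes) -> str:
    """Read the entry back; zipfile verifies the stored CRC-32 while reading."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return archive.read(ENTRY).decode("utf-8")


def is_intact(blob: bytes) -> bool:
    """True when the entry decompresses and its CRC-32 matches."""
    try:
        unpack(blob)
        return True
    except (zipfile.BadZipFile, zlib.error):
        return False


def corrupt_one_byte(blob: bytes) -> bytes:
    """Flip one byte inside the compressed data to simulate a manual edit."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        info = archive.getinfo(ENTRY)
    # The local header is 30 bytes, then the file name and extra field.
    name_len, extra_len = struct.unpack_from("<HH", blob, info.header_offset + 26)
    start = info.header_offset + 30 + name_len + extra_len
    position = start + info.compress_size // 2
    damaged = bytearray(blob)
    damaged[position] ^= 0xFF
    return bytes(damaged)


text = "word=meaning\nisEditable=false\n"
blob = pack(text)

print(unpack(blob) == text)  # True
print(is_intact(corrupt_one_byte(blob)))
```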

rath
  • 886
1

If all you're looking for is to discourage external editing, while providing for interoperability among different implementation languages, consider simply compressing the file. It'll look like line noise if opened in a text editor, but if you use a standard compression algorithm you should be able to find (or write) a file compression library for most popular languages. You even get the benefit of saving storage space, if your data sets get large.
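As a sketch of what that looks like in practice, here is a Python round trip using gzip. The `.mydb` extension and the `save`/`load` names are placeholders. gzip is a standard format, so a file written this way should be readable from the other languages mentioned in the question using their own gzip libraries.

```python
import gzip
import tempfile
from pathlib import Path


def save(path: Path, text: str) -> None:
    """Compress the text with gzip and write it to the given path."""
    path.write_bytes(gzip.compress(text.encode("utf-8")))


def load(path: Path) -> str:
    """Read the file back and decompress it to text."""
    return gzip.decompress(path.read_bytes()).decode("utf-8")


entries = "word=meaning\n" * 1000  # repetitive sample data

with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "words.mydb"  # hypothetical custom extension
    save(path, entries)
    stored_size = path.stat().st_size
    restored = load(path)

print(restored == entries)  # True
print(stored_size < len(entries))  # True
```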

TMN
  • 11,383