3

The type of data I'm hoping to collect is a little specific, and unfortunately I'm under NDA and the data is a core part of the business plan, so I'm not at liberty to post it online. I've come up with a similar example, so please bear with me and pay no attention to the flaws of this hypothetical service.

Say I make an online "school finder" service, where the user enters their address and the service finds and scrapes the tuition cost of all schools within, say, 100 km of that address. The service would display the list of schools sorted by price, and the user would select the school they are going to attend based on that information. They would also specify any other contributing factors, such as the aesthetics of the campus and the reputation of the school (things like this would be selected from a drop-down menu or something similar, a small set of non-personally-identifiable options). The school finder would passively collect this data (as opposed to actively seeking out every school everywhere and scraping its qualities from its public web page) and use it both to provide better search results and to display a map centered on the user's region, representing that data somehow. It would point out, based on past usage, the most desired schools based on a number of factors (say, cheapest and most desired would be green, most desired for other reasons would be yellow, and seldom selected schools would be red).

Most of the data is just passively collected and is otherwise publicly available, but some of it (the reason the school was picked) is somewhat personal, and the user's home address is personal information as well. The only publicly visible data would be the "rating" of each school, based on a selected region. The only saved data would be the rating and cost of each school, as well as the postal code (or city, whatever) associated with that rating.

The service has no user accounts, so it's not feasible to ask every user to agree to a privacy policy for every request. Is it legal/ethical to collect and release this set of information without explicit consent?

Carson Myers
  • 2,480
  • 3
  • 24
  • 25

6 Answers

4

Is it legal/ethical to collect and release this set of information without explicit consent?

Check with a lawyer. That's what they are there for. However, if you want a hint, most web sites display a privacy policy that describes how user data is handled. Certain laws might require you to disclose such a policy. Furthermore, having a policy does not imply that you are not violating any law; that's why you should check with a lawyer.

As far as implementation of policies is concerned, it doesn't matter whether you find them difficult to implement or not. You can't just wish away certain provisions of a law because you don't like them. I suppose a lawyer would say the same thing, so the following point is moot.

The service has no user accounts, so it's not feasible to ask every user to agree to a privacy policy for every request.

TLDR: Check with your lawyer. If your lawyer says so, do so without hesitation.

waiwai933
  • 733
4

Legal issues aside, I'd like to consider the ethical standpoint, because even if something is legal you might still consider it unwise.

Usually privacy is considered to be breached if details are traceable to one individual (or a member of a certain family). And this certainly is the case if you use and save full addresses. But since you are searching quite a large area (100 km), perhaps it is OK to have a precision of less than the full address, for example the neighborhood. One way of doing this is allowing people to enter just their city or town, or (over here) a partial or full zip code, or a street name without a number.

A different approach is using a lat/lon calculation that rounds precision to 1km (and perhaps give people the option to extend this to 10, noting that it is more anonymous but less accurate.)
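To make the rounding idea concrete, here is a minimal sketch, not a definitive implementation. It snaps a coordinate to the centre of a grid cell roughly `cell_km` wide before anything is stored. The function name `coarsen_location`, the constant `KM_PER_DEG_LAT`, and the figure of about 111.32 km per degree of latitude are my own assumptions for illustration; a flat approximation like this gets less accurate near the poles.

```python
import math

# Assumption: one degree of latitude is roughly 111.32 km everywhere.
KM_PER_DEG_LAT = 111.32


def coarsen_location(lat, lon, cell_km=1.0):
    """Snap (lat, lon) to the nearest point of a grid about cell_km wide.

    Hypothetical helper: only the snapped values would be stored, so the
    exact address can no longer be recovered from the saved data.
    """
    lat_step = cell_km / KM_PER_DEG_LAT
    snapped_lat = round(lat / lat_step) * lat_step

    # A degree of longitude shrinks with latitude; use the snapped latitude
    # so every point in the same grid row shares the same longitude step.
    # The max() guard avoids a division by zero at the poles.
    cos_lat = max(math.cos(math.radians(snapped_lat)), 1e-6)
    lon_step = cell_km / (KM_PER_DEG_LAT * cos_lat)
    snapped_lon = round(lon / lon_step) * lon_step

    return snapped_lat, snapped_lon


# Default 1 km precision, or the more anonymous 10 km option.
fine = coarsen_location(52.370216, 4.895168)
coarse = coarsen_location(52.370216, 4.895168, cell_km=10.0)
```

Since the search radius in the question is 100 km, moving the user by up to half a cell in each direction should barely change which schools are found.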

For the reasons, you might opt for a set of standard items to select from (this will probably also simplify matching) and include an extra text field for additional information. The standard items can be used as anonymous info. For a text field there are several options; just explain next to the field what you are going to do with it: list it publicly but anonymously (and say where you list it, with the area or with the school), use it to extend your set of standard items, use it for private research (like going over the entries yourselves and distilling useful information out of them), etc.

For the users this is very simple: they can choose their privacy options by simply entering or not entering certain information, or by selecting their level very specifically.

As for all the legal implications I wouldn't know, but I personally consider an application that would implement those options very ethically sound, and I believe it is possible without really compromising on accuracy.

Inca
  • 1,534
1

If you collect and release any aggregate of information, you will need to check with a lawyer to confirm that what you're doing with the data is OK; certain types of data are protected no matter what, and the laws will vary from state to state and nation to nation.

You will also want to put a relatively easy-to-reach and easy-to-read privacy policy where a user can read it, with the links at the points where you collect the data, even if you don't force an agreement each time.

But in short, vet your plans with an attorney, and make sure you're in compliance with the laws where the data is hosted and processed as well as, as best you can, with the laws where the users are located. Make sure you publish a procedure whereby a user can get his data removed from your collection, if he can identify it at all.

Finally, don't be a weasel with mined data, because as Google and Facebook can both attest, even the rumor of data misuse can bite you back.

1

I'll volunteer the contrarian view here... Don't bother checking with a lawyer unless you've money to waste for shallow advice from someone who will charge you an arm and a leg.

If you're thinking you might end up doing some unethical stuff by collecting anonymized information, I cannot help but laugh at what you must be thinking about Google or Facebook.

Online advertisement businesses have successfully lobbied digital privacy rights into irrelevance on the west side of the Atlantic. The assumption in online interaction is opt-out, not opt-in. And even opt-out, to a large extent, is not enforced -- because it is unenforceable.

I've no Google account to speak of, nor have I ever allowed Google to collect any kind of information on me when logged out (which is always). But they do so extremely actively and aggressively on a daily basis. They do so through their own site when I search, through Google Analytics on untold numbers of sites, through my contacts' Gmail accounts, and many more. To serve me ads, no less.

So, please. If they can do it, common sense dictates that you can too. Add a brief notice in your site's legal section that says what you're collecting, how and why, make it clear that it's anonymous information, and be done with it.

1

Legal? Ask a lawyer.

Ethical? I'd say no, except where it is vital for the operation of your service that you collect this data, and you do everything you can to keep the data anonymous and confidential.

In fact, pretty much everyone who runs a web server does it, and not even anonymously - but it is generally accepted that access logs are more or less a necessary part of normal server operation. Ethics mandate you keep them stored in a safe way and handle them confidentially, and destroy them after a reasonable amount of time.

For things that go beyond standard web procedure, such as storing personal data and linking it to various other things, I'd say ethics mandate that you tell your users exactly what you are collecting and why, and you give them a chance to opt out before you do (even if this means they can't use the service).

tdammers
  • 52,936
1

Can't comment on the legal issue; that way lies madness, and it largely depends on where you are and how much your net worth is.

Ethically, if you can protect the privacy of the users, I don't see a problem, but doing that is harder than it looks. Remember not to release any data that doesn't have a large enough and diverse enough pool to obscure the individual. If it's just a rating, meh. But with the cost and the region in there, if there's only one person from Ohio in the theoretical college, then you'll know exactly what he paid. If 99 people give a good vote and just one gives a bad one, sometimes it'll be obvious to the people in the know who that one person is. Also, if you never release the cost of the school to a user, why store it at all?
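As a rough sketch of the "large enough pool" rule, the snippet below only publishes an aggregate for a (region, school) pair once enough submissions back it. The function name `publishable_ratings`, the tuple layout, and the threshold of 10 are assumptions made up for illustration, not a recommended value. It also only checks pool size; the "diverse enough" part (the one bad vote among 99) would need a separate check.

```python
from collections import defaultdict
from statistics import mean


def publishable_ratings(submissions, min_count=10):
    """Aggregate ratings per (region, school), hiding small groups.

    submissions: iterable of (region, school, rating) tuples (assumed shape).
    min_count:   hypothetical threshold; groups below it are not released.
    """
    groups = defaultdict(list)
    for region, school, rating in submissions:
        groups[(region, school)].append(rating)

    # Only groups with enough members are returned; a lone submitter
    # (the single person from Ohio) never shows up in the output.
    return {
        key: {"count": len(ratings), "mean_rating": mean(ratings)}
        for key, ratings in groups.items()
        if len(ratings) >= min_count
    }


sample = [("Ohio", "Example College", 5)] + [("Texas", "Example College", 4)] * 12
released = publishable_ratings(sample)
```

With the sample data, the Texas group is released and the single Ohio submission is withheld until more people from that region have voted.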

As a cop out, just tell them in big bold letters that this information will be public and the burden is theirs.

Technically though, if there's no account system and anyone can submit whatever they feel like, you can expect this system to be abused. Say, by a theoretical school giving themselves AAA ratings, or a disgruntled ex-professor down-voting his old school each day.

Philip
  • 6,762
  • 29
  • 43