I have worked for numerous companies that had a policy of not using any software, libraries, or datasets that would impose requirements on their product; this would include the attribution clause in the CC-BY licence.
Placing a dataset in the public domain will maximise its potential audience; whether or not this is more desirable than attribution is going to be a matter of opinion for those involved.
A major part of the problem has to do with gray areas of interpretation and the concerns, warranted or not, that they raise about license compatibility or other issues.
For example, the question of compatibility between the CC-BY and the GNU GPL is a relatively complicated one, and it boils down to "well, how do you read these two licenses?" and, secondarily, "are you afraid you might be sued?" My view is that they are compatible. The FSF's view is that they aren't. I would have no problem using CC-BY data along with GPL data as long as the FSF is not a copyright holder. As an example of the sort of problems that arise, I want to go into this disagreement; I can see two reasons why the FSF may see this license as incompatible with the GPL. IANAL, TINLA, but this is an exploration of the reasons disagreements can arise, and these disagreements (which can give rise to lawsuits!) are powerful disincentives to reusing data.
Sublicensing and License Changes
The first is that the FSF has long held that the GPL places a "relicensability" requirement on other programs regardless of modification. Stallman talks about "relicensing" as a requirement and uses this to explain why the MPL is not compatible with the GPL. On that reading, the fact that the CC-BY explicitly states the following (regarding verbatim copying) renders it incompatible with the GPL:
You may not offer or impose any terms on the Work that alter or restrict the terms of this License or the recipients' exercise of the rights granted hereunder. You may not sublicense the Work.
The issue is whether the GPL requires that other licenses can be converted to the GPL when the code is transmitted verbatim. Interestingly the predominant view of lawyers I have talked with is that the BSD family of licenses does not allow sublicensing either and therefore would be incompatible with the GPL for the same reason if this logic holds. (When I asked Eben Moglen however, he said he thought the BSD licenses allowed sublicensing which is, I think, why the FSF differs regarding these licenses.)
From the CC-BY:
You must keep intact all copyright notices for the Work and give the Original Author credit reasonable to the medium or means You are utilizing by conveying the name (or pseudonym if applicable) of the Original Author if supplied; the title of the Work if supplied; to the extent reasonably practicable, the Uniform Resource Identifier, if any, that Licensor specifies to be associated with the Work,
Copyright notices cannot be removed anyway.
This is at least arguably in keeping with the reasonable legal notices portion of the GPL v3.
In essence, I am unsure how the CC-BY can be incompatible with the GPL while the BSD license family (assuming no advertising clause) is compatible. Legal notices (including credit of authorship) have always been allowed under the GPL, at least by common practice.
Jurisdictions differ regarding whether data from databases of facts is subject to copyright and to what extent. This adds another layer of complexity in determining what real restrictions are at issue.
In the end the issue is more basic:
The more a business has to look into these, the more they are burdened by basic questions of whether they can re-use the data. If they even have to ask the question, this involves asking what the motivations of the licensors are, how they interpret some of these gray areas, what might happen in court in whatever jurisdictions are relevant, whether fear of a lawsuit makes this out of the question, and much more. In general, the more folks have to ask, the more they are going to look for other sources.
These questions are not trivial; they are fact-bound. They require hiring attorneys, possibly asking counterparties how they interpret things, tracking license compliance, and everything else one tries to get away from by going with open data.
The legal text of CC-BY is quite complex, and has more terms than simple attribution, such as not implying the author endorses your work. Although terms such as this may be useful, they make the license incompatible with other licenses (including the GPL: http://www.gnu.org/licenses/license-list.html#ccby). I'm not sure whether this includes any common open data licenses. This potentially prevents others from combining your data with other data.
CC0 on the other hand is very straightforward, and guarantees your dataset can be combined with any other dataset that someone has the rights to use.
Regarding this question, I think this post by Denny Vrandečić (the project leader of Wikimedia's Wikidata) is well worth reading: https://plus.google.com/104177144420404771615/posts/cvGay9eDSSK
Denny knows very well what he is talking about. I'll just quote the first sentences as a teaser:
tl;dr - If you publish data, attach the CC0 license to it, but that’s basically just advertising - don’t think it means anything. If you use data, you do not have to care much about the data license. If you republish data, it’s a bit more complicated, but not as horrible as you might think.
The main benefit is moving the attribution requirement from being a legal part of the license, i.e. something that you MUST do, to being a norm, i.e. something that you OUGHT to do (to be polite).
The reason why this is beneficial is because attribution can be difficult:
This adds friction when attempting to re-use data and may even dissuade people from using it.
By encouraging attribution to be a community norm, we can remove this source of friction and rely on attribution being made on a "best effort" basis.
This is what Creative Commons themselves have to say on the topic:
CC0, the public domain dedication, can also be used on databases. The effect is to waive all copyright and related rights in the database, placing it as close as possible into the worldwide public domain. In certain domains, such as science and government, there are important reasons to consider using tools like CC0. Waiving copyright and related rights eliminates all uncertainty for potential users, encouraging maximal reuse and sharing of information. Where waiver is not a viable option and some conditions on reuse are necessary, rights holders should consider using CC licenses that give the public more freedom to reuse and remix the content.
I have an application which reuses datasets from 50 sources. Should I update my "about" page if all these datasets would need attribution?
It's also very difficult to prove that a certain dataset is used. If you are a data publisher, are you going to sue if a developer doesn't give the right attribution for a certain application? If not, then it's not worth licensing your data under CC BY, since it will hold people back from reusing it (e.g. from sublicensing a mash-up of datasets).
You can however "ask" people to attribute you, but still just use a CC0 license for the ease of reusers.
Attribution causes obstacles to re-use, and depending on the actual attribution requirements may be prohibitive.
If you create a collection/mash-up of a large number of CC-BY datasets, you have to provide attribution to every single one of them, heeding their attribution requirements. In the best case, you will have to ship the result with a long list of attribution statements that may exceed the size of the actual product. In the worst case, certain sources require you to print their statement in certain places of the result.
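To give a feel for the bookkeeping this implies, here is a rough Python sketch that collects one attribution statement per CC-BY source in a mash-up. Everything in it is hypothetical: the `Dataset` record, the `attribution_lines` helper, the example.org URLs, and the wording of each statement are made up for illustration, and the fields are loosely modeled on the author/title/URI items in the CC-BY text quoted earlier. It is not a compliance tool; what a given licensor actually requires has to be checked source by source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Dataset:
    """Hypothetical record describing one source used in a mash-up."""
    name: str
    license: str                  # e.g. "CC0" or "CC-BY" (illustrative labels)
    author: Optional[str] = None
    url: Optional[str] = None


def attribution_lines(datasets: List[Dataset]) -> List[str]:
    """Return one credit line per CC-BY dataset; other licenses are skipped."""
    lines = []
    for ds in datasets:
        if ds.license != "CC-BY":
            continue
        parts = [ds.name]
        if ds.author:
            parts.append(f"by {ds.author}")
        if ds.url:
            parts.append(f"({ds.url})")
        lines.append(" ".join(parts) + ", licensed under CC-BY")
    return lines


# Fifty made-up CC-BY sources plus one CC0 source.
sources = [
    Dataset(f"Dataset {i}", "CC-BY", f"Publisher {i}", f"https://example.org/data/{i}")
    for i in range(1, 51)
]
sources.append(Dataset("Public-domain gazetteer", "CC0"))

credits = attribution_lines(sources)
print(len(credits), "attribution statements")
print(credits[0])
```

Even in this toy case the credits block grows by one line per CC-BY source, while the CC0 source adds nothing that has to be shipped.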
Take for example the OpenStreetMap project:
If you want to use this data in conjunction with a large number of other sources with similar requirements, you end up with a rather unsightly block of text in your map, which may be just a tiny reference frame on your website. According to the requirements, you cannot just hide the attributions behind a single "sources" link to another page.
That is correct.
Public (i.e. government) data doesn't need formal attribution or recognition, as it was already rewarded by giving a salary to those who put it together, although good practice dictates that the data provenance be preserved.
("Attribution", in contrast, is a type of currency for proper formation of the data ecosystem we're trying to create (as citizens), rewarding "privately"-curated data.)