The Science Commons has formally recommended how resources such as Tranche can mark data as either open-access or restricted. Please see the main documentation page for more information. This page summarizes how Tranche hashes reflect the licensing terms specified for an uploaded data set.
The key point: Tranche adds value to CC0-licensed data sets because it provides a proper citation that formally verifies the data is CC0 licensed, verifies that the data hasn't changed since publication, and allows access to the data from any computer without worrying about 'link rot'.
The following FAQs elaborate on these points.
Tranche brings one very significant feature to explicit licenses, such as CC0, that doesn't normally come with the license. Typically, licensing information is included as a text file that is part of a code or data distribution (e.g. a license.txt file in a ZIP). This in no way assures a user that the data, the license text, or any other file in the distribution hasn't changed. For example, if a scientific manuscript cites an open access data set, there may be no way to formally check whether the distributed data matches the citation. It is trivial to change the files in a ZIP and recreate what appears to be an original.
Tranche provides both a proper citation and a proper check that ensures data hasn't changed since publication; the term "Tranche Hash" refers to these features. If you are familiar with computer science terminology, it is also fair to call a Tranche Hash a "checksum". The hash is computed using standard algorithms, and it can easily be reproduced using either the Tranche code or a custom script. Thus, Tranche significantly increases the value of a CC0 license (or similar) if used as recommended below.
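To make the "checksum" idea concrete, here is a minimal sketch of how a custom script might check a file against a cited value. It assumes plain SHA-256 as a stand-in; the actual Tranche Hash format is not described on this page, so the function names `file_checksum` and `matches_citation` and the file name `dataset.csv` are hypothetical and illustrate only the principle.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks.

    SHA-256 is an assumption for illustration; it is not claimed to be
    the algorithm or encoding that Tranche itself uses.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_citation(path, cited_checksum):
    """Check whether a file still matches the checksum given in a citation."""
    return file_checksum(path) == cited_checksum.strip().lower()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        data_file = Path(workdir) / "dataset.csv"  # hypothetical data file
        data_file.write_bytes(b"sample,intensity\nA,1.0\n")
        cited = file_checksum(data_file)  # value an author would publish

        print(matches_citation(data_file, cited))  # True: file is unchanged

        # Any edit after publication, however small, breaks the match.
        data_file.write_bytes(b"sample,intensity\nA,9.9\n")
        print(matches_citation(data_file, cited))  # False: file was altered
```

Because the checksum depends only on the file's bytes, anyone holding the citation can repeat the check independently of whoever distributes the data.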
Do you have to use a Tranche Hash to properly take advantage of CC0-style licenses? Certainly not. Will a Tranche Hash hold up in a court of law if the licensing terms of the data are questioned? There is no guarantee; however, it is potentially the most technically sound defense one could provide. "Technically sound" is not meant to imply "the best legal strategy".
Tranche hashes are intended to keep data distributors honest. There can be no fiddling with data sets after publication without leaving a noticeable mark. When Tranche adds licensing terms to a project, it includes them as a plain-text file named license.txt. Thus, when a Tranche hash is calculated, it takes the license.txt file into consideration.
If the licensing terms are changed, a different license.txt file is generated. The different file means that a different Tranche hash will be generated.
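The effect of including license.txt in the hash can be sketched as follows. This is an illustration under stated assumptions, not Tranche's implementation: `project_checksum` is a hypothetical helper that feeds every file name and its content into a single SHA-256 digest, and the file names and contents are invented for the example.

```python
import hashlib


def project_checksum(files):
    """Return one hex digest covering every file in a project.

    `files` maps a relative file name to its content as bytes. How Tranche
    actually combines files is not described here; this scheme (sorted
    names, length-prefixed fields, SHA-256) is an assumption.
    """
    digest = hashlib.sha256()
    for name in sorted(files):  # sort so the result ignores insertion order
        encoded_name = name.encode("utf-8")
        content = files[name]
        # Length prefixes keep the boundaries between fields unambiguous.
        digest.update(len(encoded_name).to_bytes(8, "big"))
        digest.update(encoded_name)
        digest.update(len(content).to_bytes(8, "big"))
        digest.update(content)
    return digest.hexdigest()


# Hypothetical project: one data file plus the generated license.txt.
original = {
    "data/spectra.csv": b"mz,intensity\n100.1,42\n",
    "license.txt": b"CC0 1.0 Universal\n",
}

# Same data, different licensing terms.
relicensed = {**original, "license.txt": b"All rights reserved\n"}

# The data file is byte-for-byte identical, yet the checksums differ.
print(project_checksum(original) == project_checksum(relicensed))  # False
```

Since the license text is part of what is hashed, a citation of the original hash cannot be made to match a copy whose licensing terms were quietly changed.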
We recommend two things if you'd like to cite a Tranche hash and make an affirmation regarding open access availability.
The key change is the explicit affirmation that the data is both open access and under a particular license. This provides a human-readable way of conveying that the data is open access. Realistically, the Tranche hash alone doesn't convey this in a form that non-Tranche users would automatically understand.