I will have to be brief. If you need to follow up, let me know.
I assume US law governs throughout; an inaccurate but necessary assumption here.
If you extract only the actual coordinate data you have no copyright liability. One cannot copyright facts, only the expression incident to factual reporting. This principle was recognized by the US Supreme Court in 1915 with respect to news reports sent by telegraph. The idea/expression distinction has been held by the Supreme Court to prevent assertion of copyright over telephone white pages, where there is no originality in the concept of alphabetic organization of data. More complex forms of association or organization of data might give rise to claims.
You should move quickly. Proposals for database protection in the US and Europe will close up vast areas of human knowledge within the next decade. Make this data free soon, or you risk losing the chance. How to license your data so that everyone is compelled to make free their improvements or accessions to it is another subject.
Eben Moglen
Professor of Law, Columbia Law School, 435 West 116th Street, NYC 10027
General Counsel, Free Software Foundation
voice: 212-854-8382, fax: 212-854-7946, firstname.lastname@example.org
A - Good practices
We encourage crystallographers to submit CIFs containing unit cell parameters, coordinates and (when available) atomic displacement parameters (etc) for materials they are interested in seeing in the COD. Researchers may submit CIFs from their own published work, from soon-to-be submitted papers or even for structures that will not be published. Researchers are also encouraged to submit CIFs generated from any paper published in the open literature.
Be aware that if you submit data prior to publication, a journal may consider your work already published and reject your manuscript. So check the journal's preprint policy first.
On the creating/copying/extracting operations involved in generating CIFs :
- The best case is an original CIF as created by the structure refinement software used by the researcher, checked and completed with the literature reference and useful crystallographic details.
- For CIFs generated from any paper published in the open literature, viewpoints may differ. From Alan Hewat : "No permission is needed if the data is extracted from the original publication". However, we are aware that "extracting" can be the subject of lengthy discussions, and the concept of "original publication" is also a matter of endless controversy. Scribes and medieval copyists had only one way, by hand : "manuscript" had to be taken sensu stricto, using quills, papyrus, etc. Gutenberg provided the way to make numerous identical copies. Modern tools can reproduce a digital text with one command typed on your computer keyboard : "copy file.cif newfile.cif" (or with a few mouse clicks). That newfile.cif can then be edited and changed, either manually or automatically by software written for that purpose. The "original publication" can be the printed paper, from which two approaches are already possible : typing the CIF letter by letter and number by number on your computer keyboard, or scanning the crystal data part, then checking the quality of the scan and correcting any errors. But nowadays the "original publication" can also be a PDF file. From such a PDF a simple copy-paste is now possible, quickly completing a CIF skeleton with the crystal data.
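To make the idea of "completing a CIF skeleton" concrete, here is a minimal sketch in Python. It is not a COD tool. The function name `build_cif` and the cell and atom values are hypothetical placeholders for illustration, and only a handful of standard core CIF data names are written (cell lengths, cell angles and an atom-site loop). A real submission would also need the reference, symmetry and other details mentioned above.

```python
def build_cif(block_name, cell, atoms):
    """Return the text of a minimal CIF data block.

    cell  -- dict with keys a, b, c (angstroms) and alpha, beta, gamma (degrees)
    atoms -- list of (label, x, y, z) tuples with fractional coordinates
    """
    lines = [f"data_{block_name}"]
    # Unit cell parameters, using core CIF dictionary data names.
    for key in ("a", "b", "c"):
        lines.append(f"_cell_length_{key} {cell[key]:.4f}")
    for key in ("alpha", "beta", "gamma"):
        lines.append(f"_cell_angle_{key} {cell[key]:.2f}")
    # Atomic coordinates go into a loop_ with one row per atom site.
    lines += [
        "loop_",
        "_atom_site_label",
        "_atom_site_fract_x",
        "_atom_site_fract_y",
        "_atom_site_fract_z",
    ]
    for label, x, y, z in atoms:
        lines.append(f"{label} {x:.5f} {y:.5f} {z:.5f}")
    return "\n".join(lines) + "\n"


# Placeholder values for illustration only; this is NOT a real structure.
example_cell = {"a": 5.0, "b": 5.0, "c": 5.0,
                "alpha": 90.0, "beta": 90.0, "gamma": 90.0}
example_atoms = [("X1", 0.0, 0.0, 0.0), ("X2", 0.5, 0.5, 0.5)]

cif_text = build_cif("example", example_cell, example_atoms)
print(cif_text)
```

Whatever route is used to obtain the numbers (typing, scanning or copy-paste), the resulting file should still be checked against the source for typos before submission.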
B - Questionable practices
Is copy-paste from a PDF file a questionable practice ? We are not absolutely sure. This is a new digital form of the "original publication." Some other special cases can be considered here :
- Atomic coordinates are not always included in the original manuscript.
In such a case, there is generally a footnote in the manuscript stating that
the atomic coordinates are freely available from CCDC or FIZ, for data
deposited at CSD or ICSD respectively. In that case, it appears to us that the
"original publication" is really the data you may obtain freely from
CCDC or FIZ by providing proof that you possess the paper (or PDF)
version (the data will be sent only if you can give the deposition number
and the full paper reference). You will receive a CIF which will be exactly
the one included in the commercial database. Can you send it to the COD
? We would say yes (but check it for typos) because the "original manuscript"
from which you can copy without permission is really that CIF alone. Moreover,
even if you are not a regular user of CSD and ICSD, these data will be
provided to you by CCDC and FIZ, free of charge. This is the result of
an agreement between journals and databases. Contact details for these
free-access data follow :
- The CCDC may provide you with your own CSD CIFs (see also the new enCIFer software).
- For ICSD : "Detailed data on crystal structures and refinement can be obtained from the Fachinformationszentrum Karlsruhe, D-76344 Eggenstein-Leopoldshafen, Germany; E-mail : crysdata@FIZ-Karlsruhe.de, on quoting the names of the authors, the literature citation and the depository numbers."
- Another special situation arises with journals that make CIFs freely available at their Web sites (IUCr journals for instance, since 1991). You may be a co-author of the paper, so that you made that CIF yourself. You could have lost it in a computer crash, and we do not see why the journal would not allow you to recover it there and send it to the COD. But who knows ? Now, if you are not a co-author of that paper, are these CIFs equivalent to the "original manuscript", and can you get them and deposit them in the COD if your university library has subscribed to the journals ? Do you lose that right if the journal is not in your university library ? We have no clear answer here (the COD sent a query to the IUCr about obtaining these CIFs, and the question is being examined by the IUCr Executive Committee). It should be noted that the commercial databases have access to these CIFs without restriction (to our knowledge), so why not the COD ?
C - Practices to be discouraged
Copying data directly from commercial databases is prohibited by the database owners (as a user, you may have signed a license, though some old CD-ROMs may have no license text inside). See for instance the ICSD conditions of use, and then perhaps reread the email from Prof. Eben Moglen.
D - What is an ideal CIF ? The Cartesian view and a disclaimer
An ideal CIF is a file free of any typos that presents a high-quality
crystal structure. Taken to its logical conclusion, this means that
there would be no way to distinguish an ideal CIF in the COD from an ideal
CIF in the CSD, ICSD or CRYSTMET commercial databases. A crystallographer
who decided to copy a CIF from the commercial databases and removed the
typos would thus have illegally built an ideal CIF indistinguishable from
the ideal CIF typed from the printed literature, obtained by copy-paste
from a PDF file, etc. Without any possibility of detecting such fraud,
the COD cannot be considered in any way responsible for the possible use
of bad practices by crystallographers.
NIH policy : "the new NIH policy requires that atomic coordinates from X-ray crystallographic and nuclear magnetic resonance experiments that were supported by NIH grants to be deposited into the appropriate structural database at the time of submission of a research article drawing conclusions from these data. This information should be released immediately at the time of publication."
policy at the Royal Society.
Report title : "Keeping science open : the effects of intellectual property policy on the conduct of science. Science relies on the free and rapid exchange of ideas and information. Intellectual Property Rights (IPRs) can protect creative work and investment in all areas, but may also restrict this exchange. This report considers whether the progress of science has been affected by the interpretation and use of IP policies, and makes recommendations for improvement." (...)
"We recommend that scientists ensure that any publicly funded data that are made available to private databases are done so non-exclusively, and that at least one repository of the information is liberal regarding access to and use and manipulation of the data."(...)
"The House of Lords inquiry concluded that there were IP issues to be resolved: What role should private databases play in the information chain? Should private databases be allowed to charge for information that is in the public domain or publicly funded ? Should publicly funded databases charge for access ?"
"We recommend significant Government support for the organisation, publication and maintenance of data that it has funded, which might otherwise be or become inaccessible. Since the cost of scientific information is high, and the value added by proper access is great, it makes no sense to allow the value of publicly funded data to be constrained by limitations to access in private databases.(...) We recommend that databases with public funding be readily accessible, and be either free or the charge merely be the cost of permitting access or of supplying the information. It may not be appropriate to recover even the cost of supply, since for non-material transfers the administrative cost of collection normally outweighs the value of at-cost revenue. It is particularly important for science in developing countries that access to databases by their scientists is free."
See Open Science.
access to reagents and structural coordinates? The Nature journal
"we strongly encourage immediate release of coordinates, and in fact we find that most authors do choose to release their coordinates upon publication. It is now time, especially given the current discussions regarding increased access to all scientific information, to re-examine our policy, to see if it makes sense to dispense with the hold altogether — and we will be looking at this issue over the next few months."
McMahon, IUCr CODATA Representative saying :
"I feel that the IUCr should certainly consider implementing an OAI-PMH based data server, and perhaps also run harvester software. Among the possible applications are:
* By offering metadata records in the PubMed (and other) formats we could optimise the transfer of our metadata to arbitrary linking partners.
* We could harvest non-published materials such as theses, providing in the first instance a web catalogue of theses in crystallography, subsequently perhaps providing links from article reference lists to theses and other such reports.
* It could provide a possible route for limited access to databases such as CSD, which do not currently offer web access.
* More speculatively, it is a technique we might persuade generators of crystallographic data sets (synchrotron laboratories, service crystallography facilities) to adopt so as to auto-catalogue and provide access to such primary data."
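For readers unfamiliar with the OAI-PMH protocol named in the quotation above, a harvest request is an ordinary HTTP GET carrying a `verb` parameter. The sketch below only builds such a request URL. The base URL `https://example.org/oai` and the function name `list_records_url` are hypothetical, no IUCr or COD endpoint is implied, and no network call is made. The `ListRecords` verb, the `metadataPrefix`, `from` and `resumptionToken` arguments, and the `oai_dc` format are part of the OAI-PMH specification.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (no request is sent)."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: it replaces all others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            # Selective harvesting: only records changed since this date.
            params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository address, for illustration only.
BASE_URL = "https://example.org/oai"

first_page = list_records_url(BASE_URL)
since_2003 = list_records_url(BASE_URL, from_date="2003-01-01")
next_page = list_records_url(BASE_URL, resumption_token="abc123")
print(first_page)
```

A harvester would fetch the first URL, parse the XML response, and keep requesting with the returned resumption token until the repository stops supplying one.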
See the Reciprocal Net - and if you send your crystal data there, why not to the COD too ? The Reciprocal Net project is constructing and deploying an extensive distributed and open digital collection of molecular structures.
See the Public Library of Science.
new model for the knowledge economy" :
"...the shift to open-access publishing is interesting. For a start, it is real, and it entails a fundamental shift of IP ownership, from the publisher back to the author. Not only that, the shift is accompanied by a concomitant upending of the reward system. Unlike the publishers, academic authors do not wish to profit from owning intellectual 'property', but rather to benefit the wider research community by allowing free access to the content ad infinitum. The rewards they do reap are the same as always: exposure, recognition within their circles, and credit for their achievements - not direct financial pay-back - since even the resultant increased access to resources for themselves only feeds back into the system. For this reason academics were perhaps always among the most likely to catalyse a new approach, but it is no less significant for this fact. The shift is revolutionary on a social as well as a business dimension: it proves not only that endeavours can be motivated by non-monetary rewards, but that indeed there is a strong drive to create systems to support this philosophy. "
"Meanwhile the International Council for Science (ICSU) has collected input from the international science and technology community, ahead of the first stage of the UN World Summit on the Information Society (WSIS), coming to Geneva later this year. ICSU's focus is on four key themes, which include: 'Ensuring universal access to scientific knowledge internationally' and 'Scientific data and information as a global public good'. UNESCO is also involved in this process, and states that it upholds 'universal access to information, equal access to education, cultural diversity and freedom of expression' as 'essential principles for developing equitable knowledge societies'. "
"Similar themes were addressed at a recent seminar entitled 'Knowledge: common heritage not private property,' organised by the UK's Scientists for Global Responsibility, in which a number of scientists discussed elements of a new collaborative paper entitled Towards a Convention on Knowledge and proposed alternative approaches to the processes of scientific enquiry, information-dissemination and assigning IP rights. "
(IP = Intellectual Property)
A form for building CIFs should soon (?) be available on the Web.
Visit the IUCr Web page to learn all about CIFs.
Ask Google about CIFs : the acronym does not exclusively mean "Canadian Institute of Forestry."
Sign the Petition for Open Data in Crystallography