This is a short how-to explaining how several people can correct CIF
files in the COD repository without collisions.

0. Initial checkout: check out a working copy of the COD repository:

    > svn co svn://cod.ibt.lt/cod

   (this may take a while, but needs to be done only once). A cod/
   subdirectory should appear.

1. Go to your cod/ subdirectory:

    > cd cod

2. Get the recent changes:

    > svn update

   or in a shorter form:

    > svn up

   ! This step is very important if you pick the files to be corrected
   yourself, since in this way you will not be correcting files that
   are being corrected by someone else.

   Then, change to the cif/ subdirectory:

    > cd cif/

3. Pick 10 or so files for yourself:

    > head -n 10 checks/symmetry-operators-for-manual-correction.lst > my.lst

4. Remove these ten files from the
   checks/symmetry-operators-for-manual-correction.lst list; assuming
   your favourite editor is 'nano', run

    > nano checks/symmetry-operators-for-manual-correction.lst

   and press Ctrl-K 10 times. Alternatively, use Unix tools to remove
   ten lines:

    > tail -n +11 checks/symmetry-operators-for-manual-correction.lst > tmp
    > mv tmp checks/symmetry-operators-for-manual-correction.lst

   Or even using one single command:

    > svn cat checks/symmetry-operators-for-manual-correction.lst \
      | tail -n +11 > checks/symmetry-operators-for-manual-correction.lst

   Note that 'tail -n +N' starts printing at line N, so to drop ten
   lines you need to specify one line more, i.e. +11.

5. Commit the checks/symmetry-operators-for-manual-correction.lst file
   into the repository so that others know which files are already
   being corrected and do not take them again:

    > svn ci checks/symmetry-operators-for-manual-correction.lst

   In the log message, write something like "Taking 10 structures for
   symmetry operator correction". There is no need to specify your name
   and date; these will be recorded automatically. You can inspect the
   Subversion log by typing:

    > svn log | less

   (use 'q' to exit the 'less' viewer).

   NOTE: the log of your last commit will appear only after your next
   'svn update'.
   This is because immediately after a commit, a Subversion working
   copy stays in a so-called 'mixed revision' state. The freshly
   committed files belong to the new revision, but all other files,
   including the working directory, stay at their previous revisions.
   Thus 'svn log', which works on the current directory, will show the
   log of the older revision. Only the 'svn up' command will bring all
   files and directories up to date.

5a. It can happen that someone else has taken the same structures at
   exactly the same time. In such a case Subversion will report a
   conflict. You should "resolve" the conflict by typing:

    > svn revert checks/symmetry-operators-for-manual-correction.lst
    > svn up

   If that does not help, try the following:

    > svn resolved checks/symmetry-operators-for-manual-correction.lst
    > svn revert checks/symmetry-operators-for-manual-correction.lst
    > svn up

   and repeat the picking of new files starting from step 3.

----------------- Fixing the structures ----------------------------------

6. If the commit went smoothly, you now have 10 structures at your
   disposal. Take the first one and check where the syntax error is:

    > cat my.lst
    ../2/2005706.cif
    ../2/2005944.cif
    ...
    > cif_CODify 2/2005706.cif > /dev/null
    /usr/local/bin/cif_filter: 2/2005706.cif: symmetry operator 'x+1/2,-y+1/2,z+1/2'' could not be parsed

   The '> /dev/null' is used since at the moment you are not interested
   in the cif_CODify result (which is quite lengthy: it is the whole
   CIF written to stdout); you only want to see the error message.
   /dev/null swallows all data directed to it and discards it.

    > nano 2/2005706.cif

   Navigate to the line specified in the error message and correct it.
   Check that the error message has disappeared by running 'cif_CODify'
   again:

    > cif_CODify 2/2005706.cif > /dev/null

   Inspect the result of the 'cif_CODify' program:

    > cif_CODify 2/2005706.cif | less

   Pay attention to the space group symbol the cif_CODify program
   produces:

    _symmetry_space_group_name_Hall '-P 2yn'
    _symmetry_space_group_name_H-M 'P 1 21/n 1'
    ...
    _cod_cif_authors_sg_H-M 'P 21/n'

   cif_CODify attempts to determine the space group symbol from the
   symmetry operators if those are present; it uses a full
   Hermann-Mauguin symbol; the original authors' symbol, if different,
   is left in the _cod_cif_authors_sg_H-M tag. The two symbols should
   be compatible. If they are not, please note this fact in a
   _cod_depositor_comments multi-line text field and set the
   _cod_error_flag value to "warnings" or "errors" depending on the
   severity of the difference ("errors" should be used when the CIF
   will definitely produce incorrect results in crystallographic
   calculations; "warnings" means the current CIF could be fixed to
   produce obviously correct results in any calculations, but it then
   differs from the original authors' file, which most probably
   contained a mistake). The obvious compatible changes, like
   'P 21/n' -> 'P 1 21/n 1', do not need any comments or error flags;
   they are essentially correct.

   (use 'q' to exit the 'less' viewer).

   At this time, *do not* overwrite the cif file with the result from
   cif_CODify; just correct the syntax so that the file can be
   processed. When all manual corrections are finished, the CIFs will
   be reformatted with cif_CODify in a concerted manner.

   Fix all structures in the list in a similar fashion.

   If in the process you have made some bad corrections and want to
   undo them, the command

    > svn revert 2/the-cif-file-name.cif

   will restore the latest checked-out revision; all changes to that
   file will be lost.

   You can always inspect the changes you have made using the

    > svn diff

   or

    > svn diff ./2/2005706.cif

   commands.

7. When all structures are fixed in one way or another, commit your
   work to the repository:

    > svn ci

   Subversion will open a text editor for editing a log message; say in
   the log message that you have fixed 10 structures, and record any
   problems if they arose.
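   Before the commit in step 7, it may be worth confirming that every
   file in my.lst now passes the syntax check. The following is only a
   sketch under stated assumptions, not a COD tool: the function name
   check_list and the variable CIF_CHECKER are invented for this
   example, and the entries in my.lst are assumed to be relative to
   checks/ (as in the listing in step 6), so the leading '../' is
   stripped to make them relative to cif/. It relies on the behaviour
   described in step 6, namely that error messages survive when stdout
   is sent to /dev/null.

```shell
#!/bin/sh
# Sketch: report which files in a list still make the checker complain.
# CIF_CHECKER and check_list are hypothetical names for this example.

CIF_CHECKER=${CIF_CHECKER:-cif_CODify}

check_list() {
    list=$1
    failed=0
    while IFS= read -r entry; do
        [ -n "$entry" ] || continue      # skip blank lines
        file=${entry#../}                # ../2/x.cif -> 2/x.cif (assumed layout)
        # Keep only the error messages: stderr goes to the capture,
        # stdout (the whole reformatted CIF) is discarded.
        messages=$("$CIF_CHECKER" "$file" 2>&1 >/dev/null </dev/null)
        if [ -n "$messages" ]; then
            printf 'STILL FAILING: %s\n%s\n' "$file" "$messages"
            failed=1
        fi
    done < "$list"
    return "$failed"
}
```

   Run from the cif/ directory, 'check_list my.lst' prints nothing and
   returns 0 when no file produces a message; otherwise it lists the
   files that still need attention and returns 1.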
   You can set your favourite editor (assuming it is 'nano') using the
   following command:

    > export EDITOR='nano'

   or define the editor to be Subversion-specific:

    > export SVN_EDITOR='nano'

   You can also use commands with options for defining editors:

    > export EDITOR='emacs -nw'

9. If there are still unfixed structures, go back to step 2 to update
   your directory and continue with the corrections. You will also
   receive other people's corrections if any have been made.

   Note on the corrections of loops: errors in CIF loops are especially
   nasty and difficult to fix. The CIF parser can only report that a
   loop has a wrong number of elements, but not the exact position of
   the error. Sometimes the only way to find the location is to remove
   half of the loop lines (say, the first half), save the file and
   check its syntax. If the error is gone, the problem is in the
   removed (first) half of the loop; otherwise it is in the remaining
   (second) half. Restore the original file (either using the
   'svn revert filename' command or using the undo command of your
   editor) and remove half of the lines of the erroneous loop part.
   Save the changes and check the syntax again; this will show you
   which quarter of the loop is the culprit. This binary search will
   soon converge to a single line or a bunch of lines containing
   errors; these need to be detected by inspection. After restoring
   the original file, correct the mistake. Check with

    > svn diff

   that the changes are as intended (i.e. that all lines have been
   restored after the tests). Check the file syntax and, if it is
   correct, commit the file or proceed to the next erroneous file.

10. Unfortunately, sometimes CIF files have serious data errors. For
   example, we have spotted some files whose beginning comes from one
   CIF file and whose end apparently comes from another CIF file. In
   other files, lines with chemical formulae seem to be truncated, and
   thus some symbols might have been lost.
   These files will have to be checked against their original source
   files, and probably redeposited. This is a time-consuming process,
   so for now I suggest only fixing the file's syntax in a trivial way
   and entering the name of the problematic file into the
   'data-problems.txt' file. After we finish the syntax corrections, we
   will move on to the more serious data problems and develop automated
   tools to check the CIF files even better.

   The CIF files can be marked using one of the following tags:

    _[local]_cod_depositor_comments
    ;
    Might contain a free-text, human-readable comment on the CIF file.
    This does not necessarily describe an error but might contain any
    relevant discussions and considerations regarding the deposition
    and the use of the containing CIF file.
    ;

    _[local]_cod_error_flag
        Should have one of 'none', 'warnings', 'errors' as a value.

    _[local]_cod_error_source
        Should have one of 'deposition', 'upstream', 'original',
        'experiment' as a value.

    _[local]_cod_error_description
    ;
    Might contain a human-readable description of the problem.
    ;

11. "Don't invent data" policy: it seems absolutely necessary that the
   COD does not try to supplement or correct serious experimental
   errors, since our guess might be wrong. It seems permissible to
   correct obvious typos like 'Simens' -> 'Siemens',
   '_journal volume' -> '_journal_volume'. I have also fixed
   '- 1.234(2)' -> '-1.234(2)' (removing an extra space after the
   'minus' sign in a table), but this is probably on the edge of what
   we should be doing. If there are stray values, apparently incorrect
   tags and such, I would suggest that we just comment them out using
   '##' signs. We could follow the convention that double comment '##'
   signs indicate 'commented-out' code, while comments starting with a
   single '#' sign and a space indicate 'real' comments with text for
   human readers.

Saulius Gražulis
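
As an illustration of the '##' convention suggested in item 11, the
following sketch comments out one line of a file and lists the lines
already commented out that way. The function names are hypothetical and
chosen only for this example; nothing like them is known to exist in
the COD tools, and the line number is assumed to be a plain positive
integer.

```shell
#!/bin/sh
# Sketch of the '##' convention: '##' marks commented-out data,
# while '# ' introduces a real comment for human readers.
# Both function names are made up for this example.

# Print FILE with line N prefixed by '##'; the file itself is untouched.
comment_out_line() {
    sed "${2}s/^/##/" "$1"
}

# List, with line numbers, the lines commented out with '##'.
# Lines starting with a single '#' are not reported.
list_commented_out() {
    grep -n '^##' "$1"
}
```

For example, 'comment_out_line 2/2005706.cif 42 > tmp' followed by
'mv tmp 2/2005706.cif' would comment out line 42 (a line number picked
arbitrarily here), and 'svn diff' would then show exactly that change.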