Manuscripts of research papers, available over the internet from journals in the form of PDF files, may contain nuclear structure data in tabular form. Putting these data into ENSDF format can be very time-consuming and error-prone. This is especially true for the large number of high-spin nuclear structure studies published each year. The two programs discussed here, namely pdfps2txt and txt2ensdf, represent an attempt at semi-automatic conversion of tabular data in PDF files to ENSDF format.
This PDF-to-ENSDF conversion software takes two forms, depending on the content of the published tabular data. Tables that list level energies are more readily converted in an automatic mode; tables that do not list level energy data require additional input in order to construct a unique level scheme.
Some useful web sites are:
http://www.edpsciences.com/docinfos/journals-subject.html#physics
http://publish.aps.org/
http://prc.aps.org/
http://prl.aps.org/
http://www.nucphys.nl/www/pub/nucphys/npa.html
In Acrobat Reader, select the "Print..." option under the "File" menu. In the Print pop-up window, select "Print to File", and enter a filename for the resulting .ps file. Under "Print Range", select the "From:" button, and enter the limiting page numbers containing the tabular data to be converted. Then click the "OK" button to proceed with the pdf-to-ps extraction.
You can now simply run pdfps2txt. This program will attempt to
extract the text characters from the .ps file, while at the same time
trying to format the data as on the printed page by filling with
spaces. Enter the filename of your saved .ps file at the prompt
"Input filename = ?".
You will then also be asked for an output file name.
Most .ps files use at least two different fonts, one for standard
characters and another for special symbols, super/subscripts, etc.
Pdfps2txt tries to figure out which font is used for symbols; if it
succeeds in identifying a good symbol font candidate, it will tell you
what number it is, and prompt you for confirmation with a message
"...ok? (Y/N)".
Normally you should answer Y for yes, but if the
translation fails and you suspect that this was because the symbol
font number was incorrectly identified, you can rerun pdfps2txt,
answer N for no, and enter your own number.
If the program is unable to identify a good candidate for the symbol font, you will be prompted to enter a "symbol font number". Open the .ps file in a text editor and look through the PostScript source for lines such as:
/N16 1 Tf
/N10 1 Tf
which define the font to be used for the following characters. The "font numbers" in these two examples are 16 and 10, respectively. The font is usually switched back and forth between the symbol and the standard fonts, so that superscripts, etc. can be interspersed with standard text. Try to guess which number the symbol font is, and enter it at the prompt. Some common characters given in the symbol font are '1', '2', '5' and ':', which get translated to the output characters '+', '-', '=' and '.', respectively.
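If you need to guess the symbol font number by hand, a small script can list the font numbers that appear in the PostScript source. The following is only a sketch of that manual search, not the method used by pdfps2txt; the function names and the assumption that every font selection looks like "/N16 1 Tf" are illustrative.

```python
import re
from collections import Counter

# Assumed form of a font-selection line: "/N<number> <size> Tf".
FONT_SELECT = re.compile(r"/N(\d+)\s+[\d.]+\s+Tf")

# Symbol-font characters and their translations, as listed in the text above.
SYMBOL_MAP = {"1": "+", "2": "-", "5": "=", ":": "."}

def count_font_numbers(ps_source):
    """Count how often each font number is selected in the PostScript text."""
    return Counter(int(n) for n in FONT_SELECT.findall(ps_source))

def translate_symbols(chars):
    """Translate symbol-font characters; unknown characters pass through."""
    return "".join(SYMBOL_MAP.get(c, c) for c in chars)

# Hypothetical fragment of a .ps file, for illustration only.
sample = "/N16 1 Tf (57.76) Tj /N10 1 Tf (1) Tj /N16 1 Tf (2) Tj"
print(count_font_numbers(sample).most_common())  # [(16, 2), (10, 1)]
print(translate_symbols("21"))                   # -+
```

The font that is selected less often is frequently the symbol font, but that is only a heuristic; check a few of the characters that follow each selection before entering a number at the prompt.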
Once the pdfps2txt program knows what font number to interpret as the symbol font, it reads through the input file and writes formatted text to the output file.
You may wish to add some other information also, such as default values to use for the uncertainties in the gamma energies and intensities. You may also need to remove footnotes buried in the data columns, improve some formatting by realigning columns, etc.
The first line of the text file will be interpreted as an ENSDF
dataset identification record, such as:
228TH 228PA DECAY Weber et al., Eur. Phys. J. A3, 25-48 (1998)
The second line specifies which data fields the columns of the data table correspond to; for example, the second line
Ei Ji Eg Ig
would indicate that the data table has four columns, with information on the level energy for the initial level, the spin-parity for the initial level, the gamma-ray energy and the gamma-ray intensity, respectively. The spacing between the labels in this second line also provides a rough indication of the length (in characters) of the various data fields in the table. You may need to experiment a little with this spacing as you become familiar with the text-to-ENSDF programs.
The text-to-ENSDF programs can be told to interpret data fields for the following information:
EI - Level energy and uncertainty for the initial level of a transition;
JI - Spin-parity for the initial level of a transition;
BI - A one-character band label for the initial level of a transition;
EF - Level energy and uncertainty for the final level of a transition;
JF - Spin-parity for the final level of a transition;
BF - A one-character band label for the final level of a transition;
EG - Gamma-ray energy and uncertainty;
IG - Gamma-ray intensity and uncertainty;
DCO - Gamma-ray DCO ratio;
A2 - Gamma-ray a2 (A2/A0) angular distribution coefficient;
MU - Gamma-ray multipolarity;
MR - Gamma-ray mixing ratio; and
X - "Junk" data fields to be ignored.
Level energies in the form of X+123.4, or 567.8+A, for example, will be correctly processed. Uncertainties in level energies, gamma energies and intensities are acceptable in either of two common forms, such as 123.45(0.12) or 123.45(12), and will be correctly translated to the standard ENSDF form. Levels with the energy and/or spin-parity in parentheses will be flagged as tentative in the ENSDF file, as will transitions with the energy and/or intensity in parentheses.
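The two uncertainty forms can be reduced to the ENSDF convention, in which the uncertainty is given in units of the last digits of the value. The sketch below illustrates that translation under simple assumptions; it is not the code used by the text-to-ENSDF programs, and the function name split_uncertainty is hypothetical.

```python
import re
from decimal import Decimal

# A plain number followed by an uncertainty in parentheses, e.g. 123.45(0.12).
VALUE_UNC = re.compile(r"^([+-]?\d+(?:\.\d+)?)\(([\d.]+)\)$")

def split_uncertainty(field):
    """Return (value, uncertainty in last digits) as strings.

    The uncertainty is '' when the field carries none.
    """
    field = field.strip()
    match = VALUE_UNC.match(field)
    if not match:
        return field, ""
    value, unc = match.groups()
    if "." in unc:
        # Absolute form such as 123.45(0.12): scale by the number of
        # decimal places in the value. int() truncates any remainder;
        # a fuller implementation might round the value instead.
        decimals = len(value.split(".")[1]) if "." in value else 0
        unc = str(int(Decimal(unc) * (10 ** decimals)))
    return value, unc

print(split_uncertainty("123.45(0.12)"))  # ('123.45', '12')
print(split_uncertainty("123.45(12)"))    # ('123.45', '12')
print(split_uncertainty("100.0(4.0)"))    # ('100.0', '40')
```

Decimal arithmetic is used rather than floats so that a value such as 0.12 scales to exactly 12.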
Lines such as
$degam 0.2
$digam 3
can be added to specify default uncertainties for the gamma-ray energies and intensities, respectively. In addition, the line
$band A B
indicates that any transitions after this line and before the next $band line are by default from band A to band B.
Lines (other than the first two lines) not beginning with a digit (0 through 9), +, -, ( or $ are ignored by the text-to-ENSDF conversion programs. Data fields that are blank for a particular transition (i.e. the table gives no information for them) are ignored or, if necessary, given a default value.
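The rule for which lines are read can be written as a one-line test. This is a minimal sketch of the rule as stated above, not the programs' own code; whether leading blanks are skipped before the first character is examined is an assumption here.

```python
# Characters that mark a line as data or as a $ command, per the rule above.
DATA_START = set("0123456789+-($")

def is_processed(line):
    """True if a line (after the first two) would be read rather than ignored.

    Assumption: leading blanks are skipped before the first character is tested.
    """
    stripped = line.lstrip()
    return bool(stripped) and stripped[0] in DATA_START

lines = [
    "*228Th populated in the decay of228Pa (II)",
    "$degam 0.2",
    "TABLE III.g 118Xe.",
    "-1 328.03(4) 300(30)",
    "157.5 0.7(0.1) 9- 8- M1/E2",
]
for line in lines:
    print(is_processed(line), line)
```

Under this rule a leading asterisk is enough to hide a line that would otherwise begin with a digit, such as the "228Th populated..." line in the sample.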
Initial level energies that are given as -1 in the data table indicate that the initial level of the current transition is the same as that for the preceding transition in the table. This is very useful for published tables that leave repeated initial level data fields blank (see, for example, Weber et al., Eur. Phys. J. A3, 25-48 (1998), the text file for which is shown below).
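The effect of the -1 convention can be illustrated with a short sketch that copies the initial level from the preceding transition. The tuple layout and the function name are hypothetical, and the real programs work on fixed columns of text rather than on Python tuples.

```python
def fill_repeated_levels(rows):
    """Replace an initial level energy of -1 by the preceding row's level.

    rows: list of (initial_energy, initial_spin, gamma_energy) string tuples.
    The spin-parity is carried over together with the energy.
    """
    filled = []
    for energy, spin, gamma in rows:
        if energy == "-1":
            if not filled:
                raise ValueError("first transition cannot use -1")
            energy, spin = filled[-1][0], filled[-1][1]
        filled.append((energy, spin, gamma))
    return filled

# Rows taken from the 228Th sample table.
rows = [
    ("328.00", "1-", "270.25(2)"),
    ("-1", "", "328.03(4)"),
    ("378.17", "6+", "191.35(2)"),
]
for row in fill_repeated_levels(rows):
    print(row)
```

After filling, the 328.03 keV transition is assigned to the 328.00 keV 1- level, as in the published table.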
In order to illustrate the above rules, the first 25 lines of two different sample text files, ready for processing by txt2ensdf and xmt2e, respectively, are given below.
228TH 228PA DECAY Weber et al., Eur. Phys. J. A3, 25-48 (1998)
Ei Ji Eg Ig
*
*228Th populated in the decay of228Pa (II)
Table 1.Energies and intensities of gamma-rays assigned to levels and
transitions in 228Th
Initial level Eg(keV)a Ib
57.76 2+ 57.76(2) 55(3)
186.82 4+ 129.06(2) 457(23)
328.00 1- 270.25(2) 339(17)
-1 328.03(4) 300(30)
378.17 6+ 191.35(2) 46.0(23)
396.08 3- 209.26(2) 263(13)
-1 338.32(2) 817(41)
519.20 5- 141.00(2) 20.8(10)
-1 332.37(2) 210(11)
695.4 7- 317.2(3) 1.2(6)
831.83 0+ 503.7(2) 7.4(9)
-1 774.07
874.40 2+ 478.45(4) 31.7(17)
-1 546.45(2) 28.0(14)
-1 687.8(2) 4.8(10)
-1 816.50(12) 3.3(4)
-1 874.5(2) 7.7(12)
938.6 0+ 610.7(2) 2.4(7)
118XE 58NI(64ZN,???G) Sears et al Phys Rev C57, 2991 (1998)
Eg Ig X DCO Ji Jf MU
*
$degam 0.2
*
TABLE III.g 118Xe.
Eg(keV)a Ig(%)b RDCOc RDCO8 c Iip Ifp Multipolarity
157.5 0.7(0.1) 9- 8- M1/E2
172.0(1.0) 14+ 14+ M1/E2
192.0 1.8(0.1) 2.43(0.11) 10- 9- M1/E2
211.2(0.6) 7- 7- M1/E2
246.6 5.5(0.6) 0.83(0.03) 3.23(0.07) 10- 9- M1/E2
255.8 1.5(0.3) 1.12(0.13) 3.86(0.22) 12+ 12+ M1/E2
268.9 4.5(0.5) 0.67(0.05) 2.80(0.09) 11- 10- M1/E2
286.8 4.5(0.4) 0.93(0.09) 4.17(0.22) 9- 9- M1/E2
297.5 1.3(0.2) (11-) 11- (M1/E2)
319.6 3.8(0.4) 0.77(0.05) 2.66(0.10) 12- 11- M1/E2
337.5 100.0(4.0) 1.00 4.00 2+ 0+ E2
341.8 (9-) 9- (M1/E2)
344.9 3.7(0.4) 13- 12- M1/E2
384.5 1.6(0.2) (14-) 13- (M1/E2)
403.9 1.7(0.2) 0.85(0.06) 3.38(0.12) 10- 8- E2
405.4(0.4) 1.7(0.2) (15-) (14-) (M1/E2)
418.5 1.0(0.2) 0.72(0.12) 3.31(0.30) 8- 7- M1/E2
423.6 1.8(0.2) 3.77(0.46) 7- 5- E2
In the first example, note the use of -1 to indicate repeated initial
levels. In the published manuscript, the initial level energy and
spin were given only for the first transition depopulating a level.
In the second example, two different types of experimental DCO ratios are given; I chose to ignore the first value and copy the second value to the ENSDF file. Note too the default uncertainty of 0.2 keV for the gamma-ray energies, specified in line 4. Lines 6 and 7 are output from pdfps2txt; since they are ignored by xmt2e anyway, they were simply left as is.
Note that the alignment of the data columns does not need to be perfect. Some slop is programmed into the column widths, as can be seen from the second example, which was correctly interpreted by xmt2e.
As mentioned above in section 1, the text-to-ENSDF software depends on the content of the published tabular data. The program txt2ensdf can handle only tables that provide level energy information. There are also two more programs that incorporate a graphical user interface and the RadWare graphical level scheme editor, "GLS". Xmt2e3, which like txt2ensdf is for files with level energies, can be useful in some cases, especially in nuclei with band structures. The third program, xmt2e, allows conversion of tables with no level energy information. Such tables require additional input in order to construct a unique level scheme, and hence the conversion process is necessarily more complex. In this section we discuss only the conversion of the former class of data, and the simplest, non-graphical program txt2ensdf that can do an automatic conversion.
The program txt2ensdf prompts you for both input (text) and output (ENSDF) file names. It then reads through the text file, attempting to decode the columns of data and assign them to the different data fields. The data are sorted by level energy, and then, for degenerate levels, by spin. If more than one transition is reported from a given level, those transitions are then ordered by gamma-ray energy.
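The sort order described above amounts to a three-part sort key. The following sketch assumes that spins are compared numerically in ascending order and that a level with no spin sorts first; neither detail is stated for txt2ensdf, and the names used are illustrative.

```python
from fractions import Fraction

def spin_value(jpi):
    """Numeric spin from strings such as '7/2+', '(10+)' or '4-'.

    Assumption: a missing spin sorts before any known spin.
    """
    digits = jpi.strip("()+- ")
    return Fraction(digits) if digits else Fraction(-1)

def sort_transitions(transitions):
    """Sort (level_energy, spin_parity, gamma_energy) tuples by level energy,
    then spin, then gamma-ray energy."""
    return sorted(transitions, key=lambda t: (t[0], spin_value(t[1]), t[2]))

transitions = [
    (874.40, "2+", 546.45),
    (328.00, "1-", 328.03),
    (874.40, "2+", 478.45),
    (328.00, "1-", 270.25),
]
for t in sort_transitions(transitions):
    print(t)
```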
Care is taken to prevent typographical errors and internal inconsistencies in the published tables from creating spurious levels in the output ENSDF file. For example, it is not uncommon for a level to have its spin-parity listed as (10+) in one place and 10+ in another place in the table. For this reason, after sorting the data, txt2ensdf checks for "doublets", which it defines as any levels that differ in energy or spin-parity but lie within 2 keV of each other. It reports any doublets that it finds and asks whether they are valid, as in the following example:
***ii Doublet: energy spin band
492.20+- .00 7/2+ F
492.60+- .00 7/2- D
...Do you want to leave this doublet alone? (y/n)
y
***ii Doublet: energy spin band
2312.70+- .00 21/2+ H
2312.80+- .00
...Do you want to leave this doublet alone? (y/n)
n
Enter correct energy, or a for 1st value, or b for 2nd value.
a
Enter correct spin, or a for 1st value, or b for 2nd value.
a
Here the first doublet is valid, and was left as is. The second
doublet was considered to be an error, and the two levels were
combined to a single 21/2+ level at 2312.7 keV. Incidentally, the
"***ii" field in the doublet report heading indicates that the
inconsistency is between the initial level of two different
transitions. The program also checks for doublets between the initial
level of one transition and the final level of a different transition,
and between final levels of two different transitions; they are
labeled "***fi" and "***ff", respectively.
Once all doublets have been resolved, the program sorts the data records again (since the values may have been changed by that process) and writes the ENSDF output file. You should check this file for consistency with the published data, and add any notes or comments that you consider appropriate. You may also wish to process the ENSDF data set further, for example by performing a least-squares fit to the level energies. (Xmt2e and xmt2e3 can perform their own least-squares fits, and allow you to adopt the resulting level energies and uncertainties.)
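The doublet check described above can be sketched as a search for pairs of distinct levels that lie close in energy. This is an illustration of the stated definition only, not the txt2ensdf code; the function name is hypothetical, and treating a separation of exactly 2 keV as a doublet is an assumption.

```python
from itertools import combinations

def find_doublets(levels, window=2.0):
    """Return pairs of distinct (energy, spin_parity) levels within `window` keV.

    Identical levels are merged first, so every reported pair differs in
    energy or in spin-parity.
    """
    unique = sorted(set(levels))
    return [
        (a, b)
        for a, b in combinations(unique, 2)
        if abs(a[0] - b[0]) <= window
    ]

# Levels from the doublet report shown earlier, plus one repeated entry.
levels = [
    (492.20, "7/2+"),
    (492.60, "7/2-"),
    (2312.70, "21/2+"),
    (2312.80, ""),
    (492.20, "7/2+"),
]
for first, second in find_doublets(levels):
    print(first, second)
```

The sketch only finds candidates; deciding whether a doublet is real or a misprint still requires the user's judgment, which is why the program asks.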
Examples of some data files (*.ps, *.txt, *.ensdf) converted from PDF
with pdfps2txt and txt2ensdf/xmt2e/xmt2e3 are available from the
directory
ftp://radware.phy.ornl.gov/pub/nd/misc/
The programs should compile and run, with little or no modification, on most systems that have a standard Fortran compiler. They were developed under Unix, but make no use of operating-system-specific features.
Feel free to modify and/or redistribute the source codes.
I greatly appreciate getting comments, suggestions and questions. Please e-mail them to me at radforddc@ornl.gov.