Sorting is normally by BibTeX citation label name, or by @String macro name, and letter case is always ignored in the sorting.
For the sort order options beginning -by, the last one seen overrides all earlier ones.
All options are parsed before any input bibliography files are read, no matter what their order on the command line.
Except for the options described below, command-line words beginning with a hyphen are assumed to be options to be passed to sort(1).
The leading hyphen that distinguishes an option from a filename may be doubled, for compatibility with GNU and POSIX conventions. Thus, -author and --author are equivalent.
All remaining command-line words are assumed to be input files. Should such a filename begin with a hyphen, it must be disguised by a leading absolute or relative directory path, e.g. /tmp/-foo.bib or ./-foo.bib.
The sort(1) -f option to ignore letter case differences is always supplied. The -r option reverses the order of the sort. The -u option removes duplicate bibliography entries from the input stream; however, such entries must match exactly, including all white space.
Sort keys are constructed from several parts of the BibTeX entry. If non-numeric values are found where numbers are normally expected (that is, for BibTeX day, number, pages, volume, and year, keys), they are replaced by large integers that will sort higher than any reasonable integer value likely to be present. Nondigits after the first character are ignored, so 20S will reduce to 20: such values are occasionally seen for volume, number, and pages values.
However, uncertain year values of the form 19xx or 20xx are sorted at the end of their century.
With -byday sorting, a day keyword is recognized (it will be standard in BibTeX 1.0), but for backward compatibility, month entries of the form
"daynumber " # monthname "daynumber~" # monthname {daynumber } # monthname {daynumber~} # monthname monthname # "daynumber " monthname # "daynumber~" monthname # {daynumber } monthname # {daynumber~}
are also recognized, and will yield both a day and a month. If a day number is not available, a very large value is assumed, which will sort the entry after others that have day values in the same year and month.
The reason for ignoring the issue number is that some journal databases lack that information. If -byvolume were used, then articles lacking issue numbers would be sorted separately from those with issue numbers, which makes it harder to check for duplicates, or to compare entries with original journal issues.
With -byvolume sorting, warnings are issued for any entry in which any of these fields are missing, and a value of the missing field is supplied that will sort higher than any printable value.
Because -byvolume sorting is first on journal name, it is essential that there be only one form of each journal name; the best way to ensure this is to always use @String{...} abbreviations for them. Order -byvolume is convenient for checking a bibliography against the original journal, but less convenient for a bibliography user.
An unfortunate implementation limitation of the current BibTeX requires cross-referenced entries to appear after all other entries that cross-reference them, although this limitation works to the advantage of bibsort, allowing single-pass processing.
- 1.
- Introductory material such as comments, file headers, and edit logs that are ignored by BibTeX. No line in this part begins with an at-sign, ``@''.
- 2.
- Preamble material delineated by ``@Preamble{'' and a matching closing ``}'', intended to be processed by TeX. Normally, there is only one such entry in a bibliography file, although BibTeX, and bibsort, permit more than one.
- 3.
- Macro definitions (abbreviations) of the form ``@String{...}''. Any single @String specification may span multiple lines, and there are usually several such definitions.
- 4.
- Bibliography entries such as ``@Article{...}'', ``@Book{...}'', ``@InProceedings{...}'', and so on, provided that their citation labels have not already been encountered in a crossref assignment in a preceding entry. For bibsort, any line that begins with an ``@'' followed by letters and digits and an open brace is considered to be such an entry. Optional spaces and tabs may surround the ``@'', and precede the first open brace; these spaces and tabs will be deleted from the output to help standardize the appearance.
- 5.
- ``@Proceedings{...}'' bibliography entries, which are likely to be cross-referenced by ``@InProceedings{...}'' entries, and any other bibliography entries for which a crossref assignment was met before the entry itself.
The order of these parts is preserved in the output stream. Part 1 will be unchanged, but parts 2--5 will be sorted within themselves.
The sort key of ``@Preamble'' entries is their initial line, of ``@String'' entries, the abbreviation name. For all other BibTeX entries, the sort key is citation label between the open curly brace and the trailing comma, unless the sort key is prefixed with additional fields as requested by -byvolume or -byyear options.
bibsort will correctly handle UNIX files with LF line terminators, as well as IBM PC DOS files with CR LF line terminators; the essential requirement is that input lines be delineated by LF characters. Thus, files from the Apple Macintosh, which uses bare CR to terminate lines, would first have to be converted to UNIX or PC DOS line format before giving them to bibsort.
The user must be aware that sorting a bibliography is not without peril, for at least these reasons:
- 1.
- BibTeX has a requirement that entry labels given in crossref = label pairs in a bibliography entry must refer to entries defined later, rather than earlier, in the bibliography file. This regrettable implementation limitation of the current (pre-1.0) BibTeX prevents arbitrary ordering of entries when crossref values are present. To partially solve this problem, bibsort will place ``@Proceedings'' entries last, since they are frequently cross-referenced by ``@InProceedings'' entries. However, it is also possible for ``@Book'', ``@InBook'', and ``@InCollection'' entries to cross-reference ``@Book'' entries, and for ``@Article'' entries to cross-reference other ``@Article'' entries. Neither of these cases are dealt with by bibsort, except that ``@Book'' entries that contain a ``booktitle'' assignment, and entries that are explicitly cross-referenced before their definition, are sorted with ``@Proceedings'',
- 2.
- If the BibTeX file contains interspersed commentary between ``@keyword{...}'' entries, this material will be considered part of the preceding entry, and will be sorted with it. Leading commentary is more common, and will be moved elsewhere in the file.
This is normally not a problem for the part 1 material before the ``@Preamble'', since it is kept together at the beginning of the output stream.
- 3.
- Some kinds of bibliography files should be kept in a different order than alphabetically by citation labels. Good examples are a bibliography file with the contents of a journal, or a personal publication list, for both of which chronological publication order is likely to be preferred.
While a much more sophisticated implementation of bibsort could deal with the first point, and the -byvolume option provides a partial solution to the third point, in general, a satisfactory solution requires human intelligence and natural language understanding that computers lack.
bibsort uses octal ASCII control characters 001 through 007, 0177, and 0377, for temporary modifications of the input stream. If any of these are already present in the input, they will be altered on output. This is unlikely to be a problem, because those characters have neither a printable representation, nor are they conventionally used to mark line or page boundaries in text files.
Some implementations of BibTeX editing support in GNU emacs(1) have a sort-bibtex-entries command that is functionally similar to bibsort. However, the file size that can be processed by emacs(1) is limited, while bibsort can be used on arbitrarily large files, since it acts as a filter, processing a small amount of data at a time. The sort stage needs the entire data stream, but fortunately, the UNIX sort(1) command is clever enough to deal with very large inputs.
The current implementation of bibsort follows the UNIX tradition of combining simple already-available tools. A six-stage pipeline of egrep(1), nawk(1), sort(1), and tr(1) accomplishes the job in one pass with about 800 lines of heavily-commented shell script, about 400 lines of which is a nawk(1) program for insertion of sort keys. The initial prototype of bibsort was written and tested on several large bibliographies in a couple of hours, and after considerable use, was later extended with advanced sorting capabilities and cross-reference recognition in a couple of days of work. By contrast, bibtex(1) is more than 11 000 lines of code and documentation, and bibclean(1) is more than 15 000 lines long; both took months to develop, implement, and test.
Nelson H. F. Beebe, Ph.D. Center for Scientific Computing University of Utah Department of Mathematics, 322 INSCC 155 S 1400 E RM 233 Salt Lake City, UT 84112-0090 USA Tel: +1 801 581 5254 FAX: +1 801 585 1640, +1 801 581 4148 Email: beebe@math.utah.edu, beebe@acm.org, beebe@ieee.org (Internet) WWW URL: http://www.math.utah.edu/~beebe
ftp://ftp.math.utah.edu/pub/tex/bib/
in the file bibsort-x.yy.tar.gz where x.yy is the current version. Other distribution formats are usually available in the same location.
That site is mirrored to several other Internet archives, so you may also be able to find it elsewhere on the Internet; try searching for the string bibsort at one or more of the popular Web search sites, such as
http://altavista.digital.com/ http://search.microsoft.com/us/default.asp http://www.dejanews.com/ http://www.dogpile.com/index.html http://www.euroseek.net/page?ifl=uk http://www.excite.com/ http://www.go2net.com/search.html http://www.google.com/ http://www.hotbot.com/ http://www.infoseek.com/ http://www.inktomi.com/ http://www.lycos.com/ http://www.northernlight.com/ http://www.snap.com/ http://www.stpt.com/ http://www.yahoo.com/