Tools

For a full description of the ISIS-JSON representation, see: ISIS-JSON

ISIS and ISO-2709 to JSON export

JSON is a very common import/export format for modern semistructured databases. The isis2json.py Python/Jython script can be used to export any ISIS database to JSON.

If the input file is in ISO-2709 format (.iso), the script runs under Python or Jython. For input files in native ISIS format (.mst+.xrf), the script uses the ZeusIII Java library developed by Heitor Barbieri at BIREME, and must run under Jython 2.5 or above.

For example, the command below exports an ISO-2709 file to JSON, using some optional switches:

  • -c makes the output compatible with the CouchDB bulk insert format;
  • -f explodes fields into dictionaries with subfield markers as keys;
  • -q 1 outputs only one record.

$ ./isis2json.py cds.iso -c -f -q 1 > cds1.json

This is the output of the example above (indentation added for clarity):

{ "docs" : 
  [
    {"070": [{"_": "Magalhaes, A.C."}, {"_": "Franco, C.M."}], 
     "069": [{"_": "Paper on: <plant physiology><plant transpiration>..."}], 
     "024": [{"_": "Techniques for the measurement of transpiration of..."}], 
     "026": [{"a": "Paris", "c": "-1965", "b": "Unesco"}], 
     "030": [{"a": "p. 211-224", "b": "illus."}], 
     "044": [{"_": "Methodology of plant eco-physiology: proceedings..."}], 
     "050": [{"_": "Incl. bibl."}]
    }
  ]
}
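
As a minimal sketch of how this structure can be consumed, the Python snippet below loads a shortened copy of the output above and reads values by tag and subfield key. The helper name `occurrences` is hypothetical and is not part of isis2json.py; the sketch only assumes the layout shown above, where each tag maps to a list of occurrences and each occurrence is a dictionary keyed by subfield marker, with "_" as the default key.

```python
import json

# Shortened copy of the sample output shown above.
sample = '''
{ "docs" :
  [
    {"070": [{"_": "Magalhaes, A.C."}, {"_": "Franco, C.M."}],
     "026": [{"a": "Paris", "c": "-1965", "b": "Unesco"}],
     "030": [{"a": "p. 211-224", "b": "illus."}]
    }
  ]
}
'''


def occurrences(record, tag, key="_"):
    """Return the values stored under `key` in every occurrence of `tag`.

    Hypothetical helper for illustration; missing tags yield an empty list.
    """
    return [occ[key] for occ in record.get(tag, []) if key in occ]


record = json.loads(sample)["docs"][0]
tag_070 = occurrences(record, "070")       # repeated field, default key "_"
tag_026_a = occurrences(record, "026", "a")  # subfield "a" of field 026
print(tag_070)
print(tag_026_a)
```

Because every field is a list, repeated fields such as 070 and single-occurrence fields such as 026 can be read the same way.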

The output of isis2json.py can be piped to curl for immediate upload to CouchDB. In the command below, -i 2 generates the "_id" of each document from field tag 2:

$ ./isis2json.py lilacs-1-30.iso -c -f -i 2 | curl -d @- \
  -X POST http://127.0.0.1:5984/lilacs/_bulk_docs \
  -H "Content-Type: application/json"
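
Where curl is not available, the same POST can be prepared from Python. The sketch below is only an illustration using the standard library: the function name `build_bulk_request` and the sample document are assumptions, and the request is built but not sent, so it runs without a CouchDB server.

```python
import json
import urllib.request


def build_bulk_request(db_url, docs):
    """Build (but do not send) a POST request for CouchDB's _bulk_docs.

    Hypothetical helper: `db_url` is the database URL, `docs` a list of
    dictionaries such as those produced with the -c and -f switches.
    """
    body = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        db_url.rstrip("/") + "/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_bulk_request(
    "http://127.0.0.1:5984/lilacs",
    [{"_id": "1", "010": [{"_": "example"}]}],  # made-up document
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the upload against a running server.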

The script has other options, such as UUID _id generation, record skipping and quantity limits (similar to OFFSET and LIMIT in MySQL).
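
The snippet below sketches what skipping, limiting and UUID generation amount to, using made-up records. It is not the code of isis2json.py; the names `select_records` and `with_uuid` are hypothetical and only illustrate the OFFSET/LIMIT analogy.

```python
import itertools
import uuid


def select_records(records, skip=0, qty=None):
    """Skip the first `skip` records, then yield at most `qty` (None = all)."""
    stop = None if qty is None else skip + qty
    return itertools.islice(records, skip, stop)


def with_uuid(record):
    """Return a copy of `record` with a random UUID string as "_id"."""
    tagged = dict(record)
    tagged["_id"] = str(uuid.uuid4())
    return tagged


# Ten made-up records whose field 001 holds the numbers 1 to 10.
records = [{"001": [{"_": str(n)}]} for n in range(1, 11)]

# Comparable in spirit to "-s 3 -q 2 -u": skip three records, keep two.
page = [with_uuid(r) for r in select_records(records, skip=3, qty=2)]
print([r["001"][0]["_"] for r in page])
```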

Use the -h option to get help:

$ ./isis2json.py -h
usage: isis2json.py [-h] [-o OUTPUT.json] [-c] [-m] [-f] [-q QTY] [-s SKIP]
                    [-i TAG_NUMBER] [-u] [-n]
                    INPUT.(mst|iso)

Convert an ISIS .mst or .iso file to a JSON array

positional arguments:
  INPUT.(mst|iso)       .mst or .iso file to read

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT.json, --out OUTPUT.json
                        the file where the JSON output should be written
                        (default: write to stdout)
  -c, --couch           output array within a "docs" item in a JSON document
                        for bulk insert to CouchDB via POST to db/_bulk_docs
  -m, --mongo           output individual records as separate JSON
                        dictionaries, one per line for bulk insert to MongoDB
                        via mongoimport utility
  -f, --subfields       explode each field into a JSON dictionary, with "_" as
                        default key, and subfield markers as additional keys
  -q QTY, --qty QTY     maximum quantity of records to read (default=ALL)
  -s SKIP, --skip SKIP  records to skip from start of .mst (default=0)
  -i TAG_NUMBER, --id TAG_NUMBER
                        generate an "_id" from the given unique TAG field
                        number for each record
  -u, --uuid            generate an "_id" with a random UUID for each record
  -n, --mfn             generate an "_id" from the MFN of each record
                        (available only for .mst input)
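
The help text above describes three output layouts: a plain JSON array (the default), an array inside a "docs" item (-c), and one dictionary per line (-m). The sketch below reproduces these layouts for two made-up records so the differences are easy to compare. The function `render` is hypothetical and is not taken from isis2json.py; the exact whitespace of the real output may differ.

```python
import json


def render(records, couch=False, mongo=False):
    """Serialize `records` in one of the three layouts described above."""
    if mongo:
        # One JSON dictionary per line, as expected by mongoimport.
        return "\n".join(json.dumps(r) for r in records)
    if couch:
        # Array wrapped in a "docs" item, for POST to db/_bulk_docs.
        return json.dumps({"docs": records})
    # Default: a plain JSON array.
    return json.dumps(records)


records = [{"_id": "1", "024": "first"}, {"_id": "2", "024": "second"}]  # made-up
print(render(records))
print(render(records, couch=True))
print(render(records, mongo=True))
```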