Tools
For a full description of the ISIS-JSON representation, see: ISIS-JSON
ISIS and ISO-2709 to JSON export
JSON is a very common import/export format for modern semistructured databases. The isis2json.py Python/Jython script can be used to export any ISIS database to JSON.
If the input file is in ISO-2709 format (.iso), the script runs under Python or Jython. For input files in native ISIS format (.mst+.xrf), the script uses the ZeusIII Java library developed by Heitor Barbieri at BIREME, and must run under Jython 2.5 or above.
For example, the command below exports an ISO-2709 file to JSON, using some optional switches:
- -c makes the output compatible with the CouchDB bulk insert format;
- -f explodes fields into dictionaries with subfield markers as keys;
- -q 1 outputs only one record.
$ ./isis2json.py cds.iso -c -f -q 1 > cds1.json
This is the output of the example above (indentation added for clarity):
{ "docs" : [
    {"070": [{"_": "Magalhaes, A.C."}, {"_": "Franco, C.M."}],
     "069": [{"_": "Paper on: <plant physiology><plant transpiration>..."}],
     "024": [{"_": "Techniques for the measurement of transpiration of..."}],
     "026": [{"a": "Paris", "c": "-1965", "b": "Unesco"}],
     "030": [{"a": "p. 211-224", "b": "illus."}],
     "044": [{"_": "Methodology of plant eco-physiology: proceedings..."}],
     "050": [{"_": "Incl. bibl."}]
    }
] }
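To make the effect of -f more concrete, the following Python sketch shows one way a raw field value could be exploded into the kind of dictionary seen in the output above. It is not the code used by isis2json.py: the function name explode_field is hypothetical, and the sketch assumes the usual ISIS convention that a subfield starts with "^" followed by a one-character marker.

```python
def explode_field(raw, default_key="_"):
    """Split a raw field value into a dict keyed by subfield marker.

    Assumption: subfields are introduced by "^" plus a one-character marker.
    """
    parts = raw.split("^")
    exploded = {}
    # Text before the first "^" has no marker, so it goes under the default key.
    if parts[0]:
        exploded[default_key] = parts[0]
    for part in parts[1:]:
        if not part:
            continue  # ignore an empty chunk, e.g. after a trailing "^"
        marker, value = part[0], part[1:]
        exploded[marker] = value  # a repeated marker overwrites the earlier one
    return exploded


imprint = explode_field("^aParis^bUnesco^c-1965")
note = explode_field("Incl. bibl.")
```

Under these assumptions, imprint has the keys "a", "b" and "c", like field 026 above, and note holds its whole text under "_", like field 050.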
The output of isis2json.py can be piped to curl for immediate upload to CouchDB:
$ ./isis2json.py lilacs-1-30.iso -c -f -i 2 | curl -d @- \
    -X POST http://127.0.0.1:5984/lilacs/_bulk_docs \
    -H "Content-Type: application/json"
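Where curl is not available, the same upload could be done from Python. The sketch below only builds the POST request for the _bulk_docs URL used above, with the same Content-Type header. The function name build_bulk_request and the sample record are made up for illustration, and sending the request requires a running CouchDB with an existing lilacs database.

```python
import json
import urllib.request


def build_bulk_request(docs, db_url="http://127.0.0.1:5984/lilacs"):
    """Build a POST request for CouchDB's _bulk_docs endpoint (hypothetical helper)."""
    # CouchDB expects the records inside a "docs" item, as produced by -c.
    payload = json.dumps({"docs": docs}).encode("utf-8")
    return urllib.request.Request(
        db_url + "/_bulk_docs",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Made-up sample record, not taken from any real database.
sample_docs = [{"_id": "1", "024": [{"_": "Example title"}]}]
request = build_bulk_request(sample_docs)

# To actually send it (needs a reachable CouchDB server):
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

If the file was already exported with -c, its contents can be sent as the request body directly, without wrapping them in "docs" again.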
The script has other options, such as UUID _id generation, record skipping and quantity limits (similar to OFFSET and LIMIT in MySQL).
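The skip and quantity options behave like a window over the sequence of records. A minimal Python sketch of that idea, together with random UUID _id generation, follows. The helper names select_records and with_uuid and the sample records are hypothetical; this illustrates the concept and is not the script's own code.

```python
import itertools
import uuid


def select_records(records, skip=0, qty=None):
    """Return an iterator over at most qty records after skipping the first skip."""
    stop = None if qty is None else skip + qty
    return itertools.islice(records, skip, stop)


def with_uuid(record):
    """Return a copy of the record with a random UUID stored under "_id"."""
    tagged = dict(record)
    tagged["_id"] = str(uuid.uuid4())
    return tagged


# Made-up sample records, numbered 1 to 10.
records = [{"mfn": n} for n in range(1, 11)]

# Comparable in spirit to OFFSET 3 LIMIT 2: records 4 and 5 are selected.
page = [with_uuid(r) for r in select_records(records, skip=3, qty=2)]
```

Using an iterator slice means skipped records are read but never kept, which suits large inputs.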
Use the -h option to get help:
$ ./isis2json.py -h
usage: isis2json.py [-h] [-o OUTPUT.json] [-c] [-m] [-f] [-q QTY] [-s SKIP]
                    [-i TAG_NUMBER] [-u] [-n]
                    INPUT.(mst|iso)

Convert an ISIS .mst or .iso file to a JSON array

positional arguments:
  INPUT.(mst|iso)       .mst or .iso file to read

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT.json, --out OUTPUT.json
                        the file where the JSON output should be written
                        (default: write to stdout)
  -c, --couch           output array within a "docs" item in a JSON document
                        for bulk insert to CouchDB via POST to db/_bulk_docs
  -m, --mongo           output individual records as separate JSON
                        dictionaries, one per line for bulk insert to MongoDB
                        via mongoimport utility
  -f, --subfields       explode each field into a JSON dictionary, with "_" as
                        default key, and subfield markers as additional keys
  -q QTY, --qty QTY     maximum quantity of records to read (default=ALL)
  -s SKIP, --skip SKIP  records to skip from start of .mst (default=0)
  -i TAG_NUMBER, --id TAG_NUMBER
                        generate an "_id" from the given unique TAG field
                        number for each record
  -u, --uuid            generate an "_id" with a random UUID for each record
  -n, --mfn             generate an "_id" from the MFN of each record
                        (available only for .mst input)
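The help text describes two bulk-load layouts: -c wraps all records in a single document under "docs", while -m writes one JSON dictionary per line. The Python sketch below illustrates the difference between the two shapes. The function names to_couch and to_mongo_lines and the sample records are hypothetical, and the real script may differ in details such as key order or whitespace.

```python
import json


def to_couch(docs):
    """Serialize records as one JSON document with a "docs" array (the -c layout)."""
    return json.dumps({"docs": docs})


def to_mongo_lines(docs):
    """Serialize records as one JSON dictionary per line (the -m layout)."""
    return "\n".join(json.dumps(doc) for doc in docs)


# Made-up sample records for illustration only.
docs = [
    {"_id": "1", "050": [{"_": "Incl. bibl."}]},
    {"_id": "2", "026": [{"a": "Paris", "b": "Unesco"}]},
]

couch_text = to_couch(docs)
mongo_text = to_mongo_lines(docs)
```

Here couch_text is a single JSON document, while mongo_text contains two lines that each parse as JSON on their own.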