Tools
For a full description of the ISIS-JSON representation, see: ISIS-JSON
ISIS and ISO-2709 to JSON export
JSON is a very common import/export format for modern semistructured databases. The isis2json.py Python/Jython script can be used to export any ISIS database to JSON.
If the input file is in ISO-2709 format (.iso), the script runs under Python or Jython. For input files in native ISIS format (.mst+.xrf), the script uses the ZeusIII Java library developed by Heitor Barbieri at BIREME, and must run under Jython 2.5 or above.
For example, the command below exports an ISO-2709 file to JSON, using these optional switches:
- -c makes the output compatible with the CouchDB bulk insert format;
- -f explodes each field into a dictionary with subfield markers as keys;
- -q 1 limits the output to one record.
$ ./isis2json.py cds.iso -c -f -q 1 > cds1.json
This is the output of the example above (indentation added for clarity):
{ "docs" :
  [
    {"070": [{"_": "Magalhaes, A.C."}, {"_": "Franco, C.M."}],
     "069": [{"_": "Paper on: <plant physiology><plant transpiration>..."}],
     "024": [{"_": "Techniques for the measurement of transpiration of..."}],
     "026": [{"a": "Paris", "c": "-1965", "b": "Unesco"}],
     "030": [{"a": "p. 211-224", "b": "illus."}],
     "044": [{"_": "Methodology of plant eco-physiology: proceedings..."}],
     "050": [{"_": "Incl. bibl."}]
    }
  ]
}
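To make the effect of -f more concrete, here is a minimal Python sketch of how a raw field could be split into such a dictionary. It assumes the usual ISIS convention that each subfield starts with "^" followed by a one-character marker. The function name expand_subfields and the handling of repeated markers are assumptions made for illustration, not necessarily what isis2json.py does internally.

```python
def expand_subfields(content, delimiter="^", main_key="_"):
    """Split a raw field such as '^aParis^bUnesco^c-1965' into a dict."""
    parts = content.split(delimiter)
    result = {}
    # Text before the first delimiter has no marker: store it under "_".
    if parts[0]:
        result[main_key] = parts[0]
    for part in parts[1:]:
        if not part:
            continue  # skip an empty piece, e.g. after a trailing "^"
        key, value = part[0].lower(), part[1:]
        # Assumption of this sketch: keep the first value of a repeated marker.
        result.setdefault(key, value)
    return result


print(expand_subfields("^aParis^bUnesco^c-1965"))
print(expand_subfields("Incl. bibl."))
```

A field without any subfield marker ends up entirely under the "_" key, which matches the shape of tags such as 050 in the output above.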
The output of isis2json.py can be piped to curl for immediate upload to CouchDB:
$ ./isis2json.py lilacs-1-30.iso -c -f -i 2 | curl -d @- \
    -X POST http://127.0.0.1:5984/lilacs/_bulk_docs \
    -H "Content-Type: application/json"
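The same upload can be done from Python when curl is not available. The sketch below uses only the standard library and assumes the CouchDB address and lilacs database from the curl command above. The function names build_bulk_request and upload are hypothetical, and upload expects a file previously written by isis2json.py with the -c switch.

```python
import json
import urllib.request


def build_bulk_request(payload, db_url="http://127.0.0.1:5984/lilacs"):
    """Build (but do not send) a POST request for CouchDB's _bulk_docs."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        db_url + "/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def upload(path, db_url="http://127.0.0.1:5984/lilacs"):
    """Send a file written with the -c switch to a running CouchDB server."""
    with open(path, encoding="utf-8") as fp:
        payload = json.load(fp)  # expected shape: {"docs": [...]}
    with urllib.request.urlopen(build_bulk_request(payload, db_url)) as resp:
        return json.load(resp)  # parsed JSON reply from the server


# Building the request needs no server, so it can be inspected safely.
request = build_bulk_request({"docs": [{"_id": "1", "024": [{"_": "Title"}]}]})
print(request.get_method(), request.full_url)
```

Keeping the request construction separate from the network call makes it possible to check the URL and headers before anything is sent.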
The script has other options, such as UUID _id generation, record skipping and quantity limits (similar to OFFSET and LIMIT in MySQL).
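As an illustration of what skipping, quantity limits and UUID generation mean for a stream of records, here is a small Python sketch. It is not taken from isis2json.py: the helper names select_records and add_uuid_ids and the sample records are invented for this example.

```python
import itertools
import uuid


def select_records(records, skip=0, qty=None):
    """Return an iterator over at most qty records after skipping skip."""
    stop = None if qty is None else skip + qty
    return itertools.islice(records, skip, stop)


def add_uuid_ids(records):
    """Yield copies of the records with a random UUID stored under "_id"."""
    for record in records:
        with_id = dict(record)  # copy, so the input record is left untouched
        with_id["_id"] = str(uuid.uuid4())
        yield with_id


# Ten made-up records standing in for the contents of a database.
sample = [{"024": "Title %d" % n} for n in range(1, 11)]
page = list(add_uuid_ids(select_records(sample, skip=3, qty=2)))
print([rec["024"] for rec in page])
```

With skip=3 and qty=2 the sketch keeps the fourth and fifth records, the same window that OFFSET 3 LIMIT 2 would select.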
Use the -h option to get help:
$ ./isis2json.py -h
usage: isis2json.py [-h] [-o OUTPUT.json] [-c] [-m] [-f] [-q QTY] [-s SKIP]
                    [-i TAG_NUMBER] [-u] [-n]
                    INPUT.(mst|iso)

Convert an ISIS .mst or .iso file to a JSON array

positional arguments:
  INPUT.(mst|iso)       .mst or .iso file to read

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT.json, --out OUTPUT.json
                        the file where the JSON output should be written
                        (default: write to stdout)
  -c, --couch           output array within a "docs" item in a JSON document
                        for bulk insert to CouchDB via POST to db/_bulk_docs
  -m, --mongo           output individual records as separate JSON
                        dictionaries, one per line for bulk insert to MongoDB
                        via mongoimport utility
  -f, --subfields       explode each field into a JSON dictionary, with "_" as
                        default key, and subfield markers as additional keys
  -q QTY, --qty QTY     maximum quantity of records to read (default=ALL)
  -s SKIP, --skip SKIP  records to skip from start of .mst (default=0)
  -i TAG_NUMBER, --id TAG_NUMBER
                        generate an "_id" from the given unique TAG field
                        number for each record
  -u, --uuid            generate an "_id" with a random UUID for each record
  -n, --mfn             generate an "_id" from the MFN of each record
                        (available only for .mst input)
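
The one-record-per-line layout described for the -m switch is also convenient to read back without loading the whole export into memory. The following Python sketch shows one way to do that. The function name iter_records is hypothetical, and the two sample records are made up to stand in for a file written with -m -f.

```python
import io
import json


def iter_records(lines):
    """Yield one dict per non-empty line of line-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Stand-in for an open export file; the records are invented.
sample = io.StringIO(
    '{"024": [{"_": "First title"}]}\n'
    '{"024": [{"_": "Second title"}], "026": [{"a": "Paris", "b": "Unesco"}]}\n'
)
for record in iter_records(sample):
    print(record["024"][0]["_"])
```

Because iter_records accepts any iterable of lines, an open file object can be passed in place of the io.StringIO sample.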
