How to create an inverted file

This tutorial explains how to create an inverted file for a database using IsisNBP.

Start the IsisNBP console and follow the steps below:

$ pymx -i 

Open a database:

In [1]: db = sample.cds 

The function that inverts a database is invertdb(). Call <database>.invertdb() to use the default parameters.

In [2]: db.invertdb()
150/150 

When the prompt returns, the inversion has finished.

To show all the parameters of the invertdb() function, print its docstring:

In [3]: print db.invertdb.__doc__

Generates inverted file for current database 

    Parameters: 

     expr:          formatting language expression (default=""),
     extraction_id: extraction id (default=1), 
     technique:     technique (default=0), 
     filename:      inverted filename(default=/basepath/databasename.idx), 
     fst:           fst filename (default=None),
     mfnexpr:       formatting expression to extract mfn (IT1000-1008) 
     callback:      notify function (default=None), 
                    ex: 
                     def cb(total,current): 
                             print '%s/%s\r'%(current,total), 
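The callback shown in the docstring can be written as a standalone function before it is handed to invertdb(). The following is a minimal sketch, assuming only what the docstring states: the callback receives (total, current). The helper format_progress and the stand-in loop are hypothetical and exist only so the example runs without a database.

```python
import sys

def format_progress(total, current):
    """Build the 'current/total' text shown while the inversion runs."""
    return '%s/%s' % (current, total)

def cb(total, current):
    # Same (total, current) signature as the docstring example.
    # '\r' moves back to the start of the line so the counter updates in place.
    sys.stdout.write(format_progress(total, current) + '\r')
    sys.stdout.flush()

# Stand-in loop (hypothetical) that mimics progress notifications.
for n in range(1, 4):
    cb(3, n)
sys.stdout.write('\n')

# With an open database, the callback would be passed by keyword:
#     db.invertdb(callback=cb)
```

Writing through sys.stdout keeps the sketch valid on both Python 2 and Python 3, while the docstring example uses the Python 2 print statement.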

To use stopwords, create a file with the same name as the database and the .stw extension, in the same directory as the database. Example: cds.stw
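The naming rule for the stopword file can be sketched as follows. This is only an illustration: the database path (cds.mst in a temporary directory) is hypothetical, and the one-word-per-line layout is an assumption that should be checked against what IsisNBP actually expects.

```python
import os
import tempfile

def stopword_path(db_path):
    """Return the path of the .stw file that sits next to the database file."""
    base, _ext = os.path.splitext(db_path)
    return base + '.stw'

def write_stopwords(db_path, words):
    # Assumption: one stopword per line; verify the format IsisNBP expects.
    path = stopword_path(db_path)
    with open(path, 'w') as handle:
        for word in words:
            handle.write(word + '\n')
    return path

# Hypothetical database location, used only for illustration.
db_dir = tempfile.mkdtemp()
db_path = os.path.join(db_dir, 'cds.mst')
stw = write_stopwords(db_path, ['a', 'an', 'the'])
print(os.path.basename(stw))
```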


How to display the terms of the inverted file

The listkeys() function reads the IsisNBP inverted file and returns its terms. It does not load all the terms into memory. Instead, it returns an iterator object, and each call to next() retrieves the next term.

In [4]: terms = db.listkeys() 
In [5]: print terms.next() 
AR1.1 
In [6]: print terms.next()
AR144.1


When you are done, close the iterator object to terminate the iteration.

In [7]: terms.close() 
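The next()/close() pattern above is the same one that ordinary Python generators follow. The sketch below uses a hypothetical generator, list_terms, as a stand-in for listkeys(); it assumes the real iterator behaves like a generator and does not touch an inverted file.

```python
def list_terms(terms):
    """Yield terms one at a time, in sorted order, without building a result list."""
    for term in sorted(terms):
        yield term

terms = list_terms(['AR144.1', 'AR1.1', 'AR15.1'])
first = next(terms)    # the built-in next() works on Python 2.6+ and Python 3
second = next(terms)
terms.close()          # ends the iteration early
remaining = list(terms)  # a closed generator yields nothing more

print(first)
print(second)
print(remaining)
```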

To list all terms of the inverted file:

In [8]: for term in db.listkeys(): 
            print term 
AR1.1
AR144.1
AR15.1
AR175.1
AR194.1
...

The postings parameter of the listkeys() function displays additional information about each term:

In [9]: terms = db.listkeys(postings=True)

In [10]: terms.next()

('AR1.1',
 [mfn:63
  extraction_id:805
  occ:1
  offset:1
  technique:0
  field:805])
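The posting shown above links a term to the record that contains it. As a toy model of that idea (not how IsisNBP stores its index), the sketch below builds a dictionary from terms to postings. The field names are copied from the output above; build_index is hypothetical, record 64 is invented for illustration, and fixing offset at 1 and technique at 0 is an assumption.

```python
from collections import namedtuple

# Field names copied from the posting shown above; the values are a toy model.
Posting = namedtuple('Posting', 'mfn extraction_id occ offset technique field')

def build_index(records, extraction_id, tag):
    """Index every occurrence of `tag`, using the whole field value as the key."""
    index = {}
    for mfn in sorted(records):
        occ = 0
        for field_tag, value in records[mfn]:
            if field_tag != tag:
                continue
            occ += 1  # occurrence number of this field within the record
            posting = Posting(mfn, extraction_id, occ, 1, 0, tag)
            index.setdefault(value, []).append(posting)
    return index

records = {
    63: [(805, 'AR1.1'), (830, 'Buenos Aires')],
    64: [(805, 'AR144.1'), (830, 'Buenos Aires')],  # invented record
}
index = build_index(records, extraction_id=805, tag=805)
for term in sorted(index):
    print('%s %s' % (term, index[term]))
```

Looking up a term in such a dictionary returns the postings directly, which is why a search by term does not need to scan every record.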


How to use the search function

The search function returns the records indexed under a specific term of the inverted file. It returns an iterator, so call next() to get the first matching record.

In [11]: rec = db.search('AR1.1').next()

In [12]: print rec
MFN: 63
835:AR
805:AR1.1
840:4805-3415
810:Biblioteca Rafael Herrera Vegas
810:Academia Nacional de Medicina de Buenos Aires
690:Biblioteca Rafael Herrera Vegas
815:J. Andrés Pacheco de Melo 3081 - 1º piso
850:4803-9475
820:Ciudad Autónoma de Buenos Aires
855:acabiblio@netizen.com.ar
855:mdistefano@netizen.com.ar
855:mgsevi@yahoo.com.ar
855:pcboan@netizen.com.ar
860:^aC1425AUM
830:Buenos Aires

In [13]: print(rec.format("('-',v810/)"))
-Biblioteca Rafael Herrera Vegas
-Academia Nacional de Medicina de Buenos Aires
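The expression ('-',v810/) prints a dash, then the value of field 810, then a line break, once for every occurrence of the field. The sketch below reproduces that output in plain Python. The function format_group and the list-of-tuples record are hypothetical; this mimics the result of this one expression and is not an implementation of the formatting language.

```python
def format_group(record, tag, prefix='-'):
    """Return one line per occurrence of `tag`, each starting with `prefix`."""
    return '\n'.join(
        prefix + value for field_tag, value in record if field_tag == tag
    )

# A few fields from the record shown above, as (tag, value) pairs.
record = [
    (805, 'AR1.1'),
    (810, 'Biblioteca Rafael Herrera Vegas'),
    (810, 'Academia Nacional de Medicina de Buenos Aires'),
    (830, 'Buenos Aires'),
]
print(format_group(record, 810))
```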