CheckLinks


Check Links III - Bireme/PAHO/WHO


CheckLinks is a Java program that checks a collection of id/URL pairs from an input source, verifies their connection status and then saves the results to an output.

CheckLinks comes in three flavors, depending on where the id/URL pairs are read from and written to: memory (MemoryCheckUrls), file (FileCheckUrls) and Isis database (IsisCheckUrls).



MemoryCheckUrls

Program parameters are:

-in=<inputUrls> : In-memory string containing id/URL pairs in the following format:

ID1|http://localhost/index1.html\nID2|http://localhost/index2.html

Note the | sign that separates each id from its URL and the \n sign that separates each id/URL pair.

-times : number of times the program will recheck the URLs. If times > 1 then only bad URLs will be rechecked. Default = 1. Optional parameter.

-wait : number of seconds the program will wait between rechecks. Default = 30 minutes (1800 seconds). Optional parameter.

--writeContent : if this flag is present, the program will load the URL page content (maximum size 1 MB) and append it to the result; otherwise only the connection will be checked. Optional parameter.
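
The -in format described above can be illustrated with a short Java sketch. It is not taken from the CheckLinks source: the class IdUrlPairs and its parse method are hypothetical names, and the sketch assumes the pairs are separated by a real newline character.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of CheckLinks: splits an "ID|URL\nID|URL" string
// into an ordered id -> URL map.
final class IdUrlPairs {

    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<String, String>();
        for (String line : input.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // ignore blank lines
            }
            // Only the first '|' separates the id from the URL.
            int sep = trimmed.indexOf('|');
            if (sep <= 0) {
                throw new IllegalArgumentException("Missing id or '|' separator: " + trimmed);
            }
            pairs.put(trimmed.substring(0, sep), trimmed.substring(sep + 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String input = "ID1|http://localhost/index1.html\nID2|http://localhost/index2.html";
        for (Map.Entry<String, String> e : parse(input).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Splitting on the first | only keeps a URL intact even if it happens to contain a | character itself.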



FileCheckUrls

Program parameters are:

-in=<inputUrls> : Path of the file containing id/URL pairs in the following format (one pair per line):

ID1|http://localhost/index1.html
ID2|http://localhost/index2.html

-out=<outputFile> : Path of the file that receives the results, in the following format:

a) one line per result if the --writeContent flag is not set

ID|URL|CheckTime|ConnectionCode|ConnectionMessage

b) possibly multiple lines per result if the --writeContent flag is set

ID|URL|CheckTime|ConnectionCode|ConnectionMessage|content

Both good and bad URLs are stored in the same output file. To separate them, use the following two parameters instead of the -out parameter.

-outGood=<goodOutFile> : Path of the file that receives only the good results

-outBad=<badOutFile> : Path of the file that receives only the bad results

-times : number of times the program will recheck the URLs. If times > 1 then only bad URLs will be rechecked. Default = 1. Optional parameter.

-encoding : character encoding of the input file. Default: the encoding used by the terminal of the computer running this program. Optional parameter.

-wait : number of seconds the program will wait between rechecks. Default = 30 minutes. Optional parameter.

--writeContent : if this flag is present, the program will load the URL page content (maximum size 1 MB) and append it to the result; otherwise only the connection will be checked. Optional parameter.
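
A result line in the -out format can be read back with a few lines of Java. The following is only a sketch: the class ResultLine is a hypothetical name, all fields are kept as strings because the exact CheckTime and ConnectionCode formats are not specified here, and the sample values in main are made up for illustration. It handles a single line only, so content that spans several lines would need extra handling.

```java
// Hypothetical reader for one line of the form
// ID|URL|CheckTime|ConnectionCode|ConnectionMessage[|content]
final class ResultLine {
    final String id;
    final String url;
    final String checkTime;
    final String connectionCode;
    final String connectionMessage;
    final String content; // null when the line has no content field

    private ResultLine(String[] f) {
        id = f[0];
        url = f[1];
        checkTime = f[2];
        connectionCode = f[3];
        connectionMessage = f[4];
        content = f.length > 5 ? f[5] : null;
    }

    static ResultLine parse(String line) {
        // Limit of 6 keeps any '|' inside the page content in the last field.
        String[] fields = line.split("\\|", 6);
        if (fields.length < 5) {
            throw new IllegalArgumentException("Expected at least 5 fields: " + line);
        }
        return new ResultLine(fields);
    }

    public static void main(String[] args) {
        // Illustrative values only; real output may use other time and code formats.
        ResultLine r = parse("ID1|http://localhost/index1.html|2010-01-01 10:00:00|200|OK");
        System.out.println(r.id + " " + r.connectionCode + " " + r.connectionMessage);
    }
}
```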



IsisCheckUrls

Program parameters are:

-in=<inputUrls> : Isis FFI database containing the input id/URL pairs in the following format:

Field tag = 1 - id
Field tag = 2 - URL

-out=<outputFile> : Output Isis FFI database that receives the results in the following format (one result per record):

Field tag = 1 - id
Field tag = 2 - URL
Field tag = 3 - connection message
Field tag = 4 - connection date
Field tag = 5 - connection code
Field tag = 6 - URL page content

-times : number of times the program will recheck the URLs. If times > 1 then only bad URLs will be rechecked. Default = 1. Optional parameter.

-encoding : character encoding of the input database. Default = ISO8859-1. Optional parameter.

-wait : number of seconds the program will wait between rechecks. Default = 30 minutes. Optional parameter.

--writeContent : if this flag is present, the program will load the URL page content (maximum size 1 MB) and append it to the result; otherwise only the connection will be checked. Optional parameter.
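
The interaction between -times and -wait, which is the same for all three programs, can be sketched as follows. This is an illustration and not the actual CheckLinks code: RecheckLoop and UrlChecker are hypothetical names, and the sketch assumes that -times counts the total number of passes, with only the URLs that failed the previous pass being checked again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the recheck behaviour described for -times and -wait.
final class RecheckLoop {

    /** Hypothetical callback: returns true when the URL is considered good. */
    interface UrlChecker {
        boolean isGood(String url);
    }

    /**
     * Checks every URL once, then rechecks only the bad ones, for at most
     * 'times' passes, sleeping 'waitSeconds' between passes.
     * Returns the URLs that were still bad after the last pass.
     */
    static List<String> run(List<String> urls, UrlChecker checker, int times, long waitSeconds)
            throws InterruptedException {
        List<String> pending = new ArrayList<String>(urls);
        for (int pass = 1; pass <= times && !pending.isEmpty(); pass++) {
            if (pass > 1) {
                Thread.sleep(waitSeconds * 1000L); // wait only between passes
            }
            List<String> stillBad = new ArrayList<String>();
            for (String url : pending) {
                if (!checker.isGood(url)) {
                    stillBad.add(url);
                }
            }
            pending = stillBad;
        }
        return pending;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in checker for the demo: no network access is performed.
        UrlChecker demo = new UrlChecker() {
            public boolean isGood(String url) {
                return url.contains("index1");
            }
        };
        List<String> urls = Arrays.asList(
                "http://localhost/index1.html", "http://localhost/index2.html");
        System.out.println("Still bad: " + run(urls, demo, 2, 0));
    }
}
```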



Download:

CheckLinks files can be downloaded here

To download the source code use the following command: svn co http://trac.reddes.bvsalud.org/checklinks/browser/trunk/CheckLinksIII



Observations:

1) CheckLinksIII requires Java 1.6 or any later version. (see http://java.sun.com/javase/downloads/index.jsp)

2) The MemoryCheckUrls, IsisCheckUrls and FileCheckUrls command line programs require that you quote any parameter containing an = sign. The commands below are wrapped for readability; enter each one on a single line, with no spaces inside the -cp value. For example:

Linux:

    $JAVA_HOME/bin/java -cp dist/CheckLinksIII.jar:dist/lib/zeusIII.jar:
                            dist/lib/Utils.jar:dist/lib/commons-codec-1.3.jar:
                            dist/lib/commons-httpclient-3.1.jar:
                            dist/lib/commons-logging-1.1.1.jar
                   br.bireme.checkLinks.FileCheckUrls "-in=fin.txt" "-out=fout.txt"
    
    $JAVA_HOME/bin/java -cp dist/CheckLinksIII.jar:dist/lib/zeusIII.jar:
                            dist/lib/Utils.jar:dist/lib/commons-codec-1.3.jar:
                            dist/lib/commons-httpclient-3.1.jar:
                            dist/lib/commons-logging-1.1.1.jar
                   br.bireme.checkLinks.IsisCheckUrls "-in=dbin" "-out=dbout"

    $JAVA_HOME/bin/java -cp dist/CheckLinksIII.jar:dist/lib/zeusIII.jar:
                            dist/lib/Utils.jar:dist/lib/commons-codec-1.3.jar:
                            dist/lib/commons-httpclient-3.1.jar:
                            dist/lib/commons-logging-1.1.1.jar
                   br.bireme.checkLinks.MemoryCheckUrls "-in=ID1|http://localhost/index.html"

Windows:

    %JAVA_HOME%\bin\java -cp dist/CheckLinksIII.jar;dist/lib/zeusIII.jar;
                             dist/lib/Utils.jar;dist/lib/commons-codec-1.3.jar;
                             dist/lib/commons-httpclient-3.1.jar;
                             dist/lib/commons-logging-1.1.1.jar
                    br.bireme.checkLinks.FileCheckUrls "-in=fin.txt" "-out=fout.txt"

    %JAVA_HOME%\bin\java -cp dist/CheckLinksIII.jar;dist/lib/zeusIII.jar;
                             dist/lib/Utils.jar;dist/lib/commons-codec-1.3.jar;
                             dist/lib/commons-httpclient-3.1.jar;
                             dist/lib/commons-logging-1.1.1.jar
                    br.bireme.checkLinks.IsisCheckUrls "-in=dbin" "-out=dbout"

    %JAVA_HOME%\bin\java -cp dist/CheckLinksIII.jar;dist/lib/zeusIII.jar;
                             dist/lib/Utils.jar;dist/lib/commons-codec-1.3.jar;
                             dist/lib/commons-httpclient-3.1.jar;
                             dist/lib/commons-logging-1.1.1.jar
                    br.bireme.checkLinks.MemoryCheckUrls "-in=ID1|http://localhost/index.html"

3) To use IsisCheckUrls, the *.jar libraries must be added to the class path (-cp parameter; see the examples in item 2).

4) If the --writeContent flag is used, the Java options -Xmx128m -Xms128m should be passed in the program call:

Linux:

     $JAVA_HOME/bin/java -Xmx128m -Xms128m
         -cp dist/CheckLinksIII.jar:dist/lib/zeusIII.jar:dist/lib/Utils.jar:
             dist/lib/commons-codec-1.3.jar:dist/lib/commons-httpclient-3.1.jar:
             dist/lib/commons-logging-1.1.1.jar
         br.bireme.checkLinks.IsisCheckUrls "-in=dbin" "-out=dbout" --writeContent

Windows:

     %JAVA_HOME%\bin\java -Xmx128m -Xms128m
         -cp dist/CheckLinksIII.jar;dist/lib/zeusIII.jar;dist/lib/Utils.jar;
             dist/lib/commons-codec-1.3.jar;dist/lib/commons-httpclient-3.1.jar;
             dist/lib/commons-logging-1.1.1.jar
         br.bireme.checkLinks.IsisCheckUrls "-in=dbin" "-out=dbout" --writeContent