CheckLinks
Check Links III - Bireme/PAHO/WHO
CheckLinks is a Java program that checks a collection of id/URL pairs from an input source, verifies their connection status and then saves the results to an output.
CheckLinks comes in 3 flavors, depending on the kind of source and destination used for the id/URL pairs: memory (MemoryCheckUrls), file (FileCheckUrls) and Isis database (IsisCheckUrls).
MemoryCheckUrls
Program parameters are:
-in=<inputUrls> : Memory string having id/URL pairs with the following format:
ID1|http://localhost/index1.html\nID2|http://localhost/index2.html
Note the | sign that separates the id from the URL and the \n sign that separates each id/URL pair.
-times : number of times the program will recheck the URLs. If times > 1 then only bad URLs will be rechecked. Default = 1. Optional parameter.
-wait : number of seconds the program will wait between rechecks. Default = 30 minutes (1800 seconds). Optional parameter.
--writeContent : if this flag is present, the program will load the URL page content (max 1 MB) and append it to the result; otherwise only the connection will be checked. Optional parameter.
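As an illustration of the -in format described above, the following sketch splits such a string into id/URL pairs. The class and method names (IdUrlPairs, parse) are hypothetical and are not part of CheckLinks. It assumes that \n stands for a real newline character and that the first | on each line separates the id from the URL.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of CheckLinks: illustrates the id/URL pair format.
class IdUrlPairs {

    // Splits "ID1|url1\nID2|url2" into an ordered id -> URL map.
    static Map<String, String> parse(String input) {
        Map<String, String> pairs = new LinkedHashMap<String, String>();
        for (String line : input.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            int bar = trimmed.indexOf('|'); // the first | separates id from URL
            if (bar <= 0 || bar == trimmed.length() - 1) {
                throw new IllegalArgumentException("Expected ID|URL but got: " + trimmed);
            }
            pairs.put(trimmed.substring(0, bar), trimmed.substring(bar + 1));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String in = "ID1|http://localhost/index1.html\nID2|http://localhost/index2.html";
        for (Map.Entry<String, String> e : parse(in).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Splitting on the first | only means that a | appearing later in a URL stays part of the URL.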
FileCheckUrls
Program parameters are:
-in=<inputUrls> : File path having id/URL pairs with the following format (one URL per line) :
ID1|http://localhost/index1.html
ID2|http://localhost/index2.html
-out=<outputFile> : File path having the result with the following format:
a) one per line if the --writeContent flag is not set
ID|URL|CheckTime|ConnectionCode|ConnectionMessage
b) possibly multiple lines per result if the --writeContent flag is set
ID|URL|CheckTime|ConnectionCode|ConnectionMessage|content
Both good and bad URLs are stored in the same output file. To separate them, use the following two parameters instead of the -out parameter.
-outGood=<goodOutFile> : File path having only good output results
-outBad=<badOutFile> : File path having only bad output results
-times : number of times the program will recheck the URLs. If times > 1 then only bad URLs will be rechecked. Default = 1. Optional parameter.
-encoding : type of character encoding used by the input file. Default : the encoding used by the computer terminal executing this program. Optional parameter.
-wait : number of seconds the program will wait between rechecks. Default = 30 minutes (1800 seconds). Optional parameter.
--writeContent : if this flag is present, the program will load the URL page content (max 1 MB) and append it to the result; otherwise only the connection will be checked. Optional parameter.
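A program that consumes the -out file needs to split each result line back into its fields. The sketch below does this for format a) only (no --writeContent). ResultLine is a hypothetical class, not part of CheckLinks, and the sample values in main are made up for illustration. All fields are kept as strings because the exact CheckTime and ConnectionCode representations are not specified here.

```java
// Hypothetical holder for one line of the -out file, format a):
// ID|URL|CheckTime|ConnectionCode|ConnectionMessage
class ResultLine {
    final String id;
    final String url;
    final String checkTime;
    final String connectionCode;
    final String connectionMessage;

    ResultLine(String id, String url, String checkTime,
               String connectionCode, String connectionMessage) {
        this.id = id;
        this.url = url;
        this.checkTime = checkTime;
        this.connectionCode = connectionCode;
        this.connectionMessage = connectionMessage;
    }

    static ResultLine parse(String line) {
        // A limit of 5 keeps any further | characters inside the message field.
        String[] f = line.split("\\|", 5);
        if (f.length != 5) {
            throw new IllegalArgumentException("Expected 5 fields but got " + f.length);
        }
        return new ResultLine(f[0], f[1], f[2], f[3], f[4]);
    }

    public static void main(String[] args) {
        // Made-up sample line; real CheckTime and message values may look different.
        ResultLine r = parse("ID1|http://localhost/index1.html|2010-01-01 10:00:00|200|OK");
        System.out.println(r.id + " returned code " + r.connectionCode);
    }
}
```

This sketch assumes the URL itself contains no | character; a URL with a | would shift the remaining fields.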
IsisCheckUrls
Program parameters are:
-in=<inputUrls> : Isis FFI database having input id/URL pairs with the following format :
Field tag = 1 - id
Field tag = 2 - URL
-out=<outputFile> : Output Isis FFI database having the results with the following format (1 result per record):
Field tag = 1 - id
Field tag = 2 - URL
Field tag = 3 - connection message
Field tag = 4 - connection date
Field tag = 5 - connection code
Field tag = 6 - URL page content
-times : number of times the program will recheck the URLs. If times > 1 then only bad URLs will be rechecked. Default = 1. Optional parameter.
-encoding : type of character encoding used by the input database. Default = ISO8859-1. Optional parameter.
-wait : number of seconds the program will wait between rechecks. Default = 30 minutes (1800 seconds). Optional parameter.
--writeContent : if this flag is present, the program will load the URL page content (max 1 MB) and append it to the result; otherwise only the connection will be checked. Optional parameter.
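The -times and -wait parameters, shared by all three programs, describe a retry loop in which only the URLs that failed are checked again. The following is a minimal sketch of that idea, not the actual CheckLinks implementation. RecheckSketch and the UrlChecker interface are hypothetical, and the sketch assumes that times counts the total number of passes, so that times = 1 means a single check with no recheck.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the -times / -wait behavior; not CheckLinks code.
class RecheckSketch {

    // Stand-in for the real connection check.
    interface UrlChecker {
        boolean isGood(String url);
    }

    // Runs up to `times` passes, waiting `waitSeconds` between passes.
    // After the first pass only the URLs that were bad are checked again.
    // Returns the URLs that are still bad after the last pass.
    static List<String> check(List<String> urls, int times, long waitSeconds, UrlChecker checker) {
        List<String> pending = new ArrayList<String>(urls);
        for (int pass = 1; pass <= times && !pending.isEmpty(); pass++) {
            if (pass > 1) {
                try {
                    Thread.sleep(waitSeconds * 1000L); // pause between rechecks
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and stop
                    break;
                }
            }
            List<String> stillBad = new ArrayList<String>();
            for (String url : pending) {
                if (!checker.isGood(url)) {
                    stillBad.add(url);
                }
            }
            pending = stillBad;
        }
        return pending;
    }
}
```

Passing the checker in as an interface keeps the loop independent of how the connection is actually tested, which also makes it easy to try out with a fake checker.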
Download:
CheckLinks files can be downloaded here
To download the source code use the following command: svn co http://trac.reddes.bvsalud.org/checklinks/browser/trunk/CheckLinksIII
Observations:
1) CheckLinksIII requires Java 1.6 or any later version (see http://java.sun.com/javase/downloads/index.jsp).
2) The MemoryCheckUrls, IsisCheckUrls and FileCheckUrls command line programs require that you quote any parameter containing an = sign. For example:
Linux:
$JAVA_HOME/bin/java -cp dist/CheckLinksIII.jar:dist/lib/zeusIII.jar:dist/lib/Utils.jar:dist/lib/commons-codec-1.3.jar:dist/lib/commons-httpclient-3.1.jar:dist/lib/commons-logging-1.1.1.jar br.bireme.checkLinks.FileCheckUrls "-in=fin.txt" "-out=fout.txt"
$JAVA_HOME/bin/java -cp dist/CheckLinksIII.jar:dist/lib/zeusIII.jar:dist/lib/Utils.jar:dist/lib/commons-codec-1.3.jar:dist/lib/commons-httpclient-3.1.jar:dist/lib/commons-logging-1.1.1.jar br.bireme.checkLinks.IsisCheckUrls "-in=dbin" "-out=dbout"
$JAVA_HOME/bin/java -cp dist/CheckLinksIII.jar:dist/lib/zeusIII.jar:dist/lib/Utils.jar:dist/lib/commons-codec-1.3.jar:dist/lib/commons-httpclient-3.1.jar:dist/lib/commons-logging-1.1.1.jar br.bireme.checkLinks.MemoryCheckUrls "-in=ID1|http://localhost/index.html"
Windows:
%JAVA_HOME%\bin\java -cp dist/CheckLinksIII.jar;dist/lib/zeusIII.jar;dist/lib/Utils.jar;dist/lib/commons-codec-1.3.jar;dist/lib/commons-httpclient-3.1.jar;dist/lib/commons-logging-1.1.1.jar br.bireme.checkLinks.FileCheckUrls "-in=fin.txt" "-out=fout.txt"
%JAVA_HOME%\bin\java -cp dist/CheckLinksIII.jar;dist/lib/zeusIII.jar;dist/lib/Utils.jar;dist/lib/commons-codec-1.3.jar;dist/lib/commons-httpclient-3.1.jar;dist/lib/commons-logging-1.1.1.jar br.bireme.checkLinks.IsisCheckUrls "-in=dbin" "-out=dbout"
%JAVA_HOME%\bin\java -cp dist/CheckLinksIII.jar;dist/lib/zeusIII.jar;dist/lib/Utils.jar;dist/lib/commons-codec-1.3.jar;dist/lib/commons-httpclient-3.1.jar;dist/lib/commons-logging-1.1.1.jar br.bireme.checkLinks.MemoryCheckUrls "-in=ID1|http://localhost/index.html"
3) To use IsisCheckUrls, the *.jar libraries must be added to the class path (-cp parameter; see the examples in observation 2).
4) If the --writeContent flag is used, the Java parameters -Xmx128m -Xms128m should be passed in the program call:
Linux:
$JAVA_HOME/bin/java -Xmx128m -Xms128m -cp dist/CheckLinksIII.jar:dist/lib/zeusIII.jar:dist/lib/Utils.jar:dist/lib/commons-codec-1.3.jar:dist/lib/commons-httpclient-3.1.jar:dist/lib/commons-logging-1.1.1.jar br.bireme.checkLinks.IsisCheckUrls "-in=dbin" "-out=dbout" --writeContent
Windows:
%JAVA_HOME%\bin\java -Xmx128m -Xms128m -cp dist/CheckLinksIII.jar;dist/lib/zeusIII.jar;dist/lib/Utils.jar;dist/lib/commons-codec-1.3.jar;dist/lib/commons-httpclient-3.1.jar;dist/lib/commons-logging-1.1.1.jar br.bireme.checkLinks.IsisCheckUrls "-in=dbin" "-out=dbout" --writeContent
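The Linux and Windows class paths in the observations differ only in the separator between jar files (: versus ;). As a small illustration, the sketch below builds the same class path string from the jar list using java.io.File.pathSeparator, which Java sets to the right separator for the current platform. ClasspathSketch is a hypothetical helper and is not shipped with CheckLinks; the jar names are copied from the commands in the observations.

```java
import java.io.File;

// Hypothetical helper: builds the -cp value from the jar list used in this document.
class ClasspathSketch {

    static final String[] JARS = {
        "dist/CheckLinksIII.jar",
        "dist/lib/zeusIII.jar",
        "dist/lib/Utils.jar",
        "dist/lib/commons-codec-1.3.jar",
        "dist/lib/commons-httpclient-3.1.jar",
        "dist/lib/commons-logging-1.1.1.jar"
    };

    // Joins the jar paths with the given separator, without spaces.
    static String join(String separator) {
        StringBuilder sb = new StringBuilder();
        for (String jar : JARS) {
            if (sb.length() > 0) {
                sb.append(separator);
            }
            sb.append(jar);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // File.pathSeparator is ":" on Linux and ";" on Windows.
        System.out.println(join(File.pathSeparator));
    }
}
```

Note that the class path must not contain spaces between entries, otherwise the shell treats the rest of the list as separate arguments.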