From: H.M. Gladney
Date: Tue, 15 May 2007 08:35:06 -0700
Subject: [greenstone-users] FW: GLI create stalls and crashes
Conjecture: the problem,
variously observed as a stall of the GLI application, a crash of the GLI application, or a freeze of the entire Linux system, might be the result of what
some people call a "memory leak".
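One way to test the memory-leak conjecture is to sample the resident memory of the busy process while a build runs and see whether it only ever grows. The following is a minimal sketch, assuming a Linux "/proc" filesystem. The function names are hypothetical, and the process ID has to be looked up separately (for example with "top"), because nothing observed so far identifies which process is responsible.

```python
import re
import time


def parse_vm_rss_kib(status_text):
    """Return the VmRSS value (in kB) from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vm_rss_kib(handle.read())


def sample_rss(pid, count, interval_seconds):
    """Collect `count` memory samples, `interval_seconds` apart."""
    samples = []
    for _ in range(count):
        samples.append(read_rss_kib(pid))
        time.sleep(interval_seconds)
    return samples


def is_steadily_growing(samples):
    """True if there are at least two samples and each exceeds the previous one."""
    return len(samples) >= 2 and all(
        later > earlier for earlier, later in zip(samples, samples[1:])
    )
```

Steady growth over a long run would be consistent with a leak but would not prove one. A process that legitimately holds more data as it ingests more files shows the same pattern.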
The symptoms that follow
are in two batches. Items (1) to (9) describe what happened a few days ago, when
I suspected that the problem might have to do with specific files (or kinds of files)
being imported by GLI. At that time I was retrying the build after
dropping the suspect files from the input list. The symptoms from (10) on
occurred later, when I attempted to build the whole collection incrementally:
gathering about 8000 files, running "create" as a minimal rebuild,
and repeating that step.
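The incremental approach amounts to splitting the input tree into fixed-size batches and rebuilding after each one. A minimal sketch of the splitting step follows, assuming the batches are then gathered by hand in GLI. The function names are hypothetical, and the batch size of 8000 is only the approximate figure mentioned above.

```python
import os
from itertools import islice


def iter_files(root):
    """Yield every file path under `root` in a stable, sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes the traversal order repeatable
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def batches(paths, size=8000):
    """Yield successive lists of at most `size` items from `paths`."""
    iterator = iter(paths)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

A stable order matters here: if a rebuild fails, the same batch can be reproduced to check whether the failure follows particular files or simply the total number ingested.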
I have repeatedly encountered a stall when attempting "Build Collection" in
Greenstone 2.72 (under Kubuntu 7.04). Circumstances
that might or might not be pertinent are:
(1) The input collection is moderately large (>40,000 files in a directory tree with >1500 subdirectories).
(2) There are many instances of file copies in the
collection (in different subtrees, of course). The failure has occurred
only when the file being processed at the time belonged to a set of
duplicates.
(3) In a test subset of the collection I am trying to
create, when I encountered such a stall failure, I deleted all copy instances of
the file last being processed, and retried. In the next test, the stall
recurred at a different file. All these file sets had names starting with "IPDxxx"
(where "x" indicates a numeral). Some of the original files were HTML
files; others were PDF files. (A screen image file is attached
FYI.)
(4) In some of the approx. 5 tests I made, a temporary
file "IPD....text" was created. It seems to be an unformatted image of the
corresponding HTML and PDF files.
(5) When the stall occurred, the CPU continued to run
at full tilt. "Cancel" worked on the Build Collection screen.
Nevertheless, the CPU continued to be fully busy. Terminating the GLI
session by killing the invoking Konsole session stopped the CPU time
consumption.
(6) More than 2700 files were created by the "Build
Collection" execution. Most of these are 4096-byte
files named "HASHxxxx" in a ".../archives/" subfolder.
(7) About half of the 56 files in the ".../tmp" folder
created by the Build Collection execution are "IPDxxx" files with duplicate
instances in the input collection. Other entries in this folder are not
associated with duplicates.
(8) In another test, I removed all the "IPDxxx"
files. The next stall occurred on "incl.item", a file without
duplicates. ==> duplicate files are not the (only)
problem.
(9) In each test, the progress meter hung at 3%.
In fact, the progress meter sat at 3% for almost all of the ~90 minutes each
test took to execute. Q: what does the progress meter
measure?
(10) During the
incremental collection build attempt, as long as the number of ingested files
was less than about 25,000, no failures were observed. Later in the
gather/build cycle, minimal build steps ended prematurely with an application
crash or a complete system crash.
(11) Sometimes when I
attempted restarting with a minimal rebuild, I received a message "not required
because you have not added to the collection". However, when I
gathered a single additional file and asked for a minimal rebuild, many
files were processed. (I.e., the "not added" message was a
fib.)
(12) Each of these
partial executions takes from 15 minutes to several
hours.
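Observations (2), (3), (7) and (8) depend on knowing which input files are copies of one another. A minimal sketch for listing duplicate sets by content follows, assuming that "duplicate" means byte-identical. The function names are hypothetical, and SHA-256 is an arbitrary choice of digest.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Return a list of path groups under `root` whose contents are identical."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Comparing the resulting groups with the file at which each stall occurred would show whether the stalls really do favour duplicated files, or whether that was a coincidence of the "IPDxxx" naming.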
Obviously, these
problems make building a collection of moderate size quite painful, especially
when they are compounded by my own occasional errors. What is going
wrong? What can I do to evade what seems to be a Greenstone
bug?
Cheerio, Henry
<<attachment>> Type: application/octet-stream Filename: hs_err_pid5660.log
<<attachment>> Type: application/octet-stream Filename: hs_err_pid5664.log
<<attachment>> Type: image/jpeg Filename: snap4.jpg