...
Problem
In a job that requires "staging" of new, huge input files (8 GB in 650 files) during runtime, the job fails with error messages like "invalid file format". Inspecting the files afterwards reveals no errors; the input files are intact. The job script has this form:
Codeblock |
---|
cp repository/* input_area
mpirun ... |
This appears to be a Lustre cache problem: the parallel processes start faster than Lustre can synchronise the newly copied files across all nodes.
Solution
Add some delay after copying large file sets:
Codeblock |
---|
cp repository/* input_area
sleep 20
mpirun ...
sleep 20 |
Alternatively, the tool nocache serves as a workaround for this issue (thanks John):
Codeblock |
---|
nocache cp repository/* input_area
mpirun ... |
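As a variation on the fixed delay, the job script could wait until the copy looks complete instead of sleeping for a guessed interval. The following is only a sketch under assumptions: `dir_signature` and `wait_for_copy` are hypothetical helper names, not part of Lustre or any site tooling, and `-maxdepth` assumes GNU or BSD `find`. The check runs only on the node that executes the script, so it does not prove that every compute node already sees the files; a short `sleep` afterwards may still be needed.

```shell
#!/bin/sh
# Hypothetical helper: print "<file count> <total bytes>" for the regular
# files directly inside directory $1.
dir_signature() {
    find "$1" -maxdepth 1 -type f -exec wc -c {} + |
        awk '$2 != "total" { n++; s += $1 } END { print n + 0, s + 0 }'
}

# Hypothetical helper: wait until DEST has the same file count and byte total
# as SRC. Returns 0 on a match, 1 if TIMEOUT seconds (default 60) pass first.
# Usage: wait_for_copy SRC DEST [TIMEOUT]
wait_for_copy() {
    src=$1
    dest=$2
    timeout=${3:-60}
    waited=0
    want=$(dir_signature "$src")
    while [ "$(dir_signature "$dest")" != "$want" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

In the job script this would sit between the copy and the parallel start, for example `cp repository/* input_area`, then `wait_for_copy repository input_area 120 || exit 1`, then `mpirun ...`.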
...