Gridget > Sun Grid Engine tight integration for Intel MPI - Intel® Software ...

[Intel Software Network forums feed] However, the MPDs that got started are totally independent processes and are not forked children of SGE. The problem comes when I type qdel, or otherwise try to delete or kill my job while it is running. At that point SGE will kill all its forked children, but it knows nothing about the MPD daemons. As a result, after SGE deletes, kills, and cleans up my job, I still have this running around on all the nodes that ran the MPI job:
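The failure mode described in this excerpt can be illustrated with a toy process table. The sketch below is written in Python purely for illustration. The PIDs and the table itself are hypothetical, and the `descendants` helper is not part of SGE or Intel MPI. It only shows why a cleanup that walks the tree of forked children never reaches daemons that were started independently.

```python
from collections import defaultdict

def descendants(table, root):
    """Return every PID reachable from `root` by following parent -> child links."""
    children = defaultdict(list)
    for pid, ppid in table.items():
        children[ppid].append(pid)
    found, stack = set(), [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found

# Hypothetical process table on one compute node: pid -> parent pid.
PROCESS_TABLE = {
    1: 0,      # init
    100: 1,    # scheduler's execution daemon
    200: 100,  # per-job process forked by the scheduler
    201: 200,  # job script
    202: 201,  # mpirun launched by the job script
    300: 1,    # MPD daemon, started independently (not a child of the job)
    301: 300,  # MPI rank launched by the MPD daemon
}

JOB_ROOT = 200
killed = descendants(PROCESS_TABLE, JOB_ROOT) | {JOB_ROOT}
survivors = {300, 301} - killed
print("killed with the job:", sorted(killed))
print("left running:", sorted(survivors))
```

In this toy model the MPD daemon and the rank it launched sit outside the job's subtree, so killing the job's descendants leaves them behind.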


Some related posts from Technorati and Google.

[Condor Project News] 8.3 Stable Release Series 6.8: Fixed a bug in the handling of local universe jobs for a very busy condor_schedd daemon. When a local universe job completed, the condor_starter might not be able to connect to the condor_schedd daemon to update final information about the job, such as the exit status. Under this circumstance, the condor_starter would hang indefinitely. The bug is fixed by having the condor_starter retry a few times (with a delay between attempts) before exiting with a fatal error. The fatal error causes the job to restart.
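The retry-then-fail pattern described in this release note can be sketched as follows. This is a generic Python illustration, not Condor's actual code. The names `update_with_retry`, `send_update`, and `FatalError`, and the default attempt count and delay, are all assumptions made for the example.

```python
import time

class FatalError(Exception):
    """Raised when every attempt has failed (hypothetical name)."""

def update_with_retry(send_update, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call send_update(); on ConnectionError, wait and retry a bounded number of times.

    Raises FatalError after the last failed attempt instead of hanging forever.
    """
    for attempt in range(1, attempts + 1):
        try:
            return send_update()
        except ConnectionError as exc:
            if attempt == attempts:
                raise FatalError(f"gave up after {attempts} attempts") from exc
            sleep(delay_seconds)  # delay between attempts
```

Passing `sleep` as a parameter is a design choice for this sketch: it lets a caller or test substitute a no-op so the retry logic can be exercised without real waiting.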
