BOINC science applications and random crashes (Update)


Results on RALPH@Home, which is R@H’s alpha project, have been very promising.

To give an idea of how large this problem was for R@H, I need to provide some numbers. So here goes:

R@H receives roughly 115k results a day.

Of those, roughly 16k a day are failures.

Of those 16k failures a day, 5.5k fall under the ERR_NESTED_UNHANDLED_EXCEPTION_DETECTED and 0xc0000005 banner. Those are the two error codes used when something really, really, really bad has happened on Windows. Another 1.5k errors have cryptic Windows error codes that may or may not be related.
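To put those numbers in proportion, here is a quick back-of-the-envelope calculation in Python. It uses only the rounded daily figures quoted above, so the percentages are rough, and the variable names are mine and purely illustrative.

```python
# Rounded daily figures for R@H, as quoted in the post.
results_per_day = 115_000
failures_per_day = 16_000
crash_failures = 5_500  # ERR_NESTED_UNHANDLED_EXCEPTION_DETECTED + 0xc0000005


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of all returned results that fail: roughly 14%.
overall_failure_rate = percent(failures_per_day, results_per_day)

# Share of the failures that are the two "really bad" Windows codes: roughly 34%.
crash_share_of_failures = percent(crash_failures, failures_per_day)

# Share of all returned results lost to those two codes: a bit under 5%.
crash_share_of_results = percent(crash_failures, results_per_day)

print(f"overall failure rate:     {overall_failure_rate:.1f}%")
print(f"crashes among failures:   {crash_share_of_failures:.1f}%")
print(f"crashes among all results: {crash_share_of_results:.1f}%")
```

In other words, about one failure in three on the public project falls under those two error codes.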

Now how does this translate to RALPH@Home? Well, if you work under the assumption that RALPH@Home is a mini R@H, then the percentages should be roughly the same.

Sure enough, RALPH@Home had roughly the same breakdown of errors as the public project. Here are some rough stats for RALPH@Home:

RALPH@Home receives roughly 1k results a day.

Before 4.93 was released for beta, there were 150 or so failures a day.

Now, with 4.93 in the mix, that has dropped to 100 or so a day.

Keep in mind that the Mac and Linux clients have not been updated yet, so their error rates remain unchanged.

RALPH@Home went from a 25% failure rate down to a 12% failure rate. If you remove the results from Linux and the Mac, the failure rate for the Windows client is hovering around 5%.

I’ll include the current error rates in the public project and RALPH@Home below.

Now I’m on to the next-biggest problem, which has been dubbed the ‘1% bug’.

For those who noticed the error code 1 in the charts below: that error code is given when Rosetta could not find something in one of the pre-staged files downloaded to your machine, or when the application decided something really bad had happened and it couldn’t continue. With 4.82 the actual error data was being written to a different log file than the one BOINC sends back to the server. Starting with 4.94, the reason the application quit will be logged and sent back to the server in a way that can be easily tracked and fixed, without anyone having to post the workunit names in the forums.

----- Rom

Public Project Results:

| App version | Platform | Exit status | Results |
|---|---|---|---|
| 482 | Darwin | -197 (0xffffff3b) ERR_ABORTED_VIA_GUI | 5 |
| 482 | Darwin | -186 (0xffffff46) ERR_RESULT_DOWNLOAD | 3 |
| 482 | Darwin | -185 (0xffffff47) ERR_RESULT_START | 83 |
| 482 | Darwin | 1 Unknown error number | 10 |
| 482 | Darwin | 4 Unknown error number | 135 |
| 482 | Darwin | 5 Unknown error number | 9 |
| 482 | Darwin | 6 Unknown error number | 1 |
| 482 | Darwin | 131 (0x83) Unknown error number | 26 |
| 482 | Windows | -2147483641 (0x80000007) Unknown error number | 18 |
| 482 | Windows | -1073741819 (0xc0000005) Unknown error number | 1797 |
| 482 | Windows | -1073741811 (0xc000000d) Unknown error number | 880 |
| 482 | Windows | -1073741795 (0xc000001d) Unknown error number | 2 |
| 482 | Windows | -1073741674 (0xc0000096) Unknown error number | 4 |
| 482 | Windows | -1073741571 (0xc00000fd) Unknown error number | 63 |
| 482 | Windows | -1073741515 (0xc0000135) Unknown error number | 2 |
| 482 | Windows | -1073741502 (0xc0000142) Unknown error number | 336 |
| 482 | Windows | -1073740972 (0xc0000354) Unknown error number | 2 |
| 482 | Windows | -529697949 (0xe06d7363) Unknown error number | 226 |
| 482 | Windows | -197 (0xffffff3b) ERR_ABORTED_VIA_GUI | 466 |
| 482 | Windows | -187 (0xffffff45) ERR_RESULT_UPLOAD | 3 |
| 482 | Windows | -186 (0xffffff46) ERR_RESULT_DOWNLOAD | 316 |
| 482 | Windows | -185 (0xffffff47) ERR_RESULT_START | 248 |
| 482 | Windows | -177 (0xffffff4f) ERR_RSC_LIMIT_EXCEEDED | 49 |
| 482 | Windows | -164 (0xffffff5c) ERR_NESTED_UNHANDLED_EXCEPTION_DETECTED | 3761 |
| 482 | Windows | -1 (0xffffffff) Unknown error number | 4 |
| 482 | Windows | 0 | 18 |
| 482 | Windows | 1 Unknown error number | 1004 |
| 482 | Windows | 3 Unknown error number | 52 |
| 482 | Windows | 128 (0x80) Unknown error number | 7 |
| 482 | Windows | 1073807364 (0x40010004) Unknown error number | 23 |
| 481 | Linux | -197 (0xffffff3b) ERR_ABORTED_VIA_GUI | 7 |
| 481 | Linux | -186 (0xffffff46) ERR_RESULT_DOWNLOAD | 15 |
| 481 | Linux | -185 (0xffffff47) ERR_RESULT_START | 4 |
| 481 | Linux | 0 | 1 |
| 481 | Linux | 1 Unknown error number | 221 |
| 481 | Linux | 11 (0xb) Unknown error number | 25 |
| 481 | Linux | 26 (0x1a) Unknown error number | 2 |
| 481 | Linux | 131 (0x83) Unknown error number | 144 |
| 481 | Windows | -2147483645 (0x80000003) Unknown error number | 1 |
| 481 | Windows | -197 (0xffffff3b) ERR_ABORTED_VIA_GUI | 3 |
| Total | | | 9976 |

RALPH@Home Results:

| App version | Platform | Exit status | Results |
|---|---|---|---|
| 493 | Windows | -1073741819 (0xffffffffc0000005) Unknown error number | 4 |
| 493 | Windows | -1073741811 (0xffffffffc000000d) Unknown error number | 19 |
| 493 | Windows | -1073741678 (0xffffffffc0000092) Unknown error number | 1 |
| 493 | Windows | -529697949 (0xffffffffe06d7363) Unknown error number | 5 |
| 493 | Windows | -197 (0xffffffffffffff3b) ERR_ABORTED_VIA_GUI | 5 |
| 493 | Windows | -186 (0xffffffffffffff46) ERR_RESULT_DOWNLOAD | 5 |
| 493 | Windows | 0 | 2 |
| 493 | Windows | 1 Unknown error number | 5 |
| 493 | Windows | 3 Unknown error number | 1 |
| 492 | Windows | -1073741819 (0xffffffffc0000005) Unknown error number | 3 |
| 491 | Windows | -197 (0xffffffffffffff3b) ERR_ABORTED_VIA_GUI | 1 |
| 485 | Darwin | -185 (0xffffffffffffff47) ERR_RESULT_START | 22 |
| 485 | Darwin | 4 Unknown error number | 6 |
| 485 | Darwin | 131 (0x83) Unknown error number | 1 |
| 484 | Linux | 11 (0xb) Unknown error number | 3 |
| 484 | Linux | 131 (0x83) Unknown error number | 6 |
| Total | | | 89 |
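
A note on reading the exit status values: the decimal number and the hex value in parentheses are the same exit code, the first read as a signed integer and the second as its unsigned two's-complement bit pattern. The public project results show 32 bits (0xc0000005) while the RALPH@Home results show the same value sign-extended to 64 bits (0xffffffffc0000005), presumably because of how each server formats the number. Here is a minimal Python sketch of the conversion. The helper names and the small lookup table are illustrative and not part of BOINC; the NTSTATUS names are the standard Windows ones for those values, and the list is far from complete.

```python
# A handful of well-known Windows NTSTATUS values (not a complete list).
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC000001D: "STATUS_ILLEGAL_INSTRUCTION",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000135: "STATUS_DLL_NOT_FOUND",
    0xC0000142: "STATUS_DLL_INIT_FAILED",
}


def as_unsigned(exit_status, bits=32):
    """Reinterpret a signed exit status as an unsigned two's-complement value."""
    return exit_status & ((1 << bits) - 1)


def describe(exit_status):
    """Format an exit status as 'decimal (hex) name', using the 32-bit pattern."""
    code = as_unsigned(exit_status, 32)
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return f"{exit_status} (0x{code:x}) {name}"


print(describe(-1073741819))   # -1073741819 (0xc0000005) STATUS_ACCESS_VIOLATION
print(describe(-1073741571))   # -1073741571 (0xc00000fd) STATUS_STACK_OVERFLOW

# The same exit status widened to 64 bits, as in the RALPH@Home results.
print(hex(as_unsigned(-1073741819, 64)))   # 0xffffffffc0000005
```

Read this way, the "Unknown error number" label on the large negative values only means the server has no name for them; the hex value is what to look up.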