BOINC Q&A — 20/10/2006

Can we get more (unlimited – well, within reason!) preferences than home, school and work? Three profiles isn’t enough for me and I’m only running a small number of computers. I know these can be overridden (although the project preferences for Rosetta (i.e. runtimes) cannot). I’d find it really useful if these profiles could be added to as required, and please can you make them renamable?

I believe the account manager folks are working on some features which will allow greater configuration flexibility. The BOINC client is capable of dealing with a greater number of zones; there just hasn’t been an easy way of configuring them on a project’s web site. Rytis is now at the helm of the project web site and forum features. I’m looking forward to seeing what he is going to cook up.

Also, any update on BOINC on the consoles?

Well, there is a lot of buzz, but nobody has signed on the dotted line yet. David and Eric are going to a Sony R&D center next week to meet some engineers for the PS3. I haven’t heard anything new about the XBOX 360. The XNA Game Studio from Microsoft is a bust for BOINC, since it assumes all of the game code is going to be managed code on the 360. That leaves us needing the same development kit that the professional game studios use.

Again, more of a request: I am attached to a lot of projects, and when I need to take a box out of service (without throwing away WUs) I have to click “no new tasks” over 30 times. A bit tedious, especially over VNC. A global (per host) “no new tasks” button would be of great use to me.

Is the global update ever returning? Although I can see where it can be abused.

Right now many things are on hold until we can get the BSG out the door. Tentatively I have some time allocated to rework the Advanced UI, and playing around with Vista has inspired me on how to handle the multi-selection cases in a list view control. We shall see, though.

‘Retry Communications’ is about as close as you’re going to get to an ‘update all’ type of function. It basically resets the countdown timer for any pending action.

With regard to the whole ‘-return_results_immediately’ thing, from a project perspective it is altogether evil. I’ll write up another post about that separately.

1) What are the typical things which cause the work unit to fail?
(Environmental – antivirus, graphics drivers, excessive overclocking, PC crashes, playing games for hours, video encoding, etc.
Human factors – Misunderstanding BOINC messages, for example an incorrect URL – they detach and attach, then get upset that x months of work is ‘down the pan’. Ditto installation of the Berkeley version over the BBC version: easy to fix, but they don’t know how.)

You have nailed the majority of cases. I mean, we could go off into the really obscure cases like cosmic rays, but you covered the things that account for most failures.

In the future we won’t be allowing a directory name change for any software package that we build for others, so that should take care of any potential future BBC issues. Now, before you all think I’m making up the whole cosmic ray thing, here is an article from ZDNet about eBay suffering one to two crashes a month due to a defect in their ECC memory which left them prone to cosmic rays.

2) Is there anything which can be done to avoid these, either by the science app or by BOINC itself?
(Uploading partial results as the WU runs. Exception handlers, both at science app and callbacks at BOINC? Restart from checkpoint/backup if error code 0,-107…,etc etc received? Going into hibernation if PC is very busy, out of memory, etc)

This is one of those really cool but really tough questions. Each environment handles things a bit differently. About the best advice I can give is for each project to really understand how the programming language they are using interacts with the operating system they are using.

CPDN is advancing the trickle model to the point where they could resend a workunit that has timed out and reuse the previous user’s trickles as the starting point of the new workunit.
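As a rough illustration of that idea, here is a minimal sketch of reissuing a timed-out workunit from its most recent trickle. Everything in it (the Trickle and WorkUnit classes, the reissue function, the "_r1" suffix) is hypothetical and invented for this example; it is not CPDN's or BOINC's actual code or data model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Trickle:
    """Hypothetical progress report sent back while a workunit runs."""
    timestep: int   # how far the model had progressed
    state: str      # opaque snapshot of the model at that timestep


@dataclass
class WorkUnit:
    """Hypothetical workunit record kept on the project server."""
    name: str
    total_timesteps: int
    start_timestep: int = 0
    start_state: Optional[str] = None
    trickles: List[Trickle] = field(default_factory=list)


def reissue(timed_out: WorkUnit) -> WorkUnit:
    """Build a replacement workunit that resumes from the latest trickle."""
    new_name = timed_out.name + "_r1"
    if not timed_out.trickles:
        # Nothing was reported, so the replacement starts from scratch.
        return WorkUnit(new_name, timed_out.total_timesteps)
    latest = max(timed_out.trickles, key=lambda t: t.timestep)
    return WorkUnit(new_name, timed_out.total_timesteps,
                    start_timestep=latest.timestep,
                    start_state=latest.state)
```

The point of the sketch is only that the work already reported by the first user is not thrown away; the second user picks up from the furthest reported timestep.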

One thing I would like to point out is that BOINC itself cannot do anything about a science application failure except fail the workunit and move on to the next one. To BOINC each science application is a little black box, and the only way BOINC knows anything about what is going on inside is through a little 8k chunk of shared memory broken up into 8 channels. Simple commands are passed around in these channels, like show graphics, hide graphics, and here is the amount of CPU time I’ve used.
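To make the layout concrete, here is a minimal sketch of an 8k buffer split into 8 equal channels, each holding one short message at a time. The equal 1024-byte split, the one-byte "unread" flag, the zero terminator, and the message text are all assumptions made for illustration; an ordinary bytearray stands in for real shared memory, and none of the names come from the BOINC source.

```python
SEGMENT_SIZE = 8 * 1024                       # the "8k chunk"
NUM_CHANNELS = 8
CHANNEL_SIZE = SEGMENT_SIZE // NUM_CHANNELS   # 1024 bytes per channel (assumed equal split)


class ChannelSegment:
    """A bytearray standing in for the shared memory segment."""

    def __init__(self):
        self.buf = bytearray(SEGMENT_SIZE)

    def _offset(self, channel):
        if not 0 <= channel < NUM_CHANNELS:
            raise IndexError("no such channel")
        return channel * CHANNEL_SIZE

    def send(self, channel, text):
        """Write a message; return False if the previous one is still unread."""
        data = text.encode("ascii")
        if len(data) > CHANNEL_SIZE - 2:      # room for the flag byte and terminator
            raise ValueError("message too long for one channel")
        start = self._offset(channel)
        if self.buf[start] != 0:              # flag byte set: channel is occupied
            return False
        self.buf[start + 1:start + 1 + len(data)] = data
        self.buf[start + 1 + len(data)] = 0   # zero terminator
        self.buf[start] = 1                   # mark the channel as holding a message
        return True

    def receive(self, channel):
        """Return the pending message and free the channel, or None if empty."""
        start = self._offset(channel)
        if self.buf[start] == 0:
            return None
        end = self.buf.index(0, start + 1)    # find the terminator
        text = self.buf[start + 1:end].decode("ascii")
        self.buf[start] = 0                   # clear the flag so the sender can reuse it
        return text
```

With so little room and only simple text messages flowing through, it is easy to see why the client cannot learn much about what happens inside the application.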

Now exceptions, and error tracking in general, use pointers in the local address space for the science application. For BOINC to be able to track exceptions in a science application would mean that BOINC would have to act like a debugger while the science application is running which would cause a 20-30% performance decrease for all science applications, and would more than likely negate any optimizations available to an application.

We did add a little something to the BOINC API library which we internally refer to as the ‘BOINC runtime debugger’. This little chunk of code is compiled into the science application and informs the OS that if any unhandled exception happens, it needs to execute a chunk of code. Using stackwalker as a template, we expanded the functionality and improved the data returned to the project, using a Microsoft library on Windows to dump out as much information about the exception as possible. This code isn’t ever executed or used unless an unhandled exception happens within an application, so no performance decrease is experienced.
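The same pattern, a handler that is registered once and costs nothing until something goes wrong, can be sketched in Python with sys.excepthook. This is only an analogy to the mechanism described above, not the BOINC runtime debugger itself; install_crash_reporter and its sink argument are names made up for the example.

```python
import sys
import traceback


def install_crash_reporter(sink):
    """Register a handler that runs only when an exception goes unhandled.

    `sink` is any callable that accepts a string, for example a function
    that appends to a log that is later returned to the project.
    """
    def report(exc_type, exc, tb):
        sink("*** unhandled exception ***\n")
        # Dump as much information about the failure as is available.
        sink("".join(traceback.format_exception(exc_type, exc, tb)))

    # The interpreter calls sys.excepthook just before it exits because of
    # an uncaught exception; in normal operation it is never invoked.
    sys.excepthook = report
    return report
```

Because the hook is just a stored function reference, the application pays nothing for it while running normally, which matches the "no performance decrease" property described above.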

I’m going to need to write a whole different article on this topic.

3) What support does BOINC have / plan to have which relates to this category of work unit specifically?
(e.g.) some ideas, many of which may be impractical –
* Separation of graphics from the work unit so that a temporary problem with the graphics drivers doesn’t cause the WU to fail

Separation of the graphics code from the worker code will probably start at the beginning of next year. It is going to be a requirement for supporting Vista and other OSes as they strengthen their defense-in-depth models.

* Automatic backups
* Backups which are per-workunit rather than for all workunits which happen to be running

There are other tools that can be used for backups. Frankly, trying to tackle that role is complicated and really outside the design scope for BOINC.

* Callbacks from BOINC into science app to allow the science app to handle BOINC exceptions it wouldn’t normally be able to trap

What kind of exceptions do you think the science applications need to handle?

* Handling of the situation where the PC is very busy, out of memory or other resources, about to crash, TCP/IP stack blocked…)

We are adding more smarts into the CPU scheduler to handle the memory/paging cases.

Crashing is a random event; the only way you could know something is about to crash would be to already know what the bug is.

We added some code a while back to test the various communication mechanisms when BOINC is first launched; that should have taken care of the TCP/IP blocks. If you know of any cases we haven’t covered with recent builds, let me know.

How’s the progress with allowing AMS/BMS/BAM (whatever it’s called these days) to control the state of projects and WUs, such as setting NNW, or suspending a project/task?

I believe this code is in for the 5.8.x release.

Farm Managers?
Farm Manager ability came with Account Managers, but I cannot find any programs on the BOINC website to install a Farm Manager on my computer. What is it? Is it working? Or has it been abandoned?

A farm manager is an idea that James Drews had, I believe, that is geared towards managing hundreds of machines. Basically you set up a web server which acts as a private AMS. The BOINC client includes its IP address, port number, and GUI RPC password (I think) when it first connects to the farm manager. After that, if you want to do something specific to a machine, the farm manager can issue a GUI RPC just like the BOINC Manager. I’m not sure if anybody besides James has done anything about creating a farm manager package.
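A minimal sketch of that bookkeeping might look like the following. The FarmManager and HostRecord classes, the plan_rpc method, and the "suspend" request string are all hypothetical, and no networking or real GUI RPC protocol is implemented here; the sketch only shows that once a client has registered its address and password, the manager has what it needs to contact that one machine later.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class HostRecord:
    """What a client is assumed to report when it first connects."""
    ip: str
    port: int
    gui_rpc_password: str


class FarmManager:
    """Hypothetical registry of the machines in a farm."""

    def __init__(self):
        self.hosts: Dict[str, HostRecord] = {}

    def register(self, name, ip, port, password):
        # Called when a client connects for the first time.
        self.hosts[name] = HostRecord(ip, port, password)

    def plan_rpc(self, name, request):
        """Describe the call the manager would make to one specific machine."""
        host = self.hosts[name]   # raises KeyError for an unregistered host
        return {
            "connect_to": (host.ip, host.port),
            "authenticate_with": host.gui_rpc_password,
            "request": request,
        }
```

For example, after register("render-box", "192.168.1.20", 31416, "secret"), a call to plan_rpc("render-box", "suspend") returns the address and credentials needed to reach that machine; the port and names here are example values only.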

BOINCView is probably the best bet unless you come by several hundred machines.

Auto update of ‘BOINC’?

Funny you should ask this; WCG was asking about this very same thing. We’ll probably start looking into something like this for the 5.10 release.

We were always concerned that if we put something like that in place, it might be exploited through an attack vector we had never even thought of. At least with a human at the other end of the equation, the amount of damage would be limited.

Now, with WCG as a contributor, we can get the IBM security department to look things over and let us know if something is really wrong. IBM has looked over the BOINC source once already, so we are confident we have our i’s dotted and our t’s crossed, but with auto-deployment of code without user intervention you can never be too careful.

I am new to BOINC and I’m loving it, but I was wondering: are there any plans for BOINC to use the powerful new-age GPUs and PhysX processors that are perfect for floating point computations?

FluffyChicken Wrote:

I can answer the last one,
ATI (AMD) have asked BOINC if they would like help, though it would be the projects that would need the help if the GPU is capable. NVIDIA would probably need to jump in if you (we) are going to get it running on that, or somebody like Microsoft develops an easy-to-use API (Accelerator in research?).
As for PhysX, we (some members in the forum) contacted them from Rosetta@home and had no real response.
Rosetta@Home are in talks with Microsoft for the XBOX360 though, apparently.

I would just like to add that, starting with the next release, BOINC detects your video card and processor capabilities and reports them to the project. If/when a project commits to using a graphics card or physics accelerator, we could go through with the rest of the work items to turn them into a resource that can be scheduled for use.

We added the detection code so we could try to get the stats sites to break down video card usage and processor capabilities, and maybe spur the projects on to develop customized applications that harness the untapped capabilities of the machines.

It is much easier to go to a project and sell them with hard numbers than to say we think this could help you by ‘x’ amount.

To submit questions for next week just click on the comments link below and submit your question.

Thanks in advance.

----- Rom

Previous Articles:

BOINC Q&A — 13/10/06
BOINC Q&A — 10/06/06
BOINC Q&A — 09/30/06
BOINC Q&A — 09/22/06