BOINC Q&A — 20/10/2006

Can we get more (unlimited – well, within reason!) preferences than home, school and work? Three profiles isn’t enough for me and I’m only running a small number of computers. I know these can be overridden (although the project preferences for Rosetta, i.e. runtimes, cannot). I’d find it really useful if these profiles could be added to as required, and please can you make them renamable?

I believe the account manager folks are working on some features which will allow greater configuration flexibility. The BOINC client is capable of dealing with a greater number of zones; there just hasn’t been an easy way of configuring them on a project’s web site. Rytis is now at the helm of the project web site and forum features. I’m looking forward to seeing what he is going to cook up.

Also, any update on BOINC on the consoles?

Well, there is a lot of buzz, but nobody has signed on the dotted line yet. David and Eric are going to a Sony R&D center next week to meet some engineers for the PS3. I haven’t heard anything new about the XBOX 360. The XNA Game Studio from Microsoft is a bust for BOINC: it assumes all of the game code is going to be managed code on the 360. So that leaves us needing the same development kit that the professional game studios use.

Again, more of a request: I am attached to a lot of projects, and when I need to take a box out of service (without throwing away WUs) I have to click “no new tasks” over 30 times. A bit tedious, especially over VNC. A global (per host) “no new tasks” button would be of great use to me.


Is the global update ever returning? Although I can see where it can be abused.

Right now many things are on hold until after we can get the BSG out the door. Tentatively I have some time allocated to re-work the Advanced UI and playing around with Vista has inspired me on how to handle the multi-selection cases in a list view control. We shall see though.

‘Retry Communications’ is about as close as you’re going to get to an ‘update all’ type of function. It basically resets the countdown timer for any pending action.

With regards to the whole ‘-return_results_immediately’ thing, from a project perspective it is altogether evil. I’ll write up another post about that separately.

1) What are the typical things which cause the work unit to fail?
(Environmental – antivirus, graphics drivers, excessive overclocking, PC crashes, playing games for hours, video encoding, etc.
Human factors – misunderstanding BOINC messages, for example an incorrect URL – they detach and attach, then get upset that x months of work is ‘down the pan’. Ditto installation of the Berkeley version over the BBC version: easy to fix, but they don’t know how.)

You have nailed the majority of cases. I mean we could go off into the really obscure cases like cosmic rays and the like, but you covered all the things in the majority case.

In the future we won’t be allowing a directory name change for any software package that we build for others, so that should take care of any potential future BBC issues. Now before you all think I’m making up the whole cosmic ray thing here is an article from ZDNet about eBay suffering one to two crashes a month due to a defect in their ECC memory which left them prone to cosmic rays.

2) Is there anything which can be done to avoid these, either by the science app or by Boinc itself?
(Uploading partial results as the WU runs. Exception handlers, both at science app and callbacks at boinc? Restart from checkpoint/backup if error code 0,-107…,etc etc received? Going into hibernation if PC is very busy, out of memory, etc)

This is one of those really cool but really tough questions. Each environment handles things a bit differently. About the best advice I can give is for each project to really understand how the programming language they are using interacts with the operating system they are using.

CPDN is advancing the trickle model to the point where they could resend a workunit that has timed out, take the previous user’s trickles, and reuse them as the starting point of the new work unit.

One thing I would like to point out is that BOINC itself cannot do anything about a science application failure except fail the workunit and move on to the next one. To BOINC each of the science applications is a little black box, and the only way BOINC knows anything about what is going on inside is through a little 8k chunk of shared memory broken up into 8 channels. Simple commands are passed around in these channels, like show graphics, hide graphics, and here is the amount of CPU time I’ve used.
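
As a rough illustration of that layout, here is a minimal Python sketch of an 8k segment split into eight fixed-size channels. The equal 1 KB channel size, the NUL-terminated message format, the channel numbers, and the message strings are all assumptions made for the example; the text above does not describe the actual wire format, and the real client is not written in Python.

```python
SEGMENT_SIZE = 8 * 1024                        # the 8k chunk of shared memory
NUM_CHANNELS = 8
CHANNEL_SIZE = SEGMENT_SIZE // NUM_CHANNELS    # assumed: eight equal 1 KB slices


class Segment:
    """Toy stand-in for the shared memory block: one pending message per channel."""

    def __init__(self):
        self.buf = bytearray(SEGMENT_SIZE)

    def _start(self, channel):
        if not 0 <= channel < NUM_CHANNELS:
            raise ValueError("no such channel")
        return channel * CHANNEL_SIZE

    def send(self, channel, message):
        """Write a message into a channel; return False if the slot is still full."""
        start = self._start(channel)
        data = message.encode("ascii")
        if not data or len(data) >= CHANNEL_SIZE or 0 in data:
            raise ValueError("message must be 1..%d bytes with no NUL" % (CHANNEL_SIZE - 1))
        if self.buf[start] != 0:
            return False                       # previous message has not been read yet
        self.buf[start:start + len(data)] = data
        self.buf[start + len(data)] = 0        # NUL terminator
        return True

    def receive(self, channel):
        """Read and clear the pending message, or return None if the channel is empty."""
        start = self._start(channel)
        if self.buf[start] == 0:
            return None
        end = self.buf.index(0, start, start + CHANNEL_SIZE)
        message = self.buf[start:end].decode("ascii")
        self.buf[start] = 0                    # mark the slot as free again
        return message
```

The point of the sketch is how little can cross the boundary: short, fixed-slot messages, with nothing that would let one side inspect the other side's internal state.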

Now exceptions, and error tracking in general, use pointers in the local address space for the science application. For BOINC to be able to track exceptions in a science application would mean that BOINC would have to act like a debugger while the science application is running which would cause a 20-30% performance decrease for all science applications, and would more than likely negate any optimizations available to an application.

We did add a little something to the BOINC API library which we internally refer to as the ‘BOINC runtime debugger’. This little chunk of code is compiled into the science application and informs the OS that if any unhandled exceptions happen, it needs to execute a chunk of code. Using stackwalker as a template, we expanded the functionality and improved the data returned to the project, using a Microsoft library on Windows to dump out as much information about the exception as possible. This code isn’t ever executed or used unless an unhandled exception happens within an application, so no performance decrease is experienced.
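
The same register-a-handler-and-pay-nothing-until-it-fires idea can be sketched in Python with the interpreter's `sys.excepthook`. This is only an analogy, not the BOINC runtime debugger itself: the function names `make_crash_hook` and `install_crash_hook` and the report text are made up for the example.

```python
import sys
import traceback


def make_crash_hook(write):
    """Build a hook that reports an unhandled exception through write(text)."""

    def hook(exc_type, exc_value, exc_traceback):
        write("unhandled exception in science application\n")
        # Dump as much as we can about where the failure happened.
        write("".join(traceback.format_exception(exc_type, exc_value, exc_traceback)))

    return hook


def install_crash_hook(write=sys.stderr.write):
    """Register the hook; it only runs if an exception escapes every handler."""
    sys.excepthook = make_crash_hook(write)
```

Calling `install_crash_hook()` once at startup is the whole cost. Nothing else runs until an exception goes unhandled, which mirrors the no-performance-decrease property described above.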

I’m going to need to write a whole different article on this topic.

3) What support does Boinc have / plan to have which relate to this category of work unit specifically?
(e.g.) some ideas, many of which may be impractical –
* Separation of graphics from the work unit so that a temporary problem with the graphics drivers doesn’t cause the WU to fail

Separation of the graphics code from the worker code will probably start at the beginning of next year. It is going to be a requirement for supporting Vista and other OS’s as they increase in their defense in depth models.

* Automatic backups
* Backups which are per-workunit rather than for all workunits which happen to be running

There are other tools that can be used for backups. Frankly, trying to tackle that role is complicated and really outside the design scope for BOINC.

* Callbacks from Boinc into science app to allow the science app to handle boinc exceptions it wouldn’t normally be able to trap

What kind of exceptions do you think the science applications need to handle?

* Handling of the situation where the PC is very busy, out of memory or other resources, about to crash, TCP/IP stack blocked…)

We are adding more smarts into the CPU scheduler to handle the memory/paging cases.

Crashing is a random event, the only way you could know something is about to crash would be to already know what the bug is.

We added some code a while back to test the various communication mechanisms when BOINC is first launched, which should have taken care of the TCP/IP blocks. If you know of any cases we haven’t covered with recent builds, let me know.

How’s the progress with allowing AMS/BMS/BAM (whatever it’s called these days) to control the state of projects and WUs, such as setting NNW, or suspending a project/task?

I believe this code is in for the 5.8.x release.

Farm Managers ?
Farm Manager ability came with Account Managers, but I cannot find any programs on the BOINC website to install a Farm Manager on my computer. What is it? Is it working? Or has it been abandoned?

A farm manager is an idea that James Drews had, I believe, that is geared towards managing hundreds of machines. Basically you set up a web server which acts as a private AMS, and the BOINC client includes its IP address, port number, and GUI RPC password (I think) when it first connects to the farm manager. After that, if you want to do something specific to a machine, the farm manager can issue a GUI RPC just like the BOINC Manager. I’m not sure if anybody besides James has done anything about creating a farm manager package.

BOINCView is probably the best bet unless you come by several hundred machines.

Auto update of ‘BOINC’ ?

Funny you should ask this, WCG was asking about this very same thing. We’ll probably start looking into something like this for the 5.10 release.

We were always concerned if we had put something like that in place it might be exploited by an attack vector we never even thought of. At least with a human at the other end of the equation the amount of damage would be limited.

Now with WCG as a contributor we can get the IBM security department to look things over and let us know if something is really wrong. IBM has looked over the BOINC source once already so we are confident we have our i’s dotted and our t’s crossed but with auto-deployment of code without user intervention you can never be too careful.

I am new to BOINC and I’m loving it, but I was wondering: are there any plans for BOINC to use the powerful new age GPUs and PhysX processors that are perfect for floating point computations?

FluffyChicken Wrote:

I can answer the last one,
ATI (AMD) have asked BOINC if they would like help, though it would be the projects that would need the help if the GPU is capable. NVIDIA would probably need to jump in if you (we) are going to get it running on that, or somebody like Microsoft develops an easy to use API (Accelerator in research?)
As for PhysX, we (some members in the forum) contacted them from Rosetta@home and had no real response.
Rosetta@Home are in talks with Microsoft for the XBOX360 though, apparently.

I would just like to add that as of the next release BOINC detects your video card and processor capabilities and reports them to the project. If/when a project commits to using a graphics card or physics accelerator we could go through with the rest of the work items to turn them into a resource that can be scheduled for use.

We added in the detection code so we could try and get the stats sites to break down video card usage and processor capabilities, maybe spur on the projects to develop specific customized applications to harness the untapped capabilities of the machines.

It is much easier to go to a project and sell them with hard numbers than to say we think this could help you by ‘x’ amount.

To submit questions for next week just click on the comments link below and submit your question.

Thanks in advance.

—– Rom

Previous Articles:

BOINC Q&A — 13/10/06
BOINC Q&A — 10/06/06
BOINC Q&A — 09/30/06
BOINC Q&A — 09/22/06

References:

http://news.zdnet.com/2100-9595_22-525403.html
http://www.codeproject.com/threads/StackWalker.asp

Open Source Project Analysis and BOINC

Yesterday Scott Hanselman blogged about an analysis tool called ‘Ohloh’. I checked out what it had to say about BOINC.

Direct project url:
http://www.ohloh.com/projects/3215

It generates many charts and graphics about the changes it detects in the source tree over time. I have looked over quite a few things and got to heckle David a bit about some of the graphs. I really got a kick out of this chart:

To be fair though I need to point out that David checks in code using ‘davea’, ‘boincadm’, and ‘sorabji’ depending on where or when he has checked in code.

This is one chart we both got a kick out of:

How cool is that?

—– Rom

BOINC Q&A — 13/10/06

Advanced Memory Management, what is the idea/aim behind that?

Well, that is a good question. The advanced memory management is more about setting boundary conditions on how much memory BOINC and related processes are allowed to use.

We still get a few reports of BOINC causing systems to become unresponsive or sluggish. Most of the investigations we have done revealed a machine that was paging a lot during the times BOINC was running. Paging is the process the OS uses to free up less frequently used memory to make room for active tasks by writing those pages of memory to disk. Each page of memory is roughly 4KB in size on an x86 processor.

So let’s say you are running a machine with 512MB of memory. Windows XP uses roughly 128MB of that on boot-up and will allow parts of itself to be paged out to disk. The last round of virus scanners I looked at want around 100MB of memory, and the little system tray icons in the lower right part of your screen generally take about 5MB apiece, with the notable exception of the various IM clients, which have bloated out to 20-60MB apiece. Any additional programs running on your machine, such as a web browser or email client, can take anywhere from 20MB up to 100MB.

When the OS comes under memory pressure it starts looking for chunks of memory that haven’t been touched in a while, writes them out to disk, and then loads something more relevant into that chunk of memory.

So let us say that you are attached to R@H and you walk away from your computer for an hour or so. During that time R@H has used over 256MB of memory continuously for at least 30 minutes, and the OS has had to page a lot of stuff out to make room for it, including parts of itself. When you come back, your Start menu has to be reread from disk, as does whichever application you happened to be using before you left. All of that paging takes a few moments and makes your computer feel really slow.
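
To put rough numbers on that scenario, here is a small Python sketch of the arithmetic. The sizes for Windows XP, the virus scanner, and the 4KB page come from the paragraphs above; the count of four tray icons and the mid-range figures picked for the IM client and web browser are assumptions for illustration only.

```python
PAGE_KB = 4  # approximate x86 page size quoted above


def free_mb(total_mb, resident_mb):
    """Memory left over once everything already resident is accounted for."""
    return total_mb - sum(resident_mb.values())


def pages_to_evict(needed_mb, available_mb):
    """Number of pages the OS must write out before the new demand fits."""
    shortfall_mb = max(0, needed_mb - available_mb)
    return shortfall_mb * 1024 // PAGE_KB


resident = {
    "windows_xp": 128,
    "virus_scanner": 100,
    "tray_icons": 4 * 5,   # assumed: four icons at about 5MB apiece
    "im_client": 40,       # assumed: middle of the 20-60MB range
    "web_browser": 60,     # assumed: middle of the 20-100MB range
}

available = free_mb(512, resident)         # 164 MB left on the 512MB machine
evicted = pages_to_evict(256, available)   # 23552 pages to make room for 256MB
```

Under these assumptions a 256MB science application is about 92MB short, so tens of thousands of pages belonging to other programs have to go to disk and come back later, which is the slowness described above.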

With the introduction of this feature we hope we can finally close one of the last remaining gaps in system responsiveness for the user.

Right now we have the following two settings planned:

  1. Percentage of memory use while user is active.
  2. Percentage of memory use while user is idle.

What should happen is that BOINC will detect how much memory is installed on the machine, and every 10 seconds or so look at how much memory a science application is using. If a science application exceeds the total allotment, BOINC will shut it down and look for another application to schedule.

I’m really looking forward to this feature since my 2GB machine uses about 1.2GB of memory without BOINC even running and I have four processors to feed. Up until the middle of last year I only had 1GB in my machine and if I had BOINC running it was pretty painful when BOINC rescheduled all the science applications on the machine while I was working.
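
The check described above could be sketched along these lines. This is a guess at the shape of the logic rather than the planned implementation: the function names, the default percentages (50% while active, 90% while idle), and the task names are all hypothetical.

```python
def memory_limit_mb(installed_mb, user_active, pct_active=50, pct_idle=90):
    """Allotment for science applications, taken from the two planned settings."""
    pct = pct_active if user_active else pct_idle
    return installed_mb * pct // 100


def tasks_over_limit(usage_mb_by_task, limit_mb):
    """Names of the tasks whose memory use exceeds the current allotment."""
    return sorted(name for name, used in usage_mb_by_task.items() if used > limit_mb)


# One pass of the periodic check, with made-up numbers for a 2GB machine.
limit = memory_limit_mb(2048, user_active=True)                        # 1024
to_stop = tasks_over_limit({"task_a": 1200, "task_b": 300}, limit)     # ["task_a"]
```

A client would run a pass like this every 10 seconds or so, stop whatever lands in `to_stop`, and then ask the scheduler for something smaller to run instead.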

Scheduler Improvements (already implemented?) how do these help ?

As far as I know John McLeod has finished the work on the new scheduler and work-fetch policy. The new system should reduce the number of cycles wasted between the last checkpoint for an application and the point where it needed to quit due to a reschedule to honor resource shares.

John is really the wizard in this area.
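
To make the wasted cycles concrete: whatever an application computed after its last checkpoint has to be redone when it is stopped and later restarted. The Python sketch below shows that cost, plus one possible way to reduce it by waiting for a checkpoint before switching. The waiting policy, the function names, and the `max_wait_seconds` parameter are assumptions for illustration, not a description of the new scheduler.

```python
def lost_cpu_seconds(last_checkpoint_at, preempted_at):
    """CPU time that must be redone after restarting from the last checkpoint."""
    return max(0.0, preempted_at - last_checkpoint_at)


def preempt_now(switch_due, just_checkpointed, overdue_seconds, max_wait_seconds):
    """Switch at a checkpoint when possible, but never wait longer than the cap."""
    if not switch_due:
        return False
    return just_checkpointed or overdue_seconds >= max_wait_seconds
```

For example, stopping a task at t=460s whose last checkpoint was at t=100s throws away 360 seconds of work, while stopping it right after a checkpoint throws away almost nothing.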

How are any other improvement going to improve us? and the projects?

I believe the two major work items over the next year will probably be letting projects use torrents in their file download process, and letting projects send out science applications optimized for each processor type, possibly including GPU-enabled applications.

Is there anybody working on boinczilla? Bug reports are rising and nobody is sorting them out :/

My bad, I’ll see what I can do about that this weekend.

Why not run the benchmark at higher priority, so each system produces a constant value rather than a haphazard one, particularly as it occurs only every 5 days?

The idea behind running the benchmarks at the same priority level as the science applications is to get a rough idea of how many cycles the science applications will get. If you run the benchmarks at a normal thread priority it won’t be that much more consistent, and if you run them at the highest thread priority a user mode application can have, you’ll get numbers that are not very realistic for a science application running as an idle process.

The systems are benchmarked every 5 days or so to handle changes to the environment, such as a more resource intensive virus scanner or any content indexing systems that might have been installed.

When are we going to see the first alpha/beta with the BSG?

Hopefully next week.

With regard to the idea of switching tasks at a checkpoint, what happens (as in, are there any checks etc.) when an application gets “stuck” and doesn’t make any progress? This also applies to a similar situation with current apps, where they get stuck and the client tries and tries to get it done by the approaching deadline, but obviously never will. This pushes the client into NNW and EDF. Will BOINC abandon the unit if no progress is made, or the deadline is met?

To be honest, I don’t know. I’ll have to bug John and David about that.

Is there any possibility of releasing 5.6.4 or 5.6.5 as alternate versions?

I don’t intend to put them on the download page. But if you are comfortable enough with the quality of the client to recommend it to people, then go ahead and give them the link. I think we were far enough along in the testing process to know it isn’t going to cause any major problems, and it might have had only a few small bugs left before it was ready to be released.

The reason for not adding it to the download page is that people would then receive a message in the message log requesting they upgrade to it. If all goes according to plan we’ll be able to release 5.8 in a few weeks, and it would be a bad experience to bug people about upgrading twice in one month.

I suspect that if somebody was experiencing a bug that is fixed in 5.6 they would be happy to start using it now and not be so annoyed when they see the upgrade notice for 5.8.

Is there any chance of a purge function being implemented?

I haven’t heard any talk of one. I’ll bring it up with David, it sounds like something a project might want.

Hot topic: Why is the hourly benchmark value different between Linux and Windows, or so it’s claimed? When done with stock BOINC 5.4.9, e.g. on Windows it kicks out 8.1 per hour; when the same is done under Linux, it kicks out 5.0. The WUs are processed at equal speed, i.e. a job on Windows taking 2 CPU hours would take near equal time on Linux.

It has been my experience that the Microsoft compiler has been better at optimization than the GCC compiler. I’m sure I’ll get flamed by the OSS crowd but most of the projects are experiencing the same result.

I should point out that the optimizers have been able to equal things out by a lot of trial and error by turning off and on the various optimization switches for GCC.

If the optimizers want to submit a patch that contains different non-CPU specific optimizations I’m sure we could use them.

To submit questions for next week just click on the comments link below and submit your question.

Thanks in advance.

—– Rom

BOINC 5.6 Release Canceled

Recently it was announced that BOINC 5.6 has been canceled to concentrate on the 5.8 code (as 5.7 for the time being).
Why is this, and why can you not release 5.6 as it is now?

The BSG (BOINC Simple GUI) is nearing completion and the 5.6 release was nearing completion but wasn’t done baking yet.

After looking over the schedules it became pretty clear that managing two different test efforts was going to create a lot of confusion and management hassles.

We believe we have stabilized most, if not all of the 5.6 features, and the remaining testing work will be focused on the BSG and improved memory management support. We believe we are a few weeks out from having a stable BSG build ready for the public, so instead of asking the community at large to do two back-to-back upgrades within a month, we decided to bag 5.6 and focus on 5.8.

I personally believe this is for the best.

If the tech savvy people want to check out 5.6 then by all means go ahead and play around with it. We won’t be releasing any bug fix releases for that version of the client though.

—– Rom

BOINC Q&A — 10/06/06

I was wondering if you could shed some light on DNS caching, and why the BOINC client apparently keeps records for days, which would seem to ignore the TTLs associated with the records? (the recent DNS changes for Leiden would indicate this; requiring a client restart)

Actually libCurl handles all the DNS stuff. We just pass the server name to libCurl and it handles all the OS details. I took a quick peek at the libCurl source and it looks like they have an internal DNS cache. It also appears that they have a way to expire the DNS cache entries. It isn’t clear to me at the moment if we are supposed to call an API to expire DNS cache entries or if that is handled automatically as part of the easy API set.

I’ll look into it a bit more to see if I can figure it out.
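
For readers wondering what expiring a DNS cache entry means in practice, here is a minimal Python sketch of a resolver cache that honors a timeout. It is not libCurl's code: the class name, the injected `resolve` function, and the fixed timeout are all assumptions for the example. (libCurl does document a `CURLOPT_DNS_CACHE_TIMEOUT` option for its own cache; whether BOINC sets it is exactly the open question above.)

```python
class DnsCache:
    """Toy host-to-address cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, resolve):
        self.ttl_seconds = ttl_seconds
        self.resolve = resolve          # function: host name -> address
        self.entries = {}               # host -> (address, time stored)

    def lookup(self, host, now):
        cached = self.entries.get(host)
        if cached is not None and now - cached[1] < self.ttl_seconds:
            return cached[0]            # still fresh, no new query
        address = self.resolve(host)    # expired or missing: ask again
        self.entries[host] = (address, now)
        return address
```

If the expiry test were skipped, or the timeout were effectively infinite, the first answer would be returned until the process restarted, which matches the symptom reported for the Leiden DNS change.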

To submit questions for next week just click on the comments link below and submit your question.

Thanks in advance.

—– Rom

BOINC Q&A — 09/30/06

Can you explain more on Average CPU efficiency and Result Duration Correction Factor? There seems to be some confusion about this, and little definite knowledge. For instance, some say a lower RDCF is better, others say an RDCF closer to 1.0 is best. Which is the truth?

CPU efficiency is the ratio of how much CPU time a process received to the amount of wall clock time that has passed. It is the answer to the question of “In the last ten minutes or so, how much CPU did BOINC based science applications receive?” The thing to remember here is that the OS is constantly doing things in the background and each of those things eats a little bit of the CPU.

Duration Correction Factor is a per project value that measures the difference between the expected time to process a result based on the benchmark versus what it actually took. A score of 1.0 means that the benchmark and the application processing time are in sync. The further the score is from 1.0, the greater the variance between what the benchmarks predict versus what it actually took to complete the result.

BOINC tries very hard not to ask for more work than it can actually process in a given period of time, so it tries to keep track of the machine overhead by the CPU efficiency score and Duration Correction Factor. Another thing to keep in mind is that memory speed plays a big part in the Duration Correction Factor. When you see similar processing times for a result for a 3.0GHz processor and a 2.0GHz processor it normally means that the 3.0GHz processor is running with memory that cannot keep up with the processor. Or that both processors are bottlenecked by the memory speed.

We haven’t come up with a good solution for measuring the memory bandwidth problem yet. However, we are working on it.
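
A small Python sketch may help show how the two numbers fit together. The formulas are assumptions chosen to match the description above (efficiency as CPU time over wall clock time, the correction factor as actual time over benchmark-estimated time); the exact calculations in the client are not spelled out here, and the function names are made up.

```python
def cpu_efficiency(cpu_seconds, wall_seconds):
    """Fraction of elapsed wall clock time the science applications got on the CPU."""
    return cpu_seconds / wall_seconds


def duration_correction_factor(actual_seconds, estimated_seconds):
    """Assumed form: 1.0 means the benchmark estimate matched reality."""
    return actual_seconds / estimated_seconds


def corrected_estimate(estimated_seconds, dcf):
    """Benchmark-based estimate scaled by what past results actually took."""
    return estimated_seconds * dcf


def results_that_fit(wall_seconds, efficiency, seconds_per_result):
    """How many results can realistically finish in the given wall clock window."""
    return int(wall_seconds * efficiency // seconds_per_result)
```

With 540 CPU seconds received in a 600 second window the efficiency is 0.9. If results estimated at one hour have actually been taking two, the factor is 2.0, the corrected estimate is 7200 seconds, and only 10 such results fit into a day rather than the 24 the raw benchmark would suggest.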

BOINC version release notes do not seem as complete as they were before, or am I looking in the incorrect places?

You can check out the latest and greatest changes to BOINC at this web address:
http://setiathome.berkeley.edu/cgi-bin/cvsweb.cgi/boinc/

The file you’ll want to look at is ‘checkin_notes’ which contains the latest changes made to the client and server packages.

You can see the check-in history for a specific branch by changing the tag specified near the bottom of the web page. The 5.6 branch tag is ‘boinc_core_release_5_6’. If you want to see the changes for 5.4 you would use ‘boinc_core_release_5_4’ and on it goes.
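
The tag naming pattern can be written down as a tiny Python helper. Only the 5.4 and 5.6 tags are confirmed above; the assumption here is that other releases follow the same pattern, and the function name is made up for the example.

```python
def branch_tag(version):
    """Turn a release number like '5.6' into its CVS branch tag."""
    major, minor = version.split(".")
    return "boinc_core_release_%d_%d" % (int(major), int(minor))
```

For example, `branch_tag("5.4")` gives `boinc_core_release_5_4`, which is the value to enter in the tag box near the bottom of the page.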

Any plans on releasing the full minutes of what went on (when you’re back)? I read up on the 1st one but was a bit disappointed with the info on show; it only gave a brief overview of what went on.

You can find the workshop proceedings here:
http://boinc.berkeley.edu/ws_06.php

How was the vacation?

I had a blast. I met a bunch of great people. I’m looking forward to going again next year.

wxWidgets 2.7 has been released. Is this going to be used in 5.6 or is it too late?

Too late for this release.

I’ve seen and myself tried to compile 5.4.x using Microsoft’s free Visual Studio Express 2005 editions (with all the bits and bobs needed: wxWidgets, the SDK, etc.). Errors show up and it does not compile. Is this fixed in 5.6, given this would probably be the major environment used by people trying to develop BOINC under Windows (since it’s free)?

The BOINC DLL relies upon the ATL libraries which are not included in the express editions of the MS Development tools. I’m not sure if this is going to change in the future or not. I suspect that if we can incorporate a torrent library that doesn’t use COM/DCOM on Windows then I’ll invest more time into removing the need for ATL/COM/DCOM so that the DLL can be built with VS Express.

On a side note, I do not believe the express editions of the Visual Studio toolset contain the optimizing compilers or linkers. You might have to upgrade for those, or use the GCC toolset.

Would it be possible for you (since afaik you compile the final Windows releases) to put up instructions on how you compile BOINC? This may help a lot of people who just wish to dabble.

I’ll see what I can do.

How is the progress on low-latency-computing? Which projects expressed their interest in this feature?

I believe this feature was put in for a hospital who wished to be able to process MRI images faster than their current method. I’m not sure this feature will be used by a public project in the near future.

To submit questions for next week just click on the comments link below and submit your question.

Thanks in advance.

—– Rom

BOINC 2nd Pan-Galactic Workshop

This year’s workshop was very informative. It was really exciting to see the project leaders for some of the large projects, as well as to meet the project leaders for projects like PrimeGrid and Chess960@Home, who started the projects as hobbies and who pay for them out of donations and out of their own pockets.

Here is a list of projects that I remember seeing right off the top of my head, sorry if I missed anybody:

  • BOINCStats
  • GridRepublic
  • S@H
  • CPDN
  • PrimeGrid
  • Chess960@Home
  • LHC@Home
  • Africa@Home
  • malariacontrol.net
  • QMC@Home
  • SIMAP
  • R@H
  • WCG
  • Condor

On the first day, David gave an overall BOINC update of everything that has been accomplished within the last year, since the first pan-galactic workshop, and various projects described what they had been up to. People also had a chance to list out what topics of interest they had for breakout sessions on the second day.

We wrapped up after the first day and went out to eat dinner at a fairly nice place to eat. I had to leave early since the motel I was staying at closed their gates at 11pm.

The next day we spent the morning in breakout sessions which included some of the following topics:

  • Credits
  • Grid Integration
  • Security
  • AMS Issues
  • Project Issues
  • Science Application Feature requests

I attended the sessions on security and project issues.

I can’t wait to see the summary roll-up for all the breakout sessions.

After lunch, we assembled and brainstormed about how to attract new members and new projects.

The last session of the day was for David and me to listen to everybody’s wish list of features, which included the server-side code for CPU feature scheduling, torrent downloads, some more work refining our sample application, and others.

All-in-all it was a great event. Now we just need to come up with a plan to implement it all.

Catch you all later. I’m still on vacation and will return to my normal schedule after the 28th.

If you all want to see additional pictures from the conference you can go here.

—– Rom

BOINC Q&A — 09/22/06

I would just like to know more info about the kind of technical challenges you will face whilst attempting to port BOINC over to them?

At this time I really don’t know. Until I can get some documentation on the supported API sets it’ll be hard to give any answer. The only thing I’m pretty sure about is that the management interface will have to be written from scratch for each console.

How is the simple GUI coming on any ETA on this?

As far as I know it is almost feature complete. The guys at WCG are probably going to kickstart the beta process with their WCG beta testers since it looks as though we’ll still be putting the finishing touches on 5.6. I believe we’ll start to see clients with the BOINC Simple GUI (BSG) in a few weeks.

Does it work well from what you have seen?

Kevin Reed has been doing an awesome job polishing up the interface. I like what I see and this will give us a way to add more to the Advanced GUI without having to worry about confusing people.

How is the sand-boxing of BOINC on Windows coming along?

I haven’t started that work item yet. I probably won’t be able to start until we have completed our agreements with WCG.

When will 5.6 be released and what are the main improvements?

The best answer I can give about when it will be released is when it is ready. Improvements include a new CPU scheduler and work-fetch policy, CPU feature detection, and video card detection. This release will also be the first in which the BOINC Manager (BM) uses GTK2 on Linux, and the first to sandbox science applications on the Mac.

Will we ever get column sorting in BOINC Manager?

Someday.

Talk between BOINC projects and BOINC and vice versa seems minimal (and project to project). No one seems to know what each other is doing. Could there be an improvement of talk between them, like modifications to the servers, message boards, stats and developments in the pipeline?

I believe that is what the BOINC Workshops are about. We just finished the second one last week. It was productive and the projects gave us a bunch of work to do. I believe David is going to publish the results of the workshop within a week or two. I won’t know the timeframe until after I return from vacation.

Where does BOINC need help?

Here is a list of things we have identified that we would like to have. Contact David before starting anything though in case somebody else has already started the project. That list is likely to grow after the notes of the second workshop have been processed.

Can you explain more on Average CPU efficiency and Result Duration Correction Factor? There seems to be some confusion about this, and little definite knowledge. For instance, some say a lower RDCF is better, others say an RDCF closer to 1.0 is best. Which is the truth?

I’ll have to get back to you on that. I need to look over the code again.

BOINC version release notes do not seem as complete as they were before, or am I looking in the incorrect places?

We are moving toward the Firefox style of release notes. For a detailed list of changes you’ll need to look up the checkin_notes for a specific release. I’ll follow up in another post on how to do that.

Any plans on releasing the full minutes of what went on (when you’re back)? I read up on the 1st one but was a bit disappointed with the info on show; it only gave a brief overview of what went on.

I left the note taking to David and a few others, we’ll just have to wait and see what is published. I wouldn’t have been able to type fast enough to keep up with all the discussion and I still needed to be able to answer questions.

Sorry for being late with this week’s Q&A but I’m still on vacation. 🙂

To submit questions for next week just click on the comments link below and submit your question.

Thanks in advance.

—– Rom