Latest BOINC happenings

It is time to tell us what you think. We are conducting a poll to determine where the hot spots are for what needs to happen with BOINC. We welcome all kinds of feedback. The more people who respond and the better coverage we get, the more we can improve BOINC and help the projects improve their overall experience.

You can find the poll here:
http://boinc.berkeley.edu/poll.php

The results are published here:
http://boinc.berkeley.edu/poll_results.php

I turned on my TV this weekend to catch up on some of my recordings and I found this waiting in my recorded queue:
Rosetta Presentation

I have my media center set up to record any Computer Science Colloquium from the University of Washington that comes on UWTV. This one happens to be David Baker of R@H giving a presentation to the computer science students about how Rosetta works and how they use the results. He even gave BOINC a plug and discussed how R@H was changing how they do things.

We have had some nice press within the last week. Here are some of the articles:
Use your computer idle time for a great cause
Putting your computer to work to fight against malaria in Africa
Coming down to Earth

BOINC Application Optimization: The Good, the Bad, and the Ugly, Part II

Here is another little misconception:
CPU Capability detection coming to a BOINC client near you soon

BOINC does not use the processor’s CPUID instruction to determine what instruction sets are supported.

Using CPUID is a good idea if you are an OS, but it is a bad idea if you are an application. There are two parts to the supported instruction set problem: one is the CPU and the other is the OS. If your OS doesn’t support the desired instruction set, you are just asking for trouble.

Here is an example of what I mean. Let us take Windows 95 Gold without any patches, and a modern single core processor.

Back when Windows 95 was released, MMX and 3DNow were just taking off and SSE wasn’t mainstream. MMX reused the standard 80-bit x87 floating point registers, so the OS really didn’t have to do anything new to support it. A thread context therefore only had to cover the general purpose registers, floating point stack registers, and debug registers, and everything for the thread stayed consistent when the OS switched execution to another thread.

Now enter modern processors. The introduction of SSE added new registers to the CPU: XMM0-XMM7. If the OS doesn’t know about those registers, it cannot save them before moving on to another application. In the worst case scenario you could have data from your favorite DVD interfering with a BOINC science application, since they would both be overwriting one another’s register values. Picture E@H processing TV signals and your DVD player displaying E@H data as video artifacts.

It appears Intel and AMD created something known as enhanced protected mode, which exposes the additional SSE registers. If the OS doesn’t initialize itself as enhanced protected mode aware, the registers stay hidden from both applications and the OS. So if you attempt to run an SSE application on a CPU/OS combination that doesn’t support it, you should get an illegal instruction error or privileged instruction error.

Apparently AMD decided to increase the number of registers available on the AMD64 line from 8 to 16, but I don’t currently know if this is only for 64-bit OSes or for both 32-bit and 64-bit OSes. If there isn’t a special safeguard the OS has to follow, things could end up like the DVD player scenario I mentioned above.

So BOINC asks the OS which instruction sets are supported. If the OS can detect an instruction set, it should support it.

On Windows, the function is called IsProcessorFeaturePresent().

On Linux, we currently read /proc/cpuinfo.

On the Mac we’ll probably be using sysctl().
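
To make the Linux case a little more concrete, here is a minimal Python sketch of reading the feature flags the kernel reports in /proc/cpuinfo. It is not BOINC's actual code; the function names `parse_cpu_flags` and `os_reported_flags` are made up for illustration, and it assumes an x86 kernel that labels the line `flags`.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the feature flags listed on the first 'flags' line, if any."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Assumption: x86 Linux kernels label this line "flags".
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def os_reported_flags(path: str = "/proc/cpuinfo") -> set[str]:
    """Read the flags from the running kernel; empty set if the file is unavailable."""
    try:
        return parse_cpu_flags(Path(path).read_text())
    except OSError:
        return set()


if __name__ == "__main__":
    flags = os_reported_flags()
    for name in ("mmx", "sse", "sse2"):
        print(name, "yes" if name in flags else "no")
```

The point of going through the OS rather than CPUID is that the answer comes from something that already knows what it is prepared to save and restore for a thread.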

—– Rom

References:
http://en.wikipedia.org/wiki/IA-32

BOINC Application Optimization: The Good, the Bad, and the Ugly

Somebody pointed out a thread to me on E@H:
http://einstein.phys.uwm.edu/forum_thread.php?id=4480

I have to say that I’m a little shocked at some of the attitudes I’ve seen from some of the participants.

First let me clear up some misunderstandings about what validators and assimilators for a BOINC server cluster are supposed to do. Validators only check to make sure there is agreement between the machines that have crunched the same workunit. If all of the machines agree on what the numbers are, then the results are considered valid and flagged for assimilation. Assimilators just copy the result data from the BOINC database/file system to the project’s internal database for analysis. After assimilation a result finally has meaning in the context of the project’s goal. Prior to that it is a collection of numbers, and BOINC doesn’t have a clue if they are correct or not.
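
As a rough illustration of "agreement between machines", here is a small Python sketch. It is not the BOINC validator; the names `results_agree` and `find_canonical`, the quorum of 2, and the relative tolerance are all assumptions picked for the example. Notice that it never asks whether the numbers are right, only whether enough results match each other.

```python
import math


def results_agree(a, b, rel_tol=1e-5):
    """True if two results have the same length and every pair of values is close."""
    if len(a) != len(b):
        return False
    return all(math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b))


def find_canonical(results, quorum=2, rel_tol=1e-5):
    """Return the index of the first result that enough others agree with, else None."""
    for i, candidate in enumerate(results):
        matches = sum(
            1
            for j, other in enumerate(results)
            if j != i and results_agree(candidate, other, rel_tol)
        )
        # The candidate itself counts toward the quorum.
        if matches + 1 >= quorum:
            return i
    return None


if __name__ == "__main__":
    returned = [[1.0, 2.0], [1.0, 2.000001], [1.0, 2.5]]
    print(find_canonical(returned))
```

How tight the tolerance is decides whether a result with slightly different rounding still counts as agreeing.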

Projects are free to add additional logic to their validators and assimilators to try and weed out incorrect results, but to some degree it is still just a guess. If they already knew what the correct answer was, they would not have needed to send out the work to begin with.

For projects that are searching for something, their results can be broken down into two camps: something that needs further investigation, and background noise. What separates the two? There is some value or set of values in the result files that exceeds one or more thresholds. Some thresholds may also have a cap, in which case an interesting value or set of values falls between the threshold and its cap. We can then refer to the lower and upper bound of a threshold as a threshold window. Those thresholds are typically calibrated against the default client a project sends out. Tests are run against the default client using special workunits that contain various samples of data that expose what the application is looking for, so the scientists can make sure the client is working like it is supposed to.
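
Here is a tiny Python sketch of a threshold window, just to pin down the idea. The function name `classify`, the labels, and the bounds 24.0 and 30.0 are hypothetical; real projects define their own thresholds.

```python
from typing import Optional


def classify(value: float, lower: float, upper: Optional[float] = None) -> str:
    """Label a value 'investigate' if it lies inside the threshold window, else 'noise'."""
    if value < lower:
        return "noise"
    # The cap is optional: some thresholds only have a lower bound.
    if upper is not None and value > upper:
        return "noise"
    return "investigate"


if __name__ == "__main__":
    # Hypothetical window: anything from 24.0 up to a cap of 30.0 is interesting.
    for v in (12.5, 23.999999, 24.0, 29.9, 31.0):
        print(v, classify(v, lower=24.0, upper=30.0))
```

Two values that differ only in their last few digits can land on opposite sides of a bound, which is why the calibration against a known client matters.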

So now the crux of the problem: changing instruction sets for an application can and will change the level of precision of the data returned to the project.

Optimized SSE/SSE2/SSE3/3DNow applications change how the mathematical operations are performed vs. an un-optimized application. Now whether that adversely affects the project totally depends on how the project handles data types internally. If a project doesn’t release the source code or test workunits for their application, then somebody optimizing the application with a disassembler or hex editor is making an assumption about how calculations are being performed and what they can do to optimize it. If they are wrong, then something might be flagged as noise when it should have been flagged as needing to be investigated. What if something is missed because the thresholds are geared for a different range of values than what the optimized application is producing?

The SSE/SSE2/SSE3 instruction sets use 128-bit registers (3DNow works on the 64-bit MMX registers), while the original x87 FPU uses 80-bit registers. Now most programming languages store floating point numbers as either 32-bit single precision floats or 64-bit double precision floats. Quite a bit of the performance improvement that these new instruction sets provide comes from packing multiple numbers into a register and then performing mathematical operations on them in a matrix style fashion. So you could fit four single precision floats, or two double precision floats, into a single 128-bit register. Depending on the instruction, the result may be bounded to 32 bits, 64 bits, or 128 bits. That means in the worst case scenario an optimized application may round a computation either higher or lower than the original application.
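
A short Python sketch can show both halves of that: how many values fit in 128 bits, and what happens to a value when it is stored at a narrower width. This only uses the standard `struct` module to mimic the storage sizes; it does not execute any SSE instructions, and the helper name `to_single` is made up for the example.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Four singles or two doubles occupy the same 16 bytes, i.e. 128 bits.
four_singles = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
two_doubles = struct.pack("<2d", 1.0, 2.0)

if __name__ == "__main__":
    print(len(four_singles) * 8, len(two_doubles) * 8)  # prints "128 128"
    x = 0.1
    # The same value stored at two widths is no longer the same number.
    print(repr(x), repr(to_single(x)))
    print("difference:", abs(to_single(x) - x))
```

Values like 0.5 that are exactly representable survive the trip unchanged; most decimal fractions do not.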

You might be thinking, why don’t projects just enlarge the threshold window so that those small rounding errors can get through? Some of them have, but others still need to investigate how using different instructions affects the system overall. A few of the science applications perform calculations on the result of previous calculations over and over again. How large would the threshold window have to be if the calculations on previous calculations happened 1,000,000 or 10,000,000 times?
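
To get a feel for how rounding builds up when each step feeds the next, here is a Python sketch that keeps a running sum twice: once rounding to single precision after every step and once in double precision. It is a toy stand-in, not a model of any science application or of SSE itself; the names `running_sum` and `drift`, the step of 0.1, and the iteration counts are arbitrary choices for the example.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit single."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def running_sum(step: float, count: int, single: bool) -> float:
    """Add step to a total count times, optionally rounding to single each time."""
    total = 0.0
    if single:
        step = to_single(step)
    for _ in range(count):
        total += step
        if single:
            total = to_single(total)
    return total


def drift(step: float, count: int) -> float:
    """Absolute gap between the single precision and double precision totals."""
    return abs(running_sum(step, count, True) - running_sum(step, count, False))


if __name__ == "__main__":
    for count in (10, 1_000, 100_000):
        print(count, drift(0.1, count))
```

The gap after a handful of steps is tiny, but it grows with the number of steps, and a threshold window wide enough to absorb it may be wide enough to let real noise in.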

Here is an example of two different Intel SSE CPU instructions (one for working on packed data, and the other one using the whole register) on the same processor producing different results:
http://softwareforums.intel.com/ISN/Community/en-US/forums/thread/5484332.aspx

Note that this was using the Intel IPP library. That is how easily rounding problems can be introduced when optimizing.

For those who are quick to say “by using optimized applications I’m doing more science because I can process workunits faster,” my response is:
Only if the project’s backend databases and tools are equipped to deal with the differences; otherwise something might be missed. If you processed the workunit but sent back numbers outside the target threshold windows, have you really helped the project?

Another common thing I’ve seen is “I’ve run the standard application and the optimized application across x number of workunits I’ve been assigned and they produced the same result files, so the optimized application must be good in all scenarios.” My response is:
What that really means is that no rounding issues occurred with the workunits you had access to. Without the test workunits a project uses internally, you really don’t know if you covered all your bases.

The good news in all of this is that the projects are listening and are working with the optimizers to incorporate the needed changes into the projects’ default applications. Please be patient during the transition, though. It is going to take a bit of time to double check everything and make sure it is all in working order.

In case you are curious, I do not use any optimized clients on any of my machines. To me the science applications are big black boxes. I don’t know enough about what they do under the hood to make smart changes for the better. I’ll wait for optimization changes to be released by the projects, which means that their backend systems can account for any changes to the data.

At the end of the day, most of the projects are probably not concerned with the problem of verifying data that has been flagged as interesting. Their concern is missing something interesting that was flagged as background noise.

—– Rom

References:
http://en.wikipedia.org/wiki/IA-32
http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions
http://en.wikipedia.org/wiki/SSE2
http://en.wikipedia.org/wiki/SSE3
http://en.wikipedia.org/wiki/SSE4
http://en.wikipedia.org/wiki/3dnow
http://docs.sun.com/source/806-3568/ncg_goldberg.html

[08/15/2006] Adding a few more reference articles. The banking industry is still battling rounding errors in its software.
http://www.regdeveloper.co.uk/2006/08/12/floating_point_approximation/
http://cch.loria.fr/documentation/IEEE754/

Upgraded my blogging software


I upgraded my blogging software today.


I hit a few bumps in the road. The new version wanted ASP.NET 2.0, which had been installed onto the web server in a way that left aspnet_isapi.dll out of the web server extension list. It took me a while to figure out that it wasn’t registered as a valid extension and that this was what was causing the 404 errors.


I also managed to get a hold of a Newsgator plugin for my blogging software so I can post from Outlook now.


In my stumbling around with the blogging software I discovered that my email server hadn’t been doing regular backups since April. Ouch. I have now fixed that issue too.


Hope everyone has fun this 4th of July.

Toshiba Portege M400, Windows Vista, and BOINC

My Toshiba Portege M200 notebook was in need of an upgrade. A couple of weeks ago I purchased a Toshiba Portege M400 and it finally arrived on Wednesday.

After burning the recovery DVDs I set about installing Windows Vista Beta 2 on it. On my second try I finally got things up and running right. My first attempt failed because I tried to install the Toshiba HDD shock protection driver, which caused a BSOD. I searched around the blogosphere and found out that the Bluetooth drivers were safe to install, so I did that. I also had to change the video driver that Vista picked to the ‘Intel Lakeport Graphics Controller’ in order to view Aero Glass.

So I have everything up and running except the built-in IDE RAID controller and the HDD shock protection driver.

The system has a built-in TPM, which is pretty cool, and I’m going to go ahead and try to get BitLocker running on the machine after I make a backup. What is interesting, though, is that Windows detected and installed the fingerprint reader but hasn’t given me an option of associating a fingerprint with a user account out of the box. Even though the machine comes with a fingerprint reader, I think I’m going to stick with my plan of using smart cards. I’m not gutsy enough for an RFID implant and the fingerprint readers are a little awkward for me. What would be the bomb for me is an iris scanner.

After a little fiddling around with a manual installation of BOINC it appears to run just fine with Aero Glass. Here is a picture of it.

I’m going to see if the guys at IBM think it would be a cool idea to make the background image in the Simple GUI translucent. I think that would be neat.