(source) On January 1, Eric Driver wrote: I just released the new app version for Linux Nvidia CUDA.
I tested it on 100 WUs and it averaged about 33% faster than the previous CUDA version.
I am now making these same improvements to the OpenCL versions. If all goes well, I should have new apps for Windows and AMD cards within a week.
I started testing for performance-per-Watt.
Code:
                          power draw   GetDecics
computer                  at the wall  throughput    efficiency
----------------------------------------------------------------
dual EPYC 7452 @155W      320 W        360,000 PPD   1,125 PPD/W
dual E5-2696 v4 @2.8 GHz  400 W        180,000 PPD     450 PPD/W
----------------------------------------------------------------
dual GTX 1080Ti @180W ¹   267 W         64,800 PPD     240 PPD/W
dual GTX 1080Ti @180W ²   280 W         79,300 PPD     280 PPD/W
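(For reference, the efficiency column is simply GetDecics throughput divided by power draw at the wall, e.g. 360,000 PPD / 320 W ≈ 1,125 PPD/W for the EPYC box.)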
The operating system is Linux on all computers.
The EPYCs ran at their default package power tracking limit and ended up at a core clock of circa 2.7 GHz. The Xeons were locked to their all-core turbo clock.
The Nvidia GTXs were fed by an i7-7700K with turbo boost disabled in the BIOS, and nothing but an idle Cinnamon desktop was otherwise active. The CPU typically clocked at 3.4 GHz; in the 1st GPU test, some cores occasionally clocked down to ~1 GHz. Note that the GTX 1080Ti's default board power target is 250 W, but I configured it down to 180 W here.
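(The post doesn't say how the power target was lowered; on Linux this is typically done per card with nvidia-smi, along these lines, assuming GPU indices 0 and 1:
Code:
# set a 180 W power limit on both cards (requires root)
sudo nvidia-smi -i 0 -pl 180
sudo nvidia-smi -i 1 -pl 180
The setting reverts on reboot unless it is reapplied.)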
The difference between the two GPU tests:
¹) ran one task at a time per GPU, which resulted in ~85 % shader utilization and < 110 W GPU power usage
²) ran two tasks at a time per GPU, 100 % shader utilization, < 115 W GPU power
app_config.xml for running two tasks at once per GPU:
Code:
<app_config>
   <app>
      <name>GetDecics</name>
      <gpu_versions>
         <gpu_usage>0.5</gpu_usage>
         <cpu_usage>0.01</cpu_usage>
      </gpu_versions>
   </app>
</app_config>
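A note for anyone copying this: app_config.xml goes into the project's directory under the BOINC data directory, and the client picks it up after "Options → Read config files" in the BOINC Manager (or a client restart). With <gpu_usage>0.5</gpu_usage>, each task reserves half a GPU, so two tasks run concurrently per card.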