PC Musician
Some music applications will completely
fail to take advantage of the multiple cores of a modern CPU - but which
ones, and why? We find out, and advise on how you can make best use of
however many cores your PC has.
Over the last couple of years, the PC musician has
been offered first dual-core processors, then quad-core models, and
octo-core machines (currently featuring two quad-core processors) are
now available for those with deep enough pockets. Competitive pricing
has already ensured a healthy take-up of DAWs based around a quad-core
CPU, yet many users haven't cottoned on to the fact that not all software
benefits from all these cores. Some existing software may only be able
to use two of them, reducing potential performance by a huge 50 percent,
while older software may only be able to utilise a single core,
reducing potential performance to just 25 percent of the total
available. This month PC Musician investigates which audio software
works with dual-core, quad-core PCs and beyond, what benefits you're
likely to get in practice over a single-core machine, and which software
may for ever languish in the doldrums.
In the days when most musicians ran Windows 95, 98
or ME, the question of running multiple processors didn't arise, because
none of these operating systems supported more than a single CPU. It
was Windows NT and then Windows 2000 that introduced us to the benefits
of being able to share the processing load between multiple CPUs:
Windows 2000 Professional supported one or two processor chips, while
the more expensive Server version supported up to four, and the Advanced
Server up to eight. However, at this early stage each processor was a
physically separate device, so to be able to (for instance) use twin
processors, you needed a specially designed motherboard with two CPU
sockets. Many audio developers and interface manufacturers didn't
actively support Windows 2000, so most musicians stuck with Windows 98.
In 2001, Microsoft released Windows XP in Home and
Professional versions, and once again most consumers who opted for the
Home version were limited to a single physical processor, although the
Professional version supported two. By this stage many musicians were
straining at the leash, wanting to run more and more plug-ins and
software instruments, and this Professional version let them do exactly
that, using dual-processor motherboards and twin Xeon or Pentium 4
processors.
Multi-processing options really opened up the
following year, when Intel introduced first Xeon and then Pentium 4C
processor ranges with Hyperthreading technology, which let these CPUs
appear to both Windows XP Home and Professional (or Linux 2.4.x) as two
'virtual' processors instead of one physical one. They each shared the
various internal 'sub-units', including the all-important FPU (Floating
Point Unit), but could run two separate processing 'threads'
simultaneously.
Intel claimed up to a 30 percent improvement with
specially written applications over a standard processor, but as many
musicians soon found, having a Hyperthreaded processor didn't
necessarily benefit them at all unless they were running several
applications simultaneously, since applications like MIDI + Audio
sequencers had to be rewritten to take advantage of Hyperthreading.
Steinberg's Nuendo 2 was one of the few music apps to support it, but
although various others followed, a few (such as Tascam's Gigastudio)
needed a major rewrite before they would even run with HT enabled.
Nevertheless, my own tests (published in PC Notes June 2004) showed that
with optimised audio applications such as Cubase SX2 you could expect a
significant drop in CPU overheads where it really mattered, at low
latencies of 3ms or under.
The biggest change came in late 2004, when both AMD
and Intel seemed to agree that processor clock speeds had reached a
ceiling. Intel abandoned plans to release a 4GHz model in their Prescott
CPU range, and in 2005 both companies largely switched to releasing
dual-core models. Unlike the twin virtual processors of Intel's
Hyperthreading range, these featured two separate processing chips
mounted inside one physical package. By placing two processor cores into
a single piece of silicon, manufacturers could provide significantly
faster performance than a single processor, even when under-clocking
them and running them at lower voltages, so that they didn't run hotter
than the single-core variety.
By late 2006 we had been introduced to quad-core
processors, which have now dropped in price and can even be run with
Windows XP Home (which is licensed to run a single physical processor,
however many cores it has inside). However, if running XP Professional
(and the x64 64-bit version), Vista Home Premium, Business, Enterprise
or Vista Ultimate you also gain the option of installing two quad-core
processors on a suitable motherboard, to provide a total of eight
processing cores. Unfortunately, as with so many hardware advances,
much software has had a lot of catching up to do before it could take
advantage of so many cores.
Multiple-threaded Applications
When you're using a PC with multiple processors of
whatever type, to gain any significant performance benefit the software
you run has to be specially written or adapted with multiple processors
in mind. The way multi-processing works is that applications are divided
into 'threads' (semi-independent processes that can be run in
parallel). Even with a single processor there are huge advantages in
this programming approach. Many applications use multiple threads to
enable multi-tasking, so that one task can carry on while another is
started; and when multiple processors are available, different threads
can be allocated to each CPU.
With some processor-intensive programs, such as 3D
graphics and CAD software, it's comparatively easy to split off
different functions to each processor. However, the situation becomes
somewhat more complicated with an application such as a MIDI + Audio
sequencer, since all the different tracks are generally being streamed
in real time and must remain in sync.
Early schemes used by audio software for sharing
tasks between multiple processors were fairly crude; they tended to
devote each CPU to a specific duty, so that (for instance) audio mixing
and effects were handled in one thread, MIDI processing in another, and
user interface responses in yet another. When a MIDI + Audio sequencer
is run with several identical processors under such a scheme, the entire
audio-processing workload is normally handled by one processor, with
any remaining tasks left to the others. Since audio processing is by far
the most significant overhead for any music application, this approach
resulted in a typical overall performance improvement of just 20 to 30
percent for a dual-core processor over a single-core processor running
at the same clock speed.
To gain further improvement, you need to split the
audio processing in some way between the various CPUs, so that it can be
processed in parallel. This means added code and complexity, and rather
explains why some audio software really benefits from four or more
cores, while some doesn't. Steinberg introduced their 'Advanced Multiple
Processing Support' on Cubase VST version 5, splitting the audio
processing between the processors and giving much larger performance
boosts of 50 to 60 percent. Many other audio developers (although not
all) followed with similar improvements, and although there are no
guarantees, most applications optimised in this way should also
subsequently benefit from quad-core and octo-core PCs.
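As a rough illustration of what 'splitting the audio processing' means,
here is a minimal Python sketch. It is not how Cubase or any other host
is actually written: the function names, the 64-sample buffer and the
simple gain stage standing in for a plug-in chain are all assumptions
made for the example. Each track is processed as its own task, and the
mix only happens once every task has finished, which is the point at
which the cores have to synchronise.

```python
from concurrent.futures import ThreadPoolExecutor

BUFFER_SIZE = 64  # samples per audio buffer (illustrative value)


def process_track(track):
    """Apply a gain to one track's buffer; a stand-in for a plug-in chain."""
    gain, samples = track
    return [gain * s for s in samples]


def mix_buffer(tracks, workers=4):
    """Process every track as a separate task, then sum sample by sample."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps track order and waits for every task to finish,
        # so this is the synchronisation point before the mix.
        processed = list(pool.map(process_track, tracks))
    return [sum(column) for column in zip(*processed)]


# Three hypothetical tracks: (gain, buffer of samples)
tracks = [
    (0.5, [1.0] * BUFFER_SIZE),
    (0.25, [2.0] * BUFFER_SIZE),
    (1.0, [-0.5] * BUFFER_SIZE),
]
mixed = mix_buffer(tracks)
```

Bear in mind that standard Python threads won't genuinely run this sort
of code on several cores at once, so the sketch shows the structure
rather than the speed-up; a real host does the equivalent in native
code.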
Despite the possibilities, even today many
mainstream office applications and games have not really been optimised
for multiple processors, and some developers have been resistant to
rewriting their applications to support more than two cores, since
debugging an application that can run several threads in parallel is far
harder than debugging one in which everything happens in a single queue
of tasks.
Of those applications that have been optimised for multiple
processors, most can still only take advantage of two processors, so
you'll only get the best performance from them on a dual-core or twin
single-core computer. If, for instance, you run a game that can only
take advantage of two cores on a quad-core machine, it will only be able
to access up to 50 percent of the available processing power.
With quad-core processors and beyond, applications
that may benefit include 3D graphics modelling, ray-tracing, and
rendering, plus video-encoding tasks, image processing and some
scientific tasks. You're always likely to achieve good performance when
running several different applications simultaneously, since each will
get a good share of the pie, but with MIDI + Audio applications you want
a single application to have all its tasks shared out as fairly as
possible between the available cores.
Audio Applications
Developers told me that although most instruments
and plug-ins run as several threads, they have no control over how these
are distributed among the available cores. This is totally managed by
the host application, and according to all the tests I carried out while
researching this feature, most audio applications treat each
mono/stereo audio track (or soft-synth/sampler track), plus associated
plug-in effects, as a single task, and allocate it to a single processor
core. You can easily confirm this for your own applications using Task
Manager (see the 'Checking Your Tasks' box) and systematically adding a
series of demanding plug-ins to the same audio track. I suggest a
convolution reverb with the longest Impulse Response you can find.
If you're running multiple cores (whether in the
same chip or spread across multiple processors in discrete packages) the
above has certain implications. Let's say you have a
physically-modelled synth that consumes a lot of CPU resources. Since
our synth track is a single task, on a quad-core processor it can only
consume a maximum of 25 percent of the overall processing power
available — ie. the maximum available from a single core. So, even
though your sequencer's 'CPU meter' may indicate 100 percent loading in
this situation, and it's possible for your audio application to glitch
and stop playback because one of the four cores has run out of steam,
you still have 75 percent of your CPU resources available to run other
synths and plug-ins, which should automatically get allocated to the
remaining cores. Confusing, isn't it? So if you find yourself 'maxing
out' a single core by, for instance, running lots of instruments on
different tracks, all linked to a single multitimbral software sampler,
launch another instance of it and run some of your instruments from that
one instead.
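The arithmetic behind this can be sketched in a few lines of Python.
This is a toy model, not the scheduler of any real sequencer: it
assumes each track is an indivisible task, that loads are expressed as
a percentage of one core, and that the host simply hands each task to
whichever core is least busy.

```python
def allocate(track_loads, cores):
    """Place each track, heaviest first, on the least-loaded core.

    track_loads: per-track load as a percentage of ONE core.
    Returns the resulting load on each core.
    """
    core_loads = [0.0] * cores
    for load in sorted(track_loads, reverse=True):
        idx = core_loads.index(min(core_loads))
        core_loads[idx] += load
    return core_loads


def report(track_loads, cores=4):
    """Return (busiest core load, overall load across all cores)."""
    core_loads = allocate(track_loads, cores)
    busiest = max(core_loads)
    overall = sum(core_loads) / cores
    return busiest, overall


# One sampler instance hosting everything: a single 100-percent task.
one_instance = report([100.0])
# The same instruments split across two instances of 50 percent each.
two_instances = report([50.0, 50.0])
```

In both cases the overall figure is 25 percent of the quad-core's
total, but with one instance the busiest core sits at 100 percent and
is liable to glitch, while with two instances no core goes above 50
percent.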
When measuring multi-core performance of audio
applications, it's therefore important to choose a suitable benchmark
test that will allow the applications the best chance of spreading the
processor load as evenly as possible. I carefully tested single-, dual-
and quad-core PCs, all having identical clock speeds, with Cubase SX
running the Thonex and Blofelds DSP40 tests. As you can see from the
graphs, while the older Thonex test only displays a 20 to 30 percent
improvement between the dual-core and quad-core results, Blofelds showed
much better scaling. A quorum of DAW builders seem to have agreed that
Vin Curigliano's DAWbench suite is currently the best test available to
measure differences in multi-core system performance, since it starts
with a real-world song and then ignores the application's CPU meter in
favour of adding more and more plug-ins and/or soft synths across 40
tracks until you hear audio glitching, which largely mirrors what many
musicians do in the real world.
The original DAWbench Blofelds DSP40 test is for
those Cubase/Nuendo users who mainly record audio tracks and use lots of
plug-ins (there's now also a new SONARbench DSP test that uses the same
techniques), while the L-Factor II test is for Cubase/Nuendo owners who
instead run lots of software synths. Such 'on the edge' tests are also
useful in comparing audio driver performance, as well as spotting
operating system issues such as jerky graphic scrolling under stress,
and the extra overheads imposed by the Windows XP and Vista Aero
graphics over the Windows 'classic' look.
What tasks you're going to perform with your audio
application may also affect the ideal number of cores, and thus which is
the 'best' PC for the job. For drummers and vocalists monitoring their
own live performances on headphones, the Holy Grail is a system
that runs with barely discernible latency. Many would be happy using a
buffer size of 64 samples, which would mean a total real-world latency
for audio monitoring with plug-ins of just under 5ms (at a sample rate
of 44.1kHz), or around 3.5ms for playing soft synths. If you still find
this unacceptably high and prefer not to rely on 'zero latency'
monitoring solutions (which bypass any plug-in effects), 32-sample
buffers would offer total audio monitoring latency of around 3.5ms
(around 2.7ms for soft synths), again at a 44.1kHz sample rate.
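For reference, the duration of a single buffer is simply its size in
samples divided by the sample rate, as the short Python sketch below
works out (the function name is just for illustration). The total
figures quoted above are higher than one buffer's worth, presumably
because the signal passes through more than one buffer and picks up
converter and driver delays along the way.

```python
def buffer_ms(samples, sample_rate=44100):
    """Duration of one audio buffer in milliseconds."""
    return 1000.0 * samples / sample_rate


for size in (32, 64, 128):
    print(f"{size:>3} samples = {buffer_ms(size):.2f} ms per buffer")
```

At 44.1kHz this gives roughly 0.73ms, 1.45ms and 2.90ms per buffer
respectively, which is the time the PC has to finish all its audio
processing before the next buffer is due.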
Blofelds DSP40 tests by a range of DAW builders who
have access to lots of PCs based around different processors have shown
that at really low buffer sizes, such as 32 samples, a single quad-core
processor will always outperform a single dual-core processor or (more
interestingly) a system featuring two dual-core processors, and
sometimes even a dual quad-core system. In some tests at these really
low latencies, when stressed with lots of plug-ins and instruments, the
single quad-core machine was the only one to complete them successfully,
making it the current king for low-latency performance.
If you're happy to use a higher buffer size, of
128 samples or above (audio monitoring latency of around 8ms), you'll
probably be able to run significantly more plug-ins and soft synths
using two quad-core processors than one. Those involved in lots of
recording work who want 'real time' monitoring may thus prefer a single
quad-core, while others who rely mainly on samples and soft synths may
get even more mileage from a twin quad-core system.
This is the biggie: it's all very well having a
hugely powerful quad-core or octo-core PC, but not a lot of use if your
software only uses two or four cores from those available, or makes a
poor job of sharing resources between them. The secret is for the
application to balance requirements across the available cores, so that
you don't get any audio glitches as a result of one or more cores
running out of juice while there's some still available from the others.
For the reasons mentioned above, stereo audio
editors may not take full advantage of a multi-core PC — something I
soon confirmed with Steinberg's Wavelab 6, which only used one core for
DSP processing during playback or audio rendering. Its author Philippe
Goutier says that a second core will be used for disk access and the
user interface, which does at least mean that the application will
always remain responsive to new commands, but he hopes to improve
core-sharing now that so many musicians have multi-core PCs.
The vast majority of stand-alone soft synths also
seem to mostly use a single core, but as soon as you load the VSTi or
DXi version into a host VSTi or DXi application, this host should
distribute the various plug-ins and soft synths across the available
cores to make best use of resources. Fortunately, most multitrack audio
applications can distribute the combined load from all your tracks
between as many cores as they find, although it's perhaps inevitable
that since many of the latest versions were released long before
quad-core and octo-core PCs were in regular use, some don't manage it
quite as efficiently as others. Even now some developers don't have
octo-core test systems.
The DAWs
Reaper's Justin Frankel told me that he routinely
does a lot of his development on a dual quad-core Xeon PC, so it's
hardly surprising that the default Reaper settings work well with up to
eight-core machines, typically offering over 95 percent utilisation of
all eight cores. Reaper mostly uses 'Anticipatory FX processing' that
runs at irregular intervals, often out of order, and slightly ahead of
time. Apparently, there are very few times when the cores need to
synchronise with each other, and using this scheme he can let them all
crank away using nearly all of the available CPU power. Exceptions
include record input monitoring, and apparently when running UAD1 DSP
cards, which both prefer a more classic 'Synchronous FX
multi-processing' scheme.
Steinberg's Cubase SX, Cubase 4 and Nuendo all work
decently on quad-core systems, scaling up well from single to dual-core
and quad-core PCs. However, Cubase 4 and Nuendo 4 don't currently
provide all the benefits they could at low latency with a dual quad-core
system. Compared with the potential doubling of plug-in numbers from
dual to quad, when you move to 'octo' you may only be able to run about
40 percent more plug-ins down to buffer sizes of 128 samples, while
below this you may even get worse performance than a quad-core system.
Steinberg developers have already acknowledged the
problem, which is apparently due to "a serialisation of the ASIO driver,
which eats up to 40 percent of the processing time. Together with the
other synchronisation delays, only 25 to 30 percent of the
1.5-millisecond time-slice can be used for processing. This is not very
efficient." Steinberg have promised to address the issue in a Nuendo 4
maintenance update, and have hinted that it may also result in changes
to the ASIO specification.
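To put those percentages into perspective, this small Python sketch
converts them into time. The 1.5ms time-slice and the percentages come
from the quote above; everything else (the names, and working in
microseconds to keep the arithmetic exact) is my own assumption.

```python
TIME_SLICE_US = 1500  # the 1.5-millisecond time-slice, in microseconds


def share_us(percent, slice_us=TIME_SLICE_US):
    """Microseconds of the time-slice taken up by a given percentage."""
    return slice_us * percent / 100


driver_overhead = share_us(40)  # serialisation of the ASIO driver (worst case)
usable_low = share_us(25)       # lower bound left for actual processing
usable_high = share_us(30)      # upper bound left for actual processing
```

In other words, of each 1500-microsecond slice, up to 600 microseconds
can go on the driver and only 375 to 450 microseconds are left for
running your plug-ins and soft synths.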
Cakewalk's Sonar does seem to scale well, sometimes
giving a better percentage improvement when moving from a quad-core to
an octo-core PC than the current version of Nuendo/Cubase 4, but the
jury still seems to be out on whether choosing ASIO or WDM/KS drivers
gives better results; with some systems ASIO is a clear winner, while in
others WDM/KS drivers move significantly ahead.
Digidesign have a reputation for being slow but
thorough when testing out new hardware to add to their 'approved list',
and as I write this in early November 2007 their web site states that
Intel Core 2 Quad processors and Intel Xeon quad-core have not been
tested by Digidesign on Windows for any Pro Tools system.
Nevertheless, Pro Tools HD/TDM users started
posting recommendations for rock-solid systems featuring twin dual-core
Opteron processors (four CPU cores in all) in mid-2006, and there are
now loads of Pro Tools LE users successfully running both quad-core and
even a few octo-core PCs in advance of any official pronouncements
(there are lots of specific recommendations on both quad-core and
octo-core PC components in a vast 126-page thread on the Digi User
Conference at
http://duc.digidesign.com/showflat.php?Cat=&Number=988224). Despite
the lack of official 'qualification', all Pro Tools systems seem to
scale well on quad-cores, happily running all four cores up to 100
percent utilisation, and many users are very pleased with their
quad-core 'native' CPU performance.
Like various other audio applications, even the
latest Mac version of Logic Audio doesn't yet fully benefit from having
eight processor cores at its disposal, but for die-hard PC users of
Logic the situation is rather more serious: Apple discontinued
development and support for those using Logic on the PC back in 2002, so
the most recent version (5.5.1) is now some five years old. Although it's a
multi-threaded application, Logic 5.5.1 for Windows is not really
optimised for multiple processors, so only one of the cores is likely to
get much of a workout. However, there's a partial workaround, using the
I/O Helper plug-in available from Logic version 5.2 onwards, which can
force any plug-ins on a track with it inserted to run on a second core,
so that you can use lots more plug-ins/instruments overall (there's a
more detailed description on Universal Audio's web site at www.uaudio.com/webzine/2003/may/index5.html).
Logic Audio 5.5.1 also has a problem if more than 1GB of system RAM is
installed (see
http://community.sonikmatter.com/forums/lofiversion/index.php/t8032.html
for some suggestions on this one), and also has problems running some
VST plug-ins. It's unlikely to benefit from a quad-core processor at
all, and I wouldn't recommend running it on a new quad-core PC, so its
shelf-life is looking increasingly limited.
Overall, getting the best out of a multi-core PC generally means a little detective work from the user. You need to make sure you have the most appropriate audio application settings (which might be different if you run DSP cards), and you also need to be cautious when running heavy-duty synths or plug-ins that might consume one of your cores in a single gulp. Keeping an occasional eye on the Windows Task Manager may also help, since the CPU meters provided by most sequencers are becoming rather less useful now that they are monitoring so many individual cores.