<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Gridware &#8211; bablick.de</title>
	<atom:link href="https://bablick.de/tag/gridware/feed/" rel="self" type="application/rss+xml" />
	<link>https://bablick.de</link>
	<description>Writing About Clusters, Curiosity, and Everything in Between.</description>
	<lastBuildDate>Wed, 05 Nov 2025 06:11:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.3</generator>

<image>
	<url>https://bablick.de/wp-content/uploads/2025/08/cropped-BablickLogo-1-32x32.png</url>
	<title>Gridware &#8211; bablick.de</title>
	<link>https://bablick.de</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Compute Nodes with Heterogeneous Topology in Gridware Cluster Scheduler</title>
		<link>https://bablick.de/compute-nodes-with-heterogenious-topology-in-gridware-cluster-scheduler/</link>
		
		<dc:creator><![CDATA[ernst.bablick]]></dc:creator>
		<pubDate>Thu, 23 Oct 2025 19:12:25 +0000</pubDate>
				<category><![CDATA[HPC]]></category>
		<category><![CDATA[Binding]]></category>
		<category><![CDATA[CPU]]></category>
		<category><![CDATA[Gridware]]></category>
		<category><![CDATA[NUMA]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Scheduling]]></category>
		<guid isPermaLink="false">https://bablick.de/?p=125</guid>

					<description><![CDATA[As CPUs evolve toward hybrid designs with mixed core types and increasingly complex memory hierarchies, HPC schedulers must also evolve. This post explains how Gridware Cluster Scheduler 9.1.0 meets that challenge, bringing detailed, topology-aware resource scheduling to modern heterogeneous compute nodes. Why Topology Awareness Matters In modern high-performance computing (HPC), CPU cores within a single socket may...]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"><figure class="wp-block-post-featured-image"><img fetchpriority="high" decoding="async" width="512" height="512" src="https://bablick.de/wp-content/uploads/2025/10/Brain-topology-e1761246631294.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Hardware Topology" style="object-fit:cover;" /></figure></div>



<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>As CPUs evolve toward hybrid designs with mixed core types and increasingly complex memory hierarchies, HPC schedulers must also evolve.<br>This post explains how <strong>Gridware Cluster Scheduler 9.1.0</strong> meets that challenge—bringing detailed, topology-aware resource scheduling to modern heterogeneous compute nodes.</p>
</div>
</div>



<span id="more-125"></span>



<h2 class="wp-block-heading">Why Topology Awareness Matters</h2>



<p>In modern high-performance computing (HPC), CPU cores within a single socket may differ in clock frequency, power characteristics, or cache layout. Meanwhile, memory hierarchies—NUMA nodes, multi-level caches, and chiplets—add new layers of complexity.</p>



<p>To schedule jobs efficiently, a cluster manager must understand and exploit this hardware topology.<br><strong>Gridware Cluster Scheduler 9.1.0</strong> introduces expanded binding and topology-awareness features to maximize performance and ensure predictable resource placement.</p>



<h2 class="wp-block-heading">Three Hardware Topologies from NVIDIA, AMD, and Intel</h2>



<p>To demonstrate the scheduler’s new capabilities, the following sections show real-world topology examples from <strong>NVIDIA</strong>, <strong>AMD</strong>, and <strong>Intel</strong> hardware.</p>



<h3 class="wp-block-heading">NVIDIA DGX Spark</h3>



<p>The <strong>NVIDIA DGX Spark</strong>, known for being presented by Jensen Huang to Elon Musk at SpaceX, uses a <strong>heterogeneous ARM architecture</strong> optimized for AI/ML workloads.<br>The system features <strong>20 ARM cores</strong> organized into <strong>five performance tiers</strong> that differ in efficiency class, maximum frequency, and Linux capacity value:</p>



<pre class="wp-block-code"><code>CPU-Type #4: efficiency=4, cpuset=0x00080000
  FrequencyMaxMHz = 4004
  LinuxCapacity   = 1024
CPU-Type #3: efficiency=3, cpuset=0x00078000
  FrequencyMaxMHz = 3978
  LinuxCapacity   = 1017
CPU-Type #2: efficiency=2, cpuset=0x000003e0
  FrequencyMaxMHz = 3900
  LinuxCapacity   = 997
CPU-Type #1: efficiency=1, cpuset=0x00007c00
  FrequencyMaxMHz = 2860
  LinuxCapacity   = 731
CPU-Type #0: efficiency=0, cpuset=0x0000001f
  FrequencyMaxMHz = 2808
  LinuxCapacity   = 718</code></pre>
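


<p>To see which cores belong to which tier, the <code>cpuset</code> masks can be expanded into CPU indices. The following Python sketch is only an illustration: the helper <code>cpus_in_mask</code> and the dictionary names are hypothetical (they are not part of Gridware Cluster Scheduler), and it assumes the usual bitmask convention that bit <em>i</em> stands for logical CPU <em>i</em>.</p>



<pre class="wp-block-code"><code># Masks copied from the CPU-type listing above, keyed by CPU-Type number.
CPU_TYPE_MASKS = {
    4: 0x00080000,
    3: 0x00078000,
    2: 0x000003E0,
    1: 0x00007C00,
    0: 0x0000001F,
}


def cpus_in_mask(mask):
    """Return the CPU indices whose bits are set in a cpuset mask.

    Hypothetical helper; assumes bit i of the mask stands for logical CPU i.
    """
    bits_lsb_first = bin(mask)[2:][::-1]
    return [index for index, bit in enumerate(bits_lsb_first) if bit == "1"]


CPUS_BY_TYPE = {
    cpu_type: cpus_in_mask(mask) for cpu_type, mask in CPU_TYPE_MASKS.items()
}

if __name__ == "__main__":
    for cpu_type in sorted(CPUS_BY_TYPE, reverse=True):
        print(f"CPU-Type #{cpu_type}: CPUs {CPUS_BY_TYPE[cpu_type]}")</code></pre>



<p>Under that assumption, the five masks are disjoint and together cover CPUs 0 to 19, which matches the 20 cores mentioned above.</p>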



<p>Using Intel’s terminology, this architecture could be viewed as <strong>10 Performance cores</strong> (P-cores, called Power cores below) and <strong>10 Efficiency cores</strong> (E-cores):<br>10 × ARM Cortex-X925 plus 10 × ARM Cortex-A725. Each core has private L1 and L2 caches, and each group of 10 cores shares an L3 cache.</p>



<pre class="wp-block-code"><code>&gt; loadcheck -cb | grep Topology
Topology (GCS): NSXEEEEECCCCCXEEEEECCCCC</code></pre>



<p>Gridware Cluster Scheduler uses <strong>topology strings</strong> to represent such layouts.<br>Here: <code>N</code> = NUMA node, <code>S</code> = socket, <code>X</code> = L3 cache, <code>E</code> = Efficiency core, and <code>C</code> = Power core.</p>
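


<p>Because every hardware unit is a single letter, a topology string can be summarized with a simple character count. The Python sketch below is an illustration only: <code>UNIT_NAMES</code> and <code>count_units</code> are hypothetical names, not a Gridware API, and the letter meanings are the ones given above.</p>



<pre class="wp-block-code"><code>from collections import Counter

# Letter meanings as described in this post (labels for this sketch only).
UNIT_NAMES = {
    "N": "NUMA node",
    "S": "socket",
    "X": "L3 cache",
    "E": "Efficiency core",
    "C": "Power core",
}

# Topology string reported by loadcheck for the DGX Spark above.
DGX_SPARK = "NSXEEEEECCCCCXEEEEECCCCC"


def count_units(topology):
    """Count how often each known unit letter occurs in a topology string."""
    counts = Counter(topology)
    return {name: counts[letter] for letter, name in UNIT_NAMES.items()}


if __name__ == "__main__":
    for name, amount in count_units(DGX_SPARK).items():
        print(f"{name}: {amount}")</code></pre>



<p>For the DGX Spark string this gives one NUMA node, one socket, two L3 caches, and ten cores of each type.</p>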



<h3 class="wp-block-heading">Intel i9-14900HX</h3>



<p>While the <strong>Intel i9-14900HX</strong> isn’t typical for HPC clusters, it’s an ideal case study for <strong>hybrid core</strong> architectures.</p>



<pre class="wp-block-code"><code>&gt; loadcheck -cb | grep Topology
Topology (GCS): NSXCTTCTTCTTCTTCTTCTTCTTCTTYEEEEYEEEEYEEEEYEEEE</code></pre>



<ul class="wp-block-list">
<li><strong>Power cores (C)</strong>: Dual-threaded (<code>T</code>), each with its own L2 cache.</li>



<li><strong>Efficiency cores (E)</strong>: Single-threaded, in groups of four that each share one L2 cache (<code>Y</code>).</li>



<li><strong>NUMA node (N)</strong> and <strong>socket (S)</strong>: Encompass both core types and a shared L3 cache (<code>X</code>).</li>
</ul>
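


<p>The grouping described in the list above can be checked directly against the string. The following Python sketch is an illustration under stated assumptions: both function names are hypothetical, and it assumes that a core letter without any following <code>T</code> represents a single hardware thread, as the Efficiency cores do here.</p>



<pre class="wp-block-code"><code>import re

# Topology string reported by loadcheck for the Intel i9-14900HX above.
INTEL_I9 = "NSXCTTCTTCTTCTTCTTCTTCTTCTTYEEEEYEEEEYEEEEYEEEE"


def efficiency_cores_per_l2(topology):
    """Return the number of E cores that follow each Y (shared L2) marker."""
    return [len(group) for group in re.findall(r"Y(E+)", topology)]


def hardware_threads(topology):
    """Count hardware threads: every T, or one per core that lists no T.

    Assumption for this sketch: a core letter (C or E) without trailing T
    letters is single-threaded.
    """
    threads_per_core = re.findall(r"[CE](T*)", topology)
    return sum(max(len(threads), 1) for threads in threads_per_core)


if __name__ == "__main__":
    print("E cores per shared L2:", efficiency_cores_per_l2(INTEL_I9))
    print("Hardware threads:", hardware_threads(INTEL_I9))</code></pre>



<p>For this string the sketch finds four L2 groups of four Efficiency cores and 32 hardware threads in total (16 from the eight dual-threaded Power cores, 16 from the Efficiency cores).</p>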



<h3 class="wp-block-heading">AMD EPYC Zen5</h3>



<p>The <strong>AMD EPYC Zen5</strong> series (e.g., <code>AMD-Epyc-Zen5-c4d-highmem-384</code>) represents a <strong>chiplet-based homogeneous design</strong>.<br>Each core provides two hardware threads, and the L3 cache structure (<code>X</code>) maps directly to chiplets/dies.</p>



<pre class="wp-block-code"><code>&gt; loadcheck -cb | grep Topology
Topology (GCS): NSXCTTCTTCTTCTTCTTCTTCTTCTT XCTTCTTCTTCTTCTTCTTCTTCTT
                  XCTTCTTCTTCTTCTTCTTCTTCTT XCTTCTTCTTCTTCTTCTTCTTCTT
                  ... (repeated chiplet layout per socket)
                NSXCTTCTTCTTCTTCTTCTTCTTCTT XCTTCTTCTTCTTCTTCTTCTTCTT
                  XCTTCTTCTTCTTCTTCTTCTTCTT XCTTCTTCTTCTTCTTCTTCTTCTT
                  ... (repeated chiplet layout per socket)</code></pre>



<p>Each socket (<code>S</code>) corresponds to one NUMA node (<code>N</code>), while every core has a private L2 cache.</p>



<h2 class="wp-block-heading">Handling Heterogeneous Topologies in Gridware Cluster Scheduler</h2>



<p>Efficient scheduling means <strong>assigning tasks to the most suitable hardware</strong>.<br>If a parallel job spans both slow and fast cores, the slowest becomes a bottleneck. Similarly, crossing NUMA or cache boundaries increases latency.</p>



<p>Gridware Cluster Scheduler 9.1 introduces <strong>fine-grained binding control</strong>, allowing binding to:</p>



<ul class="wp-block-list">
<li><strong>Sockets</strong></li>



<li><strong>Cores</strong></li>



<li><strong>Threads</strong></li>



<li><strong>NUMA nodes</strong></li>



<li><strong>Chiplets/Dies (cache domains)</strong></li>
</ul>



<p>This ensures optimal locality and predictable performance, even on hybrid or asymmetric systems.</p>



<h3 class="wp-block-heading">Chiplet/Die Binding Example</h3>



<pre class="wp-block-code"><code>qsub -pe mpi 15 -btype host -bamount 2 -bunit X ...</code></pre>



<p>This example requests <strong>15 MPI tasks</strong>, all running on a single host. The <code>-btype host</code> option applies the binding relative to the host topology. With <code>-bamount 2 -bunit X</code>, each job portion binds to <strong>two chiplets/dies</strong>, so cache boundaries are respected and cross-die interference is minimized.</p>



<p>💡 <em>In this setup, the two bound chiplets provide 16 cores (assuming eight cores per chiplet, as in the EPYC layout above), of which the job uses 15. The scheduler keeps the remaining core idle to prevent contention.</em></p>
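


<p>The arithmetic behind this note can be sketched in a few lines of Python. This is an illustration, not scheduler code: <code>idle_cores</code> is a hypothetical helper, and it assumes the job lands on a host with the EPYC layout shown earlier, where each <code>X</code> domain holds eight dual-threaded cores.</p>



<pre class="wp-block-code"><code># One L3 domain (chiplet) with eight dual-threaded cores, as in the EPYC
# topology string above. Assumption: the job runs on such a host.
EPYC_CHIPLET = "X" + "CTT" * 8


def idle_cores(chiplet, bound_chiplets, tasks):
    """Return how many cores stay unused when tasks run on bound chiplets.

    Hypothetical helper for illustration; one task occupies one core.
    """
    cores_per_chiplet = chiplet.count("C") + chiplet.count("E")
    available = cores_per_chiplet * bound_chiplets
    if tasks not in range(available + 1):
        raise ValueError("more tasks than cores in the bound chiplets")
    return available - tasks


if __name__ == "__main__":
    # Mirrors: qsub -pe mpi 15 -btype host -bamount 2 -bunit X ...
    print("Idle cores:", idle_cores(EPYC_CHIPLET, bound_chiplets=2, tasks=15))</code></pre>



<p>With two chiplets and 15 tasks, one of the 16 cores remains idle, which is the core mentioned in the note.</p>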



<h2 class="wp-block-heading">Summary</h2>



<p>With version 9.1.0, Gridware Cluster Scheduler becomes fully topology-aware, bridging the gap between modern heterogeneous hardware and intelligent workload scheduling.<br>By supporting multiple binding types (core, socket, thread, NUMA, chiplet/die), it ensures efficient resource utilization and predictable performance across diverse compute nodes.</p>



<p>We are currently in the QA phase of this release and welcome user feedback on these new features.<br>They are already included in our nightly builds for testing, and beta releases will be available soon.<br><a href="https://hpc-gridware.com/download-main/" data-type="link" data-id="https://hpc-gridware.com/download-main/">Download Gridware Cluster Scheduler</a></p>



<p>Stay tuned with HPC-Gridware for updates — we’ll share <a href="https://bablick.de/understanding-binding-in-gridware-cluster-scheduler/" data-type="post" data-id="133">more insights</a>, examples, and best practices as we approach the official release.</p>



<p>Follow me on <a href="https://x.com/ebablick" data-type="link" data-id="https://x.com/ebablick">X/Twitter</a> or follow HPC-Gridware (<a href="https://www.linkedin.com/company/hpc-gridware" data-type="link" data-id="https://www.linkedin.com/company/hpc-gridware">LinkedIn</a>, <a href="https://x.com/HPC_Gridware" data-type="link" data-id="https://x.com/HPC_Gridware">X/Twitter</a>) for release announcements, tips, and technical insights.</p>



]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Open Cluster Scheduler and Gridware Cluster Scheduler v9.0.8 are Available</title>
		<link>https://bablick.de/open-cluster-scheduler-and-gridware-cluster-scheduler-v9-0-8-are-available/</link>
		
		<dc:creator><![CDATA[ernst.bablick]]></dc:creator>
		<pubDate>Thu, 28 Aug 2025 13:15:59 +0000</pubDate>
				<category><![CDATA[HPC]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[GCS]]></category>
		<category><![CDATA[Gridware]]></category>
		<category><![CDATA[OCS]]></category>
		<guid isPermaLink="false">https://bablick.de/?p=100</guid>

					<description><![CDATA[OCS and GCS v9.0.8 are now available. As usual, the packages can be downloaded from the HPC-Gridware download page, and the source code is available on the Cluster Scheduler GitHub project page. The list of fixed issues mentioned in the Release Notes can be found here: Improvement CS-739 qstat -j output should contain job state,...]]></description>
										<content:encoded><![CDATA[<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><a href="https://www.hpc-gridware.com/"><img alt="HPC-Gridware Logo" decoding="async" width="292" height="73" src="https://bablick.de/wp-content/uploads/2025/08/NEW-HPC-GRIDWARE-BLACK-DEMO.png" class="wp-image-103" style="width:400px;height:auto"/></a></figure></div>


<p>OCS and GCS v9.0.8 are now available. As usual, the packages can be downloaded from the <a href="https://hpc-gridware.com/download-ocs-9-0-8/">HPC-Gridware download page</a>, and the source code is available on the <a href="https://github.com/hpc-gridware/clusterscheduler">Cluster Scheduler GitHub project page</a>.</p>



<p>The issues fixed in this release, as listed in the <a href="https://www.hpc-gridware.com/download/11138/?tmstv=1756385425">Release Notes</a>, are:</p>



<h3 class="wp-block-heading">Improvement</h3>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-739">CS-739</a> qstat -j output should contain job state, start time, queue name, and host names</p>



<h3 class="wp-block-heading">Task</h3>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1407">CS-1407</a> Add SUSE SLES 15 support in support matrix of release notes</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1440">CS-1440</a> Add qtelemetry licenses to GCS 3rdparty licenses directory</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1470">CS-1470</a> do memory testing on V90_BRANCH for the 9.0.8 release</p>



<h3 class="wp-block-heading">Sub-task</h3>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1394">CS-1394</a> Add start_time of array jobs tasks to qstat -j</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1395">CS-1395</a> Cleanup of job states and show states also in qstat -j output</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1396">CS-1396</a> Show granted host information in qstat -j output</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1404">CS-1404</a> Show granted queues in qstat -j output</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1410">CS-1410</a> Show priority in qstat -j output even if it is the base priority</p>



<h3 class="wp-block-heading">Bug</h3>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-671">CS-671</a> qrsh truncates the command line and/or output at 927 characters</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1019">CS-1019</a> sge_execd logs errors when running tightly integrated parallel jobs</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1270">CS-1270</a> Installation script clears screen in case of an error which makes issues harder to debug</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1381">CS-1381</a> qacct complains &#8220;error: ignoring invalid entry in line 436&#8221; for accounting records with huge command line entry</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1386">CS-1386</a> man page for sge_share_mon is missing</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1403">CS-1403</a> sge_ckpt man-page is in wrong section (1 instead of 5)</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1422">CS-1422</a> endless loop in protocol between sge_qmaster and sge_execd in certain job failure situations</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1424">CS-1424</a> qmod -sj on own job fails on submit only host</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1429">CS-1429</a> sge_qmaster can segfault on qdel -f</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1434">CS-1434</a> clearing error state of a job leads to event callback error logging in qmaster messages file</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1435">CS-1435</a> rescheduling of jobs requires manager rights, documented is &#8220;manager or operator rights&#8221;</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1436">CS-1436</a> qmod man pages says it requires manager or operator privileges to rerun a job, but a job owner may rerun his own jobs as well</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1451">CS-1451</a> option -out of examples/jobsbin//work is broken</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1476">CS-1476</a> Go DRMAA does not set JoinFiles() correctly</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1477">CS-1477</a> In Go DRMAA TransferFiles() does not set all values</p>



<p>Please let me or the <a href="https://www.hpc-gridware.com/contact/" data-type="link" data-id="https://www.hpc-gridware.com/contact/">HPC-Gridware team</a> know if you have any questions.</p>



]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
