<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>HPC &#8211; bablick.de</title>
	<atom:link href="https://bablick.de/category/it/hpc/feed/" rel="self" type="application/rss+xml" />
	<link>https://bablick.de</link>
	<description>Writing About Clusters, Curiosity, and Everything in Between.</description>
	<lastBuildDate>Wed, 05 Nov 2025 06:15:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.3</generator>

<image>
	<url>https://bablick.de/wp-content/uploads/2025/08/cropped-BablickLogo-1-32x32.png</url>
	<title>HPC &#8211; bablick.de</title>
	<link>https://bablick.de</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>How Binding Order and Range Shape Job Placement in Gridware Cluster Scheduler</title>
		<link>https://bablick.de/how-binding-order-and-range-shape-job-placement-in-gridware-cluster-scheduler/</link>
		
		<dc:creator><![CDATA[Ernst Bablick]]></dc:creator>
		<pubDate>Thu, 30 Oct 2025 23:31:32 +0000</pubDate>
				<category><![CDATA[HPC]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA["Cache and Memory Optimization"]]></category>
		<category><![CDATA["Cost Aware Scheduling"]]></category>
		<category><![CDATA["CPU Binding"]]></category>
		<category><![CDATA["Energy-Aware Scheduling"]]></category>
		<category><![CDATA["Gridware Cluster Scheduler"]]></category>
		<category><![CDATA[GCS]]></category>
		<category><![CDATA[NUMA]]></category>
		<category><![CDATA[OCS]]></category>
		<guid isPermaLink="false">https://bablick.de/?p=158</guid>

					<description><![CDATA[This article explains how the three binding options -bsort, -bstart, and -bstop control where and in which order jobs are bound to CPUs in Gridware Cluster Scheduler. Together, they define the scheduler’s fill-up pattern — how hardware resources are used, balanced, or reused across sockets, cores, and NUMA nodes. Understanding these options is essential for...]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"><figure class="aligncenter wp-block-post-featured-image"><img fetchpriority="high" decoding="async" width="512" height="512" src="https://bablick.de/wp-content/uploads/2025/10/Binding-Topilogy-Advanced-e1761865388777.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Advanced Binding Topology" style="object-fit:cover;" /></figure></div>



<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>This article explains how the three binding options <code>-bsort</code>, <code>-bstart</code>, and <code>-bstop</code> control <strong>where and in which order jobs are bound to CPUs</strong> in Gridware Cluster Scheduler.</p>



<p>Together, they define the scheduler’s <strong>fill-up pattern</strong> — how hardware resources are used, balanced, or reused across sockets, cores, and NUMA nodes.</p>



<p>Understanding these options is essential for creating advanced scheduling strategies such as <strong>energy-aware, thermal, or cost-optimized job placement</strong>.</p>
</div>
</div>



<span id="more-158"></span>



<h2 class="wp-block-heading"><strong>Further Reading</strong></h2>



<p>If you missed the first two parts of this series, start there — they introduce the concepts of topology strings and binding basics that this article builds on:</p>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="512" height="512" src="https://bablick.de/wp-content/uploads/2025/10/Brain-topology-e1761246631294.png" alt="hardware-topology" class="wp-image-128"/></figure>
</div>



<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p><a href="https://bablick.de/compute-nodes-with-heterogenious-topology-in-gridware-cluster-scheduler/" data-type="post" data-id="125">Compute Nodes with Heterogeneous Topology in Gridware Cluster Scheduler</a></p>
</div>
</div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img alt="Binding Topology" decoding="async" width="512" height="512" src="https://bablick.de/wp-content/uploads/2025/10/Brain-Topology-Binding-e1761421460621.png" class="wp-image-135"/></figure>
</div>



<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p><a href="https://bablick.de/understanding-binding-in-gridware-cluster-scheduler/">Understanding Binding in Gridware Cluster Scheduler</a></p>
</div>
</div>



<h2 class="wp-block-heading"><strong>Why order and range matter</strong></h2>



<p>Imagine your cluster as a row of houses, each with several rooms. Jobs are like guests arriving to stay overnight.</p>



<p>Without a plan, guests will pick rooms at random. Some might share walls and benefit from proximity; others might end up in different houses entirely.</p>



<p>Binding order and range options give the scheduler a <em>plan</em> — a way to decide how the “houses” and “rooms” are filled:</p>



<ul class="wp-block-list">
<li><code>-bsort</code> defines <strong>which house or room type to consider first</strong>.</li>



<li><code>-bstart</code> sets <strong>where the first guest can enter</strong>.</li>



<li><code>-bstop</code> defines <strong>where the group must stop filling</strong>.</li>
</ul>



<p>It’s a simple idea — but it’s the foundation for consistent, balanced, and sometimes energy-efficient scheduling. It ensures that jobs don’t overlap or scatter unpredictably — they follow a defined fill-up pattern.</p>



<h2 class="wp-block-heading"><strong>1. Sorting with <code>-bsort</code></strong></h2>



<p>The <code>-bsort</code> option controls the <em>order</em> in which available CPU units are considered. It uses the same letters as topology strings (<code>S</code>, <code>C</code>, <code>E</code>, <code>N</code>, <code>X</code>, <code>Y</code>) but interprets them by utilization:</p>



<ul class="wp-block-list">
<li><strong>Uppercase</strong> letters (<code>S</code>, <code>C</code>, <code>N</code>…) mean <em>start with free resources</em>.</li>



<li><strong>Lowercase</strong> letters (<code>s</code>, <code>c</code>, <code>n</code>…) mean <em>start with already-used resources</em>.</li>
</ul>



<p>This simple mechanism lets administrators describe sophisticated fill-up patterns:</p>



<ul class="wp-block-list">
<li><code>-bsort "SC"</code> — Fill unutilized sockets and cores first.</li>



<li><code>-bsort "sC"</code> — Reuse partially filled sockets before opening a new one.</li>



<li><code>-bsort "nSyC"</code> — (GCS only) Reuse already-utilized NUMA nodes and cache groups before opening new ones, while preferring empty sockets and cores within each NUMA node and cache group.</li>
</ul>
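<p>To make the uppercase/lowercase semantics concrete, here is a toy Python model of how a preference such as <code>-bsort "sC"</code> versus <code>-bsort "SC"</code> could order candidate sockets. It is an illustrative sketch only, not the scheduler’s actual implementation:</p>

```python
# Toy model of "-bsort" socket ordering: lowercase = prefer already-used
# units, uppercase = prefer free units. Illustration only, not GCS code.

def sort_sockets(sockets, reuse_first):
    """sockets: list of core lists, where True marks an occupied core."""
    # A socket counts as "used" as soon as any of its cores is occupied;
    # the key sorts matching sockets (key False) to the front.
    return sorted(sockets, key=lambda s: any(s) != reuse_first)

sockets = [
    [False, False, False, False],  # socket 0: completely free
    [True, False, False, False],   # socket 1: partially used
]

# "sC": reuse partially filled sockets before opening a new one.
assert sort_sockets(sockets, reuse_first=True)[0] == [True, False, False, False]
# "SC": fill unutilized sockets first.
assert sort_sockets(sockets, reuse_first=False)[0] == [False, False, False, False]
```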



<p>If no sort order is defined, the scheduler uses the hardware’s natural order as reported by HWLOC. This is also the default behavior in most competing workload management systems, and it ensures predictable placement when no binding preferences are specified.</p>



<h2 class="wp-block-heading"><strong>2. Defining the range with <code>-bstart</code> and <code>-bstop</code></strong></h2>



<p>After sorting, the scheduler knows which binding units exist and in what order they should be considered. The next question is: <em>how much of that ordered list should be used?</em> That’s what <code>-bstart</code> and <code>-bstop</code> control.</p>



<p>You can think of the sorted topology as a long sequence of nodes — a string that lists sockets, cores, or cache domains in the order defined by <code>-bsort</code>. </p>



<p>With <code>-bstart</code> and <code>-bstop</code>, you mark a <strong>window</strong> inside that string to show where binding is allowed.</p>



<pre class="wp-block-code"><code>Start here → &#91;================] ← Stop here</code></pre>



<p>The brackets represent the usable part of the topology. Binding can happen only inside this bracketed region; everything outside it is ignored. This is especially useful when you want to reserve or exclude certain parts of a host — for example, to keep the first socket free for system services or to group jobs within a specific NUMA region.</p>



<p>Consider a dual-socket, quad-core host represented as:</p>



<pre class="wp-block-code"><code>SCCccSCCCC</code></pre>



<p>(lowercase <code>c</code> marks partially utilized cores)</p>



<ul class="wp-block-list">
<li><code>-bstart S -bstop s</code> → the scheduler begins binding at the first <strong>unutilized socket (<code>S</code>)</strong> and stops before the first <strong>partially used socket (<code>s</code>)</strong>, effectively limiting the job to the clean half of the machine. (SCCcc[SCCCC])</li>



<li><code>-bstart s -bstop S</code> → Reversing the range starts binding at the first <strong>used socket (<code>s</code>)</strong> and continues until the next <strong>free socket (<code>S</code>)</strong> ([SCCcc]SCCCC).</li>
</ul>
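<p>The two bracketed examples above can be mimicked in a few lines of Python. The helper below splits a topology string into sockets, classifies each as used or free, and returns the selected window; it is a simplified sketch using the article’s notation, not the real scheduler logic:</p>

```python
# Sketch of the "-bstart"/"-bstop" window on a topology string such as
# "SCCccSCCCC" (lowercase = utilized). Illustration only, not GCS internals.

def split_sockets(topology):
    """Split 'SCCccSCCCC' into per-socket chunks: ['SCCcc', 'SCCCC']."""
    chunks = []
    for ch in topology:
        if ch in "Ss":
            chunks.append(ch)      # a new socket starts here
        else:
            chunks[-1] += ch       # core belongs to the current socket
    return chunks

def window(topology, start_used, stop_used):
    """Take sockets from the first one whose utilization matches start_used
    up to (but not including) the next one matching stop_used."""
    chunks = split_sockets(topology)
    used = [any(c.islower() for c in chunk) for chunk in chunks]
    begin = used.index(start_used)
    selected = []
    for u, chunk in zip(used[begin:], chunks[begin:]):
        if selected and u == stop_used:
            break
        selected.append(chunk)
    return "".join(selected)

topo = "SCCccSCCCC"
# -bstart S -bstop s  ->  SCCcc[SCCCC]
assert window(topo, start_used=False, stop_used=True) == "SCCCC"
# -bstart s -bstop S  ->  [SCCcc]SCCCC
assert window(topo, start_used=True, stop_used=False) == "SCCcc"
```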



<p>You can imagine this as managing a parking lot: <code>-bsort</code> arranges the parking spaces by preference, and <code>-bstart</code>/<code>-bstop</code> draw the lines around the section that is currently open for parking. Jobs will always stay within that section, preventing overlap and ensuring predictable placement.</p>



<h2 class="wp-block-heading"><strong>3. Real-world significance</strong></h2>



<p>At first glance, <code>-bsort</code>, <code>-bstart</code>, and <code>-bstop</code> may seem like low-level tweaks. In reality, they are the <strong>building blocks of enterprise scheduling strategy</strong>.</p>



<p>These options influence how load, heat, and power consumption are distributed across hardware, and they provide the basis for higher-level policies such as:</p>



<h3 class="wp-block-heading has-medium-font-size"><strong>Energy-Aware Scheduling</strong> – fill one socket or die before activating another.</h3>



<h3 class="wp-block-heading has-medium-font-size"><strong>Thermal &amp; Power-Density Balancing</strong> – distribute workloads evenly across nodes.</h3>



<h3 class="wp-block-heading has-medium-font-size"><strong>License- or Cost-Aware Placement</strong> – schedule on specific CPUs first.</h3>



<h3 class="wp-block-heading has-medium-font-size"><strong>Cache &amp; Memory-Bandwidth Optimization</strong> – keep related processes close to shared caches or NUMA regions.</h3>



<p>In GCS, JSV (Job Submission Verifier) scripts can dynamically adjust these settings to enforce global policies across the cluster.</p>



<h2 class="wp-block-heading"><strong>4. Looking ahead</strong></h2>



<p>This post closes the “mechanics” phase of the binding series. In the next article, we’ll apply everything we’ve learned to <strong>real enterprise use cases</strong> — combining sorting, filtering, and range selection to implement:</p>



<ul class="wp-block-list">
<li>Energy-optimized cluster configurations</li>



<li>Thermal-balanced compute fabrics</li>



<li>Cost- and license-aware job scheduling</li>
</ul>



<p>What seems like three small options today will soon become the core tools for advanced resource optimization.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Qontrol – A Modern Web UI for Gridware Cluster Scheduler</title>
		<link>https://bablick.de/qontrol-a-modern-web-ui-for-gridware-cluster-scheduler/</link>
		
		<dc:creator><![CDATA[ernst.bablick]]></dc:creator>
		<pubDate>Mon, 27 Oct 2025 19:18:19 +0000</pubDate>
				<category><![CDATA[HPC]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA["Cluster Management"]]></category>
		<category><![CDATA["Gridware Cluster Scheduler"]]></category>
		<category><![CDATA["HPC-Gridware"]]></category>
		<category><![CDATA["qmon"]]></category>
		<category><![CDATA["Qontrol"]]></category>
		<category><![CDATA["scheduler"]]></category>
		<category><![CDATA["Web UI"]]></category>
		<guid isPermaLink="false">https://bablick.de/?p=155</guid>

					<description><![CDATA[With our latest announcement at HPC-Gridware, we’re introducing Qontrol — a completely new, web-based user interface for Gridware Cluster Scheduler. For many years, qmon was the graphical management tool for Grid Engine and its derivatives. Maintaining this legacy Motif-based application had become increasingly difficult — both technically and from a usability perspective.With Qontrol, we are...]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"><figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="512" height="512" src="https://bablick.de/wp-content/uploads/2025/10/Qontrol-Dashboard-Overview-e1761592596433.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Qontrol Web UI for GCS" style="object-fit:cover;" /></figure></div>



<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>With our latest announcement at <a href="https://hpc-gridware.com/introducing-qontrol-modern-web-ui-for-gridware-cluster-scheduler/">HPC-Gridware</a>, we’re introducing <strong>Qontrol</strong> — a completely new, web-based user interface for <strong>Gridware Cluster Scheduler</strong>.</p>
</div>
</div>



<span id="more-155"></span>



<p>For many years, <em>qmon</em> was the graphical management tool for Grid Engine and its derivatives. Maintaining this legacy Motif-based application had become increasingly difficult — both technically and from a usability perspective.<br>With <strong>Qontrol</strong>, we are starting fresh: a <strong>modern, responsive, browser-based</strong> interface designed to support administrators and users of large HPC environments.</p>



<p>Qontrol provides direct insight into cluster status, jobs, queues, and resource usage — all presented through a clean and flexible UI that can adapt to a wide range of display environments.<br>It is built to integrate seamlessly with existing <strong>Gridware Cluster Scheduler</strong> installations, requiring no additional client software.</p>



<p>The initial release focuses on <strong>monitoring and visualization</strong>, but we are actively working on <strong>interactive management capabilities</strong> and <strong>customizable dashboards</strong> for upcoming versions.</p>



<p>You can find the full introduction and feature overview in our main post:<br>👉 <a href="https://hpc-gridware.com/introducing-qontrol-modern-web-ui-for-gridware-cluster-scheduler/">Introducing Qontrol – A Modern Web UI for Gridware Cluster Scheduler</a></p>



<p>Stay tuned — this marks the beginning of a new generation of tools built around <strong>Gridware Cluster Scheduler</strong>.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Understanding Binding in Gridware Cluster Scheduler</title>
		<link>https://bablick.de/understanding-binding-in-gridware-cluster-scheduler/</link>
		
		<dc:creator><![CDATA[ernst.bablick]]></dc:creator>
		<pubDate>Sat, 25 Oct 2025 20:13:16 +0000</pubDate>
				<category><![CDATA[HPC]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA["chiplet"]]></category>
		<category><![CDATA["core binding"]]></category>
		<category><![CDATA["Gridware Cluster Scheduler"]]></category>
		<category><![CDATA["parallel computing"]]></category>
		<category><![CDATA["scheduler"]]></category>
		<category><![CDATA["socket binding"]]></category>
		<category><![CDATA["thread binding"]]></category>
		<category><![CDATA[Binding]]></category>
		<category><![CDATA[NUMA]]></category>
		<guid isPermaLink="false">https://bablick.de/?p=133</guid>

					<description><![CDATA[Modern compute nodes have grown increasingly complex — featuring heterogeneous cores, multi-level caches, and intricate NUMA topologies. In the previous post, Compute Nodes with Heterogeneous Topology in Gridware Cluster Scheduler, we looked at how these topologies are detected and represented in Gridware Cluster Scheduler. This post explores binding — how the scheduler decides where exactly...]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"><figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="512" height="512" src="https://bablick.de/wp-content/uploads/2025/10/Brain-Topology-Binding-e1761421460621.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Binding Topology" style="object-fit:cover;" /></figure></div>



<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>Modern compute nodes have grown increasingly complex — featuring heterogeneous cores, multi-level caches, and intricate NUMA topologies. In the previous post, <a href="https://bablick.de/compute-nodes-with-heterogenious-topology-in-gridware-cluster-scheduler/"><em>Compute Nodes with Heterogeneous Topology in Gridware Cluster Scheduler</em></a>, we looked at how these topologies are detected and represented in <strong>Gridware Cluster Scheduler</strong>.</p>



<p>This post explores <strong>binding</strong> — how the scheduler decides <em>where</em> exactly a job runs within a node, and how users can control this behavior for optimal performance.</p>
</div>
</div>



<span id="more-133"></span>



<h2 class="wp-block-heading">Why Binding Matters</h2>



<p>In high-performance computing (HPC), <strong>resource binding</strong> defines how processes or threads are mapped to specific CPU resources. Effective binding ensures predictable performance by preventing multiple jobs from competing for the same core, cache, or memory subsystem. It also enhances bandwidth utilization for attached devices such as network interfaces, InfiniBand adapters, and GPUs by maintaining locality between compute tasks and their associated hardware resources.</p>



<p><strong>Gridware Cluster Scheduler</strong> treats binding as a <strong>first-class resource</strong>. Unlike traditional schedulers where binding was merely a hint, in Gridware Cluster Scheduler it is a <strong>hard requirement</strong> — a job will only start once the requested binding can be fulfilled.</p>



<h2 class="wp-block-heading">From Slots to Binding</h2>



<p>If you’ve read the previous post, you already know about the <strong>slot concept</strong> — where each slot represents a unit of computational capacity on a node. Here’s the full analogy that makes the difference between <em>slots</em> and <em>binding</em> concrete:</p>



<ul class="wp-block-list">
<li><strong>Slots = seats on an airplane.</strong> A compute node has a fixed number of slots, just as a plane has a fixed number of seats. Each sequential job and each task of a parallel job needs <strong>one slot</strong>, like each passenger needs <strong>one seat</strong>.</li>



<li><strong>Binding = weight-balanced placement and freight.</strong> Binding determines <em>where</em> the job runs within the node. In the airplane analogy, that’s like <strong>assigning specific rows/sections</strong> and <strong>placing freight in defined compartments</strong> to maintain balance. Similarly, binding pins tasks to <strong>threads, cores, sockets, dies (L3), or NUMA nodes</strong> so they benefit from nearby caches and memory and don’t interfere with other workloads.</li>
</ul>



<p>In short: slots define <strong>how many</strong>, binding defines <strong>where</strong> — the placement that preserves locality and stability.</p>



<h2 class="wp-block-heading">Binding Units and Amounts</h2>



<p>Binding behavior is primarily controlled with the <code>-bunit</code> and <code>-bamount</code> parameters during job submission.</p>



<h3 class="wp-block-heading">Define the Binding Level with <code>-bunit &lt;unit&gt;</code></h3>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:100%">
<figure class="wp-block-table"><table><thead><tr><th>Unit</th><th>Description</th></tr></thead><tbody><tr><td><strong>T</strong> or <strong>CT</strong></td><td>CPU thread of a power core</td></tr><tr><td><strong>ET</strong></td><td>CPU thread of an efficiency core</td></tr><tr><td><strong>C</strong></td><td>Power core (default)</td></tr><tr><td><strong>E</strong></td><td>Efficiency core</td></tr><tr><td><strong>S</strong> or <strong>CS</strong></td><td>All power cores of a socket</td></tr><tr><td><strong>ES</strong></td><td>All efficiency cores of a socket</td></tr><tr><td><strong>X</strong> or <strong>CX</strong></td><td>All power cores sharing the same L3 cache (chiplet/die)</td></tr><tr><td><strong>EX</strong></td><td>All efficiency cores sharing the same L3 cache</td></tr><tr><td><strong>Y</strong> or <strong>CY</strong></td><td>All power cores sharing the same L2 cache</td></tr><tr><td><strong>EY</strong></td><td>All efficiency cores sharing the same L2 cache</td></tr><tr><td><strong>N</strong> or <strong>CN</strong></td><td>All power cores of a NUMA node</td></tr><tr><td><strong>EN</strong></td><td>All efficiency cores of a NUMA node</td></tr></tbody></table></figure>
</div>
</div>



<p>Each unit level corresponds to a layer in the hardware hierarchy. </p>



<h3 class="wp-block-heading">Specify the Number of Units with <code>-bamount &lt;number&gt;</code></h3>



<p>The <strong>binding amount</strong> defines how many binding units should be assigned per slot (or per host).</p>



<pre class="wp-block-code"><code>qsub -pe mpi_8 16 -bunit C -bamount 2 ...</code></pre>



<p>This job requests 16 slots across two hosts. Each slot binds to <strong>two power cores</strong>, ideal for tasks starting two lightweight threads (or processes).</p>



<p>If the threads are tightly coupled, <strong>thread binding</strong> can be more suitable:</p>



<pre class="wp-block-code"><code>qsub -pe mpi_8 16 -bunit T -bamount 2 ...</code></pre>



<p>Binding threads instead of cores can enhance total cluster throughput by minimizing stalls from system calls, cache misses, and network delays, and by improving cache locality (especially for producer–consumer pairs). While each job may take up to twice as long to complete, the increased parallelism—running twice as many jobs simultaneously—often results in a net performance improvement of 5–10%.</p>



<p>Chiplet or die binding can be especially beneficial on modern CPUs where groups of cores share a common L3 cache. By aligning tasks to those chiplets, cache locality is preserved and cross-die memory traffic is minimized.</p>



<pre class="wp-block-code"><code>qsub ... -btype host -bunit X -bamount 1</code></pre>



<p>This command binds each job or task to all cores that share the same L3 cache. It ensures that the job exclusively uses that cache domain and its attached resources (e.g. GPU or I/O devices), preventing other jobs from interfering. This can yield a performance benefit even if the job does not use all cores of the die.</p>



<h2 class="wp-block-heading">Binding Types: Slot vs. Host</h2>



<p>Binding can be applied <strong>per slot</strong> (as we saw in previous examples) or <strong>per host</strong> using the <code>-btype</code> parameter.</p>



<ul class="wp-block-list">
<li><strong>Slot-based binding</strong> (default):<br>Each slot gets its own binding. This maximizes flexibility and is ideal for mixed workloads.</li>



<li><strong>Host-based binding</strong>:<br>Binding is applied collectively for all slots on a host, ensuring consistent placement but reducing flexibility.</li>
</ul>



<pre class="wp-block-code"><code>qsub -pe mpi_8 16 -btype host -bunit X -bamount 1 ...</code></pre>



<p>Here, the job gets 16 slots (8 per host) and <strong>Gridware Cluster Scheduler</strong> binds each group to one die (L3 cache). This host-wide approach minimizes fragmentation and improves cache locality.</p>



<p>Once a job (or advance reservation) is scheduled, its actual binding can be inspected with <code>qstat</code> (or <code>qrstat</code>):</p>



<pre class="wp-block-code"><code>qstat -j &lt;job_id&gt;   # or  qrstat -ar &lt;ar_id&gt;
...
binding:               bamount=16,binstance=set,bstrategy=pack,btype=host,bunit=X
exec_binding_list 1:   host1=NSxccccccccXCCCCCCCC,host2=NSxccccccccXCCCCCCCC</code></pre>



<p>The first line shows the binding request; the second lists the binding actually applied per host (lowercase letters in the topology string). In this example, all cores below the first L3 cache of the first socket were used.</p>
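<p>Reading these strings by hand gets tedious on large hosts. A few lines of Python can list the bound units; this is a hypothetical helper, not part of the GCS tooling:</p>

```python
# Lowercase letters in an exec_binding_list topology string mark the units
# that were actually bound for the job. Hypothetical helper, not a GCS tool.

def bound_units(topology):
    """Return (position, letter) pairs for every bound (lowercase) unit."""
    return [(i, ch) for i, ch in enumerate(topology) if ch.islower()]

topo = "NSxccccccccXCCCCCCCC"
letters = [ch for _, ch in bound_units(topo)]
assert letters.count("x") == 1   # one L3 cache domain bound
assert letters.count("c") == 8   # all eight cores below that cache
```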



<h2 class="wp-block-heading">Binding Filters</h2>



<p>Sometimes certain cores or sockets should be left free — for example, one core per host reserved for system tasks. Binding filters, defined with <code>-bfilter</code>, make this possible.</p>



<p>A filter uses a <strong>topology string</strong> where lowercase letters mark excluded units.</p>



<p>Example:</p>



<pre class="wp-block-code"><code>qsub -bfilter ScCCCScCCC ...</code></pre>



<p>Here, the first core of each socket is masked and will not be used for binding. All other cores remain available.</p>
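<p>Conceptually, a filter is just a character-wise mask over the topology string. The sketch below shows the idea (lowercase positions in the filter exclude the corresponding unit); it is an illustration, not the scheduler’s implementation:</p>

```python
# "-bfilter" as a character-wise mask: wherever the filter string is
# lowercase, the corresponding unit becomes unavailable. Illustration only.

def apply_filter(topology, bfilter):
    assert len(topology) == len(bfilter)
    return "".join(
        t.lower() if f.islower() else t   # excluded units turn lowercase
        for t, f in zip(topology, bfilter)
    )

# First core of each socket reserved (as with 'qsub -bfilter ScCCCScCCC'),
# applied to a host where one core is already in use:
assert apply_filter("SCCcCSCCCC", "ScCCCScCCC") == "ScCcCScCCC"
```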



<p>Administrators can also define global filters by keyword:</p>



<pre class="wp-block-code"><code>qconf -sconf | grep binding_params
binding_params ... filter=first_core</code></pre>



<p>Global and job-specific filters are additive, and both restrictions apply simultaneously.</p>



<h2 class="wp-block-heading">Packed Binding (and What Comes Next)</h2>



<p>The <strong>packed binding strategy</strong> is the default in <strong>Gridware Cluster Scheduler</strong>.<br>It assigns available hardware units sequentially from left to right within a node’s topology string, ensuring that each host is filled efficiently while maintaining cache and NUMA locality.</p>



<p>Packed binding automatically groups tasks on nearby cores and within shared cache domains to reduce latency and memory contention.<br>If a host does not have enough free units to satisfy a job’s binding request, the scheduler simply skips that host.</p>
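<p>In sketch form, packed binding is a left-to-right scan. The toy function below takes the first <code>n</code> free power cores of a host, or reports failure so the host can be skipped; this is a simplification for illustration, not the scheduler’s code:</p>

```python
# Toy packed binding: scan the topology string left to right, bind the
# first n free power cores ('C' -> 'c'), or fail if the host is too full.

def pack_cores(topology, n):
    picked, out = 0, []
    for ch in topology:
        if ch == "C" and picked < n:
            out.append("c")   # bind this core
            picked += 1
        else:
            out.append(ch)
    return "".join(out) if picked == n else None  # None: skip this host

assert pack_cores("SCCCCSCCCC", 3) == "ScccCSCCCC"
assert pack_cores("SCcCC", 9) is None   # not enough free cores: host skipped
```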



<p>Everything described so far is available in both <strong>Open Cluster Scheduler (OCS)</strong> and <strong>Gridware Cluster Scheduler (GCS)</strong>.</p>



<p>However, Gridware Cluster Scheduler introduces <a href="https://bablick.de/how-binding-order-and-range-shape-job-placement-in-gridware-cluster-scheduler/" data-type="post" data-id="158">extended binding control</a>. Packed binding can be refined through additional options — <strong><code>-bsort</code></strong>, <strong><code>-bstart</code></strong>, and <strong><code>-bstop</code></strong> — which let you influence the <strong>order</strong> and <strong>region</strong> of unit selection.</p>



<p>These <strong>advanced strategies</strong> are available <strong>only in Gridware Cluster Scheduler</strong> and will be discussed in detail in the <strong>next blog post</strong>.</p>



<p>🚀 <strong>Stay connected!</strong><br>Follow me on <strong><a href="https://x.com/ebablick" data-type="link" data-id="https://x.com/ebablick">X (Twitter)</a></strong> or join <strong>HPC-Gridware</strong> on <strong><a href="https://www.linkedin.com/company/hpc-gridware" data-type="link" data-id="https://www.linkedin.com/company/hpc-gridware">LinkedIn</a></strong> and <strong><a href="https://x.com/HPC_Gridware" data-type="link" data-id="https://x.com/HPC_Gridware">X (Twitter)</a></strong> for the latest release announcements, expert tips, and in-depth technical insights from our team.</p>



<p>🔧 <strong>Try it today:</strong> nightly builds featuring the latest <strong>OCS</strong> and <strong>GCS</strong> enhancements discussed in this post are now available from <strong><a href="https://hpc-gridware.com/download-main/" data-type="link" data-id="https://hpc-gridware.com/download-main/">HPC-Gridware</a></strong>.</p>



]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Compute Nodes with Heterogeneous Topology in Gridware Cluster Scheduler</title>
		<link>https://bablick.de/compute-nodes-with-heterogenious-topology-in-gridware-cluster-scheduler/</link>
		
		<dc:creator><![CDATA[ernst.bablick]]></dc:creator>
		<pubDate>Thu, 23 Oct 2025 19:12:25 +0000</pubDate>
				<category><![CDATA[HPC]]></category>
		<category><![CDATA[Binding]]></category>
		<category><![CDATA[CPU]]></category>
		<category><![CDATA[Gridware]]></category>
		<category><![CDATA[NUMA]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Scheduling]]></category>
		<guid isPermaLink="false">https://bablick.de/?p=125</guid>

					<description><![CDATA[As CPUs evolve toward hybrid designs with mixed core types and increasingly complex memory hierarchies, HPC schedulers must also evolve.This post explains how Gridware Cluster Scheduler 9.1.0 meets that challenge—bringing detailed, topology-aware resource scheduling to modern heterogeneous compute nodes. Why Topology Awareness Matters In modern high-performance computing (HPC), CPU cores within a single socket may...]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"><figure class="wp-block-post-featured-image"><img loading="lazy" decoding="async" width="512" height="512" src="https://bablick.de/wp-content/uploads/2025/10/Brain-topology-e1761246631294.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Hardware Topology" style="object-fit:cover;" /></figure></div>



<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>As CPUs evolve toward hybrid designs with mixed core types and increasingly complex memory hierarchies, HPC schedulers must also evolve.<br>This post explains how <strong>Gridware Cluster Scheduler 9.1.0</strong> meets that challenge—bringing detailed, topology-aware resource scheduling to modern heterogeneous compute nodes.</p>
</div>
</div>



<span id="more-125"></span>



<h2 class="wp-block-heading">Why Topology Awareness Matters</h2>



<p>In modern high-performance computing (HPC), CPU cores within a single socket may differ in clock frequency, power characteristics, or cache layout. Meanwhile, memory hierarchies—NUMA nodes, multi-level caches, and chiplets—add new layers of complexity.</p>



<p>To schedule jobs efficiently, a cluster manager must understand and exploit this hardware topology.<br><strong>Gridware Cluster Scheduler 9.1.0</strong> introduces expanded binding and topology-awareness features to maximize performance and ensure predictable resource placement.</p>



<h2 class="wp-block-heading">Three Hardware Topologies from NVIDIA, AMD, and Intel</h2>



<p>To demonstrate the scheduler’s new capabilities, the following sections show real-world topology examples from <strong>NVIDIA</strong>, <strong>AMD</strong>, and <strong>Intel</strong> hardware.</p>



<h3 class="wp-block-heading">NVIDIA DGX Spark</h3>



<p>The <strong>NVIDIA DGX Spark</strong>—notable for its presentation by Jensen Huang to Elon Musk at SpaceX—uses a <strong>heterogeneous ARM architecture</strong> optimized for AI/ML workloads.<br>The system features <strong>20 ARM cores</strong> organized into <strong>five performance tiers</strong>, each with unique efficiency and frequency characteristics:</p>



<pre class="wp-block-code"><code>CPU-Type #4: efficiency=4, cpuset=0x00080000
  FrequencyMaxMHz = 4004
  LinuxCapacity   = 1024
CPU-Type #3: efficiency=3, cpuset=0x00078000
  FrequencyMaxMHz = 3978
  LinuxCapacity   = 1017
CPU-Type #2: efficiency=2, cpuset=0x000003e0
  FrequencyMaxMHz = 3900
  LinuxCapacity   = 997
CPU-Type #1: efficiency=1, cpuset=0x00007c00
  FrequencyMaxMHz = 2860
  LinuxCapacity   = 731
CPU-Type #0: efficiency=0, cpuset=0x0000001f
  FrequencyMaxMHz = 2808
  LinuxCapacity   = 718</code></pre>



<p>Using Intel’s terminology, this architecture could be viewed as <strong>10 Performance cores</strong> and <strong>10 Efficiency cores</strong><br>(10 × ARM Cortex-X925 + 10 × ARM Cortex-A725). Each core has private L1/L2 caches, and groups of 10 share an L3 cache.</p>
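<p>The cpuset values in the listing above are plain bitmasks: each set bit marks one core, so the tier sizes can be cross-checked with simple bit counting. A minimal sketch (the mask values are copied from the listing; grouping the three fastest tiers as the Cortex-X925 cores and the two slowest as the Cortex-A725 cores is an interpretation, not scheduler output):</p>

```python
# Cross-check of the cpuset masks from the listing above:
# popcount of each mask gives the number of cores in that tier.
cpusets = {
    4: 0x00080000,  # 4004 MHz
    3: 0x00078000,  # 3978 MHz
    2: 0x000003e0,  # 3900 MHz
    1: 0x00007c00,  # 2860 MHz
    0: 0x0000001f,  # 2808 MHz
}

cores = {tier: bin(mask).count("1") for tier, mask in cpusets.items()}
print(cores)  # {4: 1, 3: 4, 2: 5, 1: 5, 0: 5}

# The tiers are pairwise disjoint and together cover cores 0..19 exactly.
combined = 0
for mask in cpusets.values():
    assert combined & mask == 0  # no core belongs to two tiers
    combined |= mask
assert combined == 0xFFFFF       # bits 0..19, i.e. all 20 cores
assert sum(cores.values()) == 20

# Tiers 2-4 (3900 MHz and above) add up to 10 cores, tiers 0-1 to the
# other 10 -- consistent with 10 x Cortex-X925 + 10 x Cortex-A725.
assert cores[4] + cores[3] + cores[2] == 10
assert cores[1] + cores[0] == 10
```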



<pre class="wp-block-code"><code>&gt; loadcheck -cb | grep Topology
Topology (GCS): NSXEEEEECCCCCXEEEEECCCCC</code></pre>



<p>Gridware Cluster Scheduler uses <strong>topology strings</strong> to represent such layouts.<br>Here: <code>N</code> = NUMA node, <code>S</code> = socket, <code>X</code> = L3 cache, <code>E</code> = Efficiency core, and <code>C</code> = Performance core.</p>



<h3 class="wp-block-heading">Intel i9-14900HX</h3>



<p>While the <strong>Intel i9-14900HX</strong> isn’t typical for HPC clusters, it’s an ideal case study for <strong>hybrid core</strong> architectures.</p>



<pre class="wp-block-code"><code>&gt; loadcheck -cb | grep Topology
Topology (GCS): NSXCTTCTTCTTCTTCTTCTTCTTCTTYEEEEYEEEEYEEEEYEEEE</code></pre>



<ul class="wp-block-list">
<li><strong>Performance cores (C)</strong>: Dual-threaded (<code>T</code>), each with its own L2 cache.</li>



<li><strong>Efficiency cores (E)</strong>: Single-threaded, grouped by four per L2 cache (<code>Y</code>).</li>



<li><strong>NUMA node (N)</strong> and <strong>socket (S)</strong>: Encompass both core types and a shared L3 cache (<code>X</code>).</li>
</ul>
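<p>Because topology strings are plain text, counts can be derived from them with simple string processing. A small sketch over the two strings quoted above, assuming (as the descriptions suggest) that each <code>T</code> marks one hardware thread of the core letter preceding it, and that a core with no <code>T</code> markers is single-threaded; <code>logical_cpus</code> is a hypothetical helper, not part of GCS:</p>

```python
from collections import Counter

# Topology strings quoted above (legend: N = NUMA node, S = socket,
# X = L3 cache, Y = shared L2 cache, C = performance core,
# E = efficiency core, T = hardware thread).
dgx_spark = "NSXEEEEECCCCCXEEEEECCCCC"
i9_14900hx = "NSXCTTCTTCTTCTTCTTCTTCTTCTTYEEEEYEEEEYEEEEYEEEE"

print(Counter(dgx_spark))   # 10 x C, 10 x E, 2 x X, ...
print(Counter(i9_14900hx))  # 8 x C, 16 x T, 16 x E, 4 x Y, ...

def logical_cpus(topo: str) -> int:
    """Count logical CPUs: each T after a core letter is one thread;
    a core letter with no trailing T counts as one logical CPU."""
    count = 0
    i = 0
    while i < len(topo):
        if topo[i] in "CE":
            threads = 0
            j = i + 1
            while j < len(topo) and topo[j] == "T":
                threads += 1
                j += 1
            count += threads if threads else 1
            i = j
        else:
            i += 1  # skip structural letters: N, S, X, Y
    return count

print(logical_cpus(dgx_spark))   # 20 (all cores single-threaded)
print(logical_cpus(i9_14900hx))  # 32 (8 x 2 threads + 16 E-cores)
```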



<h3 class="wp-block-heading">AMD EPYC Zen5</h3>



<p>The <strong>AMD EPYC Zen5</strong> series (e.g., <code>AMD-Epyc-Zen5-c4d-highmem-384</code>) represents a <strong>chiplet-based homogeneous design</strong>.<br>Each core provides two hardware threads, and the L3 cache structure (<code>X</code>) maps directly to chiplets/dies.</p>



<pre class="wp-block-code"><code>&gt; loadcheck -cb | grep Topology
Topology (GCS): NSXCTTCTTCTTCTTCTTCTTCTTCTT XCTTCTTCTTCTTCTTCTTCTTCTT
                  XCTTCTTCTTCTTCTTCTTCTTCTT XCTTCTTCTTCTTCTTCTTCTTCTT
                  ... (repeated chiplet layout per socket)
                NSXCTTCTTCTTCTTCTTCTTCTTCTT XCTTCTTCTTCTTCTTCTTCTTCTT
                  XCTTCTTCTTCTTCTTCTTCTTCTT XCTTCTTCTTCTTCTTCTTCTTCTT
                  ... (repeated chiplet layout per socket)</code></pre>



<p>Each socket (<code>S</code>) corresponds to one NUMA node (<code>N</code>), while every core has a private L2 cache.</p>



<h2 class="wp-block-heading">Handling Heterogeneous Topologies in Gridware Cluster Scheduler</h2>



<p>Efficient scheduling means <strong>assigning tasks to the most suitable hardware</strong>.<br>If a parallel job spans both slow and fast cores, the slowest core becomes the bottleneck. Similarly, crossing NUMA or cache boundaries increases latency.</p>



<p>Gridware Cluster Scheduler 9.1 introduces <strong>fine-grained binding control</strong>, allowing binding to:</p>



<ul class="wp-block-list">
<li><strong>Sockets</strong></li>



<li><strong>Cores</strong></li>



<li><strong>Threads</strong></li>



<li><strong>NUMA nodes</strong></li>



<li><strong>Chiplets/Dies (cache domains)</strong></li>
</ul>



<p>This ensures optimal locality and predictable performance, even on hybrid or asymmetric systems.</p>



<h3 class="wp-block-heading">Chiplet/Die Binding Example</h3>



<pre class="wp-block-code"><code>qsub -pe mpi 15 -btype host -bamount 2 -bunit X ...</code></pre>



<p>This example requests <strong>15 MPI tasks</strong>, all running on a single host. Using <code>-btype host</code>, binding is applied relative to the host topology. With <code>-bamount 2 -bunit X</code>, each job portion binds to <strong>two chiplets/dies</strong>, ensuring that cache boundaries are respected and minimizing cross-die interference.</p>



<p>💡 <em>In this setup, the job uses 15 out of 16 available cores. The scheduler keeps the remaining core idle to prevent contention.</em></p>
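<p>The arithmetic behind that note can be spelled out. A hypothetical sketch, assuming 8 cores per chiplet as in the Zen5 layout shown earlier (each <code>X</code> group holding eight <code>C</code> cores); the variable names are illustrative, not scheduler parameters:</p>

```python
import math

# Figures from the example: -pe mpi 15 with -bamount 2 -bunit X,
# assuming a Zen5-style part with 8 cores per chiplet (X group).
tasks = 15
cores_per_chiplet = 8

# One chiplet (8 cores) is too small for 15 tasks, so two must be bound.
chiplets_needed = math.ceil(tasks / cores_per_chiplet)
assert chiplets_needed == 2  # matches -bamount 2

# Two chiplets provide 16 cores; 15 tasks leave one core idle, which
# the scheduler keeps free instead of packing another job onto it.
cores_bound = chiplets_needed * cores_per_chiplet
idle = cores_bound - tasks
print(cores_bound, idle)  # 16 1
```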



<h2 class="wp-block-heading">Summary</h2>



<p>With version 9.1.0, Gridware Cluster Scheduler becomes fully topology-aware, bridging the gap between modern heterogeneous hardware and intelligent workload scheduling.<br>By supporting multiple binding types (core, socket, thread, NUMA, chiplet/die), it ensures efficient resource utilization and predictable performance across diverse compute nodes.</p>



<p>We are currently in the QA phase of this release and welcome user feedback on these new features.<br>They are already included in our nightly builds for testing, and beta releases will be available soon.<br><a href="https://hpc-gridware.com/download-main/" data-type="link" data-id="https://hpc-gridware.com/download-main/">Download Gridware Cluster Scheduler</a></p>



<p>Stay tuned with HPC-Gridware for updates — we’ll share <a href="https://bablick.de/understanding-binding-in-gridware-cluster-scheduler/" data-type="post" data-id="133">more insights</a>, examples, and best practices as we approach the official release.</p>



<p>Follow me at <a href="https://x.com/ebablick" data-type="link" data-id="https://x.com/ebablick">X/Twitter</a> or follow us at HPC-Gridware (<a href="https://www.linkedin.com/company/hpc-gridware" data-type="link" data-id="https://www.linkedin.com/company/hpc-gridware">LinkedIn</a>, <a href="https://x.com/HPC_Gridware" data-type="link" data-id="https://x.com/HPC_Gridware">X/Twitter</a>) for release announcements, tips, and technical insights.</p>



]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Open Cluster Scheduler and Gridware Cluster Scheduler v9.0.8 are Available</title>
		<link>https://bablick.de/open-cluster-scheduler-and-gridware-cluster-scheduler-v9-0-8-are-available/</link>
		
		<dc:creator><![CDATA[ernst.bablick]]></dc:creator>
		<pubDate>Thu, 28 Aug 2025 13:15:59 +0000</pubDate>
				<category><![CDATA[HPC]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[GCS]]></category>
		<category><![CDATA[Gridware]]></category>
		<category><![CDATA[OCS]]></category>
		<guid isPermaLink="false">https://bablick.de/?p=100</guid>

					<description><![CDATA[OCS and GCS v9.0.8 are now available. As usual, the packages can be downloaded from the HPC-Gridware download page, and the source code is available on the Cluster Scheduler GitHub project page. The list of fixed issues mentioned in the Release Notes can be found here: Improvement CS-739 qstat -j output should contain job state,...]]></description>
										<content:encoded><![CDATA[<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><a href="https://www.hpc-gridware.com/"><img alt="HPC-Gridware Logo" loading="lazy" decoding="async" width="292" height="73" src="https://bablick.de/wp-content/uploads/2025/08/NEW-HPC-GRIDWARE-BLACK-DEMO.png" class="wp-image-103" style="width:400px;height:auto"/></a></figure></div>


<p>OCS and GCS v9.0.8 are now available. As usual, the packages can be downloaded from the <a href="https://hpc-gridware.com/download-ocs-9-0-8/">HPC-Gridware download page</a>, and the source code is available on the <a href="https://github.com/hpc-gridware/clusterscheduler">Cluster Scheduler GitHub project page</a>.</p>



<p>The list of fixed issues mentioned in the <a href="https://www.hpc-gridware.com/download/11138/?tmstv=1756385425">Release Notes</a> can be found here:</p>



<h3 class="wp-block-heading">Improvement</h3>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-739">CS-739</a> qstat -j output should contain job state, start time, queue name, and host names</p>



<h3 class="wp-block-heading">Task</h3>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1407">CS-1407</a> Add SUSE SLES 15 support in support matrix of release notes</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1440">CS-1440</a> Add qtelemetry licenses to GCS 3rdparty licenses directory</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1470">CS-1470</a> do memory testing on V90_BRANCH for the 9.0.8 release</p>



<h3 class="wp-block-heading">Sub-task</h3>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1394">CS-1394</a> Add start_time of array jobs tasks to qstat -j</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1395">CS-1395</a> Cleanup of job states and show states also in qstat -j output</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1396">CS-1396</a> Show granted host information in qstat -j output</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1404">CS-1404</a> Show granted queues in qstat -j output</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1410">CS-1410</a> Show priority in qstat -j output even if it is the base priority</p>



<h3 class="wp-block-heading">Bug</h3>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-671">CS-671</a> qrsh truncates the command line and/or output at 927 characters</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1019">CS-1019</a> sge_execd logs errors when running tightly integrated parallel jobs</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1270">CS-1270</a> Installation script clears screen in case of an error, which makes issues harder to debug</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1381">CS-1381</a> qacct complains &#8220;error: ignoring invalid entry in line 436&#8221; for accounting records with huge command line entry</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1386">CS-1386</a> man page for sge_share_mon is missing</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1403">CS-1403</a> sge_ckpt man-page is in wrong section (1 instead of 5)</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1422">CS-1422</a> endless loop in protocol between sge_qmaster and sge_execd in certain job failure situations</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1424">CS-1424</a> qmod -sj on own job fails on submit only host</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1429">CS-1429</a> sge_qmaster can segfault on qdel -f</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1434">CS-1434</a> clearing error state of a job leads to event callback error logging in qmaster messages file</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1435">CS-1435</a> rescheduling of jobs requires manager rights, documented is &#8220;manager or operator rights&#8221;</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1436">CS-1436</a> qmod man page says it requires manager or operator privileges to rerun a job, but a job owner may rerun his own jobs as well</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1451">CS-1451</a> option -out of examples/jobsbin//work is broken</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1476">CS-1476</a> Go DRMAA does not set JoinFiles() correctly</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1477">CS-1477</a> In Go DRMAA TransferFiles() does not set all values</p>



<p>Please let me or the <a href="https://www.hpc-gridware.com/contact/" data-type="link" data-id="https://www.hpc-gridware.com/contact/">HPC-Gridware team</a> know if you have any questions.</p>



]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
