<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>GCS &#8211; bablick.de</title>
	<atom:link href="https://bablick.de/tag/gcs/feed/" rel="self" type="application/rss+xml" />
	<link>https://bablick.de</link>
	<description>Writing About Clusters, Curiosity, and Everything in Between.</description>
	<lastBuildDate>Wed, 05 Nov 2025 06:15:15 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.8.3</generator>

<image>
	<url>https://bablick.de/wp-content/uploads/2025/08/cropped-BablickLogo-1-32x32.png</url>
	<title>GCS &#8211; bablick.de</title>
	<link>https://bablick.de</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>How Binding Order and Range Shape Job Placement in Gridware Cluster Scheduler</title>
		<link>https://bablick.de/how-binding-order-and-range-shape-job-placement-in-gridware-cluster-scheduler/</link>
		
		<dc:creator><![CDATA[Ernst Bablick]]></dc:creator>
		<pubDate>Thu, 30 Oct 2025 23:31:32 +0000</pubDate>
				<category><![CDATA[HPC]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA["Cache and Memory Optimization"]]></category>
		<category><![CDATA["Cost Aware Scheduling"]]></category>
		<category><![CDATA["CPU Binding"]]></category>
		<category><![CDATA["Energy-Aware Scheduling"]]></category>
		<category><![CDATA["Gridware Cluster Scheduler"]]></category>
		<category><![CDATA[GCS]]></category>
		<category><![CDATA[NUMA]]></category>
		<category><![CDATA[OCS]]></category>
		<guid isPermaLink="false">https://bablick.de/?p=158</guid>

					<description><![CDATA[This article explains how the three binding options -bsort, -bstart, and -bstop control where and in which order jobs are bound to CPUs in Gridware Cluster Scheduler. Together, they define the scheduler’s fill-up pattern — how hardware resources are used, balanced, or reused across sockets, cores, and NUMA nodes. Understanding these options is essential for...]]></description>
										<content:encoded><![CDATA[
<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%"><figure class="aligncenter wp-block-post-featured-image"><img fetchpriority="high" decoding="async" width="512" height="512" src="https://bablick.de/wp-content/uploads/2025/10/Binding-Topilogy-Advanced-e1761865388777.png" class="attachment-post-thumbnail size-post-thumbnail wp-post-image" alt="Advanced Binding Topology" style="object-fit:cover;" /></figure></div>



<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p>This article explains how the three binding options <code>-bsort</code>, <code>-bstart</code>, and <code>-bstop</code> control <strong>where and in which order jobs are bound to CPUs</strong> in Gridware Cluster Scheduler.</p>



<p>Together, they define the scheduler’s <strong>fill-up pattern</strong> — how hardware resources are used, balanced, or reused across sockets, cores, and NUMA nodes.</p>



<p>Understanding these options is essential for creating advanced scheduling strategies such as <strong>energy-aware, thermal, or cost-optimized job placement</strong>.</p>
</div>
</div>



<span id="more-158"></span>



<h2 class="wp-block-heading"><strong>Further Reading</strong></h2>



<p>If you missed the first two parts of this series, start there — they introduce the concepts of topology strings and binding basics that this article builds on:</p>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img decoding="async" width="512" height="512" src="https://bablick.de/wp-content/uploads/2025/10/Brain-topology-e1761246631294.png" alt="hardware-topology" class="wp-image-128"/></figure>
</div>



<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p><a href="https://bablick.de/compute-nodes-with-heterogenious-topology-in-gridware-cluster-scheduler/" data-type="post" data-id="125">Compute Nodes with Heterogeneous Topology in Gridware Cluster Scheduler</a></p>
</div>
</div>



<div class="wp-block-columns is-layout-flex wp-container-core-columns-is-layout-9d6595d7 wp-block-columns-is-layout-flex">
<div class="wp-block-column is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:33.33%">
<figure class="wp-block-image size-full"><img alt="Binding Topology" decoding="async" width="512" height="512" src="https://bablick.de/wp-content/uploads/2025/10/Brain-Topology-Binding-e1761421460621.png" class="wp-image-135"/></figure>
</div>



<div class="wp-block-column is-vertically-aligned-center is-layout-flow wp-block-column-is-layout-flow" style="flex-basis:66.66%">
<p><a href="https://bablick.de/understanding-binding-in-gridware-cluster-scheduler/">Understanding Binding in Gridware Cluster Scheduler</a></p>
</div>
</div>



<h2 class="wp-block-heading"><strong>Why order and range matter</strong></h2>



<p>Imagine your cluster as a row of houses, each with several rooms. Jobs are like guests arriving to stay overnight.</p>



<p>Without a plan, guests will pick rooms at random. Some might share walls and benefit from proximity; others might end up in different houses entirely.</p>



<p>Binding order and range options give the scheduler a <em>plan</em> — a way to decide how the “houses” and “rooms” are filled:</p>



<ul class="wp-block-list">
<li><code>-bsort</code> defines <strong>which house or room type to consider first</strong>.</li>



<li><code>-bstart</code> sets <strong>where the first guest can enter</strong>.</li>



<li><code>-bstop</code> defines <strong>where the group must stop filling</strong>.</li>
</ul>



<p>It’s a simple idea, but it’s the foundation for consistent, balanced, and sometimes energy-efficient scheduling. With such a plan, jobs don’t overlap or scatter unpredictably; they follow a defined fill-up pattern.</p>



<h2 class="wp-block-heading"><strong>1. Sorting with <code>-bsort</code></strong></h2>



<p>The <code>-bsort</code> option controls the <em>order</em> in which available CPU units are considered. It uses the same letters as topology strings (<code>S</code>, <code>C</code>, <code>E</code>, <code>N</code>, <code>X</code>, <code>Y</code>) but interprets them by utilization:</p>



<ul class="wp-block-list">
<li><strong>Uppercase</strong> letters (<code>S</code>, <code>C</code>, <code>N</code>…) mean <em>start with free resources</em>.</li>



<li><strong>Lowercase</strong> letters (<code>s</code>, <code>c</code>, <code>n</code>…) mean <em>start with already-used resources</em>.</li>
</ul>



<p>This simple mechanism lets administrators describe sophisticated fill-up patterns:</p>



<ul class="wp-block-list">
<li><code>-bsort "SC"</code>: Fill unutilized sockets and cores first.</li>



<li><code>-bsort "sC"</code>: Reuse partially filled sockets before opening a new one.</li>



<li><code>-bsort "nSyC"</code>: (GCS only) Reuse already-utilized NUMA nodes and third-level caches before opening new ones, but prefer empty sockets and cores within each NUMA node and core group.</li>
</ul>



<p>If no sort order is defined, the scheduler uses the hardware’s natural order as reported by HWLOC. This order is the default behavior in most competing workload management systems, and it keeps placement predictable when the WLM offers no way to specify preferences.</p>
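


<p>The effect of an uppercase versus a lowercase socket letter can be illustrated with a small Python toy model. This is only a sketch and not GCS code: the <code>HOST</code> layout, the function name <code>free_cores_in_order</code>, and the rule that a socket counts as utilized as soon as one of its cores is busy are assumptions made for illustration. Only the socket level is modeled; the other letters would work analogously.</p>



<pre class="wp-block-code"><code># Toy model, not GCS code: a host is a list of sockets, and each socket
# is a list of booleans where True marks a core that is already in use.
HOST = [
    [False, False, True, True],    # socket 0: two cores busy
    [False, False, False, False],  # socket 1: completely free
]


def free_cores_in_order(host, socket_letter):
    """List free cores as (socket, core) pairs in visiting order.

    "S" visits unutilized sockets first, "s" visits utilized ones first.
    """
    prefer_used = socket_letter.islower()
    # sorted() is stable, so sockets of equal status keep hardware order.
    sockets = sorted(
        enumerate(host),
        key=lambda item: any(item[1]) != prefer_used,
    )
    return [
        (s, c)
        for s, cores in sockets
        for c, busy in enumerate(cores)
        if not busy
    ]


print(free_cores_in_order(HOST, "S"))
print(free_cores_in_order(HOST, "s"))</code></pre>



<p>With <code>"S"</code> the four cores of the empty socket 1 come first, followed by the two free cores of socket 0. With <code>"s"</code> the two free cores of the partially used socket 0 are offered first, which is the fill-up behavior in this simplified model.</p>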



<h2 class="wp-block-heading"><strong>2. Defining the range with <code>-bstart</code> and <code>-bstop</code></strong></h2>



<p>After sorting, the scheduler knows which binding units exist and in what order they should be considered. The next question is: <em>how much of that ordered list should be used?</em> That’s what <code>-bstart</code> and <code>-bstop</code> control.</p>



<p>You can think of the sorted topology as a long sequence of units: a string that lists sockets, cores, or cache domains in the order defined by <code>-bsort</code>.</p>



<p>With <code>-bstart</code> and <code>-bstop</code>, you mark a <strong>window</strong> inside that string to show where binding is allowed.</p>



<pre class="wp-block-code"><code>Start here → &#91;================] ← Stop here</code></pre>



<p>The brackets represent the usable part of the topology. Binding can happen only inside this bracketed region; everything outside it is ignored. This is especially useful when you want to reserve or exclude certain parts of a host — for example, to keep the first socket free for system services or to group jobs within a specific NUMA region.</p>



<p>Consider a dual-socket, quad-core host represented as:</p>



<pre class="wp-block-code"><code>SCCccSCCCC</code></pre>



<p>(lowercase <code>c</code> marks partially utilized cores)</p>



<ul class="wp-block-list">
<li><code>-bstart S -bstop s</code> → the scheduler begins binding at the first <strong>unutilized socket (<code>S</code>)</strong> and stops before the first <strong>partially used socket (<code>s</code>)</strong>, effectively limiting the job to the clean half of the machine: <code>SCCcc[SCCCC]</code>.</li>



<li><code>-bstart s -bstop S</code> → reversing the range starts binding at the first <strong>used socket (<code>s</code>)</strong> and continues until the next <strong>free socket (<code>S</code>)</strong>: <code>[SCCcc]SCCCC</code>.</li>
</ul>
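


<p>The two windows above can be reproduced with a short Python sketch. It is a toy interpretation, not the scheduler’s algorithm: the helper names <code>split_sockets</code>, <code>is_used</code>, and <code>mark_window</code> are invented here, the topology string is assumed to begin with a socket letter, and a socket is assumed to count as used when at least one of its cores is lowercase.</p>



<pre class="wp-block-code"><code>def split_sockets(topology):
    """Split a string like "SCCccSCCCC" into per-socket chunks."""
    chunks = []
    for letter in topology:
        if letter in "Ss":
            chunks.append(letter)   # a socket letter opens a new chunk
        else:
            chunks[-1] += letter    # cores belong to the current socket
    return chunks


def is_used(chunk):
    """Assumption: a socket is used when any of its cores is lowercase."""
    return "c" in chunk


def mark_window(topology, bstart, bstop):
    """Return the topology with brackets around the allowed window."""
    start_used = bstart.islower()
    stop_used = bstop.islower()
    out = []
    state = "before"
    for chunk in split_sockets(topology):
        if state == "before" and is_used(chunk) == start_used:
            out.append("[")         # first socket matching -bstart
            state = "inside"
        elif state == "inside" and is_used(chunk) == stop_used:
            out.append("]")         # stop before the -bstop socket
            state = "after"
        out.append(chunk)
    if state == "inside":
        out.append("]")             # window runs to the end of the host
    return "".join(out)


print(mark_window("SCCccSCCCC", "S", "s"))
print(mark_window("SCCccSCCCC", "s", "S"))</code></pre>



<p>The first call prints <code>SCCcc[SCCCC]</code> and the second prints <code>[SCCcc]SCCCC</code>, matching the two examples in the list.</p>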



<p>You can imagine this as managing a parking lot: <code>-bsort</code> arranges the parking spaces by preference, and <code>-bstart</code>/<code>-bstop</code> draw the lines around the section that is currently open for parking. Jobs will always stay within that section, preventing overlap and ensuring predictable placement.</p>



<h2 class="wp-block-heading"><strong>3. Real-world significance</strong></h2>



<p>At first glance, <code>-bsort</code>, <code>-bstart</code>, and <code>-bstop</code> may seem like low-level tweaks. In reality, they are the <strong>building blocks of enterprise scheduling strategy</strong>.</p>



<p>These options influence how load, heat, and power consumption are distributed across hardware, and they provide the basis for higher-level policies such as:</p>



<h3 class="wp-block-heading has-medium-font-size"><strong>Energy-Aware Scheduling</strong> – fill one socket or die before activating another.</h3>



<h3 class="wp-block-heading has-medium-font-size"><strong>Thermal &amp; Power-Density Balancing</strong> – distribute workloads evenly across nodes.</h3>



<h3 class="wp-block-heading has-medium-font-size"><strong>License- or Cost-Aware Placement</strong> – schedule on specific CPUs first.</h3>



<h3 class="wp-block-heading has-medium-font-size"><strong>Cache &amp; Memory-Bandwidth Optimization</strong> – keep related processes close to shared caches or NUMA regions.</h3>



<p>In GCS, JSV (Job Submission Verifier) scripts can dynamically adjust these settings to enforce global policies across the cluster.</p>
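


<p>As a rough illustration of such a policy layer, a site could keep a small table that maps a policy name to a sort string. The sketch below is plain Python and does not use the JSV API; the policy names, the function <code>binding_options</code>, and the pairing of policies with sort strings are assumptions made for this example. Only the sort strings themselves are taken from the <code>-bsort</code> examples earlier in this article.</p>



<pre class="wp-block-code"><code># Hypothetical policy table; the policy names are invented for this sketch.
POLICY_TO_BSORT = {
    "energy": "sC",   # reuse partially filled sockets before new ones
    "thermal": "SC",  # prefer unutilized sockets and cores
}


def binding_options(policy):
    """Return the -bsort option a policy would add to a submission."""
    return ["-bsort", POLICY_TO_BSORT[policy]]


print(binding_options("energy"))</code></pre>



<p>A real verifier script would apply the chosen value to the job before it is accepted; how that is done depends on the JSV interface and is not shown here.</p>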



<h2 class="wp-block-heading"><strong>4. Looking ahead</strong></h2>



<p>This post closes the “mechanics” phase of the binding series. In the next article, we’ll apply everything we’ve learned to <strong>real enterprise use cases</strong> — combining sorting, filtering, and range selection to implement:</p>



<ul class="wp-block-list">
<li>Energy-optimized cluster configurations</li>



<li>Thermal-balanced compute fabrics</li>



<li>Cost- and license-aware job scheduling</li>
</ul>



<p>What seems like three small options today will soon become the core tools for advanced resource optimization.</p>
]]></content:encoded>
					
		
		
			</item>
		<item>
		<title>Open Cluster Scheduler and Gridware Cluster Scheduler v9.0.8 are Available</title>
		<link>https://bablick.de/open-cluster-scheduler-and-gridware-cluster-scheduler-v9-0-8-are-available/</link>
		
		<dc:creator><![CDATA[ernst.bablick]]></dc:creator>
		<pubDate>Thu, 28 Aug 2025 13:15:59 +0000</pubDate>
				<category><![CDATA[HPC]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[GCS]]></category>
		<category><![CDATA[Gridware]]></category>
		<category><![CDATA[OCS]]></category>
		<guid isPermaLink="false">https://bablick.de/?p=100</guid>

					<description><![CDATA[OCS and GCS v9.0.8 are now available. As usual, the packages can be downloaded from the HPC-Gridware download page, and the source code is available on the Cluster Scheduler GitHub project page. The list of fixed issues mentioned in the Release Notes can be found here: Improvement CS-739 qstat -j output should contain job state,...]]></description>
										<content:encoded><![CDATA[<div class="wp-block-image">
<figure class="aligncenter size-full is-resized"><a href="https://www.hpc-gridware.com/"><img alt="HPC-Gridware Logo" loading="lazy" decoding="async" width="292" height="73" src="https://bablick.de/wp-content/uploads/2025/08/NEW-HPC-GRIDWARE-BLACK-DEMO.png" class="wp-image-103" style="width:400px;height:auto"/></a></figure></div>


<p>OCS and GCS v9.0.8 are now available. As usual, the packages can be downloaded from the <a href="https://hpc-gridware.com/download-ocs-9-0-8/">HPC-Gridware download page</a>, and the source code is available on the <a href="https://github.com/hpc-gridware/clusterscheduler">Cluster Scheduler GitHub project page</a>.</p>



<p>The issues fixed in this release, as listed in the <a href="https://www.hpc-gridware.com/download/11138/?tmstv=1756385425">Release Notes</a>, are:</p>



<h3 class="wp-block-heading">Improvement</h3>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-739">CS-739</a> qstat -j output should contain job state, start time, queue name, and host names</p>



<h3 class="wp-block-heading">Task</h3>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1407">CS-1407</a> Add SUSE SLES 15 support in support matrix of release notes</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1440">CS-1440</a> Add qtelemetry licenses to GCS 3rdparty licenses directory</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1470">CS-1470</a> do memory testing on V90_BRANCH for the 9.0.8 release</p>



<h3 class="wp-block-heading">Sub-task</h3>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1394">CS-1394</a> Add start_time of array jobs tasks to qstat -j</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1395">CS-1395</a> Cleanup of job states and show states also in qstat -j output</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1396">CS-1396</a> Show granted host information in qstat -j output</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1404">CS-1404</a> Show granted queues in qstat -j output</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1410">CS-1410</a> Show priority in qstat -j output even if it is the base priority</p>



<h3 class="wp-block-heading">Bug</h3>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-671">CS-671</a> qrsh truncates the command line and/or output at 927 characters</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1019">CS-1019</a> sge_execd logs errors when running tightly integrated parallel jobs</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1270">CS-1270</a> Installation script clears screen in case of an error which makes issues harder to debug</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1381">CS-1381</a> qacct complains &#8220;error: ignoring invalid entry in line 436&#8221; for accounting records with huge command line entry</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1386">CS-1386</a> man page for sge_share_mon is missing</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1403">CS-1403</a> sge_ckpt man-page is in wrong section (1 instead of 5)</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1422">CS-1422</a> endless loop in protocol between sge_qmaster and sge_execd in certain job failure situations</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1424">CS-1424</a> qmod -sj on own job fails on submit only host</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1429">CS-1429</a> sge_qmaster can segfault on qdel -f</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1434">CS-1434</a> clearing error state of a job leads to event callback error logging in qmaster messages file</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1435">CS-1435</a> rescheduling of jobs requires manager rights, documented is &#8220;manager or operator rights&#8221;</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1436">CS-1436</a> qmod man pages says it requires manager or operator privileges to rerun a job, but a job owner may rerun his own jobs as well</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1451">CS-1451</a> option -out of examples/jobsbin//work is broken</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1476">CS-1476</a> Go DRMAA does not set JoinFiles() correctly</p>



<p><a href="https://hpc-gridware.atlassian.net/browse/CS-1477">CS-1477</a> In Go DRMAA TransferFiles() does not set all values</p>



<p>Please let me or the <a href="https://www.hpc-gridware.com/contact/" data-type="link" data-id="https://www.hpc-gridware.com/contact/">HPC-Gridware team</a> know if you have any questions.</p>



]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
