
How to Efficiently Manage Workstation Resources

Nowadays, cluster queueing systems and IT procedures typically ensure a smooth user experience. They make it straightforward to distribute your simulations across the available resources fairly and efficiently. But what if you have neither a cluster nor a queueing system, only a powerful workstation? Let's find out how you can optimize the usage of hardware resources. In this week's blog post, we proudly share a fantastically scripted resource manager for Linux environments by our customer Luc Evertzen, R&D Lead Developers/Simulation at Brink Climate Systems.

Workstation resources

Working With Parallel Servers

It is highly desirable to compute CFD solutions in parallel by distributing the mesh across several processes on multiple cores. This functionality is based upon a data parallelism concept in which each process applies numerical methods identical to those used in a serial solution.

Communication between the processes ensures consistency with a serial solution by passing messages through the Message Passing Interface (MPI). Whether you run your parallel server on a local multi-core workstation, on remote machines, or on a remote cluster, every approach requires an MPI implementation, which comes with the Simcenter STAR-CCM+ installation.

 

Using a multi-core workstation or a group of machines

You can use a local multi-core workstation or group several computers into a cluster. However, to run in parallel, Simcenter STAR-CCM+ must be installed on each of the machines you wish to run on, and the machines must be homogeneous (either all Windows or all Linux). A simulation can be started on a group of machines by:

  • Defining a Machine File for Parallel Hosts

    A list of machines on which you wish to place processes

  • Starting a Cluster of Machines Manually

    Prescribing the hosts where processes are started

  • Using Batch Management Systems

    A batch system manages the job, allocating it to specific machines when they become available based on the requested resources

Resource Management on a single machine workstation

Using the hardware efficiently can be challenging. License availability and the requests of different users need to be aligned with the hardware resources. A Batch Management System is a sophisticated option for delegating computational jobs, but its installation and configuration are demanding. In the following two sections we describe the installation and configuration of a Batch Management System in a Linux environment (tested on Rocky Linux 8.5), as well as a streamlined but efficient scripting approach with individual adjustments to keep the workstation busy.

Installation of the SLURM Batch Management System

Simcenter STAR-CCM+ can be run in batch mode under the control of different batch systems like SLURM, PBS or Grid Engine. SLURM is an excellent work scheduling tool for High-Performance computing clusters. In addition, it can be a valuable tool on a local desktop or single server when you need to run several programs at once and queue them up while ensuring you don’t overload your computer or server. Furthermore, it can be useful in cases where you share a server with other users or need to run multiple jobs overnight.

1. Install packages

First we install SLURM packages

sudo yum update -y
sudo yum install slurm-slurmd slurm-slurmctld -y

SLURM uses the MUNGE service for authentication within a group of hosts. Strictly speaking we have just one host (localhost), but we still need a MUNGE key to authenticate us when submitting a job.

sudo yum install epel-release -y
sudo yum install munge munge-libs munge-devel -y

2. Configure

Configure the SLURM queue setup in SLURM's configuration file. Here you adjust the COMPUTE NODES section to your machine's specs, e.g. if you have 10 cores, set CPUs=10, and if your memory is 32000 MB, set RealMemory=32000. You also need to set MpiDefault=none.

sudo vi /etc/slurm/slurm.conf
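
As an illustration of the COMPUTE NODES section, the sketch below prints the two lines for a single-node setup. The node name localhost and the partition name LocalQ follow the sinfo output shown in step 4; the helper name slurm_node_lines is hypothetical, and the values chosen for State, Default and MaxTime are assumptions that you may want to adapt.

```shell
#!/bin/sh
# Hypothetical helper: print the COMPUTE NODES lines for a single-node setup.
# $1 = number of cores, $2 = usable memory in MB
slurm_node_lines() {
    echo "NodeName=localhost CPUs=$1 RealMemory=$2 State=UNKNOWN"
    echo "PartitionName=LocalQ Nodes=localhost Default=YES MaxTime=INFINITE State=UP"
}

# Example from the text: 10 cores and 32000 MB of memory
slurm_node_lines 10 32000

# On the workstation itself the values could be read from the system instead:
#   slurm_node_lines "$(nproc)" "$(free -m | awk '/^Mem:/ {print $2}')"
```

If slurmd is already installed, running slurmd -C prints the detected hardware in the same NodeName format, which is a handy cross-check.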

Create a MUNGE key and set the permissions

sudo mkdir -p /etc/munge
dd if=/dev/urandom bs=1 count=1024 | sudo tee /etc/munge/munge.key > /dev/null
sudo chown munge: /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
sudo chmod a+x /var/run/munge

3. Start and Check

Start the MUNGE service

sudo systemctl enable munge
sudo systemctl start munge

Now let’s get SLURM started with systemd:

sudo systemctl start slurmd -l
sudo systemctl start slurmctld -l

4. Set machine to idle

Lastly, let’s set our machine as idle, so we can start queuing up jobs:

sudo scontrol update nodename=localhost state=idle
sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
LocalQ*      up   infinite      1   idle localhost

If successful, you will see the output above: well done, you have SLURM up and running. You now have a queue (or "partition" in SLURM lingo) called LocalQ to which you can submit your work. If you run into issues, look in the log files /var/log/slurm-llnl/slurmd.log and /var/log/slurm-llnl/slurmctld.log, or in the locations set by SlurmdLogFile and SlurmctldLogFile in slurm.conf.

To submit a job to the queue you can use the sbatch command. See below a short sample submission script for a Simcenter STAR-CCM+ job:

#!/bin/bash
#SBATCH -J starXX
#SBATCH --nodes=3
#SBATCH --ntasks-per-node=28
#SBATCH -n 84
#SBATCH --exclusive

#SBATCH -o output_file.o
#SBATCH -e error_file.e

#SBATCH -t 00:30:00

module load starccm+/17.02.010

sim_file="simulation.sim"
STARTMACRO="/home/MYCASE/runsim.java"

starccm+ -power -podkey your_PoD_key -licpath 1999@flex.cd-adapco.com -np 84 "$sim_file" -batch "$STARTMACRO" -batch-report

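Assuming the submission script is saved as job.sh (a hypothetical file name), submitting and monitoring it could look like the sketch below. sbatch, squeue and scancel are standard SLURM commands; the helper job_id_from_sbatch is hypothetical and only extracts the job ID from the "Submitted batch job <id>" line that sbatch prints.

```shell
#!/bin/sh
# Hypothetical helper: read sbatch output on stdin and print the job ID.
job_id_from_sbatch() {
    sed -n 's/^Submitted batch job \([0-9][0-9]*\).*$/\1/p'
}

# Typical use on a machine with SLURM running (job.sh is an assumed name):
#   jobid=$(sbatch job.sh | job_id_from_sbatch)
#   squeue -j "$jobid"      # show the state of the job in the queue
#   scancel "$jobid"        # remove the job from the queue again

# Demonstration with a sample output line instead of a live scheduler
echo "Submitted batch job 42" | job_id_from_sbatch
```

Keeping the job ID in a variable makes it easy to check on or cancel exactly the job you submitted, which matters once several users share the queue.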
Scripting

A more flexible and streamlined approach can be to write your own bash script. A bash script is a plain text file containing a series of commands that you would normally type on the command line yourself. It can also contain loops and if statements.

A very simple script can be composed to sequentially run several simulations in batch:

#!/bin/bash

starccm+ -batch -np 2 File1.sim ;
starccm+ -batch -np 2 File2.sim ; 
starccm+ -batch -np 2 File3.sim ;
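
The same sequence can be written as a loop, so that the list of files is the only thing that changes. This is a minimal sketch: the function name run_all and the STAR_CMD variable are hypothetical, the latter added only so the loop can be tried with a stand-in command on a machine without a license.

```shell
#!/bin/sh
# Run each simulation file given as an argument, one after the other.
# STAR_CMD is a hypothetical override for the starccm+ executable.
run_all() {
    for sim in "$@"; do
        if [ ! -f "$sim" ]; then
            echo "skipping $sim (not found)"
            continue
        fi
        # A failed run is reported but does not stop the remaining jobs
        "${STAR_CMD:-starccm+}" -batch -np 2 "$sim" || echo "failed: $sim"
    done
}

# Usage: run_all File1.sim File2.sim File3.sim
```

Compared with the fixed list, a missing file or a failed run no longer goes unnoticed, while the jobs still run strictly one at a time.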

However, our customer Luc Evertzen came up with a much smarter bash script. The script is started with the nohup ("no hangup") command so that it keeps running as a background task after you log out. It loops continuously with a 20-second wait time and searches a folder for newly added .sim/.dmprj files.

For each .sim and .dmprj file it then runs the starccm+ command that matches the file type. Once the simulation has been solved, an empty file is placed in the same folder with the same name as the simulation plus the .solved extension. This flag tells the script to skip simulations that are already solved. So now you can just throw simulations at the server and they will be run.

#!/bin/sh

# Job Manager script created by Luc Evertzen
while true; do
# On the first day of the month, empty the nohup log
 day=$(date +'%d')
 if [ "$day" = "01" ]; then
  : > nohup.out
 fi

# Find all simulation files in the current folder
 for i in *.sim *.dmprj; do
# Skip the literal pattern when no file matches
  [ -e "$i" ] || continue
# If there is no file with the same name but the .solved extension submit!
  if ! test -f "$i.solved"; then
   case "$i" in
    *.sim)
# Case for sim files
     starInstallationPath/star/bin/starccm+ -batch Run.java -power -np 32 -load "$i"
     touch "$i.solved"
     rm -f ./*.sim~;;
# Case for dmprj files
    *)
     starInstallationPath/star/bin/starccm+ "$i" -batch -dmnoshare -passtodesign "-power"
     touch "$i.solved"
     rm -f ./*.dmprj~;;
   esac
  fi
 done
 sleep 20
done
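
The .solved flag logic can be tried in isolation. The sketch below only lists the files the manager would still pick up, without launching anything; the function name pending_jobs is hypothetical and not part of Luc Evertzen's script.

```shell
#!/bin/sh
# List simulation files in a directory that have no matching .solved flag.
# $1 = directory to scan (defaults to the current one)
pending_jobs() {
    dir=${1:-.}
    for f in "$dir"/*.sim "$dir"/*.dmprj; do
        # -e filters out the literal pattern when nothing matches
        if [ -e "$f" ] && [ ! -f "$f.solved" ]; then
            basename "$f"
        fi
    done
}

# Usage: pending_jobs /path/to/running/directory
```

Running it in the running directory gives a quick view of what is still queued, and deleting a .solved file puts that simulation back in the list.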

The simulations are run with a Java macro, which adds another layer of automation to the submission. The macro checks the simulation name against a naming convention to decide whether to clear the solution, run the mesh pipeline, or solve the physics. The naming convention uses a double underscore to separate the file name from the operation commands and reads as follows:

  • Clear simulation: MySim__c.sim
  • Mesh simulation: MySim__m.sim
  • Run simulation: MySim__s.sim

The naming convention can also be used to combine operations. For example:

  • Clear and solve simulation: MySim__cs.sim
  • Mesh and solve simulation: MySim__ms.sim
  • etc.
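
The naming convention can be checked outside Simcenter STAR-CCM+ with a few lines of shell. This sketch mirrors how the macro reads the file name; sim_operations is a hypothetical helper, and the default of mesh and solve for names without a double underscore is taken from the macro's fallback.

```shell
#!/bin/sh
# Hypothetical helper: print the operation letters encoded in a file name.
# Falls back to "ms" (mesh and solve) when no operation commands are given.
sim_operations() {
    name=$(basename "$1")
    name=${name%.sim}                 # strip the extension
    case "$name" in
        *__*) ops=${name#*__}         # text after the first "__"
              ops=${ops%%__*} ;;      # up to the next "__", if any
        *)    ops="" ;;
    esac
    echo "${ops:-ms}"
}

sim_operations MySim__cs.sim   # prints: cs
sim_operations MySim.sim       # prints: ms
```

A check like this before copying files to the server helps catch a mistyped suffix, since the macro silently ignores letters it does not know.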

The macro also contains a fallback: if the operation commands are omitted from the file name, it simply meshes and solves. Finally, the set of available operations can be extended by adding further if statements to the Java file. For example:

  • if (commands.contains("r")): Could be used to render any screenplays
  • if (commands.contains("p")): Could be used to save all scenes and plots present in the simulation

// STAR-CCM+ macro: Run.java
package macro;
import java.io.*;
import star.base.neo.*;
import star.common.*;
import star.meshing.*;
import star.surfacewrapper.*;
import java.util.*;

public class Run extends StarMacro {
    public void execute() {
        Simulation sim = getActiveSimulation();
        String name = sim.getPresentationName();
        // Operation letters follow the double underscore in the file name.
        // Fallback: mesh and solve when no operation commands are given.
        String[] parts = name.split("__");
        String commands = parts.length > 1 ? parts[1] : "ms";

        if (commands.contains("c")) {
            // Clear the existing solution
            sim.getSolution().clearSolution();
            sim.saveState(name + ".sim");
        }
        if (commands.contains("m")) {
            // Run the mesh pipeline
            MeshPipelineController meshPipelineController_0 =
                sim.get(MeshPipelineController.class);
            meshPipelineController_0.generateVolumeMesh();
            sim.saveState(name + ".sim");
        }
        if (commands.contains("s")) {
            // Initialize and solve the physics
            sim.getInterfaceManager().initialize();
            sim.getSimulationIterator().run();
            sim.saveState(name + ".sim");
        }
    }
}

The combination of the bash script and the Java macro keeps user interaction to a minimum. The user only needs to place the files in the "running directory" and wait for the *.solved file to appear. All this is possible thanks to a license feature of Simcenter STAR-CCM+ that makes a job wait until licenses become available again. Normally, when a license is unavailable, the FlexNet request returns and the code exits. If the STARWAIT environment variable is set, FlexNet waits until a license becomes available. So, one essential step is to add the command "export STARWAIT=1" to your batch script.

The queueing system can be stopped by running the "ps fjx" command to find the "/bin/sh ./run.sh" process and then killing it with "kill PID", where PID is the process ID of the script (second column of the ps output).
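
One way to avoid searching the process list is to record the PID when the manager is started. The sketch below assumes the script is saved as run.sh, as in the process name above, and uses a hypothetical PID file called manager.pid; the function names are hypothetical as well. STARWAIT is exported first so that jobs wait for a license.

```shell
#!/bin/sh
# Start the job manager in the background and remember its PID.
# $1 = manager script (default ./run.sh); manager.pid is an assumed file name
start_manager() {
    export STARWAIT=1                      # wait for licenses instead of exiting
    nohup "${1:-./run.sh}" >> nohup.out 2>&1 &
    echo $! > manager.pid
}

# Stop the manager using the recorded PID.
stop_manager() {
    if [ -f manager.pid ]; then
        kill "$(cat manager.pid)" 2>/dev/null || echo "manager was not running"
        rm -f manager.pid
    fi
}

# Usage: start_manager ./run.sh   ...   stop_manager
```

Note that this stops only the loop itself; a simulation that is already running continues as a separate process until it finishes or is killed on its own.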

The endless loop inside the script could be removed, and the script, reduced to the file loop and case statement, could instead be scheduled as a cron job. This would avoid having to kill a process in order to stop the execution. We hope this blog post will help you manage your computer resources in the best possible way. For more details, or if you have any questions, you are always welcome to reach out to us.

 

The Authors

Luc Evertzen
R&D Engineer at Brink Climate Systems


Florian Vesting, PhD
Contact: support@volupe.com
+46 768 51 23 46
