Idbool

Overview

Idbool is a 192-core CC-NUMA system built on the Numascale Numaconnect interconnect.

Technically, the machine is composed of 4 chassis/motherboards, each equipped with 3 AMD Opteron(tm) 6376 processors (Abu Dhabi, 16 cores each) and interconnected with the other chassis through the Numaconnect interconnect in a torus configuration with double links.

This Numaconnect interconnect provides a full hardware Single System Image (SSI): a single cache-coherent memory space spanning all chassis. As a result, the machine appears as one single Linux system.

The system currently runs Ubuntu 14.04 LTS.

Technical documentation and other resources

For questions related to the performance achievable on this machine, see the Performances section below and the Numascale best-practice guide referenced there.

Installation notes

Instructions for installing the system with Ubuntu 14.04 LTS:

  • Alter /etc/sysctl.conf and kernel parameters (taken from the Numascale Wiki: https://wiki.numascale.com/tips/os-tips); see the sysctl sketch after this list
  • Apply the patch from http://askubuntu.com/questions/468466/why-this-occurs-error-diskfilter-writes-are-not-supported to work around the "diskfilter writes are not supported" GRUB error caused by software RAID
  • Remove irqbalance, as suggested by Numascale
  • Disable SELinux and AppArmor via kernel parameters in /etc/default/grub, then run update-grub; also disable the AppArmor startup script (see the GRUB sketch after this list)
  • Blacklist the EDAC drivers in /etc/modprobe.d/blacklist.conf, because they caused an error during boot visible in dmesg
    • Not recommended by Numascale, so this step was reverted; the boot traces can be treated as warnings
    • The errors are due to kernel scalability issues, which should be fixed by the Numascale-provided kernel
  • Install the patched kernel package linux-image-3.15.10-numascale17+_3.15.10-numascale17+-2_amd64.deb (see the install sketch after this list):
    • Works perfectly and scales well. Swap support is not compiled into this kernel, so no swap space is usable; however, swapping on a NUMA system like this makes little sense anyway, as it slows things down even more than on a normal system
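
The exact sysctl values are given on the Numascale Wiki page linked above. As a minimal sketch, assuming typical NUMA-oriented settings (the keys below are standard Linux sysctls, but the values are illustrative, not necessarily Numascale's recommendations):

# /etc/sysctl.conf -- illustrative NUMA-related settings (hypothetical values;
# see https://wiki.numascale.com/tips/os-tips for the actual recommendations)
vm.zone_reclaim_mode = 0        # prefer allocating on remote nodes over reclaiming local pages
kernel.numa_balancing = 0       # disable automatic NUMA page migration
# Apply without rebooting:
sudo sysctl -p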
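
A minimal sketch of the corresponding /etc/default/grub change, assuming the standard selinux=0 and apparmor=0 kernel parameters (the exact command line used on idbool may differ):

# In /etc/default/grub, pass kernel parameters disabling SELinux and AppArmor:
GRUB_CMDLINE_LINUX_DEFAULT="quiet selinux=0 apparmor=0"
# Regenerate the GRUB configuration and disable the AppArmor startup script:
sudo update-grub
sudo update-rc.d apparmor disable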
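
Installing the Numascale kernel is a standard Debian package installation; a sketch:

sudo dpkg -i linux-image-3.15.10-numascale17+_3.15.10-numascale17+-2_amd64.deb
sudo update-grub    # make sure the new kernel appears in the boot menu
sudo reboot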

How to experiment

Reserving and accessing idbool

By default, OAR only gives access to 1 of the 4 hosts (motherboards) of the machine:
[pneyron@digitalis ~]$ oarsub -I -p "machine='idbool'"
Properties: machine='idbool'
[ADMISSION RULE] Modify resource description with type constraints
Import job key from file: /home/pneyron/.ssh/id_rsa
OAR_JOB_ID=8348
Interactive mode : waiting...
Starting...

Connect to OAR job 8348 via the node idbool.grenoble.grid5000.fr
[OAR] OAR_JOB_ID=8348
[OAR] Your nodes are:
      idbool-1.grenoble.grid5000.fr*48

[pneyron@idbool ~](8348-->60mn)$ 

Then inspect the cpuset attached to your job:

[pneyron@idbool ~](8348-->57mn)$ cat /dev/cpuset/$(grep -o "/oar/.*" /proc/self/cgroup)/cpus
0-47
[pneyron@idbool ~](8348-->57mn)$ cat /dev/cpuset/$(grep -o "/oar/.*" /proc/self/cgroup)/mems
0-5

This job only gives access to the resources of the first host (motherboard) of the machine: logical CPUs (cores) 0 to 47 and NUMA nodes 0 to 5. The other resources of the machine remain visible (e.g. in `top') but are not usable, because the Linux cpuset attached to your job isolates them.
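
To double-check the confinement from within the job, standard tools such as taskset and numactl can be used. A minimal sketch (the values in the comments are what the cpuset above implies, not captured output):

taskset -cp $$    # CPU affinity of the current shell; expected: 0-47
numactl --show    # allowed cpus and memory nodes; expected mems: 0-5
nproc             # number of usable processors; expected: 48, not 192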


To reserve the complete machine, one must specify -l machine=1.

Furthermore, we request a 4-hour job in the example below:

[pneyron@digitalis ~]$ oarsub -I -p "machine='idbool'" -l machine=1,walltime=4
Properties: machine='idbool'
[ADMISSION RULE] Modify resource description with type constraints
Import job key from file: /home/pneyron/.ssh/id_rsa
OAR_JOB_ID=8349
Interactive mode : waiting...
Starting...

Connect to OAR job 8349 via the node idbool.grenoble.grid5000.fr
[OAR] OAR_JOB_ID=8349
[OAR] Your nodes are:
      idbool-1.grenoble.grid5000.fr*48
      idbool-2.grenoble.grid5000.fr*48
      idbool-3.grenoble.grid5000.fr*48
      idbool-4.grenoble.grid5000.fr*48

[pneyron@idbool ~](8349-->59mn)$
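
With the whole machine reserved, the same cpuset check as before should report all resources. A sketch (192 logical CPUs in total and, given the 6 NUMA nodes seen per motherboard above, 24 NUMA nodes; the values in the comments are expectations, not captured output):

cat /dev/cpuset/$(grep -o "/oar/.*" /proc/self/cgroup)/cpus    # expected: 0-191
cat /dev/cpuset/$(grep -o "/oar/.*" /proc/self/cgroup)/mems    # expected: 0-23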

Privileged commands

Currently, the following commands can be run via sudo in exclusive jobs:

  • sudo /usr/bin/whoami (provided for testing the mechanism, should return "root")
  • sudo /usr/bin/schedtool
  • sudo /usr/bin/opcontrol
  • sudo /usr/bin/perf
  • sudo /usr/bin/lstopo
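
For example, to check the mechanism, dump the machine topology, or profile a program from within an exclusive job (a sketch; ./my_app is a hypothetical placeholder for your own binary):

sudo /usr/bin/whoami                 # should print "root"
sudo /usr/bin/lstopo topology.png    # render the NUMA/cache topology to an image
sudo /usr/bin/perf stat ./my_app     # hardware performance counters for a run of ./my_app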

Performances

To get good performance when using the whole machine (the "machine=1" case above), special care must be taken with data placement in memory relative to the CPUs. The NUMA factor between NUMA nodes on different motherboards is very high: bandwidth can drop as low as 90 MB/s when a CPU accesses memory on a remote NUMA node. Numascale strongly advises reading https://resources.numascale.com/numascale-scaling-best-practice.pdf.
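
A common way to control placement is numactl, which binds a process and its memory allocations to chosen NUMA nodes. A minimal sketch (./my_app is a hypothetical program and node 0 an arbitrary choice):

numactl --hardware                            # inspect nodes, distances and free memory per node
numactl --cpunodebind=0 --membind=0 ./my_app  # run on node 0's CPUs with node-0 memory only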
