DECN: Isilon: SMB throughput benchmarking with fio.exe

(Please also take a look at the NFS Throughput Benchmarks with fio article.)

Performance benchmarking of Isilon clusters during Proofs of Concept (POCs) is requested by customers quite often, compared to "simple" integration and functionality testing:

 

smb_benchmarking_hadoop_bigdata_performance_isilon_keywords_are_fun.JPG.jpg

 

Source: https://www.reddit.com/r/engineteststands

OneFS supports the latest Server Message Block (SMB) protocol version, SMB3. This version of SMB has significant performance improvements, namely support for multiple concurrent TCP connections per single SMB session, known as the "Multichannel" feature, which are distributed among the CPU cores of current Windows clients (Server 2012, Windows 8.x, Windows 10).

Regardless of whether one needs to benchmark SMB2 or SMB3, the methodology of testing would be much the same. As in the NFS Throughput Benchmarks with fio article, the approach discussed below is inspired by the methodology used by Isilon's Performance team at Technical Marketing Engineering (TME). Thus, by adopting the approach below, it would be possible to closely match the performance results and estimates from the Isilon Sizing Tool during in-field POCs, with controllable Client:Node:Thread ratios.

Scripts on GitHub

Cluster-side and Client-side settings

All statements and assumptions brought up in the previous NFS-related article are valid in the SMB case as well, namely the usage of the "out-of-the-box" configuration of Isilon OneFS, without any "special" tweaks:

  • SMB Opportunistic Locking is kept default Opclock On
  • Default MTU is used MTU 1500
  • Coalescer is on by default SmartCache On

Important Recommendations For Performance

Here are the most important general recommendations:

  • Prior to testing, try to uncover the realistic network throughput achievable between all endpoints involved in the POC. Isilon OneFS ships with the iperf client/server utility pair, so it is straightforward to test all download and upload paths between IPs one by one.
  • One of OneFS’s unique features is the ability to set the protection level on a per-file or per-object basis. Thus, it is important to ensure that the required protection levels (e.g. 2d:1n) or mirroring (e.g. 3x) are relevant parts of the test scenario. They could be set either via the OneFS WebUI, or by executing a command:

isi set -p +2:1 /ifs/data/YOUR_DIRECTORY_OF_TESTS

  • Prefer the “Streaming” access pattern set on the directory of tests on the Isilon side, to leverage greater read-ahead on reads with adaptive prefetch, and to leverage rotation between different drives across stripes when writing. This could either be done in the GUI, or via a command:

isi set -l streaming -a streaming /ifs/data/YOUR_DIRECTORY_OF_TESTS

…to set the data layout to "Default" and the access pattern to "Random", it would be:

isi set -l default -a random /ifs/data/YOUR_DIRECTORY_OF_TESTS

  • In both examples above, simply setting the data layout or protection level would not restripe existing data, hence it is assumed that cleanups are done between tests. If, for some reason, it is required to restripe the existing dataset, the "-rR" flags should be appended to recursively restripe the contents of the selected directory. This operation is impactful and is much slower than the cleanup approach:

isi set -rR -l default -a random /ifs/data/YOUR_DIRECTORY_OF_TESTS

  • If the POC terms require testing with multiple parallel clients, prefer to start from a "1:1:1" Client:Node:Thread ratio profile.
  • Isilon delivers its best throughput doing Sequential Reads and Sequential Writes with block sizes of 32kB and higher, as these are the known "best fit" workloads for Isilon. Any block size and access pattern, starting from 4kB, could be tested as well. Do not forget that the OneFS block size is 8kB.
  • Prefer tests with file sizes of 12GB and larger for duration and sustainability. EMC Isilon TME runs tests with 50GB files.
  • Prefer physical servers as clients. Otherwise, set up RAM reservations and "High" CPU shares for the virtual machines, and isolate them from any other workloads in a top-most Resource Pool of the VMware hierarchy.
  • Flush caches on both the client and the EMC Isilon cluster, to have unbiased disk throughput numbers between tests.
    • On the Linux client: on the majority of distributions, the sync command without particular parameters could be used to flush filesystem buffers; otherwise, unmount the NFS export and re-mount it afterwards, i.e.:

sync && echo 3 > /proc/sys/vm/drop_caches

    • EMC Isilon cluster service command:

isi_for_array isi_flush

…to purge the L1 and L2 caches. On clusters with OneFS 7.1.1 and newer, it is also possible, although not advisable (metadata and small random-read blocks evicted from L2 to the SSDs would be lost), to purge the L3 cache:

isi_for_array isi_flush --l3-full
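As a back-of-the-envelope companion to the iperf check recommended earlier, the link-speed ceiling can be pre-computed so measured numbers have something to be compared against. A minimal bash sketch (the 5% protocol-overhead factor is a rough assumption, not a measured value):

```shell
#!/bin/bash
# Convert a link speed in Gbit/s to an approximate achievable MB/s ceiling.
# The 5% overhead factor is a rough assumption for TCP/IP framing overhead.
link_ceiling_mbs() {
    local gbits=$1
    # 1 Gbit/s = 1000 Mbit/s = 125 MB/s; integer math for portability
    echo $(( gbits * 125 * 95 / 100 ))
}

link_ceiling_mbs 10   # single 10 GbE link
link_ceiling_mbs 1    # single 1 GbE link
```

If the measured iperf throughput between a client and a node falls well below this ceiling, fix the network first before interpreting any storage benchmark numbers.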

Isilon FIO.exe Harness for Client:Node ratio SMB benchmarking

For field testing of controllable Client:Node:Thread ratios, a single virtual machine, also referred to as the "harness", is required; it connects to the physical Windows clients participating in the test and distributes commands to them. Neither DNS infrastructure nor EMC Isilon SmartConnect functionality is required. SmartConnect Advanced testing is assumed present in the default POC scenario anyway.

The overview of the 4-node EMC Isilon cluster benchmarking set-up, including a harness server and four (4) Windows clients with Cygwin sshd servers installed on them, is as follows:

 

00_isilon_smb_throughput_benchmarking_hadoop_bigdata_keywords_are_fun.png

 

Figure 1 – a four-node Isilon cluster example with all control scripts stored on Isilon

Figure 1 depicts the following components:

  • Isilon Cluster
    • the root user is advised to be configured with a one-letter password ('a') for simplicity;
  • /ifs/data/fiotest – this folder on OneFS would be used as the SMB share, as well as an NFS export for the Linux "harness" server. Disable "root squashing" on the NFS export, so that all commands during the benchmark could be executed by the root user for simplicity.
    • Before creation of the share, Windows Explorer should be used to modify the folder's ACL and grant "Full Control" recursive permissions to "Everyone" (see details below). It also makes sense to chmod 777 it from Isilon before the creation of the trusted.key and trusted.key.pub files, which require much narrower POSIX permissions.
    • When creating the fiotest SMB share, "Everyone" should be allowed "Read/Write" access (see details below).
    • The folder has to be R/W for the NFS 'root' user connecting from client nodes; folders for the temporary files used during tests would be created here by the clients;
    • fiojob_1024k_randread – a fio job description file, defining the IO pattern, the size of the temporary file to be created, and so on.
    • fiojob_1024k_seqwrite_5t – a fio job description file defining the read/write pattern, and also the 5 threads per client
  • control/ – sub-folder that would be storing the control files for the set-up:
    • cleanup_smb.sh –  script that would do housekeeping after the runs as per smb_hosts.list;
    • smb_hosts.list – list of client servers participating in the test in <IP_of_Windows_Server>|<IP_of_Isilon_Node> format;
    • run_smb_fio_1024k_randread.sh – bash file pointing to corresponding fio job;
    • run_smb_fio_1024k_seqwrite_5t.sh – bash file pointing to the thread-controlled fio job;
    • trusted.key and trusted.key.pub – generated public and private keys to avoid entering passwords during distribution of commands
  • WinServer0…4 – standard Windows Server OS installation with the fio package in c:\fio\fio.exe.
  • LinServer0 – the "control" server, also referred to as the "harness server".

To save time whenever one is challenged to build this harness under stressful timing requirements, the following Cygwin section is outlined in great detail:

Step 1: Configuring Windows Server 2012R2 Clients

  1. Fresh-install a Windows 2012R2 server with GUI, either on a physical server or in a VM. The recommended VM sizing:
    • 4 vCPU / 4 GB vRAM minimum
    • Thin-Provisioned HDD 20GB
    • Latest Virtual Hardware Version
    • 2 Virtual Network Adapters: E1000 (for 1GbE) or VMXNET3 (for 10GbE).
    • Latest VMware Tools available
  2. Latest Update Rollup from Microsoft Windows Update
  3. Do not join the domain
  4. Turn off Windows Firewall
  5. Turn off User Account Control
  6. Optional: Disable all policies in "Administrative Tools" -> "Local Security Policy" -> "Account Policies" -> "Password Policy" and set up Administrator's password to 'a' for convenience
  7. Assign Static IP address to at least one NIC
  8. Remove any Local User accounts, apart from "Administrator"
  9. Set up automatic login for the "Administrator" account; it would be used to run a start-up script (discussed below):
    1. Start -> Run… -> "control userpasswords2"
    2. Remove any users other than Administrator from login
    3. Select "Administrator"
    4. Remove checkbox "Users must enter a user name and password…" (see Figure 2)
    5. Click "Apply"
    6. Enter Administrator's password

 

02_SMB_configure_administrator.png

 

Figure 2  – an auto-login set for Administrator user account on a Windows 2012R2 server that was not joined to any domain

At this point, reboot the Windows Server 2012R2. It should automatically login to local "Administrator" account.

Step 2: Installation of Cygwin and initial configuration of sshd

In the proposed harness, one would have to leverage two (2) SSH daemon ("sshd") services as part of the Cygwin package for Windows Server 2012R2. See this discussion for some context, but in short: when sshd is run as a Windows service, it does not have the proper access token needed to allow an incoming ssh user to use network mounts (this has to do with user context switching). As a workaround, the sshd used for the actual distribution of commands from the harness Linux server would not be run as an auto-start Windows service; it would rather be launched by an auto-start script of the local "Administrator" user. The two SSH daemons would run on different ports.

Amazing, right? Let's do it.

  1. Download latest Cygwin installer
  2. Run the installer, select download location from list of Global Mirrors
  3. When asked which packages to install, search for and then click on the name to select (see Figure 3 below)
    1. openssl
    2. openssh
  4. Complete the installation, dependencies would be automatically resolved and installed
  5. Right-click Cygwin64 Terminal and run it using "Run as administrator" context
  6. Run "ssh-host-config" (no options to this command line) and answer the following when prompted:
    1. *** Query: Should StrictMode be used? (yes/no) no
    2. *** Query: Should privilege separation be used? (yes/no) yes
    3. *** Query: new local account 'sshd'? (yes/no) yes
    4. *** Query: Do you want to install sshd as a service?
    5. *** Query: (Say "no" if it is already installed as a service) (yes/no) yes
    6. *** Query: Enter the value of CYGWIN for the daemon: [] binmode ntsec
    7. *** Query: Do you want to use a different name? (yes/no) yes
    8. *** Query: Enter the new user name: administrator
    9. Re-enter administrator
    10. *** Query: Create new privileged user account 'administrator'? (yes/no) yes
    11. Enter the password for Local Administrator account of Windows Server 2012R2
  7. At this point, the installation of sshd should be complete
  8. Reboot Windows Server 2012R2
  9. Open Start -> Run -> "services.msc", and check whether "CYGWIN sshd" service is up and running (see Figure 4)

 

03_cygwin_openssh_click.png

 

 

Figure 3  – openssh package selected for installation

 

03_cygwin_sshd.png

Figure 4 – CYGWIN sshd package successfully installed as a service and running after system reboot

Step 3: Additional configuration of Cygwin and sshd

 

First, one needs to ensure that it is possible to log on to Cygwin with the "root" user, for convenience:

  1. Open Cygwin console
  2. Run mkpasswd.exe -l > /etc/passwd to generate /etc/passwd, since it is not auto-generated by the latest versions of Cygwin
  3. Open /etc/passwd in preferred text editor, for example vi.
  4. Replace "Administrator" with "root" (refer to Figure 5.a and Figure 5.b below)
  5. Save changes

 

04_administrator.png

 

Figure 5.a  – /etc/passwd – replace Administrator with root

 

04_administrator-root.png

 

 

Figure 5.b  – /etc/passwd – replace Administrator with root

Exit Cygwin terminal and open it again by "Run as administrator".

Next, the following commands are required to enable logging in with the "root" user:

# chown root /var/empty

# chmod 744 /var/empty

Next, the following modifications to sshd are required to enable the "auto-start" strategy instead of the default "run as a service" one, as discussed previously. After these modifications, the sshd Windows service would listen on port 2222, while the auto-started instance would use the default port 22:

  1. Open /etc/sshd_config in preferred text editor
  2. Uncomment and set PermitRootLogin Yes
  3. Uncomment and set RSAAuthentication yes
  4. Uncomment and set PubkeyAuthentication yes
  5. Uncomment and set UseDNS No
  6. Ensure the instance accepts SSH-DSA PubkeyAcceptedKeyTypes=+ssh-dss
  7. Set Port 2222

Next, in Windows OS, create the following file that would auto-start an sshd under Administrator login:

C:\Users\Administrator\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Startup\startsshd.bat

The contents of the file should be (assuming default installation directory of Cygwin):

C:

cd c:\cygwin64\bin

set CYGWIN=binmode ntsec

c:\cygwin64\bin\bash.exe --login -i -c "/usr/sbin/sshd.exe -p 22"

Save the file. Reboot Windows Server. After reboot, try connecting to port 22 of the server using SSH from Linux Harness host:

# ssh -l root <IP_of_Windows_Server>

If it is not possible to connect to the Windows Server 2012, then a step in the procedure above was missed.

Step 4: Download & Install fio.exe for Windows

At the moment of writing, the latest version of fio for Windows was 2.2.10, and its x64 installer was available from the Windows binaries for fio page.

  1. Download and install fio for Windows from the latest .msi package
  2. Create c:\fio folder
  3. Copy fio.exe from c:\Program Files\fio to c:\fio for convenience of usage in scripts later

Step 5: Preparing Isilon cluster's fiotest folder

As noted earlier, it is important to get permissions out of the way of performance benchmarking. For that, one has to open the Isilon cluster's default /ifs share using Windows Explorer, authenticate as Isilon's root (or another user privileged to change ACLs on /ifs), and set "Full Control" for "Everyone" on the folder that would be presented as the "fiotest" SMB share. See Figure 6 below:

 

05_fiotest_folder_full_control.png

 

 

Figure 6 – setting "Full Control" for the fiotest folder that would be the base of the tests

Use the "Advanced" option and "Replace all child object permissions" as well.

Next, create a "fiotest" SMB share (assuming the "System" Access Zone is used), allowing "Everyone" Full Control (see Figure 7 below).

 

05_fiotest_share.png

 

 

Figure 7 – setting "Full Control" for the SMB share that presents the "fiotest" folder on /ifs

Next, create a local "administrator" user in the System Access Zone's LOCAL:System authentication provider on Isilon, and match the password with the "administrator" user on the Windows Servers, i.e. 'a' as in the example above (see Figure 8.a and Figure 8.b below).

 

07_add_administrator.png

 

 

Figure 8.a – Click "Create a user" for System Access Zone's LOCAL:System Authentication Provider

 

07_add_administrator_user.png

Figure 8.b – add the "Administrator" user with the same password as the one on Windows Server 2012R2

 

Once the Isilon set-up is complete, reboot the Windows Server.

After reboot, try opening \\<IP_of_Isilon>\fiotest\ in Windows Explorer. No password prompt should appear. Try creating and deleting folders. If it is not possible to connect, or not possible to create files and/or folders, then one of the steps above was missed.

Step 6: Preparation of Environment and Scripts

Log on to the Linux harness. If you do not have one, please read the NFS Throughput Benchmarks with fio article. Once logged in, proceed to the mounted fiotest/control folder, for example /mnt/isilon/fiotest/control.

One needs to generate the keys required for passwordless authentication: a trusted private key (trusted.key) and public key (trusted.key.pub):

[root@LinServer0 ~]# ssh-keygen -t dsa

Follow the interactive wizard by hitting "Enter" several times, agreeing to the default values, and specifying the destination path for the newly generated keys to be stored as /mnt/isilon/fiotest/control

Next, one needs to copy the keys to all SMB Windows Server clients participating in the test, as well as to one of the nodes (or all of the nodes) in the Isilon cluster. The connection to the cluster is required to request buffer flushing before each test run. For the Isilon node, run:

[root@LinServer0 ~]# ssh-copy-id -i /mnt/isilon/fiotest/control/trusted.key.pub IP.OF.ISILON.NODE

Next, create the following scripts (making sure to chmod them for execution):

smb_hosts.list

Enter the IP addresses or hostnames of WinFio01…04 servers on separate lines, paired with Isilon nodes for 1:1 mapping.

Two important notes:

1) Do not include the IP of the control ("harness") server here!

2) Do not mix up the order: the Windows host (with Cygwin) comes first, then a pipe, then the Isilon node.

10.111.158.219|10.111.158.201

10.111.158.218|10.111.158.202

10.111.158.217|10.111.158.203

10.111.158.216|10.111.158.204
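Because every script below silently depends on this file being well-formed, a small validation helper may save debugging time. This is a hedged sketch, not part of the original harness, and it assumes IPv4 addresses as in the example above:

```shell
#!/bin/bash
# Validate that every line of smb_hosts.list is <client_ip>|<isilon_node_ip>.
# Prints offending lines to stderr and returns non-zero if any are malformed.
validate_hosts_list() {
    local file=$1 rc=0 line
    while IFS= read -r line; do
        [ -z "$line" ] && continue   # skip blank lines
        if ! echo "$line" | grep -Eq '^[0-9]{1,3}(\.[0-9]{1,3}){3}\|[0-9]{1,3}(\.[0-9]{1,3}){3}$'; then
            echo "Malformed line: $line" >&2
            rc=1
        fi
    done < "$file"
    return $rc
}
```

Running it once before a test batch (validate_hosts_list /mnt/isilon/fiotest/control/smb_hosts.list) catches swapped columns or missing pipes before they turn into cryptic ssh failures.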

smb_copy_trusted.sh

It would only be used once, to copy the trusted keys to the Windows Server clients, just as was done manually for the Isilon node.

#!/bin/bash

# apart from running this script, copy trusted file to the Isilon node

# of choice that would be used to clear cache when running the fio job

# by the same command as below

# the rest of the file is similar to most of the other scripts

# first go through all lines in smb_hosts.list

for i in $(cat /mnt/isilon/fiotest/control/smb_hosts.list) ; do

# then split each line read in to an array by the pipe symbol

IFS='|' read -a pairs <<< "${i}";

# do the ssh-copy-id for putting the certificate to remote host

ssh-copy-id -i /mnt/isilon/fiotest/control/trusted.key.pub ${pairs[0]}

done

cleanup_smb.sh

It would be used after every test run. What it does: it connects to the Windows Server 2012 clients from smb_hosts.list and… let's call the rm -rf housekeeping.

#!/bin/bash

# first go through all lines in smb_hosts.list

for i in $(cat /mnt/isilon/fiotest/control/smb_hosts.list) ; do

# then split each line read in to an array by the pipe symbol

IFS='|' read -a pairs <<< "${i}";

# show back the mapping

echo "Client host: ${pairs[0]}  Isilon node: ${pairs[1]}";

# connect over ssh with the key and do cleanups, create directories etc. – has to be single line

ssh -i /mnt/isilon/fiotest/control/trusted.key ${pairs[0]} -fqno StrictHostKeyChecking=no "rm -rf '\\\\${pairs[1]}\\fiotest\\${pairs[0]}' ; sleep 5 ; mkdir '\\\\${pairs[1]}\\fiotest\\${pairs[0]}'"

# erase the array pair

unset pairs ;

# go for the next line in smb_hosts.list;

done

run_smb_fio_1024k_seqwrite.sh

This is the instruction for the clients to execute a particular fio job, which is specified separately in another file, fiojob_1024k_seqwrite. That file is not co-located in the control directory due to operational preference only: it is just easier to tabulate when multiple concurrent jobs need to be launched using '&&' from within the control folder.

A few points to note:

1) Isilon TME Performance team uses 50GB files

2) fio.exe for Windows does not pre-generate files when "read" tests are executed, unlike the fio package for Linux. Run "write" tests before "read" tests.

3) For random workloads, Isilon's "Random" data access pattern should be set on the directory of tests. "Streaming" should be set for sequential.

4) For random workload tests, "randrw" is never used in the Isilon TME Performance lab; rather, a test of 100% random read and, concurrently, 100% random write is done.

The first commands of the script flush the cache on Isilon:

#!/bin/bash

# first, connect to the first isilon node, and flush cache on array

echo "Purging L1 and L2 cache first";

ssh -i /mnt/isilon/fiotest/control/trusted.key IP.OF.ISILON.NODE -fqno StrictHostKeyChecking=no "isi_for_array isi_flush";

# wait for cache flushing to finish, normally around 10 seconds is enough

sleep 10;

# the L3 cache purge is not recommended, as all metadata accelerated by SSDs would be evicted. but, maybe…

# echo "On OneFS 7.1.1 clusters and newer, running L3, purging L3 cache";

# ssh -i /mnt/isilon/fiotest/control/trusted.key IP.OF.ISILON.NODE -fqno StrictHostKeyChecking=no "isi_for_array isi_flush --l3-full";

#sleep 10;

# the rest is similar to the other scripts

# first go through all lines in smb_hosts.list

for i in $(cat /mnt/isilon/fiotest/control/smb_hosts.list) ; do

# then split each line read in to an array by the pipe symbol

IFS='|' read -a pairs <<< "${i}";

# connect over ssh with the key and mount hosts, create directories etc. – has to be single line

# pointing to fio job file that is one level above from control directory

# yes it is a nightmare of slashes

ssh -i /mnt/isilon/fiotest/control/trusted.key ${pairs[0]} -fqno StrictHostKeyChecking=no "export FILENAME=\\\\\\\\${pairs[1]}\\\\fiotest\\\\${pairs[0]} ; /cygdrive/c/fio/fio.exe --output=\\\\\\\\${pairs[1]}\\\\fiotest\\\\smb_fioresult_1024k_seqwrite_${pairs[0]}.txt \\\\\\\\${pairs[1]}\\\\fiotest\\\\fiojob_1024k_seqwrite";

done
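The multi-backslash escaping in the ssh command above is indeed a nightmare and easy to get wrong. One way to check it is to expand the same expression locally with echo: the local shell halves each pair of backslashes once, and the remote shell halves them again, leaving the \\node\share form that fio.exe expects. A minimal sketch with example IPs:

```shell
#!/bin/bash
# Dry-run of the UNC path escaping used in the ssh command above.
# Inside double quotes, the local shell collapses each backslash pair to one,
# so eight backslashes become four; the remote shell would halve them again.
node=10.111.158.201
client=10.111.158.219
path="\\\\\\\\${node}\\\\fiotest\\\\${client}"
echo "$path"   # what ssh hands to the remote shell
```

Seeing four leading backslashes here means the remote side would receive the correct two-backslash UNC prefix.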

Next, move one folder up, back to /mnt/isilon/fiotest/ and create the fio job file.

[root@LinServer0 control]# cd /mnt/isilon/fiotest/

[root@LinServer0 fiotest]# vi fiojob_1024k_seqwrite

fiojob_1024k_seqwrite

The following job file would execute 100% sequential write IO with a 1M block size over a 12GB file:

; -- start job file --

[global]

description=-------THIS IS A JOB DOING ${FILENAME}-------

directory=${FILENAME}

rw=write

size=12G

numjobs=1

bs=1024k

zero_buffers

direct=0

sync=0

refill_buffers

ioengine=sync

iodepth=1

[1024k_seqwrite]

; -- end job file --

One may find all possible options of fio jobs in the fio manual pages; refer to the FIO documentation for greater detail.

By now, when all files are set, one may execute the test from Linux Harness server:

[root@LinServer0 fiotest]# ./control/run_smb_fio_1024k_seqwrite.sh

If two NICs are used on Windows Server 2012 (or Windows 8.x/10), one would be able to see SMB3 Multichannel using TCP connections on both of them within the same session:

 

08_SMB_multichannel_write.png

 

Figure 9 – leveraging both NICs of a dual-NIC Windows Server 2012R2: the magic of SMB3 Multichannel in action.

Another very useful tool to monitor the buffers on Windows is RAMMap.

Adding 'Thread' into the Client:Node:Thread ratio

When one needs to run benchmarking tests with more threads per Isilon node (N:1, where N>1), it can easily be done by specifying "numjobs" in the fio job files. For example, the following job would create 5 sequential write threads from every Windows client host:

fiojob_1024k_seqwrite_5t

The following job file would execute 5 threads of 100% sequential write IO with a 1M block size over 12GB files:

; -- start job file --

[global]

description=-------THIS IS A JOB DOING ${FILENAME}-------

directory=${FILENAME}

rw=write

size=12G

numjobs=5

bs=1024k

zero_buffers

direct=0

sync=0

refill_buffers

ioengine=sync

iodepth=1

[1024k_5t_seqwrite]

; -- end job file --

Refer to the FIO documentation or man pages for greater detail.
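Since the thread-count variants of the job files differ only in the numjobs value and the job section name, they can be generated from the single-thread file with sed instead of being maintained by hand. A convenience sketch, assuming the file naming used in this article:

```shell
#!/bin/bash
# Generate an N-thread variant of the single-thread sequential write job file.
# Rewrites numjobs and the job section name, writing <base>_<N>t next to it.
# Usage: make_threaded_job fiojob_1024k_seqwrite 5
make_threaded_job() {
    local base=$1 threads=$2
    sed -e "s/^numjobs=.*/numjobs=${threads}/" \
        -e "s/^\[1024k_seqwrite\]/[1024k_${threads}t_seqwrite]/" \
        "$base" > "${base}_${threads}t"
}
```

For example, make_threaded_job fiojob_1024k_seqwrite 5 produces fiojob_1024k_seqwrite_5t, matching the file referenced in Figure 1.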

Useful: Limiting per-Thread throughput

In some Media & Entertainment types of workloads, it is often required to provide evidence of stable per-thread throughput at a particular bandwidth.

fiojob_1024k_read_5t_7500kB

The following job file would execute 5 threads per Windows client of 100% sequential read IO, at a rate of 7.5 MB/s per thread, with a 1M block size over 12GB files:

; -- start job file --

[global]

description=-------THIS IS A JOB DOING ${FILENAME} HOST-------

directory=${FILENAME}

rw=read

size=12G

rate=7500k

numjobs=5

bs=1024k

zero_buffers

direct=0

sync=0

refill_buffers

ioengine=sync

iodepth=1

[1024k_read_5t_at7500kB]

; -- end job file --
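The expected aggregate of a rate-limited run is simple arithmetic, and it is worth pre-computing so the collected isi statistics numbers can be checked against it. A minimal sketch (the 4-client count matches the harness in Figure 1):

```shell
#!/bin/bash
# Expected aggregate throughput of a rate-limited run, in kB/s:
# number of clients x threads per client x per-thread rate limit.
expected_aggregate_kbs() {
    local clients=$1 threads=$2 rate_kb=$3
    echo $(( clients * threads * rate_kb ))
}

# 4 clients x 5 threads x 7500 kB/s per thread
expected_aggregate_kbs 4 5 7500   # 150000 kB/s, i.e. ~150 MB/s total
```

If the cluster-side statistics show noticeably less than this figure, some threads are not sustaining their rate limit and the per-thread evidence should be inspected in the individual fio result files.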

Collecting SMB throughput results

During the test, to collect SMB protocol total throughput statistics from the Isilon cluster into a comma-separated file for further analysis, one should log in to one of the nodes of the cluster and execute the following command, iterating in a "loop" with a 5-second delay:

# isi statistics protocol --protocols=smb2 --totalby=Proto --csv --noheader --interval=5 --repeat=-1 >> /ifs/data/fiotest/smb_1024k_seqwrite.csv

One could use screen or any other tool to detach and come back to the command upon completion of the test, and then interrupt it, e.g. by hitting Ctrl+C.

The comma-separated file can then be imported into, e.g., Microsoft Excel for scatter plotting and finding the maximum, average and median values of the total cluster throughput. The "missing header" (as --noheader was specified above) would be:

Ops  In   Out  TimeAvg  TimeStdDev  Node  Proto  Class  Op
N/s  B/s  B/s  us       us
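Given that column order (In and Out are bytes per second in columns 2 and 3 of the CSV), the collected file can also be summarized without Excel. A hedged awk sketch, assuming this exact column layout; adjust the column indices if your OneFS version reports them differently:

```shell
#!/bin/bash
# Summarize an isi statistics CSV: column 2 is bytes/s in, column 3 is bytes/s out.
# Prints sample count, average in/out and peak out throughput in MB/s.
summarize_smb_csv() {
    awk -F, '
        { tin += $2; tout += $3; if ($3 > peak) peak = $3; n++ }
        END {
            if (n > 0)
                printf "samples=%d avg_in_MBs=%.1f avg_out_MBs=%.1f peak_out_MBs=%.1f\n",
                       n, tin/n/1048576, tout/n/1048576, peak/1048576
        }' "$1"
}
```

For example, summarize_smb_csv /ifs/data/fiotest/smb_1024k_seqwrite.csv gives a quick average/peak picture of a run; trim the start-up and ending "tails" first, as described below, for the official numbers.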

The way the results are collected by the Isilon TME Performance lab team is as follows:

  • Identical tests of the same block size & read/write pattern are run 3 times, separated by some amount of time. For example, after a 512kB Random Write test, a 128kB Sequential Read follows, then a few other tests, then 512kB Random Write again. This is done in an attempt to remove any systemic problems (e.g. related to network congestion) that could otherwise have occurred;
  • The results are trimmed of the start-up and ending "tails", where sampling happened but no actual throughput was applied;
  • The median results are selected and recorded as official results in the empirical data used by the Isilon Sizing Tool.

Please feel free to post comments about your tests and questions!

Good luck with SMB throughput benchmarking!
