Chapter 4: Scenario Definition File (SDF) Format

Things should be as simple as possible, but not simpler.

—Albert Einstein

The scenario definition file is an XML file that specifies all the parameters needed for an experiment. XML was chosen because of its many advantages:

The XML structure is hierarchical, which makes it well suited to defining subscenarios.
When a DTD is provided, a commander can validate the parameters locally (i.e., you don't have to send the file to the server to find out whether it conforms to the DTD).
A DTD also makes it easier to use XML editors that guide the user through defining the scenario parameters.
XML is a standard, and there are XML parsers for many languages, so developers can choose from a wide range of languages to build a commander.

The OpenLC scenario definition files are evolving rapidly, gaining features and parameters, because they have to keep up with the new features added to the OpenLC server. So don't expect the set of tags to stay the same. However, the format should largely freeze as OpenLC approaches its stable release (1.0), and in the meantime I will try hard to keep changes to a minimum. No DTDs are defined at the moment (they will most likely appear in the next release).

Right now, three kinds of XML scenario files are implemented (more will come in future releases): one defines a Local experiment, another an HTTP load experiment, and the third an IMAP4 experiment. Before describing their differences, we start with a brief explanation of the general parameters common to all scenarios.

4.1 General scenario tags

Let's return to the XML file used in section 3.1. We will walk through it, pointing out the main properties of each XML tag in the file¹⁾. Tags that are not general are noted as such.

Let's start from the beginning.

configuration

<configuration>
  <clients number="5"></clients>
  <time max="5"></time>
  <sample small="yes" period="2.0"></sample>
  <!-- The number of bins of the resulting reduced data -->
  <reduction minElementsPerBin="10" maxBins="100"></reduction>
  <!-- ... -->
</configuration>

As you can see, the file always starts with a <configuration> tag and ends with its counterpart, </configuration>. These tags are mandatory and mark the start and the end of the document (<configuration> is the root element).

Let's look at the tags that are direct children of the configuration tag:

The <clients> tag fixes the number of clients in this experiment (more attributes may come in the future). It accepts only one attribute, "number", which this documentation usually refers to as the clients.number variable. This notation is convenient because it is used internally in the OpenLC code and also reflects the hierarchical structure of the scenario. In this case, we are telling the OpenLC server that we want a total of 5 simulated clients (currently implemented as Python threads).
time.max variable: the maximum execution time (wall-clock time) for the whole run, in seconds; 5 seconds in this case. The run finishes as soon as time.max or runProtocol.iterations.max is reached, or the server receives a finishRun command, whichever happens first.
sample.small variable: selects how much data the spy process samples. If true ("yes"), a call to getSampleData() returns only the latest transaction gathered by the server. If false ("no"), it returns all the data gathered since the previous call to getSampleData().
<reduction> tag: defines the parameters of the data reduction process carried out on the server side. This process basically consists of computing histograms along the wall-clock time axis for each variable measured for each command, group and protocol. The reduction.maxBins variable sets the maximum number of bins in the histograms, and reduction.minElementsPerBin sets the minimum number of data measurements in each bin.
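Because the scenario is plain XML, any standard parser can read it. The following minimal sketch uses Python's standard xml.etree.ElementTree to check the root element and flatten the general tags into the dotted tag.attribute names used in this documentation (clients.number, time.max, and so on). The function name general_parameters is hypothetical and is not taken from the OpenLC code; all values come back as strings, and converting them is left to the caller.

```python
import xml.etree.ElementTree as ET

# General part of the section 3.1 scenario (the runProtocol subscenario is left out).
SCENARIO = """
<configuration>
  <clients number="5"></clients>
  <time max="5"></time>
  <sample small="yes" period="2.0"></sample>
  <reduction minElementsPerBin="10" maxBins="100"></reduction>
</configuration>
"""


def general_parameters(xml_text):
    """Flatten the attributes of <configuration>'s children into 'tag.attribute' keys."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "configuration":  # tag names are case-sensitive
        raise ValueError("the root element must be <configuration>")
    params = {}
    for child in root:
        if child.tag == "runProtocol":
            continue  # subscenarios are described by their own tags
        for attr, value in child.attrib.items():
            params[f"{child.tag}.{attr}"] = value
    return params


params = general_parameters(SCENARIO)
clients = int(params["clients.number"])
max_time = float(params["time.max"])
small_sample = params["sample.small"] == "yes"
print(clients, max_time, small_sample, params["reduction.maxBins"])  # 5 5.0 True 100
```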

runProtocol

  <runProtocol name="Local" mode="random">
    <iterations max="1000"></iterations>

This tag marks the start of a protocol definition: the data values we want to save, the maximum number of iterations, the command groups and, of course, the actual commands we want to issue. It refines the scenario in the context of a protocol, so we can properly call it a subscenario. Some subscenarios are very simple (e.g., Local), but in general they can be rich in subtags and attributes, depending on the internal server involved.

Now we explain the main characteristics of some runProtocol components:

name: Attribute selecting which protocol (i.e. which internal server) we want to invoke. For the moment, three protocols are supported: Local, HTTP and IMAP4.
runProtocol.mode: This attribute can have two values:

"normal" (default): all the commands under it (<cmd> tags) are executed sequentially in each thread.
"random": the order in which the commands are executed is randomized.

iterations.max: sets the maximum number of iterations for this protocol. Executing all the cmd tags under runProtocol once counts as a single iteration.
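The stopping rule for a run (whichever of time.max, runProtocol.iterations.max or a finishRun command comes first) can be sketched as a simple loop. This is an illustration under assumptions, not OpenLC's implementation: the functions stop_reason and run are hypothetical, and the sketch counts iterations in a single loop, since the text does not say whether iterations.max is counted per thread or for the whole run.

```python
import time


def stop_reason(elapsed, iterations, time_max, iterations_max, finish_requested):
    """Return why the run must stop, or None if it should continue."""
    if finish_requested:
        return "finishRun"
    if elapsed >= time_max:
        return "time.max"
    if iterations >= iterations_max:
        return "iterations.max"
    return None


def run(time_max, iterations_max, do_iteration, finish_requested=lambda: False):
    """Repeat do_iteration() until the first stop condition is met."""
    start = time.monotonic()
    iterations = 0
    while True:
        reason = stop_reason(time.monotonic() - start, iterations,
                             time_max, iterations_max, finish_requested())
        if reason is not None:
            return reason, iterations
        do_iteration()  # one pass over all the cmd tags under runProtocol
        iterations += 1


# With the example values (time.max=5, iterations.max=1000) and near-instant
# iterations, the iteration limit is reached long before the time limit.
print(run(5, 1000, lambda: None))  # ('iterations.max', 1000)
```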

retValues

    <retValues>
      <wallClock units="s" sta="mean" typecode="Float"></wallClock>
      <timeSpent units="s" spyplot="yes" sta="vds" typecode="Float"></timeSpent>
      <dataTransferred units="KB" sta="vds" typecode="Float"></dataTransferred>
      <commandNumber units=" " typecode="Int"></commandNumber>
      <threadNumber units=" " sta="minimal" typecode="Int"></threadNumber>
    </retValues>

This tag selects the variables we want to save for each transaction. In this example, we can choose among the following variables (the available set may depend on the internal server that implements the protocol):

wallClock (mandatory): the time, measured from the beginning of the run, at which the transaction completed.
timeSpent: the time the transaction took.
dataTransferred: the amount of data transferred during the transaction.
commandNumber: the position of the command in the command list.
threadNumber: the number of the thread responsible for the transaction.

The retValues subtags take some attributes that specify their properties more precisely:

units sets the unit of measurement for the variable (s for seconds, KB for kilobytes and so on).
typecode sets the number type (right now only 'Int' or 'Float').
sta sets the type of statistics to extract from the data. Possible values:
- vds: Very Detailed Statistics
- mean: Only mean values
- minimal: No histogram, only a mean value
- none: No histogram, no mean value
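To show how the retValues attributes fit together, here is a minimal sketch that reads the example above, checks typecode and sta against the documented values, and converts a raw transaction record. The names parse_ret_values, TYPECODES and STA_VALUES, and the raw record itself, are hypothetical. The sketch leaves sta as None when the attribute is omitted (as for commandNumber), since the default level is not documented here, and it ignores the undocumented spyplot attribute.

```python
import xml.etree.ElementTree as ET

RET_VALUES = """
<retValues>
  <wallClock units="s" sta="mean" typecode="Float"></wallClock>
  <timeSpent units="s" spyplot="yes" sta="vds" typecode="Float"></timeSpent>
  <dataTransferred units="KB" sta="vds" typecode="Float"></dataTransferred>
  <commandNumber units=" " typecode="Int"></commandNumber>
  <threadNumber units=" " sta="minimal" typecode="Int"></threadNumber>
</retValues>
"""

TYPECODES = {"Int": int, "Float": float}
STA_VALUES = {"vds", "mean", "minimal", "none"}


def parse_ret_values(xml_text):
    """Map each saved variable to its units, converter and statistics level."""
    spec = {}
    for var in ET.fromstring(xml_text.strip()):
        typecode = var.get("typecode")
        if typecode not in TYPECODES:
            raise ValueError(f"{var.tag}: unknown typecode {typecode!r}")
        sta = var.get("sta")  # None when omitted; the default level is not specified here
        if sta is not None and sta not in STA_VALUES:
            raise ValueError(f"{var.tag}: unknown sta {sta!r}")
        spec[var.tag] = {"units": var.get("units", "").strip(),
                         "convert": TYPECODES[typecode], "sta": sta}
    if "wallClock" not in spec:
        raise ValueError("wallClock is mandatory")
    return spec


spec = parse_ret_values(RET_VALUES)
# Converting one hypothetical raw transaction record according to the typecodes:
raw = {"wallClock": "1.75", "timeSpent": "0.01", "dataTransferred": "10.24",
       "commandNumber": "2", "threadNumber": "4"}
record = {name: spec[name]["convert"](value) for name, value in raw.items()}
print(record["commandNumber"], record["timeSpent"])  # 2 0.01
```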

cmdGroup

    <cmdGroup name="test">

The <cmdGroup> tag groups several command (<cmd>) tags into "atomic" actions. The runProtocol.mode variable described above has no effect inside these groups, where execution is guaranteed to be sequential. This feature is ideal for simulating a Web shopping procedure, an IMAP4 procedure call sequence or, in general, any group of actions that must always be performed in order. The microkernel also computes per-group statistics based on this grouping.

It has only one attribute for the moment:

cmdGroup.name variable: the name of the group, used to refer to it in the run database and the statistics.
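The interaction between runProtocol.mode and <cmdGroup> can be illustrated with a short sketch. It assumes that, in "random" mode, each group is shuffled as a single unit together with any stand-alone commands, while the commands inside a group keep their document order; the text only guarantees the latter. The function iteration_order and the command names are hypothetical.

```python
import random


def iteration_order(units, mode="normal", rng=random):
    """Return the cmd names executed in one iteration.

    units: a str is a stand-alone <cmd>; a list of str is a <cmdGroup>.
    """
    order = list(units)
    if mode == "random":
        rng.shuffle(order)  # shuffles whole units only
    elif mode != "normal":
        raise ValueError(f"unknown runProtocol.mode {mode!r}")
    sequence = []
    for unit in order:
        if isinstance(unit, list):
            sequence.extend(unit)  # a group always runs in document order
        else:
            sequence.append(unit)
    return sequence


units = ["ping", ["search", "addToCart", "checkout"], "status"]  # hypothetical names
print(iteration_order(units))  # ['ping', 'search', 'addToCart', 'checkout', 'status']
print(iteration_order(units, "random", random.Random(7)))
```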

cmd

      <cmd spy="yes" name="constant">
        constant(range=0.01)
      </cmd>
      <cmd spy="no" name="random">
        random(range=0.01)
      </cmd>
      <cmd spy="yes" name="linear">
        linear(range=0.01)
      </cmd>
    </cmdGroup>
  </runProtocol>
</configuration>

Finally, we reach the innermost tag of <runProtocol>: the <cmd> tag. It holds everything related to the actual commands that the internal servers issue to attack external IT servers (or, in the case of the Local protocol, just simulate doing so). It has several components:

name attribute: the name of the command, used to refer to it in the run database and the statistics.
spy attribute: A boolean variable indicating if we want to spy this command or not. Values: "yes" / "no". Default value: "no".
PCDATA (only one line allowed): the PCDATA (i.e., the text between the <xmltag atr1="..." atr2="..."> and </xmltag> tags) selects the procedure, chosen from those available in the internal server API, used to attack the external server. In this case, three different procedures are invoked: constant, random and linear. For an explanation of what these procedures actually do, see the Local protocol section. As you can see, you can pass parameters to the internal server's procedures.
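Since the PCDATA of a <cmd> looks like a Python call with keyword parameters, one way to split it into a procedure name and its parameters is with Python's ast module. This is a hypothetical sketch, not necessarily how the OpenLC server parses commands; using ast.literal_eval means only literal values are accepted and no code from the scenario file is executed.

```python
import ast


def parse_cmd(pcdata):
    """Split a <cmd> body such as 'constant(range=0.01)' into (procedure, kwargs)."""
    node = ast.parse(pcdata.strip(), mode="eval").body
    if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
        raise ValueError(f"not a procedure call: {pcdata!r}")
    if node.args or any(kw.arg is None for kw in node.keywords):
        raise ValueError("this sketch only accepts name=literal parameters")
    # literal_eval only accepts literals, so no code in the scenario file is executed.
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
    return node.func.id, kwargs


print(parse_cmd("\n        constant(range=0.01)\n      "))  # ('constant', {'range': 0.01})
```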

In the next section, we take a more in-depth look at the different subscenarios currently implemented in the OpenLC server.

4.2 Subscenario definitions

The previous section described the general tags of OpenLC scenarios. But one of OpenLC's strengths is its flexibility to adapt to a range of stress-testing situations. In this section we discuss the XML subscenarios for the internal servers present in the OpenLC server.

4.2.1 Local subscenario

The Local scenario is implemented as a stand-alone, synthetic test. It is very useful for simulating runs: the user can control a variety of parameters to test OpenLC's capabilities (for example, how many transactions per second it can handle), to quickly design and test new clients, or just to learn how to use it.

In this subscenario, the only components that differ from the general format are the procedures of the Local internal server API. Right now there are three:

constant(range=floatValue): Sleep for a number of seconds given by the range parameter.
linear(range=floatValue): Sleep for a number of seconds between 0 and the value of the range parameter. This number increases linearly over the course of the run.
random(range=floatValue): Sleep for a random number of seconds in the range [0, range].

All these procedures also return a dataTransferred retValue, which is simply the time spent in the sleep call multiplied by a factor (currently 1024).
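The three Local procedures can be sketched as follows. The constant and random delays follow the descriptions above, and dataTransferred is the measured sleep time multiplied by 1024. The linear delay assumes a hypothetical progress parameter, the fraction of the run already elapsed (0 to 1), because the text does not say how the run's duration is measured. The names run_local, DELAYS and progress are illustrative, not OpenLC's API.

```python
import random
import time

DATA_FACTOR = 1024  # dataTransferred = seconds slept * this factor


def constant_delay(range_, progress):
    return range_


def linear_delay(range_, progress):
    # Assumption: the delay grows from 0 to range_ as progress goes from 0 to 1.
    return range_ * min(max(progress, 0.0), 1.0)


def random_delay(range_, progress):
    return random.uniform(0.0, range_)


DELAYS = {"constant": constant_delay, "linear": linear_delay, "random": random_delay}


def run_local(name, progress=0.0, **params):
    """Execute one Local command; return (timeSpent, dataTransferred)."""
    delay = DELAYS[name](params["range"], progress)
    start = time.perf_counter()
    time.sleep(delay)
    spent = time.perf_counter() - start
    return spent, spent * DATA_FACTOR


spent, data = run_local("constant", range=0.01)
```

The keyword form run_local("constant", range=0.01) mirrors the constant(range=0.01) syntax used in the <cmd> tags.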

4.2.2 HTTP (FTP, Gopher, file) subscenario

As with the Local protocol, no special tags are needed, and the only procedure currently implemented in the HTTP internal server API is get.

get(url="stringValue"): fetches the URL given in the url parameter. It is based on the urllib Python module, so it supports HTTP 1.0 and secure HTTP (https://name) if Python is built with OpenSSL support. It also supports the FTP, Gopher and file protocols.
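A get call that also measures the two usual retValues might look like the sketch below. The original is based on Python 2's urllib; this sketch uses Python 3's urllib.request.urlopen, which handles http, https (when Python has SSL support), ftp and file URLs. Reporting dataTransferred as the body size divided by 1024 (to match the KB unit in the retValues example) is an assumption, and headers are not counted.

```python
import os
import pathlib
import tempfile
import time
import urllib.request


def get(url):
    """Fetch url; return (timeSpent in seconds, dataTransferred in KB)."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as response:
        body = response.read()
    spent = time.perf_counter() - start
    return spent, len(body) / 1024


# Usage without network access: file URLs go through the same urlopen() call.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 2048)
spent, kb = get(pathlib.Path(tmp.name).as_uri())
os.remove(tmp.name)
print(kb)  # 2.0
```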

4.2.3 IMAP4 subscenario

Support for the IMAP4 protocol adds a couple of tags used to authenticate users before any transaction takes place. The tags are:

auth: provides the authentication information for the host and users. It has one attribute, host, whose value is the machine to send requests to.
user: a subtag of auth that provides the username (user.name variable) and password (user.password variable) of each user.

The IMAP4 internal server connects to auth.host and authenticates each thread with a user.name identifier. Threads are mapped to usernames using a round-robin algorithm. If there are more clients than usernames, several clients share the same username (some IMAP4 servers support up to 4 sessions for the same user). If there are more usernames than clients, some usernames remain unused.
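The round-robin assignment of usernames to threads can be sketched as follows. The XML layout of the <auth> section is hypothetical: the text only names the auth.host, user.name and user.password variables, not where <auth> sits in the scenario. Threads are assumed to be numbered from 0, and the host name and credentials are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical auth section; the layout follows auth.host, user.name and user.password.
AUTH = """
<auth host="imap.example.com">
  <user name="alice" password="secret1"></user>
  <user name="bob" password="secret2"></user>
</auth>
"""


def assign_users(auth_xml, clients):
    """Map thread numbers 0..clients-1 to (name, password) pairs, round-robin."""
    auth = ET.fromstring(auth_xml.strip())
    users = [(u.get("name"), u.get("password")) for u in auth.findall("user")]
    if not users:
        raise ValueError("at least one <user> is needed")
    mapping = {thread: users[thread % len(users)] for thread in range(clients)}
    return auth.get("host"), mapping


host, mapping = assign_users(AUTH, clients=5)
print([name for name, _ in mapping.values()])  # ['alice', 'bob', 'alice', 'bob', 'alice']
```

With five clients and two users, each username serves several threads; with a single client, bob stays unused.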

The IMAP4 internal server API is basically the same as the one supported by the imaplib Python module (see http://www.python.org/doc/lib/module-imaplib.html). The server does attempt to compute the dataTransferred retValue, but treat it as an approximation until a better algorithm is implemented.

¹⁾ Be careful with letter case in the following XML examples, because both XML and Python are case-sensitive.