MPACT-Sim Binary Instruction Decoder Generator

Goals

Define a convenient format for describing binary instruction formats and instruction encodings.
Create an algorithm to partition the instructions into groups for faster decode.
Generate C++ code for an efficient decoder to decode the instructions in the description

Overview

A binary instruction decoder is tedious and at times error prone to write by hand. For instance, the Risc-V RV32G instruction set consists of 122 separate instructions and over a dozen instruction formats. Other architectures have more, such as the The TI C64+ DSP with almost 250 different instructions (including 35 compact encodings).

While the decoder source code may be repetitive, the large number of bit-field extractions and value comparisons makes it more likely to introduce careless bugs that are hard to track down. Moreover, responding to changes in the encoding formats or overall encoding scheme requires a lot of rework, and makes binary encoding a poor choice for doing what-if analysis and architectural exploration. However, given that it is the preferred output format for most tool chains, it is almost without exception necessary to support.

Defining a declarative format for describing instruction encodings and providing a tool that parses generating the C++ code to perform the instruction decoding not only improves programmer productivity, but provides a much more readable specification of the implementation that can more easily be compared to the ISA manual. Changes to instruction encodings are trivial to implement, and the resulting decoder code is correct by construction.

Binary Instruction Description

The binary instruction description file has two main components: a set of instruction format definitions and a set of instruction encoding definitions. The former defines the layout of bit fields in an instruction format, as well as overlays, which can be considered virtual fields that can be constructed by concatenating bit constants and/or fragments of the instruction word. Overlays are useful if there are alternative interpretations of the bits in an instruction, or if the bits in a field have to be reordered to be interpreted correctly. For instance, in the RiscV ISA, the 32 bit store instructions concatenate two separate, non adjacent bit fields to form the 12 bit signed offset used in the address calculation. Another example, also from the RiscV ISA, are the 32 bit branch instructions. The branch offset is constructed from two immediates, but instead of just concatenating the two fields, they are intertwined, concatenating the two msb’s, followed by the concatenation of the remainder of the fields, adding a constant 0 bit at the lsb, to create a byte address.

Instruction encodings are described separately, grouped by instruction size, and lists constraints (equal or not equal) on field values in the instruction format.

Instruction Formats

The instruction format definition has three components. The header defines the format name, the bit width, and an optional base format. The base format helps group formats together in a hierarchy. Only instruction encodings using formats that derive from the same base can be put into the same instruction encoding group. This allows encodings for execution mode dependent instructions to be grouped according to mode and decoded accordingly, or merely a way of forcing some instructions to be decoded in separate decode functions than others. An example from RiscV RV32G is shown below.

format BType[32] : Inst32

The body of instruction formats are divided into two sections: fields and overlays.

Fields

The fields section lists the fields of the format in order from msb to lsb. The total width of the fields has to equal the declared width of the format. A field is either a bit field definition or a reference to another format. A bit field definition consists of an unsigned or signed specification, a name, and bit width. A format reference consists of the keyword format followed by the format name, and an optional size specification, which specifies how many times that format is repeated. The fields section of the RiscV RV32G BType format declaration shown previously is listed below:

 fields:
   unsigned imm7[7];
   unsigned rs2[5];
   unsigned rs1[5];
   unsigned func3[3];
   unsigned imm5[5];
   unsigned opcode[7];

Overlays

An overlay is a named virtual field of the format that consists of a sequence of field (or format) references and binary literals in order of concatenation from left to right. Overlays are used when the value of a desired quantity from the instruction word is not encoded directly in a field, but has to be assembled from the fields or formats, or fragments of fields or formats, possibly with binary literals added to the mix. Binary literals are expressed in the form ‘0b<binary digits>’, where the single quote can be used as a grouping mechanism (e.g., 0b0100'1010). The width of the literal is defined by the number of binary digits following the ‘0b’ prefix.

An overlay definition consists of an unsigned or signed specification, a name, and bit width, (just like a field specification), but then followed by an equal sign and a comma separated list of overlay components. Each overlay component is one of:

A field or format name, which means the entire field/format is used.
- E.g., imm5
A field or format name with a bit range, which selects only the specified bits.
- E.g., imm5[4..1]
A bit range on its own, which refers directly to the current format’s bits.
- E.g., [11..8] (which would refer to the same bits as imm[4..1])
A binary number literal which defines both the value and the width (number of digits).
- E.g., 0b0100'1010

The overlay section of the RiscV RV32G BType format declaration is shown below:

 overlays:
   signed imm[13] = imm7[6], imm5[0], imm7[5..0], imm5[4..1], 0b0;

Instruction Encodings

The instruction encodings are grouped in one or more instruction group definitions. Each such definition gives rise to a separate instruction decode function in the generated code. The instructions in a single group are subject to two constraints: they must all have the same width, and the instruction formats used by the instructions must all derive from the same ancestor (as specified in the group header). The user may otherwise freely select how to group the instructions according to relevant ISA constraints, such as mode etc.

Header

The header of the instruction group defines the name of the group, the width, the opcode enum C++ type used for the opcodes and the base instruction format all the grouped instructions’ formats must derive from. An example from RiscV 32 is shown below:

instruction group RiscVInst32[32] : Inst32Format

This defines “RiscVInst32” as the name of an instruction group with 32 bit wide instructions, using the base format “Inst32Format”. All the instructions listed within have to derive from “Inst32Format”.

Instructions

The instructions in an instruction group are listed sequentially in a semicolon separated list. Each instruction is described by its name, the instruction format used, and a comma separated list of value constraints on fields or overlays in that instruction.

As an example, consider the RiscV branch on equal “beq” instruction. This instruction uses the BType format that was listed previously in the discussion on instruction formats.

beq    : BType  : func3 == 0b000, opcode == 0b110'0011;

As can be seen, the value constraints are that the instruction field func3 has to be equal to 0, and the opcode field has to have the value 0x63. Note, the radix of the numeric literal in each constraint is not restricted to binary, but can be any of binary, octal, hexadecimal or decimal. The choice is up to the author in terms of what is most convenient.

Another example using five instructions from the 16 bit RiscV instruction group that all use format CR illustrates how not-equal constraints can also be used :

format CR[16] : Inst16Format {
 fields:
   unsigned func4[4];
   unsigned rs1[5];
   unsigned rs2[5];
   unsigned op[2];
};

instructions Inst16[16] "isa::OpcodeEnum" : Inst16Format {
 ...
 cjr       : CR : func4 == 0b1000, rs1 != 0, rs2 == 0, op == 0b10;
 cmv       : CR : func4 == 0b1000, rs1 != 0, rs2 != 0, op == 0b10;
 cebreak   : CR : func4 == 0b1001, rs1 == 0, rs2 == 0, op == 0b10;
 cjalr     : CR : func4 == 0b1001, rs1 != 0, rs2 == 0, op == 0b10;
 cadd      : CR : func4 == 0b1001, rs1 != 0, rs2 != 0, op == 0b10;
 ...
};

Decoder

The decoder definition specifies properties for the generated binary instruction decoder. Multiple decoders may be specified in a .bin_fmt file, but only one can be selected, by passing in a command line option to the parser tool, for parsing and code generation.

The decoder definition specifies 4 pieces of information: namespace, the opcode enumeration type to use, decoder specific include files, and the instruction groups for which to generate decoders for.

decoder RiscV32GV {
  namespace mpact::sim::riscv::encoding;
  opcode_enum = "isa32v::OpcodeEnum";
  includes {
    #include "some/include/file.h"
  }
  // Combine instruction groups RiscV32 I, M and F.
  RiscV32G = {RiscVIInst32, RiscVMInst32, RiscVFInst32};
  // Separate group for compact instructions.
  RiscVCInst16;
};

Namespace

The namespace property specifies the namespace in which the generated code will be placed.

Opcode Enumeration Type

The opcode_enum specifies the text string that references the opcode enumeration type to use as the return value for the instruction decoder(s).

The binary instruction decoder generator is designed to work with the representation independent decoder generator (go/mpact-sim-isa-decoder). As such, it has to translate from the binary representation of instructions to the opcode enumeration type defined in the generated files by that tool. In order for this to work properly, the string literal has to specify the name of the enumeration type as it needs to be used from within the namespaces (see the detailed grammar rules), defined for the binary decoder generator. The name of each instruction listed in the binary encoding description file must match the name of an instruction in the representation independent decoder description file for the enumeration type members to match up.

Include Files

Include files specific to this decoder may be specified here. They will be copied to the generated code.

Instruction Groups

The remainder specifies the named instruction groups to generate decode functions for. Multiple instruction groups can be combined into a single group for purposes of generating only one decode function for their combined instructions, as opposed to having separate functions for each one. Only instruction groups that use the same format can be combined in this way. In the example above, three instruction groups (integer, multiply/divide, and floating point) are combined to generate a single decoder, while a separate decoder function is generated for the compact, 16-bit instructions.

Generated C++ Header Code

The binary decoder generator tool generates a set of functions in the namespace defined in the description file (see detailed syntax). As mentioned previously, a decode function is generated for each instruction group. The signature is

<enum type> Decode<group name>(<word type> inst_word);

Where <enum type> is the string literal in the instruction group definition, <group name> is the name of the instruction group, and <word type> is the smallest unsigned integer type that fits the width of the instruction. When called, the function will return the enum member that matches the word passed in as an argument, or the enum member kNone (always defined), if there is no match.

The tool also generates extract functions for each field and overlay in the formats that are defined in the description file. Each such extraction function is defined within a nested namespace named after the format in which it is defined. The extraction functions are defined as inline in the .h file to improve performance without forcing every function to be included in the executable, as many of these are not used by the decoder, but should be made available to the user if desired.

The return type is the smallest signed/unsigned (according to the field/overlay definition) integer type that fits the width of the field/overlay. The argument type is the smallest unsigned integer type that fits the format width. The following example shows the RiscV BType format and the extraction functions generated:

Generating the Decoder Functions

A decoder function is generated for each instruction group, returning the enum member for one of the instructions, if there is a match with that instruction, or kNone if there is not.

Algorithm

A new algorithm was developed to generate the decoder for the instructions in an instruction group. The goal of the decoder is to be correct and reasonably fast and efficient, but does not attempt to be optimal.

Initialization

In the first stage of the algorithm only the “equal to” constraints for each encoding are considered. The algorithm computes the mask of bits that are constrained equal for each instruction in the instruction group. This is called the constraint mask of the instruction.

Creating Initial Encoding Groups

In the second stage of the algorithm, an initial set of encoding groups are formed. The algorithm is shown below. Initially, the set of encoding groups is empty. For each instruction encoding, consider each encoding group in turn. If the intersection of the constraint mask of the encoding group with the constraint mask of the encoding is zero, that is, there is no overlap between the constrained bits in the encoding with the constrained bits in the encoding group, then go to the next encoding group. If the intersection is non-zero, select that encoding group.

If an encoding group was selected, add the current encoding to that group, and update the encoding group’s constraint mask with a bitwise and of the instruction encoding’s constraint mask. If no encoding group was selected, create a new encoding group, setting the constraint mask of the encoding group to that of the instruction encoding. At the end of this stage, each instruction encoding has been assigned to an encoding group, and it is guaranteed that all encoding groups have a non-zero constraint mask. The algorithm is shown in the following figure.

for each instruction encoding enc {
  inst_enc_group = nullptr;
  for each encoding group grp {
    if ((grp->constraint_mask & enc->constraint_mask) == 0) continue
    inst_enc_group = grp
    break
  }
  if (inst_enc_group != nullptr) {
    Add enc to inst_enc_group
    inst_enc_group->constraint_mask &= enc->constraint_mask
  } else {
    Create a new encoding group enc_group
    Add enc to enc_group
    enc_group->constraint_mask = enc->constraint_mask
  }
}

Subdividing Encoding Groups

Once the initial encoding groups have been formed each encoding group is subdivided recursively into smaller encoding groups based on the “equal-to” constraints for the instructions in the encoding group.

Three quantities are computed for each encoding group. First, the constant mask, the subset of the bits in the constraint mask that have the same constraint value across all the instruction encodings in the encoding group. Second, the constant value, which is the shared value of the bits in the constant mask. Third, the discriminator mask, which is the bitset difference between the constraint mask and the constant mask. A fourth quantity is used as well, it is computed for subgroups when an encoding group is divided. This is the ignore mask, which is the set of bits that have already been handled by an encoding group higher in the hierarchy, and no longer need to be considered. For instance, all 32 bit instructions in the RiscV32G ISA have 0b11 as the value of the two lsbs. Only the top level decoder function needs to check the value of these bits, any subsequent comparisons need consider at most the remaining 28 bits.

If the number of set bits in the discriminator is 0 an encoding group cannot be subdivided, as there are no common bits by which to separate the instructions. Otherwise an encoding group can be subdivided into 2^(1 << popcount(discriminator)), where popcount(x) returns the number of set bits. For instance, if a set of instruction encodings in a group have a constraint mask of 0x7f, a constant mask of 0x3, and thus a discriminator mask of 0x7c (the discriminator mask has five bits sets), there can be a total of 32 (2^5) subgroups created. In other words, the encodings in the encoding group can be divided into 32 subgroups, and the subgroup for any instruction encoding can be looked up by an index extracted from the bits in the instruction word that correspond to the discriminator mask.

Pseudo code for the algorithm is shown below. No subgroups are created that are either size 0 or the size of the group itself. Moreover, no subgroups are created if the discriminator mask has only one bit set.

Refine(EncodingGroup G) {
  for (int i = 0; i < (1 << popcount(G->discriminator_mask()); i++) {
    subgroup = new EncodingGroup
    subgroup->SetIgnore(G->ignore_bits() | G->discriminator_mask() |
                        G->constant_mask())
    for each encoding enc in G {
      if (discriminator_value(enc) != i) continue
      subgroup->AddEncoding(enc)
    }
    if (subgroup->empty()) {
      delete subgroup
      continue
    }
    if (subgroup->size() == G.size()) {
      delete subgroup
      return
    }
    if (popcount(subgroup->discriminator()) >= 2) {
      Refine(subgroup)
    }
  }
}

There is a property of encoding groups that is worth noting. If an encoding group has no subgroups, and there are no instruction encodings in the group that have not-equal constraints, and there are no equal constraints that involve overlays with bit constants, then that encoding group is considered to be simple. Instructions in a simple encoding group can have their opcode determined using a lookup table without any explicit comparisons.

Generating Code

The code generation process is the same for each instruction group listed in the decoder definition. In addition to the top level decode function for the instruction group, a separate decode function is generated for each encoding group that will either directly return an opcode, or call the decode function for a subgroup. The top level decoder function contains the code to call the top level encoding groups’ decode function one after the other until one returns a valid opcode value (not equal to kNone). If no valid opcode is found, kNone is returned. If there is only a single top level encoding group the function is still emitted as described. A future optimization would be to inline the next level decode function directly instead. Below is the top level decode function generated for RiscV32G Inst32 instruction group:

isa::OpcodeEnum DecodeRiscVInst32(uint32_t inst_word) {
  isa::OpcodeEnum opcode;
  opcode = DecodeRiscVInst32_0(inst_word);
  if (opcode != isa::OpcodeEnum::kNone) return opcode;
  return opcode;
}

The decode functions for every encoding group follow the same pattern. First if there is a non-zero constant_mask for the encoding group the mask is applied to the instruction word and the value is compared to the constant_value. If the values are not equal, the decode function returns kNop. If the encoding group has subgroups, then the index value corresponding to the discriminator_mask is computed and used to select and call the function pointer for the next level decode function, returning its return value. The RiscV32G 0 level decode function is shown below:

isa::OpcodeEnum DecodeRiscVInst32_0(uint32_t inst_word) {
  if ((inst_word & 0x3) != 0x3) return isa::OpcodeEnum::kNone;
  uint32_t index;
  index = (inst_word >> 2) & 0x1f;
  return parse_group_RiscVInst32_0[index](inst_word);
}

The parse_group_* arrays are defined in the .cc file, and initialized with the appropriate function pointers for the different encoding groups.

For leaf encoding groups that are simple, the decode function is almost identical to the above function, except that the index is used to select from an opcode array, defined static constexpr local to the decode function. The following shows such a function for RiscV32 32 bit store instructions:

isa::OpcodeEnum DecodeRiscVInst32_0_8(uint32_t inst_word) {
 static isa::OpcodeEnum opcodes[4] = {
   isa::OpcodeEnum::kSb,
   isa::OpcodeEnum::kSh,
   isa::OpcodeEnum::kSw,
   isa::OpcodeEnum::kNone,
 };
 if ((inst_word & 0x4000) != 0x0) return isa::OpcodeEnum::kNone;
 uint32_t index;
 index = (inst_word >> 12) & 0x3;
 return opcodes[index];
}

For non-simple leaf decode functions, the code starts with field extractions required for the constraints testing, then each opcode is considered in turn, as shown below.

isa::OpcodeEnum DecodeRiscVInst16_0_12(uint16_t inst_word) {
 uint16_t index;
 index = (inst_word >> 12) & 0x1;
 uint16_t rs1_value = (inst_word >> 7) & 0x1f;
 uint16_t rs2_value = (inst_word >> 2) & 0x1f;
 if ((index == 0x0) && (rs1_value != 0x0) && (rs2_value != 0x0))
   return isa::OpcodeEnum::kCmv;
 if ((index == 0x0) && (rs2_value == 0x0) && (rs1_value != 0x0))
   return isa::OpcodeEnum::kCjr;
 if ((index == 0x1) && (rs1_value == 0x0) && (rs2_value == 0x0))
   return isa::OpcodeEnum::kCebreak;
 if ((index == 0x1) && (rs1_value != 0x0) && (rs2_value != 0x0))
   return isa::OpcodeEnum::kCadd;
 if ((index == 0x1) && (rs2_value == 0x0) && (rs1_value != 0x0))
   return isa::OpcodeEnum::kCjalr;
 return isa::OpcodeEnum::kNone;
}

Detailed Syntax of the .bin File

The full Antlr4 grammar of the .isa file is found in the file BinFormat.g4).