blob: 960d3f9c3c3b52220955c05d0da9f0563a935cdf [file] [view]
# MPACT-Sim Binary Instruction Decoder Generator
# Goals
* Define a convenient format for describing binary instruction formats and
instruction encodings.
* Create an algorithm to partition the instructions into groups for faster
decode.
* Generate C++ code for an efficient decoder to decode the instructions in the
description
# Overview
A binary instruction decoder is tedious and at times error prone to write by
hand. For instance, the Risc-V RV32G instruction set consists of 122 separate
instructions and over a dozen instruction formats. Other architectures have
more, such as the The TI C64+ DSP with almost 250 different instructions
(including 35 compact encodings).
While the decoder source code may be repetitive, the large number of bit-field
extractions and value comparisons makes it more likely to introduce careless
bugs that are hard to track down. Moreover, responding to changes in the
encoding formats or overall encoding scheme requires a lot of rework, and makes
binary encoding a poor choice for doing what-if analysis and architectural
exploration. However, given that it is the preferred output format for most tool
chains, it is almost without exception necessary to support.
Defining a declarative format for describing instruction encodings and providing
a tool that parses generating the C++ code to perform the instruction decoding
not only improves programmer productivity, but provides a much more readable
specification of the implementation that can more easily be compared to the ISA
manual. Changes to instruction encodings are trivial to implement, and the
resulting decoder code is correct by construction.
# Binary Instruction Description
The binary instruction description file has two main components: a set of
instruction format definitions and a set of instruction encoding definitions.
The former defines the layout of bit fields in an instruction format, as well as
overlays, which can be considered virtual fields that can be constructed by
concatenating bit constants and/or fragments of the instruction word. Overlays
are useful if there are alternative interpretations of the bits in an
instruction, or if the bits in a field have to be reordered to be interpreted
correctly. For instance, in the RiscV ISA, the 32 bit store instructions
concatenate two separate, non adjacent bit fields to form the 12 bit signed
offset used in the address calculation. Another example, also from the RiscV
ISA, are the 32 bit branch instructions. The branch offset is constructed from
two immediates, but instead of just concatenating the two fields, they are
intertwined, concatenating the two msbs, followed by the concatenation of the
remainder of the fields, adding a constant 0 bit at the lsb, to create a byte
address.
Instruction encodings are described separately, grouped by instruction size, and
lists constraints (equal or not equal) on field values in the instruction
format.
## Instruction Formats
The instruction format definition has three components. The header defines the
format name, the bit width, and an optional base format. The base format helps
group formats together in a hierarchy. Only instruction encodings using formats
that derive from the same base can be put into the same instruction encoding
group. This allows encodings for execution mode dependent instructions to be
grouped according to mode and decoded accordingly, or merely a way of forcing
some instructions to be decoded in separate decode functions than others. An
example from RiscV RV32G is shown below.
```
format BType[32] : Inst32
```
The body of instruction formats are divided into two sections: fields and
overlays.
### Fields
The fields section lists the fields of the format in order from msb to lsb. The
total width of the fields has to equal the declared width of the format. A field
is either a bit field definition or a reference to another format. A bit field
definition consists of an `unsigned` or `signed` specification, a name, and bit
width. A format reference consists of the keyword `format` followed by the
format name, and an optional size specification, which specifies how many times
that format is repeated. The fields section of the RiscV RV32G BType format
declaration shown previously is listed below:
```
fields:
unsigned imm7[7];
unsigned rs2[5];
unsigned rs1[5];
unsigned func3[3];
unsigned imm5[5];
unsigned opcode[7];
```
### Overlays
An overlay is a named virtual field of the format that consists of a sequence of
field (or format) references and binary literals in order of concatenation from
left to right. Overlays are used when the value of a desired quantity from the
instruction word is not encoded directly in a field, but has to be assembled
from the fields or formats, or fragments of fields or formats, possibly with
binary literals added to the mix. Binary literals are expressed in the form
0b<binary digits>’, where the single quote can be used as a grouping
mechanism (e.g., `0b0100'1010`). The width of the literal is defined by the
number of binary digits following the 0b prefix.
An overlay definition consists of an `unsigned` or `signed` specification, a
name, and bit width, (just like a field specification), but then followed by an
equal sign and a comma separated list of overlay components. Each overlay
component is one of:
* A field or format name, which means the entire field/format is used.
* E.g., `imm5`
* A field or format name with a bit range, which selects only the specified
bits.
* E.g., `imm5[4..1]`
* A bit range on its own, which refers directly to the current formats bits.
* E.g., `[11..8]` (which would refer to the same bits as `imm[4..1]`)
* A binary number literal which defines both the value and the width (number
of digits).
* E.g., `0b0100'1010`
The overlay section of the RiscV RV32G BType format declaration is shown below:
```
overlays:
signed imm[13] = imm7[6], imm5[0], imm7[5..0], imm5[4..1], 0b0;
```
## Instruction Encodings
The instruction encodings are grouped in one or more `instruction group`
definitions. Each such definition gives rise to a separate instruction decode
function in the generated code. The instructions in a single group are subject
to two constraints: they must all have the same width, and the instruction
formats used by the instructions must all derive from the same ancestor (as
specified in the group header). The user may otherwise freely select how to
group the instructions according to relevant ISA constraints, such as mode etc.
## Header
The header of the instruction group defines the name of the group, the width,
the opcode enum C++ type used for the opcodes and the base instruction format
all the grouped instructions formats must derive from. An example from RiscV 32
is shown below:
```
instruction group RiscVInst32[32] : Inst32Format
```
This defines RiscVInst32 as the name of an instruction group with 32 bit wide
instructions, using the base format "Inst32Format". All the instructions listed
within have to derive from "Inst32Format".
## Instructions
The instructions in an instruction group are listed sequentially in a semicolon
separated list. Each instruction is described by its name, the instruction
format used, and a comma separated list of value constraints on fields or
overlays in that instruction.
As an example, consider the RiscV branch on equal beq instruction. This
instruction uses the BType format that was listed previously in the discussion
on instruction formats.
```
beq : BType : func3 == 0b000, opcode == 0b110'0011;
```
As can be seen, the value constraints are that the instruction field func3 has
to be equal to 0, and the opcode field has to have the value 0x63. Note, the
radix of the numeric literal in each constraint is not restricted to binary, but
can be any of binary, octal, hexadecimal or decimal. The choice is up to the
author in terms of what is most convenient.
Another example using five instructions from the 16 bit RiscV instruction group
that all use format CR illustrates how not-equal constraints can also be used :
```
format CR[16] : Inst16Format {
fields:
unsigned func4[4];
unsigned rs1[5];
unsigned rs2[5];
unsigned op[2];
};
instructions Inst16[16] "isa::OpcodeEnum" : Inst16Format {
...
cjr : CR : func4 == 0b1000, rs1 != 0, rs2 == 0, op == 0b10;
cmv : CR : func4 == 0b1000, rs1 != 0, rs2 != 0, op == 0b10;
cebreak : CR : func4 == 0b1001, rs1 == 0, rs2 == 0, op == 0b10;
cjalr : CR : func4 == 0b1001, rs1 != 0, rs2 == 0, op == 0b10;
cadd : CR : func4 == 0b1001, rs1 != 0, rs2 != 0, op == 0b10;
...
};
```
## Decoder
The decoder definition specifies properties for the generated binary instruction
decoder. Multiple decoders may be specified in a .bin\_fmt file, but only one
can be selected, by passing in a command line option to the parser tool, for
parsing and code generation.
The decoder definition specifies 4 pieces of information: namespace, the opcode
enumeration type to use, decoder specific include files, and the instruction
groups for which to generate decoders for.
```
decoder RiscV32GV {
namespace mpact::sim::riscv::encoding;
opcode_enum = "isa32v::OpcodeEnum";
includes {
#include "some/include/file.h"
}
// Combine instruction groups RiscV32 I, M and F.
RiscV32G = {RiscVIInst32, RiscVMInst32, RiscVFInst32};
// Separate group for compact instructions.
RiscVCInst16;
};
```
### Namespace
The namespace property specifies the namespace in which the generated code will
be placed.
### Opcode Enumeration Type
The `opcode_enum` specifies the text string that references the opcode
enumeration type to use as the return value for the instruction decoder(s).
The binary instruction decoder generator is designed to work with the
representation independent decoder generator (go/mpact-sim-isa-decoder). As
such, it has to translate from the binary representation of instructions to the
opcode enumeration type defined in the generated files by that tool. In order
for this to work properly, the string literal has to specify the name of the
enumeration type as it needs to be used from within the namespaces (see the
detailed grammar rules), defined for the binary decoder generator. The name of
each instruction listed in the binary encoding description file must match the
name of an instruction in the representation independent decoder description
file for the enumeration type members to match up.
### Include Files
Include files specific to this decoder may be specified here. They will be
copied to the generated code.
### Instruction Groups
The remainder specifies the named instruction groups to generate decode
functions for. Multiple instruction groups can be combined into a single group
for purposes of generating only one decode function for their combined
instructions, as opposed to having separate functions for each one. Only
instruction groups that use the same format can be combined in this way. In the
example above, three instruction groups (integer, multiply/divide, and floating
point) are combined to generate a single decoder, while a separate decoder
function is generated for the compact, 16-bit instructions.
# Generated C++ Header Code
The binary decoder generator tool generates a set of functions in the namespace
defined in the description file (see detailed syntax). As mentioned previously,
a decode function is generated for each instruction group. The signature is
```
<enum type> Decode<group name>(<word type> inst_word);
```
Where &lt;enum type> is the string literal in the instruction group definition,
&lt;group name> is the name of the instruction group, and &lt;word type> is the
smallest unsigned integer type that fits the width of the instruction. When
called, the function will return the enum member that matches the word passed in
as an argument, or the enum member kNone (always defined), if there is no match.
The tool also generates extract functions for each field and overlay in the
formats that are defined in the description file. Each such extraction function
is defined within a nested namespace named after the format in which it is
defined. The extraction functions are defined as inline in the .h file to
improve performance without forcing every function to be included in the
executable, as many of these are not used by the decoder, but should be made
available to the user if desired.
The return type is the smallest signed/unsigned (according to the field/overlay
definition) integer type that fits the width of the field/overlay. The argument
type is the smallest unsigned integer type that fits the format width. The
following example shows the RiscV BType format and the extraction functions
generated:
<table>
<tr>
<td><code>format BType[32] : Inst32Format {</code>
<p>
<code> fields:</code>
<p>
<code> unsigned imm7[7];</code>
<p>
<code> unsigned rs2[5];</code>
<p>
<code> unsigned rs1[5];</code>
<p>
<code> unsigned func3[3];</code>
<p>
<code> unsigned imm5[5];</code>
<p>
<code> unsigned opcode[7];</code>
<p>
<code> overlays:</code>
<p>
<code> signed imm[13] = imm7[6], imm5[0], imm7[5..0], imm5[4..1], 0b0;</code>
<p>
<code>};</code>
</td>
</tr>
<tr>
<td><code>namespace BType {</code>
<p>
<code>inline uint8_t ExtractOpcode(uint32_t value) {</code>
<p>
<code> return value & 0x7f;</code>
<p>
<code>}</code>
<p>
<code>inline uint8_t ExtractImm7(uint32_t value) {</code>
<p>
<code> return (value >> 25) & 0x7f;</code>
<p>
<code>}</code>
<p>
<code>inline uint8_t ExtractRs1(uint32_t value) {</code>
<p>
<code> return (value >> 15) & 0x1f;</code>
<p>
<code>}</code>
<p>
<code>inline uint8_t ExtractImm5(uint32_t value) {</code>
<p>
<code> return (value >> 7) & 0x1f;</code>
<p>
<code>}</code>
<p>
<code>inline uint8_t ExtractRs2(uint32_t value) {</code>
<p>
<code> return (value >> 20) & 0x1f;</code>
<p>
<code>}</code>
<p>
<code>inline uint8_t ExtractFunc3(uint32_t value) {</code>
<p>
<code> return (value >> 12) & 0x7;</code>
<p>
<code>}</code>
<p>
<code>inline int16_t ExtractImm(uint32_t value) {</code>
<p>
<code> int16_t result;</code>
<p>
<code> result = (value & 0x80000000) >> 19;</code>
<p>
<code> result |= (value & 0x80) &lt;< 4;</code>
<p>
<code> result |= (value & 0x7e000000) >> 20;</code>
<p>
<code> result |= (value & 0xf00) >> 7;</code>
<p>
<code> result = result &lt;< 3;</code>
<p>
<code> result = result >> 3;</code>
<p>
<code> return result;</code>
<p>
<code>}</code>
<p>
<code>} // namespace BType</code>
</td>
</tr>
</table>
# Generating the Decoder Functions
A decoder function is generated for each instruction group, returning the enum
member for one of the instructions, if there is a match with that instruction,
or kNone if there is not.
## Algorithm
A new algorithm was developed to generate the decoder for the instructions in an
instruction group. The goal of the decoder is to be correct and reasonably fast
and efficient, but does not attempt to be optimal.
### Initialization
In the first stage of the algorithm only the equal to constraints for each
encoding are considered. The algorithm computes the mask of bits that are
constrained equal for each instruction in the instruction group. This is called
the *constraint mask* of the instruction.
### Creating Initial Encoding Groups
In the second stage of the algorithm, an initial set of *encoding groups* are
formed. The algorithm is shown below. Initially, the set of encoding groups is
empty. For each instruction encoding, consider each encoding group in turn. If
the intersection of the constraint mask of the encoding group with the
constraint mask of the encoding is zero, that is, there is no overlap between
the constrained bits in the encoding with the constrained bits in the encoding
group, then go to the next encoding group. If the intersection is non-zero,
select that encoding group.
If an encoding group was selected, add the current encoding to that group, and
update the encoding groups constraint mask with a bitwise and of the
instruction encodings constraint mask. If no encoding group was selected,
create a new encoding group, setting the constraint mask of the encoding group
to that of the instruction encoding. At the end of this stage, each instruction
encoding has been assigned to an encoding group, and it is guaranteed that all
encoding groups have a non-zero constraint mask. The algorithm is shown in the
following figure.
```
for each instruction encoding enc {
inst_enc_group = nullptr;
for each encoding group grp {
if ((grp->constraint_mask & enc->constraint_mask) == 0) continue
inst_enc_group = grp
break
}
if (inst_enc_group != nullptr) {
Add enc to inst_enc_group
inst_enc_group->constraint_mask &= enc->constraint_mask
} else {
Create a new encoding group enc_group
Add enc to enc_group
enc_group->constraint_mask = enc->constraint_mask
}
}
```
### Subdividing Encoding Groups
Once the initial encoding groups have been formed each encoding group is
subdivided recursively into smaller encoding groups based on the equal-to
constraints for the instructions in the encoding group.
Three quantities are computed for each encoding group. First, the *constant
mask*, the subset of the bits in the constraint mask that have the same
constraint value across all the instruction encodings in the encoding group.
Second, the *constant value*, which is the shared value of the bits in the
constant mask. Third, the *discriminator mask*, which is the bitset difference
between the constraint mask and the constant mask. A fourth quantity is used as
well, it is computed for subgroups when an encoding group is divided. This is
the *ignore mask*, which is the set of bits that have already been handled by an
encoding group higher in the hierarchy, and no longer need to be considered. For
instance, all 32 bit instructions in the RiscV32G ISA have 0b11 as the value of
the two lsbs. Only the top level decoder function needs to check the value of
these bits, any subsequent comparisons need consider at most the remaining 28
bits.
If the number of set bits in the discriminator is 0 an encoding group cannot be
subdivided, as there are no common bits by which to separate the instructions.
Otherwise an encoding group can be subdivided into 2^(1 &lt;<
popcount(discriminator)), where popcount(x) returns the number of set bits. For
instance, if a set of instruction encodings in a group have a constraint mask of
0x7f, a constant mask of 0x3, and thus a discriminator mask of 0x7c (the
discriminator mask has five bits sets), there can be a total of 32 (2^5)
subgroups created. In other words, the encodings in the encoding group can be
divided into 32 subgroups, and the subgroup for any instruction encoding can be
looked up by an index extracted from the bits in the instruction word that
correspond to the discriminator mask.
Pseudo code for the algorithm is shown below. No subgroups are created that are
either size 0 or the size of the group itself. Moreover, no subgroups are
created if the discriminator mask has only one bit set.
```
Refine(EncodingGroup G) {
for (int i = 0; i < (1 << popcount(G->discriminator_mask()); i++) {
subgroup = new EncodingGroup
subgroup->SetIgnore(G->ignore_bits() | G->discriminator_mask() |
G->constant_mask())
for each encoding enc in G {
if (discriminator_value(enc) != i) continue
subgroup->AddEncoding(enc)
}
if (subgroup->empty()) {
delete subgroup
continue
}
if (subgroup->size() == G.size()) {
delete subgroup
return
}
if (popcount(subgroup->discriminator()) >= 2) {
Refine(subgroup)
}
}
}
```
There is a property of encoding groups that is worth noting. If an encoding
group has no subgroups, and there are no instruction encodings in the group that
have not-equal constraints, and there are no equal constraints that involve
overlays with bit constants, then that encoding group is considered to be
*simple*. Instructions in a simple encoding group can have their opcode
determined using a lookup table without any explicit comparisons.
### Generating Code
The code generation process is the same for each instruction group listed in the
decoder definition. In addition to the top level decode function for the
instruction group, a separate decode function is generated for each encoding
group that will either directly return an opcode, or call the decode function
for a subgroup. The top level decoder function contains the code to call the top
level encoding groups decode function one after the other until one returns a
valid opcode value (not equal to `kNone`). If no valid opcode is found, `kNone`
is returned. If there is only a single top level encoding group the function is
still emitted as described. A future optimization would be to inline the next
level decode function directly instead. Below is the top level decode function
generated for RiscV32G `Inst32` instruction group:
```
isa::OpcodeEnum DecodeRiscVInst32(uint32_t inst_word) {
isa::OpcodeEnum opcode;
opcode = DecodeRiscVInst32_0(inst_word);
if (opcode != isa::OpcodeEnum::kNone) return opcode;
return opcode;
}
```
The decode functions for every encoding group follow the same pattern. First if
there is a non-zero constant\_mask for the encoding group the mask is applied to
the instruction word and the value is compared to the constant\_value. If the
values are not equal, the decode function returns `kNop`. If the encoding group
has subgroups, then the index value corresponding to the discriminator\_mask is
computed and used to select and call the function pointer for the next level
decode function, returning its return value. The RiscV32G 0 level decode
function is shown below:
```
isa::OpcodeEnum DecodeRiscVInst32_0(uint32_t inst_word) {
if ((inst_word & 0x3) != 0x3) return isa::OpcodeEnum::kNone;
uint32_t index;
index = (inst_word >> 2) & 0x1f;
return parse_group_RiscVInst32_0[index](inst_word);
}
```
The `parse_group_*` arrays are defined in the `.cc` file, and initialized with
the appropriate function pointers for the different encoding groups.
For leaf encoding groups that are simple, the decode function is almost
identical to the above function, except that the index is used to select from an
opcode array, defined static constexpr local to the decode function. The
following shows such a function for RiscV32 32 bit store instructions:
```
isa::OpcodeEnum DecodeRiscVInst32_0_8(uint32_t inst_word) {
static isa::OpcodeEnum opcodes[4] = {
isa::OpcodeEnum::kSb,
isa::OpcodeEnum::kSh,
isa::OpcodeEnum::kSw,
isa::OpcodeEnum::kNone,
};
if ((inst_word & 0x4000) != 0x0) return isa::OpcodeEnum::kNone;
uint32_t index;
index = (inst_word >> 12) & 0x3;
return opcodes[index];
}
```
For non-simple leaf decode functions, the code starts with field extractions
required for the constraints testing, then each opcode is considered in turn, as
shown below.
```
isa::OpcodeEnum DecodeRiscVInst16_0_12(uint16_t inst_word) {
uint16_t index;
index = (inst_word >> 12) & 0x1;
uint16_t rs1_value = (inst_word >> 7) & 0x1f;
uint16_t rs2_value = (inst_word >> 2) & 0x1f;
if ((index == 0x0) && (rs1_value != 0x0) && (rs2_value != 0x0))
return isa::OpcodeEnum::kCmv;
if ((index == 0x0) && (rs2_value == 0x0) && (rs1_value != 0x0))
return isa::OpcodeEnum::kCjr;
if ((index == 0x1) && (rs1_value == 0x0) && (rs2_value == 0x0))
return isa::OpcodeEnum::kCebreak;
if ((index == 0x1) && (rs1_value != 0x0) && (rs2_value != 0x0))
return isa::OpcodeEnum::kCadd;
if ((index == 0x1) && (rs2_value == 0x0) && (rs1_value != 0x0))
return isa::OpcodeEnum::kCjalr;
return isa::OpcodeEnum::kNone;
}
```
# Detailed Syntax of the .bin File
The full [Antlr4](https://www.antlr.org) grammar of the .isa file is found in
the file `BinFormat.g4)`.