| # MPACT-Sim Representation Independent Decoder Generator |
| |
| Last updated 11/28/22 |
| |
| # Goals |
| |
| * Separate instruction decoding into two parts: one, fully independent of |
| the actual instruction encoding and formatting, that manipulates the |
| simulator's internal instruction representation, and another that |
| extracts information from the actual encoding format. |
| * Provide a convenient format for describing the structure of the target |
| architecture ISA in terms of instruction bundles and slots, where bundles |
| are groups of instruction slots that are issued together. |
| * Provide a convenient format for describing the set of instructions (opcodes) |
| that may be issued in each slot. |
| * Provide a convenient format for describing each instruction, its predicate, |
| source and destination operands, the semantic function that implements its |
| instruction semantics, and its optional disassembly format, without |
| requiring description or knowledge of the underlying instruction encoding |
| format. |
| * Generate C++ source code that implements the representation independent part |
| of the decoder and the interface that the encoding specific part of the |
| decoder must implement. |
| |
| # Decoding Instructions |
| |
| MPACT-Sim uses instances of the Instruction class to represent individual target |
| architecture instructions. The Instruction class instance contains several key |
| items. Most importantly, it stores the C++ callable (a function, method, |
| lambda, or other function object) that implements the semantics of the |
| instruction, i.e., reads operands, computes result(s), and writes to one or more |
| of the destination operands. It also includes pointers to source and destination |
| operand interfaces, through which the source and destination operand values are |
| read and written. These Instruction class instances are cached in a translation |
| cache, and reused when the same instruction is re-executed. The caching allows |
| the simulator to amortize the cost of decoding an instruction across many |
| executions. |
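
The caching idea can be illustrated with a minimal sketch. The `Instruction` struct and `DecodeCache` class below are hypothetical stand-ins written for this example, not the MPACT-Sim classes; the sketch only shows that the decode cost is paid on a miss and skipped when the same address is executed again.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for the simulator's Instruction class.
struct Instruction {
  std::uint64_t address = 0;
  std::function<void(Instruction*)> semantic_function;
};

// Minimal decode cache: decode on a miss, reuse the instance on a hit.
class DecodeCache {
 public:
  using DecodeFn =
      std::function<std::unique_ptr<Instruction>(std::uint64_t)>;

  explicit DecodeCache(DecodeFn decode) : decode_(std::move(decode)) {}

  Instruction* Get(std::uint64_t address) {
    auto it = cache_.find(address);
    if (it == cache_.end()) {
      ++misses_;  // The (expensive) decoder runs only here.
      it = cache_.emplace(address, decode_(address)).first;
    }
    return it->second.get();
  }

  int misses() const { return misses_; }

 private:
  DecodeFn decode_;
  std::unordered_map<std::uint64_t, std::unique_ptr<Instruction>> cache_;
  int misses_ = 0;
};
```

A real translation cache would also need an invalidation policy (for example when simulated code is overwritten), which this sketch leaves out.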
| |
| The decoding of a target instruction extracts information from the instruction |
| encoding to populate fields in the Instruction class instance. Writing this code |
| by hand is tedious and repetitive. Similar code with slight variations has to be |
| written for each of the hundreds of instructions in a typical ISA. The textual |
| representation of the C++ code is also a far cry from how the instruction |
| information is presented in most ISA manuals. This makes the code bug-prone, |
| whether from mistranslating information from the ISA manual (extracting |
| bit-fields, for instance) or from faulty copy-paste-modify of the many repeated |
| code sequences. These bugs are typically tedious to debug, as they often |
| manifest themselves only as simulated code not executing quite right, e.g., a |
| wrong branch offset, or the wrong branch instruction being executed. |
| |
| A hand-written decoder is typically inextricably coupled to the instruction |
| representation, such as the standard binary instruction encoding. That makes it |
| more costly to repurpose the simulator to read instructions in different |
| formats, e.g., textual assembly. |
| |
| When using a simulator to perform architectural experiments it is desirable to |
| have a flexible encoding scheme. Often an existing encoding scheme may not have |
| room for all the architectural variations that need to be explored. A simple |
| experiment to measure the effect of doubling the number of registers becomes |
| difficult, as it would likely require a redesign and/or widening of the |
| instruction encoding. Decoupling parts of the decoder from the instruction |
| representation makes it easier and cheaper in terms of engineering resources to |
| use the simulator for architectural exploration. |
| |
| ## Representation Independent vs Dependent Decoding |
| |
| The instruction instance needs the following information in order to simulate an |
| instruction: |
| |
| * Semantic function - a C++ callable that implements the instruction semantics. |
| * Predicate operand (optional), used to determine if the instruction should |
| execute or not. |
| * Source operand interfaces used by the semantic function to read instruction |
| sources. |
| * Destination operand interfaces used by the semantic function to write the |
| results of the instruction execution. |
| * Instruction size. |
| * Instruction address. |
| |
| The first four of these are all dependent on the type and identity of the |
| instruction, or the instruction opcode. In the traditional binary encoding |
| scheme, the opcode of an instruction may be determined from a single field |
| across the instruction set (rare), or a single field in sub-groups of the |
| instruction set combined with an instruction format specifier (more common). The |
| opcode may have additional constraints based on values of operand fields in the |
| instruction word as well, as is the case in the RiscV architecture. |
| Additionally, a predicate field may override any other decoding and designate an |
| instruction statically as a nop. On the other hand, in a proto-based encoding |
| scheme, the opcode may be expressed as a single number (or enumeration type). |
| |
| Regardless of the representation, the semantic function and the operands all |
| depend on the opcode. Therefore, the first step in creating a representation |
| independent decoder is to abstract out the parsing of the opcode to an interface |
| implemented by a representation specific decoder. |
| |
| As mentioned above, the number and types of operands, such as register or |
| immediate, required by an instruction are determined by the opcode. On the other |
| hand, the exact value of a register number or an immediate, can only be |
| determined from the instruction representation. If factory methods for creating |
| instruction operands are implemented in the representation specific decoder, the |
| representation independent decoder can call these to obtain operand objects to |
| populate the Instruction instance. |
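
The split described above can be sketched as follows. Every name here (`EncodingInterface`, `Decode`, `BinaryEncoding`, the enum entries) is an assumption made for illustration and is not the interface that `decoder_gen` generates. The representation independent `Decode` function knows which operands an opcode needs; the representation specific class knows where to find them, here in a 32-bit RISC-V I-type word.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// All names below are hypothetical stand-ins, not the generated code.
enum class OpcodeEnum { kNone, kAddi };
enum class SourceOpEnum { kRs1, kIImm12 };
enum class DestOpEnum { kRd };

struct Operand {
  std::string text;  // Printable form: a register name or an immediate.
};

// Implemented once per instruction representation.
class EncodingInterface {
 public:
  virtual ~EncodingInterface() = default;
  virtual OpcodeEnum GetOpcode() = 0;
  virtual std::unique_ptr<Operand> GetSource(SourceOpEnum op) = 0;
  virtual std::unique_ptr<Operand> GetDestination(DestOpEnum op) = 0;
};

struct DecodedInstruction {
  OpcodeEnum opcode = OpcodeEnum::kNone;
  std::vector<std::unique_ptr<Operand>> sources;
  std::vector<std::unique_ptr<Operand>> destinations;
};

// Representation independent part: no bit manipulation happens here.
DecodedInstruction Decode(EncodingInterface& encoding) {
  DecodedInstruction inst;
  inst.opcode = encoding.GetOpcode();
  if (inst.opcode == OpcodeEnum::kAddi) {
    inst.sources.push_back(encoding.GetSource(SourceOpEnum::kRs1));
    inst.sources.push_back(encoding.GetSource(SourceOpEnum::kIImm12));
    inst.destinations.push_back(encoding.GetDestination(DestOpEnum::kRd));
  }
  return inst;
}

// Representation specific part for a 32-bit RISC-V I-type binary word.
class BinaryEncoding : public EncodingInterface {
 public:
  explicit BinaryEncoding(std::uint32_t word) : word_(word) {}

  OpcodeEnum GetOpcode() override {
    // Major opcode 0x13 (OP-IMM) with funct3 == 0 is addi.
    bool is_addi = (word_ & 0x7f) == 0x13 && ((word_ >> 12) & 0x7) == 0;
    return is_addi ? OpcodeEnum::kAddi : OpcodeEnum::kNone;
  }

  std::unique_ptr<Operand> GetSource(SourceOpEnum op) override {
    if (op == SourceOpEnum::kRs1) return Register((word_ >> 15) & 0x1f);
    // Sign-extend the 12-bit immediate held in bits 31:20.
    std::int32_t imm = static_cast<std::int32_t>(word_ >> 20);
    if (imm & 0x800) imm -= 0x1000;
    return std::make_unique<Operand>(Operand{std::to_string(imm)});
  }

  std::unique_ptr<Operand> GetDestination(DestOpEnum) override {
    return Register((word_ >> 7) & 0x1f);
  }

 private:
  static std::unique_ptr<Operand> Register(std::uint32_t number) {
    return std::make_unique<Operand>(Operand{"x" + std::to_string(number)});
  }

  std::uint32_t word_;
};
```

Supporting another representation, such as textual assembly, would then only require a second `EncodingInterface` implementation; `Decode` stays unchanged.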
| |
| The size of an instruction typically refers to its size in a particular |
| representation and again is a function of the opcode with no additional |
| information needed from the instruction representation. |
| |
| The instruction address is not part of the decoding except as a parameter to |
| identify the storage location of an instruction. |
| |
| # Representation Independent Decoder Description |
| |
| The information necessary to generate the representation independent decoder is |
| contained in an MPACT-Sim ISA description file. It is processed by |
| `decoder_gen`, a purpose-built tool based on the [Antlr4](https://www.antlr.org) |
| parser generator, which reads the description and generates the appropriate |
| code. BUILD rules have been set up in the `mpact_sim_isa.bzl` file to make it |
| easy to incorporate the generated code into the simulator project for any target |
| architecture. |
| |
| This section gives an overview of the contents of the description file and how |
| it fits in with MPACT-Sim and the code that gets generated. The syntax is |
| described in detail in the next section. |
| |
| ## Instruction Description |
| |
| Each instruction in the ISA is described separately. If the same operation, say |
| integer ADD, is implemented using multiple instructions, as long as they have |
| different encodings or different operands (for instance immediates vs registers, |
| unsigned immediate vs signed immediate), each such variation must have its own |
| description. |
| |
| Each instruction description can be divided into three components: the opcode |
| description, the semantic function specification, and the optional disassembly |
| format. |
| |
| ### Opcode |
| |
| #### Name |
| |
| The opcode description specifies the name, the operands, and any child |
| instructions (see go/mpact-sim-overview “Modeling Instruction Issue and |
| Semantics”). The name of the opcode has to be unique within the ISA description. |
| By convention it is written in snake-case, with no capital letters, though this |
| is not a requirement. The name of the opcode is used to generate the name of an |
| entry in the enumeration class `OpcodeEnum`, by prepending “k” to the |
| Pascal-case version of the name. That is, `add_i` becomes `kAddI`. This |
| enumeration class is generated in a separate .h file and is used both by the |
| generated code and in the representation specific decoder interface. |
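
The naming rule can be sketched as a small helper. `ToEnumName` is a hypothetical function written for this example; it assumes that underscores are the only word separators and that the case of all other characters is preserved, which matches the examples in this document but may not cover every rule the tool applies.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: converts a description name into an enum entry
// name by dropping underscores, capitalizing the first character of each
// word, and prepending 'k'.
std::string ToEnumName(const std::string& name) {
  std::string result = "k";
  bool capitalize = true;
  for (char c : name) {
    if (c == '_') {
      capitalize = true;  // The next character starts a new word.
      continue;
    }
    result += capitalize ? static_cast<char>(std::toupper(
                               static_cast<unsigned char>(c)))
                         : c;
    capitalize = false;
  }
  return result;
}
```

Under these assumptions `ToEnumName("add_i")` yields `kAddI`.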
| |
| #### Size |
| |
| Each opcode has a byte size associated with it; by default this is 1. The use of |
| the size is left to the simulator writer. For some ISAs it makes sense to have |
| the instruction size represent the PC increment when issuing instructions. For |
| some VLIW instructions, only the top-level bundle has an address, so the size of |
| the individual instructions doesn’t matter and can be left at 1. |
| |
| #### Operands |
| |
| The operands of an opcode are defined by a triplet: predicate operand, source |
| operand list, and destination operand list. Each individual operand is given a |
| name. This name is important. Just as the opcode name is used to create an |
| entry in an enum, each unique operand name is similarly added to an enum. There |
| are separate enum classes for each operand category: `PredOpEnum`, |
| `SourceOpEnum` and `DestOpEnum`. Just like for the `OpcodeEnum` class, the |
| operand name gets converted to Pascal-case and prefixed with a ‘k’. Thus, the |
| operand name `I_imm12` gives rise to the enum class entry `kIImm12`. Operands of |
| the same type (predicate/source/destination) that 1) have the same width, 2) |
| have the same zero/sign extension, and 3) refer to the same fields in the |
| instruction encoding should be given the same name. This minimizes the number of |
| distinct operand types and simplifies implementation of the representation |
| specific decoder, which needs to implement three methods, one each for the |
| predicate, source, and destination operand types. Each takes a value of the |
| corresponding operand enum class as a parameter and returns a new initialized |
| operand object. |
| |
| The source and destination operands are specified as comma separated lists of |
| operand names. The predicate operand is a single operand name. The three parts |
| are separated by ‘:’, and any of them may be left empty. |
| |
| Each destination operand can be specified with an instruction latency, that is, |
| how many cycles the result should be buffered before being written to the |
| destination object (register or other simulated state). A value of zero causes |
| an immediate update without buffering. A value of one writes the value back at |
| the end of the current simulated cycle, i.e., so that instructions issued in the |
| next simulated cycle can see the update. A value of two writes the value back |
| at the end of the next simulated cycle, etc. The distinction between zero and |
| one is significant mostly for ISAs where instructions can be issued with |
| parallel semantics, e.g., instructions in a VLIW instruction word, where no |
| instruction can see updates from any other instruction in the same word. |
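
The latency rules above can be modeled with a small sketch. `RegisterFile` is a hypothetical class written for this example and is not how MPACT-Sim buffers results; it only encodes the timing described here: latency 0 updates immediately, and latency n becomes visible after the n-th call to `Advance()`, which stands for the end of a simulated cycle.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model of latency-buffered destination writes.
class RegisterFile {
 public:
  void Write(int reg, int value, int latency) {
    if (latency == 0) {
      regs_[reg] = value;  // Immediate update, no buffering.
      return;
    }
    pending_.push_back({reg, value, latency});
  }

  // Called at the end of every simulated cycle.
  void Advance() {
    std::vector<Pending> remaining;
    for (Pending& p : pending_) {
      if (--p.cycles_left == 0) {
        regs_[p.reg] = p.value;  // Write back at the end of this cycle.
      } else {
        remaining.push_back(p);
      }
    }
    pending_ = std::move(remaining);
  }

  // Unwritten registers read as 0 in this sketch.
  int Read(int reg) const {
    auto it = regs_.find(reg);
    return it == regs_.end() ? 0 : it->second;
  }

 private:
  struct Pending {
    int reg;
    int value;
    int cycles_left;
  };
  std::map<int, int> regs_;
  std::vector<Pending> pending_;
};
```

With this model, two instructions in the same VLIW word that use latency 1 cannot observe each other's results, because neither write is applied until `Advance()` runs.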
| |
| The following shows an example of the opcode declaration for the 32-bit add |
| immediate instruction with 0 latency in the RiscV32i ISA: |
| |
| ``` |
| addi [4] { /*empty*/ : rs1, I_imm12 : rd(0) } |
| ``` |
| |
| #### Child Instructions |
| |
| The semantics of an instruction may be divided into multiple actions that are |
| performed at separate times during the simulated execution of that instruction. |
| For instance, a load instruction is typically divided into two, the address |
| calculation and memory request, and the write-back of the data fetched from |
| memory to the register. The second action (and any subsequent ones) is referred |
| to as a child instruction of the opcode. Child instructions are allocated to |
| separate Instruction instances and have their own operands and semantic |
| functions. In this case, |
| operands can be assigned to each child instruction by using parenthesized lists |
| of operand triplets. For instance a RiscV32i ISA load word instruction would be |
| described as follows: |
| |
| ``` |
| lw [4] { (/*empty*/ : rs1, I_imm12), (/*empty*/ : /*empty*/ : rd) } |
| ``` |
| |
| ### Semantic Function |
| |
| The instruction semantic function is a C++ callable that takes a pointer to the |
| instruction instance and implements the semantic operation of the instruction. |
| That is, it reads any source operands, performs the operation, and writes out |
| any results to the destination operands. An important part of the decoder is to |
| bind the correct semantic function to each instruction instance. |
| |
| To make it easier, the binding is expressed directly in the ISA description file |
| by adding a `semfunc` attribute to the opcode declaration. The semfunc attribute |
| takes a list of strings, one entry for each instruction/sub-instruction |
| specified, containing C++ code that is suitable to assign to the C++ callable, |
| including pointers to free functions, `std::bind` and `absl::bind_front` bound |
| methods and functions, as well as lambdas and functors. |
| |
| The example below shows the semantic functions for the above load word |
| instruction. |
| |
| ``` |
| lw [4] { (/*empty*/ : rs1, I_imm12), (/*empty*/ : /*empty*/ : rd) }, |
| semfunc: "&RV32ILw", "&RV32ILwChild"; |
| ``` |
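
To illustrate what kinds of C++ expressions such a string may contain, the sketch below assigns a free function pointer, a lambda, and a `std::bind` bound method to the same callable type. The `Instruction` struct, the `SemanticFunction` alias, and the functions are hypothetical and defined only for this example; the actual callable type is defined by MPACT-Sim.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical, heavily simplified instruction instance.
struct Instruction {
  int src = 0;
  int dst = 0;
};

// Assumed shape of the callable: takes a pointer to the instruction.
using SemanticFunction = std::function<void(Instruction*)>;

// A free function, referenced as "&AddOne" in a semfunc string.
void AddOne(Instruction* inst) { inst->dst = inst->src + 1; }

// A class whose method is bound to a particular object.
struct Scaler {
  int factor;
  void Scale(Instruction* inst) const { inst->dst = inst->src * factor; }
};

std::vector<SemanticFunction> MakeSemanticFunctions(const Scaler* scaler) {
  return {
      &AddOne,                                           // Free function.
      [](Instruction* inst) { inst->dst = -inst->src; },  // Lambda.
      std::bind(&Scaler::Scale, scaler,
                std::placeholders::_1),                  // Bound method.
  };
}
```

Any of the three expressions in the initializer list could serve as the text of a `semfunc` entry, provided the names it uses are visible through the include files of the slot.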
| |
| ### Disassembly |
| |
| One of the intended use cases for MPACT-Sim is modeling prototype ISAs or |
| prototype ISA extensions. It is not always the case that there is a full toolset |
| available for such architectures, including a disassembler and/or debugger. |
| Therefore, the addition of a disassembly capability makes a lot of sense. |
| MPACT-Sim comes with a simple interactive user interface to step, run, set/clear |
| breakpoints, read and write memory and registers, etc. More complex interfaces |
| can easily be built, as well as custom simulator drivers. Being able to see the |
| disassembly of the instruction that is being executed is very valuable, and |
| helps debugging should there be any suspected issues with the instruction |
| semantics or decoding. |
| |
| The disassembly format is specified as the `disasm` opcode attribute, similarly |
| to the semantic function attribute. The argument to `disasm` is a list of text |
| strings that describes how the instruction should be disassembled. The list may |
| contain one or more strings that, when formatted, are concatenated to a single |
| disassembly string. The use of the `global disasm` widths declaration allows |
| for each individual string to be left or right justified within a fixed width |
| field. The `global disasm` declaration takes a brace delimited list of integers, |
| one for each field to format. The sign of the integer specifies either left (-) |
| or right (+) justified, while the absolute value specifies the width (similar to |
| C style format strings). |
| |
| E.g., the following specifies that the first string will be left justified in a |
| field 18 characters wide. The remaining strings will be concatenated. |
| |
| ``` |
| global disasm = {-18}; |
| ``` |
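
The effect of the width list can be sketched as follows. `FormatDisasm` is a hypothetical helper written for this example, not the generated disassembly code; it assumes that strings longer than their field are emitted unpadded and untruncated, which this document does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: pads fragment i to |widths[i]| characters, left
// justified for negative widths and right justified for positive ones.
// Fragments without a width entry are appended unchanged.
std::string FormatDisasm(const std::vector<int>& widths,
                         const std::vector<std::string>& fragments) {
  std::string out;
  for (std::size_t i = 0; i < fragments.size(); ++i) {
    const std::string& text = fragments[i];
    if (i >= widths.size()) {
      out += text;
      continue;
    }
    std::size_t width = static_cast<std::size_t>(std::abs(widths[i]));
    std::string pad(width > text.size() ? width - text.size() : 0, ' ');
    out += widths[i] < 0 ? text + pad : pad + text;
  }
  return out;
}
```

With widths `{-18}`, the fragments `"slli"` and `"x1, x2, 0x3"` produce `slli` padded to 18 characters, followed directly by the operand text.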
| |
| Any term in the string following an unescaped ‘%’ sign is interpreted to require |
| string substitution. Typically the string substitution is performed for operands |
| of the instruction. Each operand class has a `ToString()` method that returns |
| the preferred string representation of the operand value. For instance, register |
| operands return the register name, whereas immediate operands return the |
| immediate value. More complex formatting can be performed using a ‘%(<expr>)’ |
| construct, which allows a simple expression to be used as well as formatting the |
| value in hexadecimal, octal or binary. |
| |
| The disassembly format applies only to the main instruction, not child |
| instructions. |
| |
| Below is an example of the RiscV32i slli (shift left logical immediate) |
| instruction description including the disassembly format. |
| |
| ``` |
| slli[4] { /*empty*/ : rs1, I_uimm5 : rd }, |
| disasm: "slli", "%rd, %rs1, 0x%(I_uimm5:x)", |
| semfunc: "&RV32ISll"; |
| ``` |
| |
| ## Slots and Bundles: Supporting VLIW Architectures |
| |
| As discussed so far the instruction description easily supports single issue |
| ISAs, that is, where the instructions have sequential semantics, regardless of |
| how an implementation may issue them. Most traditional architectures fall into |
| this category. However, VLIW ISAs impose some additional structure on the ISA, |
| and the MPACT-Sim isa description language has features to support these. |
| |
| ### Slots |
| |
| A *slot* is an instruction position in a VLIW word. In its simplest form, a VLIW |
| word consists of exactly one slot, and that is how traditional non-VLIW ISAs are |
| modeled in this description. True VLIW ISAs have a number of slots. The slots |
| may be identical, that is, any instruction can be issued from any slot, or they |
| can differ, restricting which instructions can be issued from which slots. The |
| MPACT-Sim isa description supports both cases. |
| |
| A slot definition specifies an identifier as the slot name and an optional comma |
| separated list of slot names to inherit from (see below). The slot body |
| contains an optional include file section, a set of `default` declarations and |
| an `opcodes` specification, which contains all the opcode definitions valid for |
| this slot. |
| |
| The default declarations allow for specifying the default size of instructions, |
| default latency for destination operands, and default opcode attributes |
| (semantic function and disassembly format), so that they don’t have to be |
| specified in opcode descriptions except when the value differs. An example is |
| shown below: |
| |
| ``` |
| slot riscv32i { |
| includes { |
| // Any include files containing definitions used in the semfunc |
| // attributes. |
| #include "some/include/file.h" |
| } |
| default size = 4; |
| default latency = 0; |
| default opcode = |
| disasm: "Illegal instruction at 0x%(@:08x)", |
| semfunc: "&RV32IllegalInstruction"; |
| opcodes { |
| ... |
| } |
| } |
| ``` |
| |
| The ‘@’ sign in the disassembly format represents the instruction address. |
| |
| The opcode specification is done as previously described. |
| |
| The tool generates a separate C++ decode function for each slot type that is |
| used in the ISA. |
| |
| #### Slot Inheritance |
| |
| In some ISAs there may be a subset of instructions that can be issued from |
| multiple slots. Instead of requiring that the opcodes be defined anew in each |
| such slot, the notion of slot inheritance is introduced. A slot inheriting from |
| another slot inherits all of the opcodes from the base slot, except those that |
| are marked “deleted” in the derived slot. For instance: |
| |
| ``` |
| slot base { |
| opcodes { |
| one [4] {}; |
| two [4] {}; |
| } |
| } |
| |
| slot derived : base { |
| opcodes { |
| two = delete; // Only inherits opcode 'one'. |
| } |
| } |
| ``` |
| |
| Slot inheritance allows the ISA to be divided into subgroups by function if so |
| desired. In the RiscV32G description, each subgroup of the ISA is defined in a |
| separate slot, and then combined in a final slot that is used in the ISA. |
| |
| ``` |
| slot riscv32 : riscv32i, riscv32c, riscv32m, riscv32a, riscv32f, riscv32d, zicsr, zfencei { |
| // default attributes for any instructions not otherwise matched. |
| default opcode = |
| disasm: "Illegal instruction at 0x%(@:08x)", |
| semfunc: "&RV32IllegalInstruction"; |
| } |
| ``` |
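
The inheritance rule can be modeled as a set computation. The `Slot` struct and `ResolveOpcodes` function below are hypothetical and written for this example; they assume that the opcodes of all base slots are merged, that opcodes marked deleted are then removed, and that the slot's own opcodes are added last. How the tool treats the same opcode arriving from several bases is not described here and is not modeled.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a slot: base slots, own opcodes, deleted opcodes.
struct Slot {
  std::vector<const Slot*> bases;
  std::set<std::string> opcodes;
  std::set<std::string> deleted;
};

// Returns the names of all opcodes that can be issued from the slot.
std::set<std::string> ResolveOpcodes(const Slot& slot) {
  std::set<std::string> result;
  for (const Slot* base : slot.bases) {
    std::set<std::string> inherited = ResolveOpcodes(*base);
    result.insert(inherited.begin(), inherited.end());
  }
  for (const std::string& name : slot.deleted) {
    result.erase(name);  // Corresponds to `name = delete;`.
  }
  result.insert(slot.opcodes.begin(), slot.opcodes.end());
  return result;
}
```

Applied to the `base` and `derived` slots from the first example in this section, the function returns only `one` for `derived`.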
| |
| #### Templated Slots |
| |
| A base slot can also be a templated slot. This allows, for instance, destination |
| operand latencies to be specified as an expression involving one or more |
| template parameters. Inherited opcodes are then evaluated in terms of the actual |
| template arguments. Currently only integer valued template arguments are |
| supported. The syntax is unsurprisingly familiar: |
| |
| ``` |
| template <int a, int b> |
| slot base_templated { |
| opcodes { |
| one [4] { : rs1, rs2 : rd(a + 1) }; |
| two [4] { : rs1, rs2 : rd(a + b + 1) }; |
| } |
| } |
| |
| slot derived : base_templated<1, 3> { |
| … |
| } |
| ``` |
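
As a worked check of the arithmetic, the C++ analogue below evaluates the two latency expressions for the arguments `<1, 3>`. It assumes that template arguments are substituted the same way as C++ non-type template parameters; the struct and constant names are hypothetical and are not generated by the tool.

```cpp
#include <cassert>

// Hypothetical C++ analogue of the templated slot above.
template <int a, int b>
struct BaseTemplated {
  static constexpr int kOneLatency = a + 1;      // rd(a + 1)
  static constexpr int kTwoLatency = a + b + 1;  // rd(a + b + 1)
};

// Corresponds to: slot derived : base_templated<1, 3>
using Derived = BaseTemplated<1, 3>;

static_assert(Derived::kOneLatency == 2, "1 + 1");
static_assert(Derived::kTwoLatency == 5, "1 + 3 + 1");
```

Under that assumption, opcode `one` in `derived` has a destination latency of 2 and opcode `two` a latency of 5.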
| |
| ### Bundles |
| |
| A traditional VLIW instruction word is a bundle of slots with instructions that |
| are issued at the same time. However, some VLIW ISAs go further and divide their |
| slots into subgroups that can be issued at different times in the pipeline, or |
| even with a variable delay. MPACT-Sim allows bundles to be defined. A bundle |
| definition specifies the name of the bundle. The bundle body has two sections: |
| bundles and slots, that list the names of other bundle and slot definitions that |
| make up the current bundle. |
| |
| Each bundle definition used in the isa will have a DecodeFunction generated for |
| it. |
| |
| ### ISA |
| |
| The top level of the MPACT-Sim isa specification is the “isa” definition. There |
| may be more than one isa definition in a .isa file. The isa for which to |
| generate code is specified in an option to the isa-parsing tool. The “isa” |
| definition specifies the name of the isa, the namespace within which the C++ |
| code will be generated, and the set of slots and bundles that make up the isa, |
| similarly to a bundle definition. |
| |
| The simple isa definition for RiscV32G is shown below: |
| |
| ``` |
| isa RiscV32G { |
| namespace mpact::sim::riscv::isa32; |
| slots { riscv32; } |
| } |
| ``` |
| |
| ### Constants |
| |
| Integer typed constants can be declared both at the global level and within |
| slots to give names to values. These constants can be used in expressions with |
| other constants and integer literals wherever integer values can be used (e.g., |
| instruction latencies). E.g., |
| |
| ``` |
| int global_latency = 1; |
| |
| slot myslot { |
| int my_latency = global_latency + 1; |
| ... |
| } |
| ``` |
| |
| ### Include Files |
| |
| In addition to the include files specified within each slot, as described |
| previously, a set of include files can also be specified at the global level. |
| While those specified within a slot are only added to the generated code if the |
| slot is reachable from the top level isa, the global include files are always |
| included in the generated code. |
| |
| ``` |
| includes { |
| #include "include/a/global/file.h" |
| } |
| ``` |
| |
| # Detailed Syntax of .isa File |
| |
| The full [Antlr4](https://www.antlr.org) grammar of the .isa file is found in |
| the file `InstructionSet.g4`. |