# Teak Instruction Set Encoding

Each opcode is one or two words wide. In this code base, two-word opcodes are generally referred to as "expanded". The optional second word is used purely for parameters, so the opcode family can always be determined from the first word (which in turn determines whether there is a second word). The encoding is very messy. Its general themes are:
 - The encoding seems to have been designed first for TeakLite (which is already a mess), with additional opcodes from Teak then squeezed into formerly unused locations.
 - While registers `r0`~`r7` generally have equal status in the computation model, they do not always have equal status in the opcodes. Specifically, because `r6` is not present in TeakLite and because of the opcode squeezing, many opcodes involving `r6` are encoded separately with seemingly random patterns.
 - Similarly, there are opcodes that belong to the same family but have different patterns for different arguments.
 - Some specific combinations of arguments can be invalid for one opcode, and that bit pattern can then be used for a totally different opcode.
 - Some opcodes have unused bits, where setting either 0 or 1 has no effect.

## The format of the decoder table

`decoder.h` contains the full table of the instruction set. Each entry in the table is in the form:
```
INST(name, pattern[, slot0[...]])[.EXCEPT(except_slot)[..]]
```
An entry represents the match rules for an opcode family. Matching proceeds as follows:
 - The opcode is matched against the pattern, with some bits masked out (i.e. not compared). The masked bits are provided by all the `slot`s. If the pattern doesn't match, try a different entry.
 - Then the opcode is matched against each EXCEPT clause. Each `except_slot` provides its own pattern and masked bits. If any pattern in EXCEPT matches, try a different entry.
 - Now this entry is successfully matched with the opcode. Each `slot` extracts its relevant bits from the opcode and converts them into an argument.
 - The function `Visitor::name` is called with the converted arguments.
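
The first two steps of the procedure above can be sketched as follows. This is a simplified model, not the code in `decoder.h`: the types `ExceptRule` and `TableEntry`, the function `Matches`, and the demo entry are hypothetical names and values invented for illustration, and the sketch assumes each entry has already been reduced to a pattern plus a mask of compared bits.

```cpp
#include <cstdint>

// Hypothetical model of one EXCEPT clause: the entry is rejected when the
// compared bits of the opcode equal the compared bits of `pattern`.
struct ExceptRule {
    std::uint16_t pattern;
    std::uint16_t mask; // 1 = bit is compared, 0 = bit is ignored
};

// Hypothetical model of one table entry after its slots have been folded
// into a single mask.
struct TableEntry {
    std::uint16_t pattern;
    std::uint16_t mask;        // 1 = compared, 0 = claimed by a slot
    const ExceptRule* excepts; // may be nullptr when except_count == 0
    int except_count;
};

constexpr bool Matches(const TableEntry& entry, std::uint16_t opcode) {
    // Step 1: compare only the bits that no slot has claimed.
    if ((opcode & entry.mask) != (entry.pattern & entry.mask)) return false;
    // Step 2: any matching EXCEPT clause sends the decoder to another entry.
    for (int i = 0; i < entry.except_count; ++i) {
        const ExceptRule& rule = entry.excepts[i];
        if ((opcode & rule.mask) == (rule.pattern & rule.mask)) return false;
    }
    return true;
}

// Invented demo entry: pattern 0x1230 with a 4-bit slot in bits 0~3,
// except when that slot holds 0xF.
constexpr ExceptRule kDemoExcepts[] = {{0x000F, 0x000F}};
constexpr TableEntry kDemoEntry = {0x1230, 0xFFF0, kDemoExcepts, 1};
```

With this demo entry, `Matches(kDemoEntry, 0x1234)` accepts the opcode, while `0x123F` is rejected by the EXCEPT rule and `0x1334` is rejected by the pattern itself.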

## Comparison between the decoder table and gbatek

The decoder table is derived from the table in gbatek, but rearranged slightly according to the opcode families and patterns. The notations of the two tables are comparable. For example:

```
teakra: INST(push, 0xD7C8, At<Abe, 1>, Unused<0>)
gbatek: D7C8h TL2 push Abe@1, Unused1@0
```
This means that the opcode has a parameter `Abe` starting at bit 1 (2 bits long, as defined by the operand `Abe`) and an unused bit at bit 0. The remaining bits of the opcode (bits 3~15) must match the pattern `0xD7C8`. The assembly of this opcode would look like `push <Abe>`.
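
As a minimal sketch of what this entry implies at the bit level, the helpers below test the `push` pattern and pull out the raw `Abe` field. The names `kPushPattern`, `kPushMask`, `IsPush`, and `PushAbe` are hypothetical and do not appear in teakra; the mapping from the 2-bit value to a concrete register is left out because it is defined by the `Abe` operand type.

```cpp
#include <cstdint>

constexpr std::uint16_t kPushPattern = 0xD7C8;
// Bits 3~15 are compared; bits 1~2 (Abe) and bit 0 (unused) are masked out.
constexpr std::uint16_t kPushMask = 0xFFF8;

constexpr bool IsPush(std::uint16_t opcode) {
    return (opcode & kPushMask) == kPushPattern;
}

// Returns the raw 2-bit Abe field stored at bits 1~2.
constexpr unsigned PushAbe(std::uint16_t opcode) {
    return (opcode >> 1) & 0x3;
}
```

For example, `0xD7CC` and `0xD7CD` both satisfy `IsPush` and both yield the raw `Abe` value 2, since they differ only in the unused bit 0.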

Sometimes there is an 18-bit address parameter that is split into two parts:

```
teakra: INST(br, 0x4180, At<Address18_16, 16>, At<Address18_2, 4>, At<Cond, 0>)
gbatek: 4180h TL  br   Address18@16and4, Cond@0
```
Note that the presence of `At<..., 16>` also indicates that this opcode is two words long. The pattern `0x4180` is only matched against the first word.
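
The split can be illustrated with a small sketch. It rests on two assumptions that the table excerpt above does not state explicitly: that the 16-bit part in the second word holds the low bits of the address while the 2-bit part in the first word holds the high bits, and that `Cond` occupies the four bits below the address part. The function names are hypothetical.

```cpp
#include <cstdint>

// Bits 6~15 of the first word are compared against 0x4180; bits 4~5 hold
// the 2-bit address part and bits 0~3 are assumed to hold Cond.
constexpr bool IsBr(std::uint16_t first_word) {
    return (first_word & 0xFFC0) == 0x4180;
}

// Assumption: the second word is the low 16 bits of the address and the
// 2-bit field in the first word supplies bits 16~17.
constexpr std::uint32_t BrAddress(std::uint16_t first_word, std::uint16_t second_word) {
    return (static_cast<std::uint32_t>((first_word >> 4) & 0x3) << 16) | second_word;
}

// Assumption: Cond is 4 bits wide, starting at bit 0.
constexpr unsigned BrCond(std::uint16_t first_word) {
    return first_word & 0xF;
}
```

Under these assumptions, the word pair `0x41A3 0x1234` decodes to a `br` with address `0x21234` and raw condition value 3.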

Some opcodes that share the same pattern are merged into one opcode in teakra. For example:
```
teakra: INST(alm, 0xA000, At<Alm, 9>, At<MemImm8, 0>, At<Ax, 8>),
gbatek: {
        A000h TL  or   MemImm8@0, Ax@8
        A200h TL  and  MemImm8@0, Ax@8
        ...
        BE00h TL  cmpu MemImm8@0, Ax@8
}
```
Here the operation name is also treated as an operand `Alm` in teakra.
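
To make the merge concrete, the sketch below splits an `alm` opcode into its three fields. The field widths are inferred from the gbatek listing rather than taken from teakra's operand definitions: the operations run from `A000h` to `BE00h` in steps of `200h`, which suggests a 4-bit `Alm` at bits 9~12, leaving a 1-bit `Ax` at bit 8 and an 8-bit `MemImm8` at bits 0~7. `AlmFields` and `DecodeAlm` are hypothetical names.

```cpp
#include <cstdint>

struct AlmFields {
    unsigned alm;      // operation selector (0 = or, 1 = and, ..., 15 = cmpu)
    unsigned ax;       // raw 1-bit accumulator selector
    unsigned mem_imm8; // 8-bit memory immediate
};

// Only bits 13~15 belong to the pattern 0xA000.
constexpr bool IsAlm(std::uint16_t opcode) {
    return (opcode & 0xE000) == 0xA000;
}

constexpr AlmFields DecodeAlm(std::uint16_t opcode) {
    return AlmFields{(opcode >> 9) & 0xFu, (opcode >> 8) & 0x1u, opcode & 0xFFu};
}
```

With these widths, `0xA200` yields `alm == 1` (the `and` row in gbatek) and `0xBE00` yields `alm == 15` (the `cmpu` row), which matches the order of the gbatek listing.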

Opcodes whose constant operands have a special encoding are marked in their opcode names:
```
teakra: INST(sub_p1, 0xD4B9, At<Ax, 8>),
gbatek: D4B9h TL2 sub  p1, Ax@8
```

However, if several opcodes with constants have different encodings but similar semantics, the constants are placed in the parameter list and the entries are delegated to the same opcode:
```
teakra: {
        INST(alm_r6, 0xD388, Const<Alm, 0>, At<Ax, 4>),
        INST(alm_r6, 0xD389, Const<Alm, 1>, At<Ax, 4>),
        ...
        INST(alm_r6, 0x9462, Const<Alm, 8>, At<Ax, 0>),
        ...
        INST(alm_r6, 0x5F41, Const<Alm, 13>, Const<Ax, 0>),
        INST(alm_r6, 0x9062, Const<Alm, 14>, At<Ax, 8>, Unused<0>),
        INST(alm_r6, 0x8A63, Const<Alm, 15>, At<Ax, 3>),
}
gbatek: {
        D388h TL2 or   r6, Ax@4
        D389h TL2 and  r6, Ax@4
        ...
        9462h TL2 msu  y0, r6, Ax@0 // y0 is implied by msu
        ...
        5F41h TL2 sqr  r6 // Ax is implied/unused for sqr
        9062h TL2 sqra r6, Ax@8, Unused1@0
        8A63h TL2 cmpu r6, Ax@3
}
```
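
The effect of the `Const` slots can be sketched as a decoder that maps several unrelated bit patterns onto one result type. This is an illustration only: it covers just four of the entries listed above, assumes `Ax` is a single bit, and uses the hypothetical names `AlmR6` and `DecodeAlmR6`. The real table may also apply EXCEPT clauses that are not modelled here.

```cpp
#include <cstdint>

struct AlmR6 {
    bool matched; // false when none of the sketched entries apply
    unsigned alm; // operation selector supplied by Const<Alm, n>
    unsigned ax;  // from At<Ax, bit> or Const<Ax, n>
};

constexpr AlmR6 DecodeAlmR6(std::uint16_t opcode) {
    // INST(alm_r6, 0xD388, Const<Alm, 0>, At<Ax, 4>): bit 4 is masked out.
    if ((opcode & 0xFFEF) == 0xD388) return {true, 0, (opcode >> 4) & 1u};
    // INST(alm_r6, 0xD389, Const<Alm, 1>, At<Ax, 4>)
    if ((opcode & 0xFFEF) == 0xD389) return {true, 1, (opcode >> 4) & 1u};
    // INST(alm_r6, 0x5F41, Const<Alm, 13>, Const<Ax, 0>): no bits masked out.
    if (opcode == 0x5F41) return {true, 13, 0};
    // INST(alm_r6, 0x8A63, Const<Alm, 15>, At<Ax, 3>): bit 3 is masked out.
    if ((opcode & 0xFFF7) == 0x8A63) return {true, 15, (opcode >> 3) & 1u};
    return {false, 0, 0};
}
```

All four patterns end up in the same result shape, so a single handler can treat `or r6`, `and r6`, `sqr r6`, and `cmpu r6` uniformly even though their encodings have nothing in common.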