macOS Arm64 Assembly, part 2

  • flow control
  • branch
  • flags
  • a basic printer

Unconditional branch

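An unconditional branch (jump).

B label

Jumps directly to label. In the encoding, the target is PC plus a signed 26-bit offset counted in 4-byte instructions, which gives a reach of ±128 MB. In practice the clang toolchain takes care of targets that fall outside that range, so you only need to write the label name and do not have to worry about whether the distance exceeds what a 26-bit offset can express.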

Condition Flags

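  • Negative: N is 1 if the result, read as a signed number, is negative; otherwise it is 0.
  • Zero: Z is 1 if the result is 0; it is typically used to report the outcome of an equality comparison.
  • Carry: for addition-type operations, C is 1 if there is a carry out (unsigned overflow), otherwise 0; for subtraction it is the reverse (1 means no borrow occurred). On 32-bit ARM this flag also receives the last bit shifted out by a flag-setting shift; the A64 shift instructions do not update the flags.
  • oVerflow: for addition or subtraction, V is set to 1 if signed overflow occurred.

Some instructions use the oVerflow flag to signal an error condition.
Most instructions only update the flags when an S suffix is appended to the mnemonic (ADDS, SUBS, ANDS, ...); without it the flags are left untouched. The comparison instructions are the exception and always set the flags.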
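
To make the four flags concrete, here is a minimal C sketch of how a 64-bit flag-setting add and subtract would derive N, Z, C and V. The `Flags` struct and the function names `flags_adds` and `flags_subs` are illustrative assumptions, not part of any toolchain; the sketch only models the register-register forms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the four condition flags. */
typedef struct {
    bool n, z, c, v;
} Flags;

/* Model of ADDS Xd, Xn, Xm: the flags the addition would set. */
Flags flags_adds(uint64_t a, uint64_t b) {
    uint64_t r = a + b;                       /* wraps modulo 2^64 */
    Flags f;
    f.n = (r >> 63) & 1;                      /* sign bit of the result */
    f.z = (r == 0);
    f.c = (r < a);                            /* unsigned carry out */
    /* signed overflow: both operands share a sign that the result lacks */
    f.v = ((~(a ^ b) & (a ^ r)) >> 63) & 1;
    return f;
}

/* Model of SUBS Xd, Xn, Xm, and therefore of CMP Xn, Xm. */
Flags flags_subs(uint64_t a, uint64_t b) {
    uint64_t r = a - b;
    Flags f;
    f.n = (r >> 63) & 1;
    f.z = (r == 0);
    f.c = (a >= b);                           /* C = 1 means "no borrow" */
    /* signed overflow: operands differ in sign and the result's sign differs from a */
    f.v = (((a ^ b) & (a ^ r)) >> 63) & 1;
    return f;
}
```

Calling `flags_subs(3, 10)` models `CMP` with 3 and 10: the result is negative and a borrow occurred, so N is 1 and C is 0.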

Branch on Condition

A conditional branch.

B.{condition} label

where {condition} is taken from the table below.

{condition}  Flags                       Meaning
EQ           Z set                       Equal
NE           Z clear                     Not equal
CS or HS     C set                       Higher or same (unsigned >=)
CC or LO     C clear                     Lower (unsigned <)
MI           N set                       Negative
PL           N clear                     Positive or zero
VS           V set                       Overflow
VC           V clear                     No overflow
HI           C set and Z clear           Higher (unsigned >)
LS           C clear or Z set            Lower or same (unsigned <=)
GE           N and V the same            Signed >=
LT           N and V differ              Signed <
GT           Z clear, N and V the same   Signed >
LE           Z set, or N and V differ    Signed <=
AL           Any                         Always (same as no suffix)

"set" means the flag is 1; "clear" means it is 0.
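
The table can also be read as a set of boolean expressions over the flags. The following C sketch spells them out; `cond_holds` is a hypothetical helper name, and LS and LE are taken with "or" (C clear or Z set; Z set or N differs from V), which is how the Arm architecture defines them.

```c
#include <stdbool.h>
#include <string.h>

/* Returns whether the two-letter condition `cond` (e.g. "GE") holds for the
   given N, Z, C, V flag values. Unknown names return false. */
bool cond_holds(const char *cond, bool n, bool z, bool c, bool v) {
    if (!strcmp(cond, "EQ")) return z;
    if (!strcmp(cond, "NE")) return !z;
    if (!strcmp(cond, "CS") || !strcmp(cond, "HS")) return c;
    if (!strcmp(cond, "CC") || !strcmp(cond, "LO")) return !c;
    if (!strcmp(cond, "MI")) return n;
    if (!strcmp(cond, "PL")) return !n;
    if (!strcmp(cond, "VS")) return v;
    if (!strcmp(cond, "VC")) return !v;
    if (!strcmp(cond, "HI")) return c && !z;
    if (!strcmp(cond, "LS")) return !c || z;
    if (!strcmp(cond, "GE")) return n == v;
    if (!strcmp(cond, "LT")) return n != v;
    if (!strcmp(cond, "GT")) return !z && n == v;
    if (!strcmp(cond, "LE")) return z || n != v;
    if (!strcmp(cond, "AL")) return true;
    return false;
}
```

For instance, comparing 3 with 10 leaves N=1, Z=0, C=0, V=0, so `cond_holds("LT", 1, 0, 0, 0)` and `cond_holds("LO", 1, 0, 0, 0)` are both true while `cond_holds("GE", 1, 0, 0, 0)` is false.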

CMP

The instruction used for comparisons.

CMP Xn, Operand2

In fact CMP is itself an alias; it is equivalent to

SUBS XZR, Xn, Operand2

CMN / TST

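These are also comparison instructions, but they differ from CMP in how the comparison is computed: CMN uses addition in place of CMP's subtraction (it sets the flags for Xn + Operand2), and TST uses a bitwise AND (it sets the flags for Xn & Operand2).

When hand-writing assembly, if Operand2 cannot be encoded for the instruction you wrote, the assembler may switch to an equivalent form automatically; for example, a CMP against a negative immediate can be assembled as a CMN against the corresponding positive value.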

Loop

For Loop

For C code like this:

int main(){
    for(int i = 0; i < 10; i++){
    }
    return 0;
}

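This can be written as:

// Assume X5 holds i (initialised to 0 before the loop)
loop:
    cmp X5, #10
    b.ge loop_end
    // start of for loop body
    // todo: ...
    add X5, X5, #1 // i++
    b loop
loop_end: // end of for loop

This is a simplified version of what clang generates. The real output keeps the variable on the stack and replaces cmp with subs at compile time; only the structure is kept here.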
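
The same compare, branch-out, increment, branch-back shape can be written in C with goto, which may make the mapping between the labels and the loop easier to see. This is only a sketch: `count_iterations` and its `body_runs` counter are hypothetical names added so that the effect of the loop can be observed.

```c
/* Counts how many times the loop body runs, using the same
   compare / branch-out / increment / branch-back shape as the assembly. */
int count_iterations(int limit) {
    int i = 0;          /* plays the role of X5 */
    int body_runs = 0;  /* hypothetical counter so the result is observable */
loop:
    if (i >= limit) goto loop_end;   /* cmp X5, #limit ; b.ge loop_end */
    body_runs++;                     /* loop body */
    i++;                             /* add X5, X5, #1 */
    goto loop;                       /* b loop */
loop_end:
    return body_runs;
}
```

Because the test sits at the top of the loop, a limit of 0 or below skips the body entirely, just as the C for loop would.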

clang implementation
t[0x100000328] <+0>:  sub    sp, sp, #0x10
t[0x10000032c] <+4>: str wzr, [sp, #0xc]
t[0x100000330] <+8>: str wzr, [sp, #0x8]
t[0x100000334] <+12>: b 0x100000338 ; <+16> at t.c:2:18
t[0x100000338] <+16>: ldr w8, [sp, #0x8]
t[0x10000033c] <+20>: subs w8, w8, #0xa
t[0x100000340] <+24>: b.ge 0x10000035c ; <+52> at t.c:4:3
t[0x100000344] <+28>: b 0x100000348 ; <+32> at t.c:3:3
t[0x100000348] <+32>: b 0x10000034c ; <+36> at t.c:2:27
t[0x10000034c] <+36>: ldr w8, [sp, #0x8]
t[0x100000350] <+40>: add w8, w8, #0x1
t[0x100000354] <+44>: str w8, [sp, #0x8]
t[0x100000358] <+48>: b 0x100000338 ; <+16> at t.c:2:18
t[0x10000035c] <+52>: mov w0, #0x0 ; =0
t[0x100000360] <+56>: add sp, sp, #0x10
t[0x100000364] <+60>: ret

While Loop

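Similar to the for loop; the difference is that the loop statement itself has no loop variable to maintain, so any update happens inside the body.

int main(){
    int i = 0;
    while(i < 10){
        i++;
    }
    return 0;
}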

This can be written as:

// Assume X5 holds i (initialised to 0 before the loop)
loop:
    cmp X5, #10
    b.ge loop_end
    // start of while loop body
    // todo: ...
    add X5, X5, #1 // i++
    b loop
loop_end: // end of while loop

clang's actual output is:

clang implementation
t[0x100000328] <+0>:  sub    sp, sp, #0x10
t[0x10000032c] <+4>: str wzr, [sp, #0xc]
t[0x100000330] <+8>: str wzr, [sp, #0x8]
t[0x100000334] <+12>: b 0x100000338 ; <+16> at t.c:3:9
t[0x100000338] <+16>: ldr w8, [sp, #0x8]
t[0x10000033c] <+20>: subs w8, w8, #0xa
t[0x100000340] <+24>: b.ge 0x100000358 ; <+48> at t.c:6:3
t[0x100000344] <+28>: b 0x100000348 ; <+32> at t.c:4:6
t[0x100000348] <+32>: ldr w8, [sp, #0x8]
t[0x10000034c] <+36>: add w8, w8, #0x1
t[0x100000350] <+40>: str w8, [sp, #0x8]
t[0x100000354] <+44>: b 0x100000338 ; <+16> at t.c:3:9
t[0x100000358] <+48>: mov w0, #0x0 ; =0
t[0x10000035c] <+52>: add sp, sp, #0x10
t[0x100000360] <+56>: ret

IF-ELSE IF-ELSE

These work the same way as the loops above: a CMP followed by conditional branches to labels.
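
As a rough illustration of that pattern, the C sketch below expresses an if / else-if / else chain using only comparisons and jumps to labels, with the assembly each step would correspond to in the comments. The function name `sign_of` and the label names are made up for the example.

```c
/* Classifies x as -1, 0 or 1 using only compares and jumps to labels. */
int sign_of(long x) {
    int result;
    if (x >= 0) goto not_negative;   /* cmp x, #0 ; b.ge not_negative */
    result = -1;                     /* "if" arm */
    goto done;                       /* b done: skip the remaining arms */
not_negative:
    if (x != 0) goto positive;       /* cmp x, #0 ; b.ne positive */
    result = 0;                      /* "else if" arm */
    goto done;
positive:
    result = 1;                      /* "else" arm */
done:
    return result;
}
```

Each arm ends with an unconditional branch to the common exit label, except the last one, which simply falls through.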

Logical Operators (bitwise)

Logical operations.

AND EOR ORR

AND{S} Xd, Xs, Operand2 // Xd = Xs & Operand2
EOR{S} Xd, Xs, Operand2 // Exclusive or (XOR) Xd = Xs ^ Operand2
ORR{S} Xd, Xs, Operand2 // Inclusive or Xd = Xs | Operand2

BIC

A special instruction (bit clear). It actually computes Xd = Xs & ~Operand2: a bit of Xs is copied to Xd only where the corresponding bit of Operand2 is 0; everywhere else the bit in Xd is 0.

BIC{S} Xd, Xs, Operand2
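
In C terms the operation is a single expression. The sketch below models it on 64-bit values; `bic` is just an illustrative function name.

```c
#include <stdint.h>

/* Model of BIC Xd, Xs, Operand2: clear in xs every bit that is set in op2. */
uint64_t bic(uint64_t xs, uint64_t op2) {
    return xs & ~op2;
}
```

For example, `bic(0xFF, 0x0F)` yields `0xF0`, and `bic('a', 0x20)` clears the ASCII lowercase bit and yields `'A'`.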

Converting Integers to ASCII

Convert an integer to an ASCII string and print it.

//
// Assembler program to print a register in hex
// to stdout.
//
// X0-X2 - parameters to Darwin system calls
// X1 - is also address of byte we are writing
// X4 - register to print
// W5 - loop index
// W6 - current character
// X16 - Darwin system call number
//

.global _start // Provide program starting address
.align 4

_start:
MOV X4, #0x6E3A
MOVK X4, #0x4F5D, LSL #16
MOVK X4, #0xFEDC, LSL #32
MOVK X4, #0x1234, LSL #48

// Set X1 to the end of the destination
ADRP X1, hexstr@PAGE
ADD X1, X1, hexstr@PAGEOFF
ADD X1, X1, #17

// The loop is FOR W5 = 16 TO 1 STEP -1
MOV W5, #16 // 16 digits to print
loop:
AND W6, W4, #0xf // mask of least sig digit
// If W6 >= 10 then goto letter
CMP W6, #10 // is 0-9 or A-F
B.GE letter
// Else its a number so convert to an ASCII digit
ADD W6, W6, #'0'
B cont // goto to end if
letter: // handle the digits A to F
ADD W6, W6, #('A'-10)
cont: // end if
STRB W6, [X1] // store ascii digit
SUB X1, X1, #1 // decrement address for next digit
LSR X4, X4, #4 // shift off the digit we just processed

// next W5
SUBS W5, W5, #1 // step W5 by -1
B.NE loop // another for loop if not done

// Setup the parameters to print our hex number
// and then call the Darwin kernel to do it.
MOV X0, #1 // 1 = StdOut
ADRP X1, hexstr@PAGE // start of string
ADD X1, X1, hexstr@PAGEOFF
MOV X2, #19 // length of our string
MOV X16, #4 // Darwin write system call
SVC #0x80 // Call the kernel to output the string

// Setup the parameters to exit the program
// and then call the kernel to do it.
MOV X0, #0 // Use 0 return code
MOV X16, #1 // System call number 1 terminates this program
SVC #0x80 // Call kernel to terminate the program

.data
hexstr: .ascii "0x123456789ABCDEFG\n"

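X4 holds the integer we want to print. The string lives in the .data section, which is readable and writable; the program overwrites it in place with the digits we need and then makes a system call to print it.

The program uses a special instruction, adrp, which loads the base address of the page containing the symbol into X1; an add then adds the symbol's offset within that page.

Two instructions are needed because a fixed-length 32-bit instruction cannot hold a full 64-bit memory address.

The extra offset of #17 moves the pointer to the character just before the \n, because the digits are written from the end backwards.

W5 serves as the loop counter, and the loop runs 16 times. In each iteration an and with the mask 0xf (0b1111) extracts the lowest 4 bits, that is, one hexadecimal digit, into W6.

Next we check whether the digit is at least 10 or less than 10 and pick the ASCII code from A-F or 0-9 accordingly, storing it back in W6. In the code, #'0' and #('A'-10) are written that way so that the assembler substitutes the ASCII values.

We then store W6 to [X1] with STRB. The square brackets mean indirection: the byte is written to the memory whose address is held in X1. After that X1 is decremented by 1 to step back one character, and X4 is shifted right by 4 bits to expose the next higher hexadecimal digit.

When the loop finishes we load the address of the start of the string into X1 and make the system call.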
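
For readers who prefer to check the logic outside the assembler, here is a C sketch of the same right-to-left conversion. It assumes the 19-byte layout of hexstr ("0x", 16 digits, newline); `format_hex64` is a hypothetical name, and the printing step is left out so that only the digit loop is modelled.

```c
#include <stdint.h>

/* Writes "0x" + 16 uppercase hex digits + '\n' (19 bytes, no terminator)
   into out, filling the digits from right to left as the assembly does. */
void format_hex64(uint64_t value, char out[19]) {
    out[0] = '0';
    out[1] = 'x';
    out[18] = '\n';
    char *p = out + 17;                    /* last digit, just before '\n' */
    for (int i = 16; i > 0; i--) {         /* FOR W5 = 16 TO 1 STEP -1 */
        unsigned digit = (unsigned)(value & 0xf);   /* AND W6, W4, #0xf */
        if (digit >= 10)
            *p = (char)(digit + ('A' - 10));        /* letters A-F */
        else
            *p = (char)(digit + '0');               /* digits 0-9 */
        p--;                               /* SUB X1, X1, #1 */
        value >>= 4;                       /* LSR X4, X4, #4 */
    }
}
```

With the value built by the MOV/MOVK sequence, 0x1234FEDC4F5D6E3A, the buffer ends up holding "0x1234FEDC4F5D6E3A" followed by a newline.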

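Below is a version that prints the digits in order by using a decreasing right-shift amount. It does not need a buffer sized to the number of digits (a single byte is enough), but making one syscall per digit will probably hurt performance.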

64-bit int print without 64-bit long buffer
.global _start
.align 4
_start: mov X4, #0xabcd // int
movk X4, #0x5678, LSL #16
movk X4, #0x1234, LSL #32
movk X4, #0xbcde, LSL #48
mov X5, #60 // descending offset 60,56,52,48,...,0
mov X0, #1 // stdout
adr X1, prefix // print the "0x" prefix
mov X2, #2
mov X16, #4
mov X7, #0 // flag: becomes 1 once a non-zero digit has been seen
svc #0x80
loop: cmp W5, #0
b.lt loop_end
lsr X8, X4, X5
and W8, W8, #0xf // mask last 4 bits
cmp X7, #1
b.eq compare
cmp W8, #0
b.eq call_pos
mov X7, #1
compare: cmp W8, #10
b.lt num
letter: add W8, W8, #('a' - 10)
b print
num: add W8, W8, #48
b print
call_pos: sub W5, W5, #4
b loop
print: mov X0, #1 // stdout
adrp X1, buffer@PAGE
add X1, X1, buffer@PAGEOFF
strb W8, [X1] // w8: char ascii code
mov X2, #1
mov X16, #4
svc #0x80
b call_pos
loop_end: mov X0, #1 // stdout
adr X1, newline // print \n
mov X2, #1
mov X16, #4
svc #0x80
mov X0, #0
mov X16, #1
svc #0x80
.text
newline: .byte '\n'
prefix: .ascii "0x"
.data
buffer: .byte 0
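
The control flow of this variant (descending shift amount plus a "seen a non-zero digit" flag) can be sketched in C as follows. This is an approximation for checking the logic, not a translation of the syscalls: instead of writing each character to stdout it appends to a caller-supplied buffer, and `hex_digits_no_leading_zeros` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the lowercase hex digits of value into out, most significant first,
   skipping leading zeros, and returns the number of characters written.
   out must have room for 16 characters; no terminator is added. */
size_t hex_digits_no_leading_zeros(uint64_t value, char *out) {
    size_t n = 0;
    int seen_nonzero = 0;                            /* X7 in the assembly */
    for (int shift = 60; shift >= 0; shift -= 4) {   /* X5: 60, 56, ..., 0 */
        unsigned digit = (unsigned)((value >> shift) & 0xf);
        if (!seen_nonzero && digit == 0)
            continue;                                /* still in the leading zeros */
        seen_nonzero = 1;
        out[n++] = (char)(digit < 10 ? digit + '0' : digit + ('a' - 10));
    }
    return n;
}
```

One consequence of skipping every leading zero is that a value of 0 produces no digits at all; the assembly behaves the same way and would print only the prefix and the newline.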

Switch case

// AArch64 Assembly Code
// Programming with 64-Bit ARM Assembly Language
// Chapter #4: Excercise #2 (Page 81)
// Jeff Rosengarden 08/27/20

//
// Create assembly code to emulate a switch/case statement

// REGISTERS USED IN CODE
// w11 - holds switch variable (1 thru 3 for this case statement)
// w12 - holds the exit value that can be queried at the OS level with: echo $?
// NOTE: w12 is transferred to w0 just before program exit so the
// user can query the value with $?
// w13 - work register used during calculation of mesg length

// X0 - X2 hold parameters for Darwin/kernel system call
// X0 - holds FD (file descriptor) for output (stdout in this case)
// X1 - holds address of mesg
// X2 - holds length of mesg

// X16 - used to hold Darwin/Kernel system call ID
// X9 - holds calculated length of mesg




.global _start // Provide program starting address
.align 4

_start: // this is the switch portion of the case statement
// branching to select1, select2, select3 or default
// labels based on value in w11


// set "select" value (1 thru 3) in w11

mov w12, 0xff // Prepare for error case
cmp x0, #2 // Make sure we have precisely two arguments
bne endit // If it is not: exit
ldr x11, [x1, #8] // Get the pointer at x1 + 8
ldrb w11, [x11] // Load the Byte pointed to by that pointer into w11
sub w11, w11, #'0' // Subtract the ascii value for '0'

cmp w11, #1 // will set Z flag to 1 if w11 - 1 == 0
b.eq select1 // Z Flag == 1 ?
cmp w11, #2 // will set Z flag to 1 if w11 - 2 == 0
b.eq select2 // Z Flag == 1 ?
cmp w11, #3 // will set Z flag to 1 if w11 - 3 == 0
b.eq select3 // Z Flag == 1 ?

// if w11 contains anything other than 1 thru 3 the program
// will fall thru to the default label here

// each label is a switch/case target based on the above select code

default: mov w12, #99 // Use 99 return code for os query ($?)
b break // same as switch/case break statement

select1: mov w12, #1 // Use 1 return code for os query ($?)
b break // same as switch/case break statement

select2: mov w12, #2 // Use 2 return code for os query ($?)
b break // same as switch/case break statement

select3: mov w12, #3 // Use 3 return code for os query ($?)
// b break not necessary here as it will
// fall thru when done executing the
// select3: case

break: nop // nop here just to prove we actually
// wind up here from each case statement
// when debugging with lldb

// code after the select/case would go
// here

ADRP X1, mesg@PAGE // start of message to display at OS level
ADD X1, X1, mesg@PAGEOFF

// calculate length of mesg (store it in x9)

mov x9, #0 // zero out x9 before starting
cloop:
ldrb w13, [x1,x9] // get the next digit in mesg
cmp w13, #255 // is it equal to 255 (0xFF)
b.eq cend // yes - jump to cend
add x9, x9, #1 // no - increase x9 count by 1
b cloop // do it again
cend:


// Setup the parameters to print string
// and then call Darwin/kernel to do it.
MOV X0, #1 // 1 = StdOut



MOV X2, X9 // length of our string
MOV X16, #4 // Darwin write system call
SVC #0x80 // Call Darwin/kernel to output the string


// Setup the parameters to exit the program
// and then call the kernel to do it.
endit:
mov W0, W12 // move return code into X0 so it can be
// queried at OS level
MOV X16, #1 // System call number 1 terminates this program
SVC #0x80 // Call Darwin/kernel to terminate the program

// return 0


.data
mesg: .byte 0x1B // clear screen
.byte 'c' // and start msg at
.byte 0 // top of screen
.asciz "Chapter 4; Excercise #2\n"
.asciz " - Emulate switch/case in assembly code\n\n"
.asciz " -By Jeff Rosengarden\n"
.asciz " -Created on Apple DTK\n\n"
.asciz " Use: echo $? to see result of program\n"
.asciz "\n\n\n" // 3 add'l blank lines
.byte 255


// alternative clear screen command (clears screen and starts msg @ bottom)
// .asciz "\33[2J"

Besides switch...case, this program also shows how to read command-line arguments. Below is a demo that fetches a command-line argument.

.global _start
.align 4

strlen:
mov x1, x0
strlen_loop:
ldrb w2, [x1] // ldrb requires a 32-bit (W) destination register
cmp w2, #0
b.eq strlen_loop_end
add x1, x1, #1
b strlen_loop
strlen_loop_end:
sub x0, x1, x0
ret

println:
mov x0, #1 // 1 - stdout
mov x16, #4 // output
svc #0x80
mov x0, #1 // 1 - stdout
adr x1, newline
mov x2, #1
mov x16, #4 // set the write call number again rather than rely on x16 surviving the first svc
svc #0x80
ret

exit:
mov x0, #0 // exit code 0
mov x16, #1 // System call number 1 terminates this program
svc #0x80 // Call Darwin/kernel to terminate the program
ret

_start:
cmp x0, #2
b.lt no_arg

mov x10, x0 // x0 holds argc, the number of arguments
mov x11, x1 // x1 holds argv, the address of the first element of the char *[] array;
// each element is an 8-byte pointer
ldr x0, [x11, #8] // skip one pointer (8 bytes) and load argv[1], the second string pointer, into x0
bl strlen
ldr x1, [x11, #8]
mov x2, x0 // string length we calculated

bl println
bl exit

no_arg:
adr x1, noarg
mov x0,x1
bl strlen
mov x2,x0
adr x1, noarg
bl println
bl exit
.text
newline: .byte '\n'
noarg: .asciz "Error: no arg" // NUL-terminated so that strlen can find the end
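
  • x0 corresponds to int argc, the number of arguments
  • x1 corresponds to char **argv, the address of the first element of the array of argument string pointers
    *(argv + 1), i.e. argv[1], corresponds to [x1, #8]: first offset the address held in x1 by 8 bytes, then dereference it (read what is stored at the offset address)
  • ldr computes an address from a register (plus an optional offset) and loads the value stored at that address into another register
  • adr computes the address of a label relative to pc (short range) and stores it in a register
  • adrp computes the address of the page containing a symbol relative to pc (long range) and stores it in a register; it is used together with @PAGE and @PAGEOFF, with an add supplying @PAGEOFF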
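
The arithmetic behind the adrp / add pair can be sketched in C. The sketch assumes the architectural rule that ADRP works in 4 KB units (the low 12 bits of the address are the page offset); the function names are illustrative only, and the PC-relative encoding of the page is not modelled.

```c
#include <stdint.h>

#define PAGE_MASK 0xfffULL   /* ADRP always works in 4 KB units */

/* What ADRP Xd, sym@PAGE yields: the 4 KB-aligned page containing sym. */
uint64_t page_of(uint64_t sym_addr) {
    return sym_addr & ~PAGE_MASK;
}

/* What sym@PAGEOFF stands for: the low 12 bits of the address. */
uint64_t pageoff_of(uint64_t sym_addr) {
    return sym_addr & PAGE_MASK;
}

/* ADRP followed by ADD Xd, Xd, sym@PAGEOFF rebuilds the full address. */
uint64_t adrp_then_add(uint64_t sym_addr) {
    return page_of(sym_addr) + pageoff_of(sym_addr);
}
```

For a symbol at the made-up address 0x100004abc, the page is 0x100004000 and the page offset is 0xabc; adding the two gives back the original address.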