macOS Arm64 Assembly, part 2

Posted on 2026-02-05. Edited on 2026-02-24. In Assembly, Arm64, OSX.

flow control
branch
flags
a basic printer

Unconditional branch

An unconditional branch (jump).

B label

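Jumps directly to label. The target is computed as PC plus a signed 26-bit offset, counted in 4-byte instructions (roughly ±128 MB). The clang toolchain deals with targets that are too far away, so in practice you only need to think about label names, not about whether the distance fits in the 26-bit offset.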
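
As a rough check of the range involved, the arithmetic can be written out in C. This is only a sketch; the helper name `b_reaches` is made up for illustration and is not part of any toolchain.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can a single B placed at address `from` reach `to`?
 * B encodes a signed 26-bit offset counted in 4-byte instructions. */
bool b_reaches(uint64_t from, uint64_t to) {
    int64_t delta = (int64_t)(to - from);   /* byte distance, may be negative */
    if (delta % 4 != 0) return false;       /* targets are instruction-aligned */
    int64_t words = delta / 4;              /* distance in instructions */
    return words >= -(INT64_C(1) << 25) && words < (INT64_C(1) << 25);
}
```

The largest forward distance is (2^25 - 1) instructions, i.e. 0x7FFFFFC bytes; the largest backward distance is 2^25 instructions, i.e. 0x8000000 bytes.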

Condition Flags

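Negative: N is 1 if the result, read as a signed number, is negative; otherwise it is 0
Zero: Z is 1 if the result is 0; it usually represents the outcome of an equality comparison
Carry: for addition-type operations, C is 1 if the unsigned result overflowed and 0 otherwise; for subtraction it is the opposite (C is 0 when a borrow occurred). On 32-bit ARM this flag also receives the last bit shifted out by a flag-setting shift; A64 shift instructions do not update the flags
oVerflow: for addition or subtraction, V is set to 1 if signed overflow occurred

Some instructions use the oVerflow flag to signal an error condition.
An instruction usually needs an S suffix to set the flags; without it the flags are left unchanged. The comparison instructions are the exception and always set them.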
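
To make the flag definitions concrete, here is a small C model of how a 64-bit ADDS and SUBS would set N, Z, C and V. It is a sketch for reasoning about the flags, not how the hardware is implemented; the `Flags` struct and both function names are invented for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative model of the four condition flags (not an OS or library type). */
typedef struct { bool n, z, c, v; } Flags;

/* Flags as ADDS Xd, Xn, Xm would set them for 64-bit operands. */
Flags flags_adds(uint64_t a, uint64_t b) {
    uint64_t r = a + b;                       /* wraps modulo 2^64 */
    Flags f;
    f.n = (r >> 63) & 1;                      /* sign bit of the result */
    f.z = (r == 0);
    f.c = (r < a);                            /* unsigned carry out */
    f.v = ((~(a ^ b) & (a ^ r)) >> 63) & 1;   /* same-sign inputs, different-sign result */
    return f;
}

/* Flags as SUBS Xd, Xn, Xm would set them for 64-bit operands. */
Flags flags_subs(uint64_t a, uint64_t b) {
    uint64_t r = a - b;                       /* wraps modulo 2^64 */
    Flags f;
    f.n = (r >> 63) & 1;
    f.z = (r == 0);
    f.c = (a >= b);                           /* C = 1 means "no borrow" */
    f.v = (((a ^ b) & (a ^ r)) >> 63) & 1;    /* signed overflow on subtraction */
    return f;
}
```

For example, subtracting 10 from 3 gives N = 1 and C = 0 (a borrow occurred), while subtracting 5 from 5 gives Z = 1 and C = 1.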

Branch on Condition

A conditional branch.

B.{condition} label

where {condition} comes from the table below:

{condition}	Flags	Meaning
EQ	Z set	Equal
NE	Z clear	Not equal
CS or HS	C set	Higher or same (unsigned >=)
CC or LO	C clear	Lower (unsigned <)
MI	N set	Negative
PL	N clear	Positive or zero
VS	V set	Overflow
VC	V clear	No overflow
HI	C set and Z clear	Higher (unsigned >)
LS	C clear or Z set	Lower or same (unsigned <=)
GE	N and V the same	Signed >=
LT	N and V differ	Signed <
GT	Z clear, N and V the same	Signed >
LE	Z set, or N and V differ	Signed <=
AL	Any	Always (same as no suffix)

set means the flag is 1, clear means it is 0.
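
Each condition can be read as a boolean expression over the four flags. The C sketch below spells them out, taking LS as "C clear or Z set" and LE as "Z set, or N and V differ", as the Arm architecture defines them. The function name `cond_holds` is an assumption made for this example.

```c
#include <stdbool.h>
#include <string.h>

/* Returns whether condition `cc` (e.g. "GE") holds for the given flags.
 * Illustrative helper; unknown condition names return false. */
bool cond_holds(const char *cc, bool n, bool z, bool c, bool v) {
    if (!strcmp(cc, "EQ")) return z;
    if (!strcmp(cc, "NE")) return !z;
    if (!strcmp(cc, "CS") || !strcmp(cc, "HS")) return c;
    if (!strcmp(cc, "CC") || !strcmp(cc, "LO")) return !c;
    if (!strcmp(cc, "MI")) return n;
    if (!strcmp(cc, "PL")) return !n;
    if (!strcmp(cc, "VS")) return v;
    if (!strcmp(cc, "VC")) return !v;
    if (!strcmp(cc, "HI")) return c && !z;
    if (!strcmp(cc, "LS")) return !c || z;
    if (!strcmp(cc, "GE")) return n == v;
    if (!strcmp(cc, "LT")) return n != v;
    if (!strcmp(cc, "GT")) return !z && n == v;
    if (!strcmp(cc, "LE")) return z || n != v;
    if (!strcmp(cc, "AL")) return true;
    return false;
}
```

After comparing 3 with 10 the flags are N = 1, Z = 0, C = 0, V = 0, so LT and LO hold while GE and HI do not.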

`CMP`

The instruction used for comparisons:

CMP Xn, Operand2

CMP is in fact an alias, equivalent to

SUBS XZR, Xn, Operand2

`CMN` / `TST`

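These are comparison instructions too, but they differ from CMP in how the comparison is computed: CMN uses addition in place of CMP's subtraction, and TST uses bitwise AND.

When writing assembly by hand, if Operand2 cannot be encoded for the instruction you wrote, the assembler may substitute a sibling instruction; for example, a CMP against a negative immediate can be assembled as a CMN against its positive counterpart.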

Loop

For Loop

For C code like this:

int main(){
  for(int i = 0; i < 10; i++){
  }
  return 0;
}

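we can write:

// Assume X5 denotes i
mov X5, #0 // i = 0
loop:
cmp X5, #10
b.ge loop_end
// start of for loop body
// todo: ...
add X5, X5, #1 // i++
b loop
loop_end: // end of for loop

This is a simplified version of what clang generates. The real output keeps the variable on the stack and uses subs where we wrote cmp; we keep only the structure.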

clang implementation

t[0x100000328] <+0>:  sub    sp, sp, #0x10
t[0x10000032c] <+4>:  str    wzr, [sp, #0xc]
t[0x100000330] <+8>:  str    wzr, [sp, #0x8]
t[0x100000334] <+12>: b      0x100000338    ; <+16> at t.c:2:18
t[0x100000338] <+16>: ldr    w8, [sp, #0x8]
t[0x10000033c] <+20>: subs   w8, w8, #0xa
t[0x100000340] <+24>: b.ge   0x10000035c    ; <+52> at t.c:4:3
t[0x100000344] <+28>: b      0x100000348    ; <+32> at t.c:3:3
t[0x100000348] <+32>: b      0x10000034c    ; <+36> at t.c:2:27
t[0x10000034c] <+36>: ldr    w8, [sp, #0x8]
t[0x100000350] <+40>: add    w8, w8, #0x1
t[0x100000354] <+44>: str    w8, [sp, #0x8]
t[0x100000358] <+48>: b      0x100000338    ; <+16> at t.c:2:18
t[0x10000035c] <+52>: mov    w0, #0x0 ; =0
t[0x100000360] <+56>: add    sp, sp, #0x10
t[0x100000364] <+60>: ret
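
The shape of the hand-written loop (compare, branch out, body, increment, branch back) can be mirrored one-to-one in C with goto. This is a minimal sketch; the function name `count_iterations` and the `body_runs` counter are made up for illustration.

```c
/* Counts loop-body executions using the same shape as the assembly:
 * compare, branch out if i >= limit, body, increment, branch back. */
int count_iterations(int limit) {
    int i = 0;                        /* the register that holds i */
    int body_runs = 0;
loop:
    if (i >= limit) goto loop_end;    /* cmp + b.ge loop_end */
    body_runs++;                      /* loop body */
    i++;                              /* add X5, X5, #1 */
    goto loop;                        /* b loop */
loop_end:
    return body_runs;
}
```

Because the test sits at the top, a limit of 0 or less skips the body entirely, just as `b.ge loop_end` does on the first pass.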

While Loop

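Similar to the for loop; the difference is that the loop construct itself has no loop variable to maintain (here i++ is simply part of the body).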

int main(){
  int i = 0;
  while(i < 10){
    i++;
  }
  return 0;
}

we can write:

// Assume X5 denotes i, already initialized to 0
loop:
cmp X5, #10
b.ge loop_end
// start of while loop body
add X5, X5, #1 // i++
b loop
loop_end: // end of while loop

and clang's implementation is:

clang implementation

t[0x100000328] <+0>:  sub    sp, sp, #0x10
t[0x10000032c] <+4>:  str    wzr, [sp, #0xc]
t[0x100000330] <+8>:  str    wzr, [sp, #0x8]
t[0x100000334] <+12>: b      0x100000338    ; <+16> at t.c:3:9
t[0x100000338] <+16>: ldr    w8, [sp, #0x8]
t[0x10000033c] <+20>: subs   w8, w8, #0xa
t[0x100000340] <+24>: b.ge   0x100000358    ; <+48> at t.c:6:3
t[0x100000344] <+28>: b      0x100000348    ; <+32> at t.c:4:6
t[0x100000348] <+32>: ldr    w8, [sp, #0x8]
t[0x10000034c] <+36>: add    w8, w8, #0x1
t[0x100000350] <+40>: str    w8, [sp, #0x8]
t[0x100000354] <+44>: b      0x100000338    ; <+16> at t.c:3:9
t[0x100000358] <+48>: mov    w0, #0x0 ; =0
t[0x10000035c] <+52>: add    sp, sp, #0x10
t[0x100000360] <+56>: ret

IF-ELSE IF-ELSE

This is again implemented with CMP plus branches to labels, the same as above.
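
As an illustration of that pattern, here is an if / else if / else chain written in C with goto, where each `if (...) goto` stands for a CMP followed by a conditional branch. The function `sign_of` and its label names are hypothetical, chosen only for this sketch.

```c
/* Returns -1, 0 or 1 for negative, zero or positive x.
 * Equivalent to: if (x < 0) -1; else if (x == 0) 0; else 1. */
int sign_of(long x) {
    int result = 0;
    if (x >= 0) goto else_if;     /* cmp x, #0 ; b.ge else_if */
    result = -1;                  /* "if" body */
    goto end_if;                  /* b end_if: skip the other arms */
else_if:
    if (x != 0) goto else_arm;    /* cmp x, #0 ; b.ne else_arm */
    result = 0;                   /* "else if" body */
    goto end_if;                  /* b end_if */
else_arm:
    result = 1;                   /* "else" body, falls through to end_if */
end_if:
    return result;
}
```

Note that each test branches on the inverse of the C condition: the branch skips the arm, and falling through executes it.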

Logical Operators (bitwise)

Logical operations.

`AND` `EOR` `ORR`

AND{S} Xd, Xs, Operand2 // Xd = Xs & Operand2
EOR{S} Xd, Xs, Operand2 // Exclusive or (XOR) Xd = Xs ^ Operand2
ORR{S} Xd, Xs, Operand2 // Inclusive or Xd = Xs | Operand2

`BIC`

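BIC (bit clear) is a special instruction that actually computes Xd = Xs & ~Operand2: a bit of Xs is copied to Xd only where the corresponding bit of Operand2 is 0; everywhere else the result bit is 0.

BIC{S} Xd, Xs, Operand2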
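
The same operation in C is a one-liner. A minimal sketch, with the function name `bic` chosen to match the mnemonic:

```c
#include <stdint.h>

/* BIC Xd, Xs, Operand2: clear in xs every bit that is set in op2. */
uint64_t bic(uint64_t xs, uint64_t op2) {
    return xs & ~op2;
}
```

For instance, `bic('a', 0x20)` clears bit 5 and yields `'A'`, the usual trick for upper-casing an ASCII letter.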

Converting Integers to ASCII

Convert an integer to an ASCII string and print it.

//
// Assembler program to print a register in hex
// to stdout.
//
// X0-X2 - parameters to Darwin system calls
// X1 - is also address of byte we are writing
// X4 - register to print
// W5 - loop index
// W6 - current character
// X16 - Darwin system call number
//

.global _start	 // Provide program starting address
.align 4

_start:
  MOV	X4, #0x6E3A
	MOVK	X4, #0x4F5D, LSL #16
	MOVK	X4, #0xFEDC, LSL #32
	MOVK	X4, #0x1234, LSL #48

    // Set X1 to the end of the destination
    ADRP    X1, hexstr@PAGE
    ADD     X1, X1, hexstr@PAGEOFF
    ADD     X1, X1, #17

// The loop is FOR W5 = 16 TO 1 STEP -1
	MOV	W5, #16	    // 16 digits to print
loop:
  AND	W6, W4, #0xf // mask of least sig digit
// If W6 >= 10 then goto letter
	CMP	W6, #10	    // is 0-9 or A-F
	B.GE	letter
// Else its a number so convert to an ASCII digit
	ADD	W6, W6, #'0'
	B	cont	// goto to end if
letter: // handle the digits A to F
	ADD	W6, W6, #('A'-10)
cont:	// end if
	STRB	W6, [X1]	// store ascii digit
	SUB	X1, X1, #1		// decrement address for next digit
	LSR	X4, X4, #4	// shift off the digit we just processed

	// next W5
	SUBS	W5, W5, #1		// step W5 by -1
	B.NE	loop		// another for loop if not done

// Setup the parameters to print our hex number
// and then call the Darwin kernel to do it.
	MOV	X0, #1	    // 1 = StdOut
	ADRP	X1, hexstr@PAGE // start of string
	ADD	X1, X1, hexstr@PAGEOFF
	MOV	X2, #19	    // length of our string
	MOV	X16, #4	    // Darwin write system call
	SVC	#0x80 	    // Call the kernel to output the string

// Setup the parameters to exit the program
// and then call the kernel to do it.
	MOV     X0, #0      // Use 0 return code
  MOV     X16, #1     // System call number 1 terminates this program
  SVC     #0x80           // Call kernel to terminate the program

.data
hexstr:      .ascii  "0x123456789ABCDEFG\n"

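X4 holds the integer to print. The string lives in the .data section, which is readable and writable; the program overwrites it in place with the result and finally makes a system call to print it.

The program uses a special instruction, adrp, which loads the base address of the page containing the symbol into X1; an add then adds the symbol's offset within that page.

Two instructions are needed because an instruction is only 32 bits long and cannot hold a 64-bit memory address.

The extra #17 offset moves X1 to the character just before \n, because we fill in the digits from back to front.

W5 is the loop counter, and the loop runs 16 times. In each iteration an and with the mask 0xf (0b1111) extracts the lowest 4 bits, i.e. one hexadecimal digit, into W6.

We then check whether the digit is below 10 or not, pick the matching ASCII character from 0-9 or A-F, and store it in W6. In the code, #'0' and #('A'-10) evaluate to ASCII values.

Next, STRB stores the low byte of W6 to [X1]. The square brackets mean indirection: the byte is written to the memory whose address is held in X1. We then subtract 1 from X1 to move one position toward the front, and shift X4 right by 4 bits to expose the next higher hex digit.

After the loop, we load the start address of the string into X1 and make the system call.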
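
The same back-to-front algorithm can be sketched in C, which may make the pointer movement easier to follow. The function name `format_hex64` is invented for this example; the buffer layout (2-byte prefix, 16 digits, newline, 19 bytes in total) follows the hexstr string in the assembly.

```c
#include <stdint.h>

/* Writes "0x", 16 upper-case hex digits and a newline into out
 * (19 bytes, not NUL-terminated), filling the digits from the back. */
void format_hex64(uint64_t value, char out[19]) {
    out[0] = '0';
    out[1] = 'x';
    out[18] = '\n';
    char *p = out + 17;                    /* last digit, just before '\n' */
    for (int i = 16; i > 0; i--) {         /* FOR W5 = 16 TO 1 STEP -1 */
        unsigned digit = value & 0xf;      /* AND W6, W4, #0xf */
        *p-- = (char)(digit >= 10 ? digit + ('A' - 10) : digit + '0');
        value >>= 4;                       /* LSR X4, X4, #4 */
    }
}
```

With the value built by the MOV/MOVK sequence, 0x1234FEDC4F5D6E3A, the buffer ends up holding the text 0x1234FEDC4F5D6E3A followed by a newline.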

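Below is a version that prints the digits in order by shifting right by a decreasing amount. It needs no variable-length buffer, but the repeated syscalls probably hurt performance.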

64-bit int print without 64-bit long buffer

.global _start
.align 4
_start: mov X4, #0xabcd // int
    movk X4, #0x5678, LSL #16
    movk X4, #0x1234, LSL #32
    movk X4, #0xbcde, LSL #48
    mov X5, #60        // descending offset 60,56,52,48,...,0
    mov X0, #1 // stdout
    adr X1, prefix // print the "0x" prefix
    mov X2, #2
    mov X16, #4
    mov X7, #0 // flag: becomes 1 once a non-zero digit has been seen
    svc #0x80
  loop: cmp W5,0
    b.lt loop_end
    lsr X8, X4, X5
    and W8, W8, #0xf // mask last 4 bits
    cmp X7, #1
    b.eq compare
    cmp W8, #0
    b.eq call_pos
    mov X7, #1
  compare: cmp W8, #10
    b.lt num
  letter: add W8, W8, #('a' - 10)
    b print
  num: add W8, W8, #48
    b print
  call_pos: sub W5, W5, #4
    b loop
  print: mov X0, #1 // stdout
    adrp X1, buffer@PAGE
    add X1, X1, buffer@PAGEOFF
    strb W8, [X1] // w8: char ascii code
    mov X2, #1
    mov X16, #4
    svc #0x80
    b call_pos
  loop_end: mov X0, #1 // stdout
    adr X1, newline // print \n
    mov X2, #1
    mov X16, #4
    svc #0x80
    mov X0, #0
    mov X16, #1
    svc #0x80
.text
  newline: .byte '\n'
  prefix: .ascii "0x"
.data
  buffer: .byte 0
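
For comparison, here is a C sketch of the same front-to-back digit walk with leading-zero suppression. It differs from the assembly in one deliberate way: instead of issuing one write syscall per character, it stores the characters in a caller-provided buffer. The function name `hex_digits_in_order` is an assumption made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes the lower-case hex digits of value into out, most significant first,
 * skipping leading zeros, and returns the number of characters written.
 * Like the assembly, a value of 0 produces no digits at all. */
size_t hex_digits_in_order(uint64_t value, char out[16]) {
    size_t len = 0;
    int seen_nonzero = 0;                             /* X7 in the assembly */
    for (int shift = 60; shift >= 0; shift -= 4) {    /* X5: 60, 56, ..., 0 */
        unsigned digit = (value >> shift) & 0xf;      /* lsr + and */
        if (!seen_nonzero && digit == 0) continue;    /* still in leading zeros */
        seen_nonzero = 1;
        out[len++] = (char)(digit < 10 ? digit + '0' : digit + ('a' - 10));
    }
    return len;
}
```

Zeros that appear after the first non-zero digit are kept, so 0x1005 comes out as the four characters 1005.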

Switch case

// AArch64 Assembly Code
// Programming with 64-Bit ARM Assembly Language
// Chapter #4: Excercise #2 (Page 81)
// Jeff Rosengarden 08/27/20

//
// Create assembly code to emulate a switch/case statement

// REGISTERS USED IN CODE
// w11 	- 	holds switch variable (1 thru 3 for this case statement)
// w12 	-	holds the exit value that can be queried at the OS level with: echo $?
//			NOTE:  w12 is transferred to w0 just before program exit so the
//				   user can query the value with $?
// w13 	-	work register used during calculation of mesg length

// X0 - X2  hold parameters for Darwin/kernel system call
// X0	-	holds FD (file device) for output (stdout in this case)
// X1	-	holds address of mesg
// X2 	-	holds length of mesg 

// X16	-	used to hold Darwin/Kernel system call ID
// X9	-	holds calculated length of mesg




.global _start	 							// Provide program starting address
.align 4

_start:		// this is the switch portion of the case statement
			// branching to select1, select2, select3 or default
			// labels based on value in w11
			
			
			// set "select" value (1 thru 3) in w11
				
			mov w12, 0xff	// Prepare for error case 
			cmp x0, #2	// Make sure we have precisely two arguments
			bne endit	// If it is not: exit
			ldr x11, [x1, #8]	// Get the pointer at x1 + 8
			ldrb w11, [x11]	// Load the Byte pointed to by that pointer into w11
			sub w11, w11, #'0' // Subtract the ascii value for '0'

			cmp w11, #1			// will set Z flag to 1 if w11 - 1 == 0
			b.eq select1		// Z Flag == 1 ?
			cmp w11, #2			// will set Z flag to 1 if w11 - 2 == 0
			b.eq select2		// Z Flag == 1 ?
			cmp w11, #3			// will set Z flag to 1 if w11 - 3 == 0
			b.eq select3		// Z Flag == 1 ?
			
			// if w11 contains anything other than 1 thru 3 the program
			// will fall thru to the default label here
			
			// each label is a switch/case target based on the above select code
			
default:	mov w12, #99					// Use 99 return code for os query ($?)
			b break							// same as switch/case break statement

select1:	mov w12, #1						// Use 1 return code for os query ($?)
			b break							// same as switch/case break statement
			
select2:	mov w12, #2						// Use 2 return code for os query ($?)
			b break							// same as switch/case break statement

select3:	mov w12, #3						// Use 3 return code for os query ($?)
											// b break not necessary here as it will
											// fall thru when done executing the
											// select3: case
		
break:		nop								// nop here just to prove we actually 
											// wind up here from each case statement
											// when debugging with lldb
											
											// code after the select/case would go 
											// here
											
			ADRP	X1, mesg@PAGE 			// start of message to display at OS level
			ADD	X1, X1, mesg@PAGEOFF
			
			// calculate length of mesg (store it in x9)

			mov x9, #0						// zero out x9 before starting
cloop:
			ldrb w13, [x1,x9]				// get the next digit in mesg
			cmp  w13, #255					// is it equal to 255 (0xFF)
			b.eq  cend						// yes - jump to cend
			add x9, x9, #1					// no  - increase x9 count by 1
			b cloop							// do it again
cend:	
	

											// Setup the parameters to print string
											// and then call Darwin/kernel to do it.
			MOV	X0, #1	    				// 1 = StdOut

	

			MOV	X2, X9	    				// length of our string
			MOV	X16, #4	    				// Darwin write system call
			SVC	#0x80 	    				// Call Darwin/kernel to output the string


// Setup the parameters to exit the program
// and then call the kernel to do it.
endit:
			mov		W0, W12					// move return code into X0 so it can be
											// queried at OS level
    		MOV     X16, #1     			// System call number 1 terminates this program
    		SVC     #0x80           		// Call Darwin/kernel to terminate the program
		
			// return 0


.data
mesg:	.byte 0x1B												// clear screen 
		.byte 'c'												// and start msg at
		.byte 0													// top of screen
		.asciz	"Chapter 4; Excercise #2\n"
		.asciz	" - Emulate switch/case in assembly code\n\n"
		.asciz	" -By Jeff Rosengarden\n"
		.asciz	" -Created on Apple DTK\n\n" 
		.asciz	" Use: echo $? to see result of program\n"
		.asciz	"\n\n\n"										// 3 add'l blank lines
		.byte	255	
		

//		alternative clear screen command (clears screen and starts msg @ bottom)
//		.asciz 	"\33[2J"

Besides switch...case, this program also shows how to read command-line arguments. Below is a demo that fetches a command-line argument:

.global _start
.align 4

strlen: 
  mov x1, x0
  strlen_loop: 
    ldrb  w2, [x1] // only 32 bit register
    cmp w2, #0
    b.eq strlen_loop_end
    add x1, x1, #1
    b strlen_loop
  strlen_loop_end:
  sub x0, x1, x0
  ret

println: 
  mov x0, #1    // 1 - stdout
  mov x16, #4   // output
  svc #0x80
  mov x0, #1    // 1 - stdout
  adr x1, newline
  mov x2, #1
  svc #0x80
  ret

exit:
  mov x0, #0    // exit code 0
  mov x16, #1   // System call number 1 terminates this program
  svc #0x80     // Call Darwin/kernel to terminate the program
  ret

_start:	
  cmp x0, #2
  b.lt no_arg

  mov x10, x0   // x0 store the count of args
  mov x11, x1   // x1 holds argv, the address of a char *[] array: each element is an 8-byte pointer
  ldr x0, [x11, #8] // 8-byte offset = one pointer; loads argv[1] (pointer to the second string) into x0
  bl strlen
  ldr x1, [x11, #8]
  mov x2, x0    // string length we calculated

  bl println
  bl exit

  no_arg:
    adr x1, noarg
    mov x0,x1
    bl strlen
    mov x2,x0
    adr x1, noarg
    bl println
    bl exit
.text 
  newline: .byte '\n'
  noarg: .asciz "Error: no arg" // NUL-terminated so that strlen can find the end

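x0 corresponds to int argc, the number of arguments.

x1 corresponds to char *argv[], the base address of the array of pointers to the argument strings.
argv[1], i.e. *(argv + 1), corresponds to [x1, #8]: take the base address held in x1, add 8 bytes, then dereference (read what is stored at that address).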
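
In C terms, the demo's strlen routine and the argv[1] lookup amount to the sketch below. The names `my_strlen` and `second_arg` are made up for illustration; the comments map each step to the instructions in the listing.

```c
#include <stddef.h>

/* Same loop as the strlen routine: walk until the NUL byte, return the distance. */
size_t my_strlen(const char *s) {
    const char *p = s;            /* mov x1, x0 */
    while (*p != 0) p++;          /* ldrb / cmp / b.eq / add / b */
    return (size_t)(p - s);       /* sub x0, x1, x0 */
}

/* argv[1] is the pointer stored 8 bytes past argv[0] on a 64-bit target.
 * Returns NULL when no argument was given (argc < 2). */
const char *second_arg(int argc, char *argv[]) {
    return argc >= 2 ? argv[1] : NULL;
}
```

The loop only terminates at a zero byte, which is why any string passed to it has to be NUL-terminated.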

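ldr reads memory at the address held in a register (plus an optional offset) and stores the loaded value in the destination register. With a 32-bit destination (w) it loads 32 bits and zero-fills the upper half of the register; with a 64-bit destination (x) it loads the full 64 bits.

ldrb loads a single byte and zero-extends it into the destination, which must be a 32-bit w register.

adr computes the address of a label relative to pc (short range, ±1 MB) and stores it in a register.

adrp computes, relative to pc, the address of the 4 KB page that contains a symbol (long range, ±4 GB) and stores it in a register. It is used with @PAGE, and an add with @PAGEOFF supplies the offset within the page.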
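
The page/offset split behind the adrp + add pair is plain bit masking. A small C sketch, assuming the 4 KB granule that adrp always uses; the helper names `page_of` and `pageoff_of` are hypothetical.

```c
#include <stdint.h>

/* Page base that ADRP produces for an address: low 12 bits cleared. */
uint64_t page_of(uint64_t addr) {
    return addr & ~UINT64_C(0xfff);
}

/* Offset within the page, the part that sym@PAGEOFF supplies to ADD. */
uint64_t pageoff_of(uint64_t addr) {
    return addr & UINT64_C(0xfff);
}
```

Adding the two parts back together always reproduces the original address, which is exactly what the add after adrp does.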