fishhook Source Code Walkthrough

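Introduction
While building network performance monitoring, I needed to observe network state. Objective-C code can be hooked through the runtime, but some of the calls involved are plain C functions. How do you hook those? After some research, I found that fishhook solves this problem.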

From the official README:

fishhook

fishhook is a very simple library that enables dynamically rebinding symbols in Mach-O binaries running on iOS in the simulator and on device. This provides functionality that is similar to using DYLD_INTERPOSE on OS X. At Facebook, we’ve found it useful as a way to hook calls in libSystem for debugging/tracing purposes (for example, auditing for double-close issues with file descriptors).

Usage

Once you add fishhook.h/fishhook.c to your project, you can rebind symbols as follows:

#import <dlfcn.h>

#import <UIKit/UIKit.h>

#import "AppDelegate.h"
#import "fishhook.h"

static int (*orig_close)(int);
static int (*orig_open)(const char *, int, ...);

int my_close(int fd) {
    printf("Calling real close(%d)\n", fd);
    return orig_close(fd);
}

int my_open(const char *path, int oflag, ...) {
    va_list ap = {0};
    mode_t mode = 0;

    if ((oflag & O_CREAT) != 0) {
        // mode only applies to O_CREAT
        va_start(ap, oflag);
        mode = va_arg(ap, int);
        va_end(ap);
        printf("Calling real open('%s', %d, %d)\n", path, oflag, mode);
        return orig_open(path, oflag, mode);
    } else {
        printf("Calling real open('%s', %d)\n", path, oflag);
        return orig_open(path, oflag, mode);
    }
}

int main(int argc, char * argv[])
{
    @autoreleasepool {
        rebind_symbols((struct rebinding[2]){{"close", my_close, (void *)&orig_close}, {"open", my_open, (void *)&orig_open}}, 2);

        // Open our own binary and print out first 4 bytes (which is the same
        // for all Mach-O binaries on a given architecture)
        int fd = open(argv[0], O_RDONLY);
        uint32_t magic_number = 0;
        read(fd, &magic_number, 4);
        printf("Mach-O Magic Number: %x \n", magic_number);
        close(fd);

        return UIApplicationMain(argc, argv, nil, NSStringFromClass([AppDelegate class]));
    }
}

Sample output

Calling real open('/var/mobile/Applications/161DA598-5B83-41F5-8A44-675491AF6A2C/Test.app/Test', 0)
Mach-O Magic Number: feedface
Calling real close(3)
...

How it works

dyld binds lazy and non-lazy symbols by updating pointers in particular sections of the __DATA segment of a Mach-O binary. fishhook re-binds these symbols by determining the locations to update for each of the symbol names passed to rebind_symbols and then writing out the corresponding replacements.

For a given image, the __DATA segment may contain two sections that are relevant for dynamic symbol bindings: __nl_symbol_ptr and __la_symbol_ptr. __nl_symbol_ptr is an array of pointers to non-lazily bound data (these are bound at the time a library is loaded) and __la_symbol_ptr is an array of pointers to imported functions that is generally filled by a routine called dyld_stub_binder during the first call to that symbol (it’s also possible to tell dyld to bind these at launch). In order to find the name of the symbol that corresponds to a particular location in one of these sections, we have to jump through several layers of indirection. For the two relevant sections, the section headers (struct sections from <mach-o/loader.h>) provide an offset (in the reserved1 field) into what is known as the indirect symbol table. The indirect symbol table, which is located in the __LINKEDIT segment of the binary, is just an array of indexes into the symbol table (also in __LINKEDIT) whose order is identical to that of the pointers in the non-lazy and lazy symbol sections. So, given struct section nl_symbol_ptr, the corresponding index in the symbol table of the first address in that section is indirect_symbol_table[nl_symbol_ptr->reserved1]. The symbol table itself is an array of struct nlists (see <mach-o/nlist.h>), and each nlist contains an index into the string table in __LINKEDIT, which is where the actual symbol names are stored. So, for each pointer in __nl_symbol_ptr and __la_symbol_ptr, we are able to find the corresponding symbol and then the corresponding string to compare against the requested symbol names, and if there is a match, we replace the pointer in the section with the replacement.

The process of looking up the name of a given entry in the lazy or non-lazy pointer tables looks like this:
[Figure: Visual explanation]
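
To make the chain of indirection concrete, here is a minimal sketch in portable C. Everything prefixed with toy_ is a hypothetical stand-in: the real struct section and struct nlist carry more fields, and the real tables live in __LINKEDIT rather than in static arrays. The sketch only shows how reserved1, the indirect symbol table, the symbol table, and the string table combine to produce a name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins (hypothetical): only the fields the lookup needs. */
struct toy_section { uint32_t reserved1; uint32_t count; };
struct toy_nlist   { uint32_t n_strx; };

/* String table: NUL-separated names. Offsets: 1 "_close", 8 "_open", 14 "_printf". */
static const char toy_strtab[] = "\0_close\0_open\0_printf";

/* Symbol table: each entry stores an offset into the string table. */
static const struct toy_nlist toy_symtab[] = { {1}, {8}, {14} };

/* Indirect symbol table: each entry is an index into the symbol table. */
static const uint32_t toy_indirect[] = { 2, 0, 1, 2 };

/* Two pointer sections sharing the indirect table, told apart by reserved1. */
static const struct toy_section toy_nl = { 0, 1 };  /* covers toy_indirect[0]    */
static const struct toy_section toy_la = { 1, 3 };  /* covers toy_indirect[1..3] */

/* Name of the symbol bound to pointer number `slot` of `sect`. */
static const char *toy_symbol_name(const struct toy_section *sect, uint32_t slot) {
    assert(slot < sect->count);
    uint32_t symtab_index  = toy_indirect[sect->reserved1 + slot];
    uint32_t strtab_offset = toy_symtab[symtab_index].n_strx;
    return toy_strtab + strtab_offset;
}
```

Calling toy_symbol_name(&toy_la, 1) follows toy_indirect[2] to symbol 1 and then to string offset 8. This is the same three-hop walk the README describes for a real image.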

Source Code Analysis

Calling it:

Usage is very simple. The snippet below hooks printf so that calls are routed to a custom my_printf.

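static int (*orig_printf)(const char * __restrict, ...);

// Replacement for printf. Note: this demo passes the format string through as
// plain text and does not forward any variadic arguments.
int my_printf(const char * __restrict result, ...) {
    return orig_printf("%s+custom\n", result);
}

- (IBAction)btAction:(id)sender {
    printf("test 1");
    // Install the rebinding
    rebind_symbols((struct rebinding[1]){{"printf", my_printf, (void *)&orig_printf}}, 1);
    printf("test 2");
}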
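
The my_printf above is fine for a demo, but it would drop the arguments of a call such as printf("%d", x). A variadic function cannot pass its `...` straight to another variadic function; the usual approach is to capture the arguments in a va_list and call a v-variant. Below is a minimal sketch of that pattern. The name toy_printf_with_suffix is hypothetical, and it formats into a caller-supplied buffer so the result is easy to inspect. In a real hook one would more likely call vprintf from inside the replacement.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats like printf into `out`, then appends "+custom".
 * Returns the total length written, or a negative value / the would-be length
 * from vsnprintf if the formatted text does not fit. */
static int toy_printf_with_suffix(char *out, size_t cap, const char *fmt, ...) {
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(out, cap, fmt, ap);  /* forward the captured arguments */
    va_end(ap);
    if (n < 0 || (size_t)n >= cap) {
        return n;
    }
    int m = snprintf(out + n, cap - (size_t)n, "+custom");
    return m < 0 ? m : n + m;
}
```

With a 32-byte buffer, toy_printf_with_suffix(buf, sizeof buf, "%s=%d", "x", 42) leaves "x=42+custom" in buf.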

The rebinding function is declared as:

int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel)

The call takes an array of structs, where the struct is defined as:

/*
 * A structure representing a particular intended rebinding from a symbol
 * name to its replacement
 */
struct rebinding {
    const char *name;   // name of the function to hook
    void *replacement;  // the replacement implementation
    void **replaced;    // out: receives the original implementation
};

rebind_symbols in detail

This is the public entry point, used to rebind implementations.

int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel) {
    // Copy the requested rebindings into a new node at the head of the global list
    int retval = prepend_rebindings(&_rebindings_head, rebindings, rebindings_nel);
    if (retval < 0) {
        return retval;
    }
    // If this was the first call, register callback for image additions (which is also
    // invoked for existing images); otherwise, just run on existing images
    if (!_rebindings_head->next) {
        _dyld_register_func_for_add_image(_rebind_symbols_for_image);
    } else {
        uint32_t c = _dyld_image_count();
        for (uint32_t i = 0; i < c; i++) {
            _rebind_symbols_for_image(_dyld_get_image_header(i), _dyld_get_image_vmaddr_slide(i));
        }
    }
    return retval;
}

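prepend_rebindings does the setup work: it copies the caller's array into a newly allocated node and pushes that node onto the front of the linked list headed by _rebindings_head. Its implementation: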

static int prepend_rebindings(struct rebindings_entry **rebindings_head,
                              struct rebinding rebindings[],
                              size_t nel) {
    struct rebindings_entry *new_entry = (struct rebindings_entry *) malloc(sizeof(struct rebindings_entry));
    if (!new_entry) {
        return -1;
    }
    new_entry->rebindings = (struct rebinding *) malloc(sizeof(struct rebinding) * nel);
    if (!new_entry->rebindings) {
        free(new_entry);
        return -1;
    }
    memcpy(new_entry->rebindings, rebindings, sizeof(struct rebinding) * nel);
    new_entry->rebindings_nel = nel;
    new_entry->next = *rebindings_head;
    *rebindings_head = new_entry;
    return 0;
}

The central piece is the rebindings_entry data structure, which holds the rebindings to be applied later:

struct rebindings_entry {
    struct rebinding *rebindings;   // array of rebindings in this node
    size_t rebindings_nel;          // number of elements in the array
    struct rebindings_entry *next;  // next node in the list
};
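
The `!_rebindings_head->next` test in rebind_symbols relies on this list shape: right after the first prepend, the head is the only node and its next pointer is NULL. The sketch below reproduces the prepend logic with hypothetical toy_ names so it can be compiled without fishhook. It is an illustration of the list behaviour, not the library's code.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirrors of struct rebinding / struct rebindings_entry. */
struct toy_rebinding { const char *name; void *replacement; void **replaced; };
struct toy_entry {
    struct toy_rebinding *rebindings;
    size_t nel;
    struct toy_entry *next;
};

/* Copy `nel` rebindings into a new node and push it onto the front of the list. */
static int toy_prepend(struct toy_entry **head, const struct toy_rebinding r[], size_t nel) {
    struct toy_entry *e = malloc(sizeof *e);
    if (!e) {
        return -1;
    }
    e->rebindings = malloc(sizeof *r * nel);
    if (!e->rebindings) {
        free(e);
        return -1;
    }
    memcpy(e->rebindings, r, sizeof *r * nel);
    e->nel = nel;
    e->next = *head;  /* the old head becomes the second node */
    *head = e;
    return 0;
}

/* True only when exactly one batch has been registered so far. */
static int toy_is_first_call(const struct toy_entry *head) {
    return head != NULL && head->next == NULL;
}
```

Because new nodes go to the front and the matching loop walks from the head, a later registration for the same symbol name is found before an earlier one.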

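_dyld_register_func_for_add_image is the key function in this code. It registers a callback with dyld: dyld first invokes the callback once for every image that is already loaded, and afterwards invokes it for each image that gets loaded later. This is how fishhook gets to see every image in the process.

The callback is _rebind_symbols_for_image, which parses each image. In fishhook.c it is a thin wrapper that calls rebind_symbols_for_image(_rebindings_head, header, slide), the function examined next.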
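
The callback semantics explain why rebind_symbols only registers once and rescans by hand on later calls. The toy model below imitates those semantics in portable C. It is not dyld: all toy_ names are hypothetical, images are plain integers, and only a single callback is supported.

```c
#include <stddef.h>

#define TOY_MAX_IMAGES 8

typedef void (*toy_image_cb)(int image_id);

static int toy_images[TOY_MAX_IMAGES];  /* images "loaded" so far */
static size_t toy_image_count;
static toy_image_cb toy_callback;       /* the single registered callback */

/* Register `cb`: it is called for every existing image right away. */
static void toy_register_func_for_add_image(toy_image_cb cb) {
    toy_callback = cb;
    for (size_t i = 0; i < toy_image_count; i++) {
        cb(toy_images[i]);
    }
}

/* "Load" an image: record it and notify the callback, if any. */
static int toy_load_image(int image_id) {
    if (toy_image_count == TOY_MAX_IMAGES) {
        return -1;
    }
    toy_images[toy_image_count++] = image_id;
    if (toy_callback) {
        toy_callback(image_id);
    }
    return 0;
}

/* Example callback that just remembers which images it was shown. */
static int toy_seen[TOY_MAX_IMAGES];
static size_t toy_seen_count;
static void toy_record_image(int image_id) {
    if (toy_seen_count < TOY_MAX_IMAGES) {
        toy_seen[toy_seen_count++] = image_id;
    }
}
```

If two images are loaded before registration and one after, toy_record_image sees all three: the first two during registration, the third when it is loaded.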

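rebind_symbols_for_image in detail

Before reading this code, some background helps. For more detail, see my other article, mach-o parser.

In a 64-bit Mach-O file, the header is followed by the load commands. Each command records its type and where the related data lives. The ones used here are:

  1. LC_SEGMENT_64 (__LINKEDIT): the segment command for the segment that holds the dynamic linking data. fishhook uses it to compute a base address: base_address = vmaddr - file_offset + slide. The slide is the offset chosen at load time so that the address space is randomized. For the example binary in the figure below: base_address = 0x10000C000 - 0x00000C000 + 0 (the slide is 0 in static analysis) = 0x100000000.

  2. LC_SYMTAB: locates the symbol table and the string table. The symbol table holds a detailed record per symbol, and fishhook uses each record to find the symbol's name in the string table. The start addresses are symbol_addr = base_address + symbol_table_offset and string_addr = base_address + string_table_offset. For the example in the figure below: symbol_addr = 0x100000000 + 0x00000C4D8 = 0x10000C4D8 and string_addr = 0x100000000 + 0x00000D02C = 0x10000D02C.

  3. LC_DYSYMTAB: describes the dynamic linking information. fishhook uses it to compute the address of the Dynamic Symbol Table, specifically its indirect symbol table. An entry in that table gives the index of the function's record in the symbol table, which in turn leads to the function name. The start address is dynamic_symbol_addr = base_address + indsym_table_offset. For the example in the figure below: dynamic_symbol_addr = 0x100000000 + 0x00000CF78 = 0x10000CF78.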
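
The three formulas can be checked with a few lines of C. This is a worked example using the numbers quoted above; toy_tables and toy_locate_tables are hypothetical names, and the addresses are treated as plain 64-bit integers rather than pointers.

```c
#include <stdint.h>

/* Start addresses derived from the __LINKEDIT, LC_SYMTAB and LC_DYSYMTAB commands. */
struct toy_tables {
    uint64_t linkedit_base;
    uint64_t symtab;
    uint64_t strtab;
    uint64_t indirect_symtab;
};

static struct toy_tables toy_locate_tables(uint64_t linkedit_vmaddr, uint64_t linkedit_fileoff,
                                           int64_t slide, uint32_t symoff, uint32_t stroff,
                                           uint32_t indirectsymoff) {
    struct toy_tables t;
    /* base_address = vmaddr - file_offset + slide */
    t.linkedit_base = linkedit_vmaddr - linkedit_fileoff + (uint64_t)slide;
    /* every table is located by adding its file offset to the base */
    t.symtab = t.linkedit_base + symoff;
    t.strtab = t.linkedit_base + stroff;
    t.indirect_symtab = t.linkedit_base + indirectsymoff;
    return t;
}
```

With vmaddr 0x10000C000, file offset 0xC000, slide 0 and the offsets 0xC4D8, 0xD02C and 0xCF78, the function returns the addresses given in the list: 0x100000000, 0x10000C4D8, 0x10000D02C and 0x10000CF78.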

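The code below locates the start addresses of the symbol table, the string table, and the dynamic (indirect) symbol table, and then hands them to perform_rebinding_with_section, which does the actual replacement.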

static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                     const struct mach_header *header,
                                     intptr_t slide) {
    // rebindings is the linked list holding every requested rebinding
    Dl_info info;
    if (dladdr(header, &info) == 0) {
        return;
    }

    // Walk the load commands to find the three we need:
    // the __LINKEDIT segment command, LC_SYMTAB, and LC_DYSYMTAB
    segment_command_t *cur_seg_cmd;
    segment_command_t *linkedit_segment = NULL;
    struct symtab_command* symtab_cmd = NULL;
    struct dysymtab_command* dysymtab_cmd = NULL;

    // For reference, the 64-bit header (from <mach-o/loader.h>) is laid out as:
    //
    //   struct mach_header_64 {
    //       uint32_t      magic;       /* mach magic number identifier */
    //       cpu_type_t    cputype;     /* cpu specifier */
    //       cpu_subtype_t cpusubtype;  /* machine specifier */
    //       uint32_t      filetype;    /* type of file */
    //       uint32_t      ncmds;       /* number of load commands */
    //       uint32_t      sizeofcmds;  /* the size of all the load commands */
    //       uint32_t      flags;       /* flags */
    //       uint32_t      reserved;    /* reserved */
    //   };
    //
    // The load commands start immediately after the header
    uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t);
    for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
        cur_seg_cmd = (segment_command_t *)cur;
        if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
            // Segment command: keep it if it describes __LINKEDIT
            if (strcmp(cur_seg_cmd->segname, SEG_LINKEDIT) == 0) {
                linkedit_segment = cur_seg_cmd;
            }
        } else if (cur_seg_cmd->cmd == LC_SYMTAB) {
            // LC_SYMTAB command
            symtab_cmd = (struct symtab_command*)cur_seg_cmd;
        } else if (cur_seg_cmd->cmd == LC_DYSYMTAB) {
            // LC_DYSYMTAB command
            dysymtab_cmd = (struct dysymtab_command*)cur_seg_cmd;
        }
    }

    // Bail out if any of them is missing or there are no indirect symbols
    if (!symtab_cmd || !dysymtab_cmd || !linkedit_segment ||
        !dysymtab_cmd->nindirectsyms) {
        return;
    }

    // Find base symbol/string table addresses:
    // base = __LINKEDIT vmaddr - fileoff + slide; the tables are located relative to it
    uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;
    // symbol table address = linkedit_base + symbol table offset
    nlist_t *symtab = (nlist_t *)(linkedit_base + symtab_cmd->symoff);
    // string table address = linkedit_base + string table offset
    char *strtab = (char *)(linkedit_base + symtab_cmd->stroff);

    // Get indirect symbol table (array of uint32_t indices into symbol table)
    // indirect symbol table address = linkedit_base + indirect symbol table offset
    uint32_t *indirect_symtab = (uint32_t *)(linkedit_base + dysymtab_cmd->indirectsymoff);

    // Walk the load commands again, this time looking for the lazy and non-lazy
    // symbol pointer sections inside the __DATA and __DATA_CONST segments
    cur = (uintptr_t)header + sizeof(mach_header_t);
    for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
        cur_seg_cmd = (segment_command_t *)cur;
        if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
            if (strcmp(cur_seg_cmd->segname, SEG_DATA) != 0 &&
                strcmp(cur_seg_cmd->segname, SEG_DATA_CONST) != 0) {
                continue;
            }
            for (uint j = 0; j < cur_seg_cmd->nsects; j++) {
                section_t *sect =
                    (section_t *)(cur + sizeof(segment_command_t)) + j;
                // Rebind in sections of lazy or non-lazy symbol pointer type
                if ((sect->flags & SECTION_TYPE) == S_LAZY_SYMBOL_POINTERS) {
                    perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
                }
                if ((sect->flags & SECTION_TYPE) == S_NON_LAZY_SYMBOL_POINTERS) {
                    perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
                }
            }
        }
    }
}
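
Both loops above use the same idiom: load commands have different sizes, so the cursor advances by each command's own cmdsize rather than by a fixed stride. The sketch below isolates that idiom on a plain byte buffer. The toy_ names are hypothetical, the constants mirror the LC_* values in <mach-o/loader.h>, and the bounds checks are an addition for safety that the fishhook loop does not perform.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Every load command starts with these two fields. */
struct toy_load_command { uint32_t cmd; uint32_t cmdsize; };

enum { TOY_LC_SYMTAB = 0x2, TOY_LC_DYSYMTAB = 0xb, TOY_LC_SEGMENT_64 = 0x19 };

/* Test helper: write a command header at `off`; the caller guarantees room.
 * Returns the offset at which the next command starts. */
static size_t toy_append_command(unsigned char *buf, size_t off, uint32_t cmd, uint32_t cmdsize) {
    struct toy_load_command lc = { cmd, cmdsize };
    memcpy(buf + off, &lc, sizeof lc);
    return off + cmdsize;
}

/* Byte offset of the first command of type `wanted`, or -1 if absent or malformed. */
static long toy_find_command(const unsigned char *cmds, uint32_t ncmds, size_t total,
                             uint32_t wanted) {
    size_t off = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        struct toy_load_command lc;
        if (off + sizeof lc > total) {
            return -1;
        }
        memcpy(&lc, cmds + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > total - off) {
            return -1;  /* a bad size would send the cursor out of bounds */
        }
        if (lc.cmd == wanted) {
            return (long)off;
        }
        off += lc.cmdsize;  /* variable stride: each command states its own size */
    }
    return -1;
}
```

For a buffer holding a 24-byte segment command, a 16-byte LC_SYMTAB and an 8-byte LC_DYSYMTAB, the search finds LC_SYMTAB at offset 24 and LC_DYSYMTAB at offset 40.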

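perform_rebinding_with_section

This function looks up each pointer's symbol name. If the name matches one of the functions to be replaced, it overwrites the pointer with the replacement implementation.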

static void perform_rebinding_with_section(struct rebindings_entry *rebindings,
                                           section_t *section,
                                           intptr_t slide,
                                           nlist_t *symtab,
                                           char *strtab,
                                           uint32_t *indirect_symtab) {
    // reserved1 is this section's starting index in the indirect symbol table,
    // an array whose elements are indexes into the symbol table
    uint32_t *indirect_symbol_indices = indirect_symtab + section->reserved1;
    // Address of the section's data in memory (__DATA,__got holds the non-lazy
    // pointers, __DATA,__la_symbol_ptr the lazy ones). The section is an array of
    // function pointers; the goal is to overwrite the matching pointers
    void **indirect_symbol_bindings = (void **)((uintptr_t)slide + section->addr);
    // The two arrays are parallel: element i of one corresponds to element i of
    // the other. The element count is the section size divided by the pointer size
    for (uint i = 0; i < section->size / sizeof(void *); i++) {
        // Index of this pointer's symbol in the symbol table
        uint32_t symtab_index = indirect_symbol_indices[i];
        if (symtab_index == INDIRECT_SYMBOL_ABS || symtab_index == INDIRECT_SYMBOL_LOCAL ||
            symtab_index == (INDIRECT_SYMBOL_LOCAL | INDIRECT_SYMBOL_ABS)) {
            continue;
        }
        // The nlist entry stores the symbol name's offset in the string table
        nlist_t symtablist = symtab[symtab_index];
        uint32_t strtab_offset = symtablist.n_un.n_strx;
        // The offset leads to the symbol name
        char *symbol_name = strtab + strtab_offset;
        // Skip names too short to have anything after the leading underscore
        if (strnlen(symbol_name, 2) < 2) {
            continue;
        }
        struct rebindings_entry *cur = rebindings;
        while (cur) {
            // Check whether any requested rebinding matches this symbol name
            for (uint j = 0; j < cur->rebindings_nel; j++) {
                // &symbol_name[1] skips the leading underscore of the C symbol
                if (strcmp(&symbol_name[1], cur->rebindings[j].name) == 0) {
                    if (cur->rebindings[j].replaced != NULL &&
                        indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
                        // Save the original implementation
                        *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
                    }
                    // Overwrite the pointer with the replacement
                    indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
                    goto symbol_loop;
                }
            }
            cur = cur->next;
        }
    symbol_loop:;
    }
}
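
The matching and replacement logic can be exercised without a Mach-O image. In the sketch below, a small array of function pointers plays the role of the pointer section, and a parallel array of names stands in for the result of the indirect table, symbol table and string table lookups. All toy_ names are hypothetical, and the slots use a typed function pointer instead of void * to keep the example strictly portable.

```c
#include <stddef.h>
#include <string.h>

typedef int (*toy_fn)(int);

struct toy_rebinding {
    const char *name;    /* symbol name without the leading underscore */
    toy_fn replacement;  /* new implementation */
    toy_fn *replaced;    /* out: receives the original implementation */
};

static int toy_double(int x) { return 2 * x; }
static int toy_negate(int x) { return -x; }

static toy_fn toy_orig_double;  /* filled in by the rebinding */
static int toy_hooked_double(int x) { return toy_orig_double(x) + 1; }

/* Patch every slot whose symbol name matches a rebinding; returns the number patched. */
static int toy_rebind_slots(toy_fn slots[], const char *const names[], size_t nslots,
                            const struct toy_rebinding rebindings[], size_t nel) {
    int patched = 0;
    for (size_t i = 0; i < nslots; i++) {
        const char *symbol_name = names[i];
        if (symbol_name[0] == '\0' || symbol_name[1] == '\0') {
            continue;  /* nothing after the leading underscore */
        }
        for (size_t j = 0; j < nel; j++) {
            if (strcmp(symbol_name + 1, rebindings[j].name) != 0) {
                continue;
            }
            /* Save the original, unless the slot already holds the replacement. */
            if (rebindings[j].replaced != NULL && slots[i] != rebindings[j].replacement) {
                *rebindings[j].replaced = slots[i];
            }
            slots[i] = rebindings[j].replacement;
            patched++;
            break;
        }
    }
    return patched;
}
```

The guard on `slots[i] != replacement` matters when the same rebinding is applied twice: without it, the second pass would store the hook itself as the "original", and the hook would then call itself forever.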

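Breaking the replacement down into steps:

  1. Find the section headers for __nl_symbol_ptr and __la_symbol_ptr. __nl_symbol_ptr holds pointers that are bound as soon as the image is loaded; __la_symbol_ptr holds pointers that are bound lazily.
    The corresponding data structure is:

    struct section_64 { /* for 64-bit architectures */
        char sectname[16];   /* name of this section */
        char segname[16];    /* segment this section goes in */
        uint64_t addr;       /* memory address of this section */
        uint64_t size;       /* size in bytes of this section */
        uint32_t offset;     /* file offset of this section */
        uint32_t align;      /* section alignment (power of 2) */
        uint32_t reloff;     /* file offset of relocation entries */
        uint32_t nreloc;     /* number of relocation entries */
        uint32_t flags;      /* flags (section type and attributes)*/
        uint32_t reserved1;  /* reserved (for offset or index) */
        uint32_t reserved2;  /* reserved (for count or sizeof) */
        uint32_t reserved3;  /* reserved */
    };

    The section header for __nl_symbol_ptr looks like this:

    The section header for __la_symbol_ptr looks like this:

  2. The section header gives the section's address in the __DATA segment. For example, the __la_symbol_ptr section in the figure above starts at 0x100008018, which lies in __DATA,__la_symbol_ptr.

    Each entry holds the address of a function's implementation. For _printf, for instance, the entry is 0x100006BDC, which points into __stub_helper; in other words, calls to _printf go through __stub_helper. Overwriting this entry makes calls go to your own function instead. In the code, the start of this array is assigned to indirect_symbol_bindings (a void **).

  3. From the section header, find where this section's entries start in the Indirect Symbols table: indirect_symbol_indices = indirect_symtab + section->reserved1.

    In the figure above the starting index is 24, so go to entry 24 of Indirect Symbols (counting from 0).

    From that starting position on, the Indirect Symbols entries correspond one-to-one with the pointers located in the __DATA segment in step 2. In the figure below the first entry is NSStringFromClass, and the first entry in the figure above is NSStringFromClass as well.

  4. Each Indirect Symbols entry from step 3 is an index into the symbol table. For printf in the figure below, the value is 0x000000A6, which is 166 in decimal.

    The symbol table starts at 0x10000C4D8, as computed earlier. It is an array, so take the element at index 166:

  5. Take the string table offset from the entry found in step 4. In the figure above, _printf has offset 0x000002CC and the string table starts at 0x10000D02C, so the name is stored at 0x10000D2F8. That yields the function name. If it matches a requested name, the pointer found in step 2 is overwritten with the supplied replacement, and the original address is saved.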
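
The addresses in steps 3 to 5 can be recomputed from the table start addresses and the element sizes. The sketch below assumes a 64-bit image, where an indirect symbol table entry is a 4-byte uint32_t and a symbol table entry (struct nlist_64) is 16 bytes. The toy_ function names are hypothetical.

```c
#include <stdint.h>

/* Address of indirect symbol table entry `slot` of a section starting at `reserved1`. */
static uint64_t toy_indirect_entry_addr(uint64_t indirect_base, uint32_t reserved1, uint32_t slot) {
    return indirect_base + 4ULL * ((uint64_t)reserved1 + slot);  /* uint32_t entries */
}

/* Address of symbol table entry `index`; each struct nlist_64 is 16 bytes. */
static uint64_t toy_nlist64_addr(uint64_t symtab_base, uint32_t index) {
    return symtab_base + 16ULL * index;
}

/* Address of a symbol name, given the n_strx offset from its nlist entry. */
static uint64_t toy_name_addr(uint64_t strtab_base, uint32_t n_strx) {
    return strtab_base + n_strx;
}
```

With the example values, the section's first indirect entry (index 24) is at 0x10000CF78 + 24 * 4 = 0x10000CFD8, symbol 166 is at 0x10000C4D8 + 166 * 16 = 0x10000CF38, and the name at offset 0x2CC is at 0x10000D02C + 0x2CC = 0x10000D2F8, matching step 5.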

For the name lookup, see the figure from the official README:

[Figure: Visual explanation]
