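Introduction
When monitoring network performance, you need to observe network state. Objective-C code can be hooked through the runtime, but some of the calls involved are plain C functions, so how do you hook those? After some research, fishhook turned out to solve this problem.
Official README: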
fishhook
fishhook is a very simple library that enables dynamically rebinding symbols in Mach-O binaries running on iOS in the simulator and on device. This provides functionality that is similar to using DYLD_INTERPOSE on OS X. At Facebook, we’ve found it useful as a way to hook calls in libSystem for debugging/tracing purposes (for example, auditing for double-close issues with file descriptors).
Usage
Once you add fishhook.h/fishhook.c to your project, you can rebind symbols as follows:
Sample output
How it works
dyld binds lazy and non-lazy symbols by updating pointers in particular sections of the __DATA segment of a Mach-O binary. fishhook re-binds these symbols by determining the locations to update for each of the symbol names passed to rebind_symbols and then writing out the corresponding replacements.
For a given image, the __DATA segment may contain two sections that are relevant for dynamic symbol bindings: __nl_symbol_ptr and __la_symbol_ptr. __nl_symbol_ptr is an array of pointers to non-lazily bound data (these are bound at the time a library is loaded) and __la_symbol_ptr is an array of pointers to imported functions that is generally filled by a routine called dyld_stub_binder during the first call to that symbol (it's also possible to tell dyld to bind these at launch). In order to find the name of the symbol that corresponds to a particular location in one of these sections, we have to jump through several layers of indirection. For the two relevant sections, the section headers (struct sections from <mach-o/loader.h>) provide an offset (in the reserved1 field) into what is known as the indirect symbol table. The indirect symbol table, which is located in the __LINKEDIT segment of the binary, is just an array of indexes into the symbol table (also in __LINKEDIT) whose order is identical to that of the pointers in the non-lazy and lazy symbol sections. So, given struct section nl_symbol_ptr, the corresponding index in the symbol table of the first address in that section is indirect_symbol_table[nl_symbol_ptr->reserved1]. The symbol table itself is an array of struct nlists (see <mach-o/nlist.h>), and each nlist contains an index into the string table in __LINKEDIT, which is where the actual symbol names are stored. So, for each pointer in __nl_symbol_ptr and __la_symbol_ptr, we are able to find the corresponding symbol and then the corresponding string to compare against the requested symbol names, and if there is a match, we replace the pointer in the section with the replacement.
The process of looking up the name of a given entry in the lazy or non-lazy pointer tables looks like this:
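Source code walkthrough
Calling it:
Usage is simple: to hook printf, pass rebind_symbols a rebinding whose name is "printf" and whose replacement is the custom printf implementation, which then runs in place of the original.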
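The binding function is declared as int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel).
The call is straightforward: the argument is an array of struct rebinding, and each element holds three fields: name (the symbol to hook), replacement (a pointer to the new implementation), and replaced (a location that receives the address of the original implementation).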
rebind_symbols in detail
This is the public entry point, used to rebind implementations.
prepend_rebindings does the initialization work: it builds the required data structure and assigns it to _rebindings_head, which ends up as the head of a linked list.
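The most important piece is the rebindings_entry data structure, which keeps the data needed for later binding: a pointer to the array of rebindings, the number of elements in it, and a pointer to the next entry.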
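_dyld_register_func_for_add_image is the key function in this code. It registers a callback: once it has been called, the callback is first invoked for every image that is already loaded, and afterwards it is invoked again for each newly loaded image. This makes it possible to observe every image as it is loaded.
The callback then calls _rebind_symbols_for_image to parse the image data.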
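rebind_symbols_for_image in detail
Before looking at this code, some background is needed; for more detail, see my other article, mach-o parser.
After parsing the 64-bit Mach header, the program walks the load commands (segment commands). Each command records its type and the location of the related data. The commands used here are the following.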
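LC_SEGMENT_64(__LINKEDIT): the segment command for dynamic linking data. The program mainly uses this segment to compute the base address: base_address = vmaddress - file_offset + slide (the slide is a dynamic offset computed at run time so that the address space is randomized). For the binary in the figure below, the base address is:
base_address = 0x10000C000 - 0x00000C000 + 0 (the slide is 0 in static analysis) = 0x100000000
LC_SYMTAB: the command that locates the symbol table and the string table. The symbol table records more detailed data about each function; the program mainly uses it to find the actual function name in the string table. The program uses this command to locate the start addresses of both tables: symbol_addr = base_address + symbol_table_offset and string_addr = base_address + string_table_offset. For the binary in the figure below: symbol_addr = 0x100000000 + 0x00000C4D8 = 0x10000C4D8 and string_addr = 0x100000000 + 0x00000D02C = 0x10000D02C
LC_DYSYMTAB: the command that records dynamic linking information. The program mainly uses it to compute the address of the Dynamic Symbol Table. With that address, the entry for a given function in the symbol table can be located, and from there its name. The start address is dynamic_symbol_addr = base_address + indsym_table_offset. For the binary in the figure below: dynamic_symbol_addr = 0x100000000 + 0x0000CF78 = 0x10000CF78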
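This function mainly locates the start addresses of the symbol table, the string table, and the dynamic symbol table, and finally hands them to perform_rebinding_with_section, which does the actual replacement.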
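perform_rebinding_with_section
This code walks the symbol table data. If a symbol's name matches the name of a function to be replaced, its implementation is swapped for the replacement.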
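The replacement breaks down into the following steps:
1. Find the section load commands for __nl_symbol_ptr and __la_symbol_ptr. __nl_symbol_ptr holds the symbols that are bound when the program starts, while __la_symbol_ptr holds the lazily bound ones. The corresponding data structure is:
    struct section_64 { /* for 64-bit architectures */
        char sectname[16];  /* name of this section */
        char segname[16];   /* segment this section goes in */
        uint64_t addr;      /* memory address of this section */
        uint64_t size;      /* size in bytes of this section */
        uint32_t offset;    /* file offset of this section */
        uint32_t align;     /* section alignment (power of 2) */
        uint32_t reloff;    /* file offset of relocation entries */
        uint32_t nreloc;    /* number of relocation entries */
        uint32_t flags;     /* flags (section type and attributes) */
        uint32_t reserved1; /* reserved (for offset or index) */
        uint32_t reserved2; /* reserved (for count or sizeof) */
        uint32_t reserved3; /* reserved */
    };
The section load command for __nl_symbol_ptr is as follows:
The section load command for __la_symbol_ptr is as follows:
2. Use this load command to locate the section's address in the __DATA segment. In the figure above, for example, the __la_symbol_ptr section starts at 0x100008018, which lies in __DATA,__la_symbol_ptr. Each entry records the address of the actual function implementation. For the _printf function, for example, the recorded address is 0x100006BDC, which points into __stub_helper; in other words, _printf is called through __stub_helper. Once this address is replaced, calls go to your own function. In the program this address is assigned to **indirect_symbol_bindings.
3. Use the section load command to locate where this section starts in Indirect Symbols. The formula is *indirect_symbol_indices = indirect_symtab + section->reserved1. In the figure above the starting index is 24.
4. Find entry 24 in Indirect Symbols (counting from 0). Once the starting position is known, the entries in the Indirect Symbols table correspond one to one with the addresses in the __DATA segment found earlier. In the figure below the first entry is NSStringFromClass, and the first entry in the figure above is also NSStringFromClass.
5. Use the data from the Indirect Symbols table in step 3 to find the corresponding index in the Symbol table. In the figure below, printf maps to 0xA6, which is 166 in decimal. The Symbol table start address computed earlier is 0x10000C4D8; the table is an array, so look up the entry at index 166:
6. From the result of step 5, read the offset into the String table. In the figure above the offset of _printf is 0x2CC and the String table starts at 0x10000D02C, so the name is stored at 0x10000D2F8. This gives the function name. If it matches the name passed in, the implementation address obtained in step 2 is replaced with the replacement that was passed in, and the original address is saved.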
For the name lookup, refer to the diagram in the official README: