Introduction[]
finsig_thumb2 is a tool used to automatically identify functions and variables in Digic 6 firmware dumps. It serves the same role as the original Signature finder, but aside from some shared utility code it is a completely new implementation, because Digic 6 uses the thumb2 instruction set.
Configuration[]
You must configure the capstone library as described in Digic 6 Porting.
Usage[]
Once configured, usage is the same as for the original sig finder. Build with OPT_GEN_STUBS=1 and the firmware PRIMARY.BIN either in the source tree or pointed to by PRIMARY_ROOT.
It is normal for finsig_thumb2 to produce some warnings, and in general they are not a problem. If a warning refers to a specific function match, inspecting the reported addresses may help you find the function manually.
Note: if you see "WARNING! Incorrect disassembly is likely", it means your capstone library has not been correctly patched. This must be fixed, or addresses reported by finsig_thumb2 will be incorrect.
Development[]
Overview[]
finsig_thumb2 consists of two major components:
- firmware_load_ng.c is a generic library for analyzing firmware dumps with capstone. The public functions are mostly documented in firmware_load_ng.h.
- finsig_thumb2.c provides the methods to identify specific firmware functions and variables.
The main purpose of finsig_thumb2 is to find functions, variables and constants and output them to stubs_entry.S. The items to be found are defined in sig_names (for functions) and misc_vals (for most other values).
Additional functions used to find required functions and aid reverse engineering are found automatically and added to sig_names.
Unlike the original finsig_dryos, matching is done by executing defined rules and hard-coded bootstrap functions in the order specified in code.
Also unlike finsig_dryos, finsig_thumb2 only supports one match per function, and does not score partial matches.
Because the variable instruction size and alignment of thumb2 code makes searching for particular instruction sequences inefficient, full searches of the firmware code are avoided as much as possible.
Disassembly basics[]
Disassembly is done using functions that operate on an iter_state_t structure, typically held in a variable named "is" in the code. This structure encapsulates:
- The most recently disassembled instruction (insn member)
- Current address
- Current ARM/thumb state
- A history of recently disassembled addresses, to allow back-tracking which would otherwise be unreliable for variable instruction size thumb2 code.
- Capstone disassembly state
To analyze firmware code, the state is initialized to a specific address with disasm_iter_init (or disasm_iter_set). Each subsequent call to disasm_iter disassembles one instruction and advances the current address by the size of the instruction.
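The iteration model can be illustrated with a small self-contained sketch. Everything here (toy_iter_t, toy_iter_init, toy_iter_next, toy_iter_hist_adr) is a hypothetical stand-in written for this page, not the real iter_state_t or firmware_load_ng API, and no capstone is involved: the only "decoding" done is the Thumb-2 rule that a first halfword whose top five bits are 0b11101, 0b11110 or 0b11111 starts a 32-bit instruction. It shows why a history of decoded addresses is kept: with mixed 2 and 4 byte instructions, the previous instruction's address cannot be recovered by simply subtracting a fixed size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only, NOT the real iter_state_t. */
#define TOY_HIST_LEN 8

typedef struct {
    const uint8_t *buf;          /* toy "firmware" bytes */
    size_t len;                  /* size of buf in bytes */
    uint32_t base;               /* address of buf[0] */
    uint32_t adr;                /* address of the next instruction to decode */
    uint32_t insn_adr;           /* address of the most recently decoded instruction */
    unsigned insn_size;          /* size of that instruction: 2 or 4 bytes */
    uint32_t hist[TOY_HIST_LEN]; /* ring buffer of decoded instruction addresses */
    unsigned hist_count;         /* total number of instructions decoded so far */
} toy_iter_t;

/* Position the iterator at adr. Nothing is decoded yet. */
void toy_iter_init(toy_iter_t *is, const uint8_t *buf, size_t len,
                   uint32_t base, uint32_t adr)
{
    assert(is != NULL && buf != NULL);
    is->buf = buf;
    is->len = len;
    is->base = base;
    is->adr = adr;
    is->insn_adr = 0;
    is->insn_size = 0;
    is->hist_count = 0;
}

/* "Decode" one instruction and advance by its size.
   Returns 1 on success, 0 if the address is outside the buffer. */
int toy_iter_next(toy_iter_t *is)
{
    if (is->adr < is->base) return 0;
    size_t off = is->adr - is->base;
    if (off + 2 > is->len) return 0;
    uint16_t hw = (uint16_t)(is->buf[off] | (is->buf[off + 1] << 8));
    /* Top five bits 0b11101..0b11111 mark the first half of a 32-bit encoding. */
    unsigned size = ((hw >> 11) >= 0x1D) ? 4 : 2;
    if (off + size > is->len) return 0;
    is->insn_adr = is->adr;
    is->insn_size = size;
    is->hist[is->hist_count % TOY_HIST_LEN] = is->adr;
    is->hist_count++;
    is->adr += size;
    return 1;
}

/* Address decoded n steps ago (0 = most recent), or 0 if not in the history. */
uint32_t toy_iter_hist_adr(const toy_iter_t *is, unsigned n)
{
    if (n >= is->hist_count || n >= TOY_HIST_LEN) return 0;
    return is->hist[(is->hist_count - 1 - n) % TOY_HIST_LEN];
}
```

A caller would initialize the state once and then call toy_iter_next in a loop, inspecting insn_adr and insn_size after each successful call, which mirrors the disasm_iter_init followed by repeated disasm_iter pattern described above.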
The loaded firmware dump and information about it are encapsulated in the firmware structure, typically called fw.
Capstone cs_insn[]
Most analysis is done using the cs_insn structure which is populated by capstone disassembly. The capstone headers capstone.h and arm.h can be helpful to understand the fields and constants.
Matches[]
Most values are identified using "match rules" defined in arrays of sig_rule_t. sig_rules_initial defines matches used to bootstrap eventprocs and task identification.
After sig_rules_initial is processed, find_generic_funcs identifies eventprocs and tasks.
sig_rules_main defines matches for the remaining values.
Rules[]
Rules consist of
- A match function
- The name of the value to find
- A reference string whose meaning is defined by the match function, typically either a function name or firmware string
- options passed to the match function
- options to restrict the rule to particular DryOS versions
Match functions identify the target value and add it to sig_names or misc_vals. Most only add the function in the name field, but a few have side effects where multiple related values are obtained from the same firmware code.
Match functions may be generic, using the name and reference string to identify the target value, or specific to a particular function.
Rules in each list are called in the order they appear.
By convention, specific match functions are named sig_match_...
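The shape of a rule list and the effect of processing rules in order can be sketched as follows. This is a simplified illustration under stated assumptions: toy_rule_t, toy_run_rules and the two toy match functions are hypothetical names invented here, the field layout is not that of the real sig_rule_t, and a version limit of 0 is assumed to mean "no limit".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical rule layout for illustration; the real sig_rule_t differs. */
typedef struct toy_rule toy_rule_t;
typedef int (*toy_match_fn)(const toy_rule_t *rule);

struct toy_rule {
    toy_match_fn match; /* match function; NULL terminates the list */
    const char *name;   /* name of the value to find */
    const char *ref;    /* reference string, meaning defined by the match function */
    int dryos_min;      /* lowest DryOS version the rule applies to, 0 = no limit */
    int dryos_max;      /* highest DryOS version the rule applies to, 0 = no limit */
};

/* Toy replacement for sig_names: just a list of names found so far. */
#define TOY_MAX_FOUND 16
static const char *toy_found[TOY_MAX_FOUND];
static int toy_found_count;

void toy_reset(void) { toy_found_count = 0; }

int toy_is_found(const char *name)
{
    if (!name) return 0;
    for (int i = 0; i < toy_found_count; i++)
        if (strcmp(toy_found[i], name) == 0) return 1;
    return 0;
}

static int toy_add_found(const char *name)
{
    assert(name != NULL);
    if (toy_found_count >= TOY_MAX_FOUND) return 0;
    toy_found[toy_found_count++] = name;
    return 1;
}

/* Generic toy match: succeeds only if the function named by ref is already known. */
int toy_match_named(const toy_rule_t *rule)
{
    return toy_is_found(rule->ref) ? toy_add_found(rule->name) : 0;
}

/* Toy bootstrap match that needs no prior knowledge. */
int toy_match_always(const toy_rule_t *rule)
{
    return toy_add_found(rule->name);
}

/* Run every applicable rule in list order; returns the number of matches. */
int toy_run_rules(const toy_rule_t *rules, int dryos_ver)
{
    int matched = 0;
    for (const toy_rule_t *r = rules; r->match; r++) {
        if (r->dryos_min && dryos_ver < r->dryos_min) continue;
        if (r->dryos_max && dryos_ver > r->dryos_max) continue;
        matched += r->match(r);
    }
    return matched;
}
```

Because a rule that depends on another function can only succeed once that function has been found, a rule placed before the rule it depends on fails in this model. That is the practical consequence of rules being called in the order they appear.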
Generic rules[]
The most common generic rules are sig_match_named and sig_match_near_str.
sig_match_named[]
The reference string for sig_match_named is the name of an already known function. The options (SIG_NAMED... macros) define whether the target is simply an alias for an eventproc (no options) or a call from the named function.
sig_match_near_str[]
The reference string is a string referenced by firmware code, where the target function is assumed to be "near" the place where the string is referenced. The SIG_NEAR... macros are used to define where the function is found in relation to the string reference.
sig_rule match functions overview[]
Match functions are passed:
- Firmware data object fw required for calls to most analysis functions
- An iter_state_t state, is, to use for disassembling with capdis
- The rule object rule for the function name, flags and reference string
On success, match functions should add the target(s) using save_sig_with_j (for functions) or save_misc_val (for other values) and return 1. On failure, they return 0.
Most match functions initialize is to the function named by the reference string using init_disasm_sig_ref, or to somewhere near the reference string using find_str_bytes and disasm_iter_init.
After initializing is, match functions disassemble forward using search and analysis functions to identify "signposts" (e.g. calls to known functions, references to known strings etc.) and the target value.
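The general flow of a match function (start at a known function, walk forward to a signpost, then take the target from what follows) can be sketched on a toy model. The sketch below does not use the real firmware or iter_state_t types: the "firmware" is an array of already decoded toy instructions, and toy_match_call_after and its types are hypothetical names made up for this illustration. The rule it implements, "the target is the first call after a call to a known signpost function", is just one plausible example of such logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy pre-decoded instruction; hypothetical, not capstone's cs_insn. */
typedef enum { TOY_OTHER, TOY_CALL, TOY_RET } toy_kind_t;

typedef struct {
    uint32_t adr;    /* address of the instruction */
    toy_kind_t kind; /* what kind of instruction it is */
    uint32_t target; /* call target, only meaningful for TOY_CALL */
} toy_code_t;

/* Starting at the instruction at address start, look for a call to signpost,
   then report the target of the next call after it.
   Returns 1 and stores the address in *result on success, 0 on failure. */
int toy_match_call_after(const toy_code_t *code, size_t count,
                         uint32_t start, uint32_t signpost, uint32_t *result)
{
    assert(code != NULL && result != NULL);
    size_t i = 0;
    /* Equivalent of initializing the iterator at the reference function. */
    while (i < count && code[i].adr != start) i++;

    int seen_signpost = 0;
    for (; i < count; i++) {
        if (code[i].kind == TOY_RET) break;      /* end of function: give up */
        if (code[i].kind != TOY_CALL) continue;  /* only calls are of interest */
        if (seen_signpost) {
            *result = code[i].target;            /* first call after the signpost */
            return 1;
        }
        if (code[i].target == signpost) seen_signpost = 1;
    }
    return 0;
}
```

A real match function would finish by passing the found address to save_sig_with_j or save_misc_val rather than returning it through a pointer.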
Existing match functions provide many examples, and new matches can often be pieced together by finding previous matches that do similar things. To understand how a match works, it's usually a good idea to try to follow the logic in the disassembly of a known firmware.
The firmware structure contains an iter_state, fw->is, which is used by some functions that require an additional temporary state, e.g. for backtracking. It can also be used as a temporary state in match code, but care must be taken not to call functions which also modify this state.
Analysis functions overview[]
Many of the basic analysis functions are briefly documented in firmware_load_ng.h.
The descriptions below are intended to provide a general overview of commonly used functions and jumping-off points into the code, not comprehensive documentation.
Disassembly control and instruction search functions[]
Most match functions involve starting disassembly at a known point and then attempting to match an expected sequence of instructions. The functions listed below control the disassembly process.
disasm_iter_init[]
Initializes the iter_state to a given address. This prepares for disassembly starting at the specified address, but does not disassemble anything. This is often used to "follow" a call using an address provided by get_branch_call_insn_target.
Note: the thumb bit specifies whether disassembly should be in arm or thumb mode.
init_disasm_sig_ref[]
Initializes the iter_state to the start of the named function.
disasm_iter[]
Disassemble and advance the iter_state_t by one instruction. After disassembling, the insn member can be used to get information about the current instruction. firmware_load_ng contains various helpers to identify specific classes of instructions and extract operands.
Returns 1 on success, 0 if disassembly failed.
find_next_sig_call[]
Disassemble up to the specified number of bytes (NOT instructions, unlike many other functions) looking for calls to an already identified firmware function.
The iter_state is left pointing to the call if found, or the last instruction analyzed.
Returns 1 on success, 0 if not found or disassembly failed.
insn_match_find_next[]
Disassemble up to the specified number of instructions looking for any of the instructions defined in the match argument.
Returns 1 on success, 0 if not found or disassembly failed.
insn_match_find_nth[]
As above, but find the Nth matching instruction.
insn_match_find_next_seq[]
Disassemble up to the specified number of instructions looking for the sequence of instructions defined in the match argument.
Returns 1 on success, 0 if not found or disassembly failed.
fw_search_insn[]
Call the specified callback for every instruction in the given address range. If disassembly fails, the address is advanced by 2 bytes (for thumb) or 4 (arm). Frequently used with search_disasm_const_ref to find a reference to a string.
Instruction matching[]
firmware_load_ng contains a variety of functions for identifying specific types of instructions.
Instruction match structure[]
Arrays of insn_match_t structures are used to define instruction matches for the insn_match_* functions.
Depending on the function called, the match may represent a sequence of instructions (for ...find_next_seq) or a list of alternative instructions.
Matches are defined using the MATCH_... macros from firmware_load_ng.h.
Each definition consists of:
- An instruction match, defined using the MATCH_OP or MATCH_OP_CC macros. The instruction match defines the instruction and number of operands to match. MATCH_OPCOUNT_IGNORE ignores all operands. MATCH_OPCOUNT_ANY matches any number of operands, but requires that all specified operands match.
- 0 or more operand matches, defined using the MATCH_OP_... macros. Operand matches can match by type (register, immediate etc) or specific register or value.
Match definitions end with ARM_INS_ENDING.
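The difference between treating a match array as a list of alternatives and as a sequence can be shown with a reduced model. The types and functions below (toy_match_t, toy_match_any, toy_find_seq and the TOY_... constants) are hypothetical and deliberately much simpler than insn_match_t and the MATCH_... macros: only the instruction id and operand count are compared, individual operand matches are left out, and TOY_OPCOUNT_IGNORE plays the role described for MATCH_OPCOUNT_IGNORE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction ids; 0 is reserved as the list terminator,
   in the same spirit as ending a real match list with ARM_INS_ENDING. */
typedef enum {
    TOY_INS_ENDING = 0,
    TOY_INS_MOV,
    TOY_INS_LDR,
    TOY_INS_ADD,
    TOY_INS_BL
} toy_ins_id;

#define TOY_OPCOUNT_IGNORE (-1) /* do not compare the operand count */

typedef struct { toy_ins_id id; int op_count; } toy_dis_t;   /* a decoded instruction */
typedef struct { toy_ins_id id; int op_count; } toy_match_t; /* one match definition */

/* Does a single instruction satisfy a single match definition? */
static int toy_match_one(const toy_dis_t *insn, const toy_match_t *m)
{
    if (insn->id != m->id) return 0;
    return m->op_count == TOY_OPCOUNT_IGNORE || m->op_count == insn->op_count;
}

/* Alternatives: the instruction matches if ANY definition in the list matches. */
int toy_match_any(const toy_dis_t *insn, const toy_match_t *alts)
{
    assert(insn != NULL && alts != NULL);
    for (; alts->id != TOY_INS_ENDING; alts++)
        if (toy_match_one(insn, alts)) return 1;
    return 0;
}

/* Sequence: find the first position where consecutive instructions match
   every definition in order. Returns the index, or -1 if not found. */
int toy_find_seq(const toy_dis_t *code, int count, const toy_match_t *seq)
{
    assert(code != NULL && seq != NULL);
    for (int start = 0; start < count; start++) {
        int i = 0;
        while (seq[i].id != TOY_INS_ENDING && start + i < count &&
               toy_match_one(&code[start + i], &seq[i]))
            i++;
        if (seq[i].id == TOY_INS_ENDING) return start;
    }
    return -1;
}
```

The same terminated array can be handed to either function, which mirrors how the meaning of a match list depends on the function it is passed to.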
Instruction identification functions[]
Separate from the matching functions, firmware_load_ng also provides various functions for identifying classes of instructions. These are generally named isMNEMONIC_operands. An "x" at the end of the mnemonic indicates that the function checks for a class of similar instructions, so isSUBx_imm identifies all subtract instructions (SUB, SUBW, SUBS etc.) with an immediate operand.
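As an illustration of the naming convention, a class check in the style of isSUBx_imm might look like the sketch below. This is not the real implementation: toy_insn_t and its fields are hypothetical, and the model assumes that the flag-setting SUBS form is represented as a SUB with a separate update_flags field, so it is covered without a dedicated id.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoded-instruction model, not capstone's cs_insn. */
typedef enum { TOY_INS_ADD, TOY_INS_SUB, TOY_INS_SUBW, TOY_INS_RSB } toy_ins_id;
typedef enum { TOY_OP_REG, TOY_OP_IMM } toy_op_type;

#define TOY_MAX_OPS 3

typedef struct {
    toy_ins_id id;                    /* instruction id */
    int update_flags;                 /* 1 for flag-setting forms such as SUBS */
    int op_count;                     /* number of valid entries in op_type */
    toy_op_type op_type[TOY_MAX_OPS]; /* operand types, destination first */
} toy_insn_t;

/* "x" style check: accept the whole family of subtract instructions,
   as long as the last operand is an immediate. */
int toy_isSUBx_imm(const toy_insn_t *insn)
{
    assert(insn != NULL);
    if (insn->id != TOY_INS_SUB && insn->id != TOY_INS_SUBW) return 0;
    if (insn->op_count < 2 || insn->op_count > TOY_MAX_OPS) return 0;
    return insn->op_type[insn->op_count - 1] == TOY_OP_IMM;
}
```

The caller does not need to care which member of the family was used, only that a constant is being subtracted.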
Extracting values[]
Functions are provided to obtain values of operands, variables etc.
Note: many of the functions dealing with firmware addresses (jump targets, ADR, PC relative LDR) return 0 on failure. While 0 could theoretically be a valid value, it is unlikely to occur in practice.
ADRx2adr[]
Extract address calculated by various ADD and SUB instructions using PC as an operand to generate a nearby address. Returns 0 if instruction isn't ADR-like.
LDR_PC2val[]
Return the value that would be loaded by a PC relative load.
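The arithmetic behind a PC relative load is worth spelling out. When an instruction executes, PC reads as the instruction address plus 4 in thumb state or plus 8 in ARM state, and a literal load uses that value rounded down to a multiple of 4 before the immediate offset is added. The sketch below works on a raw byte buffer. toy_ldr_pc_literal_adr and toy_ldr_pc2val are hypothetical names, and unlike the functions described on this page the second one reports failure through its return value rather than by returning 0 as the loaded value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Address of the literal read by "LDR Rd, [PC, #imm]" at insn_adr.
   imm is the signed byte offset taken from the instruction. */
uint32_t toy_ldr_pc_literal_adr(uint32_t insn_adr, int thumb, int32_t imm)
{
    uint32_t pc = insn_adr + (thumb ? 4u : 8u); /* value PC reads as */
    pc &= ~3u;                                  /* Align(PC, 4) */
    return pc + (uint32_t)imm;
}

/* Read the little-endian word such a load would fetch from a toy dump.
   buf holds len bytes starting at address base.
   Returns 1 and stores the word in *val, or 0 if it lies outside buf. */
int toy_ldr_pc2val(const uint8_t *buf, size_t len, uint32_t base,
                   uint32_t insn_adr, int thumb, int32_t imm, uint32_t *val)
{
    assert(buf != NULL && val != NULL);
    uint32_t adr = toy_ldr_pc_literal_adr(insn_adr, thumb, imm);
    if (adr < base) return 0;
    size_t off = adr - base;
    if (off > len || len - off < 4) return 0;
    *val = (uint32_t)buf[off] |
           ((uint32_t)buf[off + 1] << 8) |
           ((uint32_t)buf[off + 2] << 16) |
           ((uint32_t)buf[off + 3] << 24);
    return 1;
}
```

Note the alignment step: a thumb instruction at an address that is not a multiple of 4 and the instruction 2 bytes before it load from the same literal address when given the same offset.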
B_target and similar[]
B_target, CBx_target and similar return the target of various immediate branch instructions, or zero if the instruction is not of the specified type.
Note: these functions return the address as it is encoded in the underlying instruction, without modifying the thumb bit. In general, this means the thumb bit is not set, except for LDR_PC_PC_target. The get_ ... functions described below are more convenient in many cases.
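The thumb bit convention itself is simple: bit 0 of a code address is set when the code at that address is thumb and clear when it is ARM. The helpers below are a minimal sketch with hypothetical names, showing how a raw branch target could be combined with the instruction set state. The state rule used is the architectural one: B and BL with an immediate stay in the current state, while BLX with an immediate always switches.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for the "thumb bit" (bit 0) convention. */
uint32_t toy_adr_set_thumb(uint32_t adr)   { return adr | 1u; }
uint32_t toy_adr_clear_thumb(uint32_t adr) { return adr & ~1u; }
int      toy_adr_is_thumb(uint32_t adr)    { return (int)(adr & 1u); }

/* Combine a raw immediate branch target with the resulting instruction set.
   cur_thumb:      1 if the branch instruction itself is thumb code
   switches_state: 1 for BLX with an immediate, 0 for B and BL */
uint32_t toy_branch_target_with_state(uint32_t raw_target, int cur_thumb,
                                      int switches_state)
{
    int target_thumb = switches_state ? !cur_thumb : cur_thumb;
    uint32_t adr = toy_adr_clear_thumb(raw_target);
    return target_thumb ? toy_adr_set_thumb(adr) : adr;
}
```

An address prepared this way can be used directly where a function requires the thumb bit to be set appropriately.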
get_direct_jump_target[]
Checks if the code starting at is_init is a direct jump (e.g. B, LDR PC,#const, or multi-instruction variants involving IP). These kinds of instructions are frequently generated as veneers in thumb2 code.
Returns the target address with the thumb bit set appropriately, or zero if no matching instruction is found.
Modifies fw->is, does not modify is_init.
get_branch_call_insn_target[]
Checks if the current instruction of is is a single instruction function call or branch instruction.
Returns the target address with the thumb bit set appropriately, or zero if the instruction does not match.
get_call_const_args[]
Uses the address and history in is_init to disassemble backwards attempting to identify constant values that would end up as function arguments (in r0-r3).
Returns a bitmask of the registers for which values were identified, and stores the values identified in res.
Modifies fw->is, does not modify is_init.
Note: this function works reasonably well in practice, but there are many cases in which it can produce incorrect results.
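The idea of scanning backwards from a call and reporting the result as a register bitmask can be sketched on a toy instruction list. The code below is an assumption-laden illustration, not the real algorithm: toy_call_const_args and its types are hypothetical, only "MOV rN,#imm" is recognized as producing a constant, and any other write to r0-r3 simply makes that register unknown. Bit N of the result is set when a value for rN was stored in res[N].

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded instruction: either a constant move or anything else. */
typedef enum { TOY_MOV_IMM, TOY_OTHER } toy_kind_t;

typedef struct {
    toy_kind_t kind; /* TOY_MOV_IMM: rd = imm; TOY_OTHER: rd written with unknown value */
    int rd;          /* destination register, or -1 if no register is written */
    uint32_t imm;    /* constant, only meaningful for TOY_MOV_IMM */
} toy_arg_insn_t;

/* Walk back at most max_back instructions from the call at code[call_idx].
   Returns a bitmask of r0-r3 with known constants; values go in res[0..3]. */
unsigned toy_call_const_args(const toy_arg_insn_t *code, int call_idx,
                             int max_back, uint32_t res[4])
{
    assert(code != NULL && res != NULL);
    unsigned known = 0;   /* registers whose constant has been found */
    unsigned blocked = 0; /* registers overwritten by something non-constant */
    for (int i = call_idx - 1; i >= 0 && call_idx - i <= max_back; i--) {
        const toy_arg_insn_t *in = &code[i];
        if (in->rd < 0 || in->rd > 3) continue;
        unsigned bit = 1u << in->rd;
        /* The write closest to the call wins; earlier writes are ignored. */
        if ((known | blocked) & bit) continue;
        if (in->kind == TOY_MOV_IMM) {
            res[in->rd] = in->imm;
            known |= bit;
        } else {
            blocked |= bit;
        }
    }
    return known;
}
```

A caller would test the returned mask before trusting a value, for example checking bit 1 before reading res[1]. Even this small model ignores branches and conditional execution between the writes and the call, which hints at why the real function can produce incorrect results in some cases.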
Storing found values[]
When a match is found, it needs to be saved in the sig_names or misc_vals structure.
save_sig_with_j[]
Saves an address for a function that already exists in sig_names. If the passed address is a veneer (a direct jump to another address), the veneer is saved with the name j_name and the function is added with the target of the veneer. This currently only handles one level of veneer. Adding matches via a commonly used veneer is helpful for reverse engineering.
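The veneer handling described above can be sketched as follows. This is a simplified model with hypothetical names (toy_save_sig_with_j, toy_sigs_t, toy_veneer_t): veneers are looked up in a small table instead of being detected by disassembly, names are appended to a list rather than looked up in an existing sig_names, and an address of 0 is assumed to mean "nothing found" and is rejected.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Toy veneer table: code at "from" is just a direct jump to "to". */
typedef struct { uint32_t from; uint32_t to; } toy_veneer_t;

/* Toy replacement for sig_names. */
#define TOY_MAX_SIGS 16
typedef struct { char name[40]; uint32_t adr; } toy_sig_t;
typedef struct { toy_sig_t sigs[TOY_MAX_SIGS]; int count; } toy_sigs_t;

/* Target of the direct jump at adr, or 0 if adr is not a veneer. */
static uint32_t toy_jump_target(const toy_veneer_t *v, int n, uint32_t adr)
{
    for (int i = 0; i < n; i++)
        if (v[i].from == adr) return v[i].to;
    return 0;
}

static int toy_add_sig(toy_sigs_t *t, const char *name, uint32_t adr)
{
    if (t->count >= TOY_MAX_SIGS) return 0;
    snprintf(t->sigs[t->count].name, sizeof(t->sigs[t->count].name), "%s", name);
    t->sigs[t->count].adr = adr;
    t->count++;
    return 1;
}

/* Address saved for name, or 0 if the name is not present. */
uint32_t toy_get_sig(const toy_sigs_t *t, const char *name)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->sigs[i].name, name) == 0) return t->sigs[i].adr;
    return 0;
}

/* Save name at adr. If adr is a veneer, save the veneer as j_<name> and
   save name at the veneer's target. Only one level of veneer is followed. */
int toy_save_sig_with_j(toy_sigs_t *t, const toy_veneer_t *v, int n,
                        const char *name, uint32_t adr)
{
    assert(t != NULL && name != NULL);
    if (!adr) return 0;
    uint32_t target = toy_jump_target(v, n, adr);
    if (target) {
        char jname[40];
        snprintf(jname, sizeof(jname), "j_%s", name);
        if (!toy_add_sig(t, jname, adr)) return 0;
        adr = target;
    }
    return toy_add_sig(t, name, adr);
}
```

Rejecting a zero address is what makes it reasonable to pass the result of a target lookup straight into the save call: a failed lookup turns into a failed match.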
The given address must have the thumb bit set appropriately. Typical usage from a match function is:
return save_sig_with_j(fw,rule->name,get_branch_call_insn_target(fw,is));
add_func_name[]
Adds a new function to sig_names. Used for functions which are automatically identified and named using generic analysis, like eventprocs and tasks.
save_misc_val[]
Saves the address for the named misc value. The address is specified as a base and offset, to document values that are structure members in the stubs file. A reference address may also be provided to indicate where the value was found.
Typical usage from a match function is
save_misc_val(rule->name,is->insn->detail->arm.operands[1].imm,0,(uint32_t)is->insn->address);
... to be continued ...