- 1 Introduction
- 2 Prerequisites
- 3 Importing a firmware dump
- 4 Initial analysis
- 4.1 Adding CHDK Ghidra scripts
- 4.2 Preparing CHDK stubs files
- 4.3 Initializing the memory map
- 4.4 Importing known functions from stubs
- 4.5 Initial auto-analysis
- 4.6 Post analysis cleanup
- 4.7 Apply known function signatures and data types
- 4.8 Additional analysis scripts
- 5 General Usage
Introduction
Ghidra is a free, open source software reverse engineering (SRE) suite of tools developed by the NSA. It provides disassembly, decompilation and professional analysis capabilities for the ARM instruction sets used by the main CPUs of Digic 2 - 7 cameras.
Prerequisites
- Ghidra downloaded from https://ghidra-sre.org/ and installed following the directions at https://ghidra-sre.org/InstallationGuide.html (CHDK scripts support 10.0.x through 10.0.3 as of Sept 2021. Earlier versions may work but are not generally tested. Versions prior to 9.2.2 should be avoided.)
- A firmware dump for the camera you want to work on, e.g. dumped with the Canon Basic dumper script or obtained from the archive.
- Identified ROMBASEADDR
- A CHDK source tree with initial stubs built by the vxworks, dryos or thumb2 sig finder. Optional, but strongly recommended unless you are working on a new model which uses a new CPU architecture or firmware wildly different from known models.
Importing a firmware dump
Create a project, if you don't have one already
In Ghidra, analysis is performed on "programs", which must be imported within a "project", so you must create a project if you don't already have one. If you work on multiple firmware dumps, keeping them all in the same project is recommended, because it allows you to have them open at the same time and use the version comparison tool.
Create a new project under File -> New Project.
- A dialog will prompt shared or non-shared. This document assumes you are working alone, so choose non-shared.
- Enter a name and directory and click "finish"
Note: All data related to the project will be stored in the project directory. This can amount to several hundred MB per firmware dump.
Load the dump
To add a program, the file(s) must be imported under the File -> Import menu.
- Select the firmware dump (usually PRIMARY.BIN) for your camera
- Select "Raw binary" for the format.
- "Language" is Ghidra's terminology for CPU architecture
- For Digic 5 and earlier, choose arm, v5t (or v5, see note), little endian, default compiler.
- For Digic 6 and 7, choose arm, v7, little endian, default compiler
- In destination folder, you can choose a project folder to organize your firmware dumps. Making a folder for each model is recommended if you work on many dumps. This "folder" is only an organizational tool in the Ghidra UI, it is not a filesystem directory.
- Program name will default to the filename. Changing it to identify the camera and firmware is recommended, because having every firmware called PRIMARY.BIN gets old real quick. Like the folder, this is only in the Ghidra UI and does not affect actual file names.
- Under options
- Set block name ROM
- Set Base Address to the ROMBASEADDR of your firmware. For existing ports, it will be defined in platform/.../sub/.../makefile.inc or platform/.../makefile.inc. For new ports, it should usually be the start address identified in CBDUMPER.LOG by the Canon Basic dumper script.
- Offset should normally be 0, and length auto-detected.
- Click OK on the import dialog. You will be shown a summary of the import and returned to the main project window.
Note: Digic 5 and below are v5t, meaning they support thumb code. However, Canon firmware for these models does not normally include thumb code. Choosing v5 may avoid having code incorrectly identified as thumb, or accidentally disassembling as thumb in the UI. However, you will not be able to disassemble memory dumps including CHDK code with this setting.
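If you are working from an existing port, the base address can be read out of makefile.inc rather than copied by hand. The following is a minimal sketch, assuming the value is assigned on a line of the form `ROMBASEADDR = 0x...`; the sample text, the address and the dump size are made-up values for illustration, and the helper names are not part of CHDK or Ghidra. It also shows the address range a dump covers when loaded with offset 0, which is a quick way to check that the base address and dump length are plausible.

```python
import re

# Assumed makefile.inc syntax: "ROMBASEADDR = 0xff810000" (spacing may vary).
# Lines starting with "#" are not matched, so commented-out values are ignored.
_ROMBASE_RE = re.compile(
    r"^\s*ROMBASEADDR\s*[:?]?=\s*(0x[0-9a-fA-F]+)", re.MULTILINE
)


def find_rombaseaddr(makefile_text):
    """Return ROMBASEADDR as an int, or None if no assignment is found."""
    match = _ROMBASE_RE.search(makefile_text)
    return int(match.group(1), 16) if match else None


def rom_range(base, dump_size):
    """Return (first, last) address of a dump loaded at base with offset 0."""
    return base, base + dump_size - 1


# Hypothetical makefile.inc content and dump size, for illustration only.
sample = "PLATFORMOSVER = 31\nROMBASEADDR = 0xff810000\n"
base = find_rombaseaddr(sample)
first, last = rom_range(base, 0x7F0000)
print(hex(first), hex(last))
```

If the last address exceeds 0xffffffff, either the base address or the dump length is probably wrong for a 32 bit address space.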
After you import the dump, you can open it in the "Code Browser" tool by double clicking on the program name. You will see a prompt like "(program) has not been analyzed yet, would you like to analyze it now". For CHDK, additional preparation described below is recommended, so select "No".
Adding CHDK Ghidra scripts
Some scripts to aid setup and analysis are included in the CHDK source tree, under tools/ghidra_scripts. It is recommended that you use the latest scripts from the CHDK SVN trunk.
- In the Ghidra menu, select Window -> Script manager.
- Click the script directories button on the right of the tool bar (looks like a bullet list, 3rd from right)
- Add tools/ghidra_scripts from the CHDK source directory
- Select the CHDK folder at the left to access CHDK scripts
To add scripts to the Ghidra tools menu:
- Check "In tool" next to the desired scripts in the script manager.
- CHDK scripts will appear under Tools -> CHDK
- To keep the scripts in the menu for future Ghidra sessions, choose File -> Save tool (or Save tool as) in Ghidra.
Preparing CHDK stubs files
The CHDK scripts use information generated by the CHDK sig finders, so you need to run the rebuild-stubs step for your port before using them.
You can build stubs with an essentially empty tree. Minimal requirements:
- platform/(model) makefile.inc defining ROMBASEADDR, and THUMB_FW for digic 6 and above
- platform/(model)/sub/(firmware) Makefile with just the include ../../../makefile_sub.inc
- empty platform/(model)/sub/(firmware)/makefile.inc
If you copied another port as your starting point, you should comment out or FAKEDEF / NULL_SUB all stubs in stubs_entry.2 and stubs_min.S, to avoid the scripts picking up addresses for the wrong firmware.
Run make PLATFORM=(your platform) PLATFORMSUB=(your sub) rebuild-stubs
It's OK if rebuild-stubs fails due to missing functions: The sigfinder will output found stubs to stubs_entry.S.err, and funcs_by*.csv
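Before running rebuild-stubs on a near-empty tree, it can help to confirm that the minimal files listed above are in place. The following is a rough sketch that only checks for the presence of the three files; it does not validate their contents. The function name and the model and firmware names used in the demonstration are made up for illustration.

```python
import os
import tempfile


def missing_stub_prereqs(chdk_root, model, firmware):
    """Return the minimal build files (relative to chdk_root) that are missing."""
    required = [
        os.path.join("platform", model, "makefile.inc"),
        os.path.join("platform", model, "sub", firmware, "Makefile"),
        os.path.join("platform", model, "sub", firmware, "makefile.inc"),
    ]
    return [p for p in required if not os.path.isfile(os.path.join(chdk_root, p))]


# Demonstration on an empty throwaway directory: all three files are reported.
# "mycam" and "100a" are hypothetical platform and sub names.
with tempfile.TemporaryDirectory() as root:
    for path in missing_stub_prereqs(root, "mycam", "100a"):
        print("missing:", path)
```

In real use, pass the root of your CHDK source tree and the PLATFORM and PLATFORMSUB values you intend to give to make.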
Initializing the memory map
Ghidra analysis is significantly improved if a memory map is defined to locate copied code and data at the correct addresses, indicate which regions are expected to contain code, and what address space is expected to be accessed as data.
For Digic 2 - 7 models as of 2019, the InitCHDKMemMap.py script can be used to configure an initial memory map, if CHDK stubs have been built.
- Double click the script name in the script manager
- Select the platform ... sub directory where you built stubs
- Note on Windows, the Ghidra directory selector combines the selected directory with whatever appears in the "file" text area, so if you click into the final directory, the file part should be empty. Or you can click into parent directory and just select the sub.
- Click OK and the script should run, summarizing the created memory regions in the console window. You can examine or adjust them by clicking on the memory chip icon in the main Code Browser toolbar.
- If the console contains red error text, the script failed
Importing known functions from stubs
The CHDK build process automatically identifies many functions and variables. Defining these in Ghidra before analysis allows Ghidra to start disassembling from known code, which significantly aids analysis.
- Double click ImportCHDKStubs.py in the script manager
- Select the stubs directory, as for the previous script
- When prompted for stubs to import: For a new port, select only stubs_entry.S (or stubs_entry.S.err) and funcs_by_address.csv. For an existing, working port, select all.
- When prompted for mode, select "Entry points only" if the firmware dump has not yet been analyzed. If you run the script again after analyzing, use "load and disassemble".
Note: If you find additional stubs or correct misidentified ones after running the script, you can safely run this script again with the "load and disassemble" option.
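To get a feel for what the stubs files contribute, you can list the name and address pairs they contain outside of Ghidra. The sketch below is not how ImportCHDKStubs.py works internally; it is a standalone illustration that assumes stub lines look like `NHSTUB(name ,0xaddress)` or `DEF(name ,0xaddress)` and that disabled entries are commented out with `//`. The sample entries are invented.

```python
import re

# Assumed line shape: MACRO(name ,0xADDRESS), optionally followed by a comment.
# Lines that start with "//" do not match, so commented-out stubs are skipped.
_STUB_RE = re.compile(
    r"^\s*(NHSTUB2?|DEF)\(\s*(\w+)\s*,\s*(0x[0-9a-fA-F]+)\s*\)"
)


def parse_stubs(text):
    """Return {name: address} for the uncommented stub lines in text."""
    stubs = {}
    for line in text.splitlines():
        match = _STUB_RE.match(line)
        if match:
            stubs[match.group(2)] = int(match.group(3), 16)
    return stubs


# Hypothetical stubs file content, for illustration only.
sample = """\
// NHSTUB(Commented_Out ,0xff800000)
NHSTUB(ExampleFunc ,0xff810000) // hypothetical entry
DEF(example_var ,0x00001234)
"""
for name, addr in sorted(parse_stubs(sample).items()):
    print("%-12s 0x%08x" % (name, addr))
```

Comparing the parsed addresses against ROMBASEADDR is a quick way to spot stubs left over from a port you copied as a starting point.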
Initial auto-analysis
Once the scripts above have been run, auto-analysis can be started. Select Analysis -> Auto analyze... from the Ghidra menu.
The following options are suggested:
- Turn off "embedded media" for the first run, as it seems to misidentify some things as WAV in code. Run it from the one-shot menu afterwards instead
- Turn off "Non-returning functions - discovered". This seems to cause disassembly to stop in a lot of places it shouldn't.
- Turn on "Shared return calls". This helps deal with code that does a B ... after a POP LR. Turning on "allow conditional calls" for this analyzer may also improve results.
- Turn off "address tables". This seems to be better as a one-shot after initial analysis, to avoid creating data from runs of things that could be addresses.
- Select "Scalar operand references", if not selected by default
Click analyze. This can take a long time! You can browse the program while it's analyzing, but it's probably a good time to grab a beverage or snack.
Note: The auto-analysis options don't just apply to the initial, full analysis, they also apply whenever new code is disassembled. So if you turn something off for the initial run, you may want to re-enable it after. The settings are still saved if you open the dialog, make changes and cancel.
Post analysis cleanup
- On Digic 6 and above, run CleanThumbBookmarkErrors.py.
- Run CleanEmptyFuncs.py. Ghidra auto-analysis seems to sometimes create zero-length functions. This script removes them, and re-creates a function at the location if it contains valid code.
- CleanFuncBookmarks.py creates functions from probable functions identified in analysis bookmarks, and can also fix some incorrectly identified functions. It should run after CleanThumbBookmarkErrors.py (if applicable). This script tries to be conservative, only creating functions that are likely to be valid. By default, if a function is created or already exists, the corresponding bookmark is removed. This can be disabled by editing g_options in the script.
- Run ImportCHDKStubs.py a second time, with "load and disassemble" enabled. This will help ensure that functions identified in stubs are defined as functions in Ghidra.
- Run LabelsToFuncs.py. This creates functions from labels if they look like valid function starts. It might get a few cases wrong (particularly where the original Canon code was doing weird things in ASM) but seems to give good results in the vast majority of cases.
- If you turned off "embedded media" and "address tables" for the initial analysis, you may want to run them as one-shot, and also re-enable them in the auto-analysis options. Note address tables can take a long time.
Ghidra creates "bookmarks" where disassembly ran into errors. More of the firmware can be successfully analyzed when they are resolved. To view error bookmarks, select "bookmarks" in the window menu, and then click the filter icon (gear) in the bookmarks window toolbar. Clicking on a bookmark will jump to the location. Typical causes of errors are Ghidra interpreting data as code, or code as data, or starting in the wrong arm/thumb state.
Ghidra also creates bookmarks for various non-error items, such as likely code, functions, and embedded media. Creating functions for the identified code manually or with CleanFuncBookmarks.py can improve analysis and version tracking.
Apply known function signatures and data types
Making Ghidra aware of function signatures improves decompilation and analysis. This can be done by importing header files from tools/ghidra_scripts/datatypes. The dump should already be analyzed as described above.
Importing header files
Select File -> Parse C Source... from the Ghidra menu.
Creating a parse configuration
To import header files, you must create a "parse configuration" which sets the header files to be used, along with any defines that need to be set when the header files are used. For the CHDK files, different defines are required depending on the camera, as described below.
- Use the small disk icon with ... under it to copy an existing parse configuration, e.g. clib.prf
- Name your copy something obviously related to CHDK and camera configuration, e.g. chdk-dryos31
- Select all the header file entries, and use the red X button to delete them
- Use the green + button to add chdk source tools/ghidra_scripts/datatypes/fw_functions.h
- Adjust the parse options section to match your platform:
- Remove all entries except -D__builtin_va_list=void *
- If your camera uses dryos, add the PLATFORMOSVER value from makefile.inc, like -DCAM_DRYOS_REL=31
- If your camera uses 3 argument DebugAssert (all digic 6, some very early vxworks and some but not all DryOS >= 52, platform_camera.h) add -DCAM_3ARG_DebugAssert=1
- For ixus30 and ixus40, add -DVX_OLD_PTP=1
- Save your parse configuration with the big floppy icon.
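The rules for choosing defines can be summarized in code. This is only a sketch of the decision logic described in the list above, useful as a checklist when setting up a configuration for a new camera; the function and parameter names are made up, and the output is meant to be typed into the parse options section by hand.

```python
def chdk_parse_options(dryos_rel=None, three_arg_debug_assert=False,
                       vx_old_ptp=False):
    """Return the parse option lines for a camera, following the list above.

    dryos_rel: PLATFORMOSVER value for DryOS cameras, None for VxWorks.
    three_arg_debug_assert: True if the camera uses 3 argument DebugAssert.
    vx_old_ptp: True for ixus30 and ixus40.
    """
    options = ["-D__builtin_va_list=void *"]
    if dryos_rel is not None:
        options.append("-DCAM_DRYOS_REL=%d" % dryos_rel)
    if three_arg_debug_assert:
        options.append("-DCAM_3ARG_DebugAssert=1")
    if vx_old_ptp:
        options.append("-DVX_OLD_PTP=1")
    return options


# Example: a DryOS camera with PLATFORMOSVER 31 and 2 argument DebugAssert.
print("\n".join(chdk_parse_options(dryos_rel=31)))
```

Cameras that produce the same option list can share one parse configuration.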
Note: Parse configurations are global Ghidra settings, not specific to a project or "program". They are stored in a version specific .ghidra directory, like $HOME/.ghidra/.ghidra_9.2.2_PUBLIC/parserprofiles. They are a simple text format, so if you upgrade Ghidra, it's probably safe to copy them to the new config directory.
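Copying configurations after an upgrade can be scripted. The sketch below assumes that saved configurations are stored as .prf files named after the configuration (so chdk-dryos31 becomes chdk-dryos31.prf) and that the new version's directory follows the same naming pattern as the old one; both are assumptions, so check the actual directories on your system first. The function name is made up, and existing files in the destination are never overwritten.

```python
import glob
import os
import shutil


def copy_parse_profiles(old_dir, new_dir, pattern="chdk-*.prf"):
    """Copy matching parse configurations to new_dir, skipping existing files."""
    os.makedirs(new_dir, exist_ok=True)
    copied = []
    for src in sorted(glob.glob(os.path.join(old_dir, pattern))):
        dst = os.path.join(new_dir, os.path.basename(src))
        if not os.path.exists(dst):
            shutil.copy2(src, dst)
            copied.append(os.path.basename(src))
    return copied


# Example call (not executed here). The 10.0.3 directory name is an assumption
# based on the 9.2.2 path shown above.
# home = os.path.expanduser("~")
# copy_parse_profiles(
#     os.path.join(home, ".ghidra", ".ghidra_9.2.2_PUBLIC", "parserprofiles"),
#     os.path.join(home, ".ghidra", ".ghidra_10.0.3_PUBLIC", "parserprofiles"),
# )
```

The default pattern only picks up configurations whose names start with chdk-, which matches the naming suggestion above and leaves the stock Ghidra profiles alone.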
Loading header files
- Select the parse configuration appropriate for your camera.
- Click "Parse to Program", and continue when prompted
- If a prompt about "Use Open Archives" appears, click continue. Note: The prompt may be covered by another dialog titled "Parsing C Files". If so, move the "Parsing C files" dialog out of the way.
- If parsing is unsuccessful, the preprocessed output will appear in your system home directory in a file named CParserPlugin.out
- If parsing succeeds, dismiss the Parse C Source dialog.
Applying function data types
Right click on your firmware program name in the data types manager window, and select "Apply Function Data Types".
If you update the header files, just repeat the loading and applying steps above. You can re-use parse configurations for camera models that require identical defines.
Additional analysis scripts
- CommentPropCalls.py labels property case calls with names from CHDK propsetN.h. Note if you are working on a port with an unknown or new propset, this could cause confusion if used before the propcase IDs have been verified. This script can be re-run any time the propset file is updated.
- ListPropCalls.py prints the address of calls to propcase functions with the specified propcase IDs in the console.
- CommentLeventCalls.py labels calls to "Logical Event" related functions with names from the levent_table.
- ListLeventCalls.py prints the address of calls to Logical Event functions with the specified names or IDs in the console.
- CommentMzrmCreateCalls.py - for thumb2 firmware, add comments with the name of mzrm messages, using list generated from tools in https://chdk.setepontos.com/index.php?topic=11316.msg129104#msg129104
- ListMzrmCreateCalls.py - as above, but list address where specific messages are created
- NameMzrmFunctions.py - As above, but name the calling function with the name of the message, if the function only creates one message.
For the "List" scripts, clicking on the address printed in the console jumps to the call.
Due to limitations with the method used to obtain call arguments, all of the above scripts currently only work in code that is part of a function defined in Ghidra. If a call is encountered outside of a function, the address is printed in the console. You can click on the listed address, create functions and re-run the script if you want to include that code in your search.
General Usage
On digic 6 and later, use F12 to disassemble in thumb mode, F11 for ARM. On these models, almost all the firmware is thumb. Pre-digic 6 models are all (or virtually all) ARM code.
If you run into large chunks that look like code but don't disassemble well, it's likely they are source blobs that are copied elsewhere, or run on a separate processor, or both. Restricting the executable ranges in the memory map helps avoid this, but the disassembler can still flow outside them.
- In Edit -> Tool Options -> Listing fields, Operands field: Uncheck "Markup inferred register references". If this is checked, registers in functions that have parameters defined will be shown with the parameter name in disassembly, even when the register has been re-used from something totally different.
- In Edit -> Tool Options -> Decompiler -> Analysis: Uncheck "Eliminate unreachable code". This makes decompiler output more accurately reflect the program logic when it directly references flash sectors used for settings or checks whether the code is running in a ROM address. Alternately, you can turn off "Respect readonly flags" in the same page.
Working with multiple firmwares
In many cases, you'll want to look at more than one firmware at once. For example, when porting, you usually want an already ported firmware as a reference. There are several ways to do this:
- Open additional programs in the code browser from the file menu. These show up as additional tabs. This has some quirks: Using the "back" navigation can jump between tabs, and switching tabs causes things like search and string windows to update slowly.
- Open multiple copies of the code browser. This seems to work OK, but you may get warnings about settings conflicting with the other copy.
- Use the Ghidra "version tracking" tool, described in Ghidra Version Tracking workflow for porting. This is by far the best option for porting or doing detailed comparisons.
Program Trees window
The "Program Trees" window allows you to label regions of memory. It defaults to the regions defined in the memory map, but is not always updated to reflect memory map changes. You can create a fresh tree from the memory map using the "new default tree" button (far left with a green plus) in the toolbar. Trees can be renamed or deleted by right clicking on the tab label.
The following shortcuts mostly act on either the current address or the current selection. You generally have to undefine before converting code to data or vice versa.
- F11 disassemble as ARM (Thumb supporting CPUs only)
- F12 disassemble as Thumb (Thumb supporting CPUs only)
- d disassemble
- c undefine the current address or selection
- p treat data at current address as a pointer
- ' cycle through various types of strings
- b cycle through integer types
- middle mouse on a register, highlights references to reg, with the most recent assignment highlighted in a different color