How an environment variable annoyed me for three months

Guillaume Pagnoux

Created: 2019-05-14 Tue 19:09

Introduction

  • I had to work on some ARM64 board
  • More specifically, add support for some device to it
  • It didn't go as I planned

Context

The board

  • Marvell Espressobin
  • Aimed to be used in production
  • So we won't use Marvell's provided rootfs; we build our own with Yocto
  • We also won't use Marvell's provided u-boot

u-boot ?

  • The de facto ARM bootloader
  • Lots of features..
  • Including something called verified boot

Verified boot ?

  • Basically secure boot
  • Supported by the Espressobin thanks to its OTP memory and TrustZone
  • Works by establishing a chain of trust (like basically every kind of secure booting ?)
  • We need to use FIT Images

FIT images ??

  • ARM boards come with very specific hardware
  • Supporting that wide range of hardware is a pain for kernel developers
  • So device trees were invented
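  • u-boot passes the device tree to the kernel, so the kernel knows which devices to initialize and how
  • A FIT image packs the kernel and its device tree (plus hashes and signatures) into a single file that u-boot loads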
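
A device tree blob is a binary file with a fixed header, so a quick sanity check is easy to script. Here is a minimal Python sketch, assuming the header layout from the devicetree specification (ten big-endian 32-bit fields, magic 0xd00dfeed). The function name read_fdt_header is made up for illustration and is not part of any u-boot tooling.

```python
import struct

FDT_MAGIC = 0xD00DFEED  # magic number defined by the devicetree specification
FDT_HEADER_SIZE = 40    # ten 32-bit fields


def read_fdt_header(blob: bytes) -> dict:
    """Return magic, total size and version of a flattened device tree blob.

    Hypothetical helper for illustration only.
    """
    if len(blob) < FDT_HEADER_SIZE:
        raise ValueError("blob is shorter than an FDT header")
    # All header fields are big-endian; we only name the ones we use.
    fields = struct.unpack(">10I", blob[:FDT_HEADER_SIZE])
    magic, totalsize, version = fields[0], fields[1], fields[5]
    if magic != FDT_MAGIC:
        raise ValueError(f"bad magic 0x{magic:08x}")
    return {"magic": magic, "totalsize": totalsize, "version": version}
```

Feeding it the first bytes of a .dtb file gives the size u-boot has to place somewhere in memory, which matters later in this story.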

You may be thinking:

That's awesome! Why are you telling us this ?

Because we need to rebuild u-boot with Verified Boot and FIT images support.

The problem

It's more problems really…

Building the driver in the kernel

  • So we enable our driver in menuconfig, generate an image with Yocto and flash it
  • We reset the board and…it fails silently..

    ## Booting image at 80500000 ...
    Image Name:   Linux-3.4.34-01535-g499e8d5-dirt
    Image Type:   ARM Linux Kernel Image (uncompressed)
    Data Size:    3519528 Bytes =  3.4 MB
    Load Address: 80008000
    Entry Point:  80008000
    Verifying Checksum ... OK
    
    Starting kernel ...
    

Debugging ?

  • At this point, u-boot has passed control to Linux
  • There is no JTAG, or any other kind of debugging ports available
  • Early printk does nothing

Rebuilding u-boot with debug print ?

  • At the time, we used a u-boot binary we had built some time before
  • Because we could no longer build a working one, for reasons still unknown..
  • Figuring this out requires reverse engineering..I don't know what this is:

    WRITE: 0xC0001010 0x00100100
    WRITE: 0xC0001014 0x00080200
    WRITE: 0xC000101C 0x90118011
    WRITE: 0xC0001028 0x00000011
    WRITE: 0xC0001040 0x00000607
    WRITE: 0xC00010C0 0x51000000
    WRITE: 0xC0001050 0x15150000
    WRITE: 0xC0001054 0x20100000
    WRITE: 0xC0001074 0x15150000

    ;Setp7: init read fifo pointer and OFF spec parameter
    WRITE: 0xC0001000 0x00004032
    WRITE: 0xC00003bc 0x02020404

    ;Step8: phyinit_sequence_sync2(1, 3, 2, 0)
    WRITE: 0xC0001014 0x00080200

I'll spare you the Perl code that comes along with this..

So I basically spent three months in that situation:

  • A kernel that used to boot no longer does because of a driver (why?)
  • u-boot can't be modified, thanks to the impenetrable vendor code (I hate you)
  • Early printk does not work (but why?)

The solution

Let's observe what we've got

  • The non-booting kernel is bigger (duh!)
  • Its signature and checksum are good
  • The device tree is sane
  • Wait..the non-booting kernel is bigger

What is the thing loaded after the kernel?

The device tree?

What can we observe about the device tree ?

  • It's the exact same one as before
  • Its load address is different

(Disclaimer: from this point, this is really how I think it works based on my observations)

What IF ?

If the kernel can't load the device tree:

  • Loading it to initialize the hardware is one of the first things it does. Without the device tree, the kernel is doomed pretty early.
  • We access the console via serial. Technically, it's a device too.
  • That would explain why even early printk can't help me.

That makes no sense!

U-boot does load the FDT!

Ever heard of lowmem pool?

  • During early boot, the kernel can only access a memory area called the lowmem pool
  • By default, it's something like 256M
  • So I took a look at the addresses
  • And realized the FDT is loaded beyond it..
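
The address check itself is simple arithmetic. Here is a rough Python sketch, assuming the window starts at the base of RAM and spans the "something like 256M" mentioned above. Both the function name and the example addresses are hypothetical, not values from my board.

```python
LOWMEM_SIZE = 256 * 1024 * 1024  # assumption: the ~256M figure from the slide


def fdt_reachable(ram_base: int, fdt_addr: int, fdt_size: int,
                  lowmem_size: int = LOWMEM_SIZE) -> bool:
    """True if the whole FDT lies inside [ram_base, ram_base + lowmem_size)."""
    window_end = ram_base + lowmem_size
    return ram_base <= fdt_addr and fdt_addr + fdt_size <= window_end


# Hypothetical numbers: RAM at 0x80000000, a 64 KiB device tree.
print(fdt_reachable(0x80000000, 0x8F000000, 0x10000))  # inside the window
print(fdt_reachable(0x80000000, 0x90000000, 0x10000))  # just past it
```

With a bigger kernel, everything loaded after it moves up, so a device tree that used to sit inside the window can end up outside without anyone touching it.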

U-boot is a nice guy

  • U-boot has a relocation mechanism used when loading binaries
  • This way it can make sure the loaded binaries don't overlap in memory
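
To make the idea concrete, here is a toy Python model of "find a spot below some limit that overlaps nothing". This is not u-boot's actual allocator, only a sketch of the principle. The names overlaps and relocate_below, the 4 KiB alignment, and the addresses in the usage lines are all assumptions of mine.

```python
def overlaps(a_start: int, a_size: int, b_start: int, b_size: int) -> bool:
    """True if the half-open ranges [a_start, a_start+a_size) and b's intersect."""
    return a_start < b_start + b_size and b_start < a_start + a_size


def relocate_below(limit: int, size: int, occupied, align: int = 0x1000) -> int:
    """Pick the highest aligned address below `limit` where `size` bytes fit
    without touching any (start, size) region listed in `occupied`."""
    addr = (limit - size) & ~(align - 1)
    while addr >= 0:
        clashing = [r for r in occupied if overlaps(addr, size, r[0], r[1])]
        if not clashing:
            return addr
        # Retry just below the lowest region we collided with.
        lowest = min(r[0] for r in clashing)
        addr = (lowest - size) & ~(align - 1)
    raise MemoryError("no room below the limit")


# Hypothetical layout: a 4 MiB kernel at 0x80008000, limit at 0x90000000.
kernel = (0x80008000, 0x400000)
print(hex(relocate_below(0x90000000, 0x10000, [kernel])))
```

The important part is the `limit` parameter: whoever sets it decides where the relocated blob is allowed to end up.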

Here comes the environment variable

U-boot has a notion of environment:

 bootdelay=1
 baudrate=115200
 ipaddr=192.168.0.2
 serverip=192.168.0.1
 netmask=255.255.255.0
 bootfile="fitImage"
 bootcmd=mmc read 80200000 280000 400000;bootm 80200000
 bootargs=console=ttyS2,115200n8 console=tty0 root=/dev/mtdblock4 rw rootfstype=jffs2 nohz=off
 stdin=serial
 stdout=serial
 stderr=serial
 kernel_addr=0x5....... /* Something we don't care about */
 fdt_addr=0x5....... /* Something whatever */
 fdt_high=0xffffffff

See that fdt_high variable?

  • Its value is not random in the previous example
  • 0xffffffff disables the relocation feature of u-boot!
  • What if we give it a real value, say, at or below 256M?

    ...
    [    0.636324] usbcore: registered new interface driver usbfs
    [    0.636328] usbcore: registered new interface driver hub
    [    0.636345] usbcore: registered new device driver usb
    [    0.636359] pps_core: LinuxPPS API ver. 1 registered
    [    0.636360] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
    [    0.636361] PTP clock support registered
    [    0.636366] EDAC MC: Ver: 3.0.0
    [    0.636412] Registered efivars operations
    [    0.646290] PCI: Using ACPI for IRQ routing
    [    0.647520] PCI: pci_cache_line_size set to 64 bytes
    [    0.647520] e820: reserve RAM buffer [mem 0x0009f000-0x0009ffff]
    [    0.647520] e820: reserve RAM buffer [mem 0x40004000-0x43ffffff]
    [    0.647520] e820: reserve RAM buffer [mem 0xd3cff018-0xd3ffffff]
    [    0.647520] e820: reserve RAM buffer [mem 0xd3d08018-0xd3ffffff]
    ...
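
In hindsight, a few lines of Python over a dump of the environment would have flagged the culprit. This is a minimal sketch, assuming the environment is available as name=value text like the listing above. The helper names are mine, and the RAM base in the usage lines is a hypothetical value, not the Espressobin's. Only setenv is an actual u-boot command here.

```python
def parse_env(text: str) -> dict:
    """Parse name=value lines into a dict."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue
        # Split on the first '=' only: values such as bootargs contain '=' too.
        name, value = line.split("=", 1)
        env[name] = value
    return env


def relocation_disabled(env: dict) -> bool:
    """True if fdt_high holds the 0xffffffff value that turns relocation off."""
    value = env.get("fdt_high")
    return value is not None and int(value, 16) == 0xFFFFFFFF


def suggest_fdt_high(ram_base: int, lowmem_size: int = 256 << 20) -> str:
    """Build a setenv command capping the FDT at ram_base + lowmem_size."""
    return f"setenv fdt_high 0x{ram_base + lowmem_size:x}"


env = parse_env("bootdelay=1\nfdt_high=0xffffffff\n")
if relocation_disabled(env):
    # 0x80000000 is an assumed RAM base, used for illustration only.
    print(suggest_fdt_high(0x80000000))
```

The 256M default is the same guess as in the slides, so treat the suggested value as a starting point to verify on the real board.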

Three months…for an environment variable…

Questions ?