The deployment of software for embedded systems is often a complex procedure that should be carefully designed, implemented, and tested. There are two major challenges:
- Embedded systems are often deployed in places that are difficult or impractical for a human operator to access.
- If software deployment fails, the system can become inoperable. It will require the intervention of a skilled technician and additional tools for recovery. This is expensive and often impossible.
A solution for the first challenge of embedded systems that are connected to the internet was found in the form of Over-the-Air (OTA) updates. A system periodically connects to the dedicated server and checks for available updates. If the updated version of the software is found, it is downloaded to the device and installed to the persistent memory.
This approach is widely adopted by manufacturers of smartphones, Set-Top-Box (STB) appliances, smart TVs, and game consoles connected to the internet.
When designing OTA updates, system architects should take into account many factors that affect the scalability and reliability of the overall solution. For example, if all devices check for updates at approximately the same time, it creates high peak loads in the update servers, while leaving them idle all other time. Randomizing the check time keeps the load distributed evenly. The target system should be designed to reserve enough persistent memory to download the complete update image before applying it. The code implementing the updated software image download should handle network connection drops and resume download once the connection is recovered, rather than start over. Another important factor of OTA update is security. The updated process should only accept genuine update images. Updates are cryptographically signed by the manufacturer and an image is not accepted by the installer running on the device unless the signature matches.
Developers of embedded systems are aware that the update may fail for different reasons; for example, a power outage during the update. Even if the update completes successfully, the new version of the software may be unstable and crash on startup. It is expected that even in such situations the system will be able to recover.
This is achieved by separating the main software components and the bootloader. The bootloader validates the consistency of the main components, such as the operating system kernel and root filesystem that contains all the executables, data, and scripts. Then, it tries to run the operating system. In the case of failure, it switches to the previous version, which should be kept in the persistent memory along with the new one. Hardware watchdog timers are used to detect and prevent situations where a software update causes the system to hang.
It is impractical to use OTA or complete image re-flashing during software development and testing. It significantly slows down the development process. Engineers use other ways to deploy their software builds to the development systems, such as a remote shell or network filesystems that allow file sharing between developers' workstations and target boards.