TL;DR: We built a tool - the Freedom Resource Monitor - to log and monitor system resources on any robot (GPU, network connectivity, topic data, WebRTC diagnostics). It’s all logged historically so you can go back in time... and we're giving it away for free for a year here.
The Butterfly Effect Of System Resources
I can’t tell you how many times I’ve hunted down a bizarre one-off issue in an actuator, only to find that the culprit was a starved resource like CPU or memory, exhausted by a completely different process.
Using 105% of a core for a deep-learning model (yes, we have seen this!) can starve a second process that retrieves USB data. That cascades into a camera driver failing and resetting, and the restart creates significant CPU usage of its own, so the cycle loops. The robot mostly seems to be working, just at a slower frame rate… and then it stops after 3 hours and there is no clear reason why.
The story is the same for GPU resource utilization. The robot in question worked well in an air-conditioned lab, then its steering control went unstable on a hot summer day while delivering a burrito. Why would heat lead to unstable steering control? The answer wasn’t obvious to us either.
Here is what happened. Full usage of the GPU, combined with poor cooling, increased the temperature of the unit, and the CPU switched to a lower-compute mode. Perception algorithms weren’t able to keep up with the frame rate, so frame callbacks started to stack up with unstable consequences. Finally, motor updates slowed down and became unstable, which created a resonance in the robot. All of this was identified and tracked by viewing historical system resources.
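To get a feel for how quickly callbacks pile up once perception falls behind, here is a rough sketch of the queue arithmetic. The frame and processing rates are made-up numbers for illustration, not measurements from the robot above, and the model assumes an unbounded queue with constant rates.

```python
def backlog_after(seconds, frame_rate_hz, processing_rate_hz):
    """Frames waiting in an unbounded callback queue after `seconds`.

    Assumes constant arrival and processing rates, which is a
    simplification of real callback scheduling.
    """
    surplus_per_second = max(0.0, frame_rate_hz - processing_rate_hz)
    return surplus_per_second * seconds


def wait_for_new_frame(seconds, frame_rate_hz, processing_rate_hz):
    """Seconds a frame arriving at time `seconds` waits before it is processed."""
    backlog = backlog_after(seconds, frame_rate_hz, processing_rate_hz)
    return backlog / processing_rate_hz


# Hypothetical case: a 30 Hz camera, with perception throttled to 20 Hz.
frames_queued = backlog_after(60, 30, 20)
delay_s = wait_for_new_frame(60, 30, 20)
```

With these assumed rates, one minute of throttling leaves 600 frames queued, and a control loop acting on the newest frame would be reacting to data that is 30 seconds stale. Real systems usually bound the queue and drop frames instead, but the lag builds the same way until they do.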
Your Resource Monitoring Tools Need To Think Like A Roboticist
We realized that we can speed up tracking down these kinds of issues and enable optimizations to prevent them in the future by using the right tools. Visualization and introspection, without plugging in a monitor and keyboard to your robot’s computer, can be a lifesaver. We call it the Freedom Resource Monitor. It is designed to enable resource monitoring and optimization on robots from your first one to your millionth. If you want to try this now, you can sign up for free and get your first robot going.
Freedom Robotics Resource Monitor shows network utilization for the robot and breaks that utilization down by topic, showing the individual message size and bandwidth attributed to each datapoint. Upload exactly what you need and no more.
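The per-topic breakdown comes down to simple bookkeeping: count messages and bytes per topic over a time window. The sketch below shows the idea. The `(topic, size_bytes)` record format and the function name are assumptions made up for this example, not the Freedom API.

```python
from collections import defaultdict


def topic_bandwidth(messages, window_seconds):
    """Summarize per-topic upload over a time window.

    `messages` is an iterable of (topic, size_bytes) tuples, a hypothetical
    record format chosen for this example.
    """
    counts = defaultdict(int)
    totals = defaultdict(int)
    for topic, size_bytes in messages:
        counts[topic] += 1
        totals[topic] += size_bytes

    summary = {}
    for topic in counts:
        summary[topic] = {
            "hz": counts[topic] / window_seconds,
            "avg_bytes": totals[topic] / counts[topic],
            "bytes_per_sec": totals[topic] / window_seconds,
        }
    return summary


# Hypothetical 10-second window: a low-rate camera topic and a high-rate odometry topic.
messages = [("/camera/image", 50_000)] * 20 + [("/odom", 200)] * 100
report = topic_bandwidth(messages, window_seconds=10)
```

In this made-up window the camera topic publishes five times less often than odometry but accounts for fifty times the bandwidth, which is the kind of imbalance a per-topic view makes obvious.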
Current Robotics Resource Monitoring Is Painful
I often find myself running a robot, then running top and watching processes run and seeing when the spike happens. But then wait - was that a spike? Did that happen at the same time that the SLAM node stopped?
Trying to combine Linux tools such as top, nvidia-smi, sensors, and dmesg, and then remembering to keep them all running to profile performance, is a pain. Standard cloud-server devops services are focused on a very different set of challenges, so they don’t represent the data in a way that allows you to easily debug your robot.
Cloud server system and network monitoring has made a ton of sense for IT infrastructure - we just built it right for robotics.
Trying to use top and other tools works somewhat well when you are watching a process in detail in the lab, but these tools fail when you have to review historical data or look for causal triggers between resources, processes and the environment.
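As a rough illustration of what keeping a history involves, the sketch below computes overall CPU utilization from two snapshots of /proc/stat and appends timestamped readings to a JSON-lines file. This is not how the Freedom agent works. The function names and the log format are assumptions made up for this example, and it only covers one of the many signals you would need.

```python
import json
import time


def parse_cpu_line(stat_text):
    """Return (busy, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            # Fields: user nice system idle iowait irq softirq steal.
            # Guest time is already counted in user/nice, so it is skipped.
            fields = [int(v) for v in line.split()[1:9]]
            idle = fields[3] + (fields[4] if len(fields) > 4 else 0)
            total = sum(fields)
            return total - idle, total
    raise ValueError("no aggregate cpu line found")


def cpu_percent(before_text, after_text):
    """CPU utilization (0-100) between two /proc/stat snapshots."""
    busy0, total0 = parse_cpu_line(before_text)
    busy1, total1 = parse_cpu_line(after_text)
    elapsed = total1 - total0
    return 100.0 * (busy1 - busy0) / elapsed if elapsed > 0 else 0.0


def log_cpu(path, samples, interval_s=1.0):
    """Append `samples` timestamped CPU readings to a JSON-lines file (Linux only)."""
    with open("/proc/stat") as f:
        previous = f.read()
    with open(path, "a") as out:
        for _ in range(samples):
            time.sleep(interval_s)
            with open("/proc/stat") as f:
                current = f.read()
            record = {"t": time.time(), "cpu_percent": cpu_percent(previous, current)}
            out.write(json.dumps(record) + "\n")
            previous = current
```

Because every record carries a timestamp, a reading can later be lined up against other events, such as the moment a node stopped. Doing the same for GPU, temperature, network, and per-process data, on every robot, is where the home-grown approach starts to hurt.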
Announcing Freedom Robotics Resource Monitor
We at Freedom Robotics are excited to announce the release of resource optimization for robots. It installs in one line and works on any Linux system (within reason). Once installed, it sits in the background measuring and logging resource utilization across fleets of robots from 1 to 1 million.
Zooming into a particular process reveals a steep ramp in memory, indicating this node is most likely the source of a memory leak.
You can check out a replay of a robot’s resources here.
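If you have per-process memory history, spotting that kind of ramp can be automated with a straight-line fit. Below is a minimal sketch, assuming memory samples arrive as `(timestamp_seconds, rss_megabytes)` pairs. The data layout, process names, and the 1 MB/min threshold are all hypothetical choices for this example.

```python
def growth_rate(samples):
    """Least-squares slope of (timestamp_s, value) pairs, in value units per second."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    variance = sum((t - mean_t) ** 2 for t, _ in samples)
    if variance == 0:
        raise ValueError("all samples share one timestamp")
    return covariance / variance


def leak_suspects(rss_by_process, threshold_mb_per_min=1.0):
    """Return (name, MB per minute) for fast-growing processes, steepest first."""
    rates = {
        name: growth_rate(samples) * 60.0
        for name, samples in rss_by_process.items()
    }
    suspects = [(name, rate) for name, rate in rates.items() if rate > threshold_mb_per_min]
    return sorted(suspects, key=lambda item: item[1], reverse=True)


# Hypothetical history: one node climbing steadily, one flat.
history = {
    "slam_node": [(0, 100), (60, 110), (120, 120), (180, 130)],
    "camera_driver": [(0, 50), (60, 50), (120, 50), (180, 50)],
}
suspects = leak_suspects(history)
```

A steady ramp is a hint rather than proof, since caches and maps that legitimately grow look similar over short windows. That is why a longer history matters.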
How it Works
It introspects all system resources and logs them to the cloud, where they can be viewed in real time or historically. You can then use the API or the Freedom App to understand and correlate system resource changes to your code, environment, and compute changes.
Examples of logged system resources:
- CPU Usage
- CPU Temperature
- Memory (RAM) Usage
- Disk Usage
- GPU Usage
- GPU Temperature
- Process-level CPU Usage
- Process-level Memory Usage
- Network Bandwidth
- Network Errors
- WebRTC-specific Network Bandwidth
- Overall Topic Message Upload Bandwidth
- Individual Topic Message Frequencies
- Individual Topic Message Bandwidth
Ways to access them:
- Resource Measurement Tab of the App
- Historical Replays
- Share Links
- Directly on the API
View of the total network connectivity data during a teleoperation session (top) with the network bandwidth utilized by WebRTC (bottom). The initiation of the session correlates directly to a spike in network traffic, which makes sense since WebRTC takes more bandwidth than normal upload operations.
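The same before-and-after comparison can be done numerically once bandwidth is logged with timestamps. Here is a small sketch, assuming samples are `(timestamp_seconds, bytes_per_second)` pairs and the session start time is known. The function names, the 30-second window, and the sample values are assumptions for illustration.

```python
def mean_bandwidth(samples, start_s, end_s):
    """Average of values whose timestamp falls in [start_s, end_s)."""
    window = [value for t, value in samples if start_s <= t < end_s]
    return sum(window) / len(window) if window else 0.0


def bandwidth_jump(samples, event_time_s, window_s=30.0):
    """Ratio of mean bandwidth just after an event to mean bandwidth just before it."""
    before = mean_bandwidth(samples, event_time_s - window_s, event_time_s)
    after = mean_bandwidth(samples, event_time_s, event_time_s + window_s)
    if before == 0:
        return float("inf") if after > 0 else 1.0
    return after / before


# Hypothetical log: steady uploads, then a teleoperation session starts at t = 60 s.
samples = [(t, 100.0) for t in range(0, 60)] + [(t, 900.0) for t in range(60, 120)]
jump = bandwidth_jump(samples, event_time_s=60)
```

A ratio well above 1 right at the session start supports the reading in the figure, while a jump that begins earlier or later would point to some other cause.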
When Can I Use This?
Resource optimization for robots is live and ready to work with your robot right now. It’s a one-line install on any Linux system and takes less than 5 minutes to set up and try. We’re eager to see what the community does with it and would love to hear how it’s being used.
If you want to try this now, you can quickly sign up for free here and get your first robot going.