{"id":602720,"date":"2023-11-20T03:49:09","date_gmt":"2023-11-20T11:49:09","guid":{"rendered":"https:\/\/dreness.com\/blog\/?p=602720"},"modified":"2024-06-07T11:25:19","modified_gmt":"2024-06-07T18:25:19","slug":"a-better-activity-metric","status":"publish","type":"post","link":"https:\/\/dreness.com\/blog\/archives\/602720","title":{"rendered":"A Better Activity Metric"},"content":{"rendered":"\n<p>Measuring system-wide activity on a computer usually begins by measuring the aggregate utilization of key hardware components like CPU, storage, and network. These kinds of metrics generally come in two different forms: <em>counters<\/em> and <em>snapshots<\/em>. Understanding the difference is important, because they are not always equally useful.<\/p>\n\n\n\n<p>A counter keeps track of the total number of &#8216;units&#8217; processed by the resource, like octets across a network device or blocks read or written on a storage device. (I am specifically avoiding the word &#8216;work&#8217;, because that&#8217;s a whole can of worms.) Counters usually start at zero (boot time) and increment forever until they roll over. Assuming a counter&#8217;s value is maintained by the sole arbiter of the hardware resource (i.e. the kernel), you can be confident that all the activity of that resource is represented in the counter&#8217;s value.<\/p>\n\n\n\n<p>A snapshot, on the other hand, is basically a way to answer a question like &#8220;how much is happening right now&#8221; &#8211; the important part being &#8220;<em>right now&#8221;. <\/em>Computers are very fast, so any kind of instantaneous measurement often describes a very small time quantum &#8211; which is a very different scope compared to a counter&#8217;s reference point (&#8220;since boot&#8221;). Any sort of rate is a snapshot.<\/p>\n\n\n\n<p>Let&#8217;s proceed to the thrust of this post with an automobile analogy: using CPU utilization to measure overall CPU activity is like trying to infer how far a car has traveled by looking at the speedometer instead of the odometer. To demonstrate, open Activity Monitor (or <code>top<\/code>, <code>htop<\/code>, etc), wait a few seconds to get a &#8216;baseline&#8217;, then run the following small shell script:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">while true ; do \/bin\/echo zoop &gt; \/dev\/null ; done<\/pre>\n\n\n\n<p>While that script is running, you should see an uptick in activity, but you may never see <code>echo<\/code> in the process list. That&#8217;s because the amount of time required by <code>echo<\/code> to zoop to <code>\/dev\/null<\/code> is small compared to the amount of time it takes to start any new process (on Darwin-based systems like macOS, iOS, etc). For that reason, any instantaneous snapshot of active processes is statistically unlikely to include very short-lived processes &#8211; even if there are thousands of such processes being spawned every second. You can see where <em>some<\/em> of that &#8216;startup time&#8217; is going in the form of work done by long-lived processes like <code>kernel_task<\/code> or <code>dznd<\/code>.<\/p>\n\n\n\n<p>As it happens, macOS does provide a way to poll usage counters for (recently) deceased processes. The CLI tool <code>powermetrics<\/code> can fetch it for you. Let&#8217;s ask for the <code>tasks<\/code> category of data and pull just the <code>cputime_ms_per_s<\/code> value from the <code>all_tasks<\/code> dictionary. Something in this data isn&#8217;t json-compatible so no svelte jq one-liners &#8211; let&#8217;s use Python.<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">#!python -u\n\nimport subprocess\nimport plistlib\nimport sys\n\n\"\"\"\nPrint selected power metrics from \/usr\/bin\/powermetrics at intervals, suitable for graphing.\n\"\"\"\n\ninterval = sys.argv[1] if len(sys.argv) &gt; 1 else 1000\ncmd = f\"sudo powermetrics -i {interval} -s tasks --format plist\"\n\n# Execute the long-running command, which produces one plist to stdout every\n# interval, spread across multiple lines.\n# For each line of output from `powermetrics`:\n#   - accumulate lines into a bytearray\n#   When \"&lt;\/plist&gt;\" is seen:\n#    - parse the accumulated bytearray into a plist\n#    - print the metric of interest\n#    - clear the bytearray\npl = bytearray()\n# using shell mode here because we need sudo for powermetrics\nfor line in subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE).stdout:\n    pl.extend(line)\n    # look for the plist close tag\n    if \"&lt;\/plist&gt;\" in str(line):\n        # powermetrics adds a null byte at the start of each plist after the\n        # first, which breaks the plist parser.  Remove it.\n        if pl[0] == 0:\n            pl = pl[1:]\n        d = plistlib.loads(pl[0:])\n        print(d.get('all_tasks').get('cputime_ms_per_s'))\n        pl = bytearray()<\/pre>\n\n\n\n<p>Make sure you have <code>sudo<\/code> rigged up to work without a password, or if you don&#8217;t, just run <code>sudo whoami<\/code> no more than 5 minutes before running the above in the same terminal. Instead of just reading lines of numbers, let&#8217;s pipe to <code>ttyplot<\/code>. Here&#8217;s a picture of a <code>ttyplot<\/code> showing what happens when I run the <code>echo<\/code> loop from earlier:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><a href=\"https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/total_cpu_time.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"555\" src=\"https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/total_cpu_time-1024x555.png\" alt=\"\" class=\"wp-image-602723\" style=\"aspect-ratio:1.845045045045045;width:840px;height:auto\" srcset=\"https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/total_cpu_time-1024x555.png 1024w, https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/total_cpu_time-300x163.png 300w, https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/total_cpu_time-1536x833.png 1536w, https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/total_cpu_time.png 1660w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>The big chunk there is when I was running the loop, but the first little chunk was &#8216;something else&#8217;. If this feels unsatisfying, you&#8217;re thinking correctly :) The difference between the low point and the high point in this graph feels around the same as the difference you can see in Activity Monitor or <code>htop<\/code> when running the <code>echo<\/code> loop. If your intuition suggests there&#8217;s more going on that we&#8217;re not yet seeing, that&#8217;s also correct! <\/p>\n\n\n\n<p>Let&#8217;s stop beating around the bush: by far the best metric for measuring overall system activity level is <em>power consumed<\/em>. As the name suggests, powermetrics has that for us also. This time let&#8217;s ask for the <code>cpu_power<\/code> category, and fetch the value at <code>processor.combined_power<\/code>.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/total_power.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"555\" src=\"https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/total_power-1024x555.png\" alt=\"\" class=\"wp-image-602724\" srcset=\"https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/total_power-1024x555.png 1024w, https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/total_power-300x163.png 300w, https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/total_power-1536x833.png 1536w, https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/total_power.png 1660w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n\n\n\n<p>In the above graph, I ran the shell loop for two bursts that are clearly visible and substantially above the noise floor.<\/p>\n\n\n\n<p>Here&#8217;s a <a href=\"https:\/\/gist.github.com\/dreness\/1c5117dd2d185dfbeed1f970213b6dc7\">gist of the python script<\/a>.<\/p>\n\n\n\n<p>As a bonus, here&#8217;s a little program that prints out the number of new processes created every interval (without burning a PID in the process&#8230; heyo), which I will call <a href=\"https:\/\/gist.github.com\/dreness\/e4d699a73037db9733f69c2629b82a4f\">pidpersec.c<\/a>, as a nod to the script it&#8217;s replacing: pidpersec.d. Running the shell loop with that program&#8217;s output piped to ttyplot looks like this:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><a href=\"https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/new_processes.png\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"508\" src=\"https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/new_processes-1024x508.png\" alt=\"\" class=\"wp-image-602725\" srcset=\"https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/new_processes-1024x508.png 1024w, https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/new_processes-300x149.png 300w, https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/new_processes-1536x762.png 1536w, https:\/\/dreness.com\/blog\/wp-content\/uploads\/2023\/11\/new_processes.png 1660w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/a><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Using CPU utilization to measure overall CPU activity is like trying to infer how far a car has traveled by looking at the speedometer instead of the odometer. <a href=\"https:\/\/dreness.com\/blog\/archives\/602720\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[36,6,13],"tags":[],"class_list":["post-602720","post","type-post","status-publish","format-standard","hentry","category-quiesce","category-scripts","category-the-more-you-know"],"_links":{"self":[{"href":"https:\/\/dreness.com\/blog\/wp-json\/wp\/v2\/posts\/602720","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/dreness.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/dreness.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/dreness.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/dreness.com\/blog\/wp-json\/wp\/v2\/comments?post=602720"}],"version-history":[{"count":5,"href":"https:\/\/dreness.com\/blog\/wp-json\/wp\/v2\/posts\/602720\/revisions"}],"predecessor-version":[{"id":602729,"href":"https:\/\/dreness.com\/blog\/wp-json\/wp\/v2\/posts\/602720\/revisions\/602729"}],"wp:attachment":[{"href":"https:\/\/dreness.com\/blog\/wp-json\/wp\/v2\/media?parent=602720"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/dreness.com\/blog\/wp-json\/wp\/v2\/categories?post=602720"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/dreness.com\/blog\/wp-json\/wp\/v2\/tags?post=602720"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}