<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>sigalrm &#8211; richliu&#039;s blog</title>
	<atom:link href="https://richliu.com/tag/sigalrm/feed/" rel="self" type="application/rss+xml" />
	<link>https://richliu.com</link>
	<description>Linux, work, life, family</description>
	<lastBuildDate>Thu, 01 Sep 2016 10:24:22 +0000</lastBuildDate>
	<language>zh-TW</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>
	<item>
		<title>SIGALRM / timer_create causing 100% CPU sys usage</title>
		<link>https://richliu.com/2016/09/01/2002/sigalarm-timer_create-%e9%80%a0%e6%88%90-cpu-sys-100-%e7%9a%84%e5%95%8f%e9%a1%8c/</link>
					<comments>https://richliu.com/2016/09/01/2002/sigalarm-timer_create-%e9%80%a0%e6%88%90-cpu-sys-100-%e7%9a%84%e5%95%8f%e9%a1%8c/#respond</comments>
		
		<dc:creator><![CDATA[richliu]]></dc:creator>
		<pubDate>Thu, 01 Sep 2016 10:19:27 +0000</pubDate>
				<category><![CDATA[Linux]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[epoll]]></category>
		<category><![CDATA[pthread]]></category>
		<category><![CDATA[sigalrm]]></category>
		<category><![CDATA[timer_create]]></category>
		<guid isPermaLink="false">https://richliu.com/?p=2002</guid>

					<description><![CDATA[<p>Recently I ran into a strange problem: when a certain program runs, there is some chance that sys CPU usage sits at 100% Cpu0  :  0 [&#8230;]</p>
<p>The post <a rel="nofollow" href="https://richliu.com/2016/09/01/2002/sigalarm-timer_create-%e9%80%a0%e6%88%90-cpu-sys-100-%e7%9a%84%e5%95%8f%e9%a1%8c/">SIGALRM / timer_create causing 100% CPU sys usage</a> appeared first on <a rel="nofollow" href="https://richliu.com">richliu&#039;s blog</a>.</p>
]]></description>
										<content:encoded><![CDATA[<p>Recently I ran into a strange problem: when a certain program runs, there is some chance that sys CPU usage sits at 100%.</p>
<blockquote><p>Cpu0  :  0.0%us,<span style="color: #ff0000;">100.0%sy</span>,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st</p></blockquote>
<p><span id="more-2002"></span></p>
<p>The original code looked something like this:</p>
<pre lang="c">#include &lt;time.h&gt;
#include &lt;stdio.h&gt;
#include &lt;errno.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;
#include &lt;signal.h&gt;
#include &lt;unistd.h&gt;
#include &lt;pthread.h&gt;
#include &lt;semaphore.h&gt;	/* sem_t, sem_post(), sem_wait() */
#include &lt;sys/time.h&gt;	/* setitimer(), struct itimerval */


#ifdef USE_SEM
sem_t g_timer_sem_hdlr;
#endif


/* SIGALRM handler: wakes the main loop when USE_SEM is defined. */
void sig_timer_handler(int signo)
{
	(void)signo;
#ifdef USE_SEM
	sem_post(&amp;g_timer_sem_hdlr);	/* sem_post() is async-signal-safe */
#endif
}

int main(void)
{
	unsigned int period = 1000;	/* timer period in microseconds */
	struct itimerval it_val;	/* for setting itimer */
	int i = 0;

#ifdef USE_SEM
	sem_init(&amp;g_timer_sem_hdlr, 1, 0);
#endif

	if (signal(SIGALRM, sig_timer_handler) == SIG_ERR)
	{
		perror("Unable to catch SIGALRM");
		return 1;
	}
	it_val.it_value.tv_sec = 0;
	it_val.it_value.tv_usec = period;
	it_val.it_interval = it_val.it_value;	/* periodic: fire every `period` us */
	if (setitimer(ITIMER_REAL, &amp;it_val, NULL) == -1)
	{
		perror("error calling setitimer()");
		return 1;
	}
	printf("test started.\n");
	while (1) {
#ifdef USE_SEM
		sem_wait(&amp;g_timer_sem_hdlr);	/* may return early with EINTR */
#else
		usleep(1000);			/* cut short by each SIGALRM */
#endif
		if ((i % 1000) == 0) printf("%s: hello, i=%d\n", __func__, i);
		i++;
	}
	printf("exited.\n");	/* not reached */
	return 0;
}
</pre>
<p>Looking at it with perf gives roughly the following; the wait in kernel space is not a real wait.</p>
<pre lang="text">   
66.86%  swapper  [kernel.kallsyms]  [k] r4k_wait
        |
        --- r4k_wait
        cpu_idle

32.21%    test  test               [.] main
0.12%     test  [kernel.kallsyms]  [k] finish_task_switch
        |
        --- finish_task_switch
        |
        |--54.54%-- __schedule
        |          schedule
        |          |
        |          |--83.34%-- work_resched
        |          |
        |           [--16.66%-- schedule_timeout
        |                     do_sigtimedwait
        |                     SyS_rt_sigtimedwait
        |                     handle_sys64
        |
        --45.46%-- schedule_tail
</pre>
<p>Turning on the kernel spin lock statistics did not reveal anything either (note: the upper one is with 100% sys CPU, the lower one is normal).</p>
<div id="attachment_2003" style="width: 310px" class="wp-caption aligncenter"><a href="https://richliu.com/wp-content/uploads/2016/08/spinlock.png"><img decoding="async" aria-describedby="caption-attachment-2003" class="size-medium wp-image-2003" src="https://richliu.com/wp-content/uploads/2016/08/spinlock-300x145.png" alt="spinlock" width="300" height="145" srcset="https://richliu.com/wp-content/uploads/2016/08/spinlock-300x145.png 300w, https://richliu.com/wp-content/uploads/2016/08/spinlock-768x371.png 768w, https://richliu.com/wp-content/uploads/2016/08/spinlock-1024x495.png 1024w" sizes="(max-width: 300px) 100vw, 300px" /></a><p id="caption-attachment-2003" class="wp-caption-text">spinlock</p></div>
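<p>So this was really puzzling: what exactly was going wrong?<br />
I ran a number of tests, for example:<br />
enabling the realtime option, switching to tickless or a 1000 HZ tick, checking for spin lock deadlocks,<br />
and dumping state with SysRq (nothing special showed up).<br />
The problem still occurred, and I could not figure out why.</p>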
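<p>Besides our own embedded system, the problem could also be reproduced on an Ubuntu PC,<br />
which rules out an SoC vendor issue. In theory our use case is not particularly unusual,<br />
so I kept wondering: with so many people in the world, are we really the only ones hitting this?</p>
<p>Very few people discuss this problem, so it took more digging on Google and filtering through more keywords.</p>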
<p>Eventually I found this site: <a href="https://nativeguru.wordpress.com/2015/02/19/why-you-should-avoid-using-sigalrm-for-timer/" target="_blank" rel="noopener">WHY YOU SHOULD AVOID USING SIGALRM FOR TIMER</a><br />
Its explanation is:</p>
<blockquote><p>The problem occurs when the signal function is triggered just before we call setitimer().<br />
The signal handler blocks, as the mutex is currently being held by do_timer_bookkeeping() and this is the deadlock.</p></blockquote>
<p>We did not have a deadlock, but the idea is a good one, and the approach described there is still worth following:</p>
<blockquote><p>To fix it, I removed all the SIGALRM code and replaced it with setitimer.<br />
I set the previous SIGALRM handler as the thread that will be spawned whenever the timer expired.</p></blockquote>
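<p>That looks like a good idea.</p>
<p>So I rewrote the program using timer_create.<br />
The good news: things improved, and the problem no longer showed up within 10 minutes.<br />
The bad news: it still happened. It became extremely hard to reproduce, but the fact that it can happen at all is annoying, because it leaves a loose end that cannot be tied up.</p>
<p>So I kept looking for other solutions.<br />
(For how to use timer_create, see <a href="https://wirelessr.gitbooks.io/working-life/content/linux_timer.html" target="_blank" rel="noopener">Linux timer</a>.)</p>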
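<p>Along the way I came across the approach of using the pthread sigmask so that a dedicated thread handles the SIGALRM signal,<br />
but after thinking it over, I believe it would run into the same problem, so having done my homework I set this approach aside for later.</p>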
<p>Then I found this post, <a href="http://stackoverflow.com/questions/2586926/setitimer-sigalrm-multithread-process-linux-c" target="_blank" rel="noopener">setitimer, SIGALRM &amp; multithread process (linux, c)</a>, which says:</p>
<blockquote><p>In the topic, Andi Kleen (Intel) <a href="https://lkml.org/lkml/2010/4/11/81" rel="nofollow noopener" target="_blank">recommends to switch to</a> &#8220;<em>POSIX timers (<a href="http://pubs.opengroup.org/onlinepubs/7908799/xsh/timer_create.html" rel="nofollow noopener" target="_blank"><code>timer_create</code></a>)</em>&#8220;; and in <a href="https://lkml.org/lkml/2010/4/11/92" rel="nofollow noopener" target="_blank">ML thread</a> Davide Libenzi suggests use of <code>timerfd</code> (timerfd_create, timerfd_settime) on non-ancient Linuxes.</p></blockquote>
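<p>The answer was now obvious: rewrite my current timer_create code using timerfd.<br />
However, timerfd signals expiry through a file descriptor and does not invoke a callback function, so it has to be paired with epoll; and since epoll is an I/O event trigger, a pthread is needed to service epoll.</p>
<p>After the rewrite, performance looks better than with SIGALRM / timer_create, but the code is considerably more complex. I <del>copied</del> referred to someone else's sample code,<br />
but that code has to handle multiple fds, whereas my goal is only to replace the original SIGALRM, so I simplified it further. In general it would be more complex.</p>
<p>The code:</p>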
<pre lang="C">#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;sys/time.h&gt;
#include &lt;pthread.h&gt;
#include &lt;semaphore.h&gt;
#include &lt;signal.h&gt;
#include &lt;unistd.h&gt;
#include &lt;time.h&gt;
#include &lt;sys/epoll.h&gt;
#include &lt;sys/timerfd.h&gt;


sem_t g_timer_sem_hdlr;
int timerfd;
pthread_t sig_thread;

static int      epfd = -1;
static struct   epoll_event events;
static int      sig_timer;
static struct itimerspec        its;
static struct epoll_event      ev;


/* Thread body: wait for the timerfd to fire, re-arm it and wake main(). */
void *sig_epoll_wait(void *ptr)
{
    int numEvent = 0;
    int i;

    (void)ptr;
    while (1) {
        /* Returns -1 on error (e.g. EINTR); the loop below is then skipped. */
        numEvent = epoll_wait(epfd, &amp;events, 1, -1);

        for (i = 0; i &lt; numEvent; i++) {

            if (epoll_ctl(epfd, EPOLL_CTL_DEL, timerfd, &amp;ev) == -1)
                perror("[1] epoll_ctl del");

            /* Re-arming the timer also clears its pending expiration count. */
            if (timerfd_settime(timerfd, 0, &amp;its, NULL) == -1)
                perror("[1] timerfd_settime");

            if (epoll_ctl(epfd, EPOLL_CTL_ADD, timerfd, &amp;ev) == -1)
                perror("[1] epoll_ctl add");
            sem_post(&amp;g_timer_sem_hdlr);
        }
    }

    return NULL;
}

/* Create a periodic timerfd, register it with epoll and start the thread.
 * `sig` is unused; it is kept from the SIGALRM-based interface. */
void create_timer(int sig, int msec)
{
    (void)sig;

    timerfd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (timerfd == -1) {
        perror("timerfd_create");
        exit(EXIT_FAILURE);
    }

    /* Split into seconds and nanoseconds so tv_nsec stays below 1e9. */
    its.it_value.tv_sec = msec / 1000;
    its.it_value.tv_nsec = (msec % 1000) * 1000000L;
    its.it_interval.tv_sec = its.it_value.tv_sec;
    its.it_interval.tv_nsec = its.it_value.tv_nsec;

    if (timerfd_settime(timerfd, 0, &amp;its, NULL) == -1)
        perror("timerfd_settime");

    // Create epoll
    epfd = epoll_create(1);
    if (epfd == -1) {
        perror("epoll_create");
        exit(EXIT_FAILURE);
    }

    ev.data.ptr = &amp;sig_timer;
    ev.events   = EPOLLIN | EPOLLET;

    // enable epoll
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, timerfd, &amp;ev) == -1)
        perror("[0] epoll_ctl");

    // Create Thread
    if (pthread_create(&amp;sig_thread, NULL, sig_epoll_wait, NULL) != 0) {
        fprintf(stderr, "pthread_create failed\n");
        exit(EXIT_FAILURE);
    }
}

int main(void)
{
    int i = 0;
    sem_init(&amp;g_timer_sem_hdlr, 1, 0);

    printf("Create Timer \n");
    create_timer(5566, 5);  // ms
    while (1) {
        sem_wait(&amp;g_timer_sem_hdlr);
        if ((i % 10000) == 0) printf("%s: hello, i=%d\n", __func__, i);
        i++;
    }
    printf("exited.\n");    /* not reached */
    return 0;
}
</pre>
<p>Some other sources mention that the figures computed by top can be inaccurate:</p>
<p>https://www.kernel.org/doc/Documentation/cpu-load.txt<br />
https://lkml.org/lkml/2007/2/12/6</p>
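<p>To cross-check what top reports, the same percentage can be computed by hand from two samples of a <code>cpu</code> line in <code>/proc/stat</code>. The sketch below assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); <code>struct cpu_sample</code>, <code>parse_cpu_line</code> and <code>sys_percent</code> are hypothetical helpers. Since it reads the same kernel counters that top does, it shares any sampling inaccuracy described in the links above.</p>

```c
#include <stdio.h>

/* Hypothetical helper type: the counters needed for a sys% estimate. */
struct cpu_sample {
    unsigned long long sys;     /* ticks spent in kernel mode */
    unsigned long long total;   /* sum of the first eight counters */
};

/* Parse one "cpu" or "cpuN" line as found in /proc/stat.
 * Returns 0 on success, -1 if fewer than four counters were found. */
static int parse_cpu_line(const char *line, struct cpu_sample *out)
{
    unsigned long long v[8] = {0};
    int i, n;

    /* %*s skips the "cpu"/"cpuN" label; counters missing on old kernels stay 0. */
    n = sscanf(line, "%*s %llu %llu %llu %llu %llu %llu %llu %llu",
               &v[0], &v[1], &v[2], &v[3], &v[4], &v[5], &v[6], &v[7]);
    if (n < 4)
        return -1;

    out->sys = v[2];
    out->total = 0;
    for (i = 0; i < 8; i++)
        out->total += v[i];
    return 0;
}

/* Percentage of time spent in kernel mode between samples a (earlier)
 * and b (later). Returns 0.0 if no time elapsed between them. */
static double sys_percent(const struct cpu_sample *a, const struct cpu_sample *b)
{
    unsigned long long dt = b->total - a->total;

    if (dt == 0)
        return 0.0;
    return 100.0 * (double)(b->sys - a->sys) / (double)dt;
}
```

<p>In practice the two lines would come from reading <code>/proc/stat</code> twice, about a second apart, and picking the line of the CPU of interest.</p>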
<p>ref.</p>
<ul>
<li><a href="https://github.com/seiyak/ssebot-samples/blob/master/sample-timerfd-epoll.c" target="_blank" rel="noopener">Linux FD Handler and Timer mechanism</a></li>
<li><a href="https://github.com/seiyak/ssebot-samples/blob/master/sample-timerfd-epoll.c" target="_blank" rel="noopener">ssebot-samples/sample-timerfd-epoll.c</a>: this code is well written, and it is the main example I learned from.</li>
</ul>
]]></content:encoded>
					
					<wfw:commentRss>https://richliu.com/2016/09/01/2002/sigalarm-timer_create-%e9%80%a0%e6%88%90-cpu-sys-100-%e7%9a%84%e5%95%8f%e9%a1%8c/feed/</wfw:commentRss>
			<slash:comments>0</slash:comments>
		
		
			</item>
	</channel>
</rss>
