[linux] APIC error and TX timeout

Martin Kyrc martin.kyrc at gmail.com
Thu Jan 11 13:33:07 CET 2007


Hi all,

Yesterday I ran some benchmark tests against a web server (the details
don't matter). The test consisted of generating http requests with siege.
This morning I found that the tests had not run to completion, and the
reason was that the client (the problem host) had lost ip connectivity.
Interestingly, though, L2 connectivity between the client and the switch
was still there. When I connected the client directly to another pc and
ran tcpdump there, UDP packets from the 'client' were still going out,
but I no longer saw any ICMP or TCP... Maybe this is related to the text
below the log output.

After a while spent checking whether the problem was between the chair
and the keyboard, a look at the log showed this:

<!-- the test was started at around 22:15 -->

Jan 10 22:42:08 lab-elbrus kernel: APIC error on CPU0: 01(01)
Jan 10 22:54:17 lab-elbrus kernel: APIC error on CPU0: 01(05)
Jan 10 22:54:17 lab-elbrus kernel: APIC error on CPU0: 05(0c)
Jan 11 00:10:38 lab-elbrus kernel: APIC error on CPU0: 0c(01)
Jan 11 00:34:31 lab-elbrus kernel: APIC error on CPU0: 01(01)

<!-- cut; this repeated at roughly 15-20 min intervals -->
<!-- cut; nothing relevant in between -->
<!-- cut; just before the outage the log shows this: -->

Jan 11 01:53:42 lab-elbrus kernel: APIC error on CPU0: 02(04)
Jan 11 01:53:42 lab-elbrus kernel: APIC error on CPU0: 04(02)
Jan 11 01:55:02 lab-elbrus syslog-ng[19150]: STATS: dropped 0
Jan 11 02:00:48 lab-elbrus kernel: APIC error on CPU0: 02(04)
Jan 11 02:05:02 lab-elbrus syslog-ng[19150]: STATS: dropped 0
Jan 11 02:06:50 lab-elbrus kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 11 02:06:53 lab-elbrus kernel: eth0: Transmit timeout, status 0c 
0005 c07f media 10.
Jan 11 02:06:53 lab-elbrus kernel: eth0: Tx queue start entry 14333816 
dirty entry 14333812.
Jan 11 02:06:53 lab-elbrus kernel: eth0:  Tx descriptor 0 is 0008a042. 
(queue head)
Jan 11 02:06:53 lab-elbrus kernel: eth0:  Tx descriptor 1 is 0008a042.
Jan 11 02:06:53 lab-elbrus kernel: eth0:  Tx descriptor 2 is 0008a04a.
Jan 11 02:06:53 lab-elbrus kernel: eth0:  Tx descriptor 3 is 0008a04a.
Jan 11 02:06:53 lab-elbrus kernel: eth0: link up, 100Mbps, full-duplex, 
lpa 0x41E1
Jan 11 02:07:05 lab-elbrus kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 11 02:07:08 lab-elbrus kernel: eth0: Transmit timeout, status 0c 
0005 c07f media 10.
Jan 11 02:07:08 lab-elbrus kernel: eth0: Tx queue start entry 4  dirty 
entry 0.
Jan 11 02:07:08 lab-elbrus kernel: eth0:  Tx descriptor 0 is 0008a04a. 
(queue head)
Jan 11 02:07:08 lab-elbrus kernel: eth0:  Tx descriptor 1 is 0008a04a.
Jan 11 02:07:08 lab-elbrus kernel: eth0:  Tx descriptor 2 is 0008a04a.
Jan 11 02:07:08 lab-elbrus kernel: eth0:  Tx descriptor 3 is 0008a04a.
Jan 11 02:07:08 lab-elbrus kernel: eth0: link up, 100Mbps, full-duplex, 
lpa 0x41E1

<!-- and so on, over and over... -->
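For anyone who wants to check the spacing of the APIC errors against the
first transmit timeout, here is a minimal Python sketch. It only uses a few
lines copied from the log above. The function names and the year 2007 are
assumptions made for the illustration (syslog lines carry no year), and
wrapped log lines would have to be joined before feeding a full log in.

```python
import re
from datetime import datetime

# A few lines copied from the log above; not the complete log.
SAMPLE = """\
Jan 10 22:42:08 lab-elbrus kernel: APIC error on CPU0: 01(01)
Jan 10 22:54:17 lab-elbrus kernel: APIC error on CPU0: 01(05)
Jan 10 22:54:17 lab-elbrus kernel: APIC error on CPU0: 05(0c)
Jan 11 00:10:38 lab-elbrus kernel: APIC error on CPU0: 0c(01)
Jan 11 02:06:50 lab-elbrus kernel: NETDEV WATCHDOG: eth0: transmit timed out
"""

# "<Mon> <day> <hh:mm:ss> <host> kernel: <message>"
STAMP = re.compile(r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ kernel: (.*)$")


def parse(text, year=2007):
    """Return (datetime, message) pairs for kernel lines; year is assumed."""
    events = []
    for line in text.splitlines():
        m = STAMP.match(line)
        if m:
            when = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((when, m.group(2)))
    return events


def apic_gaps(events):
    """Minutes between consecutive 'APIC error' lines."""
    times = [t for t, msg in events if msg.startswith("APIC error")]
    return [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]


def first_timeout(events):
    """Timestamp of the first NETDEV WATCHDOG timeout, or None."""
    for when, msg in events:
        if "transmit timed out" in msg:
            return when
    return None


if __name__ == "__main__":
    events = parse(SAMPLE)
    print("gaps between APIC errors (min):", apic_gaps(events))
    print("first TX timeout:", first_timeout(events))
```

This only lines up timestamps; it cannot show that the APIC errors caused
the timeout.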

If I'm not mistaken (correct me if I am), the tx buffer filled up and the
timeout caused eth0 to be reset. The same thing happened again in the
morning, after reconfiguring the interface and bringing it back up. What
I'm not sure about is whether this can be related to the 'APIC error...'
messages. From what I've managed to find so far, there could be a
connection when SMP is in use, but I don't use SMP (no support in the
kernel, not even as a module).
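As a starting point for reading the 'APIC error' lines, the sketch below
decodes the two hex numbers into named bits. It assumes they are readings of
the local APIC Error Status Register (ESR), the first taken before and the
second after the kernel's error handler rewrites the register. That is how
the i386 kernels of that era appear to print them, but treat it as an
assumption. The bit names follow the usual ESR layout; the function names
are made up for this example.

```python
import re

# ESR bit positions 0..7 (assumed layout, see the note above).
ESR_BITS = [
    "send checksum error",
    "receive checksum error",
    "send accept error",
    "receive accept error",
    "reserved",
    "send illegal vector",
    "received illegal vector",
    "illegal register address",
]

APIC_LINE = re.compile(r"APIC error on CPU(\d+): ([0-9a-fA-F]+)\(([0-9a-fA-F]+)\)")


def decode_esr(value):
    """Return the names of all bits set in an ESR value."""
    return [name for bit, name in enumerate(ESR_BITS) if value & (1 << bit)]


def decode_line(msg):
    """Return (cpu, first_value_bits, second_value_bits), or None if no match."""
    m = APIC_LINE.search(msg)
    if not m:
        return None
    cpu = int(m.group(1))
    first = int(m.group(2), 16)
    second = int(m.group(3), 16)
    return cpu, decode_esr(first), decode_esr(second)


if __name__ == "__main__":
    for line in ("APIC error on CPU0: 01(05)", "APIC error on CPU0: 02(04)"):
        print(line, "->", decode_line(line))
```

With this table, the values in the log above come out as checksum and accept
errors: 01 is a send checksum error, 02 a receive checksum error, 04 a send
accept error.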

Any idea what could be causing this? The system had been running for
about 80 days without the slightest problem, although admittedly I
probably never generated heavy traffic from it.

The client was a 1u server running debian with a custom kernel.

lab-elbrus:~# cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 11
model name      : Intel(R) Celeron(TM) CPU                1200MHz
stepping        : 4
cpu MHz         : 1202.989
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge 
mca cmov pat pse36 mmx fxsr sse
bogomips        : 2409.28

lab-elbrus:~# uname -a
Linux lab-elbrus 2.6.16.21-lab #1 PREEMPT Wed Jun 21 13:54:17 CEST 2006 
i686 GNU/Li

Thanks for a nudge in the right direction
--
mk


