[linux] APIC error and TX timeout
Martin Kyrc
martin.kyrc at gmail.com
Thursday January 11 13:33:07 CET 2007
Hi all,
yesterday I was running some benchmark tests against a web server
(which one is not important). The test consisted of generating HTTP
requests with siege.
This morning I found that the tests had not run to completion. The
reason was that the client (the problem host) had lost IP connectivity.
Interestingly, L2 connectivity between the client and the switch was
still there. When I connected the client directly to another PC and ran
tcpdump on that PC, UDP packets from the 'client' were still going out,
but I no longer saw any ICMP or TCP... Maybe this is related to what I
describe below the log output.
After spending a while checking whether the problem was between the
chair and the keyboard, a look at the log showed this:
<!-- the test was started at around 22:15 -->
Jan 10 22:42:08 lab-elbrus kernel: APIC error on CPU0: 01(01)
Jan 10 22:54:17 lab-elbrus kernel: APIC error on CPU0: 01(05)
Jan 10 22:54:17 lab-elbrus kernel: APIC error on CPU0: 05(0c)
Jan 11 00:10:38 lab-elbrus kernel: APIC error on CPU0: 0c(01)
Jan 11 00:34:31 lab-elbrus kernel: APIC error on CPU0: 01(01)
<!-- cut; this repeated at roughly 15-20 minute intervals -->
<!-- cut; nothing relevant in between -->
<!-- cut; just before the outage the log shows this: -->
Jan 11 01:53:42 lab-elbrus kernel: APIC error on CPU0: 02(04)
Jan 11 01:53:42 lab-elbrus kernel: APIC error on CPU0: 04(02)
Jan 11 01:55:02 lab-elbrus syslog-ng[19150]: STATS: dropped 0
Jan 11 02:00:48 lab-elbrus kernel: APIC error on CPU0: 02(04)
Jan 11 02:05:02 lab-elbrus syslog-ng[19150]: STATS: dropped 0
Jan 11 02:06:50 lab-elbrus kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 11 02:06:53 lab-elbrus kernel: eth0: Transmit timeout, status 0c 0005 c07f media 10.
Jan 11 02:06:53 lab-elbrus kernel: eth0: Tx queue start entry 14333816 dirty entry 14333812.
Jan 11 02:06:53 lab-elbrus kernel: eth0: Tx descriptor 0 is 0008a042. (queue head)
Jan 11 02:06:53 lab-elbrus kernel: eth0: Tx descriptor 1 is 0008a042.
Jan 11 02:06:53 lab-elbrus kernel: eth0: Tx descriptor 2 is 0008a04a.
Jan 11 02:06:53 lab-elbrus kernel: eth0: Tx descriptor 3 is 0008a04a.
Jan 11 02:06:53 lab-elbrus kernel: eth0: link up, 100Mbps, full-duplex, lpa 0x41E1
Jan 11 02:07:05 lab-elbrus kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jan 11 02:07:08 lab-elbrus kernel: eth0: Transmit timeout, status 0c 0005 c07f media 10.
Jan 11 02:07:08 lab-elbrus kernel: eth0: Tx queue start entry 4 dirty entry 0.
Jan 11 02:07:08 lab-elbrus kernel: eth0: Tx descriptor 0 is 0008a04a. (queue head)
Jan 11 02:07:08 lab-elbrus kernel: eth0: Tx descriptor 1 is 0008a04a.
Jan 11 02:07:08 lab-elbrus kernel: eth0: Tx descriptor 2 is 0008a04a.
Jan 11 02:07:08 lab-elbrus kernel: eth0: Tx descriptor 3 is 0008a04a.
Jan 11 02:07:08 lab-elbrus kernel: eth0: link up, 100Mbps, full-duplex, lpa 0x41E1
<!-- and so on, over and over... -->
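As a rough aid for reading the two "Tx queue" lines in the log above, here is a minimal Python sketch that computes how many transmit descriptors were still pending at each timeout. It rests on two assumptions that the post does not confirm: that "start entry" is the counter of the next descriptor to be queued and "dirty entry" the oldest one not yet reclaimed, and that the ring has 4 descriptors (inferred only from the four "Tx descriptor 0..3" lines). The names `ASSUMED_RING_SIZE`, `outstanding` and `ring_looks_full` are hypothetical.

```python
import re

# ASSUMPTION: ring size of 4, inferred from the "Tx descriptor 0..3" lines;
# the driver and NIC are not named in the post.
ASSUMED_RING_SIZE = 4

# Matches the "Tx queue start entry N dirty entry M." message from the log.
LINE_RE = re.compile(r"Tx queue start entry\s+(\d+)\s+dirty entry\s+(\d+)")


def outstanding(line):
    """Return start entry minus dirty entry, or None if the line does not match.

    ASSUMPTION: the difference is the number of descriptors handed to the
    card but not yet reclaimed by the driver.
    """
    match = LINE_RE.search(line)
    if match is None:
        return None
    start, dirty = int(match.group(1)), int(match.group(2))
    return start - dirty


def ring_looks_full(line, ring_size=ASSUMED_RING_SIZE):
    """True if every descriptor in the assumed ring is still pending."""
    pending = outstanding(line)
    return pending is not None and pending >= ring_size


if __name__ == "__main__":
    samples = [
        "eth0: Tx queue start entry 14333816 dirty entry 14333812.",
        "eth0: Tx queue start entry 4 dirty entry 0.",
    ]
    for sample in samples:
        print(outstanding(sample), ring_looks_full(sample))
```

Under those assumptions both timeouts show all four descriptors still pending, which would fit a transmit queue that stopped draining rather than one that was merely busy.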
If I am not mistaken (please correct me), the tx buffer filled up and
eth0 was reset because of the timeout. The same thing happened again in
the morning, after I reconfigured the interface and brought it back up.
What I am not sure about is whether this can be related to the
'APIC error...' messages. From what I have managed to find so far, there
could be a connection when SMP is in use, but I do not use SMP (no
support in the kernel, not even as a module).
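To make the 'APIC error on CPU0: xx(yy)' values easier to read, here is a small Python sketch that decodes them. As far as I can tell, the two hex numbers are the local APIC error status register (ESR) read before and after the kernel writes to it, and the bit names below are the ones listed in the comment next to that message in the 2.6 i386 kernel source ("CS" means checksum). The function names `decode_esr` and `decode_log_pair` are hypothetical, and the sketch only names the bits; it does not say what caused them.

```python
# Bit meanings of the local APIC Error Status Register (ESR), taken from the
# comment next to the "APIC error on CPU%d" message in the Linux 2.6 source.
ESR_BITS = {
    0: "Send CS error",
    1: "Receive CS error",
    2: "Send accept error",
    3: "Receive accept error",
    4: "Reserved",
    5: "Send illegal vector",
    6: "Received illegal vector",
    7: "Illegal register address",
}


def decode_esr(value):
    """Return the names of all bits set in an 8-bit ESR value."""
    return [name for bit, name in ESR_BITS.items() if value & (1 << bit)]


def decode_log_pair(text):
    """Decode the 'xx(yy)' part of an 'APIC error on CPU0: xx(yy)' line.

    Returns a tuple: (bits set in the first value, bits set in the second).
    """
    first, second = text.rstrip(")").split("(")
    return decode_esr(int(first, 16)), decode_esr(int(second, 16))


if __name__ == "__main__":
    # Value pairs copied from the log excerpt in this post.
    for pair in ["01(01)", "01(05)", "05(0c)", "0c(01)", "02(04)", "04(02)"]:
        print(pair, decode_log_pair(pair))
```

For the values in my log this gives only checksum and accept errors (bits 0 to 3); none of the illegal-vector or illegal-register bits are set.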
Any idea what could be causing this? The system had been running for
about 80 days without the slightest problem, although it is true that I
probably never generated heavy traffic from it.
The client was a 1U server running Debian with a custom kernel.
lab-elbrus:~# cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 11
model name : Intel(R) Celeron(TM) CPU 1200MHz
stepping : 4
cpu MHz : 1202.989
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips : 2409.28
lab-elbrus:~# uname -a
Linux lab-elbrus 2.6.16.21-lab #1 PREEMPT Wed Jun 21 13:54:17 CEST 2006
i686 GNU/Li
Thanks for a kick in the right direction.
--
mk