High-Availability Obsession: comments feed (retrieved 2022-12-02)

Matias (2015-09-30):
Excellent! Thank you very much.

David Goodwin (2015-08-15):
Thanks - that worked perfectly.

Unknown (2012-11-07):
I just thought I would point out that your blog post is still helpful, two years after you wrote it. Thanks!

dany (2011-06-30):
Hi Anon,
thanks for the comment. I actually tried a pure linear download, but as you said, it had negative effects with more than one seed/peer. I also wouldn't recommend going without at least a few random start positions on private trackers, as there could be some sort of detection :)

Anonymous (2011-06-28):
That's really what I was looking for! In the new stable 0.8.9 it is already implemented that the first and last chunks get higher priority, but this does even more.

For my project, I just commented out:

    // if ((random() & 63) == 0) {
    //     m_position = random() % size();
    //     queue->clear();
    // }

and changed:

    if (m_position == invalid_chunk)
        m_position = random() % size();

    advance_position();
}

to:

    if (m_position == invalid_chunk)
        ++m_position;

    advance_position();
}

That way you get the chunks in a row :)

It works for me because I have only one seeder. If you have more, it is probably not the best solution, judging by the comment explaining why they randomize:

    // Randomize position on average every 16 chunks to prevent
    // inefficient distribution with a slow seed and fast peers
    // all arriving at the same position.

But thanks anyway for the hint, it helped me a lot!

Ed Silva (2011-01-01):
I could not "see" the ST2540 UNTIL I upgraded to the latest kernel, 2.6.32-100.24.1.el5. We are using Oracle Unbreakable Linux. I can't seem to go over 4Gb with a QLogic QLE2562 card to the ST2540?
What gives?

    QLogic Fibre Channel HBA Driver: 8.03.01.01.32.1-k9
    QLogic Fibre Channel HBA Driver: 8.03.01.01.32.1-k9
    QLogic QLE2562 - Sun StorageTek 8Gb FC PCIe HBA, dual port
    QLogic Fibre Channel HBA Driver: 8.03.01.01.32.1-k9
    QLogic QLE2562 - Sun StorageTek 8Gb FC PCIe HBA, dual port

Daniel (2010-11-16):
Yes :) I just wanted to be sure that you don't forget anything, because lvm.conf is copied into the initrd boot image, so any mistake there is fatal :) This copy happens during the install of a new kernel, or in step 5 of my original post.

Ben (2010-11-16):
In fact, I think
    filter = [ "a/dev/sda[0-9]+/", "r/.*/" ]

is all we really want, as we only use LVM on the internal RAID1 array (/dev/sda).

That makes sense, doesn't it? (-:

Ben (2010-11-16):
We don't even use LVM on sdb as far as I can tell; only sda. So with a little tweaking that might work perfectly.

I'll give it a shot. Thank you.

Daniel (2010-11-15):
Ben,
in the filter line, include only the block devices you think LVM should scan during boot or normal operation. So if you have LVM on boot disks sda and sdb, include only those; maybe this will work:

    filter = [ "a/dev/mpath/.*/", "a/dev/sda[0-9]+/", "a/dev/sdb[0-9]+/", "a/dev/md.*/", "r/.*/" ]

If you don't use LVM on the boot disks, but only on multipath devices, use something like this:

    filter = [ "a/dev/mpath/.*/", "a/dev/md.*/", "r/.*/" ]

But please, after changing lvm.conf, first test everything using "vgscan -vv" or "vgscan -vvv", so you don't make a mistake.

Daniel

Ben (2010-11-15):
What happens if we have a multipath device called /dev/sdag?
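(The filter patterns above are unanchored regular-expression searches, tried in order, with the first matching "a" or "r" pattern deciding accept or reject; whether a device like /dev/sdag slips through depends on how the pattern ends. A short Python approximation of that matching, assuming LVM's documented accept-by-default for devices no pattern matches; this is an illustration, not LVM code:)

```python
import re

def lvm_filter_accepts(dev, rules):
    """Approximate lvm.conf filter semantics: patterns are unanchored
    regex searches tried in order; the first match decides accept ("a")
    or reject ("r"); devices matching no pattern are accepted."""
    for action, pattern in rules:
        if re.search(pattern, dev):
            return action == "a"
    return True

# A digit-anchored sda pattern vs. a broad "a/dev/sda.*/" variant.
strict = [("a", "/dev/mpath/.*"), ("a", "/dev/sda[0-9]+"), ("r", ".*")]
loose  = [("a", "/dev/mpath/.*"), ("a", "/dev/sda.*"),     ("r", ".*")]

for dev in ("/dev/sda1", "/dev/sdag", "/dev/mpath/vold01"):
    print(dev, lvm_filter_accepts(dev, strict), lvm_filter_accepts(dev, loose))
# /dev/sda1         -> accepted by both
# /dev/sdag         -> rejected by the strict filter, accepted by the loose one
# /dev/mpath/vold01 -> accepted by both
```

The "[0-9]+" requires a digit right after "sda", so whole-disk names like /dev/sdag fall through to the final reject rule, while "sda.*" sweeps them in.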
For example, on another problem server we have the following:

    # multipath -ll
    [...]
    vold08 (3600a0b8000498550000001ff492aa0ca) dm-11 SUN,LCSM100_F
    [size=1.4T][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
    \_ round-robin 0 [prio=100][active]
      \_ 3:0:0:1 sdd 8:48 [active][ready]
    vold10 (360050768019c038e9000000000000003) dm-2 IBM,2145
    [size=500G][features=1 queue_if_no_path][hwhandler=0][rw]
    \_ round-robin 0 [prio=200][active]
      \_ 6:0:0:1 sdai 66:32 [active][ready]
      \_ 6:0:2:1 sdaw 67:0 [active][ready]
      \_ 4:0:0:1 sdf 8:80 [active][ready]
      \_ 4:0:2:1 sdt 65:48 [active][ready]
    \_ round-robin 0 [prio=40][enabled]
      \_ 4:0:3:1 sdaa 65:160 [active][ready]
      \_ 6:0:1:1 sdap 66:144 [active][ready]
      \_ 6:0:3:1 sdbd 67:112 [active][ready]
      \_ 4:0:1:1 sdm 8:192 [active][ready]
    vold07 (3600a0b8000498550000001fd492aa0a0) dm-10 SUN,LCSM100_F
    [size=1.4T][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
    \_ round-robin 0 [prio=100][active]
      \_ 3:0:0:0 sdc 8:32 [active][ready]
    vold01 (3600a0b800038b29a000001f0477b118c) dm-8 SUN,LCSM100_F
    [size=1.5T][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
    \_ round-robin 0 [prio=100][active]
      \_ 1:0:0:0 sdb 8:16 [active][ready]
    \_ round-robin 0 [prio=0][enabled]
      \_ 5:0:0:0 sdag 66:0 [active][ghost]

It's dm-8 that's the problem child; we get the I/O errors on sdag. Will your suggested filter work with all of the above, or would it need to be modified to something of the form

    filter = [ "a/dev/mpath/.*/", "a/dev/sda[0-9]/", "a/dev/sdb[0-9]/", "a/dev/md.*/", "r/.*/" ]

instead?

Daniel (2010-11-15):
Hi Ben,
my first guess is that you are experiencing errors due to the default /etc/lvm/lvm.conf. By default it scans all devices (which includes the ghost ones). Please modify it according to step 2 in my original post:

2. In lvm.conf, it is good to modify two lines, to reduce LVM discovery time:

    filter = [ "a/dev/mpath/.*/", "a/dev/sda.*/", "a/dev/sdb.*/", "a/dev/md.*/", "r/.*/" ]
    types = [ "device-mapper", 1 ]

Please post results if it helped.
Daniel

Ben (2010-11-15):
http://fpaste.org/iWYy/
/etc/lvm/lvm.conf
http://fpaste.org/G9vx/
/etc/modprobe.conf

http://fpaste.org/2mKm/
/etc/multipath.conf

NOTE: They'll expire within 24 hours.

Anonymous (2010-11-12):
Hi Ben,
can you please post your /etc/multipath.conf, /etc/modprobe.conf and /etc/lvm/lvm.conf? During normal operation there should be no I/O errors, only during boot.

Daniel

Ben (2010-11-04):
Hi there,
We're running fully patched and updated RHEL 5.5 on some Sun X4600 M2 servers with Emulex FC cards connected to Sun 2540 arrays. On the ones we're multipathing, we see huge numbers of buffer I/O errors on the ghost path at every boot and regularly during operation.

Currently we're booting with "pci=noacpi irqpoll" added to the kernel line, have rebuilt the initrd with "--preload=scsi_dh_rdac", and have

    alias scsi_hostadapter3 lpfc
    alias scsi_hostadapter4 dm_multipath
    alias scsi_hostadapter5 scsi_dh_rdac

in modprobe.conf, all without being able to get rid of the buffer I/O errors.

Can you please give some suggestions on what we're missing? I'm beginning to tear my hair out over this. There are no functional problems (everything works):

    datavol (3600a0b800038b3e500000224477df1d2) dm-2 SUN,LCSM100_F
    [size=1.5T][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
    \_ round-robin 0 [prio=100] [active]
      \_ 2:0:0:0 sdc 8:32 [active] [ready]
    \_ round-robin 0 [prio=0][enabled]
      \_ 1:0:0:0 sdb 8:16 [active] [ghost]

but it's annoying to think things could be cleaner.

With thanks,
Ben

Daniel (2010-05-18):
I have unofficial information from a friend who is a Sun engineer that these LUN trespasses are a known issue and that it should be fixed in RHEL 5.5. I can't confirm this, but I hope it helps :)

DANiEL (2010-05-06):
Thanks for the info.
There are some differences, but I don't believe they should result in sporadic AVT. Then again, I don't have much FC/SAN experience; would you think that's reasonable?

To give you an update, we've got high-level people within Sun/Oracle scrambling on this now. Had a tense meeting yesterday.

Daniel (2010-04-29):
Hi,
yes, both hosts see both controllers.
For example:

    s1-dbbackup1 (3600a0b80005b156d000007664ae719e7) dm-12 SUN,LCSM100_F
    [size=144G][features=1 queue_if_no_path][hwhandler=1 rdac][rw]
    \_ round-robin 0 [prio=200][active]
      \_ 8:0:2:9 sdak 66:64 [active][ready]
      \_ 9:0:2:9 sdal 66:80 [active][ready]
    \_ round-robin 0 [prio=0][enabled]
      \_ 8:0:0:9 sdh 8:112 [active][ghost]
      \_ 9:0:0:9 sdv 65:80 [active][ghost]

Here is our setup, plus debug messages from the last boot.

db1 (sorry for the format, it's from "script"):
http://pastebin.org/191638
db2:
http://pastebin.org/191641

I see you have small differences in multipath.conf (the no_path_retry parameter and the blacklist), lvm.conf (no filter for SAN devices) and modprobe.conf (no

    alias scsi_hostadapter4 qla2xxx
    alias scsi_hostadapter5 dm-multipath
    alias scsi_hostadapter6 scsi_dh_rdac

lines, but you have a special qlport_down_retry parameter defined). Even if you don't use LVM, you should make the filter changes to lvm.conf, because the LVM boot scan is still performed.

Hmm, actually I have no idea why you are observing such problems.. pretty strange, I will keep thinking about it..

Daniel

DANiEL (2010-04-13):
We're using the same HBAs. I don't have the firmware version on hand, but it should be recent as of this fall; we upgraded firmware on everything for all of our systems. The SANs are the latest as well.

We've got a mix of X4140s and X4450s. I don't remember which PCIe slot the HBA is in; those were installed by our reseller, not by us.

We are not using LVM; it's not needed for our environment.

Your use is considerably more than ours.
But regardless, I can create controller resets on a single 250 GB volume (filesystem agnostic; used both ext3 and OCFS2), with nothing else using the SAN, no other volumes, and only a single host with an initiator. The behavior only manifests when both controllers are fibre attached. If we unplug one controller (removing the failover possibility), the configuration is stable. We know for a fact that the controller is resetting: I've become very familiar with the SAN service adviser dumps.

I pasted multipath.conf, multipath -v2 -ll, uname -a, modprobe.conf and logs from one of our machines that was observing the behavior. We are using a rebuilt initrd that preloads scsi_dh_rdac.

http://pastebin.org/149331

Do both of your hosts see both controllers? I.e., does multipath -v2 -ll show 2 active paths and 2 ghost paths per LUN?

I appreciate your help on this. Sounds like we have a lot of similarities in our setup; I'm jealous of yours though ;)

Daniel (2010-04-08):
We also use these cards in our servers:
    QLogic Fibre Channel HBA Driver: 8.03.00.1.05.05-k
    QLogic QLE2560 - Sun StorageTek 8Gb FC PCIe HBA, single port
    ISP2532: PCIe (2.5Gb/s x8) @ 0000:03:00.0 hdma+, host#=8, fw=4.04.09 (85)

Yes, we talked to the LSI guys recently, so I know their position about not supporting dm-multipath. I was really surprised.

I have a few things for you if you want to investigate further: are you using the latest fcode (firmware) for your HBAs? Can you paste to pastebin.org all the configs I mentioned in the original article (multipath.conf, lvm.conf, ...) plus "uname -a" and maybe logs? We are not observing controller resets. On one cluster we use about 10 LUNs from the disk array (all owned by one controller) and on the second cluster also about 10 LUNs (owned by the second controller).

What servers exactly are you using? For example, in X4270 servers you should put HBAs only in PCIe slots 0 and 3!

DANiEL (2010-04-02):
Sorry it's taken a while to get back to you. Our high-load issues may be associated with other problems. FYI, you should read this link:

https://lists.linux-foundation.org/pipermail/bugme-new/2007-June/016354.html

I'm currently in communication with Sun/Oracle as well as LSI (the actual manufacturer of the ST2540), and their official position is that DM-MP is not supported. I believe it's due to how the ST2540 is asymmetrical. Here is another link that talks about it somewhat; the ST6140 is more or less the same as the ST2540:

http://southbrain.com/south/2009/10/qla-linux-and-rdac---sun-6140.html

I'd be curious to know your take on all this.
We started observing controller resets for no known reason back in February and have been diagnosing ever since. I'm of the opinion that it is due to AVT. I'm currently testing a multipath configuration where the second controller is disabled: multiple paths, but no failover. Failover is not essential in our environment, but it would be nice to have. As far as we can tell, we are unfortunately unable to use mppRDAC, as it does not actually support our HBAs. Sun is somewhat confused as to what is and is not supported hardware: their documents conflict. Officially, our HBAs (QLE2560s) and switches (QLogic 5802s) are not compatible with the ST2540. Needless to say, we are very happy with our reseller...

Daniel (2010-04-01):
A little update. I can't confirm this yet, but according to https://bugzilla.redhat.com/show_bug.cgi?id=515326 they fixed the loading order of the scsi_dh_rdac and qla2xxx modules in the latest update, RHEL 5.5.

Daniel (2010-03-15):
Please try it again without using CPU-intensive ssh+rsync, then we can isolate the problem. Try a netcat pump, about which I wrote here: http://www.ha-obsession.net/2007/12/fast-remote-dir-copy-using-netcat.html

DANiEL (2010-03-12):
Have you experienced any unusually high loads?
I've been rsync'ing a 200 GB dataset over ssh, via a gigabit switch, to a SAN volume on a machine fibre-attached to the SAN. This drives load averages up to 25 at the highest, with iowait hovering between 60-80%. Very unexpected for a 50-60 MB/s transfer rate...

Daniel (2010-02-12):
Yes, those few I/O errors come from the SAN device. And for automated installs: I would personally do it the simpler way, just like you do (in a post-install script, or not at all during install), as there are fewer problems (no need to recreate the boot image every time a new OS update comes). The OS eventually boots, and then I would solve any problems :)
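(As an aside on the netcat pump Daniel recommends above: the usual shape is "tar -cf - . | nc host port" on the sender and "nc -l port | tar -xf -" on the receiver, with the exact recipe at the linked post. The Python sketch below reproduces only the transport idea, a raw unencrypted TCP stream, which is what avoids the per-block cipher cost of rsync-over-ssh; hosts, port choice and payload are illustrative.)

```python
import socket
import threading

def receive_all(srv, sink):
    """Receiver side: accept one connection and drain it into sink."""
    conn, _ = srv.accept()
    with conn:
        while chunk := conn.recv(65536):
            sink.extend(chunk)

srv = socket.socket()
srv.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
srv.listen(1)
port = srv.getsockname()[1]

received = bytearray()
t = threading.Thread(target=receive_all, args=(srv, received))
t.start()

payload = b"example archive bytes" * 1000    # stands in for the tar stream
with socket.create_connection(("127.0.0.1", port)) as sock:
    sock.sendall(payload)       # closing the socket signals end-of-stream
t.join()
srv.close()

print(received == payload)
```

The sender's close is the end-of-file marker, exactly as with nc: no framing, no handshake, and no encryption, so CPU stays free for the disk and network.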