VMware ESXi 4 performance over iSCSI

Hatop Alabama
edited December 2011 in Science & Tech
I've got a bit of a weird problem with my current setup, which I will describe.
I am attempting to run the following setup:
4x Dell R715 servers (4x 12-core CPUs, 128 GB RAM)
2x Dell 10 Gbit switches
1x Compellent storage array with 26 TB of space, connected with multipathing to each of the Dell servers.

Here is the main problem inside this setup. I can verify that SAN read and write speeds are correct, sitting at a stable 550 MB/s when connected to external hardware. However, when I access the SAN through a virtual machine (using the VMware software iSCSI initiator, not the hardware one actually present on the server) and test the copy of a small (800 MB) file, I see write speeds sitting at a staggeringly slow 60 MB/s. I can't find any error being reported, but it's limiting my ability to move files between VMs or use vMotion to migrate VMs across servers.
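For anyone wanting to reproduce the guest-side numbers, it was just a timed copy; a minimal sketch of equivalent commands, assuming a Linux guest (the mount point and file names here are made up):

    # Sequential write straight to the virtual disk, bypassing the guest
    # page cache so the figure reflects actual storage throughput
    dd if=/dev/zero of=/mnt/data/ddtest bs=1M count=800 oflag=direct

    # Timed copy of an ~800 MB file; sync runs inside the timed shell so
    # the elapsed time includes flushing dirty pages to disk
    time sh -c 'cp /mnt/data/source.bin /mnt/data/copy.bin && sync'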

Any ideas on why storage performance would be so low inside a VM?
I can provide more detailed system information if that would be helpful.

Comments

  • Hatop Alabama
    edited November 2011
    I'm not sure if Operating Systems was the appropriate place. If this needs to be moved to Networking and Security, go for it, mods.
  • primesuspect Beepin n' Boopin Detroit, MI Icrontian
    edited November 2011
    Nah, I think you're in the right area; it's probably just that anybody who could help you with this or offer intelligent input hasn't seen it yet :)
  • Tushon I'm scared, Coach Alexandria, VA Icrontian
    edited November 2011
    I've looked at it and pondered several times, but have nothing to offer.
  • Hatop Alabama
    edited November 2011
    It's a large mystery to me. I think the issue boils down to the virtual switch inside VMware ESXi. I have no evidence, but there are only two pieces that should slow anything down: the virtual switch and the software iSCSI initiator.

    As it stands, it takes a full 39.5 seconds to copy 810,260,891 bytes, which works out to roughly 20.5 MB/s (810,260,891 bytes / 39.5 s). Painfully slow. I wish the VMware host environment were friendlier to storage testing.
  • Tushon I'm scared, Coach Alexandria, VA Icrontian
    edited November 2011
    Can you test it with another VM (e.g. some Linux distro or something similar) to see if it is present in every VM, or perhaps a config issue with your present VMs? Having no experience with this beyond installing ESX once and playing with vCenter for a few minutes, I don't know that I'll be of much help. Are there "drivers" you need to be loading, or does the VM just see the SAN as another drive?
  • Hatop Alabama
    edited November 2011
    Another oddity. When creating a ramdisk inside the VM, filling it to capacity from /dev/urandom, and then dumping that to disk, we see no performance issues to speak of: a sustained 300 MB/s write. We can also get sustained 130 MB/s reads moving data TO RAM. However, mixed read/write IO appears to blow up inside the VM, and that is where I see the very low numbers. Since my servers will largely be doing random reads with a small amount of writing in comparison, this really only bottlenecks backups run from inside the VM.
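    The ramdisk test was along these lines (a rough sketch, assuming tmpfs in a Linux guest; sizes and paths are illustrative):

        # Create a 1 GB ramdisk and fill it from /dev/urandom
        mkdir -p /mnt/ramdisk
        mount -t tmpfs -o size=1g tmpfs /mnt/ramdisk
        dd if=/dev/urandom of=/mnt/ramdisk/blob bs=1M count=1024

        # RAM -> disk (the ~300 MB/s sustained write)
        dd if=/mnt/ramdisk/blob of=/mnt/data/blob bs=1M oflag=direct

        # Disk -> RAM (the ~130 MB/s sustained read)
        dd if=/mnt/data/blob of=/mnt/ramdisk/blob2 bs=1M iflag=direct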
  • Tushon I'm scared, Coach Alexandria, VA Icrontian
    edited November 2011
    Couldn't you just do a snapshot of the VM for backups, to avoid that? Or does that not make sense in your environment?
  • Hatop Alabama
    edited November 2011
    The drive appears as a regular SCSI drive with 512-byte sectors. I've tested this in both RHEL 5.7 and CentOS 5.x (can't remember the specific version). As far as drivers go, VMware handles the iSCSI part and hands off the space inside the VMDK to the guest as two drives, each partitioned as ext3. There are a couple of odd statements in dmesg I'm going to track down, mostly belonging to the drive itself, where it says that cache data is unavailable. I'm very new to using VMware myself, having started migrating from physical Solaris boxes to virtual RHEL boxes about a month ago.
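    The sort of thing I'm using to chase the cache messages (a sketch; the device name is an example, and sdparm may need to be installed separately):

        # Find the cache-related complaints about the virtual disk
        dmesg | grep -i cache

        # Query the write-cache enable (WCE) bit on the virtual SCSI disk
        sdparm --get=WCE /dev/sdb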
  • Hatop Alabama
    edited December 2011
    To add a little more fun to the issue, we had also seen speed problems when moving files to our DR site. We may have a separate network issue, but it could also be an issue with our controller. When viewed from the DR side, one of our controllers shows its dedicated path to the DR as down. If the controller is trying to multipath, shows a down link externally but doesn't see the same internally, and attempts to use both paths when responding to IO, it COULD cause the massive slowdown. We've also tried setting our MTU from 64k to 12,000k, but I have no testing data on that yet.
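    For checking path states from the ESXi side, something like this (syntax from memory for vSphere 4; treat it as a sketch and check against your version):

        # List every storage path and its state as the host sees it
        esxcfg-mpath -l

        # Per-device NMP ownership and path selection policy
        esxcli nmp device list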
  • Straight_Man Geeky, in my own way Naples, FL Icrontian
    edited December 2011
    Would someone clarify -- iSCSI is internet routing, or for USB, or for direct-connected SATA of any speed, something like that IIRC? Network cabling? Could that limit flow-through? Port speed on the servers? As to the internet, is it loaded with other traffic at transfer time? Is the right driver set running for the routers and servers?

    Trying to think about (brainstorm) what I would look at if I had this on my hands, to start with.

    John.
  • Hatop Alabama
    edited December 2011
    iSCSI is Ethernet-encapsulated SCSI traffic. It is not USB, and it works on top of SATA/SAS only in that the drives are housed inside a storage enclosure and accessed via a storage controller. Network cables are all rated for 10 Gb (faster than SATA III). Port speed on all servers is also 10 Gb, as shown by the direct hardware connection test that posted 550 MB/s (around 4.4 Gb/s, which would be correct for 10K RPM SAS drives plus TCP/IP overhead). This is all traffic local to the switches; none of it routes to an external network. The problem only occurs inside VMware and isn't part of the external replication problem. I'm not sure I can answer the question about driver sets for routers, since they are stand-alone hardware that only interfaces via standard fiber or Ethernet. The underlying switching and cabling is not the current issue, as a direct iSCSI connection from a physical server is fine. The servers all run VMware ESXi 4.x and hence use only the modules allowed by VMware.
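    For reference, the direct-from-a-physical-box test looks roughly like this with open-iscsi (a sketch; the portal IP and target IQN are placeholders):

        # Discover targets on the array's portal (IP is a placeholder)
        iscsiadm -m discovery -t sendtargets -p 192.168.1.10

        # Log in to the discovered target (IQN is a placeholder)
        iscsiadm -m node -T iqn.2002-03.com.compellent:example -p 192.168.1.10 --login

        # Benchmark the raw LUN that appears (device name will vary)
        dd if=/dev/sdc of=/dev/null bs=1M count=4096 iflag=direct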