SFTP in progress

I've been working on some ideas for adding SFTP support to PTssh, which will make the library much more complete. Plus, there are probably only a handful of people out there (like myself) who care about a good SSH library; I imagine the majority of people who use SSH-related libs use them for their SFTP support.

I'm only a little familiar with SFTP, having used it a tiny bit for random scripts and such. I must say that I do like SFTP's request-based design. However, it can also hurt performance when doing things like reading a 100MB+ file down over SSH. Why? Well, you can make a request for the entire file, but more than likely your SSH/SFTP server will only send back about 64KB at a time. I imagine this was done so that if a user makes an SFTP request to read a 1GB file, they don't have to wait on the entire file to transfer before being able to make more requests. And for that, it's a good design...
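For the curious, here's roughly what a single read request looks like on the wire, per my reading of the SFTP v3 draft (draft-ietf-secsh-filexfer-02). The helper functions below are just a sketch I threw together, not actual PTssh code:

#include <cstdint>
#include <string>
#include <vector>

// SFTP v3 packet type for a read request (from the filexfer draft).
static const uint8_t SSH_FXP_READ = 5;

// Append big-endian integers/strings the way the SFTP wire format expects.
static void putU32(std::vector<uint8_t>& b, uint32_t v) {
    b.push_back(v >> 24); b.push_back(v >> 16); b.push_back(v >> 8); b.push_back(v);
}
static void putU64(std::vector<uint8_t>& b, uint64_t v) {
    putU32(b, (uint32_t)(v >> 32)); putU32(b, (uint32_t)v);
}
static void putString(std::vector<uint8_t>& b, const std::string& s) {
    putU32(b, (uint32_t)s.size());
    b.insert(b.end(), s.begin(), s.end());
}

// Build an SSH_FXP_READ packet body: ask for 'len' bytes at 'offset'.
std::vector<uint8_t> buildReadRequest(uint32_t requestId,
                                      const std::string& handle,
                                      uint64_t offset, uint32_t len) {
    std::vector<uint8_t> pkt;
    pkt.push_back(SSH_FXP_READ);
    putU32(pkt, requestId);
    putString(pkt, handle);   // opaque handle from SSH_FXP_OPEN
    putU64(pkt, offset);
    putU32(pkt, len);
    return pkt;               // caller prepends the uint32 length framing
}

The catch is that the len you ask for is only an upper bound: the SSH_FXP_DATA reply can legally carry less, and OpenSSH caps it at around 64KB.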

*update*
Seems that SFTP v3 has this read limitation: you request to read a big piece, but you only get back as much as the SFTP server wants to send you. This limitation was addressed in SFTP version 6: during the client/server handshake, the client gets a "supported2" structure, which among other things tells it the server's maximum read size.
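For reference, here's what that structure holds, as best I can tell from the v6 draft (draft-ietf-secsh-filexfer-13). The struct itself is just my own sketch of the fields, not library code:

#include <cstdint>
#include <string>
#include <vector>

// Rough sketch of the "supported2" extension data a v6 server sends
// with its version reply (see draft-ietf-secsh-filexfer-13).
struct Supported2 {
    uint32_t supportedAttributeMask;
    uint32_t supportedAttributeBits;
    uint32_t supportedOpenFlags;
    uint32_t supportedAccessMask;
    uint32_t maxReadSize;   // <-- the field that matters here: the largest
                            // read the server promises to satisfy in one
                            // SSH_FXP_DATA reply (per the draft, 0 means
                            // it isn't advertising any guarantee)
    uint16_t supportedOpenBlockVector;
    uint16_t supportedBlockVector;
    std::vector<std::string> attribExtensionNames;
    std::vector<std::string> extensionNames;
};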

Now here's the downside to it. If you're on a fairly quick network and you do things in a single-threaded, serial fashion, you'll have a flow that's something like this:

Client                               Server
  ---- request nth 64KB piece   ---->
  <---      nth 64KB data       -----
  ---- request (n+1)th 64KB     ---->
  <---     (n+1)th 64KB data    -----

This is HORRIBLE for efficiency's sake! The latency between the client and the server becomes a huge factor in how long a file takes to transfer. Say you want to transfer a 100MB file. With OpenSSH's sftp subsystem, you'll only get 64KB pieces at a time, so we would need to make 1,600 requests for the pieces of the file. Let's say you have a decent connection and the round trip to make a request and get that data back (assuming the request is handled instantly) is 0.5msec. This means we will spend about 1600 x 0.5msec just waiting on requests to travel back and forth, on top of the time required for actual data transmission. So we have a nasty overhead of 800msec (0.8sec). Sure, this seems like a small amount of time, but it really hurts efficiency over high-speed links (1000Mbit+).
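The standard cure, and the obvious optimization to chase here, is request pipelining: keep several reads outstanding at once so the wire never sits idle for a round trip. Here's a rough self-contained sketch of the idea; the transport functions are hypothetical stubs (a fake "server" that answers every read in full and in order), not actual PTssh or OpenSSH API:

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <queue>
#include <string>
#include <utility>

// Hypothetical transport hooks -- stand-ins for whatever packet I/O the
// real library does. Stubbed so the sketch compiles on its own.
static std::queue<std::pair<uint32_t, uint32_t>> g_pending; // (id, len)

static void sendReadRequest(uint32_t id, const std::string& /*handle*/,
                            uint64_t /*offset*/, uint32_t len) {
    g_pending.push(std::make_pair(id, len));
}

static uint32_t waitForDataReply(uint32_t* bytesReturned) {
    std::pair<uint32_t, uint32_t> reply = g_pending.front();
    g_pending.pop();
    *bytesReturned = reply.second;
    return reply.first; // id of the request this data answers
}

// Keep PIPELINE_DEPTH read requests in flight at once. Every reply we
// drain frees a slot that we immediately refill with the next chunk, so
// the pipe stays full and the per-request round trip stops dominating.
static void pipelinedDownload(const std::string& handle, uint64_t fileSize) {
    const uint32_t CHUNK = 64 * 1024; // what OpenSSH hands back anyway
    const int PIPELINE_DEPTH = 16;    // tune for bandwidth x delay

    uint64_t nextOffset = 0, received = 0;
    uint32_t nextId = 1;
    int inFlight = 0;

    while (inFlight > 0 || nextOffset < fileSize) {
        // Fill empty pipeline slots with new requests.
        while (inFlight < PIPELINE_DEPTH && nextOffset < fileSize) {
            uint32_t len = (uint32_t)std::min<uint64_t>(CHUNK, fileSize - nextOffset);
            sendReadRequest(nextId++, handle, nextOffset, len);
            nextOffset += len;
            ++inFlight;
        }
        // Consume one reply. A real client would match the returned id
        // back to its offset, since replies may arrive out of order.
        uint32_t got = 0;
        waitForDataReply(&got);
        --inFlight;
        received += got;
    }
    printf("done: %llu bytes\n", (unsigned long long)received);
}

int main() {
    pipelinedDownload("fake-handle", 576346112ULL); // same size as my test file
    return 0;
}

Sixteen outstanding 64KB reads is ~1MB in flight, which is plenty to hide 0.5msec of latency even on a gigabit link; a real client would also have to respect the SSH channel window and write out-of-order replies to the right file offsets.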

Here are some quick benchmarks that I took with some of my early SFTP code vs. the quicker SCP transfer. I set up a ramdisk on the linux box that's running OpenSSH v5.1 so that reading the file from that system wouldn't be limited or slowed down by slow hard drive reads. Reading from the ramdisk is incredibly fast ;p

Reading a 576,346,112 byte file from the linux host:
SCP average: 76MB/sec (~7.25 sec)
SFTP average: 21MB/sec (~26 sec)

Definitely room for a lot of optimization ;p