So lately I’ve been working on a multi-user database on Amazon EC2 using Ubuntu Server 9.10 (Karmic Koala) and PostgreSQL 8.3.
I wrote in the last entry about how cool it is that you can connect hard drives so easily, and I want to say something more about it. While I was reading the fine manual, I learned about partitioning the database for increased performance.
When I say that my database is fully partitioned now, I mean that I’ve spread my data out over multiple drives so that multiple read and write operations can go on at the same time, where a single drive would waste a lot of time seeking. Thanks to Amazon’s Elastic Block Store (EBS), I can create disk volumes that are only as large as I need them to be. In this case that’s 1-5GB each (when’s the last time you saw a 2GB HDD for sale?). If I were building a computer on my own, the smallest drive I could get would be 40GB, so attaching 5 of those would cost quite a bit even using small drives. But on Amazon, I only pay $0.15 per GB for every month that I have the volumes reserved. Checking my current AWS statement, the 15 days in November have cost me only ~$1.50 for the hard drive space and ~$20 for the server time. Sweet!
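For a rough sanity check on the storage charge, here is a small sketch of the arithmetic. It assumes the five volume sizes from the drive list in this post, the flat $0.15 per GB-month rate quoted above, and simple proration over a 30-day month. It only covers reserved storage, so it should not be expected to match the statement figure exactly.

```python
# Back-of-envelope estimate of the prorated EBS storage charge.
# Assumptions: sizes come from the drive list in this post, the rate is the
# one quoted above, and billing is prorated linearly by day.
VOLUME_SIZES_GB = [5, 1, 2, 3, 2]
PRICE_PER_GB_MONTH = 0.15
DAYS_USED = 15
DAYS_IN_MONTH = 30  # November


def prorated_storage_cost(sizes_gb, price_per_gb_month, days_used, days_in_month):
    """Return the storage-only cost in dollars for part of a month."""
    total_gb = sum(sizes_gb)
    return total_gb * price_per_gb_month * days_used / days_in_month


cost = prorated_storage_cost(
    VOLUME_SIZES_GB, PRICE_PER_GB_MONTH, DAYS_USED, DAYS_IN_MONTH
)
print(f"{sum(VOLUME_SIZES_GB)} GB for {DAYS_USED} days: ${cost:.2f}")
```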
Now, it may sound really great when I say it like this, but I’m still not sure how well the drives perform vs a non-cloud drive. So the next step is, I’m going to try to figure out what the standard benchmarking procedure is and do some testing.
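As a starting point before finding a proper benchmarking procedure, something like the following could give a crude number per mount point. This is only a sketch, not a standard benchmark: the function name and parameters are my own, it measures sequential write throughput only, and it says nothing about seek behavior or concurrent access.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, size_mb=64, block_kb=1024):
    """Write roughly size_mb of zeros to a temp file in `directory`,
    force it to disk with fsync, and return the throughput in MB/s."""
    block = b"\0" * (block_kb * 1024)
    blocks = max(1, (size_mb * 1024) // block_kb)
    written_mb = blocks * block_kb / 1024

    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # make sure the data actually reaches the volume
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the scratch file either way
    return written_mb / elapsed
```

Calling write_throughput_mb_s("/mnt/data") for each mount point (that path is a made-up example) and comparing the results against a local disk would give a first rough idea. Running it against several volumes at once would be closer to the question of whether they really operate in parallel.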
Here are the specs for my drive setup:
5GB mounted from /dev/sdh – used to store raw data
1GB mounted from /dev/sdg – used to store write-ahead logs for PostgreSQL
2GB mounted from /dev/sdh – used to store pre-generated queries of type A
3GB mounted from /dev/sdi – used to store pre-generated queries of type B
2GB mounted from /dev/sdj – used to store pre-generated queries of type C
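In PostgreSQL, putting data on a particular mount is normally done with tablespaces, using CREATE TABLESPACE name LOCATION 'directory'. The sketch below just generates those statements for the data volumes above. The tablespace names and mount points are hypothetical, since I have not listed the real ones here. The write-ahead log volume is left out because the WAL directory is not a tablespace; it is usually relocated by moving pg_xlog while the server is stopped and symlinking it.

```python
# Hypothetical tablespace names and mount points (not the real ones).
TABLESPACES = {
    "raw_data": "/mnt/raw",
    "queries_a": "/mnt/queries_a",
    "queries_b": "/mnt/queries_b",
    "queries_c": "/mnt/queries_c",
}


def create_tablespace_sql(name, directory):
    """Build a CREATE TABLESPACE statement for one mount point."""
    if not name.isidentifier():
        raise ValueError(f"not a simple identifier: {name!r}")
    escaped = directory.replace("'", "''")  # escape single quotes for SQL
    return f"CREATE TABLESPACE {name} LOCATION '{escaped}';"


statements = [create_tablespace_sql(n, d) for n, d in TABLESPACES.items()]
for statement in statements:
    print(statement)
```

A table can then be placed on a volume with CREATE TABLE ... TABLESPACE raw_data, again using the made-up name from the sketch.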
2 Comments
You’re implementing on one level to achieve efficiencies on a different level.
The small logical drives are likely to be on the same physical drive or disk channel, and therefore unable to operate simultaneously.
Thanks John.
This seems to be a problem with my understanding of Amazon’s EBS system. I was hoping each volume would land on a random physical drive chosen from among the thousands that make up EBS. If they are instead allocating my logical drives on one physical drive, then I might as well have left everything on one partition, which would have made the setup a whole lot easier.
I never did get around to doing the benchmarking… comparing partitioned vs non-partitioned PostgreSQL performance would have answered this question with solid evidence.