So lately I’ve been working on a multi-user database on Amazon EC2 using Ubuntu Server 9.10 (Karmic Koala) and PostgreSQL 8.3.
I wrote in the last entry about how cool it is that you can connect hard drives so easily, and I want to say something more about it. While I was reading the fine manual, I learned about partitioning the database for increased performance.
When I say that my database is fully partitioned now, I mean that I’ve spread my data out over multiple drives so that multiple read and write operations can go on at the same time, where a single drive would waste a lot of time seeking. Thanks to Amazon’s Elastic Block Store, I can create disk volumes that are only as large as I need them to be. In this case that’s 1-5GB each (when’s the last time you saw a 2GB HDD for sale?). If I were building a computer on my own, the smallest drive I could get would be 40GB, so attaching 5 of those would cost quite a bit even using small drives. But on Amazon, I only pay $0.15 per GB for every month that I have the volumes reserved. Checking my current AWS statement, the 15 days in November have only cost me ~$1.50 for the hard drive space and ~$20 for the server time. Sweet!
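For a back-of-the-envelope check on the storage charge, here is a minimal Python sketch. It assumes a 30-day month, the $0.15 per GB-month rate quoted above, and the five volume sizes from the drive setup in this post. The function name `storage_cost` is made up for illustration. It counts reserved space only, so it will not match a real statement exactly if other usage-based charges (such as I/O requests) are billed on top.

```python
# Rough EBS storage cost estimate. Counts reserved space only;
# any per-request or other usage charges are deliberately ignored.
PRICE_PER_GB_MONTH = 0.15  # USD, the rate quoted in the post


def storage_cost(volume_sizes_gb, days, days_in_month=30):
    """Prorated cost in USD of keeping the given volumes reserved for `days`."""
    total_gb = sum(volume_sizes_gb)
    return total_gb * PRICE_PER_GB_MONTH * days / days_in_month


volumes_gb = [5, 1, 2, 3, 2]  # sizes from the drive setup in this post
cost = storage_cost(volumes_gb, days=15)
print(f"{sum(volumes_gb)} GB reserved for 15 days: about ${cost:.2f}")
```

The estimate lands somewhat below the ~$1.50 on the statement, which is what you would expect if anything beyond raw reserved space was billed during those 15 days.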
Now, it may sound really great when I say it like this, but I’m still not sure how well the drives perform vs a non-cloud drive. So the next step is, I’m going to try to figure out what the standard benchmarking procedure is and do some testing.
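As a starting point before settling on a proper benchmark, a crude sequential-write timing can be sketched in Python. This is not a standard benchmarking procedure, only a quick sanity check. The function name `write_throughput_mb_s` and the default sizes are assumptions chosen for illustration. It writes a temporary file into a directory on the volume under test, forces it to disk with `fsync`, and reports MB/s.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, total_mb=16, block_kb=64):
    """Write total_mb of zeros to a temp file in `directory`, fsync, return MB/s."""
    block = b"\0" * (block_kb * 1024)
    blocks = (total_mb * 1024) // block_kb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # make sure the data actually reaches the device
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)  # clean up the scratch file
    return total_mb / elapsed


if __name__ == "__main__":
    # Replace the temp directory with a path on the volume you want to test.
    print(f"{write_throughput_mb_s(tempfile.gettempdir()):.1f} MB/s")
```

A single small sequential write says nothing about seek-heavy database workloads, so treat the number as a rough comparison between volumes rather than a real benchmark result.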
Here are the specs for my drive setup:
5GB mounted from /dev/sdf – used to store raw data
1GB mounted from /dev/sdg – used to store write-ahead logs for PostgreSQL
2GB mounted from /dev/sdh – used to store pre-generated queries of type A
3GB mounted from /dev/sdi – used to store pre-generated queries of type B
2GB mounted from /dev/sdj – used to store pre-generated queries of type C
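
In PostgreSQL, the usual way to place data on a particular drive is a tablespace. The sketch below generates the `CREATE TABLESPACE` statements for a layout like the one above. The tablespace names and mount points are hypothetical, since the post lists devices and sizes but not paths, and the helper `tablespace_sql` is made up for illustration.

```python
# Hypothetical tablespace names and directories; the post does not give mount paths.
TABLESPACES = {
    "raw_data": "/mnt/raw/pgdata",
    "queries_a": "/mnt/queries_a/pgdata",
    "queries_b": "/mnt/queries_b/pgdata",
    "queries_c": "/mnt/queries_c/pgdata",
}


def tablespace_sql(tablespaces):
    """Return one CREATE TABLESPACE statement per (name, directory) pair."""
    return [
        f"CREATE TABLESPACE {name} LOCATION '{path}';"
        for name, path in tablespaces.items()
    ]


for stmt in tablespace_sql(TABLESPACES):
    print(stmt)
```

Each directory has to exist, be empty, and be owned by the PostgreSQL system user before the statement will succeed. A table can then be placed with `CREATE TABLE ... TABLESPACE raw_data`. The write-ahead log is not covered by tablespaces; on 8.3 it is typically relocated by moving the `pg_xlog` directory to the other volume and symlinking it back while the server is stopped.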
