John Resig - SQL Issues

SQL Issues

I’ve been doing a lot of SQL work on the Schedule Maker these past couple of days, when I realized something. The SQL query for a relatively common command is going to be absolutely huge! Observe:

-- Get all my friend's current schedules: 
SELECT * FROM   friends, semesters, schedules, schedule_sections,   
sections, section_times WHERE   friends.user_id='jeresig' AND   
semesters.semester_current = '1' AND   
schedules.semester_id = semesters.semester_id AND   
schedules.schedule_default = '1' AND   
schedules.user_id=friends.friends.friend_id AND
chedule_sections.schedule_id=schedules.schedule_id AND 
sections.section_id=schedule_sections.section_id AND  
section_times.section_id=sections.section_id;

That’s six joins. Insane. Especially since I was planning on having the results of this query be the main page whenever you log in to your account. So, this leads me to the next issue: How can this be resolved? It’s not completely apparent, as of yet, but these are a few of the modifications that I’ve made so far:

I’ve switched from PostgreSQL back to MySQL. MySQL is faster and more supported. I was using PostgreSQL originally for its nice advanced features (stored procedures, schemas, etc.) when I found that I was only using schemas in the end. That’s hardly a cause to stay with the database, especially considering that I can get better performance elsewhere. So, I’m still digging around as to how to work around this, I’ll have to see.
Searches were being performmed in memory, this has been moved to MySQL Full Text searches. In memory searches are fast (obviously) however, costly. I was storing about 40mb of data in memory per school. These adds up after a while and doesn’t work to well when I want to add more data or restart the server. First off, I had no idea that full text searches existed and man are they nice. Boolean queries, query expansion – the works! It’s search engine for dummies time! I’m definitely going to have to play around with this some more.
Finally, I began to realize that so many of my data was going to be virtually static (schedules, course data, etc. all change very infrequently). This is starting to push me in the direction of having a static XML copy of all data in a nicely organized directory structure. Observe:
```
/users/ /users/jeresig/
/users/jeresig/friends/
/users/jeresig/friends/fred -> /users/fred
/users/jeresig/friends/ted -> /users/ted
/users/jeresig/schedules/
/users/jeresig/schedules/20043/mysched.xml
/users/jeresig/schedules/20043/default.xml ->
  /users/jeresig/schedules/20043/mysched.xml
/users/jeresig/schedules/current.xml ->
  /users/jeresig/schedules/20043/default.xml
```
Now observe this simple shell query:
```
cat /users/jeresig/friends/*/schedules/current.xml
```
Using only that command I can effectively acheive the same result of the massive SQL query above. I’m definitely going to explore this more as I’m still not sure if this is a viable way of going about this, but I am intrigued by the simplicity in it. (Also, since all the files are static, the caching implications are going to be quite interesting.)

I hope to report back soon with some of my final findings. I hope to talk with some of my database guru friends and see what we can cook up as being an ideal solution.

Posted: March 24th, 2005

Subscribe for email updates

6 Comments (Show Comments)

Joseph Scott (April 6, 2005 at 4:55 pm)

Bummer part about going with Full Text searches in MySQL is that only work with MyISAM tables so you don’t get any of the InnoDB features like foreign keys, non-table level locks, etc. If speed is a major concern (over most everything else) then take a look at things like SQLite and Berkeley DB.

Having said all that, I’m still a fan of PostgreSQL :-)
John Resig (April 6, 2005 at 7:00 pm)

Yeah, it is a real bummer.

It’s no so much speed as its the complexity of the relational database that’s making this difficult – doing so many joins to find something (seemingly) so trivial is quite aggrevating.

On another note, I’ve started looking into Native XML Databases, and I’m getting quite excited. It seems like it might be the combination of speed and flexibility that I desire, I’ll definitely be reporting back with more info on it as I learn more.

Carl Manaster (April 9, 2005 at 10:45 am)

I don’t know whether mysql supports it, but I’ve found the JOIN syntax to be a lot clearer and to simplify these kinds of queries a lot – they can also readily be wrapped up into a nice VIEW. Here is an example that I’m afraid will lose all its nice formatting to wrapping (all the table names line up, as do all the ONs and the equals signs).

SELECT *
FROM   friends
JOIN   schedules         ON schedules.user_id             = friends.friend_id
JOIN   semesters         ON semesters.semester_id         = schedules.semester_id
JOIN   schedule_sections ON schedule_sections.schedule_id = schedules.schedule_id
JOIN   sections          ON sections.section_id           = schedule_sections.section_id
JOIN   section_times     ON section_times.section_id      = sections.section_id
WHERE  friends.user_id            = 'jeresig'
AND    semesters.semester_current = '1'
AND    schedules.schedule_default = '1'

Here we are again, with aliases (I would hope they’re supported by mysql, but again I don’t know); uniform-width aliases can be cryptic, but make lining everything up much easier – also maybe this won’t suffer the wrapping problems I anticipate from the long form above:

SELECT *
FROM   friends           frn
JOIN   schedules         sch ON sch .user_id     = frn.friend_id
JOIN   semesters         sem ON sem .semester_id = sch.semester_id
JOIN   schedule_sections s_s ON s_s.schedule_id  = sch.schedule_id
JOIN   sections          sec ON sec.section_id   = s_s.section_id
JOIN   section_times     s_t ON s_t.section_id   = sections.section_id
WHERE  frn.user_id          = 'jeresig'
AND    sem.semester_current = '1'
AND    sch.schedule_default = '1'

Monospaced font for best results, of course.

If your objection to all the joins is performance, I’d have to say that I’ve found most such fears to be superstition. But, again, I don’t know mysql.

John Resig (April 9, 2005 at 11:38 am)

The issue is definitely performance. Normally, with a small dataset it wouldn’t be a problem, but I’m working with some, potentially, very large sets of information that will constantly be increasing. When you start looking at things in multiples of hundreds of thousands, it begins to get pretty tricky. But thanks for the pointers!
health insurance in nj (May 28, 2007 at 9:13 pm)

health insurance in nj

symmetrically!attenuates advantages
health insurance for the self employed (May 29, 2007 at 1:49 am)

health insurance for the self employed

onslaught upgrading.sabbath?misty barrier Algenib:

Comments are closed.
Comments are automatically turned off two weeks after the original post. If you have a question concerning the content of this post, please feel free to contact me.