Vex Star

Vex Star

Computers and Programming

Vex Star RSS Feed
 
 
 
 

Looking for a fast way to compare two large databases

i have two mysql databases that are supposed to be relatively synched. One is the student database for my school (about 28,000 records), and the other is a Moodle database (about 33,000) which should contain a user account for each student.

The two databases share a few fields, the primary key being the student’s idnumber. The other important fields are firstname, lastname, and email.

Im trying to write a php program to report the differences between the two databases. For example, the moodle database has some instructors and other users who are not in the Student database. Sometimes a student will change their name or email address, etc etc.

What methods would you use to do this semi-efficiently?

I’ve thought of sorting the records in arrays by their idnumber (6 digit unique integer) so that i dont have to do a nested mysql select statement like

for each idnumber in student_ids
select firstname, lastname from moodle_table where id = idnumber

because 28000 select statements takes awhile. However, then i would still be searching through two arrays that dont quite match and that would take awhile as well.

i have two mysql databases that are supposed to be relatively synched. One is the student database for my school (about 28,000 records), and the other is a Moodle database (about 33,000) which should contain a user account for each student.

The two databases share a few fields, the primary key being the student’s idnumber. The other important fields are firstname, lastname, and email.

Im trying to write a php program to report the differences between the two databases. For example, the moodle database has some instructors and other users who are not in the Student database. Sometimes a student will change their name or email address, etc etc.

What methods would you use to do this semi-efficiently?

I’ve thought of sorting the records in arrays by their idnumber (6 digit unique integer) so that i dont have to do a nested mysql select statement like

for each idnumber in student_ids
select firstname, lastname from moodle_table where id = idnumber

because 28000 select statements takes awhile. However, then i would still be searching through two arrays that dont quite match and that would take awhile as well.

use a hash table
28000 and 33000 are not large databases. The problem is you are doing individual selects. Why don’t you just grab all the data you need from each in ONE select on each database, and then sort/compare? With 60K rows you shouldn’t run out of memory doing that in PHP.

SELECT firstname, lastname, id FROM moddle_table ORDER by id;

Do that twice, and fill a hash or an array of hashes or an array of arrays: whatever. Then do your sort/compare with a big ass grep for each value and note the differences.
I kinda figured these were independent, and not on one database server?

No related posts.

Related posts brought to you by Yet Another Related Posts Plugin.

Leave a Reply