Django Model For Variable Type Of Data
Solution 1:
A table contains the rows that make some statement (parameterized by column names) true. A table should only have columns as long as you can make a single statement using them.
Put many but related columns together
If a (user,exercise,date) always has a weight and a date, have a single table.
-- user [user] in exercise [exercise] on [date] weighed [weight] kg and was [height] m tall
PRstuff(user,exercise,date,weight,height)
But if a (user,exercise,date) might have a weight but not a date, have separate tables.
-- user [user] in exercise [exercise] on [date] lifted [weight] kg
PRlift(user,exercise,date,weight)
-- User [user] in exercise [exercise] on [date] jumped [height] m
PRjump(user,exercise,date,height)
Conditional columns are complicated
It is possible to have a statement/table like your third example:
-- user [user] did [exercise] on [date]-- AND ( lifted != blank and they lifted [weight] kg OR lifted = blank and they didn't lift )-- AND ( jumped != blank and they jumped [height] m OR jumped = blank and they didn't jump )
PR(user,exercise,date,weight,height)
But as you can see your query statements, which are combinations of the table statements, and your SQL expressions, which are combinations of the tables, become complicated. Basically, in querying you have to constantly cut up this table into the separate non-conditional table versions anyway.
Don't use record-typed columns
Sometimes we could have a column whose type is made of fixed parts. But if you want to query about the parts using logical conditions like in SQL then instead you should make the parts into columns of the table.
-- user [user] in exercise [exercise] on [date] weighed [record.weight] kg and was [record.height] m tall
PRrecord(user,exercise,date,record)
SELECT * FROM PRrecord
WHERE PRrecord.record.weight = 100 -- really, record_dot(PRrecord.record,'weight')=100
Here the first dot is a database table operation but the second dot is a programming language record operation. The DBMS cannot optimize your query because it optimizes table operations, not data type operations. Basically it has to get a whole bunch of rows not looking at record values then call record operator dot then call field equality then throw away lots of rows.
SELECT*FROM PR
WHERE PR.weight =100
Now the DBMS can combine field equality into how it optimizes getting rows because you have only used the table versions of dot.
Don't use container-typed columns
Sometimes we could have a column whose type is made of a collection of similar parts. But if you want to query about the parts using logical conditons like in SQL then instead you should make a new table. The new table has a PK column for that particular collection and the old table has a column that is a FK to the new PK.
-- user [user] in exercise [exercise] on [date] and their set of lifted weights in kg is [weights]
PRlifts(user,exercise,date,weights)
SELECTuserFROM PRlifts l
WHERE l.name ='Fred'AND set_has_member(l.weights,200)
AND ??? no two lifts were the same weight ???
Bad. Notice how the statement is complicated. Also the DBMS cannot optimize queries because set_has_member is not a table operation. Worse you cannot even query for some conditions, you have to write non-query loop code.
SELECTuserFROM PRlift l
WHERE l.user ='Fred'AND l.weight =200ANDNOTEXISTS(
SELECT weight
FROM Prlift l1, PRlift l2
WHERE l1.user = l.user AND l2.user = l.user AND l1.weight = l2.weight
)
Now the DBMS can optimize and also obviate loops.
(Notice that if this were
WHERE string_length(l1.name)= string_length(l2.name)
then the DBMS could optimize further by having a name length column. But usually a DBMS has special knowledge of strings and certain other types and more or less optimizes as if certain columns existed corresponding to values for certain operators. In fact a DBMS could know about record and set types, but you still couldn't have simple statements and query looplessly.)
Post a Comment for "Django Model For Variable Type Of Data"