[Tutor] design advise

Thu Aug 27 18:28:47 CEST 2009

Alan Gauld wrote:
> <davidwilson at Safe-mail.net> wrote
>
>> The thing that bothers me is that I may have 10 users or 100,000 users
>> and really wanted to get an opinion as to which option would scale 
>> better, leaving aside the relational DB approach.
>
> If you have to cater for 100,000 users all with different views on a 
> common set of resources I don't think you can afford to leave aside
> the database approach! Almost anything else will run like a dog with a 
> broken leg...
>
> 10 users with 100,000 resources would be fine but 100,000 users
> will be a problem if you try to use the filesystem as an organising tool.
>
> The trick to using the database is to build the relationships in the 
> database but keep the resources in the filesystem. You can then query 
> the database for which resources to display then access the resources 
> directly from disk using their filenames etc.
>
> HTH,
>
>
+1.  You need a database to keep track of 100,000 users.  Scaling is
what databases do best.

That doesn't mean you necessarily need a "real database" yet.  But if
you start with a database-compatible approach, you'll be able to move
it into a real database once the number of users grows enough.
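
One way to read "database-compatible" is to keep the relationships in
tables from day one, even if the tables live in a small embedded
database.  Here is a minimal sketch using Python's sqlite3 module.  The
table names, column names and helper functions are my own assumptions
for illustration, not anything from the original question.

```python
import sqlite3


def open_catalog(path=":memory:"):
    """Open (or create) the catalog; ':memory:' is handy for experiments."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS users (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS resources (
            id       INTEGER PRIMARY KEY,
            filename TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS user_resources (
            user_id     INTEGER NOT NULL REFERENCES users(id),
            resource_id INTEGER NOT NULL REFERENCES resources(id),
            PRIMARY KEY (user_id, resource_id)
        );
    """)
    return conn


def add_user(conn, name):
    """Insert a user and return the generated integer ID."""
    cur = conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    return cur.lastrowid


def add_resource(conn, filename):
    """Record a file that lives on disk; only its name is stored here."""
    cur = conn.execute(
        "INSERT INTO resources (filename) VALUES (?)", (filename,))
    return cur.lastrowid


def grant(conn, user_id, resource_id):
    """Make a resource visible to a user (no-op if already granted)."""
    conn.execute(
        "INSERT OR IGNORE INTO user_resources VALUES (?, ?)",
        (user_id, resource_id))


def filenames_for_user(conn, user_id):
    """Return the filenames this user may see, sorted by name."""
    rows = conn.execute(
        """SELECT r.filename
             FROM resources r
             JOIN user_resources ur ON ur.resource_id = r.id
            WHERE ur.user_id = ?
            ORDER BY r.filename""",
        (user_id,))
    return [row[0] for row in rows]
```

The files themselves stay on disk, as Alan suggested; the query only
tells you which filenames to open.  If the user count outgrows SQLite,
the same schema and queries should carry over to a server database with
few changes.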

However, whether or not you use a database, you still have to design the
interactions.  If these filenames have to be unique, that would be quite
difficult to check if they were spread across separate directories:
adding a new file would require checking every directory to make sure
the selected name is unique, so every such update would be very slow.
The cure for that is either to keep the files in one central place, or
to make the names arbitrary.  In database terms, this is equivalent to
using an integer ID to identify people, since their name field might
not be unique.
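
To illustrate the "arbitrary name" option, here is a rough sketch that
stores each resource under a generated name in one central directory.
The function name store_resource and the use of a random UUID are
assumptions of mine; any scheme that produces names without scanning
other directories would do.

```python
import uuid
from pathlib import Path


def store_resource(root, data, suffix=""):
    """Write data under a generated name in root and return that name.

    The name is random, so no other directory has to be searched to
    prove it is unique.  Mode "xb" still refuses to overwrite a file
    in the unlikely event that the name already exists.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    stored_name = uuid.uuid4().hex + suffix
    with open(root / stored_name, "xb") as f:
        f.write(data)
    return stored_name
```

The name the user typed would then be kept as ordinary data alongside
the stored name, in the same way a person's name sits next to their ID.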

If you're trying to use the file system as your database, you have to 
consider three tradeoffs:

1) how to make sure things are self-consistent, according to whatever 
the business rules are.
2) how to minimize access time for more than one kind of query.
3) how to reconstruct things when something goes wrong (which it will).  
And of course you have to decide which problems are to be recoverable 
and which ones are catastrophic.
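
As a small illustration of points 1 and 3, here is a sketch of a
consistency check that compares the filenames you believe exist against
what is actually in a directory.  The function name audit and the idea
of keeping a separate list of recorded names are assumptions for the
example; your business rules may need a different check.

```python
from pathlib import Path


def audit(recorded_names, root):
    """Compare recorded filenames with the files actually in root.

    Returns (missing, orphaned): names that are recorded but not on
    disk, and files on disk that nothing records.
    """
    on_disk = {p.name for p in Path(root).iterdir() if p.is_file()}
    recorded = set(recorded_names)
    missing = recorded - on_disk
    orphaned = on_disk - recorded
    return missing, orphaned
```

Whether a missing file is recoverable or catastrophic is still your
decision; a check like this only tells you that the two sides disagree.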

DaveA