DISQUS

pigalle

1 year ago

in Indexes, Hashes & Compression on Phil Dawes' Stuff
Re: optimal storage / read efficiency, have you tried reiser4? It does a wonderful job of not wasting disk space. Running 'du -k' inside the directory reported roughly the same total as an N3 serialization of the same data, while that same ~30 MB of data took up 230 MB on ext3. And about one in every 5 triples is the text of a blog post or news story, a ~5K chunk; without those large values the difference would be even more absurd. Reads are also much faster than your numbers would suggest: nowhere near 10 ms per call. What kind of drive are you using, a 423 MB thing you found in a discarded PC on the street?
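
One plausible reason for the ext3 vs reiser4 gap, assuming the store keeps many small files (the comment doesn't say so explicitly): ext3 gives every non-empty file whole blocks, while reiser4 can pack small files and file tails together. This sketch models the whole-block case with an assumed 4096-byte block size, and also measures apparent vs allocated size on a real directory, which is the comparison behind 'du -k' (allocated) versus plain file sizes (apparent). `allocated_bytes` and `disk_usage` are hypothetical helper names.

```python
import math
import os

def allocated_bytes(sizes, block_size=4096):
    """Model a filesystem that gives every non-empty file whole blocks
    (no tail packing). block_size=4096 is an assumed ext3 default."""
    return sum(math.ceil(size / block_size) * block_size for size in sizes)

def disk_usage(root):
    """Walk `root` and return (apparent_bytes, allocated_bytes) as the OS
    reports them. st_blocks counts 512-byte units, so this is Unix-only."""
    apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))  # don't follow symlinks
            apparent += st.st_size
            allocated += st.st_blocks * 512
    return apparent, allocated

if __name__ == "__main__":
    # Hypothetical mix: four tiny values plus one ~5K text chunk.
    sizes = [100] * 4 + [5000]
    print(sum(sizes), allocated_bytes(sizes))  # prints: 5400 24576
```

With those made-up sizes, 5,400 bytes of data occupy 24,576 bytes once each file is rounded up to whole blocks; a packing filesystem would land much closer to the apparent size.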

As for 'in memory': the kernel disk cache is great for 'in memory', especially in the concurrency department. 10 mongrels can all benefit from it without a separate memcached.
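
The disk-cache point can be sketched as follows, assuming a hypothetical one-file-per-resource store: each worker reads files with plain I/O and keeps no cache of its own, and because the kernel page cache is shared by every process on the host, a file one mongrel has read is served from memory for the others too. `read_resource` and `warm_cache` are illustrative names; `os.posix_fadvise` is a real Python call but only exists on some Unix systems such as Linux, so the sketch does nothing elsewhere.

```python
import os
import tempfile

def read_resource(path):
    """Plain file read with no application-level cache. After the first read,
    the kernel page cache serves this file from memory to any process on the host."""
    with open(path, "rb") as f:
        return f.read()

def warm_cache(paths):
    """Hint the kernel to prefetch files into the page cache, where supported."""
    if not hasattr(os, "posix_fadvise"):
        return  # e.g. macOS/Windows: the first ordinary read warms the cache instead
    for path in paths:
        fd = os.open(path, os.O_RDONLY)
        try:
            # offset=0, len=0 means "from the start to the end of the file"
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
        finally:
            os.close(fd)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "post-1.txt")  # hypothetical resource file
        with open(path, "wb") as f:
            f.write(b"example content\n")
        warm_cache([path])
        # Every worker process would make this same call; no memcached in between.
        print(read_resource(path))
```

The trade-off is the one the comment hints at: the kernel decides what stays resident and keeps the cache coherent with writes to the files, so there is no separate cache server to run, but also no application control over eviction.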

As for indexing, I haven't thought about it much yet. My query engine takes about 0.1 seconds for a basic 'fetch the content, title, author, date, abstract of ___ resources sorted by ascending date' query. Hopefully that can be shaved down once I learn some stuff, and your previous post is my jumping-off point. Thanks!
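
Since indexing is still open here, this is one minimal way to serve that query shape, assuming an in-memory list of (subject, predicate, object) triples and the short predicate names `content`, `title`, `author`, `date`, `abstract` used directly as keys (hypothetical; a real RDF store would use full predicate IRIs, and this is not how YARD itself works). A subject index replaces a full scan of the triples per query, and ISO 8601 date strings can be sorted as plain strings because their lexicographic order is chronological.

```python
from collections import defaultdict

# Hypothetical field names; a real RDF store would key on full predicate IRIs.
FIELDS = ("content", "title", "author", "date", "abstract")

def build_index(triples):
    """Group (subject, predicate, object) triples by subject, so a query looks
    each resource up directly instead of scanning every triple."""
    index = defaultdict(dict)
    for subject, predicate, obj in triples:
        index[subject][predicate] = obj  # assumes one value per predicate
    return index

def fetch_sorted(index, resources, fields=FIELDS):
    """Return the requested fields for each known resource, sorted by
    ascending date. Resources with no date sort first."""
    rows = []
    for resource in resources:
        props = index.get(resource)  # .get() does not insert into a defaultdict
        if props is None:
            continue
        row = {"id": resource}
        row.update((field, props.get(field)) for field in fields)
        rows.append(row)
    rows.sort(key=lambda row: row.get("date") or "")
    return rows

if __name__ == "__main__":
    triples = [
        ("post:a", "title", "First"), ("post:a", "date", "2008-03-01"),
        ("post:b", "title", "Second"), ("post:b", "date", "2007-12-31"),
    ]
    for row in fetch_sorted(build_index(triples), ["post:a", "post:b"]):
        print(row["id"], row["date"], row["title"])
    # prints:
    # post:b 2007-12-31 Second
    # post:a 2008-03-01 First
```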

Oh ya, where's your source? Mine's http://whats-your.name/yard