Phil Dawes
9 months ago
in Searching arrays in X86 assembler with a bloom filter pt 3 on Phil Dawes' Stuff
Slava Pestov wrote:
> Hi Phil,
>
> I couldn't post a comment on your blog for some reason so I'm posting
> this to the list instead.
>
> The seq>hash word you wrote already exists in the sets vocabulary; it's
> called unique.
>
> And (prepare-filter) looks nicer if you use fry:
>
> : (prepare-filter) ( filter seq -- )
> '[ 1048576 mod _ set-bit ] each ;
>
> Slava
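For readers who don't know Factor, here is a rough Python sketch of what the quoted word appears to do: set bit `x mod 1048576` in the filter for every element of the sequence. The bytearray representation and the names `prepare_filter` and `maybe_contains` are assumptions made for illustration, not the blog's actual code.

```python
FILTER_BITS = 1048576  # 2**20, the modulus used in the Factor snippet


def prepare_filter(bits: bytearray, seq) -> None:
    """Set bit (x mod FILTER_BITS) for every integer x in seq."""
    for x in seq:
        i = x % FILTER_BITS
        bits[i >> 3] |= 1 << (i & 7)  # byte index, then bit within that byte


def maybe_contains(bits: bytearray, x: int) -> bool:
    """True if x might be in the set; False means it definitely is not."""
    i = x % FILTER_BITS
    return bool(bits[i >> 3] & (1 << (i & 7)))


bits = bytearray(FILTER_BITS // 8)
prepare_filter(bits, [3, 42, 99999])
```

With a single hash like this, any two values that agree modulo 1048576 collide, so a hit only means "possibly present" and still has to be confirmed against the real array.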
9 months ago
in Searching arrays in X86 assembler with a bloom filter pt 3 on Phil Dawes' Stuff
@Asm - thanks for the link, that's a handy resource
9 months ago
in Searching arrays in X86 assembler with a bloom filter pt 2 on Phil Dawes' Stuff
Thanks for the tip Asm. It seems pretty quick but I'll try the ANDing and SHRing approach to see how it compares
9 months ago
in Searching arrays in X86 assembler with a bloom filter pt 2 on Phil Dawes' Stuff
Hi Kieran! I'm expecting thousands occasionally. Hundreds commonly.
1 year ago
in How realistic is using OWL for semweb data integration? on Phil Dawes' Stuff
Sorry, I didn't make that very clear. I meant the RDF for rss1:description is:
<pre>
#item1 rss1:description "foobah"
</pre>
whereas the Atom RDF for content would be something like:
<pre>
#entry1 atom:content #content1
#content1 atom:type "xhtml"
#content1 atom:value "foobah"
</pre>
Does that make sense?
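A small Python sketch of the mismatch, using plain tuples as triples. The function name `atom_content_as_description` is hypothetical, and the mapping is only one possible choice: it follows the intermediate content node to its value and silently drops `atom:type`, which is exactly the kind of detail an OWL mapping would have to account for.

```python
def atom_content_as_description(triples):
    """Derive rss1:description-style triples from Atom content structures.

    Lossy by design: the atom:type of the content node is discarded.
    """
    # Map each content node to its literal value.
    values = {s: o for s, p, o in triples if p == "atom:value"}
    return [
        (s, "rss1:description", values[o])
        for s, p, o in triples
        if p == "atom:content" and o in values
    ]


atom = [
    ("#entry1", "atom:content", "#content1"),
    ("#content1", "atom:type", "xhtml"),
    ("#content1", "atom:value", "foobah"),
]
mapped = atom_content_as_description(atom)
```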
1 year ago
in How realistic is using OWL for semweb data integration? on Phil Dawes' Stuff
Hi Bob,
Thanks for the reply.
I suspect the devil is in the detail. For example, how would you map RSS 1.0's item 'description' and Atom's entry 'content'? The former is a literal property but the latter is a structured object.
1 year ago
in Digging into Factor's compiler on Phil Dawes' Stuff
Thanks Slava. I need to fiddle with the CSS but have got to go to work now. Have turned it into a blockquote in the meantime...
1 year ago
in Beginning Factor is like programming assembler on Phil Dawes' Stuff
hmmm.. I think you're right. Maybe I should elaborate in another post. In the meantime I suspect the factor mailing list is littered with examples. Here's a recent one.
1 year ago
in Beginning Factor is like programming assembler on Phil Dawes' Stuff
Hi Manu,
The Factor cookbook started me off, but the thing I found really good was Leo Brodie's book "Thinking Forth", which helped convince me that time spent learning a stack language wasn't a waste of time.
1 year ago
in W3C Semantic Web = Global Ontology after all? on Phil Dawes' Stuff
That's true, but for semantic interoperability the same URIs need to feature in the communication somewhere - whether directly or via relationships defined in owl.
1 year ago
in More factor: tabular to triples on Phil Dawes' Stuff
Thanks Christopher - it was a missing close brace in the factor link preventing it from being displayed.
I'll definitely check out both Cat and ripple - thanks for the links
1 year ago
in Coding when you're tired and unmotivated on Phil Dawes' Stuff
Hi Arto,
That's good advice. All my projects until recently have been shared and so I've had a cvs or svn repository to work with.
I assumed that this one wouldn't attract any other coders (given that it's written in gambit scheme), but hadn't thought about just using a repository anyway. Time to crank up git.
Cheers!
1 year ago
in Indexes, Hashes & Compression on Phil Dawes' Stuff
@pigalle: thanks for the comments - I'll take a look at reiser4.
The ~10ms latency is for a disk seek not a read. What sort of timings are you getting?
@Seth: Cool - I'm planning on doing the same thing (have you read the research papers for cstore?).
2 years ago
in A simple scheme unittest DSL on Phil Dawes' Stuff
Hi Nat,
Actually using continuations was the original plan but it turned out to be more tricky than I'd expected because of syntax-rules macro hygiene.
The problem is that the inner assert macro can't just use the symbol for the continuation because it gets masked, so IIRC the outer macro needs to traverse the code, rewriting the assert calls to include the continuation as an argument. That's where I came a bit unstuck and switched to the exception hack.
Hmmm... maybe I should revisit this.
2 years ago
in Some ideas for static triple indexing on Phil Dawes' Stuff
Hi Nick,
Opaque subject identifiers are even easier to index because they can be picked to be sequential in the index. I.e. subject 3 is at position 3.
Re. number of indexes: I think I'll need at least the following.
s->p->o
p->o->s
o->s->p
So 3 index hierarchies for searches. The subject-id-in-the-object-position mentioned above is a special case, and will probably require its own (relatively small) index o->sp.
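As a rough illustration of the three hierarchies, here is an in-memory Python sketch using nested dictionaries. The class `TripleIndex` and its method names are made up for this example; a static on-disk index would be laid out quite differently, and the special-case o->sp index is left out.

```python
from collections import defaultdict


def nested():
    """Two-level mapping whose leaves are sets."""
    return defaultdict(lambda: defaultdict(set))


class TripleIndex:
    def __init__(self):
        self.spo = nested()  # s -> p -> {o}
        self.pos = nested()  # p -> o -> {s}
        self.osp = nested()  # o -> s -> {p}

    def add(self, s, p, o):
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def objects(self, s, p):
        return set(self.spo.get(s, {}).get(p, ()))

    def subjects(self, p, o):
        return set(self.pos.get(p, {}).get(o, ()))

    def predicates(self, o, s):
        return set(self.osp.get(o, {}).get(s, ()))


idx = TripleIndex()
idx.add(1, "name", "Phil Dawes")
idx.add(1, "knows", 2)
idx.add(2, "name", "Steve")
```

Each query with two bound positions is answered by exactly one of the three hierarchies, which is why three rotations of (s, p, o) are enough for these lookups.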
2 years ago
in Some ideas for static triple indexing on Phil Dawes' Stuff
Hi Drew,
The latter. Subject identifiers aren't exposed to the client, so there's no way to make statements using them specifically. Instead, to join data from two subjects in different graphs you must use identity by description (i.e. the subject that has these property values...) and the person/agent doing the query must know about them.
Internally the subject IDs can be in the 'object' position to support things like containment. E.g. the XML:
<pre>
<person>
<name>Phil Dawes</name>
<email>phil@example.com</email>
<knows>
<person>
<name>Steve</name>
<email>s@example.com</email>
</person>
</knows>
</person>
</pre>
Internally indexed as:
<pre>
#1 name "Phil Dawes"
#1 tag Person
#1 knows #2
#2 name Steve
#2 email s@example.com
#2 tag Person
</pre>
but externally you can't refer to them. Does that make sense?
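A minimal Python sketch of that flattening, assuming sequential opaque IDs handed out in document order and a `tag` triple per element. The function `xml_to_triples` and the rule "a child with nested elements is a containment property" are assumptions made for the example, not a description of the actual indexer.

```python
import xml.etree.ElementTree as ET


def xml_to_triples(root):
    """Flatten nested XML into (subject, property, value) triples."""
    triples = []
    counter = 0

    def visit(elem):
        nonlocal counter
        counter += 1
        sid = f"#{counter}"  # opaque internal subject ID
        triples.append((sid, "tag", elem.tag.capitalize()))
        for child in elem:
            if len(child) == 0:
                # Leaf element: a literal-valued property.
                triples.append((sid, child.tag, (child.text or "").strip()))
            else:
                # Containment: the nested subject's ID goes in the object position.
                for inner in child:
                    triples.append((sid, child.tag, visit(inner)))
        return sid

    visit(root)
    return triples


doc = ET.fromstring(
    "<person><name>Phil Dawes</name><email>phil@example.com</email>"
    "<knows><person><name>Steve</name><email>s@example.com</email></person></knows>"
    "</person>"
)
triples = xml_to_triples(doc)
```

The `#1` and `#2` identifiers exist only inside the returned list; a caller querying the data would still have to find Steve by his name or email.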
2 years ago
in Microsoft to support OpenID on Phil Dawes' Stuff
But if I delete it, my comment won't make sense and I'll look like a tit!
I think my comment system does the same for vanilla comments, but I'm guessing the openid plugin must be incompatible somehow.
2 years ago
in Microsoft to support OpenID on Phil Dawes' Stuff
Ah - looks like you've been bitten by the 'comment not showing up because it's queued for moderation' trick.
(I had to turn on moderation - too much spam was slipping through and adverts for viagra don't look too good on the internal company blog aggregator)
2 years ago
in Dark side of the semantic web on Phil Dawes' Stuff
Hi Josh, I think you're missing something here: the only way you can ever be 100% sure that two documents are talking about exactly the same thing is through shared knowledge about the context and provenance of the documents. Nothing stops somebody from inadvertently using a URI to mean something slightly different to the original author.
In the foaf case you're relying on the client to have understood that 'http://xmlns.com/foaf/0.1/mbox' is an IFP property requiring a unique personal mailbox and not one e.g. shared with a spouse. So we're talking about sliding scales of confidence here, not absolutes.
Using a description framework à la RDF allows you to disambiguate terms through their relationship to other terms. This is a proven technique in natural language and translates well to software: applications have knowledge of the problem domain they're operating in and the combination of terms they're expecting to operate with. That combination of terms provides a trivial way to disambiguate data from disparate sources. Besides, you can always add disambiguation metadata to your descriptions:
<pre>
<> type FoafPerson
<> usesTermsFrom http://www.foaf-project.org/
<> name "Phil Dawes"
<> surname "Dawes"
<> homepage http://www.phildawes.net/
<> mbox phil@example.com
</pre>
Also you tend to find that where there are global areas of ambiguity humans tend to invent names and schemes which have a low chance of collision. Email addresses, vehicle number plates, URLs and names like 'FOAF' are examples of these. These are already grounded in real life, widely shared, and are ripe for use in data exchange.
2 years ago
in Scheme is love on Phil Dawes' Stuff
Yep, I've done the Gambit... no Chicken!.. no Gambit!... etc.. shuffle.
I've settled on gambit for the moment because of termite. I'm hoping that the new library system in the upcoming r6rs standard will lead to a more portable set of libraries and make the choice less all-or-nothing.
Re learning a new language: I think you're right to be nervous. The biggest problem is the Red Pill-ness of it all. Once you've hit upon a feature you like, especially one which gives a big productivity boost, going back to your old language is pretty depressing. I remember in the late nineties witnessing a bunch of jaded smalltalkers hit the java market - they had this constant 'things will never be the same' look about them.
2 years ago
in Amazon get into the virtual computing space on Phil Dawes' Stuff
Sort of - however the Sun grid offering is targeted and priced for traditional enterprise clients and established industries who want to bolster their number-crunching horsepower (see faq). I think this is a bit before its time - enterprise still hasn't got used to the idea of taking data out of its own intranet.
I think the interesting thing about amazon's offering is that it is clearly priced and targeted at the low end of the market. This is the sector I think will push demand for service grid technologies in the same way it has for cheap web hosting: individuals and startups wanting to start small but then needing to scale up close to the demand curve (rather than risking VC upfront for their server facilities).
2 years ago
in Java cage rattling on Phil Dawes' Stuff
> pure “Java programmers” are only a step up from VB weenies ;-) and many have no concept of functional languages except that “they’re a bit hard to write in”
Blimey - if that's not a troll I don't know what is!