Disqus - Latest Comments for dams

Re: Creating a supervision tree for Elixir GenEvent behavior

dams — Sat, 14 May 2016 17:15:25 -0000

Question: why use two layers of supervisors when one would be good enough? Is it because you assume your application will do other stuff, and by default you want to isolate the GenEvent and friends?

Thanks for the nice post

Re: Using Riak as Events Storage - Part 1

dams — Sun, 06 Mar 2016 18:23:25 -0000

About the concern over the number of MR jobs: I've demonstrated this in part 3 of this series, available here: http://blog.booking.com/usi... but in a nutshell: yes you're right :) This is why we have implemented a different approach, in part 4 (to be released soon).

About Hadoop: we are already using Hadoop for batch processing on data older than 6 hours basically, using Hive. However it's nowhere near the performance and freshness of what we get with Riak, which is expected. Hadoop is for long term storage and slow/big queries, Riak for medium (10 days) term storage with extremely high throughput and very low latency while being very robust, and for simple queries.

About Spark and Kafka: we are using Spark to do analysis. Events are transported there using Kafka, but the source is Riak. At some point, if we reimplement the frontend layer with Kafka, we'll be able to bypass Riak and stream to Spark directly, but again the gain in terms of lag reduction won't be that big and doesn't bring any business value for now. We thought about using Spark on top of Riak (there is a connector) but we want isolation of contents, so we currently use Spark on top of Hadoop, though we might have it running on top of Cassandra, for instance, at some point. In any case we don't use Hadoop as the source.

About disk-based architecture: Riak can use various backends (at the same time), among them bitcask and leveldb, but also a memory backend. So you can have fresh data in the memory backend with a small TTL and also in a disk backend for longer storage. But even without doing that, when using the bitcask backend, fresh data will be in your in-memory FS cache, which is quite fast. I don't think the disk-based backend is a speed bottleneck for Riak's MR.

The problem with having only Spark on top of Hadoop is that if you need to do some ad-hoc processing, you're forced to use Spark, and we need more flexibility/hackability. Also, it's great for working on realtime data (Spark) and very old data (Hadoop), but it's not good at working with semi-fresh data. For instance it's going to be an issue to reprocess data from 2 hours (or 3 days) ago in a streaming way with high performance. Or at least to my knowledge :)

Riak gives us this robust, fault-tolerant realtime + medium-term storage with very high performance (throughput + latency) for any kind of extraction and processing job (not only realtime preconfigured analysis).

I hope that makes sense.

Re: Using Riak as Events Storage - Part 1

dams — Wed, 02 Mar 2016 16:37:34 -0000

Indeed, I think we are on the same track here. It needs to be said that we tried Kafka as a candidate for our centralized storage, and indeed it didn't do very well. However, as you said, it's a good candidate as a replacement for our frontend aggregation layer, allowing us to stream events to on-the-fly data processing before pouring data into Riak. In theory it brings some benefits and ease of use.

However our current frontend layer is working fine for now and scales properly. And the lag between the time an event is produced and it's available in the central storage (Riak) is very small. At least small enough for our usage. We are very pragmatic, so the frontend layer will be replaced when it needs to be, not before :)

Re: Exception::Stringy - Modern exceptions for legacy code

dams — Mon, 28 Dec 2015 16:19:31 -0000

OMG It looks like you didn't read the blog post properly :)

I'm not enforcing the use of Try::Tiny, as I give the alternative of using eval. I personally stopped using Try::Tiny 3 years ago now.

Your example actually illustrates my message : you can't assume that your exception will be an object, you have to take care of "other exception objects and strings", so it's a mess.

For instance, have a look at this code, which throws a nice Exception::Class object, with fields. It ends up as a flat string, because a stupid sig handler added a timestamp, and you lose the field value:

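use strict;
use warnings;
use feature 'say';
use Data::Dumper;

# A global __DIE__ handler that prepends a timestamp; this stringifies the exception.
$SIG{__DIE__} = sub { die time() . " " . $_[0] };

use Exception::Class (
    'MyException' => { fields => [ 'field1' ] },
);

eval { MyException->throw(field1 => 4, message => "plop") };
if (my $e = MyException->caught) {
    # Not reached: $@ is no longer a MyException object.
    say $e->field1;
} elsif ($e = Exception::Class->caught) {
    print Dumper($e);
    say "it's a string, I can't get the value of field1";
}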

There are a few other ways to get this kind of result. Exception::Stringy is a pragmatic way of making sure your exceptions don't degrade.

Re: Using Riak as Events Storage - Part 1

dams — Tue, 08 Dec 2015 02:26:48 -0000

In reality we have more than the 3 example event streams. But yes, we store the raw data in Riak. We then make use of the raw data to perform a lot of different tasks and exports. One of them is to store a subset of the raw data (mainly error messages) in Elasticsearch. Elasticsearch wouldn't be able to store all the data we have in Riak and provide the same level of fault tolerance and versatility. Another solution would be to use Riak Search (it's similar to Elasticsearch, using distributed Solr), but because we wanted isolation, we needed a separate cluster, and we already had Elasticsearch experience and infrastructure.

Re: Using Riak as Events Storage - Part 2

dams — Tue, 01 Dec 2015 16:00:23 -0000

1) We could have used a different bucket layout. Using events:DC1 epochs:DC2 would totally work, except that each time we add a new DC, a bit more work needs to be done than when having "epochs" as the bucket. It's a balance between doing the book-keeping inside generic buckets, or having more specific buckets but additional administrative overhead. At the end of the day it's not a big difference.

2) Good question :) I used statistics about historical data and I assume a certain compression ratio to go under 500K. Currently we have a 50% ratio, so I chunk before compression at around 1M and make sure the result is < 500K. If not, I iterate and chunk a bit more. Not very interesting wrt this article, just a bit of plumbing.
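The chunk-then-check loop described in 2) can be sketched roughly as follows. This is only an illustration under assumptions: zlib stands in for whatever compressor is actually used, and the function name, the halving step and the constants are hypothetical, chosen to match the "around 1M before, < 500K after" figures above.

```python
import zlib

MAX_COMPRESSED = 500 * 1024   # assumed reading of "< 500K"
INITIAL_CHUNK = 1024 * 1024   # assumed reading of "around 1M" before compression


def chunk_and_compress(data, chunk_size=INITIAL_CHUNK, limit=MAX_COMPRESSED):
    """Split data into compressed blobs, each under `limit` bytes when achievable."""
    blobs = []
    pos = 0
    while pos < len(data):
        size = chunk_size
        while True:
            blob = zlib.compress(data[pos:pos + size])
            # Accept the blob if it fits, or if the chunk cannot shrink further.
            if len(blob) < limit or size == 1:
                break
            size = max(1, size // 2)  # too big: chunk a bit more and retry
        blobs.append(blob)
        pos += size
    return blobs
```

With well-compressible data the first attempt usually fits; the inner loop only matters when the observed ratio is worse than the historical one.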

Re: Using Riak as Events Storage - Part 1

dams — Tue, 20 Oct 2015 17:27:43 -0000

In addition to Ivan's comment, after testing it we concluded that it was not ideal in a position where a lot of clients would fetch data from it. Kafka seemed more appropriate as a way to transfer data to a small number of endpoints. So in theory Kafka would be a good replacement for what we have called the aggregators layer, pouring data into a couple of storage points (Riak). However in this case Kafka would work with individual events, and it's not clear if it could aggregate events into blobs at the same time. But definitely something to investigate at some point.

Re: Perl Benchmark Serializer: JSON vs Sereal vs Data::MessagePack vs CBOR

dams — Fri, 20 Mar 2015 20:24:23 -0000

For information, the Erlang implementation is now done, encoding and decoding. It's very fast, the core is in C and then bound to Erlang. It's in master on github: https://github.com/Sereal/S...

Re: Keeping Perl Classy

dams — Tue, 13 Jan 2015 18:21:54 -0000

my eyes are indeed bleeding now

Re: DateTime duration in seconds

dams — Tue, 20 May 2014 17:53:55 -0000

As its name suggests, delta_ms returns a duration expressed in minutes and seconds. Which is useless if you want a duration in seconds, as you can't convert minutes (that can be 60, 61 or 62 secs long) into seconds. Read the doc again...

Re: Perl Benchmark Serializer: JSON vs Sereal vs Data::MessagePack vs CBOR

dams — Mon, 27 Jan 2014 10:24:33 -0000

At some point it was on my todo list, but meh...

Re: Perl Benchmark Serializer: JSON vs Sereal vs Data::MessagePack vs CBOR

dams — Mon, 27 Jan 2014 08:32:40 -0000

I think that Steffen meant: "use your real data in the benchmark you posted", not "switch library in your real life system" :) Otherwise you are comparing tomatoes with potatoes.

Re: Perl Benchmark Serializer: JSON vs Sereal vs Data::MessagePack vs CBOR

dams — Mon, 27 Jan 2014 08:30:21 -0000

I used the Lua implementation of Sereal when working at the same company as Celogeek, tackling the same issue :) The Lua implementation is robust enough that it works fine with any data structure, but it doesn't work very well with Perl-specific things. Which is good enough for most usage imho. I used it in a specific case where I needed to add elements to an existing serialized ArrayRef in Redis. It worked, but it was faster to use Redis to concatenate a new Sereal object containing the additional elements, thus not using Lua. Then on retrieval, I deserialized from an offset and merged the results into one ArrayRef. That solution was basically beating any others I could come up with.
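The concatenate-then-decode-from-offset idea can be illustrated with a small sketch. It is an analogy only: JSON and Python's `json.JSONDecoder.raw_decode` stand in for Sereal and its offset-based decoding, a plain string stands in for the Redis value, and both function names are hypothetical.

```python
import json


def append_elements(stored, new_elements):
    """Append by concatenating a second serialized array; nothing is decoded."""
    return stored + json.dumps(new_elements)


def read_all(stored):
    """Decode each concatenated document from its offset and merge the results."""
    decoder = json.JSONDecoder()
    merged, offset = [], 0
    while offset < len(stored):
        # raw_decode returns the decoded object and the index where it ended.
        chunk, offset = decoder.raw_decode(stored, offset)
        merged.extend(chunk)
    return merged
```

The write path never has to parse the existing value, which is why it can be cheaper than a decode-modify-encode round trip; the cost moves to the reader, which merges the pieces.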

Re: protect a screen session with a password

dams — Fri, 17 Jan 2014 04:12:08 -0000

Well, read the disclaimer in the post. In any case even with the encrypted string, all root can do is brute-force it, to find *a* password that matches, not necessarily *your* password.

Re: Perl Benchmark Cache with Expires and Max Size

dams — Thu, 02 Jan 2014 12:33:42 -0000

code is available https://metacpan.org/source...

Re: Perl Benchmark Cache with Expires and Max Size

dams — Thu, 02 Jan 2014 05:45:01 -0000

or https://metacpan.org/pod/CH...

Re: p5-mop

dams — Tue, 17 Sep 2013 15:07:59 -0000

Hm, there was a typo in the article. I meant to say, "most Perl developers (I think) implement *function exporting* by inheriting from Exporter". So they'll do "use base qw(Exporter);" instead of doing "use Exporter qw(import);". Which won't work.

Re: p5-mop

dams — Tue, 17 Sep 2013 08:08:54 -0000

Yes, good point, but I wanted to give instructions that would work with older versions. cpanm is really awesome. New features from the last development version are really great.

Re: p5-mop

dams — Tue, 17 Sep 2013 06:12:19 -0000

This post is also on blogs.perl.org, with more comments. See http://blogs.perl.org/users...

Re: p5-mop: a gentle introduction

dams — Mon, 16 Sep 2013 20:03:07 -0000

So, for those of you who are wondering why this page is almost empty: I'm having a hell of a lot of trouble getting GitHub Pages to generate it properly. It seems to be currently stuck in an infinite loop of bad rendering mania. I double-checked everything, following the docs and the "what if everything fails" help page, but no luck. Currently waiting for GitHub support to get back to me. I'll update this page asap, sorry for the inconvenience.

Re: Perl Moderne, a new Perl book

dams — Sat, 15 Jun 2013 17:22:51 -0000

The web site has been moved, we'll rebuild it somewhere else soon. Thanks for letting me know

Re: Stop Talking About Perl

dams — Sat, 15 Jun 2013 15:14:54 -0000

"Actions beat words every single time". Except in poetry !

Re: New And Improved: Bloomd::Client

dams — Sat, 15 Jun 2013 15:13:19 -0000

Yeah, I tried these two modules. Bloom::Faster worked well for me (despite what the negative review says). However, bloomd implements a bloomd *server*. So to be able to compare, you should build up a server running Bloom::Faster.

Re: Hacking Thy Fearful Symmetry - My Pro-Tips for YAPC First-Comers

dams — Thu, 30 May 2013 07:32:39 -0000

You forgot the most important one: find BooK or one of the French Cabal members, and make sure you know when and where the "evil green stuff" party takes place. And be there.

Re: MooX::LvalueAttribute - Lvalue accessors in Moo

dams — Tue, 12 Feb 2013 02:53:25 -0000

Thanks ! fixed