Yes.
Some hints? :)
If I understood correctly how thrust works, the multiplications should be non-blocking (asynchronous); if your CUDA device has enough resources (available threads), they will run in parallel. You can also ask on the Thrust forums (or mailing lists) to be sure.
Considering an implementation using the thrust::complex type, how would you call cublasCgemm, which is the cuComplex version of cublasSgemm? I mean, how would you cast between thrust::complex and cuComplex?
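A common answer (sketched here as an assumption, not code from the tutorial): thrust::complex&lt;float&gt; and cuComplex are both plain structs holding two consecutive floats (real, imag), so in practice people reinterpret_cast the raw device pointer. The function name cgemm_example and the square-matrix dimensions are mine for illustration:

```cpp
#include <cublas_v2.h>
#include <cuComplex.h>
#include <thrust/complex.h>
#include <thrust/device_vector.h>

// Multiply two n x n complex matrices held in thrust device vectors with
// cublasCgemm, by reinterpreting thrust::complex<float>* as cuComplex*.
void cgemm_example(cublasHandle_t handle, int n,
                   thrust::device_vector<thrust::complex<float>> &d_A,
                   thrust::device_vector<thrust::complex<float>> &d_B,
                   thrust::device_vector<thrust::complex<float>> &d_C) {
    cuComplex alpha = make_cuComplex(1.0f, 0.0f);
    cuComplex beta  = make_cuComplex(0.0f, 0.0f);
    cublasCgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n, &alpha,
        reinterpret_cast<const cuComplex*>(thrust::raw_pointer_cast(d_A.data())), n,
        reinterpret_cast<const cuComplex*>(thrust::raw_pointer_cast(d_B.data())), n,
        &beta,
        reinterpret_cast<cuComplex*>(thrust::raw_pointer_cast(d_C.data())), n);
}
```

This relies on the two types having identical size and layout, which is the usual assumption when mixing Thrust with cuBLAS.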
Awesome tutorial!
Very nice example! However, your print_matrix code does not work with the thrust additions. Doesn't the data need to be moved back to the host by copying to a host_vector before it can be printed out?
When you work with thrust vectors you can access the vector elements directly (no matter whether the vector is on the CPU or on the GPU).
I got bus errors until I added code that looked as follows:
thrust::host_vector<float> h_A = d_A;
print_matrix(thrust::raw_pointer_cast(&h_A[0]), nr_rows_A, nr_cols_A);
Thanks for the follow-up;
maybe your answer will help other people with the same problem.
On my machine I was able to print directly elements from d_A.
This is very useful. Thanks!
One minor problem: shouldn't the seed initialization move out of the GPU_fill_rand() function?
You can move it outside if you want; as implemented now, it will reinitialize the seed on every call, twice in this case.
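For readers who want to do that, here is one way to hoist the generator (and its seed) out of GPU_fill_rand, so the seed is set once per program instead of once per call. This is a sketch following the tutorial's function names; the caller now owns the curandGenerator_t:

```cpp
#include <curand.h>

// The generator is created and seeded once by the caller; this function only
// fills the device array A with nr_rows_A * nr_cols_A uniform random floats.
void GPU_fill_rand(curandGenerator_t prng, float *A, int nr_rows_A, int nr_cols_A) {
    curandGenerateUniform(prng, A, nr_rows_A * nr_cols_A);
}

// Usage sketch:
// curandGenerator_t prng;
// curandCreateGenerator(&prng, CURAND_RNG_PSEUDO_DEFAULT);
// curandSetPseudoRandomGeneratorSeed(prng, (unsigned long long) clock());
// GPU_fill_rand(prng, d_A, nr_rows_A, nr_cols_A);
// GPU_fill_rand(prng, d_B, nr_rows_B, nr_cols_B);
// curandDestroyGenerator(prng);
```

Seeding once also means the two matrices are guaranteed to come from different parts of the random stream, rather than depending on the clock value at each call.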
I was looking all over the web for an easy to understand matrix multiplication example using CUBLAS. It was almost hopeless, but you got it. Thanks!
Thanks a lot for these tutorials, they are very helpful !
Did you mean to say "CPU" for the first comment?
Yes, definitely CPU :). Thanks.
Very nice tutorial! I've one question: Can one use Thrust to perform lots of matrix-vector multiplications in parallel?
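Thrust itself has no batched matrix-vector primitive, but one common approach (sketched here as an assumption, not part of the tutorial) is cuBLAS's cublasSgemmBatched, treating each vector as an n-by-1 matrix. The function name batched_matvec is mine; note the pointer arrays themselves must live in device memory:

```cpp
#include <cublas_v2.h>
#include <thrust/device_vector.h>

// d_As[i], d_xs[i], d_ys[i] are device pointers to the i-th n x n matrix,
// input vector, and output vector; the pointer arrays are device-resident.
void batched_matvec(cublasHandle_t handle, int n,
                    thrust::device_vector<const float*> &d_As,
                    thrust::device_vector<const float*> &d_xs,
                    thrust::device_vector<float*>       &d_ys,
                    int batch_count) {
    const float alpha = 1.0f, beta = 0.0f;
    // y_i (n x 1) = A_i (n x n) * x_i (n x 1), for all i at once
    cublasSgemmBatched(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                       n, 1, n,
                       &alpha,
                       thrust::raw_pointer_cast(d_As.data()), n,
                       thrust::raw_pointer_cast(d_xs.data()), n,
                       &beta,
                       thrust::raw_pointer_cast(d_ys.data()), n,
                       batch_count);
}
```

Whether this beats launching many independent gemv calls depends on the matrix sizes and batch count; for many small matrices the batched call usually wins.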