Here's another challenge. Your testers are going to be heavily influenced by their expectations. For example, I can suspend my disbelief for a cartoon as long as I wasn't expecting live action. That makes setting up the test tricky: you have to make sure you don't bias the participants right off the bat.
I think consideration of this issue will help address your questions as well.