There are probably certain elements of human speech that can be keyed in on (cadence, dynamic modulation, an individual’s tone) that differ from ambient noise and music. This might help them isolate the spoken parts of the videos, which in turn might allow them to convert the speech to text and index it. They could even try to catalog all known / published video media and create some Shazam-style identifier to further ID the content (is this “Dude, Where’s My Car?” or an “amateur” video?).

I bet you’re right.
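For what it's worth, the "dynamic modulation" idea can be sketched very crudely. This is only a toy heuristic, not how any real service is known to do it. It assumes that speech-like audio has loudness that rises and falls several times a second, while a sustained tone stays steady. The function names, the 16 kHz rate, the 25 ms frame and the 0.5 threshold are all made up for illustration.

```python
import math
import random


def frame_rms(samples, frame_len):
    """RMS energy of each non-overlapping frame of frame_len samples."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + frame_len]) / frame_len)
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def energy_modulation(samples, frame_len=400):
    """Coefficient of variation (std / mean) of the frame energies."""
    rms = frame_rms(samples, frame_len)
    mean = sum(rms) / len(rms)
    if mean == 0:
        return 0.0
    variance = sum((r - mean) ** 2 for r in rms) / len(rms)
    return math.sqrt(variance) / mean


def looks_like_speech(samples, frame_len=400, threshold=0.5):
    """Hypothetical rule: strongly fluctuating loudness counts as speech-like."""
    return energy_modulation(samples, frame_len) > threshold


RATE = 16000  # assumed sample rate in Hz
rng = random.Random(0)

# One second of a steady 440 Hz tone: loudness barely changes frame to frame.
steady = [math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE)]

# One second of noise switched loud/quiet every 125 ms, a rough stand-in
# for syllables separated by short pauses.
bursty = [
    rng.uniform(-1, 1) * (1.0 if (n * 8 // RATE) % 2 == 0 else 0.05)
    for n in range(RATE)
]

print(looks_like_speech(steady), looks_like_speech(bursty))
```

Real music also has beats and dynamics, so a single loudness statistic like this would misfire a lot. An actual system would presumably combine many features or use a trained model.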