Going by the WebVTT spec, you can do all sorts of things with CSS. It'd be cool to see something like this used to distinguish voices, using an example from the page:
video::cue(v[voice="Kathryn"]) {
  color: lime;
}
in combination with
00:00:16.000 --> 00:00:24.000
<v Kathryn>I like lime.
would...