I think it's a good enough start. I gather the goal is to measure and judge with as little human opinion as possible. I do similar evaluations at work, though I feel it is impossible to rule out errors completely.