LLM-as-judge uses a model to grade another model's outputs against a rubric.
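It scales evaluation of open-ended responses that exact-match scoring cannot handle, like summaries or code reviews. You give the judge clear criteria and, ideally, calibration examples: responses that have already been scored, showing how the rubric applies. The judge is itself a model and can be wrong, so its scores must be validated against human judgment, for example by checking how often it agrees with human ratings on a labeled sample.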
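As a rough illustration of the pieces described above, here is a minimal Python sketch: a prompt that combines a rubric with calibration examples, a parser for the judge's reply, and a chance-corrected agreement measure (Cohen's kappa) for comparing judge scores with human scores. The rubric text, the calibration examples, and the names `build_judge_prompt`, `parse_score`, and `cohen_kappa` are all assumptions made for this example. The call to the judge model is deliberately left out, since it depends on whichever provider you use.

```python
import re
from collections import Counter
from typing import Optional, Sequence

# Hypothetical rubric and calibration examples, for illustration only.
RUBRIC = (
    "Score the response from 1 to 5 for faithfulness to the source.\n"
    "5 = every claim is supported; 1 = most claims are unsupported."
)
CALIBRATION = [
    ("The report says revenue rose 10%.", 5),
    ("The report says the company went bankrupt.", 1),
]


def build_judge_prompt(source: str, response: str) -> str:
    """Assemble rubric, calibration examples, and the item to grade."""
    examples = "\n".join(
        f"Response: {text}\nScore: {score}" for text, score in CALIBRATION
    )
    return (
        f"{RUBRIC}\n\nCalibration examples:\n{examples}\n\n"
        f"Source: {source}\nResponse: {response}\nScore:"
    )


def parse_score(judge_output: str) -> Optional[int]:
    """Return the first standalone digit 1-5 in the reply, or None."""
    match = re.search(r"\b([1-5])\b", judge_output)
    return int(match.group(1)) if match else None


def cohen_kappa(judge: Sequence[int], human: Sequence[int]) -> float:
    """Agreement between two raters, corrected for chance agreement."""
    if len(judge) != len(human) or len(judge) == 0:
        raise ValueError("score lists must be non-empty and equal in length")
    n = len(judge)
    observed = sum(j == h for j, h in zip(judge, human)) / n
    judge_counts, human_counts = Counter(judge), Counter(human)
    # Probability that both raters pick the same label by chance.
    expected = sum(
        judge_counts[label] * human_counts[label] for label in judge_counts
    ) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is
        # undefined here, and this sketch treats it as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

In use, you would send `build_judge_prompt(source, response)` to the judge model, pass its reply through `parse_score`, and then compare the resulting scores with human scores for the same items using `cohen_kappa`. A low kappa suggests the rubric or calibration examples need revision before the judge's scores can be trusted.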