Knowledge distillation is training a smaller model (the student) to mimic the outputs of a larger model (the teacher). You don’t need the same training set that was used to train the larger model (the whole internet, or whatever was used for ChatGPT); you can use a separate transfer set instead.
Here’s a reference: Hinton, Geoffrey. “Distilling the Knowledge in a Neural Network.” arXiv preprint arXiv:1503.02531 (2015). https://arxiv.org/pdf/1503.02531
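Here’s a minimal sketch of what that looks like in PyTorch, roughly following the loss from that paper (temperature-softened soft targets from the teacher plus the usual hard-label cross-entropy). The layer sizes, temperature, and loss weighting below are illustrative choices, not anything prescribed by the paper.

```python
# Minimal knowledge-distillation sketch (PyTorch). Model sizes, temperature,
# and alpha are placeholder values chosen for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Weighted sum of a soft-target term (student vs. teacher at temperature T)
    and ordinary hard-label cross-entropy."""
    # Soft targets: KL divergence between temperature-softened distributions.
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy teacher (large) and student (small) on 784-dim inputs, 10 classes.
teacher = nn.Sequential(nn.Linear(784, 1024), nn.ReLU(), nn.Linear(1024, 10))
student = nn.Sequential(nn.Linear(784, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

# One step on a random "transfer set" batch; real code would loop over a DataLoader
# of whatever transfer data you have, not the teacher's original training set.
x = torch.randn(32, 784)
y = torch.randint(0, 10, (32,))
with torch.no_grad():
    teacher_logits = teacher(x)  # teacher is frozen; it only provides targets
loss = distillation_loss(student(x), teacher_logits, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

If your transfer set is unlabeled, you can drop the hard-label term (set alpha to 1) and train on the teacher's soft targets alone.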