When generating text using the GPT-2 Large model, we found that both the method of generation, and text prompt used, have a statistically significant effect on on the output produced. In four out of six trials we found that the Nucleus Sampling method proposed by Holtzman, et all1Holtzman, Buys, Du, Forbes, Choi. (2020). The Curious Case of Natural Text Degeneration. ICLR 2020. Retrieved February 1, 2020, from https://arxiv.org/pdf/1904.09751.pdf, (aka Top-P) produced output that was significantly more humanlike than other methods. We also found that some troublesome prompts, such as the first sentence of the Bible, consistently produce outputs that seem relatively unaffected by the choice of generation method.
A full crash course in neural text generation. We review various methods of generating text with GPT-2 (the “little brother” of GPT-3), including Beam Search, Top-K and Top-P sampling. We also review some key metrics of generated text, including perplexity and repetition, and try to get a more intuitive sense of these measures.