name generators

Gengen is only the latest development in a rich history of experimentation with random name generation, stretching back to the spring of 2015.

Early experimentation focused on Latin-style names. The first attempt, simply named NameGenerator.java, built each name by selecting an abstract pattern of vowels, consonants, and endings. Each slot in the pattern was then filled in using statistical data about how often each phonetic feature occurs in a long list of attested (masculine) Roman names. The results were hardly authentic, but pleasingly colorful for a first attempt; I used them as city names for one of my map generators.
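As a rough illustration of the pattern-filling idea, here is a minimal Python sketch. It is not the original Java code: the pattern letters (C, V, E for consonant, vowel, ending), the inventories, and every weight below are invented placeholders, whereas NameGenerator.java drew its frequencies from a list of attested Roman names.

```python
import random

# Hypothetical data: relative weights for whole patterns and for the
# fillers of each slot type. None of these numbers come from real names.
PATTERNS = {"CVCE": 3, "CVCVCE": 2, "VCVCE": 1}
FILLERS = {
    "C": {"m": 4, "l": 3, "t": 3, "c": 2, "r": 2, "v": 1},
    "V": {"a": 4, "i": 3, "u": 2, "e": 2, "o": 1},
    "E": {"us": 5, "ius": 3, "o": 1},
}

def weighted_choice(table, rng):
    """Pick one key of `table` with probability proportional to its weight."""
    items = list(table)
    return rng.choices(items, weights=[table[i] for i in items], k=1)[0]

def make_name(rng):
    """Choose an abstract pattern, then fill each slot independently."""
    pattern = weighted_choice(PATTERNS, rng)
    filled = (weighted_choice(FILLERS[slot], rng) for slot in pattern)
    return "".join(filled).capitalize()

if __name__ == "__main__":
    rng = random.Random()
    print([make_name(rng) for _ in range(5)])
```

Because each slot is filled independently of its neighbors, a generator like this can combine plausible sounds in sequences that no real name would contain.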

My second name generator was a Perl program called MakeNames.pl that used a Markov chain trained on a list of input names to produce new ones. In this way I hoped to generate not only Latin-style names, but also Greek or Mandarin ones, or names in any other language for which I could compile a suitable training set. NameGenerator.java’s outputs were true to Latin phonotactics, but the program combined sounds in unusual ways, resulting in names that would have felt exotic or fanciful to Roman ears. MakeNames.pl, by contrast, was much likelier to reproduce real Roman names, and its novel outputs were much more likely to pass for authentic. Although the program has sadly been lost, its ideas survive in its successor projects.

MakeNames.pl was built around a three-dimensional hash that stored the probability of each letter appearing given the previous two letters. Names3D.java replicated this approach, but instead of accepting a list of names as input, it attempted to populate the hash structure ex nihilo with random data. I hoped that in this way every instance of Names3D could represent a new language, with its own unique phonology and flavor. However, this proved to be a terribly naive approach, and the results were haphazard and unpronounceable. If I wanted to produce not only names, but naming languages that were distinctive and coherent, I would need a more thorough, thoughtful approach.
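Since MakeNames.pl is lost, the following is only a reconstruction of the general technique, written in Python rather than Perl: a second-order Markov chain stored as a three-level nested dictionary, where `counts[a][b][c]` records how often letter `c` followed the pair `a`, `b`. The padding symbols `^` and `$`, the function names, and the `max_len` cutoff are assumptions of this sketch, and it stores raw counts rather than normalized probabilities.

```python
import random
from collections import defaultdict

START, END = "^", "$"  # padding symbols (an assumption of this sketch)

def train(names):
    """Build counts[a][b][c]: how often c followed the letter pair (a, b)."""
    counts = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for name in names:
        padded = START * 2 + name.lower() + END
        for a, b, c in zip(padded, padded[1:], padded[2:]):
            counts[a][b][c] += 1
    return counts

def generate(counts, rng, max_len=12):
    """Walk the chain from the start state until END or max_len letters."""
    a, b, out = START, START, []
    while len(out) < max_len:
        options = counts[a][b]
        letters = list(options)
        c = rng.choices(letters, weights=[options[l] for l in letters], k=1)[0]
        if c == END:
            break
        out.append(c)
        a, b = b, c
    return "".join(out).capitalize()

if __name__ == "__main__":
    sample = ["Marcus", "Lucius", "Gaius", "Julius", "Titus", "Quintus"]
    model = train(sample)
    rng = random.Random()
    print([generate(model, rng) for _ in range(5)])
```

With a training list this small, most outputs will simply reproduce or splice together the input names. Filling the same table with random numbers instead of observed counts, as Names3D.java did, removes the regularities that make the output pronounceable.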

Finally, before Gengen reached its current form, I made another phonology-based naming language generator that I now call Gengen 0. Like its predecessors, it was based on Markov chains, but it now used phonological data and principles to populate them. There were several chains, representing syllable onsets, nuclei, codas, and the transitions between them. While it was easily my most sophisticated model to date, the results were underwhelming: prone to unlikely and unsightly consonant clusters and illogical phonemic inventories, and devoid of any sense of prosody. Worse still, its approach was so muddled and haphazard that these problems couldn’t be fixed without aggressively restructuring the generator from back to front. Hence the impetus for Gengen’s present incarnation.
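The internals of Gengen 0 are not spelled out here, so the sketch below is only a loose Python illustration of what a syllable-structured generator can look like, not a reconstruction. It simplifies heavily: each syllable part is drawn from a fixed weighted table rather than its own chain, and the transitions between syllables are reduced to a single hypothetical `CONTINUE_PROB`. All inventories and weights are invented.

```python
import random

# Invented inventories; an empty string means the slot is left unfilled.
ONSETS = {"": 1, "k": 3, "t": 3, "m": 2, "s": 2, "st": 1}
NUCLEI = {"a": 4, "i": 3, "u": 2, "ai": 1}
CODAS = {"": 5, "n": 2, "r": 1}
# Hypothetical chance of starting another syllable after each one.
CONTINUE_PROB = 0.6

def pick(table, rng):
    """Pick one key of `table` with probability proportional to its weight."""
    items = list(table)
    return rng.choices(items, weights=[table[i] for i in items], k=1)[0]

def make_word(rng, max_syllables=4):
    """Chain onset + nucleus + coda syllables until the walk stops."""
    syllables = []
    while len(syllables) < max_syllables:
        syllables.append(pick(ONSETS, rng) + pick(NUCLEI, rng) + pick(CODAS, rng))
        if rng.random() >= CONTINUE_PROB:
            break
    return "".join(syllables).capitalize()

if __name__ == "__main__":
    rng = random.Random()
    print([make_word(rng) for _ in range(5)])
```

Even this toy version shows one of the problems described above: nothing stops a coda from meeting an awkward onset at a syllable boundary, so clusters such as "nst" or "rst" can appear unless the boundary itself is modeled.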