January 27, 2003

diveintomarkov

Mark Pilgrim has posted some "random" Markov poetry that was generated using a Python script he wrote.

I went through a phase where I was really into generating random text, so I learned a bit about this. Mark is using a word-level model, which only outputs words that are found in the original corpus. You can also use a letter-level model, which creates new words by choosing each letter according to how often it followed the previous letters in the corpus. I like higher-order letter-level models (ones that take more of the previous letters into account) because they can output some really funny words.
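For illustration, here is a minimal sketch of an order-n letter-level Markov generator. It is a generic version of the technique, not Mark's script or the generator I used. Function names and parameters (`train`, `generate`, `order`, `length`, `seed`) are my own, and it assumes a non-empty corpus string. Each `order`-letter context maps to the list of letters that followed it, so a random pick from that list is automatically weighted by frequency.

```python
import random
from collections import defaultdict

def train(text, order=3):
    """Map each `order`-letter context to every letter that followed it in `text`."""
    model = defaultdict(list)
    padded = " " * order + text  # pad so the first letters also have a context
    for i in range(len(text)):
        context = padded[i:i + order]
        model[context].append(padded[i + order])
    return model

def generate(model, order=3, length=200, seed=None):
    """Emit `length` letters, each drawn given the previous `order` letters."""
    rng = random.Random(seed)
    context = " " * order
    out = []
    for _ in range(length):
        followers = model.get(context)
        if not followers:  # dead end (context only seen at the corpus end): restart
            context = " " * order
            followers = model[context]
        ch = rng.choice(followers)  # duplicates in the list weight the choice by frequency
        out.append(ch)
        context = context[1:] + ch  # slide the context window by one letter
    return "".join(out)
```

Usage: `print(generate(train(corpus, order=4), order=4, length=300, seed=42))`. Raising `order` makes the invented words look more like real ones. A word-level model like Mark's uses the same idea with tuples of words as contexts instead of strings of letters.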

A quick search on Google yields the following:

I had also created a Zope site that integrated a random Markov text generator to create arbitrarily large websites for search engine testing (so that they would fit Zipf's law).
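Zipf's law says the frequency of the word at rank r is roughly proportional to 1/r, so on a log-log plot of frequency against rank the points fall near a line with slope -1. A rough check on generated text, sketched below, is to fit that slope by least squares. The function name and the whitespace tokenization are assumptions for illustration, and a slope near -1 is only a heuristic sign of Zipf-like behavior, not a formal goodness-of-fit test. The input needs at least two distinct words.

```python
import math
from collections import Counter

def zipf_slope(text):
    """Least-squares slope of log(frequency) vs log(rank); Zipf's law predicts about -1."""
    counts = sorted(Counter(text.lower().split()).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var
```

Text where every word appears equally often gives a slope of 0, while word counts of exactly 60/r for ranks 1 to 6 give a slope of -1.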



TrackBacks

In brief: 28 Jan 2003 from dive into mark

Comments

  1. posted by: Mark on January 27, 2003 5:52 PM

    Best. Title. Ever. I can't believe I missed it.

  2. posted by: Brandon on February 4, 2003 7:54 AM

    Hey! I saw some referrers coming from here. Clever title. So I created a little CGI to create Markov text from any web site: http://www.bigredswitch.com/blog/archives/2003/02/02/000123.html#000123.

    The Markov text generator for search engine testing is an interesting idea, but I would think that you would just be getting different distributions of the same words as your input text (in particular, a distribution that follows Zipf's law) unless your input was random real text, such as random web sites (or even random words from a Markov word generator). Since there are many input texts available, wouldn't it be better to just use real text? Of course, if it's the size of the web site you are testing, this is a great solution.

  3. posted by: nathan jacobs on February 4, 2003 9:15 AM

    Now that you mention it, I never actually tested to see if the distribution of words followed Zipf's law. The strategy I used was to train a 3-letter-gram Markov model on Project Gutenberg (and some other sources) and then create a CGI (a Zope page at the time) that accepted three parameters: a random seed, a depth value, and a split value. Together these created a tree of pages composed of Markov-generated words, with links to subtrees of depth = parent_depth - 1.

    The goal was to test the scalability of HT://Dig without having to crawl the web. I wanted to be able to tweak the site size quickly... I thought this method would be easier than finding and combining documents.

    In the process of doing this I realized that I have never seen a truly Markov website (not just the text, but the style and structure too), or even a simple weblog, that could fool you into thinking it was created by a human. But alas, I had to drop this project for more pressing work-related activities.

    Lately I have begun to think about how this relates to http://news.google.com: it is not created using the same methods, but it is able to fool a lot of people into believing it was compiled by humans.
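The page tree described in comment 3 (a random seed, a depth value, and a split value) can be sketched roughly as follows. This is not the original Zope code. The vocabulary is a hypothetical stand-in for the trained 3-letter-gram generator, and the `page?seed=...` URL scheme, `render_page`, `site_size`, and `words_per_page` are invented for illustration. Every page is a pure function of its parameters, so a crawler revisiting a URL always sees the same content, and each child's seed is drawn from its parent's generator.

```python
import random

# Hypothetical stand-in vocabulary; a real setup would call a trained Markov generator here.
VOCAB = ["lorem", "markov", "zipf", "crawler", "index", "page", "random", "text"]

def render_page(seed, depth, split, words_per_page=50):
    """Render one page; the same (seed, depth, split) always produces the same HTML."""
    rng = random.Random(seed)  # all randomness for this page flows from its seed
    body = " ".join(rng.choice(VOCAB) for _ in range(words_per_page))
    links = []
    if depth > 0:  # leaf pages (depth 0) have no children
        for i in range(split):
            child_seed = rng.randrange(2**32)  # derived from the parent seed, so reproducible
            # Hypothetical URL scheme: each child's parameters are encoded in its link.
            href = f"page?seed={child_seed}&amp;depth={depth - 1}&amp;split={split}"
            links.append(f'<a href="{href}">page {i + 1}</a>')
    return f"<html><body><p>{body}</p>\n{' '.join(links)}\n</body></html>"

def site_size(depth, split):
    """Total pages under a root of the given depth: 1 + split + ... + split**depth."""
    return sum(split ** d for d in range(depth + 1))
```

Because the site holds 1 + split + ... + split^depth pages, two integers control its size: for example, depth 4 with split 10 yields 11,111 pages.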
