Choosing an account URL (like myname.app.com) is hard for users, especially when their usual names are already taken. Sometimes the choice is enough to deter people from signing up at all, which is why a lot of services, Prevue included, are moving towards auto-creating a URL on behalf of the user, then allowing them to change it later if necessary.
It seems simple enough, but as usual, removing complexity for the user adds complexity to the product, and smart URL suggestion is a great example of that. Having recently written the logic that generates a “best guess” URL in Prevue, I thought I’d explain how I approached the problem, and give some insight into the guesswork that goes into choosing a “good” URL.
First and foremost, URLs need to be unique, which is why a lot of products append numbers to the end of generic names (like user123.app.com) or remove the need for custom URLs entirely. This is a totally valid way of programmatically generating a unique string, but the results are quite boring, and definitely not the kind of attention to detail I expect from Prevue. So here’s the logic I use instead:
Attempt 1. Custom domains
Since an email address is a required field for all new signups, the first step is to detect whether the email uses a custom domain. As a business tool, Prevue assumes that someone signing up with an address at a domain like abcagency.com will want abcagency as their custom URL, so this is a natural place to start our search for a good URL.
The main problem here was separating custom domains from generic email providers like Gmail, Hotmail and Yahoo. For this, I exported a list of every email domain used by some 30,000 existing users, and ordered it by frequency. Then I took the 50 most common domains, and combined them with some others that I found via public lists.
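As a rough illustration of that frequency count (a sketch, not Prevue's actual code), the ranking could look something like this in Python. The function name `top_email_domains` and the sample addresses are made up for the example.

```python
from collections import Counter

def top_email_domains(emails, n=50):
    """Return the n most frequent email domains, most common first."""
    domains = (
        email.rsplit("@", 1)[1].strip().lower()
        for email in emails
        if "@" in email  # skip anything that is not an email address
    )
    return [domain for domain, _count in Counter(domains).most_common(n)]

# Hypothetical sample data standing in for the real user table
sample = [
    "a@gmail.com", "b@gmail.com", "c@GMAIL.com",
    "d@yahoo.com", "e@yahoo.com", "f@abcagency.com",
]
print(top_email_domains(sample, n=2))  # ['gmail.com', 'yahoo.com']
```

The resulting list still needs a manual pass, since a large customer's own domain could plausibly show up among the most frequent ones.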
Using this data, if your email domain doesn’t match an item in the list, the script assumes you’ve got something custom and uses that. If it matches a common email provider, the script moves on to the next guess…
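A minimal sketch of this first attempt might look like the following. The `COMMON_PROVIDERS` set is a tiny hypothetical stand-in for the real list, and taking the first label of the domain as the business name is my simplifying assumption here (it would pick "mail" from mail.abcagency.com, for instance).

```python
import re

# Hypothetical, heavily abbreviated stand-in for the real provider list
COMMON_PROVIDERS = {"gmail.com", "hotmail.com", "yahoo.com", "outlook.com"}

def suggest_from_domain(email):
    """Return a URL suggestion from a custom email domain, or None."""
    _local, sep, domain = email.strip().lower().rpartition("@")
    if not sep or not domain or domain in COMMON_PROVIDERS:
        return None  # not an email, or a generic provider: try the next guess
    # Assumption: the first label of the domain is the business name
    label = domain.split(".")[0]
    # Keep only characters that are safe in a subdomain
    slug = re.sub(r"[^a-z0-9-]", "", label)
    return slug or None

print(suggest_from_domain("john@abcagency.com"))  # abcagency
print(suggest_from_domain("john@gmail.com"))      # None
```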
Attempt 2. First name
Though the domain is the preferred starting point for a custom URL, the user’s name is a good fallback. Again using the email address provided by the user, Prevue takes everything before the @ and decides whether it’s a name.
But how do you work out what’s a name, and what’s just random text? Instead of matching the first part of an email address against a huge list of common first names, Prevue takes the word that occurs before any symbols or numbers and compares it against a blacklist of words that it knows aren’t names.
This blacklist was built using the same technique as finding commonality in domain names, and showed some surprising consistency in what people use as common email prefixes — things like contact, info, no-reply and hello.
So “john.smith” ends up as just john, “buzz1234” returns buzz, and “hello” fails completely. If nothing appropriate has been found by this point, the script moves on to creating something original from scratch…
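The name check can be sketched in a few lines. This is my approximation rather than the production code: `NOT_NAMES` is a short hypothetical sample of the blacklist, and I've assumed that "no-reply" gets caught because its leading word is "no".

```python
import re

# Hypothetical sample of the blacklist; the real one is much longer.
# "no" is included because "no-reply" is cut at the hyphen.
NOT_NAMES = {"contact", "info", "no", "noreply", "hello", "admin", "support"}

def suggest_from_name(email):
    """Return the leading word of the email prefix if it could be a name."""
    local = email.strip().lower().partition("@")[0]
    # Take the run of letters before the first symbol or number
    match = re.match(r"[a-z]+", local)
    if match is None:
        return None
    word = match.group(0)
    return None if word in NOT_NAMES else word

print(suggest_from_name("john.smith@gmail.com"))  # john
print(suggest_from_name("buzz1234@gmail.com"))    # buzz
print(suggest_from_name("hello@gmail.com"))       # None
```

Note that this only rules out known non-names. Anything else, including random text, is treated as a name, which is the trade-off of using a blacklist instead of a list of real first names.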
Attempt 3. Fun names
By this point, the script has failed to find a custom domain or a first name, which means we don’t have anything useful to work with. But instead of jumping to something generic, I thought it was a chance to have some fun and create a more memorable URL.
So I created two lists, each containing 200 unique words: the first list contains adjectives, and the second contains animal names. The script then chooses a word at random from each list and combines them using a hyphen, resulting in some fun two-word combos.
With 200 words in each list, that gives 200 × 200 = 40,000 possible combinations of pretty funky URLs. Much better than app123!
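Here is a small sketch of the idea. The four-word lists below are placeholders I made up for the example, not the actual 200-word lists, and `fun_name` is a hypothetical name.

```python
import random

# Placeholder lists; the real ones hold 200 words each
ADJECTIVES = ["brave", "quiet", "clever", "sunny"]
ANIMALS = ["otter", "falcon", "badger", "panda"]

def fun_name(rng=random):
    """Join one random adjective and one random animal with a hyphen."""
    return f"{rng.choice(ADJECTIVES)}-{rng.choice(ANIMALS)}"

print(fun_name())
# Number of distinct names: 4 * 4 = 16 here, 200 * 200 = 40,000 for full lists
print(len(ADJECTIVES) * len(ANIMALS))
```

Passing the random source in as `rng` is just a convenience so the function can be seeded in tests.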
By now, the script will have generated a URL based on either domain or first name, or created something totally random. Now we have to check whether that URL already exists — since this value needs to be unique.
So at this point, the script runs a number of checks to ensure that the URL is unique, is longer than 3 characters and doesn’t appear on a blacklist of URLs I’ve previously determined to be invalid (like www, sex or blog). If the suggestion fails any of these checks, we add a random number to the end of the suggestion and try again. This process repeats up to 3 times before starting over from the “fun-name” stage.
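Put together, the checking loop could be sketched roughly as follows. This is an approximation under a few assumptions of mine: `taken` stands in for a database lookup, the random suffix is drawn from 1 to 999 and always appended to the original suggestion, and `max_rounds` is a safety cap I added so the loop cannot run forever.

```python
import random

RESERVED = {"www", "sex", "blog"}  # sample of the invalid-URL blacklist
MIN_LENGTH = 4                     # "longer than 3 characters"

def is_valid(slug, taken):
    """A slug must be long enough, not reserved, and not already in use."""
    return len(slug) >= MIN_LENGTH and slug not in RESERVED and slug not in taken

def finalise(suggestion, taken, make_fun_name, rng=random, retries=3, max_rounds=100):
    """Return a valid URL, adding random numbers or falling back to fun names."""
    base = suggestion
    for _ in range(max_rounds):
        if is_valid(base, taken):
            return base
        # Try the same base with a random number on the end, up to `retries` times
        for _ in range(retries):
            candidate = f"{base}{rng.randint(1, 999)}"
            if is_valid(candidate, taken):
                return candidate
        # Still nothing: start again from the fun-name stage
        base = make_fun_name()
    raise RuntimeError("could not find a free URL")

taken = {"abcagency"}
print(finalise("john", taken, lambda: "brave-otter"))       # john
print(finalise("abcagency", taken, lambda: "brave-otter"))  # abcagency plus a number
```

One quirk worth knowing about: a reserved word with a number on the end (say www plus some digits) passes these checks, so a real implementation may want to test the base word against the blacklist separately.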