don't worry, it's probably fine

Character sorting exercise in linear time

17 Nov 2019

nablopomo java tdd

I had a go at another code exercise, Sorting Characters, using Test-Commit-Revert.

Given a string, a piece of code should lowercase and sort the entire string.

For example, “a dog and a cat” would become “aaaacddgnot”.

Initial implementation (streams)

The solution must lowercase the input, remove any non-alphabet characters, and then sort the remaining content - my first implementation was using the Java streams API.

private String sort(String input) {
    return input.toLowerCase()
            
        // turn into characters
        .chars()
            
        // only keep a-z
        .filter(character -> 'a' <= character && character <= 'z')
            
        // sort
        .sorted()
            
        // convert primative char to String
        .mapToObj(Character::toString)
            
        // merge the Strings
        .collect(Collectors.joining());
}

This works, but can we make it faster? First, I needed to measure how fast this version runs.

Measuring

I used the Java Micro-Benchmarking Harness to measure how long this approach took when operating on a 20,000 character input.

Using 10 warm-up iterations and 30 measurement iterations, the result was an average of 0.756 milliseconds to process that string.

I think we can do better!

Linear time

Most generic sorting algorithms, like QuickSort or MergeSort, have a run-time of O(n log n). This means that as an input of length n increases, the runtime increases more than linearly but less than quadratically.

Knowing the limited domain of an input means you can “cheat”, by only traversing the input twice - once for counting and once for building the output. In this case, the input is an ordered sequence composed of at most 26 distinct characters.

By counting the number of each character in the sequence, then using that to generate an output string, the algorithm will be O(n) - the runtime grows in linear proportion to the input size.

private String sort2(String input) {

    // Track how many of each character we've seen
    int[] characterCounts = new int[26];

    for (char character : input.toLowerCase().toCharArray()) {
        if('a' <= character && character <= 'z') {
        
            // Add one to the number of times we've seen this character
            // Subtracting 'a' is a trick to make a = 0, b = 1, ... for the array
            int index = character - 'a';
            characterCounts[index]++;
        }

    }

    StringBuilder output = new StringBuilder();

    // For every letter
    for (int i = 0; i < 26; i++) {
    
        // Add one of them to the output for each time we've seen one
        for(int j = 0; j < characterCounts[i]; j++) {
            output.append((char) (i + 'a'));
        }
    }

    return output.toString();
}

Running exactly the same benchmark against this code gives an average benchmark of 0.105 milliseconds, significantly faster than the original implementation using streams.

Nice, although the code is not as nice to look at as the streams version!

Code available on my exercises repo.

(Epilogue - replacing the Character::toString and Collectors.joining() with a StringBuilder as a consumer drops the average runtime for the streams implementation down to 0.572 - better but still slower)


November is National Blog Posting Month, or NaBloPoMo. I’ll be endeavouring to write one blog post per day in the month of November 2019 - some short and sweet, others long and boring.