Join our FREE personalized newsletter for news, trends, and insights that matter to everyone in America

Newsletter
New

Optimizing Chained Strcmp Calls For Speed And Clarity

Card image cap

From memcmp and bloom filters to 4CC encoding for small fixed-length string comparisons

I've been working on an article to describe a small performance issues with a pattern I've seen multiple times - long chain of if statements based on strcmp. This is the equivalent of switch/case on string value (which is not supported in C).

The code implemented critical business logic, based on currency codes (ISO 3-letter code), and an as-of date. Similar to:

bool model_ccy_lookup(const char *s, int asof, struct model_param *param)  
{  
    // Major Currencies  
    if ( strcmp(s, "USD") == 0 || strcmp(s, "EUR") == 0 || ...) {  
        ...  
    // Asia-Core  
    } else if ( strcmp(s, "CNY") == 0 || strcmp(s, "HKD") == 0 || ... ) {  
        ...  
    } else if ( ... ) {  
        ...  
    } else {  
        ...  
    }  
}   

The code couldn’t be refactored into a different structure (for non-technical reasons), so I had to explore few approaches to keep the existing structure - without rewrite/reshape of the logic. I tried few tings - like memcmp, small filters, and eventually packing the strings into 32-bit values (“4CC”-style) and letting the compiler work with integer compares.

I was able to revise the code into cleaner form, which outperform the original code by 10X. This was achieved with a series of experiments, some of them helped, some of them did not yield improvements, but hinted at other directions.

Full Article: Medium (No Paywall):

Optimizing Chained strcmpCalls for Speed and Clarity

The final implementation looks like:

bool model_ccy_lookup(const char *s, int asof, struct model_param *param)  
{  
    // Major Currencies  
    if ( CCY_IN(s, "USD", "EUR", ...) ) {  
        ...  
    // Asia-Core  
    } else if ( CCY_IN(s, "CNY", "HKD", ...) ) {  
        ...  
    } else if ( ... ) {  
        ...  
    } else {  
        ...  
    }  
}   

And the CCY_IN was implemented as a series of integer compare, using the FourCC encoding = replacing each fixed-size strcmp with a call to CCY_EQ macro:

#define CCY_EQ(x, ccy) (*(int *)x == *(int*) ccy )  

I’m also trying a slightly different writing style than usual - a bit more narrative, focusing on the path (including the dead ends), not just the final result.

If you have a few minutes, I’d really appreciate feedback (comment here or on Medium) on two things:

  • Does the technical content hold up?
  • Is the presentation clear, or does it feel too long / indirect?

Interested to hear on other ideas/approach for this problem as well.