Hashing - Why?! Everytime I see all those strcmp()'s and strcasecmp()'s in the Max Specter sources I am asking myself: "Has this been written by a 12 year old on a Commodore VIC-20 in BASIC-70?" Not only is this an incredible waste of CPU cycles, it also smacks of total ignorance of software development techniques developed since the 1950s and is therefore utmost unprofessional. It has got to be thrown out. Whenever the pbx execution engine evaluates a line of code in the dialplan, it calls strcmp() or strcasecmp() to compare each keyword, application, variable, context or extension to the names of all other keywords, variables, applications, contexts and extensions respectively, one by one, and each one character by character. This is an incredible waste of resources. The workload thereby increases with the number of concurrent calls times the average number of applications, variables, contexts and extensions times the average length of identifiers. The use of case insensitive application names and variable has the effect of doubling the average length of application names and variables and thereby increases the workload further. Worse still, each call to strcmp() and strcasecmp() involves a context switch which is computationally expensive and in this case totally unneccessary. This is all the more puzzling when considering that the avoidance of context switches for performance reasons is the most common justification why the sources lack abstraction and feature so many monstrous spaghetti functions. Even early 1960s BASIC interpreters did not use string comparisons while executing a program. They tokenized all identifiers and only compared the token codes during execution. Tokenizing and hashing are well known and firmly established techniques which have been in use in interpreters and compilers since the early 1950s. With a hash based approach, a hash value is calculated for each identifier found and the hash value is then compared against known hash codes in order to recognise the identifier. A single integer comparison replaces the strcmp() or strcasecmp() function call which is not only faster but also avoids the context switch associated with a function call. This is not a matter of style. A software project that doesn't employ these time proven techniques in script execution engines cannot possibly claim any credibility for itself and no self respecting software professional would write code the way pbx.c looks like. There can be no argument about it. To their credit, the original author seem to have at least been aware of the impact on performance as the following comment in the original code shows ... /* * I M P O R T A N T : * * The speed of extension handling will likely be among the most important * aspects of this PBX. The switching scheme as it exists right now isn't * terribly bad (it's O(N+M), where N is the # of extensions and M is the * avg # of priorities, but a constant search time here would be great ;-) * */ For some reason though the author didn't seem to have done anything about it. Maybe their skillset was insufficient, maybe they have never heard of hashes, maybe they didn't pay attention when the subject matter of hashing came up in class, maybe they were too lazy to do their homework and forgot all about it. Whatever the reason may have been, it could not possibly serve as an excuse. There is also no excuse not to fix this problem. So, I have made a start towards replacing all the strcmp() and strcasecmp() calls in pbx.c with hash comparisons by providing this hashing API and hash codes for keywords used in there. This comprises the following files ... cw_hash.h -- header file of the hashing API cw_hash.c -- implementation of the hashing API cw_keywords.h -- hash values for keywords found in pbx.c Eventually, everything should be hashed, contexts, extensions, variables etc. The following is a description of the main steps of the conversion to a hash based approach, using hash comparison instead of strcmp() and strcasecmp(), but not yet using hashtables for storage (this is the next step, though). In pbx.c the structs cw_app and cw_builtin got an additional member ... unsigned int hash; Function pbx_findapp() changed to ... unsigned int hash = cw_hash_string(app); alternatively, ... unsigned int hash = cw_hash_string_toupper(app); ... in case you want to retain case insensitivity. Note, that this is slightly more expensive, computationally. Function pbx_findapp() also changed to ... while(tmp) { // The proper way ... if (hash == tmp->hash) break; // The silly way ... /* if (!strcasecmp(tmp->name, app)) break; */ tmp = tmp->next; } and this illustrates very well how hashes save many CPU cycles: First, there is no function call to check the application name for a match in every turn of the loop. Context switches are rather expensive and we avoid them altogether by not calling strcasecmp() anymore. Secondly, instead of one test per character in the application name, we now have only one single test per application entry to test the hash code. The result is that application search performance becomes linear instead of logarithmic. One integer comparison per registered application, instead of one context switch plus average number of characters in an application name times the number of registered applications. Now, in order to get the hash codes for arbitrary applications into the cw_app struct in the first place, the following code was added to function cw_register_application() ... unsigned int hash = cw_hash_string(app); or, again, for retaining case insensitivity ... unsigned int hash = cw_hash_string_toupper(app); Then, ... while(tmp) { if (hash == tmp->hash) { // replaced code: (!strcasecmp(app, tmp->name)) { and ... if (tmp) { memset(tmp, 0, sizeof(struct cw_app)); strncpy(tmp->name, app, sizeof(tmp->name)-1); tmp->hash = hash; Function cw_unregister_application() has also been modified accordingly. Finally, all the strcmp()'s for builtin keyword checks were replaced by ... if (c && (hash == CW_KEYWORD_CALLERIDNUM)) { // replaced code: (c && !strcmp(var, "CALLERIDNUM")) { etc etc Eventually, I will replace the linked list storage and linear search with a hashtable storage library so CallWeaver can store arbitrary objects in a hash table from where they can be retrieved much faster than the current method using linear searches on linked lists and arrays. benjk, August 2006 *************************************************************************** PERFORMANCE NOTICE - PLEASE READ WHEN BENCHMARKING *************************************************************************** In pbx.c file, exactly in function static int pbx_builtin_noop(struct cw_channel *chan, void *data) we have added a call to usleep(1); This means that every time that a NoOp() application is executed in the dialplan, a microsecond sleep is added to the execution time. This wouldn't be a major performance problem and surely it isn't. Let's take a sample dialplan for doing tests (thanks to Royk): [proc-speedtest] exten => s,1,Answer exten => s,n,PlayTones(ring) exten => s,n,Wait(1) exten => s,n,Set(iterations=$[ 100000 ]) exten => s,n,Set(i=1) exten => s,n,Verbose(Begin-of-speedtest) exten => s,n,Wait(2) exten => s,n,Set(time1=${EPOCH}) exten => s,n(iterate),GotoIf($[ ${i}<${iterations} ]?next:last) exten => s,n(next),Set(i=$[ ${i}+1 ]) ;exten => s,n,NoOp() exten => s,n,Goto(iterate) exten => s,n(last),Set(time2=${EPOCH}) exten => s,n,Verbose(Finish-of-speedtest) exten => s,n,Verbose(The time diff is $[${time2} - ${time1} ] seconds) exten => s,n,Verbose(Which means that the priorities/sec = $[ 4 * ${iterations} / (${time2} - ${time1}) ] ) exten => s,n,Wait(1) exten => s,n,SayNumber($[ 4 * ${iterations} / (${time2} - ${time1}) ] ) exten => s,n,Hangup Without the NoOp() call, it would execute on my PC at about 60k iterations per second. The drawback is that the playtone isn't played at all because all the CPU is dedicated to the dialplan loop. Adding the call to the NoOp(), performance tests would fall down to about 2000 ips. This is dued to the fact that when usleep is called, the other threads are allowed to wake up and execute and it's likely to take more than 1 microsecond. Fortunately, under load, the usleep() call will allow all the threads to run smoothly without any particular issue. We are more interested in smooth RTP streams than dialplan performance. Please note that this big performance reduction, so if you want to compare the dialplan execution, don't use NoOp() or comment out the usleep() in it; With this small mod, we have managed to find a useful way to use NoOp which should be inserted in tight loops. CtRiX, November 2006