1

Backstory (You can skip)

I am writing a pronunciation library for irregular words. Take something like the following:

T1E1s    // tee one E one es | tee one E ones
1994-1995// 1994 (minus|dash|to|) 1995 
19-mm    // 19 dash mmmmmmmmmmmmmm | 19 dash millimeter | 19 dash em em
4x4      // 4 ex 4 | 4 times 4 | 4 by 4

What you see for every word, are the different possible interpretations. Tackling the issue is pretty taxing, but is honestly pretty straight forward. Basically, I parse the word into various types denoted by this enumerator, as such:

    enum StringType
    { // Heirarchical
        Acronymn                  , // "mm","мм"
        Measurement               , // pound

        Number                    , // 0-9

        Character_Times           , // x × X
        Character_Dash            , // -
        Character_ForwardSlash    , // /
        Character_Latin_Lower     , // a
        Character_Latin_Upper     , // A
        Character_Latin_Plural    , // s

        Consanants_Latin_Proper   , // Fff
        Consanants_Latin_Lower    , // fff
        Consanants_Latin_Mixed    , // fFF
        Consanants_Latin_Upper    , // FFF

        Word_Latin_Proper         , // Foo
        Word_Latin_Lower          , // foo
        Word_Latin_Mixed          , // fOO
        Word_Latin_Upper          , // FOO

        Consanants_Cyrillic_Proper, // Fff
        Consanants_Cyrillic_Lower , // fff
        Consanants_Cyrillic_Mixed , // fFF
        Consanants_Cyrillic_Upper , // FFF

        Word_Cyrillic_Proper      , // Фоо
        Word_Cyrillic_Lower       , // фоо
        Word_Cyrillic_Mixed       , // фОО
        Word_Cyrillic_Upper       , // ФОО
    };

thus, a word like 19-mm is parsed like so:

Halt: void mapper(QString) /home/akiva/Programming/Blanket/main.cpp:206
[QStringList] sp.parsedStrings()
QStringList:: QStringList
0   : 19
1   : -
2   : mm
QList<QCD::StringType>:: QList<QCD::StringType>
0   : Number
1   : Character_Dash
2   : Acronymn
QString:: "19-mm"

The taxing part is where I have to tackle each case with its own implementation, and by the end of this, I imagine I will have something like 500 different combinations I will need to program functions for. This is where things get messy, because I do not want this:

     if (string.types() == QList<StringType>() << StringType::Number << StringType::Dash << StringType::Acronymn) { Number_Dash_Acronymn(); }
else if (string.types() == QList<StringType>() << StringType::Number << StringType::Dash << StringType::Number) { Number_Dash_Number(); }
else if (string.types() == QList<StringType>() << StringType::Number << StringType::Times << StringType::Number) { Number_Times_Number(); }
// + 500 additional if else statements

500 if else statements is not acceptable. An enumerator with 500 different values is also disgusting for all the reasons that you can imagine. I had floated the idea of using bitflags, but that ended up being far too limited (32 bits = only 32 parameters). Thus I think I have the best possible approach detailed below:

Actual Issue:

I want something like this:

switch (stringTypes)    
case Number + Dash + Word_Latin_Lower: {
        /* code */
        break;
} case Word_Latin_Lower + Dash + Number: { 
        /* code */
        break;
} default:
        ct_Error("Failed to account for the combination: ", stringTypes);
        break;
}

The obvious issue being that the first two cases have the same value, despite being in a different order. Regardless, if the code was functional, it would be foldable, readable, efficient, and easy to sort. Not to mention, I won't have to touch my header file to add new enumerators or functions, thus drastically helping my compile times.

Thus, how should I give combined enumerators guaranteed unique values, so much so that the order it is given also guarantee a unique value, while still maintaining readability?

Thanks.

Anon
  • 3,639

4 Answers4

1

It is not that straightforward but doable nonetheless. First you can create a new enum for your 500 or so string type sequences. To make this readable, eliminate all underscores from your current StringType enum members and use these underscores to separate the parts in your new StringTypeSequence enum. So you could get something like this:

StringTypeSequence
{
    Number_Dash_WordLatinLower,
    WordLatinLower_Dash_Number,
    ...
    et cetera, all the way up to your 500 or so
}

Then you write a parser that picks up the member names from your code file and turns them into hash values. You use these hash values to update your enum:

StringTypeSequence
{
    Number_Dash_WordLatinLower = 0x7C34012B,
    WordLatinLower_Dash_Number = 0x058AAE67,
    ...
    et cetera, all the way up to your 500 or so
}

I just made up some values, the real ones will be different. Anyway...

You can now create your switch statement, casing on the StringTypeSequence members.

You then parse your text and you find a "Number + Dash + WordLatinLower" sequence. You concatenate the enum member names into the string "Number_Dash_WordLatinLower", calculate the hash for it and switch on it. You will end up at the case for enum member Number_Dash_WordLatinLower. If you end up in the default clause you found a new sequence you had not anticipated yet and you throw an exception, presenting the hash for the enum value and case to be added.

I am not sure if your version of C++ allows you to obtain the member name string of an enum value, you may need a CLR type language with reflection capability for that. If you cannot do that, just add an array of strings to your StringType enum that matches the enum members so you can use the enum values as indexes into the array to get to the names.

Martin Maat
  • 18,652
1

Summary

  1. Use a class to store the fields of StringType
  2. Refactor your 500 methods into command classes and use the chain of responsibility pattern.

Eliminate those 500 ad-hoc checks

If you have 500 functions that could be called, I'd say a bigger problem is the massive amount of code to check for each condition and invoke each function. Even with a switch statement, it's likely to be a bear to read and maintain.

Instead, I'd like to suggest you use the chain of responsibility design pattern so that the list of functions is abstract and you can iterate over it. The next sections show you how.

Define a class to contain the StringType information

First, define a class that can contain all the information you currently require in StringType. Yes, a class, not an integer (which does not have enough storage capacity to handle the possible combinations). For example:

class StringType
{
public:
    StringType() : Acronymn(), Dash(), Number() {}
    bool Acronymn;
    bool Dash;
    bool Number;
};

To use this class, you'd set individual fields instead of setting bits in an enum.

If you are concerned this will take up too much space, you can use bit fields to compact the data structure. I doubt this is really necessary though.

Define an abstraction for checking and handling different string types

Second, write a base class that exposes functions to (1) check if the class can handle a particular StringType, and (2) to handle it.

class StringHandler
{
public:
    virtual bool CanHandle(StringType* stringType) = 0;
    virtual void Execute() = 0;
};

Implement the abstraction

You then implement this class for each possible method call that results from a combination of flags. Here are two examples:

class NumberAcronymnHandler : public StringHandler
{
public:
    bool CanHandle(StringType* stringType) override
    {
        return (stringType->Number && stringType->Acronymn);
    };

    void Execute() override
    {
        Number_Acronymn();  //Here's where you do the work
    }
};

class NumberDashAndAcronymnHandler : public StringHandler
{
public:
    bool CanHandle(StringType* stringType) override
    {
        return (stringType->Number && stringType->Acronymn && stringType->Dash);
    };

    void Execute() override
    {
        Number_Dash_Acronymn();  //Here's where you do the work
    }
};

Eliminate the switch statement by iterating over the command classes

To figure out which class can handle a StringType, iterate over an array of these classes and check each one. This example supports 2 types, but if there were 500, the code would be more or less the same; it's just a loop over an array.

void Example(StringType* stringType)
{
    StringHandler* map[2] = 
    {
        new NumberAcronymnHandler(), 
        new NumberDashAndAcronymnHandler() 
    };

    for (int i=0; i<2; i++)
    {
        StringHandler* handler = map[i];
        if (handler->CanHandle(stringType))
        {
            handler->Execute();
        }
    }
};

This solves the issue with your enumerated type, and also adds maintainability and structure to your 500+ methods.

John Wu
  • 26,955
0

You are right in thinking that 500 if-else conditions is unreadable. Same thing for 500 switch cases.

The first thing to consider is whether having 500 different functions actually makes sense. Are those functions all very different or will they contain a lot of duplication? If so, how can you factor out this duplication? You should aim to find a canonical representation of these functions, i.e. only write the data/code that differentiates it from all other functions.

To structure actually picking the right function to call, you simply build a lookup table (hash table, if possible) linking the sequence to a callback function.

D Drmmr
  • 310
0

Inspired by Martin Maat's answer but using number hashing instead of string hashing.

Disclaimer: There is one downside to all of this solutions: they use hashes. And as is usual with hashes: while improbable collisions are a possibility. So it might end up that you will and up with a collision where two combinations resolve to the same hash and you have to treat it as a special case. This is a nasty bug waiting to happen so document this behaviour well if you do end up using this.

Solution 1:

// you need qHash() overload for QCD::StringType
uint qHash(QCD::StringType st)
{
    return qHash(static_cast<std::underlying_type<QCD::StringType>::type>(st));
}

// use like this
switch(qHash(stringTypes): {
case qHash(QList<StringType>{ Number, Dash, Word_Latin_Lower }): {
    /* code */
    break;
}
case qHash(QList<StringType>{ Word_Latin_Lower, Dash, Number }): {
    /* code */
    break;
}
default:
    ct_Error("Failed to account for the combination: ", stringTypes);
    break;
}

Solution 2:

This one has a bit more syntactic sugar

// you need qHash() overload for QCD::StringType
uint qHash(QCD::StringType st)
{
    return qHash(static_cast<std::underlying_type<QCD::StringType>::type>(st));
}

// this is pureliy syntactic sugar
uint StringTypeListHash(const QList<QCD::StringType>& list)
{
    return qHash(list);
}

// for this usage
switch(StringTypeListHash(stringTypes): {
case StringTypeListHash({ Number, Dash, Word_Latin_Lower }): {
    /* code */
    break;
}
case StringTypeListHash({ Word_Latin_Lower, Dash, Number }): {
    /* code */
    break;
}
default:
    ct_Error("Failed to account for the combination: ", stringTypes);
    break;
}

Solution 3:

Operator overloading. Very pretty code, should work. I chose the xor operator because it is most commonly associated with combining hashes.

I'm not 100% sure this won't have unintended sideffects if QCD::StringType is an enum instead of enum class. As usual - take caution with operator overloading and don't blindly use code someone on the internet whipped up without even trying to compile it.

// you still need qHash() overload for QCD::StringType
uint qHash(QCD::StringType st)
{
    return qHash(static_cast<std::underlying_type<QCD::StringType>::type>(st));
}

// here comes the magic
namespace QCD {

    // this is basically the implementation of boost::hash_combine
    // taken from: https://stackoverflow.com/a/27952689/6178613
    uint hash_combine( uint lhs, uint rhs ) {
        lhs^= rhs + 0x9e3779b9 + (lhs << 6) + (lhs >> 2);
        return lhs;
    }

    size_t operator^(StringType lhs, StringType rhs) {
        using enum_int_type = std::underlying_type<StringType>::type
        std::size_t hash = hash_combine(qHash(static_cast<enum_int_type>(lhs)), qHash(static_cast<enum_int_type>(rhs)));
    }

    size_t operator^(StringType lhs, size_t rhs) {
        using enum_int_type = std::underlying_type<StringType>::type
        std::size_t hash = hash_combine(qHash(static_cast<enum_int_type>(lhs)), rhs);
    }

    size_t operator^(size_t lhs, StringType rhs) {
        using enum_int_type = std::underlying_type<StringType>::type
        std::size_t hash = hash_combine(lhs, qHash(static_cast<enum_int_type>(rhs)));
    }
}

// usage
switch(qHash(stringTypes): {
case Number ^ Dash ^ Word_Latin_Lower: {
    /* code */
    break;
}
case Word_Latin_Lower ^ Number ^ Dash: {
    /* code */
    break;
}
default:
    ct_Error("Failed to account for the combination: ", stringTypes);
    break;
}
jaskij
  • 575