Is it really good practice in Python code for machine learning to use so many parameters?

Question

Currently I am a student learning Machine Learning, and so my observation is from an academic context. It may be different in a business environment.

One thing I find very odd when I see Python code for machine learning is that when you call a typical network class, you send it lots of parameters, which strikes me as a risky thing to do. Surely it would be better practice to put all of your hyperparameters and configuration values in a special class for hyperparameters and configuration, and then when you want to change the values they are easy to find?

If the class is in a separate file, you can simply copy it across for the next project.

Would anyone want to comment on why this is not the obvious thing to do?

Here is an example of what I mean:

    agent = agent_(gamma=args.gamma,
                   epsilon=args.eps,
                   lr=args.lr,
                   input_dims=env.observation_space.shape,
                   n_actions=env.action_space.n,
                   mem_size=args.max_mem,
                   eps_min=args.eps_min,
                   batch_size=args.bs,
                   replace=args.replace,
                   eps_dec=args.eps_dec,
                   chkpt_dir=args.path,
                   algo=args.algo,
                   env_name=args.env)

Apologies, I imagined you might have seen code like this before. As you can see, there are a lot of hyperparameters and configuration values. I see this a lot in books and courses. Instead of passing all these parameters around, just put them in a class.
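To make the proposal concrete, here is a minimal sketch of what "put them in a class" could look like. The names `AgentConfig` and `make_agent_kwargs` are hypothetical, and the default values are placeholders chosen for illustration, not values from any book or course. Only the field names come from the call above.

```python
from dataclasses import dataclass, asdict


@dataclass
class AgentConfig:
    """Hypothetical container for the values passed to agent_ above.

    All defaults are illustrative placeholders.
    """
    gamma: float = 0.99
    epsilon: float = 1.0
    lr: float = 1e-4
    mem_size: int = 50_000
    eps_min: float = 0.1
    batch_size: int = 32
    replace: int = 1000
    eps_dec: float = 1e-5
    chkpt_dir: str = "models/"
    algo: str = "DQNAgent"
    env_name: str = "PongNoFrameskip-v4"


def make_agent_kwargs(config, input_dims, n_actions):
    """Merge the stored settings with the values that come from the environment."""
    kwargs = asdict(config)
    kwargs.update(input_dims=input_dims, n_actions=n_actions)
    return kwargs


# Override only what differs from the defaults.
config = AgentConfig(lr=5e-4)
kwargs = make_agent_kwargs(config, input_dims=(4, 84, 84), n_actions=6)
# agent = agent_(**kwargs)  # agent_ is the class from the example above
```

With this layout, `input_dims` and `n_actions` still come from the environment, so they stay outside the config object.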

score 1 · Answer 1 · answered Jan 15 '21 at 16:07

It really depends on the context, as always.

"Just put them in a class" is a bit overly general. Let's look at the refactoring: https://refactoring.com/catalog/introduceParameterObject.html

"Introduce Parameter Object". In the linked example, a function with arguments startDate and endDate gets turned into a function with argument aDateRange.

The reason this is a good refactoring in this case is that the start and end date truly belong together. A date range is a reasonable abstraction.
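A rough Python rendering of that refactoring might look like the sketch below. The `DateRange` class, the `amount_invoiced` function, and the invoice data are all made up for illustration; only the idea of replacing a start/end pair with one range object comes from the linked catalog entry.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DateRange:
    """Parameter object: start and end always travel together."""
    start: date
    end: date

    def contains(self, day: date) -> bool:
        return self.start <= day <= self.end


def amount_invoiced(invoices, date_range):
    """Sum the amounts of (day, amount) pairs that fall inside date_range."""
    return sum(amount for day, amount in invoices if date_range.contains(day))


# Hypothetical data, for illustration only.
invoices = [(date(2021, 1, 5), 100.0), (date(2021, 2, 1), 50.0)]
january = DateRange(date(2021, 1, 1), date(2021, 1, 31))
total = amount_invoiced(invoices, january)
```

The range object earns its keep because behaviour such as `contains` naturally lives on it, which is what makes it a real abstraction rather than a bag of values.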

In your example, putting all the parameters into the same class (ModelArgs or whatever) would be bad, because the parameters fit into different things: What does learning rate have to do with checkpoint directory?

In this case, the "Introduce Parameter Object" refactoring is too low-level, and it would only mask a different design smell: this agent class looks an awful lot like a "God Class". It takes care of the model, takes care of the training, takes care of writing checkpoint data, and who knows what else.

If you want to apply principles of good object-oriented design to this, you would probably first extract these different responsibilities into different classes. This is what's being done, for example, in the great fast.ai library.

And after that, you might find that the individual classes with their more narrow responsibilities end up with quite reasonable parameter lists after all.
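As a sketch of that idea applied to the question's example, the parameters could be grouped by responsibility. The class names `EpsilonSchedule`, `Checkpointer`, and `Agent`, the linear decay rule, and the checkpoint file naming are all assumptions made for illustration; this is not how fast.ai or the original code is structured.

```python
import os


class EpsilonSchedule:
    """Owns the exploration parameters only."""

    def __init__(self, epsilon, eps_min, eps_dec):
        self.epsilon = epsilon
        self.eps_min = eps_min
        self.eps_dec = eps_dec

    def step(self):
        # Assumed rule: linear decay, never dropping below eps_min.
        self.epsilon = max(self.eps_min, self.epsilon - self.eps_dec)
        return self.epsilon


class Checkpointer:
    """Owns everything about where checkpoints are written."""

    def __init__(self, chkpt_dir, algo, env_name):
        # Assumed naming scheme for the checkpoint file.
        self.path = os.path.join(chkpt_dir, f"{env_name}_{algo}")


class Agent:
    """Keeps only the learning parameters and receives its collaborators."""

    def __init__(self, gamma, lr, schedule, checkpointer):
        self.gamma = gamma
        self.lr = lr
        self.schedule = schedule
        self.checkpointer = checkpointer


schedule = EpsilonSchedule(epsilon=1.0, eps_min=0.1, eps_dec=0.5)
checkpointer = Checkpointer("models", algo="DQN", env_name="Pong")
agent = Agent(gamma=0.99, lr=1e-4, schedule=schedule, checkpointer=checkpointer)
```

Each constructor now takes two to four arguments, and learning rate no longer sits next to checkpoint directory.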

score 1 · Answer 2 · answered Jan 16 '21 at 06:46

Take a look at TensorFlow Keras. When an end-to-end machine learning workflow exposes many APIs that other people use, long parameter lists like this are very common.

The API a machine learning process exposes to the users should be clear enough for them to read and use, with all the possible arguments included, and docs below the function signature. It also uses **kwargs for the sake of backward compatibility.

Taking the compile example,

    compile(
        optimizer='rmsprop', loss=None, metrics=None, loss_weights=None,
        weighted_metrics=None, run_eagerly=None, steps_per_execution=None, **kwargs
    )

it clearly shows users what they can pass and what the default values are. Wrapping the arguments in a separate class means users have to look up the documentation for that class just to know how to call the compile API, which is unnecessary extra work.

Besides, there are many APIs in various modules. If each one came with its own wrapper class, managing the code would become a mess.
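The point about visible defaults can be illustrated with a small sketch. `compile_model` below is a hypothetical stand-in, not the Keras function, and the legacy keyword `metric` is invented to show one way `**kwargs` can keep old call sites working.

```python
import inspect


def compile_model(optimizer="rmsprop", loss=None, metrics=None, **kwargs):
    """Hypothetical function with explicit keyword arguments and defaults."""
    # Accept an older, made-up spelling so existing callers do not break.
    if "metric" in kwargs:
        metrics = [kwargs.pop("metric")]
    if kwargs:
        raise TypeError(f"unexpected arguments: {sorted(kwargs)}")
    return {"optimizer": optimizer, "loss": loss, "metrics": metrics}


# The signature itself documents what can be passed and what the defaults are.
defaults = {
    name: param.default
    for name, param in inspect.signature(compile_model).parameters.items()
    if param.default is not inspect.Parameter.empty
}

settings = compile_model(loss="mse", metric="accuracy")
```

Tools such as `help()` and editor autocompletion read the same signature, so users see the options without opening a second class.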

score -3 · Answer 3 · answered Jan 14 '21 at 16:54

Surely passing parameters as parameters is the obvious thing to do, no?

Creating a class to put the parameters in and then passing the class is less obvious.

So if there's no reason to do it that way, then why would you do it that way?

"... and then when you want to change the values they are easy to find?"

False premise. It's not hard to find the parameter when the parameter is right there in the function call. It is hard to find the parameter when it's in a separate configuration file and you have to trace the code to see where the parameter comes from.

"If the class is in a separate file, you can simply copy it across for the next project."

Don't you have different parameters in each project? If they were the same, it would be the same project, with the same input and output.
