32

The specific example in mind is a list of filenames and their sizes. I can't decide whether each item in the list should be of the form {"filename": "blabla", "size": 123}, or just ("blabla", 123). A dictionary seems more logical to me because to access the size, for example, file["size"] is more explanatory than file[1]... but I don't really know for sure. Thoughts?

clb

5 Answers

79

I would use a namedtuple:

from collections import namedtuple
Filesize = namedtuple('Filesize', 'filename size')
file = Filesize(filename="blabla", size=123)

Now you can use file.size and file.filename in your program, which is IMHO the most readable form. Note namedtuple creates immutable objects like tuples, and they are more lightweight than dictionaries, as described here.
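
As a small sketch of what that buys you (the FileSize name and the sample values are only illustrative), a namedtuple instance still behaves like a plain tuple, so positional access and unpacking keep working alongside the named fields:

```python
from collections import namedtuple

# Hypothetical record type for illustration; field names follow the question.
FileSize = namedtuple("FileSize", ["filename", "size"])

entry = FileSize(filename="blabla", size=123)

# Named access and positional access refer to the same data.
by_name = entry.size
by_index = entry[1]

# Unpacking works exactly as with an ordinary tuple.
name, size = entry

# _asdict() converts to a mapping when one is needed (e.g. for JSON output).
as_mapping = entry._asdict()
```

So existing code that expects ("blabla", 123) should keep working, while new code can use the field names.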

Doc Brown
19

{"filename": "blabla", "size": 123}, or just ("blabla", 123)

This is the age-old question of whether to encode your format / schema in-band or out-of-band.

You trade off some memory to get the readability and portability that comes from expressing the format of the data right in the data. If you don't do this, the knowledge that the first field is the file name and the second is the size has to be kept elsewhere. That saves memory but it costs readability and portability. Which is going to cost your company more money?

As for the immutable issue, remember immutable doesn't mean useless in the face of change. It means we need to grab more memory, make the change in a copy, and use the new copy. That's not free but it's often not a deal breaker. We use immutable strings for changing things all the time.
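
To make "change in a copy" concrete, here is a minimal sketch that assumes the record is a namedtuple (the FileSize name is hypothetical). The standard _replace method returns a new tuple and leaves the original alone:

```python
from collections import namedtuple

# Hypothetical record type; any namedtuple behaves the same way.
FileSize = namedtuple("FileSize", ["filename", "size"])

original = FileSize("blabla", 123)

# _replace returns a new tuple; the original is left untouched.
updated = original._replace(size=456)

# Direct assignment is rejected because the tuple is immutable.
try:
    original.size = 456
    assignment_failed = False
except AttributeError:
    assignment_failed = True
```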

Another consideration is extensibility. When you store data only positionally, without encoding format information, then you're condemned to only single inheritance, which really is nothing but the practice of concatenating additional fields after the established fields. I can define a 3rd field to be the creation date and still be compatible with your format since I define first and second the same way.

However, what I can't do is bring together two independently defined formats that have some overlapping fields, some not, store them in one format, and have it be useful to things that only know about one or the other formats.

To do that I need to encode the format info from the beginning. I need to say "this field is the filename". Doing that allows for multiple inheritance.

You're probably used to inheritance only being expressed in the context of objects but the same ideas work for data formats because, well, objects are stored in data formats. It's exactly the same problem.
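
A rough sketch of that idea with dictionaries, assuming two made-up formats that overlap only on "filename": because every field is named, the records can be merged and each consumer still finds the fields it knows about.

```python
# Two records describing the same file, defined independently.
# Both field sets are invented for this illustration.
fs_record = {"filename": "/index.html", "size": 5467}
http_record = {"filename": "/index.html", "mime_type": "text/html"}

# Because every field is named, the two can be merged into one record.
combined = {**fs_record, **http_record}


def describe_size(record):
    """Consumer that only knows about the filesystem fields."""
    return f"{record['filename']}: {record['size']} bytes"


def describe_type(record):
    """Consumer that only knows about the HTTP fields."""
    return f"{record['filename']} is {record['mime_type']}"


size_line = describe_size(combined)
type_line = describe_type(combined)
```

With plain positional tuples there is no equivalent merge, since both formats would claim the second slot for different things.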

So use whichever you think you're most likely to need. I reach for flexibility unless I can point to a good reason not to.

candied_orange
7

I would use a class with two properties. file.size is nicer than either file[1] or file["size"].

Simple is better than complex.
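
The answer doesn't say how to write the class. One possible sketch, assuming Python 3.7+ and a hypothetical class name FileEntry, is a dataclass:

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    """Hypothetical class name; the two fields come from the question."""

    filename: str
    size: int


entry = FileEntry(filename="blabla", size=123)

# Attribute access works as described, and the instance is mutable
# unless the class is declared with @dataclass(frozen=True).
entry.size = 456
```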

JacquesB
5

Are the filenames unique? If so, you could scrap the list entirely and just use a pure dictionary for all the files. e.g. (a hypothetical website)

{ 
  "/index.html" : 5467,
  "/about.html" : 3425,
  "/css/main.css" : 9876
}

etc...

Now, you don't get "name" and "size", you just use key and value, but often this is more natural. YMMV.

If you really want a "size" for clarity, or you need more than one value for the file, then:

{ 
   "/index.html" : { "size": 5467, "mime_type" : "foo" },
   "/about.html" : { "size": 3425, "mime_type" : "foo" },
   "/css/main.css" : { "size": 9876, "mime_type" : "bar" }
}
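
If the data already exists as a list of (filename, size) tuples, converting it to this form is short. A sketch, assuming the filenames really are unique (the sample values are illustrative):

```python
# Sample data in the tuple form from the question.
files = [("/index.html", 5467), ("/about.html", 3425), ("/css/main.css", 9876)]

# dict() accepts an iterable of (key, value) pairs directly.
# If a filename appeared twice, the later entry would silently win.
sizes = dict(files)

# Lookup by filename, with a default for names that are not present.
index_size = sizes["/index.html"]
missing_size = sizes.get("/missing.html", 0)

# Iterating over items() gives back the (filename, size) pairs.
total = sum(size for _, size in sizes.items())
```
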
user949300
0

In Python, a dictionary is a mutable object, whereas a tuple is immutable.

If you need to change key/value pairs often, I suggest using a dictionary.

If your data is fixed or static, I suggest using a tuple.

# Define a dictionary.
a = {}
a['test'] = 'first value'

# Define a tuple.
b = ()
b = b + (1,)  # builds a new tuple and rebinds b; the old tuple is not modified

# The dictionary value for key 'test' can be changed in place.
a['test'] = 'second'

A tuple's contents, however, cannot be changed with the assignment operator.
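
For completeness, a short sketch of what happens if you try (the values are arbitrary): item assignment raises TypeError, and the usual workaround is to build a new tuple.

```python
b = (1, 2)

# Item assignment on a tuple raises TypeError.
try:
    b[0] = 99
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# "Changing" a tuple means building a new one and rebinding the name.
b = (99,) + b[1:]
```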