Two Dimensional Python Matrix Data-Structure with String Indices (Indexes)

17 Jan
January 17, 2013

Today I was trying to create a 2 dimensional data structure that can be queried using string indices rather than integer ones, this is using Python which am a total newbie in (but trying to write a research project using).

The idea was to find something natively within Python, rather than implement my own structure, such a data-structure is fundamental in programming theory, so a very likely chance that an out-of-the-box implementation exists already in most languages, and they are generally a dimensional extension of Arrays (Array of Array), Lists (List of Lists) or Dictionaries (Dictionary of Dictionary), but with string rather than integer indexes.

Arrays and Lists in Python are only accessible through integer based indexes rather than string, and so any attempt to search the structure using [‘string’] indices will result in the following error message:

TypeError: list indices must be integers, not str

So the obvious winner was the Dictionary data-structure, since in it’s one dimensional form dictionaries allow string based searches (using [Key]), but how can I turn a dictionary into a string searched matrix efficiently?

Well, it turns out you can use the defaultdict module which allows you to create a Dictionary structure of type Dictionary (oh the recursion!), leading to a matrix like structure with easily searchable entries, as so:

>>> from collections import defaultdict
>>> matrix_dict = defaultdict(dict)
>>> matrix_dict['cat']['grumpy'] = 14
>>> matrix_dict['dog']['grumpy'] = 1

To add a new co-ordinate to the matrix (on dimension 1 or dimension 2), you can simply reference them in the index as we have done above.

Alternatively, this structure could have been created using a dictionary with string as Key and dictionary as Value, as so:

>>> dict_dict = dict()
>>> dict_cat = {'grumpy':14}
>>> dict_dog = {'grumpy':1}
>>> dict_dict['cat'] = dict_cat
>>> dict_dict['dog'] = dict_dog

Although the end result is the same, the 2nd method requires new dimension 1 co-ordinates to be explicitly added as a dictionary item, and using the matrix coordinate for assignment will cause a failure, for example:

>>> dict_dict['horse']['grumpy'] = 5 #errors

Traceback (most recent call last):
File "<pyshell#51>", line 1, in <module>
dict_dict['horse']['grumpy'] = 5
KeyError: 'horse'
>>> dict_dict['horse'] = {'grumpy':5} #works fine
>>> dict_dict['cat']['drunk'] = 3 #adding new elements to dimension 2 works fine

One little quirk to note here is that because our measure field is not typed (the Value of the inner-most dictionary), that means it does not have a default value (for example type Integer has a default value of 0), and so type specific functions (such as increment) will not succeed on the first iteration (when the new dimension coordinates is being added). An example might be better at explaining this:

>>> matrix_dict['duck']['grumpy'] += 1

Traceback (most recent call last):
File "<pyshell#54>", line 1, in <module>
matrix_dict['duck']['grumpy']+= 1
KeyError: 'duck'

And that is it really, a very simple post today!

* * * *   1 vote
Tags: , ,
1 reply
  1. Isaac says:

    Awesome. Hashmaps and dictionaries are my favorite data type.
    Sometimes I wonder why anyone uses anything else. LoL!!


Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>