Run time complexity:
The number of steps/operations the algorithm needs in order to finish.
Performance:
How fast does the algorithm actually run? This is a bit different from run time complexity, because here you care about how long each operation takes. For example, suppose you need to look up something in a DB to complete your sort: say the list only contains ids, but you want it sorted according to names as well. Looking up those names is expensive and takes time.
// how long it would take to carry out 10 operations of varying cost:
10 operations of x (each x takes 10 ms) = 10 x 10 ms = 100 ms
10 operations of y (each y takes 100 ms) = 10 x 100 ms = 1000 ms
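Here is a minimal sketch of how that per-operation cost creeps into a sort (lookup_name below is a hypothetical stand-in for a slow DB call, not a real API):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* lookup_name is a hypothetical stand-in for a slow DB lookup; in a
   real system this would be a network round trip costing milliseconds. */
static const char *lookup_name(int id) {
    static const char *names[] = { "John", "Anna", "Kelly", "David" };
    return names[id % 4];
}

/* qsort comparator over ids: every comparison pays for two lookups,
   so an O(n log n) sort performs O(n log n) expensive operations. */
static int compare_by_name(const void *a, const void *b) {
    return strcmp(lookup_name(*(const int *)a),
                  lookup_name(*(const int *)b));
}

int main(void) {
    int ids[] = { 3, 1, 0, 2 };
    qsort(ids, 4, sizeof ids[0], compare_by_name);
    for (int i = 0; i < 4; i++)
        printf("%d: %s\n", ids[i], lookup_name(ids[i]));
    return 0;
}

Fetching each name once up front (n lookups) and then sorting cached (id, name) pairs would keep the slow operation out of the comparator entirely.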
Another topic that falls under performance is cache locality. Whether you're frequently accessing nearby items while you're processing a list matters, especially when your list is large and does not fit in cache. If you're constantly thrashing, moving things in and out of the cache, you have bad cache locality. If you work on one subset of the list, then move on and work on the next subset, without constantly traversing the entire list, you have good cache locality.
Good cache locality: quick sort
Bad cache locality: heap sort, merge sort (if not done in place)
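Not a sorting algorithm, but a minimal sketch of the underlying access-pattern difference: both functions below compute the same sum over the same matrix, yet the row-major walk touches memory sequentially and uses every cache line it fetches, while the column-major walk jumps a full row between accesses and misses constantly once the matrix outgrows the cache.

#include <stdio.h>

#define N 1024

static double grid[N][N];

/* Good locality: walk the matrix in the order it sits in memory
   (row by row), so every cache line fetched is fully used. */
static double sum_row_major(void) {
    double sum = 0.0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            sum += grid[i][j];
    return sum;
}

/* Bad locality: jump a whole row's worth of bytes between accesses,
   so on a large matrix almost every access misses the cache. */
static double sum_col_major(void) {
    double sum = 0.0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            sum += grid[i][j];
    return sum;
}

int main(void) {
    printf("%f %f\n", sum_row_major(), sum_col_major());
    return 0;
}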
Storage (in-place vs not):
Extra storage needed to hold things while sorting a list.
An algorithm is in-place if it does not need extra storage space while sorting the list (aside from a few variables). For example, while sorting, it does not need to create a new list to put the sorted elements into; it can sort and place elements in their proper positions directly in the existing list.
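As a minimal sketch, insertion sort is a classic in-place algorithm: it shifts elements around within the existing array and only needs a few local variables:

#include <stdio.h>

/* In-place insertion sort: the only extra storage is a handful of
   local variables (i, j, key); no second array is allocated. */
static void insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) {
            a[j + 1] = a[j];  /* shift larger elements one slot right */
            j--;
        }
        a[j + 1] = key;       /* drop key into its proper place */
    }
}

int main(void) {
    int a[] = { 5, 2, 4, 6, 1, 3 };
    insertion_sort(a, 6);
    for (int i = 0; i < 6; i++)
        printf("%d ", a[i]);  /* prints 1 2 3 4 5 6 */
    printf("\n");
    return 0;
}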
Stable vs Non-Stable:
An algorithm is stable if it does not change the relative position of items that are equal.
So after sorting a list, the relative position of equal items is guaranteed to stay the same.
Sort the list below by GPA (high to low). Input list:
[(Anna, 4.0), (David, 4.0), (John, 2.0), (Kelly, 3.0)]
Stable should produce:
[(Anna, 4.0), (David, 4.0), (Kelly, 3.0), (John, 2.0)]
The relative position of Anna and David is maintained: Anna still comes before David.
It should not produce the following (though a non-stable algorithm could):
[(David, 4.0), (Anna, 4.0), (Kelly, 3.0), (John, 2.0)]
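Here is a minimal sketch of a stable sort run on exactly that input; insertion sort is stable because the strict comparison below never moves an element past an equal one:

#include <stdio.h>

struct student { const char *name; double gpa; };

/* Stable insertion sort by GPA, high to low. The strict '<' means an
   element is never shifted past one with an equal GPA, so equal items
   keep their original relative order. */
static void sort_by_gpa_desc(struct student a[], int n) {
    for (int i = 1; i < n; i++) {
        struct student key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j].gpa < key.gpa) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

int main(void) {
    struct student list[] = {
        { "Anna", 4.0 }, { "David", 4.0 }, { "John", 2.0 }, { "Kelly", 3.0 }
    };
    sort_by_gpa_desc(list, 4);
    for (int i = 0; i < 4; i++)
        printf("(%s, %.1f) ", list[i].name, list[i].gpa);
    printf("\n");  /* prints (Anna, 4.0) (David, 4.0) (Kelly, 3.0) (John, 2.0) */
    return 0;
}

Note that the C library's qsort makes no stability guarantee, which is why this sketch rolls its own insertion sort.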
You'd choose a stable sort if you care about the current order of the list you're about to sort (because it has some property you want to preserve). So, as in the example above, if I have a list sorted alphabetically and now want to sort it by GPA while keeping the alphabetical ordering among equal GPAs, I'd want to choose a stable algorithm.
Why not just sort and take all the relevant properties into account at once? Who cares which algorithm you use? Because it might be expensive to look up the data needed for the exact ordering you want. Say you have a list with only ids and GPAs. You want that list sorted alphabetically by name, even though names aren't included. If the list is presorted alphabetically, you want to retain that ordering, because if you don't, you have to do a DB lookup (which is slow) to get each name and factor it into your comparison when doing the sort.
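A minimal sketch of that two-pass pattern (stable_sort here is a hypothetical helper, just the insertion sort from above parameterized by a comparator): sort alphabetically first, then stable-sort by GPA, and equal GPAs keep their alphabetical order.

#include <stdio.h>
#include <string.h>

struct student { const char *name; double gpa; };

/* Hypothetical helper: the insertion sort from above, parameterized by
   a comparator (negative/zero means "already in order"). Still stable. */
static void stable_sort(struct student a[], int n,
                        int (*cmp)(const struct student *, const struct student *)) {
    for (int i = 1; i < n; i++) {
        struct student key = a[i];
        int j = i - 1;
        while (j >= 0 && cmp(&a[j], &key) > 0) {
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;
    }
}

static int by_name(const struct student *x, const struct student *y) {
    return strcmp(x->name, y->name);
}

static int by_gpa_desc(const struct student *x, const struct student *y) {
    return (x->gpa < y->gpa) - (x->gpa > y->gpa);
}

int main(void) {
    struct student list[] = {
        { "Kelly", 3.0 }, { "David", 4.0 }, { "Anna", 4.0 }, { "John", 2.0 }
    };
    stable_sort(list, 4, by_name);      /* pass 1: alphabetical */
    stable_sort(list, 4, by_gpa_desc);  /* pass 2: GPA; ties stay alphabetical */
    for (int i = 0; i < 4; i++)
        printf("(%s, %.1f) ", list[i].name, list[i].gpa);
    printf("\n");  /* prints (Anna, 4.0) (David, 4.0) (Kelly, 3.0) (John, 2.0) */
    return 0;
}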