Let's say you have an array, which is sorted, and then some small number of the elements become corrupt, so the array is not sorted anymore. Is there some efficient way to sort it again that doesn't involve resorting the whole array? (You may not necessarily know which elements are corrupt)

Another way you could do it is Bubblesort, where you scan through the entire array, swapping the elements that are out of place. this is an O(n^2) algorithm (the average time for sorting an array of n elements is n^2 comparisons), but if you only have a few elements that are out of place, it's actually quite fast. It also has a wikipedia article and a c/c++ implementation

Are you doing this in c++ or another language? I'm only really familiar with these things in general terms, and in specific c++ implementations.

A good sort to use if there is log n or more unsorted items is Introspective Sort, which I think is the sort implemented in the c++ stl most of the time.

It's like quicksort, except it knows when quicksort is going to have n^2 time and chooses a different algorithm accordingly, and then it uses a different algorithm, which is heapsort, I think. Heapsort isn't a stable sort so it may end up resorting sorted items, but it still has a worst and average case n log n.

In particular, if say the last log n elements of the array become corrupted, and they now need to be moved to the beginning, then these algorithms don't do any better (unless I misunderstand you).

Insertion sort is supposed to be a little harder to write than bubble sort (except that I can always remember why insertion sort works), but it's better at nearly sorted data than bubble sort.

Shell Sort works by using insertion sort of portions of an array to get the whole array into insertion sort's best case. Until Tony Hoare worked out QuickSort, I believe Shell Sort was considered the best in-place sort algorithm, and it's still better than QuickSort for small arrays.

Until I covered sorting in college, I sorted playing cards with selection sort. Now I do something between merge sort and radix sort.

1. turn the array into a linked list 2. remove the elements that are out of place and put them into a separate array (or linked list) 3. sort the smaller list 4. do a merge

as you can see, this is O(n) as long as the number of out-of-place elements is O(n/log n).

the only tricky part is step 2. if you expect the list to be almost-sorted-increasing, and you find that the i'th element is bigger than the i+1'th element, is it because the former is too big or the latter is too small?

you can take care of that by storing a small amount of state as you traverse the list.

Then it becomes (10 elements have changed) 1 2 3 4 5 101 102 103 104 105 .. 110 16 17 18 19 ... 99 100

As you are going along the array, it's not clear if the elements 101-110 are incorrectly-placed, or if the elements 16-26 are incorrectly placed without doing a lot of work.

that's why i said you need to remember some state. if you have an upper bound M on the number of elements that might be out of order, in any consecutive block of 2M+1 elements you have enough information to determine which are the incorrect ones.

You scan through the array, splitting it into ascending chunks, and build any kind of balanced tree on top of each chunk; this takes O(chunk size) since the chunk is sorted, and there will be O(number of spoilt elements) such trees; in total building trees will take O(n). Then you merge the trees; merging two trees can usually be done in O(log a + log b) where a,b are sizes of the trees. Looks like you end up with O(n*log m) where m is the number of spoilt elements.

Indeed, split the array into m ascending chunks. Then put all first elements of the chunks into a heap (together with the information about which chunk the element comes from). Now each time you take the smallest element from the heap, add it to the answer, remove it from the heap, and place into the heap the next element of the corresponding chunk (if any).

Also O(n log m), but only one data structure. And only O(m) additional memory used.

Oh. I think, what we are know, how items was corrupted. *reread post* Oh, yes, i inattentive with first read. My opinion: very effective this... buble sort (: serious. O( count_of_corrupted_item * count_of_overall_item).

If there are more than O(log n) corrupted items, this solution becomes no better than simply sorting the whole array from scratch. A good solution should become O(n log n) only in the case where there are O(n) corrupted items.

How do you find out which items are corrupted? That's the hardest part here, to my mind.

That's easy - corrupted chunks are the ones which are not in order (not smaller and bigger then their neighbours). Well, some corrupted chunks may be in right place and can't be detected, but there is no need to move them.

Yes, I naively not realize that corrupted items can be placed in a run. But that can be resolved if we drop the idea to pick only corrupted items and allow us to pick some more (but just a few ones, say O(k), where k is a number of corrupted elements). Let's scan the array left to right finding items that out of order (are bigger then next ones if array is sorted in ascending order) and place first and last indexes of intervals where items are in order in additional array (there can be no more then k intervals). Indexes of items between this intervals goes directly to array of corrupted elements. Finding optimal subset of this intervals that is ordered properly is complex, but we can use a greedy heuristic. First let's merge the neighbouring intervals if they are ordered about each other (smallest element is second interval is bigger then largest element in first interval). Then while there are more then one interval (not all interval collapsed) let's pick the smallest by number of elements interval and remove it (it`s items goes to discribed above array) and collapse if possible the intervals before and after it. Correct me if I wrong, but it seems that in this way we pick no more then k additional elements. This can be done in O(k^2) operations it worst case, but lately we need O(n) operatoins any way. If needed we can use a tree instead of array and use just O(k log k) operatons.

Finily let's sort the indexes of founded items, move them to small additional array and remove them from big array (O(n)). Then, like discribed by zabivator, sort this small array and merge it with big one. I should mention that merging can be done without additional memory if we scan both arrays from end and put merged elements to the end of space previously used by big array.

So we totaly use O(k) additonal memory and O(n+const*k^2) operations (can be improved to O(n+const*k*log(k))).

If you want simplicity, use Bubble Sort. For small number of corrupt elements it is quite fast, just as sarah_briarmoss said. And you don't have to use another data structure, you can work on the array you hold elements in.

Is this hypothetical? If not, could you describe how the array is used, and how the corruption occurs? The reason I ask is that distributional assumptions can matter, both for the data, and for the length of the error block. Also, if it isn't hypothetical, it's worth revisiting the source of corruption and making sure things aren't getting borked.

If it is hypothetical and I don't get any assumptions, I would traverse the array, finding the first two ascending lists, and then do a modified binary search/merge of the smaller into the larger (if one list is much smaller than the other) or a merge (if they were approximately equal size. I believe this is O(m * lg n)+O(n) with constant extra memory, and a very small constant on the O(n) term.

There are two interpretations to your question. I'll address the one that I didn't think you meant first, because it is easier.

1) If you want the data in descending order, you can rewrite the algorithm to handle that, but it is easier just to sort it ascending, and then swap.

2) The trick to handling descending sequences is that you are only handling two sequences at any given time in a linear pass, so that a descending sequence is treated as a series of single element ascending sequences. For example

1,2,5,4,3,6,7

1,2,5, 4 are the first two ascending sequences. Binary sort 4 into 1,2,5.

1,2,4,5 3,6,7 are the first two ascending sequences. They are about the same size, so it's probably worthwhile performing a merge.

1,2,3,4,5,6,7 and we're done

At any given time, you are just merging or binary inserting two ascending sequences, and ignoring the rest of the array, as we know after the merge or insert, we'll have reduced the problem to having an ascending sequence, and finding another one in the head of the remaining array and merging it.

What do you mean by binary sorting? Searching for a proper place in array for a key ,using a binary search, and shifting the right part of an array one position to the right? If yes it would work in worst case just the same time as silly merging.

I'll just summarize the algorithms discussed in this post: O(n log n) - Resort the array. O(k n) - Merge consecutive ascending sequences. O(n log k) - Merge ascending sequences using a heap. O(n + k log k) - Optimal algorithm (I don't want to spoil the surprise...)

sarah_briarmossFebruary 8 2009, 00:48:30 UTC 8 years ago

It's a very similar algorithm to the one a human would use to sort a deck of cards. It has a wikipedia article and a c/c++ implementation

Another way you could do it is Bubblesort, where you scan through the entire array, swapping the elements that are out of place. this is an O(n^2) algorithm (the average time for sorting an array of n elements is n^2 comparisons), but if you only have a few elements that are out of place, it's actually quite fast. It also has a wikipedia article and a c/c++ implementation

emptywarrior37February 8 2009, 00:52:13 UTC 8 years ago

sarah_briarmossFebruary 8 2009, 00:57:54 UTC 8 years ago

A good sort to use if there is log n or more unsorted items is Introspective Sort, which I think is the sort implemented in the c++ stl most of the time.

It's like quicksort, except it knows when quicksort is going to have n^2 time and chooses a different algorithm accordingly, and then it uses a different algorithm, which is heapsort, I think. Heapsort isn't a stable sort so it may end up resorting sorted items, but it still has a worst and average case n log n.

http://en.wikipedia.org/wiki/Introsort

emptywarrior378 years ago

hearthand8 years ago

chouyu_318 years ago

emptywarrior37February 8 2009, 00:55:54 UTC 8 years ago

pstscrptFebruary 10 2009, 13:13:41 UTC 8 years ago

Shell Sort works by using insertion sort of portions of an array to get the whole array into insertion sort's best case. Until Tony Hoare worked out QuickSort, I believe Shell Sort was considered the best in-place sort algorithm, and it's still better than QuickSort for small arrays.

Until I covered sorting in college, I sorted playing cards with selection sort. Now I do something between merge sort and radix sort.

arvindnFebruary 8 2009, 01:58:03 UTC 8 years ago

1. turn the array into a linked list

2. remove the elements that are out of place and put them into a separate array (or linked list)

3. sort the smaller list

4. do a merge

as you can see, this is O(n) as long as the number of out-of-place elements is O(n/log n).

the only tricky part is step 2. if you expect the list to be almost-sorted-increasing, and you find that the i'th element is bigger than the i+1'th element, is it because the former is too big or the latter is too small?

you can take care of that by storing a small amount of state as you traverse the list.

emptywarrior37February 8 2009, 02:44:21 UTC 8 years ago

Say you have initially

1 2 3 4 5 ... 99 100

Then it becomes (10 elements have changed)

1 2 3 4 5 101 102 103 104 105 .. 110 16 17 18 19 ... 99 100

As you are going along the array, it's not clear if the elements 101-110 are incorrectly-placed, or if the elements 16-26 are incorrectly placed without doing a lot of work.

arvindnFebruary 8 2009, 03:31:20 UTC 8 years ago

emptywarrior378 years ago

emptywarrior378 years ago

chouyu_318 years ago

antilamerFebruary 8 2009, 06:46:54 UTC 8 years ago

Looks like you end up with O(n*log m) where m is the number of spoilt elements.

antilamerFebruary 8 2009, 06:47:30 UTC 8 years ago

darnleyFebruary 8 2009, 08:33:45 UTC 8 years ago

Indeed, split the array into

mascending chunks. Then put all first elements of the chunks into a heap (together with the information about which chunk the element comes from). Now each time you take the smallest element from the heap, add it to the answer, remove it from the heap, and place into the heap the next element of the corresponding chunk (if any).Also

O(nlogm), but only one data structure. And onlyO(m) additional memory used.antilamer8 years ago

worlds_owner8 years ago

mapjunkie8 years ago

czarandy8 years ago

mapjunkie8 years ago

zamotivatorFebruary 8 2009, 09:01:13 UTC 8 years ago

2) Use merge sort for sorted big array and sorted small array O( n_big * n_small )

Result: O( n_small * ( log n_small + n_big ) )

antilamerFebruary 8 2009, 09:08:05 UTC 8 years ago

zamotivatorFebruary 8 2009, 09:13:51 UTC 8 years ago

*reread post* Oh, yes, i inattentive with first read. My opinion: very effective this... buble sort (: serious. O( count_of_corrupted_item * count_of_overall_item).

antilamerFebruary 8 2009, 09:18:02 UTC 8 years ago

A good solution should become O(n log n) only in the case where there are O(n) corrupted items.

subdmitryFebruary 9 2009, 01:32:43 UTC 8 years ago

How do you find out which items are corrupted? That's the hardest part here, to my mind.That's easy - corrupted chunks are the ones which are not in order (not smaller and bigger then their neighbours). Well, some corrupted chunks may be in right place and can't be detected, but there is no need to move them.

zabivator, good solution!

subdmitryFebruary 9 2009, 03:20:02 UTC 8 years ago

Finily let's sort the indexes of founded items, move them to small additional array and remove them from big array (O(n)). Then, like discribed by zabivator, sort this small array and merge it with big one. I should mention that merging can be done without additional memory if we scan both arrays from end and put merged elements to the end of space previously used by big array.

So we totaly use O(k) additonal memory and O(n+const*k^2) operations (can be improved to O(n+const*k*log(k))).

subdmitry8 years ago

zamotivator8 years ago

subdmitry8 years ago

antilamerFebruary 9 2009, 05:06:10 UTC 8 years ago

Which items are corrupted here?

subdmitry8 years ago

makingthematrixFebruary 8 2009, 11:26:19 UTC 8 years ago

sarah_briarmosssaid. And you don't have to use another data structure, you can work on the array you hold elements in.mapjunkieFebruary 8 2009, 17:17:02 UTC 8 years ago

If it is hypothetical and I don't get any assumptions, I would traverse the array, finding the first two ascending lists, and then do a modified binary search/merge of the smaller into the larger (if one list is much smaller than the other) or a merge (if they were approximately equal size. I believe this is O(m * lg n)+O(n) with constant extra memory, and a very small constant on the O(n) term.

worlds_ownerFebruary 9 2009, 23:31:06 UTC 8 years ago

How it would work on decreasing sequence?

mapjunkieFebruary 10 2009, 01:01:18 UTC 8 years ago

1) If you want the data in descending order, you can rewrite the algorithm to handle that, but it is easier just to sort it ascending, and then swap.

2) The trick to handling descending sequences is that you are only handling two sequences at any given time in a linear pass, so that a descending sequence is treated as a series of single element ascending sequences. For example

1,2,5,4,3,6,7

1,2,5, 4 are the first two ascending sequences. Binary sort 4 into 1,2,5.

1,2,4,5 3,6,7 are the first two ascending sequences. They are about the same size, so it's probably worthwhile performing a merge.

1,2,3,4,5,6,7 and we're done

At any given time, you are just merging or binary inserting two ascending sequences, and ignoring the rest of the array, as we know after the merge or insert, we'll have reduced the problem to having an ascending sequence, and finding another one in the head of the remaining array and merging it.

worlds_ownerFebruary 10 2009, 01:09:37 UTC 8 years ago

mapjunkie8 years ago

mapjunkie8 years ago

worlds_owner8 years ago

mapjunkie8 years ago

czarandyFebruary 11 2009, 15:22:55 UTC 8 years ago

czarandyFebruary 11 2009, 15:26:19 UTC 8 years ago

O(n log n) - Resort the array.

O(k n) - Merge consecutive ascending sequences.

O(n log k) - Merge ascending sequences using a heap.

O(n + k log k) - Optimal algorithm (I don't want to spoil the surprise...)