Let's say you have an array, which is sorted, and then some small number of the elements become corrupt, so the array is not sorted anymore. Is there some efficient way to sort it again that doesn't involve resorting the whole array? (You may not necessarily know which elements are corrupt)

Another way you could do it is Bubblesort, where you scan through the entire array, swapping the elements that are out of place. this is an O(n^2) algorithm (the average time for sorting an array of n elements is n^2 comparisons), but if you only have a few elements that are out of place, it's actually quite fast. It also has a wikipedia article and a c/c++ implementation

Are you doing this in c++ or another language? I'm only really familiar with these things in general terms, and in specific c++ implementations.

A good sort to use if there is log n or more unsorted items is Introspective Sort, which I think is the sort implemented in the c++ stl most of the time.

It's like quicksort, except it knows when quicksort is going to have n^2 time and chooses a different algorithm accordingly, and then it uses a different algorithm, which is heapsort, I think. Heapsort isn't a stable sort so it may end up resorting sorted items, but it still has a worst and average case n log n.

In particular, if say the last log n elements of the array become corrupted, and they now need to be moved to the beginning, then these algorithms don't do any better (unless I misunderstand you).

Insertion sort is supposed to be a little harder to write than bubble sort (except that I can always remember why insertion sort works), but it's better at nearly sorted data than bubble sort.

Shell Sort works by using insertion sort of portions of an array to get the whole array into insertion sort's best case. Until Tony Hoare worked out QuickSort, I believe Shell Sort was considered the best in-place sort algorithm, and it's still better than QuickSort for small arrays.

Until I covered sorting in college, I sorted playing cards with selection sort. Now I do something between merge sort and radix sort.

1. turn the array into a linked list 2. remove the elements that are out of place and put them into a separate array (or linked list) 3. sort the smaller list 4. do a merge

as you can see, this is O(n) as long as the number of out-of-place elements is O(n/log n).

the only tricky part is step 2. if you expect the list to be almost-sorted-increasing, and you find that the i'th element is bigger than the i+1'th element, is it because the former is too big or the latter is too small?

you can take care of that by storing a small amount of state as you traverse the list.

Then it becomes (10 elements have changed) 1 2 3 4 5 101 102 103 104 105 .. 110 16 17 18 19 ... 99 100

As you are going along the array, it's not clear if the elements 101-110 are incorrectly-placed, or if the elements 16-26 are incorrectly placed without doing a lot of work.

that's why i said you need to remember some state. if you have an upper bound M on the number of elements that might be out of order, in any consecutive block of 2M+1 elements you have enough information to determine which are the incorrect ones.

You scan through the array, splitting it into ascending chunks, and build any kind of balanced tree on top of each chunk; this takes O(chunk size) since the chunk is sorted, and there will be O(number of spoilt elements) such trees; in total building trees will take O(n). Then you merge the trees; merging two trees can usually be done in O(log a + log b) where a,b are sizes of the trees. Looks like you end up with O(n*log m) where m is the number of spoilt elements.

Indeed, split the array into m ascending chunks. Then put all first elements of the chunks into a heap (together with the information about which chunk the element comes from). Now each time you take the smallest element from the heap, add it to the answer, remove it from the heap, and place into the heap the next element of the corresponding chunk (if any).

Also O(n log m), but only one data structure. And only O(m) additional memory used.

Oh. I think, what we are know, how items was corrupted. *reread post* Oh, yes, i inattentive with first read. My opinion: very effective this... buble sort (: serious. O( count_of_corrupted_item * count_of_overall_item).

How do you find out which items are corrupted? That's the hardest part here, to my mind.

That's easy - corrupted chunks are the ones which are not in order (not smaller and bigger then their neighbours). Well, some corrupted chunks may be in right place and can't be detected, but there is no need to move them.

If you want simplicity, use Bubble Sort. For small number of corrupt elements it is quite fast, just as sarah_briarmoss said. And you don't have to use another data structure, you can work on the array you hold elements in.

Is this hypothetical? If not, could you describe how the array is used, and how the corruption occurs? The reason I ask is that distributional assumptions can matter, both for the data, and for the length of the error block. Also, if it isn't hypothetical, it's worth revisiting the source of corruption and making sure things aren't getting borked.

If it is hypothetical and I don't get any assumptions, I would traverse the array, finding the first two ascending lists, and then do a modified binary search/merge of the smaller into the larger (if one list is much smaller than the other) or a merge (if they were approximately equal size. I believe this is O(m * lg n)+O(n) with constant extra memory, and a very small constant on the O(n) term.

There are two interpretations to your question. I'll address the one that I didn't think you meant first, because it is easier.

1) If you want the data in descending order, you can rewrite the algorithm to handle that, but it is easier just to sort it ascending, and then swap.

2) The trick to handling descending sequences is that you are only handling two sequences at any given time in a linear pass, so that a descending sequence is treated as a series of single element ascending sequences. For example

1,2,5,4,3,6,7

1,2,5, 4 are the first two ascending sequences. Binary sort 4 into 1,2,5.

1,2,4,5 3,6,7 are the first two ascending sequences. They are about the same size, so it's probably worthwhile performing a merge.

1,2,3,4,5,6,7 and we're done

At any given time, you are just merging or binary inserting two ascending sequences, and ignoring the rest of the array, as we know after the merge or insert, we'll have reduced the problem to having an ascending sequence, and finding another one in the head of the remaining array and merging it.

What do you mean by binary sorting? Searching for a proper place in array for a key ,using a binary search, and shifting the right part of an array one position to the right? If yes it would work in worst case just the same time as silly merging.

I'll just summarize the algorithms discussed in this post: O(n log n) - Resort the array. O(k n) - Merge consecutive ascending sequences. O(n log k) - Merge ascending sequences using a heap. O(n + k log k) - Optimal algorithm (I don't want to spoil the surprise...)

sarah_briarmossFebruary 8 2009, 00:48:30 UTC 7 years ago

It's a very similar algorithm to the one a human would use to sort a deck of cards. It has a wikipedia article and a c/c++ implementation

Another way you could do it is Bubblesort, where you scan through the entire array, swapping the elements that are out of place. this is an O(n^2) algorithm (the average time for sorting an array of n elements is n^2 comparisons), but if you only have a few elements that are out of place, it's actually quite fast. It also has a wikipedia article and a c/c++ implementation

emptywarrior37February 8 2009, 00:52:13 UTC 7 years ago

sarah_briarmossFebruary 8 2009, 00:57:54 UTC 7 years ago

A good sort to use if there is log n or more unsorted items is Introspective Sort, which I think is the sort implemented in the c++ stl most of the time.

It's like quicksort, except it knows when quicksort is going to have n^2 time and chooses a different algorithm accordingly, and then it uses a different algorithm, which is heapsort, I think. Heapsort isn't a stable sort so it may end up resorting sorted items, but it still has a worst and average case n log n.

http://en.wikipedia.org/wiki/Introsort

emptywarrior377 years ago

hearthand7 years ago

chouyu_317 years ago

emptywarrior37February 8 2009, 00:55:54 UTC 7 years ago

pstscrptFebruary 10 2009, 13:13:41 UTC 7 years ago

Shell Sort works by using insertion sort of portions of an array to get the whole array into insertion sort's best case. Until Tony Hoare worked out QuickSort, I believe Shell Sort was considered the best in-place sort algorithm, and it's still better than QuickSort for small arrays.

Until I covered sorting in college, I sorted playing cards with selection sort. Now I do something between merge sort and radix sort.

arvindnFebruary 8 2009, 01:58:03 UTC 7 years ago

1. turn the array into a linked list

2. remove the elements that are out of place and put them into a separate array (or linked list)

3. sort the smaller list

4. do a merge

as you can see, this is O(n) as long as the number of out-of-place elements is O(n/log n).

the only tricky part is step 2. if you expect the list to be almost-sorted-increasing, and you find that the i'th element is bigger than the i+1'th element, is it because the former is too big or the latter is too small?

you can take care of that by storing a small amount of state as you traverse the list.

emptywarrior37February 8 2009, 02:44:21 UTC 7 years ago

Say you have initially

1 2 3 4 5 ... 99 100

Then it becomes (10 elements have changed)

1 2 3 4 5 101 102 103 104 105 .. 110 16 17 18 19 ... 99 100

As you are going along the array, it's not clear if the elements 101-110 are incorrectly-placed, or if the elements 16-26 are incorrectly placed without doing a lot of work.

arvindnFebruary 8 2009, 03:31:20 UTC 7 years ago

emptywarrior37February 8 2009, 03:34:46 UTC 7 years ago

emptywarrior377 years ago

chouyu_317 years ago

antilamerFebruary 8 2009, 06:46:54 UTC 7 years ago

Looks like you end up with O(n*log m) where m is the number of spoilt elements.

antilamerFebruary 8 2009, 06:47:30 UTC 7 years ago

darnleyFebruary 8 2009, 08:33:45 UTC 7 years ago

Indeed, split the array into

mascending chunks. Then put all first elements of the chunks into a heap (together with the information about which chunk the element comes from). Now each time you take the smallest element from the heap, add it to the answer, remove it from the heap, and place into the heap the next element of the corresponding chunk (if any).Also

O(nlogm), but only one data structure. And onlyO(m) additional memory used.antilamerFebruary 8 2009, 09:03:44 UTC 7 years ago

I wonder if it can be parallelized equally well. Probably it can, using some kind of mergeable heaps, like Okasaki's.

worlds_ownerFebruary 8 2009, 21:14:28 UTC 7 years ago

mapjunkie7 years ago

czarandy7 years ago

mapjunkie7 years ago

zamotivatorFebruary 8 2009, 09:01:13 UTC 7 years ago

2) Use merge sort for sorted big array and sorted small array O( n_big * n_small )

Result: O( n_small * ( log n_small + n_big ) )

antilamerFebruary 8 2009, 09:08:05 UTC 7 years ago

zamotivatorFebruary 8 2009, 09:13:51 UTC 7 years ago

*reread post* Oh, yes, i inattentive with first read. My opinion: very effective this... buble sort (: serious. O( count_of_corrupted_item * count_of_overall_item).

antilamer7 years ago

subdmitryFebruary 9 2009, 01:32:43 UTC 7 years ago

How do you find out which items are corrupted? That's the hardest part here, to my mind.That's easy - corrupted chunks are the ones which are not in order (not smaller and bigger then their neighbours). Well, some corrupted chunks may be in right place and can't be detected, but there is no need to move them.

zabivator, good solution!

subdmitry7 years ago

subdmitry7 years ago

zamotivator7 years ago

subdmitry7 years ago

antilamer7 years ago

subdmitry7 years ago

makingthematrixFebruary 8 2009, 11:26:19 UTC 7 years ago

sarah_briarmosssaid. And you don't have to use another data structure, you can work on the array you hold elements in.mapjunkieFebruary 8 2009, 17:17:02 UTC 7 years ago

If it is hypothetical and I don't get any assumptions, I would traverse the array, finding the first two ascending lists, and then do a modified binary search/merge of the smaller into the larger (if one list is much smaller than the other) or a merge (if they were approximately equal size. I believe this is O(m * lg n)+O(n) with constant extra memory, and a very small constant on the O(n) term.

worlds_ownerFebruary 9 2009, 23:31:06 UTC 7 years ago

How it would work on decreasing sequence?

mapjunkieFebruary 10 2009, 01:01:18 UTC 7 years ago

1) If you want the data in descending order, you can rewrite the algorithm to handle that, but it is easier just to sort it ascending, and then swap.

2) The trick to handling descending sequences is that you are only handling two sequences at any given time in a linear pass, so that a descending sequence is treated as a series of single element ascending sequences. For example

1,2,5,4,3,6,7

1,2,5, 4 are the first two ascending sequences. Binary sort 4 into 1,2,5.

1,2,4,5 3,6,7 are the first two ascending sequences. They are about the same size, so it's probably worthwhile performing a merge.

1,2,3,4,5,6,7 and we're done

At any given time, you are just merging or binary inserting two ascending sequences, and ignoring the rest of the array, as we know after the merge or insert, we'll have reduced the problem to having an ascending sequence, and finding another one in the head of the remaining array and merging it.

worlds_ownerFebruary 10 2009, 01:09:37 UTC 7 years ago

mapjunkie7 years ago

mapjunkie7 years ago

worlds_owner7 years ago

mapjunkie7 years ago

czarandyFebruary 11 2009, 15:22:55 UTC 7 years ago

czarandyFebruary 11 2009, 15:26:19 UTC 7 years ago

O(n log n) - Resort the array.

O(k n) - Merge consecutive ascending sequences.

O(n log k) - Merge ascending sequences using a heap.

O(n + k log k) - Optimal algorithm (I don't want to spoil the surprise...)

coiseswrote inalgorithms: ←Pseudo-random dicejtootfwrote inalgorithms: →n-th order statistics of the immutable data