emptywarrior37 wrote in algorithms,

Partially sorted array

Let's say you have a sorted array, and then some small number of its elements become corrupt, so the array is not sorted anymore. Is there some efficient way to sort it again that doesn't involve re-sorting the whole array? (You may not necessarily know which elements are corrupt.)

sarah_briarmoss

February 8 2009, 00:48:30 UTC

Yes, it's called Insertion Sort.

It's very similar to the algorithm a human would use to sort a deck of cards. It has a Wikipedia article and a C/C++ implementation.
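
For concreteness, a minimal C++ sketch of the idea (my own code, not the linked implementation, and assuming int elements):

    #include <vector>

    // Insertion sort: grow a sorted prefix, sliding each new element
    // left into place. On nearly sorted input almost nothing moves,
    // so it runs in close to O(n) time.
    void insertion_sort(std::vector<int>& a) {
        for (std::size_t i = 1; i < a.size(); ++i) {
            int key = a[i];
            std::size_t j = i;
            while (j > 0 && a[j - 1] > key) {
                a[j] = a[j - 1];  // shift larger elements one slot right
                --j;
            }
            a[j] = key;
        }
    }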


Another way you could do it is bubble sort, where you scan through the entire array, swapping elements that are out of place. This is an O(n^2) algorithm (the average time for sorting an array of n elements is on the order of n^2 comparisons), but if you only have a few elements out of place, it's actually quite fast. It also has a Wikipedia article and a C/C++ implementation.
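
And a corresponding bubble sort sketch with the early-exit check that makes it fast on almost-sorted input (again my own code, same assumptions):

    #include <utility>
    #include <vector>

    // Bubble sort: sweep the array repeatedly, swapping adjacent
    // out-of-order pairs, and stop as soon as a sweep makes no swap.
    // An almost-sorted array needs only a few sweeps.
    void bubble_sort(std::vector<int>& a) {
        bool swapped = true;
        while (swapped) {
            swapped = false;
            for (std::size_t i = 1; i < a.size(); ++i) {
                if (a[i - 1] > a[i]) {
                    std::swap(a[i - 1], a[i]);
                    swapped = true;
                }
            }
        }
    }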

emptywarrior37

February 8 2009, 00:52:13 UTC

Let's say that there are about log n corrupted elements. Then neither of these does any better than sorting the array all over again.

sarah_briarmoss

February 8 2009, 00:57:54 UTC

Are you doing this in C++ or another language? I'm only really familiar with these things in general terms and in specific C++ implementations.

A good sort to use if there are log n or more unsorted items is Introspective Sort, which I think is the sort implemented in the C++ STL most of the time.

It's like quicksort, except that it detects when quicksort is headed for O(n^2) behavior and switches to a different algorithm, heapsort, I think. Heapsort isn't a stable sort, so it may end up re-sorting already-sorted items, but it still has worst- and average-case O(n log n).

http://en.wikipedia.org/wiki/Introsort
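
In practice that just means calling std::sort, which the major standard-library implementations build on introsort (the standard only pins down the complexity, not the algorithm):

    #include <algorithm>
    #include <vector>

    int main() {
        std::vector<int> a = {5, 2, 8, 1, 9};
        // Typically introsort under the hood: quicksort that switches
        // to heapsort when the recursion gets suspiciously deep.
        std::sort(a.begin(), a.end());
    }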

emptywarrior37

February 8 2009, 01:07:47 UTC

Yes, but it still takes n log n time... I want something asymptotically better (say n is large...)


emptywarrior37

February 8 2009, 00:55:54 UTC

In particular, if, say, the last log n elements of the array become corrupted so that they now need to be moved to the beginning, then these algorithms don't do any better (unless I misunderstand you).

pstscrpt

February 10 2009, 13:13:41 UTC

Insertion sort is supposed to be a little harder to write than bubble sort (except that I can always remember why insertion sort works), but it handles nearly sorted data better than bubble sort.

Shell Sort works by running insertion sort on portions of the array to get the whole array into insertion sort's best case. Until Tony Hoare worked out QuickSort, I believe Shell Sort was considered the best in-place sorting algorithm, and it's still better than QuickSort for small arrays.
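
A minimal C++ sketch of Shell Sort, using the simple gap-halving sequence for illustration (better gap sequences exist):

    #include <vector>

    // Shell sort: run insertion sort over gap-separated subsequences
    // with decreasing gaps, so the final gap-1 pass sees nearly
    // sorted data -- insertion sort's best case.
    void shell_sort(std::vector<int>& a) {
        for (std::size_t gap = a.size() / 2; gap > 0; gap /= 2) {
            for (std::size_t i = gap; i < a.size(); ++i) {
                int key = a[i];
                std::size_t j = i;
                while (j >= gap && a[j - gap] > key) {
                    a[j] = a[j - gap];  // shift within this subsequence
                    j -= gap;
                }
                a[j] = key;
            }
        }
    }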

Until I covered sorting in college, I sorted playing cards with selection sort. Now I do something between merge sort and radix sort.

arvindn

February 8 2009, 01:58:03 UTC

Easy.

1. turn the array into a linked list
2. remove the elements that are out of place and put them into a separate array (or linked list)
3. sort the smaller list
4. do a merge

As you can see, this is O(n) as long as the number of out-of-place elements is O(n/log n).

The only tricky part is step 2: if you expect the list to be almost sorted increasing, and you find that the i-th element is bigger than the (i+1)-th element, is it because the former is too big or the latter is too small?

You can take care of that by storing a small amount of state as you traverse the list.
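
Steps 3 and 4 are the easy part; here is a C++ sketch of them, assuming step 2 has already split the input into kept (still sorted) and pulled (the out-of-place elements):

    #include <algorithm>
    #include <iterator>
    #include <vector>

    // Sort the k pulled-out elements (O(k log k)), then merge them
    // back into the still-sorted remainder (O(n)).
    std::vector<int> resort(const std::vector<int>& kept,
                            std::vector<int> pulled) {
        std::sort(pulled.begin(), pulled.end());
        std::vector<int> out;
        out.reserve(kept.size() + pulled.size());
        std::merge(kept.begin(), kept.end(),
                   pulled.begin(), pulled.end(),
                   std::back_inserter(out));
        return out;
    }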

emptywarrior37

February 8 2009, 02:44:21 UTC

It's not clear how to do step 2.

Say you have initially
1 2 3 4 5 ... 99 100

Then it becomes (10 elements have changed)
1 2 3 4 5 101 102 103 104 105 ... 110 16 17 18 19 ... 99 100

As you go along the array, it's not clear, without doing a lot of work, whether the elements 101-110 are the incorrectly placed ones, or the elements 16, 17, ... that follow them.

arvindn

February 8 2009, 03:31:20 UTC

That's why I said you need to remember some state. If you have an upper bound M on the number of elements that might be out of order, then in any consecutive block of 2M+1 elements you have enough information to determine which are the incorrect ones: at most M of them can be corrupt, so the majority of the block forms a consistent ascending run, and the elements that disagree with it are the misplaced ones.


antilamer

February 8 2009, 06:46:54 UTC

You scan through the array, splitting it into ascending chunks, and build any kind of balanced tree on top of each chunk; this takes O(chunk size) since the chunk is sorted, and there will be O(number of spoilt elements) such trees; in total building trees will take O(n). Then you merge the trees; merging two trees can usually be done in O(log a + log b) where a,b are sizes of the trees.
Looks like you end up with O(n*log m) where m is the number of spoilt elements.

antilamer

February 8 2009, 06:47:30 UTC

And in the end you just flatten the resulting tree.

darnley

February 8 2009, 08:33:45 UTC

Good one! Can be simplified a bit.

Indeed, split the array into m ascending chunks. Then put all first elements of the chunks into a heap (together with the information about which chunk the element comes from). Now each time you take the smallest element from the heap, add it to the answer, remove it from the heap, and place into the heap the next element of the corresponding chunk (if any).

Also O(n log m), but only one data structure. And only O(m) additional memory used.
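
A C++ sketch of the whole scheme (my code, not darnley's; it writes the answer to a fresh array for simplicity, so the O(m) figure is for memory beyond the output):

    #include <functional>
    #include <queue>
    #include <tuple>
    #include <vector>

    // One linear scan: index i starts a new ascending chunk
    // whenever a[i] < a[i-1].
    std::vector<std::size_t> chunk_starts(const std::vector<int>& a) {
        std::vector<std::size_t> starts;
        if (!a.empty()) starts.push_back(0);
        for (std::size_t i = 1; i < a.size(); ++i)
            if (a[i] < a[i - 1]) starts.push_back(i);
        return starts;
    }

    // Seed a min-heap with each chunk's first element; repeatedly pop
    // the minimum, append it to the answer, and push the next element
    // of the same chunk. Heap size stays at most m, so this is O(n log m).
    std::vector<int> merge_chunks(const std::vector<int>& a) {
        const auto starts = chunk_starts(a);
        using Item = std::tuple<int, std::size_t, std::size_t>;  // value, pos, chunk end
        std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
        for (std::size_t c = 0; c < starts.size(); ++c) {
            std::size_t end = c + 1 < starts.size() ? starts[c + 1] : a.size();
            heap.emplace(a[starts[c]], starts[c], end);
        }
        std::vector<int> out;
        out.reserve(a.size());
        while (!heap.empty()) {
            auto [value, pos, end] = heap.top();
            heap.pop();
            out.push_back(value);
            if (pos + 1 < end) heap.emplace(a[pos + 1], pos + 1, end);
        }
        return out;
    }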

antilamer

February 8 2009, 09:03:44 UTC

Whoa, cool!
I wonder if it can be parallelized equally well. Probably it can, using some kind of mergeable heaps, like Okasaki's.

worlds_owner

February 8 2009, 21:14:28 UTC

Stupid merge sort of the ascending chunks (1st with 2nd, 3rd with 4th, etc.) would have a smaller constant than the heap, wouldn't it?
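
For reference, a C++ sketch of that pairwise scheme, using std::inplace_merge for each merge (linear when it can grab a scratch buffer); each pass halves the number of chunks, so about log m passes and O(n log m) total:

    #include <algorithm>
    #include <vector>

    // b holds the chunk boundaries: b[0] = 0, b.back() = a.size(), and
    // each middle entry is the index where a new ascending chunk begins.
    // Each pass merges chunk 1 with 2, 3 with 4, and so on.
    void pairwise_merge(std::vector<int>& a, std::vector<std::size_t> b) {
        while (b.size() > 2) {
            std::vector<std::size_t> next;
            std::size_t i = 0;
            while (i + 2 < b.size()) {
                std::inplace_merge(a.begin() + b[i],
                                   a.begin() + b[i + 1],
                                   a.begin() + b[i + 2]);
                next.push_back(b[i]);
                i += 2;
            }
            // Carry over a leftover odd chunk and the final boundary.
            next.insert(next.end(), b.begin() + i, b.end());
            b = std::move(next);
        }
    }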


zamotivator

February 8 2009, 09:01:13 UTC

1) Sort only the corrupted items: O(n_small log n_small).
2) Merge the sorted big array with the sorted small array: O(n_big + n_small).

Result: O(n_big + n_small log n_small).

antilamer

February 8 2009, 09:08:05 UTC

How do you find out which items are corrupted? That's the hardest part here, to my mind.

zamotivator

February 8 2009, 09:13:51 UTC

Oh. I thought we knew which items were corrupted. *rereads post* Oh yes, I was inattentive on the first read. My suggestion: seriously, bubble sort (: It's O(count_of_corrupted_items * count_of_all_items).


subdmitry

February 9 2009, 01:32:43 UTC

How do you find out which items are corrupted? That's the hardest part here, to my mind.

That's easy - the corrupted chunks are the ones that are not in order (not bigger than their left neighbour and smaller than their right one). Well, some corrupted elements may land in the right place and can't be detected, but there is no need to move those.

zamotivator, good solution!


makingthematrix

February 8 2009, 11:26:19 UTC

If you want simplicity, use Bubble Sort. For a small number of corrupt elements it is quite fast, just as sarah_briarmoss said. And you don't have to use another data structure; you can work directly on the array that holds the elements.

mapjunkie

February 8 2009, 17:17:02 UTC

Is this hypothetical? If not, could you describe how the array is used, and how the corruption occurs? The reason I ask is that distributional assumptions can matter, both for the data, and for the length of the error block. Also, if it isn't hypothetical, it's worth revisiting the source of corruption and making sure things aren't getting borked.

If it is hypothetical and I don't get any assumptions, I would traverse the array, finding the first two ascending lists, and then do a modified binary search/merge of the smaller into the larger (if one list is much smaller than the other) or a plain merge (if they are approximately equal in size). I believe this is O(m * lg n) + O(n) with constant extra memory, and a very small constant on the O(n) term.

worlds_owner

February 9 2009, 23:31:06 UTC

I don't understand your solution.
How would it work on a decreasing sequence?

mapjunkie

February 10 2009, 01:01:18 UTC

There are two interpretations to your question. I'll address the one that I didn't think you meant first, because it is easier.

1) If you want the data in descending order, you can rewrite the algorithm to handle that, but it is easier just to sort it ascending, and then swap.

2) The trick to handling descending sequences is that you are only handling two sequences at any given time in a linear pass, so a descending sequence is treated as a series of single-element ascending sequences. For example:

1,2,5,4,3,6,7

1,2,5 and 4 are the first two ascending sequences. Binary sort 4 into 1,2,5.

1,2,4,5 and 3,6,7 are the first two ascending sequences. They are about the same size, so it's probably worthwhile performing a merge.

1,2,3,4,5,6,7 and we're done

At any given time, you are just merging or binary-inserting two ascending sequences and ignoring the rest of the array: after each merge or insert, the problem reduces to having one ascending sequence, finding the next one at the head of the remaining array, and merging it in.
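
A compact C++ sketch of that pass, using std::inplace_merge for the merge step (a full version would binary-insert instead whenever the new run is much smaller than the sorted prefix, as described above):

    #include <algorithm>
    #include <vector>

    // Keep a sorted prefix, find the next ascending run, merge it in.
    // A descending stretch simply appears as a series of one-element runs.
    void run_merge(std::vector<int>& a) {
        std::size_t sorted_end = a.empty() ? 0 : 1;
        while (sorted_end < a.size()) {
            std::size_t run_end = sorted_end + 1;
            while (run_end < a.size() && a[run_end - 1] <= a[run_end])
                ++run_end;
            std::inplace_merge(a.begin(), a.begin() + sorted_end,
                               a.begin() + run_end);
            sorted_end = run_end;
        }
    }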

worlds_owner

February 10 2009, 01:09:37 UTC

What do you mean by binary sorting? Searching for the proper place in the array for a key using binary search, and then shifting the right part of the array one position to the right? If so, in the worst case it would take just as long as naive merging.


czarandy

February 11 2009, 15:22:55 UTC

Let k be the number of corrupted elements. Then you can actually solve this in O(n + k log k) time, which is optimal.

czarandy

February 11 2009, 15:26:19 UTC

I'll just summarize the algorithms discussed in this post:
O(n log n) - Resort the array.
O(k n) - Merge consecutive ascending sequences.
O(n log k) - Merge ascending sequences using a heap.
O(n + k log k) - Optimal algorithm (I don't want to spoil the surprise...)