Sorting algorithms for ya
#21
bot-a-bing: https://www.bing.com/search?q=%22yeah+ba...=MOZTSB%22
Reply
#22
HashListSort: a new algorithm for rearranging stuff into order, quickly, invented by CodeGuy. Times of 11ms for N=1,000,000 have been bragged about on Quora. I assume as much on similar machines to mine (2.16 GHz), 4GB main, Intel Celeron mobile processor. Mine has them beaten by a factor of 2 AND this is ALSO faster than FlashSort() by a factor of 25% for large N (ie: N>=16777216). No more than N+25%N auxiliary memory as part of the HashTable is necessary. FAST? YouBetcha. Typical Performance is about 8,000,00Recs/GHzS for N=1,000,000. I doubt there's anything faster. It even includes a modified CountingSort() for HIGHLY redundant data. ALL of these methods are from my Sorting Library, modified for datatype to suit the DOUBLE-precision data used for this demo. To adapt for strings, you'll need to use _CVx functions part of qb64. I spent WEEKS testing this code after developing the prototype in FAR less. BTW, performance is very close to O(n). Only CountingSort will get you there faster. Even for N=100,000,000, execution time is 5.625s @ 2.16Ghz, roughly expected linear time compared to N=1,000,000 (7/128)s (.0546875s). 100 times as much data takes a VERY slight bit longer than the 1% version extrapolated to 100N. .15625s difference for time at N=1,000,000 done 100 times versus N=100,000,000, so (1/640)s difference at 100 times the data. for verification, I added the same sequence check used in ALL my testing. This actually takes LONGER than the sort itself. It is designed to stop and exit if an error is found, the way I know the result is correct if the results are displayed. Yes, it has been thoroughly tested for descending order too, JUST in case you wondered about that.
Code:
TYPE MinMaxRec
    min AS LONG
    max AS LONG
END TYPE
NTest& = 99999999
REDIM Array(0 TO NTest&) AS DOUBLE
FOR q& = 0 TO NTest&
    Array(q&) = INT(RND * ((2 ^ 31) * 2 - 1))
NEXT
PRINT "Starting sort...";
SortOrder& = 1
s! = TIMER(.001)
HashListSort Array(), 0, NTest&, SortOrder&
f! = TIMER(.001)
PRINT "Finished Sort of "; NTest& - 0 + 1
'* sequence check for ascending or descending order (the 1 used in the call to HashListSort)
seqstart! = TIMER(.001)
IF SortOrder& = 1 THEN
    h& = LBOUND(array)
    FOR c& = 0 TO NTest&
        IF Array(s&) < Array(h&) THEN
            STOP
        ELSEIF Array(s&) > Array(h&) THEN
            h& = s&
        END IF
    NEXT
ELSE
    h& = LBOUND(array)
    FOR c& = 0 TO NTest&
        IF Array(s&) > Array(h&) THEN
            STOP
        ELSEIF Array(s&) < Array(h&) THEN
            h& = s&
        END IF
    NEXT
END IF
seqFinish! = TIMER(.001)
'* if there's a sequence error, this will never display. This takes roughly as long as the sort itself.
'* for N=100,000,000, the sequence check actually takes LONGER than the sort itself.
IF f! - s! < seqFinish! - seqstart! THEN
    PRINT f! - s!; " time to sort was faster."; (f! - s!) / (seqFinish! - seqstart!)
    PRINT seqFinish! - seqstart!; " time to sequence check"
ELSE
    PRINT f! - s!; " time to sequence check was faster or same."; (f! - s!) / (seqFinish! - seqstart!)
    PRINT seqFinish! - seqstart!; " time to sequence check"
END IF
'******************************************
'* Yes, this is MY invention, by CodeGuy. Faster than FlashSort and relatively simple.
'* It involves an array roughly 25% bigger than the original array,
'* Yes, you read that Correctly, faster than FlashSort, even with a final InsertionSort.
'* Can also be used in place of CountingSort as it keeps track of repetitions (counts > 1).
'* 09 AUG 2017. 8388608 DOUBLE-precision elements sorted in about 10.95s (actually, a bit less),
'* versus 11.80s for FlashSort. 25% faster than FlashSort at N=16777216.
'******************************************
SUB HashListSort (array() AS DOUBLE, start&, finish&, order&)
    DIM Mrec AS MinMaxRec
    GetMinMax array(), start&, finish&, Mrec
    IF ptrmin& = ptrmax& THEN EXIT SUB
    delta# = array(Mrec.max) - array(Mrec.min)
    Probe& = pRIMEnUMBER&(2 * INT(1.25# * (finish& - start&) / 2) - 1, -1)
    REDIM HashTable(0 TO Probe&) AS DOUBLE
    REDIM Count(0 TO Probe&) AS LONG
    FOR s& = start& TO finish&
        f& = INT(Probe& * (array(s&) - array(Mrec.min)) / delta#)
        DO
            IF f& > Probe& THEN
                f& = f& - Probe&
            END IF
            IF f& < 0 THEN
                f& = f& + Probe&
            END IF
            IF HashTable(f&) = array(s&) THEN
                Count(f&) = Count(f&) + 1
                EXIT DO
            ELSE
                IF Count(f&) = 0 THEN
                    HashTable(f&) = array(s&)
                    Count(f&) = 1
                    EXIT DO
                END IF
            END IF
            f& = f& + 1
        LOOP
    NEXT
    inserted& = start&
    IF order& = 1 THEN
        FOR s& = 0 TO Probe&
            WHILE Count(s&) > 0
                array(inserted&) = HashTable(s&)
                inserted& = inserted& + 1
                Count(s&) = Count(s&) - 1
            WEND
        NEXT
    ELSE
        FOR s& = Probe& TO 0 STEP -1
            WHILE Count(s&) > 0
                array(inserted&) = HashTable(s&)
                inserted& = inserted& + 1
                Count(s&) = Count(s&) - 1
            WEND
        NEXT
    END IF
    ERASE Count, HashTable
    InsertionSort array(), start&, finish&, order&
END SUB

'* This is the original version of the faster min-max vector search
SUB GetMinMax (array() AS DOUBLE, Start&, finish&, minmax AS MinMaxRec)
    n& = finish& - Start&
    IF (n& MOD 2) THEN
        minmax.min = Start&
        minmax.max = Start&
        i& = Start& + 1
    ELSE
        IF (array(Start&) > array(finish&)) THEN
            minmax.max = Start&
            minmax.min = finish&
        ELSE
            minmax.min = finish&
            minmax.max = Start&
        END IF
        i& = Start& + 2
    END IF '

    WHILE (i& < n&)
        IF (array(i&) > array(i& + 1)) THEN
            IF array(i&) > array(minmax.max) THEN
                minmax.max = i&
            END IF
            IF array(i& + 1) < array(minmax.min) THEN
                minmax.min = i& + 1
            END IF
        ELSE
            IF array(i& + 1) > array(minmax.max) THEN
                minmax.max = i& + 1
            END IF
            IF array(i&) < array(minmax.min) THEN
                minmax.min = i&
            END IF
        END IF
        i& = i& + 2
    WEND
END SUB

FUNCTION pRIMEnUMBER& (finish&, direction%)
    N& = finish& - START&
    IF (N& MOD 2) <> 0 THEN
        N& = N& + (2 * direction%)
    END IF
    sqrootn& = 2 * (SQR(N&) \ 2) + 1
    div& = 3
    DO
        IF div& > sqrootn& THEN
            EXIT DO
        ELSE
            IF (N& MOD div&) THEN
                div& = div& + 2
            ELSE
                N& = N& + (2 * direction%)
                sqrootn& = 2 * (SQR(N&) \ 2) + 1
                div& = 3
            END IF
        END IF
    LOOP
    pRIMEnUMBER& = N&
END FUNCTION

'********************
'* InsertionSort is a simple to construct sort. Generally because of its O(n^2) running time, it's usually limited to VERY short runs
'* or used as a final sorting stage of many sorts. it is stable. The advantage of this sort for nearly sorted arrays is it runs in nearly O(n) time.
'********************
SUB InsertionSort (Array() AS DOUBLE, start&, finish&, order&)
    SELECT CASE order&
        CASE 1
            FOR i& = start& + 1 TO finish&
                FOR j& = i& TO start& + 1 STEP -1
                    IF Array(j& - 1) > Array(j&) THEN
                        SWAP Array(j&), Array(j& - 1)
                    ELSE
                        EXIT FOR
                    END IF
                NEXT
            NEXT
        CASE ELSE
            FOR i& = start& + 1 TO finish&
                FOR j& = i& TO start& + 1 STEP -1
                    IF Array(j& - 1) < Array(j&) THEN
                        SWAP Array(j&), Array(j& - 1)
                    ELSE
                        EXIT FOR
                    END IF
                NEXT
            NEXT
    END SELECT
END SUB

The EXACT same code as contributed to QB64.org -- STEAL freely. Maybe even give props to CodeGuy for his contribution to fast and correct algorithms.
Reply
#23
hi CodeGuy
I am wondering ..because you are good at invention
Sorting algo can be used as sorting operators with precedence
is there a such a algorythm?
or i just babeling   Clapping Hands
Reply
#24
Hi, codeguy, I tested your HashListSort with NTest& = 2147483646 (signed long)

and got nothing: no memory error or anything.

Erik.

because in the qb64 wiki it states the upper range of an array element.
dndbbs project:

Links to my MUD: (strictly 16-bit); AKA XP:

Dndbbs executables
http://www.filegate.net/pdn/pdnbasic/dnd50a1e.zip

Dndbbs source
http://www.filegate.net/pdn/pdnbasic/dnd50a1s.zip

Dndbbs upgrade
http://www.filegate.net/pdn/pdnbasic/dnd50a1u.zip

DNDDOOR - https://bit.ly/EriksDNDDoor DUNGEON - https://bit.ly/EriksDungeon
Interpreter - https://bit.ly/EriksSICK Hex Editor - https://bit.ly/EriksHexEditor Utilities - https://bit.ly/EriksUtils
QB45 files: - https://bit.ly/EriksQB45 QB64shell - https://bit.ly/QB64shell Some old QB64 versions: - https://bit.ly/OldQB64
Reply
#25
Unless you have sufficient RAM to store Array(), HashTable(), Count() and such, there is no way you'll be able to store nearly that many elements. That's 12-20 GB alone for 2Billion+ elements with only 2GB main. Even on my machine, I couldn't do that. I would use either space on a drive or sort and merge in manageable chunks. This is a proof-of-concept, which is proven to work within computing environment limits. I sincerely doubt you'll find ANY method faster for n>(1/2) billion. Even FlashSort(), the previous champion was 27+ times SLOWER than HashListSort(). Use it how it's designed and know its limits and it should serve you and others well. I don't know about you, but throughput of 10M+/GhzS is WAY beyond acceptable to me.

Aurel, It is entirely possible to use character orders not corresponding to ASCII. You would simply create your own table of character maps to sorting precedence. But this would require using custom comparators based on this mapping. If you wanted, you could make "z" sort before "b", but "a" sort before "b", although I haven't a clue about why anyone would want to do this.
Reply
#26
Which is why HashListSort() won't sort code:

Code:
NTest& = 2147483646
REDIM Array(0 TO NTest& AS DOUBLE
because it would attempt to assign 16GB of RAM.

Erik.
dndbbs project:

Links to my MUD: (strictly 16-bit); AKA XP:

Dndbbs executables
http://www.filegate.net/pdn/pdnbasic/dnd50a1e.zip

Dndbbs source
http://www.filegate.net/pdn/pdnbasic/dnd50a1s.zip

Dndbbs upgrade
http://www.filegate.net/pdn/pdnbasic/dnd50a1u.zip

DNDDOOR - https://bit.ly/EriksDNDDoor DUNGEON - https://bit.ly/EriksDungeon
Interpreter - https://bit.ly/EriksSICK Hex Editor - https://bit.ly/EriksHexEditor Utilities - https://bit.ly/EriksUtils
QB45 files: - https://bit.ly/EriksQB45 QB64shell - https://bit.ly/QB64shell Some old QB64 versions: - https://bit.ly/OldQB64
Reply
#27
Use it within reasonable limits and compare against FlashSort(), the previous sorting champ I dethroned and you'll see how much faster HashListSort (throughput around 10M+/GHzS) than FlashSort(). DWYT, already been there, done that and it was almost 28 times slower on the exact same dataset. (5.4s vs 160s+) not even a contest. Yes, you CAN eliminate the Sequence checking code. The sorting algorithm HashListSort() is CORRECT when used within its limits and used correctly.
Reply
#28
The newest iteration of  CodeGuySortLib is at QB64(net) and QB64(org)
http://qb64.org/basbin/4041.txt, including a BFPRT implementation.
Reply
#29
Quote:hi CodeGuy
I am wondering ..because you are good at invention
Sorting algo can be used as sorting operators with precedence
is there a such a algorythm?
or i just babeling

@Aurel: You are just babeling..

Erik.

(all hail the great babel)
dndbbs project:

Links to my MUD: (strictly 16-bit); AKA XP:

Dndbbs executables
http://www.filegate.net/pdn/pdnbasic/dnd50a1e.zip

Dndbbs source
http://www.filegate.net/pdn/pdnbasic/dnd50a1s.zip

Dndbbs upgrade
http://www.filegate.net/pdn/pdnbasic/dnd50a1u.zip

DNDDOOR - https://bit.ly/EriksDNDDoor DUNGEON - https://bit.ly/EriksDungeon
Interpreter - https://bit.ly/EriksSICK Hex Editor - https://bit.ly/EriksHexEditor Utilities - https://bit.ly/EriksUtils
QB45 files: - https://bit.ly/EriksQB45 QB64shell - https://bit.ly/QB64shell Some old QB64 versions: - https://bit.ly/OldQB64
Reply