Actually, it’s 5 4 10 12 2 9 8 11 6 7 3 1 for me, but too lazy to edit the image

  • tetris11@lemmy.ml
    4 hours ago

    Which language provides the most random alphabetically sorted sequence?

    Data
    |  N | Eng | Dut | Ger | Tur | Chi | Lex |
    |----+-----+-----+-----+-----+-----+-----|
    |  1 |   8 |   8 |   8 |   6 |   8 |   1 |
    |  2 |  11 |   3 |   3 |   5 |   2 |  10 |
    |  3 |   5 |   1 |   1 |   1 |   9 |  11 |
    |  4 |   4 |  11 |  11 |   9 |   6 |  12 |
    |  5 |   9 |   9 |   5 |   4 |   3 |   2 |
    |  6 |   1 |  10 |   9 |   2 |   4 |   3 |
    |  7 |   7 |  12 |   6 |  10 |   7 |   4 |
    |  8 |   6 |   2 |   7 |  11 |  10 |   5 |
    |  9 |  10 |   4 |   4 |  12 |  12 |   6 |
    | 10 |   3 |   5 |  10 |   8 |  11 |   7 |
    | 11 |  12 |   6 |   2 |   3 |   5 |   8 |
    | 12 |   2 |   7 |  12 |   7 |   1 |   9 |
    

    Sourced from comments in the thread (English from the image, Dutch from [email protected], German from [email protected], Turkish from some rando, Chinese from [email protected], Lexicographical from [email protected])

    Plot with Correlation Scores

    We will compute the Pearson correlation coefficient (r) for each language by comparing the base number (column 1) with the corresponding language column. We will also compute the lag-1 serial correlation of each column by adding staggered columns that hold the previous row's value, which measures how closely each number in a sequence tracks the one before it.
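
    As a rough cross-check of the two scores described above, here is a minimal awk sketch for the English column only. The file name `english.tab` and the `pearson` helper are made up for this example, and it assumes plain "N value" pairs with no header row. It is not the gnuplot `stats` implementation, just the textbook formula.

```shell
# English column from the table above, one "N value" pair per line
# (english.tab is a hypothetical file name used only for this sketch).
printf '%s\n' '1 8' '2 11' '3 5' '4 4' '5 9' '6 1' \
              '7 7' '8 6' '9 10' '10 3' '11 12' '12 2' > english.tab

# Pearson r = cov(x, y) / sqrt(var(x) * var(y)), accumulated in one pass.
# Reads the named file, or stdin when no file is given.
pearson() {
    awk '{ n++; sx += $1; sy += $2; sxx += $1*$1; syy += $2*$2; sxy += $1*$2 }
         END {
             cov = sxy - sx*sy/n
             vx  = sxx - sx*sx/n
             vy  = syy - sy*sy/n
             printf "%+.3f\n", cov / sqrt(vx*vy)
         }' "$@"
}

# Base correlation: row number against the alphabetic position.
r_base=$(pearson english.tab)

# Lag-1 serial correlation: pair each value with the one before it,
# which drops the first row because it has no predecessor.
r_serial=$(awk 'NR > 1 { print prev, $2 } { prev = $2 }' english.tab | pearson)

echo "English  base=$r_base  serial=$r_serial"
```

    Swapping in another language's column only needs a different set of pairs in the `printf` line.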

    Staggered Table
    # Append the previous row's value of each language column as columns 8-13.
    awk '{print $0"\t"prE"\t"prD"\t"prG"\t"prT"\t"prC"\t"prL;prE=$2;prD=$3;prG=$4;prT=$5;prC=$6;prL=$7}' alphabetic.tab \
        | tee alphabetic.tab.stagger
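
    The awk command above reads whitespace-separated columns, so it presumably expects `alphabetic.tab` without the pipes shown in the Data table. One possible way to get there, assuming the table was saved as-is to a file (the name `alphabetic.org` is made up for this sketch, and dropping the header row is a choice made here, not something the original states):

```shell
# Hypothetical input: the Data table exactly as shown, pipes included.
cat > alphabetic.org <<'EOF'
|  N | Eng | Dut | Ger | Tur | Chi | Lex |
|----+-----+-----+-----+-----+-----+-----|
|  1 |   8 |   8 |   8 |   6 |   8 |   1 |
|  2 |  11 |   3 |   3 |   5 |   2 |  10 |
|  3 |   5 |   1 |   1 |   1 |   9 |  11 |
|  4 |   4 |  11 |  11 |   9 |   6 |  12 |
|  5 |   9 |   9 |   5 |   4 |   3 |   2 |
|  6 |   1 |  10 |   9 |   2 |   4 |   3 |
|  7 |   7 |  12 |   6 |  10 |   7 |   4 |
|  8 |   6 |   2 |   7 |  11 |  10 |   5 |
|  9 |  10 |   4 |   4 |  12 |  12 |   6 |
| 10 |   3 |   5 |  10 |   8 |  11 |   7 |
| 11 |  12 |   6 |   2 |   3 |   5 |   8 |
| 12 |   2 |   7 |  12 |   7 |   1 |   9 |
EOF

# Split on pipes, keep numeric rows, and write tab-separated cells.
awk -F'|' '
    $2 ~ /^ *[0-9]+ *$/ {            # skip the header and the rule line
        line = ""
        for (i = 2; i < NF; i++) {   # fields 2..NF-1 sit between the pipes
            gsub(/ /, "", $i)        # strip the padding spaces
            line = line (i > 2 ? "\t" : "") $i
        }
        print line
    }' alphabetic.org > alphabetic.tab
```

    The result has 12 rows of 7 tab-separated numbers, which lines up with the `$2`..`$7` references in the stagger command.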
    
    Plot Code
    gnuplot -p -e '
      set xlabel "Base Sequence";
      set ylabel "Alphabetic";
      set xtics 1,1,12;
      set ytics 1,1,12;
      set title "Alphabetic Number Plot with Correlation Score";
      set rmargin 25; set key at graph 1.5,0.9;
      set size ratio 0.45;
    
      stats "alphabetic.tab.stagger" using 1:2 name "E";
      stats "" using 1:3 name "D";
      stats "" using 1:4 name "G";
      stats "" using 1:5 name "T";
      stats "" using 1:6 name "C";
      stats "" using 1:7 name "L";
      
      stats "" using 2:8 name "ES";
      stats "" using 3:9 name "DS";
      stats "" using 4:10 name "GS";
      stats "" using 5:11 name "TS";
      stats "" using 6:12 name "CS";
      stats "" using 7:13 name "LS";
    
      set label 1 sprintf("%10s  %6s  %6s", "", "Base", "Stagger") at graph 1.07,0.95;
    
      plot "" using 1:2 with lines lw 3 title sprintf("%10s  %+.3f  %+.3f", "English", E_correlation, ES_correlation),
           "" using 1:3 with lines lw 3 title sprintf("%10s  %+.3f  %+.3f", "Dutch", D_correlation, DS_correlation),
           "" using 1:4 with lines lw 3 title sprintf("%10s  %+.3f  %+.3f", "German", G_correlation, GS_correlation),
           "" using 1:5 with lines lw 3 title sprintf("%10s  %+.3f  %+.3f", "Turkish", T_correlation, TS_correlation),
           "" using 1:6 with lines lw 3 title sprintf("%10s  %+.3f  %+.3f", "Chinese", C_correlation, CS_correlation),
           "" using 1:7 with lines lw 1 title sprintf("%10s  %+.3f  %+.3f", "Lexicon", L_correlation, LS_correlation)
    '
    

    It looks like Dutch has the lowest correlation (closest to 0) with both the base sequence and its own staggered sequence, with Turkish somewhat mirroring its staggered randomness.

    The least random alphabetic sequences are English and German.


    Updated: Added Chinese and staggered analysis.