Die Martin Die@sh.itjust.works · 1 day ago

Not for me, tho

tetris11@lemmy.ml · edit-2 4 hours ago

Which language provides the most random alphabetically sorted sequence?

Data

|  N | Eng | Dut | Ger | Tur | Chi | Lex |
|----+-----+-----+-----+-----+-----+-----|
|  1 |   8 |   8 |   8 |   6 |   8 |   1 |
|  2 |  11 |   3 |   3 |   5 |   2 |  10 |
|  3 |   5 |   1 |   1 |   1 |   9 |  11 |
|  4 |   4 |  11 |  11 |   9 |   6 |  12 |
|  5 |   9 |   9 |   5 |   4 |   3 |   2 |
|  6 |   1 |  10 |   9 |   2 |   4 |   3 |
|  7 |   7 |  12 |   6 |  10 |   7 |   4 |
|  8 |   6 |   2 |   7 |  11 |  10 |   5 |
|  9 |  10 |   4 |   4 |  12 |  12 |   6 |
| 10 |   3 |   5 |  10 |   8 |  11 |   7 |
| 11 |  12 |   6 |   2 |   3 |   5 |   8 |
| 12 |   2 |   7 |  12 |   7 |   1 |   9 |

Sourced from comments in thread (English from image, Dutch from [email protected], German from [email protected], Turkish from some rando, Chinese from [email protected], Lexicographical from [email protected]).

Plot with Correlation Scores

We will compute the Pearson correlation coefficient (r) between the base number (column 1) and each language column. We will also compute the serial (lag-1) correlation: staggered columns hold the value from the previous row, so each number can be compared with the one that comes just before it in the sequence.
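For anyone who wants to sanity-check the two scores without gnuplot, here is a rough sketch in plain shell and awk. The `correlations` helper is a made-up name for illustration and is not part of the pipeline used for the plot. It assumes one column is passed as arguments in row order, and it drops the first row for the serial score because that row has no predecessor.

```shell
# Hypothetical helper (not from the original pipeline): prints the base and
# lag-1 serial Pearson r for one column of values given in row order.
correlations() {
  printf '%s\n' "$@" | LC_ALL=C awk '
    # Pearson r from the running sums of n (x, y) pairs.
    function pearson(n, sx, sy, sxx, syy, sxy) {
      return (n*sxy - sx*sy) / sqrt((n*sxx - sx*sx) * (n*syy - sy*sy))
    }
    {
      x = NR; y = $1
      # Base score: row number against the value in that row.
      bn++; bx += x; by += y; bxx += x*x; byy += y*y; bxy += x*y
      # Serial score: value against the previous value, skipping row 1.
      if (NR > 1) {
        sn++; tx += y; ty += prev; txx += y*y; tyy += prev*prev; txy += y*prev
      }
      prev = y
    }
    END {
      printf "base=%+.3f serial=%+.3f\n",
        pearson(bn, bx, by, bxx, byy, bxy),
        pearson(sn, tx, ty, txx, tyy, txy)
    }'
}

# Dutch column from the table, rows 1 to 12.
correlations 8 3 1 11 9 10 12 2 4 5 6 7
```

With the Dutch column this prints `base=-0.035 serial=+0.135`. Swap in any other column from the table to compare.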

Staggered Table

# Append the previous row of each language column as columns 8-13 (the lag-1 "stagger").
cat alphabetic.tab \
    | awk '{print $0"\t"prE"\t"prD"\t"prG"\t"prT"\t"prC"\t"prL;prE=$2;prD=$3;prG=$4;prT=$5;prC=$6;prL=$7}' \
    | tee alphabetic.tab.stagger

Plot Code

gnuplot -p -e '
  set xlabel "Base Sequence";
  set ylabel "Alphabetic";
  set xtics 1,1,12;
  set ytics 1,1,12;
  set title "Alphabetic Number Plot with Correlation Score";
  set rmargin 25; set key at graph 1.5,0.9;
  set size ratio 0.45;

  stats "alphabetic.tab.stagger" using 1:2 name "E";
  stats "" using 1:3 name "D";
  stats "" using 1:4 name "G";
  stats "" using 1:5 name "T";
  stats "" using 1:6 name "C";
  stats "" using 1:7 name "L";
  
  stats "" using 2:8 name "ES";
  stats "" using 3:9 name "DS";
  stats "" using 4:10 name "GS";
  stats "" using 5:11 name "TS";
  stats "" using 6:12 name "CS";
  stats "" using 7:13 name "LS";

  set label 1 sprintf("%10s  %6s  %6s", "", "Base", "Stagger") at graph 1.07,0.95;

  plot "" using 1:2 with lines lw 3 title sprintf("%10s  %+.3f  %+.3f", "English", E_correlation, ES_correlation),
       "" using 1:3 with lines lw 3 title sprintf("%10s  %+.3f  %+.3f", "Dutch", D_correlation, DS_correlation),
       "" using 1:4 with lines lw 3 title sprintf("%10s  %+.3f  %+.3f", "German", G_correlation, GS_correlation),
       "" using 1:5 with lines lw 3 title sprintf("%10s  %+.3f  %+.3f", "Turkish", T_correlation, TS_correlation),
       "" using 1:6 with lines lw 3 title sprintf("%10s  %+.3f  %+.3f", "Chinese", C_correlation, CS_correlation),
       "" using 1:7 with lines lw 1 title sprintf("%10s  %+.3f  %+.3f", "Lexicon", L_correlation, LS_correlation)
'

It looks like Dutch has the lowest (near 0) correlation to both the base sequence and its own staggered sequence, with Turkish somewhat mirroring its staggered randomness.

The least random alphabetic sequences are English and German.

Updated: Added Chinese and staggered analysis.

Resonosity@lemmy.dbzer0.com · 3 hours ago

c/dataisbeautiful

jaybone@lemmy.zip · 1 day ago

You put a lot of work into this.

FeatherConstrictor@sh.itjust.works · 1 day ago

Thank you for doing and sharing this

null@slrpnk.net · 22 hours ago

This is the second comment I’ve seen like this from you.

Please never stop.

Die Martin Die@sh.itjust.works · 21 hours ago

I didn’t expect someone to put that much effort into it.

Thanks! This is awesome!