Dataset statistics
Number of variables | 4 |
---|---|
Number of observations | 55125 |
Missing cells | 0 |
Missing cells (%) | 0.0% |
Duplicate rows | 0 |
Duplicate rows (%) | 0.0% |
Total size in memory | 1.7 MiB |
Average record size in memory | 32.0 B |
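The memory figures above agree with each other and with the 430.8 KiB Memory size reported for each variable, which a little arithmetic can confirm. The sketch below assumes 8 bytes per cell and a 128-byte row index; both numbers are inferred from the reported figures and are not stated in the report.

```python
# Sanity-check the reported memory figures.
# Assumptions (not stated in the report): 8 bytes per cell, 128-byte index.
N_ROWS = 55125
N_COLS = 4
BYTES_PER_CELL = 8   # assumed: one 64-bit value or object pointer per cell
INDEX_BYTES = 128    # assumed: fixed overhead of the row index

column_bytes = N_ROWS * BYTES_PER_CELL + INDEX_BYTES
total_bytes = N_ROWS * BYTES_PER_CELL * N_COLS + INDEX_BYTES

column_kib = column_bytes / 1024
total_mib = total_bytes / 1024 ** 2
avg_record_bytes = total_bytes / N_ROWS

print(f"per column: {column_kib:.1f} KiB")
print(f"total: {total_mib:.1f} MiB")
print(f"average record: {avg_record_bytes:.1f} B")
```

Under these assumptions the three printed values round to the figures shown in the report.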
Variable types
Text | 2 |
---|---|
Categorical | 1 |
Numeric | 1 |
Alerts
pos is highly imbalanced (72.7%) | Imbalance |
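The report does not say how the 72.7% figure is derived. One definition that matches it is one minus the Shannon entropy of the class shares divided by the maximum possible entropy. The sketch below assumes that definition and uses the pos counts from the Common Values table of this report; the function name is illustrative.

```python
import math

# Class counts for pos, taken from the Common Values table of this report.
counts = {"名詞": 48999, "動詞": 4252, "副詞": 1207, "形容詞": 665, "助動詞": 2}

def imbalance_score(class_counts):
    """Return 1 - H / H_max, where H is the Shannon entropy of the class shares."""
    class_counts = list(class_counts)
    total = sum(class_counts)
    probs = [c / total for c in class_counts if c > 0]
    if len(probs) < 2:
        return 0.0  # convention chosen here: a single class is not "imbalanced"
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1.0 - entropy / math.log2(len(probs))

score = imbalance_score(counts.values())
print(f"{score:.1%}")
```

A score of 0 would mean perfectly even classes and a score near 1 means one class dominates, which fits a column where 名詞 accounts for 48999 of 55125 rows.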
Reproduction
Analysis started | 2024-08-20 08:04:28.787013 |
---|---|
Analysis finished | 2024-08-20 08:04:30.377965 |
Duration | 1.59 seconds |
Software version | ydata-profiling v4.8.3 |
Download configuration | config.json |
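A report like this one can typically be regenerated with the ydata-profiling package named above. The following is a minimal sketch, not the exact invocation used here: the input path, the report title, and the helper name `build_report` are hypothetical, and the settings stored in config.json are not reproduced.

```python
def build_report(csv_path, out_path="report.html"):
    """Profile a CSV file and write the HTML report to out_path."""
    # Imported lazily so this module still loads where the libraries are absent.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)  # csv_path is a hypothetical input file
    profile = ProfileReport(df, title="Dataset profile")  # title is illustrative
    profile.to_file(out_path)
    return out_path
```

Calling `build_report("lexicon.csv")` (a made-up file name) would write `report.html` next to the script, provided pandas and ydata-profiling are installed.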
lemma
Text
Distinct | 52671 |
---|---|
Distinct (%) | 95.5% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 430.8 KiB |
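The Distinct (%) rows in this report are the distinct count divided by the number of observations. A short sketch of that arithmetic, using the counts reported for each variable. The rule for displaying "< 0.1%" is an assumption based on how small shares appear in the tables.

```python
N_OBSERVATIONS = 55125

# Distinct counts as reported for each variable in this report.
distinct = {"lemma": 52671, "reading": 43079, "pos": 5, "score": 53034}

def distinct_pct(n_distinct, n_rows=N_OBSERVATIONS):
    """Share of distinct values as a percentage of all rows."""
    return 100.0 * n_distinct / n_rows

def display(pct):
    """Format like the report; assumed rule: values rounding to 0.0 show as '< 0.1%'."""
    shown = f"{pct:.1f}%"
    return "< 0.1%" if shown == "0.0%" else shown

for name, n in distinct.items():
    print(f"{name}: {display(distinct_pct(n))}")
```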
Value | Count | Frequency (%) |
ホーム | 12 | < 0.1% |
大和 | 11 | < 0.1% |
太刀 | 9 | < 0.1% |
頭 | 8 | < 0.1% |
ダンス | 8 | < 0.1% |
アップ | 8 | < 0.1% |
ペーパー | 7 | < 0.1% |
大人 | 7 | < 0.1% |
一人 | 7 | < 0.1% |
端 | 7 | < 0.1% |
Other values (52648) | 55211 |
Most occurring characters
Value | Count | Frequency (%) |
る | 2568 | 1.8% |
ー | 1747 | 1.3% |
り | 1488 | 1.1% |
ン | 1329 | 1.0% |
い | 1313 | 0.9% |
し | 1073 | 0.8% |
す | 1028 | 0.7% |
ス | 793 | 0.6% |
ト | 704 | 0.5% |
く | 638 | 0.5% |
Other values (4393) | 126691 |
Most occurring categories, scripts, and blocks
Value | Count | Frequency (%) |
(unknown) | 139372 |
Most frequent character per category, script, and block
(unknown): same rows as the Most occurring characters table above, because every character falls in the single (unknown) group.
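The character tables are plain frequency counts over every character in the column. The report lists category, script, and block as (unknown), but Python's standard library can recover at least the general category and the character name. The sketch below uses a few lemmas that appear in this report rather than the full column. The "script hint" is a rough guess taken from the first word of the Unicode name, since `unicodedata` exposes no script or block property.

```python
import unicodedata
from collections import Counter

# A handful of lemmas that appear in this report; the full column is not reproduced.
lemmas = ["ホーム", "ダンス", "アップ", "ペーパー", "優れる", "褒める", "罵る"]

# Character frequencies across all values, as in "Most occurring characters".
char_counts = Counter(ch for lemma in lemmas for ch in lemma)
total = sum(char_counts.values())

for ch, count in char_counts.most_common(3):
    name = unicodedata.name(ch, "UNKNOWN")
    category = unicodedata.category(ch)  # e.g. "Lo" (other letter), "Lm" (modifier letter)
    script_hint = name.split(" ")[0]     # rough guess from the first word of the name
    print(ch, count, f"{100 * count / total:.1f}%", category, script_hint)
```

Grouping `char_counts` by `category` or `script_hint` instead of by character would give the per-category and per-script breakdowns that this report leaves as (unknown).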
reading
Text
Distinct | 43079 |
---|---|
Distinct (%) | 78.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 430.8 KiB |
Value | Count | Frequency (%) |
しょう | 45 | 0.1% |
し | 40 | 0.1% |
かん | 37 | 0.1% |
き | 34 | 0.1% |
そう | 33 | 0.1% |
とう | 30 | 0.1% |
せい | 29 | 0.1% |
けい | 28 | 0.1% |
けん | 27 | < 0.1% |
かい | 25 | < 0.1% |
Other values (43055) | 54975 |
Most occurring characters
Value | Count | Frequency (%) |
う | 17636 | 7.7% |
ん | 16437 | 7.2% |
い | 13587 | 5.9% |
し | 10682 | 4.7% |
く | 8244 | 3.6% |
き | 7118 | 3.1% |
ょ | 6936 | 3.0% |
か | 6878 | 3.0% |
り | 4794 | 2.1% |
こ | 4738 | 2.1% |
Other values (144) | 132233 |
Most occurring categories, scripts, and blocks
Value | Count | Frequency (%) |
(unknown) | 229283 |
Most frequent character per category, script, and block
(unknown): same rows as the Most occurring characters table above, because every character falls in the single (unknown) group.
pos
Categorical
Alert: Imbalance
Distinct | 5 |
---|---|
Distinct (%) | < 0.1% |
Missing | 0 |
Missing (%) | 0.0% |
Memory size | 430.8 KiB |
Common Values
Value | Count | Frequency (%) |
名詞 | 48999 | 88.9% |
動詞 | 4252 | 7.7% |
副詞 | 1207 | 2.2% |
形容詞 | 665 | 1.2% |
助動詞 | 2 | < 0.1% |
Most occurring characters
Value | Count | Frequency (%) |
詞 | 55125 | 49.7% |
名 | 48999 | 44.2% |
動 | 4254 | 3.8% |
副 | 1207 | 1.1% |
形 | 665 | 0.6% |
容 | 665 | 0.6% |
助 | 2 | < 0.1% |
Most occurring categories, scripts, and blocks
Value | Count | Frequency (%) |
(unknown) | 110917 |
Most frequent character per category, script, and block
(unknown): same rows as the Most occurring characters table above, because every character falls in the single (unknown) group.
score
Real number (ℝ)
Distinct | 53034 |
---|---|
Distinct (%) | 96.2% |
Missing | 0 |
Missing (%) | 0.0% |
Infinite | 0 |
Infinite (%) | 0.0% |
Mean | -0.31976353 |
Minimum | -1 |
---|---|
Maximum | 1 |
Zeros | 20 |
Zeros (%) | < 0.1% |
Negative | 49983 |
Negative (%) | 90.7% |
Memory size | 430.8 KiB |
Quantile statistics
Minimum | -1 |
---|---|
5-th percentile | -0.9785214 |
Q1 | -0.522353 |
median | -0.339964 |
Q3 | -0.176277 |
95-th percentile | 0.3779788 |
Maximum | 1 |
Range | 2 |
Interquartile range (IQR) | 0.346076 |
Descriptive statistics
Standard deviation | 0.38273805 |
---|---|
Coefficient of variation (CV) | -1.1969409 |
Kurtosis | 3.6076668 |
Mean | -0.31976353 |
Median Absolute Deviation (MAD) | 0.171778 |
Skewness | 1.3628983 |
Sum | -17626.964 |
Variance | 0.14648841 |
Monotonicity | Decreasing |
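Several of the score statistics are functions of one another, so they can be cross-checked directly. The sketch below recomputes variance, coefficient of variation, interquartile range, range, and sum from the other reported values; all inputs are copied from this report.

```python
import math

# Values copied from the score statistics in this report.
n = 55125
mean = -0.31976353
std = 0.38273805
q1, q3 = -0.522353, -0.176277
minimum, maximum = -1.0, 1.0

variance = std ** 2              # variance is the squared standard deviation
cv = std / mean                  # negative here because the mean is negative
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum
total = mean * n                 # sum recovered from the mean and the row count

print(f"variance={variance:.8f} cv={cv:.7f} iqr={iqr:.6f} "
      f"range={value_range:g} sum={total:.3f}")
```

The negative coefficient of variation is not an error: it only reflects the sign of the mean, so its magnitude is the part worth reading.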
Common values
Value | Count | Frequency (%) |
-0.130089 | 22 | < 0.1% |
0 | 20 | < 0.1% |
-0.127868 | 10 | < 0.1% |
0.996701 | 10 | < 0.1% |
-0.0199173 | 9 | < 0.1% |
0.96851 | 7 | < 0.1% |
-0.133944 | 7 | < 0.1% |
-0.0752967 | 6 | < 0.1% |
-0.170712 | 6 | < 0.1% |
0.996789 | 5 | < 0.1% |
Other values (53024) | 55023 |
Minimum 10 values
Value | Count | Frequency (%) |
-1 | 1 | < 0.1% |
-0.999999 | 1 | < 0.1% |
-0.999998 | 1 | < 0.1% |
-0.999997 | 2 | < 0.1% |
-0.999961 | 1 | < 0.1% |
-0.999947 | 1 | < 0.1% |
-0.999882 | 1 | < 0.1% |
-0.99986 | 1 | < 0.1% |
-0.999831 | 1 | < 0.1% |
-0.999805 | 1 | < 0.1% |
Maximum 10 values
Value | Count | Frequency (%) |
1 | 1 | < 0.1% |
0.999995 | 1 | < 0.1% |
0.999979 | 2 | < 0.1% |
0.999645 | 1 | < 0.1% |
0.999486 | 1 | < 0.1% |
0.999314 | 1 | < 0.1% |
0.999295 | 1 | < 0.1% |
0.999267 | 1 | < 0.1% |
0.999122 | 1 | < 0.1% |
0.999104 | 1 | < 0.1% |
Correlations
 | pos | score |
---|---|---|
pos | 1.000 | 0.092 |
score | 0.092 | 1.000 |
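The report does not name the coefficient behind the 0.092 between pos (categorical) and score (numeric). One common choice for such a pair is Cramér's V computed after binning the numeric column. The sketch below assumes that choice, uses the uncorrected form of the statistic, and runs on toy data; the bin edges, function names, and sample values are hypothetical, so it does not reproduce the figure in the matrix.

```python
import math
from collections import Counter

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long sequences of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    x_tot = Counter(xs)
    y_tot = Counter(ys)
    k = min(len(x_tot), len(y_tot)) - 1
    if n == 0 or k == 0:
        return 0.0  # undefined with a single level; 0.0 is a choice made here
    chi2 = 0.0
    for x, nx in x_tot.items():
        for y, ny in y_tot.items():
            expected = nx * ny / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))

def bin_scores(scores, edges=(-0.5, 0.0, 0.5)):
    """Map each score to the index of its bin; the edges are hypothetical."""
    return [sum(s > e for e in edges) for s in scores]

# Toy data only (hypothetical), not the dataset behind this report.
pos = ["名詞", "名詞", "動詞", "動詞", "形容詞", "形容詞"]
score = [-0.9, -0.8, 0.1, 0.2, 0.9, 0.8]
print(round(cramers_v(pos, bin_scores(score)), 3))
```

Values near 0 indicate little association and values near 1 indicate that one variable almost determines the other, so a coefficient of 0.092 on this scale would point to a weak link between part of speech and score.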
First rows
 | lemma | reading | pos | score |
---|---|---|---|---|
0 | 優れる | すぐれる | 動詞 | 1.000000 |
1 | 良い | よい | 形容詞 | 0.999995 |
2 | 喜ぶ | よろこぶ | 動詞 | 0.999979 |
3 | 褒める | ほめる | 動詞 | 0.999979 |
4 | めでたい | めでたい | 形容詞 | 0.999645 |
5 | 賢い | かしこい | 形容詞 | 0.999486 |
6 | 善い | いい | 形容詞 | 0.999314 |
7 | 適す | てきす | 動詞 | 0.999295 |
8 | 天晴 | あっぱれ | 名詞 | 0.999267 |
9 | 祝う | いわう | 動詞 | 0.999122 |
Last rows
 | lemma | reading | pos | score |
---|---|---|---|---|
55115 | 下手 | へた | 名詞 | -0.999831 |
55116 | 卑しい | いやしい | 形容詞 | -0.999860 |
55117 | ない | ない | 形容詞 | -0.999882 |
55118 | 浸ける | つける | 動詞 | -0.999947 |
55119 | 罵る | ののしる | 動詞 | -0.999961 |
55120 | ない | ない | 助動詞 | -0.999997 |
55121 | 酷い | ひどい | 形容詞 | -0.999997 |
55122 | 病気 | びょうき | 名詞 | -0.999998 |
55123 | 死ぬ | しぬ | 動詞 | -0.999999 |
55124 | 悪い | わるい | 形容詞 | -1.000000 |
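The sample rows show that a lemma alone does not identify a record: ない appears once as 形容詞 and once as 助動詞, with different scores. A minimal lookup sketch that keys on the (lemma, pos) pair, using rows copied from the tables above. The `polarity` helper and its neutral default of 0.0 are illustrative assumptions.

```python
# Rows copied from the first and last rows shown in this report.
rows = [
    ("優れる", "すぐれる", "動詞", 1.000000),
    ("良い", "よい", "形容詞", 0.999995),
    ("ない", "ない", "形容詞", -0.999882),
    ("ない", "ない", "助動詞", -0.999997),
    ("悪い", "わるい", "形容詞", -1.000000),
]

# Key on (lemma, pos): the same lemma can occur with more than one part of speech.
lexicon = {(lemma, pos): score for lemma, _reading, pos, score in rows}

def polarity(lemma, pos, default=0.0):
    """Look up a score; unknown entries fall back to a neutral default (an assumption)."""
    return lexicon.get((lemma, pos), default)

print(polarity("ない", "助動詞"))
print(polarity("ない", "形容詞"))
print(polarity("猫", "名詞"))
```

Keying on the lemma alone would silently overwrite one of the two ない entries, which is why the pair is used here.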