Parstat package
===============

_Version 1.0 (first public version)_

Parstat is a package for the OpTeX format that counts glyphs and spaces on paragraph lines. From these counts it computes statistics, which are written to the log file.

One important measure of paragraph readability is the number of characters per line. It should be somewhere between 45 and 75 (spaces included) for an optimal reading experience. This package gives us the numbers to judge how well our layout meets that target. The space counter also tells us the word count on the line (`words = spaces + 1`).

All of this is done by Lua code inside the LuaTeX engine, directly after the paragraph is broken into lines, so no external tools are used. This makes it much easier to customize the analysis to our specific needs and to get more precise results.

The downside is that we analyze glyph nodes that are already shaped, so the term "glyphs", used throughout the package, is the accurate one. While this works well with Latin (see the Limitations section), it will probably fail on Arabic or Indic text, which relies on many ligatures and where glyph shaping is more complex.

Usage
-----

The first step is to load the package with `\load[parstat]`. Then we must enable it with `\parstatenable`. From this moment on, we get the statistic for every finished paragraph. To disable it, we use its counterpart `\parstatdisable`. These macros work within TeX groups as we would expect.

We can ignore lines at the beginning of the paragraph by setting the counter `\_parstat_fskip`. This is handy for ignoring indented lines or lines next to initials, which would otherwise distort the statistic. We can also skip lines from the end of the paragraph by setting the counter `\_parstat_bskip`. In most cases it should be 1, which is the default, because the last line is usually shorter than the regular lines.

At any time we can print the current statistic with `\parstat`, which shows the state of the statistic at that point of execution.
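
The macros described so far can be combined as in the following sketch. It assumes that `\_parstat_fskip` and `\_parstat_bskip` are ordinary TeX count registers, so that plain assignment syntax works, and that `\parstatdisable` used inside a group is undone when the group ends. The values and the `\lipsum` paragraphs are only illustrative.

```
\load[parstat]

% Assumption: both skip settings are plain count registers.
\_parstat_fskip=2   % ignore the first two lines, e.g. those next to an initial
\_parstat_bskip=1   % ignore the last line (the documented default)

\parstatenable
\lipsum[1-2]        % these paragraphs are measured

{\parstatdisable
 \lipsum[3]\par     % not measured; the paragraph ends inside the group
}

\lipsum[4]          % measured again after the group has ended
\parstat            % print the statistic collected so far to the log
```

Ending the paragraph with `\par` before the closing brace keeps the whole paragraph inside the group where the statistic is disabled.
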
For printing a summary at the end, we can, for example, redefine the `\bye` macro.

```
\def\bye{\par\parstat \_bye}
```

We can reset the total statistic at any time by using `\parstatreset`.

Example and output interpretation
---------------------------------

A minimal example looks like this:

```
\load[parstat]
\parstatenable
\lipsum[1]
\parstat
\bye
```

After running it, we get this output in the log:

```
Parstat 1 in './test.tex' at line 3
1X:(100g,14s) 2:(101g,12s) 3:(101g,13s) 4:(99g,16s) 5:(107g,14s) 6:(107g,16s) 7:(104g,14s) 8:(104g,16s) 9X:(36g,5s)
Glyphs: ave 103.29, stddev 3.09, min 99, max 107
Spaces: ave 14.43, stddev 1.62, min 12, max 16
Parstat summary from './test.tex' at line 4
Analyzed 1 paragraphs, 7 lines in total
Glyphs: ave 103.29, stddev 3.09, min 99, max 107
Spaces: ave 14.43, stddev 1.62, min 12, max 16
```

The first line tells us that this is the first paragraph statistic and where it came from. It is followed by the counts of glyphs (g) and spaces (s) for every line. Line numbers ending with "X" mark the ignored lines. In this example lines 1 and 9 are ignored, so the statistics cover the seven lines 2 to 8. The statistics of glyphs and spaces close the paragraph report. The summary triggered by `\parstat` follows the same style.

Limitations
-----------

* Counting ligature components works only if the shaper is TeX and not OpenType or Harf. Harf is default in OpTeX now, and we use OpenType fonts, so ligatures are counted as single characters.

  The reason is that the glyphs are already shaped by the time they are counted (glyph subtype is 256), and there is no way to track their components.

  One possible workaround is to temporarily disable ligatures by using `\setff{-liga}`. This works, but it will change the line breaks.

* Math formulas are not counted.

License
-------

Parstat uses the LaTeX Project Public License ([LPPL-1.3c](https://www.latex-project.org/lppl.txt)).

Repository
----------

The Parstat code lives in [this GitHub repository](https://github.com/petrk23/parstat). Issues can be reported there.