This is a joint portal of the NLP Centre at Masaryk University and Lexical Computing, dedicated to a number of software tools for corpus processing, including the well-known corpus manager Sketch Engine.
If you have any questions or suggestions, please subscribe to the NoSketch Engine Google group, where you can get involved in the discussion with the developers and other users.
JusText is an HTML boilerplate removal tool. It strips navigation links, headers, footers and similar page furniture from HTML pages, leaving only the regular text made up of full sentences.
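To illustrate the idea, here is a heavily simplified sketch of paragraph-level boilerplate filtering. jusText's heuristics look at paragraph length, link density and stopword density. The real tool also reclassifies short blocks based on their neighbours and ships full per-language stoplists, which this sketch omits. The function name `extract_main_text`, the tiny stoplist, the set of block tags and the threshold values are all illustrative assumptions. They are not jusText's API or its exact defaults.

```python
from html.parser import HTMLParser

# Hypothetical, tiny English stoplist (an assumption; jusText uses full per-language lists).
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "this", "for", "on", "with", "as", "was", "are", "be", "by"}
# Tags treated as paragraph boundaries in this sketch (an assumption).
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "li", "td", "br",
              "header", "footer", "nav", "body"}
SKIP_TAGS = {"script", "style"}


class ParagraphSplitter(HTMLParser):
    """Splits HTML into text blocks and counts how many characters sit inside <a> links."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []  # list of (normalized_text, link_chars)
        self._parts, self._link_chars = [], 0
        self._in_link, self._skip = 0, 0

    def _flush(self):
        # Collapse whitespace; only keep non-empty blocks.
        text = " ".join("".join(self._parts).split())
        if text:
            self.paragraphs.append((text, self._link_chars))
        self._parts, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip:
            return  # ignore script/style contents
        self._parts.append(data)
        if self._in_link:
            self._link_chars += len(" ".join(data.split()))

    def close(self):
        super().close()
        self._flush()  # emit any trailing text block


def extract_main_text(html, min_length=50, max_link_density=0.2,
                      min_stopword_density=0.3):
    """Keep blocks that are long, link-poor and stopword-rich (illustrative thresholds)."""
    splitter = ParagraphSplitter()
    splitter.feed(html)
    splitter.close()
    kept = []
    for text, link_chars in splitter.paragraphs:
        words = text.lower().split()
        link_density = link_chars / len(text)
        stop_density = sum(w.strip(".,;:!?") in STOPWORDS for w in words) / len(words)
        if (len(text) >= min_length and link_density <= max_link_density
                and stop_density >= min_stopword_density):
            kept.append(text)
    return kept


page = """
<html><body>
<div><a href="/">Home</a> | <a href="/about">About</a> | <a href="/contact">Contact</a></div>
<h1>News</h1>
<p>The committee said that the new bridge is expected to open in the spring, and that the work on the approach roads is on schedule.</p>
<div>Copyright 2024 Example Ltd.</div>
</body></html>
"""
result = extract_main_text(page)
print(result)
```

On this sample page the navigation bar (short and link-dense), the heading and the copyright line (both short) are discarded. Only the sentence-bearing news paragraph remains. Stopword density is a useful signal because running prose is full of function words, whereas menus and footers rarely are.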