This update for python-nltk fixes the following issues:
Update to 3.7
- Improve and update the NLTK team page on nltk.org (#2855, #2941)
- Drop support for Python 3.6 and add support for Python 3.10 (#2920)
Update to 3.6.7
- Resolve IndexError in sent_tokenize and word_tokenize (#2922)
Update to 3.6.6
- Refactor gensim.doctest to work with gensim 4.0.0 and later (#2914)
- Add precision, recall, F-measure, and confusion matrix evaluation to taggers (#2862)
- Add warnings if .zip files exist without any corresponding .csv files (#2908)
- Fix FileNotFoundError when download_dir is a nonexistent nested folder (#2910)
- Rename omw to omw-1.4 (#2907)
- Resolve a ReDoS vulnerability by fixing an incorrectly specified regex (#2906, boo#1191030, CVE-2021-3828)
- Support OMW 1.4 (#2899)
- Deprecate Tree get and set node methods (#2900)
- Fix broken inaugural test case (#2903)
- Use Multilingual Wordnet data from OMW with newer Wordnet versions (#2889)
- Keep NLTK's 'tokenize' module working with pathlib (#2896)
- Make prettyprinter more readable (#2893)
- Update links to the NLTK book (#2895)
- Add CITATION.cff to NLTK (#2880)
- Resolve serious ReDoS in PunktSentenceTokenizer (#2869)
- Delete old CI config files (#2881)
- Improve tokenize documentation and add TokenizerI as a superclass for TweetTokenizer (#2878)
- Fix the expected value in the BLEU score doctest after changes from #2572
- Add multi-BLEU functionality and tests (#2793)
- Deprecate the 'return_str' parameter in NLTKWordTokenizer and TreebankWordTokenizer (#2883)
- Allow empty strings in CFGs, plus other grammar fixes (#2888)
- Partition the tree.py module into a tree package, and fix pickling (#2863)
- Fix several TreebankWordTokenizer and NLTKWordTokenizer bugs (#2877)
- Rewind Wordnet data file after each lookup (#2868)
- Correct the init call for SyntaxCorpusReader subclasses (#2872)
- Documentation fixes (#2873)
- Fix Levenshtein...