Package: sbo 0.5.0
Author: Valerio Gherardi
sbo: Text Prediction via Stupid Back-Off N-Gram Models
Utilities for training and evaluating text predictors based on Stupid Back-Off N-gram models (Brants et al., 2007, <https://www.aclweb.org/anthology/D07-1090/>).
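The Stupid Back-Off scoring rule from Brants et al. (2007) can be sketched in a few lines of base R. This is an illustration only and not how sbo implements it: the toy corpus and the helper names `count_kgrams`, `get_count`, and `sbo_score` are hypothetical, and the back-off factor of 0.4 is the value used in the paper. A word's score is its relative frequency after the given context if that k-gram was observed; otherwise the context is shortened by one word and the score is penalized by `lambda`.

```r
# Toy corpus, already tokenized (hypothetical data for illustration only).
tokens <- c("i", "love", "you", "i", "love", "cats", "you", "love", "cats")

# Count all k-grams of order k; returns a named integer vector
# keyed by the space-joined words of each k-gram.
count_kgrams <- function(tokens, k) {
  n <- length(tokens) - k + 1
  grams <- vapply(seq_len(n),
                  function(i) paste(tokens[i:(i + k - 1)], collapse = " "),
                  character(1))
  tab <- table(grams)
  setNames(as.integer(tab), names(tab))
}

N <- 3  # highest k-gram order, so contexts hold at most N - 1 words
counts <- lapply(seq_len(N), function(k) count_kgrams(tokens, k))

# Look up the count of a k-gram given as a character vector of words,
# returning 0 for k-grams that never occur in the corpus.
get_count <- function(words) {
  tab <- counts[[length(words)]]
  key <- paste(words, collapse = " ")
  if (key %in% names(tab)) unname(tab[key]) else 0
}

# Stupid Back-Off score of `word` following `context`.
sbo_score <- function(word, context, lambda = 0.4) {
  if (length(context) == 0) {
    # Base case: unigram relative frequency.
    return(get_count(word) / length(tokens))
  }
  num <- get_count(c(context, word))
  if (num > 0) {
    # The full k-gram was observed: use its relative frequency.
    return(num / get_count(context))
  }
  # Unseen k-gram: drop the oldest context word and apply the penalty.
  lambda * sbo_score(word, context[-1], lambda)
}

sbo_score("cats", c("i", "love"))    # 0.5: "i love cats" occurs once, "i love" twice
sbo_score("cats", c("cats", "you"))  # backs off twice, down to the unigram frequency
```

The scores are not normalized probabilities, which is the simplification that gives the method its name; they are only meant to rank candidate next words.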
sbo_0.5.0.tar.gz
sbo_0.5.0.zip (r-4.5), sbo_0.5.0.zip (r-4.4), sbo_0.5.0.zip (r-4.3)
sbo_0.5.0.tgz (r-4.4-x86_64), sbo_0.5.0.tgz (r-4.4-arm64), sbo_0.5.0.tgz (r-4.3-x86_64), sbo_0.5.0.tgz (r-4.3-arm64)
sbo_0.5.0.tar.gz (r-4.5-noble), sbo_0.5.0.tar.gz (r-4.4-noble)
sbo_0.5.0.tgz (r-4.4-emscripten), sbo_0.5.0.tgz (r-4.3-emscripten)
sbo.pdf | sbo.html
sbo/json (API)
NEWS
# Install 'sbo' in R:
install.packages('sbo', repos = c('https://vgherard.r-universe.dev', 'https://cloud.r-project.org'))
Bug tracker: https://github.com/vgherard/sbo/issues
- twitter_dict - Top 1000 dictionary from Twitter training set
- twitter_freqs - K-gram frequencies from Twitter training set
- twitter_predtable - Next-word prediction tables from 3-gram model trained on Twitter training set
- twitter_test - Twitter test set
- twitter_train - Twitter training set
Topics: natural-language-processing, ngram-models, predictive-text, sbo
Last updated 4 years ago from: 75374a5bf5 (on v0.5.0). Checks: OK: 1, NOTE: 8. Indexed: yes.
Target | Result | Date |
---|---|---|
Doc / Vignettes | OK | Sep 08 2024 |
R-4.5-win-x86_64 | NOTE | Sep 08 2024 |
R-4.5-linux-x86_64 | NOTE | Sep 08 2024 |
R-4.4-win-x86_64 | NOTE | Sep 08 2024 |
R-4.4-mac-x86_64 | NOTE | Sep 08 2024 |
R-4.4-mac-aarch64 | NOTE | Sep 08 2024 |
R-4.3-win-x86_64 | NOTE | Sep 08 2024 |
R-4.3-mac-x86_64 | NOTE | Sep 08 2024 |
R-4.3-mac-aarch64 | NOTE | Sep 08 2024 |
Exports: as_sbo_dictionary, babble, dictionary, eval_sbo_predictor, kgram_freqs, kgram_freqs_fast, predictor, predtable, preprocess, prune, sbo_dictionary, sbo_kgram_freqs, sbo_kgram_freqs_fast, sbo_predictor, sbo_predtable, tokenize_sentences, word_coverage
Dependencies: brio, callr, cli, cpp11, crayon, desc, diffobj, digest, dplyr, evaluate, fansi, fs, generics, glue, jsonlite, lifecycle, magrittr, pillar, pkgbuild, pkgconfig, pkgload, praise, processx, ps, purrr, R6, Rcpp, rematch2, rlang, rprojroot, stringi, stringr, testthat, tibble, tidyr, tidyselect, utf8, vctrs, waldo, withr
Readme and manuals
Help Manual
Help page | Topics |
---|---|
Coerce to dictionary | as_sbo_dictionary as_sbo_dictionary.character |
Babble! | babble |
Evaluate Stupid Back-off next-word predictions | eval_sbo_predictor |
k-gram frequency tables | kgram_freqs kgram_freqs_fast sbo_kgram_freqs sbo_kgram_freqs_fast |
Plot method for word_coverage objects | plot.word_coverage |
Predict method for k-gram frequency tables | predict.sbo_kgram_freqs |
Predict method for Stupid Back-off text predictor | predict.sbo_predictor |
Preprocess text corpus | preprocess |
Prune k-gram objects | prune prune.sbo_kgram_freqs prune.sbo_predtable |
Dictionaries | dictionary sbo_dictionary |
Stupid Back-off text predictions | predictor predtable sbo_predictions sbo_predictor sbo_predictor.character sbo_predictor.sbo_kgram_freqs sbo_predictor.sbo_predtable sbo_predtable sbo_predtable.character sbo_predtable.sbo_kgram_freqs |
Sentence tokenizer | tokenize_sentences |
Top 1000 dictionary from Twitter training set | twitter_dict |
k-gram frequencies from Twitter training set | twitter_freqs |
Next-word prediction tables from 3-gram model trained on Twitter training set | twitter_predtable |
Twitter test set | twitter_test |
Twitter training set | twitter_train |
Word coverage fraction | word_coverage word_coverage.character word_coverage.sbo_dictionary word_coverage.sbo_kgram_freqs word_coverage.sbo_predictions |