  1. crfsuite::airbnb
    Dutch reviews of AirBnB customers on Brussels address locations available at www.insideairbnb.com
  2. crfsuite::airbnb_chunks
    Dutch reviews of AirBnB customers on Brussels address locations manually tagged with entities
  3. doc2vec::be_parliament_2020
    Corpus with Questions asked in the Belgian Federal Parliament in 2020
  4. nametagger::europeananews
    Tagged newspaper articles from Europeana
  5. recogito::openseadragon_areas
    A dataset of annotations using openseadragon
    data.frame|3 x 9
  6. ruimtehol::dekamer
    Dataset from 2017 with Questions and Answers in the Belgian Federal Parliament
  7. ruimtehol::dekamer_theme_terminology
    Dataset containing relevant terminology for each theme of the 'dekamer' dataset
  8. textplot::example_btm
    Example Biterm Topic Model
  9. textplot::example_embedding
    Example word embedding matrix
  10. textplot::example_embedding_clusters
    Example words emitted in an ETM text clustering model
  11. textplot::example_udpipe
    Example annotation of text using udpipe
  12. textrank::joboffer
    The text of a job offer, annotated with the package udpipe
  13. tokenizers.bpe::belgium_parliament
    Dataset from 2017 with Questions asked in the Belgian Federal Parliament
  14. topicmodels.etm::ng20
    Bag of words sample of the 20 newsgroups dataset
  15. udpipe::brussels_listings
    Brussels AirBnB address locations available at www.insideairbnb.com
  16. udpipe::brussels_reviews
    Reviews of AirBnB customers on Brussels address locations available at www.insideairbnb.com
  17. udpipe::brussels_reviews_anno
    Reviews of the AirBnB customers which are tokenised, POS tagged and lemmatised
  18. udpipe::brussels_reviews_w2v_embeddings_lemma_nl
    An example matrix of word embeddings
    matrix|2687 x
  19. udpipe::udpipe_annotation_params
    List with training options set by the UDPipe community when building models based on the Universal Dependencies data