  1. Looking for a clear definition of what a "tokenizer", "parser" and ...

    Mar 28, 2018 · A tokenizer breaks a stream of text into tokens, usually by looking for whitespace (tabs, spaces, new lines). A lexer is basically a tokenizer, but it usually attaches extra context …
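
A minimal sketch of the distinction (the names and token kinds here are my own, not from the answer): the tokenizer only splits the stream, while the lexer also labels each token with context.

```python
import re

def tokenize(text):
    # A tokenizer just breaks the stream into tokens on whitespace.
    return text.split()

def lex(text):
    # A lexer also attaches extra context: here, a coarse token type.
    tokens = []
    for tok in text.split():
        if re.fullmatch(r"\d+", tok):
            kind = "NUMBER"
        elif re.fullmatch(r"[A-Za-z_]\w*", tok):
            kind = "IDENT"
        else:
            kind = "SYMBOL"
        tokens.append((kind, tok))
    return tokens
```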

  2. What does Keras Tokenizer method exactly do? - Stack Overflow

    On occasion, circumstances require us to do the following: from keras.preprocessing.text import Tokenizer tokenizer = Tokenizer(num_words=my_max) Then, invariably, we chant this mantra: …
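
A rough pure-Python sketch of what those two calls do (an illustration of the behaviour, not Keras's implementation): fit_on_texts ranks words by corpus frequency starting at index 1, and texts_to_sequences maps words to those ranks, dropping any word ranked at or beyond num_words.

```python
from collections import Counter

def fit_on_texts(texts):
    # Count word frequencies across the corpus (lowercased, whitespace split).
    counts = Counter(w for t in texts for w in t.lower().split())
    # Rank words by frequency; indices start at 1, as in Keras.
    return {w: i + 1 for i, (w, _) in enumerate(counts.most_common())}

def texts_to_sequences(texts, word_index, num_words):
    # Map each text to word indices, keeping only words ranked below num_words.
    return [[word_index[w] for w in t.lower().split()
             if word_index.get(w, num_words) < num_words]
            for t in texts]
```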

  3. Azure OpenAI Embeddings - Tiktoken Connection Error

    Sep 11, 2024 · I am using an Azure AI Search instance with the embedding model text-embedding-ada-002. I am calling the embedding function via the AzureOpenAIEmbeddings class using …

  4. Why does tokeniser break down words that are present in vocab

    Apr 30, 2022 · In my understanding, given a word, the tokeniser will break it down into sub-words only if the word is not present in the …
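
That sub-word behaviour can be sketched as a greedy longest-match-first loop, WordPiece-style (the '##' continuation convention follows BERT-family tokenisers; this is an illustrative sketch, not the Hugging Face implementation):

```python
def wordpiece(word, vocab):
    # A word present in the vocab is emitted whole; otherwise split off
    # the longest known prefix and continue with '##'-prefixed pieces.
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                pieces.append(piece)
                break
            end -= 1
        else:
            return ["[UNK]"]  # no known piece covers this position
        start = end
    return pieces
```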

  5. How to use NLTK tokeniser on text with quotation marks?

    Nov 21, 2023 · NLTK's tokeniser is acting very strangely when I try to tokenise a text with quotation marks in it. I have a .txt file like this: This text has some "quotation marks" to really …
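
The "strange" behaviour usually reported here is that NLTK's Treebank-style tokeniser normalises straight double quotes into `` and '' pairs. A workaround sketch (a plain regex tokenizer that keeps the quote characters verbatim; not NLTK's own code):

```python
import re

def tokenize_keep_quotes(text):
    # Split into words and individual punctuation marks, keeping
    # quotation marks verbatim instead of normalising them to ``/''.
    return re.findall(r"\w+|[^\w\s]", text)
```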

  6. tokenize - Elasticsearch implement off-the-shelf language analyser …

    Jun 17, 2024 · Asked 1 year, 2 months ago · Modified 1 year, 2 months ago · Viewed 42 times
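
A hedged sketch of what such index settings might look like, written as the Python dict you would pass to the client (names like my_tokenizer and my_english are my own, and the filter chain only approximates the stock english analyser):

```python
# Hypothetical index settings: reuse English filters (stop words,
# stemmer) from the stock "english" analyser, but plug in a custom
# whitespace tokenizer instead of the standard one.
settings = {
    "analysis": {
        "tokenizer": {
            "my_tokenizer": {"type": "whitespace"},
        },
        "filter": {
            "english_stop": {"type": "stop", "stopwords": "_english_"},
            "english_stemmer": {"type": "stemmer", "language": "english"},
        },
        "analyzer": {
            "my_english": {
                "type": "custom",
                "tokenizer": "my_tokenizer",
                "filter": ["lowercase", "english_stop", "english_stemmer"],
            },
        },
    }
}
```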

  7. Unable to get the tokenizer of Gemma-3 - Stack Overflow

    Mar 22, 2025 · I am trying to get the tokenizer using huggingface AutoTokenizer library, but I am unable to fetch, is there any other way to get it? Where I am doing wrong? from transformers …

  8. how to use tiktoken in offline mode computer - Stack Overflow

    Apr 26, 2023 · import tiktoken tokenizer = tiktoken.get_encoding("cl100k_base") tokenizer = tiktoken.encoding_for_model("gpt-3.5-turbo") text = "Hello, nice to meet you" tokenizer.
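
For offline use, tiktoken reads its cache directory from the TIKTOKEN_CACHE_DIR environment variable and names cached files by the SHA-1 of their download URL, so pre-populating that directory lets get_encoding() succeed without network access. A stdlib-only sketch of the cache key (the URL below is an assumption to check against your tiktoken version):

```python
import hashlib
import os

# Assumed download URL for the cl100k_base vocabulary file.
URL = ("https://openaipublic.blob.core.windows.net/encodings/"
       "cl100k_base.tiktoken")

def cache_key(url):
    # Filename tiktoken expects for this URL inside its cache directory.
    return hashlib.sha1(url.encode()).hexdigest()

# Point tiktoken at a pre-populated cache (example path).
os.environ["TIKTOKEN_CACHE_DIR"] = "/path/to/offline/cache"
```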

  9. How to download punkt tokenizer in nltk? - Stack Overflow

    import nltk nltk.download('punkt') but then I am unable to download this package, because every time I run this I get an error that says

  10. C++ tokeniser, skip the lines start with '//' and the ad hoc

    Apr 19, 2017 · I am practising writing a tokeniser in C++; however, I assume that I will need to take a file with some comments as the input file. Then how can I skip comments like those …
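
One simple approach is to drop everything from '//' to the end of each line before tokenising; sketched here in Python (note that a real tokeniser must not strip a '//' that appears inside a string literal):

```python
def strip_line_comments(source):
    # Remove '//' comments line by line before tokenising.
    lines = []
    for line in source.splitlines():
        idx = line.find("//")
        if idx != -1:
            line = line[:idx]  # keep only the code before the comment
        lines.append(line)
    return "\n".join(lines)
```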