ckip_transformers.nlp.util module
This module implements the utilities for CKIP Transformers NLP drivers.
- class ckip_transformers.nlp.util.CkipTokenClassification(model_name: str, tokenizer_name: Optional[str] = None, *, device: int = -1)
Bases: object
The base class for token classification tasks.
- Parameters
model_name (str) – The pretrained model name (e.g. 'ckiplab/bert-base-chinese-ws').
tokenizer_name (str, optional, defaults to model_name) – The pretrained tokenizer name (e.g. 'bert-base-chinese').
device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 runs the model on the CPU; a non-negative value runs it on the CUDA device with that id.
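The device ordinal convention can be sketched as follows. This is a minimal illustration and not the library's own code: `resolve_device_name` is a hypothetical helper, the `"cuda:N"` string follows PyTorch's device naming, and treating 0 as the first CUDA device is an assumption.

```python
def resolve_device_name(device: int = -1) -> str:
    """Map a device ordinal to a PyTorch-style device string (hypothetical helper)."""
    # Negative ordinals select the CPU, matching the documented default of -1.
    if device < 0:
        return "cpu"
    # Assumption: any other ordinal is taken as a CUDA device id.
    return f"cuda:{device}"


cpu_name = resolve_device_name()    # default ordinal -1
gpu_name = resolve_device_name(0)   # first CUDA device
print(cpu_name, gpu_name)
```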
- __call__(input_text: Union[List[str], List[List[str]]], *, use_delim: bool = False, delim_set: Optional[str] = ',,。::;;!!??', batch_size: int = 256, max_length: Optional[int] = None, show_progress: bool = True, pin_memory: bool = True)
Call the driver.
- Parameters
input_text (List[str] or List[List[str]]) – The input sentences. Each sentence is a string or a list of strings.
use_delim (bool, optional, defaults to False) – Segment sentences (internally) using delim_set.
delim_set (str, optional, defaults to ',,。::;;!!??') – Used for sentence segmentation if use_delim=True.
batch_size (int, optional, defaults to 256) – The size of each mini-batch.
max_length (int, optional) – The maximum length of the sentence; must not be longer than the maximum sequence length for this model (i.e. tokenizer.model_max_length).
show_progress (bool, optional, defaults to True) – Show a progress bar.
pin_memory (bool, optional, defaults to True) – Pin memory to speed up data transfer to the GPU. This option is incompatible with multiprocessing.
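To make the effect of use_delim and delim_set concrete, here is a minimal sketch of delimiter-based segmentation. It is not the driver's internal implementation: `split_by_delims` is a hypothetical helper, and keeping each delimiter attached to the piece it ends is an assumption made for the illustration.

```python
from typing import List

# Default delimiter characters as documented for delim_set.
DEFAULT_DELIMS = ",,。::;;!!??"


def split_by_delims(sentence: str, delim_set: str = DEFAULT_DELIMS) -> List[str]:
    """Split a sentence after every delimiter character (illustrative only)."""
    delims = set(delim_set)
    pieces: List[str] = []
    start = 0
    for pos, char in enumerate(sentence):
        if char in delims:
            # Assumption: the delimiter stays with the piece it terminates.
            pieces.append(sentence[start:pos + 1])
            start = pos + 1
    if start < len(sentence):
        # Trailing text without a closing delimiter forms the last piece.
        pieces.append(sentence[start:])
    return pieces


pieces = split_by_delims("a,b!c")
print(pieces)
```

Because every character ends up in exactly one piece, joining the pieces reproduces the original sentence, which keeps character offsets easy to map back.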
- class ckip_transformers.nlp.util.NerToken(word: str, ner: str, idx: Tuple[int, int])
Bases: tuple
A named-entity recognition token.
- property word
str, the token word.
- property ner
str, the NER tag.
- property idx
Tuple[int, int], the starting / ending index in the sentence.
- __getnewargs__()
Return self as a plain tuple. Used by copy and pickle.
- static __new__(_cls, word: str, ner: str, idx: Tuple[int, int])
Create new instance of NerToken(word, ner, idx)
- __repr__()
Return a nicely formatted representation string
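The named-tuple behaviour described above can be tried without loading a model. The sketch below defines a local stand-in with the same three fields as the documented signature; it is not imported from ckip_transformers, the sample sentence and tag are illustrative values, and reading idx as a half-open [start, end) character range is an assumption.

```python
from typing import NamedTuple, Tuple


class NerToken(NamedTuple):
    """Local stand-in mirroring the documented NerToken(word, ner, idx) fields."""
    word: str
    ner: str
    idx: Tuple[int, int]


sentence = "傅達仁今將執行安樂死"
token = NerToken(word="傅達仁", ner="PERSON", idx=(0, 3))

# Fields can be read by name or by ordinary tuple unpacking.
word, ner, (start, end) = token

# Assumption: idx is a half-open [start, end) range into the sentence.
span = sentence[start:end]

print(token)
print(span == token.word)
```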