ckip_transformers.nlp.driver module
This module implements the CKIP Transformers NLP drivers.
-
class ckip_transformers.nlp.driver.CkipWordSegmenter(level: int = 3, **kwargs)
Bases: ckip_transformers.nlp.util.CkipTokenClassification
The word segmentation driver.
- Parameters
level (int, optional, defaults to 3, must be 1 to 3) – The model level. The higher the level, the more accurate and the slower the model.
device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 runs the model on the CPU; a non-negative integer runs it on the CUDA device with that id.
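As a minimal sketch of the constructor parameters above, the following checks the documented constraints before building a driver. The names driver_kwargs and make_word_segmenter are hypothetical helpers written for this illustration and are not part of ckip_transformers; only the level and device arguments come from the documentation above.

```python
from typing import Any, Dict


def driver_kwargs(level: int = 3, device: int = -1) -> Dict[str, Any]:
    """Check the documented constraints and build constructor arguments.

    Hypothetical helper for illustration; not part of ckip_transformers.
    """
    if level not in (1, 2, 3):
        raise ValueError(f"level must be 1, 2, or 3, got {level!r}")
    if device < -1:
        raise ValueError(f"device must be -1 (CPU) or a CUDA device id, got {device!r}")
    return {"level": level, "device": device}


def make_word_segmenter(level: int = 3, device: int = -1):
    """Create a CkipWordSegmenter from validated arguments.

    The import is deferred so that driver_kwargs can be used even when
    the ckip_transformers package is not installed.
    """
    from ckip_transformers.nlp.driver import CkipWordSegmenter

    return CkipWordSegmenter(**driver_kwargs(level, device))
```

Calling make_word_segmenter(level=1) would then request the fastest and least accurate model on the CPU, while make_word_segmenter(device=0) would request the default level on CUDA device 0.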
-
__call__(input_text: List[str], *, use_delim: bool = False, **kwargs) → List[List[str]]
Call the driver.
- Parameters
input_text (List[str]) – The input sentences. Each sentence is a string.
use_delim (bool, optional, defaults to False) – Segment sentences (internally) using delim_set.
delim_set (str, optional, defaults to ',,。::;;!!??') – Used for sentence segmentation if use_delim=True.
batch_size (int, optional, defaults to 256) – The size of each mini-batch.
max_length (int, optional) – The maximum length of a sentence; must not be longer than the maximum sequence length for this model (i.e. tokenizer.model_max_length).
show_progress (bool, optional, defaults to True) – Show a progress bar.
- Returns
List[List[str]] – A list of lists of words (str).
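To make the use_delim and delim_set parameters more concrete, here is a rough sketch of delimiter-based sentence splitting. It is an assumption about the general idea, not the library's internal implementation: the function split_by_delims is hypothetical, and the real driver may handle delimiters differently (for example, in whether a delimiter stays attached to the preceding chunk).

```python
from typing import List

# Default delimiter set as documented for delim_set.
DEFAULT_DELIMS = ",,。::;;!!??"


def split_by_delims(sentence: str, delim_set: str = DEFAULT_DELIMS) -> List[str]:
    """Split a sentence into chunks, ending each chunk after a delimiter.

    Hypothetical helper for illustration; not part of ckip_transformers.
    """
    delims = set(delim_set)
    chunks: List[str] = []
    start = 0
    for i, ch in enumerate(sentence):
        if ch in delims:
            # Keep the delimiter with the chunk that precedes it.
            chunks.append(sentence[start : i + 1])
            start = i + 1
    if start < len(sentence):
        # Trailing text that is not followed by a delimiter.
        chunks.append(sentence[start:])
    return chunks


print(split_by_delims("今天天氣很好,我們去公園。好嗎"))
# ['今天天氣很好,', '我們去公園。', '好嗎']
```

Splitting long inputs this way yields shorter pieces, which is one way to stay within the max_length limit described above.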
-
class ckip_transformers.nlp.driver.CkipPosTagger(level: int = 3, **kwargs)
Bases: ckip_transformers.nlp.util.CkipTokenClassification
The part-of-speech tagging driver.
- Parameters
level (int, optional, defaults to 3, must be 1 to 3) – The model level. The higher the level, the more accurate and the slower the model.
device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 runs the model on the CPU; a non-negative integer runs it on the CUDA device with that id.
-
__call__(input_text: List[List[str]], *, use_delim: bool = True, **kwargs) → List[List[str]]
Call the driver.
- Parameters
input_text (List[List[str]]) – The input sentences. Each sentence is a list of strings (words).
use_delim (bool, optional, defaults to True) – Segment sentences (internally) using delim_set.
delim_set (str, optional, defaults to ',,。::;;!!??') – Used for sentence segmentation if use_delim=True.
batch_size (int, optional, defaults to 256) – The size of each mini-batch.
max_length (int, optional) – The maximum length of a sentence; must not be longer than the maximum sequence length for this model (i.e. tokenizer.model_max_length).
show_progress (bool, optional, defaults to True) – Show a progress bar.
- Returns
List[List[str]] – A list of lists of POS tags (str).
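Since the tagger takes a list of words per sentence and returns a list of tags per sentence, the two can be displayed side by side. The sketch below assumes one tag per input word; pack_ws_pos is a hypothetical helper, and the words and tags are hand-written stand-ins rather than actual model output.

```python
from typing import List


def pack_ws_pos(words: List[str], tags: List[str]) -> str:
    """Join aligned words and POS tags as 'word(TAG)' separated by spaces.

    Hypothetical helper for illustration; not part of ckip_transformers.
    """
    if len(words) != len(tags):
        # Assumption: the tagger returns exactly one tag per input word.
        raise ValueError("words and tags must have the same length")
    return " ".join(f"{word}({tag})" for word, tag in zip(words, tags))


# Hand-written stand-ins for one sentence of segmenter and tagger output.
words = ["我", "喜歡", "蘋果"]
tags = ["Nh", "VK", "Na"]
print(pack_ws_pos(words, tags))  # 我(Nh) 喜歡(VK) 蘋果(Na)
```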
-
class ckip_transformers.nlp.driver.CkipNerChunker(level: int = 3, **kwargs)
Bases: ckip_transformers.nlp.util.CkipTokenClassification
The named-entity recognition driver.
- Parameters
level (int, optional, defaults to 3, must be 1 to 3) – The model level. The higher the level, the more accurate and the slower the model.
device (int, optional, defaults to -1) – Device ordinal for CPU/GPU support. Setting this to -1 runs the model on the CPU; a non-negative integer runs it on the CUDA device with that id.
-
__call__(input_text: List[str], *, use_delim: bool = False, **kwargs) → List[List[ckip_transformers.nlp.util.NerToken]]
Call the driver.
- Parameters
input_text (List[str]) – The input sentences. Each sentence is a string or a list of strings (words).
use_delim (bool, optional, defaults to False) – Segment sentences (internally) using delim_set.
delim_set (str, optional, defaults to ',,。::;;!!??') – Used for sentence segmentation if use_delim=True.
batch_size (int, optional, defaults to 256) – The size of each mini-batch.
max_length (int, optional) – The maximum length of a sentence; must not be longer than the maximum sequence length for this model (i.e. tokenizer.model_max_length).
show_progress (bool, optional, defaults to True) – Show a progress bar.
- Returns
List[List[NerToken]] – A list of lists of entities (NerToken).
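The returned entities can be post-processed like any other list of records. The sketch below groups the entities of one sentence by type. It defines its own stand-in NerToken, and the field names word, ner, and idx are assumptions for this illustration, since this page does not describe the fields of ckip_transformers.nlp.util.NerToken. The entity values are hand-written, not model output.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class NerToken(NamedTuple):
    """Stand-in for ckip_transformers.nlp.util.NerToken (fields are assumed)."""

    word: str              # entity text
    ner: str               # entity type label
    idx: Tuple[int, int]   # (start, end) character offsets in the sentence


def group_by_type(entities: List[NerToken]) -> Dict[str, List[str]]:
    """Collect entity texts under their entity type, keeping input order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        grouped[entity.ner].append(entity.word)
    return dict(grouped)


# Hand-written stand-ins for the entities of a single sentence.
sentence_entities = [
    NerToken(word="王小明", ner="PERSON", idx=(0, 3)),
    NerToken(word="台北", ner="GPE", idx=(4, 6)),
    NerToken(word="李大華", ner="PERSON", idx=(7, 10)),
]
print(group_by_type(sentence_entities))
# {'PERSON': ['王小明', '李大華'], 'GPE': ['台北']}
```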