spaCy is a library for advanced natural language processing in Python and Cython. spaCy is built on the very latest research, but it isn't researchware. It was designed from day one to be used in real products. It's commercial open-source software, released under the MIT license.
Features
- Non-destructive tokenization
- Syntax-driven sentence segmentation
- Pre-trained word vectors
- Part-of-speech tagging
- Named entity recognition
- Labelled dependency parsing
- Convenient string-to-int mapping
- Export to numpy data arrays
- GIL-free multi-threading
- Efficient binary serialization
- Easy deep learning integration
- Statistical models for English and German
- State-of-the-art speed
- Robust, rigorously evaluated accuracy
See facts, figures and benchmarks.
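To make the first feature concrete: "non-destructive" tokenization means that no characters are thrown away, so the original string can always be rebuilt from the tokens. The sketch below illustrates the idea in plain Python. It is not spaCy's implementation; the `Token` tuple and the `tokenize` and `detokenize` helpers are hypothetical names invented for this example, and the regex-based splitting rule is a deliberate simplification.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    """Hypothetical token record, for illustration only (not spaCy's Token)."""
    text: str        # the token's own characters
    idx: int         # character offset of the token in the original string
    whitespace: str  # whitespace that followed the token in the original


# A token is a leading whitespace run, a run of word characters, or a single
# punctuation character; any whitespace after it is captured separately.
_TOKEN_RE = re.compile(r"(\s+|\w+|[^\w\s])(\s*)")


def tokenize(text: str) -> List[Token]:
    """Split text into tokens while keeping every character of the input."""
    return [
        Token(m.group(1), m.start(1), m.group(2))
        for m in _TOKEN_RE.finditer(text)
    ]


def detokenize(tokens: List[Token]) -> str:
    """Rebuild the original string from tokens and their trailing whitespace."""
    return "".join(t.text + t.whitespace for t in tokens)


if __name__ == "__main__":
    original = "Hello,  world! It's 2016."
    tokens = tokenize(original)
    # The round trip is lossless, including the double space after the comma.
    assert detokenize(tokens) == original
    # Each token's offset points back into the original string.
    assert all(original[t.idx:t.idx + len(t.text)] == t.text for t in tokens)
```

Because each token keeps its offset and trailing whitespace, downstream annotations (tags, entities, parse arcs) can be aligned back to exact character spans in the input, which is the practical benefit of this design.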
Top Performance
- Fastest in the world: <50ms per document. No faster system has ever been announced.
- Accuracy within 1% of the current state of the art on all tasks performed (parsing, named entity recognition, part-of-speech tagging). The only more accurate systems are an order of magnitude slower or more.
Supports
- CPython 2.6, 2.7, 3.3, 3.4, 3.5 (64-bit only)
- macOS / OS X
- Linux
- Windows (Cygwin, MinGW, Visual Studio)
GitHub link: https://github.com/explosion/spacy