In this article we are going to discuss some of the new features in the latest release of spaCy, which you can try out in Deepnote!
spaCy is my go-to library for all things natural language processing. It can handle every stage of an NLP workflow, from pre-processing all the way to full natural language understanding systems. Specifically, spaCy is designed for production. That means if you're looking to build a system that can scale and serve users, spaCy might be right up your alley.
spaCy summarizes pretty well what it's able to do, so I will attach that chart here.
In their own words:
spaCy v3.0 features all new transformer-based pipelines that bring spaCy’s accuracy right up to the current state-of-the-art. You can use any pretrained transformer to train your own pipelines, and even share one transformer between multiple components with multi-task learning. Training is now fully configurable and extensible, and you can define your own custom models using PyTorch, TensorFlow and other frameworks. The new spaCy projects system lets you describe whole end-to-end workflows in a single file, giving you an easy path from prototype to production, and making it easy to clone and adapt best-practice projects for your own use cases.
Likely the biggest change in the v3 release is the addition of transformers. spaCy now has access to transformer-architecture models for tasks like Named Entity Recognition (NER).
The new models can be used in the same way as in previous versions. To use the new transformer model, you will need to download it first.
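Here is a minimal sketch of the download-and-load step. It assumes you want spaCy v3's English transformer pipeline, packaged as `en_core_web_trf`; the `load_pipeline` helper is hypothetical and only wraps `spacy.load` so that a missing package produces a message pointing at the download command.

```python
import spacy
from spacy.language import Language

# Assumption: spaCy v3's English transformer pipeline ships as the package
# "en_core_web_trf". Install it once from the shell:
#   python -m spacy download en_core_web_trf
TRF_PIPELINE = "en_core_web_trf"


def load_pipeline(name: str = TRF_PIPELINE) -> Language:
    """Load an installed pipeline (hypothetical wrapper adding a download hint)."""
    try:
        return spacy.load(name)
    except OSError as err:
        # spacy.load raises OSError (error E050) when it cannot find the package.
        raise RuntimeError(
            f"Pipeline {name!r} is not installed; run: python -m spacy download {name}"
        ) from err
```

Once the package is installed, `nlp = load_pipeline()` is used just like a v2 model (`doc = nlp(text)`), and `nlp.pipe_names` should list a `transformer` component alongside `ner`.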
Notice how "Madeupville" was labeled as a Geopolitical Entity (GPE). As you may have guessed from the name, it is not a real place, so the model could not have recognized the word itself; it could only infer the label from the surrounding context.
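One way to probe this context effect yourself is to run the same made-up word through several sentences and compare the labels. The sketch below uses only spaCy's standard `doc.ents` API; the helper names `entity_labels` and `labels_in_context` are hypothetical, and the example sentences you choose are up to you.

```python
from spacy.language import Language


def entity_labels(nlp: Language, text: str) -> list[tuple[str, str]]:
    """Run the pipeline and return (entity text, entity label) pairs from doc.ents."""
    doc = nlp(text)
    return [(ent.text, ent.label_) for ent in doc.ents]


def labels_in_context(
    nlp: Language, word: str, sentences: list[str]
) -> dict[str, list[str]]:
    """Map each sentence to the labels of any predicted entities containing `word`.

    For an invented word, the token itself carries no prior meaning, so any
    label a statistical model assigns has to come from the sentence around it.
    """
    return {
        sentence: [label for text, label in entity_labels(nlp, sentence) if word in text]
        for sentence in sentences
    }
```

With the transformer pipeline loaded via `spacy.load("en_core_web_trf")`, you could call `labels_in_context(nlp, "Madeupville", [...])` with sentences that use the word as a town, a company, or a sports team. The labels depend on the trained model, so treat any particular result as something to check rather than a guarantee.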
The transformer pipeline uses the roberta-base model in the background, courtesy of Hugging Face (more on them later), and you can read more about the release here.