Entity Recognition and Keyword Detection

  1. OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

According to posts online, this means the "en" model is missing. My first attempt was simply to upgrade spaCy:

> sudo pip install -U spacy

That still did not fix the missing "en" model, so I ran the following command:

> sudo python3 -m spacy download en

That did not work either. It failed with:

> requests.exceptions.ConnectionError: HTTPSConnectionPool(host='raw.githubusercontent.com', port=443): Max retries exceeded with url: /explosion/spacy-models/master/shortcuts-v2.json (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fbd237ddbe0>: Failed to establish a new connection: [Errno 61] Connection refused',))

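The connection could not be established. After many retries the download finally succeeded (the reason: I was going through a proxy to get past the firewall, and it worked once I switched the proxy to global mode).

After that, the program ran successfully.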

  2. Writing the test class kept failing. The main reference I used was:

    https://blog.csdn.net/weixin_39800144/article/details/79241620

    The error was:

    java.lang.IllegalStateException: Unable to find a @SpringBootConfiguration, you need to use @ContextConfiguration or @SpringBootTest(classes=...) with your test

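The cause: the test class and the main application class are in different package paths, so the test cannot find the configuration and fails to start.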

So I added this annotation to the test class:

> @SpringBootTest(classes = Application.class)

It still failed:

> org.junit.runners.model.InvalidTestClassError:No runnable methods

After some searching, I found that I had imported the wrong package. I had imported:

> import org.junit.jupiter.api.Test;

But what is actually needed here, since the test runs under the JUnit 4 runner, is:

> import org.junit.Test;

Good, the Spring Boot main application could now be started, but a new problem appeared:

> java.lang.IllegalStateException: Failed to load ApplicationContext
>
> at org.springframework.test.context.cache.DefaultCacheAwareContextLoaderDelegate.loadContext(DefaultCacheAwareContextLoaderDelegate.java:132)
> at org.springframework.test.context.support.DefaultTestContext.getApplicationContext(DefaultTestContext.java:123)
> at org.springframework.test.context.web.ServletTestExecutionListener.setUpRequestContextIfNecessary(ServletTestExecutionListener.java:190)
> at org.springframework.test.context.web.ServletTestExecutionListener.prepareTestInstance(ServletTestExecutionListener.java:132)
> at org.springframework.test.context.TestContextManager.prepareTestInstance(TestContextManager.java:244)
> at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.createTest(SpringJUnit4ClassRunner.java:227)
> at org.springframework.test.context.junit4.SpringJUnit4ClassRunner$1.runReflectiveCall(SpringJUnit4ClassRunner.java:289)
> at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.methodBlock(SpringJUnit4ClassRunner.java:291)
> at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:246)
> at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.runChild(SpringJUnit4ClassRunner.java:97)
> at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
> at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
> at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
> at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
> at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
> at org.springframework.test.context.junit4.statements.RunBeforeTestClassCallbacks.evaluate(RunBeforeTestClassCallbacks.java:61)
> at org.springframework.test.context.junit4.statements.RunAfterTestClassCallbacks.evaluate(RunAfterTestClassCallbacks.java:70)
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
> at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
> at org.springframework.test.context.junit4.SpringJUnit4ClassRunner.run(SpringJUnit4ClassRunner.java:190)
> at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
> at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
> at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> Caused by: java.lang.UnsupportedOperationException: MockServerContainer does not support addEndpoint(ServerEndpointConfig)
> at org.springframework.test.context.web.socket.MockServerContainer.addEndpoint(MockServerContainer.java:132)
> at org.springframework.web.socket.server.standard.ServerEndpointExporter.registerEndpoint(ServerEndpointExporter.java:170)
> at org.springframework.web.socket.server.standard.ServerEndpointExporter.registerEndpoints(ServerEndpointExporter.java:140)
> at org.springframework.web.socket.server.standard.ServerEndpointExporter.afterSingletonsInstantiated(ServerEndpointExporter.java:112)
> at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:914)
> at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:879)
> at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:551)
> at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:758)
> at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:750)
> at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:405)
> at org.springframework.boot.SpringApplication.run(SpringApplication.java:315)
> at org.springframework.boot.test.context.SpringBootContextLoader.loadContext(SpringBootContextLoader.java:120)
> at org.springframework.test.context.cache.DefaultCacheAwareContextLoaderDelegate.loadContextInternal(DefaultCacheAwareContextLoaderDelegate.java:99)
> at org.springframework.test.context.cache.DefaultCacheAwareContextLoaderDelegate.loadContext(DefaultCacheAwareContextLoaderDelegate.java:124)
> ... 25 more

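After looking this up, I removed the @WebAppConfiguration annotation from the test class, and the test ran successfully. As the stack trace shows, with that annotation the test context uses MockServerContainer, which does not support the addEndpoint call that ServerEndpointExporter makes at startup.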

  3. Calling Python from a Java program

// Command line: interpreter, script path, and the text segment to analyze
String[] args1 = new String[]{"python", "/Users/yangweijie/Desktop/chengdu/1203ner精简版本/english_ner.py", segment};
String line;
// 2. Entity recognition: run the Python script as a child process
Process proc = Runtime.getRuntime().exec(args1);

// Read the script's standard output line by line
BufferedReader in = new BufferedReader(new InputStreamReader(proc.getInputStream()));
while ((line = in.readLine()) != null) {
    System.out.println(line);
    // Each output line is expected to be a JSON array of entities
    JSONArray entities = (JSONArray) JSONArray.parse(line);
    ...

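The Java loop above parses every line the script prints as a JSON array, so the Python side has to write exactly one JSON array per line to standard output and nothing else. The original english_ner.py is not shown here, so the following is only a minimal sketch of that contract. `extract_entities` is a hypothetical placeholder (a capitalized-word heuristic standing in for the real NER step), and the `text` and `label` field names are assumptions.

```python
import json
import sys


def extract_entities(segment):
    """Hypothetical stand-in for the real NER step.

    Treats every capitalized word as an entity, only to keep the sketch
    self-contained. The real script would call its NER model here.
    """
    entities = []
    for word in segment.split():
        token = word.strip(".,;:!?\"'()")
        if token[:1].isupper():
            entities.append({"text": token, "label": "UNKNOWN"})
    return entities


def to_output_line(entities):
    """Serialize the entities as one JSON array on a single line.

    json.dumps without an indent argument escapes newlines inside strings,
    so each line returned by readLine() on the Java side is a complete array.
    """
    return json.dumps(entities, ensure_ascii=False)


if __name__ == "__main__" and len(sys.argv) > 1:
    # sys.argv[1] is the text segment passed in by the Java caller.
    print(to_output_line(extract_entities(sys.argv[1])))
```

Under this contract, any diagnostics should go to standard error rather than standard output, because every extra line on standard output would also be handed to JSONArray.parse.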
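  4. Putting Python and Java into Docker

Problems encountered:

Both a Python and a Java environment are needed, but the final image can only be based on one FROM

https://hub.docker.com/r/rappdw/docker-java-python

I used the image above, which includes both Java 8 and Python 3.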

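nltk cannot be installed

The cause is the firewall: there is simply no way to connect to the server and download the nltk data. The fix is to download the nltk components you need from GitHub yourself and put them into the right directory.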
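As a rough illustration of the manual step: nltk data packages are distributed as zip files, and each one belongs in a category subdirectory of an nltk_data directory (punkt, for example, goes under tokenizers). The sketch below unpacks a manually downloaded zip accordingly. The function name and the example paths are assumptions made for this illustration, as is the expectation that the zip contains a top-level folder named after the package.

```python
import zipfile
from pathlib import Path


def install_nltk_package(zip_path, nltk_data_dir, category):
    """Unpack a manually downloaded nltk data zip into <nltk_data_dir>/<category>/.

    Returns the category directory the archive was extracted into.
    """
    target = Path(nltk_data_dir) / category
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

For example, `install_nltk_package("punkt.zip", "/usr/local/lib/nltk_data", "tokenizers")` would leave the files under /usr/local/lib/nltk_data/tokenizers/punkt.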

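One thing deserves special attention: the data has to go into one of a few specific directories, because nltk only searches the following locations (this is on Linux):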

Searched in:
- '/home/hadoopcj/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'
- '/home/hadoopcj/nltk_data'
- ''

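I had not noticed that the data must be in one of these directories, so the program kept failing and I was stuck for a whole afternoon.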

Verifying that it worked:

Using nltk inside the container
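A check of this kind can also be done without starting the application. The sketch below walks a list of candidate nltk_data directories and reports the first one that actually contains a resource. `find_resource` is a hypothetical helper written for this illustration. It only looks for unpacked directories, whereas nltk's own `nltk.data.find('tokenizers/punkt')` performs the real lookup and raises LookupError when nothing is found.

```python
from pathlib import Path


def find_resource(search_dirs, resource):
    """Return the first directory in search_dirs that contains resource, or None.

    resource is a relative path such as "tokenizers/punkt". Empty entries
    (like the '' at the end of nltk's search list) are skipped.
    """
    for directory in search_dirs:
        if directory and (Path(directory) / resource).exists():
            return directory
    return None


if __name__ == "__main__":
    # Directory list taken from the "Searched in" message, minus the home directory.
    candidates = [
        "/usr/share/nltk_data",
        "/usr/local/share/nltk_data",
        "/usr/lib/nltk_data",
        "/usr/local/lib/nltk_data",
        "",
    ]
    print(find_resource(candidates, "tokenizers/punkt"))
```

If this prints None inside the container, the data was copied to a location nltk does not search.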

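The spaCy en model cannot be installed

With the problem above solved, a new one appeared: the spaCy en model that the code references could not be found. First I tried the simplest official method:

> python -m spacy download en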

or

> python -m spacy download en_core_web_sm

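Both failed.

The cause was again that the firewall cannot be bypassed from inside Docker, so the model cannot be fetched with that command. I then tried a second approach: download the model manually and install it into Python by hand.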

First, download the model from the official repository:

https://github.com/explosion/spacy-models/releases/tag/en_core_web_sm-2.3.1

Then install it with:

> pip install /usr/local/lib/needs/en_core_web_sm-2.3.1.tar.gz

That completes the installation, but the Python code still would not run. After a long search, I finally found that the following call does not work:

>     spacy_nlp = spacy.load('en')

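The name 'en' is only a shortcut link, and installing the archive with pip does not create it, so I switched to loading the model by its package name:

>     spacy_nlp = spacy.load('en_core_web_sm')

That finally worked.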
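The difference between the two calls can be made explicit in code. The sketch below tries several model names in order and returns the first one that loads. `load_first_available` and its `loader` parameter are assumptions made for this illustration (the parameter exists so the logic can be exercised without spaCy installed). It relies only on `spacy.load` raising OSError for a model it cannot find, as in the E050 error at the top of this post.

```python
def load_first_available(names, loader=None):
    """Try each model name in order; return (name, model) for the first success.

    loader defaults to spacy.load, imported lazily so that this function
    can be defined even where spaCy is not installed.
    """
    if loader is None:
        import spacy  # assumption: spaCy is installed in the runtime image
        loader = spacy.load

    errors = []
    for name in names:
        try:
            return name, loader(name)
        except OSError as exc:  # spaCy reports a missing model as OSError (E050)
            errors.append(f"{name}: {exc}")
    raise OSError("no spaCy model could be loaded: " + "; ".join(errors))
```

A call such as `name, spacy_nlp = load_first_available(["en_core_web_sm", "en"])` prefers the package name and only falls back to the shortcut where one exists.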

Reference: Changze's blog: https://changzeyan.github.io/2020/12/16/python/web-ying-yong-flask/bu-shu-flask-ying-yong-dao-docker/

The final Dockerfile:

FROM rappdw/docker-java-python
EXPOSE 9022
ADD target/cloud-produce-1.0-SNAPSHOT.jar /cloud-enterprise-relation.jar
ADD ./code /usr/src/app
ADD ./needs /usr/local/lib/needs
ADD ./nltk_data /usr/local/lib/nltk_data
WORKDIR /usr/src/app
COPY requirements.txt .
VOLUME /tmp
RUN pip install -r ./requirements.txt -i http://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com
RUN pip install /usr/local/lib/needs/en_core_web_sm-2.3.1.tar.gz
RUN [ "python", "-c", "import nltk; nltk.download('punkt')" ]
RUN bash -c 'touch /cloud-enterprise-relation.jar'
ENTRYPOINT ["java","-Djava.security.egd=file:/dev/./urandom","-jar","/cloud-enterprise-relation.jar"]

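Note that the f in Dockerfile must not be capitalized. Likewise, when building the image, the image name must not contain uppercase letters: it has to be all lowercase.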

Also, when using Docker for Mac, registry mirrors need to be configured:

{
"experimental": false,
"features": {
"buildkit": true
},
"registry-mirrors": [
"https://docker.mirrors.ustc.edu.cn",
"https://hub-mirror.c.163.com"
]
}

With this in place, image downloads are much faster.

Docker on macOS keeps growing and has swallowed tens of GB of storage

Solution:

  1. First, use this command to see what exactly in Docker is taking up the space:

> docker system df

  2. Then run this command:

> docker system prune

It reported that more than 4 GB of cache had been cleared, but when I checked the disk usage afterwards, it had barely changed.
