Full-text search with Node.js and ElasticSearch on Docker

Published Sunday, September 15, 2019, 5:03:10 PM

Author: Michele Riva

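Full-text search can be both scary and exciting. Some popular databases, such as MySQL and Postgres, are great solutions for storing data… but when it comes to full-text search performance, there's no competition with ElasticSearch.

For those who don't know, ElasticSearch is a search engine server built on top of Lucene, with excellent support for distributed architectures. According to db-engines.com, it is currently the most used search engine.

In this post, we will build a simple REST application called The Quotes Database, which will allow us to store and search as many quotes as we want.

I've prepared a JSON file containing 5000+ quotes with their authors; we'll use it as our starting data for populating ElasticSearch.

You can find the repository for this project here.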

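Setting up Docker

First of all, we don't want to install ElasticSearch on our machine. We'll use Docker to orchestrate both the Node.js server and the ES instance in containers, which will allow us to deploy a production-ready application with all the dependencies it needs.

Let's create a Dockerfile inside our project root folder: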

```dockerfile
FROM node:10.15.3-alpine

WORKDIR /usr/src/app

COPY package*.json ./

RUN npm install
RUN npm install -g pm2

COPY . ./

EXPOSE 3000
EXPOSE 9200

CMD npm run start
```

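As you can see, we're telling Docker that we'll run a Node.js 10.15.3-alpine runtime. We also create a new working directory, /usr/src/app, into which we copy both the package.json and the package-lock.json files. That way, Docker can run npm install inside our WORKDIR, installing the dependencies we need.

We'll also install PM2 by running RUN npm install -g pm2. The Node.js runtime is single-threaded, so if the process crashes, the entire app has to be restarted… PM2 checks the Node.js process status and restarts it whenever the app goes down for any reason.

After installing PM2, we copy our codebase into the WORKDIR (COPY . ./) and tell Docker about two ports: 3000, which will serve our RESTful service, and 9200, the ElasticSearch port (EXPOSE 3000 and EXPOSE 9200). Keep in mind that EXPOSE only documents which ports are meant to be used; the ports are actually published by the mappings in docker-compose.yml, and ElasticSearch itself runs in a separate container.

Last but not least, we tell Docker which command starts the Node.js app: npm run start.

Setting up docker-compose

Now you may be saying, "Great, I get it! But how do I handle the ElasticSearch instance inside Docker? I can't find it in my Dockerfile!"… and you'd be right! That's where docker-compose becomes useful. It allows us to orchestrate multiple Docker containers and create connections between them. So, let's write the docker-compose.yml file, which will be stored in our project root directory: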

```yaml
version: '3.6'
services:
  api:
    image: node:10.15.3-alpine
    container_name: tqd-node
    build: .
    ports:
      - 3000:3000
    environment:
      - NODE_ENV=local
      - ES_HOST=elasticsearch
      - NODE_PORT=3000
      - ELASTIC_URL=http://elasticsearch:9200
    volumes:
      - .:/usr/src/app/quotes
    command: npm run start
    links:
      - elasticsearch
    depends_on:
      - elasticsearch
    networks:
      - esnet
  elasticsearch:
    container_name: tqd-elasticsearch
    image: docker.elastic.co/elasticsearch/elasticsearch:7.0.1
    volumes:
      - esdata:/usr/share/elasticsearch/data
    environment:
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
      - discovery.type=single-node
    logging:
      driver: none
    ports:
      - 9300:9300
      - 9200:9200
    networks:
      - esnet
volumes:
  esdata:
networks:
  esnet:
```

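This is a bit more complex than our Dockerfile, but let's break it down:

- We declare which version of the docker-compose.yml file format we're using (3.6).
- We declare our services:
  - api, which is our Node.js app. Just like our Dockerfile, it uses the node:10.15.3-alpine image. We also give this container a name, tqd-node, and we build it from the Dockerfile we created earlier with the build: . directive.
  - We need to expose port 3000, so we write 3000:3000. This maps port 3000 on our machine to port 3000 inside the container. Then we set some environment variables. The value elasticsearch refers to the elasticsearch service in our docker-compose.yml file: on a shared network, Docker resolves service names as hostnames.
  - We also mount a volume at /usr/src/app/quotes, bind-mounting our project directory (.) into the container, so its contents survive container restarts.
  - Once again, we tell Docker which command to run once the container starts, then we set a link to the elasticsearch service. We also tell Docker to start the api service after the elasticsearch service (using the depends_on directive). Note that depends_on only waits for the ElasticSearch container to start, not for ElasticSearch to be ready to accept requests; we'll handle that in our Node.js code.
  - Last but not least, we connect the api service to the esnet network. Because the api and elasticsearch services share the same network, they can reach each other by service name and port.
  - elasticsearch, which (as you may have guessed) is our ES service. Its configuration is pretty similar to the api service's. We just turn off its verbose logs by setting the logging directive to driver: none.
- We also declare the volume where we store ES data, esdata.
- We declare our network, esnet.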
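Inside the containers, ELASTIC_URL points at the elasticsearch service name, while a Node.js process started directly on the host would instead reach the published port on localhost. The sketch below shows that fallback order; resolveElasticUrl is a hypothetical helper (elastic.js only reads ELASTIC_URL), and deriving a URL from ES_HOST with port 9200 is an assumption.

```javascript
// Hypothetical helper (not part of the original project): pick the
// ElasticSearch URL from the environment variables set in docker-compose.yml.
function resolveElasticUrl(env) {
  if (env.ELASTIC_URL) {
    // e.g. "http://elasticsearch:9200" inside the Compose network
    return env.ELASTIC_URL;
  }
  if (env.ES_HOST) {
    // Assumption: ES listens on its default port 9200 on that host
    return `http://${env.ES_HOST}:9200`;
  }
  // Running Node.js directly on the host, talking to the published port
  return "http://localhost:9200";
}

console.log(resolveElasticUrl(process.env));
```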

Bootstrapping the Node.js app

Now we need to create our Node.js app, so let's get started by setting up our package.json file:

```bash
npm init -y
```

Now we need to install some dependencies:

```bash
npm install @elastic/elasticsearch body-parser cors dotenv express
```

Great! Our package.json file should look like this:

```json
{
  "name": "nodejselastic",
  "version": "1.0.0",
  "description": "",
  "main": "index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "keywords": [],
  "author": "",
  "license": "ISC",
  "dependencies": {
    "@elastic/elasticsearch": "^7.3.0",
    "body-parser": "^1.19.0",
    "cors": "^2.8.5",
    "dotenv": "^8.0.0",
    "express": "^4.17.1"
  }
}
```

Let's implement our ElasticSearch connector in Node.js. First, we need to create a new /src/elastic.js file:

```javascript
const { Client } = require("@elastic/elasticsearch");
require("dotenv").config();

const elasticUrl = process.env.ELASTIC_URL || "http://localhost:9200";
const esclient   = new Client({ node: elasticUrl });
const index      = "quotes";
const type       = "quotes";
```

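As you can see, here we set some very useful constants. First of all, we create a new connection to ElasticSearch using its official Node.js SDK; then we define an index ("quotes") and an index type ("quotes" again; we'll see what they mean later on).

Now we need to create an index on ElasticSearch. You can think of an "index" as the equivalent of an SQL "database". ElasticSearch is a NoSQL database, which means that it has no tables; it just stores JSON documents. An index is a logical namespace that maps to one or more primary shards and can have zero or more replica shards. You can read more about ElasticSearch indexes here.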
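Since our docker-compose.yml runs a single ElasticSearch node, it's worth knowing that ElasticSearch never allocates a replica shard on the same node as its primary. With the ElasticSearch 7 defaults (one primary shard, one replica), the quotes index therefore reports a yellow health status on this cluster. Shard and replica counts can be passed as settings when the index is created; buildCreateIndexParams is a hypothetical helper, and zero replicas is an assumption that only makes sense for local development.

```javascript
// Hypothetical helper: build the parameters for esclient.indices.create()
// with explicit shard and replica counts instead of the cluster defaults.
function buildCreateIndexParams(index, { shards = 1, replicas = 0 } = {}) {
  return {
    index,
    body: {
      settings: {
        number_of_shards: shards,    // primary shards: fixed once the index exists
        number_of_replicas: replicas // replicas: can be changed later
      }
    }
  };
}

const params = buildCreateIndexParams("quotes");
console.log(JSON.stringify(params));
// Inside createIndex() one could then call: await esclient.indices.create(params);
```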

Now let's define a function that creates the index:

```javascript
async function createIndex(index) {
  try {
    await esclient.indices.create({ index });
    console.log(`Created index ${index}`);
  } catch (err) {
    console.error(`An error occurred while creating the index ${index}:`);
    console.error(err);
  }
}
```

Now we need another function that creates the mapping for our quotes. The mapping defines our documents' schema and field types:

```javascript
async function setQuotesMapping() {
  try {
    const schema = {
      quote: {
        type: "text"
      },
      author: {
        type: "text"
      }
    };

    await esclient.indices.putMapping({
      index,
      type,
      include_type_name: true,
      body: {
        properties: schema
      }
    });

    console.log("Quotes mapping created successfully");
  } catch (err) {
    console.error("An error occurred while setting the quotes mapping:");
    console.error(err);
  }
}
```

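As you can see, we're defining the schema for our documents and applying it to our index. Since mapping types are deprecated in ElasticSearch 7, we also pass include_type_name: true so that the request can still target our quotes type.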
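Both fields are mapped as text, which ElasticSearch analyzes (splits into lowercase terms) so they can be searched. An analyzed field is not suited to exact matching or aggregations such as counting quotes per author. A common pattern is a multi-field: keep the text field and add a keyword sub-field. The schema below is a sketch of that idea, not the author's mapping; the raw sub-field name and the buildPutMappingParams helper are hypothetical.

```javascript
// Alternative quotes mapping (an assumption, not the author's schema):
// "author" stays full-text searchable, plus an exact "author.raw" keyword
// sub-field usable for sorting, aggregations or exact filters.
const quotesSchema = {
  quote: { type: "text" },
  author: {
    type: "text",
    fields: {
      raw: { type: "keyword" } // not analyzed: "Dalai Lama" is stored as one term
    }
  }
};

// Same call shape as setQuotesMapping() (ES 7.x with a custom mapping type).
function buildPutMappingParams(index, type, properties) {
  return { index, type, include_type_name: true, body: { properties } };
}

console.log(JSON.stringify(buildPutMappingParams("quotes", "quotes", quotesSchema), null, 2));
```

Like the original mapping, this would have to be applied before the quotes are indexed.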

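Now, keep in mind that ElasticSearch is a big system and can take a few seconds to start up. We can't connect to ES before it's ready, so we need a function that checks when the ES server is ready: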

```javascript
async function checkConnection() {
  console.log("Checking connection to ElasticSearch...");
  let isConnected = false;
  while (!isConnected) {
    try {
      await esclient.cluster.health({});
      console.log("Successfully connected to ElasticSearch");
      isConnected = true;
    } catch (_) {
      // ES is not ready yet: wait one second before retrying,
      // instead of hammering it in a tight loop
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  }
  return true;
}
```

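As you can see, checkConnection returns a Promise. When our main function awaits it, main (not the whole Node.js process) is suspended until the Promise resolves, and that only happens once ES answers the health check. That way, our app waits for ES before it creates indexes or starts the server.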
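The connection check keeps retrying until ElasticSearch answers, with no upper bound: if ES never comes up (for example, because it ran out of memory), the API container just hangs. A bounded retry loop is a common alternative. waitFor below is a hypothetical helper, and the 30 attempts / 1 second defaults are assumptions.

```javascript
// Hypothetical generic helper: call `check` until it resolves, waiting
// `delayMs` between attempts, and give up after `retries` failures.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function waitFor(check, { retries = 30, delayMs = 1000 } = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      await check();
      return attempt; // number of attempts it took
    } catch (err) {
      if (attempt === retries) {
        throw new Error(`Gave up after ${retries} attempts: ${err.message}`);
      }
      await sleep(delayMs);
    }
  }
  throw new Error("retries must be at least 1");
}

// Usage sketch in elastic.js (assumption):
// await waitFor(() => esclient.cluster.health({}));
```

Failing loudly lets Docker (or PM2) restart the process instead of leaving it stuck.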

We're done with ElasticSearch! Now let's export our functions:

```javascript
module.exports = {
  esclient,
  setQuotesMapping,
  checkConnection,
  createIndex,
  index,
  type
};
```

Great! Let's look at the entire elastic.js file:

```javascript
const { Client } = require("@elastic/elasticsearch");
require("dotenv").config();

const elasticUrl = process.env.ELASTIC_URL || "http://localhost:9200";
const esclient   = new Client({ node: elasticUrl });
const index      = "quotes";
const type       = "quotes";

/**
 * @function createIndex
 * @param {string} index
 * @returns {Promise<void>}
 * @description Creates an index in ElasticSearch.
 */
async function createIndex(index) {
  try {
    await esclient.indices.create({ index });
    console.log(`Created index ${index}`);
  } catch (err) {
    console.error(`An error occurred while creating the index ${index}:`);
    console.error(err);
  }
}

/**
 * @function setQuotesMapping
 * @returns {Promise<void>}
 * @description Sets the quotes mapping to the database.
 */
async function setQuotesMapping() {
  try {
    const schema = {
      quote: {
        type: "text"
      },
      author: {
        type: "text"
      }
    };

    await esclient.indices.putMapping({
      index,
      type,
      include_type_name: true,
      body: {
        properties: schema
      }
    });

    console.log("Quotes mapping created successfully");
  } catch (err) {
    console.error("An error occurred while setting the quotes mapping:");
    console.error(err);
  }
}

/**
 * @function checkConnection
 * @returns {Promise<boolean>}
 * @description Waits until the client can reach ElasticSearch.
 */
async function checkConnection() {
  console.log("Checking connection to ElasticSearch...");
  let isConnected = false;
  while (!isConnected) {
    try {
      await esclient.cluster.health({});
      console.log("Successfully connected to ElasticSearch");
      isConnected = true;
    } catch (_) {
      // ES is not ready yet: wait one second before retrying
      await new Promise((resolve) => setTimeout(resolve, 1000));
    }
  }
  return true;
}

module.exports = {
  esclient,
  setQuotesMapping,
  checkConnection,
  createIndex,
  index,
  type
};
```

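Populating ElasticSearch with quotes

Now we need to populate our ES instance with our quotes. This may sound easy, but believe me, it can be tricky.

Let's create a new file, /src/data/index.js: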

```javascript
const elastic = require("../elastic");
const quotes  = require("./quotes.json");

const esAction = {
  index: {
    _index: elastic.index,
    _type: elastic.type
  }
};
```

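As you can see, we're importing the elastic module we just created and our quotes from a JSON file stored in /src/data/quotes.json. We're also creating an object called esAction, which will tell ES how to index a document once we insert it.

Now we need a script to populate our database. We also need to create an array of objects with the following structure: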

```javascript
[
  {
    index: {
      _index: elastic.index,
      _type:  elastic.type
    }
  },
  {
    author: "quote author",
    quote:  "quote"
  },
  // ...one action object followed by one document for every quote
]
```

As you can see, each quote we insert has to be preceded by an action object that tells ElasticSearch where to index it. That's what we'll do:

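```javascript
async function populateDatabase() {
  const docs = [];
  for (const quote of quotes) {
    docs.push(esAction); // tells ES where to index the next document
    docs.push(quote);    // the document itself
  }
  return elastic.esclient.bulk({ body: docs });
}

module.exports = {
  populateDatabase
};
```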
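Sending all 5000+ quotes in a single bulk request works for this dataset, but as the data grows it is common to split the payload into batches. The sketch below builds one bulk body per batch; buildBulkBodies is a hypothetical helper, and the batch size of 1000 documents is an arbitrary default.

```javascript
// Hypothetical helper: turn a documents array into several bulk bodies of at
// most `batchSize` documents each (two array entries per document).
function buildBulkBodies(docs, action, batchSize = 1000) {
  const bodies = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    const body = [];
    for (const doc of docs.slice(i, i + batchSize)) {
      body.push(action); // action/metadata entry: where to index the document
      body.push(doc);    // the document source itself
    }
    bodies.push(body);
  }
  return bodies;
}

// Usage sketch inside populateDatabase() (assumption):
// for (const body of buildBulkBodies(quotes, esAction)) {
//   await elastic.esclient.bulk({ body });
// }
```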

Great! Now let's create our main file, /src/main.js, and see how we'll put together everything we've written so far:

```javascript
const elastic = require("./elastic");
const data    = require("./data");
require("dotenv").config();

(async function main() {
  const isElasticReady = await elastic.checkConnection();
  if (isElasticReady) {
    const elasticIndex = await elastic.esclient.indices.exists({ index: elastic.index });

    if (!elasticIndex.body) {
      await elastic.createIndex(elastic.index);
      await elastic.setQuotesMapping();
      await data.populateDatabase();
    }
  }
})();
```

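Let's analyze the code above. We create a self-executing main function that checks the ES connection; code execution won't proceed until ES is connected. When ES is ready, we check whether the quotes index exists. If it doesn't, we create it, set its mapping, and populate the database. Obviously, this only happens the first time we start our server!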
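main() awaits data.populateDatabase() but never looks at what it returns. The bulk API does not fail as a whole when individual documents are rejected: the response body reports errors: true and carries a per-document status in items. A minimal sketch of checking that, assuming the { body } response shape of the v7 @elastic/elasticsearch client; collectBulkErrors is a hypothetical helper.

```javascript
// Hypothetical helper: collect the documents that ElasticSearch rejected
// from a bulk response body ({ took, errors, items }).
function collectBulkErrors(bulkBody) {
  if (!bulkBody.errors) {
    return []; // every item was indexed successfully
  }
  return bulkBody.items
    .map((item, position) => {
      // Each item is keyed by its action name: index, create, update or delete
      const [action, result] = Object.entries(item)[0];
      return { position, action, status: result.status, error: result.error };
    })
    .filter((entry) => entry.error !== undefined);
}

// Usage sketch in main.js (assumption):
// const { body } = await data.populateDatabase();
// const failed = collectBulkErrors(body);
// if (failed.length > 0) console.error(`${failed.length} quotes were not indexed`);
```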

Creating a RESTful API

Now we need to create our RESTful server. We'll be using Express.js, the most popular Node.js framework for building servers.

We'll start with the /src/server/index.js file:

```javascript
const express    = require("express");
const cors       = require("cors");
const bodyParser = require("body-parser");
const routes     = require("./routes");
require("dotenv").config();

const app  = express();
const port = process.env.NODE_PORT || 3000;

function start() {
  return app.use(cors())
    .use(bodyParser.urlencoded({ extended: false }))
    .use(bodyParser.json())
    .use("/quotes", routes)
    .use((_req, res) => res.status(404).json({ success: false, error: "Route not found" }))
    .listen(port, () => console.log(`Server ready on port ${port}`));
}

module.exports = {
  start
};
```

As you can see, it's just a standard Express.js server, so we won't spend much time on it.

Let's look at the /src/server/routes/index.js file:

```javascript
const express    = require("express");
const controller = require("../controllers");
const routes     = express.Router();

routes.route("/").get(controller.getQuotes);
routes.route("/new").post(controller.addQuote);

module.exports = routes;
```

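We create just two endpoints:

- GET /, which returns a list of quotes matching our query-string parameters.
- POST /new, which lets us post a new quote and store it in ElasticSearch.

Since the router is mounted under /quotes, the full paths are /quotes and /quotes/new.

Now let's look at our /src/server/controllers/index.js file: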

```javascript
const model = require("../models");

async function getQuotes(req, res) {
  const query = req.query;
  if (!query.text) {
    res.status(422).json({
      error: true,
      data: "Missing required parameter: text"
    });
    return;
  }
  try {
    const result = await model.getQuotes(req.query);
    res.json({ success: true, data: result });
  } catch (err) {
    res.status(500).json({ success: false, error: "Unknown error." });
  }
}

async function addQuote(req, res) {
  const body = req.body;
  if (!body.quote || !body.author) {
    res.status(422).json({
      error: true,
      data: "Missing required parameter(s): 'quote' or 'author'"
    });
    return;
  }
  try {
    const result = await model.insertNewQuote(body.quote, body.author);
    res.json({
      success: true,
      data: {
        id:     result.body._id,
        author: body.author,
        quote:  body.quote
      }
    });
  } catch (err) {
    res.status(500).json({ success: false, error: "Unknown error." });
  }
}

module.exports = {
  getQuotes,
  addQuote
};
```

Here we're basically defining two functions:

- getQuotes, which requires at least one query-string parameter: text
- addQuote, which requires two parameters: author and quote
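The controller checks rely on plain truthiness, so a quote made only of spaces would pass. A small helper that reports exactly which fields are missing could look like this; missingFields is hypothetical and treats non-string or whitespace-only values as missing.

```javascript
// Hypothetical helper: list which required fields are missing or blank,
// so the 422 response can name them exactly.
function missingFields(payload, required) {
  return required.filter((field) => {
    const value = payload ? payload[field] : undefined;
    return typeof value !== "string" || value.trim() === "";
  });
}

// Usage sketch inside addQuote() (assumption):
// const missing = missingFields(req.body, ["quote", "author"]);
// if (missing.length > 0) {
//   return res.status(422).json({
//     error: true,
//     data: `Missing required parameter(s): ${missing.join(", ")}`
//   });
// }
```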

The ElasticSearch interface is delegated to our /src/server/models/index.js file. This structure helps us keep an MVC-ish architecture.

Let's look at our model:

```javascript
const { esclient, index, type } = require("../../elastic");

async function getQuotes(req) {
  const query = {
    query: {
      match: {
        quote: {
          query: req.text,
          operator: "and",
          fuzziness: "auto"
        }
      }
    }
  };

  // page is 1-based, while ElasticSearch's `from` is the number of hits to skip
  const limit = Math.max(Number(req.limit) || 100, 1);
  const page  = Math.max(Number(req.page) || 1, 1);

  const { body: { hits } } = await esclient.search({
    from:  (page - 1) * limit,
    size:  limit,
    index: index,
    type:  type,
    body:  query
  });

  const results = hits.total.value;
  const values  = hits.hits.map((hit) => {
    return {
      id:     hit._id,
      quote:  hit._source.quote,
      author: hit._source.author,
      score:  hit._score
    };
  });

  return {
    results,
    values
  };
}
```

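As you can see, we compose our ElasticSearch query as a match query on the quote field. With operator: "and", every word of the search text must appear in the quote, and fuzziness: "auto" tolerates small typos, allowing more edits for longer words.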
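The query only looks at the quote field. If you also wanted to find quotes by author name, a multi_match query over several fields is one option. The builders below are hypothetical: the first reproduces the model's query, the second is an alternative, and the ^2 boost is an arbitrary choice.

```javascript
// Reproduces the model's match query on the quote field.
function buildQuoteQuery(text) {
  return {
    query: {
      match: {
        quote: { query: text, operator: "and", fuzziness: "auto" }
      }
    }
  };
}

// Alternative (an assumption, not the author's code): search quote and author,
// weighing quote matches twice as much. With multi_match's default
// best_fields type, operator "and" applies within each single field.
function buildQuoteOrAuthorQuery(text) {
  return {
    query: {
      multi_match: {
        query: text,
        fields: ["quote^2", "author"], // ^2 boosts scores of quote matches
        operator: "and",
        fuzziness: "auto"
      }
    }
  };
}

console.log(JSON.stringify(buildQuoteOrAuthorQuery("love"), null, 2));
```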

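Then we run the query with the page and limit values, which we can set in the query string, for instance http://localhost:3000/quotes?text=love&page=1&limit=100. If these values are not passed via the query string, we fall back to their default values.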
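ElasticSearch's from parameter is an offset (how many hits to skip), not a page number, and Express delivers query-string values as strings. Converting a 1-based page into from/size could look like the sketch below; toFromSize and its cap of 100 results per page are assumptions. Note also that from + size cannot exceed the index.max_result_window setting (10,000 by default), so very deep pagination needs another approach, such as search_after.

```javascript
// Hypothetical helper: turn 1-based `page` and `limit` query-string values
// into ElasticSearch's from/size, with defaults and bounds.
function toFromSize(page, limit, { defaultLimit = 100, maxLimit = 100 } = {}) {
  // Invalid or missing limit -> default; clamp to [1, maxLimit]
  const size = Math.min(Math.max(parseInt(limit, 10) || defaultLimit, 1), maxLimit);
  // Invalid, missing or non-positive page -> first page
  const pageNumber = Math.max(parseInt(page, 10) || 1, 1);
  return { from: (pageNumber - 1) * size, size };
}

console.log(toFromSize("2", "3"));
```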

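ElasticSearch returns a large amount of data, but we only need four things:

- Quote ID
- The quote itself
- Quote author
- Score

The score represents how relevant the quote is to our search terms. Once we have these values, we return them along with the total number of results, which can be useful when paginating results on the front end.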
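The hit-to-result mapping can be pulled into a pure function, which also makes it testable without a running cluster. In ElasticSearch 7, hits.total is an object { value, relation }, and by default the count is exact only up to 10,000 hits (beyond that, relation is "gte"); in ElasticSearch 6 it was a plain number. formatHits is a hypothetical helper that accepts both shapes.

```javascript
// Hypothetical helper mirroring the mapping in getQuotes(): keep only
// id, quote, author and score from each hit, plus the total count.
function formatHits(hits) {
  // ES 7 reports the total as { value, relation }; ES 6 as a plain number.
  const results = typeof hits.total === "number" ? hits.total : hits.total.value;
  const values = hits.hits.map((hit) => ({
    id: hit._id,
    quote: hit._source.quote,
    author: hit._source.author,
    score: hit._score
  }));
  return { results, values };
}
```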

Now we need to create the last function for our model, insertNewQuote:

```javascript
async function insertNewQuote(quote, author) {
  return esclient.index({
    index,
    type,
    body: {
      quote,
      author
    }
  });
}
```

This function is really simple: we just post the quote and its author to our index and return the query result to the controller.
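ElasticSearch is near real-time: a newly indexed document becomes visible to searches only after the index is refreshed, which by default happens about once per second. So a GET /quotes issued right after POST /quotes/new may not find the new quote yet. The index API accepts a refresh parameter; the sketch below (buildInsertParams is hypothetical) shows how refresh: "wait_for" could be passed, trading slower writes for read-after-write visibility.

```javascript
// Hypothetical variant of insertNewQuote()'s request parameters: optionally
// ask ElasticSearch to wait for the next refresh before answering.
function buildInsertParams(index, type, quote, author, { waitForRefresh = false } = {}) {
  const params = { index, type, body: { quote, author } };
  if (waitForRefresh) {
    params.refresh = "wait_for"; // slower writes, but the quote is searchable on return
  }
  return params;
}

// Usage sketch (assumption):
// return esclient.index(buildInsertParams(index, type, quote, author, { waitForRefresh: true }));
```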

Now the complete /src/server/models/index.js file should look like this:

```javascript
const { esclient, index, type } = require("../../elastic");

async function getQuotes(req) {
  const query = {
    query: {
      match: {
        quote: {
          query: req.text,
          operator: "and",
          fuzziness: "auto"
        }
      }
    }
  };

  // page is 1-based, while ElasticSearch's `from` is the number of hits to skip
  const limit = Math.max(Number(req.limit) || 100, 1);
  const page  = Math.max(Number(req.page) || 1, 1);

  const { body: { hits } } = await esclient.search({
    from:  (page - 1) * limit,
    size:  limit,
    index: index,
    type:  type,
    body:  query
  });

  const results = hits.total.value;
  const values  = hits.hits.map((hit) => {
    return {
      id:     hit._id,
      quote:  hit._source.quote,
      author: hit._source.author,
      score:  hit._score
    };
  });

  return {
    results,
    values
  };
}

async function insertNewQuote(quote, author) {
  return esclient.index({
    index,
    type,
    body: {
      quote,
      author
    }
  });
}

module.exports = {
  getQuotes,
  insertNewQuote
};
```

And we're done! We just need to set up our start script inside our package.json file and we're ready to go:

```json
"scripts": {
  "start": "pm2-runtime start ./src/main.js --name node_app",
  "stop": "pm2-runtime stop node_app"
}
```

We also need to update our /src/main.js script in order to start our Express.js server once ElasticSearch is connected:

```javascript
const elastic = require("./elastic");
const server  = require("./server");
const data    = require("./data");
require("dotenv").config();

(async function main() {
  const isElasticReady = await elastic.checkConnection();

  if (isElasticReady) {
    const elasticIndex = await elastic.esclient.indices.exists({ index: elastic.index });

    if (!elasticIndex.body) {
      await elastic.createIndex(elastic.index);
      await elastic.setQuotesMapping();
      await data.populateDatabase();
    }

    server.start();
  }
})();
```

Launching the application

We're now ready to launch our application using docker-compose!

Just run the following command:

```bash
docker-compose up
```

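You'll need to wait until Docker downloads both the ElasticSearch and Node.js images; then it will boot up your server and you'll be ready to query your REST endpoints!

Let's test a couple of cURL calls: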

```bash
curl "localhost:3000/quotes?text=love&limit=3"
```

```json
{
  "success": true,
  "data": {
    "results": 716,
    "values": [
      {
        "id": "JDE3kGwBuLHMiUvv1itT",
        "quote": "There is only one happiness in life, to love and be loved.",
        "author": "George Sand",
        "score": 6.7102118
      },
      {
        "id": "JjE3kGwBuLHMiUvv1itT",
        "quote": "Live through feeling and you will live through love. For feeling is the language of the soul, and feeling is truth.",
        "author": "Matt Zotti",
        "score": 6.2868223
      },
      {
        "id": "NTE3kGwBuLHMiUvv1iFO",
        "quote": "Genuine love should first be directed at oneself if we do not love ourselves, how can we love others?",
        "author": "Dalai Lama",
        "score": 5.236455
      }
    ]
  }
}
```

The URL is quoted because, unquoted, the shell would treat & as its background operator and the limit parameter would never reach the API.

As you can see, we decided to limit our results to 3, but there are 716 matching quotes in total!

We can easily get the next three quotes by calling:

```bash
curl "localhost:3000/quotes?text=love&limit=3&page=2"
```

```json
{
  "success": true,
  "data": {
    "results": 716,
    "values": [
      {
        "id": "SsyHkGwBrOFNsaVmePwE",
        "quote": "Forgiveness is choosing to love. It is the first skill of self-giving love.",
        "author": "Mohandas Gandhi",
        "score": 4.93597
      },
      {
        "id": "rDE3kGwBuLHMiUvv1idS",
        "quote": "Neither a lofty degree of intelligence nor imagination nor both together go to the making of genius. Love, love, love, that is the soul of genius.",
        "author": "Wolfgang Amadeus Mozart",
        "score": 4.7821507
      },
      {
        "id": "TjE3kGwBuLHMiUvv1h9K",
        "quote": "Speak low, if you speak love.",
        "author": "William Shakespeare",
        "score": 4.6697206
      }
    ]
  }
}
```
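When calling the API from code rather than from the shell, a search text containing spaces or & must be percent-encoded, or it would be split into separate query parameters. A minimal sketch using Node's built-in WHATWG URL and URLSearchParams classes; buildQuotesUrl is a hypothetical helper.

```javascript
// Hypothetical helper: build the search URL with URLSearchParams, which
// percent-encodes the text (spaces, &, accents) for us.
function buildQuotesUrl(baseUrl, { text, page, limit }) {
  const url = new URL("/quotes", baseUrl);
  url.searchParams.set("text", text);
  if (page !== undefined) url.searchParams.set("page", String(page));
  if (limit !== undefined) url.searchParams.set("limit", String(limit));
  return url.toString();
}

console.log(buildQuotesUrl("http://localhost:3000", { text: "love", page: 2, limit: 3 }));
// prints "http://localhost:3000/quotes?text=love&page=2&limit=3"
```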

What if you need to insert a new quote? Just call the /quotes/new endpoint!

```bash
curl --request POST \
  --url http://localhost:3000/quotes/new \
  --header 'content-type: application/json' \
  --data '{
    "author": "Michele Riva",
    "quote": "Using Docker and ElasticSearch is challenging, but totally worth it."
  }'
```

And the response will be:

```json
{
  "success": true,
  "data": {
    "id": "is2QkGwBrOFNsaVmFAi8",
    "author": "Michele Riva",
    "quote": "Using Docker and ElasticSearch is challenging, but totally worth it."
  }
}
```

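Conclusion

Docker makes it incredibly easy to manage our dependencies and their deployment. From that point on, we could easily host our application on Heroku, AWS ECS, Google Cloud Container, or any other Docker-based service, without struggling to set up our server with super-complex configurations.

What's next?

- Learn how to use Kubernetes to scale your containers and orchestrate more ElasticSearch instances.
- Create a new endpoint that allows you to update an existing quote. Mistakes can happen!
- What about deleting a quote? How would you implement that endpoint?
- It would be great to save your quotes with tags (for instance, quotes about love, health, art)… try updating your quotes index!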
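As a starting point for the delete exercise, here is one possible shape for a DELETE /quotes/:id endpoint. It is a sketch, not part of the original project: makeDeleteQuote and the route are hypothetical, and it assumes the v7 @elastic/elasticsearch client, whose delete({ index, id }) call rejects with an error exposing the HTTP status as err.meta.statusCode when the document does not exist. The client is passed in so the handler can be exercised with a fake.

```javascript
// Hypothetical DELETE /quotes/:id handler (not part of the original project).
// The ElasticSearch client is injected so the handler can be tested with a fake.
function makeDeleteQuote(client, index) {
  return async function deleteQuote(req, res) {
    try {
      await client.delete({ index, id: req.params.id });
      res.json({ success: true, data: { id: req.params.id } });
    } catch (err) {
      // Assumption: the v7 client attaches the HTTP response to err.meta
      if (err.meta && err.meta.statusCode === 404) {
        res.status(404).json({ success: false, error: "Quote not found" });
        return;
      }
      res.status(500).json({ success: false, error: "Unknown error." });
    }
  };
}

// Wiring sketch in routes/index.js (assumption):
// routes.route("/:id").delete(makeDeleteQuote(esclient, index));
```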

Software development is fun. With Docker, Node, and ElasticSearch, it's even better!

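The post Full-text search with Node.js and ElasticSearch on Docker appeared first on LogRocket Blog.

Source: compiled by 0x资讯 from DEV. Original article: https://dev.to/bnevilleoneill/full-text-search-with-node-js-and-elasticsearch-on-docker-146k (copyright belongs to the author; do not reproduce without permission).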