Using NiFi (Part 4): User Guide (Translation)

Introduction

Apache NiFi is a dataflow system based on the concepts of flow-based programming. It supports powerful and scalable directed graphs of data routing, transformation, and system mediation logic. NiFi has a web-based user interface for design, control, feedback, and monitoring of dataflows. It is highly configurable along several dimensions of quality of service, such as loss-tolerant versus guaranteed delivery, low latency versus high throughput, and priority-based queuing. NiFi provides fine-grained data provenance for all data received, forked, joined, cloned, modified, sent, and ultimately dropped upon reaching its configured end-state.

See the System Administrator’s Guide for information about system requirements, installation, and configuration. Once NiFi is installed, use a supported web browser to view the UI.

Browser Support

Browser    Version
Chrome     Current and Current - 1
Firefox    Current and Current - 1
Edge       Current and Current - 1
Safari     Current and Current - 1

Current and Current - 1 indicates that the UI is supported in the current stable release of that browser and the preceding one. For instance, if the current stable release is 45.X then the officially supported versions will be 45.X and 44.X.

For Safari, which releases major versions much less frequently, Current and Current - 1 simply represent the two latest releases.

The supported browser versions are driven by the capabilities the UI employs and the dependencies it uses. UI features will be developed and tested against the supported browsers. Any problem using a supported browser should be reported to Apache NiFi.

Unsupported Browsers

While the UI may run successfully in unsupported browsers, it is not actively tested against them. Additionally, the UI is designed as a desktop experience and is not currently supported in mobile browsers.

Viewing the UI in Variably Sized Browsers

In most environments, all of the UI is visible in your browser. However, the UI has a responsive design that allows you to scroll through screens as needed, in smaller sized browsers or tablet environments.

In environments where your browser width is less than 800 pixels and the height less than 600 pixels, portions of the UI may become unavailable.


Terminology

DataFlow Manager: A DataFlow Manager (DFM) is a NiFi user who has permissions to add, remove, and modify components of a NiFi dataflow.

FlowFile: The FlowFile represents a single piece of data in NiFi. A FlowFile is made up of two components: FlowFile Attributes and FlowFile Content. Content is the data that is represented by the FlowFile. Attributes are characteristics that provide information or context about the data; they are made up of key-value pairs. All FlowFiles have the following Standard Attributes:

  • uuid: A unique identifier for the FlowFile
  • filename: A human-readable filename that may be used when storing the data to disk or in an external service
  • path: A hierarchically structured value that can be used when storing data to disk or an external service so that the data is not stored in a single directory
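
The two-part structure described above can be pictured with a small Python sketch. This is only an illustration, not NiFi's actual implementation: the `FlowFile` class and the default values chosen for `filename` and `path` are assumptions made for the example.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Illustrative stand-in for a FlowFile: content plus key-value attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)

    def __post_init__(self):
        # Fill in the three Standard Attributes when the caller omits them.
        # The default values used here are assumptions for this sketch.
        self.attributes.setdefault("uuid", str(uuid.uuid4()))
        self.attributes.setdefault("filename", self.attributes["uuid"])
        self.attributes.setdefault("path", "./")


flowfile = FlowFile(content=b"hello", attributes={"filename": "greeting.txt"})
print(sorted(flowfile.attributes))  # ['filename', 'path', 'uuid']
```

Keeping the attributes in a plain dictionary mirrors the key-value description: the content can stay untouched while the attributes carry the context about it.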

Processor: The Processor is the NiFi component that is used to listen for incoming data; pull data from external sources; publish data to external sources; and route, transform, or extract information from FlowFiles.

Relationship: Each Processor has zero or more Relationships defined for it. These Relationships are named to indicate the result of processing a FlowFile. After a Processor has finished processing a FlowFile, it will route (or “transfer”) the FlowFile to one of the Relationships. A DFM is then able to connect each of these Relationships to other components in order to specify where the FlowFile should go next under each potential processing result.

Translator's note: a Relationship can also be thought of as a processing outcome, for example success or failure.

Connection: A DFM creates an automated dataflow by dragging components from the Components part of the NiFi toolbar to the canvas and then connecting the components together via Connections. Each connection consists of one or more Relationships. For each Connection that is drawn, a DFM can determine which Relationships should be used for the Connection. This allows data to be routed in different ways based on its processing outcome. Each connection houses a FlowFile Queue. When a FlowFile is transferred to a particular Relationship, it is added to the queue belonging to the associated Connection.
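
As a rough mental model of how Relationships, Connections, and queues fit together, here is a minimal Python sketch. The `Connection` class and `transfer` function are hypothetical names invented for this example, and the sketch simply places the FlowFile on every Connection that includes the Relationship, which is a simplification.

```python
from collections import deque


class Connection:
    """Hypothetical Connection: a set of Relationship names and one FlowFile queue."""

    def __init__(self, relationships):
        self.relationships = set(relationships)
        self.queue = deque()


def transfer(flowfile, relationship, connections):
    """Add the FlowFile to the queue of every Connection that includes the Relationship."""
    targets = [c for c in connections if relationship in c.relationships]
    for connection in targets:
        connection.queue.append(flowfile)
    return len(targets)


on_success = Connection({"success"})
on_problem = Connection({"failure", "retry"})
transfer("flowfile-1", "success", [on_success, on_problem])
transfer("flowfile-2", "failure", [on_success, on_problem])
print(list(on_success.queue), list(on_problem.queue))
```

A single Connection can carry more than one Relationship, which is why `on_problem` receives FlowFiles routed to either `failure` or `retry`.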

Controller Service: Controller Services are extension points that, after being added and configured by a DFM in the User Interface, will start up when NiFi starts up and provide information for use by other components (such as processors or other controller services). A common Controller Service used by several components is the StandardSSLContextService. It provides the ability to configure keystore and/or truststore properties once and reuse that configuration throughout the application. The idea is that, rather than configure this information in every processor that might need it, the controller service provides it for any processor to use as needed.

Reporting Task: Reporting Tasks run in the background to provide statistical reports about what is happening in the NiFi instance. The DFM adds and configures Reporting Tasks in the User Interface as desired. Common reporting tasks include the ControllerStatusReportingTask, MonitorDiskUsage reporting task, MonitorMemory reporting task, and the StandardGangliaReporter.

Funnel: A funnel is a NiFi component that is used to combine the data from several Connections into a single Connection.

Process Group: When a dataflow becomes complex, it often is beneficial to reason about the dataflow at a higher, more abstract level. NiFi allows multiple components, such as Processors, to be grouped together into a Process Group. The NiFi User Interface then makes it easy for a DFM to connect together multiple Process Groups into a logical dataflow, as well as allowing the DFM to enter a Process Group in order to see and manipulate the components within the Process Group.

Port: Dataflows that are constructed using one or more Process Groups need a way to connect a Process Group to other dataflow components. This is achieved by using Ports. A DFM can add any number of Input Ports and Output Ports to a Process Group and name these ports appropriately.

Remote Process Group: Just as data is transferred into and out of a Process Group, it is sometimes necessary to transfer data from one instance of NiFi to another. While NiFi provides many different mechanisms for transferring data from one system to another, Remote Process Groups are often the easiest way to accomplish this if transferring data to another instance of NiFi.

Bulletin: The NiFi User Interface provides a significant amount of monitoring and feedback about the current status of the application. In addition to rolling statistics and the current status provided for each component, components are able to report Bulletins. Whenever a component reports a Bulletin, a bulletin icon is displayed on that component. System-level bulletins are displayed on the Status bar near the top of the page. Using the mouse to hover over that icon will provide a tool-tip that shows the time and severity (Debug, Info, Warning, Error) of the Bulletin, as well as the message of the Bulletin. Bulletins from all components can also be viewed and filtered in the Bulletin Board Page, available in the Global Menu.

Template: Often times, a dataflow is comprised of many sub-flows that could be reused. NiFi allows DFMs to select a part of the dataflow (or the entire dataflow) and create a Template. This Template is given a name and can then be dragged onto the canvas just like the other components. As a result, several components may be combined together to make a larger building block from which to create a dataflow. These templates can also be exported as XML and imported into another NiFi instance, allowing these building blocks to be shared.

flow.xml.gz: Everything the DFM puts onto the NiFi User Interface canvas is written, in real time, to one file called flow.xml.gz. This file is located in the nifi/conf directory by default. Any change made on the canvas is automatically saved to this file, without the user needing to click a "save" button. In addition, NiFi automatically creates a backup copy of this file in the archive directory when it is updated. You can use these archived files to roll back the flow configuration. To do so, stop NiFi, replace flow.xml.gz with the desired backup copy, then restart NiFi. In a clustered environment, stop the entire NiFi cluster, replace the flow.xml.gz of one of the nodes, and restart that node. Remove flow.xml.gz from the other nodes. Once you have confirmed that the node starts up as a one-node cluster, start the other nodes. The replaced flow configuration will be synchronized across the cluster. The name and location of flow.xml.gz, and the auto-archive behavior, are configurable. See the System Administrator’s Guide for further details.
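
The single-node rollback steps above could be scripted roughly as follows. This is a sketch under stated assumptions: `restore_flow` is a hypothetical helper, stopping and restarting NiFi is left to the operator, and the extra `flow.xml.gz.before-rollback` safety copy is my own addition, not something NiFi does.

```python
import shutil
from pathlib import Path


def restore_flow(conf_dir, backup_file):
    """Replace conf_dir/flow.xml.gz with backup_file; call only while NiFi is stopped."""
    conf_dir, backup_file = Path(conf_dir), Path(backup_file)
    current = conf_dir / "flow.xml.gz"
    if not backup_file.is_file():
        raise FileNotFoundError(backup_file)
    if current.exists():
        # Keep the flow being replaced so the rollback itself can be undone.
        shutil.copy2(current, conf_dir / "flow.xml.gz.before-rollback")
    shutil.copy2(backup_file, current)
    return current
```

A call would look like `restore_flow("nifi/conf", "<path to the chosen archived copy>")`, followed by restarting NiFi. In a cluster, the remaining steps from the paragraph above (removing flow.xml.gz from the other nodes, then starting them) still have to be carried out.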

NiFi User Interface

The NiFi UI provides mechanisms for creating automated dataflows, as well as visualizing, editing, monitoring, and administering those dataflows. The UI can be broken down into several segments, each responsible for different functionality of the application. This section provides screenshots of the application and highlights the different segments of the UI. Each segment is discussed in further detail later in the document.

When the application is started, the user is able to navigate to the UI by going to the default address of http://<hostname>:8080/nifi in a web browser. There are no permissions configured by default, so anyone is able to view and modify the dataflow. For information on securing the system, see the System Administrator’s Guide.
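
For scripted checks, the default address can be built and probed with the Python standard library. This is a small sketch; `nifi_ui_url` and `ui_is_reachable` are hypothetical helper names, and the probe assumes an unsecured instance on the default port.

```python
import urllib.error
import urllib.request


def nifi_ui_url(hostname, port=8080):
    """Build the default, unsecured UI address for a NiFi host."""
    return f"http://{hostname}:{port}/nifi"


def ui_is_reachable(hostname, port=8080, timeout=5):
    """Return True if the UI address answers with HTTP 200 (performs a network request)."""
    try:
        with urllib.request.urlopen(nifi_ui_url(hostname, port), timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


print(nifi_ui_url("localhost"))  # http://localhost:8080/nifi
```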

When a DFM navigates to the UI for the first time, a blank canvas is provided on which a dataflow can be built:

NiFi Components Toolbar

The Components Toolbar runs across the top left portion of your screen. It consists of the components you can drag onto the canvas to build your dataflow. Each component is described in more detail in Building a DataFlow.

The Status Bar is under the Components Toolbar. The Status bar provides information about the number of threads that are currently active in the flow, the amount of data that currently exists in the flow, how many Remote Process Groups exist on the canvas in each state (Transmitting, Not Transmitting), how many Processors exist on the canvas in each state (Stopped, Running, Invalid, Disabled), how many versioned Process Groups exist on the canvas in each state (Up to date, Locally modified, Stale, Locally modified and stale, Sync failure) and the timestamp at which all of this information was last refreshed. Additionally, if the instance of NiFi is clustered, the Status bar shows how many nodes are in the cluster and how many are currently connected.

The Operate Palette sits to the left-hand side of the screen. It consists of buttons that are used by DFMs to manage the flow, as well as by administrators who manage user access and configure system properties, such as how many system resources should be provided to the application.

On the right side of the canvas are Search and the Global Menu. You can use Search to easily find components on the canvas, searching by component name, type, identifier, configuration properties, and their values. The Global Menu contains options that allow you to manipulate existing components on the canvas:

NiFi Global Menu

Additionally, the UI has some features that allow you to easily navigate around the canvas. You can use the Navigate Palette to pan around the canvas, and to zoom in and out. The “Birds Eye View” of the dataflow provides a high-level view of the dataflow and allows you to pan across large portions of the dataflow. You can also find breadcrumbs along the bottom of the screen. As you navigate into and out of Process Groups, the breadcrumbs show the depth in the flow, and each Process Group that you entered to reach this depth. Each of the Process Groups listed in the breadcrumbs is a link that will take you back up to that level in the flow.

NiFi Global Menu

Accessing the UI with Multi-Tenant Authorization

Multi-tenant authorization enables multiple groups of users (tenants) to command, control, and observe different parts of the dataflow, with varying levels of authorization. When an authenticated user attempts to view or modify a NiFi resource, the system checks whether the user has privileges to perform that action. These privileges are defined by policies that you can apply system wide or to individual components. What this means from a Dataflow Manager perspective is that once you have access to the NiFi canvas, a range of functionality is visible and available to you, depending on the privileges assigned to you.

The available global access policies are:

Policy: Privilege
  • view the UI: Allows users to view the UI
  • access the controller: Allows users to view and modify the controller, including reporting tasks, Controller Services, and nodes in the cluster
  • query provenance: Allows users to submit a provenance search and request event lineage
  • access restricted components: Allows users to create/modify restricted components, assuming other permissions are sufficient. The restricted components may indicate which specific permissions are required. Permissions can be granted for specific restrictions or be granted regardless of restrictions. If permission is granted regardless of restrictions, the user can create/modify all restricted components.
  • access all policies: Allows users to view and modify the policies for all components
  • access users/groups: Allows users to view and modify the users and user groups
  • retrieve site-to-site details: Allows other NiFi instances to retrieve Site-To-Site details
  • view system diagnostics: Allows users to view System Diagnostics
  • proxy user requests: Allows proxy machines to send requests on behalf of others
  • access counters: Allows users to view and modify counters

The available component-level access policies are:

Policy: Privilege
  • view the component: Allows users to view component configuration details
  • modify the component: Allows users to modify component configuration details
  • view provenance: Allows users to view provenance events generated by this component
  • view the data: Allows users to view metadata and content for this component in flowfile queues in outbound connections and through provenance events
  • modify the data: Allows users to empty flowfile queues in outbound connections and submit replays through provenance events
  • view the policies: Allows users to view the list of users who can view and modify a component
  • modify the policies: Allows users to modify the list of users who can view and modify a component
  • retrieve data via site-to-site: Allows a port to receive data from NiFi instances
  • send data via site-to-site: Allows a port to send data from NiFi instances
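
To make the distinction between global and component-level policies concrete, here is a minimal Python sketch of the privilege check described earlier in this section. The in-memory dictionaries, the user names, and the `is_authorized` function are all hypothetical; NiFi's real authorizer is configured by an administrator and is not modeled here.

```python
# Hypothetical in-memory policy store: policy name (and component id for
# component-level policies) mapped to the set of users granted that policy.
GLOBAL_POLICIES = {
    "view the UI": {"alice", "bob"},
    "access the controller": {"alice"},
}
COMPONENT_POLICIES = {
    ("processor-1", "view the component"): {"alice", "bob"},
    ("processor-1", "modify the component"): {"alice"},
}


def is_authorized(user, policy, component_id=None):
    """Return True if the user is listed under the global or component-level policy."""
    if component_id is None:
        return user in GLOBAL_POLICIES.get(policy, set())
    return user in COMPONENT_POLICIES.get((component_id, policy), set())


print(is_authorized("bob", "view the UI"))                          # True
print(is_authorized("bob", "modify the component", "processor-1"))  # False
```

In this toy setup bob can see the canvas and view processor-1 but cannot change it, which matches the idea that the functionality available to a user depends on the privileges assigned.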

If you are unable to view or modify a NiFi resource, contact your System Administrator or see Configuring Users and Access Policies in the System Administrator’s Guide for more information.

Logging In

If NiFi is configured to run securely, users will be able to request access to the DataFlow. For information on configuring NiFi to run securely, see the System Administrator’s Guide. If NiFi supports anonymous access, users will be given access accordingly and given an option to log in.

Clicking the 'login' link will open the log in page. If the user is logging in with their username/password they will be presented with a form to do so. If NiFi is not configured to support anonymous access and the user is logging in with their username/password, they will be immediately sent to the login form bypassing the canvas.

Log In

Building a DataFlow

A DFM is able to build an automated dataflow using the NiFi UI. Simply drag components from the toolbar to the canvas, configure the components to meet specific needs, and connect the components together.

Adding Components to the Canvas

The User Interface section above outlined the different segments of the UI and pointed out a Components Toolbar. This section looks at each of the Components in that toolbar:

Components

Processor

Processor: The Processor is the most commonly used component, as it is responsible for data ingress, egress, routing, and manipulating. There are many different types of Processors. In fact, this is a very common Extension Point in NiFi, meaning that many vendors may implement their own Processors to perform whatever functions are necessary for their use case. When a Processor is dragged onto the canvas, the user is presented with a dialog to choose which type of Processor to use:

Add Processor Dialog

In the top-right corner, the user is able to filter the list based on the Processor Type or the Tags associated with a Processor. Processor developers have the ability to add Tags to their Processors. These tags are used in this dialog for filtering and are displayed on the left-hand side in a Tag Cloud. The more Processors that exist with a particular Tag, the larger the Tag appears in the Tag Cloud. Clicking a Tag in the Cloud will filter the available Processors to only those that contain that Tag. If multiple Tags are selected, only those Processors that contain all of those Tags are shown. For example, if we want to show only those Processors that allow us to ingest files, we can select both the files Tag and the ingest Tag:
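
The "all selected Tags must match" rule and the Tag Cloud sizing can be sketched in a few lines of Python. The processor names below are real NiFi processors, but the tag sets attached to them and both function names are assumptions made for illustration.

```python
from collections import Counter

# Illustrative tag sets; the actual Tags are defined by each Processor's developer.
PROCESSORS = {
    "GetFile": {"files", "ingest", "local"},
    "PutFile": {"files", "local", "put"},
    "ListenHTTP": {"ingest", "http"},
}


def filter_by_tags(processors, selected_tags):
    """Keep only the Processors that carry every selected Tag."""
    wanted = set(selected_tags)
    return sorted(name for name, tags in processors.items() if wanted <= tags)


def tag_cloud(processors):
    """Count how many Processors carry each Tag; a larger count means a larger Tag."""
    return Counter(tag for tags in processors.values() for tag in tags)


print(filter_by_tags(PROCESSORS, ["files", "ingest"]))  # ['GetFile']
```

Selecting more Tags can only shrink the result, because each additional Tag is one more subset condition a Processor has to satisfy.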

Add Processor with Tag Cloud

Restricted components will be marked with a Restricted icon next to their name. These are components that can be used to execute arbitrary unsanitized code provided by the operator through the NiFi REST API/UI or can be used to obtain or alter data on the NiFi host system using the NiFi OS credentials. These components could be used by an otherwise authorized NiFi user to go beyond the intended use of the application, escalate privilege, or could expose data about the internals of the NiFi process or the host system. All of these capabilities should be considered privileged, and admins should be aware of these capabilities and explicitly enable them for a subset of trusted users. Before a user is allowed to create and modify restricted components they must be granted access. Hovering over the Restricted icon will display the specific permissions a restricted component requires. Permissions can be assigned regardless of restrictions. In this case, the user will have access to all restricted components. Alternatively, users can be assigned access to specific restrictions. If the user has been granted access to all restrictions a component requires, they will have access to that component assuming otherwise sufficient permissions. For more information refer to Accessing the UI with Multi-Tenant Authorization and Restricted Components in Versioned Flows.
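
The access rule for restricted components boils down to a subset check with one override. The following Python sketch assumes restrictions can be represented as plain strings; the function name and the restriction labels in the usage lines are illustrative, and the "otherwise sufficient permissions" condition from the text is deliberately left out.

```python
def can_use_restricted(required_restrictions, granted_restrictions, granted_regardless=False):
    """Decide access to a restricted component from the user's granted restrictions."""
    if granted_regardless:
        # Permission granted regardless of restrictions covers every restricted component.
        return True
    # Otherwise the user must hold every restriction the component requires.
    return set(required_restrictions) <= set(granted_restrictions)


required = {"read filesystem", "write filesystem"}
print(can_use_restricted(required, {"read filesystem"}))                      # False
print(can_use_restricted(required, {"read filesystem", "write filesystem"}))  # True
print(can_use_restricted(required, set(), granted_regardless=True))           # True
```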

Clicking the Add button or double-clicking on a Processor Type will add the selected Processor to the canvas at the location that it was dropped.

For any component added to the canvas, it is possible to select it with the mouse and move it anywhere on the canvas. Also, it is possible to select multiple items at once by either holding down the Shift key and selecting each item or by holding down the Shift key and dragging a selection box around the desired components.

Once you have dragged a Processor onto the canvas, you can interact with it by right-clicking on the Processor and selecting an option from the context menu. The options available to you from the context menu vary, depending on the privileges assigned to you.

Processor Menu

While the options available from the context menu vary, the following options are typically available when you have full privileges to work with a Processor:

  • Configure: This option allows the user to establish or change the configuration of the Processor (see Configuring a Processor).

For Processors, Ports, Remote Process Groups, Connections and Labels, it is possible to open the configuration dialog by double-clicking on the desired component.

  • Start or Stop: This option allows the user to start or stop a Processor; the option will be either Start or Stop, depending on the current state of the Processor.
  • Enable or Disable: This option allows the user to enable or disable a Processor; the option will be either Enable or Disable, depending on the current state of the Processor.
  • View data provenance: This option displays the NiFi Data Provenance table, with information about data provenance events for the FlowFiles routed through that Processor (see Data Provenance).
  • View status history: This option opens a graphical representation of the Processor’s statistical information over time.
  • View usage: This option takes the user to the Processor’s usage documentation.
  • View connections→Upstream: This option allows the user to see and "jump to" upstream connections that are coming into the Processor. This is particularly useful when processors connect into and out of other Process Groups.
  • View connections→Downstream: This option allows the user to see and "jump to" downstream connections that are going out of the Processor. This is particularly useful when processors connect into and out of other Process Groups.
  • Center in view: This option centers the view of the canvas on the given Processor.
  • Change color: This option allows the user to change the color of the Processor, which can make the visual management of large flows easier.
  • Create template: This option allows the user to create a template from the selected Processor.
  • Copy: This option places a copy of the selected Processor on the clipboard, so that it may be pasted elsewhere on the canvas by right-clicking on the canvas and selecting Paste. The Copy/Paste actions also may be done using the keystrokes Ctrl-C (Command-C) and Ctrl-V (Command-V).
  • Delete: This option allows the DFM to delete a Processor from the canvas.

Input Port

Input Port: Input Ports provide a mechanism for transferring data into a Process Group. When an Input Port is dragged onto the canvas, the DFM is prompted to name the Port. All Ports within a Process Group must have unique names.

All components exist only within a Process Group. When a user initially navigates to the NiFi page, the user is placed in the Root Process Group. If the Input Port is dragged onto the Root Process Group, the Input Port provides a mechanism to receive data from remote instances of NiFi via Site-to-Site. In this case, the Input Port can be configured to restrict access to appropriate users, if NiFi is configured to run securely. For information on configuring NiFi to run securely, see the System Administrator’s Guide.

Output Port

Output Port: Output Ports provide a mechanism for transferring data from a Process Group to destinations outside of the Process Group. When an Output Port is dragged onto the canvas, the DFM is prompted to name the Port. All Ports within a Process Group must have unique names.

If the Output Port is dragged onto the Root Process Group, the Output Port provides a mechanism for sending data to remote instances of NiFi via Site-to-Site. In this case, the Port acts as a queue. As remote instances of NiFi pull data from the port, that data is removed from the queues of the incoming Connections. If NiFi is configured to run securely, the Output Port can be configured to restrict access to appropriate users. For information on configuring NiFi to run securely, see the System Administrator’s Guide.

Process Group

Process Group: Process Groups can be used to logically group a set of components so that the dataflow is easier to understand and maintain. When a Process Group is dragged onto the canvas, the DFM is prompted to name the Process Group. All Process Groups within the same parent group must have unique names. The Process Group will then be nested within that parent group.
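
Nesting with unique names per parent, together with the breadcrumb trail mentioned in the User Interface section, can be modeled with a tiny tree. This is a hypothetical sketch: the `ProcessGroup` class is invented for illustration, and "NiFi Flow" is only an assumed name for the root group.

```python
class ProcessGroup:
    """Hypothetical model of a Process Group nested inside a parent group."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        if parent is not None:
            parent.add_child(self)

    def add_child(self, group):
        # Process Groups within the same parent group must have unique names.
        if group.name in self.children:
            raise ValueError(f"duplicate Process Group name: {group.name}")
        self.children[group.name] = group

    def breadcrumbs(self):
        """List group names from the root down to this group."""
        trail, node = [], self
        while node is not None:
            trail.append(node.name)
            node = node.parent
        return trail[::-1]


root = ProcessGroup("NiFi Flow")
ingest = ProcessGroup("Ingest", parent=root)
parse = ProcessGroup("Parse", parent=ingest)
print(" > ".join(parse.breadcrumbs()))  # NiFi Flow > Ingest > Parse
```

The same name may appear under different parents; only siblings are required to differ.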

Once you have dragged a Process Group onto the canvas, you can interact with it by right-clicking on the Process Group and selecting an option from the context menu. The options available to you from the context menu vary, depending on the privileges assigned to you.

Process Group Menu

While the options available from the context menu vary, the following options are typically available when you have full privileges to work with the Process Group:

  • Configure: This option allows the user to establish or change the configuration of the Process Group.
  • Variables: This option allows the user to create or configure variables within the NiFi UI.
  • Enter group: This option allows the user to enter the Process Group.

It is also possible to double-click on the Process Group to enter it.

也可以双击进程组来进入它。

  • Start: This option allows the user to start a Process Group.
  • Start: 此选项允许用户启动进程组。
  • Stop: This option allows the user to stop a Process Group.
  • Stop: 此选项允许用户停止进程组。
  • View status history: This option opens a graphical representation of the Process Group’s statistical information over time.
  • View status history:此选项打开进程组统计信息随时间变化的图形表示。
  • View connections→Upstream: This option allows the user to see and "jump to" upstream connections that are coming into the Process Group.
  • View connections→Upstream: 此选项允许用户查看并“跳转到”进入进程组的上游连接。
  • View connections→Downstream: This option allows the user to see and "jump to" downstream connections that are going out of the Process Group.
  • View connections→Downstream:此选项允许用户查看并“跳转到”从进程组流出的下游连接。
  • Center in view: This option centers the view of the canvas on the given Process Group.
  • Center in view:此选项使画布视图以给定的进程组为中心。
  • Group: This option allows the user to create a new Process Group that contains the selected Process Group and any other components selected on the canvas.
  • Group:此选项允许用户创建一个新的Process Group,其中包含选定的Process Group和画布上选择的任何其他组件。
  • Create template: This option allows the user to create a template from the selected Process Group.
  • Create template: 此选项允许用户从选定的进程组创建模板。
  • Copy: This option places a copy of the selected Process Group on the clipboard, so that it may be pasted elsewhere on the canvas by right-clicking on the canvas and selecting Paste. The Copy/Paste actions also may be done using the keystrokes Ctrl-C (Command-C) and Ctrl-V (Command-V).
  • Copy: 此选项将所选进程组的副本放在剪贴板上,以便可以通过右键单击画布并选择“粘贴”将其粘贴到画布上的其他位置。 复制/粘贴操作也可以使用按键Ctrl-C(Command-C)和Ctrl-V(Command-V)完成。
  • Delete: This option allows the DFM to delete a Process Group.
  • Delete:此选项允许用户删除进程组。

Remote Process Group

Remote Process Group: Remote Process Groups appear and behave similarly to Process Groups. However, the Remote Process Group (RPG) references a remote instance of NiFi. When an RPG is dragged onto the canvas, rather than being prompted for a name, the DFM is prompted for the URL of the remote NiFi instance. If the remote NiFi is a clustered instance, the URL that should be used is the URL of any NiFi instance in that cluster. When data is transferred to a clustered instance of NiFi via an RPG, the RPG will first connect to the remote instance whose URL is configured to determine which nodes are in the cluster and how busy each node is. This information is then used to load balance the data that is pushed to each node. The remote instances are then interrogated periodically to determine information about any nodes that are dropped from or added to the cluster and to recalculate the load balancing based on each node’s load. For more information, see the section on Site-to-Site.

Remote Process Group:远程进程组的外观和行为与进程组类似。但是,远程进程组(RPG)引用的是NiFi的远程实例。将远程进程组拖动到画布上时,不会提示输入名称,而是提示用户输入远程NiFi实例的URL。如果远程NiFi是集群实例,则应使用该集群中任意一个NiFi实例的URL。当数据通过远程进程组传输到NiFi集群时,远程进程组将首先连接到所配置URL对应的远程实例,以确定集群中有哪些节点以及每个节点的繁忙程度。然后,此信息用于对推送到每个节点的数据进行负载均衡。此后会定期查询远程实例,以获知从集群中移除或新加入集群的节点,并根据每个节点的负载重新计算负载均衡。有关详细信息,请参阅站点到站点(Site-to-Site)一节。

Once you have dragged a Remote Process Group onto the canvas, you can interact with it by right-clicking on the Remote Process Group and selecting an option from the context menu. The options available to you from the context menu vary, depending on the privileges assigned to you.

将远程进程组拖到画布上后,可以通过右键单击远程进程组并从菜单中选择一个选项来与其进行交互。 根据分配给您的权限,菜单中可用的选项会有所不同。

Remote Process Group Menu

While the options available from the context menu vary, the following options are typically available when you have full privileges to work with the Remote Process Group:

虽然菜单中的选项有所不同,但是当您具有使用远程进程组的全部权限时,通常可以使用以下选项:

  • Configure: This option allows the user to establish or change the configuration of the Remote Process Group.
  • Configure: 此选项允许用户建立或更改远程进程组的配置。
  • Enable transmission: Makes the transmission of data between NiFi instances active (see Remote Process Group Transmission).
  • Enable transmission:使NiFi实例之间的数据传输处于活动状态(请参阅远程进程组传输)。
  • Disable transmission: Disables the transmission of data between NiFi instances.
  • Disable transmission:禁用NiFi实例之间的数据传输。
  • View status history: This option opens a graphical representation of the Remote Process Group’s statistical information over time.
  • View status history: 此选项打开远程进程组统计信息随时间变化的图形表示。
  • View connections→Upstream: This option allows the user to see and "jump to" upstream connections that are coming into the Remote Process Group.
  • View connections→Upstream:此选项允许用户查看和“跳转”进入远程进程组的上游连接。
  • View connections→Downstream: This option allows the user to see and "jump to" downstream connections that are going out of the Remote Process Group.
  • View connections→Downstream: 此选项允许用户查看并“跳转”到远程进程组外的下游连接。
  • Refresh remote: This option refreshes the view of the status of the remote NiFi instance.
  • Refresh remote: 此选项可以刷新远程NiFi实例的状态视图。
  • Group: This option allows the user to create a new Process Group that contains the selected Remote Process Group and any other components selected on the canvas.
  • Group: 此选项允许用户创建一个新的进程组,其中包含指定的远程进程组和在画布上的其他组件。
  • Manage remote ports: This option allows the user to see input ports and/or output ports that exist on the remote instance of NiFi that the Remote Process Group is connected to. Note that if the Site-to-Site configuration is secure, only the ports that the connecting NiFi has been given access to will be visible.
  • Manage remote ports: 此选项允许用户查看远程进程组所连接的远程NiFi实例上存在的输入端口和/或输出端口。 请注意,如果站点到站点配置是安全的,则只能看到发起连接的NiFi已被授予访问权限的端口。
  • Center in view: This option centers the view of the canvas on the given Remote Process Group.
  • Center in view:此选项使画布视图以给定的远程进程组为中心。
  • Go to: This option opens a view of the remote NiFi instance in a new tab of the browser. Note that if the Site-to-Site configuration is secure, the user must have access to the remote NiFi instance in order to view it.
  • Go to: 此选项在浏览器的新选项卡中打开远程NiFi实例的视图。 请注意,如果站点到站点配置是安全的,则用户必须能够访问远程NiFi实例才能查看它。
  • Group: This option allows the user to create a Process Group containing the selected Remote Process Group.
  • Group: 此选项允许用户创建包含所选远程进程组的进程组。
  • Create template: This option allows the user to create a template from the selected Remote Process Group.
  • Create template: 此选项允许用户从选定的远程进程组创建模板。
  • Copy: This option places a copy of the selected Remote Process Group on the clipboard, so that it may be pasted elsewhere on the canvas by right-clicking on the canvas and selecting Paste. The Copy/Paste actions also may be done using the keystrokes Ctrl-C (Command-C) and Ctrl-V (Command-V).
  • Copy: 此选项将所选远程进程组的副本放在剪贴板上,以便可以通过右键单击画布并选择“粘贴”将其粘贴到画布上的其他位置。 复制/粘贴操作也可以使用按键Ctrl-C(Command-C)和Ctrl-V(Command-V)完成。
  • Delete: This option allows the DFM to delete a Remote Process Group from the canvas.
  • Delete:此选项允许用户从画布中删除远程进程组。

Funnel

Funnel: Funnels are used to combine the data from many Connections into a single Connection. This has two advantages. First, if many Connections are created with the same destination, the canvas can become cluttered if those Connections have to span a large space. By funneling these Connections into a single Connection, that single Connection can then be drawn to span that large space instead. Second, Connections can be configured with FlowFile Prioritizers. Data from several Connections can be funneled into a single Connection, providing the ability to prioritize all of the data on that one Connection, rather than prioritizing the data on each Connection independently.

Funnel: 漏斗用于将来自多个连接的数据合并到单个连接中。 这有两个好处。 首先,如果创建了许多指向同一目标的连接,而这些连接又必须跨越很大的空间,画布可能会变得混乱。 将这些连接汇入一个连接后,只需绘制这一个连接来跨越该空间即可。 其次,连接可以配置FlowFile优先级排序器(Prioritizer)。 来自多个连接的数据可以汇入一个连接,从而能够对该连接上的所有数据统一进行优先级排序,而不是分别对每个连接上的数据进行排序。
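The second advantage can be illustrated with a small sketch. It is not NiFi code: FlowFiles are modeled as plain dictionaries, and the `priority` key, the `funnel` and `prioritize` helpers, and the rule that a lower number is handled first are all assumptions made for the example.

```python
from itertools import chain


def funnel(*connections):
    """Combine the queues of several connections into one list."""
    return list(chain.from_iterable(connections))


def prioritize(queue):
    """Order FlowFiles so that a lower 'priority' value is handled first."""
    return sorted(queue, key=lambda flowfile: flowfile["priority"])


# Two hypothetical incoming connections.
conn_a = [{"name": "a1", "priority": 5}, {"name": "a2", "priority": 1}]
conn_b = [{"name": "b1", "priority": 3}]

# Prioritizing once, after the funnel, orders data across both sources.
combined = prioritize(funnel(conn_a, conn_b))
print([flowfile["name"] for flowfile in combined])
```

Prioritizing each connection independently could only order `a1` against `a2`; it could not place `b1` between them.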

Template

Template: Templates can be created by DFMs from sections of the flow, or they can be imported from other dataflows. These Templates provide larger building blocks for creating a complex flow quickly. When the Template is dragged onto the canvas, the DFM is provided a dialog to choose which Template to add to the canvas:

Template: 用户可以从流的各个部分创建模板,也可以从其他数据流导入模板。 这些模板提供了更大的构建模块,可以快速创建复杂的流程。 将模板拖动到画布上时,会向用户显示一个对话框,用于选择要添加到画布的模板:

Instantiate Template Dialog

Clicking the drop-down box shows all available Templates. Any Template that was created with a description will show a question mark icon, indicating that there is more information. Hovering over the icon with the mouse will show this description:

单击下拉框可显示所有可用模板。 创建时带有描述的模板会显示一个问号图标,表示有更多信息。 将鼠标悬停在该图标上将显示此描述:

Instantiate Template Dialog

Label

Label: Labels are used to provide documentation for parts of a dataflow. When a Label is dropped onto the canvas, it is created with a default size. The Label can then be resized by dragging the handle in the bottom-right corner. The Label has no text when initially created. The text of the Label can be added by right-clicking on the Label and choosing Configure.

Label: 标签用于为部分数据流提供文档。 将标签放到画布上时,会使用默认大小创建它。 然后可以通过拖动右下角的手柄来调整标签的大小。 标签在最初创建时没有文本。 可以通过右键单击标签并选择Configure来添加文本。

Component Versions(组件版本)

You have access to information about the version of your Processors, Controller Services, and Reporting Tasks. This is especially useful when you are working within a clustered environment with multiple NiFi instances running different versions of a component or if you have upgraded to a newer version of a processor. The Add Processor, Add Controller Service, and Add Reporting Task dialogs include a column identifying the component version, as well as the name of the component, the organization or group that created the component, and the NAR bundle that contains the component.

您可以访问有关处理器,控制器服务和报告任务的版本的信息。 当您在具有运行不同版本组件的多个NiFi实例的集群环境中工作或者已升级到较新版本的处理器时,此功能尤其有用。 “添加处理器”,“添加控制器服务”和“添加报告任务”对话框包括一个标识组件版本的列,以及组件的名称,创建组件的组织或组以及包含该组件的NAR捆绑包。

Add Processor Version Example

Each component displayed on the canvas also contains this information.

画布上显示的每个组件也包含此信息。

Processor Version Information Example

Sorting and Filtering Components(排序和过滤组件)

When you are adding a component, you can sort on version number or filter based on originating source.
To sort based on version, click the version column to display in ascending or descending version order.
To filter based on source group, click the source drop-down in the upper left of your Add Component dialog, and select the group you want to view.

添加组件时,可以按版本号排序,也可以根据组件来源进行过滤。
要基于版本进行排序,请单击版本列以按升序或降序版本顺序显示。
要基于源组进行过滤,请单击“添加组件”对话框左上角的源下拉列表,然后选择要查看的组。

Add Processor Version Sort and Filter

Changing Component Versions(更改组件版本)

To change a component version, perform the following steps.

要更改组件版本,请执行以下步骤。

  1. Right-click the component on the canvas to display configuration options.

    右键单击画布上的组件以显示配置选项。

  2. Select Change version.

    选择更改版本。

    Processor Change Version

  3. In the Component Version dialog, select the version you want to run from the Version drop-down menu.

    在“组件版本”对话框中,从“版本”下拉菜单中选择要运行的版本。

    Component Version

Understanding Version Dependencies(了解版本依赖关系)

When you are configuring a component, you can also view information about version dependencies.

配置组件时,还可以查看有关版本依赖性的信息。

  1. Right-click your component and select Configure to display the Configure dialog for your component.

    右键单击组件,然后选择“配置”以显示组件的“配置”对话框。

  2. Click the Properties tab.

    单击“属性”选项卡。

  3. Click the information icon to view any version dependency information.

    单击信息图标以查看任何版本依赖关系信息。

Configuration Version Requirements

In the following example, MyProcessor version 1.0 is configured properly with the controller service StandardMyService version 1.0:

在以下示例中,使用控制器服务StandardMyService 1.0版正确配置了MyProcessor 1.0版:

Processor and Controller Service Version Match

If the version of MyProcessor is changed to an incompatible version (MyProcessor 2.0), validation errors will be displayed on the processor:

如果MyProcessor的版本更改为不兼容的版本(MyProcessor 2.0),则验证错误将显示在处理器上:

Processor and Controller Service Version Mismatch Warnings

and an error message will be displayed in the processor’s controller service configuration since the service is no longer valid:

并且由于服务不再有效,因此处理器的控制器服务配置中将显示错误消息:

Processor and Controller Service Version Mismatch Property

Configuring a Processor(配置处理器)

To configure a processor, right-click on the Processor and select the Configure option from the context menu. Alternatively, just double-click on the Processor. The configuration dialog is opened with four different tabs, each of which is discussed below. Once you have finished configuring the Processor, you can apply the changes by clicking the Apply button or cancel all changes by clicking the Cancel button.

要配置处理器,请右键单击处理器,然后从菜单中选择“配置”选项。 或者,只需双击处理器即可。 打开配置对话框,其中包含四个不同的选项卡,每个选项卡将在下面讨论。 完成处理器的配置后,可以通过单击“应用”按钮应用更改,或单击“取消”按钮取消所有更改。

Note that after a Processor has been started, the context menu shown for the Processor no longer has a Configure option but rather has a View Configuration option. Processor configuration cannot be changed while the Processor is running. You must first stop the Processor and wait for all of its active tasks to complete before configuring the Processor again.

请注意,处理器启动后,为处理器显示的上下文菜单不再具有“配置”选项,而是具有“查看配置”选项。 处理器运行时无法更改处理器配置。 您必须先停止处理器并等待其所有活动任务完成,然后再次配置处理器。

Note that certain control characters are not supported and will be automatically filtered out when entered. The following characters and any unpaired Unicode surrogate codepoints will not be retained in any configuration:

请注意,不支持输入某些控制字符,它们会在输入时被自动过滤掉。 任何配置中都不会保留以下字符和任何未配对的Unicode代理码点:

[#x0], [#x1], [#x2], [#x3], [#x4], [#x5], [#x6], [#x7], [#x8], [#xB], [#xC], [#xE], [#xF], [#x10], [#x11], [#x12], [#x13], [#x14], [#x15], [#x16], [#x17], [#x18], [#x19], [#x1A], [#x1B], [#x1C], [#x1D], [#x1E], [#x1F], [#xFFFE], [#xFFFF]
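The effect of this filtering can be sketched in a few lines. This is an illustration only, not NiFi's implementation; the function name `strip_unsupported` is hypothetical. One simplification is labeled in the code: a Python string stores surrogates as individual code points, so the sketch removes every surrogate code point rather than detecting pairs.

```python
import re

# Character class built from the list above. Tab (#x9), line feed (#xA) and
# carriage return (#xD) are absent from the list, so they are kept.
# Assumption: every surrogate code point (U+D800..U+DFFF) found in a Python
# string is treated as unpaired.
_FILTERED = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff\ud800-\udfff]")


def strip_unsupported(text):
    """Return text with the unsupported characters removed."""
    return _FILTERED.sub("", text)


print(repr(strip_unsupported("name\x00with\x07bell")))
```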

Settings Tab(设置选项卡)

The first tab in the Processor Configuration dialog is the Settings tab:

处理器配置对话框中的第一个选项卡是“设置”选项卡:

Settings Tab

This tab contains several different configuration items. First, it allows the DFM to change the name of the Processor. The name of a Processor by default is the same as the Processor type. Next to the Processor Name is a checkbox, indicating whether the Processor is Enabled. When a Processor is added to the canvas, it is enabled. If the Processor is disabled, it cannot be started. The disabled state is used to indicate that when a group of Processors is started, such as when a DFM starts an entire Process Group, this (disabled) Processor should be excluded.

此选项卡包含几个不同的配置项。 首先,它允许用户更改处理器的名称。 默认情况下,处理器的名称与处理器类型相同。 处理器名称旁边是一个复选框,指示处理器是否已启用。 将处理器添加到画布后,将启用它。 如果禁用处理器,则无法启动。 禁用状态用于指示当启动一组处理器时,例如当用户启动整个进程组时,应排除此(禁用)处理器。

Below the Name configuration, the Processor’s unique identifier is displayed along with the Processor’s type and NAR bundle. These values cannot be modified.

在Name配置下方,将显示Processor的唯一标识符以及Processor的类型和NAR包。 这些值无法修改。

Next are two dialogues for configuring 'Penalty Duration' and 'Yield Duration'. During the normal course of processing a piece of data (a FlowFile), an event may occur that indicates that the data cannot be processed at this time but the data may be processable at a later time. When this occurs, the Processor may choose to Penalize the FlowFile. This will prevent the FlowFile from being Processed for some period of time. For example, if the Processor is to push the data to a remote service, but the remote service already has a file with the same name as the filename that the Processor is specifying, the Processor may penalize the FlowFile. The 'Penalty Duration' allows the DFM to specify how long the FlowFile should be penalized. The default value is 30 seconds.

接下来是两个用于配置“惩罚持续时间”(Penalty Duration)和“让步持续时间”(Yield Duration)的对话框。 在处理一条数据(FlowFile)的正常过程中,可能会发生某个事件,表明该数据此时无法处理,但稍后或许可以处理。 发生这种情况时,处理器可以选择惩罚(Penalize)该FlowFile。 这将阻止该FlowFile在一段时间内被处理。 例如,如果处理器要将数据推送到远程服务,但远程服务上已经存在与处理器所指定文件名同名的文件,则处理器可能会惩罚该FlowFile。 “惩罚持续时间”允许用户指定FlowFile应被惩罚多长时间。 默认值为30秒。

Similarly, the Processor may determine that some situation exists such that the Processor can no longer make any progress, regardless of the data that it is processing. For example, if a Processor is to push data to a remote service and that service is not responding, the Processor cannot make any progress. As a result, the Processor should 'yield', which will prevent the Processor from being scheduled to run for some period of time. That period of time is specified by setting the 'Yield Duration'. The default value is 1 second.

类似地,处理器可以确定存在某种情况,使得处理器不再能够进行任何进展,而不管其正在处理的数据。 例如,如果处理器要将数据推送到远程服务并且该服务没有响应,则处理器无法取得任何进展。 结果,处理器应该“让步”,阻止处理器运行一段时间。 通过设置'Yield Duration'来指定该时间段。 默认值为1秒。
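The difference between the two settings is one of scope: a penalty holds back one FlowFile, while a yield holds back the whole Processor. The sketch below models this with a hypothetical `ProcessorState` class and plain numeric timestamps in seconds; only the 30 second and 1 second defaults come from the text, and nothing here reflects NiFi's internal API.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessorState:
    penalty_duration: float = 30.0   # default Penalty Duration, in seconds
    yield_duration: float = 1.0      # default Yield Duration, in seconds
    yielded_until: float = 0.0
    penalized_until: dict = field(default_factory=dict)

    def penalize(self, flowfile_id, now):
        """Hold back a single FlowFile; other FlowFiles are unaffected."""
        self.penalized_until[flowfile_id] = now + self.penalty_duration

    def yield_(self, now):
        """Hold back the whole Processor, whatever data is queued."""
        self.yielded_until = now + self.yield_duration

    def can_run(self, now):
        return now >= self.yielded_until

    def eligible(self, flowfile_ids, now):
        """FlowFiles whose penalty, if any, has expired."""
        return [f for f in flowfile_ids
                if now >= self.penalized_until.get(f, 0.0)]


state = ProcessorState()
state.penalize("ff-1", now=0.0)
print(state.eligible(["ff-1", "ff-2"], now=10.0))
```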

The last configurable option on the left-hand side of the Settings tab is the Bulletin level. Whenever the Processor writes to its log, the Processor also will generate a Bulletin. This setting indicates the lowest level of Bulletin that should be shown in the User Interface. By default, the Bulletin level is set to WARN, which means it will display all warning and error-level bulletins.

“设置”选项卡左侧的最后一个可配置选项是公告(Bulletin)级别。 每当处理器写入其日志时,处理器也会生成一条公告。 此设置指示应在用户界面中显示的公告的最低级别。 默认情况下,公告级别设置为WARN,这意味着它将显示所有警告和错误级别的公告。
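The Bulletin level acts as a threshold. A minimal sketch, assuming an ordered list of level names (`DEBUG`, `INFO`, `WARN`, `ERROR`) that is chosen for illustration rather than taken from NiFi's configuration:

```python
# Assumed ordering from least to most severe; the exact set of levels
# offered in the UI may differ.
LEVELS = ["DEBUG", "INFO", "WARN", "ERROR"]


def is_shown(message_level, bulletin_level="WARN"):
    """True if a message at message_level reaches the configured threshold."""
    return LEVELS.index(message_level) >= LEVELS.index(bulletin_level)


print([level for level in LEVELS if is_shown(level)])
```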

The right-hand side of the Settings tab contains an 'Automatically Terminate Relationships' section. Each of the Relationships that is defined by the Processor is listed here, along with its description. In order for a Processor to be considered valid and able to run, each Relationship defined by the Processor must be either connected to a downstream component or auto-terminated. If a Relationship is auto-terminated, any FlowFile that is routed to that Relationship will be removed from the flow and its processing considered complete. Any Relationship that is already connected to a downstream component cannot be auto-terminated. The Relationship must first be removed from any Connection that uses it. Additionally, for any Relationship that is selected to be auto-terminated, the auto-termination status will be cleared (turned off) if the Relationship is added to a Connection.

“设置”选项卡的右侧包含“自动终止关系”部分。 此处列出了处理器定义的每个关系及其描述。 为了使处理器被视为有效且能够运行,处理器定义的每个关系必须连接到下游组件或被自动终止。 如果关系是自动终止的,则路由到该关系的任何FlowFile都将从流中删除,并且其处理被视为完成。 已连接到下游组件的任何关系都无法自动终止,必须先将该关系从使用它的所有连接中移除。 此外,对于选择自动终止的任何关系,如果将该关系添加到连接,则将清除(关闭)其自动终止状态。
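The validity rule for Relationships can be expressed as a set check. The following is a sketch with hypothetical helper names, modeling Relationships as strings; it is not how NiFi represents them internally.

```python
def unhandled_relationships(defined, connected, auto_terminated):
    """Relationships that are neither connected nor auto-terminated."""
    return sorted(set(defined) - set(connected) - set(auto_terminated))


def is_valid(defined, connected, auto_terminated):
    """A Processor is valid only if every Relationship is handled."""
    return not unhandled_relationships(defined, connected, auto_terminated)


def connect(relationship, connected, auto_terminated):
    """Adding a Relationship to a Connection clears its auto-termination."""
    connected.add(relationship)
    auto_terminated.discard(relationship)


defined = {"success", "failure"}
connected, auto_terminated = {"success"}, set()
print(unhandled_relationships(defined, connected, auto_terminated))
```

In this example the Processor stays invalid until `failure` is either connected to a downstream component or auto-terminated.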

Scheduling Tab(调度选项卡)

The second tab in the Processor Configuration dialog is the Scheduling Tab:

“处理器配置”对话框中的第二个选项卡是“调度”选项卡:

Scheduling Tab

Scheduling Strategy(调度策略)

The first configuration option is the Scheduling Strategy. There are three possible options for scheduling components:

第一个配置选项是调度策略。 调度组件有三种可能的选项:

Timer driven: This is the default mode. The Processor will be scheduled to run on a regular interval. The interval at which the Processor is run is defined by the 'Run Schedule' option (see below).

Timer driven: 这是默认模式。 处理器将安排定期运行。 运行处理器的时间间隔由“运行时间表”选项定义(见下文)。

Event driven: When this mode is selected, the Processor will be triggered to run by an event, and that event occurs when FlowFiles enter Connections feeding this Processor. This mode is currently considered experimental and is not supported by all Processors. When this mode is selected, the 'Run Schedule' option is not configurable, as the Processor is not triggered to run periodically but as the result of an event. Additionally, this is the only mode for which the 'Concurrent Tasks' option can be set to 0. In this case, the number of threads is limited only by the size of the Event-Driven Thread Pool that the administrator has configured.

Event driven: 选择此模式后,处理器将由事件触发运行,该事件在FlowFile进入为此处理器供给数据的连接时发生。 此模式目前被认为是实验性的,并非所有处理器都支持。 选择此模式时,“运行计划”选项不可配置,因为处理器不是定期触发运行,而是由事件触发。 此外,这是“并发任务”选项可以设置为0的唯一模式。在这种情况下,线程数仅受管理员配置的事件驱动线程池的大小限制。

CRON driven: When using the CRON driven scheduling mode, the Processor is scheduled to run periodically, similar to the Timer driven scheduling mode. However, the CRON driven mode provides significantly more flexibility at the expense of increasing the complexity of the configuration. The CRON driven scheduling value is a string of six required fields and one optional field, each separated by a space. These fields are:

CRON driven: 当使用CRON驱动的调度模式时,处理器被安排定期运行,类似于定时器驱动的调度模式。 然而,CRON驱动模式以增加配置的复杂性为代价提供了显着更大的灵活性。 CRON驱动的调度值是由六个必需字段和一个可选字段组成的字符串,每个字段由空格分隔。 这些字段是:

Field              Valid values
Seconds            0-59
Minutes            0-59
Hours              0-23
Day of Month       1-31
Month              1-12 or JAN-DEC
Day of Week        1-7 or SUN-SAT
Year (optional)    empty, 1970-2099

You typically specify values in one of the following ways:

您通常通过以下方式之一指定值:

  • Number: Specify one or more valid values. You can enter more than one value using a comma-separated list.
  • Number:指定一个或多个有效值。 您可以使用逗号分隔列表输入多个值。
  • Range: Specify a range using the - syntax.
  • Range:使用 - 语法指定范围。
  • Increment: Specify an increment using / syntax. For example, in the Minutes field, 0/15 indicates the minutes 0, 15, 30, and 45.
  • Increment: 使用 / 语法指定增量。 例如,在“分钟”字段中,0/15表示分钟0,15,30和45。

You should also be aware of several valid special characters:

您还应该知道几个有效的特殊字符:

  • *: Indicates that all values are valid for that field.
  • *: 表示所有值对该字段都有效。
  • ?: Indicates that no specific value is specified. This special character is valid in the Day of Month and Day of Week fields.
  • ?: 表示未指定任何特定值。 此特殊字符在“日期”(Day of Month)和“星期”(Day of Week)字段中有效。
  • L: You can append L to one of the Day of Week values, to specify the last occurrence of this day in the month. For example, 1L indicates the last Sunday of the month.
  • L: 您可以将L附加到某个“星期”(Day of Week)值之后,以指定该月中该星期几的最后一次出现。 例如,1L表示该月的最后一个星期日。

For example:

例如:

  • The string 0 0 13 * * ? indicates that you want to schedule the processor to run at 1:00 PM every day.
  • 字符串 0 0 13 * * ? 表示您希望将处理器安排在每天下午1:00运行。
  • The string 0 20 14 ? * MON-FRI indicates that you want to schedule the processor to run at 2:20 PM every Monday through Friday.
  • 字符串 0 20 14 ? * MON-FRI 表示您希望将处理器安排在每周一至周五下午2:20运行。
  • The string 0 15 10 ? * 6L 2011-2017 indicates that you want to schedule the processor to run at 10:15 AM, on the last Friday of every month, between 2011 and 2017.
  • 字符串 0 15 10 ? * 6L 2011-2017 表示您希望将处理器安排在2011年至2017年期间每个月的最后一个星期五上午10:15运行。

For additional information and examples, see the CronTrigger Tutorial in the Quartz documentation.

有关其他信息和示例,请参阅Quartz文档中的CronTrigger Tutorial。
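To make the field layout and the number, range, and increment syntax concrete, here is a small sketch. It is not the Quartz parser: the helper names `split_cron` and `expand_field` are made up for this example, and the sketch handles only numeric values, comma-separated lists, ranges, increments, and `*`. Special characters such as `?` and `L`, and names such as `MON`, are left as raw strings by `split_cron` and are rejected by `expand_field`.

```python
FIELDS = ["seconds", "minutes", "hours", "day_of_month",
          "month", "day_of_week", "year"]


def split_cron(expression):
    """Split an expression into its six required (plus optional year) fields."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("expected 6 or 7 space-separated fields")
    return dict(zip(FIELDS, parts))


def expand_field(spec, lowest, highest):
    """Expand numbers, lists, ranges, increments and * into sorted values."""
    values = set()
    for item in spec.split(","):
        if "/" in item:                      # increment, e.g. 0/15
            start, step = item.split("/")
            first = lowest if start == "*" else int(start)
            values.update(range(first, highest + 1, int(step)))
        elif "-" in item:                    # range, e.g. 3-5
            first, last = item.split("-")
            values.update(range(int(first), int(last) + 1))
        elif item == "*":                    # every valid value
            values.update(range(lowest, highest + 1))
        else:                                # single number
            values.add(int(item))
    if any(v < lowest or v > highest for v in values):
        raise ValueError(f"value out of range {lowest}-{highest}: {spec}")
    return sorted(values)


fields = split_cron("0 0/15 13 * * ?")
print(expand_field(fields["minutes"], 0, 59))
```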

Concurrent Tasks(并发任务)

Next, the Scheduling tab provides a configuration option named 'Concurrent Tasks'. This controls how many threads the Processor will use. Said a different way, this controls how many FlowFiles should be processed by this Processor at the same time. Increasing this value will typically allow the Processor to handle more data in the same amount of time. However, it does this by using system resources that then are not usable by other Processors. This essentially provides a relative weighting of Processors — it controls how much of the system’s resources should be allocated to this Processor instead of other Processors. This field is available for most Processors. There are, however, some types of Processors that can only be scheduled with a single Concurrent task.

接下来,Scheduling选项卡提供名为'Concurrent Tasks'的配置选项。 这可以控制处理器将使用的线程数。 换句话说,它控制此处理器应同时处理多少个数据。 增加此值通常会使处理器在相同的时间内处理更多数据。 但是,它通过使用其他处理器无法使用的系统资源来实现此目的。 这基本上提供了处理器的相对权重 - 它控制应该将多少系统资源分配给此处理器而不是其他处理器。 该字段适用于大多数处理器。 但是,某些类型的处理器只能使用单个“并发”任务进行调度。

Run Schedule(运行计划)

The 'Run Schedule' dictates how often the Processor should be scheduled to run. The valid values for this field depend on the selected Scheduling Strategy (see above). If using the Event driven Scheduling Strategy, this field is not available. When using the Timer driven Scheduling Strategy, this value is a time duration specified by a number followed by a time unit. For example, 1 second or 5 mins. The default value of 0 sec means that the Processor should run as often as possible as long as it has data to process. This is true for any time duration of 0, regardless of the time unit (i.e., 0 sec, 0 mins, 0 days). For an explanation of values that are applicable for the CRON driven Scheduling Strategy, see the description of the CRON driven Scheduling Strategy itself.

“运行计划”指示应该安排处理器运行的频率。 此字段的有效值取决于所选的调度策略(参见上文)。 如果使用事件驱动的调度策略,则此字段不可用。 使用定时器驱动的调度策略时,该值是由数字后跟时间单位指定的持续时间。 例如,1秒或5分钟。 默认值0秒表示处理器应尽可能频繁地运行,只要它有要处理的数据即可。 无论时间单位如何(即0秒,0分钟,0天),对于0的任何持续时间都是如此。 有关适用于CRON驱动的调度策略的值的说明,请参阅CRON驱动的调度策略本身的说明。
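For the Timer driven strategy, the "number followed by a time unit" format can be sketched as follows. The set of unit spellings accepted here is an assumption that covers the examples in the text; the units NiFi actually accepts may differ, and the function names are hypothetical.

```python
import re

# Assumed unit spellings, mapped to seconds.
_SECONDS_PER_UNIT = {
    "sec": 1, "secs": 1, "second": 1, "seconds": 1,
    "min": 60, "mins": 60, "minute": 60, "minutes": 60,
    "hour": 3600, "hours": 3600,
    "day": 86400, "days": 86400,
}


def parse_duration(text):
    """Convert a string such as '5 mins' into a number of seconds."""
    match = re.fullmatch(r"(\d+)\s*([A-Za-z]+)", text.strip())
    if match is None:
        raise ValueError(f"not a duration: {text!r}")
    number, unit = match.groups()
    try:
        factor = _SECONDS_PER_UNIT[unit.lower()]
    except KeyError:
        raise ValueError(f"unknown time unit: {unit!r}") from None
    return int(number) * factor


def runs_as_often_as_possible(text):
    """A duration of 0 means 'run whenever there is data', for any unit."""
    return parse_duration(text) == 0


print(parse_duration("5 mins"), runs_as_often_as_possible("0 days"))
```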

Execution(执行)

The Execution setting is used to determine on which node(s) the Processor will be scheduled to execute. Selecting 'All Nodes' will result in this Processor being scheduled on every node in the cluster. Selecting 'Primary Node' will result in this Processor being scheduled on the Primary Node only. Processors that have been configured for 'Primary Node' execution are identified by a "P" next to the processor icon:

执行设置是确定处理器将被调度执行的节点。 选择“所有节点”将导致在集群中的每个节点上调度此处理器。 选择“主节点”将导致此处理器仅在主节点上进行调度。 已为“主节点”执行配置的处理器由处理器图标旁边的“P”标识:

Primary Node Processor

To quickly identify 'Primary Node' processors, the "P" icon is also shown in the Processors tab on the Summary page:

要快速识别“主节点”处理器,“P”图标也会显示在“摘要”页面的“处理器”选项卡中:

Primary Node Processors in Summary Page

Run Duration(运行持续时间)

The right-hand side of the Scheduling tab contains a slider for choosing the 'Run Duration'. This controls how long the Processor should be scheduled to run each time that it is triggered. On the left-hand side of the slider, it is marked 'Lower latency' while the right-hand side is marked 'Higher throughput'. When a Processor finishes running, it must update the repository in order to transfer the FlowFiles to the next Connection. Updating the repository is expensive, so the more work that can be done at once before updating the repository, the more work the Processor can handle (Higher throughput). However, this means that the next Processor cannot start processing those FlowFiles until the previous Processor updates the repository. As a result, latency increases (the time required to process a FlowFile from beginning to end is longer). The slider therefore provides a spectrum from which the DFM can choose to favor Lower latency or Higher throughput.

“调度”选项卡的右侧包含一个用于选择“运行持续时间”的滑块。 这可以控制处理器每次触发时应安排运行的时间。 在滑块的左侧,标记为“较低延迟”,而右侧标记为“较高吞吐量”。 处理器完成运行后,必须更新存储库才能将数据传输到下一个连接。 更新存储库的成本很高,因此在更新存储库之前可以立即完成的工作量越多,处理器可以处理的工作量就越多(吞吐量越高)。 但是,这意味着在上一个处理器更新此存储库之前,下一个处理器无法开始处理这些数据。 因此,延迟将更长(从开始到结束处理数据所需的时间将更长)。 因此,滑块提供了一个范围,用户可以从中选择支持较低延迟或较高吞吐量。
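The trade-off can be made tangible with a toy model. Every number and the function `repository_updates` below are assumptions for illustration: the model supposes that a Run Duration of 0 commits to the repository after every FlowFile, and that a longer Run Duration lets the Processor handle as many FlowFiles as fit into that time before a single commit.

```python
import math


def repository_updates(flowfiles, per_flowfile_ms, run_duration_ms):
    """Estimate how many repository updates are needed in this toy model."""
    if run_duration_ms <= 0:
        return flowfiles                     # one update per FlowFile
    per_run = max(1, int(run_duration_ms // per_flowfile_ms))
    return math.ceil(flowfiles / per_run)    # one update per batch


# 1000 FlowFiles at 1 ms each: fewer updates (higher throughput), but the
# first FlowFile of each batch waits up to 25 ms before moving on.
print(repository_updates(1000, 1, 0), repository_updates(1000, 1, 25))
```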

Properties Tab(属性选项卡)

The Properties tab provides a mechanism to configure Processor-specific behavior. There are no default properties. Each type of Processor must define which Properties make sense for its use case. Below, we see the Properties tab for a RouteOnAttribute Processor:

Properties选项卡提供了一种配置特定于处理器的行为的机制。 没有默认属性。 每种类型的处理器必须定义哪些属性对其用例有意义。 下面,我们看到RouteOnAttribute Processor的Properties选项卡:

Properties Tab

This Processor, by default, has only a single property: 'Routing Strategy'. The default value is 'Route to Property name'. Next to the name of this property is a small question-mark symbol ( Question Mark ). This help symbol is seen in other places throughout the User Interface, and it indicates that more information is available. Hovering over this symbol with the mouse will provide additional details about the property and the default value, as well as historical values that have been set for the Property.

默认情况下,此处理器只有一个属性:“路由策略”。 默认值为“路由到属性名称”。 此属性的名称旁边是一个小问号符号(问号)。 在整个用户界面的其他位置可以看到此帮助符号,它表示可以获得更多信息。 使用鼠标将鼠标悬停在此符号上将提供有关属性和默认值的其他详细信息,以及为该属性设置的历史值。

Clicking on the value for the property will allow a DFM to change the value. Depending on the values that are allowed for the property, the user is either provided a drop-down from which to choose a value or is given a text area to type a value:

单击属性的值将允许用户更改该值。 根据属性允许的值,向用户提供下拉列表,或者为用户提供文本区域:

Edit Property with Dropdown

In the top-right corner of the tab is a button for adding a New Property. Clicking this button will provide the DFM with a dialog to enter the name and value of a new property. Not all Processors allow User-Defined properties. In processors that do not allow them, the Processor becomes invalid when User-Defined properties are applied. RouteOnAttribute, however, does allow User-Defined properties. In fact, this Processor will not be valid until the user has added a property.

选项卡的右上角是一个用于添加新属性的按钮。 单击此按钮将为用户提供一个对话框,用于输入新属性的名称和值。 并非所有处理器都允许用户定义的属性。 在不允许它们的处理器中,处理器在应用用户定义属性时变为无效。 但是,RouteOnAttribute允许用户定义的属性。 实际上,在用户添加属性之前,此处理器是无效的。

Edit Property with Text Area

Note that after a User-Defined property has been added, an icon will appear on the right-hand side of that row ( Delete Icon ). Clicking it will remove the User-Defined property from the Processor.

请注意,添加用户定义属性后,该行的右侧将出现一个图标( Delete Icon )。 单击它将从处理器中删除用户定义的属性。

Some processors also have an Advanced User Interface (UI) built into them. For example, the UpdateAttribute processor has an Advanced UI. To access the Advanced UI, click the Advanced button that appears at the bottom of the Configure Processor window. Only processors that have an Advanced UI will have this button.

某些处理器还内置了高级用户界面(UI)。 例如,UpdateAttribute处理器具有高级UI。 要访问高级UI,请单击“配置处理器”窗口底部显示的“高级”按钮。 只有具有高级UI的处理器才具有此按钮。

Some processors have properties that refer to other components, such as Controller Services, which also need to be configured. For example, the GetHTTP processor has an SSLContextService property, which refers to the StandardSSLContextService controller service. When DFMs want to configure this property but have not yet created and configured the controller service, they have the option to create the service on the spot, as depicted in the image below. For more information about configuring Controller Services, see the Controller Services section.

某些处理器具有引用其他组件的属性,例如Controller Services,这些组件也需要进行配置。 例如,GetHTTP处理器具有SSLContextService属性,该属性引用StandardSSLContextService控制器服务。 当DFM想要配置此属性但尚未创建和配置控制器服务时,他们可以选择在现场创建服务,如下图所示。 有关配置Controller Services的详细信息,请参阅Controller Services部分。

Create Service

Comments Tab(Comments 选项卡)

The last tab in the Processor configuration dialog is the Comments tab. This tab simply provides an area for users to include whatever comments are appropriate for this component. Use of the Comments tab is optional:

处理器配置对话框中的最后一个选项卡是“注释”选项卡。 此选项卡仅为用户提供一个区域,以包含适用于此组件的任何注释。 使用“注释”选项卡是可选的:

Comments Tab

Additional Help(其他帮助)

You can access additional documentation about each Processor’s usage by right-clicking on the Processor and selecting 'Usage' from the context menu. Alternatively, select Help from the Global Menu in the top-right corner of the UI to display a Help page with all of the documentation, including usage documentation for all the Processors that are available. Click on the desired Processor to view usage documentation.

您可以通过右键单击处理器并从上下文菜单中选择“使用”来访问有关每个处理器使用情况的其他文档。 或者,从UI右上角的“全局菜单”中选择“帮助”,以显示包含所有文档的“帮助”页面,包括所有可用处理器的使用文档。 单击所需的处理器以查看使用文档。

Using Custom Properties with Expression Language(使用表达式语言的自定义属性)

You can use NiFi Expression Language to reference FlowFile attributes, compare them to other values, and manipulate their values when you are creating and configuring dataflows. For more information on Expression Language, see the Expression Language Guide.

您可以使用NiFi表达式语言来引用数据属性,将它们与其他值进行比较,并在创建和配置数据流时修改它们的值。 有关表达式语言的更多信息,请参阅表达式语言指南

In addition to using FlowFile attributes, system properties, and environment properties within Expression Language, you can also define custom properties for Expression Language use. Defining custom properties gives you more flexibility in handling and processing dataflows. You can also create custom properties for connection, server, and service properties, for easier dataflow configuration.

除了在表达式语言(Expression Language)中使用数据属性,系统属性和环境属性之外,您还可以定义供表达式语言使用的自定义属性。 定义自定义属性可以让您更灵活地操作和处理数据流。 您还可以为连接,服务器和服务属性创建自定义属性,以便更轻松地配置数据流。

NiFi properties have resolution precedence of which you should be aware when creating custom properties:

NiFi属性具有解析优先级,在创建自定义属性时应加以注意:

  • Processor-specific attributes
  • FlowFile properties
  • FlowFile attributes
  • From variable registry:

    • User defined properties (custom properties)
    • 用户定义的属性(自定义属性)
    • System properties
    • 系统属性
    • Operating System environment variables
    • 操作系统环境变量

When you are creating custom properties, ensure that each custom property contains a distinct property value, so that it is not overridden by existing environment properties, system properties, or FlowFile attributes.

在创建自定义属性时,请确保每个自定义属性包含不同的属性值,以便现有环境属性,系统属性或数据属性不会覆盖它。
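The precedence list above can be pictured as an ordered lookup in which the first scope that defines a name wins. The following Python sketch is illustrative only: it assumes the list runs from highest to lowest precedence, and the function `resolve` and all of the dictionaries are hypothetical stand-ins, not NiFi APIs.

```python
import os
from typing import Mapping, Optional, Sequence


def resolve(name: str, scopes: Sequence[Mapping[str, str]]) -> Optional[str]:
    """Return the value from the first scope that defines `name`, else None."""
    for scope in scopes:
        if name in scope:
            return scope[name]
    return None


# Hypothetical scopes, ordered from highest to lowest precedence (assumed order).
processor_attributes = {}
flowfile_properties = {"fileSize": "1024"}
flowfile_attributes = {"filename": "data.csv", "env": "from-flowfile"}
custom_properties = {"env": "from-registry", "putfile_dir": "/tmp/out"}
system_properties = {"user.home": "/home/nifi"}
environment = dict(os.environ)

scopes = [
    processor_attributes,
    flowfile_properties,
    flowfile_attributes,
    custom_properties,
    system_properties,
    environment,
]

# The FlowFile attribute named "env" shadows the custom property of the same name.
print(resolve("env", scopes))
# Only the custom properties define "putfile_dir" ahead of the environment.
print(resolve("putfile_dir", scopes))
```

The shadowing of `env` is the situation the warning above is about: a custom property that reuses an existing name is never seen by the lookup.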

There are two ways to use and manage custom properties:

有两种方法可以使用和管理自定义属性:

  • In the NiFi UI via the Variables window
  • 在NiFi UI中通过Variables窗口
  • Referencing custom properties via 'nifi.properties'
  • 通过'nifi.properties'配置文件自定义属性

Variables Window(变量窗口)

Variables can be created and configured within the NiFi UI. The variables can be used in any field that supports Expression Language. NiFi automatically picks up new or modified variables created in the UI.

可以在NiFi UI中创建和配置变量。 变量可用于支持表达式语言的任何字段。 NiFi自动获取在UI中创建的新变量或修改变量。

To access the Variables window, right-click on the canvas with nothing selected:

要访问“变量”窗口,请在画布的空白区域单击鼠标右键:

Variables in Context Menu for RPG

Select "Variables" from the Context Menu:

从菜单中选择“变量”:

Empty Variables Window

"Variables" is also available in the right-click Context Menu when a process group is selected:

选择进程组时,右键单击“菜单”中也可以使用“变量”:

Variables in Context Menu for PG

Creating a Variable(创建变量)

In the Variables window, click the "+" button to create a new variable. Add a name:

在“变量”窗口中,单击“+”按钮以创建新变量。 添加名称:

Variable Name Creation

and a value:

还有值

Variable Value Creation

Select "Apply":

选择“Apply”:

New Variable Applied

Steps to update the variable are performed (Identifying components affected, Stopping affected Processors, etc.). For example, the Referencing Processors section now lists the "PutFile-Root" processor. Selecting the name of the processor in the list will navigate to that processor on the canvas. Looking at the properties of the processor, ${putfile_dir} is referenced by the Directory property:

执行更新变量的步骤(识别受影响的组件,停止受影响的处理器等)。 例如,Referencing Processors部分现在列出了“PutFile-Root”处理器。 在列表中点击处理器名称将跳转到对应的处理器中。 查看处理器的属性,Directory属性引用${putfile_dir}

Processor Property Using Variable

Variable Scope(变量作用空间)

Variables are scoped by the Process Group they are defined in and are available to any Processor defined at that level and below (i.e. any descendant Processors).

变量的作用域由它们定义的进程组确定,并且可供该级别及以下定义的任何处理器使用(即任何后代处理器)。

Variables in a descendant group override the value in a parent group. More specifically, if a variable x is declared at the root group and also declared inside a process group, components inside the process group will use the value of x defined in the process group.

后代组中的变量会覆盖父组中的值。 更具体地说,如果变量“x”在根组中声明并且也在进程组内声明,则进程组内的组件将使用进程组中定义的“x”值。
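The scoping rule can be sketched as a walk from a process group up through its ancestors, stopping at the first group that defines the variable. This is a minimal illustration, not NiFi's implementation: the `ProcessGroup` class, the group names, and the directory values are all hypothetical.

```python
from typing import Dict, Optional


class ProcessGroup:
    """Hypothetical stand-in for a NiFi process group and its variables."""

    def __init__(self, name: str, parent: Optional["ProcessGroup"] = None) -> None:
        self.name = name
        self.parent = parent
        self.variables: Dict[str, str] = {}

    def lookup(self, variable: str) -> Optional[str]:
        """Search this group first, then each ancestor up to the root."""
        group: Optional[ProcessGroup] = self
        while group is not None:
            if variable in group.variables:
                return group.variables[variable]
            group = group.parent
        return None


root = ProcessGroup("root")
root.variables["putfile_dir"] = "/data/root"    # illustrative value

child = ProcessGroup("child group", parent=root)
child.variables["putfile_dir"] = "/data/child"  # overrides the root value

sibling = ProcessGroup("sibling group", parent=root)  # defines nothing itself

print(child.lookup("putfile_dir"))    # value defined in the child group
print(sibling.lookup("putfile_dir"))  # inherited from the root group
```

Because the search never looks downward, a variable defined in the child group stays invisible to the root group and to the sibling group.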

For example, in addition to the putfile_dir variable that exists at the root process group, assume another putfile_dir variable was created within Process Group A. If one of the components within Process Group A references putfile_dir, both variables will be listed, but the putfile_dir from the root group will have a strikethrough indicating that it is being overridden:

例如,除了存在于根进程组的putfile_dir变量之外,假设在进程组A中创建了另一个putfile_dir变量。如果进程组A中的一个组件引用了putfile_dir,则两个变量都将被列出,但根组中的putfile_dir将有一个删除线,表示它正在被覆盖:

Variable Overridden

A variable can only be modified for the process group it was created in, which is listed at the top of the Variables window. To modify a variable defined in a different process group, select the "arrow" icon in that variable’s row:

只能通过创建它的进程组修改变量,该变量列在“变量”窗口的顶部。 要修改在不同进程组中定义的变量,请选择该变量行中的“箭头”图标:

Variable Go To

which will navigate to the Variables window for that process group:

这将跳转到该进程组的Variables窗口:

Variables Window for RPG

Variable Permissions(变量权限)

Variable permissions are based solely on the privileges configured on the corresponding Process Group.

变量权限仅支持在相应进程组上配置的权限。

For example, if a user does not have access to View a process group, the Variables window can not be viewed for that process group:

例如,如果用户无权查看进程组,则无法查看该进程组的“变量”窗口:

Insufficient Permissions to View Variables

If a user has access to View a process group but does not have access to Modify the process group, the variables can be viewed but not modified.

如果用户有权查看流程组但无权访问“修改流程组”,则可以查看变量但不能修改变量。

For information on how to manage privileges on components, see the Access Policies section in the System Administrator’s Guide.

有关如何管理组件权限的信息,请参阅《系统管理员指南》中的“访问策略”部分。

Referencing Controller Services(控制器服务)

In addition to Referencing Processors, the Variables window also displays Referencing Controller Services:

除了Referencing Processors之外,Variables窗口还显示Referencing Controller Services:

Referencing Controller Services

Selecting the name of the controller service will navigate to that controller service in the Configuration window:

选择控制器服务的名称将跳转到“配置”窗口中的该控制器服务:

Controller Service Using Variable

Unauthorized Referencing Components(未经授权的组件)

When View or Modify privileges are not given to a component that references a variable, the UUID of the component will be displayed in the Variables window:

如果未向引用变量的组件提供“查看”或“修改”权限,则组件的UUID将显示在“变量”窗口中:

Unauthorized Referencing Components

In the above example, the variable property1 is referenced by a processor that "user1" is not able to view:

在上面的示例中,变量property1被一个“user1”无权查看的处理器引用:

Unauthorized Referencing Processor

Referencing Custom Properties via nifi.properties(通过nifi.properties配置自定义属性)

Identify one or more sets of key/value pairs, and give them to your system administrator.

识别一组或多组键/值对,并将它们提供给系统管理员。

Once the new custom properties have been added, ensure that the nifi.variable.registry.properties field in the 'nifi.properties' file is updated with the custom properties location.

添加新的自定义属性后,请确保使用自定义属性位置更新“nifi.properties”文件中的“nifi.variable.registry.properties”字段。

NiFi must be restarted for these updates to be picked up.

NIFI必须重启才会生效
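To make the relationship between 'nifi.properties' and the custom properties files concrete, here is a rough Python sketch of how such files could be read. It is an assumption-laden illustration, not NiFi's loader: it assumes nifi.variable.registry.properties holds a comma-separated list of file paths, it handles only simple key=value lines rather than the full Java properties syntax, and the file name custom.properties and the helper functions are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict


def load_properties(path: Path) -> Dict[str, str]:
    """Parse simple key=value lines; blank lines and # or ! comments are skipped."""
    props: Dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def load_custom_properties(nifi_props: Dict[str, str]) -> Dict[str, str]:
    """Merge every file listed in nifi.variable.registry.properties (assumed comma-separated)."""
    merged: Dict[str, str] = {}
    locations = nifi_props.get("nifi.variable.registry.properties", "")
    for location in filter(None, (part.strip() for part in locations.split(","))):
        merged.update(load_properties(Path(location)))
    return merged


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical custom properties file supplied to the administrator.
    custom = Path(tmp) / "custom.properties"
    custom.write_text("# custom properties\nputfile_dir=/data/out\n", encoding="utf-8")

    # Minimal stand-in for nifi.properties that points at the custom file.
    nifi = Path(tmp) / "nifi.properties"
    nifi.write_text(f"nifi.variable.registry.properties={custom}\n", encoding="utf-8")

    custom_properties = load_custom_properties(load_properties(nifi))

print(custom_properties)
```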

Controller Services(控制器服务)

Controller Services are shared services that reporting tasks, processors, and other services can use for configuration or task execution.

Controller Services是共享服务,可供报告,处理器和其他服务使用,以用于配置或任务执行。

Controller Services defined on the controller level are limited to reporting tasks and other services defined there. Controller Services for use by processors in your dataflow must be defined in the configuration of the root process group or sub-process group(s) where they will be used.

控制器服务仅限于报告任务和其他定义的服务。当被在处理器中使用时,控制器服务必须是根进程组或子进程组中已经配置的。

If your NiFi instance is secured, your ability to view and add Controller Services is dependent on the privileges assigned to you. If you do not have access to one or more Controller Services, you cannot see or access them in the UI. Access privileges can be assigned on a global or Controller Service-specific basis (see Accessing the UI with Multi-Tenant Authorization for more information).

如果您的NiFi实例受到保护,您查看和添加Controller Services的能力取决于分配给您的权限。 如果您无权访问一个或多个Controller Services,则无法在UI中查看或访问它们。 可以在全局或特定于Controller Service的基础上分配访问权限(有关更多信息,请参阅访问具有多租户授权的UI )。

Adding Controller Services for Reporting Tasks(报告任务添加控制器服务)

To add a Controller Service for a reporting task, select Controller Settings from the Global Menu.

要为报告任务添加控制器服务,请从全局菜单中选择控制器设置。

Global Menu - Controller Settings

This displays the NiFi Settings window. The window has four tabs: General, Reporting Task Controller Services, Reporting Tasks and Registry Clients. The General tab provides settings for the overall maximum thread counts of the instance.

这将显示“NiFi设置”窗口。该窗口有四个选项卡:常规、报告任务控制器服务、报告任务和注册表客户端。 “常规”选项卡提供实例的总体最大线程数的设置。

Controller Settings General Tab

To the right of the General tab is the Reporting Task Controller Services tab. From this tab, the DFM may click the "+" button in the upper-right corner to create a new Controller Service.

“常规”选项卡右侧是“控制器服务”选项卡。在此选项卡中,用户可以单击右上角的“+”按钮以创建新的Controller Service。

Controller Services Tab

The Add Controller Service window opens. This window is similar to the Add Processor window. It provides a list of the available Controller Services on the right and a tag cloud, showing the most common category tags used for Controller Services, on the left. The DFM may click any tag in the tag cloud in order to narrow down the list of Controller Services to those that fit the categories desired. The DFM may also use the Filter field at the top-right of the window to search for the desired Controller Service or use the Source drop-down at the top-left to filter the list by the group who created them. Upon selecting a Controller Service from the list, the DFM can see a description of the service below. Select the desired controller service and click Add, or simply double-click the name of the service to add it.

打开“添加控制器服务”窗口。 此窗口类似“添加处理器”窗口。 它提供了右侧可用的Controller Services列表和标签云,左侧显示Controller Services最常见类别标签。 DFM可以单击标签云中的任何标签,以便将Controller Services列表范围缩小到适合的类别。 DFM还可以使用窗口右上角的“过滤器”字段来搜索所需的“控制器服务”,或使用左上角的“源”下拉列表按创建它们的组筛选。 从列表中选择Controller Service后,DFM可以在下面看到该服务的描述。 选择所需的控制器服务,然后单击“添加”,或者只需双击服务名称即可。

Add Controller Service Window

Once you have added a Controller Service, you can configure it by clicking the Configure button in the far-right column. Other buttons in this column include Enable, Remove and Access Policies.

添加控制器服务后,可以通过单击最右侧列中的“配置”按钮对其进行配置。 其他按钮为启用,删除和访问策略。

Controller Services Buttons

You can obtain information about Controller Services by clicking the Usage and Alerts buttons in the left-hand column.

您可以通过单击左侧列中的“使用和警报”按钮来获取有关Controller Services的信息。

Controller Services Information Buttons

When the DFM clicks the Configure button, a Configure Controller Service window opens. It has three tabs: Settings, Properties, and Comments. This window is similar to the Configure Processor window. The Settings tab provides a place for the DFM to give the Controller Service a unique name (if desired). It also lists the UUID, Type, Bundle and Support information for the service and provides a list of other components (reporting tasks or other controller services) that reference the service.

当DFM单击“配置”按钮时,将打开“配置控制器服务”窗口。 它有三个选项卡:设置,属性和注释。 此窗口类似“配置处理器”窗口。 “设置”选项卡为DFM提供了一个功能,以便为Controller Service提供唯一的名称(如果需要)。 它还列出了服务的UUID,类型,捆绑和支持信息,并提供了引用该服务的其他组件(报告任务或其他控制器服务)的列表。

Configure Controller Service Settings

The Properties tab lists the various properties that apply to the particular controller service. As with configuring processors, the DFM may hover over the question mark icons to see more information about each property.

“属性”选项卡列出了适用于特定控制器服务的各种属性。 与配置处理器一样,DFM可以将鼠标悬停在问号图标上以查看有关每个属性的更多信息。

Configure Controller Service Properties

The Comments tab is just an open-text field, where the DFM may include comments about the service. After configuring a Controller Service, click the Apply button to apply the configuration and close the window, or click the Cancel button to cancel the changes and close the window.

“注释”选项卡只是一个开放文本,其中DFM可能包含有关服务的注释。 配置Controller Service后,单击“应用”按钮以应用关闭窗口,或单击“取消”按钮取消更改并关闭窗口。

Adding Controller Services for Dataflows(为数据流添加控制器服务)

To add a Controller Service for a dataflow, you can either right-click a Process Group and select Configure, or click Configure from the Operate Palette.

要为数据流添加控制器服务,可以右键单击“进程组”并选择“配置”,或单击“操作选项板”中的“配置”。

Process Group Configuration Options

When you click Configure from the Operate Palette with nothing selected on your canvas, you add a Controller Service for your Root Process Group. That Controller Service is then available to all nested Process Groups in your dataflow. When you select a Process Group on the canvas and then click Configure from either the Operate Palette or the Process Group context menu, the service will be available to all Processors and Controller Services defined in that Process Group and below.

在画布上单击“操作选项板”中的“配置”时,如果在画布上未选择任何内容,则为根进程组添加控制器服务。 然后,该控制器服务可用于数据流中的所有进程组。 在画布上选择“进程组”,然后从“操作选项板”或“进程组”菜单中单击“配置”时,该服务将可用于该进程组及下图中定义的所有处理器和控制器服务。

Process Group Controller Services Scope

Use the following steps to add a Controller Service:

添加Controller服务的步骤:

  1. Click Configure, either from the Operate Palette, or from the Process Group context menu. This displays the process group Configuration window. The window has two tabs: General and Controller Services. The General tab is for settings that pertain to general information about the process group. For example, if configuring the root process group, the DFM can provide a unique name for the overall dataflow, as well as comments that describe the flow (Note: this information is visible to any other NiFi instance that connects remotely to this instance (using Remote Process Groups, a.k.a., Site-to-Site)).

    单击“配置”,可以从“操作选项板”或“进程组”菜单中单击“配置”。 这将显示进程组“配置”窗口。 该窗口有两个选项卡:常规和控制器服务。 “常规”选项卡是常规信息有关的设置。 例如,如果配置根进程组,DFM可以为整个数据流提供唯一的名称,以及描述流的注释(注意:此信息对于远程连接到此实例的任何其他NiFi实例是可见的(使用 远程进程组,又叫,站点到站点))。

Process Group Configuration Window

  2. From the Process Group Configuration page, select the Controller Services tab.

    从Process Group Configuration页面中,选择Controller Services选项卡。

  3. Click the "+" button to display the Add Controller Service dialog.

    单击“+”按钮以显示“添加控制器服务”对话框。

  4. Select the Controller Service desired, and click Add.

    选择所需的Controller Service,然后单击“添加”。

  5. Perform any necessary Controller Service configuration tasks by clicking the Configure icon (Configure) in the right-hand column.

    通过单击右侧列中的配置图标(Configure)执行任何必要的Controller Service配置任务。

Enabling/Disabling Controller Services

After a Controller Service has been configured, it must be enabled in order to run. Do this using the Enable button (Enable Button) in the far-right column of the Controller Services tab. To modify an existing, running controller service, the DFM must first stop or disable it (as well as all referencing reporting tasks and controller services). Do this using the Disable button (Disable Button). Rather than having to hunt down each component that references that controller service, the DFM can stop or disable those components while disabling the controller service itself. When enabling a controller service, the DFM has the option to either start/enable the controller service and all referencing components or start/enable only the controller service itself.

配置Controller Service后,必须启用它才能运行。 使用“控制器服务”选项卡的最右侧列中的“启用”按钮(Enable Button)。 为了修改现有/正在运行的控制器服务,DFM需要停止/禁用它(以及所有引用它的报告任务和控制器服务)。 使用“禁用”按钮(Disable Button)。 DFM可以在禁用相关控制器服务时一并停止/禁用引用它的组件,而不必逐个查找引用该控制器服务的每个组件。 启用控制器服务时,DFM可以选择启动/启用控制器服务和所有引用组件,也可以仅启动/启用控制器服务本身。

Reporting Tasks(报告任务)

Reporting Tasks run in the background to provide statistical reports about what is happening in the NiFi instance. The DFM adds and configures Reporting Tasks in much the same way as Controller Services. To add a Reporting Task, select Controller Settings from the Global Menu.

报告任务在后台运行,以提供有关NiFi实例中发生情况的统计报告。 DFM添加和配置报告任务,类似于Controller Services的过程。 要添加报告任务,请从全局菜单中选择控制器设置。

Global Menu - Controller Settings

This displays the NiFi Settings window. Select the Reporting Tasks tab and click the "+" button in the upper-right corner to create a new Reporting Task.

这将显示“NiFi设置”窗口。 选择“报告任务”选项卡,然后单击右上角的“+”按钮以创建新的报告任务。

Reporting Tasks Tab

The Add Reporting Task window opens. This window is similar to the Add Processor window. It provides a list of the available Reporting Tasks on the right and a tag cloud, showing the most common category tags used for Reporting Tasks, on the left. The DFM may click any tag in the tag cloud in order to narrow down the list of Reporting Tasks to those that fit the categories desired. The DFM may also use the Filter field at the top-right of the window to search for the desired Reporting Task or use the Source drop-down at the top-left to filter the list by the group who created them. Upon selecting a Reporting Task from the list, the DFM can see a description of the task below. Select the desired reporting task and click Add, or simply double-click the name of the task to add it.

打开“添加报告任务”窗口。 此窗口类似于“添加处理器”窗口。 它提供了右侧可用报告任务的列表和标签云,显示了左侧用于报告任务的最常见类别标签。 DFM可以单击标签云中的任何标签,以便将报告任务列表缩小到适合所需类别的那些。 DFM还可以使用窗口右上角的“过滤器”字段来搜索所需的“报告任务”,或使用左上角的“源”下拉列表按创建它们的组筛选列表。 从列表中选择报告任务后,DFM可以在下面看到该任务的描述。 选择所需的报告任务,然后单击“添加”,或者只需双击要添加的任务名称即可。

Add Reporting Task Window

Once a Reporting Task has been added, the DFM may configure it by clicking the Edit button in the far-right column. Other buttons in this column include Start, Remove, State and Access Policies.

添加报告任务后,DFM可以通过单击最右侧列中的“编辑”按钮对其进行配置。 此列中的其他按钮包括启动,删除,状态和访问策略。

Reporting Tasks Edit Buttons

You can obtain information about Reporting Tasks by clicking the View Details, Usage, and Alerts buttons in the left-hand column.

您可以通过单击左侧列中的“查看详细信息”,“使用情况”和“警报”按钮来获取有关报告任务的信息。

Reporting Tasks Information Buttons

When the DFM clicks the Edit button, a Configure Reporting Task window opens. It has three tabs: Settings, Properties, and Comments. This window is similar to the Configure Processor window. The Settings tab provides a place for the DFM to give the Reporting Task a unique name (if desired). It also lists the UUID, Type, and Bundle information for the task and provides settings for the task’s Scheduling Strategy and Run Schedule (similar to the same settings in a processor). The DFM may hover the mouse over the question mark icons to see more information about each setting.

当DFM单击“编辑”按钮时,将打开“配置报告任务”窗口。 它有三个选项卡:设置,属性和注释。 此窗口类似于“配置处理器”窗口。 “设置”选项卡为DFM提供了一个功能,以便为报告任务提供唯一的名称(如果需要)。 它还列出了任务的UUID,Type和Bundle信息,并提供了任务的Scheduling Strategy和Run Schedule的设置(类似于处理器中的相同设置)。 DFM可以将鼠标悬停在问号图标上以查看有关每个设置的更多信息。

Configure Reporting Task Settings

The Properties tab lists the various properties that may be configured for the task. The DFM may hover the mouse over the question mark icons to see more information about each property.

“属性”选项卡列出了可为任务配置的各种属性。 DFM可以将鼠标悬停在问号图标上以查看有关每个属性的更多信息。

Configure Reporting Task Properties

The Comments tab is just an open-text field, where the DFM may include comments about the task. After configuring the Reporting Task, click the Apply button to apply the configuration and close the window, or click the Cancel button to cancel the changes and close the window.

“注释”选项卡只是一个开放文本字段,其中DFM可能包含有关任务的注释。 配置报告任务后,单击“应用”按钮应用配置并关闭窗口,或单击“取消”按钮取消更改并关闭窗口。

When you want to run the Reporting Task, click the Start button (Start Button).

如果要运行报告任务,请单击“开始”按钮(Start Button)。

Connecting Components(连接组件)

Once processors and other components have been added to the canvas and configured, the next step is to connect them to one another so that NiFi knows what to do with each FlowFile after it has been processed. This is accomplished by creating a Connection between each component. When the user hovers the mouse over the center of a component, a new Connection icon ( Connection Bubble ) appears:

将处理器和其他组件添加到画布并进行配置后,下一步是将它们相互连接,以便NiFi知道在处理完每个FlowFile后如何处理。 这是通过在每个组件之间创建连接来实现的。 当用户将鼠标悬停在组件的中心上时,会出现一个新的连接图标( Connection Bubble ) :

Processor with Connection Bubble

The user drags the Connection bubble from one component to another until the second component is highlighted. When the user releases the mouse, a 'Create Connection' dialog appears. This dialog consists of two tabs: 'Details' and 'Settings'. They are discussed in detail below. Note that it is possible to draw a connection so that it loops back on the same processor. This can be useful if the DFM wants the processor to try to re-process FlowFiles if they go down a failure Relationship. To create this type of looping connection, simply drag the connection bubble away and then back to the same processor until it is highlighted. Then release the mouse and the same 'Create Connection' dialog appears.

用户将连接气泡从一个组件拖动到另一个组件,直到第二个组件突出显示。 当用户释放鼠标时,会出现“创建连接”对话框。 该对话框包含两个选项卡:“详细信息”和“设置”。 它们将在下面详细讨论。 请注意,可以连接自己,以便它在同一处理器上循环。 如果DFM希望处理器在失败关系时尝试重新处理FlowFiles,这将非常有用。 要创建这种类型的循环连接,只需将连接气泡拖离,然后再返回到同一处理器,直到它突出显示。 然后释放鼠标,出现相同的“创建连接”对话框。

Details Tab(Details选项卡)

The Details tab of the 'Create Connection' dialog provides information about the source and destination components, including the component name, the component type, and the Process Group in which the component lives:

“创建连接”对话框的“详细信息”选项卡提供有关源和目标组件的信息,包括组件名称,组件类型和组件所在的进程组:

Create Connection

Additionally, this tab provides the ability to choose which Relationships should be included in this Connection. At least one Relationship must be selected. If only one Relationship is available, it is automatically selected.

此外,此选项卡还提供了选择此连接中应包含哪些关系的功能。 必须至少选择一个关系。 如果只有一个关系可用,则会自动选择它。

If multiple Connections are added with the same Relationship, any FlowFile that is routed to that Relationship will automatically be 'cloned', and a copy will be sent to each of those Connections.

如果使用相同的关系添加多个连接,则将自动“克隆”路由到该关系的任何FlowFile,并将副本发送到每个连接。
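The cloning behaviour described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not NiFi code: the `FlowFile` and `Connection` classes, the `route` function, and the connection names are all hypothetical.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FlowFile:
    attributes: Dict[str, str]
    content: bytes


@dataclass
class Connection:
    name: str
    relationships: List[str]
    queue: List[FlowFile] = field(default_factory=list)


def route(flowfile: FlowFile, relationship: str, connections: List[Connection]) -> int:
    """Queue the FlowFile on every connection that includes `relationship`."""
    targets = [c for c in connections if relationship in c.relationships]
    for index, connection in enumerate(targets):
        # The first connection receives the original; the rest receive clones.
        item = flowfile if index == 0 else copy.deepcopy(flowfile)
        connection.queue.append(item)
    return len(targets)


to_archive = Connection("to-archive", ["success"])
to_index = Connection("to-index", ["success"])
to_retry = Connection("to-retry", ["failure"])

delivered = route(
    FlowFile({"filename": "a.txt"}, b"hello"),
    "success",
    [to_archive, to_index, to_retry],
)
print(delivered, len(to_archive.queue), len(to_index.queue), len(to_retry.queue))
```

Both "success" connections end up holding an equal but independent FlowFile, while the "failure" connection stays empty.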

Settings(设置)

The Settings tab provides the ability to configure the Connection’s Name, FlowFile Expiration, Back Pressure Thresholds, Load Balance Strategy and Prioritization:

“设置”选项卡提供配置连接名称,FlowFile到期,背压阈值,负载平衡策略和优先级的功能:

Connection Settings

The Connection name is optional. If not specified, the name shown for the Connection will be the names of the Relationships that are active for the Connection.

连接名称是可选的。如果未指定,则为Connection显示的名称将是Connection的活动关系的名称。

FlowFile Expiration(FlowFile到期)

FlowFile expiration is a concept by which data that cannot be processed in a timely fashion can be automatically removed from the flow. This is useful, for example, when the volume of data is expected to exceed the volume that can be sent to a remote site. In this case, the expiration can be used in conjunction with Prioritizers to ensure that the highest priority data is processed first and then anything that cannot be processed within a certain time period (one hour, for example) can be dropped. The expiration period is based on the time that the data entered the NiFi instance. In other words, if the file expiration on a given connection is set to '1 hour', and a file that has been in the NiFi instance for one hour reaches that connection, it will expire. The default value of 0 sec indicates that the data will never expire. When a file expiration other than '0 sec' is set, a small clock icon appears on the connection label, so the DFM can see it at-a-glance when looking at a flow on the canvas.

FlowFile到期概念是一个可以自动从流中删除无法及时处理的数据。例如,当预计数据量超过可以发送到远程站点的卷时。到期可以与优先级排序器一起使用,以确保首先处理最高优先级数据,然后可以丢弃在特定时间段(例如,一小时)内无法处理的任何内容。到期时间基于数据进入NiFi实例的时间。换句话说,如果给定连接上的文件到期时间设置为“1小时”,并且已经在NiFi实例中一小时的文件到达该连接,则该文件将过期。默认值“0秒”表示数据永不过期。当设置了“0秒”以外的文件到期时,连接标签上会出现一个小时钟图标,因此在查看画布上的流时,DFM可以一目了然地看到它。
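The expiration rule can be expressed as a small predicate. The sketch below is an illustration in Python, not NiFi's implementation: the function name `is_expired` is hypothetical, and treating a FlowFile as expired once its age is equal to or greater than the configured period is an assumption drawn from the "one hour" example above.

```python
from datetime import datetime, timedelta

NEVER_EXPIRE = timedelta(0)  # mirrors the default "0 sec" setting


def is_expired(entered_at: datetime, expiration: timedelta, now: datetime) -> bool:
    """Age is measured from when the data entered the instance, not the connection."""
    if expiration == NEVER_EXPIRE:
        return False
    return now - entered_at >= expiration


entered = datetime(2024, 1, 1, 12, 0, 0)
one_hour = timedelta(hours=1)

print(is_expired(entered, one_hour, datetime(2024, 1, 1, 12, 30)))  # half an hour old
print(is_expired(entered, one_hour, datetime(2024, 1, 1, 13, 0)))   # exactly one hour old
print(is_expired(entered, NEVER_EXPIRE, datetime(2030, 1, 1)))      # expiration disabled
```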

File Expiration Indicator

Back Pressure(背压)

NiFi provides two configuration elements for Back Pressure. These thresholds indicate how much data should be allowed to exist in the queue before the component that is the source of the Connection is no longer scheduled to run. This allows the system to avoid being overrun with data. The first option provided is the "Back pressure object threshold." This is the number of FlowFiles that can be in the queue before back pressure is applied. The second configuration option is the "Back pressure data size threshold." This specifies the maximum amount of data (in size) that should be queued up before applying back pressure. This value is configured by entering a number followed by a data size (B for bytes, KB for kilobytes, MB for megabytes, GB for gigabytes, or TB for terabytes).

NiFi为背压提供两种配置元素。 这些阈值表示在不再计划运行作为连接源的组件之前,应允许在队列中存在多少数据。 这避免系统数据溢出。 提供的第一个选项是“背压对象阈值”。 这是在应用背压之前可以在队列中的FlowFiles的数量。 第二个配置选项是“背压数据大小阈值”。 这指定了在应用背压之前应排队的最大数据量(大小)。 通过输入数字后跟数据大小来配置此值(B表示字节,KB表示千字节,MB表示兆字节,GB表示千兆字节,TB表示太字节)。

Each new connection has a default Back Pressure Object Threshold of 10,000 objects and a default Back Pressure Data Size Threshold of 1 GB. These defaults can be changed by modifying the appropriate properties in the nifi.properties file.

默认情况下,添加的每个新连接的默认背压对象阈值为10,000个对象,背压数据大小阈值为1 GB。 可以通过修改nifi.properties文件中的相应属性来更改这些默认值。

When back pressure is enabled, small progress bars appear on the connection label, so the DFM can see it at-a-glance when looking at a flow on the canvas. The progress bars change color based on the queue percentage: Green (0-60%), Yellow (61-85%) and Red (86-100%).

启用背压时,连接标签上会出现小进度条,因此在查看画布上的流时,DFM可以一目了然地看到它。 进度条根据队列百分比更改颜色:绿色(0-60%),黄色(61-85%)和红色(86-100%)。
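The two thresholds and the colour bands can be summarised in a short Python sketch. It is a model for illustration only: the function names are hypothetical, the sketch assumes that 1 KB equals 1024 bytes, and it assumes back pressure applies as soon as either threshold is reached.

```python
import re

# Assumed binary multipliers for the size suffixes listed above.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text: str) -> int:
    """Convert a string such as '1 GB' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*(B|KB|MB|GB|TB)\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised data size: {text!r}")
    return int(float(match.group(1)) * UNITS[match.group(2).upper()])


def back_pressure_applied(
    queued_count: int,
    queued_bytes: int,
    object_threshold: int = 10_000,
    size_threshold: str = "1 GB",
) -> bool:
    """True once either the object count or the data size threshold is reached."""
    return queued_count >= object_threshold or queued_bytes >= parse_size(size_threshold)


def bar_color(percent: float) -> str:
    """Map a queue percentage to the colour bands described above."""
    if percent <= 60:
        return "green"
    if percent <= 85:
        return "yellow"
    return "red"


print(parse_size("1 GB"))
print(back_pressure_applied(queued_count=9_999, queued_bytes=0))
print(back_pressure_applied(queued_count=10_000, queued_bytes=0))
print(bar_color(61))
```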

Back Pressure Indicator Bars

Hovering your mouse over a bar displays the exact percentage.

将鼠标悬停在条形图上会显示确切的百分比。

Back Pressure Indicator Hover Text

When the queue is completely full, the Connection is highlighted in red.

队列完全填满后,Connection将以红色突出显示。

Back Pressure Queue Full

Load Balancing(负载均衡)

Load Balance Strategy(负载均衡策略)

To distribute the data in a flow across the nodes in the cluster, NiFi offers the following load balance strategies:

为了在群集中的节点之间分配流中的数据,NiFi提供以下负载均衡策略:

  • Do not load balance: Do not load balance FlowFiles between nodes in the cluster. This is the default.
  • Do not load balance: 不在群集中的节点之间平衡FlowFiles。 这是默认值。
  • Partition by attribute: Determines which node to send a given FlowFile to based on the value of a user-specified FlowFile Attribute. All FlowFiles that have the same value for the Attribute will be sent to the same node in the cluster. If the destination node is disconnected from the cluster or if unable to communicate, the data does not fail over to another node. The data will queue, waiting for the node to be available again. Additionally, if a node joins or leaves the cluster necessitating a rebalance of the data, consistent hashing is applied to avoid having to redistribute all of the data.
  • Partition by attribute:根据用户指定的FlowFile属性的值确定将给定FlowFile发送到哪个节点。 具有相同Attribute值的所有FlowFile将发送到集群中的同一节点。 如果目标节点与群集断开连接或无法通信,则数据不会故障转移到另一个节点。 数据将排队,等待节点再次可用。 此外,如果节点加入或离开集群需要重新平衡数据,则通过一致性散列以避免必须重新分发所有数据。
  • Round robin: FlowFiles will be distributed to nodes in the cluster in a round-robin fashion. If a node is disconnected from the cluster or if unable to communicate with a node, the data that is queued for that node will be automatically redistributed to another node(s).
  • Round robin:FlowFiles将以轮循方式分发到集群中的节点。 如果节点与群集断开连接或无法与节点通信,则排队等待该节点的数据将自动重新分发到另一个节点。
  • Single node: All FlowFiles will be sent to a single node in the cluster. Which node they are sent to is not configurable. If the node is disconnected from the cluster or if unable to communicate with the node, the data that is queued for that node will remain queued until the node is available again.
  • Single node: 所有FlowFiles都将发送到集群中的单个节点。 它们被发送到哪个节点是不可控的。 如果节点与群集断开连接或无法与节点通信,则排队等待该节点的数据将保持排队,直到该节点再次可用。
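As a rough illustration of how two of these strategies choose a destination, consider the Python sketch below. The function names and node names are hypothetical. Note that the partitioning function uses a plain hash modulo the node count, which is a simpler technique than the consistent hashing mentioned above, so it does not reproduce NiFi's behaviour when nodes join or leave.

```python
import hashlib
from itertools import cycle, islice
from typing import Dict, Iterator, List


def partition_by_attribute(
    attributes: Dict[str, str], attribute_name: str, nodes: List[str]
) -> str:
    """Pick a node from the attribute value; equal values map to the same node."""
    value = attributes.get(attribute_name, "")
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def round_robin(nodes: List[str]) -> Iterator[str]:
    """Yield nodes in turn, wrapping around indefinitely."""
    return cycle(nodes)


nodes = ["node-1", "node-2", "node-3"]  # hypothetical cluster

first = partition_by_attribute({"customer": "acme"}, "customer", nodes)
second = partition_by_attribute({"customer": "acme"}, "customer", nodes)
print(first == second)  # the same attribute value always lands on the same node

print(list(islice(round_robin(nodes), 4)))  # wraps back to the first node
```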

In addition to the UI settings, there are Cluster Node Properties related to load balancing that must also be configured in nifi.properties.

除UI设置外,还有与负载平衡相关的群集节点属性,还必须在nifi.properties中进行配置。

NiFi persists the nodes that are in a cluster across restarts. This prevents the redistribution of data until all of the nodes have connected. If the cluster is shut down and a node is not intended to be brought back up, the user is responsible for removing the node from the cluster via the "Cluster" dialog in the UI (see Managing Nodes for more information).

NiFi会在重新启动时持久保存群集中的节点。 这可以防止在所有节点连接之前重新分配数据。 如果群集已关闭且不打算重新启动某个节点,则用户负责通过UI中的“群集”对话框从群集中删除该节点(有关更多信息,请参阅Managing Nodes)。

Load Balance Compression(负载均衡压缩)

After selecting the load balance strategy, the user can configure whether or not data should be compressed when being transferred between nodes in the cluster.

选择负载平衡策略后,用户可以配置在群集中的节点之间传输时是否压缩数据。

Load Balance Compression Options

The following compression options are available:

可以使用以下压缩选项:

  • Do not compress: FlowFiles will not be compressed. This is the default.
  • Do not compress: FlowFiles不会被压缩。这是默认值。
  • Compress attributes only: FlowFile attributes will be compressed, but FlowFile contents will not.
  • Compress attributes only: FlowFile属性将被压缩,但FlowFile内容不会被压缩。
  • Compress attributes and content: FlowFile attributes and contents will be compressed.
  • Compress attributes and content: FlowFile属性和内容将被压缩。

Load Balance Indicator(负载均衡指示器)

When a load balance strategy has been implemented for a connection, a load balance indicator (Load Balance Icon) will appear on the connection:

当为连接实施负载平衡策略时,负载平衡指示器(Load Balance Icon)将出现在连接上:

Connection Configured with Load Balance Strategy

Hovering over the icon will display the connection’s load balance strategy and compression configuration. The icon in this state also indicates that all data in the connection has been distributed across the cluster.

将鼠标悬停在该图标上将显示连接的负载平衡策略和压缩策略。 此状态下的图标还表示连接中的所有数据都已在群集中分布。

Distributed Load Balance Connection

When data is actively being transferred between the nodes in the cluster, the load balance indicator will change orientation and color:

当在群集中的节点之间主动传输数据时,负载均衡指示器将更改方向和颜色:

Active Load Balance Connection

Cluster Connection Summary(群集连接摘要)

To see where data has been distributed among the cluster nodes, select Summary from the Global Menu. Then select the "Connections" tab and the "View Connection Details" icon for a source:

要查看在群集节点之间分配数据的位置,请从“全局菜单”中选择“摘要”。 然后选择“连接”选项卡和某个源的“查看连接详细信息”图标:

NiFi Summary Connections

This will open the Cluster Connection Summary dialog, which shows the data on each node in the cluster:

这将打开“群集连接摘要”对话框,该对话框显示群集中每个节点上的数据:

Cluster Connection Summary Dialog

Prioritization(优先级)

The right-hand side of the tab provides the ability to prioritize the data in the queue so that higher priority data is processed first. Prioritizers can be dragged from the top ('Available prioritizers') to the bottom ('Selected prioritizers'). Multiple prioritizers can be selected. The prioritizer that is at the top of the 'Selected prioritizers' list is the highest priority. If two FlowFiles have the same value according to this prioritizer, the second prioritizer will determine which FlowFile to process first, and so on. If a prioritizer is no longer desired, it can then be dragged from the 'Selected prioritizers' list to the 'Available prioritizers' list.

选项卡的右侧提供了对队列中数据进行优先级排序的功能,以便首先处理更高优先级的数据。 优先级可以从顶部('使用的优先级排序器')拖动到底部('选用的优先级排序器')。 可以选择多个优先级排序器。 位于“所选优先级”列表顶部的优先级排序是最高优先级。 如果两个FlowFiles根据此优先级排序器具有相同的值,则第二个优先级排序器将确定首先处理哪个FlowFile,依此类推。 如果不再需要优先级排序器,则可以将其从“选用的优先级排序器”列表拖动到“使用的优先级排序器”列表。

The following prioritizers are available:

可以使用以下优先顺序:

  • FirstInFirstOutPrioritizer: Given two FlowFiles, the one that reached the connection first will be processed first.
  • FirstInFirstOutPrioritizer: 给定两个FlowFiles,首先处理到达的FlowFiles。
  • NewestFlowFileFirstPrioritizer: Given two FlowFiles, the one that is newest in the dataflow will be processed first.
  • NewestFlowFileFirstPrioritizer: 给定两个FlowFiles,将首先处理数据流中最新的FlowFiles。
  • OldestFlowFileFirstPrioritizer: Given two FlowFiles, the one that is oldest in the dataflow will be processed first. This is the default scheme that is used if no prioritizers are selected.
  • OldestFlowFileFirstPrioritizer: 给定两个FlowFiles,将首先处理数据流中最旧的FlowFiles。 这是在没有选择优先级的情况下使用的默认方案。
  • PriorityAttributePrioritizer: Given two FlowFiles that both have a "priority" attribute, the one that has the highest priority value will be processed first. Note that an UpdateAttribute processor should be used to add the "priority" attribute to the FlowFiles before they reach a connection that has this prioritizer set. Values for the "priority" attribute may be alphanumeric, where "a" is a higher priority than "z", and "1" is a higher priority than "9", for example.
  • PriorityAttributePrioritizer:给定两个具有“优先级”属性的FlowFile,将首先处理具有最高优先级值的FlowFiles。 请注意,应使用UpdateAttribute处理器在FlowFiles到达具有此优先级设置的连接之前将“priority”属性添加到FlowFiles。 “优先级”属性的值可以是字母数字,其中“a”的优先级高于“z”,“1”的优先级高于“9”。
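The way several selected prioritizers combine can be sketched as a sort with a compound key: the first prioritizer decides, and later ones only break ties. The Python below is a simplified model, not NiFi's implementation. The class and function names are hypothetical, the "priority" attribute is compared as a plain string (which matches the "a" before "z" and "1" before "9" examples above), and placing FlowFiles without the attribute last is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence


@dataclass
class QueuedFlowFile:
    name: str
    queued_at: float        # when it reached this connection
    entered_flow_at: float  # when it entered the dataflow
    attributes: Dict[str, str] = field(default_factory=dict)


def first_in_first_out(f: QueuedFlowFile):
    return f.queued_at


def oldest_flowfile_first(f: QueuedFlowFile):
    return f.entered_flow_at


def newest_flowfile_first(f: QueuedFlowFile):
    return -f.entered_flow_at


def priority_attribute(f: QueuedFlowFile):
    # Assumption: FlowFiles without the attribute sort after those that have it.
    return f.attributes.get("priority", "\uffff")


def order_queue(
    queue: List[QueuedFlowFile], prioritizers: Sequence[Callable]
) -> List[QueuedFlowFile]:
    """Sort by each selected prioritizer in turn; oldest-first when none is selected."""
    selected = list(prioritizers) or [oldest_flowfile_first]
    return sorted(queue, key=lambda f: tuple(p(f) for p in selected))


queue = [
    QueuedFlowFile("a", queued_at=3, entered_flow_at=1, attributes={"priority": "2"}),
    QueuedFlowFile("b", queued_at=1, entered_flow_at=2, attributes={"priority": "1"}),
    QueuedFlowFile("c", queued_at=2, entered_flow_at=3, attributes={"priority": "1"}),
]

by_priority = order_queue(queue, [priority_attribute, first_in_first_out])
print([f.name for f in by_priority])

default_order = order_queue(queue, [])
print([f.name for f in default_order])
```

In the first call, "b" and "c" tie on the priority attribute, so the second prioritizer puts "b" ahead because it reached the connection first.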

With a Load Balance Strategy configured, the connection has a queue per node in addition to the local queue. The prioritizer will sort the data in each queue independently.

配置负载平衡策略后,除本地队列外,每个节点的连接都有一个队列。 优先级排序器将独立地对每个队列中的数据进行排序。

Changing Configuration and Context Menu Options(更改配置和菜单选项)

After a connection has been drawn between two components, the connection’s configuration may be changed, and the connection may be moved to a new destination; however, the processors on either side of the connection must be stopped before a configuration or destination change may be made.

在两个组件之间建立连接之后,可以更改连接的配置,也可以将连接移动到新的目标组件; 但是,必须先停止连接两侧的处理器,然后才能更改配置或目标。

Connection

To change a connection’s configuration or interact with the connection in other ways, right-click on the connection to open the connection context menu.

要更改连接的配置或以其他方式与连接交互,请右键单击连接以打开连接菜单。

Connection Menu

The following options are available:

可以使用以下选项:

  • Configure: This option allows the user to change the configuration of the connection.
  • Configure: 此选项允许用户更改连接的配置。
  • View status history: This option opens a graphical representation of the connection’s statistical information over time.
  • View status history: 此选项打开连接统计信息随时间的图表。
  • List queue: This option lists the queue of FlowFiles that may be waiting to be processed.
  • List queue: 此选项列出可能正在等待处理的FlowFiles队列。
  • Go to source: This option can be useful if there is a long distance between the connection’s source and destination components on the canvas. By clicking this option, the view of the canvas will jump to the source of the connection.
  • Go to source: 如果画布上连接的源组件和目标组件之间存在较长距离,则此选项很有用。 通过单击此选项,画布视图将跳转到连接源。
  • Go to destination: Similar to the "Go to source" option, this option changes the view to the destination component on the canvas and can be useful if there is a long distance between two connected components.
  • Go to destination: 与“转到源”选项类似,此选项将视图更改为画布上的目标组件,如果两个连接组件之间存在较长距离,则此选项可能很有用。
  • Bring to front: This option brings the connection to the front of the canvas if something else (such as another connection) is overlapping it.
  • Bring to front: 如果其他东西(例如另一个连接)与其重叠,则此选项将连接带到画布的前面。
  • Empty queue: This option allows the DFM to clear the queue of FlowFiles that may be waiting to be processed. This option can be especially useful during testing, when the DFM is not concerned about deleting data from the queue. When this option is selected, users must confirm that they want to delete the data in the queue.
  • Empty queue: 此选项允许DFM清除可能正在等待处理的FlowFiles队列。 当DFM不关心从队列中删除数据时,此选项在测试期间特别有用。 选择此选项后,用户必须确认是否要删除队列中的数据。
  • Delete: This option allows the DFM to delete a connection between two components. Note that the components on both sides of the connection must be stopped and the connection must be empty before it can be deleted.
  • Delete: 此选项允许DFM删除两个组件之间的连接。 请注意,必须先停止连接两侧的组件,并且连接必须为空才能删除。

Bending Connections(弯曲连接)

To add a bend point (or elbow) to an existing connection, simply double-click on the connection in the spot where you want the bend point to be. Then, you can use the mouse to grab the bend point and drag it so that the connection is bent in the desired way. You can add as many bend points as you want. You can also use the mouse to drag and move the label on the connection to any existing bend point. To remove a bend point, simply double-click it again.

要向现有连接添加弯曲点(或弯头),只需双击要弯曲点所在位置的连接即可。 然后,您可以使用鼠标抓住弯曲点并拖动它,以便以所需的方式弯曲连接。 您可以根据需要添加任意数量的弯曲点。 您还可以使用鼠标将连接上的标签拖动并移动到任何现有折弯点。 要删除折弯点,只需再次双击即可。

Connection Bend Points

Processor Validation(处理器验证)

Before trying to start a Processor, it’s important to make sure that the Processor’s configuration is valid. A status indicator is shown in the top-left of the Processor. If the Processor is invalid, the indicator will show a yellow Warning indicator with an exclamation mark indicating that there is a problem:

在尝试启动处理器之前,确保处理器的配置有效非常重要。 状态指示器显示在处理器的左上角。 如果处理器无效,指示器将显示黄色警告指示器,并带有感叹号,表示存在问题:

Invalid Processor

In this case, hovering over the indicator icon with the mouse will provide a tooltip showing all of the validation errors for the Processor. Once all of the validation errors have been addressed, the status indicator will change to a Stop icon, indicating that the Processor is valid and ready to be started but currently is not running:

在这种情况下,使用鼠标悬停在指示器图标上将提供工具提示,显示处理器的所有验证错误。 一旦解决了所有验证错误,状态指示器将变为Stop图标,表示处理器有效并准备启动但当前未运行:

Valid Processor

Site-to-Site(站点到站点)

When sending data from one instance of NiFi to another, there are many different protocols that can be used. The preferred protocol, though, is the NiFi Site-to-Site Protocol. Site-to-Site makes it easy to securely and efficiently transfer data to/from nodes in one NiFi instance or data producing application to nodes in another NiFi instance or other consuming application.

当从一个NiFi实例向另一个实例发送数据时,可以使用许多不同的协议。 但是,首选协议是NiFi站点到站点协议。 站点到站点可以轻松、安全、高效地在一个NiFi实例(或其他数据生产应用程序)的节点与另一个NiFi实例(或其他数据消费应用程序)的节点之间传输数据。

Using Site-to-Site provides the following benefits:

站点到站点提供以下好处:

  • Easy to configure
  • 配置简单
    • After entering the URL of the remote NiFi instance, the available ports (endpoints) are automatically discovered and provided in a drop-down list
    • 输入远程NiFi实例的URL后,将自动发现可用端口(端点)并在下拉列表中提供
  • Secure
  • 安全
    • Site-to-Site optionally makes use of Certificates in order to encrypt data and provide authentication and authorization. Each port can be configured to allow only specific users, and only those users will be able to see that the port even exists. For information on configuring the Certificates, see the Security Configuration section of the System Administrator’s Guide.
    • 站点到站点可选用证书来加密数据并提供身份验证和授权。 可以将每个端口配置为仅允许特定用户访问,并且只有那些用户才能看到该端口的存在。 有关配置证书的信息,请参阅系统管理员指南的“安全配置”部分。
  • Scalable
  • 可扩展
    • As nodes in the remote cluster change, those changes are automatically detected and data is scaled out across all nodes in the cluster.
    • 随着远程群集中的节点发生更改,将自动检测这些更改,并在群集中的所有节点上扩展数据。
  • Efficient
  • 高效
    • Site-to-Site allows batches of FlowFiles to be sent at once in order to avoid the overhead of establishing connections and making multiple round-trip requests between peers.
    • 站点到站点允许一次发送批量的FlowFiles,以避免建立连接和在对等点之间进行多次往返请求的开销。
  • Reliable
  • 可靠
    • Checksums are automatically produced by both the sender and receiver and compared after the data has been transmitted, in order to ensure that no corruption has occurred. If the checksums don’t match, the transaction will simply be canceled and tried again.
    • 发送方和接收方自动生成校验和,并在数据传输后进行比较,以确保没有发生损坏。 如果校验和不匹配,则只会取消交易并再次尝试。
  • Automatically load balanced
  • 自动负载均衡
    • As nodes come online or drop out of the remote cluster, or a node’s load becomes heavier or lighter, the amount of data that is directed to that node will automatically be adjusted.
    • 当节点联机或退出远程群集或节点的负载变得更重或更轻时,将自动调整到该节点的数据量。
  • FlowFiles maintain attributes
  • FlowFiles保留属性
    • When a FlowFile is transferred over this protocol, all of the FlowFile’s attributes are automatically transferred with it. This can be very advantageous in many situations, as all of the context and enrichment that has been determined by one instance of NiFi travels with the data, making for easy routing of the data and allowing users to easily inspect the data.
    • 当通过此协议传输FlowFile时,该FlowFile的所有属性都会随之自动传输。 这在许多情况下非常有利,因为由一个NiFi实例确定的所有上下文和扩充信息都会随数据一起传输,使得数据易于路由,并且允许用户轻松地检查数据。
  • Adaptable
  • 适应性
    • As new technologies and ideas emerge, the protocol for handling Site-to-Site communications is able to change with them. When a connection is made to a remote NiFi instance, a handshake is performed in order to negotiate which protocol and which version of the protocol will be used. This allows new capabilities to be added while still maintaining backward compatibility with all older instances. Additionally, if a vulnerability or deficiency is ever discovered in a protocol, it allows a newer version of NiFi to forbid communication over the compromised versions of the protocol.
    • 随着新技术和新想法的出现,处理站点到站点通信的协议能够随之改变。 当与远程NiFi实例建立连接时,执行握手以协商将使用哪种协议和协议版本。 这允许添加新功能,同时仍保持与所有旧实例的向后兼容性。 此外,如果在协议中发现漏洞或缺陷,它允许更新版本的NiFi禁用受损版本的协议进行通信。
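
The "Reliable" property in the list above can be illustrated with a small sketch: both sides compute a checksum, and a mismatch cancels the transaction so that it is tried again. The use of CRC32 via `zlib.crc32`, the `channel` callable that stands in for the network, and the retry limit are all assumptions made for illustration; the text only says that checksums are produced and compared.

```python
import zlib


def transfer(payload: bytes, channel, max_attempts: int = 3) -> int:
    """Send payload through `channel`; return the attempt number that succeeded.

    `channel` is a hypothetical callable that simulates the network and
    returns the bytes the receiver actually got.
    """
    for attempt in range(1, max_attempts + 1):
        sender_checksum = zlib.crc32(payload)
        received = channel(payload)
        receiver_checksum = zlib.crc32(received)
        if sender_checksum == receiver_checksum:
            return attempt  # checksums match: the transaction is accepted
        # mismatch: the transaction is canceled and tried again
    raise IOError(f"transfer failed after {max_attempts} attempts")


def make_flaky_channel():
    """Simulated channel that corrupts the last byte of the first transmission only."""
    state = {"calls": 0}

    def channel(data: bytes) -> bytes:
        state["calls"] += 1
        if state["calls"] == 1:
            return data[:-1] + bytes([data[-1] ^ 0xFF])
        return data

    return channel


attempts_needed = transfer(b"flowfile-content", make_flaky_channel())
```

With the simulated channel, the first attempt is rejected because the checksums differ, and the second attempt succeeds.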

Site-to-Site is a protocol for transferring data between two NiFi instances. Either end can be a standalone NiFi or a NiFi cluster. In this section, the NiFi instance that initiates the communication is called the Site-to-Site client NiFi instance, and the other end is called the Site-to-Site server NiFi instance, to clarify what configuration is needed on each NiFi instance.

站点到站点是在两个NiFi实例之间传输数据的协议。 任何一端都可以是独立的NiFi或NiFi集群。 在本节中,发起通信的NiFi实例称为站点到站点客户端NiFi实例,另一端称为站点到站点服务器NiFi实例,以阐明每个NiFi实例所需的配置。

A NiFi instance can be both a client and a server for the Site-to-Site protocol; however, it can only be a client or a server within a specific Site-to-Site communication. For example, suppose there are three NiFi instances A, B, and C, where A pushes data to B and B pulls data from C: A --push--> B <--pull-- C. Then B is not only a server in the communication between A and B, but also a client in the communication between B and C.

NiFi实例既可以是站点到站点协议的客户端,也可以是服务器;但是,在某一次特定的站点到站点通信中,它只能是客户端或服务器之一。 例如,假设有三个NiFi实例A、B和C,A将数据推送到B,B从C中提取数据:A --push--> B <--pull-- C。 那么B不仅是A和B之间通信中的服务器,也是B和C之间通信中的客户端。

It is important to understand which NiFi instance will be the client or server in order to design your data flow, and configure each instance accordingly. Here is a summary of what components run on which side based on data flow direction:

了解哪个NiFi实例将是客户端或服务器以设计数据流并相应地配置每个实例非常重要。 以下是基于数据流方向在哪一方运行的组件的摘要:

  • Push: a client sends data to a Remote Process Group, the server receives it with an Input Port
  • Push:客户端将数据发送到远程进程组,服务器通过输入端口接收数据
  • Pull: a client receives data from a Remote Process Group, the server sends data through an Output Port
  • Pull:客户端从远程进程组接收数据,服务器通过输出端口发送数据
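
The push and pull summary above, together with the A, B, C example, can be written down as a small lookup. This is only a sketch of the roles as the text describes them; the function name and the dictionary layout are hypothetical.

```python
def site_to_site_roles(initiator: str, remote: str, direction: str) -> dict:
    """Describe one Site-to-Site communication.

    The initiator is always the client and uses a Remote Process Group;
    the remote side is the server and exposes an Input Port or Output Port.
    """
    if direction == "push":
        server_component, sender, receiver = "Input Port", initiator, remote
    elif direction == "pull":
        server_component, sender, receiver = "Output Port", remote, initiator
    else:
        raise ValueError("direction must be 'push' or 'pull'")
    return {
        "client": initiator,
        "client_component": "Remote Process Group",
        "server": remote,
        "server_component": server_component,
        "data_flow": (sender, receiver),  # (from, to)
    }


# The A, B, C example: A pushes to B, and B pulls from C.
a_to_b = site_to_site_roles("A", "B", "push")
b_from_c = site_to_site_roles("B", "C", "pull")
```

Here B appears as the server in `a_to_b` and as the client in `b_from_c`, even though data flows into B in both cases.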

Configure Site-to-Site client NiFi instance(配置站点到站点客户端NiFi实例)

Remote Process Group: In order to communicate with a remote NiFi instance via Site-to-Site, simply drag a Remote Process Group onto the canvas and enter the URL of the remote NiFi instance (for more information on the components of a Remote Process Group, see the Remote Process Group Transmission section of this guide). The URL is the same URL you would use to go to that instance’s User Interface. At that point, you can drag a connection to or from the Remote Process Group in the same way you would drag a connection to or from a Processor or a local Process Group. When you drag the connection, you will have a chance to choose which Port to connect to. Note that it may take up to one minute for the Remote Process Group to determine which ports are available.

Remote Process Group: 要通过站点到站点与远程NiFi实例进行通信,只需将远程进程组拖到画布上,并输入远程NiFi实例的URL(有关远程进程组组件的更多信息,请参阅本指南的“远程进程组传输”部分)。 该URL与访问该实例用户界面所用的URL相同。 此时,您可以像在处理器或本地进程组上拖动连接一样,将连接拖到远程进程组或从远程进程组拖出。 拖动连接时,您可以选择要连接的端口。 请注意,远程进程组最多可能需要一分钟才能确定哪些端口可用。

If the connection is dragged starting from the Remote Process Group, the ports shown will be the Output Ports of the remote group, as this indicates that you will be pulling data from the remote instance. If the connection instead ends on the Remote Process Group, the ports shown will be the Input Ports of the remote group, as this implies that you will be pushing data to the remote instance.

如果从远程进程组开始拖动连接,则显示的端口将是远程组的输出端口,因为这表示您将从远程实例中提取数据。 如果连接在远程进程组上结束,则显示的端口将是远程组的输入端口,因为这意味着您将数据推送到远程实例。

If the remote instance is configured to use secure data transmission, you will see only ports that you are authorized to communicate with. For information on configuring NiFi to run securely, see the System Administrator’s Guide.

如果远程实例配置为使用安全数据传输,您将只看到您有权与之通信的端口。 有关配置NiFi安全运行的信息,请参阅系统管理员指南

Transport Protocol: In a Remote Process Group creation or configuration dialog, you can choose the Transport Protocol to use for Site-to-Site communication, as shown in the following image:

Transport Protocol:在远程进程组创建或配置对话框中,您可以选择用于站点到站点通信的传输协议,如下图所示:

Configure Remote Process Group

By default, it is set to RAW, which uses raw socket communication over a dedicated port. The HTTP transport protocol is especially useful if the remote NiFi instance is in a restricted network that only allows access through the HTTP(S) protocol or is only accessible from a specific HTTP Proxy server. For access through an HTTP Proxy Server, BASIC and DIGEST authentication are supported.

默认情况下,它设置为RAW,即通过专用端口使用原始套接字通信。 如果远程NiFi实例位于仅允许通过HTTP(S)协议访问或仅可从特定HTTP代理服务器访问的受限网络中,则HTTP传输协议特别有用。 对于通过HTTP代理服务器进行的访问,支持BASIC和DIGEST身份验证。

Local Network Interface: In some cases, it may be desirable to prefer one network interface over another. For example, if a wired interface and a wireless interface both exist, the wired interface may be preferred. This can be configured by specifying the name of the network interface to use in this box. If the value entered is not valid, the Remote Process Group will not be valid and will not communicate with other NiFi instances until this is resolved.

Local Network Interface: 在某些情况下,可能希望优选一个网络接口而不是另一个网络接口。 例如,如果存在有线接口和无线接口,则有线接口可能是优选的。 可以通过指定要在此框中使用的网络接口的名称来配置。 如果输入的值无效,则远程进程组将无效,并且在解决此问题之前不会与其他NiFi实例通信。

Configure Site-to-Site server NiFi instance(配置站点到站点服务器NiFi实例)

Retrieve Site-to-Site Details: If your NiFi is running securely, in order for another NiFi instance to retrieve information from your instance, it needs to be added to the Global Access "retrieve site-to-site details" policy. This will allow the other instance to query your instance for details such as name, description, available peers (nodes when clustered), statistics, OS port information and available Input and Output ports. Utilizing Input and Output ports in a secured instance requires additional policy configuration as described below.

Retrieve Site-to-Site Details: 如果您的NiFi安全运行,为了让另一个NiFi实例从您的实例中检索信息,需要将其添加到Global Access“检索站点到站点详细信息”策略。 这将允许另一个实例查询您的实例以获取详细信息,例如名称,描述,可用对等体(群集时的节点),统计信息,OS端口信息以及可用的输入和输出端口。 在安全实例中使用输入和输出端口需要额外的策略配置。

Input Port: In order to allow another NiFi instance to push data to your local instance, you can simply drag an Input Port onto the Root Process Group of your canvas. After entering a name for the port, it will be added to your flow. You can now right-click on the Input Port and choose Configure in order to adjust the name and the number of concurrent tasks that are used for the port.

Input Port: 为了允许另一个NiFi实例将数据推送到本地实例,您只需拖动输入端口到画布的根进程组。 输入端口命名后,它将添加到您的流程中。 您现在可以右键单击“输入端口”并选择“配置”,以便调整用于端口的名称和并发任务数。

If Site-to-Site is configured to run securely, you will need to manage the port’s "receive data via site-to-site" component access policy. Only those users who have been added to the policy will be able to communicate with the port.

如果将站点到站点配置为安全运行,则需要管理端口的“通过站点到站点接收数据”组件访问权限。 只有已获得权限的用户才能与端口通信。

Output Port: Similar to an Input Port, a DataFlow Manager may choose to add an Output Port to the Root Process Group. The Output Port allows an authorized NiFi instance to remotely connect to your instance and pull data from the Output Port. Configuring the Output Port and managing the port’s access policies will again allow the DFM to control how many concurrent tasks are allowed, as well as which users are authorized to pull data from the instance being configured.

Output Port: 与输入端口类似,DFM可以选择将输出端口添加到根进程组。 输出端口允许授权的NiFi实例远程连接到您的实例并从输出端口提取数据。 配置输出端口和管理端口的访问权限将再次允许DFM控制并发任务数,以及授权哪些用户从正在配置的实例中提取数据。

In addition to other instances of NiFi, some other applications may use a Site-to-Site client in order to push data to or receive data from a NiFi instance. For example, NiFi provides an Apache Storm spout and an Apache Spark Receiver that are able to pull data from NiFi’s Root Group Output Ports.

除了NiFi的其他实例之外,一些其他应用程序可以使用站点到站点客户端来将数据推送到NiFi实例或从NiFi实例接收数据。 例如,NiFi提供Apache Storm spout和Apache Spark Receiver,它们能够从NiFi的根组输出端口提取数据。

For information on how to enable and configure Site-to-Site on a NiFi instance, see the Site-to-Site Properties section of the System Administrator’s Guide.

有关如何在NiFi实例上启用和配置站点到站点的信息,请参阅系统管理员指南的“站点到站点属性”部分。

For information on how to configure access policies, see the Access Properties section of the System Administrator’s Guide.

有关如何配置访问策略的信息,请参阅系统管理员指南的“访问属性”部分。

Example Dataflow(数据流示例)

This section has described the steps required to build a dataflow. Now, to put it all together. The following example dataflow consists of just two processors: GenerateFlowFile and LogAttribute. These processors are normally used for testing, but they can also be used to build a quick flow for demonstration purposes and see NiFi in action.

本节介绍了构建数据流所需的步骤。 现在,把它们归总到一起。 以下示例数据流仅包含两个处理器:GenerateFlowFile和LogAttribute。 这些处理器通常用于测试,但它们也可用于构建快速流程以用于演示目的,并查看NiFi的运行情况。

After you drag the GenerateFlowFile and LogAttribute processors to the canvas and connect them (using the guidelines provided above), configure them as follows:

将GenerateFlowFile和LogAttribute处理器拖到画布并连接它们(根据上面提供的指南)之后,按如下所示进行配置:

  • Generate FlowFile
    • On the Scheduling tab, set Run schedule to: 5 sec. Note that the GenerateFlowFile processor can create many FlowFiles very quickly; that’s why setting the Run schedule is important so that this flow does not overwhelm the system NiFi is running on.
    • 在“调度”选项卡上,将“运行计划”设置为:5秒。 请注意,GenerateFlowFile处理器可以非常快速地创建许多FlowFiles; 这就是为什么设置运行计划很重要,这样这个流程就不会让NiFi运行的系统不堪重负。
    • On the Properties tab, set File Size to: 10 kb
    • 在“属性”选项卡上,将“文件大小”设置为:10 kb
  • Log Attribute
    • On the Settings tab, under Auto-terminate relationships, select the checkbox next to Success. This will terminate FlowFiles after this processor has successfully processed them.
    • 在“设置”选项卡上的“自动终止关系”下,选中“成功”旁边的复选框。 这将在此处理器成功处理后终止FlowFiles。
    • Also on the Settings tab, set the Bulletin level to Info. This way, when the dataflow is running, this processor will display the bulletin icon (see Anatomy of a Processor), and the user may hover over it with the mouse to see the attributes that the processor is logging.
    • 同样在“设置”选项卡上,将“Bulletin”级别设置为“Info”。 这样,当数据流运行时,此处理器将显示公告图标(请参阅处理器剖析),用户可以用鼠标悬停在它上面以查看处理器正在记录的信息。

The dataflow should look like the following:

数据流应如下所示:

Simple Flow

Now see the following section on how to start and stop the dataflow. When the dataflow is running, be sure to note the statistical information that is displayed on the face of each processor (see Anatomy of a Processor).

下面一节将介绍启动和停止数据流。 数据流运行时,请务必记下每个处理器正面显示的统计信息(请参阅处理器剖析)。


Command and Control of the DataFlow(DataFlow的命令和控制)

When a component is added to the NiFi canvas, it is in the Stopped state. In order to cause the component to be triggered, the component must be started. Once started, the component can be stopped at any time. From a Stopped state, the component can be configured, started, or disabled.

将组件添加到NiFi画布时,默认“已停止”状态。 为了触发组件,必须启动组件。 启动后,组件可以随时停止。 在“已停止”状态下,可以配置,启动或禁用该组件。

Starting a Component(启动组件)

In order to start a component, the following conditions must be met:

启动组件,必须满足以下条件:

  • The component’s configuration must be valid.
  • 组件的配置必须正确。
  • All defined Relationships for the component must be connected to another component or auto-terminated.
  • 组件的所有已定义关系必须连接到另一个组件或自动终止。
  • The component must be stopped.
  • 组件必须是停止状态。
  • The component must be enabled.
  • 组件必须是可用的。
  • The component must have no active tasks. For more information about active tasks, see the "Anatomy of …" sections under Monitoring of DataFlow (Anatomy of a Processor, Anatomy of a Process Group, Anatomy of a Remote Process Group).
  • 组件必须没有活动任务。 有关活动任务的更多信息,请参阅“DataFlow监控”下的各“……剖析”小节(处理器剖析,[进程组剖析](http://nifi.apache.org/docs/nifi-docs/html/user-guide.html#process_group_anatomy),远程进程组剖析)。
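
The five start conditions listed above can be read as a single check. The sketch below models them in Python under stated assumptions: the `Component` class and its fields are hypothetical and do not correspond to NiFi classes, and the reasons are reported as plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical snapshot of a component's state (not a NiFi class)."""
    valid: bool = True
    enabled: bool = True
    running: bool = False
    active_tasks: int = 0
    relationships: set = field(default_factory=set)    # all defined relationships
    connected: set = field(default_factory=set)        # relationships with a connection
    auto_terminated: set = field(default_factory=set)  # relationships auto-terminated


def start_blockers(c: Component) -> list:
    """Return the reasons the component cannot be started (empty list = startable)."""
    reasons = []
    if not c.valid:
        reasons.append("configuration is invalid")
    unhandled = c.relationships - c.connected - c.auto_terminated
    if unhandled:
        reasons.append("unhandled relationships: " + ", ".join(sorted(unhandled)))
    if c.running:
        reasons.append("component is not stopped")
    if not c.enabled:
        reasons.append("component is disabled")
    if c.active_tasks > 0:
        reasons.append("component still has active tasks")
    return reasons


# LogAttribute from the example flow: "success" is auto-terminated.
log_attribute = Component(relationships={"success"}, auto_terminated={"success"})
# A component whose "failure" relationship goes nowhere.
dangling = Component(relationships={"success", "failure"}, connected={"success"})
```

`start_blockers(log_attribute)` returns an empty list, while `start_blockers(dangling)` reports the unhandled "failure" relationship.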

Components can be started by selecting all of the components to start and then clicking the Start button ( Start ) in the Operate Palette or by right-clicking a single component and choosing Start from the context menu.

可以先选中所有要启动的组件,然后单击操作面板(Operate Palette)中的“开始”按钮( Start )来启动组件;也可以右键单击单个组件,然后从上下文菜单中选择“开始”。

If starting a Process Group, all components within that Process Group (including child Process Groups) will be started, with the exception of those components that are invalid or disabled.

如果启动进程组,则该进程组中的所有组件(包括子进程组)都会启动,但无效或禁用的组件除外。

Once started, the status indicator of a Processor will change to a Play symbol ( Run ).

一旦启动,处理器的状态指示器将变为播放符号( Run )。

Stopping a Component(停止组件)

A component can be stopped any time that it is running. A component is stopped by right-clicking on the component and clicking Stop from the context menu, or by selecting the component and clicking the Stop button ( Stop ) in the Operate Palette.

组件在运行时可以随时停止。 要停止组件,可以右键单击该组件并从上下文菜单中单击“停止”,或者选中该组件并单击操作面板中的“停止”按钮( Stop )。

If a Process Group is stopped, all of the components within the Process Group (including child Process Groups) will be stopped.

如果停止进程组,则将停止进程组(包括子进程组)中的所有组件。

Once stopped, the status indicator of a component will change to the Stop symbol ( Stop ).

一旦停止,组件的状态指示器将变为停止符号( Stop )。

Stopping a component does not interrupt its currently running tasks. Rather, it stops scheduling new tasks to be performed. The number of active tasks is shown in the top-right corner of the Processor (See Anatomy of a Processor for more information).

停止组件不会中断其当前正在运行的任务。 相反,它只是不再调度新的任务。 活动任务的数量显示在处理器的右上角(有关更多信息,请参阅处理器剖析)。

Enabling/Disabling a Component(启用/禁用组件)

When a component is enabled, it is able to be started. Users may choose to disable components when they are part of a dataflow that is still being assembled, for example. Typically, if a component is not intended to be run, the component is disabled, rather than being left in the Stopped state. This helps to distinguish between components that are intentionally not running and those that may have been stopped temporarily (for instance, to change the component’s configuration) and inadvertently were never restarted.

当组件可用时,即可启动它。 例如,用户可以选择在组件仍然是正在组装的数据流的一部分时禁用组件。 通常,如果不打算运行组件,则禁用该组件,而不是将其置于“已停止”状态。 这有助于区分有意未运行的组件和可能已暂时停止的组件(例如,更改组件的配置),并且无意中从未重新启动。

When it is desirable to re-enable a component, it can be enabled by selecting the component and clicking the Enable button ( Enable ) in the Operate Palette. This is available only when the selected component or components are disabled. Alternatively, a component can be enabled by checking the checkbox next to the "Enabled" option in the Settings tab of the Processor configuration dialog or the configuration dialog for a Port.

当需要重新启用组件时,可以通过选择组件并单击“启用”按钮来启用它(Enable )。 仅当禁用所选组件时,此选项才可用。 或者,可以通过选中“处理器配置”对话框的“设置”选项卡中的“已启用”选项旁边的复选框或端口的配置对话框来启用组件。

Once enabled, the component’s status indicator will change to either Invalid ( Invalid ) or Stopped ( Stopped ), depending on whether or not the component is valid.

启用后,组件的状态指示器将变为无效( Invalid )或已停止( Stopped ),具体取决于组件是否有效。

A component is then disabled by selecting the component and clicking the Disable button ( Disable ) in the Operate Palette, or by clearing the checkbox next to the "Enabled" option in the Settings tab of the Processor configuration dialog or the configuration dialog for a Port.

然后通过选择组件并单击Operate Palette中的Disable按钮( Disable)来禁用组件, 或者清除“处理器配置”对话框的“设置”选项卡中的“已启用”选项旁边的复选框或端口的配置对话框。

Only Ports and Processors can be enabled and disabled.

只能启用和禁用端口和处理器。
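
The enable and disable rules in this section can be summarized as two small transitions. This is a sketch, not NiFi code: the function names and status strings are hypothetical, and the rule that a running component must be stopped before it can be disabled is an assumption drawn from the statement that a component can be disabled from the Stopped state.

```python
# Only these component kinds can be enabled or disabled, per the text above.
TOGGLEABLE_KINDS = {"processor", "port"}


def status_after_enable(kind: str, valid: bool) -> str:
    """Status shown once a disabled component has been enabled."""
    if kind not in TOGGLEABLE_KINDS:
        raise ValueError(f"a {kind} cannot be enabled or disabled")
    return "Stopped" if valid else "Invalid"


def status_after_disable(kind: str, current_status: str) -> str:
    """Status after disabling. Assumption: a running component must be stopped first."""
    if kind not in TOGGLEABLE_KINDS:
        raise ValueError(f"a {kind} cannot be enabled or disabled")
    if current_status == "Running":
        raise ValueError("stop the component before disabling it")
    return "Disabled"


enabled_valid = status_after_enable("processor", valid=True)
enabled_invalid = status_after_enable("port", valid=False)
disabled = status_after_disable("processor", "Stopped")
```

Enabling never starts a component; it only makes the component eligible to be started, which is why the result is Stopped or Invalid rather than Running.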

Remote Process Group Transmission(远程进程组传输)

Remote Process Groups provide a mechanism for sending data to or retrieving data from a remote instance of NiFi. When a Remote Process Group (RPG) is added to the canvas, it is added with the Transmission Disabled, as indicated by the icon (Transmission Disabled ) in the top-left corner. When Transmission is Disabled, it can be enabled by right-clicking on the RPG and clicking the "Enable Transmission" menu item. This will cause all ports for which there is a Connection to begin transmitting data. This will cause the status indicator to then change to the Transmission Enabled icon ( Transmission Enabled ).

远程进程组提供了一种向远程NiFi实例发送数据或从中获取数据的机制。 将远程进程组(RPG)添加到画布时,其传输默认处于禁用状态,如左上角的图标( Transmission Disabled )所示。 当传输被禁用时,可以通过右键单击RPG并单击“启用传输”菜单项来启用它。 这将使所有已建立连接的端口开始传输数据, 状态指示器也随之变为Transmission Enabled图标( Transmission Enabled )。

If there are problems communicating with the Remote Process Group, a Warning indicator ( Warning ) may instead be present in the top-left corner. Hovering over this Warning indicator with the mouse will provide more information about the problem.

如果与远程进程组通信时出现问题,则警告指示符( Warning )可能会出现在左上角。 使用鼠标将鼠标悬停在此警告指示器上将提供有关该问题的更多信息。

Individual Port Transmission(个体端口传输)

There are times when the DFM may want to either enable or disable transmission for only a specific Port within the Remote Process Group. This can be accomplished by right-clicking on the Remote Process Group and choosing the "Remote ports" menu item. This provides a configuration dialog from which each Port can be configured:

有时,DFM可能希望仅为远程进程组中的特定端口启用或禁用传输。 这可以通过右键单击远程进程组并选择“远程端口”菜单项来完成。 这提供了一个配置对话框,可以从中配置每个端口:

Remote Process Groups

The left-hand side lists all of the Input Ports that the remote instance of NiFi allows data to be sent to. The right-hand side lists all of the Output Ports from which this instance is able to pull data. If the remote instance is using secure communications (the URL of the NiFi instance begins with https://, rather than http://), any Ports that the remote instance has not made available to this instance will not be shown.

左侧列出了远程NiFi实例允许向其发送数据的所有输入端口。 右侧列出了此实例能够从中提取数据的所有输出端口。 如果远程实例使用安全通信(NiFi实例的URL以 https:// 而不是 http:// 开头),则远程实例未对此实例开放的任何端口都不会显示。

If a Port that is expected to be shown is not shown in this dialog, ensure that the instance has proper permissions and that the Remote Process Group’s flow is current. This can be checked by closing the Port Configuration Dialog and looking at the bottom-right corner of the Remote Process Group. The date at which the flow was last refreshed is shown. If the flow appears to be outdated, it can be updated by right-clicking on the Remote Process Group and selecting "Refresh flow." (See Anatomy of a Remote Process Group for more information).

如果此对话框中未显示预期的端口,请确保实例具有适当的权限,并且远程进程组的流是最新的。 可以通过关闭“端口配置”对话框并查看“远程进程组”的右下角来检查上次刷新流的日期。 如果流程已过时,可以通过右键单击远程进程组并选择“刷新流程”来更新它。 (有关更多信息,请参阅远程过程组剖析)。

Each Port is shown with the Port name, followed by its description, currently configured number of Concurrent tasks, and whether or not data sent to this port will be compressed. To the left of this information is a switch to turn the Port on or off. Those Ports that have no Connections attached to them are grayed out:

每个端口都显示端口名称,描述,当前配置的并发任务数,以及是否发送压缩数据。 此信息的左侧是用于打开或关闭端口的开关。 那些没有连接到它们的连接的端口显示为灰色:

Remote Port Statuses

The on/off switch provides a mechanism to enable and disable transmission for each Port in the Remote Process Group independently. Those Ports that are connected but are not currently transmitting can be configured by clicking the pencil icon ( Edit ) below the on/off switch. Clicking this icon will allow the DFM to change the number of Concurrent tasks and whether or not compression should be used when transmitting data to or from this Port.

开/关提供了一种机制,可以独立地启用和禁用远程进程组中每个端口的传输。 对于已连接但当前未传输的端口,可以单击开/关下方的铅笔图标( Edit )进行配置。 单击此图标将允许DFM更改并发任务的数量,以及在向此端口发送数据或从此端口接收数据时是否使用压缩。


Navigating within a DataFlow(DataFlow中的导航)

NiFi provides various mechanisms for getting around a dataflow. The NiFi User Interface section describes various ways to navigate around the NiFi canvas; however, once a flow exists on the canvas, there are additional ways to get from one component to another. When multiple Process Groups exist in a flow, breadcrumbs appear at the bottom of the screen, providing a way to navigate between them. In addition, to enter a Process Group that is currently visible on the canvas, simply double-click it, thereby "drilling down" into it. Connections also provide a way to jump from one location to another within the flow. Right-click on a connection and select "Go to source" or "Go to destination" in order to jump to one end of the connection or another. This can be very useful in large, complex dataflows, where the connection lines may be long and span large areas of the canvas. Finally, all components provide the ability to jump forward or backward within the flow. Right-click any component (e.g., a processor, process group, port, etc.) and select either "Upstream connections" or "Downstream connections". A dialog window will open, showing the available upstream or downstream connections that the user may jump to. This can be especially useful when trying to follow a dataflow in a backward direction. It is typically easy to follow the path of a dataflow from start to finish, drilling down into nested process groups; however, it can be more difficult to follow the dataflow in the other direction.

NiFi提供了多种在数据流中移动的机制。 NiFi用户界面部分介绍了在NiFi画布中导航的各种方法;但是,一旦画布上存在流,就会有其他方法从一个组件跳转到另一个组件。当流中存在多个进程组时,面包屑会显示在屏幕底部,提供在它们之间导航的方法。此外,要进入当前在画布上可见的进程组,只需双击它,从而“向下进入”它。 连接也提供了一种在流程中从一个位置跳转到另一个位置的方法。右键单击连接并选择“转到源”或“转到目标”以跳转到连接的一端或另一端。这在大型复杂数据流中非常有用,其中连接线可能很长并且跨越画布的大部分区域。最后,所有组件都提供在流程中向前或向后跳转的能力。右键单击任何组件(例如,处理器,进程组,端口等),然后选择“上游连接”或“下游连接”。将打开一个对话框窗口,显示用户可以跳转到的可用上游或下游连接。当尝试沿向后方向跟踪数据流时,这尤其有用。通常很容易从头到尾跟踪数据流的路径,向下进入到嵌套的进程组;但是,在另一个方向上跟踪数据流可能更加困难。
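
The "Upstream connections" and "Downstream connections" options described above amount to looking up a component's incoming and outgoing edges in the flow graph. The sketch below shows that idea on a plain list of (source, destination) pairs; the flow, the component names other than GenerateFlowFile and LogAttribute, and the helper functions are hypothetical and are not part of the NiFi UI or API.

```python
def upstream_connections(connections, component):
    """Connections that deliver data into `component`."""
    return [c for c in connections if c[1] == component]


def downstream_connections(connections, component):
    """Connections that carry data away from `component`."""
    return [c for c in connections if c[0] == component]


def trace_back(connections, component):
    """Follow the flow backward and return every component that feeds `component`."""
    seen, stack = [], [component]
    while stack:
        current = stack.pop()
        for source, _ in upstream_connections(connections, current):
            if source not in seen:
                seen.append(source)
                stack.append(source)
    return seen


# A hypothetical flow: each pair is (source, destination).
connections = [
    ("GenerateFlowFile", "UpdateAttribute"),
    ("UpdateAttribute", "LogAttribute"),
    ("ListenHTTP", "LogAttribute"),
]
feeds_log_attribute = sorted(trace_back(connections, "LogAttribute"))
```

Following a flow forward only needs `downstream_connections`; `trace_back` shows why the backward direction takes more work, since every incoming branch has to be explored.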

Component Linking(组件链接)

A hyperlink can be used to navigate directly to a component on the NiFi canvas. This is especially useful when Multi-Tenant Authorization is configured. For example, a URL can be given to a user to direct them to the specific process group to which they have privileges.

超链接可用于直接导航到NiFi画布上的组件。 当配置[多租户授权](http://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#multi-tenant-authorization)时,这尤其有用。 例如,可以将URL提供给用户以将其定向到他们具有特权的特定进程组。

The default URL for a NiFi instance is http://<hostname>:8080/nifi, which points to the root process group. When a component is selected on the canvas, the URL is updated with the component’s process group id and component id in the form http://<hostname>:8080/nifi/?processGroupId=<UUID>&componentIds=<UUIDs>. In the following screenshot, the GenerateFlowFile processor in the process group PG1 is the selected component:

NiFi实例的默认URL是 http://<hostname>:8080/nifi ,它指向根进程组。 在画布上选择组件时,URL会更新为包含该组件的进程组ID和组件ID,格式为 http://<hostname>:8080/nifi/?processGroupId=<UUID>&componentIds=<UUIDs> 。 在以下屏幕截图中,进程组PG1中的GenerateFlowFile处理器是所选组件:

Component Linking Processor Example

Linking to multiple components on the canvas is supported, with the restriction that the length of the URL cannot exceed a 2000 character limit.

支持链接到画布上的多个组件,但限制URL的长度不能超过2000个字符。
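
A link of the form given above can be assembled with the Python standard library. The sketch below is illustrative only: the function name, host name, and IDs are hypothetical placeholders, and joining several component IDs with commas is an assumption, since the text only shows the `<UUIDs>` placeholder. The 2000-character limit is taken from the text.

```python
from urllib.parse import urlencode

MAX_URL_LENGTH = 2000  # limit stated in the text above


def component_link(hostname, process_group_id, component_ids, port=8080):
    """Build a link to one or more components inside a process group."""
    query = urlencode(
        {
            "processGroupId": process_group_id,
            # Assumption: multiple component IDs are comma-separated.
            "componentIds": ",".join(component_ids),
        },
        safe=",",  # keep the commas readable instead of encoding them as %2C
    )
    url = f"http://{hostname}:{port}/nifi/?{query}"
    if len(url) > MAX_URL_LENGTH:
        raise ValueError(f"URL is {len(url)} characters; the limit is {MAX_URL_LENGTH}")
    return url


# Placeholder values; real IDs would be the UUIDs shown in the NiFi UI.
link = component_link("nifi.example.com", "pg-uuid", ["c1", "c2"])
```

Because the limit applies to the whole URL, the number of components that fit in one link depends on the host name length as well as on the IDs themselves.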

Component Alignment(组件对齐)

Components on the NiFi canvas can be aligned to more precisely arrange your dataflow. To do this, first select all the components you want to align. Then right-click to see the context menu and select “Align vertically” or “Align horizontally” depending on your desired result.

NiFi画布上的组件可以对齐,以更精确地排列数据流。 为此,首先选择要对齐的所有组件。 然后右键单击以查看菜单,并根据所需结果选择“垂直对齐”或“水平对齐”。

Align Vertically(垂直对齐)

Here is an example of aligning components vertically on your canvas. With all components selected/highlighted, right-click:

以下是在画布上垂直对齐组件的示例。 选中/突出显示所有组件后,右键单击:

Align Vertically Example Before

and select "Align vertically" to achieve these results:

并选择“垂直对齐”以获得以下结果:

Align Vertically Example After

Align Horizontally(水平对齐)

Here is an example of aligning components horizontally on your canvas. With all components selected/highlighted, right-click:

以下是在画布上水平对齐组件的示例。 选中/突出显示所有组件后,右键单击:

Align Horizontally Example Before

and select "Align horizontally" to achieve these results:

并选择“水平对齐”以获得以下结果:

Align Horizontally Example After

Monitoring of DataFlow(监控DataFlow)

NiFi provides a great deal of information about the DataFlow in order to monitor its health and status. The Status bar provides information about the overall system health (see NiFi User Interface). Processors, Process Groups, and Remote Process Groups provide fine-grained details about their operations. Connections and Process Groups provide information about the amount of data in their queues. The Summary Page provides information about all of the components on the canvas in a tabular format and also provides System Diagnostics that include disk usage, CPU utilization, and Java Heap and Garbage Collection information. In a clustered environment, this information is available per-node or as aggregates across the entire cluster. We will explore each of these monitoring artifacts below.

NiFi提供有关DataFlow的大量信息,以便监控其健康状况。 状态栏提供有关整体系统运行状况的信息(请参阅NiFi用户界面)。 处理器,进程组和远程进程组提供有关其操作的细粒度详细信息。 连接和进程组提供有关其队列中数据量的信息。 摘要页面以表格格式提供有关画布上所有组件的信息,还提供包括磁盘使用情况,CPU利用率以及Java堆和垃圾收集信息的系统诊断。 在群集环境中,此信息可以按节点使用,也可以作为整个群集中的聚合使用。 我们将在下面探讨每个监控组件。

Anatomy of a Processor(处理器剖析)

NiFi provides a significant amount of information about each Processor on the canvas. The following diagram shows the anatomy of a Processor:

NiFi提供有关画布上每个处理器的大量信息。 下图显示了处理器的解剖结构:

Anatomy of a Processor

The image outlines the following elements:

该图像概述了以下元素:

  • Processor Type: NiFi provides several different types of Processors in order to allow for a wide range of tasks to be performed. Each type of Processor is designed to perform one specific task. The Processor type (PutFile, in this example) describes the task that this Processor performs. In this case, the Processor writes a FlowFile to disk - or "Puts" a FlowFile to a File.
  • Bulletin Indicator: When a Processor logs that some event has occurred, it generates a Bulletin to notify those who are monitoring NiFi via the User Interface. The DFM is able to configure which bulletins should be displayed in the User Interface by updating the "Bulletin level" field in the "Settings" tab of the Processor configuration dialog. The default value is WARN, which means that only warnings and errors will be displayed in the UI. This icon is not present unless a Bulletin exists for this Processor. When it is present, hovering over the icon with the mouse will provide a tooltip explaining the message provided by the Processor as well as the Bulletin level. If the instance of NiFi is clustered, it will also show the Node that emitted the Bulletin. Bulletins automatically expire after five minutes.
  • Status Indicator: Shows the current Status of the Processor. The following indicators are possible:
    • Running: The Processor is currently running.
    • Stopped: The Processor is valid and enabled but is not running.
    • Invalid: The Processor is enabled but is not currently valid and cannot be started. Hovering over this icon will provide a tooltip indicating why the Processor is not valid.
    • Disabled: The Processor is not running and cannot be started until it has been enabled. This status does not indicate whether or not the Processor is valid.
  • Processor Name: This is the user-defined name of the Processor. By default, the name of the Processor is the same as the Processor Type. In the example, this value is "Copy to /review".
  • Active Tasks: The number of tasks that this Processor is currently executing. This number is constrained by the "Concurrent tasks" setting in the "Scheduling" tab of the Processor configuration dialog. Here, we can see that the Processor is currently performing one task. If the NiFi instance is clustered, this value represents the number of tasks that are currently executing across all nodes in the cluster.
  • 5-Minute Statistics: The Processor shows several different statistics in tabular form. Each of these statistics represents the amount of work that has been performed in the past five minutes. If the NiFi instance is clustered, these values indicate how much work has been done by all of the Nodes combined in the past five minutes. These metrics are:
    • In: The amount of data that the Processor has pulled from the queues of its incoming Connections. This value is represented as <count> (<size>), where <count> is the number of FlowFiles that have been pulled from the queues and <size> is the total size of those FlowFiles' content. In this example, the Processor has pulled 29 FlowFiles from the input queues, for a total of 14.16 megabytes (MB).
    • Read/Write: The total size of the FlowFile content that the Processor has read from disk and written to disk. This provides valuable information about the I/O performance that this Processor requires. Some Processors may only read the data without writing anything while some will not read the data but will only write data. Others will neither read nor write data, and some Processors will both read and write data. In this example, we see that in the past five minutes, this Processor has read 4.88 MB of the FlowFile content and has written 4.88 MB as well. This is what we would expect, since this Processor simply copies the contents of a FlowFile to disk. Note, however, that this is not the same as the amount of data that it pulled from its input queues. This is because some of the files that it pulled from the input queues already exist in the output directory, and the Processor is configured to route FlowFiles to failure when this occurs. Therefore, for those files which already existed in the output directory, data was neither read nor written to disk.
    • Out: The amount of data that the Processor has transferred to its outbound Connections. This does not include FlowFiles that the Processor removes itself, or FlowFiles that are routed to connections that are auto-terminated. Like the "In" metric above, this value is represented as <count> (<size>), where <count> is the number of FlowFiles that have been transferred to outbound Connections and <size> is the total size of those FlowFiles' content. In this example, all of the Relationships are configured to be auto-terminated, so no FlowFiles are reported as having been transferred Out.
    • Tasks/Time: The number of times that this Processor has been triggered to run in the past 5 minutes, and the amount of time taken to perform those tasks. The format of the time is <hour>:<minute>:<second>. Note that the amount of time taken can exceed five minutes, because many tasks can be executed in parallel. For instance, if the Processor is scheduled to run with 60 Concurrent tasks, and each of those tasks takes one second to complete, it is possible that all 60 tasks will be completed in a single second. However, in this case the Time metric will show 60 seconds instead of 1 second. This time can be thought of as "System Time"; put another way, the value is 60 seconds because that is how long the work would have taken if only a single concurrent task were used.
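
The Tasks/Time arithmetic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NiFi code: the function name `aggregate_task_time` and the list of per-task durations are hypothetical, and the output simply follows the hour:minute:second layout described for the Time metric.

```python
def aggregate_task_time(task_durations_s):
    """Sum the duration of every task and format the total as hh:mm:ss.

    The total is the time the work would have taken with a single
    concurrent task, so it can exceed the wall-clock window.
    """
    total = round(sum(task_durations_s))
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Hypothetical scenario from the text: 60 concurrent tasks, one second each.
durations = [1.0] * 60
wall_clock_s = max(durations)               # all tasks ran in parallel
reported = aggregate_task_time(durations)   # what the Time metric would show
print(wall_clock_s, reported)
```

The wall-clock time is one second, while the summed task time is one minute, which is why the Time metric can be larger than the five-minute window.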

Anatomy of a Process Group

The Process Group provides a mechanism for grouping components together into a logical construct in order to organize the DataFlow in a way that makes it more understandable from a higher level. The following image highlights the different elements that make up the anatomy of a Process Group:

Anatomy of a Process Group

The Process Group consists of the following elements:

  • Name: This is the user-defined name of the Process Group. This name is set when the Process Group is added to the canvas. The name can later be changed by right-clicking on the Process Group and clicking the "Configure" menu option. In this example, the name of the Process Group is "Process Group ABC."
  • Bulletin Indicator: When a child component of a Process Group emits a bulletin, that bulletin is propagated to the component’s parent Process Group, as well. When any component has an active Bulletin, this indicator will appear, allowing the user to hover over the icon with the mouse to see the Bulletin.
  • Active Tasks: The number of tasks that are currently being executed by the components within this Process Group. Here, we can see that the Process Group is currently performing two tasks. If the NiFi instance is clustered, this value represents the number of tasks that are currently executing across all nodes in the cluster.
  • Statistics: Process Groups provide statistics about the amount of data that has been processed by the Process Group in the past 5 minutes as well as the amount of data currently enqueued within the Process Group. The following elements comprise the "Statistics" portion of a Process Group:
    • Queued: The number of FlowFiles currently enqueued within the Process Group. This field is represented as <count> (<size>), where <count> is the number of FlowFiles that are currently enqueued in the Process Group and <size> is the total size of those FlowFiles' content. In this example, the Process Group currently has 26 FlowFiles enqueued with a total size of 12.7 megabytes (MB).
    • In: The number of FlowFiles that have been transferred into the Process Group through all of its Input Ports over the past 5 minutes. This field is represented as <count> / <size> → <ports>, where <count> is the number of FlowFiles that have entered the Process Group in the past 5 minutes, <size> is the total size of those FlowFiles' content, and <ports> is the number of Input Ports. In this example, 8 FlowFiles have entered the Process Group with a total size of 800 KB and two Input Ports exist.
    • Read/Write: The total size of the FlowFile content that the components within the Process Group have read from disk and written to disk. This provides valuable information about the I/O performance that this Process Group requires. In this example, we see that in the past five minutes, components within this Process Group have read 14.72 MB of the FlowFile content and have written 14.8 MB.
    • Out: The number of FlowFiles that have been transferred out of the Process Group through its Output Ports over the past 5 minutes. This field is represented as <ports> → <count> (<size>), where <ports> is the number of Output Ports, <count> is the number of FlowFiles that have exited the Process Group in the past 5 minutes, and <size> is the total size of those FlowFiles' content. In this example, there are three Output Ports, 16 FlowFiles have exited the Process Group, and their total size is 78.57 KB.
  • Component Counts: The Component Counts element provides information about how many components of each type exist within the Process Group. The following provides information about each of these icons and their meanings:
    • Transmitting Ports: The number of Remote Process Group Ports that currently are configured to transmit data to remote instances of NiFi or pull data from remote instances of NiFi.
    • Non-Transmitting Ports: The number of Remote Process Group Ports that are currently connected to components within this Process Group but currently have their transmission disabled.
    • Running Components: The number of Processors, Input Ports, and Output Ports that are currently running within this Process Group.
    • Stopped Components: The number of Processors, Input Ports, and Output Ports that are currently not running but are valid and enabled. These components are ready to be started.
    • Invalid Components: The number of Processors, Input Ports, and Output Ports that are enabled but are currently not in a valid state. This may be due to misconfigured properties or missing Relationships.
    • Disabled Components: The number of Processors, Input Ports, and Output Ports that are currently disabled. These components may or may not be valid. If the Process Group is started, these components will not cause any errors but will not be started.
  • Version State Counts: The Version State Counts element provides information about how many versioned process groups are within the Process Group. See Version States for more information.
  • Comments: When the Process Group is added to the canvas, the user is given the option of specifying Comments in order to provide information about the Process Group. The comments can later be changed by right-clicking on the Process Group and clicking the "Configure" menu option.
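
When statistics such as Queued are copied out of the UI, the count-and-size text can be split back into numbers. The sketch below is a hypothetical helper, not part of NiFi: the name `parse_count_size`, the accepted unit labels, and the assumption that units are 1024-based are all illustrative choices.

```python
import re

# Assumed unit labels and multipliers (1024-based); adjust if the UI differs.
_UNITS = {"bytes": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}
_PATTERN = re.compile(
    r"^\s*([\d,]+)\s*\(\s*([\d.,]+)\s*(bytes|KB|MB|GB|TB)\s*\)\s*$"
)


def parse_count_size(text):
    """Parse a string such as '26 (12.7 MB)' into (count, size_in_bytes)."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"unrecognized statistic: {text!r}")
    count = int(match.group(1).replace(",", ""))
    size = float(match.group(2).replace(",", "")) * _UNITS[match.group(3)]
    return count, int(size)


# The Queued value from the example above.
print(parse_count_size("26 (12.7 MB)"))
```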

Anatomy of a Remote Process Group

When creating a DataFlow, it is often necessary to transfer data from one instance of NiFi to another. In this case, the remote instance of NiFi can be thought of as a Process Group. For this reason, NiFi provides the concept of a Remote Process Group. From the User Interface, the Remote Process Group looks similar to the Process Group. However, rather than showing information about the inner workings and state of a Remote Process Group, such as queue sizes, the information rendered about a Remote Process Group is related to the interaction that occurs between this instance of NiFi and the remote instance.

Anatomy of a Remote Process Group

The image above shows the different elements that make up a Remote Process Group. Here, we provide an explanation of the icons and details about the information provided.

  • Transmission Status: The Transmission Status indicates whether or not data transmission between this instance of NiFi and the remote instance is currently enabled. The icon shown will be the Transmission Enabled icon (Transmission Active) if any of the Input Ports or Output Ports is currently configured to transmit, or the Transmission Disabled icon (Transmission Inactive) if all of the Input Ports and Output Ports that are currently connected are stopped.
  • Remote Instance Name: This is the name of the NiFi instance that was reported by the remote instance. When the Remote Process Group is first created, before this information has been obtained, the URL of the remote instance will be shown here instead.
  • Remote Instance URL: This is the URL of the remote instance that the Remote Process Group points to. This URL is entered when the Remote Process Group is added to the canvas and it cannot be changed.
  • Secure Indicator: This icon indicates whether or not communications with the remote NiFi instance are secure. If communications with the remote instance are secure, this will be indicated by the "Locked" icon ( Secure ). If the communications are not secure, this will be indicated by the "Unlocked" icon ( Not Secure ). If the communications are secure, this instance of NiFi will not be able to communicate with the remote instance until an administrator for the remote instance grants access. Whenever the Remote Process Group is added to the canvas, this will automatically initiate a request to have a user for this instance of NiFi created on the remote instance. This instance will be unable to communicate with the remote instance until an administrator on the remote instance adds the user to the system and adds the "NiFi" role to the user. In the event that communications are not secure, the Remote Process Group is able to receive data from anyone, and the data is not encrypted while it is transferred between instances of NiFi.
  • 5-Minute Statistics: Two statistics are shown for Remote Process Groups: Sent and Received. Both of these are in the format <count> (<size>), where <count> is the number of FlowFiles that have been sent or received in the previous five minutes and <size> is the total size of those FlowFiles' content.
  • Comments: The Comments that are provided for a Remote Process Group are not comments added by the users of this NiFi but rather the Comments added by the administrators of the remote instance. These comments indicate the purpose of the NiFi instance as a whole.
  • Last Refreshed Time: The information that is pulled from a remote instance and rendered on the Remote Process Group in the User Interface is periodically refreshed in the background. This element indicates the time at which that refresh last happened. If the information has not been refreshed for a significant amount of time, the value changes to "Remote flow not current". NiFi can be triggered to initiate a refresh of this information by right-clicking on the Remote Process Group and choosing the "Refresh flow" menu item.
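
The Transmission Status rule above amounts to an "any port transmitting" check. The following is a rough model of that rule only; the `RemotePort` class, its fields, and the returned labels are hypothetical and do not correspond to NiFi's internal types.

```python
from dataclasses import dataclass


@dataclass
class RemotePort:
    """Hypothetical stand-in for a Remote Process Group input or output port."""
    name: str
    transmitting: bool


def transmission_status(ports):
    """Return the icon label implied by the rule described above."""
    if any(port.transmitting for port in ports):
        return "Transmission Active"
    return "Transmission Inactive"


ports = [RemotePort("to-archive", False), RemotePort("from-edge", True)]
print(transmission_status(ports))
```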

Queue Interaction

The FlowFiles enqueued in a Connection can be viewed when necessary. The Queue listing is opened via List queue in a Connection’s context menu. The listing will return the top 100 FlowFiles in the active queue according to the configured priority. The listing can be performed even if the source and destination are actively running.
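
The "top 100 according to the configured priority" behaviour can be illustrated with a bounded selection. This is a conceptual sketch, not how NiFi implements the listing: the function `list_queue`, the dictionary fields, and the first-in-first-out priority used in the example are assumptions.

```python
import heapq


def list_queue(flowfiles, priority_key, limit=100):
    """Return up to `limit` FlowFiles, best priority first.

    `priority_key` maps a FlowFile to a sortable value where smaller
    means higher priority.
    """
    return heapq.nsmallest(limit, flowfiles, key=priority_key)


# Hypothetical queue of 250 FlowFiles; oldest queued_at is served first.
flowfiles = [{"id": i, "queued_at": 1000 - i} for i in range(250)]
listing = list_queue(flowfiles, priority_key=lambda ff: ff["queued_at"])
print(len(listing), listing[0]["id"])
```

Because the listing is a snapshot, a FlowFile that appears in it may already have left the active queue if the destination is running.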

Additionally, details for a FlowFile in the listing can be viewed by clicking on the Details icon ( Details ) in the leftmost column. From here, the FlowFile details and attributes are available, as well as buttons for downloading or viewing the content. Viewing the content is only available if the nifi.content.viewer.url property has been configured. If the source or destination of the Connection is actively running, there is a chance that the desired FlowFile will no longer be in the active queue.

The FlowFiles enqueued in a Connection can also be deleted when necessary. The removal of the FlowFiles is initiated via Empty queue in the Connection’s context menu. This action can also be performed if the source and destination are actively running.

Summary Page

While the NiFi canvas is useful for understanding how the configured DataFlow is laid out, this view is not always optimal when trying to discern the status of the system. In order to help the user understand how the DataFlow is functioning at a higher level, NiFi provides a Summary page. This page is available in the Global Menu in the top-right corner of the User Interface. See the NiFi User Interface section for more information about the location of this toolbar.

The Summary Page is opened by selecting Summary from the Global Menu. This opens the Summary table dialog:

Summary Table

This dialog provides a great deal of information about each of the components on the canvas. Below, we have annotated the different elements within the dialog in order to make the discussion of the dialog easier.

Summary Table Annotated

The Summary page consists primarily of a table that provides information about each of the components on the canvas. Above this table is a set of five tabs that can be used to view the different types of components. The information provided in the table is the same information that is provided for each component on the canvas. Each of the columns in the table may be sorted by clicking on the heading of the column. For more on the types of information displayed, see the sections Anatomy of a Processor, Anatomy of a Process Group, and Anatomy of a Remote Process Group above.

The Summary page also includes the following elements:

  • Bulletin Indicator: As in other places throughout the User Interface, when this icon is present, hovering over the icon will provide information about the Bulletin that was generated, including the message, the severity level, the time at which the Bulletin was generated, and (in a clustered environment) the node that generated the Bulletin. Like all the columns in the Summary table, this column where bulletins are shown may be sorted by clicking on the heading so that all the currently existing bulletins are shown at the top of the list.
  • Details: Clicking the Details icon will provide the user with the details of the component. This dialog is the same as the dialog provided when the user right-clicks on the component and chooses the "View Configuration" menu item.
  • Go To: Clicking this button will close the Summary page and take the user directly to the component on the NiFi canvas. This may change the Process Group that the user is currently in. This icon is not available if the Summary page has been opened in a new browser tab or window (by clicking the "Pop Out" button, as described below).
  • Status History: Clicking the Status History icon will open a new dialog that shows a historical view of the statistics that are rendered for this component. See the section Historical Statistics of a Component for more information.
  • Refresh: The Refresh button allows the user to refresh the information displayed without closing the dialog and opening it again. The time at which the information was last refreshed is shown just to the right of the Refresh button. The information on the page is not automatically refreshed.
  • Filter: The Filter element allows users to filter the contents of the Summary table by typing in all or part of some criteria, such as a Processor Type or Processor Name. The types of filters available differ according to the selected tab. For instance, if viewing the Processor tab, the user is able to filter by name or by type. When viewing the Connections tab, the user is able to filter by source, by name, or by destination name. The filter is automatically applied when the contents of the text box are changed. Below the text box is an indicator of how many entries in the table match the filter and how many entries exist in the table.
  • Pop-Out: When monitoring a flow, it is helpful to be able to open the Summary table in a separate browser tab or window. The Pop-Out button, next to the Close button, will cause the entire Summary dialog to be opened in a new browser tab or window (depending on the configuration of the browser). Once the page is "popped out", the dialog is closed in the original browser tab/window. In the new tab/window, the Pop-Out button and the Go-To button will no longer be available.
  • System Diagnostics: The System Diagnostics window provides information about how the system is performing with respect to system resource utilization. While this is intended mostly for administrators, it is provided in this view because it does provide a summary of the system. This dialog shows information such as CPU utilization, how full the disks are, and Java-specific metrics, such as memory size and utilization, as well as Garbage Collection information.
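
The Filter behaviour described above (partial text matched against one column, with a count of matching and total entries) can be modelled as follows. This is an illustrative sketch only: the row dictionaries, the `filter_rows` name, and the case-insensitive matching are assumptions rather than documented NiFi behaviour.

```python
def filter_rows(rows, field, text):
    """Keep rows whose `field` contains `text`; also report the counts.

    Returns (matching_rows, matched_count, total_count).
    """
    needle = text.lower()
    matches = [row for row in rows if needle in str(row.get(field, "")).lower()]
    return matches, len(matches), len(rows)


# Hypothetical contents of the Processors tab.
rows = [
    {"name": "Copy to /review", "type": "PutFile"},
    {"name": "Fetch orders", "type": "GetFile"},
    {"name": "Route by size", "type": "RouteOnAttribute"},
]
matches, matched, total = filter_rows(rows, "type", "file")
print(f"{matched} of {total} entries match")
```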

Historical Statistics of a Component

While the Summary table and the canvas show numeric statistics pertaining to the performance of a component over the past five minutes, it is often useful to have a view of historical statistics as well. This information is available by right-clicking on a component and choosing the "Status History" menu option or by clicking on the Status History icon in the Summary page (see Summary Page for more information).

The amount of historical information that is stored is configurable in the NiFi properties but defaults to 24 hours. For specific configuration information, see the Component Status Repository section of the System Administrator's Guide. When the Status History dialog is opened, it provides a graph of historical statistics:

Status History
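
As a rough check on the 24-hour default mentioned above: to the best of our knowledge, the retention window is governed in nifi.properties by nifi.components.status.repository.buffer.size (default 1440 snapshots) and nifi.components.status.snapshot.frequency (default 1 min). These names and defaults are recalled from the System Administrator's Guide and should be verified against your NiFi version; the helper function below is purely illustrative.

```python
def status_history_window_hours(buffer_size, snapshot_frequency_minutes):
    """Hours of history kept = number of snapshots x minutes per snapshot / 60."""
    return buffer_size * snapshot_frequency_minutes / 60


# Assumed defaults: 1440 snapshots taken once per minute.
print(status_history_window_hours(1440, 1))
```

Under these assumptions, halving the snapshot frequency to one snapshot every two minutes would double the window without changing the buffer size.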

The left-hand side of the dialog provides information about the component that the stats are for, as well as a textual representation of the statistics being graphed. The following information is provided on the left-hand side:

  • Id: The ID of the component for which the stats are being shown.
  • Group Id: The ID of the Process Group in which the component resides.
  • Name: The Name of the Component for which the stats are being shown.
  • Component-Specific Entries: Information is shown for each different type of component. For example, for a Processor, the type of Processor is displayed. For a Connection, the source and destination names and IDs are shown.
  • Start: The earliest time shown on the graph.
  • End: The latest time shown on the graph.
  • Min/Max/Mean: The minimum, maximum, and mean (arithmetic mean, or average) values are shown. These values are based only on the range of time selected, if any time range is selected. If this instance of NiFi is clustered, these values are shown for the cluster as a whole, as well as each individual node. In a clustered environment, each node is shown in a different color. This also serves as the graph’s legend, showing the color of each node that is shown in the graph. Hovering the mouse over the Cluster or one of the nodes in the legend will also make the corresponding node bold in the graph.
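
The Min/Max/Mean values, and the way they are recalculated for a selected time range, can be sketched as below. The sample data and the `summarize` function are hypothetical; the sketch only shows the arithmetic, not how NiFi stores or queries status snapshots.

```python
from statistics import mean


def summarize(samples, start=None, end=None):
    """Return (min, max, mean) of the values whose timestamp is in range.

    `samples` is a list of (timestamp, value) pairs. With no range given,
    every sample is used. Returns None if nothing falls in the range.
    """
    selected = [
        value
        for timestamp, value in samples
        if (start is None or timestamp >= start)
        and (end is None or timestamp <= end)
    ]
    if not selected:
        return None
    return min(selected), max(selected), mean(selected)


# Hypothetical per-minute "FlowFiles In" snapshots: (seconds, value).
samples = [(0, 10), (60, 20), (120, 60), (180, 30)]
print(summarize(samples))            # whole graph
print(summarize(samples, 60, 120))   # after selecting a time range
```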

The right-hand side of the dialog provides a drop-down list of the different types of metrics to render in the graphs below. The top graph is larger so as to provide an easier-to-read rendering of the information. In the bottom-right corner of this graph is a small handle ( Resize ) that can be dragged to resize the graph. The blank areas of the dialog can also be dragged around to move the entire dialog.

The bottom graph is much shorter and provides the ability to select a time range. Selecting a time range here will cause the top graph to show only the time range selected, but in a more detailed manner. Additionally, this will cause the Min/Max/Mean values on the left-hand side to be recalculated. Once a selection has been created by dragging a rectangle over the graph, double-clicking on the selected portion will cause the selection to fully expand in the vertical direction (i.e., it will select all values in this time range). Clicking on the bottom graph without dragging will remove the selection.


Versioning a DataFlow

When NiFi is connected to a NiFi Registry, dataflows can be version controlled on the process group level. For more information about NiFi Registry usage and configuration, see the documentation at https://nifi.apache.org/docs/nifi-registry-docs/index.html.

Connecting to a NiFi Registry

To connect NiFi to a Registry, select Controller Settings from the Global Menu.

Global Menu - Controller Settings

This displays the NiFi Settings window. Select the Registry Clients tab and click the "+" button in the upper-right corner to register a new Registry client.

这将显示“NiFi设置”窗口。选择“注册表客户端”选项卡,然后单击右上角的“+”按钮以注册新的注册表客户端。

Registry Clients Tab

In the Add Registry Client window, provide a name and URL.

在“添加注册表客户端”窗口中,提供名称和URL。

Add Registry Client Dialog

Click "Add" to complete the registration.

单击“添加”以完成注册。

Registry Client Added

Versioned flows are stored and organized in registry buckets. Bucket Policies and Special Privileges configured by the registry administrator determine which buckets a user can import versioned flows from and which buckets a user can save versioned flows to. Information on Bucket Policies and Special Privileges can be found in the NiFi Registry User Guide (https://nifi.apache.org/docs/nifi-registry-docs/html/user-guide.html).

版本化流程在注册表桶中存储和组织。 注册管理员配置的存储桶策略和特权限定了用户可以从中导入版本化流的哪些存储桶以及用户可以将版本化流存储到哪些存储桶。 有关存储桶策略和特权的信息可在NiFi注册表用户指南(https://nifi.apache.org/docs/nifi-registry-docs/html/user-guide.html)中找到。

Version States(版本状态)

Versioned process groups exist in the following states:

版本化的进程组存在以下状态:

  • Up to date: The flow version is the latest.
  • Up to date: 流程版本是最新版本。
  • Locally modified: Local changes have been made.
  • Locally modified: 已经进行了本地更改。
  • Stale: A newer version of the flow is available.
  • Stale: 可以使用更新版本的流程。
  • Locally modified and stale: Local changes have been made and a newer version of the flow is available.
  • Locally modified and stale: 已经进行了本地更改,并且可以使用更新版本的流程。
  • Sync failure: Unable to synchronize the flow with the registry.
  • Sync failure: 无法将流与注册表同步。
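
The five states can be read as a combination of three facts about a process group. The following Python sketch makes that reading explicit; the function name and the precedence given to a sync failure are assumptions for illustration, not a description of NiFi's internals.

```python
def version_state(locally_modified, newer_version_available, sync_failed=False):
    """Map three boolean facts onto the state names listed above."""
    if sync_failed:
        # Assumption: without a successful sync, the other facts are not reliable.
        return "Sync failure"
    if locally_modified and newer_version_available:
        return "Locally modified and stale"
    if locally_modified:
        return "Locally modified"
    if newer_version_available:
        return "Stale"
    return "Up to date"
```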

Version state information is displayed:

显示版本状态信息:

  1. Next to the process group name, for the versioned process group itself. Hovering over the state icon displays additional information about the versioned flow.

    在进程组名称旁边,显示该版本化进程组自身的状态。 将鼠标悬停在状态图标上会显示有关版本化流程的其他信息。

  2. At the bottom of a process group, for the versioned flows contained in the process group.

    在进程组的底部,显示该进程组中包含的版本化流的状态。

  3. In the Status Bar at the top of the UI, for the versioned flows contained in the root process group.

    在UI顶部的状态栏中,显示根进程组中包含的版本化流的状态。

Version States Displayed

Version state information is also shown in the "Process Groups" tab of the Summary Page.

版本状态信息也显示在摘要页面的“过程组”选项卡中。

Version State in Summary Page

To see the most recent version states, it may be necessary to right-click on the NiFi canvas and select 'Refresh' from the context menu.

要查看最新版本状态,可能需要右键单击NiFi画布并从菜单中选择“刷新”。

Import a Versioned Flow(导入流程版本)

When a NiFi instance is connected to a registry, an "Import" link will appear in the Add Process Group dialog.

当NiFi实例连接到注册表时,“导入”链接将出现在“添加进程组”对话框中。

Import Process Group

Selecting the link will open the Import Version dialog.

选择链接将打开“导入版本”对话框。

Import Version Dialog

Connected registries will appear as options in the Registry drop-down menu. For the chosen Registry, buckets the user has access to will appear as options in the Bucket drop-down menu. The names of the flows in the chosen bucket will appear as options in the Name drop-down menu. Select the desired version of the flow to import and select "Import" for the dataflow to be placed on the canvas.

已连接的注册表将显示为“注册表”下拉菜单中的选项。 对于选定的注册表,用户有权访问的存储桶将显示为“存储桶”下拉菜单中的选项。 所选存储桶中的流的名称将显示为“名称”下拉菜单中的选项。 选择要导入的流的所需版本,然后为要放置在画布上的数据流选择“导入”。

Versioned Flow Imported

Since the version imported in this example is the latest version (MySQL CDC, Version 3), the state of the versioned process group is "Up to date" (Up To Date Icon). If the version imported had been an older version, the state would be "Stale" (Stale Icon).

由于此示例中导入的版本是最新版本(MySQL CDC,版本3),因此版本化进程组的状态为“最新”(Up To Date Icon)。 如果导入的版本是旧版本,则状态将为“Stale”(Stale Icon)。

Start Version Control(启动版本控制)

To place a process group under version control, right-click on the process group and in the context menu, select "Version→Start version control".

要将进程组置于版本控制之下,请右键单击进程组,然后在菜单中选择“版本→启动版本控制”。

Start Version Control

In the Save Flow Version window, select a Registry and Bucket and enter a Name for the Flow. If desired, add content for the Description and Comment fields.

在“保存流版本”窗口中,选择注册表和存储桶,并输入流的名称。如果需要,请为“描述”和“注释”字段添加内容。

Save Flow Version Dialog

Select Save and Version 1 of the flow is saved.

选择“保存”,流的版本1即被保存。

Versioned Process Group

As the first and latest version of the flow, the state of the versioned process group is "Up to date" (Up To Date Icon).

作为流程的第一个和最新版本,版本化进程组的状态为“最新”(Up To Date Icon)。

The root process group cannot be placed under version control.

根进程组不能置于版本控制之下。

Managing Local Changes(管理本地更改)

When changes are made to a versioned process group, the state of the component updates to "Locally modified" (Locally Modified Icon). The DFM can show, revert or commit the local changes. These options are available for selection in the context menu when right-clicking on the process group:

当对版本化进程组进行更改时,组件的状态将更新为“本地修改”(Locally Modified Icon)。 DFM可以显示、还原或提交本地更改。 右键单击进程组时,可以在上下文菜单中选择这些选项:

Local Changes PG Selected

or when right-clicking on the canvas inside the process group:

或者右键单击进程组内的画布:

Local Changes Inside PG

The following actions are not considered local changes:

以下操作不被视为本地更改:

  • disabling/enabling processors and controller services
  • 禁用/启用处理器和控制器服务
  • stopping/starting processors
  • 停止/启动处理器
  • modifying sensitive property values
  • 修改敏感属性值
  • modifying remote process group URLs
  • 修改远程进程组URL
  • updating a processor that was referencing a non-existent controller service to reference an externally available controller service
  • 更新引用不存在的控制器服务的处理器以引用外部可用的控制器服务
  • modifying variables
  • 修改变量

Variables do not support sensitive values and will be included when versioning a Process Group. See Variables in Versioned Flows for more information.

变量不支持敏感值,并且在对流程组进行版本控制时将包含变量。 有关更多信息,请参阅版本化流程中的变量。

Show Local Changes(显示本地更改)

The local changes made to a versioned process group can be viewed in the Show Local Changes dialog by selecting "Version→Show local changes" from the context menu.

通过从上下文菜单中选择“版本→显示本地更改”,可以在“显示本地更改”对话框中查看对版本化过程组所做的本地更改。

Show Local Changes Dialog

You can navigate to a component by selecting the "Go To" icon (Go To) in its row.

您可以通过在其行中选择“转到”图标(Go To)导航到组件。

As described in the Managing Local Changes section, there are exceptions to which actions are reviewable local changes. Additionally, multiple changes to the same property will only appear as one change in the list as the changes are determined by diffing the current state of the process group and the saved version of the process group noted in the Show Local Changes dialog.

如“管理本地更改”部分中所述,有些例外可以检查哪些操作是本地更改。 此外,对同一属性的多次更改将仅显示为列表中的一个更改,因为更改是通过区分进程组的当前状态和“显示本地更改”对话框中记录的进程组的已保存版本来确定的。
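
Because the list is produced by comparing two snapshots rather than by recording each edit, repeated edits to one property collapse into a single entry. A minimal Python sketch of that idea follows; the nested-dictionary snapshot format and the `local_changes` helper are hypothetical and chosen only for illustration.

```python
def local_changes(saved, current):
    """Compare two {component: {property: value}} snapshots.

    Returns one (component, property, saved_value, current_value) tuple per
    property whose value differs, regardless of how many edits led there.
    """
    changes = []
    for component in sorted(set(saved) | set(current)):
        before = saved.get(component, {})
        after = current.get(component, {})
        for prop in sorted(set(before) | set(after)):
            if before.get(prop) != after.get(prop):
                changes.append((component, prop, before.get(prop), after.get(prop)))
    return changes

saved = {"GetFile": {"Input Directory": "/in", "Batch Size": "10"}}
current = {"GetFile": {"Input Directory": "/in", "Batch Size": "10"}}

# Edit the same property three times; only the final value is compared.
for value in ("20", "30", "50"):
    current["GetFile"]["Batch Size"] = value

changes = local_changes(saved, current)
```

Here `changes` holds a single entry for "Batch Size", and setting the property back to its saved value would make the list empty again.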

Revert Local Changes(还原本地更改)

Revert the local changes made to a versioned process group by selecting "Version→Revert local changes" from the context menu. The Revert Local Changes dialog displays a list of the local changes for the DFM to review and consider prior to initiating the revert. Select "Revert" to remove all changes.

通过从菜单中选择“版本→还原本地更改”,还原对版本化进程组所做的本地更改。 “还原本地更改”对话框显示DFM在启动还原之前要查看和考虑的本地更改列表。 选择“还原”以删除所有更改。

Revert Local Changes Dialog

You can navigate to a component by selecting the "Go To" icon (Go To) in its row.

您可以通过选择其行中的“转到”图标(转到)来导航到组件。

As described in the Managing Local Changes section, there are exceptions to which actions are revertible local changes. Additionally, multiple changes to the same property will only appear as one change in the list as the changes are determined by diffing the current state of the process group and the saved version of the process group noted in the Revert Local Changes dialog.

如“管理本地更改”部分所述,并非所有操作都属于可还原的本地更改。 此外,对同一属性的多次更改将仅显示为列表中的一个更改,因为更改是通过比较进程组的当前状态和“还原本地更改”对话框中注明的进程组已保存版本来确定的。

Commit Local Changes(提交本地更改)

To commit and save a flow version, select "Version→Commit local changes" from the context menu. In the Save Flow Version dialog, add comments if desired and select "Save".

要提交和保存流版本,请从上下文菜单中选择“版本→提交本地更改”。 在“保存流版本”对话框中,根据需要添加注释,然后选择“保存”。

Save Flow Version Commit

Local changes cannot be committed if the version that has been modified is not the latest version. In this scenario, the version state is "Locally modified and stale" (Locally Modified and Stale).

如果已修改的版本不是最新版本,则无法提交本地更改。 在这种情况下,版本状态是“本地修改和陈旧”(Locally Modified and Stale)。

Change Version(更改版本)

To change the version of a flow, right-click on the versioned process group and select "Version→Change version".

要更改流的版本,请右键单击版本化的流程组,然后选择“版本→更改版本”。

Change Version

In the Change Version dialog, select the desired version and select "Change":

在“更改版本”对话框中,选择所需的版本并选择“更改”:

Change Version Dialog

The version of the flow is changed:

流的版本已更改:

Flow Version Changed

In the example shown, the versioned flow is upgraded from an older version to the latest version. However, a versioned flow can also be rolled back to an older version.

在所示示例中,版本化流程从较旧版本升级到较新版本。 但是,版本化流程也可以回滚到旧版本。

For "Change version" to be an available selection, local changes to the process group need to be reverted.

要使“更改版本”成为可用选择,需要还原对进程组的本地更改。
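
The rules stated in the sections above (local changes can be shown and reverted, committed only when the flow is not stale, and "Change version" requires that no local changes remain) can be summarized in a small Python sketch. The function is hypothetical and covers only the rules stated in this guide; the real context menu may contain further entries.

```python
def available_actions(locally_modified, stale):
    """List the version-related actions this guide describes for a given state."""
    actions = []
    if locally_modified:
        actions += ["Show local changes", "Revert local changes"]
        if not stale:
            # Committing requires the modified version to be the latest one.
            actions.append("Commit local changes")
    else:
        # "Change version" is only offered once local changes are reverted.
        actions.append("Change version")
    return actions
```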

Stop Version Control(停止版本控制)

To stop version control on a flow, right-click on the versioned process group and select "Version→Stop version control":

要停止对流的版本控制,请右键单击版本化的进程组,然后选择“版本→停止版本控制”:

Stop Version Control

In the Stop Version Control dialog, select "Disconnect".

在“停止版本控制”对话框中,选择“断开连接”。

Stop Version Control Dialog

The removal of the process group from version control is confirmed.

确认从版本控制中删除进程组。

Disconnect Confirmation Dialog

Version Control Stopped on Process Group

Nested Versioned Flows(嵌套版本化流程)

A versioned process group can contain other versioned process groups. However, local changes to a parent process group cannot be reverted or saved if it contains a child process group that also has local changes. The child process group must first be reverted or have its changes committed for those actions to be performed on the parent process group.

版本化进程组可以包含其他版本化进程组。 但是,如果父进程组包含也具有本地更改的子进程组,则无法还原或保存对父进程组的本地更改。 必须先还原子进程组的更改或提交其更改,才能对父进程组执行这些操作。
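
The nesting rule can be sketched as a recursive check. The `ProcessGroup` class below is a hypothetical stand-in, and extending the check to all descendants (rather than direct children only) is an assumption of this sketch.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessGroup:
    name: str
    versioned: bool = False
    locally_modified: bool = False
    children: list = field(default_factory=list)

def blocking_children(group):
    """Names of versioned descendants that still have local changes."""
    blocked = []
    for child in group.children:
        if child.versioned and child.locally_modified:
            blocked.append(child.name)
        blocked.extend(blocking_children(child))
    return blocked

def can_revert_or_save(group):
    """A parent can be reverted or saved only when no child blocks it."""
    return not blocking_children(group)

child = ProcessGroup("child", versioned=True, locally_modified=True)
parent = ProcessGroup("parent", versioned=True, locally_modified=True, children=[child])
```

In this example `can_revert_or_save(parent)` is false until the changes in `child` are reverted or committed.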

Variables in Versioned Flows(版本化流程中的变量)

Variables are included when a process group is placed under version control. If a versioned flow is imported that references a variable not defined in the versioned process group, the reference is maintained if the variable exists. If the referenced variable does not exist, a copy of the variable will be defined in the process group. To illustrate, assume the variable "RPG_Var" is defined in the root process group:

当进程组置于版本控制之下时,会包含变量。 如果导入的版本化流引用了未在版本化进程组中定义的变量,则在变量存在时保留引用。 如果引用的变量不存在,则将在进程组中定义变量的副本。 为了说明,假设变量“RPG_Var”在根进程组中定义:

Root Process Group Defined Variable

A process group PG1 is created:

创建进程组PG1:

PG1 Process Group

The GetFile processor in PG1 references the variable "RPG_Var":

PG1中的GetFile处理器引用变量“RPG_Var”:

PG1 References RPG Variable

PG1 is saved as a versioned flow:

PG1保存为版本化流程:

PG1 Versioned Flow

If PG1 versioned flow is imported into this same NiFi instance:

如果PG1版本化流程导入到同一个NiFi实例中:

PG1 Imported to Same NiFi

the added GetFile processor will also reference the "RPG_Var" variable that exists in the root process group:

添加的GetFile处理器还将引用根进程组中存在的“RPG_Var”变量:

Both PG1 Reference RPG Variable

If PG1 versioned flow is imported into a different NiFi instance where "RPG_Var" does not exist:

如果PG1版本化流程导入到不存在“RPG_Var”的不同NiFi实例中:

PG1 Imported to Different NiFi

A "RPG_Var" variable is created in the PG1 process group:

在PG1进程组中创建“RPG_Var”变量:

PG1 References PG Variable Copy
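
The walkthrough above can be condensed into a short Python sketch. All names here are hypothetical, including the `/data/in` value, and the sketch models only the rule stated in this section: keep the reference when the variable is visible in the target instance, otherwise define a copy in the imported process group.

```python
def variables_after_import(referenced, saved_values, group_variables, ancestor_variables):
    """Return the variables defined in the imported process group.

    referenced         -- variable names used by components in the flow
    saved_values       -- values recorded with the versioned flow
    group_variables    -- variables defined in the versioned process group itself
    ancestor_variables -- names visible from ancestor groups in the target NiFi
    """
    result = dict(group_variables)
    for name in referenced:
        if name in result or name in ancestor_variables:
            continue  # the existing variable is found, so the reference is kept
        result[name] = saved_values[name]  # otherwise a copy is defined locally
    return result

saved_values = {"RPG_Var": "/data/in"}

# Same NiFi instance: the root process group still defines RPG_Var.
same_instance = variables_after_import(["RPG_Var"], saved_values, {}, {"RPG_Var"})

# Different NiFi instance: RPG_Var does not exist anywhere.
other_instance = variables_after_import(["RPG_Var"], saved_values, {}, set())
```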

Restricted Components in Versioned Flows(版本化流程中受限制的组件)

To import a versioned flow or revert local changes in a versioned flow, a user must have access to all the components in the versioned flow. As such, it is recommended that restricted components are created at the root process group level if they are to be utilized in versioned flows. Let’s walk through some examples to illustrate the benefits of this configuration. Assume the following:

要导入版本化流程或还原版本化流程中的本地更改,用户必须能够访问版本化流程中的所有组件。 因此,如果要在版本化流程中使用受限组件,则建议在根进程组级别创建受限组件。 让我们通过一些示例来说明此配置的好处。 假设如下:

  • There are two users, "sys_admin" and "test_user" who have access to both view and modify the root process group.
  • 有两个用户“sys_admin”和“test_user”可以访问查看和修改根进程组。
  • "sys_admin" has access to all restricted components.
  • “sys_admin”可以访问所有受限制的组件。

Sys_admin Restricted Component Access Policy

  • "test_user" has access to restricted components requiring 'read filesystem' and 'write filesystem'.
  • “test_user”可以访问需要“读取文件系统”和“写入文件系统”的受限组件。

Test_user Restricted Component Read Filesystem

Test_user Restricted Component Write Filesystem

Restricted Controller Service Created in Root Process Group(根进程组中创建的受限制的控制器服务)

In this first example, sys_admin creates a KeytabCredentialsService controller service at the root process group level.

在第一个示例中,sys_admin在根进程组级别创建KeytabCredentialsService控制器服务。

KeytabCredentialsService Controller Service RPG Level

KeytabCredentialsService controller service is a restricted component that requires 'access keytab' permissions:

KeytabCredentialsService控制器服务是一个受限制的组件,需要“访问密钥表”权限:

KeytabCredentialsService Required Permissions

Sys_admin creates a process group ABC containing a flow with GetFile and PutHDFS processors:

Sys_admin使用GetFile和PutHDFS处理器创建一个包含流的进程组ABC:

Restricted Component Flow

GetFile processor is a restricted component that requires 'write filesystem' and 'read filesystem' permissions:

GetFile处理器是一个受限制的组件,需要“写文件系统”和“读取文件系统”权限:

GetFile Required Permissions

PutHDFS is a restricted component that requires 'write filesystem' permissions:

PutHDFS是一个受限制的组件,需要“写文件系统”权限:

PutHDFS Required Permissions

The PutHDFS processor is configured to use the root process group level KeytabCredentialsService controller service:

PutHDFS处理器配置为使用根进程组级别KeytabCredentialsService控制器服务:

PutHDFS Properties

Sys_admin saves the process group as a versioned flow:

Sys_admin将进程组保存为版本化流:

ABC Versioned Flow

Test_user changes the flow by removing the KeytabCredentialsService controller service:

Test_user通过删除KeytabCredentialsService控制器服务来更改流程:

PutHDFS No Kerberos CS

If test_user chooses to revert this change:

如果test_user选择还原此更改:

the revert is successful:

恢复成功:

Revert Local Changes Successful

Additionally, if test_user chooses to import the ABC versioned flow:

此外,如果test_user选择导入ABC版本化流程:

Test_user Import Flow

The import is successful:

导入成功:

Test_user Import Successful

Restricted Controller Service Created in Process Group(流程组中创建的受限制的控制器服务)

Now, consider a second scenario where the controller service is created on the process group level.

现在,考虑第二种情况,即在进程组级别创建控制器服务。

Sys_admin creates a process group XYZ:

Sys_admin创建一个进程组XYZ:

XYZ Process Group

Sys_admin creates a KeytabCredentialsService controller service at the process group level:

Sys_admin在进程组级别创建KeytabCredentialsService控制器服务:

KeytabCredentialsService Controller Service PG Level

The same GetFile and PutHDFS flow is created in the process group:

在进程组中创建相同的GetFile和PutHDFS流:

XYZ Versioned Flow

However, PutHDFS now references the process group level controller service:

但是,PutHDFS现在引用了进程组级控制器服务:

PutHDFS Properties

Sys_admin saves the process group as a versioned flow.

Sys_admin将进程组保存为版本化流。

Test_user changes the flow by removing the KeytabCredentialsService controller service. However, with this configuration, if test_user attempts to revert this change:

Test_user通过删除KeytabCredentialsService控制器服务来更改流。 但是,使用此配置,如果test_user尝试还原此更改:

Test_user Revert Local Changes

the revert is unsuccessful because test_user does not have the 'access keytab' permissions required by the KeytabCredentialsService controller service:

还原不成功,因为test_user没有KeytabCredentialsService控制器服务所需的“访问密钥表”权限:

Revert Local Changes Fails

Similarly, if test_user tries to import the XYZ versioned flow:

同样,如果test_user尝试导入XYZ版本化流程:

Test_user Import Flow

The import fails:

导入失败:

XYZ Import Fails
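
The difference between the two scenarios comes down to which restricted components are part of the versioned flow. The Python sketch below restates the examples; the permission table is copied from the text above, while the `missing_permissions` helper is a hypothetical illustration and not NiFi's authorizer.

```python
# Required permissions as stated in the examples above.
REQUIRED = {
    "GetFile": {"read filesystem", "write filesystem"},
    "PutHDFS": {"write filesystem"},
    "KeytabCredentialsService": {"access keytab"},
}

def missing_permissions(flow_components, user_permissions):
    """Permissions a user lacks for the components contained in a versioned flow."""
    needed = set().union(*(REQUIRED[name] for name in flow_components))
    return needed - set(user_permissions)

test_user = {"read filesystem", "write filesystem"}

# ABC: the controller service lives in the root process group, outside the flow.
abc_flow = ["GetFile", "PutHDFS"]

# XYZ: the controller service is created inside the versioned process group.
xyz_flow = ["GetFile", "PutHDFS", "KeytabCredentialsService"]
```

An empty result for `abc_flow` corresponds to the successful revert and import, and the non-empty result for `xyz_flow` corresponds to the failures shown above.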

Templates(模板)

DFMs have the ability to build very large and complex DataFlows using NiFi. This is achieved by using the basic components: Processor, Funnel, Input/Output Port, Process Group, and Remote Process Group. These can be thought of as the most basic building blocks for constructing a DataFlow. At times, though, using these small building blocks can become tedious if the same logic needs to be repeated several times.

DFM能够使用NiFi构建非常大且复杂的DataFlow。 这是通过使用基本组件实现的:处理器,漏斗,输入/输出端口,进程组和远程进程组。 这些可以被认为是构建DataFlow的最基本构建块。 但是,有时候,如果需要重复多次相同的逻辑,使用这些小的构建块会变得乏味。

To solve this issue, NiFi provides the concept of a Template. A Template is a way of combining these basic building blocks into larger building blocks. Once a DataFlow has been created, parts of it can be formed into a Template. This Template can then be dragged onto the canvas, or can be exported as an XML file and shared with others. Templates received from others can then be imported into an instance of NiFi and dragged onto the canvas.

为了解决这个问题,NiFi提供了模板的概念。 模板是将这些基本构建块组合成更大的构建块的一种方式。 创建DataFlow后,可以将其中的一部分组成模板。 然后可以将此模板拖到画布上,也可以将其导出为XML文件并与其他人共享。 然后可以将从其他人处接收的模板导入NiFi实例并拖动到画布上。

Creating a Template(创建模板)

To create a Template, select the components that are to be a part of the template, and then click the "Create Template" ( Create Template ) button in the Operate Palette (See NiFi User Interface for more information on the Operate Palette).

要创建模板,请选择要作为模板一部分的组件,然后单击操作面板中的“创建模板”(Create Template)按钮(有关操作面板的更多信息,请参阅NiFi用户界面)。

Clicking this button without selecting anything will create a Template that contains all of the contents of the current Process Group. This means that creating a Template with nothing selected while on the Root Process Group will create a single Template that contains the entire flow.

单击此按钮而不选择任何内容将创建一个包含当前进程组的所有内容的模板。 这意味着在根进程组上创建一个没有选择任何内容的模板将创建一个包含整个流的模板。

After clicking this button, the user is prompted to provide a name and an optional description for the template. Each template must have a unique name. After entering the name and optional description, clicking the Create button will generate the template and notify the user that the template was successfully created, or provide an appropriate error message if unable to create the template for some reason.

单击此按钮后,将提示用户提供模板的名称和可选说明。 每个模板都必须具有唯一的名称。 输入名称和可选说明后,单击“创建”按钮将生成模板并通知用户模板已成功创建,或者如果由于某种原因无法创建模板,则提供相应的错误消息。

It is important to note that if any Processor that is Templated has a sensitive property (such as a password), the value of that sensitive property is not included in the Template. As a result, when dragging the Template onto the canvas, newly created Processors may not be valid if they are missing values for their sensitive properties. Additionally, any Connection that was selected when making the Template is not included in the Template if either the source or the destination of the Connection is not also included in the Template.

请务必注意,如果任何Templated处理器具有敏感属性(例如密码),则该敏感属性的值不会包含在模板中。 因此,在将模板拖动到画布上时,如果新创建的处理器缺少其敏感属性的值,则它们可能无效。 此外,如果连接的源或目标未包含在模板中,则在制作模板时选择的任何连接都不包含在模板中。
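
Both exclusion rules can be illustrated with a small Python sketch. The data layout and the `build_template` helper are assumptions made for the example; representing an omitted sensitive value as `None` is a modeling choice, not the template file format.

```python
def build_template(processors, selected_names, selected_connections):
    """Build a simplified template from the selected components.

    processors           -- {name: {"properties": {...}, "sensitive": {prop, ...}}}
    selected_names       -- names of the selected processors
    selected_connections -- selected (source, destination) pairs
    """
    template_processors = {}
    for name in selected_names:
        processor = processors[name]
        template_processors[name] = {
            # Sensitive values are left out; None marks the missing value.
            key: (None if key in processor["sensitive"] else value)
            for key, value in processor["properties"].items()
        }
    # A connection is kept only if both of its ends are in the template.
    template_connections = [
        (source, destination)
        for source, destination in selected_connections
        if source in template_processors and destination in template_processors
    ]
    return {"processors": template_processors, "connections": template_connections}

processors = {
    "GetSFTP": {"properties": {"Hostname": "example.org", "Password": "secret"},
                "sensitive": {"Password"}},
    "PutFile": {"properties": {"Directory": "/out"}, "sensitive": set()},
    "LogAttribute": {"properties": {}, "sensitive": set()},
}
template = build_template(
    processors,
    selected_names=["GetSFTP", "PutFile"],
    selected_connections=[("GetSFTP", "PutFile"), ("PutFile", "LogAttribute")],
)
```

The connection to "LogAttribute" is dropped because its destination was not selected, and the instantiated "GetSFTP" processor would need its password entered again before it is valid.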

Importing a Template(导入模板)

After receiving a Template that has been exported from another NiFi, the first step needed to use the template is to import the template into this instance of NiFi. You may import templates into any Process Group where you have the appropriate authorization.

在收到从另一个NiFi导出的模板后,使用该模板所需的第一步是将模板导入到此NiFi实例中。 您可以将模板导入到具有相应授权的任何Process Group。

From the Operate Palette, click the "Upload Template" ( Upload Template ) button (see NiFi User Interface for more information on the Operate Palette). This will display the Upload Template dialog. Click the find icon and use the File Selection dialog to choose which template file to upload. Select the file and click Open. Clicking the "Upload" button will attempt to import the Template into this instance of NiFi. The Upload Template dialog will update to show "Success" or an error message if there was a problem importing the template.

从操作面板中,单击“上传模板”(Upload Template)按钮(有关操作面板的更多信息,请参阅NiFi用户界面)。 这将显示“上传模板”对话框。 单击查找图标并使用“文件选择”对话框选择要上传的模板文件。 选择文件,然后单击“打开”。 单击“上传”按钮将尝试将模板导入此NiFi实例。 “上传模板”对话框将更新为显示“成功”,如果导入模板时出现问题,则显示错误消息。

Instantiating a Template(实例化模板)

Once a Template has been created (see Creating a Template) or imported (see Importing a Template), it is ready to be instantiated, or added to the canvas. This is accomplished by dragging the Template icon ( Template ) from the Components Toolbar (see NiFi User Interface) onto the canvas.

创建模板(参见创建模板)或导入模板(参见导入模板)后,即可将其实例化,也就是添加到画布中。 这是通过将模板图标(Template)从组件工具栏(参见NiFi用户界面)拖动到画布上来完成的。

This will present a dialog to choose which Template to add to the canvas. After choosing the Template to add, simply click the "Add" button. The Template will be added to the canvas with the upper-left-hand side of the Template being placed wherever the user dropped the Template icon.

这将显示一个对话框,用于选择要添加到画布的模板。 选择要添加的模板后,只需单击“添加”按钮即可。 模板将添加到画布中,模板的左上角放置在用户放置模板图标的任何位置。

This leaves the contents of the newly instantiated Template selected. If there was a mistake, and this Template is no longer wanted, it may be deleted.

这使得新实例化的模板的内容保持选中状态。 如果出现错误,并且不再需要此模板,可以将其删除。

Managing Templates(管理模板)

One of the most powerful features of NiFi Templates is the ability to easily export a Template to an XML file and to import a Template that has already been exported. This provides a very simple mechanism for sharing parts of a DataFlow with others. You can select Templates from the Global Menu (see NiFi User Interface) to open a dialog that displays all of the Templates that are currently available, filter the templates to see only those of interest, export, and delete Templates.

NiFi模板最强大的功能之一是能够轻松地将模板导出到XML文件并导入已导出的模板。 这提供了一种非常简单的机制,用于与其他人共享部分DataFlow。 您可以从全局菜单中选择模板(参见NiFi用户界面)以打开一个对话框,该对话框显示当前可用的所有模板,并可过滤模板以仅查看感兴趣的模板,以及导出和删除模板。

Exporting a Template(导出模板)

Once a Template has been created, it can be shared with others in the Template Management page. To export a Template, locate the Template in the table. The Filter in the top-right corner can be used to help find the appropriate Template if several are available. Then click the Export or Download button ( Export ). This will download the template as an XML file to your computer. This XML file can then be sent to others and imported into other instances of NiFi (see Importing a Template).

创建模板后,可以在“模板管理”页面中与其他人共享。 要导出模板,请在表中找到模板。 如果有几个可用的话,右上角的过滤器可用于帮助查找相应的模板。 然后单击“导出”或“下载”按钮(Export )。 这会将模板作为XML文件下载到您的计算机。 然后可以将此XML文件发送给其他人并导入到其他NiFi实例中(请参阅导入模板)。

Removing a Template(删除模板)

Once it is decided that a Template is no longer needed, it can be easily removed from the Template Management page. To delete a Template, locate it in the table (the Filter in the top-right corner may be used to find the appropriate Template if several are available) and click the Delete button ( Delete ). This will prompt for confirmation. After confirming the deletion, the Template will be removed from this table and will no longer be available to add to the canvas.

一旦确定不再需要模板,就可以从模板管理页面轻松删除它。 要删除模板,请在表格中找到它(如果有几个可用,可以使用右上角的过滤器查找相应的模板),然后单击删除按钮( Delete )。 这将提示确认。 确认删除后,模板将从此表中删除,不再可用于添加到画布。


Data Provenance(数据来源)

While monitoring a dataflow, users often need a way to determine what happened to a particular data object (FlowFile). NiFi’s Data Provenance page provides that information. Because NiFi records and indexes data provenance details as objects flow through the system, users may perform searches, conduct troubleshooting and evaluate things like dataflow compliance and optimization in real time. By default, NiFi updates this information every five minutes, but that is configurable.

在监视数据流时,用户通常需要一种方法来确定特定数据对象(FlowFile)的发生情况。 NiFi的Data Provenance页面提供了该信息。 由于NiFi在对象流经系统时记录和索引数据来源详细信息,因此用户可以执行搜索,进行故障排除以及实时评估数据流合规性和优化等内容。 默认情况下,NiFi每五分钟更新一次此信息,但这是可配置的。

To access the Data Provenance page, select Data Provenance from the Global Menu. Selecting this option opens a dialog window that allows the user to see the most recent Data Provenance information available, search the information for specific items, and filter the search results. It is also possible to open additional dialog windows to see event details, replay data at any point within the dataflow, and see a graphical representation of the data’s lineage, or path through the flow. (These features are described in depth below.)

要访问Data Provenance页面,请从Global Menu中选择Data Provenance。 单击此按钮将打开一个对话窗口,允许用户查看可用的最新数据源文件信息,搜索特定项目的信息,并过滤搜索结果。 还可以打开其他对话框窗口以查看事件详细信息,在数据流中的任何位置重放数据,以及查看数据的沿袭或流程路径的图形表示。 (这些功能将在下面详细介绍。)

When authorization is enabled, accessing Data Provenance information requires the 'query provenance' Global Policy as well as the 'view provenance' Component Policy for the component which generated the event. In addition, access to event details which include FlowFile attributes and content requires the 'view the data' Component Policy for the component which generated the event.

启用授权后,访问Data Provenance信息需要“查询出处”全局策略以及生成事件的组件的“查看出处”组件策略。 此外,访问包含FlowFile属性和内容的事件详细信息需要为生成事件的组件“查看数据”组件策略。

Provenance Table

Provenance Events(起源事件)

Each point in a dataflow where a FlowFile is processed in some way is considered a 'provenance event'. Various types of provenance events occur, depending on the dataflow design. For example, when data is brought into the flow, a RECEIVE event occurs, and when data is sent out of the flow, a SEND event occurs. Other types of processing events may occur, such as if the data is cloned (CLONE event), routed (ROUTE event), modified (CONTENT_MODIFIED or ATTRIBUTES_MODIFIED event), split (FORK event), combined with other data objects (JOIN event), and ultimately removed from the flow (DROP event).

以某种方式处理FlowFile的数据流中的每个点都被视为“起源事件”。 根据数据流设计,会发生各种类型的起源事件。 例如,当数据进入流程时,会发生RECEIVE事件,并且当数据从流程中发出时,会发生SEND事件。 可能会发生其他类型的处理事件,例如克隆数据(CLONE事件),路由(ROUTE事件),修改(CONTENT_MODIFIED或ATTRIBUTES_MODIFIED事件),拆分(FORK事件),与其他数据对象(JOIN事件)相结合, 并最终从流程中删除(DROP事件)。

The provenance event types are:

起源事件类型是:

Provenance Event Description
ADDINFO Indicates a provenance event when additional information such as a new linkage to a new URI or UUID is added
ATTRIBUTES_MODIFIED Indicates that a FlowFile’s attributes were modified in some way
CLONE Indicates that a FlowFile is an exact duplicate of its parent FlowFile
CONTENT_MODIFIED Indicates that a FlowFile’s content was modified in some way
CREATE Indicates that a FlowFile was generated from data that was not received from a remote system or external process
DOWNLOAD Indicates that the contents of a FlowFile were downloaded by a user or external entity
DROP Indicates a provenance event for the conclusion of an object’s life for some reason other than object expiration
EXPIRE Indicates a provenance event for the conclusion of an object’s life due to the object not being processed in a timely manner
FETCH Indicates that the contents of a FlowFile were overwritten using the contents of some external resource
FORK Indicates that one or more FlowFiles were derived from a parent FlowFile
JOIN Indicates that a single FlowFile is derived from joining together multiple parent FlowFiles
RECEIVE Indicates a provenance event for receiving data from an external process
REPLAY Indicates a provenance event for replaying a FlowFile
ROUTE Indicates that a FlowFile was routed to a specified relationship and provides information about why the FlowFile was routed to this relationship
SEND Indicates a provenance event for sending data to an external process
UNKNOWN Indicates that the type of provenance event is unknown because the user who is attempting to access the event is not authorized to know the type
起源事件 描述
ADDINFO 表示添加了附加信息(例如指向新URI或UUID的新链接)时的起源事件
ATTRIBUTES_MODIFIED 表示以某种方式修改了FlowFile的属性
CLONE 表示FlowFile是其父FlowFile的精确副本
CONTENT_MODIFIED 表示以某种方式修改了FlowFile的内容
CREATE 表示FlowFile是由并非从远程系统或外部进程接收的数据生成的
DOWNLOAD 表示用户或外部实体下载了FlowFile的内容
DROP 表示由于对象到期之外的某些原因导致对象生命结束的起源事件
EXPIRE 表示由于未及时处理对象而导致对象生命结束的起源事件
FETCH 表示使用某些外部资源的内容覆盖了FlowFile的内容
FORK 表示一个或多个FlowFile是从父FlowFile派生的
JOIN 表示单个FlowFile是通过将多个父FlowFile连接在一起而派生的
RECEIVE 表示从外部进程接收数据的起源事件
REPLAY 表示重放FlowFile的起源事件
ROUTE 表示FlowFile已路由到指定的关系,并提供有关FlowFile路由到此关系的原因的信息
SEND 表示将数据发送到外部进程的起源事件
UNKNOWN 表示起源事件的类型未知,因为尝试访问该事件的用户无权知道该类型

Searching for Events(事件搜索)

One of the most common tasks performed in the Data Provenance page is a search for a given FlowFile to determine what happened to it. To do this, click the Search button in the upper-right corner of the Data Provenance page. This opens a dialog window with parameters that the user can define for the search. The parameters include the processing event of interest, distinguishing characteristics about the FlowFile or the component that produced the event, the timeframe within which to search, and the size of the FlowFile.

在Data Provenance页面中执行的最常见任务之一是搜索给定的FlowFile以确定它发生了什么。 为此,请单击Data Provenance页面右上角的“Search”按钮。 这将打开一个对话框窗口,其中包含用户可以为搜索定义的参数。 参数包括感兴趣的处理事件,区分FlowFile或产生事件的组件的特征,搜索的时间范围以及FlowFile的大小。

Search Events

For example, to determine if a particular FlowFile was received, search for an Event Type of "RECEIVE" and include an identifier for the FlowFile, such as its uuid or filename. The asterisk (*) may be used as a wildcard for any number of characters. So, to determine whether a FlowFile with "ABC" anywhere in its filename was received at any time on Jan. 6, 2015, the search shown in the following image could be performed:

例如,要确定是否收到特定的FlowFile,请搜索“RECEIVE”的事件类型,并包含FlowFile的标识符,例如其uuid或文件名。 星号(*)可用作任意数量字符的通配符。 因此,要确定在2015年1月6日的任何时间是否收到了文件名中任何位置带有“ABC”的FlowFile,可以执行下图所示的搜索:

Search for RECEIVE Event
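
The same search can be expressed over a list of event records. This Python sketch uses hypothetical event dictionaries and the standard-library `fnmatchcase` function to stand in for the wildcard; note that `fnmatchcase` also gives special meaning to `?` and `[...]`, which this guide does not describe for NiFi searches.

```python
from datetime import datetime
from fnmatch import fnmatchcase

def search(events, event_type, filename_pattern, start, end):
    """Return events of one type whose filename matches within [start, end]."""
    return [event for event in events
            if event["type"] == event_type
            and fnmatchcase(event["filename"], filename_pattern)
            and start <= event["time"] <= end]

events = [
    {"type": "RECEIVE", "filename": "xABCy.txt", "time": datetime(2015, 1, 6, 9, 30)},
    {"type": "RECEIVE", "filename": "ABC.csv", "time": datetime(2015, 1, 7, 0, 5)},
    {"type": "SEND", "filename": "ABC.csv", "time": datetime(2015, 1, 6, 12, 0)},
    {"type": "RECEIVE", "filename": "other.txt", "time": datetime(2015, 1, 6, 15, 0)},
]

# RECEIVE events with "ABC" anywhere in the filename, at any time on Jan. 6, 2015.
hits = search(events, "RECEIVE", "*ABC*",
              datetime(2015, 1, 6, 0, 0, 0), datetime(2015, 1, 6, 23, 59, 59))
```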

Details of an Event(事件详情)

In the far-left column of the Data Provenance page, there is a View Details icon for each event (Details). Clicking this button opens a dialog window with three tabs: Details, Attributes, and Content.

在Data Provenance页面的最左侧列中,每个事件都有一个View Details图标(Details)。 单击此按钮将打开一个对话框窗口,其中包含三个选项卡:详细信息,属性和内容。

Event Details

The Details tab shows various details about the event, such as when it occurred, what type of event it was, and the component that produced the event. The information that is displayed will vary according to the event type. This tab also shows information about the FlowFile that was processed. In addition to the FlowFile’s UUID, which is displayed on the left side of the Details tab, the UUIDs of any parent or children FlowFiles that are related to that FlowFile are displayed on the right side of the Details tab.

“详细信息”选项卡显示有关事件的各种详细信息,例如事件发生的时间,事件的类型以及生成事件的组件。 显示的信息将根据事件类型而有所不同。 此选项卡还显示有关已处理的FlowFile的信息。 除了显示在“详细信息”选项卡左侧的FlowFile的UUID之外,与该FlowFile相关的任何父FlowFile或子FlowFile的UUID会显示在“详细信息”选项卡的右侧。

The Attributes tab shows the attributes that exist on the FlowFile as of that point in the flow. In order to see only the attributes that were modified as a result of the processing event, the user may select the checkbox next to "Only show modified" in the upper-right corner of the Attributes tab.

“属性”选项卡显示流程中该点上FlowFile中存在的属性。 为了仅查看由于处理事件而修改的属性,用户可以选择“属性”选项卡右上角“仅显示已修改”旁边的复选框。

Event Attributes

Replaying a FlowFile(重放FlowFile)

A DFM may need to inspect a FlowFile’s content at some point in the dataflow to ensure that it is being processed as expected. If it is not being processed properly, the DFM may need to adjust the dataflow and replay the FlowFile. The Content tab of the View Details dialog window is where the DFM can do both. The Content tab shows information about the FlowFile’s content, such as its location in the Content Repository and its size. From here, the user may click the Download button to download a copy of the FlowFile’s content as it existed at this point in the flow, or click the Submit button to replay the FlowFile at this point in the flow. Upon clicking Submit, the FlowFile is sent to the connection feeding the component that produced this processing event.

Event Content

Viewing FlowFile Lineage

It is often useful to see a graphical representation of the lineage, or path, that a FlowFile took within the dataflow. To see a FlowFile’s lineage, click the "Show Lineage" icon in the far-right column of the Data Provenance table. This opens a graph displaying the FlowFile and the various processing events that have occurred. The selected event is highlighted in red. Right-click or double-click any event to see that event’s details (see Details of an Event). To see how the lineage evolved over time, click the slider at the bottom-left of the window and move it to the left to see the state of the lineage at earlier stages in the dataflow.

Lineage Graph

Find Parents

Sometimes, a user may need to track down the original FlowFile that another FlowFile was spawned from. For example, when a FORK or CLONE event occurs, NiFi keeps track of the parent FlowFile that produced other FlowFiles, and it is possible to find that parent FlowFile in the Lineage. Right-click on the event in the lineage graph and select "Find parents" from the context menu.

Find Parents

Once "Find parents" is selected, the graph is re-drawn to show the parent FlowFile and its lineage as well as the child and its lineage.

Parent Found

Expanding an Event

In the same way that it is useful to find a parent FlowFile, the user may also want to determine what children were spawned from a given FlowFile. To do this, right-click on the event in the lineage graph and select "Expand" from the context menu.

Expand Event

Once "Expand" is selected, the graph is redrawn to show the children and their lineage.
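
The "Find parents" and "Expand" actions can be thought of as two lookups on a parent/child index. The following is a minimal conceptual sketch, not NiFi's internal data model; the LineageGraph class and the UUID strings are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class LineageGraph:
    """Toy parent/child index keyed by FlowFile UUID (not NiFi's data model)."""

    def __init__(self):
        self._parents = defaultdict(set)
        self._children = defaultdict(set)

    def record_event(self, parent_uuid, child_uuids):
        """Record a FORK or CLONE style event linking one parent to its children."""
        for child in child_uuids:
            self._children[parent_uuid].add(child)
            self._parents[child].add(parent_uuid)

    def find_parents(self, uuid):
        """Analogue of "Find parents": the FlowFiles this one was spawned from."""
        return sorted(self._parents.get(uuid, ()))

    def expand(self, uuid):
        """Analogue of "Expand": the FlowFiles spawned from this one."""
        return sorted(self._children.get(uuid, ()))


graph = LineageGraph()
graph.record_event("parent-1", ["child-a", "child-b"])
print(graph.find_parents("child-a"))  # ['parent-1']
print(graph.expand("parent-1"))       # ['child-a', 'child-b']
```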

Expanded Events

Write Ahead Provenance Repository

By default, the Provenance Repository is implemented in a Persistent Provenance configuration. In Apache NiFi 1.2.0, the Write Ahead configuration was introduced to provide the same capabilities as Persistent Provenance, but with far better performance. Migrating to the Write Ahead configuration is straightforward: in the nifi.properties file, change the nifi.provenance.repository.implementation system property from its default value of org.apache.nifi.provenance.PersistentProvenanceRepository to org.apache.nifi.provenance.WriteAheadProvenanceRepository, then restart NiFi.
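
As an illustration of the property change described above, here is a minimal sketch that rewrites a single key in properties-style text. The set_property helper is hypothetical and is not part of NiFi; editing nifi.properties by hand achieves the same result.

```python
import re

IMPL_KEY = "nifi.provenance.repository.implementation"
WRITE_AHEAD = "org.apache.nifi.provenance.WriteAheadProvenanceRepository"


def set_property(properties_text: str, key: str, value: str) -> str:
    """Return properties_text with `key` set to `value`.

    Replaces the first uncommented `key=...` line, or appends one if absent.
    """
    pattern = re.compile(r"^[ \t]*" + re.escape(key) + r"[ \t]*=.*$", re.MULTILINE)
    new_line = f"{key}={value}"
    if pattern.search(properties_text):
        return pattern.sub(lambda _match: new_line, properties_text, count=1)
    separator = "" if properties_text.endswith("\n") or not properties_text else "\n"
    return properties_text + separator + new_line + "\n"


sample = (
    "# Provenance Repository Properties\n"
    "nifi.provenance.repository.implementation="
    "org.apache.nifi.provenance.PersistentProvenanceRepository\n"
)
updated = set_property(sample, IMPL_KEY, WRITE_AHEAD)
print(updated)
```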

However, to increase the chances of a successful migration, consider the following factors and recommended actions.

Backwards Compatibility

The WriteAheadProvenanceRepository can use the Provenance data stored by the PersistentProvenanceRepository. However, the PersistentProvenanceRepository may not be able to read the data written by the WriteAheadProvenanceRepository. Therefore, once the Provenance Repository is changed to use the WriteAheadProvenanceRepository, it cannot be changed back to the PersistentProvenanceRepository without first deleting the data in the Provenance Repository. Before changing the implementation to Write Ahead, it is therefore recommended that you ensure your version of NiFi is stable, in case an issue arises that requires rolling back to a previous version of NiFi that did not support the WriteAheadProvenanceRepository.

Older Existing NiFi Version

If you are upgrading from an older version of NiFi to 1.2.0 or later, it is recommended that you do not change the provenance configuration to Write Ahead until you confirm your flows and environment are stable in 1.2.0 first. This reduces the number of variables in your upgrade and can simplify the debugging process if any issues arise.

Bootstrap.conf

While better performance is achieved with the G1 garbage collector, Java 8 bugs may surface more frequently in the Write Ahead configuration. It is recommended that the following line be commented out in the bootstrap.conf file in the conf directory:

java.arg.13=-XX:+UseG1GC
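
For illustration only, the edit can also be scripted. The sketch below comments out any active java.arg line that enables G1; the comment_out_g1 function is a hypothetical helper, and it matches on the flag rather than the argument number because the number (13 above) may differ between installations.

```python
def comment_out_g1(bootstrap_text: str) -> str:
    """Prefix any active line that enables G1 (-XX:+UseG1GC) with '#'."""
    result = []
    for line in bootstrap_text.splitlines():
        stripped = line.strip()
        # Lines that already start with '#' do not start with "java.arg." and are left alone.
        if stripped.startswith("java.arg.") and stripped.endswith("-XX:+UseG1GC"):
            line = "#" + line
        result.append(line)
    return "\n".join(result) + "\n"


before = "java.arg.2=-Xms512m\njava.arg.13=-XX:+UseG1GC\n"
after = comment_out_g1(before)
print(after)
```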

System Properties

Many of the same system properties are supported by both the Persistent and Write Ahead configurations; however, the default values have been chosen for a Persistent Provenance configuration. The following exceptions and recommendations should be noted when changing to a Write Ahead configuration:

  • nifi.provenance.repository.journal.count is not relevant to a Write Ahead configuration.
  • nifi.provenance.repository.concurrent.merge.threads and nifi.provenance.repository.warm.cache.frequency are new properties. The default values of 2 for threads and blank for frequency (i.e. disabled) should remain for most installations.
  • Change the settings for nifi.provenance.repository.max.storage.time (default value of 24 hours) and nifi.provenance.repository.max.storage.size (default value of 1 GB) to values more suitable for your production environment.
  • Change nifi.provenance.repository.index.shard.size from the default value of 500 MB to 4 GB.
  • Change nifi.provenance.repository.index.threads from the default value of 2 to either 4 or 8, as the Write Ahead repository enables this to scale better.
  • If processing a high volume of events, change nifi.provenance.repository.rollover.time from the default of 30 secs to 1 min and nifi.provenance.repository.rollover.size from the default of 100 MB to 1 GB.
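
The recommendations above can be turned into a simple checklist. The sketch below is a hypothetical helper, not a NiFi tool: it parses properties-style text and reports which of the four properties with concrete recommended values still differ. It treats the rollover values as always recommended, although the list applies them only to high-volume flows, and it compares values as exact strings.

```python
# Values recommended in the list above for a Write Ahead configuration.
ACCEPTABLE = {
    "nifi.provenance.repository.index.shard.size": {"4 GB"},
    "nifi.provenance.repository.index.threads": {"4", "8"},
    "nifi.provenance.repository.rollover.time": {"1 min"},   # high volume only
    "nifi.provenance.repository.rollover.size": {"1 GB"},    # high volume only
}


def parse_properties(text: str) -> dict:
    """Parse key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def settings_to_review(props: dict) -> dict:
    """Map each property whose value is not a recommended one to its current value."""
    return {
        key: props.get(key)
        for key, accepted in ACCEPTABLE.items()
        if props.get(key) not in accepted
    }


sample = """\
# defaults quoted in the list above
nifi.provenance.repository.index.shard.size=500 MB
nifi.provenance.repository.index.threads=2
nifi.provenance.repository.rollover.time=30 secs
nifi.provenance.repository.rollover.size=100 MB
"""
review = settings_to_review(parse_properties(sample))
for key, current in review.items():
    print(f"{key}: currently {current!r}")
```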

Once these property changes have been made, restart NiFi.

Note: Detailed descriptions for each of these properties can be found in System Properties.

Encrypted Provenance Considerations

The above migration recommendations for WriteAheadProvenanceRepository also apply to the encrypted version of the configuration, EncryptedWriteAheadProvenanceRepository.

The next section has more information about implementing an Encrypted Provenance Repository.

Encrypted Provenance Repository

While OS-level access control can offer some security over the provenance data written to the disk in a repository, there are scenarios where the data may be sensitive, compliance and regulatory requirements exist, or NiFi is running on hardware not under the direct control of the organization (cloud, etc.). In this case, the provenance repository allows for all data to be encrypted before being persisted to the disk.

Performance

The current implementation of the encrypted provenance repository intercepts the record writer and reader of WriteAheadProvenanceRepository, which offers significant performance improvements over the legacy PersistentProvenanceRepository, and uses the AES/GCM algorithm, which is fairly performant on commodity hardware. In most scenarios, the added cost will not be significant (unnoticeable on a flow with hundreds of provenance events per second, moderately noticeable on a flow with thousands to tens of thousands of events per second). However, administrators should perform their own risk assessment and performance analysis and decide how to move forward. Switching back and forth between encrypted and unencrypted implementations is not recommended at this time.

What is it?

The EncryptedWriteAheadProvenanceRepository is a new implementation of the provenance repository which encrypts all event record information before it is written to the repository. This allows for storage on systems where OS-level access controls are not sufficient to protect the data while still allowing querying and access to the data through the NiFi UI/API.

How does it work?

The WriteAheadProvenanceRepository was introduced in NiFi 1.2.0 and provided a refactored and much faster provenance repository implementation than the previous PersistentProvenanceRepository. The encrypted version wraps that implementation with a record writer and reader which encrypt and decrypt the serialized bytes respectively.

The fully qualified class org.apache.nifi.provenance.EncryptedWriteAheadProvenanceRepository is specified as the provenance repository implementation in nifi.properties as the value of nifi.provenance.repository.implementation. In addition, new properties must be populated to allow successful initialization.

StaticKeyProvider

The StaticKeyProvider implementation defines keys directly in nifi.properties. Individual keys are provided in hexadecimal encoding. The keys can also be encrypted like any other sensitive property in nifi.properties using the ./encrypt-config.sh tool in the NiFi Toolkit.

The following configuration section would result in a key provider with two available keys, "Key1" (active) and "AnotherKey".

nifi.provenance.repository.encryption.key.provider.implementation=org.apache.nifi.security.kms.StaticKeyProvider
nifi.provenance.repository.encryption.key.id=Key1
nifi.provenance.repository.encryption.key=0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210
nifi.provenance.repository.encryption.key.id.AnotherKey=0101010101010101010101010101010101010101010101010101010101010101
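
Because the keys are given in hexadecimal, a quick sanity check before restarting NiFi can catch a truncated or mistyped value. The sketch below is a hypothetical helper that assumes the key must be a valid AES key of 128, 192, or 256 bits; it is not part of NiFi or the NiFi Toolkit.

```python
VALID_KEY_BITS = {128, 192, 256}  # standard AES key sizes


def key_bits(hex_key: str) -> int:
    """Decode a hex-encoded key and return its size in bits.

    Raises ValueError if the text is not valid hex or is not an AES key size.
    """
    key = bytes.fromhex(hex_key)
    bits = len(key) * 8
    if bits not in VALID_KEY_BITS:
        raise ValueError(f"unexpected key size: {bits} bits")
    return bits


# The "Key1" value from the configuration above: 64 hex characters, i.e. 32 bytes.
print(key_bits("0123456789ABCDEFFEDCBA98765432100123456789ABCDEFFEDCBA9876543210"))  # 256
```
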

FileBasedKeyProvider

The FileBasedKeyProvider implementation reads from an encrypted definition file of the format:

key1=NGCpDpxBZNN0DBodz0p1SDbTjC2FG5kp1pCmdUKJlxxtcMSo6GC4fMlTyy1mPeKOxzLut3DRX+51j6PCO5SznA==
key2=GYxPbMMDbnraXs09eGJudAM5jTvVYp05XtImkAg4JY4rIbmHOiVUUI6OeOf7ZW+hH42jtPgNW9pSkkQ9HWY/vQ==
key3=SFe11xuz7J89Y/IQ7YbJPOL0/YKZRFL/VUxJgEHxxlXpd/8ELA7wwN59K1KTr3BURCcFP5YGmwrSKfr4OE4Vlg==
key4=kZprfcTSTH69UuOU3jMkZfrtiVR/eqWmmbdku3bQcUJ/+UToecNB5lzOVEMBChyEXppyXXC35Wa6GEXFK6PMKw==
key5=c6FzfnKm7UR7xqI2NFpZ+fEKBfSU7+1NvRw+XWQ9U39MONWqk5gvoyOCdFR1kUgeg46jrN5dGXk13sRqE0GETQ==

Each line defines a key ID followed by the Base64-encoded cipher text of a 16-byte IV and a wrapped AES-128, AES-192, or AES-256 key, depending on the JCE policies available. The individual keys are wrapped by AES/GCM encryption using the master key defined by nifi.bootstrap.sensitive.key in conf/bootstrap.conf.
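
To make the file layout concrete, the following sketch parses such a definition file into key IDs, IVs, and wrapped key bytes. It assumes the IV occupies the first 16 bytes of the decoded value, which the description suggests but does not state outright, and it does not unwrap the keys, since that requires the master key and an AES/GCM implementation. The function name parse_key_definitions is hypothetical.

```python
import base64

IV_LENGTH = 16  # bytes, per the description above


def parse_key_definitions(text: str) -> dict:
    """Map each key ID to (iv, wrapped_key_bytes) without decrypting anything."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split at the first '=' only; Base64 padding may add more '=' at the end.
        key_id, separator, encoded = line.partition("=")
        if not separator:
            continue
        raw = base64.b64decode(encoded)
        if len(raw) <= IV_LENGTH:
            raise ValueError(f"entry {key_id!r} is too short to hold an IV and a key")
        entries[key_id] = (raw[:IV_LENGTH], raw[IV_LENGTH:])
    return entries


# Synthetic entry for illustration: a 16-byte IV followed by 48 placeholder bytes.
demo_iv = bytes(range(16))
demo_wrapped = b"\xaa" * 48
demo_line = "key1=" + base64.b64encode(demo_iv + demo_wrapped).decode("ascii")
parsed = parse_key_definitions(demo_line)
print(len(parsed["key1"][0]), len(parsed["key1"][1]))  # 16 48
```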

Key Rotation

Simply update nifi.properties to reference a new key ID in nifi.provenance.repository.encryption.key.id. Previously encrypted events can still be decrypted as long as that key is still available in the key definition file or in nifi.provenance.repository.encryption.key.id.<OldKeyID>, because the key ID is serialized alongside the encrypted record.
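
The rotation behaviour described above amounts to keeping every key addressable by ID while only one ID is used for new writes. The sketch below is a conceptual model with a hypothetical KeyRing class; it is not NiFi's key provider API.

```python
class KeyRing:
    """Holds every known key by ID plus the ID used for new writes."""

    def __init__(self, keys, active_id):
        if active_id not in keys:
            raise KeyError(f"active key {active_id!r} is not defined")
        self._keys = dict(keys)
        self.active_id = active_id

    def rotate(self, new_id, new_key):
        """Add a key and make it active; older keys stay available for reads."""
        self._keys[new_id] = new_key
        self.active_id = new_id

    def key_for_write(self):
        """Return (key ID, key) for encrypting new records."""
        return self.active_id, self._keys[self.active_id]

    def key_for_read(self, key_id):
        """Look up the key by the ID stored alongside the encrypted record."""
        return self._keys[key_id]


ring = KeyRing({"Key1": bytes(32)}, active_id="Key1")
ring.rotate("Key2", bytes([1]) * 32)
print(ring.key_for_write()[0])  # Key2
```

Removing an old key from the ring would make records written under that ID unreadable, which mirrors the condition in the text that the old key must remain available.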

Writing and Reading Event Records

Once the repository is initialized, all provenance event record write operations are serialized according to the configured schema writer (EventIdFirstSchemaRecordWriter by default for WriteAheadProvenanceRepository) to a byte[]. Those bytes are then encrypted using an implementation of ProvenanceEventEncryptor (the only current implementation is AES/GCM/NoPadding) and the encryption metadata (keyId, algorithm, version, IV) is serialized and prepended. The complete byte[] is then written to the repository on disk as normal.

Encrypted provenance repository file on disk

On record read, the process is reversed. The encryption metadata is parsed and used to decrypt the serialized bytes, which are then deserialized into a ProvenanceEventRecord object. Delegating to the normal schema record writer/reader allows for "random access" (i.e. immediate seek without decryption of unnecessary records).
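
The write and read paths can be illustrated with a small framing sketch. The byte layout used here (a 4-byte length prefix followed by JSON metadata) is purely hypothetical and is not NiFi's on-disk format, and the encryption step itself is omitted: the ciphertext argument stands in for bytes already encrypted with AES/GCM/NoPadding.

```python
import json
import struct


def frame_record(key_id, iv, ciphertext, algorithm="AES/GCM/NoPadding", version="v1"):
    """Prepend serialized encryption metadata to already-encrypted bytes."""
    metadata = json.dumps(
        {"keyId": key_id, "algorithm": algorithm, "version": version, "iv": iv.hex()}
    ).encode("utf-8")
    # 4-byte big-endian length prefix so the reader knows where the metadata ends.
    return struct.pack(">I", len(metadata)) + metadata + ciphertext


def unframe_record(blob):
    """Reverse of frame_record: return (metadata dict, ciphertext bytes)."""
    (metadata_length,) = struct.unpack(">I", blob[:4])
    metadata = json.loads(blob[4:4 + metadata_length])
    metadata["iv"] = bytes.fromhex(metadata["iv"])
    return metadata, blob[4 + metadata_length:]


framed = frame_record("Key1", bytes(16), b"encrypted-bytes")
metadata, ciphertext = unframe_record(framed)
print(metadata["keyId"], ciphertext)  # Key1 b'encrypted-bytes'
```

Because each record carries its own key ID and IV, a reader can decrypt one record without touching its neighbours, which is what makes the seek-then-decrypt access pattern possible.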

Within the NiFi UI/API, there is no detectable difference between an encrypted and unencrypted provenance repository. The Provenance Query operations work as expected with no change to the process.

Potential Issues

Switching Implementations

When switching between implementation "families" (i.e. VolatileProvenanceRepository or PersistentProvenanceRepository to EncryptedWriteAheadProvenanceRepository), the existing repository must be cleared from the file system before starting NiFi. A terminal command such as rm -rf provenance_repository/, run from $NIFI_HOME, is sufficient.
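
The same cleanup can be scripted. The sketch below assumes the repository lives in a provenance_repository directory directly under the NiFi home directory, as in the command above; an installation that configures a different location would need a different path. The clear_provenance_repository helper is hypothetical, permanently deletes data, and should only be run while NiFi is stopped.

```python
import shutil
import tempfile
from pathlib import Path


def clear_provenance_repository(nifi_home, repo_dir="provenance_repository"):
    """Delete the provenance repository directory; return True if it existed."""
    target = Path(nifi_home) / repo_dir
    if not target.is_dir():
        return False
    shutil.rmtree(target)
    return True


# Demonstration against a throwaway directory rather than a real NiFi install.
with tempfile.TemporaryDirectory() as fake_home:
    (Path(fake_home) / "provenance_repository").mkdir()
    first = clear_provenance_repository(fake_home)
    second = clear_provenance_repository(fake_home)
print(first, second)  # True False
```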

  • Switching between unencrypted and encrypted repositories
    • If a user has an existing repository (WriteAheadProvenanceRepository only, not PersistentProvenanceRepository) that is not encrypted and switches their configuration to use an encrypted repository, the application writes an error to the log but starts up. However, previous events are not accessible through the provenance query interface, and new events will overwrite the existing events. The same behavior occurs if a user switches from an encrypted repository to an unencrypted repository. Automatic roll-over is a future effort (NIFI-3722), but NiFi is not intended for long-term storage of provenance events, so the impact should be minimal. There are two scenarios for roll-over:
      • Encrypted → unencrypted: if the previous repository implementation was encrypted, these events should be handled seamlessly as long as the key provider available still has the keys used to encrypt the events (see Key Rotation).
      • Unencrypted → encrypted: if the previous repository implementation was unencrypted, these events should be handled seamlessly, as the previously recorded events simply need to be read with a plaintext schema record reader and then written back with the encrypted record writer.
    • There is also a future effort to provide a standalone tool in NiFi Toolkit to encrypt/decrypt an existing provenance repository to make the transition easier. The translation process could take a long time depending on the size of the existing repository, and being able to perform this task outside of application startup would be valuable (NIFI-3723).
  • Multiple repositories: No additional effort or testing has been applied to multiple repositories at this time. Issues are possible, and even likely, with repositories on different physical devices. There is no option to provide a heterogeneous environment (i.e. one encrypted, one plaintext repository).
  • Corruption: When a disk is filled or corrupted, there have been reported issues with the repository becoming corrupted and recovery steps being necessary. This is likely to continue to be an issue with the encrypted repository, although still limited in scope to individual records (i.e. an entire repository file won’t be irrecoverable due to the encryption).

Other Management Features

In addition to the Summary Page, Data Provenance Page, Template Management Page, and Bulletin Board Page, there are other tools in the Global Menu (see NiFi User Interface) that are useful to the DFM. Select Flow Configuration History to view all the changes that have been made to the dataflow. The history can aid in troubleshooting, such as if a recent change to the dataflow has caused a problem and needs to be fixed. The DFM can see what changes have been made and adjust the flow as needed to fix the problem. While NiFi does not have an "undo" feature, the DFM can make new changes to the dataflow that will fix the problem.

Two other tools in the Global Menu are Controller Settings and Users. The Controller Settings page provides the ability to change the name of the NiFi instance, add comments describing the NiFi instance, and set the maximum number of threads that are available to the application. It also provides tabs where DFMs may add and configure Controller Services and Reporting Tasks. The Users page is used to manage user access, which is described in the System Administrator’s Guide.
