From 369b495c9d44b6c14d2c902ef16e1aafad35bf71 Mon Sep 17 00:00:00 2001 From: Qi Liu Date: Thu, 2 Dec 2021 16:06:32 +0800 Subject: [PATCH 01/18] docs: perf: Add description for HiSilicon PCIe PMU driver PCIe PMU Root Complex Integrated End Point(RCiEP) device is supported on HiSilicon HIP09 platform. Document it to provide guidance on how to use it. Reviewed-by: John Garry Signed-off-by: Qi Liu Reviewed-by: Shaokun Zhang Link: https://lore.kernel.org/r/20211202080633.2919-2-liuqi115@huawei.com Signed-off-by: Will Deacon Signed-off-by: Slim6882 --- .../admin-guide/perf/hisi-pcie-pmu.rst | 106 ++++++++++++++++++ 1 file changed, 106 insertions(+) create mode 100644 Documentation/admin-guide/perf/hisi-pcie-pmu.rst diff --git a/Documentation/admin-guide/perf/hisi-pcie-pmu.rst b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst new file mode 100644 index 000000000000..294ebbdb22af --- /dev/null +++ b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst @@ -0,0 +1,106 @@ +================================================ +HiSilicon PCIe Performance Monitoring Unit (PMU) +================================================ + +On Hip09, HiSilicon PCIe Performance Monitoring Unit (PMU) could monitor +bandwidth, latency, bus utilization and buffer occupancy data of PCIe. + +Each PCIe Core has a PMU to monitor multi Root Ports of this PCIe Core and +all Endpoints downstream these Root Ports. + + +HiSilicon PCIe PMU driver +========================= + +The PCIe PMU driver registers a perf PMU with the name of its sicl-id and PCIe +Core id.:: + + /sys/bus/event_source/hisi_pcie_ + +PMU driver provides description of available events and filter options in sysfs, +see /sys/bus/event_source/devices/hisi_pcie_. + +The "format" directory describes all formats of the config (events) and config1 +(filter options) fields of the perf_event_attr structure. The "events" directory +describes all documented events shown in perf list. + +The "identifier" sysfs file allows users to identify the version of the +PMU hardware device. + +The "bus" sysfs file allows users to get the bus number of Root Ports +monitored by PMU. + +Example usage of perf:: + + $# perf list + hisi_pcie0_0/rx_mwr_latency/ [kernel PMU event] + hisi_pcie0_0/rx_mwr_cnt/ [kernel PMU event] + ------------------------------------------ + + $# perf stat -e hisi_pcie0_0/rx_mwr_latency/ + $# perf stat -e hisi_pcie0_0/rx_mwr_cnt/ + $# perf stat -g -e hisi_pcie0_0/rx_mwr_latency/ -e hisi_pcie0_0/rx_mwr_cnt/ + +The current driver does not support sampling. So "perf record" is unsupported. +Also attach to a task is unsupported for PCIe PMU. + +Filter options +-------------- + +1. Target filter +PMU could only monitor the performance of traffic downstream target Root Ports +or downstream target Endpoint. PCIe PMU driver support "port" and "bdf" +interfaces for users, and these two interfaces aren't supported at the same +time. + +-port +"port" filter can be used in all PCIe PMU events, target Root Port can be +selected by configuring the 16-bits-bitmap "port". Multi ports can be selected +for AP-layer-events, and only one port can be selected for TL/DL-layer-events. + +For example, if target Root Port is 0000:00:00.0 (x8 lanes), bit0 of bitmap +should be set, port=0x1; if target Root Port is 0000:00:04.0 (x4 lanes), +bit8 is set, port=0x100; if these two Root Ports are both monitored, port=0x101. 
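+
+For instance, assuming ``rx_mrd_flux`` is an AP-layer event on this platform
+(only AP-layer events accept a multi-bit "port" bitmap), both Root Ports above
+can be monitored with a single counter::
+
+   $# perf stat -e hisi_pcie0_0/rx_mrd_flux,port=0x101/ sleep 5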
+ +Example usage of perf:: + + $# perf stat -e hisi_pcie0_0/rx_mwr_latency,port=0x1/ sleep 5 + +-bdf + +"bdf" filter can only be used in bandwidth events, target Endpoint is selected +by configuring BDF to "bdf". Counter only counts the bandwidth of message +requested by target Endpoint. + +For example, "bdf=0x3900" means BDF of target Endpoint is 0000:39:00.0. + +Example usage of perf:: + + $# perf stat -e hisi_pcie0_0/rx_mrd_flux,bdf=0x3900/ sleep 5 + +2. Trigger filter +Event statistics start when the first time TLP length is greater/smaller +than trigger condition. You can set the trigger condition by writing "trig_len", +and set the trigger mode by writing "trig_mode". This filter can only be used +in bandwidth events. + +For example, "trig_len=4" means trigger condition is 2^4 DW, "trig_mode=0" +means statistics start when TLP length > trigger condition, "trig_mode=1" +means start when TLP length < condition. + +Example usage of perf:: + + $# perf stat -e hisi_pcie0_0/rx_mrd_flux,trig_len=0x4,trig_mode=1/ sleep 5 + +3. Threshold filter +Counter counts when TLP length within the specified range. You can set the +threshold by writing "thr_len", and set the threshold mode by writing +"thr_mode". This filter can only be used in bandwidth events. + +For example, "thr_len=4" means threshold is 2^4 DW, "thr_mode=0" means +counter counts when TLP length >= threshold, and "thr_mode=1" means counts +when TLP length < threshold. + +Example usage of perf:: + + $# perf stat -e hisi_pcie0_0/rx_mrd_flux,thr_len=0x4,thr_mode=1/ sleep 5 -- Gitee From 89cceb6732bae5eeffcdea51140fbf7e0d560ece Mon Sep 17 00:00:00 2001 From: Yicong Yang Date: Thu, 17 Nov 2022 16:41:36 +0800 Subject: [PATCH 02/18] drivers/perf: hisi: Add TLP filter support The PMU support to filter the TLP when counting the bandwidth with below options: - only count the TLP headers - only count the TLP payloads - count both TLP headers and payloads In the current driver it's default to count the TLP payloads only, which will have an implicity side effects that on the traffic only have header only TLPs, we'll get no data. Make this user configuration through "len_mode" parameter and make it default to count both TLP headers and payloads when user not specified. Also update the documentation for it. Reviewed-by: Jonathan Cameron Signed-off-by: Yicong Yang Link: https://lore.kernel.org/r/20221117084136.53572-5-yangyicong@huawei.com Signed-off-by: Will Deacon Signed-off-by: Slim6882 --- .../admin-guide/perf/hisi-pcie-pmu.rst | 20 +- drivers/perf/hisilicon/hisi_pcie_pmu.c | 960 ++++++++++++++++++ 2 files changed, 979 insertions(+), 1 deletion(-) create mode 100644 drivers/perf/hisilicon/hisi_pcie_pmu.c diff --git a/Documentation/admin-guide/perf/hisi-pcie-pmu.rst b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst index 294ebbdb22af..f93869db85a8 100644 --- a/Documentation/admin-guide/perf/hisi-pcie-pmu.rst +++ b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst @@ -103,4 +103,22 @@ when TLP length < threshold. Example usage of perf:: - $# perf stat -e hisi_pcie0_0/rx_mrd_flux,thr_len=0x4,thr_mode=1/ sleep 5 + $# perf stat -e hisi_pcie0_core0/rx_mrd_flux,thr_len=0x4,thr_mode=1/ sleep 5 + +4. TLP Length filter + + When counting bandwidth, the data can be composed of certain parts of TLP + packets. 
You can specify it through "len_mode": + + - 2'b00: Reserved (Do not use this since the behaviour is undefined) + - 2'b01: Bandwidth of TLP payloads + - 2'b10: Bandwidth of TLP headers + - 2'b11: Bandwidth of both TLP payloads and headers + + For example, "len_mode=2" means only counting the bandwidth of TLP headers + and "len_mode=3" means the final bandwidth data is composed of both TLP + headers and payloads. Default value if not specified is 2'b11. + + Example usage of perf:: + + $# perf stat -e hisi_pcie0_core0/rx_mrd_flux,len_mode=0x1/ sleep 5 diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c new file mode 100644 index 000000000000..6fee0b6e163b --- /dev/null +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c @@ -0,0 +1,960 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * This driver adds support for PCIe PMU RCiEP device. Related + * perf events are bandwidth, latency etc. + * + * Copyright (C) 2021 HiSilicon Limited + * Author: Qi Liu + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define DRV_NAME "hisi_pcie_pmu" +/* Define registers */ +#define HISI_PCIE_GLOBAL_CTRL 0x00 +#define HISI_PCIE_EVENT_CTRL 0x010 +#define HISI_PCIE_CNT 0x090 +#define HISI_PCIE_EXT_CNT 0x110 +#define HISI_PCIE_INT_STAT 0x150 +#define HISI_PCIE_INT_MASK 0x154 +#define HISI_PCIE_REG_BDF 0xfe0 +#define HISI_PCIE_REG_VERSION 0xfe4 +#define HISI_PCIE_REG_INFO 0xfe8 + +/* Define command in HISI_PCIE_GLOBAL_CTRL */ +#define HISI_PCIE_GLOBAL_EN 0x01 +#define HISI_PCIE_GLOBAL_NONE 0 + +/* Define command in HISI_PCIE_EVENT_CTRL */ +#define HISI_PCIE_EVENT_EN BIT_ULL(20) +#define HISI_PCIE_RESET_CNT BIT_ULL(22) +#define HISI_PCIE_INIT_SET BIT_ULL(34) +#define HISI_PCIE_THR_EN BIT_ULL(26) +#define HISI_PCIE_TARGET_EN BIT_ULL(32) +#define HISI_PCIE_TRIG_EN BIT_ULL(52) + +/* Define offsets in HISI_PCIE_EVENT_CTRL */ +#define HISI_PCIE_EVENT_M GENMASK_ULL(15, 0) +#define HISI_PCIE_THR_MODE_M GENMASK_ULL(27, 27) +#define HISI_PCIE_THR_M GENMASK_ULL(31, 28) +#define HISI_PCIE_LEN_M GENMASK_ULL(35, 34) +#define HISI_PCIE_TARGET_M GENMASK_ULL(52, 36) +#define HISI_PCIE_TRIG_MODE_M GENMASK_ULL(53, 53) +#define HISI_PCIE_TRIG_M GENMASK_ULL(59, 56) + +/* Default config of TLP length mode, will count both TLP headers and payloads */ +#define HISI_PCIE_LEN_M_DEFAULT 3ULL + +#define HISI_PCIE_MAX_COUNTERS 8 +#define HISI_PCIE_REG_STEP 8 +#define HISI_PCIE_THR_MAX_VAL 10 +#define HISI_PCIE_TRIG_MAX_VAL 10 +#define HISI_PCIE_MAX_PERIOD (GENMASK_ULL(63, 0)) +#define HISI_PCIE_INIT_VAL BIT_ULL(63) + +struct hisi_pcie_pmu { + struct perf_event *hw_events[HISI_PCIE_MAX_COUNTERS]; + struct hlist_node node; + struct pci_dev *pdev; + struct pmu pmu; + void __iomem *base; + int irq; + u32 identifier; + /* Minimum and maximum BDF of root ports monitored by PMU */ + u16 bdf_min; + u16 bdf_max; + int on_cpu; +}; + +struct hisi_pcie_reg_pair { + u16 lo; + u16 hi; +}; + +#define to_pcie_pmu(p) (container_of((p), struct hisi_pcie_pmu, pmu)) +#define GET_PCI_DEVFN(bdf) ((bdf) & 0xff) + +#define HISI_PCIE_PMU_FILTER_ATTR(_name, _config, _hi, _lo) \ + static u64 hisi_pcie_get_##_name(struct perf_event *event) \ + { \ + return FIELD_GET(GENMASK(_hi, _lo), event->attr._config); \ + } \ + +HISI_PCIE_PMU_FILTER_ATTR(event, config, 16, 0); +HISI_PCIE_PMU_FILTER_ATTR(thr_len, config1, 3, 0); +HISI_PCIE_PMU_FILTER_ATTR(thr_mode, config1, 4, 4); +HISI_PCIE_PMU_FILTER_ATTR(trig_len, config1, 8, 5); +HISI_PCIE_PMU_FILTER_ATTR(trig_mode, 
config1, 9, 9); +HISI_PCIE_PMU_FILTER_ATTR(len_mode, config1, 11, 10); +HISI_PCIE_PMU_FILTER_ATTR(port, config2, 15, 0); +HISI_PCIE_PMU_FILTER_ATTR(bdf, config2, 31, 16); + +static ssize_t hisi_pcie_format_sysfs_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct dev_ext_attribute *eattr; + + eattr = container_of(attr, struct dev_ext_attribute, attr); + + return sysfs_emit(buf, "%s\n", (char *)eattr->var); +} + +static ssize_t hisi_pcie_event_sysfs_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct perf_pmu_events_attr *pmu_attr = + container_of(attr, struct perf_pmu_events_attr, attr); + + return sysfs_emit(buf, "config=0x%llx\n", pmu_attr->id); +} + +#define HISI_PCIE_PMU_FORMAT_ATTR(_name, _format) \ + (&((struct dev_ext_attribute[]){ \ + { .attr = __ATTR(_name, 0444, hisi_pcie_format_sysfs_show, \ + NULL), \ + .var = (void *)_format } \ + })[0].attr.attr) + +#define HISI_PCIE_PMU_EVENT_ATTR(_name, _id) \ + PMU_EVENT_ATTR_ID(_name, hisi_pcie_event_sysfs_show, _id) + +static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(dev_get_drvdata(dev)); + + return cpumap_print_to_pagebuf(true, buf, cpumask_of(pcie_pmu->on_cpu)); +} +static DEVICE_ATTR_RO(cpumask); + +static ssize_t identifier_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(dev_get_drvdata(dev)); + + return sysfs_emit(buf, "%#x\n", pcie_pmu->identifier); +} +static DEVICE_ATTR_RO(identifier); + +static ssize_t bus_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(dev_get_drvdata(dev)); + + return sysfs_emit(buf, "%#04x\n", PCI_BUS_NUM(pcie_pmu->bdf_min)); +} +static DEVICE_ATTR_RO(bus); + +static struct hisi_pcie_reg_pair +hisi_pcie_parse_reg_value(struct hisi_pcie_pmu *pcie_pmu, u32 reg_off) +{ + u32 val = readl_relaxed(pcie_pmu->base + reg_off); + struct hisi_pcie_reg_pair regs = { + .lo = val, + .hi = val >> 16, + }; + + return regs; +} + +/* + * Hardware counter and ext_counter work together for bandwidth, latency, bus + * utilization and buffer occupancy events. For example, RX memory write latency + * events(index = 0x0010), counter counts total delay cycles and ext_counter + * counts RX memory write PCIe packets number. + * + * As we don't want PMU driver to process these two data, "delay cycles" can + * be treated as an independent event(index = 0x0010), "RX memory write packets + * number" as another(index = 0x10010). BIT 16 is used to distinguish and 0-15 + * bits are "real" event index, which can be used to set HISI_PCIE_EVENT_CTRL. 
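+ *
+ * Illustrative note: an event opened with config=0x10010 therefore programs
+ * index 0x0010 into HISI_PCIE_EVENT_CTRL (hisi_pcie_get_real_event() masks
+ * off BIT 16) but reads HISI_PCIE_EXT_CNT rather than HISI_PCIE_CNT, as
+ * selected by EXT_COUNTER_IS_USED() in hisi_pcie_pmu_event_init().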
+ */ +#define EXT_COUNTER_IS_USED(idx) ((idx) & BIT(16)) + +static u32 hisi_pcie_get_real_event(struct perf_event *event) +{ + return hisi_pcie_get_event(event) & GENMASK(15, 0); +} + +static u32 hisi_pcie_pmu_get_offset(u32 offset, u32 idx) +{ + return offset + HISI_PCIE_REG_STEP * idx; +} + +static u32 hisi_pcie_pmu_readl(struct hisi_pcie_pmu *pcie_pmu, u32 reg_offset, + u32 idx) +{ + u32 offset = hisi_pcie_pmu_get_offset(reg_offset, idx); + + return readl_relaxed(pcie_pmu->base + offset); +} + +static void hisi_pcie_pmu_writel(struct hisi_pcie_pmu *pcie_pmu, u32 reg_offset, u32 idx, u32 val) +{ + u32 offset = hisi_pcie_pmu_get_offset(reg_offset, idx); + + writel_relaxed(val, pcie_pmu->base + offset); +} + +static u64 hisi_pcie_pmu_readq(struct hisi_pcie_pmu *pcie_pmu, u32 reg_offset, u32 idx) +{ + u32 offset = hisi_pcie_pmu_get_offset(reg_offset, idx); + + return readq_relaxed(pcie_pmu->base + offset); +} + +static void hisi_pcie_pmu_writeq(struct hisi_pcie_pmu *pcie_pmu, u32 reg_offset, u32 idx, u64 val) +{ + u32 offset = hisi_pcie_pmu_get_offset(reg_offset, idx); + + writeq_relaxed(val, pcie_pmu->base + offset); +} + +static void hisi_pcie_pmu_config_filter(struct perf_event *event) +{ + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu); + struct hw_perf_event *hwc = &event->hw; + u64 port, trig_len, thr_len, len_mode; + u64 reg = HISI_PCIE_INIT_SET; + + /* Config HISI_PCIE_EVENT_CTRL according to event. */ + reg |= FIELD_PREP(HISI_PCIE_EVENT_M, hisi_pcie_get_real_event(event)); + + /* Config HISI_PCIE_EVENT_CTRL according to root port or EP device. */ + port = hisi_pcie_get_port(event); + if (port) + reg |= FIELD_PREP(HISI_PCIE_TARGET_M, port); + else + reg |= HISI_PCIE_TARGET_EN | + FIELD_PREP(HISI_PCIE_TARGET_M, hisi_pcie_get_bdf(event)); + + /* Config HISI_PCIE_EVENT_CTRL according to trigger condition. */ + trig_len = hisi_pcie_get_trig_len(event); + if (trig_len) { + reg |= FIELD_PREP(HISI_PCIE_TRIG_M, trig_len); + reg |= FIELD_PREP(HISI_PCIE_TRIG_MODE_M, hisi_pcie_get_trig_mode(event)); + reg |= HISI_PCIE_TRIG_EN; + } + + /* Config HISI_PCIE_EVENT_CTRL according to threshold condition. 
*/ + thr_len = hisi_pcie_get_thr_len(event); + if (thr_len) { + reg |= FIELD_PREP(HISI_PCIE_THR_M, thr_len); + reg |= FIELD_PREP(HISI_PCIE_THR_MODE_M, hisi_pcie_get_thr_mode(event)); + reg |= HISI_PCIE_THR_EN; + } + + len_mode = hisi_pcie_get_len_mode(event); + if (len_mode) + reg |= FIELD_PREP(HISI_PCIE_LEN_M, len_mode); + else + reg |= FIELD_PREP(HISI_PCIE_LEN_M, HISI_PCIE_LEN_M_DEFAULT); + + hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_EVENT_CTRL, hwc->idx, reg); +} + +static void hisi_pcie_pmu_clear_filter(struct perf_event *event) +{ + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu); + struct hw_perf_event *hwc = &event->hw; + + hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_EVENT_CTRL, hwc->idx, HISI_PCIE_INIT_SET); +} + +static bool hisi_pcie_pmu_valid_requester_id(struct hisi_pcie_pmu *pcie_pmu, u32 bdf) +{ + struct pci_dev *root_port, *pdev; + u16 rp_bdf; + + pdev = pci_get_domain_bus_and_slot(pci_domain_nr(pcie_pmu->pdev->bus), PCI_BUS_NUM(bdf), + GET_PCI_DEVFN(bdf)); + if (!pdev) + return false; + + root_port = pcie_find_root_port(pdev); + if (!root_port) { + pci_dev_put(pdev); + return false; + } + + pci_dev_put(pdev); + rp_bdf = pci_dev_id(root_port); + return rp_bdf >= pcie_pmu->bdf_min && rp_bdf <= pcie_pmu->bdf_max; +} + +static bool hisi_pcie_pmu_valid_filter(struct perf_event *event, + struct hisi_pcie_pmu *pcie_pmu) +{ + u32 requester_id = hisi_pcie_get_bdf(event); + + if (hisi_pcie_get_thr_len(event) > HISI_PCIE_THR_MAX_VAL) + return false; + + if (hisi_pcie_get_trig_len(event) > HISI_PCIE_TRIG_MAX_VAL) + return false; + + if (requester_id) { + if (!hisi_pcie_pmu_valid_requester_id(pcie_pmu, requester_id)) + return false; + } + + return true; +} + +static bool hisi_pcie_pmu_cmp_event(struct perf_event *target, + struct perf_event *event) +{ + return hisi_pcie_get_real_event(target) == hisi_pcie_get_real_event(event); +} + +static bool hisi_pcie_pmu_validate_event_group(struct perf_event *event) +{ + struct perf_event *sibling, *leader = event->group_leader; + struct perf_event *event_group[HISI_PCIE_MAX_COUNTERS]; + int counters = 1; + int num; + + event_group[0] = leader; + if (!is_software_event(leader)) { + if (leader->pmu != event->pmu) + return false; + + if (leader != event && !hisi_pcie_pmu_cmp_event(leader, event)) + event_group[counters++] = event; + } + + for_each_sibling_event(sibling, event->group_leader) { + if (is_software_event(sibling)) + continue; + + if (sibling->pmu != event->pmu) + return false; + + for (num = 0; num < counters; num++) { + if (hisi_pcie_pmu_cmp_event(event_group[num], sibling)) + break; + } + + if (num == counters) + event_group[counters++] = sibling; + } + + return counters <= HISI_PCIE_MAX_COUNTERS; +} + +static int hisi_pcie_pmu_event_init(struct perf_event *event) +{ + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu); + struct hw_perf_event *hwc = &event->hw; + + event->cpu = pcie_pmu->on_cpu; + + if (EXT_COUNTER_IS_USED(hisi_pcie_get_event(event))) + hwc->event_base = HISI_PCIE_EXT_CNT; + else + hwc->event_base = HISI_PCIE_CNT; + + if (event->attr.type != event->pmu->type) + return -ENOENT; + + /* Sampling is not supported. 
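+	 * Neither is attaching to a task: the PMU counts PCIe-side traffic,
+	 * which cannot be attributed to an individual task.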
*/ + if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK) + return -EOPNOTSUPP; + + if (!hisi_pcie_pmu_valid_filter(event, pcie_pmu)) + return -EINVAL; + + if (!hisi_pcie_pmu_validate_event_group(event)) + return -EINVAL; + + return 0; +} + +static u64 hisi_pcie_pmu_read_counter(struct perf_event *event) +{ + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu); + u32 idx = event->hw.idx; + + return hisi_pcie_pmu_readq(pcie_pmu, event->hw.event_base, idx); +} + +static int hisi_pcie_pmu_find_related_event(struct hisi_pcie_pmu *pcie_pmu, + struct perf_event *event) +{ + struct perf_event *sibling; + int idx; + + for (idx = 0; idx < HISI_PCIE_MAX_COUNTERS; idx++) { + sibling = pcie_pmu->hw_events[idx]; + if (!sibling) + continue; + + if (!hisi_pcie_pmu_cmp_event(sibling, event)) + continue; + + /* Related events must be used in group */ + if (sibling->group_leader == event->group_leader) + return idx; + else + return -EINVAL; + } + + return idx; +} + +static int hisi_pcie_pmu_get_event_idx(struct hisi_pcie_pmu *pcie_pmu) +{ + int idx; + + for (idx = 0; idx < HISI_PCIE_MAX_COUNTERS; idx++) { + if (!pcie_pmu->hw_events[idx]) + return idx; + } + + return -EINVAL; +} + +static void hisi_pcie_pmu_event_update(struct perf_event *event) +{ + struct hw_perf_event *hwc = &event->hw; + u64 new_cnt, prev_cnt, delta; + + do { + prev_cnt = local64_read(&hwc->prev_count); + new_cnt = hisi_pcie_pmu_read_counter(event); + } while (local64_cmpxchg(&hwc->prev_count, prev_cnt, + new_cnt) != prev_cnt); + + delta = (new_cnt - prev_cnt) & HISI_PCIE_MAX_PERIOD; + local64_add(delta, &event->count); +} + +static void hisi_pcie_pmu_read(struct perf_event *event) +{ + hisi_pcie_pmu_event_update(event); +} + +static void hisi_pcie_pmu_set_period(struct perf_event *event) +{ + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu); + struct hw_perf_event *hwc = &event->hw; + int idx = hwc->idx; + + local64_set(&hwc->prev_count, HISI_PCIE_INIT_VAL); + hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_CNT, idx, HISI_PCIE_INIT_VAL); + hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_EXT_CNT, idx, HISI_PCIE_INIT_VAL); +} + +static void hisi_pcie_pmu_enable_counter(struct hisi_pcie_pmu *pcie_pmu, struct hw_perf_event *hwc) +{ + u32 idx = hwc->idx; + u64 val; + + val = hisi_pcie_pmu_readq(pcie_pmu, HISI_PCIE_EVENT_CTRL, idx); + val |= HISI_PCIE_EVENT_EN; + hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_EVENT_CTRL, idx, val); +} + +static void hisi_pcie_pmu_disable_counter(struct hisi_pcie_pmu *pcie_pmu, struct hw_perf_event *hwc) +{ + u32 idx = hwc->idx; + u64 val; + + val = hisi_pcie_pmu_readq(pcie_pmu, HISI_PCIE_EVENT_CTRL, idx); + val &= ~HISI_PCIE_EVENT_EN; + hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_EVENT_CTRL, idx, val); +} + +static void hisi_pcie_pmu_enable_int(struct hisi_pcie_pmu *pcie_pmu, struct hw_perf_event *hwc) +{ + u32 idx = hwc->idx; + + hisi_pcie_pmu_writel(pcie_pmu, HISI_PCIE_INT_MASK, idx, 0); +} + +static void hisi_pcie_pmu_disable_int(struct hisi_pcie_pmu *pcie_pmu, struct hw_perf_event *hwc) +{ + u32 idx = hwc->idx; + + hisi_pcie_pmu_writel(pcie_pmu, HISI_PCIE_INT_MASK, idx, 1); +} + +static void hisi_pcie_pmu_reset_counter(struct hisi_pcie_pmu *pcie_pmu, int idx) +{ + hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_EVENT_CTRL, idx, HISI_PCIE_RESET_CNT); + hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_EVENT_CTRL, idx, HISI_PCIE_INIT_SET); +} + +static void hisi_pcie_pmu_start(struct perf_event *event, int flags) +{ + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu); + struct hw_perf_event *hwc = 
&event->hw; + int idx = hwc->idx; + u64 prev_cnt; + + if (WARN_ON_ONCE(!(hwc->state & PERF_HES_STOPPED))) + return; + + WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE)); + hwc->state = 0; + + hisi_pcie_pmu_config_filter(event); + hisi_pcie_pmu_enable_counter(pcie_pmu, hwc); + hisi_pcie_pmu_enable_int(pcie_pmu, hwc); + hisi_pcie_pmu_set_period(event); + + if (flags & PERF_EF_RELOAD) { + prev_cnt = local64_read(&hwc->prev_count); + hisi_pcie_pmu_writeq(pcie_pmu, hwc->event_base, idx, prev_cnt); + } + + perf_event_update_userpage(event); +} + +static void hisi_pcie_pmu_stop(struct perf_event *event, int flags) +{ + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu); + struct hw_perf_event *hwc = &event->hw; + + hisi_pcie_pmu_event_update(event); + hisi_pcie_pmu_disable_int(pcie_pmu, hwc); + hisi_pcie_pmu_disable_counter(pcie_pmu, hwc); + hisi_pcie_pmu_clear_filter(event); + WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED); + hwc->state |= PERF_HES_STOPPED; + + if (hwc->state & PERF_HES_UPTODATE) + return; + + hwc->state |= PERF_HES_UPTODATE; +} + +static int hisi_pcie_pmu_add(struct perf_event *event, int flags) +{ + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu); + struct hw_perf_event *hwc = &event->hw; + int idx; + + hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE; + + /* Check all working events to find a related event. */ + idx = hisi_pcie_pmu_find_related_event(pcie_pmu, event); + if (idx < 0) + return idx; + + /* Current event shares an enabled counter with the related event */ + if (idx < HISI_PCIE_MAX_COUNTERS) { + hwc->idx = idx; + goto start_count; + } + + idx = hisi_pcie_pmu_get_event_idx(pcie_pmu); + if (idx < 0) + return idx; + + hwc->idx = idx; + pcie_pmu->hw_events[idx] = event; + /* Reset Counter to avoid previous statistic interference. */ + hisi_pcie_pmu_reset_counter(pcie_pmu, idx); + +start_count: + if (flags & PERF_EF_START) + hisi_pcie_pmu_start(event, PERF_EF_RELOAD); + + return 0; +} + +static void hisi_pcie_pmu_del(struct perf_event *event, int flags) +{ + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu); + struct hw_perf_event *hwc = &event->hw; + + hisi_pcie_pmu_stop(event, PERF_EF_UPDATE); + pcie_pmu->hw_events[hwc->idx] = NULL; + perf_event_update_userpage(event); +} + +static void hisi_pcie_pmu_enable(struct pmu *pmu) +{ + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(pmu); + int num; + + for (num = 0; num < HISI_PCIE_MAX_COUNTERS; num++) { + if (pcie_pmu->hw_events[num]) + break; + } + + if (num == HISI_PCIE_MAX_COUNTERS) + return; + + writel(HISI_PCIE_GLOBAL_EN, pcie_pmu->base + HISI_PCIE_GLOBAL_CTRL); +} + +static void hisi_pcie_pmu_disable(struct pmu *pmu) +{ + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(pmu); + + writel(HISI_PCIE_GLOBAL_NONE, pcie_pmu->base + HISI_PCIE_GLOBAL_CTRL); +} + +static irqreturn_t hisi_pcie_pmu_irq(int irq, void *data) +{ + struct hisi_pcie_pmu *pcie_pmu = data; + irqreturn_t ret = IRQ_NONE; + struct perf_event *event; + u32 overflown; + int idx; + + for (idx = 0; idx < HISI_PCIE_MAX_COUNTERS; idx++) { + overflown = hisi_pcie_pmu_readl(pcie_pmu, HISI_PCIE_INT_STAT, idx); + if (!overflown) + continue; + + /* Clear status of interrupt. 
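+		 * HISI_PCIE_INT_STAT appears to be write-1-to-clear; writing 1
+		 * below acknowledges the overflow interrupt for this counter.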
*/ + hisi_pcie_pmu_writel(pcie_pmu, HISI_PCIE_INT_STAT, idx, 1); + event = pcie_pmu->hw_events[idx]; + if (!event) + continue; + + hisi_pcie_pmu_event_update(event); + hisi_pcie_pmu_set_period(event); + ret = IRQ_HANDLED; + } + + return ret; +} + +static int hisi_pcie_pmu_irq_register(struct pci_dev *pdev, struct hisi_pcie_pmu *pcie_pmu) +{ + int irq, ret; + + ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI); + if (ret < 0) { + pci_err(pdev, "Failed to enable MSI vectors: %d\n", ret); + return ret; + } + + irq = pci_irq_vector(pdev, 0); + ret = request_irq(irq, hisi_pcie_pmu_irq, IRQF_NOBALANCING | IRQF_NO_THREAD, DRV_NAME, + pcie_pmu); + if (ret) { + pci_err(pdev, "Failed to register IRQ: %d\n", ret); + pci_free_irq_vectors(pdev); + return ret; + } + + pcie_pmu->irq = irq; + + return 0; +} + +static void hisi_pcie_pmu_irq_unregister(struct pci_dev *pdev, struct hisi_pcie_pmu *pcie_pmu) +{ + free_irq(pcie_pmu->irq, pcie_pmu); + pci_free_irq_vectors(pdev); +} + +static int hisi_pcie_pmu_online_cpu(unsigned int cpu, struct hlist_node *node) +{ + struct hisi_pcie_pmu *pcie_pmu = hlist_entry_safe(node, struct hisi_pcie_pmu, node); + + if (pcie_pmu->on_cpu == -1) { + pcie_pmu->on_cpu = cpu; + WARN_ON(irq_set_affinity(pcie_pmu->irq, cpumask_of(cpu))); + } + + return 0; +} + +static int hisi_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node) +{ + struct hisi_pcie_pmu *pcie_pmu = hlist_entry_safe(node, struct hisi_pcie_pmu, node); + unsigned int target; + + /* Nothing to do if this CPU doesn't own the PMU */ + if (pcie_pmu->on_cpu != cpu) + return 0; + + pcie_pmu->on_cpu = -1; + /* Choose a new CPU from all online cpus. */ + target = cpumask_first(cpu_online_mask); + if (target >= nr_cpu_ids) { + pci_err(pcie_pmu->pdev, "There is no CPU to set\n"); + return 0; + } + + perf_pmu_migrate_context(&pcie_pmu->pmu, cpu, target); + /* Use this CPU for event counting */ + pcie_pmu->on_cpu = target; + WARN_ON(irq_set_affinity(pcie_pmu->irq, cpumask_of(target))); + + return 0; +} + +static struct attribute *hisi_pcie_pmu_events_attr[] = { + HISI_PCIE_PMU_EVENT_ATTR(rx_mwr_latency, 0x0010), + HISI_PCIE_PMU_EVENT_ATTR(rx_mwr_cnt, 0x10010), + HISI_PCIE_PMU_EVENT_ATTR(rx_mrd_latency, 0x0210), + HISI_PCIE_PMU_EVENT_ATTR(rx_mrd_cnt, 0x10210), + HISI_PCIE_PMU_EVENT_ATTR(tx_mrd_latency, 0x0011), + HISI_PCIE_PMU_EVENT_ATTR(tx_mrd_cnt, 0x10011), + HISI_PCIE_PMU_EVENT_ATTR(rx_mrd_flux, 0x0804), + HISI_PCIE_PMU_EVENT_ATTR(rx_mrd_time, 0x10804), + HISI_PCIE_PMU_EVENT_ATTR(tx_mrd_flux, 0x0405), + HISI_PCIE_PMU_EVENT_ATTR(tx_mrd_time, 0x10405), + NULL +}; + +static struct attribute_group hisi_pcie_pmu_events_group = { + .name = "events", + .attrs = hisi_pcie_pmu_events_attr, +}; + +static struct attribute *hisi_pcie_pmu_format_attr[] = { + HISI_PCIE_PMU_FORMAT_ATTR(event, "config:0-16"), + HISI_PCIE_PMU_FORMAT_ATTR(thr_len, "config1:0-3"), + HISI_PCIE_PMU_FORMAT_ATTR(thr_mode, "config1:4"), + HISI_PCIE_PMU_FORMAT_ATTR(trig_len, "config1:5-8"), + HISI_PCIE_PMU_FORMAT_ATTR(trig_mode, "config1:9"), + HISI_PCIE_PMU_FORMAT_ATTR(len_mode, "config1:10-11"), + HISI_PCIE_PMU_FORMAT_ATTR(port, "config2:0-15"), + HISI_PCIE_PMU_FORMAT_ATTR(bdf, "config2:16-31"), + NULL +}; + +static const struct attribute_group hisi_pcie_pmu_format_group = { + .name = "format", + .attrs = hisi_pcie_pmu_format_attr, +}; + +static struct attribute *hisi_pcie_pmu_bus_attrs[] = { + &dev_attr_bus.attr, + NULL +}; + +static const struct attribute_group hisi_pcie_pmu_bus_attr_group = { + .attrs = hisi_pcie_pmu_bus_attrs, +}; + +static struct 
attribute *hisi_pcie_pmu_cpumask_attrs[] = { + &dev_attr_cpumask.attr, + NULL +}; + +static const struct attribute_group hisi_pcie_pmu_cpumask_attr_group = { + .attrs = hisi_pcie_pmu_cpumask_attrs, +}; + +static struct attribute *hisi_pcie_pmu_identifier_attrs[] = { + &dev_attr_identifier.attr, + NULL +}; + +static const struct attribute_group hisi_pcie_pmu_identifier_attr_group = { + .attrs = hisi_pcie_pmu_identifier_attrs, +}; + +static const struct attribute_group *hisi_pcie_pmu_attr_groups[] = { + &hisi_pcie_pmu_events_group, + &hisi_pcie_pmu_format_group, + &hisi_pcie_pmu_bus_attr_group, + &hisi_pcie_pmu_cpumask_attr_group, + &hisi_pcie_pmu_identifier_attr_group, + NULL +}; + +static int hisi_pcie_alloc_pmu(struct pci_dev *pdev, struct hisi_pcie_pmu *pcie_pmu) +{ + struct hisi_pcie_reg_pair regs; + u16 sicl_id, core_id; + char *name; + + regs = hisi_pcie_parse_reg_value(pcie_pmu, HISI_PCIE_REG_BDF); + pcie_pmu->bdf_min = regs.lo; + pcie_pmu->bdf_max = regs.hi; + + regs = hisi_pcie_parse_reg_value(pcie_pmu, HISI_PCIE_REG_INFO); + sicl_id = regs.hi; + core_id = regs.lo; + + name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "hisi_pcie%u_core%u", sicl_id, core_id); + if (!name) + return -ENOMEM; + + pcie_pmu->pdev = pdev; + pcie_pmu->on_cpu = -1; + pcie_pmu->identifier = readl(pcie_pmu->base + HISI_PCIE_REG_VERSION); + pcie_pmu->pmu = (struct pmu) { + .name = name, + .module = THIS_MODULE, + .event_init = hisi_pcie_pmu_event_init, + .pmu_enable = hisi_pcie_pmu_enable, + .pmu_disable = hisi_pcie_pmu_disable, + .add = hisi_pcie_pmu_add, + .del = hisi_pcie_pmu_del, + .start = hisi_pcie_pmu_start, + .stop = hisi_pcie_pmu_stop, + .read = hisi_pcie_pmu_read, + .task_ctx_nr = perf_invalid_context, + .attr_groups = hisi_pcie_pmu_attr_groups, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, + }; + + return 0; +} + +static int hisi_pcie_init_pmu(struct pci_dev *pdev, struct hisi_pcie_pmu *pcie_pmu) +{ + int ret; + + pcie_pmu->base = pci_ioremap_bar(pdev, 2); + if (!pcie_pmu->base) { + pci_err(pdev, "Ioremap failed for pcie_pmu resource\n"); + return -ENOMEM; + } + + ret = hisi_pcie_alloc_pmu(pdev, pcie_pmu); + if (ret) + goto err_iounmap; + + ret = hisi_pcie_pmu_irq_register(pdev, pcie_pmu); + if (ret) + goto err_iounmap; + + ret = cpuhp_state_add_instance(CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE, &pcie_pmu->node); + if (ret) { + pci_err(pdev, "Failed to register hotplug: %d\n", ret); + goto err_irq_unregister; + } + + ret = perf_pmu_register(&pcie_pmu->pmu, pcie_pmu->pmu.name, -1); + if (ret) { + pci_err(pdev, "Failed to register PCIe PMU: %d\n", ret); + goto err_hotplug_unregister; + } + + return ret; + +err_hotplug_unregister: + cpuhp_state_remove_instance_nocalls( + CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE, &pcie_pmu->node); + +err_irq_unregister: + hisi_pcie_pmu_irq_unregister(pdev, pcie_pmu); + +err_iounmap: + iounmap(pcie_pmu->base); + + return ret; +} + +static void hisi_pcie_uninit_pmu(struct pci_dev *pdev) +{ + struct hisi_pcie_pmu *pcie_pmu = pci_get_drvdata(pdev); + + perf_pmu_unregister(&pcie_pmu->pmu); + cpuhp_state_remove_instance_nocalls( + CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE, &pcie_pmu->node); + hisi_pcie_pmu_irq_unregister(pdev, pcie_pmu); + iounmap(pcie_pmu->base); +} + +static int hisi_pcie_init_dev(struct pci_dev *pdev) +{ + int ret; + + ret = pcim_enable_device(pdev); + if (ret) { + pci_err(pdev, "Failed to enable PCI device: %d\n", ret); + return ret; + } + + ret = pcim_iomap_regions(pdev, BIT(2), DRV_NAME); + if (ret < 0) { + pci_err(pdev, "Failed to request PCI mem regions: %d\n", 
ret); + return ret; + } + + pci_set_master(pdev); + + return 0; +} + +static int hisi_pcie_pmu_probe(struct pci_dev *pdev, const struct pci_device_id *id) +{ + struct hisi_pcie_pmu *pcie_pmu; + int ret; + + pcie_pmu = devm_kzalloc(&pdev->dev, sizeof(*pcie_pmu), GFP_KERNEL); + if (!pcie_pmu) + return -ENOMEM; + + ret = hisi_pcie_init_dev(pdev); + if (ret) + return ret; + + ret = hisi_pcie_init_pmu(pdev, pcie_pmu); + if (ret) + return ret; + + pci_set_drvdata(pdev, pcie_pmu); + + return ret; +} + +static void hisi_pcie_pmu_remove(struct pci_dev *pdev) +{ + hisi_pcie_uninit_pmu(pdev); + pci_set_drvdata(pdev, NULL); +} + +static const struct pci_device_id hisi_pcie_pmu_ids[] = { + { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, 0xa12d) }, + { 0, } +}; +MODULE_DEVICE_TABLE(pci, hisi_pcie_pmu_ids); + +static struct pci_driver hisi_pcie_pmu_driver = { + .name = DRV_NAME, + .id_table = hisi_pcie_pmu_ids, + .probe = hisi_pcie_pmu_probe, + .remove = hisi_pcie_pmu_remove, +}; + +static int __init hisi_pcie_module_init(void) +{ + int ret; + + ret = cpuhp_setup_state_multi(CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE, + "AP_PERF_ARM_HISI_PCIE_PMU_ONLINE", + hisi_pcie_pmu_online_cpu, + hisi_pcie_pmu_offline_cpu); + if (ret) { + pr_err("Failed to setup PCIe PMU hotplug: %d\n", ret); + return ret; + } + + ret = pci_register_driver(&hisi_pcie_pmu_driver); + if (ret) + cpuhp_remove_multi_state(CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE); + + return ret; +} +module_init(hisi_pcie_module_init); + +static void __exit hisi_pcie_module_exit(void) +{ + pci_unregister_driver(&hisi_pcie_pmu_driver); + cpuhp_remove_multi_state(CPUHP_AP_PERF_ARM_HISI_PCIE_PMU_ONLINE); +} +module_exit(hisi_pcie_module_exit); + +MODULE_DESCRIPTION("HiSilicon PCIe PMU driver"); +MODULE_LICENSE("GPL v2"); +MODULE_AUTHOR("Qi Liu "); -- Gitee From a58afa3be6e9ddbe5b0c3473354a6ed1026a7971 Mon Sep 17 00:00:00 2001 From: Yicong Yang Date: Thu, 17 Nov 2022 16:41:36 +0800 Subject: [PATCH 03/18] drivers/perf: hisi: Add TLP filter support The PMU support to filter the TLP when counting the bandwidth with below options: - only count the TLP headers - only count the TLP payloads - count both TLP headers and payloads In the current driver it's default to count the TLP payloads only, which will have an implicity side effects that on the traffic only have header only TLPs, we'll get no data. Make this user configuration through "len_mode" parameter and make it default to count both TLP headers and payloads when user not specified. Also update the documentation for it. Reviewed-by: Jonathan Cameron Signed-off-by: Yicong Yang Link: https://lore.kernel.org/r/20221117084136.53572-5-yangyicong@huawei.com Signed-off-by: Will Deacon Signed-off-by: Slim6882 --- .../admin-guide/perf/hisi-pcie-pmu.rst | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/Documentation/admin-guide/perf/hisi-pcie-pmu.rst b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst index f93869db85a8..4bccea061323 100644 --- a/Documentation/admin-guide/perf/hisi-pcie-pmu.rst +++ b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst @@ -105,6 +105,24 @@ Example usage of perf:: $# perf stat -e hisi_pcie0_core0/rx_mrd_flux,thr_len=0x4,thr_mode=1/ sleep 5 +4. TLP Length filter + + When counting bandwidth, the data can be composed of certain parts of TLP + packets. 
You can specify it through "len_mode": + + - 2'b00: Reserved (Do not use this since the behaviour is undefined) + - 2'b01: Bandwidth of TLP payloads + - 2'b10: Bandwidth of TLP headers + - 2'b11: Bandwidth of both TLP payloads and headers + + For example, "len_mode=2" means only counting the bandwidth of TLP headers + and "len_mode=3" means the final bandwidth data is composed of both TLP + headers and payloads. Default value if not specified is 2'b11. + + Example usage of perf:: + + $# perf stat -e hisi_pcie0_core0/rx_mrd_flux,thr_len=0x4,thr_mode=1/ sleep 5 + 4. TLP Length filter When counting bandwidth, the data can be composed of certain parts of TLP -- Gitee From 91cc6c982f53169b8417bf46d2238b50efc3a970 Mon Sep 17 00:00:00 2001 From: Junhao He Date: Thu, 8 Jun 2023 19:43:26 +0800 Subject: [PATCH 04/18] drivers/perf: hisi: Don't migrate perf to the CPU going to teardown The driver needs to migrate the perf context if the current using CPU going to teardown. By the time calling the cpuhp::teardown() callback the cpu_online_mask() hasn't updated yet and still includes the CPU going to teardown. In current driver's implementation we may migrate the context to the teardown CPU and leads to the below calltrace: ... [ 368.104662][ T932] task:cpuhp/0 state:D stack: 0 pid: 15 ppid: 2 flags:0x00000008 [ 368.113699][ T932] Call trace: [ 368.116834][ T932] __switch_to+0x7c/0xbc [ 368.120924][ T932] __schedule+0x338/0x6f0 [ 368.125098][ T932] schedule+0x50/0xe0 [ 368.128926][ T932] schedule_preempt_disabled+0x18/0x24 [ 368.134229][ T932] __mutex_lock.constprop.0+0x1d4/0x5dc [ 368.139617][ T932] __mutex_lock_slowpath+0x1c/0x30 [ 368.144573][ T932] mutex_lock+0x50/0x60 [ 368.148579][ T932] perf_pmu_migrate_context+0x84/0x2b0 [ 368.153884][ T932] hisi_pcie_pmu_offline_cpu+0x90/0xe0 [hisi_pcie_pmu] [ 368.160579][ T932] cpuhp_invoke_callback+0x2a0/0x650 [ 368.165707][ T932] cpuhp_thread_fun+0xe4/0x190 [ 368.170316][ T932] smpboot_thread_fn+0x15c/0x1a0 [ 368.175099][ T932] kthread+0x108/0x13c [ 368.179012][ T932] ret_from_fork+0x10/0x18 ... Use function cpumask_any_but() to find one correct active cpu to fixes this issue. Fixes: 8404b0fbc7fb ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU") Signed-off-by: Junhao He Reviewed-by: Jonathan Cameron Reviewed-by: Yicong Yang Acked-by: Mark Rutland Link: https://lore.kernel.org/r/20230608114326.27649-1-hejunhao3@huawei.com Signed-off-by: Will Deacon Signed-off-by: Slim6882 --- drivers/perf/hisilicon/hisi_pcie_pmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c index 6fee0b6e163b..e10fc7cb9493 100644 --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c @@ -683,7 +683,7 @@ static int hisi_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *node) pcie_pmu->on_cpu = -1; /* Choose a new CPU from all online cpus. */ - target = cpumask_first(cpu_online_mask); + target = cpumask_any_but(cpu_online_mask, cpu); if (target >= nr_cpu_ids) { pci_err(pcie_pmu->pdev, "There is no CPU to set\n"); return 0; -- Gitee From 0d3e4dcfc5f75b807b17c417f84383c2e95fc794 Mon Sep 17 00:00:00 2001 From: Yicong Yang Date: Tue, 16 Aug 2022 19:44:10 +0800 Subject: [PATCH 05/18] iommu/arm-smmu-v3: Make default domain type of HiSilicon PTT device to identity The DMA operations of HiSilicon PTT device can only work properly with identical mappings. So add a quirk for the device to force the domain as passthrough. 
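
A minimal sketch of such a quirk, assuming it is implemented through the
def_domain_type() callback and using 0xa12e as a placeholder for the PTT
device ID (both are assumptions, not necessarily the exact patch below):

	static int arm_smmu_def_domain_type(struct device *dev)
	{
		if (dev_is_pci(dev)) {
			struct pci_dev *pdev = to_pci_dev(dev);

			/* HiSilicon PTT must use an identity (passthrough) domain */
			if (pdev->vendor == PCI_VENDOR_ID_HUAWEI &&
			    pdev->device == 0xa12e)
				return IOMMU_DOMAIN_IDENTITY;
		}

		return 0;
	}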
Acked-by: Will Deacon Signed-off-by: Yicong Yang Reviewed-by: John Garry Link: https://lore.kernel.org/r/20220816114414.4092-2-yangyicong@huawei.com Signed-off-by: Mathieu Poirier Signed-off-by: Slim6882 --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 3934 +++++++++++++++++++ 1 file changed, 3934 insertions(+) create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c new file mode 100644 index 000000000000..71f7edded9cf --- /dev/null +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -0,0 +1,3934 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * IOMMU API for ARM architected SMMUv3 implementations. + * + * Copyright (C) 2015 ARM Limited + * + * Author: Will Deacon + * + * This driver is powered by bad coffee and bombay mix. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#include "arm-smmu-v3.h" +#include "../../iommu-sva-lib.h" + +static bool disable_bypass = true; +module_param(disable_bypass, bool, 0444); +MODULE_PARM_DESC(disable_bypass, + "Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU."); + +static bool disable_msipolling; +module_param(disable_msipolling, bool, 0444); +MODULE_PARM_DESC(disable_msipolling, + "Disable MSI-based polling for CMD_SYNC completion."); + +enum arm_smmu_msi_index { + EVTQ_MSI_INDEX, + GERROR_MSI_INDEX, + PRIQ_MSI_INDEX, + ARM_SMMU_MAX_MSIS, +}; + +static phys_addr_t arm_smmu_msi_cfg[ARM_SMMU_MAX_MSIS][3] = { + [EVTQ_MSI_INDEX] = { + ARM_SMMU_EVTQ_IRQ_CFG0, + ARM_SMMU_EVTQ_IRQ_CFG1, + ARM_SMMU_EVTQ_IRQ_CFG2, + }, + [GERROR_MSI_INDEX] = { + ARM_SMMU_GERROR_IRQ_CFG0, + ARM_SMMU_GERROR_IRQ_CFG1, + ARM_SMMU_GERROR_IRQ_CFG2, + }, + [PRIQ_MSI_INDEX] = { + ARM_SMMU_PRIQ_IRQ_CFG0, + ARM_SMMU_PRIQ_IRQ_CFG1, + ARM_SMMU_PRIQ_IRQ_CFG2, + }, +}; + +struct arm_smmu_option_prop { + u32 opt; + const char *prop; +}; + +DEFINE_XARRAY_ALLOC1(arm_smmu_asid_xa); +DEFINE_MUTEX(arm_smmu_asid_lock); + +/* + * Special value used by SVA when a process dies, to quiesce a CD without + * disabling it. 
+ */ +struct arm_smmu_ctx_desc quiet_cd = { 0 }; + +static struct arm_smmu_option_prop arm_smmu_options[] = { + { ARM_SMMU_OPT_SKIP_PREFETCH, "hisilicon,broken-prefetch-cmd" }, + { ARM_SMMU_OPT_PAGE0_REGS_ONLY, "cavium,cn9900-broken-page1-regspace"}, + { 0, NULL}, +}; + +static void parse_driver_options(struct arm_smmu_device *smmu) +{ + int i = 0; + + do { + if (of_property_read_bool(smmu->dev->of_node, + arm_smmu_options[i].prop)) { + smmu->options |= arm_smmu_options[i].opt; + dev_notice(smmu->dev, "option %s\n", + arm_smmu_options[i].prop); + } + } while (arm_smmu_options[++i].opt); +} + +/* Low-level queue manipulation functions */ +static bool queue_has_space(struct arm_smmu_ll_queue *q, u32 n) +{ + u32 space, prod, cons; + + prod = Q_IDX(q, q->prod); + cons = Q_IDX(q, q->cons); + + if (Q_WRP(q, q->prod) == Q_WRP(q, q->cons)) + space = (1 << q->max_n_shift) - (prod - cons); + else + space = cons - prod; + + return space >= n; +} + +static bool queue_full(struct arm_smmu_ll_queue *q) +{ + return Q_IDX(q, q->prod) == Q_IDX(q, q->cons) && + Q_WRP(q, q->prod) != Q_WRP(q, q->cons); +} + +static bool queue_empty(struct arm_smmu_ll_queue *q) +{ + return Q_IDX(q, q->prod) == Q_IDX(q, q->cons) && + Q_WRP(q, q->prod) == Q_WRP(q, q->cons); +} + +static bool queue_consumed(struct arm_smmu_ll_queue *q, u32 prod) +{ + return ((Q_WRP(q, q->cons) == Q_WRP(q, prod)) && + (Q_IDX(q, q->cons) > Q_IDX(q, prod))) || + ((Q_WRP(q, q->cons) != Q_WRP(q, prod)) && + (Q_IDX(q, q->cons) <= Q_IDX(q, prod))); +} + +static void queue_sync_cons_out(struct arm_smmu_queue *q) +{ + /* + * Ensure that all CPU accesses (reads and writes) to the queue + * are complete before we update the cons pointer. + */ + __iomb(); + writel_relaxed(q->llq.cons, q->cons_reg); +} + +static void queue_inc_cons(struct arm_smmu_ll_queue *q) +{ + u32 cons = (Q_WRP(q, q->cons) | Q_IDX(q, q->cons)) + 1; + q->cons = Q_OVF(q->cons) | Q_WRP(q, cons) | Q_IDX(q, cons); +} + +static int queue_sync_prod_in(struct arm_smmu_queue *q) +{ + u32 prod; + int ret = 0; + + /* + * We can't use the _relaxed() variant here, as we must prevent + * speculative reads of the queue before we have determined that + * prod has indeed moved. 
+ */ + prod = readl(q->prod_reg); + + if (Q_OVF(prod) != Q_OVF(q->llq.prod)) + ret = -EOVERFLOW; + + q->llq.prod = prod; + return ret; +} + +static u32 queue_inc_prod_n(struct arm_smmu_ll_queue *q, int n) +{ + u32 prod = (Q_WRP(q, q->prod) | Q_IDX(q, q->prod)) + n; + return Q_OVF(q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod); +} + +static void queue_poll_init(struct arm_smmu_device *smmu, + struct arm_smmu_queue_poll *qp) +{ + qp->delay = 1; + qp->spin_cnt = 0; + qp->wfe = !!(smmu->features & ARM_SMMU_FEAT_SEV); + qp->timeout = ktime_add_us(ktime_get(), ARM_SMMU_POLL_TIMEOUT_US); +} + +static int queue_poll(struct arm_smmu_queue_poll *qp) +{ + if (ktime_compare(ktime_get(), qp->timeout) > 0) + return -ETIMEDOUT; + + if (qp->wfe) { + wfe(); + } else if (++qp->spin_cnt < ARM_SMMU_POLL_SPIN_COUNT) { + cpu_relax(); + } else { + udelay(qp->delay); + qp->delay *= 2; + qp->spin_cnt = 0; + } + + return 0; +} + +static void queue_write(__le64 *dst, u64 *src, size_t n_dwords) +{ + int i; + + for (i = 0; i < n_dwords; ++i) + *dst++ = cpu_to_le64(*src++); +} + +static void queue_read(u64 *dst, __le64 *src, size_t n_dwords) +{ + int i; + + for (i = 0; i < n_dwords; ++i) + *dst++ = le64_to_cpu(*src++); +} + +static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent) +{ + if (queue_empty(&q->llq)) + return -EAGAIN; + + queue_read(ent, Q_ENT(q, q->llq.cons), q->ent_dwords); + queue_inc_cons(&q->llq); + queue_sync_cons_out(q); + return 0; +} + +/* High-level queue accessors */ +static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent) +{ + memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT); + cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode); + + switch (ent->opcode) { + case CMDQ_OP_TLBI_EL2_ALL: + case CMDQ_OP_TLBI_NSNH_ALL: + break; + case CMDQ_OP_PREFETCH_CFG: + cmd[0] |= FIELD_PREP(CMDQ_PREFETCH_0_SID, ent->prefetch.sid); + break; + case CMDQ_OP_CFGI_CD: + cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SSID, ent->cfgi.ssid); + fallthrough; + case CMDQ_OP_CFGI_STE: + cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid); + cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_LEAF, ent->cfgi.leaf); + break; + case CMDQ_OP_CFGI_CD_ALL: + cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, ent->cfgi.sid); + break; + case CMDQ_OP_CFGI_ALL: + /* Cover the entire SID range */ + cmd[1] |= FIELD_PREP(CMDQ_CFGI_1_RANGE, 31); + break; + case CMDQ_OP_TLBI_NH_VA: + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid); + fallthrough; + case CMDQ_OP_TLBI_EL2_VA: + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_NUM, ent->tlbi.num); + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_SCALE, ent->tlbi.scale); + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid); + cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf); + cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TTL, ent->tlbi.ttl); + cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TG, ent->tlbi.tg); + cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_VA_MASK; + break; + case CMDQ_OP_TLBI_S2_IPA: + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_NUM, ent->tlbi.num); + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_SCALE, ent->tlbi.scale); + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid); + cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_LEAF, ent->tlbi.leaf); + cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TTL, ent->tlbi.ttl); + cmd[1] |= FIELD_PREP(CMDQ_TLBI_1_TG, ent->tlbi.tg); + cmd[1] |= ent->tlbi.addr & CMDQ_TLBI_1_IPA_MASK; + break; + case CMDQ_OP_TLBI_NH_ASID: + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid); + fallthrough; + case CMDQ_OP_TLBI_S12_VMALL: + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid); + break; + case CMDQ_OP_TLBI_EL2_ASID: + cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid); 
+ break; + case CMDQ_OP_ATC_INV: + cmd[0] |= FIELD_PREP(CMDQ_0_SSV, ent->substream_valid); + cmd[0] |= FIELD_PREP(CMDQ_ATC_0_GLOBAL, ent->atc.global); + cmd[0] |= FIELD_PREP(CMDQ_ATC_0_SSID, ent->atc.ssid); + cmd[0] |= FIELD_PREP(CMDQ_ATC_0_SID, ent->atc.sid); + cmd[1] |= FIELD_PREP(CMDQ_ATC_1_SIZE, ent->atc.size); + cmd[1] |= ent->atc.addr & CMDQ_ATC_1_ADDR_MASK; + break; + case CMDQ_OP_PRI_RESP: + cmd[0] |= FIELD_PREP(CMDQ_0_SSV, ent->substream_valid); + cmd[0] |= FIELD_PREP(CMDQ_PRI_0_SSID, ent->pri.ssid); + cmd[0] |= FIELD_PREP(CMDQ_PRI_0_SID, ent->pri.sid); + cmd[1] |= FIELD_PREP(CMDQ_PRI_1_GRPID, ent->pri.grpid); + switch (ent->pri.resp) { + case PRI_RESP_DENY: + case PRI_RESP_FAIL: + case PRI_RESP_SUCC: + break; + default: + return -EINVAL; + } + cmd[1] |= FIELD_PREP(CMDQ_PRI_1_RESP, ent->pri.resp); + break; + case CMDQ_OP_RESUME: + cmd[0] |= FIELD_PREP(CMDQ_RESUME_0_SID, ent->resume.sid); + cmd[0] |= FIELD_PREP(CMDQ_RESUME_0_RESP, ent->resume.resp); + cmd[1] |= FIELD_PREP(CMDQ_RESUME_1_STAG, ent->resume.stag); + break; + case CMDQ_OP_CMD_SYNC: + if (ent->sync.msiaddr) { + cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_IRQ); + cmd[1] |= ent->sync.msiaddr & CMDQ_SYNC_1_MSIADDR_MASK; + } else { + cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_CS, CMDQ_SYNC_0_CS_SEV); + } + cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSH, ARM_SMMU_SH_ISH); + cmd[0] |= FIELD_PREP(CMDQ_SYNC_0_MSIATTR, ARM_SMMU_MEMATTR_OIWB); + break; + default: + return -ENOENT; + } + + return 0; +} + +static struct arm_smmu_cmdq *arm_smmu_get_cmdq(struct arm_smmu_device *smmu) +{ + return &smmu->cmdq; +} + +static void arm_smmu_cmdq_build_sync_cmd(u64 *cmd, struct arm_smmu_device *smmu, + struct arm_smmu_queue *q, u32 prod) +{ + struct arm_smmu_cmdq_ent ent = { + .opcode = CMDQ_OP_CMD_SYNC, + }; + + /* + * Beware that Hi16xx adds an extra 32 bits of goodness to its MSI + * payload, so the write will zero the entire command on that platform. + */ + if (smmu->options & ARM_SMMU_OPT_MSIPOLL) { + ent.sync.msiaddr = q->base_dma + Q_IDX(&q->llq, prod) * + q->ent_dwords * 8; + } + + arm_smmu_cmdq_build_cmd(cmd, &ent); +} + +static void __arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu, + struct arm_smmu_queue *q) +{ + static const char * const cerror_str[] = { + [CMDQ_ERR_CERROR_NONE_IDX] = "No error", + [CMDQ_ERR_CERROR_ILL_IDX] = "Illegal command", + [CMDQ_ERR_CERROR_ABT_IDX] = "Abort on command fetch", + [CMDQ_ERR_CERROR_ATC_INV_IDX] = "ATC invalidate timeout", + }; + + int i; + u64 cmd[CMDQ_ENT_DWORDS]; + u32 cons = readl_relaxed(q->cons_reg); + u32 idx = FIELD_GET(CMDQ_CONS_ERR, cons); + struct arm_smmu_cmdq_ent cmd_sync = { + .opcode = CMDQ_OP_CMD_SYNC, + }; + + dev_err(smmu->dev, "CMDQ error (cons 0x%08x): %s\n", cons, + idx < ARRAY_SIZE(cerror_str) ? cerror_str[idx] : "Unknown"); + + switch (idx) { + case CMDQ_ERR_CERROR_ABT_IDX: + dev_err(smmu->dev, "retrying command fetch\n"); + return; + case CMDQ_ERR_CERROR_NONE_IDX: + return; + case CMDQ_ERR_CERROR_ATC_INV_IDX: + /* + * ATC Invalidation Completion timeout. CONS is still pointing + * at the CMD_SYNC. Attempt to complete other pending commands + * by repeating the CMD_SYNC, though we might well end up back + * here since the ATC invalidation may still be pending. + */ + return; + case CMDQ_ERR_CERROR_ILL_IDX: + default: + break; + } + + /* + * We may have concurrent producers, so we need to be careful + * not to touch any of the shadow cmdq state. 
+ */ + queue_read(cmd, Q_ENT(q, cons), q->ent_dwords); + dev_err(smmu->dev, "skipping command in error state:\n"); + for (i = 0; i < ARRAY_SIZE(cmd); ++i) + dev_err(smmu->dev, "\t0x%016llx\n", (unsigned long long)cmd[i]); + + /* Convert the erroneous command into a CMD_SYNC */ + arm_smmu_cmdq_build_cmd(cmd, &cmd_sync); + + queue_write(Q_ENT(q, cons), cmd, q->ent_dwords); +} + +static void arm_smmu_cmdq_skip_err(struct arm_smmu_device *smmu) +{ + __arm_smmu_cmdq_skip_err(smmu, &smmu->cmdq.q); +} + +/* + * Command queue locking. + * This is a form of bastardised rwlock with the following major changes: + * + * - The only LOCK routines are exclusive_trylock() and shared_lock(). + * Neither have barrier semantics, and instead provide only a control + * dependency. + * + * - The UNLOCK routines are supplemented with shared_tryunlock(), which + * fails if the caller appears to be the last lock holder (yes, this is + * racy). All successful UNLOCK routines have RELEASE semantics. + */ +static void arm_smmu_cmdq_shared_lock(struct arm_smmu_cmdq *cmdq) +{ + int val; + + /* + * We can try to avoid the cmpxchg() loop by simply incrementing the + * lock counter. When held in exclusive state, the lock counter is set + * to INT_MIN so these increments won't hurt as the value will remain + * negative. + */ + if (atomic_fetch_inc_relaxed(&cmdq->lock) >= 0) + return; + + do { + val = atomic_cond_read_relaxed(&cmdq->lock, VAL >= 0); + } while (atomic_cmpxchg_relaxed(&cmdq->lock, val, val + 1) != val); +} + +static void arm_smmu_cmdq_shared_unlock(struct arm_smmu_cmdq *cmdq) +{ + (void)atomic_dec_return_release(&cmdq->lock); +} + +static bool arm_smmu_cmdq_shared_tryunlock(struct arm_smmu_cmdq *cmdq) +{ + if (atomic_read(&cmdq->lock) == 1) + return false; + + arm_smmu_cmdq_shared_unlock(cmdq); + return true; +} + +#define arm_smmu_cmdq_exclusive_trylock_irqsave(cmdq, flags) \ +({ \ + bool __ret; \ + local_irq_save(flags); \ + __ret = !atomic_cmpxchg_relaxed(&cmdq->lock, 0, INT_MIN); \ + if (!__ret) \ + local_irq_restore(flags); \ + __ret; \ +}) + +#define arm_smmu_cmdq_exclusive_unlock_irqrestore(cmdq, flags) \ +({ \ + atomic_set_release(&cmdq->lock, 0); \ + local_irq_restore(flags); \ +}) + + +/* + * Command queue insertion. + * This is made fiddly by our attempts to achieve some sort of scalability + * since there is one queue shared amongst all of the CPUs in the system. If + * you like mixed-size concurrency, dependency ordering and relaxed atomics, + * then you'll *love* this monstrosity. + * + * The basic idea is to split the queue up into ranges of commands that are + * owned by a given CPU; the owner may not have written all of the commands + * itself, but is responsible for advancing the hardware prod pointer when + * the time comes. The algorithm is roughly: + * + * 1. Allocate some space in the queue. At this point we also discover + * whether the head of the queue is currently owned by another CPU, + * or whether we are the owner. + * + * 2. Write our commands into our allocated slots in the queue. + * + * 3. Mark our slots as valid in arm_smmu_cmdq.valid_map. + * + * 4. If we are an owner: + * a. Wait for the previous owner to finish. + * b. Mark the queue head as unowned, which tells us the range + * that we are responsible for publishing. + * c. Wait for all commands in our owned range to become valid. + * d. Advance the hardware prod pointer. + * e. Tell the next owner we've finished. + * + * 5. 
If we are inserting a CMD_SYNC (we may or may not have been an + * owner), then we need to stick around until it has completed: + * a. If we have MSIs, the SMMU can write back into the CMD_SYNC + * to clear the first 4 bytes. + * b. Otherwise, we spin waiting for the hardware cons pointer to + * advance past our command. + * + * The devil is in the details, particularly the use of locking for handling + * SYNC completion and freeing up space in the queue before we think that it is + * full. + */ +static void __arm_smmu_cmdq_poll_set_valid_map(struct arm_smmu_cmdq *cmdq, + u32 sprod, u32 eprod, bool set) +{ + u32 swidx, sbidx, ewidx, ebidx; + struct arm_smmu_ll_queue llq = { + .max_n_shift = cmdq->q.llq.max_n_shift, + .prod = sprod, + }; + + ewidx = BIT_WORD(Q_IDX(&llq, eprod)); + ebidx = Q_IDX(&llq, eprod) % BITS_PER_LONG; + + while (llq.prod != eprod) { + unsigned long mask; + atomic_long_t *ptr; + u32 limit = BITS_PER_LONG; + + swidx = BIT_WORD(Q_IDX(&llq, llq.prod)); + sbidx = Q_IDX(&llq, llq.prod) % BITS_PER_LONG; + + ptr = &cmdq->valid_map[swidx]; + + if ((swidx == ewidx) && (sbidx < ebidx)) + limit = ebidx; + + mask = GENMASK(limit - 1, sbidx); + + /* + * The valid bit is the inverse of the wrap bit. This means + * that a zero-initialised queue is invalid and, after marking + * all entries as valid, they become invalid again when we + * wrap. + */ + if (set) { + atomic_long_xor(mask, ptr); + } else { /* Poll */ + unsigned long valid; + + valid = (ULONG_MAX + !!Q_WRP(&llq, llq.prod)) & mask; + atomic_long_cond_read_relaxed(ptr, (VAL & mask) == valid); + } + + llq.prod = queue_inc_prod_n(&llq, limit - sbidx); + } +} + +/* Mark all entries in the range [sprod, eprod) as valid */ +static void arm_smmu_cmdq_set_valid_map(struct arm_smmu_cmdq *cmdq, + u32 sprod, u32 eprod) +{ + __arm_smmu_cmdq_poll_set_valid_map(cmdq, sprod, eprod, true); +} + +/* Wait for all entries in the range [sprod, eprod) to become valid */ +static void arm_smmu_cmdq_poll_valid_map(struct arm_smmu_cmdq *cmdq, + u32 sprod, u32 eprod) +{ + __arm_smmu_cmdq_poll_set_valid_map(cmdq, sprod, eprod, false); +} + +/* Wait for the command queue to become non-full */ +static int arm_smmu_cmdq_poll_until_not_full(struct arm_smmu_device *smmu, + struct arm_smmu_ll_queue *llq) +{ + unsigned long flags; + struct arm_smmu_queue_poll qp; + struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu); + int ret = 0; + + /* + * Try to update our copy of cons by grabbing exclusive cmdq access. If + * that fails, spin until somebody else updates it for us. + */ + if (arm_smmu_cmdq_exclusive_trylock_irqsave(cmdq, flags)) { + WRITE_ONCE(cmdq->q.llq.cons, readl_relaxed(cmdq->q.cons_reg)); + arm_smmu_cmdq_exclusive_unlock_irqrestore(cmdq, flags); + llq->val = READ_ONCE(cmdq->q.llq.val); + return 0; + } + + queue_poll_init(smmu, &qp); + do { + llq->val = READ_ONCE(cmdq->q.llq.val); + if (!queue_full(llq)) + break; + + ret = queue_poll(&qp); + } while (!ret); + + return ret; +} + +/* + * Wait until the SMMU signals a CMD_SYNC completion MSI. + * Must be called with the cmdq lock held in some capacity. + */ +static int __arm_smmu_cmdq_poll_until_msi(struct arm_smmu_device *smmu, + struct arm_smmu_ll_queue *llq) +{ + int ret = 0; + struct arm_smmu_queue_poll qp; + struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu); + u32 *cmd = (u32 *)(Q_ENT(&cmdq->q, llq->prod)); + + queue_poll_init(smmu, &qp); + + /* + * The MSI won't generate an event, since it's being written back + * into the command queue. 
+ */ + qp.wfe = false; + smp_cond_load_relaxed(cmd, !VAL || (ret = queue_poll(&qp))); + llq->cons = ret ? llq->prod : queue_inc_prod_n(llq, 1); + return ret; +} + +/* + * Wait until the SMMU cons index passes llq->prod. + * Must be called with the cmdq lock held in some capacity. + */ +static int __arm_smmu_cmdq_poll_until_consumed(struct arm_smmu_device *smmu, + struct arm_smmu_ll_queue *llq) +{ + struct arm_smmu_queue_poll qp; + struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu); + u32 prod = llq->prod; + int ret = 0; + + queue_poll_init(smmu, &qp); + llq->val = READ_ONCE(cmdq->q.llq.val); + do { + if (queue_consumed(llq, prod)) + break; + + ret = queue_poll(&qp); + + /* + * This needs to be a readl() so that our subsequent call + * to arm_smmu_cmdq_shared_tryunlock() can fail accurately. + * + * Specifically, we need to ensure that we observe all + * shared_lock()s by other CMD_SYNCs that share our owner, + * so that a failing call to tryunlock() means that we're + * the last one out and therefore we can safely advance + * cmdq->q.llq.cons. Roughly speaking: + * + * CPU 0 CPU1 CPU2 (us) + * + * if (sync) + * shared_lock(); + * + * dma_wmb(); + * set_valid_map(); + * + * if (owner) { + * poll_valid_map(); + * + * writel(prod_reg); + * + * readl(cons_reg); + * tryunlock(); + * + * Requires us to see CPU 0's shared_lock() acquisition. + */ + llq->cons = readl(cmdq->q.cons_reg); + } while (!ret); + + return ret; +} + +static int arm_smmu_cmdq_poll_until_sync(struct arm_smmu_device *smmu, + struct arm_smmu_ll_queue *llq) +{ + if (smmu->options & ARM_SMMU_OPT_MSIPOLL) + return __arm_smmu_cmdq_poll_until_msi(smmu, llq); + + return __arm_smmu_cmdq_poll_until_consumed(smmu, llq); +} + +static void arm_smmu_cmdq_write_entries(struct arm_smmu_cmdq *cmdq, u64 *cmds, + u32 prod, int n) +{ + int i; + struct arm_smmu_ll_queue llq = { + .max_n_shift = cmdq->q.llq.max_n_shift, + .prod = prod, + }; + + for (i = 0; i < n; ++i) { + u64 *cmd = &cmds[i * CMDQ_ENT_DWORDS]; + + prod = queue_inc_prod_n(&llq, i); + queue_write(Q_ENT(&cmdq->q, prod), cmd, CMDQ_ENT_DWORDS); + } +} + +/* + * This is the actual insertion function, and provides the following + * ordering guarantees to callers: + * + * - There is a dma_wmb() before publishing any commands to the queue. + * This can be relied upon to order prior writes to data structures + * in memory (such as a CD or an STE) before the command. + * + * - On completion of a CMD_SYNC, there is a control dependency. + * This can be relied upon to order subsequent writes to memory (e.g. + * freeing an IOVA) after completion of the CMD_SYNC. + * + * - Command insertion is totally ordered, so if two CPUs each race to + * insert their own list of commands then all of the commands from one + * CPU will appear before any of the commands from the other CPU. + */ +static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, + u64 *cmds, int n, bool sync) +{ + u64 cmd_sync[CMDQ_ENT_DWORDS]; + u32 prod; + unsigned long flags; + bool owner; + struct arm_smmu_cmdq *cmdq = arm_smmu_get_cmdq(smmu); + struct arm_smmu_ll_queue llq, head; + int ret = 0; + + llq.max_n_shift = cmdq->q.llq.max_n_shift; + + /* 1. 
Allocate some space in the queue */ + local_irq_save(flags); + llq.val = READ_ONCE(cmdq->q.llq.val); + do { + u64 old; + + while (!queue_has_space(&llq, n + sync)) { + local_irq_restore(flags); + if (arm_smmu_cmdq_poll_until_not_full(smmu, &llq)) + dev_err_ratelimited(smmu->dev, "CMDQ timeout\n"); + local_irq_save(flags); + } + + head.cons = llq.cons; + head.prod = queue_inc_prod_n(&llq, n + sync) | + CMDQ_PROD_OWNED_FLAG; + + old = cmpxchg_relaxed(&cmdq->q.llq.val, llq.val, head.val); + if (old == llq.val) + break; + + llq.val = old; + } while (1); + owner = !(llq.prod & CMDQ_PROD_OWNED_FLAG); + head.prod &= ~CMDQ_PROD_OWNED_FLAG; + llq.prod &= ~CMDQ_PROD_OWNED_FLAG; + + /* + * 2. Write our commands into the queue + * Dependency ordering from the cmpxchg() loop above. + */ + arm_smmu_cmdq_write_entries(cmdq, cmds, llq.prod, n); + if (sync) { + prod = queue_inc_prod_n(&llq, n); + arm_smmu_cmdq_build_sync_cmd(cmd_sync, smmu, &cmdq->q, prod); + queue_write(Q_ENT(&cmdq->q, prod), cmd_sync, CMDQ_ENT_DWORDS); + + /* + * In order to determine completion of our CMD_SYNC, we must + * ensure that the queue can't wrap twice without us noticing. + * We achieve that by taking the cmdq lock as shared before + * marking our slot as valid. + */ + arm_smmu_cmdq_shared_lock(cmdq); + } + + /* 3. Mark our slots as valid, ensuring commands are visible first */ + dma_wmb(); + arm_smmu_cmdq_set_valid_map(cmdq, llq.prod, head.prod); + + /* 4. If we are the owner, take control of the SMMU hardware */ + if (owner) { + /* a. Wait for previous owner to finish */ + atomic_cond_read_relaxed(&cmdq->owner_prod, VAL == llq.prod); + + /* b. Stop gathering work by clearing the owned flag */ + prod = atomic_fetch_andnot_relaxed(CMDQ_PROD_OWNED_FLAG, + &cmdq->q.llq.atomic.prod); + prod &= ~CMDQ_PROD_OWNED_FLAG; + + /* + * c. Wait for any gathered work to be written to the queue. + * Note that we read our own entries so that we have the control + * dependency required by (d). + */ + arm_smmu_cmdq_poll_valid_map(cmdq, llq.prod, prod); + + /* + * d. Advance the hardware prod pointer + * Control dependency ordering from the entries becoming valid. + */ + writel_relaxed(prod, cmdq->q.prod_reg); + + /* + * e. Tell the next owner we're done + * Make sure we've updated the hardware first, so that we don't + * race to update prod and potentially move it backwards. + */ + atomic_set_release(&cmdq->owner_prod, prod); + } + + /* 5. If we are inserting a CMD_SYNC, we must wait for it to complete */ + if (sync) { + llq.prod = queue_inc_prod_n(&llq, n); + ret = arm_smmu_cmdq_poll_until_sync(smmu, &llq); + if (ret) { + dev_err_ratelimited(smmu->dev, + "CMD_SYNC timeout at 0x%08x [hwprod 0x%08x, hwcons 0x%08x]\n", + llq.prod, + readl_relaxed(cmdq->q.prod_reg), + readl_relaxed(cmdq->q.cons_reg)); + } + + /* + * Try to unlock the cmdq lock. 
This will fail if we're the last + * reader, in which case we can safely update cmdq->q.llq.cons + */ + if (!arm_smmu_cmdq_shared_tryunlock(cmdq)) { + WRITE_ONCE(cmdq->q.llq.cons, llq.cons); + arm_smmu_cmdq_shared_unlock(cmdq); + } + } + + local_irq_restore(flags); + return ret; +} + +static int __arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu, + struct arm_smmu_cmdq_ent *ent, + bool sync) +{ + u64 cmd[CMDQ_ENT_DWORDS]; + + if (unlikely(arm_smmu_cmdq_build_cmd(cmd, ent))) { + dev_warn(smmu->dev, "ignoring unknown CMDQ opcode 0x%x\n", + ent->opcode); + return -EINVAL; + } + + return arm_smmu_cmdq_issue_cmdlist(smmu, cmd, 1, sync); +} + +static int arm_smmu_cmdq_issue_cmd(struct arm_smmu_device *smmu, + struct arm_smmu_cmdq_ent *ent) +{ + return __arm_smmu_cmdq_issue_cmd(smmu, ent, false); +} + +static int arm_smmu_cmdq_issue_cmd_with_sync(struct arm_smmu_device *smmu, + struct arm_smmu_cmdq_ent *ent) +{ + return __arm_smmu_cmdq_issue_cmd(smmu, ent, true); +} + +static void arm_smmu_cmdq_batch_add(struct arm_smmu_device *smmu, + struct arm_smmu_cmdq_batch *cmds, + struct arm_smmu_cmdq_ent *cmd) +{ + int index; + + if (cmds->num == CMDQ_BATCH_ENTRIES) { + arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, false); + cmds->num = 0; + } + + index = cmds->num * CMDQ_ENT_DWORDS; + if (unlikely(arm_smmu_cmdq_build_cmd(&cmds->cmds[index], cmd))) { + dev_warn(smmu->dev, "ignoring unknown CMDQ opcode 0x%x\n", + cmd->opcode); + return; + } + + cmds->num++; +} + +static int arm_smmu_cmdq_batch_submit(struct arm_smmu_device *smmu, + struct arm_smmu_cmdq_batch *cmds) +{ + return arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, true); +} + +static int arm_smmu_page_response(struct device *dev, + struct iommu_fault_event *unused, + struct iommu_page_response *resp) +{ + struct arm_smmu_cmdq_ent cmd = {0}; + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + int sid = master->streams[0].id; + + if (master->stall_enabled) { + cmd.opcode = CMDQ_OP_RESUME; + cmd.resume.sid = sid; + cmd.resume.stag = resp->grpid; + switch (resp->code) { + case IOMMU_PAGE_RESP_INVALID: + case IOMMU_PAGE_RESP_FAILURE: + cmd.resume.resp = CMDQ_RESUME_0_RESP_ABORT; + break; + case IOMMU_PAGE_RESP_SUCCESS: + cmd.resume.resp = CMDQ_RESUME_0_RESP_RETRY; + break; + default: + return -EINVAL; + } + } else { + return -ENODEV; + } + + arm_smmu_cmdq_issue_cmd(master->smmu, &cmd); + /* + * Don't send a SYNC, it doesn't do anything for RESUME or PRI_RESP. + * RESUME consumption guarantees that the stalled transaction will be + * terminated... at some point in the future. PRI_RESP is fire and + * forget. + */ + + return 0; +} + +/* Context descriptor manipulation functions */ +void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid) +{ + struct arm_smmu_cmdq_ent cmd = { + .opcode = smmu->features & ARM_SMMU_FEAT_E2H ? 
+ CMDQ_OP_TLBI_EL2_ASID : CMDQ_OP_TLBI_NH_ASID, + .tlbi.asid = asid, + }; + + arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); +} + +static void arm_smmu_sync_cd(struct arm_smmu_domain *smmu_domain, + int ssid, bool leaf) +{ + size_t i; + unsigned long flags; + struct arm_smmu_master *master; + struct arm_smmu_cmdq_batch cmds; + struct arm_smmu_device *smmu = smmu_domain->smmu; + struct arm_smmu_cmdq_ent cmd = { + .opcode = CMDQ_OP_CFGI_CD, + .cfgi = { + .ssid = ssid, + .leaf = leaf, + }, + }; + + cmds.num = 0; + + spin_lock_irqsave(&smmu_domain->devices_lock, flags); + list_for_each_entry(master, &smmu_domain->devices, domain_head) { + for (i = 0; i < master->num_streams; i++) { + cmd.cfgi.sid = master->streams[i].id; + arm_smmu_cmdq_batch_add(smmu, &cmds, &cmd); + } + } + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); + + arm_smmu_cmdq_batch_submit(smmu, &cmds); +} + +static int arm_smmu_alloc_cd_leaf_table(struct arm_smmu_device *smmu, + struct arm_smmu_l1_ctx_desc *l1_desc) +{ + size_t size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3); + + l1_desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, + &l1_desc->l2ptr_dma, GFP_KERNEL); + if (!l1_desc->l2ptr) { + dev_warn(smmu->dev, + "failed to allocate context descriptor table\n"); + return -ENOMEM; + } + return 0; +} + +static void arm_smmu_write_cd_l1_desc(__le64 *dst, + struct arm_smmu_l1_ctx_desc *l1_desc) +{ + u64 val = (l1_desc->l2ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) | + CTXDESC_L1_DESC_V; + + /* See comment in arm_smmu_write_ctx_desc() */ + WRITE_ONCE(*dst, cpu_to_le64(val)); +} + +static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain *smmu_domain, + u32 ssid) +{ + __le64 *l1ptr; + unsigned int idx; + struct arm_smmu_l1_ctx_desc *l1_desc; + struct arm_smmu_device *smmu = smmu_domain->smmu; + struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg.cdcfg; + + if (smmu_domain->s1_cfg.s1fmt == STRTAB_STE_0_S1FMT_LINEAR) + return cdcfg->cdtab + ssid * CTXDESC_CD_DWORDS; + + idx = ssid >> CTXDESC_SPLIT; + l1_desc = &cdcfg->l1_desc[idx]; + if (!l1_desc->l2ptr) { + if (arm_smmu_alloc_cd_leaf_table(smmu, l1_desc)) + return NULL; + + l1ptr = cdcfg->cdtab + idx * CTXDESC_L1_DESC_DWORDS; + arm_smmu_write_cd_l1_desc(l1ptr, l1_desc); + /* An invalid L1CD can be cached */ + arm_smmu_sync_cd(smmu_domain, ssid, false); + } + idx = ssid & (CTXDESC_L2_ENTRIES - 1); + return l1_desc->l2ptr + idx * CTXDESC_CD_DWORDS; +} + +int arm_smmu_write_ctx_desc(struct arm_smmu_domain *smmu_domain, int ssid, + struct arm_smmu_ctx_desc *cd) +{ + /* + * This function handles the following cases: + * + * (1) Install primary CD, for normal DMA traffic (SSID = 0). + * (2) Install a secondary CD, for SID+SSID traffic. + * (3) Update ASID of a CD. Atomically write the first 64 bits of the + * CD, then invalidate the old entry and mappings. + * (4) Quiesce the context without clearing the valid bit. Disable + * translation, and ignore any translation fault. + * (5) Remove a secondary CD. 
+ */ + u64 val; + bool cd_live; + __le64 *cdptr; + + if (WARN_ON(ssid >= (1 << smmu_domain->s1_cfg.s1cdmax))) + return -E2BIG; + + cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid); + if (!cdptr) + return -ENOMEM; + + val = le64_to_cpu(cdptr[0]); + cd_live = !!(val & CTXDESC_CD_0_V); + + if (!cd) { /* (5) */ + val = 0; + } else if (cd == &quiet_cd) { /* (4) */ + val |= CTXDESC_CD_0_TCR_EPD0; + } else if (cd_live) { /* (3) */ + val &= ~CTXDESC_CD_0_ASID; + val |= FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid); + /* + * Until CD+TLB invalidation, both ASIDs may be used for tagging + * this substream's traffic + */ + } else { /* (1) and (2) */ + cdptr[1] = cpu_to_le64(cd->ttbr & CTXDESC_CD_1_TTB0_MASK); + cdptr[2] = 0; + cdptr[3] = cpu_to_le64(cd->mair); + + /* + * STE is live, and the SMMU might read dwords of this CD in any + * order. Ensure that it observes valid values before reading + * V=1. + */ + arm_smmu_sync_cd(smmu_domain, ssid, true); + + val = cd->tcr | +#ifdef __BIG_ENDIAN + CTXDESC_CD_0_ENDI | +#endif + CTXDESC_CD_0_R | CTXDESC_CD_0_A | + (cd->mm ? 0 : CTXDESC_CD_0_ASET) | + CTXDESC_CD_0_AA64 | + FIELD_PREP(CTXDESC_CD_0_ASID, cd->asid) | + CTXDESC_CD_0_V; + + if (smmu_domain->stall_enabled) + val |= CTXDESC_CD_0_S; + } + + /* + * The SMMU accesses 64-bit values atomically. See IHI0070Ca 3.21.3 + * "Configuration structures and configuration invalidation completion" + * + * The size of single-copy atomic reads made by the SMMU is + * IMPLEMENTATION DEFINED but must be at least 64 bits. Any single + * field within an aligned 64-bit span of a structure can be altered + * without first making the structure invalid. + */ + WRITE_ONCE(cdptr[0], cpu_to_le64(val)); + arm_smmu_sync_cd(smmu_domain, ssid, true); + return 0; +} + +static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain) +{ + int ret; + size_t l1size; + size_t max_contexts; + struct arm_smmu_device *smmu = smmu_domain->smmu; + struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg; + struct arm_smmu_ctx_desc_cfg *cdcfg = &cfg->cdcfg; + + max_contexts = 1 << cfg->s1cdmax; + + if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB) || + max_contexts <= CTXDESC_L2_ENTRIES) { + cfg->s1fmt = STRTAB_STE_0_S1FMT_LINEAR; + cdcfg->num_l1_ents = max_contexts; + + l1size = max_contexts * (CTXDESC_CD_DWORDS << 3); + } else { + cfg->s1fmt = STRTAB_STE_0_S1FMT_64K_L2; + cdcfg->num_l1_ents = DIV_ROUND_UP(max_contexts, + CTXDESC_L2_ENTRIES); + + cdcfg->l1_desc = devm_kcalloc(smmu->dev, cdcfg->num_l1_ents, + sizeof(*cdcfg->l1_desc), + GFP_KERNEL); + if (!cdcfg->l1_desc) + return -ENOMEM; + + l1size = cdcfg->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3); + } + + cdcfg->cdtab = dmam_alloc_coherent(smmu->dev, l1size, &cdcfg->cdtab_dma, + GFP_KERNEL); + if (!cdcfg->cdtab) { + dev_warn(smmu->dev, "failed to allocate context descriptor\n"); + ret = -ENOMEM; + goto err_free_l1; + } + + return 0; + +err_free_l1: + if (cdcfg->l1_desc) { + devm_kfree(smmu->dev, cdcfg->l1_desc); + cdcfg->l1_desc = NULL; + } + return ret; +} + +static void arm_smmu_free_cd_tables(struct arm_smmu_domain *smmu_domain) +{ + int i; + size_t size, l1size; + struct arm_smmu_device *smmu = smmu_domain->smmu; + struct arm_smmu_ctx_desc_cfg *cdcfg = &smmu_domain->s1_cfg.cdcfg; + + if (cdcfg->l1_desc) { + size = CTXDESC_L2_ENTRIES * (CTXDESC_CD_DWORDS << 3); + + for (i = 0; i < cdcfg->num_l1_ents; i++) { + if (!cdcfg->l1_desc[i].l2ptr) + continue; + + dmam_free_coherent(smmu->dev, size, + cdcfg->l1_desc[i].l2ptr, + cdcfg->l1_desc[i].l2ptr_dma); + } + devm_kfree(smmu->dev, 
cdcfg->l1_desc); + cdcfg->l1_desc = NULL; + + l1size = cdcfg->num_l1_ents * (CTXDESC_L1_DESC_DWORDS << 3); + } else { + l1size = cdcfg->num_l1_ents * (CTXDESC_CD_DWORDS << 3); + } + + dmam_free_coherent(smmu->dev, l1size, cdcfg->cdtab, cdcfg->cdtab_dma); + cdcfg->cdtab_dma = 0; + cdcfg->cdtab = NULL; +} + +bool arm_smmu_free_asid(struct arm_smmu_ctx_desc *cd) +{ + bool free; + struct arm_smmu_ctx_desc *old_cd; + + if (!cd->asid) + return false; + + free = refcount_dec_and_test(&cd->refs); + if (free) { + old_cd = xa_erase(&arm_smmu_asid_xa, cd->asid); + WARN_ON(old_cd != cd); + } + return free; +} + +/* Stream table manipulation functions */ +static void +arm_smmu_write_strtab_l1_desc(__le64 *dst, struct arm_smmu_strtab_l1_desc *desc) +{ + u64 val = 0; + + val |= FIELD_PREP(STRTAB_L1_DESC_SPAN, desc->span); + val |= desc->l2ptr_dma & STRTAB_L1_DESC_L2PTR_MASK; + + /* See comment in arm_smmu_write_ctx_desc() */ + WRITE_ONCE(*dst, cpu_to_le64(val)); +} + +static void arm_smmu_sync_ste_for_sid(struct arm_smmu_device *smmu, u32 sid) +{ + struct arm_smmu_cmdq_ent cmd = { + .opcode = CMDQ_OP_CFGI_STE, + .cfgi = { + .sid = sid, + .leaf = true, + }, + }; + + arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); +} + +static void arm_smmu_write_strtab_ent(struct arm_smmu_master *master, u32 sid, + __le64 *dst) +{ + /* + * This is hideously complicated, but we only really care about + * three cases at the moment: + * + * 1. Invalid (all zero) -> bypass/fault (init) + * 2. Bypass/fault -> translation/bypass (attach) + * 3. Translation/bypass -> bypass/fault (detach) + * + * Given that we can't update the STE atomically and the SMMU + * doesn't read the thing in a defined order, that leaves us + * with the following maintenance requirements: + * + * 1. Update Config, return (init time STEs aren't live) + * 2. Write everything apart from dword 0, sync, write dword 0, sync + * 3. Update Config, sync + */ + u64 val = le64_to_cpu(dst[0]); + bool ste_live = false; + struct arm_smmu_device *smmu = NULL; + struct arm_smmu_s1_cfg *s1_cfg = NULL; + struct arm_smmu_s2_cfg *s2_cfg = NULL; + struct arm_smmu_domain *smmu_domain = NULL; + struct arm_smmu_cmdq_ent prefetch_cmd = { + .opcode = CMDQ_OP_PREFETCH_CFG, + .prefetch = { + .sid = sid, + }, + }; + + if (master) { + smmu_domain = master->domain; + smmu = master->smmu; + } + + if (smmu_domain) { + switch (smmu_domain->stage) { + case ARM_SMMU_DOMAIN_S1: + s1_cfg = &smmu_domain->s1_cfg; + break; + case ARM_SMMU_DOMAIN_S2: + case ARM_SMMU_DOMAIN_NESTED: + s2_cfg = &smmu_domain->s2_cfg; + break; + default: + break; + } + } + + if (val & STRTAB_STE_0_V) { + switch (FIELD_GET(STRTAB_STE_0_CFG, val)) { + case STRTAB_STE_0_CFG_BYPASS: + break; + case STRTAB_STE_0_CFG_S1_TRANS: + case STRTAB_STE_0_CFG_S2_TRANS: + ste_live = true; + break; + case STRTAB_STE_0_CFG_ABORT: + BUG_ON(!disable_bypass); + break; + default: + BUG(); /* STE corruption */ + } + } + + /* Nuke the existing STE_0 value, as we're going to rewrite it */ + val = STRTAB_STE_0_V; + + /* Bypass/fault */ + if (!smmu_domain || !(s1_cfg || s2_cfg)) { + if (!smmu_domain && disable_bypass) + val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT); + else + val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS); + + dst[0] = cpu_to_le64(val); + dst[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, + STRTAB_STE_1_SHCFG_INCOMING)); + dst[2] = 0; /* Nuke the VMID */ + /* + * The SMMU can perform negative caching, so we must sync + * the STE regardless of whether the old value was live. 
+ */ + if (smmu) + arm_smmu_sync_ste_for_sid(smmu, sid); + return; + } + + if (s1_cfg) { + u64 strw = smmu->features & ARM_SMMU_FEAT_E2H ? + STRTAB_STE_1_STRW_EL2 : STRTAB_STE_1_STRW_NSEL1; + + BUG_ON(ste_live); + dst[1] = cpu_to_le64( + FIELD_PREP(STRTAB_STE_1_S1DSS, STRTAB_STE_1_S1DSS_SSID0) | + FIELD_PREP(STRTAB_STE_1_S1CIR, STRTAB_STE_1_S1C_CACHE_WBRA) | + FIELD_PREP(STRTAB_STE_1_S1COR, STRTAB_STE_1_S1C_CACHE_WBRA) | + FIELD_PREP(STRTAB_STE_1_S1CSH, ARM_SMMU_SH_ISH) | + FIELD_PREP(STRTAB_STE_1_STRW, strw)); + + if (smmu->features & ARM_SMMU_FEAT_STALLS && + !master->stall_enabled) + dst[1] |= cpu_to_le64(STRTAB_STE_1_S1STALLD); + + val |= (s1_cfg->cdcfg.cdtab_dma & STRTAB_STE_0_S1CTXPTR_MASK) | + FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S1_TRANS) | + FIELD_PREP(STRTAB_STE_0_S1CDMAX, s1_cfg->s1cdmax) | + FIELD_PREP(STRTAB_STE_0_S1FMT, s1_cfg->s1fmt); + } + + if (s2_cfg) { + BUG_ON(ste_live); + dst[2] = cpu_to_le64( + FIELD_PREP(STRTAB_STE_2_S2VMID, s2_cfg->vmid) | + FIELD_PREP(STRTAB_STE_2_VTCR, s2_cfg->vtcr) | +#ifdef __BIG_ENDIAN + STRTAB_STE_2_S2ENDI | +#endif + STRTAB_STE_2_S2PTW | STRTAB_STE_2_S2AA64 | + STRTAB_STE_2_S2R); + + dst[3] = cpu_to_le64(s2_cfg->vttbr & STRTAB_STE_3_S2TTB_MASK); + + val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_S2_TRANS); + } + + if (master->ats_enabled) + dst[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_EATS, + STRTAB_STE_1_EATS_TRANS)); + + arm_smmu_sync_ste_for_sid(smmu, sid); + /* See comment in arm_smmu_write_ctx_desc() */ + WRITE_ONCE(dst[0], cpu_to_le64(val)); + arm_smmu_sync_ste_for_sid(smmu, sid); + + /* It's likely that we'll want to use the new STE soon */ + if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) + arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd); +} + +static void arm_smmu_init_bypass_stes(__le64 *strtab, unsigned int nent, bool force) +{ + unsigned int i; + u64 val = STRTAB_STE_0_V; + + if (disable_bypass && !force) + val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_ABORT); + else + val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS); + + for (i = 0; i < nent; ++i) { + strtab[0] = cpu_to_le64(val); + strtab[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, + STRTAB_STE_1_SHCFG_INCOMING)); + strtab[2] = 0; + strtab += STRTAB_STE_DWORDS; + } +} + +static int arm_smmu_init_l2_strtab(struct arm_smmu_device *smmu, u32 sid) +{ + size_t size; + void *strtab; + struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; + struct arm_smmu_strtab_l1_desc *desc = &cfg->l1_desc[sid >> STRTAB_SPLIT]; + + if (desc->l2ptr) + return 0; + + size = 1 << (STRTAB_SPLIT + ilog2(STRTAB_STE_DWORDS) + 3); + strtab = &cfg->strtab[(sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS]; + + desc->span = STRTAB_SPLIT + 1; + desc->l2ptr = dmam_alloc_coherent(smmu->dev, size, &desc->l2ptr_dma, + GFP_KERNEL); + if (!desc->l2ptr) { + dev_err(smmu->dev, + "failed to allocate l2 stream table for SID %u\n", + sid); + return -ENOMEM; + } + + arm_smmu_init_bypass_stes(desc->l2ptr, 1 << STRTAB_SPLIT, false); + arm_smmu_write_strtab_l1_desc(strtab, desc); + return 0; +} + +static struct arm_smmu_master * +arm_smmu_find_master(struct arm_smmu_device *smmu, u32 sid) +{ + struct rb_node *node; + struct arm_smmu_stream *stream; + + lockdep_assert_held(&smmu->streams_mutex); + + node = smmu->streams.rb_node; + while (node) { + stream = rb_entry(node, struct arm_smmu_stream, node); + if (stream->id < sid) + node = node->rb_right; + else if (stream->id > sid) + node = node->rb_left; + else + return stream->master; + } + + return NULL; +} + +/* IRQ and event handlers */ 
+static int arm_smmu_handle_evt(struct arm_smmu_device *smmu, u64 *evt) +{ + int ret; + u32 reason; + u32 perm = 0; + struct arm_smmu_master *master; + bool ssid_valid = evt[0] & EVTQ_0_SSV; + u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]); + struct iommu_fault_event fault_evt = { }; + struct iommu_fault *flt = &fault_evt.fault; + + switch (FIELD_GET(EVTQ_0_ID, evt[0])) { + case EVT_ID_TRANSLATION_FAULT: + reason = IOMMU_FAULT_REASON_PTE_FETCH; + break; + case EVT_ID_ADDR_SIZE_FAULT: + reason = IOMMU_FAULT_REASON_OOR_ADDRESS; + break; + case EVT_ID_ACCESS_FAULT: + reason = IOMMU_FAULT_REASON_ACCESS; + break; + case EVT_ID_PERMISSION_FAULT: + reason = IOMMU_FAULT_REASON_PERMISSION; + break; + default: + return -EOPNOTSUPP; + } + + /* Stage-2 is always pinned at the moment */ + if (evt[1] & EVTQ_1_S2) + return -EFAULT; + + if (evt[1] & EVTQ_1_RnW) + perm |= IOMMU_FAULT_PERM_READ; + else + perm |= IOMMU_FAULT_PERM_WRITE; + + if (evt[1] & EVTQ_1_InD) + perm |= IOMMU_FAULT_PERM_EXEC; + + if (evt[1] & EVTQ_1_PnU) + perm |= IOMMU_FAULT_PERM_PRIV; + + if (evt[1] & EVTQ_1_STALL) { + flt->type = IOMMU_FAULT_PAGE_REQ; + flt->prm = (struct iommu_fault_page_request) { + .flags = IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE, + .grpid = FIELD_GET(EVTQ_1_STAG, evt[1]), + .perm = perm, + .addr = FIELD_GET(EVTQ_2_ADDR, evt[2]), + }; + + if (ssid_valid) { + flt->prm.flags |= IOMMU_FAULT_PAGE_REQUEST_PASID_VALID; + flt->prm.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]); + } + } else { + flt->type = IOMMU_FAULT_DMA_UNRECOV; + flt->event = (struct iommu_fault_unrecoverable) { + .reason = reason, + .flags = IOMMU_FAULT_UNRECOV_ADDR_VALID, + .perm = perm, + .addr = FIELD_GET(EVTQ_2_ADDR, evt[2]), + }; + + if (ssid_valid) { + flt->event.flags |= IOMMU_FAULT_UNRECOV_PASID_VALID; + flt->event.pasid = FIELD_GET(EVTQ_0_SSID, evt[0]); + } + } + + mutex_lock(&smmu->streams_mutex); + master = arm_smmu_find_master(smmu, sid); + if (!master) { + ret = -EINVAL; + goto out_unlock; + } + + ret = iommu_report_device_fault(master->dev, &fault_evt); + if (ret && flt->type == IOMMU_FAULT_PAGE_REQ) { + /* Nobody cared, abort the access */ + struct iommu_page_response resp = { + .pasid = flt->prm.pasid, + .grpid = flt->prm.grpid, + .code = IOMMU_PAGE_RESP_FAILURE, + }; + arm_smmu_page_response(master->dev, &fault_evt, &resp); + } + +out_unlock: + mutex_unlock(&smmu->streams_mutex); + return ret; +} + +static irqreturn_t arm_smmu_evtq_thread(int irq, void *dev) +{ + int i, ret; + struct arm_smmu_device *smmu = dev; + struct arm_smmu_queue *q = &smmu->evtq.q; + struct arm_smmu_ll_queue *llq = &q->llq; + static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL, + DEFAULT_RATELIMIT_BURST); + u64 evt[EVTQ_ENT_DWORDS]; + + do { + while (!queue_remove_raw(q, evt)) { + u8 id = FIELD_GET(EVTQ_0_ID, evt[0]); + + ret = arm_smmu_handle_evt(smmu, evt); + if (!ret || !__ratelimit(&rs)) + continue; + + dev_info(smmu->dev, "event 0x%02x received:\n", id); + for (i = 0; i < ARRAY_SIZE(evt); ++i) + dev_info(smmu->dev, "\t0x%016llx\n", + (unsigned long long)evt[i]); + + cond_resched(); + } + + /* + * Not much we can do on overflow, so scream and pretend we're + * trying harder. 
+ */ + if (queue_sync_prod_in(q) == -EOVERFLOW) + dev_err(smmu->dev, "EVTQ overflow detected -- events lost\n"); + } while (!queue_empty(llq)); + + /* Sync our overflow flag, as we believe we're up to speed */ + llq->cons = Q_OVF(llq->prod) | Q_WRP(llq, llq->cons) | + Q_IDX(llq, llq->cons); + return IRQ_HANDLED; +} + +static void arm_smmu_handle_ppr(struct arm_smmu_device *smmu, u64 *evt) +{ + u32 sid, ssid; + u16 grpid; + bool ssv, last; + + sid = FIELD_GET(PRIQ_0_SID, evt[0]); + ssv = FIELD_GET(PRIQ_0_SSID_V, evt[0]); + ssid = ssv ? FIELD_GET(PRIQ_0_SSID, evt[0]) : 0; + last = FIELD_GET(PRIQ_0_PRG_LAST, evt[0]); + grpid = FIELD_GET(PRIQ_1_PRG_IDX, evt[1]); + + dev_info(smmu->dev, "unexpected PRI request received:\n"); + dev_info(smmu->dev, + "\tsid 0x%08x.0x%05x: [%u%s] %sprivileged %s%s%s access at iova 0x%016llx\n", + sid, ssid, grpid, last ? "L" : "", + evt[0] & PRIQ_0_PERM_PRIV ? "" : "un", + evt[0] & PRIQ_0_PERM_READ ? "R" : "", + evt[0] & PRIQ_0_PERM_WRITE ? "W" : "", + evt[0] & PRIQ_0_PERM_EXEC ? "X" : "", + evt[1] & PRIQ_1_ADDR_MASK); + + if (last) { + struct arm_smmu_cmdq_ent cmd = { + .opcode = CMDQ_OP_PRI_RESP, + .substream_valid = ssv, + .pri = { + .sid = sid, + .ssid = ssid, + .grpid = grpid, + .resp = PRI_RESP_DENY, + }, + }; + + arm_smmu_cmdq_issue_cmd(smmu, &cmd); + } +} + +static irqreturn_t arm_smmu_priq_thread(int irq, void *dev) +{ + struct arm_smmu_device *smmu = dev; + struct arm_smmu_queue *q = &smmu->priq.q; + struct arm_smmu_ll_queue *llq = &q->llq; + u64 evt[PRIQ_ENT_DWORDS]; + + do { + while (!queue_remove_raw(q, evt)) + arm_smmu_handle_ppr(smmu, evt); + + if (queue_sync_prod_in(q) == -EOVERFLOW) + dev_err(smmu->dev, "PRIQ overflow detected -- requests lost\n"); + } while (!queue_empty(llq)); + + /* Sync our overflow flag, as we believe we're up to speed */ + llq->cons = Q_OVF(llq->prod) | Q_WRP(llq, llq->cons) | + Q_IDX(llq, llq->cons); + queue_sync_cons_out(q); + return IRQ_HANDLED; +} + +static int arm_smmu_device_disable(struct arm_smmu_device *smmu); + +static irqreturn_t arm_smmu_gerror_handler(int irq, void *dev) +{ + u32 gerror, gerrorn, active; + struct arm_smmu_device *smmu = dev; + + gerror = readl_relaxed(smmu->base + ARM_SMMU_GERROR); + gerrorn = readl_relaxed(smmu->base + ARM_SMMU_GERRORN); + + active = gerror ^ gerrorn; + if (!(active & GERROR_ERR_MASK)) + return IRQ_NONE; /* No errors pending */ + + dev_warn(smmu->dev, + "unexpected global error reported (0x%08x), this could be serious\n", + active); + + if (active & GERROR_SFM_ERR) { + dev_err(smmu->dev, "device has entered Service Failure Mode!\n"); + arm_smmu_device_disable(smmu); + } + + if (active & GERROR_MSI_GERROR_ABT_ERR) + dev_warn(smmu->dev, "GERROR MSI write aborted\n"); + + if (active & GERROR_MSI_PRIQ_ABT_ERR) + dev_warn(smmu->dev, "PRIQ MSI write aborted\n"); + + if (active & GERROR_MSI_EVTQ_ABT_ERR) + dev_warn(smmu->dev, "EVTQ MSI write aborted\n"); + + if (active & GERROR_MSI_CMDQ_ABT_ERR) + dev_warn(smmu->dev, "CMDQ MSI write aborted\n"); + + if (active & GERROR_PRIQ_ABT_ERR) + dev_err(smmu->dev, "PRIQ write aborted -- events may have been lost\n"); + + if (active & GERROR_EVTQ_ABT_ERR) + dev_err(smmu->dev, "EVTQ write aborted -- events may have been lost\n"); + + if (active & GERROR_CMDQ_ERR) + arm_smmu_cmdq_skip_err(smmu); + + writel(gerror, smmu->base + ARM_SMMU_GERRORN); + return IRQ_HANDLED; +} + +static irqreturn_t arm_smmu_combined_irq_thread(int irq, void *dev) +{ + struct arm_smmu_device *smmu = dev; + + arm_smmu_evtq_thread(irq, dev); + if (smmu->features & 
ARM_SMMU_FEAT_PRI) + arm_smmu_priq_thread(irq, dev); + + return IRQ_HANDLED; +} + +static irqreturn_t arm_smmu_combined_irq_handler(int irq, void *dev) +{ + arm_smmu_gerror_handler(irq, dev); + return IRQ_WAKE_THREAD; +} + +static void +arm_smmu_atc_inv_to_cmd(int ssid, unsigned long iova, size_t size, + struct arm_smmu_cmdq_ent *cmd) +{ + size_t log2_span; + size_t span_mask; + /* ATC invalidates are always on 4096-bytes pages */ + size_t inval_grain_shift = 12; + unsigned long page_start, page_end; + + /* + * ATS and PASID: + * + * If substream_valid is clear, the PCIe TLP is sent without a PASID + * prefix. In that case all ATC entries within the address range are + * invalidated, including those that were requested with a PASID! There + * is no way to invalidate only entries without PASID. + * + * When using STRTAB_STE_1_S1DSS_SSID0 (reserving CD 0 for non-PASID + * traffic), translation requests without PASID create ATC entries + * without PASID, which must be invalidated with substream_valid clear. + * This has the unpleasant side-effect of invalidating all PASID-tagged + * ATC entries within the address range. + */ + *cmd = (struct arm_smmu_cmdq_ent) { + .opcode = CMDQ_OP_ATC_INV, + .substream_valid = !!ssid, + .atc.ssid = ssid, + }; + + if (!size) { + cmd->atc.size = ATC_INV_SIZE_ALL; + return; + } + + page_start = iova >> inval_grain_shift; + page_end = (iova + size - 1) >> inval_grain_shift; + + /* + * In an ATS Invalidate Request, the address must be aligned on the + * range size, which must be a power of two number of page sizes. We + * thus have to choose between grossly over-invalidating the region, or + * splitting the invalidation into multiple commands. For simplicity + * we'll go with the first solution, but should refine it in the future + * if multiple commands are shown to be more efficient. + * + * Find the smallest power of two that covers the range. The most + * significant differing bit between the start and end addresses, + * fls(start ^ end), indicates the required span. For example: + * + * We want to invalidate pages [8; 11]. This is already the ideal range: + * x = 0b1000 ^ 0b1011 = 0b11 + * span = 1 << fls(x) = 4 + * + * To invalidate pages [7; 10], we need to invalidate [0; 15]: + * x = 0b0111 ^ 0b1010 = 0b1101 + * span = 1 << fls(x) = 16 + */ + log2_span = fls_long(page_start ^ page_end); + span_mask = (1ULL << log2_span) - 1; + + page_start &= ~span_mask; + + cmd->atc.addr = page_start << inval_grain_shift; + cmd->atc.size = log2_span; +} + +static int arm_smmu_atc_inv_master(struct arm_smmu_master *master) +{ + int i; + struct arm_smmu_cmdq_ent cmd; + struct arm_smmu_cmdq_batch cmds; + + arm_smmu_atc_inv_to_cmd(0, 0, 0, &cmd); + + cmds.num = 0; + for (i = 0; i < master->num_streams; i++) { + cmd.atc.sid = master->streams[i].id; + arm_smmu_cmdq_batch_add(master->smmu, &cmds, &cmd); + } + + return arm_smmu_cmdq_batch_submit(master->smmu, &cmds); +} + +int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain, int ssid, + unsigned long iova, size_t size) +{ + int i; + unsigned long flags; + struct arm_smmu_cmdq_ent cmd; + struct arm_smmu_master *master; + struct arm_smmu_cmdq_batch cmds; + + if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS)) + return 0; + + /* + * Ensure that we've completed prior invalidation of the main TLBs + * before we read 'nr_ats_masters' in case of a concurrent call to + * arm_smmu_enable_ats(): + * + * // unmap() // arm_smmu_enable_ats() + * TLBI+SYNC atomic_inc(&nr_ats_masters); + * smp_mb(); [...] 
+ * atomic_read(&nr_ats_masters); pci_enable_ats() // writel() + * + * Ensures that we always see the incremented 'nr_ats_masters' count if + * ATS was enabled at the PCI device before completion of the TLBI. + */ + smp_mb(); + if (!atomic_read(&smmu_domain->nr_ats_masters)) + return 0; + + arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd); + + cmds.num = 0; + + spin_lock_irqsave(&smmu_domain->devices_lock, flags); + list_for_each_entry(master, &smmu_domain->devices, domain_head) { + if (!master->ats_enabled) + continue; + + for (i = 0; i < master->num_streams; i++) { + cmd.atc.sid = master->streams[i].id; + arm_smmu_cmdq_batch_add(smmu_domain->smmu, &cmds, &cmd); + } + } + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); + + return arm_smmu_cmdq_batch_submit(smmu_domain->smmu, &cmds); +} + +/* IO_PGTABLE API */ +static void arm_smmu_tlb_inv_context(void *cookie) +{ + struct arm_smmu_domain *smmu_domain = cookie; + struct arm_smmu_device *smmu = smmu_domain->smmu; + struct arm_smmu_cmdq_ent cmd; + + /* + * NOTE: when io-pgtable is in non-strict mode, we may get here with + * PTEs previously cleared by unmaps on the current CPU not yet visible + * to the SMMU. We are relying on the dma_wmb() implicit during cmd + * insertion to guarantee those are observed before the TLBI. Do be + * careful, 007. + */ + if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { + arm_smmu_tlb_inv_asid(smmu, smmu_domain->s1_cfg.cd.asid); + } else { + cmd.opcode = CMDQ_OP_TLBI_S12_VMALL; + cmd.tlbi.vmid = smmu_domain->s2_cfg.vmid; + arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); + } + arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0); +} + +static void __arm_smmu_tlb_inv_range(struct arm_smmu_cmdq_ent *cmd, + unsigned long iova, size_t size, + size_t granule, + struct arm_smmu_domain *smmu_domain) +{ + struct arm_smmu_device *smmu = smmu_domain->smmu; + unsigned long end = iova + size, num_pages = 0, tg = 0; + size_t inv_range = granule; + struct arm_smmu_cmdq_batch cmds; + + if (!size) + return; + + if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) { + /* Get the leaf page size */ + tg = __ffs(smmu_domain->domain.pgsize_bitmap); + + /* Convert page size of 12,14,16 (log2) to 1,2,3 */ + cmd->tlbi.tg = (tg - 10) / 2; + + /* Determine what level the granule is at */ + cmd->tlbi.ttl = 4 - ((ilog2(granule) - 3) / (tg - 3)); + + num_pages = size >> tg; + } + + cmds.num = 0; + + while (iova < end) { + if (smmu->features & ARM_SMMU_FEAT_RANGE_INV) { + /* + * On each iteration of the loop, the range is 5 bits + * worth of the aligned size remaining. 
+ * The range in pages is: + * + * range = (num_pages & (0x1f << __ffs(num_pages))) + */ + unsigned long scale, num; + + /* Determine the power of 2 multiple number of pages */ + scale = __ffs(num_pages); + cmd->tlbi.scale = scale; + + /* Determine how many chunks of 2^scale size we have */ + num = (num_pages >> scale) & CMDQ_TLBI_RANGE_NUM_MAX; + cmd->tlbi.num = num - 1; + + /* range is num * 2^scale * pgsize */ + inv_range = num << (scale + tg); + + /* Clear out the lower order bits for the next iteration */ + num_pages -= num << scale; + } + + cmd->tlbi.addr = iova; + arm_smmu_cmdq_batch_add(smmu, &cmds, cmd); + iova += inv_range; + } + arm_smmu_cmdq_batch_submit(smmu, &cmds); +} + +static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size, + size_t granule, bool leaf, + struct arm_smmu_domain *smmu_domain) +{ + struct arm_smmu_cmdq_ent cmd = { + .tlbi = { + .leaf = leaf, + }, + }; + + if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { + cmd.opcode = smmu_domain->smmu->features & ARM_SMMU_FEAT_E2H ? + CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA; + cmd.tlbi.asid = smmu_domain->s1_cfg.cd.asid; + } else { + cmd.opcode = CMDQ_OP_TLBI_S2_IPA; + cmd.tlbi.vmid = smmu_domain->s2_cfg.vmid; + } + __arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain); + + /* + * Unfortunately, this can't be leaf-only since we may have + * zapped an entire table. + */ + arm_smmu_atc_inv_domain(smmu_domain, 0, iova, size); +} + +void arm_smmu_tlb_inv_range_asid(unsigned long iova, size_t size, int asid, + size_t granule, bool leaf, + struct arm_smmu_domain *smmu_domain) +{ + struct arm_smmu_cmdq_ent cmd = { + .opcode = smmu_domain->smmu->features & ARM_SMMU_FEAT_E2H ? + CMDQ_OP_TLBI_EL2_VA : CMDQ_OP_TLBI_NH_VA, + .tlbi = { + .asid = asid, + .leaf = leaf, + }, + }; + + __arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain); +} + +static void arm_smmu_tlb_inv_page_nosync(struct iommu_iotlb_gather *gather, + unsigned long iova, size_t granule, + void *cookie) +{ + struct arm_smmu_domain *smmu_domain = cookie; + struct iommu_domain *domain = &smmu_domain->domain; + + iommu_iotlb_gather_add_page(domain, gather, iova, granule); +} + +static void arm_smmu_tlb_inv_walk(unsigned long iova, size_t size, + size_t granule, void *cookie) +{ + arm_smmu_tlb_inv_range_domain(iova, size, granule, false, cookie); +} + +static const struct iommu_flush_ops arm_smmu_flush_ops = { + .tlb_flush_all = arm_smmu_tlb_inv_context, + .tlb_flush_walk = arm_smmu_tlb_inv_walk, + .tlb_add_page = arm_smmu_tlb_inv_page_nosync, +}; + +/* IOMMU API */ +static bool arm_smmu_capable(enum iommu_cap cap) +{ + switch (cap) { + case IOMMU_CAP_CACHE_COHERENCY: + return true; + case IOMMU_CAP_NOEXEC: + return true; + default: + return false; + } +} + +static struct iommu_domain *arm_smmu_domain_alloc(unsigned type) +{ + struct arm_smmu_domain *smmu_domain; + + if (type != IOMMU_DOMAIN_UNMANAGED && + type != IOMMU_DOMAIN_DMA && + type != IOMMU_DOMAIN_DMA_FQ && + type != IOMMU_DOMAIN_IDENTITY) + return NULL; + + /* + * Allocate the domain and initialise some of its data structures. + * We can't really do anything meaningful until we've added a + * master. 
+ */ + smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL); + if (!smmu_domain) + return NULL; + + mutex_init(&smmu_domain->init_mutex); + INIT_LIST_HEAD(&smmu_domain->devices); + spin_lock_init(&smmu_domain->devices_lock); + INIT_LIST_HEAD(&smmu_domain->mmu_notifiers); + + return &smmu_domain->domain; +} + +static int arm_smmu_bitmap_alloc(unsigned long *map, int span) +{ + int idx, size = 1 << span; + + do { + idx = find_first_zero_bit(map, size); + if (idx == size) + return -ENOSPC; + } while (test_and_set_bit(idx, map)); + + return idx; +} + +static void arm_smmu_bitmap_free(unsigned long *map, int idx) +{ + clear_bit(idx, map); +} + +static void arm_smmu_domain_free(struct iommu_domain *domain) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + + free_io_pgtable_ops(smmu_domain->pgtbl_ops); + + /* Free the CD and ASID, if we allocated them */ + if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1) { + struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg; + + /* Prevent SVA from touching the CD while we're freeing it */ + mutex_lock(&arm_smmu_asid_lock); + if (cfg->cdcfg.cdtab) + arm_smmu_free_cd_tables(smmu_domain); + arm_smmu_free_asid(&cfg->cd); + mutex_unlock(&arm_smmu_asid_lock); + } else { + struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg; + if (cfg->vmid) + arm_smmu_bitmap_free(smmu->vmid_map, cfg->vmid); + } + + kfree(smmu_domain); +} + +static int arm_smmu_domain_finalise_s1(struct arm_smmu_domain *smmu_domain, + struct arm_smmu_master *master, + struct io_pgtable_cfg *pgtbl_cfg) +{ + int ret; + u32 asid; + struct arm_smmu_device *smmu = smmu_domain->smmu; + struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg; + typeof(&pgtbl_cfg->arm_lpae_s1_cfg.tcr) tcr = &pgtbl_cfg->arm_lpae_s1_cfg.tcr; + + refcount_set(&cfg->cd.refs, 1); + + /* Prevent SVA from modifying the ASID until it is written to the CD */ + mutex_lock(&arm_smmu_asid_lock); + ret = xa_alloc(&arm_smmu_asid_xa, &asid, &cfg->cd, + XA_LIMIT(1, (1 << smmu->asid_bits) - 1), GFP_KERNEL); + if (ret) + goto out_unlock; + + cfg->s1cdmax = master->ssid_bits; + + smmu_domain->stall_enabled = master->stall_enabled; + + ret = arm_smmu_alloc_cd_tables(smmu_domain); + if (ret) + goto out_free_asid; + + cfg->cd.asid = (u16)asid; + cfg->cd.ttbr = pgtbl_cfg->arm_lpae_s1_cfg.ttbr; + cfg->cd.tcr = FIELD_PREP(CTXDESC_CD_0_TCR_T0SZ, tcr->tsz) | + FIELD_PREP(CTXDESC_CD_0_TCR_TG0, tcr->tg) | + FIELD_PREP(CTXDESC_CD_0_TCR_IRGN0, tcr->irgn) | + FIELD_PREP(CTXDESC_CD_0_TCR_ORGN0, tcr->orgn) | + FIELD_PREP(CTXDESC_CD_0_TCR_SH0, tcr->sh) | + FIELD_PREP(CTXDESC_CD_0_TCR_IPS, tcr->ips) | + CTXDESC_CD_0_TCR_EPD1 | CTXDESC_CD_0_AA64; + cfg->cd.mair = pgtbl_cfg->arm_lpae_s1_cfg.mair; + + /* + * Note that this will end up calling arm_smmu_sync_cd() before + * the master has been added to the devices list for this domain. + * This isn't an issue because the STE hasn't been installed yet. 
+ */ + ret = arm_smmu_write_ctx_desc(smmu_domain, 0, &cfg->cd); + if (ret) + goto out_free_cd_tables; + + mutex_unlock(&arm_smmu_asid_lock); + return 0; + +out_free_cd_tables: + arm_smmu_free_cd_tables(smmu_domain); +out_free_asid: + arm_smmu_free_asid(&cfg->cd); +out_unlock: + mutex_unlock(&arm_smmu_asid_lock); + return ret; +} + +static int arm_smmu_domain_finalise_s2(struct arm_smmu_domain *smmu_domain, + struct arm_smmu_master *master, + struct io_pgtable_cfg *pgtbl_cfg) +{ + int vmid; + struct arm_smmu_device *smmu = smmu_domain->smmu; + struct arm_smmu_s2_cfg *cfg = &smmu_domain->s2_cfg; + typeof(&pgtbl_cfg->arm_lpae_s2_cfg.vtcr) vtcr; + + vmid = arm_smmu_bitmap_alloc(smmu->vmid_map, smmu->vmid_bits); + if (vmid < 0) + return vmid; + + vtcr = &pgtbl_cfg->arm_lpae_s2_cfg.vtcr; + cfg->vmid = (u16)vmid; + cfg->vttbr = pgtbl_cfg->arm_lpae_s2_cfg.vttbr; + cfg->vtcr = FIELD_PREP(STRTAB_STE_2_VTCR_S2T0SZ, vtcr->tsz) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2SL0, vtcr->sl) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2IR0, vtcr->irgn) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2OR0, vtcr->orgn) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2SH0, vtcr->sh) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2TG, vtcr->tg) | + FIELD_PREP(STRTAB_STE_2_VTCR_S2PS, vtcr->ps); + return 0; +} + +static int arm_smmu_domain_finalise(struct iommu_domain *domain, + struct arm_smmu_master *master) +{ + int ret; + unsigned long ias, oas; + enum io_pgtable_fmt fmt; + struct io_pgtable_cfg pgtbl_cfg; + struct io_pgtable_ops *pgtbl_ops; + int (*finalise_stage_fn)(struct arm_smmu_domain *, + struct arm_smmu_master *, + struct io_pgtable_cfg *); + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_device *smmu = smmu_domain->smmu; + + if (domain->type == IOMMU_DOMAIN_IDENTITY) { + smmu_domain->stage = ARM_SMMU_DOMAIN_BYPASS; + return 0; + } + + /* Restrict the stage to what we can actually support */ + if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S1)) + smmu_domain->stage = ARM_SMMU_DOMAIN_S2; + if (!(smmu->features & ARM_SMMU_FEAT_TRANS_S2)) + smmu_domain->stage = ARM_SMMU_DOMAIN_S1; + + switch (smmu_domain->stage) { + case ARM_SMMU_DOMAIN_S1: + ias = (smmu->features & ARM_SMMU_FEAT_VAX) ? 
52 : 48; + ias = min_t(unsigned long, ias, VA_BITS); + oas = smmu->ias; + fmt = ARM_64_LPAE_S1; + finalise_stage_fn = arm_smmu_domain_finalise_s1; + break; + case ARM_SMMU_DOMAIN_NESTED: + case ARM_SMMU_DOMAIN_S2: + ias = smmu->ias; + oas = smmu->oas; + fmt = ARM_64_LPAE_S2; + finalise_stage_fn = arm_smmu_domain_finalise_s2; + break; + default: + return -EINVAL; + } + + pgtbl_cfg = (struct io_pgtable_cfg) { + .pgsize_bitmap = smmu->pgsize_bitmap, + .ias = ias, + .oas = oas, + .coherent_walk = smmu->features & ARM_SMMU_FEAT_COHERENCY, + .tlb = &arm_smmu_flush_ops, + .iommu_dev = smmu->dev, + }; + + pgtbl_ops = alloc_io_pgtable_ops(fmt, &pgtbl_cfg, smmu_domain); + if (!pgtbl_ops) + return -ENOMEM; + + domain->pgsize_bitmap = pgtbl_cfg.pgsize_bitmap; + domain->geometry.aperture_end = (1UL << pgtbl_cfg.ias) - 1; + domain->geometry.force_aperture = true; + + ret = finalise_stage_fn(smmu_domain, master, &pgtbl_cfg); + if (ret < 0) { + free_io_pgtable_ops(pgtbl_ops); + return ret; + } + + smmu_domain->pgtbl_ops = pgtbl_ops; + return 0; +} + +static __le64 *arm_smmu_get_step_for_sid(struct arm_smmu_device *smmu, u32 sid) +{ + __le64 *step; + struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; + + if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) { + struct arm_smmu_strtab_l1_desc *l1_desc; + int idx; + + /* Two-level walk */ + idx = (sid >> STRTAB_SPLIT) * STRTAB_L1_DESC_DWORDS; + l1_desc = &cfg->l1_desc[idx]; + idx = (sid & ((1 << STRTAB_SPLIT) - 1)) * STRTAB_STE_DWORDS; + step = &l1_desc->l2ptr[idx]; + } else { + /* Simple linear lookup */ + step = &cfg->strtab[sid * STRTAB_STE_DWORDS]; + } + + return step; +} + +static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master) +{ + int i, j; + struct arm_smmu_device *smmu = master->smmu; + + for (i = 0; i < master->num_streams; ++i) { + u32 sid = master->streams[i].id; + __le64 *step = arm_smmu_get_step_for_sid(smmu, sid); + + /* Bridged PCI devices may end up with duplicated IDs */ + for (j = 0; j < i; j++) + if (master->streams[j].id == sid) + break; + if (j < i) + continue; + + arm_smmu_write_strtab_ent(master, sid, step); + } +} + +static bool arm_smmu_ats_supported(struct arm_smmu_master *master) +{ + struct device *dev = master->dev; + struct arm_smmu_device *smmu = master->smmu; + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); + + if (!(smmu->features & ARM_SMMU_FEAT_ATS)) + return false; + + if (!(fwspec->flags & IOMMU_FWSPEC_PCI_RC_ATS)) + return false; + + return dev_is_pci(dev) && pci_ats_supported(to_pci_dev(dev)); +} + +static void arm_smmu_enable_ats(struct arm_smmu_master *master) +{ + size_t stu; + struct pci_dev *pdev; + struct arm_smmu_device *smmu = master->smmu; + struct arm_smmu_domain *smmu_domain = master->domain; + + /* Don't enable ATS at the endpoint if it's not enabled in the STE */ + if (!master->ats_enabled) + return; + + /* Smallest Translation Unit: log2 of the smallest supported granule */ + stu = __ffs(smmu->pgsize_bitmap); + pdev = to_pci_dev(master->dev); + + atomic_inc(&smmu_domain->nr_ats_masters); + arm_smmu_atc_inv_domain(smmu_domain, 0, 0, 0); + if (pci_enable_ats(pdev, stu)) + dev_err(master->dev, "Failed to enable ATS (STU %zu)\n", stu); +} + +static void arm_smmu_disable_ats(struct arm_smmu_master *master) +{ + struct arm_smmu_domain *smmu_domain = master->domain; + + if (!master->ats_enabled) + return; + + pci_disable_ats(to_pci_dev(master->dev)); + /* + * Ensure ATS is disabled at the endpoint before we issue the + * ATC invalidation via the SMMU. 
+ */ + wmb(); + arm_smmu_atc_inv_master(master); + atomic_dec(&smmu_domain->nr_ats_masters); +} + +static int arm_smmu_enable_pasid(struct arm_smmu_master *master) +{ + int ret; + int features; + int num_pasids; + struct pci_dev *pdev; + + if (!dev_is_pci(master->dev)) + return -ENODEV; + + pdev = to_pci_dev(master->dev); + + features = pci_pasid_features(pdev); + if (features < 0) + return features; + + num_pasids = pci_max_pasids(pdev); + if (num_pasids <= 0) + return num_pasids; + + ret = pci_enable_pasid(pdev, features); + if (ret) { + dev_err(&pdev->dev, "Failed to enable PASID\n"); + return ret; + } + + master->ssid_bits = min_t(u8, ilog2(num_pasids), + master->smmu->ssid_bits); + return 0; +} + +static void arm_smmu_disable_pasid(struct arm_smmu_master *master) +{ + struct pci_dev *pdev; + + if (!dev_is_pci(master->dev)) + return; + + pdev = to_pci_dev(master->dev); + + if (!pdev->pasid_enabled) + return; + + master->ssid_bits = 0; + pci_disable_pasid(pdev); +} + +static void arm_smmu_detach_dev(struct arm_smmu_master *master) +{ + unsigned long flags; + struct arm_smmu_domain *smmu_domain = master->domain; + + if (!smmu_domain) + return; + + arm_smmu_disable_ats(master); + + spin_lock_irqsave(&smmu_domain->devices_lock, flags); + list_del(&master->domain_head); + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); + + master->domain = NULL; + master->ats_enabled = false; + arm_smmu_install_ste_for_dev(master); +} + +static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev) +{ + int ret = 0; + unsigned long flags; + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); + struct arm_smmu_device *smmu; + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct arm_smmu_master *master; + + if (!fwspec) + return -ENOENT; + + master = dev_iommu_priv_get(dev); + smmu = master->smmu; + + /* + * Checking that SVA is disabled ensures that this device isn't bound to + * any mm, and can be safely detached from its old domain. Bonds cannot + * be removed concurrently since we're holding the group mutex. + */ + if (arm_smmu_master_sva_enabled(master)) { + dev_err(dev, "cannot attach - SVA enabled\n"); + return -EBUSY; + } + + arm_smmu_detach_dev(master); + + mutex_lock(&smmu_domain->init_mutex); + + if (!smmu_domain->smmu) { + smmu_domain->smmu = smmu; + ret = arm_smmu_domain_finalise(domain, master); + if (ret) { + smmu_domain->smmu = NULL; + goto out_unlock; + } + } else if (smmu_domain->smmu != smmu) { + dev_err(dev, + "cannot attach to SMMU %s (upstream of %s)\n", + dev_name(smmu_domain->smmu->dev), + dev_name(smmu->dev)); + ret = -ENXIO; + goto out_unlock; + } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 && + master->ssid_bits != smmu_domain->s1_cfg.s1cdmax) { + dev_err(dev, + "cannot attach to incompatible domain (%u SSID bits != %u)\n", + smmu_domain->s1_cfg.s1cdmax, master->ssid_bits); + ret = -EINVAL; + goto out_unlock; + } else if (smmu_domain->stage == ARM_SMMU_DOMAIN_S1 && + smmu_domain->stall_enabled != master->stall_enabled) { + dev_err(dev, "cannot attach to stall-%s domain\n", + smmu_domain->stall_enabled ? 
"enabled" : "disabled"); + ret = -EINVAL; + goto out_unlock; + } + + master->domain = smmu_domain; + + if (smmu_domain->stage != ARM_SMMU_DOMAIN_BYPASS) + master->ats_enabled = arm_smmu_ats_supported(master); + + arm_smmu_install_ste_for_dev(master); + + spin_lock_irqsave(&smmu_domain->devices_lock, flags); + list_add(&master->domain_head, &smmu_domain->devices); + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); + + arm_smmu_enable_ats(master); + +out_unlock: + mutex_unlock(&smmu_domain->init_mutex); + return ret; +} + +static int arm_smmu_map_pages(struct iommu_domain *domain, unsigned long iova, + phys_addr_t paddr, size_t pgsize, size_t pgcount, + int prot, gfp_t gfp, size_t *mapped) +{ + struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; + + if (!ops) + return -ENODEV; + + return ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp, mapped); +} + +static size_t arm_smmu_unmap_pages(struct iommu_domain *domain, unsigned long iova, + size_t pgsize, size_t pgcount, + struct iommu_iotlb_gather *gather) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + struct io_pgtable_ops *ops = smmu_domain->pgtbl_ops; + + if (!ops) + return 0; + + return ops->unmap_pages(ops, iova, pgsize, pgcount, gather); +} + +static void arm_smmu_flush_iotlb_all(struct iommu_domain *domain) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + + if (smmu_domain->smmu) + arm_smmu_tlb_inv_context(smmu_domain); +} + +static void arm_smmu_iotlb_sync(struct iommu_domain *domain, + struct iommu_iotlb_gather *gather) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + + if (!gather->pgsize) + return; + + arm_smmu_tlb_inv_range_domain(gather->start, + gather->end - gather->start + 1, + gather->pgsize, true, smmu_domain); +} + +static phys_addr_t +arm_smmu_iova_to_phys(struct iommu_domain *domain, dma_addr_t iova) +{ + struct io_pgtable_ops *ops = to_smmu_domain(domain)->pgtbl_ops; + + if (!ops) + return 0; + + return ops->iova_to_phys(ops, iova); +} + +static struct platform_driver arm_smmu_driver; + +static +struct arm_smmu_device *arm_smmu_get_by_fwnode(struct fwnode_handle *fwnode) +{ + struct device *dev = driver_find_device_by_fwnode(&arm_smmu_driver.driver, + fwnode); + put_device(dev); + return dev ? 
dev_get_drvdata(dev) : NULL; +} + +static bool arm_smmu_sid_in_range(struct arm_smmu_device *smmu, u32 sid) +{ + unsigned long limit = smmu->strtab_cfg.num_l1_ents; + + if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) + limit *= 1UL << STRTAB_SPLIT; + + return sid < limit; +} + +static int arm_smmu_init_sid_strtab(struct arm_smmu_device *smmu, u32 sid) +{ + /* Check the SIDs are in range of the SMMU and our stream table */ + if (!arm_smmu_sid_in_range(smmu, sid)) + return -ERANGE; + + /* Ensure l2 strtab is initialised */ + if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) + return arm_smmu_init_l2_strtab(smmu, sid); + + return 0; +} + +static int arm_smmu_insert_master(struct arm_smmu_device *smmu, + struct arm_smmu_master *master) +{ + int i; + int ret = 0; + struct arm_smmu_stream *new_stream, *cur_stream; + struct rb_node **new_node, *parent_node = NULL; + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev); + + master->streams = kcalloc(fwspec->num_ids, sizeof(*master->streams), + GFP_KERNEL); + if (!master->streams) + return -ENOMEM; + master->num_streams = fwspec->num_ids; + + mutex_lock(&smmu->streams_mutex); + for (i = 0; i < fwspec->num_ids; i++) { + u32 sid = fwspec->ids[i]; + + new_stream = &master->streams[i]; + new_stream->id = sid; + new_stream->master = master; + + ret = arm_smmu_init_sid_strtab(smmu, sid); + if (ret) + break; + + /* Insert into SID tree */ + new_node = &(smmu->streams.rb_node); + while (*new_node) { + cur_stream = rb_entry(*new_node, struct arm_smmu_stream, + node); + parent_node = *new_node; + if (cur_stream->id > new_stream->id) { + new_node = &((*new_node)->rb_left); + } else if (cur_stream->id < new_stream->id) { + new_node = &((*new_node)->rb_right); + } else { + dev_warn(master->dev, + "stream %u already in tree\n", + cur_stream->id); + ret = -EINVAL; + break; + } + } + if (ret) + break; + + rb_link_node(&new_stream->node, parent_node, new_node); + rb_insert_color(&new_stream->node, &smmu->streams); + } + + if (ret) { + for (i--; i >= 0; i--) + rb_erase(&master->streams[i].node, &smmu->streams); + kfree(master->streams); + } + mutex_unlock(&smmu->streams_mutex); + + return ret; +} + +static void arm_smmu_remove_master(struct arm_smmu_master *master) +{ + int i; + struct arm_smmu_device *smmu = master->smmu; + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev); + + if (!smmu || !master->streams) + return; + + mutex_lock(&smmu->streams_mutex); + for (i = 0; i < fwspec->num_ids; i++) + rb_erase(&master->streams[i].node, &smmu->streams); + mutex_unlock(&smmu->streams_mutex); + + kfree(master->streams); +} + +static struct iommu_ops arm_smmu_ops; + +static struct iommu_device *arm_smmu_probe_device(struct device *dev) +{ + int ret; + struct arm_smmu_device *smmu; + struct arm_smmu_master *master; + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); + + if (!fwspec || fwspec->ops != &arm_smmu_ops) + return ERR_PTR(-ENODEV); + + if (WARN_ON_ONCE(dev_iommu_priv_get(dev))) + return ERR_PTR(-EBUSY); + + smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode); + if (!smmu) + return ERR_PTR(-ENODEV); + + master = kzalloc(sizeof(*master), GFP_KERNEL); + if (!master) + return ERR_PTR(-ENOMEM); + + master->dev = dev; + master->smmu = smmu; + INIT_LIST_HEAD(&master->bonds); + dev_iommu_priv_set(dev, master); + + ret = arm_smmu_insert_master(smmu, master); + if (ret) + goto err_free_master; + + device_property_read_u32(dev, "pasid-num-bits", &master->ssid_bits); + master->ssid_bits = min(smmu->ssid_bits, master->ssid_bits); + + /* + * Note 
that PASID must be enabled before, and disabled after ATS: + * PCI Express Base 4.0r1.0 - 10.5.1.3 ATS Control Register + * + * Behavior is undefined if this bit is Set and the value of the PASID + * Enable, Execute Requested Enable, or Privileged Mode Requested bits + * are changed. + */ + arm_smmu_enable_pasid(master); + + if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB)) + master->ssid_bits = min_t(u8, master->ssid_bits, + CTXDESC_LINEAR_CDMAX); + + if ((smmu->features & ARM_SMMU_FEAT_STALLS && + device_property_read_bool(dev, "dma-can-stall")) || + smmu->features & ARM_SMMU_FEAT_STALL_FORCE) + master->stall_enabled = true; + + return &smmu->iommu; + +err_free_master: + kfree(master); + dev_iommu_priv_set(dev, NULL); + return ERR_PTR(ret); +} + +static void arm_smmu_release_device(struct device *dev) +{ + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + + if (WARN_ON(arm_smmu_master_sva_enabled(master))) + iopf_queue_remove_device(master->smmu->evtq.iopf, dev); + arm_smmu_detach_dev(master); + arm_smmu_disable_pasid(master); + arm_smmu_remove_master(master); + kfree(master); +} + +static struct iommu_group *arm_smmu_device_group(struct device *dev) +{ + struct iommu_group *group; + + /* + * We don't support devices sharing stream IDs other than PCI RID + * aliases, since the necessary ID-to-device lookup becomes rather + * impractical given a potential sparse 32-bit stream ID space. + */ + if (dev_is_pci(dev)) + group = pci_device_group(dev); + else + group = generic_device_group(dev); + + return group; +} + +static int arm_smmu_enable_nesting(struct iommu_domain *domain) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + int ret = 0; + + mutex_lock(&smmu_domain->init_mutex); + if (smmu_domain->smmu) + ret = -EPERM; + else + smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED; + mutex_unlock(&smmu_domain->init_mutex); + + return ret; +} + +static int arm_smmu_of_xlate(struct device *dev, struct of_phandle_args *args) +{ + return iommu_fwspec_add_ids(dev, args->args, 1); +} + +static void arm_smmu_get_resv_regions(struct device *dev, + struct list_head *head) +{ + struct iommu_resv_region *region; + int prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO; + + region = iommu_alloc_resv_region(MSI_IOVA_BASE, MSI_IOVA_LENGTH, + prot, IOMMU_RESV_SW_MSI); + if (!region) + return; + + list_add_tail(®ion->list, head); + + iommu_dma_get_resv_regions(dev, head); +} + +static int arm_smmu_dev_enable_feature(struct device *dev, + enum iommu_dev_features feat) +{ + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + + if (!master) + return -ENODEV; + + switch (feat) { + case IOMMU_DEV_FEAT_IOPF: + if (!arm_smmu_master_iopf_supported(master)) + return -EINVAL; + if (master->iopf_enabled) + return -EBUSY; + master->iopf_enabled = true; + return 0; + case IOMMU_DEV_FEAT_SVA: + if (!arm_smmu_master_sva_supported(master)) + return -EINVAL; + if (arm_smmu_master_sva_enabled(master)) + return -EBUSY; + return arm_smmu_master_enable_sva(master); + default: + return -EINVAL; + } +} + +static int arm_smmu_dev_disable_feature(struct device *dev, + enum iommu_dev_features feat) +{ + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + + if (!master) + return -EINVAL; + + switch (feat) { + case IOMMU_DEV_FEAT_IOPF: + if (!master->iopf_enabled) + return -EINVAL; + if (master->sva_enabled) + return -EBUSY; + master->iopf_enabled = false; + return 0; + case IOMMU_DEV_FEAT_SVA: + if (!arm_smmu_master_sva_enabled(master)) + return -EINVAL; + return arm_smmu_master_disable_sva(master); 
+ default: + return -EINVAL; + } +} + +/* + * HiSilicon PCIe tune and trace device can be used to trace TLP headers on the + * PCIe link and save the data to memory by DMA. The hardware is restricted to + * use identity mapping only. + */ +#define IS_HISI_PTT_DEVICE(pdev) ((pdev)->vendor == PCI_VENDOR_ID_HUAWEI && \ + (pdev)->device == 0xa12e) + +static int arm_smmu_def_domain_type(struct device *dev) +{ + if (dev_is_pci(dev)) { + struct pci_dev *pdev = to_pci_dev(dev); + + if (IS_HISI_PTT_DEVICE(pdev)) + return IOMMU_DOMAIN_IDENTITY; + } + + return 0; +} + +static struct iommu_ops arm_smmu_ops = { + .capable = arm_smmu_capable, + .domain_alloc = arm_smmu_domain_alloc, + .probe_device = arm_smmu_probe_device, + .release_device = arm_smmu_release_device, + .device_group = arm_smmu_device_group, + .of_xlate = arm_smmu_of_xlate, + .get_resv_regions = arm_smmu_get_resv_regions, + .dev_enable_feat = arm_smmu_dev_enable_feature, + .dev_disable_feat = arm_smmu_dev_disable_feature, + .sva_bind = arm_smmu_sva_bind, + .sva_unbind = arm_smmu_sva_unbind, + .sva_get_pasid = arm_smmu_sva_get_pasid, + .page_response = arm_smmu_page_response, + .def_domain_type = arm_smmu_def_domain_type, + .pgsize_bitmap = -1UL, /* Restricted during device attach */ + .owner = THIS_MODULE, + .default_domain_ops = &(const struct iommu_domain_ops) { + .attach_dev = arm_smmu_attach_dev, + .map_pages = arm_smmu_map_pages, + .unmap_pages = arm_smmu_unmap_pages, + .flush_iotlb_all = arm_smmu_flush_iotlb_all, + .iotlb_sync = arm_smmu_iotlb_sync, + .iova_to_phys = arm_smmu_iova_to_phys, + .enable_nesting = arm_smmu_enable_nesting, + .free = arm_smmu_domain_free, + } +}; + +/* Probing and initialisation functions */ +static int arm_smmu_init_one_queue(struct arm_smmu_device *smmu, + struct arm_smmu_queue *q, + void __iomem *page, + unsigned long prod_off, + unsigned long cons_off, + size_t dwords, const char *name) +{ + size_t qsz; + + do { + qsz = ((1 << q->llq.max_n_shift) * dwords) << 3; + q->base = dmam_alloc_coherent(smmu->dev, qsz, &q->base_dma, + GFP_KERNEL); + if (q->base || qsz < PAGE_SIZE) + break; + + q->llq.max_n_shift--; + } while (1); + + if (!q->base) { + dev_err(smmu->dev, + "failed to allocate queue (0x%zx bytes) for %s\n", + qsz, name); + return -ENOMEM; + } + + if (!WARN_ON(q->base_dma & (qsz - 1))) { + dev_info(smmu->dev, "allocated %u entries for %s\n", + 1 << q->llq.max_n_shift, name); + } + + q->prod_reg = page + prod_off; + q->cons_reg = page + cons_off; + q->ent_dwords = dwords; + + q->q_base = Q_BASE_RWA; + q->q_base |= q->base_dma & Q_BASE_ADDR_MASK; + q->q_base |= FIELD_PREP(Q_BASE_LOG2SIZE, q->llq.max_n_shift); + + q->llq.prod = q->llq.cons = 0; + return 0; +} + +static int arm_smmu_cmdq_init(struct arm_smmu_device *smmu) +{ + struct arm_smmu_cmdq *cmdq = &smmu->cmdq; + unsigned int nents = 1 << cmdq->q.llq.max_n_shift; + + atomic_set(&cmdq->owner_prod, 0); + atomic_set(&cmdq->lock, 0); + + cmdq->valid_map = (atomic_long_t *)devm_bitmap_zalloc(smmu->dev, nents, + GFP_KERNEL); + if (!cmdq->valid_map) + return -ENOMEM; + + return 0; +} + +static int arm_smmu_init_queues(struct arm_smmu_device *smmu) +{ + int ret; + + /* cmdq */ + ret = arm_smmu_init_one_queue(smmu, &smmu->cmdq.q, smmu->base, + ARM_SMMU_CMDQ_PROD, ARM_SMMU_CMDQ_CONS, + CMDQ_ENT_DWORDS, "cmdq"); + if (ret) + return ret; + + ret = arm_smmu_cmdq_init(smmu); + if (ret) + return ret; + + /* evtq */ + ret = arm_smmu_init_one_queue(smmu, &smmu->evtq.q, smmu->page1, + ARM_SMMU_EVTQ_PROD, ARM_SMMU_EVTQ_CONS, + EVTQ_ENT_DWORDS, "evtq"); + if (ret) 
+ return ret; + + if ((smmu->features & ARM_SMMU_FEAT_SVA) && + (smmu->features & ARM_SMMU_FEAT_STALLS)) { + smmu->evtq.iopf = iopf_queue_alloc(dev_name(smmu->dev)); + if (!smmu->evtq.iopf) + return -ENOMEM; + } + + /* priq */ + if (!(smmu->features & ARM_SMMU_FEAT_PRI)) + return 0; + + return arm_smmu_init_one_queue(smmu, &smmu->priq.q, smmu->page1, + ARM_SMMU_PRIQ_PROD, ARM_SMMU_PRIQ_CONS, + PRIQ_ENT_DWORDS, "priq"); +} + +static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu) +{ + unsigned int i; + struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; + void *strtab = smmu->strtab_cfg.strtab; + + cfg->l1_desc = devm_kcalloc(smmu->dev, cfg->num_l1_ents, + sizeof(*cfg->l1_desc), GFP_KERNEL); + if (!cfg->l1_desc) + return -ENOMEM; + + for (i = 0; i < cfg->num_l1_ents; ++i) { + arm_smmu_write_strtab_l1_desc(strtab, &cfg->l1_desc[i]); + strtab += STRTAB_L1_DESC_DWORDS << 3; + } + + return 0; +} + +static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) +{ + void *strtab; + u64 reg; + u32 size, l1size; + struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; + + /* Calculate the L1 size, capped to the SIDSIZE. */ + size = STRTAB_L1_SZ_SHIFT - (ilog2(STRTAB_L1_DESC_DWORDS) + 3); + size = min(size, smmu->sid_bits - STRTAB_SPLIT); + cfg->num_l1_ents = 1 << size; + + size += STRTAB_SPLIT; + if (size < smmu->sid_bits) + dev_warn(smmu->dev, + "2-level strtab only covers %u/%u bits of SID\n", + size, smmu->sid_bits); + + l1size = cfg->num_l1_ents * (STRTAB_L1_DESC_DWORDS << 3); + strtab = dmam_alloc_coherent(smmu->dev, l1size, &cfg->strtab_dma, + GFP_KERNEL); + if (!strtab) { + dev_err(smmu->dev, + "failed to allocate l1 stream table (%u bytes)\n", + l1size); + return -ENOMEM; + } + cfg->strtab = strtab; + + /* Configure strtab_base_cfg for 2 levels */ + reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_2LVL); + reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, size); + reg |= FIELD_PREP(STRTAB_BASE_CFG_SPLIT, STRTAB_SPLIT); + cfg->strtab_base_cfg = reg; + + return arm_smmu_init_l1_strtab(smmu); +} + +static int arm_smmu_init_strtab_linear(struct arm_smmu_device *smmu) +{ + void *strtab; + u64 reg; + u32 size; + struct arm_smmu_strtab_cfg *cfg = &smmu->strtab_cfg; + + size = (1 << smmu->sid_bits) * (STRTAB_STE_DWORDS << 3); + strtab = dmam_alloc_coherent(smmu->dev, size, &cfg->strtab_dma, + GFP_KERNEL); + if (!strtab) { + dev_err(smmu->dev, + "failed to allocate linear stream table (%u bytes)\n", + size); + return -ENOMEM; + } + cfg->strtab = strtab; + cfg->num_l1_ents = 1 << smmu->sid_bits; + + /* Configure strtab_base_cfg for a linear table covering all SIDs */ + reg = FIELD_PREP(STRTAB_BASE_CFG_FMT, STRTAB_BASE_CFG_FMT_LINEAR); + reg |= FIELD_PREP(STRTAB_BASE_CFG_LOG2SIZE, smmu->sid_bits); + cfg->strtab_base_cfg = reg; + + arm_smmu_init_bypass_stes(strtab, cfg->num_l1_ents, false); + return 0; +} + +static int arm_smmu_init_strtab(struct arm_smmu_device *smmu) +{ + u64 reg; + int ret; + + if (smmu->features & ARM_SMMU_FEAT_2_LVL_STRTAB) + ret = arm_smmu_init_strtab_2lvl(smmu); + else + ret = arm_smmu_init_strtab_linear(smmu); + + if (ret) + return ret; + + /* Set the strtab base address */ + reg = smmu->strtab_cfg.strtab_dma & STRTAB_BASE_ADDR_MASK; + reg |= STRTAB_BASE_RA; + smmu->strtab_cfg.strtab_base = reg; + + /* Allocate the first VMID for stage-2 bypass STEs */ + set_bit(0, smmu->vmid_map); + return 0; +} + +static int arm_smmu_init_structures(struct arm_smmu_device *smmu) +{ + int ret; + + mutex_init(&smmu->streams_mutex); + smmu->streams = RB_ROOT; + + ret = 
arm_smmu_init_queues(smmu); + if (ret) + return ret; + + return arm_smmu_init_strtab(smmu); +} + +static int arm_smmu_write_reg_sync(struct arm_smmu_device *smmu, u32 val, + unsigned int reg_off, unsigned int ack_off) +{ + u32 reg; + + writel_relaxed(val, smmu->base + reg_off); + return readl_relaxed_poll_timeout(smmu->base + ack_off, reg, reg == val, + 1, ARM_SMMU_POLL_TIMEOUT_US); +} + +/* GBPA is "special" */ +static int arm_smmu_update_gbpa(struct arm_smmu_device *smmu, u32 set, u32 clr) +{ + int ret; + u32 reg, __iomem *gbpa = smmu->base + ARM_SMMU_GBPA; + + ret = readl_relaxed_poll_timeout(gbpa, reg, !(reg & GBPA_UPDATE), + 1, ARM_SMMU_POLL_TIMEOUT_US); + if (ret) + return ret; + + reg &= ~clr; + reg |= set; + writel_relaxed(reg | GBPA_UPDATE, gbpa); + ret = readl_relaxed_poll_timeout(gbpa, reg, !(reg & GBPA_UPDATE), + 1, ARM_SMMU_POLL_TIMEOUT_US); + + if (ret) + dev_err(smmu->dev, "GBPA not responding to update\n"); + return ret; +} + +static void arm_smmu_free_msis(void *data) +{ + struct device *dev = data; + platform_msi_domain_free_irqs(dev); +} + +static void arm_smmu_write_msi_msg(struct msi_desc *desc, struct msi_msg *msg) +{ + phys_addr_t doorbell; + struct device *dev = msi_desc_to_dev(desc); + struct arm_smmu_device *smmu = dev_get_drvdata(dev); + phys_addr_t *cfg = arm_smmu_msi_cfg[desc->msi_index]; + + doorbell = (((u64)msg->address_hi) << 32) | msg->address_lo; + doorbell &= MSI_CFG0_ADDR_MASK; + + writeq_relaxed(doorbell, smmu->base + cfg[0]); + writel_relaxed(msg->data, smmu->base + cfg[1]); + writel_relaxed(ARM_SMMU_MEMATTR_DEVICE_nGnRE, smmu->base + cfg[2]); +} + +static void arm_smmu_setup_msis(struct arm_smmu_device *smmu) +{ + int ret, nvec = ARM_SMMU_MAX_MSIS; + struct device *dev = smmu->dev; + + /* Clear the MSI address regs */ + writeq_relaxed(0, smmu->base + ARM_SMMU_GERROR_IRQ_CFG0); + writeq_relaxed(0, smmu->base + ARM_SMMU_EVTQ_IRQ_CFG0); + + if (smmu->features & ARM_SMMU_FEAT_PRI) + writeq_relaxed(0, smmu->base + ARM_SMMU_PRIQ_IRQ_CFG0); + else + nvec--; + + if (!(smmu->features & ARM_SMMU_FEAT_MSI)) + return; + + if (!dev->msi.domain) { + dev_info(smmu->dev, "msi_domain absent - falling back to wired irqs\n"); + return; + } + + /* Allocate MSIs for evtq, gerror and priq. 
Ignore cmdq */ + ret = platform_msi_domain_alloc_irqs(dev, nvec, arm_smmu_write_msi_msg); + if (ret) { + dev_warn(dev, "failed to allocate MSIs - falling back to wired irqs\n"); + return; + } + + smmu->evtq.q.irq = msi_get_virq(dev, EVTQ_MSI_INDEX); + smmu->gerr_irq = msi_get_virq(dev, GERROR_MSI_INDEX); + smmu->priq.q.irq = msi_get_virq(dev, PRIQ_MSI_INDEX); + + /* Add callback to free MSIs on teardown */ + devm_add_action(dev, arm_smmu_free_msis, dev); +} + +static void arm_smmu_setup_unique_irqs(struct arm_smmu_device *smmu) +{ + int irq, ret; + + arm_smmu_setup_msis(smmu); + + /* Request interrupt lines */ + irq = smmu->evtq.q.irq; + if (irq) { + ret = devm_request_threaded_irq(smmu->dev, irq, NULL, + arm_smmu_evtq_thread, + IRQF_ONESHOT, + "arm-smmu-v3-evtq", smmu); + if (ret < 0) + dev_warn(smmu->dev, "failed to enable evtq irq\n"); + } else { + dev_warn(smmu->dev, "no evtq irq - events will not be reported!\n"); + } + + irq = smmu->gerr_irq; + if (irq) { + ret = devm_request_irq(smmu->dev, irq, arm_smmu_gerror_handler, + 0, "arm-smmu-v3-gerror", smmu); + if (ret < 0) + dev_warn(smmu->dev, "failed to enable gerror irq\n"); + } else { + dev_warn(smmu->dev, "no gerr irq - errors will not be reported!\n"); + } + + if (smmu->features & ARM_SMMU_FEAT_PRI) { + irq = smmu->priq.q.irq; + if (irq) { + ret = devm_request_threaded_irq(smmu->dev, irq, NULL, + arm_smmu_priq_thread, + IRQF_ONESHOT, + "arm-smmu-v3-priq", + smmu); + if (ret < 0) + dev_warn(smmu->dev, + "failed to enable priq irq\n"); + } else { + dev_warn(smmu->dev, "no priq irq - PRI will be broken\n"); + } + } +} + +static int arm_smmu_setup_irqs(struct arm_smmu_device *smmu) +{ + int ret, irq; + u32 irqen_flags = IRQ_CTRL_EVTQ_IRQEN | IRQ_CTRL_GERROR_IRQEN; + + /* Disable IRQs first */ + ret = arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_IRQ_CTRL, + ARM_SMMU_IRQ_CTRLACK); + if (ret) { + dev_err(smmu->dev, "failed to disable irqs\n"); + return ret; + } + + irq = smmu->combined_irq; + if (irq) { + /* + * Cavium ThunderX2 implementation doesn't support unique irq + * lines. Use a single irq line for all the SMMUv3 interrupts. + */ + ret = devm_request_threaded_irq(smmu->dev, irq, + arm_smmu_combined_irq_handler, + arm_smmu_combined_irq_thread, + IRQF_ONESHOT, + "arm-smmu-v3-combined-irq", smmu); + if (ret < 0) + dev_warn(smmu->dev, "failed to enable combined irq\n"); + } else + arm_smmu_setup_unique_irqs(smmu); + + if (smmu->features & ARM_SMMU_FEAT_PRI) + irqen_flags |= IRQ_CTRL_PRIQ_IRQEN; + + /* Enable interrupt generation on the SMMU */ + ret = arm_smmu_write_reg_sync(smmu, irqen_flags, + ARM_SMMU_IRQ_CTRL, ARM_SMMU_IRQ_CTRLACK); + if (ret) + dev_warn(smmu->dev, "failed to enable irqs\n"); + + return 0; +} + +static int arm_smmu_device_disable(struct arm_smmu_device *smmu) +{ + int ret; + + ret = arm_smmu_write_reg_sync(smmu, 0, ARM_SMMU_CR0, ARM_SMMU_CR0ACK); + if (ret) + dev_err(smmu->dev, "failed to clear cr0\n"); + + return ret; +} + +static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass) +{ + int ret; + u32 reg, enables; + struct arm_smmu_cmdq_ent cmd; + + /* Clear CR0 and sync (disables SMMU and queue processing) */ + reg = readl_relaxed(smmu->base + ARM_SMMU_CR0); + if (reg & CR0_SMMUEN) { + dev_warn(smmu->dev, "SMMU currently enabled! 
Resetting...\n"); + WARN_ON(is_kdump_kernel() && !disable_bypass); + arm_smmu_update_gbpa(smmu, GBPA_ABORT, 0); + } + + ret = arm_smmu_device_disable(smmu); + if (ret) + return ret; + + /* CR1 (table and queue memory attributes) */ + reg = FIELD_PREP(CR1_TABLE_SH, ARM_SMMU_SH_ISH) | + FIELD_PREP(CR1_TABLE_OC, CR1_CACHE_WB) | + FIELD_PREP(CR1_TABLE_IC, CR1_CACHE_WB) | + FIELD_PREP(CR1_QUEUE_SH, ARM_SMMU_SH_ISH) | + FIELD_PREP(CR1_QUEUE_OC, CR1_CACHE_WB) | + FIELD_PREP(CR1_QUEUE_IC, CR1_CACHE_WB); + writel_relaxed(reg, smmu->base + ARM_SMMU_CR1); + + /* CR2 (random crap) */ + reg = CR2_PTM | CR2_RECINVSID; + + if (smmu->features & ARM_SMMU_FEAT_E2H) + reg |= CR2_E2H; + + writel_relaxed(reg, smmu->base + ARM_SMMU_CR2); + + /* Stream table */ + writeq_relaxed(smmu->strtab_cfg.strtab_base, + smmu->base + ARM_SMMU_STRTAB_BASE); + writel_relaxed(smmu->strtab_cfg.strtab_base_cfg, + smmu->base + ARM_SMMU_STRTAB_BASE_CFG); + + /* Command queue */ + writeq_relaxed(smmu->cmdq.q.q_base, smmu->base + ARM_SMMU_CMDQ_BASE); + writel_relaxed(smmu->cmdq.q.llq.prod, smmu->base + ARM_SMMU_CMDQ_PROD); + writel_relaxed(smmu->cmdq.q.llq.cons, smmu->base + ARM_SMMU_CMDQ_CONS); + + enables = CR0_CMDQEN; + ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, + ARM_SMMU_CR0ACK); + if (ret) { + dev_err(smmu->dev, "failed to enable command queue\n"); + return ret; + } + + /* Invalidate any cached configuration */ + cmd.opcode = CMDQ_OP_CFGI_ALL; + arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); + + /* Invalidate any stale TLB entries */ + if (smmu->features & ARM_SMMU_FEAT_HYP) { + cmd.opcode = CMDQ_OP_TLBI_EL2_ALL; + arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); + } + + cmd.opcode = CMDQ_OP_TLBI_NSNH_ALL; + arm_smmu_cmdq_issue_cmd_with_sync(smmu, &cmd); + + /* Event queue */ + writeq_relaxed(smmu->evtq.q.q_base, smmu->base + ARM_SMMU_EVTQ_BASE); + writel_relaxed(smmu->evtq.q.llq.prod, smmu->page1 + ARM_SMMU_EVTQ_PROD); + writel_relaxed(smmu->evtq.q.llq.cons, smmu->page1 + ARM_SMMU_EVTQ_CONS); + + enables |= CR0_EVTQEN; + ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, + ARM_SMMU_CR0ACK); + if (ret) { + dev_err(smmu->dev, "failed to enable event queue\n"); + return ret; + } + + /* PRI queue */ + if (smmu->features & ARM_SMMU_FEAT_PRI) { + writeq_relaxed(smmu->priq.q.q_base, + smmu->base + ARM_SMMU_PRIQ_BASE); + writel_relaxed(smmu->priq.q.llq.prod, + smmu->page1 + ARM_SMMU_PRIQ_PROD); + writel_relaxed(smmu->priq.q.llq.cons, + smmu->page1 + ARM_SMMU_PRIQ_CONS); + + enables |= CR0_PRIQEN; + ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, + ARM_SMMU_CR0ACK); + if (ret) { + dev_err(smmu->dev, "failed to enable PRI queue\n"); + return ret; + } + } + + if (smmu->features & ARM_SMMU_FEAT_ATS) { + enables |= CR0_ATSCHK; + ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, + ARM_SMMU_CR0ACK); + if (ret) { + dev_err(smmu->dev, "failed to enable ATS check\n"); + return ret; + } + } + + ret = arm_smmu_setup_irqs(smmu); + if (ret) { + dev_err(smmu->dev, "failed to setup irqs\n"); + return ret; + } + + if (is_kdump_kernel()) + enables &= ~(CR0_EVTQEN | CR0_PRIQEN); + + /* Enable the SMMU interface, or ensure bypass */ + if (!bypass || disable_bypass) { + enables |= CR0_SMMUEN; + } else { + ret = arm_smmu_update_gbpa(smmu, 0, GBPA_ABORT); + if (ret) + return ret; + } + ret = arm_smmu_write_reg_sync(smmu, enables, ARM_SMMU_CR0, + ARM_SMMU_CR0ACK); + if (ret) { + dev_err(smmu->dev, "failed to enable SMMU interface\n"); + return ret; + } + + return 0; +} + +static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu) +{ + u32 reg; + bool coherent = smmu->features & ARM_SMMU_FEAT_COHERENCY; + + /* IDR0 */ + reg = readl_relaxed(smmu->base + ARM_SMMU_IDR0); + + /* 2-level structures */ + if (FIELD_GET(IDR0_ST_LVL, reg) == IDR0_ST_LVL_2LVL) + smmu->features |= ARM_SMMU_FEAT_2_LVL_STRTAB; + + if (reg & IDR0_CD2L) + smmu->features |= ARM_SMMU_FEAT_2_LVL_CDTAB; + + /* + * Translation table endianness. + * We currently require the same endianness as the CPU, but this + * could be changed later by adding a new IO_PGTABLE_QUIRK. + */ + switch (FIELD_GET(IDR0_TTENDIAN, reg)) { + case IDR0_TTENDIAN_MIXED: + smmu->features |= ARM_SMMU_FEAT_TT_LE | ARM_SMMU_FEAT_TT_BE; + break; +#ifdef __BIG_ENDIAN + case IDR0_TTENDIAN_BE: + smmu->features |= ARM_SMMU_FEAT_TT_BE; + break; +#else + case IDR0_TTENDIAN_LE: + smmu->features |= ARM_SMMU_FEAT_TT_LE; + break; +#endif + default: + dev_err(smmu->dev, "unknown/unsupported TT endianness!\n"); + return -ENXIO; + } + + /* Boolean feature flags */ + if (IS_ENABLED(CONFIG_PCI_PRI) && reg & IDR0_PRI) + smmu->features |= ARM_SMMU_FEAT_PRI; + + if (IS_ENABLED(CONFIG_PCI_ATS) && reg & IDR0_ATS) + smmu->features |= ARM_SMMU_FEAT_ATS; + + if (reg & IDR0_SEV) + smmu->features |= ARM_SMMU_FEAT_SEV; + + if (reg & IDR0_MSI) { + smmu->features |= ARM_SMMU_FEAT_MSI; + if (coherent && !disable_msipolling) + smmu->options |= ARM_SMMU_OPT_MSIPOLL; + } + + if (reg & IDR0_HYP) { + smmu->features |= ARM_SMMU_FEAT_HYP; + if (cpus_have_cap(ARM64_HAS_VIRT_HOST_EXTN)) + smmu->features |= ARM_SMMU_FEAT_E2H; + } + + /* + * The coherency feature as set by FW is used in preference to the ID + * register, but warn on mismatch. + */ + if (!!(reg & IDR0_COHACC) != coherent) + dev_warn(smmu->dev, "IDR0.COHACC overridden by FW configuration (%s)\n", + coherent ? "true" : "false"); + + switch (FIELD_GET(IDR0_STALL_MODEL, reg)) { + case IDR0_STALL_MODEL_FORCE: + smmu->features |= ARM_SMMU_FEAT_STALL_FORCE; + fallthrough; + case IDR0_STALL_MODEL_STALL: + smmu->features |= ARM_SMMU_FEAT_STALLS; + } + + if (reg & IDR0_S1P) + smmu->features |= ARM_SMMU_FEAT_TRANS_S1; + + if (reg & IDR0_S2P) + smmu->features |= ARM_SMMU_FEAT_TRANS_S2; + + if (!(reg & (IDR0_S1P | IDR0_S2P))) { + dev_err(smmu->dev, "no translation support!\n"); + return -ENXIO; + } + + /* We only support the AArch64 table format at present */ + switch (FIELD_GET(IDR0_TTF, reg)) { + case IDR0_TTF_AARCH32_64: + smmu->ias = 40; + fallthrough; + case IDR0_TTF_AARCH64: + break; + default: + dev_err(smmu->dev, "AArch64 table format not supported!\n"); + return -ENXIO; + } + + /* ASID/VMID sizes */ + smmu->asid_bits = reg & IDR0_ASID16 ? 16 : 8; + smmu->vmid_bits = reg & IDR0_VMID16 ? 16 : 8; + + /* IDR1 */ + reg = readl_relaxed(smmu->base + ARM_SMMU_IDR1); + if (reg & (IDR1_TABLES_PRESET | IDR1_QUEUES_PRESET | IDR1_REL)) { + dev_err(smmu->dev, "embedded implementation not supported\n"); + return -ENXIO; + } + + /* Queue sizes, capped to ensure natural alignment */ + smmu->cmdq.q.llq.max_n_shift = min_t(u32, CMDQ_MAX_SZ_SHIFT, + FIELD_GET(IDR1_CMDQS, reg)); + if (smmu->cmdq.q.llq.max_n_shift <= ilog2(CMDQ_BATCH_ENTRIES)) { + /* + * We don't support splitting up batches, so one batch of + * commands plus an extra sync needs to fit inside the command + * queue. There's also no way we can handle the weird alignment + * restrictions on the base pointer for a unit-length queue. 
+ */ + dev_err(smmu->dev, "command queue size <= %d entries not supported\n", + CMDQ_BATCH_ENTRIES); + return -ENXIO; + } + + smmu->evtq.q.llq.max_n_shift = min_t(u32, EVTQ_MAX_SZ_SHIFT, + FIELD_GET(IDR1_EVTQS, reg)); + smmu->priq.q.llq.max_n_shift = min_t(u32, PRIQ_MAX_SZ_SHIFT, + FIELD_GET(IDR1_PRIQS, reg)); + + /* SID/SSID sizes */ + smmu->ssid_bits = FIELD_GET(IDR1_SSIDSIZE, reg); + smmu->sid_bits = FIELD_GET(IDR1_SIDSIZE, reg); + + /* + * If the SMMU supports fewer bits than would fill a single L2 stream + * table, use a linear table instead. + */ + if (smmu->sid_bits <= STRTAB_SPLIT) + smmu->features &= ~ARM_SMMU_FEAT_2_LVL_STRTAB; + + /* IDR3 */ + reg = readl_relaxed(smmu->base + ARM_SMMU_IDR3); + if (FIELD_GET(IDR3_RIL, reg)) + smmu->features |= ARM_SMMU_FEAT_RANGE_INV; + + /* IDR5 */ + reg = readl_relaxed(smmu->base + ARM_SMMU_IDR5); + + /* Maximum number of outstanding stalls */ + smmu->evtq.max_stalls = FIELD_GET(IDR5_STALL_MAX, reg); + + /* Page sizes */ + if (reg & IDR5_GRAN64K) + smmu->pgsize_bitmap |= SZ_64K | SZ_512M; + if (reg & IDR5_GRAN16K) + smmu->pgsize_bitmap |= SZ_16K | SZ_32M; + if (reg & IDR5_GRAN4K) + smmu->pgsize_bitmap |= SZ_4K | SZ_2M | SZ_1G; + + /* Input address size */ + if (FIELD_GET(IDR5_VAX, reg) == IDR5_VAX_52_BIT) + smmu->features |= ARM_SMMU_FEAT_VAX; + + /* Output address size */ + switch (FIELD_GET(IDR5_OAS, reg)) { + case IDR5_OAS_32_BIT: + smmu->oas = 32; + break; + case IDR5_OAS_36_BIT: + smmu->oas = 36; + break; + case IDR5_OAS_40_BIT: + smmu->oas = 40; + break; + case IDR5_OAS_42_BIT: + smmu->oas = 42; + break; + case IDR5_OAS_44_BIT: + smmu->oas = 44; + break; + case IDR5_OAS_52_BIT: + smmu->oas = 52; + smmu->pgsize_bitmap |= 1ULL << 42; /* 4TB */ + break; + default: + dev_info(smmu->dev, + "unknown output address size. 
Truncating to 48-bit\n"); + fallthrough; + case IDR5_OAS_48_BIT: + smmu->oas = 48; + } + + if (arm_smmu_ops.pgsize_bitmap == -1UL) + arm_smmu_ops.pgsize_bitmap = smmu->pgsize_bitmap; + else + arm_smmu_ops.pgsize_bitmap |= smmu->pgsize_bitmap; + + /* Set the DMA mask for our table walker */ + if (dma_set_mask_and_coherent(smmu->dev, DMA_BIT_MASK(smmu->oas))) + dev_warn(smmu->dev, + "failed to set DMA mask for table walker\n"); + + smmu->ias = max(smmu->ias, smmu->oas); + + if (arm_smmu_sva_supported(smmu)) + smmu->features |= ARM_SMMU_FEAT_SVA; + + dev_info(smmu->dev, "ias %lu-bit, oas %lu-bit (features 0x%08x)\n", + smmu->ias, smmu->oas, smmu->features); + return 0; +} + +#ifdef CONFIG_ACPI +static void acpi_smmu_get_options(u32 model, struct arm_smmu_device *smmu) +{ + switch (model) { + case ACPI_IORT_SMMU_V3_CAVIUM_CN99XX: + smmu->options |= ARM_SMMU_OPT_PAGE0_REGS_ONLY; + break; + case ACPI_IORT_SMMU_V3_HISILICON_HI161X: + smmu->options |= ARM_SMMU_OPT_SKIP_PREFETCH; + break; + } + + dev_notice(smmu->dev, "option mask 0x%x\n", smmu->options); +} + +static int arm_smmu_device_acpi_probe(struct platform_device *pdev, + struct arm_smmu_device *smmu) +{ + struct acpi_iort_smmu_v3 *iort_smmu; + struct device *dev = smmu->dev; + struct acpi_iort_node *node; + + node = *(struct acpi_iort_node **)dev_get_platdata(dev); + + /* Retrieve SMMUv3 specific data */ + iort_smmu = (struct acpi_iort_smmu_v3 *)node->node_data; + + acpi_smmu_get_options(iort_smmu->model, smmu); + + if (iort_smmu->flags & ACPI_IORT_SMMU_V3_COHACC_OVERRIDE) + smmu->features |= ARM_SMMU_FEAT_COHERENCY; + + return 0; +} +#else +static inline int arm_smmu_device_acpi_probe(struct platform_device *pdev, + struct arm_smmu_device *smmu) +{ + return -ENODEV; +} +#endif + +static int arm_smmu_device_dt_probe(struct platform_device *pdev, + struct arm_smmu_device *smmu) +{ + struct device *dev = &pdev->dev; + u32 cells; + int ret = -EINVAL; + + if (of_property_read_u32(dev->of_node, "#iommu-cells", &cells)) + dev_err(dev, "missing #iommu-cells property\n"); + else if (cells != 1) + dev_err(dev, "invalid #iommu-cells value (%d)\n", cells); + else + ret = 0; + + parse_driver_options(smmu); + + if (of_dma_is_coherent(dev->of_node)) + smmu->features |= ARM_SMMU_FEAT_COHERENCY; + + return ret; +} + +static unsigned long arm_smmu_resource_size(struct arm_smmu_device *smmu) +{ + if (smmu->options & ARM_SMMU_OPT_PAGE0_REGS_ONLY) + return SZ_64K; + else + return SZ_128K; +} + +static int arm_smmu_set_bus_ops(struct iommu_ops *ops) +{ + int err; + +#ifdef CONFIG_PCI + if (pci_bus_type.iommu_ops != ops) { + err = bus_set_iommu(&pci_bus_type, ops); + if (err) + return err; + } +#endif +#ifdef CONFIG_ARM_AMBA + if (amba_bustype.iommu_ops != ops) { + err = bus_set_iommu(&amba_bustype, ops); + if (err) + goto err_reset_pci_ops; + } +#endif + if (platform_bus_type.iommu_ops != ops) { + err = bus_set_iommu(&platform_bus_type, ops); + if (err) + goto err_reset_amba_ops; + } + + return 0; + +err_reset_amba_ops: +#ifdef CONFIG_ARM_AMBA + bus_set_iommu(&amba_bustype, NULL); +#endif +err_reset_pci_ops: __maybe_unused; +#ifdef CONFIG_PCI + bus_set_iommu(&pci_bus_type, NULL); +#endif + return err; +} + +static void __iomem *arm_smmu_ioremap(struct device *dev, resource_size_t start, + resource_size_t size) +{ + struct resource res = DEFINE_RES_MEM(start, size); + + return devm_ioremap_resource(dev, &res); +} + +static void arm_smmu_rmr_install_bypass_ste(struct arm_smmu_device *smmu) +{ + struct list_head rmr_list; + struct iommu_resv_region *e; + + 
INIT_LIST_HEAD(&rmr_list); + iort_get_rmr_sids(dev_fwnode(smmu->dev), &rmr_list); + + list_for_each_entry(e, &rmr_list, list) { + __le64 *step; + struct iommu_iort_rmr_data *rmr; + int ret, i; + + rmr = container_of(e, struct iommu_iort_rmr_data, rr); + for (i = 0; i < rmr->num_sids; i++) { + ret = arm_smmu_init_sid_strtab(smmu, rmr->sids[i]); + if (ret) { + dev_err(smmu->dev, "RMR SID(0x%x) bypass failed\n", + rmr->sids[i]); + continue; + } + + step = arm_smmu_get_step_for_sid(smmu, rmr->sids[i]); + arm_smmu_init_bypass_stes(step, 1, true); + } + } + + iort_put_rmr_sids(dev_fwnode(smmu->dev), &rmr_list); +} + +static int arm_smmu_device_probe(struct platform_device *pdev) +{ + int irq, ret; + struct resource *res; + resource_size_t ioaddr; + struct arm_smmu_device *smmu; + struct device *dev = &pdev->dev; + bool bypass; + + smmu = devm_kzalloc(dev, sizeof(*smmu), GFP_KERNEL); + if (!smmu) + return -ENOMEM; + smmu->dev = dev; + + if (dev->of_node) { + ret = arm_smmu_device_dt_probe(pdev, smmu); + } else { + ret = arm_smmu_device_acpi_probe(pdev, smmu); + if (ret == -ENODEV) + return ret; + } + + /* Set bypass mode according to firmware probing result */ + bypass = !!ret; + + /* Base address */ + res = platform_get_resource(pdev, IORESOURCE_MEM, 0); + if (!res) + return -EINVAL; + if (resource_size(res) < arm_smmu_resource_size(smmu)) { + dev_err(dev, "MMIO region too small (%pr)\n", res); + return -EINVAL; + } + ioaddr = res->start; + + /* + * Don't map the IMPLEMENTATION DEFINED regions, since they may contain + * the PMCG registers which are reserved by the PMU driver. + */ + smmu->base = arm_smmu_ioremap(dev, ioaddr, ARM_SMMU_REG_SZ); + if (IS_ERR(smmu->base)) + return PTR_ERR(smmu->base); + + if (arm_smmu_resource_size(smmu) > SZ_64K) { + smmu->page1 = arm_smmu_ioremap(dev, ioaddr + SZ_64K, + ARM_SMMU_REG_SZ); + if (IS_ERR(smmu->page1)) + return PTR_ERR(smmu->page1); + } else { + smmu->page1 = smmu->base; + } + + /* Interrupt lines */ + + irq = platform_get_irq_byname_optional(pdev, "combined"); + if (irq > 0) + smmu->combined_irq = irq; + else { + irq = platform_get_irq_byname_optional(pdev, "eventq"); + if (irq > 0) + smmu->evtq.q.irq = irq; + + irq = platform_get_irq_byname_optional(pdev, "priq"); + if (irq > 0) + smmu->priq.q.irq = irq; + + irq = platform_get_irq_byname_optional(pdev, "gerror"); + if (irq > 0) + smmu->gerr_irq = irq; + } + /* Probe the h/w */ + ret = arm_smmu_device_hw_probe(smmu); + if (ret) + return ret; + + /* Initialise in-memory data structures */ + ret = arm_smmu_init_structures(smmu); + if (ret) + return ret; + + /* Record our private device structure */ + platform_set_drvdata(pdev, smmu); + + /* Check for RMRs and install bypass STEs if any */ + arm_smmu_rmr_install_bypass_ste(smmu); + + /* Reset the device */ + ret = arm_smmu_device_reset(smmu, bypass); + if (ret) + return ret; + + /* And we're up. Go go go! 
*/ + ret = iommu_device_sysfs_add(&smmu->iommu, dev, NULL, + "smmu3.%pa", &ioaddr); + if (ret) + return ret; + + ret = iommu_device_register(&smmu->iommu, &arm_smmu_ops, dev); + if (ret) { + dev_err(dev, "Failed to register iommu\n"); + goto err_sysfs_remove; + } + + ret = arm_smmu_set_bus_ops(&arm_smmu_ops); + if (ret) + goto err_unregister_device; + + return 0; + +err_unregister_device: + iommu_device_unregister(&smmu->iommu); +err_sysfs_remove: + iommu_device_sysfs_remove(&smmu->iommu); + return ret; +} + +static int arm_smmu_device_remove(struct platform_device *pdev) +{ + struct arm_smmu_device *smmu = platform_get_drvdata(pdev); + + arm_smmu_set_bus_ops(NULL); + iommu_device_unregister(&smmu->iommu); + iommu_device_sysfs_remove(&smmu->iommu); + arm_smmu_device_disable(smmu); + iopf_queue_free(smmu->evtq.iopf); + + return 0; +} + +static void arm_smmu_device_shutdown(struct platform_device *pdev) +{ + arm_smmu_device_remove(pdev); +} + +static const struct of_device_id arm_smmu_of_match[] = { + { .compatible = "arm,smmu-v3", }, + { }, +}; +MODULE_DEVICE_TABLE(of, arm_smmu_of_match); + +static void arm_smmu_driver_unregister(struct platform_driver *drv) +{ + arm_smmu_sva_notifier_synchronize(); + platform_driver_unregister(drv); +} + +static struct platform_driver arm_smmu_driver = { + .driver = { + .name = "arm-smmu-v3", + .of_match_table = arm_smmu_of_match, + .suppress_bind_attrs = true, + }, + .probe = arm_smmu_device_probe, + .remove = arm_smmu_device_remove, + .shutdown = arm_smmu_device_shutdown, +}; +module_driver(arm_smmu_driver, platform_driver_register, + arm_smmu_driver_unregister); + +MODULE_DESCRIPTION("IOMMU API for ARM architected SMMUv3 implementations"); +MODULE_AUTHOR("Will Deacon "); +MODULE_ALIAS("platform:arm-smmu-v3"); +MODULE_LICENSE("GPL v2"); -- Gitee From 0c465016252827fb94e92d9db22a3f78240c63e7 Mon Sep 17 00:00:00 2001 From: Wangming Shao Date: Thu, 8 Dec 2022 21:45:41 +0800 Subject: [PATCH 06/18] Fixed the issue that the macro def_domain_type is repeatedly defined. driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I63WFC ------------------------------------------------------------------- The type error caused by repeated definition of the def_domain_type macro is fixed. Signed-off-by: Wangming Shao Reviewed-by: Hanjun Guo Signed-off-by: Zheng Zengkai Signed-off-by: Slim6882 --- drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 445 +++++++++++++++++++- 1 file changed, 435 insertions(+), 10 deletions(-) diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 71f7edded9cf..128111abcaa6 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -2802,16 +2802,369 @@ static int arm_smmu_dev_disable_feature(struct device *dev, switch (feat) { case IOMMU_DEV_FEAT_IOPF: - if (!master->iopf_enabled) - return -EINVAL; - if (master->sva_enabled) - return -EBUSY; - master->iopf_enabled = false; - return 0; + return arm_smmu_master_disable_iopf(master); case IOMMU_DEV_FEAT_SVA: - if (!arm_smmu_master_sva_enabled(master)) - return -EINVAL; return arm_smmu_master_disable_sva(master); + case IOMMU_DEV_FEAT_AUX: + /* TODO: check if aux domains are still attached? 
*/ + master->auxd_enabled = false; + return 0; + default: + return -EINVAL; + } +} + +static int arm_smmu_aux_attach_dev(struct iommu_domain *domain, struct device *dev) +{ + int ret; + struct iommu_domain *parent_domain; + struct arm_smmu_domain *parent_smmu_domain; + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + + if (!arm_smmu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)) + return -EINVAL; + + parent_domain = iommu_get_domain_for_dev(dev); + if (!parent_domain) + return -EINVAL; + parent_smmu_domain = to_smmu_domain(parent_domain); + + mutex_lock(&smmu_domain->init_mutex); + if (smmu_domain->stage != ARM_SMMU_DOMAIN_S1 || + parent_smmu_domain->stage != ARM_SMMU_DOMAIN_S1) { + ret = -EINVAL; + goto out_unlock; + } else if (smmu_domain->s1_cfg.cdcfg.cdtab) { + /* Already attached as a normal domain */ + dev_err(dev, "cannot attach domain in auxiliary mode\n"); + ret = -EINVAL; + goto out_unlock; + } else if (!smmu_domain->smmu) { + ioasid_t ssid = ioasid_alloc(&private_ioasid, 1, + (1UL << master->ssid_bits) - 1, + NULL); + if (ssid == INVALID_IOASID) { + ret = -EINVAL; + goto out_unlock; + } + smmu_domain->smmu = master->smmu; + smmu_domain->parent = parent_smmu_domain; + smmu_domain->ssid = ssid; + + ret = arm_smmu_domain_finalise(domain, master); + if (ret) { + smmu_domain->smmu = NULL; + smmu_domain->ssid = 0; + smmu_domain->parent = NULL; + ioasid_free(ssid); + goto out_unlock; + } + } else if (smmu_domain->parent != parent_smmu_domain) { + /* Additional restriction: an aux domain has a single parent */ + dev_err(dev, "cannot attach aux domain with different parent\n"); + ret = -EINVAL; + goto out_unlock; + } + + /* FIXME: serialize against arm_smmu_share_asid() */ + if (!smmu_domain->aux_nr_devs++) + arm_smmu_write_ctx_desc(parent_smmu_domain, smmu_domain->ssid, + &smmu_domain->s1_cfg.cd); + /* + * Note that all other devices attached to the parent domain can now + * access this context as well. + */ + +out_unlock: + mutex_unlock(&smmu_domain->init_mutex); + return ret; +} + +static void arm_smmu_aux_detach_dev(struct iommu_domain *domain, struct device *dev) +{ + struct iommu_domain *parent_domain; + struct arm_smmu_domain *parent_smmu_domain; + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + + if (!arm_smmu_dev_feature_enabled(dev, IOMMU_DEV_FEAT_AUX)) + return; + + parent_domain = iommu_get_domain_for_dev(dev); + if (!parent_domain) + return; + parent_smmu_domain = to_smmu_domain(parent_domain); + + mutex_lock(&smmu_domain->init_mutex); + if (!smmu_domain->aux_nr_devs) + goto out_unlock; + + if (!--smmu_domain->aux_nr_devs) { + arm_smmu_write_ctx_desc(parent_smmu_domain, smmu_domain->ssid, + NULL); + /* + * TLB doesn't need invalidation since accesses from the device + * can't use this domain's ASID once the CD is clear. + * + * Sadly that doesn't apply to ATCs, which are PASID tagged. + * Invalidate all other devices as well, because even though + * they weren't 'officially' attached to the auxiliary domain, + * they could have formed ATC entries. 
+ */ + arm_smmu_atc_inv_domain(parent_smmu_domain, smmu_domain->ssid, + 0, 0); + } else { + /* Invalidate only this device's ATC */ + arm_smmu_atc_inv_master(master, smmu_domain->ssid); + } +out_unlock: + mutex_unlock(&smmu_domain->init_mutex); +} + +static int arm_smmu_aux_get_pasid(struct iommu_domain *domain, struct device *dev) +{ + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); + + return smmu_domain->ssid ?: -EINVAL; +} + +static int arm_smmu_set_mpam(struct arm_smmu_device *smmu, + int sid, int ssid, int partid, int pmg, int s1mpam) +{ + struct arm_smmu_master *master = arm_smmu_find_master(smmu, sid); + struct arm_smmu_domain *domain = master ? master->domain : NULL; + u64 val; + __le64 *ste, *cd; + + struct arm_smmu_cmdq_ent prefetch_cmd = { + .opcode = CMDQ_OP_PREFETCH_CFG, + .prefetch = { + .sid = sid, + }, + }; + + if (WARN_ON(!domain)) + return -EINVAL; + if (WARN_ON(domain->stage != ARM_SMMU_DOMAIN_S1)) + return -EINVAL; + if (WARN_ON(ssid >= (1 << domain->s1_cfg.s1cdmax))) + return -E2BIG; + + if (!(smmu->features & ARM_SMMU_FEAT_MPAM)) + return -ENODEV; + + if (partid > smmu->mpam_partid_max || pmg > smmu->mpam_pmg_max) { + dev_err(smmu->dev, + "mpam rmid out of range: partid[0, %d] pmg[0, %d]\n", + smmu->mpam_partid_max, smmu->mpam_pmg_max); + return -ERANGE; + } + + /* get ste ptr */ + ste = arm_smmu_get_step_for_sid(smmu, sid); + + /* write s1mpam to ste */ + val = le64_to_cpu(ste[1]); + val &= ~STRTAB_STE_1_S1MPAM; + val |= FIELD_PREP(STRTAB_STE_1_S1MPAM, s1mpam); + WRITE_ONCE(ste[1], cpu_to_le64(val)); + + val = le64_to_cpu(ste[4]); + val &= ~STRTAB_STE_4_PARTID_MASK; + val |= FIELD_PREP(STRTAB_STE_4_PARTID_MASK, partid); + WRITE_ONCE(ste[4], cpu_to_le64(val)); + + val = le64_to_cpu(ste[5]); + val &= ~STRTAB_STE_5_PMG_MASK; + val |= FIELD_PREP(STRTAB_STE_5_PMG_MASK, pmg); + WRITE_ONCE(ste[5], cpu_to_le64(val)); + arm_smmu_sync_ste_for_sid(smmu, sid); + + /* do not modify cd table which owned by guest */ + if (domain->stage == ARM_SMMU_DOMAIN_NESTED) { + dev_err(smmu->dev, + "mpam: smmu cd is owned by guest, not modified\n"); + return 0; + } + + /* get cd ptr */ + cd = arm_smmu_get_cd_ptr(domain, ssid); + if (s1mpam && WARN_ON(!cd)) + return -ENOMEM; + + val = le64_to_cpu(cd[5]); + val &= ~CTXDESC_CD_5_PARTID_MASK; + val &= ~CTXDESC_CD_5_PMG_MASK; + val |= FIELD_PREP(CTXDESC_CD_5_PARTID_MASK, partid); + val |= FIELD_PREP(CTXDESC_CD_5_PMG_MASK, pmg); + WRITE_ONCE(cd[5], cpu_to_le64(val)); + arm_smmu_sync_cd(domain, ssid, true); + + /* It's likely that we'll want to use the new STE soon */ + if (!(smmu->options & ARM_SMMU_OPT_SKIP_PREFETCH)) + arm_smmu_cmdq_issue_cmd(smmu, &prefetch_cmd); + + dev_info(smmu->dev, "partid %d, pmg %d\n", partid, pmg); + + return 0; +} + +static int arm_smmu_set_dev_user_mpam_en(struct device *dev, int user_mpam_en) +{ + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_device *smmu; + u32 reg, __iomem *cfg; + + if (WARN_ON(!master)) + return -EINVAL; + + smmu = master->domain->smmu; + cfg = smmu->base + ARM_SMMU_USER_CFG0; + + reg = readl_relaxed(cfg); + reg &= ~ARM_SMMU_USER_MPAM_EN; + reg |= FIELD_PREP(ARM_SMMU_USER_MPAM_EN, user_mpam_en); + writel(reg, cfg); + return 0; +} + +static int arm_smmu_device_set_mpam(struct device *dev, + struct arm_smmu_mpam *mpam) +{ + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + int ret; + + if (WARN_ON(!master) || WARN_ON(!mpam)) + return -EINVAL; + + if (mpam->flags & ARM_SMMU_DEV_SET_MPAM) { + ret = arm_smmu_set_mpam(master->domain->smmu, + 
master->streams->id, + mpam->pasid, + mpam->partid, + mpam->pmg, + mpam->s1mpam); + if (ret < 0) + return ret; + } + + if (mpam->flags & ARM_SMMU_DEV_SET_USER_MPAM_EN) { + ret = arm_smmu_set_dev_user_mpam_en(dev, mpam->user_mpam_en); + if (ret < 0) + return ret; + } + + return 0; + +} + +static int arm_smmu_get_mpam(struct arm_smmu_device *smmu, + int sid, int ssid, int *partid, int *pmg, int *s1mpam) +{ + struct arm_smmu_master *master = arm_smmu_find_master(smmu, sid); + struct arm_smmu_domain *domain = master ? master->domain : NULL; + u64 val; + __le64 *ste, *cd; + + if (WARN_ON(!domain)) + return -EINVAL; + if (WARN_ON(domain->stage != ARM_SMMU_DOMAIN_S1)) + return -EINVAL; + if (WARN_ON(ssid >= (1 << domain->s1_cfg.s1cdmax))) + return -E2BIG; + + if (!(smmu->features & ARM_SMMU_FEAT_MPAM)) + return -ENODEV; + + /* get ste ptr */ + ste = arm_smmu_get_step_for_sid(smmu, sid); + + val = le64_to_cpu(ste[4]); + *partid = FIELD_GET(STRTAB_STE_4_PARTID_MASK, val); + + val = le64_to_cpu(ste[5]); + *pmg = FIELD_GET(STRTAB_STE_5_PMG_MASK, val); + + val = le64_to_cpu(ste[1]); + *s1mpam = FIELD_GET(STRTAB_STE_1_S1MPAM, val); + /* return STE mpam configuration when s1mpam == 0 */ + if (!(*s1mpam)) + return 0; + + /* get cd ptr */ + cd = arm_smmu_get_cd_ptr(domain, ssid); + if (WARN_ON(!cd)) + return -ENOMEM; + + val = le64_to_cpu(cd[5]); + *partid = FIELD_GET(CTXDESC_CD_5_PARTID_MASK, val); + *pmg = FIELD_GET(CTXDESC_CD_5_PMG_MASK, val); + + return 0; +} + +static int arm_smmu_get_dev_user_mpam_en(struct device *dev, int *user_mpam_en) +{ + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + struct arm_smmu_device *smmu; + u32 reg; + + if (WARN_ON(!master)) + return -EINVAL; + + smmu = master->domain->smmu; + + reg = readl_relaxed(smmu->base + ARM_SMMU_USER_CFG0); + *user_mpam_en = FIELD_GET(ARM_SMMU_USER_MPAM_EN, reg); + return 0; +} + +static int arm_smmu_device_get_mpam(struct device *dev, + struct arm_smmu_mpam *mpam) +{ + struct arm_smmu_master *master = dev_iommu_priv_get(dev); + int ret; + + if (WARN_ON(!master) || WARN_ON(!mpam)) + return -EINVAL; + + if (mpam->flags & ARM_SMMU_DEV_GET_MPAM) { + ret = arm_smmu_get_mpam(master->domain->smmu, + master->streams->id, + mpam->pasid, + &mpam->partid, + &mpam->pmg, + &mpam->s1mpam); + if (ret < 0) + return ret; + } + + if (mpam->flags & ARM_SMMU_DEV_GET_USER_MPAM_EN) { + ret = arm_smmu_get_dev_user_mpam_en(dev, &mpam->user_mpam_en); + if (ret < 0) + return ret; + } + + return 0; +} + +static int arm_smmu_device_get_config(struct device *dev, int type, void *data) +{ + switch (type) { + case ARM_SMMU_MPAM: + return arm_smmu_device_get_mpam(dev, data); + default: + return -EINVAL; + } +} + +static int arm_smmu_device_set_config(struct device *dev, int type, void *data) +{ + switch (type) { + case ARM_SMMU_MPAM: + return arm_smmu_device_set_mpam(dev, data); + default: return -EINVAL; } @@ -2825,16 +3178,40 @@ static int arm_smmu_dev_disable_feature(struct device *dev, #define IS_HISI_PTT_DEVICE(pdev) ((pdev)->vendor == PCI_VENDOR_ID_HUAWEI && \ (pdev)->device == 0xa12e) +#ifdef CONFIG_SMMU_BYPASS_DEV +static int arm_smmu_bypass_dev_domain_type(struct device *dev) +{ + int i; + struct pci_dev *pdev = to_pci_dev(dev); + + for (i = 0; i < smmu_bypass_devices_num; i++) { + if ((smmu_bypass_devices[i].vendor == pdev->vendor) && + (smmu_bypass_devices[i].device == pdev->device)) { + dev_info(dev, "device 0x%hx:0x%hx uses identity mapping.", + pdev->vendor, pdev->device); + return IOMMU_DOMAIN_IDENTITY; + } + } + + return 0; +} +#endif + static 
int arm_smmu_def_domain_type(struct device *dev) { + int ret = 0; + if (dev_is_pci(dev)) { struct pci_dev *pdev = to_pci_dev(dev); if (IS_HISI_PTT_DEVICE(pdev)) return IOMMU_DOMAIN_IDENTITY; + #ifdef CONFIG_SMMU_BYPASS_DEV + ret = arm_smmu_bypass_dev_domain_type(dev); + #endif } - return 0; + return ret; } static struct iommu_ops arm_smmu_ops = { @@ -2851,7 +3228,12 @@ static struct iommu_ops arm_smmu_ops = { .sva_unbind = arm_smmu_sva_unbind, .sva_get_pasid = arm_smmu_sva_get_pasid, .page_response = arm_smmu_page_response, - .def_domain_type = arm_smmu_def_domain_type, + .def_domain_type = arm_smmu_def_domain_type, + .aux_attach_dev = arm_smmu_aux_attach_dev, + .aux_detach_dev = arm_smmu_aux_detach_dev, + .aux_get_pasid = arm_smmu_aux_get_pasid, + .dev_get_config = arm_smmu_device_get_config, + .dev_set_config = arm_smmu_device_set_config, .pgsize_bitmap = -1UL, /* Restricted during device attach */ .owner = THIS_MODULE, .default_domain_ops = &(const struct iommu_domain_ops) { @@ -2983,6 +3365,49 @@ static int arm_smmu_init_l1_strtab(struct arm_smmu_device *smmu) return 0; } +#ifdef CONFIG_SMMU_BYPASS_DEV +static void arm_smmu_install_bypass_ste_for_dev(struct arm_smmu_device *smmu, + u32 sid) +{ + u64 val; + __le64 *step = arm_smmu_get_step_for_sid(smmu, sid); + + if (!step) + return; + + val = STRTAB_STE_0_V; + val |= FIELD_PREP(STRTAB_STE_0_CFG, STRTAB_STE_0_CFG_BYPASS); + step[0] = cpu_to_le64(val); + step[1] = cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, + STRTAB_STE_1_SHCFG_INCOMING)); + step[2] = 0; +} + +static int arm_smmu_prepare_init_l2_strtab(struct device *dev, void *data) +{ + u32 sid; + int ret; + struct pci_dev *pdev; + struct arm_smmu_device *smmu = (struct arm_smmu_device *)data; + + if (!arm_smmu_def_domain_type(dev)) + return 0; + + pdev = to_pci_dev(dev); + sid = PCI_DEVID(pdev->bus->number, pdev->devfn); + if (!arm_smmu_sid_in_range(smmu, sid)) + return -ERANGE; + + ret = arm_smmu_init_l2_strtab(smmu, sid); + if (ret) + return ret; + + arm_smmu_install_bypass_ste_for_dev(smmu, sid); + + return 0; +} +#endif + static int arm_smmu_init_strtab_2lvl(struct arm_smmu_device *smmu) { void *strtab; -- Gitee From 2bc02cafe7ccd69cef4eb6cc078383e2f6b99876 Mon Sep 17 00:00:00 2001 From: Dongdong Liu Date: Tue, 18 Jan 2022 17:21:17 +0800 Subject: [PATCH 07/18] PCI: Support BAR sizes up to 8TB Current kernel reports that BARs larger than 128GB, e.g., this 4TB BAR, are disabled: pci 0000:01:00.0: disabling BAR 4: [mem 0x00000000-0x3ffffffffff 64bit pref] (bad alignment 0x40000000000) Increase the maximum BAR size from 128GB to 8TB for future expansion. 
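
As a quick sanity check of the sizing (assuming, consistently with the
comment updated in the diff below, that entry i of aligns[] describes an
alignment of 1 MiB << i):

  18 entries -> largest alignment = 1 MiB << 17 = 128 GiB
  24 entries -> largest alignment = 1 MiB << 23 =   8 TiB

so the 4 TB BAR from the log above (alignment 0x40000000000 = 1 MiB << 22)
now fits within the table.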
[bhelgaas: commit log] Link: https://lore.kernel.org/r/20220118092117.10089-1-liudongdong3@huawei.com Signed-off-by: Dongdong Liu Signed-off-by: Bjorn Helgaas Signed-off-by: Slim6882 --- drivers/pci/setup-bus.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c index 44f4866d95d8..894e4a523274 100644 --- a/drivers/pci/setup-bus.c +++ b/drivers/pci/setup-bus.c @@ -988,7 +988,7 @@ static int pbus_size_mem(struct pci_bus *bus, unsigned long mask, { struct pci_dev *dev; resource_size_t min_align, align, size, size0, size1; - resource_size_t aligns[18]; /* Alignments from 1MB to 128GB */ + resource_size_t aligns[24]; /* Alignments from 1MB to 8TB */ int order, max_order; struct resource *b_res = find_bus_resource_of_type(bus, mask | IORESOURCE_PREFETCH, type); -- Gitee From ea6090e4a9ad4e7dc6ca2081c8268a9951cd104a Mon Sep 17 00:00:00 2001 From: Qi Liu Date: Tue, 27 Sep 2022 16:13:59 +0800 Subject: [PATCH 08/18] perf auxtrace arm64: Add support for HiSilicon PCIe Tune and Trace device driver HiSilicon PCIe tune and trace device (PTT) could dynamically tune the PCIe link's events, and trace the TLP headers). This patch add support for PTT device in perf tool, so users could use 'perf record' to get TLP headers trace data. Reviewed-by: Leo Yan Signed-off-by: Qi Liu Signed-off-by: Yicong Yang Acked-by: John Garry Cc: Alexander Shishkin Cc: Bjorn Helgaas Cc: Greg Kroah-Hartman Cc: Ingo Molnar Cc: James Clark Cc: Jonathan Cameron Cc: Lorenzo Pieralisi Cc: Mark Rutland Cc: Mathieu Poirier Cc: Mike Leach Cc: Peter Zijlstra Cc: Qi Liu Cc: Shameerali Kolothum Thodi Cc: Shaokun Zhang Cc: Suzuki Poulouse Cc: Will Deacon Cc: Zeng Prime Cc: linux-arm-kernel@lists.infradead.org Cc: linux-pci@vger.kernel.org Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/20220927081400.14364-3-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo Signed-off-by: Slim6882 --- tools/perf/arch/arm/util/auxtrace.c | 114 +++++++++++++--- tools/perf/arch/arm/util/pmu.c | 6 +- tools/perf/arch/arm64/util/Build | 2 +- tools/perf/arch/arm64/util/hisi-ptt.c | 188 ++++++++++++++++++++++++++ tools/perf/util/auxtrace.c | 4 +- tools/perf/util/auxtrace.h | 1 + tools/perf/util/hisi-ptt.h | 16 +++ 7 files changed, 309 insertions(+), 22 deletions(-) create mode 100644 tools/perf/arch/arm64/util/hisi-ptt.c create mode 100644 tools/perf/util/hisi-ptt.h diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c index 28a5d0c18b1d..2d74e4084f13 100644 --- a/tools/perf/arch/arm/util/auxtrace.c +++ b/tools/perf/arch/arm/util/auxtrace.c @@ -4,9 +4,11 @@ * Author: Mathieu Poirier */ +#include #include #include #include +#include #include "../../util/auxtrace.h" #include "../../util/debug.h" @@ -14,6 +16,7 @@ #include "../../util/pmu.h" #include "cs-etm.h" #include "arm-spe.h" +#include "hisi-ptt.h" static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err) { @@ -50,43 +53,113 @@ static struct perf_pmu **find_all_arm_spe_pmus(int *nr_spes, int *err) return arm_spe_pmus; } +static struct perf_pmu **find_all_hisi_ptt_pmus(int *nr_ptts, int *err) +{ + const char *sysfs = sysfs__mountpoint(); + struct perf_pmu **hisi_ptt_pmus = NULL; + struct dirent *dent; + char path[PATH_MAX]; + DIR *dir = NULL; + int idx = 0; + + snprintf(path, PATH_MAX, "%s" EVENT_SOURCE_DEVICE_PATH, sysfs); + dir = opendir(path); + if (!dir) { + pr_err("can't read directory '%s'\n", EVENT_SOURCE_DEVICE_PATH); + *err = -EINVAL; + return NULL; + } + + while ((dent = 
readdir(dir))) { + if (strstr(dent->d_name, HISI_PTT_PMU_NAME)) + (*nr_ptts)++; + } + + if (!(*nr_ptts)) + goto out; + + hisi_ptt_pmus = zalloc(sizeof(struct perf_pmu *) * (*nr_ptts)); + if (!hisi_ptt_pmus) { + pr_err("hisi_ptt alloc failed\n"); + *err = -ENOMEM; + goto out; + } + + rewinddir(dir); + while ((dent = readdir(dir))) { + if (strstr(dent->d_name, HISI_PTT_PMU_NAME) && idx < *nr_ptts) { + hisi_ptt_pmus[idx] = perf_pmu__find(dent->d_name); + if (hisi_ptt_pmus[idx]) + idx++; + } + } + +out: + closedir(dir); + return hisi_ptt_pmus; +} + +static struct perf_pmu *find_pmu_for_event(struct perf_pmu **pmus, + int pmu_nr, struct evsel *evsel) +{ + int i; + + if (!pmus) + return NULL; + + for (i = 0; i < pmu_nr; i++) { + if (evsel->core.attr.type == pmus[i]->type) + return pmus[i]; + } + + return NULL; +} + struct auxtrace_record *auxtrace_record__init(struct evlist *evlist, int *err) { - struct perf_pmu *cs_etm_pmu; + struct perf_pmu *cs_etm_pmu = NULL; + struct perf_pmu **arm_spe_pmus = NULL; + struct perf_pmu **hisi_ptt_pmus = NULL; struct evsel *evsel; bool found_etm = false; struct perf_pmu *found_spe = NULL; - static struct perf_pmu **arm_spe_pmus = NULL; - static int nr_spes = 0; - int i = 0; + struct perf_pmu *found_ptt = NULL; + int auxtrace_event_cnt = 0; + int nr_spes = 0; + int nr_ptts = 0; if (!evlist) return NULL; cs_etm_pmu = perf_pmu__find(CORESIGHT_ETM_PMU_NAME); + arm_spe_pmus = find_all_arm_spe_pmus(&nr_spes, err); + hisi_ptt_pmus = find_all_hisi_ptt_pmus(&nr_ptts, err); if (!arm_spe_pmus) arm_spe_pmus = find_all_arm_spe_pmus(&nr_spes, err); - evlist__for_each_entry(evlist, evsel) { - if (cs_etm_pmu && - evsel->core.attr.type == cs_etm_pmu->type) - found_etm = true; - - if (!nr_spes || found_spe) - continue; + if (arm_spe_pmus && !found_spe) + found_spe = find_pmu_for_event(arm_spe_pmus, nr_spes, evsel); - for (i = 0; i < nr_spes; i++) { - if (evsel->core.attr.type == arm_spe_pmus[i]->type) { - found_spe = arm_spe_pmus[i]; - break; - } - } + if (hisi_ptt_pmus && !found_ptt) + found_ptt = find_pmu_for_event(hisi_ptt_pmus, nr_ptts, evsel); } - if (found_etm && found_spe) { - pr_err("Concurrent ARM Coresight ETM and SPE operation not currently supported\n"); + free(arm_spe_pmus); + free(hisi_ptt_pmus); + + if (found_etm) + auxtrace_event_cnt++; + + if (found_spe) + auxtrace_event_cnt++; + + if (found_ptt) + auxtrace_event_cnt++; + + if (auxtrace_event_cnt > 1) { + pr_err("Concurrent AUX trace operation not currently supported\n"); *err = -EOPNOTSUPP; return NULL; } @@ -97,6 +170,9 @@ struct auxtrace_record #if defined(__aarch64__) if (found_spe) return arm_spe_recording_init(err, found_spe); + + if (found_ptt) + return hisi_ptt_recording_init(err, found_ptt); #endif /* diff --git a/tools/perf/arch/arm/util/pmu.c b/tools/perf/arch/arm/util/pmu.c index bbc297a7e2e3..fd2b1df4a877 100644 --- a/tools/perf/arch/arm/util/pmu.c +++ b/tools/perf/arch/arm/util/pmu.c @@ -10,7 +10,9 @@ #include #include "arm-spe.h" -#include "../../util/pmu.h" +#include "hisi-ptt.h" +#include "../../../util/pmu.h" + struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused) @@ -22,6 +24,8 @@ struct perf_event_attr #if defined(__aarch64__) } else if (strstarts(pmu->name, ARM_SPE_PMU_NAME)) { return arm_spe_pmu_default_config(pmu); + } else if (strstarts(pmu->name, HISI_PTT_PMU_NAME)) { + pmu->selectable = true; #endif } diff --git a/tools/perf/arch/arm64/util/Build b/tools/perf/arch/arm64/util/Build index 3cde540d2fcf..40874d0e11cc 100644 --- 
a/tools/perf/arch/arm64/util/Build +++ b/tools/perf/arch/arm64/util/Build @@ -7,4 +7,4 @@ perf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o perf-$(CONFIG_AUXTRACE) += ../../arm/util/pmu.o \ ../../arm/util/auxtrace.o \ ../../arm/util/cs-etm.o \ - arm-spe.o + arm-spe.o mem-events.o hisi-ptt.o diff --git a/tools/perf/arch/arm64/util/hisi-ptt.c b/tools/perf/arch/arm64/util/hisi-ptt.c new file mode 100644 index 000000000000..ba97c8a562a0 --- /dev/null +++ b/tools/perf/arch/arm64/util/hisi-ptt.c @@ -0,0 +1,188 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * HiSilicon PCIe Trace and Tuning (PTT) support + * Copyright (c) 2022 HiSilicon Technologies Co., Ltd. + */ + +#include +#include +#include +#include +#include +#include + +#include // page_size +#include "../../../util/auxtrace.h" +#include "../../../util/cpumap.h" +#include "../../../util/debug.h" +#include "../../../util/event.h" +#include "../../../util/evlist.h" +#include "../../../util/evsel.h" +#include "../../../util/hisi-ptt.h" +#include "../../../util/pmu.h" +#include "../../../util/record.h" +#include "../../../util/session.h" +#include "../../../util/tsc.h" + +#define KiB(x) ((x) * 1024) +#define MiB(x) ((x) * 1024 * 1024) + +struct hisi_ptt_recording { + struct auxtrace_record itr; + struct perf_pmu *hisi_ptt_pmu; + struct evlist *evlist; +}; + +static size_t +hisi_ptt_info_priv_size(struct auxtrace_record *itr __maybe_unused, + struct evlist *evlist __maybe_unused) +{ + return HISI_PTT_AUXTRACE_PRIV_SIZE; +} + +static int hisi_ptt_info_fill(struct auxtrace_record *itr, + struct perf_session *session, + struct perf_record_auxtrace_info *auxtrace_info, + size_t priv_size) +{ + struct hisi_ptt_recording *pttr = + container_of(itr, struct hisi_ptt_recording, itr); + struct perf_pmu *hisi_ptt_pmu = pttr->hisi_ptt_pmu; + + if (priv_size != HISI_PTT_AUXTRACE_PRIV_SIZE) + return -EINVAL; + + if (!session->evlist->core.nr_mmaps) + return -EINVAL; + + auxtrace_info->type = PERF_AUXTRACE_HISI_PTT; + auxtrace_info->priv[0] = hisi_ptt_pmu->type; + + return 0; +} + +static int hisi_ptt_set_auxtrace_mmap_page(struct record_opts *opts) +{ + bool privileged = perf_event_paranoid_check(-1); + + if (!opts->full_auxtrace) + return 0; + + if (opts->full_auxtrace && !opts->auxtrace_mmap_pages) { + if (privileged) { + opts->auxtrace_mmap_pages = MiB(16) / page_size; + } else { + opts->auxtrace_mmap_pages = KiB(128) / page_size; + if (opts->mmap_pages == UINT_MAX) + opts->mmap_pages = KiB(256) / page_size; + } + } + + /* Validate auxtrace_mmap_pages */ + if (opts->auxtrace_mmap_pages) { + size_t sz = opts->auxtrace_mmap_pages * (size_t)page_size; + size_t min_sz = KiB(8); + + if (sz < min_sz || !is_power_of_2(sz)) { + pr_err("Invalid mmap size for HISI PTT: must be at least %zuKiB and a power of 2\n", + min_sz / 1024); + return -EINVAL; + } + } + + return 0; +} + +static int hisi_ptt_recording_options(struct auxtrace_record *itr, + struct evlist *evlist, + struct record_opts *opts) +{ + struct hisi_ptt_recording *pttr = + container_of(itr, struct hisi_ptt_recording, itr); + struct perf_pmu *hisi_ptt_pmu = pttr->hisi_ptt_pmu; + struct evsel *evsel, *hisi_ptt_evsel = NULL; + struct evsel *tracking_evsel; + int err; + + pttr->evlist = evlist; + evlist__for_each_entry(evlist, evsel) { + if (evsel->core.attr.type == hisi_ptt_pmu->type) { + if (hisi_ptt_evsel) { + pr_err("There may be only one " HISI_PTT_PMU_NAME "x event\n"); + return -EINVAL; + } + evsel->core.attr.freq = 0; + evsel->core.attr.sample_period = 1; + evsel->needs_auxtrace_mmap = true; + 
hisi_ptt_evsel = evsel; + opts->full_auxtrace = true; + } + } + + err = hisi_ptt_set_auxtrace_mmap_page(opts); + if (err) + return err; + /* + * To obtain the auxtrace buffer file descriptor, the auxtrace event + * must come first. + */ + evlist__to_front(evlist, hisi_ptt_evsel); + evsel__set_sample_bit(hisi_ptt_evsel, TIME); + + /* Add dummy event to keep tracking */ + err = parse_event(evlist, "dummy:u"); + if (err) + return err; + + tracking_evsel = evlist__last(evlist); + evlist__set_tracking_event(evlist, tracking_evsel); + + tracking_evsel->core.attr.freq = 0; + tracking_evsel->core.attr.sample_period = 1; + evsel__set_sample_bit(tracking_evsel, TIME); + + return 0; +} + +static u64 hisi_ptt_reference(struct auxtrace_record *itr __maybe_unused) +{ + return rdtsc(); +} + +static void hisi_ptt_recording_free(struct auxtrace_record *itr) +{ + struct hisi_ptt_recording *pttr = + container_of(itr, struct hisi_ptt_recording, itr); + + free(pttr); +} + +struct auxtrace_record *hisi_ptt_recording_init(int *err, + struct perf_pmu *hisi_ptt_pmu) +{ + struct hisi_ptt_recording *pttr; + + if (!hisi_ptt_pmu) { + *err = -ENODEV; + return NULL; + } + + pttr = zalloc(sizeof(*pttr)); + if (!pttr) { + *err = -ENOMEM; + return NULL; + } + + pttr->hisi_ptt_pmu = hisi_ptt_pmu; + pttr->itr.pmu = hisi_ptt_pmu; + pttr->itr.recording_options = hisi_ptt_recording_options; + pttr->itr.info_priv_size = hisi_ptt_info_priv_size; + pttr->itr.info_fill = hisi_ptt_info_fill; + pttr->itr.free = hisi_ptt_recording_free; + pttr->itr.reference = hisi_ptt_reference; + pttr->itr.read_finish = auxtrace_record__read_finish; + pttr->itr.alignment = 0; + + *err = 0; + return &pttr->itr; +} diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c index 3d5cd16ca4de..4f6e5d0effaa 100644 --- a/tools/perf/util/auxtrace.c +++ b/tools/perf/util/auxtrace.c @@ -928,7 +928,9 @@ int perf_event__process_auxtrace_info(struct perf_session *session, case PERF_AUXTRACE_CS_ETM: return cs_etm__process_auxtrace_info(event, session); case PERF_AUXTRACE_S390_CPUMSF: - return s390_cpumsf_process_auxtrace_info(event, session); + err = s390_cpumsf_process_auxtrace_info(event, session); + break; + case PERF_AUXTRACE_HISI_PTT: case PERF_AUXTRACE_UNKNOWN: default: return -EINVAL; diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h index f201f36bc35f..39db1cf681a2 100644 --- a/tools/perf/util/auxtrace.h +++ b/tools/perf/util/auxtrace.h @@ -45,6 +45,7 @@ enum auxtrace_type { PERF_AUXTRACE_CS_ETM, PERF_AUXTRACE_ARM_SPE, PERF_AUXTRACE_S390_CPUMSF, + PERF_AUXTRACE_HISI_PTT, }; enum itrace_period_type { diff --git a/tools/perf/util/hisi-ptt.h b/tools/perf/util/hisi-ptt.h new file mode 100644 index 000000000000..82283c81b4c1 --- /dev/null +++ b/tools/perf/util/hisi-ptt.h @@ -0,0 +1,16 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * HiSilicon PCIe Trace and Tuning (PTT) support + * Copyright (c) 2022 HiSilicon Technologies Co., Ltd. 
+ */ + +#ifndef INCLUDE__PERF_HISI_PTT_H__ +#define INCLUDE__PERF_HISI_PTT_H__ + +#define HISI_PTT_PMU_NAME "hisi_ptt" +#define HISI_PTT_AUXTRACE_PRIV_SIZE sizeof(u64) + +struct auxtrace_record *hisi_ptt_recording_init(int *err, + struct perf_pmu *hisi_ptt_pmu); + +#endif -- Gitee From 5de3fbb760da7591b193c09a37abeeeaf2f6d909 Mon Sep 17 00:00:00 2001 From: Yicong Yang Date: Tue, 16 Aug 2022 19:44:13 +0800 Subject: [PATCH 09/18] docs: trace: Add HiSilicon PTT device driver documentation Document the introduction and usage of HiSilicon PTT device driver as well as the sysfs attributes description provided by the driver. Signed-off-by: Yicong Yang Reviewed-by: Jonathan Cameron Reviewed-by: Bagas Sanjaya [Fixed month and kernel version] Link: https://lore.kernel.org/r/20220816114414.4092-5-yangyicong@huawei.com Signed-off-by: Mathieu Poirier Signed-off-by: Slim6882 --- .../ABI/testing/sysfs-devices-hisi_ptt | 61 ++++ Documentation/trace/hisi-ptt.rst | 298 ++++++++++++++++++ Documentation/trace/index.rst | 4 + 3 files changed, 363 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-devices-hisi_ptt create mode 100644 Documentation/trace/hisi-ptt.rst diff --git a/Documentation/ABI/testing/sysfs-devices-hisi_ptt b/Documentation/ABI/testing/sysfs-devices-hisi_ptt new file mode 100644 index 000000000000..82de6d710266 --- /dev/null +++ b/Documentation/ABI/testing/sysfs-devices-hisi_ptt @@ -0,0 +1,61 @@ +What: /sys/devices/hisi_ptt_/tune +Date: October 2022 +KernelVersion: 6.1 +Contact: Yicong Yang +Description: This directory contains files for tuning the PCIe link + parameters(events). Each file is named after the event + of the PCIe link. + + See Documentation/trace/hisi-ptt.rst for more information. + +What: /sys/devices/hisi_ptt_/tune/qos_tx_cpl +Date: October 2022 +KernelVersion: 6.1 +Contact: Yicong Yang +Description: (RW) Controls the weight of Tx completion TLPs, which influence + the proportion of outbound completion TLPs on the PCIe link. + The available tune data is [0, 1, 2]. Writing a negative value + will return an error, and out of range values will be converted + to 2. The value indicates a probable level of the event. + +What: /sys/devices/hisi_ptt_/tune/qos_tx_np +Date: October 2022 +KernelVersion: 6.1 +Contact: Yicong Yang +Description: (RW) Controls the weight of Tx non-posted TLPs, which influence + the proportion of outbound non-posted TLPs on the PCIe link. + The available tune data is [0, 1, 2]. Writing a negative value + will return an error, and out of range values will be converted + to 2. The value indicates a probable level of the event. + +What: /sys/devices/hisi_ptt_/tune/qos_tx_p +Date: October 2022 +KernelVersion: 6.1 +Contact: Yicong Yang +Description: (RW) Controls the weight of Tx posted TLPs, which influence the + proportion of outbound posted TLPs on the PCIe link. + The available tune data is [0, 1, 2]. Writing a negative value + will return an error, and out of range values will be converted + to 2. The value indicates a probable level of the event. + +What: /sys/devices/hisi_ptt_/tune/rx_alloc_buf_level +Date: October 2022 +KernelVersion: 6.1 +Contact: Yicong Yang +Description: (RW) Control the allocated buffer watermark for inbound packets. + The packets will be stored in the buffer first and then transmitted + either when the watermark reached or when timed out. + The available tune data is [0, 1, 2]. Writing a negative value + will return an error, and out of range values will be converted + to 2. 
The value indicates a probable level of the event. + +What: /sys/devices/hisi_ptt_/tune/tx_alloc_buf_level +Date: October 2022 +KernelVersion: 6.1 +Contact: Yicong Yang +Description: (RW) Control the allocated buffer watermark of outbound packets. + The packets will be stored in the buffer first and then transmitted + either when the watermark reached or when timed out. + The available tune data is [0, 1, 2]. Writing a negative value + will return an error, and out of range values will be converted + to 2. The value indicates a probable level of the event. diff --git a/Documentation/trace/hisi-ptt.rst b/Documentation/trace/hisi-ptt.rst new file mode 100644 index 000000000000..4f87d8e21065 --- /dev/null +++ b/Documentation/trace/hisi-ptt.rst @@ -0,0 +1,298 @@ +.. SPDX-License-Identifier: GPL-2.0 + +====================================== +HiSilicon PCIe Tune and Trace device +====================================== + +Introduction +============ + +HiSilicon PCIe tune and trace device (PTT) is a PCIe Root Complex +integrated Endpoint (RCiEP) device, providing the capability +to dynamically monitor and tune the PCIe link's events (tune), +and trace the TLP headers (trace). The two functions are independent, +but is recommended to use them together to analyze and enhance the +PCIe link's performance. + +On Kunpeng 930 SoC, the PCIe Root Complex is composed of several +PCIe cores. Each PCIe core includes several Root Ports and a PTT +RCiEP, like below. The PTT device is capable of tuning and +tracing the links of the PCIe core. +:: + + +--------------Core 0-------+ + | | [ PTT ] | + | | [Root Port]---[Endpoint] + | | [Root Port]---[Endpoint] + | | [Root Port]---[Endpoint] + Root Complex |------Core 1-------+ + | | [ PTT ] | + | | [Root Port]---[ Switch ]---[Endpoint] + | | [Root Port]---[Endpoint] `-[Endpoint] + | | [Root Port]---[Endpoint] + +---------------------------+ + +The PTT device driver registers one PMU device for each PTT device. +The name of each PTT device is composed of 'hisi_ptt' prefix with +the id of the SICL and the Core where it locates. The Kunpeng 930 +SoC encapsulates multiple CPU dies (SCCL, Super CPU Cluster) and +IO dies (SICL, Super I/O Cluster), where there's one PCIe Root +Complex for each SICL. +:: + + /sys/devices/hisi_ptt_ + +Tune +==== + +PTT tune is designed for monitoring and adjusting PCIe link parameters (events). +Currently we support events in 2 classes. The scope of the events +covers the PCIe core to which the PTT device belongs. + +Each event is presented as a file under $(PTT PMU dir)/tune, and +a simple open/read/write/close cycle will be used to tune the event. +:: + + $ cd /sys/devices/hisi_ptt_/tune + $ ls + qos_tx_cpl qos_tx_np qos_tx_p + tx_path_rx_req_alloc_buf_level + tx_path_tx_req_alloc_buf_level + $ cat qos_tx_dp + 1 + $ echo 2 > qos_tx_dp + $ cat qos_tx_dp + 2 + +Current value (numerical value) of the event can be simply read +from the file, and the desired value written to the file to tune. + +1. Tx Path QoS Control +------------------------ + +The following files are provided to tune the QoS of the tx path of +the PCIe core. + +- qos_tx_cpl: weight of Tx completion TLPs +- qos_tx_np: weight of Tx non-posted TLPs +- qos_tx_p: weight of Tx posted TLPs + +The weight influences the proportion of certain packets on the PCIe link. +For example, for the storage scenario, increase the proportion +of the completion packets on the link to enhance the performance as +more completions are consumed. 
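+
+As a rough, illustrative sketch of the storage scenario above, the completion
+weight can be raised directly through sysfs; the device name hisi_ptt0_2 here
+is only an example and depends on the SICL and Core id of the platform.
+::
+
+    $ cd /sys/devices/hisi_ptt0_2/tune
+    $ cat qos_tx_cpl
+    1
+    $ echo 2 > qos_tx_cpl
+    $ cat qos_tx_cpl
+    2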
+ +The available tune data of these events is [0, 1, 2]. +Writing a negative value will return an error, and out of range +values will be converted to 2. Note that the event value just +indicates a probable level, but is not precise. + +2. Tx Path Buffer Control +------------------------- + +Following files are provided to tune the buffer of tx path of the PCIe core. + +- rx_alloc_buf_level: watermark of Rx requested +- tx_alloc_buf_level: watermark of Tx requested + +These events influence the watermark of the buffer allocated for each +type. Rx means the inbound while Tx means outbound. The packets will +be stored in the buffer first and then transmitted either when the +watermark reached or when timed out. For a busy direction, you should +increase the related buffer watermark to avoid frequently posting and +thus enhance the performance. In most cases just keep the default value. + +The available tune data of above events is [0, 1, 2]. +Writing a negative value will return an error, and out of range +values will be converted to 2. Note that the event value just +indicates a probable level, but is not precise. + +Trace +===== + +PTT trace is designed for dumping the TLP headers to the memory, which +can be used to analyze the transactions and usage condition of the PCIe +Link. You can choose to filter the traced headers by either Requester ID, +or those downstream of a set of Root Ports on the same core of the PTT +device. It's also supported to trace the headers of certain type and of +certain direction. + +You can use the perf command `perf record` to set the parameters, start +trace and get the data. It's also supported to decode the trace +data with `perf report`. The control parameters for trace is inputted +as event code for each events, which will be further illustrated later. +An example usage is like +:: + + $ perf record -e hisi_ptt0_2/filter=0x80001,type=1,direction=1, + format=1/ -- sleep 5 + +This will trace the TLP headers downstream root port 0000:00:10.1 (event +code for event 'filter' is 0x80001) with type of posted TLP requests, +direction of inbound and traced data format of 8DW. + +1. Filter +--------- + +The TLP headers to trace can be filtered by the Root Ports or the Requester ID +of the Endpoint, which are located on the same core of the PTT device. You can +set the filter by specifying the `filter` parameter which is required to start +the trace. The parameter value is 20 bit. Bit 19 indicates the filter type. +1 for Root Port filter and 0 for Requester filter. Bit[15:0] indicates the +filter value. The value for a Root Port is a mask of the core port id which is +calculated from its PCI Slot ID as (slotid & 7) * 2. The value for a Requester +is the Requester ID (Device ID of the PCIe function). Bit[18:16] is currently +reserved for extension. + +For example, if the desired filter is Endpoint function 0000:01:00.1 the filter +value will be 0x00101. If the desired filter is Root Port 0000:00:10.0 then +then filter value is calculated as 0x80001. + +Note that multiple Root Ports can be specified at one time, but only one +Endpoint function can be specified in one trace. Specifying both Root Port +and function at the same time is not supported. Driver maintains a list of +available filters and will check the invalid inputs. + +Currently the available filters are detected in driver's probe. If the supported +devices are removed/added after probe, you may need to reload the driver to update +the filters. + +2. 
Type +------- + +You can trace the TLP headers of certain types by specifying the `type` +parameter, which is required to start the trace. The parameter value is +8 bit. Current supported types and related values are shown below: + +- 8'b00000001: posted requests (P) +- 8'b00000010: non-posted requests (NP) +- 8'b00000100: completions (CPL) + +You can specify multiple types when tracing inbound TLP headers, but can only +specify one when tracing outbound TLP headers. + +3. Direction +------------ + +You can trace the TLP headers from certain direction, which is relative +to the Root Port or the PCIe core, by specifying the `direction` parameter. +This is optional and the default parameter is inbound. The parameter value +is 4 bit. When the desired format is 4DW, directions and related values +supported are shown below: + +- 4'b0000: inbound TLPs (P, NP, CPL) +- 4'b0001: outbound TLPs (P, NP, CPL) +- 4'b0010: outbound TLPs (P, NP, CPL) and inbound TLPs (P, NP, CPL B) +- 4'b0011: outbound TLPs (P, NP, CPL) and inbound TLPs (CPL A) + +When the desired format is 8DW, directions and related values supported are +shown below: + +- 4'b0000: reserved +- 4'b0001: outbound TLPs (P, NP, CPL) +- 4'b0010: inbound TLPs (P, NP, CPL B) +- 4'b0011: inbound TLPs (CPL A) + +Inbound completions are classified into two types: + +- completion A (CPL A): completion of CHI/DMA/Native non-posted requests, except for CPL B +- completion B (CPL B): completion of DMA remote2local and P2P non-posted requests + +4. Format +-------------- + +You can change the format of the traced TLP headers by specifying the +`format` parameter. The default format is 4DW. The parameter value is 4 bit. +Current supported formats and related values are shown below: + +- 4'b0000: 4DW length per TLP header +- 4'b0001: 8DW length per TLP header + +The traced TLP header format is different from the PCIe standard. + +When using the 8DW data format, the entire TLP header is logged +(Header DW0-3 shown below). For example, the TLP header for Memory +Reads with 64-bit addresses is shown in PCIe r5.0, Figure 2-17; +the header for Configuration Requests is shown in Figure 2.20, etc. + +In addition, 8DW trace buffer entries contain a timestamp and +possibly a prefix for a PASID TLP prefix (see Figure 6-20, PCIe r5.0). +Otherwise this field will be all 0. + +The bit[31:11] of DW0 is always 0x1fffff, which can be +used to distinguish the data format. 8DW format is like +:: + + bits [ 31:11 ][ 10:0 ] + |---------------------------------------|-------------------| + DW0 [ 0x1fffff ][ Reserved (0x7ff) ] + DW1 [ Prefix ] + DW2 [ Header DW0 ] + DW3 [ Header DW1 ] + DW4 [ Header DW2 ] + DW5 [ Header DW3 ] + DW6 [ Reserved (0x0) ] + DW7 [ Time ] + +When using the 4DW data format, DW0 of the trace buffer entry +contains selected fields of DW0 of the TLP, together with a +timestamp. DW1-DW3 of the trace buffer entry contain DW1-DW3 +directly from the TLP header. + +4DW format is like +:: + + bits [31:30] [ 29:25 ][24][23][22][21][ 20:11 ][ 10:0 ] + |-----|---------|---|---|---|---|-------------|-------------| + DW0 [ Fmt ][ Type ][T9][T8][TH][SO][ Length ][ Time ] + DW1 [ Header DW1 ] + DW2 [ Header DW2 ] + DW3 [ Header DW3 ] + +5. Memory Management +-------------------- + +The traced TLP headers will be written to the memory allocated +by the driver. The hardware accepts 4 DMA address with same size, +and writes the buffer sequentially like below. If DMA addr 3 is +finished and the trace is still on, it will return to addr 0. 
+:: + + +->[DMA addr 0]->[DMA addr 1]->[DMA addr 2]->[DMA addr 3]-+ + +---------------------------------------------------------+ + +Driver will allocate each DMA buffer of 4MiB. The finished buffer +will be copied to the perf AUX buffer allocated by the perf core. +Once the AUX buffer is full while the trace is still on, driver +will commit the AUX buffer first and then apply for a new one with +the same size. The size of AUX buffer is default to 16MiB. User can +adjust the size by specifying the `-m` parameter of the perf command. + +6. Decoding +----------- + +You can decode the traced data with `perf report -D` command (currently +only support to dump the raw trace data). The traced data will be decoded +according to the format described previously (take 8DW as an example): +:: + + [...perf headers and other information] + . ... HISI PTT data: size 4194304 bytes + . 00000000: 00 00 00 00 Prefix + . 00000004: 01 00 00 60 Header DW0 + . 00000008: 0f 1e 00 01 Header DW1 + . 0000000c: 04 00 00 00 Header DW2 + . 00000010: 40 00 81 02 Header DW3 + . 00000014: 33 c0 04 00 Time + . 00000020: 00 00 00 00 Prefix + . 00000024: 01 00 00 60 Header DW0 + . 00000028: 0f 1e 00 01 Header DW1 + . 0000002c: 04 00 00 00 Header DW2 + . 00000030: 40 00 81 02 Header DW3 + . 00000034: 02 00 00 00 Time + . 00000040: 00 00 00 00 Prefix + . 00000044: 01 00 00 60 Header DW0 + . 00000048: 0f 1e 00 01 Header DW1 + . 0000004c: 04 00 00 00 Header DW2 + . 00000050: 40 00 81 02 Header DW3 + [...] diff --git a/Documentation/trace/index.rst b/Documentation/trace/index.rst index b7891cb1ab4d..2ca74fccffa7 100644 --- a/Documentation/trace/index.rst +++ b/Documentation/trace/index.rst @@ -25,3 +25,7 @@ Linux Tracing Technologies sys-t coresight coresight-cpu-debug + coresight/index + user_events + rv/index + hisi-ptt -- Gitee From 804a33b30bf0fe896e89b381342b294b7c9d9a30 Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Mon, 12 Sep 2022 10:03:59 -0600 Subject: [PATCH 10/18] hwtracing: hisi_ptt: Fix up for "iommu/dma: Make header private" drivers/hwtracing/ptt/hisi_ptt.c:13:10: fatal error: linux/dma-iommu.h: No such file or directory 13 | #include | ^~~~~~~~~~~~~~~~~~~ Caused by: commit ff0de066b463 ("hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device") interacting with: commit f2042ed21da7 ("iommu/dma: Make header private") from the iommu tree. Signed-off-by: Stephen Rothwell Acked-by: Robin Murphy Acked-by: Yicong Yang [Fixed subject line and added changelog text] Signed-off-by: Mathieu Poirier Signed-off-by: Slim6882 --- drivers/hwtracing/ptt/hisi_ptt.c | 1046 ++++++++++++++++++++++++++++++ 1 file changed, 1046 insertions(+) create mode 100644 drivers/hwtracing/ptt/hisi_ptt.c diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c new file mode 100644 index 000000000000..5d5526aa60c4 --- /dev/null +++ b/drivers/hwtracing/ptt/hisi_ptt.c @@ -0,0 +1,1046 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Driver for HiSilicon PCIe tune and trace device + * + * Copyright (c) 2022 HiSilicon Technologies Co., Ltd. 
+ * Author: Yicong Yang + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "hisi_ptt.h" + +/* Dynamic CPU hotplug state used by PTT */ +static enum cpuhp_state hisi_ptt_pmu_online; + +static bool hisi_ptt_wait_tuning_finish(struct hisi_ptt *hisi_ptt) +{ + u32 val; + + return !readl_poll_timeout(hisi_ptt->iobase + HISI_PTT_TUNING_INT_STAT, + val, !(val & HISI_PTT_TUNING_INT_STAT_MASK), + HISI_PTT_WAIT_POLL_INTERVAL_US, + HISI_PTT_WAIT_TUNE_TIMEOUT_US); +} + +static ssize_t hisi_ptt_tune_attr_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev)); + struct dev_ext_attribute *ext_attr; + struct hisi_ptt_tune_desc *desc; + u32 reg; + u16 val; + + ext_attr = container_of(attr, struct dev_ext_attribute, attr); + desc = ext_attr->var; + + mutex_lock(&hisi_ptt->tune_lock); + + reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_CTRL); + reg &= ~(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB); + reg |= FIELD_PREP(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB, + desc->event_code); + writel(reg, hisi_ptt->iobase + HISI_PTT_TUNING_CTRL); + + /* Write all 1 to indicates it's the read process */ + writel(~0U, hisi_ptt->iobase + HISI_PTT_TUNING_DATA); + + if (!hisi_ptt_wait_tuning_finish(hisi_ptt)) { + mutex_unlock(&hisi_ptt->tune_lock); + return -ETIMEDOUT; + } + + reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_DATA); + reg &= HISI_PTT_TUNING_DATA_VAL_MASK; + val = FIELD_GET(HISI_PTT_TUNING_DATA_VAL_MASK, reg); + + mutex_unlock(&hisi_ptt->tune_lock); + return sysfs_emit(buf, "%u\n", val); +} + +static ssize_t hisi_ptt_tune_attr_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev)); + struct dev_ext_attribute *ext_attr; + struct hisi_ptt_tune_desc *desc; + u32 reg; + u16 val; + + ext_attr = container_of(attr, struct dev_ext_attribute, attr); + desc = ext_attr->var; + + if (kstrtou16(buf, 10, &val)) + return -EINVAL; + + mutex_lock(&hisi_ptt->tune_lock); + + reg = readl(hisi_ptt->iobase + HISI_PTT_TUNING_CTRL); + reg &= ~(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB); + reg |= FIELD_PREP(HISI_PTT_TUNING_CTRL_CODE | HISI_PTT_TUNING_CTRL_SUB, + desc->event_code); + writel(reg, hisi_ptt->iobase + HISI_PTT_TUNING_CTRL); + writel(FIELD_PREP(HISI_PTT_TUNING_DATA_VAL_MASK, val), + hisi_ptt->iobase + HISI_PTT_TUNING_DATA); + + if (!hisi_ptt_wait_tuning_finish(hisi_ptt)) { + mutex_unlock(&hisi_ptt->tune_lock); + return -ETIMEDOUT; + } + + mutex_unlock(&hisi_ptt->tune_lock); + return count; +} + +#define HISI_PTT_TUNE_ATTR(_name, _val, _show, _store) \ + static struct hisi_ptt_tune_desc _name##_desc = { \ + .name = #_name, \ + .event_code = (_val), \ + }; \ + static struct dev_ext_attribute hisi_ptt_##_name##_attr = { \ + .attr = __ATTR(_name, 0600, _show, _store), \ + .var = &_name##_desc, \ + } + +#define HISI_PTT_TUNE_ATTR_COMMON(_name, _val) \ + HISI_PTT_TUNE_ATTR(_name, _val, \ + hisi_ptt_tune_attr_show, \ + hisi_ptt_tune_attr_store) + +/* + * The value of the tuning event are composed of two parts: main event code + * in BIT[0,15] and subevent code in BIT[16,23]. For example, qox_tx_cpl is + * a subevent of 'Tx path QoS control' which for tuning the weight of Tx + * completion TLPs. See hisi_ptt.rst documentation for more information. 
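+ *
+ * Concretely, HISI_PTT_TUNE_QOS_TX_CPL below is main event code 0x4 with
+ * subevent code 3, i.e. (0x4 | (3 << 16)) == 0x30004.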
+ */ +#define HISI_PTT_TUNE_QOS_TX_CPL (0x4 | (3 << 16)) +#define HISI_PTT_TUNE_QOS_TX_NP (0x4 | (4 << 16)) +#define HISI_PTT_TUNE_QOS_TX_P (0x4 | (5 << 16)) +#define HISI_PTT_TUNE_RX_ALLOC_BUF_LEVEL (0x5 | (6 << 16)) +#define HISI_PTT_TUNE_TX_ALLOC_BUF_LEVEL (0x5 | (7 << 16)) + +HISI_PTT_TUNE_ATTR_COMMON(qos_tx_cpl, HISI_PTT_TUNE_QOS_TX_CPL); +HISI_PTT_TUNE_ATTR_COMMON(qos_tx_np, HISI_PTT_TUNE_QOS_TX_NP); +HISI_PTT_TUNE_ATTR_COMMON(qos_tx_p, HISI_PTT_TUNE_QOS_TX_P); +HISI_PTT_TUNE_ATTR_COMMON(rx_alloc_buf_level, HISI_PTT_TUNE_RX_ALLOC_BUF_LEVEL); +HISI_PTT_TUNE_ATTR_COMMON(tx_alloc_buf_level, HISI_PTT_TUNE_TX_ALLOC_BUF_LEVEL); + +static struct attribute *hisi_ptt_tune_attrs[] = { + &hisi_ptt_qos_tx_cpl_attr.attr.attr, + &hisi_ptt_qos_tx_np_attr.attr.attr, + &hisi_ptt_qos_tx_p_attr.attr.attr, + &hisi_ptt_rx_alloc_buf_level_attr.attr.attr, + &hisi_ptt_tx_alloc_buf_level_attr.attr.attr, + NULL, +}; + +static struct attribute_group hisi_ptt_tune_group = { + .name = "tune", + .attrs = hisi_ptt_tune_attrs, +}; + +static u16 hisi_ptt_get_filter_val(u16 devid, bool is_port) +{ + if (is_port) + return BIT(HISI_PCIE_CORE_PORT_ID(devid & 0xff)); + + return devid; +} + +static bool hisi_ptt_wait_trace_hw_idle(struct hisi_ptt *hisi_ptt) +{ + u32 val; + + return !readl_poll_timeout_atomic(hisi_ptt->iobase + HISI_PTT_TRACE_STS, + val, val & HISI_PTT_TRACE_IDLE, + HISI_PTT_WAIT_POLL_INTERVAL_US, + HISI_PTT_WAIT_TRACE_TIMEOUT_US); +} + +static void hisi_ptt_wait_dma_reset_done(struct hisi_ptt *hisi_ptt) +{ + u32 val; + + readl_poll_timeout_atomic(hisi_ptt->iobase + HISI_PTT_TRACE_WR_STS, + val, !val, HISI_PTT_RESET_POLL_INTERVAL_US, + HISI_PTT_RESET_TIMEOUT_US); +} + +static void hisi_ptt_trace_end(struct hisi_ptt *hisi_ptt) +{ + writel(0, hisi_ptt->iobase + HISI_PTT_TRACE_CTRL); + hisi_ptt->trace_ctrl.started = false; +} + +static int hisi_ptt_trace_start(struct hisi_ptt *hisi_ptt) +{ + struct hisi_ptt_trace_ctrl *ctrl = &hisi_ptt->trace_ctrl; + u32 val; + int i; + + /* Check device idle before start trace */ + if (!hisi_ptt_wait_trace_hw_idle(hisi_ptt)) { + pci_err(hisi_ptt->pdev, "Failed to start trace, the device is still busy\n"); + return -EBUSY; + } + + ctrl->started = true; + + /* Reset the DMA before start tracing */ + val = readl(hisi_ptt->iobase + HISI_PTT_TRACE_CTRL); + val |= HISI_PTT_TRACE_CTRL_RST; + writel(val, hisi_ptt->iobase + HISI_PTT_TRACE_CTRL); + + hisi_ptt_wait_dma_reset_done(hisi_ptt); + + val = readl(hisi_ptt->iobase + HISI_PTT_TRACE_CTRL); + val &= ~HISI_PTT_TRACE_CTRL_RST; + writel(val, hisi_ptt->iobase + HISI_PTT_TRACE_CTRL); + + /* Reset the index of current buffer */ + hisi_ptt->trace_ctrl.buf_index = 0; + + /* Zero the trace buffers */ + for (i = 0; i < HISI_PTT_TRACE_BUF_CNT; i++) + memset(ctrl->trace_buf[i].addr, 0, HISI_PTT_TRACE_BUF_SIZE); + + /* Clear the interrupt status */ + writel(HISI_PTT_TRACE_INT_STAT_MASK, hisi_ptt->iobase + HISI_PTT_TRACE_INT_STAT); + writel(0, hisi_ptt->iobase + HISI_PTT_TRACE_INT_MASK); + + /* Set the trace control register */ + val = FIELD_PREP(HISI_PTT_TRACE_CTRL_TYPE_SEL, ctrl->type); + val |= FIELD_PREP(HISI_PTT_TRACE_CTRL_RXTX_SEL, ctrl->direction); + val |= FIELD_PREP(HISI_PTT_TRACE_CTRL_DATA_FORMAT, ctrl->format); + val |= FIELD_PREP(HISI_PTT_TRACE_CTRL_TARGET_SEL, hisi_ptt->trace_ctrl.filter); + if (!hisi_ptt->trace_ctrl.is_port) + val |= HISI_PTT_TRACE_CTRL_FILTER_MODE; + + /* Start the Trace */ + val |= HISI_PTT_TRACE_CTRL_EN; + writel(val, hisi_ptt->iobase + HISI_PTT_TRACE_CTRL); + + return 0; +} + +static int 
hisi_ptt_update_aux(struct hisi_ptt *hisi_ptt, int index, bool stop) +{ + struct hisi_ptt_trace_ctrl *ctrl = &hisi_ptt->trace_ctrl; + struct perf_output_handle *handle = &ctrl->handle; + struct perf_event *event = handle->event; + struct hisi_ptt_pmu_buf *buf; + size_t size; + void *addr; + + buf = perf_get_aux(handle); + if (!buf || !handle->size) + return -EINVAL; + + addr = ctrl->trace_buf[ctrl->buf_index].addr; + + /* + * If we're going to stop, read the size of already traced data from + * HISI_PTT_TRACE_WR_STS. Otherwise we're coming from the interrupt, + * the data size is always HISI_PTT_TRACE_BUF_SIZE. + */ + if (stop) { + u32 reg; + + reg = readl(hisi_ptt->iobase + HISI_PTT_TRACE_WR_STS); + size = FIELD_GET(HISI_PTT_TRACE_WR_STS_WRITE, reg); + } else { + size = HISI_PTT_TRACE_BUF_SIZE; + } + + memcpy(buf->base + buf->pos, addr, size); + buf->pos += size; + + /* + * Just commit the traced data if we're going to stop. Otherwise if the + * resident AUX buffer cannot contain the data of next trace buffer, + * apply a new one. + */ + if (stop) { + perf_aux_output_end(handle, buf->pos); + } else if (buf->length - buf->pos < HISI_PTT_TRACE_BUF_SIZE) { + perf_aux_output_end(handle, buf->pos); + + buf = perf_aux_output_begin(handle, event); + if (!buf) + return -EINVAL; + + buf->pos = handle->head % buf->length; + if (buf->length - buf->pos < HISI_PTT_TRACE_BUF_SIZE) { + perf_aux_output_end(handle, 0); + return -EINVAL; + } + } + + return 0; +} + +static irqreturn_t hisi_ptt_isr(int irq, void *context) +{ + struct hisi_ptt *hisi_ptt = context; + u32 status, buf_idx; + + status = readl(hisi_ptt->iobase + HISI_PTT_TRACE_INT_STAT); + if (!(status & HISI_PTT_TRACE_INT_STAT_MASK)) + return IRQ_NONE; + + buf_idx = ffs(status) - 1; + + /* Clear the interrupt status of buffer @buf_idx */ + writel(status, hisi_ptt->iobase + HISI_PTT_TRACE_INT_STAT); + + /* + * Update the AUX buffer and cache the current buffer index, + * as we need to know this and save the data when the trace + * is ended out of the interrupt handler. End the trace + * if the updating fails. + */ + if (hisi_ptt_update_aux(hisi_ptt, buf_idx, false)) + hisi_ptt_trace_end(hisi_ptt); + else + hisi_ptt->trace_ctrl.buf_index = (buf_idx + 1) % HISI_PTT_TRACE_BUF_CNT; + + return IRQ_HANDLED; +} + +static void hisi_ptt_irq_free_vectors(void *pdev) +{ + pci_free_irq_vectors(pdev); +} + +static int hisi_ptt_register_irq(struct hisi_ptt *hisi_ptt) +{ + struct pci_dev *pdev = hisi_ptt->pdev; + int ret; + + ret = pci_alloc_irq_vectors(pdev, 1, 1, PCI_IRQ_MSI); + if (ret < 0) { + pci_err(pdev, "failed to allocate irq vector, ret = %d\n", ret); + return ret; + } + + ret = devm_add_action_or_reset(&pdev->dev, hisi_ptt_irq_free_vectors, pdev); + if (ret < 0) + return ret; + + ret = devm_request_threaded_irq(&pdev->dev, + pci_irq_vector(pdev, HISI_PTT_TRACE_DMA_IRQ), + NULL, hisi_ptt_isr, 0, + DRV_NAME, hisi_ptt); + if (ret) { + pci_err(pdev, "failed to request irq %d, ret = %d\n", + pci_irq_vector(pdev, HISI_PTT_TRACE_DMA_IRQ), ret); + return ret; + } + + return 0; +} + +static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data) +{ + struct hisi_ptt_filter_desc *filter; + struct hisi_ptt *hisi_ptt = data; + + /* + * We won't fail the probe if filter allocation failed here. The filters + * should be partial initialized and users would know which filter fails + * through the log. Other functions of PTT device are still available. 
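+ *
+ * Note that returning -ENOMEM below only stops the pci_walk_bus()
+ * traversal which invoked this callback; the probe itself carries on.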
+ */ + filter = kzalloc(sizeof(*filter), GFP_KERNEL); + if (!filter) { + pci_err(hisi_ptt->pdev, "failed to add filter %s\n", pci_name(pdev)); + return -ENOMEM; + } + + filter->devid = PCI_DEVID(pdev->bus->number, pdev->devfn); + + if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) { + filter->is_port = true; + list_add_tail(&filter->list, &hisi_ptt->port_filters); + + /* Update the available port mask */ + hisi_ptt->port_mask |= hisi_ptt_get_filter_val(filter->devid, true); + } else { + list_add_tail(&filter->list, &hisi_ptt->req_filters); + } + + return 0; +} + +static void hisi_ptt_release_filters(void *data) +{ + struct hisi_ptt_filter_desc *filter, *tmp; + struct hisi_ptt *hisi_ptt = data; + + list_for_each_entry_safe(filter, tmp, &hisi_ptt->req_filters, list) { + list_del(&filter->list); + kfree(filter); + } + + list_for_each_entry_safe(filter, tmp, &hisi_ptt->port_filters, list) { + list_del(&filter->list); + kfree(filter); + } +} + +static int hisi_ptt_config_trace_buf(struct hisi_ptt *hisi_ptt) +{ + struct hisi_ptt_trace_ctrl *ctrl = &hisi_ptt->trace_ctrl; + struct device *dev = &hisi_ptt->pdev->dev; + int i; + + ctrl->trace_buf = devm_kcalloc(dev, HISI_PTT_TRACE_BUF_CNT, + sizeof(*ctrl->trace_buf), GFP_KERNEL); + if (!ctrl->trace_buf) + return -ENOMEM; + + for (i = 0; i < HISI_PTT_TRACE_BUF_CNT; ++i) { + ctrl->trace_buf[i].addr = dmam_alloc_coherent(dev, HISI_PTT_TRACE_BUF_SIZE, + &ctrl->trace_buf[i].dma, + GFP_KERNEL); + if (!ctrl->trace_buf[i].addr) + return -ENOMEM; + } + + /* Configure the trace DMA buffer */ + for (i = 0; i < HISI_PTT_TRACE_BUF_CNT; i++) { + writel(lower_32_bits(ctrl->trace_buf[i].dma), + hisi_ptt->iobase + HISI_PTT_TRACE_ADDR_BASE_LO_0 + + i * HISI_PTT_TRACE_ADDR_STRIDE); + writel(upper_32_bits(ctrl->trace_buf[i].dma), + hisi_ptt->iobase + HISI_PTT_TRACE_ADDR_BASE_HI_0 + + i * HISI_PTT_TRACE_ADDR_STRIDE); + } + writel(HISI_PTT_TRACE_BUF_SIZE, hisi_ptt->iobase + HISI_PTT_TRACE_ADDR_SIZE); + + return 0; +} + +static int hisi_ptt_init_ctrls(struct hisi_ptt *hisi_ptt) +{ + struct pci_dev *pdev = hisi_ptt->pdev; + struct pci_bus *bus; + int ret; + u32 reg; + + INIT_LIST_HEAD(&hisi_ptt->port_filters); + INIT_LIST_HEAD(&hisi_ptt->req_filters); + + ret = hisi_ptt_config_trace_buf(hisi_ptt); + if (ret) + return ret; + + /* + * The device range register provides the information about the root + * ports which the RCiEP can control and trace. The RCiEP and the root + * ports which it supports are on the same PCIe core, with same domain + * number but maybe different bus number. The device range register + * will tell us which root ports we can support, Bit[31:16] indicates + * the upper BDF numbers of the root port, while Bit[15:0] indicates + * the lower. 
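+ *
+ * For example (the value is purely illustrative), reg == 0x39003000 would
+ * give upper_bdf == 0x3900 and lower_bdf == 0x3000, and the walk below
+ * then starts from the bus of the upper BDF.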
+ */ + reg = readl(hisi_ptt->iobase + HISI_PTT_DEVICE_RANGE); + hisi_ptt->upper_bdf = FIELD_GET(HISI_PTT_DEVICE_RANGE_UPPER, reg); + hisi_ptt->lower_bdf = FIELD_GET(HISI_PTT_DEVICE_RANGE_LOWER, reg); + + bus = pci_find_bus(pci_domain_nr(pdev->bus), PCI_BUS_NUM(hisi_ptt->upper_bdf)); + if (bus) + pci_walk_bus(bus, hisi_ptt_init_filters, hisi_ptt); + + ret = devm_add_action_or_reset(&pdev->dev, hisi_ptt_release_filters, hisi_ptt); + if (ret) + return ret; + + hisi_ptt->trace_ctrl.on_cpu = -1; + return 0; +} + +static ssize_t cpumask_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct hisi_ptt *hisi_ptt = to_hisi_ptt(dev_get_drvdata(dev)); + const cpumask_t *cpumask = cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev)); + + return cpumap_print_to_pagebuf(true, buf, cpumask); +} +static DEVICE_ATTR_RO(cpumask); + +static struct attribute *hisi_ptt_cpumask_attrs[] = { + &dev_attr_cpumask.attr, + NULL +}; + +static const struct attribute_group hisi_ptt_cpumask_attr_group = { + .attrs = hisi_ptt_cpumask_attrs, +}; + +/* + * Bit 19 indicates the filter type, 1 for Root Port filter and 0 for Requester + * filter. Bit[15:0] indicates the filter value, for Root Port filter it's + * a bit mask of desired ports and for Requester filter it's the Requester ID + * of the desired PCIe function. Bit[18:16] is reserved for extension. + * + * See hisi_ptt.rst documentation for detailed information. + */ +PMU_FORMAT_ATTR(filter, "config:0-19"); +PMU_FORMAT_ATTR(direction, "config:20-23"); +PMU_FORMAT_ATTR(type, "config:24-31"); +PMU_FORMAT_ATTR(format, "config:32-35"); + +static struct attribute *hisi_ptt_pmu_format_attrs[] = { + &format_attr_filter.attr, + &format_attr_direction.attr, + &format_attr_type.attr, + &format_attr_format.attr, + NULL +}; + +static struct attribute_group hisi_ptt_pmu_format_group = { + .name = "format", + .attrs = hisi_ptt_pmu_format_attrs, +}; + +static const struct attribute_group *hisi_ptt_pmu_groups[] = { + &hisi_ptt_cpumask_attr_group, + &hisi_ptt_pmu_format_group, + &hisi_ptt_tune_group, + NULL +}; + +static int hisi_ptt_trace_valid_direction(u32 val) +{ + /* + * The direction values have different effects according to the data + * format (specified in the parentheses). TLP set A/B means different + * set of TLP types. See hisi_ptt.rst documentation for more details. + */ + static const u32 hisi_ptt_trace_available_direction[] = { + 0, /* inbound(4DW) or reserved(8DW) */ + 1, /* outbound(4DW) */ + 2, /* {in, out}bound(4DW) or inbound(8DW), TLP set A */ + 3, /* {in, out}bound(4DW) or inbound(8DW), TLP set B */ + }; + int i; + + for (i = 0; i < ARRAY_SIZE(hisi_ptt_trace_available_direction); i++) { + if (val == hisi_ptt_trace_available_direction[i]) + return 0; + } + + return -EINVAL; +} + +static int hisi_ptt_trace_valid_type(u32 val) +{ + /* Different types can be set simultaneously */ + static const u32 hisi_ptt_trace_available_type[] = { + 1, /* posted_request */ + 2, /* non-posted_request */ + 4, /* completion */ + }; + int i; + + if (!val) + return -EINVAL; + + /* + * Walk the available list and clear the valid bits of + * the config. If there is any resident bit after the + * walk then the config is invalid. 
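+ *
+ * e.g. val == 0x3 (posted | non-posted) is cleared to zero and accepted,
+ * while val == 0x8 survives the walk and is rejected.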
+ */ + for (i = 0; i < ARRAY_SIZE(hisi_ptt_trace_available_type); i++) + val &= ~hisi_ptt_trace_available_type[i]; + + if (val) + return -EINVAL; + + return 0; +} + +static int hisi_ptt_trace_valid_format(u32 val) +{ + static const u32 hisi_ptt_trace_availble_format[] = { + 0, /* 4DW */ + 1, /* 8DW */ + }; + int i; + + for (i = 0; i < ARRAY_SIZE(hisi_ptt_trace_availble_format); i++) { + if (val == hisi_ptt_trace_availble_format[i]) + return 0; + } + + return -EINVAL; +} + +static int hisi_ptt_trace_valid_filter(struct hisi_ptt *hisi_ptt, u64 config) +{ + unsigned long val, port_mask = hisi_ptt->port_mask; + struct hisi_ptt_filter_desc *filter; + + hisi_ptt->trace_ctrl.is_port = FIELD_GET(HISI_PTT_PMU_FILTER_IS_PORT, config); + val = FIELD_GET(HISI_PTT_PMU_FILTER_VAL_MASK, config); + + /* + * Port filters are defined as bit mask. For port filters, check + * the bits in the @val are within the range of hisi_ptt->port_mask + * and whether it's empty or not, otherwise user has specified + * some unsupported root ports. + * + * For Requester ID filters, walk the available filter list to see + * whether we have one matched. + */ + if (!hisi_ptt->trace_ctrl.is_port) { + list_for_each_entry(filter, &hisi_ptt->req_filters, list) { + if (val == hisi_ptt_get_filter_val(filter->devid, filter->is_port)) + return 0; + } + } else if (bitmap_subset(&val, &port_mask, BITS_PER_LONG)) { + return 0; + } + + return -EINVAL; +} + +static void hisi_ptt_pmu_init_configs(struct hisi_ptt *hisi_ptt, struct perf_event *event) +{ + struct hisi_ptt_trace_ctrl *ctrl = &hisi_ptt->trace_ctrl; + u32 val; + + val = FIELD_GET(HISI_PTT_PMU_FILTER_VAL_MASK, event->attr.config); + hisi_ptt->trace_ctrl.filter = val; + + val = FIELD_GET(HISI_PTT_PMU_DIRECTION_MASK, event->attr.config); + ctrl->direction = val; + + val = FIELD_GET(HISI_PTT_PMU_TYPE_MASK, event->attr.config); + ctrl->type = val; + + val = FIELD_GET(HISI_PTT_PMU_FORMAT_MASK, event->attr.config); + ctrl->format = val; +} + +static int hisi_ptt_pmu_event_init(struct perf_event *event) +{ + struct hisi_ptt *hisi_ptt = to_hisi_ptt(event->pmu); + int ret; + u32 val; + + if (event->cpu < 0) { + dev_dbg(event->pmu->dev, "Per-task mode not supported\n"); + return -EOPNOTSUPP; + } + + if (event->attr.type != hisi_ptt->hisi_ptt_pmu.type) + return -ENOENT; + + ret = hisi_ptt_trace_valid_filter(hisi_ptt, event->attr.config); + if (ret < 0) + return ret; + + val = FIELD_GET(HISI_PTT_PMU_DIRECTION_MASK, event->attr.config); + ret = hisi_ptt_trace_valid_direction(val); + if (ret < 0) + return ret; + + val = FIELD_GET(HISI_PTT_PMU_TYPE_MASK, event->attr.config); + ret = hisi_ptt_trace_valid_type(val); + if (ret < 0) + return ret; + + val = FIELD_GET(HISI_PTT_PMU_FORMAT_MASK, event->attr.config); + return hisi_ptt_trace_valid_format(val); +} + +static void *hisi_ptt_pmu_setup_aux(struct perf_event *event, void **pages, + int nr_pages, bool overwrite) +{ + struct hisi_ptt_pmu_buf *buf; + struct page **pagelist; + int i; + + if (overwrite) { + dev_warn(event->pmu->dev, "Overwrite mode is not supported\n"); + return NULL; + } + + /* If the pages size less than buffers, we cannot start trace */ + if (nr_pages < HISI_PTT_TRACE_TOTAL_BUF_SIZE / PAGE_SIZE) + return NULL; + + buf = kzalloc(sizeof(*buf), GFP_KERNEL); + if (!buf) + return NULL; + + pagelist = kcalloc(nr_pages, sizeof(*pagelist), GFP_KERNEL); + if (!pagelist) + goto err; + + for (i = 0; i < nr_pages; i++) + pagelist[i] = virt_to_page(pages[i]); + + buf->base = vmap(pagelist, nr_pages, VM_MAP, PAGE_KERNEL); + if (!buf->base) { 
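+ /* vmap() of the AUX pages failed: drop the page list and bail out */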
+ kfree(pagelist); + goto err; + } + + buf->nr_pages = nr_pages; + buf->length = nr_pages * PAGE_SIZE; + buf->pos = 0; + + kfree(pagelist); + return buf; +err: + kfree(buf); + return NULL; +} + +static void hisi_ptt_pmu_free_aux(void *aux) +{ + struct hisi_ptt_pmu_buf *buf = aux; + + vunmap(buf->base); + kfree(buf); +} + +static void hisi_ptt_pmu_start(struct perf_event *event, int flags) +{ + struct hisi_ptt *hisi_ptt = to_hisi_ptt(event->pmu); + struct perf_output_handle *handle = &hisi_ptt->trace_ctrl.handle; + struct hw_perf_event *hwc = &event->hw; + struct device *dev = event->pmu->dev; + struct hisi_ptt_pmu_buf *buf; + int cpu = event->cpu; + int ret; + + hwc->state = 0; + + /* Serialize the perf process if user specified several CPUs */ + spin_lock(&hisi_ptt->pmu_lock); + if (hisi_ptt->trace_ctrl.started) { + dev_dbg(dev, "trace has already started\n"); + goto stop; + } + + /* + * Handle the interrupt on the same cpu which starts the trace to avoid + * context mismatch. Otherwise we'll trigger the WARN from the perf + * core in event_function_local(). If CPU passed is offline we'll fail + * here, just log it since we can do nothing here. + */ + ret = irq_set_affinity(pci_irq_vector(hisi_ptt->pdev, HISI_PTT_TRACE_DMA_IRQ), + cpumask_of(cpu)); + if (ret) + dev_warn(dev, "failed to set the affinity of trace interrupt\n"); + + hisi_ptt->trace_ctrl.on_cpu = cpu; + + buf = perf_aux_output_begin(handle, event); + if (!buf) { + dev_dbg(dev, "aux output begin failed\n"); + goto stop; + } + + buf->pos = handle->head % buf->length; + + hisi_ptt_pmu_init_configs(hisi_ptt, event); + + ret = hisi_ptt_trace_start(hisi_ptt); + if (ret) { + dev_dbg(dev, "trace start failed, ret = %d\n", ret); + perf_aux_output_end(handle, 0); + goto stop; + } + + spin_unlock(&hisi_ptt->pmu_lock); + return; +stop: + event->hw.state |= PERF_HES_STOPPED; + spin_unlock(&hisi_ptt->pmu_lock); +} + +static void hisi_ptt_pmu_stop(struct perf_event *event, int flags) +{ + struct hisi_ptt *hisi_ptt = to_hisi_ptt(event->pmu); + struct hw_perf_event *hwc = &event->hw; + + if (hwc->state & PERF_HES_STOPPED) + return; + + spin_lock(&hisi_ptt->pmu_lock); + if (hisi_ptt->trace_ctrl.started) { + hisi_ptt_trace_end(hisi_ptt); + + if (!hisi_ptt_wait_trace_hw_idle(hisi_ptt)) + dev_warn(event->pmu->dev, "Device is still busy\n"); + + hisi_ptt_update_aux(hisi_ptt, hisi_ptt->trace_ctrl.buf_index, true); + } + spin_unlock(&hisi_ptt->pmu_lock); + + hwc->state |= PERF_HES_STOPPED; + perf_event_update_userpage(event); + hwc->state |= PERF_HES_UPTODATE; +} + +static int hisi_ptt_pmu_add(struct perf_event *event, int flags) +{ + struct hisi_ptt *hisi_ptt = to_hisi_ptt(event->pmu); + struct hw_perf_event *hwc = &event->hw; + int cpu = event->cpu; + + /* Only allow the cpus on the device's node to add the event */ + if (!cpumask_test_cpu(cpu, cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev)))) + return 0; + + hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE; + + if (flags & PERF_EF_START) { + hisi_ptt_pmu_start(event, PERF_EF_RELOAD); + if (hwc->state & PERF_HES_STOPPED) + return -EINVAL; + } + + return 0; +} + +static void hisi_ptt_pmu_del(struct perf_event *event, int flags) +{ + hisi_ptt_pmu_stop(event, PERF_EF_UPDATE); +} + +static void hisi_ptt_remove_cpuhp_instance(void *hotplug_node) +{ + cpuhp_state_remove_instance_nocalls(hisi_ptt_pmu_online, hotplug_node); +} + +static void hisi_ptt_unregister_pmu(void *pmu) +{ + perf_pmu_unregister(pmu); +} + +static int hisi_ptt_register_pmu(struct hisi_ptt *hisi_ptt) +{ + u16 core_id, sicl_id; + 
char *pmu_name; + u32 reg; + int ret; + + ret = cpuhp_state_add_instance_nocalls(hisi_ptt_pmu_online, + &hisi_ptt->hotplug_node); + if (ret) + return ret; + + ret = devm_add_action_or_reset(&hisi_ptt->pdev->dev, + hisi_ptt_remove_cpuhp_instance, + &hisi_ptt->hotplug_node); + if (ret) + return ret; + + mutex_init(&hisi_ptt->tune_lock); + spin_lock_init(&hisi_ptt->pmu_lock); + + hisi_ptt->hisi_ptt_pmu = (struct pmu) { + .module = THIS_MODULE, + .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE, + .task_ctx_nr = perf_sw_context, + .attr_groups = hisi_ptt_pmu_groups, + .event_init = hisi_ptt_pmu_event_init, + .setup_aux = hisi_ptt_pmu_setup_aux, + .free_aux = hisi_ptt_pmu_free_aux, + .start = hisi_ptt_pmu_start, + .stop = hisi_ptt_pmu_stop, + .add = hisi_ptt_pmu_add, + .del = hisi_ptt_pmu_del, + }; + + reg = readl(hisi_ptt->iobase + HISI_PTT_LOCATION); + core_id = FIELD_GET(HISI_PTT_CORE_ID, reg); + sicl_id = FIELD_GET(HISI_PTT_SICL_ID, reg); + + pmu_name = devm_kasprintf(&hisi_ptt->pdev->dev, GFP_KERNEL, "hisi_ptt%u_%u", + sicl_id, core_id); + if (!pmu_name) + return -ENOMEM; + + ret = perf_pmu_register(&hisi_ptt->hisi_ptt_pmu, pmu_name, -1); + if (ret) + return ret; + + return devm_add_action_or_reset(&hisi_ptt->pdev->dev, + hisi_ptt_unregister_pmu, + &hisi_ptt->hisi_ptt_pmu); +} + +/* + * The DMA of PTT trace can only use direct mappings due to some + * hardware restriction. Check whether there is no IOMMU or the + * policy of the IOMMU domain is passthrough, otherwise the trace + * cannot work. + * + * The PTT device is supposed to behind an ARM SMMUv3, which + * should have passthrough the device by a quirk. + */ +static int hisi_ptt_check_iommu_mapping(struct pci_dev *pdev) +{ + struct iommu_domain *iommu_domain; + + iommu_domain = iommu_get_domain_for_dev(&pdev->dev); + if (!iommu_domain || iommu_domain->type == IOMMU_DOMAIN_IDENTITY) + return 0; + + return -EOPNOTSUPP; +} + +static int hisi_ptt_probe(struct pci_dev *pdev, + const struct pci_device_id *id) +{ + struct hisi_ptt *hisi_ptt; + int ret; + + ret = hisi_ptt_check_iommu_mapping(pdev); + if (ret) { + pci_err(pdev, "requires direct DMA mappings\n"); + return ret; + } + + hisi_ptt = devm_kzalloc(&pdev->dev, sizeof(*hisi_ptt), GFP_KERNEL); + if (!hisi_ptt) + return -ENOMEM; + + hisi_ptt->pdev = pdev; + pci_set_drvdata(pdev, hisi_ptt); + + ret = pcim_enable_device(pdev); + if (ret) { + pci_err(pdev, "failed to enable device, ret = %d\n", ret); + return ret; + } + + ret = pcim_iomap_regions(pdev, BIT(2), DRV_NAME); + if (ret) { + pci_err(pdev, "failed to remap io memory, ret = %d\n", ret); + return ret; + } + + hisi_ptt->iobase = pcim_iomap_table(pdev)[2]; + + ret = dma_set_coherent_mask(&pdev->dev, DMA_BIT_MASK(64)); + if (ret) { + pci_err(pdev, "failed to set 64 bit dma mask, ret = %d\n", ret); + return ret; + } + + pci_set_master(pdev); + + ret = hisi_ptt_register_irq(hisi_ptt); + if (ret) + return ret; + + ret = hisi_ptt_init_ctrls(hisi_ptt); + if (ret) { + pci_err(pdev, "failed to init controls, ret = %d\n", ret); + return ret; + } + + ret = hisi_ptt_register_pmu(hisi_ptt); + if (ret) { + pci_err(pdev, "failed to register PMU device, ret = %d", ret); + return ret; + } + + return 0; +} + +static const struct pci_device_id hisi_ptt_id_tbl[] = { + { PCI_DEVICE(PCI_VENDOR_ID_HUAWEI, 0xa12e) }, + { } +}; +MODULE_DEVICE_TABLE(pci, hisi_ptt_id_tbl); + +static struct pci_driver hisi_ptt_driver = { + .name = DRV_NAME, + .id_table = hisi_ptt_id_tbl, + .probe = hisi_ptt_probe, +}; + +static int hisi_ptt_cpu_teardown(unsigned int 
cpu, struct hlist_node *node) +{ + struct hisi_ptt *hisi_ptt; + struct device *dev; + int target, src; + + hisi_ptt = hlist_entry_safe(node, struct hisi_ptt, hotplug_node); + src = hisi_ptt->trace_ctrl.on_cpu; + dev = hisi_ptt->hisi_ptt_pmu.dev; + + if (!hisi_ptt->trace_ctrl.started || src != cpu) + return 0; + + target = cpumask_any_but(cpumask_of_node(dev_to_node(&hisi_ptt->pdev->dev)), cpu); + if (target >= nr_cpu_ids) { + dev_err(dev, "no available cpu for perf context migration\n"); + return 0; + } + + perf_pmu_migrate_context(&hisi_ptt->hisi_ptt_pmu, src, target); + + /* + * Also make sure the interrupt bind to the migrated CPU as well. Warn + * the user on failure here. + */ + if (irq_set_affinity(pci_irq_vector(hisi_ptt->pdev, HISI_PTT_TRACE_DMA_IRQ), + cpumask_of(target))) + dev_warn(dev, "failed to set the affinity of trace interrupt\n"); + + hisi_ptt->trace_ctrl.on_cpu = target; + return 0; +} + +static int __init hisi_ptt_init(void) +{ + int ret; + + ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, DRV_NAME, NULL, + hisi_ptt_cpu_teardown); + if (ret < 0) + return ret; + hisi_ptt_pmu_online = ret; + + ret = pci_register_driver(&hisi_ptt_driver); + if (ret) + cpuhp_remove_multi_state(hisi_ptt_pmu_online); + + return ret; +} +module_init(hisi_ptt_init); + +static void __exit hisi_ptt_exit(void) +{ + pci_unregister_driver(&hisi_ptt_driver); + cpuhp_remove_multi_state(hisi_ptt_pmu_online); +} +module_exit(hisi_ptt_exit); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Yicong Yang "); +MODULE_DESCRIPTION("Driver for HiSilicon PCIe tune and trace device"); -- Gitee From fcba69c0dc2ca03d4ab854e1d291876f6f434013 Mon Sep 17 00:00:00 2001 From: Yicong Yang Date: Tue, 16 Aug 2022 19:44:12 +0800 Subject: [PATCH 11/18] hwtracing: hisi_ptt: Add tune function support for HiSilicon PCIe Tune and Trace device Add tune function for the HiSilicon Tune and Trace device. The interface of tune is exposed through sysfs attributes of PTT PMU device. Acked-by: Mathieu Poirier Reviewed-by: Jonathan Cameron Reviewed-by: John Garry Signed-off-by: Yicong Yang Link: https://lore.kernel.org/r/20220816114414.4092-4-yangyicong@huawei.com Signed-off-by: Mathieu Poirier Signed-off-by: Slim6882 --- drivers/hwtracing/ptt/hisi_ptt.h | 200 +++++++++++++++++++++++++++++++ 1 file changed, 200 insertions(+) create mode 100644 drivers/hwtracing/ptt/hisi_ptt.h diff --git a/drivers/hwtracing/ptt/hisi_ptt.h b/drivers/hwtracing/ptt/hisi_ptt.h new file mode 100644 index 000000000000..5beb1648c93a --- /dev/null +++ b/drivers/hwtracing/ptt/hisi_ptt.h @@ -0,0 +1,200 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Driver for HiSilicon PCIe tune and trace device + * + * Copyright (c) 2022 HiSilicon Technologies Co., Ltd. + * Author: Yicong Yang + */ + +#ifndef _HISI_PTT_H +#define _HISI_PTT_H + +#include +#include +#include +#include +#include +#include +#include +#include + +#define DRV_NAME "hisi_ptt" + +/* + * The definition of the device registers and register fields. 
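+ *
+ * All offsets below are relative to BAR 2 of the RCiEP, which hisi_ptt.c
+ * maps as hisi_ptt->iobase via pcim_iomap_regions(pdev, BIT(2), DRV_NAME).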
+ */ +#define HISI_PTT_TUNING_CTRL 0x0000 +#define HISI_PTT_TUNING_CTRL_CODE GENMASK(15, 0) +#define HISI_PTT_TUNING_CTRL_SUB GENMASK(23, 16) +#define HISI_PTT_TUNING_DATA 0x0004 +#define HISI_PTT_TUNING_DATA_VAL_MASK GENMASK(15, 0) +#define HISI_PTT_TRACE_ADDR_SIZE 0x0800 +#define HISI_PTT_TRACE_ADDR_BASE_LO_0 0x0810 +#define HISI_PTT_TRACE_ADDR_BASE_HI_0 0x0814 +#define HISI_PTT_TRACE_ADDR_STRIDE 0x8 +#define HISI_PTT_TRACE_CTRL 0x0850 +#define HISI_PTT_TRACE_CTRL_EN BIT(0) +#define HISI_PTT_TRACE_CTRL_RST BIT(1) +#define HISI_PTT_TRACE_CTRL_RXTX_SEL GENMASK(3, 2) +#define HISI_PTT_TRACE_CTRL_TYPE_SEL GENMASK(7, 4) +#define HISI_PTT_TRACE_CTRL_DATA_FORMAT BIT(14) +#define HISI_PTT_TRACE_CTRL_FILTER_MODE BIT(15) +#define HISI_PTT_TRACE_CTRL_TARGET_SEL GENMASK(31, 16) +#define HISI_PTT_TRACE_INT_STAT 0x0890 +#define HISI_PTT_TRACE_INT_STAT_MASK GENMASK(3, 0) +#define HISI_PTT_TRACE_INT_MASK 0x0894 +#define HISI_PTT_TUNING_INT_STAT 0x0898 +#define HISI_PTT_TUNING_INT_STAT_MASK BIT(0) +#define HISI_PTT_TRACE_WR_STS 0x08a0 +#define HISI_PTT_TRACE_WR_STS_WRITE GENMASK(27, 0) +#define HISI_PTT_TRACE_WR_STS_BUFFER GENMASK(29, 28) +#define HISI_PTT_TRACE_STS 0x08b0 +#define HISI_PTT_TRACE_IDLE BIT(0) +#define HISI_PTT_DEVICE_RANGE 0x0fe0 +#define HISI_PTT_DEVICE_RANGE_UPPER GENMASK(31, 16) +#define HISI_PTT_DEVICE_RANGE_LOWER GENMASK(15, 0) +#define HISI_PTT_LOCATION 0x0fe8 +#define HISI_PTT_CORE_ID GENMASK(15, 0) +#define HISI_PTT_SICL_ID GENMASK(31, 16) + +/* Parameters of PTT trace DMA part. */ +#define HISI_PTT_TRACE_DMA_IRQ 0 +#define HISI_PTT_TRACE_BUF_CNT 4 +#define HISI_PTT_TRACE_BUF_SIZE SZ_4M +#define HISI_PTT_TRACE_TOTAL_BUF_SIZE (HISI_PTT_TRACE_BUF_SIZE * \ + HISI_PTT_TRACE_BUF_CNT) +/* Wait time for hardware DMA to reset */ +#define HISI_PTT_RESET_TIMEOUT_US 10UL +#define HISI_PTT_RESET_POLL_INTERVAL_US 1UL +/* Poll timeout and interval for waiting hardware work to finish */ +#define HISI_PTT_WAIT_TUNE_TIMEOUT_US 1000000UL +#define HISI_PTT_WAIT_TRACE_TIMEOUT_US 100UL +#define HISI_PTT_WAIT_POLL_INTERVAL_US 10UL + +#define HISI_PCIE_CORE_PORT_ID(devfn) ((PCI_SLOT(devfn) & 0x7) << 1) + +/* Definition of the PMU configs */ +#define HISI_PTT_PMU_FILTER_IS_PORT BIT(19) +#define HISI_PTT_PMU_FILTER_VAL_MASK GENMASK(15, 0) +#define HISI_PTT_PMU_DIRECTION_MASK GENMASK(23, 20) +#define HISI_PTT_PMU_TYPE_MASK GENMASK(31, 24) +#define HISI_PTT_PMU_FORMAT_MASK GENMASK(35, 32) + +/** + * struct hisi_ptt_tune_desc - Describe tune event for PTT tune + * @hisi_ptt: PTT device this tune event belongs to + * @name: name of this event + * @event_code: code of the event + */ +struct hisi_ptt_tune_desc { + struct hisi_ptt *hisi_ptt; + const char *name; + u32 event_code; +}; + +/** + * struct hisi_ptt_dma_buffer - Describe a single trace buffer of PTT trace. + * The detail of the data format is described + * in the documentation of PTT device. + * @dma: DMA address of this buffer visible to the device + * @addr: virtual address of this buffer visible to the cpu + */ +struct hisi_ptt_dma_buffer { + dma_addr_t dma; + void *addr; +}; + +/** + * struct hisi_ptt_trace_ctrl - Control and status of PTT trace + * @trace_buf: array of the trace buffers for holding the trace data. + * the length will be HISI_PTT_TRACE_BUF_CNT. 
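+ *		Each buffer is HISI_PTT_TRACE_BUF_SIZE (4MiB); together the
+ *		four of them form the DMA ring described in hisi-ptt.rst.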
+ * @handle: perf output handle of current trace session + * @buf_index: the index of current using trace buffer + * @on_cpu: current tracing cpu + * @started: current trace status, true for started + * @is_port: whether we're tracing root port or not + * @direction: direction of the TLP headers to trace + * @filter: filter value for tracing the TLP headers + * @format: format of the TLP headers to trace + * @type: type of the TLP headers to trace + */ +struct hisi_ptt_trace_ctrl { + struct hisi_ptt_dma_buffer *trace_buf; + struct perf_output_handle handle; + u32 buf_index; + int on_cpu; + bool started; + bool is_port; + u32 direction:2; + u32 filter:16; + u32 format:1; + u32 type:4; +}; + +/** + * struct hisi_ptt_filter_desc - Descriptor of the PTT trace filter + * @list: entry of this descriptor in the filter list + * @is_port: the PCI device of the filter is a Root Port or not + * @devid: the PCI device's devid of the filter + */ +struct hisi_ptt_filter_desc { + struct list_head list; + bool is_port; + u16 devid; +}; + +/** + * struct hisi_ptt_pmu_buf - Descriptor of the AUX buffer of PTT trace + * @length: size of the AUX buffer + * @nr_pages: number of pages of the AUX buffer + * @base: start address of AUX buffer + * @pos: position in the AUX buffer to commit traced data + */ +struct hisi_ptt_pmu_buf { + size_t length; + int nr_pages; + void *base; + long pos; +}; + +/** + * struct hisi_ptt - Per PTT device data + * @trace_ctrl: the control information of PTT trace + * @hotplug_node: node for register cpu hotplug event + * @hisi_ptt_pmu: the pum device of trace + * @iobase: base IO address of the device + * @pdev: pci_dev of this PTT device + * @tune_lock: lock to serialize the tune process + * @pmu_lock: lock to serialize the perf process + * @upper_bdf: the upper BDF range of the PCI devices managed by this PTT device + * @lower_bdf: the lower BDF range of the PCI devices managed by this PTT device + * @port_filters: the filter list of root ports + * @req_filters: the filter list of requester ID + * @port_mask: port mask of the managed root ports + */ +struct hisi_ptt { + struct hisi_ptt_trace_ctrl trace_ctrl; + struct hlist_node hotplug_node; + struct pmu hisi_ptt_pmu; + void __iomem *iobase; + struct pci_dev *pdev; + struct mutex tune_lock; + spinlock_t pmu_lock; + u32 upper_bdf; + u32 lower_bdf; + + /* + * The trace TLP headers can either be filtered by certain + * root port, or by the requester ID. Organize the filters + * by @port_filters and @req_filters here. The mask of all + * the valid ports is also cached for doing sanity check + * of user input. + */ + struct list_head port_filters; + struct list_head req_filters; + u16 port_mask; +}; + +#define to_hisi_ptt(pmu) container_of(pmu, struct hisi_ptt, hisi_ptt_pmu) + +#endif /* _HISI_PTT_H */ -- Gitee From 01d85dc68212cf663f8e307b939f12e855949daf Mon Sep 17 00:00:00 2001 From: Wangming Shao Date: Tue, 8 Nov 2022 18:12:23 +0800 Subject: [PATCH 12/18] Fix the header file location error and adjust the function and structure version. driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I5WGEF ------------------------------------------------------------------- Fixed the header file location error. Rectify the missing member and function name errors in the structure. 
Signed-off-by: Wangming Shao Reviewed-by: Yicong Yang Reviewed-by: Yang Jihong Signed-off-by: Zheng Zengkai Signed-off-by: Slim6882 --- tools/perf/arch/arm/util/auxtrace.c | 6 +++--- tools/perf/arch/arm64/util/hisi-ptt.c | 7 +++---- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/tools/perf/arch/arm/util/auxtrace.c b/tools/perf/arch/arm/util/auxtrace.c index 2d74e4084f13..fefabcb8de43 100644 --- a/tools/perf/arch/arm/util/auxtrace.c +++ b/tools/perf/arch/arm/util/auxtrace.c @@ -10,10 +10,10 @@ #include #include -#include "../../util/auxtrace.h" -#include "../../util/debug.h" +#include "../../../util/auxtrace.h" +#include "../../../util/debug.h" #include "../../util/evlist.h" -#include "../../util/pmu.h" +#include "../../../util/pmu.h" #include "cs-etm.h" #include "arm-spe.h" #include "hisi-ptt.h" diff --git a/tools/perf/arch/arm64/util/hisi-ptt.c b/tools/perf/arch/arm64/util/hisi-ptt.c index ba97c8a562a0..110b2edf3e6b 100644 --- a/tools/perf/arch/arm64/util/hisi-ptt.c +++ b/tools/perf/arch/arm64/util/hisi-ptt.c @@ -113,7 +113,6 @@ static int hisi_ptt_recording_options(struct auxtrace_record *itr, } evsel->core.attr.freq = 0; evsel->core.attr.sample_period = 1; - evsel->needs_auxtrace_mmap = true; hisi_ptt_evsel = evsel; opts->full_auxtrace = true; } @@ -126,16 +125,16 @@ static int hisi_ptt_recording_options(struct auxtrace_record *itr, * To obtain the auxtrace buffer file descriptor, the auxtrace event * must come first. */ - evlist__to_front(evlist, hisi_ptt_evsel); + perf_evlist__to_front(evlist, hisi_ptt_evsel); evsel__set_sample_bit(hisi_ptt_evsel, TIME); /* Add dummy event to keep tracking */ - err = parse_event(evlist, "dummy:u"); + err = parse_events(evlist, "dummy:u", NULL); if (err) return err; tracking_evsel = evlist__last(evlist); - evlist__set_tracking_event(evlist, tracking_evsel); + perf_evlist__set_tracking_event(evlist, tracking_evsel); tracking_evsel->core.attr.freq = 0; tracking_evsel->core.attr.sample_period = 1; -- Gitee From a33d8187fa0345567c7c2ad61c680de22b7a1043 Mon Sep 17 00:00:00 2001 From: Yicong Yang Date: Wed, 21 Jun 2023 17:28:03 +0800 Subject: [PATCH 13/18] hwtracing: hisi_ptt: Advertise PERF_PMU_CAP_NO_EXCLUDE for PTT PMU The PTT trace collects PCIe TLP headers from the PCIe link and don't have the ability to exclude certain context. It doesn't support itrace as well. So replace PERF_PMU_CAP_ITRACE with PERF_PMU_CAP_NO_EXCLUDE. This will greatly save the storage of final data. Tested tracing idle link for ~15s, without this patch we'll collect ~28.682MB data for additional information and with this patch it reduced to ~0.226MB. 
Reviewed-by: Jonathan Cameron Signed-off-by: Yicong Yang Tested-by: Junhao He Signed-off-by: Suzuki K Poulose Link: https://lore.kernel.org/r/20230621092804.15120-5-yangyicong@huawei.com Signed-off-by: Slim6882 --- drivers/hwtracing/ptt/hisi_ptt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c index 5d5526aa60c4..8f3330d3cc74 100644 --- a/drivers/hwtracing/ptt/hisi_ptt.c +++ b/drivers/hwtracing/ptt/hisi_ptt.c @@ -861,7 +861,7 @@ static int hisi_ptt_register_pmu(struct hisi_ptt *hisi_ptt) hisi_ptt->hisi_ptt_pmu = (struct pmu) { .module = THIS_MODULE, - .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_ITRACE, + .capabilities = PERF_PMU_CAP_EXCLUSIVE | PERF_PMU_CAP_NO_EXCLUDE, .task_ctx_nr = perf_sw_context, .attr_groups = hisi_ptt_pmu_groups, .event_init = hisi_ptt_pmu_event_init, -- Gitee From a94b531128c4a67e550d063930f7044eb3837687 Mon Sep 17 00:00:00 2001 From: Devyn Liu Date: Tue, 31 Oct 2023 10:52:47 +0800 Subject: [PATCH 14/18] ipmi: Errata workaround to prevent SMS message processing timeout driver inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I8BW7F CVE: NA -------------------------------------------------------------------------- Add erratum 162102203 for HIP09. Add erratum 162102203 kconfig to enable workaround for SMS message processing timeout Add a workaround method to prevent bridge commands from timeout. Because SMS_ATN bit will be cleared by specific CPU hardware before it is driven to process. sms_atn_quirk: Identify whether the SMS_ATN flag needs to be stored and do workaround operations. sms_atn_flag: This variable is used to store SMS_ATN when the driver polling enters the bt_event until the IDLE state is reached.Then reset this flag when driver finished proccessing SMS message and clear SMS_ATN bit. Identify whether the incorrect BT_B2H_ATN is possibly due to receiving an SMS message from BMC. If so, clear B2H_ATN. Prevents SMS_ATN from being discarded after processing drain stale response. Signed-off-by: Devyn Liu Signed-off-by: Slim6882 --- Documentation/arm64/silicon-errata.rst | 7 +++ arch/arm64/Kconfig | 70 +++++++++++++++++++++++ drivers/char/ipmi/ipmi_bt_sm.c | 78 ++++++++++++++++++++++++++ 3 files changed, 155 insertions(+) diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst index 6bafd420a694..0ef894fe7a25 100644 --- a/Documentation/arm64/silicon-errata.rst +++ b/Documentation/arm64/silicon-errata.rst @@ -140,6 +140,13 @@ stable kernels. | Hisilicon | Hip08 SMMU PMCG | #162001900 | N/A | | | Hip09 SMMU PMCG | | | +----------------+-----------------+-----------------+-----------------------------+ +| Hisilicon | TSV{110,200} | #1980005 | HISILICON_ERRATUM_1980005 | ++----------------+-----------------+-----------------+-----------------------------+ +| Hisilicon | Hip09 | #162100801 | HISILICON_ERRATUM_162100801 | ++----------------+-----------------+-----------------+-----------------------------+ +| Hisilicon | LINXICORE9100 | #162100125 | HISILICON_ERRATUM_162100125 | ++----------------+-----------------+-----------------+-----------------------------+ +| Hisilicon | HIP09 | #162102203 | HISILICON_ERRATUM_162102203 | +----------------+-----------------+-----------------+-----------------------------+ | Qualcomm Tech. 
| Kryo/Falkor v1 | E1003 | QCOM_FALKOR_ERRATUM_1003 | +----------------+-----------------+-----------------+-----------------------------+ diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index ae9b53b7835e..2bc3f98d8ad7 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -691,6 +691,76 @@ config CAVIUM_TX2_ERRATUM_219 If unsure, say Y. +config FUJITSU_ERRATUM_010001 + bool "Fujitsu-A64FX erratum E#010001: Undefined fault may occur wrongly" + default y + help + This option adds a workaround for Fujitsu-A64FX erratum E#010001. + On some variants of the Fujitsu-A64FX cores ver(1.0, 1.1), memory + accesses may cause undefined fault (Data abort, DFSC=0b111111). + This fault occurs under a specific hardware condition when a + load/store instruction performs an address translation using: + case-1 TTBR0_EL1 with TCR_EL1.NFD0 == 1. + case-2 TTBR0_EL2 with TCR_EL2.NFD0 == 1. + case-3 TTBR1_EL1 with TCR_EL1.NFD1 == 1. + case-4 TTBR1_EL2 with TCR_EL2.NFD1 == 1. + + The workaround is to ensure these bits are clear in TCR_ELx. + The workaround only affects the Fujitsu-A64FX. + + If unsure, say Y. + +config HISILICON_ERRATUM_161600802 + bool "Hip07 161600802: Erroneous redistributor VLPI base" + default y + help + The HiSilicon Hip07 SoC uses the wrong redistributor base + when issued ITS commands such as VMOVP and VMAPP, and requires + a 128kB offset to be applied to the target address in this commands. + + If unsure, say Y. + +config HISILICON_ERRATUM_1980005 + bool "Hisilicon erratum IDC support" + default n + help + The HiSilicon TSV100/200 SoC support idc but report wrong value to + kernel. + + If unsure, say N. + +config HISILICON_ERRATUM_162100801 + bool "Hip09 162100801 erratum support" + default y + help + When enabled GICv4.1 in hip09, there are some invalid vPE config + in configuration tables for some situation, which will cause vSGI + interrupts lost. So fix it by sending vinvall commands after vmovp. + + If unsure, say Y. + +config HISILICON_ERRATUM_162100125 + bool "Hisilicon erratum 162100125" + default y + help + On Hisilicon LINXICORE9100 cores, sharing tlb entries on two cores when + TTBRx.CNP=1 differs from the standard ARM core. This causes issues when + tlb entries sharing between CPU cores. Avoid these issues by disabling + CNP support for Hisilicon LINXICORE9100 cores. + + If unsure, say Y. + +config HISILICON_ERRATUM_162102203 + bool "Hisilicon erratum 162102203" + default n + help + On HIP09, SMS_ATN bit may be cleared by hardware before it is driven + to process. This will causes bridge commands timeout. Avoid these issues + by modifying the IPMI driver to use variables to store SMS_ATNs that may + be lost. + + If unsure, say N. 
+ config QCOM_FALKOR_ERRATUM_1003 bool "Falkor E1003: Incorrect translation due to ASID change" default y diff --git a/drivers/char/ipmi/ipmi_bt_sm.c b/drivers/char/ipmi/ipmi_bt_sm.c index f3f216cdf686..7ecf7e235b81 100644 --- a/drivers/char/ipmi/ipmi_bt_sm.c +++ b/drivers/char/ipmi/ipmi_bt_sm.c @@ -88,6 +88,13 @@ struct si_sm_data { enum bt_states complete; /* to divert the state machine */ long BT_CAP_req2rsp; int BT_CAP_retries; /* Recommended retries */ + +#ifdef CONFIG_HISILICON_ERRATUM_162102203 + /* Record sms_atn when sms_atn set && bt_state not in idle */ + int sms_atn_flag; + /* If true, need to store SMS_ATN bit */ + bool sms_atn_quirk; +#endif }; #define BT_CLR_WR_PTR 0x01 /* See IPMI 1.5 table 11.6.4 */ @@ -168,6 +175,32 @@ static char *status2txt(unsigned char status) } #define STATUS2TXT status2txt(status) +#ifdef CONFIG_HISILICON_ERRATUM_162102203 +/* + * To confirm whether the SMS_ATN flag needs to be stored and get + * quirk through the method reported by the BIOS. Because in special + * cases SMS_ATN flag bits may be lost before being processed. + */ +static bool get_sms_atn_quirk(struct si_sm_io *io) +{ + acpi_handle handle; + acpi_status status; + unsigned long long tmp; + + handle = ACPI_HANDLE(io->dev); + if (!handle) + return false; + + status = acpi_evaluate_integer(handle, "SATN", NULL, &tmp); + if (ACPI_FAILURE(status)) + return false; + else if (tmp != 1) + return false; + + return true; +} +#endif + /* called externally at insmod time, and internally on cleanup */ static unsigned int bt_init_data(struct si_sm_data *bt, struct si_sm_io *io) @@ -182,6 +215,12 @@ static unsigned int bt_init_data(struct si_sm_data *bt, struct si_sm_io *io) bt->complete = BT_STATE_IDLE; /* end here */ bt->BT_CAP_req2rsp = BT_NORMAL_TIMEOUT * USEC_PER_SEC; bt->BT_CAP_retries = BT_NORMAL_RETRY_LIMIT; + +#ifdef CONFIG_HISILICON_ERRATUM_162102203 + bt->sms_atn_quirk = get_sms_atn_quirk(io); + bt->sms_atn_flag = 0; +#endif + return 3; /* We claim 3 bytes of space; ought to check SPMI table */ } @@ -281,6 +320,11 @@ static void reset_flags(struct si_sm_data *bt) BT_CONTROL(BT_H_BUSY); /* force clear */ BT_CONTROL(BT_CLR_WR_PTR); /* always reset */ BT_CONTROL(BT_SMS_ATN); /* always clear */ + +#ifdef CONFIG_HISILICON_ERRATUM_162102203 + bt->sms_atn_flag = 0; /* Reset sms_atn_flag */ +#endif + BT_INTMASK_W(BT_BMC_HWRST); } @@ -452,6 +496,36 @@ static enum si_sm_result bt_event(struct si_sm_data *bt, long time) int i; status = BT_STATUS; +#ifdef CONFIG_HISILICON_ERRATUM_162102203 + if (bt->sms_atn_quirk) { + /* + * Identify whether the current BT_B2H_ATN is possibly due to + * receiving an SMS message from BMC. If so, only clear the + * incorrectly set BT_B2H_ATN without returning, and continue + * to execute downwards. + */ + if ((bt->state < BT_STATE_WRITE_BYTES) && (status & BT_B2H_ATN) && + (status & BT_SMS_ATN)) { + BT_CONTROL(BT_B2H_ATN); /* Clear it */ + status &= ~BT_B2H_ATN; /* Refresh status */ + if (bt_debug) + dev_dbg(bt->io->dev, "clear wrong B2H_ATN, BT: %s\n", + status2txt(BT_STATUS)); + } + + /* + * Record the SMS_ATN by sms_atn_flag, because SMS_ATN would be clear + * incorrectly by hardware when received a SMS message from BMC if + * bt->state is not in IDLE state. And the BT_SMS_ATN will be lost. 
+ */ + if ((bt->state >= BT_STATE_XACTION_START) && (status & BT_SMS_ATN)) + bt->sms_atn_flag = 1; + + /* Restore SMS_ATN */ + if (bt->sms_atn_flag) + status |= BT_SMS_ATN; + } +#endif bt->nonzero_status |= status; if ((bt_debug & BT_DEBUG_STATES) && (bt->state != last_printed)) { dev_dbg(bt->io->dev, "BT: %s %s TO=%ld - %ld\n", @@ -494,6 +568,10 @@ static enum si_sm_result bt_event(struct si_sm_data *bt, long time) case BT_STATE_IDLE: if (status & BT_SMS_ATN) { BT_CONTROL(BT_SMS_ATN); /* clear it */ + +#ifdef CONFIG_HISILICON_ERRATUM_162102203 + bt->sms_atn_flag = 0; /* Reset sms_atn_flag */ +#endif return SI_SM_ATTN; } -- Gitee From 2c1bffbb6aa5facffa7a1cba8cdfa0160a5cdfa5 Mon Sep 17 00:00:00 2001 From: Yicong Yang Date: Wed, 21 Jun 2023 17:28:04 +0800 Subject: [PATCH 15/18] hwtracing: hisi_ptt: Fix potential sleep in atomic context We're using pci_irq_vector() to obtain the interrupt number and then bind it to the CPU start perf under the protection of spinlock in pmu::start(). pci_irq_vector() might sleep since [1] because it will call msi_domain_get_virq() to get the MSI interrupt number and it needs to acquire dev->msi.data->mutex. Getting a mutex will sleep on contention. So use pci_irq_vector() in an atomic context is problematic. This patch cached the interrupt number in the probe() and uses the cached data instead to avoid potential sleep. [1] commit 82ff8e6b78fc ("PCI/MSI: Use msi_get_virq() in pci_get_vector()") Fixes: ff0de066b463 ("hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device") Reviewed-by: Jonathan Cameron Signed-off-by: Yicong Yang Signed-off-by: Suzuki K Poulose Link: https://lore.kernel.org/r/20230621092804.15120-6-yangyicong@huawei.com Signed-off-by: Slim6882 --- drivers/hwtracing/ptt/hisi_ptt.c | 12 +++++------- drivers/hwtracing/ptt/hisi_ptt.h | 2 ++ 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c index 8f3330d3cc74..865d86b31056 100644 --- a/drivers/hwtracing/ptt/hisi_ptt.c +++ b/drivers/hwtracing/ptt/hisi_ptt.c @@ -341,13 +341,13 @@ static int hisi_ptt_register_irq(struct hisi_ptt *hisi_ptt) if (ret < 0) return ret; - ret = devm_request_threaded_irq(&pdev->dev, - pci_irq_vector(pdev, HISI_PTT_TRACE_DMA_IRQ), + hisi_ptt->trace_irq = pci_irq_vector(pdev, HISI_PTT_TRACE_DMA_IRQ); + ret = devm_request_threaded_irq(&pdev->dev, hisi_ptt->trace_irq, NULL, hisi_ptt_isr, 0, DRV_NAME, hisi_ptt); if (ret) { pci_err(pdev, "failed to request irq %d, ret = %d\n", - pci_irq_vector(pdev, HISI_PTT_TRACE_DMA_IRQ), ret); + hisi_ptt->trace_irq, ret); return ret; } @@ -747,8 +747,7 @@ static void hisi_ptt_pmu_start(struct perf_event *event, int flags) * core in event_function_local(). If CPU passed is offline we'll fail * here, just log it since we can do nothing here. */ - ret = irq_set_affinity(pci_irq_vector(hisi_ptt->pdev, HISI_PTT_TRACE_DMA_IRQ), - cpumask_of(cpu)); + ret = irq_set_affinity(hisi_ptt->trace_irq, cpumask_of(cpu)); if (ret) dev_warn(dev, "failed to set the affinity of trace interrupt\n"); @@ -1008,8 +1007,7 @@ static int hisi_ptt_cpu_teardown(unsigned int cpu, struct hlist_node *node) * Also make sure the interrupt bind to the migrated CPU as well. Warn * the user on failure here. 
*/ - if (irq_set_affinity(pci_irq_vector(hisi_ptt->pdev, HISI_PTT_TRACE_DMA_IRQ), - cpumask_of(target))) + if (irq_set_affinity(hisi_ptt->trace_irq, cpumask_of(target))) dev_warn(dev, "failed to set the affinity of trace interrupt\n"); hisi_ptt->trace_ctrl.on_cpu = target; diff --git a/drivers/hwtracing/ptt/hisi_ptt.h b/drivers/hwtracing/ptt/hisi_ptt.h index 5beb1648c93a..948a4c423152 100644 --- a/drivers/hwtracing/ptt/hisi_ptt.h +++ b/drivers/hwtracing/ptt/hisi_ptt.h @@ -166,6 +166,7 @@ struct hisi_ptt_pmu_buf { * @pdev: pci_dev of this PTT device * @tune_lock: lock to serialize the tune process * @pmu_lock: lock to serialize the perf process + * @trace_irq: interrupt number used by trace * @upper_bdf: the upper BDF range of the PCI devices managed by this PTT device * @lower_bdf: the lower BDF range of the PCI devices managed by this PTT device * @port_filters: the filter list of root ports @@ -180,6 +181,7 @@ struct hisi_ptt { struct pci_dev *pdev; struct mutex tune_lock; spinlock_t pmu_lock; + int trace_irq; u32 upper_bdf; u32 lower_bdf; -- Gitee From 86f2da9594963b256b3870cd2bed14f91c6ae5bd Mon Sep 17 00:00:00 2001 From: Yicong Yang Date: Wed, 21 Jun 2023 17:28:02 +0800 Subject: [PATCH 16/18] hwtracing: hisi_ptt: Export available filters through sysfs The PTT can only filter the traced TLP headers by the Root Ports or the Requester ID of the Endpoint, which are located on the same PCIe core of the PTT device. The filter value used is derived from the BDF number of the supported Root Port or the Endpoint. It's not friendly enough for the users since it requires the user to be familiar enough with the platform and calculate the filter value manually. This patch export the available filters through sysfs. Each available filters is presented as an individual file with the name of the BDF number of the related PCIe device. The files are created under $(PTT PMU dir)/available_root_port_filters and $(PTT PMU dir)/available_requester_filters respectively. The filter value can be known by reading the related file. Then the users can easily know the available filters for trace and get the filter values without calculating. Reviewed-by: Jonathan Cameron Signed-off-by: Yicong Yang Signed-off-by: Suzuki K Poulose Link: https://lore.kernel.org/r/20230621092804.15120-4-yangyicong@huawei.com Signed-off-by: Slim6882 --- .../ABI/testing/sysfs-devices-hisi_ptt | 52 +++ Documentation/trace/hisi-ptt.rst | 6 + drivers/hwtracing/ptt/hisi_ptt.c | 331 ++++++++++++++++++ drivers/hwtracing/ptt/hisi_ptt.h | 17 + 4 files changed, 406 insertions(+) diff --git a/Documentation/ABI/testing/sysfs-devices-hisi_ptt b/Documentation/ABI/testing/sysfs-devices-hisi_ptt index 82de6d710266..d7e206b4901c 100644 --- a/Documentation/ABI/testing/sysfs-devices-hisi_ptt +++ b/Documentation/ABI/testing/sysfs-devices-hisi_ptt @@ -59,3 +59,55 @@ Description: (RW) Control the allocated buffer watermark of outbound packets. The available tune data is [0, 1, 2]. Writing a negative value will return an error, and out of range values will be converted to 2. The value indicates a probable level of the event. + +What: /sys/devices/hisi_ptt_/root_port_filters +Date: May 2023 +KernelVersion: 6.5 +Contact: Yicong Yang +Description: This directory contains the files providing the PCIe Root Port filters + information used for PTT trace. Each file is named after the supported + Root Port device name ::.. + + See the description of the "filter" in Documentation/trace/hisi-ptt.rst + for more information. 
+ +What: /sys/devices/hisi_ptt_/root_port_filters/multiselect +Date: May 2023 +KernelVersion: 6.5 +Contact: Yicong Yang +Description: (Read) Indicates if this kind of filter can be selected at the same + time as others filters, or must be used on it's own. 1 indicates + the former case and 0 indicates the latter. + +What: /sys/devices/hisi_ptt_/root_port_filters/ +Date: May 2023 +KernelVersion: 6.5 +Contact: Yicong Yang +Description: (Read) Indicates the filter value of this Root Port filter, which + can be used to control the TLP headers to trace by the PTT trace. + +What: /sys/devices/hisi_ptt_/requester_filters +Date: May 2023 +KernelVersion: 6.5 +Contact: Yicong Yang +Description: This directory contains the files providing the PCIe Requester filters + information used for PTT trace. Each file is named after the supported + Endpoint device name ::.. + + See the description of the "filter" in Documentation/trace/hisi-ptt.rst + for more information. + +What: /sys/devices/hisi_ptt_/requester_filters/multiselect +Date: May 2023 +KernelVersion: 6.5 +Contact: Yicong Yang +Description: (Read) Indicates if this kind of filter can be selected at the same + time as others filters, or must be used on it's own. 1 indicates + the former case and 0 indicates the latter. + +What: /sys/devices/hisi_ptt_/requester_filters/ +Date: May 2023 +KernelVersion: 6.5 +Contact: Yicong Yang +Description: (Read) Indicates the filter value of this Requester filter, which + can be used to control the TLP headers to trace by the PTT trace. diff --git a/Documentation/trace/hisi-ptt.rst b/Documentation/trace/hisi-ptt.rst index 4f87d8e21065..090bf3e8e8f9 100644 --- a/Documentation/trace/hisi-ptt.rst +++ b/Documentation/trace/hisi-ptt.rst @@ -148,6 +148,12 @@ For example, if the desired filter is Endpoint function 0000:01:00.1 the filter value will be 0x00101. If the desired filter is Root Port 0000:00:10.0 then then filter value is calculated as 0x80001. +The driver also presents every supported Root Port and Requester filter through +sysfs. Each filter will be an individual file with name of its related PCIe +device name (domain:bus:device.function). The files of Root Port filters are +under $(PTT PMU dir)/root_port_filters and files of Requester filters +are under $(PTT PMU dir)/requester_filters. + Note that multiple Root Ports can be specified at one time, but only one Endpoint function can be specified in one trace. Specifying both Root Port and function at the same time is not supported. Driver maintains a list of diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c index 865d86b31056..bd05c4c8918a 100644 --- a/drivers/hwtracing/ptt/hisi_ptt.c +++ b/drivers/hwtracing/ptt/hisi_ptt.c @@ -382,6 +382,283 @@ static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data) list_add_tail(&filter->list, &hisi_ptt->req_filters); } + return filter; +} + +static ssize_t hisi_ptt_filter_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + struct hisi_ptt_filter_desc *filter; + unsigned long filter_val; + + filter = container_of(attr, struct hisi_ptt_filter_desc, attr); + filter_val = hisi_ptt_get_filter_val(filter->devid, filter->is_port) | + (filter->is_port ? 
HISI_PTT_PMU_FILTER_IS_PORT : 0); + + return sysfs_emit(buf, "0x%05lx\n", filter_val); +} + +static int hisi_ptt_create_rp_filter_attr(struct hisi_ptt *hisi_ptt, + struct hisi_ptt_filter_desc *filter) +{ + struct kobject *kobj = &hisi_ptt->hisi_ptt_pmu.dev->kobj; + + sysfs_attr_init(&filter->attr.attr); + filter->attr.attr.name = filter->name; + filter->attr.attr.mode = 0400; /* DEVICE_ATTR_ADMIN_RO */ + filter->attr.show = hisi_ptt_filter_show; + + return sysfs_add_file_to_group(kobj, &filter->attr.attr, + HISI_PTT_RP_FILTERS_GRP_NAME); +} + +static void hisi_ptt_remove_rp_filter_attr(struct hisi_ptt *hisi_ptt, + struct hisi_ptt_filter_desc *filter) +{ + struct kobject *kobj = &hisi_ptt->hisi_ptt_pmu.dev->kobj; + + sysfs_remove_file_from_group(kobj, &filter->attr.attr, + HISI_PTT_RP_FILTERS_GRP_NAME); +} + +static int hisi_ptt_create_req_filter_attr(struct hisi_ptt *hisi_ptt, + struct hisi_ptt_filter_desc *filter) +{ + struct kobject *kobj = &hisi_ptt->hisi_ptt_pmu.dev->kobj; + + sysfs_attr_init(&filter->attr.attr); + filter->attr.attr.name = filter->name; + filter->attr.attr.mode = 0400; /* DEVICE_ATTR_ADMIN_RO */ + filter->attr.show = hisi_ptt_filter_show; + + return sysfs_add_file_to_group(kobj, &filter->attr.attr, + HISI_PTT_REQ_FILTERS_GRP_NAME); +} + +static void hisi_ptt_remove_req_filter_attr(struct hisi_ptt *hisi_ptt, + struct hisi_ptt_filter_desc *filter) +{ + struct kobject *kobj = &hisi_ptt->hisi_ptt_pmu.dev->kobj; + + sysfs_remove_file_from_group(kobj, &filter->attr.attr, + HISI_PTT_REQ_FILTERS_GRP_NAME); +} + +static int hisi_ptt_create_filter_attr(struct hisi_ptt *hisi_ptt, + struct hisi_ptt_filter_desc *filter) +{ + int ret; + + if (filter->is_port) + ret = hisi_ptt_create_rp_filter_attr(hisi_ptt, filter); + else + ret = hisi_ptt_create_req_filter_attr(hisi_ptt, filter); + + if (ret) + pci_err(hisi_ptt->pdev, "failed to create sysfs attribute for filter %s\n", + filter->name); + + return ret; +} + +static void hisi_ptt_remove_filter_attr(struct hisi_ptt *hisi_ptt, + struct hisi_ptt_filter_desc *filter) +{ + if (filter->is_port) + hisi_ptt_remove_rp_filter_attr(hisi_ptt, filter); + else + hisi_ptt_remove_req_filter_attr(hisi_ptt, filter); +} + +static void hisi_ptt_remove_all_filter_attributes(void *data) +{ + struct hisi_ptt_filter_desc *filter; + struct hisi_ptt *hisi_ptt = data; + + mutex_lock(&hisi_ptt->filter_lock); + + list_for_each_entry(filter, &hisi_ptt->req_filters, list) + hisi_ptt_remove_filter_attr(hisi_ptt, filter); + + list_for_each_entry(filter, &hisi_ptt->port_filters, list) + hisi_ptt_remove_filter_attr(hisi_ptt, filter); + + hisi_ptt->sysfs_inited = false; + mutex_unlock(&hisi_ptt->filter_lock); +} + +static int hisi_ptt_init_filter_attributes(struct hisi_ptt *hisi_ptt) +{ + struct hisi_ptt_filter_desc *filter; + int ret; + + mutex_lock(&hisi_ptt->filter_lock); + + /* + * Register the reset callback in the first stage. In reset we traverse + * the filters list to remove the sysfs attributes so the callback can + * be called safely even without below filter attributes creation. 
+ */ + ret = devm_add_action(&hisi_ptt->pdev->dev, + hisi_ptt_remove_all_filter_attributes, + hisi_ptt); + if (ret) + goto out; + + list_for_each_entry(filter, &hisi_ptt->port_filters, list) { + ret = hisi_ptt_create_filter_attr(hisi_ptt, filter); + if (ret) + goto out; + } + + list_for_each_entry(filter, &hisi_ptt->req_filters, list) { + ret = hisi_ptt_create_filter_attr(hisi_ptt, filter); + if (ret) + goto out; + } + + hisi_ptt->sysfs_inited = true; +out: + mutex_unlock(&hisi_ptt->filter_lock); + return ret; +} + +static void hisi_ptt_update_filters(struct work_struct *work) +{ + struct delayed_work *delayed_work = to_delayed_work(work); + struct hisi_ptt_filter_update_info info; + struct hisi_ptt_filter_desc *filter; + struct hisi_ptt *hisi_ptt; + + hisi_ptt = container_of(delayed_work, struct hisi_ptt, work); + + if (!mutex_trylock(&hisi_ptt->filter_lock)) { + schedule_delayed_work(&hisi_ptt->work, HISI_PTT_WORK_DELAY_MS); + return; + } + + while (kfifo_get(&hisi_ptt->filter_update_kfifo, &info)) { + if (info.is_add) { + /* + * Notify the users if failed to add this filter, others + * still work and available. See the comments in + * hisi_ptt_init_filters(). + */ + filter = hisi_ptt_alloc_add_filter(hisi_ptt, info.devid, info.is_port); + if (!filter) + continue; + + /* + * If filters' sysfs entries hasn't been initialized, + * then we're still at probe stage. Add the filters to + * the list and later hisi_ptt_init_filter_attributes() + * will create sysfs attributes for all the filters. + */ + if (hisi_ptt->sysfs_inited && + hisi_ptt_create_filter_attr(hisi_ptt, filter)) { + hisi_ptt_del_free_filter(hisi_ptt, filter); + continue; + } + } else { + struct hisi_ptt_filter_desc *tmp; + struct list_head *target_list; + + target_list = info.is_port ? &hisi_ptt->port_filters : + &hisi_ptt->req_filters; + + list_for_each_entry_safe(filter, tmp, target_list, list) + if (filter->devid == info.devid) { + if (hisi_ptt->sysfs_inited) + hisi_ptt_remove_filter_attr(hisi_ptt, filter); + + hisi_ptt_del_free_filter(hisi_ptt, filter); + break; + } + } + } + + mutex_unlock(&hisi_ptt->filter_lock); +} + +/* + * A PCI bus notifier is used here for dynamically updating the filter + * list. + */ +static int hisi_ptt_notifier_call(struct notifier_block *nb, unsigned long action, + void *data) +{ + struct hisi_ptt *hisi_ptt = container_of(nb, struct hisi_ptt, hisi_ptt_nb); + struct hisi_ptt_filter_update_info info; + struct pci_dev *pdev, *root_port; + struct device *dev = data; + u32 port_devid; + + pdev = to_pci_dev(dev); + root_port = pcie_find_root_port(pdev); + if (!root_port) + return 0; + + port_devid = PCI_DEVID(root_port->bus->number, root_port->devfn); + if (port_devid < hisi_ptt->lower_bdf || + port_devid > hisi_ptt->upper_bdf) + return 0; + + info.is_port = pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT; + info.devid = PCI_DEVID(pdev->bus->number, pdev->devfn); + + switch (action) { + case BUS_NOTIFY_ADD_DEVICE: + info.is_add = true; + break; + case BUS_NOTIFY_DEL_DEVICE: + info.is_add = false; + break; + default: + return 0; + } + + /* + * The FIFO size is 16 which is sufficient for almost all the cases, + * since each PCIe core will have most 8 Root Ports (typically only + * 1~4 Root Ports). On failure log the failed filter and let user + * handle it. 
+ */ + if (kfifo_in_spinlocked(&hisi_ptt->filter_update_kfifo, &info, 1, + &hisi_ptt->filter_update_lock)) + schedule_delayed_work(&hisi_ptt->work, 0); + else + pci_warn(hisi_ptt->pdev, + "filter update fifo overflow for target %s\n", + pci_name(pdev)); + + return 0; +} + +static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data) +{ + struct pci_dev *root_port = pcie_find_root_port(pdev); + struct hisi_ptt_filter_desc *filter; + struct hisi_ptt *hisi_ptt = data; + u32 port_devid; + + if (!root_port) + return 0; + + port_devid = PCI_DEVID(root_port->bus->number, root_port->devfn); + if (port_devid < hisi_ptt->lower_bdf || + port_devid > hisi_ptt->upper_bdf) + return 0; + + /* + * We won't fail the probe if filter allocation failed here. The filters + * should be partial initialized and users would know which filter fails + * through the log. Other functions of PTT device are still available. + */ + filter = hisi_ptt_alloc_add_filter(hisi_ptt, PCI_DEVID(pdev->bus->number, pdev->devfn), + pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT); + if (!filter) + return -ENOMEM; + return 0; } @@ -518,10 +795,58 @@ static struct attribute_group hisi_ptt_pmu_format_group = { .attrs = hisi_ptt_pmu_format_attrs, }; +static ssize_t hisi_ptt_filter_multiselect_show(struct device *dev, + struct device_attribute *attr, + char *buf) +{ + struct dev_ext_attribute *ext_attr; + + ext_attr = container_of(attr, struct dev_ext_attribute, attr); + return sysfs_emit(buf, "%s\n", (char *)ext_attr->var); +} + +static struct dev_ext_attribute root_port_filters_multiselect = { + .attr = { + .attr = { .name = "multiselect", .mode = 0400 }, + .show = hisi_ptt_filter_multiselect_show, + }, + .var = "1", +}; + +static struct attribute *hisi_ptt_pmu_root_ports_attrs[] = { + &root_port_filters_multiselect.attr.attr, + NULL +}; + +static struct attribute_group hisi_ptt_pmu_root_ports_group = { + .name = HISI_PTT_RP_FILTERS_GRP_NAME, + .attrs = hisi_ptt_pmu_root_ports_attrs, +}; + +static struct dev_ext_attribute requester_filters_multiselect = { + .attr = { + .attr = { .name = "multiselect", .mode = 0400 }, + .show = hisi_ptt_filter_multiselect_show, + }, + .var = "0", +}; + +static struct attribute *hisi_ptt_pmu_requesters_attrs[] = { + &requester_filters_multiselect.attr.attr, + NULL +}; + +static struct attribute_group hisi_ptt_pmu_requesters_group = { + .name = HISI_PTT_REQ_FILTERS_GRP_NAME, + .attrs = hisi_ptt_pmu_requesters_attrs, +}; + static const struct attribute_group *hisi_ptt_pmu_groups[] = { &hisi_ptt_cpumask_attr_group, &hisi_ptt_pmu_format_group, &hisi_ptt_tune_group, + &hisi_ptt_pmu_root_ports_group, + &hisi_ptt_pmu_requesters_group, NULL }; @@ -967,6 +1292,12 @@ static int hisi_ptt_probe(struct pci_dev *pdev, return ret; } + ret = hisi_ptt_init_filter_attributes(hisi_ptt); + if (ret) { + pci_err(pdev, "failed to init sysfs filter attributes, ret = %d", ret); + return ret; + } + return 0; } diff --git a/drivers/hwtracing/ptt/hisi_ptt.h b/drivers/hwtracing/ptt/hisi_ptt.h index 948a4c423152..e9839362b79c 100644 --- a/drivers/hwtracing/ptt/hisi_ptt.h +++ b/drivers/hwtracing/ptt/hisi_ptt.h @@ -11,6 +11,8 @@ #include #include +#include +#include #include #include #include @@ -131,13 +133,24 @@ struct hisi_ptt_trace_ctrl { u32 type:4; }; +/* + * sysfs attribute group name for root port filters and requester filters: + * /sys/devices/hisi_ptt_/root_port_filters + * and + * /sys/devices/hisi_ptt_/requester_filters + */ +#define HISI_PTT_RP_FILTERS_GRP_NAME "root_port_filters" +#define HISI_PTT_REQ_FILTERS_GRP_NAME 
"requester_filters" + /** * struct hisi_ptt_filter_desc - Descriptor of the PTT trace filter + * @attr: sysfs attribute of this filter * @list: entry of this descriptor in the filter list * @is_port: the PCI device of the filter is a Root Port or not * @devid: the PCI device's devid of the filter */ struct hisi_ptt_filter_desc { + struct device_attribute attr; struct list_head list; bool is_port; u16 devid; @@ -171,6 +184,8 @@ struct hisi_ptt_pmu_buf { * @lower_bdf: the lower BDF range of the PCI devices managed by this PTT device * @port_filters: the filter list of root ports * @req_filters: the filter list of requester ID + * @filter_lock: lock to protect the filters + * @sysfs_inited: whether the filters' sysfs entries has been initialized * @port_mask: port mask of the managed root ports */ struct hisi_ptt { @@ -194,6 +209,8 @@ struct hisi_ptt { */ struct list_head port_filters; struct list_head req_filters; + struct mutex filter_lock; + bool sysfs_inited; u16 port_mask; }; -- Gitee From 4a01406f3a8625783a6a61ea74eead649261bb89 Mon Sep 17 00:00:00 2001 From: Yicong Yang Date: Thu, 7 Dec 2023 19:56:24 +0800 Subject: [PATCH 17/18] perf hisi-ptt: Fix one memory leakage in hisi_ptt_process_auxtrace_event() driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I8MEWF CVE: NA -------------------------------- ASan complains a memory leakage in hisi_ptt_process_auxtrace_event() that the data buffer is not freed. Since currently we only support the raw dump trace mode, the data buffer is used only within this function. So fix this by freeing the data buffer before going out. Fixes: 5e91e57e6809 ("perf auxtrace arm64: Add support for parsing HiSilicon PCIe Trace packet") Signed-off-by: Yicong Yang Signed-off-by: Junhao He Signed-off-by: Slim6882 --- tools/perf/util/hisi-ptt.c | 195 +++++++++++++++++++++++++++++++++++++ 1 file changed, 195 insertions(+) create mode 100644 tools/perf/util/hisi-ptt.c diff --git a/tools/perf/util/hisi-ptt.c b/tools/perf/util/hisi-ptt.c new file mode 100644 index 000000000000..52d0ce302ca0 --- /dev/null +++ b/tools/perf/util/hisi-ptt.c @@ -0,0 +1,195 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * HiSilicon PCIe Trace and Tuning (PTT) support + * Copyright (c) 2022 HiSilicon Technologies Co., Ltd. + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "auxtrace.h" +#include "color.h" +#include "debug.h" +#include "evsel.h" +#include "hisi-ptt.h" +#include "hisi-ptt-decoder/hisi-ptt-pkt-decoder.h" +#include "machine.h" +#include "session.h" +#include "tool.h" +#include + +struct hisi_ptt { + struct auxtrace auxtrace; + u32 auxtrace_type; + struct perf_session *session; + struct machine *machine; + u32 pmu_type; +}; + +struct hisi_ptt_queue { + struct hisi_ptt *ptt; + struct auxtrace_buffer *buffer; +}; + +static enum hisi_ptt_pkt_type hisi_ptt_check_packet_type(unsigned char *buf) +{ + uint32_t head = *(uint32_t *)buf; + + if ((HISI_PTT_8DW_CHECK_MASK & head) == HISI_PTT_IS_8DW_PKT) + return HISI_PTT_8DW_PKT; + + return HISI_PTT_4DW_PKT; +} + +static void hisi_ptt_dump(struct hisi_ptt *ptt __maybe_unused, + unsigned char *buf, size_t len) +{ + const char *color = PERF_COLOR_BLUE; + enum hisi_ptt_pkt_type type; + size_t pos = 0; + int pkt_len; + + type = hisi_ptt_check_packet_type(buf); + len = round_down(len, hisi_ptt_pkt_size[type]); + color_fprintf(stdout, color, ". ... 
HISI PTT data: size %zu bytes\n", + len); + + while (len > 0) { + pkt_len = hisi_ptt_pkt_desc(buf, pos, type); + if (!pkt_len) + color_fprintf(stdout, color, " Bad packet!\n"); + + pos += pkt_len; + len -= pkt_len; + } +} + +static void hisi_ptt_dump_event(struct hisi_ptt *ptt, unsigned char *buf, + size_t len) +{ + printf(".\n"); + + hisi_ptt_dump(ptt, buf, len); +} + +static int hisi_ptt_process_event(struct perf_session *session __maybe_unused, + union perf_event *event __maybe_unused, + struct perf_sample *sample __maybe_unused, + struct perf_tool *tool __maybe_unused) +{ + return 0; +} + +static int hisi_ptt_process_auxtrace_event(struct perf_session *session, + union perf_event *event, + struct perf_tool *tool __maybe_unused) +{ + struct hisi_ptt *ptt = container_of(session->auxtrace, struct hisi_ptt, + auxtrace); + int fd = perf_data__fd(session->data); + int size = event->auxtrace.size; + void *data = malloc(size); + off_t data_offset; + int err; + + if (!data) + return -errno; + + if (perf_data__is_pipe(session->data)) { + data_offset = 0; + } else { + data_offset = lseek(fd, 0, SEEK_CUR); + if (data_offset == -1) { + free(data); + return -errno; + } + } + + err = readn(fd, data, size); + if (err != (ssize_t)size) { + free(data); + return -errno; + } + + if (dump_trace) + hisi_ptt_dump_event(ptt, data, size); + + free(data); + return 0; +} + +static int hisi_ptt_flush(struct perf_session *session __maybe_unused, + struct perf_tool *tool __maybe_unused) +{ + return 0; +} + +static void hisi_ptt_free_events(struct perf_session *session __maybe_unused) +{ +} + +static void hisi_ptt_free(struct perf_session *session) +{ + struct hisi_ptt *ptt = container_of(session->auxtrace, struct hisi_ptt, + auxtrace); + + session->auxtrace = NULL; + free(ptt); +} + +static bool hisi_ptt_evsel_is_auxtrace(struct perf_session *session, + struct evsel *evsel) +{ + struct hisi_ptt *ptt = container_of(session->auxtrace, struct hisi_ptt, auxtrace); + + return evsel->core.attr.type == ptt->pmu_type; +} + +static void hisi_ptt_print_info(__u64 type) +{ + if (!dump_trace) + return; + + fprintf(stdout, " PMU Type %" PRId64 "\n", (s64) type); +} + +int hisi_ptt_process_auxtrace_info(union perf_event *event, + struct perf_session *session) +{ + struct perf_record_auxtrace_info *auxtrace_info = &event->auxtrace_info; + struct hisi_ptt *ptt; + + if (auxtrace_info->header.size < HISI_PTT_AUXTRACE_PRIV_SIZE + + sizeof(struct perf_record_auxtrace_info)) + return -EINVAL; + + ptt = zalloc(sizeof(*ptt)); + if (!ptt) + return -ENOMEM; + + ptt->session = session; + ptt->machine = &session->machines.host; /* No kvm support */ + ptt->auxtrace_type = auxtrace_info->type; + ptt->pmu_type = auxtrace_info->priv[0]; + + ptt->auxtrace.process_event = hisi_ptt_process_event; + ptt->auxtrace.process_auxtrace_event = hisi_ptt_process_auxtrace_event; + ptt->auxtrace.flush_events = hisi_ptt_flush; + ptt->auxtrace.free_events = hisi_ptt_free_events; + ptt->auxtrace.free = hisi_ptt_free; + ptt->auxtrace.evsel_is_auxtrace = hisi_ptt_evsel_is_auxtrace; + session->auxtrace = &ptt->auxtrace; + + hisi_ptt_print_info(auxtrace_info->priv[0]); + + return 0; +} -- Gitee From 127a8808423580885577e4a356331624b0013838 Mon Sep 17 00:00:00 2001 From: Yicong Yang Date: Wed, 21 Jun 2023 17:28:01 +0800 Subject: [PATCH 18/18] hwtracing: hisi_ptt: Add support for dynamically updating the filter list The PCIe devices supported by the PTT trace can be removed/rescanned by hotplug or through sysfs. 
Add support for dynamically updating the available filter list by registering a PCI bus notifier block. Then user can always get latest information about available tracing filters and driver can block the invalid filters of which related devices no longer exist in the system. Reviewed-by: Jonathan Cameron Signed-off-by: Yicong Yang Signed-off-by: Suzuki K Poulose Link: https://lore.kernel.org/r/20230621092804.15120-3-yangyicong@huawei.com Signed-off-by: Slim6882 --- Documentation/trace/hisi-ptt.rst | 6 +-- drivers/hwtracing/ptt/hisi_ptt.c | 84 ++++++++++++++++++++++++++++---- drivers/hwtracing/ptt/hisi_ptt.h | 37 ++++++++++++++ 3 files changed, 114 insertions(+), 13 deletions(-) diff --git a/Documentation/trace/hisi-ptt.rst b/Documentation/trace/hisi-ptt.rst index 090bf3e8e8f9..989255eb5622 100644 --- a/Documentation/trace/hisi-ptt.rst +++ b/Documentation/trace/hisi-ptt.rst @@ -159,9 +159,9 @@ Endpoint function can be specified in one trace. Specifying both Root Port and function at the same time is not supported. Driver maintains a list of available filters and will check the invalid inputs. -Currently the available filters are detected in driver's probe. If the supported -devices are removed/added after probe, you may need to reload the driver to update -the filters. +The available filters will be dynamically updated, which means you will always +get correct filter information when hotplug events happen, or when you manually +remove/rescan the devices. 2. Type ------- diff --git a/drivers/hwtracing/ptt/hisi_ptt.c b/drivers/hwtracing/ptt/hisi_ptt.c index bd05c4c8918a..8e40fc236a23 100644 --- a/drivers/hwtracing/ptt/hisi_ptt.c +++ b/drivers/hwtracing/ptt/hisi_ptt.c @@ -354,10 +354,32 @@ static int hisi_ptt_register_irq(struct hisi_ptt *hisi_ptt) return 0; } -static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data) +static void hisi_ptt_del_free_filter(struct hisi_ptt *hisi_ptt, + struct hisi_ptt_filter_desc *filter) +{ + if (filter->is_port) + hisi_ptt->port_mask &= ~hisi_ptt_get_filter_val(filter->devid, true); + + list_del(&filter->list); + kfree(filter->name); + kfree(filter); +} + +static struct hisi_ptt_filter_desc * +hisi_ptt_alloc_add_filter(struct hisi_ptt *hisi_ptt, u16 devid, bool is_port) { struct hisi_ptt_filter_desc *filter; - struct hisi_ptt *hisi_ptt = data; + u8 devfn = devid & 0xff; + char *filter_name; + + filter_name = kasprintf(GFP_KERNEL, "%04x:%02x:%02x.%d", pci_domain_nr(hisi_ptt->pdev->bus), + PCI_BUS_NUM(devid), PCI_SLOT(devfn), PCI_FUNC(devfn)); + if (!filter_name) { + pci_err(hisi_ptt->pdev, "failed to allocate name for filter %04x:%02x:%02x.%d\n", + pci_domain_nr(hisi_ptt->pdev->bus), PCI_BUS_NUM(devid), + PCI_SLOT(devfn), PCI_FUNC(devfn)); + return NULL; + } /* * We won't fail the probe if filter allocation failed here. 
The filters @@ -366,14 +388,17 @@ static int hisi_ptt_init_filters(struct pci_dev *pdev, void *data) */ filter = kzalloc(sizeof(*filter), GFP_KERNEL); if (!filter) { - pci_err(hisi_ptt->pdev, "failed to add filter %s\n", pci_name(pdev)); - return -ENOMEM; + pci_err(hisi_ptt->pdev, "failed to add filter for %s\n", + filter_name); + kfree(filter_name); + return NULL; } - filter->devid = PCI_DEVID(pdev->bus->number, pdev->devfn); + filter->name = filter_name; + filter->is_port = is_port; + filter->devid = devid; - if (pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT) { - filter->is_port = true; + if (filter->is_port) { list_add_tail(&filter->list, &hisi_ptt->port_filters); /* Update the available port mask */ @@ -718,8 +743,13 @@ static int hisi_ptt_init_ctrls(struct hisi_ptt *hisi_ptt) int ret; u32 reg; + INIT_DELAYED_WORK(&hisi_ptt->work, hisi_ptt_update_filters); + INIT_KFIFO(hisi_ptt->filter_update_kfifo); + spin_lock_init(&hisi_ptt->filter_update_lock); + INIT_LIST_HEAD(&hisi_ptt->port_filters); INIT_LIST_HEAD(&hisi_ptt->req_filters); + mutex_init(&hisi_ptt->filter_lock); ret = hisi_ptt_config_trace_buf(hisi_ptt); if (ret) @@ -920,6 +950,7 @@ static int hisi_ptt_trace_valid_filter(struct hisi_ptt *hisi_ptt, u64 config) { unsigned long val, port_mask = hisi_ptt->port_mask; struct hisi_ptt_filter_desc *filter; + int ret = 0; hisi_ptt->trace_ctrl.is_port = FIELD_GET(HISI_PTT_PMU_FILTER_IS_PORT, config); val = FIELD_GET(HISI_PTT_PMU_FILTER_VAL_MASK, config); @@ -933,16 +964,20 @@ static int hisi_ptt_trace_valid_filter(struct hisi_ptt *hisi_ptt, u64 config) * For Requester ID filters, walk the available filter list to see * whether we have one matched. */ + mutex_lock(&hisi_ptt->filter_lock); if (!hisi_ptt->trace_ctrl.is_port) { list_for_each_entry(filter, &hisi_ptt->req_filters, list) { if (val == hisi_ptt_get_filter_val(filter->devid, filter->is_port)) - return 0; + goto out; } } else if (bitmap_subset(&val, &port_mask, BITS_PER_LONG)) { - return 0; + goto out; } - return -EINVAL; + ret = -EINVAL; +out: + mutex_unlock(&hisi_ptt->filter_lock); + return ret; } static void hisi_ptt_pmu_init_configs(struct hisi_ptt *hisi_ptt, struct perf_event *event) @@ -1215,6 +1250,31 @@ static int hisi_ptt_register_pmu(struct hisi_ptt *hisi_ptt) &hisi_ptt->hisi_ptt_pmu); } +static void hisi_ptt_unregister_filter_update_notifier(void *data) +{ + struct hisi_ptt *hisi_ptt = data; + + bus_unregister_notifier(&pci_bus_type, &hisi_ptt->hisi_ptt_nb); + + /* Cancel any work that has been queued */ + cancel_delayed_work_sync(&hisi_ptt->work); +} + +/* Register the bus notifier for dynamically updating the filter list */ +static int hisi_ptt_register_filter_update_notifier(struct hisi_ptt *hisi_ptt) +{ + int ret; + + hisi_ptt->hisi_ptt_nb.notifier_call = hisi_ptt_notifier_call; + ret = bus_register_notifier(&pci_bus_type, &hisi_ptt->hisi_ptt_nb); + if (ret) + return ret; + + return devm_add_action_or_reset(&hisi_ptt->pdev->dev, + hisi_ptt_unregister_filter_update_notifier, + hisi_ptt); +} + /* * The DMA of PTT trace can only use direct mappings due to some * hardware restriction. 
Check whether there is no IOMMU or the @@ -1286,6 +1346,10 @@ static int hisi_ptt_probe(struct pci_dev *pdev, return ret; } + ret = hisi_ptt_register_filter_update_notifier(hisi_ptt); + if (ret) + pci_warn(pdev, "failed to register filter update notifier, ret = %d", ret); + ret = hisi_ptt_register_pmu(hisi_ptt); if (ret) { pci_err(pdev, "failed to register PMU device, ret = %d", ret); diff --git a/drivers/hwtracing/ptt/hisi_ptt.h b/drivers/hwtracing/ptt/hisi_ptt.h index e9839362b79c..e17f045d7e72 100644 --- a/drivers/hwtracing/ptt/hisi_ptt.h +++ b/drivers/hwtracing/ptt/hisi_ptt.h @@ -15,10 +15,12 @@ #include #include #include +#include #include #include #include #include +#include #define DRV_NAME "hisi_ptt" @@ -73,6 +75,11 @@ #define HISI_PTT_WAIT_TRACE_TIMEOUT_US 100UL #define HISI_PTT_WAIT_POLL_INTERVAL_US 10UL +/* FIFO size for dynamically updating the PTT trace filter list. */ +#define HISI_PTT_FILTER_UPDATE_FIFO_SIZE 16 +/* Delay time for filter updating work */ +#define HISI_PTT_WORK_DELAY_MS 100UL + #define HISI_PCIE_CORE_PORT_ID(devfn) ((PCI_SLOT(devfn) & 0x7) << 1) /* Definition of the PMU configs */ @@ -147,12 +154,26 @@ struct hisi_ptt_trace_ctrl { * @attr: sysfs attribute of this filter * @list: entry of this descriptor in the filter list * @is_port: the PCI device of the filter is a Root Port or not + * @name: name of this filter, same as the name of the related PCI device * @devid: the PCI device's devid of the filter */ struct hisi_ptt_filter_desc { struct device_attribute attr; struct list_head list; bool is_port; + char *name; + u16 devid; +}; + +/** + * struct hisi_ptt_filter_update_info - Information for PTT filter updating + * @is_port: the PCI device to update is a Root Port or not + * @is_add: adding to the filter or not + * @devid: the PCI device's devid of the filter + */ +struct hisi_ptt_filter_update_info { + bool is_port; + bool is_add; u16 devid; }; @@ -173,6 +194,7 @@ struct hisi_ptt_pmu_buf { /** * struct hisi_ptt - Per PTT device data * @trace_ctrl: the control information of PTT trace + * @hisi_ptt_nb: dynamic filter update notifier * @hotplug_node: node for register cpu hotplug event * @hisi_ptt_pmu: the pum device of trace * @iobase: base IO address of the device @@ -187,9 +209,13 @@ struct hisi_ptt_pmu_buf { * @filter_lock: lock to protect the filters * @sysfs_inited: whether the filters' sysfs entries has been initialized * @port_mask: port mask of the managed root ports + * @work: delayed work for filter updating + * @filter_update_lock: spinlock to protect the filter update fifo + * @filter_update_fifo: fifo of the filters waiting to update the filter list */ struct hisi_ptt { struct hisi_ptt_trace_ctrl trace_ctrl; + struct notifier_block hisi_ptt_nb; struct hlist_node hotplug_node; struct pmu hisi_ptt_pmu; void __iomem *iobase; @@ -212,6 +238,17 @@ struct hisi_ptt { struct mutex filter_lock; bool sysfs_inited; u16 port_mask; + + /* + * We use a delayed work here to avoid indefinitely waiting for + * the hisi_ptt->mutex which protecting the filter list. The + * work will be delayed only if the mutex can not be held, + * otherwise no delay will be applied. + */ + struct delayed_work work; + spinlock_t filter_update_lock; + DECLARE_KFIFO(filter_update_kfifo, struct hisi_ptt_filter_update_info, + HISI_PTT_FILTER_UPDATE_FIFO_SIZE); }; #define to_hisi_ptt(pmu) container_of(pmu, struct hisi_ptt, hisi_ptt_pmu) -- Gitee
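For reference, a condensed, self-contained sketch of the notifier/kfifo/delayed-work pattern the filter-update patch above relies on. All demo_* names are illustrative, and the real driver's BDF range check and sysfs attribute handling are omitted; the point is that the bus notifier runs in a context where the filter mutex cannot be taken, so it only queues an update record and defers the list manipulation to a workqueue:

    #include <linux/device.h>
    #include <linux/jiffies.h>
    #include <linux/kfifo.h>
    #include <linux/mutex.h>
    #include <linux/notifier.h>
    #include <linux/pci.h>
    #include <linux/printk.h>
    #include <linux/spinlock.h>
    #include <linux/workqueue.h>

    struct demo_update {
            u16 devid;
            bool is_add;
    };

    static DEFINE_MUTEX(demo_filter_lock);          /* protects the filter list */
    static DEFINE_SPINLOCK(demo_fifo_lock);         /* protects the update fifo */
    static DECLARE_KFIFO(demo_fifo, struct demo_update, 16);
    static struct delayed_work demo_work;

    static void demo_worker(struct work_struct *work)
    {
            struct demo_update info;

            /* Re-arm instead of blocking if the filter list is busy. */
            if (!mutex_trylock(&demo_filter_lock)) {
                    schedule_delayed_work(to_delayed_work(work),
                                          msecs_to_jiffies(100));
                    return;
            }

            while (kfifo_get(&demo_fifo, &info))
                    pr_info("%s filter for devid %#x\n",
                            info.is_add ? "add" : "remove", info.devid);

            mutex_unlock(&demo_filter_lock);
    }

    static int demo_bus_notifier(struct notifier_block *nb,
                                 unsigned long action, void *data)
    {
            struct device *dev = data;
            struct pci_dev *pdev = to_pci_dev(dev);
            struct demo_update info;

            if (action != BUS_NOTIFY_ADD_DEVICE && action != BUS_NOTIFY_DEL_DEVICE)
                    return 0;

            info.devid = PCI_DEVID(pdev->bus->number, pdev->devfn);
            info.is_add = action == BUS_NOTIFY_ADD_DEVICE;

            /* Notifier context must not sleep: queue the record, kick the worker. */
            if (kfifo_in_spinlocked(&demo_fifo, &info, 1, &demo_fifo_lock))
                    schedule_delayed_work(&demo_work, 0);

            return 0;
    }

    static struct notifier_block demo_nb = { .notifier_call = demo_bus_notifier };

    static int demo_register(void)
    {
            INIT_KFIFO(demo_fifo);
            INIT_DELAYED_WORK(&demo_work, demo_worker);
            return bus_register_notifier(&pci_bus_type, &demo_nb);
    }

Splitting the work this way keeps sleeping operations such as kfree() and sysfs attribute removal out of the notifier path, which is why the driver can safely handle hotplug events arriving at any time.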