3.19 CW Mock Contest T4: Strings

Source: https://www.cnblogs.com/YzaCsp/p/18788299 (2025/3/25 17:42:47)

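Preface

Honestly, a problem like this should sit earlier in the problem order; it is hard to rate.

Approach

This one really is something else, and this time it got me.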

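The constraint $M = 2$ is clearly a very helpful hint.
We notice that the problem can be attacked by recording the pattern of $?$ positions in each string.

But as I felt during the contest, this is clearly not something that can be simulated directly; something stronger is needed.
So it is natural to compress the set of $?$ positions into a bitmask and count strings per mask.

The biggest question now is how to query which strings can match the current one.
A string matches if its $?$ positions cover all of the current string's non-$?$ positions; it also matches if its $?$ positions cover only part of them $($possibly none$)$ and the remaining positions agree character by character.

As for the implementation, a pile of bit operations does the job; this is not hard, so I omit the details.

So the complexity of this approach is roughly $\mathcal{O} (nm 2^m)$.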
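The bit operations alluded to above might look like the following minimal sketch. The helper names `letterMask` and `submasks` are hypothetical and not taken from the original solution; the sketch only shows how the non-$?$ positions of a string can be packed into an integer and how all of its submasks can be enumerated, which is where the $2^m$ factor comes from.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Bit j of the result is set when position j of s holds a letter (not '?').
// letterMask is a hypothetical helper name used only for this illustration.
int letterMask(const std::string& s) {
    assert(s.size() < 31);  // the mask must fit in an int
    int mask = 0;
    for (std::size_t j = 0; j < s.size(); ++j) {
        if (s[j] != '?') mask |= 1 << j;
    }
    return mask;
}

// Lists every submask of mask, from mask itself down to 0.
std::vector<int> submasks(int mask) {
    std::vector<int> result;
    for (int sub = mask;; sub = (sub - 1) & mask) {
        result.push_back(sub);
        if (sub == 0) break;
    }
    return result;
}
```

For the string ? o o o ? ? ? the letters sit at positions 1, 2 and 3, so the mask has three set bits and eight submasks, one for each choice of which letter positions are covered by the other string's wildcards.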

While writing it, the unsurprising surprise happened.
In a case like the following, my plan turned out to be incomplete:

? ? o ? o o ?
? o o o ? ? ?

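How do we decide whether these two match in this case?
In other words, we may need to change the encoding so that the mask only records the extracted positions.
But that still has a problem, because wildcards remain hard to recognize.

This is clearly an important issue.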

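The problem at this point

How to quickly count all strings for which

the wildcards cover the current string's non-wildcard positions, or
the part of the current string's non-wildcard positions not covered by the wildcards is identical.

The second case is the harder one.
It requires counting the strings that are identical on the remaining positions after some positions are ignored.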
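The primitive described here, counting strings that agree once some positions are ignored, can be sketched as follows. This is only an illustration under the assumption that the kept positions are given as a bitmask; `project` and `pairsEqualOnMask` are hypothetical names, and `?` is treated as an ordinary character, so the sketch does not yet handle wildcards.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Keeps only the characters of s at positions whose bit is set in mask.
// project is a hypothetical helper name used only for this illustration.
std::string project(const std::string& s, int mask) {
    assert(s.size() < 31);  // every position must correspond to a bit of an int
    std::string key;
    for (std::size_t j = 0; j < s.size(); ++j) {
        if ((mask >> j) & 1) key.push_back(s[j]);
    }
    return key;
}

// Number of unordered pairs of strings that agree on every position in mask.
long long pairsEqualOnMask(const std::vector<std::string>& words, int mask) {
    std::map<std::string, long long> groups;
    for (const std::string& w : words) ++groups[project(w, mask)];
    long long pairs = 0;
    for (const auto& entry : groups) {
        pairs += entry.second * (entry.second - 1) / 2;
    }
    return pairs;
}
```

Grouping by the projected key turns "equal after ignoring some positions" into an ordinary equality lookup, which is why a map or hash table is enough for this part.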

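But this raises another problem: the ignored positions have to be handled through the intersection of the current wildcards and the earlier string's wildcards, yet we cannot at the same time know what the earlier string extracts at its wildcard positions.

These properties are too complicated, so this is very unlikely to be the intended solution.

Let us start over.
Suppose the current string is ? o o o ? ? ? $($call it $p$$)$. Which strings can match it?
There are two kinds:

Strings whose wildcards cover every non-wildcard position of $p$.
Strings whose wildcards cover only part $($possibly none$)$ of the non-wildcard positions of $p$, but which agree with $p$ on every uncovered position.

The difficulty is that after enumerating the wildcard positions, it is hard to then find the "uncovered part".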

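Can the $20 \rm{pts}$ solution offer a hint here?
That is, can we maintain the following separately:

For each set of wildcard positions, the count and the corresponding extracted string.
For each separately extracted subset of positions, the same "count and extracted string per set of wildcard positions".

Then, to handle

Strings whose wildcards cover only part $($possibly none$)$ of the non-wildcard positions of $p$, but which agree with $p$ on every uncovered position

can we first extract the string formed by the non-wildcard characters of $p$ and then do the lookup?

This feels complicated but does seem feasible: the outer layer says which positions are being considered, and the inner layer stores, for those positions, the number of strings per extracted string per wildcard pattern, together with the total count. This handles $60$ points, and even $100$.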

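The standard $\rm{hash}$ solution

It turns out that the earlier difficulty came from small mistakes in analysing the problem.

So consider: two strings $s, t$ are similar exactly when $\displaystyle \forall i, s_i = \textrm{?} \lor t_i = \textrm{?} \lor s_i = t_i$.
Let the current string be $t$. How do we efficiently count the similar strings $s$?
For each non-$\textrm{?}$ position of $t$, enumerate whether $s$ holds the same character or a wildcard there, then look up the matching $\rm{hash}$.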
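As a reference point, the similarity condition can be written down directly. The following is a minimal brute-force sketch, not the intended solution; `similar` and `countSimilarBrute` are hypothetical names. It runs in $\mathcal{O}(n^2 m)$ and is mainly useful for checking a faster implementation on small inputs.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns true when s and t are similar: at every position the characters
// are equal or at least one of them is the wildcard '?'.
bool similar(const std::string& s, const std::string& t) {
    assert(s.size() == t.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] != '?' && t[i] != '?' && s[i] != t[i]) return false;
    }
    return true;
}

// O(n^2 * m) reference count of similar unordered pairs.
long long countSimilarBrute(const std::vector<std::string>& words) {
    long long pairs = 0;
    for (std::size_t i = 0; i < words.size(); ++i) {
        for (std::size_t j = i + 1; j < words.size(); ++j) {
            if (similar(words[i], words[j])) ++pairs;
        }
    }
    return pairs;
}
```

Applied to the two patterns from earlier, ? ? o ? o o ? and ? o o o ? ? ?, the check succeeds: the only position where both hold a letter is the third one, and both hold o there.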

How to update
The information we need is a match on the substring extracted at a given set of positions, so we simply store exactly that.

Code

#include <bits/stdc++.h>
#define FOR(i, a, b) for (int i = (a); i <= (b); ++i)
using namespace std;

int n, m;
char str[10];
map<int, int> mp[65]; // mp[j] stores the count of hash values for mask j
int ans;

// Function to hash a character
inline int HashChar(char ch) {
    return (ch == '?') ? 26 : (ch - 'a');
}

int main() {
    // Read input values
    scanf("%d %d", &n, &m);

    // Process each string
    FOR(i, 1, n) {
        scanf("%s", str);
        int has = 0, opt = 0, tt = -1;
        char tmp[10]; // Stores non-wildcard characters

        // Compute the mask and extract non-wildcard characters
        FOR(j, 0, m - 1) {
            if (str[j] != '?') {
                opt |= (1 << j); // Set the bit for non-wildcard positions
                has = (has << 1) | 1; // Update the has mask
                tmp[++tt] = str[j]; // Store the character
            }
        }

        // If the string is all wildcards, it matches all previous strings
        if (opt == 0) {
            ans += i - 1;
        } else {
            // Enumerate all possible subsets of the non-wildcard positions
            FOR(j, 0, has) {
                int cnt = 0; // Compute the hash value for the current subset
                for (int k = 0; (1 << k) <= has; ++k) {
                    if ((1 << k) & j) {
                        cnt = cnt * 30 + 26; // Wildcard position
                    } else {
                        cnt = cnt * 30 + (tmp[k] - 'a'); // Non-wildcard position
                    }
                }
                // If the hash value exists in the map, add its count to the answer
                if (mp[opt].count(cnt)) {
                    ans += mp[opt][cnt];
                }
            }
        }

        // Update the map with all possible masks for the current string
        FOR(j, 1, (1 << m) - 1) {
            int cnt = 0; // Compute the hash value for the current mask
            FOR(k, 0, m - 1) {
                if ((1 << k) & j) {
                    cnt = cnt * 30 + HashChar(str[k]); // Include the character in the hash
                }
            }
            mp[j][cnt]++; // Increment the count for the hash value
        }
    }

    // Output the final answer
    printf("%d\n", ans);
    return 0;
}

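The outrageous $\rm{bitset}$ solution

For each position $i$ of $s$, take the set of strings $t$ that satisfy $s_i = \textrm{?} \lor t_i = \textrm{?} \lor s_i = t_i$ at that position, then intersect these sets over all positions.

Concretely, for each position $i$ and each character $x$, maintain the set of strings $t$ with $t_i = x$ or $t_i = \textrm{?}$, then intersect.

Code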

#include <bits/stdc++.h>
using namespace std;

bitset <50000> now, las[6][26]; // las[j][x]: earlier strings whose j-th character is x or '?'
long long ans;
int n, m;
char str[10];

signed main () {
    scanf("%d%d", &n, &m);
    for (int i = 0, tmp; i < n; i++) {
        now.set ();
        scanf("%s", str); // %s skips leading whitespace, including any line ending
        for (int j = 0; j < m; j++) {
            char ch = str[j];
            if (ch != '?') {
                now &= las[j][ch - 'a'];
                las[j][ch - 'a'].set (i);
            }
            else for (int x = 0; x < 26; x++) las[j][x].set (i);
        }
        tmp = now.count ();
        // An all-'?' string leaves every bit of now set; it matches all i earlier strings
        ans += (tmp < 50000 ? tmp : i);
    }
    printf("%lld", ans);
    return 0;
}

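Inclusion-exclusion solution

The most instructive part of all; unfortunately the editorial is terrible.
Then again, for national training team members, writing an editorial for a problem like this is a waste of time.

First, formalize the problem as

Problem statement

Two strings $s, t$ of length $m$ are similar exactly when
$\forall i \in [1, m], s_i = t_i \lor s_i = \text{?} \lor t_i = \text{?}$
Given $n$ strings $s_i$ of length $m$, count the pairs of similar strings.

The more obvious approach is the $\rm{hash}$ solution covered above. I probably would not have thought of enumerating the match pattern on my own, but setting that aside, let us continue.

Burning the last of my passion: this problem must be figured out!
First, $s, t$ are similar exactly when $\forall i, s_i = t_i \lor s_i = \text{?} \lor t_i = \text{?}$

This form is not easy to maintain.
When the direct route is hard, go the other way and count dissimilar pairs.
$s, t$ are dissimilar exactly when $\exists i, s_i \neq t_i \land s_i \neq \text{?} \land t_i \neq \text{?}$
This can be handled with the inclusion-exclusion principle. Below, $s \nsim t$ denotes that $s, t$ are dissimilar, and $\mathbb{U} = \{1, \dots, m\}$ is the set of all positions.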

\[ \begin{align*}
& [s \nsim t] \\
=& [\exists i, s_i \neq t_i \land s_i \neq \text{?} \land t_i \neq \text{?}] \\
=& \sum_{\varnothing \neq \mathbb{S} \subseteq \mathbb{U}} (-1)^{|\mathbb{S}| + 1} \times \Big( [\forall i \in \mathbb{S}, s_i \neq t_i \land s_i \neq \text{?} \land t_i \neq \text{?}] \Big)^{\ast} \\
=& \sum_{\varnothing \neq \mathbb{S} \subseteq \mathbb{U}} (-1)^{|\mathbb{S}| + 1} \times \begin{cases} 0 & \text{if } \exists i \in \mathbb{S}, s_i = \text{?} \lor t_i = \text{?}^{\dagger} \\ [\forall i \in \mathbb{S}, s_i \neq t_i] & \text{if } \nexists i \in \mathbb{S}, s_i = \text{?} \lor t_i = \text{?} \end{cases} \\
=& \sum_{\varnothing \neq \mathbb{S} \subseteq \mathbb{U}} (-1)^{|\mathbb{S}| + 1} \times [\forall i \in \mathbb{S}, s_i \neq \text{?} \land t_i \neq \text{?}] \times [\forall i \in \mathbb{S}, s_i \neq t_i]^{\ddagger} \\
=& \sum_{\varnothing \neq \mathbb{S} \subseteq \mathbb{U}} (-1)^{|\mathbb{S}| + 1} \times [\forall i \in \mathbb{S}, s_i \neq \text{?} \land t_i \neq \text{?}] \times \Big(1 - [\exists i \in \mathbb{S}, s_i = t_i]\Big) \\
=& \sum_{\varnothing \neq \mathbb{S} \subseteq \mathbb{U}} \bigg( \Big\{ (-1)^{|\mathbb{S}| + 1} \times [\forall i \in \mathbb{S}, s_i \neq \text{?} \land t_i \neq \text{?}] \Big\} - \Big\{ (-1)^{|\mathbb{S}| + 1} \times [\forall i \in \mathbb{S}, s_i \neq \text{?} \land t_i \neq \text{?}] \times [\exists i \in \mathbb{S}, s_i = t_i] \Big\} \bigg)^{\S} \\
=& \sum_{\varnothing \neq \mathbb{S} \subseteq \mathbb{U}} \bigg( \alpha - \alpha \times \sum_{\varnothing \neq \mathbb{T} \subseteq \mathbb{S}} (-1)^{|\mathbb{T}| + 1} \times [\forall i \in \mathbb{T}, s_i = t_i] \bigg) \\
=& \sum_{\varnothing \neq \mathbb{S} \subseteq \mathbb{U}} (-1)^{|\mathbb{S}| + 1} \times [\forall i \in \mathbb{S}, s_i \neq \text{?} \land t_i \neq \text{?}] - \sum_{\varnothing \neq \mathbb{S} \subseteq \mathbb{U}} [\forall i \in \mathbb{S}, s_i \neq \text{?} \land t_i \neq \text{?}] \times \sum_{\varnothing \neq \mathbb{T} \subseteq \mathbb{S}} (-1)^{|\mathbb{S}| + |\mathbb{T}|} \times [\forall i \in \mathbb{T}, s_i = t_i]^{\P}
\end{align*} \]

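$\ast :$ At this step the expression is still hard to handle directly, so from here on we split into cases to reduce it to single conditions
$\dagger :$ The point is that $s_i = \text{?} \lor t_i = \text{?}$ is easy to handle, because it does not depend on how the two strings relate to each other
$\ddagger :$ $\neq$ is hard to work with, so we move to $=$: rewrite it as $1 - [\exists i \in \mathbb{S}, s_i = t_i]$, then use a second inclusion-exclusion, as above, to turn $\exists$ into $\forall$
$\S :$ From here on, let $\alpha = (-1)^{|\mathbb{S}| + 1} \times [\forall i \in \mathbb{S}, s_i \neq \text{?} \land t_i \neq \text{?}]$
$\P :$ This part is easy to count, so we now apply it to the full sum $($let the set of strings be $\mathbb{P}$$)$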

\[ \begin{align*}
& \sum_{s \in \mathbb{P}, t \in \mathbb{P}, s \neq t} [s \nsim t] \\
=& \sum_{s \in \mathbb{P}, t \in \mathbb{P}, s \neq t} \sum_{\varnothing \neq \mathbb{S} \subseteq \mathbb{U}} (-1)^{|\mathbb{S}| + 1} \times [\forall i \in \mathbb{S}, s_i \neq \text{?} \land t_i \neq \text{?}] - \sum_{s \in \mathbb{P}, t \in \mathbb{P}, s \neq t} \sum_{\varnothing \neq \mathbb{S} \subseteq \mathbb{U}} [\forall i \in \mathbb{S}, s_i \neq \text{?} \land t_i \neq \text{?}] \times \sum_{\varnothing \neq \mathbb{T} \subseteq \mathbb{S}} (-1)^{|\mathbb{S}| + |\mathbb{T}|} \times [\forall i \in \mathbb{T}, s_i = t_i] \\
=& \sum_{\varnothing \neq \mathbb{S} \subseteq \mathbb{U}} (-1)^{|\mathbb{S}| + 1} \times \left(\sum_{s \in \mathbb{P}} [\forall i \in \mathbb{S}, s_i \neq \text{?}]\right) \times \left(\sum_{s \in \mathbb{P}} [\forall i \in \mathbb{S}, s_i \neq \text{?}] - 1\right) - \sum_{\varnothing \neq \mathbb{S} \subseteq \mathbb{U}} \sum_{\varnothing \neq \mathbb{T} \subseteq \mathbb{S}} (-1)^{|\mathbb{S}| + |\mathbb{T}|} \times \sum_{s \in \mathbb{P}, t \in \mathbb{P}, s \neq t} [\forall i \in \mathbb{S}, s_i \neq \text{?} \land t_i \neq \text{?}] \times [\forall i \in \mathbb{T}, s_i = t_i]
\end{align*} \]

These sums run over ordered pairs $(s, t)$, so every unordered pair is counted twice.

The first part is easy to compute directly; the second part can be maintained with $\rm{hash}$.
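The pairwise identity behind this derivation can be sanity-checked numerically on short strings. The sketch below is an illustration only, with hypothetical function names; it evaluates the double sum over non-empty $\mathbb{S}$ and non-empty $\mathbb{T} \subseteq \mathbb{S}$ for a single pair and compares it with the direct definition. It assumes a GCC or Clang compiler, since it uses the `__builtin_popcount` builtin.

```cpp
#include <cassert>
#include <string>

// Direct definition: 1 when some position holds two different letters.
int dissimilarDirect(const std::string& s, const std::string& t) {
    assert(s.size() == t.size());
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] != '?' && t[i] != '?' && s[i] != t[i]) return 1;
    }
    return 0;
}

// Double inclusion-exclusion over non-empty S and non-empty T inside S.
int dissimilarByInclusionExclusion(const std::string& s, const std::string& t) {
    assert(s.size() == t.size() && s.size() < 16);
    const int m = static_cast<int>(s.size());
    int first = 0, second = 0;
    for (int S = 1; S < (1 << m); ++S) {
        // S only contributes when neither string has '?' inside S.
        bool lettersOnly = true;
        for (int i = 0; i < m; ++i) {
            if (((S >> i) & 1) && (s[i] == '?' || t[i] == '?')) lettersOnly = false;
        }
        if (!lettersOnly) continue;
        const int bitsS = __builtin_popcount(S);
        first += (bitsS % 2 == 1) ? 1 : -1;  // sign (-1)^{|S|+1}
        for (int T = S; T > 0; T = (T - 1) & S) {
            bool equalOnT = true;
            for (int i = 0; i < m; ++i) {
                if (((T >> i) & 1) && s[i] != t[i]) equalOnT = false;
            }
            if (!equalOnT) continue;
            const int bitsT = __builtin_popcount(T);
            second += ((bitsS + bitsT) % 2 == 0) ? 1 : -1;  // sign (-1)^{|S|+|T|}
        }
    }
    return first - second;
}
```

Both loops start from non-empty sets. Including the empty $\mathbb{S}$ would add a constant $-1$ to the first sum, which is one way the final count can come out wrong.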

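After writing the code, the output was not quite right, so I had to start checking against test data.

It turned out that the empty set must not be included.
A precision issue in one place had also been overlooked.

Code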

#include <bits/stdc++.h>
#define int long long
using namespace std;

// Custom hash function, meant to avoid hash collisions
struct custom_hash {
    size_t operator()(uint64_t x) const {
        static const uint64_t FIXED_RANDOM = chrono::steady_clock::now().time_since_epoch().count();
        return x ^ FIXED_RANDOM;
    }
};

signed main() {
    ios::sync_with_stdio(false);
    cin.tie(0);
    int n, m;
    cin >> n >> m;
    int part1 = 0, part2 = 0;
    vector<string> words(n);
    for (int i = 0; i < n; ++i) {
        cin >> words[i];
    }

    // Enumerate all possible mask_i (the set of strictly matched positions)
    for (int mask_i = 0; mask_i < (1 << m); ++mask_i) {
        if (mask_i == 0) continue;
        int bits_i = __builtin_popcount(mask_i);
        int lsy = 0; // number of strings with no `?` at any position of mask_i
        for (int k = 0; k < n; ++k) {
            bool valid = true;
            // Check whether any position of mask_i holds `?`
            for (int l = 0; l < m; ++l) {
                if ((mask_i & (1 << l)) && words[k][l] == '?') {
                    valid = false;
                    break;
                }
            }
            if (valid) lsy++;
        }
        int sign = 1;
        if ((bits_i + 1) % 2 == 1) {
            sign = -1;
        }
        part1 += sign * lsy * (lsy - 1) / 2;

        // Enumerate mask_j over all non-empty subsets of mask_i
        for (int mask_j = mask_i;; mask_j = (mask_j - 1) & mask_i) {
            if (mask_j == 0) break;
            unordered_map<uint64_t, int, custom_hash> cnt;
            // Go through all strings and keep the valid ones
            for (int k = 0; k < n; ++k) {
                bool valid = true;
                // Check whether any position of mask_i holds `?`
                for (int l = 0; l < m; ++l) {
                    if ((mask_i & (1 << l)) && words[k][l] == '?') {
                        valid = false;
                        break;
                    }
                }
                if (!valid) continue;
                // Compute the hash of the current string on the positions of mask_j
                uint64_t hash_val = 0;
                for (int l = 0; l < m; ++l) {
                    if (mask_j & (1 << l)) {
                        // Each character takes 5 bits (enough for 26 letters)
                        hash_val |= (uint64_t)(words[k][l] - 'a') << (5 * l);
                    }
                }
                cnt[hash_val]++;
            }
            // Compute the inclusion-exclusion sign
            sign = 1;
            int bits_j = __builtin_popcount(mask_j);
            if ((bits_i + bits_j) % 2 == 1) {
                sign = -1;
            }
            // Add the number of pairs to the running total
            for (auto &p : cnt) {
                long long c = p.second;
                part2 += sign * c * (c - 1) / 2;
            }
        }
    }

    cout << (n * (n - 1) / 2) - (part1 - part2) << endl;
    return 0;
}