Naive approach is,
A loop for haystack, 0 to n - m. m being the length of needle. And at each index, try to match the needle length from haystack with the needle.
Time would be , because for every index, we can try matching m length inside inner loop.
If you think why inner loop, and not get substring from haystack and match with needle at once, well substring also does the same thing internally. And it’s worse, because now you have space as well.
So,
public class Solution {
public int StrStr(string haystack, string needle) {
int n = haystack.Length;
int m = needle.Length;
if (m == 0) return 0;
for (int i = 0; i <= n - m; i++)
{
for (int j = i; j < i + m; j++)
{
if (haystack[j] != needle[j - i])
{
break;
}
if (j == i + m - 1)
{
return i;
}
}
}
return -1;
}
}^This works,
Here’s a little cleaner? version of it,
public int StrStr(string haystack, string needle)
{
int n = haystack.Length, m = needle.Length;
if (m == 0) return 0;
for (int i = 0; i <= n - m; i++)
{
int j = 0;
while (j < m && haystack[i + j] == needle[j])
j++;
if (j == m) return i;
}
return -1;
}Btw, we can do this very concisely this way,
public int StrStr(string haystack, string needle)
{
return haystack.IndexOf(needle);
}haystack.IndexOf(needle)returns first occurrence index or -1- If needle is empty string, it returns 0.
- Worst case is . But in practice ~ due to optimized algorithms (Boyer-Moore etc.)
Basically,
IndexOf = optimized native substring search (fast, safe, usually better than manual code)
Alright, now optimal solution to this,
Knuth-Morris-Pratt (KMP) algorithm,
KMP improves over the naive approach by avoiding re-checking characters after a mismatch: instead of restarting the pattern (needle) from index 0, it uses a precomputed LPS (Longest Prefix which is also Suffix) array to determine how far the pattern can safely ‘shift’ while preserving already matched information; during matching, when a mismatch occurs at position j, we jump j back to LPS[j-1] (instead of 0) and continue without moving the text pointer unnecessarily, ensuring each character is processed at most once and giving overall time complexity .