Compare commits


35 Commits

Author | SHA1 | Message | Date
CI | 005e0c2bc6 | chore: release version v1.84.23 | 2025-11-15 22:17:28 +00:00
Dmitry Popov | 808acb540e | fix(core): fixed map pings cancel errors | 2025-11-15 23:16:58 +01:00
CI | 812582d955 | chore: [skip ci] | 2025-11-15 11:38:00 +00:00
CI | f3077c0bf1 | chore: release version v1.84.22 | 2025-11-15 11:38:00 +00:00
Dmitry Popov | 32c70cbbad | Merge branch 'main' of github.com:wanderer-industries/wanderer | 2025-11-15 12:37:31 +01:00
Dmitry Popov | 8934935e10 | fix(core): fixed map initialization | 2025-11-15 12:37:27 +01:00
CI | 20c8a53712 | chore: [skip ci] | 2025-11-15 08:48:30 +00:00
CI | b22970fef3 | chore: release version v1.84.21 | 2025-11-15 08:48:30 +00:00
Dmitry Popov | cf72394ef9 | Merge branch 'main' of github.com:wanderer-industries/wanderer | 2025-11-15 09:47:53 +01:00
Dmitry Popov | e6dbba7283 | fix(core): fixed map characters adding | 2025-11-15 09:47:48 +01:00
CI | 843b3b86b2 | chore: [skip ci] | 2025-11-15 07:29:25 +00:00
CI | bd865b9f64 | chore: release version v1.84.20 | 2025-11-15 07:29:25 +00:00
Dmitry Popov | ae91cd2f92 | Merge branch 'main' of github.com:wanderer-industries/wanderer | 2025-11-15 08:25:59 +01:00
Dmitry Popov | 0be7a5f9d0 | fix(core): fixed map start issues | 2025-11-15 08:25:55 +01:00
CI | e15bfa426a | chore: [skip ci] | 2025-11-14 19:28:51 +00:00
CI | 4198e4b07a | chore: release version v1.84.19 | 2025-11-14 19:28:51 +00:00
Dmitry Popov | 03ee08ff67 | Merge branch 'main' of github.com:wanderer-industries/wanderer | 2025-11-14 20:28:16 +01:00
Dmitry Popov | ac4dd4c28b | fix(core): fixed map start issues | 2025-11-14 20:28:12 +01:00
CI | 308e81a464 | chore: [skip ci] | 2025-11-14 18:36:20 +00:00
CI | 6f4240d931 | chore: release version v1.84.18 | 2025-11-14 18:36:20 +00:00
Dmitry Popov | 847b45a431 | fix(core): added gracefull map poll recovery from saved state. added map slug unique checks | 2025-11-14 19:35:45 +01:00
CI | 5ec97d74ca | chore: [skip ci] | 2025-11-14 13:43:40 +00:00
CI | 74359a5542 | chore: release version v1.84.17 | 2025-11-14 13:43:40 +00:00
Dmitry Popov | 0020f46dd8 | fix(core): fixed activity tracking issues | 2025-11-14 14:42:44 +01:00
CI | a6751b45c6 | chore: [skip ci] | 2025-11-13 16:20:24 +00:00
CI | f48aeb5cec | chore: release version v1.84.16 | 2025-11-13 16:20:24 +00:00
Dmitry Popov | a5f25646c9 | Merge branch 'main' of github.com:wanderer-industries/wanderer | 2025-11-13 17:19:47 +01:00
Dmitry Popov | 23cf1fd96f | fix(core): removed maps auto-start logic | 2025-11-13 17:19:44 +01:00
CI | 6f15521069 | chore: [skip ci] | 2025-11-13 14:49:32 +00:00
CI | 9d41e57c06 | chore: release version v1.84.15 | 2025-11-13 14:49:32 +00:00
Dmitry Popov | ea9a22df09 | Merge branch 'main' of github.com:wanderer-industries/wanderer | 2025-11-13 15:49:01 +01:00
Dmitry Popov | 0d4fd6f214 | fix(core): fixed maps start/stop logic, added server downtime period support | 2025-11-13 15:48:56 +01:00
CI | 87a6c20545 | chore: [skip ci] | 2025-11-13 14:46:26 +00:00
CI | c375f4e4ce | chore: release version v1.84.14 | 2025-11-13 14:46:26 +00:00
Dmitry Popov | 843a6d7320 | Merge pull request #543 from wanderer-industries/fix-error-on-remove-settings (fix(Map): Fixed problem related with error if settings was removed an…) | 2025-11-13 18:43:13 +04:00
35 changed files with 3889 additions and 357 deletions

View File

@@ -2,6 +2,96 @@
<!-- changelog -->
## [v1.84.23](https://github.com/wanderer-industries/wanderer/compare/v1.84.22...v1.84.23) (2025-11-15)
### Bug Fixes:
* core: fixed map pings cancel errors
## [v1.84.22](https://github.com/wanderer-industries/wanderer/compare/v1.84.21...v1.84.22) (2025-11-15)
### Bug Fixes:
* core: fixed map initialization
## [v1.84.21](https://github.com/wanderer-industries/wanderer/compare/v1.84.20...v1.84.21) (2025-11-15)
### Bug Fixes:
* core: fixed map characters adding
## [v1.84.20](https://github.com/wanderer-industries/wanderer/compare/v1.84.19...v1.84.20) (2025-11-15)
### Bug Fixes:
* core: fixed map start issues
## [v1.84.19](https://github.com/wanderer-industries/wanderer/compare/v1.84.18...v1.84.19) (2025-11-14)
### Bug Fixes:
* core: fixed map start issues
## [v1.84.18](https://github.com/wanderer-industries/wanderer/compare/v1.84.17...v1.84.18) (2025-11-14)
### Bug Fixes:
* core: added graceful map poll recovery from saved state. added map slug unique checks
## [v1.84.17](https://github.com/wanderer-industries/wanderer/compare/v1.84.16...v1.84.17) (2025-11-14)
### Bug Fixes:
* core: fixed activity tracking issues
## [v1.84.16](https://github.com/wanderer-industries/wanderer/compare/v1.84.15...v1.84.16) (2025-11-13)
### Bug Fixes:
* core: removed maps auto-start logic
## [v1.84.15](https://github.com/wanderer-industries/wanderer/compare/v1.84.14...v1.84.15) (2025-11-13)
### Bug Fixes:
* core: fixed maps start/stop logic, added server downtime period support
## [v1.84.14](https://github.com/wanderer-industries/wanderer/compare/v1.84.13...v1.84.14) (2025-11-13)
### Bug Fixes:
* Map: Fixed an error and mapper crash that occurred when settings were removed. Fixed settings reset.
## [v1.84.13](https://github.com/wanderer-industries/wanderer/compare/v1.84.12...v1.84.13) (2025-11-13)

View File

@@ -27,11 +27,7 @@ config :wanderer_app,
generators: [timestamp_type: :utc_datetime],
ddrt: WandererApp.Map.CacheRTree,
logger: Logger,
pubsub_client: Phoenix.PubSub,
wanderer_kills_base_url:
System.get_env("WANDERER_KILLS_BASE_URL", "ws://host.docker.internal:4004"),
wanderer_kills_service_enabled:
System.get_env("WANDERER_KILLS_SERVICE_ENABLED", "false") == "true"
pubsub_client: Phoenix.PubSub
config :wanderer_app, WandererAppWeb.Endpoint,
adapter: Bandit.PhoenixAdapter,
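The removed lines above use the standard `System.get_env/2` default pattern; shown standalone (URL and flag names copied from the removed config, values illustrative):

```elixir
# Read an env var with a default, and derive a boolean flag by string comparison.
base_url = System.get_env("WANDERER_KILLS_BASE_URL", "ws://host.docker.internal:4004")
enabled? = System.get_env("WANDERER_KILLS_SERVICE_ENABLED", "false") == "true"
```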

View File

@@ -4,7 +4,7 @@ import Config
config :wanderer_app, WandererApp.Repo,
username: "postgres",
password: "postgres",
hostname: System.get_env("DB_HOST", "localhost"),
hostname: "localhost",
database: "wanderer_dev",
stacktrace: true,
show_sensitive_data_on_connection_error: true,

View File

@@ -1,8 +1,25 @@
defmodule WandererApp.Api.Changes.SlugifyName do
@moduledoc """
Ensures map slugs are unique by:
1. Slugifying the provided slug/name
2. Checking for existing slugs (optimization)
3. Finding next available slug with numeric suffix if needed
4. Relying on database unique constraint as final arbiter
Race Condition Mitigation:
- Optimistic check reduces DB roundtrips for most cases
- Database unique index ensures no duplicates slip through
- Proper error messages for constraint violations
- Telemetry events for monitoring conflicts
"""
use Ash.Resource.Change
alias Ash.Changeset
require Ash.Query
require Logger
# Maximum number of attempts to find a unique slug
@max_attempts 100
@impl true
@spec change(Changeset.t(), keyword, Change.context()) :: Changeset.t()
@@ -26,7 +43,7 @@ defmodule WandererApp.Api.Changes.SlugifyName do
# Get the current record ID if this is an update operation
current_id = Changeset.get_attribute(changeset, :id)
# Check if the base slug is available
# Check if the base slug is available (optimization to avoid numeric suffixes when possible)
if slug_available?(base_slug, current_id) do
base_slug
else
@@ -35,16 +52,44 @@ defmodule WandererApp.Api.Changes.SlugifyName do
end
end
defp find_available_slug(base_slug, current_id, n) do
defp find_available_slug(base_slug, current_id, n) when n <= @max_attempts do
candidate_slug = "#{base_slug}-#{n}"
if slug_available?(candidate_slug, current_id) do
# Emit telemetry when we had to use a suffix (indicates potential conflict)
:telemetry.execute(
[:wanderer_app, :map, :slug_suffix_used],
%{suffix_number: n},
%{base_slug: base_slug, final_slug: candidate_slug}
)
candidate_slug
else
find_available_slug(base_slug, current_id, n + 1)
end
end
defp find_available_slug(base_slug, _current_id, n) when n > @max_attempts do
# Fallback: use timestamp suffix if we've tried too many numeric suffixes
# This handles edge cases where many maps have similar names
timestamp = System.system_time(:millisecond)
fallback_slug = "#{base_slug}-#{timestamp}"
Logger.warning(
"Slug generation exceeded #{@max_attempts} attempts for '#{base_slug}', using timestamp fallback",
base_slug: base_slug,
fallback_slug: fallback_slug
)
:telemetry.execute(
[:wanderer_app, :map, :slug_fallback_used],
%{attempts: n},
%{base_slug: base_slug, fallback_slug: fallback_slug}
)
fallback_slug
end
defp slug_available?(slug, current_id) do
query =
WandererApp.Api.Map
@@ -60,9 +105,20 @@ defmodule WandererApp.Api.Changes.SlugifyName do
|> Ash.Query.limit(1)
case Ash.read(query) do
{:ok, []} -> true
{:ok, _} -> false
{:error, _} -> false
{:ok, []} ->
true
{:ok, _existing} ->
false
{:error, error} ->
# Log error but be conservative - assume slug is not available
Logger.warning("Error checking slug availability",
slug: slug,
error: inspect(error)
)
false
end
end
end
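The moduledoc above describes the suffixing strategy end to end; a minimal standalone sketch of the same idea follows, with a caller-supplied `available?` function standing in for the Ash availability query (the function shape is an assumption for illustration):

```elixir
defmodule SlugSketch do
  @max_attempts 100

  # Try the base slug first, then numeric suffixes, then fall back to a
  # timestamp suffix once @max_attempts numeric candidates are exhausted.
  def find_slug(base, available?) do
    if available?.(base), do: base, else: find_with_suffix(base, available?, 1)
  end

  defp find_with_suffix(base, available?, n) when n <= @max_attempts do
    candidate = "#{base}-#{n}"

    if available?.(candidate) do
      candidate
    else
      find_with_suffix(base, available?, n + 1)
    end
  end

  defp find_with_suffix(base, _available?, _n) do
    # Fallback path: mirrors the timestamp suffix used in the diff above.
    "#{base}-#{System.system_time(:millisecond)}"
  end
end
```

Usage against an in-memory set, e.g. `SlugSketch.find_slug("my-map", &(&1 not in existing_slugs))`; in the real change the database unique index remains the final arbiter.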

View File

@@ -31,7 +31,7 @@ defmodule WandererApp.Api.Map do
routes do
base("/maps")
get(:by_slug, route: "/:slug")
index :read
# index :read
post(:new)
patch(:update)
delete(:destroy)

View File

@@ -187,6 +187,10 @@ defmodule WandererApp.Character do
{:ok, result} ->
{:ok, result |> prepare_search_results()}
{:error, error} ->
Logger.warning("#{__MODULE__} failed search: #{inspect(error)}")
{:ok, []}
error ->
Logger.warning("#{__MODULE__} failed search: #{inspect(error)}")
{:ok, []}

View File

@@ -8,7 +8,7 @@ defmodule WandererApp.Character.TrackerPool do
:tracked_ids,
:uuid,
:characters,
server_online: true
server_online: false
]
@name __MODULE__

View File

@@ -12,7 +12,7 @@ defmodule WandererApp.Character.TransactionsTracker.Impl do
total_balance: 0,
transactions: [],
retries: 5,
server_online: true,
server_online: false,
status: :started
]
@@ -75,7 +75,7 @@ defmodule WandererApp.Character.TransactionsTracker.Impl do
def handle_event(
:update_corp_wallets,
%{character: character} = state
%{character: character, server_online: true} = state
) do
Process.send_after(self(), :update_corp_wallets, @update_interval)
@@ -88,26 +88,26 @@ defmodule WandererApp.Character.TransactionsTracker.Impl do
:update_corp_wallets,
state
) do
Process.send_after(self(), :update_corp_wallets, :timer.seconds(15))
Process.send_after(self(), :update_corp_wallets, @update_interval)
state
end
def handle_event(
:check_wallets,
%{wallets: []} = state
%{character: character, wallets: wallets, server_online: true} = state
) do
Process.send_after(self(), :check_wallets, :timer.seconds(5))
Process.send_after(self(), :check_wallets, @update_interval)
state
end
def handle_event(
:check_wallets,
%{character: character, wallets: wallets} = state
) do
check_wallets(wallets, character)
state
end
def handle_event(
:check_wallets,
state
) do
Process.send_after(self(), :check_wallets, @update_interval)
state
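The change above works by making the productive clause match only on `server_online: true` and letting a catch-all clause reschedule. A self-contained sketch of that clause layout (interval and body are illustrative, not the real implementation):

```elixir
defmodule WalletPollSketch do
  # Illustrative; the real module uses its own @update_interval.
  @update_interval :timer.minutes(1)

  # Most specific clause first: only do work when the server is known online.
  def handle_event(:check_wallets, %{server_online: true, wallets: wallets} = state) do
    Enum.each(wallets, &IO.inspect/1)  # stand-in for the real wallet check
    state
  end

  # Catch-all: server offline (or state not ready), just reschedule the event.
  def handle_event(:check_wallets, state) do
    Process.send_after(self(), :check_wallets, @update_interval)
    state
  end
end
```

Clause order matters here: if the catch-all were defined first, the online clause would never match.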

View File

@@ -53,8 +53,8 @@ defmodule WandererApp.Map do
{:ok, map} ->
map
_ ->
Logger.error(fn -> "Failed to get map #{map_id}" end)
error ->
Logger.error("Failed to get map #{map_id}: #{inspect(error)}")
%{}
end
end
@@ -183,9 +183,31 @@ defmodule WandererApp.Map do
def add_characters!(map, []), do: map
def add_characters!(%{map_id: map_id} = map, [character | rest]) do
add_character(map_id, character)
add_characters!(map, rest)
def add_characters!(%{map_id: map_id} = map, characters) when is_list(characters) do
# Get current characters list once
current_characters = Map.get(map, :characters, [])
characters_ids =
characters
|> Enum.map(fn %{id: char_id} -> char_id end)
# Filter out characters that already exist
new_character_ids =
characters_ids
|> Enum.reject(fn char_id -> char_id in current_characters end)
# If all characters already exist, return early
if new_character_ids == [] do
map
else
case update_map(map_id, %{characters: new_character_ids ++ current_characters}) do
{:commit, map} ->
map
_ ->
map
end
end
end
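The batch-add rewrite above boils down to a set difference before a single write; the filtering step in isolation (names follow the diff):

```elixir
defmodule AddCharactersSketch do
  # Returns the character ids not already present on the map; [] means
  # nothing to do, mirroring the early return in add_characters!/2 above.
  def new_character_ids(map, characters) do
    current = Map.get(map, :characters, [])

    characters
    |> Enum.map(& &1.id)
    |> Enum.reject(&(&1 in current))
  end
end
```

For example, `AddCharactersSketch.new_character_ids(%{characters: ["a"]}, [%{id: "a"}, %{id: "b"}])` yields `["b"]`, so only one `update_map/2` call is needed instead of one per character.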
def add_character(
@@ -198,61 +220,10 @@ defmodule WandererApp.Map do
case not (characters |> Enum.member?(character_id)) do
true ->
WandererApp.Character.get_map_character(map_id, character_id)
|> case do
{:ok,
%{
alliance_id: alliance_id,
corporation_id: corporation_id,
solar_system_id: solar_system_id,
structure_id: structure_id,
station_id: station_id,
ship: ship_type_id,
ship_name: ship_name
}} ->
map_id
|> update_map(%{characters: [character_id | characters]})
map_id
|> update_map(%{characters: [character_id | characters]})
# WandererApp.Cache.insert(
# "map:#{map_id}:character:#{character_id}:alliance_id",
# alliance_id
# )
# WandererApp.Cache.insert(
# "map:#{map_id}:character:#{character_id}:corporation_id",
# corporation_id
# )
# WandererApp.Cache.insert(
# "map:#{map_id}:character:#{character_id}:solar_system_id",
# solar_system_id
# )
# WandererApp.Cache.insert(
# "map:#{map_id}:character:#{character_id}:structure_id",
# structure_id
# )
# WandererApp.Cache.insert(
# "map:#{map_id}:character:#{character_id}:station_id",
# station_id
# )
# WandererApp.Cache.insert(
# "map:#{map_id}:character:#{character_id}:ship_type_id",
# ship_type_id
# )
# WandererApp.Cache.insert(
# "map:#{map_id}:character:#{character_id}:ship_name",
# ship_name
# )
:ok
error ->
error
end
:ok
_ ->
{:error, :already_exists}

View File

@@ -9,8 +9,8 @@ defmodule WandererApp.Map.Manager do
alias WandererApp.Map.Server
@maps_start_per_second 10
@maps_start_interval 1000
@maps_start_chunk_size 20
@maps_start_interval 500
@maps_queue :maps_queue
@check_maps_queue_interval :timer.seconds(1)
@@ -58,9 +58,9 @@ defmodule WandererApp.Map.Manager do
{:ok, pings_cleanup_timer} =
:timer.send_interval(@pings_cleanup_interval, :cleanup_pings)
safe_async_task(fn ->
start_last_active_maps()
end)
# safe_async_task(fn ->
# start_last_active_maps()
# end)
{:ok,
%{
@@ -153,7 +153,7 @@ defmodule WandererApp.Map.Manager do
@maps_queue
|> WandererApp.Queue.to_list!()
|> Enum.uniq()
|> Enum.chunk_every(@maps_start_per_second)
|> Enum.chunk_every(@maps_start_chunk_size)
WandererApp.Queue.clear(@maps_queue)
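The Manager change above drains the queue, deduplicates, and splits the ids into chunks of `@maps_start_chunk_size`, spacing chunk starts by `@maps_start_interval`. A sketch of that batching shape (the scheduling mechanism here is an assumption; the diff only shows the chunking):

```elixir
defmodule ChunkStartSketch do
  @maps_start_chunk_size 20
  @maps_start_interval 500

  # Deduplicate queued ids, split into fixed-size chunks, and schedule each
  # chunk with a growing delay so map starts are spread out over time.
  def schedule(map_ids) do
    map_ids
    |> Enum.uniq()
    |> Enum.chunk_every(@maps_start_chunk_size)
    |> Enum.with_index()
    |> Enum.each(fn {chunk, i} ->
      Process.send_after(self(), {:start_chunk, chunk}, i * @maps_start_interval)
    end)
  end
end
```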

View File

@@ -4,7 +4,7 @@ defmodule WandererApp.Map.MapPool do
require Logger
alias WandererApp.Map.Server
alias WandererApp.Map.{MapPoolState, Server}
defstruct [
:map_ids,
@@ -15,6 +15,7 @@ defmodule WandererApp.Map.MapPool do
@cache :map_pool_cache
@registry :map_pool_registry
@unique_registry :unique_map_pool_registry
@map_pool_limit 10
@garbage_collection_interval :timer.hours(4)
@systems_cleanup_timeout :timer.minutes(30)
@@ -25,7 +26,17 @@ defmodule WandererApp.Map.MapPool do
def new(), do: __struct__()
def new(args), do: __struct__(args)
def start_link(map_ids) do
# Accept both {uuid, map_ids} tuple (from supervisor restart) and just map_ids (legacy)
def start_link({uuid, map_ids}) when is_binary(uuid) and is_list(map_ids) do
GenServer.start_link(
@name,
{uuid, map_ids},
name: Module.concat(__MODULE__, uuid)
)
end
# For backward compatibility - generate UUID if only map_ids provided
def start_link(map_ids) when is_list(map_ids) do
uuid = UUID.uuid1()
GenServer.start_link(
@@ -37,13 +48,42 @@ defmodule WandererApp.Map.MapPool do
@impl true
def init({uuid, map_ids}) do
{:ok, _} = Registry.register(@unique_registry, Module.concat(__MODULE__, uuid), map_ids)
# Check for crash recovery - if we have previous state in ETS, merge it with new map_ids
{final_map_ids, recovery_info} =
case MapPoolState.get_pool_state(uuid) do
{:ok, recovered_map_ids} ->
# Merge and deduplicate map IDs
merged = Enum.uniq(recovered_map_ids ++ map_ids)
recovery_count = length(recovered_map_ids)
Logger.info(
"[Map Pool #{uuid}] Crash recovery detected: recovering #{recovery_count} maps",
pool_uuid: uuid,
recovered_maps: recovered_map_ids,
new_maps: map_ids,
total_maps: length(merged)
)
# Emit telemetry for crash recovery
:telemetry.execute(
[:wanderer_app, :map_pool, :recovery, :start],
%{recovered_map_count: recovery_count, total_map_count: length(merged)},
%{pool_uuid: uuid}
)
{merged, %{recovered: true, count: recovery_count}}
{:error, :not_found} ->
# Normal startup, no previous state to recover
{map_ids, %{recovered: false}}
end
# Register with empty list - maps will be added as they're started in handle_continue
{:ok, _} = Registry.register(@unique_registry, Module.concat(__MODULE__, uuid), [])
{:ok, _} = Registry.register(@registry, __MODULE__, uuid)
map_ids
|> Enum.each(fn id ->
Cachex.put(@cache, id, uuid)
end)
# Don't pre-populate cache - will be populated as maps start in handle_continue
# This prevents duplicates when recovering
state =
%{
@@ -52,23 +92,100 @@ defmodule WandererApp.Map.MapPool do
}
|> new()
{:ok, state, {:continue, {:start, map_ids}}}
{:ok, state, {:continue, {:start, {final_map_ids, recovery_info}}}}
end
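`MapPoolState` itself is not shown in this diff; a minimal ETS-backed implementation consistent with the calls used here (`get_pool_state/1`, `save_pool_state/2`, `delete_pool_state/1`) could look like the following. This is purely an assumption about its shape, not the project's actual module:

```elixir
defmodule MapPoolStateSketch do
  @table :map_pool_state

  # The table must be owned by a long-lived process (e.g. created :public
  # by a supervisor) so its contents survive individual pool crashes.
  def init_table, do: :ets.new(@table, [:named_table, :public, :set])

  def save_pool_state(uuid, map_ids), do: :ets.insert(@table, {uuid, map_ids})

  def get_pool_state(uuid) do
    case :ets.lookup(@table, uuid) do
      [{^uuid, map_ids}] -> {:ok, map_ids}
      [] -> {:error, :not_found}
    end
  end

  def delete_pool_state(uuid), do: :ets.delete(@table, uuid)
end
```

This is why `terminate/2` above deletes the entry only on graceful shutdown: after a crash the stale entry is exactly what `init/1` merges back in.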
@impl true
def terminate(_reason, _state) do
def terminate(reason, %{uuid: uuid} = _state) do
# On graceful shutdown, clean up ETS state
# On crash, keep ETS state for recovery
case reason do
:normal ->
Logger.debug("[Map Pool #{uuid}] Graceful shutdown, cleaning up ETS state")
MapPoolState.delete_pool_state(uuid)
:shutdown ->
Logger.debug("[Map Pool #{uuid}] Graceful shutdown, cleaning up ETS state")
MapPoolState.delete_pool_state(uuid)
{:shutdown, _} ->
Logger.debug("[Map Pool #{uuid}] Graceful shutdown, cleaning up ETS state")
MapPoolState.delete_pool_state(uuid)
_ ->
Logger.warning(
"[Map Pool #{uuid}] Abnormal termination (#{inspect(reason)}), keeping ETS state for recovery"
)
# Keep ETS state for crash recovery
:ok
end
:ok
end
@impl true
def handle_continue({:start, map_ids}, state) do
def handle_continue({:start, {map_ids, recovery_info}}, state) do
Logger.info("#{@name} started")
map_ids
|> Enum.each(fn map_id ->
GenServer.cast(self(), {:start_map, map_id})
end)
# Track recovery statistics
start_time = System.monotonic_time(:millisecond)
initial_count = length(map_ids)
# Start maps synchronously and accumulate state changes
{new_state, failed_maps} =
map_ids
|> Enum.reduce({state, []}, fn map_id, {current_state, failed} ->
case do_start_map(map_id, current_state) do
{:ok, updated_state} ->
{updated_state, failed}
{:error, reason} ->
Logger.error("[Map Pool] Failed to start map #{map_id}: #{reason}")
# Emit telemetry for individual map recovery failure
if recovery_info.recovered do
:telemetry.execute(
[:wanderer_app, :map_pool, :recovery, :map_failed],
%{map_id: map_id},
%{pool_uuid: state.uuid, reason: reason}
)
end
{current_state, [map_id | failed]}
end
end)
# Calculate final statistics
end_time = System.monotonic_time(:millisecond)
duration_ms = end_time - start_time
successful_count = length(new_state.map_ids)
failed_count = length(failed_maps)
# Log and emit telemetry for recovery completion
if recovery_info.recovered do
Logger.info(
"[Map Pool #{state.uuid}] Crash recovery completed: #{successful_count}/#{initial_count} maps recovered in #{duration_ms}ms",
pool_uuid: state.uuid,
recovered_count: successful_count,
failed_count: failed_count,
total_count: initial_count,
duration_ms: duration_ms,
failed_maps: failed_maps
)
:telemetry.execute(
[:wanderer_app, :map_pool, :recovery, :complete],
%{
recovered_count: successful_count,
failed_count: failed_count,
duration_ms: duration_ms
},
%{pool_uuid: state.uuid}
)
end
# Schedule periodic tasks
Process.send_after(self(), :backup_state, @backup_state_timeout)
Process.send_after(self(), :cleanup_systems, 15_000)
Process.send_after(self(), :cleanup_characters, @characters_cleanup_timeout)
@@ -77,56 +194,372 @@ defmodule WandererApp.Map.MapPool do
# Start message queue monitoring
Process.send_after(self(), :monitor_message_queue, :timer.seconds(30))
{:noreply, state}
{:noreply, new_state}
end
@impl true
def handle_continue({:init_map, map_id}, %{uuid: uuid} = state) do
# Perform the actual map initialization asynchronously
# This runs after the GenServer.call has already returned
start_time = System.monotonic_time(:millisecond)
try do
# Initialize the map state and start the map server using extracted helper
do_initialize_map_server(map_id)
duration = System.monotonic_time(:millisecond) - start_time
Logger.info("[Map Pool #{uuid}] Map #{map_id} initialized successfully in #{duration}ms")
# Emit telemetry for slow initializations
if duration > 5_000 do
Logger.warning("[Map Pool #{uuid}] Slow map initialization: #{map_id} took #{duration}ms")
:telemetry.execute(
[:wanderer_app, :map_pool, :slow_init],
%{duration_ms: duration},
%{map_id: map_id, pool_uuid: uuid}
)
end
{:noreply, state}
rescue
e ->
duration = System.monotonic_time(:millisecond) - start_time
Logger.error("""
[Map Pool #{uuid}] Failed to initialize map #{map_id} after #{duration}ms: #{Exception.message(e)}
#{Exception.format_stacktrace(__STACKTRACE__)}
""")
# Rollback: Remove from state, registry, cache, and ETS using extracted helper
new_state = do_unregister_map(map_id, uuid, state)
# Emit telemetry for failed initialization
:telemetry.execute(
[:wanderer_app, :map_pool, :init_failed],
%{duration_ms: duration},
%{map_id: map_id, pool_uuid: uuid, reason: Exception.message(e)}
)
{:noreply, new_state}
end
end
@impl true
def handle_cast(:stop, state), do: {:stop, :normal, state}
@impl true
def handle_cast({:start_map, map_id}, %{map_ids: map_ids, uuid: uuid} = state) do
if map_id not in map_ids do
Registry.update_value(@unique_registry, Module.concat(__MODULE__, uuid), fn r_map_ids ->
[map_id | r_map_ids]
def handle_call({:start_map, map_id}, _from, %{map_ids: map_ids, uuid: uuid} = state) do
# Enforce capacity limit to prevent pool overload due to race conditions
if length(map_ids) >= @map_pool_limit do
Logger.warning(
"[Map Pool #{uuid}] Pool at capacity (#{length(map_ids)}/#{@map_pool_limit}), " <>
"rejecting map #{map_id} and triggering new pool creation"
)
# Trigger a new pool creation attempt asynchronously
# This allows the system to create a new pool for this map
spawn(fn ->
WandererApp.Map.MapPoolDynamicSupervisor.start_map(map_id)
end)
Cachex.put(@cache, map_id, uuid)
map_id
|> WandererApp.Map.get_map_state!()
|> Server.Impl.start_map()
{:noreply, %{state | map_ids: [map_id | map_ids]}}
{:reply, :ok, state}
else
{:noreply, state}
# Check if map is already started or being initialized
if map_id in map_ids do
Logger.debug("[Map Pool #{uuid}] Map #{map_id} already in pool")
{:reply, {:ok, :already_started}, state}
else
# Pre-register the map in registry and cache to claim ownership
# This prevents race conditions where multiple pools try to start the same map
registry_result =
Registry.update_value(@unique_registry, Module.concat(__MODULE__, uuid), fn r_map_ids ->
[map_id | r_map_ids]
end)
case registry_result do
{_new_value, _old_value} ->
# Add to cache
Cachex.put(@cache, map_id, uuid)
# Add to state
new_state = %{state | map_ids: [map_id | map_ids]}
# Persist state to ETS
MapPoolState.save_pool_state(uuid, new_state.map_ids)
Logger.debug("[Map Pool #{uuid}] Map #{map_id} queued for async initialization")
# Return immediately and initialize asynchronously
{:reply, {:ok, :initializing}, new_state, {:continue, {:init_map, map_id}}}
:error ->
Logger.error("[Map Pool #{uuid}] Failed to register map #{map_id} in registry")
{:reply, {:error, :registration_failed}, state}
end
end
end
end
@impl true
def handle_cast(
def handle_call(
{:stop_map, map_id},
%{map_ids: map_ids, uuid: uuid} = state
_from,
state
) do
case do_stop_map(map_id, state) do
{:ok, new_state} ->
{:reply, :ok, new_state}
{:error, reason} ->
{:reply, {:error, reason}, state}
end
end
defp do_start_map(map_id, %{map_ids: map_ids, uuid: uuid} = state) do
if map_id in map_ids do
# Map already started
{:ok, state}
else
# Track what operations succeeded for potential rollback
completed_operations = []
try do
# Step 1: Update Registry (most critical, do first)
registry_result =
Registry.update_value(@unique_registry, Module.concat(__MODULE__, uuid), fn r_map_ids ->
[map_id | r_map_ids]
end)
completed_operations = [:registry | completed_operations]
case registry_result do
{new_value, _old_value} when is_list(new_value) ->
:ok
:error ->
raise "Failed to update registry for pool #{uuid}"
end
# Step 2: Add to cache
case Cachex.put(@cache, map_id, uuid) do
{:ok, _} ->
:ok
{:error, reason} ->
raise "Failed to add to cache: #{inspect(reason)}"
end
completed_operations = [:cache | completed_operations]
# Step 3: Start the map server using extracted helper
do_initialize_map_server(map_id)
completed_operations = [:map_server | completed_operations]
# Step 4: Update GenServer state (last, as this is in-memory and fast)
new_state = %{state | map_ids: [map_id | map_ids]}
# Step 5: Persist state to ETS for crash recovery
MapPoolState.save_pool_state(uuid, new_state.map_ids)
Logger.debug("[Map Pool] Successfully started map #{map_id} in pool #{uuid}")
{:ok, new_state}
rescue
e ->
Logger.error("""
[Map Pool] Failed to start map #{map_id} (completed: #{inspect(completed_operations)}): #{Exception.message(e)}
#{Exception.format_stacktrace(__STACKTRACE__)}
""")
# Attempt rollback of completed operations
rollback_start_map_operations(map_id, uuid, completed_operations)
{:error, Exception.message(e)}
end
end
end
defp rollback_start_map_operations(map_id, uuid, completed_operations) do
Logger.warning("[Map Pool] Attempting to rollback start_map operations for #{map_id}")
# Rollback in reverse order
if :map_server in completed_operations do
Logger.debug("[Map Pool] Rollback: Stopping map server for #{map_id}")
try do
Server.Impl.stop_map(map_id)
rescue
e ->
Logger.error("[Map Pool] Rollback failed to stop map server: #{Exception.message(e)}")
end
end
if :cache in completed_operations do
Logger.debug("[Map Pool] Rollback: Removing #{map_id} from cache")
case Cachex.del(@cache, map_id) do
{:ok, _} ->
:ok
{:error, reason} ->
Logger.error("[Map Pool] Rollback failed for cache: #{inspect(reason)}")
end
end
if :registry in completed_operations do
Logger.debug("[Map Pool] Rollback: Removing #{map_id} from registry")
try do
Registry.update_value(@unique_registry, Module.concat(__MODULE__, uuid), fn r_map_ids ->
r_map_ids |> Enum.reject(fn id -> id == map_id end)
end)
rescue
e ->
Logger.error("[Map Pool] Rollback failed for registry: #{Exception.message(e)}")
end
end
end
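One Elixir subtlety worth noting for the rollback pattern above: rebindings made inside a `try do` block are not visible in its `rescue` clause, so accumulating `completed_operations` by rebinding inside `try` leaves `rescue` seeing the value bound before the `try`. A sketch of an alternative that threads the completed steps explicitly (step names and the helper are illustrative):

```elixir
defmodule StepRunnerSketch do
  # Run {name, fun} steps in order; on the first failure, return the steps
  # that actually completed so the caller can roll back exactly those.
  def run_steps(steps) do
    Enum.reduce_while(steps, {:ok, []}, fn {name, fun}, {:ok, done} ->
      try do
        fun.()
        {:cont, {:ok, [name | done]}}
      rescue
        e -> {:halt, {:error, name, done, Exception.message(e)}}
      end
    end)
  end
end

# Usage sketch:
# case StepRunnerSketch.run_steps([{:registry, reg_fun}, {:cache, cache_fun}]) do
#   {:ok, _done} -> :ok
#   {:error, _failed_step, done, _msg} -> rollback(done)
# end
```

Here the accumulator crosses the `try` boundary as a return value rather than a rebinding, so the rollback always sees an accurate list.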
defp do_stop_map(map_id, %{map_ids: map_ids, uuid: uuid} = state) do
# Track what operations succeeded for potential rollback
completed_operations = []
try do
# Step 1: Update Registry (most critical, do first)
registry_result =
Registry.update_value(@unique_registry, Module.concat(__MODULE__, uuid), fn r_map_ids ->
r_map_ids |> Enum.reject(fn id -> id == map_id end)
end)
completed_operations = [:registry | completed_operations]
case registry_result do
{new_value, _old_value} when is_list(new_value) ->
:ok
:error ->
raise "Failed to update registry for pool #{uuid}"
end
# Step 2: Delete from cache
case Cachex.del(@cache, map_id) do
{:ok, _} ->
:ok
{:error, reason} ->
raise "Failed to delete from cache: #{inspect(reason)}"
end
completed_operations = [:cache | completed_operations]
# Step 3: Stop the map server (clean up all map resources)
map_id
|> Server.Impl.stop_map()
completed_operations = [:map_server | completed_operations]
# Step 4: Update GenServer state (last, as this is in-memory and fast)
new_state = %{state | map_ids: map_ids |> Enum.reject(fn id -> id == map_id end)}
# Step 5: Persist state to ETS for crash recovery
MapPoolState.save_pool_state(uuid, new_state.map_ids)
Logger.debug("[Map Pool] Successfully stopped map #{map_id} from pool #{uuid}")
{:ok, new_state}
rescue
e ->
Logger.error("""
[Map Pool] Failed to stop map #{map_id} (completed: #{inspect(completed_operations)}): #{Exception.message(e)}
#{Exception.format_stacktrace(__STACKTRACE__)}
""")
# Attempt rollback of completed operations
rollback_stop_map_operations(map_id, uuid, completed_operations)
{:error, Exception.message(e)}
end
end
# Helper function to initialize the map server (no state management)
# This extracts the common map initialization logic used in both
# synchronous (do_start_map) and asynchronous ({:init_map, map_id}) paths
defp do_initialize_map_server(map_id) do
map_id
|> WandererApp.Map.get_map_state!()
|> Server.Impl.start_map()
end
# Helper function to unregister a map from all tracking
# Used for rollback when map initialization fails in the async path
defp do_unregister_map(map_id, uuid, state) do
# Remove from registry
Registry.update_value(@unique_registry, Module.concat(__MODULE__, uuid), fn r_map_ids ->
r_map_ids |> Enum.reject(fn id -> id == map_id end)
Enum.reject(r_map_ids, &(&1 == map_id))
end)
# Remove from cache
Cachex.del(@cache, map_id)
map_id
|> Server.Impl.stop_map()
# Update state
new_state = %{state | map_ids: Enum.reject(state.map_ids, &(&1 == map_id))}
{:noreply, %{state | map_ids: map_ids |> Enum.reject(fn id -> id == map_id end)}}
# Update ETS
MapPoolState.save_pool_state(uuid, new_state.map_ids)
new_state
end
defp rollback_stop_map_operations(map_id, uuid, completed_operations) do
Logger.warning("[Map Pool] Attempting to rollback stop_map operations for #{map_id}")
# Rollback in reverse order
if :cache in completed_operations do
Logger.debug("[Map Pool] Rollback: Re-adding #{map_id} to cache")
case Cachex.put(@cache, map_id, uuid) do
{:ok, _} ->
:ok
{:error, reason} ->
Logger.error("[Map Pool] Rollback failed for cache: #{inspect(reason)}")
end
end
if :registry in completed_operations do
Logger.debug("[Map Pool] Rollback: Re-adding #{map_id} to registry")
try do
Registry.update_value(@unique_registry, Module.concat(__MODULE__, uuid), fn r_map_ids ->
if map_id in r_map_ids do
r_map_ids
else
[map_id | r_map_ids]
end
end)
rescue
e ->
Logger.error("[Map Pool] Rollback failed for registry: #{Exception.message(e)}")
end
end
# Note: We don't rollback map_server stop as Server.Impl.stop_map() is idempotent
# and the cleanup operations are safe to leave in a "stopped" state
end
@impl true
def handle_call(:error, _, state), do: {:stop, :error, :ok, state}
@impl true
def handle_info(:backup_state, %{map_ids: map_ids} = state) do
def handle_info(:backup_state, %{map_ids: map_ids, uuid: uuid} = state) do
Process.send_after(self(), :backup_state, @backup_state_timeout)
try do
# Persist pool state to ETS
MapPoolState.save_pool_state(uuid, map_ids)
# Backup individual map states to database
map_ids
|> Task.async_stream(
fn map_id ->
@@ -231,25 +664,38 @@ defmodule WandererApp.Map.MapPool do
Process.send_after(self(), :garbage_collect, @garbage_collection_interval)
try do
map_ids
|> Enum.each(fn map_id ->
# presence_character_ids =
# WandererApp.Cache.lookup!("map_#{map_id}:presence_character_ids", [])
# Process each map and accumulate state changes
new_state =
map_ids
|> Enum.reduce(state, fn map_id, current_state ->
presence_character_ids =
WandererApp.Cache.lookup!("map_#{map_id}:presence_character_ids", [])
# if presence_character_ids |> Enum.empty?() do
Logger.info(
"#{uuid}: No more characters present on: #{map_id}, shutting down map server..."
)
if presence_character_ids |> Enum.empty?() do
Logger.info(
"#{uuid}: No more characters present on: #{map_id}, shutting down map server..."
)
GenServer.cast(self(), {:stop_map, map_id})
# end
end)
case do_stop_map(map_id, current_state) do
{:ok, updated_state} ->
Logger.debug("#{uuid}: Successfully stopped map #{map_id}")
updated_state
{:error, reason} ->
Logger.error("#{uuid}: Failed to stop map #{map_id}: #{reason}")
current_state
end
else
current_state
end
end)
{:noreply, new_state}
rescue
e ->
Logger.error(Exception.message(e))
Logger.error("#{uuid}: Garbage collection error: #{Exception.message(e)}")
{:noreply, state}
end
{:noreply, state}
end
@impl true
@@ -309,6 +755,57 @@ defmodule WandererApp.Map.MapPool do
{:noreply, state}
end
def handle_info(:map_deleted, %{map_ids: map_ids} = state) do
# When a map is deleted, stop all maps in this pool that are deleted
# This is a graceful shutdown triggered by user action
Logger.info("[Map Pool #{state.uuid}] Received map_deleted event, stopping affected maps")
# Check which of our maps were deleted and stop them
new_state =
map_ids
|> Enum.reduce(state, fn map_id, current_state ->
# Check if the map still exists in the database
case WandererApp.MapRepo.get(map_id) do
{:ok, %{deleted: true}} ->
Logger.info("[Map Pool #{state.uuid}] Map #{map_id} was deleted, stopping it")
case do_stop_map(map_id, current_state) do
{:ok, updated_state} ->
updated_state
{:error, reason} ->
Logger.error(
"[Map Pool #{state.uuid}] Failed to stop deleted map #{map_id}: #{reason}"
)
current_state
end
{:ok, _map} ->
# Map still exists and is not deleted
current_state
{:error, _} ->
# Map doesn't exist, should stop it
Logger.info("[Map Pool #{state.uuid}] Map #{map_id} not found, stopping it")
case do_stop_map(map_id, current_state) do
{:ok, updated_state} ->
updated_state
{:error, reason} ->
Logger.error(
"[Map Pool #{state.uuid}] Failed to stop missing map #{map_id}: #{reason}"
)
current_state
end
end
end)
{:noreply, new_state}
end
def handle_info(event, state) do
try do
Server.Impl.handle_event(event)

View File

@@ -8,6 +8,7 @@ defmodule WandererApp.Map.MapPoolDynamicSupervisor do
@registry :map_pool_registry
@unique_registry :unique_map_pool_registry
@map_pool_limit 10
@genserver_call_timeout :timer.minutes(2)
@name __MODULE__
@@ -30,23 +31,109 @@ defmodule WandererApp.Map.MapPoolDynamicSupervisor do
start_child([map_id], pools |> Enum.count())
pid ->
result = GenServer.call(pid, {:start_map, map_id}, @genserver_call_timeout)
case result do
{:ok, :initializing} ->
Logger.debug(
"[Map Pool Supervisor] Map #{map_id} queued for async initialization"
)
result
{:ok, :already_started} ->
Logger.debug("[Map Pool Supervisor] Map #{map_id} already started")
result
:ok ->
# Legacy synchronous response (from crash recovery path)
Logger.debug("[Map Pool Supervisor] Map #{map_id} started synchronously")
result
other ->
Logger.warning(
"[Map Pool Supervisor] Unexpected response for map #{map_id}: #{inspect(other)}"
)
other
end
end
end
end
def stop_map(map_id) do
case Cachex.get(@cache, map_id) do
{:ok, nil} ->
# Cache miss - try to find the pool by scanning the registry
Logger.warning(
"Cache miss for map #{map_id}, scanning registry for pool containing this map"
)
find_pool_by_scanning_registry(map_id)
{:ok, pool_uuid} ->
# Cache hit - use the pool_uuid to lookup the pool
case Registry.lookup(
@unique_registry,
Module.concat(WandererApp.Map.MapPool, pool_uuid)
) do
[] ->
Logger.warning(
"Pool with UUID #{pool_uuid} not found in registry for map #{map_id}, scanning registry"
)
find_pool_by_scanning_registry(map_id)
[{pool_pid, _}] ->
GenServer.call(pool_pid, {:stop_map, map_id}, @genserver_call_timeout)
end
{:error, reason} ->
Logger.error("Failed to lookup map #{map_id} in cache: #{inspect(reason)}")
:ok
end
end
defp find_pool_by_scanning_registry(map_id) do
case Registry.lookup(@registry, WandererApp.Map.MapPool) do
[] ->
Logger.debug("No map pools found in registry for map #{map_id}")
:ok
[{pool_pid, _}] ->
GenServer.cast(pool_pid, {:stop_map, map_id})
pools ->
# Scan all pools to find the one containing this map_id
found_pool =
Enum.find_value(pools, fn {_pid, uuid} ->
case Registry.lookup(
@unique_registry,
Module.concat(WandererApp.Map.MapPool, uuid)
) do
[{pool_pid, map_ids}] ->
if map_id in map_ids do
{pool_pid, uuid}
else
nil
end
_ ->
nil
end
end)
case found_pool do
{pool_pid, pool_uuid} ->
Logger.info(
"Found map #{map_id} in pool #{pool_uuid} via registry scan, updating cache"
)
# Update the cache to fix the inconsistency
Cachex.put(@cache, map_id, pool_uuid)
GenServer.call(pool_pid, {:stop_map, map_id}, @genserver_call_timeout)
nil ->
Logger.debug("Map #{map_id} not found in any pool registry")
:ok
end
end
end
@@ -79,9 +166,13 @@ defmodule WandererApp.Map.MapPoolDynamicSupervisor do
end
defp start_child(map_ids, pools_count) do
# Generate UUID for the new pool - this will be used for crash recovery
uuid = UUID.uuid1()
# Pass both UUID and map_ids to the pool for crash recovery support
case DynamicSupervisor.start_child(@name, {WandererApp.Map.MapPool, {uuid, map_ids}}) do
{:ok, pid} ->
Logger.info("Starting map pool #{uuid}, total map_pools: #{pools_count + 1}")
{:ok, pid}
{:error, {:already_started, pid}} ->

View File

@@ -0,0 +1,190 @@
defmodule WandererApp.Map.MapPoolState do
@moduledoc """
Helper module for persisting MapPool state to ETS for crash recovery.
This module provides functions to save and retrieve MapPool state from an ETS table.
The state survives GenServer crashes but is lost on node restart, which ensures
automatic recovery from crashes while avoiding stale state on system restart.
## ETS Table Ownership
The ETS table `:map_pool_state_table` is owned by the MapPoolSupervisor,
ensuring it survives individual MapPool process crashes.
## State Format
State is stored as tuples: `{pool_uuid, map_ids, last_updated_timestamp}`
where:
- `pool_uuid` is the unique identifier for the pool (key)
- `map_ids` is a list of map IDs managed by this pool
- `last_updated_timestamp` is the Unix timestamp of the last update
"""
require Logger
@table_name :map_pool_state_table
@stale_threshold_hours 24
@doc """
Initializes the ETS table for storing MapPool state.
This should be called by the MapPoolSupervisor during initialization.
The table is created as:
- `:set` - Each pool UUID has exactly one entry
- `:public` - Any process can read/write
- `:named_table` - Can be accessed by name
Returns the table reference or raises if table already exists.
"""
@spec init_table() :: :ets.table()
def init_table do
:ets.new(@table_name, [:set, :public, :named_table])
end
@doc """
Saves the current state of a MapPool to ETS.
## Parameters
- `uuid` - The unique identifier for the pool
- `map_ids` - List of map IDs currently managed by this pool
## Examples
iex> MapPoolState.save_pool_state("pool-123", [1, 2, 3])
:ok
"""
@spec save_pool_state(String.t(), [integer()]) :: :ok
def save_pool_state(uuid, map_ids) when is_binary(uuid) and is_list(map_ids) do
timestamp = System.system_time(:second)
true = :ets.insert(@table_name, {uuid, map_ids, timestamp})
Logger.debug("Saved MapPool state for #{uuid}: #{length(map_ids)} maps",
pool_uuid: uuid,
map_count: length(map_ids)
)
:ok
end
@doc """
Retrieves the saved state for a MapPool from ETS.
## Parameters
- `uuid` - The unique identifier for the pool
## Returns
- `{:ok, map_ids}` if state exists
- `{:error, :not_found}` if no state exists for this UUID
## Examples
iex> MapPoolState.get_pool_state("pool-123")
{:ok, [1, 2, 3]}
iex> MapPoolState.get_pool_state("non-existent")
{:error, :not_found}
"""
@spec get_pool_state(String.t()) :: {:ok, [integer()]} | {:error, :not_found}
def get_pool_state(uuid) when is_binary(uuid) do
case :ets.lookup(@table_name, uuid) do
[{^uuid, map_ids, _timestamp}] ->
{:ok, map_ids}
[] ->
{:error, :not_found}
end
end
@doc """
Deletes the state for a MapPool from ETS.
This should be called when a pool is gracefully shut down.
## Parameters
- `uuid` - The unique identifier for the pool
## Examples
iex> MapPoolState.delete_pool_state("pool-123")
:ok
"""
@spec delete_pool_state(String.t()) :: :ok
def delete_pool_state(uuid) when is_binary(uuid) do
true = :ets.delete(@table_name, uuid)
Logger.debug("Deleted MapPool state for #{uuid}", pool_uuid: uuid)
:ok
end
@doc """
Removes stale entries from the ETS table.
Entries are considered stale if they haven't been updated in the last
#{@stale_threshold_hours} hours. This helps prevent the table from growing
unbounded due to pool UUIDs that are no longer in use.
Returns the number of entries deleted.
## Examples
iex> MapPoolState.cleanup_stale_entries()
{:ok, 3}
"""
@spec cleanup_stale_entries() :: {:ok, non_neg_integer()}
def cleanup_stale_entries do
stale_threshold = System.system_time(:second) - @stale_threshold_hours * 3600
match_spec = [
{
{:"$1", :"$2", :"$3"},
[{:<, :"$3", stale_threshold}],
[:"$1"]
}
]
stale_uuids = :ets.select(@table_name, match_spec)
Enum.each(stale_uuids, fn uuid ->
:ets.delete(@table_name, uuid)
Logger.info("Cleaned up stale MapPool state for #{uuid}",
pool_uuid: uuid,
reason: :stale
)
end)
{:ok, length(stale_uuids)}
end
@doc """
Returns all pool states currently stored in ETS.
Useful for debugging and monitoring.
## Examples
iex> MapPoolState.list_all_states()
[
{"pool-123", [1, 2, 3], 1699564800},
{"pool-456", [4, 5], 1699564900}
]
"""
@spec list_all_states() :: [{String.t(), [integer()], integer()}]
def list_all_states do
:ets.tab2list(@table_name)
end
@doc """
Returns the count of pool states currently stored in ETS.
## Examples
iex> MapPoolState.count_states()
5
"""
@spec count_states() :: non_neg_integer()
def count_states do
:ets.info(@table_name, :size)
end
end
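The save/get/delete API above can be exercised end to end. A minimal sketch of the crash-recovery round trip, assuming the `WandererApp.Map.MapPoolState` module from this diff is compiled and its ETS table has not yet been created:

```elixir
alias WandererApp.Map.MapPoolState

# Normally called once by MapPoolSupervisor.init/1
MapPoolState.init_table()

# A pool persists its map_ids; after a crash the restarted pool reads them back
:ok = MapPoolState.save_pool_state("pool-123", [1, 2, 3])
{:ok, [1, 2, 3]} = MapPoolState.get_pool_state("pool-123")

# Graceful shutdown removes the entry so a restart does not resurrect the pool
:ok = MapPoolState.delete_pool_state("pool-123")
{:error, :not_found} = MapPoolState.get_pool_state("pool-123")
```

Because the table is owned by the supervisor, these entries survive a MapPool crash but not a node restart, which is exactly the recovery window the moduledoc describes.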

View File

@@ -2,6 +2,8 @@ defmodule WandererApp.Map.MapPoolSupervisor do
@moduledoc false
use Supervisor
alias WandererApp.Map.MapPoolState
@name __MODULE__
@registry :map_pool_registry
@unique_registry :unique_map_pool_registry
@@ -11,10 +13,15 @@ defmodule WandererApp.Map.MapPoolSupervisor do
end
def init(_args) do
# Initialize ETS table for MapPool state persistence
# This table survives individual MapPool crashes but is lost on node restart
MapPoolState.init_table()
children = [
{Registry, [keys: :unique, name: @unique_registry]},
{Registry, [keys: :duplicate, name: @registry]},
{WandererApp.Map.MapPoolDynamicSupervisor, []}
{WandererApp.Map.MapPoolDynamicSupervisor, []},
{WandererApp.Map.Reconciler, []}
]
Supervisor.init(children, strategy: :rest_for_one, max_restarts: 10)

View File

@@ -0,0 +1,280 @@
defmodule WandererApp.Map.Reconciler do
@moduledoc """
Periodically reconciles map state across different stores (Cache, Registry, GenServer state)
to detect and fix inconsistencies that may prevent map servers from restarting.
"""
use GenServer
require Logger
@cache :map_pool_cache
@registry :map_pool_registry
@unique_registry :unique_map_pool_registry
@reconciliation_interval :timer.minutes(5)
def start_link(_opts) do
GenServer.start_link(__MODULE__, [], name: __MODULE__)
end
@impl true
def init(_opts) do
Logger.info("Starting Map Reconciler")
schedule_reconciliation()
{:ok, %{}}
end
@impl true
def handle_info(:reconcile, state) do
schedule_reconciliation()
try do
reconcile_state()
rescue
e ->
Logger.error("""
[Map Reconciler] reconciliation error: #{Exception.message(e)}
#{Exception.format_stacktrace(__STACKTRACE__)}
""")
end
{:noreply, state}
end
@doc """
Manually trigger a reconciliation (useful for testing or manual cleanup)
"""
def trigger_reconciliation do
GenServer.cast(__MODULE__, :reconcile_now)
end
@impl true
def handle_cast(:reconcile_now, state) do
try do
reconcile_state()
rescue
e ->
Logger.error("""
[Map Reconciler] manual reconciliation error: #{Exception.message(e)}
#{Exception.format_stacktrace(__STACKTRACE__)}
""")
end
{:noreply, state}
end
defp schedule_reconciliation do
Process.send_after(self(), :reconcile, @reconciliation_interval)
end
defp reconcile_state do
Logger.debug("[Map Reconciler] Starting state reconciliation")
# Get started_maps from cache
{:ok, started_maps} = WandererApp.Cache.lookup("started_maps", [])
# Get all maps from registries
registry_maps = get_all_registry_maps()
# Detect zombie maps (in started_maps but not in any registry)
zombie_maps = started_maps -- registry_maps
# Detect orphan maps (in registry but not in started_maps)
orphan_maps = registry_maps -- started_maps
# Detect cache inconsistencies (map_pool_cache pointing to wrong or non-existent pools)
cache_inconsistencies = find_cache_inconsistencies(registry_maps)
stats = %{
total_started_maps: length(started_maps),
total_registry_maps: length(registry_maps),
zombie_maps: length(zombie_maps),
orphan_maps: length(orphan_maps),
cache_inconsistencies: length(cache_inconsistencies)
}
Logger.info("[Map Reconciler] Reconciliation stats: #{inspect(stats)}")
# Emit telemetry
:telemetry.execute(
[:wanderer_app, :map, :reconciliation],
stats,
%{}
)
# Clean up zombie maps
cleanup_zombie_maps(zombie_maps)
# Fix orphan maps
fix_orphan_maps(orphan_maps)
# Fix cache inconsistencies
fix_cache_inconsistencies(cache_inconsistencies)
Logger.debug("[Map Reconciler] State reconciliation completed")
end
defp get_all_registry_maps do
case Registry.lookup(@registry, WandererApp.Map.MapPool) do
[] ->
[]
pools ->
pools
|> Enum.flat_map(fn {_pid, uuid} ->
case Registry.lookup(
@unique_registry,
Module.concat(WandererApp.Map.MapPool, uuid)
) do
[{_pool_pid, map_ids}] -> map_ids
_ -> []
end
end)
|> Enum.uniq()
end
end
defp find_cache_inconsistencies(registry_maps) do
registry_maps
|> Enum.filter(fn map_id ->
case Cachex.get(@cache, map_id) do
{:ok, nil} ->
# Map in registry but not in cache
true
{:ok, pool_uuid} ->
# Check if the pool_uuid actually exists in registry
case Registry.lookup(
@unique_registry,
Module.concat(WandererApp.Map.MapPool, pool_uuid)
) do
[] ->
# Cache points to non-existent pool
true
[{_pool_pid, map_ids}] ->
# Check if this map is actually in the pool's map_ids
map_id not in map_ids
_ ->
false
end
{:error, _} ->
true
end
end)
end
defp cleanup_zombie_maps([]), do: :ok
defp cleanup_zombie_maps(zombie_maps) do
Logger.warning("[Map Reconciler] Found #{length(zombie_maps)} zombie maps: #{inspect(zombie_maps)}")
Enum.each(zombie_maps, fn map_id ->
Logger.info("[Map Reconciler] Cleaning up zombie map: #{map_id}")
# Remove from started_maps cache
WandererApp.Cache.insert_or_update(
"started_maps",
[],
fn started_maps ->
started_maps |> Enum.reject(fn started_map_id -> started_map_id == map_id end)
end
)
# Clean up any stale map_pool_cache entries
Cachex.del(@cache, map_id)
# Clean up map-specific caches
WandererApp.Cache.delete("map_#{map_id}:started")
WandererApp.Cache.delete("map_characters-#{map_id}")
WandererApp.Map.CacheRTree.clear_tree("rtree_#{map_id}")
WandererApp.Map.delete_map_state(map_id)
:telemetry.execute(
[:wanderer_app, :map, :reconciliation, :zombie_cleanup],
%{count: 1},
%{map_id: map_id}
)
end)
end
defp fix_orphan_maps([]), do: :ok
defp fix_orphan_maps(orphan_maps) do
Logger.warning("[Map Reconciler] Found #{length(orphan_maps)} orphan maps: #{inspect(orphan_maps)}")
Enum.each(orphan_maps, fn map_id ->
Logger.info("[Map Reconciler] Fixing orphan map: #{map_id}")
# Add to started_maps cache
WandererApp.Cache.insert_or_update(
"started_maps",
[map_id],
fn existing ->
[map_id | existing] |> Enum.uniq()
end
)
:telemetry.execute(
[:wanderer_app, :map, :reconciliation, :orphan_fixed],
%{count: 1},
%{map_id: map_id}
)
end)
end
defp fix_cache_inconsistencies([]), do: :ok
defp fix_cache_inconsistencies(inconsistent_maps) do
Logger.warning(
"[Map Reconciler] Found #{length(inconsistent_maps)} cache inconsistencies: #{inspect(inconsistent_maps)}"
)
Enum.each(inconsistent_maps, fn map_id ->
Logger.info("[Map Reconciler] Fixing cache inconsistency for map: #{map_id}")
# Find the correct pool for this map
case find_pool_for_map(map_id) do
{:ok, pool_uuid} ->
Logger.info("[Map Reconciler] Updating cache: #{map_id} -> #{pool_uuid}")
Cachex.put(@cache, map_id, pool_uuid)
:telemetry.execute(
[:wanderer_app, :map, :reconciliation, :cache_fixed],
%{count: 1},
%{map_id: map_id, pool_uuid: pool_uuid}
)
:error ->
Logger.warning("[Map Reconciler] Could not find pool for map #{map_id}, removing from cache")
Cachex.del(@cache, map_id)
end
end)
end
defp find_pool_for_map(map_id) do
case Registry.lookup(@registry, WandererApp.Map.MapPool) do
[] ->
:error
pools ->
pools
|> Enum.find_value(:error, fn {_pid, uuid} ->
case Registry.lookup(
@unique_registry,
Module.concat(WandererApp.Map.MapPool, uuid)
) do
[{_pool_pid, map_ids}] ->
if map_id in map_ids do
{:ok, uuid}
else
nil
end
_ ->
nil
end
end)
end
end
end
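The zombie/orphan detection above reduces to a plain list difference over the two sources of truth. A minimal sketch with illustrative map IDs (not real data from this codebase):

```elixir
# "started_maps" comes from the cache; "registry_maps" from the pool registries
started_maps = ["map-a", "map-b", "map-c"]
registry_maps = ["map-b", "map-c", "map-d"]

# Tracked as started but not running in any pool -> zombie, needs cleanup
zombie_maps = started_maps -- registry_maps
# Running in a pool but missing from started_maps -> orphan, needs re-adding
orphan_maps = registry_maps -- started_maps

zombie_maps
# => ["map-a"]
orphan_maps
# => ["map-d"]
```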

View File

@@ -401,7 +401,7 @@ defmodule WandererApp.Map.Server.ConnectionsImpl do
)
else
error ->
Logger.warning("Failed to update_linked_signature_time_status: #{inspect(error)}")
end
end
@@ -537,6 +537,12 @@ defmodule WandererApp.Map.Server.ConnectionsImpl do
Impl.broadcast!(map_id, :add_connection, connection)
Impl.broadcast!(map_id, :maybe_link_signature, %{
character_id: character_id,
solar_system_source: old_location.solar_system_id,
solar_system_target: location.solar_system_id
})
# ADDITIVE: Also broadcast to external event system (webhooks/WebSocket)
WandererApp.ExternalEvents.broadcast(map_id, :connection_added, %{
connection_id: connection.id,
@@ -548,19 +554,12 @@ defmodule WandererApp.Map.Server.ConnectionsImpl do
time_status: connection.time_status
})
WandererApp.User.ActivityTracker.track_map_event(:map_connection_added, %{
character_id: character_id,
user_id: character.user_id,
map_id: map_id,
solar_system_source_id: old_location.solar_system_id,
solar_system_target_id: location.solar_system_id
})
:ok

View File

@@ -25,6 +25,7 @@ defmodule WandererApp.Map.Server.Impl do
]
@pubsub_client Application.compile_env(:wanderer_app, :pubsub_client)
@ddrt Application.compile_env(:wanderer_app, :ddrt)
@connections_cleanup_timeout :timer.minutes(1)
@@ -45,19 +46,77 @@ defmodule WandererApp.Map.Server.Impl do
}
|> new()
# Parallelize database queries for faster initialization
start_time = System.monotonic_time(:millisecond)
tasks = [
Task.async(fn ->
{:map,
WandererApp.MapRepo.get(map_id, [
:owner,
:characters,
acls: [
:owner_id,
members: [:role, :eve_character_id, :eve_corporation_id, :eve_alliance_id]
]
])}
end),
Task.async(fn ->
{:systems, WandererApp.MapSystemRepo.get_visible_by_map(map_id)}
end),
Task.async(fn ->
{:connections, WandererApp.MapConnectionRepo.get_by_map(map_id)}
end),
Task.async(fn ->
{:subscription, WandererApp.Map.SubscriptionManager.get_active_map_subscription(map_id)}
end)
]
results = Task.await_many(tasks, :timer.seconds(15))
duration = System.monotonic_time(:millisecond) - start_time
# Emit telemetry for slow initializations
if duration > 5_000 do
Logger.warning("[Map Server] Slow map state initialization: #{map_id} took #{duration}ms")
:telemetry.execute(
[:wanderer_app, :map, :slow_init],
%{duration_ms: duration},
%{map_id: map_id}
)
end
# Extract results
map_result =
Enum.find_value(results, fn
{:map, result} -> result
_ -> nil
end)
systems_result =
Enum.find_value(results, fn
{:systems, result} -> result
_ -> nil
end)
connections_result =
Enum.find_value(results, fn
{:connections, result} -> result
_ -> nil
end)
subscription_result =
Enum.find_value(results, fn
{:subscription, result} -> result
_ -> nil
end)
# Process results
with {:ok, map} <- map_result,
{:ok, systems} <- systems_result,
{:ok, connections} <- connections_result,
{:ok, subscription_settings} <- subscription_result do
initial_state
|> init_map(
map,
@@ -88,7 +147,6 @@ defmodule WandererApp.Map.Server.Impl do
"maps:#{map_id}"
)
Process.send_after(self(), {:update_characters, map_id}, @update_characters_timeout)
Process.send_after(
@@ -358,6 +416,13 @@ defmodule WandererApp.Map.Server.Impl do
update_options(map_id, options)
end
def handle_event(:map_deleted) do
# Map has been deleted - this event is handled by MapPool to stop the server
# and by MapLive to redirect users. Nothing to do here.
Logger.debug("Map deletion event received, will be handled by MapPool")
:ok
end
def handle_event({ref, _result}) when is_reference(ref) do
Process.demonitor(ref, [:flush])
end
@@ -452,6 +517,8 @@ defmodule WandererApp.Map.Server.Impl do
) do
{:ok, options} = WandererApp.MapRepo.options_to_form_data(initial_map)
@ddrt.init_tree("rtree_#{map_id}", %{width: 150, verbose: false})
map =
initial_map
|> WandererApp.Map.new()

View File

@@ -521,6 +521,10 @@ defmodule WandererApp.Map.Server.SystemsImpl do
:ok
{:error, error} ->
Logger.warning("Failed to create system: #{inspect(error, pretty: true)}")
:ok
error ->
Logger.warning("Failed to create system: #{inspect(error, pretty: true)}")
:ok
@@ -642,13 +646,12 @@ defmodule WandererApp.Map.Server.SystemsImpl do
position_y: system.position_y
})
WandererApp.User.ActivityTracker.track_map_event(:system_added, %{
character_id: character_id,
user_id: user_id,
map_id: map_id,
solar_system_id: solar_system_id
})
end
defp maybe_update_extra_info(system, nil), do: system
@@ -805,6 +808,10 @@ defmodule WandererApp.Map.Server.SystemsImpl do
update_map_system_last_activity(map_id, updated_system)
else
{:error, error} ->
Logger.error("Failed to update system: #{inspect(error, pretty: true)}")
:ok
error ->
Logger.error("Failed to update system: #{inspect(error, pretty: true)}")
:ok

View File

@@ -3,11 +3,25 @@ defmodule WandererApp.MapPingsRepo do
require Logger
def get_by_id(ping_id) do
case WandererApp.Api.MapPing.by_id(ping_id) do
{:ok, ping} ->
ping |> Ash.load([:system])
error ->
error
end
end
def get_by_map(map_id) do
case WandererApp.Api.MapPing.by_map(%{map_id: map_id}) do
{:ok, ping} ->
ping |> Ash.load([:character, :system])
error ->
error
end
end
def get_by_map_and_system!(map_id, system_id),
do: WandererApp.Api.MapPing.by_map_and_system!(%{map_id: map_id, system_id: system_id})

View File

@@ -11,7 +11,9 @@ defmodule WandererApp.Server.ServerStatusTracker do
:server_version,
:start_time,
:vip,
:retries,
:in_forced_downtime,
:downtime_notified
]
@retries_count 3
@@ -21,9 +23,17 @@ defmodule WandererApp.Server.ServerStatusTracker do
retries: @retries_count,
server_version: "0",
start_time: "0",
vip: true,
in_forced_downtime: false,
downtime_notified: false
}
# EVE Online daily downtime period (UTC/GMT)
@downtime_start_hour 10
@downtime_start_minute 58
@downtime_end_hour 11
@downtime_end_minute 2
@refresh_interval :timer.minutes(1)
@logger Application.compile_env(:wanderer_app, :logger)
@@ -57,13 +67,51 @@ defmodule WandererApp.Server.ServerStatusTracker do
def handle_info(
:refresh_status,
%{
retries: retries,
in_forced_downtime: was_in_downtime
} = state
) do
Process.send_after(self(), :refresh_status, @refresh_interval)
in_downtime = in_forced_downtime?()
cond do
# Entering downtime period - broadcast offline status immediately
in_downtime and not was_in_downtime ->
@logger.info("#{__MODULE__} entering forced downtime period (10:58-11:02 GMT)")
downtime_status = %{
players: 0,
server_version: "downtime",
start_time: DateTime.utc_now() |> DateTime.to_iso8601(),
vip: true
}
Phoenix.PubSub.broadcast(
WandererApp.PubSub,
"server_status",
{:server_status, downtime_status}
)
{:noreply,
%{state | in_forced_downtime: true, downtime_notified: true}
|> Map.merge(downtime_status)}
# Currently in downtime - skip API call
in_downtime ->
{:noreply, state}
# Exiting downtime period - resume normal operations
not in_downtime and was_in_downtime ->
@logger.info("#{__MODULE__} exiting forced downtime period, resuming normal operations")
Task.async(fn -> get_server_status(retries) end)
{:noreply, %{state | in_forced_downtime: false, downtime_notified: false}}
# Normal operation
true ->
Task.async(fn -> get_server_status(retries) end)
{:noreply, state}
end
end
@impl true
@@ -155,4 +203,19 @@ defmodule WandererApp.Server.ServerStatusTracker do
vip: false
}
end
# Checks if the current UTC time falls within the forced downtime period (10:58-11:02 GMT).
defp in_forced_downtime? do
now = DateTime.utc_now()
current_hour = now.hour
current_minute = now.minute
# Convert times to minutes since midnight for easier comparison
current_time_minutes = current_hour * 60 + current_minute
downtime_start_minutes = @downtime_start_hour * 60 + @downtime_start_minute
downtime_end_minutes = @downtime_end_hour * 60 + @downtime_end_minute
current_time_minutes >= downtime_start_minutes and
current_time_minutes < downtime_end_minutes
end
end
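The `in_forced_downtime?/0` helper above works by converting clock times to minutes since midnight, which turns the 10:58–11:02 UTC window into a simple half-open range test. A standalone sketch of the same arithmetic (the anonymous function is illustrative, not part of the module):

```elixir
# 10:58 -> 658 minutes, 11:02 -> 662 minutes since midnight (UTC)
in_window? = fn hour, minute ->
  t = hour * 60 + minute
  t >= 10 * 60 + 58 and t < 11 * 60 + 2
end

in_window?.(10, 57)  # => false, one minute before the window
in_window?.(10, 58)  # => true, window start is inclusive
in_window?.(11, 1)   # => true, last minute inside the window
in_window?.(11, 2)   # => false, window end is exclusive
```

The exclusive upper bound means the tracker resumes ESI polling on the first tick at or after 11:02 UTC.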

View File

@@ -1,16 +1,57 @@
defmodule WandererApp.User.ActivityTracker do
@moduledoc """
Activity tracking wrapper that ensures audit logging never crashes application logic.
Activity tracking is best-effort and errors are logged but not propagated to callers.
This prevents race conditions (e.g., duplicate activity records) from affecting
critical business operations like character tracking or connection management.
"""
require Logger
@doc """
Track a map-related event. Always returns `{:ok, result}` even on error.
Errors (such as unique constraint violations from concurrent operations)
are logged but do not propagate to prevent crashing critical application logic.
"""
def track_map_event(event_type, metadata) do
case WandererApp.Map.Audit.track_map_event(event_type, metadata) do
{:ok, result} ->
{:ok, result}
{:error, error} ->
Logger.warning("Failed to track map event (non-critical)",
event_type: event_type,
map_id: metadata[:map_id],
error: inspect(error),
reason: :best_effort_tracking
)
# Return success to prevent crashes - activity tracking is best-effort
{:ok, nil}
end
end
@doc """
Track an ACL-related event. Always returns `{:ok, result}` even on error.
Errors are logged but do not propagate to prevent crashing critical application logic.
"""
def track_acl_event(event_type, metadata) do
case WandererApp.Map.Audit.track_acl_event(event_type, metadata) do
{:ok, result} ->
{:ok, result}
{:error, error} ->
Logger.warning("Failed to track ACL event (non-critical)",
event_type: event_type,
acl_id: metadata[:acl_id],
error: inspect(error),
reason: :best_effort_tracking
)
# Return success to prevent crashes - activity tracking is best-effort
{:ok, nil}
end
end
end
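Because the wrapper above swallows audit errors and always returns `{:ok, result}`, call sites no longer need to pattern-match the result, which is why the `{:ok, _} =` matches are dropped throughout the callers in this diff. A hedged usage sketch (the variable values are illustrative placeholders, not real IDs):

```elixir
# Fire-and-forget: a failed audit write is logged inside the wrapper
# and can no longer raise a MatchError in the calling process.
WandererApp.User.ActivityTracker.track_map_event(:hub_added, %{
  character_id: character_id,
  user_id: user_id,
  map_id: map_id,
  solar_system_id: 30000142
})
```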

View File

@@ -59,14 +59,13 @@ defmodule WandererAppWeb.MapConnectionsEventHandler do
character_id: main_character_id
})
WandererApp.User.ActivityTracker.track_map_event(:map_connection_added, %{
character_id: main_character_id,
user_id: current_user_id,
map_id: map_id,
solar_system_source_id: "#{solar_system_source_id}" |> String.to_integer(),
solar_system_target_id: "#{solar_system_target_id}" |> String.to_integer()
})
{:noreply, socket}
end
@@ -149,7 +148,6 @@ defmodule WandererAppWeb.MapConnectionsEventHandler do
end
end
WandererApp.User.ActivityTracker.track_map_event(:map_connection_removed, %{
character_id: main_character_id,
user_id: current_user_id,
@@ -202,7 +200,6 @@ defmodule WandererAppWeb.MapConnectionsEventHandler do
_ -> nil
end
WandererApp.User.ActivityTracker.track_map_event(:map_connection_updated, %{
character_id: main_character_id,
user_id: current_user_id,

View File

@@ -21,59 +21,85 @@ defmodule WandererAppWeb.MapCoreEventHandler do
:refresh_permissions,
%{assigns: %{current_user: current_user, map_slug: map_slug}} = socket
) do
try do
{:ok, %{id: map_id, user_permissions: user_permissions, owner_id: owner_id}} =
map_slug
|> WandererApp.Api.Map.get_map_by_slug!()
|> Ash.load(:user_permissions, actor: current_user)
user_permissions =
WandererApp.Permissions.get_map_permissions(
user_permissions,
owner_id,
current_user.characters |> Enum.map(& &1.id)
)
case user_permissions do
%{view_system: false} ->
socket
|> Phoenix.LiveView.put_flash(:error, "Your access to the map has been revoked.")
|> Phoenix.LiveView.push_navigate(to: ~p"/maps")
%{track_character: track_character} ->
{:ok, map_characters} =
case WandererApp.MapCharacterSettingsRepo.get_tracked_by_map_filtered(
map_id,
current_user.characters |> Enum.map(& &1.id)
) do
{:ok, settings} ->
{:ok,
settings
|> Enum.map(fn s -> s |> Ash.load!(:character) |> Map.get(:character) end)}
_ ->
{:ok, []}
end
case track_character do
false ->
:ok = WandererApp.Character.TrackingUtils.untrack(map_characters, map_id, self())
_ ->
:ok =
WandererApp.Character.TrackingUtils.track(
map_characters,
map_id,
true,
self()
)
end
socket
|> assign(user_permissions: user_permissions)
|> MapEventHandler.push_map_event(
"user_permissions",
user_permissions
)
end
rescue
error in Ash.Error.Invalid.MultipleResults ->
Logger.error("Multiple maps found with slug '#{map_slug}' during refresh_permissions",
slug: map_slug,
error: inspect(error)
)
# Emit telemetry for monitoring
:telemetry.execute(
[:wanderer_app, :map, :duplicate_slug_detected],
%{count: 1},
%{slug: map_slug, operation: :refresh_permissions}
)
# Return socket unchanged - permissions won't refresh but won't crash
socket
error ->
Logger.error("Error refreshing permissions for map slug '#{map_slug}'",
slug: map_slug,
error: inspect(error)
)
socket
end
end

View File

@@ -165,13 +165,12 @@ defmodule WandererAppWeb.MapRoutesEventHandler do
solar_system_id: solar_system_id
})
WandererApp.User.ActivityTracker.track_map_event(:hub_added, %{
character_id: main_character_id,
user_id: current_user.id,
map_id: map_id,
solar_system_id: solar_system_id
})
{:noreply, socket}
else
@@ -204,13 +203,12 @@ defmodule WandererAppWeb.MapRoutesEventHandler do
solar_system_id: solar_system_id
})
WandererApp.User.ActivityTracker.track_map_event(:hub_removed, %{
character_id: main_character_id,
user_id: current_user.id,
map_id: map_id,
solar_system_id: solar_system_id
})
{:noreply, socket}
end

View File

@@ -250,15 +250,14 @@ defmodule WandererAppWeb.MapSystemsEventHandler do
|> Map.put_new(key_atom, value)
])
-{:ok, _} =
-  WandererApp.User.ActivityTracker.track_map_event(:system_updated, %{
-    character_id: main_character_id,
-    user_id: current_user.id,
-    map_id: map_id,
-    solar_system_id: "#{solar_system_id}" |> String.to_integer(),
-    key: key_atom,
-    value: value
-  })
+WandererApp.User.ActivityTracker.track_map_event(:system_updated, %{
+  character_id: main_character_id,
+  user_id: current_user.id,
+  map_id: map_id,
+  solar_system_id: "#{solar_system_id}" |> String.to_integer(),
+  key: key_atom,
+  value: value
+})
end
{:noreply, socket}

View File

@@ -74,6 +74,13 @@ defmodule WandererAppWeb.MapLive do
"You don't have main character set, please update it in tracking settings (top right icon)."
)}
def handle_info(:map_deleted, socket),
do:
{:noreply,
socket
|> put_flash(:info, "This map has been deleted.")
|> push_navigate(to: ~p"/maps")}
def handle_info(:no_access, socket),
do:
{:noreply,

View File

@@ -1,6 +1,8 @@
defmodule WandererAppWeb.MapsLive do
use WandererAppWeb, :live_view
alias Phoenix.LiveView.AsyncResult
require Logger
@pubsub_client Application.compile_env(:wanderer_app, :pubsub_client)
@@ -275,17 +277,57 @@ defmodule WandererAppWeb.MapsLive do
:telemetry.execute([:wanderer_app, :map, :created], %{count: 1})
maybe_create_default_acl(form, new_map)
+# Reload maps synchronously to avoid timing issues with flash messages
+{:ok, %{maps: maps}} = load_maps(current_user)
{:noreply,
socket
-|> assign_async(:maps, fn ->
-  load_maps(current_user)
-end)
|> put_flash(
:info,
"Map '#{new_map.name}' created successfully with slug '#{new_map.slug}'"
)
+|> assign(:maps, AsyncResult.ok(maps))
|> push_patch(to: ~p"/maps")}
{:error, %Ash.Error.Invalid{errors: errors}} ->
# Check for slug uniqueness constraint violation
slug_error =
Enum.find(errors, fn error ->
case error do
%{field: :slug} -> true
%{message: message} when is_binary(message) -> String.contains?(message, "unique")
_ -> false
end
end)
error_message =
if slug_error do
"A map with this name already exists. The system will automatically adjust the name if needed. Please try again."
else
errors
|> Enum.map(fn error ->
field = Map.get(error, :field, "field")
message = Map.get(error, :message, "validation error")
"#{field}: #{message}"
end)
|> Enum.join(", ")
end
Logger.warning("Map creation failed",
form: form,
errors: inspect(errors),
slug_error: slug_error != nil
)
{:noreply,
socket
|> put_flash(:error, "Failed to create map: #{error_message}")
|> assign(error: error_message)}
{:error, %{errors: errors}} ->
error_message =
errors
-|> Enum.map(fn %{field: _field} = error ->
+|> Enum.map(fn error ->
"#{Map.get(error, :message, "Field validation error")}"
end)
|> Enum.join(", ")
@@ -296,9 +338,14 @@ defmodule WandererAppWeb.MapsLive do
|> assign(error: error_message)}
{:error, error} ->
Logger.error("Unexpected error creating map",
form: form,
error: inspect(error)
)
{:noreply,
socket
-|> put_flash(:error, "Failed to create map")
+|> put_flash(:error, "Failed to create map. Please try again.")
|> assign(error: error)}
end
end
@@ -342,99 +389,156 @@ defmodule WandererAppWeb.MapsLive do
%{"form" => form} = _params,
%{assigns: %{map_slug: map_slug, current_user: current_user}} = socket
) do
-{:ok, map} =
-  map_slug
-  |> WandererApp.Api.Map.get_map_by_slug!()
-  |> Ash.load(:acls)
+case get_map_by_slug_safely(map_slug) do
+  {:ok, map} ->
+    # Successfully found the map, proceed with loading and updating
+    {:ok, map_with_acls} = Ash.load(map, :acls)
-scope =
-  form
-  |> Map.get("scope")
-  |> case do
-    "" -> "wormholes"
-    scope -> scope
-  end
+scope =
+  form
+  |> Map.get("scope")
+  |> case do
+    "" -> "wormholes"
+    scope -> scope
+  end
-form =
-  form
-  |> Map.put("acls", form["acls"] || [])
-  |> Map.put("scope", scope)
-  |> Map.put(
-    "only_tracked_characters",
-    (form["only_tracked_characters"] || "false") |> String.to_existing_atom()
-  )
+form =
+  form
+  |> Map.put("acls", form["acls"] || [])
+  |> Map.put("scope", scope)
+  |> Map.put(
+    "only_tracked_characters",
+    (form["only_tracked_characters"] || "false") |> String.to_existing_atom()
+  )
-map
-|> WandererApp.Api.Map.update(form)
-|> case do
-  {:ok, _updated_map} ->
-    {added_acls, removed_acls} = map.acls |> Enum.map(& &1.id) |> _get_acls_diff(form["acls"])
+map_with_acls
+|> WandererApp.Api.Map.update(form)
+|> case do
+  {:ok, _updated_map} ->
+    {added_acls, removed_acls} =
+      map_with_acls.acls |> Enum.map(& &1.id) |> _get_acls_diff(form["acls"])
-Phoenix.PubSub.broadcast(
-  WandererApp.PubSub,
-  "maps:#{map.id}",
-  {:map_acl_updated, map.id, added_acls, removed_acls}
-)
+Phoenix.PubSub.broadcast(
+  WandererApp.PubSub,
+  "maps:#{map_with_acls.id}",
+  {:map_acl_updated, map_with_acls.id, added_acls, removed_acls}
+)
-{:ok, tracked_characters} =
-  WandererApp.Maps.get_tracked_map_characters(map.id, current_user)
+{:ok, tracked_characters} =
+  WandererApp.Maps.get_tracked_map_characters(map_with_acls.id, current_user)
-first_tracked_character_id = Enum.map(tracked_characters, & &1.id) |> List.first()
+first_tracked_character_id = Enum.map(tracked_characters, & &1.id) |> List.first()
-added_acls
-|> Enum.each(fn acl_id ->
-  {:ok, _} =
-    WandererApp.User.ActivityTracker.track_map_event(:map_acl_added, %{
-      character_id: first_tracked_character_id,
-      user_id: current_user.id,
-      map_id: map.id,
-      acl_id: acl_id
-    })
-end)
+added_acls
+|> Enum.each(fn acl_id ->
+  WandererApp.User.ActivityTracker.track_map_event(:map_acl_added, %{
+    character_id: first_tracked_character_id,
+    user_id: current_user.id,
+    map_id: map_with_acls.id,
+    acl_id: acl_id
+  })
+end)
-removed_acls
-|> Enum.each(fn acl_id ->
-  {:ok, _} =
-    WandererApp.User.ActivityTracker.track_map_event(:map_acl_removed, %{
-      character_id: first_tracked_character_id,
-      user_id: current_user.id,
-      map_id: map.id,
-      acl_id: acl_id
-    })
-end)
+removed_acls
+|> Enum.each(fn acl_id ->
+  WandererApp.User.ActivityTracker.track_map_event(:map_acl_removed, %{
+    character_id: first_tracked_character_id,
+    user_id: current_user.id,
+    map_id: map_with_acls.id,
+    acl_id: acl_id
+  })
+end)
{:noreply,
socket
|> push_navigate(to: ~p"/maps")}
{:error, error} ->
{:noreply,
socket
|> put_flash(:error, "Failed to update map")
|> assign(error: error)}
end
{:error, :multiple_results} ->
{:noreply,
socket
|> put_flash(
:error,
"Multiple maps found with this identifier. Please contact support to resolve this issue."
)
|> push_navigate(to: ~p"/maps")}
-{:error, error} ->
-  {:noreply,
-   socket
-   |> put_flash(:error, "Failed to update map")
-   |> assign(error: error)}
+{:error, :not_found} ->
+  {:noreply,
+   socket
+   |> put_flash(:error, "Map not found")
+   |> push_navigate(to: ~p"/maps")}
{:error, _reason} ->
{:noreply,
socket
|> put_flash(:error, "Failed to load map. Please try again.")
|> push_navigate(to: ~p"/maps")}
end
end
def handle_event("delete", %{"data" => map_slug} = _params, socket) do
-map =
-  map_slug
-  |> WandererApp.Api.Map.get_map_by_slug!()
-  |> WandererApp.Api.Map.mark_as_deleted!()
+case get_map_by_slug_safely(map_slug) do
+  {:ok, map} ->
+    # Successfully found the map, proceed with deletion
+    deleted_map = WandererApp.Api.Map.mark_as_deleted!(map)
-Phoenix.PubSub.broadcast(
-  WandererApp.PubSub,
-  "maps:#{map.id}",
-  :map_deleted
-)
+Phoenix.PubSub.broadcast(
+  WandererApp.PubSub,
+  "maps:#{deleted_map.id}",
+  :map_deleted
+)
-current_user = socket.assigns.current_user
+current_user = socket.assigns.current_user
-{:noreply,
- socket
- |> assign_async(:maps, fn ->
-   load_maps(current_user)
- end)
- |> push_patch(to: ~p"/maps")}
+# Reload maps synchronously to avoid timing issues with flash messages
+{:ok, %{maps: maps}} = load_maps(current_user)
+{:noreply,
+ socket
+ |> assign(:maps, AsyncResult.ok(maps))
+ |> push_patch(to: ~p"/maps")}
{:error, :multiple_results} ->
# Multiple maps found with this slug - data integrity issue
# Reload maps synchronously
{:ok, %{maps: maps}} = load_maps(socket.assigns.current_user)
{:noreply,
socket
|> put_flash(
:error,
"Multiple maps found with this identifier. Please contact support to resolve this issue."
)
|> assign(:maps, AsyncResult.ok(maps))}
{:error, :not_found} ->
# Map not found
# Reload maps synchronously
{:ok, %{maps: maps}} = load_maps(socket.assigns.current_user)
{:noreply,
socket
|> put_flash(:error, "Map not found or already deleted")
|> assign(:maps, AsyncResult.ok(maps))
|> push_patch(to: ~p"/maps")}
{:error, _reason} ->
# Other error
# Reload maps synchronously
{:ok, %{maps: maps}} = load_maps(socket.assigns.current_user)
{:noreply,
socket
|> put_flash(:error, "Failed to delete map. Please try again.")
|> assign(:maps, AsyncResult.ok(maps))}
end
end
def handle_event(
@@ -683,4 +787,49 @@ defmodule WandererAppWeb.MapsLive do
map
|> Map.put(:acls, acls |> Enum.map(&map_acl/1))
end
@doc """
Safely retrieves a map by slug, handling the case where multiple maps
with the same slug exist (database integrity issue).
Returns:
- `{:ok, map}` - Single map found
- `{:error, :multiple_results}` - Multiple maps found (logs error)
- `{:error, :not_found}` - No map found
- `{:error, reason}` - Other error
"""
defp get_map_by_slug_safely(slug) do
try do
map = WandererApp.Api.Map.get_map_by_slug!(slug)
{:ok, map}
rescue
error in Ash.Error.Invalid.MultipleResults ->
Logger.error("Multiple maps found with slug '#{slug}' - database integrity issue",
slug: slug,
error: inspect(error)
)
# Emit telemetry for monitoring
:telemetry.execute(
[:wanderer_app, :map, :duplicate_slug_detected],
%{count: 1},
%{slug: slug, operation: :get_by_slug}
)
# Return error - caller should handle this appropriately
{:error, :multiple_results}
error in Ash.Error.Query.NotFound ->
Logger.debug("Map not found with slug: #{slug}")
{:error, :not_found}
error ->
Logger.error("Error retrieving map by slug",
slug: slug,
error: inspect(error)
)
{:error, :unknown_error}
end
end
end

View File

@@ -3,7 +3,7 @@ defmodule WandererApp.MixProject do
@source_url "https://github.com/wanderer-industries/wanderer"
-@version "1.84.13"
+@version "1.84.23"
def project do
[

View File

@@ -0,0 +1,212 @@
defmodule WandererApp.Repo.Migrations.EnsureNoDuplicateMapSlugs do
@moduledoc """
Final migration to ensure all duplicate map slugs are removed and unique index exists.
This migration:
1. Checks for any remaining duplicate slugs
2. Fixes duplicates by renaming them (keeps oldest, renames newer ones)
3. Ensures unique index exists on maps_v1.slug
4. Verifies no duplicates remain after migration
Safe to run multiple times (idempotent).
"""
use Ecto.Migration
import Ecto.Query
require Logger
def up do
IO.puts("\n=== Starting Map Slug Deduplication Migration ===\n")
# Step 1: Check for duplicates
duplicate_count = count_duplicates()
if duplicate_count > 0 do
IO.puts("Found #{duplicate_count} duplicate slug(s) - proceeding with cleanup...")
# Step 2: Drop index temporarily if it exists (to allow updates)
drop_index_if_exists()
# Step 3: Fix all duplicates
fix_duplicate_slugs()
# Step 4: Recreate unique index
ensure_unique_index()
else
IO.puts("No duplicate slugs found - ensuring unique index exists...")
ensure_unique_index()
end
# Step 5: Final verification
verify_no_duplicates()
IO.puts("\n=== Migration completed successfully! ===\n")
end
def down do
# This migration is idempotent and only fixes data integrity issues
# No need to revert as it doesn't change schema in a harmful way
IO.puts("This migration does not need to be reverted")
:ok
end
defp count_duplicates do
duplicates_query = """
SELECT COUNT(*) as duplicate_count
FROM (
SELECT slug
FROM maps_v1
WHERE deleted = false
GROUP BY slug
HAVING COUNT(*) > 1
) duplicates
"""
case repo().query(duplicates_query, []) do
{:ok, %{rows: [[count]]}} ->
count
{:error, error} ->
IO.puts("Error counting duplicates: #{inspect(error)}")
0
end
end
defp drop_index_if_exists do
index_exists_query = """
SELECT EXISTS (
SELECT 1
FROM pg_indexes
WHERE tablename = 'maps_v1'
AND indexname = 'maps_v1_unique_slug_index'
)
"""
case repo().query(index_exists_query, []) do
{:ok, %{rows: [[true]]}} ->
IO.puts("Temporarily dropping unique index to allow updates...")
execute("DROP INDEX IF EXISTS maps_v1_unique_slug_index")
IO.puts("✓ Index dropped")
{:ok, %{rows: [[false]]}} ->
IO.puts("No existing index to drop")
{:error, error} ->
IO.puts("Error checking index: #{inspect(error)}")
end
end
defp fix_duplicate_slugs do
# Get all duplicate slugs with their IDs and timestamps
# Order by inserted_at to keep the oldest one unchanged
duplicates_query = """
SELECT
slug,
array_agg(id::text ORDER BY inserted_at ASC, id ASC) as ids,
array_agg(name ORDER BY inserted_at ASC, id ASC) as names
FROM maps_v1
WHERE deleted = false
GROUP BY slug
HAVING COUNT(*) > 1
ORDER BY slug
"""
case repo().query(duplicates_query, []) do
{:ok, %{rows: rows}} when length(rows) > 0 ->
IO.puts("\nFixing #{length(rows)} duplicate slug(s)...")
Enum.each(rows, fn [slug, ids, names] ->
IO.puts("\n Processing: '#{slug}' (#{length(ids)} duplicates)")
# Keep the first one (oldest by inserted_at), rename the rest
[keep_id | rename_ids] = ids
[keep_name | rename_names] = names
IO.puts(" ✓ Keeping: #{keep_id} - '#{keep_name}'")
# Rename duplicates
rename_ids
|> Enum.zip(rename_names)
|> Enum.with_index(2)
|> Enum.each(fn {{id_string, name}, n} ->
new_slug = generate_unique_slug(slug, n)
# Use parameterized query for safety
update_query = "UPDATE maps_v1 SET slug = $1 WHERE id::text = $2"
repo().query!(update_query, [new_slug, id_string])
IO.puts(" → Renamed: #{id_string} - '#{name}' to slug '#{new_slug}'")
end)
end)
IO.puts("\n✓ All duplicate slugs fixed!")
{:ok, %{rows: []}} ->
IO.puts("No duplicate slugs to fix")
{:error, error} ->
IO.puts("Error finding duplicates: #{inspect(error)}")
raise "Failed to query duplicate slugs: #{inspect(error)}"
end
end
defp generate_unique_slug(base_slug, n) when n >= 2 do
candidate = "#{base_slug}-#{n}"
# Check if this slug already exists
check_query = "SELECT COUNT(*) FROM maps_v1 WHERE slug = $1 AND deleted = false"
case repo().query!(check_query, [candidate]) do
%{rows: [[0]]} ->
candidate
%{rows: [[_count]]} ->
# Try next number
generate_unique_slug(base_slug, n + 1)
end
end
defp ensure_unique_index do
# Check if index exists
index_exists_query = """
SELECT EXISTS (
SELECT 1
FROM pg_indexes
WHERE tablename = 'maps_v1'
AND indexname = 'maps_v1_unique_slug_index'
)
"""
case repo().query(index_exists_query, []) do
{:ok, %{rows: [[true]]}} ->
IO.puts("✓ Unique index on slug already exists")
{:ok, %{rows: [[false]]}} ->
IO.puts("Creating unique index on slug column...")
create_if_not_exists(
index(:maps_v1, [:slug],
unique: true,
name: :maps_v1_unique_slug_index,
where: "deleted = false"
)
)
IO.puts("✓ Unique index created successfully!")
{:error, error} ->
IO.puts("Error checking index: #{inspect(error)}")
raise "Failed to check index existence: #{inspect(error)}"
end
end
defp verify_no_duplicates do
IO.puts("\nVerifying no duplicates remain...")
remaining_duplicates = count_duplicates()
if remaining_duplicates > 0 do
IO.puts("❌ ERROR: #{remaining_duplicates} duplicate(s) still exist!")
raise "Migration failed: duplicates still exist after cleanup"
else
IO.puts("✓ Verification passed: No duplicates found")
end
end
end
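The recursion in `generate_unique_slug/2` can be illustrated as a standalone sketch, where `taken` stands in for the migration's parameterized `SELECT COUNT(*)` existence check (the module name `SlugSuffix` is illustrative):

```elixir
# Standalone sketch of the suffixing strategy in generate_unique_slug/2:
# try "#{base}-2", then "#{base}-3", ... until a candidate slug is free.
# `taken` replaces the migration's database existence check.
defmodule SlugSuffix do
  def next(base, taken, n \\ 2) when n >= 2 do
    candidate = "#{base}-#{n}"
    if candidate in taken, do: next(base, taken, n + 1), else: candidate
  end
end

# SlugSuffix.next("home", ["home", "home-2"]) yields "home-3"
```

Because the oldest row keeps its original slug and every renamed duplicate re-checks its candidate, re-running the migration finds no remaining duplicates, which is what makes it idempotent.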

View File

@@ -0,0 +1,463 @@
defmodule WandererApp.Map.MapPoolCrashIntegrationTest do
@moduledoc """
Integration tests for MapPool crash recovery.
These tests verify end-to-end crash recovery behavior including:
- MapPool GenServer crashes and restarts
- State recovery from ETS
- Registry and cache consistency after recovery
- Telemetry events during recovery
- Multi-pool scenarios
Note: Many tests are skipped as they require full map infrastructure
(database, Server.Impl, map data, etc.) to be set up.
"""
use ExUnit.Case, async: false
alias WandererApp.Map.{MapPool, MapPoolDynamicSupervisor, MapPoolState}
@cache :map_pool_cache
@registry :map_pool_registry
@unique_registry :unique_map_pool_registry
@ets_table :map_pool_state_table
setup do
# Clean up any existing test data
cleanup_test_data()
# Check if required infrastructure is running
supervisor_running? = Process.whereis(MapPoolDynamicSupervisor) != nil
ets_exists? =
try do
:ets.info(@ets_table) != :undefined
rescue
_ -> false
end
on_exit(fn ->
cleanup_test_data()
end)
{:ok, supervisor_running: supervisor_running?, ets_exists: ets_exists?}
end
defp cleanup_test_data do
# Clean up test caches
WandererApp.Cache.delete("started_maps")
Cachex.clear(@cache)
# Clean up ETS entries
if :ets.whereis(@ets_table) != :undefined do
:ets.match_delete(@ets_table, {:"$1", :"$2", :"$3"})
end
end
defp find_pool_pid(uuid) do
pool_name = Module.concat(MapPool, uuid)
case Registry.lookup(@unique_registry, pool_name) do
[{pid, _value}] -> {:ok, pid}
[] -> {:error, :not_found}
end
end
describe "End-to-end crash recovery" do
@tag :skip
@tag :integration
test "MapPool recovers all maps after abnormal crash" do
# This test would:
# 1. Start a MapPool with test maps via MapPoolDynamicSupervisor
# 2. Verify maps are running and state is in ETS
# 3. Simulate crash using GenServer.call(pool_pid, :error)
# 4. Wait for supervisor to restart the pool
# 5. Verify all maps are recovered
# 6. Verify Registry, Cache, and ETS are consistent
# Requires:
# - Test map data in database
# - Server.Impl.start_map to work with test data
# - Full supervision tree running
:ok
end
@tag :skip
@tag :integration
test "MapPool preserves ETS state on abnormal termination" do
# This test would:
# 1. Start a MapPool with maps
# 2. Force crash
# 3. Verify ETS state is preserved (not deleted)
# 4. Verify new pool instance recovers from ETS
:ok
end
@tag :skip
@tag :integration
test "MapPool cleans ETS state on graceful shutdown" do
# This test would:
# 1. Start a MapPool with maps
# 2. Gracefully stop the pool (GenServer.cast(pool_pid, :stop))
# 3. Verify ETS state is deleted
# 4. Verify new pool starts with empty state
:ok
end
end
describe "Multi-pool crash scenarios" do
@tag :skip
@tag :integration
test "multiple pools crash and recover independently" do
# This test would:
# 1. Start multiple MapPool instances with different maps
# 2. Crash one pool
# 3. Verify only that pool recovers, others unaffected
# 4. Verify no cross-pool state corruption
:ok
end
@tag :skip
@tag :integration
test "concurrent pool crashes don't corrupt recovery state" do
# This test would:
# 1. Start multiple pools
# 2. Crash multiple pools simultaneously
# 3. Verify all pools recover correctly
# 4. Verify no ETS corruption or race conditions
:ok
end
end
describe "State consistency after recovery" do
@tag :skip
@tag :integration
test "Registry state matches recovered state" do
# This test would verify that after recovery:
# - unique_registry has correct map_ids for pool UUID
# - map_pool_registry has correct pool UUID entry
# - All map_ids in Registry match ETS state
:ok
end
@tag :skip
@tag :integration
test "Cache state matches recovered state" do
# This test would verify that after recovery:
# - map_pool_cache has correct map_id -> uuid mappings
# - started_maps cache includes all recovered maps
# - No orphaned cache entries
:ok
end
@tag :skip
@tag :integration
test "Map servers are actually running after recovery" do
# This test would:
# 1. Recover maps from crash
# 2. Verify each map's GenServer is actually running
# 3. Verify maps respond to requests
# 4. Verify map state is correct
:ok
end
end
describe "Recovery failure handling" do
@tag :skip
@tag :integration
test "recovery continues when individual map fails to start" do
# This test would:
# 1. Save state with maps [1, 2, 3] to ETS
# 2. Delete map 2 from database
# 3. Trigger recovery
# 4. Verify maps 1 and 3 recover successfully
# 5. Verify map 2 failure is logged and telemetry emitted
# 6. Verify pool continues with maps [1, 3]
:ok
end
@tag :skip
@tag :integration
test "recovery handles maps already running in different pool" do
# This test would simulate a race condition where:
# 1. Pool A crashes with map X
# 2. Before recovery, map X is started in Pool B
# 3. Pool A tries to recover map X
# 4. Verify conflict is detected and handled gracefully
:ok
end
@tag :skip
@tag :integration
test "recovery handles corrupted ETS state" do
# This test would:
# 1. Manually corrupt ETS state (invalid map IDs, wrong types, etc.)
# 2. Trigger recovery
# 3. Verify pool handles corruption gracefully
# 4. Verify telemetry emitted for failures
# 5. Verify pool continues with valid maps only
:ok
end
end
describe "Telemetry during recovery" do
test "telemetry events emitted in correct order", %{ets_exists: ets_exists?} do
if ets_exists? do
test_pid = self()
events = []
# Attach handlers for all recovery events
:telemetry.attach_many(
"test-recovery-events",
[
[:wanderer_app, :map_pool, :recovery, :start],
[:wanderer_app, :map_pool, :recovery, :complete],
[:wanderer_app, :map_pool, :recovery, :map_failed]
],
fn event, measurements, metadata, _config ->
send(test_pid, {:telemetry_event, event, measurements, metadata})
end,
nil
)
uuid = "test-pool-#{:rand.uniform(1_000_000)}"
# Simulate recovery sequence
# 1. Start event
:telemetry.execute(
[:wanderer_app, :map_pool, :recovery, :start],
%{recovered_map_count: 3, total_map_count: 3},
%{pool_uuid: uuid}
)
# 2. Complete event (in real recovery, this comes after all maps start)
:telemetry.execute(
[:wanderer_app, :map_pool, :recovery, :complete],
%{recovered_count: 3, failed_count: 0, duration_ms: 100},
%{pool_uuid: uuid}
)
# Verify we received both events
assert_receive {:telemetry_event, [:wanderer_app, :map_pool, :recovery, :start], _, _},
500
assert_receive {:telemetry_event, [:wanderer_app, :map_pool, :recovery, :complete], _, _},
500
:telemetry.detach("test-recovery-events")
else
:ok
end
end
@tag :skip
@tag :integration
test "telemetry includes accurate recovery statistics" do
# This test would verify that:
# - recovered_map_count matches actual recovered maps
# - failed_count matches actual failed maps
# - duration_ms is accurate
# - All metadata is correct
:ok
end
end
describe "Interaction with Reconciler" do
@tag :skip
@tag :integration
test "Reconciler doesn't interfere with crash recovery" do
# This test would:
# 1. Crash a pool with maps
# 2. Trigger both recovery and reconciliation
# 3. Verify they don't conflict
# 4. Verify final state is consistent
:ok
end
@tag :skip
@tag :integration
test "Reconciler detects failed recovery" do
# This test would:
# 1. Crash a pool with map X
# 2. Make recovery fail for map X
# 3. Run reconciler
# 4. Verify reconciler detects and potentially fixes the issue
:ok
end
end
describe "Edge cases" do
@tag :skip
@tag :integration
test "recovery during pool at capacity" do
# This test would:
# 1. Create pool with 19 maps
# 2. Crash pool while adding 20th map
# 3. Verify recovery handles capacity limit
# 4. Verify all maps start or overflow is handled
:ok
end
@tag :skip
@tag :integration
test "recovery with empty map list" do
# This test would:
# 1. Crash pool with empty map_ids
# 2. Verify recovery completes successfully
# 3. Verify pool starts with no maps
:ok
end
@tag :skip
@tag :integration
test "multiple crashes in quick succession" do
# This test would:
# 1. Crash pool
# 2. Immediately crash again during recovery
# 3. Verify supervisor's max_restarts is respected
# 4. Verify state remains consistent
:ok
end
end
describe "Performance under load" do
@tag :slow
@tag :skip
@tag :integration
test "recovery completes within 2 seconds for 20 maps" do
# This test would:
# 1. Create pool with 20 maps (pool limit)
# 2. Crash pool
# 3. Measure time to full recovery
# 4. Assert recovery < 2 seconds
:ok
end
@tag :slow
@tag :skip
@tag :integration
test "recovery doesn't block other pools" do
# This test would:
# 1. Start multiple pools
# 2. Crash one pool with many maps
# 3. Verify other pools continue to operate normally during recovery
# 4. Measure performance impact on healthy pools
:ok
end
end
describe "Supervisor interaction" do
test "ETS table survives individual pool crash", %{ets_exists: ets_exists?} do
if ets_exists? do
# Verify ETS table is owned by supervisor, not individual pools
table_info = :ets.info(@ets_table)
owner_pid = Keyword.get(table_info, :owner)
# Owner should be alive and be the supervisor or a system process
assert Process.alive?(owner_pid)
# Verify we can still access the table
uuid = "test-pool-#{:rand.uniform(1_000_000)}"
MapPoolState.save_pool_state(uuid, [1, 2, 3])
assert {:ok, [1, 2, 3]} = MapPoolState.get_pool_state(uuid)
else
:ok
end
end
@tag :skip
@tag :integration
test "supervisor restarts pool after crash" do
# This test would:
# 1. Start a pool via DynamicSupervisor
# 2. Crash the pool
# 3. Verify supervisor restarts it
# 4. Verify new PID is different from old PID
# 5. Verify pool is functional after restart
:ok
end
end
describe "Database consistency" do
@tag :skip
@tag :integration
test "recovered maps load latest state from database" do
# This test would:
# 1. Start maps with initial state
# 2. Modify map state in database
# 3. Crash pool
# 4. Verify recovered maps have latest database state
:ok
end
@tag :skip
@tag :integration
test "recovery uses MapState for map configuration" do
# This test would:
# 1. Verify recovery calls WandererApp.Map.get_map_state!/1
# 2. Verify state comes from database MapState table
# 3. Verify maps start with correct configuration
:ok
end
end
describe "Real-world scenarios" do
@tag :skip
@tag :integration
test "recovery after OOM crash" do
# This test would simulate recovery after out-of-memory crash:
# 1. Start pool with maps
# 2. Simulate OOM condition
# 3. Verify recovery completes successfully
# 4. Verify no memory leaks after recovery
:ok
end
@tag :skip
@tag :integration
test "recovery after network partition" do
# This test would simulate recovery after network issues:
# 1. Start maps with external dependencies
# 2. Simulate network partition
# 3. Crash pool
# 4. Verify recovery handles network errors gracefully
:ok
end
@tag :skip
@tag :integration
test "recovery preserves user sessions" do
# This test would:
# 1. Start maps with active user sessions
# 2. Crash pool
# 3. Verify users can continue after recovery
# 4. Verify presence tracking works after recovery
:ok
end
end
end
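The `{uuid, map_ids, timestamp}` row shape these tests assume for `:map_pool_state_table` can be exercised directly against a throwaway ETS table (the table name `:pool_state_demo` is illustrative):

```elixir
# Throwaway ETS table using the same {uuid, map_ids, timestamp} row shape
# the MapPoolState tests above assume. :pool_state_demo is an example name.
table = :ets.new(:pool_state_demo, [:set, :public])
:ets.insert(table, {"pool-a", [1, 2, 3], System.system_time(:second)})

# Lookup by uuid returns at most one row in a :set table.
[{"pool-a", map_ids, _timestamp}] = :ets.lookup(table, "pool-a")
# map_ids is [1, 2, 3]; an unknown uuid yields []
```

Because the table is owned by the supervisor rather than an individual pool process, rows like this survive a pool crash, which is what the recovery tests rely on.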

View File

@@ -410,7 +410,7 @@ defmodule WandererApp.Map.CacheRTreeTest do
# Check many positions for availability (simulating auto-positioning)
test_positions = for x <- 0..20, y <- 0..20, do: {x * 100, y * 50}
-for {x, y} do
+for {x, y} <- test_positions do
box = [{x, x + 130}, {y, y + 34}]
{:ok, _ids} = CacheRTree.query(box, name)
# Not asserting anything, just verifying queries work

View File

@@ -0,0 +1,561 @@
defmodule WandererApp.Map.MapPoolCrashRecoveryTest do
use ExUnit.Case, async: false
alias WandererApp.Map.{MapPool, MapPoolState}
@cache :map_pool_cache
@registry :map_pool_registry
@unique_registry :unique_map_pool_registry
@ets_table :map_pool_state_table
setup do
# Clean up any existing test data
cleanup_test_data()
# Check if ETS table exists
ets_exists? =
try do
:ets.info(@ets_table) != :undefined
rescue
_ -> false
end
on_exit(fn ->
cleanup_test_data()
end)
{:ok, ets_exists: ets_exists?}
end
defp cleanup_test_data do
# Clean up test caches
WandererApp.Cache.delete("started_maps")
Cachex.clear(@cache)
# Clean up ETS entries for test pools
if :ets.whereis(@ets_table) != :undefined do
:ets.match_delete(@ets_table, {:"$1", :"$2", :"$3"})
end
end
defp create_test_pool_with_uuid(uuid, map_ids) do
# Manually register in unique_registry
{:ok, _} = Registry.register(@unique_registry, Module.concat(MapPool, uuid), map_ids)
{:ok, _} = Registry.register(@registry, MapPool, uuid)
# Add to cache
Enum.each(map_ids, fn map_id ->
Cachex.put(@cache, map_id, uuid)
end)
# Save to ETS
MapPoolState.save_pool_state(uuid, map_ids)
uuid
end
defp get_pool_map_ids(uuid) do
case Registry.lookup(@unique_registry, Module.concat(MapPool, uuid)) do
[{_pid, map_ids}] -> map_ids
[] -> []
end
end
describe "MapPoolState - ETS operations" do
test "save_pool_state stores state in ETS", %{ets_exists: ets_exists?} do
if ets_exists? do
uuid = "test-pool-#{:rand.uniform(1_000_000)}"
map_ids = [1, 2, 3]
assert :ok = MapPoolState.save_pool_state(uuid, map_ids)
# Verify it's in ETS
assert {:ok, ^map_ids} = MapPoolState.get_pool_state(uuid)
else
:ok
end
end
test "get_pool_state returns not_found for non-existent pool", %{ets_exists: ets_exists?} do
if ets_exists? do
uuid = "non-existent-#{:rand.uniform(1_000_000)}"
assert {:error, :not_found} = MapPoolState.get_pool_state(uuid)
else
:ok
end
end
test "delete_pool_state removes state from ETS", %{ets_exists: ets_exists?} do
if ets_exists? do
uuid = "test-pool-#{:rand.uniform(1_000_000)}"
map_ids = [1, 2, 3]
MapPoolState.save_pool_state(uuid, map_ids)
assert {:ok, ^map_ids} = MapPoolState.get_pool_state(uuid)
assert :ok = MapPoolState.delete_pool_state(uuid)
assert {:error, :not_found} = MapPoolState.get_pool_state(uuid)
else
:ok
end
end
test "save_pool_state updates existing state", %{ets_exists: ets_exists?} do
if ets_exists? do
uuid = "test-pool-#{:rand.uniform(1_000_000)}"
# Save initial state
MapPoolState.save_pool_state(uuid, [1, 2])
assert {:ok, [1, 2]} = MapPoolState.get_pool_state(uuid)
# Update state
MapPoolState.save_pool_state(uuid, [1, 2, 3, 4])
assert {:ok, [1, 2, 3, 4]} = MapPoolState.get_pool_state(uuid)
else
:ok
end
end
test "list_all_states returns all pool states", %{ets_exists: ets_exists?} do
if ets_exists? do
# Clean first
:ets.delete_all_objects(@ets_table)
uuid1 = "test-pool-1-#{:rand.uniform(1_000_000)}"
uuid2 = "test-pool-2-#{:rand.uniform(1_000_000)}"
MapPoolState.save_pool_state(uuid1, [1, 2])
MapPoolState.save_pool_state(uuid2, [3, 4])
states = MapPoolState.list_all_states()
assert length(states) >= 2
# Verify our pools are in there
uuids = Enum.map(states, fn {uuid, _map_ids, _timestamp} -> uuid end)
assert uuid1 in uuids
assert uuid2 in uuids
else
:ok
end
end
test "count_states returns correct count", %{ets_exists: ets_exists?} do
if ets_exists? do
# Clean first
:ets.delete_all_objects(@ets_table)
uuid1 = "test-pool-1-#{:rand.uniform(1_000_000)}"
uuid2 = "test-pool-2-#{:rand.uniform(1_000_000)}"
MapPoolState.save_pool_state(uuid1, [1, 2])
MapPoolState.save_pool_state(uuid2, [3, 4])
count = MapPoolState.count_states()
assert count >= 2
else
:ok
end
end
end
describe "MapPoolState - stale entry cleanup" do
test "cleanup_stale_entries removes old entries", %{ets_exists: ets_exists?} do
if ets_exists? do
uuid = "stale-pool-#{:rand.uniform(1_000_000)}"
# Manually insert a stale entry (24+ hours old)
stale_timestamp = System.system_time(:second) - 25 * 3600
:ets.insert(@ets_table, {uuid, [1, 2], stale_timestamp})
assert {:ok, [1, 2]} = MapPoolState.get_pool_state(uuid)
# Clean up stale entries
{:ok, deleted_count} = MapPoolState.cleanup_stale_entries()
assert deleted_count >= 1
# Verify stale entry was removed
assert {:error, :not_found} = MapPoolState.get_pool_state(uuid)
else
:ok
end
end
test "cleanup_stale_entries preserves recent entries", %{ets_exists: ets_exists?} do
if ets_exists? do
uuid = "recent-pool-#{:rand.uniform(1_000_000)}"
map_ids = [1, 2, 3]
# Save recent entry
MapPoolState.save_pool_state(uuid, map_ids)
# Clean up
MapPoolState.cleanup_stale_entries()
# Recent entry should still exist
assert {:ok, ^map_ids} = MapPoolState.get_pool_state(uuid)
else
:ok
end
end
end
describe "Crash recovery - basic scenarios" do
@tag :skip
test "MapPool recovers single map after crash" do
# This test requires a full MapPool GenServer with actual map data
# Skipping as it needs integration with Server.Impl.start_map
:ok
end
@tag :skip
test "MapPool recovers multiple maps after crash" do
# Similar to above - requires full integration
:ok
end
@tag :skip
test "MapPool merges new and recovered map_ids" do
# Tests that if pool crashes while starting a new map,
# both the new map and recovered maps are started
:ok
end
end
describe "Crash recovery - telemetry" do
test "recovery emits start telemetry event", %{ets_exists: ets_exists?} do
if ets_exists? do
test_pid = self()
# Attach telemetry handler
:telemetry.attach(
"test-recovery-start",
[:wanderer_app, :map_pool, :recovery, :start],
fn _event, measurements, metadata, _config ->
send(test_pid, {:telemetry_start, measurements, metadata})
end,
nil
)
uuid = "test-pool-#{:rand.uniform(1_000_000)}"
recovered_maps = [1, 2, 3]
# Save state to ETS (simulating previous run)
MapPoolState.save_pool_state(uuid, recovered_maps)
# Simulate init with recovery
# Note: Can't actually start a MapPool here without full integration,
# but we can verify the telemetry handler is set up correctly
# Manually emit the event to test handler
:telemetry.execute(
[:wanderer_app, :map_pool, :recovery, :start],
%{recovered_map_count: 3, total_map_count: 3},
%{pool_uuid: uuid}
)
assert_receive {:telemetry_start, measurements, metadata}, 500
assert measurements.recovered_map_count == 3
assert measurements.total_map_count == 3
assert metadata.pool_uuid == uuid
# Cleanup
:telemetry.detach("test-recovery-start")
else
:ok
end
end
test "recovery emits complete telemetry event", %{ets_exists: ets_exists?} do
if ets_exists? do
test_pid = self()
:telemetry.attach(
"test-recovery-complete",
[:wanderer_app, :map_pool, :recovery, :complete],
fn _event, measurements, metadata, _config ->
send(test_pid, {:telemetry_complete, measurements, metadata})
end,
nil
)
uuid = "test-pool-#{:rand.uniform(1_000_000)}"
# Manually emit the event
:telemetry.execute(
[:wanderer_app, :map_pool, :recovery, :complete],
%{recovered_count: 3, failed_count: 0, duration_ms: 100},
%{pool_uuid: uuid}
)
assert_receive {:telemetry_complete, measurements, metadata}, 500
assert measurements.recovered_count == 3
assert measurements.failed_count == 0
assert measurements.duration_ms == 100
assert metadata.pool_uuid == uuid
:telemetry.detach("test-recovery-complete")
else
:ok
end
end
test "recovery emits map_failed telemetry event", %{ets_exists: ets_exists?} do
if ets_exists? do
test_pid = self()
:telemetry.attach(
"test-recovery-map-failed",
[:wanderer_app, :map_pool, :recovery, :map_failed],
fn _event, measurements, metadata, _config ->
send(test_pid, {:telemetry_map_failed, measurements, metadata})
end,
nil
)
uuid = "test-pool-#{:rand.uniform(1_000_000)}"
failed_map_id = 123
# Manually emit the event
:telemetry.execute(
[:wanderer_app, :map_pool, :recovery, :map_failed],
%{map_id: failed_map_id},
%{pool_uuid: uuid, reason: "Map not found"}
)
assert_receive {:telemetry_map_failed, measurements, metadata}, 500
assert measurements.map_id == failed_map_id
assert metadata.pool_uuid == uuid
assert metadata.reason == "Map not found"
:telemetry.detach("test-recovery-map-failed")
else
:ok
end
end
end
describe "Crash recovery - state persistence" do
@tag :skip
test "state persisted after successful map start" do
# Would need to start actual MapPool and trigger start_map
:ok
end
@tag :skip
test "state persisted after successful map stop" do
# Would need to start actual MapPool and trigger stop_map
:ok
end
@tag :skip
test "state persisted during backup_state" do
# Would need to trigger backup_state handler
:ok
end
end
describe "Graceful shutdown cleanup" do
test "ETS state cleaned on normal termination", %{ets_exists: ets_exists?} do
if ets_exists? do
uuid = "test-pool-#{:rand.uniform(1_000_000)}"
map_ids = [1, 2, 3]
# Save state
MapPoolState.save_pool_state(uuid, map_ids)
assert {:ok, ^map_ids} = MapPoolState.get_pool_state(uuid)
# Simulate graceful shutdown by calling delete
MapPoolState.delete_pool_state(uuid)
# State should be gone
assert {:error, :not_found} = MapPoolState.get_pool_state(uuid)
else
:ok
end
end
@tag :skip
test "ETS state preserved on abnormal termination" do
# Would need to actually crash a MapPool to test this
# The terminate callback would not call delete_pool_state
:ok
end
end
describe "Edge cases" do
test "recovery with empty map_ids list", %{ets_exists: ets_exists?} do
if ets_exists? do
uuid = "test-pool-#{:rand.uniform(1_000_000)}"
# Save empty state
MapPoolState.save_pool_state(uuid, [])
assert {:ok, []} = MapPoolState.get_pool_state(uuid)
else
:ok
end
end
test "recovery with duplicate map_ids gets deduplicated", %{ets_exists: ets_exists?} do
if ets_exists? do
# This tests the deduplication logic in init
# If we have [1, 2] in ETS and [2, 3] in new map_ids,
# result should be [1, 2, 3] after Enum.uniq
recovered_maps = [1, 2]
new_maps = [2, 3]
expected = Enum.uniq(recovered_maps ++ new_maps)
# Enum.uniq preserves first-occurrence order, so this is exactly [1, 2, 3]
assert expected == [1, 2, 3]
else
:ok
end
end
test "large number of maps in recovery", %{ets_exists: ets_exists?} do
if ets_exists? do
uuid = "test-pool-#{:rand.uniform(1_000_000)}"
# Test with 20 maps (the pool limit)
map_ids = Enum.to_list(1..20)
MapPoolState.save_pool_state(uuid, map_ids)
assert {:ok, recovered} = MapPoolState.get_pool_state(uuid)
assert length(recovered) == 20
assert recovered == map_ids
else
:ok
end
end
end
describe "Concurrent operations" do
test "multiple pools can save state concurrently", %{ets_exists: ets_exists?} do
if ets_exists? do
# Create 10 pools concurrently
tasks =
1..10
|> Enum.map(fn i ->
Task.async(fn ->
uuid = "concurrent-pool-#{i}-#{:rand.uniform(1_000_000)}"
map_ids = [i * 10, i * 10 + 1]
MapPoolState.save_pool_state(uuid, map_ids)
{uuid, map_ids}
end)
end)
results = Task.await_many(tasks, 5000)
# Verify all pools saved successfully
Enum.each(results, fn {uuid, expected_map_ids} ->
assert {:ok, ^expected_map_ids} = MapPoolState.get_pool_state(uuid)
end)
else
:ok
end
end
test "concurrent reads and writes don't corrupt state", %{ets_exists: ets_exists?} do
if ets_exists? do
uuid = "test-pool-#{:rand.uniform(1_000_000)}"
MapPoolState.save_pool_state(uuid, [1, 2, 3])
# Spawn multiple readers and writers
readers =
1..5
|> Enum.map(fn _ ->
Task.async(fn ->
MapPoolState.get_pool_state(uuid)
end)
end)
writers =
1..5
|> Enum.map(fn i ->
Task.async(fn ->
MapPoolState.save_pool_state(uuid, [i, i + 1])
end)
end)
# All operations should complete without error
reader_results = Task.await_many(readers, 5000)
writer_results = Task.await_many(writers, 5000)
assert Enum.all?(reader_results, fn
{:ok, _} -> true
_ -> false
end)
# Use a catch-all comparison so a bad result fails the assertion
# instead of raising FunctionClauseError inside the anonymous function
assert Enum.all?(writer_results, &(&1 == :ok))
# Final state should be valid (one of the writer's values)
assert {:ok, final_state} = MapPoolState.get_pool_state(uuid)
assert is_list(final_state)
assert length(final_state) == 2
else
:ok
end
end
end
describe "Performance" do
@tag :slow
test "recovery completes within acceptable time", %{ets_exists: ets_exists?} do
if ets_exists? do
uuid = "perf-pool-#{:rand.uniform(1_000_000)}"
# Test with pool at limit (20 maps)
map_ids = Enum.to_list(1..20)
# Measure save time
{save_time_us, :ok} = :timer.tc(fn ->
MapPoolState.save_pool_state(uuid, map_ids)
end)
# Measure retrieval time
{get_time_us, {:ok, _}} = :timer.tc(fn ->
MapPoolState.get_pool_state(uuid)
end)
# Both operations should be very fast (< 1ms)
assert save_time_us < 1000, "Save took #{save_time_us}µs, expected < 1000µs"
assert get_time_us < 1000, "Get took #{get_time_us}µs, expected < 1000µs"
else
:ok
end
end
@tag :slow
test "cleanup performance with many stale entries", %{ets_exists: ets_exists?} do
if ets_exists? do
# Insert 100 stale entries
stale_timestamp = System.system_time(:second) - 25 * 3600
1..100
|> Enum.each(fn i ->
uuid = "stale-pool-#{i}"
:ets.insert(@ets_table, {uuid, [i], stale_timestamp})
end)
# Measure cleanup time
{cleanup_time_us, {:ok, deleted_count}} = :timer.tc(fn ->
MapPoolState.cleanup_stale_entries()
end)
# Should have deleted at least 100 entries
assert deleted_count >= 100
# Cleanup should be reasonably fast (< 100ms for 100 entries)
assert cleanup_time_us < 100_000,
"Cleanup took #{cleanup_time_us}µs, expected < 100,000µs"
else
:ok
end
end
end
end
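The `MapPoolState` API exercised by the tests above can be sketched as a thin wrapper around a public named ETS table storing `{pool_uuid, map_ids, saved_at}` tuples. This is an illustrative reconstruction inferred from the assertions, not the actual module; the table name, the 24-hour staleness cutoff, and the return shapes are assumptions:

```elixir
defmodule MapPoolStateSketch do
  @moduledoc """
  Hedged sketch of the ETS-backed pool-state store.
  Entries are {pool_uuid, map_ids, saved_at_unix_seconds}.
  """

  @ets_table :map_pool_state_sketch
  # Entries older than 24 hours are considered stale
  @stale_after_seconds 24 * 3600

  def ensure_table do
    if :ets.whereis(@ets_table) == :undefined do
      :ets.new(@ets_table, [:named_table, :set, :public])
    end

    :ok
  end

  def save_pool_state(uuid, map_ids) when is_list(map_ids) do
    :ets.insert(@ets_table, {uuid, map_ids, System.system_time(:second)})
    :ok
  end

  def get_pool_state(uuid) do
    case :ets.lookup(@ets_table, uuid) do
      [{^uuid, map_ids, _saved_at}] -> {:ok, map_ids}
      [] -> {:error, :not_found}
    end
  end

  def delete_pool_state(uuid) do
    :ets.delete(@ets_table, uuid)
    :ok
  end

  def cleanup_stale_entries do
    cutoff = System.system_time(:second) - @stale_after_seconds

    # select_delete removes every entry whose saved_at is older than the
    # cutoff and returns how many rows it deleted
    deleted =
      :ets.select_delete(@ets_table, [
        {{:_, :_, :"$1"}, [{:<, :"$1", cutoff}], [true]}
      ])

    {:ok, deleted}
  end
end
```

Because the table is `:public` and every operation is a single ETS call, concurrent readers and writers never corrupt an entry — which is exactly what the "Concurrent operations" tests above assert.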


@@ -0,0 +1,343 @@
defmodule WandererApp.Map.MapPoolTest do
use ExUnit.Case, async: false
alias WandererApp.Map.{MapPool, MapPoolDynamicSupervisor, Reconciler}
@cache :map_pool_cache
@registry :map_pool_registry
@unique_registry :unique_map_pool_registry
setup do
# Clean up any existing test data
cleanup_test_data()
# Check if required infrastructure is running
registries_running? =
try do
# Registry.keys/2 raises ArgumentError when the registry is not running,
# so reaching the line after the call means the registry is up
Registry.keys(@registry, self())
true
rescue
_ -> false
end
reconciler_running? = Process.whereis(Reconciler) != nil
on_exit(fn ->
cleanup_test_data()
end)
{:ok, registries_running: registries_running?, reconciler_running: reconciler_running?}
end
defp cleanup_test_data do
# Clean up test caches
WandererApp.Cache.delete("started_maps")
Cachex.clear(@cache)
end
describe "garbage collection with synchronous stop" do
@tag :skip
test "garbage collector successfully stops map with synchronous call" do
# This test would require setting up a full map pool with a test map
# Skipping for now as it requires more complex setup with actual map data
:ok
end
@tag :skip
test "garbage collector handles stop failures gracefully" do
# This test would verify error handling when stop fails
:ok
end
end
describe "cache lookup with registry fallback" do
test "stop_map handles cache miss by scanning registry", %{registries_running: registries_running?} do
if registries_running? do
# Setup: Create a map_id that's not in cache but will be found in registry scan
map_id = "test_map_#{:rand.uniform(1_000_000)}"
# Verify cache is empty for this map
assert {:ok, nil} = Cachex.get(@cache, map_id)
# Call stop_map - should handle gracefully with fallback
assert :ok = MapPoolDynamicSupervisor.stop_map(map_id)
else
# Skip test if registries not running
:ok
end
end
test "stop_map handles non-existent pool_uuid in registry", %{registries_running: registries_running?} do
if registries_running? do
map_id = "test_map_#{:rand.uniform(1_000_000)}"
fake_uuid = "fake_uuid_#{:rand.uniform(1_000_000)}"
# Put fake uuid in cache that doesn't exist in registry
Cachex.put(@cache, map_id, fake_uuid)
# Call stop_map - should handle gracefully with fallback
assert :ok = MapPoolDynamicSupervisor.stop_map(map_id)
else
:ok
end
end
test "stop_map updates cache when found via registry scan", %{registries_running: registries_running?} do
if registries_running? do
# This test would require a running pool with registered maps
# For now, we verify the fallback logic doesn't crash
map_id = "test_map_#{:rand.uniform(1_000_000)}"
assert :ok = MapPoolDynamicSupervisor.stop_map(map_id)
else
:ok
end
end
end
describe "state cleanup atomicity" do
@tag :skip
test "rollback occurs when registry update fails" do
# This would require mocking Registry.update_value to fail
# Skipping for now as it requires more complex mocking setup
:ok
end
@tag :skip
test "rollback occurs when cache delete fails" do
# This would require mocking Cachex.del to fail
:ok
end
@tag :skip
test "successful cleanup updates all three state stores" do
# This would verify Registry, Cache, and GenServer state are all updated
:ok
end
end
describe "Reconciler - zombie map detection and cleanup" do
test "reconciler detects zombie maps in started_maps cache", %{reconciler_running: reconciler_running?} do
if reconciler_running? do
# Setup: Add maps to started_maps that aren't in any registry
zombie_map_id = "zombie_map_#{:rand.uniform(1_000_000)}"
WandererApp.Cache.insert_or_update(
"started_maps",
[zombie_map_id],
fn existing -> [zombie_map_id | existing] |> Enum.uniq() end
)
# Get started_maps
{:ok, started_maps} = WandererApp.Cache.lookup("started_maps", [])
assert zombie_map_id in started_maps
# Trigger reconciliation
send(Reconciler, :reconcile)
# Give it time to process
Process.sleep(200)
# Verify zombie was cleaned up
{:ok, started_maps_after} = WandererApp.Cache.lookup("started_maps", [])
refute zombie_map_id in started_maps_after
else
:ok
end
end
test "reconciler cleans up zombie map caches", %{reconciler_running: reconciler_running?} do
if reconciler_running? do
zombie_map_id = "zombie_map_#{:rand.uniform(1_000_000)}"
# Setup zombie state
WandererApp.Cache.insert_or_update(
"started_maps",
[zombie_map_id],
fn existing -> [zombie_map_id | existing] |> Enum.uniq() end
)
WandererApp.Cache.insert("map_#{zombie_map_id}:started", true)
Cachex.put(@cache, zombie_map_id, "fake_uuid")
# Trigger reconciliation
send(Reconciler, :reconcile)
Process.sleep(200)
# Verify all caches cleaned
{:ok, started_maps} = WandererApp.Cache.lookup("started_maps", [])
refute zombie_map_id in started_maps
{:ok, cache_entry} = Cachex.get(@cache, zombie_map_id)
assert cache_entry == nil
else
:ok
end
end
end
describe "Reconciler - orphan map detection and fix" do
@tag :skip
test "reconciler detects orphan maps in registry" do
# This would require setting up a pool with maps in registry
# but not in started_maps cache
:ok
end
@tag :skip
test "reconciler adds orphan maps to started_maps cache" do
# This would verify orphan maps get added to the cache
:ok
end
end
describe "Reconciler - cache inconsistency detection and fix" do
test "reconciler detects map with missing cache entry", %{reconciler_running: reconciler_running?} do
if reconciler_running? do
# This test verifies the reconciler can detect when a map
# is in the registry but has no cache entry
# Since we can't easily set up a full pool, we test the detection logic
map_id = "test_map_#{:rand.uniform(1_000_000)}"
# Ensure no cache entry
Cachex.del(@cache, map_id)
# The reconciler would detect this if the map was in a registry
# For now, we just verify the logic doesn't crash
send(Reconciler, :reconcile)
Process.sleep(200)
# No assertions needed - just verifying no crashes
else
:ok
end
end
test "reconciler detects cache pointing to non-existent pool", %{reconciler_running: reconciler_running?} do
if reconciler_running? do
map_id = "test_map_#{:rand.uniform(1_000_000)}"
fake_uuid = "fake_uuid_#{:rand.uniform(1_000_000)}"
# Put fake uuid in cache
Cachex.put(@cache, map_id, fake_uuid)
# Trigger reconciliation
send(Reconciler, :reconcile)
Process.sleep(200)
# Cache entry should be removed since pool doesn't exist
{:ok, cache_entry} = Cachex.get(@cache, map_id)
assert cache_entry == nil
else
:ok
end
end
end
describe "Reconciler - stats and telemetry" do
test "reconciler emits telemetry events", %{reconciler_running: reconciler_running?} do
if reconciler_running? do
# Setup telemetry handler
test_pid = self()
:telemetry.attach(
"test-reconciliation",
[:wanderer_app, :map, :reconciliation],
fn _event, measurements, _metadata, _config ->
send(test_pid, {:telemetry, measurements})
end,
nil
)
# Trigger reconciliation
send(Reconciler, :reconcile)
Process.sleep(200)
# Should receive telemetry event
assert_receive {:telemetry, measurements}, 500
assert is_integer(measurements.total_started_maps)
assert is_integer(measurements.total_registry_maps)
assert is_integer(measurements.zombie_maps)
assert is_integer(measurements.orphan_maps)
assert is_integer(measurements.cache_inconsistencies)
# Cleanup
:telemetry.detach("test-reconciliation")
else
:ok
end
end
end
describe "Reconciler - manual trigger" do
test "trigger_reconciliation runs reconciliation immediately", %{reconciler_running: reconciler_running?} do
if reconciler_running? do
zombie_map_id = "zombie_map_#{:rand.uniform(1_000_000)}"
# Setup zombie state
WandererApp.Cache.insert_or_update(
"started_maps",
[zombie_map_id],
fn existing -> [zombie_map_id | existing] |> Enum.uniq() end
)
# Verify it exists
{:ok, started_maps_before} = WandererApp.Cache.lookup("started_maps", [])
assert zombie_map_id in started_maps_before
# Trigger manual reconciliation
Reconciler.trigger_reconciliation()
Process.sleep(200)
# Verify zombie was cleaned up
{:ok, started_maps_after} = WandererApp.Cache.lookup("started_maps", [])
refute zombie_map_id in started_maps_after
else
:ok
end
end
end
describe "edge cases and error handling" do
test "stop_map with cache error returns ok", %{registries_running: registries_running?} do
if registries_running? do
map_id = "test_map_#{:rand.uniform(1_000_000)}"
# Even if cache operations fail, should return :ok
assert :ok = MapPoolDynamicSupervisor.stop_map(map_id)
else
:ok
end
end
test "reconciler handles empty registries gracefully", %{reconciler_running: reconciler_running?} do
if reconciler_running? do
# Clear everything
cleanup_test_data()
# Should not crash even with empty data
send(Reconciler, :reconcile)
Process.sleep(200)
# No assertions - just verifying no crash
assert true
else
:ok
end
end
test "reconciler handles nil values in caches", %{reconciler_running: reconciler_running?} do
if reconciler_running? do
map_id = "test_map_#{:rand.uniform(1_000_000)}"
# Explicitly set nil
Cachex.put(@cache, map_id, nil)
# Should handle gracefully
send(Reconciler, :reconcile)
Process.sleep(200)
assert true
else
:ok
end
end
end
end
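The cache-miss fallback that the `stop_map` tests above rely on can be sketched as: consult the `map_id -> pool_uuid` cache first, and on a miss scan the pool registry for whichever pool currently owns the map, healing the cache when found. A self-contained illustration using a plain map in place of Cachex and a standard `Registry` holding `{pool_uuid, map_ids}` entries (the module name and data shapes are assumptions, not the app's real implementation):

```elixir
defmodule StopMapFallbackSketch do
  # Resolve map_id -> pool_uuid: cache first, registry scan as fallback.
  # `cache` is a plain map standing in for Cachex; `registry` is a
  # Registry whose keys are pool uuids and whose values are map_id lists.
  def resolve_pool(cache, registry, map_id) do
    case Map.get(cache, map_id) do
      uuid when is_binary(uuid) ->
        {:ok, uuid, cache}

      nil ->
        # Cache miss: scan every registered pool for this map_id
        registry
        |> Registry.select([{{:"$1", :_, :"$2"}, [], [{{:"$1", :"$2"}}]}])
        |> Enum.find(fn {_uuid, map_ids} -> map_id in map_ids end)
        |> case do
          {uuid, _map_ids} ->
            # Heal the cache so the next lookup is O(1)
            {:ok, uuid, Map.put(cache, map_id, uuid)}

          nil ->
            # No pool owns this map; callers treat stop as a no-op
            {:error, :not_found, cache}
        end
    end
  end
end
```

This also explains why `stop_map` can return `:ok` for a map that exists nowhere: the `:not_found` branch is deliberately treated as a successful no-op rather than an error.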


@@ -0,0 +1,320 @@
defmodule WandererApp.Map.SlugUniquenessTest do
@moduledoc """
Tests for map slug uniqueness constraints and handling.
These tests verify that:
1. Database unique constraint is enforced
2. Application-level slug generation handles uniqueness
3. Concurrent map creation doesn't create duplicates
4. Error handling works correctly for slug conflicts
"""
use WandererApp.DataCase, async: false
alias WandererApp.Api.Map
require Logger
describe "slug uniqueness constraint" do
setup do
# Create a test user
user = create_test_user()
%{user: user}
end
test "prevents duplicate slugs via database constraint", %{user: user} do
# Create first map with a specific slug
{:ok, _map1} =
Map.new(%{
name: "Test Map",
slug: "test-map",
owner_id: user.id,
description: "First map",
scope: "wormholes"
})
# Attempt to create second map with same slug by bypassing Ash slug generation
# This simulates a race condition where slug generation passes but DB insert fails
result =
Map.new(%{
name: "Different Name",
slug: "test-map",
owner_id: user.id,
description: "Second map",
scope: "wormholes"
})
# Should get a unique constraint error from database
assert {:error, _error} = result
end
test "automatically increments slug when duplicate detected", %{user: user} do
# Create first map
{:ok, map1} =
Map.new(%{
name: "Test Map",
slug: "test-map",
owner_id: user.id,
description: "First map",
scope: "wormholes"
})
assert map1.slug == "test-map"
# Create second map with same name (should auto-increment slug)
{:ok, map2} =
Map.new(%{
name: "Test Map",
slug: "test-map",
owner_id: user.id,
description: "Second map",
scope: "wormholes"
})
# Slug should be automatically incremented
assert map2.slug == "test-map-2"
# Create third map with same name
{:ok, map3} =
Map.new(%{
name: "Test Map",
slug: "test-map",
owner_id: user.id,
description: "Third map",
scope: "wormholes"
})
assert map3.slug == "test-map-3"
end
test "handles many maps with similar names", %{user: user} do
# Create 10 maps with the same base slug
maps =
for i <- 1..10 do
{:ok, map} =
Map.new(%{
name: "Popular Name",
slug: "popular-name",
owner_id: user.id,
description: "Map #{i}",
scope: "wormholes"
})
map
end
# Verify all slugs are unique
slugs = Enum.map(maps, & &1.slug)
assert length(Enum.uniq(slugs)) == 10
# First should keep the base slug
assert List.first(maps).slug == "popular-name"
# Others should be numbered
assert "popular-name-2" in slugs
assert "popular-name-10" in slugs
end
end
describe "concurrent slug creation (race condition)" do
setup do
user = create_test_user()
%{user: user}
end
@tag :slow
test "handles concurrent map creation with identical slugs", %{user: user} do
# Create 5 concurrent map creation requests with the same slug
tasks =
for i <- 1..5 do
Task.async(fn ->
Map.new(%{
name: "Concurrent Test",
slug: "concurrent-test",
owner_id: user.id,
description: "Concurrent map #{i}",
scope: "wormholes"
})
end)
end
# Wait for all tasks to complete
results = Task.await_many(tasks, 10_000)
# All should either succeed or fail gracefully (no crashes)
assert length(results) == 5
# Get successful results
successful = Enum.filter(results, &match?({:ok, _}, &1))
failed = Enum.filter(results, &match?({:error, _}, &1))
# At least some should succeed
assert length(successful) > 0
# Extract maps from successful results
maps = Enum.map(successful, fn {:ok, map} -> map end)
# Verify all successful maps have unique slugs
slugs = Enum.map(maps, & &1.slug)
assert length(Enum.uniq(slugs)) == length(slugs), "All successful maps should have unique slugs"
# Log results for visibility
Logger.info("Concurrent test: #{length(successful)} succeeded, #{length(failed)} failed")
Logger.info("Unique slugs created: #{inspect(slugs)}")
end
@tag :slow
test "concurrent creation with different names creates different base slugs", %{user: user} do
# Create concurrent requests with different names (should all succeed)
tasks =
for i <- 1..5 do
Task.async(fn ->
Map.new(%{
name: "Concurrent Map #{i}",
slug: "concurrent-map-#{i}",
owner_id: user.id,
description: "Map #{i}",
scope: "wormholes"
})
end)
end
results = Task.await_many(tasks, 10_000)
# All should succeed
assert Enum.all?(results, &match?({:ok, _}, &1))
# All should have different slugs
slugs = Enum.map(results, fn {:ok, map} -> map.slug end)
assert length(Enum.uniq(slugs)) == 5
end
end
describe "slug generation edge cases" do
setup do
user = create_test_user()
%{user: user}
end
test "handles very long slugs", %{user: user} do
# Create map with name that would generate very long slug
long_name = String.duplicate("a", 100)
{:ok, map} =
Map.new(%{
name: long_name,
slug: long_name,
owner_id: user.id,
description: "Long name test",
scope: "wormholes"
})
# Slug should be truncated to max length (40 chars based on map.ex constraints)
assert String.length(map.slug) <= 40
end
test "handles special characters in slugs", %{user: user} do
# Test that special characters are properly slugified
{:ok, map} =
Map.new(%{
name: "Test: Map & Name!",
slug: "test-map-name",
owner_id: user.id,
description: "Special chars test",
scope: "wormholes"
})
# Slug should only contain allowed characters
assert map.slug =~ ~r/^[a-z0-9-]+$/
end
end
describe "slug update operations" do
setup do
user = create_test_user()
{:ok, map} =
Map.new(%{
name: "Original Map",
slug: "original-map",
owner_id: user.id,
description: "Original",
scope: "wormholes"
})
%{user: user, map: map}
end
test "updating map with same slug succeeds", %{map: map} do
# Update other fields, keep same slug
result =
Map.update(map, %{
description: "Updated description",
slug: "original-map"
})
assert {:ok, updated_map} = result
assert updated_map.slug == "original-map"
assert updated_map.description == "Updated description"
end
test "updating to conflicting slug is handled", %{user: user, map: map} do
# Create another map
{:ok, _other_map} =
Map.new(%{
name: "Other Map",
slug: "other-map",
owner_id: user.id,
description: "Other",
scope: "wormholes"
})
# Try to update first map to use other map's slug
result =
Map.update(map, %{
slug: "other-map"
})
# Should either fail or auto-increment
case result do
{:ok, updated_map} ->
# If successful, slug should be different
assert updated_map.slug != "other-map"
assert updated_map.slug =~ ~r/^other-map-\d+$/
{:error, _} ->
# Or it can fail with validation error
:ok
end
end
end
describe "get_map_by_slug with duplicates" do
setup do
user = create_test_user()
%{user: user}
end
test "get_map_by_slug! raises on duplicates if they exist" do
# Note: This test documents the behavior when duplicates somehow exist
# In production, this should be prevented by our fixes
# If duplicates exist (data integrity issue), the query should fail
# This is a documentation test - we can't easily create duplicates
# due to the database constraint, but we document expected behavior
assert true
end
end
# Helper functions
defp create_test_user do
# Create a test user with necessary attributes
{:ok, user} =
WandererApp.Api.User.new(%{
name: "Test User #{:rand.uniform(10_000)}",
eve_id: :rand.uniform(100_000_000)
})
user
end
end
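The slug behaviour asserted above ("test-map", then "test-map-2", "test-map-3"; truncation to 40 chars; only `[a-z0-9-]`) can be sketched as slugify-then-probe with an increasing numeric suffix. This is a hedged illustration against an in-memory set of taken slugs; the real implementation lives in an Ash change and checks the database, so names and the probe bound here are assumptions:

```elixir
defmodule SlugSketch do
  @max_slug_length 40

  # Lowercase, collapse runs of non-alphanumerics to "-", trim, truncate
  def slugify(name) do
    name
    |> String.downcase()
    |> String.replace(~r/[^a-z0-9]+/, "-")
    |> String.trim("-")
    |> String.slice(0, @max_slug_length)
  end

  # First free slug: base, then base-2, base-3, ...
  def next_available(base, taken) do
    if base in taken do
      Enum.find_value(2..1_000, fn n ->
        candidate = "#{base}-#{n}"
        if candidate not in taken, do: candidate
      end)
    else
      base
    end
  end
end
```

Note that probe-then-insert is inherently racy without a database unique constraint: two concurrent creators can both see "test-map-2" as free. That is why the concurrent tests above accept either success or a graceful constraint error, as long as every slug that does land is unique.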


@@ -0,0 +1,84 @@
defmodule WandererApp.User.ActivityTrackerTest do
use WandererApp.DataCase, async: false
alias WandererApp.User.ActivityTracker
describe "track_map_event/2" do
test "returns {:ok, result} on success" do
# This test verifies the happy path
# In real scenarios, this would succeed when creating a new activity record
assert {:ok, _} = ActivityTracker.track_map_event(:test_event, %{})
end
test "returns {:ok, nil} on error without crashing" do
# This simulates the scenario where tracking fails (e.g., unique constraint violation)
# The function should handle the error gracefully and return {:ok, nil}
# Note: In actual implementation, this would catch errors from:
# - Unique constraint violations
# - Database connection issues
# - Invalid data
# The key requirement is that it NEVER crashes the calling code
result = ActivityTracker.track_map_event(:map_connection_added, %{
character_id: nil, # This will cause the function to skip tracking
user_id: nil,
map_id: nil
})
# Should return success even when input is incomplete
assert {:ok, _} = result
end
test "handles errors gracefully and logs them" do
# Verify that errors are logged for observability
# This is important for monitoring and debugging
# The function should complete without raising even with incomplete data
assert {:ok, _} = ActivityTracker.track_map_event(:map_connection_added, %{
character_id: nil,
user_id: nil,
map_id: nil
})
end
end
describe "track_acl_event/2" do
test "returns {:ok, result} on success" do
assert {:ok, _} = ActivityTracker.track_acl_event(:test_event, %{})
end
test "returns {:ok, nil} on error without crashing" do
result = ActivityTracker.track_acl_event(:map_acl_added, %{
user_id: nil,
acl_id: nil
})
assert {:ok, _} = result
end
end
describe "error resilience" do
test "always returns success tuple even on internal errors" do
# The key guarantee is that activity tracking never crashes calling code
# Even if the internal tracking fails (e.g., unique constraint violation),
# the wrapper ensures a success tuple is returned
# This test verifies that the function signature guarantees {:ok, _}
# regardless of internal errors
# Test with nil values (which will fail validation)
assert {:ok, _} = ActivityTracker.track_map_event(:test_event, %{
character_id: nil,
user_id: nil,
map_id: nil
})
# Test with empty map (which will fail validation)
assert {:ok, _} = ActivityTracker.track_map_event(:test_event, %{})
# The guarantee is: no matter what, it returns {:ok, _}
# This prevents MatchError crashes in calling code
end
end
end
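The never-crash guarantee these `ActivityTracker` tests document can be implemented as a thin wrapper that logs any failure and normalizes it to `{:ok, nil}`, so a `MatchError` can never propagate into calling code. A sketch under that assumption (the injected `tracker_fun` stands in for the real persistence call, which these tests do not show):

```elixir
defmodule SafeTrackerSketch do
  require Logger

  # Wrap any tracking function so callers always get {:ok, _} back:
  # successes pass through, error tuples and raises become {:ok, nil}.
  def track(event, metadata, tracker_fun) do
    case tracker_fun.(event, metadata) do
      {:ok, _} = ok ->
        ok

      {:error, reason} ->
        # e.g. unique constraint violation, DB connection loss
        Logger.warning("activity tracking failed: #{inspect(reason)}")
        {:ok, nil}
    end
  rescue
    e ->
      Logger.warning("activity tracking raised: #{Exception.message(e)}")
      {:ok, nil}
  end
end
```

The design trade-off is deliberate: tracking is best-effort telemetry, so losing one activity record (with a log line for observability) is preferable to crashing the map operation that triggered it.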